DeepSeek’s New OCR Model Can Process Over 2 Lakh Pages Daily on a Single GPU

DeepSeek AI has announced DeepSeek-OCR, a new optical character recognition (OCR) system designed to improve how large language models handle long text contexts through optical 2D mapping.

The technology introduces a vision-based approach to context compression, converting text into a compact set of vision tokens. DeepSeek claimed that it achieves over 96% OCR precision at a compression ratio of 9x to 10x (text tokens to vision tokens), and about 60% accuracy even at 20x compression.
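To make those ratios concrete, here is a minimal sketch of the arithmetic. It assumes the compression ratio means the number of text tokens divided by the number of vision tokens that replace them. The function name and the 1,000-token page are hypothetical and chosen only for illustration; the accuracy labels are the figures reported above, not measurements.

```python
import math


def vision_tokens_needed(text_tokens: int, compression_ratio: float) -> int:
    """Return how many vision tokens stand in for `text_tokens` text tokens.

    Assumes compression_ratio = text_tokens / vision_tokens, rounding up
    so that a partial token still counts as one token.
    """
    if text_tokens < 0:
        raise ValueError("text_tokens must be non-negative")
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    return math.ceil(text_tokens / compression_ratio)


# Accuracy figures as reported in the article (not measured here).
REPORTED_ACCURACY = {
    10: "over 96% OCR precision",
    20: "about 60% accuracy",
}

# Hypothetical page length, used only to illustrate the ratios.
PAGE_TEXT_TOKENS = 1000

for ratio, reported in REPORTED_ACCURACY.items():
    tokens = vision_tokens_needed(PAGE_TEXT_TOKENS, ratio)
    print(f"{ratio}x compression: {PAGE_TEXT_TOKENS} text tokens -> "
          f"{tokens} vision tokens ({reported})")
```

Under these assumptions, a 1,000-token page needs 100 vision tokens at 10x and 50 at 20x, which is the trade-off between context length and decoding accuracy that the reported numbers describe.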

DeepSeek-OCR comprises two key components that work together to balance accuracy and efficiency: DeepEncoder, a vision encoder, and DeepSeek3B-MoE-A570M, a mixture-of-experts decoder. DeepEncoder reduces the number of vision tokens before they reach the decoder, which keeps GPU memory use manageable even with high-resolution inputs.

On the OmniDocBench benchmark, the system outperformed existing OCR models such as GOT-OCR2.0 and MinerU2.0 while using fewer vision tokens per page.

DeepSeek reported that the model processes over 2,00,000 pages per day on a single NVIDIA A100 GPU and scales up to 33 million pages daily using 20 nodes.
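The throughput figures can be sanity-checked with some simple arithmetic. The sketch below is illustrative only: the article does not say how many GPUs each node contains, so `GPUS_PER_NODE = 8` is an assumption, and all function names are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Figures reported in the article.
PAGES_PER_GPU_PER_DAY = 200_000       # "over" this many on one A100
NODES = 20
CLUSTER_PAGES_PER_DAY = 33_000_000

# ASSUMPTION: GPUs per node is not stated in the article.
GPUS_PER_NODE = 8


def pages_per_second(pages_per_day: float) -> float:
    """Convert a daily page count into an average per-second rate."""
    return pages_per_day / SECONDS_PER_DAY


def cluster_pages_per_day(per_gpu: int, nodes: int, gpus_per_node: int) -> int:
    """Daily pages for a cluster, assuming perfectly linear scaling."""
    return per_gpu * nodes * gpus_per_node


def implied_per_gpu_rate(cluster_total: int, nodes: int, gpus_per_node: int) -> float:
    """Per-GPU daily rate implied by a reported cluster-wide total."""
    return cluster_total / (nodes * gpus_per_node)


single_gpu_rate = pages_per_second(PAGES_PER_GPU_PER_DAY)
linear_estimate = cluster_pages_per_day(PAGES_PER_GPU_PER_DAY, NODES, GPUS_PER_NODE)
implied_rate = implied_per_gpu_rate(CLUSTER_PAGES_PER_DAY, NODES, GPUS_PER_NODE)

print(f"Single GPU: about {single_gpu_rate:.2f} pages per second")
print(f"Linear estimate for {NODES} nodes: {linear_estimate:,} pages per day")
print(f"Per-GPU rate implied by the cluster figure: {implied_rate:,.0f} pages per day")
```

If each node does hold eight GPUs, linear scaling from 200,000 pages per GPU gives 32 million pages per day, and the reported 33 million implies roughly 206,000 pages per GPU, which is consistent with the "over 2,00,000" wording.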

The company said this scalability makes DeepSeek-OCR suitable for large-scale document digitisation and AI training data generation. It also supports multiple resolutions and document types, including charts, chemical formulas, and multilingual text.

DeepSeek added that its approach represents a new paradigm in language model efficiency by using visual modalities for compression. The system’s design allows smaller language models to decode visual representations effectively, indicating potential applications in memory optimisation and long-context processing.

Both the code and the model weights for DeepSeek-OCR have been released as open source on GitHub. The company said it aims to support broader research into combining vision and language for more efficient AI systems.

DeepSeek said the paradigm “opens new possibilities for rethinking how vision and language modalities can be synergistically combined to enhance computational efficiency in large-scale text processing and agent systems.”

The release follows DeepSeek’s recent V3.2-Exp model, which reportedly achieves major efficiency gains in training and inference, furthering its push toward cheaper long-context processing for LLMs.

The post DeepSeek’s New OCR Model Can Process Over 2 Lakh Pages Daily on a Single GPU appeared first on Analytics India Magazine.
