    DeepSeek AI Releases DeepSeek-OCR 2 with Causal Visual Flow Encoder for Layout-Aware Document Understanding

    By Naveed Ahmad · 30/01/2026 · 3 Mins Read

    **Revolutionizing Document OCR: DeepSeek AI’s Breakthrough Technology**

    Remember the last time you had to spend hours sifting through a stack of papers, deciphering handwritten notes, and wrestling with formatting nightmares? Those days are numbered! Introducing DeepSeek-OCR 2, the latest innovation from DeepSeek AI that’s about to change the game for document processing.

    **How Did We Get Here?**

    DeepSeek-OCR 2 is an open-source document OCR and understanding system that is redefining the way we process documents. The DeepSeek team has taken a fundamentally different approach by leveraging DeepEncoder-V2, a language-model-style transformer that converts a two-dimensional page into a one-dimensional sequence of visual tokens, already aligned with a learned reading order.

    This is a significant departure from the traditional method of flattening an image into a raster sequence. Most multimodal models do exactly that, applying a transformer with static positional encodings, which is a poor match for documents with multi-column layouts, nested tables, and mixed-language regions. Humans, by contrast, read in a semantic order that jumps between regions, and DeepSeek-OCR 2 is built to do the same.

    **The Vision Tokenizer: The Unsung Hero**

    The vision tokenizer is inherited from DeepSeek-OCR and uses an 80M-parameter SAM-base backbone followed by two convolution layers. It downsamples the image, reducing the visual token count by a factor of 16 and compressing features into an embedding dimension of 896.

    But here’s the kicker: DeepSeek-OCR 2 uses a multi-crop strategy to cover dense pages without letting the token count explode. A global view at 1024 × 1024 resolution produces 256 tokens, and up to 6 local crops at 768 × 768 resolution add 144 tokens each. That keeps the visual token count between 256 and 1120 per page, a remarkably tight budget for dense documents.
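The per-page token budget follows directly from those figures. A quick sketch (the 256/144 per-view counts and the cap of 6 crops come from the article; the function itself is illustrative, and the derivation in the comments is our own arithmetic, consistent with the 16× reduction described above):

```python
def visual_token_budget(num_local_crops: int) -> int:
    """Estimate DeepSeek-OCR 2's visual token count for one page.

    Consistency check with the stated 16x reduction: a 1024x1024 view
    through a SAM-style 16-pixel patch grid gives (1024/16)**2 = 4096
    patch tokens, which the convolutions cut 16x to 256. Likewise
    (768/16)**2 / 16 = 144 for each local crop.
    """
    GLOBAL_TOKENS = 256  # one 1024x1024 global view, always present
    CROP_TOKENS = 144    # per 768x768 local crop
    assert 0 <= num_local_crops <= 6, "the article caps local crops at 6"
    return GLOBAL_TOKENS + num_local_crops * CROP_TOKENS

# A sparse page may need no local crops; a dense page uses all six.
print(visual_token_budget(0))  # 256
print(visual_token_budget(6))  # 1120
```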

    **DeepEncoder-V2: The Game-Changer**

    DeepEncoder-V2 is built by instantiating a Qwen2-0.5B transformer as the vision encoder. The input sequence is formed by concatenating the visual tokens from the tokenizer as the prefix with a set of learnable “causal flow” tokens as the suffix.
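The prefix/suffix layout can be sketched in a few lines. The 896 embedding dimension and the up-to-1120 visual tokens are from the article; the number of causal-flow tokens and all names here (`build_encoder_input`, `flow_tokens`) are hypothetical, not from DeepSeek's code:

```python
import numpy as np

EMBED_DIM = 896          # tokenizer embedding dimension, per the article
NUM_FLOW_TOKENS = 512    # hypothetical: the article does not state this count

rng = np.random.default_rng(0)

def build_encoder_input(visual_tokens: np.ndarray) -> np.ndarray:
    """Concatenate visual tokens (prefix) with learnable causal-flow
    tokens (suffix), mirroring DeepEncoder-V2's input layout.

    In the real model the flow tokens are trained parameters; random
    values stand in for them here.
    """
    flow_tokens = rng.standard_normal((NUM_FLOW_TOKENS, EMBED_DIM))
    return np.concatenate([visual_tokens, flow_tokens], axis=0)

# Densest case: 1120 visual tokens from the multi-crop tokenizer.
page_tokens = rng.standard_normal((1120, EMBED_DIM))
seq = build_encoder_input(page_tokens)
print(seq.shape)  # (1632, 896)
```

The key design point is that the causal-flow tokens sit after every visual token, so under causal attention each one can attend to the whole page while being decoded in a fixed 1D order.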

    **The Training Pipeline**

    The training data pipeline follows DeepSeek-OCR and focuses on OCR-intensive content, with OCR data accounting for 80% of the mix. The team rebalances sampling across text, formulas, and tables using a 3:1:1 ratio, ensuring the model sees enough structure-heavy examples.
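A minimal sketch of that rebalanced sampling (the 80% OCR share and 3:1:1 split are from the article; the category names and the sampler itself are illustrative):

```python
import random

# 80% of the mix is OCR data, split 3:1:1 across text, formulas, tables.
OCR_SHARE = 0.80
TEXT_W, FORMULA_W, TABLE_W = 3, 1, 1
total = TEXT_W + FORMULA_W + TABLE_W

weights = {
    "ocr_text":    OCR_SHARE * TEXT_W / total,     # 0.48
    "ocr_formula": OCR_SHARE * FORMULA_W / total,  # 0.16
    "ocr_table":   OCR_SHARE * TABLE_W / total,    # 0.16
    "other":       1.0 - OCR_SHARE,                # 0.20
}

def sample_category(rng: random.Random) -> str:
    """Draw one training-data category according to the mix weights."""
    cats, probs = zip(*weights.items())
    return rng.choices(cats, weights=probs, k=1)[0]

rng = random.Random(0)
counts = {c: 0 for c in weights}
for _ in range(10_000):
    counts[sample_category(rng)] += 1
print(counts)  # roughly 4800 / 1600 / 1600 / 2000
```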

    **Benchmarking Results on OmniDocBench**

    The primary analysis uses OmniDocBench-v1.5, a benchmark of 1355 pages spanning 9 document classes in Chinese and English. The results are impressive: DeepSeek-OCR 2 achieves an overall OmniDocBench score of 91.09, a gain of 3.73 points over the original DeepSeek-OCR baseline, while using a slightly lower token budget.

    **Key Takeaways**

    * DeepSeek-OCR 2 replaces the CLIP-style ViT encoder with DeepEncoder-V2, a Qwen2-0.5B-based language-model encoder that converts a 2D document page into a 1D sequence of causal flow tokens aligned with a learned reading order.
    * The vision tokenizer uses an 80M-parameter SAM-base backbone with convolutions, multi-crop global and local views, and keeps the visual token count between 256 and 1120 tokens per page.
    * Training follows a 3-stage pipeline: encoder pretraining, joint optimization with DeepSeek-3B-A500M, and decoder-only fine-tuning with the encoder frozen.
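The staged schedule boils down to which module is trainable at each step. A purely illustrative summary (stage names follow the takeaways above; the config structure is an assumption, not DeepSeek's code):

```python
# Which modules train at each stage, per the 3-stage pipeline above.
TRAINING_STAGES = [
    {"stage": 1, "name": "encoder pretraining",
     "train_encoder": True,  "train_decoder": False},
    {"stage": 2, "name": "joint optimization with DeepSeek-3B-A500M",
     "train_encoder": True,  "train_decoder": True},
    {"stage": 3, "name": "decoder-only fine-tuning",
     "train_encoder": False, "train_decoder": True},  # encoder frozen
]

for s in TRAINING_STAGES:
    frozen = "encoder frozen" if not s["train_encoder"] else "all trainable"
    print(f"stage {s['stage']}: {s['name']} ({frozen})")
```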

    Want to learn more? Check out the [paper](https://github.com/deepseek-ai/DeepSeek_OCR2_paper.pdf), [repo](https://github.com/deepseek-ai/DeepSeek-OCR-2), and [model weights](https://huggingface.co/deepseek-ai/DeepSeek-OCR-2). And, as always, follow us on Twitter, join our 100k+ ML SubReddit, and subscribe to our Newsletter!
