PaliGemma: Receipt & Invoice JSON v2

Minyang Chen
6 min read · Sep 7, 2024

In previous work, I created multiple experimental Large Language Model (LLM) architectures to convert receipt images into JSON or XML formats. You can find details in my Medium article here and the corresponding GitHub repository here.

This article presents a new model:

  • A novel model fine-tuned on Google PaliGemma
  • An efficient vision language model for single-turn vision tasks

This new architecture addresses several limitations of the previous models, simplifying processing while improving the reliability and accuracy of JSON structure generation. It comes with a new training dataset: mychen76/invoices-and-receipts_ocr_v2.

The primary objective of this new model is to generate clean and accurate receipt JSON output efficiently, without relying on additional frameworks.

Model architecture

Figure 1: PaliGemma Receipt JSON

New! Fine-tuned model

> Model Id: mychen76/paligemma-receipt-json-3b-mix-448-v2b
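To make the model id above concrete, here is a minimal sketch of how a checkpoint like this could be loaded and queried with the Hugging Face transformers library. It is not taken from the model card. The task prompt string, the helper names `parse_receipt_json` and `extract_receipt`, and the `max_new_tokens` value are assumptions for illustration, so check the model card for the exact prompt before relying on it.

```python
import json

MODEL_ID = "mychen76/paligemma-receipt-json-3b-mix-448-v2b"
# Assumption: the task prompt is not stated in this article; verify it on the model card.
TASK_PROMPT = "EXTRACT_JSON_RECEIPT"


def parse_receipt_json(text):
    """Return the outermost JSON object found in `text`, or None if none parses."""
    start = text.find("{")
    end = text.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        return json.loads(text[start:end + 1])
    except json.JSONDecodeError:
        return None


def extract_receipt(image_path, max_new_tokens=512):
    """Run the fine-tuned model on one receipt image and parse its output."""
    # Heavy dependencies are imported here so the parsing helper above
    # stays usable without them installed.
    import torch
    from PIL import Image
    from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

    processor = AutoProcessor.from_pretrained(MODEL_ID)
    # Loaded in default precision for simplicity; a lower-precision dtype
    # would reduce memory use but is not shown here.
    model = PaliGemmaForConditionalGeneration.from_pretrained(MODEL_ID).eval()

    image = Image.open(image_path).convert("RGB")
    inputs = processor(text=TASK_PROMPT, images=image, return_tensors="pt")
    prompt_len = inputs["input_ids"].shape[-1]

    with torch.inference_mode():
        output_ids = model.generate(
            **inputs, max_new_tokens=max_new_tokens, do_sample=False
        )

    # Decode only the newly generated tokens, not the echoed prompt.
    decoded = processor.decode(output_ids[0][prompt_len:], skip_special_tokens=True)
    return parse_receipt_json(decoded)
```

Calling `extract_receipt("receipt.jpg")` (a hypothetical file path) would return a Python dict when the model emits valid JSON and `None` otherwise. The parsing step uses only the standard library, in keeping with the goal of avoiding additional frameworks.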
