PaliGemma: Receipt & Invoice JSON v2

6 min read · Sep 7, 2024

In previous work, I created multiple experimental Large Language Model (LLM) architectures to convert receipt images into JSON or XML formats. You can find details in my Medium article here and the corresponding GitHub repository here.

This article presents a new model:

A novel model fine-tuned on Google PaliGemma
An efficient vision language model for single-turn vision tasks

This new architecture addresses several limitations of the previous models, simplifying processing while improving the reliability and accuracy of the generated JSON structure. It comes with a new training dataset: mychen76/invoices-and-receipts_ocr_v2.

The primary objective of this new model is to generate clean and accurate receipt JSON output efficiently, without relying on additional frameworks.

Model architecture

New! Fine-tuned model

> Model Id: mychen76/paligemma-receipt-json-3b-mix-448-v2b
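As a rough illustration of how a model like this might be called, here is a minimal sketch using the Hugging Face transformers PaliGemma classes. Only the model id comes from this article. The task prompt string, the function names `extract_receipt_text` and `parse_receipt_json`, and the idea of trimming stray text around the JSON object are assumptions made for the example, so check the model card for the exact prompt and output format before relying on it.

```python
import json
from typing import Optional

MODEL_ID = "mychen76/paligemma-receipt-json-3b-mix-448-v2b"
# Assumption: the task prompt is a guess for illustration; confirm it on the model card.
PROMPT = "EXTRACT_JSON_RECEIPT"


def extract_receipt_text(image_path: str, max_new_tokens: int = 512) -> str:
    """Run the fine-tuned model on one receipt image and return the decoded text."""
    # Heavy dependencies are imported here so the parsing helper below
    # can be used without torch or transformers installed.
    import torch
    from PIL import Image
    from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

    processor = AutoProcessor.from_pretrained(MODEL_ID)
    # Default precision keeps the sketch simple; a 3B model needs a lot of memory this way.
    model = PaliGemmaForConditionalGeneration.from_pretrained(MODEL_ID).eval()

    image = Image.open(image_path).convert("RGB")
    inputs = processor(text=PROMPT, images=image, return_tensors="pt")
    prompt_len = inputs["input_ids"].shape[-1]

    with torch.inference_mode():
        output_ids = model.generate(
            **inputs, max_new_tokens=max_new_tokens, do_sample=False
        )

    # Drop the prompt tokens so only the newly generated text is decoded.
    return processor.decode(output_ids[0][prompt_len:], skip_special_tokens=True)


def parse_receipt_json(text: str) -> Optional[dict]:
    """Parse the outermost {...} span of the model output, or return None."""
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        parsed = json.loads(text[start : end + 1])
    except json.JSONDecodeError:
        return None
    return parsed if isinstance(parsed, dict) else None
```

A typical call would be `parse_receipt_json(extract_receipt_text("receipt.jpg"))`, where `receipt.jpg` is a placeholder path. Returning `None` instead of raising makes it easy to count how often the output fails to parse across a batch of receipts.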

Written by Minyang Chen