Fine-tuning an LLM to convert a receipt image to JSON or XML
Motivation
The motivation for converting a receipt image to a JSON or XML object is to make the data easier to store, analyze, and manipulate in a structured and organized way. Both JSON (JavaScript Object Notation) and XML (Extensible Markup Language) are widely used data interchange formats that are easy for humans to read and write and easy for machines to parse and generate. Once a receipt image is converted to a JSON or XML object, the data can be stored in a database or transmitted over the internet, making it more accessible and usable for a variety of applications.
Additionally, JSON and XML objects can be processed and analyzed with common programming languages and tools, which makes it easier to extract insights and make decisions based on the data. Overall, converting a receipt image to a JSON or XML object helps streamline data management and analysis. A converted receipt looks like this:
{
'receipt': {
'store': 'The Lone Pine',
'address': '43 Manchester Road',
'city': 'Brisbane',
'country': 'Australia',
'phone': '617-3236-6207',
'invoice': 'Invoice 08000008',
'date': '09/04/08',
'table': 'Table',
'items': [
{'name': 'Carlsberg Bottle', 'price': '16.00', 'quantity': '2'},
{'name': 'Heineken Draft Standard.', 'price': '15.20', 'quantity': '1'},
{'name': 'Heineken Draft Half Liter.', 'price': '15.20', 'quantity': '1'},
{'name': 'Carlsberg Bucket (5 bottles).', 'price': '80.00', 'quantity': '1'},
{'name': 'Grilled Chicken Breast.', 'price': '74.00', 'quantity': '1'},
{'name': 'Sirloin Steak', 'price': '96.00', 'quantity': '1'},
{'name': 'Coke', 'price': '3.50', 'quantity': '1'},
{'name': 'Ice Cream', 'price': '18.00', 'quantity': '5'}
],
'subtotal': '327.30',
'tax': '16.36',
'service_charge': '32.73',
'total': '400.00'
}
}
OCR engines today are up to the task, but they still require high-quality images to recognize text accurately. If the image is blurry, distorted, or poorly lit, the OCR engine may misread the text or miss it entirely.
Objective
The objective was to investigate and compare use cases for fine-tuned LLMs (large language models) in image-to-structured-data conversion, and to see whether this is a better approach than hand-coding a parsing algorithm. LLMs are capable of generating text conditioned on a given input, which makes them useful for tasks such as image-to-text conversion, text-to-text conversion, and machine translation.
- Experiment-1: IMAGE → OCR Engine → text boxes → LLM → JSON object
- Experiment-2: IMAGE → LLM → XML or JSON object
Setup
My fine-tuned models for this experiment are published on Hugging Face.
- OCR-to-Json: https://huggingface.co/mychen76/mistral7b_ocr_to_json_v1 (using the latest Mistral 7B from mistralai as the base model)
- Image-to-Json_and_XML: https://huggingface.co/mychen76/invoice-and-receipts_donut_v1 (using Donut as the base model)
My Dataset: mychen76/invoices-and-receipts_ocr_v1
My Hardware: Ryzen 5700G, 96 GB memory, RTX 3090 with 24 GB VRAM
Experiment Results:
Exp-1: Feed the OCR result to the LLM, which outputs a JSON object
>> Success: receipt image converted into JSON format (took 46 s)
[Notebook-1] Convert Receipt/Invoice Image-to-Json with OCR engine and Finetuned LLM Model
Exp-2: Feed the image directly to the LLM, which outputs structured data (XML/JSON)
>> Success: receipt image converted into XML/JSON format (took 8.9 s)
[Notebook-2] Convert Receipt or Invoice Image-to-structure data (xml/json) using a finetuned LLM (OCR Free)
Observation:
The results show that less is more. Experiment-2 took only 8.9 s, whereas Experiment-1 took 46 s, both measured from a cold start to task completion, so the OCR-free pipeline was about five times faster. In terms of performance, the OCR-free LLM is clearly the more elegant approach. Accuracy is around 76 percent or better with the fine-tuned model trained for 3 epochs. Better results could be obtained with further improvements to the data and the model training.
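The post does not show how the timings were taken. One way to measure a cold-start-to-completion duration is sketched below; the `timed` helper and the label are hypothetical, and the `time.sleep` call is only a stand-in for loading a model and running one conversion.

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(label, results):
    """Record the wall-clock duration of the enclosed block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start

timings = {}
with timed("placeholder-task", timings):
    # Stand-in for: load the model, then run one image conversion.
    time.sleep(0.05)

for label, seconds in timings.items():
    print(f"{label}: {seconds:.2f}s")
```

Wrapping model loading and inference in the same block is what makes the number a cold-start figure; timing only the `generate` call would give a much smaller value.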
How does it work?
Experiment-1: IMAGE → OCR Engine → text boxes → LLM → JSON object
Let's dive into the details. Here we take a random sample receipt image from the internet.
import urllib.request
import numpy as np
from PIL import Image
# pull a random receipt image from internet
image_url='https://groups.google.com/group/jzebra-users/attach/d16dbba8a612edfa/Bill%20Image_Receipt.png?part=0.1'
local_image_id='bill_image_receipt.png'
urllib.request.urlretrieve(image_url,local_image_id)
receipt_image = Image.open(local_image_id)
receipt_image_array = np.array(receipt_image.convert('RGB'))
Feed the image into an OCR engine (PaddleOCR in this case).
from paddleocr import PaddleOCR, draw_ocr
from ast import literal_eval
import json
paddleocr = PaddleOCR(lang="en",ocr_version="PP-OCRv4",show_log = False,use_gpu=True)
def paddle_scan(paddleocr,img_path_or_nparray):
    result = paddleocr.ocr(img_path_or_nparray,cls=True)
    result = result[0]  # results for the first (only) image
    boxes = [line[0] for line in result]       # bounding boxes
    txts = [line[1][0] for line in result]     # raw text
    scores = [line[1][1] for line in result]   # confidence scores
    return txts, result
# perform ocr scan
receipt_texts, receipt_boxes = paddle_scan(paddleocr,receipt_image_array)
print(50*"--","\ntext only:\n",receipt_texts)
print(50*"--","\nocr boxes:\n",receipt_boxes)
The result is a long list of text boxes: essentially, the position of each piece of text detected and recognized by the engine, together with the text and its confidence score.
[
[[[188.0, 54.0], [453.0, 54.0], [453.0, 85.0], [188.0, 85.0]], ('The Lone Pine', 0.9998093247413635)],
[[[194.0, 96.0], [449.0, 98.0], [449.0, 122.0], [194.0, 120.0]], ('43 Manchester Road', 0.9989086985588074)],
[[[228.0, 127.0], [416.0, 130.0], [416.0, 154.0], [228.0, 151.0]], ('12480 Brisbane', 0.9684376120567322)],
[[[267.0, 162.0], [375.0, 162.0], [375.0, 186.0], [267.0, 186.0]], ('Australia', 0.9997145533561707)],
[[[234.0, 193.0], [409.0, 193.0], [409.0, 216.0], [234.0, 216.0]], ('617-3236-6207', 0.9996872544288635)],
[[[46.0, 255.0], [308.0, 255.0], [308.0, 278.0], [46.0, 278.0]], ('Invoice 08000008', 0.9919947385787964)],
[[[466.0, 255.0], [598.0, 255.0], [598.0, 278.0], [466.0, 278.0]], ('09/04/08', 0.9994759559631348)],
[[[42.0, 283.0], [132.0, 283.0], [132.0, 311.0], [42.0, 311.0]], ('Table', 0.996922492980957)],
[[[174.0, 283.0], [214.0, 283.0], [214.0, 311.0], [174.0, 311.0]], ('25', 0.999789297580719)],
[[[514.0, 284.0], [601.0, 284.0], [601.0, 311.0], [514.0, 311.0]], ('12:45', 0.9964934587478638)],
[[[67.0, 346.0], [291.0, 349.0], [291.0, 376.0], [67.0, 374.0]], ('2 Carlsberg Bottle', 0.9987956285476685)],
[[[515.0, 346.0], [599.0, 346.0], [599.0, 372.0], [515.0, 372.0]], ('16.00', 0.999927818775177)],
[
[[69.0, 385.0], [395.0, 387.0], [395.0, 411.0], [69.0, 409.0]],
('3 Heineken Draft Standard.', 0.9832844138145447)
],
[[[515.0, 384.0], [599.0, 384.0], [599.0, 409.0], [515.0, 409.0]], ('24.60', 0.9998160600662231)],
[
[[71.0, 423.0], [391.0, 423.0], [391.0, 446.0], [71.0, 446.0]],
('1 Heineken Draft Half Liter.', 0.9641053080558777)
],
[[[515.0, 421.0], [601.0, 421.0], [601.0, 450.0], [515.0, 450.0]], ('15.20', 0.9998877644538879)],
[
[[69.0, 460.0], [430.0, 461.0], [430.0, 485.0], [69.0, 484.0]],
('2 Carlsberg Bucket (5 bottles).', 0.9744488596916199)
],
[[[515.0, 461.0], [599.0, 461.0], [599.0, 486.0], [515.0, 486.0]], ('80.00', 0.9999421834945679)],
[
[[69.0, 498.0], [367.0, 500.0], [367.0, 524.0], [69.0, 522.0]],
('4 Grilled Chicken Breast.', 0.9772973656654358)
],
[[[515.0, 499.0], [599.0, 499.0], [599.0, 524.0], [515.0, 524.0]], ('74.00', 0.9999668002128601)],
[[[68.0, 534.0], [250.0, 537.0], [250.0, 562.0], [68.0, 560.0]], ('3 Sirloin Steak', 0.9997310638427734)],
[[[515.0, 537.0], [599.0, 537.0], [599.0, 561.0], [515.0, 561.0]], ('96.00', 0.9999544024467468)],
[[[67.0, 571.0], [162.0, 574.0], [161.0, 601.0], [67.0, 598.0]], ('1 Coke', 0.9997828006744385)],
[[[530.0, 572.0], [602.0, 572.0], [602.0, 601.0], [530.0, 601.0]], ('3.50', 0.9999454617500305)],
[[[69.0, 609.0], [219.0, 613.0], [218.0, 638.0], [68.0, 634.0]], ('5 Ice Cream', 0.9914573431015015)],
[[[516.0, 611.0], [599.0, 611.0], [599.0, 637.0], [516.0, 637.0]], ('18.00', 0.9999335408210754)],
[[[154.0, 664.0], [288.0, 664.0], [288.0, 688.0], [154.0, 688.0]], ('Subtotal', 0.9990756511688232)],
[[[499.0, 664.0], [599.0, 664.0], [599.0, 688.0], [499.0, 688.0]], ('327.30', 0.9999766945838928)],
[
[[155.0, 701.0], [397.0, 701.0], [397.0, 724.0], [155.0, 724.0]],
('Sales/Gov Tax - 5%', 0.9551984071731567)
],
[[[514.0, 697.0], [601.0, 697.0], [601.0, 724.0], [514.0, 724.0]], ('16.36', 0.9998232126235962)],
[
[[155.0, 733.0], [419.0, 733.0], [419.0, 757.0], [155.0, 757.0]],
('Service Charge - 10%', 0.9921121597290039)
],
[[[512.0, 728.0], [601.0, 731.0], [600.0, 759.0], [511.0, 757.0]], ('32.73', 0.9999620318412781)],
[[[154.0, 775.0], [335.0, 775.0], [335.0, 799.0], [154.0, 799.0]], ('GRAND TOTAL', 0.9899778962135315)],
[[[499.0, 778.0], [599.0, 778.0], [599.0, 802.0], [499.0, 802.0]], ('376.40', 0.9999797940254211)],
[[[39.0, 831.0], [223.0, 831.0], [223.0, 859.0], [39.0, 859.0]], ('Thank you and', 0.9922560453414917)],
[[[336.0, 831.0], [407.0, 831.0], [407.0, 860.0], [336.0, 860.0]], ('Cash', 0.9998618364334106)],
[[[499.0, 831.0], [601.0, 831.0], [601.0, 859.0], [499.0, 859.0]], ('400.00', 0.9998553395271301)],
[[[38.0, 866.0], [220.0, 862.0], [220.0, 891.0], [38.0, 895.0]], ('see you again!', 0.9798097014427185)],
[[[336.0, 864.0], [438.0, 869.0], [437.0, 898.0], [335.0, 894.0]], ('Change', 0.9998979568481445)],
[[[515.0, 867.0], [599.0, 867.0], [599.0, 892.0], [515.0, 892.0]], ('23.60', 0.9999335408210754)],
[[[37.0, 901.0], [108.0, 901.0], [108.0, 930.0], [37.0, 930.0]], ('John', 0.9990780353546143)],
[
[[73.0, 962.0], [569.0, 965.0], [569.0, 991.0], [73.0, 989.0]],
('Bring this bill back within the next 10 days', 0.9880536794662476)
],
[
[[50.0, 1000.0], [591.0, 1000.0], [591.0, 1023.0], [50.0, 1023.0]],
("and get 15% discount on that day's food bill..", 0.9851430654525757)
]
]
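Each entry in the list above is a pair of a four-point bounding box and a `(text, score)` tuple. To make that structure concrete, here is a minimal sketch of the kind of hand-written parsing step the LLM is meant to replace: it groups boxes into visual rows by their vertical center. The function name `group_into_rows` and the `y_tolerance` value are assumptions for illustration, not part of the notebooks.

```python
def group_into_rows(ocr_result, y_tolerance=12.0):
    """Group PaddleOCR entries ([box, (text, score)]) into left-to-right rows."""
    entries = []
    for box, (text, _score) in ocr_result:
        y_center = sum(pt[1] for pt in box) / len(box)
        x_left = min(pt[0] for pt in box)
        entries.append((y_center, x_left, text))
    entries.sort()  # top to bottom, then left to right

    rows = []
    for y_center, x_left, text in entries:
        # Same row if the vertical center is close to the row's first box.
        if rows and abs(y_center - rows[-1]["y"]) <= y_tolerance:
            rows[-1]["cells"].append((x_left, text))
        else:
            rows.append({"y": y_center, "cells": [(x_left, text)]})
    return [[text for _, text in sorted(row["cells"])] for row in rows]

# Four boxes copied from the OCR result above.
sample = [
    [[[67.0, 346.0], [291.0, 349.0], [291.0, 376.0], [67.0, 374.0]], ('2 Carlsberg Bottle', 0.9987956285476685)],
    [[[515.0, 346.0], [599.0, 346.0], [599.0, 372.0], [515.0, 372.0]], ('16.00', 0.999927818775177)],
    [[[154.0, 664.0], [288.0, 664.0], [288.0, 688.0], [154.0, 688.0]], ('Subtotal', 0.9990756511688232)],
    [[[499.0, 664.0], [599.0, 664.0], [599.0, 688.0], [499.0, 688.0]], ('327.30', 0.9999766945838928)],
]
rows = group_into_rows(sample)
print(rows)
```

A rule like this works for clean, axis-aligned receipts but needs tuning for every new layout, which is the motivation for letting a fine-tuned model do the mapping instead.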
Now that we have the text boxes, let's load a fine-tuned model that can take this input and convert it into a JSON object.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig, BitsAndBytesConfig
# quantization_config = BitsAndBytesConfig(llm_int8_enable_fp32_cpu_offload=True)
bnb_config = BitsAndBytesConfig(
    llm_int8_enable_fp32_cpu_offload=True,
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
# control model memory allocation between devices for low GPU resource (0,cpu)
device_map = {
    "transformer.word_embeddings": 0,
    "transformer.word_embeddings_layernorm": 0,
    "lm_head": 0,
    "transformer.h": 0,
    "transformer.ln_f": 0,
    "model.embed_tokens": 0,
    "model.layers":0,
    "model.norm":0
}
device = "cuda" if torch.cuda.is_available() else "cpu"
# model use for inference
model_id="mychen76/mistral7b_ocr_to_json_v1"
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype=torch.float16,
    quantization_config=bnb_config,
    device_map=device_map)
# tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
Then run inference by constructing a prompt.
prompt=f"""### Instruction:
You are POS receipt data expert, parse, detect, recognize and convert following receipt OCR image result into structure receipt data object.
Don't make up value not in the Input. Output must be a well-formed JSON object.```json
### Input:
{receipt_boxes}
### Output:
"""
with torch.inference_mode():
    inputs = tokenizer(prompt,return_tensors="pt",truncation=True).to(device)
    outputs = model.generate(**inputs, max_new_tokens=512) ##use_cache=True, do_sample=True,temperature=0.1, top_p=0.95
    result_text = tokenizer.batch_decode(outputs)[0]
    print(result_text)
The model output:
{
'receipt': {
'store': 'The Lone Pine',
'address': '43 Manchester Road',
'city': 'Brisbane',
'country': 'Australia',
'phone': '617-3236-6207',
'invoice': 'Invoice 08000008',
'date': '09/04/08',
'table': 'Table',
'items': [
{'name': 'Carlsberg Bottle', 'price': '16.00', 'quantity': '2'},
{'name': 'Heineken Draft Standard.', 'price': '15.20', 'quantity': '1'},
{'name': 'Heineken Draft Half Liter.', 'price': '15.20', 'quantity': '1'},
{'name': 'Carlsberg Bucket (5 bottles).', 'price': '80.00', 'quantity': '1'},
{'name': 'Grilled Chicken Breast.', 'price': '74.00', 'quantity': '1'},
{'name': 'Sirloin Steak', 'price': '96.00', 'quantity': '1'},
{'name': 'Coke', 'price': '3.50', 'quantity': '1'},
{'name': 'Ice Cream', 'price': '18.00', 'quantity': '5'}
],
'subtotal': '327.30',
'tax': '16.36',
'service_charge': '32.73',
'total': '400.00'
}
}
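So far, the LLM result looks good: a well-structured object. It is not perfect, though. Some values differ from the OCR text (for example, 'total' is 400.00, the amount next to 'Cash', whereas the grand total on the receipt is 376.40), which is in line with the accuracy reported above.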
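Because `generate` returns the prompt followed by the completion, and the object as printed above uses single quotes rather than strict JSON, some post-processing is useful before storing the result. The sketch below is one possible approach, not the notebook's code: `extract_object` and `items_match_subtotal` are hypothetical helpers, and treating 'price' as the line amount is an assumption based on the receipt shown.

```python
import json
from ast import literal_eval
from decimal import Decimal

def extract_object(generated_text, marker="### Output:"):
    """Parse the object that follows the last `marker` in the decoded text."""
    tail = generated_text.rsplit(marker, 1)[-1]
    start, end = tail.find("{"), tail.rfind("}")
    if start == -1 or end == -1:
        raise ValueError("no object found in model output")
    candidate = tail[start:end + 1]
    try:
        return json.loads(candidate)      # strict JSON (double quotes)
    except json.JSONDecodeError:
        return literal_eval(candidate)    # Python-style literal (single quotes)

def items_match_subtotal(receipt):
    """Sum item prices (assumed to be line amounts) and compare with the subtotal."""
    total = sum(Decimal(item["price"]) for item in receipt["items"])
    return total, total == Decimal(receipt["subtotal"])

# Decoded text: a shortened prompt plus the items and subtotal from the output above.
decoded = """### Instruction:
(prompt omitted)
### Output:
{'receipt': {'items': [
 {'name': 'Carlsberg Bottle', 'price': '16.00', 'quantity': '2'},
 {'name': 'Heineken Draft Standard.', 'price': '15.20', 'quantity': '1'},
 {'name': 'Heineken Draft Half Liter.', 'price': '15.20', 'quantity': '1'},
 {'name': 'Carlsberg Bucket (5 bottles).', 'price': '80.00', 'quantity': '1'},
 {'name': 'Grilled Chicken Breast.', 'price': '74.00', 'quantity': '1'},
 {'name': 'Sirloin Steak', 'price': '96.00', 'quantity': '1'},
 {'name': 'Coke', 'price': '3.50', 'quantity': '1'},
 {'name': 'Ice Cream', 'price': '18.00', 'quantity': '5'}],
 'subtotal': '327.30'}}</s>"""

receipt = extract_object(decoded)["receipt"]
items_total, consistent = items_match_subtotal(receipt)
print(json.dumps(receipt["items"][0]))   # strict JSON, ready to store
print(items_total, consistent)
```

A check like this flags receipts where the extracted line items do not add up to the extracted subtotal, so they can be reviewed instead of stored silently.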
Experiment-2: IMAGE → LLM → XML or JSON object
Let's dive into the details. Here the conversion is simpler: the model takes only an input image.
Get some test data first (a sample invoice image):
from datasets import load_dataset
from IPython.display import display  # built in to notebooks; imported explicitly here
dataset_id="mychen76/invoices-and-receipts_ocr_v1"
dataset = load_dataset(dataset_id)
def display_record(record):
    image1 = record['image']
    print("image size:", image1.size)
    display(image1.resize((800,600)))
    return image1
## pull a sample invoice image
input_image = display_record(dataset['test'][1])
Next, load the model and processor.
import torch
from transformers import DonutProcessor, VisionEncoderDecoderModel
model_id="mychen76/invoice-and-receipts_donut_v1"
def load_model(model_id=model_id):
    processor = DonutProcessor.from_pretrained(model_id)
    model = VisionEncoderDecoderModel.from_pretrained(model_id)
    return processor, model
processor, model = load_model(model_id)
Get the XML output.
def generateTextInImage(processor,model,input_image,task_prompt="<s_receipt>"):
    # convert the image into the pixel tensor expected by the encoder
    pixel_values = processor(input_image, return_tensors="pt").pixel_values
    print ("input pixel_values: ",pixel_values.shape)
    # the task prompt is the first decoder token; it selects the output schema
    decoder_input_ids = processor.tokenizer(task_prompt, add_special_tokens=False, return_tensors="pt")["input_ids"]
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model.to(device)
    outputs = model.generate(pixel_values.to(device),
        decoder_input_ids=decoder_input_ids.to(device),
        max_length=model.decoder.config.max_position_embeddings,
        early_stopping=True,
        pad_token_id=processor.tokenizer.pad_token_id,
        eos_token_id=processor.tokenizer.eos_token_id,
        use_cache=True,
        num_beams=1,
        bad_words_ids=[[processor.tokenizer.unk_token_id]],
        return_dict_in_generate=True,
        output_scores=True,)
    return outputs
def generateOutputXML(processor,model, input_image, task_start="<s_receipt>",task_end="</s_receipt>"):
    import re
    outputs=generateTextInImage(processor,model,input_image,task_prompt=task_start)
    sequence = processor.batch_decode(outputs.sequences)[0]
    # drop the end-of-sequence and padding tokens
    sequence = sequence.replace(processor.tokenizer.eos_token, "").replace(processor.tokenizer.pad_token, "")
    sequence = re.sub(r"<.*?>", "", sequence, count=1).strip() # remove first task start token
    return sequence
## generate xml
xml=generateOutputXML(processor,model,input_image)
print(xml)
<s_header>
<s_invoice_no> 13194726</s_invoice_no>
<s_invoice_date> 05/29/2021</s_invoice_date>
<s_seller> Hopkins and
Sons 62283 Flores Tunnel North Luis, IA 69983</s_seller>
<s_client> Sims PLC USS Kramer FPO AA
81651</s_client>
<s_seller_tax_id> 952-73-7223</s_seller_tax_id>
<s_client_tax_id>
995-88-9495</s_client_tax_id>
<s_iban> GB31LZX520242755934691</s_iban>
</s_header>
<s_items>
<s_item_desc> Beach Lunch
Lounge Striped Shirt Dress Large Navy Blue White Long Sleeve Casual</s_item_desc>
<s_item_qty>
1,00</s_item_qty>
<s_item_net_price> 16,99</s_item_net_price>
<s_item_net_worth> 16,99</s_item_net_worth>
<s_item_vat>
10%</s_item_vat>
<s_item_gross_worth> 18,69</s_item_gross_worth>
<sep/>
<s_item_desc> Jams World Hawaiian 0 Dress
Rayon SZ.L</s_item_desc>
<s_item_qty> 5,00</s_item_qty>
<s_item_net_price> 65,00</s_item_net_price>
<s_item_net_worth>
325,00</s_item_net_worth>
<s_item_vat> 10%</s_item_vat>
<s_item_gross_worth>
357,50</s_item_gross_worth>
<sep/>
<s_item_desc> LuLaRoe Nicole Dress Size Large 26</s_item_desc>
<s_item_qty>
2,00</s_item_qty>
<s_item_net_price> 1,99</s_item_net_price>
<s_item_net_worth> 3,98</s_item_net_worth>
<s_item_vat>
10%</s_item_vat>
<s_item_gross_worth> 4,38</s_item_gross_worth>
<sep/>
<s_item_desc> phynny Was Medium Linen Wrap
Dress Dessert Rose Embroidered Bohemian</s_item_desc>
<s_item_qty> 2,00</s_item_qty>
<s_item_net_price>
89,99</s_item_net_price>
<s_item_net_worth> 179,98</s_item_net_worth>
<s_item_vat>
10%</s_item_vat>
<s_item_gross_worth> 197,98</s_item_gross_worth>
<sep/>
<s_item_desc> Eileen Fisher Women's Long
Sleeve Fleece Lined Front Pockets Dress XS Gray</s_item_desc>
<s_item_qty> 2,00</s_item_qty>
<s_item_net_price>
15,99</s_item_net_price>
<s_item_net_worth> 31,98</s_item_net_worth>
<s_item_vat>
10%</s_item_vat>
<s_item_gross_worth> 35,18</s_item_gross_worth>
<sep/>
<s_item_desc> Hanna Anderson Women's L Large
Coral Short Sleeve Casual Fall Tee Shirt Dress</s_item_desc>
<s_item_qty> 1,00</s_item_qty>
<s_item_net_price>
24,00</s_item_net_price>
<s_item_net_worth> 24,00</s_item_net_worth>
<s_item_vat>
10%</s_item_vat>
<s_item_gross_worth> 26,40</s_item_gross_worth>
</s_items>
<s_summary>
<s_total_net_worth>
$581,93</s_total_net_worth>
<s_total_vat> $58,19</s_total_vat>
<s_total_gross_worth> $
640,12</s_total_gross_worth>
</s_summary>
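The tagged sequence above maps naturally onto a nested object: each `<s_key>...</s_key>` pair becomes a field, and `<sep/>` separates repeated entries. The Donut processor in transformers provides `token2json` for this; the standalone function below is only a simplified sketch of the idea, with the hypothetical name `tags_to_dict`, and it assumes well-formed tags with no key nested inside itself.

```python
import re

OPEN_TAG = re.compile(r"<s_(\w+)>")

def tags_to_dict(sequence):
    """Convert a <s_key>value</s_key> sequence into nested dicts and lists."""
    result = {}
    pos = 0
    while True:
        match = OPEN_TAG.search(sequence, pos)
        if match is None:
            break
        key = match.group(1)
        close_tag = f"</s_{key}>"
        end = sequence.find(close_tag, match.end())
        if end == -1:
            break  # unterminated tag: stop rather than guess
        inner = sequence[match.end():end]
        if "<s_" in inner:
            # Nested fields; <sep/> splits repeated groups such as line items.
            parts = [tags_to_dict(part) for part in inner.split("<sep/>")]
            result[key] = parts if len(parts) > 1 else parts[0]
        else:
            # Leaf value: collapse line breaks and repeated spaces.
            result[key] = " ".join(inner.split())
        pos = end + len(close_tag)
    return result

# A shortened sequence in the same shape as the model output above.
sample_xml = (
    "<s_header><s_invoice_no> 13194726</s_invoice_no></s_header>"
    "<s_items><s_item_desc> Beach Lunch\nLounge</s_item_desc>"
    "<s_item_qty>\n1,00</s_item_qty><sep/>"
    "<s_item_desc> LuLaRoe Nicole Dress</s_item_desc>"
    "<s_item_qty> 2,00</s_item_qty></s_items>"
    "<s_summary><s_total_gross_worth> $\n640,12</s_total_gross_worth></s_summary>"
)
parsed = tags_to_dict(sample_xml)
print(parsed)
```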
The same result in JSON format:
{
'header': {
'invoice_no': '13194726',
'invoice_date': '05/29/2021',
'seller': 'Hopkins and Sons 62283 Flores Tunnel North Luis, IA 69983',
'client': 'Sims PLC USS Kramer FPO AA 81651',
'seller_tax_id': '952-73-7223',
'client_tax_id': '995-88-9495',
'iban': 'GB31LZX520242755934691'
},
'items': [
{
'item_desc': 'Beach Lunch Lounge Striped Shirt Dress Large Navy Blue White Long Sleeve Casual',
'item_qty': '1,00',
'item_net_price': '16,99',
'item_net_worth': '16,99',
'item_vat': '10%',
'item_gross_worth': '18,69'
},
{
'item_desc': 'Jams World Hawaiian 0 Dress Rayon SZ.L',
'item_qty': '5,00',
'item_net_price': '65,00',
'item_net_worth': '325,00',
'item_vat': '10%',
'item_gross_worth': '357,50'
},
{
'item_desc': 'LuLaRoe Nicole Dress Size Large 26',
'item_qty': '2,00',
'item_net_price': '1,99',
'item_net_worth': '3,98',
'item_vat': '10%',
'item_gross_worth': '4,38'
},
{
'item_desc': 'phynny Was Medium Linen Wrap Dress Dessert Rose Embroidered Bohemian',
'item_qty': '2,00',
'item_net_price': '89,99',
'item_net_worth': '179,98',
'item_vat': '10%',
'item_gross_worth': '197,98'
},
{
'item_desc': "Eileen Fisher Women's Long Sleeve Fleece Lined Front Pockets Dress XS Gray",
'item_qty': '2,00',
'item_net_price': '15,99',
'item_net_worth': '31,98',
'item_vat': '10%',
'item_gross_worth': '35,18'
},
{
'item_desc': "Hanna Anderson Women's L Large Coral Short Sleeve Casual Fall Tee Shirt Dress",
'item_qty': '1,00',
'item_net_price': '24,00',
'item_net_worth': '24,00',
'item_vat': '10%',
'item_gross_worth': '26,40'
}
],
'summary': {'total_net_worth': '$581,93', 'total_vat': '$58,19', 'total_gross_worth': '$ 640,12'}
}
Nice: the result is well-structured data in both XML and JSON.
Summary
Overall, the experiment was a success: a fine-tuned LLM clearly shows good results on text-to-text and image-to-text conversion tasks.
I hope you enjoyed reading this and learned something new, as much as I enjoyed doing the fine-tuning and running these experiments.
Have a nice day!