Automated Document Processing with GenAI

Author: Fausto Albers

Date: 14 June 2024

Subject: Technical Deep Dive on the ADP POC presented at INGenious and Datastax AI Event on 13th June 2024, in Amsterdam

slidedeck ING - Fausto ALbers presentations - Excalidraw+

Predictable Outcomes with Probabilistic Systems from Unstructured Data - Automation for SMB

Small and medium-sized enterprises (SMBs) often encounter significant challenges in automation. This is particularly true for the Hospitality Industry, where the volume and diversity of documents can be extensive. At a recent tech conference by ING and Datastax, I presented an innovative solution that leverages Generative AI (GenAI) to streamline and enhance document processing. This blog post delves into the technical details of this tool, highlighting its architecture, components, and the potential impact it can have on SMBs.

Busy-Automation paradox

The Challenge: Complex Document Processing

Documents come in various formats, from neatly typed invoices to scanned handwritten notes. Extracting meaningful data from these documents and feeding it to Large Language Models (LLMs) for further processing is a complex task. The accuracy of LLMs is highly dependent on the quality of the input data, making preprocessing a crucial step. This is where the LLM Whisperer API comes into play.

LLM Whisperer API: Making Data LLM-Ready

The LLM Whisperer API, integrated with Unstract, is designed to handle the complexities of document data. It converts diverse document formats into LLM-friendly outputs by preserving the original layout and extracting key data accurately. The API can automatically switch to OCR mode for non-text documents, ensuring comprehensive data capture.

One of the standout features of LLM Whisperer is its layout-preserving mode, which ensures high accuracy by maintaining the structure of complex documents. This is particularly useful for documents with repeating sections or detailed line items, common in invoices and receipts. Additionally, the API offers auto-compaction, which reduces the number of tokens processed, thereby optimizing both time and cost.

The Zen of Instructor: Guided AI Responses

To further enhance the accuracy and reliability of the document processing tool, we integrated the Instructor Library. This library provides structured prompting and robust validation mechanisms, ensuring that the AI’s output adheres to predefined formats and standards. By leveraging Pydantic and OpenAI Function Calling for data validation, Instructor helps building guardrails, so that only correctly structured data is returned, in valid JSON.

The Instructor Library’s guided response framework allows for the creation of detailed and accurate prompts. This structured approach helps in maintaining consistency and accuracy, making the tool reliable across different applications and use cases.

Table of Contents

Predictable Outcomes with Probabilistic Systems from Unstructured Data - Automation for SMB

The Challenge: Complex Document Processing

LLM Whisperer API: Making Data LLM-Ready

The Zen of Instructor: Guided AI Responses

Langsmith: Observability in LLM Applications