Extract Data from Invoices (Beta)

The Invoice solution extracts key information, such as invoice number, invoice date, due date, and total amount, from any invoice image or PDF. In the real world, the same concept appears under many keyword variations. For instance, “invoice #”, “invoice number”, and “INV No.” all refer to the same thing. Our solution uses machine learning models to understand the document context, consolidates the key variations into a single entity (for example, invoice_number), and extracts the correct values accordingly.

The Einstein OCR invoice solution currently supports the entities listed below.

Entity Name         Key Variation Examples
invoice_number      invoice number, invoice #, invoice, invoice ID, ...
invoice_date        invoice date, date, ...
due_date            due date, due on, ...
purchase_order      PO number, PO#, purchase order, ...
total_amount        total, total amount, ...
total_tax_amount    tax, total tax, ...
amount_due          due, amount due, ...
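
To make the idea of consolidating key variations concrete, here is a minimal sketch that maps the example labels from the table to their entity names with a plain lookup table. This is not how the service works: the service uses machine learning models and document context, whereas the lookup here is a deliberately simple stand-in. The names KEY_VARIATIONS, normalize_label, and entity_for are hypothetical and exist only for this illustration.

```python
import re

# Example key variations copied from the table; the real service
# recognizes more variations than these.
KEY_VARIATIONS = {
    "invoice_number": ["invoice number", "invoice #", "invoice", "invoice ID"],
    "invoice_date": ["invoice date", "date"],
    "due_date": ["due date", "due on"],
    "purchase_order": ["PO number", "PO#", "purchase order"],
    "total_amount": ["total", "total amount"],
    "total_tax_amount": ["tax", "total tax"],
    "amount_due": ["due", "amount due"],
}


def normalize_label(label: str) -> str:
    """Lowercase, drop trailing colons/periods, and collapse whitespace."""
    label = label.strip().lower().rstrip(":.")
    return re.sub(r"\s+", " ", label).strip()


# Reverse index: normalized variation -> entity name.
LOOKUP = {
    normalize_label(variation): entity
    for entity, variations in KEY_VARIATIONS.items()
    for variation in variations
}


def entity_for(label: str):
    """Return the entity name for a printed label, or None if unknown."""
    return LOOKUP.get(normalize_label(label))


print(entity_for("Invoice #:"))  # label listed in the table
print(entity_for("INV No."))     # label not listed in the table
```

A fixed table only recognizes the variations it was given, so a label such as “INV No.” is not matched here. That gap is the reason the solution relies on models that read the surrounding document context instead of a static list.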

When you call the API, send the invoice as an image or PDF, set task to invoice, and set modelId to tabulatev2. The JSON response contains entity-value pairs for the fields found in the invoice.
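
The call described above could be assembled roughly as follows. This is a sketch under stated assumptions, not a definitive client: the endpoint URL, the sampleLocation parameter (a publicly reachable URL of the invoice file), and the bearer-token header are assumptions that should be checked against the API reference. Only the task and modelId values come from this page. The helper names are hypothetical.

```python
import json
import urllib.request
import uuid

# Assumption: OCR endpoint; verify against the API reference.
OCR_URL = "https://api.einstein.ai/v2/vision/ocr"


def build_invoice_fields(sample_location: str) -> dict:
    """Form fields for an invoice request."""
    return {
        "sampleLocation": sample_location,  # assumption: URL of the image or PDF
        "task": "invoice",                  # from this page
        "modelId": "tabulatev2",            # from this page
    }


def encode_multipart(fields: dict):
    """Encode text fields as a multipart/form-data body."""
    boundary = uuid.uuid4().hex
    lines = []
    for name, value in fields.items():
        lines.append(f"--{boundary}")
        lines.append(f'Content-Disposition: form-data; name="{name}"')
        lines.append("")
        lines.append(value)
    lines.append(f"--{boundary}--")
    lines.append("")
    body = "\r\n".join(lines).encode("utf-8")
    return body, f"multipart/form-data; boundary={boundary}"


def build_request(token: str, sample_location: str) -> urllib.request.Request:
    """Build the POST request without sending it."""
    body, content_type = encode_multipart(build_invoice_fields(sample_location))
    headers = {
        "Authorization": f"Bearer {token}",  # assumption: bearer-token auth
        "Content-Type": content_type,
    }
    return urllib.request.Request(OCR_URL, data=body, headers=headers, method="POST")


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling send(build_request(token, invoice_url)) with a valid token and a reachable invoice URL would perform the request. Building and sending are kept separate so the request can be inspected before any network traffic occurs.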

For the example invoice, the extracted entity-value pairs are:

Entity            Value
invoice_number    940226
invoice_date      2/18/1994
due_date          3/20/1994
total_amount      852.8
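
Once the entity-value pairs are available, downstream code usually needs typed values rather than raw text. The sketch below converts the example values into dates and a decimal amount. It assumes the values arrive as strings and that dates use the month/day/year order seen in the example. Both are assumptions about the data, not documented guarantees, and the helper names are hypothetical.

```python
from datetime import datetime
from decimal import Decimal

# Example values from the table, assumed to arrive as strings.
extracted = {
    "invoice_number": "940226",
    "invoice_date": "2/18/1994",
    "due_date": "3/20/1994",
    "total_amount": "852.8",
}


def parse_us_date(text: str):
    """Parse a month/day/year date such as 2/18/1994 (assumed format)."""
    return datetime.strptime(text, "%m/%d/%Y").date()


def to_typed(pairs: dict) -> dict:
    """Convert known date and amount entities; leave other values as text."""
    typed = dict(pairs)
    for key in ("invoice_date", "due_date"):
        if key in typed:
            typed[key] = parse_us_date(typed[key])
    for key in ("total_amount", "total_tax_amount", "amount_due"):
        if key in typed:
            typed[key] = Decimal(typed[key])  # Decimal avoids float rounding
    return typed


invoice = to_typed(extracted)
days_until_due = (invoice["due_date"] - invoice["invoice_date"]).days
print(invoice["total_amount"], days_until_due)  # prints: 852.8 30
```

The invoice number stays a string on purpose, since identifiers can carry leading zeros or letters that a numeric type would lose.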