Extract Data from Custom Forms (Beta)

Extract Entity Value Pairs

The custom form solution lets you specify which information to extract from your custom forms. You define the expected entity names in a form template in JSON format and, for each entity, provide the key variations you know of. The list of key variations does not need to be complete, but the more variations you provide, the better the ML model can understand the concept of the entity. For instance, to extract the date of birth from forms, define an entity named date_of_birth and provide its key variations in a list such as ["date of birth", "birthday", "DOB", "birth date"].
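As a minimal sketch of how one entity and its key variations might be assembled in code, the helper below builds a single entry in the shape used by the form template example in this document. The function name make_field is hypothetical and not part of the product. Trimming whitespace and dropping exact duplicates are assumptions made here for tidiness, not documented requirements, and entity_type is left empty as in the example.

```python
import json


def make_field(entity: str, variations: list[str]) -> dict:
    """Build one "Fields" entry (hypothetical helper, not part of the API)."""
    seen = set()
    texts = []
    for variation in variations:
        cleaned = variation.strip()
        # Skip blank strings and exact duplicates, keeping first-seen order.
        if cleaned and cleaned not in seen:
            seen.add(cleaned)
            texts.append(cleaned)
    if not texts:
        raise ValueError(f"entity {entity!r} needs at least one key variation")
    # "entity_type" is left empty, mirroring the template example.
    return {"key": {"entity": entity, "entity_type": [], "text": texts}}


field = make_field(
    "date_of_birth",
    ["date of birth", "birthday", "DOB", "birth date", "DOB "],
)
print(json.dumps(field, indent=4))
```

The trailing "DOB " in the input is trimmed and then dropped as a duplicate, so the resulting list holds four variations.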

Extract Tables

The custom form solution automatically detects all tables, recognizes the text in each cell, and infers the location, row number, and column number of each table cell. Set AutomaticallyRecognizeTables to true in the form template to enable this function.

Here's an example of a form template that extracts the date of birth and last name and detects tables:

{
    "Fields":
    [
        {
            "key":
            {
                "entity": "date_of_birth",
                "entity_type": [],
                "text":
                [
                    "date of birth",
                    "birthday",
                    "DOB",
                    "birth date"
                ]
            }
        },
        {
            "key":
            {
                "entity": "last_name",
                "entity_type": [],
                "text":
                [
                    "last name",
                    "family name",
                    "surname"
                ]
            }
        }
    ],
    "Tables": [],
    "AutomaticallyRecognizeTables": true,
    "Version": "3.0"
}
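Before uploading a template, it can help to run a quick local sanity check. The sketch below is not an official schema: the function name validate_template is hypothetical, and the rules it applies are inferred only from the example above (a "Fields" list, an entity name and a non-empty "text" list per field, and an unquoted boolean for AutomaticallyRecognizeTables). The service may accept templates that this check flags.

```python
import json


def validate_template(template: dict) -> list[str]:
    """Return a list of likely problems (rules inferred from the example only)."""
    problems = []
    fields = template.get("Fields")
    if not isinstance(fields, list):
        problems.append('"Fields" should be a list')
        fields = []
    for i, field in enumerate(fields):
        key = field.get("key") if isinstance(field, dict) else None
        if not isinstance(key, dict):
            problems.append(f'Fields[{i}]: missing "key" object')
            continue
        entity = key.get("entity")
        if not isinstance(entity, str) or not entity:
            problems.append(f"Fields[{i}]: missing entity name")
        texts = key.get("text")
        if not isinstance(texts, list) or not texts:
            problems.append(f'Fields[{i}]: no key variations in "text"')
    # Assumption: the flag is a JSON boolean, as in the example, not "true".
    if not isinstance(template.get("AutomaticallyRecognizeTables"), bool):
        problems.append('"AutomaticallyRecognizeTables" should be true or false')
    return problems


template = json.loads("""
{
    "Fields": [
        {"key": {"entity": "last_name", "entity_type": [],
                 "text": ["last name", "family name", "surname"]}}
    ],
    "Tables": [],
    "AutomaticallyRecognizeTables": true,
    "Version": "3.0"
}
""")
problems = validate_template(template)
print(problems or "no problems found")
```

Returning a list of messages rather than raising on the first problem lets you fix everything in one pass.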

When you call the API, send the form as an image or PDF, set task to form, set formTemplate to the path of the form template JSON file, and set modelId to tabulatev2. The JSON response contains entity-value pairs for each field in the form.
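The parameter values named above can be collected in one place before making the call. The following is only a sketch: build_request_params is a hypothetical helper, and how the parameters and the form file are actually transmitted (endpoint, client library, or command line) is not covered here, so the code stops short of sending anything. It also checks that the template file parses as JSON, which is an extra precaution assumed here rather than a documented step.

```python
import json
import tempfile
from pathlib import Path


def build_request_params(template_path: str) -> dict:
    """Collect the documented parameter values (hypothetical helper)."""
    # Fail early if the template file is missing or is not valid JSON.
    json.loads(Path(template_path).read_text(encoding="utf-8"))
    return {
        "task": "form",
        "formTemplate": template_path,
        "modelId": "tabulatev2",
    }


# Demonstration with a throwaway template file in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    template_file = Path(tmp) / "form_template.json"
    template_file.write_text(
        json.dumps({"Fields": [], "Tables": [],
                    "AutomaticallyRecognizeTables": True, "Version": "3.0"}),
        encoding="utf-8",
    )
    params = build_request_params(str(template_file))

print(params["task"], params["modelId"])
```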