Detect Text

Returns a prediction from an OCR model for an image or PDF that you specify by URL or upload as a local file.

Request Parameters

Name

Type

Description

Available Version

formType

string

Standard form sent to the model. Use this parameter when the task parameter value is form. Valid values:
1040—United States federal income tax return, only 2019.
dl—Driver’s license. Supports driver’s licenses only from the United States, including Washington, DC.
passport—Supports passports only from Australia, Canada, United Kingdom, and United States.
paystub—Employee pay record.
permanentResident—Permanent resident card. Supports cards only from the United States, including Washington, DC.
w2—United States tax form that reports employee wages and taxes.

See Detect Text in Standard Forms.

2.0

formTemplate (Beta)

string

Path to the form template (in JSON format) for the input custom form. The form template defines the entities to extract and their key variations. See examples at Extract Data From Custom Forms. When task=form, you must specify either formType or formTemplate.

modelId

string

ID of the model that makes the prediction.

  • OCRModel—Use this model for:
    • Business cards. Pass a task parameter value of contact.
  • tabulatev2—Use this model for:
    • Unformatted text. Pass a task parameter value of text.
    • Text in tables. Pass a task parameter value of table.
    • Standard forms. Pass a task parameter value of form.
    • Custom forms (Beta). Pass a task parameter value of form.
    • Invoices (Beta). Pass a task parameter value of invoice.
    • Typed and handwritten text recognition (Beta).

2.0

sampleContent

string

Binary content of image or PDF file uploaded as multipart/form-data.

2.0

sampleLocation

string

URL of the image or PDF file. Use this parameter when sending in a file from a web location. The URL must be a direct link to the file.

2.0

sampleId

string

String that you can pass in to tag the prediction. Optional. Can be any value, and is returned in the response.

2.0

task

string

Optional. Designates the type of data in the image. Default is text. Valid values are:

  • contact—Image or PDF is a business card. Returns the entity for the detected text. For example, if the detected text is +16505551234, the tag contains the value PHONE.
  • form—Image or PDF is one of the supported standard or custom forms (Beta). Use this parameter with the formType parameter to specify the standard form. Use this parameter with the formTemplate parameter to process the custom form (Beta).
  • invoice—Image or PDF is an invoice. Returns the extracted entity-value pairs and tables. See examples at Extract Data from Invoices (Beta).
  • table—Image or PDF contains a table. Returns the row and column of the detected text.
  • text—Image or PDF contains unformatted text.

2.0
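The way the request parameters fit together can be sketched in Python. This is a minimal illustration rather than an official client: the helper name build_ocr_fields and the example.com URL are hypothetical, and the sketch only assembles and validates the form fields, so it makes no assumption about the endpoint or authentication.

```python
from typing import Dict, Optional

# Task values listed in the task parameter description.
VALID_TASKS = {"contact", "form", "invoice", "table", "text"}


def build_ocr_fields(
    model_id: str,
    sample_location: str,
    task: str = "text",
    form_type: Optional[str] = None,
    form_template: Optional[str] = None,
    sample_id: Optional[str] = None,
) -> Dict[str, str]:
    """Assemble the form fields for a request that sends a file by URL."""
    if task not in VALID_TASKS:
        raise ValueError(f"unsupported task: {task!r}")
    # When task=form, either formType or formTemplate must be specified.
    if task == "form" and not (form_type or form_template):
        raise ValueError("task=form needs formType or formTemplate")

    fields = {"modelId": model_id, "sampleLocation": sample_location, "task": task}
    if form_type:
        fields["formType"] = form_type
    if form_template:
        fields["formTemplate"] = form_template
    if sample_id:
        fields["sampleId"] = sample_id
    return fields


# Hypothetical request for a United States driver's license.
fields = build_ocr_fields(
    "tabulatev2", "https://example.com/license.png", task="form", form_type="dl"
)
print(fields)
```

The returned dictionary can be passed as multipart/form-data fields by whichever HTTP client you use.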

Form Template Content (Beta)

Name

Type

Required

Description

Fields

array of Key objects

Yes

Array of Key objects (see the Key Object table)

Table

array of TableCell objects

No

Array of TableCell objects. Leave it as an empty array if no header information is specified.

AutomaticallyRecognizeTables

boolean

Yes

Choose from [true, false]. If true, automatically recognized tables are returned in the response.

Version

string

Yes

Choose from ["1.0", "2.0", "3.0"]

Key Object (Beta)

Name

Type

Required

Description

key

object

Yes

See the Content of Key Object table for descriptions.

Content of Key Object (Beta)

Name

Type

Required

Description

entity

string

Yes

The entity (field name) of this entity-value pair. Each entity must be a unique identifier within this form template. It can be any UTF-8 string.

text

array

Yes

A non-empty array of the key variations of the entity. If it is a virtual key, choose from ["person", "phone", "email", "address", "website", "org", "datetime"].

entity_type

array

No

The Salesforce form field type of this entity.

Leave it as an empty array if not specified. If empty, the default entity_type TEXT will be used.

There is no entity_type for virtual keys because the data format will be validated using the text information.
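A form template that follows the three tables above might be assembled as shown here. This is a sketch under stated assumptions: the helper names make_key and make_template, the entity names, and the key variations are all hypothetical, and the nesting (a Key object wrapping a key object) is read directly from the tables rather than from a published sample.

```python
import json


def make_key(entity, variations, entity_type=()):
    """Build one Key object: {"key": {"entity", "text", "entity_type"}}."""
    if not variations:
        raise ValueError("text must be a non-empty array of key variations")
    return {
        "key": {
            "entity": entity,
            "text": list(variations),
            # An empty array means the default entity_type TEXT is used.
            "entity_type": list(entity_type),
        }
    }


def make_template(keys, version="1.0", recognize_tables=True):
    """Build the top-level template and check that entities are unique."""
    entities = [k["key"]["entity"] for k in keys]
    if len(entities) != len(set(entities)):
        raise ValueError("each entity must be unique within the template")
    return {
        "Fields": keys,
        "Table": [],  # empty array: no header information specified
        "AutomaticallyRecognizeTables": recognize_tables,
        "Version": version,
    }


template = make_template([
    # Hypothetical entity with three key variations.
    make_key("policy_number", ["Policy No.", "Policy Number", "Policy #"]),
    # Hypothetical virtual key: text is chosen from the virtual key list.
    make_key("contact_phone", ["phone"]),
])
print(json.dumps(template, indent=2))
```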

Keep the following points in mind when sending a file in for prediction:

  • Orientation—The model handles slight image or PDF orientation changes but not above 30-40%. Accuracy is better for files in which the text has a straight vertical orientation.

  • Max File Size—The maximum image or PDF file size you can pass to this resource is 10 MB.

  • Max Number of Pages—The maximum number of pages in a PDF is based on the value of the task parameter.

    • contact—The maximum number of pages is five. There should be only one business card per page.
    • table—The maximum number of pages is eight. The model can process multiple tables on a page, but a table that spans multiple pages is identified as separate tables. For example, if you have a table that spans pages one and two, the model returns results for two tables.
    • text—The maximum number of pages is eight.
    • form—The maximum number of pages is eight.
  • File Types—The supported file types are PNG, JPG, JPEG, and PDF.

  • Tables—The model can process multiple tables on a page, but a table that spans multiple pages is identified as separate tables. For example, if you have a table that spans pages one and two, the model returns results for two tables.

  • Response Sort Order—The detected strings returned in the response are sorted by probability.

  • Max Words Returned Per Image—When you send in an image file, the model returns a maximum of 600 words per image.

  • Supported Languages—Einstein OCR supports English only. The characters supported are:

!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~£≈
  • Checkboxes—The model currently doesn't support checkboxes in any text.
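Several of the limits above can be checked locally before a file is sent. The following is a rough sketch, not part of the API: the function name preflight is hypothetical, 10 MB is assumed to mean 10 × 1024 × 1024 bytes, and the page count is supplied by the caller because counting PDF pages needs a PDF library. No page limit is listed for the invoice task, so the sketch doesn't check one.

```python
import os

MAX_FILE_BYTES = 10 * 1024 * 1024  # assumption: 10 MB counted in binary units
SUPPORTED_EXTENSIONS = {".png", ".jpg", ".jpeg", ".pdf"}
# Page limits per task value, as listed above.
MAX_PDF_PAGES = {"contact": 5, "table": 8, "text": 8, "form": 8}


def preflight(filename, size_bytes, task="text", pdf_pages=None):
    """Return a list of problems found; an empty list means no issue detected."""
    problems = []
    ext = os.path.splitext(filename)[1].lower()
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported file type: {ext or '(none)'}")
    if size_bytes > MAX_FILE_BYTES:
        problems.append("file is larger than 10 MB")
    limit = MAX_PDF_PAGES.get(task)
    if ext == ".pdf" and pdf_pages is not None and limit is not None:
        if pdf_pages > limit:
            problems.append(
                f"{pdf_pages} pages exceeds the {limit}-page limit for task={task}"
            )
    return problems


print(preflight("cards.pdf", 250_000, task="contact", pdf_pages=6))
```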

Response Body

Name

Type

Description

Available Version

object

string

Object returned; in this case, predictresponse.

2.0

probabilities

array

Array of probabilities for the prediction.

2.0

sampleId

string

Same value as request parameter. Returned only if the sampleId request parameter is provided.

2.0

task

string

Same value as request parameter. Returns text if the request parameter isn't supplied.

2.0

Probabilities Response Body

Name

Type

Description

Available Version

attributes

object

Contains additional attributes related to the task parameter. If the task parameter is table, the row and column IDs for the detected text are returned. If the task parameter is contact, the detected entity tags are returned. If the task parameter is form, data for the form key and the form value are returned.

2.0

boundingBox

object

Contains the coordinates for the bounding box that encloses the detected text.

2.0

label

string

Content of the detected text when task is text or table. The label is “key-value” or “table” when the task is form, which indicates the type of this block.

2.0

probability

float

Probability value for the input. Values range from 0 to 1.

2.0

BoundingBox Response Body

Name

Type

Description

Available Version

maxX

int

X-coordinate of the right side of the bounding box. Number of pixels from the left edge of the image.

2.0

maxY

int

Y-coordinate of the bottom of the bounding box. Number of pixels from the top edge of the image.

2.0

minX

int

X-coordinate of the left side of the bounding box. The origin of the coordinate system is the top-left of the image. Number of pixels from the left edge of the image.

2.0

minY

int

Y-coordinate of the top of the bounding box. Number of pixels from the top edge of the image.

2.0
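Because the coordinates are pixel offsets from the top-left of the image, the size of a detected region follows from simple subtraction. The sketch below uses made-up sample data shaped like the probabilities array; the function names are hypothetical. Since the response is sorted by probability, it also shows one possible way to approximate reading order by sorting on the top-left corner.

```python
def box_size(box):
    """Return (width, height) in pixels for a boundingBox object."""
    return box["maxX"] - box["minX"], box["maxY"] - box["minY"]


def reading_order(probabilities):
    """Sort detections top to bottom, then left to right, by top-left corner."""
    return sorted(
        probabilities,
        key=lambda p: (p["boundingBox"]["minY"], p["boundingBox"]["minX"]),
    )


# Hypothetical detections, in the probability order the response would use.
sample = [
    {"label": "Total", "probability": 0.99,
     "boundingBox": {"minX": 40, "minY": 200, "maxX": 120, "maxY": 230}},
    {"label": "Invoice", "probability": 0.97,
     "boundingBox": {"minX": 40, "minY": 20, "maxX": 180, "maxY": 60}},
]

for item in reading_order(sample):
    print(item["label"], box_size(item["boundingBox"]))
```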

Attributes Response Body

Returned when label is “key-value”.

Name

Type

Description

Available Version

blockId

int

Unique ID for the key-value pair. Returned only when the task parameter value is form.

2.0

key

object

Contains the detected key text, which is the text printed on the form itself. For example, in a driver's license, the key might be 4a ISS for issue date. Returned only when the task parameter value is form.

2.0

language

string

Language of the key and value. Defaults to English. Only English is currently supported. Returned only when the task parameter value is form.

2.0

pageNumber

int

Page that contains the identified text. The model always returns 1, except when you send in a multi-page PDF.

2.0

value

object

Contains the detected text of the data that was entered in the form field. For example, in a driver's license, the value might be 09/13/1999 for issue date. Returned only when the task parameter value is form.

2.0

normalizedText

string

Optional. Normalized representation of the key/value string.

For example, a raw text value of

123 Main Street, Suite 54, San Francisco, CA, 94101, U.S.A.

will be parsed as

{"Street":"123 Main Street Suite 54","City":"San Francisco","State":"CA","ZipCode":"94101","Country":"U.S.A"}

Empty fields are excluded from the compound JSON response.

Note: Currently, only U.S. addresses are normalized.

2.0
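Since normalizedText is a JSON document carried inside a string, it needs a second parsing step after the response itself is decoded. A small sketch, using the address example above; the function name parse_address is hypothetical. Empty fields are excluded from the compound JSON, so the code fills in missing parts with empty strings.

```python
import json

ADDRESS_PARTS = ("Street", "City", "State", "ZipCode", "Country")

# The normalizedText value from the example above, as a JSON string.
normalized_text = (
    '{"Street":"123 Main Street Suite 54","City":"San Francisco",'
    '"State":"CA","ZipCode":"94101","Country":"U.S.A"}'
)


def parse_address(text):
    """Decode normalizedText and default any excluded field to ''."""
    fields = json.loads(text)
    return {part: fields.get(part, "") for part in ADDRESS_PARTS}


address = parse_address(normalized_text)
print(address["City"], address["ZipCode"])
```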

Attributes cellLocation Response Body

Returned when you pass a task parameter value of table.

Name

Type

Description

Available Version

colIndex

int

Index of the column that contains the detected text.

2.0

rowIndex

int

Index of the row that contains the detected text.

2.0
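The row and column indexes make it possible to rebuild a table from the flat list of detections. The sketch below assumes that each element of probabilities carries its indexes under attributes.cellLocation, which is inferred from the table heading rather than from a published sample; the function name and the data are hypothetical.

```python
from collections import defaultdict


def rows_from_predictions(probabilities):
    """Group detected text into rows, ordered by rowIndex then colIndex."""
    cells = defaultdict(dict)
    for item in probabilities:
        # Assumed nesting: attributes -> cellLocation -> rowIndex / colIndex.
        location = item["attributes"]["cellLocation"]
        cells[location["rowIndex"]][location["colIndex"]] = item["label"]
    return [
        [columns[col] for col in sorted(columns)]
        for _, columns in sorted(cells.items())
    ]


# Hypothetical detections for a 2 x 2 table, deliberately out of order.
sample = [
    {"label": "12", "attributes": {"cellLocation": {"rowIndex": 2, "colIndex": 2}}},
    {"label": "Item", "attributes": {"cellLocation": {"rowIndex": 1, "colIndex": 1}}},
    {"label": "Widget", "attributes": {"cellLocation": {"rowIndex": 2, "colIndex": 1}}},
    {"label": "Qty", "attributes": {"cellLocation": {"rowIndex": 1, "colIndex": 2}}},
]

for row in rows_from_predictions(sample):
    print(row)
```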

Attributes Tag Response Body

Returned when you pass a task parameter value of contact.

Name

Type

Description

Available Version

tag

string

Entity that the model predicts for the detected text. Valid values:

  • ADDRESS
  • EMAIL
  • FAX
  • HOME_PHONE
  • MOBILE_PHONE
  • OFFICE_PHONE
  • ORG
  • OTHER
  • PERSON
  • PHONE
  • WEBSITE

2.0
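For business cards, the tag can be used to collect the detected strings into contact fields. This sketch assumes that the tag sits under attributes.tag and that label holds the detected text for the contact task; neither is confirmed by the tables above. The function name and sample values are hypothetical.

```python
from collections import defaultdict


def group_by_tag(probabilities):
    """Map each tag (PERSON, PHONE, and so on) to its detected strings."""
    grouped = defaultdict(list)
    for item in probabilities:
        # Assumed nesting: attributes -> tag; label is assumed to be the text.
        grouped[item["attributes"]["tag"]].append(item["label"])
    return dict(grouped)


# Hypothetical detections from one business card.
sample = [
    {"label": "Jane Doe", "attributes": {"tag": "PERSON"}},
    {"label": "+16505551234", "attributes": {"tag": "PHONE"}},
    {"label": "Example Corp", "attributes": {"tag": "ORG"}},
    {"label": "+16505556789", "attributes": {"tag": "PHONE"}},
]

contact = group_by_tag(sample)
print(contact)
```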

Attributes Key Response Body

Returned when you pass a task parameter value of form.

Name

Type

Description

Available Version

boundingBox

object

Contains the coordinates for the bounding box that encloses the key. If the key text doesn't exist, the bounding box is [1,1,1,1].

2.0

entity

string

For the key text, specifies the type of form field. For example, in a driver's license, the key text can be 4a iss. The OCR model returns an entity value of issue_date.

2.0

text

string

Detected key text that's printed on the form. For example, in a driver's license, the key text could be 4a iss. This field isn't returned for custom forms.

2.0

Attributes Key boundingBox Response Body

The bounding box for the form key.

Name

Type

Description

Available Version

maxX

int

X-coordinate of the right side of the bounding box. Number of pixels from the left edge of the image.

2.0

maxY

int

Y-coordinate of the bottom of the bounding box. Number of pixels from the top edge of the image.

2.0

minX

int

X-coordinate of the left side of the bounding box. The origin of the coordinate system is the top-left of the image. Number of pixels from the left edge of the image.

2.0

minY

int

Y-coordinate of the top of the bounding box. Number of pixels from the top edge of the image.

2.0

Attributes Value Response Body

Returned when you pass a task parameter value of form.

Name

Type

Description

Available Version

boundingBox

object

Contains the coordinates for the bounding box that encloses the detected text value.

2.0

text

string

The data value for the specified key. For example, in a driver's license, if the key text is 4a iss, the value text might be 09/13/1999.

2.0
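Putting the key and value tables together, a form response can be reduced to a plain entity-to-value mapping. The sketch below is illustrative only: the function name is hypothetical, and the sample block is built from the driver's license example in this section, with blockId and other details invented for the demonstration.

```python
def form_fields(probabilities):
    """Return {entity: value text} for every key-value block."""
    result = {}
    for block in probabilities:
        # Blocks labeled "table" have a different attributes shape; skip them.
        if block.get("label") != "key-value":
            continue
        attributes = block["attributes"]
        result[attributes["key"]["entity"]] = attributes["value"]["text"]
    return result


# Hypothetical form response fragment for a driver's license.
sample = [
    {
        "label": "key-value",
        "attributes": {
            "blockId": 1,
            "key": {"entity": "issue_date", "text": "4a iss"},
            "value": {"text": "09/13/1999"},
        },
    },
    {"label": "table", "attributes": {"tableName": "table1", "tableCells": []}},
]

print(form_fields(sample))
```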

Attributes Value boundingBox Response Body

When label is “table”. The bounding box for the form value.

Name

Type

Description

Available Version

maxX

int

X-coordinate of the right side of the bounding box. Number of pixels from the left edge of the image.

2.0

maxY

int

Y-coordinate of the bottom of the bounding box. Number of pixels from the top edge of the image.

2.0

minX

int

X-coordinate of the left side of the bounding box. The origin of the coordinate system is the top-left of the image. Number of pixels from the left edge of the image.

2.0

minY

int

Y-coordinate of the top of the bounding box. Number of pixels from the top edge of the image.

2.0

Attributes Response Body

Returned when label is “table”.

Name

Type

Description

tableName

string

Name of the table. Must be unique across all tableNames. It matches the user-defined tableName if a mapping is found. Otherwise, it is an automatically generated unique string with the format "table{number}{UUID}".

tableId

string

Count assigned to each table in the response.

tableCells

array of tableCells objects

A list of tableCells objects (see the Attributes tableCells Object Response Body table).

pageNumber

string

Start page number of the table

blockId

int

ID of block. This ID is unique across the response and incrementally assigned after key value pairs.

Attributes tableCells Object Response Body

Name

Type

Description

boundingBox

object

Contains the coordinates for the bounding box that encloses the sub-label

text

string

The text of each element (cell) of the table

OCRProbability

float

OCR confidence of text in the cell

startCellLocation

cellLocation

See the description in Attributes tableCells Object cellLocation Response Body

endCellLocation

cellLocation

See the description in Attributes tableCells Object cellLocation Response Body

entity

string

Optional. Set to the user-defined entity of the header if there is a match. Otherwise, it isn't returned.

cellType

string

Optional. One of rowHeader, columnHeader, or normal. Returned only if cellType inference is performed.

normalizedText

string

Same as the normalizedText in Attributes Response Body

Attributes tableCells Object cellLocation Response Body

Name

Type

Description

rowIndex

int

The row index of the detected text

colIndex

int

The column index of the detected text

rowHeader

string

Optional. The entity of the row header of this cell. If the header does not have an entity, the header's text is used instead.

It does not exist if this cell does not have any headers or the header inference is not performed.

colHeader

string

Optional. The entity of the column header of this cell. If the header does not have an entity, the header's text is used instead.

It does not exist if this cell does not have any headers or the header inference is not performed.
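When header inference is performed, the colHeader on each cell location allows table cells to be grouped by column without tracking indexes by hand. The following is a sketch built on made-up data: the function name, the table name, and the cell contents are hypothetical, and cells without a colHeader are simply skipped.

```python
from collections import defaultdict


def values_by_column(table_attributes):
    """Map each column header to the text of its cells, in row order."""
    columns = defaultdict(list)
    cells = sorted(
        table_attributes["tableCells"],
        key=lambda cell: cell["startCellLocation"]["rowIndex"],
    )
    for cell in cells:
        if cell.get("cellType") == "columnHeader":
            continue  # the header cell itself is not a data value
        header = cell["startCellLocation"].get("colHeader")
        if header is not None:
            columns[header].append(cell["text"])
    return dict(columns)


# Hypothetical attributes object for one block labeled "table".
sample_table = {
    "tableName": "line_items",
    "tableCells": [
        {"text": "Gadget", "cellType": "normal",
         "startCellLocation": {"rowIndex": 3, "colIndex": 1, "colHeader": "Item"}},
        {"text": "Item", "cellType": "columnHeader",
         "startCellLocation": {"rowIndex": 1, "colIndex": 1}},
        {"text": "Widget", "cellType": "normal",
         "startCellLocation": {"rowIndex": 2, "colIndex": 1, "colHeader": "Item"}},
    ],
}

print(values_by_column(sample_table))
```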

Language