Create Examples From a File

Adds examples from a .csv, .tsv, or .json file to a dataset.

Request Parameters

Name | Type | Description | Available Version
data | string | Path to the .csv, .tsv, or .json file on a local drive. The maximum file size you can upload from a local drive is 25 MB. | 2.0
path | string | URL of the .csv, .tsv, or .json file. The maximum file size you can upload from a web location is 25 MB. | 2.0
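The following Python sketch shows one way to send the path parameter as multipart/form-data using only the standard library. The endpoint (PUT https://api.einstein.ai/v2/language/datasets/<DATASET_ID>/upload), the Bearer-token Authorization header, and the function name build_upload_request are assumptions for illustration; this section doesn't state the URL or the authentication scheme. The function only builds the request, so you can inspect it before sending it.

```python
import urllib.request
import uuid

# ASSUMPTION: endpoint shape; this section does not state the URL.
BASE_URL = "https://api.einstein.ai/v2/language/datasets"


def build_upload_request(dataset_id: int, token: str, file_url: str) -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data PUT request whose
    `path` field is the web URL of a .csv, .tsv, or .json file."""
    boundary = uuid.uuid4().hex  # random boundary that won't appear in the URL
    body = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="path"\r\n'
        "\r\n"
        f"{file_url}\r\n"
        f"--{boundary}--\r\n"
    ).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/{dataset_id}/upload",
        data=body,
        method="PUT",
        headers={
            # ASSUMPTION: OAuth-style Bearer token authentication.
            "Authorization": f"Bearer {token}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

To actually send it, pass the request to urllib.request.urlopen and read the JSON body. For a local file you would use the data parameter instead, sending the file contents as a file part; that variant isn't shown here.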

This call adds examples to the specified dataset from a .csv, .tsv, or .json file. Because the call is asynchronous, the response that's initially returned contains information about the original dataset, and available is false.

To find out when the upload is complete, call Get a Dataset with the dataset ID. The upload is complete when available is true and statusMsg is SUCCEEDED, as in this example response.

{
  "id": 1001412,
  "name": "weather",
  "createdAt": "2017-06-05T21:55:53.000+0000",
  "updatedAt": "2017-06-06T21:57:58.000+0000",
  "labelSummary": {
    "labels": [
      {
        "id": 13313,
        "datasetId": 1001412,
        "name": "hourly-forecast",
        "numExamples": 69
      },
      {
        "id": 13314,
        "datasetId": 1001412,
        "name": "current-weather",
        "numExamples": 87
      },
      {
        "id": 13315,
        "datasetId": 1001412,
        "name": "five-day-forecast",
        "numExamples": 63
      }
    ]
  },
  "totalExamples": 219,
  "totalLabels": 3,
  "available": true,
  "statusMsg": "SUCCEEDED",
  "type": "text-intent",
  "language": "ENGLISH",
  "numOfDuplicates": 3,
  "object": "dataset"
}
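Because the upload is asynchronous, a client typically polls Get a Dataset until the upload finishes. The Python sketch below assumes a hypothetical get_dataset callable that performs that call and returns the parsed JSON as a dictionary, and it treats any statusMsg that starts with FAILED as a terminal error. The poll interval and timeout defaults are arbitrary choices, not documented limits.

```python
import time
from typing import Any, Callable, Mapping


def wait_for_upload(
    get_dataset: Callable[[], Mapping[str, Any]],
    poll_seconds: float = 5.0,
    timeout_seconds: float = 600.0,
) -> Mapping[str, Any]:
    """Poll until the upload completes, fails, or times out.

    get_dataset is a hypothetical callable wrapping a Get a Dataset call.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        dataset = get_dataset()
        status = dataset.get("statusMsg") or ""
        # Complete only when both conditions from the docs hold.
        if dataset.get("available") and status == "SUCCEEDED":
            return dataset
        # statusMsg has the form "FAILED: <message>" on failure.
        if status.startswith("FAILED"):
            raise RuntimeError(f"Upload failed: {status}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Upload still in progress (statusMsg={status!r})")
        time.sleep(poll_seconds)
```

Injecting get_dataset keeps the loop independent of any particular HTTP client, which also makes it easy to exercise with canned responses.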

Each dataset type supports different file formats. This table lists the file formats supported by each dataset type.

File format | text-intent | text-sentiment
.csv file | Y | Y
.tsv file | Y | Y
.json file | Y | N
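A client can reject an unsupported file before uploading it. This minimal Python sketch encodes the table above as a dictionary; the names SUPPORTED_FORMATS and is_supported are hypothetical, and checking the file extension alone is only a client-side convenience, not a substitute for the API's own validation.

```python
from pathlib import PurePath

# File formats supported by each dataset type, from the table above.
SUPPORTED_FORMATS = {
    "text-intent": {".csv", ".tsv", ".json"},
    "text-sentiment": {".csv", ".tsv"},
}


def is_supported(dataset_type: str, filename: str) -> bool:
    """Return True if the file's extension is allowed for the dataset type."""
    suffix = PurePath(filename).suffix.lower()  # case-insensitive match
    return suffix in SUPPORTED_FORMATS.get(dataset_type, set())
```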

Keep the following points in mind when creating examples:

FILE SIZE

  • The maximum file size you can upload from a local drive or web location is 25 MB.

DATASETS

  • The maximum total dataset size is 2 GB.

LABELS

  • If the file contains a label that's already in the dataset, the API adds the examples (intent or sentiment text strings) to that existing label.

  • If the file contains a label that isn't in the dataset, the API creates a new label. Label names can be up to 180 characters long.

EXAMPLES

  • A dataset can have a maximum of 3 million words across all examples. If you try to train a dataset that has more than 3 million words, you receive an error.

  • For best results, we recommend that each example be around 100 words.

OTHER

  • The Einstein Language APIs support only UTF-8 text characters. If your examples or labels contain any non-UTF-8 text, you receive an error that the file format is invalid when you try to create the examples.

  • If you try to create examples in a dataset while a previous call to create examples is still processing (the dataset's available value is false), the call fails and you receive an error. You must wait until the dataset's available value is true before starting another upload.
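Several of the limits above can be checked on the client before an upload. The Python sketch below is hypothetical: it assumes the file has already been parsed into (text, label) pairs (the file structure is described in the dataset-creation topics), counts words by splitting on whitespace, and treats 25 MB as 25 * 1024 * 1024 bytes. The server's own counting rules may differ, so a clean result is a hint, not a guarantee.

```python
from typing import List, Tuple

# Limits from the list above. Treating "MB" as 1024 * 1024 bytes is an assumption.
MAX_FILE_BYTES = 25 * 1024 * 1024
MAX_LABEL_CHARS = 180
MAX_DATASET_WORDS = 3_000_000


def check_upload(
    raw: bytes,
    examples: List[Tuple[str, str]],
    existing_words: int = 0,
) -> List[str]:
    """Return the problems found; an empty list means no checked limit was exceeded.

    raw: the file contents as bytes.
    examples: (text, label) pairs already parsed from the file (hypothetical input shape).
    existing_words: words already in the dataset, because the word limit is dataset-wide.
    """
    problems: List[str] = []

    if len(raw) > MAX_FILE_BYTES:
        problems.append(f"File is {len(raw)} bytes; the limit is {MAX_FILE_BYTES}.")

    # The APIs accept only UTF-8 text.
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        problems.append(f"File is not valid UTF-8 ({exc.reason} at byte {exc.start}).")

    total_words = existing_words
    for text, label in examples:
        if len(label) > MAX_LABEL_CHARS:
            problems.append(f"Label {label[:20]!r}... exceeds {MAX_LABEL_CHARS} characters.")
        # Whitespace splitting approximates the server's word count (assumption).
        total_words += len(text.split())

    # The API reports this limit when you train, but it can be caught earlier.
    if total_words > MAX_DATASET_WORDS:
        problems.append(f"Dataset would have {total_words} words; the limit is {MAX_DATASET_WORDS}.")

    return problems
```

The existing_words argument matters because the 3 million word limit applies to the whole dataset, not to a single file.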

Response Body

Name | Type | Description | Available Version
available | boolean | Specifies whether the dataset is ready to be trained. | 2.0
createdAt | date | Date and time that the dataset was created. | 2.0
id | long | ID of the dataset. | 2.0
labelSummary | object | Contains the labels array, which lists all the labels in the dataset. | 2.0
language | string | Dataset language. Default is ENGLISH. | 2.0
name | string | Name of the dataset. | 2.0
numOfDuplicates | int | Number of duplicate text strings. This number includes duplicates in the file from which the dataset was created plus duplicate text strings from subsequent PUT calls that add examples to the dataset. | 2.0
object | string | Object returned; in this case, dataset. | 2.0
statusMsg | string | Status of the dataset creation and data upload. Valid values are FAILED: <message> (data upload has failed), SUCCEEDED (data upload is complete), and UPLOADING (data upload is in progress). | 2.0
totalExamples | int | Total number of examples in the dataset. | 2.0
totalLabels | int | Total number of labels in the dataset. | 2.0
type | string | Type of dataset data. Valid values are text-intent and text-sentiment. | 2.0
updatedAt | date | Date and time that the dataset was last updated. | 2.0

Label Response Body

Name | Type | Description | Available Version
datasetId | long | ID of the dataset that the label belongs to. | 2.0
id | long | ID of the label. | 2.0
name | string | Name of the label. | 2.0
numExamples | int | Number of examples that have the label. | 2.0
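The response body and label response body map naturally onto typed records. This Python sketch, with hypothetical class and function names, parses a dataset response into dataclasses and converts the date fields using the timestamp format seen in the example response (for example, 2017-06-05T21:55:53.000+0000). It falls back to ENGLISH when language is missing, matching the documented default.

```python
import json
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Dict, List

# Timestamp format observed in the example response, e.g. "2017-06-05T21:55:53.000+0000".
TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%S.%f%z"


@dataclass
class Label:
    id: int
    dataset_id: int
    name: str
    num_examples: int


@dataclass
class Dataset:
    id: int
    name: str
    created_at: datetime
    updated_at: datetime
    labels: List[Label]
    total_examples: int
    total_labels: int
    available: bool
    status_msg: str
    dataset_type: str  # the "type" field
    language: str
    num_of_duplicates: int


def parse_dataset(text: str) -> Dataset:
    """Parse a dataset JSON response into typed records."""
    payload: Dict[str, Any] = json.loads(text)
    labels = [
        Label(
            id=item["id"],
            dataset_id=item["datasetId"],
            name=item["name"],
            num_examples=item["numExamples"],
        )
        for item in payload["labelSummary"]["labels"]
    ]
    return Dataset(
        id=payload["id"],
        name=payload["name"],
        created_at=datetime.strptime(payload["createdAt"], TIMESTAMP_FORMAT),
        updated_at=datetime.strptime(payload["updatedAt"], TIMESTAMP_FORMAT),
        labels=labels,
        total_examples=payload["totalExamples"],
        total_labels=payload["totalLabels"],
        available=payload["available"],
        status_msg=payload["statusMsg"],
        dataset_type=payload["type"],
        language=payload.get("language", "ENGLISH"),  # documented default
        num_of_duplicates=payload["numOfDuplicates"],
    )
```

For the example response, the per-label counts (69 + 87 + 63) add up to totalExamples (219), which makes a convenient sanity check after parsing.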

See Create a Dataset From a File Asynchronously or Create a Dataset From a File Synchronously for information about the file structure.
