Creates a dataset, labels, and examples from the specified .csv, .tsv, or .json file. The call returns after the dataset is created and all of the data is uploaded. Use this API call for files that are smaller than 10 MB.
Request Parameters
Name | Type | Description | Available Version |
---|---|---|---|
| string | Path to the .csv, .tsv, or .json file on the local drive (FilePart). The maximum file size you can upload from a local drive is 25 MB. | 2.0 |
| string | Dataset language. Optional. Default is | 2.0 |
| string | Name of the dataset. Optional. If this parameter is omitted, the dataset name is derived from the file name. | 2.0 |
| string | URL of the .csv, .tsv, or .json file. The maximum file size you can upload from a web location is 25 MB. | 2.0 |
| string | Type of dataset data. Valid values are text-intent and text-sentiment. | 2.0 |
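As a rough sketch of how the request parameters fit together, the helper below assembles the form fields for one call without sending anything. The field names `type`, `path`, `data`, `name`, and `language` are assumptions (the parameter names are not shown in the table above), as is the rule that a request supplies either a URL or a local file but not both. `build_upload_fields` is a hypothetical helper, not part of any SDK.

```python
from typing import Dict, Optional

# Dataset types named in this document.
VALID_TYPES = {"text-intent", "text-sentiment"}


def build_upload_fields(
    dataset_type: str,
    file_url: Optional[str] = None,
    local_path: Optional[str] = None,
    name: Optional[str] = None,
    language: Optional[str] = None,
) -> Dict[str, str]:
    """Collect the form fields for a dataset upload (field names are assumed)."""
    if dataset_type not in VALID_TYPES:
        raise ValueError(f"unsupported dataset type: {dataset_type}")
    # Assumption: exactly one file source is given per request.
    if (file_url is None) == (local_path is None):
        raise ValueError("provide exactly one of file_url or local_path")

    fields = {"type": dataset_type}
    if file_url is not None:
        fields["path"] = file_url      # assumed name of the URL parameter
    else:
        fields["data"] = local_path    # assumed name of the local-file parameter
    if name:
        fields["name"] = name          # optional; otherwise derived from the file name
    if language:
        fields["language"] = language  # optional
    return fields


fields = build_upload_fields(
    "text-intent", file_url="https://einstein.ai/text/weather.csv"
)
```

The resulting dictionary could then be passed to whichever HTTP client you use to submit the multipart form.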
The API call is synchronous, so results are returned after the data has been uploaded to the dataset. If this call succeeds, it returns the labels array, available is true, and statusMsg is SUCCEEDED.
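A minimal sketch of the success check described above, assuming the response body has already been parsed into a Python dictionary. The sample dictionary is hypothetical and contains only the two fields named in the text.

```python
def dataset_is_ready(response: dict) -> bool:
    """Return True when the call reports a trainable, fully uploaded dataset."""
    return (
        response.get("available") is True
        and response.get("statusMsg") == "SUCCEEDED"
    )


# Hypothetical parsed response, reduced to the fields named in the text.
sample_response = {"available": True, "statusMsg": "SUCCEEDED"}
ready = dataset_is_ready(sample_response)
```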
Keep the following points in mind when creating datasets.
FILE SIZE
- The maximum file size you can upload from a local drive or web location is 25 MB.
- If your file is larger than 10 MB, we recommend that you use the asynchronous call to create a dataset. If you use this call with a large file, the call could time out. See Create a Dataset From a File Asynchronously.
- If you have a large amount of data, consider breaking up your data into multiple files. You can load the first file using this call and then load subsequent files using PUT. See Create Examples From a File.
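The size guidance above can be sketched as a small decision helper. It assumes 1 MB means 1,048,576 bytes and treats a file of exactly 10 MB as acceptable for the synchronous call; neither detail is stated in the text. `choose_upload_strategy` is a hypothetical name.

```python
MB = 1024 * 1024                    # assumption: binary megabytes
SYNC_RECOMMENDED_LIMIT = 10 * MB    # above this, the asynchronous call is recommended
MAX_UPLOAD_SIZE = 25 * MB           # hard limit for a single file


def choose_upload_strategy(size_bytes: int) -> str:
    """Map a file size to 'sync', 'async', or 'split'."""
    if size_bytes > MAX_UPLOAD_SIZE:
        return "split"   # break the data into multiple files
    if size_bytes > SYNC_RECOMMENDED_LIMIT:
        return "async"   # the synchronous call could time out
    return "sync"


strategy = choose_upload_strategy(15 * MB)
```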
DATASETS
- The maximum total dataset size is 2 GB.
LABELS
- The maximum label name length is 180 characters. If a file contains a class label name greater than 180 characters, the label is created in the dataset, but the API truncates the label name to 180 characters.
- You must have at least two labels in the dataset. If you don't have at least two labels, you can create the dataset, but training the dataset fails.
- A dataset can have up to 500 labels. However, we recommend that a dataset has a maximum of 100 labels for better model accuracy.
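One way to check the label limits before uploading is sketched below. It truncates names locally to mirror the behavior described above; how the API handles two long names that collide after truncation is not stated, so the de-duplication step here is an assumption. `summarize_labels` is a hypothetical helper.

```python
from typing import Iterable, List, Tuple

MAX_LABEL_LENGTH = 180
MAX_LABELS = 500
RECOMMENDED_MAX_LABELS = 100


def summarize_labels(labels: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return the distinct (truncated) label names and a list of problems."""
    # dict.fromkeys keeps first-seen order while removing repeats.
    distinct = list(dict.fromkeys(label[:MAX_LABEL_LENGTH] for label in labels))
    problems = []
    if len(distinct) < 2:
        problems.append("fewer than 2 labels: training will fail")
    if len(distinct) > MAX_LABELS:
        problems.append("more than 500 labels")
    elif len(distinct) > RECOMMENDED_MAX_LABELS:
        problems.append("more than 100 labels: accuracy may suffer")
    return distinct, problems


names, issues = summarize_labels(["x" * 200, "current-weather", "current-weather"])
```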
EXAMPLES
- If the file contains duplicate intent or sentiment strings, only the first string is uploaded and the others are skipped. This is true whether the duplicate strings have the same label or different labels.
- If the dataset type is text-intent, each label must have at least 20 examples. We recommend that each label has at least 100 examples for better model accuracy. If you have only 20 examples, be sure to use the multilingual-intent or multilingual-intent-ood algorithm when you train the dataset.
- If the dataset type is text-sentiment, each label must have at least 100 examples.
- We recommend a maximum of 10,000 examples across all labels.
- A dataset can have a maximum of 3 million words across all examples. If you try to train a dataset that has more than 3 million words, you receive an error.
- For best results, we recommend that each example is less than 150 words.
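The example limits above can be approximated locally before an upload. This sketch assumes words are separated by whitespace (the text does not define how the API counts words) and that duplicates are matched on the exact string. `check_examples` is a hypothetical helper.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

MIN_EXAMPLES_PER_LABEL = {"text-intent": 20, "text-sentiment": 100}
MAX_TOTAL_WORDS = 3_000_000
RECOMMENDED_MAX_EXAMPLES = 10_000
RECOMMENDED_MAX_WORDS_PER_EXAMPLE = 150


def check_examples(rows: Iterable[Tuple[str, str]], dataset_type: str) -> Dict:
    """Drop duplicate strings (first one wins) and report limit violations."""
    seen = set()
    kept = []
    for text, label in rows:
        if text in seen:
            continue  # duplicate text is skipped even if the label differs
        seen.add(text)
        kept.append((text, label))

    counts = Counter(label for _, label in kept)
    minimum = MIN_EXAMPLES_PER_LABEL[dataset_type]
    word_counts = [len(text.split()) for text, _ in kept]
    return {
        "kept": kept,
        "under_minimum": {l: n for l, n in counts.items() if n < minimum},
        "total_words": sum(word_counts),
        "too_many_words": sum(word_counts) > MAX_TOTAL_WORDS,
        "over_recommended_examples": len(kept) > RECOMMENDED_MAX_EXAMPLES,
        "long_examples": sum(
            1 for n in word_counts if n >= RECOMMENDED_MAX_WORDS_PER_EXAMPLE
        ),
    }


report = check_examples(
    [
        ("is it raining", "current-weather"),
        ("is it raining", "five-day-forecast"),
        ("what's the temperature", "current-weather"),
    ],
    "text-intent",
)
```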
OTHER
- The Einstein Language APIs support only UTF-8 text characters. If your examples or labels contain any non-UTF-8 text, you receive an error that the file format is invalid when you try to create the dataset.
- When specifying the URL for a file in a cloud drive service like Dropbox, be sure it's a link to the file and not a link to the interactive download page. For example, the URL should look like https://www.dropbox.com/s/abcdxyz/weather.csv?dl=1
- If you create a dataset in Apex code, be sure that you reference the URL to the file with https and not http.
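The encoding and URL points above lend themselves to a quick pre-flight check, sketched here with the standard library only. The https check is applied to every URL for simplicity, although the text requires it specifically for Apex code, and the Dropbox rule is inferred from the single example URL. Both function names are hypothetical.

```python
from typing import List
from urllib.parse import parse_qs, urlparse


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the raw file bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def url_problems(url: str) -> List[str]:
    """List likely problems with a file URL before passing it to the API."""
    parts = urlparse(url)
    problems = []
    if parts.scheme != "https":
        problems.append("URL should use https")
    host = parts.hostname or ""
    # Inferred from the example: a direct Dropbox file link ends in ?dl=1.
    if host.endswith("dropbox.com") and parse_qs(parts.query).get("dl") != ["1"]:
        problems.append("Dropbox link should point at the file (dl=1)")
    return problems


ok = url_problems("https://www.dropbox.com/s/abcdxyz/weather.csv?dl=1")
bad = url_problems("http://www.dropbox.com/s/abcdxyz/weather.csv?dl=0")
```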
Response Body
Name | Type | Description | Available Version |
---|---|---|---|
| boolean | Specifies whether the dataset is ready to be trained. | 2.0 |
| date | Date and time that the dataset was created. | 2.0 |
| long | Dataset ID. | 2.0 |
| object | Contains the | 2.0 |
| string | Dataset language. Default is | 2.0 |
| string | Name of the dataset. The API uses the name of the file for the dataset name. | 2.0 |
| int | Number of duplicate text strings in the file from which the dataset was created. | 2.0 |
| string | Object returned; in this case, | 2.0 |
| string | Status of the dataset creation and data upload. A successful call returns SUCCEEDED. | 2.0 |
| int | Total number of examples in the dataset. | 2.0 |
| string | Type of dataset data. Valid values are text-intent and text-sentiment. | 2.0 |
| date | Date and time that the dataset was last updated. | 2.0 |
Labels Response Body
Name | Type | Description | Available Version |
---|---|---|---|
| long | ID of the dataset that the label belongs to. | 2.0 |
| long | ID of the label. | 2.0 |
| string | Name of the label. | 2.0 |
| int | Number of examples that have the label. | 2.0 |
Each dataset type supports different file formats. This table lists the file formats supported by each dataset type.
File Format | text-intent | text-sentiment |
---|---|---|
.csv file | Y | Y |
.tsv file | Y | Y |
.json file | Y | N |
CSV File Format Considerations
- You can use a .csv file to create both intent and sentiment datasets.
- When you create a dataset from a .csv file, the dataset name is inherited from the file name.
- Each .csv file contains sentiment or intent data in this format:
"sentiment or intent string",label-name
followed by a CRLF. In the following example, the intent string is in double quotes followed by the label current-weather.
"what's the weather look like",current-weather
"is it raining",current-weather
"what's the temperature",current-weather
- You can download example intent .csv files from https://einstein.ai/text/weather.csv and https://einstein.ai/text/case_routing_intent.csv.
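A file in the .csv layout described above could be produced along these lines. The sketch writes each row by hand so that only the text is quoted and every row ends with CRLF. Doubling embedded double quotes is the common CSV convention and is an assumption here, since the text does not say how the API expects them to be escaped. `write_examples_csv` is a hypothetical helper.

```python
import io
from typing import Iterable, TextIO, Tuple


def write_examples_csv(rows: Iterable[Tuple[str, str]], stream: TextIO) -> None:
    """Write rows as "text",label followed by CRLF."""
    for text, label in rows:
        escaped = text.replace('"', '""')  # assumed escaping convention
        stream.write(f'"{escaped}",{label}\r\n')


# newline="" stops Python from translating the CRLF line endings.
buffer = io.StringIO(newline="")
write_examples_csv(
    [
        ("is it raining", "current-weather"),
        ("what's the temperature", "current-weather"),
    ],
    buffer,
)
csv_text = buffer.getvalue()
```

When writing to disk, opening the file with `open(path, "w", encoding="utf-8", newline="")` keeps both the encoding and the line endings under your control.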
TSV File Format Considerations
- You can use a .tsv file for both intent and sentiment datasets.
- When you create a dataset from a .tsv file, the dataset name is inherited from the file name.
- Each .tsv file contains sentiment or intent data in this format:
"sentiment or intent string"<tab_char>label-name
followed by a CRLF. In the following example, the intent string is in double quotes followed by a tab character and the label current-weather.
"what's the weather look like"	current-weather
"is it raining"	current-weather
"what's the temperature"	current-weather
- You can download example intent .tsv files from https://einstein.ai/text/weather.tsv and https://einstein.ai/text/case_routing_intent.tsv.
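To inspect a .tsv file in the layout above before uploading it, Python's standard `csv` module can be pointed at the tab delimiter. This is only a local reading sketch; the sample string is made up to match the documented format, and `read_examples_tsv` is a hypothetical helper.

```python
import csv
import io
from typing import List, TextIO, Tuple


def read_examples_tsv(stream: TextIO) -> List[Tuple[str, str]]:
    """Parse tab-separated rows into (text, label) pairs, skipping short rows."""
    reader = csv.reader(stream, delimiter="\t")
    return [(row[0], row[1]) for row in reader if len(row) >= 2]


sample_tsv = (
    '"what\'s the weather look like"\tcurrent-weather\r\n'
    '"is it raining"\tcurrent-weather\r\n'
)
pairs = read_examples_tsv(io.StringIO(sample_tsv, newline=""))
```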
JSON File Format Considerations
- You can use a .json file only for an intent dataset.
- The top-level object must be called "intents" as shown in the example file.
- When you create a dataset from a .json file, the dataset name is inherited from the file name.
- The JSON structure is a top-level object that contains multiple arrays. Each array contains the intent strings in double quotes. In the example structure, the labels are current-weather, five-day-forecast, and hourly-forecast. The content in double quotes becomes a single example that has the specified label.
- You can download example intent .json files from https://einstein.ai/text/weather.json and https://einstein.ai/text/case_routing_intent.json.
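The JSON layout described above might be generated as follows. The exact shape, an "intents" object whose keys are label names and whose values are arrays of strings, is inferred from the description and should be checked against the downloadable example files. The forecast example strings are invented for illustration, and `build_intents_document` is a hypothetical helper.

```python
import json
from typing import Dict, List


def build_intents_document(examples_by_label: Dict[str, List[str]]) -> str:
    """Serialize label-to-examples data under a top-level "intents" object."""
    return json.dumps({"intents": examples_by_label}, indent=2, ensure_ascii=False)


document = build_intents_document(
    {
        "current-weather": ["what's the weather look like", "is it raining"],
        "five-day-forecast": ["what's the forecast for this week"],
        "hourly-forecast": ["will it rain in the next hour"],
    }
)
```

`ensure_ascii=False` keeps non-ASCII characters as-is, so the file should be saved with UTF-8 encoding.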