Creates a dataset, labels, and examples from the specified .zip file. The call returns after the dataset is created and all of the images are uploaded. Use this API call for .zip files that are smaller than 10 MB.
Request Parameters
Name | Type | Description | Available Version |
---|---|---|---|
| string | Path to the .zip file on the local drive (FilePart). The maximum .zip file size you can upload from a local drive is 50 MB. | 1.0 |
| string | Dataset language. Optional. Default is | 2.0 |
| string | Name of the dataset. Optional. If this parameter is omitted, the dataset name is derived from the .zip file name. If this parameter is omitted, the dataset name returned by this call is | 1.0 |
| string | URL of the .zip file. The maximum .zip file size you can upload from a web location is 50 MB. | 1.0 |
| string | Type of dataset data. Valid values are `image`, `image-multi-label`, and `image-detection`. | 1.0 |
The API call is synchronous, so results are returned after the data has been uploaded to the dataset. If this call succeeds, it returns the `labels` array, `available` is `true`, and `statusMsg` is `SUCCEEDED`.
You must provide the path to the .zip file on either the local machine or in the cloud.
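As a rough illustration of the request rules above, the following Python sketch assembles the form fields for this call without sending anything. The helper names and the form field keys (`type`, `data`, `path`, `name`) are assumptions made for the example, so check them against the parameter table before relying on them. Rejecting a request that supplies both a local file and a URL, and treating 1 MB as 1024 × 1024 bytes, are also assumptions.

```python
from typing import Optional

# Limits quoted in this document; byte conversion is an assumption.
SYNC_RECOMMENDED_MAX = 10 * 1024 * 1024  # above this, the asynchronous call is recommended
UPLOAD_MAX = 50 * 1024 * 1024            # maximum .zip size from a local drive or web location

VALID_TYPES = {"image", "image-multi-label", "image-detection"}


def build_form_fields(dataset_type: str,
                      local_zip: Optional[str] = None,
                      zip_url: Optional[str] = None,
                      name: Optional[str] = None) -> dict:
    """Return form fields for a synchronous upload (hypothetical helper).

    The keys "type", "data", "path", and "name" are assumed field names.
    """
    if dataset_type not in VALID_TYPES:
        raise ValueError(f"unknown dataset type: {dataset_type!r}")
    # The .zip file must come from the local machine or from a URL.
    if (local_zip is None) == (zip_url is None):
        raise ValueError("provide exactly one of local_zip or zip_url")
    if name is not None and len(name) > 100:
        raise ValueError("dataset name is limited to 100 characters")

    fields = {"type": dataset_type}
    if local_zip is not None:
        fields["data"] = local_zip
    else:
        fields["path"] = zip_url
    if name is not None:
        fields["name"] = name
    return fields


def suggest_call(zip_size_bytes: int) -> str:
    """Pick a call style from the size guidance in this document."""
    if zip_size_bytes > UPLOAD_MAX:
        return "too large"
    if zip_size_bytes > SYNC_RECOMMENDED_MAX:
        return "asynchronous"
    return "synchronous"
```

For example, `suggest_call(20 * 1024 * 1024)` returns `"asynchronous"`, which mirrors the recommendation to avoid this call for .zip files over 10 MB.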
If the dataset type is `image` or `image-multi-label`, this API:
- Creates a dataset that has the same name as the .zip file (limit is 100 characters), if the `name` parameter is omitted.
- Creates a label for each directory in the .zip file. The label name is the same as the directory name (limit is 180 characters).
- Creates an example for each image file in each directory in the .zip file. The example name is the same as the image file name.
If the dataset type is `image-detection`, this API:
- Creates a dataset that has the same name as the .zip file (limit is 100 characters), if the `name` parameter is omitted.
- Creates a label for each unique label in the annotations.csv file (limit is 180 characters).
- Creates an example for each image file in the .zip file.
Keep the following points in mind when creating datasets.
All Datasets
- If your .zip file is more than 10 MB, we recommend that you use the asynchronous call to create a dataset. If you use this call with a large dataset .zip file, the call could time out. See Create a Dataset From a Zip File Asynchronously.
- The maximum .zip file size you can upload from a local drive or a web location is 50 MB.
- The maximum total dataset size is 2 GB. After you create the dataset, you can use the PUT call to add more examples to it.
- If the `name` parameter is passed, the maximum length is 100 characters.
- The maximum image file name length is 150 characters including the file extension. If the .zip file contains a file with a name greater than 150 characters (including the file extension), the example is created in the dataset, but the API truncates the example name to 150 characters.
- If the .zip file contains an image file that has a name containing spaces, the spaces are removed from the file name before the file is uploaded. For example, if you have a file called `sandy beach.jpg`, the example name becomes `sandybeach.jpg`. If the .zip file contains an image file that has a name with non-ASCII characters, those characters are converted to UTF-8.
- When specifying the URL for a .zip file in a cloud drive service like Dropbox, be sure it's a link to the file and not a link to the interactive download page. For example, the URL should look like `https://www.dropbox.com/s/abcdxyz/mountainvsbeach.zip?dl=1`.
- If the .zip file has an incorrect structure, the API returns an error: `FAILED: Invalid zip format provided for <dataset_name>`.
- If you create a dataset or upload images from a .zip file in Apex code, be sure that you reference the URL to the file with `https` and not `http`.
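The file name and URL rules above can be checked locally before uploading. The sketch below is only an approximation of the documented behavior: the function names are hypothetical, and it assumes that spaces are removed before the 150-character truncation and that truncation is a plain cut that does not preserve the file extension.

```python
from urllib.parse import urlparse

MAX_EXAMPLE_NAME = 150  # includes the file extension


def example_name(file_name: str) -> str:
    """Approximate the example name the API derives from an image file name."""
    without_spaces = file_name.replace(" ", "")
    # Assumption: a simple cut at 150 characters.
    return without_spaces[:MAX_EXAMPLE_NAME]


def uses_https(url: str) -> bool:
    """Check that a .zip URL uses https, as required when calling from Apex code."""
    return urlparse(url).scheme == "https"
```

Calling `example_name("sandy beach.jpg")` gives `sandybeach.jpg`, matching the example in the list above.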
Image or Image Multi-Label Datasets
- The .zip file must have a specific directory structure:
  - In the root, there should be a parent directory that contains subdirectories.
  - Each subdirectory below the parent directory becomes a label in the dataset. This subdirectory must contain images to be added to the dataset.
  - Each subdirectory below the parent directory should contain only images and not any nested subdirectories.
- If you have a large amount of data (gigabytes), you might want to break up your data into multiple .zip files. You can load the first .zip file using this call and then load subsequent .zip files using PUT. See Create Examples From Zip File.
- If you create a dataset from a .zip file, you can only add examples to it from a .zip file using PUT. See Create Examples From Zip File. You can't add a single example from a file.
- The maximum directory name length is 180 characters. If the .zip file contains a directory with a name greater than 180 characters, the label is created in the dataset, but the API truncates the label name to 180 characters.
- The minimum number of labels is two. You can create an image classification dataset with only one label, but training the dataset fails and returns an error.
- The minimum number of examples per label is 10.
- The minimum number of total examples across all labels is 40.
- Image files must be smaller than 1 MB. If the .zip file contains image files larger than 1 MB, those images aren't loaded and no error is returned.
- Images must be no larger than 2,000 pixels high by 2,000 pixels wide. You can upload images that are larger, but training the dataset might fail.
- The supported image file types are PNG, JPG, and JPEG. If the .zip file contains any unsupported image file types, those images won't be uploaded and no error is returned.
- Duplicate images are handled differently based on the dataset type.
  - Image: For datasets of type `image`, if there are duplicate image files in the .zip file, only the first file is uploaded. Duplicate images are checked within directories and across directories. If there's more than one image file with the same file contents in the same directory or in multiple directories, only the first file is uploaded and the others are skipped.
  - Multi-label: For datasets of type `image-multi-label`, if there are duplicate image files in a single directory, only the first file is uploaded and the others are skipped. In a multi-label dataset, it's expected that there are duplicate files across directories. If there's more than one image file with the same file contents in multiple directories, the file is loaded multiple times with a different label.
- You can download an example image .zip file from https://einstein.ai/images/mountainvsbeach.zip.
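Because oversized or unsupported images are skipped without an error, it can help to inspect a .zip file locally first. The following sketch counts the images that would plausibly be accepted per label directory and then reports which of the minimums above are not met. The function names are hypothetical, 1 MB is assumed to mean 1024 × 1024 bytes, and the service's real checks may differ.

```python
import zipfile
from collections import Counter
from pathlib import PurePosixPath

SUPPORTED = {".png", ".jpg", ".jpeg"}
MAX_IMAGE_BYTES = 1024 * 1024  # assumption: 1 MB = 1024 * 1024 bytes


def count_examples_per_label(zf: zipfile.ZipFile) -> Counter:
    """Count usable images per label for a parent/label/image layout."""
    counts = Counter()
    for info in zf.infolist():
        if info.is_dir():
            continue
        parts = PurePosixPath(info.filename).parts
        if len(parts) != 3:  # only parent/label/file entries count
            continue
        _parent, label, file_name = parts
        if PurePosixPath(file_name).suffix.lower() not in SUPPORTED:
            continue  # unsupported types are skipped silently
        if info.file_size >= MAX_IMAGE_BYTES:
            continue  # images must be smaller than 1 MB
        counts[label[:180]] += 1  # label names are truncated to 180 characters
    return counts


def training_problems(counts: Counter) -> list:
    """List the documented minimums that the counted examples do not meet."""
    problems = []
    if len(counts) < 2:
        problems.append("fewer than two labels")
    for label, n in sorted(counts.items()):
        if n < 10:
            problems.append(f"label {label!r} has {n} examples (minimum is 10)")
    if sum(counts.values()) < 40:
        problems.append("fewer than 40 examples in total")
    return problems
```

An empty list from `training_problems` only means the documented minimums are met; it does not check image dimensions or duplicates.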
Object Detection Datasets
- Here are the guidelines for the .zip file:
- The .zip file must contain two types of elements: (1) the image files specified in the annotations.csv file and (2) a file named annotations.csv that contains the bounding box data.
- Images can be in the root of the .zip file or in a folder or folders in the root of the .zip file. If images are in folders more than one level deep, you'll receive an error when you try to create the dataset.
- The supported image file types are PNG, JPG, and JPEG. If the .zip file contains any unsupported image file types, those images won't be uploaded and no error is returned.
- The annotations.csv file is a text file that contains the data for the bounding boxes associated with each image. The file must have this exact name.
- The annotations.csv file can be anywhere within the .zip file.


- The maximum label name length is 180 characters. If the annotations file contains a label with a name greater than 180 characters, the label is created in the dataset, but the API truncates the label name to 180 characters.
- Image files must be smaller than 5 MB. If the .zip file contains image files larger than 5 MB, those images aren't loaded and no error is returned.
- Labels are case sensitive. If you have labels `Oatmeal` and `oatmeal`, they are two distinct labels in the dataset and the resulting model.
- The minimum number of labels is one.
- When you create a dataset, all the images are checked for duplicates. If the .zip file contains multiple image files that have the same contents, only the first of the duplicate files is uploaded.
- If there's an image in the .zip file, but no bounding box descriptions for that image in the annotations file, the image is dropped and no error is returned.
- You can download an example object detection .zip file from https://einstein.ai/images/alpine.zip.
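The structural rules for object detection .zip files can be approximated with a local check. This sketch looks for annotations.csv, flags images nested more than one folder deep, and lists which files would survive duplicate removal. The function names are hypothetical, and comparing SHA-256 hashes in archive order is an assumption about how "same contents" and "first" are determined.

```python
import hashlib
import zipfile
from pathlib import PurePosixPath

SUPPORTED = {".png", ".jpg", ".jpeg"}


def check_detection_zip(zf: zipfile.ZipFile) -> list:
    """Return structural problems found in an object detection .zip file."""
    files = [n for n in zf.namelist() if not n.endswith("/")]
    problems = []
    # annotations.csv can be anywhere in the archive but must have this exact name.
    if not any(PurePosixPath(n).name == "annotations.csv" for n in files):
        problems.append("annotations.csv is missing")
    for n in files:
        path = PurePosixPath(n)
        # (file,) is the root; (folder, file) is one level deep; more is an error.
        if path.suffix.lower() in SUPPORTED and len(path.parts) > 2:
            problems.append(f"{n} is more than one folder deep")
    return problems


def first_of_duplicates(zf: zipfile.ZipFile) -> list:
    """Keep the first image for each distinct content hash, in archive order."""
    seen = set()
    kept = []
    for n in zf.namelist():
        if n.endswith("/") or PurePosixPath(n).suffix.lower() not in SUPPORTED:
            continue
        digest = hashlib.sha256(zf.read(n)).hexdigest()
        if digest not in seen:
            seen.add(digest)
            kept.append(n)
    return kept
```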
Annotations.csv File Format
The annotations.csv file contains the bounding box coordinates and the labels for each image.
- The first row in the file contains the headers for the CSV values. We use the convention of `image_file` and `boxn`, but each header value can be any string.
  - `image_file`: Header for the image file name.
  - `boxn`: Header for each bounding box element. The number of `boxn` values in the header is the maximum number of bounding boxes you can have in an image.
- Each row after the header specifies the bounding box descriptions in JSON format for each image in the .zip file. There should be one row per file. Multiple bounding boxes for the same image are listed as separate columns in the same row. The image name provided must be the exact name of the image file included in the parent folder. The `x`, `y`, `width`, and `height` values specify the bounding box location within the image. The following table lists the required fields for each bounding box.
Name | Type | Description |
---|---|---|
`label` | string | Classification label for the content in the bounding box. |
`height` | int | Height of the bounding box in pixels. |
`width` | int | Width of the bounding box in pixels. |
`x` | int | Location of the bounding box on the horizontal axis. |
`y` | int | Location of the bounding box on the vertical axis. |
Here's an example of an annotations.csv file for two images.
"image_file","box0","box1"
"picture1.jpg","{""label"": ""cat"", ""y"": 242, ""x"": 160, ""height"": 62, ""width"": 428}","{""label"": ""turtle"", ""y"": 113, ""x"": 61, ""height"": 74, ""width"": 718}"
"picture2.jpg","{""label"": ""dog"", ""y"": 94, ""x"": 27, ""height"": 144, ""width"": 184}","{""label"": ""dog"", ""y"": 50, ""x"": 286, ""height"": 344, ""width"": 348}"
Here's the second image referenced in the annotations.csv file showing the bounding boxes.
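Writing the doubled quotes by hand is error-prone, so one option is to let a CSV library do the escaping. The sketch below uses Python's standard `csv` and `json` modules to produce a file in the format described above and to read it back. The function names are hypothetical, and padding rows that have fewer boxes with empty cells is an assumption rather than documented API behavior.

```python
import csv
import io
import json


def write_annotations(rows: dict) -> str:
    """Build annotations.csv text from {image file name: [bounding box dicts]}."""
    max_boxes = max((len(boxes) for boxes in rows.values()), default=0)
    out = io.StringIO()
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
    # Header names follow the image_file / boxn convention.
    writer.writerow(["image_file"] + [f"box{i}" for i in range(max_boxes)])
    for image, boxes in rows.items():
        cells = [json.dumps(box) for box in boxes]
        cells += [""] * (max_boxes - len(cells))  # assumption: pad short rows
        writer.writerow([image] + cells)
    return out.getvalue()


def read_annotations(text: str) -> dict:
    """Parse annotations.csv text back into {image file name: [bounding box dicts]}."""
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header row
    return {
        row[0]: [json.loads(cell) for cell in row[1:] if cell.strip()]
        for row in reader
        if row
    }
```

The writer doubles every quote inside the JSON cells automatically, which produces the same `""label""` style shown in the example file.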


Response Body
Name | Type | Description | Available Version |
---|---|---|---|
| boolean | Specifies whether the dataset is ready to be trained. | 1.0 |
| date | Date and time that the dataset was created. | 1.0 |
| long | Dataset ID. | 1.0 |
| object | Contains the | 1.0 |
| string | Dataset language. Default is | 2.0 |
| string | Name of the dataset. The API uses the name of the .zip file for the dataset name. | 1.0 |
| int | Number of duplicate images in the .zip file from which the dataset was created. | 2.0 |
| string | Object returned; in this case, | 1.0 |
| string | Status of the dataset creation and data upload. Valid values are:
| 1.0 |
| int | Total number of examples in the dataset. | 1.0 |
| string | Type of dataset data. Valid values are `image`, `image-multi-label`, and `image-detection`. | 1.0 |
| date | Date and time that the dataset was last updated. | 1.0 |
Labels Response Body
Name | Type | Description | Available Version |
---|---|---|---|
| long | ID of the dataset that the label belongs to. | 1.0 |
| long | ID of the label. | 1.0 |
| string | Name of the label. | 1.0 |
| int | Number of examples that have the label. | 1.0 |
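A caller typically wants to confirm that the upload finished before training. The sketch below checks the two response fields that this document ties to success, `available` and `statusMsg`. The function name and the sample response body are made up for illustration and do not reproduce a real response.

```python
import json


def upload_succeeded(dataset: dict) -> bool:
    """Return True when the response reports a ready, successfully created dataset."""
    return dataset.get("available") is True and dataset.get("statusMsg") == "SUCCEEDED"


# Hypothetical, heavily trimmed response body used only to exercise the check.
sample_body = '{"available": true, "statusMsg": "SUCCEEDED"}'
print(upload_succeeded(json.loads(sample_body)))
```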