Einstein Platform Services

Use Global Datasets

Global datasets are public datasets that Salesforce provides. You can use these datasets to include additional data during training when you create a model.

One use of global datasets is to create a negative class in your model. The model returns the negative class in a prediction for an image that doesn't match any of the other classes.

For example, the beaches and mountains model has only two classes: Beaches and Mountains. If you classify an image of an airplane using that model, the response returns probabilities for only those two classes, neither of which describes the image.

Instead, you can use the global dataset to add another label named "other". Now when you train the beaches and mountains dataset, you include the global dataset. Your original dataset stays the same, but the data from the global dataset is added to the model. The model then contains these labels: Mountains, Beaches, and other.

Use a Global Dataset to Create a Model

This cURL command trains a dataset and creates a model using one of the global datasets. Replace <TOKEN> with your access token and <DATASET_ID> with your dataset ID. The global dataset ID specified here (1005161) is the ID for a standard image classification dataset.

curl -X POST -H "Authorization: Bearer <TOKEN>" -H "Cache-Control: no-cache" -H "Content-Type: multipart/form-data" -F "name=Beaches and Mountains w/Global Other Dataset" -F "datasetId=<DATASET_ID>" -F "trainParams={\"withGlobalDatasetId\" : 1005161}" https://api.einstein.ai/v2/vision/train
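The trainParams field in the command above is a JSON string, which is why the quotes have to be escaped in the shell. As a minimal sketch, the same form fields could be assembled in Python, letting json.dumps handle the quoting. The helper name build_train_fields and the dataset ID 1234567 are hypothetical and used only for illustration; the field names and the global dataset ID come from the cURL command.

```python
import json

# Endpoint taken from the cURL command above.
TRAIN_URL = "https://api.einstein.ai/v2/vision/train"


def build_train_fields(name, dataset_id, global_dataset_id):
    """Return the form fields for a train call (hypothetical helper)."""
    # trainParams is sent as a JSON string, so serialize it with json.dumps
    # instead of escaping the quotes by hand.
    train_params = json.dumps({"withGlobalDatasetId": global_dataset_id})
    return {
        "name": name,
        "datasetId": str(dataset_id),
        "trainParams": train_params,
    }


# 1234567 is a made-up dataset ID; substitute your own.
fields = build_train_fields(
    "Beaches and Mountains w/Global Other Dataset", 1234567, 1005161
)
print(fields["trainParams"])  # {"withGlobalDatasetId": 1005161}
```

The resulting dictionary would then be posted to TRAIN_URL as multipart form data, together with the Authorization header, using whichever HTTP client you prefer. That step is left out here because it depends on the client library.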

Now when you send an image of an airplane for classification, the model returns this response.

{
    "probabilities": [
        {
            "label": "other",
            "probability": 0.99999857
        },
        {
            "label": "Beaches",
            "probability": 8.929781e-7
        },
        {
            "label": "Mountains",
            "probability": 5.91046e-7
        }
    ],
    "object": "predictresponse"
}
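A client would typically inspect this response to decide whether the image belongs to one of the real classes. The following is a minimal sketch of that check, assuming the negative class is labeled "other" as in this example. The function names top_prediction and is_negative are hypothetical, and the code does not assume that the probabilities arrive sorted.

```python
def top_prediction(response):
    """Return (label, probability) for the highest-probability class."""
    best = max(response["probabilities"], key=lambda p: p["probability"])
    return best["label"], best["probability"]


def is_negative(response, negative_label="other"):
    """Return True when the negative class is the most likely label."""
    label, _ = top_prediction(response)
    return label == negative_label


# The airplane response shown above, as a Python dictionary.
airplane = {
    "probabilities": [
        {"label": "other", "probability": 0.99999857},
        {"label": "Beaches", "probability": 8.929781e-7},
        {"label": "Mountains", "probability": 5.91046e-7},
    ],
    "object": "predictresponse",
}

print(top_prediction(airplane)[0])  # other
print(is_negative(airplane))        # True
```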

You can use global datasets in both train and retrain API calls.

Find Out What Global Datasets Are Available

To list the global datasets, set the global query parameter to true on the call that gets all datasets. This cURL command returns all the global datasets.

curl -X GET -H "Authorization: Bearer <TOKEN>" -H "Cache-Control: no-cache" "https://api.einstein.ai/v2/vision/datasets?global=true"

The call returns a response similar to the following. Note the dataset type, and be sure that the type of the global dataset matches the type of the dataset that you train to create a model.

{
  "object": "list",
  "data": [
    {
      "id": 1005161,
      "name": "other",
      "createdAt": "2017-06-27T23:21:16.000+0000",
      "updatedAt": "2017-06-27T23:21:19.000+0000",
      "labelSummary": {
        "labels": [
          {
            "id": 24197,
            "datasetId": 1005161,
            "name": "other",
            "numExamples": 455
          }
        ]
      },
      "totalExamples": 455,
      "totalLabels": 1,
      "available": true,
      "statusMsg": "SUCCEEDED",
      "type": "image",
      "object": "dataset"
    }
  ]
}
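Because the global dataset type has to match the type of your own dataset, it can help to filter the list before choosing an ID. The sketch below assumes the response has the shape shown above. The function name global_dataset_ids is hypothetical, and skipping entries whose available flag is false is an extra precaution added here, not a documented requirement.

```python
def global_dataset_ids(listing, dataset_type):
    """Return IDs of available global datasets of the given type."""
    return [
        dataset["id"]
        for dataset in listing.get("data", [])
        if dataset.get("type") == dataset_type and dataset.get("available")
    ]


# Trimmed copy of the response above, keeping only the fields used here.
listing = {
    "object": "list",
    "data": [
        {
            "id": 1005161,
            "name": "other",
            "available": True,
            "type": "image",
            "object": "dataset",
        }
    ],
}

print(global_dataset_ids(listing, "image"))  # [1005161]
print(global_dataset_ids(listing, "text"))   # []
```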

Global datasets differ in a few ways from datasets that you create using your own data.

  • You can return global datasets only by using the API call that gets all datasets.
  • Global datasets can only be used together with one of your own datasets to create a model.
  • You can’t get a single global dataset, get its examples, train it, or delete it.

