Train a Dataset

Trains a dataset and creates a model.

Request Parameters

Name

Type

Description

Available Version

algorithm

string

Specifies the algorithm used to train the dataset. Optional. Use this parameter only when training a dataset with a type of text-intent. Valid values are:

  • intent—Uses the original training algorithm. The training process defaults to this algorithm if the parameter isn’t supplied. We recommend that you don't use this algorithm and instead use one of the following algorithms.
  • multilingual-intent—Uses the training algorithm that supports multiple languages. A model created using this algorithm returns one of the model labels, even if the text sent for prediction doesn’t fall into any of the labels.
  • multilingual-intent-ood—Uses the training algorithm that supports multiple languages and handles out-of-domain text. A model created using this algorithm returns an empty probabilities array when text sent for prediction doesn’t fall into one of the labels.

2.0

datasetId

long

ID of the dataset to train.

2.0

epochs

int

Number of training iterations for the neural network. Optional. If not specified, the default is calculated based on the dataset size. The larger the number, the longer the training takes to complete.

The training process stops before the specified number of epochs if the model has reached the optimal accuracy. When you get the training status, the earlyStopping field specifies whether the training stopped early, and the lastEpochDone value specifies the last training iteration.

2.0

learningRate

float

N/A for intent or sentiment models.

2.0

name

string

Name of the model. Maximum length is 180 characters.

2.0

trainParams

object

JSON that contains parameters that specify how the model is created. Optional. Valid values:

  • {"trainSplitRatio": 0.n}—Lets you specify the ratio of data used to train the dataset and the data used to test the model. The default split ratio is 0.8; 80% of the data is used to train the dataset and create the model and 20% of the data is used to test the model. If you pass in a split ratio of 0.6, then 60% of the data is used to train the dataset and create the model and 40% of the data is used to test the model.

  • {"withFeedback": true}—Lets you specify that feedback examples are included in the data to be trained to create the model. If you omit this parameter, feedback examples aren't used in training.

  • {"withGlobalDatasetId": <DATASET_ID>}—Lets you specify that a global dataset is used in addition to the specified dataset to create the model.

2.0
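The trainParams options described above can be assembled in code before they're sent. The following is a minimal Python sketch, not an official client: the function name build_train_params and its argument names are hypothetical, and the check that the split ratio lies strictly between 0 and 1 is an assumption based on the 0.n notation above.

```python
import json


def build_train_params(train_split_ratio=None, with_feedback=None,
                       with_global_dataset_id=None):
    """Return the trainParams value as a JSON string, omitting unset options."""
    params = {}
    if train_split_ratio is not None:
        # Assumption: the ratio must lie strictly between 0 and 1 so that
        # both the training portion and the test portion are non-empty.
        if not 0 < train_split_ratio < 1:
            raise ValueError("trainSplitRatio must be between 0 and 1")
        params["trainSplitRatio"] = train_split_ratio
    if with_feedback is not None:
        params["withFeedback"] = bool(with_feedback)
    if with_global_dataset_id is not None:
        params["withGlobalDatasetId"] = int(with_global_dataset_id)
    return json.dumps(params)


print(build_train_params(train_split_ratio=0.7, with_feedback=True))
# {"trainSplitRatio": 0.7, "withFeedback": true}
```

Options that aren't passed are left out of the JSON, so the service defaults (such as the 0.8 split ratio) still apply.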

Keep the following points in mind when training a dataset:

  • If you’re unsure which values to set for the epochs and learningRate parameters, we recommend that you omit them and use the defaults.
  • A dataset can have only one training in progress at a time. If the dataset already has a model with a status of RUNNING or QUEUED and you attempt to train the same dataset again, you receive an error.
  • You receive an error when you train a dataset that has more than 3 million words across all examples. When you create a dataset or add examples to one, make sure that it contains fewer than 3 million words. For best results, we recommend that each example is around 100 words.
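One way to catch the word limit before calling the API is to count words locally. The following is a rough Python sketch: the helper names are hypothetical, and splitting on whitespace is an assumption, because this page doesn't specify how the service counts words.

```python
MAX_WORDS = 3_000_000  # limit stated above: training fails above 3 million words


def count_words(examples):
    """Count whitespace-separated words across all example texts (assumed tokenization)."""
    return sum(len(text.split()) for text in examples)


def check_dataset_size(examples):
    """Return (total_words, within_limit) for a list of example texts."""
    total = count_words(examples)
    # Conservative check: stay strictly under the limit, as recommended above.
    return total, total < MAX_WORDS


examples = [
    "what is the weather today",
    "will it rain tomorrow in Seattle",
]
print(check_dataset_size(examples))  # (11, True)
```

Because the service might tokenize differently, treat the local count as an estimate and leave some margin below the limit.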

Response Body

Name

Type

Description

Available Version

algorithm

string

Algorithm used to create the model. Returned only when the modelType is text-intent.

2.0

createdAt

date

Date and time that the model was created.

2.0

datasetId

long

ID of the dataset trained to create the model.

2.0

datasetVersionId

int

N/A

2.0

epochs

int

Number of epochs used during training.

2.0

language

string

Model language inherited from the dataset language.

2.0

learningRate

float

N/A for intent or sentiment models.

2.0

modelId

string

ID of the model. Contains letters and numbers.

2.0

modelType

string

Type of data from which the model was created. Inferred from the dataset type. Valid values are:

  • text-intent
  • text-sentiment

2.0

name

string

Name of the model.

2.0

object

string

Object returned; in this case, training.

2.0

progress

float

How far the training job has progressed. Values range from 0 to 1.

2.0

queuePosition

int

Where the training job is in the queue. This field appears in the response only if the status is QUEUED.

2.0

status

string

Status of the training job. Valid values are:

  • QUEUED—The training job is in the queue.
  • RUNNING—The training job is running.
  • SUCCEEDED—The training job succeeded, and the model was created.
  • FAILED—The training job failed.

2.0

trainParams

object

Training parameters passed into the request. For example, if you sent a split ratio of 0.7, the response contains "trainParams": {"trainSplitRatio": 0.7}.

2.0

trainStats

object

Returns null in the response to a training request. Training statistics are returned once the status is SUCCEEDED or FAILED.

2.0

updatedAt

date

Date and time that the model was last updated.

2.0
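The status, progress, and queuePosition fields above can be read from a parsed response along these lines. This is an illustrative Python sketch only: the helper names are hypothetical, and the sample dictionary is made up for the example (including its modelId value) rather than taken from a real response.

```python
TERMINAL_STATUSES = {"SUCCEEDED", "FAILED"}


def is_finished(training):
    """True once the training job has reached SUCCEEDED or FAILED."""
    return training["status"] in TERMINAL_STATUSES


def describe_training(training):
    """Summarize a parsed training response in one line."""
    status = training["status"]
    if status == "QUEUED":
        # queuePosition appears only while the job is queued.
        return f"queued at position {training.get('queuePosition')}"
    if status == "RUNNING":
        # progress is a fraction from 0 to 1.
        return f"running, {training['progress'] * 100:.0f}% complete"
    if status == "SUCCEEDED":
        return f"succeeded, model {training['modelId']} is ready"
    if status == "FAILED":
        return "training failed"
    raise ValueError(f"unexpected status: {status!r}")


# Hypothetical sample, shaped like the response fields documented above.
sample = {
    "datasetId": 1001411,
    "modelId": "HYPOTHETICALMODELID",
    "name": "Weather Intent Model",
    "object": "training",
    "status": "RUNNING",
    "progress": 0.25,
    "trainParams": {"trainSplitRatio": 0.7},
    "trainStats": None,
}
print(describe_training(sample))  # running, 25% complete
print(is_finished(sample))        # False
```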

This cURL command sends in the trainParams request parameter. The command uses double quotes, with escaped double quotes around trainSplitRatio, so that it runs on Windows. You might need to reformat it to run on another OS.

curl -X POST -H "Authorization: Bearer <TOKEN>" -H "Cache-Control: no-cache" -H "Content-Type: multipart/form-data" -F "name=Weather Intent Model" -F "datasetId=1001411" -F "trainParams={\"trainSplitRatio\":0.7}" https://api.einstein.ai/v2/language/train
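For comparison, the same request can be built without shell quoting. The following Python sketch uses only the standard library and encodes the multipart/form-data body by hand. The function name build_train_request is hypothetical, and the sketch only constructs the request. Nothing is sent unless you pass the result to urllib.request.urlopen with a valid token.

```python
import json
import urllib.request
import uuid

TRAIN_URL = "https://api.einstein.ai/v2/language/train"


def build_train_request(token, name, dataset_id, train_params=None):
    """Build (but do not send) a multipart/form-data POST for the train endpoint."""
    fields = {"name": name, "datasetId": str(dataset_id)}
    if train_params is not None:
        # trainParams travels as a JSON string inside one form field.
        fields["trainParams"] = json.dumps(train_params)

    boundary = uuid.uuid4().hex
    lines = []
    for key, value in fields.items():
        lines.append(f"--{boundary}")
        lines.append(f'Content-Disposition: form-data; name="{key}"')
        lines.append("")
        lines.append(value)
    lines.append(f"--{boundary}--")
    lines.append("")
    body = "\r\n".join(lines).encode("utf-8")

    headers = {
        "Authorization": f"Bearer {token}",
        "Cache-Control": "no-cache",
        "Content-Type": f"multipart/form-data; boundary={boundary}",
    }
    return urllib.request.Request(TRAIN_URL, data=body, headers=headers, method="POST")


request = build_train_request(
    "<TOKEN>", "Weather Intent Model", 1001411, {"trainSplitRatio": 0.7}
)
# To send it: urllib.request.urlopen(request)
```

Serializing trainParams with json.dumps avoids the manual escaping of double quotes that the cURL command needs.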

You can pass in multiple training parameters. For example, to specify both withFeedback and trainSplitRatio, use this JSON: {"withFeedback" : true, "trainSplitRatio" : 0.7}.

If you want to train a dataset and update an existing model, see Retrain a Dataset.
