Data Manifest

This example details how to load remote datasets via a manifest file. The goal is to avoid handling and storing large datasets locally.

This example assumes you are a Data Scientist using kaos with a running endpoint.


The following steps are required before being able to train the MNIST model.


The kaos ML platform is fully functional when initialized with a running endpoint from a System Administrator. See Workflows for more information regarding different kaos personas.

$ kaos init -e <running_endpoint>

Create a workspace

A workspace is required within kaos for organizing multiple environments and code. Refer to Workspaces for additional information.

$ kaos workspace create -n mnist
Successfully set mnist workspace

Load the MNIST template

kaos ships with several templates (including MNIST) that simplify training and serving your own models.

$ kaos template get --name mnist
Successfully loaded mnist template

Train with Remote Data

The training pipeline requires at least a valid source bundle and a data bundle. The following command uses remote data in the form of a data manifest, as opposed to local data. Refer to Data Bundle for additional information.

The data manifest, /templates/mnist/data_manifest_mid/, contains links to thousands of small files for training. A tiny debug version, containing links to 6 files, is available in /templates/mnist/data_manifest_micro/.

$ kaos train deploy -s templates/mnist/model-train \
    -m templates/mnist/data_manifest_mid/
Submitting source bundle: templates/mnist/model-train
Compressing source bundle: 100%|███████████████████████████|
✔ Setting source bundle: /mnist:e23a2
Submitting data manifest: templates/mnist/data
Compressing data manifest: 100%|███████████████████████████|
✔ Setting data manifest: /features:c6062
| Image | Data | Hyperparams |
| ⨂ | ✔ | ✗ |
| <building> | /features:c6062 | |
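
Before committing to the full manifest, it can be useful to check how many links a manifest holds and to start with the debug version. The sketch below is an illustration only: `count_manifest_links` and `manifest_path` are hypothetical helper names, not part of kaos, and the link count assumes the manifest directory contains text files with plain `http(s)` URLs, which may not match the actual manifest format. The commented-out deploy command uses only the `-s` and `-m` flags shown in this example.

```shell
#!/bin/sh
# Hypothetical helpers; neither function is part of the kaos CLI.

# Count URL-looking strings in every file under a manifest directory.
# Assumption: links appear as plain http(s) URLs inside text files.
count_manifest_links() {
    find "$1" -type f -exec grep -ohE 'https?://[^[:space:],"]+' {} + | wc -l | tr -d ' '
}

# Map a size label to the template manifest paths named in this example.
manifest_path() {
    case "$1" in
        micro|mid) printf 'templates/mnist/data_manifest_%s/\n' "$1" ;;
        *) echo "unknown manifest size: $1" >&2; return 1 ;;
    esac
}

# Debug run against the micro manifest (needs kaos and a running endpoint):
# kaos train deploy -s templates/mnist/model-train -m "$(manifest_path micro)"
```

With the helpers loaded, `count_manifest_links "$(manifest_path micro)"` reports how many links the directory holds, under the format assumption stated above. Running the micro manifest first keeps a debugging cycle short before switching to `manifest_path mid`.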