Motivation

The development of kaos was fueled by our ambition to mimic natural incremental model development, simplify model reproducibility and collaboration, and automate ML infrastructure deployment in a flexible, language-agnostic environment.

Incremental Development is Natural

The typical Data Science workflow is an iterative end-to-end pipeline. It is extremely unlikely that the first set of inputs results in a final model. Inputs change constantly due to additional training data, an improved model architecture, new tuning parameters, etc. The natural problem-solving flow is temporal, since we adapt to outcomes: try X, adapt X, try Y, adapt Y, and so on.

Typical tooling ignores the reality that development is naturally incremental and temporal.

Reproducibility is Tricky

Data Scientists rely on sampling from large datasets, building interactive visualizations, performing exhaustive statistical analyses, engineering features, developing models, and evaluating model metrics prior to delivery into production. The process runs over many iterations, which inherently requires tracking which inputs and which processing steps produced which output (i.e. data provenance).

The effort of tracking provenance grows rapidly when multiple users share multiple models and their respective inputs (e.g. code, environment, data, and parameters).

ML Infrastructure is Tough

Data Scientists use multiple technologies and tools for data processing, algorithm development, and model development. This requires knowing which underlying resources each task needs.

Deploying stable, elastic infrastructure requires detailed knowledge of authentication, processing, storage, and networking (i.e. DevOps).

Flexibility Flexibility Flexibility

Data Scientists require diverse libraries for processing features and/or training models, and these libraries are typically tied to a preferred programming language. Tooling flexibility is an absolute necessity to ensure Data Scientists are not hindered at any point in their workflow.

A flexible tool must handle different frameworks, packages and languages.