The development of kaos was fueled by our ambition to mimic natural incremental model development, simplify model reproducibility and collaboration, and automate ML infrastructure deployment in a flexible language-agnostic environment.
The typical Data Science workflow is an iterative end-to-end pipeline. The first set of inputs rarely yields the final model. Inputs change constantly because of additional training data, an improved model architecture, new tuning parameters, and so on. The natural problem-solving flow is temporal because we adapt to outcomes: try X, adapt X, try Y, adapt Y, and so on.
Data Scientists rely on sampling from large datasets, building interactive visualizations, performing exhaustive statistical analyses, engineering features, developing models, and evaluating model metrics before a model is delivered to production. This process runs over many iterations, which requires tracking which inputs and which processing steps produced each output (i.e. data provenance).
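As an illustration of what data provenance means in practice, the sketch below derives an identifier for one iteration from the exact data, code, and parameters that went into it. It is only a minimal example under stated assumptions: the names `fingerprint` and `provenance_record` are hypothetical, the inputs are toy byte strings, and nothing here describes how kaos itself stores or computes provenance.

```python
import hashlib
import json


def fingerprint(payload: bytes) -> str:
    """Return a short content hash that identifies an input by its bytes."""
    return hashlib.sha256(payload).hexdigest()[:12]


def provenance_record(data: bytes, code: bytes, params: dict) -> dict:
    """Describe one iteration by the exact inputs that produced its output.

    Hypothetical helper for illustration only; not part of any kaos API.
    """
    inputs = {
        "data": fingerprint(data),
        "code": fingerprint(code),
        # Sort keys so that equal parameter sets always hash identically.
        "params": fingerprint(json.dumps(params, sort_keys=True).encode()),
    }
    # The run id depends only on the inputs, so identical inputs
    # always map to the same id.
    run_id = fingerprint(json.dumps(inputs, sort_keys=True).encode())
    return {"run_id": run_id, "inputs": inputs}


# Two iterations that share data and code but differ in one tuning parameter.
first = provenance_record(b"x,y\n1,2\n", b"model = fit(x, y)", {"lr": 0.1})
second = provenance_record(b"x,y\n1,2\n", b"model = fit(x, y)", {"lr": 0.01})

print(json.dumps(first, indent=2))
print(json.dumps(second, indent=2))
```

Comparing the two records shows which input changed between iterations: the `data` and `code` fingerprints match, while the `params` fingerprint and the resulting `run_id` do not.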
Data Scientists use multiple technologies and tools for data processing and for algorithm and model development. This requires knowing which underlying resources each task needs.
Data Scientists require diverse libraries for processing features and/or training models, and these libraries are typically tied to a preferred programming language. Tooling flexibility is therefore essential so that Data Scientists are not hindered anywhere in their workflow.