Future Tools

A list of ML & Data tools which may be part of future versions of deployKF.


Tool Roadmap

The following is a roadmap of planned tools, grouped by priority.

How do I request or contribute a tool?

If you would like to request or contribute support for a tool, please raise an issue on GitHub, or join the discussion on an existing issue.

Higher Priority

  • MLflow Model Registry: Model Registry
  • KServe: Model Serving

Medium Priority

  • Feast: Feature Store
  • Apache Airflow: Workflow Orchestration

Lower Priority

  • DataHub: Data Catalog
  • Airbyte: Data Integration
  • Label Studio: Data Labeling
  • BentoML Yatai: Model Serving
  • Seldon Core: Model Serving

Tool Details

The following sections provide details and descriptions for each tool.

MLflow Model Registry

MLflow Model Registry is an open source machine learning model registry.

  • Purpose: Model Registry
  • Maintainer: Databricks
  • Source Code: mlflow/mlflow
  • Roadmap Priority: Higher

A model registry decouples model training from model deployment, allowing you to break the model lifecycle down into three separate concerns. This separation enables you to have well-scoped pipelines, rather than trying to go from training to deployment all at once.

  1. Model Training: Training new versions of models and logging them into the registry.
  2. Model Evaluation: Evaluating versions of models and logging the results into the registry.
  3. Model Deployment: Making informed decisions about which models to deploy and then deploying them.

The key features of MLflow Model Registry are:

  • Model Versioning: Version your model artifacts and attach metadata to each version.
  • Model Stage Transitions: Transition models between stages (e.g. staging to production).
  • Web UI: A graphical web interface for managing models.
  • Python API: A Python API for managing models.
  • REST API: A REST API for managing models.
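The registry pattern behind these features can be sketched with a toy in-memory registry. This is illustrative only and is not the MLflow API (MLflow's real Python entry points are `mlflow.register_model` and `MlflowClient`); the model name and metric below are made up.

```python
# Toy in-memory model registry illustrating versioning, evaluation
# metadata, and stage transitions. NOT the MLflow API; it only
# sketches the three lifecycle concerns described above.
class ModelRegistry:
    def __init__(self):
        self.models = {}  # model name -> list of version records

    def register(self, name, artifact_uri):
        """Model Training: log a new version into the registry."""
        versions = self.models.setdefault(name, [])
        version = {
            "version": len(versions) + 1,
            "artifact_uri": artifact_uri,
            "stage": "None",
            "metrics": {},
        }
        versions.append(version)
        return version["version"]

    def log_metrics(self, name, version, metrics):
        """Model Evaluation: attach evaluation results to a version."""
        self.models[name][version - 1]["metrics"].update(metrics)

    def transition_stage(self, name, version, stage):
        """Model Deployment: promote a version (e.g. Staging -> Production)."""
        self.models[name][version - 1]["stage"] = stage

    def get_production_version(self, name):
        """Look up the version currently serving in Production, if any."""
        for v in self.models[name]:
            if v["stage"] == "Production":
                return v
        return None


registry = ModelRegistry()
v1 = registry.register("churn-model", "s3://models/churn/1")
registry.log_metrics("churn-model", v1, {"auc": 0.91})
registry.transition_stage("churn-model", v1, "Production")
```

Because each concern only talks to the registry, the training, evaluation, and deployment pipelines can run independently of each other.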

KServe

KServe provides comprehensive interfaces for deploying, managing, and monitoring ML models on Kubernetes.

  • Purpose: Model Serving
  • Maintainer: Linux Foundation
  • Source Code: kserve/kserve
  • Roadmap Priority: Higher

The core features of KServe are:

  • Support for Many Frameworks: KServe natively supports many ML frameworks (e.g. PyTorch, TensorFlow, scikit-learn, XGBoost).
  • Autoscaling, Even to Zero: KServe can autoscale model replicas to meet demand, even scaling to zero when there are no requests.
  • Model Monitoring: KServe integrates tools like Alibi Detect to provide model monitoring for drift and outlier detection.
  • Model Explainability: KServe integrates tools like Alibi Explain to provide model explainability.
  • Request Batching: KServe can batch requests to your model, improving throughput and reducing cost.
  • Canary Deployments: KServe can deploy new versions of your model alongside old versions, and route requests to the new version based on a percentage.
  • Feature Transformers: KServe can do feature pre/post processing alongside model inference (e.g. using Feast).
  • Inference Graphs: KServe can chain multiple models together to form an inference graph.
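As an illustration of the deployment interface, a minimal InferenceService manifest with a canary split and scale-to-zero might look like the following sketch. The name and `storageUri` are placeholders, not real resources.

```yaml
# Hypothetical example: the name and storageUri are placeholders.
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: sklearn-iris
spec:
  predictor:
    # Route 10% of traffic to this new revision (canary deployment).
    canaryTrafficPercent: 10
    # Allow scaling to zero when there are no requests.
    minReplicas: 0
    model:
      modelFormat:
        name: sklearn
      storageUri: gs://example-bucket/models/iris
```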

Feast

Feast is an open-source feature store for machine learning.

  • Purpose: Feature Store
  • Maintainer: Tecton
  • Source Code: feast-dev/feast
  • Roadmap Priority: Medium

A good way to understand the purpose of a feature store is to think about the data access patterns encountered during the model lifecycle. A feature store should somehow make these data access patterns easier.

  • Feature Engineering: Accesses and transforms historical data to create features.
  • Target Engineering: Accesses and transforms historical data to create targets.
  • Model Training: Accesses features and targets to train and evaluate the model.
  • Model Inference: Accesses features of new data to predict the target.

The key features of Feast are:

  • Feature Registry: Where Feast persists feature definitions (not data) that are registered with it (e.g. Local-Files, S3, GCS).
  • Python SDK: The primary interface for managing feature definitions, and retrieving feature values from Feast.
  • Offline Data Stores: A store which Feast can read feature values from, for historical data retrieval (e.g. Snowflake, BigQuery, Redshift).
  • Online Data Stores: A store which Feast can materialize (write) feature values into, for online model inference (e.g. Snowflake, Redis, DynamoDB, Bigtable).
  • Batch Materialization Engine: A data processing engine which Feast can use to materialize feature values from an Offline Store into an Online Store (e.g. Snowflake, Spark, Bytewax).

A good feature store is NOT a database, but rather a data access layer between your data sources and your ML models. Be very wary of any feature store that requires you to load your data into it directly.
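The offline/online split described above can be sketched with a toy example. This is illustrative only and is not the Feast API; the entity, feature names, and data are made up. Historical rows live in an "offline store", and materialization copies each entity's latest feature values into an "online store" for low-latency lookup at inference time.

```python
from datetime import datetime

# Toy sketch of the offline -> online materialization pattern.
# NOT the Feast API; entity and feature names are hypothetical.

# "Offline store": the full history of feature values per entity,
# as used for feature engineering and model training.
offline_store = [
    {"driver_id": 1, "ts": datetime(2024, 1, 1), "trips_today": 3},
    {"driver_id": 1, "ts": datetime(2024, 1, 2), "trips_today": 7},
    {"driver_id": 2, "ts": datetime(2024, 1, 2), "trips_today": 5},
]

def materialize(rows):
    """Copy the latest feature values per entity into the online store."""
    online = {}
    for row in sorted(rows, key=lambda r: r["ts"]):
        online[row["driver_id"]] = {"trips_today": row["trips_today"]}
    return online

online_store = materialize(offline_store)

def get_online_features(driver_id):
    """Model Inference: low-latency lookup of current feature values."""
    return online_store[driver_id]
```

Note that the feature store here only indexes and moves data between existing stores; it is a data access layer, not a new database.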

Apache Airflow

Apache Airflow is the most widely adopted open-source workflow orchestration tool.

  • Purpose: Workflow Orchestration
  • Maintainer: Apache Software Foundation
  • Source Code: apache/airflow
  • Roadmap Priority: Medium

The versatility and extensibility of Apache Airflow make it a great fit for many different use cases, including machine learning.

The key features of Apache Airflow are:

  • Workflows as Python Code: Pipelines are defined as DAGs in plain Python, so they can be versioned, tested, and parameterized.
  • Scheduling & Backfills: The scheduler runs workflows on defined intervals and can backfill historical runs.
  • Provider Ecosystem: A large collection of provider packages for integrating with external systems.
  • Airflow Web UI: A graphical web interface for monitoring and managing workflow runs.
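The core idea of a workflow orchestrator like Airflow, running tasks in dependency order, can be sketched with Python's stdlib `graphlib`. This is a concept sketch only, not an Airflow DAG (a real DAG is defined with the `airflow` package); the task names are made up.

```python
from graphlib import TopologicalSorter

# Toy sketch of workflow orchestration: a DAG of tasks, executed in
# dependency order. NOT an Airflow DAG; task names are hypothetical.
dag = {
    "extract": set(),            # no upstream dependencies
    "transform": {"extract"},    # runs after extract
    "train": {"transform"},
    "evaluate": {"train"},
}

def run(dag):
    """Execute tasks in a valid topological order."""
    executed = []
    for task in TopologicalSorter(dag).static_order():
        executed.append(task)  # a real orchestrator would run the task here
    return executed

order = run(dag)
```

An orchestrator adds scheduling, retries, and state tracking on top of this basic ordering.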

DataHub

DataHub is an open-source metadata platform for discovering, managing, and understanding data.

  • Purpose: Data Catalog
  • Maintainer: Acryl Data
  • Source Code: datahub-project/datahub
  • Roadmap Priority: Lower

The core features of DataHub are:

  • Metadata Ingestion: Connectors for ingesting metadata from many popular data systems.
  • Search & Discovery: Search across datasets, dashboards, pipelines, and ML assets.
  • Data Lineage: End-to-end lineage for tracing how data flows between systems.

Airbyte

Airbyte is a data integration platform which aims to make it easy to move data from any source to any destination.

  • Purpose: Data Integration
  • Maintainer: Airbyte
  • Source Code: airbytehq/airbyte
  • Roadmap Priority: Lower

The core features of Airbyte are:

  • Comprehensive Connector Catalog: Airbyte has an extremely large catalog of connectors for data sources and destinations.
  • Airbyte Web UI: Airbyte provides a graphical web interface for managing data connectors and orchestrating data syncs.

Label Studio

Label Studio is an open-source data labeling platform which supports a variety of data types and labeling tasks.

  • Purpose: Data Labeling
  • Maintainer: Heartex
  • Source Code: heartexlabs/label-studio
  • Roadmap Priority: Lower

The core features of Label Studio are:

  • Data Types: Label Studio supports a variety of data types, including text, images, audio, video, and time series.
  • Task Templates: Label Studio provides many templates for common labeling tasks, including text classification, named entity recognition, and object detection.
  • Label Studio Web UI: Label Studio provides a graphical web interface for labeling data and managing labeling projects.

BentoML Yatai

BentoML Yatai is a platform for managing the lifecycle of BentoML models on Kubernetes.

  • Purpose: Model Serving
  • Maintainer: BentoML
  • Source Code: bentoml/Yatai
  • Roadmap Priority: Lower

The core features of BentoML Yatai are:

  • Model Registry: A central registry for BentoML models ("Bentos") and their versions.
  • Deployment on Kubernetes: Deploys Bentos as scalable services on Kubernetes.
  • Yatai Web UI: A graphical web interface for managing models and deployments.

Seldon Core

Seldon Core provides interfaces for converting ML models into REST/gRPC microservices on Kubernetes.

  • Purpose: Model Serving
  • Maintainer: Seldon
  • Source Code: SeldonIO/seldon-core
  • Roadmap Priority: Lower

The core features of Seldon Core are:

  • Support for Many Frameworks: Seldon Core natively supports many ML frameworks (e.g. TensorFlow, scikit-learn, XGBoost, HuggingFace, NVIDIA Triton).
  • Reusable Model Servers: Seldon Core removes the need to build a container image for each model, by providing a system to download model artifacts at runtime.
  • Model Deployment CRD: Seldon Core provides a simple, yet powerful, Kubernetes CRD for deploying models.
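As an illustration of the CRD-based interface, a minimal SeldonDeployment using a reusable model server might look like the following sketch. The name and `modelUri` are placeholders, not real resources.

```yaml
# Hypothetical example: the name and modelUri are placeholders.
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: iris-model
spec:
  predictors:
    - name: default
      replicas: 1
      graph:
        name: classifier
        # Reusable model server: the artifact is downloaded at runtime,
        # so no per-model container image is needed.
        implementation: SKLEARN_SERVER
        modelUri: gs://example-bucket/models/iris
```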

Last update: 2024-03-16
Created: 2023-04-27