Future Tools

A list of ML & Data tools which may be part of future versions of deployKF.


Tool Roadmap

The following is a roadmap of planned tools, grouped by priority.

How do I request or contribute a tool?

If you would like to request or contribute support for a tool, please raise an issue on GitHub, or join the discussion on an existing issue.

Higher Priority

  • MLflow Model Registry: Model Registry
  • KServe: Model Serving

Medium Priority

  • Feast: Feature Store
  • Apache Airflow: Workflow Orchestration

Lower Priority

  • DataHub: Data Catalog
  • Airbyte: Data Integration
  • Label Studio: Data Labeling
  • BentoML Yatai: Model Serving
  • Seldon Core: Model Serving

Tool Details

The following sections provide details and descriptions for each tool.

MLflow Model Registry

MLflow Model Registry is an open source machine learning model registry.

  • Purpose: Model Registry
  • Maintainer: Databricks
  • Source Code: mlflow/mlflow
  • Roadmap Priority: Higher

A model registry decouples model training from model deployment, allowing you to break the model lifecycle down into three separate concerns. This separation enables you to have well-scoped pipelines, rather than trying to go from training to deployment all at once.

  1. Model Training: Training new versions of models and logging them into the registry.
  2. Model Evaluation: Evaluating versions of models and logging the results into the registry.
  3. Model Deployment: Making informed decisions about which models to deploy and then deploying them.

The key features of MLflow Model Registry are:

  • Model Versioning: Version your model artifacts and attach metadata to each version.
  • Model Stage Transitions: Transition models between stages (e.g. staging to production).
  • Web UI: A graphical web interface for managing models.
  • Python API: A Python API for managing models.
  • REST API: A REST API for managing models.
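The registry pattern behind these features can be sketched with a toy in-memory registry. This is illustrative only and is not the MLflow API (MLflow's real Python entry points are `mlflow.register_model` and `MlflowClient`); the model name and metric below are made up.

```python
# Toy in-memory model registry illustrating versioning, evaluation
# metadata, and stage transitions. NOT the MLflow API; it only
# sketches the three lifecycle concerns described above.
class ModelRegistry:
    def __init__(self):
        self.models = {}  # model name -> list of version records

    def register(self, name, artifact_uri):
        """Model Training: log a new version into the registry."""
        versions = self.models.setdefault(name, [])
        version = {
            "version": len(versions) + 1,
            "artifact_uri": artifact_uri,
            "stage": "None",
            "metrics": {},
        }
        versions.append(version)
        return version["version"]

    def log_metrics(self, name, version, metrics):
        """Model Evaluation: attach evaluation results to a version."""
        self.models[name][version - 1]["metrics"].update(metrics)

    def transition_stage(self, name, version, stage):
        """Model Deployment: promote a version (e.g. Staging -> Production)."""
        self.models[name][version - 1]["stage"] = stage

    def get_production_version(self, name):
        """Look up the version currently serving in Production, if any."""
        for v in self.models[name]:
            if v["stage"] == "Production":
                return v
        return None


registry = ModelRegistry()
v1 = registry.register("churn-model", "s3://models/churn/1")
registry.log_metrics("churn-model", v1, {"auc": 0.91})
registry.transition_stage("churn-model", v1, "Production")
```

Because each concern only talks to the registry, the training, evaluation, and deployment pipelines can run independently of each other.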

KServe

KServe provides comprehensive interfaces for deploying, managing, and monitoring ML models on Kubernetes.

  • Purpose: Model Serving
  • Maintainer: Linux Foundation
  • Source Code: kserve/kserve
  • Roadmap Priority: Higher

The core features of KServe are:

  • Support for Many Frameworks: KServe natively supports many ML frameworks (e.g. PyTorch, TensorFlow, scikit-learn, XGBoost).
  • Autoscaling, Even to Zero: KServe can autoscale model replicas to meet demand, even scaling to zero when there are no requests.
  • Model Monitoring: KServe integrates tools like Alibi Detect to provide model monitoring for drift and outlier detection.
  • Model Explainability: KServe integrates tools like Alibi Explain to provide model explainability.
  • Request Batching: KServe can batch requests to your model, improving throughput and reducing cost.
  • Canary Deployments: KServe can deploy new versions of your model alongside old versions, and route requests to the new version based on a percentage.
  • Feature Transformers: KServe can do feature pre/post processing alongside model inference (e.g. using Feast).
  • Inference Graphs: KServe can chain multiple models together to form an inference graph.
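As an illustration of the deployment interface, a minimal InferenceService manifest with a canary split and scale-to-zero might look like the following sketch. The name and `storageUri` are placeholders, not real resources.

```yaml
# Hypothetical example: the name and storageUri are placeholders.
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: sklearn-iris
spec:
  predictor:
    # Route 10% of traffic to this new revision (canary deployment).
    canaryTrafficPercent: 10
    # Allow scaling to zero when there are no requests.
    minReplicas: 0
    model:
      modelFormat:
        name: sklearn
      storageUri: gs://example-bucket/models/iris
```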

Feast

Feast is an open-source feature store for machine learning.

  • Purpose: Feature Store
  • Maintainer: Tecton
  • Source Code: feast-dev/feast
  • Roadmap Priority: Medium

A good way to understand the purpose of a feature store is to think about the data access patterns encountered during the model lifecycle. A feature store should somehow make these data access patterns easier.

  • Feature Engineering: Accesses and transforms historical data to create features.
  • Target Engineering: Accesses and transforms historical data to create targets.
  • Model Training: Accesses features and targets to train and evaluate the model.
  • Model Inference: Accesses features of new data to predict the target.

The key features of Feast are:

  • Feature Registry: Where Feast persists feature definitions (not data) that are registered with it (e.g. Local-Files, S3, GCS).
  • Python SDK: The primary interface for managing feature definitions, and retrieving feature values from Feast.
  • Offline Data Stores: A store which Feast can read feature values from, for historical data retrieval (e.g. Snowflake, BigQuery, Redshift).
  • Online Data Stores: A store which Feast can materialize (write) feature values into, for online model inference (e.g. Snowflake, Redis, DynamoDB, Bigtable).
  • Batch Materialization Engine: A data processing engine which Feast can use to materialize feature values from an Offline Store into an Online Store (e.g. Snowflake, Spark, Bytewax).

A good feature store is NOT a database, but rather a data access layer between your data sources and your ML models. Be very wary of any feature store that requires you to load your data into it directly.
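The offline/online split described above can be sketched with a toy example. This is illustrative only and is not the Feast API; the entity, feature names, and data are made up. Historical rows live in an "offline store", and materialization copies each entity's latest feature values into an "online store" for low-latency lookup at inference time.

```python
from datetime import datetime

# Toy sketch of the offline -> online materialization pattern.
# NOT the Feast API; entity and feature names are hypothetical.

# "Offline store": the full history of feature values per entity,
# as used for feature engineering and model training.
offline_store = [
    {"driver_id": 1, "ts": datetime(2024, 1, 1), "trips_today": 3},
    {"driver_id": 1, "ts": datetime(2024, 1, 2), "trips_today": 7},
    {"driver_id": 2, "ts": datetime(2024, 1, 2), "trips_today": 5},
]

def materialize(rows):
    """Copy the latest feature values per entity into the online store."""
    online = {}
    for row in sorted(rows, key=lambda r: r["ts"]):
        online[row["driver_id"]] = {"trips_today": row["trips_today"]}
    return online

online_store = materialize(offline_store)

def get_online_features(driver_id):
    """Model Inference: low-latency lookup of current feature values."""
    return online_store[driver_id]
```

Note that the feature store here only indexes and moves data between existing stores; it is a data access layer, not a new database.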

Apache Airflow

Apache Airflow is the most widely adopted open-source workflow orchestration tool.

  • Purpose: Workflow Orchestration
  • Maintainer: Apache Software Foundation
  • Source Code: apache/airflow
  • Roadmap Priority: Medium

The versatility and extensibility of Apache Airflow make it a great fit for many different use cases, including machine learning.

The key features of Apache Airflow are:

  • Workflows as Python Code: Pipelines are defined as DAGs in plain Python, so they can be versioned, tested, and parameterized.
  • Scheduling & Backfills: The scheduler runs workflows on defined intervals and can backfill historical runs.
  • Provider Ecosystem: A large collection of provider packages for integrating with external systems.
  • Airflow Web UI: A graphical web interface for monitoring and managing workflow runs.
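The core idea of a workflow orchestrator like Airflow, running tasks in dependency order, can be sketched with Python's stdlib `graphlib`. This is a concept sketch only, not an Airflow DAG (a real DAG is defined with the `airflow` package); the task names are made up.

```python
from graphlib import TopologicalSorter

# Toy sketch of workflow orchestration: a DAG of tasks, executed in
# dependency order. NOT an Airflow DAG; task names are hypothetical.
dag = {
    "extract": set(),            # no upstream dependencies
    "transform": {"extract"},    # runs after extract
    "train": {"transform"},
    "evaluate": {"train"},
}

def run(dag):
    """Execute tasks in a valid topological order."""
    executed = []
    for task in TopologicalSorter(dag).static_order():
        executed.append(task)  # a real orchestrator would run the task here
    return executed

order = run(dag)
```

An orchestrator adds scheduling, retries, and state tracking on top of this basic ordering.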

DataHub

DataHub is an open-source metadata platform for discovering, managing, and understanding data.

  • Purpose: Data Catalog
  • Maintainer: Acryl Data
  • Source Code: datahub-project/datahub
  • Roadmap Priority: Lower

The core features of DataHub are:

  • Metadata Ingestion: Connectors for ingesting metadata from many popular data systems.
  • Search & Discovery: Search across datasets, dashboards, pipelines, and ML assets.
  • Data Lineage: End-to-end lineage for tracing how data flows between systems.

Airbyte

Airbyte is a data integration platform which aims to make it easy to move data from any source to any destination.

  • Purpose: Data Integration
  • Maintainer: Airbyte
  • Source Code: airbytehq/airbyte
  • Roadmap Priority: Lower

The core features of Airbyte are:

  • Comprehensive Connector Catalog: Airbyte has an extremely large catalog of connectors for data sources and destinations.
  • Airbyte Web UI: Airbyte provides a graphical web interface for managing data connectors and orchestrating data syncs.

Label Studio

Label Studio is an open-source data labeling platform which supports a variety of data types and labeling tasks.

  • Purpose: Data Labeling
  • Maintainer: Heartex
  • Source Code: heartexlabs/label-studio
  • Roadmap Priority: Lower

The core features of Label Studio are:

  • Data Types: Label Studio supports a variety of data types, including text, images, audio, video, and time series.
  • Task Templates: Label Studio provides many templates for common labeling tasks, including text classification, named entity recognition, and object detection.
  • Label Studio Web UI: Label Studio provides a graphical web interface for labeling data and managing labeling projects.

BentoML Yatai

BentoML Yatai is a platform for managing the lifecycle of BentoML models on Kubernetes.

  • Purpose: Model Serving
  • Maintainer: BentoML
  • Source Code: bentoml/Yatai
  • Roadmap Priority: Lower

The core features of BentoML Yatai are:

  • Model Registry: A central registry for BentoML models ("Bentos") and their versions.
  • Deployment on Kubernetes: Deploys Bentos as scalable services on Kubernetes.
  • Yatai Web UI: A graphical web interface for managing models and deployments.

Seldon Core

Seldon Core provides interfaces for converting ML models into REST/gRPC microservices on Kubernetes.

  • Purpose: Model Serving
  • Maintainer: Seldon
  • Source Code: SeldonIO/seldon-core
  • Roadmap Priority: Lower

The core features of Seldon Core are:

  • Support for Many Frameworks: Seldon Core natively supports many ML frameworks (e.g. TensorFlow, scikit-learn, XGBoost, HuggingFace, NVIDIA Triton).
  • Reusable Model Servers: Seldon Core removes the need to build a container image for each model, by providing a system to download model artifacts at runtime.
  • Model Deployment CRD: Seldon Core provides a simple, yet powerful, Kubernetes CRD for deploying models.
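As an illustration of the CRD-based interface, a minimal SeldonDeployment using a reusable model server might look like the following sketch. The name and `modelUri` are placeholders, not real resources.

```yaml
# Hypothetical example: the name and modelUri are placeholders.
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: iris-model
spec:
  predictors:
    - name: default
      replicas: 1
      graph:
        name: classifier
        # Reusable model server: the artifact is downloaded at runtime,
        # so no per-model container image is needed.
        implementation: SKLEARN_SERVER
        modelUri: gs://example-bucket/models/iris
```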

Last update: 2024-03-16
Created: 2023-04-27