# Future Tools
A list of ML & Data tools which may be part of future versions of deployKF.
**How do I request or contribute a tool?**
If you would like to request or contribute support for a tool, please raise an issue on GitHub, or join the discussion on an existing issue.
## Tool Roadmap
The following is a roadmap of ML & Data tools which are planned for future versions of deployKF, grouped by priority.
### Higher Priority
| Name | Purpose |
|---|---|
| MLflow Model Registry | Model Registry |
| KServe | Model Serving |
### Medium Priority
| Name | Purpose |
|---|---|
| Feast | Feature Store |
| Apache Airflow | Workflow Orchestration |
### Lower Priority
| Name | Purpose |
|---|---|
| DataHub | Data Catalog |
| Airbyte | Data Integration |
| Label Studio | Data Labeling |
| BentoML Yatai | Model Serving |
| Seldon Core | Model Serving |
## Tool Details
The following sections provide details and descriptions of each tool which is planned for future versions of deployKF.
### MLflow Model Registry
| Purpose | Model Registry |
|---|---|
| Maintainer | Databricks |
| Documentation | Documentation |
| Source Code | mlflow/mlflow |
| Roadmap Priority | Higher |
A model registry decouples model training from model deployment, allowing you to break the model lifecycle down into three separate concerns. This separation enables you to have well-scoped pipelines, rather than trying to go from training to deployment all at once.
- Model Training: Training new versions of models and logging them into the registry.
- Model Evaluation: Evaluating versions of models and logging the results into the registry.
- Model Deployment: Making informed decisions about which models to deploy and then deploying them.
The key features of MLflow Model Registry are:
- Model Versioning: Version your model artifacts and attach metadata to each version.
- Model Stage Transitions: Transition models between stages (e.g. staging to production).
- Web UI: A graphical web interface for managing models.
- Python API: A Python API for managing models.
- REST API: A REST API for managing models.
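The versioning and stage-transition concepts above can be sketched as a toy in-memory registry. This is purely illustrative pure Python (it is not the MLflow client API; the model names and URIs are made up):

```python
# Toy in-memory model registry illustrating versioning and stage
# transitions. Illustrative only -- NOT the MLflow client API.
class ToyModelRegistry:
    def __init__(self):
        self._models = {}  # model name -> list of version dicts

    def register(self, name, artifact_uri, metadata=None):
        """Log a new version of a model; returns the version number."""
        versions = self._models.setdefault(name, [])
        version = len(versions) + 1
        versions.append({
            "version": version,
            "artifact_uri": artifact_uri,
            "metadata": metadata or {},
            "stage": "None",  # e.g. None -> Staging -> Production
        })
        return version

    def transition_stage(self, name, version, stage):
        """Move a specific version into a new stage (e.g. 'Production')."""
        self._models[name][version - 1]["stage"] = stage

    def get_latest(self, name, stage):
        """Fetch the newest version currently in the given stage."""
        candidates = [v for v in self._models[name] if v["stage"] == stage]
        return max(candidates, key=lambda v: v["version"], default=None)

# Training logs versions; deployment queries by stage -- the two concerns
# never need to know about each other directly.
registry = ToyModelRegistry()
v1 = registry.register("churn-model", "s3://bucket/churn/v1")
v2 = registry.register("churn-model", "s3://bucket/churn/v2")
registry.transition_stage("churn-model", v2, "Production")
prod = registry.get_latest("churn-model", "Production")
```

The point of the sketch is the separation of concerns: the training pipeline only ever calls `register`, while the deployment pipeline only ever calls `get_latest`.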
### KServe
| Purpose | Model Serving |
|---|---|
| Maintainer | Linux Foundation |
| Documentation | Documentation |
| Source Code | kserve/kserve |
| Roadmap Priority | Higher |
The core features of KServe are:
- Support for Many Frameworks: KServe natively supports many ML frameworks (e.g. PyTorch, TensorFlow, scikit-learn, XGBoost).
- Autoscaling, Even to Zero: KServe can autoscale model replicas to meet demand, even scaling to zero when there are no requests.
- Model Monitoring: KServe integrates tools like Alibi Detect to provide model monitoring for drift and outlier detection.
- Model Explainability: KServe integrates tools like Alibi Explain to provide model explainability.
- Request Batching: KServe can batch requests to your model, improving throughput and reducing cost.
- Canary Deployments: KServe can deploy a new version of your model alongside the old one, routing a configurable percentage of requests to the new version.
- Feature Transformers: KServe can do feature pre/post processing alongside model inference (e.g. using Feast).
- Inference Graphs: KServe can chain multiple models together to form an inference graph.
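Several of these features are configured directly on KServe's `InferenceService` resource. A minimal sketch is shown below; the resource name and storage URI are placeholders, and field names should be checked against the KServe docs for your version:

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: sklearn-example            # placeholder name
spec:
  predictor:
    minReplicas: 0                 # allow scale-to-zero when idle
    canaryTrafficPercent: 10       # route 10% of traffic to this new revision
    model:
      modelFormat:
        name: sklearn              # one of the natively supported frameworks
      storageUri: gs://example-bucket/models/sklearn   # placeholder URI
```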
### Feast
| Purpose | Feature Store |
|---|---|
| Maintainer | Tecton |
| Documentation | Documentation |
| Source Code | feast-dev/feast |
| Roadmap Priority | Medium |
A good way to understand the purpose of a feature store is to think about the data access patterns encountered during the model lifecycle. A feature store should make each of these access patterns easier and more consistent.
- Feature Engineering: Accesses and transforms historical data to create features.
- Target Engineering: Accesses and transforms historical data to create targets.
- Model Training: Accesses features and targets to train and evaluate the model.
- Model Inference: Accesses features of new data to predict the target.
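The trickiest of these patterns is historical retrieval for training: each training row must see the latest feature value known *at that row's event time*, never a later one. This point-in-time ("as of") lookup is the core pattern a feature store automates; a pure-Python sketch (not the Feast SDK, with made-up data):

```python
# Illustrative point-in-time feature retrieval -- the core data access
# pattern a feature store automates. Pure Python sketch, NOT the Feast SDK.
from datetime import datetime

# Historical feature values for one entity, as (timestamp, value) pairs.
feature_log = {
    "user_1": [
        (datetime(2023, 1, 1), 0.10),
        (datetime(2023, 2, 1), 0.25),
        (datetime(2023, 3, 1), 0.40),
    ],
}

def get_feature_as_of(entity_id, event_time):
    """Return the latest feature value known at `event_time`,
    so future values never leak into training rows."""
    rows = [(ts, v) for ts, v in feature_log.get(entity_id, [])
            if ts <= event_time]
    if not rows:
        return None
    return max(rows, key=lambda r: r[0])[1]

# A training row dated Feb 15 must see the Feb 1 value, not the Mar 1 one.
value = get_feature_as_of("user_1", datetime(2023, 2, 15))
```

In Feast, the offline store performs this join at scale across many entities and features, while the online store serves only the most recent values for inference.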
The key features of Feast are:
- Feature Registry: Where Feast persists feature definitions (not feature data) that are registered with it (e.g. Local-Files, S3, GCS).
- Python SDK: The primary interface for managing feature definitions, and retrieving feature values from Feast.
- Offline Data Stores: A store which Feast can read feature values from, for historical data retrieval (e.g. Snowflake, BigQuery, Redshift).
- Online Data Stores: A store which Feast can materialize (write) feature values into, for online model inference (e.g. Snowflake, Redis, DynamoDB, Bigtable).
- Batch Materialization Engine: A data processing engine which Feast can use to materialize feature values from an Offline Store into an Online Store (e.g. Snowflake, Spark, Bytewax).
A good feature store is NOT a database, but rather a data access layer between your data sources and your ML models. Be very wary of any feature store that requires you to load your data into it directly.
### Apache Airflow
| Purpose | Workflow Orchestration |
|---|---|
| Maintainer | Apache Software Foundation |
| Documentation | Documentation |
| Source Code | apache/airflow |
| Roadmap Priority | Medium |
The versatility and extensibility of Apache Airflow make it a great fit for many different use cases, including machine learning.
The key features of Apache Airflow are:
- Python Centered: Airflow is written in Python and uses a Python DSL to define workflows.
- Dynamic Workflows: Airflow's code-driven workflow definitions enable powerful patterns like dynamically generating workflows.
- Extensive Plugins: Airflow has a rich ecosystem of plugins and integrations with other tools.
- User Interface: Airflow is known for its powerful user interface which allows users to monitor and manage workflows.
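The "dynamic workflows" point can be illustrated without Airflow itself: because workflow definitions are ordinary Python, tasks can be generated in a loop over configuration. The sketch below is a schematic of that pattern (plain Python, not Airflow's DAG/operator API; the table names are hypothetical):

```python
# Schematic of Airflow-style dynamic workflow generation in plain Python --
# NOT the Airflow API. An Airflow DAG file might similarly loop over a
# config list to create one task per table.
TABLES = ["users", "orders", "payments"]  # hypothetical config

def make_extract_task(table):
    """Build a task callable for one table (the closure captures `table`)."""
    def extract():
        return f"extracted:{table}"
    return extract

# One task per table; a real DAG would then wire these into a downstream
# load/summary task via dependency operators.
workflow = {f"extract_{t}": make_extract_task(t) for t in TABLES}
results = [task() for task in workflow.values()]
summary = f"loaded {len(results)} tables"
```

Adding a new table to the config automatically adds a new task to the workflow, with no changes to the workflow logic itself.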
### DataHub
| Purpose | Data Catalog |
|---|---|
| Maintainer | Acryl Data |
| Documentation | Documentation |
| Source Code | datahub-project/datahub |
| Roadmap Priority | Lower |
The core features of DataHub are:
- Support for Many Data Sources: DataHub supports ingestion of metadata from many sources.
- Search & Discovery: DataHub provides a search interface for discovering data.
- Data Lineage: DataHub can capture and visualize complex data lineage.
### Airbyte
| Purpose | Data Integration |
|---|---|
| Maintainer | Airbyte |
| Documentation | Documentation |
| Source Code | airbytehq/airbyte |
| Roadmap Priority | Lower |
The core features of Airbyte are:
- Comprehensive Connector Catalog: Airbyte has an extremely large catalog of connectors for data sources and destinations.
- Airbyte Web UI: Airbyte provides a graphical web interface for managing data connectors and orchestrating data syncs.
### Label Studio
| Purpose | Data Labeling |
|---|---|
| Maintainer | Heartex |
| Documentation | Documentation |
| Source Code | heartexlabs/label-studio |
| Roadmap Priority | Lower |
The core features of Label Studio are:
- Data Types: Label Studio supports a variety of data types, including text, images, audio, video, and time series.
- Task Templates: Label Studio provides many templates for common labeling tasks, including text classification, named entity recognition, and object detection.
- Label Studio Web UI: Label Studio provides a graphical web interface for labeling data and managing labeling projects.
### BentoML Yatai
| Purpose | Model Serving |
|---|---|
| Maintainer | BentoML |
| Documentation | Documentation |
| Source Code | bentoml/Yatai |
| Roadmap Priority | Lower |
The core features of BentoML Yatai are:
- Model Registry: A central registry for packaged Bentos.
- Model Deployment: Managing the deployment of BentoML models to Kubernetes, including building model container images.
- Web UI: A graphical web interface for viewing, deploying, and monitoring models.
- REST APIs: A REST API for viewing, deploying, and monitoring models.
- Kubernetes CRDs: Custom resources for managing model deployments in a DevOps-friendly way.
### Seldon Core
| Purpose | Model Serving |
|---|---|
| Maintainer | Seldon |
| Documentation | Documentation |
| Source Code | SeldonIO/seldon-core |
| Roadmap Priority | Lower |
The core features of Seldon Core are:
- Support for Many Frameworks: Seldon Core natively supports many ML frameworks (e.g. TensorFlow, scikit-learn, XGBoost, HuggingFace, NVIDIA Triton).
- Reusable Model Servers: Seldon Core removes the need to build a container image for each model, by providing a system to download model artifacts at runtime.
- Model Deployment CRD: Seldon Core provides a simple yet powerful Kubernetes CRD for deploying models.
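For reference, a minimal `SeldonDeployment` using one of the reusable (pre-packaged) model servers might look like the following sketch; the resource name and model URI are placeholders, and fields should be verified against the Seldon Core docs for your version:

```yaml
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: iris-example                 # placeholder name
spec:
  predictors:
    - name: default
      replicas: 1
      graph:
        name: classifier
        implementation: SKLEARN_SERVER          # reusable server: no custom image build
        modelUri: gs://example-bucket/models/iris   # placeholder URI
```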
Created: 2023-04-27