Skip to content

MySQL

Learn how and why deployKF needs MySQL. Learn how to use an external MySQL database to improve the performance and reliability of Kubeflow Pipelines and Katib.


What is MySQL?

MySQL is an extremely popular and open-source relational database management system. Many of the world's largest applications use MySQL to store and manage their data.


Why does deployKF use MySQL?

MySQL is a dependency of the following ML & Data tools, which are part of deployKF:

  • Kubeflow Pipelines: stores metadata about pipelines, experiments, and runs
  • Katib: stores metadata about hyperparameter tuning experiments

Connect an External MySQL

By default, deployKF includes an embedded MySQL instance. However, to improve the performance and reliability of Kubeflow Pipelines and Katib, we recommend using an external MySQL database.

Embedded MySQL

You should ALWAYS use an external MySQL database. The embedded MySQL is a single-instance server running in a Kubernetes Pod, with no backups or high-availability.

1. Prepare MySQL

You may use any MySQL database service, as long as it is accessible from the Kubernetes cluster where deployKF is running.

You may consider using one of the following services:

Platform MySQL Service
Amazon Web Services Amazon Relational Database Service (RDS)
Microsoft Azure Azure Database for MySQL
Google Cloud Cloud SQL
Alibaba Cloud ApsaraDB RDS for MySQL
IBM Cloud IBM Cloud Databases for MySQL
Self-Hosted MySQL Community Edition

You must create some databases (schemas) and a user with the appropriate permissions to access them.

MySQL User Authentication

You MUST set the user's authentication plugin to mysql_native_password, NOT caching_sha2_password, which is the default in MySQL 8.0.4+. Kubeflow Pipelines does not support caching_sha2_password (kubeflow/pipelines#9549).

The following SQL command will show the authentication plugin for each user:

SELECT user, host, plugin FROM mysql.user;

For example, you might run the following SQL commands to create the databases and users:

-- create the databases
CREATE DATABASE IF NOT EXISTS `katib`;
CREATE DATABASE IF NOT EXISTS `kfp_cache`;
CREATE DATABASE IF NOT EXISTS `kfp_metadata`;
CREATE DATABASE IF NOT EXISTS `kfp_pipelines`;

-- create the 'kubeflow' user (allowing access from any host)
CREATE USER 'kubeflow'@'%' IDENTIFIED WITH mysql_native_password BY 'MY_PASSWORD';

-- grant access to the databases
GRANT ALL PRIVILEGES ON `katib`.* TO 'kubeflow'@'%';
GRANT ALL PRIVILEGES ON `kfp_cache`.* TO 'kubeflow'@'%';
GRANT ALL PRIVILEGES ON `kfp_metadata`.* TO 'kubeflow'@'%';
GRANT ALL PRIVILEGES ON `kfp_pipelines`.* TO 'kubeflow'@'%';

2. Disable the Embedded MySQL

The deploykf_opt.deploykf_mysql.enabled value controls if the embedded MySQL instance is deployed.

The following values will disable the embedded MySQL instance:

deploykf_opt:
  deploykf_mysql:
    enabled: false

3. Connect Katib

To connect Katib to your external MySQL database, you will need to configure the following values:

Value Purpose
kubeflow_tools.katib.mysqlDatabase name of database/schema
kubeflow_tools.katib.mysql connection details & credentials

The following values will connect Katib to an external MySQL database at mysql.example.com on port 3306, using the katib database, reading the username and password from a Kubernetes secret called my-secret-name (from the kubeflow namespace):

kubeflow_tools:
  katib:
    mysqlDatabase: "katib"

    mysql:
      useExternal: true
      host: "mysql.example.com"
      port: 3306
      auth:
        ## (OPTION 1):
        ##  - set username/password with values (NOT RECOMMENDED)
        #username: kubeflow
        #password: password

        ## (OPTION 2):
        ##  - read a kubernetes secret from the 'kubeflow' namespace
        ##  - note, `existingSecret*Key` specifies the KEY NAMES in the 
        ##    secret itself, which contain the secret values
        existingSecret: "my-secret-name"
        existingSecretUsernameKey: "username"
        existingSecretPasswordKey: "password"

4. Connect Kubeflow Pipelines

To connect Kubeflow Pipelines to your external MySQL database, you will need to configure the following values:

Value Purpose
kubeflow_tools.pipelines.mysqlDatabases names of databases/schemas
kubeflow_tools.pipelines.mysql connection details & credentials

The following values will connect Kubeflow Pipelines to an external MySQL database at mysql.example.com on port 3306, using the kfp_cache, kfp_metadata, and kfp_pipelines databases, reading the username and password from a Kubernetes secret called my-secret-name (from the kubeflow namespace):

kubeflow_tools:
  pipelines:
    mysqlDatabases:
      cacheDatabase: kfp_cache
      metadataDatabase: kfp_metadata
      pipelinesDatabase: kfp_pipelines

    mysql:
      useExternal: true
      host: "mysql.example.com"
      port: 3306
      auth:
        ## (OPTION 1):
        ##  - set username/password with values (NOT RECOMMENDED)
        #username: kubeflow
        #password: password

        ## (OPTION 2):
        ##  - read a kubernetes secret from the 'kubeflow' namespace
        ##  - note, `existingSecret*Key` specifies the KEY NAMES in the 
        ##    secret itself, which contain the secret values
        existingSecret: "my-secret-name"
        existingSecretUsernameKey: "username"
        existingSecretPasswordKey: "password"

Last update: 2024-05-10
Created: 2024-05-10