Join the Oracle Dev Days – French Edition, from May 20 to 22, 2025! This must-attend event (in French) offers a rich program exploring the latest advancements in AI, databases, cloud, and Java.
Join me on May 21 at 2:00 PM for the day dedicated to “Database & AI.” I’ll be presenting “Building an AI-Powered Search Engine with HeatWave GenAI.” I’ll show you how to go beyond the limits of traditional SQL to harness the power of LLM-driven semantic search.
This approach significantly enhances the relevance of search results by understanding context, interpreting user intent, and handling synonyms.
During this session, we’ll cover:
The technology stack used: SQL, Python, and JavaScript-based stored procedures
The architecture of a complete RAG (retrieval-augmented generation) pipeline, including data extraction, vectorization, storage, and querying within the database
The process of building a chatbot for natural language interaction with the AI
Discover how to implement a powerful, AI-enhanced semantic search engine directly within Oracle HeatWave GenAI.
In Build an AI-Powered Search Engine with HeatWave GenAI (part 1), we introduced the fundamentals of creating an AI-powered search engine using HeatWave GenAI. We highlighted the advantages of semantic search powered by large language models over traditional SQL-based approaches and provided a hands-on guide for generating embeddings and running similarity searches — key techniques that significantly improve the retrieval of relevant content.
In the second installment — Build an AI-Powered Search Engine with HeatWave GenAI (part 2) — we shifted our focus to improving search result quality through reranking strategies and the use of article summaries for embedding generation. We demonstrated how to implement these enhancements entirely within HeatWave using JavaScript-based stored procedures. By assigning different weights to title and excerpt distances, and generating embeddings from sanitized summaries, we boosted the precision and relevance of search results. This approach showcases HeatWave GenAI’s ability to embed advanced AI capabilities directly within the database layer.
In this third installment, we’ll take it a step further by incorporating full article content into the search engine. While titles, excerpts, or even summaries may work well in many cases, there are situations where deeper, more detailed information is needed to return truly relevant answers.
What are we going to do?
The process is slightly more complex than what we’ve covered so far (in part 1 & part 2). In WordPress, article content is stored in HTML format within the post_content column. This raw HTML will be our starting point, and the goal is to turn it into vector embeddings.
To achieve this, we’ll need to write a few lines of code. While this could be done directly within HeatWave using JavaScript stored procedures — as we saw in part 2: A Javascript, stored procedure & AI story — I’ll instead use the unofficial language of data: Python. Please bear in mind that I’m not a developer, so this code is provided for illustrative purposes only. It may contain errors or limitations. Please use it at your own risk and adapt it to your specific needs (also feel free to share back).
Below are the steps we’ll follow to move forward:
Define the embeddings storage table.
Fetch articles from the database.
Remove HTML tags and normalize whitespace.
Split articles into overlapping chunks of words.
Generate embeddings for a given article.
Insert article chunks with their embeddings into HeatWave.
I created a new table named wp_post_chunks_embeddings_minilm to store the embeddings generated from the article chunks. It contains the following columns:
chunk_embedding: the embedding of an article chunk
chunk_text: the text of the chunk itself, mostly useful for debugging
and some IDs:
chunk_id: the id of each chunk
post_id: used to join the wp_posts table and retrieve article information
and finally, the table primary key: id
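Putting these columns together, the table definition might look like the following. This is only a sketch: the exact types, the vector dimension, and the secondary index are my assumptions, modeled on the wp_posts_embeddings_minilm table shown in part 2 of this series.

```sql
-- Hypothetical DDL for the chunk embeddings table; adjust the vector
-- dimension and types to your embedding model and content.
CREATE TABLE wp_post_chunks_embeddings_minilm (
  id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  post_id BIGINT UNSIGNED NOT NULL,
  chunk_id INT UNSIGNED NOT NULL,
  chunk_text TEXT NOT NULL,
  chunk_embedding VECTOR(2048) NOT NULL COMMENT 'GENAI_OPTIONS=EMBED_MODEL_ID=minilm',
  PRIMARY KEY (id),
  KEY post_id (post_id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
```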
From the wp_posts table, I’ll use ID (the primary key) and post_content, which contains the article in HTML format. All this information will be used by the semantic AI-powered search engine.
First, we need to grab a batch of published blog articles from HeatWave MySQL. Think of it as scooping out a small batch of posts at a time from a big pool of content. Pagination is used to keep memory consumption under control.
def fetch_articles_batch(db_config: Dict[str, Any], limit: int = 10, offset: int = 0) -> List[Dict[str, Any]]:
    """Fetch a batch of published articles from the database."""
    with mysql.connector.connect(**db_config) as conn:
        with conn.cursor(dictionary=True) as cursor:
            cursor.execute("""
                SELECT ID, post_content FROM wp_posts
                WHERE post_status = 'publish' AND post_type = 'post'
                LIMIT %s OFFSET %s
            """, (limit, offset))
            return cursor.fetchall()
Remove HTML tags and normalize whitespace.
Articles are stored in HTML format in the database. The content must therefore be cleaned, because we need the raw text, i.e. without all the formatting tags. With this function we:
Remove all the HTML tags
Replace them with spaces (so you don’t accidentally smash words together)
Collapse extra whitespace
Trim off any leading or trailing spaces
def strip_html_tags_with_space(html_string: str) -> str:
    """Remove HTML tags and normalize whitespace."""
    if not html_string:
        return ""
    text = re.sub(r'<[^>]+>', ' ', html_string)  # replace tags with spaces
    text = re.sub(r'\s+', ' ', text)             # collapse extra whitespace
    return text.strip()
Please also note that, depending on the context, extra processing such as stemming or lemmatization can be applied.
Split articles into overlapping chunks of words
Each article is a big block of text that must be split into smaller overlapping chunks. It is like slicing a loaf of bread, where each slice overlaps a bit with the one before it.
Chunking in RAG systems presents several challenges, including issues with chunk size (too small or too large), semantic coherence, and context understanding. These challenges can negatively impact retrieval accuracy, efficiency, and precision. Addressing them requires a carefully balanced approach to optimize chunking strategies while preserving computational efficiency and maintaining nuanced contextual understanding.
chunk_size: Defines the number of words per chunk. In this case, I’ve chosen 400 words. The ideal value can vary depending on your specific use case and content structure.
overlap: Specifies how much each chunk overlaps with the next one—here, it’s set to 20% of the chunk size. This overlap helps preserve context across chunks, improving continuity and retrieval accuracy.
Each chunk contains the id of the article (post_id), the id of the chunk (chunk_id) and the chunk itself (text). char_start & char_end are only there for debugging purposes.
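The splitting logic itself can be sketched as follows. This is an illustrative Python version, not necessarily the exact code I used; the function name is hypothetical, but the field names match the chunk structure described above.

```python
from typing import Any, Dict, List

def split_into_chunks(post_id: int, text: str, chunk_size: int = 400,
                      overlap: int = 80) -> List[Dict[str, Any]]:
    """Split cleaned article text into overlapping chunks of words.

    overlap is the number of words shared between consecutive chunks
    (80 = 20% of the default 400-word chunk_size).
    """
    words = text.split()
    step = max(chunk_size - overlap, 1)
    chunks: List[Dict[str, Any]] = []
    for chunk_id, start in enumerate(range(0, len(words), step)):
        chunk_words = words[start:start + chunk_size]
        chunk_text = " ".join(chunk_words)
        # char_start / char_end are approximate offsets, kept for debugging only
        char_start = len(" ".join(words[:start])) + (1 if start else 0)
        chunks.append({
            "post_id": post_id,
            "chunk_id": chunk_id,
            "text": chunk_text,
            "char_start": char_start,
            "char_end": char_start + len(chunk_text),
        })
        if start + chunk_size >= len(words):
            break  # last chunk reached; avoid emitting a redundant tail chunk
    return chunks
```

With chunk_size=400 and overlap=80, consecutive chunks share their last/first 80 words, which preserves context across chunk boundaries.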
Generate embeddings for a given article
Now it is time to generate an embedding (aka a numerical vector) for each chunk. Think of an embedding as a way to turn text into numbers so that machines can understand and compare it.
We use HeatWave’s ML_EMBED_ROW stored function to generate the embeddings:
def embed_row(db_config: Dict[str, Any], search_item: str, embed_options: str = '{"model_id": "minilm"}') -> str:
    """Generate an embedding for a given text using the ML_EMBED_ROW function."""
    with mysql.connector.connect(**db_config) as conn:
        with conn.cursor() as cursor:
            sql_embed_row = 'SELECT VECTOR_TO_STRING(vect) FROM (SELECT sys.ML_EMBED_ROW(%s, %s) AS vect) AS dt;'
            cursor.execute(sql_embed_row, (search_item, embed_options))
            result = cursor.fetchone()
            return "".join(result) if result else ""
The trick here in Python is obtaining the string representation of the vector, which we can do using the VECTOR_TO_STRING function.
embed_options: specifies the settings for the embedding model. Here we are using "minilm".
Insert article chunks with their embeddings into HeatWave
Now it is time to store the text chunks and their corresponding embeddings in the vector store — HeatWave — specifically into the wp_post_chunks_embeddings_minilm table designed for this purpose:
def insert_chunks_into_mysql(chunks: List[Dict[str, Any]], db_config: Dict[str, Any], batch_size: int = 900):
    """Insert text chunks with embeddings into the database."""
    with mysql.connector.connect(**db_config) as conn:
        with conn.cursor() as cursor:
            insert_query = """
                INSERT INTO wp_post_chunks_embeddings_minilm (post_id, chunk_id, chunk_text, chunk_embedding)
                VALUES (%s, %s, %s, STRING_TO_VECTOR(%s))
            """
            buffer = []
            for i, chunk in enumerate(chunks, 1):
                chunk_embedding = embed_row(db_config, chunk['text'])
                buffer.append((
                    chunk['post_id'],
                    chunk['chunk_id'],
                    chunk['text'],
                    chunk_embedding
                ))
                if i % batch_size == 0:
                    cursor.executemany(insert_query, buffer)
                    conn.commit()
                    buffer.clear()
            if buffer:
                cursor.executemany(insert_query, buffer)
                conn.commit()
    print(f"Inserted {len(chunks)} chunks into database.")
batch_size: defines how many chunks are inserted into the database at once. In this case, once the buffer reaches 900 chunks, the data is flushed in bulk — a much more efficient approach than inserting each chunk individually, both in terms of performance and memory usage.
A key step here is converting the embeddings into HeatWave’s native vector type before insertion. This is done using the STRING_TO_VECTOR function.
Query Encoding and Vector Similarity Operations
Now it’s time to transform the user’s search query into a vector embedding, a numerical representation that captures its underlying meaning. This process, known as query encoding, allows us to go beyond simple keyword matching. Once the query is encoded as a vector, we can then perform a similarity search by comparing it against the precomputed vector embeddings of our documents within HeatWave. This comparison, often using metrics like cosine similarity, identifies the documents whose vector representations are most closely aligned with the query vector, thus retrieving the most semantically relevant results.
Using ML_EMBED_ROW is straightforward. Let’s define two variables: @searchItem (the text to encode) and @embeddOptions (the embedding model used for encoding):
SET @embeddOptions = '{"model_id": "minilm"}';
SET @searchItem = "Generative artificial intelligence";
-- Encode the query using the embedding model
SELECT sys.ML_EMBED_ROW(@searchItem, @embeddOptions) into @searchItemEmbedding;
Similarity search
To retrieve relevant blog content, we perform vector similarity calculations using the DISTANCE function. This function computes the distance between two vectors using COSINE, DOT, or EUCLIDEAN distance metrics. Here, the two vectors being compared are the encoded query (@searchItemEmbedding) and the precomputed embeddings stored in the wp_posts & wp_post_chunks_embeddings_minilm tables.
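For intuition, here is what the COSINE metric computes, sketched in plain Python. This is a conceptual illustration only; HeatWave’s DISTANCE function does this natively on its VECTOR type.

```python
import math
from typing import Sequence

def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine distance = 1 - cosine similarity.

    0 means the vectors point in the same direction (most similar);
    1 means they are orthogonal (unrelated).
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)
```

Because cosine distance depends only on direction, not magnitude, it is a natural fit for comparing text embeddings.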
You can now perform a cosine similarity search on the full articles using the following approach:
SQL >
-- Find the most semantically relevant WordPress posts
SELECT
post_title,
post_excerpt,
guid,
min_distance
FROM (
SELECT
p.post_title,
p.post_excerpt,
p.guid,
DISTANCE(@searchItemEmbedding, e.chunk_embedding, 'COSINE') AS min_distance,
ROW_NUMBER() OVER (PARTITION BY p.post_title ORDER BY DISTANCE(@searchItemEmbedding, e.chunk_embedding, 'COSINE')) AS rn
FROM wp_posts p
INNER JOIN wp_post_chunks_embeddings_minilm e ON p.ID = e.post_id
) AS ranked
WHERE rn = 1
ORDER BY min_distance ASC
LIMIT 5\G
*************************** 1. row ***************************
post_title: HeatWave GenAI: Sentiment Analysis Made Easy-Peasy
post_excerpt: This new AI tech, called generative AI (or GenAI), can dive deep into what people are saying and tell us if they’re feeling positive, negative, or neutral.
Let’s see how HeatWave GenAI, can help you to enhance your understanding of customer sentiment, improve decision-making, and drive business success.
guid: https://dasini.net/blog/?p=3456
min_distance: 0.4673360586166382
*************************** 2. row ***************************
post_title: HeatWave GenAI: Your AI-Powered Content Creation Partner
post_excerpt: Generative artificial intelligence (GenAI) is reshaping the content creation landscape. By training on vast datasets, these "intelligent" systems can produce new, human-quality content across a multitude of domains.
Oracle's HeatWave GenAI (starting with version 9.0.1) is at the forefront of this revolution, offering an integrated platform that combines in-database large language models (LLMs), vector stores, and scale-out vector processing to streamline content generation.
This article explores how HeatWave GenAI is empowering businesses to produce high-quality content rapidly and effectively, making it an indispensable tool for industries demanding speed, accuracy, and security.
guid: https://dasini.net/blog/?p=1234
min_distance: 0.4789550304412842
*************************** 3. row ***************************
post_title: Simplifying AI Development: A Practical Guide to HeatWave GenAI’s RAG & Vector Store Features
post_excerpt: This tutorial explores HeatWave GenAI, a cloud service that simplifies interacting with unstructured data using natural language. It combines large language models, vector stores, and SQL queries to enable tasks like content generation, chatbot, and retrieval-augmented generation (RAG). The focus is on RAG and how HeatWave GenAI’s architecture helps users gain insights from their data.
guid: https://dasini.net/blog/?p=2345
min_distance: 0.5191197395324707
*************************** 4. row ***************************
post_title: Building an Interactive LLM Chatbot with HeatWave Using Python
post_excerpt: AI-powered applications require robust and scalable database solutions to manage and process large amounts of data efficiently. HeatWave is an excellent choice for such applications, providing high-performance OLTP, analytics, machine learning and generative artificial intelligence capabilities.
In this article, we will explore a Python 3 script that connects to an HeatWave instance and enables users to interact with different large language models (LLMs) dynamically.
guid: https://dasini.net/blog/?p=5678
min_distance: 0.6004981994628906
*************************** 5. row ***************************
post_title: In-Database LLMs for Efficient Text Translation with HeatWave GenAI
post_excerpt: While HeatWave GenAI excels at generating English text, its capabilities extend to translation as well. Let's explore how we can effectively translate English content into French using this powerful tool.
guid: https://dasini.net/blog/?p=9876
min_distance: 0.6385803818702698
This SQL query is designed to retrieve the top 5 WordPress posts that are most semantically similar to a given embedding vector (@searchItemEmbedding), using cosine distance for comparison — but with a twist: it only considers the single best-matching chunk per post.
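The “best chunk per post” twist can be illustrated in plain Python with hypothetical (post_id, distance) pairs; the function name is mine, and the SQL above achieves the same with ROW_NUMBER.

```python
from typing import Dict, List, Tuple

def top_posts(chunk_distances: List[Tuple[int, float]], k: int = 5) -> List[Tuple[int, float]]:
    """Keep only the best (smallest) distance per post, then return the k closest posts.

    chunk_distances holds one (post_id, distance) pair per chunk, so a post
    with many chunks is still represented by its single best-matching chunk.
    """
    best: Dict[int, float] = {}
    for post_id, dist in chunk_distances:
        if post_id not in best or dist < best[post_id]:
            best[post_id] = dist
    return sorted(best.items(), key=lambda item: item[1])[:k]
```

This deduplication matters: without it, a long article split into many chunks could monopolize the top results.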
Of course, you can also apply the techniques from part 1 and part 2 to further improve the results. Methods like reranking or incorporating additional fields—such as the article title—while fine-tuning the weights can significantly enhance result relevance.
Peroration
In this third and final installment of our series on building an AI-powered search engine with HeatWave GenAI, we’ve successfully expanded our capabilities to incorporate the full content of articles into the semantic search process.
By implementing a more intricate pipeline involving fetching, cleaning, chunking, embedding, and storing article text, we’ve aimed for a deeper level of understanding and more relevant search results. This evolution, building upon the foundations laid in Parts 1 and 2 with basic embedding and similarity searches, and refined with reranking and summary utilization, now leverages the comprehensive information within each article.
The demonstrated SQL queries and Python code illustrate how HeatWave GenAI can efficiently manage and query vector embeddings derived from large text datasets, paving the way for a highly sophisticated and context-aware search experience.
While further optimizations are always possible, this three-part journey showcases the powerful potential of HeatWave GenAI for creating intelligent, in-database search solutions.
In Build an AI-Powered Search Engine with HeatWave GenAI (part 1), we explored how to build an AI-powered search engine using HeatWave GenAI. We highlighted the advantages of AI-driven semantic search over traditional SQL-based methods and provided a detailed guide on generating embeddings and conducting similarity searches. These techniques enhance the retrieval of relevant articles, improving the user’s ability to find information efficiently.
In this second part, we will explore how to enhance the relevance of our answers using reranking techniques. Next, we will further refine our results by instructing the model to generate embeddings based on article summaries. All these steps will be performed within HeatWave, leveraging its capability to write Stored Programs in JavaScript.
If you remember, in example 3.1, we saw how to run a similarity search on title & excerpt (post_title_embedding & post_excerpt_embedding), using an elegant CTE:
SQL >
-- Ex 3.1 Similarity search on title & excerpt (post_title_embedding & post_excerpt_embedding)
WITH distances AS (
SELECT
ID,
post_title,
post_excerpt,
(
DISTANCE(post_title_embedding, @searchItemEmbedding, 'COSINE') +
DISTANCE(post_excerpt_embedding, @searchItemEmbedding, 'COSINE')
) / 2 AS avg_distance
FROM WP_embeddings.wp_posts_embeddings_minilm
)
SELECT *
FROM distances
ORDER BY avg_distance
LIMIT 5\G
*************************** 1. row ***************************
ID: 1234
post_title: HeatWave GenAI: Your AI-Powered Content Creation Partner
post_excerpt: Generative artificial intelligence (GenAI) is reshaping the content creation landscape. By training on vast datasets, these "intelligent" systems can produce new, human-quality content across a multitude of domains.
Oracle's HeatWave GenAI (starting with version 9.0.1) is at the forefront of this revolution, offering an integrated platform that combines in-database large language models (LLMs), vector stores, and scale-out vector processing to streamline content generation.
This article explores how HeatWave GenAI is empowering businesses to produce high-quality content rapidly and effectively, making it an indispensable tool for industries demanding speed, accuracy, and security.
avg_distance: 0.5131600499153137
...
This result can be further improved using reranking techniques. Reranking involves reordering or refining an initial set of retrieved documents to enhance their relevance to a user’s query. This step is essential for optimizing search quality, often leading to a significant boost in the relevance of the retrieved information.
Reranking
Here we set up the same context (i.e. embedding model, natural language query encoding and vector similarity operations) as in part 1:
SQL >
-- Set variables
SET @embeddOptions = '{"model_id": "minilm"}';
SET @searchItem = "Generative artificial intelligence";
-- Encode the query using the embedding model
SELECT sys.ML_EMBED_ROW(@searchItem, @embeddOptions) into @searchItemEmbedding;
The retrieved results are now sorted using weights on the title and excerpt distances:
-- Ex 3.2 Similarity search on title & excerpt (post_title_embedding & post_excerpt_embedding) with weights
WITH initial_results AS (
SELECT
ID,
post_title,
post_excerpt,
DISTANCE(post_title_embedding, @searchItemEmbedding, 'COSINE') AS title_distance,
DISTANCE(post_excerpt_embedding, @searchItemEmbedding, 'COSINE') AS excerpt_distance,
guid
FROM WP_embeddings.wp_posts_embeddings_minilm
ORDER BY title_distance + excerpt_distance -- Simple combination
LIMIT 15 -- Retrieve a larger initial set
),
reranked_results AS (
SELECT
ID,
post_title,
post_excerpt,
(0.3 * title_distance + 0.7 * excerpt_distance) AS combined_distance, -- Weighted combination
guid
FROM initial_results
)
SELECT post_title, post_excerpt, combined_distance, guid
FROM reranked_results
ORDER BY combined_distance ASC
LIMIT 5\G
*************************** 1. row ***************************
post_title: HeatWave GenAI: Sentiment Analysis Made Easy-Peasy
post_excerpt: This new AI tech, called generative AI (or GenAI), can dive deep into what people are saying and tell us if they’re feeling positive, negative, or neutral.
Let’s see how HeatWave GenAI, can help you to enhance your understanding of customer sentiment, improve decision-making, and drive business success.
combined_distance: 0.49683985114097595
guid: https://dasini.net/blog/?p=3456
*************************** 2. row ***************************
post_title: HeatWave GenAI: Your AI-Powered Content Creation Partner
post_excerpt: Generative artificial intelligence (GenAI) is reshaping the content creation landscape. By training on vast datasets, these "intelligent" systems can produce new, human-quality content across a multitude of domains.
Oracle's HeatWave GenAI (starting with version 9.0.1) is at the forefront of this revolution, offering an integrated platform that combines in-database large language models (LLMs), vector stores, and scale-out vector processing to streamline content generation.
This article explores how HeatWave GenAI is empowering businesses to produce high-quality content rapidly and effectively, making it an indispensable tool for industries demanding speed, accuracy, and security.
combined_distance: 0.4994780898094177
guid: https://dasini.net/blog/?p=1234
*************************** 3. row ***************************
post_title: Simplifying AI Development: A Practical Guide to HeatWave GenAI’s RAG & Vector Store Features
post_excerpt: This tutorial explores HeatWave GenAI, a cloud service that simplifies interacting with unstructured data using natural language. It combines large language models, vector stores, and SQL queries to enable tasks like content generation, chatbot, and retrieval-augmented generation (RAG). The focus is on RAG and how HeatWave GenAI’s architecture helps users gain insights from their data.
combined_distance: 0.6582363367080688
guid: https://dasini.net/blog/?p=2345
*************************** 4. row ***************************
post_title: Webinar - Apprentissage automatique avec MySQL HeatWave
post_excerpt: HeatWave Machine Learning (ML) inclut tout ce dont les utilisateurs ont besoin pour créer, former, déployer et expliquer des modèles d’apprentissage automatique dans MySQL HeatWave, sans coût supplémentaire.
Dans ce webinaire vous apprendrez...
combined_distance: 0.694593733549118
guid: https://dasini.net/blog/?p=6789
*************************** 5. row ***************************
post_title: Building an Interactive LLM Chatbot with HeatWave Using Python
post_excerpt: AI-powered applications require robust and scalable database solutions to manage and process large amounts of data efficiently. HeatWave is an excellent choice for such applications, providing high-performance OLTP, analytics, machine learning and generative artificial intelligence capabilities.
In this article, we will explore a Python 3 script that connects to an HeatWave instance and enables users to interact with different large language models (LLMs) dynamically.
combined_distance: 0.7135995388031006
guid: https://dasini.net/blog/?p=5678
In the example above, the excerpt is given more than twice the weight of the title (0.7 vs 0.3), based on the assumption that the excerpt holds more relevant information in this context. Depending on your use case, you may want to fine-tune these weights, as adjusting them can significantly improve the quality of search results.
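The weighted reranking step is easy to express outside SQL as well. Here is a minimal Python sketch with hypothetical distance values; the function name and result structure are mine, and the default weights mirror the 0.3/0.7 split used in the query above.

```python
from typing import Dict, List

def rerank(results: List[Dict[str, float]], w_title: float = 0.3,
           w_excerpt: float = 0.7) -> List[Dict[str, float]]:
    """Reorder results by a weighted combination of title and excerpt distances.

    Smaller combined distance = more relevant result.
    """
    for r in results:
        r["combined_distance"] = (w_title * r["title_distance"]
                                  + w_excerpt * r["excerpt_distance"])
    return sorted(results, key=lambda r: r["combined_distance"])
```

Tuning w_title and w_excerpt (keeping their sum at 1.0 makes the scores comparable across runs) is a cheap way to experiment with relevance before committing the logic to a stored procedure.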
Now, let’s explore how we can further refine our results by leveraging article summaries along with JavaScript-based stored procedures and functions.
A Javascript, stored procedure & AI story
When searching through articles, relying solely on the title and excerpt may not yield the most relevant results. What if we used a summary instead? This approach can strike an excellent balance between relevance and implementation simplicity. With HeatWave GenAI, the entire workflow can be handled directly within the database using SQL and JavaScript-based stored procedures.
In WordPress, article content is stored as HTML in the post_content column. To make it suitable for processing by a large language model (LLM), this content must first be sanitized — that is, all HTML tags need to be removed, as they are not meaningful to the LLM.
Table preparation
As a preparation step, I added a new column (post_content_text longtext NOT NULL) to the wp_posts_embeddings_minilm table that we used in part 1.
ALTER TABLE wp_posts_embeddings_minilm ADD COLUMN post_content_text longtext NOT NULL;
The structure of the table is now:
CREATE TABLE `wp_posts_embeddings_minilm` (
`ID` bigint unsigned NOT NULL AUTO_INCREMENT,
`post_title` text NOT NULL,
`post_excerpt` text NOT NULL,
`guid` varchar(255) NOT NULL DEFAULT '',
`post_title_embedding` vector(2048) NOT NULL COMMENT 'GENAI_OPTIONS=EMBED_MODEL_ID=minilm',
`post_excerpt_embedding` vector(2048) NOT NULL COMMENT 'GENAI_OPTIONS=EMBED_MODEL_ID=minilm',
`post_content_text` longtext NOT NULL,
PRIMARY KEY (`ID`),
KEY `post_title` (`post_title`(255)),
KEY `post_excerpt` (`post_excerpt`(255)),
KEY `guid` (`guid`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
This new column — post_content_text — will be populated with the sanitized content of post_content.
For example, original HTML in post_content:
<!-- wp:paragraph --> <p>HeatWave GenAI brings LLMs directly into your database, enabling powerful AI capabilities and natural language processing.</p> <!-- /wp:paragraph -->
Should be stored in post_content_text as:
HeatWave GenAI brings LLMs directly into your database, enabling powerful AI capabilities and natural language processing
Strip HTML tags with a JavaScript stored routine
This can be done in JavaScript, inside HeatWave, as a stored program. Isn’t it magnificent? HeatWave has supported stored routines written in JavaScript since version 9.0.
A simple implementation could be the following:
CREATE FUNCTION sp.stripHtmlTags(htmlString LONGTEXT) RETURNS LONGTEXT NO SQL LANGUAGE JAVASCRIPT AS
$$
  if (!htmlString) {
    return "";
  }
  // Replace HTML tags with a space, collapse whitespace, and trim
  return htmlString
    .replace(/<[^>]+>/g, " ")  // Replace all tags with a space
    .replace(/\s+/g, " ")      // Replace multiple spaces with a single space
    .trim();
$$
;
Please bear in mind that I’m not a developer, so this code is provided for illustrative purposes only. It may contain errors or limitations. Please use it at your own risk and adapt it to your specific needs (also feel free to share back).
Let’s see if it works:
SELECT sp.stripHtmlTags('<!-- wp:paragraph --><p>HeatWave GenAI brings LLMs directly into your database, enabling powerful AI capabilities and natural language processing.</p><!-- /wp:paragraph -->') ;
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| sp.stripHtmlTags('<!-- wp:paragraph --><p>HeatWave GenAI brings LLMs directly into your database, enabling powerful AI capabilities and natural language processing.</p><!-- /wp:paragraph -->') |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| HeatWave GenAI brings LLMs directly into your database, enabling powerful AI capabilities and natural language processing. |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
Looks like this JavaScript stored function is doing the job.
We can now use it to sanitize all the articles:
UPDATE wp_posts_embeddings_minilm
INNER JOIN wp_posts USING (ID)
SET post_content_text = sp.stripHtmlTags(post_content)
WHERE post_status = 'publish'
AND post_type = 'post' ;
Now all the published posts have a sanitized, text-only version in the database. We can use it to generate article summaries.
Generating summaries with the JavaScript HeatWave GenAI API
HeatWave offers a JavaScript API that enables seamless integration with HeatWave GenAI, allowing you to perform natural language searches powered by LLMs. You can find more details in the official documentation.
To generate summaries for all my articles, I’ll create a stored procedure using the ml.generate method. This method supports two modes: single invocation and batch processing. While single invocation is ideal for handling new articles individually, we’ll focus on the batch mode here to process all existing articles efficiently:
SQL >
CREATE PROCEDURE sp.summarizePostBatch() LANGUAGE JAVASCRIPT AS
$$
  let schema = session.getSchema("wordpress");
  let table = schema.getTable("wp_posts_embeddings_minilm");
  ml.generate(table, "post_content_text", "post_summary_json", {model_id: "mistral-7b-instruct-v1", task: "summarization"});
$$
;
ml.generate loads the model (mistral-7b-instruct-v1) and, for each row, uses the article text in the post_content_text column as its prompt, storing the generated summary in the post_summary_json column (created automatically).
The process successfully created a new JSON column named post_summary_json:
SQL > SHOW CREATE TABLE wp_posts_embeddings_minilm;
CREATE TABLE `wp_posts_embeddings_minilm` (
`ID` bigint unsigned NOT NULL AUTO_INCREMENT,
`post_title` text NOT NULL,
`post_excerpt` text NOT NULL,
`guid` varchar(255) NOT NULL DEFAULT '',
`post_title_embedding` vector(2048) NOT NULL COMMENT 'GENAI_OPTIONS=EMBED_MODEL_ID=minilm',
`post_excerpt_embedding` vector(2048) NOT NULL COMMENT 'GENAI_OPTIONS=EMBED_MODEL_ID=minilm',
`post_content_text` longtext NOT NULL,
`post_summary_json` json DEFAULT NULL,
PRIMARY KEY (`ID`),
KEY `post_title` (`post_title`(255)),
KEY `post_excerpt` (`post_excerpt`(255)),
KEY `guid` (`guid`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
Using the HeatWave MySQL JSON functions, we can check the content of this new column:
SELECT LEFT(post_summary_json->>"$.text", 100) AS json FROM wp_posts_embeddings_minilm WHERE ID = 1234\G
*************************** 1. row ***************************
json: Generative artificial intelligence (GenAI) is reshaping the content creation landscape. By training
Now let’s see how to create embeddings for all the summaries.
Before implementing the routine, we need to make some modifications to the table. The reason is that in HeatWave 9.2.1, the input column must be one of the following types: varchar, tinytext, text, mediumtext or longtext. If it is not, you’ll trigger ERROR: 1644 (45000): ML006093: Type of 'post_summary_json' must be in ["varchar", "tinytext", "text", "mediumtext", "longtext"]. Quite explicit though!
We’ll transfer the content of post_summary_json into a text column named post_summary:
SQL >
ALTER TABLE wp_posts_embeddings_minilm ADD COLUMN post_summary text;
UPDATE wp_posts_embeddings_minilm SET post_summary = post_summary_json->>"$.text";
And finally, to avoid ERROR: 1644 (45000): ML006093, we must drop post_summary_json:
ALTER TABLE wp_posts_embeddings_minilm DROP COLUMN post_summary_json;
ml.embed also supports two variants, one for a single invocation, and one for batch processing.
Single invocation of ml.embed
The goal is to create a stored procedure that encodes a summarized article into a vector embedding using the minilm embedding model:
SQL >
CREATE PROCEDURE sp.createEmbeddings(IN text2embed LONGTEXT, OUT vect VECTOR) LANGUAGE JAVASCRIPT AS
$$
  let embedding = ml.embed(text2embed, {model_id: "minilm"});
  vect = embedding;
$$
;
The input parameter is the summarized article:
SQL >
CALL sp.createEmbeddings(" This article explores the integration of HeatWave GenAI with Python for AI-driven applications. It provides a step-by-step guide to building a simple chatbot system that interacts with HeatWave using its in-database LLMs and external APIs from OCI Generative AI Service. The script demonstrates how to establish a connection to the HeatWave MySQL database, load and manage multiple LLMs within HeatWave, allow users to select their preferred model dynamically, facilitate chatbot interactions using HeatWave Chat, and retrieve and manage chat options. The article also discusses the benefits of using HeatWave GenAI for AI-driven applications, including its modular and scalable design, dynamic LLM selection, and powerful capabilities in AI.", @myVect);
The output parameter, @myVect, contains the embedding.
Batch processing with ml.embed
To generate embeddings for all current posts at once, batch processing is the most efficient approach. Let's create a new stored procedure in JavaScript:
SQL >
CREATE PROCEDURE sp.createEmbeddingsBatch() LANGUAGE JAVASCRIPT AS
$$
let schema = session.getSchema("wordpress");
let table = schema.getTable("wp_posts_embeddings_minilm");
ml.embed(table, "post_summary", "post_summary_embedding", {model_id: "minilm"});
$$
;
ml.embed loads the minilm model and generates embeddings from the post summaries in the post_summary column, storing them in the post_summary_embedding column (created automatically).
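Conceptually, batch mode walks the table rows, embeds each post_summary value, and writes the result into the output column. Here is a minimal pure-Python sketch of that behavior; embed_fn is a toy stand-in for the real minilm model, which only exists inside HeatWave:

```python
from typing import Callable, Dict, List

def embed_batch(rows: List[Dict], src_col: str, dst_col: str,
                embed_fn: Callable[[str], List[float]]) -> None:
    """Mimic ml.embed's batch mode: embed src_col into dst_col for every row."""
    for row in rows:
        row[dst_col] = embed_fn(row[src_col])

def toy_embed(text: str) -> List[float]:
    # Toy 3-dimensional "embedding": letter frequencies, NOT a real model.
    n = max(len(text), 1)
    return [text.count("a") / n, text.count("e") / n, text.count("i") / n]

rows = [{"post_summary": "HeatWave GenAI semantic search"},
        {"post_summary": "Sentiment analysis made easy"}]
embed_batch(rows, "post_summary", "post_summary_embedding", toy_embed)
print(sorted(rows[0].keys()))  # ['post_summary', 'post_summary_embedding']
```

The real routine does far more (model loading, parallelism, vector typing), but the row-by-row "read source column, write embedding column" contract is the same.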
The process successfully created a new VECTOR column named post_summary_embedding:
SQL > SHOW CREATE TABLE wp_posts_embeddings_minilm;
CREATE TABLE `wp_posts_embeddings_minilm` (
`ID` bigint unsigned NOT NULL AUTO_INCREMENT,
`post_title` text NOT NULL,
`post_excerpt` text NOT NULL,
`guid` varchar(255) NOT NULL DEFAULT '',
`post_title_embedding` vector(2048) NOT NULL COMMENT 'GENAI_OPTIONS=EMBED_MODEL_ID=minilm',
`post_excerpt_embedding` vector(2048) NOT NULL COMMENT 'GENAI_OPTIONS=EMBED_MODEL_ID=minilm',
`post_content_text` longtext NOT NULL,
`post_summary` text,
`post_summary_embedding` vector(2048) NOT NULL COMMENT 'GENAI_OPTIONS=EMBED_MODEL_ID=minilm',
PRIMARY KEY (`ID`),
KEY `post_title` (`post_title`(255)),
KEY `post_excerpt` (`post_excerpt`(255)),
KEY `guid` (`guid`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
Query Encoding and Vector Similarity Operations
The penultimate step is to convert the user’s query into a vector representation that captures its semantic meaning. This process, known as query encoding, transforms text into numerical embeddings. Once encoded, we will be able to perform a similarity search to find the most relevant results by comparing the query’s embedding with precomputed vectors from HeatWave.
Encode the query into a vector embedding
As we saw in part 1, to generate a vector embedding for the query, we use the ML_EMBED_ROW routine. This function applies the specified embedding model to encode the given text into a vector representation, and returns a VECTOR containing the numerical embedding of the text.
Using it is straightforward. Let’s define two variables: @searchItem (the text to encode) and @embeddOptions (the embedding model used for encoding):
SQL >
-- Set variables
SET @embeddOptions = '{"model_id": "minilm"}';
SET @searchItem = "Generative artificial intelligence";
-- Encode the query using the embedding model
SELECT sys.ML_EMBED_ROW(@searchItem, @embeddOptions) into @searchItemEmbedding;
You can print the vector using the following query:
SELECT FROM_VECTOR(vect)
FROM (
SELECT sys.ML_EMBED_ROW(@searchItem, @embeddOptions) AS vect
) AS dt;
[-6.21515e-02,1.61460e-02,1.25987e-02,-1.98096e-02,... (truncated)
Similarity search
To retrieve relevant blog content, we perform vector similarity calculations using the DISTANCE function. This function computes the distance between two vectors using the COSINE, DOT, or EUCLIDEAN metric. Here, the two vectors being compared are the encoded query (@searchItemEmbedding) and the precomputed embeddings stored in the wp_posts_embeddings_minilm table (post_summary_embedding).
A cosine similarity search for article summaries can be conducted using:
-- Ex 4.1. Similarity search only on the post summaries (post_summary_embedding)
WITH distances AS (
SELECT
ID,
post_title,
post_excerpt,
DISTANCE(@searchItemEmbedding, post_summary_embedding, 'COSINE') AS min_distance
FROM WP_embeddings.wp_posts_embeddings_minilm
)
SELECT *
FROM distances
ORDER BY min_distance
LIMIT 5\G
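The COSINE metric used above has a simple definition: one minus the cosine of the angle between the two vectors. A plain-Python sketch (not HeatWave's actual implementation) makes the semantics concrete:

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity, mirroring DISTANCE(v1, v2, 'COSINE') semantics."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

query = [1.0, 0.0]
close = [0.9, 0.1]   # nearly the same direction as the query
far   = [0.0, 1.0]   # orthogonal to the query
print(cosine_distance(query, close) < cosine_distance(query, far))  # True
```

A smaller distance means a smaller angle between the query embedding and the stored embedding, which is why the SQL above orders by min_distance ascending.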
A cosine similarity search for article titles, excerpts and summaries can be conducted using:
-- Ex 5.1 Similarity search on title, excerpt & post summary (post_title_embedding, post_excerpt_embedding & post_summary_embedding)
WITH distances AS (
SELECT
post_title,
post_excerpt,
(
DISTANCE(post_title_embedding, @searchItemEmbedding, 'COSINE') +
DISTANCE(post_excerpt_embedding, @searchItemEmbedding, 'COSINE') +
DISTANCE(post_summary_embedding, @searchItemEmbedding, 'COSINE')
) / 3 AS avg_distance,
guid
FROM WP_embeddings.wp_posts_embeddings_minilm
)
SELECT *
FROM distances
ORDER BY avg_distance
LIMIT 5\G
Finally, you can (try to) improve the results using a reranking technique:
-- Ex 5.2 Weighted Similarity search on title, excerpt & post summary (post_title_embedding, post_excerpt_embedding & post_summary_embedding)
WITH initial_results AS (
SELECT
post_title,
post_excerpt,
DISTANCE(post_title_embedding, @searchItemEmbedding, 'COSINE') AS title_distance,
DISTANCE(post_excerpt_embedding, @searchItemEmbedding, 'COSINE') AS excerpt_distance,
DISTANCE(post_summary_embedding, @searchItemEmbedding, 'COSINE') AS summary_distance,
guid
FROM WP_embeddings.wp_posts_embeddings_minilm
ORDER BY title_distance + excerpt_distance + summary_distance ASC -- Simple combination
LIMIT 15 -- Retrieve a larger initial set
),
reranked_results AS (
SELECT
post_title,
post_excerpt,
(0.2 * title_distance + 0.3 * excerpt_distance + 0.5 * summary_distance) AS combined_distance, -- Weighted combination
guid
FROM initial_results
)
SELECT post_title, post_excerpt, combined_distance, guid
FROM reranked_results
ORDER BY combined_distance ASC
LIMIT 5\G
*************************** 1. row ***************************
post_title: HeatWave GenAI: Your AI-Powered Content Creation Partner
post_excerpt: Generative artificial intelligence (GenAI) is reshaping the content creation landscape. By training on vast datasets, these "intelligent" systems can produce new, human-quality content across a multitude of domains.
Oracle's HeatWave GenAI (starting with version 9.0.1) is at the forefront of this revolution, offering an integrated platform that combines in-database large language models (LLMs), vector stores, and scale-out vector processing to streamline content generation.
This article explores how HeatWave GenAI is empowering businesses to produce high-quality content rapidly and effectively, making it an indispensable tool for industries demanding speed, accuracy, and security.
combined_distance: 0.5093500733375549
guid: https://dasini.net/blog/?p=1234
*************************** 2. row ***************************
post_title: Simplifying AI Development: A Practical Guide to HeatWave GenAI’s RAG & Vector Store Features
post_excerpt: This tutorial explores HeatWave GenAI, a cloud service that simplifies interacting with unstructured data using natural language. It combines large language models, vector stores, and SQL queries to enable tasks like content generation, chatbot, and retrieval-augmented generation (RAG). The focus is on RAG and how HeatWave GenAI’s architecture helps users gain insights from their data.
combined_distance: 0.637738311290741
guid: https://dasini.net/blog/?p=2345
*************************** 3. row ***************************
post_title: HeatWave GenAI: Sentiment Analysis Made Easy-Peasy
post_excerpt: This new AI tech, called generative AI (or GenAI), can dive deep into what people are saying and tell us if they’re feeling positive, negative, or neutral.
Let’s see how HeatWave GenAI, can help you to enhance your understanding of customer sentiment, improve decision-making, and drive business success.
combined_distance: 0.6417026937007904
guid: https://dasini.net/blog/?p=3456
*************************** 4. row ***************************
post_title: Building an Interactive LLM Chatbot with HeatWave Using Python
post_excerpt: AI-powered applications require robust and scalable database solutions to manage and process large amounts of data efficiently. HeatWave is an excellent choice for such applications, providing high-performance OLTP, analytics, machine learning and generative artificial intelligence capabilities.
In this article, we will explore a Python 3 script that connects to an HeatWave instance and enables users to interact with different large language models (LLMs) dynamically.
combined_distance: 0.6545232772827148
guid: https://dasini.net/blog/?p=5678
*************************** 5. row ***************************
post_title: Webinar - Apprentissage automatique avec MySQL HeatWave
post_excerpt: HeatWave Machine Learning (ML) inclut tout ce dont les utilisateurs ont besoin pour créer, former, déployer et expliquer des modèles d’apprentissage automatique dans MySQL HeatWave, sans coût supplémentaire.
Dans ce webinaire vous apprendrez...
combined_distance: 0.7031511843204499
guid: https://dasini.net/blog/?p=6789
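The same two-stage logic, retrieve a broader candidate set with a cheap combined score and then rerank it with weights, can be sketched in Python. The 0.2/0.3/0.5 weights mirror the query above; the distances themselves are made up for illustration:

```python
# Each candidate: (title, title_distance, excerpt_distance, summary_distance)
candidates = [
    ("post A", 0.30, 0.60, 0.20),
    ("post B", 0.10, 0.70, 0.80),
    ("post C", 0.50, 0.40, 0.30),
]

# Stage 1: keep the top candidates by the simple (unweighted) sum of distances.
initial = sorted(candidates, key=lambda c: c[1] + c[2] + c[3])[:15]

# Stage 2: rerank with weights favoring the summary distance.
def weighted(c):
    return 0.2 * c[1] + 0.3 * c[2] + 0.5 * c[3]

reranked = sorted(initial, key=weighted)[:5]
print([c[0] for c in reranked])  # ['post A', 'post C', 'post B']
```

Tuning the weights (or the size of the initial set) shifts which columns dominate the final ranking, which is exactly the experimentation the weighted SQL query enables.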
Peroration
In this second part of our journey into building an AI-powered search engine with HeatWave GenAI, we explored advanced techniques to refine search relevance. By incorporating reranking strategies and leveraging article summaries for embedding generation, we significantly improved the quality of retrieved results.
Furthermore, we demonstrated how to harness HeatWave’s support for JavaScript-based stored programs to sanitize content, generate summaries, and compute embeddings — all within the database. This seamless integration of AI-powered search within HeatWave showcases its potential for efficient, scalable, and intelligent information retrieval.
With these enhancements, our search engine is now more capable of understanding queries and delivering relevant results, although we can still optimize performance further by experimenting with different embedding models or integrating additional AI-driven ranking techniques.
For a deeper understanding, also consider reading these articles.
Traditional SQL search relies on structured queries (SELECT, WHERE, JOIN, …) and exact or partial matches based on conditions (e.g., WHERE name = 'Olivier' / WHERE name LIKE '%Olivier%').
A typical query may look like:
SELECT title
FROM articles
WHERE Category = 'HeatWave' OR tag LIKE '%AI%';
While efficient for structured data, it has limited flexibility for search variations and fails to grasp context or intent.
An alternative is SQL Full-Text Search (FTS), which enables efficient keyword-based searches across large text datasets. For example, MySQL implements FTS using MATCH and AGAINST (e.g. MATCH(name) AGAINST('Olivier')). This feature indexes text content within database columns, allowing for advanced search capabilities such as phrase matching, proximity searches, and relevance scoring.
A typical query may look like:
SELECT * FROM articles
WHERE MATCH (title, body)
AGAINST ('HeatWave' IN NATURAL LANGUAGE MODE);
FTS is usually faster and more relevant than basic SQL searches, efficiently handling large text fields. However, it remains keyword-based rather than semantic, meaning it may overlook context-based variations.
Another option is AI-powered search using large language models (LLMs), also known as semantic search. Unlike keyword-based methods, it leverages embeddings — vector representations of words or sentences — to understand meaning. This enables it to handle synonyms, paraphrasing, and contextual relationships (e.g., searching for 'AI' may also return articles on 'machine learning'). Additionally, it often integrates retrieval-augmented generation (RAG) to enhance responses with external knowledge.
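The keyword-versus-semantic distinction can be made concrete with a toy sketch: a substring match misses the synonym entirely, while a similarity comparison over vectors still ranks the related term as close. The vectors below are hand-made for illustration; real embeddings come from a model such as minilm:

```python
import math

def cosine_sim(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Hand-made toy vectors, NOT real embeddings.
vectors = {
    "AI":               [0.9, 0.8, 0.1],
    "machine learning": [0.8, 0.9, 0.2],
    "cooking":          [0.1, 0.2, 0.9],
}

# Keyword search: a plain substring test finds nothing for the synonym.
print("AI" in "machine learning")  # False

# Semantic search: the vector for "machine learning" is closest to "AI".
sims = {t: cosine_sim(vectors["AI"], v) for t, v in vectors.items() if t != "AI"}
best = max(sims, key=sims.get)
print(best)  # machine learning
```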
In this article, we’ll dive deeper into AI-powered search using an LLM with the help of HeatWave GenAI…
Key features of HeatWave GenAI
What do we want to do?
The goal is to build an AI-powered search engine for an application, designed to provide users with the most relevant articles based on their queries, using semantic search. I’ll be using the data from my WordPress-based blog — https://dasini.net/blog/ — with the AI component powered by HeatWave GenAI. This leverages its in-database large language models and vector store capabilities (In-Database Embedding Models, The VECTOR Type, Vector Functions). To ensure a focused and concise presentation, I’ll simplify this implementation by limiting the search to post titles and excerpts rather than the full article. Although this will significantly reduce the relevance of the results, it provides an opportunity to explore more comprehensive solutions in a future article.
Create the table which will contain the embeddings
In WordPress, the blog posts are stored in the wp_posts table. The columns relevant to our goal are:
ID bigint unsigned NOT NULL AUTO_INCREMENT,
post_title text NOT NULL
post_excerpt text NOT NULL
guid varchar(255) NOT NULL DEFAULT ''
These columns contain, respectively, the unique identifier of the post, its title, a short excerpt (hopefully) of the post, and the URL of the article. Example:
wordpress SQL>
SELECT ID, post_title, post_excerpt, guid from wp_posts WHERE post_status = 'publish' AND post_type = 'post' AND ID = 1234\G
*************************** 1. row ***************************
ID: 1234
post_title: HeatWave GenAI: Your AI-Powered Content Creation Partner
post_excerpt: Generative artificial intelligence (GenAI) is reshaping the content creation landscape. By training on vast datasets, these "intelligent" systems can produce new, human-quality content across a multitude of domains.
Oracle's HeatWave GenAI (starting with version 9.0.1) is at the forefront of this revolution, offering an integrated platform that combines in-database large language models (LLMs), vector stores, and scale-out vector processing to streamline content generation.
This article explores how HeatWave GenAI is empowering businesses to produce high-quality content rapidly and effectively, making it an indispensable tool for industries demanding speed, accuracy, and security.
guid: https://dasini.net/blog/?p=1234
Based on this information, I created a table, wp_posts_embeddings_minilm, that contains these four columns:
wordpress SQL>
CREATE TABLE `wp_posts_embeddings_minilm` (
`ID` bigint unsigned NOT NULL AUTO_INCREMENT,
`post_title` text NOT NULL,
`post_excerpt` text NOT NULL,
`guid` varchar(255) NOT NULL DEFAULT '',
PRIMARY KEY (`ID`),
KEY `post_title` (`post_title`(255)),
KEY `post_excerpt` (`post_excerpt`(255)),
KEY `guid` (`guid`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
Then I populated this new table with my published posts:
wordpress SQL>
INSERT INTO wp_posts_embeddings_minilm SELECT ID, post_title, post_excerpt, guid FROM wp_posts WHERE post_status = 'publish' AND post_type = 'post';
That completes the first step. Now we need to create the embeddings for each published post.
Create the embeddings
We can query HeatWave’s sys.ML_SUPPORTED_LLMS table to see which embedding models we can use:
The first three (minilm, all_minilm_l12_v2 & multilingual-e5-small) are HeatWave’s in-database embedding models; there is no extra cost to use them (minilm and all_minilm_l12_v2 are two different names for the same model). These models can encode text or files in any supported language. By default, HeatWave GenAI uses minilm for encoding English documents. Although my blog also contains French articles, I’ll use this model.
The other models come from the OCI Generative AI Service. They can also be used in the HeatWave workflow; however, their use incurs additional costs.
The comprehensive list of the languages, embedding models, and large language models (LLMs) that HeatWave GenAI supports is available here.
We are going to use the minilm model to encode each row into a vector embedding. Creating embeddings is very easy with HeatWave: we only need one routine, sys.ML_EMBED_TABLE. This stored procedure runs multiple embedding generations as a batch, in parallel.
As a reminder, this is what the wp_posts_embeddings_minilm table looks like:
DESC wp_posts_embeddings_minilm;
+--------------+-----------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------------+-----------------+------+-----+---------+----------------+
| ID | bigint unsigned | NO | PRI | NULL | auto_increment |
| post_title | text | NO | MUL | NULL | |
| post_excerpt | text | NO | MUL | NULL | |
| guid | varchar(255) | NO | MUL | | |
+--------------+-----------------+------+-----+---------+----------------+
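The embeddings themselves are generated by calling sys.ML_EMBED_TABLE once per source column, passing fully qualified 'schema.table.column' input and output references plus a JSON options document. As a sketch (verify the exact argument format against your HeatWave version's documentation), a small Python helper can assemble those CALL statements:

```python
import json

def ml_embed_table_call(schema: str, table: str, src: str, dst: str,
                        model_id: str = "minilm") -> str:
    """Build a CALL sys.ML_EMBED_TABLE statement for one column pair."""
    options = json.dumps({"model_id": model_id})
    return (f"CALL sys.ML_EMBED_TABLE('{schema}.{table}.{src}', "
            f"'{schema}.{table}.{dst}', '{options}');")

for src, dst in [("post_title", "post_title_embedding"),
                 ("post_excerpt", "post_excerpt_embedding")]:
    print(ml_embed_table_call("wordpress", "wp_posts_embeddings_minilm", src, dst))
```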
After running sys.ML_EMBED_TABLE on the post_title and post_excerpt columns, the table gains two vector columns:
DESC wp_posts_embeddings_minilm;
+------------------------+-----------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------------------+-----------------+------+-----+---------+----------------+
| ID | bigint unsigned | NO | PRI | NULL | auto_increment |
| post_title | text | NO | MUL | NULL | |
| post_excerpt | text | NO | MUL | NULL | |
| guid | varchar(255) | NO | MUL | | |
| post_title_embedding | vector(2048) | NO | | NULL | |
| post_excerpt_embedding | vector(2048) | NO | | NULL | |
+------------------------+-----------------+------+-----+---------+----------------+
Title and excerpt embeddings have been created!
SHOW CREATE TABLE wp_posts_embeddings_minilm\G
*************************** 1. row ***************************
Table: wp_posts_embeddings_minilm
Create Table: CREATE TABLE `wp_posts_embeddings_minilm` (
`ID` bigint unsigned NOT NULL AUTO_INCREMENT,
`post_title` text NOT NULL,
`post_excerpt` text NOT NULL,
`guid` varchar(255) NOT NULL DEFAULT '',
`post_title_embedding` vector(2048) NOT NULL COMMENT 'GENAI_OPTIONS=EMBED_MODEL_ID=minilm',
`post_excerpt_embedding` vector(2048) NOT NULL COMMENT 'GENAI_OPTIONS=EMBED_MODEL_ID=minilm',
PRIMARY KEY (`ID`),
KEY `post_title` (`post_title`(255)),
KEY `post_excerpt` (`post_excerpt`(255)),
KEY `guid` (`guid`)
) ENGINE=InnoDB AUTO_INCREMENT=7059 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
Query Encoding and Vector Similarity Operations
To retrieve the most relevant articles, we first need to convert the user’s query into a vector representation that captures its semantic meaning. This process, known as query encoding, transforms text into numerical embeddings. Once encoded, we can perform a similarity search to find the most relevant results by comparing the query’s embedding with precomputed vectors from our database.
Encode the query into a vector embedding
To generate a vector embedding for the query, we use the ML_EMBED_ROW routine. This function applies the specified embedding model to encode the given text into a vector representation. The routine returns a VECTOR containing the numerical embedding of the text.
Using it is straightforward. I define two variables: @searchItem (the text to encode) and @embeddOptions (the embedding model used for encoding):
wordpress SQL>
-- Set variables
SET @embeddOptions = '{"model_id": "minilm"}';
SET @searchItem = "Generative artificial intelligence";
-- Encode the query using the embedding model
SELECT sys.ML_EMBED_ROW(@searchItem, @embeddOptions) into @searchItemEmbedding;
If you want to see the content of the variable (not sure it is a good idea, given its size), you can print the vector using the following query:
SELECT FROM_VECTOR(vect)
FROM (
SELECT sys.ML_EMBED_ROW(@searchItem, @embeddOptions) AS vect
) AS dt;
[-6.21515e-02,1.61460e-02,1.25987e-02,-1.98096e-02,... (truncated)
Similarity search
To retrieve relevant blog content, we perform vector similarity calculations using the DISTANCE function. This function computes the distance between two vectors using the COSINE, DOT, or EUCLIDEAN metric. In our case, the two vectors being compared are the encoded query (@searchItemEmbedding) and the precomputed embeddings stored in the wp_posts_embeddings_minilm table (post_title_embedding and post_excerpt_embedding).
A cosine similarity search for article titles can be conducted using:
-- Ex 1.0. Similarity search only on titles (post_title_embedding)
SELECT ID, post_title, post_excerpt, DISTANCE(@searchItemEmbedding, post_title_embedding, 'COSINE') AS min_distance
FROM WP_embeddings.wp_posts_embeddings_minilm
ORDER BY min_distance ASC
LIMIT 3\G
*************************** 1. row ***************************
ID: 2345
post_title: Simplifying AI Development: A Practical Guide to HeatWave GenAI’s RAG & Vector Store Features
post_excerpt: This tutorial explores HeatWave GenAI, a cloud service that simplifies interacting with unstructured data using natural language. It combines large language models, vector stores, and SQL queries to enable tasks like content generation, chatbot, and retrieval-augmented generation (RAG). The focus is on RAG and how HeatWave GenAI’s architecture helps users gain insights from their data.
min_distance: 0.1232912540435791
*************************** 2. row ***************************
ID: 1234
post_title: HeatWave GenAI: Your AI-Powered Content Creation Partner
post_excerpt: Generative artificial intelligence (GenAI) is reshaping the content creation landscape. By training on vast datasets, these "intelligent" systems can produce new, human-quality content across a multitude of domains.
Oracle's HeatWave GenAI (starting with version 9.0.1) is at the forefront of this revolution, offering an integrated platform that combines in-database large language models (LLMs), vector stores, and scale-out vector processing to streamline content generation.
This article explores how HeatWave GenAI is empowering businesses to produce high-quality content rapidly and effectively, making it an indispensable tool for industries demanding speed, accuracy, and security.
min_distance: 0.1264844536781311
*************************** 3. row ***************************
ID: 3456
post_title: HeatWave GenAI: Sentiment Analysis Made Easy-Peasy
post_excerpt: This new AI tech, called generative AI (or GenAI), can dive deep into what people are saying and tell us if they’re feeling positive, negative, or neutral.
Let’s see how HeatWave GenAI, can help you to enhance your understanding of customer sentiment, improve decision-making, and drive business success.
min_distance: 0.12810611724853516
A more elegant version of this query uses a common table expression (CTE):
-- Ex 1.1. Similarity search only on titles (post_title_embedding) w/ CTE
WITH distances AS (
SELECT
ID,
post_title,
post_excerpt,
DISTANCE(@searchItemEmbedding, post_title_embedding, 'COSINE') AS min_distance
FROM WP_embeddings.wp_posts_embeddings_minilm
)
SELECT *
FROM distances
ORDER BY min_distance
LIMIT 3\G
*************************** 1. row ***************************
ID: 2345
post_title: Simplifying AI Development: A Practical Guide to HeatWave GenAI’s RAG & Vector Store Features
post_excerpt: This tutorial explores HeatWave GenAI, a cloud service that simplifies interacting with unstructured data using natural language. It combines large language models, vector stores, and SQL queries to enable tasks like content generation, chatbot, and retrieval-augmented generation (RAG). The focus is on RAG and how HeatWave GenAI’s architecture helps users gain insights from their data.
min_distance: 0.1232912540435791
*************************** 2. row ***************************
ID: 1234
post_title: HeatWave GenAI: Your AI-Powered Content Creation Partner
post_excerpt: Generative artificial intelligence (GenAI) is reshaping the content creation landscape. By training on vast datasets, these "intelligent" systems can produce new, human-quality content across a multitude of domains.
Oracle's HeatWave GenAI (starting with version 9.0.1) is at the forefront of this revolution, offering an integrated platform that combines in-database large language models (LLMs), vector stores, and scale-out vector processing to streamline content generation.
This article explores how HeatWave GenAI is empowering businesses to produce high-quality content rapidly and effectively, making it an indispensable tool for industries demanding speed, accuracy, and security.
min_distance: 0.1264844536781311
*************************** 3. row ***************************
ID: 3456
post_title: HeatWave GenAI: Sentiment Analysis Made Easy-Peasy
post_excerpt: This new AI tech, called generative AI (or GenAI), can dive deep into what people are saying and tell us if they’re feeling positive, negative, or neutral.
Let’s see how HeatWave GenAI, can help you to enhance your understanding of customer sentiment, improve decision-making, and drive business success.
min_distance: 0.12810611724853516
Of course, you can perform the same search on the excerpt alone. Alternatively, you can run a similarity search across both columns for more comprehensive results:
-- Ex 3.1 Similarity search on title & excerpt (post_title_embedding & post_excerpt_embedding)
WITH distances AS (
SELECT
ID,
post_title,
post_excerpt,
(
DISTANCE(post_title_embedding, @searchItemEmbedding, 'COSINE') +
DISTANCE(post_excerpt_embedding, @searchItemEmbedding, 'COSINE')
) / 2 AS avg_distance
FROM WP_embeddings.wp_posts_embeddings_minilm
)
SELECT *
FROM distances
ORDER BY avg_distance
LIMIT 5\G
*************************** 1. row ***************************
ID: 1234
post_title: HeatWave GenAI: Your AI-Powered Content Creation Partner
post_excerpt: Generative artificial intelligence (GenAI) is reshaping the content creation landscape. By training on vast datasets, these "intelligent" systems can produce new, human-quality content across a multitude of domains.
Oracle's HeatWave GenAI (starting with version 9.0.1) is at the forefront of this revolution, offering an integrated platform that combines in-database large language models (LLMs), vector stores, and scale-out vector processing to streamline content generation.
This article explores how HeatWave GenAI is empowering businesses to produce high-quality content rapidly and effectively, making it an indispensable tool for industries demanding speed, accuracy, and security.
avg_distance: 0.5131600499153137
*************************** 2. row ***************************
ID: 3456
post_title: HeatWave GenAI: Sentiment Analysis Made Easy-Peasy
post_excerpt: This new AI tech, called generative AI (or GenAI), can dive deep into what people are saying and tell us if they’re feeling positive, negative, or neutral.
Let’s see how HeatWave GenAI, can help you to enhance your understanding of customer sentiment, improve decision-making, and drive business success.
avg_distance: 0.5587222874164581
*************************** 3. row ***************************
ID: 2345
post_title: Simplifying AI Development: A Practical Guide to HeatWave GenAI’s RAG & Vector Store Features
post_excerpt: This tutorial explores HeatWave GenAI, a cloud service that simplifies interacting with unstructured data using natural language. It combines large language models, vector stores, and SQL queries to enable tasks like content generation, chatbot, and retrieval-augmented generation (RAG). The focus is on RAG and how HeatWave GenAI’s architecture helps users gain insights from their data.
avg_distance: 0.6403274536132812
*************************** 4. row ***************************
ID: 6789
post_title: Webinar - Apprentissage automatique avec MySQL HeatWave
post_excerpt: HeatWave Machine Learning (ML) inclut tout ce dont les utilisateurs ont besoin pour créer, former, déployer et expliquer des modèles d’apprentissage automatique dans MySQL HeatWave, sans coût supplémentaire.
Dans ce webinaire vous apprendrez...
avg_distance: 0.7226708233356476
*************************** 5. row ***************************
ID: 5678
post_title: Building an Interactive LLM Chatbot with HeatWave Using Python
post_excerpt: AI-powered applications require robust and scalable database solutions to manage and process large amounts of data efficiently. HeatWave is an excellent choice for such applications, providing high-performance OLTP, analytics, machine learning and generative artificial intelligence capabilities.
In this article, we will explore a Python 3 script that connects to an HeatWave instance and enables users to interact with different large language models (LLMs) dynamically.
avg_distance: 0.736954927444458
For this example, the average distance was selected to combine title and excerpt similarity. However, alternative aggregation techniques can be employed.
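For instance, instead of the average you could take the minimum of the per-column distances (a "best field wins" strategy) or a weighted sum. A quick Python comparison of the three aggregations, on made-up distances:

```python
title_d, excerpt_d = 0.20, 0.60  # hypothetical per-column cosine distances

avg_d      = (title_d + excerpt_d) / 2          # what the query above uses
min_d      = min(title_d, excerpt_d)            # best-matching column wins
weighted_d = 0.7 * title_d + 0.3 * excerpt_d    # trust the title more

print(round(avg_d, 2), round(min_d, 2), round(weighted_d, 2))  # 0.4 0.2 0.32
```

Each choice changes the final ordering: min rewards one strong match, while the weighted sum lets you encode which column you trust most.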
Now let's see how a newly published article is added to the search engine. Consider this post, which does not yet have embeddings:
wordpress SQL>
SELECT ID, post_title, post_excerpt, guid FROM wp_posts WHERE ID = 4567\G
*************************** 1. row ***************************
ID: 4567
post_title: Building an Interactive LLM Chatbot with HeatWave Using Python
post_excerpt: AI-powered applications require robust and scalable database solutions to manage and process large amounts of data efficiently. HeatWave is an excellent choice for such applications, providing high-performance OLTP, analytics, machine learning and generative artificial intelligence capabilities.
In this article, we will explore a Python 3 script that connects to an HeatWave instance and enables users to interact with different large language models (LLMs) dynamically.
guid: https://dasini.net/blog/?p=4567
To enable this article in the AI-powered search engine, I need to generate embeddings for both the title (post_title) and excerpt (post_excerpt). This is accomplished using sys.ML_EMBED_ROW, which encodes text into a vector embedding based on a specified model, returning a VECTOR data type.
These embeddings will then be inserted into the wp_posts_embeddings_minilm table, utilizing the same minilm embedding model:
wordpress SQL>
INSERT INTO wp_posts_embeddings_minilm (ID, post_title, post_excerpt, guid, post_title_embedding, post_excerpt_embedding)
SELECT ID, post_title, post_excerpt, guid, sys.ML_EMBED_ROW(post_title, '{"model_id": "minilm"}'), sys.ML_EMBED_ROW(post_excerpt,'{"model_id": "minilm"}')
FROM wordpress.wp_posts
WHERE ID = 4567;
Et voilà! While this implementation focused on post titles and excerpts for brevity, it lays a solid foundation for more comprehensive searches across entire article content, a topic to be explored in a future article.
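When driving this insertion from an application rather than the SQL prompt, the article ID and embedding options are better passed as bind parameters. A hedged sketch of the statement an application might prepare (the %s placeholders follow mysql.connector's parameter style):

```python
NEW_POST_SQL = """
INSERT INTO wp_posts_embeddings_minilm
    (ID, post_title, post_excerpt, guid, post_title_embedding, post_excerpt_embedding)
SELECT ID, post_title, post_excerpt, guid,
       sys.ML_EMBED_ROW(post_title, %s),
       sys.ML_EMBED_ROW(post_excerpt, %s)
FROM wordpress.wp_posts
WHERE ID = %s
"""

params = ('{"model_id": "minilm"}', '{"model_id": "minilm"}', 4567)
# With a live connection: cursor.execute(NEW_POST_SQL, params)
print(NEW_POST_SQL.count("%s") == len(params))  # True
```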
Peroration
In this article, we explored how HeatWave GenAI enables AI-powered semantic search by leveraging in-database embeddings and vector similarity operations. Unlike traditional SQL search and full-text search, which rely on keyword matching, HeatWave GenAI provides deeper contextual understanding by transforming text into vector representations and performing similarity searches using the DISTANCE function.
The power of HeatWave GenAI’s in-database LLMs and vector store features was highlighted, showcasing its efficiency in handling both embedding generation and similarity calculations. Furthermore, the process of integrating new articles into the search engine by generating and inserting their embeddings was outlined, ensuring the search remains up-to-date. This approach not only enhances content discovery but also lays the groundwork for more advanced applications, such as personalized recommendations and intelligent query responses.
By adopting AI-powered search, we give users a more intuitive and effective way to discover relevant information, ultimately improving the overall user experience. HeatWave GenAI provides a robust and scalable solution for integrating advanced AI capabilities directly within the database.
For a deeper understanding, also consider reading these articles.
Building an Interactive LLM Chatbot with HeatWave Using Python
AI-powered applications require robust and scalable database solutions to manage and process large amounts of data efficiently. HeatWave is an excellent choice for such applications, providing high-performance OLTP, analytics, machine learning and generative artificial intelligence capabilities.
In this article, we will explore a Python 3 script that connects to an HeatWave instance and enables users to interact with different large language models (LLMs) dynamically. This script demonstrates how to:
Establish a connection with HeatWave using the mysql.connector module.
By the end of this article, you’ll have a deep understanding of how to integrate HeatWave with Python for AI-driven applications.
HeatWave Chat
HeatWave Chat enables you to engage in human-like conversations with an AI. Within a single session, you can ask multiple queries and receive relevant responses. This conversational agent leverages powerful LLMs to understand your input and generate natural-sounding replies. HeatWave Chat enhances the conversation by utilizing a chat history, allowing you to ask follow-up questions seamlessly. Furthermore, it employs vector search to access and utilize knowledge stored within its built-in vector store. All communication and processing occur securely within the HeatWave service, ensuring fast and reliable responses.
In this article I’m using HeatWave 9.2.0-u1-cloud.
I use this quote from (apparently) Linus Torvalds to warn you that I’m not a developer, so this code, available on my GitHub, is provided for illustrative purposes only. It may contain errors or limitations. Please use it at your own risk and adapt it to your specific needs (and feel free to share improvements back).
HeatWave GenAI and in-database Mistral LLM
Code Explanation
This Python script is a complete implementation of a chatbot system that interacts with HeatWave. It allows users to select and interact with different LLMs (Large Language Models).
1. Importing Required Modules
import mysql.connector
from typing import Tuple, Optional
from mysql.connector.cursor import MySQLCursor
from config.config_heatwave import DB_CONFIG # Import the MySQL configuration
mysql.connector: Used to interact with the HeatWave MySQL database.
typing (Tuple, Optional): Provides type hints for better code readability and maintainability.
MySQLCursor: A specific cursor class from mysql.connector for executing MySQL queries.
DB_CONFIG: Imported from the external config file, storing database credentials and settings.
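The config.config_heatwave module itself is not shown in the article. For illustration only, here is a hypothetical shape it might take — every value below is a placeholder, and the keys simply mirror common arguments accepted by mysql.connector.connect():

```python
# config/config_heatwave.py -- hypothetical example; adapt to your instance.
# All values below are placeholders, not real credentials.
DB_CONFIG = {
    "host": "10.0.0.1",       # HeatWave instance endpoint
    "port": 3306,             # MySQL protocol port
    "user": "admin",          # database user
    "password": "change-me",  # in practice, load secrets from a vault or env var
    "database": "Iris_AI",    # default schema
}
```

Keeping credentials in a separate module (excluded from version control) is what lets the main script stay free of secrets.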
2. Connecting to the HeatWave MySQL Database
def connect_to_mysql() -> mysql.connector.MySQLConnection:
    """Establish a connection to the MySQL database."""
    return mysql.connector.connect(**DB_CONFIG)
Establishes a MySQL connection using DB_CONFIG.
Returns a MySQLConnection object that will be used to execute queries.
3. Loading LLM Models into HeatWave
def load_llm(cursor: MySQLCursor, llm_options: Tuple[str, ...]) -> None:
    """Load language models into HeatWave."""
    sql_load_llm = 'sys.ML_MODEL_LOAD'
    for llm in llm_options:
        args = (llm, None)  # None is sent as SQL NULL (no model options)
        cursor.callproc(sql_load_llm, args)
        print(f"LLM Loaded: {llm}")
    print("All LLMs loaded successfully.")
sys.ML_MODEL_LOAD loads a large language model into the HeatWave Cluster.
callproc() executes the stored procedure for each LLM in llm_options.
Confirms successful loading with print statements.
4. Selecting an LLM Model
def select_llm(llm_options: Tuple[str, ...]) -> str:
    """
    Prompt the user to select an LLM from the provided options.
    Supports up to 4 choices.
    """
    option_map = {str(i + 1): llm for i, llm in enumerate(llm_options)}
    while True:
        # Dynamically build the prompt based on available options
        prompt = "Choose your LLM:\n"
        for i, llm in enumerate(llm_options):
            prompt += f"{i + 1}-({llm})\n"
        prompt += "Enter your choice: "
        choice = input(prompt)
        # Validate user input
        if choice in option_map:
            return option_map[choice]
        print(f"Invalid choice. Please select a number between 1 and {len(llm_options)}.")
Builds a numbered menu from the available options, validates the user’s input, and returns the selected model name.
Set Chat Options
def set_chat_options(cursor: MySQLCursor, llm: str) -> None:
    """Initialize or update the session variable 'chat_options'."""
    # get_chat_options() (defined in the full script) reads back @chat_options.
    chat_options = get_chat_options(cursor)
    if not chat_options:
        # Initialize @chat_options if not set
        options = f'{{"model_options": {{"model_id": "{llm}"}}}}'
        sql = f"SET @chat_options = '{options}'"
    else:
        # Update @chat_options if it already exists
        sql = f"SET @chat_options = JSON_SET(@chat_options, '$.model_options.model_id', '{llm}')"
    cursor.execute(sql)
    print(f"Using model: {llm}")
    print("-" * 40)
Initializes or updates the @chat_options session variable with the selected LLM.
Uses JSON_SET() to update an existing chat session.
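To make the two branches concrete, here is a small pure-Python sketch of the same logic — initializing the options JSON versus updating only model_id, as JSON_SET() does on the server side (the json module stands in for MySQL's JSON functions):

```python
import json

# Case 1: @chat_options is not set yet -- initialize it with the chosen model.
options = json.loads('{"model_options": {"model_id": "llama3-8b-instruct-v1"}}')

# Case 2: @chat_options already exists -- change only the model_id,
# mirroring JSON_SET(@chat_options, '$.model_options.model_id', ...),
# so the rest of the chat session (history, tables, etc.) is preserved.
options["model_options"]["model_id"] = "mistral-7b-instruct-v1"

print(options["model_options"]["model_id"])  # mistral-7b-instruct-v1
```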
8. Main Function Execution
def main() -> None:
    """Main function to run the LLM interaction."""
    try:
        with connect_to_mysql() as connection:
            with connection.cursor() as cursor:
                # Define available LLM options
                llm_options = ("llama3-8b-instruct-v1", "mistral-7b-instruct-v1",
                               "cohere.command-r-plus-08-2024", "meta.llama-3.1-70b-instruct")
                # Load LLMs
                load_llm(cursor, llm_options)
                # Prompt user to select an LLM
                selected_llm = select_llm(llm_options)
                # Set chat options for the selected LLM
                set_chat_options(cursor, selected_llm)
                # Begin chat loop
                while True:
                    chat(cursor, selected_llm)
    except mysql.connector.Error as err:
        print(f"Database error: {err}")
    except KeyboardInterrupt:
        print("\nExiting the application.")
    finally:
        print("Goodbye!")
Uses context managers (with statement) to manage MySQL connections safely.
Calls load_llm(), select_llm(), and set_chat_options() before starting the chat loop.
Handles database errors and user interruption (Ctrl+C).
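The chat() function called in the loop above is not reproduced in this excerpt. As a hypothetical sketch only — assuming sys.HEATWAVE_CHAT returns the generated answer as a single-row result set — it could look like this:

```python
def ask_heatwave_chat(cursor, question: str) -> str:
    """Send one question to HeatWave Chat and return the generated answer.
    `cursor` is any DB-API cursor connected to a HeatWave instance."""
    cursor.callproc("sys.HEATWAVE_CHAT", (question,))
    # HEATWAVE_CHAT returns its response as a one-row result set.
    for result in cursor.stored_results():
        row = result.fetchone()
        if row is not None:
            return row[0]
    return ""

def chat(cursor, llm: str) -> None:
    """One conversational turn, as called from the main loop above (a sketch
    only -- the original chat() implementation is not shown in this excerpt)."""
    question = input(f"[{llm}] You: ")
    print("Bot:", ask_heatwave_chat(cursor, question))
```

Because HeatWave Chat keeps the conversation state in @chat_options on the server side, the client only needs to send each new question.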
9. Running the Script
if __name__ == "__main__":
    main()
Ensures the script executes main() only when run directly.
Final Thoughts
You can further extend this solution to use HeatWave Chat with Retrieval-Augmented Generation (RAG). RAG is a technique that combines the power of Large Language Models with external knowledge bases. The primary goal of RAG is to enhance the quality, relevance, and accuracy of AI-generated responses by retrieving relevant information from these sources. You can find more information and a practical example in this article: Simplifying AI Development: A Practical Guide to HeatWave GenAI’s RAG & Vector Store Features
You can also support additional LLMs, APIs, or advanced user interfaces. To summarize, developers and data scientists can easily build intelligent applications that efficiently handle real-time AI interactions while leveraging the power of HeatWave.
Peroration
This Python script demonstrates that, with a few lines of code, you can easily build a simple but robust chatbot system leveraging HeatWave GenAI and its in-database or external (from the OCI Generative AI Service) LLMs.
It’s an effective way to integrate HeatWave (and its MySQL API) with Python for AI-driven apps like chatbot interactions. By leveraging stored procedures and session variables, it allows seamless LLM management and user interaction.
For a deeper understanding, also consider reading these articles.
This tutorial explores HeatWave GenAI, a cloud service that simplifies interacting with unstructured data using natural language. It combines large language models, vector stores, and SQL queries to enable tasks like content generation, chatbots, and retrieval-augmented generation (RAG). The focus is on RAG and how HeatWave GenAI’s architecture helps users gain insights from their data. By the end, you’ll learn how to implement these features effectively to unlock the full potential of generative AI for your applications.
HeatWave GenAI empowers you to interact with unstructured data seamlessly using natural language. Whether you’re performing content generation, summarization, sentiment analysis, or retrieval-augmented generation (RAG), HeatWave GenAI simplifies the process with its intuitive design and robust features.
Here’s why HeatWave GenAI stands out:
Natural-Language Queries: Communicate with your data effortlessly by running natural-language searches using in-database or external large language models (LLMs).
SQL Familiarity: The system leverages a SQL-based interface, making it accessible and easy to adopt for users already familiar with relational databases.
Integrated and Optimized Pipeline: All the necessary components—vector store generation, RAG-based vector search, and chatbot capabilities—are seamlessly integrated, ensuring smooth performance and ease of use.
End-to-End Generative AI Capabilities: HeatWave GenAI supports direct integration of generative AI into your applications, providing a complete pipeline for data retrieval, analysis, and interaction.
What is RAG?
RAG stands for Retrieval-Augmented Generation. It’s a technique that combines the power of Large Language Models (LLMs) with external knowledge bases. The primary goal of RAG is to enhance the quality, relevance, and accuracy of AI-generated responses by retrieving relevant information from these sources.
Essentially, it’s like giving an LLM access to a vast library, allowing it to provide more informative and contextually relevant answers.
How RAG Works
Data Retrieval: The system retrieves relevant information from a connected knowledge base, vector store, or database using a query.
Generation: The retrieved data is then used as context to guide the language model in generating a response.
Output: The result is a response that blends the creative capabilities of the LLM with factual, targeted insights from the retrieved data.
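The three steps above can be sketched with a tiny, dependency-free toy example — the "embeddings" are hand-made 2-D vectors rather than real model output, and the generation step is reduced to assembling the augmented prompt that would be handed to the LLM:

```python
import math

# Toy "vector store": segments with hand-made 2-D embeddings.
store = [
    ("Iris versicolor has blue flowers.", [0.9, 0.1]),
    ("HeatWave runs in OCI.",             [0.1, 0.9]),
]

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1 - dot / norm

def rag_prompt(query_embedding, question):
    # 1. Data Retrieval: find the closest segment in the store.
    context, _ = min(store, key=lambda s: cosine_distance(s[1], query_embedding))
    # 2. Generation: the retrieved segment becomes context for the LLM,
    #    which then produces the final output grounded in that context.
    return f"Context: {context}\nQuestion: {question}\nAnswer:"

# A question about irises embeds near [1, 0] in this toy space.
print(rag_prompt([1.0, 0.0], "What color are Iris versicolor flowers?"))
```

In HeatWave GenAI, the embedding, retrieval, and prompt assembly are all handled for you by sys.ML_RAG, as shown later in this article.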
Key Features of RAG
Accuracy: Responses are grounded in real-world data, reducing errors and hallucinations typical of standalone LLMs.
Relevance: Ensures the output directly addresses user queries by relying on contextually relevant information.
Scalability: Can handle large datasets stored in vector databases for quick, efficient searches.
Some Use Cases for RAG
Customer Support: Enhancing chatbot responses with real-time company knowledge or product details.
Content Creation: Generating articles or summaries based on specific datasets.
Research: Assisting with document retrieval and summarization for academic or industry research.
Personalization: Tailoring recommendations or messages based on customer data.
What is a Vector Store?
Vectors are numerical representations that capture the meaning, context, or characteristics of data, often used to represent text, images, or other complex data types. By storing these vectors, vector stores enable efficient similarity search, making it possible to find similar items based on their underlying semantic meaning.
Vector stores are crucial for modern AI applications like recommendation systems, natural language processing, and retrieval-augmented generation (RAG). They provide a powerful tool for understanding and analyzing complex data.
How a Vector Store Works
Data Vectorization: Input data, such as text or images, is transformed into vector embeddings using a model like a large language model (LLM) or an image encoder.
Storage: These embeddings are stored in the vector store, often alongside metadata about the original data.
Search and Retrieval: When a query is made, it is also converted into a vector, and the store performs similarity searches to retrieve the most relevant results based on distance measures like cosine similarity or Euclidean distance.
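A minimal toy illustration of steps 2 and 3 — storing embeddings alongside metadata and retrieving the k nearest entries by Euclidean distance (the vectors are hand-made stand-ins, not real embeddings):

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# 2. Storage: embeddings are kept alongside metadata about the original data.
vector_store = [
    {"embedding": [0.0, 1.0], "metadata": {"doc": "a.pdf", "segment": 1}},
    {"embedding": [1.0, 0.0], "metadata": {"doc": "b.pdf", "segment": 4}},
    {"embedding": [0.9, 0.2], "metadata": {"doc": "b.pdf", "segment": 7}},
]

# 3. Search: the query is embedded too, then the k nearest entries are returned.
def top_k(query_embedding, k=2):
    ranked = sorted(vector_store,
                    key=lambda e: euclidean(e["embedding"], query_embedding))
    return [e["metadata"] for e in ranked[:k]]

print(top_k([1.0, 0.1]))  # the two segments from b.pdf are closest
```

Production vector stores replace this linear scan with approximate nearest-neighbor indexes, but the principle is the same.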
Use Cases for Vector Stores
Search and Retrieval: Powering AI-driven search engines that find relevant documents, images, or videos based on content rather than keywords.
Recommendation Systems: Suggesting products, movies, or music by comparing user preferences to stored vectors.
Chatbots and Virtual Assistants: Retrieving knowledge or context to answer user queries more accurately.
Content Matching: Identifying similar or related content, such as plagiarism detection or duplicate image recognition.
What is HeatWave?
Oracle’s HeatWave is a fully managed Oracle Cloud Infrastructure service. It offers different solutions for different workloads:
HeatWave MySQL: built on MySQL, for your transactional needs.
HeatWave Analytics: get real-time analytics on your data.
HeatWave Lakehouse: query data in various formats in object storage.
HeatWave AutoML: automate the pipeline to build, train, and explain ML models.
HeatWave GenAI: integrated and automated GenAI with in-database LLMs.
HeatWave GenAI (from version 9.0.1 onward) is a powerful integrated platform that combines automated in-database large language models (LLMs), vector stores, and scalable vector processing. This simplifies content generation, enabling businesses to harness generative AI without requiring AI expertise, data transfer, or extra costs.
The key benefits of such architecture are:
Effortless LLM Integration: Simple SQL queries unlock AI power.
Seamless Database Integration: No complex external connections.
Simplified Architecture: Easy management and scalability.
Scalable Performance: Handles large datasets and high traffic.
Always Up-to-Date Data: In-database LLMs ensure data freshness.
HeatWave GenAI is available on major cloud platforms: Oracle Cloud Infrastructure (OCI), Amazon Web Services (AWS), and Microsoft Azure.
I’ll leverage HeatWave Lakehouse and GenAI capabilities.
HeatWave Lakehouse allows you to process data stored in OCI Object Storage. Data is retrieved from Object Storage, transformed into a memory-optimized format, and stored in the HeatWave persistence layer. This ensures efficient data processing, with the data loaded into the HeatWave cluster for real-time analytics and insights.
HeatWave Generative AI enables natural-language interactions with unstructured data, using SQL-based queries for content generation, summarization, and retrieval-augmented generation (RAG). This makes it easy to generate and extract valuable insights from your data with minimal complexity.
Preparation
The data sources that will augment the LLM are six PDF files based on the following Wikipedia pages:
In short, the PDF files (stored in the Object Storage bucket) will be uploaded to the HeatWave cluster, where they will be split into smaller segments, transformed into numerical representations (vector embeddings), and stored in a dedicated vector store table for efficient search and retrieval.
-- Load unstructured data (pdf) to HeatWave Lakehouse.
-- Ingesting Files Using Auto Parallel Load
SET @dl_tables = '[{
  "db_name": "Iris_AI",
  "tables": [{
    "table_name": "IrisAIembeddings",
    "engine_attribute": {
      "dialect": {"format": "pdf", "language": "en"},
      "file": [{"par": "https://objectstorage.eu-frankfurt.../Iris_AI/o/"}]
    }
  }]
}]';
SET @options = JSON_OBJECT('mode', 'normal');
CALL sys.HEATWAVE_LOAD(CAST(@dl_tables AS JSON), @options);
-- Checks (count embeddings):
SELECT count(*) FROM Iris_AI.IrisAIembeddings;
The vector store table is: Iris_AI.IrisAIembeddings. Its structure is:
SQL >
SHOW CREATE TABLE Iris_AI.IrisAIembeddings\G
*************************** 1. row ***************************
Table: IrisAIembeddings
Create Table: CREATE TABLE `IrisAIembeddings` (
`document_name` varchar(1024) NOT NULL COMMENT 'RAPID_COLUMN=ENCODING=VARLEN',
`metadata` json NOT NULL COMMENT 'RAPID_COLUMN=ENCODING=VARLEN',
`document_id` int unsigned NOT NULL,
`segment_number` int unsigned NOT NULL,
`segment` varchar(1024) NOT NULL COMMENT 'RAPID_COLUMN=ENCODING=VARLEN',
`segment_embedding` vector(384) NOT NULL COMMENT 'RAPID_COLUMN=ENCODING=VARLEN' /*!80021 ENGINE_ATTRIBUTE '{"model": "minilm"}' */,
PRIMARY KEY (`document_id`,`segment_number`)
) ENGINE=Lakehouse DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci SECONDARY_ENGINE=RAPID /*!80021 ENGINE_ATTRIBUTE='{"file": [{"par": "https://objectstorage.eu-frankfurt.../Iris_AI/o/"}], "dialect": {"format": "pdf", "language": "en", "is_strict_mode": false}}' */
Some data:
SELECT document_name, metadata, document_id, segment_number, segment FROM Iris_AI.IrisAIembeddings ORDER BY RAND() LIMIT 2\G
*************************** 1. row ***************************
document_name: https://objectstorage.eu-frankfurt.../Iris_AI/o/Iris_versicolor_Wikipedia.pdf
metadata: {"TITLE": "Iris versicolor - Wikipedia", "PUBLISHER": "Skia/PDF m128", "CREATIONDATE": " 9/14/2024 02:36:16 PM", "REVISIONDATE": " 9/14/2024 02:36:16 PM", "USERDEFINEDPROP": "Creator Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/128.0.0.0 Safari/537.36"}
document_id: 2
segment_number: 8
segment: The purple iris is the official flower of Kappa Pi International Honorary Art Fraternity.[13]
Iris versicolor 'Blue Flag'
Iris versicolor
Iris versicolor – botanical illustration in American Medicinal Plants, 1887
Saint-Prosper-de-Champlain, Quebec, Canada
Batiscan, Quebec floral emblem, Canada[14]
*************************** 2. row ***************************
document_name: https://objectstorage.eu-frankfurt.../Iris_versicolor_Wikispecies.pdf
metadata: {"TITLE": "Iris versicolor - Wikispecies", "PUBLISHER": "Skia/PDF m128", "CREATIONDATE": " 9/14/2024 02:39:44 PM", "REVISIONDATE": " 9/14/2024 02:39:44 PM", "USERDEFINEDPROP": "Creator Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/128.0.0.0 Safari/537.36"}
document_id: 3
segment_number: 6
segment: References: BRUMMITT, R.K. 2001. TDWG – World Geographical Scheme for Recording Plant Distributions, 2nd Edition (https://github.com/tdwg/wgsrpd/blob/master/109-488-1 -ED/2nd%20Edition/TDWG_geo2.pdf)
LINNAEUS, C. 1753. Species Plantarum. Tomus I: 39 (http://www.biodiversitylibrary.org/ite m/13829#page/51/mode/1up). Reference page.
Flora of North America Editorial Committee (2002). Flora of North America North of Mexico 26: 1-723. Oxford University Press, New York, Oxford.
You can update your vector store either by dropping the table and recreating it, or by using the refresh_external_tables option.
-- Updating the Vector Store using Auto Parallel Load
SET @dl_tables = '[
{ "db_name": "DBName", "tables": [ { "table_name": "VectorStoreTableName" }] }
]';
SET @options = JSON_OBJECT('mode', 'normal', 'refresh_external_tables', true);
CALL sys.HEATWAVE_LOAD(CAST(@dl_tables AS JSON), @options);
Running Retrieval-Augmented Generation
The ML_RAG routine performs retrieval-augmented generation (RAG), but the LLM that you want to use must be loaded first. You can get the list of the supported LLMs within HeatWave GenAI by querying the ML_SUPPORTED_LLMS table in the sys schema.
At the time of writing – December 2024 – the HeatWave in-database LLMs are:
mistral-7b-instruct-v1
llama3-8b-instruct-v1
llama2-7b-v1 (deprecated)
The HeatWave in-database embedding models are:
all_minilm_l12_v2
multilingual-e5-small
The other models are from the OCI Generative AI Service. They can also be used in the HeatWave workflow; however, their use will incur additional costs.
The comprehensive list of the languages, embedding models, and large language models (LLMs) that HeatWave GenAI supports is available here.
In order to run your RAG query, you should load the model; specify options such as the vector store table, the distance metric, the language, and generation parameters (temperature, top_k, top_p, repeat_penalty, etc.); then define your natural-language query; and finally run the sys.ML_RAG stored procedure:
-- Load the model: ex. Llama 3 8B
CALL sys.ML_MODEL_LOAD("llama3-8b-instruct-v1", NULL);
-- Specify the table for retrieving the vector embeddings and the
-- distance metric: 'COSINE', 'DOT', or 'EUCLIDEAN'
SET @irisAI_model_info = JSON_OBJECT(
    "vector_store", JSON_ARRAY("Iris_AI.IrisAIembeddings"),
    "distance_metric", "COSINE",
    "model_options", JSON_OBJECT("model_id", "llama3-8b-instruct-v1", "language", "en", "temperature", 0)
);
-- Define your natural language query
SET @query="Describe the appearance of the Iris versicolor flower";
-- Retrieve the prompt and print the output:
CALL sys.ML_RAG(@query, @output, @irisAI_model_info);
HeatWave GenAI currently supports 3 distance metrics: COSINE, DOT & EUCLIDEAN. Other options are available here.
The result of the query is stored in JSON format in the @output variable (in this example). This result includes the answer as well as citations referring to segments from the loaded documents (my PDF files). When dealing with JSON documents in MySQL, it is useful to know the following functions: JSON_PRETTY, JSON_UNQUOTE, and JSON_EXTRACT.
-- Print the outputs
SELECT JSON_PRETTY(@output)\G
*************************** 1. row ***************************
JSON_PRETTY(@output): {
"text": "\n\nAccording to the provided context, the well-developed blue flower of Iris versicolor has 6 petals and sepals that are spread out nearly flat. The longer sepals are hairless and have a greenish-yellow blotch at their base.",
"citations": [
{
"segment": "Iris versicolor is a flowering herbaceous perennial plant, growing 10–80 cm (4–31 in) high. It tends to form large clumps from thick, creeping rhizomes. The unwinged, erect stems generally have basal leaves that are more than 1 cm (1?2 in) wide. Leaves are folded on the midribs so that they form an overlapping flat fan. The well developed blue flower has 6 petals and sepals spread out nearly flat and have two forms. The longer sepals are hairless and have a greenish-yellow blotch at their base.",
"distance": 0.1099172830581665,
"document_name": "https://objectstorage.eu-frankfurt.../Iris_versicolor_Wikipedia.pdf"
},
{
"segment": "Iris versicolor\nIris versicolor\nTaxonavigation: Asparagales \nClassification System: APG IV Superregnum: Eukaryota Regnum: Plantae Cladus: Angiosperms Cladus: Monocots Ordo: Asparagales\nFamilia: Iridaceae Subfamilia: Iridoideae Tribus: Irideae Genus: Iris Subgenus: I. subg. Limniris Sectio: I. sect. Limniris Series: Iris ser. Laevigatae Species: Iris versicolor\nIris versicolor L., Sp. Pl. 1: 39 (1753).",
"distance": 0.2635544538497925,
"document_name": "https://objectstorage.eu-frankfurt.../Iris_versicolor_Wikispecies.pdf"
},
{
"segment": "Homotypic Iris versicolor f. virginica (L.) VOSS, Vilm. Blumengaertn., ed. 3. 1: 978. 1895. Iris versicolor var. virginica (L.) BAKER, J. Linn. Soc., Bot. 16: 141 (1877). Xiphion virginicum (L.) ALEF., Bot. Zeitung (Berlin) 21: 297 (1863).\nHeterotypic Iris albilinea ALEXANDER SMALL & ALEXANDER, Contr. New York Bot. Gard. 327: 349 (1931).\nTaxonavigation\nName\nSynonyms",
"distance": 0.27319884300231934,
"document_name": "https://objectstorage.eu-frankfurt.../Iris_virginica_Wikispecies.pdf"
}
],
"vector_store": [
"Iris_AI.IrisAIembeddings"
]
}
To display only the answer:
-- Get the answer from the generated JSON result.
SELECT JSON_UNQUOTE(JSON_EXTRACT(@output, '$.text')) AS answer\G
*************************** 1. row ***************************
answer:
According to the provided context, the well-developed blue flower of Iris versicolor has 6 petals and sepals that are spread out nearly flat. The longer sepals are hairless and have a greenish-yellow blotch at their base.
To display only the citations:
-- You can also get the citations referring to segments from the loaded documents.
SELECT JSON_UNQUOTE(JSON_EXTRACT(@output, '$.citations')) AS citations\G
*************************** 1. row ***************************
citations: [{"segment": "Iris versicolor is a flowering herbaceous perennial plant, growing 10–80 cm (4–31 in) high. It tends to form large clumps from thick, creeping rhizomes. The unwinged, erect stems generally have basal leaves that are more than 1 cm (1?2 in) wide. Leaves are folded on the midribs so that they form an overlapping flat fan. The well developed blue flower has 6 petals and sepals spread out nearly flat and have two forms. The longer sepals are hairless and have a greenish-yellow blotch at their base.", "distance": 0.1099172830581665, "document_name": "https://objectstorage.eu-frankfurt.../Iris_versicolor_Wikipedia.pdf"}, {"segment": "Iris versicolor\nIris versicolor\nTaxonavigation: Asparagales \nClassification System: APG IV Superregnum: Eukaryota Regnum: Plantae Cladus: Angiosperms Cladus: Monocots Ordo: Asparagales\nFamilia: Iridaceae Subfamilia: Iridoideae Tribus: Irideae Genus: Iris Subgenus: I. subg. Limniris Sectio: I. sect. Limniris Series: Iris ser. Laevigatae Species: Iris versicolor\nIris versicolor L., Sp. Pl. 1: 39 (1753).", "distance": 0.2635544538497925, "document_name": "https://objectstorage.eu-frankfurt.../Iris_versicolor_Wikispecies.pdf"}, {"segment": "Homotypic Iris versicolor f. virginica (L.) VOSS, Vilm. Blumengaertn., ed. 3. 1: 978. 1895. Iris versicolor var. virginica (L.) BAKER, J. Linn. Soc., Bot. 16: 141 (1877). Xiphion virginicum (L.) ALEF., Bot. Zeitung (Berlin) 21: 297 (1863).\nHeterotypic Iris albilinea ALEXANDER SMALL & ALEXANDER, Contr. New York Bot. Gard. 327: 349 (1931).\nTaxonavigation\nName\nSynonyms", "distance": 0.27319884300231934, "document_name": "https://objectstorage.eu-frankfurt.../Iris_virginica_Wikispecies.pdf"}]
Et voilà! As simple as that.
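If you call ML_RAG from an application rather than from the MySQL shell, you can fetch @output with a plain SELECT and do the same extraction client-side. A sketch using Python's json module on an abbreviated sample of the output shown above:

```python
import json

# Abbreviated sample of the JSON that sys.ML_RAG stores in @output
# (fetched from the server with, e.g., SELECT @output).
raw_output = '''{
  "text": "The well-developed blue flower has 6 petals and sepals spread out nearly flat.",
  "citations": [
    {"segment": "Iris versicolor is a flowering herbaceous perennial plant...",
     "distance": 0.1099,
     "document_name": "https://objectstorage.eu-frankfurt.../Iris_versicolor_Wikipedia.pdf"}
  ],
  "vector_store": ["Iris_AI.IrisAIembeddings"]
}'''

output = json.loads(raw_output)
answer = output["text"]                                    # like JSON_EXTRACT(@output, '$.text')
sources = [c["document_name"] for c in output["citations"]]
print(answer)
print(sources)
```

This is handy for rendering the answer and its citations separately in a web page or chatbot UI.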
Create your chatbot with HeatWave Chat
HeatWave Chat is an advanced conversational agent powered by LLMs. It utilizes RAG and a vector store to deliver accurate, data-driven insights. By simulating human-like conversations, it provides users with a dynamic and informative experience, offering deep knowledge and meaningful insights in a conversational format. HeatWave Chat is designed to understand user inputs and respond naturally. It enhances text generation by leveraging chat history for follow-up questions and utilizes vector search to access knowledge stored in its built-in vector store. With all communication and processing securely managed within the HeatWave service, HeatWave Chat delivers fast and reliable responses.
In order to run your chat, you should load the model; specify chat options such as the vector store table, the distance metric, the language, and generation parameters (temperature, top_k, top_p, repeat_penalty, etc.); then define your natural-language query; and finally run the sys.HEATWAVE_CHAT stored procedure:
-- Load the model: ex. Mistral 7B
CALL sys.ML_MODEL_LOAD("mistral-7b-instruct-v1", NULL);
-- Setup HeatWave Chat options
SET @chat_options = JSON_OBJECT(
    "tables", JSON_ARRAY(JSON_OBJECT("schema_name", "`Iris_AI`", "table_name", "`IrisAIembeddings`")),
    "model_options", JSON_OBJECT("model_id", "mistral-7b-instruct-v1", "language", "en", "temperature", 0)
);
-- Query in natural language (en, fr, de, es, hi...)
SET @question = "Where is Iris setosa native to?";
-- Add your query to HeatWave Chat
CALL sys.HEATWAVE_CHAT(@question);
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| response |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Iris setosa is native to a circum-arctic distribution ranging from coastal Aleutian islands, Alaska, Maine in USA, within Canada (including British Columbia, Newfoundland, Quebec, and the Yukon), within Russia (including near to the lower Lena River, Siberia), northeastern Asia, China (including Manchuria and east Jilin), Korea and southwards to Japan (including Honshu and Hokkaido). |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
-- Another query in natural language (en, fr, de, es, hi...)
SET @question = "How is Iris virginica used in Seminole culture?";
-- Add your query to HeatWave Chat
CALL sys.HEATWAVE_CHAT(@question);
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| response |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Answer: Iris virginica has a cultural significance in Seminole culture as it is one of the Iris species used to treat "shock following alligator-bite". However, there is limited information available on how Iris virginica is specifically used in Seminole culture for this purpose. |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
The HEATWAVE_CHAT routine leverages the @chat_options variable as a dynamic parameter store. This variable is populated with additional context and configuration settings, which are subsequently utilized to refine the response generation process.
SELECT JSON_PRETTY(@chat_options)\G
*************************** 1. row ***************************
JSON_PRETTY(@chat_options): {
"tables": [
{
"table_name": "`IrisAIembeddings`",
"schema_name": "`Iris_AI`"
}
],
"response": "\n\nAccording to the provided context, it is mentioned that Iris virginica may be one of the Iris species used by the Seminole tribe to treat \"shock following alligator-bite\". This suggests that the plant has been traditionally used by the Seminole people for medicinal purposes in treating a specific condition related to alligator bites.",
"documents": [
{
"id": "https://objectstorage.eu-frankfurt.../Iris_virginica_Wikipedia.pdf",
"title": "Iris_virginica_Wikipedia.pdf",
"segment": "p://herb.umd.umich.edu/herb/search.pl?searchstring=Iris+virginica) 12. Hamel, Paul B. and Mary U. Chiltoskey 1975 Cherokee Plants and Their Uses -- A 400 Year History. Sylva, N.C. Herald Publishing Co. (p. 41) 13. \"USDA Plants Database\" (https://plants.usda.gov/home/plantProfile?symbol=IRVI).\nplants.usda.gov.\nKemper Center for Home Gardening.org: Iris virginica (Virginia iris) (http://www.missouribota nicalgarden.org/PlantFinder/PlantFinderDetails.aspx?kempercode=y390)",
"distance": 0.2979528307914734
},
{
"id": "https://objectstorage.eu-frankfurt.../Iris_virginica_Wikipedia.pdf",
"title": "Iris_virginica_Wikipedia.pdf",
"segment": "It may be one of the Iris species used by the Seminole to treat \"shock following alligator-bite\".[13]\nIris virginica is one of three iris species in Ronald Fisher's Iris flower data set.",
"distance": 0.3149242401123047
},
{
"id": "https://objectstorage.eu-frankfurt.../Iris_virginica_Wikispecies.pdf",
"title": "Iris_virginica_Wikispecies.pdf",
"segment": "Retrieved from \"https://species.wikimedia.org/w/index.php?title=Iris_virginica&oldid=8735713\"\nThis page was last edited on 5 April 2022, at 21:56.\nText is available under the Creative Commons Attribution-ShareAlike License; additional terms may apply. By using this site, you agree to the Terms of Use and Privacy Policy.\nVernacular names",
"distance": 0.3498268127441406
}
],
"chat_history": [
{
"user_message": "Where is Iris setosa native to?",
"chat_query_id": "d2dd2c8e-b3e8-11ef-bdb3-020017081917",
"chat_bot_message": " Iris setosa is native to a circum-arctic distribution ranging from coastal Aleutian islands, Alaska, Maine in USA, within Canada (including British Columbia, Newfoundland, Quebec, and the Yukon), within Russia (including near to the lower Lena River, Siberia), northeastern Asia, China (including Manchuria and east Jilin), Korea and southwards to Japan (including Honshu and Hokkaido)."
},
{
"user_message": "How is Iris virginica used in Seminole culture?",
"chat_query_id": "7d4ea547-b3e9-11ef-bdb3-020017081917",
"chat_bot_message": " Answer: Iris virginica has a cultural significance in Seminole culture as it is one of the Iris species used to treat \"shock following alligator-bite\". However, there is limited information available on how Iris virginica is specifically used in Seminole culture for this purpose."
}
],
"model_options": {
"language": "en",
"model_id": "mistral-7b-instruct-v1",
"temperature": 0
},
"request_completed": true
}
The list of all the parameters that you can set in the @chat_options variable is available here and here.
You can modify chat parameters using the @chat_options variable, without resetting the chat session:
-- Modifying a chat parameter, here the model, without resetting the chat session:
-- Load the new model if needed
CALL sys.ML_MODEL_LOAD("llama3-8b-instruct-v1", NULL);
SET @chat_options = JSON_SET(@chat_options, '$.model_options.model_id', "llama3-8b-instruct-v1");
or by resetting the chat session:
-- Modifying a chat parameter, here the model, and resetting the chat session:
-- Load the new model if needed
CALL sys.ML_MODEL_LOAD("llama3-8b-instruct-v1", NULL);
SET @chat_options = JSON_OBJECT("model_options", JSON_OBJECT("model_id", "llama3-8b-instruct-v1", "language", "en", "temperature", 0));
As you have seen, HeatWave GenAI simplifies LLM integration with SQL queries, streamlines database connections, and offers a user-friendly architecture that lets you quickly develop applications.
That’s all for today—it’s time to wrap things up! 🙂
Peroration
HeatWave GenAI offers a seamless and powerful platform for integrating generative AI capabilities into your data workflows. By combining advanced features like retrieval-augmented generation (RAG), an integrated vector store, and user-friendly SQL-based interfaces, HeatWave GenAI empowers users to unlock insights from unstructured data with minimal complexity. Its efficient and secure in-database architecture ensures real-time performance, scalability, and always up-to-date data access.
Whether you aim to streamline content generation, enhance search and retrieval tasks, or build interactive chatbots, HeatWave GenAI provides all the tools you need within a single ecosystem. By leveraging the tutorials and techniques shared in this guide, you can fully harness the capabilities of HeatWave GenAI to transform how you interact with data and elevate your applications to new heights.
Key Features, Unmatched Benefits
Here’s what sets HeatWave GenAI apart:
Effortless LLM Integration: Harness the power of large language models with simple SQL queries.
Seamless Database Integration: No need for intricate external connections, simplifying your workflow.
Simplified Architecture: Easy to manage and scale, ensuring a smooth user experience.
Always Up-to-Date Data: In-database LLMs guarantee data freshness for accurate results.
So, you’ve probably heard about AI getting “smarter” by the day. Well, one cool thing it can do now is read between the lines of what people say. It’s like having a mind-reading machine for text! This new AI tech, called generative AI (or GenAI), can dive deep into what people are saying and tell us if they’re feeling positive, negative, or neutral. Let’s see how HeatWave GenAI can help you enhance your understanding of customer sentiment, improve decision-making, and drive business success.
What is AI Sentiment Analysis?
You can find a more formal and complete definition on Wikipedia, but in a few words, it is like having a computer that can understand human emotions. This technology can analyze text data, diving deep into what people are saying and telling us whether they’re happy, sad, or even mad.
By understanding the sentiment behind words, AI can help businesses gauge customer satisfaction, track brand reputation, and even predict market trends. It’s a powerful tool that has a wide range of applications in fields like marketing, social media, and customer service.
What is HeatWave?
HeatWave is a fully-managed Oracle Cloud Infrastructure service. It offers different solutions for different workloads:
HeatWave MySQL: built on MySQL, for your transactional needs.
HeatWave Analytics: get real-time analytics on your data.
HeatWave Lakehouse: query data in various formats in object storage.
HeatWave AutoML: automate the pipeline to build, train, and explain ML models.
HeatWave GenAI: integrated & automated GenAI with in-database LLMs.
Oracle’s HeatWave GenAI (starting with version 9.0.1) is at the forefront of this revolution, offering an integrated platform that combines in-database large language models (LLMs), vector stores, and scale-out vector processing to streamline content generation.
The key benefits of such architecture are:
Effortless LLM Integration: Simple SQL queries unlock AI power.
Seamless Database Integration: No complex external connections.
Simplified Architecture: Easy management and scalability.
Scalable Performance: Handles large datasets and high traffic.
Always Up-to-Date Data: In-database LLMs ensure data freshness.
In this article I’m using HeatWave 9.0.1.
For the purpose of this article, let’s consider a scenario where you’re a mobile phone manufacturer. In today’s digital age, it’s crucial to keep a pulse on customer sentiment. Your company actively monitors various social networks to stay informed about what customers are saying about your products. For example, short messages like:
Just got my hands on the latest mobile phone and I’m loving it! Thanks for creating such an awesome product!
A mobile phone that prioritizes functionality and reliability over flashy features.
Why does your mobile phone company insist on selling devices that are constantly glitching and breaking? Do better!
Use the ML_MODEL_LOAD stored procedure: CALL sys.ML_MODEL_LOAD('<LLM>', NULL); The model needs to be loaded only once before generating text. It remains accessible within HeatWave as long as the service is running. This means you don’t have to reload the model for each text generation.
At the time of writing (beginning of September 2024) the available models are Mistral 7B (mistral-7b-instruct-v1) and Llama 2 (llama2-7b-v1).
Mistral is loaded. Now you can think about defining your prompt.
Define your prompt in natural language
To make it more convenient, let’s use some variables. The short messages above are now:
mysql>
SET @customer_msgA="Just got my hands on the latest mobile phone and I'm loving it! Thanks for creating such an awesome product!" ;
SET @customer_msgB="A mobile phone that prioritizes functionality and reliability over flashy features." ;
SET @customer_msgC="Why does your mobile phone company insist on selling devices that are constantly glitching and breaking? Do better!" ;
The prompt contains a message and the inference instructions. Let’s start with a very simple instruction text, something like: What is the feeling from the text above?
mysql>
SET @instructions=" What is the feeling from the text above?";
SET @prompt=CONCAT(@customer_msgA, @instructions);
The final prompt is the concatenation of 2 variables: @customer_msg{X} and @instructions.
To get the optimal result, you may need to do some fine-tuning of the inference setup.
Inference Setting
I use generation as the task with the Mistral model, and I set temperature, top-k and top-p to zero because I do not want any creativity from the model 🙂 :
mysql>
SET @inferenceSetup = '{"task": "generation", "temperature": 0, "top_k": 0, "top_p": 0, "model_id": "mistral-7b-instruct-v1"}';
One way to display the result is to use the following query:
SELECT Answer->>"$.text" AS "Answer" FROM (SELECT sys.ML_GENERATE(@prompt, @inferenceSetup) AS Answer ) AS dt;
Result with: SET @customer_msgA="Just got my hands on the latest mobile phone and I'm loving it! Thanks for creating such an awesome product!" ;
mysql>
/* msg: "Just got my hands on the latest mobile phone and I'm loving it! Thanks for creating such an awesome product!" */
SELECT Answer->>"$.text" AS "Answer" FROM (SELECT sys.ML_GENERATE(@prompt, @inferenceSetup) AS Answer ) AS dt;
+---------------------------------------------------------+
| Answer                                                  |
+---------------------------------------------------------+
| The feeling expressed in the text is happiness or joy.  |
+---------------------------------------------------------+
With SET @customer_msgB="A mobile phone that prioritizes functionality and reliability over flashy features." ; (don't forget to rebuild @prompt with CONCAT):
mysql>
/* msg: "A mobile phone that prioritizes functionality and reliability over flashy features." */
SELECT Answer->>"$.text" AS "Answer" FROM (SELECT sys.ML_GENERATE(@prompt, @inferenceSetup) AS Answer ) AS dt\G
*************************** 1. row ***************************
Answer: The feeling conveyed in the text is one of practicality and pragmatism, with a focus on the importance of functionality and reliability in a mobile phone.
With SET @customer_msgC="Why does your mobile phone company insist on selling devices that are constantly glitching and breaking? Do better!" ;
mysql>
/* msg: "Why does your mobile phone company insist on selling devices that are constantly glitching and breaking? Do better!" */
SELECT Answer->>"$.text" AS "Answer" FROM (SELECT sys.ML_GENERATE(@prompt, @inferenceSetup) AS Answer ) AS dt\G
*************************** 1. row ***************************
Answer: The feeling conveyed in the text above is frustration and disappointment.
Amazing! The model provides a summary of the sentiment expressed in the message.
But what if we want to use this feature more programmatically? Well, we need to improve our prompt.
Sentiment Analysis in Code
Basically, we can ask the LLM to generate the result in a specific format other than raw text, such as JSON, CSV, or even a single integer.
Let’s say we have the following code:
Positive feeling: 1
Negative feeling: -1
Neutral feeling or unknown: 0
The goal is to automatically identify problematic messages, those that express non-positive sentiments, and then take action.
To do this, we only need to rewrite our prompt, improving the instruction part. For example we can have something like: What is the feeling from the text above? if it is positive write 1. If it is negative write -1. if it is neutral or if you don’t know write 0. Regarding to your analysis, write only -1 or 0 or 1
Still a basic prompt, far from being optimal, but good enough to keep this article simple and most important, to do the job.
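Since the prompt constrains the model to reply with only -1, 0, or 1, application code can consume the answer directly. Here is a minimal Python sketch (illustrative, not from the article) of that consumption step; the parse_sentiment and needs_review helpers are hypothetical names:

```python
def parse_sentiment(raw_answer: str) -> int:
    """Map an LLM reply onto the -1/0/1 sentiment code.

    The prompt asks the model to answer with only -1, 0 or 1, but we
    still defend against stray whitespace or unexpected text by
    falling back to 0 (neutral/unknown).
    """
    cleaned = raw_answer.strip()
    if cleaned in ("1", "0", "-1"):
        return int(cleaned)
    return 0  # neutral/unknown fallback


def needs_review(code: int) -> bool:
    """Non-positive messages are flagged for the support team."""
    return code < 1
```

The fallback to 0 matters: as discussed later in this article, the model may occasionally produce something other than the expected integer.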
mysql>
SET @instructions=" What is the feeling from the text above? if it is positive write 1. If it is negative write -1. if it is neutral or if you don't know write 0. Regarding to your analysis, write only -1 or 0 or 1";
All the other variables are the same as in the previous use case. Let’s see the result:
Result with: SET @customer_msgA="Just got my hands on the latest mobile phone and I'm loving it! Thanks for creating such an awesome product!" ;
mysql>
/* msg: "Just got my hands on the latest mobile phone and I'm loving it! Thanks for creating such an awesome product!" */
SELECT Answer->>"$.text" AS "Answer" FROM (SELECT sys.ML_GENERATE(@prompt, @inferenceSetup) AS Answer ) AS dt;
+--------+
| Answer |
+--------+
| 1      |
+--------+
With SET @customer_msgB="A mobile phone that prioritizes functionality and reliability over flashy features." ;
mysql>
/* msg: "A mobile phone that prioritizes functionality and reliability over flashy features." */
SELECT Answer->>"$.text" AS "Answer" FROM (SELECT sys.ML_GENERATE(@prompt, @inferenceSetup) AS Answer ) AS dt;
+--------+
| Answer |
+--------+
| 0      |
+--------+
With SET @customer_msgC="Why does your mobile phone company insist on selling devices that are constantly glitching and breaking? Do better!" ;
mysql>
/* msg: "Why does your mobile phone company insist on selling devices that are constantly glitching and breaking? Do better!" */
SELECT Answer->>"$.text" AS "Answer" FROM (SELECT sys.ML_GENERATE(@prompt, @inferenceSetup) AS Answer ) AS dt;
+--------+
| Answer |
+--------+
| -1     |
+--------+
Awesome! The model produces results in the exact format our code expects. It could have been JSON, for example, and the logic would have been the same.
With such output you can easily integrate HeatWave in your data pipeline.
Sentiment Analysis into your data workflow with HeatWave
Imagine we store our customer messages in a (MySQL) table. A possible workflow involves regularly scanning this table for negative feedback. Once identified, these messages can be referenced in (or moved to) a separate table. This new table can then be used to populate the customer service team’s dashboard. By tracking unsatisfied customers in this way, the team can work to improve overall customer satisfaction.
A simplified data architecture could be:
mysql>
CREATE SCHEMA sentimentAnalysis;
CREATE TABLE `sentimentAnalysis`.`tweet` (
`tweet_id` int unsigned NOT NULL AUTO_INCREMENT,
`customer_msg` varchar(4000) DEFAULT NULL,
PRIMARY KEY (`tweet_id`)
);
CREATE TABLE `sentimentAnalysis`.`tweet_to_review` (
`tweet_to_review_id` int unsigned NOT NULL AUTO_INCREMENT,
`tweet_id_to_check` int unsigned NOT NULL,
`analysis_result` tinyint NOT NULL,
PRIMARY KEY (`tweet_to_review_id`)
);
Table tweet contains the customer messages and a message ID. The problematic messages (i.e. status 0 and -1) are referenced in table tweet_to_review, using the message ID. We also store the result of the sentiment analysis (0 or -1). Support may prefer to focus first on messages that are definitely negative (status -1).
mysql>
INSERT `sentimentAnalysis`.`tweet` (customer_msg) VALUES("Just got my hands on the latest mobile phone and I'm loving it! Thanks for creating such an awesome product!");
INSERT `sentimentAnalysis`.`tweet` (customer_msg) VALUES("Loving the features on my new mobile phone! It's like a whole new world in the palm of my hand.");
INSERT `sentimentAnalysis`.`tweet` (customer_msg) VALUES("Your mobile phone company is a joke. Can't believe I wasted my hard-earned money on your crappy products.");
INSERT `sentimentAnalysis`.`tweet` (customer_msg) VALUES("Hey, just wanted to say how much I love your product! It's been a game changer for me.");
INSERT `sentimentAnalysis`.`tweet` (customer_msg) VALUES("Recently upgraded my old mobile phone to something newer - let's see how it goes");
INSERT `sentimentAnalysis`.`tweet` (customer_msg) VALUES("A mobile phone that prioritizes functionality and reliability over flashy features.");
INSERT `sentimentAnalysis`.`tweet` (customer_msg) VALUES("Why does your mobile phone company insist on selling devices that are constantly glitching and breaking? Do better!");
INSERT `sentimentAnalysis`.`tweet` (customer_msg) VALUES("Shoutout for making such an awesome product. I couldn't imagine my life without it!");
INSERT `sentimentAnalysis`.`tweet` (customer_msg) VALUES("Just another day dealing with the incompetence of my mobile phone company. Seriously considering switching to a different provider.");
INSERT `sentimentAnalysis`.`tweet` (customer_msg) VALUES("For those who want a device that just works.");
INSERT `sentimentAnalysis`.`tweet` (customer_msg) VALUES("Big thanks for creating a top-notch mobile phone. You guys never disappoint!");
INSERT `sentimentAnalysis`.`tweet` (customer_msg) VALUES("Just got my hands on a new mobile phone and I'm excited to see what it can do!");
Some data inserted into the tweet table. Most of this data was generated using AI Tweet Generator.
mysql>
SELECT * FROM `sentimentAnalysis`.`tweet` LIMIT 5\G
*************************** 1. row ***************************
tweet_id: 1
customer_msg: Just got my hands on the latest mobile phone and I'm loving it! Thanks for creating such an awesome product!
*************************** 2. row ***************************
tweet_id: 2
customer_msg: Loving the features on my new mobile phone! It's like a whole new world in the palm of my hand.
*************************** 3. row ***************************
tweet_id: 3
customer_msg: Your mobile phone company is a joke. Can't believe I wasted my hard-earned money on your crappy products.
*************************** 4. row ***************************
tweet_id: 4
customer_msg: Hey, just wanted to say how much I love your product! It's been a game changer for me.
*************************** 5. row ***************************
tweet_id: 5
customer_msg: Recently upgraded my old mobile phone to something newer - let's see how it goes
Sample of the data.
The workflow of our data pipeline is quite simple:
Get the tweet content from the source (table tweet)
Do the sentiment analysis using HeatWave GenAI
Write problematic tweets (status <> 0) into the target (table tweet_to_review) for further processing
It can be done with your favourite programming language. It can also be done on the database side using HeatWave MySQL stored routine capabilities. If you are a JavaScript fan, you can now write your stored routines in that language since HeatWave MySQL 9.0: JavaScript Stored Programs
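Before committing to one language, the three steps above can be prototyped in a few lines of Python. This is an illustrative sketch only: llm_sentiment is a toy stub standing in for the sys.ML_GENERATE call, and plain lists stand in for the tweet and tweet_to_review tables.

```python
# Stub for the in-database LLM call; in HeatWave this is sys.ML_GENERATE.
def llm_sentiment(message: str) -> int:
    """Toy keyword classifier standing in for the real model (illustrative only)."""
    if "loving" in message or "awesome" in message:
        return 1
    if "glitching" in message or "joke" in message:
        return -1
    return 0


def scan_tweets(tweets, tweet_to_review):
    """Steps 1-3: read each message, classify it, store problematic ones."""
    for tweet_id, customer_msg in tweets:      # 1. get the tweet content
        result = llm_sentiment(customer_msg)   # 2. sentiment analysis
        if result < 1:                         # 3. keep non-positive tweets
            tweet_to_review.append((tweet_id, result))


tweets = [
    (1, "Just got my hands on the latest mobile phone and I'm loving it!"),
    (2, "A mobile phone that prioritizes functionality and reliability."),
    (3, "Devices that are constantly glitching and breaking? Do better!"),
]
tweet_to_review = []
scan_tweets(tweets, tweet_to_review)
# tweet_to_review now holds (2, 0) and (3, -1)
```

In the real pipeline, the stub is replaced by the ML_GENERATE call shown earlier, and the lists by SQL reads and writes.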
That said, for the love of the (SQL) art, let’s try something in SQL. But before diving into the code, let’s break down the 3 steps above.
Get the tweet content from the source (table tweet)
In order to have a flexible stored procedure, the input parameters are:
source schema name: VARCHAR (sentimentAnalysis)
source table name: VARCHAR (tweet)
source column message name: VARCHAR (customer_msg)
source column primary key name: VARCHAR (tweet_id)
tweet ID: INTEGER UNSIGNED (the message to analyze)
target schema name: VARCHAR (sentimentAnalysis)
target table name: VARCHAR (tweet_to_review)
target column result name: VARCHAR (analysis_result)
To get the tweet content from the source the query is:
SELECT <source column message> FROM <source schema name>.<source table name> WHERE <source column primary key name>=<tweet ID> INTO @customer_msg
Do the sentiment analysis using HeatWave GenAI
We’ve already seen this part:
Prompt instruction: instructions=" What is the feeling from the text above? if it is positive write 1. If it is negative write -1. if it is neutral or if you don't know write 0. Regarding to your analysis, write only -1 or 0 or 1";
Add the tweet to the prompt: prompt=CONCAT('"', @customer_msg, instructions, '"');
And the inference: SELECT Answer->>"$.text" AS "Answer" FROM (SELECT sys.ML_GENERATE(prompt, inferenceSetup) AS Answer ) AS dt INTO @answer ;
Write problematic tweets into the target for further processing
And finally store the ID of the problematic message (tweet ID) and its status (-1 or 0):
INSERT INTO <target schema name>.<target table name> (<target column message ID name>, <target column result name>) VALUES (<tweetID>, <@answer>)
How does it work?
The stored routine is called genai_sentimentAnalysis. In the following examples, it is run with tweet IDs 1, 3 and 10:
mysql>
CALL genai_sentimentAnalysis("sentimentAnalysis", "tweet", "customer_msg", "tweet_id", 1, "sentimentAnalysis", "tweet_to_review", "analysis_result");
+-----------------------------------------------+
| INFO                                          |
+-----------------------------------------------+
| Tweet number 1 has been analyzed as POSITIVE  |
+-----------------------------------------------+
1 row in set (2.4341 sec)
Query OK, 0 rows affected (2.4341 sec)
CALL genai_sentimentAnalysis("sentimentAnalysis", "tweet", "customer_msg", "tweet_id", 3, "sentimentAnalysis", "tweet_to_review", "analysis_result");
+-----------------------------------------------+
| INFO                                          |
+-----------------------------------------------+
| Tweet number 3 has been analyzed as NEGATIVE  |
+-----------------------------------------------+
1 row in set (1.9574 sec)
Query OK, 0 rows affected (1.9574 sec)
CALL genai_sentimentAnalysis("sentimentAnalysis", "tweet", "customer_msg", "tweet_id", 10, "sentimentAnalysis", "tweet_to_review", "analysis_result");
+-----------------------------------------------+
| INFO                                          |
+-----------------------------------------------+
| Tweet number 10 has been analyzed as NEUTRAL  |
+-----------------------------------------------+
1 row in set (1.8339 sec)
Query OK, 0 rows affected (1.8339 sec)
The stored procedure output is a log that is displayed on the console. The first call generates 1 (tweet ID 1 is positive), the second -1 (tweet ID 3 is negative) and the third 0 (tweet ID 10 is neutral).
Only tweets 3 and 10 are selected for the customer service team.
For their dashboard you can provide more details with a query like:
mysql>
SELECT tweet_id_to_check, analysis_result, customer_msg
FROM tweet_to_review INNER JOIN tweet ON tweet_id=tweet_id_to_check
ORDER BY analysis_result\G
*************************** 1. row ***************************
tweet_id_to_check: 3
analysis_result: -1
customer_msg: Your mobile phone company is a joke. Can't believe I wasted my hard-earned money on your crappy products.
*************************** 2. row ***************************
tweet_id_to_check: 10
analysis_result: 0
customer_msg: For those who want a device that just works.
Show me the code
Here is the stored procedure used. As a disclaimer, this code is not meant to run in production; it is for demonstration purposes only 🙂
mysql>
DELIMITER $$
CREATE DEFINER=`MyHeatwaveUser`@`%` PROCEDURE `genai_sentimentAnalysis`(IN _sourceSchema VARCHAR(64), IN _sourceTable VARCHAR(64), IN _sourceColMsg VARCHAR(64), IN _sourceColPK VARCHAR(64), IN _tweetID INTEGER UNSIGNED, IN _targetSchema VARCHAR(64), IN _targetTable VARCHAR(64), IN _targetColResult VARCHAR(64))
SQL SECURITY INVOKER
BEGIN
DECLARE sourceTableName, targetTableName, targetCols VARCHAR(129);
DECLARE instructions, inferenceSetup VARCHAR(512);
DECLARE prompt VARCHAR(5000);
DECLARE rep VARCHAR(100);
/* Get the tweet content from source table */
SET sourceTableName = CONCAT("`",_sourceSchema, "`.`", _sourceTable, "`");
SET @txtStmt = CONCAT("SELECT ", _sourceColMsg, " INTO @customer_msg FROM ", sourceTableName, " WHERE ", _sourceColPK, " = ", _tweetID);
PREPARE getTweet FROM @txtStmt;
EXECUTE getTweet;
DEALLOCATE PREPARE getTweet;
/* Sentiment analysis using HeatWave GenAI */
-- Prompt instruction
SET instructions=" What is the feeling from the text above? if it is positive write 1. If it is negative write -1. if it is neutral or if you don't know write 0. Regarding to your analysis, write only -1 or 0 or 1";
-- Add the tweet to the prompt
SET prompt=CONCAT('"', @customer_msg, instructions, '"');
-- Inference setup
SET inferenceSetup = '\'{"task": "generation", "temperature": 0, "top_k": 0, "top_p": 0, "model_id": "mistral-7b-instruct-v1"}\'';
-- Inference: if positive return 1 / if negative return -1 / if neutral or unknown return 0
SET @txtStmt = CONCAT(' SELECT Answer->>"$.text" AS Answer FROM (SELECT sys.ML_GENERATE(', prompt, ' , ', inferenceSetup, ' ) AS Answer ) AS dt INTO @answer');
PREPARE analysis FROM @txtStmt;
EXECUTE analysis;
DEALLOCATE PREPARE analysis;
/* Write problematic tweets (status <> 0) into the target table for further processing */
SET rep = CONCAT('Tweet number ', _tweetID, ' has been analyzed as ');
IF @answer < 1 THEN
BEGIN
SET targetTableName = CONCAT("`",_targetSchema, "`.`", _targetTable, "`");
SET targetCols = CONCAT("(tweet_id_to_check, ", _targetColResult,")");
SET @txtStmt = CONCAT('INSERT INTO ', targetTableName, targetCols, ' VALUES (', _tweetID, ', ', @answer, ')');
PREPARE ins FROM @txtStmt;
EXECUTE ins;
DEALLOCATE PREPARE ins;
/* Logging */
CASE @answer
WHEN 0 THEN SELECT CONCAT(rep, ' NEUTRAL ') AS 'INFO';
WHEN -1 THEN SELECT CONCAT(rep, ' NEGATIVE ') AS 'INFO';
END CASE;
END;
ELSE SELECT CONCAT(rep, ' POSITIVE ') AS 'INFO';
END IF;
END$$
DELIMITER ;
As you see, all the logic is coded inside the procedure. It would probably make sense to break the different tasks into separate procedures/functions.
Areas for improvement
As you can imagine, the prompt is crucial. While I’ve tried to keep things simple for this tutorial, effective prompt engineering is essential for getting the desired results from an LLM.
Remember to review the LLM’s output carefully. Even though we expect an integer, the model might generate something unexpected (not handled in the stored procedure).
As of HeatWave MySQL 9.0.0, you can write stored routines in JavaScript. However, stored procedures aren’t mandatory. Choose the language that best suits your needs and start experimenting! 🙂
Peroration
In this article, we’ve explored how to leverage Oracle HeatWave GenAI to perform sentiment analysis on text data. By integrating in-database large language models, HeatWave offers an easy, streamlined, and efficient solution for understanding customer sentiment.
We’ve covered the essential steps involved, from loading the LLM into HeatWave memory to crafting effective prompts and analyzing the model’s output. By following these guidelines, you can easily incorporate sentiment analysis into your data pipeline and gain valuable insights into customer feedback.
Remember that effective prompt engineering is key to obtaining accurate and meaningful results. Experiment with different prompts and refine your approach to achieve the desired outcomes.
With HeatWave GenAI, you have a powerful tool at your disposal for extracting valuable information from text data. By harnessing its capabilities, you can enhance your understanding of customer sentiment, improve decision-making, and drive business success.
HeatWave offers different solutions for your different workloads:
HeatWave MySQL: built on MySQL, for your transactional needs.
HeatWave Analytics: get real-time analytics on your data.
HeatWave Lakehouse: query data in various formats in object storage.
HeatWave AutoML: automate the pipeline to build, train, and explain ML models.
HeatWave GenAI: integrated & automated GenAI with in-database LLMs.
In this article, you’ll experience the power of in-database translation with HeatWave GenAI. I am using HeatWave 9.0.1.
As demonstrated in my previous article, HeatWave GenAI: Your AI-Powered Content Creation Partner, HeatWave GenAI streamlines content generation by seamlessly integrating LLMs into the database. This innovative approach eliminates the complexities of external connections, allowing for effortless AI integration via straightforward SQL queries.
While HeatWave GenAI excels at generating English text, its capabilities extend to translation as well. Let’s explore how we can effectively translate English content into French using this powerful tool.
Load the LLM in HeatWave memory
Use the ML_MODEL_LOAD stored procedure: CALL sys.ML_MODEL_LOAD('<LLM>', NULL); The model needs to be loaded only once before generating text. It remains accessible within HeatWave as long as the service is running. This means you don’t have to reload the model for each text generation.
At the time of writing (August 2024) the available models are Mistral 7B (mistral-7b-instruct-v1) and Llama 2 (llama2-7b-v1).
-- Load Mistral 7B model
mysql>
CALL sys.ML_MODEL_LOAD('mistral-7b-instruct-v1', NULL);
Define your prompt in natural language
We obviously need a text to translate:
mysql>
-- Text for Translation
SET @ENtext = "Dracula is a gothic horror novel by Bram Stoker, published on 26 May 1897. An epistolary novel, the narrative is related through letters, diary entries, and newspaper articles. It has no single protagonist and opens with solicitor Jonathan Harker taking a business trip to stay at the castle of a Transylvanian nobleman, Count Dracula. Harker escapes the castle after discovering that Dracula is a vampire, and the Count moves to England and plagues the seaside town of Whitby. A small group, led by Abraham Van Helsing, investigate, hunt and kill Dracula.";
With the text ready for translation, we need to provide clear instructions to the model about the desired output. This is actually the most difficult part: the quality of the translation is highly impacted by the quality of your prompt.
For now, let’s try a very simple prompt like: "Translate the following text into French, ensuring natural and grammatically correct phrasing". It will be concatenated (CONCAT) with the provided English text:
mysql>
-- Enhanced prompt
SET @finalPrompt = CONCAT("Translate the following text into French, ensuring natural and grammatically correct phrasing: ", @ENtext);
Inference setting
Here, you select the desired task, choose the appropriate model, and fine-tune parameters to influence the output.
For simplicity, this example uses the default tuning parameters, only specifying the task and the model:
mysql>
SET @inferenceSetup = '{"task": "generation", "model_id": "mistral-7b-instruct-v1"}';
A more complex example is provided below.
Generate the translation using HeatWave GenAI
And finally, send the enhanced prompt to the LLM using the ML_GENERATE function. The second parameter of this function is a JSON object that allows you to specify the task, the LLM, and the tuning options.
mysql>
-- Translation
SELECT sys.ML_GENERATE(@finalPrompt, @inferenceSetup)\G
*************************** 1. row ***************************
sys.ML_GENERATE(@finalPrompt, @inferenceSetup): {"text": " Réponse : Le roman gothique horreur de Bram Stoker, intitulé « Dracula », a été publié le 26 mai 1897. C'est un roman épistolaire dont la narration se fait par les lettres, les entrées dans le journal et les articles de presse. Il n'a pas d'héroïne principale et commence avec l'avocat Jonathan Harker qui effectue une visite d'affaires au château d'un nobleman transylvanien, le comte Dracula. Harker s'évade du château après avoir découvert que Dracula est un vampire, et le comte se rend en Angleterre et pille la ville côtière de Whitby. Un petit groupe, dirigé par Abraham Van Helsing, enquête, chasse et tue Dracula."}
The output is a JSON document that is not really nice to read. But SQL is very powerful, and you can easily improve the output using the JSON_TABLE function.
SELECT TranslationFR AS TranslationFR FROM ( SELECT sys.ML_GENERATE(@finalPrompt, @inferenceSetup) AS textGen ) AS dt INNER JOIN JSON_TABLE ( dt.textGen, "$" COLUMNS( TranslationFR text PATH "$.text") ) AS jt \G
*************************** 1. row ***************************
TranslationFR: Réponse : Le roman gothique horreur de Bram Stoker, intitulé « Dracula », a été publié le 26 mai 1897. C'est un roman épistolaire dont la narration se fait par les lettres, les entrées dans le journal et les articles de presse. Il n'a pas d'héroïne principale et commence avec l'avocat Jonathan Harker qui effectue une visite d'affaires au château d'un nobleman transylvanien, le comte Dracula. Harker s'évade du château après avoir découvert que Dracula est un vampire, et le comte se rend en Angleterre et pille la ville côtière de Whitby. Un petit groupe, dirigé par Abraham Van Helsing, enquête, chasse et tue Dracula.
An alternative is the following query:
SELECT TranslationFR->>"$.text" AS "TranslationFR" FROM (SELECT sys.ML_GENERATE(@finalPrompt, @inferenceSetup) AS TranslationFR ) AS dt \G
*************************** 1. row ***************************
TranslationFR: Réponse : Le roman gothique horreur de Bram Stoker, intitulé « Dracula », a été publié le 26 mai 1897. C'est un roman épistolaire dont la narration se fait par les lettres, les entrées dans le journal et les articles de presse. Il n'a pas d'héroïne principale et commence avec l'avocat Jonathan Harker qui effectue une visite d'affaires au château d'un nobleman transylvanien, le comte Dracula. Harker s'évade du château après avoir découvert que Dracula est un vampire, et le comte se rend en Angleterre et pille la ville côtière de Whitby. Un petit groupe, dirigé par Abraham Van Helsing, enquête, chasse et tue Dracula.
Although the translation isn’t perfect, French speakers should grasp the gist, even if it reads like a particularly polite Advance-fee scam. 😀
Let’s see how to improve that!
Areas for improvement
You now have all the essential components to implement real-time translation within your applications.
Although the generated output may not always align with expectations, you can refine results by adjusting the prompt or modifying inference parameters.
Inference parameters for text generation include:
task
model_id
temperature
max_tokens
top_k
top_p
repeat_penalty
stop_sequences
token_likelihoods
echo
logprobs
stream
context
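These options end up as a JSON object passed as the second argument of ML_GENERATE. If you drive HeatWave from a client application, you could assemble that JSON in Python; the parameter names below come from the list above, but the values are arbitrary examples, not tuning recommendations.

```python
import json

# Illustrative inference options; names come from the parameter list above,
# values are arbitrary examples, not tuning advice.
inference_setup = {
    "task": "generation",
    "model_id": "mistral-7b-instruct-v1",
    "temperature": 0,
    "top_k": 0,
    "top_p": 0,
    "max_tokens": 400,
    "repeat_penalty": 1.1,
}

# Serialize for use as the second argument of sys.ML_GENERATE.
options_json = json.dumps(inference_setup)
```

Building the JSON programmatically avoids quoting mistakes when the options string is embedded in a SQL statement.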
While prompt engineering is a critical skill, it’s beyond the scope of this article. I strongly recommend dedicating time to mastering this discipline for optimal results.
Let’s try to do some fine-tuning.
The text for translation is the same:
-- Text for Translation
SET @ENtext = "Dracula is a gothic horror novel by Bram Stoker, published on 26 May 1897. An epistolary novel, the narrative is related through letters, diary entries, and newspaper articles. It has no single protagonist and opens with solicitor Jonathan Harker taking a business trip to stay at the castle of a Transylvanian nobleman, Count Dracula. Harker escapes the castle after discovering that Dracula is a vampire, and the Count moves to England and plagues the seaside town of Whitby. A small group, led by Abraham Van Helsing, investigate, hunt and kill Dracula.";
The enhanced prompt gives the model a role and much more precise instructions: it asks the model to preserve the original text’s meaning, style, and context, and to generate only the French translation:
-- Enhanced prompt
SET @finalPrompt = CONCAT("You are an expert translator from English to French. Your task is to translate the original text from English to French. Ensure the French translation is accurate, natural-sounding, and grammatically correct. Preserve the original text's meaning, style, and context. Only generate the French translation of the original text. \n - Original Text: ", @ENtext, " \n - French Translation: ");
I modified the inference parameters – temperature, repeat_penalty, top_k and top_p:
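The inference setup itself is not reproduced here; as an illustrative sketch (the exact values are hypothetical and should be tuned to your needs), it could look like:

```sql
-- Illustrative inference setup for translation (hypothetical values)
SET @inferenceSetup = '{"task": "generation", "model_id": "mistral-7b-instruct-v1",
                        "temperature": 0.1, "repeat_penalty": 1.1,
                        "top_k": 1, "top_p": 0.2, "max_tokens": 600}';
```

Lower temperature and top_p values generally make the output more deterministic, which tends to suit translation tasks.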
-- Translation
SELECT TranslationFR->>"$.text" AS "TranslationFR" FROM (SELECT sys.ML_GENERATE(@finalPrompt, @inferenceSetup) AS TranslationFR ) AS dt \G
*************************** 1. row ***************************
TranslationFR: Dracula est un roman gothique d'horreur de Bram Stoker, publié le 26 mai 1897. C'est un roman épistolaire, la narration se fait par des lettres, des entrées de journal et des articles de journaux. Il n'a pas de héros principaux et commence par le soliciteur Jonathan Harker qui se rend à l'abbaye du comte Dracula, un nobleman de Transylvanie. Harker s'évade de l'abbaye après avoir découvert que Dracula est un vampire, et le comte s'installe en Angleterre et pille la ville côtière de Whitby. Un petit groupe, dirigé par Abraham Van Helsing, enquête, chasse et tue Dracula.
The results show an improvement. Further refinements can be achieved by optimizing inference parameters and prompt engineering.
Peroration
This article explored the use of HeatWave GenAI for text translation. By integrating LLMs into the database, users can efficiently translate text. While this article provided a basic overview, further experimentation with prompts and parameters can enhance results.
By leveraging the power of in-database AI, organizations can unlock new opportunities for global communication and collaboration.
Generative artificial intelligence (GenAI) is reshaping the content creation landscape. By training on vast datasets, these “intelligent” systems can produce new, human-quality content across a multitude of domains.
Oracle’s HeatWave GenAI (starting with version 9.0.1) is at the forefront of this revolution, offering an integrated platform that combines in-database large language models (LLMs), vector stores, and scale-out vector processing to streamline content generation.
This article explores how HeatWave GenAI is empowering businesses to produce high-quality content rapidly and effectively, making it an indispensable tool for industries demanding speed, accuracy, and security.
HeatWave GenAI brings LLMs directly into your database, enabling powerful AI capabilities and natural language processing.
HeatWave GenAI Content Generation
The key benefits of such architecture are:
Effortless LLM Integration: Simple SQL queries unlock AI power.
Seamless Database Integration: No complex external connections.
Simplified Architecture: Easy management and scalability.
Scalable Performance: Handles large datasets and high traffic.
Always Up-to-Date Data: In-database LLMs ensure data freshness.
Generating text-based content with HeatWave GenAI is a simple four-step process:
Load the LLM in HeatWave memory
Define your prompt in natural language
Inference setting
Generate text-based content using HeatWave GenAI
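Condensed into a single session (using the model and prompt from the examples that follow), the four steps look like this:

```sql
-- 1. Load the LLM into HeatWave memory (once per service lifetime)
CALL sys.ML_MODEL_LOAD('mistral-7b-instruct-v1', NULL);
-- 2. Define the prompt in natural language
SET @userPrompt = "Explain what is Generative Artificial intelligence in one sentence";
-- 3. Inference setting: task and model, default tuning parameters
SET @inferenceSetup = '{"task": "generation", "model_id": "mistral-7b-instruct-v1"}';
-- 4. Generate the content
SELECT sys.ML_GENERATE(@userPrompt, @inferenceSetup) AS textGen;
```

Each step is detailed in the sections below.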
Load the LLM in HeatWave memory
Use the ML_MODEL_LOAD stored procedure: CALL sys.ML_MODEL_LOAD('<LLM>', NULL); The model needs to be loaded only once before generating text, and it remains accessible within HeatWave as long as the service is running. This means you don't have to reload the model for each text generation.
At the time of writing (August 2024) the available models are Mistral 7B (mistral-7b-instruct-v1) and Llama 2 (llama2-7b-v1).
-- Load Mistral 7B model
mysql>
CALL sys.ML_MODEL_LOAD('mistral-7b-instruct-v1', NULL);
Define your prompt in natural language
As with your preferred programming language, it is more convenient to use a session variable: SET @userPrompt="My prompt";
mysql>
SET @userPrompt="Explain what is Generative Artificial intelligence in one sentence";
or whatever fits your needs:
SET @userPrompt="How to connect to MySQL using Python 3?";
SET @userPrompt="Write a short email to schedule a meeting.";
...
As you likely know, the quality of the prompt significantly impacts the generated text.
Inference setting
Here, you select the desired task, choose the appropriate model, and fine-tune parameters to influence the output.
For simplicity, this example uses default parameters. A more complex example is provided below.
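Even when relying on the defaults, the @inferenceSetup variable used in the next step must still name the task and the model; a minimal setting looks like this:

```sql
-- Minimal inference setting: task and model only, defaults for everything else
SET @inferenceSetup = '{"task": "generation", "model_id": "mistral-7b-instruct-v1"}';
```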
Generate text-based content using HeatWave GenAI
The last step is to pass the user prompt to the LLM using the ML_GENERATE function. The second parameter of this function is a JSON object that lets you specify the task, the LLM, and some tuning options such as the temperature.
The raw output is a JSON document that is not very pleasant to read, but SQL is powerful and you can easily improve the presentation using the JSON_TABLE function.
mysql>
SELECT GenContent AS GenContent FROM ( SELECT sys.ML_GENERATE(@userPrompt, @inferenceSetup) AS textGen ) AS dt INNER JOIN JSON_TABLE ( dt.textGen, "$" COLUMNS( GenContent text PATH "$.text") ) AS jt \G
HeatWave GenAI Content Generation
You can influence LLM outputs by adjusting parameters like temperature, number of tokens, stop sequences, top_k, top_p, etc.
Text generation options include:
task
model_id
temperature
max_tokens
top_k
top_p
repeat_penalty
stop_sequences
token_likelihoods
echo
logprobs
stream
context
Below is a quick example:
mysql>
-- Define your prompt in natural language
SET @userPrompt="Explain what is Generative Artificial intelligence in one sentence";
-- Inference setting
SET @inferenceSetup = '{"task": "generation", "temperature": 3, "repeat_penalty": 1, "top_k": 1, "max_tokens": 500, "model_id": "mistral-7b-instruct-v1"}';
-- Generate text-based content
SELECT GenContent AS GenContent FROM ( SELECT sys.ML_GENERATE(@userPrompt, @inferenceSetup) AS textGen ) AS dt INNER JOIN JSON_TABLE ( dt.textGen, "$" COLUMNS( GenContent text PATH "$.text") ) AS jt \G
*************************** 1. row ***************************
GenContent: Generative Artificial Intelligence (AI) refers to a subset of AI that involves the creation of new, original content, such as images, music, or text, through the use of algorithms and machine learning models that can learn from and generate data.
HeatWave GenAI Content Generation with fine tuning
And that's all! As you can see, HeatWave GenAI makes it very simple to create exciting GenAI-based applications, such as:
Enhancing Customer Experience
Intelligent Chatbots: Provide real-time, personalized customer support, answering queries, and resolving issues efficiently.
Content Generation: Create engaging blog articles, product descriptions, and social media posts at scale.
Personalized Recommendations: Offer tailored product suggestions based on customer preferences and behavior.
Social Media Management: Generate engaging content, schedule posts, and monitor performance.
Driving Business Insights
Text Classification: Categorize vast amounts of text data for efficient analysis and decision-making.
Market Research: Analyze customer sentiment, identify trends, and gain competitive insights.
Generative AI is a game-changer, offering endless possibilities to improve efficiency, enhance customer experiences, and drive business growth.
HeatWave GenAI: Your AI-Powered Content Creation Partner demo
Moving forward
While the examples provided offer a glimpse of generative AI’s potential, it’s important to note that significant improvements can be achieved through advanced prompt engineering techniques. Exploring these techniques in depth is beyond the scope of this article.
In upcoming articles, we’ll delve deeper into HeatWave GenAI’s capabilities, showcasing its application in text translation and Retrieval Augmented Generation (RAG).
MySQL Replication has existed for centuries… OK, I'm exaggerating a little bit 🙂 but this feature is as appreciated as it is useful, because it helps DBAs easily build scalable and redundant architectures:
MySQL 3.23.15 and up features support for one-way, asynchronous replication, in which one server acts as the master, while one or more other servers act as slaves
But what does replication do? Quoting the documentation again (the most recent this time)
Replication enables data from one MySQL database server (the source) to be copied to one or more MySQL database servers (the replicas). Replication is asynchronous by default; replicas do not need to be connected permanently to receive updates from a source.
MySQL 8.4 Reference Manual
In this article we will see how to replicate data from HeatWave MySQL to an on-premises standalone MySQL instance.
Source is HeatWave MySQL 8.4.1
IP: 137.235.181.51
Port: 3306
Replica is MySQL 8.4.1
IP: 88.65.229.255
Port: 3306
Replication user
'rplUser'@'88.65.229.%'
We will use 3 main components:
HeatWave MySQL as a source
The asynchronous replication capability of MySQL
A MySQL instance as a read replica
HeatWave MySQL
HeatWave is a fully-managed database service, powered by the integrated in-memory query accelerator. It is the only cloud-native database service that combines transactions, analytics, machine learning and generative AI services into HeatWave, delivering real-time, secure analytics without the complexity, latency, and cost of ETL duplication. It also includes HeatWave Lakehouse which allows users to query data stored in object storage in a variety of file formats. HeatWave is developed, managed, and supported by the MySQL team in Oracle.
HeatWave MySQL accelerates MySQL query performance by orders of magnitude and provides real-time analytics on your transactional data. HeatWave MySQL is built on MySQL Enterprise Edition.
The HeatWave MySQL endpoints are not directly accessible from the internet (i.e. they have private IPs). That said, there are several ways to connect, such as a compute instance, a VPN connection, a Bastion session, or a network load balancer. In this article I'll use the latter: since it sits on a public subnet, it enables you to connect to HeatWave MySQL over the internet. However, please note that making a database accessible over the internet is not recommended, because it is a security risk. You must restrict the authorized public IP addresses to a single address or a small range, and use in-transit encryption. Using a VPN connection is the recommended approach. You have been warned!
I assume your HeatWave MySQL instance is already created. If not, you can use the Console or the command-line interface to create one. Using the console is pretty straightforward; you can find the details in the OCI documentation.
Since MySQL replication is a familiar concept, widely covered over the years, let’s get right to the practical details that will help you set up your replication efficiently. You’ll find all the juicy details in the MySQL 8.4 Reference Manual and some extra here and here.
On the source, create the replication user, then grant that user the REPLICATION SLAVE privilege. If you want to ensure that the source only accepts replicas that connect using encrypted connections (and by the way, you really want that), use the REQUIRE SSL option:
-- On the Source
-- Create the replication user
CREATE USER 'rplUser'@'88.65.229.%' IDENTIFIED BY 'Q{P6@EH$L!YFje^9' REQUIRE SSL;
GRANT REPLICATION SLAVE ON *.* TO 'rplUser'@'88.65.229.%';
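Before going further, you can double-check the new account by inspecting its privileges:

```sql
-- On the source: verify the replication account
SHOW GRANTS FOR 'rplUser'@'88.65.229.%';
```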
If the HeatWave MySQL instance (the source) contains existing data it is necessary to copy this data to the replica. There are different ways to dump the data from the source database, the recommended way in that context is to use MySQL Shell dump and load utilities.
You will most likely need dumpInstance, and the relevant options depend on your context; the most important is usually compatibility. MySQL Shell allows you to store your backup in an Object Storage bucket and load it from there, which is very convenient. For this article, the size of my dataset allows me to store it locally, and the only relevant option is dryRun. More details are in the documentation: Instance Dump Utility, Schema Dump Utility, and Table Dump Utility.
$ mysqlsh --js admin@137.235.181.51
JS> util.dumpInstance("/backup/20240714", { dryRun: true })
dryRun enabled, no locks will be acquired and no files will be created.
...
If the result is satisfactory, then disable dry run:
JS> util.dumpInstance("/backup/20240714")
...
Then, load the dump on the replica (for the record, the same utilities can also read from an Object Storage bucket; mine is named MyBck_07 on Oracle Cloud Infrastructure):
-- On the Replica
\sql SET GLOBAL local_infile=1;
util.loadDump("/backup/20240714", {updateGtidSet:"replace", skipBinlog:true})
\sql SET GLOBAL local_infile=0;
All the option details are available in the documentation. Please note that you can test your command with the dryRun option.
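For completeness, here is a hedged sketch of pointing the replica at the source and starting replication, using the address and account defined earlier (adapt the password handling and SSL settings to your environment):

```sql
-- On the replica: configure and start replication (sketch, adapt to your setup)
CHANGE REPLICATION SOURCE TO
  SOURCE_HOST = '137.235.181.51',
  SOURCE_PORT = 3306,
  SOURCE_USER = 'rplUser',
  SOURCE_PASSWORD = 'Q{P6@EH$L!YFje^9',
  SOURCE_SSL = 1,               -- the account was created with REQUIRE SSL
  SOURCE_AUTO_POSITION = 1;     -- GTID-based auto-positioning

START REPLICA;
```

Full details on the CHANGE REPLICATION SOURCE TO options are in the MySQL 8.4 Reference Manual.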
Also for the record, the same dump can be restored directly from an Object Storage bucket; the relevant options (osBucketName, osNamespace) are described in the documentation.
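As a hedged sketch (the Object Storage namespace below is a placeholder for your tenancy's namespace), restoring the dump from the MyBck_07 bucket could look like:

```js
// In MySQL Shell (JS mode) on the replica -- sketch with placeholder namespace
\sql SET GLOBAL local_infile=1;
util.loadDump("20240714", {
    osBucketName: "MyBck_07",       // bucket name used earlier in this article
    osNamespace: "<namespace>",     // placeholder: your tenancy's Object Storage namespace
    updateGtidSet: "replace",
    skipBinlog: true
});
\sql SET GLOBAL local_infile=0;
```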
On the source, you can check the status of connected replicas:
-- Check the status of connected replicas
SQL > SHOW PROCESSLIST \G
*************************** 6. row ***************************
Id: 547
User: rplUser
Host: nlb-mhs-paris-dba.sub03221692542.vcnparisdba.oraclevcn.com:1234
db: NULL
Command: Binlog Dump GTID
Time: 428
State: Source has sent all binlog to replica; waiting for more updates
Info: NULL
The SHOW REPLICAS statement provides basic information about the replicas that were started with the --report-host option and are connected to the source:
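This information comes from the SHOW REPLICAS statement, run on the source:

```sql
-- On the source: list connected replicas started with --report-host
SHOW REPLICAS;
```

A replica only appears in this output if its report_host system variable was set at startup.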
Other useful commands on the source, using performance_schema are:
SQL > SELECT
SERVICE_STATE,
HOST,
USER,
PORT
FROM performance_schema.replication_connection_configuration
INNER JOIN performance_schema.replication_applier_status
USING (CHANNEL_NAME) \G
*************************** 1. row ***************************
SERVICE_STATE: ON
HOST: 137.235.181.51
USER: rplUser
PORT: 3306
SELECT
concat(conn_status.channel_name, ' (', worker_id,')') AS channel,
conn_status.service_state AS io_state,
applier_status.service_state AS sql_state,
format_pico_time(if(GTID_SUBTRACT(LAST_QUEUED_TRANSACTION, LAST_APPLIED_TRANSACTION) = "","0",
abs(time_to_sec(if(time_to_sec(APPLYING_TRANSACTION_ORIGINAL_COMMIT_TIMESTAMP)=0,0,
timediff(APPLYING_TRANSACTION_ORIGINAL_COMMIT_TIMESTAMP,now()))))) * 1000000000000) latency,
format_pico_time((LAST_QUEUED_TRANSACTION_START_QUEUE_TIMESTAMP -
LAST_QUEUED_TRANSACTION_ORIGINAL_COMMIT_TIMESTAMP) * 1000000000000) transport_time,
format_pico_time((LAST_QUEUED_TRANSACTION_END_QUEUE_TIMESTAMP -
LAST_QUEUED_TRANSACTION_START_QUEUE_TIMESTAMP) * 1000000000000) time_to_relay_log,
format_pico_time((LAST_APPLIED_TRANSACTION_END_APPLY_TIMESTAMP -
LAST_APPLIED_TRANSACTION_START_APPLY_TIMESTAMP) * 1000000000000) apply_time,
conn_status.LAST_QUEUED_TRANSACTION AS last_queued_transaction,
applier_status.LAST_APPLIED_TRANSACTION AS last_applied_transaction
FROM
performance_schema.replication_connection_status AS conn_status JOIN performance_schema.replication_applier_status_by_worker AS applier_status
ON applier_status.channel_name = conn_status.channel_name
\G
*************************** 1. row ***************************
channel: (1)
io_state: ON
sql_state: ON
latency: 0 ps
transport_time: 0 ps
time_to_relay_log: 0 ps
apply_time: 0 ps
last_queued_transaction:
last_applied_transaction:
...
Peroration
This article guides you through setting up replication between a HeatWave MySQL instance (source) and an on-premise standalone MySQL instance (replica). It highlights key steps like creating a replication user, securing the connection, and using MySQL Shell utilities for data transfer. While a Network Load Balancer can be used for the connection, a VPN is recommended for better security.