Interactively explore & visualize your MySQL HeatWave data with Apache Zeppelin

July 5, 2022

Tags: Cloud, tuto

MySQL – The world’s most popular open source database – is a very good and safe choice as a transactional database, on-premise and in the cloud.

In the article – Explore & visualize your MySQL HeatWave data with Superset – I introduced MySQL HeatWave – a fully managed database service, that combines transactions, analytics, and machine learning services into one MySQL Database, delivering real-time, secure analytics without the complexity, latency, and cost of ETL duplication.
MySQL HeatWave is a native MySQL solution thus current MySQL applications work without changes.

Enabling a HeatWave Cluster also provides access to HeatWave Machine Learning (ML), which is a fully managed, highly scalable, cost-efficient, machine learning solution for data stored in MySQL.
HeatWave ML provides a simple SQL interface for training and using predictive machine learning models, which can be used by novice and experienced ML practitioners alike.
Machine learning expertise, specialized tools, and algorithms are not required.
With HeatWave ML, you can train a model with a single call to an SQL routine.

Below some numbers comparing MySQL HeatWave ML vs AWS Redshift ML:

MySQL HeatWave ML is 25x faster than Redshift ML
25x faster on average without compromising accuracy — **MySQL HeatWave ML is 25x faster than Redshift ML**

MySQL HeatWave ML is 1% of cost of Redshift ML
1% of cost and no additional cost for MySQL HeatWave customers — **MySQL HeatWave ML is 1% of cost of Redshift ML**

**Better performance with HeatWave ML with larger cluster size**

For more details & raw numbers read: Performance comparison of HeatWave ML with Redshift ML.

MySQL HeatWave is currently available on Oracle Cloud Infrastructure (OCI). Availability on Amazon Web Services (AWS) and Microsoft Azure is coming (Oracle’s “HeatWave” is going multicloud: How hot will things get for MySQL?).

MySQL HeatWave ML can also takes models tuned in MySQL and use them to answer queries from popular machine learning notebook services for code and data, such as Jupyter and Apache Zeppelin.

Apache Zeppelin is an open source web-based notebook that allows data ingestion, data discovery, data analytics, data visualization and data collaboration.
Apache Zeppelin supports many interpreters such as MySQL, Python, JDBC, Shell, Spark, …
More: https://zeppelin.apache.org/

In this article I will show you how to properly configure Apache Zeppelin in order to take advantage of the Analytics and Machine Learning capabilities of MySQL HeatWave the MySQL in-memory query accelerator.

MySQL HeatWave

I’m using OCI and I will not go into all the installation details (RTFM). Below the main steps.

Using your favorite Internet browser, open your OCI console, select a compartment and go to Databases / MySQL / DB Systems.

Create a DB Systems and select HeatWave.

The rest is pretty straightforward!

Please note the Show Advanced Options link at the bottom of the page.

It’s Data Import tab is useful if you want to create an instance with data.

More information in this article: Discovering MySQL Database Service – Episode 5 – Create a MySQL DB system from a MySQL Shell dump

Now we must add the HeatWave cluster.

On the left, under Resources click HeatWave.

Add the HeatWave Cluster…

Set the Node Count

I would recommend to estimate the node count, click Estimate Node Count then Generate Estimate:

Copy the Load Command.
It will be used to load the chosen data into the HeatWave cluster.

Apply the node count estimate:

Finally create the cluster by pushing the button: Add HeatWave Cluster

When the cluster is running, the very last step is to offload the data to the cluster by running the stored procedure (i.e. the “Load Command”) copied previously using your favorite MySQL client.

e.g. Offload to the HeatWave cluster all the tables inside the airportdb schema:
CALL sys.heatwave_load(JSON_ARRAY(‘airportdb’), NULL);

Now you have a MySQL HeatWave cluster up and running with some data loaded into the cluster;
Cheers!

Apache Zeppelin

I assume that you already know what is Apache Zeppelin. If not RTFM.

Please note this is a test installation, probably not suitable for a production environment.

Requirements

Apache Zeppelin officially supports and is tested, among other, on Ubuntu 20.04, so I’ve installed Zeppelin on a VM Ubuntu 20.04 from Oracle Cloud Infrastructure (OCI).

In addition to the OS, you also need the good version of OpenJDK or Oracle JDK.

$ lsb_release -a
No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 20.04.4 LTS
Release:	20.04
Codename:	focal

$ java -version
openjdk version "17.0.3" 2022-04-19
OpenJDK Runtime Environment (build 17.0.3+7-Ubuntu-0ubuntu0.20.04.1)
OpenJDK 64-Bit Server VM (build 17.0.3+7-Ubuntu-0ubuntu0.20.04.1, mixed mode, sharing)

I’m good to go!

Binary

I downloaded the binary package with all interpreters. Available here.

wget https://dlcdn.apache.org/zeppelin/zeppelin-0.10.1/zeppelin-0.10.1-bin-all.tgz

Unpack it in a directory of your choice and you’re ready to go!

tar xvf zeppelin-0.10.1-bin-all.tgz
mv zeppelin-0.10.1-bin-all/ zeppelin
cd zeppelin

Firewalls

Depending on your architecture, you may have to deal with some firewalls.

In OCI I must update my Ingress Rules for the Default Security List (OCI virtual firewall) for the protocol TCP and the port you are going to use, 8080 in this article:

On the Ubuntu side I also had to update IPtable.

Starting Apache Zeppelin

By default Zeppelin is listening at 127.0.0.1:8080, so you can’t access it when it is deployed on another remote machine.
To access a remote Zeppelin, you need to change zeppelin.server.addr to 0.0.0.0 in conf/zeppelin-site.xml.

~/zeppelin$ cp conf/zeppelin-site.xml.template conf/zeppelin-site.xml

~/zeppelin$ vim conf/zeppelin-site.xml
...

When the update is done:

~/zeppelin$ grep -m2  -a4  "<property>" conf/zeppelin-site.xml
-->

<configuration>

<property>
  <name>zeppelin.server.addr</name>
  <value>0.0.0.0</value>
  <description>Server binding address</description>
</property>

<property>
  <name>zeppelin.server.port</name>
  <value>8080</value>
  <description>Server port.</description>
</property>

Address is 0.0.0.0 and Port is 8080.

Now I can start Zeppelin:

~/zeppelin$ bin/zeppelin-daemon.sh start

~/zeppelin$ bin/zeppelin-daemon.sh status
Zeppelin is running                                        [  OK  ]

Yippee!

MySQL Connector J

The Zeppelin SQL interpreter needs the JDBC MySQL Connector, available at https://dev.mysql.com/downloads/connector/j/

Obviously, MySQL customers can use the commercial version available on MOS.

$ wget https://dev.mysql.com/get/Downloads/Connector-J/mysql-connector-java_8.0.29-1ubuntu20.04_all.deb

$ sudo dpkg -i mysql-connector-java_8.0.29-1ubuntu20.04_all.deb

The JDBC MySQL Connector is now installed!

Setup the Zeppelin interpreter

Create a new interpreter:

Enter a name for the Interpreter Name box and select jdbc as Interpreter group:

Interpreter Name: MySQL HeatWave
Interpreter group: jdbc

JDBC interpreter lets you create a JDBC connection to any data sources seamlessly. Inserts, Updates, and Upserts are applied immediately after running each statement.

Properties

default.url : jdbc:mysql://<MySQL HeatWave IP Address>:3306/
default.user : <MySQL User>
default.password: <MySQL Password>
default.driver : com.mysql.jdbc.Driver

Dependencies

Dependencies artifact : mysql:mysql-connector-java:8.0.29

Don’t forget to save the configuration 🙂

To sum up, the interpreter configuration looks like:

MySQL HeatWave usage

I can now create a notebook – Check MySQL HeatWave (download the notebooks) – in order to check if HeatWave is enable:

The Default Interpreter must be MySQL_HeatWave:

For the check, let’s run these 2 queries:

SHOW VARIABLES LIKE 'rapid_bootstrap';

SHOW STATUS WHERE Variable_name IN ('rapid_cluster_status', 'rapid_ml_status', 'rapid_cluster_ready_number', 'rapid_load_progress');

The HeatWave cluster is up and ML is enable.
All good!

Analytics

Out of curiosity, let’s try some analytic queries.

I already loaded the airportdb sample database (see here) into the MySQL DB system.

Let’s create a new note – Airportdb Analytics Queries (download the notebooks):

airportdb must now be loaded into the HeatWave cluster in order to take advantage of the tremendous performance:

CALL sys.heatwave_load(JSON_ARRAY('airportdb'), NULL)

And here are the queries:

-- Number of Tickets > $500.00, Grouped By Price
SELECT booking.price, count(*) 
FROM airportdb.booking 
WHERE booking.price > 500 
GROUP BY booking.price 
ORDER BY booking.price 
LIMIT 10;


-- Average Age of Passengers By Country, Per Airline
SELECT airline.airlinename, AVG(datediff(departure,birthdate)/365.25) as avg_age, count(*) as nb_people
FROM airportdb.booking, airportdb.flight, airportdb.airline, airportdb.passengerdetails
WHERE booking.flight_id=flight.flight_id AND
    airline.airline_id=flight.airline_id AND
    booking.passenger_id=passengerdetails.passenger_id AND
    country IN ("SWITZERLAND", "FRANCE", "ITALY")
GROUP BY airline.airlinename
ORDER BY airline.airlinename, avg_age
LIMIT 10;

Great results, respectively less than 0 second and 1 second.

Without HeatWave the query execution times would be respectively 10 and 17 seconds:

Machine Learning

What about the Machine Learning possibilities of MySQL HeatWave ML?

Well this article is already long enough, so it will be the topic of the next article 🙂

Stay tune!

Resources

Follow me on Linkedin

Watch my videos on my YouTube channel and subscribe.

My Slideshare account.

My Speaker Deck account.

Thanks for using HeatWave & MySQL!

Olivier DASINI

Cloud Solutions Architect at Oracle
MySQL Geek, strong interest in data and AI, author, blogger and speaker
I’m an insatiable hunger of learning.
—–

Blog: www.dasini.net/blog/en/
Twitter: https://twitter.com/freshdaz
SlideShare: www.slideshare.net/freshdaz
Youtube: https://www.youtube.com/channel/UC12TulyJsJZHoCmby3Nm3WQ
—–

One Response to “Interactively explore & visualize your MySQL HeatWave data with Apache Zeppelin”

Iris Data Set with MySQL HeatWave Machine Learning & Zeppelin | dasini.net - Diary of a MySQL expert

July 11th, 2025 at 9:19

[…] For your Machine Learning needs you can use a notebook like Jupyter, Apache Zeppelin (see: Interactively explore & visualize your MySQL HeatWave data with Apache Zeppelin),… […]

dasini.net – Diary of a MySQL expert