Interactively explore & visualize your MySQL HeatWave data with Apache Zeppelin

July 5, 2022
Tags: ,

MySQLThe world’s most popular open source database – is a very good and safe choice as a transactional database, on-premise and in the cloud.

In the article – Explore & visualize your MySQL HeatWave data with Superset – I introduced MySQL HeatWave – a fully managed database service, that combines transactions, analytics, and machine learning services into one MySQL Database, delivering real-time, secure analytics without the complexity, latency, and cost of ETL duplication.
MySQL HeatWave is a native MySQL solution thus current MySQL applications work without changes.

Enabling a HeatWave Cluster also provides access to HeatWave Machine Learning (ML), which is a fully managed, highly scalable, cost-efficient, machine learning solution for data stored in MySQL.
HeatWave ML provides a simple SQL interface for training and using predictive machine learning models, which can be used by novice and experienced ML practitioners alike.
Machine learning expertise, specialized tools, and algorithms are not required.
With HeatWave ML, you can train a model with a single call to an SQL routine. 

Below some numbers comparing MySQL HeatWave ML vs AWS Redshift ML:

MySQL HeatWave  ML is 25x faster than Redshift ML
25x faster on average without compromising accuracy
MySQL HeatWave ML is 25x faster than Redshift ML
MySQL HeatWave  ML is 1% of cost of Redshift ML
1% of cost and no additional cost for MySQL HeatWave customers
MySQL HeatWave ML is 1% of cost of Redshift ML
Better performance with HeatWave ML with larger cluster size
Better performance with HeatWave ML with larger cluster size

For more details & raw numbers read: Performance comparison of HeatWave ML with Redshift ML.

MySQL HeatWave is currently available on Oracle Cloud Infrastructure (OCI). Availability on Amazon Web Services (AWS) and Microsoft Azure is coming (Oracle’s “HeatWave” is going multicloud: How hot will things get for MySQL?).

MySQL HeatWave ML can also takes models tuned in MySQL and use them to answer queries from popular machine learning notebook services for code and data, such as Jupyter and Apache Zeppelin.

Apache Zeppelin is an open source web-based notebook that allows data ingestion, data discovery, data analytics, data visualization and data collaboration.
Apache Zeppelin supports many interpreters such as MySQL, Python, JDBC, Shell, Spark, …
More: https://zeppelin.apache.org/

In this article I will show you how to properly configure Apache Zeppelin in order to take advantage of the Analytics and Machine Learning capabilities of MySQL HeatWave the MySQL in-memory query accelerator.

MySQL HeatWave

I’m using OCI and I will not go into all the installation details (RTFM). Below the main steps.

Using your favorite Internet browser, open your OCI console, select a compartment and go to Databases / MySQL / DB Systems.

Create a DB Systems and select HeatWave.

The rest is pretty straightforward!

Please note the Show Advanced Options link at the bottom of the page.

It’s Data Import tab is useful if you want to create an instance with data.

More information in this article: Discovering MySQL Database Service – Episode 5 – Create a MySQL DB system from a MySQL Shell dump

Now we must add the HeatWave cluster.

On the left, under Resources click HeatWave.

Add the HeatWave Cluster…

Set the Node Count

I would recommend to estimate the node count, click Estimate Node Count then Generate Estimate:

Copy the Load Command.
It will be used to load the chosen data into the HeatWave cluster.

Apply the node count estimate:

Finally create the cluster by pushing the button: Add HeatWave Cluster

When the cluster is running, the very last step is to offload the data to the cluster by running the stored procedure (i.e. the “Load Command”) copied previously using your favorite MySQL client.

e.g. Offload to the HeatWave cluster all the tables inside the airportdb schema:
CALL sys.heatwave_load(JSON_ARRAY(‘airportdb’), NULL);

Now you have a MySQL HeatWave cluster up and running with some data loaded into the cluster;
Cheers!

Apache Zeppelin

I assume that you already know what is Apache Zeppelin. If not RTFM.

Please note this is a test installation, probably not suitable for a production environment.

Requirements

Apache Zeppelin officially supports and is tested, among other, on Ubuntu 20.04, so I’ve installed Zeppelin on a VM Ubuntu 20.04 from Oracle Cloud Infrastructure (OCI).

In addition to the OS, you also need the good version of OpenJDK or Oracle JDK.

I’m good to go!

Binary

I downloaded the binary package with all interpreters. Available here.

Unpack it in a directory of your choice and you’re ready to go!

Firewalls

Depending on your architecture, you may have to deal with some firewalls.

In OCI I must update my Ingress Rules for the Default Security List (OCI virtual firewall) for the protocol TCP and the port you are going to use, 8080 in this article:

On the Ubuntu side I also had to update IPtable.

Starting Apache Zeppelin

By default Zeppelin is listening at 127.0.0.1:8080, so you can’t access it when it is deployed on another remote machine.
To access a remote Zeppelin, you need to change zeppelin.server.addr to 0.0.0.0 in conf/zeppelin-site.xml.

When the update is done:

Address is 0.0.0.0 and Port is 8080.

Now I can start Zeppelin:

Yippee!

MySQL Connector J

The Zeppelin SQL interpreter needs the JDBC MySQL Connector, available at https://dev.mysql.com/downloads/connector/j/

Obviously, MySQL customers can use the commercial version available on MOS.

The JDBC MySQL Connector is now installed!

Setup the Zeppelin interpreter

Create a new interpreter:

Enter a name for the Interpreter Name box and select jdbc as Interpreter group:

  • Interpreter Name: MySQL HeatWave
  • Interpreter group: jdbc

JDBC interpreter lets you create a JDBC connection to any data sources seamlessly. Inserts, Updates, and Upserts are applied immediately after running each statement.

Properties

  • default.url : jdbc:mysql://<MySQL HeatWave IP Address>:3306/
  • default.user : <MySQL User>
  • default.password: <MySQL Password>
  • default.driver : com.mysql.jdbc.Driver

Dependencies

  • Dependencies artifact : mysql:mysql-connector-java:8.0.29

Don’t forget to save the configuration 🙂

To sum up, the interpreter configuration looks like:

MySQL HeatWave usage

I can now create a notebook – Check MySQL HeatWave (download the notebooks) – in order to check if HeatWave is enable:

The Default Interpreter must be MySQL_HeatWave:

For the check, let’s run these 2 queries:

The HeatWave cluster is up and ML is enable.
All good!

Analytics

Out of curiosity, let’s try some analytic queries.

I already loaded the airportdb sample database (see here) into the MySQL DB system.

Let’s create a new note – Airportdb Analytics Queries (download the notebooks):

airportdb must now be loaded into the HeatWave cluster in order to take advantage of the tremendous performance:

And here are the queries:

Great results, respectively less than 0 second and 1 second.

Without HeatWave the query execution times would be respectively 10 and 17 seconds:

Machine Learning

What about the Machine Learning possibilities of MySQL HeatWave ML?

Well this article is already long enough, so it will be the topic of the next article 🙂

Stay tune!

Resources

Follow me on twitter

Watch my videos on my YouTube channel and subscribe.

My Slideshare account.

My Speaker Deck account.

Thanks for using MySQL!

Leave a Reply