Interactively explore & visualize your MySQL HeatWave data with Apache Zeppelin
MySQL – The world’s most popular open source database – is a very good and safe choice as a transactional database, on-premise and in the cloud.
In the article – Explore & visualize your MySQL HeatWave data with Superset – I introduced MySQL HeatWave – a fully managed database service, that combines transactions, analytics, and machine learning services into one MySQL Database, delivering real-time, secure analytics without the complexity, latency, and cost of ETL duplication.
MySQL HeatWave is a native MySQL solution thus current MySQL applications work without changes.
Enabling a HeatWave Cluster also provides access to HeatWave Machine Learning (ML), which is a fully managed, highly scalable, cost-efficient, machine learning solution for data stored in MySQL.
HeatWave ML provides a simple SQL interface for training and using predictive machine learning models, which can be used by novice and experienced ML practitioners alike.
Machine learning expertise, specialized tools, and algorithms are not required.
With HeatWave ML, you can train a model with a single call to an SQL routine.
Below some numbers comparing MySQL HeatWave ML vs AWS Redshift ML:
For more details & raw numbers read: Performance comparison of HeatWave ML with Redshift ML.
MySQL HeatWave is currently available on Oracle Cloud Infrastructure (OCI). Availability on Amazon Web Services (AWS) and Microsoft Azure is coming (Oracle’s “HeatWave” is going multicloud: How hot will things get for MySQL?).
MySQL HeatWave ML can also takes models tuned in MySQL and use them to answer queries from popular machine learning notebook services for code and data, such as Jupyter and Apache Zeppelin.
Apache Zeppelin is an open source web-based notebook that allows data ingestion, data discovery, data analytics, data visualization and data collaboration.
Apache Zeppelin supports many interpreters such as MySQL, Python, JDBC, Shell, Spark, …
More: https://zeppelin.apache.org/
In this article I will show you how to properly configure Apache Zeppelin in order to take advantage of the Analytics and Machine Learning capabilities of MySQL HeatWave the MySQL in-memory query accelerator.
MySQL HeatWave
I’m using OCI and I will not go into all the installation details (RTFM). Below the main steps.
Using your favorite Internet browser, open your OCI console, select a compartment and go to Databases / MySQL / DB Systems.
Create a DB Systems and select HeatWave.
The rest is pretty straightforward!
Please note the Show Advanced Options link at the bottom of the page.
It’s Data Import tab is useful if you want to create an instance with data.
More information in this article: Discovering MySQL Database Service – Episode 5 – Create a MySQL DB system from a MySQL Shell dump
Now we must add the HeatWave cluster.
On the left, under Resources click HeatWave.
Add the HeatWave Cluster…
Set the Node Count
I would recommend to estimate the node count, click Estimate Node Count then Generate Estimate:
Copy the Load Command.
It will be used to load the chosen data into the HeatWave cluster.
Apply the node count estimate:
Finally create the cluster by pushing the button: Add HeatWave Cluster
When the cluster is running, the very last step is to offload the data to the cluster by running the stored procedure (i.e. the “Load Command”) copied previously using your favorite MySQL client.
e.g. Offload to the HeatWave cluster all the tables inside the airportdb schema:
CALL sys.heatwave_load(JSON_ARRAY(‘airportdb’), NULL);
Now you have a MySQL HeatWave cluster up and running with some data loaded into the cluster;
Cheers!
Apache Zeppelin
I assume that you already know what is Apache Zeppelin. If not RTFM.
Please note this is a test installation, probably not suitable for a production environment.
Requirements
Apache Zeppelin officially supports and is tested, among other, on Ubuntu 20.04, so I’ve installed Zeppelin on a VM Ubuntu 20.04 from Oracle Cloud Infrastructure (OCI).
In addition to the OS, you also need the good version of OpenJDK or Oracle JDK.
$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 20.04.4 LTS
Release: 20.04
Codename: focal
$ java -version
openjdk version "17.0.3" 2022-04-19
OpenJDK Runtime Environment (build 17.0.3+7-Ubuntu-0ubuntu0.20.04.1)
OpenJDK 64-Bit Server VM (build 17.0.3+7-Ubuntu-0ubuntu0.20.04.1, mixed mode, sharing)
I’m good to go!
Binary
I downloaded the binary package with all interpreters. Available here.
wget https://dlcdn.apache.org/zeppelin/zeppelin-0.10.1/zeppelin-0.10.1-bin-all.tgz
Unpack it in a directory of your choice and you’re ready to go!
tar xvf zeppelin-0.10.1-bin-all.tgz
mv zeppelin-0.10.1-bin-all/ zeppelin
cd zeppelin
Firewalls
Depending on your architecture, you may have to deal with some firewalls.
In OCI I must update my Ingress Rules for the Default Security List (OCI virtual firewall) for the protocol TCP and the port you are going to use, 8080 in this article:
On the Ubuntu side I also had to update IPtable.
Starting Apache Zeppelin
By default Zeppelin is listening at 127.0.0.1:8080, so you can’t access it when it is deployed on another remote machine.
To access a remote Zeppelin, you need to change zeppelin.server.addr to 0.0.0.0 in conf/zeppelin-site.xml.
~/zeppelin$ cp conf/zeppelin-site.xml.template conf/zeppelin-site.xml
~/zeppelin$ vim conf/zeppelin-site.xml
...
When the update is done:
~/zeppelin$ grep -m2 -a4 "<property>" conf/zeppelin-site.xml
-->
<configuration>
<property>
<name>zeppelin.server.addr</name>
<value>0.0.0.0</value>
<description>Server binding address</description>
</property>
<property>
<name>zeppelin.server.port</name>
<value>8080</value>
<description>Server port.</description>
</property>
Address is 0.0.0.0 and Port is 8080.
Now I can start Zeppelin:
~/zeppelin$ bin/zeppelin-daemon.sh start
~/zeppelin$ bin/zeppelin-daemon.sh status
Zeppelin is running [ OK ]
Yippee!
MySQL Connector J
The Zeppelin SQL interpreter needs the JDBC MySQL Connector, available at https://dev.mysql.com/downloads/connector/j/
Obviously, MySQL customers can use the commercial version available on MOS.
$ wget https://dev.mysql.com/get/Downloads/Connector-J/mysql-connector-java_8.0.29-1ubuntu20.04_all.deb
$ sudo dpkg -i mysql-connector-java_8.0.29-1ubuntu20.04_all.deb
The JDBC MySQL Connector is now installed!
Setup the Zeppelin interpreter
Create a new interpreter:
Enter a name for the Interpreter Name box and select jdbc as Interpreter group:
- Interpreter Name: MySQL HeatWave
- Interpreter group: jdbc
JDBC interpreter lets you create a JDBC connection to any data sources seamlessly. Inserts, Updates, and Upserts are applied immediately after running each statement.
Properties
- default.url : jdbc:mysql://<MySQL HeatWave IP Address>:3306/
- default.user : <MySQL User>
- default.password: <MySQL Password>
- default.driver : com.mysql.jdbc.Driver
Dependencies
- Dependencies artifact : mysql:mysql-connector-java:8.0.29
Don’t forget to save the configuration 🙂
To sum up, the interpreter configuration looks like:
MySQL HeatWave usage
I can now create a notebook – Check MySQL HeatWave (download the notebooks) – in order to check if HeatWave is enable:
The Default Interpreter must be MySQL_HeatWave:
For the check, let’s run these 2 queries:
SHOW VARIABLES LIKE 'rapid_bootstrap';
SHOW STATUS WHERE Variable_name IN ('rapid_cluster_status', 'rapid_ml_status', 'rapid_cluster_ready_number', 'rapid_load_progress');
The HeatWave cluster is up and ML is enable.
All good!
Analytics
Out of curiosity, let’s try some analytic queries.
I already loaded the airportdb sample database (see here) into the MySQL DB system.
Let’s create a new note – Airportdb Analytics Queries (download the notebooks):
airportdb must now be loaded into the HeatWave cluster in order to take advantage of the tremendous performance:
CALL sys.heatwave_load(JSON_ARRAY('airportdb'), NULL)
And here are the queries:
-- Number of Tickets > $500.00, Grouped By Price
SELECT booking.price, count(*)
FROM airportdb.booking
WHERE booking.price > 500
GROUP BY booking.price
ORDER BY booking.price
LIMIT 10;
-- Average Age of Passengers By Country, Per Airline
SELECT airline.airlinename, AVG(datediff(departure,birthdate)/365.25) as avg_age, count(*) as nb_people
FROM airportdb.booking, airportdb.flight, airportdb.airline, airportdb.passengerdetails
WHERE booking.flight_id=flight.flight_id AND
airline.airline_id=flight.airline_id AND
booking.passenger_id=passengerdetails.passenger_id AND
country IN ("SWITZERLAND", "FRANCE", "ITALY")
GROUP BY airline.airlinename
ORDER BY airline.airlinename, avg_age
LIMIT 10;
Great results, respectively less than 0 second and 1 second.
Without HeatWave the query execution times would be respectively 10 and 17 seconds:
Machine Learning
What about the Machine Learning possibilities of MySQL HeatWave ML?
Well this article is already long enough, so it will be the topic of the next article 🙂
Stay tune!
Resources
- Introduction to the series Discovering MySQL HeatWave Database Service
- MySQL Database Service (MDS)
- MySQL HeatWave User Guide
- HeatWave ML
- Documentation of MySQL Database Service
- Oracle Cloud Infrastructure (OCI)
- Oracle Cloud Infrastructure Glossary
- Oracle Cloud Free Tier
- MySQL — The world’s most popular open source database
- MySQL Shell
- Performance comparison of MySQL HeatWave with Snowflake, Amazon Redshift, Amazon Aurora, and Amazon RDS for MySQL
- Oracle’s latest MySQL HeatWave cloud database simplifies use of machine learning
- Download the notebooks
- Apache Zeppelin
Watch my videos on my YouTube channel and subscribe.
Thanks for using HeatWave & MySQL!
Cloud Solutions Architect at Oracle
MySQL Geek, author, blogger and speaker
I’m an insatiable hunger of learning.
—–
Blog: www.dasini.net/blog/en/
Twitter: https://twitter.com/freshdaz
SlideShare: www.slideshare.net/freshdaz
Youtube: https://www.youtube.com/channel/UC12TulyJsJZHoCmby3Nm3WQ
—–
Leave a Reply