Querying the Unstructured: Natural Language to SQL for JSON Data
In Ask Your Database Anything: Natural Language to SQL (NL2SQL) in MySQL HeatWave
, we have explored the innovative MySQL HeatWave GenAI technology that converts Natural Language into SQL, making it easier for you to interact with databases. This feature collects information on the schemas, tables, and columns that you have access to, and then uses a Large Language Model (LLM) to generate an SQL query for the question pertaining to your data. It also lets you run the generated query and view the result set.
Following our last article, Let Your AI DBA Assistant Write Your MySQL Queries
, showcased an interesting use case: leveraging an AI DBA Assistant to generate monitoring and tuning queries for the Performance, Information, and Sys Schemas using plain English.
In this article, we will talk about how to handle JSON documents with MySQL HeatWave GenAI NL2SQL feature.

Context
I’m using MySQL HeatWave 9.4.2 on Oracle Cloud Infrastructure (OCI):
SQL>
-- HeatWave MySQL server version
SHOW VARIABLES WHERE Variable_name IN ('version_comment', 'version');
+-----------------+--------------------------+
| Variable_name | Value |
+-----------------+--------------------------+
| version | 9.4.2-cloud |
| version_comment | MySQL Enterprise - Cloud |
+-----------------+--------------------------+
Furthermore, I’m using a modified version (available on my GitHub account, file: nl2sql_world_json_nested_-_tables_json_nested.sql) of the well-known World database. Although the data hasn’t been updated in some time, it still serves as a useful tool for visualizing the results of SQL queries.
Here the schema:
CREATE TABLE `country_json_nested` (
`ID` int NOT NULL AUTO_INCREMENT COMMENT 'Primary key: integer that uniquely identifies country_json JSON documents.',
`country_description` json NOT NULL COMMENT 'Country information. Including country names, continent, region, surface area, identifiers (country code), demographics, economy (GNP), and government structure',
PRIMARY KEY (`ID`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci COMMENT='Stores detailed information about countries, including identifiers, demographics, economy, and government structure.'
;
CREATE TABLE `country_language_json_nested` (
`ID` int NOT NULL AUTO_INCREMENT COMMENT 'Primary key: integer that uniquely identifies country_language_json JSON documents.',
`country_language_description` json NOT NULL COMMENT 'Country languages information. Including a three-letter country code referencing the country table, the name of the language spoken in the country, whether they are official and the share of the population using them.',
PRIMARY KEY (`ID`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci COMMENT='Stores information about the languages spoken in each country, including whether they are official and the share of the population using them.'
;
CREATE TABLE `city_json_nested` (
`ID` int NOT NULL AUTO_INCREMENT COMMENT 'Primary key: integer that uniquely identifies city_json JSON documents.',
`city_description` json NOT NULL COMMENT 'Cities information. including their name, country, district, and population.',
PRIMARY KEY (`ID`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci COMMENT='Stores information about cities, including their name, country, district, and population.'
;
Let’s run 3 queries on the data set:
MySQL>
-- Find the name, country code and the district of the city with ID 2989
SELECT
city_description->>"$.city_name" AS "City Name",
city_description->>"$.location.country_code" AS "Country Code",
city_description->>"$.location.district" AS District
FROM city_json_nested
WHERE city_description->>"$.ID_city" = 2989;
+-----------+--------------+-------------+
| city_name | country_code | district |
+-----------+--------------+-------------+
| Grenoble | FRA | Rhône-Alpes |
+-----------+--------------+-------------+
-- Find the government form of Germany
SELECT
country_description->>"$.government.government_form" AS "Government Form"
FROM country_json_nested
WHERE country_description->>"$.identifiers.code" = 'FRA';
+-----------------+
| government_form |
+-----------------+
| Republic |
+-----------------+
-- List all official languages spoken in Canada
SELECT
country_language_description->>"$.language_info.language" AS Language
FROM country_language_json_nested
WHERE country_language_description->>"$.country.country_code" = 'CAN'
AND country_language_description->>"$.language_info.is_official" = 'T';
+----------+
| language |
+----------+
| English |
| French |
+----------+
I’m employing the inline path operator (->>
), which serves as a convenient shorthand for extracting and unquoting values: JSON_UNQUOTE(JSON_EXTRACT(...))
.
The execution of these queries, while simple, still relies on existing SQL and JSON expertise. MySQL HeatWave GenAI can eliminates this dependency by enabling users to interact with the database via Natural Language to SQL. As highlighted in “Ask Your Database Anything: Natural Language to SQL in MySQL HeatWave” the new NL_SQL
routine allows non-technical users to write requests in plain English, and have the system automatically generate the necessary SQL.

Introducing JSON
JSON (JavaScript Object Notation) is a lightweight, human-readable format for structuring and exchanging data. Built around key–value pairs and ordered arrays, JSON enables developers to model complex, hierarchical information without the rigidity of a fixed schema.
Its simplicity, flexibility, and near-universal support across programming languages have made it the standard for transmitting structured data between servers, web applications, and APIs in modern software systems.
JSON in the MySQL Context
In MySQL, the native JSON data type — introduced in version 5.7 — brings NoSQL-style flexibility into the realm of relational databases. It allows developers to store, validate, and optimize JSON documents directly within columns, combining the ACID compliance and transactional integrity of MySQL with the dynamic nature of semi-structured data.
Specialized JSON functions such as JSON_EXTRACT()
, JSON_CONTAINS()
, JSON_OBJECT() or JSON_TABLE()
enable precise querying and manipulation, while indexing options like generated columns ensure high performance.
This hybrid approach lets MySQL handle everything from traditional records to complex API payloads within a unified, scalable system.
Some MySQL & JSON content that you may find useful:
- MySQL JSON Function Reference
- 30 mins with MySQL JSON functions
- 30 mins with JSON in MySQL
- Using MySQL as a Document Store
- MySQL JSON Document Store
- Two Models, One World: Introducing JSON Relational Duality Views in MySQL HeatWave
- Hands-on with JSON Duality Views in MySQL: From Creation to Day-to-Day Use
- JSON Duality Views
Challenges of Using the JSON Data Type
NL2SQL in MySQL HeatWave GenAI is easy to use and highly effective when configured with the right context. To help the LLM generate accurate queries, it’s important to narrow its focus by specifying the relevant schemas and tables through the schemas and tables parameters. Additionally, using clear, descriptive names for tables, columns, and views ensures the model can better interpret their purpose and produce more precise SQL statements.
The core difficulty stems from JSON’s structure: the document simultaneously holds both data and metadata (keys), yet MySQL treats the entire document as undifferentiated raw data within a single column. From the database’s perspective, the column contains a blob of JSON, meaning there’s little or no external schema metadata — such as explicit column names, data types, or constraints — to help the LLM interpret the semantic meaning or internal structure of the fields. This inherent lack of external context makes intelligent processing more challenging.
Examples
Using nl_sql is quite simple:
CALL sys.NL_SQL(
"NaturalLanguageStatement",
@output,
JSON_OBJECT(
'schemas', JSON_ARRAY('DBName'),
'model_id', 'ModelID'
)
);
Replace the following:
NaturalLanguageStatement
: natural-language statement. It can be a question, statement, or query pertaining to your data available in MySQL HeatWave.DBName
: database to consider for generating and running the SQL query.ModelID
: LLM to use.
You can find additional information here.
Let’s see an example:
MySQL>
SELECT model_id, provider
FROM sys.ML_SUPPORTED_LLMS
WHERE capabilities LIKE '["GENERATION"]';
+-------------------------------+---------------------------+
| model_id | provider |
+-------------------------------+---------------------------+
| llama2-7b-v1 | HeatWave |
| llama3-8b-instruct-v1 | HeatWave |
| llama3.1-8b-instruct-v1 | HeatWave |
| llama3.2-1b-instruct-v1 | HeatWave |
| llama3.2-3b-instruct-v1 | HeatWave |
| mistral-7b-instruct-v1 | HeatWave |
| mistral-7b-instruct-v3 | HeatWave |
| cohere.command-latest | OCI Generative AI Service |
| cohere.command-plus-latest | OCI Generative AI Service |
| cohere.command-a-03-2025 | OCI Generative AI Service |
| meta.llama-3.3-70b-instruct | OCI Generative AI Service |
| cohere.command-r-08-2024 | OCI Generative AI Service |
| cohere.command-r-plus-08-2024 | OCI Generative AI Service |
+-------------------------------+---------------------------+
MySQL>
SHOW TABLES;
+------------------------------------+
| Tables_in_nl2sql_world_json_nested |
+------------------------------------+
| city_json_nested |
| country_json_nested |
| country_language_json_nested |
+------------------------------------+
MySQL nl2sql_world_json_nested SQL> SET @nlq = "Number of cities";
Query OK, 0 rows affected (0.0003 sec)
MySQL nl2sql_world_json_nested SQL> CALL sys.NL_SQL(@nlq, @output, '{"model_id": "meta.llama-3.3-70b-instruct", "schemas":["nl2sql_world_json_nested"]}');
+-----------------------------------------------------------------------+
| Executing generated SQL statement... |
+-----------------------------------------------------------------------+
| SELECT COUNT(`ID`) FROM `nl2sql_world_json_nested`.`city_json_nested` |
+-----------------------------------------------------------------------+
1 row in set (1.6583 sec)
+-------------+
| COUNT(`ID`) |
+-------------+
| 4079 |
+-------------+
1 row in set (1.6583 sec)
That was straightforward!
Now, let’s raise the bar with a slightly more complex query:
MySQL nl2sql_world_json_nested SQL> SET @nlq = "Population of Monaco";
Query OK, 0 rows affected (0.0003 sec)
MySQL nl2sql_world_json_nested SQL> CALL sys.NL_SQL(@nlq, @output, '{"model_id": "meta.llama-3.3-70b-instruct", "schemas":["nl2sql_world_json_nested"]}');
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Executing generated SQL statement... |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| SELECT JSON_EXTRACT(`city_description`, '$.population') FROM `nl2sql_world_json_nested`.`city_json_nested` WHERE JSON_EXTRACT(`city_description`, '$.name') = 'Monaco' |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (2.1982 sec)
Empty set (2.1982 sec)
Query OK, 0 rows affected (2.1982 sec)
MySQL nl2sql_world_json_nested SQL> SET @nlq = "List all official languages spoken in Canada";
Query OK, 0 rows affected (0.0004 sec)
MySQL nl2sql_world_json_nested SQL> CALL sys.NL_SQL(@nlq, @output, '{"model_id": "meta.llama-3.3-70b-instruct", "schemas":["nl2sql_world_json_nested"]}');
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Executing generated SQL statement... |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| SELECT JSON_EXTRACT(`country_language_description`, '$[*].language') FROM `nl2sql_world_json_nested`.`country_language_json_nested` WHERE JSON_EXTRACT(`country_language_description`, '$[*].country_code') = '"CAN"' AND JSON_EXTRACT(`country_language_description`, '$[*].is_official') = 'true' |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (2.3889 sec)
Empty set (2.3889 sec)
Query OK, 0 rows affected (2.3889 sec)
In both of these recent cases, the generated queries were syntactically correct, but the system failed to produce a semantically relevant query.
But all is not lost…
Workaround: Defining Views for Better Context
A highly effective workaround is to introduce well-defined views over the existing table. The main idea is to expose the JSON’s internal structure by effectively translating the semi-structured data into a fully relational schema. By using descriptive, explicit column names in these views, you furnish the NL2SQL model with the necessary semantic metadata. This strategy significantly improves accuracy for complex JOIN operations and accurate filtering by providing clear, accessible relational keys.
Let’s see what the views (nl2sql_world_json_nested_-_view_json_nested.sql) might look like:
CREATE OR REPLACE VIEW country_nested_flat AS
SELECT
ID AS country_id,
-- Identifiers
JSON_UNQUOTE(JSON_EXTRACT(country_description, '$.identifiers.code')) AS country_code,
JSON_UNQUOTE(JSON_EXTRACT(country_description, '$.identifiers.code2')) AS country_code2,
JSON_UNQUOTE(JSON_EXTRACT(country_description, '$.identifiers.country_name')) AS country_name,
JSON_UNQUOTE(JSON_EXTRACT(country_description, '$.identifiers.local_name')) AS local_name,
-- Geography
JSON_UNQUOTE(JSON_EXTRACT(country_description, '$.geography.continent')) AS continent,
JSON_UNQUOTE(JSON_EXTRACT(country_description, '$.geography.region')) AS region,
CAST(JSON_EXTRACT(country_description, '$.geography.surface_area') AS DECIMAL(12,2)) AS surface_area,
CAST(JSON_EXTRACT(country_description, '$.geography.ID_capital') AS SIGNED) AS capital_id,
-- Demographics
CAST(JSON_EXTRACT(country_description, '$.demographics.country_population') AS SIGNED) AS population,
CAST(JSON_EXTRACT(country_description, '$.demographics.life_expectancy') AS DECIMAL(5,2)) AS life_expectancy,
CAST(JSON_EXTRACT(country_description, '$.demographics.independance_year') AS SIGNED) AS independance_year,
-- Economy
CAST(JSON_EXTRACT(country_description, '$.economy.GNP') AS DECIMAL(15,2)) AS gnp,
CAST(JSON_EXTRACT(country_description, '$.economy.GNPOld') AS DECIMAL(15,2)) AS gnp_old,
-- Government
JSON_UNQUOTE(JSON_EXTRACT(country_description, '$.government.head_of_state')) AS head_of_state,
JSON_UNQUOTE(JSON_EXTRACT(country_description, '$.government.government_form')) AS government_form
FROM country_json_nested;
CREATE OR REPLACE VIEW country_language_nested_flat AS
SELECT
ID AS language_id,
-- Country reference
JSON_UNQUOTE(JSON_EXTRACT(country_language_description, '$.country.country_code')) AS country_code,
-- Language details
JSON_UNQUOTE(JSON_EXTRACT(country_language_description, '$.language_info.language')) AS language,
JSON_UNQUOTE(JSON_EXTRACT(country_language_description, '$.language_info.is_official')) AS is_official,
CAST(JSON_EXTRACT(country_language_description, '$.language_info.percentage') AS DECIMAL(5,2)) AS percentage
FROM country_language_json_nested;
CREATE OR REPLACE VIEW city_nested_flat AS
SELECT
ID AS json_id,
-- City identifiers
CAST(JSON_EXTRACT(city_description, '$.ID_city') AS SIGNED) AS city_id,
JSON_UNQUOTE(JSON_EXTRACT(city_description, '$.city_name')) AS city_name,
-- Location details
JSON_UNQUOTE(JSON_EXTRACT(city_description, '$.location.district')) AS district,
JSON_UNQUOTE(JSON_EXTRACT(city_description, '$.location.country_code')) AS country_code,
-- Demographics
CAST(JSON_EXTRACT(city_description, '$.city_population') AS SIGNED) AS population
FROM city_json_nested;
Let’s now re-run our three previous queries, applying this new view-based approach:
MySQL nl2sql_world_json_nested SQL> SET @nlq = "Number of cities";
Query OK, 0 rows affected (0.0004 sec)
MySQL nl2sql_world_json_nested SQL> CALL sys.NL_SQL(@nlq, @output, '{"model_id": "meta.llama-3.3-70b-instruct", "schemas":["nl2sql_world_json_nested"]}');
+----------------------------------------------------------------------------+
| Executing generated SQL statement... |
+----------------------------------------------------------------------------+
| SELECT COUNT(`city_id`) FROM `nl2sql_world_json_nested`.`city_nested_flat` |
+----------------------------------------------------------------------------+
1 row in set (1.8282 sec)
+------------------+
| COUNT(`city_id`) |
+------------------+
| 4079 |
+------------------+
1 row in set (1.8282 sec)
Query OK, 0 rows affected (1.8282 sec)
MySQL nl2sql_world_json_nested SQL> SET @nlq = "Population of Monaco";
Query OK, 0 rows affected (0.0004 sec)
MySQL nl2sql_world_json_nested SQL> CALL sys.NL_SQL(@nlq, @output, '{"model_id": "meta.llama-3.3-70b-instruct", "schemas":["nl2sql_world_json_nested"]}');
+-----------------------------------------------------------------------------------------------------------+
| Executing generated SQL statement... |
+-----------------------------------------------------------------------------------------------------------+
| SELECT `population` FROM `nl2sql_world_json_nested`.`country_nested_flat` WHERE `country_name` = 'Monaco' |
+-----------------------------------------------------------------------------------------------------------+
1 row in set (2.0889 sec)
+------------+
| population |
+------------+
| 34000 |
+------------+
1 row in set (2.0889 sec)
Query OK, 0 rows affected (2.0889 sec)
MySQL nl2sql_world_json_nested SQL> SET @nlq = "List all official languages spoken in Canada";
Query OK, 0 rows affected (0.0004 sec)
MySQL nl2sql_world_json_nested SQL> CALL sys.NL_SQL(@nlq, @output, '{"model_id": "meta.llama-3.3-70b-instruct", "schemas":["nl2sql_world_json_nested"]}');
+---------------------------------------------------------------------------------------------------------------------------------------+
| Executing generated SQL statement... |
+---------------------------------------------------------------------------------------------------------------------------------------+
| SELECT `language` FROM `nl2sql_world_json_nested`.`country_language_nested_flat` WHERE `country_code` = 'CAN' AND `is_official` = 'T' |
+---------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (2.0249 sec)
+----------+
| language |
+----------+
| English |
| French |
+----------+
2 rows in set (2.0249 sec)
Query OK, 0 rows affected (2.0249 sec)
Q.E.D.
Peroraison
Bridging natural language processing and semi-structured data presents both exciting opportunities and unique challenges. MySQL HeatWave GenAI’s NL2SQL capability demonstrates how natural language can simplify interaction with complex data systems, even when working with intricate JSON documents.
However, because JSON stores both data and metadata together, the absence of explicit schema information can limit how effectively an LLM interprets and formulates queries. Creating well-structured views that expose JSON’s internal organization offers a practical solution — transforming unstructured data into meaningful relational context.
Ultimately, this approach not only enhances NL2SQL’s accuracy but also showcases how MySQL HeatWave continues to evolve as a powerful engine for intelligent, natural language–driven analytics.

Resources
- Ask Your Database Anything: Natural Language to SQL in MySQL HeatWave
- Let Your AI DBA Assistant Write Your MySQL Queries
- Introducing Natural Language to SQL for MySQL HeatWave
- Generate SQL Queries From Natural-Language Statements
- NL_SQL
- MySQL HeatWave GenAI Supported Models and Languages
Watch my videos on my YouTube channel and subscribe.
Thanks for using HeatWave & MySQL!

Cloud Solutions Architect at Oracle
MySQL Geek, strong interest in data and AI, author, blogger and speaker
I’m an insatiable hunger of learning.
—–
Blog: www.dasini.net/blog/en/
Twitter: https://twitter.com/freshdaz
SlideShare: www.slideshare.net/freshdaz
Youtube: https://www.youtube.com/channel/UC12TulyJsJZHoCmby3Nm3WQ
—–
Leave a Reply