Building an AI Vision Search Engine with MySQL HeatWave GenAI

May 14, 2026

Modern AI systems increasingly rely on multimodal data: text, images, documents, audio, and video. Among these modalities, image understanding has become one of the most important capabilities for AI-powered applications.

Organizations now expect systems to:

  • Search images using natural language
  • Compare visually similar assets
  • Classify and enrich image catalogs
  • Validate AI-generated descriptions
  • Build semantic retrieval pipelines

Traditionally, implementing these capabilities required specialized computer vision infrastructure, external vector databases, custom ML pipelines, and multiple frameworks.

With MySQL HeatWave GenAI, many of these capabilities can now be implemented directly inside SQL workflows using built-in AI routines such as sys.ML_GENERATE, sys.ML_EMBED_ROW, and the DISTANCE function.

In this article, we will build the foundations of a Vision Model Evaluation Assistant using MySQL HeatWave GenAI.

The objective is not to create another image classifier, but rather a semantic image understanding platform capable of:

  • Generating image descriptions
  • Producing semantic embeddings
  • Performing text-to-image search
  • Performing semantic reverse image search
  • Evaluating image understanding quality

This article intentionally focuses on the database layer and the SQL logic powering the system.

Landscape of Vision-to-Embedding Approaches

Before building the solution, it is important to understand the major approaches currently used for AI-powered image understanding systems.

Let’s look at three paradigms for representing images in AI systems.

1. Direct Image Embeddings

The first approach consists of converting raw image pixels directly into vectors.

Popular architectures include CLIP-style dual encoders, Vision Transformers (ViT), and convolutional networks such as ResNet.

In this approach:

  • The image is fed directly into a neural network
  • The network produces a dense vector embedding
  • Similarity search is performed directly on image vectors

Workflow

Image -> Image embedding model -> Dense vector -> Vector similarity search

Strengths

  • Extremely fast similarity search
  • Excellent for image-to-image matching
  • End-to-end architecture
  • No intermediate textual representation required

Limitations

These embeddings are often difficult to interpret:

  • No human-readable explanation
  • Harder to debug
  • Harder to refine
  • Black-box behavior

If a search result is incorrect, it is difficult to understand why the model failed.

To summarize

Pros

  • High-performance retrieval
  • Strong visual similarity matching
  • Can compare image-to-image directly

Cons

  • Requires specialized model deployment
  • Less explainable
  • Harder to integrate into SQL-centric workflows

2. Semantic Metadata Embeddings

This is the approach used in this article.

Instead of embedding raw pixels directly, we first generate a textual understanding of the image.

The workflow becomes: Image -> Generated text (description and keywords) -> Text embedding -> Vector similarity search.

This pipeline is composed of two distinct stages:

Step 1: Image -> Text

A Vision Language Model (VLM) analyzes the image and generates:

  • A description and/or keywords
  • Contextual understanding

Example:

{
  "description": "A red motorcycle parked on a city street near buildings during daytime.",
  "keywords": ["motorcycle", "street", "urban", "vehicle", "city"]
}

Step 2: Text -> Embedding

The generated text is then converted into a vector embedding using a text embedding model.

This creates semantic vectors representing the meaning of the image.

Strengths

This approach offers several advantages:

Human-readable and explainable

Unlike direct image embeddings, the generated metadata is understandable by humans.

If a search fails, developers can inspect:

  • The generated description
  • The extracted keywords
  • The semantic interpretation

This creates a powerful audit trail.

SQL-native workflow

Because the representation becomes textual:

  • Standard embedding models can be used
  • VECTOR columns integrate naturally
  • SQL workflows remain simple

This aligns perfectly with MySQL HeatWave GenAI capabilities.

Trade-offs

The quality of retrieval depends heavily on:

  • Caption quality
  • Prompt engineering
  • Vision model performance

Low-level visual details may also be lost:

  • Texture
  • Exact shape
  • Pixel-specific patterns

To summarize

Pros

  • Human-readable
  • Easy to debug
  • Works well with SQL systems
  • Easier operationalization

Cons

  • Less precise for purely visual similarity
  • Depends on generated metadata quality

3. Multimodal Fusion Models

A third category consists of multimodal architectures capable of jointly processing:

  • Images
  • Text
  • Audio
  • Video
  • Structured data

These systems use:

  • Early fusion
  • Intermediate fusion
  • Late fusion
  • Hybrid fusion architectures

Their objective is to create unified multimodal embeddings.

Strengths

  • Very powerful semantic understanding
  • Strong cross-modal reasoning
  • Rich contextual interpretation

Challenges

These architectures are often:

  • More complex
  • Harder to operationalize
  • Less transparent
  • Difficult to integrate into SQL-centric architectures

Why We Chose the Image -> Text -> Embedding Approach

In this article, we intentionally choose the Semantic Metadata Embeddings pipeline because it aligns perfectly with MySQL HeatWave GenAI capabilities.

Specifically:

  • sys.ML_GENERATE handles the image -> text stage
  • sys.ML_EMBED_ROW handles the text -> embedding stage
  • The DISTANCE function handles vector similarity search

This approach also creates a human-readable semantic audit trail.

If retrieval quality degrades, architects and developers can directly inspect the generated descriptions to understand:

  • What the model detected
  • What it missed
  • Why semantic similarity failed

For AI evaluation systems, explainability is often just as important as accuracy.

Overview of MySQL HeatWave GenAI

MySQL HeatWave is a fully managed cloud service that combines:

  • Transactional processing (OLTP)
  • Analytics (OLAP)
  • Machine Learning
  • Generative AI
  • Vector Store capabilities

within a single MySQL platform.

One of the most interesting aspects of HeatWave GenAI is that it enables AI-powered workflows directly inside SQL.

Instead of exporting data to external AI pipelines, developers can:

  • Generate embeddings
  • Execute semantic similarity search
  • Perform retrieval augmented generation (RAG)
  • Analyze images
  • Invoke LLMs

using SQL routines \o/.

MySQL HeatWave Key GenAI Capabilities

MySQL HeatWave GenAI includes:

  • Text generation
  • Text embeddings
  • Vector similarity search
  • In-database AI processing
  • Vision model integration

Key routines and data types used in this article include sys.ML_GENERATE, sys.ML_EMBED_ROW, the DISTANCE function, and the VECTOR column type.

In this article, I’m using MySQL HeatWave version 9.6.1:

mysql> 
SELECT VERSION();
+-------------+
| VERSION()   |
+-------------+
| 9.6.1-cloud |
+-------------+

Significantly Lowering the Total Cost of Ownership

Traditionally, AI pipelines require:

  • Multiple services
  • External vector databases
  • Dedicated ML infrastructure
  • Separate orchestration layers

With MySQL HeatWave:

  • Data remains inside MySQL
  • AI logic becomes SQL-native
  • Governance is simplified
  • Operational complexity is reduced

By consolidating these diverse requirements into a single platform, MySQL HeatWave significantly lowers the Total Cost of Ownership (TCO). It eliminates the expensive licensing fees associated with multiple third-party services and external vector databases.

Furthermore, by making AI logic SQL-native and reducing operational complexity, organizations can leverage their existing database expertise rather than hiring specialized ML engineers to manage separate orchestration layers, drastically cutting both infrastructure and personnel overhead.

Summary of TCO Drivers

  • Skill Set Leverage: AI tasks become accessible to anyone with SQL skills, reducing the need for niche, high-cost specialists.
  • Reduced Licensing: Consolidating multiple services into one platform removes the need for separate vendor contracts.
  • Lower Infrastructure Costs: Eliminating dedicated ML infrastructure and external databases reduces the physical or cloud hardware footprint.
  • Operational Efficiency: Keeping data inside MySQL HeatWave simplifies governance and reduces the labor hours required for complex data movement and synchronization.

Generating Image Understanding with sys.ML_GENERATE

The foundation of our evaluation assistant is the ability to convert images into semantic understanding.

This is where sys.ML_GENERATE becomes extremely powerful.

Prerequisites

The environment requires:

  • A MySQL HeatWave instance with a HeatWave cluster attached
  • Access to supported vision models (like google.gemini-2.5-pro)
  • Access to supported embedding models (like cohere.embed-english-v3.0)
  • A table capable of storing:
    • Images
    • Descriptions
    • Keywords
    • Embeddings

Table Definition

CREATE TABLE image_details (
  id_image int unsigned NOT NULL AUTO_INCREMENT,
  image_name varchar(255) NOT NULL,
  image_base64 longtext NOT NULL,
  image_description json NOT NULL,
  image_description_embedding vector(2048) NOT NULL COMMENT 'GENAI_OPTIONS=EMBED_MODEL_ID=cohere.embed-english-v3.0',
  image_keywords json NOT NULL,
  image_keywords_embedding vector(2048) NOT NULL COMMENT 'GENAI_OPTIONS=EMBED_MODEL_ID=cohere.embed-english-v3.0',
  created_at timestamp NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (id_image)
) ENGINE=InnoDB;

This table stores:

  • The original image (Base64-encoded).
  • Semantic metadata (image description and keywords).
  • Vector embeddings (of the description and of the keywords).

directly inside MySQL.
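As a client-side sketch of how the image_base64 payload might be produced before it reaches this table (the file name is hypothetical; only the encoding step is shown):

```python
import base64

def image_to_base64(image_bytes: bytes) -> str:
    # Encode raw image bytes into the Base64 text stored in image_base64.
    return base64.b64encode(image_bytes).decode("ascii")

# Hypothetical usage with a local file:
# with open("hornet_black_gold.jpg", "rb") as f:
#     payload = image_to_base64(f.read())

# The PNG magic bytes encode to the familiar "iVBORw" prefix.
print(image_to_base64(b"\x89PNG"))  # iVBORw==
```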

What sys.ML_GENERATE Enables

The sys.ML_GENERATE routine supports multimodal prompts combining:

  • Text
  • Images

This enables:

  • Caption generation
  • Object detection prompts
  • Context extraction
  • Semantic interpretation

Please read the MySQL HeatWave ML_GENERATE Documentation if you want to know more.

Bridging Structured and Unstructured Data

One of the most interesting aspects of ML_GENERATE is that it bridges the gap between:

  • Structured SQL systems
  • Unstructured image content

The output becomes structured semantic metadata that can:

  • Be stored
  • Indexed
  • Embedded
  • Queried
  • Compared

using standard SQL operations.

Supported Vision Models

The supported vision models can be queried directly from MySQL HeatWave:

mysql>
SELECT * 
FROM sys.ML_SUPPORTED_LLMS
WHERE model_id LIKE 'google.gemini%'\G

*************************** 1. row ***************************
         provider: OCI Generative AI Service
         model_id: google.gemini-2.5-flash
availability_date: 2026-01-22
     capabilities: ["GENERATION"]
    default_model: 0
*************************** 2. row ***************************
         provider: OCI Generative AI Service
         model_id: google.gemini-2.5-pro
availability_date: 2026-01-22
     capabilities: ["GENERATION"]
    default_model: 0
*************************** 3. row ***************************
         provider: OCI Generative AI Service
         model_id: google.gemini-2.5-flash-lite
availability_date: 2026-01-22
     capabilities: ["GENERATION"]
    default_model: 0

The model used in this article is:

google.gemini-2.5-pro

The workflow is straightforward: the Base64-encoded image and a text prompt are passed together to sys.ML_GENERATE, and the routine returns structured text.

This transforms unstructured visual data into structured semantic data.

Examples

Object Detection Prompt

mysql> 
SELECT image_base64
FROM image_details
WHERE id_image = 1
INTO @image_base64;

SET @prompt_desc_key_json = 'What objects are present?';

SELECT JSON_UNQUOTE(
    JSON_EXTRACT(
        sys.ML_GENERATE(@prompt_desc_key_json, JSON_OBJECT("model_id", "google.gemini-2.5-pro", "image", @image_base64)),
        '$.text'
    )
) AS description
\G

*************************** 1. row ***************************
description: Based on the image provided, here are the objects that are present:

**Main Subject:**
*   A black motorcycle, which appears to be a Honda Hornet, is the central object. Its visible parts include:
    *   Wheels and tires
    *   Engine
    *   Exhaust pipe and muffler
    *   Frame
    *   Seat
    *   Fuel tank (with a black tank cover on it)
    *   Handlebars with mirrors and levers
    *   Front and rear disc brakes
    *   Gold-colored front forks

**Background and Surroundings:**
*   A white or light gray garage door with vertical panels.
*   A red brick wall on the left.
*   A white drainpipe between the wall and the garage door.
*   Pavement or concrete on the ground, with some patches of frost or thin snow.
*   Dry, brown leaves scattered on the ground.

Scene Context Prompt

mysql> 
SET @prompt_desc_key_json = 'What is the context of this scene?';

SELECT JSON_UNQUOTE(
    JSON_EXTRACT(
        sys.ML_GENERATE(@prompt_desc_key_json, JSON_OBJECT("model_id", "google.gemini-2.5-pro", "image", @image_base64)),
        '$.text'
    )
) AS description
\G

*************************** 1. row ***************************
description: Based on the image, the context of the scene is an outdoor, residential setting, likely during a cold season like late autumn or winter.

Here are the key details that establish this context:

*   **Subject:** The central focus is a modern, black and silver Honda Hornet motorcycle. It is parked and appears to be the subject of the photo, possibly taken by its owner.
*   **Location:** The motorcycle is parked on a paved surface, such as a driveway or a courtyard, in front of white, vertical-paneled garage doors. To the left, there is a red brick wall. This combination suggests a residential area.
*   **Season/Weather:** The ground is littered with dead, brown leaves and patches of what looks like frost or old, melting snow. This indicates cold weather, pointing to late autumn or winter. The lighting appears flat and overcast, which is common for that time of year.

This capability becomes the semantic foundation of the evaluation assistant. We are not merely generating captions.

We are enabling:

  • Queryable visual understanding
  • Semantic enrichment
  • Explainable AI retrieval

Text-to-Image Search (Semantic Retrieval)

One of the most powerful applications of semantic embeddings is natural language image search.

The objective is simple:

Input: Text
Output: Relevant Images

But internally, the logic is fundamentally different from traditional image retrieval systems.

The Core Idea

We are not searching for:

  • Pixels
  • Shapes
  • Exact visual patterns

We are searching for:

  • Meaning
  • Context
  • Semantics

This is semantic retrieval.

Workflow Overview

The workflow contains two phases:

  1. Preprocessing
  2. Query execution

Preprocessing Phase

Each image must first be semantically enriched.

The workflow is the following: each image is passed to sys.ML_GENERATE to produce a description and keywords, and each of these texts is then passed to sys.ML_EMBED_ROW to produce the vector embeddings stored in the table.

Examples


Supported Embedding Models

mysql> 
SELECT *
FROM sys.ML_SUPPORTED_LLMS
WHERE capabilities LIKE '["TEXT_EMBEDDINGS"]'\G

*************************** 1. row ***************************
         provider: HeatWave
         model_id: all_minilm_l12_v2
availability_date: 2024-07-01
     capabilities: ["TEXT_EMBEDDINGS"]
    default_model: 0
*************************** 2. row ***************************
         provider: HeatWave
         model_id: multilingual-e5-small
availability_date: 2024-07-24
     capabilities: ["TEXT_EMBEDDINGS"]
    default_model: 1
*************************** 3. row ***************************
         provider: OCI Generative AI Service
         model_id: cohere.embed-english-v3.0
availability_date: 2024-02-17
     capabilities: ["TEXT_EMBEDDINGS"]
    default_model: 0
*************************** 4. row ***************************
         provider: OCI Generative AI Service
         model_id: cohere.embed-multilingual-v3.0
availability_date: 2024-02-17
     capabilities: ["TEXT_EMBEDDINGS"]
    default_model: 0

Generating Semantic Metadata

mysql> 
SET @prompt_desc_key_json = '
Analyze the image and return a minified JSON object.
The JSON must contain two fields:
"description" (a concise, single-paragraph description of visible content with no speculation)
and "keywords" (a JSON array of relevant lowercase visual keywords).
Output ONLY the raw JSON on a single line without any markdown formatting, backticks, or preamble
';

Generate Description and Keywords

mysql> 
SELECT JSON_UNQUOTE(
    JSON_EXTRACT(
        sys.ML_GENERATE(@prompt_desc_key_json, JSON_OBJECT("model_id", "google.gemini-2.5-pro", "image", @image_base64)),
        '$.text'
    )
) AS description
\G

*************************** 1. row ***************************
description: {"description":"A side profile view of a black Honda Hornet naked motorcycle parked on a paved surface with scattered leaves. The bike has a black frame, engine, and fuel tank, with a silver tail section and exhaust pipe. The front forks are gold-colored. In the background, there is a red brick wall on the left and a light-colored, vertically-paneled garage door on the right.","keywords":["motorcycle","honda","honda hornet","naked bike","street bike","black","parked","outdoors","garage door","brick wall","engine","exhaust","two-wheeler","vehicle","wheel","tire","disc brake"]}
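Before calling sys.ML_EMBED_ROW, the generated JSON has to be split into the two texts that are embedded separately. A minimal client-side sketch (the space-joined keyword format is an assumption, not prescribed by HeatWave):

```python
import json

def split_metadata(generated_text: str) -> tuple[str, str]:
    # Split the model's JSON answer into the description text and a
    # space-joined keyword string, each of which is embedded separately.
    data = json.loads(generated_text)
    return data["description"], " ".join(data["keywords"])

# Abbreviated sample shaped like the output above.
raw = ('{"description":"A black Honda Hornet motorcycle parked outdoors.",'
       '"keywords":["motorcycle","honda","parked","outdoors"]}')
desc, keywords = split_metadata(raw)
print(desc)      # A black Honda Hornet motorcycle parked outdoors.
print(keywords)  # motorcycle honda parked outdoors
```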

Generate Embeddings and Store Data

mysql> 
INSERT INTO image_details (
    image_name,
    image_base64,
    image_description,
    image_description_embedding,
    image_keywords,
    image_keywords_embedding
)
VALUES (
    @image_name,
    @image_base64,
    @source_image_descriptions,
    sys.ML_EMBED_ROW(
        @source_image_descriptions,
        '{"model_id": "cohere.embed-english-v3.0"}'
    ),
    @source_image_keywords,
    sys.ML_EMBED_ROW(
        @source_image_keywords,
        '{"model_id": "cohere.embed-english-v3.0"}'
    )
);

(The @image_name, @image_base64, @source_image_descriptions, and @source_image_keywords user variables are assumed to have been populated beforehand, e.g. from the metadata generated in the previous step.)

Query Phase

At query time:

  1. The user enters a text description
  2. The text becomes an embedding
  3. Similarity search retrieves matching images

Generate Query Embedding

mysql> 
SET @embeddOptions = '{"model_id": "cohere.embed-english-v3.0"}';

SET @searchImage = 'street motorcycle';

SELECT sys.ML_EMBED_ROW(
    @searchImage,
    @embeddOptions
) INTO @searchImageEmbedding;

If you want to see what a vector embedding looks like under the hood, you can use the FROM_VECTOR() function:

mysql>
SELECT from_vector(@searchImageEmbedding)\G
*************************** 1. row ***************************
from_vector(@searchImageEmbedding): [2.09045e-02,1.06277e-02,2.59590e-03,-6.01807e-02,-4.44336e-02,8.44574e-03,-4.88281e-02,8.72803e-03,-1.45645e-02,2.78168e-02,-4.73328e-02,2.53296e-03,-5.47791e-03,-1.36337e-02,4.00696e-02,-1.90735e-02,6.69556e-02,-3.72314e-02,3.95203e-02,2.24457e-02,-1.06964e-02,7.75757e-02,-1 
                 ... even more numbers ...
,-3.04871e-02,5.33295e-03,3.96729e-03]

Similarity Search Using Descriptions

mysql> 
SELECT 
    image_name,
    DISTANCE(
        image_description_embedding,
        @searchImageEmbedding,
        'COSINE'
    ) AS min_distance
FROM image_details
ORDER BY min_distance
LIMIT 3;

+-----------------------+---------------------+
| image_name            | min_distance        |
+-----------------------+---------------------+
| vintage_sidecar.JPG   | 0.46460431814193726 |
| Le_Mans_24_moto.png   | 0.49795448780059814 |
| hornet_black_gold.jpg |  0.5088052749633789 |
+-----------------------+---------------------+
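DISTANCE(v1, v2, 'COSINE') returns the cosine distance, i.e. 1 minus the cosine similarity, so smaller values mean closer meaning. A pure-Python equivalent for intuition (HeatWave computes this in-database over the 2048-dimensional vectors):

```python
import math

def cosine_distance(a, b):
    # Cosine distance = 1 - (a . b) / (|a| * |b|), the metric ORDER BY uses.
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (math.sqrt(sum(x * x for x in a)) *
                        math.sqrt(sum(x * x for x in b)))

print(cosine_distance([1.0, 0.0], [1.0, 0.0]))  # 0.0 (same direction)
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))  # 1.0 (orthogonal)
```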

Similarity Search Using Description + Keywords

mysql> 
WITH distances AS (
    SELECT
        image_name,
        (
            DISTANCE(
                image_keywords_embedding,
                @searchImageEmbedding,
                'COSINE'
            )
            +
            DISTANCE(
                image_description_embedding,
                @searchImageEmbedding,
                'COSINE'
            )
        ) / 2 AS avg_distance
    FROM image_details
)
SELECT *
FROM distances
ORDER BY avg_distance ASC
LIMIT 3;

+-----------------------+---------------------+
| image_name            | avg_distance        |
+-----------------------+---------------------+
| hornet_black_gold.jpg | 0.48873063921928406 |
| vintage_sidecar.JPG   |  0.4904197156429291 |
| Yamaha_DTX.jpg        |  0.5230678915977478 |
+-----------------------+---------------------+

Simple Re-ranking Strategy

mysql> 
WITH initial_results AS (
    SELECT
        image_name,
        DISTANCE(
            image_keywords_embedding,
            @searchImageEmbedding,
            'COSINE'
        ) AS keywords_distance,

        DISTANCE(
            image_description_embedding,
            @searchImageEmbedding,
            'COSINE'
        ) AS description_distance

    FROM image_details

    ORDER BY keywords_distance + description_distance
    LIMIT 15
),

reranked_results AS (
    SELECT
        image_name,
        (
            0.3 * keywords_distance +
            0.7 * description_distance
        ) AS combined_distance

    FROM initial_results
)

SELECT *
FROM reranked_results
ORDER BY combined_distance ASC
LIMIT 3;

+-----------------------+--------------------+
| image_name            | combined_distance  |
+-----------------------+--------------------+
| vintage_sidecar.JPG   | 0.4800935566425323 |
| hornet_black_gold.jpg |  0.496760493516922 |
| Le_Mans_24_moto.png   | 0.5196295261383057 |
+-----------------------+--------------------+
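The two-stage logic above (shortlist, then weighted combination) can be sketched client-side as follows. The 0.3/0.7 weights come from the query; the candidate distances are invented for illustration:

```python
def rerank(candidates, kw_weight=0.3, desc_weight=0.7, top_k=3):
    # Re-rank (name, keywords_distance, description_distance) tuples by a
    # weighted combination, mirroring the reranked_results CTE.
    scored = [(name, kw_weight * kw_d + desc_weight * desc_d)
              for name, kw_d, desc_d in candidates]
    return sorted(scored, key=lambda row: row[1])[:top_k]

# Illustrative candidate shortlist (distances invented for the example).
rows = [
    ("vintage_sidecar.JPG",   0.52, 0.46),
    ("hornet_black_gold.jpg", 0.47, 0.51),
    ("Le_Mans_24_moto.png",   0.55, 0.50),
]
for name, d in rerank(rows):
    print(name, round(d, 3))
```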

Key Insight

The most important concept here is this:

  • We are not comparing images.
  • We are comparing semantic representations of images.

This distinction fundamentally changes how image retrieval systems can be designed using SQL-native AI capabilities.

Reverse Image Search (Image-to-Image via Semantics)

Traditional reverse image search systems rely heavily on:

  • Pixel similarity
  • Feature extraction
  • Specialized computer vision pipelines

Our approach is different.

We perform Semantic Image-to-Image Search.

This is:

  • Similarity of meaning
  • Not similarity of pixels

Workflow

Query image -> sys.ML_GENERATE (description and keywords) -> sys.ML_EMBED_ROW -> similarity search against the stored embeddings

Why This Is Interesting

Two images may be visually different while still being semantically related.

For example:

  • Different motorcycles
  • Different lighting conditions
  • Different camera angles

Yet both images may describe: « A motorcycle parked on an urban street ».

Semantic retrieval captures this meaning.

Advantages of the Semantic Approach

Simpler Architecture

No need for:

  • Complex Computer Vision frameworks
  • GPU-heavy image feature pipelines
  • Specialized image vector databases

Explainability

Because the retrieval is based on generated semantic metadata:

  • Results can be audited
  • Descriptions can be refined
  • Prompts can be improved

SQL-native Workflow

Everything remains inside MySQL HeatWave:

  • Metadata
  • Embeddings
  • Similarity search
  • Ranking logic

Architectural Trade-offs & Design Considerations

Building AI-powered semantic image systems involves several important architectural decisions.

Description vs Keywords

Both representations serve different purposes.

Descriptions

Descriptions provide:

  • Rich contextual understanding
  • Better semantic reasoning
  • Natural language flexibility

However:

  • They may introduce noise
  • Longer text may dilute embeddings

Keywords

Keywords provide:

  • Focused semantic signals
  • Better precision
  • Faster matching

However:

  • They lose contextual richness

Strategy implemented in this article

A hybrid approach:

  • Used descriptions for semantic context
  • Used keywords for precision
  • Combined both during re-ranking

However

Storing two embeddings doubles vector storage (2 × 2048 dimensions per image).
For very large libraries, consider keeping only one embedding: either embed a concatenation of the description and keywords, or keep whichever representation performs better on your validation set.
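One possible shape for the single-embedding alternative mentioned above is to concatenate the description and keywords into one text and embed it once. The exact format is an assumption, not a HeatWave requirement:

```python
def combined_text(description: str, keywords: list[str]) -> str:
    # Build a single embedding input from both representations, halving
    # vector storage at the cost of per-field weighting during re-ranking.
    return description.rstrip(".") + ". Keywords: " + ", ".join(keywords)

print(combined_text(
    "A black Honda Hornet motorcycle parked outdoors.",
    ["motorcycle", "honda", "parked"],
))
```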

Storing Images in the Database vs Object Storage

Another important design consideration concerns image storage.

Storing Images in MySQL

Advantages:

  • Simpler architecture
  • Centralized governance
  • Easier transactional consistency

Disadvantages:

  • Larger database size increases buffer pool pressure
  • Backup overhead
  • Potential performance impact (queries, replication, …)

Storing Images in Object Storage

Advantages:

  • Better scalability
  • Lower storage costs
  • Better CDN integration

Disadvantages:

  • More complex architecture
  • Additional security management

Recommended Hybrid Strategy

A practical enterprise approach is often:

  • Store thumbnails (small, e.g., 512×512) as base64 in MySQL (should be sufficient for vision model analysis)
  • Store full-resolution images in object storage for later retrieval or user download

This enables:

  • Fast previews
  • Efficient semantic retrieval
  • Scalable storage architecture

Prompt Engineering Matters

Semantic quality heavily depends on prompts.

Poor prompts generate:

  • Vague descriptions
  • Weak keywords
  • Low-quality embeddings

Well-designed prompts improve:

  • Retrieval quality
  • Explainability
  • Evaluation consistency

Prompt engineering becomes a critical part of semantic architecture design.

Conclusion

MySQL HeatWave GenAI significantly simplifies the implementation of semantic image understanding systems while substantially lowering the Total Cost of Ownership through the elimination of external infrastructure and fragmented AI pipelines.

By combining:

  • Vision-language models
  • Text embeddings
  • Vector search
  • SQL-native AI routines

organizations can build powerful multimodal systems directly inside MySQL HeatWave.

The « image -> text -> embedding » approach presented in this article provides:

  • Explainability
  • Operational simplicity
  • SQL-native integration
  • Semantic search capabilities

Most importantly, it enables architects and developers to build AI systems that remain understandable, debuggable, and governable — which is often one of the biggest challenges in enterprise AI adoption.

Resources

Follow me on Linkedin

Watch my videos on my YouTube channel and subscribe.

My Slideshare account.

My Speaker Deck account.

Thanks for using HeatWave & MySQL!
