{"id":7711,"date":"2025-04-15T09:42:34","date_gmt":"2025-04-15T08:42:34","guid":{"rendered":"https:\/\/dasini.net\/blog\/?p=7711"},"modified":"2025-04-15T09:42:35","modified_gmt":"2025-04-15T08:42:35","slug":"build-an-ai-powered-search-engine-with-heatwave-genai-part-3","status":"publish","type":"post","link":"https:\/\/dasini.net\/blog\/2025\/04\/15\/build-an-ai-powered-search-engine-with-heatwave-genai-part-3\/","title":{"rendered":"Build an AI-Powered Search Engine with HeatWave GenAI (part 3)"},"content":{"rendered":"\n

In Build an AI-Powered Search Engine with HeatWave GenAI (part 1)<\/a>, we introduced the fundamentals of creating an AI-powered search engine using HeatWave GenAI<\/strong>. We highlighted the advantages of semantic search powered by large language models<\/strong> over traditional SQL-based approaches and provided a hands-on guide for generating embeddings<\/strong> and running similarity searches<\/strong> \u2014 key techniques that significantly improve the retrieval of relevant content.<\/p>\n\n\n\n

In the second opus \u2014 Build an AI-Powered Search Engine with HeatWave GenAI (part 2)<\/a> \u2014 we shifted our focus to improving search result quality through reranking strategies<\/strong> and the use of article summaries for embedding generation. We demonstrated how to implement these enhancements entirely within HeatWave using JavaScript-based stored procedures<\/strong>. By assigning different weights to title and excerpt distances, and generating embeddings from sanitized summaries, we boosted the precision and relevance of search results. This approach showcases HeatWave GenAI<\/strong>\u2019s ability to embed advanced AI capabilities directly within the database layer<\/strong>.<\/p>\n\n\n\n

In this third installment, we\u2019ll take it a step further by incorporating full article content into the search engine. While titles, excerpts, or even summaries may work well in many cases, there are situations where deeper, more detailed information is needed to return truly relevant answers.<\/p>\n\n\n\n

<\/div>\n\n\n\n

What are we going to do?<\/h2>\n\n\n\n

The process is slightly more complex than what we’ve covered so far (in part 1<\/a> and part 2<\/a>). In WordPress, article content is stored as HTML in the post_content<\/em><\/code> column. This column is our starting point, and the goal is to generate embeddings from its content.<\/p>\n\n\n\n

To achieve this, we\u2019ll need to write a few lines of code. While this could be done directly within HeatWave using JavaScript stored procedures \u2014 as we saw in part 2: A Javascript, stored procedure & AI story<\/a> \u2014 I\u2019ll instead use the unofficial language of data: Python.
Please bear in mind that\u00a0I\u2019m not a developer<\/strong>, so\u00a0this code is provided for illustrative purposes only<\/strong>. It may contain errors or limitations. Please\u00a0use it at your own risk<\/strong>\u00a0and adapt it to your specific needs (also feel free to share back).<\/p>\n\n\n\n

<\/div>\n\n\n\n

Below are the steps we\u2019ll follow to move forward:<\/p>\n\n\n\n

    \n
  1. Define the embeddings storage table.<\/li>\n\n\n\n
  2. Fetch articles from the database.<\/li>\n\n\n\n
  3. Remove HTML tags and normalize whitespace.<\/li>\n\n\n\n
  4. Split articles into overlapping chunks of words.<\/li>\n\n\n\n
  5. Generate embeddings for a given article.<\/li>\n\n\n\n
  6. Insert article chunks with their embeddings into HeatWave.<\/li>\n<\/ol>\n\n\n\n
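As a rough preview of steps 3 and 4, here is a minimal Python sketch that relies only on the standard library. The function names (`strip_html`, `chunk_words`) and the default `chunk_size` and `overlap` values are illustrative assumptions, not necessarily the ones used in the rest of this article.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Hypothetical helper: keeps text nodes and ignores the tags themselves."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(html_text):
    """Step 3: remove HTML tags and collapse runs of whitespace."""
    parser = _TextExtractor()
    parser.feed(html_text)
    parser.close()
    # Joining with a space keeps words from adjacent elements apart.
    text = " ".join(parser.parts)
    return re.sub(r"\s+", " ", text).strip()


def chunk_words(text, chunk_size=400, overlap=80):
    """Step 4: split text into chunks of words that overlap.

    chunk_size and overlap are counted in words; the defaults are
    assumptions chosen for illustration only.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


if __name__ == "__main__":
    sample = "<p>Hello   <strong>HeatWave</strong>\n GenAI</p>"
    print(strip_html(sample))
    print(chunk_words("a b c d e f g", chunk_size=4, overlap=2))
```

The overlap means the end of one chunk is repeated at the start of the next, so a sentence that straddles a chunk boundary still appears whole in at least one chunk.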

    Let\u2019s break down each step together!<\/p>\n\n\n\n

    <\/div>\n\n\n\n

    I’m using HeatWave 9.2.2:<\/p>\n\n\n\n

    SELECT version();\n+-------------+\n| version()   |\n+-------------+\n| 9.2.2-cloud |\n+-------------+<\/code><\/pre>\n\n\n\n
    <\/div>\n\n\n\n

    Defining the embeddings storage table<\/h2>\n\n\n\n

    I created a new table named wp_post_chunks_embeddings_minilm<\/em><\/code> to store the embeddings generated from article chunks.<\/p>\n\n\n\n