Natural Language to SQL using Adalflow and Ollama

Convert natural language questions to SQL Queries

Gopi Subramanian`

Oct 16, 2024

A common LLM use case is to convert questions in English into SQL queries. This article demonstrates this use case using Sequential container from Adalflow. The pipeline shared in this article is a trival one. Real world scenarios are more demanding.

The goal is to generate SQL as shown in the below table.

Text to SQL LLM Pipeline

The below image depicts the steps in an English to SQL query conversion LLM pipeline. We start with a sample database. Inspect that database and retrieve table and column details. Inject these details into a Prompt. Pass that prompt to an LLM to generate SQL for English questions related to the tables. NLToSQL block using the LLM Generator and Prompt helps in translating the question into SQL command.

The Execute_SQL block is a boiler plate code to execute the generated SQL Finally we add a LLM Summarizer block to return our results as English summary. We pass the user question and the results from the database and get a summary.

In additon to Adalflow, we use Ollama for hosting LLMs locally. For this code, we use Microsoft Phi3:latest model. sqlite3 for sample database.

The code is available in

Open Lightning Studio

Create a sample database

The schema was borrowed from here. There are two tables. The Products table is a master list of products. Inventory table holds the details of inventory of each product by warehouse.

import sqlite3

database_schema = """

DROP TABLE IF EXISTS PRODUCTS;
CREATE TABLE PRODUCTS (
    SKU INT,
    DESCRIPTION STRING
);
DROP TABLE IF EXISTS INVENTORY;
CREATE TABLE INVENTORY (
    SKU INT,
    QUANTITY INT,
    WAREHOUSE STRING
);

INSERT INTO PRODUCTS(SKU,DESCRIPTION) VALUES (100,'short sleeve cotton shirt blue');
INSERT INTO PRODUCTS(SKU,DESCRIPTION) VALUES (200,'long sleeve flanelle shirt red');
INSERT INTO PRODUCTS(SKU,DESCRIPTION) VALUES (300, 'Black Running Shoes size 11');
INSERT INTO PRODUCTS(SKU,DESCRIPTION) VALUES (400, 'Sweat Pants Grey Men');
INSERT INTO PRODUCTS(SKU,DESCRIPTION) VALUES (500, 'Crop top green Women');
INSERT INTO PRODUCTS(SKU,DESCRIPTION) VALUES (600, 'Long Skirt Blue');
INSERT INTO PRODUCTS(SKU,DESCRIPTION) VALUES (700, 'Summer hat kids Orange');
INSERT INTO PRODUCTS(SKU,DESCRIPTION) VALUES (800, 'Sunglasses Unisex black');
INSERT INTO PRODUCTS(SKU,DESCRIPTION) VALUES (900, 'Jeans Black');
INSERT INTO PRODUCTS(SKU,DESCRIPTION) VALUES (1000, 'Kaki cotton pants summer women');

INSERT INTO INVENTORY VALUES (100, 154,'New York');
INSERT INTO INVENTORY VALUES (200, 245, 'Seattle');
INSERT INTO INVENTORY VALUES (300, 364, 'San Francisco');
INSERT INTO INVENTORY VALUES (400, 421, 'New York');
INSERT INTO INVENTORY VALUES (500, 532, 'Seattle');
INSERT INTO INVENTORY VALUES (600, 678, 'New York');
INSERT INTO INVENTORY VALUES (700, 741, 'San Francisco');
INSERT INTO INVENTORY VALUES (800, 845, 'New York');
INSERT INTO INVENTORY VALUES (900, 987, 'Seattle');
INSERT INTO INVENTORY VALUES (1000, 149, 'San Francisco');
"""

connection_obj = sqlite3.connect('inventory.db')
# cursor object
cursor_obj = connection_obj.cursor()
cursor_obj.executescript(database_schema)
# Test inserted records
cursor_obj.execute("SELECT * from INVENTORY") 
# fetch all the data 
print(cursor_obj.fetchall())
# Another query
cursor_obj.execute("SELECT name FROM sqlite_master WHERE type='table';")
print(cursor_obj.fetchall())
connection_obj.close()

[(100, 154, 'New York'), (200, 245, 'Seattle'), (300, 364, 'San Francisco'), (400, 421, 'New York'), (500, 532, 'Seattle'), (600, 678, 'New York'), (700, 741, 'San Francisco'), (800, 845, 'New York'), (900, 987, 'Seattle'), (1000, 149, 'San Francisco')]
[('PRODUCTS',), ('INVENTORY',)]

Prepare the prompt

The DBTool class introspects the given sqlite database, to retrieve the table details.

class DBTool():
    
    def __init__(self, database):
        try:
            self.connection = sqlite3.connect(database)
            self.cursor = self.connection.cursor()
        except:
           print(f"Unable to open{database}") 
        
        self.table_cols   = {}
        self.__get_tbls()
        self.__get_cols()
        self.connection.close()
    
    def get_string_signature(self):
        return_string = """
        Table names and their columns are listed below \n
        """
        
        table_count = 1
        for table, columns in self.table_cols.items():
            return_string+= str(table_count) + "\n"
            return_string+= f"Table name:{table} \n"
            for idx, col_name in enumerate(columns):
                return_string+="\t" + str(idx+1) + f" {col_name} \n"
            table_count+=1
        
        return return_string
            
    
    def __get_tbls(self):
        self.cursor.execute("SELECT name FROM sqlite_master WHERE type='table';")
        tables = self.cursor.fetchall()
        for (table,) in tables:
            self.table_cols[table] = []
    
    def __get_cols(self):
        for table in self.table_cols.keys():
            columns = self.cursor.execute(f"PRAGMA table_info({table})")
            for column in columns:
                self.table_cols[table].append(column[1])


db_tool = DBTool("inventory.db")
table_schema = db_tool.get_string_signature()

The get_string_signature function, creates a string, where the tables and the columns are listed. Here is the example string generated for sample database.

        Table names and their columns are listed below 

        1
Table name:PRODUCTS 
	1 SKU 
	2 DESCRIPTION 
2
Table name:INVENTORY 
	1 SKU 
	2 QUANTITY 
	3 WAREHOUSE

Let us pass this description to the prompt.

Prepare and test the prompt

Parameter db_template is a placeholder for table description. The description inside the prompt clarifies that its a database expert and given a list of tables and their columns, it should translate any english question regarding the database into SQL counterpart. Inside the prompt, the placeholder table_schema_str is where we put the table name and column descriptions from DBTool.

The generator is the interface to the LMM, it is initated with the appropriate prompts.

from adalflow.core.generator import Generator
from adalflow.components.model_client.ollama_client import OllamaClient
from adalflow.core.default_prompt_template import SIMPLE_DEFAULT_LIGHTRAG_SYSTEM_PROMPT

db_template="""
<SYS>You are sqlite database expert. Given the list of tables and columns, you job is to translate
user queries into corresponding error free SQL statements. 
1. Make sure the column names in the query are from the table.
2. Think step by step and carefully check the user question, list of tables and column names before
giving the query back.
{% if table_schema_str %}
{{table_schema_str}}
{% endif %}
- Dont give any suggestion. Only generate the SQL.
</SYS>
Translate the input: {{input_str}}
You:
"""
{% if table_schema_str %}
{{table_schema_str}}
{% endif %}
- Dont give any suggestion. Only generate the SQL.
</SYS>
Translate the input: {{input_str}}
You:
"""
host = "127.0.0.1:11434"

generator = Generator(
    model_client = OllamaClient(host=host)
   ,model_kwargs = {"model":"phi3:latest","options":{"temperature":0.7,"seed":77}}
   ,template = db_template
   ,prompt_kwargs ={"table_schema_str": table_schema}
   ,name ="SQL Generator" 
)

from adalflow.core.component import Component

class NLToSql(Component):
    
    def __init__(self, generator):
        super().__init__()
        self.generator = generator
        
    def call(self, input: str) -> str:
        input_dict = {"input_str":str(input)}
        result = self.generator.call(input_dict)
        return {"result": result.data, "query": str(input)}
        
nl_sql = NLToSql(generator)
query = "what is the inventory for product jeans black"
nl_sql(query)

{'result': "SELECT i.QUANTITY \nFROM INVENTORY i \nJOIN PRODUCTS p ON i.SKU = p.SKU \nWHERE p.DESCRIPTION LIKE '%jeans%' AND p.DESCRIPTION LIKE '%black%';",
 'query': 'what is the inventory for product jeans black'}

The component NLToSql can now translate a given english question into SQL with the help of generator. A sample call is made to the component, using the question “what is the inventory for product jeans black“, gave us the sql. Though not the optimal, still it can do the job.

SELECT i.QUANTITY 
FROM PRODUCTS p 
JOIN INVENTORY i ON p.SKU = i.SKU 
WHERE p.DESCRIPTION LIKE '%jeans%' AND p.DESCRIPTION LIKE '%black%';

from adalflow.core.component import fun_to_component

@fun_to_component
def execute_sql(input_json):
    """
    Execute the given SQL command
    """
    import sqlite3
    import json
    
    connection = sqlite3.connect("inventory.db")
    cursor_obj = connection.cursor()
    
    sql = input_json["result"]
    user_query = input_json["query"]
    
    cursor_obj.execute(sql)
    
    rows = cursor_obj.fetchall()
    return_records = [ {cursor_obj.description[i][0]:x for i,x in enumerate(row)} for row in rows] 
    
    connection.close()
    
    return json.dumps({"user_input": user_query, "results": return_records})

Let us now proceed to write a function to execute a given sql. Using the decorator @fun_to_component we convert this function into a component, so we can easily add it to our pipeline.

This should complete our pipeline. However, say for some queries, if we can give an english summary back to the user. Let us assemble a Component with an approporiate generator for this task.

import json

summarizer_template="""
<SYS>You can tell stories from number. Help summarize the numerical
answer provided in an easy to understand manner. The numerical answer
is the result from a database query for the user question.
- Give a clear explanation in a couple of lines.
</SYS>
Summarize the input: 
user question: {{input_str}}
numerical answer: {{numerical_answer}}
You:
"""
host = "127.0.0.1:11434"

summary_generator = Generator(
    model_client = OllamaClient(host=host)
   ,model_kwargs = {"model":"phi3:latest","options":{"temperature":0.7,"seed":77}}
   ,template = summarizer_template
   ,name ="SQL Result Summarizer" 
)

class SQLSummarizer(Component):
    
    def __init__(self, generator):
        super().__init__()
        self.generator = summary_generator
        
    def call(self, input: str) -> str:
        
        input_dict = json.loads(input)
        user_query = input_dict["user_input"]
        numerics   = input_dict["results"]
        input_dict = {"input_str": user_query, "numerical_answer": numerics}
        result = self.generator.call(input_dict)
        return json.dumps({"result": result.data, "query": str(user_query)})
    
summarizer = SQLSummarizer(summary_generator)
input = json.dumps({"user_input": "what is the inventory for product jeans black", "results": [{"QUANTITY": 987}]})
summarizer(input)

The SQLSummarizer is capable of taking a english question and the records returned from the database for the SQL translation of that question and give an natural language summary.

Using the Sequential container, which can orcestrate a list of containers in the given order, we realize our Text to SQL pipeline.

from adalflow.core.container import Sequential

seq = Sequential(nl_sql, execute_sql, summarizer)
query = "what is the inventory for product jeans black"
print(seq(query))

Here is another sample result for the query “What is the inventory for product jeans black”

{"result": "The current stock level for 'Black Jeans' products stands at a total of 987 items available in our inventory. This means you can order quite confidently without fearing immediate supply issues!", "query": "what is the inventory for product jeans black"}

Conclusion

This demonstrates a trivial text to sql conversion pipeline. We didnt cover evaluation of this pipeline. There are datasets available using which gold dataset can be created. Spider is once such dataset. https://yale-lily.github.io/spider

From their website,

“It consists of 10,181 questions and 5,693 unique complex SQL queries on 200 databases with multiple tables covering 138 different domains.”

Defog.ai, company has developed an approach to evaluate the SQL generated from text input. https://github.com/defog-ai/sql-eval

Thili.net’s Learn Fast

Discussion about this post