Article Alex Woodhead · Jun 12, 2023 3m read

LangChain fixed the SQL for me

This article is a simple quick starter (what I did was) with SqlDatabaseChain.

Hope this ignites some interest.

Many thanks to:

sqlalchemy-iris author @Dmitry Maslennikov

Your project made this possible today.

The article script uses openai API so caution not to share table information and records externally, that you didn't intend to.

A local model could be plugged in , instead if needed.

Creating a new virtual environment

mkdir chainsql

cd chainsql

python -m venv .

scripts\activate

pip install langchain

pip install wget

# Need to connect to IRIS so installing a fresh python driver
python -c "import wget;url='https://raw.githubusercontent.com/intersystems-community/iris-driver-distribution/main/DB-API/intersystems_irispython-3.2.0-py3-none-any.whl';wget.download(url)"

# And for more magic
pip install sqlalchemy-iris

pip install openai

set OPENAI_API_KEY=[ Your OpenAI Key ]

python

 

Initial Test

from langchain import OpenAI, SQLDatabase, SQLDatabaseChain

db = SQLDatabase.from_uri("iris://superuser:******@localhost:51775/USER")

llm = OpenAI(temperature=0, verbose=True)

db_chain = SQLDatabaseChain.from_llm(llm, db, verbose=True)

db_chain.run("How many Tables are there")

Result error

sqlalchemy.exc.DatabaseError: (intersystems_iris.dbapi._DBAPI.DatabaseError) [SQLCODE: <-25>:<Input encountered after end of query>]
[Location: <Prepare>]
[%msg: < Input (;) encountered after end of query^SELECT COUNT ( * ) FROM information_schema . tables WHERE table_schema = :%qpar(1) ;>]
[SQL: SELECT COUNT(*) FROM information_schema.tables WHERE table_schema = 'public';]
(Background on this error at: https://sqlalche.me/e/20/4xp6)
←[32;1m←[1;3mSELECT COUNT(*) FROM information_schema.tables WHERE table_schema = 'public';←[0m>>>

Inter-developer dialogue

IRIS didn't like being given SQL queries that end with a semicolon.

What to do now? ?

Idea: How about I tell LangChain to fix it for me

Cool. Lets do this !!

Test Two

from langchain import OpenAI, SQLDatabase, SQLDatabaseChain

from langchain.prompts.prompt import PromptTemplate

_DEFAULT_TEMPLATE = """Given an input question, first create a syntactically correct {dialect} query to run, then look at the results of the query and return the answer.

Use the following format:

Question: "Question here"
SQLQuery: "SQL Query to run"
SQLResult: "Result of the SQLQuery"
Answer: "Final answer here"

The SQL query should NOT end with semi-colon
Question: {input}"""

PROMPT = PromptTemplate(
     input_variables=["input", "dialect"], template=_DEFAULT_TEMPLATE
)

db = SQLDatabase.from_uri("iris://superuser:******@localhost:51775/USER") llm = OpenAI(temperature=0, verbose=True)

llm = OpenAI(temperature=0, verbose=True)

db_chain = SQLDatabaseChain(llm=llm, database=db, prompt=PROMPT, verbose=True) 

db_chain.run("How many Tables are there")

 

Result Two

SQLQuery:←[32;1m←[1;3mSELECT COUNT(*) FROM information_schema.tables←[0m
SQLResult: ←[33;1m←[1;3m[(499,)]←[0m
Answer:←[32;1m←[1;3mThere are 499 tables.←[0m
←[1m> Finished chain.←[0m
'There are 499 tables.'

I said it would be quick.

References:

https://walkingtree.tech/natural-language-to-query-your-sql-database-us…

https://python.langchain.com/en/latest/modules/chains/examples/sqlite.h…

https://python.langchain.com/en/latest/modules/agents/plan_and_execute…

Comments

Alex Woodhead  Jun 14, 2023 to Renan Lourenco

@Renan Lourenco and @Dmitry Maslennikov Another suggestion for sqlalchemy-iris would be to detect statements that end with "LIMIT #" and rewrite this to the beginning "SELECT TOP # ..... FROM

0
Dmitry Maslennikov  Jun 15, 2023 to Alex Woodhead

This part I can do, but, I don't think we should even consider it. This is part of IRIS SQL Dialect, and it's not good when some SQL queries may not work with different connections, e.g. using JDBC, or SMP.

0
Benjamin De Boe  Jun 16, 2023 to Dmitry Maslennikov

SQL Server and Sybase both also use TOP semantics rather than LIMIT/OFFSET. I think a decent Large Dialect Model should be able to handle that when passed the proper prompt. ;-)

This said, we have a backlog item to support LIMIT/OFFSET in IRIS SQL as well, but given that TOP is a common pattern as well, it doesn't have a high priority.

0
Benjamin De Boe · Jun 19, 2023

@Alex Woodhead @Renan Lourenco , I was trying to hook this up with a local LLM (GPT4All), but am having no luck. When I just run it by default, the {table_info} it feeds to the prompt makes the prompt too large for my (cheapo?) LLM. But when I try to make it only look in my application's schema by using the corresponding SQLDatabase constructor option, the SQLAlchemy driver tries to run a SET search_path = MySchemaName command, which is not supported and fails as well. Simply taking out the table info means it'll just try without schema names and that doesn't work for me either, unfortunately.

Is this anything you've run into and found a handy workaround for?

0
Renan Lourenco  Jul 6, 2023 to Benjamin De Boe

@Benjamin De Boe have you tried something like below? That worked for me (partially). The tables I'm dealing with have a fair number of fields so I was running into the too large prompt issue too. I had to give a break from exploring further

db = SQLDatabase.from_uri(connection_string, schema="HSAA",include_tables=['Episode','Patient']
0