LangChain fixed the SQL for me
This article is a simple quick starter (what I did was) with SqlDatabaseChain.
Hope this ignites some interest.
Many thanks to:
sqlalchemy-iris author @Dmitry Maslennikov
Your project made this possible today.
The article script uses openai API so caution not to share table information and records externally, that you didn't intend to.
A local model could be plugged in , instead if needed.
Creating a new virtual environment
mkdir chainsql cd chainsql python -m venv . scripts\activate pip install langchain pip install wget # Need to connect to IRIS so installing a fresh python driver python -c "import wget;url='https://raw.githubusercontent.com/intersystems-community/iris-driver-distribution/main/DB-API/intersystems_irispython-3.2.0-py3-none-any.whl';wget.download(url)" # And for more magic pip install sqlalchemy-iris pip install openai set OPENAI_API_KEY=[ Your OpenAI Key ] python
Initial Test
from langchain import OpenAI, SQLDatabase, SQLDatabaseChain
db = SQLDatabase.from_uri("iris://superuser:******@localhost:51775/USER")
llm = OpenAI(temperature=0, verbose=True)
db_chain = SQLDatabaseChain.from_llm(llm, db, verbose=True)
db_chain.run("How many Tables are there")Result error
sqlalchemy.exc.DatabaseError: (intersystems_iris.dbapi._DBAPI.DatabaseError) [SQLCODE: <-25>:<Input encountered after end of query>] [Location: <Prepare>] [%msg: < Input (;) encountered after end of query^SELECT COUNT ( * ) FROM information_schema . tables WHERE table_schema = :%qpar(1) ;>] [SQL: SELECT COUNT(*) FROM information_schema.tables WHERE table_schema = 'public';] (Background on this error at: https://sqlalche.me/e/20/4xp6) ←[32;1m←[1;3mSELECT COUNT(*) FROM information_schema.tables WHERE table_schema = 'public';←[0m>>>
Inter-developer dialogue
IRIS didn't like being given SQL queries that end with a semicolon.
What to do now? ?
Idea: How about I tell LangChain to fix it for me
Cool. Lets do this !!
Test Two
from langchain import OpenAI, SQLDatabase, SQLDatabaseChain
from langchain.prompts.prompt import PromptTemplate
_DEFAULT_TEMPLATE = """Given an input question, first create a syntactically correct {dialect} query to run, then look at the results of the query and return the answer.
Use the following format:
Question: "Question here"
SQLQuery: "SQL Query to run"
SQLResult: "Result of the SQLQuery"
Answer: "Final answer here"
The SQL query should NOT end with semi-colon
Question: {input}"""
PROMPT = PromptTemplate(
input_variables=["input", "dialect"], template=_DEFAULT_TEMPLATE
)
db = SQLDatabase.from_uri("iris://superuser:******@localhost:51775/USER") llm = OpenAI(temperature=0, verbose=True)
llm = OpenAI(temperature=0, verbose=True)
db_chain = SQLDatabaseChain(llm=llm, database=db, prompt=PROMPT, verbose=True)
db_chain.run("How many Tables are there")
Result Two
SQLQuery:←[32;1m←[1;3mSELECT COUNT(*) FROM information_schema.tables←[0m SQLResult: ←[33;1m←[1;3m[(499,)]←[0m Answer:←[32;1m←[1;3mThere are 499 tables.←[0m ←[1m> Finished chain.←[0m 'There are 499 tables.'
I said it would be quick.
References:
https://walkingtree.tech/natural-language-to-query-your-sql-database-us…
https://python.langchain.com/en/latest/modules/chains/examples/sqlite.h…
https://python.langchain.com/en/latest/modules/agents/plan_and_execute…
Comments
@Alex Woodhead, please to say that a fix was introduced to sqlalchemy-iris removing trailing semi-colons
Thanks @Dmitry Maslennikov
@Renan Lourenco and @Dmitry Maslennikov Another suggestion for sqlalchemy-iris would be to detect statements that end with "LIMIT #" and rewrite this to the beginning "SELECT TOP # ..... FROM
This part I can do, but, I don't think we should even consider it. This is part of IRIS SQL Dialect, and it's not good when some SQL queries may not work with different connections, e.g. using JDBC, or SMP.
SQL Server and Sybase both also use TOP semantics rather than LIMIT/OFFSET. I think a decent Large Dialect Model should be able to handle that when passed the proper prompt. ;-)
This said, we have a backlog item to support LIMIT/OFFSET in IRIS SQL as well, but given that TOP is a common pattern as well, it doesn't have a high priority.
@Alex Woodhead @Renan Lourenco , I was trying to hook this up with a local LLM (GPT4All), but am having no luck. When I just run it by default, the {table_info} it feeds to the prompt makes the prompt too large for my (cheapo?) LLM. But when I try to make it only look in my application's schema by using the corresponding SQLDatabase constructor option, the SQLAlchemy driver tries to run a SET search_path = MySchemaName command, which is not supported and fails as well. Simply taking out the table info means it'll just try without schema names and that doesn't work for me either, unfortunately.
Is this anything you've run into and found a handy workaround for?
@Benjamin De Boe have you tried something like below? That worked for me (partially). The tables I'm dealing with have a fair number of fields so I was running into the too large prompt issue too. I had to give a break from exploring further
db = SQLDatabase.from_uri(connection_string, schema="HSAA",include_tables=['Episode','Patient']💡 This article is considered as InterSystems Data Platform Best Practice.