

Using SQL Gateway with Python, Vector Search, and Interoperability in InterSystems IRIS

Part 2 – Python and Vector Search

 

Since we have access to the data in our external table, we can use everything that IRIS has to offer with it. For example, let's read the data from the external table and generate a polynomial regression with it.

For more information on using Python with IRIS, see the documentation available at https://docs.intersystems.com/irislatest/csp/docbook/DocBook.UI.Page.cls?KEY=AFL_epython

Let's now consume the data from the external database to calculate a polynomial regression. To do this, we will use Python code to run an SQL query that reads our MySQL table and turns the result into a pandas DataFrame:

ClassMethod CalcularRegressaoPolinomialODBC() As %String [ Language = python ]
{
    import iris
    import json
    import pandas as pd
    from sklearn.preprocessing import PolynomialFeatures
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import mean_absolute_error
    import numpy as np
    import matplotlib
    matplotlib.use("Agg")
    import matplotlib.pyplot as plt

    # Set degree 2 for the regression
    grau = 2
    
    # Retrieve data from the remote table via ODBC
    rs = iris.sql.exec("select venda as x, temperatura as y from estat.fabrica")
    df = rs.dataframe()
   
    # Reshape x into the 2D matrix required by scikit-learn
    X = df[['x']]
    y = df['y']
    
    # Transform to include the polynomial terms
    poly = PolynomialFeatures(degree=grau)
    X_poly = poly.fit_transform(X)

    # Initialize and fit the polynomial regression model
    model = LinearRegression()
    model.fit(X_poly, y)

    # Extract the coefficients of the fitted model
    # (coef_[0] is ~0 because PolynomialFeatures adds a bias column,
    # which LinearRegression's separate intercept already accounts for)
    coeficientes = model.coef_.tolist()         # Polynomial coefficients
    intercepto = float(model.intercept_)        # Intercept (cast for JSON serialization)
    r_quadrado = float(model.score(X_poly, y))  # R squared

    # Prediction for the regression curve
    x_pred = np.linspace(df['x'].min(), df['x'].max(), 100).reshape(-1, 1) 
    x_pred_poly = poly.transform(x_pred)
    y_pred = model.predict(x_pred_poly)
    
    # Compute Y_pred based on X
    Y_pred = model.predict(X_poly)
            
    # Compute the MAE (mean absolute error)
    MAE = float(mean_absolute_error(y, Y_pred))
    
    # Generate the regression chart
    plt.figure(figsize=(8, 6))
    plt.scatter(df['x'], df['y'], color='blue', label='Original Data')
    plt.plot(df['x'], df['y'], color='black', label='Original Data Line')
    plt.scatter(df['x'], Y_pred, color='green', label='Predicted Data')
    plt.plot(x_pred, y_pred, color='red', label='Polynomial Regression Curve')
    plt.title(f'Polynomial Regression (Degree {grau})')
    plt.xlabel('X')
    plt.ylabel('Y')
    plt.legend()
    plt.grid(True)

    # Save the chart as an image
    caminho_arquivo = 'c:\\temp\\RegressaoPolinomialODBC.png'
    plt.savefig(caminho_arquivo, dpi=300, bbox_inches='tight')
    plt.close()
    
    resultado = {
        'coeficientes': coeficientes,
        'intercepto': intercepto,
        'r_quadrado': r_quadrado,
        'MAE': MAE
    }

    return json.dumps(resultado)
}


The first thing the code does is read the data from our external table via SQL and turn it into a pandas DataFrame. Keep in mind that the data is physically stored in MySQL and is accessed via ODBC through the SQL Gateway configured in IRIS. With that in place, we can use the Python libraries for calculation and charting, as we see in the code.

Executing our routine, we get the generated model's information (coefficients, intercept, R squared, and MAE) as JSON:
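As a quick sketch of how to run it, assuming the ClassMethod above lives in a class named dc.odbc.Regressao (a hypothetical name; adjust it to your own class), you could execute the following in an IRIS terminal:

set resultado = ##class(dc.odbc.Regressao).CalcularRegressaoPolinomialODBC()
write resultado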

Our routine also generates a chart that gives us visual support for the polynomial regression. Let's see how the chart turned out:

Another thing we can do with the data now available is use Vector Search and RAG (Retrieval-Augmented Generation) together with an LLM. To do this, we will vectorize the model of our table and then ask the LLM for some information based on it.

For more information on using Vector Search in IRIS, see https://www.intersystems.com/vectorsearch/

First, let's vectorize our table model. Below is the code that carries out this task:

ClassMethod IngestRelacional() As %String [ Language = python ]
{
    import json
    from langchain_iris import IRISVector
    from langchain_openai import OpenAIEmbeddings
    from langchain_text_splitters import RecursiveCharacterTextSplitter
    import iris    

    try:
        apiKey = iris.cls("Vector.Util").apikey()
        collectionName = "odbc_work"

        metadados_tabelas = [
            "Tabela: estat.fabrica; Colunas: chave(INT), venda(INT), temperatura(INT)"
        ]
        
        text_splitter = RecursiveCharacterTextSplitter(chunk_size=2048, chunk_overlap=0)
        documents = text_splitter.create_documents(metadados_tabelas)
            
        # Vectorize the table definitions
        vectorstore = IRISVector.from_documents(
            documents=documents,
            embedding=OpenAIEmbeddings(openai_api_key=apiKey),
            dimension=1536,
            collection_name=collectionName
        )
        
        return json.dumps({"status": True})
    except Exception as err:
        return json.dumps({"error": str(err)})
}

Note that we don't pass the contents of the table to the ingest code, but rather its model. This way, upon receiving the columns and their properties, the LLM is able to write an SQL statement that matches our request.

This ingest code creates the odbc_work table, which will be used to search for similarity in the table model so we can then ask the LLM to return an SQL statement. For this, we created an API key at OpenAI and used langchain_iris as the Python library. For more details about langchain_iris, see https://github.com/caretdev/langchain-iris
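To illustrate what comes next, here is a minimal sketch of how the vectorized model could be retrieved by similarity and handed to the LLM to produce an SQL statement. The method name, prompt wording, and model choice are hypothetical, and the IRISVector constructor parameters follow the langchain_iris conventions; treat it as an outline rather than the final RAG code, which we will build in the next part:

ClassMethod SqlFromQuestion(pergunta As %String) As %String [ Language = python ]
{
    from langchain_iris import IRISVector
    from langchain_openai import OpenAIEmbeddings, ChatOpenAI
    import iris

    apiKey = iris.cls("Vector.Util").apikey()

    # Connect to the existing odbc_work collection created by IngestRelacional
    vectorstore = IRISVector(
        embedding_function=OpenAIEmbeddings(openai_api_key=apiKey),
        dimension=1536,
        collection_name="odbc_work"
    )

    # Retrieve the table definition most similar to the question
    docs = vectorstore.similarity_search(pergunta, k=1)
    contexto = docs[0].page_content

    # Ask the LLM to write an SQL statement based on the table model
    llm = ChatOpenAI(openai_api_key=apiKey, model="gpt-4o-mini")
    prompt = (f"Given this table definition: {contexto}\n"
              f"Write an SQL query that answers: {pergunta}\n"
              "Return only the SQL statement.")
    return llm.invoke(prompt).content
}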

After ingesting the definition of our table, the odbc_work table is generated:
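If you want to double-check the ingestion, one quick way (assuming langchain_iris created the collection table under the default SQLUser schema) is to query it from an embedded Python session, for example via Do $SYSTEM.Python.Shell() in the terminal:

import iris
# Assumes the collection table was created as SQLUser.odbc_work
rs = iris.sql.exec("SELECT TOP 5 * FROM SQLUser.odbc_work")
for row in rs:
    print(row)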

Now let's move on to the third part: building a REST API that consumes the vectorized data to assemble a RAG.

See you later!
