
Run Your AI Agent with InterSystems IRIS and Local Models using Ollama

In the previous article, we saw how to build a customer service AI agent with smolagents and InterSystems IRIS, combining SQL, RAG with vector search, and interoperability.

In that case, we used cloud models (OpenAI) for the LLM and embeddings.

This time, we’ll take it one step further: running the same agent, but with local models thanks to Ollama.

Why run models locally?

Using LLMs in the cloud is the simplest option to get started:

  • ✅ Models already optimized and maintained
  • ✅ Easy access through a simple API
  • ✅ Serverless: no need to worry about hardware or maintenance
  • ❌ Usage costs
  • ❌ Dependency on external services
  • ❌ Privacy restrictions when sending data

On the other hand, running models locally gives us:

  • ✅ Full control over data and environment
  • ✅ No variable usage costs
  • ✅ Possibility to fine-tune or adapt models with techniques such as LoRA (Low-Rank Adaptation), which trains small low-rank adapter weights on top of the base model to adapt it to your specific domain without retraining the entire model
  • ❌ Higher resource consumption on your server
  • ❌ Limitations on model size depending on your hardware

That’s where Ollama comes into play.


What is Ollama?

Ollama is a tool that makes it easy to run language models and embedding models on your own computer, with a very simple experience:

  • Download models with a single ollama pull command
  • Run them locally, exposed through an HTTP API
  • Integrate them directly into your applications, just as you would with OpenAI

In short: the same API you’d use in the cloud, but running on your laptop or server.
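
For example, here is a minimal sketch (not from the demo repo) of calling a local model through Ollama's OpenAI-compatible endpoint with the standard openai Python client; it assumes you have already pulled llama3.1:8b, as shown in the next section:

# Ollama exposes an OpenAI-compatible API under /v1, so the regular openai
# client works unchanged; only the base URL and model name differ.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",  # local Ollama endpoint
    api_key="ollama",                      # required by the client, ignored by Ollama
)

response = client.chat.completions.create(
    model="llama3.1:8b",
    messages=[{"role": "user", "content": "Say hello from a local model."}],
)
print(response.choices[0].message.content)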


Basic Ollama setup

First, install Ollama from its website and verify that it works:

ollama --version

Then, download a couple of models:

# Download an embeddings model
ollama pull nomic-embed-text:latest

# Download a language model
ollama pull llama3.1:8b

# See all available models
ollama list

You can test embeddings directly with curl:

curl http://localhost:11434/api/embeddings -d '{
  "model": "nomic-embed-text:latest",
  "prompt": "Ollama makes it easy to run LLMs locally."
}'
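
The same call from Python looks like this (a small sketch using the requests package); it is also a convenient way to confirm the vector length, which matters later when we size the IRIS columns:

import requests

# Ask the local nomic-embed-text model for an embedding.
resp = requests.post(
    "http://localhost:11434/api/embeddings",
    json={
        "model": "nomic-embed-text:latest",
        "prompt": "Ollama makes it easy to run LLMs locally.",
    },
)
resp.raise_for_status()
embedding = resp.json()["embedding"]
print(len(embedding))  # nomic-embed-text returns 768-dimensional vectors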

Using Ollama in the IRIS agent

The Customer Support Agent Demo repository already includes the configuration for Ollama. You just need to:

  1. Download the models you need in Ollama.
     I used nomic-embed-text for the vector search embeddings and devstral as the LLM.

  2. Configure IRIS to use Ollama embeddings with the local model:

INSERT INTO %Embedding.Config (Name, Configuration, EmbeddingClass, VectorLength, Description)
  VALUES ('ollama-nomic-config', 
          '{"apiBase":"http://host.docker.internal:11434/api/embeddings", 
            "modelName": "nomic-embed-text:latest"}',
          'Embedding.Ollama', 
          768,  
          'embedding model in Ollama');
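
As a quick sanity check (a sketch, not part of the demo repo), you can ask IRIS to embed a test string from Python through the DB-API driver; this assumes your IRIS version exposes the EMBEDDING() SQL function that accompanies %Embedding.Config, and the connection details are placeholders for your instance:

import iris  # DB-API driver from the intersystems-irispython package

# Placeholder connection details: host, superserver port, namespace, user, password.
conn = iris.connect("localhost", 1972, "USER", "_SYSTEM", "SYS")
cur = conn.cursor()

# Embed a test string through the 'ollama-nomic-config' configuration.
# If IRIS can reach Ollama, this should return a 768-dimensional vector.
cur.execute("SELECT EMBEDDING('test sentence', 'ollama-nomic-config')")
print(cur.fetchone()[0])

conn.close()
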
  3. Adjust the size of the vector columns in the sample tables (nomic-embed-text produces 768-dimensional vectors, while the original OpenAI embeddings have a different length):

ALTER TABLE Agent_Data.Products DROP COLUMN Embedding;
ALTER TABLE Agent_Data.Products ADD COLUMN Embedding VECTOR(FLOAT, 768);

ALTER TABLE Agent_Data.DocChunks DROP COLUMN Embedding;
ALTER TABLE Agent_Data.DocChunks ADD COLUMN Embedding VECTOR(FLOAT, 768);
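
Once the columns are 768-dimensional, a similarity query against them looks roughly like this (again a sketch: the ChunkText column name is illustrative, and the exact TO_VECTOR syntax can vary slightly between drivers and IRIS versions, e.g. the type may need to be quoted as 'FLOAT'):

import iris
import requests

def ollama_embed(text: str) -> str:
    # 768-dimensional embedding from the local model, as a comma-separated string.
    r = requests.post(
        "http://localhost:11434/api/embeddings",
        json={"model": "nomic-embed-text:latest", "prompt": text},
    )
    r.raise_for_status()
    return ",".join(str(x) for x in r.json()["embedding"])

conn = iris.connect("localhost", 1972, "USER", "_SYSTEM", "SYS")  # placeholder credentials
cur = conn.cursor()
cur.execute(
    """
    SELECT TOP 3 ChunkText
    FROM Agent_Data.DocChunks
    ORDER BY VECTOR_COSINE(Embedding, TO_VECTOR(?, FLOAT, 768)) DESC
    """,
    [ollama_embed("What is the return period?")],
)
for (chunk,) in cur.fetchall():
    print(chunk)
conn.close()
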
  4. Configure the .env file to specify the models we want to use:

OPENAI_MODEL=devstral:24b-small-2505-q4_K_M
OPENAI_API_BASE=http://localhost:11434/v1
EMBEDDING_CONFIG_NAME=ollama-nomic-config
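
Inside the agent, these variables end up configuring an OpenAI-compatible client pointed at Ollama. Conceptually (the demo repo does this wiring for you, and the exact class depends on your smolagents version), it looks something like this:

import os
from smolagents import OpenAIServerModel

# devstral served by Ollama through its OpenAI-compatible /v1 endpoint.
model = OpenAIServerModel(
    model_id=os.environ["OPENAI_MODEL"],                  # devstral:24b-small-2505-q4_K_M
    api_base=os.environ["OPENAI_API_BASE"],               # http://localhost:11434/v1
    api_key=os.environ.get("OPENAI_API_KEY", "ollama"),   # Ollama ignores the key
)
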
  5. Update the embeddings

Since the embedding model is different from the original one, we need to regenerate the stored embeddings with the local nomic-embed-text:

python scripts/embed_sql.py
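
The script in the repo handles this for you; conceptually, re-embedding boils down to something like the following sketch (not the actual scripts/embed_sql.py, and the Description column name is illustrative):

import iris
import requests

def ollama_embed(text: str) -> str:
    # 768-dimensional embedding from nomic-embed-text, as a comma-separated string.
    r = requests.post(
        "http://localhost:11434/api/embeddings",
        json={"model": "nomic-embed-text:latest", "prompt": text},
    )
    r.raise_for_status()
    return ",".join(str(x) for x in r.json()["embedding"])

conn = iris.connect("localhost", 1972, "USER", "_SYSTEM", "SYS")  # placeholder credentials
cur = conn.cursor()
cur.execute("SELECT ID, Description FROM Agent_Data.Products")
for row_id, description in cur.fetchall():
    cur.execute(
        "UPDATE Agent_Data.Products SET Embedding = TO_VECTOR(?, FLOAT, 768) WHERE ID = ?",
        [ollama_embed(description), row_id],
    )
conn.commit()
conn.close()
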
  6. Run the agent so that it uses the new configuration

The agent will now use this configuration, so both the embeddings and the LLM are served from the local Ollama endpoint.

With this configuration, you can ask questions such as:

  • “Where is my order #1001?”
  • “What is the return period?”

And the agent will use:

  • IRIS SQL for structured data
  • Vector search with Ollama embeddings (local)
  • Interoperability to simulate external API calls
  • A local LLM to plan and generate code that calls the necessary tools to obtain the answer

Conclusion

Thanks to Ollama, we can run our Customer Support Agent with IRIS without relying on the cloud:

  • Privacy and control of data
  • Zero cost per token
  • Total flexibility to test and adapt models (LoRA)

The challenge? You need a machine with enough memory and CPU/GPU to run large models. But for prototypes and testing, it’s a very powerful and practical option.


