
AI Agents from Scratch Part 1: Forging the Brain


Some concepts make perfect sense on paper, whereas others require you to get your hands dirty.
Take driving, for example. You can memorize how every component of the engine works, but that does not mean you can actually drive.

You cannot truly grasp it until you are in the driver's seat, physically feeling the friction point of the clutch and the vibration of the road beneath.
While some computing concepts are intuitive, Intelligent Agents are different. To understand them, you have to get in the driver's seat.

In my previous articles about AI agents, we discussed tools such as CrewAI and LangGraph. In this guide, however, we are going to build an AI agent micro-framework from scratch. Writing an agent goes beyond mere syntax; it is a journey every developer should undertake to try to solve real-world problems.

Still, beyond the experience itself, there is another fundamental reason to do this, best summarized by Richard Feynman:

"What I cannot create, I do not understand."

So… What Is an AI Agent?

Let's be specific. An agent is essentially code that pursues a goal. It does not just chat. It “reads the room” and executes various tasks, ranging from sorting emails to managing complex schedules.

Unlike rigid scripts, agents possess agency. Conventional scripts break the moment reality deviates from hard-coded rules. Agents do not. They adapt. If a flight is cancelled, they do not crash with an error; they simply reroute.

I like visualizing the architecture as a biological system:

  • The Brain: The LLM. It provides the reasoning, but on its own it can only think, not act.
  • The Hands: The Tools. Without execution environments or APIs, the brain is trapped in a jar.
  • The Nervous System: Your orchestration layer. It manages state and logs memory.
  • The Body: The deployment infrastructure that ensures reliability and uptime.

A single agent might be impressive, but Agentic AI is where the real power lies.
It is a whole system where multiple specialized agents collaborate to achieve a shared objective.

It is essentially a digital agency: you have one agent conducting research, another one drafting copy, and a 'manager' node ensuring no one steps on anyone else's toes.

Honestly, though, theory can only get us so far, and I am itching to actually build this thing.
Let’s get our hands dirty. I have named this project MAIS, which serves a dual purpose: technically, it stands for Multi-Agent Interoperability Systems. However, in Portuguese, it simply means 'plus.' It is a nod to our constant search for extra capabilities.

The Brain: Agnostic Intelligence with LiteLLM

To power our agents, we need flexibility, but hardcoding a specific provider like OpenAI limits us. What if we want to test Gemini 3.0? What if a client prefers to run Llama 3 locally via Ollama?

To accomplish that, I prefer to rely on a library that has become a staple in The Musketeers' projects: LiteLLM.

The beauty of LiteLLM lies in its standardization. It acts as a universal adapter, normalizing requests and responses across more than 100 providers.
This abstraction is crucial for a Multi-Agent System because it allows us to mix and match models based on the agent's specific needs.
Let’s imagine the following scenario:

  • The first agent uses a fast, cost-effective model (e.g., gpt-4o-mini).
  • The second agent utilizes a model with high reasoning capabilities and a large context window (e.g., claude-3-5-sonnet) to analyze complex data.

With our architecture, we can define which model an agent will work with simply by changing a string in the settings.
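
To make the idea tangible, here is a minimal stand-alone sketch of what LiteLLM's unified interface looks like. The model strings below are just examples (and assume the corresponding API keys are configured); swap in whatever your providers expose:

import litellm

# The call shape never changes; only the model string does.
fast = litellm.completion(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Summarize this support ticket in one sentence."}]
)

deep = litellm.completion(
    model="claude-3-5-sonnet-20240620",
    messages=[{"role": "user", "content": "Analyze this contract clause for risks."}]
)

print(fast.choices[0].message.content)
print(deep.choices[0].message.content)

That uniformity is exactly what our Adapter builds on.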

Security First: Handling API Keys

Connecting to these providers requires API Keys, and we certainly do not want to hardcode secrets in our source code.
The "InterSystems’ way" to handle this is via Production Credentials.

To ensure our keys remain protected, the LLM Adapter acts as a bridge to IRIS's secure credential storage.
We utilize a property named APIKeysConfig to manage the handover.
You should populate it with the provider-specific key names required by LiteLLM (e.g., OPENAI_API_KEY, AZURE_API_KEY), separated by commas.

When the adapter gets initialized, it pulls the actual secrets from the secure storage and assigns them as environment variables, allowing LiteLLM to authenticate without ever exposing raw keys in the code:

Method OnInit() As %Status
{
    Set tSC = $$$OK
    Try {
        Do ..Initialize()
    } Catch e {
        Set tSC = e.AsStatus()
    }
    Quit tSC
}

/// Configure API Keys in Python Environment
Method Initialize() [ Language = python ]
{
    import os
    import iris

    # Pull each configured credential from the IRIS secure storage and expose it
    # as an environment variable so LiteLLM can authenticate with the provider
    for tKeyName in self.APIKeysConfig.split(','):
        tKeyName = tKeyName.strip()
        credential = iris.cls("Ens.Config.Credentials")._OpenId(tKeyName)
        if not credential:
            continue
        os.environ[tKeyName] = credential.PasswordGet()
}
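
For example, if your production talks to OpenAI and Azure (and you have created credentials with matching IDs), the setting would simply read:

OPENAI_API_KEY,AZURE_API_KEY

Each entry plays a double role: it is the ID of the Ens.Config.Credentials record the adapter opens, and the name of the environment variable LiteLLM expects.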

Now that the security layer is in place, let’s focus on the core reasoning of our Adapter.
This is where we define which model to call and how to structure the message.
The Adapter has a configuration property that determines the default model:

/// Default model to use if not specified in request
Property DefaultModel As %String [ InitialExpression = "gpt-4o-mini" ]; 

However, the magic happens at runtime. The input message dc.mais.messages.LLMRequest has an optional Model property. If the orchestrator (the BPL) sends the request with this property filled in, the Adapter respects that dynamic choice.
Otherwise, it falls back to DefaultModel.
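
As a quick illustration (a hypothetical caller snippet, not part of the adapter itself), steering the choice is just a matter of filling in that property before the request is sent:

import iris

# Hypothetical example: pin a specific model for this single request
request = iris.cls("dc.mais.messages.LLMRequest")._New()
request.Model = "claude-3-5-sonnet"  # leave this empty and the Adapter uses DefaultModel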

Separating Instructions from Input

Another important design decision was how we send text to the LLM. Instead of sending just a raw string, I split the concept into two fields in the request:

  1. Content: This is where the “System Prompt” or the current Agent’s instructions go (e.g., “You are a waiter who is an expert in wines…”).
  2. UserContent: This is where the user’s actual input goes (e.g., “Which wine pairs well with fish?”).

This split allows us to build a clean messages array for LiteLLM, ensuring that the AI can clearly distinguish its persona from the user's question.
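
Conceptually, the mapping is tiny; this is a preview of exactly what the full method below does with the two fields:

# Instructions go in first, followed by the user's actual input
messages = [{"role": "user", "content": pRequest.Content}]
if pRequest.UserContent:
    messages.append({"role": "user", "content": pRequest.UserContent})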

Here is how the main CallLiteLLM method assembles this puzzle using Python directly within IRIS:


Method CallLiteLLM(pRequest As dc.mais.messages.LLMRequest) As dc.mais.messages.LLMResponse [ Language = python ]
{
    import litellm
    import json
    import time
    import iris

    t_attempt = 0
    max_retries = self.MaxRetries
    retry_delay = self.RetryDelay
    last_error = None
    pResponse = iris.cls("dc.mais.messages.LLMResponse")._New()

    while t_attempt <= max_retries:
        t_attempt += 1
        try:
            # Resolve the model: honor the dynamic choice in the request, otherwise fall back to the default
            model = pRequest.Model
            if not model:
                model = self.GetDefaultModel()

            # Build the messages array: agent instructions first, then the user's input
            messages = [{"role": "user", "content": pRequest.Content}]
            if pRequest.UserContent:
                messages.append({"role": "user", "content": pRequest.UserContent})

            response = litellm.completion(model=model, messages=messages)
            pResponse.Model = response.model

            # Serialize the choices so they can travel inside the IRIS response message
            choices_list = []
            if hasattr(response, 'choices'):
                for choice in response.choices:
                    if hasattr(choice, 'model_dump'):
                        choices_list.append(choice.model_dump())
                    elif hasattr(choice, 'dict'):
                        choices_list.append(choice.dict())
                    else:
                        choices_list.append(dict(choice))
            pResponse.Choices = json.dumps(choices_list)

            if len(response.choices) > 0:
                pResponse.Content = response.choices[0].message.content

            # Capture token usage and any provider-reported error
            if hasattr(response, 'usage'):
                if hasattr(response.usage, 'model_dump'):
                    pResponse.Usage = json.dumps(response.usage.model_dump())
                else:
                    pResponse.Usage = json.dumps(dict(response.usage))
            if hasattr(response, 'error') and response.error:
                pResponse.Error = json.dumps(dict(response.error))

            return pResponse
        except Exception as e:
            last_error = str(e)
            class_name = "dc.mais.adapter.LiteLLM"
            iris.cls("Ens.Util.Log").LogError(class_name, "CallLiteLLM", f"LiteLLM call attempt {t_attempt} failed: {last_error}")
            if t_attempt > max_retries:
                break
            time.sleep(retry_delay)

    # All attempts failed: surface the last error in the response
    error_payload = {
        "message": "All LiteLLM call attempts failed",
        "details": last_error
    }
    pResponse.Error = json.dumps(error_payload)
    return pResponse
}

For those who prefer the classic syntax, I also included an ObjectScript version of the same method:

Method CallLiteLLMObjectScript(pRequest As dc.mais.messages.LLMRequest, Output pResponse As dc.mais.messages.LLMResponse) As %Status
{
    Set tSC = $$$OK
    Set tAttempt = 0
    Set pResponse = ##class(dc.mais.messages.LLMResponse).%New()

    While tAttempt <= ..MaxRetries {
        Set tAttempt = tAttempt + 1

        Try {
            Set model = $Select(pRequest.Model '= "": pRequest.Model, 1: ..GetDefaultModel())

            // Prepare History (guard against an empty string, which %FromJSON cannot parse)
            Set jsonHistory = []
            If (pRequest.History '= "") {
                Set jsonHistory = [].%FromJSON(pRequest.History)
            }

            // Inject Tool Output if present (Close the Loop logic)
            If (pRequest.ToolCallId '= "") && (pRequest.ToolOutput '= "") {
                Set tToolMsg = {}
                Set tToolMsg.role = "tool"
                Set tToolMsg.content = pRequest.ToolOutput
                Set tToolMsg."tool_call_id" = pRequest.ToolCallId
                Do jsonHistory.%Push(tToolMsg)
            }

            // Add current user content/prompt only if not empty
            If (pRequest.Content '= "") {
                Do jsonHistory.%Push({"role": "user", "content": (pRequest.Content)})
            }

            // Add extra user content field if present
            If (pRequest.UserContent'="") {
                Do jsonHistory.%Push({"role": "user", "content": (pRequest.UserContent)})
            }

            Set strMessages = jsonHistory.%ToJSON()

            // Call Python Helper
            Set tResponse = ..PyCompletion(model, strMessages, pRequest.Parameter, 1)

            // Map Response
            Set pResponse.Model = tResponse.model

            // Convert Python choices to IRIS DynamicArray
            Set choices = []
            For i=0:1:tResponse.choices."__len__"()-1 {
                Do choices.%Push({}.%FromJSON(tResponse.choices."__getitem__"(i)."to_json"()))
            }
            Set pResponse.Choices = choices.%ToJSON()

            // Process the last choice
            If (choices.%Size()>0) {
                Set choice = choices.%Get(choices.%Size()-1)

                If ($IsObject(choice.message)){
                    Set pResponse.Content = choice.message.content

                    // Extract Tool Calls
                    // Check if 'tool_calls' exists and is a valid Object (DynamicArray)
                    Set tToolCalls = choice.message."tool_calls"

                    // Verify it is an Object (Array) and not empty string
                    If $IsObject(tToolCalls) {
                        Do ..GetToolCalls(tToolCalls, .pResponse)
                    }
                }
            }

            // Map Usage
            If ..hasattr(tResponse, "usage") {
                Set pResponse.Usage = {}.%FromJSON(tResponse.usage."to_json"()).%ToJSON()
            }

            // Success - exit the method (an argumentless Quit here would only exit the Try block)
            Return tSC

        } Catch e {
            Set tSC = e.AsStatus()
            $$$LOGERROR("LiteLLM call attempt "_tAttempt_" failed: "_$System.Status.GetOneErrorText(tSC))
            If tAttempt > ..MaxRetries Quit
            Hang ..RetryDelay
        }
    }

    If ($$$ISERR(tSC)) {
        Set pResponse.Error = {
            "message": "All LiteLLM call attempts failed",
            "details": ($System.Status.GetOneErrorText(tSC))
        }.%ToJSON()
    }
    Quit tSC
}

You might have noticed the call to ..PyCompletion(...) inside the ObjectScript version of this logic.
It is not a standard system method, but a custom helper designed to handle data marshaling between the two languages.
While IRIS allows direct calls to Python, passing complex nested structures (e.g., lists of objects containing specific data types) can sometimes require manual conversion.
The PyCompletion method acts as a translation layer. It accepts the data from ObjectScript as serialized JSON strings. Then it deserializes them into native Python dictionaries and lists (using json.loads) inside the Python environment. Finally, it executes the LiteLLM request.
This "Hybrid" approach keeps our ObjectScript code clean and readable, focusing purely on business logic (looping, history management), while offloading the heavy lifting of data type conversion and library interaction to a small, dedicated Python wrapper.

This simple structure gives us tremendous control. While BPL can swap out the brain of the operation (the Model) or the personality (Content) dynamically at each step of the flow, the Adapter takes care of the technical “plumbing.”

The Stage is Set, but It is Empty

We have covered a lot of ground so far. We have built a secure, provider-agnostic bridge to the LLM using Python and LiteLLM, worked through the tricky ObjectScript-to-Python data marshaling (**kwargs included), and established a secure way to handle credentials with IRIS's native credential storage.

However, if you look closely, you will see that we have a beautiful car with a powerful engine, but no driver.

We have established the link to the 'Brain,' but it lacks a defined persona. We can invoke GPT, but without specific instructions, it does not know whether it should act as a helpful Greeter or a technical Support Engineer. It is currently just a stateless processor, devoid of memory, lacking a goal, and disconnected from any tools.

In Part 2, we will give this brain a Soul. We will do the following:

  1. Build the dc.mais.adapter.Agent class to define personas.
  2. Master Dynamic Prompt Engineering to enforce business rules.
  3. Implement the "Allow List" logic for agent-to-agent communication.
  4. Dive into the ReAct Paradigm theory that makes agents truly smart.

Did I overcomplicate the adapter? Do you have a cleaner way to handle the environment variables?
If so, or if you spot a flaw in my logic before we get to Part 2, call it out in the comments below! I am writing this to learn from you as much as to share.

Acknowledgments: A special thanks to my fellow Musketeer, @José Pereira, who introduced me to the wonders of LiteLLM.

Stay tuned. We are just getting started.
