Written by Lorenzo Scalese, Solution Architect at Zorgi · 9m read

Bringing Server-Sent Events to ObjectScript: Enabling AI Streaming

Introduction — The Problem with AI Streaming in ObjectScript

Today, I would like to introduce a problem I encountered and the solution I found when integrating AI APIs into an ObjectScript application. My initial tests were successful, yet somewhat frustrating.

The HTTP call worked; the request was properly sent to my LLM APIs. But then, silence... a long wait. Eventually, the entire response arrived as a single block.

Technically, it worked, but the user experience was disappointing compared to a ChatGPT session.

Modern models are designed to stream their output token by token. This makes the wait feel much shorter, since you can start reading the response before the full answer has been generated. To enable this behavior, you simply pass stream=true to the API. However, an essential detail lies behind this apparent simplicity: streaming relies on Server-Sent Events (SSE).

You cannot exploit this mode without client-side SSE support.

With the %Net.HttpRequest class, typically used in ObjectScript, the response is buffered until the connection is closed. In other words, there is no incremental reading, no progressive tokens, and therefore no streaming.

If we wish to integrate LLMs into an IRIS application, it is crucial to be able to handle a text/event-stream flow, parse events on the fly, and process data in real time.

That is precisely why I added client-side support for Server-Sent Events to the fast-http project, enabling an ObjectScript application to natively consume AI APIs in streaming mode.

Why Do LLM APIs Use Server-Sent Events Instead of WebSocket?

When you discover that APIs from OpenAI, Anthropic, or Mistral AI use Server-Sent Events for streaming, one question naturally arises:

Why SSE? Why not WebSocket?

At first glance, WebSocket seems to be the obvious choice for "real-time" communication. That would have been my initial thought. However, for LLM streaming, SSE offers several architectural advantages.

A Unidirectional Flow Is Sufficient

In LLM streaming, the client sends a request, and the server progressively generates a response. There is no need for continuous bidirectional communication.

The pattern is simple:

Client  →  HTTP Request
Server →  Continuous Data Stream

SSE is designed entirely for this unidirectional server-to-client model. WebSocket is inherently bidirectional, which adds unnecessary complexity for this specific use case.

SSE Remains Standard HTTP

SSE relies on HTTP/1.1 with a simple content-type:

Content-Type: text/event-stream

There are no specific requirements or protocol upgrades like with WebSockets. This means the following: 

  • Native compatibility with existing infrastructures
  • Better integration with proxies and load balancers
  • Simplicity on the server side

The streaming process remains a "simple" long-lived HTTP request.

SSE is, therefore, the ideal compromise.  The client does not need to send messages during generation; the server produces a sequential stream, and the connection naturally closes once the response is finished.

Anatomy of a Server-Sent Events Stream

An SSE stream is neither JSON nor traditional "chunked JSON."

It is a structured line-based text stream with a very simple but specific format.

The server responds with the following:

HTTP/1.1 200 OK
Content-Type: text/event-stream
Cache-Control: no-cache
Connection: keep-alive

Then, it keeps the connection open and sends a sequence of events.

Structure of an SSE Event

Each event consists of "key: value" lines, followed by a blank line.

A basic example:

data: Hello world

The empty line is critical; it signals to the client that the event is complete.

Standard Fields

An SSE event can contain several fields:

  • retry: Recommended reconnection delay
  • event: Custom event type
  • id: Event identifier
  • data: The main payload

An event can also be a simple comment. In this case, it starts with a colon (:), e.g.:

: this is a comment

Comments are often used as keep-alive mechanisms to maintain an open connection.
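The field rules above can be illustrated with a short, language-neutral sketch in Python (this is not fast-http code, just a demonstration of the wire format): comment lines are skipped, data lines accumulate, and the other fields are simply stored.

```python
def parse_sse_event(raw_event):
    """Parse one SSE event (the text between two blank lines) into a dict.

    Lines starting with ':' are comments (often keep-alives), 'data' lines
    accumulate and are joined with newlines, other fields are overwritten.
    """
    event = {"data": []}
    for line in raw_event.split("\n"):
        if not line or line.startswith(":"):
            continue  # blank line or comment: ignored by clients
        field, _, value = line.partition(":")
        value = value.lstrip(" ")  # one leading space is stripped per the spec
        if field == "data":
            event["data"].append(value)
        elif field in ("event", "id", "retry"):
            event[field] = value
    event["data"] = "\n".join(event["data"])
    return event

parsed = parse_sse_event("id: 42\nevent: message\ndata: Hello\ndata: world\n: keep-alive")
# parsed["data"] == "Hello\nworld", parsed["id"] == "42", parsed["event"] == "message"
```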

Real-world Case: Streaming an LLM API

For LLM APIs such as OpenAI's, a typical stream looks like the one below:

data: {"id":"...","choices":[{"delta":{"content":"Hel"}}]}

data: {"id":"...","choices":[{"delta":{"content":"lo"}}]}

data: {"id":"...","choices":[{"delta":{"content":" world"}}]}

data: [DONE]

Each data: block corresponds to a chunk of generation.

To correctly consume the SSE stream, we must do the following:

  • Read the response incrementally.
  • Accumulate received lines.
  • Detect event separators (empty lines).
  • Parse the data: content.

We should not wait for the connection to close before processing begins.

This is precisely where using the traditional %Net.HttpRequest becomes problematic: we must receive the full response before we can start processing.
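The consumption steps above can be sketched in a few lines of Python (an illustrative, language-neutral sketch, not the fast-http implementation): buffer incoming chunks, split on the blank-line separator, and process each data: payload as soon as it completes.

```python
import json

def iter_sse_data(chunks):
    """Incrementally consume network chunks and yield each 'data:' payload.

    Events are delimited by a blank line, so we buffer until '\n\n' appears,
    emit the complete events, and keep the remainder for the next chunk.
    """
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        while "\n\n" in buffer:
            raw_event, buffer = buffer.split("\n\n", 1)
            for line in raw_event.split("\n"):
                if line.startswith("data:"):
                    yield line[5:].lstrip(" ")

# Reassembling the OpenAI-style example stream, even when chunk
# boundaries fall in the middle of an event:
chunks = [
    'data: {"choices":[{"delta":{"content":"Hel"}}]}\n\n',
    'data: {"choices":[{"delta":{"content":"lo"}}]}\n\ndata: ',
    '{"choices":[{"delta":{"content":" world"}}]}\n\ndata: [DONE]\n\n',
]
text = ""
for payload in iter_sse_data(chunks):
    if payload == "[DONE]":
        break  # sentinel used by OpenAI-compatible APIs
    text += json.loads(payload)["choices"][0]["delta"]["content"]
# text == "Hello world"
```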

The Challenge in ObjectScript

With fast-http, we could easily make a chat completion call like the one below:

Set body = {
"model": "gpt-4",
"messages": [{"role": "user", "content": "Tell me a short story."}],
"stream": true
}
Set response = ##class(dc.http.FastHTTP).DirectPost("url=https://api.openai.com/v1/chat/completions,Header_Authorization=Bearer {YOUR_TOKEN},Header_Accept=text/event-stream", body, .client)

However, we would not be able to process the response until it was fully received.

Intercepting the Response Stream

To process events as they arrive, we need access to the data exactly when it is received. While exploring the possibilities offered by %Net.HttpRequest, I came across an interesting property: ResponseStream.

This property allows us to define the stream into which the HTTP response will be written. By default, IRIS creates a %Stream instance, so we typically never touch it and may even ignore its existence.

The idea is elementary: instead of letting %Net.HttpRequest write the response in a standard stream, we can supply a stream we control ourselves.

To do this, simply create a class inheriting from %Stream.GlobalBinary and override its Write and WriteLine methods:

/// Stream used to handle stream mode response
Class dc.http.Stream Extends %Stream.GlobalBinary
{

Property SSEHandler As dc.http.SSEHandler;

Method Write(data As %String = "") As %Status
{
    Do ..Notify(data)
    Return ##super(data)
}

Method WriteLine(data As %String = "") As %Status
{
    Do ..Notify(data _ ..LineTerminator)
    Return ##super(data)
}

Method Notify(data As %String) As %Status
{
    Return:'$IsObject(..SSEHandler) $$$OK
    Return ..SSEHandler.BufferProcessing($ZConvert(data, "I", "UTF8"))
}
}

We have our hook!
Every time %Net.HttpRequest receives data from the server, it is automatically written into ResponseStream.

By intercepting the Write and WriteLine methods, it becomes possible to do the following:

  • Analyze the received data.
  • Accumulate the stream's lines.
  • Detect the end of an SSE event.
  • Immediately trigger the corresponding processing.

In other words, we can transform a raw HTTP stream into a real-time, actionable event stream. We simply need to assign our custom stream before sending the request:

Set httpRequest = ##class(%Net.HttpRequest).%New()
Set httpRequest.ResponseStream = ##class(dc.http.Stream).%New()

From this moment on, every chunk received from the server passes through our stream, allowing us to build an SSE client in ObjectScript. Our next step is to correctly parse the text/event-stream events to reconstruct the data: blocks and trigger application callbacks.
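The interception idea itself is not specific to ObjectScript. As a rough Python analogy (illustrative only, mirroring the dc.http.Stream hook rather than reproducing it), a stream subclass can notify a handler on every write while still buffering the data normally:

```python
import io

class NotifyingStream(io.BytesIO):
    """In-memory stream that notifies a handler on every write.

    Same idea as overriding Write/WriteLine in dc.http.Stream: the data is
    still buffered (super().write), but the handler sees each chunk live.
    """
    def __init__(self, handler=None):
        super().__init__()
        self.handler = handler  # object with a buffer_processing(text) method

    def write(self, data):
        if self.handler is not None:
            self.handler.buffer_processing(data.decode("utf-8"))
        return super().write(data)  # keep the full response available too

class EchoHandler:
    def __init__(self):
        self.seen = []
    def buffer_processing(self, text):
        self.seen.append(text)

handler = EchoHandler()
stream = NotifyingStream(handler)
stream.write(b"data: Hel")   # each network chunk triggers the handler...
stream.write(b"lo\n\n")      # ...while the stream still holds everything
# handler.seen == ["data: Hel", "lo\n\n"]; stream.getvalue() == b"data: Hello\n\n"
```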

SSE Parser Architecture

Rather than implementing the parsing directly within the stream class, the SSE support was built around three main components:

  • A Stream that intercepts the received data.
  • A Handler that interprets the SSE events.
  • An Adapter that exposes these events to the application.

The goal is to separate responsibilities clearly:

  • The stream strictly deals with receiving network data.
  • The handler parses the SSE protocol, manages a buffer, splits messages on \n\n, and then calls the adapter.
  • The adapter transforms these events into usable callbacks within the application code.

This design yields a flexible and easily reusable implementation.

Users of fast-http no longer need to worry about parsing; they can choose a provided Adapter or implement a custom one.

diagram_class_sse.png

Code-wise, making a call to an endpoint that responds with events takes only a few lines:

Set stream = ##class(dc.http.SSEChatConsoleAdapter).GetStream()
Set config = "url=http://sse-mock:5000/stream,timeout=10"
Set response = ##class(dc.http.FastHTTP).DirectGet(config, , , stream)

If you have started the containers using the docker-compose.yml file, the example above will work with a hardcoded response from sse_server.py.

If you have a key or any other access to an OpenAI-compatible LLM API, you can adapt the following example for a v1/chat/completions call:

Set stream = ##class(dc.http.SSEChatConsoleAdapter).GetStream()
Set body = {"model": "gpt-4","messages": [{"role": "user", "content": "Tell me a short story."}],"stream": true}
Set response = ##class(dc.http.FastHTTP).DirectPost("url=https://api.openai.com/v1/chat/completions,Header_Authorization=Bearer {MyToken}", body, .client, stream)

For a better understanding of what happens at runtime, look at the sequence diagram below:

diagram_sequence_sse.png

The Adapters

At the time of writing, fast-http implements two adapters:

SSEBasicAdapter
This adapter displays only the parsed SSE event as soon as it is received. It is particularly useful for debugging or getting a grasp of how things work.

SSEChatConsoleAdapter
Tailored for chat completion sessions, it checks the received message type and immediately outputs the generated text if it is a chat.completion.chunk.

Obviously, we can always design other, more generic adapters, such as one that converts received SSE messages into outgoing WebSocket messages.

The flexibility really lies in the adapter; a developer can create custom adapters simply by inheriting from dc.http.SSEAdapter and implementing the OnMessage method.
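As a Python analogy of that extension point (illustrative only; the real base class is dc.http.SSEAdapter with an OnMessage method), a custom adapter is just a subclass that reacts to each parsed event:

```python
class SSEAdapterBase:
    """Base adapter: the handler calls on_message once per parsed SSE event."""
    def on_message(self, event):
        raise NotImplementedError

class CollectingAdapter(SSEAdapterBase):
    """Custom adapter: accumulates content deltas into the full answer."""
    def __init__(self):
        self.parts = []
    def on_message(self, event):
        self.parts.append(event.get("content", ""))
    def full_text(self):
        return "".join(self.parts)

adapter = CollectingAdapter()
# The handler would invoke on_message as each event is parsed:
for event in [{"content": "Hel"}, {"content": "lo"}, {"content": " world"}]:
    adapter.on_message(event)
# adapter.full_text() == "Hello world"
```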

Building a Chat Session in ObjectScript

With all these tools at our disposal, building a chat session in ObjectScript becomes straightforward. Here is a demonstration method (note: the $Char(27)... sequences are ANSI escape codes used to add colors in the terminal):

ClassMethod DemoSession(systemPrompt As %String = "You are a helpful assistant.", model As %String = "gpt-4") As %Status
{
    Set config = "url=https://api.openai.com/v1/chat/completions,Header_Authorization=Bearer {MyToken},Header_Accept=text/event-stream"
    Set chat = { "model": (model), "messages": [{"role": "system", "content": (systemPrompt)}], "stream": true }
    Write !,"Ask something to start a session.  Type ""exit"" to quit.",!

    Set stream = ##class(dc.http.SSEChatConsoleAdapter).GetStream()
    Set adapter = stream.SSEHandler.Adapter

    For  {
        Write !,!, $Char(27)_"[1;34mYour message: "_$Char(27)_"[0m"
        Read input
        Quit:input="exit"
        Write !,!, $Char(27)_"[1;32mResponse: "_$Char(27)_"[0m",!
        Do chat.messages.%Push({"role": "user", "content": (input)})
        Do ##class(dc.http.FastHTTP).DirectPost(config, chat, .client, stream)
        Do chat.messages.%Push(adapter.GetAssistantMessage())
    }
    Return $$$OK
}

Here is a short video of this script in action. Before recording, however, I changed the endpoint to point to a lightweight local LLM:


Conclusion

This work on fast-http was born from a simple frustration: when I began experimenting with LLM APIs in ObjectScript, I could clearly see that something was missing. There was a lot of code to write, and responses arrived all at once, whereas AI assistants fundamentally rely on token streaming.

By exploring how Server-Sent Events work and examining the possibilities offered by %Net.HttpRequest, I realized I could fill this gap with a few well-placed abstractions. This led to the implementation of SSE support in fast-http, resulting in a more complete architecture built around Streams, Handlers, and Adapters.

In the end, what started as a mere technical experiment turned into a more natural way to integrate AI APIs within an ObjectScript application. Seeing a chat session running directly in the IRIS terminal, with tokens appearing as they are generated, was quite satisfying.

I hope this work, now available to the community, will facilitate other developers' experiments with LLMs using IRIS and, perhaps, bring a little inspiration as well.