
Vector Embeddings Feedback

Background

Embeddings are a new IRIS feature that powers AI semantic search.
They present as a new kind of column on a table that holds vector data.
The embedding column supports search over another existing column of the same table.
As records are added to or updated in the table, the supported column is passed through an AI model and the semantic signature is returned.
This signature information is stored as the vector for future search comparison.
Subsequently, when a search runs, the stored signatures are compared without any further AI model processing overhead.

Embedding search is like having a future-proof categorization capability, without manually adding new categories to existing data or labeling records.

i.e., "Show me others like this one."
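In practice, a semantic search reduces to plain SQL over the stored vectors. A sketch via embedded Python, with hypothetical table and column names, assuming the VECTOR_COSINE SQL function:

import iris

# "Show me others like this one": rank rows by cosine similarity of their
# stored embedding against the embedding of one reference row (ID 42 here).
rows = iris.sql.exec(
    "SELECT TOP 5 ID, Title FROM TOOT_Data.Track "
    "ORDER BY VECTOR_COSINE(Embedding, "
    "    (SELECT Embedding FROM TOOT_Data.Track WHERE ID = ?)) DESC",
    42)
for row in rows:
    print(row)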

The following ideas, questions, and statements are limited, but the hope is they provide a starting point for discussion and direction.

Class compilation deployment dependency

Before a class with an embedding column can be compiled, a named embedding config must already have been deployed.
This could affect patching; for example, change-control records or other deployments would need a pre-insert-config step before loading new class versions.

Clarification: if the configuration changes for a given embedding config name, do the classes that use it need to be recompiled? i.e., is there any generator behavior (ObjectScript or Python) to be aware of? If there is no actual compilation dependency, a warning might be more flexible than an error.
Also expecting that it is supported to update the config after the class is compiled.
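For illustration, an in-place config update might look like the following. This is a sketch via embedded Python, assuming %Embedding.Config rows are updatable like any projected persistent class; the new modelPath is hypothetical:

import iris

# A sketch: update the named config in place; class definitions that
# reference "toot-v2-config" by name would remain unaltered.
iris.sql.exec(
    "UPDATE %Embedding.Config SET Configuration = ? WHERE Name = ?",
    '{"modelName": "toot-v2-config", "modelPath": "/opt/hub/toot-v3/", '
    '"tokenizerPath": "/opt/hub/toot-v3/tokenizer_tune.json", "HotStart": 1}',
    "toot-v2-config")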

Docker build workaround

Ordinarily the embedding config would be inserted by SQL, but a SQL dependency seemed unavailable at Docker build time, so object access was used instead:
File: iris.script

// Add the embedding config before dependent classes can be compiled
Set embedConf=##class(%Embedding.Config).%New()
Set embedConf.Name="toot-v2-config"
Set embedConf.Configuration="{""modelName"": ""toot-v2-config"",""modelPath"":""/opt/hub/toot/"",""tokenizerPath"":""/opt/hub/toot/tokenizer_tune.json"",""HotStart"":1}"
Set embedConf.EmbeddingClass="TOOT.Data.Embedding2"
Set embedConf.VectorLength=384
Set embedConf.Description="an embedding model provided by Alex Woodhead"
Set tSC=embedConf.%Save()
If $System.Status.IsError(tSC) Do $System.Status.DisplayError(tSC)
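For reference, once an instance is fully up, the SQL equivalent might look like the following (a sketch via embedded Python; the column names mirror the object properties above):

import iris

# A sketch: inserting the same config via SQL once SQL is available.
iris.sql.exec(
    "INSERT INTO %Embedding.Config "
    "(Name, Configuration, EmbeddingClass, VectorLength, Description) "
    "VALUES (?, ?, ?, ?, ?)",
    "toot-v2-config",
    '{"modelName": "toot-v2-config", "modelPath": "/opt/hub/toot/", '
    '"tokenizerPath": "/opt/hub/toot/tokenizer_tune.json", "HotStart": 1}',
    "TOOT.Data.Embedding2",
    384,
    "an embedding model provided by Alex Woodhead")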

The docker build then runs the instruction:

RUN --mount=type=bind,src=.,dst=. \
    iris start IRIS && \
    iris session IRIS < iris.script && \
    iris session IRIS -U %SYS "##class(SYS.Container).QuiesceForBundling()" && \
    iris stop IRIS quietly

Online updates

One feature used in production for the TrakCare product is online index updates.
This allows users safely back onto a system while new indexes are being built.
It reduces or eliminates patching downtime for end users.
It can also allow application specialists to smoke-test a patched production early, to accelerate availability.
Capability: at a point in time, the application transparently switches to using the newest version of an index.
Is there synergy for an online update capability for embeddings, as a competitive feature?
Consider that an IRIS customer will have to decide on a specific model to generate embeddings with.
As a persistent data column, this is in the hands of the IRIS customer to manage.
One trend that seems relentless is that better, smaller, and more efficient embedding models keep arriving.

Challenge 1

An external API is used to generate embeddings.
The dependency service upgrades to a new version with different embeddings.
Is there timely, planned application downtime in which to update all stored embeddings to the new API version?
i.e., the embeddings of new search queries need to resemble the embeddings already saved in the table.

Challenge 2

The embedding model is staged locally (from Hugging Face or a local directory).
Now the development project is responsible for a timely model version choice.
Do they delay the choice, waiting for a better model later in the project?
Delaying can reduce early learnings in an innovative project cycle.

Challenge 3

Bespoke embeddings: a harder task, with higher reward.
There is a hard cutoff by which embedding quality work must be complete.
Therefore there is no scope to upgrade production embedding data after go-live.
Can this dissuade an otherwise viable business option?

The ask:

Could an "Online Update" equivalent IRIS feature allow seamless transparent switch-over to a newer embeddings version in production? Does this improve early adoption for using the IRIS  embeddings feature over the choice of additional hybrid services?
Could an online update atomically also wrap the update of existing embedding config to new version (same name). ie: The named config in property/index in class definition will remain unaltered. So no recomplation necessary.

Embedding Batching

The current interface facilitates the generation of a single embedding, one record at a time.
1) Hypothesis: for external API embedding services, batching into fewer messages could be more efficient in latency, throughput, and processing cost of the remote API.
2) Hypothesis: where enabled by config, and where the infrastructure is capable, batched embedding generation could be a more efficient use of locally hosted models and the corresponding GPU / CPU (see the sketch below).
Where this might have synergy: in the area of table index updates, there is a batching context where multiple updates to the same bitmap chunk are deferred.
Wondering whether a similar context could schedule and then conclude a batch of embedding updates.
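A minimal sketch of hypothesis 2, using the sentence-transformers API (the model name is illustrative; all-MiniLM-L6-v2 happens to match the 384-dimension VectorLength used above):

from sentence_transformers import SentenceTransformer

# One batched encode() call for many source rows, rather than one model
# invocation per row; batch_size controls GPU / CPU utilization.
model = SentenceTransformer("all-MiniLM-L6-v2")  # 384-dimensional output
texts = ["row one source text", "row two source text", "row three source text"]
vectors = model.encode(texts, batch_size=32)
print(vectors.shape)  # (3, 384)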

Customization

The customization of embedding generation is straightforward.
Subclass %Embedding.Interface.
Implement two methods (a validation sketch follows this list):
* Embedding
* IsValidConfig
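
As a sketch of the kind of checks an IsValidConfig implementation might delegate to (the function name and checks are hypothetical; the keys mirror the config JSON used earlier):

import json
import os

def is_valid_config(config_json):
    # Reject configs that are not valid JSON or that point at missing files.
    try:
        config = json.loads(config_json)
    except ValueError:
        return False
    return (os.path.isdir(config.get("modelPath", ""))
            and os.path.isfile(config.get("tokenizerPath", "")))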

A good reason to subclass, at least in a development environment, is to gain additional instrumentation points for logging and catching errors (IRIS and Python), warnings, trace information, or specific issues loading and using a model.
Debugging this area is a bit different from conventional ObjectScript.
Some suggestions:

ClassMethod LoadModelPy(filepathIn As %String, status As Ens.Util.PyByRef) As %Boolean [ Language = python ]
{
    import iris
    import os
    import traceback
    if not os.path.exists(filepathIn):
        if status is not None:
            status.value = iris.cls("%SYSTEM.Status").Error(5001, "File not found at " + filepathIn)
        return 0
    try:
        # model loading / parsing work goes here
        pass
    except Exception:
        print(traceback.format_exc())
        if status is not None:
            status.value = iris.cls("%SYSTEM.Status").Error(5001, "Error parsing midi file " + filepathIn + "::" + traceback.format_exc())
        return 0
    return 1
}

Class not found error

Did you change the name of the class referred to by the embedding config, but not update the embedding config to the new value?

Controlling GPU usage

The Python method that loads the model is an opportunity to confirm whether a GPU is available.
Additionally, the config could guide whether any GPU should actually be used; for example, a web security context may always prefer CPU-only models.

import torch

# Prefer CUDA when available; the config could override this choice.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
if device.type != "cpu":
    model.value.to(device)

Accept that an environment variable may alternatively be the preferred control point for the solution.

Accept that the Apple M1 to M4 processors have a different query for GPU availability (see the sketch below).
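
On Apple silicon the check uses the Metal Performance Shaders backend instead of CUDA:

import torch

# On Apple M1-M4 machines, GPU availability is queried via the MPS backend.
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")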

Caching the model idea

How it works:
The first time the "Embedding" method is called for a new embedding, the AI model is loaded into a process-wide variable, for example %model:

// Load once and cache in a process-wide variable
Set modelV=##class(Ens.Util.PyByRef).%New()
Do ..LoadModelPy(config.%Get("modelPath"),config.%Get("tokenizerPath"),modelV)
Set %model("Toot","modelName")=config.%Get("modelName")
Set %model("Toot")=modelV.value

// Inside LoadModelPy, the Python side populates the by-reference holder:
// model.value = SentenceTransformer(...)

Each subsequent time the Embedding method is called by IRIS, the same already-loaded model is reused.
This means the model is not reloaded for each embedding insert.
It can also be made to respond to config changes, for example reloading for a new config version.

The model can be cleared down by the application by removing the variable (e.g., Kill %model("Toot")).
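
The same idea can also be sketched on the Python side as a module-level cache keyed by model path (assuming module state persists for the life of the hosting IRIS process):

from sentence_transformers import SentenceTransformer

_MODEL_CACHE = {}  # lives as long as the hosting process

def get_model(model_path):
    # Load once per process; every subsequent call reuses the same instance.
    if model_path not in _MODEL_CACHE:
        _MODEL_CACHE[model_path] = SentenceTransformer(model_path)
    return _MODEL_CACHE[model_path]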

Reference

This feedback came from exploring embeddings in an application task:

https://openexchange.intersystems.com/package/toot

Hope this helps
