
Announcement
· Oct 22, 2024

[Video] Using LLMs with InterSystems IRIS Interoperability Productions

Hi Community,

Watch the new video on the InterSystems Developers YouTube channel:

⏯ Using LLMs with InterSystems IRIS Interoperability Productions @ Global Summit 2024

When we think of Large Language Models (LLMs), we usually think of chatbots like ChatGPT. These mimic human communication and are great for interacting with humans. But what if we have a different use case, such as a requirement for machine-readable output? Prompt engineering allows us to influence the output of the model. This video uses examples (e.g. sentiment analysis) to show how you can use InterSystems IRIS Productions, Embedded Python, and REST APIs to realize your own LLM-based applications. This can be done either by using self-hosted models to achieve maximum data protection or by using web services to gain access to the most powerful models currently available.
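As a rough illustration of what "machine-readable output" can mean for the sentiment analysis example, the sketch below builds a prompt that asks for JSON only and validates whatever reply comes back. It is not taken from the video: the function names, the label set, and the JSON shape are all assumptions, and no particular model or web service is called.

```python
import json

# Hypothetical label set; the video may use different categories.
ALLOWED_LABELS = {"positive", "neutral", "negative"}


def build_sentiment_prompt(text: str) -> str:
    """Ask the model for a JSON object only, with no surrounding prose."""
    return (
        "Classify the sentiment of the text below.\n"
        'Reply with a single JSON object of the form {"sentiment": "<label>"}, '
        "where <label> is one of: positive, neutral, negative.\n"
        "Do not add any explanation.\n\n"
        f"Text: {text}"
    )


def parse_sentiment_reply(reply: str) -> str:
    """Validate the model's reply; raise ValueError if it is not usable."""
    try:
        data = json.loads(reply)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {reply!r}") from exc
    label = data.get("sentiment") if isinstance(data, dict) else None
    if not isinstance(label, str) or label not in ALLOWED_LABELS:
        raise ValueError(f"unexpected sentiment label: {label!r}")
    return label
```

Validating the reply before passing it on matters because a model can still answer in free text despite the instructions; a downstream component should receive either a known label or an explicit error.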

🗣 Presenter: @Andreas Schuetz, Sales Engineer, InterSystems

Enjoy watching, and expect more videos! 👍

Question
· Oct 22, 2024

cannot connect to the management portal but can connect to the web gateway - "Server is currently unavailable"

I just restarted an AWS instance that has been dormant for 6 months (license still valid), only to find that the web application on it doesn't work and that I cannot connect to the management portal. I can, however, connect to the web gateway fine, but it reports "Server is currently unavailable".

Grateful for any advice.

 

Fred Gustafsson

2 Comments
Article
· Oct 22, 2024 4m read

Uncovering clues by querying the interoperability message tables

When you use InterSystems IRIS as an interoperability engine, we all know and appreciate how easy it is to use the Message Viewer to review message traces and see exactly what is happening in your production. However, when a system handles millions of messages per day, you may not know where to begin your investigation.

Over my years supporting IRIS productions, I have often found myself investigating questions such as...

  • What kind of throughput does this workflow have?  
  • Where is the bottleneck?  
  • What are my most common errors?

One of my favorite places to look for clues is the Message Header table, which stores metadata about every message that passes through the system. These are the same messages that appear in the Message Viewer and in Visual Traces.

I have gathered a collection of useful SQL queries, and I would love to share them with you. My examples come mostly from HealthShare or IRIS for Health use cases, but they can be easily adapted to whatever workflow you have...

-- SQL query to find the # of messages through a component per day
select {fn SUBSTRING(timeprocessed,1,10)} AS day, count(*) MessagesThisDay 
FROM Ens.MessageHeader
where TargetConfigName = 'HS.Hub.Push.Evaluator' 
GROUP BY {fn SUBSTRING(timeprocessed,1,10)}
ORDER BY day ASC

-- SQL query to find long-running messages through particular components
SELECT PReq.SessionID as SessionId, 
  PReq.TimeCreated as pReqTimeCreated, 
  PRes.TimeCreated as pResTimeCreated, 
  {fn TIMESTAMPDIFF(SQL_TSI_SECOND, PReq.TimeCreated,PRes.TimeCreated)} as TimeDelay
FROM (
  SELECT ID, SessionId, TimeCreated
  FROM Ens.MessageHeader
  WHERE MessageBodyClassName = 'HS.Message.PatientSearchRequest'
  AND SourceConfigName = 'HS.Hub.MPI.Manager'
  AND TargetConfigName = 'HUB'
) as PReq
INNER JOIN (
  SELECT ID, SessionId, TimeCreated
  FROM Ens.MessageHeader
  WHERE MessageBodyClassName = 'HS.Message.PatientSearchResponse'
  AND SourceConfigName = 'HS.Hub.MPI.Manager'
  AND TargetConfigName = 'HS.IHE.PIX.Manager.Process'
) as PRes on pReq.SessionId = PRes.SessionId
WHERE {fn TIMESTAMPDIFF(SQL_TSI_SECOND, PReq.TimeCreated,PRes.TimeCreated)} > 1
ORDER BY SessionId desc

/*-- Query to find the bottleneck message through a particular component
  -- set your threshold for "how long is too long" (e.g. 20 seconds)
  -- look for clusters of messages that are longer than that (e.g. the first cluster started at 3:22:00, then there was a second cluster at 5:15:30)
  -- in each cluster, look at the first message in that cluster (chronologically). That is likely to be the bottleneck message, and all messages after it are victims of its bottleneck 
*/
SELECT %NOLOCK req.TargetConfigName, req.MessageBodyClassName, req.SessionId, req.TimeCreated, req.TimeProcessed, {fn TIMESTAMPDIFF(SQL_TSI_SECOND, req.TimeCreated, req.TimeProcessed)} as TimeToProcess
FROM Ens.MessageHeader AS req
WHERE req.TargetConfigName = 'HS.Hub.Management.Operations'
  AND req.TimeCreated BETWEEN '2021-04-21 00:00:00' AND '2021-04-21 11:00:00'
  AND {fn TIMESTAMPDIFF(SQL_TSI_SECOND, req.TimeCreated, req.TimeProcessed)} > 20
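The cluster heuristic described in the comment before this query can also be applied to the query's result set in a few lines of Python. The sketch below is only an illustration: the function name, the 5-minute gap used to separate clusters, and the sample timestamps are assumptions (the sample times simply echo the 3:22:00 and 5:15:30 clusters mentioned in the comment).

```python
from datetime import datetime, timedelta


def first_of_each_cluster(slow_times, max_gap=timedelta(minutes=5)):
    """Return the earliest timestamp of every cluster of slow messages.

    slow_times: TimeCreated values of messages that exceeded the threshold.
    max_gap: assumed cut-off; a longer pause between messages starts a new cluster.
    """
    suspects = []
    previous = None
    for current in sorted(slow_times):
        # A message opens a new cluster if it is the first one overall
        # or if it arrives more than max_gap after the previous slow message.
        if previous is None or current - previous > max_gap:
            suspects.append(current)
        previous = current
    return suspects


# Made-up TimeCreated values standing in for rows returned by the query.
slow = [datetime.fromisoformat(ts) for ts in (
    "2021-04-21 03:22:00", "2021-04-21 03:22:04", "2021-04-21 03:23:10",
    "2021-04-21 05:15:30", "2021-04-21 05:15:41",
)]
for ts in first_of_each_cluster(slow):
    print(ts)
```

With this sample data the two printed timestamps are the first message of each cluster, which are the candidates to inspect in the Message Viewer.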
/* If you have a particular error that you're investigating, try this one. It scans through the Ensemble Error Log for "Object to Load not found" entries, then returns some key fields from the relevant PatientSearchRequest message */
SELECT l.SessionId, mh.MessageBodyID, mh.TimeCreated, psr.SearchMode, psr.RequestingUser, psr.FirstName, psr.MiddleName, psr.LastName, psr.SSN, psr.Sex, psr.DOB
FROM Ens_Util.Log as l
INNER JOIN Ens.MessageHeader as mh on l.SessionId = mh.SessionId
INNER JOIN HS_Message.PatientSearchRequest as psr on mh.MessageBodyID = psr.ID
WHERE l.Type = 'Error'
AND l.ConfigName = 'HSPI.Server.APIOperation'
AND l.Text like 'ERROR #5809: Object to Load not found%'
AND mh.MessageBodyClassName = 'HS.Message.PatientSearchRequest'
AND mh.SourceConfigName = 'HSPI.Server.APIWebService'
AND mh.TargetConfigName = 'HSPI.Server.APIOperation'

-- Scan the Ensemble Error Log for a particular timeframe. Count up the different types of errors
SELECT substring(text,1,80) as AbbreviatedError, count(*) as NumTheseErrors
FROM Ens_Util.Log
WHERE Type = 'Error'
AND TimeLogged > '2022-03-03 00:00:00' -- when the last batch started
AND TimeLogged < '2022-03-03 16:00:00' -- when we estimate this batch might end
GROUP BY substring(text,1,80)
ORDER BY NumTheseErrors desc

-- Find the Gateway Processing Time for each StreamletRequest / ECRFetchResponse pair
SELECT sr.Gateway,request.sessionid, response.sessionid, request.timecreated AS starttime, response.timecreated AS stoptime, 
  datediff(ms,request.timecreated,response.timecreated) AS ProcessingTime, 
  Avg(datediff(ms,request.timecreated,response.timecreated)) AS AverageProcessingTimeAllGateways
FROM Ens.MessageHeader request
INNER JOIN Ens.MessageHeader AS response ON response.correspondingmessageid = request.id
INNER JOIN HS_Message.StreamletRequest AS sr ON sr.ID = request.MessageBodyId
WHERE request.messagebodyclassname = 'HS.Message.StreamletRequest'
AND response.messagebodyclassname = 'HS.Message.ECRFetchResponse'
Article
· Oct 22, 2024 5m read

LLM Models and RAG Applications Step-by-Step - Part III - Searching and Injecting Context

Welcome to the third and final article in our series on developing RAG applications with LLMs. In this article we will use our small example project to show how to find the most appropriate context for the question we want to send to the LLM, using the vector search functionality included in IRIS.


Vector searches

A key element of any RAG application is the vector search mechanism, which searches a table of stored vectors for the records most similar to a reference vector. To do this, we first need to generate the embedding of the question that will be passed to the LLM. Let's look at our example project to see how we generate this embedding and use it to query our IRIS database:

import numpy as np
import sentence_transformers

# Load the embedding model and vectorize the question (normalized to unit length)
model = sentence_transformers.SentenceTransformer('sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2')
question = model.encode("¿Qué medicamento puede tomar mi hijo de 2 años para bajar la fiebre?", normalize_embeddings=True)
# Serialize the embedding as a comma-separated string for TO_VECTOR
array = np.array(question)
formatted_array = np.vectorize('{:.12f}'.format)(array)
parameterQuery = [','.join(formatted_array)]
# cursorIRIS is assumed to be an open cursor on the IRIS connection created earlier in this series
cursorIRIS.execute("SELECT distinct(Document) FROM (SELECT VECTOR_DOT_PRODUCT(VectorizedPhrase, TO_VECTOR(?, DECIMAL)) AS similarity, Document FROM LLMRAG.DOCUMENTCHUNK) WHERE similarity > 0.6", parameterQuery)
similarityRows = cursorIRIS.fetchall()

Here we instantiate the embedding model and vectorize the question that we are going to send to our LLM. Next, we query the LLMRAG.DOCUMENTCHUNK table for the vectors whose similarity to the question exceeds 0.6 (this threshold is a judgment call left to the developer).
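The step that turns the embedding into the string bound to the query parameter can be isolated into a small helper. This is a minimal sketch in plain Python rather than NumPy; the name to_vector_literal and the decimals parameter are hypothetical, and the 12-digit default mirrors the format string used in the code above.

```python
def to_vector_literal(values, decimals=12):
    """Serialize floats as the comma-separated string passed to TO_VECTOR(?, DECIMAL)."""
    return ",".join(f"{value:.{decimals}f}" for value in values)


print(to_vector_literal([0.25, -0.5, 1.0], decimals=3))
# prints: 0.250,-0.500,1.000
```

Fixed-point formatting avoids scientific notation such as 1e-05, which can appear when very small floats are converted to strings by default.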

The function used for the search here is VECTOR_DOT_PRODUCT, but it is not the only option. Let's look at the two options available for similarity searches.

Dot Product (VECTOR_DOT_PRODUCT)

This algebraic operation is simply the sum of the products of each pair of elements that occupy the same position in their respective vectors. For two three-component vectors it is written as follows:

$$\vec{u} \cdot \vec{v} = u_1 v_1 + u_2 v_2 + u_3 v_3$$

InterSystems recommends using this method when the vectors being compared are unit vectors, that is, when their modulus (length) is 1. For those of you who are not familiar with algebra, the modulus is calculated as follows:

$$|\vec{v}| = \sqrt{v_1^2 + v_2^2 + v_3^2}$$
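To make the dot product and the modulus concrete, here is a small standard-library sketch. The helper names dot, modulus and normalize are illustrative only and are not part of the example project, which relies on the embedding model's normalize_embeddings option instead.

```python
import math


def dot(u, v):
    """Sum of the products of the elements in matching positions."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(u, v))


def modulus(v):
    """Length of the vector: square root of the sum of squared components."""
    return math.sqrt(dot(v, v))


def normalize(v):
    """Scale the vector so that its modulus becomes 1."""
    length = modulus(v)
    if length == 0:
        raise ValueError("cannot normalize the zero vector")
    return [x / length for x in v]


v = [3.0, 4.0, 0.0]
print(modulus(v))                           # 5.0
unit = normalize(v)
print(math.isclose(modulus(unit), 1.0))     # True
```

Once every stored vector and the question vector have modulus 1, the dot product alone is enough to rank them by similarity.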

Cosine Similarity (VECTOR_COSINE)

This calculation is the scalar product of the vectors divided by the product of their lengths, which equals the cosine of the angle θ between them. Its formula is as follows:

$$\cos\theta = \frac{\vec{u} \cdot \vec{v}}{|\vec{u}|\,|\vec{v}|}$$

In both cases, the closer the result is to 1, the greater the similarity between the vectors (for the dot product, this assumes unit vectors, as recommended above).
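The following sketch shows the cosine calculation together with the kind of threshold filter used in the query earlier (similarity > 0.6). It runs entirely in Python on made-up two-dimensional vectors; the document names and values are invented for illustration, and in the example project this comparison is done inside IRIS.

```python
import math


def cosine_similarity(u, v):
    """Dot product of u and v divided by the product of their lengths."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)  # undefined (division by zero) for a zero vector


# Invented vectors: DOC_A points in nearly the same direction as the question,
# DOC_B is perpendicular to it.
question = [0.6, 0.8]
chunks = {"DOC_A": [0.8, 0.6], "DOC_B": [-0.8, 0.6]}

matches = [name for name, vec in chunks.items()
           if cosine_similarity(question, vec) > 0.6]
print(matches)  # ['DOC_A']
```

Because the sample vectors already have length 1, the dot product and the cosine similarity give the same value here, which is the situation in which VECTOR_DOT_PRODUCT is recommended.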

Context injection

The query above identifies the documents related to the question that we will send to the LLM. In our case we are not going to send the text chunks that we vectorized; it is more useful to send the entire drug leaflet, since the LLM will be able to put together a much more complete answer from the whole document. Let's look at our code to see how we do it:

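# Import paths assume a recent LangChain release; earlier parts of this series may import these differently
from langchain import hub
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

# Concatenate the full text of every document returned by the vector search.
# docs_before_split and llm are assumed to come from the earlier parts of this series.
context = ""
for similarityRow in similarityRows:
    for doc in docs_before_split:
        if similarityRow[0] == doc.metadata['source'].upper():
            context = context + doc.page_content
prompt = hub.pull("rlm/rag-prompt")

# The chain injects the context, fills in the prompt, calls the LLM and returns plain text
rag_chain = (
    {"context": lambda x: context, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

rag_chain.invoke("¿Qué medicamento puede tomar mi hijo de 2 años para bajar la fiebre?")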


We start with a for loop that goes through all the vectorized records that are similar to the question asked. Because in our example the documents are still held in memory from the earlier splitting and vectorizing step, we reuse them in the second loop to extract the text directly. A better approach would be to fetch each document from wherever it is stored in our system, without needing that second for loop.
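While the documents are still in memory, one way to drop the second loop is to index them by source name once and look each match up directly. This is a sketch under assumptions: Doc is a stand-in for the document objects used in the project, build_context is a hypothetical helper, and each source name is assumed to identify exactly one document. Fetching documents from storage, as suggested above, is not shown.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Stand-in for a loaded document: its text plus a metadata dictionary."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def build_context(similarity_rows, docs):
    """Concatenate the text of every document named in the search results."""
    # Index once by upper-cased source, matching the comparison in the original loop.
    by_source = {doc.metadata["source"].upper(): doc for doc in docs}
    parts = []
    for row in similarity_rows:
        doc = by_source.get(row[0])
        if doc is not None:
            parts.append(doc.page_content)
    return "".join(parts)


# Invented sample data for illustration.
docs = [
    Doc("Leaflet A text. ", {"source": "leaflet_a.pdf"}),
    Doc("Leaflet B text. ", {"source": "leaflet_b.pdf"}),
]
rows = [("LEAFLET_B.PDF",), ("UNKNOWN.PDF",)]
print(build_context(rows, docs))
```

The dictionary lookup replaces a scan of all documents for every result row, which matters little for a handful of leaflets but grows with the size of the document set.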

Once the text of the documents that form the context has been stored in a variable, the next step is to pass that context to the LLM along with our question. In this case, we want to know what medicines we can give our 2-year-old son to lower his fever. Let's look at the answers without context and with context:

Without context:

Fever in young children can be worrying, but it is important to manage it appropriately. For a 2-year-old, the most commonly recommended fever-reducing medications are paracetamol (acetaminophen) and ibuprofen.

With context:

Dalsy 40 mg/ml oral suspension, containing ibuprofen, can be used in children from 3 months of age for fever relief. The recommended dose for children 2 years of age depends on their weight and should be administered by prescription. For example, for a child weighing 10 kg, the recommended dose is 1.8 to 2.4 mL per dose, with a maximum daily dose of 7.2 mL (288 mg). Always consult a doctor before administering any medication to a child.

As you can see, without context the answer is quite generic, while with the appropriate context the answer is much more direct and indicates that the medication should always be given under medical prescription.

Conclusions

In this series of articles we have presented the fundamentals of RAG application development. As you can see, the basic concepts are quite simple, but as we know, the devil is always in the details. For every project, the following decisions need to be made:

  • Which LLM model to use? On-premise or online service?
  • Which embedding model should we use? Does it work correctly for the language we are going to use? Do we need to lemmatize the texts we are going to vectorize?
  • How are we going to divide our documents for the context? By paragraph? By text length? With overlaps?
  • Is our context based only on unstructured documents or do we have different data sources?
  • Do we need to re-rank the results of the vector search with which we have extracted the context? And if so, what model do we apply?
  • ...
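To make the document-splitting question concrete, the sketch below shows one of the options listed above: splitting by text length with overlaps. It is a minimal character-based illustration, not the splitter used in the example project; the function name and parameters are hypothetical, and real projects often split on paragraphs or tokens instead.

```python
def split_with_overlap(text, size, overlap):
    """Split text into chunks of at most `size` characters, where each chunk
    repeats the last `overlap` characters of the previous one."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks


print(split_with_overlap("abcdefghij", size=4, overlap=2))
# prints: ['abcd', 'cdef', 'efgh', 'ghij']
```

The overlap keeps a sentence that straddles a chunk boundary available in both neighboring chunks, at the cost of storing and vectorizing some text twice.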

Developing RAG applications involves more effort in validating results than in the actual technical development of the application. You must be very, very sure that your application does not provide erroneous or inaccurate answers, as this could have serious consequences, not only legal but also in terms of trust.
