The results can be good, but there's room for improvement.

One of my go-to LLM tests is a zero-shot prompt asking for ObjectScript code that calculates the distance between two points. This example (https://community.intersystems.com/ask-dc-ai?question_id=238163) shows the kind of answer you get, and it has some issues.

Most LLMs get close to answering this question, but they often trip over operator precedence and/or the square root function, either misspelling $ZSQR as $SQRT or hallucinating a math library that doesn't exist, such as ##class(%SYSTEM.Math).Sqrt().
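To make the precedence pitfall concrete, here is a minimal sketch. The method name PrecedenceDemo and the values 3 and 4 are only there for illustration.

```objectscript
ClassMethod PrecedenceDemo()
{
    // ObjectScript evaluates strictly left to right, so this is (1+2)*3
    write !,1+2*3                    // 9
    // brackets force the conventional order
    write !,1+(2*3)                  // 7

    set dx=3,dy=4
    // evaluates as ((dx*dx)+dy)*dy = 52, so this is NOT the distance
    write !,$ZSQR(dx*dx+dy*dy)
    // brackets give 9+16 = 25, and the square root of 25 is 5
    write !,$ZSQR((dx*dx)+(dy*dy))   // 5
}
```

The unbracketed version is the typical output of an LLM that carries precedence rules over from other languages.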

The problem most likely stems from the sheer volume of tokens from other languages outweighing ObjectScript in the training data, so those languages dominate the gradient updates during training. As a result, solutions and functions from other languages bleed into the generated ObjectScript code.

RAG powered by IRIS vector search is a good approach to address this, but it looks like it can be improved.

Generally, the LLM often has the right idea of how to answer the question, but not always with the correct details. To use an analogy, the LLM already has millions of recipes, but these might not match the ingredients in the kitchen. If we can tell it what ingredients to use, it will do a far better job at morphing a relevant recipe.

One strategy is to first ask the LLM to break down a question into smaller questions, such as "How do you calculate the square root of a number?" However, the DC AI still struggles with these simple atomic questions about the ObjectScript language:

https://community.intersystems.com/ask-dc-ai?question_id=239065
https://community.intersystems.com/ask-dc-ai?question_id=239114
https://community.intersystems.com/ask-dc-ai?question_id=239125

At a minimum, the vector database should also include these atomic facts about IRIS and ObjectScript.

One approach I've experimented with is to produce a compact document of atomic facts and either include the entire text in the prompt or have the LLM first select, from a keyword list, the facts it thinks it needs.
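As a rough sketch of the second variant, the prompt could be assembled from the facts the LLM picked. Everything named here (BuildPrompt, the keywords "precedence", "sqrt" and "power", and the local facts array) is hypothetical and only illustrates the idea; the step where the LLM chooses the keywords is left out.

```objectscript
ClassMethod BuildPrompt(task As %String, keywords As %List) As %String
{
    // hypothetical fact table: keyword -> atomic fact
    set facts("precedence")="ObjectScript has strict left-to-right operator precedence, use brackets to ensure correct operator precedence"
    set facts("sqrt")="$ZSQR(num) is used to calculate the square root of the given number"
    set facts("power")="$ZPOWER(num,exponent) is used to calculate the value of a number raised to a specified power"

    set prompt="InterSystems IRIS ObjectScript Reference: "
    set sep=""
    for i=1:1:$listlength(keywords) {
        set key=$list(keywords,i)
        // skip keywords that have no matching fact
        continue:'$data(facts(key))
        set prompt=prompt_sep_facts(key)
        set sep=", "
    }
    return prompt_". Task: "_task
}
```

Called with $listbuild("precedence","sqrt","power") and the distance question as the task, this returns the reference facts followed by the task in a single string.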

This results in a prompt reengineered by the LLM itself:

"InterSystems IRIS ObjectScript Reference: ObjectScript has strict left-to-right operator precedence, use brackets to ensure correct operator precedence, $ZSQR(num) is used to calculate the square root of the given number, $ZPOWER(num,exponent) is used to calculate the value of a number raised to a specified power. Task: How do you calculate the distance between two points."

With this approach, we see that the DC AI gives a much better response:

https://community.intersystems.com/ask-dc-ai?question_id=239152

ClassMethod CalculateDistance(x1, y1, x2, y2) As %Float
{
    // Calculate the differences
    Set dx = x2 - x1
    Set dy = y2 - y1
    
    // Calculate the distance using the Pythagorean theorem
    Set distance = $ZSQR($ZPOWER(dx, 2) + $ZPOWER(dy, 2))
    
    Return distance
}

I'm enthusiastic about this technology's potential. If you need a beta tester for future versions, please reach out. I'm also happy to contribute my ideas further if it's of use.

Hi Scott,

It's probably best to avoid modifying the Ens.MessageHeader properties; doing so might affect trace logging and potentially lead to unexpected side effects.

Here are a few alternative ideas....

  1. Modify the MSH Segment: In the normalization process, tweak the sender or receiver field in the MSH segment to include a unique identifier that corresponds to the source service name. Then use the MSH to route the message.  
  2. Use a Utility Method: Develop a utility method within a class that inherits from Ens.Util.FunctionSet. This method would read the source config name from the first message header in the session. You can then use this method in your router logic as it will be automagically included.  
  3. Separate Normalization Processes: A config-only option would be to create a normalization process for each service and then use that process name in the router logic.
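For option 2, a minimal sketch might look like the following. The class name Demo.Util.Functions and the method name SourceOfSession are hypothetical. It assumes the session id is passed in by the caller and that the session id is also the ID of the first message header in the session; how the session id reaches the rule is left open here.

```objectscript
Class Demo.Util.Functions Extends Ens.Util.FunctionSet
{

/// Returns the source config name of the first message header in the session,
/// or an empty string if that header cannot be opened.
/// Assumption: sessionId is the ID of the first header in the session.
ClassMethod SourceOfSession(sessionId As %String) As %String
{
    set header=##class(Ens.MessageHeader).%OpenId(sessionId)
    if '$isobject(header) {
        return ""
    }
    return header.SourceConfigName
}

}
```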

Hey Paul,

I half agree, if the OP's requirements turn out to be a file-to-file use case.

If not, I wanted to defend the record mapper solution a little so as not to put off other readers from this approach.

As an example, I recently implemented a solution that had a million line CSV file. This would generate around 300,000 messages from a complex mapping. Each record was individually passed into a DTL which in turn produced a master file update message. These were then pushed into the local EPR (as per the EPR's requirements).

Yes, many messages, but not what I would call an overhead when taken into context of the frequency and need for M16 messages.

Bottom line, the solution I selected was the most maintainable one. Almost zero glue code to go wrong, and all maintenance is managed inside DTLs. This is exactly what the record mapper was designed for.

Looks like there are some conflicts in the points...

Java Native has 2 and then 3 points.

Here the LLM has 3, 4 and 6 points...

LLM AI or LangChain usage: Chat GPT, Bard and others - 3
LLM AI or LangChain usage: Chat GPT, Bard and others - 4 points
Collect 6 bonus expert points for building a solution that uses LangChain libs or Large Language Models (LLM)

I recommend using two terminals to experiment with locking and unlocking various globals. By observing the lock table during this process, you'll gain a clearer understanding of lock behavior across different processes.

Next, consider what you're aiming to lock. In other words, identify what you're trying to safeguard and against which potential issues. For instance, is the variable "loc" unique? Could two processes obtain the same value for "loc"? Without seeing the preceding code, it's challenging to discern if "loc" was assigned manually or via `$Increment`. Remember, using `$Increment` eliminates the need for locks in most cases.

Also, reevaluate your decision to use transactions for a single global write. If your goal is transactional integrity, think about which additional global writes should be encompassed in that transaction. For example, if "obj" is defined just before this code with a `%Save()` method, then include that save in the same transaction. Otherwise, a system interruption between the two actions can lead to an unindexed object, compromising data integrity.
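As a sketch of what that could look like, assuming "obj" is a persistent object and the single global write is an index entry keyed by "loc" (the method name SaveAndIndex and the global ^Demo.LocIndex are hypothetical):

```objectscript
ClassMethod SaveAndIndex(obj As %Persistent, loc As %String) As %Status
{
    set sc=$$$OK
    try {
        tstart
        // the save and the index write succeed or fail together
        $$$ThrowOnError(obj.%Save())
        set ^Demo.LocIndex(loc)=obj.%Id()
        tcommit
    } catch ex {
        // undo the work done since TSTART
        trollback 1
        set sc=ex.AsStatus()
    }
    return sc
}
```

If the save fails, the exception skips the index write and the catch block rolls back, so the object and its index entry never get out of step.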

I strongly advise revisiting the documentation multiple times and actively experimenting with these concepts. While these techniques offer significant advantages, their efficacy diminishes if not executed properly.

Hi Rochdi,

As mentioned, always ensure locks are explicitly released after use.

Reading between the lines a little, you might find these following points useful...

  • Locks don't actually "seal" a global node; they're advisory.
  • Any rogue process can still write to a locked global.
  • Another process is only deterred from writing to a global if it also attempts to obtain a lock and fails. The developer is responsible for implementing this and handling failed locks in every location where a write happens.
  • Without a timeout, a process can hang indefinitely waiting for a lock. You could argue it's good practice to always use a timeout.
  • If you implement a timeout, always verify the value of $test to ensure you've acquired the lock and not just timed out.
  • $Increment() is useful for creating sequenced ordinal IDs that are always unique, without the need for locks. This holds true even for high-concurrency solutions.

(If you're using locks only to make the key "inc" unique, then consider using $increment and forgo the locks.)
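A minimal sketch of the timeout, $test and $increment points, with hypothetical names throughout (SafeSet, NextId, ^Demo.Data, ^Demo.Counter) and an arbitrary 5 second timeout:

```objectscript
ClassMethod SafeSet(key As %String, value As %String) As %Status
{
    // ask for the lock, but give up after 5 seconds
    lock +^Demo.Data(key):5
    if '$test {
        // timed out: the lock was NOT acquired, so do not write
        return $$$ERROR($$$GeneralError,"Timed out waiting for lock on key "_key)
    }
    set ^Demo.Data(key)=value
    // always release the lock explicitly
    lock -^Demo.Data(key)
    return $$$OK
}

ClassMethod NextId() As %Integer
{
    // $increment is atomic, so no lock is needed for a unique id
    return $increment(^Demo.Counter)
}
```

This only protects the data if every other writer to ^Demo.Data goes through the same lock-and-check pattern.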

Hi Yone,

I would keep it simple: avoid unpacking the JSON here and make pResponse a generic Ens.StreamContainer.

Something like this should do it...

set tSC=httpRequest.Post(URL,0)
if $$$ISERR(tSC) return tSC  //no need to throw, the director will handle tSC
set pResponse=##class(Ens.StreamContainer).%New()
//the response body is available on the request's HttpResponse property
return pResponse.StreamSet(httpRequest.HttpResponse.Data)

ClassMethod OnPage() As %Status [ ServerOnly = 1 ]
{
    
    //just the query string...
    set qs=%request.CgiEnvs("QUERY_STRING")
        
    //SOLUTION 1: $piece only
    set externalCenterCode=$p(qs,":")
    set startDateRange=$p($p(qs,":authoredOn=le",2),":")
    set endDateRange=$p($p(qs,":authoredOn=ge",2),":")

    
    //SOLUTION 2: generic solution if params grow	
    for i=1:1:$l(qs,":") {
        set nvp=$p(qs,":",i),name=$p(nvp,"=",1),value=$p(nvp,"=",2)
        //fix the quirks
        //the first piece has no "=", so its text lands in name; file it under "ecc"
        if value="" set value=name,name="ecc"
        if name="authoredOn" set name=$e(value,1,2),value=$e(value,3,*)
        set params(name)=value
    }

    //SOLUTION 3: regex(ish) solution
    set code=$p(qs,":")
    set loc=$locate(qs,"le\d{4}-\d{2}-\d{2}")
    set start=$e(qs,loc+2,loc+11)
    set loc=$locate(qs,"ge\d{4}-\d{2}-\d{2}")
    set end=$e(qs,loc+2,loc+11)


    //some helper code to dump the variables into the CSP page
    write !,"<pre>"
    zwrite
    //use this to take a good look at the request object...
    zwrite %request
    write !,"</pre>"
    quit $$$OK
}

Here are three solutions and a couple of inline tips, including the regex example you asked for.

I wouldn't worry too much about using $piece; it's very common to use it in this way.

Eduard's comment above also has a fourth suggestion, $lfs ($LISTFROMSTRING, list from string), which is also commonly used as a way of piecing out data.
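For completeness, a minimal sketch of that list-based approach. The sample query string is made up to match the colon-delimited shape used in the solutions above, and the method name ListDemo is only for illustration.

```objectscript
ClassMethod ListDemo()
{
    // hypothetical query string in the same colon-delimited shape
    set qs="ABC123:authoredOn=le2024-01-31:authoredOn=ge2024-01-01"
    // split on ":" into a $list structure
    set list=$listfromstring(qs,":")
    write !,"code=",$list(list,1)
    // the remaining elements are name=value pairs
    for i=2:1:$listlength(list) {
        set nvp=$list(list,i)
        write !,$piece(nvp,"=",1)," -> ",$piece(nvp,"=",2)
    }
}
```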