Replies by Sean Connelly for InterSystems Developer Community

Sean Connelly · Aug 26, 2024

The results can be good, but there's room for improvement.

One of my go-to LLM tests is to provide a zero-shot prompt for ObjectScript code that calculates the distance between two points. As seen in this example (https://community.intersystems.com/ask-dc-ai?question_id=238163), which has some issues.

Most LLMs get close to answering this question, but they often fail with operator precedence and / or the square root function, either misspelling $ZSQR as $SQRT or hallucinating a math library that doesnt even exist, such as ##class(%SYSTEM.Math).Sqrt().

The problem stems from the token volume of other languages outweighing ObjectScript in the gradient descent during training. This causes solutions and functions from other languages to bleed into the ObjectScript code.

RAG powered by IRIS vector search is a good approach to address this, but it looks like it can be improved.

Generally, the LLM often has the right idea of how to answer the question, but not always with the correct details. To use an analogy, the LLM already has millions of recipes, but these might not match the ingredients in the kitchen. If we can tell it what ingredients to use, it will do a far better job at morphing a relevant recipe.

One strategy is to first ask the LLM to break down a question into smaller questions, such as "How do you calculate the square root of a number?" However, the DC AI still struggles with these simple atomic questions about the ObjectScript language:

https://community.intersystems.com/ask-dc-ai?question_id=239065
https://community.intersystems.com/ask-dc-ai?question_id=239114
https://community.intersystems.com/ask-dc-ai?question_id=239125

At a minimum, the vector database should also include these atomic facts about IRIS and ObjectScript.

One approach I've experimented with is to produce a compact document of atomic facts and either include the entire text in the prompt or have the LLM select which facts it thinks it needs first from a keyword list.

This results in a prompt reengineered by the LLM itself:

"InterSystems IRIS ObjectScript Reference: ObjectScript has strict left-to-right operator precedence, use brackets to ensure correct operator precedence, $ZSQR(num) is used to calculate the square root of the given number, $ZPOWER(num,exponent) is used to calculate the value of a number raised to a specified power. Task: How do you calculate the distance between two points."

With this approach, we see that the DC AI gives a much better response:

https://community.intersystems.com/ask-dc-ai?question_id=239152

ClassMethod CalculateDistance(x1, y1, x2, y2) As %Float
{
// Calculate the differences
Set dx = x2 - x1
Set dy = y2 - y1

// Calculate the distance using the Pythagorean theorem
Set distance = $ZSQR($ZPOWER(dx, 2) + $ZPOWER(dy, 2))

Return distance
}

I'm enthusiastic about this technology's potential. If you need a beta tester for future versions, please reach out. I'm also happy to contribute my ideas further if its of use.