Announcement
· May 20

Vote for the Best Demo! The InterSystems Demo Games Are On!

Hi Community,

We’ve got something exciting for you — it’s time for a new demo video contest, and this time, you’re in the judge’s seat!

📺 Demo Games for InterSystems Sales Engineers 📺

For this contest, InterSystems Sales Engineers from around the world submitted short demo videos showcasing unique use cases, smart solutions, and powerful capabilities of InterSystems technologies.

Now it’s your turn! We’re opening up voting to the entire Developer Community. Your insight and perspective as developers make you the perfect judges.

👉 How to participate

  • Watch the demos on the Demo Games Contest Page
  • Vote for your favourite entries (Developer Community login required)

🗓 Contest period

Voting is open from May 26 until September 14, 2025, and winners will be announced shortly after.

New videos will be added throughout the contest — so keep checking back. You might find a favourite right at the end!

✅ How to vote: 

All active members who have made a valid contribution to the Developer Community or Open Exchange — such as asking or answering questions, writing articles, or publishing applications — are eligible to vote. This includes customers, partners, and employees who registered using their corporate email addresses.

To vote:

  1. Log in (or create an account) on the Developer Community
  2. Visit the Contest Page
  3. Select your top 3 favourite videos and click the “Vote” button for each

🏆 Scoring system:

  • 1st place vote = 9 points
  • 2nd place vote = 6 points
  • 3rd place vote = 3 points

🔁 Votes from the Official Community Moderators* count double, so your top picks really matter!

* Moderators are indicated by a green circle around their avatar. 


Let the Demo Games begin – and may the best demo win!

Article
· May 20 · 7m read

A Semantic Code Search Solution for TrakCare Using IRIS Vector Search

This article presents a potential solution for semantic code search in TrakCare using IRIS Vector Search.

Here is a brief overview of the TrakCare semantic code search results for the query: "Validation before saving the object to the database".

 

  • Code Embedding Model

There are many embedding models designed for sentences and paragraphs, but they are not ideal for code-specific embeddings.

Three code-specific embedding models were evaluated: voyage-code-2, CodeBERT, and GraphCodeBERT. Although none of these models was pre-trained for the ObjectScript language, they still outperformed general-purpose embedding models in this context.

CodeBERT was chosen as the embedding model for this solution, offering reliable performance without requiring an API key. 😁

from transformers import RobertaTokenizer, RobertaModel
import torch

class GraphCodeBERTEmbeddingModel:
    def __init__(self, model_name="microsoft/codebert-base"):
        # Despite the class name, the default checkpoint is CodeBERT
        self.tokenizer = RobertaTokenizer.from_pretrained(model_name)
        self.model = RobertaModel.from_pretrained(model_name)
    
    def get_embedding(self, text):
        """
        Generate a CodeBERT embedding for the given text.
        """
        inputs = self.tokenizer(text, return_tensors="pt", max_length=512, truncation=True, padding="max_length")
        with torch.no_grad():
            outputs = self.model(**inputs)
        # Use the [CLS] token embedding for the representation
        cls_embedding = outputs.last_hidden_state[:, 0, :].squeeze().numpy()
        return cls_embedding
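
A quick usage sketch of the class above (illustrative only, not part of the original solution) shows what the model produces:

# Illustrative only: embed a short ObjectScript snippet and check the
# embedding's dimensionality (768 for microsoft/codebert-base).
model = GraphCodeBERTEmbeddingModel()
vector = model.get_embedding('set ^A(1) = "test"')
print(vector.shape)  # expected: (768,)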

 

  • IRIS Vector Database

A table is defined with a column of type VECTOR to store the embeddings. Note that COLUMNAR indexes are not supported for VECTOR columns.

CodeBERT embeddings have 768 dimensions, and the model can process texts up to 512 tokens long.

CREATE TABLE TrakCareCodeVector (
    file VARCHAR(150),
    codes VARCHAR(2000),
    codes_vector VECTOR(DOUBLE, 768)
)

  

  • Python DB-API

The Python DB-API is used to establish a connection with the IRIS instance and execute the SQL statements.

  1. Build the vector database for the TrakCare source code
  2. Retrieve the Top_K code embeddings with the highest VECTOR_COSINE similarity from the IRIS vector database
# build IRIS vector database
import iris
import os
from dotenv import load_dotenv

load_dotenv()

class IrisConn:
    """Connection with IRIS instance to execute the SQL statements """
    def __init__(self) -> None:
        connection_string = os.getenv("CONNECTION_STRING")
        username = os.getenv("IRISUSERNAME")
        password = os.getenv("PASSWORD")

        self.connection = iris.connect(
            connectionstr=connection_string,
            username=username,
            password=password,
            timeout=10000,
        )
        self.cursor = self.connection.cursor()

    def insert(self, params: list):
        try:
            sql = "INSERT INTO TrakCareCodeVector (file, codes, codes_vector) VALUES (?, ?, TO_VECTOR(?,double))"
            self.cursor.execute(sql,params)
        except Exception as ex:
            print(ex)
    
    def fetch_query(self, query: str):
        self.cursor.execute(query)
        return self.cursor.fetchall()

    def close_db(self):
        self.cursor.close()
        self.connection.close()
        
from transformers import AutoTokenizer, AutoModel, RobertaTokenizer, RobertaModel, logging
import torch
import numpy as np
import os
from db import IrisConn
from GraphcodebertEmbeddings import MethodEmbeddingGenerator
from IRISClassParser import parse_directory
import sys, getopt

class GraphCodeBERTEmbeddingModel:
    def __init__(self, model_name="microsoft/codebert-base"):
        self.tokenizer = RobertaTokenizer.from_pretrained(model_name)
        self.model = RobertaModel.from_pretrained(model_name)
    
    def get_embedding(self, text):
        """
        Generate a CodeBERT embedding for the given text.
        """
        inputs = self.tokenizer(text, return_tensors="pt", max_length=512, truncation=True, padding="max_length")
        with torch.no_grad():
            outputs = self.model(**inputs)
        # Use the [CLS] token embedding for the representation
        cls_embedding = outputs.last_hidden_state[:, 0, :].squeeze().numpy()
        return cls_embedding
        
class IrisVectorDB:
    def __init__(self, vector_dim):
        """
        Initialize the IRIS vector database.
        """
        self.conn = IrisConn()
        self.vector_dim = vector_dim

    def insert(self, description: str, codes: str, vector):
        params=[description, codes, f'{vector.tolist()}']
        self.conn.insert(params)

    def search(self, query_vector, top_k=5):
        query_vectorStr = query_vector.tolist()
        query = f"SELECT TOP {top_k} file,codes FROM TrakCareCodeVector ORDER BY VECTOR_COSINE(codes_vector, TO_VECTOR('{query_vectorStr}',double)) DESC"
        results = self.conn.fetch_query(query)
        return results
    
# Chatbot for code retrieval
class CodeRetrieveChatbot:
    def __init__(self, embedding_model, vector_db):
        self.embedding_model = embedding_model
        self.vector_db = vector_db
    
    def add_to_database(self, description, code_snippet, embedding = None):
        if embedding is None:
            embedding = self.embedding_model.get_embedding(code_snippet)
        self.vector_db.insert(description, code_snippet, embedding)
    
    def retrieve_code(self, query, top_k=5):
        """
        Retrieve the most relevant code snippets for the given query.
        """
        query_embedding = self.embedding_model.get_embedding(query)
        results = self.vector_db.search(query_embedding, top_k)
        return results
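
To tie the classes together, a hypothetical end-to-end run might look like the sketch below (the query string and top_k value are illustrative assumptions):

# Hypothetical wiring of the classes above; assumes the IRIS connection
# environment variables are set and TrakCareCodeVector is populated.
if __name__ == "__main__":
    embedding_model = GraphCodeBERTEmbeddingModel()
    vector_db = IrisVectorDB(vector_dim=768)  # CodeBERT vectors are 768-dimensional
    chatbot = CodeRetrieveChatbot(embedding_model, vector_db)
    # Each result row is a (file, codes) pair from the vector search
    for file, codes in chatbot.retrieve_code("validation before saving the object", top_k=5):
        print(file)
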
  • Code Chunks

Since CodeBERT can process texts with a maximum length of 512 tokens, very large classes and methods need to be split into smaller parts. Each of these parts, or code chunks, is then embedded and stored in the vector database.

from transformers import AutoTokenizer, AutoModel, RobertaTokenizer, RobertaModel
import torch
from IRISClassParser import parse_directory

class MethodEmbeddingGenerator:
    def __init__(self, model_name="microsoft/codebert-base"):
        """
        Initialize the embedding generator with CodeBERT.

        :param model_name: The name of the pretrained CodeBERT model.
        """
        self.tokenizer = RobertaTokenizer.from_pretrained(model_name)
        self.model = RobertaModel.from_pretrained(model_name)
        self.max_tokens = self.tokenizer.model_max_length  # Typically 512 for CodeBERT

    def chunk_method(self, method_implementation):
        """
        Split method implementation into chunks based on lines of code that approximate the token limit.

        :param method_implementation: The method implementation as a string.
        :return: A list of chunks.
        """
        lines = method_implementation.splitlines()
        chunks = []
        current_chunk = []
        current_length = 0
        for line in lines:
            # Estimate tokens of the line
            line_token_estimate = len(self.tokenizer.tokenize(line))
            if current_length + line_token_estimate <= self.max_tokens - 2:
                current_chunk.append(line)
                current_length += line_token_estimate
            else:
                # Add the current chunk to chunks and reset
                chunks.append("\n".join(current_chunk))
                current_chunk = [line]
                current_length = line_token_estimate

        # Add the last chunk if it has content
        if current_chunk:
            chunks.append("\n".join(current_chunk))

        return chunks

    def get_embeddings(self, method_implementation):
        """
        Generate embeddings for a method implementation, handling large methods by chunking.

        :param method_implementation: The method implementation as a string.
        :return: A list of embeddings (one for each chunk).
        """
        chunks = self.chunk_method(method_implementation)
        embeddings = {}

        for chunk in chunks:
            inputs = self.tokenizer(chunk, return_tensors="pt", truncation=True, padding=True, max_length=self.max_tokens)
            with torch.no_grad():
                outputs = self.model(**inputs)
                # Use the [CLS] token embedding (index 0) as the representation
                cls_embedding = outputs.last_hidden_state[:, 0, :].squeeze(0)
                embeddings[chunk] = cls_embedding.numpy()

        return embeddings

    def process_methods(self, methods):
        """
        Process a list of methods to generate embeddings for each.

        :param methods: A list of dictionaries with method names and implementations.
        :return: A dictionary with method names as keys and embeddings as values.
        """
        method_embeddings = {}
        for method in methods:
            method_name = method["name"]
            implementation = method["implementation"]
            print(f"Processing method embedding: {method_name}")
            method_embeddings[method_name] = self.get_embeddings(implementation)
        return method_embeddings
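
As an illustration of the chunk-and-embed flow, a hypothetical run on a tiny made-up method body could look like this:

# Illustrative only: chunk and embed a small, made-up method body.
generator = MethodEmbeddingGenerator()
sample_method = 'set x = 1\nset y = 2\nwrite x + y'
for chunk, vector in generator.get_embeddings(sample_method).items():
    print(len(chunk.splitlines()), vector.shape)  # each vector is (768,)
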
  • User Interface - The Angular Application

The architecture uses Angular for the frontend and Python (Flask) for the backend.
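
The article does not show the backend code; a minimal Flask endpoint along these lines (route name and JSON shape are assumptions, not the article's actual API) could expose the retrieval to the Angular app:

# Hypothetical minimal Flask backend for the Angular frontend.
from flask import Flask, request, jsonify

app = Flask(__name__)
embedding_model = GraphCodeBERTEmbeddingModel()
vector_db = IrisVectorDB(vector_dim=768)
chatbot = CodeRetrieveChatbot(embedding_model, vector_db)

@app.route("/search", methods=["POST"])
def search_codes():
    query = request.json.get("query", "")
    results = chatbot.retrieve_code(query, top_k=5)
    return jsonify([{"file": f, "codes": c} for f, c in results])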

 

  • Next Steps

The search results are not perfect because the embedding model was not pre-trained for ObjectScript.

Question
· May 20

Poor database write performance

Hello,

I have created this script that does a lot of writes to a single global. DB write performance is much slower than expected (compared to other, similar systems).

set rec = "..." // fill it with something
set time = $piece($horolog, ",", 2)
while (($piece($horolog, ",", 2) - time) < 30) { // run for 30 seconds
    set ^A($System.Util.CreateGUID()) = rec
}

I have noticed the following:

  • CPU usage does not reach 100% on a single core (e.g. 25% of total CPU usage should be seen on a 4-core system). Instead, much lower CPU usage is shown (with some drops to 0% from time to time). It looks like the process has to wait for I/O completion before proceeding. Removing the set statement in the while loop above (keeping only CreateGUID) allows it to reach 100% usage of a single core.
  • In Process Monitor, it writes to the database mostly using 8KB blocks. Even though the database is defined to use 8KB blocks, IRIS is usually able to batch multiple writes at the same time (giving better performance). Writes to the WIJ are done using 256KB blocks (as expected).

I have another IRIS server with the exact same specs, and it behaves as expected:

  • CPU reaches 100% usage on one core (as expected).
  • It performs writes using blocks bigger than 8KB, followed by a lot of writes using 512KB blocks at the end (before another journal/database cycle occurs). The slow server does not show such 512KB writes.

What I have tried:

  • restarting the IRIS instance
  • using another database (such as TEMP). One theory was that the "slow" database is heavily fragmented, but that's not the case.
  • killing the ^A global first, or writing to another global

Additional info:

  • the databases are the same: encrypted, 8KB block size. Both have a similar size (around 1.5TB) and a few percent of free space left, which should be plenty to store the ^A nodes (no expansion needed).
  • journaling is enabled on both systems.
  • global buffer cache: both are set to 4095MB.
  • both systems are running Windows Server 2012 under Hyper-V. CPU frequency is similar.
  • both systems are using Sophos. Maybe there is an exclusion rule for the D:\ drive on one of them, but AFAIK that should not explain the "8KB only" writes.
Question
· May 20

%Stream files do not get purged from the <namespace>/stream/ folder on the system disk

As the title says, I've noticed that files saved to the stream directory on the disk where the database (.DAT file) lies do not get purged. Is this expected, and do we need to create our own scheduled task to clean this folder up?

I could only find old answers saying that it is, but I find that a bit odd, because these are supposed to be temporary files. Perhaps I am not handling the streams correctly in the code?

set fileToDisk = ##class(%Stream.FileBinary).%New()
set fileToDisk.Filename = fileDestinationPath
set copyStatus = fileToDisk.CopyFromAndSave(response.Data)