Article
· Mar 21 2m read

VECTOR inside IRIS

This is an attempt to run a vector search demo completely inside IRIS.
There are no external tools; all you need is a Terminal / Console and the Management Portal.
Special thanks to Alvin Ryanputra, whose package iris-vector-search was the source
of inspiration and of the test data.
My package is based on the IRIS 2024.1 release and requires attention to your processor capabilities.

I attempted to write the demo in pure ObjectScript.
Only the calculation of the description_vector is done in embedded Python.

Calculating a vector with 384 dimensions for each of 2247 records takes time.
In my Docker container, it took 01:53:14 to generate them all.

You have been warned!
So I made this step reentrant to allow pausing the vector calculation.
Every 50 records you are offered the chance to stop.
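
The reentrant idea can be sketched in a few lines of Python. This is only an illustration of the pattern, not the code from the package: `embed`, `generate_vectors` and `should_stop` are hypothetical names, and `embed` is a dummy stand-in for the real embedding call.

```python
def embed(text):
    # Hypothetical stand-in for the real (slow) embedding calculation.
    return [float(len(text))]


def generate_vectors(records, vectors, batch_size=50, should_stop=lambda done: False):
    """Calculate missing vectors; return True once no record is left to process.

    records: dict of id -> description text
    vectors: dict of id -> vector, kept between runs so work is never repeated
    should_stop: callback asked after every batch_size new vectors
    """
    done = 0
    for rec_id, text in records.items():
        if rec_id in vectors:
            continue  # already calculated in an earlier run
        vectors[rec_id] = embed(text)
        done += 1
        if done % batch_size == 0 and should_stop(done):
            return False  # paused; call again later to continue
    return True


# Hypothetical data: 120 records, pause after the first batch, then resume.
records = {i: "x" * i for i in range(1, 121)}
vectors = {}
paused_run = generate_vectors(records, vectors, should_stop=lambda done: True)
resumed_run = generate_vectors(records, vectors)
```

Because finished records are skipped on entry, the function can be called as often as needed until it returns True.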

The demo looks like this:

USER>do ^A.DemoV

      Test Vector Search
=============================
     1 - Initialize Tables
     2 - Generate Data
     3 - VECTOR_COSINE
     4 - VECTOR_DOT_PRODUCT
     5 - Create Scotch
     6 - Load Scotch.csv
     7 - generate VECTORs
     8 - VECTOR search
Select Function or * to exit : 8

      Default search:
Let's look for TOP 3 scotch that costs less than $100,
 and has an earthy and creamy taste
     change price limit [100]: 50
     change phrase [earthy and creamy taste]: earthy 

 calculating search vector
  
     Total below $50: 222 

ID      price   name
1990    40      Wemyss Vintage Malts 'The Peat Chimney,' 8 year old, 40%
1785    39      The Famous Jubilee, 40%
1868    40      Tomatin, 15 year old, 43%
2038    45      Glen Grant, 10 year old, 43%
1733    29      Isle of Skye, 8 year old, 43%

5 Rows(s) Affected


- You see the basic functionalities of Vectors in steps 1..4
- Steps 5..8 are related to the search example I borrowed from Alvin
- Step 6 (import of test data) is straight ObjectScript;
  SQL LOAD DATA was far too sensitive to irregularities in the input CSV

I also suggest following the examples in the Management Portal to watch how vectors operate.
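
For intuition about what steps 3, 4 and 8 compute, here is a minimal Python sketch of the textbook definitions of dot product and cosine similarity, plus a price-filtered top-k ranking. It is not how IRIS implements VECTOR_DOT_PRODUCT or VECTOR_COSINE; the function names, the row layout and the sample rows are all hypothetical.

```python
import math


def dot_product(a, b):
    """Sum of element-wise products of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Dot product divided by the product of the two vector lengths."""
    norm_a = math.sqrt(dot_product(a, a))
    norm_b = math.sqrt(dot_product(b, b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot_product(a, b) / (norm_a * norm_b)


def top_matches(rows, query_vector, price_limit, k=3):
    """Return (id, price, name) of the k rows below price_limit most similar to the query.

    rows: iterable of (id, price, name, vector) tuples (hypothetical layout)
    """
    candidates = [row for row in rows if row[1] < price_limit]
    candidates.sort(key=lambda row: cosine_similarity(row[3], query_vector), reverse=True)
    return [(row[0], row[1], row[2]) for row in candidates[:k]]


# Made-up rows with 2-dimensional vectors, purely for illustration.
sample_rows = [
    (1, 40, "A", [1.0, 0.0]),
    (2, 120, "B", [1.0, 0.0]),
    (3, 30, "C", [0.0, 1.0]),
    (4, 45, "D", [1.0, 1.0]),
]
best = top_matches(sample_rows, [1.0, 0.0], price_limit=100, k=2)
```

The price filter is applied before ranking, which mirrors the demo's "costs less than" condition combined with a similarity ordering.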

GitHub
 

Discussion (11)

Here is the documentation for TO_VECTOR, https://docs.intersystems.com/irislatest/csp/docbook/DocBook.UI.Page.cls...

The Vector datatype documentation is below, but doesn't necessarily help a lot. I think this is because you will typically need a Python library, like sentence_transformers used in iris-vector-search, to generate useful vectors.

https://docs.intersystems.com/iris20241/csp/documatic/%25CSP.Documatic.c...

Experimenting with class %Library.Vector, I found an unattractive way:

;; compose JSON array  >> v
USER>zw v
v=[($double(.5)),($double(1.5)),($double(2.2000000000000001776))]  ; <DYNAMIC ARRAY>
USER>set vec=##class(%Vector).OdbcToLogical(v)
 
USER>zw vec
vec={"type":"double", "count":3, "length":3, "vector":[$double(.5),$double(1.5),$double(2.2000000000000001776)]}  ; <VECTOR>

Applying OdbcToLogical was really shocking.

Here's how I was able to use LOAD DATA. First I ran into the issue of commas inside one field of the otherwise comma-separated data.
So, I edited the data file with vi and changed all the delimiters to the '|' symbol using this vi command: :1,$ s/","/"|"/g
Then I used SQL to create the table:
CREATE TABLE scotch_reviews ( name VARCHAR(255),
            category VARCHAR(255),
            review_point INT,
            price DOUBLE,
            description VARCHAR(2000),
            description_vector VECTOR(DOUBLE, 384))
Then I used LOAD DATA:
LOAD BULK DATA FROM FILE 'scotch_reviews.tbl'
    INTO scotch_reviews (name, category, review_point, price, description)
    USING '{ "from": {"file": {"columnseparator":"|"} } }'
And it worked.
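
For anyone who prefers not to edit the file by hand, the delimiter change can also be sketched with Python's standard csv module. This is an alternative to the vi command, not what was actually run above; the function name is hypothetical, and the sketch assumes no field contains the '|' character itself.

```python
import csv
import io


def commas_to_pipes(src, dst):
    """Copy CSV rows from src to dst, switching the delimiter to '|'.

    csv.reader understands quoted fields, so commas inside a quoted
    description are kept as data rather than treated as separators.
    Returns the number of rows written.
    """
    reader = csv.reader(src)
    writer = csv.writer(dst, delimiter="|", quoting=csv.QUOTE_ALL, lineterminator="\n")
    count = 0
    for row in reader:
        writer.writerow(row)
        count += 1
    return count


# Made-up sample input with a comma inside a quoted field.
sample = io.StringIO('"name","category"\n"Some Malt, 10 year old","Single Malt"\n')
converted = io.StringIO()
rows_written = commas_to_pipes(sample, converted)
```

With real files, open the input and output with newline="" as the csv module documentation recommends, and pass the two file objects to the function.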