Written by

Question Kim Trieu · Mar 26, 2024

VECTOR_COSINE() error help ObjectScript error: <PYTHON EXCEPTION> *<class 'OSError'>: isc_stdout_write: PyArg_ParseTuple failed!

Using VECTOR_COSINE() in SQL query to perform a text similarity search on existing embeddings in a %VECTOR column.

Code is below.

Commented out sql query returns this error: SQLCODE: -29  Field 'NEW_EMBEDDING_STR' not found in the applicable tables^ SELECT TOP ? maxID , activity , outcome FROMMain .AITest ORDER BY VECTOR_COSINE ( new_embedding_str ,

Sql query as written returns ERROR #5002: ObjectScript error: <PYTHON EXCEPTION> *<class 'OSError'>: isc_stdout_write: PyArg_ParseTuple failed!

Table column properties:

Property MaxID As %Integer;

Property Activity As %String(MAXLEN = "");

Property Outcome As %String(MAXLEN = "");

Property Embeddings As %Vector;

Property Total As %String;

Property WorkflowIDs As %String;
ClassMethod GetSimilarity(newWorkflow As %String) [ Language = python, SqlProc ]
{
    import iris

    try:
        libraries = iris.cls('Main.AITest').ImportLibraries()
        if libraries:
            SentenceTransformer = libraries[0]
            model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')
        else:
            print("Error: No libraries found.")
            return None

        new_embedding = model.encode(newWorkflow)
        new_embedding_str = ', '.join(map(str, new_embedding))

        sql = "select maxID, activity, outcome from Main.AITest order by VECTOR_COSINE(Embeddings, TO_VECTOR(?, double))"
        # Also tried sql = "select maxID, activity, outcome from Main.AITest order by VECTOR_COSINE(?, TO_VECTOR(Embeddings,         double))"
        params = [new_embedding_str]
        rs = iris.sql.exec(sql, *params)

        if rs.ResultSet._SQLCODE < 0:
            print(f"Loop failed with SQLCODE: {rs.ResultSet._SQLCODE} {rs.ResultSet._Message}")
        elif rs.ResultSet._SQLCODE == 100:
            print("Finished")

    except Exception as e:
            print(f"SQLCODE: {e.sqlcode} {e.message}")
        else:
            print(e)
}
ClassMethod UploadEmbedding(embedding, maxID As %Integer) [ Language = python, SqlProc ]
{
    import iris
    try:
        sql = "UPDATE Main.AITest SET embeddings = (TO_VECTOR(?,double)) WHERE MaxID = ?"
        params = [embedding, maxID]
        iris.sql.exec(sql, *params)

    except Exception as e:
            print(f"SQLCODE: {e.sqlcode} {e.message}")
        else:
            print(e)
}
ClassMethod CreateEmbeddings() [ Language = python, SqlProc ]
{
    import iris
    try:
        libraries = iris.cls('Main.AITest').ImportLibraries()

        if libraries:
            SentenceTransformer = libraries[0]
            model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')

        else:
            print("Error: No libraries found.")
            return None

        select = "select activity, outcome, maxID from Main.AITest"
        where = ""
        condition1 = condition2 = ""
        orderby = ""
        sql = select + " " + where + " " + condition1 + " " + condition2 + " " + orderby

        rs = iris.sql.exec(sql)

        for idx, row in enumerate(rs):

            workflow = row[0] + " " + row[1]
            maxID = row[2]
            embedding = model.encode(workflow)
            embedding_string = ', '.join(map(str, embedding))

            iris.cls('Main.AITest').UploadEmbedding(embedding_string, maxID)

        if rs.ResultSet._SQLCODE < 0:
            print(f"Loop failed with SQLCODE: {rs.ResultSet._SQLCODE} {rs.ResultSet._Message}")
        elif rs.ResultSet._SQLCODE == 100:
            print("Finished")

    except Exception as e:
            print(f"SQLCODE: {e.sqlcode} {e.message}")

        else:
            print(e)
}
Product version: IRIS 2023.3
$ZV: IRIS 2024.1.0.263.0

Comments

Robert Cemper · Mar 26, 2024

I use a similar setup in my example in OEX ,
Especially with TO_VECTOR() I ran into problems if I didn't add the explicit
type and size of the vector.  e.g. TO_VECTOR("..........",,DOUBLE,384)
as model all-MiniLM-L6-v2  produces exactly that type and size:
Don't ask for why. I just used the proposed example

0
Kim Trieu  Mar 26, 2024 to Robert Cemper

Thanks, I don't seem to have any issues with TO_VECTOR() and adding the embedding(s) to the %vector column as long as I add the comma as a separator in the array. The issue seems to be with using VECTOR_COSINE() and VECTOR_DOT_PRODUCT() in the sql query. Is there any other way of retrieving the embeddings from the %vector column to calculate the cosine similarity?

0
Benjamin De Boe · Mar 27, 2024

something odd is going on with that new_embedding_str parameter value you're adding. Rather than taking the value, it's taking the parameter name, hence the "field not found" error. Try removing that * before parameters in your call to exec(), or just inline the parameter value into the statement as you're not reusing it anyway.

0
Robert Cemper  Mar 27, 2024 to Benjamin De Boe

Sounds promising:
in my COS example, I had to use pure String parameters like  "0.54456,0.6987646,0.I123" 
happened by accident, more for reading and debugging

0
Kim Trieu  Mar 27, 2024 to Robert Cemper

Yep, this bypasses the parameter issue Benjamin described, but I still get this error. ObjectScript error: <PYTHON EXCEPTION> *<class 'OSError'>: isc_stdout_write: PyArg_ParseTuple failed!

0
Kim Trieu  Mar 27, 2024 to Benjamin De Boe

Thanks for responding, I tried your suggestions, unfortunately I still get a sql error.

"SQLCODE: -29  Field 'NEW_EMBEDDING_STR' not found in the applicable tables^ SELECT TOP ? maxID , activity , outcome FROM Main . AITest ORDER BY VECTOR_COSINE ( Embeddings , new_embedding_str )
 Field 'NEW_EMBEDDING_STR' not found in the applicable tables^ SELECT TOP ? maxID , activity , outcome FROM Main . AITest ORDER BY VECTOR_COSINE ( Embeddings , new_embedding_str )"

0
Robert Cemper  Mar 27, 2024 to Kim Trieu

By SQL priniciple:     
any straight name is meant to be a COLUMN of the actual table.  eg. Embeddings
but an external HostVariable is designated by an initial : (colon)  e.g.  : new_embedding_str or :params

This seems to be the deeper cause your problem

in addition I'd suggest to add  print(new_embedding_str)  to verify its format. (one of my problems.

Furthermore, take a closer look to this example:
iris-vector-search/blob/main/demo/sql_demo.ipynb

0
Kim Trieu  Mar 27, 2024 to Robert Cemper

Thanks! Here is the result of print(new_embedding_str): -0.012367633, 0.08229666, -0.045526657, -0.021790344, -0.09783857, 0.011233694, 0.11478377, 0.015476992, 0.066593476, -0.028646931, -0.00035714556, -0.020890323, -0.014966787, -0.009651463, -0.04823198, 0.02970142, -0.03654133, -0.0733902, -0.077465616, -0.007498019, 0.007159351, 0.115939245, -0.053096283, -0.0019092769, 0.03061681, 0.060498957, -0.0109981615, 0.05554405, 0.06436163, 0.0010217681, 0.008984506, -0.01909181, 0.049410313, -0.0046289866, -0.0002824377, 0.043872904, -0.03933238, 0.0070418958, 0.04450893, 0.0057665105, 0.029787099, -0.106896654, -0.010929529, 0.019587921, -0.031226052, 0.054222684, -0.053108577, 0.023915643, 0.016592344, -0.045376964, 0.056992017, -0.074442, -0.0842688, -0.031837333, 0.06094658, 0.069696166, 0.016502133, 0.006163614, 0.042928968, 0.059463773, 0.051030602, -0.0009090814, -0.018656455, 0.050873574, 0.04656456, -0.04863924, 0.0071582315, -0.05078049, -0.008812764, -0.09778179, -0.028910201, 0.0030154034, -0.007890963, 0.09406418, 0.012195868, -0.0032306092, 0.029213682, -0.08315854, 0.099823065, 0.14098983, -0.034719538, -0.069456525, -0.01513024, 0.042170536, -0.013932263, 0.06428065, 0.018271074, -0.0077171763, -0.03326928, -0.018273575, -0.05889311, -0.039047785, 0.031104289, -0.008865226, -0.035700947, 0.039564047, -0.060396545, 0.009985891, -0.07055272, 0.1288366, 0.01864772, 0.0060013114, 0.066211775, -0.021810597, -0.07669776, -0.049433775, 0.010740815, 0.035635687, 0.053287655, -0.068942666, -0.056721188, -0.02227402, -0.040967852, -0.0016006116, -0.061532926, -0.033666585, -0.056064934, 0.03403046, 0.046199817, 0.06698784, 0.060698483, -0.058410134, -0.07579008, -0.018291337, -0.06771481, -0.058569603, 0.033723082, -7.52444e-33, 0.009717529, 0.031346004, 0.07498145, -0.06071525, 0.0041581895, 0.0024989324, -0.10493763, 0.022117518, -0.015239807, -0.048102356, -0.043640785, -0.06609356, 0.007661355, -0.017238742, 0.06961938, -0.061138105, 0.050104957, 0.04784467, 0.0031345962, -0.020252144, -0.06554345, 0.1011566, 0.04156833, -0.10441905, -0.05407763, -0.037749078, 0.009589713, -0.046128895, 0.05732413, 0.022369683, 0.009047337, -0.052184585, 0.003846389, -0.015119601, 0.01722195, -0.040975425, 0.034113463, -0.03093567, -0.006747672, 0.0008930639, -0.015441975, -0.06361249, 0.0036032437, 0.03340732, 0.03015114, 0.033482574, 0.0052873297, -0.0019037911, -0.06700053, -0.020265548, 0.044921134, 0.0075110453, 0.043736465, 0.02989835, -0.001148476, -0.0101042045, 0.049161088, 0.024709491, -0.09946261, 0.0063731205, -0.030562524, 0.061343204, 0.030892957, 0.018738467, -0.026431091, 0.0060403524, 0.045076493, -0.032179367, 0.0555283, 0.025604432, -0.08719626, 0.012434642, 0.023987422, 0.034689777, -0.049034715, -0.055082064, -0.0054916176, 0.011287644, 0.027124986, 0.017873017, -0.103416376, -0.089201406, 0.020051159, 0.038698107, -0.022287738, 0.008249157, -0.029930592, -0.1644036, 0.028224042, -0.019334221, -0.08657236, -0.05819561, -0.09543146, -0.023918334, 0.077383414, 3.4810153e-33, 0.020336987, 0.05965215, -0.069160566, 0.04744932, 0.10784883, -0.0018692117, 0.046611253, 0.061773244, -0.061153416, 0.051496595, -0.0023423578, 0.028044755, 0.020787673, -0.012962601, 0.10118507, 0.05530823, 0.014362301, 0.030259274, 0.035992704, -0.021907462, -0.03618004, 0.036039226, -0.03216552, -0.0009927913, -0.01762192, 0.059019443, 0.092808425, -0.0886788, 0.02172343, 0.05927173, -0.05763399, -0.058060993, 0.06465574, 0.020360274, -0.11098842, 0.039607946, 0.15205052, -0.041188046, -0.026513787, -0.016015524, 0.026130088, 0.06341589, 0.07706007, 0.10055258, -0.049063258, -0.030709844, 0.0024071054, 0.037557274, 0.0134173455, 0.04175756, 0.014264862, -0.042392667, 0.02398161, -0.012472425, -0.028243985, 0.05818705, -0.13230394, -0.00079989113, -0.018629689, 0.016585136, -0.066821516, 0.001377592, -0.014958773, 0.024381585, 0.042036515, -0.0039283433, -0.03435791, -0.030376079, -0.053603675, -0.09048265, 0.097804986, 0.0257748, -0.13503976, -0.029972358, -0.0062397476, -0.07655412, -0.028073613, -0.002961969, -0.02508737, 0.0697685, 0.052836604, 0.008967755, 0.019639365, 0.023614386, 0.009734772, -0.07505753, -0.0038914427, 0.11299017, -0.047973536, 0.031216266, 0.01108175, 0.0918755, -0.04206866, 0.055191476, -0.043200813, -1.5496786e-08, 0.023434758, -0.009669568, -0.060132094, -0.03594698, 0.02324335, 0.094348945, 0.020180248, -0.14573996, -0.014434391, -0.022762574, 0.027893221, -0.009096078, -0.06613074, -0.015954193, 0.024300676, -0.013187228, -0.053018987, -0.031342704, -0.039301537, 0.06303767, -0.013777301, -0.016790744, -0.036375914, -0.039216142, -0.02706131, 0.02465825, -0.038558148, 0.12131968, 0.009228927, 0.059315138, 0.061568566, 0.1151985, -0.024498144, -0.02873189, -0.087456346, 0.06311445, 0.018630201, -0.038958035, 0.059134826, -0.003913152, 0.05524228, 0.03242394, -0.0332109, 0.07078217, -0.029175876, -0.010001539, 0.02611168, -0.059106592, -0.04523128, -0.01344679, 0.009532376, -0.0101005845, 0.059452664, 0.01845829, -0.018972963, -0.03161491, -0.027005812, 0.020496355, -0.07044746, 0.06034374, 0.042089693, -0.031130666, 0.09647217, 0.0023168556

This is also the format the embeddings are stored (1 embedding per row) in the vector database. The main difference between what is printed here and what I am familiar with when using Jupyter Notebook/Google Colab is the array isn't wrapped in square brackets and it hasn't been converted to a numpy array format but I don't think that is the issue here...?

0
Enrico Parisi · Mar 27, 2024

I'm no expert in vectors so I might be wrong, however it seems to me that the problem is not in the VECTOR_COSINE() function, instead it seems to me the problem is in the TO_VECTOR() function inside VECTOR_COSINE().

The documentation for the TO_VECTOR() function for the first argument (data) says:

If you are using TO_VECTOR in Dynamic SQL, other data types are accepted. The allowed data types are determined by the SelectMode. If the SelectMode is ODBC or Display, the data argument may be passed in as a DynamicArray, as well as a string. If the SelectMode is Logical, the data argument must be entered as a $vector.

In this case the SelectMode is Logical and it does not seems to me that the TO_VECTOR() first argument is passed as $vector.

0
Kim Trieu  Mar 27, 2024 to Enrico Parisi

Thank you! I changed my sql query such that  both arguments inside the VECTOR_COSINE() function should be $vector. i.e. sql = "select top 3 maxID, activity, outcome from Main.KimAITest order by VECTOR_COSINE(TO_VECTOR(Embeddings, double), TO_VECTOR(:newembeddingstr, double)) desc" I no longer see the same errors as previously mentioned, but now the for loop on the result set is returning an error. 

Loop failed with SQLCODE: -400 ObjectScript error: <ZVOPL>%0AmBs1^%sqlcq.USER.cls151.1

The same for loop works when the sql query does not include the VECTOR_COSINE() function.

0