Question
· Mar 26

VECTOR_COSINE() error help ObjectScript error: <PYTHON EXCEPTION> *<class 'OSError'>: isc_stdout_write: PyArg_ParseTuple failed!

Using VECTOR_COSINE() in SQL query to perform a text similarity search on existing embeddings in a %VECTOR column.

Code is below.

Commented out sql query returns this error: SQLCODE: -29  Field 'NEW_EMBEDDING_STR' not found in the applicable tables^ SELECT TOP ? maxID , activity , outcome FROMMain .AITest ORDER BY VECTOR_COSINE ( new_embedding_str ,

Sql query as written returns ERROR #5002: ObjectScript error: <PYTHON EXCEPTION> *<class 'OSError'>: isc_stdout_write: PyArg_ParseTuple failed!

Table column properties:

Property MaxID As %Integer;

Property Activity As %String(MAXLEN = "");

Property Outcome As %String(MAXLEN = "");

Property Embeddings As %Vector;

Property Total As %String;

Property WorkflowIDs As %String;
ClassMethod GetSimilarity(newWorkflow As %String) [ Language = python, SqlProc ]
{
    import iris

    try:
        libraries = iris.cls('Main.AITest').ImportLibraries()
        if libraries:
            SentenceTransformer = libraries[0]
            model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')
        else:
            print("Error: No libraries found.")
            return None

        new_embedding = model.encode(newWorkflow)
        new_embedding_str = ', '.join(map(str, new_embedding))

        sql = "select maxID, activity, outcome from Main.AITest order by VECTOR_COSINE(Embeddings, TO_VECTOR(?, double))"
        # Also tried sql = "select maxID, activity, outcome from Main.AITest order by VECTOR_COSINE(?, TO_VECTOR(Embeddings,         double))"
        params = [new_embedding_str]
        rs = iris.sql.exec(sql, *params)

        if rs.ResultSet._SQLCODE < 0:
            print(f"Loop failed with SQLCODE: {rs.ResultSet._SQLCODE} {rs.ResultSet._Message}")
        elif rs.ResultSet._SQLCODE == 100:
            print("Finished")

    except Exception as e:
            print(f"SQLCODE: {e.sqlcode} {e.message}")
        else:
            print(e)
}
ClassMethod UploadEmbedding(embedding, maxID As %Integer) [ Language = python, SqlProc ]
{
    import iris
    try:
        sql = "UPDATE Main.AITest SET embeddings = (TO_VECTOR(?,double)) WHERE MaxID = ?"
        params = [embedding, maxID]
        iris.sql.exec(sql, *params)

    except Exception as e:
            print(f"SQLCODE: {e.sqlcode} {e.message}")
        else:
            print(e)
}
ClassMethod CreateEmbeddings() [ Language = python, SqlProc ]
{
    import iris
    try:
        libraries = iris.cls('Main.AITest').ImportLibraries()

        if libraries:
            SentenceTransformer = libraries[0]
            model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')

        else:
            print("Error: No libraries found.")
            return None

        select = "select activity, outcome, maxID from Main.AITest"
        where = ""
        condition1 = condition2 = ""
        orderby = ""
        sql = select + " " + where + " " + condition1 + " " + condition2 + " " + orderby

        rs = iris.sql.exec(sql)

        for idx, row in enumerate(rs):

            workflow = row[0] + " " + row[1]
            maxID = row[2]
            embedding = model.encode(workflow)
            embedding_string = ', '.join(map(str, embedding))

            iris.cls('Main.AITest').UploadEmbedding(embedding_string, maxID)

        if rs.ResultSet._SQLCODE < 0:
            print(f"Loop failed with SQLCODE: {rs.ResultSet._SQLCODE} {rs.ResultSet._Message}")
        elif rs.ResultSet._SQLCODE == 100:
            print("Finished")

    except Exception as e:
            print(f"SQLCODE: {e.sqlcode} {e.message}")

        else:
            print(e)
}
Product version: IRIS 2023.3
$ZV: IRIS 2024.1.0.263.0
Discussion (10)2
Log in or sign up to continue

Thanks, I don't seem to have any issues with TO_VECTOR() and adding the embedding(s) to the %vector column as long as I add the comma as a separator in the array. The issue seems to be with using VECTOR_COSINE() and VECTOR_DOT_PRODUCT() in the sql query. Is there any other way of retrieving the embeddings from the %vector column to calculate the cosine similarity?

something odd is going on with that new_embedding_str parameter value you're adding. Rather than taking the value, it's taking the parameter name, hence the "field not found" error. Try removing that * before parameters in your call to exec(), or just inline the parameter value into the statement as you're not reusing it anyway.

Thanks for responding, I tried your suggestions, unfortunately I still get a sql error.

"SQLCODE: -29  Field 'NEW_EMBEDDING_STR' not found in the applicable tables^ SELECT TOP ? maxID , activity , outcome FROM Main . AITest ORDER BY VECTOR_COSINE ( Embeddings , new_embedding_str )
 Field 'NEW_EMBEDDING_STR' not found in the applicable tables^ SELECT TOP ? maxID , activity , outcome FROM Main . AITest ORDER BY VECTOR_COSINE ( Embeddings , new_embedding_str )"

By SQL priniciple:     
any straight name is meant to be a COLUMN of the actual table.  eg. Embeddings
but an external HostVariable is designated by an initial : (colon)  e.g.  : new_embedding_str or :params

This seems to be the deeper cause your problem

in addition I'd suggest to add  print(new_embedding_str)  to verify its format. (one of my problems.

Furthermore, take a closer look to this example:
iris-vector-search/blob/main/demo/sql_demo.ipynb

Thanks! Here is the result of print(new_embedding_str): -0.012367633, 0.08229666, -0.045526657, -0.021790344, -0.09783857, 0.011233694, 0.11478377, 0.015476992, 0.066593476, -0.028646931, -0.00035714556, -0.020890323, -0.014966787, -0.009651463, -0.04823198, 0.02970142, -0.03654133, -0.0733902, -0.077465616, -0.007498019, 0.007159351, 0.115939245, -0.053096283, -0.0019092769, 0.03061681, 0.060498957, -0.0109981615, 0.05554405, 0.06436163, 0.0010217681, 0.008984506, -0.01909181, 0.049410313, -0.0046289866, -0.0002824377, 0.043872904, -0.03933238, 0.0070418958, 0.04450893, 0.0057665105, 0.029787099, -0.106896654, -0.010929529, 0.019587921, -0.031226052, 0.054222684, -0.053108577, 0.023915643, 0.016592344, -0.045376964, 0.056992017, -0.074442, -0.0842688, -0.031837333, 0.06094658, 0.069696166, 0.016502133, 0.006163614, 0.042928968, 0.059463773, 0.051030602, -0.0009090814, -0.018656455, 0.050873574, 0.04656456, -0.04863924, 0.0071582315, -0.05078049, -0.008812764, -0.09778179, -0.028910201, 0.0030154034, -0.007890963, 0.09406418, 0.012195868, -0.0032306092, 0.029213682, -0.08315854, 0.099823065, 0.14098983, -0.034719538, -0.069456525, -0.01513024, 0.042170536, -0.013932263, 0.06428065, 0.018271074, -0.0077171763, -0.03326928, -0.018273575, -0.05889311, -0.039047785, 0.031104289, -0.008865226, -0.035700947, 0.039564047, -0.060396545, 0.009985891, -0.07055272, 0.1288366, 0.01864772, 0.0060013114, 0.066211775, -0.021810597, -0.07669776, -0.049433775, 0.010740815, 0.035635687, 0.053287655, -0.068942666, -0.056721188, -0.02227402, -0.040967852, -0.0016006116, -0.061532926, -0.033666585, -0.056064934, 0.03403046, 0.046199817, 0.06698784, 0.060698483, -0.058410134, -0.07579008, -0.018291337, -0.06771481, -0.058569603, 0.033723082, -7.52444e-33, 0.009717529, 0.031346004, 0.07498145, -0.06071525, 0.0041581895, 0.0024989324, -0.10493763, 0.022117518, -0.015239807, -0.048102356, -0.043640785, -0.06609356, 0.007661355, -0.017238742, 0.06961938, -0.061138105, 0.050104957, 0.04784467, 0.0031345962, -0.020252144, -0.06554345, 0.1011566, 0.04156833, -0.10441905, -0.05407763, -0.037749078, 0.009589713, -0.046128895, 0.05732413, 0.022369683, 0.009047337, -0.052184585, 0.003846389, -0.015119601, 0.01722195, -0.040975425, 0.034113463, -0.03093567, -0.006747672, 0.0008930639, -0.015441975, -0.06361249, 0.0036032437, 0.03340732, 0.03015114, 0.033482574, 0.0052873297, -0.0019037911, -0.06700053, -0.020265548, 0.044921134, 0.0075110453, 0.043736465, 0.02989835, -0.001148476, -0.0101042045, 0.049161088, 0.024709491, -0.09946261, 0.0063731205, -0.030562524, 0.061343204, 0.030892957, 0.018738467, -0.026431091, 0.0060403524, 0.045076493, -0.032179367, 0.0555283, 0.025604432, -0.08719626, 0.012434642, 0.023987422, 0.034689777, -0.049034715, -0.055082064, -0.0054916176, 0.011287644, 0.027124986, 0.017873017, -0.103416376, -0.089201406, 0.020051159, 0.038698107, -0.022287738, 0.008249157, -0.029930592, -0.1644036, 0.028224042, -0.019334221, -0.08657236, -0.05819561, -0.09543146, -0.023918334, 0.077383414, 3.4810153e-33, 0.020336987, 0.05965215, -0.069160566, 0.04744932, 0.10784883, -0.0018692117, 0.046611253, 0.061773244, -0.061153416, 0.051496595, -0.0023423578, 0.028044755, 0.020787673, -0.012962601, 0.10118507, 0.05530823, 0.014362301, 0.030259274, 0.035992704, -0.021907462, -0.03618004, 0.036039226, -0.03216552, -0.0009927913, -0.01762192, 0.059019443, 0.092808425, -0.0886788, 0.02172343, 0.05927173, -0.05763399, -0.058060993, 0.06465574, 0.020360274, -0.11098842, 0.039607946, 0.15205052, -0.041188046, -0.026513787, -0.016015524, 0.026130088, 0.06341589, 0.07706007, 0.10055258, -0.049063258, -0.030709844, 0.0024071054, 0.037557274, 0.0134173455, 0.04175756, 0.014264862, -0.042392667, 0.02398161, -0.012472425, -0.028243985, 0.05818705, -0.13230394, -0.00079989113, -0.018629689, 0.016585136, -0.066821516, 0.001377592, -0.014958773, 0.024381585, 0.042036515, -0.0039283433, -0.03435791, -0.030376079, -0.053603675, -0.09048265, 0.097804986, 0.0257748, -0.13503976, -0.029972358, -0.0062397476, -0.07655412, -0.028073613, -0.002961969, -0.02508737, 0.0697685, 0.052836604, 0.008967755, 0.019639365, 0.023614386, 0.009734772, -0.07505753, -0.0038914427, 0.11299017, -0.047973536, 0.031216266, 0.01108175, 0.0918755, -0.04206866, 0.055191476, -0.043200813, -1.5496786e-08, 0.023434758, -0.009669568, -0.060132094, -0.03594698, 0.02324335, 0.094348945, 0.020180248, -0.14573996, -0.014434391, -0.022762574, 0.027893221, -0.009096078, -0.06613074, -0.015954193, 0.024300676, -0.013187228, -0.053018987, -0.031342704, -0.039301537, 0.06303767, -0.013777301, -0.016790744, -0.036375914, -0.039216142, -0.02706131, 0.02465825, -0.038558148, 0.12131968, 0.009228927, 0.059315138, 0.061568566, 0.1151985, -0.024498144, -0.02873189, -0.087456346, 0.06311445, 0.018630201, -0.038958035, 0.059134826, -0.003913152, 0.05524228, 0.03242394, -0.0332109, 0.07078217, -0.029175876, -0.010001539, 0.02611168, -0.059106592, -0.04523128, -0.01344679, 0.009532376, -0.0101005845, 0.059452664, 0.01845829, -0.018972963, -0.03161491, -0.027005812, 0.020496355, -0.07044746, 0.06034374, 0.042089693, -0.031130666, 0.09647217, 0.0023168556

This is also the format the embeddings are stored (1 embedding per row) in the vector database. The main difference between what is printed here and what I am familiar with when using Jupyter Notebook/Google Colab is the array isn't wrapped in square brackets and it hasn't been converted to a numpy array format but I don't think that is the issue here...?

I'm no expert in vectors so I might be wrong, however it seems to me that the problem is not in the VECTOR_COSINE() function, instead it seems to me the problem is in the TO_VECTOR() function inside VECTOR_COSINE().

The documentation for the TO_VECTOR() function for the first argument (data) says:

If you are using TO_VECTOR in Dynamic SQL, other data types are accepted. The allowed data types are determined by the SelectMode. If the SelectMode is ODBC or Display, the data argument may be passed in as a DynamicArray, as well as a string. If the SelectMode is Logical, the data argument must be entered as a $vector.

In this case the SelectMode is Logical and it does not seems to me that the TO_VECTOR() first argument is passed as $vector.

Thank you! I changed my sql query such that  both arguments inside the VECTOR_COSINE() function should be $vector. i.e. sql = "select top 3 maxID, activity, outcome from Main.KimAITest order by VECTOR_COSINE(TO_VECTOR(Embeddings, double), TO_VECTOR(:newembeddingstr, double)) desc" I no longer see the same errors as previously mentioned, but now the for loop on the result set is returning an error. 

Loop failed with SQLCODE: -400 ObjectScript error: <ZVOPL>%0AmBs1^%sqlcq.USER.cls151.1

The same for loop works when the sql query does not include the VECTOR_COSINE() function.