Something odd is going on with that new_embedding_str parameter value you're adding: rather than picking up the value, it's picking up the parameter name, hence the "field not found" error. Try removing the * before parameters in your call to exec(), or just inline the parameter value into the statement, as you're not reusing it anyway.
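In case it helps, here's a minimal sketch of both options, assuming you're calling iris.sql.exec() from embedded Python (the statement, table, and values below are placeholders, not your actual code):

```python
import iris

new_embedding_str = "0.12,0.34,0.56"  # placeholder value

# Option 1: pass the value itself as the query parameter (no * unpacking)
iris.sql.exec("UPDATE MyTable SET Embedding = ? WHERE ID = ?", new_embedding_str, 1)

# Option 2: inline the value into the statement, since it isn't reused anyway
iris.sql.exec(f"UPDATE MyTable SET Embedding = '{new_embedding_str}' WHERE ID = 1")
```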

Hi @Vadim Aniskin ,

While putting together a sample of the new feature, I found out that, very unfortunately, this change did not make it into 2024.1.0 after all. It passed our internal testing a few months ago and was promoted through project and integration branches using our existing automation, and as such was added to the draft list of features we wanted to describe in the release notes. However, because of overlapping changes it did not get promoted into the main release branch automatically (as the earlier steps had been) and ended up in a manual queue. That took a little longer than expected, and I did not perform a final check before publishing the draft release notes. To my knowledge, this is the first time a fully greenlit feature has missed out on the automation, but that's no excuse, and we've learned to do a manual check right before release.

In short, this change is now on its way to 2024.2, and the 2024.1 release notes will be updated shortly.

I'm sorry about the confusion this may have caused,
benjamin

Hi @Iryna Mykhailova , I'm sorry your students had a bad experience setting up a deployment in the preview environment. We have indeed found some glitches along the way, and have prioritized fixing them in the main code branch that's on its way to GA, rather than patching the Developer Access Program environment.

Great to hear, though, that you're thinking of promoting this to your students. Once we're GA, it would be great to see them test this out, not least the Cloud IntegratedML piece, which is quite unique to IRIS.

As @Ben Spead pointed out, we are currently having issues with eu-central-1. In fact, the Developer Access Program should only have shown the us-east-1 region, but at some point, in syncing with the portal for our GA cloud services, that option slipped back in. That said, the DAP environment is a preview system, and we're getting close to releasing a GA version of InterSystems IRIS Cloud SQL and Cloud IntegratedML, based on feedback and experiences from that preview, including those at the hackathon.

Indeed, indices for vectors are not straightforward at all. Even though our %FunctionalIndex infrastructure allows you to hook into filing and build pretty much any data structure you want, leveraging it in SQL is hard, because the corresponding %SQL.AbstractFind interface is meant for filtering (in a WHERE clause) and is not a good fit for the combination of filtering and ranking that is the common expectation when doing a vector search.

Both the indexing techniques and a proper fit in a relational model are the subject of ongoing academic research. Other vendors, such as SingleStore, have focused on ensuring the dot product (or other distance function) can be executed very efficiently, so they can just throw a lot of compute at it to make up for the lack of an index.
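Just to illustrate what that brute-force approach boils down to (a plain Python/NumPy sketch with made-up sizes, not how any particular vendor implements it):

```python
import numpy as np

# Brute-force "no index" vector search: score every row, then pick the best.
vectors = np.random.rand(100_000, 384).astype(np.float32)  # one embedding per row (made up)
query = np.random.rand(384).astype(np.float32)              # the search embedding

scores = vectors @ query                  # one dot product per row
top10 = np.argsort(scores)[-10:][::-1]    # indices of the 10 closest matches
print(top10, scores[top10])
```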

Hi, I don't think CTEs would help here, as you'd still need to include the fields required for Condition2-5. I think @Luis Angel Pérez Ramos' suggestion is the right way to go: use a JOIN and then CASE expressions inside the COUNT, along the lines of the sketch below. If you can provide more details on the two tables and the exact conditions, I'm sure we can help with the actual query you'll need.
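Purely to show the shape of that query (table names, columns, and conditions below are made up, and I've wrapped it in an embedded Python call just for illustration):

```python
import iris

# Hypothetical tables and conditions, just to show JOIN + CASE inside COUNT.
# COUNT ignores NULLs, so a CASE without an ELSE only counts matching rows.
rs = iris.sql.exec("""
    SELECT a.Category,
           COUNT(CASE WHEN b.Status = 'open'   THEN 1 END) AS Condition2Count,
           COUNT(CASE WHEN b.Status = 'closed' THEN 1 END) AS Condition3Count
      FROM Table1 a
      JOIN Table2 b ON b.Table1ID = a.ID
     WHERE a.Active = 1                       -- Condition1
     GROUP BY a.Category
""")
for row in rs:
    print(row)
```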

CTEs are also mostly there for readability and wouldn't impact query performance by themselves. It's worth checking whether there are opportunities for indices to speed up the JOIN and the Condition1 parts.

All this said, we are planning to add CTE support to IRIS SQL in the near term.

@Alex Woodhead @Renan Lourenco , I was trying to hook this up with a local LLM (GPT4All), but am having no luck. When I just run it by default, the {table_info} it feeds to the prompt makes the prompt too large for my (cheapo?) LLM. But when I try to make it only look in my application's schema by using the corresponding SQLDatabase constructor option, the SQLAlchemy driver tries to run a SET search_path = MySchemaName command, which is not supported and fails as well. Simply taking out the table info means it'll just try without schema names and that doesn't work for me either, unfortunately.
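For reference, this is roughly the shape of the setup (connection details and schema name are placeholders):

```python
from langchain_community.utilities import SQLDatabase

# Default: all tables visible, so the {table_info} fed to the prompt gets too large
db = SQLDatabase.from_uri("iris://user:pwd@localhost:1972/USER")

# Restricting to my application's schema: the driver then tries to run
# SET search_path = MySchemaName, which IRIS SQL does not support
db = SQLDatabase.from_uri("iris://user:pwd@localhost:1972/USER", schema="MySchemaName")
```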

Is this anything you've run into and found a handy workaround for?

Great article, Chad!

FWIW, we're working on a faster version of ^%GSIZE (with an API more fitting the current century ;-) ) that uses stochastic sampling, similar to the faster table statistics gathering introduced in 2021.2. I'll also take the opportunity for a shameless plug of my SQL utilities package, which incorporates much of what's described in this article and will take advantage of that faster global size estimator as soon as it's released.