Hi @Iryna Mykhailova , I'm sorry your students had a bad experience setting up a deployment in the preview environment. We have indeed found some glitches along the way, and have prioritized fixing them in the main code branch that's on its way to GA, rather than patching the Developer Access Program environment.

Great to hear though you're thinking of promoting this to your students. When we are GA, it would be great to see them test this out, not least the Cloud IntegratedML piece that's quite unique to IRIS.

as @Ben Spead pointed out, we are currently having issues with eu-central-1. In fact, the Developer Access Program should only have shown the us-east-1 region but at some point in syncing with the portal for our GA cloud services that option slipped back in. This said, the DAP environment is a preview system and we're getting close to releasing a GA version of InterSystems IRIS Cloud SQL and Cloud IntegratedML, based on feedback and experiences from that preview, including those at the hackathon.

indeed, indices for vectors are not straightforward at all. Even though our %FunctionalIndex structure allows you to hook into filing and build pretty much any data structure you want, leveraging it in SQL is hard because the corresponding %SQL.AbstractFind is for filtering (in a WHERE clause) and not a good fit for the combination of filtering and ranking that is the common expectation when doing a vector search.

Both the indexing techniques and a proper fit in a relational model are the subject of ongoing academic research. Other vendors such as SingleStore have focused on ensuring the dot product (or other distance function) can be executed very efficiently so they just need to throw a lot of compute at it to make up for the lack of an index.

Hi, I don't think CTEs would help here, as you still should make sure to include the fields required for the Condition2-5.  I think @Luis Angel Pérez Ramos suggestion is the right way to go, using a JOIN and then CASE statements in the COUNT. If you can provide more details on the two tables and exact conditions, I'm sure we can help with the actual query you'll need.

CTEs are also mostly there for readability and wouldn't impact query performance by themselves. It's worth checking out if there aren't any opportunities for indices to speed up the JOIN and Condition1 parts.

All this said, we are planning to add CTE support to IRIS SQL in the near term.

@Alex Woodhead @Renan Lourenco , I was trying to hook this up with a local LLM (GPT4All), but am having no luck. When I just run it by default, the {table_info} it feeds to the prompt makes the prompt too large for my (cheapo?) LLM. But when I try to make it only look in my application's schema by using the corresponding SQLDatabase constructor option, the SQLAlchemy driver tries to run a SET search_path = MySchemaName command, which is not supported and fails as well. Simply taking out the table info means it'll just try without schema names and that doesn't work for me either, unfortunately.

Is this anything you've run into and found a handy workaround for?

Great article Chad!

FWIW, we're working on a faster version of ^%GSIZE (with an API more fitting the current century ;-) ) that uses stochastic sampling similar to the faster table stats gathering introduced in 2021.2? I'll also take the opportunity for a shameless plug of my SQL utilities package, which incorporates much of what's described in this article and will take advantage of that faster global size estimator as soon as it's released.

Hi @Lorenzo Scalese , you're spot on with those caveats about stats currently being part of the code. I wrote this article on that subject a while ago, and you'll see at the end that we're planning to move these stats into the extent to live where they belong: with the data rather than the code. 

Just one note: while AutoTune tries to make sure you have at least some stats before planning a first query to a new table (when fast block sampling is available), nothing prevents you from gathering table stats again after your table is fully loaded, after a big update, in a post-install script, ...