Hi @Iryna Mykhailova , I'm sorry your students had a bad experience setting up a deployment in the preview environment. We have indeed found some glitches along the way, and have prioritized fixing them in the main code branch that's on its way to GA, rather than patching the Developer Access Program environment.

Great to hear, though, that you're thinking of promoting this to your students. Once we're GA, it would be great to see them test this out, not least the Cloud IntegratedML piece, which is quite unique to IRIS.

As @Ben Spead pointed out, we are currently having issues with eu-central-1. In fact, the Developer Access Program should only have shown the us-east-1 region, but at some point while syncing with the portal for our GA cloud services that option slipped back in. That said, the DAP environment is a preview system, and we're getting close to releasing a GA version of InterSystems IRIS Cloud SQL and Cloud IntegratedML based on feedback and experiences from that preview, including those at the hackathon.

Indeed, indices for vectors are not straightforward at all. Even though our %FunctionalIndex structure lets you hook into filing and build pretty much any data structure you want, leveraging it in SQL is hard because the corresponding %SQL.AbstractFind interface is meant for filtering (in a WHERE clause) and is not a good fit for the combination of filtering and ranking that is the common expectation when doing a vector search.
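To make that filtering-versus-ranking mismatch concrete, here's a minimal sketch in plain Python with SQLite (not IRIS; table and column names are made up): a WHERE clause can only decide which rows qualify, while a vector search also needs to order the qualifying rows by similarity.

```python
import json
import sqlite3

# Toy setup, not IRIS: vectors stored as JSON text in a plain table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, category TEXT, embedding TEXT)")
conn.executemany(
    "INSERT INTO docs VALUES (?, ?, ?)",
    [
        (1, "news", json.dumps([1.0, 0.0])),
        (2, "blog", json.dumps([0.6, 0.8])),
        (3, "news", json.dumps([0.0, 1.0])),
    ],
)

def dot(a_json, b_json):
    a, b = json.loads(a_json), json.loads(b_json)
    return sum(x * y for x, y in zip(a, b))

conn.create_function("dot", 2, dot)
query = json.dumps([1.0, 0.0])

# Filtering alone (what an AbstractFind-style index accelerates) just says
# which rows qualify...
filtered = conn.execute("SELECT id FROM docs WHERE category = 'news'").fetchall()

# ...but a vector search also needs to RANK the qualifying rows by similarity.
top = conn.execute(
    "SELECT id FROM docs WHERE category = 'news' "
    "ORDER BY dot(embedding, ?) DESC LIMIT 1",
    (query,),
).fetchall()
print(filtered, top)
```

An index built for the WHERE step alone can't help with that ORDER BY ... LIMIT part, which is exactly the awkward fit described above.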

Both the indexing techniques and a proper fit in a relational model are the subject of ongoing academic research. Other vendors such as SingleStore have focused on ensuring the dot product (or other distance function) can be executed very efficiently so they just need to throw a lot of compute at it to make up for the lack of an index.
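A quick sketch of that brute-force alternative (pure Python, names hypothetical): with no index at all, you simply score every vector against the query and keep the top k, so the only thing that matters is how fast the distance function runs.

```python
import heapq

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def brute_force_search(vectors, query, k):
    """Exact top-k by scanning every vector: no index, just raw compute."""
    return heapq.nlargest(k, vectors, key=lambda item: dot(item[1], query))

# Hypothetical embeddings
vectors = [("a", [1.0, 0.0]), ("b", [0.6, 0.8]), ("c", [0.0, 1.0])]
top2 = brute_force_search(vectors, [0.9, 0.4], k=2)
print([name for name, _ in top2])
```

The result is exact (no approximate-index recall tradeoff), at the cost of a full scan per query, which is why that approach leans so heavily on a fast dot product.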

Hi, I don't think CTEs would help here, as you'd still need to make sure to include the fields required for Conditions 2 through 5. I think the suggestion from @Luis Angel Pérez Ramos is the right way to go: a JOIN and then CASE expressions inside the COUNT. If you can provide more details on the two tables and the exact conditions, I'm sure we can help with the actual query you'll need.
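To illustrate the pattern, here's a minimal sketch using SQLite with made-up tables and conditions; the COUNT(CASE WHEN ... THEN 1 END) shape should carry over to your actual query.

```python
import sqlite3

# Toy stand-ins for the two tables and the conditions (all names hypothetical).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL);
CREATE TABLE payments (order_id INTEGER, status TEXT);
INSERT INTO orders VALUES (1,'A',50),(2,'A',150),(3,'B',20);
INSERT INTO payments VALUES (1,'paid'),(2,'open'),(3,'paid');
""")

# One JOIN, then one conditional count per condition: the CASE yields NULL
# when the condition fails, and COUNT() ignores NULLs.
rows = conn.execute("""
    SELECT o.customer,
           COUNT(*)                                      AS total,
           COUNT(CASE WHEN o.amount > 100 THEN 1 END)    AS big_orders,
           COUNT(CASE WHEN p.status = 'paid' THEN 1 END) AS paid_orders
      FROM orders o
      JOIN payments p ON p.order_id = o.id
     GROUP BY o.customer
     ORDER BY o.customer
""").fetchall()
print(rows)
```

The key point is that a single JOIN feeds all the conditional counts at once, rather than one subquery per condition.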

CTEs are mostly there for readability anyway and wouldn't impact query performance by themselves. It's worth checking whether there are any opportunities for indices to speed up the JOIN and Condition1 parts, though.

All this said, we are planning to add CTE support to IRIS SQL in the near term.

@Alex Woodhead @Renan Lourenco , I was trying to hook this up with a local LLM (GPT4All), but am having no luck. When I just run it by default, the {table_info} it feeds to the prompt makes the prompt too large for my (cheapo?) LLM. But when I try to make it only look in my application's schema by using the corresponding SQLDatabase constructor option, the SQLAlchemy driver tries to run a SET search_path = MySchemaName command, which is not supported and fails as well. Simply taking out the table info means it'll just try without schema names and that doesn't work for me either, unfortunately.

Is this anything you've run into and found a handy workaround for?

Great article Chad!

FWIW, we're working on a faster version of ^%GSIZE (with an API more fitting the current century ;-) ) that uses stochastic sampling similar to the faster table stats gathering introduced in 2021.2. I'll also take the opportunity for a shameless plug of my SQL utilities package, which incorporates much of what's described in this article and will take advantage of that faster global size estimator as soon as it's released.
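For the curious, the stochastic-sampling idea itself is simple; here's a toy Python sketch (my own illustration, not the actual ^%GSIZE algorithm): sample a small fraction of nodes and extrapolate.

```python
import random

def estimate_total_size(node_sizes, sample_frac=0.05, seed=42):
    """Estimate sum(node_sizes) by measuring a small random sample and
    extrapolating. Illustrates the stochastic-sampling idea only."""
    rng = random.Random(seed)
    n = max(1, int(len(node_sizes) * sample_frac))
    sample = rng.sample(node_sizes, n)
    return len(node_sizes) * sum(sample) / n

# Synthetic "global node" sizes: 100k nodes of 1 to 100 bytes each
sizes = [(i * 7919) % 100 + 1 for i in range(100_000)]
estimate = estimate_total_size(sizes)
print(f"estimated {estimate:,.0f} vs exact {sum(sizes):,}")
```

Measuring 5% of the nodes gets you within a few percent of the exact total, which is the same tradeoff the faster table stats gathering makes.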

Hi @Lorenzo Scalese , you're spot on with those caveats about stats currently being part of the code. I wrote this article on that subject a while ago, and you'll see at the end that we're planning to move these stats into the extent to live where they belong: with the data rather than the code. 

Just one note: while AutoTune tries to make sure you have at least some stats before planning a first query to a new table (when fast block sampling is available), nothing prevents you from gathering table stats again after your table is fully loaded, after a big update, in a post-install script, ...
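For example, re-gathering stats after a bulk load is a one-liner in IRIS SQL (table name hypothetical):

```sql
-- Re-gather table statistics after loading / a big update
TUNE TABLE MySchema.MyTable
```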

Hi @Pravin Barton , that's actually a great question and I was wondering whether or not to add a note about that in the article, but then thought it might distract from the SQL focus.

IRIS BI indeed pre-computes aggregates and builds an array of bitmap indices for lightning-fast filtering and partitioning of MDX query results. MDX offers a syntax that lends itself well to analytics, with queries natively expressed in terms of aggregates rather than rows, so it's targeting the same kinds of analytical questions as Columnar Storage. IRIS BI comes with a UI surface as well: Analyzer for ad-hoc querying (you could say it's a visual MDX query builder) and a dashboard component that is meant to be easily embeddable in your applications. For SQL, there are plenty of third-party BI tools offering eye candy of a sweetness your dentist couldn't possibly approve.

I'd say the main difference between the two, ignoring that UI aspect for a second, is that Columnar Storage applies more upstream, acting as the storage model for the master copy of your data, whereas IRIS BI is by definition an analytics-optimized copy of that data. If your database exists for the purpose of analytics (e.g. a data warehouse), you could just as well have the master copy use columnar storage outright and enjoy the flexibility of SQL on the query / UI side. If you're doing analytics on operational data, the application schema may be a little more normalized and not a great fit for Columnar Storage or direct querying, which is where IRIS BI offers an easy way to create an analytical model with measures and dimensions, and populate it automatically using cube build / sync (those are just buttons, not ETL flows you have to code!).

Two more points that'll hopefully not add too much confusion:

We'll be looking into ways IRIS BI can take advantage of some of the foundational elements of Columnar Storage, namely the new $vector internal datatype we introduced. We expect it'll be a great fit for cube measures, as it'll work well with the existing bitmaps we build for dimensions. The R&D for that is planned for later this year, after the SQL side reaches production readiness in 2023.1, and after that we'll happily brag about the gains (and document the tradeoffs) here :-).

Columnar Indices are secondary storage for table data where the primary storage uses row format. That makes them very similar in purpose to IRIS BI in the way I described it earlier, so when to use what?

  • First, there is the general schema layout aspect: if your application schema is in full third normal form, the columnar index's benefits may not outweigh the cost of a five-way JOIN to get the relevant data together. The IRIS BI cube infrastructure is better suited to deal with that.
  • A table's indices are maintained with every INSERT, UPDATE or DELETE statement, which means the overhead to build your analytical-query-accelerator now needs to be paid when you're filing rows in the table, rather than when you press the build / sync button on your cube (which could be in a trigger too, of course). 
  • But probably the most important question is how you'd like to query it. MDX can look a bit daunting at first, but the IRIS BI infrastructure and its embeddability have been used successfully by many of our customers. SQL, on the other hand, is more widely understood by data analysts and their client BI tools.

In any case, the proof of the pudding is in the eating. Adding a columnar index and giving your queries a quick try is really simple, and your InterSystems account team (in your case we're one another's account team ;-) ) is happy to help with that.
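For reference, and if I recall the 2023.1 syntax correctly, adding that columnar index is a single line in the class definition (class and property names here are hypothetical):

```objectscript
/// Hypothetical table; the columnar index is just the one Index line
Class MyApp.Transactions Extends %Persistent
{
Property Region As %String;
Property Amount As %Numeric;

/// Secondary columnar storage over Amount to accelerate analytical queries
Index AmountIdx On Amount [ Type = columnar ];
}
```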

%SYS.Audit already projects to each namespace, but the data lives in a privileged database and requires the %Admin_Secure:USE privilege, so it's better not to try to circumvent or confuse that through mappings.

I would not create additional temp tables or mappings to try and cling to SQL, as it is meant to run within a single namespace as a query language. This sounds more like a procedural job for ObjectScript: a stored procedure that loops through the Production namespaces, retrieves all config items into a local variable, and then uses that for querying the audit table after switching to %SYS sounds more appropriate. Rather than a PPG, I'd just collect it in the qHandle local variable passed between the Execute() and Fetch() methods of your custom query, unless it risks getting too big.
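Sketched out, that custom-query pattern could look roughly like this (all class, query, and ROWSPEC names are hypothetical; error handling and the actual namespace/audit plumbing are omitted):

```objectscript
/// Sketch only -- not a tested implementation.
Class MyApp.ConfigAudit Extends %RegisteredObject
{

Query ByConfigItem() As %Query(ROWSPEC = "Namespace:%String,ConfigItem:%String,AuditText:%String") [ SqlProc ]

ClassMethod ByConfigItemExecute(ByRef qHandle As %Binary) As %Status
{
    // Collect config items per Production namespace into qHandle,
    // e.g. qHandle("items", n) = itemName -- no PPG needed.
    Set qHandle("count") = 0
    // Then switch to %SYS (requires %Admin_Secure:USE) and open a result
    // set over %SYS.Audit filtered by those items, stashing it in qHandle("rs").
    Quit $$$OK
}

ClassMethod ByConfigItemFetch(ByRef qHandle As %Binary, ByRef Row As %List, ByRef AtEnd As %Integer = 0) As %Status
{
    // Pull the next row from qHandle("rs") into Row; set AtEnd = 1 when done.
    Quit $$$OK
}

ClassMethod ByConfigItemClose(ByRef qHandle As %Binary) As %Status
{
    // Clean up whatever was stashed in qHandle.
    Quit $$$OK
}

}
```

Everything the Fetch() calls need travels in qHandle, so the whole thing stays a plain SQL-callable procedure without any cross-namespace mappings.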