Replies by Benjamin De Boe for InterSystems Developer Community

Benjamin De Boe · Mar 18, 2024

Please open an iService ticket to request assistance, referencing your AWS account details.

Benjamin De Boe · Jan 25, 2024

Has your Cloud SQL deployment been enabled for external connections? Please see https://docs.intersystems.com/services/csp/docbook/DocBook.UI.Page.cls?K...

Benjamin De Boe · Jan 21, 2024

FYI - the new Vector Search capability, and some other smaller items advertised in the release notes, are not yet part of this first developer preview. We're working hard to make them part of the next drop!

Benjamin De Boe · Jan 16, 2024

Indeed. Please see the bottom paragraph of the post above, and feel free to reach out to me directly if you have any specific questions about your use of the technology.

Benjamin De Boe · Nov 30, 2023

Hi @Iryna Mykhailova , I'm sorry your students had a bad experience setting up a deployment in the preview environment. We have indeed found some glitches along the way, and have prioritized fixing them in the main code branch that's on its way to GA, rather than patching the Developer Access Program environment.

Great to hear, though, that you're thinking of promoting this to your students. When we are GA, it would be great to see them test this out, not least the Cloud IntegratedML piece that's quite unique to IRIS.

Benjamin De Boe · Nov 30, 2023

As @Ben Spead pointed out, we are currently having issues with eu-central-1. In fact, the Developer Access Program should only have shown the us-east-1 region, but at some point while syncing with the portal for our GA cloud services, that option slipped back in. That said, the DAP environment is a preview system and we're getting close to releasing a GA version of InterSystems IRIS Cloud SQL and Cloud IntegratedML, based on feedback and experiences from that preview, including those at the hackathon.

Benjamin De Boe · Nov 29, 2023

Yay, nice work Tim, and exciting collaborations that made this possible!

We're also trying to phrase the individual items in a more "actionable" way so they're easier to check off, rather than just describing what changed. On that aspect as well, your feedback is much appreciated!

Benjamin De Boe · Oct 31, 2023

FYI - we plan a native datatype for vector content and fast similarity functions in 2024.1, with deeper integration planned for the next few releases. Stay tuned...

Benjamin De Boe · Sep 25, 2023

Indeed, indices for vectors are not straightforward at all. Even though our %FunctionalIndex structure allows you to hook into filing and build pretty much any data structure you want, leveraging it in SQL is hard because the corresponding %SQL.AbstractFind is for filtering (in a WHERE clause) and not a good fit for the combination of filtering and ranking that is the common expectation when doing a vector search.

Both the indexing techniques and a proper fit in a relational model are the subject of ongoing academic research. Other vendors such as SingleStore have focused on ensuring the dot product (or other distance function) can be executed very efficiently so they just need to throw a lot of compute at it to make up for the lack of an index.
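
To make the "filtering plus ranking without an index" idea concrete, here is a minimal pure-Python sketch of a brute-force vector search: it applies a filter first and then ranks every remaining row by cosine similarity. The row layout, the `top_k` helper, and the sample data are all hypothetical and only illustrate the general pattern, not how IRIS or any other vendor implements it.

```python
from math import sqrt

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    norm = sqrt(dot(a, a)) * sqrt(dot(b, b))
    return dot(a, b) / norm if norm else 0.0

def top_k(rows, query, k, predicate=lambda row: True):
    """Filter rows (the WHERE part), then rank survivors by similarity (the ORDER BY part)."""
    candidates = [r for r in rows if predicate(r)]
    candidates.sort(key=lambda r: cosine_similarity(r["vec"], query), reverse=True)
    return candidates[:k]

# Hypothetical sample data: every row is scanned, no index involved.
rows = [
    {"id": 1, "category": "a", "vec": [1.0, 0.0]},
    {"id": 2, "category": "a", "vec": [0.7, 0.7]},
    {"id": 3, "category": "b", "vec": [0.9, 0.1]},
    {"id": 4, "category": "a", "vec": [0.0, 1.0]},
]
result = top_k(rows, [1.0, 0.0], k=2, predicate=lambda r: r["category"] == "a")
ids = [r["id"] for r in result]
```

The cost is one similarity computation per candidate row, which is why this approach depends on making the distance function very cheap and throwing compute at it.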

Benjamin De Boe · Sep 18, 2023

Nice to see what you were able to pull together here 👍

FYI - We have an internal research project on making vectors a first-class datatype, leveraging the same internals that columnar storage uses, which will be a better fit than $list (which offers flexibility we don't need here). Hopefully we'll be able to share more details on that later this year!

Benjamin De Boe · Jul 12, 2023

Yes, we are actively looking into this and have just finished a research sprint into the SQL datatype part. There's a lot of work still in having efficient similarity search across large numbers of vectors to actually leverage it in a useful way, so we're on the lookout for good use cases and arguments to justify that development effort.

Benjamin De Boe · Jul 10, 2023

Hi, I don't think CTEs would help here, as you would still need to include the fields required for Condition2-5. I think @Luis Angel Pérez Ramos's suggestion is the right way to go: use a JOIN and then CASE expressions inside the COUNT. If you can provide more details on the two tables and the exact conditions, I'm sure we can help with the actual query you'll need.

CTEs are also mostly there for readability and wouldn't impact query performance by themselves. It's worth checking whether there are any opportunities for indices to speed up the JOIN and Condition1 parts.

All this said, we are planning to add CTE support to IRIS SQL in the near term.
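
As an illustration of the JOIN plus CASE-inside-COUNT pattern suggested above, here is a small sketch that runs against an in-memory SQLite database from Python. The `customers` and `orders` tables, their columns, and the conditions are invented for the example (the original thread did not share the real schema); the region filter stands in for Condition1 and the CASE expressions stand in for the remaining conditions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER,
                         amount REAL, status TEXT);
    INSERT INTO customers VALUES (1, 'EU'), (2, 'EU'), (3, 'US');
    INSERT INTO orders VALUES
        (1, 1, 50.0, 'shipped'),
        (2, 1, 150.0, 'open'),
        (3, 2, 200.0, 'shipped'),
        (4, 3, 75.0, 'shipped');
""")

# COUNT ignores NULLs, so a CASE without ELSE counts only the matching rows.
query = """
    SELECT COUNT(*) AS total,
           COUNT(CASE WHEN o.amount > 100 THEN 1 END) AS large_orders,
           COUNT(CASE WHEN o.status = 'shipped' THEN 1 END) AS shipped_orders
    FROM orders o
    JOIN customers c ON c.id = o.customer_id
    WHERE c.region = 'EU'
"""
total, large_orders, shipped_orders = conn.execute(query).fetchone()
```

All counts come out of a single pass over the joined rows, so the fields each condition needs only have to be available once.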

Benjamin De Boe · Jun 19, 2023

@Alex Woodhead @Renan Lourenco , I was trying to hook this up with a local LLM (GPT4All), but am having no luck. When I just run it by default, the {table_info} it feeds to the prompt makes the prompt too large for my (cheapo?) LLM. But when I try to make it only look in my application's schema by using the corresponding SQLDatabase constructor option, the SQLAlchemy driver tries to run a SET search_path = MySchemaName command, which is not supported and fails as well. Simply taking out the table info means it'll just try without schema names and that doesn't work for me either, unfortunately.

Is this anything you've run into and found a handy workaround for?

Benjamin De Boe · Jun 16, 2023

SQL Server and Sybase both also use TOP semantics rather than LIMIT/OFFSET. I think a decent Large Dialect Model should be able to handle that when passed the proper prompt. ;-)

This said, we have a backlog item to support LIMIT/OFFSET in IRIS SQL as well, but given that TOP is a common pattern as well, it doesn't have a high priority.
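
Until then, a simple trailing LIMIT can be rewritten mechanically on the client side. The sketch below is a deliberately naive, regex-based helper (the name `limit_to_top` is made up for this example); it assumes a single top-level SELECT ending in `LIMIT n` and leaves anything else, including LIMIT combined with OFFSET, untouched.

```python
import re

# Matches a trailing "LIMIT <n>" with an optional semicolon.
_LIMIT_RE = re.compile(r"\s+LIMIT\s+(\d+)\s*;?\s*$", re.IGNORECASE)
# Matches the leading SELECT keyword and an optional DISTINCT.
_SELECT_RE = re.compile(r"^\s*SELECT\s+(DISTINCT\s+)?", re.IGNORECASE)

def limit_to_top(sql):
    """Rewrite 'SELECT ... LIMIT n' as 'SELECT TOP n ...'; return other input unchanged."""
    match = _LIMIT_RE.search(sql)
    if not match:
        return sql
    body = sql[:match.start()]
    n = match.group(1)
    return _SELECT_RE.sub(
        lambda s: f"SELECT {s.group(1) or ''}TOP {n} ", body, count=1
    )

rewritten = limit_to_top("SELECT name FROM t ORDER BY name LIMIT 5")
```

This does not parse SQL, so subqueries, string literals containing "LIMIT", and OFFSET handling would need a real parser or dialect support in the tool generating the query.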

Benjamin De Boe · Apr 23, 2023

This article also has more on which scenarios are the best fit, with a link to a demo repo at the end.

Benjamin De Boe · Apr 18, 2023

Great article Chad!

FWIW, we're working on a faster version of ^%GSIZE (with an API more fitting the current century ;-) ) that uses stochastic sampling similar to the faster table stats gathering introduced in 2021.2? I'll also take the opportunity for a shameless plug of my SQL utilities package, which incorporates much of what's described in this article and will take advantage of that faster global size estimator as soon as it's released.
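
For readers unfamiliar with the general idea of sampling-based size estimation, here is a generic sketch in Python: measure a random subset of blocks and extrapolate to the whole. This is only an illustration of the statistical principle with synthetic data; the function name and the block model are assumptions for the example and do not describe the actual algorithm in the upcoming utility or in table stats gathering.

```python
import random

def estimate_total_size(block_sizes, sample_size, seed=0):
    """Estimate the sum of all block sizes from a uniform random sample of blocks."""
    rng = random.Random(seed)  # fixed seed keeps the example reproducible
    sample = rng.sample(range(len(block_sizes)), sample_size)
    mean = sum(block_sizes[i] for i in sample) / sample_size
    return mean * len(block_sizes)

# Synthetic "blocks" with sizes between 1 and 100 units.
block_sizes = [(i * 37) % 100 + 1 for i in range(100_000)]
exact = sum(block_sizes)  # what a full scan would compute
estimate = estimate_total_size(block_sizes, sample_size=2_000)
relative_error = abs(estimate - exact) / exact
```

Reading 2% of the blocks gives an estimate whose accuracy depends on how much block sizes vary, which is the usual trade-off between speed and precision in this kind of estimator.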

Benjamin De Boe · Apr 13, 2023

Please do NOT make any assumptions about system globals such as the ^odd series. User code should stick to documented APIs such as the Export and Clone options described earlier in the thread.

Benjamin De Boe · Mar 30, 2023

Hi Dmitry, we've recently been working on this function and a bunch of SQL optimizer enhancements to leverage it in query processing. I can't promise an exact release date, but it definitely will be this calendar year.

Benjamin De Boe · Mar 24, 2023

Hi @Lorenzo Scalese , you're spot on with those caveats about stats currently being part of the code. I wrote this article on that subject a while ago, and you'll see at the end that we're planning to move these stats into the extent to live where they belong: with the data rather than the code.

Just one note: while AutoTune tries to make sure you have at least some stats before planning a first query to a new table (when fast block sampling is available), nothing prevents you from gathering table stats again after your table is fully loaded, after a big update, in a post-install script, ...

Benjamin De Boe · Mar 3, 2023

Of course he does! https://github.com/grongierisc/iris-dollar-list#12-usage