Replies by Benjamin De Boe for InterSystems Developer Community

another question to ask yourself is whether you really want that CSV file data to be in IRIS tables. If it's rarely accessed, maybe projecting it as a foreign table is more cost-effective?

Benjamin De Boe · Apr 3

Nice article!

as a complement to the section on Tune Table, I'd like to refer to this article I wrote about a bit of automation we put in the product in 2021.2 (that we intend to enhance this year), and also this one on caveats wrt packaging statistics

Benjamin De Boe · Mar 27

something odd is going on with that new_embedding_str parameter value you're adding. Rather than taking the value, it's taking the parameter name, hence the "field not found" error. Try removing that * before parameters in your call to exec(), or just inline the parameter value into the statement as you're not reusing it anyway.
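To illustrate what is probably happening, here is a minimal Python sketch. It assumes the parameters object is a dict keyed by parameter name, which the original post does not confirm; fake_exec is a hypothetical stand-in for the driver's exec(), and the statement text is made up. Unpacking a dict with * iterates over its keys, so the parameter name is passed along instead of the value.

```python
def fake_exec(statement, *args):
    """Hypothetical stand-in for a driver's exec(): returns the positional
    parameters it received so we can inspect them."""
    return list(args)


# Assumption: the parameters were collected in a dict keyed by name.
parameters = {"new_embedding_str": "0.1,0.2,0.3"}

# Hypothetical statement, for illustration only.
statement = "UPDATE demo.items SET embedding = ? WHERE id = 1"

# Unpacking a dict with * yields its keys, so the *name* is passed.
unpacked = fake_exec(statement, *parameters)

# Passing the value explicitly hands over the actual string.
explicit = fake_exec(statement, parameters["new_embedding_str"])

print(unpacked)
print(explicit)
```

If the parameters are held in a list or tuple instead, unpacking with * passes the values themselves, so the exact fix depends on how the driver's exec() expects to receive them.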

Benjamin De Boe · Mar 18

Hi @Vadim Aniskin ,

while putting together a sample of the new feature, I found out that, very unfortunately, this change did not make it into 2024.1.0 after all. It passed our internal testing a few months ago and was promoted through project and integration branches using our existing automation, and as such was added to the draft list of features we wanted to describe in the release notes. However, because of overlapping changes it did not get promoted into the main release branch automatically (as it had been in those earlier steps), and ended up in a manual queue. That took a little longer than expected, and I did not perform a final check before publishing the draft release notes. To my knowledge, this is the first time we had a fully-greenlit feature miss out on the automation, but that's no excuse and we've learned to do a manual check right before release.

In short, this change is now on its way to 2024.2, and the 2024.1 release notes will be updated shortly.

I'm sorry about the confusion this may have caused,
benjamin

Benjamin De Boe · Mar 18

Please open an iService ticket to request assistance, referencing your AWS account details.

Benjamin De Boe · Jan 25

has your Cloud SQL deployment been enabled for external connections? Please see https://docs.intersystems.com/services/csp/docbook/DocBook.UI.Page.cls?K...

Benjamin De Boe · Jan 21

FYI - the new Vector Search capability, and some other smaller items advertised in the release notes, are not yet part of this first developer preview. We're working hard to make them part of the next drop!

Benjamin De Boe · Jan 16

indeed. please see the bottom paragraph of the post above, and feel free to reach out to me directly if you have any specific questions about your use of the technology.

Benjamin De Boe · Nov 30, 2023

Hi @Iryna Mykhailova , I'm sorry your students had a bad experience setting up a deployment in the preview environment. We have indeed found some glitches along the way, and have prioritized fixing them in the main code branch that's on its way to GA, rather than patching the Developer Access Program environment.

Great to hear though you're thinking of promoting this to your students. When we are GA, it would be great to see them test this out, not least the Cloud IntegratedML piece that's quite unique to IRIS.

Benjamin De Boe · Nov 30, 2023

as @Ben Spead pointed out, we are currently having issues with eu-central-1. In fact, the Developer Access Program should only have shown the us-east-1 region but at some point in syncing with the portal for our GA cloud services that option slipped back in. This said, the DAP environment is a preview system and we're getting close to releasing a GA version of InterSystems IRIS Cloud SQL and Cloud IntegratedML, based on feedback and experiences from that preview, including those at the hackathon.

Benjamin De Boe · Nov 29, 2023

Yay, nice work Tim, and exciting collaborations that made this possible!

We're also trying to phrase the individual items in a more "actionable" way so they're easier to check off, rather than just describing what changed. On that aspect as well, your feedback is much appreciated!

Benjamin De Boe · Oct 31, 2023

FYI - we plan a native datatype for vector content and fast similarity functions in 2024.1, with deeper integration planned for the next few releases. Stay tuned...

Benjamin De Boe · Sep 25, 2023

indeed, indices for vectors are not straightforward at all. Even though our %FunctionalIndex structure allows you to hook into filing and build pretty much any data structure you want, leveraging it in SQL is hard because the corresponding %SQL.AbstractFind is for filtering (in a WHERE clause) and not a good fit for the combination of filtering and ranking that is the common expectation when doing a vector search.

Both the indexing techniques and a proper fit in a relational model are the subject of ongoing academic research. Other vendors such as SingleStore have focused on ensuring the dot product (or other distance function) can be executed very efficiently so they just need to throw a lot of compute at it to make up for the lack of an index.
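As a rough illustration of that index-free approach, here is a minimal pure-Python sketch of filtering plus ranking by dot product. The rows, field names, and top_k helper are all hypothetical and are not how IRIS or any other vendor implements it; the sketch only shows why every candidate vector has to be scored when there is no index.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def top_k(query, rows, k, keep=lambda row: True):
    """Brute-force search: filter first (the WHERE part), then score every
    remaining vector and return the k best (the ranking part)."""
    candidates = [row for row in rows if keep(row)]
    candidates.sort(key=lambda row: dot(query, row["vec"]), reverse=True)
    return candidates[:k]


# Hypothetical data: each row has a filterable attribute and a vector.
rows = [
    {"id": 1, "lang": "en", "vec": [1.0, 0.0]},
    {"id": 2, "lang": "en", "vec": [0.6, 0.8]},
    {"id": 3, "lang": "fr", "vec": [0.9, 0.1]},
    {"id": 4, "lang": "en", "vec": [0.0, 1.0]},
]

best = top_k([1.0, 0.0], rows, k=2, keep=lambda r: r["lang"] == "en")
print([r["id"] for r in best])
```

The cost grows linearly with the number of rows that survive the filter, which is exactly the compute that a fast distance function has to absorb.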

Benjamin De Boe · Sep 18, 2023

Nice to see what you were able to pull together here 👍

FYI - We have an internal research project on making vectors a first-class datatype, leveraging the same internals that columnar storage uses, which will be a better fit than $list (which offers flexibility we don't need here). Hopefully we'll be able to share more details on that later this year!

Benjamin De Boe · Jul 12, 2023

Yes, we are actively looking into this and have just finished a research sprint into the SQL datatype part. There's a lot of work still in having efficient similarity search across large numbers of vectors to actually leverage it in a useful way, so we're on the lookout for good use cases and arguments to justify that development effort.

Benjamin De Boe · Jul 10, 2023

Hi, I don't think CTEs would help here, as you would still need to include the fields required for Condition2-5. I think @Luis Angel Pérez Ramos's suggestion is the right way to go, using a JOIN and then CASE statements in the COUNT. If you can provide more details on the two tables and exact conditions, I'm sure we can help with the actual query you'll need.

CTEs are also mostly there for readability and wouldn't impact query performance by themselves. It's worth checking whether there are any opportunities for indices to speed up the JOIN and Condition1 parts.

All this said, we are planning to add CTE support to IRIS SQL in the near term.
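Without the actual tables, here is only a generic sketch of the JOIN plus CASE-inside-COUNT pattern mentioned above. It runs against an in-memory SQLite database through Python's sqlite3 module rather than IRIS, and the tables, columns, and conditions are all hypothetical. The region filter stands in for Condition1 and each CASE stands in for one of the other conditions.

```python
import sqlite3

# In-memory SQLite database with two hypothetical tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER,
                     status TEXT, amount REAL);
INSERT INTO customers VALUES (1, 'EU'), (2, 'US');
INSERT INTO orders VALUES
  (1, 1, 'open', 50), (2, 1, 'closed', 150),
  (3, 1, 'closed', 20), (4, 2, 'open', 300);
""")

# One pass over the joined rows; each COUNT only counts rows where its
# CASE yields a non-NULL value, so every condition gets its own total.
row = conn.execute("""
SELECT COUNT(*) AS total,
       COUNT(CASE WHEN o.status = 'open' THEN 1 END) AS open_orders,
       COUNT(CASE WHEN o.amount > 100 THEN 1 END) AS large_orders
FROM orders o
JOIN customers c ON c.id = o.customer_id
WHERE c.region = 'EU'
""").fetchone()

print(row)
```

The fields used inside the CASE expressions have to be available from the joined rows, which is why a CTE would not remove the need to select them.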

Benjamin De Boe · Jun 19, 2023

@Alex Woodhead @Renan Lourenco , I was trying to hook this up with a local LLM (GPT4All), but am having no luck. When I just run it by default, the {table_info} it feeds to the prompt makes the prompt too large for my (cheapo?) LLM. But when I try to make it only look in my application's schema by using the corresponding SQLDatabase constructor option, the SQLAlchemy driver tries to run a SET search_path = MySchemaName command, which is not supported and fails as well. Simply taking out the table info means it'll just try without schema names and that doesn't work for me either, unfortunately.

Is this anything you've run into and found a handy workaround for?

Benjamin De Boe · Jun 16, 2023

SQL Server and Sybase both also use TOP semantics rather than LIMIT/OFFSET. I think a decent Large Dialect Model should be able to handle that when passed the proper prompt. ;-)

This said, we do have a backlog item to support LIMIT/OFFSET in IRIS SQL, but given that TOP is a common pattern too, it doesn't have a high priority.
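In the meantime, for the simplest case, the rewrite can be sketched in a few lines of Python. The limit_to_top helper below is hypothetical and deliberately naive: it only handles a plain SELECT that ends in LIMIT n, and leaves anything else (OFFSET, DISTINCT, subqueries) untouched or unhandled, so treat it as an illustration of the difference in syntax rather than a real dialect translator.

```python
import re

# Matches a trailing "LIMIT <n>" (optionally followed by a semicolon).
_LIMIT = re.compile(r"\s+LIMIT\s+(\d+)\s*;?\s*$", re.IGNORECASE)
# Matches the leading SELECT keyword.
_SELECT = re.compile(r"^\s*SELECT\s+", re.IGNORECASE)


def limit_to_top(sql):
    """Rewrite 'SELECT ... LIMIT n' as 'SELECT TOP n ...'.
    Returns the statement unchanged if it does not fit that simple shape."""
    match = _LIMIT.search(sql)
    if not match:
        return sql
    body = sql[:match.start()]
    if not _SELECT.match(body):
        return sql
    return _SELECT.sub(f"SELECT TOP {match.group(1)} ", body, count=1)


rewritten = limit_to_top("SELECT name FROM people ORDER BY age LIMIT 5")
print(rewritten)
```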

Benjamin De Boe · Apr 23, 2023

this article also has more on which scenarios are the best fit, with a link to a demo repo at the end

Benjamin De Boe · Apr 18, 2023

Great article Chad!

FWIW, we're working on a faster version of ^%GSIZE (with an API more fitting the current century ;-) ) that uses stochastic sampling similar to the faster table stats gathering introduced in 2021.2. I'll also take the opportunity for a shameless plug of my SQL utilities package, which incorporates much of what's described in this article and will take advantage of that faster global size estimator as soon as it's released.