Hi David,

AutoParallel is based on a comprehensive formula weighing the cost (setup cost for the coordination work, writing and then recombining per-process results) against the benefits (independent work that can run in parallel).

For queries not doing any aggregation, iow when the result rows correspond directly to rows in the table being queried (and especially if there are no JOINs), having to write and then read the per-process results, even when done in parallel, does not add value if there's no significant compute cost associated with for example validating a filter predicate. 

For the SELECT COUNT(*), the optimizer is satisfying that from the index (you don't seem to have a bitmap extent index, but that index on age is still plenty small), which is very cheap to read so 1M rows still don't weigh up against the setup costs. 

Anyway, AutoParallel works very well for most of our customers. It's based on a cost formula that incorporates a few constants representing the cost of IO and computations that generalize across infrastructure, so for some environments there may be cases where it over-estimates or under-estimates the exact cost, leading to edge cases where the other option might have yielded a slightly faster query, but generally the formula holds well and every now and then we review whether the constants need to be tuned (wrt newer standards for hardware).

As for the particular example with 0.06 vs 0.24s, I think there may be something different at play there. The (presumed!) non-parallel case does 600k grefs whereas the parallel one only needs 318. Even if your result should only have 300 rows, I would expect it to need at least twice as many grefs (index lookup + master map), so I'd recommend giving that another try after ensuring table stats are up to date and comparing the query plans (for the %PARALLEL, %NOPARALLEL, and query without either hint). A possible explanation might be that your query tool (the SMP?) at the UI level was only retrieving the first 100 rows for the first query, and all of them for the second test.

This is indeed expected behaviour. It's not the SELECT * itself, but the fact that this query is not applying any filtering or doing any other calculations that are worth parallelizing. So the query is asking to return all rows as-is, passing them back through a single connection/process. Therefore the optimizer argues there's no benefit in parallelizing that work, as the work of collating the per-process results back into a single resultset is pure overhead.

The actual formula being applied is a little more subtle (a WHERE clause that is expected to only filter out a small fraction of the rows would not be enough to parallelize either) and as has been suggested you still need to hit the AutoParallel threshold for the process mgmt code not to outweigh the benefits (e.g. if there's only a few hundred rows).

Thanks,
benjamin

I have only this morning updated my settings about notifications, which had a few things switched off that I didn't know existed as separate options. I'm wondering if some recent enhancement to the DC refining control on what to subscribe to was a little conservative in starting with notifications disabled.

I believe I should be all set now, as I got a notification for your post right away now.

Hi Andreas,

thanks for your patience.

I have now finally been able to reproduce your issue using the released 2024.2 kit, connecting to it from a JDBC client rather than the shell. I do not quite understand why the THROUGH command is throwing an error in that client-to-server context and not when invoked from the shell in code that is creating a "nested" server-to-server JDBC connection to the Foreign Server, but those seem to be the facts.

This said, the good news is the exact same scenario works fine on 2024.3. I have tried sneaking the 2024.3 JDBC driver into the 2024.2 kit, but that doesn't seem to make a difference so it's a server-side code that fixed this. We're just two weeks away from 2024.3 going GA and there's a totally stable developer preview version of it available, so I hope that gets you by.

Thanks,
benjamin

Hi @Andreas Schneider ,

I have just downloaded the released 2024.2 container as well as the latest 2024.3, set up foreign server connections from the former to the latter, and all THROUGH SERVER commands work as expected:

[SQL]USER>>through server myhost select $zv
2.      through server myhost select $zv

| Expression_1 |
| -- |
| IRIS for UNIX (Ubuntu Server LTS for x86-64 Containers) 2024.3 (Build 206U) Fri Oct 11 2024 16:59:56 EDT |

1 Rows(s) Affected
statement prepare time(s)/globals/cmds/disk: 0.0485s/6,382/43,267/0ms
          execute time(s)/globals/cmds/disk: 0.0028s/40/576/0ms
                                query class: %SQL.PassthroughResult.Dynamic
---------------------------------------------------------------------------
[SQL]USER>>through server myself select $zv
3.      through server myself select $zv

| Expression_1 |
| -- |
| IRIS for UNIX (Ubuntu Server LTS for x86-64 Containers) 2024.2 (Build 247U) Tue Jul 16 2024 09:29:04 EDT |

1 Rows(s) Affected
statement prepare time(s)/globals/cmds/disk: 0.1686s/6,388/43,347/0ms
          execute time(s)/globals/cmds/disk: 0.0031s/40/576/0ms
                                query class: %SQL.PassthroughResult.Dynamic
---------------------------------------------------------------------------
[SQL]USER>>select $zv
4.      select $zv

| Expression_1 |
| -- |
| IRIS for UNIX (Ubuntu Server LTS for x86-64 Containers) 2024.2 (Build 247U) Tue Jul 16 2024 09:29:04 EDT |

1 Rows(s) Affected
statement prepare time(s)/globals/cmds/disk: 0.0852s/35,278/160,395/0ms
          execute time(s)/globals/cmds/disk: 0.0001s/0/411/0ms
                                query class: %sqlcq.USER.cls24
---------------------------------------------------------------------------
[SQL]USER>>

I'm afraid I have no other suggestion than to re-check whether your 2024.2 instance and the JDBC driver it uses are in sync and there are no stale classpath entries in your SQL Gateway connection or SQL Data Lens config.

FWIW, Foreign Tables use new xDBC Gateway infrastructure that only relies on the SQL Gateway connection definition for getting the server URL and credentials, so the "test" button in the SQL Gateway definition screen is not necessarily representative for xDBC Gateway connectivity. This is a temporary situation until the IRIS Wallet (secure credential score) is ready for the new Gateway (and Foreign Tables) to adopt, but we've never run into any differences in behaviour like the one you are facing.

To validate whether the xDBC Gateway can connect and possibly glean more detail from the error message, please try this, replacing the host/port/credentials as appropriate for your setup:

w ##class(%XDBC.Gateway.Connection).%GetConnection({ "host": "host.docker.internal", "port": 51774, "databasename": "USER", "datasource": "com.intersystems.jdbc.IRISDataSource", "user":"_SYSTEM", "password":"SYS","type":"jdbc" },.sc)
d $system.OBJ.DisplayError(sc)

hi @Andreas Schneider  , sorry about the embarrassingly late response! It seems my DC email notifications aren't coming through anymore.

The error you're seeing may well be because you might still be using an older JDBC or ODBC driver version. The THROUGH keyword at the start of the statement is brand new and was added to the JDBC or ODBC driver versions that ship with IRIS 2024.2 (v3.9.0 for JDBC), but if you're using a query tool such as DBeaver, they may still be using something older.

As you already found / tried, you shouldn't add any (single) quotes around the statement to be sent to the foreign server. 

The <UNDEFINED> error is puzzling. This hasn't been reported by any of the customers we know are using THROUGH today (we've worked closely with at least one!), but it seems from your $zv string you are using one of the early Developer Preview kits and not the final 2024.2.0 release. Can you please try upgrading to the official release (build 247)? I can see in our internal tracking system at least one more related change went in between the DP build you listed and the final release, and have reasonable confidence that'll sort things.

Apologies for the delayed response

If you're on that release I'm afraid you'll have to consider legacy options you may have found in the doc already: %SQL.Import.Mgr and %SQL.Export.Mgr. They lack comprehensive documentation, but from the example in their shared superclass, you can derive pretty much what you'd need.

I have not specifically tested my own SQL toolbox on 2021.1, but it doesn't rely on LOAD DATA for everything and may offer some suggestions on where to look next.

Something you'll likely need when trying to import row IDs from one instance to another is to temporarily override the InsertIdentity setting. See the INSERT reference doc for pointers on how to enable manually specifying RowID values.

Great stuff! This was really easy to get going with once I got the right OpenAI API key.

I was curious how it'd do on a different (more complex) schema and it seemed to hold up quite well. Two things I noted so far (but I'm not done with poking around) is that it seems to add LIMIT n to queries every now and then, which is not valid IRIS SQL syntax (and likely you can suggest that in the prompt), and that it does weird things when you try to put your own SQL in the execute box (it restores the previous query - just a UI thing I guess).

You guys really earned that first place in the contest!!

Equally curious in the feedback here, as we're currently working on a long-awaited project to collect table statistics automatically in the background.

See also this earlier article, with some discussion at the bottom.

As for #4: if you'd need to clear your cached queries in order for new stats to be picked up, that's a bug. Please let us know if that is what you'd find ;-)

As described in the article, this error message also fooled me into thinking it was not finding the SSLDefs.ini file, but in practice it meant that it didn't find a server entry (linking the hostname/port combination to an SSL definition) in the SSLDefs.ini file. Maybe that's also what you're running into @Alexey Nechaev ?

We'll be improving this error message in the near future.

something odd is going on with that new_embedding_str parameter value you're adding. Rather than taking the value, it's taking the parameter name, hence the "field not found" error. Try removing that * before parameters in your call to exec(), or just inline the parameter value into the statement as you're not reusing it anyway.

Hi @Vadim Aniskin ,

while putting together a sample of the new feature, I found out that very unfortunately this change did not make it into 2024.1.0 after all. It passed our internal testing a few months ago and was promoted through project and integration branches using our existing automation, and as such was added to the draft list of features we wanted to describe in the release notes. However, because of overlapping changes it did not get promoted into the main release branch automatically (as those earlier steps), and ended up in a manual queue. That took a little longer than expected, and I did not perform a final check before publishing the draft release notes. To my knowledge, this is the first time we had a fully-greenlit feature miss out on the automation, but that's no excuse and we've learned to do a manual check right before release. 

In short, this change is now on its way to 2024.2, and the 2024.1 release notes will be updated shortly.

I'm sorry about the confusion this may have caused,
benjamin