Love the article! Very well-phrased considerations on the use of AI, almost all of which I share. Especially in the context of #1, we should not forget that the second L in LLM is for Language, and not for Fact or Solution (otherwise it would be a really bad acronym!). Therefore, if we're not qualified to spot what hallucinations crept into the response, its nicely-phrased language will probably make sure we never will.

PS: so glad you passed that Stats test and joined InterSystems :-) 

I believe what you're looking at is the new, more fine-grained set of %Native_* resources you now need in order to use native functions. Look for DP-423341 in the upgrade guide. It seems we failed to describe this requirement in the Native API documentation (or at least I didn't find it where I expected it), so we'll get that addressed.

I'd also recommend defaulting to the new, dynamic upgrade guide, which makes it easier to filter on particular types of issues. It is now replacing the old, static pages, which were more reliant on (and vulnerable to) manual curation. In fact, you'll no longer find those static pages in the menu of the 2025.1 doc.

Any tools that use SQL to access partitioned tables will just work, as from the SQL query perspective there is no change. This includes Adaptive Analytics, InterSystems Reports, and any third-party BI tools. Also, IRIS BI cubes can use partitioned tables as their source class.

We currently have no plans to support partitioning of IRIS BI cubes themselves, as they have their own bucketing structure and less commonly have both hot and cold data, so some of the motivations for table partitioning don't apply. 

Hi @Scott Roth, the %MANAGE_FOREIGN_SERVER privilege was only just introduced with 2024.2, as part of finalizing the full production support for Foreign Servers (see also release notes). I'm not sure though why it wouldn't appear after you created it. Can you confirm whether it's still there right after the CREATE SERVER command, whether you're using the same user for both connections, and whether or not you can CREATE FOREIGN TABLEs with that server (before logging off and / or after logging back in)?

I understand upgrading may not be straightforward, but the most logical explanation would be that the initial, crude privilege checking (that we replaced in 2024.2 as advertised) has a hole in it. 

thanks,
benjamin

No, I would leave out the semicolon at the end of that query. It's typically used as a statement separator, but is not really part of the query syntax itself. IRIS (as of 2023.2) will tolerate it at the end of a statement, but Spark does not appear to strip it: it wraps whatever you sent to dbtable in further queries, so the semicolon ends up in the middle of the generated statement, causing the error you saw.

You may also want to apply 

  .option("pushDownLimit", false)
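
To make the two suggestions above concrete, here is a minimal sketch in Python (PySpark flavour) that strips a trailing semicolon before handing the query to dbtable and switches off limit pushdown. The helper name iris_jdbc_options, the connection URL, the credentials and the Sample.Person table are all placeholders I made up for illustration, so adjust them to your setup.

```python
def iris_jdbc_options(query, url, user, password):
    """Build Spark JDBC reader options for an IRIS query (illustrative helper)."""
    # Drop surrounding whitespace and any trailing statement separator,
    # since Spark wraps the dbtable value inside its own generated queries.
    cleaned = query.strip().rstrip(";").rstrip()
    return {
        "url": url,
        "driver": "com.intersystems.jdbc.IRISDriver",
        # A query passed as dbtable must be a parenthesized subquery with an alias.
        "dbtable": f"({cleaned}) AS q",
        "user": user,
        "password": password,
        # Spark JDBC option: do not push LIMIT down to the database.
        "pushDownLimit": "false",
    }


# Placeholder connection details, not taken from the original question.
opts = iris_jdbc_options(
    "SELECT Name, Age FROM Sample.Person;",
    "jdbc:IRIS://localhost:1972/USER",
    "my_user",
    "my_password",
)

# With a live SparkSession this would be used as:
# df = spark.read.format("jdbc").options(**opts).load()
```

Building the options as a plain dictionary keeps the cleanup logic easy to check without a running Spark cluster or IRIS instance.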

Indeed, as of 3.10.1, we're publishing our JDBC drivers directly to Maven when needed, offering bugfixes as well as enhancements independently of IRIS releases. This significantly increases our ability to address customer feedback.

For convenience, we'll continue to ship jar files with IRIS, using the version that is current at the time of the IRIS release. 

Hi David,

AutoParallel is based on a comprehensive formula weighing the cost (setup cost for the coordination work, writing and then recombining per-process results) against the benefits (independent work that can run in parallel).

For queries not doing any aggregation, in other words when the result rows correspond directly to rows in the table being queried (and especially if there are no JOINs), having to write and then read the per-process results, even when done in parallel, does not add value unless there is a significant compute cost associated with, for example, validating a filter predicate.

For the SELECT COUNT(*), the optimizer is satisfying that from the index (you don't seem to have a bitmap extent index, but that index on age is still plenty small), which is very cheap to read, so even 1M rows don't outweigh the setup costs.

Anyway, AutoParallel works very well for most of our customers. It's based on a cost formula that incorporates a few constants representing the cost of IO and computations that generalize across infrastructure. For some environments there may therefore be cases where it over-estimates or under-estimates the exact cost, leading to edge cases where the other option might have yielded a slightly faster query. Generally the formula holds well, though, and every now and then we review whether the constants need to be tuned (with respect to newer standards for hardware).

As for the particular example with 0.06 vs 0.24s, I think there may be something different at play there. The (presumed!) non-parallel case does 600k grefs whereas the parallel one only needs 318. Even if your result should only have 300 rows, I would expect it to need at least twice as many grefs (index lookup + master map), so I'd recommend giving that another try after ensuring table stats are up to date and comparing the query plans (for the %PARALLEL, %NOPARALLEL, and query without either hint). A possible explanation might be that your query tool (the SMP?) at the UI level was only retrieving the first 100 rows for the first query, and all of them for the second test.
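
If it helps, the comparison suggested above could be scripted along these lines. This is only a sketch: the table Sample.Person and the Age filter are stand-ins for your actual query, and the helper name is made up. It only builds the statement text (TUNE TABLE to refresh the stats, then EXPLAIN for each of the three variants, with the hint keyword placed in the FROM clause); you would run the statements through whichever SQL client you prefer.

```python
def plan_comparison_statements(select_list, table, where):
    """Return statements to refresh stats and show the plan for each hint variant."""
    # Hint keywords go in the FROM clause, right before the table name.
    variants = {
        "default": "",
        "parallel": "%PARALLEL ",
        "noparallel": "%NOPARALLEL ",
    }
    statements = {"tune": f"TUNE TABLE {table}"}
    for name, hint in variants.items():
        statements[name] = (
            f"EXPLAIN SELECT {select_list} FROM {hint}{table} WHERE {where}"
        )
    return statements


# Hypothetical table and predicate, for illustration only.
stmts = plan_comparison_statements("Name, Age", "Sample.Person", "Age > 65")
for label, sql in stmts.items():
    print(f"{label}: {sql}")
```

Comparing the three plans side by side should show whether the optimizer picks the same strategy with and without the hints, independent of how many rows the query tool fetches.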