Any tools that use SQL to access partitioned tables will just work, as nothing changes from the SQL query perspective. This includes Adaptive Analytics, InterSystems Reports, and any third-party BI tools. IRIS BI cubes can also use partitioned tables as their source class.

We currently have no plans to support partitioning of IRIS BI cubes themselves, as they have their own bucketing structure and less commonly have both hot and cold data, so some of the motivations for table partitioning don't apply. 

Nice article @Ben Schlanger !

I like how you're laying out the investigative process, though it's worth noting that every case is different, so the recommendations can differ too. The %NORUNTIME hint in particular should be used with caution, as in most scenarios it may deprive you of better plans. In fact, we like to say that any time you have to resort to that hint, it's worth opening a case with the WRC, as it represents an opportunity for our engine to make that better choice automatically (available statistics permitting) :-)
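To make that concrete, here's a minimal sketch of where the hint goes; the table and column names are made up for illustration:

```sql
-- Hypothetical example: %NORUNTIME disables Run-Time Plan Choice for this
-- statement, so the generic plan is used regardless of parameter values.
-- Use sparingly, and consider a WRC case whenever you find you need it.
SELECT %NORUNTIME enc.PatientID, enc.EncounterTime
  FROM SQLUser.Encounter enc
 WHERE enc.EncounterTime BETWEEN ? AND ?
```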

Also, I'd like to advertise a few improvements we've made since the IRIS version shown here:

  • Improved feedback in the query plan: we've been displaying a note in the query plan for a while now if there's a chance that different runtime parameter values may lead to a different plan, and as of IRIS 2023.3 are even calling out the specific predicates that drove the RTPC decision. For example, your plan may say "This query plan was selected based on the runtime parameter values that led to improved selectivity estimation of the range condition enc.EncounterTime BETWEEN '2022-01-01' AND '2023-12-31'"
  • Showing the actual runtime plan: Starting with IRIS 2023.3, we've enhanced the EXPLAIN and SMP utilities to no longer show the generic plan (with all parameter values substituted out), but the actual plan you'll get at runtime with the literal values in your query text. This addresses step #4 in the investigation described above.
  • SQL Process View: As of IRIS 2022.2, the Operations menu in the System Management Portal includes a "SQL Activity" link that leads you to a page listing all currently-running SQL statements, and allows you to drill through to the statement details and query plan. This also helps with step #4, and to identify any long-running queries in the first place. An aggregated form of this data is also available through the /api/metrics endpoint for consumption through a monitoring tool.
  • Query and schema recommendations: In IRIS 2024.3, released last month, we've further expanded the information contained in the query plan beyond the RTPC-related notes described above, to also include warnings on indices marked as non-selectable (cf investigation step #1), indices that are being ignored because they have non-matching collation, whether the plan is frozen, and similar additional information that may help you improve the statement text, schema, or overall system settings.
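As a quick sketch of how you'd exercise the second item above (the table and column names are hypothetical, borrowed from the example note; adjust to your own schema):

```sql
-- As of IRIS 2023.3, EXPLAIN shows the plan you'll actually get at runtime
-- for these literal values, rather than the generic parameterized plan.
EXPLAIN
SELECT enc.PatientID
  FROM SQLUser.Encounter enc
 WHERE enc.EncounterTime BETWEEN '2022-01-01' AND '2023-12-31'
```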

The above features were all introduced specifically to help diagnose long-running queries more quickly and identify how things can be sped up. Of course, these versions also include general performance enhancements and refinements to the RTPC infrastructure, so it'll be exciting to see how fast this customer's query runs on the latest and greatest IRIS release!

Hi @Scott Roth , the %MANAGE_FOREIGN_SERVER privilege was only just introduced in 2024.2, as part of finalizing full production support for Foreign Servers (see also the release notes). I'm not sure though why it wouldn't appear after you created it. Can you confirm whether it's still there right after the CREATE SERVER command, whether you're using the same user for both connections, and whether or not you can CREATE FOREIGN TABLEs with that server (before logging off and/or after logging back in)?
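For reference, a rough sketch of that check sequence — the server, connection, and table names are made up, and the syntax is from memory, so double-check it against the Foreign Tables documentation:

```sql
-- create the foreign server, then verify it's immediately usable
CREATE FOREIGN SERVER Sample.MyServer FOREIGN DATA WRAPPER JDBC CONNECTION 'MyJDBCConn';
CREATE FOREIGN TABLE Sample.RemotePeople SERVER Sample.MyServer TABLE 'People';
-- now log off, log back in as the same user, and query it again:
SELECT * FROM Sample.RemotePeople;
```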

I understand upgrading may not be straightforward, but the most logical explanation would be that the initial, crude privilege checking (that we replaced in 2024.2 as advertised) has a hole in it. 

thanks,
benjamin

No, I would leave out the semicolon at the end of that query. It's typically used as a statement separator, but isn't really part of the query syntax itself. IRIS (as of 2023.2) will tolerate it at the end of a statement, but Spark doesn't seem to do anything with it, as it wraps what you send to dbtable in further queries, causing the error you saw.

You may also want to apply 

  .option("pushDownLimit", false)

Indeed, as of 3.10.1, we're publishing our JDBC drivers directly to Maven, offering bugfixes as well as enhancements independently of IRIS releases whenever needed. This significantly increases our ability to address customer feedback.

For convenience, we'll continue to ship jar files with IRIS, using the version that is current at the time of the IRIS release. 

Hi David,

AutoParallel is based on a comprehensive formula weighing the cost (setup cost for the coordination work, writing and then recombining per-process results) against the benefits (independent work that can run in parallel).

For queries that don't do any aggregation, in other words when the result rows correspond directly to rows in the table being queried (and especially when there are no JOINs), having to write and then read the per-process results, even when done in parallel, adds no value unless there's a significant compute cost associated with, for example, validating a filter predicate.

For the SELECT COUNT(*), the optimizer satisfies that from the index (you don't seem to have a bitmap extent index, but that index on age is still plenty small), which is very cheap to read, so even 1M rows don't outweigh the setup costs.

Anyway, AutoParallel works very well for most of our customers. The cost formula incorporates a few constants representing the cost of I/O and computation that generalize across infrastructure, so for some environments there may be cases where it over-estimates or under-estimates the exact cost, leading to edge cases where the other option would have yielded a slightly faster query. Generally, though, the formula holds up well, and every now and then we review whether the constants need tuning for newer hardware standards.

As for the particular example with 0.06s vs 0.24s, I think something different may be at play there. The (presumed!) non-parallel case does 600k grefs, whereas the parallel one needs only 318. Even if your result only has 300 rows, I would expect it to need at least twice as many grefs (index lookup + master map), so I'd recommend giving it another try after ensuring table stats are up to date, and comparing the query plans (with %PARALLEL, with %NOPARALLEL, and without either hint). A possible explanation is that your query tool (the SMP?) only retrieved the first 100 rows at the UI level for the first query, and all of them for the second test.
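For completeness, a sketch of that three-way plan comparison; the table and predicate are hypothetical stand-ins for your own query:

```sql
-- compare the plans with and without the parallel hints
EXPLAIN SELECT %PARALLEL Name, Age FROM Sample.Person WHERE Age > 60;
EXPLAIN SELECT %NOPARALLEL Name, Age FROM Sample.Person WHERE Age > 60;
EXPLAIN SELECT Name, Age FROM Sample.Person WHERE Age > 60;
```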

This is indeed expected behaviour. It's not the SELECT * itself, but the fact that this query is not applying any filtering or doing any other calculations that are worth parallelizing. So the query is asking to return all rows as-is, passing them back through a single connection/process. Therefore the optimizer argues there's no benefit in parallelizing that work, as the work of collating the per-process results back into a single resultset is pure overhead.

The actual formula being applied is a little more subtle (a WHERE clause that's expected to filter out only a small fraction of the rows wouldn't be enough to trigger parallelization either), and as has been suggested, you still need to hit the AutoParallel threshold for the process management overhead not to outweigh the benefits (e.g. if there are only a few hundred rows).

Thanks,
benjamin

I only updated my notification settings this morning; a few things were switched off that I didn't even know existed as separate options. I'm wondering whether a recent DC enhancement that refined control over what to subscribe to was a little conservative in starting with notifications disabled.

I believe I should be all set now, as I got a notification for your post right away this time.