Cool stuff! For this age-old iKnow demo I used to scrape a bunch of hotel reviews from the web, but it was pretty crude and server-based, so impossible to include in the demo repo. I'd heard of beautifulsoup before, but hadn't considered revisiting the data-sourcing piece of my demo until now that we can use it directly through Embedded Python.
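For anyone curious what that data-sourcing piece could look like, here's a minimal sketch with BeautifulSoup. The HTML structure and the `div.review` / `p.text` class names are invented for illustration; a real site needs its own selectors, and you'd fetch the markup with an HTTP client rather than hard-coding it:

```python
# Minimal BeautifulSoup sketch. The markup and CSS classes below are
# made up for illustration -- adapt the selectors to the real page.
from bs4 import BeautifulSoup

html = """
<html><body>
  <div class="review"><p class="text">Great location, tiny rooms.</p></div>
  <div class="review"><p class="text">Friendly staff, would stay again.</p></div>
</body></html>
"""

def extract_reviews(markup):
    soup = BeautifulSoup(markup, "html.parser")
    # select() takes a CSS selector; get_text(strip=True) drops whitespace
    return [p.get_text(strip=True) for p in soup.select("div.review p.text")]

reviews = extract_reviews(html)
print(reviews)
```

The same function runs unchanged inside Embedded Python, so the scraped rows can be handed straight to an IRIS table.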

I'm not sure what you mean by "linear cost pagination", but as soon as your query plan gets more complex than reading a single stream (and especially when sorting), there is upfront work before you can start reading the first row, so any pagination of that final result simply cannot be linear.

That small disclaimer aside, "customizable filtering and sorting" sounds like application-level features you'd apply on top of the result set returned by SQL, and the %ScrollableResultSet class offers something along those lines. If you're looking for server-side support, IRIS SQL currently only supports the TOP clause, but we have support for LIMIT / OFFSET semantics queued for development.
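In case it helps, the application-level variant is only a few lines. This is a hypothetical helper (the name and parameters are made up, not an IRIS API) that filters, sorts, and slices one page out of an already-fetched result set:

```python
# Hypothetical app-level pagination helper layered on top of rows you
# already fetched -- roughly what you'd build around a scrollable
# result set. Names and signature are illustrative only.

def page_of(rows, predicate=None, key=None, reverse=False, page=1, page_size=10):
    rows = list(rows)                       # materialize once
    if predicate is not None:
        rows = [r for r in rows if predicate(r)]
    if key is not None:
        rows.sort(key=key, reverse=reverse)
    start = (page - 1) * page_size
    return rows[start:start + page_size]

hotels = [
    {"name": "Alpha", "stars": 3},
    {"name": "Bravo", "stars": 5},
    {"name": "Charlie", "stars": 4},
]

top = page_of(hotels, predicate=lambda h: h["stars"] >= 4,
              key=lambda h: h["stars"], reverse=True, page=1, page_size=2)
print([h["name"] for h in top])
```

The trade-off is the same as in the disclaimer above: the full result set is materialized client-side before any page is served.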

Where Oracle uses the FROM DUAL pseudo-table, IRIS simply does without a FROM clause altogether when you want to select rows of constant expressions. The following statement should work:

INSERT INTO mytable (column1, column2, column_n)
SELECT expr1, expr2, expr_n
UNION ALL
SELECT expr1, expr2, expr_n;

As of IRIS 2021.1, we allow users to create their own aggregate functions. Perhaps there's a beautiful community contribution in there? :-)

You could build something simple where you stuff all values into a temporary global (the name of which you pass as the state), sort them using $sortbegin/$sortend while maintaining a count, and then gather the actual median value in the FINALIZE method.
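As a language-neutral sketch of that ITERATE/FINALIZE pattern (plain Python standing in for ObjectScript; the list stands in for the temporary global, and the cached result anticipates the second caveat below):

```python
# Sketch of a custom median aggregate: ITERATE accumulates state,
# FINALIZE sorts once and picks the middle value. Illustrative only --
# in ObjectScript the state would live in a temp global maintained
# with $sortbegin/$sortend.

class Median:
    def __init__(self):
        self.values = []        # stands in for the temporary global
        self._result = None     # cached, since FINALIZE may run twice

    def iterate(self, value):
        if value is not None:   # SQL aggregates skip NULLs
            self.values.append(value)

    def finalize(self):
        if self._result is None and self.values:
            s = sorted(self.values)
            n = len(s)
            mid = n // 2
            self._result = s[mid] if n % 2 else (s[mid - 1] + s[mid]) / 2
        return self._result

agg = Median()
for v in [7, 1, 5, 3, 9]:
    agg.iterate(v)
print(agg.finalize())
```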

Two caveats:

  • Don't bother implementing a MERGE function. We don't support parallel execution just yet.
  • In some query execution plans, the FINALIZE method may be called more than once (e.g. if the aggregate is used in both the SELECT list and, say, a HAVING clause), so you may want to cache your result somewhere. A PPG will do, as this happens in the last single-process mile of query processing, typically mere milliseconds apart.

We'll be removing these annoyances in a future version.

Oliver already pointed you at the key setting for this, but I thought I'd include a snippet from, and a link to, the corresponding doc as well:

If Source has a foreign key public RowID, and you want Destination to have the same foreign key relationship, you must define Destination with the ALLOWIDENTITYINSERT=1 parameter. When defining Destination as a persistent class, specify Parameter ALLOWIDENTITYINSERT=1;. When using CREATE TABLE to define Destination, specify %CLASSPARAMETER ALLOWIDENTITYINSERT=1. If a table is defined as ALLOWIDENTITYINSERT=1, this setting cannot be changed by the SetOption("IdentityInsert") method.

That would be through calling %Next() right away. The way our query code works is to walk through the results and tell you when it runs out of rows to pass back.

Our online documentation has a feedback button that makes your comments immediately find their way to the doc writers. They really appreciate suggestions and usually turn them around quite quickly :-)

Have you taken a look at DataMove? DataMove supports live environments, ensuring any operations on the global (subscripts) being moved are propagated to the new location without requiring downtime.

I understand it might not yet be available on the releases you currently have deployed, but I'd be interested in learning more about features that might be missing, or generally about experiences from doing this on production systems.

This is expected behaviour. For a SELECT statement, %Execute() (or %ExecDirect()) will set %SQLCODE to 0 if the statement was successfully executed and is ready to return rows, or to an error code if the execution went wrong. It's through the use of %Next() that it figures out whether there are any rows left. The first %Next() call on such an empty result set will return 0 and at the same time set %SQLCODE to 100. This behaviour is similar, and perhaps even more visible, in the case of %ROWCOUNT for SELECT statements, which increases with every call to %Next(). See also the docs for %Next() (second bullet).

The simple reason for this is that we're not building the whole result set upfront, but rather "get ready" to iterate through it with %Next() calls, as users may only be interested in the first few rows. Some call this a lazy execution model, but I prefer the term efficient :-). Of course, for DML statements such as UPDATE and DELETE, %ROWCOUNT is known immediately upon execution, as there is no result set to scan through.
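The lazy model is easy to mimic in plain Python. Below, `sqlcode` and `rowcount` are stand-ins for the IRIS %SQLCODE and %ROWCOUNT properties, not real API; the point is that "no more rows" is only discovered at fetch time:

```python
# Pure-Python analogy for a lazy result set: exhaustion (SQLCODE 100)
# is only detected when next() is actually called, and the row count
# grows one fetch at a time. Property names are illustrative.

class LazyResultSet:
    def __init__(self, rows):
        self._it = iter(rows)
        self.sqlcode = 0      # 0: executed fine, ready to fetch
        self.rowcount = 0     # grows with each successful fetch
        self.row = None

    def next(self):
        try:
            self.row = next(self._it)
            self.rowcount += 1
            return True
        except StopIteration:
            self.sqlcode = 100   # "no (more) rows" -- set only now
            return False

rs = LazyResultSet(["a", "b"])
fetched = []
while rs.next():
    fetched.append((rs.row, rs.sqlcode, rs.rowcount))
print(fetched, rs.sqlcode, rs.rowcount)
```

Note that `sqlcode` stays 0 while rows are being returned and flips to 100 only on the fetch that comes up empty, matching the %Next() behaviour described above.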

Also interested to see submissions / ideas on handling really BIG globals. DataMove is a powerful feature for moving data from one database to another in a live environment, but it operates at a fairly low level. Adding a layer around it for easy archiving of table data, an interface to keep track of (potentially large numbers of) subscript-level mappings, etc., would be examples of helpful utilities in the big-global space.

FWIW, we have several roadmap items for IRIS in this area, but I won't go into detail on them as it's much more exciting and informative for us to learn about the problems you are seeing and the solutions you'd propose to them :-)

Two small notes:

  • The difference between %Dictionary.IndexDefinition and %Dictionary.CompiledIndex is that the former has all the indices you defined in that class itself, while the latter also includes the ones it got through class inheritance. Obviously, you can only delete the ones in the former list, and even then you should be sure you're not accidentally dropping one that still matters for another subclass of the parent in which it was defined.
  • If you use %DeleteId() to drop an index, you'll need to call %PurgeIndices() for those indices first (if I'm not mistaken) and recompile the affected class(es) afterwards. A slightly more elegant approach would be to call DROP INDEX for each entry in the list; that'll take care of the recompiling and data-dropping for you.

A little more documentation on the SQL projection of collection properties here.

Currently the SQLPROJECTION = table option is only available for "array of" properties. In 2022.1 we intend to make this option also available for "list of" properties.

If you're interested, I can dig up a script I wrote a while ago to create a class that projects a read-only view of your "list of" property. No warranty!

"But for several of the EnsEDI globals I get an error message telling me they can't be deleted, because they are wrong."

Can you be more specific about what you mean by "can't be deleted"? If this is a <PROTECT> error because they are mapped to ENSLIB (as part of enabling the namespace for Interop as Robert described), you can just ignore that error in a try / catch block.

I'm not sure about the general application context, but if you need to go through the full list of globals for your app, it may be better to query the globals that are in your application's database (directory) using %SYS.GlobalQuery:DirectoryList() rather than all the ones that are visible in the namespace using %SYS.GlobalQuery:NamespaceList() or ^$GLOBAL. That'll avoid the issues with ^EnsEDI.

Note that if you are actually using DeepSee or interop (or more generally rely on any features that may be writing their own globals), just looping through the global list for this kind of operation is very risky. Most built-in IRIS features (besides DeepSee and interop, that is) will write to globals prefixed with ^IRIS.*
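If you do end up looping over global names, a defensive prefix filter is cheap insurance. The sketch below is plain Python and the prefix list is an assumption based on the conventions mentioned above (^IRIS.*, interop, DeepSee, %-globals); check what your own instance actually writes before trusting it:

```python
# Illustrative filter (not an IRIS API): skip globals that built-in
# features commonly write, before running destructive loops. The
# prefix list is an assumption -- verify it against your instance.

SYSTEM_PREFIXES = ("^IRIS.", "^Ens", "^DeepSee", "^%")

def is_app_global(name):
    return not name.startswith(SYSTEM_PREFIXES)

names = ["^MyApp.Data", "^IRIS.Temp", "^EnsEDI.Schema",
         "^DeepSee.Cache", "^MyApp.Idx"]
app_globals = [g for g in names if is_app_global(g)]
print(app_globals)
```

Combined with DirectoryList() scoping, this keeps a cleanup loop away from anything the platform itself depends on.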