Thanks Theo, I've also realized that IRIS is much more capricious than Postgres and that I will have to spend half my life tuning indexes in it.
- Log in to post comments
Thanks Theo, I've also realized that IRIS is much more capricious than Postgres and that I will have to spend half my life tuning indexes in it.
Is your account hacked or what. This is the second or third month in a row that you've been asking strange questions (almost) every day.
* * *
Essentially, I would recommend always using Return to exit a function (as in all other programming languages) and Quit to exit a loop.
Hi Vitaly and thanks for the idea. The table contains 3.5M rows
create table test as select row_status from <src>
create bitmap index test_row_status_idx on test(row_status)
select top 10 ID from test where row_status in ('U','I') order by ID
0.579s
So most probably you are right and the table indeed has some issues with indexes
Done. Neither `collect statistics` nor `tune table` help.
By the way, how normal is this? Index usage = 0 everywhere.png)
Two times :( First this index, then the whole table too. Then I recreated the index as bitmap, rebuilt all indexes again - doesn't work
If it is important, the index was created automatically by the Data Migration Wizard. In the list of indexes, formerly it had type 'index'. I've recreated the index, this time explictly as 'bitmap':
create bitmap index <name> on <schema>.<table>(row_status)
Now I see that it has type 'bitmap'. Unfortunately, that did not work, and the plan hardly differs:
SELECT TOP ? ID FROM <table>.<schema> WHERE row_status IN ( ? , ? ) ORDER BY ID /*#OPTIONS {"DynamicSQLTypeList":"10,1,1"} */
Query Plan
Relative Cost = 105.04
• Read extent bitmap <table>.<schema>.%%DDLBEIndex, looping on ID.
• For each row:
- Read master map <table>.<schema>.IDKEY, using the given idkey value.
- Test the IN condition on %SQLUPPER(row_status) and the NOT NULL condition on %SQLUPPER(row_status).
- Output the row.
three more notes (without using the concatenation like `order by ID || ''`):
This plan is for the slow (to be more precise, infinite) one. The "fast" one (that uses the concatenation trick) looks like:
SELECT TOP ? ID FROM <schema>.<table> WHERE row_status IN ( ? , ? ) ORDER BY ID || '' /*#OPTIONS {"DynamicSQLTypeList":"10,1,1"} */ /*#OPTIONS {"rtpc-utility":1} */ /*#OPTIONS {"rtpc-log-sel":["1^.0390625"]} */
Warning
• Complex condition TOP on ID from table <schema>.<table> will not filter out any rows.
Information
• This query plan was selected based on the runtime parameter values that led to:
Using the outlier selectivity in an IN condition on row_status.
Query Plan
Relative Cost = 620185
• Call module B, which populates temp-file A.
• Read temp-file A, looping on the '_' expression and ID.
• For each row:
- Output the row.
Module: B
• Read index map <schema>.<table>.row_status_idx, looping on %SQLUPPER(row_status) (with a given set of values) and ID (with a TOP condition).
• For each row:
- Add a row to temp-file A, subscripted by the '_' expression and ID,
with no node data.
Hi Nick,
SELECT TOP ? ID FROM <schema> . <table> WHERE row_status IN ( ? , ? ) ORDER BY ID /*#OPTIONS {"DynamicSQLTypeList":"10,1,1"} */
Query Plan
Relative Cost = 105.04
• Read extent bitmap <schema>.<table>.%%DDLBEIndex, looping on ID.
• For each row:
- Read master map <schema>.<table>.IDKEY, using the given idkey value.
- Test the IN condition on %SQLUPPER(row_status) and the NOT NULL condition on %SQLUPPER(row_status).
- Output the row.
Thanks Vitaly, it looks like a magic spell but it works.
I'm curious why LPAD, RPAD and similar string functions are not exposed as $SYSTEM.SQL.Functions class members
Shouldn't the 'target' parameter be passed by reference?
Thanks a lot Cecilia, now I can continue learning HealthShare 😊
No, JVM settings are not the reason.
It seems to me the reason is the way how the XDBC Java wrapper is implemented. There are two things I don't really like about the code:
1. Statements created by an instance of XDBC Connection objects are being put in a List which keeps growing and growing. Even if I call Statement.close(), it will not be removed from the list immediately. The list is cleared only when the Connection.close() method is called.
2. Less likely, but this could also be the cause of the problem: no ResultSet.close() calls. The specification says:
https://docs.oracle.com/javase/7/docs/api/java/sql/Connection.html#close()
It is strongly recommended that an application explicitly commits or rolls back an active transaction prior to calling the close method. If the close method is called and there is an active transaction, the results are implementation-defined.Modern JDBC drivers try to close dependent objects cascadely on Connection.close() but it is not always the case, and it is recommended to always close them explicitly.
In particular, someone complained about the similar issue with the Postgres JDBC driver here. And in the XDBC code I see that it tries to close and release all Statement instances but not ResultSets. On the other hand, Postgres driver maintainers reported that namely this issue was fixed.
The responses look like hacky Perl scripts
Hi Tani, how difficult is this to connect IRIS FHIR server to Keycloak or to any other OAuth2 server? Does Keycloak need to be additionally configured for that (e.g. additional scopes should be added or something like that)?
Nothing special... each file just defines the context for connecting to a specific IRIS instance (and also local and remote "folders"). The advantage is that the connection attributes for a specific server are stored in the same file, and I don't need to register “global” connection parameters for a specific IRIS server.
1. It is officially declared that XDBC is the most modern remote data access technology
2. I use remote data sources very extensively and have tried everything - linked tables, Ens.* business operations and adapters, foreign tables, and XDBC. XDBC looks like the most simple and attractive data access technology. Using foreign servers and tables you need to describe both the server and the tables. Using XDBC you work with any ODBC or JDBC data source transparently.
In particular, I have had bad experience with both Postgres linked tables and with Postgres foreign tables, especially when remote/Postgres tables contain columns of type `text` or `bytea`. In this case, IRIS often silently returns an empty dataset without any errors even if a remote table is actually not empty.
Batch updates are indeed supported by XDBC but I need to execute updates in real time without any delay, row-by-row.
I use a bunch of .code-workspace files
Hi Enrico,
I'm aware of that warning "FOR INTERNAL USE" in the source code, but the way I create XDBC connections is officially documented
The OS is Ubuntu, IRIS is containerized, the memory is consumed by the IRIS Java language server
Regarding memory leaks, I suspect that ResultSets are not closed properly - method %XDBC.Gateway.JDBC.ResultSet::Close() is empty and does nothing but it seems to be it should call ..%externalResultSet.close(). But I'm not sure it is the only reason
UPD: calling Java GC really helps:
Set gateway = $system.external.getGateway("%Java Server")
Do gateway.invoke("java.lang.System", "gc")
Your macro works perfectly, thanks a lot!
Thanks Yuri, I've tweaked Java a bit according your recommendations, I will report on the results later (right now the production hangs and I'm unable to terminate Java processes :)
Correct me if I'm wrong but latest IRIS distributions come with Java 11 which rarely requires fine-tuning
Sure, here it is:
Class User.TestService Extends EnsLib.Kafka.Service
{
Method OnProcessInput(pInput As %Net.Remote.Object, Output pOutput As %RegisteredObject) As %Status {
Set tSC = $$$OK
Try {
Set tMsg = ##class(EnsLib.Kafka.Message).%New()
Do tMsg.FromRemoteObject(pInput)
#Dim row as Test.Stat
Set row = ##class(Test.Stat).%New()
Set row.Topic = tMsg.topic
Set row.Key = tMsg.key
// row.Ts - generated automatically in the Test.Stat constructor
Do row.%Save()
$$$LOGINFO("Saved")
}
Catch (ex) {
Set tSC = ex.AsStatus()
}
Quit tSC
}
}
Hi Peter and thanks a lot for the idea. There is an index for that column. I rebuilt the index and now it is fast. So simple.
Hi Tani and thanks a lot for the info! To avoid struggling with interprocess communication, I experimented a little more and ended up with Workers and Python (see below). I'd appreciate your opinion
I finally managed to solve the problem in Python. It's not perfect but it works:
Class User.Timer Extends %RegisteredObject
{
Property Executor [ Private ];
Method Initialize(maxWorkers As %Integer = 4) [ Language = python ]
{
import concurrent.futures
import time
import threading
self.Executor = concurrent.futures.ThreadPoolExecutor(max_workers=maxWorkers)
}
Method Close() [ Language = python ]
{
if self.Executor:
self.Executor.shutdown()
}
Method Greet(name)
{
Write "Hello ", name, !
}
Method OnCallback0(methodName As %String) [ Private ]
{
Do $METHOD(instance, methodName)
}
Method OnCallback1(instance As %RegisteredObject, method As %String, arg1) [ Private ]
{
Do $METHOD(instance, method, arg1)
}
Method OnCallback2(instance As %RegisteredObject, method As %String, arg1, arg2) [ Private ]
{
Do $METHOD(instance, method, arg1, arg2)
}
Method OnCallback3(instance As %RegisteredObject, method As %String, arg1, arg2, arg3) [ Private ]
{
Do $METHOD(instance, method, arg1, arg2, arg3)
}
Method OnCallback4(instance As %RegisteredObject, method As %String, arg1, arg2, arg3, arg4) [ Private ]
{
Do $METHOD(instance, method, arg1, arg2, arg3, arg4)
}
Method OnCallback5(instance As %RegisteredObject, method As %String, arg1, arg2, arg3, arg4, arg5) [ Private ]
{
Do $METHOD(instance, method, arg1, arg2, arg3, arg4, arg5)
}
Method InternalRun(delayMs As %Integer, wait As %Boolean, instance As %RegisteredObject, method As %String, args... As %List) [ Internal, Language = python ]
{
import time
import iris
if not self.Executor:
raise Exception("The 'Initialize' method has not been called.")
def worker_function():
time.sleep(delayMs / 1000)
if len(args) == 0:
self.OnCallback0(instance, method)
elif len(args) == 1:
self.OnCallback1(instance, method, args[0])
elif len(args) == 2:
self.OnCallback2(instance, method, args[0], args[1])
elif len(args) == 3:
self.OnCallback3(instance, method, args[0], args[1], args[2])
elif len(args) == 4:
self.OnCallback4(instance, method, args[0], args[1], args[2], args[3])
elif len(args) == 5:
self.OnCallback5(instance, method, args[0], args[1], args[2], args[3], args[4])
else:
raise Exception("Too many arguments.")
return 0
future = self.Executor.submit(worker_function)
# wait == 0 means fire-and-forget
try:
if (wait == 1):
rv = future.result()
except Exception as e:
print(f"{e}")
}
/// delayMs - the parameter specifies the timer delay in milliseconds
/// wait - if the parameter is false, the process will not wait for the Future result to be returned (fire-and-forget)
/// instance - any object which method should be called with a delay
/// method - specifies the object's callback method name
/// args - the callback method arguments (up to 5)
Method Run(delayMs As %Integer, wait As %Boolean, instance As %RegisteredObject, method As %String, args... As %List)
{
Do ..InternalRun(delayMs, wait, instance, method, args...)
}
ClassMethod Test()
{
Set obj = ##class(Timer).%New()
Do obj.Initialize()
Do obj.Run(1000, 0, obj, "Greet", "John")
Do obj.Run(2000, 0, obj, "Greet", "Jessica")
Write "If 'wait == 0' this line will be printed first", !
Do obj.Close()
}
}
Hi Alexey, that's the point! I expect the following sequence:
- the worker is created in background and waits for, say, 5 seconds. The main thread isn't blocked!
- the program prints "This should be printed first"
- the waiting interval of 5 seconds expires and the worker emits its own message