XDBC memory leaks
I have a business service that actively reads data from a remote Postgres database. OnProcessInput opens an XDBC (actually JDBC) connection, executes an SQL query, fetches several thousand rows, iterates over the result set, and closes the connection. On each iteration I also need to update the source row in the remote database using a PreparedStatement.
In other words, every OnProcessInput call runs one long-running SELECT statement and several thousand small UPDATE statements.
The problems I'm facing are:
- I cannot execute an UPDATE statement via the same connection that was previously used to select rows. IRIS raises exceptions like "<NOTOPEN> 34 %PreInvoke+1^%Net.Remote.Object.1". Evidently, a separate connection instance is needed for updates.
- In the Java/.NET world, a database connection is a reusable resource managed by a pool. But if I start creating and destroying connections frequently, I see roughly a 10x drop in the performance of UPDATE operations:
Method ReadRows()
{
    Set cnn = ##class(%XDBC.Gateway.Connection).GetConnection("postgres")
    Set st = cnn.CreateStatement()
    Set rs = st.ExecuteQuery("select * from my_postgres_table limit 10000")
    While (rs.Next()) {
        Set id = rs.Get("id")
        Do ..UpdateRow(id, "S")
    }
    Do rs.Close()
    Do st.Close()
    Do cnn.Close()
}
/// DOES IRIS LEVERAGE JAVA/JDBC connection pooling?
Method UpdateRow(id As %String, status As %String)
{
    Set cnn = ##class(%XDBC.Gateway.Connection).GetConnection("postgres")
    Set st = cnn.PrepareStatement("update my_postgres_table set status = ? where id = ?")
    ...
    // Set parameters, execute UPDATE
    Do st.Close()
    Do cnn.Close()
    Set cnn = ""
}
- This approach also results in a rapid increase in memory consumption on the IRIS server. Pretty soon IRIS eats up all the memory and I have to restart the Language Server.
- If I create longer-lived connections, UPDATE performance is much better, but the memory leak still occurs, just more slowly.
What am I doing wrong? How can I avoid memory leaks while achieving maximum performance for data update operations in the remote database?
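To make the pooling question concrete: in the Java world, close() on a pooled connection just returns it to the pool instead of tearing down the physical connection, so thousands of row-by-row UPDATEs pay the connect cost only once. A minimal sketch of the idea using mock objects (this is not the IRIS/XDBC or JDBC API, just an illustration of what pooling buys):

```java
import java.util.ArrayDeque;

// Mock objects only -- NOT the IRIS implementation. The point: release()
// returns a connection to the pool instead of destroying it, so the
// expensive physical connect happens once, not once per UPDATE.
class MockConnection {
    // stands in for an expensive physical database connection
}

class SimplePool {
    private final ArrayDeque<MockConnection> idle = new ArrayDeque<>();
    private int physicalConnects = 0;

    MockConnection acquire() {
        MockConnection c = idle.poll();
        if (c == null) {            // pool empty: pay the connect cost
            physicalConnects++;
            c = new MockConnection();
        }
        return c;
    }

    void release(MockConnection c) {
        idle.push(c);               // returned to the pool, not destroyed
    }

    int physicalConnects() {
        return physicalConnects;
    }
}

public class PoolDemo {
    // Simulates UpdateRow() being called once per fetched row.
    public static int connectsFor(int rows) {
        SimplePool pool = new SimplePool();
        for (int i = 0; i < rows; i++) {
            MockConnection c = pool.acquire();
            // ... execute one UPDATE via c ...
            pool.release(c);
        }
        return pool.physicalConnects();
    }

    public static void main(String[] args) {
        System.out.println(connectsFor(10000)); // prints 1: one physical connect
    }
}
```

Whether the XDBC gateway does this internally is exactly the question; the measurements above (10x slowdown with per-row connections) suggest each GetConnection call was paying a real setup cost.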
UPD: the problem is solved. In short, XDBC was communicating with the remote database via the %Java Server gateway rather than the %JDBC Server one, and it is absolutely necessary to specify the path to the Postgres JDBC driver in that gateway's ClassPath. Without it, the code still appears to work, but it leaks memory.
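For anyone hitting the same thing: in recent IRIS versions the external language server definitions live in the Management Portal (System Administration > Configuration > Connectivity > External Language Servers); edit the %Java Server entry and add the driver jar to its Class Path field. The path below is purely illustrative; use wherever your driver jar actually lives:

Class Path: /path/to/postgresql-<version>.jar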
Comments
I'm afraid I don't have an answer. I use JDBC regularly (admittedly not Postgres, and NOT using the %XDBC classes) and I've never seen this behavior.
If anyone wants to dive deeper into this issue, more information is required to analyze it and maybe try to reproduce it, such as:
"....rapid increase in memory consumption by the IRIS server" What process is consuming memory?
What OS, Windows or Linux? If Linux, what flavor?
What Java version are you using?
What version of Postgres JDBC driver are you using?
Last but not least, note that class %XDBC.Gateway says:
"FOR INTERNAL USE - do not invoke directly"
Hi Enrico,
- I'm aware of that "FOR INTERNAL USE" warning in the source code, but the way I create XDBC connections is officially documented.
- The OS is Ubuntu, IRIS is containerized, and the memory is consumed by the IRIS Java language server.
- Regarding the memory leak, I suspect that ResultSets are not closed properly: method %XDBC.Gateway.JDBC.ResultSet::Close() is empty and does nothing, although it seems it should call ..%externalResultSet.close(). But I'm not sure that is the only reason.

UPD: explicitly invoking the Java GC really helps:
Set gateway = $system.external.getGateway("%Java Server")
Do gateway.invoke("java.lang.System", "gc")
Why don't you modify your query to execute just one massive SQL statement? An UPDATE with the SELECT in the WHERE clause.
Batch updates are indeed supported by XDBC, but I need to execute updates in real time, row by row, without any delay.
But a single batch update will still be closer to real time than a loop issuing thousands of individual UPDATEs.
Have you tried the proper, official, documented and supported way to do this, using Foreign Tables?
Does the memory leak happen when using Foreign Tables?
Maybe the way you are using the "FOR INTERNAL USE" classes is not proper/correct.
Why use something you are not supposed to use when there is a proper/supported way to do this?
1. It is officially stated that XDBC is the most modern remote data access technology.
2. I use remote data sources very extensively and have tried everything: linked tables, Ens.* business operations and adapters, foreign tables, and XDBC. XDBC looks like the simplest and most attractive data access technology. With foreign servers and tables you need to describe both the server and the tables; with XDBC you work with any ODBC or JDBC data source transparently.
In particular, I have had bad experiences with both Postgres linked tables and Postgres foreign tables, especially when the remote Postgres tables contain columns of type `text` or `bytea`. In those cases, IRIS often silently returns an empty dataset without any errors, even when the remote table is not actually empty.
You may reduce the number of XDBC connects by 50%
#1)
Create an SQL function for the update:

CREATE FUNCTION done_func(integer, varchar) RETURNS integer
AS 'update my_postgres_table set status = $2 where id = $1; select 1'
LANGUAGE SQL
VOLATILE
RETURNS NULL ON NULL INPUT;

#2)
Run your SELECT, adding the function to a WHERE clause which is always TRUE:

select * from my_postgres_table where 1 = done_func(id, 'S') limit 10000

You should check the exact syntax for PostgreSQL,
I just composed it guided by
https://www.postgresql.org/docs/current/sql-createfunction.html
Have you tried limiting the memory of the Java process? In the "JVM arguments" field of the external language server settings, add a parameter like "-Xmx16g" to limit the memory to 16 GB. Have a google for "JVM arguments xmx"; there are loads of memory settings to play with.
No, JVM settings are not the reason.
It seems to me the reason is the way the XDBC Java wrapper is implemented. There are two things I don't really like about the code:
1. Statements created by an instance of an XDBC Connection object are put into a List which keeps growing and growing. Even if I call Statement.close(), the statement is not removed from the list immediately; the list is cleared only when Connection.close() is called.
2. Less likely, but this could also be the cause of the problem: there are no ResultSet.close() calls. The specification says:
https://docs.oracle.com/javase/7/docs/api/java/sql/Connection.html#close()
"It is strongly recommended that an application explicitly commits or rolls back an active transaction prior to calling the close method. If the close method is called and there is an active transaction, the results are implementation-defined."
Modern JDBC drivers try to close dependent objects in cascade on Connection.close(), but that is not always the case, and it is recommended to always close them explicitly.
In particular, someone complained about a similar issue with the Postgres JDBC driver here. And in the XDBC code I see that it tries to close and release all Statement instances, but not ResultSets. On the other hand, the Postgres driver maintainers reported that this particular issue was fixed.
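Point 1 can be illustrated with a small mock (this is not the actual XDBC source, just the retention pattern described above): closing a statement does not shrink the connection's internal list, so thousands of per-row statements stay reachable until the connection itself is closed.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative mock of the behaviour described in point 1 -- NOT the real
// XDBC code: the connection tracks every statement it ever created, and
// Statement.close() does not remove the entry from that tracking list.
public class StatementLeakDemo {

    static class TrackedStatement {
        boolean closed = false;
        void close() { closed = true; }   // closed, but still referenced
    }

    static class TrackingConnection {
        final List<TrackedStatement> statements = new ArrayList<>();

        TrackedStatement createStatement() {
            TrackedStatement st = new TrackedStatement();
            statements.add(st);           // retained until Connection.close()
            return st;
        }

        void close() {
            statements.clear();           // only now can the GC reclaim them
        }
    }

    // One statement per row, each closed immediately -- as in UpdateRow().
    public static int retainedAfter(int rows) {
        TrackingConnection cnn = new TrackingConnection();
        for (int i = 0; i < rows; i++) {
            TrackedStatement st = cnn.createStatement();
            st.close();                   // does NOT shrink cnn.statements
        }
        return cnn.statements.size();
    }

    public static void main(String[] args) {
        System.out.println(retainedAfter(10000)); // 10000 objects still held
    }
}
```

This is consistent with the observed behavior: a long-lived connection leaks slowly (the list drains only on Connection.close()), and forcing a GC alone cannot help while the references are still held.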