Batch insert with multi-connections in Cache

Question

Question

xu zong · Nov 22, 2022

Hi,

We have 2 writer threads, each has a connection to Cache. When write records to Cache, we use PreparedStatement.executeBatch() with the batchSize 10 000.

We got very very slow insert speed.

But if only 1 writer thread used, the speed is faster, just as normal.

Any idea?

Product version: Caché 2018.1

Discussion (9)2

Log in or sign up to continue

score 0 · Answer 1 · 2022-11-22T03:57:58-05:00

I think more details are needed, looks like it is JDBC or ODBC connection, which one?

Depending on a particular task, probably there are various other options to make it even faster.

score 0 · Answer 2 · 2022-11-22T04:11:04-05:00

xu zong · Nov 22, 2022

Thanks for the reply. Here is the core testing code we use.

0 0

score 0 · Answer 3 · 2022-11-22T08:30:47-05:00

Eduard Lebedyuk · Nov 22, 2022

Try this example.

0 0

score 1 · Answer 4 · 2022-11-22T08:39:16-05:00

Without additional details it may be hard to tell for certain. If it's an issue on Cache side you might consider using $Sequence in your storage model

$SEQUENCE provides a fast way for multiple processes to obtain unique (non-duplicate) integer indices for the same global variable. For each process, $SEQUENCE allocates a sequence (range) of integer values. $SEQUENCE uses this allocated sequence to assign a value to gvar and returns this gvar value. Subsequent calls to $SEQUENCE increment to the next value in the allocated sequence for that process. When a process consumes all of the integer values in its allocated sequence, its next call to $SEQUENCE automatically allocates a new sequence of integer values. $SEQUENCE automatically determines the size of the sequence of integer values to allocate. It determines the size of the allocated sequence separately for each sequence allocation. In some cases, this allocated sequence may be a single integer.

$SEQUENCE is intended to be used when multiple processes concurrently increment the same global. $SEQUENCE allocates to each concurrent process a unique range of values for the gvar global. Each process can then call $SEQUENCE to assign sequential values from its allocated range of values.

In your class definition within the you could consider changing the IdFunction from the default which is Increment to sequence

but again more detail is needed.

score 0 · Answer 5 · 2022-11-22T09:34:30-05:00

Hi @xu zong

If your scenario allows, check the Restriction Keywords Argument in special %NOCHECK, %NOLOCK and %NOTRIGGER that can improve a lot the performance of bulk inserts/updates

score 0 · Answer 6 · 2022-11-22T21:21:05-05:00

We have 132 columns in the table, but no constraint is used, including 'unique validation' .

PS: If we change batchSize to 1000 for each writer threads, insert speed seems ok for us. Don't know why.

score 1 · Answer 7 · 2022-11-23T07:04:05-05:00

Cristiano Silva · Nov 23, 2022

In most case %NOLOCK is enough to improve performance. If is possible do a try.

1 0

score 0 · Answer 8 · 2022-11-24T00:26:26-05:00

xu zong · Nov 24, 2022

It works! Thanks a lot.

0 0

score 2 · Answer 9 · 2022-11-25T01:43:34-05:00

The explanation here is the table lock threshold. If a process modifies a certain number of entries within one table without committing it will trigger an escalation from atomic locks per record to a full table lock.

This is configurable but i am not sure about the default value but i think its between 1000 and 10000.

If the lock is triggered essentially all other processes which want to write have to wait until the process has committed its changes which will release the lock.

Solution is to use %NOLOCK if you are sure you will not hit the same record with multiple processes.