Article · Feb 29 · 4 min read

Testing Columnar Storage

As most of you probably already know, InterSystems IRIS has included columnar storage in its database since roughly the end of 2022. In today's article we are going to put it to the test against the usual row storage.

Columnar Storage

What is the main characteristic of this type of storage? If we consult the official documentation, we find a table that summarizes the main characteristics of both types of storage (by rows or by columns):

As you can see, columnar storage is designed primarily for analytical tasks in which queries target specific fields of our table, while row storage is more optimal when a large number of insert, update and delete operations are required, as well as when retrieving complete records.
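A toy Python sketch (not how IRIS implements it internally, just an illustration of the idea) shows why this is: with a row layout each record is stored and read as a whole, while with a columnar layout an aggregate such as AVG only has to scan one contiguous list:

```python
# Toy illustration of row vs. columnar layout (NOT IRIS internals).
orders = [
    {"Reference": 1, "Amount": 100.0, "Status": "SENT"},
    {"Reference": 2, "Amount": 250.0, "Status": "PAID"},
    {"Reference": 3, "Amount": 175.0, "Status": "SENT"},
]

# Row storage: each record is kept (and read) as a whole tuple.
row_layout = [tuple(o.values()) for o in orders]

# Columnar storage: each field is kept contiguously on its own.
columnar_layout = {
    "Reference": [o["Reference"] for o in orders],
    "Amount":    [o["Amount"] for o in orders],
    "Status":    [o["Status"] for o in orders],
}

# AVG(Amount) only needs to scan the Amount column in the columnar
# layout, instead of reading every full record.
avg_amount = sum(columnar_layout["Amount"]) / len(columnar_layout["Amount"])
print(avg_amount)  # 175.0
```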

If you continue reading the documentation, you will see how simple it is to configure our table to use columnar storage:

CREATE TABLE table (column type, column2 type2, column3 type3) WITH STORAGETYPE = COLUMNAR

With this command we define all the columns of our table with columnar storage, but we could opt for a mixed model in which our table uses row storage while certain columns use columnar storage.

This mixed scenario can be interesting when aggregation operations such as sums, averages, etc. are common on particular columns. In that case we can define which column will use columnar storage:

CREATE TABLE table (column type, column2 type2, column3 type3 WITH STORAGETYPE = COLUMNAR)

In the previous example we defined a table with row storage and a column (column3) with columnar storage.

Comparative

To compare the time spent by columnar storage and row storage on different queries, we have created a small exercise using a Jupyter Notebook. It inserts a series of generated records into two tables: the first with row storage (Test.PurchaseOrderRow) and the second with columnar storage in two of its columns (Test.PurchaseOrderColumnar).

Test.PurchaseOrderRow

CREATE TABLE Test.PurchaseOrderRow (
    Reference INTEGER,
    Customer VARCHAR(225),
    PaymentDate DATE,
    Vat NUMERIC(10,2),
    Amount NUMERIC(10,2),
    Status VARCHAR(10))

Test.PurchaseOrderColumnar

CREATE TABLE Test.PurchaseOrderColumnar (
    Reference INTEGER,
    Customer VARCHAR(225),
    PaymentDate DATE,
    Vat NUMERIC(10,2),
    Amount NUMERIC(10,2) WITH STORAGETYPE = COLUMNAR,
    Status VARCHAR(10) WITH STORAGETYPE = COLUMNAR)

If you download the Open Exchange project and deploy it on your local Docker, you can access the Jupyter Notebook instance and review the file PerformanceTests.ipynb. It is responsible for generating the random data that we will store in our tables in several phases, and it finally shows a graph with the performance of the query operations.

Let's take a quick look at our project configuration:

docker-compose.yml

version: '3.7'
services:
  # iris
  iris:
    init: true
    container_name: iris
    build:
      context: .
      dockerfile: iris/Dockerfile
    ports:
      - 52774:52773
      - 51774:1972
    volumes:
    - ./shared:/shared
    environment:
    - ISC_DATA_DIRECTORY=/shared/durable
    command: --check-caps false --ISCAgent false
  # jupyter notebook
  jupyter:
    build:
      context: .
      dockerfile: jupyter/Dockerfile
    container_name: jupyter
    ports:
      - "8888:8888"
    environment:
      - JUPYTER_ENABLE_LAB=yes
      - JUPYTER_ALLOW_INSECURE_WRITES=true
    volumes:
      - ./jupyter:/home/jovyan
      - ./data:/app/data
    command: "start-notebook.sh --NotebookApp.token='' --NotebookApp.password=''" 

We deploy the IRIS and Jupyter containers with Docker Compose, initially configuring IRIS with the namespace "TEST" and the two tables required for the test.

To avoid boring you with code, you can consult the PerformanceTests.ipynb file, from which we connect to IRIS, generate the records to be inserted and store them in IRIS.
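As a rough idea of what such a generator might look like: the field names below match the tables defined above, but the value ranges, the 21% VAT rate and the status labels are assumptions for illustration, not the notebook's actual code.

```python
import random
from datetime import date, timedelta

# Hypothetical data generator in the spirit of PerformanceTests.ipynb.
# Field order matches the table columns: Reference, Customer,
# PaymentDate, Vat, Amount, Status. Ranges and labels are assumptions.
STATUSES = ["PENDING", "SENT", "PAID"]

def generate_orders(n, seed=42):
    rng = random.Random(seed)  # fixed seed for reproducible batches
    start = date(2023, 1, 1)
    rows = []
    for ref in range(1, n + 1):
        amount = round(rng.uniform(10, 1000), 2)
        rows.append((
            ref,                                # Reference
            f"Customer {rng.randint(1, 500)}",  # Customer
            (start + timedelta(days=rng.randint(0, 364))).isoformat(),  # PaymentDate
            round(amount * 0.21, 2),            # Vat (assumed 21% of Amount)
            amount,                             # Amount
            rng.choice(STATUSES),               # Status
        ))
    return rows

batch = generate_orders(1000)
print(len(batch))  # 1000
```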

Test execution

The results were as follows (in seconds):

Inserts:

The insertions are performed in bulk using a parameterized statement:

INSERT INTO Test.PurchaseOrderColumnar (Reference, Customer, PaymentDate, Vat, Amount, Status) VALUES (?, ?, ?, ?, ?, ?)
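The timing pattern behind these measurements is the standard DB-API one: build the batch, call executemany, and measure with a monotonic clock. The sketch below uses sqlite3 as a stand-in connection so it runs anywhere; against IRIS you would open the connection through its own Python driver instead.

```python
import sqlite3
import time

# sqlite3 stands in for the IRIS connection here so the sketch is
# runnable anywhere; the executemany/timing pattern is the same.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE PurchaseOrder (
    Reference INTEGER, Customer TEXT, PaymentDate TEXT,
    Vat REAL, Amount REAL, Status TEXT)""")

rows = [(i, f"Customer {i}", "2023-01-01", 2.1, 10.0, "SENT")
        for i in range(1000)]

start = time.perf_counter()
conn.executemany(
    "INSERT INTO PurchaseOrder (Reference, Customer, PaymentDate, Vat, Amount, Status) "
    "VALUES (?, ?, ?, ?, ?, ?)", rows)
conn.commit()
elapsed = time.perf_counter() - start
print(f"{len(rows)} rows inserted in {elapsed:.6f}s")
```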

And the time for each batch of inserts is as follows:

| Total inserts | Row storage (s) | Mixed storage (s) |
|---------------|-----------------|-------------------|
| 1000          | 0.031733        | 0.041677          |
| 5000          | 0.159338        | 0.185252          |
| 20000         | 0.565775        | 0.642662          |
| 50000         | 1.486459        | 1.747124          |
| 100000        | 2.735016        | 3.265492          |
| 200000        | 5.395032        | 6.382278          |
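We can sanity-check the insert overhead directly from the measurements, dividing the mixed-storage time by the row-storage time at each batch size:

```python
# Batch times (seconds) from the insert table above.
row_s   = [0.031733, 0.159338, 0.565775, 1.486459, 2.735016, 5.395032]
mixed_s = [0.041677, 0.185252, 0.642662, 1.747124, 3.265492, 6.382278]

# Relative overhead of mixed storage over pure row storage.
overheads = [m / r - 1 for r, m in zip(row_s, mixed_s)]

# Largest batch: 6.382278 / 5.395032 - 1 ≈ 0.183, i.e. ~18% slower.
print(f"{overheads[-1]:.1%}")  # 18.3%
```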

Selects:

The SELECT launched includes an aggregation function and a condition, both on columns with columnar storage:

SELECT AVG(Amount) FROM Test.PurchaseOrderColumnar WHERE Status = 'SENT'

| Total rows | Row storage (s) | Mixed storage (s) |
|------------|-----------------|-------------------|
| 1000       | 0.002039        | 0.001178          |
| 5000       | 0.00328         | 0.000647          |
| 20000      | 0.005493        | 0.001555          |
| 50000      | 0.016616        | 0.000987          |
| 100000     | 0.036112        | 0.001605          |
| 200000     | 0.070909        | 0.002738          |
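The same arithmetic on the SELECT timings shows where the speedup figure in the conclusions comes from:

```python
# Query times (seconds) from the SELECT table above.
row_s   = [0.002039, 0.00328, 0.005493, 0.016616, 0.036112, 0.070909]
mixed_s = [0.001178, 0.000647, 0.001555, 0.000987, 0.001605, 0.002738]

# How many times faster the columnar (mixed) table answers each query.
speedups = [r / m for r, m in zip(row_s, mixed_s)]

# At 200,000 rows: 0.070909 / 0.002738 ≈ 25.9x faster.
print(f"{speedups[-1]:.1f}x")  # 25.9x
```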

Conclusions

As you can see from the results, the behavior is exactly what the documentation indicates. Including columns with columnar storage slightly penalized insert performance (about 18% slower in our example), while queries on those same columns dramatically improved their response time (about 26 times faster at 200,000 rows).

It is undoubtedly something to take into account when planning the development of any application.

Discussion (4)
Just in case I misunderstood you:

  • this is a storage concept for Objects and SQL
  • it is pure Globals and ObjectScript (M) at the core
  • take a look at the Globals used in the example and you will see everything you are looking for
  • and it is just ObjectScript that runs behind the scenes

So: what are you looking for?
And BTW, this concept has been available for 4+ decades. It just had no Objects, no SQL and no fancy name back then.

Hi @Luis Angel Pérez Ramos
In fact I got the same values with my IRIS Community Edition.

Test Columnar vs. Row Storage
=============================
     1 - Initialize Tables
     2 - Generate Data
     3 - Compare SELECT
     4 - Loop SELECT
     5 - Auto Loop
Select Function or * to exit : 5 Loops to run :25
Set steps by loop
Records to add (1...10000)[1]:10000 
records = 15000 row = .033238 col = .044981
records = 25000 row = .007728 col = .000254
records = 35000 row = .011427 col = .000335
records = 45000 row = .014625 col = .000406
records = 55000 row = .018682 col = .000500
records = 65000 row = .023468 col = .000562
records = 75000 row = .026235 col = .000659
records = 85000 row = .029151 col = .000738
records = 95000 row = .032212 col = .000794
records = 105000 row = .035926 col = .000856
records = 115000 row = .039431 col = .000934
records = 125000 row = .043036 col = .001008
records = 135000 row = .049134 col = .001074
records = 145000 row = .050405 col = .001404
records = 155000 row = .054313 col = .001669
records = 165000 row = .058039 col = .001380
records = 175000 row = .060756 col = .001384
records = 185000 row = .064746 col = .001451
records = 195000 row = .068403 col = .001665
records = 205000 row = .070737 col = .001642
records = 215000 row = .073610 col = .001690
records = 225000 row = .078551 col = .001797
records = 235000 row = .084139 col = .001997
records = 245000 row = .087316 col = .001908
records = 255000 row = .087862 col = .002546
records = 265000 row = .090478 col = .002152