Article
· Aug 27 3m read

Importing CSV Data Into InterSystems IRIS and preserving IDs

Hi folks!

It is very easy to import CSV data into IRIS. But what if we want to preserve the original IDs in CSV?

Recently I came across with the situation when I needed to import two csv's into IRIS which were linked by one column referencing  to another csv's col: a typical Foreign Key and Primary Key situation, where csv1 contains this column as Primary Key, and csv2 as Foreign key with id's related to csv1.

The image is generated by ChatGPT so don't blame it - it tried its best to generate countries as primary keys with countries.csv-cities.csv relationship :)

I know at least three useful utilities to import CSV: csvgencsvgen-python, and bdb-sql-utils. But if I import both CSVs, e.g., via csvgen, there will be two classes in IRIS with data imported, generated internal ID, and IDKey. And it is not possible to change the IDKey to another index in the class once you have data in it. So turned out it's not that obvious how to import CSV and preserve a column with ID data as ID Key in IRIS.

Of course it is possible and I'm sure you know a lot of ways to do and it is possible now to import and preserve existing ID's in csv as ID keys in  csvgen and csvgen-python  both. To generate an iris class and import data vs a given CSV with a primary key, provide the name of the column in the pkey parameter (the last one), so the utility will add an IDKey, PrimaryKey index to the class. E.g. if we import countries.csv and want to make the Name column an IDKey and Primary Key call csvgen as follows:

//primary key name is the 11th parameter :)
zw ##class(community.csvgen).Generate("path/to/countries.csv,",","package.Countries",,,,,,,,"Name")

What it does under the hood can be listed as follows:

  • - generates class with properties as usual
  • - deletes all data,
  • - deletes the  DDLBEIndex bitmap if exists (it prevents from creating an alternative IDKey to existing one),
  • - sets (temporary) the system-wide option DDLPKeyNotIDKey=0
  • - adds Primary Key index for a given column with name provided

And as a result, you have a newly generated class with data and primary key ID key for a given column name.

Here is the code in csvgen.

So, how do you connect two generated classes. In my case I needed to have swizzling of class1 instances in class2.property. So I just renamed the datatype in generated class to a class with Primary Key-IdKey.

Here is the example demo app that analyses potato consumption and import in different countries (don't ask me why I invented such an example - maybe I was hungry). Countries are real, but the consumption is generated by gpt he/she said it is close to reality, as it turned out that it is pretty dificult to obtain this data. 

Here is the countries.csv and potatos_sales.csv.

This is how I import data and generate classes:

   

    zpm "install csvgen"
    set file="/home/irisowner/dev/data/countries.csv"
    zw ##class(community.csvgen).Generate(file,",","esh.csvpkey.Countries",,,,,,,,"Name")
    set file="/home/irisowner/dev/data/potato_sales.csv"
    zw ##class(community.csvgen).Generate(file,",","esh.csvpkey.Potatos",,,,,1)

It generates a countries class with PrimaryKey:

Class esh.csvpkey.Countries Extends %Persistent [ ClassType = persistent, DdlAllowed, Final, Owner = {irisowner}, ProcedureBlock, SqlRowIdPrivate, SqlTableName = Countries ]
{

Property Name As %Library.String(MAXLEN = 250) [ SqlColumnNumber = 2 ];
Property Code As %Library.String(MAXLEN = 250) [ SqlColumnNumber = 3 ];
...

Index COUNTRIESPKEY1 On Name [ IdKey, PrimaryKey, SqlName = COUNTRIES_PKEY1, Unique ];
...

}

Then I changed the generated property countries from %String to reference the countries class:

Property Country As esh.csvpkey.Countries [ SqlColumnNumber = 2 ];

And I've built a very obvious IRIS BI/DSW demo to see how it's going with potatoes in countries through the years:

Hope you found this helpful and entertaining ;)

Discussion (0)1
Log in or sign up to continue