Article
· Aug 3, 2020

Creating classes/tables with more than 999 properties in InterSystems IRIS

InterSystems IRIS currently limits classes to 999 properties.

But what to do if you need to store more data per object?

This article answers that question (with a cameo by Community Python Gateway, showing how you can transfer wide datasets into Python).

The answer is actually quite simple: InterSystems IRIS limits classes to 999 properties, but not to 999 primitives. A property in InterSystems IRIS can itself be an object with 999 properties of its own, and so on, so the limit can easily be worked around.

Approach 1.

Store 100 properties per serial property. First, create a serial class that holds a hundred properties.

Class Test.Serial Extends %SerialObject
{

Property col0;
...
Property col99;
}

And in your main class, add as many properties as you need:

Class Test.Record Extends %Persistent
{
Property col00 As Test.Serial;

Property col01 As Test.Serial;
...

Property col63 As Test.Serial;
}

This nominally raises your limit to 99,900 properties (in practice, compiler limits kick in sooner; see the discussion below).

This approach offers uniform access to all properties via both the SQL and object layers (we always know a property's reference by its number).
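For example, the serial sub-properties project to SQL as flattened columns (a sketch assuming the class definitions above and the default serial projection, where sub-property names are joined with an underscore), while object access simply drills down one level:

-- SQL: serial sub-properties appear as flattened columns
SELECT col00_col0, col63_col99 FROM Test.Record

; ObjectScript: access the sub-property through the serial property
set obj = ##class(Test.Record).%OpenId(1)
write obj.col00.col0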

Approach 2.

One $lb property.

Class Test.Record Extends %Persistent
{
Property col As %List;
}

This approach is simpler but does not provide explicit column names.

Use the SQL $LIST* functions to access list elements.
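For example (a sketch assuming the class above; $LISTGET is one of the SQL $LIST* functions, and $lb/$list are their ObjectScript counterparts):

-- SQL: read the fifth list element
SELECT $LISTGET(col, 5) AS col5 FROM Test.Record

; ObjectScript: build and read the list directly
set obj = ##class(Test.Record).%New()
set obj.col = $lb(10, 20, 30)
write $list(obj.col, 2)  ; 20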

Approach 3.

Use a collection (List Of / Array Of) property.

Class Test.Record Extends %Persistent
{
Property col As List Of %Integer;
}

This approach also does not provide explicit column names for individual values (but do you really need them?). Use property parameters to project the property as an SQL column or child table.

Docs for collection properties.
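For example, a list collection can be projected as its own child table (a sketch; SQLPROJECTION is a standard collection-property parameter, and Test.Record_col is the default name for the projected child table):

Property col As List Of %Integer(SQLPROJECTION = "table/column");

; ObjectScript: collections use Insert/GetAt
do obj.col.Insert(42)
write obj.col.GetAt(1)  ; 42

-- SQL: query the projected child table
SELECT * FROM Test_Record.Record_col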

Approach 4.

Do not create properties at all; expose the values via an SQL stored procedure and %DispatchGetProperty.

Class Test.Record Extends %Persistent
{

Parameter GLVN = {..GLVN("Test.Record")};

/// SELECT Test_Record.col(ID, 123) 
/// FROM Test.Record
///
/// w ##class(Test.Record).col(1, 123)
ClassMethod col(id, num) As %Decimal [ SqlProc ]
{
    #define GLVN(%class) ##Expression(##class(Test.Record).GLVN(%class))
    quit $lg($$$GLVN("Test.Record")(id), num + 1)
}

/// Refer to properties as: obj.col123 
Method %DispatchGetProperty(Property As %String) [ CodeMode = expression ]
{
..col(..%Id(), $e(Property, 4, *))
}


/// Get data global
/// w ##class(Test.Record).GLVN("Test.Record")
ClassMethod GLVN(class As %Dictionary.CacheClassname = {$classname()}) As %String
{
    return:'$$$comClassDefined(class) ""
    set strategy = $$$comClassKeyGet(class, $$$cCLASSstoragestrategy)
    return $$$defMemberKeyGet(class, $$$cCLASSstorage, strategy, $$$cSDEFdatalocation)
}

The trick here is to store everything in the main $lb node and use unallocated slots in the storage schema for your data. Here's an article on global storage.
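To make the layout concrete, here is a sketch of what a single record node looks like under this scheme (assuming the default data global ^Test.RecordD; the first list position is reserved for %%CLASSNAME, which is why the col() method reads position num + 1):

; a record is one $lb node in the data global:
; ^Test.RecordD(1) = $lb("", <col0 value>, <col1 value>, ...)
w $listget(^Test.RecordD(1), 2)  ; value of column 0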

With this approach, you can also easily transfer the data into a Python environment with Community Python Gateway via the ExecuteGlobal method.

This is also the fastest way to import CSV files due to the similarity of the structures.

 

Conclusion

The 999-property limit can easily be worked around in InterSystems IRIS.

 

Do you know other approaches to storing wide datasets? If so, please share them!

Discussion (13)

While I always advertise CSV2CLASS methods for generic solutions, wide datasets often possess an (un)fortunate characteristic of also being long.

In that case, a custom object-less parser works better.

Here's how it can be implemented.

1. Align storage schema with CSV structure

2. Modify this snippet for your class/CSV file:

Parameter GLVN = {..GLVN("Test.Record")};

Parameter SEPARATOR = ";";

ClassMethod Import(file = "source.csv", killExtent As %Boolean = {$$$YES})
{
    set stream = ##class(%Stream.FileCharacter).%New()
    do stream.LinkToFile(file)
    
    kill:killExtent @..#GLVN
    
    set i=0
    set start = $zh
    while 'stream.AtEnd {
        set i = i + 1
        set line = stream.ReadLine($$$MaxStringLength)
        
        set @..#GLVN($i(@..#GLVN)) = ..ProcessLine(line)
        
        write:'(i#100000) "Processed: ", i, !
    }
    set end = $zh
    
    write "Done",!
    write "Time: ", end - start, !
}

ClassMethod ProcessLine(line As %String) As %List
{
    set list = $lfs(line, ..#SEPARATOR)
    set list2 = ""
    set ptr=0
    
    // NULLs and numbers handling.
    // Add generic handlers here.
    // For example translate "N/A" value into $lb() if that's how source data rolls
    while $listnext(list, ptr, value) {
        set list2 = list2 _ $select($g(value)="":$lb(), $ISVALIDNUM(value):$lb(+value), 1:$lb(value))
    }

    // Add specific handlers here
    // For example convert date into horolog in column4

    // Add %%CLASSNAME
    set list2 = $lb() _ list2
    
    quit list2
}

Wide datasets are fairly typical for:

  • Industrial data
    • IoT
    • Sensors data
    • Mining and processing data
    • Spectrometry data
  • Analytical data
    • Most datasets after one-hot encoding is applied
    • NLP datasets
    • Any dataset where we need to raise dimensionality
    • Media featuresets
  • Social Network/modelling schemas

I'm fairly sure there are more areas, but I have not encountered them myself.

Recently I delivered a PoC with classes more than 6,400 columns wide, and that's where I got the inspiration for this article (I chose Approach 4).

@Renato Banzai also wrote an excellent article on his project with more than 999 properties.

Overall I'd like to say that a class with more than 999 properties is a correct design in many cases.

My EAV implementation is the same as your Approach 3, so it will work fine even with all 4,000,000 attributes filled.
Since a string has a limit of 3,641,144 characters, the serial and %List approaches are ruled out.

All other things being equal, everything depends on the specific technical task: speed, support for Objects/SQL, the ability to name each attribute, the number of attributes, and so on.

Approach 1 doesn't "raise your limit to 99,900 properties" but rather to about 6,600 properties.

You can test it with large.utils.cls:

w ##class(large.utils).init(66,100)
deleting large.serial100
creating large.serial100
compiling large.serial100

Compilation started on 07/31/2023 14:48:16
Compiling class large.serial100
Compiling routine large.serial100.1
Compilation finished successfully in 0.218s.

creating large.c66
compiling large.c66

Compilation started on 07/31/2023 14:48:16
Compiling class large.c66
Compiling table large.c66
Compiling routine large.c66.1
Compilation finished successfully in 8.356s.

1
w ##class(large.utils).init(67,100)
deleting large.serial100
creating large.serial100
compiling large.serial100

Compilation started on 07/31/2023 14:48:27
Compiling class large.serial100
Compiling routine large.serial100.1
Compilation finished successfully in 0.213s.

creating large.c67
compiling large.c67

Compilation started on 07/31/2023 14:48:27
Compiling class large.c67
Compiling table large.c67
Compiling routine large.c67.1
ERROR #5002: ObjectScript error: <MAXSTRING>CompileRtns+286^%occRoutine
ERROR #5002: ObjectScript error: <NOROUTINE>DescribePhase2+9^%occSysDescriptor *large.c67.1
Detected 2 errors during compilation in 6.896s.

IRISAPP>