Eduard Lebedyuk · Aug 11, 2020

By default IRIS listens on all interfaces.

Are you able to access SMP from a remote machine?
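One quick way to answer that question from the remote machine is a plain TCP connection test. This is a minimal sketch in Python, not an IRIS tool: the helper name `can_connect` is hypothetical, and the host and port you pass in depend on your installation (52773 is assumed here only as the usual default web server port for the Management Portal).

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the host and performs the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures
        return False
```

For example, `can_connect("iris-host.example", 52773)` (a hypothetical host name) returning False from a remote machine while the same check succeeds locally points at a firewall or network problem rather than at IRIS itself.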

Eduard Lebedyuk · Aug 5, 2020

That's all well and good for sparse datasets (say, a record with 10 000 possible attributes of which only 50 are filled on average).

EAV does not help in dense cases where every record actually has 10 000 attributes.
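A rough way to see the difference is to count stored cells. The sketch below is a back-of-the-envelope illustration in Python, not a benchmark: the function names, the 1000-record sample size, and the assumption that each EAV row stores three values (entity, attribute, value) are mine, and it ignores how a real engine stores empty values.

```python
def wide_cells(records: int, attributes: int) -> int:
    """Cells in a wide table: one cell per attribute per record, filled or not."""
    return records * attributes


def eav_cells(records: int, filled_per_record: int) -> int:
    """Cells in an EAV table: one (entity, attribute, value) row per filled attribute."""
    return records * filled_per_record * 3


RECORDS = 1000  # arbitrary sample size for the illustration

# Sparse case from the post: 10 000 possible attributes, about 50 filled
sparse_wide = wide_cells(RECORDS, 10_000)
sparse_eav = eav_cells(RECORDS, 50)

# Dense case: every record has all 10 000 attributes
dense_wide = wide_cells(RECORDS, 10_000)
dense_eav = eav_cells(RECORDS, 10_000)
```

Under these assumptions EAV stores far fewer cells in the sparse case, but in the dense case it stores three times as many as the wide layout.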

Eduard Lebedyuk · Aug 5, 2020

Wide datasets are fairly typical for:

  • Industrial data
    • IoT
    • Sensor data
    • Mining and processing data
    • Spectrometry data
  • Analytical data
    • Most datasets after one-hot encoding is applied
    • NLP datasets
    • Any dataset where we need to raise dimensionality
    • Media featuresets
  • Social Network/modelling schemas

I'm fairly sure there are more areas, but I have not encountered them myself.
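To illustrate why one-hot encoding in particular produces wide datasets, here is a minimal Python sketch. The `one_hot` helper and the sample rows are hypothetical; real projects would normally use a library encoder instead.

```python
def one_hot(rows, column):
    """Replace a categorical column with one 0/1 column per distinct value."""
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        # Copy every column except the one being encoded
        new_row = {key: value for key, value in row.items() if key != column}
        for category in categories:
            new_row[f"{column}={category}"] = 1 if row[column] == category else 0
        encoded.append(new_row)
    return encoded


rows = [
    {"id": 1, "city": "Boston"},
    {"id": 2, "city": "Berlin"},
    {"id": 3, "city": "Tokyo"},
]
encoded = one_hot(rows, "city")
```

Each distinct value becomes its own column, so a single categorical column with thousands of distinct values turns into thousands of columns.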

I recently delivered a PoC with classes more than 6400 columns wide, and that's where I got my inspiration for this article (I chose approach 4).

@Renato.Banzai also wrote an excellent article on his project with more than 999 properties.

Overall I'd like to say that a class with more than 999 properties is a correct design in many cases.

Eduard Lebedyuk · Aug 4, 2020

While I always advertise CSV2CLASS methods for generic solutions, wide datasets often possess an (un)fortunate characteristic of also being long.

In that case a custom object-less parser works better.

Here's how it can be implemented.

1. Align storage schema with CSV structure

2. Modify this snippet for your class/CSV file:

Parameter GLVN = {..GLVN("Test.Record")};

Parameter SEPARATOR = ";";

ClassMethod Import(file = "source.csv", killExtent As %Boolean = {$$$YES})
{
    set stream = ##class(%Stream.FileCharacter).%New()
    do stream.LinkToFile(file)
    
    kill:killExtent @..#GLVN
    
    set i=0
    set start = $zh
    while 'stream.AtEnd {
        set i = i + 1
        set line = stream.ReadLine($$$MaxStringLength)
        
        // Subscript indirection (@name@(subscript)) appends the row under the next ID
        set @..#GLVN@($i(@..#GLVN)) = ..ProcessLine(line)
        
        write:'(i#100000) "Processed:", i, !
    }
    set end = $zh
    
    write "Done",!
    write "Time: ", end - start, !
}

ClassMethod ProcessLine(line As %String) As %List
{
    set list = $lfs(line, ..#SEPARATOR)
    set list2 = ""
    set ptr=0
    
    // NULLs and numbers handling.
    // Add generic handlers here.
    // For example translate "N/A" value into $lb() if that's how source data rolls
    while $listnext(list, ptr, value) {
        set list2 = list2 _ $select($g(value)="":$lb(), $ISVALIDNUM(value):$lb(+value), 1:$lb(value))
    }

    // Add specific handlers here
    // For example convert date into horolog in column4

    // Add %%CLASSNAME
    set list2 = $lb() _ list2
    
    quit list2
}
Eduard Lebedyuk · Aug 3, 2020

Restarting is the easiest way.

If you can't restart, just overwrite the global buffer with another global (but check that your target global has really been flushed out of the global buffer).

Eduard Lebedyuk · Jul 28, 2020

Python Gateway - invoke Python code snippets for your analytics and machine-learning related tasks

Are there any docs or guides on that feature?

Eduard Lebedyuk · Jul 27, 2020

If the method is inherited in a Persistent class, it won't compile.  

You can easily modify the method to ignore abstract methods that are system methods (names start with %) or that originate from system classes (class names start with %).

It looks as though it checks for the name of the abstract method but not the number of parameters the method requires.

You can modify the checker to check whatever you need.

Also, if I add another parameter in addition to the parameters listed, this also seems to be ok.  

It is. A method in a subclass can accept more arguments than the parent method.

Do you know if there are plans for the compiler to check for the implementation of abstract methods instead of having to inherit code to do this?

Please file a WRC issue.

Eduard Lebedyuk · Jul 27, 2020

You can easily add your own compile-time check via method generators. Here's an example which checks that all abstract methods are implemented.

Class Test.AbstractChecker
{

ClassMethod Check() As %Status [ CodeMode = objectgenerator, ForceGenerate ]
{
    #Dim sc As %Status = $$$OK
    
    // Get class name from %compiledclass object which is an instance of a currently compiled class
    Set class = %compiledclass.Name

    // Iterate over class methods.
    // You can also use %class object to iterate
    Set method=$$$comMemberNext(class, $$$cCLASSmethod, "")
    While method'="" {
        
        // Get the method's abstract state
        Set abstract = $$$comMemberKeyGet(class, $$$cCLASSmethod, method, $$$cMETHabstract)
        
        // Stop iterating at the first abstract compiled method found
        If abstract {
            set origin = $$$comMemberKeyGet(class, $$$cCLASSmethod, method, $$$cMETHorigin)
            Set sc = $$$ERROR($$$GeneralError, $$$FormatText("Abstract method %1 in class %2 not implemented (originates from %3)", method, class, origin))
            Quit
        }
        
        // Get next method
        Set method=$$$comMemberNext(class, $$$cCLASSmethod, method)        
    }
    Quit sc
}

}

After adding this class to inheritance I get an error on compilation:

Eduard Lebedyuk · Jul 19, 2020

PythonGateway does not have this limitation with global transfer and the SQL procedure for SQL access. I recently did an ML PoC with a dataset 6400 columns wide without issues.

Eduard Lebedyuk · Jul 16, 2020

How do I reference the parent from a child if the parent ID is composite?

Here's my global:

^data("idA", "idB") = "Parent"
^data("idA", "idB", 1) = "Child"

And the class is:

Class Parent {
Property IdAProp;
Property IdBProp;
}

SQL Storage for Parent works.

For Child I have tried:

<Subscript name="1">
<Expression>{User.Parent.IdAProp}</Expression>
</Subscript>
<Subscript name="2">
<Expression>{User.Parent.IdBProp}</Expression>
</Subscript>
<Subscript name="3">
<Expression>{Position}</Expression>
</Subscript>

But compilation fails with:

ERROR #5547: Map: Map1 - Subscript Expression - invalid expression '{User.Parent.IdAProp}'.  Must be a valid field reference.    If this is the Master Map, it must be an IDKEY field.
  > ERROR #5030: An error occurred while compiling class 'User.Children'

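UPD. The SQL schema name should be used in the expression, not the class package name (by default the User package projects to the SQLUser schema).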

Eduard Lebedyuk · Jul 15, 2020

Class SerialA Extends (%SerialObject, %XML.Adaptor) [ ProcedureBlock ]
{
    Property SerialB As SerialB;
    Property SerialC As SerialC(XMLPROJECTION = "NONE");
}