Eduard Lebedyuk · Aug 19, 2020

I think this is a task better suited for Version Control System, such as git.

  1. Export all relevant code from the first namespace
  2. Commit
  3. Export all relevant code from the second namespace 
  4. Diff

And use CI/CD systems such as Jenkins, GitHub Actions, or GitLab CI.

That said, you can use this SQL to compare class hashes (if the hashes are identical, then the classes are identical):

SELECT
  Name,
  Hash,
  TimeChanged,
  TimeCreated
FROM %Dictionary.CompiledClass

After that, you can use this SQL to compare hashes of the individual methods (if the classes do not match):

SELECT
  parent,
  Name,
  RuntimeHash
FROM %Dictionary.CompiledMethod
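
To automate the comparison, something like this sketch could work (the namespace names are assumptions; adjust them to your system, and run it from a namespace with access to both):

    /// Sketch: report classes whose compiled hash differs between two namespaces.
    /// "APP1"/"APP2" are placeholder namespace names.
    ClassMethod CompareClassHashes(ns1 As %String = "APP1", ns2 As %String = "APP2")
    {
        set sql = "SELECT Name, Hash FROM %Dictionary.CompiledClass"
        new $namespace
        // Collect Name -> Hash from the first namespace
        set $namespace = ns1
        set rs = ##class(%SQL.Statement).%ExecDirect(, sql)
        while rs.%Next() {
            set hash(rs.%Get("Name")) = rs.%Get("Hash")
        }
        // Compare against the second namespace
        set $namespace = ns2
        set rs = ##class(%SQL.Statement).%ExecDirect(, sql)
        while rs.%Next() {
            write:($get(hash(rs.%Get("Name"))) '= rs.%Get("Hash")) rs.%Get("Name"), " differs", !
        }
    }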
Eduard Lebedyuk · Aug 13, 2020

This is an InterSystems IRIS functionality.

I recommend upgrading your Caché application to InterSystems IRIS.

Eduard Lebedyuk · Aug 11, 2020

By default IRIS listens on all interfaces.

Are you able to access SMP from a remote machine?

Eduard Lebedyuk · Aug 5, 2020

That's all well and good for sparse datasets (where, say, a record has 10 000 possible attributes but on average only 50 are filled).

EAV does not help in dense cases where every record actually has 10 000 attributes.
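
To illustrate: in a dense case, reading a single record back out of an EAV table means pivoting thousands of rows into columns, along these lines (table and column names are illustrative):

    -- EAV stores one row per (entity, attribute, value).
    -- Reconstructing one dense 10 000-attribute record requires
    -- aggregating 10 000 rows back into columns:
    SELECT entity_id,
           MAX(CASE WHEN attribute = 'attr1' THEN value END) AS attr1,
           MAX(CASE WHEN attribute = 'attr2' THEN value END) AS attr2
           -- ... repeated for every one of the 10 000 attributes
    FROM eav_table
    WHERE entity_id = 42
    GROUP BY entity_id

And at, say, a million dense records, that EAV table holds ten billion rows.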

Eduard Lebedyuk · Aug 5, 2020

Wide datasets are fairly typical for:

  • Industrial data
    • IoT
    • Sensor data
    • Mining and processing data
    • Spectrometry data
  • Analytical data
    • Most datasets after one-hot encoding is applied
    • NLP datasets
    • Any dataset where we need to raise dimensionality
    • Media featuresets
  • Social Network/modelling schemas

I'm fairly sure there are more areas, but I have not encountered them myself.

Recently I delivered a PoC with classes more than 6400 columns wide, and that's where I got the inspiration for this article (I chose approach 4).

@Renato Banzai also wrote an excellent article on his project with more than 999 properties.

Overall, I'd say that a class with more than 999 properties is a valid design in many cases.

Eduard Lebedyuk · Aug 4, 2020

While I always recommend the CSV2CLASS methods for generic solutions, wide datasets often have the (un)fortunate characteristic of also being long.

In that case, a custom object-less parser works better.

Here's how it can be implemented.

1. Align the storage schema with the CSV structure

2. Modify this snippet for your class/CSV file:

Parameter GLVN = {..GLVN("Test.Record")};

Parameter SEPARATOR = ";";

ClassMethod Import(file = "source.csv", killExtent As %Boolean = {$$$YES})
{
    set stream = ##class(%Stream.FileCharacter).%New()
    do stream.LinkToFile(file)
    
    kill:killExtent @..#GLVN
    
    set i=0
    set start = $zh
    while 'stream.AtEnd {
        set i = i + 1
        set line = stream.ReadLine($$$MaxStringLength)
        
        set @..#GLVN@($increment(@..#GLVN)) = ..ProcessLine(line)
        
        write:'(i#100000) "Processed:", i, !
    }
    set end = $zh
    
    write "Done",!
    write "Time: ", end - start, !
}

ClassMethod ProcessLine(line As %String) As %List
{
    set list = $lfs(line, ..#SEPARATOR)
    set list2 = ""
    set ptr=0
    
    // NULLs and numbers handling.
    // Add generic handlers here.
    // For example translate "N/A" value into $lb() if that's how source data rolls
    while $listnext(list, ptr, value) {
        set list2 = list2 _ $select($g(value)="":$lb(), $ISVALIDNUM(value):$lb(+value), 1:$lb(value))
    }

    // Add specific handlers here
    // For example convert date into horolog in column4

    // Add %%CLASSNAME
    set list2 = $lb() _ list2
    
    quit list2
}
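
Assuming Test.Record is the persistent class behind the storage global, usage would look like this (a sketch; the index rebuild matters because direct global sets bypass index maintenance):

    do ##class(Test.Record).Import("source.csv")
    // Direct global writes skip index maintenance, so rebuild indices afterwards
    do ##class(Test.Record).%BuildIndices()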
Eduard Lebedyuk · Aug 3, 2020

Restarting is the easiest way.

If you can't restart, just overwrite the global buffer with another global (but check that your target global has really been flushed out of the global buffer).
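
A minimal sketch of the overwrite approach, assuming ^SomeLargeGlobal is a hypothetical unrelated global at least as large as the global buffer pool:

    // Read an unrelated large global end to end so its blocks
    // displace the target global's blocks in the global buffer
    set key = ""
    for {
        set key = $order(^SomeLargeGlobal(key), 1, value)
        quit:key=""
    }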