Timur Safin · Nov 22, 2016

Thanks for the debugging advice - that was the biggest missing piece that kept me from using generators more widely. I never understood how to debug them easily. Now I see!

[Goodbye readability! Hello performance, but hard to read!]

Timur Safin · Oct 12, 2016

I concur with the warning about the security implications of exposing terminal access via an unauthorized CSP application.

This is a very big and very wide security hole! An attacker could escalate their permissions to the LocalSystem superuser (if you run Windows) and do pretty much anything with your system if you haven't properly locked down all the layers involved.

Timur Safin · Oct 12, 2016

Putting aside the legality of this task (let's assume this is all your code), your case sounds even more complicated: you are apparently running a newer version of the engine, with updated tokens and bytecode interpreter, but need to restore code from an older version of the bytecode.

It did not happen often, but the bytecode token map did change over time, so you actually need 2 decompilers (one for the older version and one for the version currently in use), not one. The chance of getting those is pretty much zero. Sorry.

Timur Safin · Oct 12, 2016

[This is a slightly overdue, but very welcome change in any case!]

Did I interpret this correctly: the search improvements were mostly implemented using iFind capabilities?

Timur Safin · Sep 28, 2016

There is a Russian proverb for such cases: "Rumors about my death are slightly exaggerated"[1]. That applies to BigData, which Gartner has declared dead but which is here to stay, in a slightly wider form and in more scenarios, and it applies to MapReduce. Yes, Google marketers claim not to use it anymore, having moved to interfaces better suited to search, and yes, in the Java world Apache Hadoop is no longer the best MapReduce implementation, since Apache Spark is a better and more modern implementation of the same or similar concepts.

But life is more complicated than marketing shows us: there are still some big players that use their own C++ implementation of MapReduce in their search infrastructure, like the Russian search giant Yandex. And that is big enough for me to still count it as relevant.

[1] As Eduard has pointed out, it was Mark Twain who originally said "The report of my death was an exaggeration." Thanks for the correction, @Eduard!

Timur Safin · Sep 8, 2016

For this particular usage scenario, running x86 code through a binary translator would not be a very good idea. The Raspberry Pi itself does not have a very fast ARM processor, and the added JIT overhead would make the emulation layer extremely slow.

OTOH, the initial port to any new hardware platform, especially for an OS which is already supported (Debian), might be quite easy (especially if you temporarily disable the assembler optimizations and compile the full C kernel). [Jose might correct me here, but at least that was my impression from my InterSystems times]

The problem, though, is whether it is worth all the pain. What reasonable return could any vendor get from such a hobby device owner? 50¢? $5? OK, we are talking about the educational market, so assume there won't be any money stream, just an enabled ecosystem. Why do we think that a Raspberry Pi build would not repeat the failed GlobalDB experiment? Why would it be different this time (on a smaller hardware market, and with less powerful hardware)?

Timur Safin · Sep 8, 2016

Nice catch, Daniel!

I wonder, though: have you opened a prodlog to change the behavior of the PutLine() method?

Timur Safin · Sep 6, 2016

Good point - the less traffic there is, the better the final result.
This may not be very canonical from the MapReduce point of view, but the more aggregation can be done on a single node/worker, the better for the reducer.

Timur Safin · Aug 26, 2016

Very good question. The push operation of our FIFO is safe, even in its "lock-free" form, because of the $increment/$sequence usage and their guarantees. But the pop operation is troublesome if multiple workers retrieve the same head at the same moment.

So, yes, there is no "exactly one" guarantee, and if the reduction phase ever runs concurrently (that is not planned yet), then we have to lock each read-delete operation.

This is going to be a problem for multiple-node scenarios, so we will come back to it when we approach remote execution and multiple nodes. Thanks for the note!
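
To illustrate why push can stay lock-free while each read-delete needs a lock, here is a minimal sketch. It is written in Python rather than ObjectScript so it can be run anywhere, and it is not the code from the article: `Fifo`, `AtomicCounter`, and the dict that stands in for the global are hypothetical names, and the lock-protected counter only imitates the uniqueness guarantee of $increment/$sequence.

```python
import threading


class AtomicCounter:
    """Hypothetical stand-in for $increment/$sequence: hands out unique numbers."""

    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def next(self):
        with self._lock:
            self._value += 1
            return self._value


class Fifo:
    """Toy in-memory queue; a dict plays the role of the subscripted global."""

    def __init__(self):
        self._nodes = {}
        self._tail = AtomicCounter()
        self._head = 1
        self._pop_lock = threading.Lock()

    def push(self, value):
        # Every caller gets its own slot number, so concurrent pushes never collide.
        self._nodes[self._tail.next()] = value

    def pop(self):
        # Reading the head and deleting it must be one step; without the lock
        # two workers could both read the same head before either deletes it.
        with self._pop_lock:
            if self._head not in self._nodes:
                return None  # empty, or the next slot has not been written yet
            value = self._nodes.pop(self._head)
            self._head += 1
            return value
```

With the lock in place every pushed item is handed to exactly one consumer; removing `_pop_lock` brings back the duplicate-head race described above.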

Timur Safin · Aug 2, 2016

Like this one?

USER>write $match("abcdeabcde", "(a|b).*(de|fg)")
1
USER>write $match("abcdeabcfg", "(a|b).*(de|fg)")
1
Timur Safin · Jul 21, 2016

You are only partially correct:

  • yes, as lazy developers we prefer to write only 1 method call instead of 2 chained together;
  • but, no, it is not %Connect (which may be an expensive operation) that should move into %OnNew, but rather the other way around.

I.e. for the cases when we need both (not 100% of cases, rather 90%) we could create a combined classmethod, which creates an instance of the class via a call to %New() and then performs the necessary side effect, i.e.

ClassMethod %ConnectNew(Config As %Object) As Sample.RemoteProxy { ... }

In general, you should avoid creating a huge DOM tree of objects or performing network operations inside the %New constructor. A constructor needs to allocate just the bare minimum of memory necessary to begin operations and to initialize fields to their default values [that will be done automatically]. That's it.
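
The combined-method idea above can be sketched roughly as follows. This is Python rather than ObjectScript, and every name in it is hypothetical: `RemoteProxy`, `connect`, and `connect_new` only mirror Sample.RemoteProxy, %Connect, and %ConnectNew, and the "connection" is a placeholder that performs no real network I/O.

```python
class RemoteProxy:
    """Hypothetical proxy: cheap constructor, separately callable connect."""

    def __init__(self, config):
        # The constructor only stores settings; nothing slow happens here.
        self.config = dict(config)
        self.connected = False

    def connect(self):
        # Placeholder for the potentially slow step (network round trip, login,
        # timeout on bad credentials). Here it only validates the settings.
        if not self.config.get("host"):
            raise ValueError("config must contain a host")
        self.connected = True
        return self

    @classmethod
    def connect_new(cls, config):
        # Convenience for the common case: construct, then connect, in one call.
        return cls(config).connect()
```

Callers that want both steps use `RemoteProxy.connect_new({...})`, while code that nests the object inside a wrapper can still construct it cheaply and connect later.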

Timur Safin · Jul 18, 2016

Also, about the comment suggesting to move the %Connect code inside %New.

It is generally not a good idea to put potentially long and slow code inside an object constructor. I prefer to have a slim and fast %New, which might be nested elsewhere inside some wrapping objects, while keeping slow and expensive functions (like %Connect in this case) outside of the constructor, independently callable.

For example, try using incorrect login details here and see how long it takes for such a connection to fail (i.e. the timeout period).

Timur Safin · Jul 18, 2016

Because:

  1. I hate long lists of arguments passed to a function, especially when most of them are optional;
  2. In similar cases I prefer the named-arguments approach, which I originally saw in Perl (here is the quick link I've found which shows this idiom). Named arguments can be passed in any order, which avoids many errors caused by (optional) arguments passed in the wrong order.
  3. Named arguments actually created a hash object in Perl, which we then worked with, accessing its key-value pairs.
  4. And at the end of the day, the new JSON dynamic objects we have in Caché are semantically equivalent to the hash objects we used in Perl in the past.

Thus a similar idiom can be used in ObjectScript.

P.S.

Though I agree it was a bit of a stretch to use this idiom in this particular case, where the number of arguments is not that large. But at least it didn't make the code less readable. :)

Timur Safin · Jul 8, 2016

Actually, I don't see any value in checking $data for intermediate subscripts (I check their consistency only at the very beginning of the function). Here is my [hopefully] simpler version:

CompareArrays(refL, refR)
    if $data(@refL) '= $data(@refR) {
        // they are not consistent: one is non-array
        return 0
    }
    
    do {
        // fetch next data node subscript and its value
        set refL = $query(@refL, 1, valueL), refR = $query(@refR, 1, valueR)
        if refL="" || (refR="") {
            quit
        }
        set subL = $qlength(refL), subR = $qlength(refR)
        if subL'=subR || (valueL '= valueR) {
            return 0
        }
        // check each subscript individually
        for i=1:1:subL {
            if $qsubscript(refL, i) '= $qsubscript(refR, i) {
                return 0
            }
        }
    } while refL'="" && (refR'="")
    // only after all checks passed
    return refL=refR
DebugArrayCompare()
    new
    set m(1,1,1)=11,m(1,2)=12,m(2,1)=133
    set n(1,1,1)=11,n(1,2)=12,n(2,1)=133
    write $$CompareArrays($name(m),$name(n)),!
    set n(3,1)=0
    write $$CompareArrays($name(m),$name(n)),!
    quit
 
Timur Safin · Jun 30, 2016

So you are using ODBC access in WinSQL to connect to a CacheODBC source, from a particular namespace...

Did you check that you are using a DSN pointing to the desired namespace? Did you check the bitness (32- or 64-bit) of the DSN you use in WinSQL?

Timur Safin · Jun 22, 2016

Small correction, though: the referenced GitHub sources are a fork, created not by Dmitry Maslennikov (@daimor) but by Eduard Lebedyuk (@eduard93).

Timur Safin · Jun 17, 2016

Thanks, Dima, [I did expect you would publish it] this advice is very interesting and easier for a "lazy devops engineer" to apply. Though some explanations and comments wouldn't hurt. I hope you'll eventually find some time to write an article.

P.S.

I could not resist adding a few notes about your Dockerfile:

- from a pure micro-services point of view, for the generic case of multiple ECP clients it IMHO makes little sense to install the CSP gateway into each instantiated docker instance;

- I'd run it on the master (ECP database server) instance, or probably as a separate docker image;

- [though I suspect that for the HAProxy scenario you might need to have the CSP gateway services spread over each instance just for high availability. I'd be curious what Luca would recommend here from the micro-services perspective?]

Timur Safin · Jun 16, 2016

Let's put software architecture aside (I'll write a number of articles later about what I mean here) and talk about the dirty details.

If you have any concrete details about the way you use Swarm, Ansible, Chef, or similar, then I (and the community) would highly appreciate them.

P.S.

It would simplify things a lot if we could configure ECP mappings at runtime via some set of API calls, and not statically via editing cache.cpf. Something like what MongoDB does for adding a new shard:

sh.addShard("repl0/mongodb3.example.net:27327")

https://docs.mongodb.com/manual/reference/method/sh.addShard/

Not for the scenario of adding a shard to a shard manager in particular, but something more generic for ECP or mappings. I suspect there is something related already implemented for EM, but I have no clue how to use it for my case.

P.P.S.

And I know there is an AssignShards call already implemented in the forthcoming product, but it is too specific: it creates a particular set of mappings. I need something more generic.

Timur Safin · Jun 16, 2016

Could you please share those Terraform configs, or at least the key parts of them (concerning Cache.cpf modifications and the like)?

Timur Safin · Jun 3, 2016

This sounds very interesting.

I could not give any data-proven conclusion without looking into sar or mgstat data, but from your words it sounds like the bottleneck here is the ObjectScript VM or the engine's interprocessor locks implementation. This is hard to believe, taking into account that we are talking about an "io bound" experiment, but if you show us the sar metrics...

Timur Safin · Jun 3, 2016

A few easy questions first:

- how much memory did you allocate for your global buffers?

- did you look at ^mgstat statistics while your code was busy walking over the huge globals?

- and did you play with global prefetching in this case?

P.S.

Let's put the write amplification problem aside, disable global modifications, and attack read performance first.

Timur Safin · Jun 1, 2016

Given the expression returned from Quote^%qcr, you could use XECUTE to re-evaluate the string, e.g.:

DEVLATEST:22:51:39:USER>set q= $$Quote^%qcr(lb)
 
DEVLATEST:22:51:54:USER>x "s u = "_q
 
DEVLATEST:22:52:30:USER>zw u
u=$lb(1,2,3,",",5)
Timur Safin · Jun 1, 2016

The ZWRITE command is implemented in ObjectScript, and if you are happy with the way it quotes $LB, then you could reuse its core functionality, e.g.:

DEVLATEST:22:47:39:USER>set lb = $listbuild(1,2,3,",",5)
 
DEVLATEST:22:47:41:USER>write $$Quote^%qcr(lb)
$lb(1,2,3,",",5)
Timur Safin · May 30, 2016

Yes, we need a way to change your own vote after you've accidentally pushed the wrong star (and the chance of pushing the wrong star increases dramatically if you have fat fingers and a touchscreen. Done it many times :( ).

Timur Safin · May 30, 2016

Let us look at the keypad again: it becomes obvious instantly that the keys are (mostly) arranged in groups of 3 symbols, and if it were not for "s" (corresponding to "7777") and "z" (which produces "9999"), the implementation would be a simple formula with division by 3 and the corresponding modulo. Space is also an exception and is not part of the sequential numbering.

So, given this assumption, let us create the 1st approximation (no compression, no name reduction, everything is readable and commented):

0(s) public {
    set S=""
    for %=1:1:$length(s) {
        set P=""
        set c = $e(s,%)
        set i = $a(c) - $a("a")
        if c=" " {
            set P="0"
        } elseif c="z" {
            set P="9999"
        } elseif c="s" {
            set P="7777"
        } elseif i<($a("s") - $a("a")) { // ($a("s") - $a("a")) = 18
            set n=i\3+1,m=i#3+1
            set before2 = $a("1") ; 49
            set $p(P,$c(n + before2),m+1)=""
        } else {
            set n=i-1\3+1,m=i-1#3+1
            set before2 = $a("1") ; 49
            set $p(P,$c(n + before2),m+1)=""
        }
        set:$extract(S,*)=$extract(P,1) S=S_" "
        set S=S_P
    }
    quit S
}

[Don't bother counting symbols - we will compress the code a bit]

This strange `set $piece(string,symbol,offset+1) = ""` actually fills a string with the given symbol, repeated offset times.

Let us review those several ifs. We actually see 2 groups of them:

  • 3 ifs for the exceptions from the formula;
  • 2 ifs for the disjoint rows of keys grouped by 3. The formula is actually the same, but with some offset.

So let's get rid of the ifs via $select and an extra offset.

 #; get rid of ifs, replace with $selects
1(s) public {
    set S=""
    for %=1:1:$length(s) {
        set P=""
        set c = $e(s,%)
        set i = $a(c) - $a("a") ; $a("a")=97
        set o = i '< 18 ; ($a("s") - $a("a")) = 18
        set $p(P, $c(i - o\3+1 + $a("1")), i - o#3+1+1)=""
        set P=$select(c=" ": "0", c="z": "9999", c="s": "7777", 1:P)
        set:$extract(S,*)=$extract(P,1) S=S_" "
        set S=S_P
    }
    quit S
}

[I believe this step is still obvious]

[[I dislike the way I had to write expressions without pairs of parentheses, but we need it as short as possible, so this is an inevitable evil.]]

Now it is time for a shorter, but still readable version:

 #; name reduction
2(s) public {
    s S=""
    f %=1:1:$l(s) {
        s P=""
        s c = $e(s,%)
        s i = $a(c) - $a("a") ; $a("a")=97
        s o = i '< 18
        s $p(P, $c(i - o\3+1 + $a("1")), i - o#3+1+1)=""
        s P=$s(c=" ": "0", c="z": "9999", c="s": "7777", 1:P)
        s:$e(S,*)=$e(P,1) S=S_" "
        s S=S_P
    }
    q S
}

That was simple: select the code in Studio, then press Ctrl+Shift+E.

And the last step is to convert this barely readable code into a hard-to-read 1-line mess:

 #; linearization
3(s) public {
 s S="" f %=1:1:$l(s) {s P="",c=$e(s,%),i=$a(c)-97,o=i'<18,$p(P,$c(i-o\3+50),i-o#3+2)="",P=$s(c=" ":"0",c="z":"9999",c="s":"7777",1:P) s:$e(S,*)=$e(P,1) S=S_" " s S=S_P} q S
}

That's 173 characters long (if you count the 1st indent symbol).

P.S.

Eduard Lebedyuk has improved this result a little bit:

  • he has replaced quoted literals with numerics (because both represent canonical numerics from the ObjectScript point of view);
  • and replaced (where possible) comparisons of characters with comparisons of their derived ordinals (minus 97, or "a")
 #; +Eduard modifications
4(s) public {
 s S="" f %=1:1:$l(s){s P="",c=$e(s,%),i=$a(c)-97,o=i>17,$p(P,$c(i-o\3+50),i-o#3+2)=P,P=$s(i<0:0,i=25:9999,i=18:7777,1:P),S=S_$s($e(S,*)=$e(P):" "_P,1:P)} q S
}

We are 158 now!
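
As a cross-check of the mapping that all the versions above implement, here is a readable sketch of the same formula. It is written in Python rather than ObjectScript so it can be run anywhere; the function name `keypresses` is made up for this illustration, and the sketch assumes the input contains only lowercase a-z and spaces.

```python
def keypresses(text):
    """Translate lowercase letters and spaces into multi-tap keypad digits."""
    out = ""
    for ch in text:
        if ch == " ":
            seq = "0"
        elif ch == "s":
            seq = "7777"  # 4th letter on key 7, outside the groups-of-3 formula
        elif ch == "z":
            seq = "9999"  # 4th letter on key 9, outside the groups-of-3 formula
        else:
            i = ord(ch) - ord("a")
            o = 1 if i > 17 else 0      # letters after "s" are shifted by one
            key = (i - o) // 3 + 2      # the letter keys start at "2"
            presses = (i - o) % 3 + 1   # position of the letter on its key
            seq = str(key) * presses
        if out and out[-1] == seq[0]:
            out += " "                  # pause between two uses of the same key
        out += seq
    return out
```

For example, `keypresses("hi")` gives `44 444`: both letters live on key 4, so a pause separates them.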

Timur Safin · May 30, 2016

157 is impressive: my first result was 174, which Eduard improved to 161. I didn't think it could be improved even further :)