Why I love ObjectScript and why I think I might love Python More

Nigel Salm · Aug 18, 2021 15m read

#Coding Guidelines #ObjectScript #Python #InterSystems IRIS

I was looking at the thread of messages on the topic of "Performance when constructing a comma-separated string", and I started writing a response but got distracted, the page refreshed, and I lost my text. I couldn't spend the time rewriting my response, so I started writing this document instead.

I started writing MUMPS at the beginning of my career. I wrote very tight and dense code blocks where exercises such as the string example were authentic challenges. We squeezed every last bit of performance out of the Digital DEC or VAX servers, where we planned where a key global would be positioned on a disk platter. When Caché was released, we were still working with M/SQL. There was a period where I was involved in several performance comparisons between Caché and Oracle, Sybase and SQL Server. We would design a schema of a few tables, populate them with several million records and then execute many searches on the resultant database. I used to write two versions of the SQL statements. One version would be a pure SQL statement, and the other would be a custom query which I would write into the class definition. The bulk of the logic in a custom query goes into the 'Fetch' method, and I would craft my 'Fetch' method to make the most of the indices I had defined and use ^CacheTemp for any interim results of complex joins. I would sometimes job off one or more sub-queries that would create the interim temp globals and then resolve the joins once all of the jobbed processes had finished. The result could be summarised as follows:

Inserting data into the database using SQL or Caché Objects was always faster than any other DB. Using pure COS and direct global sets was an order of magnitude faster than SQL, Objects, and any other databases. The resultant database would be roughly half the size of the database created by any of the relational databases.

When I compared the code that I wrote in my 'Fetch' method against the code generated by the Caché SQL Engine, I used fewer variables, 25% fewer lines of code, and the code was more readable.

The number of physical data block reads would be roughly the same as the code generated by M/SQL. However, the number of logical reads from the Global Buffer Pool would be 20% less than M/SQL.

I made use of every trick in the "MUMPS Developers Cook Book". I used commands such as 'execute', 'job' (effectively creating threads to handle sub-queries in parallel), indirection, and post-conditions. These days we recommend that developers avoid these language features so that their code stays readable and maintainable by other developers.

I would initialize variables in the form:

set (a,b,c,d)="",(x,y,z)=0,p1=+$h,p2=...,pN=99

I squeezed as many expressions into one line of code as I possibly could. We believed a cost was incurred when reading each line of code into the "execute buffer". Therefore, we reasoned, the fewer lines of code executed, the better the performance.

When I work on code written by some other developer, and I notice that there are blocks of code consisting of one set command per line, I get somewhat worked up and invariably compress those 30 lines down into one.

I fell in love with Caché Objects. Twenty-five years later, that love affair has outlasted two long-term relationships and a marriage. Class definitions with precise and very readable property names, and bitmap indexing on everything unless iFind indexing can do better. Parent-Child relationships rather than One-Many when I can. I will use a custom primary key in code tables when bitmap indexing is not required because set record=$g(^global(code)) will always be faster than

set record="",rowId=$o(^IndexGlobal("IndexName",code,"")) set:$l(rowId) record=^Global(rowId)
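The difference between the two lookups can be sketched in Python, using plain dictionaries as hypothetical stand-ins for the globals. The names `code_table`, `rows` and `code_index` and the sample data are invented for illustration, and this is only a rough analogy rather than a measurement: the point is that a custom primary key needs one lookup, while a generated row ID needs an index lookup followed by a data lookup.

```python
# Hypothetical stand-in for ^global(code): the code itself is the key.
code_table = {"GB": "United Kingdom", "ZA": "South Africa"}

# Hypothetical stand-ins for ^Global(rowId) and ^IndexGlobal("IndexName",code,rowId).
rows = {1: "United Kingdom", 2: "South Africa"}
code_index = {"GB": 1, "ZA": 2}


def lookup_direct(code):
    """One step: fetch the record by its custom primary key."""
    return code_table.get(code, "")


def lookup_via_index(code):
    """Two steps: find the row ID in the index, then fetch the record."""
    row_id = code_index.get(code)
    return rows[row_id] if row_id is not None else ""


print(lookup_direct("GB"))
print(lookup_via_index("GB"))
```

Both functions return the same record and both return an empty string for an unknown code, mirroring the $g() default and the set:$l(rowId) guard in the ObjectScript above.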

There were some forms of SQL select statements that M/SQL either didn't support or performed poorly. In general, Caché was 2-3 times faster than any other database.

Over the years, the SQL engine has significantly improved. Bitmap and iFind indexing were introduced. We use iFind indexing on the Names and Addresses of Patients in a database of 15 million Patients. Every other field is bitmap indexed. When we receive a FHIR Patient Search with several parameters, we support all FHIR specification qualifiers and operators. We construct an SQL statement that starts with a join across all of the entities of FHIR Patient, which we store in persisted classes. The join is followed by any iFind clauses on Names and Addresses if specified in the search criteria. Then AND/OR clauses are added for any fields in the search criteria that we know are supported with Bitmap Indices. The deterministic or probabilistic searches we perform are so accurate and so fast that they still have me jumping around in excitement (at my age!!!). I am pushing for us to use the IRIS for Health repository for our next phase of development. IRIS has had two releases and has matured since I first worked with it in version 2019.1.

I must confess that I had never liked SQL when I was one of an ever-shrinking pool of developers that wrote MUMPS code in the late '80s. My peers were all too quick to jump into bed with Oracle or SQL Server, and it was difficult at times not to fall into a state of despair as I listened to the naysayers shouting, "MUMPS is dead."

Then, at the annual MUMPS Conference in Dublin, we woke up one morning to a note pushed under our doors announcing that InterSystems had bought DTM. At a conference held a year later in Birmingham, I was working for InterSystems, and we were showing off Visual Basic forms using the Caché dll that we had acquired when we bought Data Tree. Micronetics were on the stand opposite ours, and they didn't have a dll. Their sound system was louder than ours, but we knew we had won. It would take another year before we had bought DSM from Digital and finally MSM from Micronetics, and then there was no holding back. I remember showing off M/SQL to a customer in Birmingham who wrote accounting software. One of their customers was Barings Bank, who had just lost 859 000 000 GBP due to their rogue trader Nick Leeson. I couldn't help but set up my example database so that I could run an SQL query that was probably no more complex than "SELECT sum(Total) from Accounts WHERE .... and AccountNumber="666..". The account number was the one that Nick Leeson had used to hide the trading he was doing to rescue a situation that was getting worse with every single ring of the Singapore Stock Market Trading Floor Bell. I remember standing there giggling quietly, partly because of the implicit reference to the Barings Bank Collapse but also because the query actually executed, provided the correct answer and didn't take more than a minute to run (none of these things was a certainty then).

That was the only memory that I have of enjoying SQL. I would face one audience of Oracle and SQL Server DBAs after another, demonstrating Caché, Caché and VB, Caché Objects and Caché SQL, and delighting in Caché Objects: so elegant, so obvious, so malleable, so readable. Object syntax (in any language) is so much more natural to me than any SQL statement.

When we did get the opportunity to take a prospective customer's application schema, run it through the SQL importer and translate the set of stored procedures that came with the schema into either Caché Objects or pure Caché Globals, I became very well acquainted with reading the SQL execution plan and the generated stored query. I also got into long conversations with Aviel Klausner about the one SQL query the prospective customer had given me that wasn't working, the query that would make the difference between watching the Oracle DBAs slink out of the conference room, back to the safety of their index tuning and the guaranteed 6 hours of downtime that their systems had every day while backups were being done, where they could coax their relational applications back to life in readiness for the next day's trading, and the excitement of winning over a customer that we had been pursuing for months, who was more interested in the speed of Caché, the Object Orientation, the gateways to .Net or Java, and the simple elegance of the CSP broker.

I think that the question "Why write applications INSIDE a DB environment at all?" isn't a question at all. Firstly, I create a database that will contain my code and another for my globals, and right there I have a point of separation. We have all grown up over the last 25+ years thinking of Classes, Objects, ObjectScript and Globals as being all lumped together. I argue that at runtime, the code executing in the code buffer is OBJ code.
OBJ code is essentially a mixture of compiled C code, pure machine code optimized for the platform it is running on, and some remnants of the class definition that are required if you are using $classname, $classmethod, $property and other such functions. That much of the 'engine' of Caché or IRIS is written in ObjectScript is a testament to ObjectScript being a perfect language to work with. It is a language that can be explicit, can be abbreviated, and can be very compact. It contains all of the operators and constructs of any modern language (if, elseif, else, try-catch, while, for [to be fair, our implementation of FOR is wonderful: for i="apples","pears","Nigel","Fruit" {}, for {}, and for i=$$$StartValue():$$$Increment():$$$EndValue() {}]). If one of the early MUMPS creators had called $order "$next", it would immediately be recognizable as the Next() found in every other language that iterates through an array. $PIECE is a bit quirky, but only because every other database uses fixed-length fields. The concept of delimited strings used as a database construct is alien to a SQL DBA. When you look at the compiled machine code of either form of database, the machine instructions move through the string, character by character, either counting the number of characters or looking for a specific field delimiter whilst counting the characters as they go.

$list squeezes out a little more performance than $piece, at the cost of an extra byte or two at the beginning of each field, but it is still less space-consuming than fixed-length fields. The reason why all of the system code is written in ObjectScript, even to this day, is that it is a very efficient and very readable language. When the core Caché/IRIS developers (Scott Jones, Dave McCalldon, Mo Chung) required something for which ObjectScript was inadequate, they would write it in C and bury it in the kernel.
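The trade-off between delimited and length-prefixed fields can be sketched in Python. This is a simplified illustration only: `piece`, `list_build` and `list_get` are hypothetical helpers, and the one-byte length prefix is an assumption made for the example, not the actual $list storage format. Finding a delimited field means scanning for delimiters, whereas a length-prefixed list lets the reader hop from field to field using the stored lengths.

```python
def piece(record, delimiter, n):
    """Return the n-th (1-based) delimited field, or "" if it does not exist."""
    parts = record.split(delimiter)
    return parts[n - 1] if 1 <= n <= len(parts) else ""


def list_build(*fields):
    """Encode fields as [length byte][data] pairs (simplified, max 255 bytes each)."""
    out = bytearray()
    for field in fields:
        data = field.encode("utf-8")
        if len(data) > 255:
            raise ValueError("field too long for this simplified encoding")
        out.append(len(data))
        out += data
    return bytes(out)


def list_get(encoded, n):
    """Return the n-th (1-based) field by hopping over length prefixes."""
    if n < 1:
        return ""
    pos = 0
    for _ in range(n - 1):
        if pos >= len(encoded):
            return ""
        pos += 1 + encoded[pos]  # skip the length byte plus the field data
    if pos >= len(encoded):
        return ""
    length = encoded[pos]
    return encoded[pos + 1:pos + 1 + length].decode("utf-8")


record = "Salm^Nigel^London"
encoded = list_build("Salm", "Nigel", "London")
print(piece(record, "^", 2))
print(list_get(encoded, 2))
```

In this sketch the encoded list is one byte longer than the delimited string (three length bytes instead of two delimiters), and a field may safely contain the `^` character because nothing is searching for it.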

Next: If I have a table definition and I have fields that require some form of formatting or validation over and above the obvious constraints of type and length, then I want to write that code and keep it nice and close to the field definition itself. Why would I want to go into another language, another environment, to write that validation? The relational databases use stored procedures and triggers to handle such validation using the SQL language to express the validation logic. Find me a programmer who would rather use SQL to write sometimes complex logic rather than Basic or C# or C++ or ObjectScript or Python, and I'll buy you a beer when I next pass through Vienna :-)

At university, I learned to program in Fortran and Pascal. Pascal was a perfectly readable, usable language and, being an easily excitable 19-year-old, I was fascinated that the Pascal Compiler could be written in Pascal. Later on, I learned COBOL. WTF??? And yet, I have a friend who is a developer for Sage Accounting, and he writes COBOL because Sage Accounting was written in COBOL. Page after page of the most verbose, unreadable, unusable language I have ever come across. In fact, there is a ton of COBOL out there.

You would think that Pascal would have easily surpassed COBOL and even Basic. But it didn't. And why didn't it? Easy, it wasn't used in Banking applications (COBOL was used extensively in large Mainframe Batch Processing Applications such as Accounting). We joked that we couldn't get Banks to buy into the Caché Model because we weren't expensive enough. It wasn't that ObjectScript couldn't do the transactional processing of those Banking applications, and we were demonstrably faster than whatever technology they were using. The problem was that they had spent so much money on the systems they had and the hardware required to run those overnight Batches in time for the Banks to open at 9 am the following day: the expensive server rooms with halon gas and filtration systems that removed even the smallest dust particles, lest a particle land on a disk platter or mag tape and render an entire day's worth of account transactions useless.

Pascal should have outlived Basic, and it possibly would have if Microsoft hadn't built Visual Basic and gone into competition with Borland and Delphi, whose IDE looked exactly like VB but used Pascal rather than Basic. And this was all happening while Microsoft brought out C# because they had to accommodate all those C++ programmers, and they certainly weren't going to win over the C++ programmers with Basic. They were also threatening to bring out their version of Java or remove support for Java because it annoyed them that Java ran on hardware platforms that Windows would never be able to run on. It was only when technology advances made the concept of VMs or Containers a realistic deployment option that Microsoft backed off. And so Pascal and Delphi just disappeared. I did a quick search in Google now, and there is a Pascal Interpreter for Android, so it is still out there.

Pascal was just a language. Basic was also just a language in one sense, but Microsoft used it for scripting in applications such as Excel and for a proprietary connection to SQL Server, which allowed them to bind two intrinsically unsuitable environments together without the pesky hassle of complying with the ODBC and JDBC standards. Those standards were heavily backed by Oracle, Sybase and pretty much everyone else who had to provide a gateway to their proprietary versions of SQL. And so Basic lived on. I'm happy that I started my programming career with Pascal, followed by a rude awakening when I wrote COBOL programs for a year working for an insurance company. Then I arrived on the wet and grey shores of the UK and walked into my first job, which just happened to be at a MUMPS house. I have lived through every evolution since: MUMPS to Caché ObjectScript, then Objects, followed by Object Gateways to .Net and Java, Caché Basic and MultiValue Basic, and now Python. Python takes me full circle, and in a sense, proves the point that ObjectScript is not some aberration lumped onto a non-relational outcast of database technology.

Caché Globals, those multidimensional sparse arrays, are so well suited to the very nature of Healthcare data that no matter how hard Oracle and Microsoft have tried to take over that market space, and though they may well have killed off Pascal and Fortran and even Basic, they haven't been able to kill off InterSystems. I remember attending an Oracle seminar on "Oracle for Health". The presenter was going on about Oracle in Healthcare which, she assured us, would take over the Healthcare Market once and for all. I put my hand up and asked, "Isn't that what you have claimed with every major release for years now? You failed then. What makes you think you'll do any better this time around?" She stared at me. "Who are you?" she asked. I replied: "I am from InterSystems. We dominate the Healthcare market and have done for 35 years. We have done so because our technology was born in Massachusetts General Hospital, and guess what? They still run their core systems on our Technologies." At which point, two burly security guards removed me from the auditorium.

So you have Oracle with their PL/SQL, and they own Java. You have Microsoft with SQL Server, C# and T-SQL, and when you need to interact with Java, you are constrained to JDBC. Likewise, if you live in Java and have to talk to SQL Server tables, you are constrained to using ODBC. And where do we sit? Well, we have this rather clever idea of having wrappers for .Net and Java. When using ObjectScript, I instantiate an instance of Class A. It doesn't matter whether Class A is actually a .Net class, a Java class or an ObjectScript class, because I will instantiate those objects using precisely the same syntax in all cases. Then I am going to invoke the instance or class methods to manipulate those objects. It doesn't matter what the insides of those methods look like because the syntax for interacting with those classes and their methods is essentially identical no matter what they contain.

And along comes Python, which shares many features of ObjectScript in that it is an interpreted language as opposed to a compiled language. It is very readable and very usable. Just as Caché ObjectScript found a niche in Unstructured Data, Python found a niche in the world of Mathematical Modelling, ML, AI and much, much more. This is not a world that C# or Java are particularly comfortable in, and nor is ObjectScript, for that matter. So InterSystems has concentrated on providing increasingly powerful functionality for manipulating vast amounts of unstructured data. Throw in some iFind and iKnow, some very clever indexing techniques and probability matching algorithms, and then invite Python to come and cuddle up to our multidimensional sparse arrays and bring with it its millions of baby .py's that do just about everything complex that you'll need. You have a match made in heaven. Oh, and in case I forget to mention it, several architectures that dominate the world of web page development are all based on JS (Angular.js, React.js, Vue.js, Bootstrap (ok, there is no JS in the name, but it is JS in all but name) and Node.js) and JS Arrays. JS isn't going away anytime soon. However, it will be interesting to see where Golang will go, if you catch my drift. I have noticed that there have been entries based on JS arrays in the last couple of code competitions. If there is one technology that understands arrays better than any other technology, then it's IRIS.

And I think back to those days, sitting in my office at the company I worked for in the heart of London. The company, at one point, had been full of MUMPS programmers but then turned them into Relational SQL programmers and then made them Redundant. I remember beginning to question whether my faith in MUMPS, the best language I had ever encountered, might be misplaced. It seemed that the language, and the companies that had built interpretations of that language, were going to die. And that made me very sad, because by the time I discovered MUMPS I had already learned five other programming languages (APL, Basic, Fortran, COBOL and Pascal), and MUMPS was just so straightforward. Easy to write, easy to read, easy to deploy. In short, it was as natural to me as English, and it had a rhythm that reminded me of the hymns we sang at the Methodist school I attended:

Onward, Christian soldiers!

Marching as to war,

With the cross of Jesus

Going on before.

Christ, the royal Master,

Leads against the foe;

Forward into battle,

See his banners go!

But it didn't die. The song changed a bit:

Onboard Nigel Saaalllm

Flying off to War

With his ISC CreditCard

Going On Before.

John, the Master, McCormack

Leads against the foe (Microsoft)

Forward into Battle

See his Duty Frees Go

And Caché ObjectScript was even better than MUMPS, if that was possible. Caché Objects looked so cool when demonstrated to an audience for the first few times, and Caché SQL left its M/SQL days behind and has become rather good over the years. Still, I don't particularly enjoy writing much SQL, but as I have relaxed I have found a nice balance between Objects, SQL and direct global references. My code used to be heavily weighted towards direct global references, with some OO and minimal SQL. Once the products had reassured me that I could trust the generated code to be tight, elegant, efficient and readable, the balance shifted again. Now I seldom use direct global references, and instead use lots of Objects and a reasonable amount of SQL.

Working with Python will require my mind to see different patterns from my ObjectScript code. There are way too many '__abc__' names and other strange structures, but once I have written a few pages of py code and stand back, just as I do when painting an oil painting, the patterns will pop out at me. Just as I see music as colour (synesthesia), so too will my colour-coded py programs flowing across the page begin to resemble a little watercolour or even a heavy oil painting. I will be delighted, and all will be right with the world.

Nigel Salm · Nov 2, 2021

Hi Michael, what aspects of ObjectScript do you find frustrating? I'd be interested to get the views of developers new to ObjectScript for two reasons.

1) I have worked with Fortran, Pascal, Basic, APL, COBOL, DOS, C#, JS and now Python and R and, with the exception of APL, I have generally found that most languages consist of a set of commands, functions, datatypes and operators. Though the actual name of a command may vary between languages, I fundamentally understand that there will be an if/elseif/else, for, try/catch or foreach construct, or its equivalent, in every language. Once you've absorbed the language elements, it's just a case of expressing the logic of your function using the elements of that language. Some things in ObjectScript, such as $query, execute, indirection and the ability to treat a string as a number (write !,"15 Apples"+"25 Pears"," Fruit"), are peculiar to ObjectScript, and I have found nothing similar in other languages. I make heavy use of things like $classmethod(), $classname() and $property(), as I write a lot of abstract classes that are inherited into a host of other classes. I don't always know which class I am in, but I know that it has a class method "MyMethod", so I will write code such as

Set tSC=$classmethod($classname(), "MyMethod", {param1}, ...)
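For readers coming from Python, a rough equivalent of this pattern is to look the method up by name on whichever class the inherited code was called through. The sketch below is only an analogy, and the class names `AbstractBase` and `Patient` and the helper `run` are hypothetical, invented for illustration.

```python
class AbstractBase:
    """Abstract class that does not know which subclass it will run in."""

    @classmethod
    def run(cls, *args):
        # cls is the class the call came through (like $classname()),
        # and getattr resolves the method by name (like $classmethod()).
        method = getattr(cls, "MyMethod")
        return method(*args)


class Patient(AbstractBase):
    @classmethod
    def MyMethod(cls, param1):
        return f"{cls.__name__} handled {param1}"


print(Patient.run("a record"))
```

The base class never names `Patient` directly; it only relies on the agreement that every inheriting class provides a class method called "MyMethod".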

So, over the years I have managed to get ObjectScript to express all the logic I have ever needed, short of mathematical modelling and data science, which historically has not been an issue as I have written transactional applications or Ensemble interfaces. Now that we have applications out there with 30+ years of structured and unstructured data stored in them, I can't wait for the opportunity to build iFind indices and iKnow entity relationships, and to use IRIS Analytics, and especially Python, R and their large libraries of analytical functions. I would turn my attention to, say, a LabTrak database and start looking for correlations between histology or cytology observations, the patient demographics and some external data sources (lead levels, asbestos, socio-economic backgrounds), and see what correlations there may be between those dimensions and the likelihood of a patient with certain key dimensions developing a type of cancer, or diabetes and so on.

2) I have been learning a lot of new stuff recently (Ubuntu, Docker, Node.js, Python) since I became the father of 4 baby Pi 4Bs, and in learning all of these new technologies I came across a host of Cheat Sheets. I was struck by how useful they have been and how, in a sense, they are almost works of art. So I have set my heart on developing a set of ObjectScript, Ensemble Adapter, Cache Studio and System Management Cheat Sheets for IRIS. Given the sheer volume of stuff that I could include, I am keen to understand what to include and what to leave out, and understanding where you have been frustrated may help me focus on what should be included and what is best left for the full documentation.

Nigel

Arto Alatalo · Aug 23, 2021

Great article! Thank you for sharing your experience and thoughts with us.

Ben Spead · Oct 28, 2021

Closely related to this topic, make sure to catch the 4 sessions on Embedded Python at Virtual Summit 2021, and see my comment in this thread on how I see incredible power in being able to leverage the python ecosystem as an ObjectScript developer without having to actually write python code: https://community.intersystems.com/post/start-learning-about-embedded-py...

@Nigel.Salm5021 - thank you for taking the time to write this *excellent* article ... I am going to make it required reading for my entire team in order to help them better understand the rich history of the ObjectScript language and Caché/InterSystems IRIS platform!!

@Nigel.Salm5021 - I love the story about your being escorted out of the Oracle conference session. However, should the question posed to you by the speaker be "Who are you?" rather than "How are you?" .... it looks like it might be a typo or bad auto-correct :)

Yes. You're right, it should be Who, not How. I'll see if I can edit the article.

Ben Spead · Nov 3, 2021

you can edit the article as long as you are the original author. Thanks again for the great write-up!

Michael Pine · Oct 28, 2021

As a new developer at InterSystems, I've found myself frustrated at times with ObjectScript. This gives me a somewhat newfound appreciation of the history! Thanks for the thoughts!

@Michael Pine - I understand that frustration from when I was new and I think the biggest contributing factor (for me) to that frustration was a lack of resources for quickly finding code samples or getting my questions answered. I think that the D.C. has started to fill in this hole for people, and the more we all use it to share ObjectScript Q&As, best practices and Tips & Tricks, the more we will lower that frustration for ourselves and for others coming to the language :) So glad to have you as part of the Community!

Joel Solon · Oct 28, 2021

All I can say is Wow! I experienced many of the same things over my career, but I was living in safe, protected, training-room-land, not the real world of development like Nigel. Great article!

I'll just note that Nigel says "might" in the title.

One thing that confused me, Nigel. You wrote "If one of the early MUMPS creators had called $order "$next", it would immediately be recognizable as Next()...". But I know that you know that the original $order was called $next.

Hi Joel, you are correct; however, the ANSI standard used $order, while a couple of vendors supported $next. I was sitting on the ANSI committee at the time when we were deciding whether we would ratify both, and the final conclusion was that there was need for only one function and that its name would be $order. On the surface $next would seem to be the more logical choice, and unfortunately it was so long ago that I can't remember the final reasoning that decided on $order.

Joel Solon · Nov 5, 2021

Very strange! I was never on the standards committee. I started using MUMPS in 1987, and I remember it differently ($next being replaced by $order). Let me check the archives...

My 1981 copy of the Standard MUMPS Pocket Guide states that variables can have only non-negative integer subscripts, and lists $next (returning -1 if no more subscripts exist), but there's no mention of $order.
My 1983 copy of the Pocket Guide (based on the 1977 ANSI Standard) states that variables can use any string as a subscript and lists $order (both noted as approved extensions of the Standard; $order using "" as both the seed and the flag for no more subscripts), and $next is still listed.
My 1987 copy of the Pocket Guide (based on the 1984 ANSI Standard) has the same info as the 1983 Guide, except the approved extensions are now part of the Standard.
My 1995 copy of the Pocket Guide (based on the 1995 ANSI Standard) lists the 2 argument form of $order, but $next (having been deprecated) is not listed.

Public viewing of the Solon Archives available by appointment only ;-)

https://www.youtube.com/watch?v=rIz_xhYK2Mo

Nigel Salm · Nov 4, 2021

OMG, I had completely forgotten about those pocket guides. I had 3 or 4 of them and as far as I know, I still have them in a crate of paperwork in the loft in my house in Johannesburg. And you are right, $next() was a valid ANSI function until replaced by $order(). I stand corrected, and the video clip was a magnificent 'extra' to emphasise your "Solon Archives". Made me grin from ear to ear.

Heloisa Ramalho · Nov 5, 2021

I think that the reasoning for having two similar functions $next and $order is that $next returns -1 when no next subscript is found, preserving compatibility with the original applications, whereas $order returns "" when no next subscript is found, besides additional functionality of course.
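The difference in end-of-data flags described above can be sketched in Python. The functions `next_subscript` and `order_subscript` are hypothetical stand-ins written for this illustration, Python's default sort order is assumed in place of MUMPS collation, and the subscripts are held in a plain set rather than a global.

```python
from bisect import bisect_right


def next_subscript(subscripts, current):
    """$next-style walk: non-negative integer subscripts, -1 as seed and end flag."""
    ordered = sorted(subscripts)
    i = bisect_right(ordered, current)
    return ordered[i] if i < len(ordered) else -1


def order_subscript(subscripts, current):
    """$order-style walk: any string subscript, "" as seed and end flag."""
    ordered = sorted(subscripts)
    i = 0 if current == "" else bisect_right(ordered, current)
    return ordered[i] if i < len(ordered) else ""


def walk(step, subscripts, flag):
    """Iterate from the seed until the same flag value comes back."""
    found = []
    key = step(subscripts, flag)
    while key != flag:
        found.append(key)
        key = step(subscripts, key)
    return found


print(walk(next_subscript, {3, 1, 2}, -1))
print(walk(order_subscript, {"pears", "apples", "fruit"}, ""))
```

Because -1 doubles as the end flag, the $next-style walk only works while subscripts are non-negative integers, whereas "" can serve as a flag for arbitrary string subscripts as long as the empty string itself is never used as a subscript.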

Herman Slagman · Nov 4, 2021

Old man
Glad you were of great help in the early days of Caché.

Nigel Salm · Nov 5, 2021

Cheeky sod, enough with the old man already!