Rob Tweed · Dec 20, 2021

Holiday Reading: What Lies Beneath!

For those of you who might be new to IRIS, and even those who have used Cache or IRIS for some time but want to explore beyond its usually assumed boundaries and practices: you might want to dive into this detailed exploration of the database engine at its heart, and discover just what you can really do with it, going way beyond what InterSystems have done with it for you.

You'll discover that it's actually a hugely powerful yet incredibly simple storage engine that allows you to model any kind of database you wish, once you understand and master its simple, basic principles. Its storage model can be referred to as "Global Storage" which, it turns out, can be modelled on top of any hierarchical data storage engine, and even on the Redis NoSQL database. However, the "native" implementations, e.g. in IRIS and Cache, are the fastest by a significant margin, outpacing what the database world normally recognises as the planet's fastest databases (e.g. LMDB).

Global Storage is (sadly) one of the best-kept secrets of the database world, but I've distilled into this set of articles all my 40-odd years of knowledge and experience of using this database technology and pushing it to its limits. It's my attempt to make it all at least a little bit less secret and open your eyes to what really lies beneath!

So strap yourself in and read all about it here:

https://github.com/robtweed/global_storage

Discussion (5)

Interesting article, thanks Rob.

Have you written any documents (or are you aware of any documentation) that detail how globals work in the background? When I started coding Mumps in the early '90s, I assumed that every time you referred to a global in the code, the data was being accessed directly from the disk. I now know that isn't the case, and I have a vague understanding that global data is held in data blocks organised in a B-tree structure, but I would like a deeper understanding of how globals are actually stored and managed.

Glad you liked the articles, Ken. How IRIS (and Cache) handles the physical side of global storage is, of course, proprietary, but, as the article referred to by Alexander explains, it's done using a fairly classic B-tree architecture. For performance, access to the physical database is buffered via memory, the amount of which you can configure. Additionally, IRIS and Cache both have ECP networking, which adds an amazing level of additional power and flexibility, with Globals able to be transparently abstracted across networked machines. Needless to say, the technicalities of ECP are a closely-guarded proprietary secret!

As my articles explain, however, you can actually implement the basics of global storage on top of a number of other different databases, with BerkeleyDB being probably the closest example to how IRIS and Cache implement them.
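As a rough illustration of that idea (not taken from the articles, and certainly not how IRIS or Cache do it internally), here is a minimal Python sketch of global nodes laid out as flat, sorted keys, which is the kind of structure an ordered key-value store gives you. The `GlobalStore` class and its `set`, `get` and `order` methods are hypothetical names invented for this sketch, and it assumes non-empty string subscripts sorted in plain Python string order rather than the collation a real implementation uses.

```python
import bisect


class GlobalStore:
    """Toy model of global storage on top of an ordered key-value layout.

    Hypothetical illustration only: each node is stored under a flat key
    (global_name, subscript1, subscript2, ...) and the keys are kept sorted.
    Assumes every subscript is a non-empty string.
    """

    def __init__(self):
        self._keys = []  # sorted list of key tuples
        self._data = {}  # key tuple -> value

    def set(self, name, subscripts, value):
        key = (name, *subscripts)
        if key not in self._data:
            bisect.insort(self._keys, key)  # keep keys in sorted order
        self._data[key] = value

    def get(self, name, subscripts, default=None):
        return self._data.get((name, *subscripts), default)

    def order(self, name, subscripts):
        """Return the next subscript at the same level, or None at the end.

        The last entry of `subscripts` is the starting point; pass "" there
        to ask for the first subscript at that level.
        """
        parent = (name, *subscripts[:-1])
        last = subscripts[-1]
        # last + "\x00" is the smallest string greater than `last`, so this
        # probe skips `last` itself and every node stored beneath it.
        probe = parent + (last + "\x00",)
        i = bisect.bisect_left(self._keys, probe)
        if i < len(self._keys) and self._keys[i][:len(parent)] == parent:
            return self._keys[i][len(parent)]
        return None


g = GlobalStore()
g.set("Person", ["1", "name"], "Alice")
g.set("Person", ["1", "city"], "London")
g.set("Person", ["2", "name"], "Bob")

# Walk the first-level subscripts in sorted order.
ids = []
sub = g.order("Person", [""])
while sub is not None:
    ids.append(sub)
    sub = g.order("Person", [sub])
print(ids)
```

The point of the sketch is only that a sparse, multi-level hierarchy needs nothing more than sorted keys plus a "next key" lookup; everything else is layered on top.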

Of course, for the average user, how the concept/abstraction of Global Storage is physically implemented is of less interest than how you can harness and make use of Global Storage to do the kinds of things you want to do.  That, of course, was the focus of my articles, to show just some of the most common ways (and some of the lesser-known and very sophisticated ways) in which you can harness Global Storage.

I've sometimes described Global Storage as a "proto-database" - a very simple but powerful and flexible database engine on which you can model pretty much anything else and on which you can layer all the other stuff you need to create your specific database environment. As such, it's unique in the database marketplace, and it has always baffled me over the years why it's so little known and so little used: it blows away everything else out there.

Anyway, a Happy New Year to all fans of IRIS and Global Storage!

Rob

Rob ... this is incredibly rich - thank you for taking the time to write all of this up!!

Great article and GitHub md tips