Article
· Jun 18, 2024 4m read

Unity Catalog for IRIS Workloads - Collision Theory Confirmed

Collision Theory Confirmed

Innovation happens when two or more technologies collide to create something new. The best collisions can CHANGE lives, eliminate WASTE, DIFFERENTIATE in the market, or flat out give me another project I don't have time for, but this one would really, really matter.

I attend conferences and hackathons searching for that something to click hard enough to make me walk out of a keynote happily distracted, snag one of those (rare indeed) empty benches next to a power outlet, and consume code bases. This high occurred at InterSystems Global Summit 2024, but it wasn't apparent until a shot was fired 2900 miles away at DAIS 2024 at the very same time, when Unity Catalog went open source!

I am not one who can see through or engage a textile metaphor to articulate what is needed for data workloads at ridiculously weird/fast times to serve in my industry. I need software to back it, and pretty much immediately. This is the difference between solving a gap and curating an innovative idea, I'd presume, so solutioning is encouraged, and OSS is on the case.

Possible misuse of "Collision Theory" for a publication aside, here is a collision, or an imminent one at least, that I pondered in an Uber ride a Wednesday ago and that is still holding close to ring zero in my carbon-based operating system.

The Collision

Vectors


You don't need to dig too hard to get excited about the immediate possibility of Vector Data Structures right alongside all the other ones from a SQL perspective. "Already Ready" is a real thing, and it is a column and an ELT/ETL away without moving any data at all.

@Alvin Ryanputra's GrandHack MIT Demo is easy on the eyes and highlights this data mix-in powered by embeddings/vectors:

SELECT TOP 3 * FROM scotch_reviews
WHERE price < 100 -- SQL STUFF
ORDER BY VECTOR_DOT_PRODUCT(description_vector, TO_VECTOR(:search_vector)) DESC -- VECTOR SORCERY

"You got your SQL mixed in with my Vectors" could be a modern-day Reese's commercial, and the fact that you can take an already existing text column, rip it to evergreened embeddings, and persist it should set off some light bulbs somewhere.
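For readers who think in Python first, here is a minimal sketch of what that query does, under stated assumptions: the rows, prices, and three-dimensional vectors are made up for illustration, and `top_matches` is a hypothetical helper, not an IRIS API. It filters on price (the SQL stuff), then ranks the survivors by dot product against the search vector (the vector sorcery) and keeps the top 3.

```python
from typing import Any, Dict, List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain dot product, the quantity the ORDER BY clause ranks on."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))


def top_matches(
    rows: List[Dict[str, Any]],
    search_vector: Sequence[float],
    max_price: float = 100.0,
    limit: int = 3,
) -> List[Dict[str, Any]]:
    """Hypothetical stand-in for: WHERE price < max_price ORDER BY dot DESC, TOP limit."""
    cheap = [row for row in rows if row["price"] < max_price]
    ranked = sorted(
        cheap,
        key=lambda row: dot(row["description_vector"], search_vector),
        reverse=True,
    )
    return ranked[:limit]


# Made-up sample rows; real embeddings would have hundreds of dimensions.
SCOTCH_REVIEWS = [
    {"name": "Smoky Islay", "price": 85.0, "description_vector": [0.9, 0.1, 0.0]},
    {"name": "Honeyed Speyside", "price": 60.0, "description_vector": [0.1, 0.9, 0.1]},
    {"name": "Rare Cask", "price": 450.0, "description_vector": [1.0, 0.0, 0.0]},
    {"name": "Peated Blend", "price": 35.0, "description_vector": [0.7, 0.2, 0.1]},
    {"name": "Light Lowland", "price": 40.0, "description_vector": [0.0, 0.3, 0.9]},
]

if __name__ == "__main__":
    for row in top_matches(SCOTCH_REVIEWS, [1.0, 0.0, 0.0]):
        print(row["name"], row["price"])
```

Note that the best vector match in the sample data ("Rare Cask") never makes the list because the price filter removes it first, which is exactly the SQL-plus-vectors mix the demo shows off.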

Python


Unity Catalog is front and center in the teaser for this post, but Python is the star of the show. The relentless work, evangelism, and solutioning that @Guillaume Rongier and the Python Posse at InterSystems have put in proved to be more important than ever to roll with the tide of our industry.

Why?

Python is a cloud interoperability platform.

And guess what...

It was successfully embedded, and ObjectScript was made accessible in reverse fashion, most likely without the help of a single JIRA in the beginning to move it forward. That is the real ticket to doing "Cloud." No more "adapters" need to be built (though they are welcomed and valuable); they are "Already Ready" in the supply chain as Python modules.

If you are not in agreement with the above Python Cloud Statement, let's move to "Unmanaged Tables" in Unity Catalog OSS and argue about it there.

Unity Catalog OSS


In 2021, Databricks responded to customers jumping up and down for a sanity layer on their workloads: Data Governance, Security, and all those things mentioned for three days straight at Global Summit in the context of AI adoption. In 2024, Unity Catalog was opened up for any data platform to use, and it has been a fascinating subscription to pull requests ever since.

So let's put it to its intended purpose, and turn the Orange to Teal and Navy to Purple and apply it in full solutioning glory to InterSystems IRIS.

Unity Catalog is a lot of things, a lot of good things that check a lot of boxes in the modern data era. Outside of all those checkboxes is the registration of "connections" to external data for our Python powers to consume. This in essence is an "adapter" that results in a dataset for consumption by the IRIS Data Platform: instant cloud interoperability, if you will, with an enterprisey, ITIL-y twist, all with scoped authorization at the Metastore level.

[] = iriscatalog.cloudfiles("bucket")
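To make that one-liner a little more concrete, here is a toy sketch of the idea, assuming a purely hypothetical `IrisCatalog` class. Nothing in it is a Unity Catalog or IRIS API, and the connection name, principals, and file paths are invented. It only illustrates a registered "connection" acting as an adapter, with the authorization check scoped at the catalog before any data is handed over.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Set


@dataclass
class Connection:
    """A registered external data source (hypothetical)."""
    lister: Callable[[], List[str]]                   # returns the objects behind the connection
    allowed: Set[str] = field(default_factory=set)    # principals permitted to read it


class IrisCatalog:
    """Toy in-memory catalog; a real one would live in the metastore."""

    def __init__(self) -> None:
        self._connections: Dict[str, Connection] = {}

    def register(self, name: str, lister: Callable[[], List[str]],
                 allowed: Iterable[str]) -> None:
        """Register a connection once, centrally, with its allowed principals."""
        self._connections[name] = Connection(lister, set(allowed))

    def cloudfiles(self, name: str, principal: str) -> List[str]:
        """Return the files behind a connection if the principal is authorized."""
        conn = self._connections.get(name)
        if conn is None:
            raise KeyError(f"no connection registered as {name!r}")
        if principal not in conn.allowed:
            raise PermissionError(f"{principal!r} may not read {name!r}")
        return conn.lister()


if __name__ == "__main__":
    iriscatalog = IrisCatalog()
    # Invented bucket contents standing in for a real cloud listing call.
    iriscatalog.register("bucket", lambda: ["raw/a.parquet", "raw/b.parquet"],
                         allowed=["ron"])
    print(iriscatalog.cloudfiles("bucket", principal="ron"))
```

The point of the design is that the caller never holds cloud credentials: it names a connection, and the catalog decides whether to hand back data.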

If you consume information in a pattern like I do, I've already lost your SEO impression to another site to dig deeper on Unity Catalog, so I'll boil down my closing actor in the collision with some MS Paint modifications as an overlay to Unity Catalog functionality.

This one is a bit far-fetched and probably where on-the-spot solutioning falls apart, but what if our "Managed Tables" were InterSystems data objects, and "Unmanaged" was instant data format compatibility?

Key:
⛅ Instant Cloud Interoperability
🚀 Future Forward Data Sharing
✅ Already There

This is a developer community, and a sweet terminal screenshot is mandatory with a what-if scenario in context. If you want to get started quickly in the flurry of development over there, I suggest you just use the container in this pull request ( https://github.com/unitycatalog/unitycatalog/pull/42/files ).

What if, either by CPF magic or a callback, namespace creation registered itself with Unity Catalog?
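As a thought experiment only, the callback half of that what-if might look like the sketch below. It assumes a Unity Catalog OSS server on `http://localhost:8080` with a schema-creation endpoint at `/api/2.1/unity-catalog/schemas`, and it assumes an IRIS namespace maps to a schema inside a catalog named `iris`. The function names, the namespace-to-schema mapping, and the catalog name are all hypothetical. `build_registration` only builds the request, so its shape can be inspected without a running server.

```python
import json
import re
import urllib.request

# Assumed base URL of a locally running Unity Catalog OSS server.
UC_BASE_URL = "http://localhost:8080/api/2.1/unity-catalog"


def schema_name_for(namespace: str) -> str:
    """Hypothetical mapping: lowercase, with anything unusual replaced by '_'."""
    return re.sub(r"[^a-z0-9_]", "_", namespace.lower())


def build_registration(namespace: str, catalog: str = "iris",
                       base_url: str = UC_BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST that would register the namespace as a schema."""
    payload = {
        "name": schema_name_for(namespace),
        "catalog_name": catalog,
        "comment": f"Registered from IRIS namespace {namespace}",
    }
    return urllib.request.Request(
        url=f"{base_url}/schemas",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def on_namespace_created(namespace: str, send=urllib.request.urlopen) -> int:
    """Hypothetical callback: fire the registration and return the HTTP status."""
    request = build_registration(namespace)
    with send(request) as response:
        return response.status


if __name__ == "__main__":
    req = build_registration("USER")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Keeping the request builder separate from the sender means the callback can be exercised with a fake `send` long before anyone decides where in IRIS the hook should live.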

So hi I am Ron and this is my article to be tokenized and spit out as embeddings and included in a yet to be named LLM in the future.

So how about a partnership ( https://www.unitycatalog.io/#partner-ecosystem ) over there with Unity Catalog, ISC?

I'll certainly help in any way I can.
