Article
· 8 hr ago 8m read

Using DocDB in SQL, almostContestant

From the previous article, we identified some issues when working with JSON in SQL.

IRIS offers a dedicated feature for handling JSON documents, called DocDB.

InterSystems IRIS® data platform DocDB is a facility for storing and retrieving database data. It is compatible with, but separate from, traditional SQL table and field (class and property) data storage and retrieval. It is based on JSON (JavaScript Object Notation) which provides support for web-based data exchange. InterSystems IRIS provides support for developing DocDB databases and applications in REST and in ObjectScript, as well as providing SQL support for creating or querying DocDB data.

By its nature, InterSystems IRIS Document Database is a schema-less data structure. That means that each document has its own structure, which may differ from other documents in the same database. This has several benefits when compared with SQL, which requires a pre-defined data structure.

The word “document” is used here as a specific industry-wide technical term, as a dynamic data storage structure. “Document”, as used in DocDB, should not be confused with a text document, or with documentation.

Let's explore how DocDB can help store JSON in the database and integrate it into projects that rely solely on xDBC protocols.

Let's start

DocDB defines two key components:

  • %DocDB.Database - Although it expects the creation of a "Database," which can be confusing since we already have a database in SQL terms, it is essentially a class in ObjectScript. For those more familiar with SQL, it functions as a table.
  • %DocDB.Document - A base class for a "database" that extends the %Persistent class and introduces DocDB-specific properties:
    • %DocumentId - IdKey
    • %Doc As %DynamicAbstractObject - The actual storage for the JSON document
    • %LastModified - An automatically updated timestamp for each insert and update

Creating a Table (Database)

Now, let's create our first table, or rather, our first "Database." It seems that the expectation was not for someone to create a DocDB.Database using only SQL. As a result, there is no way to create a new "Database" using SQL alone. To thoroughly test this, we will use a plain ObjectScript approach. Below is an example of how to define a class that extends %DocDB.Document:

Class User.docdb Extends %DocDB.Document [ DdlAllowed ]
{

}

Checking the newly created table using SQL shows that it is functional.

Time to give it a first try and insert some data

We can insert any data without validation, meaning there are no restrictions on what can be inserted into %Doc. Implementing validation would be beneficial.

Extracting Values from a Document

%DocDB.Database allows properties to be extracted from documents, making them available as dedicated columns. This also enables indexing on these properties.

Would need to get database first.

USER>set docdb=##class(%DocDB.Database).%GetDatabase("User.docdb")

<THROW>%GetDatabase+5^%DocDB.Database.1 *%Exception.StatusException ERROR #25351: DocDB Database 'User.docdb' does not exist.

USER 2e1>w $SYSTEM.DocDB.Exists("User.docdb")
0

Hmm, database "does not exist", okay, let's create then

USER>set docdb=##class(%DocDB.Database).%CreateDatabase("User.docdb")

<THROW>%CreateDatabase+13^%DocDB.Database.1 *%Exception.StatusException ERROR #25070: The generated class name for the database 'User.docdb' conflicts with another class: User.docdb
USER 2e1>

This suggests that a simple class definition is not sufficient. Instead, we must use %DocDB.Database from the beginning, which is inconvenient, especially when using source control.

To resolve this, we delete the existing class and create the database correctly:

USER>do $system.OBJ.Delete("User.docdb")

Deleting class User.docdb
USER>set docdb=##class(%DocDB.Database).%CreateDatabase("User.docdb")

USER>zwrite docdb
docdb=6@%DocDB.Database  ; <OREF,refs=1>
+----------------- general information ---------------
|      oref value: 6
|      class name: %DocDB.Database
|           %%OID: $lb("3","%DocDB.Database")
| reference count: 1
+----------------- attribute values ------------------
|       %Concurrency = 1  <Set>
|          ClassName = "User.docdb"
|       DocumentType = ""
|               Name = "User.docdb"
|           Resource = ""
|   SqlNameQualified = "SQLUser.docdb"
+-----------------------------------------------------

This time, it works, and previously inserted data remains intact.

Assuming we have a document like this

{"name":"test", "some_value":12345}

Let's extract both of these fields, using %CreateProperty method

USER>do docdb.%CreateProperty("name","%String","$.name",0)

USER>do docdb.%CreateProperty("someValue","%String","$.some_value",0)

And check the table

Upon checking the table, we see two new columns, but they contain null values. It appears that these properties do not apply retroactively to existing data. If a developer later adds properties and indexes for optimization, the existing data will not automatically reflect these changes.

Update with the same value, and check if %doc is json. And we got our value.

Let's have a look at the class now, which is fully created and updated by %DocDB.Database

Class User.docdb Extends %DocDB.Document [ Owner = {irisowner}, ProcedureBlock ]
{

Property name As %String [ SqlComputeCode = { set {*}=$$%EvaluatePathOne^%DocDB.Document({%Doc},"$.name")
}, SqlComputed, SqlComputeOnChange = %Doc ];
Property someValue As %String [ SqlComputeCode = { set {*}=$$%EvaluatePathOne^%DocDB.Document({%Doc},"$.some_value")
}, SqlComputed, SqlComputeOnChange = %Doc ];
Index name On name;
Index someValue On someValue;
}

So, created properties, contains a code to extract value from %Doc, and yes, it only populated if %Doc is changed. And it created Indexes for both fields, no one asked for. Having many extracted values, will increase globals usage just by amount of indexes.

It will be possible to update those created properties, with no harm to original %Doc, but the values will became irrelevant.

 
Insert invalid data

%DocDB.Database, has %GetProperty method

USER>zw docdb.%GetPropertyDefinition("someValue")

{"Name":"someValue","Type":"%Library.String"}  ; <DYNAMIC OBJECT> 
USER>zw docdb.%GetPropertyDefinition("name")

{"Name":"name","Type":"%Library.String"}  ; <DYNAMIC OBJECT>

The path to the value which was used in %CreateProperty is gone, no way to validate it. If path requires incorrect, to update it, %DropProperty required to be called first and %CreateProperty again.

%FindDocuments

%DocDB.Database offers a way to search through the documents

To find one or more documents in a database and return the document(s) as JSON, invoke the %FindDocuments() method. This method takes any combination of three optional positional predicates: a restriction array, a projection array, and a limit key:value pair.

Most important part, %FindDocuments does not care about %Doc itself, it only works on properties. Quite fragile, throws exceptions on anything that does not match expectations. In fact, just constructs some SQL query and executes it.

USER>do docdb.%FindDocuments(["firstName","B","%STARTSWITH"]).%ToJSON() 

<THROW>%FindDocuments+37^%DocDB.Database.1 *%Exception.StatusException ERROR #25541: DocDB Property 'firstName' does not exist in 'User.docdb'

USER>do docdb.%FindDocuments(["name","test","in"],["name"]).%ToJSON()

{"sqlcode":100,"message":null,"content":[{"name":"test"}]}
USER>do docdb.%FindDocuments(["name","","in"],["name"]).%ToJSON() 

<THROW>%FindDocuments+37^%DocDB.Database.1 *%Exception.SQL -12 -12   A term expected, beginning with either of:  identifier, constant, aggregate, $$, (, :, +, -, %ALPHAUP, %EXACT, %MVR %SQLSTRING, %SQLUPPER, %STRING, %TRUNCATE, or %UPPER^ SELECT name FROM SQLUser . docdb WHERE name IN ( )

USER>do docdb.%FindDocuments(["name","test","="]).%ToJSON()

{"sqlcode":100,"message":null,"content":[{"%Doc":"{\"name\":\"test\", \"some_value\":12345}","%DocumentId":"1","%LastModified":"2025-02-05 12:25:02.405"}]}
USER 2e1>do docdb.%FindDocuments(["Name","test","="]).%ToJSON() 

<THROW>%FindDocuments+37^%DocDB.Database.1 *%Exception.StatusException ERROR #25541: DocDB Property 'Name' does not exist in 'User.docdb'

USER>do docdb.%FindDocuments(["%Doc","JSON","IS"]).%ToJSON() 

<THROW>%FindDocuments+37^%DocDB.Database.1 *%Exception.StatusException ERROR #25540: DocDB Comparison operator is not valid: 'IS'
USER 2e1>do docdb.%FindDocuments(["%Doc","","IS JSON"]).%ToJSON() 

<THROW>%FindDocuments+37^%DocDB.Database.1 *%Exception.StatusException ERROR #25540: DocDB Comparison operator is not valid: 'IS JSON'

Plain SQL would be much more reliable 

Storage

Another very interesting moment is how efficiently is JSON stored in the database.

^poCN.bvx3.1(1)=$lb("","2025-02-05 12:25:02.405","test",12345)
^poCN.bvx3.1(1,"%Doc")="{""name"":""test"", ""some_value"":12345}"
^poCN.bvx3.1(2)=$lb("","2025-02-05 12:25:02.405")
^poCN.bvx3.1(2,"%Doc")="[1,2,3]"
^poCN.bvx3.1(3)=$lb("","2025-02-05 12:01:18.542")
^poCN.bvx3.1(3,"%Doc")="test"
^poCN.bvx3.1(4)=$lb("","2025-02-05 12:01:19.445")
^poCN.bvx3.1(4,"%Doc")=$c(0)
^poCN.bvx3.1(5)=$lb("","2025-02-05 12:01:20.794")

JSON is stored as plain text, while other databases use binary formats for more efficient storage and searching. IRIS's DocDB does not support direct searching within document content unless JSON_TABLE is used, which still requires parsing JSON into an internal binary format.

In version 2025.1, %DynamicAbstractObject introduces %ToPVA and %FromPVA methods, which seem to store JSON in a binary format.

USER>do ({"name":"value"}).%ToPVA($name(^JSON.Data(1))) 

USER>zw ^JSON.Data
^JSON.Data(1,0,0)="PVA1"_$c(134,0,6,0,2,0,0,0,0,0,14,0,15,0,2,0,21,9,6,136,0,1,6,0,1,0,2,1,137,0,1,5,8,1,6)_"value"_$c(6,0,6)_"name"_$c(5) 

USER>zw {}.%FromPVA($name(^JSON.Data(1)))

{"name":"value"}  ; <DYNAMIC OBJECT,refs=1>

However, there are inconsistencies when handling certain structures.

USER>do ({}).%ToPVA($name(^JSON.Data(1)))

<SYSTEM>%ToPVA+1^%Library.DynamicAbstractObject.1

USER>do ({"name":{}}).%ToPVA($name(^JSON.Data(1)))

<SYSTEM>%ToPVA+1^%Library.DynamicAbstractObject.1

Conclusion

As of now, %DocDB is only practical within ObjectScript and has limitations in SQL. Performance concerns arise when dealing with large datasets. Everything %DocDB offers can be achieved using plain SQL while maintaining full SQL support. Given the current implementation, there is little incentive to use DocDB over standard SQL approaches.

Discussion (1)1
Log in or sign up to continue

Thanks for this valuable feedback.

Few years ago (2020 ish), i had to do a project based on DocDB.
We encontered the same issues :

  • API first, not code first
    • Workaround : We relied heavily on scripts to generate the "Databases", "Properties", "Indexes" and so.
  • When you create a property it's not automatically indexed
    • Workaround : We created a wrapper around the SDK to ensure that every property was indexed
  • No way to enforce a schema
    • Workaround : No workaround, we didn't really care about that at the time

What you didn't mention and that we encountered :

  • Composite indexes are not supported
    • Workaround : We solved this with the "wrapper" we created
  • No support of neasted objects
    • Workaround : We didn't solve this, we had to flatten all the objects :(
  • Some operators was not supported or not working as expected
    • Workaround : We created some wrc tickets and most of them were fixed :) or built our own sql statements based on indexes

What great is that we never get blocked by those issues, we always found a workaround.

I'm glad to see that DocDB is still alive and the team is working on it.

It's a step in the right direction to support "json" databases. I can't wait to see the next steps maybe a client side library, support for nested objects, composite indexes, a great sql function to support json objects, etc.