Inside $LISTBUILD
Most IRIS developers use $LISTBUILD every day — often without even noticing it.
It is not just a convenient function for building lists. It is also the default internal format used to store row data, global values, and many intermediate structures inside the database engine.
Despite this, the actual binary representation of $LISTBUILD values is rarely discussed. Most developers rely on its behavior, but never look at how the data is really stored.
This article focuses strictly on the binary layout of $LISTBUILD values, based on direct inspection via zzdump.
Element Structure
A $LISTBUILD value is a sequence of elements. Each element is encoded as:
[length][type][payload]
For small elements:
- length is 1 byte and includes the entire element
- type is 1 byte
- payload is variable
Example:
$lb("hello")
→ 07 01 68 65 6C 6C 6F
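The layout above can be walked with a few lines of code outside IRIS. The Python sketch below is only an illustration: `split_short_elements` is a hypothetical helper name, and it assumes every element uses the 1-byte length form with a length of at least 2.

```python
def split_short_elements(data: bytes) -> list[tuple[int, bytes]]:
    """Split a $LIST byte string into (type, payload) pairs.

    Assumption: every element uses the 1-byte length form described above,
    where the length byte counts itself, the type byte and the payload.
    """
    elements = []
    pos = 0
    while pos < len(data):
        length = data[pos]
        if length < 2 or pos + length > len(data):
            raise ValueError(f"unsupported or truncated element at offset {pos}")
        type_code = data[pos + 1]
        payload = data[pos + 2 : pos + length]
        elements.append((type_code, payload))
        pos += length  # the length covers the whole element, so this lands on the next one
    return elements


dump = bytes.fromhex("07 01 68 65 6C 6C 6F")
print(split_short_elements(dump))  # [(1, b'hello')]
```

Because the length byte covers the whole element, the walker never has to understand a payload in order to skip it.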
Endianness
Updated: thanks to @Steven Hobbs
All examples in this article use little-endian byte order — and this is not incidental.
Despite IRIS supporting both little-endian and big-endian platforms internally, $LISTBUILD uses a consistent little-endian encoding for all binary payloads.
This applies to:
- integer values
- decimal mantissas
- floating-point values (08, 09)
- UTF-16 strings (UTF-16LE)
For example:
- integers are stored least-significant byte first
- IEEE floating-point values follow little-endian layout
- Unicode strings use UTF-16LE encoding
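These byte orders are easy to reproduce outside IRIS. The Python sketch below uses only standard-library calls to show the three little-endian layouts; the variable names are illustrative and nothing here is taken from IRIS itself.

```python
import struct

# Integer 256 as a 2-byte little-endian payload: least-significant byte first.
int_payload = (256).to_bytes(2, "little")
print(int_payload.hex(" "))      # 00 01

# IEEE 754 double 0.1 in little-endian layout.
double_payload = struct.pack("<d", 0.1)
print(double_payload.hex(" "))   # 9a 99 99 99 99 99 b9 3f

# A non-ASCII string encoded as UTF-16LE: low byte of each code unit first.
text_payload = "привет".encode("utf-16-le")
print(text_payload.hex(" "))     # 3f 04 40 04 38 04 32 04 35 04 42 04
```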
Extended Length Encoding
When the element size exceeds what a single length byte can hold, IRIS switches to extended forms:
- 00 + 2-byte length
- 00 00 + 4-byte length
Example:
00 01 01 01 ...
Length encoding is therefore variable-width and marker-based.
String Types
| Type | Meaning |
|---|---|
| 01 | ASCII string |
| 02 | Unicode string |
Examples:
$lb("") → 02 01
$lb("hello") → 07 01 68 65 6C 6C 6F
$lb("привет") → 0E 02 ...
Unicode strings are stored as UTF-16.
Surrogate Pairs
Characters outside the Basic Multilingual Plane use UTF-16 surrogate pairs:
$lb("🔟")
→ 06 02 3D D8 1F DD
D83D DD1F → U+1F51F
Unicode values are stored as raw UTF-16LE regardless of platform endianness and, by default, without compression.
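The two string types can be reproduced with a short Python sketch. The helper names are hypothetical, and the code rests on two assumptions: type 01 holds one byte per character (modelled here with Latin-1), and only the 1-byte length form is needed.

```python
def encode_short_string(text: str) -> bytes:
    """Encode one string as a short $LIST element (hypothetical helper).

    Assumptions: strings whose characters all fit in one byte use type 01,
    everything else uses type 02 with UTF-16LE, and the element fits the
    1-byte length form.
    """
    try:
        type_code, payload = 0x01, text.encode("latin-1")
    except UnicodeEncodeError:
        type_code, payload = 0x02, text.encode("utf-16-le")
    length = len(payload) + 2  # length byte + type byte + payload
    if length > 0xFF:
        raise ValueError("element needs the extended length form")
    return bytes([length, type_code]) + payload


def decode_string(type_code: int, payload: bytes) -> str:
    """Decode the payload of a type 01 or type 02 element."""
    if type_code == 0x01:
        return payload.decode("latin-1")
    if type_code == 0x02:
        return payload.decode("utf-16-le")  # surrogate pairs are joined by the codec
    raise ValueError(f"not a string type: {type_code:#04x}")


print(encode_short_string("hello").hex(" "))  # 07 01 68 65 6c 6c 6f
print(encode_short_string("🔟").hex(" "))      # 06 02 3d d8 1f dd
print(decode_string(0x02, bytes.fromhex("3D D8 1F DD")) == "\U0001F51F")  # True
```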
Integer Encoding
| Type | Meaning |
|---|---|
| 04 | non-negative integer |
| 05 | negative integer |
Encoding is variable-length.
Special cases:
0 → 02 04
-1 → 02 05
Examples:
1 → 03 04 01
255 → 03 04 FF
256 → 04 04 00 01
-2 → 03 05 FE
-256 → 03 05 00
-257 → 04 05 FF FE
Observations:
- positive integers use unsigned binary representation
- negative integers use two’s complement with variable width
- payload grows only when required
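The examples above are consistent with a simple rule: a negative value with an n-byte payload is stored as value + 256^n, which is two's complement with the high-order FF bytes dropped. The Python sketch below encodes and decodes on that basis. The rule is inferred from the dumps rather than from documentation, the function names are hypothetical, and no range check is applied.

```python
def encode_int(value: int) -> bytes:
    """Encode an integer as a $LIST element (sketch inferred from the examples)."""
    if value >= 0:
        type_code = 0x04
        size = (value.bit_length() + 7) // 8   # 0 needs no payload bytes
        payload = value.to_bytes(size, "little")
    else:
        type_code = 0x05
        size = 0
        while value < -(256 ** size):          # smallest width that holds the value
            size += 1
        payload = (value + 256 ** size).to_bytes(size, "little")
    return bytes([len(payload) + 2, type_code]) + payload


def decode_int(type_code: int, payload: bytes) -> int:
    """Decode the payload of a type 04 or type 05 element."""
    value = int.from_bytes(payload, "little")
    if type_code == 0x04:
        return value
    if type_code == 0x05:
        return value - 256 ** len(payload)     # restore the dropped 0xFF bytes
    raise ValueError(f"not an integer type: {type_code:#04x}")


for n in (0, -1, 1, 255, 256, -2, -256, -257):
    print(n, "->", encode_int(n).hex(" "))
```

Running the loop reproduces the eight encodings listed above, including the two payload-free special cases for 0 and -1.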
Decimal Numbers
| Type | Meaning |
|---|---|
| 06 | positive decimal |
| 07 | negative decimal |
Structure:
[length][type][scale][mantissa...]
- scale: 1 byte (decimal exponent)
- mantissa: variable-length integer
Example:
0.1 → 04 06 FF 01
0.01 → 04 06 FE 01
0.00002 → 04 06 FB 02
Large value:
2^32 + 0.1
→ 08 06 FF 01 00 00 00 0A
Interpretation:
value = mantissa × 10^exponent
Negative values mirror the same structure:
-0.00002 → 04 07 FB FE
Decimal values are stored as scaled integers, preserving exact decimal semantics.
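A decoder for this layout fits in a few lines of Python. This is a sketch under stated assumptions: the scale is read as a signed byte and the mantissa as a signed little-endian integer, which fits every example above but is not confirmed by any documentation. `decode_decimal` is a hypothetical name.

```python
from decimal import Decimal


def decode_decimal(payload: bytes) -> Decimal:
    """Decode the payload of a type 06 or 07 element: [scale][mantissa...].

    Assumption: the scale is a signed byte and the mantissa is a signed
    little-endian integer; both readings fit the examples above.
    """
    scale = int.from_bytes(payload[:1], "little", signed=True)
    mantissa = int.from_bytes(payload[1:], "little", signed=True)
    return Decimal(mantissa).scaleb(scale)  # mantissa × 10^scale, kept exact


print(decode_decimal(bytes.fromhex("FF 01")))              # 0.1
print(decode_decimal(bytes.fromhex("FB 02")))              # 0.00002
print(decode_decimal(bytes.fromhex("FF 01 00 00 00 0A")))  # 4294967296.1
print(decode_decimal(bytes.fromhex("FB FE")))              # -0.00002
```

Using `Decimal` rather than `float` keeps the result exact, which matches the purpose of the scaled-integer representation.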
Binary Floating-Point
IRIS supports two binary floating-point encodings.
Compact Float (Type 08)
[length][08][payload...]
- IEEE 754 single-precision (float32)
- always little-endian, like every other payload
- low-order zero bytes (which come first in little-endian order) are omitted
Examples (values created with $double(), since plain decimal literals are stored as types 06 and 07):
1.5 → 04 08 C0 3F
1.25 → 04 08 A0 3F
0.5 → 03 08 3F
10.0 → 04 08 20 41
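Decoding a type 08 payload means putting the omitted zero bytes back before reading the value as a float32. The Python sketch below assumes that the missing bytes are the low-order ones, which is what the examples above imply; `decode_compact_float` is a hypothetical name.

```python
import struct


def decode_compact_float(payload: bytes) -> float:
    """Decode a type 08 payload by restoring the omitted low-order zero bytes."""
    if len(payload) > 4:
        raise ValueError("type 08 payload is at most 4 bytes")
    padded = payload.rjust(4, b"\x00")  # low-order bytes come first in little-endian
    return struct.unpack("<f", padded)[0]


print(decode_compact_float(bytes.fromhex("C0 3F")))  # 1.5
print(decode_compact_float(bytes.fromhex("A0 3F")))  # 1.25
print(decode_compact_float(bytes.fromhex("3F")))     # 0.5
print(decode_compact_float(bytes.fromhex("20 41")))  # 10.0
```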
IEEE Double (Type 09)
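[length][09][payload]
- IEEE 754 double-precision (float64)
- payload of up to 8 bytes: usually all 8, although the NaN dumps under Special IEEE Values show only 2 payload bytes with type 09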
Example:
$double(0.1)
→ 0A 09 9A 99 99 99 99 99 B9 3F
Special IEEE Values
$LISTBUILD also preserves special IEEE floating-point values such as NaN and Infinity.
Example:
zzdump $lb($double("-NAN")),$lb($double("NAN")),$lb($double("INF")),$lb($double("-INF"))
0000: 04 09 F8 FF
0000: 04 09 F8 7F
0000: 04 08 80 7F
0000: 04 08 80 FF
Observations
| Value | Encoding |
|---|---|
| $double("-NAN") | 04 09 F8 FF |
| $double("NAN") | 04 09 F8 7F |
| $double("INF") | 04 08 80 7F |
| $double("-INF") | 04 08 80 FF |
These encodings match the expected IEEE bit patterns:
- NaN → exponent all ones, non-zero mantissa
- Infinity → exponent all ones, zero mantissa
- sign bit preserved in the high byte
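The bit patterns can be checked mechanically. The Python sketch below pads a payload back to its full width and then inspects the sign, exponent and mantissa fields. It assumes that both type 08 and type 09 omit low-order zero bytes, as the dumps above suggest; `classify_ieee` is a hypothetical name.

```python
def classify_ieee(type_code: int, payload: bytes) -> str:
    """Classify a type 08/09 payload as NaN, Infinity or finite (sketch).

    Assumption: both types omit low-order zero bytes, as the dumps above suggest.
    """
    layouts = {0x08: (4, 8, 23), 0x09: (8, 11, 52)}  # bytes, exponent bits, mantissa bits
    width, exp_bits, man_bits = layouts[type_code]
    bits = int.from_bytes(payload.rjust(width, b"\x00"), "little")
    sign = "-" if bits >> (exp_bits + man_bits) else "+"
    exponent = (bits >> man_bits) & ((1 << exp_bits) - 1)
    mantissa = bits & ((1 << man_bits) - 1)
    if exponent != (1 << exp_bits) - 1:
        return sign + "finite"
    return sign + ("Infinity" if mantissa == 0 else "NaN")


print(classify_ieee(0x09, bytes.fromhex("F8 FF")))  # -NaN
print(classify_ieee(0x09, bytes.fromhex("F8 7F")))  # +NaN
print(classify_ieee(0x08, bytes.fromhex("80 7F")))  # +Infinity
print(classify_ieee(0x08, bytes.fromhex("80 FF")))  # -Infinity
```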
Type Selection Behavior
Unlike regular numeric values, special IEEE values do not strictly follow the usual 08 vs 09 distinction:
- NaN is stored using type 09
- Infinity is stored using type 08
This shows that IRIS does not enforce a fixed storage width for floating-point values. Instead, it appears to choose the most compact representation that preserves the value.
Practical Implication
The distinction between types 08 and 09 is not purely “float32 vs float64”. It is influenced by whether the value can be represented in a smaller IEEE form without loss.
For finite values, this usually means:
- 08 → compact IEEE single-precision
- 09 → full IEEE double-precision
For special values, IRIS may use either type depending on which representation is shorter.
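One way to model this selection is to try a float32 round trip and fall back to float64 when the value changes. The Python sketch below does that and then drops low-order zero bytes. It is a guess at the rule that happens to reproduce the dumps in this article, not the IRIS implementation, and `encode_double` is a hypothetical name.

```python
import struct


def encode_double(value: float) -> bytes:
    """Pick type 08 or 09 for a Python float and drop low-order zero bytes.

    Sketch of the selection rule described above, not the IRIS implementation.
    """
    try:
        single = struct.pack("<f", value)
        fits_single = struct.unpack("<f", single)[0] == value  # always False for NaN
    except OverflowError:                                      # too large for float32
        fits_single = False
    if fits_single:
        type_code, raw = 0x08, single
    else:
        type_code, raw = 0x09, struct.pack("<d", value)
    payload = raw.lstrip(b"\x00")  # low-order bytes come first in little-endian
    return bytes([len(payload) + 2, type_code]) + payload


print(encode_double(1.5).hex(" "))           # 04 08 c0 3f
print(encode_double(0.1).hex(" "))           # 0a 09 9a 99 99 99 99 99 b9 3f
print(encode_double(float("inf")).hex(" "))  # 04 08 80 7f
```

Under this model NaN lands on type 09 simply because a NaN never compares equal to itself, so the float32 round-trip test fails. Whether IRIS reaches the same result for the same reason is not known.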
Summary
$LISTBUILD is not just a helper function — it is a core binary storage format inside IRIS.
It combines:
- variable-length element encoding
- compact integer representation
- decimal values as scaled integers
- binary floating-point in two forms:
  - compact float32 (08)
  - full float64 (09)
The format is:
- space-efficient
- self-delimiting
- internally consistent across types
While not formally documented, its structure is stable and precise enough to be reverse-engineered and implemented outside IRIS.
Comments
THANK YOU @Dmitry Maslennikov for this excellent insight into the probably
most important internal structure element of IRIS and its data type variations.
👍👏
Just two comments:
"Observations: ... payload grows only when required" is correct but a more correct explanation would be "the whole list structure is created with a minimum memory usage in mind".
The above note (minimum storage size) leads to special cases:
1) an integer 0 is stored in two bytes only
02 04 (length, type) and not
03 04 00 (length, type, data) which is also accepted(*)
The same goes for nullstring (which is obvious)
02 01 (length, type and, of course, no data), ASCII nullstring
02 02 (length, type) a "Unicode" nullstring,
2) If only the length component is present and equals 1,
then this indicates a NULL element (i.e. a missing element):
set x=$lb(85,,,0,"","abc")
zzdump x --> 03 04 55 01 01 02 04 02 01 05 01 61 62 63
which breaks down into
03 04 55 $li(x,1) = 85
01 $li(x,2) = <NULL VALUE>
01 $li(x,3) = <NULL VALUE>
02 04 $li(x,4) = 0
02 01 $li(x,5) = "" / nullstring
05 01 61 62 63 $li(x,6) = "abc"
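The breakdown above can be reproduced outside IRIS with a small Python sketch. It is an illustration only: `decode_list` is a hypothetical name, it handles just the 1-byte length form, NULL elements and types 01, 02 and 04, and it models type 01 as Latin-1.

```python
def decode_list(data: bytes) -> list:
    """Decode a short-form $LIST byte string into Python values (sketch).

    Handles NULL elements (a lone 01 length byte) and types 01, 02 and 04 only;
    NULL elements are returned as None.
    """
    values = []
    pos = 0
    while pos < len(data):
        length = data[pos]
        if length == 0:
            raise ValueError("extended length form is not handled in this sketch")
        if pos + length > len(data):
            raise ValueError(f"truncated element at offset {pos}")
        if length == 1:  # length byte only: missing element
            values.append(None)
        else:
            type_code = data[pos + 1]
            payload = data[pos + 2 : pos + length]
            if type_code == 0x01:
                values.append(payload.decode("latin-1"))
            elif type_code == 0x02:
                values.append(payload.decode("utf-16-le"))
            elif type_code == 0x04:
                values.append(int.from_bytes(payload, "little"))
            else:
                raise ValueError(f"type {type_code:#04x} is not handled in this sketch")
        pos += length
    return values


x = bytes.fromhex("03 04 55 01 01 02 04 02 01 05 01 61 62 63")
print(decode_list(x))  # [85, None, None, 0, '', 'abc']
```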
(*) I use this side effect (lists use minimal bytes for integers, and ASCII strings are accepted even if their type is Unicode) in some of my CallOuts to return results (as an IRIS list) without explicitly converting C's two-byte strings into one-byte ASCII where it applies, and I return integer values as either four or eight bytes even if the value would fit in two, three or five bytes.
For example, the ReadColumn() method of the excel library class could return something like
set colData = %exl.ReadColumn(3)
zzdump colData
08 02 61 00 62 00 63 00 06 04 55 00 00 00 (ASCII "as" Unicode, 1 byte integer in 4 bytes)
zwrite colData
colData=$lb("abc",85)
Thank you guys at ISC for this wise implementation!
A $LIST string containing 02 02 is NOT a Unicode nullstring. That byte string is never generated by $LISTBUILD. On a Unicode instance 02 01 is the nullstring. If a string has only byte characters then type code 02 cannot be used. If it were used then an 8-bit instance cannot access that byte string and $LISTSAME may give incorrect results.
I know that the $lb() representation of a null string is $c(2,1) on all instances (8-bit or Unicode), but passing the argument $c(2,2) to $list() is still correctly interpreted (respectively extracted) as a null string by the function.
set result = $c(3,4,125,2,2,5,1,97,98,99) // return value from a C-program
set mylist = $lb(125,"","abc")
zwrite result --> result=$lb(125,"","abc") // correct
write $listsame(result,mylist) --> 0 // because the type bytes are different
but after executing the following
if $list(result,2) = $list(mylist,2) { write "YES" } else { write "NO" }
if $list(result,2) = "" { write "YES" } else { write "NO" }
you will see two YESes, which proves my answer from 18 April.
Detecting what is a valid $LIST string can be difficult and you cannot trust $LISTVALID to always reject illegal $LIST strings, although it will not reject a legal $LIST string. I.e., $LISTVALID gives false positives but does not give false negatives.
I admit that $LISTVALID accepts $C(2,2) and that is probably a bug. However, $LISTVALID will not accept $c(3,2,32), a Unicode $LIST string containing one blank character, while $LISTVALID will accept $C(3,1,32), an 8-bit $LIST string containing one blank character.
$LISTVALID is the only $LISTxxx function that spends extra execution cycles to find harmless, invalid $LIST strings. The other $LISTxxx functions do check string lengths and byte code ranges to give a <LIST> or <FUNCTION> error signal instead of touching out-of-bounds memory but these other $LISTxxx functions spend no further execution time doing extra testing.
$LISTVALID(result) will accept your 'result' string from above. However, if we change your 'result' variable to make the third $LIST entry encode three 8-bit characters using type code 0x02, then both $LISTVALID and the ZWRITE command will reject your 'result' string because it contains a Unicode $LIST element that only contains 8-bit characters.
USER>set result = $c(3,4,125,2,2,5, 2,97,98,99)
USER>zw result
result=$c(3,4)_"}"_$c(2,2,5,2)_"abc"
USER>zw $LISTVALID(result)
0
Also, $LIST(result,3) will generate a <FUNCTION> signal as there seems to be some worry about widening a Unicode string that does not need widening. But $LIST(result,4) will signal <NULL VALUE> because $LIST will skip over the properly sized 3rd $LIST element.
Setting x="" is a perfectly valid $LIST string with zero elements and you can repetitively execute
SET x = x _ $LB(.....)
to grow the size of $LIST string 'x'.
If you execute SET y=$LB(), then 'y' is a $LIST string with one element that will signal <NULL VALUE> if you evaluate $LIST(y,1) (or equivalently $LIST(y)).
One final example:
USER>set y=$lb(,,"Z")
USER>zzdump y
0000: 01 01 03 01 5A ....Z
USER>zwrite y
y=$lb(,,"Z")
Summary: It requires care to call a $LISTxxx function with a string argument not generated by $LISTBUILD (or by the concatenation of either empty strings or strings generated by $LISTBUILD or by SET $LIST(variable,i)=value). Carelessly constructed $LIST strings can work in some places but give strange results in other places.
Very good article @Dmitry Maslennikov
The $LIST framework was created alongside classes in Caché as a replacement for the older $PIECE function. With $PIECE, you must predefine a delimiter that cannot occur in the data, while $LIST avoids this constraint entirely.
The behavior of $LISTBUILD on big-endian instances and little-endian instances is identical. $LISTBUILD always builds a string of 8-bit bytes with the payload always in little-endian order. This means $LIST data can be shared between big- and little-endian instances. A $LIST string only uses characters between $C(0) and $C(255) since byte strings can be easily transferred over 8-bit byte network connections and can be efficiently stored in an InterSystems Globals database. On an IRIS Unicode instance a $LIST string in memory wastes half the bytes since in-memory strings are always encoded using UTF-16 representation. However, InterSystems IRIS will convert such data to 8-bit bytes when storing it in the database.
There are more type codes than those described in the above article.
Thanks for the notes. I have never seen a big-endian system and did not have a chance to test it there, so I just assumed the behavior. I’ll update the article accordingly.
As for other types, I’m aware of a few more, which I thought were mostly used in the network protocol, and I did not want to cover them yet.
Found types 12 and 13, which look to be some kind of ASCII, same as type 1
USER>s l = $c(3,13,65) zw l w "$lv = ", $lv(l),!,"$li = ",$li(l)
l=$lb("A")
$lv = 1
$li = A
USER>s l = $c(3,12,65) zw l w "$lv = ", $lv(l),!,"$li = ",$li(l)
l=$lb("A")
$lv = 1
$li = A
The other types really seem to be handled by the network protocol only.
Look at the $system.Process.ListFormat(newformat) method.
It allows you to turn off/on IEEE $DOUBLE compression in $LISTBUILD using $LIST code 0x09. For many years this was defaulted off but it is now turned on by default. You may have to turn it off if you are sending $LIST data strings to old client software that does not support $DOUBLE compression.
$system.Process.ListFormat(newformat) also allows you to turn on compressed Unicode string formats in $LISTBUILD using $LIST codes 0x0D, 0x0E and 0x0F. This is turned off by default because some customers are using old client software that does not support Unicode compression. On a Unicode instance this can reduce the number of bytes in a Unicode string that uses mostly one of the ASCII, Latin-1, Greek or Cyrillic character sets with a small number of other Unicode characters mixed in. These codes are used as a replacement for the 0x02 code when the resulting $LIST string will be shortened.
Every IRIS and every Caché server version since 2016.2 will accept $LIST strings containing these $LIST compression codes even when $LISTBUILD has been told not to generate these codes.
$LIST codes 0x0A, 0x0B and 0x0C are reserved for the future and have not been implemented. $LIST code 0x10 is used by the ObjectScript $VECTOR types. $LIST byte codes larger than 0x18 are reserved for communicating special cases during internal transfers using $LIST data strings. These special communication byte codes are never generated by $LISTBUILD.
Some time ago I noticed (rather by chance) the presence of some other (new) types, but I did not look more closely into the what and why of those codes. I'm just happy with the "old ones", which I use as mentioned above.
Thank you for the explanation.