Written by

IRIS Developer Advocate, Software developer at CaretDev, Tabcorp

Article Dmitry Maslennikov · Apr 18 5m read

Inside $LISTBUILD

Most IRIS developers use $LISTBUILD every day — often without even noticing it.

It is not just a convenient function for building lists. It is also the default internal format used to store row data, global values, and many intermediate structures inside the database engine.

Despite this, the actual binary representation of $LISTBUILD values is rarely discussed. Most developers rely on its behavior, but never look at how the data is really stored.

This article focuses strictly on the binary layout of $LISTBUILD values, based on direct inspection via zzdump.

Element Structure

A $LISTBUILD value is a sequence of elements. Each element is encoded as:

[length][type][payload]

For small elements:

length is 1 byte and includes the entire element
type is 1 byte
payload is variable

Example:

$lb("hello")
→ 07 01 68 65 6C 6C 6F
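
As a minimal sketch of reading this layout outside IRIS, the Python snippet below takes one small element apart. The function name `split_small_element` is hypothetical, and the code only covers the 1-byte length form described above; length bytes of 0 or 1 are rejected rather than interpreted.

```python
def split_small_element(data: bytes, offset: int = 0):
    """Split one small element into (type_code, payload, next_offset).

    Assumes the 1-byte length form, where the length byte counts the
    whole element: itself, the type byte, and the payload.
    """
    length = data[offset]
    if length < 2:
        raise ValueError("this sketch only handles length bytes of 2 or more")
    if offset + length > len(data):
        raise ValueError("element runs past the end of the buffer")
    type_code = data[offset + 1]
    payload = data[offset + 2 : offset + length]
    return type_code, payload, offset + length


element = bytes.fromhex("07 01 68 65 6C 6C 6F")  # $lb("hello") from the dump above
print(split_small_element(element))  # (1, b'hello', 7)
```

Returning the next offset keeps the helper usable in a loop over a multi-element list.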

Endianness

Updated: thanks to @Steven Hobbs

All examples in this article use little-endian byte order — and this is not incidental.

Despite IRIS supporting both little-endian and big-endian platforms internally, $LISTBUILD uses a consistent little-endian encoding for all binary payloads.

This applies to:

integer values
decimal mantissas
floating-point values (08, 09)
UTF-16 strings (UTF-16LE)

For example:

integers are stored least-significant byte first
IEEE floating-point values follow little-endian layout
Unicode strings use UTF-16LE encoding

Extended Length Encoding

When an element is too large for the 1-byte length field, IRIS switches to extended forms:

00 + 2-byte length
00 00 + 4-byte length

Example:

00 01 01 01 ...

Length encoding is therefore variable-width and marker-based.

String Types

Type	Meaning
`01`	ASCII string
`02`	Unicode string

Examples:

$lb("")       → 02 01
$lb("hello")  → 07 01 68 65 6C 6C 6F
$lb("привет") → 0E 02 ...

Unicode strings are stored as UTF-16.

Surrogate Pairs

Characters outside the Basic Multilingual Plane use UTF-16 surrogate pairs:

$lb("🔟")
→ 06 02 3D D8 1F DD

D83D DD1F → U+1F51F

Unicode values are stored as raw UTF-16LE on every platform (see the Endianness section), uncompressed by default.
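
The two string types can be decoded outside IRIS with standard codecs. The sketch below is a rough illustration, not IRIS code: `decode_string` is a hypothetical name, and mapping type 01 bytes through Latin-1 (one byte per character) is an assumption that goes slightly beyond the "ASCII" label in the table.

```python
def decode_string(type_code: int, payload: bytes) -> str:
    """Decode a string payload: type 01 is 8-bit, type 02 is UTF-16LE."""
    if type_code == 0x01:
        # Assumption: one byte per character, mapped 1:1 to code points 0-255.
        return payload.decode("latin-1")
    if type_code == 0x02:
        # Surrogate pairs are combined by the codec itself.
        return payload.decode("utf-16-le")
    raise ValueError(f"not a string type: {type_code:#04x}")


print(decode_string(0x01, bytes.fromhex("68 65 6C 6C 6F")))  # hello
keycap = decode_string(0x02, bytes.fromhex("3D D8 1F DD"))
print(len(keycap), hex(ord(keycap)))  # 1 0x1f51f
```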

Integer Encoding

Type	Meaning
`04`	non-negative integer
`05`	negative integer

Encoding is variable-length.

Special cases:

0  → 02 04
-1 → 02 05

Examples:

1     → 03 04 01
255   → 03 04 FF
256   → 04 04 00 01

-2    → 03 05 FE
-256  → 03 05 00
-257  → 04 05 FF FE

Observations:

positive integers use unsigned binary representation
negative integers use two’s complement with variable width
payload grows only when required
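
The integer rules above can be sketched in a few lines of Python. `decode_integer` is a hypothetical helper; it treats a type 05 payload as a two's complement value whose omitted high bytes are all 0xFF, which is the reading that reproduces every dump in this section, including the empty payloads for 0 and -1.

```python
def decode_integer(type_code: int, payload: bytes) -> int:
    """Decode a type 04 / 05 payload (little-endian, variable width)."""
    unsigned = int.from_bytes(payload, "little")  # empty payload gives 0
    if type_code == 0x04:
        return unsigned
    if type_code == 0x05:
        # Two's complement at the payload's own width; with no payload
        # at all this yields 0 - 1 = -1.
        return unsigned - (1 << (8 * len(payload)))
    raise ValueError(f"not an integer type: {type_code:#04x}")


for type_code, hex_payload in [(4, ""), (5, ""), (4, "00 01"), (5, "FE"), (5, "00"), (5, "FF FE")]:
    print(decode_integer(type_code, bytes.fromhex(hex_payload)))
# prints 0, -1, 256, -2, -256, -257 (one per line)
```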

Decimal Numbers

Type	Meaning
`06`	positive decimal
`07`	negative decimal

Structure:

[length][type][scale][mantissa...]

scale: 1 signed byte holding the decimal exponent (FF = -1, FB = -5)
mantissa: variable-length little-endian integer

Example:

0.1     → 04 06 FF 01
0.01    → 04 06 FE 01
0.00002 → 04 06 FB 02

Large value:

2^32 + 0.1
→ 08 06 FF 01 00 00 00 0A

Interpretation:

value = mantissa × 10^scale

Negative values mirror the same structure:

-0.00002 → 04 07 FB FE

Decimal values are stored as scaled integers, preserving exact decimal semantics.
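
A hedged Python sketch of this decoding follows. `decode_decimal` is a hypothetical name, and two points are inferred from the dumps rather than stated anywhere: the scale is read as a signed byte, and the mantissa follows the same rule as integers (unsigned for type 06, two's complement for type 07).

```python
from decimal import Decimal


def decode_decimal(type_code: int, payload: bytes) -> Decimal:
    """Decode a type 06 / 07 payload laid out as [scale][mantissa...]."""
    if type_code not in (0x06, 0x07) or len(payload) < 1:
        raise ValueError("expected a decimal element with a scale byte")
    # Assumption: the scale byte is signed, so FF means -1.
    scale = int.from_bytes(payload[:1], "little", signed=True)
    raw = payload[1:]
    mantissa = int.from_bytes(raw, "little")
    if type_code == 0x07:
        # Assumption: negative mantissas use variable-width two's complement.
        mantissa -= 1 << (8 * len(raw))
    return Decimal(mantissa).scaleb(scale)  # mantissa × 10^scale, exact


print(decode_decimal(0x06, bytes.fromhex("FF 01 00 00 00 0A")))  # 4294967296.1
print(decode_decimal(0x07, bytes.fromhex("FB FE")))              # -0.00002
```

Using `Decimal` rather than `float` keeps the exact decimal semantics the format is designed to preserve.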

Binary Floating-Point

IRIS supports two binary floating-point encodings.

Compact Float (Type 08)

[length][08][payload...]

IEEE 754 single-precision (float32)
always little-endian, as described in the Endianness section
low-order zero bytes (the trailing zeros of the mantissa) are omitted

Examples:

1.5  → 04 08 C0 3F
1.25 → 04 08 A0 3F
0.5  → 03 08 3F
10.0 → 04 08 20 41

IEEE Double (Type 09)

[length][09][8 bytes]

IEEE 754 double-precision (float64)
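8-byte payload for typical values; the NaN dumps under Special IEEE Values show type 09 with a 2-byte payload, so low-order zero bytes appear to be omitted here as well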

Example:

$double(0.1)
→ 0A 09 9A 99 99 99 99 99 B9 3F
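
Both encodings can be read back with Python's `struct` module. This is a sketch under assumptions: `decode_float` is a hypothetical name, and padding a short type 09 payload with zeros is inferred from the NaN dumps in the Special IEEE Values section rather than from any documentation.

```python
import struct


def decode_float(type_code: int, payload: bytes) -> float:
    """Decode a type 08 (float32) or type 09 (float64) payload."""
    if type_code == 0x08:
        width, fmt = 4, "<f"
    elif type_code == 0x09:
        width, fmt = 8, "<d"
    else:
        raise ValueError(f"not a binary float type: {type_code:#04x}")
    if len(payload) > width:
        raise ValueError("payload is longer than the IEEE width")
    # The omitted bytes are the low-order ones, which come first in
    # little-endian order, so the zeros are restored on the left.
    padded = payload.rjust(width, b"\x00")
    return struct.unpack(fmt, padded)[0]


print(decode_float(0x08, bytes.fromhex("C0 3F")))                    # 1.5
print(decode_float(0x08, bytes.fromhex("3F")))                       # 0.5
print(decode_float(0x09, bytes.fromhex("9A 99 99 99 99 99 B9 3F")))  # 0.1
```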

Special IEEE Values

$LISTBUILD also preserves special IEEE floating-point values such as NaN and Infinity.

Example:

zzdump $lb($double("-NAN")),$lb($double("NAN")),$lb($double("INF")),$lb($double("-INF"))

0000: 04 09 F8 FF
0000: 04 09 F8 7F
0000: 04 08 80 7F
0000: 04 08 80 FF

Observations

Value	Encoding
`$double("-NAN")`	`04 09 F8 FF`
`$double("NAN")`	`04 09 F8 7F`
`$double("INF")`	`04 08 80 7F`
`$double("-INF")`	`04 08 80 FF`

These encodings match the expected IEEE bit patterns:

NaN → exponent all ones, non-zero mantissa
Infinity → exponent all ones, zero mantissa
sign bit preserved in the high byte
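
These bit-pattern rules can be checked directly against the dumps. The Python sketch below (with the hypothetical name `classify_ieee`) restores the omitted low-order zero bytes, then splits the value into sign, exponent, and mantissa fields using the standard IEEE 754 widths.

```python
def classify_ieee(type_code: int, payload: bytes) -> str:
    """Return 'NAN', '-NAN', 'INF', '-INF' or 'finite' for a 08/09 payload."""
    if type_code == 0x08:
        total_bits, exp_bits = 32, 8
    elif type_code == 0x09:
        total_bits, exp_bits = 64, 11
    else:
        raise ValueError(f"not a binary float type: {type_code:#04x}")
    # Restore the omitted low-order bytes (first in little-endian order).
    padded = payload.rjust(total_bits // 8, b"\x00")
    bits = int.from_bytes(padded, "little")
    mant_bits = total_bits - 1 - exp_bits
    sign = bits >> (total_bits - 1)
    exponent = (bits >> mant_bits) & ((1 << exp_bits) - 1)
    mantissa = bits & ((1 << mant_bits) - 1)
    if exponent != (1 << exp_bits) - 1:
        return "finite"
    name = "INF" if mantissa == 0 else "NAN"
    return ("-" if sign else "") + name


for type_code, hex_payload in [(9, "F8 FF"), (9, "F8 7F"), (8, "80 7F"), (8, "80 FF")]:
    print(classify_ieee(type_code, bytes.fromhex(hex_payload)))
# prints -NAN, NAN, INF, -INF (one per line)
```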

Type Selection Behavior

Unlike regular numeric values, special IEEE values do not strictly follow the usual 08 vs 09 distinction:

NaN is stored using type 09
Infinity is stored using type 08

This shows that IRIS does not enforce a fixed storage width for floating-point values. Instead, it appears to choose the most compact representation that preserves the value.

Practical Implication

The distinction between types 08 and 09 is not purely “float32 vs float64”. It is influenced by whether the value can be represented in a smaller IEEE form without loss.

For finite values, this usually means:

08 → compact IEEE single-precision
09 → full IEEE double-precision

For special values, IRIS may use either type depending on which representation is shorter.
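
One selection rule that reproduces every dump in this article is sketched below in Python. It is a guess, not IRIS's actual algorithm: use type 08 when the value survives a round trip through float32, otherwise type 09, and drop low-order zero bytes in both cases. The name `encode_double` is hypothetical, and the sketch assumes an instance with $DOUBLE compression enabled.

```python
import struct


def encode_double(value: float) -> bytes:
    """Encode a Python float as a single type 08 or 09 element."""
    try:
        as_single = struct.pack("<f", value)
        # NaN never compares equal, so it falls through to type 09.
        fits_single = struct.unpack("<f", as_single)[0] == value
    except OverflowError:
        # Finite values beyond the float32 range cannot be packed as "<f".
        fits_single = False
    if fits_single:
        type_code, raw = 0x08, as_single
    else:
        type_code, raw = 0x09, struct.pack("<d", value)
    # Low-order bytes come first in little-endian order.
    payload = raw.lstrip(b"\x00")
    return bytes([len(payload) + 2, type_code]) + payload


for value in (1.5, 0.5, 0.1, float("inf"), float("nan")):
    print(encode_double(value).hex(" ").upper())
```

With this rule NaN lands in type 09 simply because the float32 equality check fails for it, which would explain the dumps without any special-casing.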

Summary

$LISTBUILD is not just a helper function — it is a core binary storage format inside IRIS.

It combines:

variable-length element encoding
compact integer representation
decimal values as scaled integers
binary floating-point in two forms:
- compact float32 (08)
- full float64 (09)

The format is:

space-efficient
self-delimiting
internally consistent across types

While not formally documented, its structure is stable and precise enough to be reverse-engineered and implemented outside IRIS.

Comments

Robert Cemper · Apr 18

THANK YOU @Dmitry Maslennikov for this excellent insight into the probably
most important internal structure element of IRIS and its data type variations.

👍👏

Julius Kavay · Apr 18

Just two comments:
"Observations: ... payload grows only when required" is correct, but a more precise explanation would be "the whole list structure is created with minimum memory usage in mind".

The above note (minimum storage size) leads to special cases:

1) an integer 0 is stored in two bytes only
   02 04     (length, type) and not
   03 04 00  (length, type, data) which is also accepted(*)
   
   The same goes for nullstring (which is obvious)
   02 01     (length, type and, of course, no data), ASCII nullstring
   02 02     (length, type) a "Unicode" nullstring
   
2) If only the length component is present and equals 1,
   then this indicates a NULL element (i.e. a missing element):

set x=$lb(85,,,0,"","abc")
zzdump x --> 03 04 55 01 01 02 04 02 01 05 01 61 62 63

which breaks down into
03 04 55         $li(x,1) = 85
01               $li(x,2) = <NULL VALUE>
01               $li(x,3) = <NULL VALUE>
02 04            $li(x,4) = 0
02 01            $li(x,5) = "" / nullstring
05 01 61 62 63   $li(x,6) = "abc"
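
The same breakdown can be reproduced outside IRIS. Below is a minimal Python sketch (the name `walk_list` is hypothetical) that walks the bytes and yields `None` for each NULL element; it only understands types 01 and 04 and the 1-byte length form.

```python
def walk_list(data: bytes):
    """Yield decoded elements; None stands for a NULL (missing) element."""
    offset = 0
    while offset < len(data):
        length = data[offset]
        if length == 0:
            raise NotImplementedError("extended length forms are not handled")
        if offset + length > len(data):
            raise ValueError("element runs past the end of the buffer")
        if length == 1:
            yield None  # only the length byte is present
        else:
            type_code = data[offset + 1]
            payload = data[offset + 2 : offset + length]
            if type_code == 0x01:
                yield payload.decode("latin-1")
            elif type_code == 0x04:
                yield int.from_bytes(payload, "little")
            else:
                raise NotImplementedError(f"type {type_code:#04x} not handled")
        offset += length


dump = bytes.fromhex("03 04 55 01 01 02 04 02 01 05 01 61 62 63")
print(list(walk_list(dump)))  # [85, None, None, 0, '', 'abc']
```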

(*) I use this side effect (lists normally use the minimum number of bytes for integers, but wider encodings are accepted, and ASCII strings are accepted even if their type is Unicode) in some of my CallOuts to return results (as an IRIS list) without explicitly converting C's two-byte strings into one-byte ASCII where that applies, and I return integer values as either four or eight bytes even if the value would fit in two, three or five bytes.

For example, the ReadColumn() method of the excel library class could return something like

set colData = %exl.ReadColumn(3)

zzdump colData
08 02 61 00 62 00 63 00 06 04 55 00 00 00   (ASCII "as" Unicode, 1 byte integer in 4 bytes)

zwrite colData
colData=$lb("abc",85)

Thank you guys at ISC for this wise implementation!

Steven Hobbs Apr 20 to Julius Kavay

A $LIST string containing 02 02 is NOT a Unicode nullstring. That byte string is never generated by $LISTBUILD. On a Unicode instance 02 01 is the nullstring. If a string has only byte characters then type code 02 cannot be used. If it were used then an 8-bit instance cannot access that byte string and $LISTSAME may give incorrect results.

Julius Kavay Apr 20 to Steven Hobbs

I know that the $lb() representation of a null string is $c(2,1) on all instances (8-bit or Unicode), but passing the argument $c(2,2) to $list() is still correctly interpreted (respectively extracted) as a null string by the function.

set result = $c(3,4,125,2,2,5,1,97,98,99)    // return value from a C-program
set mylist = $lb(125,"","abc")

zwrite result  --> result=$lb(125,"","abc")  // correct 
write $listsame(result,mylist) --> 0         // because the type bytes are different

but after executing the following

if $list(result,2) = $list(mylist,2) { write "YES" } else { write "NO" }
if $list(result,2) = "" { write "YES" } else { write "NO" }

you will see two YESes, which proves my answer from Apr 18.

Steven Hobbs Apr 20 to Julius Kavay

Detecting what is a valid $LIST string can be difficult and you cannot trust $LISTVALID to always reject illegal $LIST strings, although it will not reject a legal $LIST string. I.e., $LISTVALID gives false positives but does not give false negatives.

I admit that $LISTVALID accepts $C(2,2) and that is probably a bug. However, $LISTVALID will not accept $c(3,2,32), a Unicode $LIST string containing one blank character, while $LISTVALID will accept $C(3,1,32), an 8-bit $LIST string containing one blank character.

$LISTVALID is the only $LISTxxx function that spends extra execution cycles to find harmless, invalid $LIST strings. The other $LISTxxx functions do check string lengths and byte code ranges to give a <LIST> or <FUNCTION> error signal instead of touching out-of-bounds memory but these other $LISTxxx functions spend no further execution time doing extra testing.

$LISTVALID(result) will accept your 'result' string from above. However, if we change your 'result' variable so that the third $LIST entry encodes three 8-bit characters using type code 0x02, then both $LISTVALID and the ZWRITE command will reject your 'result' string because it contains a Unicode $LIST element that only contains 8-bit characters.

USER>set result = $c(3,4,125,2,2,5, 2,97,98,99)

USER>zw result
result=$c(3,4)_"}"_$c(2,2,5,2)_"abc"

USER>zw $LISTVALID(result)
0

Also, $LIST(result,3) will generate a <FUNCTION> signal as there seems to be some worry about widening a Unicode string that does not need widening. But $LIST(result,4) will signal <NULL VALUE> because $LIST will skip over the properly sized 3rd $LIST element.

Setting x="" gives a perfectly valid $LIST string with zero elements, and you can repeatedly execute
SET x = x _ $LB(.....)
to grow the size of $LIST string 'x'.

If you execute SET y=$LB(), then 'y' is a $LIST string with one element that will signal <NULL VALUE> if you evaluate $LIST(y,1) (or equivalently $LIST(y)).

One final example:
USER>set y=$lb(,,"Z")

USER>zzdump y

0000: 01 01 03 01 5A ....Z
USER>zwrite y
y=$lb(,,"Z")

Summary: It requires care to call a $LISTxxx function with a string argument not generated by $LISTBUILD (or by the concatenation of either empty strings or strings generated by $LISTBUILD or by SET $LIST(variable,i)=value). Carelessly constructed $LIST strings can work in some places but give strange results in other places.

Yaron Munz · Apr 20

Very good article @Dmitry Maslennikov

The $LIST framework was created alongside classes in Caché as a replacement for the older $PIECE function. With $PIECE, you must predefine a delimiter that cannot occur in the data, while $LIST avoids this constraint entirely.

Steven Hobbs · Apr 20

The behavior of $LISTBUILD on big-endian instances and little-endian instances is identical. $LISTBUILD always builds a string of 8-bit bytes with the payload always in little-endian order. This means $LIST data can be shared between big- and little-endian instances. A $LIST string only uses characters between $C(0) and $C(255) since byte strings can be easily transferred over 8-bit byte network connections and can be efficiently stored in an InterSystems Globals database. On an IRIS Unicode instance a $LIST string in memory wastes half the bytes since in-memory strings are always encoded using UTF-16 representation. However, InterSystems IRIS will convert such data to 8-bit bytes when storing it in the database.

There are more type codes than those described in the above article.

Dmitry Maslennikov Apr 20 to Steven Hobbs

Thanks for the notes. I have never seen a big-endian system and did not have a chance to test it there, so I just assumed the behavior. I'll update the article accordingly.

As for other types, I'm aware of a few more, which I thought were mostly used in the network protocol, and I did not want to cover them yet.

Dmitry Maslennikov Apr 20 to Steven Hobbs

Found types 12 and 13, which look like some kind of ASCII string, the same as type 1:

USER>s l = $c(3,13,65) zw l w "$lv = ", $lv(l),!,"$li = ",$li(l) 
l=$lb("A")
$lv = 1
$li = A
USER>s l = $c(3,12,65) zw l w "$lv = ", $lv(l),!,"$li = ",$li(l) 
l=$lb("A")
$lv = 1
$li = A

The other types really do seem to be handled by the network protocol only.

Steven Hobbs Apr 20 to Dmitry Maslennikov

Look at the $system.Process.ListFormat(newformat) method.

It allows you to turn off/on IEEE $DOUBLE compression in $LISTBUILD using $LIST code 0x09. For many years this was defaulted off but it is now turned on by default. You may have to turn it off if you are sending $LIST data strings to old client software that does not support $DOUBLE compression.

$system.Process.ListFormat(newformat) also allows you to turn on compressed Unicode string formats in $LISTBUILD using $LIST codes 0x0D, 0x0E and 0x0F. This is turned off by default because some customers are using old client software that does not support Unicode compression. On a Unicode instance this can reduce the number of bytes in a Unicode string that uses mostly one of the ASCII, Latin-1, Greek or Cyrillic character sets with a small number of other Unicode characters mixed in. These codes are used as a replacement for the 0x02 code when the resulting $LIST string will be shortened.

Every IRIS and every Caché server version since 2016.2 will accept $LIST strings containing these $LIST compression codes even when $LISTBUILD has been told not to generate these codes.

$LIST codes 0x0A, 0x0B and 0x0C are reserved for the future and have not been implemented. $LIST code 0x10 is used for the ObjectScript $VECTOR types. $LIST byte codes larger than 0x18 are reserved for communicating special cases during internal transfers using $LIST data strings. These special communication byte codes are never generated by $LISTBUILD.

Julius Kavay Apr 20 to Steven Hobbs

Some time ago I noticed (rather by chance) the presence of some other (new) types, but I did not look more closely into the what and why of those codes. I'm just happy with the "old ones", which I use as mentioned above.
Thank you for the explanation.
