Inside $LISTBUILD
Most IRIS developers use $LISTBUILD every day — often without even noticing it.
It is not just a convenient function for building lists. It is also the default internal format used to store row data, global values, and many intermediate structures inside the database engine.
Despite this, the actual binary representation of $LISTBUILD values is rarely discussed. Most developers rely on its behavior, but never look at how the data is really stored.
This article focuses strictly on the binary layout of $LISTBUILD values, based on direct inspection via zzdump.
Element Structure
A $LISTBUILD value is a sequence of elements. Each element is encoded as:
[length][type][payload]
For small elements:
- length is 1 byte and counts the entire element (including itself)
- type is 1 byte
- payload is variable
Example:
$lb("hello")
→ 07 01 68 65 6C 6C 6F
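Because the length byte counts the whole element, a reader can step from one element to the next without knowing any type codes. The following is a minimal Python sketch of that walk. The function name `split_short_elements` is hypothetical, not an IRIS API. The sketch assumes every element uses the 1-byte length form and refuses the extended forms described later. A length byte of 1 is treated as a missing element, following one of the comments on this article.

```python
def split_short_elements(data: bytes):
    """Split a $LISTBUILD byte string into (type, payload) pairs.

    Sketch only: handles the 1-byte length form, where the length byte
    counts the whole element (length byte + type byte + payload).
    A length of 1 (no type, no payload) is returned as (None, b"").
    """
    elements = []
    pos = 0
    while pos < len(data):
        length = data[pos]
        if length == 0:
            # 00 marks an extended length; not handled in this sketch
            raise NotImplementedError("extended length form not handled")
        if pos + length > len(data):
            raise ValueError("element runs past the end of the buffer")
        if length == 1:
            elements.append((None, b""))  # missing (undefined) element
        else:
            elements.append((data[pos + 1], data[pos + 2 : pos + length]))
        pos += length
    return elements


print(split_short_elements(bytes.fromhex("07 01 68 65 6C 6C 6F")))
```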
Endianness
The examples below assume a little-endian platform, which is the most common deployment for IRIS.
IRIS supports both little-endian and big-endian architectures, and the byte order of stored values follows the underlying platform.
This affects:
- integer payloads
- decimal mantissas
- floating-point values (08, 09)
- UTF-16 encoding

$LISTBUILD is not strictly platform-independent at the byte level. Any external decoder must account for endianness.
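The effect is easy to see on a single payload. In this Python sketch, the helper `read_uint` is hypothetical. It reads the two payload bytes 00 01, which a little-endian host stores for $lb(256), with both byte orders. How a decoder learns the byte order of the platform that produced the data is left open here.

```python
def read_uint(payload: bytes, little_endian: bool = True) -> int:
    """Interpret an unsigned integer payload using the given byte order."""
    return int.from_bytes(payload, "little" if little_endian else "big")


payload = bytes.fromhex("00 01")  # payload of $lb(256) on a little-endian host
print(read_uint(payload))                       # 256
print(read_uint(payload, little_endian=False))  # 1
```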
Extended Length Encoding
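When an element is too long for its length to fit in a single byte (more than 255 bytes in total), IRIS switches to extended forms:
- 00 + 2-byte length
- 00 00 00 + 4-byte length (the 00 marker followed by a zero 2-byte length)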
Example:
00 01 01 01 ...
Length encoding is therefore variable-width and marker-based.
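Reading the extended header can be sketched in Python as follows. The function `read_length_header` is hypothetical, and the sketch rests on two assumptions. First, the length fields are little-endian. Second, the 4-byte form is the marker followed by a zero 2-byte field (00 00 00). A bare 00 00 prefix would collide with a 2-byte length whose low byte is zero, such as 256. The example above does not establish which bytes an extended length counts, so the function returns only the raw number and the header size.

```python
def read_length_header(data: bytes, pos: int = 0):
    """Return (length_value, header_size) for the element starting at pos.

    Assumed layout (little-endian length fields):
      xx                 -> 1-byte length, header_size 1
      00 LL LL           -> 2-byte length, header_size 3
      00 00 00 LL LL LL LL -> 4-byte length, header_size 7
    Only the raw number is returned; what it counts is not interpreted.
    """
    first = data[pos]
    if first != 0:
        return first, 1
    two = int.from_bytes(data[pos + 1 : pos + 3], "little")
    if two != 0:
        return two, 3
    four = int.from_bytes(data[pos + 3 : pos + 7], "little")
    return four, 7


print(read_length_header(bytes.fromhex("00 01 01 01")))  # (257, 3)
```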
String Types
| Type | Meaning |
|---|---|
| 01 | ASCII string |
| 02 | Unicode string |
Examples:
$lb("") → 02 01
$lb("hello") → 07 01 68 65 6C 6C 6F
$lb("привет") → 0E 02 ...
Unicode strings are stored as UTF-16.
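A Python sketch of decoding both string types might look like this. `decode_string` is a hypothetical name. Type 01 is decoded as Latin-1, which is an assumption: the article only says ASCII, and Latin-1 agrees with ASCII on bytes 00 to 7F while still mapping every byte to a character. Type 02 is decoded as UTF-16 in the byte order given by the caller.

```python
def decode_string(type_byte: int, payload: bytes, little_endian: bool = True) -> str:
    """Decode the payload of a string element (type 01 or 02)."""
    if type_byte == 0x01:
        # Assumption: one byte per character, read as Latin-1 (a superset of ASCII)
        return payload.decode("latin-1")
    if type_byte == 0x02:
        # UTF-16 code units in the platform byte order
        return payload.decode("utf-16-le" if little_endian else "utf-16-be")
    raise ValueError(f"not a string type: {type_byte:#04x}")


print(decode_string(0x01, bytes.fromhex("68 65 6C 6C 6F")))  # hello
```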
Surrogate Pairs
Characters outside the Basic Multilingual Plane use UTF-16 surrogate pairs:
$lb("🔟")
→ 06 02 3D D8 1F DD
D83D DD1F → U+1F51F
Unicode values are stored as raw UTF-16 (platform endianness), without compression.
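The surrogate arithmetic behind the dump can be checked with a short Python sketch. The helper `to_surrogates` is hypothetical. It applies the standard UTF-16 rule: subtract 0x10000, then put the top 10 bits in the high surrogate and the bottom 10 bits in the low surrogate.

```python
def to_surrogates(code_point: int):
    """Split a code point above U+FFFF into a UTF-16 (high, low) surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is not outside the BMP")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)   # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)  # bottom 10 bits
    return high, low


high, low = to_surrogates(0x1F51F)
print(f"{high:04X} {low:04X}")                           # D83D DD1F
print("\U0001F51F".encode("utf-16-le").hex(" ").upper())  # 3D D8 1F DD
```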
Integer Encoding
| Type | Meaning |
|---|---|
| 04 | non-negative integer |
| 05 | negative integer |
Encoding is variable-length.
Special cases:
0 → 02 04
-1 → 02 05
Examples:
1 → 03 04 01
255 → 03 04 FF
256 → 04 04 00 01
-2 → 03 05 FE
-256 → 03 05 00
-257 → 04 05 FF FE
Observations:
- positive integers use unsigned binary representation
- negative integers use two’s complement with variable width
- payload grows only when required
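The observations above can be turned into a Python encoder and decoder. This sketch is inferred only from the dumps in this section. It assumes a little-endian host and the 1-byte length form, and `encode_int` and `decode_int` are hypothetical names. For negative values the payload is the little-endian two's complement with high-order FF bytes dropped. That is why -1 needs no payload at all and -256 shrinks to a single 00 byte. The decoder also accepts padded payloads such as 03 04 00, which one of the comments reports IRIS accepts.

```python
def encode_int(n: int) -> bytes:
    """Encode an integer as a short-form element of type 04 or 05.

    Little-endian payload with high-order 00 bytes (non-negative)
    or FF bytes (negative) dropped.
    """
    if n >= 0:
        width = (n.bit_length() + 7) // 8
        payload = n.to_bytes(width, "little")
        type_byte = 0x04
    else:
        width = 0
        # smallest width whose two's complement range reaches n
        while n < -(256 ** width):
            width += 1
        payload = (n + 256 ** width).to_bytes(width, "little")
        type_byte = 0x05
    return bytes([2 + len(payload), type_byte]) + payload


def decode_int(type_byte: int, payload: bytes) -> int:
    """Inverse of encode_int: restore the dropped 00 or FF bytes."""
    value = int.from_bytes(payload, "little")
    if type_byte == 0x04:
        return value
    if type_byte == 0x05:
        return value - 256 ** len(payload)
    raise ValueError(f"not an integer type: {type_byte:#04x}")


print(encode_int(-257).hex(" ").upper())  # 04 05 FF FE
```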
Decimal Numbers
| Type | Meaning |
|---|---|
| 06 | positive decimal |
| 07 | negative decimal |
Structure:
[length][type][scale][mantissa...]
- scale: 1 byte, a signed decimal exponent (FF = -1, FE = -2)
- mantissa: variable-length integer
Example:
0.1 → 04 06 FF 01
0.01 → 04 06 FE 01
0.00002 → 04 06 FB 02
Large value:
2^32 + 0.1
→ 08 06 FF 01 00 00 00 0A
Interpretation:
value = mantissa × 10^scale
Negative values mirror the same structure:
-0.00002 → 04 07 FB FE
Decimal values are stored as scaled integers, preserving exact decimal semantics.
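A decoder for types 06 and 07 can be sketched in Python with the standard `decimal` module, so the value stays exact. `decode_decimal` is a hypothetical name. The sketch assumes the mantissa uses the same little-endian encoding as the integer types. In particular, a negative mantissa is assumed to be two's complement with high-order FF bytes dropped, which matches the single -0.00002 example above.

```python
from decimal import Decimal


def decode_decimal(type_byte: int, payload: bytes) -> Decimal:
    """Decode a type 06/07 payload laid out as [scale][mantissa...]."""
    if type_byte not in (0x06, 0x07):
        raise ValueError(f"not a decimal type: {type_byte:#04x}")
    scale = int.from_bytes(payload[:1], "little", signed=True)  # FF -> -1
    mantissa_bytes = payload[1:]
    mantissa = int.from_bytes(mantissa_bytes, "little")
    if type_byte == 0x07:
        # assumed: negative mantissa with high-order FF bytes dropped
        mantissa -= 256 ** len(mantissa_bytes)
    return Decimal(mantissa).scaleb(scale)  # mantissa × 10^scale


print(decode_decimal(0x06, bytes.fromhex("FF 01 00 00 00 0A")))  # 4294967296.1
```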
Binary Floating-Point
IRIS supports two binary floating-point encodings.
Compact Float (Type 08)
[length][08][payload...]
- IEEE 754 single-precision (float32)
- little/big-endian depending on platform
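- low-order zero bytes are omitted (on a little-endian platform these are the first bytes of the value: 1.5 is 00 00 C0 3F and is stored as C0 3F)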
Examples:
1.5 → 04 08 C0 3F
1.25 → 04 08 A0 3F
0.5 → 03 08 3F
10.0 → 04 08 20 41
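The compaction can be reproduced in Python with the standard `struct` module. This sketch assumes a little-endian host, as in the dumps. `encode_compact_float` and `decode_compact_float` are hypothetical names. How IRIS decides when a value is stored as type 08 rather than 09 is not covered here, and neither is how 0.0 would be stored.

```python
import struct


def encode_compact_float(x: float) -> bytes:
    """Build a type 08 element: float32 bytes with low-order zero bytes dropped."""
    raw = struct.pack("<f", x)     # 4 bytes, least significant first
    payload = raw.lstrip(b"\x00")  # drop zero bytes at the low-order end
    return bytes([2 + len(payload), 0x08]) + payload


def decode_compact_float(payload: bytes) -> float:
    """Restore the dropped low-order zero bytes and read the float32."""
    if len(payload) > 4:
        raise ValueError("type 08 payload is at most 4 bytes")
    raw = b"\x00" * (4 - len(payload)) + payload
    return struct.unpack("<f", raw)[0]


print(encode_compact_float(1.5).hex(" ").upper())  # 04 08 C0 3F
```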
IEEE Double (Type 09)
[length][09][8 bytes]
- IEEE 754 double-precision (float64)
- fixed 8-byte payload
Example:
$double(0.1)
→ 0A 09 9A 99 99 99 99 99 B9 3F
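Because the type 09 payload has a fixed width, decoding it is a single `struct` call. In this Python sketch, `decode_double` is a hypothetical name. The byte order is passed in explicitly, for the platform reasons discussed earlier.

```python
import struct


def decode_double(payload: bytes, little_endian: bool = True) -> float:
    """Decode a type 09 payload: a fixed 8-byte IEEE 754 double."""
    if len(payload) != 8:
        raise ValueError("type 09 payload must be exactly 8 bytes")
    return struct.unpack("<d" if little_endian else ">d", payload)[0]


element = bytes.fromhex("0A 09 9A 99 99 99 99 99 B9 3F")  # $double(0.1)
assert element[0] == len(element) and element[1] == 0x09
print(decode_double(element[2:]))  # 0.1
```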
Summary
$LISTBUILD is not just a helper function — it is a core binary storage format inside IRIS.
It combines:
- variable-length element encoding
- compact integer representation
- decimal values as scaled integers
- binary floating-point in two forms:
  - compact float32 (08)
  - full float64 (09)
The format is:
- space-efficient
- self-delimiting
- internally consistent across types
While not formally documented, its structure is stable and precise enough to be reverse-engineered and implemented outside IRIS.
Comments
THANK YOU @Dmitry Maslennikov for this excellent insight into the probably most important internal structure element of IRIS and its data type variations.
👍👏
Just two comments:
"Observations: ... payload grows only when required" is correct, but a more accurate explanation would be "the whole list structure is created with minimum memory usage in mind".
The above note (minimum storage size) leads to special cases:
1) an integer 0 is stored in two bytes only
02 04 (length, type) and not
03 04 00 (length, type, data) which is also accepted(*)
The same goes for nullstring (which is obvious)
02 01 (length, type and, of course, no data), ASCII nullstring
02 02 (length, type) a "Unicode" nullstring,
2) If only the length component is present and equals 1,
then this indicates a NULL element (i.e. a missing element):
set x=$lb(85,,,0,"","abc")
zzdump x --> 03 04 55 01 01 02 04 02 01 05 01 61 62 63
which breaks down into
03 04 55 $li(x,1) = 85
01 $li(x,2) = <NULL VALUE>
01 $li(x,3) = <NULL VALUE>
02 04 $li(x,4) = 0
02 01 $li(x,5) = "" / nullstring
05 01 61 62 63 $li(x,6) = "abc"
(*) I use this side effect (lists use minimal bytes for integers, and ASCII strings are accepted even if their type is Unicode) in some of my CallOuts to return results (as an IRIS list) without explicitly converting C's two-byte strings into one-byte ASCII where that would apply, and I return integer values as either four or eight bytes even if the value would fit in two, three or five bytes.
For example, the ReadColumn() method of the excel library class could return something like
set colData = %exl.ReadColumn(3)
zzdump colData
08 02 61 00 62 00 63 00 06 04 55 00 00 00 (ASCII stored as Unicode, a 1-byte integer in 4 bytes)
zwrite colData
colData=$lb("abc",85)
Thank you guys at ISC for this wise implementation!