I attended Cloud Native Security Con in Seattle with the full intention of crushing OTEL Day, then perusing the subject of security applied to Cloud Native workloads in the days leading up to the CTF as a professional exercise. This was happily upended by a new understanding of eBPF, which got my screens, career, workloads, and attitude a much needed upgrade with new approaches to solving workload problems.
So I made it to the eBPF party and have been attending clinic after clinic on the subject ever since. Here I would like to "unbox" eBPF as a technical solution, mapped directly to what we do in practice (even if it's a bit off), and step through my experimentation with eBPF in support of InterSystems IRIS workloads, particularly on Kubernetes, though not without value for standalone workloads.
eBee Steps with eBPF and InterSystems IRIS Workloads
eBPF
eBPF (extended Berkeley Packet Filter) is a killer Linux kernel feature that implements a VM within kernel space and makes it possible to run sandboxed apps safely, with guardrails. These apps can "map" data into user land for observability, tracing, security, and networking. I think of it as a "sniffer" of the OS: traditionally BPF was associated with networking, while the extended version "sniffs" tracepoints, processes, scheduling, execution, and block device access. If you didn't buy my analogy of eBPF, here is one from the pros:
"What JavaScript is to the browser, eBPF is to the Linux Kernel"
JavaScript lets you attach callbacks to events in the DOM in order to bring dynamic features to your web page. In a similar fashion, eBPF allows you to hook into kernel events and extend their logic when those events are triggered!
If the following Prometheus metric seems impossible to you, employ eBPF to watch the processes that are supposed to be there and monitor in band through the kernel.
# HELP iris_instance_status The thing that's down telling us it's down.
# TYPE iris_instance_status gauge
iris_instance_status 0
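As a sketch of that idea, a small BCC program could hook the `sched_process_exit` tracepoint and notice when an IRIS process disappears, in band through the kernel rather than by polling. This is illustrative only (it assumes bcc is installed and you run it as root); `is_iris_exit` and the printed "gauge" wiring are hypothetical helpers, not a real exporter.

```python
# Sketch: detect IRIS process exits in-band via the kernel (assumes bcc + root).
IRIS_COMM = b"irisdb"  # hypothetical: the process name we treat as "the instance"

BPF_SOURCE = r"""
TRACEPOINT_PROBE(sched, sched_process_exit) {
    bpf_trace_printk("exit: %s\n", args->comm);
    return 0;
}
"""

def is_iris_exit(msg: bytes, comm: bytes = IRIS_COMM) -> bool:
    """Userland filter: did the exiting task look like an IRIS process?"""
    return comm in msg

def main():
    # Imported here so the sketch can be read without bcc installed.
    from bcc import BPF
    bpf = BPF(text=BPF_SOURCE)
    print("Watching for IRIS process exits... CTRL-C to stop")
    while True:
        try:
            task, pid, cpu, flags, ts, msg = bpf.trace_fields()
            if is_iris_exit(msg):
                # A real exporter would flip iris_instance_status here.
                print(f"iris_instance_status may be 0: pid {pid} exited")
        except ValueError:
            continue
        except KeyboardInterrupt:
            break

if __name__ == "__main__":
    main()
```

The kernel side only emits exit events; deciding what "down" means (one irisdb gone vs. all of them) stays in userland, where it belongs.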
If you are tired of begging for the following resources for a sidecar just to get needed observability: goodbye, sidecars.
iris-sidecar:
  resources:
    requests:
      memory: "2Mi"
      cpu: "125m"
Where
One of the most satisfying things about how eBPF is applied is where it runs... in a VM, inside the kernel. And thanks to Linux namespacing, you can guess how powerful that is in a cloud native environment, let alone on a kernel sitting in some sort of virtualization, or on a big iron ghetto blaster machine with admirable hardware.
Obligatory Hello World
For those of you who like to try things for yourselves and from the "beginning," so to speak, I salute you with an obligatory Hello World, twisted to be a tad bit "irisy." However, it's mostly understood that programming in eBPF won't become a frequently exercised skill; it stays concentrated among individuals doing Linux kernel development or building next generation monitoring tools.
I run Pop OS/Ubuntu, and here is my cheat code for getting into the eBPF world quickly on 23.04:
sudo apt install -y zip bison build-essential cmake flex git libedit-dev \
libllvm15 llvm-15-dev libclang-15-dev python3 zlib1g-dev libelf-dev libfl-dev python3-setuptools \
liblzma-dev libdebuginfod-dev arping netperf iperf libpolly-15-dev
git clone https://github.com/iovisor/bcc.git
mkdir bcc/build; cd bcc/build
cmake ..
make
sudo make install
cmake -DPYTHON_CMD=python3 ..
pushd ../src/python/
make
sudo make install
popd
First ensure the target kernel has the required stuff...
cat /boot/config-$(uname -r) | grep 'CONFIG_BPF'
CONFIG_BPF=y
If `CONFIG_BPF=y` is in your window somewhere, we are good to go.
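If you would rather check programmatically, the config listing is trivial to parse; this is a hedged sketch (the `bpf_options` helper is hypothetical), and `CONFIG_BPF` / `CONFIG_BPF_SYSCALL` are real kernel config symbols you would expect to see enabled:

```python
def bpf_options(config_text: str) -> dict:
    """Pick out BPF-related options from a kernel config listing."""
    opts = {}
    for line in config_text.splitlines():
        if line.startswith("CONFIG_BPF"):
            key, _, val = line.partition("=")
            opts[key] = val
    return opts

# On a real box you would feed it /boot/config-$(uname -r); sample shown here.
sample = "CONFIG_BPF=y\nCONFIG_BPF_SYSCALL=y\nCONFIG_NET=y"
print(bpf_options(sample))  # {'CONFIG_BPF': 'y', 'CONFIG_BPF_SYSCALL': 'y'}
```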
What we want to accomplish with this simple Hello World is visibility into when IRIS is making Linux system calls, using nothing but eBPF tooling and the kernel itself.
Here is a good way to go about exploration:
1️⃣ Find a Linux System Call of Interest
sudo ls /sys/kernel/debug/tracing/events/syscalls
For this example, we are going to trap when somebody (modified here to trap IRIS) creates a directory through the syscall `sys_enter_mkdir`.
2️⃣ Insert it into the Following Hello World
Your BPF program to load and run is in the variable `BPF_SOURCE_CODE`; modify it to include the syscall you want to trap.
from bcc import BPF
from bcc.utils import printb

BPF_SOURCE_CODE = r"""
TRACEPOINT_PROBE(syscalls, sys_enter_mkdir) {
    bpf_trace_printk("Directory was created by IRIS: %s\n", args->pathname);
    return 0;
}
"""

bpf = BPF(text = BPF_SOURCE_CODE)

print("Go create a dir with IRIS...")
print("CTRL-C to exit")

while True:
    try:
        (task, pid, cpu, flags, ts, msg) = bpf.trace_fields()
        if "iris" in task.decode("utf-8"):
            printb(b"%s-%-6d %s" % (task, pid, msg))
    except ValueError:
        continue
    except KeyboardInterrupt:
        break
3️⃣ Load into the Kernel, Run
Create a dir in IRIS
Inspect the trace!
eBPF Powered Binaries
It doesn't take long going through the bcc repository to realize that there are plenty of examples, tools, and binaries out there that take advantage of eBPF to do fun tracing, and "grep" in this case will suffice to derive some value.
Let's do just that on a start and stop of IRIS with some of the supplied examples.
execsnoop — Trace new processes via exec() syscalls.
This one tells a tale of the arguments to irisdb on start/stop.
sudo python3 execsnoop.py | grep iris
COMM PID PPID RET ARGS
iris 3014275 3011645 0 /usr/bin/iris stop IRIS quietly restart
irisstop 3014275 3011645 0 /usr/irissys/bin/irisstop quietly restart
irisdb 3014276 3014275 0 ./irisdb -s/data/IRIS/mgr/ -cV
irisdb 3014277 3014275 0 ./irisdb -s/data/IRIS/mgr/ -U -B OPT^SHUTDOWN(1)
irisdb 3014279 3014275 0 ./irisdb -s/data/IRIS/mgr/ -cV
irisdb 3014280 3014275 0 ./irisdb -s/data/IRIS/mgr/ -cV
sh 3014281 3014275 0 /bin/sh -c -- /usr/irissys/bin/irisdb -s /data/IRIS/mgr/ -cL
irisdb 3014282 3014281 0 /usr/irissys/bin/irisdb -s /data/IRIS/mgr/ -cL
irisdb 3014283 3014275 0 ./irisdb -s/data/IRIS/mgr/ -cV
irisrecov 3014284 3014275 0 ./irisrecov /data/IRIS/mgr/ quietly
iriswdimj 3014678 3014284 0 /usr/irissys/bin/iriswdimj -t
iriswdimj 3014679 3014284 0 /usr/irissys/bin/iriswdimj -j /data/IRIS/mgr/
rm 3014680 3014284 0 /usr/bin/rm -f iris.use
irisdb 3014684 3014275 0 ./irisdb -s/data/IRIS/mgr/ -w/data/IRIS/mgr/ -cd -B -V CLONE^STU
sh 3014685 3014275 0 /bin/sh -c -- /usr/irissys/bin/irisdb -s /data/IRIS/mgr/ -cL
irisdb 3014686 3014685 0 /usr/irissys/bin/irisdb -s /data/IRIS/mgr/ -cL
irisdb 3014687 3014275 0 ./irisdb -s/data/IRIS/mgr/ -cV
irisrecov 3014688 3014275 0 ./irisrecov /data/IRIS/mgr/ quietly
iriswdimj 3015082 3014688 0 /usr/irissys/bin/iriswdimj -t
iriswdimj 3015083 3014688 0 /usr/irissys/bin/iriswdimj -j /data/IRIS/mgr/
rm 3015084 3014688 0 /usr/bin/rm -f iris.use
irisdb 3015088 3014275 0 ./irisdb -s/data/IRIS/mgr/ -w/data/IRIS/mgr/ -cc -B -C/data/IRIS/iris.cpf*IRIS
irisdb 3015140 3014275 0 ./irisdb -s/data/IRIS/mgr/ -w/data/IRIS/mgr/ -U -B -b1024 -Erunlevel=sys/complete QUIET^STU
irisdb 3015142 3015140 0 /usr/irissys/bin/irisdb -s/data/IRIS/mgr -cj -p13 START^MONITOR
irisdb 3015143 3015140 0 /usr/irissys/bin/irisdb -s/data/IRIS/mgr -cj -p13 START^CLNDMN
irisdb 3015144 3015140 0 /usr/irissys/bin/irisdb -s/data/IRIS/mgr -cj -p13 ErrorPurge^Config.Startup
irisdb 3015145 3015140 0 /usr/irissys/bin/irisdb -s/data/IRIS/mgr -cj -p13 START^LMFMON
irisdb 3015146 3015140 0 /usr/irissys/bin/irisdb -s/data/IRIS/mgr -cj -p13 ^RECEIVE
irisdb 3015147 3015146 0 /usr/irissys/bin/irisdb -s/data/IRIS/mgr -cj -p16 SCAN^JRNZIP
irisdb 3015148 3015140 0 /usr/irissys/bin/irisdb -s/data/IRIS/mgr -cj -p13 OneServerJob^STU
irisdb 3015149 3015148 0 /usr/irissys/bin/irisdb -s/data/IRIS/mgr -cj -p19 Master^%SYS.SERVER
irisdb 3015150 3015140 0 /usr/irissys/bin/irisdb -s/data/IRIS/mgr -cj -p13 systemRestart^%SYS.cspServer2
irisdb 3015151 3015140 0 /usr/irissys/bin/irisdb -s/data/IRIS/mgr -cj -p13 SERVERS^STU1
requirements_ch 3015152 3015140 0 /usr/irissys/bin/requirements_check
dirname 3015153 3015152 0 /usr/bin/dirname /usr/irissys/bin/requirements_check
httpd 3015215 3015151 0 /usr/irissys/httpd/bin/httpd -f /data/IRIS/httpd/conf/httpd.conf -d /usr/irissys/httpd -c Listen 52773
irisdb 3015362 3015140 0 /usr/irissys/bin/irisdb -s/data/IRIS/mgr -cj -p13 OnSystemStartup^HS.FHIRServer.Util.SystemStartup
irisdb 3015363 3015140 0 /usr/irissys/bin/irisdb -s/data/IRIS/mgr -cj -p13 OnSystemStartup^HS.HC.Util.SystemStartup
irisdb 3015364 3015151 0 /usr/irissys/bin/irisdb -s/data/IRIS/mgr -cj -p21 RunManager^%SYS.Task
irisdb 3015365 3015151 0 /usr/irissys/bin/irisdb -s/data/IRIS/mgr -cj -p21 Start^%SYS.Monitor.Control
irisdb 3015366 3015151 0 /usr/irissys/bin/irisdb -s/data/IRIS/mgr -cj -p21 Daemon^LOGDMN
irisdb 3015367 3015151 0 /usr/irissys/bin/irisdb -s/data/IRIS/mgr -cj -p21 RunDaemon^%SYS.WorkQueueMgr
irisdb 3015368 3015151 0 /usr/irissys/bin/irisdb -s/data/IRIS/mgr -cj -p21 RunRemoteQueueDaemon^%SYS.WorkQueueMgr
irisdb 3015369 3015362 0 /usr/irissys/bin/irisdb -s/data/IRIS/mgr -cj -p19 RunAll^HS.HC.Util.Installer.Upgrade.BackgroundItem
irisdb 3015370 3014275 0 ./irisdb -s/data/IRIS/mgr/ -cV
irisdb 3015436 3015367 0 /usr/irissys/bin/irisdb -s/data/IRIS/mgr -cj -p25 startWork^%SYS.WorkQueueMgr
irisdb 3015437 3015367 0 /usr/irissys/bin/irisdb -s/data/IRIS/mgr -cj -p25 startWork^%SYS.WorkQueueMgr
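The execsnoop output above is plain text, one line per exec; a few lines of Python can turn it into structured records, for instance to count how many children each PID fans out during start/stop. The column layout (COMM PID PPID RET ARGS) matches the output above; the helper names here are illustrative, not part of bcc.

```python
from collections import Counter

def parse_execsnoop_line(line):
    """Split one execsnoop output line into (comm, pid, ppid, ret, args)."""
    parts = line.split(None, 4)
    if len(parts) < 5:
        return None
    comm, pid, ppid, ret, args = parts
    return comm, int(pid), int(ppid), int(ret), args

def children_per_parent(lines):
    """Count execs per parent PID, e.g. to see what irisstop fans out to."""
    counts = Counter()
    for line in lines:
        rec = parse_execsnoop_line(line)
        if rec:
            counts[rec[2]] += 1  # rec[2] is the PPID
    return counts

# Sample lines lifted from the trace above
sample = [
    "irisstop 3014275 3011645 0 /usr/irissys/bin/irisstop quietly restart",
    "irisdb 3014276 3014275 0 ./irisdb -s/data/IRIS/mgr/ -cV",
    "irisdb 3014277 3014275 0 ./irisdb -s/data/IRIS/mgr/ -U -B OPT^SHUTDOWN(1)",
]
print(children_per_parent(sample))  # two execs under parent PID 3014275
```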
statsnoop — Trace stat() syscalls; stat() returns file attributes about an inode, i.e. file/dir access.
This one is informative about directory and file level access during a start/stop... a bit chatty, but it shows what iris is doing during startup, including cpf access, journals, WIJ activity, and the use of system tooling to get the job done.
sudo python3 statsnoop.py | grep iris
3016831 irisdb 0 0 /data/IRIS/mgr/
3016831 irisdb 0 0 /data/IRIS/mgr/
3016825 irisstop 0 0 /data/IRIS/mgr
3016825 irisstop 0 0 /usr/irissys/bin/irisuxsusr
3016825 irisstop 0 0 ./irisdb
3016825 irisstop 0 0 ../bin
3016832 sh -1 2 /usr/irissys/bin/glibc-hwcaps/x86-64-v3/
3016832 sh -1 2 /usr/irissys/bin/glibc-hwcaps/x86-64-v2/
3016832 sh 0 0 /usr/irissys/bin/
3016832 sh 0 0 /home/irisowner
3016833 irisdb -1 2 /usr/irissys/bin/glibc-hwcaps/x86-64-v3/
3016833 irisdb -1 2 /usr/irissys/bin/glibc-hwcaps/x86-64-v2/
3016833 irisdb 0 0 /usr/irissys/bin/
3016833 irisdb 0 0 /data/IRIS/mgr/
3016833 irisdb 0 0 /data/IRIS/mgr/
3016833 irisdb 0 0 /data/IRIS/mgr/
3016834 irisstop 0 0 ./irisdb
3016834 irisstop 0 0 ../bin
3016834 irisdb -1 2 /usr/irissys/bin/glibc-hwcaps/x86-64-v3/
3016834 irisdb -1 2 /usr/irissys/bin/glibc-hwcaps/x86-64-v2/
3016834 irisdb 0 0 /usr/irissys/bin/
3016834 irisdb 0 0 /data/IRIS/mgr/
3016834 irisdb 0 0 /data/IRIS/mgr/
3016835 irisstop 0 0 ./irisrecov
3016835 irisstop 0 0 ../bin
3016835 irisrecov -1 2 /usr/irissys/bin/glibc-hwcaps/x86-64-v3/
3016835 irisrecov -1 2 /usr/irissys/bin/glibc-hwcaps/x86-64-v2/
3016835 irisrecov 0 0 /usr/irissys/bin/
3016835 irisrecov 0 0 /home/irisowner
3016835 irisrecov 0 0 .
3016835 irisrecov 0 0 iris.cpf
3016841 irisrecov 0 0 /usr/bin/cut
3016841 irisrecov 0 0 /usr/bin/tr
3016841 irisrecov 0 0 /usr/bin/sed
3017761 requirements_ch 0 0 /home/irisowner
3017761 requirements_ch -1 2 /usr/irissys/bin/requirements.isc
3017691 irisdb 0 0 /data/IRIS/iris.cpf
3017691 irisdb 0 0 /data/IRIS/_LastGood_.cpf
3017691 irisdb -1 2 /data/IRIS/_LastGood_.cpf
3017691 irisdb -1 2 /data/IRIS/_LastGood_.cpf_5275
3017691 irisdb -1 2 /data/IRIS/_LastGood_.cpf_5275
3017691 irisdb -1 2 /data/IRIS/_LastGood_.cpf_5275
3017691 irisdb -1 2 /data/IRIS/_LastGood_.cpf
3017691 irisdb 0 0 /data/IRIS/iris.cpf
3017691 irisdb -1 2 /data/IRIS/_LastGood_.cpf
3017691 irisdb -1 2 /data/IRIS/_LastGood_.cpf
3017691 irisdb -1 2 /data/IRIS/_LastGood_.cpf
3017691 irisdb -1 2 /data/IRIS/_LastGood_.cpf
3017691 irisdb -1 2 /data/IRIS/_LastGood_.cpf
3017691 irisdb -1 2 /data/IRIS/_LastGood_.cpf
3017691 irisdb -1 2 /data/IRIS/_LastGood_.cpf
3017691 irisdb -1 2 /data/IRIS/_LastGood_.cpf
3017691 irisdb -1 2 /data/IRIS/_LastGood_.cpf
3017691 irisdb -1 2 /data/IRIS/_LastGood_.cpf
3017691 irisdb -1 2 /data/IRIS/_LastGood_.cpf
3017691 irisdb -1 2 /data/IRIS/_LastGood_.cpf
3017691 irisdb -1 2 /data/IRIS/_LastGood_.cpf
3017691 irisdb -1 2 /data/IRIS/_LastGood_.cpf
3017691 irisdb -1 2 /data/IRIS/_LastGood_.cpf
3017691 irisdb -1 2 /data/IRIS/_LastGood_.cpf
3017691 irisdb -1 2 /data/IRIS/_LastGood_.cpf
3017691 irisdb -1 2 /data/IRIS/_LastGood_.cpf
3017691 irisdb -1 2 /data/IRIS/_LastGood_.cpf
3017691 irisdb -1 2 /data/IRIS/_LastGood_.cpf
3017691 irisdb -1 2 /data/IRIS/_LastGood_.cpf
3017691 irisdb -1 2 /data/IRIS/_LastGood_.cpf
3017691 irisdb -1 2 /data/IRIS/_LastGood_.cpf
3017691 irisdb -1 2 /data/IRIS/_LastGood_.cpf
3017691 irisdb -1 2 /data/IRIS/_LastGood_.cpf
3017691 irisdb -1 2 /data/IRIS/_LastGood_.cpf
3017691 irisdb -1 2 /data/IRIS/_LastGood_.cpf
3017691 irisdb 0 0 /data/IRIS/iris.cpf
3017691 irisdb -1 2 /data/IRIS/_LastGood_.cpf
3017691 irisdb -1 2 /data/IRIS/_LastGood_.cpf
3017691 irisdb 0 0 /data/IRIS/iris.cpf
3017691 irisdb -1 2 /data/IRIS/_LastGood_.cpf
3017691 irisdb -1 2 /data/IRIS/_LastGood_.cpf
3017691 irisdb -1 2 /data/IRIS/_LastGood_.cpf
3017691 irisdb -1 2 /data/IRIS/_LastGood_.cpf
3017691 irisdb 0 0 /data/IRIS/iris.cpf
3017691 irisdb -1 2 /usr/lib64/libcrypto.so.1.1
3017691 irisdb -1 2 /usr/lib64/libcrypto.so.3
3017691 irisdb -1 2 /data/IRIS/_LastGood_.cpf
3017691 irisdb -1 2 /data/IRIS/_LastGood_.cpf
3017691 irisdb 0 0 /data/IRIS/iris.cpf
3017691 irisdb -1 2 /data/IRIS/_LastGood_.cpf
3017691 irisdb -1 2 /data/IRIS/_LastGood_.cpf
3017691 irisdb -1 2 /data/IRIS/_LastGood_.cpf
3017691 irisdb -1 2 /data/IRIS/_LastGood_.cpf
3017691 irisdb -1 2 /data/IRIS/_LastGood_.cpf
3017691 irisdb 0 0 /data/IRIS/iris.cpf
3017691 irisdb -1 2 /data/IRIS/_LastGood_.cpf
3017691 irisdb 0 0 /data/IRIS/iris.cpf
3017691 irisdb -1 2 /data/IRIS/_LastGood_.cpf
3017691 irisdb 0 0 /data/IRIS/iris.cpf
3017691 irisdb -1 2 /data/IRIS/_LastGood_.cpf
3017691 irisdb 0 0 /data/IRIS/iris.cpf
3017691 irisdb -1 2 /data/IRIS/_LastGood_.cpf
3017691 irisdb -1 2 /data/IRIS/_LastGood_.cpf
3017691 irisdb 0 0 /data/IRIS/iris.cpf
3017691 irisdb -1 2 /data/IRIS/_LastGood_.cpf
3017691 irisdb 0 0 /data/IRIS/iris.cpf
3017691 irisdb -1 2 /data/IRIS/_LastGood_.cpf
3017691 irisdb 0 0 /data/IRIS/iris.cpf
3017691 irisdb -1 2 /data/IRIS/_LastGood_.cpf
3017691 irisdb -1 2 /data/IRIS/_LastGood_.cpf
3017691 irisdb -1 2 /data/IRIS/_LastGood_.cpf
3017691 irisdb 0 0 /data/IRIS/iris.cpf
3017691 irisdb -1 2 /data/IRIS/_LastGood_.cpf
3017691 irisdb -1 2 /data/IRIS/_LastGood_.cpf
3017691 irisdb -1 2 /data/IRIS/_LastGood_.cpf
3017691 irisdb 0 0 /data/IRIS/iris.cpf
3017691 irisdb -1 2 /data/IRIS/_LastGood_.cpf_20240908
3017691 irisdb -1 2 /data/IRIS/_LastGood_.cpf
3017691 irisdb 0 0 /data/IRIS/_LastGood_.cpf_5275
3017691 irisdb -1 2 /data/IRIS/_LastGood_.cpf
3017691 irisdb 0 0 /data/IRIS/_LastGood_.cpf_5275
3017691 irisdb -1 2 /data/IRIS/_LastGood_.cpf
3017691 irisdb -1 2 /data/IRIS/_LastGood_.cpf
3017691 irisdb 0 0 /etc/localtime
3017691 irisdb 0 0 /data/IRIS/_LastGood_.cpf
3017691 irisdb 0 0 /data/IRIS/mgr/irisaudit/
3017691 irisdb 0 0 /data/IRIS/mgr/irisaudit/
3017691 irisdb 0 0 /data/IRIS/mgr/irisaudit/
3017691 irisdb 0 0 /data/IRIS/mgr/irisaudit/
3017691 irisdb 0 0 /data/IRIS/mgr/irisaudit/
3017691 irisdb 0 0 /data/IRIS/mgr/irisaudit/
3017756 irisdb -1 2 /data/IRIS/mgr/journal/20240908.002
3017756 irisdb -1 2 /data/IRIS/mgr/journal/20240908.002
3017756 irisdb 0 0 /data/IRIS/mgr/journal/20240908.002z
3017756 irisdb -1 2 /data/IRIS/mgr/journal/20240908.002
3017756 irisdb 0 0 /data/IRIS/mgr/journal/20240908.002z
3017756 irisdb -1 2 /data/IRIS/mgr/journal/20240908.001
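All those `-1 2` lines are stat() misses (errno 2, ENOENT); summarizing misses per path shows at a glance that the repeated `_LastGood_.cpf` probes dominate the chatter. A hypothetical summarizer over statsnoop's columns (PID COMM FD ERR PATH):

```python
from collections import Counter

def count_enoent(lines):
    """Tally stat() misses (err == 2, ENOENT) per path from statsnoop output."""
    misses = Counter()
    for line in lines:
        parts = line.split(None, 4)
        if len(parts) == 5 and parts[3] == "2":  # parts[3] is the ERR column
            misses[parts[4]] += 1
    return misses

# Sample lines lifted from the trace above
sample = [
    "3017691 irisdb -1 2 /data/IRIS/_LastGood_.cpf",
    "3017691 irisdb -1 2 /data/IRIS/_LastGood_.cpf",
    "3017691 irisdb 0 0 /data/IRIS/iris.cpf",
]
print(count_enoent(sample))  # _LastGood_.cpf missed twice, iris.cpf not counted
```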
Flamegraphs

One of the coolest things I stumbled upon with the eBPF tooling was Brendan Gregg's implementation of flamegraphs on top of bpf output to understand performance and stack traces.
Given the following perf recording during a start/stop of IRIS:
sudo perf record -F 99 -a -g -- sleep 60
[ perf record: Woken up 7 times to write data ]
[ perf record: Captured and wrote 3.701 MB perf.data (15013 samples) ]
Generate the flame graph with the following:
sudo perf script > out.perf
./stackcollapse-perf.pl out.perf > /tmp/gar.thing
./flamegraph.pl /tmp/gar.thing > flamegraph.svg
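stackcollapse-perf.pl's job is simple in spirit: fold each multi-line stack from `perf script` into a single semicolon-joined line with a sample count, which flamegraph.pl then draws as widths. A toy Python version of just that folding step (not a replacement for the real script, which handles many perf output formats):

```python
from collections import Counter

def fold_stacks(stacks):
    """Collapse stack samples into 'root;child;leaf count' folded lines."""
    counts = Counter()
    for frames in stacks:
        counts[";".join(frames)] += 1
    return [f"{stack} {n}" for stack, n in sorted(counts.items())]

# Hypothetical samples, root-first as flamegraph.pl expects
samples = [
    ["irisdb", "sys_write", "vfs_write"],
    ["irisdb", "sys_write", "vfs_write"],
    ["irisdb", "sys_read"],
]
for line in fold_stacks(samples):
    print(line)
```

Two identical write stacks fold into one line with count 2; the count is what becomes the width of each box in the SVG.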
I gave it the college try uploading the SVG, but it did not work out with this editor. Understand though that it is interactive and clickable, so you can drill down into stack traces, beyond just looking cool.
- The function on the bottom is the function on-CPU. The higher up the y-axis, the further nested the function.
- The width of each function on the graph represents the amount of time that function took to execute as a percentage of the total time of its parent function.
- Finding functions that are both high on the y-axis (deeply nested) and wide on the x-axis (time-intensive) is a great way to narrow down performance and optimization issues.
"high and wide" <--- 👀
- red == user-level
- orange == kernel
- yellow == C++
- green == JIT, e.g. Java
I really liked the explanation of flamegraph interpretation laid out here (credit for the above), from which I derived a baseline understanding of how to read flamegraphs. It is especially powerful for those running Python in IRIS productions with userland code and looking for optimization.
Onward and upward. I hope this piqued your interest; now let's move on to the world of eBPF apps, where the pros have put together phenomenal solutions that put eBPF to work on fleets of systems safely and in a lightweight manner.