
eBPF: Parca - Continuous Profiling for IRIS Workloads

So whether you are following along from the previous post or just dropping in now, let's segue into the world of eBPF applications and take a look at Parca. It builds on our brief investigation of performance bottlenecks using eBPF, but puts a killer app on top of your cluster to monitor all of your IRIS workloads, continuously, cluster wide!

Continuous Profiling with Parca: IRIS Workloads, Cluster Wide

Parca

Parca is named after the Program for Arctic Regional Climate Assessment (PARCA) and the ice core profiling that was done as part of it to study climate change. This open source eBPF project aims to reduce the carbon emissions produced by unnecessary data center resource usage; we can use it to get "more for less" out of our resource consumption and optimize our cloud native workloads running IRIS.

Parca is a continuous profiling project. Continuous profiling is the act of taking profiles (such as CPU, memory, I/O and more) of programs in a systematic way. Parca collects, stores and makes profiles available to be queried over time, and because of its low eBPF-based overhead it can do this without degrading the target workloads.

Where

If you thought profiling a single kernel running multiple Linux namespaces was cool in the last post, Parca manages to bring all of that together in one spot, with a single pane of glass across all nodes (kernels) in a cluster.


 

Parca has two main components:

  • Parca: The server that stores profiling data and allows it to be queried and analyzed over time.
  • Parca Agent: An eBPF-based whole-system profiler that runs on the nodes.

To hop right into "Parca applied", I configured Parca on my cluster with the following:
 

 kubectl create namespace parca
 kubectl apply -f https://github.com/parca-dev/parca/releases/download/v0.21.0/kubernetes-manifest.yaml
 kubectl apply -f https://github.com/parca-dev/parca-agent/releases/download/v0.31.1/kubernetes-manifest.yaml

This results in a DaemonSet running the agent on all 10 nodes, with about 3-4 IRIS workloads scattered throughout the cluster.
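
A quick sanity check that the rollout worked, assuming the parca namespace created by the manifests above (adjust if you changed it):

 kubectl -n parca get daemonsets,pods -o wide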

Note: Parca runs standalone too, no Kubernetes required!

Let's Profile

Now, I know I have a couple of workloads of interest on this cluster. One is a FHIR workload servicing a GET on the /metadata endpoint across 3 pods on an interval, for friends I am trying to impress at an eBPF party; the other is a straight-up 2024.2 pod running the following as a JOB:

Class EBPF.ParcaIRISPythonProfiling Extends %RegisteredObject
{

/// Do ##class(EBPF.ParcaIRISPythonProfiling).Run()
ClassMethod Run()
{
    While 1 {
            HANG 10
            Do ..TerribleCode()
            Do ..WorserCode()
            Do ..OkCode()
            zn "%SYS"
            do ##class(%SYS.System).WriteToConsoleLog("Parca Demo Fired")
            zn "PARCA"
    }
}

ClassMethod TerribleCode() [ Language = python ]
{

    import time
    def terrible_code():
        time.sleep(30)
        print("TerribleCode Fired...")
    terrible_code()
}

ClassMethod WorserCode() [ Language = python ]
{
    import time
    def worser_code():
        time.sleep(60)
        print("WorserCode Fired...")
    worser_code()
}

ClassMethod OkCode() [ Language = python ]
{

    import time
    def ok_code():
        time.sleep(1)
        print("OkCode Fired....")
    ok_code()
}

}

Now, I popped a MetalLB service in front of the Parca service and dove right into the console. Let's take a peek at what we can observe in the two workloads.
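
If you want to do something similar, here is a minimal sketch. It assumes the stock manifest's parca service in the parca namespace listening on port 7070 (what the release manifest ships at the time of writing; adjust names and ports if yours differ), with MetalLB handing out the external IP:

 # switch the Parca service to type LoadBalancer so MetalLB assigns it an address
 kubectl -n parca patch svc parca -p '{"spec":{"type":"LoadBalancer"}}'
 # or, without a load balancer, just port-forward the UI to localhost
 kubectl -n parca port-forward svc/parca 7070:7070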

Python Execution

So I didn't get exactly what I wanted out of the results here, but I did get some hints on how IRIS does the whole Python integration thing.

In Parca, I constrained the query to the particular pod, summed by that same label, and selected a sane timeframe:

And here was the resulting pprof:

I can see irisdb doing the Python execution, traces with ISCAgent, and on the right basically the iris init stuff in the container. Full transparency, I was expecting to see the Python methods themselves, so I still have some work to do there, but I did learn that pythoninit.so is the star of the Python call-out show.


FHIR Thinger

Now this one does show some traces relevant to a FHIR workload from a kernel perspective. On the left you can see the Apache threads for the web server standing up the API, and in the irisdb traces you can also see the unmarshalling of JSON.

All spawning from a thread by what is known as a `zu210fun` party!

Now, let's take a look at the same workload in Grafana, since Parca can surface its profiling data in your observability stack:



Not earth shattering, I know, but the point stands: distributed profiling of an IRIS app with eBPF, in lightweight fashion, across an entire cluster... with the sole goal of never having to ask a customer for a pButtons report again!
