Internals

Architecture

One sentence: perf record → agent → TCP + zstd → server → parser → source mapper → SSE → browser.

[Figure: PerfLens architecture. The target device runs perf record + agent; the local machine runs server + parser + source mapper + SSE; the browser renders the UI.]

The agent collects on the target. The server parses and broadcasts. The browser does no perf work.

On the target device

The agent is the only thing running on the target. Two flavors ship: a ~600-line Python 3.5+ script for hosts with Python, and a single static C binary with vendored zstd for bare-metal targets. Both speak the same wire protocol — the server cannot tell which agent connected.

Capability probing

Before collecting anything, the agent inspects the kernel:

Probing costs roughly 6–12 seconds on the first connection and is a one-time hit.

Collection rounds

Each round runs perf record and perf stat in parallel for N seconds (default 8), then perf script flattens the trace. The result — perf script text optionally followed by a ### PERF_STAT ### section — is compressed with zstd -1 (system or vendored) and pushed over TCP with a 5-byte header. Typical compression: 20–40×.
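As a rough illustration of the last step of a round, the sketch below joins the perf script text with an optional perf stat section and prefixes the 5-byte header. The function names, the placement of the marker on its own line, and the pluggable `compress` callable are assumptions made for this example; they are not taken from the agent's source.

```python
import struct

PERF_STAT_MARKER = "### PERF_STAT ###"  # marker named in the text above
FLAG_RAW = 0    # raw UTF-8 perf script output
FLAG_ZSTD = 1   # zstd-compressed perf script output


def build_round_payload(script_text, stat_text=None):
    """Join perf script text and the optional perf stat section."""
    if stat_text:
        # Assumption: the marker sits on its own line between the two parts.
        script_text = (script_text.rstrip("\n") + "\n"
                       + PERF_STAT_MARKER + "\n" + stat_text)
    return script_text.encode("utf-8")


def frame(payload, flag):
    """Prefix a payload with the header: uint32 length + uint8 flag."""
    return struct.pack("!IB", len(payload), flag) + payload


def frame_round(script_text, stat_text=None, compress=None):
    """Frame one round; `compress` is an optional bytes -> bytes callable."""
    payload = build_round_payload(script_text, stat_text)
    if compress is None:
        return frame(payload, FLAG_RAW)
    return frame(compress(payload), FLAG_ZSTD)
```

Passing a callable for compression keeps the sketch independent of whether zstd comes from the system or a vendored copy; without one, the round goes out as a raw frame.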

Health metrics

Independent of the perf pipeline, the agent collects device health every 2 seconds — CPU per-core, memory, load, temperature, process stats, network bytes — and streams them as JSON frames with flag 4. The browser renders sparklines, gauges, and per-core CPU bars without affecting the perf collection.
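A health frame is ordinary JSON behind the same 5-byte header, with the flag set to 4. The sketch below shows the shape under stated assumptions: the field names (`ts`, `cpu_count`, `load`) are hypothetical, and only a few portable metrics are gathered, far fewer than the list above.

```python
import json
import os
import struct
import time

FLAG_HEALTH = 4  # health metrics frames


def collect_health():
    """Gather a few portable metrics (hypothetical field names)."""
    sample = {"ts": time.time(), "cpu_count": os.cpu_count()}
    try:
        # os.getloadavg() is not available on every platform.
        sample["load"] = list(os.getloadavg())
    except (AttributeError, OSError):
        sample["load"] = None
    return sample


def health_frame(sample):
    """Serialize one sample as JSON and add the length + flag header."""
    payload = json.dumps(sample).encode("utf-8")
    return struct.pack("!IB", len(payload), FLAG_HEALTH) + payload
```

An agent loop would call `health_frame(collect_health())` on its own timer, independently of the perf rounds.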

On the local machine

One Python process owns everything on the local side. A ThreadingHTTPServer serves the UI and the JSON API; a separate thread owns the single TCP listener that accepts agent connections.
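The layout described above, one HTTP server plus one dedicated thread for the agent listener, can be sketched with the standard library alone. The handler, the `start` helper, and the loopback bind address are placeholders for illustration, not the server's actual code.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class UIHandler(BaseHTTPRequestHandler):
    """Placeholder handler standing in for the UI and JSON API."""

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep the sketch quiet


def agent_listener(listener, on_connection):
    """Accept agent connections until the listening socket is closed."""
    while True:
        try:
            conn, addr = listener.accept()
        except OSError:
            return
        on_connection(conn, addr)


def start(http_port=0, agent_port=0, on_connection=lambda c, a: c.close()):
    """Start both servers in daemon threads; port 0 picks a free port."""
    httpd = ThreadingHTTPServer(("127.0.0.1", http_port), UIHandler)
    listener = socket.create_server(("127.0.0.1", agent_port))
    threading.Thread(target=httpd.serve_forever, daemon=True).start()
    threading.Thread(target=agent_listener,
                     args=(listener, on_connection), daemon=True).start()
    return httpd, listener
```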

Parser

parser.py turns perf script output into:

The parser is defensive on purpose. The perf script format drifts across kernel versions; the optional [cpu], pid/tid, and flags fields appear in different combinations on 2.6, 3.x, 4.x, 5.x, and 6.x. The parser handles all of them and silently tolerates lines it doesn't recognize.
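To make "defensive" concrete, here is a minimal sketch of a tolerant parser. It is not parser.py: the regular expressions, the sample dictionary layout, and the skipped-line counter are assumptions for this example, and only the optional [cpu] and pid/tid fields are handled (the flags field is left out).

```python
import re

# Sample header, e.g. "perf 12345 [001] 1234.567890:   1 cycles:"
HEADER = re.compile(
    r"^\s*(?P<comm>\S.*?)\s+"              # command name, may contain spaces
    r"(?P<pid>\d+)(?:/(?P<tid>\d+))?\s+"   # pid, or pid/tid
    r"(?:\[(?P<cpu>\d+)\]\s+)?"            # optional [cpu]
    r"(?P<time>\d+\.\d+):\s+"              # timestamp
    r"(?:(?P<period>\d+)\s+)?"             # optional period
    r"(?P<event>\S+?):(?:\s|$)"            # event name, may contain ':'
)
# Stack frame, e.g. "\tffffffff810a1b2c native_write_msr+0xc ([kernel.kallsyms])"
FRAME = re.compile(r"^\s+(?P<addr>[0-9a-fA-F]+)\s+(?P<sym>.*\S)\s*$")


def parse_perf_script(text):
    """Return (samples, skipped); unrecognized lines are counted, not fatal."""
    samples, skipped = [], 0
    current = None
    for line in text.splitlines():
        if not line.strip():
            current = None              # a blank line ends the current sample
            continue
        header = HEADER.match(line)     # try the header first: a comm such as
        if header:                      # "dd" would also look like a hex address
            current = header.groupdict()
            current["stack"] = []
            samples.append(current)
            continue
        frame = FRAME.match(line)
        if frame and current is not None:
            current["stack"].append((frame.group("addr"), frame.group("sym")))
            continue
        skipped += 1                    # tolerate anything else
    return samples, skipped
```

Making every optional field an optional group, rather than branching on kernel version, is one way to absorb format drift without a version table.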

Source mapper

source_mapper.py pipelines sample addresses through addr2line -f (or -fi when --inline is on) in batches of 500. A single mapper is created at server startup and shared across requests — no per-request forking.
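The batching idea can be sketched as follows. This is a simplified stand-in rather than source_mapper.py itself: it starts one addr2line process per batch, which may differ from how the shared mapper keeps its pipeline alive, and it covers only plain -f output, where each address yields exactly two lines (function name, then file:line). With -i an address can yield several pairs, which this sketch does not handle. The function names are hypothetical.

```python
import subprocess

BATCH_SIZE = 500  # batch size stated in the text above


def batches(addresses, size=BATCH_SIZE):
    """Yield consecutive slices of at most `size` addresses."""
    for start in range(0, len(addresses), size):
        yield addresses[start:start + size]


def parse_addr2line(addresses, output):
    """Pair each address with its (function, file:line) lines from -f output."""
    lines = output.splitlines()
    return {
        addr: (lines[2 * i], lines[2 * i + 1])
        for i, addr in enumerate(addresses)
    }


def resolve(binary, addresses, addr2line="addr2line"):
    """Resolve addresses against `binary`, feeding each batch on stdin."""
    resolved = {}
    for batch in batches(addresses):
        out = subprocess.run(
            [addr2line, "-f", "-e", binary],
            input="\n".join(batch) + "\n",
            capture_output=True, text=True, check=True,
        ).stdout
        resolved.update(parse_addr2line(batch, out))
    return resolved
```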

Lookups feed a heat map: each source line gets a sample count, and the UI colors it red → amber → green by its share of the file's total. A single --toolchain-prefix flag derives the matching addr2line and readelf for a cross-compiled target in one step. --sysroot resolves shared-library module paths and source files under a target tree, similar to perf --symfs.
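The heat-map computation amounts to counting samples per line and bucketing each line by its share of the file's total. In the sketch below the 10% and 2% thresholds are invented for illustration; the text does not state where the UI draws its color boundaries.

```python
from collections import Counter


def line_heat(sample_lines, hot=0.10, warm=0.02):
    """Map line number -> (count, share, color) for one source file.

    `sample_lines` holds one line number per resolved sample.
    The `hot` and `warm` thresholds are assumptions for this sketch.
    """
    counts = Counter(sample_lines)
    total = sum(counts.values())
    heat = {}
    for line_no, count in counts.items():
        share = count / total
        if share >= hot:
            color = "red"
        elif share >= warm:
            color = "amber"
        else:
            color = "green"
        heat[line_no] = (count, share, color)
    return heat
```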

Sessions

Every agent connection becomes a session. Raw chunks are written to disk under sessions/<timestamp>_<agent>/ as the agent streams them; metadata (events, sample totals, perf stat values, platform) is written when the connection closes.

Replay is lazy: the UI hits /api/sessions/<id> only when you click a session, and the server re-parses the raw chunks on demand. A standalone perf.data file can also be imported at startup with --import — the server runs perf script against it once and exposes the result as a session.

Wire protocol

Every message is a 5-byte header followed by a payload of exactly LEN bytes.

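[Figure: wire protocol frame layout: 4-byte length + 1-byte flag + payload]

```python
import struct
# 5-byte header: big-endian uint32 payload length, then a uint8 flag
header = struct.pack('!IB', len(payload), compression_flag)
sock.sendall(header + payload)
```

| Field | Size | Meaning |
| --- | --- | --- |
| LEN | 4 bytes (uint32, big-endian) | Payload length in bytes |
| FLAG | 1 byte (uint8) | See flag table below |
| PAYLOAD | LEN bytes | Frame body (perf data, JSON, etc.) |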
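
On the receiving side, the header layout above implies two exact-length reads per frame: 5 bytes for the header, then LEN bytes for the payload. A minimal sketch, with hypothetical helper names `recv_exact` and `read_frame`:

```python
import socket
import struct

HEADER = struct.Struct("!IB")  # uint32 length + uint8 flag, network byte order


def recv_exact(sock, n):
    """Read exactly n bytes; recv() may return fewer than requested."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection mid-frame")
        buf.extend(chunk)
    return bytes(buf)


def read_frame(sock):
    """Return (flag, payload) for the next frame on the socket."""
    length, flag = HEADER.unpack(recv_exact(sock, HEADER.size))
    return flag, recv_exact(sock, length)
```

A production reader would probably also cap `length` before allocating; the text does not say whether the server enforces such a limit.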

Flag values

| Flag | Direction | Payload |
| --- | --- | --- |
| 0 | agent → server | Raw UTF-8 perf script output |
| 1 | agent → server | Zstd-compressed perf script output |
| 2 | server → agent | Command request (JSON): start, stop, pause, resume, configure, list_processes, reprobe, … |
| 3 | agent → server | Command response (JSON) |
| 4 | agent → server | Health metrics (JSON, every 2 s) |

Payloads with flag 1 are decompressed with zstd -d -c. Perf data payloads carry plain perf script text, optionally followed by a ### PERF_STAT ### section the parser splits out.
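
Server-side handling of an incoming frame might look like the sketch below: decompress when the flag is 1, split perf text at the marker, and decode JSON for flags 3 and 4. The returned dictionary shape and the function names are assumptions; the `zstd -d -c` invocation is the one named above, reading from stdin and writing to stdout.

```python
import json
import subprocess

MARKER = "### PERF_STAT ###"


def zstd_decompress(data):
    """Pipe bytes through the zstd CLI (stdin -> stdout)."""
    return subprocess.run(
        ["zstd", "-d", "-c"], input=data, capture_output=True, check=True
    ).stdout


def split_perf_payload(text):
    """Return (script_text, stat_text); stat_text is None without a marker."""
    script, sep, stat = text.partition(MARKER)
    return script, (stat.lstrip("\n") if sep else None)


def decode(flag, payload, decompress=zstd_decompress):
    """Decode one agent -> server frame into a small dict (assumed shape)."""
    if flag == 1:
        payload, flag = decompress(payload), 0
    if flag == 0:
        text = payload.decode("utf-8", errors="replace")
        script, stat = split_perf_payload(text)
        return {"kind": "perf", "script": script, "stat": stat}
    if flag in (3, 4):
        kind = "response" if flag == 3 else "health"
        return {"kind": kind, "body": json.loads(payload)}
    raise ValueError("unexpected flag from agent: %d" % flag)
```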

Two connection modes

The agent can be the connector or the listener — the wire protocol is the same either way.

Broadcast to the browser

The server pushes parsed state to the UI through a Server-Sent Events stream at /api/stream. It emits four event types: status, event_types, per_event, and perf_stat. The browser holds the connection open and re-renders whenever a new round lands. There is no polling.
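
On the wire, a Server-Sent Events message is a few text lines ending in a blank line. The sketch below formats one message for the four event types listed above; the helper name and the JSON payload shape are assumptions, while the `event:` / `data:` framing follows the SSE format itself.

```python
import json

EVENT_TYPES = ("status", "event_types", "per_event", "perf_stat")


def format_sse(event, data):
    """Encode one SSE message: an event line, a data line, a blank line."""
    if event not in EVENT_TYPES:
        raise ValueError("unknown event type: %r" % (event,))
    # json.dumps without indent never emits raw newlines, so the
    # payload always fits on a single data: line.
    body = json.dumps(data, separators=(",", ":"))
    return ("event: %s\ndata: %s\n\n" % (event, body)).encode("utf-8")
```

In the browser, an `EventSource` pointed at /api/stream would receive each of these as a named event.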

Per-thread analysis, source views, and exports are pull-based: the UI hits /api/thread-view, /api/source, or the /api/export/… endpoints on demand — see the Reference for the full list.

Known limits