Advanced Page Cache observability and troubleshooting tools #
Let’s touch on some advanced tools we can use to perform low-level kernel tracing and debugging.
eBPF tools #
First of all, we can use eBPF tools. bcc (https://github.com/iovisor/bcc) and bpftrace are your friends when you want to get some internal kernel information.
Let’s take a look at some of the tools that come with them.
Writeback monitor #
$ sudo bpftrace ./writeback.bt
Attaching 4 probes...
Tracing writeback... Hit Ctrl-C to end.
TIME DEVICE PAGES REASON ms
15:01:48 btrfs-1 7355 periodic 0.003
15:01:49 btrfs-1 7355 periodic 0.003
15:01:51 btrfs-1 7355 periodic 0.006
15:01:54 btrfs-1 7355 periodic 0.005
15:01:54 btrfs-1 7355 periodic 0.004
15:01:56 btrfs-1 7355 periodic 0.005
Page Cache Top #
19:49:52 Buffers MB: 0 / Cached MB: 610 / Sort: HITS / Order: descending
PID UID CMD HITS MISSES DIRTIES READ_HIT% WRITE_HIT%
66229 vagrant vmtouch 44745 44032 0 50.4% 49.6%
66229 vagrant bash 205 0 0 100.0% 0.0%
66227 root cachetop 17 0 0 100.0% 0.0%
222 dbus dbus-daemon 16 0 0 100.0% 0.0%
317 vagrant tmux: server 4 0 0 100.0% 0.0%
Cache stat #
[vagrant@archlinux tools]$ sudo ./cachestat
HITS MISSES DIRTIES HITRATIO BUFFERS_MB CACHED_MB
10 0 0 100.00% 0 610
4 0 0 100.00% 0 610
4 0 0 100.00% 0 610
21 0 0 100.00% 0 610
624 0 0 100.00% 0 438
2 0 0 100.00% 0 438
4 0 0 100.00% 0 438
0 0 0 0.00% 0 438
19 0 0 100.00% 0 438
0 428 0 0.00% 0 546
28144 16384 0 63.21% 0 610
0 0 0 0.00% 0 610
0 0 0 0.00% 0 610
17 0 0 100.00% 0 610
0 0 0 0.00% 0 610
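The HITRATIO column above can be checked by hand: it matches hits divided by the sum of hits and misses, with 0.00% reported when there was no activity at all. A minimal sketch of that arithmetic follows; the function name hit_ratio is made up for illustration and is not part of cachestat.

```shell
# Recompute a cachestat-style hit ratio from the HITS and MISSES columns.
# hit_ratio is a hypothetical helper, not something shipped with bcc.
hit_ratio() {
    awk -v h="$1" -v m="$2" 'BEGIN {
        total = h + m
        # With no hits and no misses, report 0.00% (as in the output above)
        # instead of dividing by zero.
        printf "%.2f%%\n", (total > 0) ? 100 * h / total : 0
    }'
}
```

For the busiest interval in the output, hit_ratio 28144 16384 prints 63.21%, the same value cachestat reported.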
bpftrace and kfunc trace #
Other than that, eBPF and bpftrace have recently gained a great new feature named kfunc. With it, you can trace some kernel functions without installing kernel debugging information.
It’s still close to experimental functionality, but it looks really promising.
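As a rough sketch of what a kfunc probe can look like, the helpers below build and run a bpftrace one-liner that counts calls to a kernel function per process. The helper names are hypothetical, and vfs_read in the usage note is only an example function. Whether a given function is available as a kfunc probe depends on your kernel and bpftrace build, so list the probes first.

```shell
# Build a bpftrace program that counts calls to one kernel function,
# grouped by process name. The helper names here are illustrative only.
kfunc_count_prog() {
    printf 'kfunc:%s { @[comm] = count(); }\n' "$1"
}

# List the kfunc probes matching a pattern on this kernel (needs root).
list_kfunc_probes() {
    sudo bpftrace -l "kfunc:*$1*"
}

# Attach the counting program; press Ctrl-C to print the per-process counts.
trace_kfunc() {
    sudo bpftrace -e "$(kfunc_count_prog "$1")"
}
```

For example, list_kfunc_probes vfs_read shows whether the probe exists, and trace_kfunc vfs_read then attaches to it.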
Perf tool #
But if you want to go deeper, I have something for you. perf allows you to set up dynamic tracing probes on almost any kernel function. The only catch is that the kernel debug information must be installed. Unfortunately, not all distributions provide it, and sometimes you will need to recompile the kernel manually with some additional flags.
But once you have the debug info, you can perform really crazy investigations. For example, if we want to track major page faults, we can find the kernel function in charge (the source browser at https://elixir.bootlin.com/linux/latest/source and its search can help) and set up a probe:
perf probe -f "do_read_fault vma->vm_file->f_inode->i_ino"
where do_read_fault is our kernel function and vma->vm_file->f_inode->i_ino is the inode number of the file where the major page fault occurs.
Now you can start recording events:
perf record -e probe:do_read_fault -ag -- sleep 10
After 10 seconds, we can grep out the inodes with perf script and some bash magic:
perf script | grep i_ino | cut -d ' ' -f 1,8| sed 's#i_ino=##g' | sort | uniq -c | sort -rn
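The pipeline above depends on the i_ino value sitting in a fixed column, which can shift between perf versions. A hedged alternative is to match the i_ino= token wherever it appears on the line. The sketch below assumes each probe event line starts with the command name and contains one i_ino=<value> token; count_inodes is a hypothetical helper name.

```shell
# Count probe hits per (command, inode) pair from `perf script` text on stdin.
# Assumption: every probe event line begins with the command name and carries
# a token of the form i_ino=<value>; stack-trace lines have no such token.
count_inodes() {
    awk '{
        for (i = 1; i <= NF; i++)
            if ($i ~ /^i_ino=/) {
                sub(/^i_ino=/, "", $i)   # keep only the inode value
                print $1, $i             # command name and inode
            }
    }' | sort | uniq -c | sort -rn
}
```

It would be used as perf script | count_inodes. Like the original pipeline, it takes the first field as the command name, so commands with spaces in their names are truncated.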