Advanced Page Cache observability and troubleshooting tools

Let’s touch on some advanced tools we can use to perform low-level kernel tracing and debugging.

eBPF tools #

First of all, we can use eBPF tools. bcc (https://github.com/iovisor/bcc) and bpftrace are your friends when you want to get internal kernel information.

Let’s take a look at some of the tools that come with them.

Writeback monitor #

$ sudo bpftrace ./writeback.bt

Attaching 4 probes...
Tracing writeback... Hit Ctrl-C to end.
TIME      DEVICE   PAGES    REASON           ms
15:01:48  btrfs-1  7355     periodic         0.003
15:01:49  btrfs-1  7355     periodic         0.003
15:01:51  btrfs-1  7355     periodic         0.006
15:01:54  btrfs-1  7355     periodic         0.005
15:01:54  btrfs-1  7355     periodic         0.004
15:01:56  btrfs-1  7355     periodic         0.005

Page Cache Top #

19:49:52 Buffers MB: 0 / Cached MB: 610 / Sort: HITS / Order: descending  
PID      UID      CMD              HITS     MISSES   DIRTIES  READ_HIT%  WRITE_HIT%  
   66229 vagrant  vmtouch             44745    44032        0      50.4%      49.6%  
   66229 vagrant  bash                  205        0        0     100.0%       0.0%  
   66227 root     cachetop               17        0        0     100.0%       0.0%  
     222 dbus     dbus-daemon            16        0        0     100.0%       0.0%  
     317 vagrant  tmux: server            4        0        0     100.0%       0.0%

Cache stat #

[vagrant@archlinux tools]$ sudo ./cachestat  
    HITS   MISSES  DIRTIES HITRATIO   BUFFERS_MB  CACHED_MB  
      10        0        0  100.00%            0        610  
       4        0        0  100.00%            0        610  
       4        0        0  100.00%            0        610  
      21        0        0  100.00%            0        610  
     624        0        0  100.00%            0        438  
       2        0        0  100.00%            0        438  
       4        0        0  100.00%            0        438  
       0        0        0    0.00%            0        438  
      19        0        0  100.00%            0        438  
       0      428        0    0.00%            0        546  
   28144    16384        0   63.21%            0        610  
       0        0        0    0.00%            0        610  
       0        0        0    0.00%            0        610  
      17        0        0  100.00%            0        610  
       0        0        0    0.00%            0        610
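
The HITRATIO column appears to be simply hits divided by total lookups (hits plus misses). As a quick sanity check, here is a minimal sketch that recomputes it for two rows of the output above. Note that hit_ratio is a hypothetical helper written for this illustration, not part of cachestat:

```shell
#!/bin/sh
# hit_ratio HITS MISSES
# Hypothetical helper: recompute the hit ratio from two counters.
# Prints 0.00% when there were no lookups at all, matching the
# all-zero rows in the cachestat output.
hit_ratio() {
    awk -v h="$1" -v m="$2" 'BEGIN {
        total = h + m
        if (total == 0)
            printf "0.00%%\n"
        else
            printf "%.2f%%\n", 100 * h / total
    }'
}

hit_ratio 28144 16384   # prints 63.21%
hit_ratio 0 428         # prints 0.00%
```

The first call reproduces the 63.21% row, which is the interval where roughly a third of the lookups missed the Page Cache.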

bpftrace and kfunc trace #

Beyond these ready-made tools, eBPF and bpftrace have recently gained a great new feature called kfunc. With it, you can trace kernel functions without having kernel debugging information installed.

It is still close to experimental, but it looks really promising.
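
If you want to try kfunc probes, a reasonable first step is to check whether your kernel exposes BTF type information, which kfunc relies on. The sketch below assumes the usual location /sys/kernel/btf/vmlinux; btf_available is a hypothetical helper, and the bpftrace commands in the comments are only examples whose function names may differ on your kernel:

```shell
#!/bin/sh
# Assumption: kfunc probes need kernel BTF, normally exposed at this path.
BTF_PATH="${BTF_PATH:-/sys/kernel/btf/vmlinux}"

# btf_available [PATH]
# Hypothetical helper: succeeds when the given (or default) BTF file is readable.
btf_available() {
    [ -r "${1:-$BTF_PATH}" ]
}

if btf_available; then
    echo "BTF found at $BTF_PATH; kfunc probes should be usable"
else
    echo "No BTF at $BTF_PATH; kfunc probes will likely fail to attach"
fi

# With BTF present and bpftrace installed, something like the following
# (run as root) lists matching kfunc probes and counts calls per process:
#   bpftrace -l 'kfunc:*writeback*'
#   bpftrace -e 'kfunc:vfs_read { @[comm] = count(); }'
```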

Perf tool #

But if you want to go deeper, I have something for you. perf lets you set up dynamic tracing probes on almost any kernel function. The only catch is that the kernel debug information must be installed. Unfortunately, not all distributions provide it, and sometimes you will need to recompile the kernel yourself with some additional flags.

But once you have the debug info, you can perform really crazy investigations. For example, if we want to track major page faults, we can find the kernel function in charge (the source browser at https://elixir.bootlin.com/linux/latest/source and its search can help) and set up a probe:

perf probe -f "do_read_fault vma->vm_file->f_inode->i_ino"

where do_read_fault is our kernel function and vma->vm_file->f_inode->i_ino is the inode number of the file where the major page fault occurs.

Now you can start recording events:

perf record -e probe:do_read_fault -ag -- sleep 10

And after 10 seconds, we can grep out the inodes with perf script and bash magic:

perf script | grep i_ino | cut -d ' ' -f 1,8 | sed 's#i_ino=##g' | sort | uniq -c | sort -rn
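
The cut in the one-liner above depends on the exact column layout of perf script, which can shift between perf versions. Here is a hedged alternative sketch that looks for the i_ino= field wherever it appears, plus a way to turn an inode number back into a path. Both count_inodes and inode_to_path are hypothetical helpers introduced for this example. The sketch also assumes the command name contains no spaces and that the inode may be printed either in decimal or as 0x-prefixed hex:

```shell
#!/bin/sh
# count_inodes: read perf script output on stdin and print
# "COUNT COMMAND INODE" lines, most frequent first.
count_inodes() {
    awk '{
        for (i = 1; i <= NF; i++)
            if ($i ~ /^i_ino=/) {
                sub(/^i_ino=/, "", $i)   # keep only the inode value
                print $1, $i             # $1 is the command name
            }
    }' | sort | uniq -c | sort -rn
}

# inode_to_path DIR INODE
# Search one filesystem, starting at DIR, for files with the given inode.
# $(( ... )) converts a 0x-prefixed hex value to decimal for find.
inode_to_path() {
    find "$1" -xdev -inum "$(( $2 ))" 2>/dev/null
}
```

You could then run perf script | count_inodes, pick the busiest inode and resolve it with inode_to_path / INODE (as root, and with the mount point of the right filesystem instead of / if needed, because inode numbers are only unique within one filesystem).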