Direct IO

Direct IO (DIO) (NOT READY) #

As usual, every rule has an exception, and Page Cache is no different. So let’s talk about file reads and writes that bypass the Page Cache.

Why it’s good #

Some applications require low-level access to the storage subsystem, and the Linux kernel provides it through the O_DIRECT file open flag. This kind of IO is called Direct IO, or DIO. A program that opens a file with this flag bypasses the kernel Page Cache completely: its requests still go through the VFS and the underlying filesystem, but the data moves directly between the user-space buffer and the storage device.
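
As a minimal sketch of what opening a file with this flag looks like from Go, assuming a Linux build target (syscall.O_DIRECT is not defined on other platforms). The helper name openDirect and the demo path are hypothetical and exist only for illustration:

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// openDirect opens path read-only with O_DIRECT, so reads on the returned
// file bypass the Page Cache. Hypothetical helper, Linux only.
func openDirect(path string) (*os.File, error) {
	return os.OpenFile(path, os.O_RDONLY|syscall.O_DIRECT, 0)
}

func main() {
	// /var/tmp/dio-demo.bin is a made-up path used only for this sketch.
	f, err := openDirect("/var/tmp/dio-demo.bin")
	if err != nil {
		// Typical failures: the file does not exist, or the filesystem
		// rejects O_DIRECT with EINVAL.
		fmt.Println("open with O_DIRECT failed:", err)
		return
	}
	defer f.Close()
	fmt.Println("opened with O_DIRECT:", f.Name())
}
```

Opening the file is the easy part. Every subsequent read or write on it has to respect the alignment rules of the device and the filesystem.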

The pros are:

  • Lower CPU usage, and thus potentially higher throughput;
  • Linux native Async IO (io_submit(2), not the POSIX AIO described in man 7 aio) is truly asynchronous only with DIO;
  • Zero-copy: no double buffering between Page Cache and user-space buffers;
  • More control over the writeback.

Why it’s bad and io_uring alternative #

  • Reads and writes must be aligned to the block size (buffer address, file offset and length);
  • Not all file systems implement DIO in the same way;
  • DIO without Linux AIO is slow and rarely useful;
  • It is not cross-platform;
  • DIO and buffered IO should not be mixed on the same file at the same time.
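
The alignment requirement is the one that bites first in practice. Here is a rough Go sketch of how a program might obtain an aligned buffer and round sizes to block boundaries. The 4096-byte block size is an assumption (real code should query the device or filesystem), and the helpers alignedBuffer, isAligned, roundUp and roundDown are hypothetical names:

```go
package main

import (
	"fmt"
	"unsafe"
)

// blockSize is an assumed value; it is not queried from the device.
const blockSize = 4096

// alignedBuffer returns a slice of length size whose first byte sits at an
// address that is a multiple of align. align must be a power of two.
func alignedBuffer(size, align int) []byte {
	raw := make([]byte, size+align) // over-allocate, then slide forward
	off := 0
	if rem := int(uintptr(unsafe.Pointer(&raw[0])) & uintptr(align-1)); rem != 0 {
		off = align - rem
	}
	return raw[off : off+size : off+size]
}

// isAligned reports whether buf starts at a multiple of align.
func isAligned(buf []byte, align int) bool {
	if len(buf) == 0 {
		return true
	}
	return uintptr(unsafe.Pointer(&buf[0]))%uintptr(align) == 0
}

// roundUp rounds n up to the next multiple of align (a power of two).
func roundUp(n, align int) int { return (n + align - 1) &^ (align - 1) }

// roundDown rounds n down to the previous multiple of align (a power of two).
func roundDown(n, align int) int { return n &^ (align - 1) }

func main() {
	buf := alignedBuffer(2*blockSize, blockSize)
	fmt.Println("buffer aligned:", isAligned(buf, blockSize))
	fmt.Println("5000 bytes need a transfer of:", roundUp(5000, blockSize))
	fmt.Println("offset 5000 starts in the block at:", roundDown(5000, blockSize))
}
```

With helpers like these, a read of an arbitrary byte range becomes a read of the enclosing aligned range into an aligned buffer, followed by slicing out the bytes that were actually requested.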

DIO usually makes no sense without AIO, but AIO has a lot of bad design decisions:

So I think this is ridiculously ugly.

AIO is a horrible ad-hoc design, with the main excuse being “other, less gifted people, made that design, and we are implementing it for compatibility because database people - who seldom have any shred of taste - actually use it”.

But AIO was always really really ugly.

                                                Linus Torvalds
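Heads-up! With DIO you still need to call fsync() on the file: O_DIRECT alone does not guarantee that the data and its metadata have reached stable storage.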
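
To make the fsync() point concrete, here is a hedged Go sketch that writes one block with O_DIRECT and then calls Sync() (which issues fsync(2)) before closing. It assumes Linux and a 4096-byte block size. The helpers alignedBlock, writeDurably and demo are hypothetical, and the fallback to buffered IO on EINVAL is a choice of this sketch so it still runs on filesystems that reject O_DIRECT:

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
	"syscall"
	"unsafe"
)

const blockSize = 4096 // assumed block size, not queried from the device

// alignedBlock returns one zeroed block starting at a blockSize-aligned address.
func alignedBlock() []byte {
	raw := make([]byte, 2*blockSize)
	off := 0
	if rem := int(uintptr(unsafe.Pointer(&raw[0])) & (blockSize - 1)); rem != 0 {
		off = blockSize - rem
	}
	return raw[off : off+blockSize]
}

// writeDurably writes payload, zero-padded to one block, and then fsyncs.
// It reports whether O_DIRECT was actually used.
func writeDurably(path string, payload []byte) (bool, error) {
	if len(payload) > blockSize {
		return false, errors.New("payload does not fit in one block")
	}
	flags := os.O_WRONLY | os.O_CREATE | os.O_TRUNC
	direct := true
	f, err := os.OpenFile(path, flags|syscall.O_DIRECT, 0o644)
	if errors.Is(err, syscall.EINVAL) {
		// The filesystem rejects O_DIRECT: fall back to buffered IO.
		direct = false
		f, err = os.OpenFile(path, flags, 0o644)
	}
	if err != nil {
		return false, err
	}
	defer f.Close()

	buf := alignedBlock()
	copy(buf, payload) // the rest of the block stays zero
	if _, err := f.Write(buf); err != nil {
		return direct, err
	}
	// The write skipped the Page Cache, but metadata and the device's
	// volatile write cache may still be unflushed: fsync covers both.
	return direct, f.Sync()
}

// demo writes one block into a temporary directory and returns the file size.
func demo() (int64, error) {
	dir, err := os.MkdirTemp("", "dio-demo")
	if err != nil {
		return 0, err
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "data.bin")
	if _, err := writeDurably(path, []byte("hello")); err != nil {
		return 0, err
	}
	st, err := os.Stat(path)
	if err != nil {
		return 0, err
	}
	return st.Size(), nil
}

func main() {
	size, err := demo()
	if err != nil {
		fmt.Println("demo failed:", err)
		return
	}
	fmt.Println("bytes on disk:", size)
}
```

Note that the file ends up one full block long even though the payload is only a few bytes: padding to the block size is the price of the alignment rules.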

Let’s write an example with Go and the iouring-go library:

TODO