What every SRE should know about GNU/Linux resolvers and Dual-Stack applications
In this series of posts, I’d like to take a deep dive into the GNU/Linux local facilities used to convert a domain name or hostname into IP addresses, specifically in the context of dual-stack applications. This resolution process is one of the oldest forms of networking abstraction, designed to replace hard-to-remember network addresses with human-readable strings. Although it may seem simple at first glance, the whole process involving stub resolvers is full of complexities and subtle nuances. One contributing factor is the growing adoption of IPv6: although it is not spreading as fast as many would like, it is gradually pushing servers and clients to become dual-stack. A transition to IPv6 therefore has to be seamless, happening without degrading the user experience or increasing response latency.
We will start with a brief history of resolvers, exploring how they evolved, the problems that getaddrinfo() aims to solve, and what happens under the hood: how it interacts with the Name Service Switch (NSS), how results get cached, and how it helps build applications suited to a dual-stack world with both IPv4 and IPv6 address families. This address-agnostic abstraction is essential to modern software development, and a sloppy implementation can lead to subtle bugs that are hard to debug in production. That’s why we will cover dual-stack applications thoroughly from both the client and the server perspective, looking at the order in which a client tries the available destination addresses from a mixed list of IPv4 and IPv6 addresses, and exploring algorithms (such as Happy Eyeballs) that improve response latency when network routing is unstable or misconfigured.
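To make the address-agnostic idea concrete, here is a minimal Python sketch (on Linux, Python’s `socket.getaddrinfo()` is a thin wrapper over the C library call). It asks for both families with `AF_UNSPEC` and then reorders the result so that families alternate, a simplified version of the interleaving step described in RFC 8305 (Happy Eyeballs v2). The helper names `resolve()` and `interleave_families()` are hypothetical, and a real Happy Eyeballs client also races connection attempts with staggered timers, which this sketch leaves out.

```python
import socket
from itertools import zip_longest


def resolve(host: str, port: int, flags: int = 0) -> list[tuple[int, str]]:
    """Return (family, address) pairs in the order getaddrinfo() produced them.

    AF_UNSPEC asks for both IPv4 (A) and IPv6 (AAAA) results, so the caller
    never hard-codes an address family.
    """
    infos = socket.getaddrinfo(host, port, socket.AF_UNSPEC,
                               socket.SOCK_STREAM, 0, flags)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0]
    # is the textual address for both AF_INET and AF_INET6.
    return [(family, sockaddr[0]) for family, _, _, _, sockaddr in infos]


def interleave_families(addrs: list[tuple[int, str]]) -> list[tuple[int, str]]:
    """Alternate address families, starting with the family of the first entry.

    Simplified take on RFC 8305 address interleaving (a "first address family
    count" of 1); the relative order within each family is preserved.
    """
    if not addrs:
        return []
    first_family = addrs[0][0]
    primary = [a for a in addrs if a[0] == first_family]
    secondary = [a for a in addrs if a[0] != first_family]
    result = []
    for p, s in zip_longest(primary, secondary):
        if p is not None:
            result.append(p)
        if s is not None:
            result.append(s)
    return result
```

For example, a list returned as two IPv6 addresses followed by two IPv4 addresses becomes IPv6, IPv4, IPv6, IPv4, so a client trying them sequentially reaches an IPv4 address after at most one failed IPv6 attempt.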
We will also examine c-ares, the most feature-rich alternative C resolver library, discussing its potential advantages and why you might consider using it. However, our discussion will not be limited to C stub resolvers; we will also cover mainstream languages such as Python, Go (Golang), Rust, Java, and Node.js, focusing on their internals, design decisions, and trade-offs.
Another important topic is how to configure and manage /etc/resolv.conf on modern GNU/Linux systems. At first glance, managing /etc/resolv.conf might seem straightforward: simply add a nameserver and a search domain. But when a system has multiple physical interfaces (e.g., LAN and Wi-Fi) and several virtual ones such as VPN tunnels, all configured by DHCP clients, the situation becomes more complex. Each DHCP server might provide its own nameservers and search domain, so some logic is needed to coordinate and reconcile these changes. Many modern GNU/Linux distributions, Ubuntu among them, use systemd-resolved to address this issue, and we will explore its capabilities.
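The file format itself is simple, which is exactly why reconciling several DHCP sources is hard: there is one flat list of nameservers and one search list. The sketch below is a simplified, hypothetical parser (`parse_resolv_conf()` and `ResolvConf` are illustrative names) that follows a few rules documented in resolv.conf(5): lines with `#` or `;` in the first column are comments, only the first three (`MAXNS`) nameserver entries are used, and `domain` and `search` are mutually exclusive, with the last one winning. It is not a full reimplementation of glibc’s parser.

```python
from dataclasses import dataclass, field
from typing import Optional

# glibc honours at most MAXNS (currently 3) nameserver lines; see resolv.conf(5).
MAXNS = 3


@dataclass
class ResolvConf:
    """Hypothetical container for the fields this sketch extracts."""
    nameservers: list[str] = field(default_factory=list)
    search: list[str] = field(default_factory=list)
    options: dict[str, Optional[str]] = field(default_factory=dict)


def parse_resolv_conf(text: str) -> ResolvConf:
    """Simplified resolv.conf parser; not a reimplementation of glibc."""
    conf = ResolvConf()
    for line in text.splitlines():
        # Skip blank lines and lines with '#' or ';' in the first column.
        if not line.strip() or line[0] in "#;":
            continue
        keyword, *args = line.split()
        if keyword == "nameserver" and args:
            # Extra nameserver lines beyond MAXNS are ignored.
            if len(conf.nameservers) < MAXNS:
                conf.nameservers.append(args[0])
        elif keyword == "search":
            # 'search' and 'domain' are mutually exclusive; the last one wins.
            conf.search = list(args)
        elif keyword == "domain" and args:
            conf.search = [args[0]]
        elif keyword == "options":
            for opt in args:
                # Options are either flags ("edns0") or name:value ("ndots:2").
                name, sep, value = opt.partition(":")
                conf.options[name] = value if sep else None
    return conf
```

On a system where systemd-resolved manages the file, /etc/resolv.conf is typically a symlink to its stub file, and parsing it usually yields a single nameserver, `127.0.0.53`, no matter how many interfaces supplied DNS servers.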
As usual, we will also touch on topics related to dual-stack programs, such as IPv4-mapped addresses, different ways to bind sockets for dual-stack servers, and how systemd can help manage listening sockets.
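One common way to build a dual-stack server is a single `AF_INET6` socket bound to `::` with `IPV6_V6ONLY` cleared; IPv4 clients then show up as IPv4-mapped addresses such as `::ffff:192.0.2.10`. The Python sketch below (the function names are hypothetical) sets the option explicitly rather than relying on the `net.ipv6.bindv6only` sysctl, and converts mapped peer addresses back to plain IPv4, for example for logging. The alternative is two separate sockets, an `AF_INET` one and an `AF_INET6` one with `IPV6_V6ONLY` set; Python also provides `socket.create_server(..., family=socket.AF_INET6, dualstack_ipv6=True)` as a shortcut for the single-socket approach.

```python
import ipaddress
import socket


def dual_stack_listener(port: int = 0, backlog: int = 128) -> socket.socket:
    """Listen on all IPv6 and IPv4 addresses with a single AF_INET6 socket."""
    sock = socket.socket(socket.AF_INET6, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    # Clear IPV6_V6ONLY explicitly instead of relying on the
    # net.ipv6.bindv6only sysctl (which defaults to 0 on Linux).
    sock.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_V6ONLY, 0)
    sock.bind(("::", port))
    sock.listen(backlog)
    return sock


def unmap_peer(host: str) -> str:
    """Turn an IPv4-mapped IPv6 address (::ffff:a.b.c.d) back into IPv4."""
    addr = ipaddress.ip_address(host)
    if isinstance(addr, ipaddress.IPv6Address) and addr.ipv4_mapped is not None:
        return str(addr.ipv4_mapped)
    # Native IPv6 and plain IPv4 addresses are returned in canonical form.
    return str(addr)
```

A server would call `conn, peer = listener.accept()` and pass `peer[0]` to `unmap_peer()`, so a connection from 127.0.0.1 is reported as `127.0.0.1` rather than `::ffff:127.0.0.1`.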
Once we have a complete understanding of the resolution process, tools, and solutions, we will examine several popular load balancers: Nginx, Envoy (Envoy Proxy), and HAProxy. They are excellent examples because they are designed to be dual-stack on both sides: toward clients (downstreams) and toward backends (upstreams).
Finally, we’ll review some newer and more advanced topics that are not always directly related to local stub resolvers and dual-stack applications, but are certainly important for domain name resolution and promising for refining the resolution process in various directions: DNS push notifications, the new HTTPS DNS resource record, DNS over TLS (DoT), DNS over HTTPS (DoH), Oblivious DNS (ODNS), and DNSSEC.
But before we kick off, here is some preparatory information.
Setting up the playground
All examples in this series are runnable, real, working code. To follow along and experiment with the code effectively (a great way to learn), you’ll need a setup similar to mine. I use the latest LTS release, the Ubuntu 24.04 cloud image, on macOS, managed with Lima, which runs Linux virtual machines (and containers inside them) on macOS.
For testing domain name resolution, I use “microsoft.com” in all tests because it provides multiple A and AAAA records. Additionally, its DNS servers shuffle the records on every query, which makes it easy to tell whether an answer was served from a cache or not.
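A minimal Python sketch of that check (both helper names are hypothetical): resolve the name twice and compare the results. Keep two caveats in mind. First, glibc’s getaddrinfo() sorts destination addresses according to RFC 6724 (tunable via /etc/gai.conf), so the order you see is not necessarily the order the DNS server sent; a tool like dig shows the raw answer. Second, a random shuffle can repeat the previous order by chance, so treat a single matching result as a hint, not proof.

```python
import socket


def addresses_in_order(host: str) -> list[str]:
    """Return addresses in the order getaddrinfo() handed them back."""
    # Restricting to SOCK_STREAM avoids duplicate entries for UDP and raw sockets.
    infos = socket.getaddrinfo(host, None, socket.AF_UNSPEC, socket.SOCK_STREAM)
    return [sockaddr[0] for *_, sockaddr in infos]


def compare_answers(first: list[str], second: list[str]) -> str:
    """Classify two lookups of the same name (a heuristic, not proof)."""
    if set(first) != set(second):
        return "different records"
    if first != second:
        return "same records, reshuffled"
    return "same records, same order"
```

Usage: `compare_answers(addresses_in_order("microsoft.com"), addresses_in_order("microsoft.com"))`. A result of "same records, same order" on repeated runs suggests a cache in the path, while "reshuffled" suggests the answer came from the shuffling servers.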