6. Dual-Stack applications

IPv4 or IPv6

Let’s now focus on dual-stack programs, which support both IPv4 and IPv6. Here are some critical questions to consider:

For server code:

  • How can we easily listen on all IPv4 and all IPv6 addresses? Do we need separate listeners for each?
  • Are there any tools or helpers available to manage multiple listeners?

For client code:

  • Which address family should our client program resolve and use: A, AAAA, or both?
  • What to do if the resolver returns multiple addresses for each family?
  • Does a machine have active IPv4 and IPv6 connectivity? Is the IPv6 routing configured correctly to the destination?
  • In case of connection errors, which address should be used to reconnect?

Ideally, we want to abstract away from these technical details to focus more on developing our core business logic, such as converting JSONs to protobufs and vice versa. 🙃

To address these questions effectively, we need to take a closer look at getaddrinfo(), its arguments, and its implementation details. Let’s dive in, starting with the server side.

6.1 Dual stack server

Common sense suggests that to support both IPv4 and IPv6, we would need at least two listening sockets: one for each address family. However, just as we use the special IPv4 address 0.0.0.0 (also known as the IPv4 wildcard or INADDR_ANY) to bind to all available IPv4 interfaces, there is a similar solution for IPv6. The IPv6 wildcard address, represented as :: or IN6ADDR_ANY_INIT, works the same way by binding to all interfaces and all IPv6 addresses. But it has one more feature that the IPv4 wildcard address lacks: a listener on the :: address can handle IPv4 connections too, thanks to a special block of IPv6 addresses known as IPv4-mapped addresses.

A good explanation of what IPv4-mapped addresses are can be found in RFC 3493, Basic Socket Interface Extensions for IPv6:

3.7 Compatibility with IPv4 Nodes

The API also provides a different type of compatibility: the ability for IPv6 applications to interoperate with IPv4 applications. This feature uses the IPv4-mapped IPv6 address format defined in the IPv6 addressing architecture specification [2]. This address format allows the IPv4 address of an IPv4 node to be represented as an IPv6 address. The IPv4 address is encoded into the low-order 32 bits of the IPv6 address, and the high-order 96 bits hold the fixed prefix 0:0:0:0:0:FFFF. IPv4-mapped addresses are written as follows:

 ::FFFF:<IPv4-address>

These addresses can be generated automatically by the getaddrinfo() function, as described in Section 6.1. Applications may use AF_INET6 sockets to open TCP connections to IPv4 nodes, or send UDP packets to IPv4 nodes, by simply encoding the destination’s IPv4 address as an IPv4-mapped IPv6 address, and passing that address, within a sockaddr_in6 structure, in the connect() or sendto() call. When applications use AF_INET6 sockets to accept TCP connections from IPv4 nodes, or receive UDP packets from IPv4 nodes, the system returns the peer’s address to the application in the accept(), recvfrom(), or getpeername() call using a sockaddr_in6 structure encoded this way.
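To make the layout concrete, here is a small Go sketch (Go is also used later in this chapter) that builds the mapped form by hand: 80 zero bits, 16 one bits, then the IPv4 address. The helper name mapV4 is made up for this illustration; only the net/netip calls come from the standard library.

```go
package main

import (
	"fmt"
	"net/netip"
)

// mapV4 is a hypothetical helper: it places the 4 bytes of an IPv4 address
// into the low-order 32 bits of a 16-byte IPv6 address and puts the fixed
// 0:0:0:0:0:ffff prefix in front of them.
func mapV4(s string) (netip.Addr, error) {
	v4, err := netip.ParseAddr(s)
	if err != nil {
		return netip.Addr{}, err
	}
	if !v4.Is4() {
		return netip.Addr{}, fmt.Errorf("%s is not an IPv4 address", s)
	}
	var b [16]byte
	b[10], b[11] = 0xff, 0xff // bytes 0..9 stay zero
	src := v4.As4()
	copy(b[12:], src[:])
	return netip.AddrFrom16(b), nil
}

func main() {
	m, err := mapV4("192.168.5.15")
	if err != nil {
		panic(err)
	}
	fmt.Println(m)          // ::ffff:192.168.5.15
	fmt.Println(m.Is4In6()) // true
	fmt.Println(m.Unmap())  // 192.168.5.15
}
```

In practice netip.AddrFrom16(addr.As16()) gives the same result in one line; the manual version is only there to show where each byte goes.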

getaddrinfo() in server code plays several roles:

  • It allows the use of hostnames to bind a socket if the node parameter of getaddrinfo() is set.
  • It prepares the necessary data structures for use in socket API calls.
  • It makes it possible to bind to all available and future addresses with minimal changes to the code.

By leveraging these functionalities, getaddrinfo() simplifies the process of setting up network connections in server applications.

Server code example:

#define _POSIX_C_SOURCE 200112L

#include <arpa/inet.h>
#include <netdb.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

#define PORT "80"
#define BACKLOG 10

void handle_client(int client_fd) {
  const char *response = "HTTP/1.0 200 OK\r\n"
                         "Content-Type: text/html\r\n"
                         "Connection: close\r\n"
                         "\r\n"
                         "<html><body><h1>Hello, World!</h1></body></html>\n";

  send(client_fd, response, strlen(response), 0);
  shutdown(client_fd, SHUT_WR);
  close(client_fd);
}

int main() {
  struct addrinfo hints, *servinfo, *p;
  int sockfd;
  int yes = 1;
  int rv;

  memset(&hints, 0, sizeof hints);
  hints.ai_family = AF_INET6; // <--- ①
  hints.ai_socktype = SOCK_STREAM;
  hints.ai_flags = AI_PASSIVE; // <--- ②

  if ((rv = getaddrinfo(NULL /* ---> ③ <---*/, PORT, &hints, &servinfo)) != 0) {
    fprintf(stderr, "getaddrinfo: %s\n", gai_strerror(rv));
    return 1;
  }

  // ---> ④ <---
  for (p = servinfo; p != NULL; p = p->ai_next) {
    if ((sockfd = socket(p->ai_family, p->ai_socktype, p->ai_protocol)) == -1) {
      perror("server: socket");
      continue;
    }

    if (setsockopt(sockfd, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof(int)) == -1) {
      perror("setsockopt");
      close(sockfd);
      return 1;
    }

    if (bind(sockfd, p->ai_addr, p->ai_addrlen) == -1) {
      close(sockfd);
      perror("server: bind");
      continue;
    }

    break;
  }

  if (p == NULL) {
    fprintf(stderr, "server: failed to bind\n");
    return 2;
  }

  freeaddrinfo(servinfo); // all done with this structure

  if (listen(sockfd, BACKLOG) == -1) {
    perror("listen");
    close(sockfd);
    return 1;
  }

  printf("server: waiting for connections...\n");

  while (1) {
    struct sockaddr_storage client_addr;
    socklen_t addr_size = sizeof client_addr;
    int client_fd = accept(sockfd, (struct sockaddr *)&client_addr, &addr_size);
    if (client_fd == -1) {
      perror("accept");
      continue;
    }

    char name[INET6_ADDRSTRLEN];
    char port[10];
    getnameinfo((struct sockaddr *)&client_addr, addr_size, name, sizeof(name), // <--- ⑤
                port, sizeof(port), NI_NUMERICHOST | NI_NUMERICSERV);
    printf("client %s:%s\n", name, port);

    handle_client(client_fd);
  }

  close(sockfd);
  return 0;
}

① – Set the AF_INET6 family so that a single socket can handle both IPv4 and IPv6 connections.

② – The AI_PASSIVE flag asks getaddrinfo() to return socket addresses suitable for bind()ing a socket that will accept() connections.

③ – With the AI_PASSIVE flag, a NULL node yields the wildcard address: INADDR_ANY for IPv4 or IN6ADDR_ANY_INIT for IPv6.

④ – RFC 6724 (Default Address Selection for Internet Protocol Version 6 (IPv6)) clearly explains how applications should use the returned list of addresses:

Well-behaved applications SHOULD NOT simply use the first address returned from an API such as getaddrinfo() and then give up if it fails. For many applications, it is appropriate to iterate through the list of addresses returned from getaddrinfo() until a working address is found. For other applications, it might be appropriate to try multiple addresses in parallel (e.g., with some small delay in between) and use the first one to succeed.

The last sentence concerns the client programs and will be discussed later.

In our example, we bind() and listen() only on the first address that works. A real server may need to listen on several specific addresses, for example when security concerns rule out a wildcard address. In that case, your code should create multiple listeners and accept connections on all of them simultaneously.

⑤ – getnameinfo() (man 3 getnameinfo) converts the client address to a human-readable format.

Compile and run:

$ gcc ./server.c -o server && ./server

Check that the server listens on the proper address family and port:

$ sudo netstat -ntlp | grep 80
tcp6       0      0 :::80         :::*       LISTEN   423492/./server

Let’s run some requests in a new terminal:

$ curl localhost                                # <--- ①
<html><body><h1>Hello, World!</h1></body></html>

$ curl 192.168.5.15                             # <--- ②
<html><body><h1>Hello, World!</h1></body></html>

$ curl "[2001:db8:123:456:6af2:68fe:ff7c:e25c]" # <--- ③
<html><body><h1>Hello, World!</h1></body></html>

$ curl --interface 127.0.0.1 localhost          # <--- ④
<html><body><h1>Hello, World!</h1></body></html>

$ curl "[fe80::5055:55ff:fe8e:3d07%eth0]"       # <--- ⑤
<html><body><h1>Hello, World!</h1></body></html>

and collect the server output:

server: waiting for connections...
client ::1:41646                                  # <--- ①
client ::ffff:192.168.5.15:52236                  # <--- ②
client 2001:db8:123:456:6af2:68fe:ff7c:e25c:41098 # <--- ③
client ::ffff:127.0.0.1:35800                     # <--- ④
client fe80::5055:55ff:fe8e:3d07%eth0:34328       # <--- ⑤

① – a connection to localhost arrives from the ::1 client address.

② – 192.168.5.15 is the container’s local IPv4 address; the log shows it as an IPv4-mapped address.

③ – 2001:db8:123:456:6af2:68fe:ff7c:e25c is a local container address with global scope.

④ – with --interface, we can force curl to use an IPv4 source address for localhost.

Internally curl uses bind() before connect() to set the source address:

$ strace -e trace=network curl --interface 127.0.0.1 localhost
setsockopt(5, SOL_SOCKET, SO_BINDTODEVICE, "127.0.0.1\0", 10) = -1 ENODEV (No such device)
setsockopt(5, SOL_IP, IP_BIND_ADDRESS_NO_PORT, [1], 4) = 0
bind(5, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("127.0.0.1")}, 16) = 0
getsockname(5, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("127.0.0.1")}, [128 => 16]) = 0
connect(5, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 EINPROGRESS (Operation now in progress)

⑤ – to use a link-local IPv6 address, we need to add a zone ID: the interface name appended after a % character.
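As a small illustration of how the zone travels with the address, the following Go sketch parses the same link-local literal and separates the address from its zone. The helper splitZone is hypothetical; netip.ParseAddr treats the zone as an opaque string, so the interface does not have to exist on the machine running the code.

```go
package main

import (
	"fmt"
	"net/netip"
)

// splitZone is an illustrative helper (not part of any library): it parses
// an address that may carry a %zone suffix and returns the bare address,
// the zone, and whether the address is link-local.
func splitZone(s string) (string, string, bool, error) {
	a, err := netip.ParseAddr(s)
	if err != nil {
		return "", "", false, err
	}
	return a.WithZone("").String(), a.Zone(), a.IsLinkLocalUnicast(), nil
}

func main() {
	addr, zone, linkLocal, err := splitZone("fe80::5055:55ff:fe8e:3d07%eth0")
	if err != nil {
		panic(err)
	}
	fmt.Println(addr, zone, linkLocal) // fe80::5055:55ff:fe8e:3d07 eth0 true
}
```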

6.1.1 IPV6_V6ONLY socket option

You can always opt out of handling IPv4 connections on an IPv6 socket. To do so, set the IPV6_V6ONLY socket option (man 7 ipv6):

If this flag is set to true (nonzero), then the socket is restricted to sending and receiving IPv6 packets only. In this case, an IPv4 and an IPv6 application can bind to a single port at the same time.

If this flag is set to false (zero), then the socket can be used to send and receive packets to and from an IPv6 address or an IPv4-mapped IPv6 address. The argument is a pointer to a boolean value in an integer.

The default value for this flag is defined by the contents of the file /proc/sys/net/ipv6/bindv6only. The default value for that file is 0 (false).
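The option has to be set after socket() and before bind(). A minimal Go sketch of one way to do that, assuming Linux and using the Control hook of net.ListenConfig, could look like this. The helper names setV6Only, listenV6Only and isV6Only are invented for the example, and port 8080 is arbitrary.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"syscall"
)

// setV6Only runs on the raw socket before bind() and turns IPV6_V6ONLY on.
func setV6Only(network, address string, c syscall.RawConn) error {
	var sockErr error
	err := c.Control(func(fd uintptr) {
		sockErr = syscall.SetsockoptInt(int(fd), syscall.IPPROTO_IPV6, syscall.IPV6_V6ONLY, 1)
	})
	if err != nil {
		return err
	}
	return sockErr
}

// listenV6Only is a hypothetical helper that opens a TCP listener which
// accepts IPv6 connections only.
func listenV6Only(addr string) (net.Listener, error) {
	lc := net.ListenConfig{Control: setV6Only}
	return lc.Listen(context.Background(), "tcp", addr)
}

// isV6Only reads the option back from an existing TCP listener.
func isV6Only(ln net.Listener) (bool, error) {
	tl, ok := ln.(*net.TCPListener)
	if !ok {
		return false, fmt.Errorf("not a TCP listener: %T", ln)
	}
	rc, err := tl.SyscallConn()
	if err != nil {
		return false, err
	}
	var val int
	var sockErr error
	if err := rc.Control(func(fd uintptr) {
		val, sockErr = syscall.GetsockoptInt(int(fd), syscall.IPPROTO_IPV6, syscall.IPV6_V6ONLY)
	}); err != nil {
		return false, err
	}
	return val != 0, sockErr
}

func main() {
	ln, err := listenV6Only("[::]:8080")
	if err != nil {
		fmt.Println("listen failed:", err)
		return
	}
	defer ln.Close()
	only, err := isV6Only(ln)
	fmt.Println("IPV6_V6ONLY:", only, err)
}
```

With the option set, an IPv4 client that connects to port 8080 should be refused unless a separate IPv4 listener is bound to the same port.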

6.1.1.1 Nginx and Envoy (Envoyproxy) and IPV6_V6ONLY

For example, Nginx and Envoy (Envoyproxy) set this option by default on all IPv6 listeners. The justification is to be explicit and to avoid hiding the IPv4-mapped addresses. Additionally, creating a socket with IPV6_V6ONLY is the default for Windows.

Therefore, for Nginx, if you need to listen on all interfaces, the default suggestion is to use two wildcard addresses:

listen [::]:80;
listen 80;

Requests in the logs:

127.0.0.1 - - [07/Jul/2024:08:41:34 +0100] "GET / HTTP/1.1" 200 615 "-" "curl/8.5.0"
::1 - - [07/Jul/2024:08:41:45 +0100] "GET / HTTP/1.1" 200 615 "-" "curl/8.5.0"

However, you can change this back to the Linux default.

For Nginx, set ipv6only=off on the listen directive:

listen [::]:80 ipv6only=off;

The logs now show:

::1 - - [07/Jul/2024:08:40:40 +0100] "GET / HTTP/1.1" 200 615 "-" "curl/8.5.0"
::ffff:127.0.0.1 - - [07/Jul/2024:08:40:48 +0100] "GET / HTTP/1.1" 200 615 "-" "curl/8.5.0"

For Envoy, set ipv4_compat on the listener:

  listeners:
  - name: listener_0
    address:
      socket_address:
        address: "::"
        port_value: 8080
        ipv4_compat: true

6.1.1.2 Go (Golang) and IPV6_V6ONLY

Go (golang), on the other hand, decided to choose simplicity (as usual) and provide sane defaults. It doesn’t reset the Linux default (IPV6_V6ONLY is not set) and it silently transforms IPv4-mapped addresses to IPv4 addresses and treats them as equal.

Server example:

package main

import (
	"fmt"
	"log"
	"net"
	"net/http"
)

func main() {
	connStateHandler := func(conn net.Conn, state http.ConnState) {
		if state != http.StateNew {
			return
		}
		if addr, ok := conn.RemoteAddr().(*net.TCPAddr); ok {
			log.Printf("connected: %#v, %s:%d", addr.IP, addr.IP, addr.Port)
		}
	}

	server := &http.Server{
		Addr:      ":80",
		ConnState: connStateHandler,
		Handler: http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			fmt.Fprintln(w, "Hello, this is the hardcoded response!")
		}),
	}

	fmt.Println("Starting server at :80")
	if err := server.ListenAndServe(); err != nil {
		log.Fatalf("Error starting server: %v", err)
	}
}

Run it:

$ go run ./main.go

Check the listening sockets. The server listens on the IPv6 wildcard only:

$ sudo netstat -ntlp | grep 80
tcp6       0      0 :::80            :::*              LISTEN      426610/main

Repeat the same exercise with curl and addresses from both families:

$ curl localhost
Hello, this is the hardcoded response!

$ curl --interface 127.0.0.1 localhost
Hello, this is the hardcoded response!

$ curl 192.168.5.15
Hello, this is the hardcoded response!

$ curl "[2001:db8:123:456:6af2:68fe:ff7c:e25c]"
Hello, this is the hardcoded response!

The log shows IPv6 and IPv4 addresses; the string representation contains no IPv4-mapped addresses:

Starting server at :80

connected: net.IP{0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x1}, ::1:55540
connected: net.IP{0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xff, 0xff, 0x7f, 0x0, 0x0, 0x1}, 127.0.0.1:58108
connected: net.IP{0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xff, 0xff, 0xc0, 0xa8, 0x5, 0xf}, 192.168.5.15:41442
connected: net.IP{0x20, 0x1, 0xd, 0xb8, 0x1, 0x23, 0x4, 0x56, 0x6a, 0xf2, 0x68, 0xfe, 0xff, 0x7c, 0xe2, 0x5c}, 2001:db8:123:456:6af2:68fe:ff7c:e25c:38752

But as you can see, the binary representation of the IPv4 addresses is still IPv4-mapped.

Internally, To4() converts both IPv4-mapped and plain IPv4 addresses to the same 4-byte net.IP:

https://cs.opensource.google/go/go/+/refs/tags/go1.22.5:src/net/ip.go;l=210-223

// To4 converts the IPv4 address ip to a 4-byte representation.
// If ip is not an IPv4 address, To4 returns nil.
func (ip IP) To4() IP {
	if len(ip) == IPv4len {
		return ip
	}
	if len(ip) == IPv6len &&
		isZeros(ip[0:10]) &&
		ip[10] == 0xff &&
		ip[11] == 0xff {
		return ip[12:16]
	}
	return nil
} 

The String() method of net.IP also performs this transformation implicitly:

https://cs.opensource.google/go/go/+/refs/tags/go1.22.5:src/net/ip.go;l=289-308

// String returns the string form of the IP address ip.
// It returns one of 4 forms:
// - "<nil>", if ip has length 0
// - dotted decimal ("192.0.2.1"), if ip is an IPv4 or IP4-mapped IPv6 address
// - IPv6 conforming to RFC 5952 ("2001:db8::1"), if ip is a valid IPv6 address
// - the hexadecimal form of ip, without punctuation, if no other cases apply
func (ip IP) String() string {
	if len(ip) == 0 {
		return "<nil>"
	}
	if len(ip) != IPv4len && len(ip) != IPv6len {
		return "?" + hexString(ip)
	}
	// If IPv4, use dotted notation.
	if p4 := ip.To4(); len(p4) == IPv4len {
		return netip.AddrFrom4([4]byte(p4)).String()
	}
	return netip.AddrFrom16([16]byte(ip)).String()
} 
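The implicit conversion is a property of the legacy net.IP type. The newer netip.Addr type keeps the IPv4-mapped form until Unmap() is called, which is closer to what we saw in the C server. The sketch below compares the two; the helper name describe is made up for the illustration.

```go
package main

import (
	"fmt"
	"net"
	"net/netip"
)

// describe reports how the legacy net.IP type and the newer netip.Addr type
// render the same textual address, and whether netip sees it as IPv4-mapped.
func describe(s string) (legacy string, modern string, is4In6 bool) {
	ip := net.ParseIP(s)
	addr := netip.MustParseAddr(s)
	return ip.String(), addr.String(), addr.Is4In6()
}

func main() {
	for _, s := range []string{"127.0.0.1", "::ffff:127.0.0.1", "::1"} {
		legacy, modern, mapped := describe(s)
		fmt.Printf("input=%s net.IP=%s netip.Addr=%s Is4In6=%v\n", s, legacy, modern, mapped)
	}
	// Unmap() performs explicitly what net.IP does implicitly.
	fmt.Println(netip.MustParseAddr("::ffff:127.0.0.1").Unmap()) // 127.0.0.1
}
```

Code that compares or logs addresses held in netip.Addr therefore probably wants an explicit Unmap() before treating mapped and plain IPv4 addresses as equal.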

6.1.1.3 Rust

Let’s repeat the same experiment with Rust, using hyper and tokio.

Dependencies:

[dependencies]
hyper = { version = "0.14", features = ["full"] }
tokio = { version = "1", features = ["full"] }

Code:

use hyper::service::{make_service_fn, service_fn};
use hyper::{Body, Request, Response, Server};
use std::convert::Infallible;
use std::net::SocketAddr;
use std::sync::Arc;
use tokio::sync::Mutex;

// Shared state for logging
struct Logger {
    // Here you can add additional shared state if needed
}

impl Logger {
    fn log(&self, addr: &SocketAddr) {
        println!("Received request from {}:{}, is ipv4: {}", addr.ip(), addr.port(), addr.is_ipv4());
    }
}

async fn handle_request(
    req: Request<Body>,
    logger: Arc<Mutex<Logger>>,
    remote_addr: SocketAddr,
) -> Result<Response<Body>, Infallible> {
    // Log the remote address
    {
        let logger = logger.lock().await;
        logger.log(&remote_addr);
    }

    Ok(Response::new(Body::from("Hello, world!")))
}

#[tokio::main]
async fn main() {
    // Define the address to listen on (IPv6 address :: and port 3000)
    let addr = ([0, 0, 0, 0, 0, 0, 0, 0], 3000).into();

    // Create a logger instance
    let logger = Arc::new(Mutex::new(Logger {}));

    // Create a service that handles requests
    let make_svc = make_service_fn(move |conn: &hyper::server::conn::AddrStream| {
        let logger = logger.clone();
        let remote_addr = conn.remote_addr();

        async move {
            Ok::<_, Infallible>(service_fn(move |req| {
                handle_request(req, logger.clone(), remote_addr)
            }))
        }
    });

    // Create the server with the specified address and service
    let server = Server::bind(&addr).serve(make_svc);

    // Run the server
    println!("Listening on http://[::]:3000");
    if let Err(e) = server.await {
        eprintln!("Server error: {}", e);
    }
}

And run it:

$ cargo run

Send some requests with curl:

$ curl "[fe80::5055:55ff:fe8e:3d07%eth0]:3000"
$ curl 192.168.5.15:3000
$ curl 127.0.0.1:3000

We get the following logs:

Received request from fe80::5055:55ff:fe8e:3d07:44388, is ipv4: false
Received request from ::ffff:192.168.5.15:49918, is ipv4: false
Received request from ::ffff:127.0.0.1:42254, is ipv4: false

Here you can see that IPv4-mapped addresses are shown as is, and that the is_ipv4() method returns false for them.

6.1.2 Multiple listening sockets with systemd

One more side note about multiple listener sockets.

There are many cases when an application can’t use a wildcard address due to security concerns and should bind to a number of specific addresses, regardless of the address family. In such cases, you don’t have many options other than creating and managing multiple accept sockets. If you have such a task, consider looking into the systemd socket activation feature, which can simplify socket creation and provide all the benefits of systemd.

Go server example which gets listeners directly from systemd:

package main

import (
    "fmt"
    "net"
    "os"
    "github.com/coreos/go-systemd/v22/activation"
)

func handleConnection(conn net.Conn) {
    defer conn.Close()
    buf := make([]byte, 1024)
    for {
        n, err := conn.Read(buf)
        if err != nil {
            fmt.Println("Error reading from connection:", err)
            return
        }
        fmt.Printf("Received: %s\n", string(buf[:n]))
        conn.Write([]byte("Hello from Go!\n"))
    }
}

func main() {
    // Retrieve sockets from systemd
    listeners, err := activation.Listeners()
    if err != nil {
        fmt.Fprintf(os.Stderr, "Error retrieving listeners: %v\n", err)
        os.Exit(1)
    }

    if len(listeners) == 0 {
        fmt.Fprintln(os.Stderr, "No sockets to listen on.")
        os.Exit(1)
    }

    // Listen for connections on each socket
    for _, listener := range listeners {
        go func(l net.Listener) {
            for {
                conn, err := l.Accept()
                if err != nil {
                    fmt.Println("Error accepting connection:", err)
                    continue
                }
                go handleConnection(conn)
            }
        }(listener)
    }

    // Block the main goroutine to keep the program running
    select {}
}

Socket file /etc/systemd/system/go-service.socket:

[Unit]
Description=Sockets for Go service

[Socket]
ListenStream=127.0.0.1:12345
ListenStream=192.168.1.1:23456
ListenStream=[fe80::5055:55ff:fe8e:3d07]:34567

[Install]
WantedBy=sockets.target

systemd service unit file /etc/systemd/system/go-service.service:

[Unit]
Description=Your Go service
After=network.target

[Service]
ExecStart=/path/to/your/go/program
NonBlocking=true

[Install]
WantedBy=multi-user.target

Both files above must share the same name before the dot, so that systemd pairs the socket unit with the service unit.

Reload systemd, then enable and start the socket:

$ sudo systemctl daemon-reload
$ sudo systemctl enable go-service.socket
$ sudo systemctl start go-service.socket

6.2 Dual stack client

Dual-stack client applications are usually more complex than servers when it comes to being address-family agnostic. In server code, getaddrinfo() calls can be reasonably slow or even omitted, with sockets created manually (which is not a good practice). Client code, however, often needs to re-resolve hostnames periodically. For instance, a load balancer needs to learn about backend changes from service discovery. One resulting issue is the overhead of an additional DNS request for a network stack that is not configured (unfortunately, this is often still IPv6), which can add unnecessary delays and increase DNS traffic without any benefit.

Another complexity lies behind getaddrinfo() itself. Its calls are blocking (we will look at asynchronous alternatives in the next chapter), which leaves two imperfect options:

  • Set the AF_UNSPEC family, which you probably should in order to be dual-stack and IPv6 ready, and accept that you may wait longer if your resolver is periodically slow. This follows from the design of getaddrinfo(), which in this case always waits for the answers to both the A and the AAAA queries. Even if one answer has already arrived, it waits for the second one or for a timeout from /etc/resolv.conf.
  • Avoid AF_UNSPEC and instead make two asynchronous calls with some logic and timeouts on top, though this can be cumbersome and difficult to do properly.
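To give a feel for the second option, here is a minimal Go sketch that runs the two per-family lookups in parallel, each with its own deadline, so a missing AAAA answer cannot hold back an A answer for longer than that deadline. The names lookupFunc, resolveBoth and slowV6 are invented for this example, and the fake resolver only simulates a slow AAAA path; a real program could pass net.DefaultResolver.LookupNetIP instead.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"net/netip"
	"time"
)

// lookupFunc has the same shape as (*net.Resolver).LookupNetIP, so the
// resolver can be swapped for a fake one.
type lookupFunc func(ctx context.Context, network, host string) ([]netip.Addr, error)

// Compile-time check that the standard resolver fits lookupFunc.
var _ lookupFunc = net.DefaultResolver.LookupNetIP

// resolveBoth queries "ip6" and "ip4" in parallel, gives each family its own
// deadline, and returns whatever arrived in time, IPv6 results first.
func resolveBoth(lookup lookupFunc, host string, perFamily time.Duration) ([]netip.Addr, error) {
	type result struct {
		addrs []netip.Addr
		err   error
	}
	query := func(network string) <-chan result {
		ch := make(chan result, 1)
		go func() {
			ctx, cancel := context.WithTimeout(context.Background(), perFamily)
			defer cancel()
			addrs, err := lookup(ctx, network, host)
			ch <- result{addrs, err}
		}()
		return ch
	}
	v6ch, v4ch := query("ip6"), query("ip4") // both queries are now in flight
	v6, v4 := <-v6ch, <-v4ch

	addrs := append(append([]netip.Addr{}, v6.addrs...), v4.addrs...)
	if len(addrs) == 0 {
		return nil, fmt.Errorf("no addresses for %s (ip6: %v, ip4: %v)", host, v6.err, v4.err)
	}
	return addrs, nil
}

// slowV6 simulates a resolver whose AAAA answer never arrives.
func slowV6(ctx context.Context, network, host string) ([]netip.Addr, error) {
	if network == "ip6" {
		<-ctx.Done()
		return nil, ctx.Err()
	}
	return []netip.Addr{netip.MustParseAddr("192.0.2.10")}, nil
}

func main() {
	addrs, err := resolveBoth(slowV6, "example.test", 200*time.Millisecond)
	fmt.Println(addrs, err) // [192.0.2.10] <nil>
}
```

This version still waits for the slower family up to the deadline. A more refined variant could return as soon as one family answers and grant the other only a short grace period, which is where the "logic on top" quickly becomes cumbersome.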

The AI_ADDRCONFIG hints flag is intended to help mitigate such issues and reduce traffic and latency. However, there is an important caveat with its heuristics. First, let’s read the section of man page for getaddrinfo():

If hints.ai_flags includes the AI_ADDRCONFIG flag, then IPv4 addresses are returned in the list pointed to by res only if the local system has at least one IPv4 address configured, and IPv6 addresses are returned only if the local system has at least one IPv6 address configured. The loopback address is not considered for this case as valid as a configured address.

The exception is made only for loopback addresses. Any other IPv6 addresses are considered valid, even for link-local scoped addresses, which are configured automatically when a link is up.

The conclusion is not favorable; the AI_ADDRCONFIG flag could potentially be useful only with IPv6-only hosts with a fully disabled IPv4 stack (because autoconfigured IPv4 addresses are also valid addresses) or IPv4 only.
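The heuristic can be modelled in a few lines. The Go sketch below is only an approximation of the behaviour described in the man page, not glibc's implementation: a family counts as configured as soon as one non-loopback address of that family exists, so a lone link-local IPv6 address is enough to keep AAAA lookups enabled. The function name addrConfig and the sample addresses are assumptions made for the example.

```go
package main

import (
	"fmt"
	"net/netip"
)

// addrConfig approximates the AI_ADDRCONFIG decision: it reports which
// address families have at least one configured non-loopback address.
func addrConfig(local []string) (seenV4, seenV6 bool) {
	for _, s := range local {
		a, err := netip.ParseAddr(s)
		if err != nil || a.IsLoopback() {
			continue // unparsable and loopback addresses do not count
		}
		if a.Unmap().Is4() {
			seenV4 = true
		} else {
			seenV6 = true // link-local fe80::/10 counts as well
		}
	}
	return seenV4, seenV6
}

func main() {
	// A typical host: loopbacks, one private IPv4 address and one
	// autoconfigured link-local IPv6 address.
	v4, v6 := addrConfig([]string{
		"127.0.0.1", "::1", "192.168.5.15", "fe80::5055:55ff:fe8e:3d07",
	})
	fmt.Println(v4, v6) // true true
}
```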

But thanks to RFC 6724’s sorting algorithm (discussed in the next chapter) – which includes Rule 2: Prefer matching scope in Section 6 – the return order will prefer IPv4 over IPv6 if there are no global scope IPv6 source addresses on the device. We already saw this in our getaddrinfo() experiments in Chapter 3.

In glibc, AI_ADDRCONFIG is enabled by default when hints is NULL (man 3 getaddrinfo):

Specifying hints as NULL is equivalent to setting ai_socktype and ai_protocol to 0; ai_family to AF_UNSPEC; and ai_flags to (AI_V4MAPPED | AI_ADDRCONFIG).

which violates POSIX, but helps to improve user experience:

According to POSIX.1, specifying hints as NULL should cause ai_flags to be assumed as 0. The GNU C library instead assumes a value of (AI_V4MAPPED | AI_ADDRCONFIG) for this case, since this value is considered an improvement on the specification.

If we look under the hood at how getaddrinfo() figures out which addresses exist, we find the __check_pf() function. It keeps a cache of responses; on a cache miss or a stale entry, it connects to a NETLINK socket and queries routing information, which the make_request() function then parses to find the answer:

void
attribute_hidden
__check_pf (bool *seen_ipv4, bool *seen_ipv6,
	    struct in6addrinfo **in6ai, size_t *in6ailen)
{
  
  if (cache_valid_p ())
    {
      data = cache;
      
    }
  else
    {
     int fd = __socket (PF_NETLINK, SOCK_RAW | SOCK_CLOEXEC, NETLINK_ROUTE);
     
     data = make_request (fd, nladdr.nl_pid);
     
    }
  
  if (data != NULL)
    {
      /* It worked.  */
      *seen_ipv4 = data->seen_ipv4;
      *seen_ipv6 = data->seen_ipv6;
      *in6ailen = data->in6ailen;
      *in6ai = data->in6ai;
            
      return;
    }
  /* We cannot determine what interfaces are available.  Be
     pessimistic.  */
  *seen_ipv4 = true;
  *seen_ipv6 = true;
}

Another somewhat confusing getaddrinfo() hints flag is AI_V4MAPPED. RFC 3493 (Basic Socket Interface Extensions for IPv6), where getaddrinfo() was introduced, explains it a bit:

If the AI_V4MAPPED flag is specified along with an ai_family of AF_INET6, then getaddrinfo() shall return IPv4-mapped IPv6 addresses on finding no matching IPv6 addresses (ai_addrlen shall be 16).

For example, when using the DNS, if no AAAA records are found then a query is made for A records and any found are returned as IPv4-mapped IPv6 addresses.

The AI_V4MAPPED flag shall be ignored unless ai_family equals AF_INET6.

If the AI_ALL flag is used with the AI_V4MAPPED flag, then getaddrinfo() shall return all matching IPv6 and IPv4 addresses.

For example, when using the DNS, queries are made for both AAAA records and A records, and getaddrinfo() returns the combined results of both queries. Any IPv4 addresses found are returned as IPv4-mapped IPv6 addresses.

Thus, AI_V4MAPPED can help when writing a tool that wants to deal only with addresses in IPv6 format.
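The rules quoted above fit into one small function. The following Go sketch models them for an ai_family of AF_INET6, taking the AAAA and A answers as inputs; it is a model of the specified behaviour rather than what any particular libc does, and the names selectV4Mapped and parseAll are invented for the example.

```go
package main

import (
	"fmt"
	"net/netip"
)

// parseAll is a small convenience helper for building address lists.
func parseAll(ss ...string) []netip.Addr {
	out := make([]netip.Addr, 0, len(ss))
	for _, s := range ss {
		out = append(out, netip.MustParseAddr(s))
	}
	return out
}

// selectV4Mapped models AF_INET6 + AI_V4MAPPED. IPv4 answers are returned in
// IPv4-mapped form, either only when no AAAA answers exist or always when
// aiAll (the AI_ALL flag) is set.
func selectV4Mapped(aaaa, a []netip.Addr, aiAll bool) []netip.Addr {
	out := append([]netip.Addr{}, aaaa...)
	if len(aaaa) == 0 || aiAll {
		for _, v4 := range a {
			// As16 yields the ::ffff:a.b.c.d byte layout for IPv4 input.
			out = append(out, netip.AddrFrom16(v4.As16()))
		}
	}
	return out
}

func main() {
	aaaa := parseAll("2001:db8::1")
	a := parseAll("192.0.2.1")
	fmt.Println(selectV4Mapped(aaaa, a, false)) // [2001:db8::1]
	fmt.Println(selectV4Mapped(nil, a, false))  // [::ffff:192.0.2.1]
	fmt.Println(selectV4Mapped(aaaa, a, true))  // [2001:db8::1 ::ffff:192.0.2.1]
}
```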

For IPv6-only hosts with no IPv4 connectivity, there is no routing for IPv4-mapped addresses because they are a special case in the IPv6 address space. If you need to route such addresses, you can, for instance, set up NAT64.

So, we have addresses now, but in what order should we try to connect to them? As usual, there is no silver bullet (we will take a look at Nginx and Envoy, which address this issue from different angles). However, there are two main views on this problem:

  1. Consider all addresses returned for a hostname as different ways to communicate with the same entity (it could be a multi-address host, service, website, etc.). Sometimes it’s also called a multihomed host:

    A multihomed host is a computer or device that is connected to two or more networks and can be identified by multiple IP addresses, typically associated with different network interfaces. This configuration is common in scenarios requiring redundancy, load balancing, or enhanced security.

    Connecting to any of the returned addresses ends up at the same entity. For example, if a host has both an IPv4 and an IPv6 address and our client is dual-stack, it doesn’t matter via which protocol a connection is established from the user’s point of view.

  2. Consider the returned addresses as different entities under one domain name. This is now service discovery via DNS rather than exploring different paths. For example, the DNS names of StatefulSet pods in Kubernetes.

The second option is kind of straightforward, and we will not talk about it much in this series. The client side code should implement a load balancer algorithm, use all the records as its backends, periodically refresh the records, and filter out some suspicious addresses (for example, link-local scope addresses).

But the first option has more subtle details that we need to discuss. Let’s start with the sorting problem.

6.2.1 Sorting destination addresses (RFC 6724)

As we already saw, a hostname can resolve into multiple A and/or AAAA records.

The question is in which order to iterate through them, or, put differently, in which order getaddrinfo() or its alternatives should return the results.

Let’s not forget that the original goal of introducing IPv6 addresses was to migrate to them as soon as possible. Over time, this has evolved to a goal of eventually migrating. With this in mind, it makes sense to prioritize returning IPv6 addresses first, if they exist, and IPv4 addresses next. This approach was standardized in RFC 3484 and later updated in RFC 6724: Default Address Selection for Internet Protocol Version 6 (IPv6). These RFCs provide two algorithms for address selection: one for source address selection (which occurs inside the Linux kernel and won’t be covered in detail here) and the destination address selection algorithm described in Section 6 of the document.

Thus, to migrate a dual-stack application from IPv4 to IPv6 seamlessly, the application should follow RFC 6724 and be well-behaved, which the RFC describes as follows:

Well-behaved applications SHOULD NOT simply use the first address returned from an API such as getaddrinfo() and then give up if it fails. For many applications, it is appropriate to iterate through the list of addresses returned from getaddrinfo() until a working address is found. For other applications, it might be appropriate to try multiple addresses in parallel (e.g., with some small delay in between) and use the first one to succeed.

So the algorithm is the following:

  • Rule 1: Avoid unusable destinations
  • Rule 2: Prefer matching scope
  • Rule 3: Avoid deprecated addresses
  • Rule 4: Prefer home addresses
  • Rule 5: Prefer matching label
  • Rule 6: Prefer higher precedence
  • Rule 7: Prefer native transport
  • Rule 8: Prefer smaller scope
  • Rule 9: Use longest matching prefix
  • Rule 10: Otherwise, leave the order unchanged

Figure 3. – Destination address selection algorithm RFC 6724

Let me provide a quick explanation with examples. To sort addresses, a comparator function is used. This function takes two addresses and determines which one wins based on a series of rules. If the first rule results in a tie, the next rule is applied, and so on.

The algorithm employs terms introduced in various RFCs, including:

  • address scopes;
  • IPv6 address types;
  • policy table with a configuration file.

But before we start, let’s take a closer look at each of these components.

The IPv6 addressing architecture [RFC4291] allows multiple unicast addresses to be assigned to interfaces. These addresses might have different reachability scopes (link-local, site-local, or global).

Here is a quick reminder of the scopes:

  • host scope – ::1 and 127.0.0.1
  • link-local scope – fe80::/10 and 169.254.0.0/16
  • Unique Local Address (ULA) scope – fc00::/7
  • global scope – all other addresses (for IPv6, these usually fall within 2000::/3), including IPv4 private networks such as 192.168.0.0/16, 172.16.0.0/12 and 10.0.0.0/8.

These addresses might also be “preferred” or “deprecated” [RFC4862]. Privacy considerations have introduced the concepts of “public addresses” and “temporary addresses” [RFC4941]. The mobility architecture introduces “home addresses” and “care-of addresses” [RFC6275].

The policy table is an auxiliary data structure that helps sort source and destination addresses. It has a default value:

Prefix        Precedence Label
::1/128               50     0
::/0                  40     1
::ffff:0:0/96         35     4
2002::/16             30     2
2001::/32              5     5
fc00::/7               3    13
::/96                  1     3
fec0::/10              1    11
3ffe::/16              1    12

It can be tuned by editing /etc/gai.conf (man 5 gai.conf).

2.1. Policy Table

The policy table is a longest-matching-prefix lookup table, much like a routing table. Given an address A, a lookup in the policy table produces two values: a precedence value denoted Precedence(A) and a classification or label denoted Label(A).

The precedence value Precedence(A) is used for sorting destination addresses. If Precedence(A) > Precedence(B), we say that address A has higher precedence than address B, meaning that our algorithm will prefer to sort destination address A before destination address B.

The label value Label(A) allows for policies that prefer a particular source address prefix for use with a destination address prefix. The algorithms prefer to use a source address S with a destination address D if Label(S) = Label(D).
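To make the lookup concrete, here is a minimal Python sketch of a longest-matching-prefix search over the default table shown above. The helper names `as_ipv6` and `policy_lookup` are hypothetical; IPv4 addresses are converted to their IPv4-mapped form first so that they match the ::ffff:0:0/96 row.

```python
import ipaddress

# Default policy table from above: (prefix, precedence, label).
POLICY_TABLE = [
    (ipaddress.ip_network(prefix), precedence, label)
    for prefix, precedence, label in [
        ("::1/128", 50, 0),
        ("::/0", 40, 1),
        ("::ffff:0:0/96", 35, 4),
        ("2002::/16", 30, 2),
        ("2001::/32", 5, 5),
        ("fc00::/7", 3, 13),
        ("::/96", 1, 3),
        ("fec0::/10", 1, 11),
        ("3ffe::/16", 1, 12),
    ]
]


def as_ipv6(addr: str) -> ipaddress.IPv6Address:
    """Return addr as IPv6, representing IPv4 as an IPv4-mapped address."""
    ip = ipaddress.ip_address(addr)
    if ip.version == 4:
        return ipaddress.IPv6Address((0xFFFF << 32) | int(ip))
    return ip


def policy_lookup(addr: str) -> tuple:
    """Return (Precedence(A), Label(A)) using the longest matching prefix."""
    ip = as_ipv6(addr)
    # ::/0 matches everything, so there is always at least one candidate.
    matches = [entry for entry in POLICY_TABLE if ip in entry[0]]
    _net, precedence, label = max(matches, key=lambda entry: entry[0].prefixlen)
    return precedence, label


for a in ("203.0.113.1", "2001:db8:0:1::1", "2002:c633:6401::1", "::1"):
    print(a, "->", policy_lookup(a))
```

With the default table, an IPv4 destination gets precedence 35 while a regular global IPv6 destination gets 40, which is where the usual preference for IPv6 comes from.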

The destination address selection algorithm also needs to know the source address for each destination address. We will review how a stub resolver can obtain this information later in this chapter.

  1. Source Address Selection

The source address selection algorithm produces as output a single source address for use with a given destination address. This algorithm only applies to IPv6 destination addresses, not IPv4 addresses.

Now we can review the rules with examples.

Rule 1: Avoid unusable destinations.

If a stub resolver knows that a destination address is unreachable, it should deprioritize (pessimize) it.

Example: if a stub resolver has a cache, a health-check mechanism, or a history of connections, it can immediately pessimize an address.

Rule 2: Prefer matching scope.

Retrieve source addresses for the destination addresses under consideration and compare the scopes of each pair. If the scopes match for a pair, prioritize that destination address.

Example: DNS returned 203.0.113.1 and 2001:0db8:0:1::1. The system has 192.168.0.2 and a link-local IPv6 address. The result will be [203.0.113.1, 2001:0db8:0:1::1] because 192.168.0.2 and 203.0.113.1 are both global-scoped addresses, whereas the link-local IPv6 source does not match the global scope of 2001:0db8:0:1::1.

Rule 3: Avoid deprecated addresses.

Retrieve source addresses for the destination addresses under consideration. If a source address is marked as deprecated for a pair, deprioritize (pessimize) the corresponding destination address.

Rule 4: Prefer home addresses.

It’s related to mobile home and care-of addresses. A home address should win.

Rule 5: Prefer matching label.

Retrieve source addresses for the destination addresses under consideration. If a pair has equal labels in the policy table, prioritize (bump) the related destination address.

Example: DNS returned 2002:c633:6401::1 and 2001:db8:1::1. The candidate source addresses are 2002:c633:6401::2 and fe80::2, and 2002:c633:6401::2 is selected as the source for both destinations because its scope matches. The result is [2002:c633:6401::1, 2001:db8:1::1] because 2002:c633:6401::1 and 2002:c633:6401::2 share label 2 (2002::/16), whereas 2001:db8:1::1 has label 1 (::/0).

Rule 6: Prefer higher precedence.

Perform the same steps as above, but compare the Precedence values of the two destination addresses only.

Example: DNS returned 203.0.113.1 and 2001:0db8:0:1::1. The result is [2001:0db8:0:1::1, 203.0.113.1]. The precedence for 2001:0db8:0:1::1 is 40, and for 203.0.113.1 is 35.

Rule 7: Prefer native transport.

Always prefer a non-encapsulated destination address.

Rule 8: Prefer smaller scope.

Compare the scopes of the destination addresses under consideration and select the one with the smallest scope.

Example: DNS returned 2001:0db8:0:1::1 and fe80::2. The result is [fe80::2, 2001:0db8:0:1::1]. The link local scope is smaller.

Rule 9: Use longest matching prefix.

Retrieve source addresses for the destination addresses under consideration and compare the longest matching prefixes between each pair.

Example: DNS returned 2001:db8:1::1 and 2001:db8:3ffe::1. Sources are 2001:db8:1::2 and 2001:db8:3f44::2. The result is [2001:db8:1::1, 2001:db8:3ffe::1]. The longest matching prefix wins for the pair of 2001:db8:1::1 and 2001:db8:1::2.
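The comparison in this example can be checked with a few lines of Python. This is a simplified sketch: the function name `common_prefix_len` is chosen for illustration, and it counts leading equal bits across the full 128 bits, whereas RFC 6724 additionally caps the value at the source address's prefix length.

```python
import ipaddress


def common_prefix_len(a: str, b: str) -> int:
    """Number of leading bits that two IPv6 addresses have in common."""
    diff = int(ipaddress.IPv6Address(a)) ^ int(ipaddress.IPv6Address(b))
    # The highest set bit of the XOR marks the first differing bit.
    return 128 - diff.bit_length()


pairs = [
    ("2001:db8:1::1", "2001:db8:1::2"),        # destination, source
    ("2001:db8:3ffe::1", "2001:db8:3f44::2"),
]
for dst, src in pairs:
    print(dst, src, common_prefix_len(dst, src))
```

The first pair shares 126 bits and the second only 40, so 2001:db8:1::1 is sorted first. Capping at a /64 source prefix would not change the outcome.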

Rule 10: Otherwise, leave the order unchanged.

Preserve the order provided by the DNS server, allowing for possible Round-robin DNS.

One important implication from the above is that IPv6 usually takes precedence over IPv4 due to the higher default precedence in the policy table (man 5 gai.conf). If you do not want this behavior, you can change it by adding the following line to /etc/gai.conf:

precedence ::ffff:0:0/96  100 

This line sets a precedence of 100 for the IPv4-mapped range. This adjustment is likely the only practical use of /etc/gai.conf since distributing changes across multiple machines is difficult and unreliable. However, be aware that not all stub resolvers read this configuration file.

6.2.1.1 Retrieve Source address for Destination address #

Let’s now explore the details of how to obtain a source address for a given destination.

The algorithm relies on a feature of the connect() syscall (man 2 connect) for SOCK_DGRAM (UDP) sockets:

If the socket sockfd is of type SOCK_DGRAM, then addr is the address to which datagrams are sent by default, and the only address from which datagrams are received.

But what is more interesting for us is that we can run getsockname (man 2 getsockname) afterwards for the file descriptor and get the source address for the destination:

getsockname() returns the current address to which the socket sockfd is bound, in the buffer pointed to by addr.
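The same trick is easy to reproduce from user space. The following Python sketch (the function name `source_address_for` is hypothetical) connects a UDP socket to the destination and asks the kernel which local address it picked. Connecting a UDP socket only records the peer and triggers a route lookup, so no packets are sent.

```python
import socket


def source_address_for(dest: str, port: int = 53):
    """Return the local address the kernel would use to reach dest, or None."""
    family = socket.AF_INET6 if ":" in dest else socket.AF_INET
    try:
        with socket.socket(family, socket.SOCK_DGRAM) as sock:
            # For SOCK_DGRAM, connect() performs no handshake.
            sock.connect((dest, port))
            # getsockname() now reports the source address chosen for dest.
            return sock.getsockname()[0]
    except OSError:
        # For example, no route to the destination or the family is disabled.
        return None


print(source_address_for("127.0.0.1"))
```

If the function returns None for an IPv6 destination, the host most likely has no usable IPv6 route to it, which is exactly the information the sorting rules need.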

The corresponding code lives in glibc's getaddrinfo() implementation:

/* We overwrite the type with SOCK_DGRAM since we do not
   want connect() to connect to the other side.  If we
   cannot determine the source address remember this
   fact.  */
if (fd == -1 || (af == AF_INET && q->ai_family == AF_INET6))
  {
    if (fd != -1)
      __close_nocancel_nostatus (fd);
    af = q->ai_family;
    fd = __socket (af, SOCK_DGRAM | SOCK_CLOEXEC, IPPROTO_IP);
  }
else
  {
    /* Reset the connection.  */
    struct sockaddr sa = { .sa_family = AF_UNSPEC };
    __connect (fd, &sa, sizeof (sa));
  }
if (try_connect (&fd, &af, &results[i].source_addr, q->ai_addr,
                 q->ai_addrlen, q->ai_family))
  {
    results[i].source_addr_len = sizeof (results[i].source_addr);
    results[i].got_source_addr = true;

The results array is then used for sorting:

/* We got all the source addresses we can get, now sort using
   the information.  */
struct sort_result_combo src
  = { .results = results, .nresults = nresults };
if (__glibc_unlikely (gaiconf_reload_flag_ever_set))
  {
    __libc_lock_define_initialized (static, lock);
    __libc_lock_lock (lock);
    if (__libc_once_get (old_once) && gaiconf_reload_flag)
      gaiconf_reload ();
    __qsort_r (order, nresults, sizeof (order[0]), rfc3484_sort, &src);
    __libc_lock_unlock (lock);
  }
else
  __qsort_r (order, nresults, sizeof (order[0]), rfc3484_sort, &src);

Here, rfc3484_sort is the comparator that implements all 10 rules from the RFC:

static int rfc3484_sort (const void *p1, const void *p2, void *arg)

The retrieval of source addresses involves at least two syscalls for each destination address. This overhead should be considered if performance is crucial. For example, the alternative stub resolver c-ares (which we will review later) provides an option to disable RFC 6724 sorting.

If we run the above program under strace we will see all the calls:

$ strace -f -s0 -e trace=network ./getaddrinfo microsoft.com
① socket(AF_INET6, SOCK_DGRAM|SOCK_CLOEXEC, IPPROTO_IP) = 3
② connect(3, {sa_family=AF_INET6, sin6_port=htons(53), sin6_flowinfo=htonl(0), inet_pton(AF_INET6, "2603:1030:b:3::152", &sin6_addr), sin6_scope_id=0}, 28) = 0 
③ getsockname(3, {sa_family=AF_INET6, sin6_port=htons(47754), sin6_flowinfo=htonl(0), inet_pton(AF_INET6, "2001:db8:123:456:6af2:68fe:ff7c:e25c", &sin6_addr), sin6_scope_id=0}, [28]) = 0 
④ connect(3, {sa_family=AF_UNSPEC, sa_data="\0\0\0\0\0\0\0\0\0\0\0\0\0\0"}, 16) = 0

① – create a UDP socket to connect to the destination;

② – connect to it;

③ – obtain the source address;

④ – reset socket by using connect() to AF_UNSPEC.

6.2.2 Happy Eyeballs: Success with Dual-Stack Hosts #

We have yet to address a crucial issue: the reliability of IPv6 network routing in the real world. This problem is so significant that it has been a major obstacle to migration, leading to the development of new RFCs to promote IPv6 adoption.

From the RFC 6555 Happy Eyeballs: Success with Dual-Stack Hosts:

In order to use applications over IPv6, it is necessary that users enjoy nearly identical performance as compared to IPv4. A combination of today’s applications, IPv6 tunneling, IPv6 service providers, and some of today’s content providers all cause the user experience to suffer. For IPv6, a content provider may ensure a positive user experience by using a DNS white list of IPv6 service providers who peer directly with them. However, this does not scale well (to the number of DNS servers worldwide or the number of content providers worldwide) and does react to intermittent network path outages.

To address this issue, let’s first revisit the following quote from RFC 6724: Default Address Selection for Internet Protocol Version 6 (IPv6):

Well-behaved applications SHOULD NOT simply use the first address returned from an API such as getaddrinfo() and then give up if it fails. For many applications, it is appropriate to iterate through the list of addresses returned from getaddrinfo() until a working address is found. For other applications, it might be appropriate to try multiple addresses in parallel (e.g., with some small delay in between) and use the first one to succeed.

The last sentence essentially captures the gist of the Happy Eyeballs algorithm: when it is impossible to determine the state of the network in advance, but the system is configured correctly with both global scope IP families, the only option is to use both families concurrently, with some preference given to IPv6.
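For comparison, the simpler sequential strategy from the RFC 6724 quote (iterate through the list until a working address is found) can be sketched as follows. The function name `connect_first_working` and the 3-second timeout are assumptions made for this example; Python's `socket.create_connection()` implements a similar loop.

```python
import socket


def connect_first_working(host: str, port: int, timeout: float = 3.0) -> socket.socket:
    """Try each address returned by getaddrinfo() in order; return the first success."""
    last_error = None
    # getaddrinfo() returns addresses already sorted by the resolver.
    for family, socktype, proto, _canonname, sockaddr in socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM
    ):
        sock = None
        try:
            sock = socket.socket(family, socktype, proto)
            sock.settimeout(timeout)
            sock.connect(sockaddr)
            return sock
        except OSError as exc:
            # Remember the failure and fall through to the next address.
            last_error = exc
            if sock is not None:
                sock.close()
    raise last_error or OSError(f"no addresses found for {host}")


if __name__ == "__main__":
    # Local demo: connect to a listener on the loopback interface.
    with socket.create_server(("127.0.0.1", 0)) as listener:
        port = listener.getsockname()[1]
        with connect_first_working("127.0.0.1", port) as conn:
            print("connected to", conn.getpeername())
```

The drawback is visible in the timeout: with a broken IPv6 path, every connection waits for the full timeout before falling back to IPv4, which is the delay Happy Eyeballs is designed to avoid.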

There are two RFCs about the Happy Eyeballs algorithm:

  1. RFC 6555 Happy Eyeballs: Success with Dual-Stack Hosts (obsoleted by RFC 8305 ↓).
  2. RFC 8305 Happy Eyeballs Version 2: Better Connectivity Using Concurrency

The motivation for the above RFCs was the desire to create a standardized algorithm that prioritizes IPv6 addresses. This was necessary to combat the variety of existing algorithms that did not prioritize IPv6, thereby failing to encourage infrastructure upgrades and reduce reliance on outdated IPv4 hardware. Additionally, making numerous connection attempts simultaneously can harm the network by overloading network equipment, routers, and servers. As stated in RFC 6555:

Instead, applications reduce connection setup delays themselves, by more aggressively making connections on IPv6 and IPv4. There are a variety of algorithms that can be envisioned. This document specifies requirements for any such algorithm, with the goals that the network and servers not be inordinately harmed with a simple doubling of traffic on IPv6 and IPv4 and the host’s address preference be honored.

  1. Algorithm Requirements

A “Happy Eyeballs” algorithm has two primary goals:

  1. Provides fast connection for users, by quickly attempting to connect using IPv6 and (if that connection attempt is not quickly successful) to connect using IPv4.

  2. Avoids thrashing the network, by not (always) making simultaneous connection attempts on both IPv6 and IPv4
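Some standard libraries already implement these requirements. As one illustration, Python's asyncio accepts a `happy_eyeballs_delay` argument on `loop.create_connection()` (Python 3.8+), and `asyncio.open_connection()` forwards it. The sketch below assumes a delay of 0.25 seconds, the value recommended by RFC 8305; the names `open_dual_stack` and `demo` are made up for this example.

```python
import asyncio


async def open_dual_stack(host: str, port: int, delay: float = 0.25):
    """Open a TCP connection with staggered attempts across resolved addresses."""
    # With happy_eyeballs_delay set, asyncio interleaves address families and
    # starts the next attempt if the previous one has not finished within `delay`.
    return await asyncio.open_connection(host, port, happy_eyeballs_delay=delay)


async def demo():
    """Local demo against a loopback server, so no external network is needed."""

    async def handle(_reader, writer):
        writer.close()

    server = await asyncio.start_server(handle, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]

    _reader, writer = await open_dual_stack("127.0.0.1", port)
    peer = writer.get_extra_info("peername")
    writer.close()
    await writer.wait_closed()

    server.close()
    return peer, port


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Because the attempts are staggered rather than fired all at once, this satisfies both goals: a quick fallback for the user and no unconditional doubling of connection attempts.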

By following these not-always-simple rules, you can create a client application that is reliable, flexible, and predictable.

It’s time to look at alternative stub resolvers and examples of real high-load software.
