getaddrinfo() and POSIX spec

3. getaddrinfo() and POSIX spec

Thus, instead of the deprecated gethostbyname(), applications should use libc's getaddrinfo(). The getaddrinfo() function is standardized by POSIX and defined in RFC 3493. It is IP version agnostic and returns data structures that can be easily reused in subsequent socket API calls (such as socket(), connect(), sendto()).

First of all, if you have a codebase that uses gethostbyname() and you are looking to migrate to the modern getaddrinfo(), I have bad news: it’s not a drop-in replacement. You need to understand the new data structures, logic and flags.
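To give a feel for what the migration involves, here is a minimal sketch of a helper that covers one common gethostbyname() use: "give me the first IPv4 address of this host". The name resolve_ipv4() and its return convention are my own assumptions for illustration, not part of any libc. Note the two differences from the old API: the hints structure selects what you want, and the result is a heap-allocated list that you must release with freeaddrinfo().

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netdb.h>

/* Hypothetical helper: store the first IPv4 address of `host` in *out.
 * Returns 0 on success or a getaddrinfo() error code (see gai_strerror()). */
int resolve_ipv4(const char *host, struct in_addr *out) {
    struct addrinfo hints, *res;
    int status;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_INET;       /* IPv4 only, like classic gethostbyname() */
    hints.ai_socktype = SOCK_STREAM; /* avoid one entry per socket type */

    status = getaddrinfo(host, NULL, &hints, &res);
    if (status != 0)
        return status;

    /* On success the list holds at least one entry; with AF_INET in the
     * hints, every ai_addr points to a struct sockaddr_in. */
    *out = ((struct sockaddr_in *)res->ai_addr)->sin_addr;

    freeaddrinfo(res); /* the caller never sees the list, so free it here */
    return 0;
}
```

A wrapper like this deliberately throws away most of what getaddrinfo() offers (IPv6, multiple addresses, ordering), so treat it as a stepping stone during migration rather than a target design.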

Let’s now take a closer look at its parameters and the ways to call it, and understand why it is far superior to its predecessor.

3.1 Resolving hostname (node)

We begin with a simple, typical example of client code that resolves a hostname into IP addresses:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>
#include <arpa/inet.h>
#include <netinet/in.h>

void print_ip_addresses(const char *hostname) {
    struct addrinfo hints, *res, *p;
    int status;
    char ipstr[INET6_ADDRSTRLEN];

    memset(&hints, 0, sizeof hints);          // <--- ① 
    hints.ai_family = AF_UNSPEC;              // <--- ②
    hints.ai_socktype = SOCK_STREAM;          // <--- ③

    if ((status = getaddrinfo(hostname, /* ---> ④ <--- */ NULL, &hints, &res)) != 0) {
        fprintf(stderr, "getaddrinfo: %s\n", gai_strerror(status));
        return;
    }

    printf("IP addresses for %s:\n\n", hostname);

    for(p = res; p != NULL; p = p->ai_next) { // <--- ⑤
        void *addr;
        const char *ipver;

        if (p->ai_family == AF_INET) { // IPv4
            struct sockaddr_in *ipv4 = (struct sockaddr_in *)p->ai_addr;
            addr = &(ipv4->sin_addr);
            ipver = "IPv4";
        } else { // IPv6
            struct sockaddr_in6 *ipv6 = (struct sockaddr_in6 *)p->ai_addr;
            addr = &(ipv6->sin6_addr);
            ipver = "IPv6";
        }

        // Convert the IP to a string and print it:
        inet_ntop(p->ai_family, addr, ipstr, sizeof ipstr);
        printf("  %s: %s\n", ipver, ipstr);
    }

    freeaddrinfo(res); // free the linked list
}

int main(int argc, char *argv[]) {
    if (argc != 2) {
        fprintf(stderr, "usage: %s hostname\n", argv[0]);
        return 1;
    }

    print_ip_addresses(argv[1]);

    return 0;
}

① – the hints parameter is the primary place to control the behavior of getaddrinfo(). We will be experimenting with it extensively.

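The memset() call zeroes every field of hints. Because we pass a non-NULL hints, getaddrinfo() does not fall back to the defaults it applies when hints is NULL (glibc, for example, then assumes ai_flags = AI_V4MAPPED | AI_ADDRCONFIG). Zeroing also satisfies the requirement that the fields not used as hints (ai_addrlen, ai_addr, ai_canonname, ai_next) be 0 or NULL.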

② – the address family is set to AF_UNSPEC, so getaddrinfo() returns addresses of every family it finds, both IPv4 and IPv6.

③ – we set hints.ai_socktype to SOCK_STREAM because, as I mentioned earlier, the main purpose of getaddrinfo() is to prepare data structures for subsequent socket API calls. Since we only need IP addresses here, the particular socket type doesn’t really matter. However, we don’t want to leave it at 0 (any type). If we did, glibc's getaddrinfo() would return every address three times: once each for SOCK_STREAM (TCP), SOCK_DGRAM (UDP), and SOCK_RAW.

④ – the service argument (essentially the port) is set to NULL because this simple resolver does not intend to connect to the remote host.
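The effect of the socket type hint is easy to check without touching the network. The sketch below counts the entries returned for a numeric address; count_entries() is a hypothetical helper name, and AI_NUMERICHOST is used only so that the call never queries DNS.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>

/* Hypothetical helper: number of addrinfo entries returned for a numeric
 * address with the given socket type hint, or -1 on error. */
int count_entries(const char *numeric_host, int socktype) {
    struct addrinfo hints, *res, *p;
    int count = 0;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = socktype;    /* 0 means "any socket type" */
    hints.ai_flags = AI_NUMERICHOST; /* parse the literal, never query DNS */

    if (getaddrinfo(numeric_host, NULL, &hints, &res) != 0)
        return -1;

    /* Walk the linked list and count its nodes. */
    for (p = res; p != NULL; p = p->ai_next)
        count++;

    freeaddrinfo(res);
    return count;
}
```

Comparing count_entries("127.0.0.1", SOCK_STREAM) with count_entries("127.0.0.1", 0) shows the multiplication on your system. The exact factor is libc-specific, so I would not hard-code it anywhere.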

⑤ – the result of the call is a linked list of addrinfo data structures, ready to be used in subsequent socket API calls such as socket (man 2 socket), connect (man 2 connect), bind (man 2 bind), etc.
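To illustrate what "ready to be used" means, here is a minimal sketch of the usual client loop: each node's ai_family, ai_socktype and ai_protocol go straight into socket(), and ai_addr with ai_addrlen go straight into connect(). The helper name connect_first() is hypothetical, and unlike the resolver above it passes a port as the service argument, since connect() needs one.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>
#include <unistd.h>

/* Hypothetical helper: connect to host:port over TCP, trying each returned
 * address in order. Returns a connected socket descriptor or -1. */
int connect_first(const char *host, const char *port) {
    struct addrinfo hints, *res, *p;
    int fd = -1;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;     /* IPv4 or IPv6, whatever resolves */
    hints.ai_socktype = SOCK_STREAM; /* TCP only */

    if (getaddrinfo(host, port, &hints, &res) != 0)
        return -1;

    for (p = res; p != NULL; p = p->ai_next) {
        /* No per-family code: the node carries everything socket() needs. */
        fd = socket(p->ai_family, p->ai_socktype, p->ai_protocol);
        if (fd == -1)
            continue;
        if (connect(fd, p->ai_addr, p->ai_addrlen) == 0)
            break;                   /* connected, keep this descriptor */
        close(fd);                   /* this address failed, try the next */
        fd = -1;
    }

    freeaddrinfo(res);
    return fd;
}
```

The loop never inspects whether an address is IPv4 or IPv6, which is what makes the API IP version agnostic in practice.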

Compile it with glibc:

$ sudo apt-get install gcc
$ gcc -o getaddrinfo ./getaddrinfo.c

or with musl libc:

$ sudo apt install musl musl-tools
$ musl-gcc -o getaddrinfo ./getaddrinfo.c

and run:

$ ./getaddrinfo microsoft.com
IP addresses for microsoft.com:
  IPv4: 20.76.201.171
  IPv4: 20.112.250.133
  IPv4: 20.231.239.246
  IPv4: 20.70.246.20
  IPv4: 20.236.44.162
  IPv6: 2603:1020:201:10::10f
  IPv6: 2603:1030:c02:8::14
  IPv6: 2603:1030:b:3::152
  IPv6: 2603:1030:20e:3::23c
  IPv6: 2603:1010:3:3::5b

Please note that if you have a global scope IPv6 address on your machine, IPv6 addresses will be shown first. This is due to the default behavior described in RFC 6724, which we will discuss later.

If you don’t have an IPv6 address but want to experiment with different resolver logic for dual-stack applications, you can work around this by assigning an arbitrary global scope IPv6 address to one of the interfaces (not the loopback interface):

$ sudo ip a add 2001:db8:123:456:6af2:68fe:ff7c:e25c dev eth0

To roll back to IPv4-only global scope addresses, delete it:

$ sudo ip a del 2001:db8:123:456:6af2:68fe:ff7c:e25c dev eth0

For our experiments with stub resolvers, it doesn’t matter that this address has no working network route.

Generally speaking, there are several reasons why getaddrinfo() returns a linked list. According to the documentation from man 3 getaddrinfo:

There are several reasons why the linked list may have more than one addrinfo structure, including: the network host is multihomed, accessible over multiple protocols (e.g., both AF_INET and AF_INET6); or the same service is available from multiple socket types (one SOCK_STREAM address and another SOCK_DGRAM address, for example). Normally, the application should try using the addresses in the order in which they are returned. The sorting function used within getaddrinfo() is defined in RFC 3484; the order can be tweaked for a particular system by editing /etc/gai.conf (available since glibc 2.5)

3.2 Resolving ports (services)

An interesting aspect of getaddrinfo() is that it can also resolve service names to ports using the /etc/services file. For example, if we change the above example code by:

  • passing the “domain” service (DNS, port 53) instead of NULL;
  • removing line ③ with hints.ai_socktype = SOCK_STREAM:

status = getaddrinfo(hostname, "domain", &hints, &res);


we will get every IP address twice in the output. The reason is that /etc/services lists the domain service for both TCP and UDP, so getaddrinfo() prepares two addrinfo entries, one per protocol, for every address returned from a nameserver:

$ grep domain /etc/services
domain          53/tcp                          # Domain Name Server
domain          53/udp
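The per-protocol duplication can be observed without any DNS traffic by passing NULL as the node, which makes getaddrinfo() return the loopback address. The sketch below prints the socket type and port of every entry; print_service_entries() is a hypothetical helper, and restricting the family to AF_INET is only my choice to keep the output short.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <netdb.h>

/* Hypothetical helper: print one line per addrinfo entry for `service` on
 * the IPv4 loopback address. Returns the number of entries, or -1 on error. */
int print_service_entries(const char *service) {
    struct addrinfo hints, *res, *p;
    int status, count = 0;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_INET; /* IPv4 only, so ai_addr is a sockaddr_in */
    /* ai_socktype stays 0: ask for every socket type the service supports */

    status = getaddrinfo(NULL, service, &hints, &res);
    if (status != 0) {
        fprintf(stderr, "getaddrinfo: %s\n", gai_strerror(status));
        return -1;
    }

    for (p = res; p != NULL; p = p->ai_next) {
        const struct sockaddr_in *sin = (const struct sockaddr_in *)p->ai_addr;
        const char *type = p->ai_socktype == SOCK_STREAM ? "SOCK_STREAM"
                         : p->ai_socktype == SOCK_DGRAM  ? "SOCK_DGRAM"
                         : "other";
        /* sin_port is in network byte order; convert before printing. */
        printf("%-11s port %u\n", type, (unsigned)ntohs(sin->sin_port));
        count++;
    }

    freeaddrinfo(res);
    return count;
}
```

Calling print_service_entries("domain") on a system whose /etc/services contains the two lines above should list both a stream and a datagram entry for port 53. A numeric string such as "53" is also accepted as the service and does not need /etc/services at all.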