3. getaddrinfo() and POSIX spec #
Thus, instead of the deprecated gethostbyname(), you should use libc's getaddrinfo(). The getaddrinfo() function is standardized by POSIX and defined in RFC 3493. It is IP-version agnostic and returns data structures that can be easily reused in subsequent socket API calls (such as socket(), connect(), sendto()).
First of all, if you have a codebase that uses gethostbyname() and you are looking to migrate to the modern getaddrinfo(), I have bad news: it’s not a drop-in replacement. You need to understand the new data structures, logic, and flags.
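To give a feel for the difference, here is a minimal sketch of what a typical gethostbyname() call site ("give me the first IPv4 address") might look like after migration. The helper name first_ipv4() is hypothetical, not part of any libc, and it deliberately keeps the old IPv4-only behavior:

```c
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical helper: writes the first IPv4 address of `host` into `out`
 * as text. Returns 0 on success or a getaddrinfo() error code (EAI_*). */
int first_ipv4(const char *host, char *out, size_t outlen) {
    struct addrinfo hints, *res;
    int status;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_INET;       /* IPv4 only, like classic gethostbyname() */
    hints.ai_socktype = SOCK_STREAM; /* avoid one entry per socket type */

    status = getaddrinfo(host, NULL, &hints, &res);
    if (status != 0)
        return status;               /* decode with gai_strerror(), not h_errno */

    /* On success the list holds at least one entry; take the head. */
    struct sockaddr_in *sin = (struct sockaddr_in *)res->ai_addr;
    if (inet_ntop(AF_INET, &sin->sin_addr, out, (socklen_t)outlen) == NULL)
        status = EAI_SYSTEM;         /* inet_ntop() failed; errno has details */

    freeaddrinfo(res);               /* the caller owns the list and must free it */
    return status;
}
```

Even in this small case the differences show up: errors come back as return codes rather than through h_errno, the result is a heap-allocated list that must be released with freeaddrinfo(), and the address sits inside a struct sockaddr rather than a raw byte array.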
Let’s now take a closer look at its parameters and the ways to call it, and understand why it is far superior to its predecessor.
3.1 Resolving hostname (node) #
We begin with a simple, typical example of client code that resolves a hostname into IP addresses:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>
#include <arpa/inet.h>
#include <unistd.h>

void print_ip_addresses(const char *hostname) {
    struct addrinfo hints, *res, *p;
    int status;
    char ipstr[INET6_ADDRSTRLEN];

    memset(&hints, 0, sizeof hints); // <--- ①
    hints.ai_family = AF_UNSPEC; // <--- ②
    hints.ai_socktype = SOCK_STREAM; // <--- ③

    if ((status = getaddrinfo(hostname, /* ---> ④ <--- */ NULL, &hints, &res)) != 0) {
        fprintf(stderr, "getaddrinfo: %s\n", gai_strerror(status));
        return;
    }

    printf("IP addresses for %s:\n\n", hostname);

    for(p = res; p != NULL; p = p->ai_next) { // <--- ⑤
        void *addr;
        char *ipver;

        if (p->ai_family == AF_INET) { // IPv4
            struct sockaddr_in *ipv4 = (struct sockaddr_in *)p->ai_addr;
            addr = &(ipv4->sin_addr);
            ipver = "IPv4";
        } else { // IPv6
            struct sockaddr_in6 *ipv6 = (struct sockaddr_in6 *)p->ai_addr;
            addr = &(ipv6->sin6_addr);
            ipver = "IPv6";
        }

        // Convert the IP to a string and print it:
        inet_ntop(p->ai_family, addr, ipstr, sizeof ipstr);
        printf(" %s: %s\n", ipver, ipstr);
    }

    freeaddrinfo(res); // free the linked list
}

int main(int argc, char *argv[]) {
    if (argc != 2) {
        fprintf(stderr, "usage: %s hostname\n", argv[0]);
        return 1;
    }

    print_ip_addresses(argv[1]);
    return 0;
}
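① – the hints parameter is the primary place to control the behavior of getaddrinfo(). We will be experimenting with it extensively.
The memset() call zeroes every field of the structure. Because we pass a non-NULL hints pointer, getaddrinfo() does not fall back to the defaults it applies when hints is NULL (on glibc, per man 3 getaddrinfo, a NULL hints is equivalent to ai_family = AF_UNSPEC with ai_flags = AI_V4MAPPED | AI_ADDRCONFIG); only the values we set explicitly take effect.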
② – the address family is set to AF_UNSPEC, which means it returns all existing addresses, including both IPv6 and IPv4.
③ – here, we set hints.ai_socktype to SOCK_STREAM because, as I mentioned earlier, the main purpose of getaddrinfo() is to work with socket API functions and prepare data structures for future use. In our case, when we only need IP addresses, the particular socket type doesn’t really matter. However, we don’t want to leave it at 0 (which means "any socket type"). If we did, a getaddrinfo() call would return three times as many entries due to its internal logic, which prepares one entry per address for each of TCP (SOCK_STREAM), UDP (SOCK_DGRAM), and raw (SOCK_RAW) sockets.
④ – the service argument (essentially the port) is set to NULL because, for a simple resolver, we do not intend to connect to the remote host.
⑤ – the result of the call is a linked list of addrinfo data structures, ready to be used in subsequent socket API calls such as socket (man 2 socket), connect (man 2 connect), bind (man 2 bind), etc.
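As a rough sketch of that "subsequent use", the loop below walks the list and returns a socket connected to the first address that accepts a TCP connection. The helper name connect_first() is hypothetical, and error reporting is kept to a bare minimum:

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>
#include <unistd.h>

/* Hypothetical helper: returns a connected TCP socket, or -1 on failure. */
int connect_first(const char *host, const char *service) {
    struct addrinfo hints, *res, *p;
    int fd = -1;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;     /* accept both IPv4 and IPv6 results */
    hints.ai_socktype = SOCK_STREAM; /* TCP only */

    if (getaddrinfo(host, service, &hints, &res) != 0)
        return -1;

    /* Try the entries in the order getaddrinfo() returned them. */
    for (p = res; p != NULL; p = p->ai_next) {
        /* Each entry already carries the arguments socket() and connect() need. */
        fd = socket(p->ai_family, p->ai_socktype, p->ai_protocol);
        if (fd == -1)
            continue;                /* e.g. address family not supported locally */
        if (connect(fd, p->ai_addr, p->ai_addrlen) == 0)
            break;                   /* connected; stop at the first success */
        close(fd);
        fd = -1;
    }

    freeaddrinfo(res);
    return fd;
}
```

Note that the code never inspects the address family itself: the same loop handles IPv4 and IPv6 because every field passed to socket() and connect() comes straight from the addrinfo entry.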
Compile it with glibc:
$ sudo apt-get install gcc
$ gcc -o getaddrinfo ./getaddrinfo.c
or with musl libc:
$ sudo apt install musl musl-tools
$ musl-gcc -o getaddrinfo ./getaddrinfo.c
and run:
$ ./getaddrinfo microsoft.com
IP addresses for microsoft.com:
IPv4: 20.76.201.171
IPv4: 20.112.250.133
IPv4: 20.231.239.246
IPv4: 20.70.246.20
IPv4: 20.236.44.162
IPv6: 2603:1020:201:10::10f
IPv6: 2603:1030:c02:8::14
IPv6: 2603:1030:b:3::152
IPv6: 2603:1030:20e:3::23c
IPv6: 2603:1010:3:3::5b
Please note that if you have a global scope IPv6 address on your machine, IPv6 addresses will be shown first. This is due to the default behavior described in RFC 6724, which we will discuss later.
If you don’t have an IPv6 address but want to experiment with different resolver logic for dual-stack applications, you can work around this by assigning an arbitrary global scope IPv6 address to one of the interfaces (not the loopback interface):
$ sudo ip a add 2001:db8:123:456:6af2:68fe:ff7c:e25c dev eth0
When you want to delete it and roll back to IPv4-only global scope addresses:
$ sudo ip a del 2001:db8:123:456:6af2:68fe:ff7c:e25c dev eth0
For the purposes of our experiments with stub resolvers, it’s fine that this address has no working network route.
Generally speaking, there are several reasons why getaddrinfo() returns a linked list. According to the documentation from man 3 getaddrinfo:
There are several reasons why the linked list may have more than one addrinfo structure, including: the network host is multihomed, accessible over multiple protocols (e.g., both AF_INET and AF_INET6); or the same service is available from multiple socket types (one SOCK_STREAM address and another SOCK_DGRAM address, for example). Normally, the application should try using the addresses in the order in which they are returned. The sorting function used within getaddrinfo() is defined in RFC 3484; the order can be tweaked for a particular system by editing /etc/gai.conf (available since glibc 2.5).
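To see the effect of the socket type on the list length, a small counting helper is enough. This is only a sketch; count_entries() is a hypothetical name, and it reuses the same hints setup as the example above:

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>

/* Hypothetical helper: number of addrinfo entries returned for `node`
 * with the given ai_socktype hint, or -1 if getaddrinfo() fails. */
int count_entries(const char *node, int socktype) {
    struct addrinfo hints, *res, *p;
    int n = 0;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = socktype;    /* 0 means "any socket type" */

    if (getaddrinfo(node, NULL, &hints, &res) != 0)
        return -1;

    for (p = res; p != NULL; p = p->ai_next)
        n++;                         /* one node per (address, socket type) pair */

    freeaddrinfo(res);
    return n;
}
```

Comparing count_entries(host, 0) with count_entries(host, SOCK_STREAM) makes the multiplication visible. The exact ratio may differ between libc implementations, so treat the factor of three as observed behavior rather than a guarantee.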
3.2 Resolving ports (services) #
An interesting aspect of getaddrinfo() is that it can also resolve service names to ports using the /etc/services file. For example, if we change the above example code by:
- adding a “domain” service (DNS, port 53);
- removing line ③ with hints.ai_socktype = SOCK_STREAM.
status = getaddrinfo(hostname, "domain", &hints, &res);
we will get every IP address twice in the output. The reason is that /etc/services lists the domain service for both the TCP and UDP protocols, hence getaddrinfo() prepares two entries, one per protocol, for every address returned from a nameserver:
$ grep domain /etc/services
domain 53/tcp # Domain Name Server
domain 53/udp
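The example program only prints addresses, so the duplicates look identical. A sketch like the following, which also prints the socket type and the resolved port of each entry, shows what actually distinguishes them. Both helper names, sockaddr_port() and print_service_entries(), are hypothetical:

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical helper: port in host byte order, or -1 for non-IP families. */
int sockaddr_port(const struct sockaddr *sa) {
    if (sa->sa_family == AF_INET)
        return ntohs(((const struct sockaddr_in *)sa)->sin_port);
    if (sa->sa_family == AF_INET6)
        return ntohs(((const struct sockaddr_in6 *)sa)->sin6_port);
    return -1;
}

/* Hypothetical helper: prints one line per entry and returns the number
 * of entries, or -1 if getaddrinfo() fails. */
int print_service_entries(const char *hostname, const char *service) {
    struct addrinfo hints, *res, *p;
    int n = 0;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;     /* ai_socktype stays 0: any socket type */

    if (getaddrinfo(hostname, service, &hints, &res) != 0)
        return -1;

    for (p = res; p != NULL; p = p->ai_next, n++) {
        const char *type = p->ai_socktype == SOCK_STREAM ? "SOCK_STREAM"
                         : p->ai_socktype == SOCK_DGRAM  ? "SOCK_DGRAM"
                                                         : "other";
        printf("family=%d socktype=%s port=%d\n",
               p->ai_family, type, sockaddr_port(p->ai_addr));
    }

    freeaddrinfo(res);
    return n;
}
```

Calling print_service_entries(hostname, "domain") should list each address once per socket type, all with the port that /etc/services assigns to the service. A numeric string such as "53" is also accepted as the service argument and does not require a lookup in /etc/services.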