A walk-through of a TCP handshake
tcpdump is a great tool for really making sense of what's going on "under the hood" in your
network communications. I've been called on more than once to troubleshoot an issue that required
me to dig down into the wire-protocol layer that tcpdump exposes. There's actually a more modern
graphical tool called Wireshark that exposes the same data while adding some graphical niceties,
but since the output is equivalent and it's easier to show tcpdump output in a blog post like this
one, I'll stick with tcpdump output here. In this post, I'll capture the tcpdump output of a TCP
handshake and walk through each byte of it: what each one means and what it's for.
Of course, the first thing I need to do, before I even launch a browser, is to start up tcpdump
in listening mode. As it turns out, by default, tcpdump spits out everything that passes through
your network card, which is quite a lot. To start out with, then, it's worth narrowing down exactly
what we're interested in: TCP traffic on port 443. tcpdump includes an option to filter the results
using an expression language: tcp port 443 is the filter that I'll use here. Also by default,
tcpdump only summarizes the data under the assumption that you're mostly interested in TCP
behavior. In this case, I want to see everything, so I pass in the -x option, which instructs it to
output the contents of every single data packet in hexadecimal. For what should probably be obvious
reasons, tcpdump must run as root on Unix (including Mac OS X) platforms. (If you're on Windows,
there's an equivalent program called WinDump that accepts the same parameters.)
sh-3.2# tcpdump -x tcp port 443
tcpdump: data link type PKTAP
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on pktap, link-type PKTAP (Apple DLT_PKTAP), capture size 262144 bytes
tcpdump waits until it sees any network traffic. When I open up, for example, the Amazon home
page, there's a flurry of activity. The first three packets that are exchanged are, as required,
the TCP 3-way handshake, which I step through below.
14:39:43.084497 IP localhost.54626 > server-54-192-87-96.lax3.r.cloudfront.net.https: Flags [S], seq 4183262244, win 65535, options
[mss 1460,nop,wscale 5,nop,nop,TS val 541058676 ecr 0,sackOK,eol], length 0
0x0000: XXXX XXXX XXXX YYYY YYYY YYYY 0800 4500
0x0010: 0040 94ae 4000 4006 ed7e 0a64 2007 36c0
0x0020: 5760 d562 01bb f957 8424 0000 0000 b002
0x0030: ffff 2324 0000 0204 05b4 0103 0305 0101
0x0040: 080a 203f e674 0000 0000 0402 0000
Figure 1: TCP SYN packet
The first packet that my browser exchanges with Amazon is a TCP "synchronize" (SYN) packet, shown
in figure 1. The first line of the tcpdump output above is a summary of the packet (which is all
you see if you don't ask for a full hex dump), followed by the full 78 bytes of the packet as
captured. tcpdump doesn't offer much help in interpreting the hex dump (that's what the summary
line is for, after all), but if you're at least somewhat familiar with TCP/IP, you know that this
is an Ethernet header, followed by an IP header, followed by a TCP header. Just to be on the safe
side, I've masked out my source and destination MAC addresses with YYYY YYYY YYYY and
XXXX XXXX XXXX, respectively (you'll see actual numbers if you try this yourself). These are
followed by the two-byte next protocol indicator, which is 0x0800, or the Internet Protocol (IP).
The IP header, per the IP specification (RFC 791), section 3.1, starts with a four-bit version
number 4 and a four-bit header length 5. The header length is counted in 32-bit "words", so this
packet includes 20 bytes of IP header. The contents of the IP header here are:
0x0000: 4500
0x0010: 0040 94ae 4000 4006 ed7e 0a64 2007 36c0
0x0020: 5760
(notice that the 20 byte length count includes the leading byte that declares the length in the first place).
The next byte is the "type of service" byte which is 00 here: this byte is rarely, if ever used. This is followed by the two-byte
total length of the packet 0x0040 (64 decimal) — this includes the IP header itself, but
not the 14-byte Ethernet header. 0x94ae is the fragment identifier (used in reassembling partial
packets). This fragment identifier in this case turns out to be relatively superfluous because
the second bit of the next byte 0x40 is the "do not fragment" bit. The remainder of the byte and
the next one are the fragment offset — 0 since this is a full packet which doesn't permit
fragmentation anyway. Next is the time to live 0x40 (64): each hop is responsible for
decrementing this value and discarding the packet once the value reaches 0, to prevent packets from
looping forever. The next byte, 0x06, is the protocol indicator of the next header (in this case, TCP).
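To make the field layout concrete, here is a minimal C sketch that pulls the fields described above out of the 20 IP header bytes from figure 1. The struct and function names (ip_fields, parse_ip_fields) are made up for this illustration and don't come from tcpdump or any library.

```c
#include <stdint.h>

/* The 20 IP header bytes from figure 1 (offsets 0x000e through 0x0021). */
static const uint8_t syn_ip_header[20] = {
    0x45, 0x00, 0x00, 0x40, 0x94, 0xae, 0x40, 0x00, 0x40, 0x06,
    0xed, 0x7e, 0x0a, 0x64, 0x20, 0x07, 0x36, 0xc0, 0x57, 0x60
};

/* Hypothetical holder for the fields discussed in the text. */
struct ip_fields {
    unsigned version;         /* high four bits of byte 0 */
    unsigned header_bytes;    /* low four bits of byte 0, times 4 */
    unsigned total_length;    /* bytes 2-3, big-endian */
    unsigned identification;  /* bytes 4-5 */
    int dont_fragment;        /* the 0x40 bit of byte 6 */
    unsigned fragment_offset; /* low 13 bits of bytes 6-7 */
    unsigned ttl;             /* byte 8 */
    unsigned protocol;        /* byte 9; 6 means TCP */
};

static struct ip_fields parse_ip_fields(const uint8_t *h)
{
    struct ip_fields f;
    f.version = h[0] >> 4;
    f.header_bytes = (h[0] & 0x0fu) * 4u;
    f.total_length = ((unsigned)h[2] << 8) | h[3];
    f.identification = ((unsigned)h[4] << 8) | h[5];
    f.dont_fragment = (h[6] & 0x40) != 0;
    f.fragment_offset = (((unsigned)h[6] & 0x1fu) << 8) | h[7];
    f.ttl = h[8];
    f.protocol = h[9];
    return f;
}
```

Calling parse_ip_fields(syn_ip_header) should give version 4, a 20-byte header, a total length of 64, the "do not fragment" bit set, a time to live of 64 and protocol 6, matching the walk-through above.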
The next two bytes, 0xed7e, are the header checksum. This is defined tersely in the RFC as
"the 16 bit one's complement of the one's complement sum of all 16 bit words in the header."
This would suggest that you could compute it with a routine similar to listing 1, below:
unsigned short headers[] = {...}; // headers go here
unsigned short checksum = 0x0000;
for (int i = 0; i < ((headers[0] & 0x0F00) >> 7); i++) {
checksum += headers[i];
}
headers[5] = ~checksum;
Listing 1: almost, but not quite, IP header checksum routine
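This doesn't quite work, though, because a 16-bit accumulator discards overflow: the sum has to be accumulated in a wider (say, 32-bit) variable, and any carries that end up in the high-order half have to be added back to the low-order half before inverting it, as in listing 2. The checksum field itself (headers[5]) also has to be zero while the sum is being computed.
// assumes checksum was declared as a 32-bit unsigned int rather than an unsigned short
checksum = (checksum & 0xFFFF) + ((checksum & 0xFFFF0000) >> 16);
headers[5] = ~checksum;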
Listing 2: accounting properly for overflow
This process has the benefit that the receiver can check the checksum quickly by performing the same routine and verifying that the result is 0, as expected.
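For reference, here is a self-contained sketch of the whole calculation, run against the IP header bytes from figure 1. It is one way to write it, not the only one: the function names are invented for this example, it works on raw bytes in network order rather than on an array of shorts, and it folds the carries in a loop in case a single fold produces a new carry.

```c
#include <stdint.h>
#include <stddef.h>

/* Ones' complement sum of big-endian 16-bit words, folded down to 16 bits. */
static uint16_t ones_complement_sum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;
    size_t i;
    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)data[i] << 8) | data[i + 1];
    if (i < len)                  /* odd trailing byte: pad with a zero byte */
        sum += (uint32_t)data[i] << 8;
    while (sum >> 16)             /* fold carries until none remain */
        sum = (sum & 0xFFFF) + (sum >> 16);
    return (uint16_t)sum;
}

/* Value to store in the checksum field; that field must be zero on input.
   Run over a header that already contains its checksum, this returns 0. */
static uint16_t ip_header_checksum(const uint8_t *header, size_t len)
{
    return (uint16_t)~ones_complement_sum(header, len);
}

/* IP header from figure 1 with the checksum bytes zeroed out... */
static const uint8_t header_zeroed[20] = {
    0x45, 0x00, 0x00, 0x40, 0x94, 0xae, 0x40, 0x00, 0x40, 0x06,
    0x00, 0x00, 0x0a, 0x64, 0x20, 0x07, 0x36, 0xc0, 0x57, 0x60
};

/* ...and exactly as it appeared on the wire. */
static const uint8_t header_as_sent[20] = {
    0x45, 0x00, 0x00, 0x40, 0x94, 0xae, 0x40, 0x00, 0x40, 0x06,
    0xed, 0x7e, 0x0a, 0x64, 0x20, 0x07, 0x36, 0xc0, 0x57, 0x60
};
```

ip_header_checksum(header_zeroed, 20) comes out to 0xed7e, the value in the capture, and ip_header_checksum(header_as_sent, 20) comes out to 0, which is the receiver-side check described above.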
Finally, the last 8 bytes are the source address and destination address. My IP address (at least as far as my localhost is concerned) is 10.100.32.7, and the destination address (Amazon's web server) is 54.192.87.96. Notice in figure 1 that this is actually embedded in the text host name given out by the CDN — in this case Amazon's own CloudFront CDN.
IP headers are permitted to include quite a few optional values, but none of the packets in this exchange include them; they're all "bare" 20-byte IP headers.
Moving on to the TCP header, as specified by RFC 793, which starts at byte 34 of figure 1, you see
that the first two bytes are the "source port" of 54626 (0xd562). The next two bytes are the
destination port 443 (0x01bb). Notice in the tcpdump summary line that the source port 54626 is
shown as a number, but 443, the destination, is annotated with the "friendlier" name https. Next up
are the 4-byte sequence number and the 4-byte acknowledgment number. The sequence number is
f957 8424. TCP is a "sliding window" protocol, so each exchange starts with a random sequence
number, which is then incremented on each subsequent packet by the size of the previous packet,
not including headers. A SYN carries no data but still counts as one unit of sequence space, so the
next packet sent by this socket should be (and, if you glance down a bit, actually is) f957 8425,
1 more than this one. The 4-byte acknowledgment number is 0: since the other side hasn't sent
anything, there's nothing to acknowledge.
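As a small illustration of how these fields are read, the sketch below decodes the first 12 bytes of the TCP header from figure 1. All multi-byte TCP fields are big-endian ("network order"). The helper names (read_be16, read_be32, seq_after_syn) are assumptions made for this example.

```c
#include <stdint.h>

/* Read big-endian 16-bit and 32-bit values from a byte buffer. */
static uint16_t read_be16(const uint8_t *p)
{
    return (uint16_t)(((unsigned)p[0] << 8) | p[1]);
}

static uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* First 12 bytes of the TCP header in figure 1 (offsets 0x0022 to 0x002d):
   source port, destination port, sequence number, acknowledgment number. */
static const uint8_t syn_tcp_start[12] = {
    0xd5, 0x62, 0x01, 0xbb, 0xf9, 0x57, 0x84, 0x24, 0x00, 0x00, 0x00, 0x00
};

/* The SYN itself occupies one unit of sequence space, so the sender's next
   sequence number is one higher. uint32_t arithmetic wraps modulo 2^32,
   which is also how TCP sequence numbers wrap. */
static uint32_t seq_after_syn(uint32_t initial_seq)
{
    return initial_seq + 1u;
}
```

Here read_be16(syn_tcp_start) is 54626, read_be16(syn_tcp_start + 2) is 443, and read_be32(syn_tcp_start + 4) is 4183262244, the same decimal sequence number that tcpdump prints in its summary line.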
The next byte is 0xb0; the first four bits of this are the header length, in 32-bit "words", as in
the IP header: 0xb is 11 words, or 44 bytes. The following byte, 0x02, is the "flags" byte of which
only one flag is set in this case: the next-to-last bit, indicating that this is a "synchronize"
(SYN) packet; in other words, it starts a new socket connection. 0xffff is the window size,
indicating that the receiver of this packet can respond with up to 65,535 bytes of unacknowledged
data at a time, but no more (but see the options, below). As with the IP header, the TCP header has
its own checksum which follows the window size; in this case, 0x2324. The TCP checksum calculation
is slightly more complex than the IP header's, because it covers the whole TCP segment and also
incorporates some elements from the IP header.
The unused (and mostly unspecified) "urgent pointer" that follows is 0, as it is for effectively all TCP traffic.
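The elements borrowed from the IP header form what RFC 793 calls a "pseudo header": the source address, the destination address, a zero byte, the protocol number (6) and the TCP segment length. The sketch below sums that pseudo header together with the 44-byte TCP header from figure 1, using the same ones' complement arithmetic as the IP checksum. The function and array names are hypothetical, and the sketch is IPv4-only.

```c
#include <stdint.h>
#include <stddef.h>

/* Add big-endian 16-bit words from data into a running 32-bit sum. */
static uint32_t add_words(uint32_t sum, const uint8_t *data, size_t len)
{
    size_t i;
    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)data[i] << 8) | data[i + 1];
    if (i < len)                  /* odd trailing byte: pad with a zero byte */
        sum += (uint32_t)data[i] << 8;
    return sum;
}

/* segment is the TCP header plus any data, with the checksum field
   (bytes 16 and 17) set to zero. */
static uint16_t tcp_checksum(const uint8_t src[4], const uint8_t dst[4],
                             const uint8_t *segment, size_t len)
{
    uint8_t pseudo[12];
    uint32_t sum;
    int i;
    for (i = 0; i < 4; i++) {
        pseudo[i] = src[i];
        pseudo[4 + i] = dst[i];
    }
    pseudo[8] = 0;                        /* always zero */
    pseudo[9] = 6;                        /* protocol: TCP */
    pseudo[10] = (uint8_t)(len >> 8);     /* TCP length, high byte */
    pseudo[11] = (uint8_t)(len & 0xff);   /* TCP length, low byte */

    sum = add_words(0, pseudo, sizeof pseudo);
    sum = add_words(sum, segment, len);
    while (sum >> 16)                     /* fold carries back in */
        sum = (sum & 0xFFFF) + (sum >> 16);
    return (uint16_t)~sum;
}

static const uint8_t src_addr[4] = { 0x0a, 0x64, 0x20, 0x07 }; /* 10.100.32.7 */
static const uint8_t dst_addr[4] = { 0x36, 0xc0, 0x57, 0x60 }; /* 54.192.87.96 */

/* TCP header from figure 1 (offset 0x0022 onward), checksum bytes zeroed. */
static const uint8_t syn_segment[44] = {
    0xd5, 0x62, 0x01, 0xbb, 0xf9, 0x57, 0x84, 0x24, 0x00, 0x00, 0x00, 0x00,
    0xb0, 0x02, 0xff, 0xff, 0x00, 0x00, 0x00, 0x00, 0x02, 0x04, 0x05, 0xb4,
    0x01, 0x03, 0x03, 0x05, 0x01, 0x01, 0x08, 0x0a, 0x20, 0x3f, 0xe6, 0x74,
    0x00, 0x00, 0x00, 0x00, 0x04, 0x02, 0x00, 0x00
};
```

With these inputs, tcp_checksum(src_addr, dst_addr, syn_segment, 44) works out to 0x2324, the value that appears in the capture.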
That's the end of the standard TCP header. Like IP, TCP allows for variable options to be appended to its header, but unlike IP, these are actually pretty common in TCP. You know from the header length that this is a 44-byte header, and the standard TCP header consumes exactly 20 bytes, so the remaining 24 bytes are available for options. Remember, though, that the TCP header length is given in 4-byte words, so it isn't safe to assume that all of the remaining data in the header are options (some of it may be padding); the options themselves encode enough information to process each one in turn even if you don't know ahead of time where to stop. Each option is a tag/length/value triple; the tag is the option specifier, the length is how long the option is, including the tag and length byte (which in principle limits a TCP option to 253 bytes of value; in practice the 4-bit header length caps all of the options combined at 40 bytes, which is probably a good thing, considering that these are prepended to data packets!), and the value varies depending on the tag. The TCP specification does permit the length/value to be omitted for tags which don't require data; as it turns out, there are only two of these, which I'll cover below.
The first tag is 0x02, which is specified in RFC 793 as the maximum segment size option. It's followed by a length of 0x04 (which counts the tag and length bytes themselves), leaving two bytes of value: the maximum TCP segment size of 0x05b4, or 1460 bytes. This instructs the receiver that, although the client can buffer up to 65,535 bytes of data at a time, it can only accept 1460 bytes of TCP data in any single packet. The TCP and IP headers aren't counted in this figure: 1460 bytes of data plus 20 bytes of TCP header and 20 bytes of IP header adds up to 1500 bytes, the usual maximum payload of an Ethernet frame, which is presumably where the number comes from.
This is followed by the "no-op" value 0x01. This is one of the two defined options that doesn't require (or allow) a length. This is typically used, as it is in this case, to align the next option on a word boundary.
Next, the options list continues with option 0x03. You can scour RFC 793 for an option with a tag 0x03, but you won't find one: this option was actually defined roughly a decade after TCP was, in RFC 1323. The "window scale" option is a 3-byte option and thus has a single byte of value: in this case 0x05. When TCP was first defined in 1981, an unacknowledged window of 65,535 bytes seemed like a lot; it was unlikely that the networks of the time would be able to send that much data before the application could consume and acknowledge it. However, not even 10 years later, this small window size was resulting in performance problems due to underutilization of the network. Rather than changing the TCP header specification, this option tells the receiver to left shift the window size specification 5 times (that is, multiply it by 32). This would imply that the client is advertising a buffer of 65535 * 32 = 2,097,120 bytes, but the window field of a SYN packet is itself never scaled: at this point, the client can't assume that the receiver will understand the window scaling option, so to be on the safe side, it starts out by advertising as large a window as the basic TCP specification permits.
After two no-op bytes, the next option, 0x08, of length 0x0a (10 bytes), is the timestamp value. This is also part of the RFC 1323 high-throughput specification. By affixing a timestamp to each packet, the TCP implementation can get a better measure of what sort of network delay is in place and do a better job of only retransmitting packets when it's certain that they have actually been dropped by the underlying network.
The next option byte is 0x04, "TCP selective acknowledgments permitted". Unlike the window scale and timestamp options, this one isn't part of RFC 1323; it's specified separately, in RFC 2018. The following length byte is 2: this indicates that there is no value, since the "SACK permitted" option doesn't need one. The first version of TCP required that, if a packet were lost, then all subsequent packets should be assumed to have been lost, too (remember that packets follow a sequential numbering scheme). Selective acknowledgments permit the receiver to acknowledge that packets 1 and 3 were received, but not 2, hence limiting the number of packets which need to be retransmitted. Practically speaking, all TCP implementations permit this, but it is still required to advertise support for it with this option.
There are two bytes remaining, both 0's. Per the specification, the next byte should be parsed as an option tag and it is: in this case the "end-of-list" tag, which is the other tag that doesn't require (or allow) a length byte. The final byte is just padding. Both are necessary here because the header length is counted in 4-byte words, and without them the TCP header would end "too soon".
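The tag/length/value walk described over the last few paragraphs can be sketched in C as below. This is only an illustration under a few assumptions: the struct and function names are invented here, only the four options that appear in this capture are decoded, and anything unrecognized is skipped using its length byte.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical summary of the options seen in this capture. */
struct tcp_options {
    int has_mss;          unsigned mss;
    int has_window_scale; unsigned window_shift;
    int has_timestamp;    uint32_t ts_value, ts_echo;
    int sack_permitted;
};

static uint32_t be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Returns 0 on success, -1 if an option is truncated or has a bad length. */
static int parse_tcp_options(const uint8_t *opt, size_t len,
                             struct tcp_options *out)
{
    struct tcp_options empty = {0};
    size_t i = 0;
    *out = empty;
    while (i < len) {
        uint8_t tag = opt[i];
        uint8_t optlen;
        if (tag == 0) break;                /* end-of-list: stop parsing */
        if (tag == 1) { i++; continue; }    /* no-op: a single byte */
        if (i + 1 >= len) return -1;        /* no room for a length byte */
        optlen = opt[i + 1];                /* counts tag and length bytes too */
        if (optlen < 2 || i + optlen > len) return -1;
        switch (tag) {
        case 2:                             /* maximum segment size */
            if (optlen != 4) return -1;
            out->has_mss = 1;
            out->mss = ((unsigned)opt[i + 2] << 8) | opt[i + 3];
            break;
        case 3:                             /* window scale */
            if (optlen != 3) return -1;
            out->has_window_scale = 1;
            out->window_shift = opt[i + 2];
            break;
        case 4:                             /* SACK permitted: no value */
            if (optlen != 2) return -1;
            out->sack_permitted = 1;
            break;
        case 8:                             /* timestamp value and echo reply */
            if (optlen != 10) return -1;
            out->has_timestamp = 1;
            out->ts_value = be32(opt + i + 2);
            out->ts_echo = be32(opt + i + 6);
            break;
        default:                            /* unknown tag: skip by length */
            break;
        }
        i += optlen;
    }
    return 0;
}

/* The 24 option bytes of the SYN in figure 1 (offsets 0x0036 to 0x004d). */
static const uint8_t syn_options[24] = {
    0x02, 0x04, 0x05, 0xb4, 0x01, 0x03, 0x03, 0x05, 0x01, 0x01, 0x08, 0x0a,
    0x20, 0x3f, 0xe6, 0x74, 0x00, 0x00, 0x00, 0x00, 0x04, 0x02, 0x00, 0x00
};
```

Run over syn_options, this yields an MSS of 1460, a window shift of 5, a timestamp value of 541058676 with an echo of 0, and SACK permitted, which lines up with the options tcpdump lists in its summary line.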
It's helpful to see the whole packet "unrolled" and each piece labelled as below:
Starting offset  Contents              Meaning
0x0000           XXXXXXXXXXXX          Destination MAC address
0x0006           YYYYYYYYYYYY          Source MAC address
0x000c           0800                  Next protocol type (IP)
0x000e           45                    IP version and IP header length / 4
0x000f           00                    Type of service
0x0010           0040                  Total packet length
0x0012           94ae                  Fragment identifier
0x0014           4000                  Fragmentation options
0x0016           40                    Time to live
0x0017           06                    Next protocol (TCP)
0x0018           ed7e                  Header checksum
0x001a           0a642007              Source IP address
0x001e           36c05760              Destination IP address
0x0022           d562                  Source port
0x0024           01bb                  Destination port
0x0026           f9578424              Starting sequence number
0x002a           00000000              Acknowledging sequence number
0x002e           b0                    TCP header length
0x002f           02                    TCP flags
0x0030           ffff                  Window size
0x0032           2324                  TCP header checksum
0x0034           0000                  Urgent pointer
0x0036           020405b4              Maximum segment size
0x003a           01                    No-op
0x003b           030305                Window scale
0x003e           01                    No-op
0x003f           01                    No-op
0x0040           080a203fe67400000000  Time stamp
0x004a           0402                  TCP SACK permitted
0x004c           00                    End of options
0x004d           00                    Padding
Per the TCP handshake protocol, the server is now responsible for acknowledging the SYN packet,
which it does with the next packet:
14:39:43.118141 IP server-54-192-87-96.lax3.r.cloudfront.net.https > localhost.54626: Flags [S.], seq 3050391779, ack 4183262245, win 28960, options
[mss 1460,sackOK,TS val 1898598028 ecr 541058676,nop,wscale 8], length 0
0x0000: YYYY YYYY YYYY XXXX XXXX XXXX 0800 4500
0x0010: 003c 0000 4000 f106 d130 36c0 5760 0a64
0x0020: 2007 01bb d562 b5d1 48e3 f957 8425 a012
0x0030: 7120 0489 0000 0204 05b4 0402 080a 712a
0x0040: 4e8c 203f e674 0103 0308
You can see a lot of similarities between this packet and the previous one. In fact, it's easier
just to consider the differences. First of all, byte 17, 0x3c, is the length of this packet:
four bytes shorter than the last one. This packet doesn't include an ID, and its time to live of
0xf1 (241) is a bit higher (it's probably safe to assume that it was set by Amazon to 255 and was
decremented by exactly one at each of 14 routers that it passed through to make it back to me).
The header checksum is different, of course, and finally, the source and destination IP addresses
are swapped. This makes sense because now Amazon is replying to me.
Similarly, the TCP header starts with an inverted pair of source and destination ports. The
sequence number here is b5d1 48e3: both sides maintain independent streams of sequence numbers.
Now, though, the acknowledgment number is f957 8425: one more than the sequence number of the
original SYN packet. TCP requires that each side acknowledge the next byte it expects; in other
words, the Amazon server is telling me that it expects to receive sequence number f957 8425 next.
If my computer wasn't expecting to send that sequence number, that would be an indicator to the
TCP infrastructure that a packet had been either lost or duplicated.
The length of this header is 40 bytes instead of 44 as in the previous packet, and there's an
extra flag set, the "ACK" flag. From this point on, the ACK flag is set on every packet, but this
will be the last one in this interchange which also has the SYN flag set, because the
synchronization is now considered complete: both sides have agreed on a starting sequence number
and can thus recognize and recover from lost or duplicated packets. Interestingly, Amazon
advertises a smaller window size of 0x7120 (28,960 bytes). The set of options is the same (Amazon
is essentially agreeing to the options that my browser proposed, although it picks its own window
scale of 8), but it's interesting to see that they're presented in a different order, and that
they're aligned precisely at the end of the header so that the 0x0 "end-of-list" tag is not
present (or needed).
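The header-length and flags bytes are easy to decode by hand, but a short sketch may help when comparing packets. The flag bit values below come from RFC 793; the function names and the idea of mapping a flags byte to a "handshake step" are just conveniences invented for this example.

```c
#include <stdint.h>

/* Flag bits within the TCP flags byte, per RFC 793. */
enum {
    TCP_FIN = 0x01, TCP_SYN = 0x02, TCP_RST = 0x04,
    TCP_PSH = 0x08, TCP_ACK = 0x10, TCP_URG = 0x20
};

/* The high four bits of the byte before the flags give the header length
   in 32-bit words; multiply by 4 for bytes. */
static unsigned tcp_header_bytes(uint8_t offset_byte)
{
    return (unsigned)(offset_byte >> 4) * 4u;
}

/* Maps a flags byte onto a step of the three-way handshake:
   1 = SYN, 2 = SYN+ACK, 3 = bare ACK, 0 = anything else.
   A bare ACK also describes most later packets in a connection, so a
   result of 3 only means "looks like the third step". */
static int handshake_step(uint8_t flags)
{
    int syn = (flags & TCP_SYN) != 0;
    int ack = (flags & TCP_ACK) != 0;
    if (flags & (TCP_FIN | TCP_RST))
        return 0;
    if (syn && !ack)
        return 1;
    if (syn && ack)
        return 2;
    if (ack)
        return 3;
    return 0;
}
```

For the two packets shown so far, tcp_header_bytes(0xb0) is 44 and handshake_step(0x02) is 1, while tcp_header_bytes(0xa0) is 40 and handshake_step(0x12) is 2.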
The TCP handshake is still not quite finished, though. My browser still has to acknowledge
the acknowledgment, which it does with its next packet:
14:39:43.118213 IP localhost.54626 > server-54-192-87-96.lax3.r.cloudfront.net.https: Flags [.], ack 1, win 4117,
options [nop,nop,TS val 541058709 ecr 1898598028], length 0
0x0000: XXXX XXXX XXXX YYYY YYYY YYYY 0800 4500
0x0010: 0034 eda4 4000 4006 9494 0a64 2007 36c0
0x0020: 5760 d562 01bb f957 8425 b5d1 48e4 8010
0x0030: 1015 9440 0000 0101 080a 203f e695 712a
0x0040: 4e8c
This one is almost identical to the first packet now, since the source and destination are the
same. The two main differences are that the SYN flag is no longer set and the acknowledgment number
(b5d1 48e4, one more than Amazon's starting sequence number) is included. The window size is now
0x1015 (4117), scaled by 32 to 131,744, since both sides have established now that they support
window scaling. Finally, there are fewer options in this case; the only option provided is the
timestamp option (which must be present on every packet if TCP timestamps are being used).
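The window arithmetic is simple enough to write down as a sketch. The function name is made up for this example; the shift used is the one the sender of the packet announced in its own SYN (5 for my browser, 8 for Amazon), the window field of a SYN or SYN/ACK is never scaled, and RFC 1323 limits the shift to 14.

```c
#include <stdint.h>

/* Effective receive window in bytes for a given 16-bit window field.
   shift is the window scale the sender of this packet advertised in its
   SYN; is_syn is nonzero for SYN and SYN/ACK packets, whose window field
   is taken as-is. */
static uint32_t effective_window(uint16_t window_field, unsigned shift,
                                 int is_syn)
{
    if (is_syn)
        return window_field;
    if (shift > 14)          /* RFC 1323 limits the shift count to 14 */
        shift = 14;
    return (uint32_t)window_field << shift;
}
```

For the final ACK of the handshake, effective_window(0x1015, 5, 0) gives 131744, whereas the original SYN's 0xffff stays at 65535 because it was sent before scaling was agreed on.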
At this point, the TCP handshake is done — both sides are ready to send each other data but as of now, any interim router (which it looks like there are at least 14 of) can see and log any data that's exchanged. Since the requested protocol was https, indicated by the choice of destination port 443, though, the client (my browser) knows to now begin an SSL handshake before transmitting even a single byte of HTTP data. In my next post, I'll walk through the (considerably more involved) SSL handshake that also occurs before the browser and the server can begin transferring actual HTTP messages.