Heartbleed: What the heck happened
So, anybody who administers a web server — or anybody who uses a web server administered by somebody else — has by now heard of the catastrophic "11 on a scale of 1 to 10" Heartbleed vulnerability in OpenSSL. Other than having a very cool, very memorable name that spurs even non-technical people into action, what exactly is Heartbleed?
Heartbleed is the semi-informal name for CVE-2014-0160. It's a bug in OpenSSL, an open source implementation of the SSL protocol, a protocol originally developed by Dr. Taher ElGamal for Netscape Communications in 1995. SSL, short for "Secure Sockets Layer", is in turn an open protocol specification for adding security to an insecure network connection by way of public key cryptography. For various technical reasons, SSL hasn't actually been called SSL for a very long time — usually when people refer to SSL, they actually mean TLS (Transport Layer Security), which was standardized by the IETF in 1999. (TLS v1.0 is technically SSL v3.1.) So, although essentially all work on OpenSSL for the past decade and a half has been in support of TLS, it's still called OpenSSL, mostly because SSL sounds catchier than TLS.
Other than the proprietary implementation in Netscape's Navigator browser and Commerce Server web server, OpenSSL is the oldest implementation of the protocol. It began in 1996 as SSLeay, when Eric A. Young (the "EAY" in SSLeay) and Tim Hudson started working on an open-source implementation of SSL, then in its second version (SSLv2). In 1998, both went to work for RSA Security and were unable to continue working on the open source project; it was forked and rebranded as OpenSSL, under which name it remains the most popular implementation of SSL/TLS in use. It isn't the only one; GnuTLS and NSS are two open source alternatives also in common use, but OpenSSL is by far the most complete implementation and remains a "fan favorite" for reasons of completeness as well as familiarity.
SSL: Converting an insecure channel to a secure one
To properly understand Heartbleed, it's necessary first to understand what OpenSSL, and by extension SSL, does and how it does it. The Internet is a communications network for computers; it facilitates the exchange of messages over potentially large distances, even when a direct connection is unavailable. More specifically, the Internet is a packet-switching network — this means that each message to be transmitted from computer A to computer B is divided into smaller packets, each of which is transmitted individually to the next-nearest node along the path from point A to point B. So, for instance, the computer you're reading this on likely doesn't have a direct connection to my hosting provider; instead, you're probably connecting over WiFi to a wireless router that's connected to an access point maintained by your cable company. In turn, your cable company probably doesn't have a direct connection to my hosting provider, but they do have a connection to a network switch that is "closer". So, when you requested this web page, that request was segmented into smaller packets, transmitted one at a time to your wireless router, then transmitted again to your cable company, then again to a "tier 1 switch", and so on until they reached my hosting provider.
Once all of the packets in your "show me the Heartbleed article from commandlinefanatic.com" request made it, they were reassembled by my hosting provider, and this web page was packetized and sent back. It may have been sent back along the same route as the original request, but it may not have been. In fact, each individual packet may have taken a different route; it's up to the receiving computer to collect and reassemble all of these packets and potentially request that a packet be re-sent if it's lost along the way (it happens). This is all fairly complex, but it was standardized by the Internet Engineering Task Force as TCP, the Transmission Control Protocol, as long ago as 1981. TCP provides the backbone of the modern internet and does its job admirably well.
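To make the stream abstraction concrete, here's a minimal sketch, in C with the standard sockets API, of the client side of such an exchange; the host name and the request are just placeholders. The application reads and writes an ordinary stream of bytes, and TCP quietly handles the packetizing, sequencing, and retransmission underneath.

    /* Minimal TCP client sketch: the application sees a byte stream;
     * packetization, sequencing and retransmission are handled by the
     * kernel's TCP implementation.  Error handling is abbreviated. */
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <netdb.h>
    #include <sys/socket.h>

    int main(void)
    {
        struct addrinfo hints = {0}, *res;
        hints.ai_socktype = SOCK_STREAM;          /* stream = TCP */

        /* "example.com" stands in for any web server */
        if (getaddrinfo("example.com", "80", &hints, &res) != 0)
            return 1;

        int fd = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
        if (fd < 0 || connect(fd, res->ai_addr, res->ai_addrlen) < 0)
            return 1;

        const char *req = "GET / HTTP/1.1\r\nHost: example.com\r\nConnection: close\r\n\r\n";
        write(fd, req, strlen(req));              /* may be split into many packets */

        char buf[4096];
        ssize_t n;
        while ((n = read(fd, buf, sizeof(buf))) > 0)
            fwrite(buf, 1, n, stdout);            /* arrives reassembled, in order */

        close(fd);
        freeaddrinfo(res);
        return 0;
    }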
One thing that TCP does not address, however, is privacy. When you request a web page, that request goes through several hops before it reaches its final destination, and a network administrator at any of those hops can take a look at the contents of your request or the response. The folks at Netscape who saw the commercial potential of the Internet in the mid-'90s sought to address this with SSL. SSL encrypts requests and responses in flight in a fairly clever way, taking advantage of public key cryptography concepts to establish a secure channel over the inherently insecure internet. The packets still include routing information exposing the source and destination (as they must in order for the underlying routing infrastructure to do its job), but the contents of each packet are protected from eavesdroppers using strong cryptography. So strong, in fact, that the US government restricted the export of strong SSL implementations for years.
SSL also uses public-key cryptography to guard against a more subtle vulnerability. When you're transmitting packets from one router to another, leaving it up to each router to pass your packets on to either the final destination or a router that is logically closer, there's nothing stopping a malicious router from just pretending to be the receiver. So SSL is designed around a network of trust in which certificate authorities vouch for the association of a public key with a specific host name; the trusted certificate authorities digitally sign such assertions (called certificates) in a way that your browser can verify. One consequence of this is that certificate signing — the verifiable association of a host name with a public key — is a relatively heavyweight process that incurs a moderate financial cost as well, so it's something that website operators typically do only once a year. A public key, then, must be strong enough to protect at least a year's worth of transactions.
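Here's a rough sketch of what that verification looks like from a client's point of view, using OpenSSL's own (1.0.x-era) API; the CA bundle path is an assumption that varies by system, and a real client must also check that the name in the certificate matches the host it intended to reach.

    /* Sketch of certificate verification with OpenSSL (1.0.x-era API).
     * The CA bundle path is an assumption; adjust it for your system. */
    #include <stdio.h>
    #include <openssl/ssl.h>
    #include <openssl/err.h>

    SSL *open_verified_connection(int connected_tcp_socket)
    {
        SSL_library_init();
        SSL_load_error_strings();

        SSL_CTX *ctx = SSL_CTX_new(SSLv23_client_method());
        /* Trust anchors: the certificate authorities the client accepts */
        SSL_CTX_load_verify_locations(ctx, "/etc/ssl/certs/ca-bundle.crt", NULL);
        SSL_CTX_set_verify(ctx, SSL_VERIFY_PEER, NULL);

        SSL *ssl = SSL_new(ctx);
        SSL_set_fd(ssl, connected_tcp_socket);

        /* The handshake: the server presents its certificate chain, and
         * OpenSSL checks the CA signatures against the trust anchors above. */
        if (SSL_connect(ssl) != 1 ||
            SSL_get_verify_result(ssl) != X509_V_OK) {
            ERR_print_errors_fp(stderr);
            SSL_free(ssl);
            SSL_CTX_free(ctx);
            return NULL;
        }
        return ssl;   /* caller still has to verify the host name itself */
    }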
Datagram Networks
As you look over the description of TCP, though, you may notice that it's not necessarily a one-size-fits-all networking approach. TCP was designed around the concept of "streams" of data wherein it's assumed that each side will be sending a lot of data back and forth — hence the need for packetization. Technically speaking, there's a lot of overhead in splitting data into packets, sequencing them, resequencing them, requesting missing packets, potentially throttling a sender who is sending packets faster than a receiver can manage them — what if you don't need all that overhead? TCP provides reliability, but what if speed is more important than reliability?
At the same time that the IETF was standardizing TCP, they developed a parallel protocol called UDP, the "User Datagram Protocol", which is much simpler, and much faster, than TCP. UDP doesn't packetize anything, doesn't sequence things, doesn't request that things be re-sent or slowed down; in essence, it's TCP with the TCP part stripped out. Although you have to be a major networking geek to even know what UDP is, there are practical applications of it. For instance, NTP, a standard protocol for clock synchronization over the internet, is defined in terms of UDP rather than TCP. This makes sense; if a time server broadcasts the fact that it's currently 1:07 PM, and this broadcast gets lost, there's no value in asking the time server to resend it.
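As an illustration of how lightweight a datagram exchange can be, here's a minimal SNTP-style time query in C; pool.ntp.org is used purely as an example server. One datagram goes out, one comes back, and if either is lost the sensible recovery is simply to ask again.

    /* Sketch of a single-datagram exchange over UDP: an SNTP time query.
     * One sendto(), one recvfrom(); nothing is ever retransmitted. */
    #include <stdio.h>
    #include <unistd.h>
    #include <netdb.h>
    #include <sys/socket.h>

    int main(void)
    {
        struct addrinfo hints = {0}, *res;
        hints.ai_socktype = SOCK_DGRAM;              /* datagram = UDP */
        if (getaddrinfo("pool.ntp.org", "123", &hints, &res) != 0)
            return 1;

        int fd = socket(res->ai_family, res->ai_socktype, res->ai_protocol);

        unsigned char pkt[48] = { 0x1B };            /* LI=0, version 3, mode 3 (client) */
        sendto(fd, pkt, sizeof(pkt), 0, res->ai_addr, res->ai_addrlen);

        /* The whole response is one packet; no reassembly, no retransmission */
        ssize_t n = recvfrom(fd, pkt, sizeof(pkt), 0, NULL, NULL);
        if (n >= 44) {
            /* transmit timestamp: seconds since 1900, big-endian, at offset 40 */
            unsigned long secs = ((unsigned long)pkt[40] << 24) | (pkt[41] << 16) |
                                 (pkt[42] << 8) | pkt[43];
            printf("server time: %lu seconds since 1900\n", secs);
        }

        close(fd);
        freeaddrinfo(res);
        return 0;
    }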
UDP is particularly enticing for Voice over IP (VoIP) protocols. For two users to carry on a voice conversation, the connection has to deliver packets as fast as possible, and there's no value at all in retransmitting a lost packet; the packets must be delivered as they're received, with little or preferably no buffering, to maintain the continuity of the conversation. But arguably the most important use of UDP on the modern internet — and one that you interact with every day whether you're aware of it or not — is the name resolution protocol DNS (Domain Name System), which standardizes how the IP addresses that correspond to domain names like commandlinefanatic.com can be requested.
In fact, there are quite a few potential use cases for UDP, but one barrier to their adoption was the fact that SSL/TLS was defined in terms of TCP, not UDP. This meant that there were no privacy protections for network protocols based on UDP. In 2006, the IETF drafted a new standard called DTLS (Datagram Transport Layer Security) that was designed to provide privacy and integrity protections to datagram protocols such as UDP. However, there is something of a square peg in a round hole effect when you try to apply TLS to UDP.
Applying SSL Concepts to Datagram Networks
Cryptography in its modern form has been around since roughly 1974, when IBM developed the cipher that the United States government would adopt as the Data Encryption Standard (DES). Although DES is too weak by modern standards to provide real protection for data, ciphers in practical use such as the current Advanced Encryption Standard (AES) are variants of the same basic concept. Data to be encrypted (plaintext) is combined with a secret key by the sender before transmission to produce ciphertext. The receiver also has access to the secret key and combines it with the ciphertext on receipt to recover the plaintext. As long as the key remains secret, nobody except the sender or the receiver can recover the original plaintext.
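The sketch below illustrates that symmetric round trip using OpenSSL's EVP interface with AES in CBC mode; the key and IV are hard-coded placeholders, where a real system derives fresh values for every session.

    /* Sketch of symmetric encryption with OpenSSL's EVP interface: the same
     * secret key encrypts on one side and decrypts on the other.  The key
     * and IV are hard-coded placeholders; real code derives fresh ones. */
    #include <stdio.h>
    #include <openssl/evp.h>

    int main(void)
    {
        unsigned char key[EVP_MAX_KEY_LENGTH] = "placeholder key!";
        unsigned char iv[EVP_MAX_IV_LENGTH]   = "placeholder iv!";
        unsigned char plaintext[] = "attack at dawn";
        unsigned char ciphertext[64], recovered[64];
        int len, ciphertext_len, recovered_len;

        /* Sender: plaintext combined with the key produces ciphertext */
        EVP_CIPHER_CTX *enc = EVP_CIPHER_CTX_new();
        EVP_EncryptInit_ex(enc, EVP_aes_128_cbc(), NULL, key, iv);
        EVP_EncryptUpdate(enc, ciphertext, &len, plaintext, sizeof(plaintext));
        ciphertext_len = len;
        EVP_EncryptFinal_ex(enc, ciphertext + ciphertext_len, &len);
        ciphertext_len += len;
        EVP_CIPHER_CTX_free(enc);

        /* Receiver: the same key recovers the plaintext from the ciphertext */
        EVP_CIPHER_CTX *dec = EVP_CIPHER_CTX_new();
        EVP_DecryptInit_ex(dec, EVP_aes_128_cbc(), NULL, key, iv);
        EVP_DecryptUpdate(dec, recovered, &len, ciphertext, ciphertext_len);
        recovered_len = len;
        EVP_DecryptFinal_ex(dec, recovered + recovered_len, &len);
        recovered_len += len;
        EVP_CIPHER_CTX_free(dec);

        printf("recovered %d bytes: %s\n", recovered_len, recovered);
        return 0;
    }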
This works perfectly well, but management of this secret key becomes a problem — how can I safely establish a secret key with somebody over the internet without having one in the first place? One early solution to this problem was Kerberos, which involved a centralized ticket-granting server with which a secure key was established offline; that key would then be used to hand out one-time-use keys for communicating parties. As you can imagine, this wouldn't scale to the size of the modern internet, and the centralized ticket-granting service becomes somewhat suspect from a trust perspective.
A better solution is public key cryptography. DES and AES are examples of symmetric cryptography — both sender and receiver must share a key. Public key cryptography splits a key into two pieces, public and private. Messages are encrypted with the public key, but can only be decrypted with the private key. This allows the key's owner to broadcast the public key, not caring who might be listening, and then accept messages encrypted by anybody, messages which only he can decrypt (being the sole holder of the private key). Without going into the details, one characteristic of public key cryptography is that it necessarily relies on mathematical operations that are easy to perform in one direction but prohibitively slow to reverse. As a result, public key operations are much slower than symmetric ones, so in general they're used only to negotiate a one-time-use symmetric key, which then takes over the protection of the channel. The bulk of the TLS protocol revolves around standardizing exactly this operation.
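A sketch of that handoff, using OpenSSL's generic EVP_PKEY envelope functions, might look like the following; the real TLS key exchange carries more structure than this, and the server_public_key parameter is assumed to have already been extracted from the server's certificate.

    /* Sketch of the "expensive" public-key step: a randomly generated
     * symmetric key is encrypted under the server's public key, so that
     * only the holder of the matching private key can recover it.
     * "wrapped" receives the encrypted key to send to the server. */
    #include <stddef.h>
    #include <openssl/evp.h>
    #include <openssl/rand.h>

    int send_session_key(EVP_PKEY *server_public_key,
                         unsigned char *wrapped, size_t *wrapped_len)
    {
        unsigned char session_key[16];               /* one-time symmetric key */

        if (RAND_bytes(session_key, sizeof(session_key)) != 1)
            return 0;

        EVP_PKEY_CTX *pctx = EVP_PKEY_CTX_new(server_public_key, NULL);
        if (pctx == NULL)
            return 0;

        /* The slow public-key operation, performed once per session */
        if (EVP_PKEY_encrypt_init(pctx) <= 0 ||
            EVP_PKEY_encrypt(pctx, wrapped, wrapped_len,
                             session_key, sizeof(session_key)) <= 0) {
            EVP_PKEY_CTX_free(pctx);
            return 0;
        }
        EVP_PKEY_CTX_free(pctx);

        /* From here on, session_key drives a fast symmetric cipher (e.g. AES) */
        return 1;
    }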
Enter UDP. In the context of TCP (by design a relatively long-lived connection), the whole public key negotiation works great. When the connection is initially established, an "expensive" public-key handshake takes place to create a secure context which is then used over the lifetime of the connection. But UDP, by definition, is connectionless. Each packet — on the order of a few hundred bytes — is its own individual connection. Since UDP is only useful for applications where a certain amount of packet loss is acceptable, there's no guarantee of continuity. In some ways, DTLS adds a minimal amount of "TCP-ness" to a UDP "session"; at least enough to allow a negotiated key to be reused over a longer period of time than the context of a single packet.
What this means, though, is that neither side can ever know when it's safe to discard the keying material. With TCP, there's a logical connection — it has a beginning and an end; when one side closes the connection, the key can be safely discarded. UDP has no close message, because the context is a single packet. In February 2012, the IETF sought to address this problem with the TLS/DTLS heartbeat extension.
The DTLS Heartbeat
The heartbeat extension was simple enough: one side would send a heartbeat request, and if the other side didn't respond with a heartbeat response quickly enough, the connection could be assumed to be closed and the keying material safely discarded. However, the protocol designer sought to address one extra concern — Path MTU discovery. Each hop between sender and receiver has a maximum message size, and that size can vary from one hop to the next. So a packet that's perfectly acceptable to your WiFi router might be discarded by a tier 1 backbone router. For TCP, this isn't a problem, since messages can be packetized as small as they need to be to fit through a given hop. But UDP doesn't support packetization — each message is a packet, so if a router encounters a UDP message that's too big, it discards it and sends a control message back to the sender.
DTLS throws a monkey wrench into this whole process, though, because it adds overhead to each packet that the sender is unaware of. So, while adding a heartbeat to DTLS, RFC 6520 additionally expanded the heartbeat protocol to permit Path MTU discovery. The application could transmit the largest-sized message it needed to support, and DTLS would add its own headers and footers and try to get it through to the receiver. If the receiver got it, it would be responsible for reflecting back the entirety of the message so that the sender knew the entire message was received without truncation. If the message was never received, the DTLS layer could try another one, a bit smaller, until it determined the largest message that could be delivered to the receiver.
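For reference, the message RFC 6520 defines is just a one-byte type, a two-byte payload length, the payload itself, and at least 16 bytes of padding. The sketch below builds a well-behaved request; the function name is my own, invented for illustration, and a real implementation sits inside the (D)TLS record layer and uses random padding.

    /* A heartbeat message per RFC 6520: a one-byte type, a two-byte
     * payload length, the payload, and at least 16 bytes of padding.
     * This builds an honest request into a caller-supplied buffer. */
    #include <string.h>
    #include <stdint.h>
    #include <stddef.h>

    #define HB_REQUEST  1
    #define HB_RESPONSE 2

    size_t build_heartbeat(uint8_t *out, const uint8_t *payload, uint16_t payload_len)
    {
        size_t i = 0;

        out[i++] = HB_REQUEST;
        out[i++] = payload_len >> 8;           /* claimed payload length, big-endian */
        out[i++] = payload_len & 0xFF;
        memcpy(out + i, payload, payload_len); /* an honest client sends exactly as
                                                  many payload bytes as it claimed */
        i += payload_len;
        memset(out + i, 0, 16);                /* minimum 16 bytes of padding (random
                                                  in practice, ignored by the receiver) */
        return i + 16;
    }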
OpenSSL's charter is to implement the TLS protocol and all of its related protocols, so after RFC 6520 was standardized, support was added. However, the implementer made one minor mistake — one that all C programmers have made at one time or another — when he didn't check the length parameter that was passed in. The protocol required the implementation to accept a variable-sized buffer, prepended with its length, which would be reflected back. Unfortunately, the implementer forgot to check whether the reported size of the buffer was larger than the actual packet that had been received. The attack, then, was simple: send a 20-byte heartbeat request, but self-report that the payload is 64K long. The server receives it and reflects back the 20 bytes of the request, along with whatever the contents of the next 64K or so of memory happened to be. This exposed a lot of data, potentially including the private key with which all of the symmetric keys were protected — effectively exposing the plaintext contents of every client/server communication the server was or would be involved with.
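In simplified form (this is an illustration of the pattern, not the literal OpenSSL source), the responder's mistake looks something like this: the length used for the copy comes from the attacker-controlled field inside the message rather than from the size of the record that actually arrived.

    /* Simplified sketch of the vulnerable pattern (an illustration, not the
     * literal OpenSSL code).  The length used for the copy comes from the
     * message itself rather than from the size of the record received. */
    #include <stdlib.h>
    #include <string.h>
    #include <stdint.h>
    #include <stddef.h>

    uint8_t *build_heartbeat_response(const uint8_t *record, size_t record_len,
                                      size_t *response_len)
    {
        uint16_t claimed_len = (record[1] << 8) | record[2]; /* attacker-controlled */
        const uint8_t *payload = record + 3;

        /* The missing bounds check.  Without it, claimed_len can be far larger
         * than record_len, and the memcpy below reads past the request into
         * whatever happens to sit next to it in memory:
         *
         *     if (3 + (size_t)claimed_len + 16 > record_len)
         *         return NULL;        // silently discard, per RFC 6520
         */

        uint8_t *resp = malloc(3 + (size_t)claimed_len + 16);
        if (resp == NULL)
            return NULL;

        resp[0] = HB_RESPONSE;                     /* heartbeat_response type */
        resp[1] = claimed_len >> 8;
        resp[2] = claimed_len & 0xFF;
        memcpy(resp + 3, payload, claimed_len);    /* the over-read happens here */
        memset(resp + 3 + claimed_len, 0, 16);     /* padding */

        *response_len = 3 + (size_t)claimed_len + 16;
        return resp;
    }

The fix that eventually shipped amounts to exactly that kind of bounds check: if the claimed payload length, plus header and padding, exceeds the length of the record actually received, the message is silently discarded, as RFC 6520 requires.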
Heartbleed was made public on April 7, 2014, but the bug was introduced on March 14, 2012 with the 1.0.1 release of OpenSSL. Although it usually takes a while for server administrators to upgrade their software — especially their libraries — 1.0.1 was also the first release of OpenSSL to support TLS 1.1, which included the fix for the BEAST attack that was then striking fear into the hearts of many server administrators, propelling them to upgrade quickly. (Ironically, Heartbleed ended up being much worse than BEAST... not that server admins who upgrade responsibly shouldn't be commended.) It's likely that this vulnerability was open on many servers for two years; it's unknown how many black-hat hackers may have discovered it before the responsible disclosure by Neel Mehta of Google. Web site administrators are urging everybody to reset their passwords, but it will never be known how much data may have leaked in that time period.