Heartbleed @ AWS (Amazon)

Post Reply
faith
Posts: 25
Joined: Mon Jan 28, 2019 7:19 am
Has thanked: 0
Been thanked: 0
Contact:

Heartbleed @ AWS (Amazon)

Post by faith » Sat Apr 20, 2019 6:09 pm

Via Colm MacCárthaigh


I think right around this minute is just about exactly 5 years since the Heartbleed vulnerability in OpenSSL became public. I remember the day vividly, and if you're interested, allow me to tell you about how the day, and the subsequent months, and years unfolded ...
Five years ago I was the Principal Engineer for @AWS Elastic Load Balancer. I was about a year into that, having moved over to build some cool tech that would later become AWS HyperPlane. Previously I'd worked on CloudFront and Route 53 and DDOS stuff.

CloudFront and ELB are easily two of the biggest TLS/SSL things at Amazon, and I'd previously worked on OpenSSL things, like Apache's mod_ssl, so then the issue went public ... I was one of the first people paged. I was on the 14th floor of our Blackfoot building. It was very very quickly evident that Heartbleed wasn't like other vulnerabilities. Normally there's a window between going public and exploits being crafted, but heartbleed was so easy to exploit that it took just minutes of poking around.Heartbleed was a memory disclosure vulnerability, which in theory is supposed to be less significant than a remote execution vulnerability, but this was scarier than any bug I'd ever seen. XKCD has an explainer
The TLS protocol had been extended to include a "Heartbeat" extension. It was intended for keep-alives and MTU discovery for DTLS, which uses UDP, but OpenSSL had included it in regular TLS too (which uses TCP).And at bottom, the bug was simple, you send a small amount of data, and ask the server to send you back up to 16k of data, and it would send back 16K of decrypted plaintext from memory. URLs, passwords, cookies, credit cards, just about anything could be in there. Ouch ouch.

OpenSSL was and is very very widely used, just about everyone was impacted in some way. AWS services, our competitors services, basically all of our customers in their own stacks. It felt like the internet was on fire.At Amazon we use conference calls for high severity events, usually operational, this was declared a security sev-1 (I've never seen another like this). Call leader that day was Kevin Miller. He just happened to be at all, but it worked out well because he had crypto experience.We quickly figured that we'd be patching everything that day, so an emergency was declared and all AWS software deployments were paused. This is incredibly disruptive, but the call leader has the authority to do this on their own. Our CEO and SVP agreed with the call.Within Amazon, we have our own package system called Brazil. At the time a part of Amazon.com (retail) owned our internal OpenSSL package, but over on ELB we took it over that day and came up with a minimal 2-line hot-patch. Didn't want new risks.

Within about an hour, deployments with the hot patch were in progress, and it went out quicker than I've seen anything. Within a matter of hours, AWS was 100% patched. Even 5 years ago, this was millions of deployments. Amazingly, there were no reports of customer impact either. In parallel to that were discussions about customer messaging and notification. We were asked to analyze if we thought private keys could have been disclosed. This wasn't an easy call. It looked like keys weren't leaking, but intermediate data used as part of key operations was.My best guess on the day was that enough material was in there that keys could be at risk. I recommended thatl customers rotate and revoke keys if they can, and our CISO and CEO took that as good enough and began that painful process.About a week later, that hunch was proved right, we know for sure because CloudFlare ran a contest to see if folks could re-assemble keys and they could. Impressive stuff!
To backtrack a little: once the HeartBleed website went live (which incidentally was hosted on AWS S3! and there was never event a hint of taking it down) we started getting a *lot* of customer contacts.

HeartBleed was really well marketed, which is a good thing! Months later in a presentation I showed that it made more headlines and news articles in one day than any war had since Vietnam. Good because people patched. 98% of customers patched within a week.I know that because on the night of Heartbleed we did something we never did before: we started vulnerability scanning every EC2 IP address and sending customers notifications. We thought it was a big enough deal that the emails would be worth it.
The day after Heartbleed, our core cryptography people met, I remember another collegue was there, and we did a few more things with the OpenSSL package. Amazon's OpenSSL has always been a bit different than the public one, but that day we created a new "hardened" branch.I won't go into what we did with it here, but quite a bit at the time, Emilia Kasper included some of the changes into base OpenSSL later I think. Our customers mostly upgraded to the latest public version from OpenSSL, which we had in Amazon Linux too.Unfortunately we had a few customers stuck though; their OpenSSL libraries were embedded in commercial software that they couldn't quickly upgrade. One of our VPs reached out "Is there anything we can do here?"

So at about 2AM, I wrote a Netfilter plugin that could block heart bleed using the Linux Kernel firewall. It's still on Github it tracks the TLS record layer state machine and would drop any heartbeat messages. Crude but effective.A Linux netfilter conntracking module that understands TLS records - In our annual planning, we had raised the idea of writing our own TLS/SSL implementation because we thought we could better, but it was a nascent plan. Well that went from nascent to DO IT NOW. I started writing when became Amazon s2n. It took about 5 weekends, just me, and there's something very special about finally getting a bunch of code together and seeing it work in a browser. It took a little longer, and 3 intense security reviews, to get approval to Open Source it, but our CEO was very supportive.

s2n is coded specifically in a way to try to avoid the problem heartbleed hit. Rather than parse memory into integers using pointers directly, all across the code, s2n uses a "stuffer" data structure that includes a cursor. Similar to BoringSSL's crypto_bytes, or DJB's stralloc.Oh BoringSSL! In the months after HeartBleed, the industry rallied to get OpenSSL more funding and support through the core infrastructure initiative. We still take part! And the BoringSSL and LibreSSL forks of OpenSSL happened. Great work from each!, that has also born fruits and helped us produce tools that can analyze cryptography code and find even subtle problems.

I'm almost done, but before I finish, I kind of depressing twist on this whole thing: The Heart Beat extension never really made any sense to begin with. A 0-byte record could have been used as a keep-alive, and ordinary path MTU discovery works for UDP!

All of this trouble for a feature that to this day I can't even think of a good use case for. This is one reason why "Don't do less well. Do less, well." resonates with me as a motto.

Thanks For Reading



Post Reply

Who is online

Users browsing this forum: No registered users and 1 guest