As the Wanna(cry|crypt) malware campaign slows down, it's a good time to analyze what happened, particularly between 12th and 15th May 2017. Why this timeframe? Because it was the most critical period for an enterprise reacting to an unexpected, highly visible and, in fact, very dangerous threat. After 15th May, incident response shifted to calmer internal planning and to answering political questions, which matters little from a CISO's point of view.
Why could WannaCry spread so fast?
On May 12th, many organizations faced a previously unknown ransomware that spread rapidly across internal networks without any user interaction. Clearly, this malware carried a worm component exploiting an unpatched, network-level vulnerability in Windows systems. Many mission-critical systems still run old Windows versions connected to LANs (in the UK, a reported 90 percent of National Health Service trusts run at least one Windows XP device, an operating system Microsoft first introduced in 2001 and stopped supporting in 2014). Although these old systems shouldn't be there, they formed the attack surface for this attack and suffered severe damage. But the bug didn't only affect old core systems: it also hit up-to-date clients that simply hadn't been patched yet, because testing and rolling out fixes can take days or even weeks in large enterprises.
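To illustrate how large that attack surface can be, here is a minimal Python sketch that lists hosts answering on the SMB port. The subnet and timeout are placeholder values, and the check only tells you that TCP/445 is reachable, not whether SMBv1 is enabled or whether the host is patched:

```python
import socket
from ipaddress import ip_network

# Hypothetical subnet to assess; replace with your own address range.
SUBNET = "10.0.0.0/28"
SMB_PORT = 445          # the worm component targets the SMB service on TCP/445
TIMEOUT = 0.5           # seconds; keep the scan quick on a LAN

def smb_port_open(host: str) -> bool:
    """Return True if the host accepts TCP connections on the SMB port."""
    try:
        with socket.create_connection((host, SMB_PORT), timeout=TIMEOUT):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    exposed = [str(ip) for ip in ip_network(SUBNET).hosts() if smb_port_open(str(ip))]
    print(f"{len(exposed)} host(s) reachable on TCP/{SMB_PORT}:")
    for host in exposed:
        print(" ", host)
```

Even a quick inventory like this gives the security team a first, rough picture of how many machines could be reached by an SMB-based worm.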
An NSA data breach was the source
As time went by, it turned out that the attack could be traced back to a tool leaked in April by The Shadow Brokers from a massive data breach at the NSA. Hackers? The NSA? High-profile targets? A meltdown of Great Britain's healthcare system? Problems worldwide? With names like these, the situation naturally received intense media and political attention. And it all happened on Friday, May 12th, just before the weekend: a security team's worst nightmare. Therefore, everyone launched (or ought to have launched) their incident management process.
Reconstruction of the WannaCry incident response, stage by stage
At Balabit on Friday night, we were constantly receiving information from our customers and other security professionals. Meanwhile, we were helping organizations, directly and through the media, to minimize their risks, and our internal security team was busy analyzing our own exposure. As a result, we can reconstruct what happened in thousands of operations rooms worldwide.
- 1st step: Isolation: Infected endpoints needed to be isolated as soon as possible. Rip out the power cable as soon as you see the malware!
- 2nd step: Information gathering: What is this, how does it work, how can we manage it? National CERTs released their official alerts on Friday morning, just a few hours after the initial outbreak. As this was too slow (WannaCry can spread across a network in minutes), the most efficient platforms for information sharing were Twitter and security blogs, besides informal communication between companies.
- 3rd step: Network segmentation: As it turned out, the exploit targets SMBv1, so this protocol needed to be filtered out of network traffic. This was a risky decision, as it was unforeseeable which services would be affected: a real-world risk assessment question, prevent the malware from spreading or keep business processes alive?
- 4th step: Implement countermeasures: Initial IOCs spread through the security community in the afternoon, and anti-virus vendors pushed out their WannaCry signatures around the same time (see the hash-matching sketch after this list). It took hours, for many the whole night, to update IDS and firewall rules, AV systems and as many Windows servers and clients as possible.
- 5th step: Go home, have a rest and keep your fingers crossed: After @malwaretechblog, with the help of Darien Huss, found and registered the "kill switch" domain, propagation slowed down dramatically. Security teams were left with fear, uncertainty and doubt: what would come next? A new variant? Did they patch all systems against MS17-010? What could happen on Monday? Would the company make headlines? Did they misconfigure something in the rush?
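To give a concrete flavor of the 4th step, the sketch below shows how published file-hash IOCs can be matched against files on a host. The hash set and the scan directory are placeholders (the real hashes were circulated by CERTs and AV vendors on May 12th), and a script like this is a stopgap, not a substitute for updated AV signatures:

```python
import hashlib
from pathlib import Path

# Placeholder values: substitute the SHA-256 IOCs you actually received.
KNOWN_BAD_SHA256 = {
    "0000000000000000000000000000000000000000000000000000000000000000",
}
SCAN_ROOT = Path("C:/Users")   # hypothetical starting directory

def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large files don't exhaust memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

if __name__ == "__main__":
    for path in SCAN_ROOT.rglob("*"):
        if not path.is_file():
            continue
        try:
            if sha256_of(path) in KNOWN_BAD_SHA256:
                print(f"IOC match: {path}")
        except OSError:
            pass   # unreadable or locked file; skip it
```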
Now we can see that the direct damage was relatively small, but the indirect costs were considerable. Thousands of people spent the night at their workplace, and who knows how many configuration errors were made on infrastructure in this heightened situation?
Best practices we can learn from the WannaCry incident
- First of all, if your organization hasn't had a similar experience, you might not have an incident management process in place that works well in a crisis. In that case, we recommend you prepare for the next one; be aware that it will come soon.
- Consolidate all the information provided by your existing security tools (firewall, IDS, AV software and so on) in one place, in the form of logs. Our central logging solution, syslog-ng, can help you find potential IOCs with a single search across the logs provided by your security infrastructure (a minimal log-search sketch follows this list).
- If you have a mature process and everything is in place, you have probably gone through the five stages of the incident response process described above, and as a result, your network infrastructure might be behaving strangely now. That shouldn't be a surprise: you installed a patch without deep prior testing and blocked a network protocol that might be vital.
- The Balabit Privileged Session Management solution enables you to easily reconstruct who did what on your network during the weekend, by reviewing exactly what happened on the administrators' screens during the configuration changes, shortening investigation time and avoiding unexpected costs.
- Balabit Privileged Account Analytics can highlight those administrative sessions that contained risky activities.
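Once the logs are consolidated, even a few lines of scripting can surface matches. The sketch below greps the collected log files for IOC strings; the log directory and the indicator patterns are placeholders for the ones published in the advisories you trust, and a real deployment would rather rely on syslog-ng's own filtering or a SIEM query:

```python
import re
from pathlib import Path

# Hypothetical indicators: substitute the domains, filenames or service
# names listed in the advisories you received.
IOC_PATTERNS = [
    re.compile(r"mssecsvc", re.IGNORECASE),   # service/file name reported for WannaCry
    re.compile(r"\.wncry\b", re.IGNORECASE),  # encrypted-file extension
]
LOG_DIR = Path("/var/log")   # e.g. where the central log collector writes its files

if __name__ == "__main__":
    for log_file in LOG_DIR.rglob("*.log"):
        try:
            with log_file.open("r", errors="replace") as handle:
                for line_no, line in enumerate(handle, 1):
                    if any(p.search(line) for p in IOC_PATTERNS):
                        print(f"{log_file}:{line_no}: {line.rstrip()}")
        except OSError:
            pass   # unreadable file; skip it
```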
Fast reaction always carries the risk of human error. With Balabit Privileged Access Management solutions, you can carry out post-incident activities in a more comfortable and secure way.