Replicating at Speed
A few years ago, the security community became aware of a new class of hardware-based “zero-day” vulnerability, starting with a complex issue dubbed RowHammer. This exploit took advantage of the underlying hardware: by repeatedly accessing memory under its own control, one program could flip bits in adjacent memory and so compromise it.
The latest threats making news are called Spectre and Meltdown, which, while distinct from each other, take advantage of a common feature in most modern processors: speculative execution.
In computer architecture, a branch predictor is a digital circuit that tries to guess which way a branch (e.g., an if-then-else structure) will go before this is known definitively. The purpose of the branch predictor is to improve the flow in the instruction pipeline. Branch predictors play a critical role in achieving effective performance in many modern pipelined microprocessor architectures such as x86.
Without branch prediction, the processor would have to wait until the conditional jump instruction has passed the execute stage before the next instruction can enter the fetch stage in the pipeline. The branch predictor attempts to avoid this waste of time by trying to guess whether the conditional jump is most likely to be taken or not taken. The branch that is guessed to be the most likely is then fetched and speculatively executed. If it is later detected that the guess was wrong, then the speculatively executed or partially executed instructions are discarded and the pipeline starts over with the correct branch, incurring a delay.
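As a rough illustration of the guessing described above, the sketch below simulates a two-bit saturating counter, a textbook prediction scheme. It is an assumption chosen for simplicity, not a description of what any particular processor does, and the names TwoBitPredictor and count_mispredictions are invented for this example.

```python
class TwoBitPredictor:
    """Textbook two-bit saturating counter.

    States 0 and 1 predict "not taken"; states 2 and 3 predict "taken".
    """

    def __init__(self, state=2):
        self.state = state

    def predict(self):
        return self.state >= 2

    def update(self, taken):
        # Move one step toward the observed outcome, saturating at 0 and 3.
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)


def count_mispredictions(outcomes):
    """Count wrong guesses; each one stands for a discarded speculative path."""
    predictor = TwoBitPredictor()
    misses = 0
    for taken in outcomes:
        if predictor.predict() != taken:
            misses += 1
        predictor.update(taken)
    return misses


# A loop branch: taken nine times, then not taken when the loop exits.
loop_branch = [True] * 9 + [False]
# A branch that flips direction every time it is reached.
alternating = [True, False] * 5

print(count_mispredictions(loop_branch))   # 1
print(count_mispredictions(alternating))   # 5
```

The regular loop is guessed wrong only once, at the exit, while the alternating pattern defeats this simple scheme half the time. Each wrong guess corresponds to the discard-and-restart delay described above.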
This type of execution is a large part of what makes modern processors so efficient, and it underpins the economies of scale achieved with virtualization in both private and public cloud environments.
In June 2017, a team of independent researchers, university research labs, some of Google’s Project Zero members, and Cyberus Technology discovered two security vulnerabilities enabled by the widespread use of speculative execution. The problem was also independently discovered by other researchers at about the same time. The vulnerabilities were eventually made public in January 2018, and were dubbed Meltdown and Spectre. They potentially allow malicious software to read otherwise protected memory on a computer system, gaining access to sensitive data such as passwords and encryption keys.
To make speculative execution as efficient as it can be, some combinations of an operating system and underlying hardware let it touch data in the operating system’s private memory before it is actually needed. The vulnerabilities stem from the ability of a malicious program to then infer what this otherwise inaccessible data was, after the fact.
The more widely discussed of the two vulnerabilities, Meltdown, relies on certain hardware choices to read users’ sensitive information, but can be addressed using software updates for the relevant computing platforms. Spectre, the lesser known and more difficult to apply of the two, makes it possible for a program to access data in a separate program running on the chip, and is far more difficult to fix with a single solution. While not ideal, the best general defenses involve detection on the one hand, and separation of running processes’ cache activity from each other on the other. The latter mitigation can carry a significant performance penalty on some architectures, and for particular workloads.
Both flaws work by performing an indexed load from memory. During this load, a first piece of data, A (supposed to be off-limits), is read from memory, and then this piece of data is used to calculate the address of another piece of data, B[A] (accessible), to be read from memory as well. As A is off-limits, the processor ultimately cancels any direct effect of the operation on the registers and memory once it notices that the read should not have been allowed. However, B[A] is still present in the cache, and the attacker can detect this by reading B[x] for every possible value x of A and observing which read completes noticeably faster than the others.
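The probing step can be sketched with a toy model. The code below does not touch real hardware or read any protected memory: ToyCache, the latency values of 10 and 200, the 64-byte line size, and the function names are all assumptions made up for illustration. It only shows why a discarded load still leaves a measurable trace.

```python
CACHE_LINE = 64  # assumed line size, so each value of A selects its own line


class ToyCache:
    """A simulated cache that remembers which lines have been loaded."""

    def __init__(self):
        self.lines = set()

    def flush(self):
        self.lines.clear()

    def load(self, address):
        """Return a made-up latency (low on a hit, high on a miss), then cache the line."""
        line = address // CACHE_LINE
        latency = 10 if line in self.lines else 200
        self.lines.add(line)
        return latency


def speculative_indexed_load(cache, secret_byte, probe_base):
    """Model the transient step: the off-limits value A picks the second address.

    The architectural result is thrown away, but the cache line stays filled.
    """
    cache.load(probe_base + secret_byte * CACHE_LINE)
    return None  # nothing reaches registers or memory


def recover_byte(cache, probe_base):
    """Time a read for every possible value of A; the fastest one reveals A."""
    timings = [cache.load(probe_base + guess * CACHE_LINE) for guess in range(256)]
    return min(range(256), key=timings.__getitem__)


cache = ToyCache()
cache.flush()
speculative_indexed_load(cache, secret_byte=0x2A, probe_base=0x10000)
leaked = recover_byte(cache, probe_base=0x10000)
print(hex(leaked))  # 0x2a
```

In this model the attacker never sees the value of A directly. It is recovered purely from which of the 256 probe reads came back fast.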
The flaws differ in which access restriction is bypassed and in how it is bypassed:
In Meltdown, the attacker causes speculative execution to breach the protection boundary between a user program’s memory space and a protected kernel page, by making speculative execution fetch an address into the cache. Modern Intel processors will execute such code, and then fetch and store the speculative results, leaving them in cache. Later, the exploit code can measure what happened, either after a fault or after wrapping the original code in a memory transaction, which suppresses the fault. Some processors, such as those from AMD, are believed to be immune to this, as they perform the page accessibility check before executing the speculative read.
Google had originally planned a coordinated disclosure of the full Project Zero report on Tuesday, January 9, 2018, and said it had been working with both hardware and software companies to mitigate the risks over a number of months. The heightening speculation over the issue, however, seems to have forced an accelerated publicity schedule.
This issue was discovered in labs by ethical hacking teams, and there are no known large-scale attacks attributed to the exploits. However, in today’s world, nation states and even criminal gangs are involved in large-scale research on zero-day vulnerabilities, so the risks cannot be ignored.
While these flaws affect most modern processors, there are far easier ways to compromise end-user devices. This type of exploit is of most concern in shared environments such as VMware, OpenStack and various commercial cloud offerings.
At this time, VMware, Cisco, AWS, Microsoft, Google and others are deploying patches and a pipeline of fixes is underway.
For their own infrastructure, clients should begin installing all relevant patches as they become available. Those utilizing public cloud offerings should work closely with their vendors on any mitigations necessary.
The major early concern is that there appears to be a significant performance penalty with the fixes. This may require adding CPU resources internally, but clients utilizing cloud vendors need to be aware that utilization may increase. Both of these issues will have cost implications.
Intel and other processor manufacturers have stated the performance impact is highly workload-dependent, but have not given specific guidance.
UPDATE: On Saturday, in The Verge, Epic Games showed the impact on performance for their multiplayer universe game Fortnite was approximately 20 percent. This type of overhead would indicate that clients will have to plan carefully.
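For planning purposes, a figure like this can be turned into a rough capacity estimate. The sketch below is only back-of-the-envelope arithmetic under stated assumptions: cores_needed is a hypothetical helper, the 100-core starting point is invented, and the result depends on whether the percentage is read as added CPU utilization or as lost throughput per core.

```python
import math


def cores_needed(current_cores, overhead, as_throughput_loss=False):
    """Estimate the cores required to carry the same load after patching.

    overhead is a fraction, e.g. 0.20 for 20 percent.
    - Read as added utilization: the same work costs (1 + overhead) as much CPU.
    - Read as throughput loss: each core delivers only (1 - overhead) of its
      previous work, so capacity must grow by 1 / (1 - overhead).
    """
    if as_throughput_loss:
        factor = 1 / (1 - overhead)
    else:
        factor = 1 + overhead
    # Round before taking the ceiling to avoid floating-point noise.
    return math.ceil(round(current_cores * factor, 9))


print(cores_needed(100, 0.20))                           # 120
print(cores_needed(100, 0.20, as_throughput_loss=True))  # 125
```

The two readings diverge further as the overhead grows, which is one reason to measure the impact on your own workloads before committing to a number.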
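The new exploits are covered by three separate CVE entries, listed in NIST’s National Vulnerability Database:
CVE-2017-5753: bounds check bypass (Spectre, variant 1)
CVE-2017-5715: branch target injection (Spectre, variant 2)
CVE-2017-5754: rogue data cache load (Meltdown)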
Virgilio “Bong” has sixteen years of professional experience in the IT industry, spanning academe, technical and customer support, pre-sales, post-sales, project management, training and enablement. He has worked in the Cisco Technical Assistance Center (TAC) as a member of the WAN and LAN Switching team. Bong now works for Tech Data as a Field Solutions Architect with a focus on Cisco Security and holds a few Cisco certifications including Fire Jumper Elite.
John is our CTO and the practice lead for a talented team of consultants focused on designing and delivering scalable and secure infrastructure solutions to customers across multiple industry verticals and technologies. Previously he has held several positions including Executive Director/Chief Architect for Global Network Services at JPMorgan Chase. In that capacity, he led a team managing network architecture and services. Prior to his role at JPMorgan Chase, John was a Distinguished Engineer at Cisco working across a number of verticals including Higher Education, Finance, Retail, Government, and Health Care.
He is an expert in working with groups to identify business needs, and align technology strategies to enable business strategies, building in agility and scalability to allow for future changes. John is experienced in the architecture and design of highly available, secure, network infrastructure and data centers, and has worked on projects worldwide. He has worked in both the business and regulatory environments for the design and deployment of complex IT infrastructures.