Speculative Execution

What is Speculative Execution?

Speculative execution is an optimization technique in which a computer system performs a task before it knows whether that task will actually be needed.

Technically, it makes better use of the available resources of a computer system and improves performance by scheduling instructions ahead of the point where a branch is resolved.

Understanding Speculative Execution

Speculative execution is a method of boosting processor performance by making the best use of the available resources and reducing the chance of stalls.

It is one of the three mechanisms of out-of-order execution, which is also known as dynamic execution.

In this method, the processor predicts which instructions will be needed and executes them before the branch that decides whether they are necessary has been resolved.

However, if the work turns out not to be needed, its results are ignored and any changes it made are reverted.

The technique improves the performance of modern microprocessors from virtually every designer or manufacturer, such as:

AMD
Intel
ARM
IBM

Speculative execution delivered a significant boost in CPU performance when Intel introduced it in its processors in the mid-1990s.

Modern CPU cores that do not use this technique are typically found in ultra-low-power designs or in environments that need only minimal processing.

There are two primary aspects of speculative execution:

The level of speculativeness

This aspect describes how many consecutive conditional branches can be executed speculatively in sequence: a first, a second, a third, and so on.

Performance benefits most when a further unresolved conditional branch does not stop speculative execution.

This matters even more for superscalar processors, because they issue multiple instructions in each cycle and so reach the next branch sooner.

Different processors allow different numbers of pending conditional branches. For example:

The Power2 allows two pending conditional branches
The PowerPC 620 and R10000 allow four pending conditional branches
The Alpha 21164 allows as many as six pending conditional branches.

Some processors permit only a single pending conditional branch.

These CPUs resemble a basic block scheduler, which schedules instructions for parallel execution only up to the end of the current basic block.

Other processors allow multiple pending conditional branches.

These processors resemble global schedulers, which can schedule beyond basic block boundaries.

The degree of speculativeness

This aspect describes how far the instructions that follow an unresolved conditional branch are processed.

All of these instructions are processed along the predicted path of the conditional branch.

CPUs take different approaches here. The simplest is to speculate no further than fetching a few instructions that belong to the predicted path.

The degree of speculativeness is higher in processors where the instructions along the predicted path are fetched and decoded, or fetched, decoded, and dispatched.

It is higher still in processors that fetch, decode, dispatch, and execute the instructions following a pending conditional branch, but do not complete them.

In these processors, instructions are allowed to execute speculatively.

Such a processor provides a mechanism to undo the instructions if the prediction turns out to be incorrect. It can even undo two instructions in one cycle.

The main purpose of this mechanism is to retain sequential consistency under out-of-order execution. Typically, it is based on a history buffer.
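The undo mechanism described above can be sketched in a few lines of Python. This is a minimal toy model, not a description of any real processor: the class name `SpeculativeRegisterFile` and its methods are hypothetical, and the "history buffer" is simply a list of the values that speculative writes overwrote.

```python
class SpeculativeRegisterFile:
    """Toy register file that logs overwritten values in a history buffer."""

    def __init__(self, registers):
        self.registers = dict(registers)
        self.history = []  # (register name, previous value), oldest first

    def speculative_write(self, name, value):
        # Save the old value before overwriting so the write can be undone.
        self.history.append((name, self.registers[name]))
        self.registers[name] = value

    def commit(self):
        # Prediction was correct: the saved old values are no longer needed.
        self.history.clear()

    def rollback(self):
        # Prediction was wrong: restore old values, newest write first.
        while self.history:
            name, old_value = self.history.pop()
            self.registers[name] = old_value


regs = SpeculativeRegisterFile({"r1": 10, "r2": 20})
regs.speculative_write("r1", 99)
regs.speculative_write("r2", 7)
regs.speculative_write("r1", 100)
regs.rollback()  # the branch was mispredicted
print(regs.registers)  # {'r1': 10, 'r2': 20}
```

Undoing the writes in reverse order is what restores the state the program would have had if the speculative instructions had never run.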

Speculative execution is used in a variety of situations and areas, such as:

Branch prediction in pipelined processors
Value prediction to exploit value locality
Optimistic concurrency control in database systems
Prefetching memory and files

Multiple branch prediction is used to predict the instructions that are most likely to be needed in the near future.

Dataflow analysis is performed alongside it to order the instructions for optimal execution rather than executing them in the order they arrived.

Speculative Execution Attacks

Speculative execution can introduce security vulnerabilities on common CPU architectures. Speculative execution attacks exploit them by taking advantage of the side effects left behind by speculative work that is later discarded.

These vulnerabilities can enable elevation of privilege on processors from many vendors.

According to Google’s research findings, most of the vulnerabilities stem from cache-timing side-channel attacks that exploit CPU speculation.

Exploiting them also depends largely on malware running locally on the machine.

Though most ARM processors are not affected by the side-channel attacks on the speculation mechanism detailed by Google, most other processors and a few subsets of ARM-designed processors are susceptible to the following types of attack:

Variant 1 – This involves bounds check bypass, denoted as CVE-2017-5753, and bounds check bypass store, denoted as CVE-2018-3693
Variant 2 – This involves branch target injection, denoted as CVE-2017-5715
Variant 3 – This involves speculative reads of inaccessible data (rogue data cache load), denoted as CVE-2017-5754
Subvariant 3a – This involves speculative reads of inaccessible system registers (rogue system register read), denoted as CVE-2018-3640
Variant 4 – This involves speculative bypassing of stores by younger loads despite the presence of a dependency, denoted as CVE-2018-3639
Straight-line speculation – This refers to the speculative execution of the instructions that follow linearly in memory after an unconditional change in control flow, denoted as CVE-2020-13844
Spectre-BHB – This involves branch target injection within the same software context, unlike Spectre v2, where branch targets are injected across different exception levels, and is denoted as CVE-2022-23960.
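The bounds check bypass idea behind Variant 1 can be illustrated with a toy model. The sketch below is a simulation only, not an exploit: the names `memory`, `cached_lines`, and `victim`, and the `predicted_in_bounds` flag that stands in for a mistrained branch predictor, are all assumptions made for illustration. It shows how a speculative load whose result is discarded can still leave a trace in the cache.

```python
PUBLIC_LEN = 8
# Toy flat memory: 8 public bytes followed directly by one secret byte.
memory = bytes(range(PUBLIC_LEN)) + b"K"

# Lines of a hypothetical 256-entry probe array that are currently cached.
cached_lines = set()


def victim(index, predicted_in_bounds):
    """Bounds-checked read whose check can be speculatively bypassed."""
    in_bounds = index < PUBLIC_LEN  # the real check, resolved "late"
    if not (in_bounds or predicted_in_bounds):
        return None  # nothing runs, speculatively or otherwise
    value = memory[index]    # executed before the check resolves
    cached_lines.add(value)  # models a dependent load of probe[value]
    # Architectural state is rolled back on a misprediction,
    # but the cache footprint recorded above is not.
    return value if in_bounds else None


# Out-of-bounds index supplied while the branch is mispredicted.
result = victim(PUBLIC_LEN, predicted_in_bounds=True)
leaked = [chr(line) for line in cached_lines]
print(result, leaked)  # None ['K']
```

In a real attack the cached line is identified by timing memory accesses rather than by inspecting a set, which is why the vulnerabilities are described as cache-timing side channels.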

Attacks also fall into different categories and scenarios:

The inter-VM category covers hypervisor-to-guest, guest-to-guest, and host-to-guest scenarios
The intra-OS category covers kernel-to-user, process-to-process, and intra-process scenarios
The enclave category covers the enclave-to-any scenario, in which code running outside an enclave is able to read memory inside it.

Some of the best-known vulnerabilities and attacks of this kind are:

Foreshadow
Meltdown
Spectre
PACMAN
Microarchitectural Data Sampling (MDS)
SPOILER

Several of these flaws specifically target speculative execution in Intel CPUs.

Speculative Execution Examples

One special case of speculative execution is speculative multithreading. A direct-mapped cache also allows a fast and simple form of speculation. Concurrent processing and out-of-order execution (OoOE) are further examples of techniques that use speculative execution to predict and retrieve data that may be required in the near future.

Modern processors follow different variants of speculative execution, which build on earlier, related concepts of speculative computation. Some of these variants are:

Eager execution

This is a form of speculative execution in which both sides of a conditional branch are executed. However, the results are committed only if the predicate is true.

Also known as oracle execution, eager execution with unlimited resources would in theory offer the same performance as perfect branch prediction.

With limited resources, this method must be used with caution, because the resources required grow exponentially with every level of branch executed eagerly.
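The exponential growth is easy to see with a little arithmetic. The sketch below assumes the simplest possible model, in which every unresolved conditional branch doubles the number of paths being executed; the function name `eager_paths` is hypothetical.

```python
def eager_paths(pending_branches):
    """Paths executed in parallel when both sides of every
    unresolved conditional branch are followed."""
    return 2 ** pending_branches


for depth in range(1, 7):
    # Predictive execution would follow a single path at every depth.
    print(depth, eager_paths(depth))
# The last line printed is: 6 64
```

Six pending branches already require 64 paths to be executed at once, which is why eager execution is used sparingly when resources are limited.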

Predictive execution

This is another form of speculative execution, in which some outcome is predicted and execution proceeds along the predicted path until the actual result is known.

If the prediction is correct, the predicted execution is allowed to commit. If it is incorrect, the execution must be rolled back and redone.

Common types of predictive execution include:

Branch prediction
Memory dependence prediction

A generalized form of predictive execution is called value prediction.
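One simple way to picture value prediction is a last-value predictor, which guesses that an instruction will produce the same value it produced last time. The sketch below is a toy model under that assumption; the class `LastValuePredictor`, the helper `count_correct`, and the address `0x400` are hypothetical names chosen for illustration.

```python
class LastValuePredictor:
    """Predicts that an instruction repeats the value it produced last time."""

    def __init__(self):
        self.last_value = {}  # instruction address -> last produced value

    def predict(self, pc):
        return self.last_value.get(pc)  # None means "no prediction yet"

    def update(self, pc, actual):
        self.last_value[pc] = actual


def count_correct(values, pc=0x400):
    """Count how many results of one instruction were predicted correctly."""
    predictor = LastValuePredictor()
    correct = 0
    for actual in values:
        if predictor.predict(pc) == actual:
            correct += 1  # work based on the prediction could commit
        # Otherwise the dependent work would be rolled back and redone.
        predictor.update(pc, actual)
    return correct


print(count_correct([7, 7, 7, 7, 8, 8]))  # 4
```

The predictor misses only on the first value and on the change from 7 to 8, which is the kind of value locality this form of prediction tries to exploit.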

A related concept is lazy execution, also called lazy evaluation.

This technique is the reverse of eager execution: work is deferred until its result is known to be needed, so no speculation is involved.

What is the Use of Speculative Execution?

The main purpose of speculative execution is to avoid the delay that would occur if work started only after it is known to be necessary.

It improves performance because certain tasks are completed ahead of time and their results are ready when needed.

Another significant objective is to exploit more concurrency when additional resources are available.

Modern pipelined microprocessors also use this technique to lower the cost of conditional branch instructions.

This is typically done with schemes that guess the execution path of a program from the history of earlier branch outcomes.
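A classic history-based scheme is the two-bit saturating counter, sketched below as one possible illustration; real predictors are considerably more elaborate. The class `TwoBitPredictor`, the helper `accuracy`, and the loop-shaped branch history are assumptions made for the example.

```python
class TwoBitPredictor:
    """Two-bit saturating counter: states 0-1 predict not taken, 2-3 taken."""

    def __init__(self, state=2):
        self.state = state

    def predict(self):
        return self.state >= 2

    def update(self, taken):
        # Move one step toward the observed outcome, saturating at 0 and 3.
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)


def accuracy(outcomes, predictor):
    """Fraction of branch outcomes the predictor guessed correctly."""
    correct = 0
    for taken in outcomes:
        if predictor.predict() == taken:
            correct += 1
        predictor.update(taken)
    return correct / len(outcomes)


# A loop branch: taken 9 times, then not taken once, repeated 10 times.
history = ([True] * 9 + [False]) * 10
print(accuracy(history, TwoBitPredictor()))  # 0.9
```

Because a single not-taken outcome only moves the counter one step, the predictor mispredicts the loop exit but still predicts the next iteration of the loop correctly.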

How is Speculative Execution Implemented?

Most speculative execution involves the control flow of a program. The processor does not wait for every branch instruction to be resolved before determining which operations to execute.

Instead, it predicts the control flow to determine the subsequent instructions.

Two methods are used to implement this:

Concurrent processing by using a branch predictor
Out-of-Order Execution

Branch prediction helps in several ways. For example:

It anticipates the instructions that are likely to be needed in the future.
It supports dataflow analysis, which arranges instructions for execution in the optimal order rather than the order in which they appear in memory.

Together, these lower the overall execution time.

The main objective is to fetch instructions and data in anticipation that they will be needed later. This is mainly done in the following way:

The branch predictor guesses which branch is most likely to be taken
The processor fetches the set of instructions that follow that branch
It executes those instructions speculatively, without knowing which branch will actually be taken
If the branch predictor guessed correctly, the results are committed in the proper sequence

If the guess is incorrect, the system discards the speculative work, loads the correct instructions, and proceeds with them in place of the predicted ones.

However, this happens far less than 5% of the time, because the accuracy of modern branch predictors is very high.

This means that instructions rarely need to be reloaded.
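A quick back-of-the-envelope calculation shows why a high prediction accuracy keeps the cost of reloading low. The sketch below assumes a hypothetical penalty of 20 cycles for each misprediction; that figure and the function name `average_branch_penalty` are illustrative assumptions, not values from any specific processor.

```python
def average_branch_penalty(accuracy, flush_penalty_cycles):
    """Expected extra cycles per branch caused by mispredictions."""
    return (1.0 - accuracy) * flush_penalty_cycles


# Assumed: 95% accuracy and a 20-cycle cost for each misprediction.
print(average_branch_penalty(0.95, 20))
# With a 99% accurate predictor the average cost drops further.
print(average_branch_penalty(0.99, 20))
```

Under these assumptions a 95% accurate predictor costs about one extra cycle per branch on average, even though each individual misprediction is expensive.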

As for OoOE:

It prevents the pipeline from stalling, so the CPU does not stop processing instructions while waiting for an issue to be resolved.
It also helps hide the gap between the speed of the CPU and that of the main memory.

Because the processor can work on other ready instructions while a memory access is outstanding, it spends less time waiting for memory to deliver what it needs, which reduces the overall execution time.

With the help of OoOE, the processor is therefore kept busy most of the time.

This reduces idle time and significantly improves the overall performance of the CPU and the system.
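The effect of OoOE on idle time can be illustrated with a small scheduling model. The sketch below is a deliberate simplification: it assumes a machine that issues at most one instruction per cycle, and the function `run`, the instruction names, and the latencies are all hypothetical.

```python
def run(program, out_of_order):
    """Return the cycle at which the last result becomes available.

    program: list of (name, latency, dependencies) in program order.
    At most one instruction issues per cycle.
    """
    ready_at = {}  # name -> cycle at which its result is available
    pending = list(program)
    cycle = 0
    while pending:
        # In-order issue may only consider the oldest pending instruction.
        candidates = pending if out_of_order else pending[:1]
        for instr in candidates:
            name, latency, deps = instr
            if all(d in ready_at and ready_at[d] <= cycle for d in deps):
                ready_at[name] = cycle + latency
                pending.remove(instr)
                break  # only one instruction issues per cycle
        cycle += 1
    return max(ready_at.values())


program = [
    ("load_a", 10, []),         # slow load from main memory
    ("use_a", 1, ["load_a"]),   # must wait for the load
    ("add_b", 1, []),           # independent work
    ("add_c", 1, ["add_b"]),
]
print(run(program, out_of_order=False))  # 13
print(run(program, out_of_order=True))   # 11
```

In the in-order run the independent additions sit behind the stalled `use_a`, whereas the out-of-order run completes them while the load is still outstanding.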

How to Enable Speculative Execution?

Optimizing speculative execution relies on a set of highly sophisticated mechanisms.

As said earlier, the processor executes a set of tasks before it is asked to do so, which ensures that the results are ready when they are required.

Speculative execution also exists at the software level. In Hadoop, it is a “MapReduce job optimization technique” that is enabled by default. If it is not, you may need to configure it in the XML configuration file.

All you have to do is set the mapreduce.map.speculative property to true. This will enable the speculative execution of map tasks.

Once done, this brings several benefits, even in a heavily utilized multi-tenant setting on a huge cluster:

It reduces the average response time of jobs by up to roughly 13%
It decreases the standard deviation of job elapsed times by up to roughly 40%
It lowers overall resource consumption by up to roughly 24%.
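The property setting described above can be sketched by generating the corresponding XML with Python. This is only an illustration: the helper `speculative_config` is hypothetical, and the fragment it produces is assumed to belong in the MapReduce configuration file of the cluster (commonly mapred-site.xml).

```python
import xml.etree.ElementTree as ET


def speculative_config(map_enabled=True, reduce_enabled=True):
    """Build a Hadoop-style <configuration> element for speculative execution."""
    configuration = ET.Element("configuration")
    settings = {
        "mapreduce.map.speculative": map_enabled,
        "mapreduce.reduce.speculative": reduce_enabled,
    }
    for name, enabled in settings.items():
        prop = ET.SubElement(configuration, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = "true" if enabled else "false"
    return configuration


# Print the fragment that enables speculation for map and reduce tasks.
print(ET.tostring(speculative_config(), encoding="unicode"))
```

Passing `False` for either argument produces the fragment that turns speculation off for that task type instead.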

How to Prevent Speculative Execution?

If you find that the redundant work done by speculative execution, or its side effects, is causing issues, you may be better off disabling speculative execution, removing sensitive information from memory, or making that information unreadable during speculative execution.

In order to disable speculative execution in Hadoop, you will have to turn off the property value through the “mapred.map.”

All you have to do is set the mapreduce.map.speculative and mapreduce.reduce.speculative properties to false. This will disable the speculative execution of map and reduce tasks.

There are also different attack mitigation techniques to follow.

Although their impact varies with the category of attack, the common approaches are listed here.

Preventing speculation – This involves the following:

A speculation barrier implemented by executing serializing instructions
Isolation of security domains by CPU core
An indirect branch speculation barrier applied on demand and on mode change
Non-speculated or safely speculated indirect branches

Removing sensitive content from memory – This involves the following:

Segregating the hypervisor address space
Splitting the user and kernel page tables

Removing observation channels – This involves the following:

Mapping guest memory as non-cacheable in the root extended page tables
Not sharing physical pages across guests
Decreasing browser timer precision

All of these reduce the ability of an attacker to access sensitive information in the address space.
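Decreasing timer precision, the last item in the list above, can be pictured with a short sketch. The function `coarsen_timestamp`, the 100-microsecond resolution, and the access times are all hypothetical values chosen for illustration, not the behavior of any particular browser.

```python
def coarsen_timestamp(timestamp_ns, resolution_ns=100_000):
    """Round a nanosecond timestamp down to a multiple of resolution_ns."""
    return (timestamp_ns // resolution_ns) * resolution_ns


start_ns = 5_000_000
cache_hit_end_ns = start_ns + 40     # hypothetical fast (cached) access
cache_miss_end_ns = start_ns + 300   # hypothetical slow (uncached) access

# With full precision the two accesses are easy to tell apart.
print(cache_hit_end_ns - start_ns, cache_miss_end_ns - start_ns)  # 40 300

# With a coarse timer both durations collapse to the same value.
hit = coarsen_timestamp(cache_hit_end_ns) - coarsen_timestamp(start_ns)
miss = coarsen_timestamp(cache_miss_end_ns) - coarsen_timestamp(start_ns)
print(hit, miss)  # 0 0
```

Once the timer can no longer distinguish a cache hit from a cache miss, a single measurement stops revealing which memory locations speculative execution touched.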

Speculative Execution vs Branch Prediction

Speculative execution anticipates which instructions are likely to be needed in the near future and uses dataflow analysis to arrange them for the best possible execution, whereas branch prediction determines where execution will continue after a conditional jump so that the subsequent instructions can be read from memory.
Branch prediction is a specific technique used by speculative execution, but the converse is not true.
For speculative execution to be successful, the branch prediction needs to be correct. If it is not, the speculative work is discarded.
Speculative execution can happen even when the code contains no actual conditional branch, which is not the case with branch prediction.
In some cases, speculative execution is simpler to carry out than branch prediction.

Conclusion

In conclusion, speculative execution is a key feature that increases the performance and efficiency of a CPU in a computer system.

It reduces response time and makes the system much faster. However, it can also cause problems, including the security vulnerabilities described above, which call for the mitigations outlined earlier.