Branch Prediction

What is Branch Prediction?

Branch prediction in a CPU design refers to the process which guesses the result of a conditional function and organizes the most expected result. This action is performed by a digital circuit called the branch predictor.

Also known as branch predication or simply predication, this is an important process that reduces the cost of branching and expedites branch instruction processing by using the pipelining of the CPU.

Understanding Branch Prediction

What is Branch Prediction

Branch prediction process helps the CPU in executing instructions using pipelining.

The process is completed faster because the CPU does not have to wait for a conditional jump to be carried out which slows down the process significantly.

The instructions are broken down into predicates by using branch prediction which is similar to predicate logic and is executed only if it is true.

This helps the processor work more efficiently. This is an example of conditional logic with a branch predictor.

The significance of branch prediction for the efficient functioning of the CPU is quite profound because no work needs to be done when the pipeline stages reload.

The gain produced by it is reduced by the program transfer instructions present and the sequence is changed which causes all of the instructions entering the pipeline invalid to become after program transfer.

Branch prediction logic helps eliminate these problems in the pipeline.

Branch prediction may or may not be taken depending on the conditions such as:

When the prediction is true, no clock cycles are lost because the pipeline will not be flushed and
When the prediction is false, the current instruction will start over since the pipeline will be flushed.

At the fetch stage of branch prediction, there are mainly three specific things that are to be ascertained namely:

Whether or not the instruction fetched is a branch
The direction of the conditional branch and
The target of the branch, if it is taken.

If there is a conditional jump instruction, usually a two-way branching is implemented with it. This jump may be taken or not taken.

If it is taken, then it will move or ‘jump’ to a different position in the program memory.

If it is not taken, then it will continue being executed instantly after the conditional jump.

However, it is not always confirmed whether a conditional jump is taken or not taken until the calculation of the condition is made and the conditional jump passes the execution phase in the instruction pipeline.

Branch prediction is typically followed by using a 4-way set associative cache with 256 entries.

The process is called the Branch Target Buffer or BTB where there is a directory entry made for each line which comprises:

A valid bit that indicates whether or not an entry is valid and
A history bit that tracks the number of times the bit was taken.

Branch instructions are normally fetched from the source memory, and the branch target address is stored in the consequent data entry in BTB, if only the directory entry is valid.

Branch Prediction Example

A branch prediction, for example, identifies whether or not a conditional jump is taken such as in a testing code, performance predictors, tournament predictors and creating scoreboard architecture.

Testing code:

While testing C++ code, a conditional branch is used on unsorted and sorted data. The branch is replaced later on with a bitwise operation. The results of the testing code are then analyzed with no compiler optimization to find the following:

The time taken with sorted data by the conditional branch
The time taken with sorted and unsorted data with bitwise operation and
The time taken with sorted data and conditional branches.

However, the exact time may vary based on the infrastructure of the hardware but the comparative value will remain the same.

Tournament predictor:

This involves hybrid local-global predictors and combines them to find the right predictor.

Here, typically, the local predictor makes a prediction depending on the outcome of a branch for its last ten executions and the global predictor uses 4K entries.

Performance predictors:

This involves making correlated and local 2-bit predictions depending on the total prediction size and the conditional branch misprediction rate.

Scoreboard architecture:

This helps in detecting issues and execution of instructions along with the hazards involved and writing of the results.

There are different tables in it to keep track of the progress and the related intelligence of the execution which helps in finding out the time to dispatch instructions.

Here, every functional unit is linked to a single buffer in the wait queue.

Branch Prediction in Pipelining

There are several things that can create issues in the pipeline due to structural, data and control hazard and branch prediction can help resolve the issue in several ways depending on the condition.

The specific issues that can be caused are:

Due to structural hazards, stages may conflict during execution
Due to data hazards, the result of the earlier instruction may influence that of the following and
Due to control hazards, the conditional branches may break the pipeline rendering the things in it useless if they are fetched in advance.

In such situations, branch prediction can predict never taken or always taken.

If it is predicted never taken, the assumptions would be as follows:

Jump will not happen
Next instruction will always be fetched and
After branch VAX will not prefetch if there is a page fault.

If it is predicted as always taken, then the assumptions would be as follows:

Jump will happen and
Target instruction will always be fetched.

In a different scenario, if it is predicted by opcode, the assumptions will be as follows:

Specific types of instructions such as LOOP vs. JUMP will probably result in a jump and
A success rate of 75% can be achieved.

Predictions will be different in the case of a taken-not taken switch involving 1 bit branch predictor depending on the earlier history. In such situations, prediction can be as follows:

A branch will be taken again if it was taken the last time and
A branch will not be taken again if it was not taken the last time.

It will be considered good for loops and it may also use 1 bit to specify the earlier result and store this bit with every branch instruction. More bits may also be used for a more detailed history.

Types of Branch Prediction

Basically, there are two types of branch prediction techniques namely, static branch prediction technique and dynamic branch prediction technique, along with a few other variants.

Static branch prediction:

Static branch prediction can be general, profile-based, program-based or programmer-based, with different features for each.

As for the general static always not taken branch prediction, the features are:

Low accuracy of about 30 to 40%
Simple to implement without needing direction prediction or BTB and
The compiler may lay out the code so that the expected path is the “not taken” path.

As for general static always taken branch prediction, the features are:

A better accuracy of up to 60 to 70%
No direction prediction
Loop branches or others are usually taken and
Backward branch is usually the target address which is lower in comparison to the branch PC.

However, for BTFN, the backward loop is predicted as taken and others as not taken.

Profile-based static branch prediction:

In this type, the compiler finds out the probable direction of each branch by using profile run and encoding it in the branch instruction format as a hint bit. The advantage of this type is that you get predictions for each branch.

However, the downside is that it is accurate only when the profile input set is representative and that too on the behavior of the dynamic branch.

Program-based static branch prediction:

This type of static branch prediction is based on program analysis heuristics to find out the probable direction. Some of the heuristics used are:

Opcode heuristic that predicts negative integers that may be the error values in different programs and
Loop heuristic that predicts a loop execution by a branch guarding as taken and compares pointer and FP to predict not equal.

The good thing about this type is that it does not need profiling but needs ISA support and compiler analysis. Moreover, the heuristics may not always be good or representative.

Programmer-based static branch prediction:

This type also predicts direction statically by using pragmas in the programming language that meet the criteria of a branch as likely-taken against likely-not-taken.

This type however does not need program analysis or profiling and the branches and their programs may be better known to the programmer than other analysis processes.

The disadvantages of this type are that it creates some burdens on the programmer and needs compiler, programming language and ISA support.

Dynamic branch prediction:

In this type, dynamic information is collected at run-time and used to make predictions based on the execution history of the branches.

The process can adapt to the dynamic changes in the behavior of the branch and does not need input set representativeness or static profiling.

However, this process is more complex and needs additional hardware which changes dynamically.

This process is more accurate than static branch prediction which involves different processes such as:

1-bit branch prediction technique
2-bit branch prediction technique and
Correlating branch prediction technique.

There can be a few more advanced direction prediction techniques that are based on static or dynamic branch prediction.

Compile time is one example of static branch prediction where the assumptions are:

Always not taken
Always taken
Backward Taken, Forward Not Taken or BTFN
Possible direction based on profile and
Possible direction based on program analysis.

An example of advanced dynamic branch prediction is run-time prediction, where the assumptions are:

Single bit last time prediction
Double bit counter based prediction
Two-level global versus local prediction and
Hybrid or alloyed prediction.

A few other specific types of branch predictions are also there such as:

Neural
Overriding
Local
Global
Next line
One level and
Random.

Branch Prediction Algorithm

The branch prediction algorithm or set of rules is based on the earlier iterations of an instruction. If there is a regular pattern detected, the prediction is successful, especially when the branch instruction has the same effect, always.

Different information is collected from different resources such as:

The least significant bits of the earlier fetched branch target address
The Local History Table of shift registers of the last outcome of branches with the least significant bits and
The Local Prediction Table for predicting on the basis of the current state.

The rule may vary for different conditions as follows:

While evaluating a branch, the related state machine is updated and if it is evaluated as not taken, then the state is changed to strongly not taken. On the other hand, if it is determined as taken, then it is changed strongly towards taken.
For a two-bit counter scheme, the conditional jump deviates twice from what it was prior to the prediction changes. The two-level adaptive predictor follows a general rule with the bit history to predict the recurring sequences within a period where all bit sub-sequences are dissimilar.
A two-level adaptive predictor may also be used as a local branch predictor with a different history buffer for every conditional jump instruction. The pattern history table may be different or shared among all conditional jumps.
In alloyed branch prediction, the predictor combines the global and local prediction principles and branch histories. In this technique, a few bits from the Program Counter may also be used.

According to the rule, the function typically returns to its destination following the return instruction of an indirect jump. The target address is read by it from the call stack.

There is a different prediction mechanism in several processors for these return instructions which is based on a return stack buffer that can hold 4-16 entries and acts as the local mirror of the call stack.

How Does a Branch Predictor Work?

The working principle of branch prediction involves determining the address of a branch instruction well in advance and with little context.

Apart from the address of the present instruction and past history, very little data is available to the predictor and everything happens before the decoder pipeline phase.

The working process is as follows:

The branch predictor speculates on the likely condition to meet while processing a conditional operation for example an if…else statement
The necessary operations for the possible result are then carried out well in advance
The Branch Target Buffer or BTB supervises the branch instructions while sitting at the side of the Decode Instruction stage of the pipelines and the monitor
BTB executes a lookup in the cache using its source memory when the branch instruction enters the pipeline for the first time
Usually it is a miss since it is the first time and therefore the BTB guesses that the branch will not be taken even if it is an unconditional jump instruction
The branch is taken or not taken when the instruction reaches the Execution Unit or EU
The following instruction to be carried out is fetched from the target address of the branch if it is taken and a sequential fetch is carried out on the instructions if it is not taken
The EU sends feedback of the branch being taken for the first time to the branch prediction
The target address of the branch is also sent back and recorded in the BTB
An entry is made in the directory with the source memory address and finally
The history bit is recorded as strongly taken.

Static Vs Dynamic Branch Prediction

Static branch prediction provides a fixed prediction of a branch but dynamic branch prediction does it on the basis of the previous history
Static branch prediction is done statically, while dynamic branch prediction is done on the dynamically changing hardware underlying
The static branch prediction is less accurate in comparison to the dynamic branch prediction
Static branch prediction is simpler as compared to dynamic branch prediction and
Static branch prediction is done on the branch instruction while dynamic branch prediction is done on the executed codes.

Questions & Answers:

Why is Branch Prediction Used?

The main purpose of branch prediction is to predict the subsequent fetch address that will be used in the subsequent cycle. This helps in improving the flow within the instruction pipeline so that the processor achieves a high performance level.

Is Branch Prediction Still Used?

Yes, it is, but most of the powerful CPUs today depend extensively on dynamic branch prediction rather than static branch prediction.

This helps them to avoid the potential frontend bubble setbacks with accurate prediction of the correct address of the following instructions. This is applicable even for those specific branches that are not executed or completely decoded yet.

What is Branch Prediction Buffer and Branch Prediction?

Branch prediction buffers do not offer the value of the target PC but simply contain the prediction of whether or not the ensuing branch will be taken. The value of the target PC is offered by the BTB.

Branch prediction is done by the Branch Prediction Units in the CPUs to bring down the cost by predicting the future of the branch target and predicting it before it is decoded or executed.

How Can Branch Prediction be Improved?

If it is a high-level language, the best thing to do is to get rid of the branches in order to improve branch prediction. This can be done by stating the problem in terms of arithmetic or lookups.

This will make a large amount of history available on the branches left over and help it work better. Improvements to the bottleneck code can be made in this way.

What Happens if Branch Prediction is Wrong?

If the branch prediction is wrong, there will be a delay in the process because it will be considered that no predictions were made at all. In that case, everything done by the executed instructions speculatively has to be undone.

The pipeline has to be cleared so that the instructions needed to be executed can be resent.

What is Indirect Branch Prediction?

Indirect branch prediction refers to the control mechanism by which a barrier is established. This prevents the software that was executed before the barrier was created from controlling the envisaged targets of the indirect branches executed after the creation of it on the similar logical processor.

How Does 2-bit Branch Prediction Work?

In a two-bit branch prediction process, the prediction is altered by the branch predictor into only two consecutive mispredictions. These two bits are then retained in the prediction buffer.

There are usually four different states involved in this process. Two of them correspond to the taken state and the other two to the not-taken state.

How Many Cycles are there in a Branch?

Typically, there is only one cycle of instruction in a branch when the pipeline reloads for the target instruction. Also, the non-taken branches have a single cycle in total.

If there is a link operation along with it, there can be three cycles to complete the process. However, on modern processors, considering all conditional and unconditional branches, there can be one to twenty cycles.

Conclusion

Branch prediction is very important for a CPU to perform well.

It is different from branch target prediction which guesses the target of an unconditional or conditional jump taken before it is figured out by decoding and implementing the instruction itself.