Branch Prediction

What is Branch Prediction?

Branch prediction in a CPU design refers to the process which guesses the result of a conditional function and organizes the most expected result. This action is performed by a digital circuit called the branch predictor.

Also known as branch predication or simply predication, this is an important process that reduces the cost of branching and expedites branch instruction processing by using the pipelining of the CPU.

Understanding Branch Prediction

What is Branch Prediction

Branch prediction process helps the CPU in executing instructions using pipelining.

The process is completed faster because the CPU does not have to wait for a conditional jump to be carried out which slows down the process significantly.

The instructions are broken down into predicates by using branch prediction which is similar to predicate logic and is executed only if it is true.

This helps the processor work more efficiently. This is an example of conditional logic with a branch predictor.

The significance of branch prediction for the efficient functioning of the CPU is quite profound because no work needs to be done when the pipeline stages reload.

The gain produced by it is reduced by the program transfer instructions present and the sequence is changed which causes all of the instructions entering the pipeline invalid to become after program transfer.

Branch prediction logic helps eliminate these problems in the pipeline.

Branch prediction may or may not be taken depending on the conditions such as:

At the fetch stage of branch prediction, there are mainly three specific things that are to be ascertained namely:

If there is a conditional jump instruction, usually a two-way branching is implemented with it. This jump may be taken or not taken.

If it is taken, then it will move or ‘jump’ to a different position in the program memory.

If it is not taken, then it will continue being executed instantly after the conditional jump.

However, it is not always confirmed whether a conditional jump is taken or not taken until the calculation of the condition is made and the conditional jump passes the execution phase in the instruction pipeline.

Branch prediction is typically followed by using a 4-way set associative cache with 256 entries.

The process is called the Branch Target Buffer or BTB where there is a directory entry made for each line which comprises:

Branch instructions are normally fetched from the source memory, and the branch target address is stored in the consequent data entry in BTB, if only the directory entry is valid.

Branch Prediction Example

A branch prediction, for example, identifies whether or not a conditional jump is taken such as in a testing code, performance predictors, tournament predictors and creating scoreboard architecture.

Testing code:

While testing C++ code, a conditional branch is used on unsorted and sorted data. The branch is replaced later on with a bitwise operation. The results of the testing code are then analyzed with no compiler optimization to find the following:

However, the exact time may vary based on the infrastructure of the hardware but the comparative value will remain the same.

Read Also:  Why Does Number of Cores Affect CPU Performance?

Tournament predictor:

This involves hybrid local-global predictors and combines them to find the right predictor.

Here, typically, the local predictor makes a prediction depending on the outcome of a branch for its last ten executions and the global predictor uses 4K entries.

Performance predictors:

This involves making correlated and local 2-bit predictions depending on the total prediction size and the conditional branch misprediction rate.

Scoreboard architecture:

This helps in detecting issues and execution of instructions along with the hazards involved and writing of the results.

There are different tables in it to keep track of the progress and the related intelligence of the execution which helps in finding out the time to dispatch instructions.

Here, every functional unit is linked to a single buffer in the wait queue.

Branch Prediction in Pipelining

There are several things that can create issues in the pipeline due to structural, data and control hazard and branch prediction can help resolve the issue in several ways depending on the condition.

The specific issues that can be caused are:

In such situations, branch prediction can predict never taken or always taken.

If it is predicted never taken, the assumptions would be as follows:

If it is predicted as always taken, then the assumptions would be as follows:

In a different scenario, if it is predicted by opcode, the assumptions will be as follows:

Predictions will be different in the case of a taken-not taken switch involving 1 bit branch predictor depending on the earlier history. In such situations, prediction can be as follows:

It will be considered good for loops and it may also use 1 bit to specify the earlier result and store this bit with every branch instruction. More bits may also be used for a more detailed history.

Types of Branch Prediction

Basically, there are two types of branch prediction techniques namely, static branch prediction technique and dynamic branch prediction technique, along with a few other variants.

Static branch prediction:

Static branch prediction can be general, profile-based, program-based or programmer-based, with different features for each.

As for the general static always not taken branch prediction, the features are:

As for general static always taken branch prediction, the features are:

However, for BTFN, the backward loop is predicted as taken and others as not taken.

Profile-based static branch prediction:

In this type, the compiler finds out the probable direction of each branch by using profile run and encoding it in the branch instruction format as a hint bit. The advantage of this type is that you get predictions for each branch.

However, the downside is that it is accurate only when the profile input set is representative and that too on the behavior of the dynamic branch.

Program-based static branch prediction:

This type of static branch prediction is based on program analysis heuristics to find out the probable direction. Some of the heuristics used are:

The good thing about this type is that it does not need profiling but needs ISA support and compiler analysis. Moreover, the heuristics may not always be good or representative.

Programmer-based static branch prediction:

This type also predicts direction statically by using pragmas in the programming language that meet the criteria of a branch as likely-taken against likely-not-taken.

This type however does not need program analysis or profiling and the branches and their programs may be better known to the programmer than other analysis processes.

The disadvantages of this type are that it creates some burdens on the programmer and needs compiler, programming language and ISA support.

Dynamic branch prediction:

In this type, dynamic information is collected at run-time and used to make predictions based on the execution history of the branches.

The process can adapt to the dynamic changes in the behavior of the branch and does not need input set representativeness or static profiling.

However, this process is more complex and needs additional hardware which changes dynamically.

This process is more accurate than static branch prediction which involves different processes such as:

There can be a few more advanced direction prediction techniques that are based on static or dynamic branch prediction.

Compile time is one example of static branch prediction where the assumptions are:

An example of advanced dynamic branch prediction is run-time prediction, where the assumptions are:

A few other specific types of branch predictions are also there such as:

Branch Prediction Algorithm

The branch prediction algorithm or set of rules is based on the earlier iterations of an instruction. If there is a regular pattern detected, the prediction is successful, especially when the branch instruction has the same effect, always.

Different information is collected from different resources such as:

The rule may vary for different conditions as follows:

According to the rule, the function typically returns to its destination following the return instruction of an indirect jump. The target address is read by it from the call stack.

There is a different prediction mechanism in several processors for these return instructions which is based on a return stack buffer that can hold 4-16 entries and acts as the local mirror of the call stack.

How Does a Branch Predictor Work?

The working principle of branch prediction involves determining the address of a branch instruction well in advance and with little context.

Apart from the address of the present instruction and past history, very little data is available to the predictor and everything happens before the decoder pipeline phase.

The working process is as follows:

Read Also:  What is TPM (Trusted Platform Module)? (Explained)

Static Vs Dynamic Branch Prediction

Questions & Answers:

Why is Branch Prediction Used?

The main purpose of branch prediction is to predict the subsequent fetch address that will be used in the subsequent cycle. This helps in improving the flow within the instruction pipeline so that the processor achieves a high performance level.

Is Branch Prediction Still Used?

Yes, it is, but most of the powerful CPUs today depend extensively on dynamic branch prediction rather than static branch prediction.

This helps them to avoid the potential frontend bubble setbacks with accurate prediction of the correct address of the following instructions. This is applicable even for those specific branches that are not executed or completely decoded yet.

What is Branch Prediction Buffer and Branch Prediction?

Branch prediction buffers do not offer the value of the target PC but simply contain the prediction of whether or not the ensuing branch will be taken. The value of the target PC is offered by the BTB.

Branch prediction is done by the Branch Prediction Units in the CPUs to bring down the cost by predicting the future of the branch target and predicting it before it is decoded or executed.

How Can Branch Prediction be Improved?

If it is a high-level language, the best thing to do is to get rid of the branches in order to improve branch prediction. This can be done by stating the problem in terms of arithmetic or lookups.

This will make a large amount of history available on the branches left over and help it work better. Improvements to the bottleneck code can be made in this way.

What Happens if Branch Prediction is Wrong?

If the branch prediction is wrong, there will be a delay in the process because it will be considered that no predictions were made at all. In that case, everything done by the executed instructions speculatively has to be undone.

The pipeline has to be cleared so that the instructions needed to be executed can be resent.

What is Indirect Branch Prediction?

Indirect branch prediction refers to the control mechanism by which a barrier is established. This prevents the software that was executed before the barrier was created from controlling the envisaged targets of the indirect branches executed after the creation of it on the similar logical processor.

How Does 2-bit Branch Prediction Work?

In a two-bit branch prediction process, the prediction is altered by the branch predictor into only two consecutive mispredictions. These two bits are then retained in the prediction buffer.

There are usually four different states involved in this process. Two of them correspond to the taken state and the other two to the not-taken state.

How Many Cycles are there in a Branch?

Typically, there is only one cycle of instruction in a branch when the pipeline reloads for the target instruction. Also, the non-taken branches have a single cycle in total.

If there is a link operation along with it, there can be three cycles to complete the process. However, on modern processors, considering all conditional and unconditional branches, there can be one to twenty cycles.

Conclusion

Branch prediction is very important for a CPU to perform well.

It is different from branch target prediction which guesses the target of an unconditional or conditional jump taken before it is figured out by decoding and implementing the instruction itself.