What is Out-of-Order Execution? (Explained)

December 10, 2022

What is Out-of-Order Execution?

Out-of-order execution is a processing method used by modern CPUs in which the instructions of a program are executed not in the order in which they appear but ‘out of order.’

Instructions are typically executed as soon as their input data is available. This reduces wasted clock cycles, because the processor can start working on one instruction while others are delayed.

KEY TAKEAWAYS

Out-of-order execution runs instructions out of order, but the architecture still decodes them and retires them in their order of appearance in the program.
It is typically found in high-performance processors, where it helps hide the penalty of cache misses.
When the processor executes out of order, it reduces idle time and wasted clock cycles, since independent instructions are executed while others wait.
Out-of-order execution exploits ILP, or Instruction Level Parallelism, and does not create any visible conflicts, since the design includes interlocks that detect potential hazards and handle them.
During such operation, each instruction waits in a queue until its operands are available and then leaves the queue.

Understanding Out-of-Order Execution

Out-of-order execution, or OoOE, means just what the name suggests: executing instructions out of their order of appearance in a program.

This method is used by high-performance microprocessors. Decoded instructions wait in a queue until their operands are available.

The processor starts executing an instruction as soon as its operands are ready.

An instruction whose operands are ready may leave the queue ahead of older instructions and is sent to the appropriate functional unit for execution.

The results of instructions executed out of order are held in temporary locations and written to the register file later, in program order.

The main objective of out-of-order execution is to reduce the time the CPU spends waiting for older instructions to complete before it can start newer ones.

This helps it avoid the stalls and wasted clock cycles that occur when the data needed by the next instruction in program order is not yet available.

To preserve correct program behavior, a CPU follows two rules:

Decoding the instructions in program order to find out what each one asks the CPU to do, and
Retiring them, that is, committing the result of each operation to the register file or memory, in the same program order.

Both the out-of-order and the in-order techniques adhere to these two rules, but with out-of-order execution a cache miss is less costly, because independent instructions can keep executing while the miss is serviced.

Out-of-order execution works by using an instruction window that holds all decoded instructions in program order.

A record is kept so that the results of these instructions are retired in the same order in which the CPU decoded them.

In addition, there is a scheduling window, where the reordering of instructions takes place.

It contains logic that marks instructions as dependent or independent.

This window sends all independent instructions to the execution units and holds dependent instructions until their inputs are available.
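
The marking step described above can be sketched in a few lines of Python. This is only an illustrative model, not how any particular CPU implements it: the `Instr` record, the register names, and the `mark_dependencies` helper are all hypothetical, and only true (read-after-write) dependencies are tracked, on the assumption that register renaming has already removed the others.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    name: str      # label used for illustration only
    dest: str      # register written by the instruction
    srcs: tuple    # registers read by the instruction


def mark_dependencies(window):
    """Map each instruction to the older instructions whose results it reads.

    Assumption: only read-after-write dependencies matter here; a real
    scheduler also relies on register renaming to remove false dependencies.
    """
    last_writer = {}  # register -> name of the most recent older writer
    deps = {}
    for ins in window:  # the window is walked in program order
        deps[ins.name] = sorted(
            {last_writer[r] for r in ins.srcs if r in last_writer}
        )
        last_writer[ins.dest] = ins.name
    return deps


# Hypothetical four-instruction window, oldest first.
window = [
    Instr("load", "r1", ("r0",)),
    Instr("add", "r2", ("r1", "r1")),
    Instr("mul", "r3", ("r4", "r5")),
    Instr("sub", "r6", ("r3", "r2")),
]

deps = mark_dependencies(window)
# Instructions with no pending producers can go straight to an execution unit.
independent = [name for name, producers in deps.items() if not producers]
print(deps)
print(independent)
```

In this example `load` and `mul` have no producers inside the window, so they are the ones a scheduler could dispatch immediately, while `add` and `sub` must wait for their inputs.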

Out-of-Order Execution Pipeline

The out-of-order execution queue reduces the pipeline stalls caused by data hazards. It allows later instructions to execute ahead of stalled ones in a superscalar pipeline.

Superscalar pipelining is an aggressive technique for maximizing throughput.

In such a design, the processor is equipped with several execution units, which let it handle several instructions in parallel at each stage of the pipeline.

In this pipeline, the handling of every instruction is divided into a series of steps, so that different instructions can be at different steps at the same time.

In simple words, the fundamental instruction cycle is divided into a sequence of stages in a pipeline.

For example, if it is a five-stage pipeline, the different stages of pipelining could be as follows:

IF – Instruction Fetch
ID – Instruction Decode
EX – Execute
MEM – Memory access
WB – Register write back

However, the number of stages may vary according to the architecture of the machine.

Nonetheless, if the processor can fetch a new instruction at every clock cycle, it is said to be fully pipelined.
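
To make the benefit of full pipelining concrete, here is a minimal Python sketch of an ideal, stall-free pipeline. It assumes the classic stage ordering IF, ID, EX, MEM, WB and one new instruction fetched per cycle; the function names are hypothetical and the model ignores hazards entirely.

```python
# Assumed classic five-stage ordering; real designs vary.
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_schedule(num_instructions, stages=STAGES):
    """Return {instruction index: {stage: cycle}} for an ideal pipeline.

    Instruction i enters the first stage in cycle i + 1 and advances one
    stage per cycle, with no stalls (an idealizing assumption).
    """
    return {
        i: {stage: i + k + 1 for k, stage in enumerate(stages)}
        for i in range(num_instructions)
    }


def pipelined_cycles(num_instructions, num_stages=len(STAGES)):
    """Cycles until the last instruction leaves a fully pipelined CPU."""
    if num_instructions == 0:
        return 0
    return num_stages + num_instructions - 1


def unpipelined_cycles(num_instructions, num_stages=len(STAGES)):
    """Cycles when each instruction must finish before the next one starts."""
    return num_stages * num_instructions


schedule = pipeline_schedule(4)
print(schedule[3])             # stage-to-cycle mapping of the fourth instruction
print(pipelined_cycles(4))     # 5 + 4 - 1 = 8
print(unpipelined_cycles(4))   # 5 * 4 = 20
```

With four instructions, the ideal pipeline finishes in 8 cycles instead of 20. Any stall breaks this ideal, and those stalls are what out-of-order execution tries to fill with useful work.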

Even a fully pipelined CPU, however, can stall when an instruction has to wait for its data.

It is the out-of-order execution technique that reduces such stalls, since instructions that do not depend on the waiting one can be executed ahead of it without causing data hazards.

In-order Vs Out-of-Order

In in-order execution, instructions are executed sequentially, but in out-of-order execution, they may be executed non-sequentially.
In in-order execution, the next instruction waits until the current instruction has completed, but in out-of-order execution, a later instruction can execute first, provided it does not depend on the results of earlier instructions.
In-order execution is typically slower, while out-of-order execution typically achieves higher throughput.
In-order execution is fully exposed to memory latency, since a slow read stalls everything behind it, but out-of-order execution can hide part of that latency by running independent instructions while the read is outstanding, which makes code run faster.
In-order implementations are less complicated than out-of-order implementations, which need additional bookkeeping for the instructions sent out of order and for their results.
In-order implementations are less power hungry than out-of-order implementations.
In in-order execution, instructions are fetched, executed, and completed in the order generated by the compiler, but in out-of-order execution, they are fetched and retired in that order while being free to execute in a different one.
In in-order execution, if one instruction stalls, all the instructions behind it stall as well, but in out-of-order execution, independent instructions can continue.
In in-order execution, all the instructions are scheduled statically, but in out-of-order execution, they are scheduled dynamically.
In an in-order implementation, instructions are executed only in order, but in an out-of-order implementation, instructions can be executed either in order or out of order depending on the current conditions.
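
The difference in stall behavior can be illustrated with a toy Python simulation. It is a sketch under strong assumptions, not a model of any real core: at most one instruction issues per cycle, each register is written once and read only by later instructions (as if already renamed), and the program, latencies, and the `simulate` function are hypothetical.

```python
import math


def simulate(program, out_of_order):
    """Return (issue order, cycle at which the last result is ready).

    program: list of (name, dest, srcs, latency) tuples in program order.
    Assumptions: one issue per cycle; every register has a single writer
    that precedes all of its readers in program order.
    """
    # Results of instructions that have not issued yet are never ready.
    ready_at = {dest: math.inf for _, dest, _, _ in program}
    pending = list(program)
    issue_order = []
    cycle = 0
    finish = 0
    while pending:
        # In-order issue may only look at the oldest pending instruction.
        candidates = pending if out_of_order else pending[:1]
        for ins in candidates:
            name, dest, srcs, latency = ins
            if all(ready_at.get(reg, 0) <= cycle for reg in srcs):
                ready_at[dest] = cycle + latency
                finish = max(finish, cycle + latency)
                issue_order.append(name)
                pending.remove(ins)
                break  # at most one issue per cycle
        cycle += 1
    return issue_order, finish


# Hypothetical program: a slow load followed by mostly independent work.
program = [
    ("load", "r1", ("r0",), 4),      # long-latency memory read
    ("add", "r2", ("r1",), 1),       # depends on the load
    ("mul", "r3", ("r4", "r5"), 2),  # independent of the load
    ("sub", "r6", ("r3", "r7"), 1),  # depends on the mul only
]

in_order = simulate(program, out_of_order=False)
ooo = simulate(program, out_of_order=True)
print(in_order)  # (['load', 'add', 'mul', 'sub'], 8)
print(ooo)       # (['load', 'mul', 'sub', 'add'], 5)
```

In the in-order run, `add` blocks everything behind it until the load returns. In the out-of-order run, `mul` and `sub` fill those otherwise idle cycles, so the same work finishes in 5 cycles rather than 8.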

Out-of-Order Execution Examples

One related example is software pipelining, a form of out-of-order execution in which the reordering is done by the compiler rather than by the processor.

IBM's POWER1 microprocessor, introduced in 1990, IBM PowerPC processors, and most modern high-performance CPUs are also good examples of out-of-order execution processors.

Where are the Results of the Out-of-Order Mode of Execution Placed?

The results of operations that execute ahead of older operations in the program are initially stored in temporary locations.

They are moved to their permanent locations at a later stage, once the older instructions have executed and their own results have been written back to the register file.

This is called the retirement or graduation stage.
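
In many designs the temporary locations take the form of a reorder buffer. The following Python sketch shows the idea of in-order retirement only; the `ReorderBuffer` class, its method names, and the values are hypothetical and greatly simplified compared with real hardware.

```python
from collections import deque


class ReorderBuffer:
    """Toy reorder buffer: results complete in any order, retire in order."""

    def __init__(self):
        self.entries = deque()   # program order, oldest entry on the left
        self.register_file = {}  # permanent (architectural) register state

    def allocate(self, name, dest):
        """Reserve a slot at decode time, in program order."""
        entry = {"name": name, "dest": dest, "value": None, "done": False}
        self.entries.append(entry)
        return entry

    def complete(self, entry, value):
        """Record a result in its temporary slot; may happen out of order."""
        entry["value"] = value
        entry["done"] = True

    def retire(self):
        """Commit finished entries, stopping at the first unfinished one."""
        retired = []
        while self.entries and self.entries[0]["done"]:
            entry = self.entries.popleft()
            self.register_file[entry["dest"]] = entry["value"]
            retired.append(entry["name"])
        return retired


rob = ReorderBuffer()
load = rob.allocate("load", "r1")  # older instruction
mul = rob.allocate("mul", "r3")    # younger instruction

rob.complete(mul, 42)              # the younger one finishes first
first = rob.retire()               # nothing retires: the load is still pending
state_after_first = dict(rob.register_file)

rob.complete(load, 7)
second = rob.retire()              # both retire, oldest first
print(first, state_after_first)
print(second, rob.register_file)
```

The result of `mul` stays in its temporary slot until the older `load` has completed, so the register file only ever changes in program order.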

Conclusion

There are more positives than downsides to out-of-order execution, and therefore most modern x86 CPUs are built from out-of-order cores.

It allows dynamic reordering of instructions, hides memory latency and the cost of cache misses, reduces CPU idle time, and delivers higher throughput with fewer wasted clock cycles.