ILP

What is ILP (Instruction Level Parallelism)?

ILP, or Instruction Level Parallelism, refers to performing a number of operations in a program simultaneously.

This design approach is exploited by both modern hardware and modern compilers, and it is especially effective in fields such as scientific and graphics applications.

Understanding ILP (Instruction Level Parallelism)

Instruction Level Parallelism, just as the name suggests, refers to handling several instructions in parallel.

In this process, the processor provides multiple functional units, such as integer adders, along with control logic that allows several instructions to access the available execution hardware simultaneously and may even arrange their order.

However, ILP is not the same as concurrency, because in ILP there is a single specific thread of execution for a process.

On the other hand, in concurrency, multiple threads are assigned to a CPU core in strict alternation.

It may also be done in a truly parallel manner, provided there are enough CPU cores available, with one core assigned to each executable thread.

ILP works at two levels, namely hardware and software. At the hardware level it relies on dynamic parallelism, while at the software level it relies on static parallelism.

This means that at the hardware level, the processor decides at run time which particular instructions are to be executed in parallel.

On the other hand, at the software level, it is the compiler that decides it.

The amount of Instruction Level Parallelism present in a program, however, depends on the particular type of application.

For example, it can be very high in scientific and graphics computing and lower in other workloads such as cryptography.

There are different micro-architectural techniques used for exploiting the benefits of ILP such as:

Instruction pipelining – overlapping the execution of successive instructions.

Superscalar execution – issuing more than one instruction per clock cycle.

Out-of-order execution – executing instructions in an order dictated by data availability rather than by program order.

Register renaming – removing false (WAR and WAW) dependences by mapping architectural registers onto a larger set of physical registers.

Speculative execution and branch prediction – executing instructions beyond an unresolved branch before its outcome is known.


In order to extract the available ILP in programs, compilers apply a few specific optimization techniques, which include:

Instruction scheduling – reordering independent instructions to keep the functional units busy.

Loop unrolling – replicating a loop body so that more independent operations become visible at once.

Software pipelining – overlapping operations from different iterations of a loop.

There can be different architectures of Instruction Level Parallelism such as:

Sequential architecture – This is where a program does not pass any information about parallelism to the hardware, as in a superscalar architecture.

Dependence architecture – This is where the information about dependences among operations is specified explicitly by the program, as in a dataflow architecture.

Independence architecture – This is where the program indicates which operations are independent of one another so that they can be scheduled in place of ‘no operation’ or ‘nop’ slots.

In order to apply and achieve ILP, the hardware and the compiler first need to determine which operations are independent and what the data dependencies are.

They also need to schedule these independent operations, assign them to functional units, and allocate registers to store the data.
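As a rough illustration of the first step, and not any particular processor's or compiler's algorithm, two operations can be treated as independent when neither one writes a register that the other reads or writes. A minimal sketch, using a hypothetical tuple encoding of instructions:

```python
# Each instruction is modeled as (destination register, list of source registers).
# Two instructions are independent when there is no RAW, WAR, or WAW dependence
# between them, i.e. neither one writes a register the other touches.

def independent(i1, i2):
    dst1, srcs1 = i1
    dst2, srcs2 = i2
    raw = dst1 in srcs2   # read-after-write: i2 reads what i1 writes
    war = dst2 in srcs1   # write-after-read: i2 writes what i1 reads
    waw = dst1 == dst2    # write-after-write: both write the same register
    return not (raw or war or waw)

# a = 1 + 2 and b = 3 + 4 touch disjoint registers -> independent
print(independent(("a", []), ("b", [])))           # True
# c = a + b reads a, so it depends on the instruction writing a
print(independent(("a", []), ("c", ["a", "b"])))   # False
```

Real hardware and compilers track dependences over whole instruction windows rather than pairs, but the pairwise test above is the building block.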

Instruction Level Parallelism Examples

Dataflow architectures, VLIWs, and superscalar architectures are a few examples of designs that exploit ILP; in dataflow and VLIW machines the parallelism is specified explicitly by the program, while a superscalar processor discovers it in hardware.

As for the operations themselves, an example of Instruction Level Parallelism could be carrying out four operations in a single clock cycle, or executing three operations such as a = 1 + 2, b = 3 + 4, and c = a + b partly in parallel.

To perform four operations at once, the ILP execution hardware typically provides four functional units, each linked to one of the operations, together with a common register file and a branch unit.

The functional units, in turn, may perform sub-operations such as integer ALU operations, floating point operations, integer multiplication, loads, and stores.

In the second case, the third operation, c = a + b, cannot be computed until the first two have been calculated.

However, the second operation does not depend on the first for its result, and vice versa.

It is for this reason that these two operations can, in theory, be calculated in parallel, that is, at the same time.
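The two-cycle schedule described above can be sketched as a greedy list scheduler. This is a simplification that assumes unlimited functional units; the dependence lists mirror the a, b, c example:

```python
# Greedy list scheduling: an operation may issue once every operation
# it reads from has completed in an earlier cycle.

ops = {
    "a": [],          # a = 1 + 2 (no inputs produced by other ops)
    "b": [],          # b = 3 + 4
    "c": ["a", "b"],  # c = a + b depends on both a and b
}

done, cycles = set(), []
while len(done) < len(ops):
    # All not-yet-issued ops whose inputs have already been computed
    ready = [op for op, deps in ops.items()
             if op not in done and all(d in done for d in deps)]
    cycles.append(sorted(ready))
    done.update(ready)

print(cycles)  # [['a', 'b'], ['c']]
```

The result shows a and b issuing together in the first cycle, with c waiting for the second, exactly the parallelism described in the text.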

What is an ILP Wall?

The ILP wall refers to the parallelism limit that computers have hit: making the instruction pipeline deeper yields diminishing performance returns while digging into a much deeper power hole.

This is often discussed in relation to Moore’s Law, the observation that the number of transistors on a chip doubles roughly every two years.

CPU performance, however, is no longer improving at a matching rate, in part because of the ILP wall.

Solving the issues due to the ILP wall is getting harder as well, and there are several reasons for it. These are:

True data and control dependences in programs put a hard limit on how much parallelism exists to exploit.

Deeper pipelines and wider issue widths make branch mispredictions ever more costly.

The extra hardware needed to find and exploit more ILP grows in complexity and power consumption faster than the performance it returns.

All these factors create a crisis of sorts, because programmers no longer have an obvious way to make a single program run faster on a microprocessor.

Is Pipelining Instruction Level Parallelism?

Yes. Pipelining overlaps the execution of several instructions: each instruction still proceeds through its stages in order, but independent stages of different instructions run in parallel, with the aim of increasing throughput, the amount of work done in a unit of time.

Ideally, this type of parallelism is known as Instruction Level Parallelism.

However, ILP is not easy to come by, even in simple pipelining, due to the various hazards and physical limitations arising from the nature of the instruction stream being executed.

Typically, these hazards and physical limitations may prevent a specific pipeline stage from completing its work when needed. There are three major hazards associated with pipelining.

Structural hazards:

When different instructions vie for the same CPU resources, a structural hazard may occur.

For example, if the register file has only a single write port and two instructions need to write to it in the same cycle, one of the two pipeline stages will have to wait.
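That single-write-port conflict can be sketched as a toy model. This is a hypothetical illustration, not tied to any real design:

```python
# Toy model: a register file with one write port can accept only one
# write per cycle; extra writes requested in the same cycle are deferred.

def schedule_writes(requests, ports=1):
    """requests: list of (cycle, register) write requests. Returns the
    cycle in which each write actually completes, given `ports` write ports."""
    completed = []
    in_use = {}  # cycle -> number of write ports already taken that cycle
    for cycle, reg in sorted(requests):
        while in_use.get(cycle, 0) >= ports:
            cycle += 1  # structural hazard: stall until a port is free
        in_use[cycle] = in_use.get(cycle, 0) + 1
        completed.append((reg, cycle))
    return completed

# Two writes requested in cycle 4 with a single port: one must wait a cycle.
print(schedule_writes([(4, "r1"), (4, "r2")]))  # [('r1', 4), ('r2', 5)]
```

Adding a second write port (`ports=2`) makes both writes complete in cycle 4, which is exactly the "more hardware" fix for structural hazards discussed below.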

Data hazards:

When one instruction in the pipeline relies on the result of another instruction still in the same pipeline, a data hazard may result. Until that result is available, the dependent instruction cannot proceed and write its own result back to the register file.

Control hazards:

If a control flow transfer instruction depends on results that are yet to come, a control hazard may result. For example, every conditional branch is typically associated with a control hazard.

This is because the branch condition is not available in time to fetch the subsequent instruction from the correct location.

However, it is not impossible to squeeze a bit more ILP out of the instruction stream. There are a few specific ways to do so, such as speculation and instruction reordering.

For example, structural hazards can often be resolved by adding more hardware. However, this involves penalties such as greater complexity, an increased gate count, and possibly longer delays.

Data hazards may be resolved in several ways, of which forwarding, or bypassing, is one of the more commonly followed techniques, while control hazards are more difficult to overcome and are typically mitigated with techniques such as branch prediction.
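The effect of forwarding can be illustrated by counting stall cycles in a classic five-stage pipeline (IF, ID, EX, MEM, WB). This is a toy model with hypothetical stage numbering, not a real pipeline simulator:

```python
# Toy model of pipeline stalls in a classic 5-stage pipeline,
# with stages numbered 1=IF, 2=ID, 3=EX, 4=MEM, 5=WB.

def stall_cycles(ready_stage, use_stage, distance=1):
    """Stalls the consumer needs: the producer's result becomes available
    at the end of ready_stage, and the consumer, issued `distance`
    instructions later, needs it at the start of its use_stage."""
    return max(0, ready_stage - use_stage - distance + 1)

# ALU result consumed by the very next instruction:
#  - without forwarding, the value is only available after WB (stage 5)
#    and is read in ID (stage 2)
print(stall_cycles(ready_stage=5, use_stage=2))  # 3 stalls
#  - with EX->EX forwarding, the value is bypassed straight from EX to EX
print(stall_cycles(ready_stage=3, use_stage=3))  # 0 stalls
# Load followed immediately by a use of the loaded value: the data only
# exists after MEM, so even with forwarding one stall remains (load-use delay)
print(stall_cycles(ready_stage=4, use_stage=3))  # 1 stall
```

The load-use case shows why forwarding removes most, but not all, data-hazard stalls.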

Limitations of Instruction Level Parallelism

The amount of ILP that can actually be exploited is limited chiefly by true data dependences, by control dependences introduced by branches, and by the finite window of instructions that the hardware or compiler can examine at once.

Conclusion

Instruction Level Parallelism is a very useful technique that facilitates parallel execution of instructions in a computer program.

The primary objective of such simultaneous or parallel execution of a set of instructions in a computer program is to enhance the overall performance level and speed of the system.