What is Superscalar?

Superscalar refers to a microprocessor architecture that reached x86 desktop CPUs with the Intel Pentium processors, although the approach itself is older. A CPU of this type can execute more than one instruction per clock cycle.

Technically, the design of these processors consists of a set of mechanisms that allow the Central Processing Unit (CPU) of a computer to execute multiple instructions in a single cycle while still organizing and managing their results in sequential program order.

Understanding Superscalar

A superscalar processor implements a form of parallelism called Instruction-Level Parallelism (ILP) within a single processor.

This requires analyzing the instructions to be carried out for dependencies and using several different execution units to carry them out.

In each clock cycle, a superscalar processor can dispatch several instructions to the different execution units available on the processor.

The design of a superscalar processor typically emphasizes improving the instruction dispatcher so that the execution units are kept busy as much of the time as possible.

The early designs of such processors had two ALUs or Arithmetic Logic Units and one FPU or Floating Point Unit.

Later designs added two more ALUs, a second FPU, and two SIMD (Single Instruction, Multiple Data) units. Typically, these CPUs maintain a steady execution rate.

However, merely processing more than one instruction at a time does not make an architecture superscalar.

Multiprocessor and multi-core designs also execute several instructions at once, and a pipelined design keeps several instructions in flight at different stages, each by different methods.

Ideally, in a superscalar design, the dispatcher needs to read the instructions from the memory and at the same time has to decide which of them are to be executed in parallel.

It also has to dispatch the instructions to several different execution units in the CPU.

In simple words, a superscalar processor should have several parallel pipelines that can process instructions from a single instruction thread simultaneously.
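The dispatcher's decision described above can be sketched with a toy model. This is a minimal illustration, not how any real CPU is built: it assumes every instruction takes one cycle, that instructions issue strictly in program order, and that an instruction cannot share a cycle with an earlier instruction whose result it reads or overwrites. The `Instr` type, the `issue_in_order` function, and the register names are all hypothetical.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    dest: str     # register written by the instruction
    srcs: tuple   # registers read by the instruction


def issue_in_order(program, width=2):
    """Group instructions, in program order, into per-cycle issue packets."""
    packets = []
    current, written = [], set()
    for ins in program:
        # The instruction conflicts with the current packet if it reads or
        # rewrites a register that the packet is about to write.
        conflicts = any(s in written for s in ins.srcs) or ins.dest in written
        if len(current) == width or conflicts:
            packets.append(current)      # close the packet; start a new cycle
            current, written = [], set()
        current.append(ins)
        written.add(ins.dest)
    if current:
        packets.append(current)
    return packets


program = [
    Instr("r1", ("r0",)),        # r1 = f(r0)
    Instr("r2", ("r0",)),        # independent of the first instruction
    Instr("r3", ("r1", "r2")),   # needs both earlier results
    Instr("r4", ("r0",)),        # independent of r3
]

packets = issue_in_order(program, width=2)
print(len(packets))                            # cycles with two issue slots
print(len(issue_in_order(program, width=1)))   # cycles for a scalar pipeline
```

With two issue slots the four instructions fit into two cycles, while the same code with `width=1` (a scalar pipeline) needs four.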

Traditionally, a superscalar design is identified with a few specific characteristics within it such as:

The superscalar processor architecture typically came out in three successive phases such as:

With multi-core in the current microprocessor designs, the superscalar RISC or Reduced Instruction Set Computer processors surfaced according to two special approaches as follows:

The performance improvements gained from the superscalar processors are however dependent on three specific areas such as:

A superscalar processor sits between a scalar and a vector processor: each instruction processes one data item at a time, as in a scalar processor, but the multiple execution units allow several instructions to operate on separate data items at the same time.

Superscalar Processor Working Process

A superscalar processor preserves the appearance of sequential instruction execution, although there is no universal agreement on how its parallel instruction handling should be implemented.

Typically, it uses specific superscalar design techniques to function, which include:

Along with these techniques, there are also a few other specific types of procedures typically employed to complement the superscalar design. These techniques include:

Many superscalar processors rely on out-of-order execution and register renaming.

These help the processor find and exploit more instruction-level parallelism, which in turn allows it to handle several instructions in a clock cycle.
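As a rough illustration of out-of-order issue, the sketch below picks, in each cycle, up to `width` instructions whose inputs are already available, regardless of their position in the program. It rests on several simplifying assumptions that are not from the article: one-cycle latency, an unlimited instruction window, and registers that have already been renamed so each one is written only once. The function name `schedule_out_of_order` and the sample program are hypothetical.

```python
import math


def schedule_out_of_order(program, width=2):
    """Return the cycle in which each (dest, srcs) instruction issues."""
    # Registers produced inside the program are unavailable until their
    # producer has issued; any other register is a live-in, ready at cycle 0.
    ready_at = {dest: math.inf for dest, _ in program}
    issue_cycle = [None] * len(program)
    pending = list(range(len(program)))
    cycle = 0
    while pending:
        issued = []
        for i in pending:
            _, srcs = program[i]
            if len(issued) < width and all(ready_at.get(s, 0) <= cycle for s in srcs):
                issued.append(i)
        if not issued:
            raise ValueError("no instruction can issue; check the dependencies")
        for i in issued:
            issue_cycle[i] = cycle
            ready_at[program[i][0]] = cycle + 1   # result usable next cycle
            pending.remove(i)
        cycle += 1
    return issue_cycle


program = [
    ("r1", ("r0",)),   # 0: start of the first dependency chain
    ("r2", ("r1",)),   # 1: depends on 0
    ("r3", ("r2",)),   # 2: depends on 1
    ("r4", ("r0",)),   # 3: start of an independent chain
    ("r5", ("r4",)),   # 4: depends on 3
]

print(schedule_out_of_order(program, width=2))
```

In this example instruction 3 issues in the first cycle alongside instruction 0, even though it appears fourth in program order, so the two dependency chains overlap and the program finishes in three cycles.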

During the operation, the superscalar processor makes the best use of the pipelining technique by using pipeline fetch, pipeline branch prediction logic, and pipeline decode.

Ideally, there are different stages in which a pipelined superscalar processor works. These are:

Superscalar Processor Examples

Among the first commercial single-chip superscalar microprocessors were the Intel i960CA, introduced in 1989, and the AMD 29050, part of the 29000 series, introduced in 1990.

However, superscalar computers predate these chips: the IBM System/360 Model 91 mainframe of 1967 is an earlier example. The P5 Pentium was the first superscalar x86 processor.

Some of the most prominent examples of the approach where a scalar or current RISC line is transferred into a superscalar line include:

Examples of the other approach, in which a completely new architecture is designed, include the RS/6000 processor announced by IBM in 1990, which was later renamed POWER1.

IBM also has some PowerPC superscalar processors such as:

The list of superscalar processors also includes a few CPUs belonging to the MIPS R series, and it includes:

Some other examples of superscalar processors, in chronological order, are:

Most of the out-of-order CPUs are considered to be superscalar processors by nature.

Most of the RISC line processors are likewise considered to be superscalar CPUs, depending on which of the two design approaches was followed.

Modern x86 processors are superscalar CPUs as well, and they perform out-of-order execution.

The ARM Cortex-R52 CPUs, by contrast, belong to the in-order, mid-performance superscalar processor family. They are used mainly in industrial and automotive applications.

What are the Superscalar Processor Design Characteristics?

The main characteristic of a superscalar processor design is the ability to execute multiple instructions in a clock cycle by establishing parallelism in the processor at the instruction level.

It is typically also deeply pipelined, so that independent instructions in the sequence can proceed without wait states with the help of multiple processing units.

It is the Instruction Set Architecture and its implementation techniques that make these processors different from others, both in terms of hardware and software.

The ISA acts as an abstraction between programs and hardware implementations.

Some of the notable characteristics of it are:

The ISA itself acts like a specification for the hardware developers.

Different types of operation instructions are handled by the superscalar processors such as:

Since the performance of a processor is typically measured in CPI or Cycles Per Instruction, the superscalar processors also have the ability to reduce the instruction count by following two specific techniques as follows:

Branch prediction is an important aspect of the performance of the superscalar processors.
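To give a concrete sense of what a branch predictor does, here is a minimal sketch of a two-bit saturating counter, a common textbook scheme. It is offered as an illustration only and is not claimed to be the technique used by any particular processor named in this article; the class name `TwoBitPredictor`, the `accuracy` helper, and the sample branch history are assumptions.

```python
class TwoBitPredictor:
    """States 0-1 predict 'not taken'; states 2-3 predict 'taken'."""

    def __init__(self):
        self.state = 2   # assumed starting point: weakly taken

    def predict(self):
        return self.state >= 2

    def update(self, taken):
        # Move one step toward the observed outcome, saturating at 0 and 3.
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)


def accuracy(outcomes):
    """Fraction of branch outcomes that the predictor guesses correctly."""
    predictor = TwoBitPredictor()
    correct = 0
    for taken in outcomes:
        if predictor.predict() == taken:
            correct += 1
        predictor.update(taken)
    return correct / len(outcomes)


# A loop branch that is taken nine times and then falls through, run 3 times.
loop_branch = ([True] * 9 + [False]) * 3
print(accuracy(loop_branch))
```

Because a single loop exit moves the counter only one step, the predictor keeps guessing "taken" when the loop is entered again, so only the exit itself is mispredicted in each run.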

Branches are comparatively easy for these processors to predict because of their characteristics, features, and the different techniques they apply, such as:

Register renaming is handled by the scheduler together with the reorder buffer. It eliminates false data dependencies.

Such false dependencies arise when successive instructions reuse the same architectural registers even though there is no actual data dependency between them.
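A minimal sketch of the renaming idea follows, assuming an unlimited supply of physical registers and ignoring how registers are later freed. The function name `rename`, the `p0`, `p1`, ... naming, and the sample instructions are hypothetical and chosen only for illustration.

```python
def rename(program):
    """Give every register write a fresh physical register.

    `program` is a list of (dest, srcs) tuples using architectural names.
    """
    mapping = {}      # architectural register -> current physical register
    renamed = []
    for index, (dest, srcs) in enumerate(program):
        # Sources read whichever physical register holds the latest value;
        # registers never written in this program keep their original name.
        new_srcs = tuple(mapping.get(s, s) for s in srcs)
        phys = f"p{index}"
        mapping[dest] = phys
        renamed.append((phys, new_srcs))
    return renamed


program = [
    ("r1", ("r2", "r3")),   # r1 = r2 + r3
    ("r4", ("r1",)),        # r4 = r1 * 2   (true dependency on r1)
    ("r1", ("r5", "r6")),   # r1 = r5 + r6  (reuses r1: a false dependency)
    ("r7", ("r1",)),        # r7 = r1 - 1   (true dependency on the new r1)
]

renamed = rename(program)
for instruction in renamed:
    print(instruction)
```

After renaming, the third instruction writes `p2` instead of reusing `r1`, so it no longer has to wait for the first two instructions; only the true dependencies (`p0` feeding the second instruction and `p2` feeding the fourth) remain.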

Is Superscalar Multicore?

Superscalar designs appear in both single-core and multicore processors, but a superscalar core has only one program counter (instruction counter), and that differentiates it from a multicore processor.

A superscalar core can therefore have several instructions in flight at once, but all of them come from a single instruction stream.

In a multicore processor, several instruction streams may be executed simultaneously because each core of the CPU has its own separate program counter, and each of those cores can itself be superscalar.

This means that each individual instruction stream can also be executed more quickly, which is the characteristic benefit of superscalar processors.

How Do Superscalar Processors Exploit Parallelism?

Superscalar processors exploit parallelism by fetching and executing several instructions simultaneously, which reduces the average number of clock cycles per instruction (CPI).
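The effect on CPI can be made concrete with a small calculation. The figures below (instruction count, cycle counts, and clock rate) are invented for illustration and do not describe any real processor; the helper names `cpi` and `execution_time_s` are likewise hypothetical.

```python
def cpi(cycles, instructions):
    """Average clock cycles per instruction."""
    return cycles / instructions


def execution_time_s(instructions, cycles_per_instruction, clock_hz):
    """Execution time = instruction count x CPI x clock period."""
    return instructions * cycles_per_instruction / clock_hz


instructions = 1_000_000
clock_hz = 1_000_000_000                         # assumed 1 GHz clock

scalar_cpi = cpi(1_000_000, instructions)        # one instruction per cycle
superscalar_cpi = cpi(500_000, instructions)     # two per cycle on average

print(scalar_cpi, execution_time_s(instructions, scalar_cpi, clock_hz))
print(superscalar_cpi, execution_time_s(instructions, superscalar_cpi, clock_hz))
```

A CPI below 1 is only possible when more than one instruction completes per cycle, which is why it is often quoted as its reciprocal, instructions per cycle (IPC).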

Instruction Level Parallelism in superscalar processor is exploited in two specific ways as follows:

Improvements in the architecture of superscalar processors are what allow them to exploit more parallelism, and to exploit it more effectively.

This architecture allows pipelined execution of instructions, which is a necessity for parallel processing, so that work on the next instruction can begin without waiting for the previous one to be completed.

The architecture also allows Out-of-Order Execution (OoOE), which lets a superscalar machine extract ILP dynamically from the scalar instruction stream and so achieve more parallelism.

Is Superscalar SIMD?

The answer to this question is not simple. According to Flynn’s taxonomy, a superscalar processor with only one core can be considered a SIMD processor if it can execute short vector operations.

However, Flynn’s taxonomy is based on the number of data and instruction streams.

The confusion arises because a superscalar processor handles several instructions at a time, which can make it resemble a MIMD (Multiple Instruction, Multiple Data) processor.

However, a single-core superscalar processor works on only one instruction stream and one data stream, and in that respect it is classified as a SISD (Single Instruction, Single Data) CPU.

Because it uses a single instruction stream and a single data stream, even when it supports short vector instructions, the general concept of a multiple-data vector processor does not strictly apply to it.

The pipelined architecture does not add to the number of instruction streams that are processed simultaneously; the single stream merely flows through a longer channel, as it were.

So, a superscalar processor with short vector instructions is quite similar to a SIMD architecture without being one in the strict sense.



Scalar vs Superscalar

What is Superscalar Implementation?

Superscalar execution is most commonly applied to common instructions such as loads and stores, integer and floating-point arithmetic, and conditional branches.

All these instructions can be initiated simultaneously and executed independently.

What Happens to a Superscalar Processor without Multithreading Support?

If a superscalar processor has no multithreading support, many of its instruction issue slots can go unused.

This is because, without multithreading, the processor can draw only on the parallelism available within a single thread.

In the case of a Level 3 cache miss or any other similarly long-lasting stall, this limits how much of the CPU's resources can be used, and the pipeline may sit idle until the stall is resolved.
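The wasted issue slots can be illustrated with a deliberately simple model. Each thread is described only by how many instructions it has ready in each cycle, with zeros standing for a long stall such as a cache miss. The numbers, the `utilization` function, and the assumption that a multithreaded core can freely mix ready instructions from both threads are illustrative assumptions, not measurements.

```python
def utilization(threads, width):
    """Fraction of issue slots filled, given ready-instruction counts per cycle."""
    cycles = max(len(t) for t in threads)
    used = 0
    for c in range(cycles):
        ready = sum(t[c] for t in threads if c < len(t))
        used += min(width, ready)    # cannot issue more than `width` per cycle
    return used / (width * cycles)


thread_a = [2, 2, 0, 0, 0, 0, 2, 2]   # stalls for four cycles in the middle
thread_b = [1, 1, 2, 2, 2, 2, 1, 1]   # a second, unrelated thread

print(utilization([thread_a], width=2))             # single thread only
print(utilization([thread_a, thread_b], width=2))   # two threads share slots
```

In this toy example the single thread fills half of the issue slots, while adding a second thread lets the core keep issuing instructions during the first thread's stall.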


A superscalar processor uses instruction-level parallelism and a more elaborate architecture than a scalar processor to run faster, handling multiple instructions at a time.

The features and functionality of the superscalar design make these processors more powerful and more efficient at managing instructions, which results in higher throughput.