What is Arithmetic Logic Unit (ALU)?
Just as the name indicates, the ALU or Arithmetic Logic Unit of the CPU executes the different arithmetic and logic operations that the CPU has to perform on different operands in specific instruction words.
In a few specific processors the Arithmetic Logic Unit may be divided into two separate units namely the Arithmetic Unit or the AU and the Logic Unit or LU.
- ALU in the CPU performs all arithmetic and logic operations depending on the instructions and operands.
- It consists of two specific parts namely, Arithmetic Unit and Logic Unit and can do a lot of things such as add numbers, subtract numbers, perform shifting and logic operations.
- An ALU has three specific parts, namely operations logic, sequencing logic and storage registers.
- The ALU works by accessing input and output directly from different parts of the computer system such as the I/O devices, the processor controller or the main memory.
Understanding Arithmetic Logic Unit (ALU)
Typically, in a computer system, or in the CPU to be more precise, the Arithmetic Logic Unit or ALU, which is also known as the Integer Unit or IU, happens to be the main component.
Technically speaking, the ALU or IU is a digital circuit within the CPU, or even within a GPU, that does the calculations in the processor.
The ALU has the ability to do a lot of things such as add and subtract numbers apart from performing other arithmetic operations as well as logic and shifting operations. This includes Boolean comparisons such as:
- XOR, along with other Boolean operations such as AND and OR.
Apart from that, mathematical and bitwise operations can also be made on the binary numbers.
The operation codes and the operands supplied to the ALU determine which particular operations need to be performed on the input data.
When the processing of the data input is completed by the ALU, the result is then sent to the memory of the computer system.
The concept of ALU is nothing new.
In fact, it was proposed way back in 1945 by the famous mathematician, John von Neumann, in a report on the foundation of the Electronic Discrete Variable Automatic Computer or EDVAC, the new computer then.
The ALU back then was very simple and typically could handle only one data bit at one time.
However, the ALUs presented a much wider word size typically to the programmers.
Also, in 1967, Fairchild introduced the Fairchild 3800, which was the first ALU implemented as an Integrated Circuit or IC.
This had an 8-bit ALU with an accumulator.
This was the beginning of a surge of other Integrated Circuit ALUs, including the 4-bit Arithmetic Logic Units namely the Am2901 and 74181. These devices were designed specifically to be ‘bit slice’ capable.
This means that they provided ‘carry look ahead’ signals that made it easy to interconnect numerous ALU chips.
This allowed creating an ALU that came with a much wider word size.
This design became popular pretty quickly and soon it was extensively used in the bit-slice minicomputers.
When the microprocessors appeared in the early 1970s, the transistors had become smaller in size, but there was often still not enough space on the die for a full-width ALU.
As a result, some early microprocessors used a narrower ALU that needed numerous cycles for each Machine Language instruction.
For example, the popular Zilog Z80 microprocessor came with a 4-bit ALU but could perform 8-bit additions.
However, the geometries of the transistors became smaller over time, in line with Moore’s law, and therefore it became reasonable and possible to build wider ALUs on the microprocessors.
Now, the transistors in modern Integrated Circuits are far smaller than those of the early microprocessors.
This means that highly sophisticated ALUs can now be fitted on the ICs.
These complex ALUs of today come with wider word widths, better and more architectural enhancements and useful features such as binary multipliers, barrel shifters and more.
These enhanced designs allow the ALUs to complete an operation in fewer clock cycles, often in a single one, including operations that would have required multiple steps on the traditional ALUs.
With continuous technological developments, the ALUs, which have usually been mechanical, electro-mechanical or electronic circuits, are now also being attempted as biological ALUs.
However, as of now, research on these is still being carried out.
The design of the ALU is a vital part of the CPU.
It has changed significantly over time, as said earlier, and newer and better approaches are now being followed in order to speed up instruction handling.
Typically, the engineers design the ALU in such a way that it can perform any given kind of operation.
Still, it becomes costlier as and when the operations become more complicated.
This is because a more complex ALU uses up more space and dissipates more heat in the CPU.
That is why the engineers make the ALU powerful enough to ensure that the CPU is fast, but not so complex that its cost becomes prohibitive.
After all, a more powerful ALU consumes more energy or power, which produces more heat in turn.
There is no universal standard of the ALU design which is why it may vary from one processor to another in spite of being the main component of every processor.
This eventually offers variable functionality.
For example, some ALUs may be designed to execute integer calculations only while others may carry out floating-point operations only.
As for the working process of the ALU, it involves input and output access directly to the different parts of the computer such as:
- The processor controller
- The input/output devices and
- The main memory of the system.
Here, the input comprises the instruction word, which is also called the machine instruction word more often. This instruction word consists of:
- An opcode or operation code
- One or more operands and
- A format code sometimes.
Each of these has specific functions to perform. For example:
The operation code indicates the type of operation to be performed by the ALU and the operands are used while executing the operations.
For example, it might tell the ALU that the two operands are to be compared logically or simply to be added together.
The format, on the other hand, is typically combined with the operation code.
This usually determines whether it is a fixed-point instruction or a floating-point instruction.
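As a rough illustration of how an instruction word might be split into its parts, here is a minimal Python sketch. The 16-bit layout, the field widths and the opcode numbers are assumptions made up for this example and do not belong to any real processor.

```python
# Hypothetical 16-bit instruction word: [opcode:4][operand_a:6][operand_b:6].
# The layout and the opcode values are illustrative assumptions only.
OPCODES = {0x1: "ADD", 0x2: "COMPARE"}

def decode(word):
    """Split an instruction word into its opcode name and two operands."""
    opcode = (word >> 12) & 0xF
    operand_a = (word >> 6) & 0x3F
    operand_b = word & 0x3F
    return OPCODES.get(opcode, "UNKNOWN"), operand_a, operand_b

def execute(word):
    """Perform the operation selected by the opcode on the operands."""
    name, a, b = decode(word)
    if name == "ADD":
        return a + b
    if name == "COMPARE":
        return a == b
    raise ValueError("unsupported opcode")

word = (0x1 << 12) | (5 << 6) | 7   # encodes ADD 5, 7
print(decode(word), execute(word))
```

Here the opcode alone decides whether the two operands are added or compared, which is the role the operation code plays in the description above.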
The output, as always, is the result of the operation, which is normally placed in a storage register, together with a set of status settings.
These settings show whether or not the operation was performed properly.
If, however, the operation is not successful, some kind of status will be stored in a permanent location, which is sometimes called the machine status word.
However, generally speaking, the ALU comes with its own storage units for storing different items such as:
- The input operands
- The operands that are to be added
- The result accumulated which is specifically stored in the accumulator and
- The shifted results.
Usually, there are gated circuits in the ALU that control the operations performed on the bits as well as the bit flow in the subunits of it.
These gates are usually managed by a sequence logic unit. This specific unit uses a specific sequence or algorithm for controlling every operation code.
Apart from handling addition and subtraction calculations, multiplications and divisions are also done in the Arithmetic Unit by using a set of adding or subtracting as well as shifting operations.
Negative numbers can be represented in different ways.
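The way multiplication and division can be built from adding, subtracting and shifting can be sketched in Python as follows. This is only one common textbook approach (shift-and-add multiplication and restoring division), not necessarily what any particular ALU does, and it is limited to non-negative integers to sidestep the question of how negative numbers are represented. The function names and the default 8-bit width are assumptions for the example.

```python
def multiply(a, b):
    """Multiply two non-negative integers using only add and shift."""
    product = 0
    while b:
        if b & 1:          # lowest bit of the multiplier is set
            product += a   # add the shifted multiplicand
        a <<= 1            # shift the multiplicand left
        b >>= 1            # shift the multiplier right
    return product

def divide(dividend, divisor, width=8):
    """Restoring division using shift, compare and subtract.

    Assumes 0 <= dividend < 2**width; returns (quotient, remainder).
    """
    if divisor == 0:
        raise ZeroDivisionError("division by zero")
    quotient, remainder = 0, 0
    for i in range(width - 1, -1, -1):
        # Bring down the next bit of the dividend, most significant first.
        remainder = (remainder << 1) | ((dividend >> i) & 1)
        quotient <<= 1
        if remainder >= divisor:
            remainder -= divisor
            quotient |= 1
    return quotient, remainder

print(multiply(13, 11), divide(143, 11))
```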
However, it is good to note that the multiplication operation of two integers is done by the ALU since it is designed that way. The result in such cases is therefore also an integer.
On the other hand, the division operations may not be performed by the ALU in some cases.
This is because the results of a division may be a floating-point number.
Therefore, it is commonly handled by the Floating Point Unit. However, the FPU can also handle other non-integer calculations equally well.
In the logic unit, one of the 16 possible logic operations can be performed, such as comparing two given operands in order to find out where the bits do not match.
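There are 16 possible logic operations because a function of two input bits has four input combinations, and each combination can map to 0 or 1. A minimal Python sketch of this idea is shown below. Selecting the operation through a 4-bit truth table is an assumption chosen for the example, not the encoding of a specific chip.

```python
def logic_op(func, a, b, width=8):
    """Apply one of the 16 two-input Boolean functions bit by bit.

    func is a 4-bit truth table: bit number (2 * a_bit + b_bit) of func
    gives the output for that combination of input bits.
    """
    result = 0
    for i in range(width):
        a_bit = (a >> i) & 1
        b_bit = (b >> i) & 1
        out = (func >> (2 * a_bit + b_bit)) & 1
        result |= out << i
    return result

# Three of the 16 possible truth tables.
AND, OR, XOR = 0b1000, 0b1110, 0b0110

# XOR marks the positions where the two operands do not match.
print(bin(logic_op(XOR, 0b11001010, 0b11110000)))
```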
Functions of the Algorithm
The ALU is also used for multiple-precision arithmetic, which is an algorithm that operates on integers.
These integers are however much larger than the word size of the ALU.
In such situations, the algorithm treats every operand as an ordered collection of fragments, each corresponding to the ALU word size.
These are arranged in order from the most significant ones to the least significant ones or vice versa.
The algorithm also utilizes the ALU to operate directly on a specific fragment of operand.
This produces a corresponding fragment, called a ‘partial,’ of the multi-precision outcome.
Each of these partials, when produced, is usually written in a related region of storage.
This region is typically designated for multi-precision results. This procedure is repeated for every operand fragment.
However, while performing an arithmetic operation like addition or subtraction, the algorithm begins by invoking an ALU operation on the least significant fragments of the operands.
This helps in generating both a carry-out bit as well as a least significant partial.
This partial is then written by the algorithm to the selected storage but the carry-out bit is stored by the state machine of the processor typically on the ALU status register.
Next, the algorithm proceeds to the next fragment of the operand collection and repeats the process to produce more significant carry-out bits and partials till every operand fragment is processed.
This eventually produces an entire collection of partials in the designated storage which is actually the multi-precision arithmetic result.
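The addition procedure described above can be sketched in Python as follows. The 8-bit word size and the function names are assumptions for the example, and the operand fragments are stored least significant first.

```python
WORD_BITS = 8                     # assumed ALU word size for this sketch
MASK = (1 << WORD_BITS) - 1

def alu_add(a, b, carry_in):
    """One ALU addition on word-sized fragments: (partial, carry_out)."""
    total = a + b + carry_in
    return total & MASK, total >> WORD_BITS

def multi_precision_add(x, y):
    """Add two equal-length fragment lists, least significant first."""
    partials, carry = [], 0
    for a, b in zip(x, y):
        # The stored carry-out of the previous step is the carry-in here.
        partial, carry = alu_add(a, b, carry)
        partials.append(partial)
    return partials, carry

# 0x01FF + 0x0001 = 0x0200, expressed as two 8-bit fragments each.
print(multi_precision_add([0xFF, 0x01], [0x01, 0x00]))
```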
However, during the multiple-precision shift operations it is the shift direction that determines the order of operand fragment processing.
For example, in the case of a left-shift operation, the least significant fragments are processed first.
This is due to the fact that the least significant bit of every partial, which is conveyed through the stored carry bit, is needed to be acquired from the most significant bit of the earlier less significant operand that has shifted to the left.
In contrast, in the case of right-shift operations, the most significant fragments of the operands are processed first due to the fact that the most significant bit of every partial is to be obtained from the least significant bit of the previous more significant right-shifted operand.
And, in the case of bitwise logical operations, the operand fragments can be processed in any arbitrary order.
This is due to the fact that every partial depends on the corresponding operand fragments only and the carry bit stored from the earlier ALU operation is disregarded.
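The effect of the shift direction on the processing order can be seen in a small Python sketch of single-bit multi-precision shifts. As before, the 8-bit word size and the names are assumptions, and the fragments are stored least significant first.

```python
WORD_BITS = 8                     # assumed ALU word size for this sketch
MASK = (1 << WORD_BITS) - 1

def multi_precision_shift_left(fragments):
    """Shift left by one bit, processing the least significant fragment first."""
    partials, carry = [], 0
    for frag in fragments:
        partials.append(((frag << 1) | carry) & MASK)
        carry = frag >> (WORD_BITS - 1)   # bit shifted out feeds the next fragment
    return partials, carry

def multi_precision_shift_right(fragments):
    """Shift right by one bit, processing the most significant fragment first."""
    partials, carry = [0] * len(fragments), 0
    for i in reversed(range(len(fragments))):
        frag = fragments[i]
        partials[i] = (frag >> 1) | (carry << (WORD_BITS - 1))
        carry = frag & 1                  # bit shifted out feeds the next fragment
    return partials, carry

# 0x0180 << 1 = 0x0300 and 0x0300 >> 1 = 0x0180.
print(multi_precision_shift_left([0x80, 0x01]))
print(multi_precision_shift_right([0x00, 0x03]))
```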
The ALU contains a wide range of input and output electrical connections that help in conveying the digital signals to and from the external electronics.
This means that the ALU receives the input signals from the external circuits and responds to them by sending output signals to the external electronics.
There are three parallel data buses in the ALU. Out of these, two carry the input operands (A and B) and one carries the output result (Y).
Each of these buses is a group of signals that conveys one binary integer, and the number of signals in each bus is typically the same.
Status Input and Output Signals
The status input signals make further information available to the ALU while it performs an operation.
Typically, this is a single ‘carry-in’ bit, which is the carry-out stored from the previous ALU operation.
On the other hand, the status output signals provide supplemental information about the result of the operation performed by the ALU.
These are usually provided as multiple signals.
Typically, and in general, the ALU contains different status signals such as:
- Carry out and more.
After every operation is completed by the ALU, the status output signals are usually stored in external registers, so that they are available as and when required for ALU operations in the future.
Typically, an ALU is used in one of the following architecture configurations:
- Accumulator
- Stack
- Register to Register
- Register Stack and
- Register Memory.
Here is a brief description of each of them.
- Accumulator – Here the accumulator contains the intermediate results of every operation. This means that an instruction in the ISA or the Instruction Set Architecture needs to name only one operand explicitly and therefore the ISA is not very complex.
- Stack – The latest operations performed are stored on the stack in top-down order. This acts as a small set of registers, and the older entries are pushed down when fresh ones are added for execution.
- Register to Register – This is also called the 3-register operation machine because an instruction includes 2 source operands and 1 destination operand, as in the MIPS architecture. It uses two operands for input and a third separate operand for output.
- Register Stack Architecture – This is the combination of Register and Accumulator where the operations to be performed are pushed to the top. The results are also stored at the top of the stack.
- Register Memory – In this particular architecture, there are two operands. One of these is sent by the register and the other by the external memory. This is one of the most complex architectures where every program is long and is entirely held in the memory space.
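To give a feel for how these configurations differ, the following Python sketch computes the same sum in three of the styles described above. The mnemonics in the comments (LOAD, PUSH, ADD) and the register names are hypothetical and only serve to show how many operands each style names explicitly.

```python
def accumulator_machine(a, b):
    acc = a                      # LOAD a  (the accumulator is implicit)
    acc = acc + b                # ADD b   (only one explicit operand)
    return acc

def stack_machine(a, b):
    stack = []
    stack.append(a)              # PUSH a
    stack.append(b)              # PUSH b
    right = stack.pop()          # ADD pops the two topmost values ...
    left = stack.pop()
    stack.append(left + right)   # ... and pushes the result on top
    return stack.pop()

def register_machine(a, b):
    regs = {"r1": a, "r2": b, "r3": 0}
    regs["r3"] = regs["r1"] + regs["r2"]   # ADD r3, r1, r2 (two sources, one destination)
    return regs["r3"]

print(accumulator_machine(2, 3), stack_machine(2, 3), register_machine(2, 3))
```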
The Arithmetic Logic Unit offers several advantages such as:
- Parallel architecture support
- High performance applications
- Coalesce integer and floating-point variables
- Handle large instruction sets
- Offers high range of accuracy
- Combines two arithmetic operations in similar code
- Uniformity and non-interference during operation
- Provides results very quickly
- No sensitivity issues
- No memory wastage
- Less expensive
- Minimal logic gate necessities.
On the other hand, the disadvantages of ALU are:
- More delays with floating variables
- Hard to understand controller design
- Chances of bugs in definite memory space
- Complex circuit for beginners to understand
- Complex pipelining
- Irregularities in latencies
- Rounding off issues which impact precision.
Function of Arithmetic Logic Unit
The ALU actually supports a wide range of basic arithmetic calculations as well as bitwise logic functions as explained below.
The arithmetic functions performed by the ALU include:
- Add – Here two integers A and B are added and the result is displayed at Y and carry-out.
- Add with carry – Here two integers A, B and carry-in are added and the result is displayed at Y and carry-out.
- Subtract – Here A is subtracted from B or vice versa and the result is displayed at Y and carry-out, which is a ‘borrow’ indicator.
- Subtract with borrow – Here A is subtracted from B or vice versa with carry-in or borrow and the result is displayed at Y and carry-out, which is a borrow-out indicator.
- Negate two’s complement – Here A or B is subtracted from zero and the result is displayed at Y.
- Increment – Here A or B is increased by one and the result is displayed at Y.
- Decrement – Here A or B is decreased by one and the result is displayed at Y.
- Pass through – Here all bits of A or B appear at Y unmodified. This particular function helps in determining the parity of the operand or in loading the operand into a processor register. It is also used to find out whether the operand is zero or negative.
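The arithmetic functions listed above can be modelled with a few lines of Python. This is a simplified sketch that assumes an 8-bit word and treats the carry-out of a subtraction as a plain borrow indicator (1 means a borrow occurred); real ALUs differ in width and in how they encode the borrow.

```python
WIDTH = 8                         # assumed word size for this sketch
MASK = (1 << WIDTH) - 1

def add(a, b, carry_in=0):
    """Add (with optional carry-in): returns (Y, carry_out)."""
    total = a + b + carry_in
    return total & MASK, total >> WIDTH

def subtract(a, b, borrow_in=0):
    """Subtract B from A (with optional borrow-in): returns (Y, borrow_out)."""
    diff = a - b - borrow_in
    return diff & MASK, 1 if diff < 0 else 0

def negate(a):
    """Two's complement negation: A is subtracted from zero."""
    return (0 - a) & MASK

def increment(a):
    return add(a, 1)[0]

def decrement(a):
    return subtract(a, 1)[0]

print(add(200, 100), subtract(5, 10), negate(1))
```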
The bitwise logical functions performed by the ALU are:
- AND – Here the bitwise AND of A and B is displayed at Y.
- OR – Here the bitwise OR of A and B is displayed at Y.
- Exclusive-OR – Here the bitwise XOR of A and B is displayed at Y.
- Ones’ complement – Here all bits of A or B are inverted and are displayed at Y.
The ALU also performs some bit shift functions to shift the operand A or B to the right or left depending on the operation code.
The operand that is shifted is displayed at Y.
The operand is shifted only by one bit position by the simple ALUs while the complex ones can shift them by any arbitrary number of bits by using the barrel shifters in a single operation.
Typically, in all types of single-bit shift operations, the shifted out bit of the operand is displayed on carry-out and the value of the bit shifted into the operand is based on the kind of shift. These can be:
- Arithmetic shift – Here the operand is considered as a two’s complement integer. This means that the most significant bit is a ‘sign’ bit and is preserved.
- Logical shift – Here a logic zero is shifted into the operand. This is used to shift integers that are not ‘signed.’
- Rotate – Here the operand is considered as a circular buffer of bits. This means that the least significant as well as the most significant bits are in fact adjacent.
- Rotate through carry – This is where the operand and the carry bit are considered to be circular buffers of bits collectively.
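The difference between these kinds of single-bit shifts is easiest to see side by side. The Python sketch below assumes an 8-bit operand, and each function returns the shifted value together with the bit that was shifted out (the carry-out). The function names are chosen for the example only.

```python
WIDTH = 8                         # assumed word size for this sketch
MASK = (1 << WIDTH) - 1
MSB = 1 << (WIDTH - 1)

def logical_shift_left(a):
    return (a << 1) & MASK, a >> (WIDTH - 1)      # zero enters at the bottom

def logical_shift_right(a):
    return a >> 1, a & 1                          # zero enters at the top

def arithmetic_shift_right(a):
    return (a >> 1) | (a & MSB), a & 1            # sign bit is preserved

def rotate_right(a):
    low = a & 1
    return (a >> 1) | (low << (WIDTH - 1)), low   # shifted-out bit re-enters at the top

def rotate_right_through_carry(a, carry_in):
    return (a >> 1) | (carry_in << (WIDTH - 1)), a & 1   # old carry enters at the top

value = 0b10010010
print(logical_shift_right(value), arithmetic_shift_right(value), rotate_right(value))
```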
The ALU, however, is usually not designed to perform complex functions, due to several factors such as:
- The higher circuit complication
- The higher cost
- The higher power consumption and
- Bigger size.
These functions are usually performed by the external processor circuitry by coordinating a series of simpler ALU operations.
In such situations, for example, to calculate the square root of a number, there can be several ways implemented depending on the complexity of the ALU. Some of these ways are:
- Calculating in a single clock, which is very complex and needs a very sophisticated ALU
- Calculating in stages by using pipeline and a collection of simple ALUs arranged just like a factory production line and the intermediate results passing through them and
- Iterative calculation where a simple ALU calculates the value in several steps that are overseen and directed by the Control Unit.
The square root of the number will be calculated in any case, but the approaches differ in speed and cost.
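The iterative option can be illustrated with a short Python sketch of an integer square root that relies only on shifts, additions, subtractions and comparisons, which are operations a simple ALU provides. This is one well-known bit-by-bit method, shown here as an assumption about how such a sequence might look, with the loop playing the role of the Control Unit.

```python
def isqrt_iterative(n):
    """Integer square root of a non-negative integer.

    Uses only shift, add, subtract and compare, one step per loop pass.
    """
    bit = 1
    while (bit << 2) <= n:        # find the highest power of four not above n
        bit <<= 2
    root = 0
    while bit:
        if n >= root + bit:
            n -= root + bit
            root = (root >> 1) + bit
        else:
            root >>= 1
        bit >>= 2
    return root

print(isqrt_iterative(25), isqrt_iterative(10))
```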
Components of ALU
Typically, the Arithmetic Logic Unit comes with three major components or functional parts such as:
- The storage registers
- Operations logic and
- Sequencing logic.
However, if you look at it closely there are also several other components in an ALU that help it in its operations and it is good to know all about them. Here are all of them explained for you.
Logic gates are considered to be the building blocks of the ALU and are typically made up of specific components such as:
- Resistors and
- Transistors.
In an Integrated Circuit, these gates represent an ON input with the binary number 1 and an OFF input with the binary number 0. There are different types of logic gates such as:
- OR gate – These gates can typically handle two or more inputs. The output is 1 if any of the inputs is 1, and it is 0 only if all the inputs are 0. The OR gate performs logical addition on all its inputs and is usually expressed as X=A+B or X=A+B+C.
- AND gate – This particular gate can also take on two or more inputs. The output is 1 only when all the inputs are 1. This means that the AND gate will output 0 if any one of the inputs is 0. The AND gate performs logical multiplication on all given inputs and is expressed as X=A.B or X=A.B.C and is characterized by the ‘.’ symbol.
- NOT gate – This particular gate is used to reverse the Boolean state from 1 to 0 and 0 to 1. In order to reverse the result of gates, the NOT gate is also used along with ‘AND’ and ‘OR’ gate. In that case, it is represented as a small circle at the output of the gate. When the NOT gate is used this way, the AND gate changes into NAND and the OR gate changes into NOR.
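To show how these gates act as building blocks, here is a small Python sketch that models the three gates on single bits and combines them into a one-bit full adder. The function names are made up for the example, and the full adder is a standard textbook circuit rather than something described in this article.

```python
def and_gate(a, b):
    return a & b

def or_gate(a, b):
    return a | b

def not_gate(a):
    return 1 - a

def xor_gate(a, b):
    # XOR built only from AND, OR and NOT.
    return or_gate(and_gate(a, not_gate(b)), and_gate(not_gate(a), b))

def full_adder(a, b, carry_in):
    """Add three single bits: returns (sum_bit, carry_out)."""
    partial = xor_gate(a, b)
    sum_bit = xor_gate(partial, carry_in)
    carry_out = or_gate(and_gate(a, b), and_gate(partial, carry_in))
    return sum_bit, carry_out

print(full_adder(1, 1, 0), full_adder(1, 1, 1))
```

Chaining one such adder per bit, with each carry-out feeding the next carry-in, gives an adder for a whole word.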
The ALU also uses a wide range of bits which forms a part of its design. These are as follows:
- Auxiliary bit – This indicates that a carry was produced from the lower four bits into the higher four bits when two numbers were added.
- Carry bit – This indicates a carry out of, or a borrow into, the most significant bit, which happens when adding two numbers or when subtracting a larger number from a smaller number.
- Sign bit – This is the most significant bit, which shows in two’s complement whether the result is positive or negative, and is also referred to as the negative bit. When a subtraction is carried out by adding the two’s complement, a final carry of 1 out of the most significant bit is ignored and the result is positive. On the other hand, if there is no carry, the result is negative and in two’s complement form, and this negative bit will be set as 1.
- Overflow bit – This specific bit is used to specify whether or not the result of a signed operation has overflowed the available word size after the instruction has been processed. If it is set to 1 then it signifies that an overflow has occurred, and not if it is set to 0.
- Parity bit – This bit represents whether the number of 1 bits in a given string is even or odd. It is usually utilized as an error detection code. There are usually two types of parity bits, namely the even parity bit and the odd parity bit. For an even parity bit, the number of occurrences of 1s in the string is counted. If it is odd, then the even parity bit is set to 1 to make the total even. But, if the number is already even, then the even parity bit is 0.
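How these bits might be derived for a single addition is sketched below in Python. The 8-bit width, the flag names and the use of a dictionary are assumptions for the example. The parity entry follows the even parity bit described above (1 when the count of 1s in the result is odd), which is not how every processor defines its parity flag.

```python
def add_with_flags(a, b):
    """Add two 8-bit values and derive the status bits of the result."""
    total = a + b
    result = total & 0xFF
    flags = {
        "carry": total > 0xFF,                        # carry out of bit 7
        "auxiliary": (a & 0xF) + (b & 0xF) > 0xF,     # carry out of the lower four bits
        "sign": bool(result & 0x80),                  # most significant bit of the result
        # Signed overflow: both inputs have the same sign, the result has the other.
        "overflow": bool((a ^ result) & (b ^ result) & 0x80),
        "parity": bin(result).count("1") % 2,         # even parity bit
    }
    return result, flags

print(add_with_flags(0x7F, 0x01))
```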
Registers are crucial components in an Arithmetic Logic Unit which allow storing all instructions, intermediate data, input and output.
These registers are typically built on the CPU and are quite small in size.
The main function of the registers in an ALU is to store the intermediate data that is processed.
There are a number of different types of registers used for definite purpose by the Arithmetic Logic Unit out of which four are general purpose registers.
All of these four registers are 16-bit registers which means that these can store up to 16 bits of data.
The different types of registers may have different names. Here they are with a brief description against each:
- Accumulator – Placed inside the ALU, this is a 16 bit general purpose register by default. This means that when an instruction does not mention any specific register to store an operand, then it will be stored automatically in the accumulator. The accumulator can usually be used as two distinct registers of 8 bits and it uses the MBR or the Memory Buffer Register to deal with the memory.
- Program Counter – Often denoted as PC, the program counter is another 16-bit register that keeps track of the instructions to be carried out. It is also referred to as Instruction Pointer register since it functions as the pointer for instructions. It typically stores the address of the following instruction to be carried out. When an instruction is fetched from that address, the register gets incremented automatically by one so that it points to the following instruction.
- Flag register – This is also referred to as the Status register or Program Status register and usually stores the Boolean value of the status word that is used during the operation.
- Memory address register – This holds the memory address where the data is stored. The CPU accesses the register to fetch that address and acquires the data from there. It is also used in the same way while writing the data into the memory.
- Data register – These registers are also known as memory data registers and store the instruction or content that are fetched from the memory location for reading and writing. This is also a 16-bit register from where the instructions move to the instruction register and the data content to the accumulator for further operation.
- Instruction register – This specific register stores the instruction to be carried out. This is also a 16-bit register that comes with two specific fields namely operand and operation code. Usually the program counter stores the address of the instruction to be carried out. When that instruction is fetched, the program counter is incremented by 1 and holds the address of the following instruction. In such a situation it is the instruction register that holds the instruction in progress.
- Input/output register – Just as the name signifies, the input register stores the input received from the input devices and the output register stores the output of the process that it has to send to the output devices to display.
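The way these registers cooperate while an instruction is fetched can be sketched in Python as follows. The memory contents, the register names used as dictionary keys and the single `fetch` step are simplifying assumptions for the example, not a model of a specific processor.

```python
def fetch(memory, regs):
    """One simplified fetch step using the registers described above."""
    regs["MAR"] = regs["PC"]            # address of the instruction to fetch
    regs["MDR"] = memory[regs["MAR"]]   # word read from that memory location
    regs["IR"] = regs["MDR"]            # the instruction moves to the instruction register
    regs["PC"] += 1                     # PC now holds the address of the following instruction
    return regs

memory = [0x1234, 0x5678]               # two made-up 16-bit instruction words
regs = {"PC": 0, "MAR": 0, "MDR": 0, "IR": 0}

fetch(memory, regs)
print(hex(regs["IR"]), regs["PC"])
```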
How Many ALUs Does a CPU Have?
There is no universal standard or golden rule for ALU or CPU design that fixes the number of ALUs a CPU can have.
Therefore, it is due to the variable design of the ALU and the CPU that some processors may come with only one single ALU to perform the calculations while a few processors may come with more than one Arithmetic Logic Unit for the same.
The number of ALUs a CPU can have can also vary according to the CPU architectures as well as the generation of the processors.
At the minimum, a CPU will come with one ALU for every CPU core.
However, the most widely used superscalar CPU architectures of today typically come with several execution pipelines.
It is highly possible for all or some of these pipelines to have an ALU in them.
Also, there are a few specific CPU architectures that may have SIMD or Single Instruction, Multiple Data ALUs. This is where several ALUs perform a task in parallel on different parts of the data.
And, having multiple ALUs should not be considered to be a modern marvel.
This is because, according to records, historically, the 1948 Whirlwind I was one of the earliest computers to come with several discrete single-bit ALU circuits. This had 16 math units that could handle the operation on 16-bit words.
Typically, in the CPUs designed in the 1970s and 1980s, the ALU was actually a distinct unit of the design occupying a roughly square piece of the silicon die.
However, with the passage of time when more modern CPUs were designed, these were not only more complicated than the older versions but also had more transistors and silicon.
Therefore, these came with up to a dozen circuits each of which acted like an Arithmetic Logic Unit.
Some of these could only perform addition operations while others only shift operations.
Therefore, even if a CPU comes with multiple ALUs you should not think that it will support multiprocessing necessarily and there are several reasons for it.
To start with, an ALU is certainly not a core of a processor.
This is simply a collection of electronic circuitry that includes components such as:
- Barrel shifters
- Logical circuitry and more.
It, therefore, really does not matter what an ALU processes.
It simply performs the arithmetic or logic operations as requested on the operands supplied.
Therefore, CPUs with multiple Arithmetic Logic Units and processing lanes are very common nowadays.
However, the number of such ALUs will typically depend on the number of pipelines in the CPU design.
In spite of that, these do not necessarily speed up the overall operation of the CPU.
This is because in most of the cases enough work is not available to the resources of the processors.
So, with all that said and explained this article comes to an end.
You are surely more knowledgeable now about the Arithmetic Logic Unit than you were before you started reading this article.