Floating Point Operations per Second (FLOPS)

What is FLOPS (Floating Point Operations Per Second)?

FLOPS or Floating Point Operations per Second is a particular measurement that indicates the number of floating point operations that can be performed by a processor in one second.

In simple words, this is a measurement that indicates the performance level of a computer. For numerical workloads, it is a more telling measure than Instructions per Second (IPS).

Understanding FLOPS (Floating Point Operations Per Second)

Floating Point Operations per Second helps to determine the performance of a computer.

Floating point calculations involve fractional numbers, and FLOPS acts as the standard benchmark measurement for the speed of these operations.

FLOPS is typically calculated by the formula:

(clock ticks per second / clock ticks required per floating point operation) x (number of ALUs working in parallel)

The terms FLOPS and Mega FLOPS (MFLOPS) were coined by Frank H. McMahon of the Lawrence Livermore National Laboratory. For arithmetic-heavy work, they are much better and more accurate measures than MIPS or Million Instructions Per Second.

This is because MIPS typically provides statistics that have little or no bearing on the real arithmetic abilities of a computer system.

In a computer, the basic mathematical calculations are performed by a cluster of logic gates, and microcode is usually used to carry out the instructions.

The circuit may also contain much more complex logic, depending on how important the speed of these mathematical operations is.

It is therefore useful to be able to estimate the computing speed, and FLOPS offers a rough basis for it.

For any High Performance Computing or HPC system, FLOPS can be calculated by using the following formula:

FLOPS = (racks) x (nodes/rack) x (sockets/node) x (cores/socket) x (cycles/second) x (FLOPs/cycle).

This formula can be further simplified if the computer system has only one CPU. Then the formula becomes:

FLOPS = (cores) x (cycles/second) x (FLOPs/cycle).
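
As a minimal sketch, both formulas above can be written as one function in which the cluster-level factors default to 1. The function name, parameter names and all machine figures below are hypothetical and chosen only for illustration.

```python
def peak_flops(cores_per_socket, cycles_per_second, flops_per_cycle,
               sockets_per_node=1, nodes_per_rack=1, racks=1):
    """Theoretical peak FLOPS as the product of the factors in the formula."""
    return (racks * nodes_per_rack * sockets_per_node
            * cores_per_socket * cycles_per_second * flops_per_cycle)


# Hypothetical single-CPU machine: 4 cores, 2 GHz, 16 FLOPs per cycle.
single_cpu = peak_flops(cores_per_socket=4, cycles_per_second=2e9,
                        flops_per_cycle=16)

# Hypothetical cluster: 2 racks x 10 nodes x 2 sockets x 32 cores at 2.5 GHz.
cluster = peak_flops(cores_per_socket=32, cycles_per_second=2.5e9,
                     flops_per_cycle=16, sockets_per_node=2,
                     nodes_per_rack=10, racks=2)

print(f"{single_cpu:.3e}")  # 1.280e+11
print(f"{cluster:.3e}")     # 5.120e+13
```

The result is a theoretical peak only, since it assumes every core completes the maximum number of floating point operations on every cycle.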

There are different ways in which FLOPS can be recorded depending on the type of computer as well as the different measures of accuracy. For example:

• 64-bit or FP64 double-precision floating-point operations, the format used for the TOP500 supercomputer ranking
• 32-bit or FP32 single-precision floating-point operations
• 16-bit or FP16 half-precision floating-point operations.

There are different names, units and values of FLOPS. These are:

• Kilo FLOPS or kFLOPS with its value 10^3
• Mega FLOPS or MFLOPS with its value 10^6
• Giga FLOPS or GFLOPS with its value 10^9
• Tera FLOPS or TFLOPS with its value 10^12
• Peta FLOPS or PFLOPS with its value 10^15
• Exa FLOPS or EFLOPS with its value 10^18
• Zetta FLOPS or ZFLOPS with its value 10^21
• Yotta FLOPS or YFLOPS with its value 10^24
• Ronna FLOPS or RFLOPS with its value 10^27
• Quetta FLOPS or QFLOPS with its value 10^30

However, most of these units are not used very commonly. In practice, figures are usually quoted in the middle of the range, from MFLOPS up to EFLOPS.
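
To show how the prefixes are applied, here is a small sketch that expresses a raw FLOPS figure in the largest unit that fits. The helper name `format_flops` and the sample values are hypothetical.

```python
# Unit names and scales from the list above, largest first.
PREFIXES = [
    ("QFLOPS", 1e30), ("RFLOPS", 1e27), ("YFLOPS", 1e24), ("ZFLOPS", 1e21),
    ("EFLOPS", 1e18), ("PFLOPS", 1e15), ("TFLOPS", 1e12), ("GFLOPS", 1e9),
    ("MFLOPS", 1e6), ("kFLOPS", 1e3),
]


def format_flops(value):
    """Return value written in the largest unit that keeps the number >= 1."""
    for unit, scale in PREFIXES:
        if value >= scale:
            return f"{value / scale:.2f} {unit}"
    return f"{value:.2f} FLOPS"


print(format_flops(1.28e11))  # 128.00 GFLOPS
print(format_flops(5.12e13))  # 51.20 TFLOPS
print(format_flops(750.0))    # 750.00 FLOPS
```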

FLOPS is used in all those fields that deal with a large number of non-integer calculations and where speed plays a very important role, such as the field of scientific research.

How Do You Calculate the Number of FLOPS?

Theoretically, the number of FLOPS of a general-purpose computer can be calculated by using the formula: Number of Cores x Average frequency x Operations per cycle.

Or even better: sockets x (cores per socket) x (number of clock cycles per second) x (number of floating point operations per cycle).

In the first case, the number of cores is easy to find, but the average frequency may include some amount of Turbo Boost in the case of an Intel CPU or Turbo Core in the case of an AMD CPU. The base operating frequency, on the other hand, is only a lower bound.

As for the operations per cycle, it is mainly dependent on the architecture of the core and is pretty hard to figure out.

You will need to know the vendor of the CPU used in your computer as well as its model number.

If you have both of these, you can visit the official website of the vendor and find out the necessary details such as:

• The clock rate
• The number of sockets or chips
• The number of cores in each chip
• The number of floating point operations per cycle
• The vector width of the operations

Once you have them, all you have to do is multiply. You can also use these numbers for the second formula.

While calculating the number of FLOPS of a computer, here are a few things that you should keep in mind:

• Usually, only servers have more than one socket. Most home computers, whether desktop or laptop, come with a single socket.
• The number of cores per socket will depend on the type of the CPU. It can be 2 cores as in dual core CPUs, 4 as in quad core CPUs, and so on. Some prototype processors have had as many as 80 cores in them.
• The clock cycles per second mean the speed of the CPU, typically represented in Gigahertz. A 2-GHz CPU will run 2,000,000,000 clock cycles per second.
• The number of floating point operations per cycle will also depend on the type of the CPU.
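
When the vendor lists the vector width but not the operations per cycle, a common rule of thumb, which is an assumption here and not a figure from any datasheet, is: lanes per vector x number of fused multiply-add (FMA) units x 2, because one FMA counts as a multiply plus an add. The sketch below uses a hypothetical core with 256-bit vectors and 2 FMA units.

```python
def flops_per_cycle(vector_width_bits, element_bits, fma_units, ops_per_fma=2):
    """Estimate floating point operations per cycle for one core."""
    lanes = vector_width_bits // element_bits  # values handled per instruction
    return lanes * fma_units * ops_per_fma


# Hypothetical core: 256-bit vectors and 2 FMA units.
fp64 = flops_per_cycle(256, 64, fma_units=2)
fp32 = flops_per_cycle(256, 32, fma_units=2)
print(fp64)  # 16
print(fp32)  # 32

# Multiply by the other factors from the list: 1 socket, 4 cores, 2 GHz.
peak_fp64 = 1 * 4 * 2_000_000_000 * fp64
print(peak_fp64)  # 128000000000
```

Halving the precision doubles the number of lanes, which is why the FP32 figure is twice the FP64 one for the same hypothetical core.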

And, if you intend to find the FLOPS of a graphics card, the number of cores can be much, much higher than the number of cores in a CPU.

In fact, the number of GPU cores can be in the thousands! That is why GPUs often reach and surpass the Tera FLOPS range, thanks to their specialized cores.

How Do You Calculate the FLOPS of a Model?

There are a few specific rules that you need to follow to calculate the FLOPs of a model, and they differ from one type of layer to another. Note that for a model, FLOPs (with a lowercase s) means the total count of floating point operations rather than a per-second rate. In addition to that, you will also need to know the input and output size, number of kernels and more.

The basic rules are:

• Convolutions – FLOPs = 2 x Number of Kernels x Kernel Shape x Output Shape.
• Fully Connected Layers – FLOPs = 2 x Input Size x Output Size.
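
The two rules can be sketched as follows. The code assumes that "Kernel Shape" means kernel height x kernel width x input channels and that "Output Shape" means output height x output width. This is one common reading and not something the rules above spell out. The function names and layer sizes are hypothetical.

```python
def conv2d_flops(num_kernels, kernel_h, kernel_w, in_channels, out_h, out_w):
    """FLOPs of a convolution layer: 2 x kernels x kernel shape x output shape."""
    kernel_shape = kernel_h * kernel_w * in_channels  # multiplies per output value
    output_shape = out_h * out_w                      # output positions per kernel
    return 2 * num_kernels * kernel_shape * output_shape


def fully_connected_flops(input_size, output_size):
    """FLOPs of a fully connected layer: 2 x input size x output size."""
    return 2 * input_size * output_size


# Hypothetical layers: 64 kernels of 3x3 over 3 channels producing a 224x224
# output, and a 4096 -> 1000 fully connected layer.
conv = conv2d_flops(64, 3, 3, 3, 224, 224)
fc = fully_connected_flops(4096, 1000)
print(conv)  # 173408256
print(fc)    # 8192000
```

The factor of 2 counts each multiply-accumulate as two operations, one multiplication and one addition.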

How Do You Convert GHz to FLOPS?

In simple words, the answer to this question is: you simply cannot convert GHz into FLOPS easily and precisely, though it is true that GHz or the clock speed of the CPU affects the number of FLOPS performed by it.

This is because they measure two different things altogether.

However, this is not the only reason to say so.

The work done per clock cycle differs from one CPU to another, so the same clock speed can yield very different floating point throughput.

Moreover, the hardware design also plays a crucial role, and therefore, you will need a value for it as well, namely the number of floating point operations it completes per cycle.

Unfortunately, there is hardly any way to express the real-world performance of the hardware as a single exact number.

What you may get is an average or assumed value, which will give you only a rough or ‘almost’ value.

This means that multiplying the two numbers will only give you a theoretical peak performance.

Typically, there is no universal method to convert GHz or the clock rate of the processor into FLOPS or the number of floating point operations performed per second.

Apart from the two numbers that may be correlated, there are a lot of other factors that need to be considered, such as memory bandwidth, latency and more, because all these will affect the overall performance of the CPU.

Therefore, it is practically impossible to predict from the clock rate alone the value that will be achieved on real workloads.
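
The gap between a theoretical peak and an achieved figure can be illustrated with a rough measurement. The sketch below times a naive matrix multiplication, which needs 2 x n^3 operations, and divides that count by the elapsed time. Because it runs in interpreted Python, the result will be far below what the hardware can do. The peak figure used for comparison is a hypothetical 4 cores x 2 GHz x 16 FLOPs/cycle.

```python
import time


def measure_matmul_flops(n=60):
    """Time a naive n x n matrix multiply and return achieved FLOPS."""
    a = [[float(i + j) for j in range(n)] for i in range(n)]
    b = [[float(i - j) for j in range(n)] for i in range(n)]
    c = [[0.0] * n for _ in range(n)]

    start = time.perf_counter()
    for i in range(n):
        for j in range(n):
            total = 0.0
            for k in range(n):
                total += a[i][k] * b[k][j]  # one multiply and one add
            c[i][j] = total
    elapsed = time.perf_counter() - start

    operations = 2 * n ** 3  # n^3 multiplies plus n^3 adds
    return operations / elapsed


achieved = measure_matmul_flops()
theoretical_peak = 4 * 2e9 * 16  # hypothetical machine, not a measured value
print(f"achieved: {achieved:.3e} FLOPS")
print(f"fraction of peak: {achieved / theoretical_peak:.6f}")
```

The printed numbers vary from run to run and from machine to machine, which is exactly why the clock rate alone cannot be converted into a dependable FLOPS value.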

Questions & Answers:

How Many FLOPS Can a Computer Calculate?

Most personal computer CPUs are within the Giga FLOPS or 10^9 range, with some reaching 1 Tera FLOPS or 10^12 operations, which, in simple words, is one trillion floating point operations per second. Supercomputers are several orders of magnitude faster.

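The number of floating point operations (FLOPs) needed by a given piece of code depends on its algorithm. For example, a routine that performs n*(n-1)/2 additions or subtractions and n*(n+1)/2 multiplications or divisions, as back substitution on an n x n triangular system does, needs (n*(n-1)/2) + (n*(n+1)/2) operations. This is equal to n^2.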
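
One routine whose operation count matches the expression above is back substitution on an n x n upper triangular system. This is an illustrative choice, and the matrix values below are hypothetical. The sketch counts the operations as it solves.

```python
def back_substitution(upper, rhs):
    """Solve upper * x = rhs and count the floating point operations used."""
    n = len(rhs)
    x = [0.0] * n
    add_sub = 0  # additions and subtractions
    mul_div = 0  # multiplications and divisions
    for i in range(n - 1, -1, -1):
        acc = rhs[i]
        for j in range(i + 1, n):
            acc -= upper[i][j] * x[j]  # one multiply and one subtract
            add_sub += 1
            mul_div += 1
        x[i] = acc / upper[i][i]       # one divide per row
        mul_div += 1
    return x, add_sub, mul_div


n = 5
# Hypothetical upper triangular matrix with a nonzero diagonal.
upper = [[float(j - i + 1) if j >= i else 0.0 for j in range(n)]
         for i in range(n)]
rhs = [1.0] * n
x, add_sub, mul_div = back_substitution(upper, rhs)
print(add_sub, mul_div, add_sub + mul_div)  # 10 15 25
```

For n = 5 the counts are n*(n-1)/2 = 10 and n*(n+1)/2 = 15, which add up to n^2 = 25.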

How Many FLOPS Can Supercomputers Achieve?

A supercomputer can perform one quadrillion (10^15) or more floating point operations per second, and the fastest systems have passed the Exa FLOPS (10^18) mark. This is quite high because supercomputers are considered to be the most powerful computers.

What is FLOPS in Object Detection?

In object detection, FLOPs usually refers to the number of floating point operations a model needs to process one input, and this number can differ by one or a couple of orders of magnitude from one model to another.

In practical applications, it serves as a rough indicator of the effective runtime when weighing the trade-off between precision and speed.

Why is Performance Measured in FLOPS?

The primary reason to use FLOPS instead of MIPS or others for measuring the performance of a processor and a computer system is that the statistics are more meaningful for arithmetic-heavy work.

It is also because floating point math is used in a wide range of present-day applications.

Why is FLOPS Better than MIPS?

FLOPS is better than MIPS because the values have a specific bearing on the process and the computational abilities of the computer system.

Moreover, the fact that this is typically based on the operations rather than the instructions involved in a program gives it more authenticity and a much stronger claim on the effectiveness and fairness of the comparison between two different computers.

Is Higher FLOPS Better?

For hardware, a higher number of FLOPS generally means faster floating point performance. However, the quoted figure is usually a theoretical peak, and the speed achieved in practice also depends on the clock cycles of the CPU, the memory and the rest of the hardware.

For a model, it is the other way round. There, FLOPs counts the operations the model needs, so the higher the FLOPs, the slower the model will be and the lower the throughput will be on the same hardware. In that case, the lower the FLOPs, the better.
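
The link between the operations a model needs and the rate a machine can sustain can be sketched as a simple division. The function name, the utilization parameter and all figures are hypothetical, and real runtimes also depend on memory and other hardware effects.

```python
def estimated_seconds(model_flops, hardware_flops, utilization=1.0):
    """Rough runtime: operations needed divided by operations per second."""
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    return model_flops / (hardware_flops * utilization)


# Hypothetical: two models on the same 1 TFLOPS device at 50% utilization.
small_model = estimated_seconds(1e9, 1e12, utilization=0.5)
large_model = estimated_seconds(4e9, 1e12, utilization=0.5)
print(f"{small_model:.4f} s")  # 0.0020 s
print(f"{large_model:.4f} s")  # 0.0080 s
```

On the same hardware, the model that needs four times as many operations takes about four times as long, while faster hardware shortens both.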

Conclusion

FLOPS or Floating Point Operations per Second is an important measure that helps to determine the performance level of a computer.

In fact, it tells how many non-integer operations the CPU of the system can handle per second.

It is much better than MIPS because it is more accurate and is therefore a more reliable measure.