Cache Line

What is a Cache Line?

In simple words, when cache memory is divided into partitions of equal size, each of these partitions is called a cache line.

A cache line refers to the block of memory that is transferred into the cache. It is usually fixed in size, typically ranging from 16 bytes to 256 bytes, with the exact size depending largely on the processor design and the type of application it serves.

Understanding Cache Lines


Some people, however, confuse cache memory with the cache line and assume the two are the same, which they are not.

This article clears up that confusion by covering both the technical and non-technical aspects of the cache line, its functions and more.

In order to understand cache lines, you will first need to know about the cache memory itself.

Cache memory has a few defining attributes: it is small, it is very fast, and it sits between the CPU and the main memory.

It is typically built from three basic components: the data store (the SRAM cells that hold the cached data), the tag RAM (which records the main memory addresses of that data) and the cache controller.

The cache line, or cache block, is the smallest unit of allocation in the cache, that is, the fixed number of words held by each cache entry.

It is this unit that is moved between the main memory and the cache of the computer system.

The system designer can configure the cache circuits to use a different line size.

In addition to that, there are also algorithms that can adjust the cache line size dynamically at run time.

The Concept and Advantages

The concept of the cache line is based on the principle of locality of reference and takes full advantage of it.

Instead of reading a single byte or word from the main memory of the computer system at a time, the full line is read and cached straight away.

The advantages of the cache line concept are many, but the two major ones are that nearby data is read in ahead of time, which exploits spatial locality, and that the low-order address bits never need to be stored or transmitted, which cuts the tag and address bus overhead.

When the cache system of a computer is designed, the cache line size is considered a vital parameter because it affects many other factors in the caching system.

Effects of Changing Cache Lines

As said earlier, you can configure the cache lines or cache blocks but there are some effects of such changes that you should also be aware of.

These effects vary according to the situation in which the changes are made.

Effects of changing the cache line size on spatial locality:

Ideally, the larger the block size, the better the spatial locality will be.

However, when the total cache size is kept constant, there are two opposing effects: a larger line exploits spatial locality better, but the cache then holds fewer lines, so data that is scattered across memory is evicted more often.

Effects of changing the cache line size on the cache tag in a direct mapped cache:

Typically, the cache tag is not affected in any way by the block size in a direct mapped cache.

Once again, when you keep the size of the cache constant while changing the cache line size, the block offset grows or shrinks while the set index shrinks or grows by the same number of bits, so the number of tag bits remains the same.

Effects of changing the cache line size on the cache tag in a fully associative cache:

In this case the tag consists of every address bit that is not part of the line offset, so reducing the size of the cache line increases the size of the cache tag, and increasing the line size decreases it.

When the size of the cache is kept constant but the block size is changed, the tag shrinks by one bit every time the block size is doubled and grows by one bit every time it is halved.

Effects of changing the cache line size on cache tag in set associative cache:

Once again, in such situations, the cache tag will not be affected in any way whatsoever.

When the size of the cache and the associativity are kept constant and the cache line size is changed, the offset bits and the set index bits trade off against each other, so the number of tag bits remains unchanged.
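
To make these tag effects concrete, here is a small, hypothetical C++ sketch that computes the tag width for a direct mapped, a 4-way set associative and a fully associative cache, assuming 32 bit addresses, a 64 KB cache and a few different line sizes; the numbers are illustrative only.

    #include <cmath>
    #include <cstdio>
    #include <initializer_list>

    // Bits needed to select one of n equally likely possibilities.
    static int bits(unsigned n) { return static_cast<int>(std::log2(n)); }

    // Tag width = address bits - set index bits - line offset bits.
    // ways == 1 is a direct mapped cache; ways == number of lines is
    // a fully associative cache (no index bits at all).
    static int tagBits(int addressBits, unsigned cacheSize, unsigned lineSize, unsigned ways) {
        unsigned lines = cacheSize / lineSize;
        unsigned sets  = lines / ways;
        return addressBits - bits(sets) - bits(lineSize);
    }

    int main() {
        const int addressBits = 32;        // assume 32 bit addresses
        const unsigned cache  = 64 * 1024; // 64 KB cache, held constant
        for (unsigned line : {32u, 64u, 128u}) {
            unsigned lines = cache / line;
            std::printf("line %3u B: direct mapped %d, 4-way %d, fully associative %d tag bits\n",
                        line,
                        tagBits(addressBits, cache, line, 1),      // unchanged
                        tagBits(addressBits, cache, line, 4),      // unchanged
                        tagBits(addressBits, cache, line, lines)); // shrinks
        }
    }

Running it shows the tag staying at the same width for the direct mapped and set associative cases regardless of line size, while the fully associative tag shrinks as the line grows.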

Effects of changing the cache line size on cache miss penalty:

A smaller cache block generally results in a lower cache miss penalty.

This is because when a cache miss happens, the whole block holding the required word must be brought in from the main memory of the system.

How large this miss penalty is depends mainly on the block size and on how quickly the memory system can deliver the block.

Effects of changing the cache tag on cache hit time:

Cache hit time refers to the time needed to find out whether or not the required block is present in the cache.

Determining this entails comparing two tags: the one derived from the generated address and the one stored with the cache line.

A smaller cache tag makes this comparison faster and therefore shortens the cache hit time, while a larger tag slows the comparison and lengthens it.

So, considering all these effects, a smaller block results in a lower cache miss penalty and a smaller fully associative tag, while a larger block exploits spatial locality better; choosing the cache line size is therefore a trade-off between these factors.

Technical Aspects

Technically, the tag RAM holds the addresses of the data that is stored in the cache.

It is necessary to keep the size of the address information reasonable and so the cache is usually split into different blocks of uniform size.

These are called cache lines.

Though there is no golden rule, the most common cache line sizes are 32, 64 and 128 bytes, with 64 bytes being by far the most common in modern processors.


If you have a 64 KB cache and the cache line size is 64 bytes, then the cache will have 1024 lines.

This means that the cache will need to track only those 1024 different addresses in order to define the contents available in that cache.

Strictly speaking, the cache line may refer to the data only, but colloquially the term typically covers the data as well as the related metadata, such as the status bits and the address tag.

Data is brought into the CPU a whole cache line at a time, irrespective of the size of the data that is actually being read.

For example, if a processor fetches 64 bytes at a time and the cache line holding the desired byte is not already present, the processor requests the 64 bytes that start at the cache line boundary, that is, at the largest multiple of 64 that is not greater than the required address.
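
In code, finding the start of the 64 byte line that contains a given address is just a matter of clearing the low 6 bits; the following is a minimal sketch that assumes 64 byte lines and an arbitrary example address.

    #include <cstdint>
    #include <cstdio>

    int main() {
        const std::uint64_t lineSize = 64;               // assume 64 byte lines
        std::uint64_t addr     = 0x12345;                // an arbitrary byte address
        std::uint64_t lineBase = addr & ~(lineSize - 1); // clear the low 6 bits
        std::uint64_t offset   = addr &  (lineSize - 1); // position within the line
        // lineBase is the largest multiple of 64 that is not greater than addr.
        std::printf("address 0x%llx -> line base 0x%llx, offset %llu\n",
                    (unsigned long long)addr,
                    (unsigned long long)lineBase,
                    (unsigned long long)offset);
    }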

Typically, all modern computer memory modules can transfer 64 bits or 8 bytes of data at a time.

This is usually done in a burst of 8 transfers. This means that a single command will trigger a read or write of an entire cache line from the memory.

In general, in all SDRAM (Synchronous Dynamic RAM) generations, whether DDR1, DDR2, DDR3 or DDR4, the burst transfer size is programmable up to 64 bytes.

This is pretty common, and CPUs usually select a burst size that matches the size of the cache line.

If the processor cannot predict a memory access, and therefore cannot prefetch it, retrieving the data takes about 90 nanoseconds, or roughly 250 clock cycles, from the moment the address is known until the CPU receives the data.

On the other hand, if there is an L1 cache hit on any modern x86 CPU, the load-use latency is about 3 or 4 cycles and the store-reload or store-forwarding latency is about 4 or 5 cycles.

The figures are broadly similar on other modern architectures.

For example, with a cache line of 64 bytes, memory is divided into non-overlapping, distinct 64 byte blocks, and the start address of every block has its lowest 6 address bits set to zero.

These six zero bits never need to be sent, which effectively multiplies the addressable space by 64 for any given address bus width.

Apart from reading data in advance and saving 6 bits on the address bus, cache lines also solve another problem: the organization of the cache memory itself.

For example, if a cache were divided into 64 bit (8 byte) cells or blocks, the address of each memory cell would need to be stored in the cache cell that holds its value.

However, if the address itself is 64 bits, then half of the total storage would be consumed by addresses, which amounts to 100% overhead.

The cache line solves this problem: since a line is 64 bytes, the CPU only needs to store 64 - 6 = 58 address bits for it.

The six zero bits need not be stored, so 64 bytes (512 bits) of data can be cached with an overhead of just 58 bits, or roughly 11%.

In reality, the stored addresses are even smaller than this.
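
To make the overhead figure concrete, here is a small sketch that repeats the calculation for a few line sizes, assuming the full 64 bit address (minus the offset bits) has to be stored; the percentages are only illustrative.

    #include <cmath>
    #include <cstdio>
    #include <initializer_list>

    int main() {
        const int addressBits = 64;                        // assume 64 bit addresses
        for (int lineBytes : {8, 16, 32, 64, 128}) {
            int offsetBits  = static_cast<int>(std::log2(lineBytes));
            int tagBits     = addressBits - offsetBits;    // bits that must be stored
            int dataBits    = lineBytes * 8;
            double overhead = 100.0 * tagBits / dataBits;
            std::printf("%3d byte line: %2d tag bits for %4d data bits (%.0f%% overhead)\n",
                        lineBytes, tagBits, dataBits, overhead);
        }
    }

With 64 byte lines this reproduces the roughly 11% figure mentioned above and shows how quickly the overhead falls as the line grows.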

In addition, each line carries status bits, such as a valid bit and a dirty bit marking lines whose contents must be written back to RAM.

Furthermore, in a set associative cache, a given cache cell cannot hold data from just any address; it can store only addresses belonging to its own subset, or set.

This means that even fewer address bits need to be stored as the tag, and it also allows the cache to be accessed in parallel.

Each subset can be accessed independently of the others.

It is very important to synchronize cache and memory access among the virtual cores, the independent processing units within each core, and the several processors on one main board.

For that it helps to have cache lines that are 64 bytes wide, because these correspond to memory blocks that start at addresses divisible by 64.

The offsets into a cache line are usually the least significant 6 bits of an address.

Therefore, for any particular byte, the cache line that needs to be fetched is found simply by clearing these 6 bits of the address, that is, by rounding the address down to the nearest multiple of 64.

All of this, however, is done by the hardware.
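
As an illustration of how the hardware splits an address, here is a hypothetical sketch for a 64 KB, 4 way set associative cache with 64 byte lines; the geometry and the example address are assumptions chosen only to make the arithmetic concrete.

    #include <cstdint>
    #include <cstdio>

    int main() {
        // Assumed geometry: 64 KB cache, 4 way set associative, 64 byte lines.
        const std::uint64_t lineSize = 64;
        const std::uint64_t ways     = 4;
        const std::uint64_t sets     = (64 * 1024) / (lineSize * ways); // 256 sets

        std::uint64_t addr   = 0xDEADBEEF;                // an arbitrary address
        std::uint64_t offset = addr % lineSize;           // byte within the line
        std::uint64_t set    = (addr / lineSize) % sets;  // which set to search
        std::uint64_t tag    = addr / (lineSize * sets);  // stored and compared

        // Only the 4 lines of one set have to be checked, and their tags
        // can be compared against this tag in parallel.
        std::printf("offset %llu, set %llu, tag 0x%llx\n",
                    (unsigned long long)offset,
                    (unsigned long long)set,
                    (unsigned long long)tag);
    }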

Cache Line Functions

Now that you know the technicalities of cache lines, it will be easier to understand the functions of these memory blocks.

But first, it is good to know what a cache line typically holds: a block of data, the address tag that identifies where that data came from, and status bits such as the valid and dirty flags.

This means that modern caches do not actually cache just one memory address at a time; more often than not, a single cache line contains several bytes' worth of data.

All of this data can be accessed in one single operation, which also allows the spatial locality in memory to be exploited.

It is for this reason that the term cache block is often used interchangeably, since it signifies the data portion alone.
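
Put another way, a single cache entry can be pictured roughly like the following C++ struct; this is only an illustrative sketch, not the layout of any particular CPU.

    #include <array>
    #include <cstddef>
    #include <cstdint>

    // A rough picture of one cache entry, assuming 64 byte lines:
    // the data block plus the metadata described above.
    struct CacheLine {
        std::uint64_t tag;              // which block of memory this line holds
        bool valid;                     // does the line hold meaningful data?
        bool dirty;                     // must it be written back to RAM?
        std::array<std::byte, 64> data; // the 64 byte block itself
    };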

Now, take a look at the functions of the cache lines.


The primary function of cache lines is to act as the unit in which data is transferred between the memory of the system and the cache.

Normally, a cache line is 64 bytes, and the CPU reads or writes the whole cache line whenever any location within this 64 byte area is read or written.

Sometimes, as said earlier, the processor will also need to prefetch cache lines by evaluating the pattern of memory access of a thread.

It is very important to organize memory access around cache lines, because doing so has notable consequences for the performance of applications.

For example, when an application accesses a two dimensional array whose rows map onto different cache lines, traversing it row by row touches each cache line once and makes use of every byte that is fetched.

If the data structure is instead accessed column-wise, things do not go nearly as well.

Here the traversal does start in cache line 0, but it soon needs to read in cache line 1 and then the other cache lines as well.

In such situations, every access incurs a main memory overhead.

However, if the cache fills up, there is a high chance that successive passes over the columns will fetch the same cache lines repeatedly.
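
A minimal sketch of the difference, assuming 64 byte lines, 4 byte ints and a matrix too large to fit in the cache, might look like this; the function names are made up for illustration.

    #include <cstddef>
    #include <vector>

    // Sum a matrix stored row by row (row-major order, as in C and C++).
    long long sumRowWise(const std::vector<std::vector<int>>& m) {
        long long sum = 0;
        for (std::size_t r = 0; r < m.size(); ++r)
            for (std::size_t c = 0; c < m[r].size(); ++c)
                sum += m[r][c];  // consecutive bytes: one 64 byte line serves 16 ints
        return sum;
    }

    // Sum the same matrix column by column.
    long long sumColumnWise(const std::vector<std::vector<int>>& m) {
        long long sum = 0;
        for (std::size_t c = 0; c < m[0].size(); ++c)
            for (std::size_t r = 0; r < m.size(); ++r)
                sum += m[r][c];  // each access lands in a different cache line
        return sum;
    }

Both functions compute the same result, but the column-wise version touches a new cache line on almost every access and, once the cache is full, fetches the same lines again on the next column.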

Cache line aware allocation also allows better cache organization, in particular when data structures are stored by value.

This is especially helpful for those who prefer vectors and arrays stored by value.

For example, with a vector of pointers, vector<Entity*>, it is quite possible that the Entity objects end up allocated far apart in memory, since each Entity object is allocated separately from the heap.

This means that every access to one of the entities can cause a complete cache line to be loaded.

On the other hand, if the vector stores the entities by value, vector<Entity>, the Entity objects are laid out next to each other in adjoining memory.

This reduces the number of cache line loads significantly.
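
A sketch of the two layouts might look like this; Entity is used here as a made-up example type with invented fields.

    #include <memory>
    #include <vector>

    struct Entity {    // a made-up example type
        float x, y, z;
        int   id;
    };

    // By pointer: each Entity lives in its own heap allocation, so the
    // objects can be scattered across memory and visiting each one may
    // pull in a different cache line.
    std::vector<std::unique_ptr<Entity>> byPointer;

    // By value: the Entity objects sit next to each other inside the
    // vector's single allocation, so one cache line holds several of them.
    std::vector<Entity> byValue;

    float sumX(const std::vector<Entity>& v) {
        float sum = 0.0f;
        for (const Entity& e : v)   // walks contiguous memory, line by line
            sum += e.x;
        return sum;
    }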

Another significant cache line related technique is to keep the validity flags of array entries in a separate descriptor array.

This matters because array elements are often accessed conditionally, that is, only when they are valid.

Typically, the validity of an array element is indicated by a flag stored within the element itself.

A plain 'for' loop over such an array will then trigger cache line loads even for those entries that are going to be skipped.

When the validity flags are instead kept in a separate descriptor array, the performance of the cache improves considerably.

In that case the validity descriptors are loaded from a single cache line, which means that the application loads fewer cache lines overall.

This is because the cache lines for the invalid entries are never fetched, and because the array content and the descriptors are kept separate.
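
A rough sketch of the idea, with made-up types and sizes, could look like this.

    #include <cstdint>
    #include <vector>

    struct Record {          // a made-up, cache line sized element
        bool valid;          // flag embedded in the element itself
        char payload[63];
    };

    // Embedded flag: testing r.valid already loads the whole 64 byte line,
    // even when the record turns out to be invalid and is skipped.
    long long countEmbedded(const std::vector<Record>& records) {
        long long n = 0;
        for (const Record& r : records)
            if (r.valid) ++n;
        return n;
    }

    // Separate descriptor array: the flags are packed together, so one
    // cache line answers the validity question for dozens of records and
    // the payload lines of invalid records are never touched.
    long long countSeparate(const std::vector<std::uint8_t>& validFlags) {
        long long n = 0;
        for (std::uint8_t f : validFlags)
            if (f) ++n;
        return n;
    }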

Awareness of cache lines also makes it possible to eliminate false sharing, by making sure that cache lines are not shared between threads.

As you may know, the CPU die comes with Level 1 and Level 2 caches that are dedicated to each of the cores in it, and the Level 3 cache is typically shared.

This division of the cache architecture into different tiers often results in coherency issues among them.

For example, when one core reads a memory location, which loads cache line 0 into its Level 1 cache, and another core then accesses a memory location that also lies on cache line 0, the CPU has to set off its cache coherency mechanism.

This mechanism ensures a coherent view of cache line 0 across all of the available caches.

However, this is a costly process and it wastes a lot of memory bandwidth.

This can eventually cancel out the benefits offered by caching.

Any application that is caching-optimized will suffer due to the induced lag of cache coherency.

This shows up as false sharing: the cores operate on alternate entries within the array and therefore touch separate memory addresses, yet those addresses lie on the same cache line.

One would expect two cores to roughly double the throughput of a single core, but with a shared descriptor array small enough to fit into a single cache line, that gain never materializes.

This problem is easily resolved by placing the descriptors for the different cores on different cache lines, so that each core's descriptor occupies a line of its own.
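
A minimal sketch of the fix, assuming 64 byte lines, is to align each core's counter so that it sits on its own line; the types and iteration counts here are invented for illustration.

    #include <atomic>
    #include <thread>

    // Shared line: both counters fit into one 64 byte cache line, so two
    // threads updating "their own" counter still bounce the line between
    // their caches (false sharing).
    struct SharedLine {
        std::atomic<long> a{0};
        std::atomic<long> b{0};
    };

    // Separate lines: alignas(64) pushes each counter onto its own line,
    // so the two threads no longer interfere with each other.
    struct SeparateLines {
        alignas(64) std::atomic<long> a{0};
        alignas(64) std::atomic<long> b{0};
    };

    int main() {
        SeparateLines counters;
        std::thread t1([&] { for (int i = 0; i < 1000000; ++i) counters.a++; });
        std::thread t2([&] { for (int i = 0; i < 1000000; ++i) counters.b++; });
        t1.join();
        t2.join();
    }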

The functions of cache lines are therefore quite varied, and they are important in making sure that all the cores of a multi-core processor perform as well as they should.

Conclusion

After reading this article you should know a good deal about the cache line, or data block: what it contains, where it fetches its data from, and many of its technical aspects.

All of this should be of real interest if you are fairly tech savvy.