What Is a Steamroller Processor?
AMD Steamroller refers to a Family 15h processor line built on a microarchitecture designed primarily for Accelerated Processing Units (APUs).
In technical terms, the Steamroller processors use the same two-core module design as their predecessors, the Piledriver processors.
These CPUs are built on a 28 nm manufacturing process and use sockets such as FM2+ and FP3.
- AMD launched the Steamroller processors in early 2014 as the third iteration of the Bulldozer architecture.
- The architecture uses a two-core module and the FM2+ and FP3 socket types, and AMD developed it with APUs in mind.
- The features and design of this third-generation Bulldozer-based microarchitecture allow these processors to offer a much higher level of parallelism.
- Built on the 28 nm technology node, the Steamroller processors are the successors to the Family 15h Piledriver line and the predecessors of the Excavator line.
- Like their predecessors, the Steamroller processors come with a perceptron branch predictor, which allows for better predictions and higher performance, especially when handling server workloads.
Understanding Steamroller Processor
Steamroller processors are a clear improvement over their predecessors, mainly because of their design characteristics.
Compared with Piledriver, the CPUs offer the following:
- Greater parallelism
- Independent instruction decoders for every core in a module
- Maximum dispatch width increased by about 25% for each thread
- Improved instruction schedulers
- Better memory controller
- Better perceptron branch predictor
- Bigger and smarter caches
- Better branch prediction rate
- Lower instruction cache misses
- Dynamically resizable L2 cache
- Micro-operations queue
- Higher internal register resources
- Increased Instructions Per Cycle (IPC) by about 30%
- Lower power consumption
- Higher clock rates
- Improved single-threaded and multi-threaded IPC
Also, when paired with the Graphics Core Next (GCN) microarchitecture in APUs, the Steamroller microarchitecture helps support the specific features of the Heterogeneous System Architecture (HSA).
General Information and Specs
Here are some of the basic specifications and information about the Steamroller processors that were launched at the beginning of 2014.
- The processors are built on a 28 nm technology node.
- They support the AMD64 (x86-64) Instruction Set Architecture (ISA).
- They use the FM2+ and FP3 (µBGA) sockets.
- They are the successors to the second-generation Piledriver processors and the predecessors of the fourth-generation Excavator processors, both of which belong to Family 15h.
The Steamroller architecture is meant to keep the ball rolling by taking its basics from the Bulldozer and Piledriver architectures.
On top of them, AMD has incorporated a healthy set of evolutionary enhancements.
However, in terms of architectural change, the Steamroller processors cannot be considered either a tick or a tock, in Intel terms. They fit somewhere between these two extremes.
It is not a tick because it does not come with a notable change in the technology node or manufacturing process.
In practice, the 28 nm bulk process is quite close to the 32 nm Silicon on Insulator (SOI) process used before it.
On the other hand, it is not a tock because the architecture is enhanced rather than redesigned.
Even so, these enhancements help the Steamroller processors address one of the biggest issues with their predecessors: the fetch and decode hardware shared between the two cores of a module.
Steamroller retains the perceptron branch predictor from its predecessor, but in a much improved form. This helps deliver better performance, primarily when handling server workloads.
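To make the idea of a perceptron branch predictor more concrete, here is a minimal Python sketch of the general technique as it appears in the published literature. It is not AMD's implementation: the class name, the table size, the history length, and the threshold formula are all assumptions chosen for illustration.

```python
class PerceptronPredictor:
    """Toy perceptron branch predictor (illustrative only, not AMD's design)."""

    def __init__(self, history_length=8, table_size=64):
        self.history_length = history_length
        self.table_size = table_size
        # Training threshold heuristic from the published perceptron predictor work.
        self.threshold = int(1.93 * history_length + 14)
        # One weight vector per table entry; index 0 holds the bias weight.
        self.weights = [[0] * (history_length + 1) for _ in range(table_size)]
        # Global branch history, oldest first: +1 = taken, -1 = not taken.
        self.history = [-1] * history_length

    def _output(self, pc):
        w = self.weights[pc % self.table_size]
        return w[0] + sum(wi * hi for wi, hi in zip(w[1:], self.history))

    def predict(self, pc):
        """Predict taken when the weighted sum is non-negative."""
        return self._output(pc) >= 0

    def update(self, pc, taken):
        """Train on the real outcome, then shift it into the history."""
        y = self._output(pc)
        t = 1 if taken else -1
        # Train on a misprediction or when the output is not yet confident.
        if (y >= 0) != taken or abs(y) <= self.threshold:
            w = self.weights[pc % self.table_size]
            w[0] += t
            for i, hi in enumerate(self.history, start=1):
                w[i] += t * hi
        self.history = self.history[1:] + [t]


def accuracy_on_pattern(pattern, repeats=200, pc=0x40):
    """Replay a repeating taken/not-taken pattern and return the hit rate."""
    predictor = PerceptronPredictor()
    correct = 0
    total = 0
    for _ in range(repeats):
        for taken in pattern:
            if predictor.predict(pc) == taken:
                correct += 1
            predictor.update(pc, taken)
            total += 1
    return correct / total
```

Calling `accuracy_on_pattern([True, True, True, False])` replays a branch that behaves like a four-iteration loop. Because the outcome correlates with the history, the weights should quickly learn the pattern, which is the property that makes this family of predictors attractive for branch-heavy workloads.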
In the Steamroller processors, the floating-point and integer register files are quite large, though their actual size is not very clear. Load operations involving two operands are compressed so that they occupy only a single entry in the physical register file, which increases the effective size of the register files.
The integer execution units themselves are unchanged; it is the other enhancements that improve integer performance.
In addition to that, the size of the scheduling windows is increased as well, which facilitates greater and better utilization of the available execution resources.
Also, a reduction in floating-point pipeline resources delivers the same output while using less area and power, which is the result of a smarter implementation of the Piledriver FPU.
Store to load forwarding:
Store to load forwarding is also significantly improved, which makes the Steamroller CPUs better at the following:
- Detecting interlocks
- Cancelling the load
- Getting data from the store
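The three steps above can be sketched with a toy store queue in Python. This is a simplified model of store to load forwarding in general, not a description of Steamroller's hardware; the `StoreQueue` class and its method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PendingStore:
    address: int
    value: int


class StoreQueue:
    """Toy model: stores wait in a queue before they reach memory."""

    def __init__(self, memory):
        self.memory = memory   # dict mapping address -> value
        self.pending = []      # stores not yet written back, oldest first

    def store(self, address, value):
        self.pending.append(PendingStore(address, value))

    def load(self, address):
        """Return (value, forwarded)."""
        # Detect the interlock: search from the youngest store to the oldest.
        for entry in reversed(self.pending):
            if entry.address == address:
                # Cancel the memory access and take the data from the store.
                return entry.value, True
        # No matching store in flight, so read memory as usual.
        return self.memory.get(address, 0), False

    def retire_oldest(self):
        """Write the oldest pending store back to memory."""
        if self.pending:
            entry = self.pending.pop(0)
            self.memory[entry.address] = entry.value
```

A load that follows a store to the same address gets its value directly from the queue, with the second element of the result set to `True`. Once the store has retired, the same load is served from memory instead.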
When it comes to instruction fetching, the Steamroller architecture has been refined heavily to offer reasonable performance gains.
The aim is to keep the modules, and therefore the CPU cores, fed with instructions and data.
Together with the refined design, this results in about 20% fewer branch mispredictions and about 30% fewer instruction cache misses.
The larger Branch Target Buffer (BTB) also contributes to this.
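As a rough illustration of why these two reductions matter, the Python sketch below estimates the front-end stall cycles per instruction before and after the improvement. Only the 20% and 30% figures come from the text above. The baseline event rates and the cycle penalties are invented placeholders, not measured Piledriver or Steamroller numbers.

```python
def stall_cycles_per_instruction(mispredicts_per_kilo_instr,
                                 icache_misses_per_kilo_instr,
                                 mispredict_penalty_cycles,
                                 icache_miss_penalty_cycles):
    """Average stall cycles per instruction caused by the two event types."""
    total_cycles_per_kilo_instr = (
        mispredicts_per_kilo_instr * mispredict_penalty_cycles
        + icache_misses_per_kilo_instr * icache_miss_penalty_cycles
    )
    return total_cycles_per_kilo_instr / 1000.0


# Hypothetical baseline: events per 1000 instructions and cycles lost per event.
BASE_MISPREDICTS = 10.0
BASE_ICACHE_MISSES = 20.0
MISPREDICT_PENALTY = 20
ICACHE_MISS_PENALTY = 12

before = stall_cycles_per_instruction(
    BASE_MISPREDICTS, BASE_ICACHE_MISSES,
    MISPREDICT_PENALTY, ICACHE_MISS_PENALTY)

# Apply the reductions quoted in the text: 20% and 30%.
after = stall_cycles_per_instruction(
    BASE_MISPREDICTS * 0.8, BASE_ICACHE_MISSES * 0.7,
    MISPREDICT_PENALTY, ICACHE_MISS_PENALTY)

print(f"stall cycles per instruction: before={before:.3f}, after={after:.3f}")
```

With these placeholder inputs, the stall cost per instruction falls by roughly a quarter. The real effect depends entirely on the workload and on the actual penalties, which the text does not give.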
Moreover, the Floating Point (FP) scheduler in these processors is shared between the two cores of a module. It comes with the following:
- Two 128-bit fused multiply-accumulate (FMAC) units
- One Multimedia Extension (MMX) unit, instead of the two found in Piledriver CPUs.
Some hardware is shared between the MMX unit and the 128-bit FMAC pipes. Because MMX, FMA, and FP operations are mutually exclusive, this sharing does not cause any performance penalty.
Effects of the changes:
According to AMD, these changes reclaim die space without seriously affecting performance. They also help the design respond to changing computing situations.
The effects of the Steamroller cores are especially significant when they are used in AMD APUs, where they enable additional power-saving features.
For example, the chips can adjust clock speeds and power usage much more dynamically according to the current workload.
This means that when the CPU sits mostly idle while the GPU carries most of the workload, for example when you are watching a movie, most of the power is assigned to the GPU rather than the CPU.
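The idea of shifting a shared power budget toward the busier unit can be sketched in Python as follows. This is a simplified illustration, not AMD's power-management algorithm: the function name, the proportional policy, the guaranteed minimum per unit, and the 65 W figure in the usage example are all assumptions.

```python
def split_power_budget(total_watts, cpu_load, gpu_load, floor_watts=5.0):
    """Split a shared budget between CPU and GPU in proportion to load.

    cpu_load and gpu_load are utilization values between 0.0 and 1.0.
    floor_watts is a hypothetical minimum that each unit always receives.
    Returns (cpu_watts, gpu_watts).
    """
    if total_watts < 2 * floor_watts:
        raise ValueError("budget is too small to cover both minimums")
    if not (0.0 <= cpu_load <= 1.0 and 0.0 <= gpu_load <= 1.0):
        raise ValueError("loads must be between 0.0 and 1.0")

    # Power left over after both units get their guaranteed minimum.
    spare = total_watts - 2 * floor_watts
    demand = cpu_load + gpu_load
    # With no demand at all, fall back to an even split.
    cpu_share = 0.5 if demand == 0 else cpu_load / demand

    cpu_watts = floor_watts + spare * cpu_share
    gpu_watts = total_watts - cpu_watts
    return cpu_watts, gpu_watts


# Movie playback: the CPU is mostly idle while the GPU does the work.
cpu_watts, gpu_watts = split_power_budget(65.0, cpu_load=0.1, gpu_load=0.9)
print(f"CPU: {cpu_watts:.1f} W, GPU: {gpu_watts:.1f} W")
```

In the movie playback case the GPU receives most of the budget, and swapping the two load values reverses the split. Real hardware would also weigh temperature, voltage, and frequency limits, which this sketch ignores.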
In essence, the Steamroller architecture is a tweaked version of the Piledriver architecture, made to offer higher power efficiency and processing capabilities.
In basic terms, it can be called Piledriver 2.0, offering the following benefits:
- Reduced latency
- Increased bandwidth
- Pipeline optimizations
- Improved instruction fetching
- Better inter-process communication
- Improved power efficiency
- A dynamically-sized Level 2 cache
Overall, the Steamroller processors come with a refined design and cores that reduce latency, increase bandwidth, optimize the pipeline, and improve power efficiency.
In simpler terms, the design is a slightly tweaked Piledriver architecture, which is why it can fairly be called Piledriver 2.0.