What is Nehalem Processor? (Explained)

What is Nehalem Processor?

The term Nehalem, derived from the Nehalem River, is the codename of an Intel microarchitecture built on a 45 nm manufacturing process. Processors based on it can run at high clock speeds and are quite energy efficient.

Technically, the Nehalem architecture implements the Intel 64 instruction set, a CISC (Complex Instruction Set Computer) specification. It also uses high-k + metal gate transistor technology, which adds to its performance and speed.

KEY TAKEAWAYS

  • Launched in November 2008, the Nehalem microarchitecture was first used in the 1st generation Intel Core i7 processors and later in the Core i5.
  • Compared with the previous generation, these processors have a smaller L2 cache per core and a larger Level 3 cache that is shared among all of the cores.
  • The Nehalem architecture succeeds the Core microarchitecture (Penryn) and reintroduces the hyperthreading ability last seen in Netburst, along with some of Netburst's minor features.
  • The features of the architecture allow it to handle scientific and complex workloads with multiple cores, an on-chip memory controller, and a better connectivity and I/O subsystem.
  • The cores of the Nehalem processors are superscalar and support SMT as well as speculative and Out of Order execution pipelines, which deliver higher performance overall.

Understanding Nehalem Processor

The Nehalem microarchitecture of Intel is based on the 45 nm fabrication process and is used for server as well as desktop processors.

It is the successor to Penryn and improves on it in many respects.

This architecture is designed with a wide range of state-of-the-art technologies.

This offers a high computing rate and performance when handling scientific and other demanding workloads.

This is due to the components used in the architecture, which include but are not limited to the following:

  • Multiple cores
  • Enlarged and shared Level 3 cache
  • On-chip memory controller
  • High-speed Quick Path Interconnect ports and other connectivity options
  • A better I/O sub-system

The cores in these processors are superscalar and support features that add to the overall performance of the unit, such as:

  • Speculative and Out of Order execution pipelines
  • Two-way Simultaneous Multithreading or SMT
  • Several functional units
  • Higher instruction level parallelism rates

Making full use of these features also depends on software support, namely:

  • Program development tools
  • Compilers
  • Special coding techniques

Nehalem can be described as a shared-memory CC-NUMA, or Cache Coherent Non-Uniform Memory Access, processor system.

These are quite complex systems, and they require system-level engineers and application developers to write equally complex yet efficient code for progressively more complex platforms.

They also need to understand the bottlenecks that arise at the system level in order to tune and configure the system so that it yields good performance for any application mix.

General information

Here are a few general specs of the x86-64 architecture of Nehalem:

  • It was released by Intel on November 11, 2008.
  • The maximum CPU clock rate of these processors ranges between 1.06 GHz and 3.33 GHz, depending on the model.
  • The Level 1 cache of the processors is 64 KB per core, while the Level 2 cache size is 256 KB per core. The Level 3 cache, however, is shared, and its capacity ranges between 2 MB and 24 MB.
  • The processors come with between 2 and 6 cores for the regular CPUs and typically between 2 and 8 cores in the case of Xeon processors.
  • There are between 731 million and 2,300 million transistors on the 45 nm die of these processors.
  • Different variants of the processors come with different types of sockets such as LGA 1156, LGA 1366, LGA 1567, and µPGA 988.
  • The predecessor of the Nehalem processors is Penryn and their successors are Westmere and Sandy Bridge.
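
To put the cache figures in perspective, here is a minimal Python sketch that adds up the on-die cache of one chip and counts its cache lines. The per-core sizes come from the list above and the 64-byte line size is the figure quoted in the features section of this article. The 4-core, 8 MB configuration and the `cache_summary` helper are assumptions made up for illustration only.

```python
KB = 1024
MB = 1024 * KB

L1_PER_CORE = 64 * KB    # per-core Level 1 cache, as listed above
L2_PER_CORE = 256 * KB   # per-core Level 2 cache, as listed above
LINE_SIZE = 64           # bytes per cache line (figure quoted in this article)


def cache_summary(cores: int, l3_bytes: int) -> dict:
    """Return total on-die cache and cache-line counts for one chip."""
    private = cores * (L1_PER_CORE + L2_PER_CORE)
    return {
        "private_bytes": private,
        "total_bytes": private + l3_bytes,
        "l2_lines_per_core": L2_PER_CORE // LINE_SIZE,
        "l3_lines": l3_bytes // LINE_SIZE,
    }


# Hypothetical configuration: 4 cores sharing an 8 MB Level 3 cache.
summary = cache_summary(cores=4, l3_bytes=8 * MB)
print(summary)
```

Under these assumptions, the private caches add up to 1,280 KB, and each 256 KB L2 cache holds 4,096 lines.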

Variants

There are different variants of the Nehalem processors available for use as desktop, server, and mobile processors.

These CPUs are categorized on the basis of their cores, features, and brand names.

The different brand names are:

  • Intel Celeron
  • Core i5
  • Core i7
  • Intel Xeon

Based on the processing core type and their numbers, the Nehalem processors can also be categorized as follows:

  • Dual channel dual-core processors with PCI Express and graphics core
  • Dual channel quad-core processors with PCI Express
  • Triple channel quad-core processors
  • Quad-channel octa-core processors

Feature-wise, these processors can be categorized as follows:

The Celeron processors, codenamed Jasper Forest, are designed to be used in embedded desktop systems.

The Core i5 processors are codenamed Lynnfield and designed for performance desktop usage. These CPUs come with:

  • Four cores
  • Four threads
  • An LGA 1156 socket type

The Core i7 desktop processors, however, come in two different variants.

Those codenamed Lynnfield come with the following features:

  • Designed for performance desktops
  • Four cores
  • Eight threads
  • An LGA 1156 socket type

On the other hand, the ones codenamed Bloomfield come in two versions, regular Core i7 and Core i7 Extreme, with the following features:

  • Designed for enthusiast desktops
  • Four cores
  • Eight threads
  • An LGA 1366 socket type

Both the Core i7 and Core i7 Extreme brands are also used for performance and extreme mobile processors.

These CPUs are codenamed Clarksfield and come with the following features:

  • Four cores
  • Eight threads
  • A µPGA 988 socket type
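
The desktop and mobile variants above differ mainly in thread count and socket. As a rough sketch, the listing can be captured in a small Python table. The `Variant` class and its `hyper_threading` flag are illustrative assumptions, and the flag is inferred from the thread counts given above (four threads on four cores means no hyperthreading, eight threads means two-way SMT).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    brand: str
    codename: str
    cores: int
    hyper_threading: bool  # inferred from the thread counts in the article
    socket: str

    @property
    def threads(self) -> int:
        # Two-way SMT doubles the number of logical processors.
        return self.cores * (2 if self.hyper_threading else 1)


VARIANTS = [
    Variant("Core i5", "Lynnfield", 4, False, "LGA 1156"),
    Variant("Core i7", "Lynnfield", 4, True, "LGA 1156"),
    Variant("Core i7 / Core i7 Extreme", "Bloomfield", 4, True, "LGA 1366"),
    Variant("Core i7 / Core i7 Extreme (mobile)", "Clarksfield", 4, True, "µPGA 988"),
]

for v in VARIANTS:
    print(f"{v.brand:36} {v.codename:12} {v.cores} cores, {v.threads} threads, {v.socket}")
```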

The Nehalem-based Intel Xeon processors deserve specific mention, since processors with different codenames come with different features, as explained below.

Intel Xeon 3000-series

Lynnfield:

The quad-core Xeon 3000-series uniprocessors that are codenamed Lynnfield come with the following features:

  • Hyperthreading support except for the X3430 model
  • Die size – 296 mm²
  • Steppings – B1

They also support a wide range of instruction sets and extensions.

Bloomfield:

The dual-core and quad-core Xeon 3000-series uniprocessors that are codenamed Bloomfield come with the following features:

  • Hyperthreading and Turbo Boost support in the quad-core models
  • Die size – 263 mm²
  • Steppings – D0

They also support a wide range of instruction sets and extensions such as:

  • XD bit
  • x64
  • SpeedStep
  • Smart Cache
  • EPT
  • ECC

Jasper Forest:

The single, dual, and quad-core Xeon 3000-series uniprocessors that are codenamed Jasper Forest come with the following features:

  • Hyperthreading and Turbo Boost support for the LC3528 model
  • Die size – 263 mm²
  • Steppings – B0

They also support a wide range of instruction sets and extensions such as:

  • XD bit
  • x64
  • SpeedStep
  • Smart Cache
  • ECC
  • EPT

Intel Xeon 5000-series

Gainestown:

The dual-core and quad-core Xeon 5000-series dual processors that are codenamed Gainestown come with the following features:

  • Hyperthreading and Turbo Boost support except in E5502, E5503, E5504, E5506, L5506, and E5507 models
  • Dual-processor configurations support for all models
  • Die size – 263 mm²
  • Steppings – D0

They also support a wide range of instruction sets and extensions such as:

  • Demand-Based Switching or Intel’s Server EIST
  • Intel 64
  • XD bit
  • EPT
  • VT-c
  • Intel x8 SDDC

Jasper Forest:

The dual-core and quad-core Xeon 5000-series dual processors that are codenamed Jasper Forest come with the following features:

  • Hyperthreading and Turbo Boost support for EC5549, LC5528, and LC5518
  • Die size – 263 mm²
  • Steppings – B0

They also support a wide range of instruction sets and extensions such as:

  • Demand-Based Switching or Intel’s Server EIST
  • Intel 64
  • XD bit
  • EPT
  • VT-c
  • Intel x8 SDDC

Intel Xeon 7000-series

Beckton:

The quad-core, hexa-core, and octa-core Xeon 7000-series multiprocessors that are codenamed Beckton come with the following features:

  • Single and dual-processor configuration support for the 65xx models
  • Support for configurations of up to 8 processors for the 75xx models
  • Turbo Boost
  • Smart Cache
  • Hyperthreading support for all except X7542
  • Transistors – 2.3 billion
  • Die size – 684 mm²
  • Steppings – D0

They also support a wide range of instruction sets and extensions.

Performance and Power

The Nehalem architecture is designed to offer both power and performance benefits to its users.

Beyond the larger core, many improvements over its predecessors have been made for this purpose.

In comparison to its predecessor, Penryn, Nehalem provides:

  • 10 to 25% better single-threaded performance
  • 20 to 100% better multithreaded performance at the same power level
  • About 30% lower power consumption at the same level of performance
  • About 15 to 20% better clock-for-clock performance per core
  • Overclocking support with the X58 chipset and Bloomfield processors
  • Use of a Platform Controller Hub or PCH, eliminating the need for a Northbridge, with the Lynnfield processors
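
The power and performance figures above can be read as performance-per-watt gains. The short Python sketch below does that arithmetic. It treats Penryn as a baseline of 1.0 for both performance and power, and the `perf_per_watt_gain` helper is a name chosen here for illustration.

```python
def perf_per_watt_gain(perf_ratio: float, power_ratio: float) -> float:
    """Relative performance per watt versus a baseline of 1.0."""
    return perf_ratio / power_ratio


# Same power, 20% to 100% more multithreaded performance.
same_power_low = perf_per_watt_gain(1.20, 1.0)
same_power_high = perf_per_watt_gain(2.00, 1.0)

# Same performance at about 30% less power.
same_perf = perf_per_watt_gain(1.0, 0.70)

print(f"{same_power_low:.2f}x to {same_power_high:.2f}x at equal power")
print(f"{same_perf:.2f}x at equal performance")
```

In other words, a 30% power saving at equal performance works out to roughly 1.43 times the work per watt.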

Additional Instruction and Extensions Support

The Nehalem processors support x86 and x86-64 instructions along with some additional extensions, beyond those listed above for the different types of processors.

These include:

  • MMX or MultiMedia eXtensions, a Single Instruction, Multiple Data (SIMD) extension to the Instruction Set Architecture (ISA) designed by Intel, which speeds up integer operations on packed data in multimedia workloads such as audio and video processing.
  • POPCNT or Population Count, an instruction that counts the number of bits set to 1 in the source (second) operand and returns that count in the destination register (first operand).
  • SSE, or Streaming SIMD Extensions, along with its later versions SSE2, SSE3, SSSE3, and SSE4 (SSE4.1 and SSE4.2), where SSE4.2 adds seven new instructions to the SSE4.1 set found in the Core 2 series.
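
To make the POPCNT description concrete, here is a small Python sketch of what a population count computes. This is a software loop shown for illustration only, not the hardware instruction, and the `popcount` function name is an assumption of this example.

```python
def popcount(value: int) -> int:
    """Count the bits set to 1 in a non-negative integer."""
    if value < 0:
        raise ValueError("this sketch handles non-negative integers only")
    count = 0
    while value:
        value &= value - 1  # clear the lowest set bit
        count += 1
    return count


print(popcount(0b1011_0010))  # four bits are set in this value
```

The loop runs once per set bit, whereas POPCNT returns the same count in a single instruction.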

Apart from that, the processors also support the VT-x and VT-d extensions of 2nd generation Intel Virtualization Technology, which adds features such as:

  • Extended Page Table support
  • Non-maskable interrupt window exiting
  • Virtual Processor Identifiers or VPIDs
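
As a rough illustration of what Extended Page Table support means, the Python sketch below models the two translation stages with flat dictionaries: the guest's own page table maps guest virtual pages to guest physical pages, and the extended page table maps guest physical pages to host physical pages. Real hardware walks multi-level tables, and every page number here is a made-up value for the example.

```python
PAGE_SIZE = 4096  # 4 KB pages

# Hypothetical mappings, keyed by page number.
guest_page_table = {0x10: 0x2A}      # guest virtual page -> guest physical page
extended_page_table = {0x2A: 0x7F3}  # guest physical page -> host physical page


def translate(guest_virtual: int) -> int:
    """Translate a guest virtual address to a host physical address."""
    page, offset = divmod(guest_virtual, PAGE_SIZE)
    guest_physical_page = guest_page_table[page]                   # stage 1: guest tables
    host_physical_page = extended_page_table[guest_physical_page]  # stage 2: EPT
    return host_physical_page * PAGE_SIZE + offset


print(hex(translate(0x10ABC)))
```

The offset within the page is carried through unchanged, and only the page number is remapped at each stage.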

Nehalem Processor Features

Some of the notable features included in the Nehalem architecture design are as follows:

  • Reduced cache line block on L2 and L3 cache to 64 bytes from 128 bytes in the previous generations
  • Hyper-threading, Intel's implementation of SMT
  • Intel Turbo Boost 1.0
  • Instruction Fetch Unit (IFU) with second-level branch predictor
  • Two level Return Stack Buffer (RSB) and Branch Target Buffer (BTB)
  • Support for all predictor types used in Intel processors such as Loop Detector and Indirect Predictor
  • Second level unified, 4-way associative Translation Lookaside Buffer or sTLB, holding 512 entries for both instructions and data, for small 4 KB pages only
  • First level data TLB (DTLB) with 64 entries for 4 KB pages and 32 entries for 2 MB pages
  • First level instruction TLB (ITLB) with 128 entries for 4 KB pages and only 7 entries per logical core for 2 MB pages
  • Three integer Arithmetic Logic Units (ALUs)
  • Two vector ALUs
  • Two Address Generation Units (AGUs) per core
  • All cores native, that is, on a single die
  • Intel Quick Path Interconnect (QPI) in high-end models in place of the legacy Front Side Bus (FSB)
  • Peripheral Component Interconnect Express or PCIe and Direct Media Interface or DMI integrated into the mid-range processor models in place of the Northbridge
  • A large number of pipeline stages ranging from 20 to 24
  • Macro-op fusion in 64-bit mode
  • A wide range of instruction extensions
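
The TLB entry counts in the list above translate into a "reach", that is, the amount of memory each TLB can map without a miss. The Python sketch below does that multiplication. The entry counts and page sizes are taken from the list, while the dictionary layout and the `reach` helper are choices made for this example.

```python
KB = 1024
MB = 1024 * KB

# (entries, page size in bytes), as given in the feature list.
TLBS = {
    "sTLB, 4 KB pages": (512, 4 * KB),
    "DTLB, 4 KB pages": (64, 4 * KB),
    "DTLB, 2 MB pages": (32, 2 * MB),
    "ITLB, 4 KB pages": (128, 4 * KB),
    "ITLB, 2 MB pages (per logical core)": (7, 2 * MB),
}


def reach(entries: int, page_size: int) -> int:
    """Memory covered when every entry maps a distinct page."""
    return entries * page_size


REACH = {name: reach(*spec) for name, spec in TLBS.items()}

for name, size in REACH.items():
    print(f"{name}: {size // KB} KB")
```

For instance, the 512-entry sTLB covers 2 MB of 4 KB pages, while the 32 large-page DTLB entries cover 64 MB.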

As said earlier, the Nehalem architecture also comes with an integrated memory controller.

This controller supports two or three Double Data Rate Synchronous Dynamic Random Access Memory (DDR3 SDRAM) channels or four Fully Buffered Dual Inline Memory Module (FB-DIMM2) channels.
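
To see why the number of memory channels matters, here is a minimal Python sketch of theoretical peak bandwidth: channels, times transfers per second, times bytes per transfer. The 64-bit channel width is standard for DDR3, but the 1066 MT/s transfer rate is an assumption picked for illustration, since this article does not state the supported memory speeds.

```python
BYTES_PER_TRANSFER = 8  # a DDR3 channel is 64 bits wide


def peak_bandwidth_gb_s(channels: int, megatransfers_per_s: float) -> float:
    """Theoretical peak bandwidth in GB/s (1 GB = 10**9 bytes)."""
    return channels * megatransfers_per_s * 1e6 * BYTES_PER_TRANSFER / 1e9


# Assumed transfer rate of 1066 MT/s, for illustration only.
for channels in (2, 3):
    print(channels, "channels:", round(peak_bandwidth_gb_s(channels, 1066), 1), "GB/s")
```

These are upper bounds on paper, and sustained bandwidth in real workloads is lower.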

Also, the Nehalem architecture halves the latency of atomic operations, which reduces the overhead of instructions such as the compare-and-swap instruction LOCK CMPXCHG.
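
For readers unfamiliar with compare-and-swap, the Python sketch below models its semantics: a value is replaced only if it still equals what the caller last read, and the caller retries otherwise. This is a software model that uses a lock to stand in for the atomicity the hardware provides. It does not execute LOCK CMPXCHG, and the `AtomicInt` class is a hypothetical helper written for this example.

```python
import threading


class AtomicInt:
    """Software model of a word updated with compare-and-swap."""

    def __init__(self, value: int = 0) -> None:
        self._value = value
        self._lock = threading.Lock()  # stands in for hardware atomicity

    def load(self) -> int:
        with self._lock:
            return self._value

    def compare_and_swap(self, expected: int, new: int) -> bool:
        """Store `new` only if the current value equals `expected`."""
        with self._lock:
            if self._value != expected:
                return False
            self._value = new
            return True


def increment(counter: AtomicInt, times: int) -> None:
    for _ in range(times):
        while True:  # retry until this thread's swap succeeds
            current = counter.load()
            if counter.compare_and_swap(current, current + 1):
                break


counter = AtomicInt()
workers = [threading.Thread(target=increment, args=(counter, 1000)) for _ in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(counter.load())
```

Because every successful swap adds exactly one, four threads doing 1,000 increments each leave the counter at 4,000. Retry loops like this are common in lock-free code, which is why lower atomic latency helps.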

Conclusion

The Nehalem architecture, built on a 45 nm manufacturing process, is quite competent at handling complex computing workloads and offering high performance.

This is due to its efficient design, use of high-k + metal gate transistor technology, on-chip memory controller, larger L3 cache, and more.

About Taylor Swift

Taylor Swift, a UOPEOPLE graduate, is a freelance technology writer with in-depth knowledge about computers. She loves to play video games and watch movies when she has no writing assignments.