What is Sunny Cove Processor? (Explained)


What is Sunny Cove Processor?

The term Sunny Cove refers to the codename Intel gave to the processor microarchitecture built on its 10 nm FinFET technology node, which succeeded the Palm Cove microarchitecture.

Technically, this is a high-performance x86-64 core architecture that supports Dynamic Tuning 2.0 to sustain turbo frequencies for longer, hardware acceleration, new subsets of AVX instructions, rebalanced execution ports, and a larger scheduler with wider dispatch.

KEY TAKEAWAYS

  • Sunny Cove refers to the architecture of the Intel Core 10th generation and 3rd generation Xeon scalable server processors built on the Intel 10 nm FinFET process.
  • This microarchitecture was designed by the Intel design team in Haifa, Israel, and is the direct successor to the Palm Cove and Skylake microarchitectures.
  • The larger L2 cache, μOP cache, and second-level TLB, together with the wider core, increase the performance of the CPU.
  • The Sunny Cove design adds execution ports, raises L1 store bandwidth, and widens allocation, which further augments its capabilities.
  • The architecture supports 5-level paging, which enlarges the linear and physical address spaces and, in turn, the virtual memory and addressable physical memory available.
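
As a rough illustration of the last point, the sketch below compares address-space sizes under 4-level and 5-level paging. The bit widths used here (48-bit and 57-bit linear addresses, and an architectural limit of 52 physical address bits) are assumptions taken from Intel's public description of 5-level paging, not from this article, and the function name is made up for the example.

```python
# Address-space sizes implied by a given number of address bits.
# Assumption: 4-level paging uses 48-bit linear addresses, 5-level paging
# uses 57-bit linear addresses, and physical addresses can be up to 52 bits.

TIB = 1 << 40  # bytes in one tebibyte
PIB = 1 << 50  # bytes in one pebibyte


def address_space_bytes(bits):
    """Return the number of bytes addressable with `bits` address bits."""
    return 1 << bits


four_level_linear = address_space_bytes(48)
five_level_linear = address_space_bytes(57)
max_physical = address_space_bytes(52)

print("4-level linear space:", four_level_linear // TIB, "TiB")
print("5-level linear space:", five_level_linear // PIB, "PiB")
print("Physical space limit:", max_physical // PIB, "PiB")
print("Linear space growth: ", five_level_linear // four_level_linear, "x")
```

Under these assumptions, the nine extra linear address bits multiply the virtual address space by 512.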

Understanding Sunny Cove Processor


Sunny Cove is the microarchitecture designed and manufactured by Intel.

It was designed by Intel's Research and Development Center in Haifa, Israel, for the 10 nm FinFET process and was launched in September 2019.

According to Intel, the prime focus while designing the Sunny Cove microarchitecture was on a few specific aspects, such as:

  • Single-thread performance
  • Improved scalability
  • New instructions support
  • A deeper, wider, and smarter core

The design of the Sunny Cove allows using it in a wide range of client and server products, such as:

  • Lakefield
  • Ice Lake (client)
  • Ice Lake (server)
  • The Nervana NNP-I

The Sunny Cove microarchitecture is the direct successor to the mobile Palm Cove and server Skylake microarchitectures and the predecessor of the mobile Willow Cove and server Golden Cove microarchitectures.

This architecture is integrated into several Intel designs.

For example:

  • The 10th generation Intel Core mobile processors, codenamed Ice Lake, launched in September 2019, and
  • The 3rd generation of Xeon scalable server processors, codenamed Ice Lake-SP, launched on April 6, 2021.

There is, however, no desktop product featuring Sunny Cove.

For desktops, Intel offers Cypress Cove instead, which is a variant of the Sunny Cove microarchitecture backported to Intel's 14 nm technology node.

Pipeline

The Sunny Cove architecture supports a pipeline with the following features and capabilities:

Front-end and Back-end

Both the front-end and the back-end of the Sunny Cove architecture are improved in comparison to its predecessors.

As for the front-end, the following features are quite significant:

  • It comes with a 1.5x bigger µOP cache that holds up to 2,304 µOPs, up from 1,536.
  • It allows smart prefetch.
  • It has a much improved branch predictor.
  • The ITLB doubles the number of 2 MiB page entries, from 8 to 16.
  • The instruction decode queue (IDQ) is larger, at 70 µOPs, up from 64.
  • The loop stream detector (LSD) can detect loops of up to 70 µOPs, up from 64.

And, as for the back-end, the features are as follows:

  • It allows wider 6-way allocation, compared with 5-way and 4-way allocation in Skylake and Broadwell, respectively.
  • The delivery output is 6 µOPs.
  • It supports a wider decode width thanks to an additional simple decoder.

Instruction Set Architectures and Extension Support

The Sunny Cove architecture supports the x86 and x86-64 Instruction Set Architectures (ISAs) along with a wide array of their extensions, both new and old, such as:

  • MMX or MultiMedia eXtensions
  • SSE or Streaming SIMD Extensions, along with all its variants such as SSE2, SSE3, SSSE3, SSE4, SSE4.1, and SSE4.2
  • POPCNT or Population Count
  • AES or Advanced Encryption Standard
  • AES-NI or Advanced Encryption Standard New Instructions
  • CLMUL or Carry-less Multiplication
  • FSGSBASE instructions for FS and GS segment registers
  • RDRAND or Read Random
  • FMA3 or Fused Multiply Add
  • F16C or 16-bit floating point conversion instructions
  • BMI or Bit Manipulation Instructions along with BMI2
  • VT-x and VT-d or Virtualization extensions
  • TXT or Trusted Execution Technology
  • TSX or Transactional Synchronization Extensions
  • RDSEED or Read Random SEED
  • ADCX or Add-Carry Instruction Extensions
  • PREFETCHW or Prefetch Data into Caches in Anticipation of a Write
  • CLFLUSHOPT or Flush Cache Line Optimized
  • XSAVE or Save Processor Extended States
  • SGX or Software Guard Extensions
  • MPX or Memory Protection Extensions
  • SHA or Secure Hashing Algorithm

In addition to AVX (Advanced Vector Extensions) and AVX2, the Sunny Cove architecture also supports several additional AVX-512 extensions:

  • AVX512-VPOPCNTDQ or AVX-512 Vector Population Count Doubleword and Quadword
  • AVX512-VNNI or AVX-512 Vector Neural Network Instructions
  • AVX512-GFNI or AVX-512 Galois Field New Instructions
  • AVX512-VAES or AVX-512 Vector AES
  • AVX512-VBMI2 or AVX-512 Vector Bit Manipulation, Version 2
  • AVX512-BITALG or AVX-512 Bit Algorithms
  • AVX512-VPCLMULQDQ or AVX-512 Vector Carry-Less Multiply
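
On Linux, one way to check whether a machine reports these extensions is to look at the `flags` line of `/proc/cpuinfo`. The sketch below is only an illustration: the flag spellings are the names the Linux kernel is assumed to use for these features, and the helper names are hypothetical.

```python
# Check a /proc/cpuinfo-style text for the AVX-512 related flags listed above.
# Assumption: these are the Linux kernel's flag names for the extensions.
SUNNY_COVE_AVX512_FLAGS = {
    "avx512_vpopcntdq",
    "avx512_vnni",
    "gfni",
    "vaes",
    "avx512_vbmi2",
    "avx512_bitalg",
    "vpclmulqdq",
}


def parse_cpu_flags(cpuinfo_text):
    """Return the set of flags from the first 'flags' line, or an empty set."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags") and ":" in line:
            return set(line.split(":", 1)[1].split())
    return set()


def missing_flags(cpuinfo_text, wanted=frozenset(SUNNY_COVE_AVX512_FLAGS)):
    """Return the wanted flags that the CPU does not report."""
    return set(wanted) - parse_cpu_flags(cpuinfo_text)


# Example with a made-up cpuinfo excerpt (not real output from any machine).
sample = "processor\t: 0\nflags\t\t: fpu avx2 avx512_vnni gfni vaes\n"
print(sorted(missing_flags(sample)))
```

To inspect a real system, the same functions can be given the contents of `/proc/cpuinfo`, for example `missing_flags(open("/proc/cpuinfo").read())`.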

Apart from that, there are also a few other new instructions supported by the Sunny Cove architecture as follows:

  • CLWB or Force Cache Line Write-Back without flush
  • RDPID or Read Processor ID
  • SSE_GFNI or SSE-based Galois Field New Instructions
  • Split Lock Detection that detects and causes an exception for split locks
  • Fast Short REP MOV or Fast Short Repeat Move instruction

Only the server parts of Ice Lake support the following special features and instructions:

  • TME or Total Memory Encryption
  • PCONFIG or Platform Configuration
  • WBNOINVD or Write Back and Do Not Invalidate cache
  • ENCLV or Enclave instructions which are actually SGX oversubscription instructions

Memory Subsystem

The cache hierarchy of the Sunny Cove processors is divided into levels such as L1, L2, and L3, each with different features.

The L0 µOP cache is statically divided between the threads of each core and is inclusive with the L1 instruction cache. It comes with the following features:

  • 2,304 µOps
  • 8-way set associative
  • 48 sets
  • 6-µOP line size

The Level 1 instruction cache is shared by two threads per core and comes with the following features:

  • 32 KiB
  • 8-way set associative
  • 64 sets
  • 64 B line size

The Level 1 data cache is also shared by the two threads per core and comes with the following features:

  • 48 KiB
  • 12-way set associative
  • 64 sets
  • 64 B line size
  • 4-cycle fastest load-to-use latency, for simple pointer accesses
  • 5-cycle latency for complex addresses
  • Bandwidth of 2x 64 B/cycle for loads and 1x 64 B/cycle or 2x 32 B/cycle for stores
  • Write-back policy
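
To make the L1 data cache geometry more concrete, the sketch below shows how an address could be split into a line offset and a set index for a cache with 64 sets and 64 B lines. It assumes the textbook scheme in which the low address bits select the byte within a line and the next bits select the set; the article does not describe the exact hardware indexing, and the function names are hypothetical.

```python
# Textbook set-index calculation for a cache with 64 sets and 64-byte lines.
# Assumption: simple modulo indexing on the low address bits.
LINE_SIZE = 64  # bytes per cache line
NUM_SETS = 64   # sets in the L1 data cache


def l1d_line_offset(address):
    """Byte position of the address within its cache line."""
    return address % LINE_SIZE


def l1d_set_index(address):
    """Set that the cache line containing the address maps to."""
    return (address // LINE_SIZE) % NUM_SETS


addr = 0x1234
print("offset:", l1d_line_offset(addr), "set:", l1d_set_index(addr))

# Addresses that are 64 sets x 64 B = 4096 B apart land in the same set
# and compete for that set's 12 ways.
print(l1d_set_index(addr) == l1d_set_index(addr + NUM_SETS * LINE_SIZE))
```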

The Level 2 cache of the Sunny Cove processors comes with slightly different features on server and client parts.

As for the clients, the features are as follows:

  • Unified
  • 512 KiB
  • 8-way set associative
  • 1024 sets
  • 64 B line size

As for the servers, the features are as follows:

  • Unified
  • 1280 KiB
  • 20-way set associative
  • 1024 sets
  • 64 B line size
  • 13-cycle fastest load-to-use latency
  • Non-inclusive
  • 64 B/cycle bandwidth to L1
  • Write-back policy

And the Level 3 cache of the Sunny Cove processor comes with the following features:

  • 2 MiB per core
  • 16-way set associative
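
The figures in the lists above are internally consistent: for a set-associative cache, capacity equals sets × ways × line size. The following sketch checks this for the caches whose full geometry is given in this article. The class and variable names are made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheGeometry:
    name: str
    sets: int
    ways: int
    line_size: int  # bytes per line (µOPs per line for the µOP cache)

    def capacity(self):
        """Capacity = sets x ways x line size."""
        return self.sets * self.ways * self.line_size


KIB = 1024

# Geometry values copied from the lists in this article.
UOP_CACHE = CacheGeometry("L0 µOP cache", sets=48, ways=8, line_size=6)
L1I = CacheGeometry("L1 instruction", sets=64, ways=8, line_size=64)
L1D = CacheGeometry("L1 data", sets=64, ways=12, line_size=64)
L2_CLIENT = CacheGeometry("L2 (client)", sets=1024, ways=8, line_size=64)
L2_SERVER = CacheGeometry("L2 (server)", sets=1024, ways=20, line_size=64)

print(UOP_CACHE.name, UOP_CACHE.capacity(), "µOPs")
for cache in (L1I, L1D, L2_CLIENT, L2_SERVER):
    print(cache.name, cache.capacity() // KIB, "KiB")
```

The computed capacities match the stated sizes: 2,304 µOPs, 32 KiB, 48 KiB, 512 KiB, and 1280 KiB.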

Translation Lookaside Buffer

The Sunny Cove architecture also supports different Translation Lookaside Buffers or TLBs, which consist of the following:

  • A dedicated L1 data TLB, called DTLB and
  • A dedicated L1 instruction TLB, called ITLB

In addition, there is a unified L2 TLB, also called the Second-level or Shared Translation Lookaside Buffer (STLB).

The ITLB handles two classes of page translations differently.

As for the 4 KiB page translations, the ITLB allows the following:

  • 128 entries
  • 8-way set associative
  • Dynamic partitioning

As for the 2 MiB or 4 MiB page translations, the ITLB allows the following:

  • 16 entries per thread
  • Fully associative
  • Duplicated for every thread

The DTLB of the Sunny Cove architecture handles loads and stores differently.

For load operations, the DTLB handles three page sizes differently.

For 4 KiB page translations, the DTLB allows the following:

  • 64 entries
  • 4-way set associative
  • Competitively shared

For 2 MiB or 4 MiB page translations, the DTLB supports the following:

  • 32 entries
  • 4-way set associative
  • Competitively shared

And for 1 GiB page translations, it allows the following:

  • 8 entries
  • 8-way set associative
  • Competitively partitioned
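
One way to read the load DTLB figures above is in terms of "TLB reach", the amount of memory that can be mapped without a TLB miss, which is simply entries × page size. The sketch below works this out for each page size. It assumes 2 MiB pages for the combined 2 MiB or 4 MiB entry class, and the names are hypothetical.

```python
# TLB reach = number of entries x page size.
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB

# (entries, page size in bytes) for the load DTLB, from the lists above.
# Assumption: the 2 MiB / 4 MiB class is treated as 2 MiB pages here.
LOAD_DTLB = {
    "4 KiB": (64, 4 * KIB),
    "2 MiB": (32, 2 * MIB),
    "1 GiB": (8, 1 * GIB),
}


def tlb_reach(entries, page_size):
    """Bytes of memory covered when every entry maps a distinct page."""
    return entries * page_size


for label, (entries, page_size) in LOAD_DTLB.items():
    print(label, "pages:", tlb_reach(entries, page_size), "bytes")
```

With these numbers the reach is 256 KiB for 4 KiB pages, 64 MiB for 2 MiB pages, and 8 GiB for 1 GiB pages, which shows why larger pages reduce TLB pressure for big working sets.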

For store operations, the DTLB supports all pages and allows the following:

  • 16 entries
  • 16-way set associative
  • Competitively partitioned

The additional STLB is 16-way set associative with 2,048 entries for all page sizes. Its entries are partitioned as follows:

  • 4 KiB pages can use all 2,048 entries.
  • 2 MiB or 4 MiB pages can use 8 ways, or 1,024 entries, shared with 4 KiB pages.
  • 1 GiB pages can use 8 ways, or 1,024 entries, shared with 4 KiB pages.
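
The STLB numbers follow from its geometry: 2,048 entries across 16 ways gives 128 sets, so restricting a page size to 8 ways leaves 8 × 128 = 1,024 entries. A minimal sketch of that arithmetic, with hypothetical function names:

```python
# STLB geometry from this article: 2,048 entries, 16-way set associative.
STLB_ENTRIES = 2048
STLB_WAYS = 16


def stlb_sets(entries=STLB_ENTRIES, ways=STLB_WAYS):
    """Number of sets = entries / ways (must divide evenly)."""
    if entries % ways != 0:
        raise ValueError("entries must be a multiple of ways")
    return entries // ways


def entries_available(ways_usable):
    """Entries reachable by a page size that may occupy `ways_usable` ways."""
    return ways_usable * stlb_sets()


print("sets:", stlb_sets())
print("4 KiB pages (16 ways):", entries_available(16))
print("large pages (8 ways): ", entries_available(8))
```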

Scheduler Ports

The Sunny Cove processors come with an enlarged scheduler with two additional ports.

Both additional ports serve memory operations. The scheduler is also wider and can dispatch up to ten operations every cycle.

From the arithmetic perspective, the execution engine relies on four workhorse ports.

On the vector side, Sunny Cove retains the three FMAs and ALUs.

However, the addition of another shuffle unit on Port 1 is a significant architectural change that makes it easier to move data within registers.

Some scheduler ports are designated for multiple purposes, while a few serve a single purpose.

The designation of each scheduler port is detailed below.

Port 0 of the scheduler is designated for the following:

  • Integer or vector arithmetic
  • Multiplication
  • Shift
  • Logic
  • String ops
  • FP add and multiply
  • FMA
  • Integer or FP division and square root
  • AES
  • Branch2

Port 1 of the scheduler is designated for the following:

  • Integer or vector arithmetic
  • Multiplication
  • Shift
  • Logic
  • Bit scanning
  • FP add and multiply
  • FMA

Ports 2 and 3 are designated for load address generation (load AGUs).

Port 4 is designated for store data.

Port 5 of the scheduler is designated for the following:

  • Integer or vector arithmetic
  • Logic
  • Vector permute
  • x87
  • FP add
  • Composite Int
  • CLMUL

Port 6 is designated for the following:

  • Integer arithmetic
  • Logic
  • Shift
  • Branch

Ports 7 and 8 are designated for store address generation (store AGUs).

Port 9 is designated for store data.
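
The port assignments above can be captured in a small lookup table, which makes it easy to see which ports can accept a given kind of operation. This is only a sketch: the operation labels are simplified from the lists in this article (for example, "Integer or vector arithmetic" is split into two labels, and the unit listed as "Branch2" on Port 0 is recorded simply as "branch"), and the names are hypothetical.

```python
# Scheduler port assignments as listed in this article (Ports 0 to 9).
# Labels are simplified for illustration; they are not Intel's terminology.
PORTS = {
    0: {"integer arithmetic", "vector arithmetic", "multiplication", "shift",
        "logic", "string ops", "fp add", "fp multiply", "fma",
        "division", "square root", "aes", "branch"},
    1: {"integer arithmetic", "vector arithmetic", "multiplication", "shift",
        "logic", "bit scanning", "fp add", "fp multiply", "fma"},
    2: {"load agu"},
    3: {"load agu"},
    4: {"store data"},
    5: {"integer arithmetic", "vector arithmetic", "logic", "vector permute",
        "x87", "fp add", "composite int", "clmul"},
    6: {"integer arithmetic", "logic", "shift", "branch"},
    7: {"store agu"},
    8: {"store agu"},
    9: {"store data"},
}


def ports_for(operation):
    """Return the sorted list of ports that can accept the operation."""
    return sorted(port for port, ops in PORTS.items() if operation in ops)


for op in ("integer arithmetic", "fma", "shift", "load agu", "store data"):
    print(op, "->", ports_for(op))
```

With ten ports in the table, up to ten operations can be dispatched per cycle, one per port, which matches the dispatch width stated above.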

Execution Units

The Sunny Cove processors contain several types of execution units, each responsible for carrying out different instructions.

Here is a breakdown of the execution units, how many of each there are, and the instructions they carry out.

There are four ALUs that carry out the following instructions:

  • add
  • and
  • cmp
  • or
  • test
  • xor
  • movzx
  • movsx
  • mov
  • (v)movdqu
  • (v)movdqa
  • (v)movap
  • (v)movup

There are two SHFT units that carry out the following instructions and more:

  • sal
  • shl
  • rol
  • adc
  • sarx
  • adcx
  • adox

There is only one Slow Int unit that carries out the following instructions and more:

  • mul
  • imul
  • bsr
  • rcl
  • shld
  • mulx
  • pdep

There are two BM units that carry out the following instructions and more:

  • andn
  • bextr
  • blsi
  • blsmsk
  • bzhi

There are three Vector ALUs that carry out the following instructions:

  • (v)pand
  • (v)por
  • (v)pxor
  • (v)movq
  • (v)movap
  • (v)movup
  • (v)andp
  • (v)orp
  • (v)paddb
  • (v)paddw
  • (v)paddd
  • (v)paddq
  • (v)blendv
  • (v)blendp
  • (v)pblendd

There are two Vec_Shft units that carry out the following instructions:

  • (v)psllv
  • (v)psrlv
  • vector shift count in imm8

There are two Vector Add units that carry out the following instructions:

  • (v)addp
  • (v)cmpp
  • (v)max
  • (v)min
  • (v)padds
  • (v)paddus
  • (v)psign
  • (v)pabs
  • (v)pavgb
  • (v)pcmpeq
  • (v)pmax
  • (v)cvtps2dq
  • (v)cvtdq2ps
  • (v)cvtsd2si
  • (v)cvtss2si

There are two Shuffle units that carry out the following instructions:

  • (v)shufp
  • vperm
  • (v)pack
  • (v)unpck
  • (v)punpck
  • (v)pshuf
  • (v)pslldq
  • (v)alignr
  • (v)pmovzx
  • vbroadcast
  • (v)psrldq
  • (v)pblendw

There are two Vector Mul units that carry out the following instructions:

  • (v)mul
  • (v)pmul
  • (v)pmadd

There is one SIMD Misc unit that carries out the following instructions:

  • STTNI
  • (v)pclmulqdq
  • (v)psadbw
  • vector shift count in xmm

There is one FP Mov unit that carries out the following instructions:

  • (v)movsd/ss
  • (v)movd gpr

There is one DIVIDE unit that carries out the following instructions:

  • divp
  • divs
  • vdiv
  • sqrt
  • vsqrt
  • rcp
  • vrcp
  • rsqrt
  • idiv

Additional Comparative Improvements

With all these design aspects, the Sunny Cove processors offer better performance than their predecessors. A few other relative improvements in this architecture further add to its capabilities:

  • 18% increase on average in Instructions per Cycle (IPC)
  • 1.6x bigger Reorder Buffer or ROB to 352 entries from 224
  • Dynamic Tuning 2.0
  • Wider 5-wide decode, with four simple decoders and one complex decoder
  • 1.65x bigger scheduler with 160 entries in place of 97 entries
  • Larger 10-way dispatch in place of an 8-way
  • 1.55x bigger integer register file up to 280 entries in place of 180
  • 1.33x bigger vector register files with 224 entries in place of 168
  • Four distributed scheduling queues instead of two
  • Intel Deep Learning Boost for Machine Learning and Artificial Intelligence inference acceleration
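
The multipliers quoted in the list above can be checked directly from the before and after figures. The short sketch below recomputes them, using only numbers that appear in this article. The dictionary and function names are made up for the example.

```python
# (old, new) sizes for several structures, taken from the list above.
IMPROVEMENTS = {
    "reorder buffer entries": (224, 352),
    "scheduler entries": (97, 160),
    "dispatch width": (8, 10),
    "integer register file entries": (180, 280),
    "vector register file entries": (168, 224),
}


def growth_factor(old, new):
    """Ratio of the new size to the old size."""
    return new / old


for name, (old, new) in IMPROVEMENTS.items():
    print(f"{name}: {old} -> {new} ({growth_factor(old, new):.2f}x)")
```

For example, the reorder buffer grows by 352 / 224, or about 1.57x, which the list rounds to 1.6x.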

Sunny Cove vs Willow Cove

  • Sunny Cove is a year older than Willow Cove: the two were released in September 2019 and September 2020, respectively.
  • The predecessors of Sunny Cove are the mobile Palm Cove and server Skylake microarchitectures, while the predecessor of Willow Cove is Sunny Cove itself.
  • The successors to Sunny Cove are Willow Cove on mobile and Golden Cove on servers, while the successor to Willow Cove is Golden Cove.
  • The Level 2 cache of the Sunny Cove processor is 512 KiB, while the L2 cache of the Willow Cove processor is 1.25 MiB per core.
  • The Level 3 cache of the Sunny Cove processors is 2 MiB per core, while the Level 3 cache of the Willow Cove processor is 3 MiB per core.
  • The Sunny Cove processors are built on the Intel 10 nm FinFET manufacturing process, while the Willow Cove processors are built on the Intel 10 nm SuperFin (10SF) fabrication process.

Conclusion  

The Sunny Cove processor comes with significant improvements over its predecessors, Skylake and Palm Cove, and is integrated into a wide variety of Intel designs.

With a major uplift in Instructions per Cycle and a better front-end and back-end, its performance is increased significantly.

About Taylor

Taylor S. Irwin is a freelance technology writer with in-depth knowledge about computers. She has an understanding of hardware and technology gained through over 10 years of experience.
