What is Sunny Cove Processor? (Explained)

January 8, 2023

What is Sunny Cove Processor?

Sunny Cove is Intel's codename for the processor microarchitecture built on the Intel 10 nm FinFET technology node that succeeded the Palm Cove microarchitecture.

Technically, this is a high-performance x86-64 core architecture that supports Dynamic Tuning 2.0 to sustain turbo frequencies for longer, hardware acceleration, new subsets of AVX instructions, rebalanced execution ports, and a larger scheduler with wider dispatch.

KEY TAKEAWAYS

Sunny Cove refers to the architecture of the 10th generation Intel Core and 3rd generation Xeon Scalable server processors built on the Intel 10 nm FinFET process.
This microarchitecture was designed by the Intel design team in Haifa, Israel, and is the direct successor to the Palm Cove and Skylake processors.
The larger size of the L2 cache, μOP cache, second-level TLB, and core width of the Sunny Cove processor architecture increases the performance ability of the CPU.
There is a significant increase in the execution ports, L1 store, and allocation width in the Sunny Cove architecture design, which further augments its capabilities.
The architecture supports a 5-level paging scheme. This enlarges the linear and physical address spaces, and with them the virtual memory and the addressable physical memory available to software.
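
To put the 5-level paging takeaway into numbers, the following sketch computes the address-space sizes for 4-level and 5-level paging. The bit widths are not stated in this article: 48 and 57 bits for linear addresses and 52 bits for physical addresses follow Intel's published description of 5-level paging, while the 46-bit physical width for earlier parts is an assumption about typical implementations. The names `PAGING_MODES`, `human_size`, and `address_space` are illustrative only.

```python
# Address-space sizes implied by 4-level and 5-level paging on x86-64.
# Bit widths come from Intel's 5-level paging description, not from this
# article; the 46-bit physical width is an assumed pre-5-level limit.

PAGING_MODES = {
    "4-level": {"linear_bits": 48, "physical_bits": 46},
    "5-level": {"linear_bits": 57, "physical_bits": 52},
}

UNITS = ["B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"]


def human_size(num_bytes):
    """Format a power-of-two byte count using binary units."""
    index = 0
    while num_bytes >= 1024 and index < len(UNITS) - 1:
        num_bytes //= 1024
        index += 1
    return f"{num_bytes} {UNITS[index]}"


def address_space(bits):
    """Return the size of an address space that is `bits` wide."""
    return human_size(1 << bits)


if __name__ == "__main__":
    for mode, widths in PAGING_MODES.items():
        linear = address_space(widths["linear_bits"])
        physical = address_space(widths["physical_bits"])
        print(f"{mode}: linear {linear}, physical {physical}")
```

Under these assumptions, the extra paging level adds nine linear-address bits, which multiplies the virtual address space by 512.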

Understanding Sunny Cove Processor

Sunny Cove is a microarchitecture designed and manufactured by Intel.

It was designed by Intel's Research and Development Center in Haifa, Israel, for the 10 nm FinFET process and was launched in September 2019.

According to Intel, the prime focus while designing the Sunny Cove microarchitecture was on a few specific aspects, such as:

Single-thread performance
Improved scalability
New instructions support
Deeper, wider, and smarter core which improves its performance.

The Sunny Cove design is used in a wide range of client and server products, such as:

Lakefield
Ice Lake (client)
Ice Lake (server)
The Nervana NNP-I

The Sunny Cove microarchitecture is the direct successor to the mobile Palm Cove and server Skylake processors and the predecessor of the mobile Willow Cove and server Golden Cove processors.

Typically, this architecture is integrated in different Intel designs.

For example:

The 10th generation Intel Core mobile processors, codenamed Ice Lake, launched in September 2019 and
The 3rd generation of Xeon Scalable server processors, codenamed Ice Lake-SP, launched on April 6, 2021.

There is, however, no product featuring Sunny Cove for desktop computers.

For desktops, you will need Cypress Cove processors instead. Cypress Cove is a variant of the Sunny Cove microarchitecture, backported to Intel's 14 nm technology node.

Pipeline

The Sunny Cove architecture supports a pipeline with the following features and capabilities:

Minimum of 14 stages
Maximum of 19 stages
Out-of-Order Execution or OoOE
Speculative execution
Register renaming

Front-end and Back-end

The front-end of the Sunny Cove processor architecture is also quite improved in comparison to its predecessors, just like the back end.

As for the front-end, the following features are quite significant:

It comes with a 1.5x larger µOP cache that holds 2,304 entries (about 2.3K) instead of 1,536.
It allows smart prefetch.
It has a much improved branch predictor.
The ITLB has twice as many 2 MiB page entries: 16 instead of 8.
The IDQ is larger, holding 70 µOPs instead of 64.
The LSD can detect loops of up to 70 µOPs instead of 64.

And, as for the back-end, the features are as follows:

It allows a wider 6-way allocation, compared with the 5-way and 4-way allocation in Skylake and Broadwell, respectively.
It delivers up to 6 µOPs per cycle.
It supports a wider decoding width with the additional simple decoder.

Instruction Set Architectures and Extension Support

The Sunny Cove architecture supports the x86 and x86-64 Instruction Set Architectures (ISAs) along with a wide array of their extensions, both new and old, such as:

MMX or MultiMedia eXtensions
SSE or Streaming SIMD Extensions, along with all its variants such as SSE2, SSE3, SSSE3, SSE4, SSE4.1, and SSE4.2
POPCNT or Population Count
AES or Advanced Encryption Standard
AES-NI or Advanced Encryption Standard New Instructions
CLMUL or Carry-less Multiplication
FSGSBASE instructions for FS and GS segment registers
RDRAND or Read Random
FMA3 or Fused Multiply Add
F16C or 16-bit floating point conversion instructions
BMI or Bit Manipulation Instructions along with BMI2
VT-x and VT-d or Virtualization extensions
TXT or Trusted Execution Technology
TSX or Transactional Synchronization Extensions
RDSEED or Read Random SEED
ADX or Multi-Precision Add-Carry Instruction Extensions (the ADCX and ADOX instructions)
PREFETCHW or Prefetch Data into Caches in Anticipation of a Write
CLFLUSHOPT or Flush Cache Line Optimized
XSAVE or Save Processor Extended States
SGX or Software Guard Extensions
MPX or Memory Protection Extensions
SHA or Secure Hash Algorithm extensions

In addition to the AVX, or Advanced Vector Extensions, along with the AVX2 instructions, the Sunny Cove architecture also supports quite a few additional AVX-512 extensions as follows:

AVX512-VPOPCNTDQ or AVX-512 Vector Population Count Doubleword and Quadword
AVX512-VNNI or AVX-512 Vector Neural Network Instructions
AVX512-GFNI or AVX-512 Galois Field New Instructions
AVX512-VAES or AVX-512 Vector AES
AVX512-VBMI2 or AVX-512 Vector Bit Manipulation, Version 2
AVX512-BITALG or AVX-512 Bit Algorithms
AVX512-VPCLMULQDQ or AVX-512 Vector Carry-Less Multiply
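
On a Linux machine, the AVX-512 additions listed above can be looked up among the feature flags the kernel reports in /proc/cpuinfo. The sketch below is a minimal illustration that assumes a Linux system and the kernel's flag spellings (for example `avx512_vnni` and `vpclmulqdq`). The helper names `parse_cpu_flags` and `missing_flags` are hypothetical and not part of any Intel tool.

```python
# Linux-only sketch: look for Sunny Cove's AVX-512 additions in
# /proc/cpuinfo-style text. Flag spellings follow the Linux kernel and are
# an assumption about the host system, not something this article defines.

SUNNY_COVE_FLAGS = {
    "avx512_vpopcntdq",
    "avx512_vnni",
    "gfni",
    "vaes",
    "avx512_vbmi2",
    "avx512_bitalg",
    "vpclmulqdq",
}


def parse_cpu_flags(cpuinfo_text):
    """Return the set of flags from the first 'flags' line of the text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def missing_flags(cpuinfo_text, wanted=SUNNY_COVE_FLAGS):
    """Return the wanted flags that the CPU does not report, sorted."""
    return sorted(set(wanted) - parse_cpu_flags(cpuinfo_text))


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo", encoding="utf-8") as handle:
            text = handle.read()
    except OSError:
        text = ""  # not Linux, or the file is unreadable
    absent = missing_flags(text)
    if absent:
        print("Not reported:", ", ".join(absent))
    else:
        print("All listed AVX-512 additions are reported.")
```

An empty result from `missing_flags` only means the kernel reports every listed flag. It does not prove the core is Sunny Cove, because later microarchitectures expose the same extensions.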

Apart from that, there are also a few other new instructions supported by the Sunny Cove architecture as follows:

CLWB or Force Cache Line Write-Back without flush
RDPID or Read Processor ID
SSE_GFNI or SSE-based Galois Field New Instructions
Split Lock Detection that detects and causes an exception for split locks
Fast Short REP MOV or Fast Short Repeat Move instruction

And only the server parts of the Ice Lake variant support special features and instructions such as:

TME or Total Memory Encryption
PCONFIG or Platform Configuration
WBNOINVD or Write Back and Do Not Invalidate cache
ENCLV or Enclave instructions which are actually SGX oversubscription instructions

Memory Subsystem

The cache hierarchy of the Sunny Cove processors is divided into levels such as L1, L2, and L3, each with different features.

The L0 µOP cache is divided statically in each core between the threads and is inclusive with the L1 instruction cache. It comes with the following features:

2,304 µOps
8-way set associative
48 sets
6-µOP line size

The Level 1 instruction cache is shared by two threads per core and comes with the following features:

32 KiB
8-way set associative
64 sets
64 B line size

The Level 1 data cache is also shared by the two threads per core and comes with the following features:

48 KiB
12-way set associative
64 sets
64 B line size
4 cycles for the fastest load-to-use, which applies to simple pointer accesses
5 cycles for complex addresses
Bandwidth of 2x 64 B/cycle load and 1x 64 B/cycle store or 2x 32 B/cycle store
Write-back policy

The Level 2 cache of the Sunny Cove processors comes with slightly different features on client and server parts.

As for the clients, the features are as follows:

Unified
512 KiB
8-way set associative
1024 sets
64 B line size

As for the servers, the features are as follows:

Unified
1280 KiB
20-way set associative
1024 sets
64 B line size
13 cycles for the fastest load-to-use
Non-inclusive
64 B/cycle bandwidth to L1
Write-back policy

And the Level 3 cache of the Sunny Cove processor comes with the following features:

2 MiB per core
16-way set associative
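
The cache figures listed in this section are linked by a simple relation: capacity equals sets times ways times line size. As a quick cross-check, the sketch below recomputes each capacity from the geometry quoted above. The `CacheGeometry` class and the `CACHES` list are illustrative names, and the L3 cache is left out because the article gives no set count for it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheGeometry:
    """Set-associative cache described by sets, ways, and line size."""
    name: str
    sets: int
    ways: int
    line_bytes: int

    @property
    def size_bytes(self):
        # Capacity of a set-associative cache: sets x ways x line size.
        return self.sets * self.ways * self.line_bytes

    @property
    def size_kib(self):
        return self.size_bytes // 1024


# Geometry values are the ones quoted in this article.
CACHES = [
    CacheGeometry("L1 instruction", sets=64, ways=8, line_bytes=64),
    CacheGeometry("L1 data", sets=64, ways=12, line_bytes=64),
    CacheGeometry("L2 (client)", sets=1024, ways=8, line_bytes=64),
    CacheGeometry("L2 (server)", sets=1024, ways=20, line_bytes=64),
]


def uop_cache_entries(sets=48, ways=8, uops_per_line=6):
    """The µOP cache is counted in µOPs rather than bytes."""
    return sets * ways * uops_per_line


if __name__ == "__main__":
    for cache in CACHES:
        print(f"{cache.name}: {cache.size_kib} KiB")
    print(f"µOP cache: {uop_cache_entries()} µOPs")
```

The computed values match the stated capacities: 32 KiB, 48 KiB, 512 KiB, 1280 KiB, and 2,304 µOPs.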

Translation Lookaside Buffer

The Sunny Cove architecture also supports different Translation Lookaside Buffers or TLBs, which consist of the following:

A dedicated L1 data TLB, called DTLB and
A dedicated L1 instruction TLB, called ITLB

In addition, there is a unified L2 TLB, also known as the Second-level or Shared Translation Lookaside Buffer (STLB).

The ITLB supports two types of page translations in different ways.

As for the 4 KiB page translations, the ITLB allows the following:

128 entries
8-way set associative
Dynamic partitioning

As for the 2 MiB or 4 MiB page translations, the ITLB allows the following:

16 entries per thread
Fully associative
Duplicated for every thread

The DTLB of the Sunny Cove architecture supports loading and storing in different ways.

For the load operation, the DTLB allows three different types of page translation in different ways.

For example, for 4 KiB page translations, the DTLB allows the following:

64 entries
4-way set associative
Competitively shared

For 2 MiB or 4 MiB page translations, the DTLB supports the following:

32 entries
4-way set associative
Competitively shared

And for 1 GiB page translations, it allows the following:

8 entries
8-way set associative
Competitively partitioned

For store operations, the DTLB supports all pages and allows the following:

16 entries
16-way set associative
Competitively partitioned

As for the additional STLB, it is 16-way set associative and has 2,048 entries that serve all page sizes. The entries are partitioned as follows:

The 4 KiB pages can use all 2,048 entries
The 2 MiB or 4 MiB pages can use 8 of the 16 ways, or 1,024 entries, shared with 4 KiB pages and
The 1 GiB pages can use 8 of the 16 ways, or 1,024 entries, shared with 4 KiB pages.
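
The STLB figures above can be turned into a set count and a "reach", meaning the amount of memory the TLB can map without a page walk. The sketch below assumes the 2,048-entry, 16-way organization and the 8-way partitions described in this section. The function names are illustrative.

```python
# STLB arithmetic based on the figures quoted in this article.
STLB_ENTRIES = 2048
STLB_WAYS = 16

KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB
TIB = 1024 * GIB


def stlb_sets():
    """Number of sets: total entries divided by associativity."""
    return STLB_ENTRIES // STLB_WAYS


def entries_for_ways(ways):
    """Entries available to a page size restricted to `ways` ways per set."""
    return stlb_sets() * ways


def reach_bytes(entries, page_bytes):
    """Memory covered when every entry maps one page of the given size."""
    return entries * page_bytes


if __name__ == "__main__":
    print("sets:", stlb_sets())
    print("4 KiB reach (MiB):", reach_bytes(STLB_ENTRIES, 4 * KIB) // MIB)
    print("2 MiB reach (GiB):", reach_bytes(entries_for_ways(8), 2 * MIB) // GIB)
    print("1 GiB reach (TiB):", reach_bytes(entries_for_ways(8), GIB) // TIB)
```

These are upper bounds, since the larger page sizes share their entries with 4 KiB pages.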

Scheduler Ports

The Sunny Cove processors come with an enlarged scheduler with two additional ports.

Both additional ports serve memory operations. With them, the scheduler is wider and can dispatch up to ten operations in every cycle.

If you consider the arithmetic perspective of the execution engine, the functionality is boosted by the four workhorse ports.

If you consider the vector point of view, the performance of the Sunny Cove is retained by the three FMAs and ALUs.

However, the addition of another shuffle unit on Port 1 is a significant change in its architecture which allows easy moving of data within the register.

Some of the ports of the scheduler may be designated for multiple purposes, while a few may serve a single purpose.

Here is the detail of the designation of the different scheduler ports.

The Port 0 of the scheduler is designated for the following:

Integer or vector arithmetic
Multiplication
Shift
Logic
String ops
FP add and multiply
FMA
Integer or FP division and square root
AES
Branch2

The scheduler Port 1 is designated for the following:

Integer or vector arithmetic
Multiplication
Shift
Logic
Bit scanning
FP add and multiply
FMA

Port 2 and Port 3 are each designated for a load AGU (address generation unit).

Port 4 is designated for store data.

The Port 5 of the scheduler is designated for the following:

Integer or vector arithmetic
Logic
Vector permute
x87
FP add
Composite Int
CLMUL

The Port 6 is designated for the following:

Integer arithmetic
Logic
Shift
Branch

Port 7 and Port 8 are each designated for a store AGU.

Port 9 is designated for store data.
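
The port assignments above can be captured in a small lookup table. The sketch below simply transcribes this section's lists into a dictionary and answers the question "which ports can handle this kind of operation?". The table and the `ports_for` helper are illustrative and are not an Intel-defined interface.

```python
# Scheduler port assignments, transcribed from the lists in this article.
PORTS = {
    0: ["Integer or vector arithmetic", "Multiplication", "Shift", "Logic",
        "String ops", "FP add and multiply", "FMA",
        "Integer or FP division and square root", "AES", "Branch2"],
    1: ["Integer or vector arithmetic", "Multiplication", "Shift", "Logic",
        "Bit scanning", "FP add and multiply", "FMA"],
    2: ["Load AGU"],
    3: ["Load AGU"],
    4: ["Store data"],
    5: ["Integer or vector arithmetic", "Logic", "Vector permute", "x87",
        "FP add", "Composite Int", "CLMUL"],
    6: ["Integer arithmetic", "Logic", "Shift", "Branch"],
    7: ["Store AGU"],
    8: ["Store AGU"],
    9: ["Store data"],
}


def ports_for(function):
    """Return the sorted port numbers that list the given function."""
    return sorted(port for port, funcs in PORTS.items() if function in funcs)


if __name__ == "__main__":
    print("Ports in total:", len(PORTS))
    for function in ("FMA", "Load AGU", "Store AGU", "Store data"):
        print(f"{function}: ports {ports_for(function)}")
```

Ten ports in the table correspond to the ten operations the scheduler can dispatch per cycle, one per port.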

Execution Units

There are several different types of execution units in the Sunny Cove processors, and each of them is vested with the responsibility to carry out different instructions.

Here is the breakdown of the execution units with their numbers and the types of instructions carried out.

There are four ALUs that carry out the following instructions:

add
and
cmp
or
test
xor
movzx
movsx
mov
(v)movdqu
(v)movdqa
(v)movap
(v)movup

There are two SHFT units that carry out the following instructions and more:

sal
shl
rol
adc
sarx
adcx
adox

There is only one Slow Int unit that carries out the following instructions and more:

mul
imul
bsr
rcl
shld
mulx
pdep

There are two BM units that carry out the following instructions and more:

andn
bextr
blsi
blsmsk
bzhi

There are three Vector ALUs that carry out the following instructions:

(v)pand
(v)por
(v)pxor
(v)movq
(v)movap
(v)movup
(v)andp
(v)orp
(v)paddb
(v)paddw
(v)paddd
(v)paddq
(v)blendv
(v)blendp
(v)pblendd

There are two Vec_Shft units that carry out the following instructions:

(v)psllv
(v)psrlv
vector shift count in imm8

There are two Vector Add units that carry out the following instructions:

(v)addp
(v)cmpp
(v)max
(v)min
(v)padds
(v)paddus
(v)psign
(v)pabs
(v)pavgb
(v)pcmpeq
(v)pmax
(v)cvtps2dq
(v)cvtdq2ps
(v)cvtsd2si
(v)cvtss2si

There are two Shuffle units that carry out the following instructions:

(v)shufp
vperm
(v)pack
(v)unpck
(v)punpck
(v)pshuf
(v)pslldq
(v)alignr
(v)pmovzx
vbroadcast
(v)psrldq
(v)pblendw

There are two Vector Mul units that carry out the following instructions:

(v)mul
(v)pmul
(v)pmadd

There is one SIMD Misc unit that carries out the following instructions:

STTNI
(v)pclmulqdq
(v)psadbw
vector shift count in xmm

There is one FP Mov unit that carries out the following instructions:

(v)movsd/ss
(v)movd gpr

There is one DIVIDE unit that carries out the following instructions:

divp
divs
vdiv
sqrt
vsqrt
rcp
vrcp
rsqrt
idiv

Additional Comparative Improvements

So, with all these design aspects, the Sunny Cove processors offer better performance in comparison to their predecessors. There are also a few other relative improvements in this architecture that further add to its capabilities. These improvements are as follows:

18% increase on average in Instructions per Cycle or IPC
1.6x bigger Reorder Buffer or ROB to 352 entries from 224
Dynamic Tuning 2.0
Wider 5-wide decoder made up of 4 simple decoders and 1 complex decoder
1.65x bigger scheduler with 160 entries in place of 97 entries
Larger 10-way dispatch in place of an 8-way
1.55x bigger integer register file up to 280 entries in place of 180
1.33x bigger vector register files with 224 entries in place of 168
Four distributed scheduling queues instead of two
Intel Deep Learning Boost for Machine Learning and Artificial Intelligence inference acceleration
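
The growth factors quoted in this article follow directly from the entry counts. The sketch below recomputes each ratio and checks that it agrees with the quoted multiplier within a small tolerance, since the article's figures are rounded. The `STRUCTURES` table and the 0.05 tolerance are choices made for this illustration.

```python
# (old entries, new entries, multiplier quoted in the article)
STRUCTURES = {
    "µOP cache": (1536, 2304, 1.5),
    "Reorder buffer": (224, 352, 1.6),
    "Scheduler": (97, 160, 1.65),
    "Integer register file": (180, 280, 1.55),
    "Vector register file": (168, 224, 1.33),
}


def growth(old, new):
    """Ratio of the new entry count to the old one."""
    return new / old


def matches_claim(old, new, claimed, tolerance=0.05):
    """True when the computed ratio is within `tolerance` of the claim."""
    return abs(growth(old, new) - claimed) <= tolerance


if __name__ == "__main__":
    for name, (old, new, claimed) in STRUCTURES.items():
        ratio = growth(old, new)
        verdict = "ok" if matches_claim(old, new, claimed) else "mismatch"
        print(f"{name}: {old} -> {new} = {ratio:.3f}x "
              f"(quoted {claimed}x, {verdict})")
```

All five quoted multipliers are consistent with their entry counts under this tolerance.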

Sunny Cove vs Willow Cove

Sunny Cove is a year older than Willow Cove: the two were released in September 2019 and September 2020, respectively.
The predecessors of the Sunny Cove processors are the Palm Cove mobile processors and the Skylake server processors. On the other hand, the predecessor of the Willow Cove processor is the Sunny Cove processor itself.
The successors to the Sunny Cove processors are the Willow Cove mobile processors and the Golden Cove server processors. On the other hand, the successor to the Willow Cove processor is the Golden Cove processor.
The Level 2 cache size of the Sunny Cove client processor is 512 KiB per core, while, in comparison, the L2 cache of the Willow Cove processor is 1.25 MiB per core.
The Level 3 cache size of the Sunny Cove processors is 2 MiB per core. On the other hand, the Level 3 cache size of the Willow Cove processor is 3 MiB per core.
The Sunny Cove processors are built on the Intel 10 nm FinFET manufacturing process. On the other hand, the Willow Cove processors are built on the Intel 10 nm SuperFin (10SF) fabrication process.

Conclusion

The Sunny Cove processor comes with significant improvements over its predecessors, the Skylake and Palm Cove processors, and is integrated into a wide variety of Intel designs.

With a major uplift in instructions per cycle and a better front-end and back-end, its performance is significantly higher.