Intel Xe (stylized as Xe and pronounced as two separate letters,[1] abbreviation for "eXascale for everyone"[2]), earlier known unofficially as Gen12,[3][4] is a GPU architecture developed by Intel.[5]

Intel Xe
Release date: September 2, 2020
Manufactured by: Intel, TSMC
Designed by: Intel
Marketed by: Intel
Fabrication process: TSMC N6
Cards
Entry-level: Iris Xe Graphics
High-end: Intel Arc
History
Predecessor: Gen 11
Successor: Intel Xe 2

Intel Xe includes a new instruction set architecture. The Xe GPU family consists of a series of microarchitectures, ranging from integrated/low power (Xe-LP),[6] to enthusiast/high performance gaming (Xe-HPG), datacenter/high performance (Xe-HP) and high performance computing (Xe-HPC).[7][8]

History

Intel's first attempt at a dedicated graphics card was the Intel740,[9] released in February 1998. The Intel740 was considered unsuccessful due to its performance which was lower than market expectations, causing Intel to cease development on future discrete graphics products. However, its technology lived on in the Intel Extreme Graphics lineup.[10] Intel made another attempt with the Larrabee architecture before canceling it in 2009;[11] this time, the technology developed was used in the Xeon Phi, which was discontinued in 2020.[12]

In April 2018, it was reported that Intel was assembling a team to develop discrete graphics processing units targeting both datacenters and the PC gaming market, and therefore competing with products from both Nvidia and AMD.[13] Rumors supporting the claim included that the company had vacancies for over 100 graphics-related jobs and had taken on former Radeon Technologies Group (AMD) leader Raja Koduri in late 2017. The new product was reported to be codenamed "Arctic Sound".[13] The project was reported to have initially targeted video streaming chips for data centers, but had its scope expanded to include desktop GPUs.[13]

In June 2018, Intel confirmed it planned to launch a discrete GPU in 2020.[14]

The first functional discrete "Xe" GPU, codenamed "DG1", was reported as having begun testing in October 2019.[15]

According to a report by Hexus in late 2019, a discrete GPU would launch in mid-2020; combined GPU/CPU (GPGPU) products were also expected, for data center and autonomous driving applications. The product was expected to initially be built on a 10 nm node (with 7 nm products in 2021) and use Intel's Foveros die stacking packaging technology (see 3D die stacking).[16] During 2020, the first GPUs were released under the name Intel Iris Xe Max, integrated in the 11th generation Intel Core processors (codenamed "Tiger Lake" and "Rocket Lake"),[4] followed in 2021 by the Iris Xe DG1 card, exclusive to Intel OEM manufacturers.[17] After some delays, the retail launch of the company's first discrete graphics cards in over 20 years, known as the Intel Arc series, took place in 2022.[18]

Architecture

Intel Xe expands upon the microarchitectural overhaul introduced in Gen 11 with a full refactor of the instruction set architecture.[19][4] Although Xe is a family of architectures, the variants differ significantly from one another because each is designed for a specific target. The Xe GPU family consists of the Xe-LP, Xe-HP, Xe-HPC, and Xe-HPG sub-architectures.

Unlike previous Intel graphics processing units which used the Execution Unit (EU) as a compute unit, Xe-HPG and Xe-HPC use the Xe-core.[20] This is similar to an Xe-LP subslice.[20] An Xe-core contains vector and matrix arithmetic logic units, which are referred to as vector and matrix engines. Other components include L1 cache and other hardware.[20][21]

Xe-LP (Low Power)

Xe-LP is the low power variant of the Xe architecture with removed support for FP64.[22] Xe-LP is present as integrated graphics for 11th-generation Intel Core and the Iris Xe MAX mobile dedicated GPU (codenamed DG1),[6] as well as in the H3C XG310 Intel Server GPU (codenamed SG1).[4] Compared to its predecessor, Xe-LP includes new features such as Sampler Feedback,[23] Dual Queue Support,[24] DirectX12 View Instancing Tier2,[25] and AV1 8-bit and 10-bit fixed-function hardware decoding.[26]

Xe-HP (High Performance)

Xe-HP is the datacenter/high performance variant of Xe, optimized for FP64 performance and multi-tile scalability.[5]

Xe-HPC (High Performance Compute)

Xe-HPC is the high performance computing variant of the Xe architecture.[7][8] An Xe-HPC Xe-core contains 8 vector and 8 matrix engines, alongside a large 512 KB L1 cache.[27] It powers Ponte Vecchio.

Xe-HPG (High Performance Graphics)

Xe-HPG is the enthusiast or high performance graphics variant of the Xe architecture. The microarchitecture is based on Xe-LP with improvements from Xe-HP and Xe-HPC.[28] The microarchitecture is focused on graphics performance and supports hardware-accelerated ray tracing,[7][29] DisplayPort 2.0,[30] XeSS or supersampling based on neural networks (similar to Nvidia DLSS), and DirectX 12 Ultimate.[31] Intel confirmed ASTC support has been removed from hardware starting with Alchemist and future Intel Arc GPU microarchitectures will also not support it.[32] An Xe-HPG Xe-core contains 16 vector engines and 16 matrix engines.[21] An Xe-HPG render slice will consist of four Xe-cores, ray tracing hardware, and other components.[21]
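
As a rough illustration of how these building blocks add up, the sketch below derives unit counts from an Xe-core count. The 16 vector engines per Xe-core and four Xe-cores per render slice come from the paragraph above. The value of 8 FP32 lanes per vector engine is an assumption inferred from the Arc tables in this article (4096 shading cores across 512 execution units), and the class and function names are hypothetical.

```python
import math
from dataclasses import dataclass

# Stated in the text for Xe-HPG.
VECTOR_ENGINES_PER_XE_CORE = 16
XE_CORES_PER_RENDER_SLICE = 4
# Assumption: inferred from the Arc tables (4096 shading cores / 512 execution units).
FP32_LANES_PER_VECTOR_ENGINE = 8


@dataclass(frozen=True)
class XeHpgUnits:
    xe_cores: int
    vector_engines: int
    shading_cores: int
    render_slices: int


def xe_hpg_units(xe_cores: int) -> XeHpgUnits:
    """Derive unit counts for an Xe-HPG part from its number of Xe-cores."""
    if xe_cores <= 0:
        raise ValueError("xe_cores must be positive")
    vector_engines = xe_cores * VECTOR_ENGINES_PER_XE_CORE
    return XeHpgUnits(
        xe_cores=xe_cores,
        vector_engines=vector_engines,
        shading_cores=vector_engines * FP32_LANES_PER_VECTOR_ENGINE,
        # Assumption: a partially populated slice still counts as one slice.
        render_slices=math.ceil(xe_cores / XE_CORES_PER_RENDER_SLICE),
    )


print(xe_hpg_units(32))
# XeHpgUnits(xe_cores=32, vector_engines=512, shading_cores=4096, render_slices=8)
```

For a 32 Xe-core part this reproduces the 4096 shading cores, 512 execution units, and 8 render slices listed for the A770 further down.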

Xe-LPG (Low Power Graphics)

The Xe-LPG architecture is a low power variant of Xe-HPG designed for the tile-based iGPUs (tGPUs) of Intel's Meteor Lake and upcoming Arrow Lake processors. It is based on the same Arc Alchemist graphics (Gen 12.7) used by Intel's Arc A-series graphics cards but is optimized for lower power draw and higher performance per watt.

Intel Xe 2

Intel Xe 2
Release date: 2024 or later
Codename: Battlemage
History
Predecessor: Intel Xe
Successor: Intel Xe 3

A successor to Xe was revealed during Intel Architecture Day 2021 under the name Xe 2, codenamed Battlemage. It is currently under development.[21] In an exclusive interview with HardwareLuxx, Tom Peterson confirmed that Xe2 will be segmented into "Xe2-LPG" (Low Power Graphics) for integrated GPUs and "Xe2-HPG" (High Performance Graphics) for discrete GPUs.[33]

Intel Xe 3

Intel Xe 3
Codename: Celestial
History
Predecessor: Intel Xe 2
Successor: Intel Xe 4

Intel Xe 3, codenamed Celestial, is the upcoming successor to the Intel Xe 2 microarchitecture.[21]

Intel Xe 4

Intel Xe 4
Codename: Druid
History
Predecessor: Intel Xe 3

Intel Xe 4, codenamed Druid, is the upcoming successor to the Intel Xe 3 microarchitecture.[34]

Products using Xe

Integrated graphics

Newer Intel processors use the Xe-LP microarchitecture. These include 11th generation Intel Core processors (codenamed "Tiger Lake" and "Rocket Lake"),[4] 12th generation Intel Core processors (codenamed "Alder Lake"), and 13th generation Intel Core processors (codenamed "Raptor Lake").

Discrete graphics

Intel Iris Xe Max (DG1)

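| Model | Launch | Process | Execution units | Shading units | Boost clock (MHz) | Memory clock (MT/s) | Memory size (GB) | Memory bandwidth (GB/s) | Bus type | Bus width (bit) | Half precision (GFLOPS) | Single precision (GFLOPS) | Double precision (GFLOPS) | INT8 (GFLOPS) | Notes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Iris Xe MAX | November 1, 2020 | Intel 10SF | 96 | 768 | 1650 | 4266 | 4 | 68 | LPDDR4X | 128 | 5069 | 2534 | | 10138 | |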
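
The processing power figures in the table follow from the shading unit count and the boost clock. The sketch below is a minimal reconstruction of that arithmetic, not Intel's published method: it assumes two FP32 operations per shading unit per clock (one fused multiply-add), and that half precision and INT8 run at twice and four times that rate, which is what the table values imply. The function name is hypothetical.

```python
def peak_gflops(shading_units: int, clock_mhz: int, ops_per_unit_per_clock: int = 2) -> float:
    """Theoretical peak throughput in GFLOPS (or GOPS for integer formats)."""
    return shading_units * ops_per_unit_per_clock * clock_mhz / 1000


# Iris Xe MAX: 768 shading units at a 1650 MHz boost clock.
single = peak_gflops(768, 1650)                            # assumed 2 ops/clock (FMA)
half = peak_gflops(768, 1650, ops_per_unit_per_clock=4)    # assumed 2x the FP32 rate
int8 = peak_gflops(768, 1650, ops_per_unit_per_clock=8)    # assumed 4x the FP32 rate

print(round(half), round(single), round(int8))  # 5069 2534 10138
```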

In August 2020, Intel was reported to be shipping Xe DG1 GPUs for a possible late 2020 release, while also commenting on a DG2 GPU aimed at the enthusiast market (later found out to be the first generation of Intel Arc nicknamed "Alchemist"). The DG1 is also sold as the Iris Xe MAX and as Iris Xe Graphics (stylized as iRIS Xe) in laptops, while cards for developers are sold as the DG1 SDV.[35][36]

The Xe MAX is an entry-level GPU that was first released in China on November 1, 2020. It is similar in most respects to the integrated GPU found in Tiger Lake processors; the differences are a higher clock speed, slightly higher performance, dedicated memory, and a dedicated TDP budget. It competes with Nvidia's laptop-level GeForce MX series GPUs and is aimed at slim, highly portable productivity laptops. It has 4 GB of dedicated LPDDR4X-4266 memory on a 128-bit memory bus, 96 EUs, 48 texture units, 24 ROPs, a peak clock speed of 1650 MHz, and a performance of 2.46 FP32 teraFLOPS with a 25 W TDP. By comparison, the integrated GPU in Tiger Lake processors has a performance of 2.1 FP32 teraFLOPS.[37][38] The Xe MAX does not replace the system's integrated GPU; instead, it was designed to work alongside it, so tasks are split between the integrated and discrete GPUs.[39] It was initially available in only three laptops: the Asus VivoBook Flip 14 TP470, the Acer Swift 3X, and the Dell Inspiron 15 7000. Intel Xe MAX GPUs can only be found in systems with Tiger Lake processors.

Intel officially announced Intel Iris Xe Graphics desktop cards for OEMs and system integrators on January 26, 2021. The cards are aimed at mainstream desktop and business PCs as an improvement over other graphics options in AV1 video decoding, HDR (high dynamic range) video support, and deep learning inference. They are not as powerful as their laptop counterpart, with only 80 enabled EUs. The first cards are made by Asus, have DisplayPort 1.4, HDMI 2.0, and dual-link DVI-D outputs, and are passively cooled.[40][41][42][43]

Intel Arc

Intel Arc is a high-performance discrete graphics line optimized for gaming. It competes directly with the Radeon and GeForce lines of graphics processing units. The first generation, codenamed "Alchemist", was developed under the "DG2" name and is based on the Xe-HPG architecture. Future generations are codenamed Battlemage ("DG3", based on Xe2), Celestial ("DG4", based on Xe3), and Druid ("DG5").

Desktop
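| Branding | Model [44] | Launch | MSRP (USD) | Code name | Process | Transistors (billion) | Die size (mm2) | Core config [a] | L2 cache | Clock rate (MHz) [b] | Pixel fillrate (GP/s) | Texture fillrate (GT/s) | Memory type | Memory size | Memory bandwidth (GB/s) | Bus width | Memory clock (MT/s) | Half precision (TFLOPS) | Single precision (TFLOPS) | Double precision (TFLOPS) | TDP | Bus interface |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Arc 3 | A310 | Sep 28, 2022 | $110 | ACM-G11 (DG2-128) | TSMC N6 | 7.2 | 157 | 6 Xe cores, 768:32:16:6 (192:96:2) | 4 MB | 2000 / 2000 | 32 | 64 | GDDR6 | 4 GB | 124 | 64-bit | 15500 | 6.144 | 3.072 | 0.768 | 75 W | PCIe 4.0 x8 |
| Arc 3 | A380 | Jun 14, 2022 | $139 | ACM-G11 (DG2-128) | TSMC N6 | 7.2 | 157 | 8 Xe cores, 1024:64:32:8 (256:128:2) | 4 MB | 2000 / 2050 | 64 / 65.6 | 128 / 131.2 | GDDR6 | 6 GB | 186 | 96-bit | 15500 | 8.192 / 8.3968 | 4.096 / 4.1984 | 1.024 / 1.0496 | 75 W | PCIe 4.0 x8 |
| Arc 5 | A580 | Oct 10, 2023 | $179 | ACM-G10 (DG2-512) | TSMC N6 | 21.7 | 406 | 24 Xe cores, 3072:192:96:24 (768:384:6) | 8 MB | 1700 / 1700 | 163.2 | 326.4 | GDDR6 | 8 GB | 512 | 256-bit | 16000 | 20.890 | 10.445 | 2.611 | 175 W | PCIe 4.0 x16 |
| Arc 7 | A750 | Oct 14, 2022 | $289 | ACM-G10 (DG2-512) | TSMC N6 | 21.7 | 406 | 28 Xe cores, 3584:192:112:28 (896:448:7) | 12 MB | 2050 / 2400 | 229.6 / 268.8 | 393.6 / 460.8 | GDDR6 | 8 GB | 512 | 256-bit | 16000 | 29.3888 / 34.4064 | 14.6944 / 17.2032 | 3.6736 / 4.3008 | 225 W | PCIe 4.0 x16 |
| Arc 7 | A770 8GB | Oct 14, 2022 | $329 | ACM-G10 (DG2-512) | TSMC N6 | 21.7 | 406 | 32 Xe cores, 4096:256:128:32 (1024:512:8) | 16 MB | 2100 / 2400 | 268.8 / 307.2 | 537.6 / 614.4 | GDDR6 | 8 GB | 512 | 256-bit | 16000 | 34.4064 / 39.3216 | 17.2032 / 19.6608 | 4.3008 / 4.9152 | 225 W | PCIe 4.0 x16 |
| Arc 7 | A770 16GB | Oct 14, 2022 | $349 | ACM-G10 (DG2-512) | TSMC N6 | 21.7 | 406 | 32 Xe cores, 4096:256:128:32 (1024:512:8) | 16 MB | 2100 / 2400 | 268.8 / 307.2 | 537.6 / 614.4 | GDDR6 | 16 GB | 560 | 256-bit | 17500 | 34.4064 / 39.3216 | 17.2032 / 19.6608 | 4.3008 / 4.9152 | 225 W | PCIe 4.0 x16 |

  a. Shading cores (ALU): texture mapping units (TMU): render output units (ROP): ray tracing units
       (tensor cores (XMX): execution units: render slices)
  b. Where two values are given, the first is the base value and the second is the boost value (if available).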
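
The core config notation packs seven counts into one string. The following sketch shows one way to unpack it into named fields, following the order given in the footnote; the `CoreConfig` type and `parse_core_config` function are hypothetical names introduced for illustration.

```python
import re
from typing import NamedTuple


class CoreConfig(NamedTuple):
    shading_cores: int
    tmus: int
    rops: int
    ray_tracing_units: int
    xmx_engines: int
    execution_units: int
    render_slices: int


def parse_core_config(text: str) -> CoreConfig:
    """Parse 'ALU:TMU:ROP:RT (XMX:EU:slices)' into named fields."""
    numbers = [int(n) for n in re.findall(r"\d+", text)]
    if len(numbers) != len(CoreConfig._fields):
        raise ValueError(f"expected 7 counts, found {len(numbers)} in {text!r}")
    return CoreConfig(*numbers)


a770 = parse_core_config("4096:256:128:32 (1024:512:8)")
print(a770.shading_cores, a770.execution_units, a770.render_slices)  # 4096 512 8
```

Named fields make it harder to confuse, say, the TMU count with the ROP count when the values feed into later calculations.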
Mobile
| Branding | Model [45] | Launch | Code name | Process | Transistors (billion) | Die size (mm2) | Core config [a] | L2 cache | Core clock (MHz) [b] | Pixel fillrate (GP/s) [c] | Texture fillrate (GT/s) [d] | Memory type | Memory size | Memory bandwidth (GB/s) | Bus width | Memory clock (MT/s) | Half precision (TFLOPS) | Single precision (TFLOPS) | Double precision (TFLOPS) | TDP | Bus interface |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Arc 3 | A350M | Mar 30, 2022 | ACM-G11 (DG2-128) | TSMC N6 | 7.2 | 157 | 6 Xe cores, 768:48:24:6 (96:96:2) | 4 MB | 1150 / 2200 | 27.6 / 52.8 | 55.2 / 105.6 | GDDR6 | 4 GB | 112 | 64-bit | 14000 | 3.5328 / 6.7584 | 1.7664 / 3.3792 | 0.4416 / 0.8448 | 25–35 W | PCIe 4.0 x8 |
| Arc 3 | A370M | Mar 30, 2022 | ACM-G11 (DG2-128) | TSMC N6 | 7.2 | 157 | 8 Xe cores, 1024:64:32:8 (128:128:2) | 4 MB | 1550 / 2050 | 49.6 / 65.6 | 99.2 / 131.2 | GDDR6 | 4 GB | 112 | 64-bit | 14000 | 6.3488 / 8.3968 | 3.1744 / 4.1984 | 0.7936 / 1.0496 | 35–50 W | PCIe 4.0 x8 |
| Arc 5 | A530M | Q3 2023 | ACM-G12 (DG2-256) | TSMC N6 | | | 12 Xe cores, 1536:96:48:12 (192:192:3) | 8 MB | 1300 | | | GDDR6 | 4 GB or 8 GB | 224 | 128-bit | 14000 | | | | 65–95 W | PCIe 4.0 x8 |
| Arc 5 | A550M | Q2 2022 | ACM-G10 (DG2-512) | TSMC N6 | 21.7 | 406 | 16 Xe cores, 2048:128:64:16 (256:256:4) | 8 MB | 900 / 1700 | 57.6 / 108.8 | 115.2 / 217.6 | GDDR6 | 8 GB | 224 | 128-bit | 14000 | 7.3728 / 13.9264 | 3.6864 / 6.9632 | 0.9216 / 1.7408 | 60–80 W | PCIe 4.0 x8 |
| Arc 5 | A570M | Q3 2023 | ACM-G12 (DG2-256) | TSMC N6 | | | 16 Xe cores, 2048:128:64:16 (256:256:4) | 8 MB | 1300 | | | GDDR6 | 8 GB | 224 | 128-bit | 14000 | | | | 75–95 W | PCIe 4.0 x8 |
| Arc 7 | A730M | Q2 2022 | ACM-G10 (DG2-512) | TSMC N6 | 21.7 | 406 | 24 Xe cores, 3072:192:96:24 (384:384:6) | 12 MB | 1100 / 2050 | 105.6 / 196.8 | 211.2 / 393.6 | GDDR6 | 12 GB | 336 | 192-bit | 14000 | 13.5168 / 25.1904 | 6.7584 / 12.5952 | 1.6896 / 3.1488 | 80–120 W | PCIe 4.0 x16 |
| Arc 7 | A770M | Q2 2022 | ACM-G10 (DG2-512) | TSMC N6 | 21.7 | 406 | 32 Xe cores, 4096:256:128:32 (512:512:8) | 16 MB | 1650 / 2050 | 211.2 / 262.4 | 422.4 / 524.8 | GDDR6 | 16 GB | 512 | 256-bit | 16000 | 27.0336 / 33.5872 | 13.5168 / 16.7936 | 3.3792 / 4.1984 | 120–150 W | PCIe 4.0 x16 |

  a. Shading cores (ALU): texture mapping units (TMU): render output units (ROP): ray tracing units
       (tensor cores (XMX): execution units: render slices)
  b. Where two values are given, the first is the base value and the second is the boost value (if available).
  c. Pixel fillrate is calculated as the lowest of three numbers: number of ROPs multiplied by the base core clock speed, number of rasterizers multiplied by the number of fragments they can generate per rasterizer multiplied by the base core clock speed, and the number of streaming multiprocessors multiplied by the number of fragments per clock that they can output multiplied by the base clock rate.
  d. Texture fillrate is calculated as the number of texture mapping units (TMUs) multiplied by the base (or boost) core clock speed.
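
The fillrate footnotes translate directly into a few lines of arithmetic. The sketch below uses hypothetical function names. Because the table does not list rasterizer or shader fragment throughput, those two limits are optional parameters, and leaving them out reduces the pixel fillrate to ROPs multiplied by clock speed, which matches the tabulated values.

```python
from typing import Optional


def texture_fillrate_gt_s(tmus: int, clock_mhz: int) -> float:
    """Texture fillrate in GT/s: TMUs multiplied by the core clock."""
    return tmus * clock_mhz / 1000


def pixel_fillrate_gp_s(
    rops: int,
    clock_mhz: int,
    raster_fragments_per_clock: Optional[int] = None,
    shader_fragments_per_clock: Optional[int] = None,
) -> float:
    """Pixel fillrate in GP/s: the lowest per-clock limit multiplied by the core clock."""
    limits = [rops]
    # The two optional limits are not listed in the table (assumption: unknown means not limiting).
    if raster_fragments_per_clock is not None:
        limits.append(raster_fragments_per_clock)
    if shader_fragments_per_clock is not None:
        limits.append(shader_fragments_per_clock)
    return min(limits) * clock_mhz / 1000


# Arc A350M: 48 TMUs, 24 ROPs, 1150 MHz base and 2200 MHz boost.
print(pixel_fillrate_gp_s(24, 1150), pixel_fillrate_gp_s(24, 2200))      # 27.6 52.8
print(texture_fillrate_gt_s(48, 1150), texture_fillrate_gt_s(48, 2200))  # 55.2 105.6
```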
Workstation
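| Branding | Model [46] | Launch | Code name | Process | Transistors (billion) | Die size (mm2) | Core config [a] | L2 cache | Core clock (MHz) [b] | Pixel fillrate (GP/s) [c] | Texture fillrate (GT/s) [d] | Memory type | Memory size | Memory bandwidth (GB/s) | Bus width | Memory clock (MT/s) | Half precision (TFLOPS) | Single precision (TFLOPS) | Double precision (TFLOPS) | TDP | Bus interface |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Arc Pro | A30M (Mobile) | Aug 8, 2022 | ACM-G11 (DG2-128) | TSMC N6 | 7.2 | 157 | 8 Xe cores, 1024:64:32:8 (128:128:2) | 4 MB | 1550 | | | GDDR6 | 4 GB | 112 | 64-bit | 14000 | | 4.20 [46] | | 50 W | PCIe 4.0 x8 |
| Arc Pro | A40 | Aug 8, 2022 | ACM-G11 (DG2-128) | TSMC N6 | 7.2 | 157 | 8 Xe cores, 1024:64:32:8 (128:128:2) | 4 MB | | | | GDDR6 | 6 GB | 192 | 96-bit | 16000 | | 5.02 [46] | | 50 W | PCIe 4.0 x8 |
| Arc Pro | A50 | Aug 8, 2022 | ACM-G11 (DG2-128) | TSMC N6 | 7.2 | 157 | 8 Xe cores, 1024:64:32:8 (128:128:2) | 4 MB | 2050 | | | GDDR6 | 6 GB | 192 | 96-bit | 16000 | | | | 75 W | PCIe 4.0 x8 |
| Arc Pro | A60M (Mobile) | June 6, 2023 | ACM-G12 (DG2-256) | TSMC N6 | | | 16 Xe cores, 2048:128:64:16 (256:256:4) | | 1300 | | | GDDR6 | 8 GB | 256 | 128-bit | 16000 | | 9.42 [46] | | 95 W | PCIe 4.0 x16 |
| Arc Pro | A60 | June 6, 2023 | ACM-G12 (DG2-256) | TSMC N6 | | | 16 Xe cores, 2048:128:64:16 (256:256:4) | | | | | GDDR6 | 12 GB | 384 | 192-bit | 16000 | | 10.04 [46] | | 130 W | PCIe 4.0 x16 |

  a. Shading cores (ALU): texture mapping units (TMU): render output units (ROP): ray tracing units
       (tensor cores (XMX): execution units: render slices)
  b. Where two values are given, the first is the base value and the second is the boost value (if available).
  c. Pixel fillrate is calculated as the lowest of three numbers: number of ROPs multiplied by the base core clock speed, number of rasterizers multiplied by the number of fragments they can generate per rasterizer multiplied by the base core clock speed, and the number of streaming multiprocessors multiplied by the number of fragments per clock that they can output multiplied by the base clock rate.
  d. Texture fillrate is calculated as the number of texture mapping units (TMUs) multiplied by the base (or boost) core clock speed.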
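
The memory bandwidth column can be cross-checked against the bus width and memory clock columns. The sketch below assumes the usual relationship for these tables, bandwidth equals bus width in bytes multiplied by the transfer rate; the function name is hypothetical.

```python
def memory_bandwidth_gb_s(bus_width_bits: int, transfer_rate_mt_s: int) -> float:
    """Peak memory bandwidth in GB/s from bus width (bits) and data rate (MT/s)."""
    # bits -> bytes (divide by 8), MB/s -> GB/s (divide by 1000)
    return bus_width_bits * transfer_rate_mt_s / 8000


print(memory_bandwidth_gb_s(192, 16000))  # 384.0 (Arc Pro A60)
print(memory_bandwidth_gb_s(64, 14000))   # 112.0 (Arc Pro A30M)
```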

Datacenter

Intel H3C XG310

On November 11, 2020, Intel launched the H3C XG310 data center GPU, consisting of four DG1 GPUs with 32 GB of LPDDR4X memory on a single-slot PCIe card.[47][48] Each GPU is connected to 8 GB of memory over a 128-bit bus, and the card uses a PCIe 3.0 x16 connection to the rest of the system. The GPUs use the Xe-LP (Gen 12.1) architecture.

Ponte Vecchio

Intel officially announced its Xe general HPC/AI GPU, codenamed Ponte Vecchio, on November 17, 2019. It was revealed to use the Xe-HPC variant of the architecture[49] and Intel's Embedded Multi-Die Interconnect Bridge (EMIB) and Foveros die stacking packaging on an Intel 4 node (previously referred to as 7 nm). Intel later confirmed at Architecture Day 2021 that Ponte Vecchio would use Compute Tiles manufactured on TSMC N5, Base Tiles and Rambo Cache Tiles manufactured using Intel 7 (previously referred to as 10 nm Enhanced SuperFin), and Xe Link Tiles manufactured on the TSMC N7 process. The new GPU is expected to be used in Argonne National Laboratory's new exascale supercomputer, Aurora, with compute nodes comprising two next generation Intel Xeon (codenamed "Sapphire Rapids") CPUs and six Ponte Vecchio GPUs.[50][51]

| Model [52][53] | Launch | Code name(s) | Process | Transistors (billion) | Die size (mm2) | Core config [a] | L1 cache | L2 cache | Core clock (MHz) [b] | Pixel fillrate (GP/s) [c] | Texture fillrate (GT/s) [d] | Memory type | Memory size | Memory bandwidth (GB/s) | Bus width | Memory clock (MT/s) | Bfloat16 (TFLOPS) | Single precision (TFLOPS) | Double precision (TFLOPS) | TDP | Bus interface |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Data Center GPU Max 1100 | Jan 10, 2023 | Xe-HPC (Ponte Vecchio) | Multiple [54] | 100 | 1280 | 7168:448:0:56:448:448 | 28 MB | 204 MB | 1000 / 1550 | 0 | 448.0 / 694.4 | HBM2E | 48 GB | 1228.8 | 3072-bit | 3200 | 352 | 14.336 / 22.221 | | 300 W | PCIe 5.0 x16 |
| Data Center GPU Max 1350 | abandoned | Xe-HPC (Ponte Vecchio) | Multiple [54] | 100 | 1280 | 14336:896:0:112:896:896 | 56 MB | 408 MB | 750 / 1550 | 0 | 672.0 / 1388.8 | HBM2E | 96 GB | 2457.6 | 6144-bit | 3200 | 704 | 21.504 / 44.442 | | 450 W | PCIe 5.0 x16 |
| Data Center GPU Max 1550 | Jan 10, 2023 | Xe-HPC (Ponte Vecchio) | Multiple [54] | 100 | 1280 | 16384:1024:0:128:1024:1024 | 64 MB | 408 MB | 900 / 1600 | 0 | 921.6 / 1638.4 | HBM2E | 128 GB | 3276.8 | 8192-bit | 3200 | 832 | 29.491 / 54.423 | | 600 W | PCIe 5.0 x16 |

  a. Shading cores (ALU): texture mapping units (TMU): render output units (ROP): ray tracing units: tensor cores (XMX): execution units
  b. Where two values are given, the first is the base value and the second is the boost value (if available).
  c. Pixel fillrate is calculated as the lowest of three numbers: number of ROPs multiplied by the base core clock speed, number of rasterizers multiplied by the number of fragments they can generate per rasterizer multiplied by the base core clock speed, and the number of streaming multiprocessors multiplied by the number of fragments per clock that they can output multiplied by the base clock rate.
  d. Texture fillrate is calculated as the number of texture mapping units (TMUs) multiplied by the base (or boost) core clock speed.

Rialto Bridge

Intel officially announced the successor to Ponte Vecchio, a GPU codenamed Rialto Bridge, on May 31, 2022.[55] On March 3, 2023, Intel announced the discontinuation of Rialto Bridge in favor of its tile-based, flexible, and scalable Falcon Shores XPU (CPU + GPU), set to arrive in 2025.[56]

Arctic Sound

Under the codename Arctic Sound, Intel developed data center GPUs for visual cloud and AI inference based on the Xe-HP architecture (Gen 12.5).[57] The GPUs were supposed to be fabricated on Intel's 10 nm node and have a die size of around 190 mm2 with 8 billion transistors.[58] Up to four GPU tiles could be combined into a single package together with HBM2e memory.

In October 2021, Raja Koduri announced that Xe-HP would not be commercialized into a final product.[57] Instead, the Arctic Sound cards would be based on the Xe-HPG architecture (Gen 12.7), the same as the Alchemist consumer graphics cards.[59] They were launched on August 24, 2022, as the Intel Data Center GPU Flex series.

On March 3, 2023, Intel announced that it would discontinue the development of Lancaster Sound, which was supposed to succeed Arctic Sound in 2023 with incremental improvements. Instead, Intel will accelerate the development of Melville Sound, which will be a significant architectural leap in terms of performance and features.[56]
