ARM Cortex-X1

The ARM Cortex-X1 is a central processing unit implementing the ARMv8.2-A 64-bit instruction set. It was designed by ARM Holdings' Austin design centre as part of ARM's Cortex-X Custom (CXC) program.[1][2]

ARM Cortex-X1
General information
Launched: 2020
Designed by: ARM Ltd.
Performance
Max. CPU clock rate: up to 3.0 GHz in phones and 3.3 GHz in tablets/laptops
Address width: 40-bit
Cache
L1 cache: 128 KiB (64 KiB I-cache with parity, 64 KiB D-cache) per core
L2 cache: 512–1024 KiB per core
L3 cache: 512 KiB – 8 MiB (optional)
Architecture and classification
Microarchitecture: ARM Cortex-X1
Instruction set: ARMv8-A (A64; A32 and T32 at EL0 only)
Physical specifications
Cores: 1–4 per cluster
Products, models, variants
Product code name: Hera
History
Successor: ARM Cortex-X2

Design

The Cortex-X1 design is based on the ARM Cortex-A78, but it was redesigned to prioritize performance alone rather than a balance of performance, power, and area (PPA).[1]

The Cortex-X1 is a 5-wide decode, out-of-order superscalar design with a 3K-entry macro-op (MOP) cache. Per cycle, it can fetch 5 instructions and 8 MOPs, and it can rename and dispatch 8 MOPs, or 16 μOPs. The out-of-order window has been increased to 224 entries. The back end has 15 execution ports and a pipeline depth of 13 stages, with execution latencies of 10 stages. It also features four 128-bit SIMD units.[3][4][5][6]
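
The following is a rough illustration of what four 128-bit SIMD units imply. The sketch counts how many vector lanes of each element width could be processed per cycle. It assumes each unit accepts one full-width 128-bit operation every cycle, so the result is an idealized upper bound, not a measured figure. The names `SIMD_UNITS`, `SIMD_WIDTH_BITS`, and `lanes_per_cycle` are hypothetical.

```python
# Idealized SIMD lane arithmetic for the Cortex-X1.
# Assumption (not from the source): each of the four 128-bit units
# issues one full-width operation every cycle.
SIMD_UNITS = 4          # from the text: four 128-bit SIMD units
SIMD_WIDTH_BITS = 128   # width of each unit in bits


def lanes_per_cycle(element_bits: int) -> int:
    """Return how many elements of `element_bits` width could be processed
    per cycle across all SIMD units, under the assumption above."""
    if SIMD_WIDTH_BITS % element_bits != 0:
        raise ValueError("element width must divide the SIMD width")
    return SIMD_UNITS * (SIMD_WIDTH_BITS // element_bits)


for bits in (8, 16, 32, 64):
    print(f"{bits:>2}-bit elements: {lanes_per_cycle(bits)} lanes/cycle")
```

Under the same assumption, the Cortex-A78's 2x128b configuration would halve every figure.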

ARM claims the Cortex-X1 offers 30% faster integer and 100% faster machine learning performance than the ARM Cortex-A77.[3][4][5][6]

The Cortex-X1 supports ARM's DynamIQ technology. It is expected to serve as the high-performance ("big") core in combination with ARM Cortex-A78 "mid" cores and ARM Cortex-A55 "little" cores.[1][2]
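
The sketch below models a DynamIQ cluster as a list of core types and checks it against two constraints. The first is at most eight cores per cluster, which is the commonly cited limit of ARM's DynamIQ Shared Unit and is treated here as an assumption. The second is one to four Cortex-X1 cores per cluster, as listed in the infobox. The 1+3+4 layout in the example is only an illustrative configuration, and the function and constant names are hypothetical.

```python
from collections import Counter

# Assumption: DynamIQ Shared Unit limit of eight cores per cluster.
MAX_CORES_PER_CLUSTER = 8
# From the infobox: 1-4 Cortex-X1 cores per cluster.
X1_RANGE = range(1, 5)


def validate_x1_cluster(cores: list[str]) -> Counter:
    """Check a hypothetical big/mid/little cluster layout and return core counts.

    Raises ValueError if the layout exceeds the cluster size limit or has an
    unsupported number of Cortex-X1 cores.
    """
    counts = Counter(cores)
    if len(cores) > MAX_CORES_PER_CLUSTER:
        raise ValueError(
            f"{len(cores)} cores exceed the {MAX_CORES_PER_CLUSTER}-core limit"
        )
    # Counter returns 0 for absent keys, so a cluster with no X1 also fails.
    if counts["Cortex-X1"] not in X1_RANGE:
        raise ValueError("a Cortex-X1 cluster needs 1-4 Cortex-X1 cores")
    return counts


# Illustrative 1+3+4 layout: one X1 (big), three A78 (mid), four A55 (little).
layout = ["Cortex-X1"] + ["Cortex-A78"] * 3 + ["Cortex-A55"] * 4
print(dict(validate_x1_cluster(layout)))
```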

Architecture changes in comparison with ARM Cortex-A78

  • Around 20% performance improvement (+30% from A77)[7]
    • 30% faster integer
    • 100% faster machine learning performance
  • Out-of-order window size has been increased to 224 entries (from 160 entries)
  • Up to 4x128b SIMD units (from 2x128b)
  • 15% more silicon area
  • 5-way decode (from 4-way)
  • 8 MOPs/cycle decoded cache bandwidth (from 6 MOPs/cycle)
  • 64 KB L1D + 64 KB L1I (from 32/64 KB L1)
  • Up to 1 MB/core L2 cache (from 512 KB/core max)
  • Up to 8 MB L3 cache (from 4 MB max)
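
The list above mixes absolute figures with before/after pairs. The sketch below tabulates the paired figures (Cortex-A78 value, then Cortex-X1 value) exactly as given in the list and computes each ratio. The dictionary layout and the `ratio` helper are hypothetical. The L1 entry is left out because the list gives the A78 value as a 32/64 KB range.

```python
# Before/after pairs copied from the list above: (Cortex-A78, Cortex-X1).
CHANGES = {
    "out-of-order window (entries)": (160, 224),
    "128-bit SIMD units": (2, 4),
    "decode width (instructions/cycle)": (4, 5),
    "MOP cache bandwidth (MOPs/cycle)": (6, 8),
    "max L2 per core (KB)": (512, 1024),
    "max L3 (MB)": (4, 8),
}


def ratio(before: float, after: float) -> float:
    """Return how many times larger `after` is than `before`."""
    return after / before


for name, (a78, x1) in CHANGES.items():
    r = ratio(a78, x1)
    print(f"{name:<36} {a78:>5} -> {x1:>5}  x{r:.2f} ({(r - 1) * 100:+.0f}%)")
```

These ratios describe resources, not speed. Every one of them (at least +25%) exceeds the roughly 20% performance improvement given at the top of the list.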

Licensing

The Cortex-X1 is available as a SIP core to partners in ARM's Cortex-X Custom (CXC) program. Its design makes it suitable for integration with other SIP cores (e.g. GPU, display controller, DSP, image processor) on a single die, forming a system on a chip (SoC).[1][2]

Usage

See also

References