Comparison of ARM processors

This is a comparison of application processor cores that implement the ARM instruction set architecture, designed by ARM Holdings (ARM Cortex-A) and third parties. It does not include ARM Cortex-R, ARM Cortex-M, or legacy ARM cores.

ARMv7-A

This table compares 32-bit central processing units that implement the ARMv7-A instruction set architecture (the "A" stands for Application[1]) and its mandatory or optional extensions, the last being AArch32.

Core | Decode width | Execution ports | Pipeline depth | Out-of-order execution | FPU | Pipelined VFP | FPU registers | NEON (SIMD) | big.LITTLE role | Virtualization[2] | Process technology | L0 cache | L1 cache | L2 cache | Core configurations | Speed per core (DMIPS/MHz) | ARM part number (in the main ID register)
ARM Cortex-A5 | 1 |  | 8 | No | VFPv4 (optional) |  | 16 × 64-bit | 64-bit wide (optional) | No | No | 40/28 nm |  | 4–64 KiB / core |  | 1, 2, 4 | 1.57 | 0xC05
ARM Cortex-A7 | 2 | 5[3] | 8 | No | VFPv4 | Yes | 16 × 64-bit | 64-bit wide | LITTLE | Yes[4] | 40/28 nm |  | 8–64 KiB / core | up to 1 MiB (optional) | 1, 2, 4, 8 | 1.9 | 0xC07
ARM Cortex-A8 | 2 | 2[5] | 13 | No | VFPv3 | No | 32 × 64-bit | 64-bit wide | No | No | 65/55/45 nm |  | 32 KiB + 32 KiB | 256 or 512 (typical) KiB | 1 | 2.0 | 0xC08
ARM Cortex-A9 | 2 | 3[6] | 8–11[7] | Yes | VFPv3 (optional) | Yes | (16 or 32) × 64-bit | 64-bit wide (optional) | Companion Core | No[7] | 65/45/40/32/28 nm |  | 32 KiB + 32 KiB | 1 MiB | 1, 2, 4 | 2.5 | 0xC09
ARM Cortex-A12 | 2 |  | 11 | Yes | VFPv4 | Yes | 32 × 64-bit | 128-bit wide | No[8] | Yes | 28 nm |  | 32–64 KiB + 32 KiB | 256 KiB, to 8 MiB | 1, 2, 4 | 3.0 | 0xC0D
ARM Cortex-A15 | 3 | 8[3] | 15/17-25 | Yes | VFPv4 | Yes | 32 × 64-bit | 128-bit wide | big | Yes[9] | 32/28/20 nm |  | 32 KiB + 32 KiB per core | up to 4 MiB per cluster, up to 8 MiB per chip | 2, 4, 8 (4×2) | 3.5 to 4.01 | 0xC0F
ARM Cortex-A17 | 2[10] |  | 11+ | Yes | VFPv4 | Yes | 32 × 64-bit | 128-bit wide | big | Yes | 28 nm |  | 32 KiB + 32 KiB per core | 256 KiB, up to 8 MiB | up to 4 | 4.0 | 0xC0E
Qualcomm Scorpion | 2 | 3[11] | 10 | Yes (FXU&LSU only)[12] | VFPv3 | Yes |  | 128-bit wide | No |  | 65/45 nm |  | 32 KiB + 32 KiB | 256 KiB (single-core), 512 KiB (dual-core) | 1, 2 | 2.1 | 0x00F
Qualcomm Krait[13] | 3 | 7 | 11 | Yes | VFPv4[14] | Yes |  | 128-bit wide | No |  | 28 nm | 4 KiB + 4 KiB direct mapped | 16 KiB + 16 KiB 4-way set associative | 1 MiB 8-way set associative (dual-core) / 2 MiB (quad-core) | 2, 4 | 3.3 (Krait 200), 3.39 (Krait 300), 3.39 (Krait 400), 3.51 (Krait 450) | 0x04D, 0x06F
Swift | 3 | 5 | 12 | Yes | VFPv4 | Yes | 32 × 64-bit | 128-bit wide | No |  | 32 nm |  | 32 KiB + 32 KiB | 1 MiB | 2 | 3.5 | ?
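
The "ARM part number" column refers to a field of the main ID register (MIDR). As a rough illustration of how such a value could be matched against the table, the Python sketch below splits a 32-bit MIDR value into its architecturally defined fields (implementer in bits 31:24, variant in 23:20, architecture in 19:16, primary part number in 15:4, revision in 3:0) and looks the part number up. The function names, the lookup dictionary, and the sample register value are illustrative assumptions and are not taken from this article; only the part numbers themselves come from the table, and the lookup is restricted to implementer code 0x41 (Arm).

```python
# Part numbers copied from the ARMv7-A table; valid only when the
# implementer field is 0x41 (Arm). Other vendors reuse the 12-bit space.
ARM_PART_NAMES = {
    0xC05: "ARM Cortex-A5",
    0xC07: "ARM Cortex-A7",
    0xC08: "ARM Cortex-A8",
    0xC09: "ARM Cortex-A9",
    0xC0D: "ARM Cortex-A12",
    0xC0E: "ARM Cortex-A17",
    0xC0F: "ARM Cortex-A15",
}

ARM_IMPLEMENTER = 0x41  # ASCII 'A'


def decode_midr(midr: int) -> dict:
    """Split a 32-bit main ID register value into its fields."""
    return {
        "implementer": (midr >> 24) & 0xFF,
        "variant": (midr >> 20) & 0xF,
        "architecture": (midr >> 16) & 0xF,
        "part_number": (midr >> 4) & 0xFFF,
        "revision": midr & 0xF,
    }


def core_name(midr: int) -> str:
    """Return the table's core name for an Arm-implemented part, else 'unknown'."""
    fields = decode_midr(midr)
    if fields["implementer"] != ARM_IMPLEMENTER:
        return "unknown"
    return ARM_PART_NAMES.get(fields["part_number"], "unknown")


if __name__ == "__main__":
    sample = 0x413FC090  # illustrative value only (hypothetical reading)
    fields = decode_midr(sample)
    print(hex(fields["part_number"]), core_name(sample))
```

On Linux, the same fields are typically visible as "CPU implementer" and "CPU part" in /proc/cpuinfo, so a lookup like this can be applied without reading the register directly.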

ARMv8-A

This is a table of 64/32-bit central processing units that implement the ARMv8-A instruction set architecture and mandatory or optional extensions of it. Most chips support the 32-bit ARMv7-A instruction set for legacy applications. All chips of this type have a floating-point unit (FPU) that is better than the one in older ARMv7-A and NEON (SIMD) chips. Some of these chips have coprocessors that also include cores from the older 32-bit architecture (ARMv7). Some of the chips are SoCs that can combine both ARM Cortex-A53 and ARM Cortex-A57 cores, such as the Samsung Exynos 7 Octa.

Company | Core | Released | Revision | Decode | Pipeline depth | Out-of-order execution (have it) | Out-of-order execution (entries) | Branch prediction | big.LITTLE role | Exec. ports | SIMD | Fab (in nm) | Simult. MT | L0 cache | L1 cache, Instr + Data (in KiB) | L2 cache | L3 cache | Core configurations | Speed per core (DMIPS/MHz[note 1]) | Clock rate | ARM part number (in the main ID register)
ARMCortex-A32 (32-bit)[15]2017ARMv8.0-A
(only 32-bit)
2-wide8No0?LITTLE??28[16]NoNo8–64 + 8–640–1 MiBNo1–4+2.3?0xD01
Cortex-A34 (64-bit)[17]2019ARMv8.0-A
(only 64-bit)
2-wide8No0?LITTLE???NoNo8–64 + 8–640–1 MiBNo1–4+??0xD02
Cortex-A35[18]2017ARMv8.0-A2-wide[19]8No0YesLITTLE??28 / 16 /
14 / 10
NoNo8–64 + 8–640 / 128 KiB–1 MiBNo1–4+1.7[20]-1.85?0xD04
Cortex-A53[21]2014ARMv8.0-A2-wide8No0Conditional+
Indirect branch
prediction
big/LITTLE2?28 / 20 /
16 / 14 / 10
NoNo8–64 + 8–64128 KiB–2 MiBNo1–4+2.24[22]?0xD03
Cortex-A55[23]2017ARMv8.2-A2-wide8No0big/LITTLE2?28 / 20 /
16 / 14 / 12 / 10 / 5[24]
NoNo16–64 + 16–640–256 KiB/core0–4 MiB1–8+2.65[25]?0xD05
Cortex-A57[26]2013ARMv8.0-A3-wide15Yes
3-wide dispatch
??big8?28 / 20 /
16[27] / 14
NoNo48 + 320.5–2 MiBNo1–4+4.1[20]-4.8?0xD07
Cortex-A65[28]2019ARMv8.2-A
(only 64-bit)
2-wide10-12Yes
4-wide dispatch
Two-level?9?SMT2No32–64 + 32–64 KiB0, 64–256 KiB0, 0.5–4 MiB1-8??0xD06
Cortex-A65AE[29]2019ARMv8.2-A??YesTwo-level?2?SMT2No32–64 + 32–64 KiB64–256 KiB0, 0.5–4 MiB1–8??0xD43
Cortex-A72[30]2015ARMv8.0-A3-wide15Yes
5-wide dispatch
Two-levelbig828 / 16NoNo48 + 320.5–4 MiBNo1–4+4.7[22]-6.3[31]?0xD08
Cortex-A73[32]2016ARMv8.0-A2-wide11–12Yes
4-wide dispatch
Two-levelbig728 / 16 / 10NoNo64 + 32/641–8 MiBNo1–4+4.8[20]–8.5[31]?0xD09
Cortex-A75[23]2017ARMv8.2-A3-wide11–13Yes
6-wide dispatch
Two-levelbig8?2*128b28 / 16 / 10NoNo64 + 64256–512 KiB/core0–4 MiB1–8+6.1[20]–9.5[31]?0xD0A
Cortex-A76[33]2018ARMv8.2-A4-wide11–13Yes
8-wide dispatch
128Two-levelbig82*128b10 / 7NoNo64 + 64256–512 KiB/core1–4 MiB1–46.4?0xD0B
Cortex-A76AE[34]2018ARMv8.2-A??Yes128Two-levelbig??NoNo??????0xD0E
Cortex-A77[35]2019ARMv8.2-A4-wide11–13Yes
10-wide dispatch
160Two-levelbig122*128b7No1.5K entries64 + 64256–512 KiB/core1–4 MiB1–47.3[20][36]?0xD0D
Cortex-A78[37][38]2020ARMv8.2-A4-wideYes160Yesbig132*128bNo1.5K entries32/64 + 32/64256–512 KiB/core1–4 MiB1–47.6-8.2?0xD41
Cortex-X1[39]2020ARMv8.2-A5-wide[39]?Yes224Yesbig154*128bNo3K entries64 + 64up to 1 MiB[39]up to 8 MiB[39]custom[39]10-11?0xD44
AppleCyclone[40]2013ARMv8.0-A6-wide[41]16[41]Yes[41]192YesNo9[41]28[42]NoNo64 + 64[41]1 MiB[41]4 MiB[41]2[43]?1.3–1.4 GHz
Typhoon2014ARMv8.0‑A6-wide[44]16[44]Yes[44]YesNo920NoNo64 + 64[41]1 MiB[44]4 MiB[41]2, 3 (A8X)?1.1–1.5 GHz
Twister2015ARMv8.0‑A6-wide[44]16[44]Yes[44]YesNo916 / 14NoNo64 + 64[44]3 MiB[44]4 MiB[44]
No (A9X)
2?1.85–2.26 GHz
Hurricane2016ARMv8.0‑A6-wide[45]16Yes"big" (In A10/A10X paired with "LITTLE" Zephyr
cores)
93*128b16 (A10)
10 (A10X)
NoNo64 + 64[46]3 MiB[46] (A10)
8 MiB (A10X)
4 MiB[46] (A10)
No (A10X)
2x Hurricane (A10)
3x Hurricane (A10X)
?2.34–2.36 GHz
ZephyrARMv8.0‑A3-wide12YesLITTLE516 (A10)
10 (A10X)
NoNo32 + 32[47]1 MiB4 MiB[46] (A10)
No (A10X)
2x Zephyr (A10)
3x Zephyr (A10X)
?1.09–1.3 GHz
Monsoon2017ARMv8.2‑A[48]7-wide16Yes"big" (In Apple A11 paired with "LITTLE" Mistral
cores)
113*128b10NoNo64 + 64[47]8 MiBNo2x Monsoon?2.39 GHz
MistralARMv8.2‑A[48]3-wide12YesLITTLE510NoNo32 + 32[47]1 MiBNoMistral?1.19 GHz
Vortex2018ARMv8.3‑A[49]7-wide16Yes"big" (In Apple A12/Apple A12X/Apple A12Z paired with "LITTLE" Tempest
cores)
113*128b7NoNo128 + 128[47]8 MiBNo2x Vortex (A12)
4x Vortex (A12X/A12Z)
?2.49 GHz
TempestARMv8.3‑A[49]3-wide12YesLITTLE57NoNo32 + 32[47]2 MiBNo4x Tempest?1.59 GHz
Lightning2019ARMv8.4‑A[50]8-wide16Yes560"big" (In Apple A13 paired with "LITTLE" Thunder
cores)
113*128b7NoNo128 + 128[51]8 MiBNo2x Lightning?2.65 GHz
ThunderARMv8.4‑A[50]3-wide12YesLITTLE57NoNo96 + 48[52]4 MiBNo4x Thunder?1.8 GHz
Firestorm2020ARMv8.4-A[53]8-wide[54]Yes630[55]"big" (In Apple A14 and Apple M1/M1 Pro/M1 Max/M1 Ultra paired with "LITTLE" Icestorm
cores)
144*128b5No192 + 1288 MiB (A14)
12 MiB (M1)
24 MiB (M1 Pro/M1 Max)
48 MiB (M1 Ultra)
No2x Firestorm (A14)
4x Firestorm (M1)

6x or 8x Firestorm (M1 Pro)
8x Firestorm (M1 Max)
16x Firestorm (M1 Ultra)

?3.0–3.23 GHz
IcestormARMv8.4-A[53]4-wideYes110LITTLE72*128b5No128 + 644 MiB
8 MiB (M1 Ultra)
No4x Icestorm (A14/M1)
2x Icestorm (M1 Pro/Max)
4x Icestorm (M1 Ultra)
?1.82–2.06 GHz
Avalanche2021ARMv8.6‑A[53]8-wideYes"big" (In Apple A15 and Apple M2/M2 Pro/M2 Max/M2 Ultra paired with "LITTLE" Blizzard
cores)
144*128b5No192 + 12812 MiB (A15)
16 MiB (M2)
32 MiB (M2 Pro/M2 Max)
64 MiB (M2 Ultra)
No2x Avalanche (A15)
4x Avalanche (M2)
6x or 8x Avalanche (M2 Pro)

8x Avalanche (M2 Max)
16x Avalanche (M2 Ultra)

?2.93–3.49 GHz
BlizzardARMv8.6‑A[53]4-wideYesLITTLE82*128b5No128 + 644 MiB
8 MiB (M2 Ultra)
No4x Blizzard?2.02–2.42 GHz
Everest2022ARMv8.6‑A[53]8-wideYes"big" (In Apple A16 paired with "LITTLE" Sawtooth
cores)
144*128b5No192 + 12816 MiBNo2x Everest?3.46 GHz
SawtoothARMv8.6‑A[53]4-wideYesLITTLE82*128b5No128 + 644 MiBNo4x Sawtooth?2.02 GHz
NvidiaDenver[56][57]2014ARMv8‑A2-wide hardware
decoder, up to
7-wide variable-
length VLIW
micro-ops
13Not if the hardware
decoder is in use.
Can be provided
by dynamic software
translation into VLIW.
Direct+
Indirect branch
prediction
No728NoNo128 + 642 MiBNo2??
Denver 2[58]2016ARMv8‑A?13Not if the hardware
decoder is in use.
Can be provided
by dynamic software
translation into VLIW.
Direct+
Indirect branch
prediction
"Super" Nvidia's own implementation?16NoNo128 + 642 MiBNo2??
Carmel2018ARMv8.2‑A?Direct+
Indirect branch
prediction
?12NoNo128 + 642 MiB(4 MiB @ 8 cores)2 (+ 8)6.5-7.4?
CaviumThunderX[59][60]2014ARMv8-A2-wide9[60]Yes[59]Two-level?28NoNo78 + 32[61][62]16 MiB[61][62]No8–16, 24–48??
ThunderX2
[63](ex. Broadcom Vulcan[64])
2018[65]ARMv8.1-A
[66]
4-wide
"4 μops"[67][68]
?Yes[69]Multi-level??16[70]SMT4No32 + 32
(data 8-way)
256 KiB
per core[71]
1 MiB
per core[71]
16–32[71]??
MarvellThunderX32020[72]ARMv8.3+[72]8-wide?Yes
4-wide dispatch
Multi-level?77[72]SMT4[72]?64 + 32512 KiB
per core
90 MiB60??
Applied

Micro

Helix2014???????40 / 28NoNo32 + 32 (per core;
write-through
w/parity)[73]
256 KiB shared
per core pair (with ECC)
1 MiB/core2, 4, 8??
X-Gene2013?4-wide15Yes???40[74]NoNo8 MiB84.2?
X-Gene 22015?4-wide15Yes???28[75]NoNo8 MiB84.2?
X-Gene 3[75]2017???????16NoNo??32 MiB32??
QualcommKryo2015ARMv8-A??YesTwo-level?"big" or "LITTLE"
Qualcomm's own similar implementation
?14[76]NoNo32+24[77]0.5–1 MiB2+26.3?
Kryo 2002016ARMv8-A2-wide11–12Yes
7-wide dispatch
Two-levelbig714 / 11 / 10 / 6[78]NoNo64 + 32/64?512 KiB/Gold CoreNo4?1.8–2.45 GHz
2-wide8No0Conditional+
Indirect branch
prediction
LITTLE28–64? + 8–64?256 KiB/Silver Core4?1.8–1.9 GHz
Kryo 3002017ARMv8.2-A3-wide11–13Yes
8-wide dispatch
Two-levelbig810[78]NoNo64+64[78]256 KiB/Gold Core2 MiB2, 4?2.0–2.95 GHz
2-wide8No0Conditional+
Indirect branch
prediction
LITTLE2816–64? + 16–64?128 KiB/Silver4, 6?1.7–1.8 GHz
Kryo 4002018ARMv8.2-A4-wide11–13Yes
8-wide dispatch
Yesbig811 / 8 / 7NoNo64 + 64512 KiB/Gold Prime

256 KiB/Gold

2 MiB2, 1+1, 4, 1+3?2.0–2.96 GHz
2-wide8No0Conditional+
Indirect branch
prediction
LITTLE216–64? + 16–64?128 KiB/Silver4, 6?1.7–1.8 GHz
Kryo 5002019ARMv8.2-A4-wide11–13Yes
8-wide dispatch
Yesbig8 / 7No?512 KiB/Gold Prime

256 KiB/Gold

3 MiB2, 1+3?2.0–3.2 GHz
2-wide8No0Conditional+
Indirect branch
prediction
LITTLE2?128 KiB/Silver4, 6?1.7–1.8 GHz
Kryo 6002020ARMv8.4-A4-wide11–13Yes
8-wide dispatch
Yesbig6 / 5No?64 + 641024 KiB/Gold Prime

512 KiB/Gold

4 MiB2, 1+3?2.2–3.0 GHz
2-wide8No0Conditional+
Indirect branch
prediction
LITTLE2?128 KiB/Silver4, 6?1.7–1.8 GHz
Falkor[79][80]2017[81]"ARMv8.1-A features";[80] AArch64 only (not 32-bit)[80]4-wide10–15Yes
8-wide dispatch
Yes?810No24 KiB88[80] + 32500KiB1.25MiB40–48??
SamsungM1[82][83]2016ARMv8-A4-wide13[84]Yes
9-wide dispatch[85]
96big814NoNo64 + 322 MiB[86]No4?2.6 GHz
M2[82][83]2017ARMv8-A4-wide100Two-levelbig10NoNo64 + 642 MiBNo4?2.3 GHz
M3[84][87]2018ARMv8.2-A6-wide15Yes
12-wide dispatch
228Two-levelbig1210NoNo64 + 64512 KiB per core4096KB4?2.7 GHz
M4[88]2019ARMv8.2-A6-wide15Yes
12-wide dispatch
228Two-levelbig128 / 7NoNo64 + 64512 KiB per core3072KB2?2.73 GHz
M5[89]2020ARMv8.2-A6-wideYes
12-wide dispatch
228Two-levelbig7NoNo64 + 64512 KiB per core3072KB2?2.73 GHz
FujitsuA64FX[90][91]2019ARMv8.2-A4/2-wide7+Yes
5-way?
Yesn/a8+2*512b[92]7NoNo64 + 648MiB per 12+1 coresNo48+4?1.9 GHz+
HiSiliconTaiShan V110[93]2019ARMv8.2-A4-wide?Yesn/a87NoNo64 + 64512 KiB per core1 MiB per core???
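
The "Speed per core" column is normalized per MHz, so an absolute Dhrystone figure depends on the clock a given chip actually runs at. The Python sketch below shows the arithmetic (DMIPS ≈ DMIPS/MHz × clock in MHz) using three values from the table. The 2000 MHz clock is a hypothetical figure chosen only for illustration, and the function and dictionary names are assumptions. Since Dhrystone is a synthetic benchmark, the results are rough comparisons at best.

```python
# DMIPS/MHz figures copied from the "Speed per core" column of the table.
DMIPS_PER_MHZ = {
    "Cortex-A53": 2.24,
    "Cortex-A55": 2.65,
    "Cortex-A76": 6.4,
}


def estimate_dmips(core: str, clock_mhz: float) -> float:
    """Estimate single-core Dhrystone MIPS: DMIPS/MHz multiplied by the clock in MHz."""
    return DMIPS_PER_MHZ[core] * clock_mhz


if __name__ == "__main__":
    clock_mhz = 2000.0  # hypothetical clock, not taken from the table
    for core in DMIPS_PER_MHZ:
        estimate = estimate_dmips(core, clock_mhz)
        print(f"{core}: about {estimate:.0f} DMIPS at {clock_mhz:.0f} MHz")
```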

Notes

  1. As Dhrystone (implied in "DMIPS") is a synthetic benchmark developed in the 1980s, it is no longer representative of prevailing workloads; use with caution.

References