Why it matters: The CPU arms race is heating up, and American fabless semiconductor company Ampere Computing is also throwing its hat in the ring. The company has announced its roadmap for the coming year, and at the center of it all is the unveiling of a monstrous 256-core processor, the AmpereOne.

The AmpereOne is an Arm-compatible chip designed for cloud-native workloads, AI inferencing, databases, web servers, and media delivery. It aims to strike a balance between high performance and power efficiency.

The company has already commenced shipping its 192-core AmpereOne processors, introduced a year ago, which feature an 8-channel DDR5 memory subsystem. Later this year, the company plans to unveil an updated 192-core AmpereOne CPU with a beefier 12-channel DDR5 memory subsystem. This enhancement will necessitate launching an entirely new platform, laying the groundwork for the eventual transition to a 256-core variant next year.

Ampere claims that the 256-core chip will be fabricated on TSMC's cutting-edge 3nm process node and deliver a 40-percent performance boost compared to any other CPU currently on the market. The company has engineered several new features for efficient performance, memory management, caching, and AI compute capabilities.

Interestingly, the 256-core unit will utilize the same cooling solution as Ampere's existing offerings, implying a TDP of around 350 watts. That's an impressive feat, considering the sheer number of cores packed into the chip.
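To see why that power envelope is notable, a quick back-of-the-envelope calculation (using only the core count and the roughly 350-watt TDP cited above) shows the per-core power budget:

```python
# Rough per-core power estimate for the 256-core AmpereOne.
# Both figures come from the article; real chips vary power per core
# dynamically, so this is only an average, not a measured value.
tdp_watts = 350
cores = 256

watts_per_core = tdp_watts / cores
print(f"~{watts_per_core:.2f} W per core")  # ~1.37 W per core
```

At well under 2 watts per core on average, the chip sits far below the per-core power of typical high-frequency server parts, which is what makes reusing the existing cooling solution plausible.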

Ampere says adoption of the AmpereOne CPUs has been phenomenal. The company claims they outperform AMD's Genoa by 50 percent and Bergamo by 15 percent in performance per watt, and that data centers looking to consolidate and refresh aging infrastructure can expect up to 34 percent more performance per rack.
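The link between those two claims is that racks are power-limited: under a fixed rack power budget, performance per rack scales directly with performance per watt. A minimal sketch of that arithmetic, using Ampere's claimed 1.15x advantage over Bergamo and an assumed (illustrative, not from the article) 15 kW rack budget:

```python
# Under a fixed power budget, perf-per-rack scales with perf-per-watt.
# The 15 kW rack budget and normalized baseline are assumptions for
# illustration; the 1.15x ratio is Ampere's claimed edge over Bergamo.
rack_power_watts = 15_000            # assumed fixed rack power budget
baseline_perf_per_watt = 1.0         # baseline normalized to 1.0
ampere_perf_per_watt = 1.15          # claimed +15% vs. AMD Bergamo

baseline_rack_perf = rack_power_watts * baseline_perf_per_watt
ampere_rack_perf = rack_power_watts * ampere_perf_per_watt

gain = ampere_rack_perf / baseline_rack_perf - 1
print(f"Perf-per-rack gain: {gain:.0%}")  # Perf-per-rack gain: 15%
```

The same proportionality explains the larger 34-percent figure: against older, less efficient infrastructure, the perf-per-watt gap (and hence the per-rack gain) is wider.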

Ampere is also developing AI inference servers powered by semiconductor giant Qualcomm's Cloud AI 100 accelerators. The partnership aims to tackle the demanding workloads of large language models and generative AI applications.

Tests have shown Meta's Llama 3 language model running on Ampere CPUs at Oracle Cloud. Notably, the performance data shows that running Llama 3 on a 128-core Ampere Altra CPU, with no dedicated GPU, delivers the same performance as an Nvidia A10 GPU paired with an x86 CPU, all while consuming a third of the power.

Finally, Ampere has formed a UCIe (Universal Chiplet Interconnect Express) working group as part of the AI Platform Alliance. This move aims to leverage the flexibility of Ampere's CPUs by enabling the integration of customer IP into future processors using the open UCIe interface.