Tuesday, May 31, 2022

New ARM Interconnect Technologies

 

ARM Interconnect Technologies

It’s been six years since ARM released the interconnect technology that supports its low-power chips. The two key parts of that backplane technology are the CoreLink CMN-600 Coherent Mesh Network interconnect and the CoreLink DMC-620 Dynamic Memory Controller. With this backplane, systems-on-a-chip (SoCs) based on the 64-bit ARMv8-A architecture gain the high data throughput and low latency that are crucial to current and emerging workloads in an increasingly cloud-centric world.


The Coherent Mesh Network (CMN) 600, introduced in late 2016, is now reaching its limits in some recent server processors. Amazon’s AWS Graviton2 and Ampere Computing’s Altra server processors both use the CMN-600 as their underlying coherent interconnect network. Alongside the launch of its new Neoverse N2 and V1 server CPUs, Arm also launched the direct successor to the CMN-600: the CMN-700.

The new CMN-700 mesh network specifically targets the infrastructure market and therefore incorporates features that are critical for products such as server SoCs. The CMN-700 is Arm’s third-generation coherent interconnect IP. Note that Arm also offers other interconnect IPs, such as the NIC-400/450, which provide SoC connectivity but are non-coherent. The goal of the CMN-700 is to connect CPU cores to one another as well as to accelerators, I/O, cache, and memory. It was designed for higher bandwidth and lower latency, taking advantage of upcoming I/O and memory interfaces.

The CMN-700 was designed primarily for large core-count SoCs in the server market. The newly launched CI-700 is a similar interconnect that is better suited to the client market.


Fig.1: New Arm Interconnect Technologies

ARM’s previous on-chip interconnect technology delivered the scalability, performance and efficiency demanded across multiple markets, including 5G networks, data center infrastructure, HPC, automotive and industrial systems. As mentioned earlier, the ARM CoreLink CMN-600 Coherent Mesh Network interconnect and CoreLink DMC-620 Dynamic Memory Controller enable ARM-based SoCs to offer what ARM described as unmatched data throughput and the lowest edge-to-cloud latency in the market.

Optimized for ARM Cortex-A processors, CoreLink CMN-600 and CoreLink DMC-620 were positioned by ARM as the industry’s only complete coherent backplane IP solution for the ARMv8-A architecture. Designers and system architects can scale high-performance SoC designs from 1 to 128 Cortex-A CPUs (32 clusters) with native ARM AMBA 5 CHI interfaces, the industry-standard specification for high-performance on-chip communication.

“The demands of cloud-based business models require service providers to pack more efficient computational capability into their infrastructure,” said Monika Biddulph, general manager, systems and software group, ARM. “Our new CoreLink system IP for SoCs, based on the ARMv8-A architecture, delivers the flexibility to seamlessly integrate heterogeneous computing and acceleration to achieve the best balance of compute density and workload optimization within fixed power and space constraints.”

The combination of performance and efficiency provided by the third-generation CoreLink coherent backplane products advances the Intelligent Flexible Cloud by enabling efficient compute capability at any point from the edge of the network to the cloud.

Platform capability                                     CMN-600         CMN-700         Uplift
# cores supported per die / system                      64 / 128        256 / 512       4x
System Level Cache (SLC) size per die                   128MB           512MB           4x
Nodes (cross points) per die                            64 (8x8)        144 (12x12)     2.25x
Devices per node (e.g., CPUs, SLC)                      2               3-5             2.5x
CHI / (AXI, CXS) data path link widths                  256b / (256b)   256b / (512b)   2x
# memory device ports (e.g., DRAM, HBM) per die         16              40              2.5x
CCIX device ports per die                               4               32              8x
CXL accelerator/memory attach support                   No              Yes             New
MPAM memory and SLC monitoring & partitioning           No              Yes             New
Memory access protection with Memory Tagging Extension  No              Yes             New
CBusy and interconnect hot-spot re-routing support      No              Yes             New


Table 1: Comparison between CMN-600 and CMN-700 platform capabilities
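The node counts in Table 1 hint at why a bigger mesh affects latency as well as capacity. The sketch below is only a back-of-the-envelope model, not a description of how CMN actually routes traffic: it assumes a plain n x n mesh with dimension-ordered (XY) routing and uniformly random source and destination crosspoints, and computes the average hop count for the 8x8 and 12x12 sizes. The function names are illustrative.

```python
from itertools import product


def average_hops(n: int) -> float:
    """Mean XY-routing (Manhattan) hop count between two uniformly random
    crosspoints of an n x n mesh, counting source == destination pairs."""
    nodes = list(product(range(n), repeat=2))
    total = sum(abs(x1 - x2) + abs(y1 - y2)
                for (x1, y1), (x2, y2) in product(nodes, repeat=2))
    return total / len(nodes) ** 2


def average_hops_closed_form(n: int) -> float:
    # Per dimension, E|a - b| = (n^2 - 1) / (3n) for a, b uniform on 0..n-1,
    # and XY routing adds the two dimensions together.
    return 2 * (n * n - 1) / (3 * n)


for name, n in (("CMN-600", 8), ("CMN-700", 12)):
    print(f"{name}: {n}x{n} mesh, {n * n} crosspoints, "
          f"average {average_hops(n):.2f} hops")
```

Under these assumptions the average path grows from 5.25 to about 7.94 hops, so a mesh with 2.25x the crosspoints raises the average distance by roughly 1.5x.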

In 2021, Arm introduced a complete portfolio of IPs for the mobile market, including a new little Armv9 CPU, a new big Armv9 CPU, a new flagship-performance Armv9 CPU, new Mali GPUs, and even a new DSU. The last pieces needed to tie everything together are a coherent interconnect IP and a more comprehensive SoC transport interconnect. That is where the new CoreLink CI-700 and NI-700 come into play.

 

CI-700

The CoreLink CI-700 coherent interconnect is based on the recently launched CMN-700 enterprise-grade mesh network. Unlike the CMN-700, however, the CI-700 is a custom variant tailored for client devices, with additional efficiency optimizations for the mobile consumer market. The CI-700 is a fully coherent interconnect that supports up to eight DSUs and up to 24 AMBA ACE-Lite or AXI managers, such as accelerators or DMA devices. It also supports up to eight memory interfaces, which can be either CHI or ACE-Lite, and up to four ACE-Lite interfaces for peripherals.
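The configuration limits above lend themselves to a simple sanity check. The sketch below is a hypothetical illustration, not Arm's configuration tooling or file format: `InterconnectConfig`, `CI700_LIMITS` and `check_ci700_config` are made-up names, the upper bounds come from the paragraph above, and the minimum of one DSU cluster comes from the 1-8 coherency cluster range mentioned later in this post.

```python
from dataclasses import dataclass

# (minimum, maximum) per resource; maxima as quoted for the CoreLink CI-700.
CI700_LIMITS = {
    "dsu_clusters": (1, 8),
    "acelite_axi_managers": (0, 24),
    "memory_interfaces": (0, 8),
    "peripheral_acelite_ports": (0, 4),
}


@dataclass
class InterconnectConfig:
    """Hypothetical description of a CI-700 configuration (illustration only)."""
    dsu_clusters: int
    acelite_axi_managers: int
    memory_interfaces: int
    peripheral_acelite_ports: int


def check_ci700_config(cfg: InterconnectConfig) -> list[str]:
    """Return a list of limit violations; an empty list means the config fits."""
    problems = []
    for field_name, (low, high) in CI700_LIMITS.items():
        value = getattr(cfg, field_name)
        if not low <= value <= high:
            problems.append(f"{field_name}={value} outside {low}..{high}")
    return problems


phone_soc = InterconnectConfig(dsu_clusters=1, acelite_axi_managers=12,
                               memory_interfaces=4, peripheral_acelite_ports=2)
print(check_ci700_config(phone_soc))  # []
```

Returning a list of problems rather than raising on the first one lets a designer see every violated limit in a single pass.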

The new CI-700 implements a system-level cache (SLC) with a snoop filter, which helps reduce power and improve performance. The SLC is exclusive with respect to the DSU clusters' caches, so its capacity is effectively added on top of the DSU capacity. It is also a true system-level cache, capable of caching memory transactions not only from the CPUs but also from the GPU, any other interconnected accelerator, and other high-bandwidth devices. The SLC supports MPAM cache partitioning, a feature that helps make performance predictable by reserving certain cache capacity for certain devices or address spaces. For example, to prevent the GPU from consuming the entire cache, MPAM can reserve a certain capacity for the CPUs, so that no single device can starve the others of system resources.
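MPAM's cache partitioning can be pictured as giving each partition a bitmap of cache portions it may allocate into. The following is a toy software model of that idea only: the SLC size, the number of portions, the partition names and the bitmaps are all hypothetical, and the code does not reflect the actual MPAM register interface or the CI-700's implementation.

```python
SLC_BYTES = 8 * 1024 * 1024   # hypothetical 8 MiB system-level cache
PORTIONS = 16                 # hypothetical number of equally sized cache portions
PORTION_BYTES = SLC_BYTES // PORTIONS

# Hypothetical cache-portion bitmaps per partition: bit i set means the
# partition may allocate new lines into portion i.
PORTION_BITMAPS = {
    "cpu": 0x00FF,  # portions 0-7 reserved for CPU traffic
    "gpu": 0xFF00,  # portions 8-15 for GPU traffic
}


def may_allocate(partition: str, portion: int) -> bool:
    """True if `partition` is allowed to allocate into cache `portion`."""
    return bool((PORTION_BITMAPS[partition] >> portion) & 1)


def capacity_bytes(partition: str) -> int:
    """Maximum SLC capacity the partition can occupy under its bitmap."""
    allowed = PORTION_BITMAPS[partition] & ((1 << PORTIONS) - 1)
    return bin(allowed).count("1") * PORTION_BYTES


# The bitmaps do not overlap, so in this model GPU allocations can never
# evict lines from the portions reserved for the CPUs.
assert PORTION_BITMAPS["cpu"] & PORTION_BITMAPS["gpu"] == 0
print(capacity_bytes("cpu") // 1024, "KiB reserved for the CPUs")
```

Overlapping bitmaps would instead give partitions a shared pool, trading isolation for utilization.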

The new CI-700 is designed to run at around 1 GHz, and at up to 2 GHz in high-performance implementations.
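To put that clock range in perspective, the peak bandwidth of a single data path is bytes per cycle multiplied by cycles per second. The sketch assumes a 256-bit data path, in line with the CHI link width listed for the CMN-600 in Table 1 (the CI-700's own widths are not given here), and ignores protocol overhead and contention, so the results are theoretical upper bounds. The function name is illustrative.

```python
def peak_link_bandwidth_gbps(width_bits: int, freq_ghz: float) -> float:
    """Theoretical peak of one data path in GB/s (1 GB = 10^9 bytes):
    bytes transferred per cycle times billions of cycles per second."""
    return width_bits / 8 * freq_ghz


# Assumed 256-bit path at the two clock points mentioned above.
for freq in (1.0, 2.0):
    print(f"{freq:.0f} GHz: {peak_link_bandwidth_gbps(256, freq):.0f} GB/s")
```

Doubling the clock doubles the peak, which is why the 2 GHz implementations target the most bandwidth-hungry devices.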


Fig.2: CoreLink CI-700 Coherent Interconnect


· CoreLink CI-700 Coherent Interconnect

    CoreLink CI-700 is a configurable coherent interconnect designed together with Armv9 Cortex processors and the latest Arm technologies to enable fully optimized Total Compute solutions. It scales across the premium, performance and efficiency tiers of the Total Compute solutions, which offer different levels of performance, efficiency and scalability to deliver specialized compute across multiple consumer device markets. This scalability means CoreLink CI-700 can support low-power implementations at 1 GHz right up to high-performance implementations at up to 2 GHz on 5 nm processes.


Features

· Supporting Total Compute: The three key aims of the Total Compute strategy are enhanced compute performance, security, and developer access to more performant software and tools. CoreLink CI-700 and CoreLink NI-700 provide benefits across all three areas. The system improvements provide low latency for enhanced compute performance and high memory bandwidth for more advanced use cases. Both also provide stronger security protections across the entire system through new architectural security features, such as Memory Tagging Extensions (MTE). Finally, flexible and faster configuration, enabled by advanced design and verification tooling, delivers a much faster time to market for Arm's partners.

· Empowering use cases: CoreLink CI-700 is designed to meet the requirements of a wide range of use cases and consumer devices, from High Dynamic Range (HDR) and high-frame-rate video on DTVs right through to AAA gaming on premium mobile devices. Compute-intensive applications are supported by CoreLink CI-700’s high-performance AMBA CHI mesh interconnect technology, which allows the coherent interconnect to support 1-8 coherency clusters over the AMBA CHI interface. This aligns with the new DynamIQ Shared Unit-110 (DSU-110), which binds together different Armv9 CPU cores within a CPU cluster.


· Power and bandwidth reductions through system level cache: Alongside performance, CoreLink CI-700 offers a fully coherent system level cache (SLC) for bandwidth and system power reductions. The SLC reduces average memory latency and system power because fewer transactions go to external memory. It is an exclusive cache, so its resources add to those in the Armv9 CPU clusters. Moreover, the SLC can be shared with GPUs and other accelerators. Supporting the SLC, Memory System Resource Partitioning and Monitoring (MPAM) controls how SLC resources are allocated and increases predictability within the system.
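The latency benefit can be sketched with the usual average-memory-access-time formula, applied to requests that have already missed in the CPU caches: AMAT = t_SLC + (1 - h) x t_DRAM, where h is the SLC hit rate. All latencies and the hit rate below are hypothetical round numbers chosen for illustration, not CI-700 measurements.

```python
def amat_ns(slc_hit_ns: float, dram_ns: float, slc_hit_rate: float) -> float:
    """Average latency of a request that reaches the interconnect.

    Every request pays the SLC lookup; only misses continue to DRAM."""
    if not 0.0 <= slc_hit_rate <= 1.0:
        raise ValueError("hit rate must be between 0 and 1")
    return slc_hit_ns + (1.0 - slc_hit_rate) * dram_ns


def dram_transactions(requests: int, slc_hit_rate: float) -> int:
    """External memory transactions left after the SLC filters out hits."""
    return round(requests * (1.0 - slc_hit_rate))


# Hypothetical numbers: 10 ns SLC lookup, 80 ns DRAM access, 50% SLC hit rate.
without_slc = 80.0
with_slc = amat_ns(slc_hit_ns=10.0, dram_ns=80.0, slc_hit_rate=0.5)
print(f"average latency: {without_slc:.0f} ns -> {with_slc:.0f} ns")
print("DRAM transactions per million requests:", dram_transactions(1_000_000, 0.5))
```

The same arithmetic shows the limit of the approach: with a low hit rate the extra lookup can outweigh the savings (at h = 0.1 this model gives about 82 ns), while every hit is also a DRAM transaction, and its energy, that never happens.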


· Improved system security with Memory Tagging Extensions (MTE): Security is a fundamental pillar of the Arm Total Compute strategy. This means incorporating security features designed to improve resilience to attacks and stop vulnerabilities at the source before they cause harm. Arm’s Armv9 Cortex CPUs have adopted MTE technology, which makes detecting memory safety violations across the entire system much easier and more efficient.
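MTE works on a lock-and-key principle: memory is tagged in 16-byte granules with 4-bit allocation tags, and each pointer carries a matching 4-bit logical tag in its top byte (bits 59:56). The Python model below only mimics that check in software, to show why use-after-free and out-of-bounds accesses get caught. The `TaggedHeap` class and its bump allocator are hypothetical and bear no relation to how MTE is programmed on real hardware.

```python
import random

GRANULE = 16                    # MTE tags memory in 16-byte granules
TAG_SHIFT = 56                  # the pointer's logical tag sits in bits [59:56]
ADDR_MASK = (1 << TAG_SHIFT) - 1


class TaggedHeap:
    """Toy software model of MTE tag checking (hypothetical, for illustration)."""

    def __init__(self, size: int, seed: int = 0):
        self.tags = [0] * (size // GRANULE)  # allocation tag ("lock") per granule
        self.next_free = GRANULE             # simple bump allocator; skip granule 0
        self.rng = random.Random(seed)

    def _new_tag(self, avoid: int) -> int:
        # Random non-zero 4-bit tag that differs from `avoid`.
        return self.rng.choice([t for t in range(1, 16) if t != avoid])

    def malloc(self, nbytes: int) -> int:
        start = self.next_free
        first = start // GRANULE
        count = -(-nbytes // GRANULE)        # round up to whole granules
        tag = self._new_tag(avoid=self.tags[first - 1])  # differ from left neighbour
        for g in range(first, first + count):
            self.tags[g] = tag
        self.next_free = start + count * GRANULE
        return start | (tag << TAG_SHIFT)    # the pointer carries the "key"

    def free(self, ptr: int, nbytes: int) -> None:
        first = (ptr & ADDR_MASK) // GRANULE
        count = -(-nbytes // GRANULE)
        new_tag = self._new_tag(avoid=self.tags[first])  # stale pointers now mismatch
        for g in range(first, first + count):
            self.tags[g] = new_tag

    def check(self, ptr: int) -> bool:
        """Would a load or store through `ptr` pass the tag check?"""
        key = (ptr >> TAG_SHIFT) & 0xF
        return self.tags[(ptr & ADDR_MASK) // GRANULE] == key


heap = TaggedHeap(1024)
a = heap.malloc(32)
b = heap.malloc(32)
print(heap.check(a + 16))  # True: still inside a
print(heap.check(a + 32))  # False: runs into b's granules
heap.free(a, 32)
print(heap.check(a))       # False: use after free
```

With only 16 tag values, real MTE detection is probabilistic unless the allocator deliberately picks distinct tags for neighbouring allocations, as this model does.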



NI-700

The NI-700 is a new, flexible, packetized network-on-chip interconnect for both high-bandwidth accelerators and the rest of the SoC's connectivity, such as peripherals. It is applicable to almost every market and can be used with the CI-700, with the CMN-700, or on its own. The NI-700 consists of a network of routers connected to interfaces by links that run between them.
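The router-and-link structure can be modelled as a graph. In this toy model the topology is hypothetical (four routers in a ring, with CPU, GPU, DRAM and peripheral interfaces each attached to one router), and a breadth-first search finds a fewest-hop route. It illustrates the topology idea only and does not reflect how NI-700 routes are actually configured.

```python
from collections import deque

# Hypothetical topology: routers R0-R3 in a ring; each interface hangs off one router.
EDGES = [
    ("R0", "R1"), ("R1", "R2"), ("R2", "R3"), ("R3", "R0"),
    ("CPU", "R0"), ("GPU", "R1"), ("DRAM", "R2"), ("PERIPH", "R3"),
]


def build_graph(edges):
    graph: dict[str, list[str]] = {}
    for a, b in edges:  # links are bidirectional
        graph.setdefault(a, []).append(b)
        graph.setdefault(b, []).append(a)
    return graph


def route(graph, src, dst):
    """Breadth-first search: a fewest-hop path from src to dst, or None."""
    prev = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:  # walk back to the source
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nxt in graph[node]:
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    return None


noc = build_graph(EDGES)
print(route(noc, "CPU", "PERIPH"))  # ['CPU', 'R0', 'R3', 'PERIPH']
```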

On the NI-700, all transactions from AMBA CHI or AXI are converted to a packetized format, which reduces the wire count by 30% on average. This also reduces routing congestion, which eases physical design. The NI-700 supports multiple clock and power domains, is designed to be readily implementable on modern processes at up to around 1 GHz, and supports the AMBA standards along with the recent security and reliability features they offer.
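Packetization can be sketched as serializing a transaction's fields into fixed-width flits and reassembling them at the far end. The field layout, the field widths and the 64-bit flit size below are hypothetical, not NI-700's packet format; the point is only the trade-off: a 180-bit transaction that would need 180 parallel wires to move in one cycle crosses a 64-wire link in three beats.

```python
FLIT_BITS = 64  # hypothetical width of the packetized link

# Hypothetical write-request layout as (field name, width in bits).
FIELDS = [("opcode", 4), ("txn_id", 8), ("addr", 40), ("data", 128)]


def packetize(txn: dict[str, int]) -> list[int]:
    """Concatenate the fields into one bit string and cut it into flits."""
    packed, offset = 0, 0
    for name, width in FIELDS:
        value = txn[name]
        if value < 0 or value >> width:
            raise ValueError(f"{name} does not fit in {width} bits")
        packed |= value << offset
        offset += width
    n_flits = -(-offset // FLIT_BITS)  # ceiling division
    mask = (1 << FLIT_BITS) - 1
    return [(packed >> (i * FLIT_BITS)) & mask for i in range(n_flits)]


def depacketize(flits: list[int]) -> dict[str, int]:
    """Reassemble the flits and slice the fields back out at the receiver."""
    packed = 0
    for i, flit in enumerate(flits):
        packed |= flit << (i * FLIT_BITS)
    txn, offset = {}, 0
    for name, width in FIELDS:
        txn[name] = (packed >> offset) & ((1 << width) - 1)
        offset += width
    return txn


request = {"opcode": 0x3, "txn_id": 0x5A, "addr": 0x12_3456_7890, "data": 0xDEADBEEF}
flits = packetize(request)
print(len(flits), "flits")  # 3 flits for 180 bits on a 64-bit link
assert depacketize(flits) == request
```

Fewer wires means more beats per transaction, so a real design balances link width against the latency and bandwidth each path needs.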


· CoreLink NI-700 Network-on-Chip Interconnect

  CoreLink NI-700 is a flexible packetized network-on-chip interconnect for high-bandwidth accelerators, such as GPUs and NPUs, as well as rest-of-SoC connectivity. Packetization reduces wiring by around 30 percent, easing physical design. The network-on-chip (NoC) interconnect also adopts the latest Arm architecture features and AMBA interface standards, improving performance, reliability, and virtualization. Moreover, advanced tooling support enables faster design, configuration, and implementation of complex SoCs, with improved system performance and reduced routing congestion and area.

CoreLink NI-700 is also highly configurable and scalable across different use cases and devices. It not only targets consumer and mobile devices, but can also be implemented in SoC solutions for markets ranging from premium IoT devices to enterprise compute.


Features

· Integrated Device Management: A new capability that CoreLink NI-700 introduces is Integrated Device Management (IDM). IDM detects a peripheral that causes a timeout, isolates it from the rest of the system, and then stabilizes the system by completing any incomplete AMBA transaction. Finally, a software handler can recover the device by, for example, soft-resetting it or powering it up if it was unpowered. This increases uptime by overcoming issues without rebooting the entire device, which could, for example, significantly reduce how often a user needs to reboot a Wi-Fi router or set-top box. CoreLink NI-700 also retains the Quality of Service (QoS) features of previous Arm interconnect products: virtual channels for non-blocking arbitration and reduced wiring, as well as QoS regulators, which help achieve bandwidth and latency targets across the system.
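The detect, isolate, complete and recover sequence described above can be sketched as a small simulation. Everything here is hypothetical and for illustration only: the `Peripheral` fields, the cycle counts, the `soft_reset` handler, and the "OKAY"/"ERROR" strings, which are placeholders rather than real AMBA response encodings.

```python
from dataclasses import dataclass
from typing import Optional

DEFAULT_LATENCY = 4  # cycles a healthy device needs to respond (hypothetical)


@dataclass
class Peripheral:
    name: str
    latency_cycles: Optional[int] = DEFAULT_LATENCY  # None models a hung device
    isolated: bool = False
    resets: int = 0


def soft_reset(dev: Peripheral) -> None:
    """Hypothetical software recovery handler: soft-reset and reconnect the device."""
    dev.resets += 1
    dev.latency_cycles = DEFAULT_LATENCY
    dev.isolated = False


def access(dev: Peripheral, timeout_cycles: int, log: list[str]) -> str:
    """Issue one transaction and return the response the requester sees."""
    if dev.isolated:
        return "ERROR"  # fenced off: answered immediately instead of hanging
    if dev.latency_cycles is not None and dev.latency_cycles <= timeout_cycles:
        return "OKAY"
    # 1. Detect: no response within the timeout window.
    log.append(f"timeout: {dev.name}")
    # 2. Isolate the device from the rest of the system.
    dev.isolated = True
    # 3. Stabilize: complete the outstanding transaction with an error response.
    log.append(f"isolated {dev.name}, completed transaction with ERROR")
    # 4. Recover in software instead of rebooting the whole system.
    soft_reset(dev)
    log.append(f"recovered {dev.name} after soft reset")
    return "ERROR"


log: list[str] = []
wifi = Peripheral("wifi", latency_cycles=None)       # hangs on access
print(access(wifi, timeout_cycles=100, log=log))     # ERROR
print(access(wifi, timeout_cycles=100, log=log))     # OKAY after recovery
```

The key property is that the requester always gets a response, so one faulty peripheral cannot stall the rest of the SoC.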


· Advanced design and verification tooling: CoreLink CI-700 and CoreLink NI-700 support advanced design and verification tooling. These tools simplify implementation and provide a quicker time-to-market for partners, as well as better implementation results. They also enable quicker configuration of Arm IP within a system.

 

ARM Technologies and Their Significance

ARM's big.LITTLE design is a heterogeneous multi-processing approach that combines more than one type of processor core in a single system. ARM-based systems support several software models, including AMP (asymmetric multiprocessing), SMP (symmetric multiprocessing) and HMP (heterogeneous multiprocessing) designs.
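In the HMP model, the operating system's scheduler can run any task on either core type and migrates tasks between big and LITTLE cores based on load. Below is a minimal sketch of such a policy, assuming a simple pair of up/down thresholds with hysteresis; the threshold values and function names are hypothetical, and real schedulers (such as Linux's Energy Aware Scheduling) use much richer signals.

```python
UP_THRESHOLD = 0.80    # hypothetical: migrate to a big core above 80% load
DOWN_THRESHOLD = 0.30  # hypothetical: migrate back to a LITTLE core below 30% load


def next_cluster(load: float, current: str) -> str:
    """Pick 'big' or 'LITTLE' for a task from its recent load (0.0-1.0)."""
    if current == "LITTLE" and load > UP_THRESHOLD:
        return "big"
    if current == "big" and load < DOWN_THRESHOLD:
        return "LITTLE"
    return current  # inside the hysteresis band: stay put to avoid ping-ponging


def trace(loads: list[float], start: str = "LITTLE") -> list[str]:
    """Placement decision for each successive load sample."""
    placement, cluster = [], start
    for load in loads:
        cluster = next_cluster(load, cluster)
        placement.append(cluster)
    return placement


print(trace([0.1, 0.5, 0.9, 0.6, 0.2]))
# ['LITTLE', 'LITTLE', 'big', 'big', 'LITTLE']
```

The gap between the two thresholds keeps a task whose load hovers around a single cut-off from bouncing between clusters on every sample.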

We encounter many embedded systems in everyday life, from smartphones and tablets to computers, medical devices and other electronic gadgets that provide high computing capability. These systems must handle diverse compute requirements and workloads, and they are not industry-specific; they span several markets. In the 1980s, Acorn Computers, based in Cambridge, England, developed the first ARM processor for commercial purposes. ARM processors were later enhanced to provide high performance and efficient power management without compromising overall system efficiency.


Why Did ARM Technology and Processors Become Popular?

    ARM Holdings is a leading company founded in 1990. It offers a family of reduced instruction set computer (RISC) architectures designed specifically to form the cores of processors. These core designs are licensed to silicon companies, which can incorporate the processor core into their IC designs in an efficient, affordable and secure way.


      ARM-enabled AMP-, SMP- and HMP-based designs aid the creation of devices for all types of applications, backed by a complete toolkit and a strong global ecosystem. The architecture provides silicon companies with a set of rules that describe how the hardware behaves when an instruction is executed. It is used in CPUs that run application software, in platform security features that help secure the rapidly growing number of connected devices, and in embedded systems, helping the ecosystem design secure and efficient systems as easily as possible.

      ARM’s comprehensive product offering includes 32- and 64-bit RISC microprocessors, graphics processors, enabling software, cell libraries, embedded memories, high-speed connectivity products, peripherals, and development tools. Because of their low power consumption and high performance, ARM processors are used in a large share of modern devices. They have gone through several iterations to increase performance and improve power efficiency. This combination of high performance, low power consumption, a wide product offering, and low cost has made ARM processors popular, and they often deliver better performance per watt than competing processors. ARM's tools and ecosystem also make application development quick and efficient, which has helped ARM gain popularity across all kinds of applications. Here are a few of the advantages of ARM processors and their big.LITTLE design that have made them popular in modern electronics:

· They offer a variety of software system models like AMP architecture based Designs, SMP architecture based Designs and HMP architecture based Designs

· They offer a cost advantage compared to other processors

· They are designed to consume less power, making them ideal for a wide variety of portable and battery-operated devices.

· Their RISC design uses simple instructions, most of which are designed to execute in a single clock cycle, which helps keep them fast and efficient

· The availability and applications support offered by ARM has also helped in popularizing the ARM processors


Conclusion

Both interconnect technologies are vital components of the new Total Compute solutions. CoreLink CI-700 and CoreLink NI-700 are highly configurable IP designed to enable the best possible solution performance. Their performance improvements and their power and bandwidth reductions across the system optimize key solution-level use cases, such as AAA gaming. Moreover, the interconnects offer greater security protections through hardware support for MTE, along with comprehensive design and verification tooling that speeds up the SoC implementation process. The result is a seamless system, with market-proven interconnects designed and validated together with the latest Armv9 CPU cores. Arm has said it will continue to invest in the best interconnect technologies, bringing its Total Compute vision of seamless and secure performance for tomorrow’s compute to life.

Reference -

  1. https://fuse.wikichip.org/news/5271/arm-launches-new-coherent-and-soc-interconnects-ci-700-ni-700/
  2. https://www.hpcwire.com/off-the-wire/arm-releases-new-interconnect-technology/
  3. https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/new-arm-interconnect
  4. https://www.eweek.com/networking/arm-unveils-new-corelink-interconnect/

Created by -  

Rohit Agrawal

Rutuja Jaykumar Rathi

Yashashree Shastri

Sameer Sumbhe

Vaishnav Suryawanshi

Syeda Zarah Aiman
