ARM
Interconnect Technologies
It’s been six years since ARM released
the interconnect technology that supports its low-power chips. The two
key parts of the backplane technology are the CoreLink CMN-600 Coherent Mesh
Network interconnect and CoreLink DMC-620 Dynamic Memory Controller. With the
new backplane, systems-on-a-chip (SoCs) based on the 64-bit ARMv8-A architecture
will have the high data throughput and low latency that are
crucial to current and emerging workloads in an increasingly cloud-centric
world.
The Coherent Mesh Network (CMN) 600,
which was introduced in late 2016, is finally reaching its limits with some of
the recent server processors. Amazon’s AWS Graviton2 and Ampere Computing
Altra server processors both use the CMN-600 as the underlying coherent
interconnect network. Along with the launch of the new N2 and V1 server CPUs,
Arm also launched the direct successor to the CMN-600, the CMN-700.
The new CMN-700 mesh network specifically
targets the infrastructure market and therefore incorporates features that
are critical for products such as server SoCs. The CMN-700 is now Arm’s
3rd-generation coherent interconnect IP. Note that Arm also has a number
of other IPs, such as the NIC-400/450, which provide SoC connectivity but are
non-coherent. The goal of the CMN-700 is to connect CPU cores to other CPU
cores as well as to accelerators, I/O, cache, and memory. The CMN-700
was designed with higher bandwidth and lower latency in mind, taking advantage
of upcoming I/O and memory interfaces.
The CMN-700 was designed primarily for
large core-count SoCs for the server market. The newly launched
CI-700 is a similar-purpose interconnect that better targets the client
market.
ARM’s previous on-chip interconnect technology delivers
the scalability, performance and efficiency demanded across multiple markets
including 5G networks, data center infrastructure, HPC, automotive and
industrial systems. As mentioned earlier, the ARM CoreLink CMN-600 Coherent Mesh
Network interconnect and CoreLink DMC-620 Dynamic Memory Controller enable the
latest ARM-based SoCs to offer unmatched data throughput and the lowest
edge-to-cloud latency in the market.
Optimized with the ARM Cortex-A
processors, CoreLink CMN-600 and CoreLink DMC-620 are the industry’s only
complete coherent backplane IP solution for the ARMv8-A architecture. Designers
and system architects can scale high-performance SoC designs from 1 to 128
Cortex-A CPUs (32 clusters) with native ARM AMBA 5 CHI interfaces, the industry
standard specification for high-performance on-chip communication.
“The demands of cloud-based business models require service
providers to pack more efficient computational capability into their
infrastructure,” said Monika Biddulph, general manager, systems and software
group, ARM. “Our new CoreLink system IP for SoCs, based on the ARMv8-A
architecture, delivers the flexibility to seamlessly integrate heterogeneous
computing and acceleration to achieve the best balance of compute density and
workload optimization within fixed power and space constraints.”
The combination of performance and efficiency provided by
the third-generation CoreLink coherent backplane products advances the
Intelligent Flexible Cloud by enabling efficient compute capability at any
point from the edge of the network to the cloud.
| Platform Capabilities | CMN-600 | CMN-700 | Uplift |
|---|---|---|---|
| # cores supported per die / system | 64 / 128 | 256 / 512 | 4x |
| System Level Cache (SLC) size per die | 64MB | 512MB | 4x |
| Nodes (cross points) per die | 64 (8x8) | 144 (12x12) | 2.25x |
| Devices per node (e.g., CPUs, SLC) | 2 | 3-5 | 2.5x |
| CHI / (AXI, CXS) data path link widths | 256b / (256b) | 2x256b / (512b) | 2x |
| # memory device ports (e.g., DRAM, HBM) per die | 16 | 40 | 2.5x |
| CCIX device ports per die | 4 | 32 | 8x |
| CXL accelerator/memory attach support | No | Yes | New |
| MPAM memory and SLC monitoring & partitioning | No | Yes | New |
| Memory access protection with Memory Tagging Extension | No | Yes | New |
| CBusy and interconnect hot-spot re-routing support | No | Yes | New |
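To give a feel for what the larger mesh in the comparison above implies, the following sketch computes the average number of hops between cross points in an 8x8 and a 12x12 mesh. It assumes simple dimension-ordered (XY) routing with one hop per link. That routing model is an illustrative assumption, not a description of how the CMN-600 or CMN-700 actually route traffic, and `average_hops` is a hypothetical helper.

```python
from itertools import product


def average_hops(width: int, height: int) -> float:
    """Mean hop count between all ordered pairs of distinct mesh nodes.

    Assumes XY routing, so the hop count equals the Manhattan distance.
    """
    nodes = list(product(range(width), range(height)))
    total_hops = 0
    pairs = 0
    for (x1, y1), (x2, y2) in product(nodes, repeat=2):
        if (x1, y1) == (x2, y2):
            continue  # skip a node talking to itself
        total_hops += abs(x1 - x2) + abs(y1 - y2)
        pairs += 1
    return total_hops / pairs


for side in (8, 12):
    hops = average_hops(side, side)
    print(f"{side}x{side} mesh: {side * side} nodes, average hops {hops:.2f}")
```

Under these assumptions the average distance grows with the mesh side length, which is one reason a larger mesh tends to be paired with wider links and congestion-aware features such as those listed in the table.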
Today Arm is introducing a complete portfolio of IPs for
the mobile market which includes a new little Armv9 CPU, a new big Armv9 CPU, a
new flagship performance Armv9 CPU, new Mali GPUs, and even a new DSU. The last
thing that’s needed to interconnect everything together is a coherent
interconnect IP and a more comprehensive SoC transport interconnect. That’s
where the new CoreLink CI-700 and the NI-700 come into play.
CI-700
The CoreLink CI-700 coherent
interconnect is actually based on the recently-launched CMN-700
enterprise-grade mesh network. Unlike the CMN-700, the CI-700 is a custom
variant especially tailored for client devices and comes with additional
efficiency optimizations specifically for the mobile consumer market. With that
in mind, the CI-700 is a fully coherent interconnect supporting up to eight DSUs
as well as up to 24 AMBA ACE-Lite or AXI managers (accelerators or DMA devices).
It also supports up to eight memory interfaces, which can be either CHI or
ACE-Lite, and up to four ACE-Lite interfaces for peripherals.
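The limits quoted above can be captured in a small configuration check. The sketch below is only an illustration: the class name, field names and the `violations` method are hypothetical and do not correspond to any Arm configuration tool. Only the numeric upper bounds come from the text.

```python
from dataclasses import dataclass
from typing import List

# Upper bounds quoted in the article; the key names are made up for this sketch.
LIMITS = {"dsus": 8, "managers": 24, "memory_ifs": 8, "peripheral_ifs": 4}


@dataclass
class InterconnectConfig:
    dsus: int            # DSU (CPU cluster) ports
    managers: int        # ACE-Lite or AXI managers (accelerators, DMA devices)
    memory_ifs: int      # CHI or ACE-Lite memory interfaces
    peripheral_ifs: int  # ACE-Lite interfaces for peripherals

    def violations(self) -> List[str]:
        """Return a message for every field outside its allowed range."""
        problems = []
        for name, limit in LIMITS.items():
            value = getattr(self, name)
            if not 0 <= value <= limit:
                problems.append(f"{name}={value} is outside 0..{limit}")
        return problems


phone_soc = InterconnectConfig(dsus=1, managers=12, memory_ifs=4, peripheral_ifs=2)
print(phone_soc.violations() or "configuration is within the quoted limits")
```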
The new CI-700 implements a system-level cache (SLC)
with a snoop filter, which helps reduce power and improve performance. The cache
is exclusive of the DSU clusters' caches, so its capacity is effectively added on top
of the DSU capacity. It is also a true system-level cache, capable of caching
any and all memory transactions from not just the CPUs, but also the GPU, and
any other accelerator that might be interconnected as well as other
high-bandwidth devices. The SLC has support for MPAM cache partitioning which
is a feature that helps ensure predictability of performance by reserving
certain cache capacities for certain devices or address spaces. For example, in
order to prevent the GPU from consuming the entire cache for itself, MPAM can
reserve a certain capacity for the CPUs, preventing a single device from
starving out all other devices from system resources.
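The idea of reserving cache capacity can be sketched with a toy model. The code below is a simplified illustration, not the MPAM register interface or the CI-700 allocation policy: the `PartitionedCache` class, the notion of counting "lines", and the partition names are all assumptions made for the example.

```python
from typing import Dict


class PartitionedCache:
    """Toy shared cache where each partition has a guaranteed minimum share."""

    def __init__(self, capacity: int, reserved: Dict[str, int]):
        if sum(reserved.values()) > capacity:
            raise ValueError("reservations exceed total capacity")
        self.capacity = capacity
        self.reserved = dict(reserved)           # partition -> guaranteed lines
        self.used = {name: 0 for name in reserved}

    def allocate(self, partition: str, lines: int = 1) -> bool:
        """Grant the request only if other partitions keep their guarantees."""
        free = self.capacity - sum(self.used.values())
        # Lines that must stay free for reservations other partitions have not used yet.
        held_back = sum(
            max(self.reserved[name] - self.used[name], 0)
            for name in self.reserved
            if name != partition
        )
        if lines <= free - held_back:
            self.used[partition] += lines
            return True
        return False


slc = PartitionedCache(capacity=16, reserved={"cpu": 4, "gpu": 0})
gpu_lines = sum(slc.allocate("gpu") for _ in range(16))  # GPU tries to take everything
cpu_lines = sum(slc.allocate("cpu") for _ in range(4))   # CPU still gets its share
print(f"gpu got {gpu_lines} lines, cpu got {cpu_lines} lines")
```

In this model the GPU can grow into any capacity that is not reserved, but it cannot evict the CPU's guaranteed share, which mirrors the starvation example in the paragraph above.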
The new CI-700 is designed to run at around 1 GHz and up
to 2 GHz in high-performance implementations.
Fig.2: CoreLink CI-700 Coherent Interconnect
· CoreLink CI-700 Coherent Interconnect
It is a configurable coherent Interconnect designed together with Armv9 Cortex processors and the latest Arm technologies to enable fully optimized Total Compute solutions. Each CoreLink CI-700 is scalable across the Total Compute solutions for premium, performance and efficiency tiers. These solutions offer different levels of performance, efficiency and scalability to deliver specialized compute across multiple consumer device markets. The scalability of CoreLink CI-700 means it can support anything from low-power interconnect implementations at 1GHz to high-performance implementations at up to 2GHz in 5nm processes.
Features
· Supporting Total Compute: The three key aims of the Total Compute strategy are enhanced compute performance, security and developer access to more performant software and tools. CoreLink CI-700 and CoreLink NI-700 provide benefits across all three areas. The system improvements provide low latency for enhanced compute performance and high memory bandwidth for more advanced use cases. Both also provide higher security protections across the entire system through the new security architectural features, such as Memory Tagging Extensions (MTE). Finally, the flexible and faster configuration, which delivers a much faster time to market for our partners, is enabled through advanced design and verification tooling.
· Empowering use cases: CoreLink CI-700 is designed to meet requirements from a wide range of different use cases and consumer devices, from High Dynamic Range (HDR) and high frame rate video on DTVs right through to AAA gaming on premium mobile devices. Compute-intensive applications are supported through CoreLink CI-700’s high-performance AMBA CHI mesh interconnect technology. This allows the coherent Interconnect to support 1-8 coherency clusters over the AMBA CHI interface. This aligns with the new DynamIQ Shared Unit-110 (DSU-110) that binds together different Armv9 CPU cores within a CPU cluster.
· Power and bandwidth reductions through system level cache: Alongside performance, CoreLink CI-700 offers a fully coherent system level cache (SLC) for bandwidth and system power reductions. The SLC reduces the average memory latency and system power because fewer transactions go to external memory. It is an exclusive cache, so its capacity adds to that of the caches in the Armv9 CPU clusters. Moreover, the SLC can be shared with GPUs and other accelerators. Supporting the SLC, Memory Partitioning and Monitoring (MPAM) enables control of how the SLC resources are allocated and increases predictability within the system.
· Improved system security with Memory Tagging Extensions (MTE): A fundamental pillar of the Arm Total Compute strategy is security. This means incorporating security features that are designed to improve resilience to attacks and stop vulnerabilities at the source before they cause harm. Arm’s Armv9 Cortex CPUs have adopted MTE technology, which makes detecting memory safety violations across the entire system far easier and more efficient.
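How memory tagging catches a memory safety violation can be shown with a small simulation. The sketch below is a toy model rather than real MTE code: the `TaggedMemory` class and its methods are hypothetical, and the 4-bit tags, 16-byte granules and tag position in the upper pointer bits follow the published Arm architecture rather than anything stated in this article.

```python
GRANULE = 16    # bytes covered by one memory tag
TAG_SHIFT = 56  # pointer bits 59:56 hold the tag
TAG_MASK = 0xF  # tags are 4 bits wide


class TaggedMemory:
    """Toy memory in which every 16-byte granule carries a 4-bit tag."""

    def __init__(self, size: int):
        self.data = bytearray(size)
        self.tags = [0] * (size // GRANULE)

    def tag_region(self, addr: int, length: int, tag: int) -> int:
        """Tag the granules covering the region and return a tagged pointer."""
        first = addr // GRANULE
        last = (addr + length + GRANULE - 1) // GRANULE
        for granule in range(first, last):
            self.tags[granule] = tag & TAG_MASK
        return addr | ((tag & TAG_MASK) << TAG_SHIFT)

    def load(self, pointer: int) -> int:
        """Read one byte, but only if the pointer tag matches the memory tag."""
        tag = (pointer >> TAG_SHIFT) & TAG_MASK
        addr = pointer & ((1 << TAG_SHIFT) - 1)
        if self.tags[addr // GRANULE] != tag:
            raise MemoryError(f"tag mismatch at address {addr:#x}")
        return self.data[addr]


mem = TaggedMemory(64)
ptr = mem.tag_region(0, 16, tag=0x3)  # "allocate" 16 bytes with tag 3
print("valid load:", mem.load(ptr))
mem.tag_region(0, 16, tag=0x5)        # "free" and retag the same memory
try:
    mem.load(ptr)                     # stale pointer, use-after-free
except MemoryError as exc:
    print("caught:", exc)
```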
NI-700
The NI-700 is a new flexible packetized network-on-chip
interconnect for both high-bandwidth accelerators and the rest of the SoC
connectivity such as other peripherals. It’s applicable to just about every
market. It can be used with the CI-700, CMN-700, or on its own. The NI-700
consists of a network of routers (round dots) connected to interfaces
(rectangles) with links that go between them.
On the NI-700, all the transactions from the AMBA CHI or
AXI are converted to a packetized format and that helps reduce the wire count
by 30% on average. This also helps reduce routing congestion which helps with
the physical design. It supports multiple clock and power domains. It’s
designed to be implementable on modern processes at up to around 1 GHz fairly
easily. It also supports the AMBA standard, along with the recent security
and reliability features that standard offers.
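Packetization can be pictured as turning one wide transaction into a short sequence of fixed-size units sent over the same narrow link. The sketch below assumes a made-up packet format (a header flit followed by payload flits). The field layout, flit size and function names are hypothetical and are not the NI-700 wire format.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class WriteTransaction:
    address: int
    data: bytes
    source_id: int
    dest_id: int


def packetize(txn: WriteTransaction, flit_bytes: int = 16) -> List[bytes]:
    """Split a write into one header flit plus zero-padded payload flits."""
    header = (
        bytes([txn.source_id, txn.dest_id])
        + txn.address.to_bytes(8, "big")
        + len(txn.data).to_bytes(2, "big")
    ).ljust(flit_bytes, b"\x00")
    payload = [
        txn.data[i:i + flit_bytes].ljust(flit_bytes, b"\x00")
        for i in range(0, len(txn.data), flit_bytes)
    ]
    return [header] + payload


def depacketize(flits: List[bytes]) -> WriteTransaction:
    """Rebuild the original transaction at the receiving interface."""
    header = flits[0]
    length = int.from_bytes(header[10:12], "big")
    return WriteTransaction(
        address=int.from_bytes(header[2:10], "big"),
        data=b"".join(flits[1:])[:length],
        source_id=header[0],
        dest_id=header[1],
    )


txn = WriteTransaction(address=0x8000_0000, data=bytes(range(40)), source_id=1, dest_id=7)
flits = packetize(txn)
print(len(flits), "flits, round trip ok:", depacketize(flits) == txn)
```

Because address, control and data share the same link in sequence, fewer parallel wires are needed between routers. The 30% figure quoted above is the article's, not a result of this sketch.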
· CoreLink NI-700 Network-on-Chip Interconnect
CoreLink NI-700 is a flexible packetized network-on-chip Interconnect for high-bandwidth accelerators, such as GPUs and NPUs, as well as rest-of-SoC connectivity. Packetization reduces wiring by 30 percent, easing physical design. The Network-on-Chip (NoC) Interconnect also adopts the latest Arm architecture features and AMBA interface standards, which improves performance, reliability, and virtualization. Moreover, the advanced tooling support enables faster design, configuration, and implementation of complex SoCs for improved system performance and reduced routing congestion and area.
CoreLink NI-700 is also highly configurable and scalable across different use cases and devices. It not only targets consumer and mobile devices, but can also be implemented across SoC solutions targeting markets ranging from premium IoT devices to Enterprise compute.
Features
· Integrated Device Management: A new capability that CoreLink NI-700 introduces is Integrated Device Management (IDM). IDM detects a peripheral causing a timeout, isolates it from the rest of the system, and then stabilizes the system by completing the AMBA transaction (if incomplete). Finally, a software handler can recover by, for example, soft-resetting the device or powering it up if it was unpowered. This increases uptime by overcoming issues without rebooting the entire device. This could significantly reduce how often a user needs to reboot their Wi-Fi router or set-top box, for example. CoreLink NI-700 also maintains the Quality of Service (QoS) features from previous Arm Interconnect products. These provide virtual channels for non-blocking arbitration and reduced wiring, as well as QoS regulators, which help achieve bandwidth and latency targets across the system.
· Advanced design and verification tooling: CoreLink CI-700 and CoreLink NI-700 support advanced design and verification tooling. These simplify the implementation and provide a quicker time-to-market for partners, as well as better results. The tools enable the quicker configuration of Arm IP within a system.
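The Integrated Device Management sequence described above (detect a timeout, isolate the peripheral, complete the outstanding transaction, let software recover) can be sketched as a small state machine. The `DeviceMonitor` class, its method names and the cycle-based timeout are assumptions made for illustration. They are not the NI-700 programming model.

```python
from typing import Callable, List, Optional


class DeviceMonitor:
    """Toy watchdog sitting between the interconnect and one peripheral."""

    def __init__(self, timeout_cycles: int):
        self.timeout_cycles = timeout_cycles
        self.isolated = False
        self.waited: Optional[int] = None  # None means nothing outstanding
        self.log: List[str] = []

    def issue(self) -> bool:
        """Start a transaction; refused while the device is isolated."""
        if self.isolated:
            self.log.append("rejected")
            return False
        self.waited = 0
        return True

    def tick(self, device_responded: bool) -> None:
        """Advance one cycle and check the outstanding transaction."""
        if self.waited is None:
            return
        if device_responded:
            self.waited = None
            self.log.append("completed")
            return
        self.waited += 1
        if self.waited >= self.timeout_cycles:
            self.isolated = True  # detect the timeout and isolate the device
            self.waited = None    # complete the transaction with an error response
            self.log.append("timeout: isolated, error response sent")

    def recover(self, reset_device: Callable[[], None]) -> None:
        """Software handler: reset the device, then reconnect it."""
        reset_device()
        self.isolated = False
        self.log.append("recovered")


monitor = DeviceMonitor(timeout_cycles=3)
monitor.issue()
for _ in range(3):
    monitor.tick(device_responded=False)  # the peripheral hangs
monitor.issue()                           # refused while isolated
monitor.recover(reset_device=lambda: None)
monitor.issue()
monitor.tick(device_responded=True)
print(monitor.log)
```

The point of the sketch is that the rest of the system keeps running throughout: only the faulty peripheral is fenced off and later brought back.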
ARM Technologies and their significance
ARM’s big.LITTLE design is a heterogeneous multi-processing approach that
uses more than one type of processor core and supports multiple software
models, such as AMP architecture based designs, SMP architecture based designs
and HMP architecture based designs.
We encounter many embedded systems every day in our life,
starting from smart phones and tablets to computers, Medical devices and other
electronic gadgets which provide high computing capability. These electronic
systems need to handle diverse compute requirements and diverse workloads and
are not industry-specific; they span several markets. In the 1980s,
Acorn Computers, based in Cambridge, England, developed the first ARM processor
for commercial purposes. These ARM processors were further enhanced to
provide high-performance and efficient power management without disrupting the
system’s overall efficiency.
Why did ARM Technology and Processors Become Popular?
ARM Holdings is a leading company that was founded in 1990. It offers a family of reduced instruction set computer (RISC) architectures designed specifically to form the cores of processors. These core designs are licensed to silicon companies, who can incorporate the processor core in their IC designs in an efficient, affordable and secure way.
ARM-enabled AMP, SMP and HMP architecture based designs aid the creation of devices for all types of applications, with a complete toolkit and a strong global ecosystem for support. The architecture gives silicon companies a set of rules that describe how the hardware works when an instruction is executed. The ARM architecture is used in CPUs to run application software and, together with platform security features, to secure trillions of connected devices and embedded systems, thereby helping the ecosystem design secure and efficient systems as easily as possible.
ARM’s comprehensive product offering includes 32- and 64-bit RISC microprocessors, graphics processors, enabling software, cell libraries, embedded memories, high-speed connectivity products, peripherals, and development tools. Due to their low power consumption and high performance, ARM processors are used in most modern devices. They have gone through several iterations to increase performance and improve power efficiency. This combination of high performance, low power consumption, wide offering, and low cost makes ARM processors popular. ARM also makes quick and efficient application development easy, and hence it has gained huge popularity across all varieties of applications. Here are a few of the advantages of ARM processors and their big.LITTLE design that have made them popular in modern-day electronics.
· They offer a variety of software system models like AMP architecture based Designs, SMP architecture based Designs and HMP architecture based Designs
· They offer a cost advantage compared to other processors
· They are designed to consume less power, making them ideal for a wide variety of portable and battery-operated devices.
· Each core performs one operation per cycle and thus works faster
· The availability and application support offered by ARM have also helped in popularizing ARM processors
Conclusion
Both Interconnect technologies are vital components of the new Total Compute Solutions. CoreLink CI-700 and CoreLink NI-700 are highly configurable IP designed to enable the very best solution performance. The performance improvements and the power and bandwidth reductions across the system optimize key solution-level use cases, such as AAA gaming. Moreover, the Interconnect technologies offer greater security protections by accelerating MTE hardware support, and comprehensive design and verification tooling to speed up the SoC implementation process. This creates a seamless system, with the market-proven Interconnect designed and validated together with the latest Armv9 CPU cores. In the future, Arm is going to keep investing in the very best Interconnect technologies, bringing its Total Compute vision of seamless and secure performance for tomorrow’s compute to life.
Reference -
- https://fuse.wikichip.org/news/5271/arm-launches-new-coherent-and-soc-interconnects-ci-700-ni-700/
- https://www.hpcwire.com/off-the-wire/arm-releases-new-interconnect-technology/
- https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/new-arm-interconnect
- https://www.eweek.com/networking/arm-unveils-new-corelink-interconnect/
Created by -
Rohit Agrawal
Rutuja Jaykumar Rathi
Yashashree Shastri
Sameer Sumbhe
Vaishnav Suryawanshi
Syeda Zarah Aiman

