Xelerated Xpress

Insight on Carrier Ethernet and Beyond

Xelerated Is Four Years Ahead

I participated in an inspiring 100G panel at the Linley Tech Processor Conference last week. We didn’t have to debate the need for more bandwidth and more processing. The debate, instead, focused on how best to achieve the goal: network processors that are purpose-built for the task, or multicore processors that are general purpose and more capable for advanced services?

The Linley Tech Processor Conference attracted 300 attendees.

In the first day’s sessions, one could easily get the impression that multicores are up to the task of network processing. Thanks to Mike Coward of RadiSys, however, the bold marketing claims got a good reality check. RadiSys builds systems based on multicore technology. Today, they do 10G per line card. In two years, they expect to run up to 100G, and 100G in a single chip is likely four years out, all according to his estimate.

For those who don’t want to wait that long, welcome to Xelerated. Our 100G wirespeed NPU is here and is now going into production. And on top of what the multicore processors on the market offer, it also includes an advanced traffic manager.

Update: Xelerated’s presentation on ‘Uncompromised throughput at low power’ can be found here.

by Per Lembre on Oct. 11th, 2011


100G Network Processors Start Ramping in November

Xelerated Xpress met with Anders Ericsson, VP of Sales and Marketing at Xelerated, for a chat on the state of the 100G NPU market. He explains how the technology pushes the communications industry to better optimized solutions.

Anders Ericsson

 

Xpress: How does the 100G NPU technology contribute to the networking industry?

AE: This new generation of NPUs is changing some fundamental economics for the design of Carrier Ethernet systems and how system vendors compete. First, we have the bandwidth factor. The next wave of Carrier Ethernet line cards and systems will deliver higher capacities and packet rates within the same power budget. Second, we get more processing and features. The integrated advanced traffic manager allows system vendors to build more capable systems with higher quality and more advanced services at a radically reduced cost per gigabit. Third, the shift in favor of merchant NPUs enables system vendors to compete more on software and feature velocity rather than having to depend on internal ASIC projects.

Xpress: How can Xelerated compete with in-house packet processing silicon?

AE: We gain experience from many more customers, markets, sources and stakeholders than an internal development group can. Our solutions are designed for a broader task and can be used in many more systems and applications. In comparison, Xelerated network processors are more flexible and have a higher integration factor. They include advanced traffic management and buffering, many hardware engines and huge banks of embedded memories. As our technology applies to a broad set of applications, we pay attention to R&D economics such as time to market and reusability of investments in software.

Xpress: What attracts system vendors to use silicon from Xelerated?

AE: We provide three fundamental benefits that no other supplier can provide today.

  1. Determinism through the wirespeed architecture
  2. Highly efficient programmability, and
  3. Low power consumption

It is the combination of the three that makes the big difference.

Xpress: Why is low power consumption becoming critical?

AE: Power consumption is a key design parameter in all systems today, across application types. This is driven by direct energy costs, simpler and faster installation, a minimized need for forced ventilation, and compliance with state regulations for environmental protection. And on top of that you gain a more cost-effective design.

Xpress: In what type of platforms do you see the largest adoption for HX and AX chips?

AE: OTN, transport systems, PTN, mobile backhaul, Carrier Ethernet switch routers and PON OLTs.

Xpress: Does the technology enable fundamentally new designs, or are we mainly seeing more of the same; more ports, more packets and more bandwidth?

AE: We see both new designs and more of the same. With our state-of-the-art integrated traffic manager we enable new systems with more efficient designs. But there is of course an ongoing cry for more bandwidth and throughput. One has to bear in mind, though, that not all components are at the same stage of maturity. For instance, many optical components are too expensive to support a cost-effective roll-out of 100G solutions before 2014.

Xpress: Are there any sweet spot designs?

AE: Yes, OTN and PTN. The HX and AX product attributes fit well in these high-volume optimized designs.

Xpress: The dataflow architecture has evolved in HX and AX. How?

AE: Our core technology continuously evolves. The HX and AX, with the 100G dataflow architecture, are now in production. The architecture offers enhanced service densities, higher lookup rates and more processor cores compared to the 40G generation. In addition, we have enhanced the flexibility by allowing intelligent oversubscription through advanced pre-classification.

Moving forward, the dataflow architecture is already staged for 200G and 400G. We are assessing parallel pipes and enhanced flexibility for ingress and egress processing while retaining the deterministic characteristics.

Xpress: Xelerated is making a strong push for wirespeed processing. Is this inherent to the HX and AX chips?

AE: Wirespeed is by design and inherent to all our products.

Xpress: Doesn’t wirespeed come with a flexibility tax?

AE: Not really. Our software and application utilization does not depend on traffic load; behavior is always deterministic at the speed and throughput the devices are specified for.

Xpress: How are customers responding to the new integrated traffic manager? Are customers using this feature?

AE: The market response has been overwhelming, really. The traffic manager is mainly used for per-user and per-service shaping in both line card and pizza box designs. This was really the primary application we had in mind. But we also see additional interest in building chassis-based solutions solely on the integrated TM.

Xpress: When do you expect the first platforms in volume based on HX and AX chips?

AE: HX-based products are expected to ramp in November, while AX will ramp in December.

Xpress: 100G network processing is here and now. So what’s next?

AE: The industry is screaming for more bandwidth and more advanced packet services on Carrier Ethernet systems. We have a number of interesting innovations in design, but it is still a bit early to unveil any secrets.

 

by Per Lembre on Sep. 8th, 2011


OAM on NPUs – Not Only To Off-load the CPU

Last week, I wrote a blog post on how critical Operations, Administration and Maintenance (OAM) is in Carrier Ethernet and IP/MPLS networks. These functions are essential for running a smooth and scalable service provider business. This post, in turn, will highlight one critical implementation aspect of these functions in modern switch/router designs. How do you divide the tasks between the network processor and the control host CPU?

In the good old TDM world, OAM features were built-in and had a dedicated out-of-band channel for communication. In the packet-based world, however, network OAM protocols often run alongside user traffic and therefore compete for link and switching resources. The new – and evolving – standards for packet OAM have to be thought through during the systemization phase of the line card to give the feature set the backing it requires from the hardware. What bandwidth, packet rates and features have to be supported? How can they be assured processing guarantees without sacrificing user data?

OAM doesn’t come for free. For monitoring a single link, the line card has to generate hundreds of CCM packets (Ethernet term) or Hello packets (MPLS term) every second. On the receiving side, the line card has to read and count the OAM packets coming in. The data plane has to detect that the link is lost (after three consecutive lost packets) and trigger a protection switching mechanism and routing protocol convergence at the control plane.
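The receive-side check described above can be sketched in a few lines. This is a minimal illustration only: the threshold of three missed packets comes from the text, while the class name, field names and the use of integer microsecond timestamps are assumptions made for the example, not part of any Xelerated API.

```python
from dataclasses import dataclass


@dataclass
class ContinuityMonitor:
    """Tracks one monitored link (hypothetical names, illustration only)."""
    interval_us: int           # expected spacing between continuity-check packets
    last_rx_us: int = 0        # timestamp of the most recent packet received
    miss_threshold: int = 3    # consecutive misses that declare the link lost

    def on_packet(self, now_us: int) -> None:
        # Any received continuity-check packet resets the miss count.
        self.last_rx_us = now_us

    def missed(self, now_us: int) -> int:
        # Number of whole intervals that have elapsed without a packet.
        return (now_us - self.last_rx_us) // self.interval_us

    def link_lost(self, now_us: int) -> bool:
        # True once the miss threshold is reached; the caller would then
        # trigger protection switching.
        return self.missed(now_us) >= self.miss_threshold
```

With a 10 ms interval, `link_lost` stays false until 30 ms have passed since the last packet, and any newly received packet clears the condition.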

Link OAM traffic can add up to significant packet rates, particularly in service edge routers that may be responsible for termination and management of hundreds of thousands of virtual connections. And this is just for link monitoring, which is costly from a processing perspective, but still a rather basic service in the broader OAM world. Add to this the traffic for monitoring per-customer service performance, used for quality analysis of services like voice and video, and the amount of OAM processing rises to several gigabits per second. This was one of the reasons why system vendors were looking at network processors to support OAM a few years back. They were simply running out of steam on the host CPU to cope with the load. And while performance continues to be a major reason for OAM support in network processors, I believe there is a more fundamental reason why OAM should run on the NPU.
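To get a feel for how the load adds up, here is a back-of-the-envelope calculation. The inputs are assumptions chosen for illustration: 100,000 monitored connections, 100 packets per second each, and 84 bytes on the wire per packet (a minimum-size 64-byte Ethernet frame plus 20 bytes of preamble and inter-frame gap). Real OAM frames and session counts will differ.

```python
def oam_load(sessions: int, packets_per_s: int, wire_bytes: int) -> tuple:
    """Return (Mpps, Gbps) of aggregate link-monitoring traffic."""
    pps = sessions * packets_per_s    # total packets per second
    bps = pps * wire_bytes * 8        # total bits per second on the wire
    return pps / 1e6, bps / 1e9


# Assumed figures, not measurements from any real system.
mpps, gbps = oam_load(sessions=100_000, packets_per_s=100, wire_bytes=84)
print(f"{mpps:.1f} Mpps, {gbps:.2f} Gbps")  # 10.0 Mpps, 6.72 Gbps
```

Even under these modest assumptions, link monitoring alone lands in the range of several gigabits per second.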

If you run status check services in the control plane rather than in the data plane, the OAM design will eventually end up interlinked with and dependent on the control plane itself. While OAM plays a key role for the control and management planes, the planes should ultimately be autonomous from each other. When you check the connectivity between two points in the network, you want your test packet to be forwarded exactly as if it were part of the user traffic. This is also required in IEEE 802.1ag for link OAM. The packet should travel across the network using tables stored in the data plane, not mirrored copies held in the control plane. If there are inconsistencies between them, OAM won’t provide the correct data. It is therefore critical to implement OAM functions in the data plane itself. That is, they should run on the NPU rather than on the CPU.

In my next post, I will delve deeper into the importance of programmability for OAM functions.

by Per Lembre on Nov. 25th, 2010


Demo video with EE Times

The 100G demo tour is now going into a final phase with additional customer meetings, mainly in Europe, where the vacation period is finally over.

For all those who didn’t get to see the demo, I recommend watching the video demo we conducted with Brian Fuller of EE Times.

by Per Lembre on Sep. 7th, 2010


It is Hot: 100G Wirespeed Processing

Some of you may have noticed that Xelerated is out on a roadshow. I’m just back from demo meetings in Asia and the U.S. During our worldwide tour we have received a tremendous amount of positive customer response, and it seems our timing for demonstrating wirespeed processing at 100G is impeccable. Our customers have long awaited a way to meet their ever-increasing demand for higher traffic processing rates, and they clearly appreciate finally seeing one.

Per Lembre presents demo of the HX 100G NPU

Per Lembre demonstrates 100G

Over four weeks, we are visiting 40 customers and partners to demonstrate our new technology.  Many of them have said this is the first time they have actually experienced a demonstration of wirespeed processing at 100G in a single chip.

Here are some of my reflections thus far:

  • 100G is hot. With the finalization of the 100GE and 40GE standards, there is a huge interest to scale packet processing to the next level. 100G wirespeed network processors that can match the new step in link capacity will be critical to the commercial success of 100GE.
  • Greater port densities in next generation fiber access systems. Xelerated’s OLT and unified fiber access customers are pushing to get the next generation systems to market as soon as possible. Service providers, primarily in Asia, are driving the need for more bandwidth and customized features to fit local market conditions.
  • Power reduction is critical. In several meetings, our customers indicated that the HX and AX technology can reduce power consumption by more than 50% compared to competing solutions. This has implications for both the environment and the operational cost of running the networks. Reduction in power consumption also enables new types of designs that are more efficient and require a smaller footprint.
  • Wirespeed by Design. We use this term as a tagline for the company. Through these meetings, I now realize just how well it resonates with our customer base. The dataflow architecture enables wirespeed packet processing without degradation when all services are turned on. It simplifies engineering, and our customers gain time to market. In addition, they are assured the products will come out well in performance tests.

The demo tour marks an important milestone for many of our customer design projects. The huge increase in demand for Reference Design Kits and the intensive customer correspondence on technical requirements are two sure signs of what’s ahead of us. It will be a lot of work, and a lot of fun!

By the way, we invited Craig Matsumoto at Light Reading to see the world’s first 100G demo. It all went very well, as expected, although there was some initial confusion about bitrates and packet rates for 100 Gbps Ethernet wirespeed. Please refer to the comment section of the blog post for more details.

by Anders Ericsson on Aug. 24th, 2010


Time to Clarify Service Density

This industry struggles with a communication issue. One of the most important aspects of network processing – service density – lacks a common definition, and because of this, there is no widely accepted way to measure it.

Pipeline of processor cores and engine access points

The density of processing resources, here illustrated by a part of the single pipeline of processor cores and engine access points featured in the Dataflow Architecture, defines how many services a chip can support.

We all know how to measure link bandwidth; we do this in Gigabits per second (Gbps). Likewise, we have a common understanding of how to measure the raw performance of packet processing; Megapackets per second (Mpps). But we don’t fully agree on how to measure the capabilities of parsing, classification and modification – important tasks which are performed by the network processor (NPU).

Over what is now a few generations of NPU development at Xelerated, it is fair to say we have spent a large part of our system engineering on increasing service density. The first generation of NPUs, the X10q family, was initially released in 2002. It was a 40 Gbps NPU with 40-100 Mpps, depending on type. In the next generation, the X11 family of NPUs, the greatest achievement was (yes, you are correct!) increases in service density. (Okay, some people may argue that the integration of more interface types and GE MACs was key to the success. But I still think the list of services and features the X11 performs, in parallel and at wirespeed, must be one of the greatest achievements in the industry at the time.)

Now moving to the third generation, the HX family of NPUs. This is a 100 Gbps and 150 Mpps NPU family, which is significantly more than the X11, sure. But again, we make a giant leap forward in terms of service density. This means more instructions per packet and more lookup bandwidth per packet type. The result is that more services can be delivered in a single chip.
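As a sanity check on how Gbps and Mpps relate, the sketch below converts a link rate into the packet rate needed for wirespeed at a given frame size. It assumes standard Ethernet framing, where every frame carries 20 extra bytes on the wire (7 bytes preamble, 1 byte start-of-frame delimiter, 12 bytes inter-frame gap), and uses the 64-byte minimum frame as the worst case. The function name is made up for this example.

```python
ETHERNET_OVERHEAD_BYTES = 20  # 7 preamble + 1 start-of-frame delimiter + 12 inter-frame gap


def wirespeed_mpps(link_gbps: float, frame_bytes: int = 64) -> float:
    """Packet rate (in Mpps) needed to fill a link with frames of one size."""
    bits_on_wire = (frame_bytes + ETHERNET_OVERHEAD_BYTES) * 8
    return link_gbps * 1e9 / bits_on_wire / 1e6


print(round(wirespeed_mpps(100), 2))  # 148.81
print(round(wirespeed_mpps(40), 2))   # 59.52
```

Under these assumptions, a 100 Gbps link filled with minimum-size frames needs about 148.8 Mpps, which appears consistent with the 150 Mpps quoted for the HX family.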

For the system vendors, making the correct assumptions on service density is one of the most strategic tasks for product management. There are many cases where the HX device consolidates three to four ingress and egress processing chips (be it custom ASICs or merchant NPUs) into one. It impacts COGS, margins, and in the end, the whole business case for the product.

The lack of a commonly accepted definition of service density makes the dialog between silicon and system vendors unnecessarily blurred and full of misunderstanding.

So let us start the job of defining the term. I do not have a perfect answer to this. As can be seen on the Xelerated product pages, we measure service density only by comparing the capabilities of the different devices within the same family. As we lack a more general definition, we cannot measure service density across product families. Yet.

The definition should take both the number of operations per packet and the classification of resources per packet into account. And another component also needs to be added to the equation: the packet processing needs to be achieved at wirespeed for a specified link rate. Without a hard performance target in terms of Mpps, the whole discussion just falls short. So let’s get the discussion rolling… for the evolving services in the metro space, next generation platforms will be dependent on this definition.
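To make the three ingredients above concrete, here is one purely illustrative way to write them down as a record. This is not a proposed definition. Every field name is hypothetical, the per-packet numbers in the usage example are invented, and the wirespeed check assumes standard Ethernet framing with 20 bytes of per-frame overhead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceDensity:
    """Hypothetical record of the quantities a definition might need."""
    ops_per_packet: int        # operations available to each packet
    lookups_per_packet: int    # classification resources available to each packet
    link_gbps: float           # link rate the figures are quoted for
    sustained_mpps: float      # packet rate sustained with all services on

    def is_wirespeed(self, frame_bytes: int = 64) -> bool:
        # The per-packet figures only mean something if the device keeps up
        # with the link at the given frame size (20 bytes Ethernet overhead).
        required_mpps = self.link_gbps * 1e9 / ((frame_bytes + 20) * 8) / 1e6
        return self.sustained_mpps >= required_mpps


# Invented per-packet numbers, used only to show the check.
device = ServiceDensity(ops_per_packet=1000, lookups_per_packet=50,
                        link_gbps=100, sustained_mpps=150)
print(device.is_wirespeed())  # True
```

The point of the check is the one made in the text: per-packet figures are only comparable between devices when both meet the same hard Mpps target.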

by Per Lembre on Nov. 24th, 2009


Observations at the Linley Data Center Seminar

I attended Linley Group’s data center seminar this Tuesday to learn more about the latest data center trends. Xelerated’s Anders Wirkestrand presented on the Network Processor Unit’s (NPU) role in data centers as a key catalyst for virtualization.

One key observation is that an NPU has an enormous amount of service density and could, when used together with a multicore processor, increase the overall performance and transaction rate while lowering power consumption significantly. The number of instructions per packet is 23 times (yes, you read that correctly!) that of a state-of-the-art Intel Core 2 Extreme QX9770 multicore processor. Combining the strengths of a state-of-the-art multicore processor and an Xelerated NPU can dramatically improve the overall solution.

It is interesting to see that the data center server players are adding switching functionality, and at the same time, the router and switch vendors are adding server functionality. They end up competing with each other. When the switch becomes a server, and the server becomes a switch, it opens up a period of strong innovation. Which architecture model will prevail in the future?

by Thomas Eklund on Nov. 13th, 2009

