Xelerated Xpress

Insight on Carrier Ethernet and Beyond

Why Programmability Matters in OAM

Defining OAM, Part 3 of 3. Please check out my earlier posts on defining OAM and OAM on NPUs – not only about CPU off-load.

As service providers take on new technologies, they spend a significant amount of time and resources adapting the new infrastructure to their method of operation. This means fine-tuning service provisioning, monitoring network status and having tools for consistent fault protection and mitigation. As with most things, standards are one thing; what purchased products can actually do is another. Designing the OAM functions is therefore as delicate as engineering the user plane for the right performance, features and availability. You need to analyze what functions you have, what you want to maintain, and what new capabilities you’d like to see across the complete network. Purchasing equipment that doesn’t meet requirements is as bad for OAM as it is for user data services. You end up trading functions against performance and designing for the lowest common denominator.

System vendors know this. Some have learned it the hard way, when their products couldn’t support the model of operations required by some of their potential service provider customers. Others enjoy the luxury of incumbency: once an operating system is designed in, it becomes ‘approved’ by the service provider. This puts vendors with running contracts in a strategic position, where they can advise on new features that are planned for future releases of their OS and are well supported by the underlying hardware. Competitors not in this position struggle to catch up and get their functions approved.

Here is why programmability matters for OAM. If the hardware is not flexible enough to support a strong and agile OAM and service provisioning roadmap, the product will fail to conquer new market territory. Even aggressive pricing can’t compensate for rigid hardware. Addressing the service provider requirements with a re-spin of the hardware is not very attractive either. An unproven vendor with an unproven line card simply faces too steep a climb to win the deal.

It is rather common for network elements such as switches and transport systems to be designed to perfectly meet a large incumbent carrier’s set of requirements. This customer may very well be significant enough to justify the design. But in a global market with global standards, every system has a larger market to address, and that market can only be addressed with flexible OAM functions.

For a successful global expansion, Carrier Ethernet products need programmable OAM.

PS. Interested in how Xelerated technology can be used for OAM? You can find more information in the technology section of the Xelerated website.

by Per Lembre on Dec. 3rd, 2010


OAM on NPUs – Not Only To Off-load the CPU

Last week, I wrote a blog post on how critical Operations, Administration and Maintenance (OAM) is in Carrier Ethernet and IP/MPLS networks. These functions are essential for running a smooth and scalable service provider business. This post, in turn, will highlight one critical implementation aspect of these functions in modern switch/router designs. How do you divide the tasks between the network processor and the control host CPU?

In the good old TDM world, OAM features were built in and had a dedicated out-of-band channel for communication. In the packet-based world, however, network OAM protocols often run alongside user traffic and therefore compete for link and switching resources. The new, and still evolving, standards for packet OAM have to be thought through during the system design phase of the line card so that the feature set gets the backing it needs from the hardware. What bandwidth, packet rates and features have to be supported? How can OAM be given processing guarantees without sacrificing user data?

OAM doesn’t come for free. To monitor a single link, the line card has to generate hundreds of CCM packets (Ethernet term) or Hello packets (MPLS term) every second. On the receiving side, the line card has to read and count the incoming OAM packets. The data plane has to detect when the link is lost (indicated by three consecutive lost packets) in order to trigger protection switching and routing protocol convergence in the control plane.
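The receive-side logic described above can be sketched in a few lines. This is a minimal illustration, not a line card implementation: the class name `CcmMonitor`, the default interval of 3.33 ms and the polling model are assumptions made for the example; only the "three consecutive lost packets" rule comes from the text.

```python
from dataclasses import dataclass


@dataclass
class CcmMonitor:
    """Tracks one monitored link and flags it as lost after N missed CCMs.

    Hypothetical helper for illustration; names and defaults are assumptions.
    """
    interval_ms: float = 3.33   # assumed CCM transmit interval
    loss_threshold: int = 3     # consecutive missed CCMs that mean "link lost"
    last_rx_ms: float = 0.0
    link_up: bool = True

    def on_ccm(self, now_ms: float) -> None:
        """Record a received CCM; a fresh CCM also clears an earlier fault."""
        self.last_rx_ms = now_ms
        self.link_up = True

    def poll(self, now_ms: float) -> bool:
        """Return True only at the moment the link transitions to lost."""
        missed = int((now_ms - self.last_rx_ms) // self.interval_ms)
        if self.link_up and missed >= self.loss_threshold:
            self.link_up = False
            return True  # the caller would trigger protection switching here
        return False


monitor = CcmMonitor(interval_ms=10.0)
monitor.on_ccm(0.0)
print(monitor.poll(25.0))  # False: only two intervals have elapsed
print(monitor.poll(30.0))  # True: three intervals without a CCM
```

Returning True only on the transition keeps the protection switch from being triggered repeatedly while the link stays down.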

Link OAM traffic can add up to significant packet rates, particularly in service edge routers that may be responsible for terminating and managing hundreds of thousands of virtual connections. And this is just link monitoring, which is costly from a processing perspective but still a rather basic service in the broader OAM world. Add the traffic needed to monitor per-customer service performance for quality analysis of services like voice and video, and the amount of OAM processing rises to several gigabits per second. This was one of the reasons why system vendors started looking at network processors to support OAM a few years back. They were simply running out of steam on the host CPU. And while performance continues to be a major reason for OAM support in network processors, I believe there is a more fundamental reason why OAM should run on the NPU.

If you run status check services in the control plane rather than in the data plane, the OAM design will eventually end up interlinked with, and dependent on, the control plane itself. While OAM plays a key role for the control and management planes, the planes should ultimately be autonomous from each other. When you check the connectivity between two points in the network, you want your test packet to be forwarded exactly as if it were part of the user traffic. This is also required in IEEE 802.1ag for link OAM. The packet should travel across the network using the tables stored in the data plane, not the mirrored copies held in the control plane. If there are inconsistencies between them, OAM won’t provide the correct data. It is therefore critical to implement OAM functions in the data plane itself. That is, OAM should run on the NPU rather than on the CPU.
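A toy example makes the inconsistency argument concrete. The two dictionaries and the node and port names below are invented for illustration; the assumed fault is a forwarding entry that exists in the control plane's copy but was never installed in the data plane.

```python
from typing import Dict, Optional


def lookup(table: Dict[str, str], dest: str) -> Optional[str]:
    """Return the egress port for dest, or None if there is no entry."""
    return table.get(dest)


def connectivity_check(table: Dict[str, str], dest: str) -> bool:
    """A check passes only if the table it consults can forward to dest."""
    return lookup(table, dest) is not None


# Hypothetical state: the entry for node-c never reached the data plane.
control_plane_copy = {"node-b": "port-1", "node-c": "port-2"}
data_plane_table = {"node-b": "port-1"}

print(connectivity_check(control_plane_copy, "node-c"))  # True: fault is hidden
print(connectivity_check(data_plane_table, "node-c"))    # False: fault is seen
```

A check that consults the control-plane copy reports a healthy path while user traffic is being dropped, which is the failure mode the paragraph above warns about.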

In my next post, I will delve deeper into the importance of programmability for OAM functions.

by Per Lembre on Nov. 25th, 2010


A Closer Look: Defining OAM

In an earlier post, we touched on Operations, Administration and Maintenance (OAM) and the holistic approach Xelerated takes to these critical functions. As Carrier Ethernet vendors begin to take OAM more seriously, it’s time to revisit the long-debated topic, taking a closer look at OAM and what it means to the vendor and service provider communities. In a series of three blog posts, I will take a look at the definition of OAM, its importance at the NPU level and why programmability matters.

First, let’s take a quick look at the role OAM plays for service providers and their networks.  Service providers demand operational excellence.  It is critical to their success.  And OAM is a critical component in hitting that mark.

At a high level, OAM is the monitoring and management of network resources: a set of data plane functions that allows the network operator to, for example, identify faults before they escalate and get noticed by end users, or set a remote node into loopback for configuration during a service window. For a good background on Carrier Ethernet OAM, I recommend a white paper by RAD.

OAM measures the status of the nodes and links of the network, and it notifies the control and management planes when events occur. Closely related to OAM is the area of performance monitoring, where the data plane provides information on the performance (in terms of throughput, packet loss, delay and delay variation) of links and services across networks and service provider domains. Ethernet network and service level OAM tasks are well defined in IEEE 802.1ag, while performance monitoring is described in ITU-T Y.1731.
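To make the performance metrics concrete, here is a small sketch that derives packet loss, mean delay and delay variation from a series of probe timestamps. The function name `summarize_probes` and the input format are assumptions for the example, and delay variation is simplified here to the mean absolute difference between consecutive delays; it is not the measurement procedure of any particular standard. Throughput is left out for brevity.

```python
from typing import Dict, Optional, Sequence


def summarize_probes(
    sent_ms: Sequence[float],
    received_ms: Sequence[Optional[float]],
) -> Dict[str, Optional[float]]:
    """Summarize probes; received_ms[i] is None when probe i was lost."""
    delays = [rx - tx for tx, rx in zip(sent_ms, received_ms) if rx is not None]
    lost = len(sent_ms) - len(delays)
    loss_ratio = lost / len(sent_ms) if sent_ms else 0.0
    mean_delay = sum(delays) / len(delays) if delays else None
    # Simplified delay variation: average change between consecutive delays.
    diffs = [abs(b - a) for a, b in zip(delays, delays[1:])]
    variation = sum(diffs) / len(diffs) if diffs else None
    return {
        "loss_ratio": loss_ratio,
        "mean_delay_ms": mean_delay,
        "delay_variation_ms": variation,
    }


stats = summarize_probes([0.0, 10.0, 20.0, 30.0], [2.0, 14.0, None, 33.0])
print(stats)
# {'loss_ratio': 0.25, 'mean_delay_ms': 3.0, 'delay_variation_ms': 1.5}
```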

The increased demand for performance monitoring is driven by the interest of enterprises and content providers in monitoring their networks. Service providers therefore have to offer a secure and reliable interface to the virtual private network connections these services are based on.

Automated OAM functions enable service providers to streamline their operations and reduce truck rolls. They have spent decades and several million dollars on IT investments in operational network and service systems. They are therefore paying careful attention to the OAM capabilities of network nodes. The switches and routers have to comply with current ways of monitoring and providing services, as well as provide the features and performance required by future network designs.

By setting hard requirements on OAM, service providers can continue to run their operations with a small and highly experienced staff. It is far too expensive to leave this set of requirements out when purchasing next-generation Carrier Ethernet products. With advanced OAM, the business can run more efficiently. But for some reason, many Carrier Ethernet vendors are still behind in delivering what carriers really require for OAM and performance monitoring. It may have to do with the stress these functions put on the hardware. Next week’s blog post will focus on this. I will look into OAM from the network processing perspective.

by Per Lembre on Nov. 17th, 2010


Gloomy Old OAM and Synchronization Turns Hot

When I started in this industry more than a decade ago, I couldn’t care less about OAM and synchronization. Sure, probably important, but I just let the SDH/Sonet guys worry about those things. The future was all about packets, higher bandwidth and great user experience.

Now, we packet guys have started to realize why OAM and synchronization are important areas. Lionel Florit, MEF technical committee member and technical lead at Cisco, captured this in today’s sessions at the MPLS and Ethernet World Congress in Paris, when he compared a 15-year-old Sonet chart on OAM with the IEEE 802.1ag standard chart for Carrier Ethernet. Indeed, very similar.

The first conference day was really good. This Upperside event attracts all the top system vendors, and most speakers are experienced enough to bring some good meat to the discussion. The stands are crowded and the conference sessions well attended. Served with French food and wine, we get most of what is needed for a great industry event.

Interested in more snippets from the show? Please follow  http://twitter.com/perlembre

by Per Lembre on Feb. 11th, 2010


Dataflow OAM

For service providers, Operations, Administration and Maintenance (OAM) is business critical. OAM is a set of features required to operate the network efficiently. Links and nodes are continuously monitored for consistent traffic forwarding. Anomalies and faults are announced (and protection switching is carried out). Access switches and CPEs are automatically configured.

All these tasks run in the background, invisible to the end user and in most cases even to the personnel at the Network Operations Centre. Every node in the network is configured to carry out its OAM tasks automatically. Reporting back to the centralized operations systems is limited to faults and status message updates at regular intervals.

At the node level, all OAM tasks need to be implemented in a highly scalable manner, with limited or no performance impact on the user data traffic. A few years back, these tasks were implemented by the control plane running on the CPU. But with the processing power needed for OAM in today’s Carrier Ethernet networks, the data plane on the Network Processor (NPU) has to offload the CPU significantly.

Take Ethernet link OAM as defined in IEEE 802.1ag. It is used to trigger protection switching in, for example, PBB-TE, and requires significant support from the data plane to operate efficiently. Every second, each Maintenance association End Point (MEP), of which a node can support several hundred, generates 300 Continuity Check Messages (CCMs) to its peers.
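A quick back-of-the-envelope calculation shows how this adds up on a single node. The figure of 300 CCMs per second comes from the text; the count of 500 endpoints and the 100-byte frame size are assumptions picked only to make the arithmetic concrete.

```python
from typing import Tuple


def ccm_load(
    endpoints: int,
    ccm_per_sec: int = 300,   # per-endpoint rate quoted in the text
    frame_bytes: int = 100,   # assumed frame size, for illustration only
) -> Tuple[int, float]:
    """Return (packets per second, megabits per second) of generated CCMs."""
    pps = endpoints * ccm_per_sec
    mbps = pps * frame_bytes * 8 / 1e6
    return pps, mbps


pps, mbps = ccm_load(endpoints=500)
print(pps, mbps)  # 150000 120.0
```

The same rate has to be received and checked from the peers, so the per-node packet handling load is roughly double the transmit figure.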

At Xelerated, we have taken a holistic approach to all OAM operations. The control plane is obviously still in charge of all OAM tasks. But when monitoring links and forwarding states, we are really interested in how the data plane behaves. It should ultimately be able to collect the right information and hand it to the requesting entity without having to involve the CPU of the system. Even if the control plane goes down, these tasks should run without any interference.

Similar processes are carried out for routing and switching table maintenance, hardware health checks and memory repairs. At the node level, these tasks are hugely important, and when you think about it, they have a lot in common with service and link OAM.

In the last few months we have received some good feedback on our approach to OAM. For developers used to the simple programming model of the dataflow architecture featured in all Xelerated devices, programming OAM is like programming any other part of the data plane. Maybe this is what is best about our OAM approach. OAM isn’t a new area requiring a new set of tools and software. It is an inherent part of the data plane. So all we had to do was allow the programmable pipeline to be programmed and configured by packets generated by the control plane, or to be triggered by an OAM packet arriving from the network. These control messages trigger an OAM program running in the pipeline, thereby allowing the rich set of resources to be accessed in a timely manner for any type of OAM task.
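The idea of packets triggering programs in the pipeline can be pictured with a small dispatch sketch. This is only a conceptual illustration in Python and does not represent Xelerated's actual programming model or tools; the `Packet` and `Pipeline` classes, the packet kinds and the return strings are all invented for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Packet:
    kind: str  # "user", "oam" (from the network) or "control" (from the CPU)
    payload: Dict[str, int] = field(default_factory=dict)


class Pipeline:
    """Hypothetical pipeline: every packet kind selects a program to run."""

    def __init__(self) -> None:
        self.counters: Dict[str, int] = {"user": 0, "oam": 0}
        self.config: Dict[str, int] = {}
        self.programs: Dict[str, Callable[[Packet], str]] = {
            "user": self._forward,
            "oam": self._run_oam,
            "control": self._configure,
        }

    def process(self, pkt: Packet) -> str:
        """Dispatch the packet to the program registered for its kind."""
        return self.programs[pkt.kind](pkt)

    def _forward(self, pkt: Packet) -> str:
        self.counters["user"] += 1
        return "forwarded"

    def _run_oam(self, pkt: Packet) -> str:
        # An OAM packet from the network runs the OAM program in the pipeline.
        self.counters["oam"] += 1
        return "oam-handled"

    def _configure(self, pkt: Packet) -> str:
        # A packet from the control plane updates pipeline configuration.
        self.config.update(pkt.payload)
        return "configured"


pipe = Pipeline()
print(pipe.process(Packet("control", {"ccm_interval_ms": 10})))  # configured
print(pipe.process(Packet("oam")))                               # oam-handled
print(pipe.process(Packet("user")))                              # forwarded
```

The point of the sketch is that OAM handling sits in the same dispatch path as ordinary forwarding, rather than in a separate subsystem.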

We haven’t coined a name for Xelerated’s OAM approach yet, but I’m leaning towards Dataflow OAM. Below is a first draft of how the Dataflow OAM functionality can be presented. Comments, anyone?

[Figure: Dataflow OAM]

by Per Lembre on Sep. 15th, 2009

