For service providers, Operations, Administration and Maintenance (OAM) is business-critical. OAM is the set of features required to operate the network efficiently. Links and nodes are continuously monitored to verify consistent traffic forwarding. Anomalies and faults are reported (and protection switching is carried out). Access switches and CPEs are configured automatically.
All these tasks run in the background, invisible to the end user and, in most cases, even to the personnel at the Network Operations Centre. Every node in the network is configured to carry out its OAM tasks automatically. Reporting back to the centralized operations systems is limited to fault notifications and status updates at regular intervals.
At the node level, all OAM tasks need to be implemented in a highly scalable manner, with little or no impact on user data traffic. A few years ago, these tasks were handled by the control plane running on the CPU. But given the processing power OAM requires in today's Carrier Ethernet networks, the data plane on the Network Processor (NPU) has to take over a significant share of that work from the CPU.
Take Ethernet Connectivity Fault Management (CFM) OAM as defined in IEEE 802.1ag. It is used to trigger protection switching in, e.g., PBB-TE, and it needs significant support from the data plane to operate efficiently. Each Maintenance End Point (MEP), and a node can host several hundred of them, can send up to 300 Continuity Check Messages (CCMs) per second to its peers: one every 3.33 ms at the shortest transmission interval the standard defines.
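To give a feel for the numbers, here is a back-of-the-envelope sketch of the CCM load on a single node. It assumes every MEP sends one CCM per interval and receives one CCM per interval from each remote peer; the figure of 500 point-to-point MEPs is a hypothetical example of "several hundred", not a measured value.

```python
from fractions import Fraction

# CCM transmission intervals defined by IEEE 802.1ag, in seconds.
# The shortest one, 3.33 ms, is exactly 1/300 s (300 CCMs per second).
CCM_INTERVALS = {
    "3.33ms": Fraction(1, 300),
    "10ms": Fraction(1, 100),
    "100ms": Fraction(1, 10),
    "1s": Fraction(1),
    "10s": Fraction(10),
    "1min": Fraction(60),
    "10min": Fraction(600),
}


def ccm_load(num_meps: int, interval: Fraction, peers_per_mep: int = 1) -> dict:
    """Return the CCMs per second a node must send and receive.

    Assumes each MEP transmits one CCM per interval and receives one CCM
    per interval from each of its remote peers.
    """
    tx = num_meps / interval
    rx = num_meps * peers_per_mep / interval
    return {"tx_per_s": tx, "rx_per_s": rx, "total_per_s": tx + rx}


if __name__ == "__main__":
    # Hypothetical node: 500 point-to-point MEPs at the fastest interval.
    load = ccm_load(500, CCM_INTERVALS["3.33ms"])
    print(int(load["total_per_s"]))  # 300000
```

Even this modest, assumed configuration means 300,000 CCMs per second to generate and check, which is why handling them per packet on the CPU does not scale.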
At Xelerated, we have taken a holistic approach to all OAM operations. The control plane is obviously still in charge of all OAM tasks. But when monitoring links and forwarding states, what we are really interested in is how the data plane behaves. The data plane should ultimately be able to collect the right information and hand it to the requesting entity without involving the system CPU. Even if the control plane goes down, these tasks should keep running without interruption.
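The following toy model sketches what "collecting the right information without involving the CPU" can mean for continuity checking. It is a hypothetical illustration, not Xelerated's implementation: `RemoteMep`, `ContinuityMonitor`, `on_ccm` and `scan` are invented names, and the loss threshold of 3.5 CCM intervals is the timeout commonly used for CCM loss detection. The data plane updates a per-peer timestamp on every CCM and, on a periodic scan, raises a report only when a peer's state changes.

```python
from dataclasses import dataclass, field

# Assumed loss threshold: continuity is lost after 3.5 CCM intervals without a CCM.
LOSS_MULTIPLIER = 3.5


@dataclass
class RemoteMep:
    interval: float       # expected CCM interval, in seconds
    last_seen: float      # arrival time of the most recent CCM
    defect: bool = False  # True while continuity is considered lost


@dataclass
class ContinuityMonitor:
    """Hypothetical per-node continuity table kept in the data plane."""
    peers: dict = field(default_factory=dict)
    reports: list = field(default_factory=list)  # events handed to the control plane

    def on_ccm(self, mep_id: int, now: float) -> None:
        # Runs for every received CCM: refresh the timestamp, report recovery once.
        peer = self.peers[mep_id]
        peer.last_seen = now
        if peer.defect:
            peer.defect = False
            self.reports.append((mep_id, "continuity restored", now))

    def scan(self, now: float) -> None:
        # Runs periodically (e.g. triggered by a timer packet): report new losses once.
        for mep_id, peer in self.peers.items():
            expired = now - peer.last_seen > LOSS_MULTIPLIER * peer.interval
            if expired and not peer.defect:
                peer.defect = True
                self.reports.append((mep_id, "loss of continuity", now))
```

For example, a peer with a 1 s interval last heard at t = 0 is not flagged by a scan at t = 3, is flagged once at t = 4, and a CCM arriving at t = 6 produces a single "continuity restored" report. Only state changes reach the control plane, which matches the idea of limiting reporting to faults and status updates.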
Similar processes are carried out for routing and switching table maintenance, hardware health checks and memory repairs. At the node level, these tasks are hugely important, and when you think about it, they have a lot in common with service and link OAM.
Over the last few months we have received some good feedback on our approach to OAM. For developers used to the simple programming model of the dataflow architecture featured in all Xelerated devices, programming OAM is like programming any other part of the data plane. Maybe this is the best thing about our OAM approach: OAM isn't a new area requiring a new set of tools and software. It is an inherent part of the data plane. So all we had to do was allow the programmable pipeline to be programmed and configured by packets generated by the control plane, or to be triggered by OAM packets arriving from the network. These control messages trigger an OAM program running in the pipeline, giving any type of OAM task timely access to the pipeline's rich set of resources.
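A toy model may help illustrate the idea of packet-triggered OAM. In this sketch, which is an assumption-laden illustration rather than the actual dataflow programming model, one pipeline with one set of tables serves forwarding, configuration and OAM, and the arriving packet selects the program to run. The `source` tag, the table names and the method names are hypothetical; only the CFM EtherType 0x8902 comes from IEEE 802.1ag.

```python
from dataclasses import dataclass

CFM_ETHERTYPE = 0x8902  # EtherType assigned to IEEE 802.1ag CFM frames


@dataclass
class Packet:
    ethertype: int
    source: str    # hypothetical tag: "network" or "control_plane"
    payload: dict


class Pipeline:
    """Toy model: one pipeline, one set of tables, several packet-triggered programs."""

    def __init__(self):
        self.tables = {"fdb": {}, "mep_state": {}}

    def process(self, pkt: Packet) -> str:
        # The packet itself decides which program runs.
        if pkt.source == "control_plane":
            return self.configure(pkt)
        if pkt.ethertype == CFM_ETHERTYPE:
            return self.oam(pkt)
        return self.forward(pkt)

    def configure(self, pkt: Packet) -> str:
        # A control-plane packet writes directly into a pipeline table.
        p = pkt.payload
        self.tables[p["table"]][p["key"]] = p["value"]
        return "configured"

    def oam(self, pkt: Packet) -> str:
        # The OAM program uses the same table resources, here counting CCMs per MEP.
        mep = pkt.payload["mep_id"]
        self.tables["mep_state"][mep] = self.tables["mep_state"].get(mep, 0) + 1
        return "oam"

    def forward(self, pkt: Packet) -> str:
        # Ordinary forwarding: look up the destination, flood if unknown.
        port = self.tables["fdb"].get(pkt.payload["dst"])
        return f"port {port}" if port is not None else "flood"
```

The design point the sketch tries to capture is that OAM needs no separate engine: a configuration packet, a CCM and a user data packet all pass through the same `process` entry point and share the same tables.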
We haven't named Xelerated's OAM approach yet, but I'm leaning towards Dataflow OAM. Below is a first draft of how the Dataflow OAM functionality could be presented. Comments, anyone?
by Per Lembre on Sep. 15th, 2009