Flow Event Telemetry on Programmable Data Plane - SIGCOMM'20#
Motivation#
- Network performance anomalies (NPAs) occur more easily and are harder to mitigate than connectivity loss, since NPAs usually happen on shorter time scales with randomness, leaving only minor fingerprints in the network.
- Only a fraction of NPAs are caused by the network, but applications usually treat NPAs as consequences of network faults, which wastes tremendous time locating the causes.
- Even if the network is indeed causing a given NPA, there are numerous potential root causes and locations across the whole cloud network that can lead to it.
- Locating the causes by combing through coarse-grained counters is slow and error-prone, while fine-grained traffic mirroring may itself introduce congestion and packet drops.
- The actual recovery operations after the cause of an NPA has been located are typically fast; locating the cause is the real bottleneck, taking 90% or more of the time in most cases.
- Completely realizing flow event telemetry (FET) on the programmable data plane (PDP) is rational:
- Directly running FET in the DP can reduce monitoring traffic volume and data processing overhead
- Unlike CPUs, ASICs in the PDP can maintain line rate while running customized packet processing logic.
Aim#
- Design a flow-event-telemetry-based network monitor that continuously (always-on) and simultaneously watches (distinguishes) all individual flows and comprehensively detects (catches) sub-second-level flow events, while meeting the following requirements:
- Coverage: Feasible to discover all flow events in high-speed ongoing traffic
- Not only events in ASICs but also events on interfaces and fibers
- Scalability: Scalable enough to work effectively even in large-scale production cloud networks
- Need to compress event data
- Need to reorganize small-sized events
- Accuracy:
- Zero false negatives (FN)
- Very few false positives (FP)
Main Idea#
- Flow events: congestion, packet pause, path change, packet drop (intra-switch drop, inter-switch drop)
- Each programmable switch takes the following steps to derive comprehensive and compact flow events from original traffic:
- Event Packet Detection: Trace each step in packet processing in the data plane to detect all events that happen to each packet, especially packet drops:
- Intra-switch packet drops
- Types: pipeline drop, congestion drop
- Challenge: resource limitations (e.g., limited on-chip memory) of programmable switching ASICs.
- Existing Solution: record the appearance of packets at the beginning of the programmable pipeline and confirm the exit of packets at the pipeline tail; this approach would
- Require unacceptably large memory (e.g., hash tables)
- Require maintaining a timer for each packet
- Demand a synchronization mechanism
- Authors' Solution: embed drop detection logic into the entire packet processing logic in the ASIC so that all packet drops are reported
- For each type and cause of packet drop, NetSeer uses a corresponding detection method (see the sketch below)
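As a mental model of "embedding detection into the processing logic", the Python sketch below pairs every drop decision point in a toy pipeline with an event report, so no drop stays silent. The class names, reason codes and table shapes are illustrative assumptions; NetSeer's real logic is written for the switching ASIC.

```python
from dataclasses import dataclass
from enum import Enum, auto


class DropReason(Enum):
    ACL_DENY = auto()
    TABLE_MISS = auto()
    TTL_EXPIRED = auto()
    CONGESTION = auto()   # MMU / congestion drop


@dataclass
class Packet:
    flow_key: tuple       # e.g. (src_ip, dst_ip, src_port, dst_port, proto)
    dst_ip: str
    ttl: int


events = []               # drop events handed to the event-generation stage


def report_drop(pkt, reason, switch_id, port):
    """Instead of silently discarding the packet, emit a drop event."""
    events.append({"flow": pkt.flow_key, "reason": reason.name,
                   "switch": switch_id, "port": port})


def pipeline(pkt, deny_acl, fib, queue_full, switch_id=1):
    """Every drop decision point is paired with an event report."""
    if pkt.flow_key in deny_acl:          # ACL deny   -> pipeline drop
        return report_drop(pkt, DropReason.ACL_DENY, switch_id, -1)
    if pkt.ttl <= 1:                      # TTL expiry -> pipeline drop
        return report_drop(pkt, DropReason.TTL_EXPIRED, switch_id, -1)
    out_port = fib.get(pkt.dst_ip)        # forwarding table lookup
    if out_port is None:                  # table miss -> pipeline drop
        return report_drop(pkt, DropReason.TABLE_MISS, switch_id, -1)
    if queue_full:                        # MMU full   -> congestion drop
        return report_drop(pkt, DropReason.CONGESTION, switch_id, out_port)
    return out_port                       # forwarded normally


pkt = Packet(("10.0.0.1", "10.0.0.2", 1234, 80, "tcp"), "10.0.0.2", 64)
pipeline(pkt, deny_acl=set(), fib={}, queue_full=False)   # dst missing from FIB
print(events)   # [{'flow': ..., 'reason': 'TABLE_MISS', ...}]
```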
- Inter-switch packet drops
- Types: silent packet drop or corruption
- Challenge: lack of direct visibility into the electrical and optical components between two neighboring switches
- Authors' Solution: use a four-byte consecutive packet ID between two neighboring switches to detect packet loss, in five steps (sketched below, after the step list):
- Packet numbering and recording in the ring buffer
- Packet transmission
- Loss detection
- Loss notification
- Loss retrieval
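The following minimal Python sketch walks through the five steps above, assuming a single link, a fixed ring-buffer depth and no handling of ID wraparound; the constants and class names are illustrative, not NetSeer's actual packet format or buffer size.

```python
RING_SIZE = 4096  # assumed ring-buffer depth; the real size is ASIC-specific


class Upstream:
    """Upstream end of the link: numbers packets and keeps their flow info."""

    def __init__(self):
        self.next_id = 0
        self.ring = [None] * RING_SIZE   # packet ID -> flow key of recent packets

    def send(self, flow_key):
        pid = self.next_id
        self.ring[pid % RING_SIZE] = flow_key           # 1. number and record
        self.next_id = (self.next_id + 1) & 0xFFFFFFFF  # four-byte counter
        return pid, flow_key                            # 2. transmit (ID in header)

    def on_loss_notification(self, lost_ids):
        # 5. loss retrieval: recover the flow info of the lost packets
        return [self.ring[pid % RING_SIZE] for pid in lost_ids]


class Downstream:
    """Downstream end of the link: checks received IDs for gaps."""

    def __init__(self):
        self.expected = 0

    def receive(self, pid):
        # 3. loss detection: a gap in consecutive IDs means packets were lost
        lost = list(range(self.expected, pid))
        self.expected = pid + 1
        return lost                                     # 4. loss notification


up, down = Upstream(), Downstream()
p0, _ = up.send(("10.0.0.1", "10.0.0.2", 1234, 80, "tcp"))
p1, _ = up.send(("10.0.0.3", "10.0.0.4", 5555, 443, "tcp"))
p2, _ = up.send(("10.0.0.5", "10.0.0.6", 7777, 53, "udp"))
lost = down.receive(p0) + down.receive(p2)   # packet p1 never arrives
print(up.on_loss_notification(lost))         # flow key of the dropped packet
```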
- Congestion, path change and pause detection (see the sketch below):
- Congestion: measure the queuing delay of each packet and select packets whose queuing delay exceeds a threshold as congestion event packets
- Path change: select the first packet of a new flow, or of an existing flow whose egress port has changed, as a path change event packet
- Pause: look up the corresponding queue status in ingress via a queue status detector, and identify a packet as a pause event packet if its queue is paused
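A toy per-packet classifier illustrating the three checks above; the delay threshold, table layouts and function names are assumptions made for illustration rather than NetSeer's ASIC implementation.

```python
QUEUING_DELAY_THRESHOLD_NS = 100_000   # assumed threshold (100 us), not the paper's value

last_egress_port = {}   # flow key -> last observed egress port
paused_queues = set()   # (port, queue) pairs currently paused (e.g. by PFC)


def classify(flow_key, enq_ts_ns, deq_ts_ns, egress_port, queue_id):
    """Return the event types this packet would be selected for."""
    events = []

    # Congestion: queuing delay above the threshold.
    if deq_ts_ns - enq_ts_ns > QUEUING_DELAY_THRESHOLD_NS:
        events.append(("congestion", flow_key, egress_port, queue_id))

    # Path change: first packet of a new flow, or the egress port differs
    # from the one previously recorded for this flow.
    prev_port = last_egress_port.get(flow_key)
    if prev_port is None or prev_port != egress_port:
        events.append(("path_change", flow_key, egress_port))
    last_egress_port[flow_key] = egress_port

    # Pause: the queue this packet maps to is currently paused.
    if (egress_port, queue_id) in paused_queues:
        events.append(("pause", flow_key, egress_port, queue_id))

    return events


print(classify(("10.0.0.1", "10.0.0.2", 1234, 80, "tcp"),
               enq_ts_ns=0, deq_ts_ns=200_000, egress_port=5, queue_id=0))
# -> congestion (200 us of queuing delay) and path_change (first packet of the flow)
```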
- Flow Event Generation & Compression: Aggregate sequential event packets that belong to one flow into a single flow event
- Event packets to flow events: eliminate redundancy by aggregating event packets into flow events and maintaining one counter per flow event, using a deduplication algorithm based on group caching (see the sketch after this sub-list)
- Event information extraction: compress the monitoring traffic volume by extracting only the necessary information from flow events, namely flow headers, switch-port-queue identifiers and event-specific data
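The sketch below is a simplified stand-in for the group-caching deduplication idea: consecutive event packets that share a flow and event type collapse into one flow event carrying a counter. The cache size and FIFO-style eviction are assumptions, not the paper's exact policy.

```python
from collections import OrderedDict

CACHE_SIZE = 8       # assumed number of cache entries; the real value is ASIC-specific
flow_events = []     # emitted (compressed) flow events


class GroupCache:
    def __init__(self):
        self.cache = OrderedDict()   # (flow_key, event_type) -> counter

    def add_event_packet(self, flow_key, event_type):
        key = (flow_key, event_type)
        if key in self.cache:
            self.cache[key] += 1           # duplicate event packet: bump the counter
            return
        if len(self.cache) >= CACHE_SIZE:  # cache full: emit the oldest flow event
            (old_flow, old_type), count = self.cache.popitem(last=False)
            flow_events.append({"flow": old_flow, "type": old_type, "count": count})
        self.cache[key] = 1                # new flow event enters the cache

    def flush(self):
        for (flow, etype), count in self.cache.items():
            flow_events.append({"flow": flow, "type": etype, "count": count})
        self.cache.clear()


gc = GroupCache()
for _ in range(3):   # three consecutive congestion event packets of one flow
    gc.add_event_packet(("10.0.0.1", "10.0.0.2", 1234, 80, "tcp"), "congestion")
gc.flush()
print(flow_events)   # a single flow event with count == 3 instead of three reports
```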
- Circulating Flow Event Batching: batch events, i.e., pack $\ge 1$ events into each packet, to reduce the bandwidth overhead and be friendly to CPUs (see the sketch after this list)
- Design a stack data structure and push each incoming event onto the stack for temporal caching
- Generate circulating event batching packets (CEBPs) that constantly recirculate within the pipeline via a separate internal port
- When a CEBP hits the stack, it pops one event and appends it to its payload
- When the payload length exceeds a threshold or all events have been collected, the CEBP is forwarded to the switch CPU and simultaneously cloned with an empty payload to collect later events
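A software sketch of the batching loop described above, assuming an in-memory stack and an arbitrary batching threshold; the recirculation through an internal port is only emulated by the while loop.

```python
MAX_PAYLOAD_EVENTS = 32   # assumed batching threshold

event_stack = []          # temporal cache for incoming events
to_cpu = []               # batched event packets delivered to the switch CPU


def on_new_event(event):
    event_stack.append(event)        # push each incoming event for caching


def on_cebp_pass(payload):
    """One traversal of the pipeline by the recirculating CEBP."""
    if event_stack:
        payload.append(event_stack.pop())          # pop one event per pass
    if len(payload) >= MAX_PAYLOAD_EVENTS or (payload and not event_stack):
        to_cpu.append(list(payload))               # forward the full batch to the CPU
        return []                                  # the clone restarts with an empty payload
    return payload                                 # keep recirculating


for i in range(5):
    on_new_event({"id": i})

payload = []
while event_stack or payload:        # emulate recirculation via an internal port
    payload = on_cebp_pass(payload)
print(to_cpu)                        # one batched packet carrying all five events
```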
- False Positive Elimination
- Definition of FP: repetitive event reports for the same flow event
- Authors' solution (sketched below):
- Enable the switch pipeline to calculate the hash value in advance and attach it to the event
- The switch CPU can then directly use the precomputed value to index its deduplication table
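A sketch of this division of labor, assuming a hash-indexed table on the switch CPU; Python's built-in hash and the table size stand in for whatever hash the pipeline actually computes.

```python
TABLE_SIZE = 1 << 16          # assumed size of the CPU-side deduplication table
seen = [None] * TABLE_SIZE    # hash-indexed table of recently reported flow events


def pipeline_attach_hash(event):
    """Data plane: compute the hash once and attach it to the event report."""
    event["hash"] = hash((event["flow"], event["type"])) & (TABLE_SIZE - 1)
    return event


def cpu_dedup(event):
    """Switch CPU: use the precomputed hash as a direct table index."""
    idx = event["hash"]
    key = (event["flow"], event["type"])
    if seen[idx] == key:
        return None               # repetitive report for the same flow event
    seen[idx] = key
    return event                  # first report: keep it


ev = pipeline_attach_hash({"flow": ("10.0.0.1", "10.0.0.2"), "type": "inter_switch_drop"})
print(cpu_dedup(ev) is not None)  # True  - first report is kept
print(cpu_dedup(ev) is not None)  # False - duplicate is eliminated
```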
- For troubleshooting network incidents: NetSeer can help network operators claim network innocence or quickly locate NPA causes by comprehensively capturing events.
- Routing error due to a network update: identify path change events caused by the faulty update
- ACL configuration error: discover pipeline drops caused by ACL rules
- Silent drop due to parity error: catch pipeline drops due to table lookup misses and identify the affected flows
- Congestion due to unexpected traffic volume: find the flows that contribute most to congestion by checking MMU congestion drop counters and perform scheduling accordingly
- SSD firmware driver bug: find how many storage packets are dropped and by which switches, accelerating debugging by clearly establishing network responsibility
Strength#
- Full coverage of flow events: leverage the programmability at both sides of a link to collaboratively discover inter-switch drop events and recover the flow information of dropped or corrupted packets
- Good scalability with network sizes:
- fully utilize data plane programmability to identify and aggregate event packets and compress flow events
- batch small-sized event messages into large packets with a novel in-data-plane design of circular-packet-driven event collection
- perform FET in a distributed manner
- High data accuracy: ensures zero FN in event generation and uses the switch CPU with ASIC offloading to discover and eliminate FP with small overhead
Weakness#
- NetSeer cannot cover drops due to ASIC or MMU hardware failures.
- Middleboxes have to follow certain principles, such as inter-device drop awareness, event-based anomaly detection and reliable reporting.