Flow Event Telemetry on Programmable Data Plane - SIGCOMM'20#
Motivation#
- Network performance anomalies (NPAs) occur more easily and are harder to mitigate than connectivity loss, since NPAs usually happen on shorter time scales with randomness, leaving only minor fingerprints in the network.
- Only a fraction of NPAs are caused by the network, but applications usually treat NPAs as consequences of network faults, which wastes tremendous time locating the causes.
- Even if the network is indeed causing a given NPA, there are numerous potential root causes and locations across the whole cloud network that can lead to it.
- Locating the causes by combing through coarse-grained counters is slow and error-prone, while fine-grained traffic mirroring may itself introduce congestion and packet drops.
- The actual recovery operations after the cause of an NPA has been located are typically fast; locating the cause is the real bottleneck, taking 90% or more of the time in most cases.
- Completely realizing flow event telemetry (FET) on the programmable data plane (PDP) is rational:
- Directly running FET in the DP can reduce monitoring traffic volume and data processing overhead
- Unlike CPUs, ASICs in the PDP can maintain line rate while running customized packet processing logic.
Aim#
- Design a flow-event-telemetry-based network monitor that continuously (always-on) and simultaneously watches (distinguishes) all individual flows and comprehensively detects (catches) sub-second-level flow events, while meeting the following requirements:
- Coverage: Feasible to discover all flow events in high-speed ongoing traffic
- Not only events in ASICs but also events on interfaces and fibers
- Scalability: Scalable enough to work effectively even in large-scale production cloud networks
- Need to compress event data
- Need to reorganize small-sized events
- Accuracy:
- Zero false negatives (FN)
- Very few false positives (FP)
Main Idea#
- Flow events: congestion, packet pause, path change, packet drop (intra-switch drop, inter-switch drop)
- Each programmable switch takes the following steps to derive comprehensive and compact flow events from original traffic:
- Event Packet Detection: Trace each step in packet processing in the data plane to detect all events that happen to each packet, especially packet drops:
- Intra-switch packet drops
- Types: pipeline drop, congestion drop
- Challenge: resource limitations (e.g., limited on-chip memory) of programmable switching ASICs.
- Existing Solution: record the appearance of packets at the beginning of the programmable pipeline and confirm the exit of packets at the pipeline tail; this approach would
- Require unacceptably large memory (e.g., hash tables)
- Require maintaining a timer for each packet
- Demand a synchronization mechanism
- Authors' Solution: embed drop detection logic into the entire packet processing logic in the ASIC so that all packet drops are reported
- For each type and cause of packet drop, NetSeer uses a corresponding detection method (see the sketch below)
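As a mental model of "embedding detection into the processing logic", the Python sketch below pairs every drop decision point in a toy pipeline with an event report, so no drop stays silent. The class names, reason codes and table shapes are illustrative assumptions; NetSeer's real logic is written for the switching ASIC.

```python
from dataclasses import dataclass
from enum import Enum, auto


class DropReason(Enum):
    ACL_DENY = auto()
    TABLE_MISS = auto()
    TTL_EXPIRED = auto()
    CONGESTION = auto()   # MMU / congestion drop


@dataclass
class Packet:
    flow_key: tuple       # e.g. (src_ip, dst_ip, src_port, dst_port, proto)
    dst_ip: str
    ttl: int


events = []               # drop events handed to the event-generation stage


def report_drop(pkt, reason, switch_id, port):
    """Instead of silently discarding the packet, emit a drop event."""
    events.append({"flow": pkt.flow_key, "reason": reason.name,
                   "switch": switch_id, "port": port})


def pipeline(pkt, deny_acl, fib, queue_full, switch_id=1):
    """Every drop decision point is paired with an event report."""
    if pkt.flow_key in deny_acl:          # ACL deny   -> pipeline drop
        return report_drop(pkt, DropReason.ACL_DENY, switch_id, -1)
    if pkt.ttl <= 1:                      # TTL expiry -> pipeline drop
        return report_drop(pkt, DropReason.TTL_EXPIRED, switch_id, -1)
    out_port = fib.get(pkt.dst_ip)        # forwarding table lookup
    if out_port is None:                  # table miss -> pipeline drop
        return report_drop(pkt, DropReason.TABLE_MISS, switch_id, -1)
    if queue_full:                        # MMU full   -> congestion drop
        return report_drop(pkt, DropReason.CONGESTION, switch_id, out_port)
    return out_port                       # forwarded normally


pkt = Packet(("10.0.0.1", "10.0.0.2", 1234, 80, "tcp"), "10.0.0.2", 64)
pipeline(pkt, deny_acl=set(), fib={}, queue_full=False)   # dst missing from FIB
print(events)   # [{'flow': ..., 'reason': 'TABLE_MISS', ...}]
```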
- Inter-switch packet drops
- Types: silent packet drop or corruption
- Challenge: lack of direct visibility into the electrical and optical components between two neighboring switches
- Authors' Solution: use a four-byte consecutive packet ID between two neighboring switches to detect packet loss, in five steps (sketched below, after the step list):
- Packet numbering and recording in the ring buffer
- Packet transmission
- Loss detection
- Loss notification
- Loss retrieval
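The following minimal Python sketch walks through the five steps above, assuming a single link, a fixed ring-buffer depth and no handling of ID wraparound; the constants and class names are illustrative, not NetSeer's actual packet format or buffer size.

```python
RING_SIZE = 4096  # assumed ring-buffer depth; the real size is ASIC-specific


class Upstream:
    """Upstream end of the link: numbers packets and keeps their flow info."""

    def __init__(self):
        self.next_id = 0
        self.ring = [None] * RING_SIZE   # packet ID -> flow key of recent packets

    def send(self, flow_key):
        pid = self.next_id
        self.ring[pid % RING_SIZE] = flow_key           # 1. number and record
        self.next_id = (self.next_id + 1) & 0xFFFFFFFF  # four-byte counter
        return pid, flow_key                            # 2. transmit (ID in header)

    def on_loss_notification(self, lost_ids):
        # 5. loss retrieval: recover the flow info of the lost packets
        return [self.ring[pid % RING_SIZE] for pid in lost_ids]


class Downstream:
    """Downstream end of the link: checks received IDs for gaps."""

    def __init__(self):
        self.expected = 0

    def receive(self, pid):
        # 3. loss detection: a gap in consecutive IDs means packets were lost
        lost = list(range(self.expected, pid))
        self.expected = pid + 1
        return lost                                     # 4. loss notification


up, down = Upstream(), Downstream()
p0, _ = up.send(("10.0.0.1", "10.0.0.2", 1234, 80, "tcp"))
p1, _ = up.send(("10.0.0.3", "10.0.0.4", 5555, 443, "tcp"))
p2, _ = up.send(("10.0.0.5", "10.0.0.6", 7777, 53, "udp"))
lost = down.receive(p0) + down.receive(p2)   # packet p1 never arrives
print(up.on_loss_notification(lost))         # flow key of the dropped packet
```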
- Congestion, path change and pause detection (see the sketch below):
- Congestion: measure the queuing delay of each packet and select packets whose queuing delay exceeds a threshold as congestion event packets
- Path change: select the first packet of a new flow, or of an existing flow whose egress port has changed, as a path change event packet
- Pause: look up the corresponding queue status in ingress via a queue status detector, and identify a packet as a pause event packet if its queue is paused
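A toy per-packet classifier illustrating the three checks above; the delay threshold, table layouts and function names are assumptions made for illustration rather than NetSeer's ASIC implementation.

```python
QUEUING_DELAY_THRESHOLD_NS = 100_000   # assumed threshold (100 us), not the paper's value

last_egress_port = {}   # flow key -> last observed egress port
paused_queues = set()   # (port, queue) pairs currently paused (e.g. by PFC)


def classify(flow_key, enq_ts_ns, deq_ts_ns, egress_port, queue_id):
    """Return the event types this packet would be selected for."""
    events = []

    # Congestion: queuing delay above the threshold.
    if deq_ts_ns - enq_ts_ns > QUEUING_DELAY_THRESHOLD_NS:
        events.append(("congestion", flow_key, egress_port, queue_id))

    # Path change: first packet of a new flow, or the egress port differs
    # from the one previously recorded for this flow.
    prev_port = last_egress_port.get(flow_key)
    if prev_port is None or prev_port != egress_port:
        events.append(("path_change", flow_key, egress_port))
    last_egress_port[flow_key] = egress_port

    # Pause: the queue this packet maps to is currently paused.
    if (egress_port, queue_id) in paused_queues:
        events.append(("pause", flow_key, egress_port, queue_id))

    return events


print(classify(("10.0.0.1", "10.0.0.2", 1234, 80, "tcp"),
               enq_ts_ns=0, deq_ts_ns=200_000, egress_port=5, queue_id=0))
# -> congestion (200 us of queuing delay) and path_change (first packet of the flow)
```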
- Flow Event Generation & Compression: Aggregate sequential event packets that belong to one flow into a single flow event
- Event packets to flow events: eliminate redundancy by aggregating event packets into flow events and maintaining one counter per flow event, using a deduplication algorithm based on group caching (see the sketch after this sub-list)
- Event information extraction: compress the monitoring traffic volume by extracting only the necessary information from flow events, namely flow headers, switch-port-queue identifiers and event-specific data
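The sketch below is a simplified stand-in for the group-caching deduplication idea: consecutive event packets that share a flow and event type collapse into one flow event carrying a counter. The cache size and FIFO-style eviction are assumptions, not the paper's exact policy.

```python
from collections import OrderedDict

CACHE_SIZE = 8       # assumed number of cache entries; the real value is ASIC-specific
flow_events = []     # emitted (compressed) flow events


class GroupCache:
    def __init__(self):
        self.cache = OrderedDict()   # (flow_key, event_type) -> counter

    def add_event_packet(self, flow_key, event_type):
        key = (flow_key, event_type)
        if key in self.cache:
            self.cache[key] += 1           # duplicate event packet: bump the counter
            return
        if len(self.cache) >= CACHE_SIZE:  # cache full: emit the oldest flow event
            (old_flow, old_type), count = self.cache.popitem(last=False)
            flow_events.append({"flow": old_flow, "type": old_type, "count": count})
        self.cache[key] = 1                # new flow event enters the cache

    def flush(self):
        for (flow, etype), count in self.cache.items():
            flow_events.append({"flow": flow, "type": etype, "count": count})
        self.cache.clear()


gc = GroupCache()
for _ in range(3):   # three consecutive congestion event packets of one flow
    gc.add_event_packet(("10.0.0.1", "10.0.0.2", 1234, 80, "tcp"), "congestion")
gc.flush()
print(flow_events)   # a single flow event with count == 3 instead of three reports
```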
- Circulating Flow Event Batching: batch events, i.e., pack $\ge 1$ events into each packet, to reduce the bandwidth overhead and be friendly to CPUs (see the sketch after this list)
- Design a stack data structure and push each incoming event onto the stack for temporal caching
- Generate circulating event batching packets (CEBPs) that constantly recirculate within the pipeline via a separate internal port
- When a CEBP hits the stack, it pops one event and appends it to its payload
- When the payload length exceeds a threshold or all events have been collected, the CEBP is forwarded to the switch CPU and simultaneously cloned with an empty payload to collect later events
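A software sketch of the batching loop described above, assuming an in-memory stack and an arbitrary batching threshold; the recirculation through an internal port is only emulated by the while loop.

```python
MAX_PAYLOAD_EVENTS = 32   # assumed batching threshold

event_stack = []          # temporal cache for incoming events
to_cpu = []               # batched event packets delivered to the switch CPU


def on_new_event(event):
    event_stack.append(event)        # push each incoming event for caching


def on_cebp_pass(payload):
    """One traversal of the pipeline by the recirculating CEBP."""
    if event_stack:
        payload.append(event_stack.pop())          # pop one event per pass
    if len(payload) >= MAX_PAYLOAD_EVENTS or (payload and not event_stack):
        to_cpu.append(list(payload))               # forward the full batch to the CPU
        return []                                  # the clone restarts with an empty payload
    return payload                                 # keep recirculating


for i in range(5):
    on_new_event({"id": i})

payload = []
while event_stack or payload:        # emulate recirculation via an internal port
    payload = on_cebp_pass(payload)
print(to_cpu)                        # one batched packet carrying all five events
```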
- False Positive Elimination
- Definition of FP: repetitive event reports for the same flow event
- Authors' solution (sketched below):
- Enable the switch pipeline to calculate the hash value in advance and attach it to the event
- The switch CPU can then directly use the precomputed value to index its deduplication table
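A sketch of this division of labor, assuming a hash-indexed table on the switch CPU; Python's built-in hash and the table size stand in for whatever hash the pipeline actually computes.

```python
TABLE_SIZE = 1 << 16          # assumed size of the CPU-side deduplication table
seen = [None] * TABLE_SIZE    # hash-indexed table of recently reported flow events


def pipeline_attach_hash(event):
    """Data plane: compute the hash once and attach it to the event report."""
    event["hash"] = hash((event["flow"], event["type"])) & (TABLE_SIZE - 1)
    return event


def cpu_dedup(event):
    """Switch CPU: use the precomputed hash as a direct table index."""
    idx = event["hash"]
    key = (event["flow"], event["type"])
    if seen[idx] == key:
        return None               # repetitive report for the same flow event
    seen[idx] = key
    return event                  # first report: keep it


ev = pipeline_attach_hash({"flow": ("10.0.0.1", "10.0.0.2"), "type": "inter_switch_drop"})
print(cpu_dedup(ev) is not None)  # True  - first report is kept
print(cpu_dedup(ev) is not None)  # False - duplicate is eliminated
```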
- For troubleshooting network incidents: NetSeer can help network operators claim network innocence or quickly locate NPA causes by comprehensively capturing events.
- Routing error due to a network update: identify path change events caused by the faulty update
- ACL configuration error: discover pipeline drops caused by ACL rules
- Silent drop due to parity error: catch pipeline drops due to table lookup misses and identify the affected flows
- Congestion due to unexpected traffic volume: find the flows that contribute most to congestion by checking MMU congestion drop counters and perform scheduling accordingly
- SSD firmware driver bug: find how many storage packets are dropped and by which switches, accelerating debugging by clearly establishing network responsibility
Strength#
- Full coverage of flow events: leverage the programmability at both sides of a link to collaboratively discover inter-switch drop events and recover the flow information of dropped or corrupted packets
- Good scalability with network sizes:
- fully utilize data plane programmability to identify and aggregate event packets and compress flow events
- batch small-sized event messages into large packets with a novel in-data-plane design of circular-packet-driven event collection
- perform FET in a distributed manner
- High data accuracy: ensures zero FN in event generation and uses the switch CPU with ASIC offloading to discover and eliminate FP with small overhead
Weakness#
- NetSeer cannot cover drops due to ASIC or MMU hardware failures.
- Middleboxes have to follow certain principles, such as inter-device drop awareness, event-based anomaly detection and reliable reporting.