OPSAWG                                                          J. Evans
Internet-Draft                                              O. Pylypenko
Intended status: Informational                                    Amazon
Expires: 9 January 2025                                          J. Haas
                                                        Juniper Networks
                                                               A. Kadosh
                                                     Cisco Systems, Inc.
                                                            M. Boucadair
                                                                  Orange
                                                             8 July 2024


           An Information Model for Packet Discard Reporting
                   draft-ietf-opsawg-discardmodel-02

Abstract

   The primary function of a network is to transport packets and deliver
   them according to a service level objective.  Understanding both
   where and why packet loss occurs within a network is essential for
   effective network operation.  Device-reported packet loss is the most
   direct signal for network operations to identify customer impact
   resulting from unintended packet loss.  This document defines an
   information model for packet loss reporting, which classifies these
   signals to enable automated network mitigation of unintended packet
   loss.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 9 January 2025.

Copyright Notice

   Copyright (c) 2024 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

Evans, et al.            Expires 9 January 2025                 [Page 1]

Internet-Draft   Info. Model for Pkt Discard Reporting         July 2024

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.
   Please review these documents carefully, as they describe your rights
   and restrictions with respect to this document.  Code Components
   extracted from this document must include Revised BSD License text as
   described in Section 4.e of the Trust Legal Provisions and are
   provided without warranty as described in the Revised BSD License.

Table of Contents

   1.  Introduction
   2.  Terminology
   3.  Problem Statement
   4.  Information Model
     4.1.  Requirements
     4.2.  Examples
   5.  Example Signal-Cause-Mitigation Mapping
   6.  YANG Module
   7.  Security Considerations
   8.  IANA Considerations
   9.  Contributors
   10. Acknowledgments
   11. References
     11.1.  Normative References
     11.2.  Informative References
   Appendix A.  Where do packets get dropped?
     A.1.  Discard Class Descriptions
   Appendix B.  Implementation Experience
   Authors' Addresses

1.  Introduction

   In automating network operations, a network operator needs to be able
   to detect anomalous packet loss, diagnose or root cause the loss, and
   then apply one of a set of possible actions to mitigate
   customer-impacting packet loss.  Some packet loss is normal or
   intended in IP/MPLS networks, however.
   Hence, precise classification of packet loss signals is crucial both
   to ensure that anomalous packet loss is easily detected and that the
   right action or sequence of actions is taken to mitigate the impact,
   as taking the wrong action can make problems worse.

   The existing metrics for reporting packet loss, as defined in
   [RFC1213] - namely ifInDiscards, ifOutDiscards, ifInErrors,
   ifOutErrors - do not provide sufficient precision to automatically
   identify the cause of the loss and mitigate the impact.  From a
   network operator's perspective, ifInDiscards can represent both
   intended packet loss (e.g., packets discarded due to policy) and
   unintended packet loss (e.g., packets dropped in error).
   Furthermore, these definitions are ambiguous, as vendors have
   implemented them differently.  In some implementations, ifInErrors
   accounts only for errored packets that are dropped, while in others,
   it accounts for all errored packets, whether they are dropped or not.
   Many implementations support more discard metrics than these; where
   they do, they have been inconsistently implemented due to the lack of
   a standardised classification scheme and clear semantics for packet
   loss reporting.  [RFC7270] provides support for reporting discards
   per flow in IPFIX using forwardingStatus; however, the defined drop
   reason codes also lack sufficient clarity to support automated root
   cause analysis and mitigation of impact.

   Hence, this document defines an information model for packet loss
   reporting, aiming to address these issues by presenting a packet loss
   classification scheme that can enable automated mitigation of
   unintended packet loss.  Consistent with [RFC3444], this information
   model is independent of any specific implementations or protocols
   used to transport the data.
   There are multiple ways that this information model could be
   implemented (i.e., data models), including SNMP [RFC1157], NETCONF
   [RFC6241] / YANG [RFC7950], RESTCONF [RFC8040], and IPFIX [RFC5153].
   However, these mechanisms are out of the scope of this document.

   The scope of this document is limited to reporting packet loss at
   Layer 3 and frames discarded at Layer 2, although the information
   model might be extended in future to cover segments dropped at
   Layer 4.

   Section 3 describes the problem to be solved.  Section 4 describes
   the information model and requirements with a set of examples.
   Section 5 provides examples of discard
   signal-to-cause-to-auto-mitigation action mapping.  Section 6
   presents the information model as an abstract data structure in YANG,
   in accordance with [RFC8791].  Appendix A provides an example of
   where packets may be discarded in a device.  Appendix B details the
   authors' experience from implementing this model.

   This document considers only the signals that may trigger automated
   mitigation plans and not how they are defined or executed.

2.  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in
   BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

   A packet discard is any packet dropped by a device, whether
   intentionally (i.e., due to a configured policy such as an Access
   Control List (ACL)) or unintentionally (i.e., dropped in error).

   The meanings of the symbols in the YANG tree diagrams are defined in
   [RFC8340].

   Outside of tree diagrams, the symbol "|" is used to denote "or".

3.  Problem Statement

   At the highest level, unintended packet loss is the discarding of
   packets that the network operator otherwise intends to deliver, i.e.,
   loss that indicates an error state.  There are many possible reasons
   for unintended packet loss, including: erroring links may corrupt
   packets in transit; incorrect routing tables may result in packets
   being dropped because they do not match a valid route; configuration
   errors may result in a valid packet incorrectly matching an ACL and
   being dropped.  Whilst the specific definition of unintended packet
   loss is network dependent, for any network there is a small set of
   potential actions that can be taken to minimise customer impact by
   auto-mitigating unintended packet loss:

   1.  Take a device, link, or set of devices and/or links out of
       service.

   2.  Return a device, link, or set of devices and/or links back into
       service.

   3.  Move traffic to other links or devices.

   4.  Roll back a recent change to a device that might have caused the
       problem.

   5.  Escalate to a human (e.g., network operator) as a last resort.

   A precise signal of impact is crucial, as taking the wrong action can
   be worse than taking no action.  For example, taking a congested
   device out of service can make congestion worse by moving the traffic
   to other links or devices that are already congested.

   Detecting whether device-reported discards indicate a problem, and
   determining what actions should be taken to mitigate the impact and
   remediate the cause, depends on four primary features of the packet
   loss signal:

   FEATURE-LOSS-CAUSE:  The cause of the loss.

   FEATURE-LOSS-RATE:  The rate and/or degree of the loss.

   FEATURE-LOSS-DURATION:  The duration of the loss.

   FEATURE-LOSS-LOCATION:  The location of the loss.

   Features FEATURE-LOSS-RATE, FEATURE-LOSS-DURATION, and
   FEATURE-LOSS-LOCATION are already addressed with passive monitoring
   statistics, for example, those obtained with SNMP [RFC1157] / MIB-II
   [RFC1213] or NETCONF [RFC6241] / YANG [RFC7950].
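   As a non-normative illustration of how the rate and duration features
   might be derived from such statistics, the Python sketch below
   computes a loss ratio and the length of the current loss episode from
   successive polls of cumulative counters.  The Sample type, its field
   names, and the baseline parameter are hypothetical and are not
   defined by this document; counter wraps and resets are deliberately
   ignored.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Sample:
    """One poll of two cumulative counters (hypothetical names)."""
    timestamp: float  # poll time in seconds
    discards: int     # cumulative packets discarded
    packets: int      # cumulative packets successfully forwarded


def loss_ratio(prev: Sample, curr: Sample) -> float:
    """Fraction of offered packets discarded between two polls.

    Assumes counters only increase (no wrap or reset handling).
    """
    dropped = curr.discards - prev.discards
    forwarded = curr.packets - prev.packets
    offered = dropped + forwarded
    return dropped / offered if offered > 0 else 0.0


def loss_duration(samples: Sequence[Sample], baseline: float) -> float:
    """Seconds for which the most recent consecutive polling intervals
    have shown a loss ratio above the baseline."""
    duration = 0.0
    # Walk the intervals backwards, starting from the most recent one.
    pairs = zip(reversed(samples[:-1]), reversed(samples[1:]))
    for prev, curr in pairs:
        if loss_ratio(prev, curr) <= baseline:
            break
        duration += curr.timestamp - prev.timestamp
    return duration
```

   Rate and duration computed this way say nothing about why packets
   were dropped, which is the gap the classification scheme is meant to
   fill.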
   Feature FEATURE-LOSS-CAUSE, however, is dependent on the
   classification scheme used for packet loss reporting.  The next
   section defines a new classification scheme to address this problem.

4.  Information Model

   The classification scheme is defined as a tree, which follows the
   structure
   component/direction/type/layer/sub-type/sub-sub-type/.../metric,
   where:

   a.  Component can be interface|device|control-plane|flow

   b.  Direction can be ingress|egress

   c.  Type can be traffic|discards, where traffic accounts for packets
       successfully received or transmitted, and discards accounts for
       packet drops

   d.  Layer can be l2|l3

   structure packet-discard-reporting:
     +-- interface* [name]
        +-- name             string
        +-- ingress
        |  +-- traffic
        |  |  +-- l2
        |  |  |  +-- frames?   uint64
        |  |  |  +-- bytes?    uint64
        |  |  +-- l3
        |  |  |  +-- v4
        |  |  |  |  +-- packets?     uint64
        |  |  |  |  +-- bytes?       uint64
        |  |  |  |  +-- unicast
        |  |  |  |  |  +-- packets?   uint64
        |  |  |  |  |  +-- bytes?     uint64
        |  |  |  |  +-- multicast
        |  |  |  |     +-- packets?   uint64
        |  |  |  |     +-- bytes?     uint64
        |  |  |  +-- v6
        |  |  |     +-- packets?     uint64
        |  |  |     +-- bytes?       uint64
        |  |  |     +-- unicast
        |  |  |     |  +-- packets?   uint64
        |  |  |     |  +-- bytes?     uint64
        |  |  |     +-- multicast
        |  |  |        +-- packets?   uint64
        |  |  |        +-- bytes?     uint64
        |  |  +-- qos
        |  |     +-- class* [id]
        |  |        +-- id          string
        |  |        +-- packets?    uint64
        |  |        +-- bytes?      uint64
        |  +-- discards
        |     +-- l2
        |     |  +-- frames?   uint64
        |     |  +-- bytes?    uint64
        |     +-- l3
        |     |  +-- v4
        |     |  |  +-- packets?     uint64
        |     |  |  +-- bytes?       uint64
        |     |  |  +-- unicast
        |     |  |  |  +-- packets?   uint64
        |     |  |  |  +-- bytes?     uint64
        |     |  |  +-- multicast
        |     |  |     +-- packets?   uint64
        |     |  |     +-- bytes?     uint64
        |     |  +-- v6
        |     |     +-- packets?     uint64
        |     |     +-- bytes?       uint64
        |     |     +-- unicast
        |     |     |  +-- packets?   uint64
        |     |     |  +-- bytes?     uint64
        |     |     +-- multicast
        |     |        +-- packets?   uint64
        |     |        +-- bytes?     uint64
        |     +-- errors
        |     |  +-- l2
        |     |  |  +-- rx
        |     |  |     +-- frames?          uint48
        |     |  |     +-- crc-error?       uint48
        |     |  |     +-- invalid-mac?     uint48
        |     |  |     +-- invalid-vlan?    uint48
        |     |  |     +-- invalid-frame?   uint48
        |     |  +-- l3
        |     |  |  +-- rx
        |     |  |  |  +-- packets?          uint48
        |     |  |  |  +-- checksum-error?   uint48
        |     |  |  |  +-- mtu-exceeded?     uint48
        |     |  |  |  +-- invalid-packet?   uint48
        |     |  |  |  +-- ttl-expired?      uint48
        |     |  |  +-- no-route?        uint48
        |     |  |  +-- invalid-sid?     uint48
        |     |  |  +-- invalid-label?   uint48
        |     |  +-- hardware
        |     |     +-- packets?        uint48
        |     |     +-- parity-error?   uint48
        |     +-- policy
        |     |  +-- l2
        |     |  |  +-- frames?   uint48
        |     |  |  +-- acl?      uint48
        |     |  +-- l3
        |     |     +-- packets?      uint48
        |     |     +-- acl?          uint48
        |     |     +-- policer
        |     |     |  +-- packets?   uint48
        |     |     |  +-- bytes?     uint48
        |     |     +-- null-route?   uint48
        |     |     +-- rpf?          uint48
        |     |     +-- ddos?         uint48
        |     +-- no-buffer
        |        +-- class* [id]
        |           +-- id          string
        |           +-- packets?    uint64
        |           +-- bytes?      uint64
        +-- egress
        |  +-- traffic
        |  |  +-- l2
        |  |  |  +-- frames?   uint64
        |  |  |  +-- bytes?    uint64
        |  |  +-- l3
        |  |  |  +-- v4
        |  |  |  |  +-- packets?     uint64
        |  |  |  |  +-- bytes?       uint64
        |  |  |  |  +-- unicast
        |  |  |  |  |  +-- packets?   uint64
        |  |  |  |  |  +-- bytes?     uint64
        |  |  |  |  +-- multicast
        |  |  |  |     +-- packets?   uint64
        |  |  |  |     +-- bytes?     uint64
        |  |  |  +-- v6
        |  |  |     +-- packets?     uint64
        |  |  |     +-- bytes?       uint64
        |  |  |     +-- unicast
        |  |  |     |  +-- packets?   uint64
        |  |  |     |  +-- bytes?     uint64
        |  |  |     +-- multicast
        |  |  |        +-- packets?   uint64
        |  |  |        +-- bytes?     uint64
        |  |  +-- qos
        |  |     +-- class* [id]
        |  |        +-- id          string
        |  |        +-- packets?    uint64
        |  |        +-- bytes?      uint64
        |  +-- discards
        |     +-- l2
        |     |  +-- frames?   uint64
        |     |  +-- bytes?    uint64
        |     +-- l3
        |     |  +-- v4
        |     |  |  +-- packets?     uint64
        |     |  |  +-- bytes?       uint64
        |     |  |  +-- unicast
        |     |  |  |  +-- packets?   uint64
        |     |  |  |  +-- bytes?     uint64
        |     |  |  +-- multicast
        |     |  |     +-- packets?   uint64
        |     |  |     +-- bytes?     uint64
        |     |  +-- v6
        |     |     +-- packets?     uint64
        |     |     +-- bytes?       uint64
        |     |     +-- unicast
        |     |     |  +-- packets?   uint64
        |     |     |  +-- bytes?     uint64
        |     |     +-- multicast
        |     |        +-- packets?   uint64
        |     |        +-- bytes?     uint64
        |     +-- errors
        |     |  +-- l2
        |     |  |  +-- tx
        |     |  |     +-- frames?   uint48
        |     |  +-- l3
        |     |     +-- tx
        |     |        +-- packets?   uint48
        |     +-- policy
        |     |  +-- l3
        |     |     +-- acl?       uint48
        |     |     +-- policer
        |     |        +-- packets?   uint48
        |     |        +-- bytes?     uint48
        |     +-- no-buffer
        |        +-- class* [id]
        |           +-- id          string
        |           +-- packets?    uint64
        |           +-- bytes?      uint64
        +-- control-plane
           +-- ingress
              +-- traffic
              |  +-- packets?   uint48
              |  +-- bytes?     uint48
              +-- discards
                 +-- packets?   uint48
                 +-- bytes?     uint48
                 +-- policy
                    +-- packets?   uint48

   For additional context, Appendix A provides an example of where
   packets may be discarded in a device.

4.1.  Requirements

   Requirements 1-10 relate to packets forwarded by the device;
   requirement 11 relates to packets sent to or from the device itself:

   1.   All instances of frame or packet receipt, transmission, and
        discards MUST be reported.

   2.   All instances of frame or packet receipt, transmission, and
        discards SHOULD be attributed to the physical or logical
        interface of the device where they occur.

   3.   An individual frame MUST only be accounted for by either the L2
        traffic class or the L2 discard classes within a single
        direction, i.e., ingress or egress.

   4.   An individual packet MUST only be accounted for by either the L3
        traffic class or the L3 discard classes within a single
        direction, i.e., ingress or egress.

   5.   A frame accounted for at L2 SHOULD NOT be accounted for at L3
        and vice versa.  An implementation MUST expose which layers a
        discard is counted against.

   6.   The aggregate L2 and L3 traffic and discard classes SHOULD
        account for all underlying packets received, transmitted, and
        discarded across all other classes.

   7.   The aggregate Quality of Service (QoS) traffic and no-buffer
        discard classes MUST account for all underlying packets
        received, transmitted, and discarded across all other classes.

   8.   In addition to the L2 and L3 aggregate classes, an individual
        discarded packet MUST only account against a single error,
        policy, or no-buffer discard subclass.

   9.   When there are multiple reasons for discarding a packet, the
        ordering of discard class reporting MUST be defined.

   10.  If Diffserv [RFC2475] is not used, no-buffer discards SHOULD be
        reported as class_0.

   11.  Traffic to the device control plane has its own class; however,
        traffic from the device control plane SHOULD be accounted for in
        the same way as other egress traffic.

4.2.  Examples

   Assuming all the requirements are met, a "good" unicast IPv4 packet
   received would increment:

   *  interface/ingress/traffic/l3/v4/unicast/packets

   *  interface/ingress/traffic/l3/v4/unicast/bytes

   *  interface/ingress/traffic/qos/class_0/packets

   *  interface/ingress/traffic/qos/class_0/bytes

   A received unicast IPv6 packet discarded due to Hop Limit expiry
   would increment:

   *  interface/ingress/discards/l3/v6/unicast/packets

   *  interface/ingress/discards/l3/v6/unicast/bytes

   *  interface/ingress/discards/errors/l3/rx/ttl-expired

   An IPv4 packet discarded on egress due to no buffers would increment:

   *  interface/egress/discards/l3/v4/unicast/packets

   *  interface/egress/discards/l3/v4/unicast/bytes

   *  interface/egress/discards/no-buffer/class_0/packets

   *  interface/egress/discards/no-buffer/class_0/bytes

5.  Example Signal-Cause-Mitigation Mapping

   Figure 1 gives an example discard signal-to-cause-to-mitigation
   action mapping.  Mappings for a specific network will be dependent on
   the definition of unintended packet loss for that network.

+-------------------------------------------+---------------------+------------+----------+-------------+-----------------------+
| Discard class                             | Cause               | Discard    | Discard  | Unintended? | Possible actions      |
|                                           |                     | rate       | duration |             |                       |
+-------------------------------------------+---------------------+------------+----------+-------------+-----------------------+
| ingress/discards/errors/l2/rx             | Upstream device     | >Baseline  | O(1min)  | Y           | Take upstream link or |
|                                           | or link error       |            |          |             | device out-of-service |
| ingress/discards/errors/l3/rx/ttl-expired | Traceroute          | <=Baseline |          | N           | no action             |
| ingress/discards/errors/l3/rx/ttl-expired | Convergence         | >Baseline  | O(1s)    | Y           | no action             |
| ingress/discards/errors/l3/rx/ttl-expired | Routing loop        | >Baseline  | O(1min)  | Y           | Roll-back change      |
| .*/policy/.*                              | Policy              |            |          | N           | no action             |
| ingress/discards/errors/l3/no-route       | Convergence         | >Baseline  | O(1s)    | Y           | no action             |
| ingress/discards/errors/l3/no-route       | Config error        | >Baseline  | O(1min)  | Y           | Roll-back change      |
| ingress/discards/errors/l3/no-route       | Invalid destination | >Baseline  | O(10min) | N           | Escalate to operator  |
| ingress/discards/errors/hardware          | Device errors       | >Baseline  | O(1min)  | Y           | Take device           |
|                                           |                     |            |          |             | out-of-service        |
| egress/discards/no-buffer                 | Congestion          | <=Baseline |          | N           | no action             |
| egress/discards/no-buffer                 | Congestion          | >Baseline  | O(1min)  | Y           | Bring capacity back   |
|                                           |                     |            |          |             | into service or move  |
|                                           |                     |            |          |             | traffic               |
+-------------------------------------------+---------------------+------------+----------+-------------+-----------------------+

            Figure 1: Example Signal-Cause-Mitigation Mapping

   The 'Baseline' in the 'Discard rate' column is network dependent.

6.  YANG Module

   The "ietf-packet-discard-reporting" module uses the "structure"
   extension defined in [RFC8791].
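   Before the module itself, the following non-normative Python sketch
   illustrates how a consumer of counters named per this structure might
   evaluate the example mapping of Figure 1.  The Rule type, the
   classify function, and the numeric thresholds standing in for O(1s),
   O(1min), and O(10min) are assumptions made for illustration only;
   class paths are written with the node names used in the tree of
   Section 4.  When several rows match, the sketch picks the row with
   the longest duration threshold that has been met.

```python
import re
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Rule:
    """One row of the example mapping (hypothetical representation)."""
    pattern: str                     # regex prefix over the class path
    above_baseline: Optional[bool]   # True: >Baseline, False: <=Baseline,
                                     # None: any rate
    min_duration_s: float            # assumed threshold for O(...) column
    cause: str
    unintended: bool
    action: str


TTL = r"ingress/discards/errors/l3/rx/ttl-expired"
NO_ROUTE = r"ingress/discards/errors/l3/no-route"
NO_BUF = r"egress/discards/no-buffer"

# Rows transcribed from Figure 1; 1, 60 and 600 seconds are assumptions.
RULES: Sequence[Rule] = (
    Rule(r"ingress/discards/errors/l2/rx", True, 60,
         "Upstream device or link error", True,
         "Take upstream link or device out-of-service"),
    Rule(TTL, False, 0, "Traceroute", False, "no action"),
    Rule(TTL, True, 1, "Convergence", True, "no action"),
    Rule(TTL, True, 60, "Routing loop", True, "Roll-back change"),
    Rule(r".*/policy/.*", None, 0, "Policy", False, "no action"),
    Rule(NO_ROUTE, True, 1, "Convergence", True, "no action"),
    Rule(NO_ROUTE, True, 60, "Config error", True, "Roll-back change"),
    Rule(NO_ROUTE, True, 600, "Invalid destination", False,
         "Escalate to operator"),
    Rule(r"ingress/discards/errors/hardware", True, 60,
         "Device errors", True, "Take device out-of-service"),
    Rule(NO_BUF, False, 0, "Congestion", False, "no action"),
    Rule(NO_BUF, True, 60, "Congestion", True,
         "Bring capacity back into service or move traffic"),
)


def classify(path: str, rate: float, baseline: float,
             duration_s: float) -> Optional[Rule]:
    """Return the best-matching row, or None if no row applies yet."""
    above = rate > baseline
    candidates = [
        r for r in RULES
        # A rule also covers sub-classes below its path.
        if re.fullmatch(r.pattern + r"(/.*)?", path)
        and r.above_baseline in (None, above)
        and duration_s >= r.min_duration_s
    ]
    # Prefer the row whose duration threshold is the longest one met.
    return max(candidates, key=lambda r: r.min_duration_s, default=None)
```

   For instance, under these assumed thresholds, TTL-expired discards
   above baseline for 90 seconds would select the "Routing loop" row,
   whereas the same signal lasting 2 seconds would select "Convergence".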
   <CODE BEGINS> file "ietf-packet-discard-reporting@2024-07-04.yang"
   module ietf-packet-discard-reporting {
     yang-version 1.1;
     namespace
       "urn:ietf:params:xml:ns:yang:ietf-packet-discard-reporting";
     prefix plr;

     import ietf-yang-structure-ext {
       prefix sx;
       reference
         "RFC 8791: YANG Data Structure Extensions";
     }

     organization
       "IETF OPSAWG (Operations and Management Area Working Group)";
     contact
       "WG Web:   https://datatracker.ietf.org/wg/opsawg/
        WG List:  mailto:opsawg@ietf.org

        Author:   John Evans
        Author:   Oleksandr Pylypenko
        Author:   Jeffrey Haas
        Author:   Aviran Kadosh
        Author:   Mohamed Boucadair";
     description
       "This module defines an information model for packet discard
        reporting.

        Copyright (c) 2024 IETF Trust and the persons identified as
        authors of the code.  All rights reserved.

        Redistribution and use in source and binary forms, with or
        without modification, is permitted pursuant to, and subject
        to the license terms contained in, the Revised BSD License
        set forth in Section 4.c of the IETF Trust's Legal Provisions
        Relating to IETF Documents
        (https://trustee.ietf.org/license-info).

        This version of this YANG module is part of RFC XXXX; see
        the RFC itself for full legal notices.";

     revision 2024-06-04 {
       description
         "Initial revision.";
       reference
         "RFC XXXX: An Information Model for Packet Discard Reporting";
     }

     typedef uint48 {
       type uint64 {
         range "0..281474976710655";
       }
       description
         "48-bit unsigned integer type";
     }

     typedef uint48-or-64 {
       type union {
         type uint48;
         type uint64;
       }
       description
         "Union type representing either a 48-bit or 64-bit unsigned
          integer.  48-bit counters are used for packet and discard
          counters that increase at a lower rate, while 64-bit
          counters are used for traffic byte counters that may
          increase more rapidly.";
     }

     /*
      * Groupings
      */

     grouping basic-packets-64 {
       description
         "Basic grouping with 64-bit packets";
       leaf packets {
         type uint64;
         description
           "Number of L3 packets";
       }
     }

     grouping basic-packets-bytes-64 {
       description
         "Basic grouping with 64-bit packets and bytes";
       uses basic-packets-64;
       leaf bytes {
         type uint64;
         description
           "Number of L3 bytes";
       }
     }

     grouping basic-frames-64 {
       description
         "Basic grouping with 64-bit frames";
       leaf frames {
         type uint64;
         description
           "Number of L2 frames";
       }
     }

     grouping basic-frames-bytes-64 {
       description
         "Basic grouping with 64-bit frames and bytes";
       uses basic-frames-64;
       leaf bytes {
         type uint64;
         description
           "Number of L2 bytes";
       }
     }

     grouping basic-packets-48 {
       description
         "Basic grouping with 48-bit packets";
       leaf packets {
         type uint48;
         description
           "Number of L3 packets";
       }
     }

     grouping basic-packets-bytes-48 {
       description
         "Basic grouping with 48-bit packets and bytes";
       uses basic-packets-48;
       leaf bytes {
         type uint48;
         description
           "Number of L3 bytes";
       }
     }

     grouping basic-frames-48 {
       description
         "Basic grouping with 48-bit frames";
       leaf frames {
         type uint48;
         description
           "Number of L2 frames";
       }
     }

     grouping basic-frames-bytes-48 {
       description
         "Basic grouping with 48-bit frames and bytes";
       uses basic-frames-48;
       leaf bytes {
         type uint48;
         description
           "Number of L2 bytes";
       }
     }

     grouping l2-traffic {
       description
         "Layer 2 traffic counters";
       uses basic-frames-bytes-64;
     }

     grouping ip {
       description
         "IP traffic counters";
       uses basic-packets-bytes-64;
       container unicast {
         description
           "Unicast traffic counters";
         uses basic-packets-bytes-64;
       }
       container multicast {
         description
           "Multicast traffic counters";
         uses basic-packets-bytes-64;
       }
     }

     grouping l3-traffic {
       description
         "Layer 3 traffic counters";
       container v4 {
         description
           "IPv4 traffic counters";
         uses ip;
       }
       container v6 {
         description
           "IPv6 traffic counters";
         uses ip;
       }
     }

     grouping qos {
       description
         "Quality of Service (QoS) traffic counters";
       list class {
         key "id";
         min-elements 1;
         description
           "QoS class traffic counters";
         leaf id {
           type string;
           description
             "QoS class identifier";
         }
         uses basic-packets-bytes-64;
       }
     }

     grouping traffic {
       description
         "Traffic counters";
       container l2 {
         description
           "Layer 2 traffic counters";
         uses l2-traffic;
       }
       container l3 {
         description
           "Layer 3 traffic counters";
         uses l3-traffic;
       }
       container qos {
         description
           "Quality of Service (QoS) traffic counters";
         uses qos;
       }
     }

     grouping control-plane {
       description
         "Control plane packet counters";
       container ingress {
         description
           "Control plane ingress counters";
         container traffic {
           description
             "Control plane ingress traffic counters";
           uses basic-packets-bytes-48;
         }
         container discards {
           description
             "Control plane ingress packet discard counters";
           uses basic-packets-bytes-48;
           container policy {
             description
               "Number of control plane packets discarded due to
                policy";
             uses basic-packets-48;
           }
         }
       }
     }

     grouping errors-l2-rx {
       description
         "Layer 2 ingress frame errors";
       container rx {
         description
           "Layer 2 ingress frame error counters";
         leaf frames {
           type uint48;
           description
             "Number of errored L2 frames";
         }
         leaf crc-error {
           type uint48;
           description
             "Number of frames received with CRC error";
         }
         leaf invalid-mac {
           type uint48;
           description
             "Number of frames received with invalid MAC address";
         }
         leaf invalid-vlan {
           type uint48;
           description
             "Number of frames received with invalid VLAN tag";
         }
         leaf invalid-frame {
           type uint48;
           description
             "Number of invalid frames received";
         }
       }
     }

     grouping errors-l3-rx {
       description
         "Layer 3 ingress packet error counters";
       container rx {
         description
           "Layer 3 ingress packet receive error counters";
         leaf packets {
           type uint48;
           description
             "Number of errored L3 packets";
         }
         leaf checksum-error {
           type uint48;
           description
             "Number of packets received with checksum error";
         }
         leaf mtu-exceeded {
           type uint48;
           description
             "Number of packets received exceeding MTU";
         }
         leaf invalid-packet {
           type uint48;
           description
             "Number of invalid packets received";
         }
         leaf ttl-expired {
           type uint48;
           description
             "Number of packets received with expired TTL";
         }
       }
       leaf no-route {
         type uint48;
         description
           "Number of packets with no route";
       }
       leaf invalid-sid {
         type uint48;
         description
           "Number of packets with invalid SID";
       }
       leaf invalid-label {
         type uint48;
         description
           "Number of packets with invalid label";
       }
     }

     grouping errors-l3-hw {
       description
         "Hardware error counters";
       leaf packets {
         type uint48;
         description
           "Number of local errored packets";
       }
       leaf parity-error {
         type uint48;
         description
           "Number of packets with parity error";
       }
     }

     grouping errors-rx {
       description
         "Ingress error counters";
       container l2 {
         description
           "Layer 2 received frame error counters";
         uses errors-l2-rx;
       }
       container l3 {
         description
           "Layer 3 received packet error counters";
         uses errors-l3-rx;
       }
       container hardware {
         description
           "Hardware error counters";
         uses errors-l3-hw;
       }
     }

     grouping errors-l2-tx {
       description
         "Layer 2 transmit error counters";
       container tx {
         description
           "Layer 2 transmit frame error counters";
         leaf frames {
           type uint48;
           description
             "Number of errored L2 frames during transmission";
         }
       }
     }

     grouping errors-l3-tx {
       description
         "Layer 3 transmit error counters";
       container tx {
         description
           "Layer 3 transmit packet error counters";
         leaf packets {
           type uint48;
           description
             "Number of errored L3 packets during transmission";
         }
       }
     }

     grouping errors-tx {
       description
         "Egress error counters";
       container l2 {
         description
           "Layer 2 transmit frame error counters";
         uses errors-l2-tx;
       }
       container l3 {
         description
           "Layer 3 transmit packet error counters";
         uses errors-l3-tx;
       }
     }

     grouping policy-l2-rx {
       description
         "Layer 2 policy ingress packet discard counters";
       leaf frames {
         type uint48;
         description
           "Number of L2 frames discarded due to policy";
       }
       leaf acl {
         type uint48;
         description
           "Number of frames discarded due to L2 ACL";
       }
     }

     grouping policy-l3-rx {
       description
         "Layer 3 policy ingress packet discard counters";
       leaf packets {
         type uint48;
         description
           "Number of L3 packets discarded due to policy";
       }
       leaf acl {
         type uint48;
         description
           "Number of packets discarded due to L3 ACL";
       }
       container policer {
         description
           "Policer ingress packet discard counters";
         uses basic-packets-bytes-48;
       }
       leaf null-route {
         type uint48;
         description
           "Number of packets discarded due to null route";
       }
       leaf rpf {
         type uint48;
         description
           "Number of packets discarded due to RPF check failure";
       }
       leaf ddos {
         type uint48;
         description
           "Number of packets discarded due to DDoS protection";
       }
     }

     grouping policy-rx {
       description
         "Policy-related ingress packet discard counters";
       container l2 {
         description
           "Layer 2 policy ingress packet discard counters";
         uses policy-l2-rx;
       }
       container l3 {
         description
           "Layer 3 policy ingress packet discard counters";
         uses policy-l3-rx;
       }
     }

     grouping policy-l3-tx {
       description
         "Layer 3 policy egress packet discard counters";
       leaf acl {
         type uint48;
         description
           "Number of packets discarded due to L3 egress ACL";
       }
       container policer {
         description
           "Policer egress packet discard counters";
         uses basic-packets-bytes-48;
       }
     }

     grouping policy-tx {
       description
         "Policy-related egress packet discard counters";
       container l3 {
         description
           "Layer 3 policy egress packet discard counters";
         uses policy-l3-tx;
       }
     }

     grouping interface {
       description
         "Interface-level packet loss counters";
       container ingress {
         description
           "Ingress counters";
         container traffic {
           description
             "Ingress traffic counters";
           uses traffic;
         }
         container discards {
           description
             "Ingress packet discard counters";
           container l2 {
             description
               "Layer 2 ingress discards traffic counters";
             uses l2-traffic;
           }
           container l3 {
             description
               "Layer 3 ingress discards traffic counters";
             uses l3-traffic;
           }
           container errors {
             description
               "Ingress packet error counters";
             uses errors-rx;
           }
           container policy {
             description
               "Policy-related ingress packet discard counters";
             uses policy-rx;
           }
           container no-buffer {
             description
               "Ingress packet discard counters due to buffer
                unavailability";
             uses qos;
           }
         }
       }
       container egress {
         description
           "Egress counters";
         container traffic {
           description
             "Egress traffic counters";
           uses traffic;
         }
         container discards {
           description
             "Egress packet discard counters";
           container l2 {
             description
               "Layer 2 egress packet discard counters";
             uses l2-traffic;
           }
           container l3 {
             description
               "Layer 3 egress packet discard counters";
             uses l3-traffic;
           }
           container errors {
             description
               "Egress packet error counters";
             uses errors-tx;
           }
           container policy {
             description
               "Policy-related egress packet discard counters";
             uses policy-tx;
           }
           container no-buffer {
             description
               "Egress packet discard counters due to buffer
                unavailability";
             uses qos;
           }
         }
       }
       container control-plane {
         description
           "Control plane packet counters";
         uses control-plane;
       }
     }

     /*
      * Main Structure
      */

     sx:structure packet-discard-reporting {
       description
         "Container for packet discard reporting data.";
       list interface {
         key "name";
         description
           "List of interfaces for which packet discard reporting
            data is provided.";
         leaf name {
           type string;
           description
             "Name of the interface.";
         }
         uses interface;
       }
     }
   }
   <CODE ENDS>

7.  Security Considerations

   This document defines a YANG module using [RFC8791].  As such, this
   document does not define data nodes.  Following the guidance in
   Section 3.7 of [I-D.ietf-netmod-rfc8407bis], the YANG security
   template is not used.

8.  IANA Considerations

   IANA is requested to register the following URI in the "ns"
   subregistry within the "IETF XML Registry" [RFC3688]:

      URI:  urn:ietf:params:xml:ns:yang:ietf-packet-discard-reporting
      Registrant Contact:  The IESG.
      XML:  N/A; the requested URI is an XML namespace.

   IANA is requested to register the following YANG module in the "YANG
   Module Names" subregistry [RFC6020] within the "YANG Parameters"
   registry:

      Name:  ietf-packet-discard-reporting
      Namespace:  urn:ietf:params:xml:ns:ietf-packet-discard-reporting
      Prefix:  plr
      Maintained by IANA?  N
      Reference:  RFC XXXX

9.  Contributors

   Nadav Chachmon
   Cisco Systems, Inc.
   170 West Tasman Dr.
   San Jose, CA 95134
   United States of America
   Email: nchachmo@cisco.com

10.  Acknowledgments

   The content of this draft has benefitted from feedback from JR
   Rivers, Ronan Waide, Chris DeBruin, and Marcoz Sanz.

11.  References

11.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC3688]  Mealling, M., "The IETF XML Registry", BCP 81, RFC 3688,
              DOI 10.17487/RFC3688, January 2004.

   [RFC6020]  Bjorklund, M., Ed., "YANG - A Data Modeling Language for
              the Network Configuration Protocol (NETCONF)", RFC 6020,
              DOI 10.17487/RFC6020, October 2010.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174,
              May 2017.

   [RFC8791]  Bierman, A., Björklund, M., and K. Watsen, "YANG Data
              Structure Extensions", RFC 8791, DOI 10.17487/RFC8791,
              June 2020.

11.2.  Informative References

   [I-D.ietf-netmod-rfc8407bis]
              Bierman, A., Boucadair, M., and Q. Wu, "Guidelines for
              Authors and Reviewers of Documents Containing YANG Data
              Models", Work in Progress, Internet-Draft,
              draft-ietf-netmod-rfc8407bis-14, 5 July 2024.

   [RED93]    Jacobson, V., "Random Early Detection gateways for
              Congestion Avoidance", n.d.

   [RFC1157]  Case, J., Fedor, M., Schoffstall, M., and J.
              Davin, "Simple Network Management Protocol (SNMP)",
              RFC 1157, DOI 10.17487/RFC1157, May 1990.

   [RFC1213]  McCloghrie, K. and M. Rose, "Management Information Base
              for Network Management of TCP/IP-based internets:
              MIB-II", STD 17, RFC 1213, DOI 10.17487/RFC1213,
              March 1991.

   [RFC2475]  Blake, S., Black, D., Carlson, M., Davies, E., Wang, Z.,
              and W. Weiss, "An Architecture for Differentiated
              Services", RFC 2475, DOI 10.17487/RFC2475,
              December 1998.

   [RFC3444]  Pras, A. and J. Schoenwaelder, "On the Difference between
              Information Models and Data Models", RFC 3444,
              DOI 10.17487/RFC3444, January 2003.

   [RFC5153]  Boschi, E., Mark, L., Quittek, J., Stiemerling, M., and
              P. Aitken, "IP Flow Information Export (IPFIX)
              Implementation Guidelines", RFC 5153,
              DOI 10.17487/RFC5153, April 2008.

   [RFC6241]  Enns, R., Ed., Bjorklund, M., Ed., Schoenwaelder, J.,
              Ed., and A. Bierman, Ed., "Network Configuration
              Protocol (NETCONF)", RFC 6241, DOI 10.17487/RFC6241,
              June 2011.

   [RFC7270]  Yourtchenko, A., Aitken, P., and B. Claise,
              "Cisco-Specific Information Elements Reused in IP Flow
              Information Export (IPFIX)", RFC 7270,
              DOI 10.17487/RFC7270, June 2014.

   [RFC7950]  Bjorklund, M., Ed., "The YANG 1.1 Data Modeling
              Language", RFC 7950, DOI 10.17487/RFC7950, August 2016.

   [RFC8040]  Bierman, A., Bjorklund, M., and K. Watsen, "RESTCONF
              Protocol", RFC 8040, DOI 10.17487/RFC8040, January 2017.

   [RFC8289]  Nichols, K., Jacobson, V., McGregor, A., Ed., and J.
              Iyengar, Ed., "Controlled Delay Active Queue
              Management", RFC 8289, DOI 10.17487/RFC8289,
              January 2018.

   [RFC8340]  Bjorklund, M. and L. Berger, Ed., "YANG Tree Diagrams",
              BCP 215, RFC 8340, DOI 10.17487/RFC8340, March 2018.

Appendix A.  Where do packets get dropped?

   Figure 2 depicts an example of where and why packets may be
   discarded in a typical single-ASIC, shared-buffer device, with
   packets entering on the left and leaving on the right.

                                                     +----------+
                                                     |          |
                                                     |   CPU    |
                                                     |          |
                                                     +--+---^---+
                                               from_cpu |   | to_cpu
                                                        |   |
                         +------------------------------v---+-------------------------------+
                         |                                                                  |
           +----------+  +----------+  +----------+  +----------+  +----------+  +----------+  +----------+
           |          |  |          |  |          |  |          |  |          |  |          |  |          |
Packet rx ->   Phy    +-->   Mac    +-->  Ingress +-->  Buffers +-->  Egress  +-->   Mac    +-->   Phy    |> Packet tx
           |          |  |          |  |  Pipeline|  |          |  |  Pipeline|  |          |  |          |
           +----------+  +----------+  +----------+  +----------+  +----------+  +----------+  +----------+

Intended                               policy/acl                  policy/acl
Discards:                              policy/policer              policy/policer
                                       policy/urpf
                                       policy/null_route

Unintended               error/l2/rx   error/l3/rx   no_buffer     error/l3/tx
Discards:                              error/local
                                       error/l3/no_route
                                       error/l3/rx/ttl_expired

             Figure 2: Example of where packets get dropped

A.1.  Discard Class Descriptions

   discards/policy/:  These are intended discards, meaning packets
      dropped by a device due to a configured policy.  There are
      multiple sub-classes.

   discards/error/l2/rx/:  Frames discarded due to errors in the
      received L2 frame.  There are multiple sub-classes, such as
      those resulting from a failing CRC, invalid header, invalid MAC
      address, or invalid VLAN.

   discards/error/l3/rx/:  Discards that occur due to errors in the
      received packet, indicating an upstream problem rather than an
      issue with the device that drops the errored packets.  There are
      multiple sub-classes, including header checksum errors, MTU
      exceeded, and invalid packet, i.e., due to an incorrect version,
      incorrect header length, or invalid options.

   discards/error/l3/rx/ttl_expired:  There can be multiple causes for
      TTL-expired (or Hop Limit exceeded) discards: i) traceroute; ii)
      TTL (Hop Limit) set too low by the end-system; iii) routing
      loops.

   discards/error/l3/no_route/:  Discards that occur because a packet
      does not match any route.

   discards/error/local/:  A device may discard packets within its
      switching pipeline due to internal errors, such as parity
      errors.  Any errored discards not explicitly assigned to the
      above classes are also accounted for here.

   discards/no_buffer/:  Discards that occur because no buffer is
      available to enqueue the packet.  These can be tail-drop
      discards or discards due to an active queue management
      algorithm, such as RED [RED93] or CoDel [RFC8289].

Appendix B.  Implementation Experience

   This appendix captures the authors' experience gained from
   implementing and applying this information model across multiple
   vendors' platforms, as guidance for future implementers.

   1.  The number and granularity of classes described in Section 4
       represent a compromise.  The aim is to offer sufficient detail
       to enable appropriate automated actions while avoiding
       excessive detail, which may hinder quick problem
       identification.  Limiting the number of classes also
       constrains the volume of data produced per interface and the
       impact on device CPU.  Although further granularity is
       possible, the scheme described has generally proven to be
       sufficient for the task of auto-mitigating unintended packet
       loss.

   2.  There are many possible ways to define the discard
       classification tree.  For example, we could have used a
       multi-rooted tree, rooted in each protocol.  Instead, we opted
       to define a tree where protocol discards and causal discards
       are accounted for orthogonally.  This decision reduces the
       number of combinations of classes and has proven sufficient
       for determining mitigation actions.

   3.
       NoBuffer discards can be realized differently with different
       memory architectures.  Whether a NoBuffer discard is attributed
       to ingress or egress can differ accordingly.  For successful
       auto-mitigation, discards due to egress interface congestion
       should be reported on egress, while discards due to
       device-level congestion (e.g., due to exceeding the device
       forwarding rate) should be reported on ingress.

   4.  Platforms often account for the number of packets discarded
       where the TTL has expired (or Hop Limit exceeded) and the
       device CPU has returned an ICMP Time Exceeded message.
       However, a policer is typically applied to limit the number of
       packets sent to the device CPU, which implicitly limits the
       rate of TTL discards that are processed.  One method to
       account for all packet discards due to TTL expiry, even those
       that are dropped by a policer when being forwarded to the CPU,
       is to count all ingress packets received with TTL=1.

   5.  Where no-route discards are implemented with a default null
       route, separate discard accounting is required for any
       explicit null routes configured, in order to differentiate
       between
       interface/ingress/discards/policy/null_route/packets and
       interface/ingress/discards/errors/no_route/packets.

   6.  It is useful to account separately for transit packets
       discarded by ACLs or policers, and packets discarded by ACLs
       or policers which limit the number of packets to the device
       control plane.

   7.  It is not possible to identify a configuration error (e.g.,
       when discards reported as intended are in fact unintended)
       with device packet loss metrics alone.  Additional context is
       needed to determine whether ACL discards are intended or due
       to a misconfigured ACL, for example, through configuration
       validation before deployment or by detecting a significant
       change in ACL discards after a configuration change compared
       to before.

   8.
       Where traffic byte counters need to be 64-bit, packet and
       discard counters that increase at a lower rate may be encoded
       in fewer bits, e.g., 48-bit.

   9.  Aggregate counters need to be able to deal with the
       possibility of discontinuities in the underlying counters.

   10. In cases where the reporting device is the source or
       destination of a tunnel, the ingress protocol for a packet may
       differ from the egress protocol, for example, if IPv4 is
       tunneled over IPv6.  Some implementations may attribute egress
       discards to the ingress protocol.

   11. While the classification tree is seven layers deep, a minimal
       implementation may only implement the top six layers.

Authors' Addresses

   John Evans
   Amazon
   1 Principal Place, Worship Street
   London
   EC2A 2FA
   United Kingdom
   Email: jevanamz@amazon.co.uk

   Oleksandr Pylypenko
   Amazon
   410 Terry Ave N
   Seattle, WA 98109
   United States of America
   Email: opyl@amazon.com

   Jeffrey Haas
   Juniper Networks
   1133 Innovation Way
   Sunnyvale, CA 94089
   United States of America
   Email: jhaas@juniper.net

   Aviran Kadosh
   Cisco Systems, Inc.
   170 West Tasman Dr.
   San Jose, CA 95134
   United States of America
   Email: akadosh@cisco.com

   Mohamed Boucadair
   Orange
   France
   Email: mohamed.boucadair@orange.com
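
   Illustrative sketch for Appendix B, items 8 and 9 (not part of the
   information model): those items note that discard counters may be
   encoded in 48 bits and that aggregate counters need to cope with
   discontinuities in the underlying counters.  The Python sketch
   below shows one way a collector might handle both, assuming each
   poll also carries a discontinuity timestamp in the style of the
   "discontinuity-time" leaf of ietf-interfaces.  The Sample record,
   its field names, and the delta() and aggregate() helpers are
   hypothetical and are not defined by this document.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Sequence

COUNTER_BITS = 48              # assumed width of packet/discard counters
MODULUS = 1 << COUNTER_BITS    # a 48-bit counter wraps at 2**48


@dataclass(frozen=True)
class Sample:
    """One poll of a discard counter (hypothetical record layout)."""
    value: int               # raw counter value, 0 .. 2**48 - 1
    discontinuity_time: int  # time of the counter's last reset


def delta(prev: Sample, curr: Sample) -> Optional[int]:
    """Discards counted between two polls.

    Returns None when the counter was reset between the polls, because
    the difference of the raw values is then meaningless.
    """
    if curr.discontinuity_time != prev.discontinuity_time:
        return None
    # Modular subtraction absorbs a single wrap of the 48-bit counter.
    return (curr.value - prev.value) % MODULUS


def aggregate(series: Iterable[Sequence[Sample]]) -> int:
    """Sum per-interval deltas over several counters, skipping any
    interval that spans a discontinuity."""
    total = 0
    for samples in series:
        for prev, curr in zip(samples, samples[1:]):
            d = delta(prev, curr)
            if d is not None:
                total += d
    return total
```

   Skipping the interval that spans a reset undercounts slightly but
   avoids the large spurious jump that a naive subtraction would
   report; a collector could equally flag such intervals for the
   operator instead of dropping them.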