Path Monitoring System/Head-end based MPLS Ping and Traceroute in Inter-domain Segment Routing Networks

Internet-Draft	Inter-as-OAM	February 2024
Hegde, et al.	Expires 22 August 2024

Abstract

The Segment Routing (SR) architecture leverages source routing and tunneling paradigms and can be directly applied to the use of a Multiprotocol Label Switching (MPLS) data plane. A network may consist of multiple IGP domains or multiple Autonomous Systems (ASes) under the control of the same organization. It is useful to have the Label Switched Path (LSP) ping and traceroute procedures when an SR end-to-end path spans multiple ASes or domains. This document describes mechanisms to facilitate LSP ping and traceroute in inter-AS/inter-domain SR-MPLS networks in an efficient manner with a simple Operations, Administration, and Maintenance (OAM) protocol extension that uses data plane forwarding alone for forwarding echo replies on transit nodes.

1. Introduction

Many networks consist of multiple ASes, either for ease of operations or as a result of network mergers and acquisitions. Segment Routing can be deployed in such scenarios to provide end-to-end paths that traverse multiple ASes. These paths consist of Segment Identifiers (SIDs) of different types as per [RFC8402].

[RFC8660] specifies Segment Routing with an MPLS data plane. [RFC9087] describes BGP peering SIDs, which help in steering packets from one AS to another. Using the above SR capabilities, paths that span multiple ASes can be created.

                    +----------------+
                    | Controller/PMS |
                    +----------------+



 |---AS1-----|                |------AS2------|            |----AS3---|

                ASBR2----ASBR3                ASBR5------ASBR7
                /             \               /            \
               /               \             /              \
 PE1----P1---P2                 P3---P4---PE4              P5---P6--PE5
               \               /            \               /
                \             /              \             /
                 ASBR1----ASBR4              ASBR6------ASBR8

    Autonomous System: AS1, AS2, AS3
    Provider Edge: PE1, PE4, PE5
    Provider: P1, P2, P3, P4, P5, P6
    AS Boundary Router:ASBR1, ASBR2, ASBR3, ASBR4,
                       ASBR5, ASBR6, ASBR7, ASBR8

Figure 1: Inter-AS Segment Routing Topology

For example, Figure 1 describes an inter-AS network scenario consisting of ASes AS1, AS2 and AS3. AS1, AS2 and AS3 are Segment Routing enabled, and the egress links have PeerNode SID/PeerAdj SID/PeerSet SID configured and advertised via [RFC9086]. PeerNode SID/PeerAdj SID/PeerSet SID are referred to as Egress Peer Engineering SIDs (EPE-SIDs) in this document. The controller or the head-end can build an end-to-end Traffic-Engineered path consisting of Node-SIDs, Adjacency-SIDs and EPE-SIDs. It is useful for operators to be able to perform LSP ping and traceroute procedures on these inter-AS SR-MPLS paths, in order to detect and diagnose failed deliveries and to determine the actual path that traffic takes through the network. LSP ping/traceroute procedures rely on IP connectivity for the echo reply to reach the head-end. In inter-AS networks, IP connectivity may not be available from each router in the path. For example, in Figure 1, P3 and P4 may not have IP connectivity to PE1.

As a result, the existing LSP ping and traceroute mechanisms ([RFC8287] and [RFC8029]) cannot be used on these paths to verify basic connectivity and isolate faults. That is because there might not always be IP connectivity from a responding node back to the source address of the ping packet when the responding node is in a different AS from the source of the ping.

[RFC8403] describes mechanisms to carry out MPLS ping/traceroute from a Path Monitoring System (PMS). It is possible to build GRE tunnels or static routes to each router in the network to get IP connectivity for the reverse path. This mechanism is operationally very heavy and requires the PMS to be capable of building a huge number of GRE tunnels or installing the necessary static routes, which may not be feasible.¶

[RFC7743] describes an Echo-relay based solution that advertises a new Relay Node Address Stack TLV containing a stack of Echo-relay IP addresses. These mechanisms can be applied to SR networks as well. The [RFC7743] mechanism requires the return ping packet to be processed on the slow path or as a bump-in-the-wire on every relay node. The motivation of the current document is to provide an alternate mechanism for ping/traceroute in inter-domain segment-routing networks. The term "domain" as used in this document is defined in Section 1.1.

This document describes a new mechanism that is efficient and simple and can be easily deployed in SR-MPLS networks. This mechanism uses MPLS paths and no changes are required in the forwarding path. Any MPLS-capable node will be able to forward the echo-reply packet in the fast path. The current document describes a mechanism that uses the Reply Path TLV [RFC7110] to convey the reverse path. Three new sub-TLVs are defined for the Reply Path TLV that facilitate encoding an SR label stack. The return path can be derived by a smart application or controller that has a full topology view. This document also proposes mechanisms to derive the return path dynamically during traceroute procedures.

The current document is focused on the inter-domain use case. However, the protocol extensions described in this document may be applied to indicate the return path for other use cases as well.¶

1.1. Definition of Domain

The term domain used in this document implies an IGP domain where every node is visible to every other node for shortest path computation. The domain implies an IGP area or level. An AS consists of one or more IGP domains. The procedures described in this document apply to paths built across multiple domains, which include inter-area as well as inter-AS paths. The procedures and deployment scenarios described in this document apply to inter-AS paths where the participating ASes belong to closely coordinating administrations or to a single ownership. This document applies to SR-MPLS networks where all nodes in each of the domains are SR capable. It also applies to SR-MPLS networks where SR acts as an overlay on top of SR-incapable underlay nodes. In such networks, the traceroute procedure is executed only on the overlay SR nodes.

1.2. Requirements Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.¶

4. Segment Sub-TLV

[RFC9256] defines various types of segments. The types of segments applicable to this document have been defined in this section for the use of MPLS OAM. The intention was to keep the definitions as close to those in [RFC9256] as possible with modifications only when needed. One or more Segment Sub-TLVs can be included in the Reply Path TLV. The Segment Sub-TLVs included in a Reply Path TLV MAY be of different types.¶

The following types of Segment Sub-TLVs are applicable to the Reply Path TLV:

Type-A: SID only, in the form of MPLS Label

Type-C: IPv4 Node Address with optional SID for SR-MPLS

Type-D: IPv6 Node Address with optional SID for SR-MPLS

4.1. Type-A: SID only, in the form of MPLS Label

The Type-A Segment Sub-TLV encodes a single SID in the form of an MPLS label. The format is as follows:


    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Type                      |   Length                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Flags       |   RESERVED                                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          Label                        | TC  |S|       TTL     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Figure 3: Type-A Segment Sub-TLV

where:¶

Type: TBD1 (to be assigned by IANA from the registry "Sub-TLVs for TLV Types 1, 16, and 21").

Length is 8.¶

Flags: 1 octet of flags as defined in Section 4.4.¶

RESERVED: 3 octets of reserved bits. MUST be set to zero when sending; MUST be ignored on receipt.

Label: 20 bits of label value.¶

TC: 3 bits of traffic class.

S: 1 bit, reserved.

TTL: 1 octet of TTL.¶

The following applies to the Type-A Segment Sub-TLV:¶

The S bit SHOULD be zero upon transmission, and MUST be ignored upon reception.¶

If the originator wants the receiver to choose the TC value, it sets the Traffic Class (TC) field to zero.

If the originator wants the receiver to choose the TTL value, it sets the TTL field to 255.¶

If the originator wants to recommend a value for these fields, it puts those values in the TC and/or TTL fields.¶

The receiver MAY override the originator's values for these fields. This would be determined by local policy at the receiver. One possible policy would be to override the fields only if the fields have the default values specified above.¶
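The Type-A layout above can be illustrated with a minimal Python sketch. It is not a normative encoding: the type code is still TBD1, so `TYPE_A_PLACEHOLDER` is a made-up stand-in, and the function names are hypothetical. The defaults (TC of 0, TTL of 255) correspond to the "receiver chooses" values described above.

```python
import struct

# Hypothetical stand-in for TBD1, which IANA has not yet assigned.
TYPE_A_PLACEHOLDER = 0xFFFF


def pack_sid(label: int, tc: int = 0, ttl: int = 255) -> bytes:
    """Pack the 4-octet SID word: Label (20 bits), TC (3), S (1), TTL (8)."""
    if not 0 <= label < 2 ** 20:
        raise ValueError("label must fit in 20 bits")
    if not 0 <= tc < 8:
        raise ValueError("TC must fit in 3 bits")
    if not 0 <= ttl < 256:
        raise ValueError("TTL must fit in 8 bits")
    # The S bit (bit position 8 of the word) is left at zero on transmission.
    word = (label << 12) | (tc << 9) | ttl
    return struct.pack("!I", word)


def encode_type_a(label: int, tc: int = 0, ttl: int = 255, flags: int = 0,
                  sub_tlv_type: int = TYPE_A_PLACEHOLDER) -> bytes:
    """Encode a Type-A Segment Sub-TLV; the Length field is always 8."""
    header = struct.pack("!HH", sub_tlv_type, 8)
    # One Flags octet followed by three zeroed RESERVED octets.
    value = struct.pack("!B3x", flags) + pack_sid(label, tc, ttl)
    return header + value
```

For instance, `encode_type_a(16001)` yields a 12-octet sub-TLV (4 octets of Type and Length plus the 8-octet value) in which TC and TTL carry the default values.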

4.2. Type-C: IPv4 Node Address with Optional SID for SR-MPLS

The Type-C Segment Sub-TLV encodes an IPv4 node address, SR Algorithm and an optional SID in the form of an MPLS label. The format is as follows:¶

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Type                      |   Length                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Flags       |  RESERVED (MBZ)               | SR Algorithm  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                 IPv4 Node Address (4 octets)                  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                SID (optional, 4 octets)                       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Figure 4: Type-C Segment Sub-TLV

where:¶

Type: TBD2 (to be assigned by IANA from the registry "Sub-TLVs for TLV Types 1, 16, and 21").¶

Length is 8 or 12.¶

Flags: 1 octet of flags as defined in Section 4.4.¶

SR Algorithm: 1 octet specifying the SR Algorithm as described in Section 3.1.1 of [RFC8402], when the A-Flag as defined in Section 4.4 is set. The SR Algorithm is used by the receiver to derive the label. When the A-Flag is unset, this field has no meaning and thus MUST be set to zero on transmission and ignored on receipt.

RESERVED: 2 octets of reserved bits. MUST be set to zero when sending; MUST be ignored on receipt.¶

IPv4 Node Address: 4-octet IPv4 address representing a node.¶

SID: optional: 4-octet field containing label, TC, S and TTL as defined in Section 4.1. When the SID field is present, it MUST be used for constructing the Reply Path.¶

The following applies to the Type-C Segment Sub-TLV:¶

The IPv4 Node Address MUST be present.¶

The SID is optional and specifies a 4-octet MPLS SID containing label, TC, S and TTL as defined in Section 4.1.¶

If the length is 8, then only the IPv4 Node Address is present.¶

If the length is 12, then the IPv4 Node Address and the MPLS SID are present.¶
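As an illustration of the two permitted lengths, the following Python sketch encodes a Type-C Segment Sub-TLV with and without the optional SID. The type code (TBD2) and the A-Flag bit value are assumptions: `TYPE_C_PLACEHOLDER` is invented, and `A_FLAG = 0x40` assumes the bit position drawn in Figure 6 with bit 0 as the most significant bit. The function name and parameters are hypothetical.

```python
import ipaddress
import struct

TYPE_C_PLACEHOLDER = 0xFFFE  # hypothetical stand-in for TBD2
A_FLAG = 0x40                # assumed: bit 1 of the Flags octet (MSB is bit 0)


def encode_type_c(node, sid_word=None, algorithm=None,
                  sub_tlv_type=TYPE_C_PLACEHOLDER) -> bytes:
    """Encode a Type-C Segment Sub-TLV (Length 8 without SID, 12 with SID)."""
    # The A-Flag is set only when an SR Algorithm is supplied; otherwise the
    # SR Algorithm octet is sent as zero.
    flags = A_FLAG if algorithm is not None else 0
    algo = algorithm if algorithm is not None else 0
    # Flags (1 octet), RESERVED (2 octets, zero), SR Algorithm (1 octet).
    value = struct.pack("!BHB", flags, 0, algo)
    value += ipaddress.IPv4Address(node).packed
    if sid_word is not None:
        # sid_word is the 4-octet Label/TC/S/TTL word defined in Section 4.1.
        value += struct.pack("!I", sid_word)
    return struct.pack("!HH", sub_tlv_type, len(value)) + value
```

The Length field is computed from the value rather than hard-coded, so it comes out as 8 or 12 depending on whether a SID word is passed.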

4.3. Type-D: IPv6 Node Address with Optional SID for SR-MPLS

The Type-D Segment Sub-TLV encodes an IPv6 node address, SR Algorithm and an optional SID in the form of an MPLS label. The format is as follows:¶

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Type                      |   Length                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   Flags       |       RESERVED(MBZ)           | SR Algorithm  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   //                IPv6 Node Address (16 octets)                //
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                SID (optional, 4 octets)                       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Figure 5: Type-D Segment Sub-TLV

where:¶

Type: TBD3 (to be assigned by IANA from the registry "Sub-TLVs for TLV Types 1, 16, and 21").¶

Length is 20 or 24.¶

Flags: 1 octet of flags as defined in Section 4.4.¶

SR Algorithm: 1 octet specifying the SR Algorithm as described in Section 3.1.1 of [RFC8402], when the A-Flag as defined in Section 4.4 is set. The SR Algorithm is used by the receiver to derive the label. When the A-Flag is unset, this field has no meaning and thus MUST be set to zero (MBZ) on transmission and ignored on receipt.

RESERVED: 2 octets of reserved bits. MUST be set to zero when sending; MUST be ignored on receipt.

IPv6 Node Address: 16-octet IPv6 address representing a node.

SID: optional: 4-octet field containing label, TC, S and TTL as defined in Section 4.1.

The following applies to the Type-D Segment Sub-TLV:¶

The IPv6 Node Address MUST be present.¶

The SID is optional and specifies a 4-octet MPLS SID containing label, TC, S and TTL as defined in Section 4.1. When the SID field is present, it MUST be used for constructing the Reply Path.

If the length is 20, then only the IPv6 Node Address is present.¶

If the length is 24, then the IPv6 Node Address and the MPLS SID are present.¶
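On the receiving side, the Length field is what tells a parser whether the optional SID follows the IPv6 address. The sketch below shows one way a receiver might decode a Type-D Segment Sub-TLV. The function name, the returned dictionary layout, and the `A_FLAG = 0x40` bit value (bit 1 of the Flags octet, MSB as bit 0, as drawn in Figure 6) are assumptions made for illustration.

```python
import ipaddress
import struct

A_FLAG = 0x40  # assumed bit value for the A-Flag


def decode_type_d(data: bytes) -> dict:
    """Decode one Type-D Segment Sub-TLV starting at offset 0 of data."""
    sub_tlv_type, length = struct.unpack_from("!HH", data, 0)
    if length not in (20, 24):
        raise ValueError("Type-D Length must be 20 or 24")
    if len(data) < 4 + length:
        raise ValueError("truncated sub-TLV")
    flags, _reserved, algo = struct.unpack_from("!BHB", data, 4)
    node = ipaddress.IPv6Address(data[8:24])
    sid = None
    if length == 24:
        # Optional SID word: Label (20 bits), TC (3), S (1, ignored), TTL (8).
        (word,) = struct.unpack_from("!I", data, 24)
        sid = {"label": word >> 12, "tc": (word >> 9) & 0x7, "ttl": word & 0xFF}
    return {
        "type": sub_tlv_type,
        "node": node,
        # The SR Algorithm octet is meaningful only when the A-Flag is set.
        "algorithm": algo if flags & A_FLAG else None,
        "sid": sid,
    }
```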

4.4. Segment Flags

The Segment Types described above contain the following flags in the "Flags" field (codes to be assigned by IANA from the new registry "Segment Sub-TLV Flags"):

    0 1 2 3 4 5 6 7
   +-+-+-+-+-+-+-+-+
   | |A|           |
   +-+-+-+-+-+-+-+-+

Figure 6: Flags

where:¶

A-Flag: This flag indicates the presence of an SR Algorithm ID in the "SR Algorithm" field applicable to various Segment Types.

Unused bits in the Flag octet MUST be set to zero upon transmission and MUST be ignored upon receipt.¶

The following applies to the Segment Flags:¶

A-Flag applies to Segment Type-C and Type-D. If A-Flag appears with Type-A Segment Type, it MUST be ignored.¶
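The receive-side rules for the Flags octet can be summarized in a few lines of Python. This is only a sketch: the flag codes are still to be assigned by IANA, so `A_FLAG = 0x40` merely assumes the bit position drawn in Figure 6 (bit 1, counting from the most significant bit), and the function name and the single-letter segment type strings are hypothetical.

```python
A_FLAG = 0x40  # assumed bit value; the actual code is to be assigned by IANA


def a_flag_in_effect(flags_octet: int, segment_type: str) -> bool:
    """Return True if the A-Flag should be honored for this segment.

    Unused bits are ignored on receipt, and the A-Flag is ignored when it
    appears with a Type-A segment, which carries no SR Algorithm field.
    """
    if segment_type == "A":
        return False
    return bool(flags_octet & A_FLAG)
```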

6. Detailed Procedures

6.1. Sending an Echo Request

In the inter-AS scenario, the procedures described in this document are used to specify the return path when IP connectivity to the initiator is not available; they may also be used when such connectivity exists. The LSP ping initiator MUST set the Reply Mode of the echo request to 5 (Reply via Specified Path), and a Reply Path TLV MUST be carried in the echo request message correspondingly. The Reply Path TLV MUST contain the Segment Routing Path in the reverse direction encoded as an ordered list of segments. The first Segment MUST correspond to the top Segment in the MPLS header that the responder MUST use while sending the echo reply.

6.2. Receiving an Echo Request

As described in [RFC7110], when the Reply Mode is set to 5 (Reply via Specified Path), the echo request must contain the Reply Path TLV. Absence of the Reply Path TLV is treated as a malformed echo request. When an echo request is received, if the egress LSR does not know the Reply Mode 5 defined in [RFC7110], an echo reply with the return code set to "Malformed echo request received" and the Subcode set to zero must be sent back to the ingress LSR according to the rules of [RFC8029]. When a Reply Path TLV is received and the responder supports processing it, the responder MUST use the segments in the Reply Path TLV to build the echo reply. The responder MUST follow the normal FEC validation procedures as described in [RFC8029] and [RFC8287]; this document does not suggest any change to those procedures. When the echo reply has to be sent out, the Reply Path TLV MUST be used to construct the MPLS packet.

6.3. Sending an Echo Reply

The echo reply message is sent as an MPLS packet with an MPLS label stack. The echo reply message MUST be constructed as described in [RFC8029]. An MPLS packet is constructed with an echo reply in the payload. The top label MUST be constructed from the first Segment from the Reply Path TLV. The remaining labels MUST follow the order from the Reply Path TLV. The responder MAY check the reachability of the top label in its own Label Forwarding Information Base (LFIB) before sending the echo reply and provide necessary log information in case of unreachability. In certain scenarios, the head-end MAY choose to send Type-C/Type-D segments consisting of an IPv4 or IPv6 address when it is unable to derive the SID from available topology information. Optionally, a SID may also be associated with the Type-C/Type-D segment, if such information is available from the controller or via operator input. In such cases, the node sending the echo reply MUST derive the MPLS labels based on Node-SIDs associated with the IPv4/IPv6 addresses or from the optional MPLS SIDs in the Type-C/Type-D segments and encode the echo reply with MPLS labels.

The reply path return code is set as described in Section 7.4 of [RFC7110]. According to Section 5.3 of [RFC7110], the Reply Path TLV is included in an echo reply indicating the specified return path that the echo reply message is required to follow.
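The translation from Reply Path TLV segments to an MPLS label stack can be sketched as follows. The segment representation (dictionaries with `type`, `label` and `node` keys), the `node_sid_labels` lookup table standing in for the responder's locally derived Node-SID labels, and the default TC/TTL values are all assumptions made for illustration; the label values in any usage are invented.

```python
import struct


def segment_to_label(segment: dict, node_sid_labels: dict) -> int:
    """Resolve one segment to a label.

    A SID carried in the segment is used when present; otherwise the label
    is looked up from the node address (Type-C/Type-D without a SID).
    """
    if segment.get("label") is not None:
        return segment["label"]
    label = node_sid_labels.get(segment["node"])
    if label is None:
        raise LookupError("no Node-SID label known for " + str(segment["node"]))
    return label


def build_label_stack(segments: list, node_sid_labels: dict,
                      tc: int = 0, ttl: int = 255) -> bytes:
    """Encode the label stack in Reply Path TLV order, first segment on top."""
    labels = [segment_to_label(s, node_sid_labels) for s in segments]
    stack = b""
    for position, label in enumerate(labels):
        # Only the last entry carries the bottom-of-stack (S) bit.
        s_bit = 1 if position == len(labels) - 1 else 0
        stack += struct.pack("!I", (label << 12) | (tc << 9) | (s_bit << 8) | ttl)
    return stack
```

A responder could run a LookupError-style check like the one above before transmission, which is in the spirit of the optional LFIB reachability check described in this section.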

When the node is configured to dynamically create a return path for the next echo request, the procedures described in Section 8 MUST be used. The reply path return code MUST be set to TBA1 and the same Reply Path TLV or a new Reply Path TLV MUST be included in the echo reply.¶

6.4. Receiving an Echo Reply

The rules and process defined in Section 4.6 of [RFC8029] and Section 5.4 of [RFC7110] apply here. In addition, if the reply path return code is "Use Reply Path TLV in the echo reply for building the next echo request", the Reply Path TLV from the echo reply MUST be sent in the next echo request with the TTL incremented by 1. If the TTL is already 255, the traceroute procedure MUST be ended with an appropriate log message.
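The head-end's decision after each echo reply can be sketched as a small function. The names are hypothetical, the return code is represented by the symbolic string "TBA1" because no value has been assigned, and applying the TTL limit of 255 to every iteration (not only when the return code is TBA1) is a simplifying assumption of this sketch.

```python
USE_REPLY_PATH_TLV = "TBA1"  # symbolic placeholder for the unassigned return code


def plan_next_echo_request(ttl: int, sent_reply_path: list,
                           reply_path_return_code, reply_path_in_reply=None):
    """Return (next_ttl, reply_path) for the next request, or None to stop."""
    if ttl >= 255:
        # The TTL cannot be incremented further; the caller should log and end.
        return None
    if reply_path_return_code == USE_REPLY_PATH_TLV and reply_path_in_reply is not None:
        # The responder supplied the return path to use from now on.
        next_path = list(reply_path_in_reply)
    else:
        # No change signalled; keep sending the previously used return path.
        next_path = list(sent_reply_path)
    return ttl + 1, next_path
```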

7. Detailed Example

The example topology given in Figure 1 will be used in the sections below to explain LSP ping and traceroute procedures. The PMS/Head-end has a complete view of the topology. PE1, P1, P2, ASBR1 and ASBR2 are in AS1. Similarly, ASBR3, ASBR4, P3, P4 and PE4 are in AS2.

AS1 and AS2 have Segment Routing enabled. IGPs like OSPF/ISIS are used to flood SIDs in each AS. ASBR1, ASBR2, ASBR3 and ASBR4 advertise BGP EPE-SIDs for the inter-AS links. The topologies of AS1 and AS2 are advertised via BGP-Link State (BGP-LS) to the controller/PMS or Head-end node. The EPE-SIDs are also advertised via BGP-LS as described in [RFC9086]. The example uses EPE-SIDs for the inter-AS links, but the same could be achieved using adjacency-SIDs advertised for a passive IGP link.

The description in this document uses the following notations for Segment Identifiers (SIDs):

Node SIDs: N-PE1, N-P1, N-ASBR1, N-ABR1, N-ABR2, etc.

Adjacency SIDs: Adj-PE1-P1, Adj-P1-P2 etc.¶

EPE-SIDs: EPE-ASBR2-ASBR3, EPE-ASBR1-ASBR4, EPE-ASBR3-ASBR2 etc.¶

For the following procedures, let us consider a traffic-engineered path built from PE1 to PE4 with the Segment List [N-P1, N-ASBR1, EPE-ASBR1-ASBR4, N-PE4]. This stack may be programmed by the controller/PMS, or the Head-end router PE1 may have imported the whole topology information from BGP-LS and computed the inter-AS path.

7.1. Procedures for Segment Routing LSP ping

Consider an SR-MPLS path from PE1 to PE4 consisting of a label stack [N-P1, N-ASBR1, EPE-ASBR1-ASBR4, N-PE4] from Figure 1. In order to perform MPLS ping procedures on this path, the remote end (PE4) needs IP connectivity to head end PE1, for the echo reply to travel back to PE1. In a deployment that uses controller-computed inter-domain path, there may be no IP connectivity from PE4 to PE1 as they lie in different ASes.¶

PE1 sends an echo request message to the end-point PE4 along the path that consists of the label stack [N-P1, N-ASBR1, EPE-ASBR1-ASBR4, N-PE4]. PE1 adds the return path from PE4 to PE1 in the echo request message in the Reply Path TLV. As an example, the Reply Path TLV for an LSP ping from PE1 to PE4 is [N-ASBR4, EPE-ASBR4-ASBR1, N-PE1]. This example path provides the entire return path up to the head-end node PE1. The mechanism used to construct the return path is implementation dependent.

An implementation may also build a return path consisting only of the labels needed to reach its own AS. Once the label stack is popped off, the echo reply message will be exposed, and further packet forwarding will be based on IP lookup. An example return path for this case could be [N-ASBR4, EPE-ASBR4-ASBR1].

On receiving the MPLS echo request, PE4 first validates the FEC in the echo request. PE4 then builds a label stack to send the response from PE4 to PE1 by copying the labels from the Reply Path TLV. PE4 builds the echo reply packet, imposes the constructed MPLS label stack on top of it, and sends the packet out towards PE1. This Segment List can successfully steer the reply back to the Head-end node (PE1).

7.2. Procedures for Segment Routing LSP traceroute

7.2.1. Procedures for Segment Routing LSP traceroute with the Same SRGB on All Nodes

The traceroute procedure involves visiting every node on the path, with an echo reply sent from every node. In this section, we describe the traceroute mechanisms when the headend/PMS has complete visibility of the database. The headend/PMS computes the return path from each node in the entire SR-MPLS path that is being tracerouted. The return path computation is implementation dependent. As the headend/PMS completely controls the return path, it can use proprietary computations to build the return path.

One of the ways the return path can be built is to add each domain border node's Node-SID to the return path label stack as the traceroute progresses. For inter-AS networks, in addition to the border node's Node-SID, the EPE-SID in the reverse direction also needs to be added to the label stack.

The inter-domain/inter-AS traceroute procedure uses the TTL expiry mechanism as specified in [RFC8029] and [RFC8287]. In every echo request packet, the headend/PMS includes the appropriate return path in the Reply Path TLV. The node that receives the echo request will follow the procedures described in Section 6.1 and Section 6.2 to send out an echo reply.

For example:

Let us consider the topology in Figure 1 and an SR-MPLS path [N-P1, N-ASBR1, EPE-ASBR1-ASBR4, N-PE4]. The traceroute is being executed for this inter-AS path for destination PE4. PE1 sends the first echo request with the TTL set to 1 and includes a Reply Path TLV consisting of a Type-A Segment containing a label derived from its own SR Global Block (SRGB). Note that the type of segment used in constructing the return path is a matter of local policy. If the entire network has the same SRGB configured, Type-A segments can be used. The TTL expires on P1, and P1 sends an echo reply using the return path. Note that implementations may choose to exclude the Reply Path TLV until the traceroute reaches the first domain border, as the return IP path to PE1 is expected to be available inside the first domain.

The TTL is then set to 2 and the next echo request is sent out. Until the traceroute procedure reaches the domain border node ASBR1, the same Reply Path TLV consisting of a single label (PE1's node label) is used. When an echo request reaches the border node ASBR1 and an echo reply is received from ASBR1, the next echo request needs to include an additional label, as ASBR1 is a border node. The head-end node has complete visibility of the network database learned via BGP-LS [RFC9552] and [RFC9086] and can derive the details of Autonomous System Boundary Router (ASBR) nodes. The Reply Path TLV is built based on the forward path. As the forward path consists of EPE-ASBR1-ASBR4, an EPE-SID in the reverse direction is included in the Reply Path TLV. The return path now consists of two labels [EPE-ASBR4-ASBR1, N-PE1]. The echo reply from ASBR4 will use this return path.

The next echo request after visiting the border node ASBR4 will update the return path with the Node-SID label of ASBR4. The return path beyond ASBR4 will be [N-ASBR4, EPE-ASBR4-ASBR1, N-PE1]. This same return path is used until the traceroute procedure reaches the next set of border nodes. When there are multiple ASes the traceroute procedure will continue by adding a set of Node-SIDs and EPE-SIDs as the border nodes are visited.¶

Note that the above return path-building procedure requires the database of all the domains to be available at the headend/PMS.¶
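The return path progression in this example can be reproduced with a short Python sketch. It is one possible head-end computation, not the only one, since the computation is implementation dependent. The function name, the `(node, AS)` tuple representation of hops, and the symbolic SID names are assumptions; the sketch also assumes that every change of AS along the forward path happens over an EPE link, as in Figure 1.

```python
def return_paths(head_end: tuple, hops: list) -> dict:
    """Map each hop on the forward path to the return path sent when probing it.

    head_end is (node, as_name); hops is an ordered list of (node, as_name).
    Return paths are listed top label first.
    """
    base = ["N-" + head_end[0]]
    prev_node, prev_as = head_end
    result = {}
    for node, as_name in hops:
        if as_name != prev_as:
            # Entering a new AS: add the EPE-SID in the reverse direction.
            base = ["EPE-" + node + "-" + prev_node] + base
            result[node] = list(base)
            # Nodes beyond this border node also need its Node-SID on top.
            base = ["N-" + node] + base
        else:
            result[node] = list(base)
        prev_node, prev_as = node, as_name
    return result


# Forward path of the example: PE1 -> P1 -> P2 -> ASBR1 -> ASBR4 -> P3 -> P4 -> PE4
example = return_paths(
    ("PE1", "AS1"),
    [("P1", "AS1"), ("P2", "AS1"), ("ASBR1", "AS1"),
     ("ASBR4", "AS2"), ("P3", "AS2"), ("P4", "AS2"), ("PE4", "AS2")],
)
```

With this input, the entries for P1 through ASBR1 contain only N-PE1, the entry for ASBR4 is [EPE-ASBR4-ASBR1, N-PE1], and the entries for P3, P4 and PE4 are [N-ASBR4, EPE-ASBR4-ASBR1, N-PE1], matching the walkthrough above.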

7.2.2. Procedures for Segment Routing LSP Traceroute with the Different SRGBs

Section 7.2.1 assumes that the same SRGB is configured on all nodes along the path. However, the SRGB may differ from one node to another, and the SR architecture [RFC8402] allows nodes to use different SRGBs. In such scenarios, PE1 sends a Type-C (or Type-D in the case of IPv6 networks) segment with the node address of PE1 and with an optional MPLS SID associated with the node address. The receiving node derives the label for the return path based on its own SRGB. When the traceroute procedure crosses the border ASBR1, headend PE1 should send a Type-A segment for N-PE1 based on the label derived from ASBR1's SRGB. This is required because ASBR4, P3, P4, etc. may not have the topology information to derive the SRGB for PE1. After the traceroute procedure reaches ASBR4, the return path will be [N-PE1 (Type-A with label based on ASBR1's SRGB), EPE-ASBR4-ASBR1, N-ASBR4 (Type-C)].

To extend the example to three or more ASes, let us consider a traceroute from PE1 to PE5 in Figure 1. In this example, the PE1 to PE5 path has to cross three domains: AS1, AS2 and AS3. Let us consider a path from PE1 to PE5 that goes through [PE1, ASBR1, ASBR4, ASBR6, ASBR8, PE5]. When the traceroute procedure is visiting the nodes in AS1, the Reply Path TLV sent from the headend consists of [N-PE1]. When the traceroute procedure reaches ASBR4, the return path consists of [N-PE1, EPE-ASBR4-ASBR1]. While visiting nodes in AS2, the Reply Path TLV consists of [N-PE1, EPE-ASBR4-ASBR1, N-ASBR4]. Similarly, while visiting ASBR8, the Reply Path TLV adds the EPE-SID from ASBR8 to ASBR6. While visiting nodes in AS3, the Node-SID of ASBR8 is also added, which makes the return path [N-PE1, EPE-ASBR4-ASBR1, N-ASBR4, EPE-ASBR8-ASBR6, N-ASBR8].

Let us consider another example from the topology in Figure 2. This topology consists of a multi-domain IGP with a common border node between the domains. This could be achieved with a multi-area or multi-level IGP or with multiple instances of an IGP deployed on the same node. The return path computation for this topology is similar to the multi-AS computation except that the return path consists of a single border node label. When the traceroute procedure visits node P, the return path consists of [N-PE1, N-ABR1].
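The reason a node address has to be translated locally becomes clear from how a Node-SID label is formed in SR-MPLS: the label is the SRGB base of the node doing the forwarding plus the SID index. The sketch below illustrates this with invented SRGB values and a hypothetical function name; it models the SRGB as a single contiguous range.

```python
def node_sid_label(srgb_base: int, srgb_size: int, sid_index: int) -> int:
    """Derive the label for a Node-SID index from a single-range SRGB."""
    if not 0 <= sid_index < srgb_size:
        raise ValueError("SID index falls outside the SRGB")
    return srgb_base + sid_index


# Invented values: the same SID index for PE1 maps to different labels on
# two nodes whose SRGBs start at different bases.
label_on_first_node = node_sid_label(16000, 8000, 1)
label_on_second_node = node_sid_label(400000, 8000, 1)
```

Because the two results differ, a Type-A segment is only meaningful to the node whose SRGB was used to compute it, whereas a Type-C or Type-D segment lets each receiver perform this derivation itself.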

8. Building Reply Path TLV Dynamically

In some cases, the head-end may not have complete visibility of Inter-AS/Inter-domain topology. In such cases, it can rely on downstream routers to build the reverse path for MPLS traceroute procedures. For this purpose, Reply Path TLV in the echo reply corresponds to the return path to be used in building the next echo request.¶

   Value         Meaning
   ------        ----------------------
   TBA1        Use Reply Path TLV in the echo reply
               for building the next echo request.

8.1. Procedures to Build the Return Path

To dynamically build the return Path for the traceroute procedures, the domain border nodes along the path being traced should support the procedures described in this section. Local policy on the domain border nodes should determine whether the domain border node participates in building the return path dynamically during traceroute.¶

The headend/PMS node may include its node label while initiating the traceroute procedure. When an Area Border Router (ABR) receives the echo request, if the local policy implies building a dynamic return path, the ABR should include its node label in the Reply Path TLV and send it in the echo reply. If there is a Reply Path TLV included in the received echo request message, the ABR's node label is added before the existing segments. The type of segment added is based on local policy. In cases when the SRGB is not uniform across the network, it is RECOMMENDED to add a Type-C or a Type-D segment, but implementations MAY safely use other approaches if they see benefits in doing so. If the existing segment in the Reply Path TLV is a Type-C/Type-D segment, that segment should be converted to a Type-A segment based on the ABR's own SRGB. This is because downstream nodes will not know what SRGB to use to translate the IP address to a label. As the ABR added its own node label, it is guaranteed that this ABR will be in the return path and will be forwarding the traffic based on the next label after its label.

When an ASBR receives an echo request from another AS, and the ASBR is configured to build the return path dynamically, the ASBR should build a Reply Path TLV and include it in the echo reply. The Reply Path TLV should consist of its own node label and an EPE-SID to the AS from where the traceroute message was received. A reply path return code of TBA1 should be set in the echo reply to indicate that the next echo request should use the return path from the Reply Path TLV in the echo reply. The ASBR should locally decide the outgoing interface for the echo reply packet. Generally, the remote ASBR will choose the interface on which the incoming OAM packet was received to send the echo reply out. The Reply Path TLV is built by adding two Segment Sub-TLVs. The top Segment Sub-TLV consists of the ASBR's Node-SID, and the second consists of the EPE-SID in the reverse direction to reach the AS from which the OAM packet was received. The type of segment chosen to build the Reply Path TLV is a matter of local policy. It is recommended to use the Type-C/Type-D segment for the top segment when the SRGB is not guaranteed to be uniform in the domain.

Irrespective of which type of segment is included in the Reply Path TLV, the responder of echo requests should always translate the Reply Path TLV to a label stack and build an MPLS header for the echo reply packet. This procedure can be applied to an end-to-end path consisting of multiple ASes. Each ASBR that receives an echo request from another AS adds its Node-SID and EPE-SID on top of the existing segments in the Reply Path TLV.

An ASBR that receives the echo request from a neighbor belonging to the same AS, MUST look at the Reply Path TLV received in the echo request. If the Reply Path TLV consists of a Type-C/Type-D segment, it MUST convert the Type-C/Type-D segment to a Type-A segment by deriving a label from its own SRGB. The ASBR MUST set the reply path return code to TBA1 and send the newly constructed Reply Path TLV in the echo reply.¶
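The two ASBR behaviors described above (adding segments when the request arrives from another AS, and converting Type-C/Type-D segments when it arrives from the same AS) can be sketched as follows. The segment dictionaries, the `local_node_labels` table standing in for labels the ASBR derives from its own SRGB, and all function names are hypothetical.

```python
def convert_to_type_a(segment: dict, local_node_labels: dict) -> dict:
    """Convert a Type-C/Type-D segment to Type-A using locally derived labels."""
    if segment["type"] in ("C", "D"):
        return {"type": "A", "label": local_node_labels[segment["node"]]}
    return dict(segment)


def asbr_update_reply_path(existing: list, from_other_as: bool,
                           own_node_segment: dict, reverse_epe_segment: dict,
                           local_node_labels: dict) -> list:
    """Return the segment list an ASBR places in the echo reply (top first)."""
    if from_other_as:
        # Own Node-SID on top, then the EPE-SID back towards the previous AS,
        # followed by whatever segments the echo request already carried.
        return [dict(own_node_segment), dict(reverse_epe_segment)] + [
            dict(s) for s in existing
        ]
    # Request came from within the same AS: only translate address-based
    # segments, since nodes in the next AS cannot derive these labels.
    return [convert_to_type_a(s, local_node_labels) for s in existing]
```

In both cases the caller would set the reply path return code to TBA1 when the resulting list is placed in the echo reply.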

Internal nodes, that is, nodes that are not domain border nodes, need not set the reply path return code to TBA1 in the echo reply message, as there is no change in the return path. In these cases, the headend node/PMS that initiates the traceroute procedure MUST continue to send the previously sent Reply Path TLV in each subsequent echo request.

Note that an ASBR's local policy may prohibit it from participating in the dynamic traceroute procedures. If such an ASBR is encountered in the forward path, dynamic return path-building procedures will fail. In such cases, an ASBR that supports this document MUST set the return code to TBA2 to indicate that local policy does not allow dynamic return path building.

   Value         Meaning
   ------        ---------------------------------------------------
    TBA2        Local policy does not allow dynamic return Path
                building.
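The head-end behavior implied by these return codes might be sketched as below. The names are hypothetical and `TBA1`/`TBA2` are placeholder strings for the unassigned code points. Raising an exception on TBA2 is an assumption made for this example; the text above only states that dynamic return path building fails in that case.

```python
from typing import List, Optional

# Placeholders; the real values are to be assigned by IANA.
TBA1 = "TBA1"
TBA2 = "TBA2"


def next_reply_path(
    current: List[str], reply_code: str, reply_tlv: Optional[List[str]]
) -> List[str]:
    """Choose the Reply Path TLV contents for the next echo request."""
    if reply_code == TBA2:
        # Assumption: the head-end stops the dynamic procedure here.
        raise RuntimeError("local policy at the responder forbids dynamic return path building")
    if reply_code == TBA1 and reply_tlv is not None:
        # A border node supplied a new return path: use it from now on.
        return list(reply_tlv)
    # Any other code: keep sending the previously used return path.
    return list(current)


path = ["N-PE1"]
path = next_reply_path(path, "other", None)  # reply from an internal node
path = next_reply_path(path, TBA1, ["N-ASBR4", "EPE-ASBR4-ASBR1", "N-PE1"])
print(path)
```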

8.2. Details with Example

Consider the topology in Figure 1 and an SR policy path built from PE1 to PE4 with the label stack [N-P1, N-ASBR1, EPE-ASBR1-ASBR4, N-PE4]. PE1 begins the traceroute with the TTL set to 1 and includes [N-PE1] in the Reply Path TLV. The TTL of the traceroute packet expires on P1, and P1 processes the traceroute as per the procedures described in [RFC8029] and [RFC8287]. P1 sends an echo reply with the same Reply Path TLV and with the reply path return code set to 6. The return code of the echo reply itself is set as per [RFC8029] and [RFC8287]. The traceroute does not need any changes to the Reply Path TLV until it leaves AS1. P1 and P2 may either include the received Reply Path TLV unchanged in the echo reply or include no Reply Path TLV at all. In both cases, the head-end continues to use the same return path in the next echo request that it used in the previous one.

When ASBR1 receives the echo request and the Reply Path TLV in it contains a Type-C/Type-D segment, ASBR1 converts that segment to Type-A based on its own SRGB. When ASBR4 receives the echo request, it should form the Reply Path TLV using its own Node SID (N-ASBR4) and EPE-SID (EPE-ASBR4-ASBR1) labels and set the reply path return code to TBA1. PE1 should then use this Reply Path TLV in subsequent echo requests. In this example, when the subsequent echo request reaches P3, P3 should use this Reply Path TLV for sending the echo reply. The same Reply Path TLV is sufficient for any router in AS2 to send the reply, because the first label (N-ASBR4) directs the echo reply to ASBR4 and the second (EPE-ASBR4-ASBR1) directs it to AS1. Once the echo reply reaches AS1, normal IP forwarding or the N-PE1 label helps it reach PE1.

The example described in the above paragraphs can be extended to multiple ASes by following the same procedure: each ASBR adds its Node-SID and EPE-SID on receiving an echo request from a neighboring AS.

Now consider the topology in Figure 2. It consists of multiple IGP domains with multiple areas/levels or separate IGP instances. There is a single border node that separates the two domains. In this case, PE1 sends a traceroute packet with the TTL set to 1 and includes N-PE1 in the Reply Path TLV. ABR1 receives the echo request and, while sending the echo reply, adds its node label to the Reply Path TLV and sets the reply path return code to TBA1. The Reply Path TLV in the echo reply from ABR1 consists of [N-ABR1, N-PE1]. The next echo request, with TTL 2, reaches the P node. It is an internal node, so it does not change the return path. The echo request with TTL 3 reaches ABR2, which adds its own node label, so the Reply Path TLV sent in the echo reply is [N-ABR2, N-ABR1, N-PE1]. The echo request with TTL 4 reaches PE4, which sends an echo reply with a return code indicating that it is the egress. PE4 does not include any Reply Path TLV in the echo reply. The above example assumes a uniform SRGB throughout the domain. In the case of different SRGBs, the top segment will be a Type-C/Type-D segment and all other segments will be Type-A. Each border node converts the Type-C/Type-D segment to Type-A before adding its own segment to the Reply Path TLV.
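The multi-domain walk-through can be reproduced with a small simulation. This is only a sketch under simplifying assumptions: a uniform SRGB, segments represented by their names, hypothetical role labels, and a placeholder string for the unassigned TBA1 code point.

```python
from typing import List, Optional, Tuple

TBA1 = "TBA1"            # placeholder; to be assigned by IANA
NO_CHANGE = "no-change"  # hypothetical marker for any other return code


def respond(node: str, role: str, reply_path: List[str]) -> Tuple[Optional[List[str]], str]:
    """Return the Reply Path TLV (or None) and the reply path return code."""
    if role == "border":
        # A border node puts its own node label on top of the received segments.
        return [f"N-{node}"] + reply_path, TBA1
    # Internal and egress nodes do not modify the return path.
    return None, NO_CHANGE


# Hops reached with TTL 1..4 in the example.
hops = [("ABR1", "border"), ("P", "internal"), ("ABR2", "border"), ("PE4", "egress")]

reply_path = ["N-PE1"]  # initial Reply Path TLV sent by PE1
history = []
for ttl, (node, role) in enumerate(hops, start=1):
    tlv, code = respond(node, role, reply_path)
    if code == TBA1 and tlv is not None:
        reply_path = tlv  # the head-end adopts the new return path
    history.append((ttl, node, list(reply_path)))
    print(ttl, node, reply_path)
```

After the reply from ABR2, the head-end holds [N-ABR2, N-ABR1, N-PE1], which matches the Reply Path TLV described in the example.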

10. IANA Considerations

10.1. Segment Sub-TLV

IANA should assign three new sub-TLVs from the "sub-TLVs for TLV Types 1, 16, and 21" subregistry of the "Multi-Protocol Label Switching (MPLS) Label Switched Paths (LSPs) Ping Parameters" registry.

   Sub-Type    Sub-TLV Name                     Reference
   --------    -----------------------------    ----------------
   TBD1        SID only in the form of MPLS     Section 4.1
               label                            of this document
   TBD2        IPv4 Node Address with           Section 4.2
               optional SID for SR-MPLS         of this document
   TBD3        IPv6 Node Address with           Section 4.3
               optional SID for SR-MPLS         of this document

The code points for the Segment Sub-TLVs should be allocated from the Standards Action range (0-16383).

10.2. New Registry for Segment Sub-TLV Flags

IANA should create a new "Segment ID sub-TLV flags" registry (see Section 4.4) under the "Multi-Protocol Label Switching (MPLS) Label Switched Paths (LSPs) Ping Parameters" registry.

This registry tracks the assignment of the 8 flags in the Segment ID sub-TLV flags field. The flags are numbered from 0 (most significant bit, transmitted first) to 7.
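To make the bit numbering concrete, the sketch below computes the mask for a given flag number in an 8-bit field whose bit 0 is the most significant bit, so the valid positions are 0 through 7. The helper name is hypothetical. The A Flag position (bit 1) is taken from the initial registry entry.

```python
FLAGS_FIELD_BITS = 8  # width of the flags field


def flag_mask(bit_number: int) -> int:
    """Return the mask for a flag, counting bit 0 as the most significant bit."""
    if not 0 <= bit_number < FLAGS_FIELD_BITS:
        raise ValueError("bit number must be between 0 and 7")
    return 1 << (FLAGS_FIELD_BITS - 1 - bit_number)


A_FLAG = flag_mask(1)  # bit 1, per the initial registry entry

flags = 0
flags |= A_FLAG                  # set the A Flag
print(f"{flags:#04x}")           # 0x40
print(bool(flags & A_FLAG))      # True
```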

New entries are assigned by Standards Action. Initial entries in the registry are as follows:


      Bit number  |  Name                      | Reference
      ------------+----------------------------+--------------
        1         |  A Flag                    | Section 4.4
                  |                            | of this document

10.3. Reply Path Return Codes Registry

IANA should assign new return codes in the "Reply path return code" registry under the "Multi-Protocol Label Switching (MPLS) Label Switched Paths (LSPs) Ping Parameters" registry.


   Value       Meaning                    Reference
   --------    -----------------------    -------------
   TBA1        Use Reply Path TLV         This document
               from this echo reply
               for building next
               echo request.

   TBA2        Local policy does          This document
               not allow dynamic
               return Path building.

The return codes should be assigned from the Standards Action range (0x0000-0xFFFB).

14. References

14.1. Normative References

[RFC2119]: Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997, <https://www.rfc-editor.org/info/rfc2119>.
[RFC7110]: Chen, M., Cao, W., Ning, S., Jounay, F., and S. Delord, "Return Path Specified Label Switched Path (LSP) Ping", RFC 7110, DOI 10.17487/RFC7110, January 2014, <https://www.rfc-editor.org/info/rfc7110>.
[RFC8029]: Kompella, K., Swallow, G., Pignataro, C., Ed., Kumar, N., Aldrin, S., and M. Chen, "Detecting Multiprotocol Label Switched (MPLS) Data-Plane Failures", RFC 8029, DOI 10.17487/RFC8029, March 2017, <https://www.rfc-editor.org/info/rfc8029>.
[RFC8174]: Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, May 2017, <https://www.rfc-editor.org/info/rfc8174>.
[RFC8287]: Kumar, N., Ed., Pignataro, C., Ed., Swallow, G., Akiya, N., Kini, S., and M. Chen, "Label Switched Path (LSP) Ping/Traceroute for Segment Routing (SR) IGP-Prefix and IGP-Adjacency Segment Identifiers (SIDs) with MPLS Data Planes", RFC 8287, DOI 10.17487/RFC8287, December 2017, <https://www.rfc-editor.org/info/rfc8287>.

14.2. Informative References

[RFC7743]: Luo, J., Ed., Jin, L., Ed., Nadeau, T., Ed., and G. Swallow, Ed., "Relayed Echo Reply Mechanism for Label Switched Path (LSP) Ping", RFC 7743, DOI 10.17487/RFC7743, January 2016, <https://www.rfc-editor.org/info/rfc7743>.
[RFC7942]: Sheffer, Y. and A. Farrel, "Improving Awareness of Running Code: The Implementation Status Section", BCP 205, RFC 7942, DOI 10.17487/RFC7942, July 2016, <https://www.rfc-editor.org/info/rfc7942>.
[RFC8277]: Rosen, E., "Using BGP to Bind MPLS Labels to Address Prefixes", RFC 8277, DOI 10.17487/RFC8277, October 2017, <https://www.rfc-editor.org/info/rfc8277>.
[RFC8402]: Filsfils, C., Ed., Previdi, S., Ed., Ginsberg, L., Decraene, B., Litkowski, S., and R. Shakir, "Segment Routing Architecture", RFC 8402, DOI 10.17487/RFC8402, July 2018, <https://www.rfc-editor.org/info/rfc8402>.
[RFC8403]: Geib, R., Ed., Filsfils, C., Pignataro, C., Ed., and N. Kumar, "A Scalable and Topology-Aware MPLS Data-Plane Monitoring System", RFC 8403, DOI 10.17487/RFC8403, July 2018, <https://www.rfc-editor.org/info/rfc8403>.
[RFC8604]: Filsfils, C., Ed., Previdi, S., Dawra, G., Ed., Henderickx, W., and D. Cooper, "Interconnecting Millions of Endpoints with Segment Routing", RFC 8604, DOI 10.17487/RFC8604, June 2019, <https://www.rfc-editor.org/info/rfc8604>.
[RFC8660]: Bashandy, A., Ed., Filsfils, C., Ed., Previdi, S., Decraene, B., Litkowski, S., and R. Shakir, "Segment Routing with the MPLS Data Plane", RFC 8660, DOI 10.17487/RFC8660, December 2019, <https://www.rfc-editor.org/info/rfc8660>.
[RFC9086]: Previdi, S., Talaulikar, K., Ed., Filsfils, C., Patel, K., Ray, S., and J. Dong, "Border Gateway Protocol - Link State (BGP-LS) Extensions for Segment Routing BGP Egress Peer Engineering", RFC 9086, DOI 10.17487/RFC9086, August 2021, <https://www.rfc-editor.org/info/rfc9086>.
[RFC9087]: Filsfils, C., Ed., Previdi, S., Dawra, G., Ed., Aries, E., and D. Afanasiev, "Segment Routing Centralized BGP Egress Peer Engineering", RFC 9087, DOI 10.17487/RFC9087, August 2021, <https://www.rfc-editor.org/info/rfc9087>.
[RFC9256]: Filsfils, C., Talaulikar, K., Ed., Voyer, D., Bogdanov, A., and P. Mattes, "Segment Routing Policy Architecture", RFC 9256, DOI 10.17487/RFC9256, July 2022, <https://www.rfc-editor.org/info/rfc9256>.
[RFC9552]: Talaulikar, K., Ed., "Distribution of Link-State and Traffic Engineering Information Using BGP", RFC 9552, DOI 10.17487/RFC9552, December 2023, <https://www.rfc-editor.org/info/rfc9552>.