CCAMP Working Group                            Richard Rabbat, Ed. (FLA) 
Internet Draft                       Vishal Sharma, Ed. (Metanoia, Inc.) 
Expires: December 2003                          Norihiko Shinomiya (FLL) 
                                                     Ching-Fong Su (FLA) 
                                                              
                                                               June 2003 
 
         Fault Notification Protocol for GMPLS-Based Recovery 
            draft-rabbat-fault-notification-protocol-03.txt 
    
Status of this Memo  
   
  This document is an Internet-Draft and is in full conformance with 
  all provisions of Section 10 of RFC2026 [1]. 
   
  Internet-Drafts are working documents of the Internet Engineering 
  Task Force (IETF), its areas, and its working groups.  Note that 
  other groups may also distribute working documents as Internet-
  Drafts.  
   
  Internet-Drafts are draft documents valid for a maximum of six months 
  and may be updated, replaced, or obsoleted by other documents at any 
  time. It is inappropriate to use Internet-Drafts as reference 
  material or to cite them other than as "work in progress."  
   
  The list of current Internet-Drafts can be accessed at  
       http://www.ietf.org/ietf/1id-abstracts.txt  
  The list of Internet-Draft Shadow Directories can be accessed at  
       http://www.ietf.org/shadow.html. 
 
 
Abstract  
   
  This draft presents a fault notification protocol that is 
  implementation-agnostic to be used in a GMPLS-based failure recovery 
  scheme.  The protocol achieves bounded recovery path activation times 
  in the event of single resource failures. This is done by allowing 
  for the computation of constrained recovery paths that take into 
  account the physical capabilities of the nodes as well as the delay 
  characteristics of the control plane.  We propose using a flooding 
  protocol for fault notification to allow for per-failure notification 
  and to speed up the recovery process, and justify choices made for 
  the notification method and for the extensions required to current 
  protocols.  
 
 
 
 
 
 
Rabbat & Sharma (Eds.) Expires - December 2003             [Page 1] 
 
         draft-rabbat-fault-notification-protocol-03.txt    June 2003 
 
 
Table of Contents 
   
  1. Overview.......................................................2 
  2. Terminology....................................................3 
  3. Glossary of Terms Used.........................................3 
  4. Requirements at Recovery Path Setup Time.......................4 
  5. Protocol Steps in Failure Notification and Service Recovery....5 
  5.1 T1: Fault Detection Time......................................6 
  5.2 T2: Hold-Off Time.............................................6 
  5.3 T3: Fault Notification and Completion of Recovery Operation...6 
  5.3.1 Delays Incurred by Messages.................................9 
  5.3.2 Notification Message Data...................................9 
  5.4 T4: Traffic Recovery Time....................................10 
  6. Reversion (Normalization).....................................10 
  7. Security Considerations.......................................11 
  8. Conclusion....................................................11 
  9. Acknowledgments...............................................11 
  10. Intellectual Property Considerations.........................11 
  11. References...................................................13 
  12. Authors' Addresses...........................................14 
  Appendix A. Fault Notification Message Delays on a Path..........15 
  A.1 Delays Associated with Link Traversal........................15 
  A.2 Delays Incurred at the Nodes.................................15 
  Full Copyright Statement.........................................17 
   
   
1.   Overview 
   
  Recovery (protection and restoration) in optical switching networks
  under tight time constraints has been recognized as a challenging
  issue [2] that is crucial to meeting requirements for high
  availability and service-level guarantees.  Several mechanisms have
  been devised for recovery in mesh and ring topologies.  Currently, 
  the CCAMP WG has a collection of drafts that address the issue of 
  recovery in networks featuring a Generalized Multi-Protocol Label 
  Switching (GMPLS) control-plane.  Requirements for recovery in 
  optical networks are presented in [3], and the P&R Design Team has
  produced a set of documents addressing recovery: a terminology for
  GMPLS-based recovery [2], an analysis draft [4] that looks at 
  differences between protection, restoration, path-based, link-based 
  and span-based approaches, and a functional specification draft [5] 
  that presents a functional description of some of the protocol 
  extensions needed to support GMPLS-based recovery.  A fault 
  notification protocol must address recovery requirements that fall 
  into three main categories: 
   
    o Timing requirements: it must meet adequate bounds on timing 
    o Control plane resources: it must use control plane resources  
       efficiently
    o Design of recovery schemes: it must allow for the design of  
       flexible recovery schemes 
   
  Protection and restoration algorithms can be used for local repair 
  (link-based or node-based), span recovery, and path recovery.  This 
  document presents a fault notification protocol and recovery scheme 
  designed to ensure bounded recovery times (e.g., 50 ms), which are
  comparable to recovery times in the ring-based SONET/SDH networks 
  that implement 1+1 or 1:1 protection schemes. 
     
  Link-based recovery can handle faults such as fiber link failures and 
  transponder failures.  However, in the case of a node failure, the 
  control plane uses either node-based or path-based recovery.  The 
  advantage of path-based recovery lies in its ability to reduce 
  wavelength redundancy (wavelengths that are reserved for possible 
  failures), but its disadvantage is the potentially lengthy delay 
  incurred in notifying all nodes along the recovery path of the 
  failure of a remote resource.  Span-based protection allows the 
  protection of independent segments on the working path thereby 
  decreasing the recovery time but requires more resources for 
  protection.  In addition, the provider must do more planning to
  protect the same resource.  In some applications,
  recovery paths need to be chosen carefully to meet certain recovery 
  time requirements. 
 
  This document presents a fault notification protocol that is both 
  technology and topology agnostic, and applies to intra-domain 
  protection.  Multi-domain recovery is not within the scope of this 
  draft.  In addition, this proposal focuses on scalability, an 
  important issue that arises when using signaling for fault 
  notification. 
 
  We assume unidirectional traffic through Label Switched Paths (LSPs) 
  and assume that bidirectional traffic is carried by two 
  unidirectional LSPs.  Assumptions made in this draft are also valid 
  for bi-directional LSPs.  For the purpose of illustration, we also 
  assume a mesh Wavelength Division Multiplexing (WDM) network; 
  the scheme applies to ring-topology networks without modification.
   
   
2.   Terminology 
   
  The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 
  "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this 
  document are to be interpreted as described in RFC 2119 [6]. 
 
 
3.   Glossary of Terms Used 
   
 
  In addition to the terminology for GMPLS-based recovery that is 
  documented in [2], this draft uses the following acronyms: 
   
    o AIS:  Alarm Indication Signal, a signal at the SONET/SDH 
          transport layer 
    o BDI:  Backward Defect Indication, a signal at the transport 
          layer sent upstream 
    o LSP:  Label Switched Path 
    o MEMS: Micro-Electro Mechanical Systems 
    o PXC:  Photonic Cross-Connect, a cross-connect that switches 
          wavelengths transparently, by means of a switching fabric 
          such as MEMS 
    o WDM:  Wavelength Division Multiplexing 
 
 
4.   Requirements at Recovery Path Setup Time  
   
  A request for a working path signaled into the network indicates the 
  type of protection or restoration it requires, and, optionally, a 
  recovery priority value, which is useful if, during the recovery 
  process, a node has to decide on which (of many) working paths to 
  protect.  After the recovery route computation algorithm calculates 
  the protection or restoration path, the link resources (wavelengths, 
  wavebands, etc.) along that path are reserved and possibly activated.  
  When the recovery path is not activated, these link resources may be 
  used to carry preemptible best-effort traffic to increase network 
  utilization.  This traffic is generally identified as "extra 
  traffic."  Alternatively, the same link resource may be reserved by 
  multiple protection paths for different link failures as long as 
  these protection paths do not need to be activated simultaneously 
  (e.g., M:N shared protection).  In either case, proper link resources 
  need to be activated upon the notification of failure. 
   
  When a label for a recovery LSP is set up on a certain node A through
  RSVP-TE or CR-LDP, node A SHOULD be aware of the network resource 
  that this LSP is protecting.  When using RSVP-TE for example, the 
  protection PATH message may notify all nodes on the protection path 
  of this information at path setup time as proposed in [7].  This 
  allows node A to bundle (or group together) labels (as well as link 
  resources) that protect a particular network resource.  For example, 
  if two labels j and k correspond to two LSPs used to protect working 
  paths from the failure of link (X,Y), then they belong to the bundle 
  L (X,Y).  This allows node A to process, in its control plane, the 
  joint event of the two LSP failures and possibly jointly 
  activate/cross-connect both LSPs referenced by labels j and k when it 
  receives notification of the failure of link (X,Y). 
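
  The bundling described above can be pictured as a per-node table
  keyed by the protected resource.  The following is only a minimal
  sketch: the class name RecoveryBundles, its methods, and the use of
  tuples and strings for links and labels are illustrative assumptions
  and are not defined by this draft or by RSVP-TE/CR-LDP.

```python
from collections import defaultdict


class RecoveryBundles:
    """Per-node table: protected resource -> labels of recovery LSPs.

    Hypothetical structure for illustration; the draft does not
    prescribe how a node stores its bundles.
    """

    def __init__(self):
        self._bundles = defaultdict(set)

    def add_label(self, resource, label):
        # Called at recovery path setup time, when signaling tells this
        # node which network resource the recovery LSP protects.
        self._bundles[resource].add(label)

    def labels_for(self, resource):
        # On notification of the failure of `resource`, all labels in
        # the bundle are candidates for joint activation.
        return sorted(self._bundles.get(resource, ()))


# Node A protects link (X,Y) with labels j and k, as in the text.
node_a = RecoveryBundles()
node_a.add_label(("X", "Y"), "j")
node_a.add_label(("X", "Y"), "k")
node_a.add_label(("Y", "Z"), "m")
to_activate = node_a.labels_for(("X", "Y"))
```

  A single lookup on the failed link returns every label to be
  cross-connected, which is what makes per-failure handling possible.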
   
  This document proposes a method for per-failure fault notification
  (as compared to per-LSP fault notification), hence such bundled label
  information is essential.  The main difference between per-failure
  and per-LSP notification is the number of notification processes
  that need to be started.  Per-failure fault notification engages a
  single mechanism to notify all relevant nodes of the fault.  Per-LSP
  notification, on the other hand, requires activating as many
  mechanisms as there are failed LSPs (for example, all LSPs that
  failed due to a link failure).  In an optical network carrying
  possibly hundreds of wavelengths per fiber, per-LSP notification can
  be resource-intensive and taxing on the hardware.
   
  Using LSP Hierarchy, one could achieve some amount of efficiency by 
  bundling notification messages. These messages would, however, have 
  to be unbundled later, to reach different sources.  The different 
  sources would then initiate handshake mechanisms on the different 
  recovery paths [5].  This is a time-consuming process that increases 
  recovery time.  As explained later in this draft, the flooding 
  approach decreases the recovery time by removing the need for such a 
  mechanism. 
 
 
5.   Protocol Steps in Failure Notification and Service Recovery 
   
  This section presents a control plane based recovery scheme and its
  fault notification protocol.  It details the process used in
  notifying nodes of the resource failure and activating the recovery
  lightpaths.  The failure sequence is based on the timing sequence in
  the ITU-T communication entitled G.gps [8], applied to WDM networks.
  The timing diagram is reproduced in Figure 1 for clarity.  The
  critical component in guaranteeing time constraints for service
  recovery is the fault notification process.  The following sequence
  of events MUST be followed in order to ensure that the recovery
  process happens within a specific amount of time, as is the case in
  SONET/SDH-based networks.
   
 
        +-Network Impairment  
        |    +-Fault Detected  
        |    |    +-Start of Fault Notification   
        |    |    |    +-Recovery Operation Complete 
        |    |    |    |    +-Traffic Recovered 
        |    |    |    |    |  
        |    |    |    |    |  
        v    v    v    v    v  
       ------------------------------------------------>  
        | T1 | T2 | T3 | T4 |                    time 
 
  Figure 1. Recovery Temporal Model 
   
   
 
5.1  T1: Fault Detection Time 
   
  This is the period of time between the network impairment and the 
  detection at the control plane.  An example of such network 
  impairment is a fiber cut.  Layer 1 at a certain node detects the 
  fault and passes it to the control plane.  This document assumes that 
  equipment in the optical network can detect such failures.  This time 
  is not included in the calculation of the recovery time.  In general, 
  if a bi-directional link is cut, both its upstream and downstream 
  nodes will detect the fault.  When a unidirectional link fails, only
  the downstream node detects the failure directly.  In this case, that
  node sends a transport layer signal, such as the Backward Defect
  Indication (BDI) defined in ITU-T G.709, to the upstream node, which
  then also acts as a detecting node.  We assume that the time
  difference between detection and inference based on BDI is
  negligible.  Other transport plane technologies MUST offer the same
  capability to be used in this context.  Thus, in either case, both
  the upstream and downstream nodes detect the failure.
   
  To support the failure detection requirement, nodes MUST implement 
  per-channel monitoring that will pinpoint the failure and report it 
  to the detecting entity. 
   
5.2  T2: Hold-Off Time 
   
  This is the period of time that the reporting entity waits before 
  starting the fault recovery process.  This allows the fault recovery 
  process at a given layer to wait for recovery to occur at a lower 
  layer.  In the case of WDM-based recovery, this time should be 0 sec 
  since there is no underlying layer recovery.  
   
  In the case of a GMPLS-enabled IP network over SONET, T2 may be set
  to 50 ms so that the SONET protection scheme can activate before any
  IP (MPLS) layer recovery is triggered.  For GMPLS-enabled SONET over
  WDM, the choice for T2 is more complicated.  Mechanisms such as
  SONET/SDH protection could be used in the same environment in 
  conjunction with WDM-based protection by picking either protection 
  mechanism or no protection at all.  Allowing redundant protection 
  mechanisms for the same light path may increase the recovery time.  
  The SONET/SDH layer, if it exists, makes the decision whether to 
  request a protected or unprotected light path from the WDM layer to 
  connect the SONET equipment.   
   
5.3  T3: Fault Notification and Completion of Recovery Operation 
   
  T3 is the period between the time when the detecting entity starts
  sending out a fault notification message and the time when every
  node on the corresponding recovery paths, including ingress nodes
  and intermediate nodes, has been notified of the failure and has
  finished reconfiguring itself to carry restored traffic.  For
  link-based recovery, the ingress node to the recovery LSP is the
  upstream detecting node.  If the recovery time is strictly 
  constrained, the ingress node SHOULD be as close to the link failure 
  as possible.  This reduces the recovery time since no messages have 
  to be relayed to a remote or centralized authority to initiate 
  recovery.  
   
  Some ingress or egress nodes may detect a failure, for example, a 
  Loss of Light (LoL) event.  The fault notification message MUST be 
  initiated by the detecting entity even if the ingress and egress 
  nodes have other indications of failures.  This allows the fault 
  notification mechanism to solve for the worst-case scenarios and 
  gives timely notification of all concerned nodes on the recovery 
  path(s).  For the purpose of this draft, transport plane signals such 
  as the AIS (Alarm Indication Signal) and the BDI will be disregarded 
  by all OXCs except the detecting nodes.  It is to be noted that, for 
  the purposes of this draft, fault notification is initiated at the 
  control plane to minimize layer interaction. 
   
  The detecting entity MAY use several fault notification methods to 
  notify other nodes of the failure, including GMPLS-based signaling or 
  flooding.  In the case of GMPLS-based signaling, there is generally 
  one fault notification message per disrupted Label Switched Path.  In 
  case LSP Hierarchy is used, it would decrease the number of messages 
  by bundling them; these messages will, however, need to be unbundled 
  to reach different sources.  Then, each of these nodes would have to 
  initiate the handshake process.  While some protection paths may be 
  the same and could be signaled together during the handshake phase, 
  this is generally restrictive in a mesh network.  Hence, signaling 
  does not scale well with the number of connections; in addition, the 
  message processing delay is less predictable.  For details about the 
  notification methods and the choice of flooding for this draft, the 
  reader is encouraged to refer to [9]. 
   
  In the case of flooding, the message sent from the detecting entity 
  to all nodes on the various protection paths should reach them within 
  the specified recovery time (T-rec) minus the reconfiguration time 
  (T-cfg) needed at each node after it is notified of the fault.  We 
  define this time as the fault notification time (T-ntf = T-rec -
  T-cfg).  The method for assigning each node's T-ntf is out of scope for
  this document. 
    
  Nodes on a recovery path (including the ingress node) are aware that 
  they are protecting against the failure of a particular resource.  
  All nodes notified of the failure will activate the recovery path by 
  performing any required hardware reconfiguration (e.g., moving 
  mirrors in the case of a MEMS-based switching fabric).  The approach 
  outlined in this draft supports node reconfiguration applied
  sequentially (e.g., parallel movement of the mirrors is not
  available), or in parallel (e.g., electronic switching fabric).  An
  algorithm that computes the constrained recovery path SHOULD take
  nodes' physical capability into account in its path calculation.
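
  To illustrate how a node's physical capability affects its
  notification budget, the sketch below derives T-cfg under a simple
  assumed model (sequential reconfiguration costs one switching time
  per cross-connect, parallel reconfiguration costs one switching time
  in total) and then applies T-ntf = T-rec - T-cfg.  The function
  names and the 10 ms per cross-connect figure are assumptions made
  for illustration only; 50 ms is the example recovery bound used
  earlier in this draft.

```python
def reconfiguration_time_ms(num_cross_connects, per_connect_ms, parallel):
    """T-cfg for one node under a simple, assumed model."""
    if num_cross_connects == 0:
        return 0
    if parallel:
        # All cross-connects are set at once (e.g., electronic fabric).
        return per_connect_ms
    # Cross-connects are set one after another (e.g., MEMS mirrors
    # that cannot be moved in parallel).
    return num_cross_connects * per_connect_ms


def notification_budget_ms(t_rec_ms, t_cfg_ms):
    """T-ntf = T-rec - T-cfg.

    A negative result means the node cannot meet T-rec at all.
    """
    return t_rec_ms - t_cfg_ms


T_REC_MS = 50          # example recovery bound
PER_CONNECT_MS = 10    # assumed switching time, illustration only

t_ntf_sequential = notification_budget_ms(
    T_REC_MS, reconfiguration_time_ms(3, PER_CONNECT_MS, parallel=False))
t_ntf_parallel = notification_budget_ms(
    T_REC_MS, reconfiguration_time_ms(3, PER_CONNECT_MS, parallel=True))
```

  Under these assumed numbers, a node that must set three
  cross-connects sequentially has less time left for notification than
  one that can set them in parallel, which is why the path computation
  should account for the switching fabric.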
   
  The ingress node starts sending data on the protection path at the 
  start time S(I) specified in the next paragraph.  If one of the
  detecting entities at the ingress or egress node detects, at the data
  plane, a failure in the protection light path to be activated, it
  MUST raise an alarm that may be dealt with at the management plane.
  The management plane will take appropriate remediation action.  Alarm 
  and remediation are outside the scope of this draft. 
   
  The nodes on protection paths receive the fault notification within a 
  deterministic time.  This time delay is calculated by each node as 
  explained in Appendix A.  To avoid complex clock synchronization, an 
  ingress node, identified as node I, that receives the notification 
  from a detecting node, node J, calculates the start time S(I) at 
  which it must switch traffic to the protection path as follows: 
   
     S(I) = time-of-notification(I) - min-delay-between(J,I) + T-rec 
   
     Where 
      
       o time-of-notification(I) returns the local clock time at which
         node I received the notification.

       o min-delay-between(J,I) returns the minimum time needed for
         the notification from node J to reach node I; this value
         is dependent on the topology and the different equipment in
         the network.  It is calculated offline based on the
         topology and hardware information, and is stored as a static
         table at every node.
   
  Note that (time-of-notification(I) - min-delay-between(J,I)) gives
  the time, on node I's clock, at which the failure was detected at J,
  and T-rec is the recovery time requirement.
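
  The start-time computation can be written out directly.  In the
  sketch below, the function name start_time, the table MIN_DELAY_MS,
  and all numeric values are illustrative assumptions; times are in
  milliseconds on node I's local clock.

```python
def start_time(time_of_notification, min_delay_between, t_rec):
    """S(I) = time-of-notification(I) - min-delay-between(J,I) + T-rec."""
    return time_of_notification - min_delay_between + t_rec


# Static table computed offline and stored at every node:
# (detecting node J, this node I) -> minimum notification delay in ms.
# The single entry here is a made-up value.
MIN_DELAY_MS = {("J", "I"): 4}

T_REC_MS = 50  # example recovery time requirement

# Node I received the notification from J at local time 1000 ms.
s_i = start_time(
    time_of_notification=1000,
    min_delay_between=MIN_DELAY_MS[("J", "I")],
    t_rec=T_REC_MS,
)
```

  Only node I's own clock and the static delay table are needed, so
  no clock synchronization between J and I is assumed.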
   
  Our scheme, therefore, works in the following manner: 
   
     1. Given the topology and the equipment in the network, it is 
        possible to calculate T-rec and T-ntf for a given failure.  
   
     2. An offline or online algorithm may calculate the recovery path  
        using this information. 
   
     3. Upon the occurrence of a failure, when flooding-based 
        notification is used as described above, a node I on the 
        recovery path is guaranteed that at S(I), all other nodes along  
        the recovery path have been informed of the failure and have  
        taken the appropriate action to move traffic onto the recovery
        path.
   
  Fault notification is done via flooding as follows.  The detecting 
  entity sends a notification packet to its neighbors on all outgoing 
  links.  The notification packet is a high-priority packet, and 
  contains the unique identifier of the link at fault.  Each node that 
  receives such a packet sends an acknowledgement to the sender and 
  transmits duplicates of the notification to all other neighboring 
  nodes.  To reduce the amount of fault notification traffic that is 
  flooded, the nodes avoid re-broadcasting packets about the same fault 
  and decrement a time-to-live field in the packets as they are 
  received. 
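
  The flooding behavior described above can be simulated on a small
  topology.  This is a sketch under stated assumptions: it models a
  single fault, an undirected graph given as an adjacency dictionary,
  and reliable in-order delivery; acknowledgements are noted in a
  comment but not modeled.  The function name flood and the sample
  topology MESH are hypothetical.

```python
from collections import deque


def flood(adjacency, detecting_node, ttl):
    """Flood one fault notification; return (notified_nodes, messages_sent)."""
    seen = {detecting_node}   # nodes that already know about this fault
    queue = deque()
    sent = 0

    # The detecting entity sends the notification on all outgoing links.
    for neighbor in adjacency[detecting_node]:
        queue.append((detecting_node, neighbor, ttl))
        sent += 1

    while queue:
        sender, node, msg_ttl = queue.popleft()
        # The receiver would acknowledge the packet to `sender` here.
        if node in seen:
            continue          # same fault seen before: do not re-broadcast
        seen.add(node)
        msg_ttl -= 1          # decrement time-to-live on receipt
        if msg_ttl <= 0:
            continue          # time-to-live exhausted: do not forward
        for neighbor in adjacency[node]:
            if neighbor != sender:
                queue.append((node, neighbor, msg_ttl))
                sent += 1

    return seen, sent


# Hypothetical five-node mesh.
MESH = {
    "A": ["B", "C"],
    "B": ["A", "C", "D"],
    "C": ["A", "B", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}

notified, sent = flood(MESH, detecting_node="A", ttl=8)
```

  Duplicate suppression bounds the number of copies each node forwards,
  while the time-to-live limits how far a notification can travel.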
   
  When the recovery type is restoration with dynamic routing, the 
  ingress node for the recovery path, on receiving the fault 
  notification message, must begin the processes of computing and 
  signaling the restoration paths in an order according to the relative 
  recovery priorities of the working paths for which it is responsible. 
   
5.3.1   Delays Incurred by Messages 
   
  The above discussion suggests that in order for the protection 
  algorithm to abide by the T-rec ms recovery requirement, it needs to 
  be either: 
   
     1. Aware of timing issues to be able to select a proper path, or 
     2. Passed a set of nodes and links that satisfy the timing 
     constraints. 
   
  Due to the complexity of the first method, we believe that the second 
  method will be easier to develop and implement.  For example, a 
  pruned topology may be considered for protection path computation, 
  where links/nodes that violate the strict recovery time requirements 
  are excluded.  A database of link information should hold the fiber 
  physical length and the capacity of each link (or channel) as well as 
  the notification message processing time.  The total time needed by a 
  notification packet to travel from source to destination can be 
  broken into two delay components: the time needed to traverse each 
  link and the time needed to go through each node.  While the 
  different delay calculations are discussed in Appendix A, the 
  algorithm for computing the protection paths is out of scope for this 
  document. 
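
  As an illustration of the second method, the sketch below prunes a
  topology before protection path computation.  It assumes that each
  link and node has a known notification delay in milliseconds, that
  the relevant bound is the notification time T-ntf measured from the
  detecting node, and that the minimum delay is found with a
  shortest-path search.  These modeling choices, the function names,
  and the sample values are assumptions; the path computation
  algorithm itself remains out of scope.

```python
import heapq


def min_notification_delay(links, node_delay_ms, source):
    """Minimum delay from `source` to every reachable node (Dijkstra).

    links[u][v] is the link traversal delay; node_delay_ms[v] is the
    processing delay incurred when the message arrives at node v.
    """
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, link_ms in links.get(u, {}).items():
            candidate = d + link_ms + node_delay_ms[v]
            if candidate < dist.get(v, float("inf")):
                dist[v] = candidate
                heapq.heappush(heap, (candidate, v))
    return dist


def prune(links, node_delay_ms, detecting_node, t_ntf_ms):
    """Keep only nodes that can be notified within t_ntf_ms."""
    delay = min_notification_delay(links, node_delay_ms, detecting_node)
    keep = {n for n, d in delay.items() if d <= t_ntf_ms}
    return {
        u: {v: w for v, w in nbrs.items() if v in keep}
        for u, nbrs in links.items()
        if u in keep
    }


# Hypothetical topology; delays in ms.
LINKS = {
    "J": {"A": 1, "B": 5},
    "A": {"J": 1, "C": 2},
    "B": {"J": 5, "C": 1},
    "C": {"A": 2, "B": 1},
}
NODE_DELAY_MS = {"J": 1, "A": 1, "B": 1, "C": 1}

pruned = prune(LINKS, NODE_DELAY_MS, detecting_node="J", t_ntf_ms=5)
```

  The pruned topology can then be handed to any path computation
  algorithm, which no longer needs to be aware of timing itself.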
 
5.3.2   Notification Message Data 
   
  Two types of messages are needed for reliable communication of fault 
  notifications:  
   
 
  - A Fault Notify Message to carry the information regarding the 
  failure from each node on each of its outgoing links to its 
  neighboring node(s). 
   
  - A Fault Notify Acknowledge Message to indicate that the 
  notification message was properly received by a neighboring node.   
   
  Aside from implementation-dependent constructs, the data to be 
  carried in these messages is presented in Table 1 below. 
 
 
  Table 1. Required and Optional Data for Fault Notifications 
  -------------------------------------------------------------------- 
  Data Object    Fault   Fault Notify  Description 
                 Notify  Acknowledge 
  -------------------------------------------------------------------- 
  Message ID        R         R        Identifies notification messages 
  Fault Link ID     R         -        Identifies the failed link 
  Fault ID          R         -        Identifies sequence of failure 
  Channel Status    O         -        Indication of link fault status 
  Detecting Node ID O         -        Identifies the original node 
                                       that is reporting the failure 
  TTL               O         -        Time To Live field 
  -------------------------------------------------------------------- 
  R: required, O: optional, -: not applicable 
   
  A node keeps sending Fault Notify messages at intervals until it
  receives a Fault Notify Acknowledge message or the control channel
  connectivity is declared lost.
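
  The two messages of Table 1 and the retransmission rule can be
  sketched as follows.  The field types, the send callback, and the
  use of a maximum attempt count to stand in for "control channel
  connectivity is declared lost" are assumptions made for
  illustration; this draft does not define an encoding or a
  retransmission interval.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class FaultNotify:
    message_id: int                           # required
    fault_link_id: str                        # required
    fault_id: int                             # required
    channel_status: Optional[str] = None      # optional
    detecting_node_id: Optional[str] = None   # optional
    ttl: Optional[int] = None                 # optional


@dataclass(frozen=True)
class FaultNotifyAck:
    message_id: int                           # required


def send_until_acked(
    msg: FaultNotify,
    send: Callable[[FaultNotify], Optional[FaultNotifyAck]],
    max_attempts: int,
) -> Optional[int]:
    """Resend msg until acknowledged; return the attempt count or None.

    send(msg) returns the neighbor's acknowledgement, or None if no
    acknowledgement arrived within the (unspecified) interval.
    """
    for attempt in range(1, max_attempts + 1):
        ack = send(msg)
        if ack is not None and ack.message_id == msg.message_id:
            return attempt
    return None  # treated here as loss of control channel connectivity


def make_lossy_link(drops):
    """Test helper: a link that loses the first `drops` transmissions."""
    remaining = {"drops": drops}

    def send(msg):
        if remaining["drops"] > 0:
            remaining["drops"] -= 1
            return None
        return FaultNotifyAck(message_id=msg.message_id)

    return send


notify = FaultNotify(message_id=7, fault_link_id="X-Y", fault_id=1, ttl=8)
attempts = send_until_acked(notify, make_lossy_link(2), max_attempts=5)
```

  Matching the acknowledgement on Message ID is what allows a node to
  tell which outstanding notification a neighbor has received.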
 
5.4    T4: Traffic Recovery Time 
   
  This is the time between the last recovery action and the time that 
  the traffic (if present) is completely recovered.  This interval is 
  intended to account for the time required for traffic to once again 
  arrive at the point in the network that experienced disrupted or 
  degraded service due to the occurrence of the fault, i.e. the egress 
  node. 
   
 
6.   Reversion (Normalization) 
   
  Most of the current literature recommends that for resource 
  efficiency, the traffic should be moved back to the original path 
  when the failed link or node is back online.  Although reversion is 
  an optional step, it is typically employed.  If reversion is not used, 
  the "orphaned" bandwidth on the failed working paths should be 
  reclaimed as these paths are repaired.  The signaling of fault repair
  notifications is similar to that of fault notifications.  However,
  the reversion phase does not have strict time constraints. 
   
 
7.   Security Considerations  
    
  This draft makes use of several existing protocols; therefore this 
  draft does not introduce any new security issues besides the ones 
  that arise in the use of these protocols. 
   
   
8.   Conclusion 
   
  This draft presented generic mechanisms for a fault notification and 
  service recovery protocol for GMPLS-enabled optical networks.  It 
  described the steps required in the notification process, leading to 
  recovery of light path service within specific time bounds. A "per-
  failure" approach (as opposed to the "per-LSP" approach) to fault 
  notification is proposed for its scalability. 
 
 
9.   Acknowledgments 
   
  The authors would like to thank Jonathan Lang and Roberto Albanese 
  for feedback and helpful comments, and Takafumi Chujo, Peter 
  Czezowski, and Akira Chugo for valuable inputs to this draft. 
 
   
10.    Intellectual Property Considerations 
   
  This section is taken from Section 10.4 of RFC2026 [1]. 
   
  The IETF takes no position regarding the validity or scope of any 
  intellectual property or other rights that might be claimed to 
  pertain to the implementation or use of the technology described in 
  this document or the extent to which any license under such rights 
  might or might not be available; neither does it represent that it 
  has made any effort to identify any such rights. Information on the 
  IETF's procedures with respect to rights in standards-track and 
  standards-related documentation can be found in BCP-11. Copies of 
  claims of rights made available for publication and any assurances of 
  licenses to be made available, or the result of an attempt made to 
  obtain a general license or permission for the use of such 
  proprietary rights by implementors or users of this specification can 
  be obtained from the IETF Secretariat. 
   
  The IETF invites any interested party to bring to its attention any 
  copyrights, patents or patent applications, or other proprietary 
  rights, which may cover technology that may be required to practice
  this standard. Please address the information to the IETF Executive
  Director.

 
11.    References
                                    
  [1] Bradner, S., "The Internet Standards Process -- Revision 3", BCP 
      9, RFC 2026, October 1996. 
   
  [2] Mannie, E., ed., et al, "Recovery (Protection and Restoration) 
      Terminology for Generalized Multi-Protocol Label Switching 
      (GMPLS)", Internet Draft, work in progress, draft-ietf-ccamp-
      gmpls-recovery-terminology-02.txt, May 2003. 
    
  [3] Rabbat, R., Soumiya, T. (Eds.), "Optical network failure recovery 
      requirements", Internet Draft, work in progress, draft-rabbat-
      optical-recovery-reqs-00.txt, June 2003. 
   
  [4] Papadimitriou, D., et al, "Analysis of Generalized MPLS-based 
      Recovery Mechanisms (including Protection and Restoration)", 
      Internet Draft, work in progress, draft-ietf-ccamp-gmpls-recovery-
      analysis-01.txt, May 2003.
   
  [5] Lang, J., and Rajagopalan, B. (Eds.), "Generalized MPLS recovery
      functional specification", Internet Draft, work in progress,
      draft-ietf-ccamp-gmpls-recovery-functional-00.txt, January 2003.
   
  [6] Bradner, S., "Key words for use in RFCs to Indicate Requirement 
      Levels", BCP 14, RFC 2119, March 1997. 
   
  [7] Li, G., J. Yates, et al, "Experiments in Fast Restoration using 
      GMPLS in Optical/Electronic Mesh Networks", Post-deadline Papers 
      Digest, OFC 2001, Anaheim, CA, March 2001. 
   
  [8] ITU-T Draft Recommendation G.gps, "Generic Protection Switching", 
      April 2002, available at ITU web site. 
   
  [9] Rabbat, R. et al, "Fault Notification and Service Recovery in WDM 
      Networks", white paper available at: 
      http://perth.mit.edu/~richard/wp-ietf-fault-notification.pdf.


12.    Authors' Addresses  
       
  Richard Rabbat                    Vishal Sharma
  Fujitsu Labs of America, Inc.     Metanoia, Inc.
  1240 E. Arques Ave, MS 345        1600 Villa Street, Unit 352
  Sunnyvale, CA 94085               Mtn. View, CA 94041
  United States of America          United States of America
  Phone: +1-408-530-4537            Phone: +1-650-386-6723
  Email: rabbat@fla.fujitsu.com     Email: v.sharma@ieee.org

  Norihiko Shinomiya                Ching-Fong Su
  Fujitsu Laboratories Ltd.         Fujitsu Labs of America, Inc.
  1-1, Kamikodanaka 4-Chome         1240 E. Arques Ave
  Nakahara-ku, Kawasaki             Sunnyvale, CA 94085
  211-8588, Japan                   United States of America
  Phone: +81-44-754-2635            Phone: +1-408-530-4572
  Email: shinomi@jp.fujitsu.com     Email: csu@fla.fujitsu.com 
   
Appendix A.  Fault Notification Message Delays on a Path 
   
  This appendix describes the delays incurred on a path.  Two types of
  delay occur on the path between any two nodes: delays incurred while
  traversing the links of that path, and delays incurred at the nodes
  along it.  The following sections present the computations and
  expected values for each.
   
A.1  Delays Associated with Link Traversal 
   
  The time needed to traverse each link is the sum of the transmission 
  time and the link propagation delay: 
   
  1. The transmission time is a value based on link capacity.  The 
     calculation is as follows: D trans = (packet size) / (link 
     speed). 
  2. The link propagation delay is due to the physical length of the
     link: D prop = length / (light propagation speed on fiber). 
   
  The length of a notification packet is expected to be of the order of 
  a hundred bytes (about 10^3 bits).  As an example, for a link speed 
  of 1 Gbps, 
   
  D trans ~= 10^3 / 10^9 = 10^-6 s = 1 microsecond. 
   
  This value can therefore safely be ignored in calculating delays.
  The link propagation delay, on the other hand, contributes
  noticeably to the total delay in metropolitan area and long-haul
  networks.  For a distance of 100 km, with the speed of light in
  fiber at about 2/3 of its speed in free space (about 200,000 km/s),
   
  D prop ~= 10^2 / (2 * 10^5) = 0.5*10^-3 s = 500 microseconds. 
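
  As an illustration, the two link traversal formulas above can be
  evaluated with a short Python sketch.  The function and constant
  names are hypothetical and are not defined by this draft; the inputs
  are the example figures used above (a packet of about 10^3 bits, a
  1 Gbps link, a 100 km span, and about 200,000 km/s in fiber).

```python
# Illustrative sketch only: link traversal delay for one fault
# notification packet.  All names are assumptions made for this example.

# Assumed speed of light in fiber, about 2/3 of its free-space speed.
FIBER_LIGHT_SPEED_KM_PER_S = 200_000.0


def transmission_delay_s(packet_size_bits: float, link_speed_bps: float) -> float:
    """D trans = (packet size) / (link speed)."""
    return packet_size_bits / link_speed_bps


def propagation_delay_s(length_km: float,
                        speed_km_per_s: float = FIBER_LIGHT_SPEED_KM_PER_S) -> float:
    """D prop = length / (light propagation speed on fiber)."""
    return length_km / speed_km_per_s


def link_delay_s(packet_size_bits: float, link_speed_bps: float,
                 length_km: float) -> float:
    """Time to traverse one link: transmission time plus propagation delay."""
    return (transmission_delay_s(packet_size_bits, link_speed_bps)
            + propagation_delay_s(length_km))


if __name__ == "__main__":
    d_trans = transmission_delay_s(1e3, 1e9)
    d_prop = propagation_delay_s(100.0)
    print(f"D trans = {d_trans * 1e6:.1f} microseconds")
    print(f"D prop  = {d_prop * 1e6:.1f} microseconds")
    print(f"D link  = {link_delay_s(1e3, 1e9, 100.0) * 1e6:.1f} microseconds")
```

  With these inputs the propagation delay is several hundred times
  larger than the transmission time, which is why only the former is
  retained in the delay calculations.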
   
A.2  Delays Incurred at the Nodes 
   
  At each node, two delays are important: queuing delay and processing
  time.  The processing time D proc has been reported in the
  literature [7] as a few tenths of a millisecond in the case of an
  RSVP object.  This value is smaller in the case of a simpler LMP
  message requesting the activation of an LSP.
   
  Queuing delay matters at every intermediate node.  Fault
  notification messages should be placed at the front of the buffer
  that holds other control packets in order to avoid queuing delays.
  (These messages do not contend with data packets, since no data are
  sent over the control channel.)  A queuing discipline such as
  priority queuing admits those packets at the head of the queue,
  based on the priority set in the packet.  A simple mechanism such as
  setting the priority bits in the IP header, for example the IP
  precedence bits or the DSCP code point in the TOS (Type Of Service)
  byte, would be appropriate.  Using priority queuing for fault
  notification messages ensures that their queuing delay is bounded.
  In the case of flooding for fault notification, D queue(A) = 0 sec.
  If other fault notification messages are in the queue as well, this
  implies multiple failures, for which the recovery time guarantee
  does not apply.  Otherwise, it may indicate that multiple messages
  are traveling on different protection paths to report the same link
  failure, as happens when a signaling protocol is used for fault
  notification.  In the case of per-LSP fault notification, as in the
  case of using a signaling protocol, the maximum queuing delay at
  node A is:
   
  D queue max(A)= (number of protection paths) * (packet size) / (link 
  bandwidth). 
   
  This provides the mathematical basis for using flooding for fault 
  notification; flooding allows this value to be 0 sec.  In the absence 
  of priority queuing, the maximum queue delay can be calculated as 
  follows at node A, assuming fair queuing at the FIFO buffers of all 
  control channels and assuming input buffers only: 
   
  D queue max(A)= (number of queues) * (queue size) / (link bandwidth). 
   
  This value is an upper bound, and is dependent on hardware buffer 
  implementations. 
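
  The queuing bounds above can be compared numerically with the
  following minimal Python sketch.  The function names and the example
  inputs (100 protection paths, four control channel queues of 64
  kilobytes each) are assumptions chosen purely for illustration; the
  packet size and link bandwidth reuse the example figures of A.1.
  Queue size is assumed to be expressed in bits so that the units
  agree with the link bandwidth.

```python
# Illustrative sketch only: queuing delay bounds at a node A.
# Names and example inputs are assumptions, not values from this draft.


def flooding_queue_delay_s() -> float:
    """Flooding with priority queuing: D queue(A) = 0 sec."""
    return 0.0


def per_lsp_max_queue_delay_s(num_protection_paths: int,
                              packet_size_bits: float,
                              link_bandwidth_bps: float) -> float:
    """D queue max(A) = (number of protection paths) * (packet size)
    / (link bandwidth), for per-LSP fault notification."""
    return num_protection_paths * packet_size_bits / link_bandwidth_bps


def fifo_max_queue_delay_s(num_queues: int,
                           queue_size_bits: float,
                           link_bandwidth_bps: float) -> float:
    """D queue max(A) = (number of queues) * (queue size)
    / (link bandwidth), without priority queuing."""
    return num_queues * queue_size_bits / link_bandwidth_bps


if __name__ == "__main__":
    link_bps = 1e9                   # 1 Gbps, as in the A.1 example
    packet_bits = 1e3                # about a hundred bytes
    queue_bits = 64 * 1024 * 8       # hypothetical 64 kilobyte buffer

    print("flooding :", flooding_queue_delay_s(), "s")
    print("per-LSP  :", per_lsp_max_queue_delay_s(100, packet_bits, link_bps), "s")
    print("FIFO     :", fifo_max_queue_delay_s(4, queue_bits, link_bps), "s")
```

  Under these assumed inputs, the per-LSP bound is 100 microseconds and
  the FIFO bound is about 2.1 milliseconds, while flooding with
  priority queuing keeps the queuing delay at zero.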
   
   
Full Copyright Statement 
   
  "Copyright (C) The Internet Society (2003). All Rights Reserved. 
  This document and translations of it may be copied and furnished to  
  others, and derivative works that comment on or otherwise explain it 
  or assist in its implementation may be prepared, copied, published 
  and distributed, in whole or in part, without restriction of any 
  kind, provided that the above copyright notice and this paragraph are 
  included on all such copies and derivative works. However, this 
  document itself may not be modified in any way, such as by removing 
  the copyright notice or references to the Internet Society or other 
  Internet organizations, except as needed for the purpose of 
  developing Internet standards in which case the procedures for 
  copyrights defined in the Internet Standards process must be 
  followed, or as required to translate it into languages other than 
  English. 
   
  The limited permissions granted above are perpetual and will not be 
  revoked by the Internet Society or its successors or assigns. 
   
  This document and the information contained herein is provided on an 
  "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING 
  TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING 
  BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION 
  HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF 
  MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE." 
