L2TP Working Group                                           Vipin Jain
INTERNET DRAFT                                    Nortel Networks, Inc.
Expires Nov 2001                                               May 2001


                   Fail over extensions for L2TP
                 draft-vipin-l2tpext-failover-00.txt


              
Status of this Memo

   This document is an Internet-Draft and is in full conformance with     
   all provisions of Section 10 of RFC 2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six 
   months and may be updated, replaced, or obsoleted by other documents 
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
        http://www.ietf.org/shadow.html.

Abstract

   The Layer Two Tunneling Protocol (L2TP) [1] provides a standard
   method for tunneling PPP [2] packets. Because L2TP control packets,
   and optionally data packets, use sequencing, it is difficult to
   preserve L2TP tunnels, and the sessions within them, when a system
   fails. Protocol extensions are required to notify the peer of a
   fail-over, so that the failed system can recover better and exhibit
   more predictable behavior. This provides resiliency in an L2TP-based
   network, thereby improving end users' PPP connectivity. The same
   extensions can also be used to perform a planned shutdown of L2TP
   tunnels.














Jain, Vipin              expires November 2001                 [Page 1]


Internet-Draft      draft-ietf-l2tpext-failover-00.txt         May,2001



1.0 Introduction

   The L2TP control plane uses sequencing, timeouts and retransmissions
   to transmit control packets reliably, whereas the L2TP data plane
   uses sequencing to detect packet loss. The sliding window mechanism
   used by L2TP makes it difficult for a standby to take over when a
   system fails. To make this possible, an implementation has to
   maintain an active copy of the transmit and receive windows of every
   tunnel on the standby.

   This document defines new AVPs and procedures that extend the
   protocol so that a node can notify its peer of a fail-over and of
   its fail-over capabilities. Upon such notification, the peer
   understands the new sequencing requirements on the data and control
   planes and does not drop existing L2TP tunnels and sessions. The
   proposed extensions are backward compatible.

1.1 Conventions 

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED",  "MAY", and "OPTIONAL" in 
   this document are to be interpreted as described in RFC 2119.


2.0 Fail Over Protocol

   This section describes the protocol followed between the LAC and the
   LNS before and after a fail-over occurs.

2.1 Tunnel Establishment

   When sending an SCCRQ or SCCRP, the LAC or LNS includes the Fail-over
   Capability AVP to indicate its level of support for fail-over. The
   granularity of a fail-over operation is therefore per tunnel.
   Appendix A discusses the design considerations for providing
   fail-over at per-tunnel granularity.

2.2 Session Establishment

   This document makes no change to L2TP session establishment as
   described in L2TP [1]. A node that supports fail-over must maintain
   the state (and other relevant information) of each session on a
   redundant card or processor. How it achieves this is left to the
   implementation and is outside the scope of this document.

2.3 Fail-Over Protocol

   This section describes the behavior of the two endpoints of a tunnel
   when the LAC or the LNS fails. The behavior is the same for a LAC
   failure and an LNS failure. Appendix B contains example dialogues
   for fail-over situations.

2.3.1. On the Node that Fails

   After a fail-over occurs, the node sends an SCCRQ subject to the
   following considerations:

    o It includes all AVPs that it sent when the tunnel was
      established.

    o It includes the Fail-over AVP, indicating that a fail-over has
      occurred.

    o It includes the Old Assigned Tunnel ID AVP, indicating the tunnel
      ID that was assigned by the peer in the prior tunnel
      establishment dialogue. This AVP identifies the peer's tunnel
      that is subject to fail-over.

    o This SCCRQ MUST carry a new tunnel ID value in the Assigned
      Tunnel ID AVP. Using a different tunnel ID prevents the peer from
      acknowledging control messages that were meant for the previous
      tunnel. The session IDs of the existing sessions remain the same.

   After the new tunnel is established, the node MUST retransmit all
   CDNs that were not acknowledged by the peer. It MUST also use the new
   tunnel ID when retransmitting these messages.
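   The steps on the failed node can be sketched as follows. This is a
   minimal Python sketch, not a reference implementation: it assumes a
   hypothetical representation in which AVPs are a dictionary keyed by
   AVP name, and the `ControlMessage` record and helper names are
   illustrative only. Choosing the lowest free non-zero tunnel ID is one
   arbitrary policy; any new value satisfies the rule.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ControlMessage:
    msg_type: str    # e.g. "CDN"
    tunnel_id: int   # Tunnel ID in the L2TP header (the receiver's assigned ID)
    session_id: int
    ns: int
    nr: int


def choose_new_tunnel_id(old_local_tid: int, in_use: set) -> int:
    """Pick a non-zero tunnel ID that differs from the pre-fail-over ID."""
    for candidate in range(1, 0x10000):
        if candidate != old_local_tid and candidate not in in_use:
            return candidate
    raise RuntimeError("no free tunnel ID")


def build_failover_sccrq(original_avps: dict, old_peer_tid: int,
                         new_local_tid: int, level: int = 1) -> dict:
    avps = dict(original_avps)                     # every AVP sent at tunnel setup
    avps["Assigned Tunnel ID"] = new_local_tid     # MUST be a new value
    avps["Fail-over"] = level                      # signals that fail-over occurred
    avps["Old Assigned Tunnel ID"] = old_peer_tid  # peer-assigned ID of old tunnel
    return {"type": "SCCRQ", "avps": avps}


def retransmit_unacked_cdns(unacked: list, new_peer_tid: int,
                            next_ns: int, nr: int):
    """Re-send unacknowledged CDNs on the new tunnel, renumbered in its Ns space."""
    resent = []
    for msg in unacked:
        if msg.msg_type == "CDN":
            resent.append(replace(msg, tunnel_id=new_peer_tid, ns=next_ns, nr=nr))
            next_ns = (next_ns + 1) % 0x10000
    return resent, next_ns


# Hypothetical values: this node's old tunnel ID was 5, the peer had assigned 7.
new_tid = choose_new_tunnel_id(5, in_use={5})
sccrq = build_failover_sccrq({"Host Name": "lac1", "Fail-over Capability": 1},
                             old_peer_tid=7, new_local_tid=new_tid)
pending = [ControlMessage("CDN", tunnel_id=7, session_id=42, ns=9, nr=4)]
resent, next_ns = retransmit_unacked_cdns(pending, new_peer_tid=12, next_ns=1, nr=2)
```

   The CDN keeps its session ID but is renumbered in the new tunnel's
   sequence space and addressed to the peer's new tunnel ID.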

2.3.2. On the Node that Is Notified of Peer Failure

   Upon receipt of an SCCRQ containing the Fail-over AVP, a node that
   supports fail-over responds as follows:

    o It MUST use the Old Assigned Tunnel ID AVP to identify the tunnel
      that is subject to fail-over. If the SCCRQ does not contain an
      Old Assigned Tunnel ID AVP, the node MUST reject the SCCRQ and
      send a StopCCN in response.

    o If the Fail-over AVP indicates a value that differs from the one
      the peer advertised in its Fail-over Capability AVP, the SCCRQ
      MUST be rejected with a StopCCN in response. In this case the
      node should not take any action on the tunnel that matches the
      Old Assigned Tunnel ID AVP.

    o The SCCRP sent in response to the SCCRQ MUST use a new tunnel ID
      in the Assigned Tunnel ID AVP. This prevents the peer from
      acknowledging control messages that were meant for the previous
      tunnel. The session IDs of the existing sessions remain the same.

    o It retains all sessions within the tunnel and assigns them to the
      new tunnel once it is established.

    o It MUST retransmit all non-ZLB control messages that were not
      acknowledged by the peer. It MUST also use the new tunnel ID when
      retransmitting these control messages.
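   The acceptance checks above can be sketched as follows. This is a
   minimal Python sketch under stated assumptions: AVPs are a
   hypothetical dictionary keyed by AVP name, `tunnels` maps this
   node's own tunnel IDs to hypothetical `Tunnel` records, and the Old
   Assigned Tunnel ID is read as the tunnel ID this node itself
   assigned, as described for the failing node. Rejecting an SCCRQ
   whose Old Assigned Tunnel ID matches no tunnel is an assumption the
   text does not cover. Building the SCCRP and retransmitting
   unacknowledged messages are left out.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Tunnel:
    local_tid: int                   # tunnel ID this node assigned
    peer_tid: int                    # tunnel ID the peer assigned
    peer_capability: Optional[int]   # peer's Fail-over Capability value, if any
    sessions: dict = field(default_factory=dict)  # session ID -> session state


def handle_failover_sccrq(avps: dict, tunnels: dict, new_local_tid: int):
    """Return ("StopCCN", reason) or ("SCCRP", new_tunnel)."""
    old_tid = avps.get("Old Assigned Tunnel ID")
    if old_tid is None:
        return "StopCCN", "missing Old Assigned Tunnel ID AVP"
    old = tunnels.get(old_tid)
    if old is None:
        return "StopCCN", "no matching tunnel"  # assumption, not in the draft
    if avps.get("Fail-over") != old.peer_capability:
        return "StopCCN", "fail-over level mismatch"  # old tunnel left untouched
    # Sessions keep their session IDs and move to the new tunnel.
    new = Tunnel(new_local_tid, avps["Assigned Tunnel ID"],
                 old.peer_capability, old.sessions)
    del tunnels[old_tid]
    tunnels[new_local_tid] = new
    return "SCCRP", new


tunnels = {7: Tunnel(local_tid=7, peer_tid=5, peer_capability=1,
                     sessions={42: "up"})}
verdict, t = handle_failover_sccrq(
    {"Assigned Tunnel ID": 1, "Fail-over": 1, "Old Assigned Tunnel ID": 7},
    tunnels, new_local_tid=12)
```

   On a level mismatch the function returns before touching `tunnels`,
   matching the rule that no action is taken on the matching tunnel.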

2.3.3. Data Session Sequencing

   If sequencing is used on any session within the tunnel, both peers
   MUST reset their sequence numbers to 0. This brings the data plane
   back in sync and prevents the restart from being mistaken for
   packet loss.
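   The effect of the reset can be illustrated with a minimal Python
   sketch. The `DataSequenceState` record and the naive gap check
   (anything other than the expected Ns counts as a gap) are
   hypothetical simplifications; a real receiver would compare sequence
   numbers within a window.

```python
from dataclasses import dataclass


@dataclass
class DataSequenceState:
    sequencing: bool      # whether data messages on this session carry Ns
    next_ns: int = 0      # Ns to place in the next data message sent
    expected_ns: int = 0  # Ns expected in the next data message received


def reset_data_sequencing(sessions: dict) -> None:
    """After fail-over, restart data-plane numbering at 0 on sequenced sessions."""
    for state in sessions.values():
        if state.sequencing:
            state.next_ns = 0
            state.expected_ns = 0


def gap_detected(state: DataSequenceState, received_ns: int) -> bool:
    """Naive loss check: any Ns other than the expected one counts as a gap."""
    return state.sequencing and received_ns != state.expected_ns


sessions = {42: DataSequenceState(True, next_ns=812, expected_ns=500),
            43: DataSequenceState(False, next_ns=7)}
before = gap_detected(sessions[42], 0)  # the restarted peer sends Ns=0
reset_data_sequencing(sessions)
after = gap_detected(sessions[42], 0)
```

   Without the reset, the first post-fail-over data message (Ns=0)
   looks like a large gap; after both sides reset, it is accepted as
   in-sequence.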


6.0. Fail-over AVPs

   The following new AVPs are introduced; they should be included in
   SCCRQ and SCCRP messages to handle fail-over situations.

6.1. Fail-over Capability AVP

   The Fail-over Capability AVP, Attribute Type [TBD], describes the
   node's capability in a fail-over situation. In a fail-over
   situation, a node should rely on the fail-over capability advertised
   by its peer.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0|H| rsvd  |      Length       |           Vendor ID [IETF]    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         Attribute Type [TBD]  |        Attribute Value        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   The only valid value for the attribute is 1. Higher levels of
   fail-over capability may be defined in the future to support further
   requirements. This document describes the mechanism only for level 1
   fail-over support.

   The AVP is not mandatory (the M-bit MUST be set to 0); however, an
   implementation that requires fail-over capability from its peer
   might reject an SCCRQ that does not contain a Fail-over Capability
   AVP. The AVP MAY be hidden (the H-bit set to 0 or 1).
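   The AVP layout above can be produced with a minimal Python sketch.
   It follows the RFC 2661 AVP header (M bit, H bit, 4 reserved bits,
   10-bit Length, Vendor ID, Attribute Type). Because the Attribute
   Type is still [TBD], the value 0xFFF0 below is a hypothetical
   placeholder, not an assigned number. AVP hiding is not implemented,
   so the H bit is always 0.

```python
import struct

IETF_VENDOR_ID = 0  # Vendor ID 0 marks an IETF-defined AVP (RFC 2661)
# Hypothetical placeholder: the draft leaves the Attribute Type as [TBD].
FAILOVER_CAPABILITY_TYPE = 0xFFF0


def encode_avp(attr_type: int, value: bytes, mandatory: bool) -> bytes:
    """Encode an unhidden AVP: M bit, H bit (0), 4 reserved bits and a
    10-bit Length, then Vendor ID, Attribute Type and the value."""
    length = 6 + len(value)  # Length counts the 6-octet header too
    if length > 0x3FF:
        raise ValueError("AVP too long for the 10-bit Length field")
    first_word = (0x8000 if mandatory else 0) | length
    return struct.pack("!HHH", first_word, IETF_VENDOR_ID, attr_type) + value


def failover_capability_avp(level: int = 1) -> bytes:
    if level != 1:
        raise ValueError("only fail-over level 1 is defined")
    # The M-bit MUST be 0 for this AVP; the value is a 2-octet field.
    return encode_avp(FAILOVER_CAPABILITY_TYPE, struct.pack("!H", level),
                      mandatory=False)


avp = failover_capability_avp()
print(avp.hex())  # 00080000fff00001
```

   Passing `mandatory=True` to `encode_avp` gives the M-bit setting
   required for the Old Assigned Tunnel ID and Fail-over AVPs.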

6.2. Old Assigned Tunnel ID AVP 

   The Old Assigned Tunnel ID AVP, Attribute Type [TBD], is carried in
   SCCRQ and SCCRP messages and encodes the Tunnel ID that was assigned
   by the sender before the fail-over occurred.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |1|H| rsvd  |      Length       |           Vendor ID [IETF]    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         Attribute Type [TBD]  |        Attribute Value        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   The Old Assigned Tunnel ID is a 2-octet, non-zero, unsigned integer.

   When used in a fail-over situation, this AVP MUST be included in the
   SCCRQ. This AVP is mandatory (the M-bit MUST be set to 1). The AVP
   MAY be hidden (the H-bit set to 0 or 1).
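   A receiver-side check of this AVP can be sketched in Python as
   follows. The header layout follows RFC 2661; the Attribute Type
   0xFFF1 is a hypothetical placeholder for [TBD]. Hidden AVPs are
   rejected here only because un-hiding is outside the scope of this
   sketch.

```python
import struct

OLD_ASSIGNED_TUNNEL_ID_TYPE = 0xFFF1  # hypothetical placeholder; draft says [TBD]


def decode_avp(buf: bytes):
    """Split one AVP off the front of buf.

    Returns (m_bit, h_bit, vendor, attr_type, value, rest)."""
    if len(buf) < 6:
        raise ValueError("truncated AVP header")
    first_word, vendor, attr_type = struct.unpack("!HHH", buf[:6])
    length = first_word & 0x03FF  # low 10 bits; includes the 6-octet header
    if length < 6 or length > len(buf):
        raise ValueError("bad AVP length")
    m_bit = bool(first_word & 0x8000)
    h_bit = bool(first_word & 0x4000)
    return m_bit, h_bit, vendor, attr_type, buf[6:length], buf[length:]


def parse_old_assigned_tunnel_id(buf: bytes) -> int:
    _m_bit, h_bit, vendor, attr_type, value, _rest = decode_avp(buf)
    if vendor != 0 or attr_type != OLD_ASSIGNED_TUNNEL_ID_TYPE:
        raise ValueError("not an Old Assigned Tunnel ID AVP")
    if h_bit:
        raise NotImplementedError("hidden AVPs must be un-hidden first")
    if len(value) != 2:
        raise ValueError("value must be 2 octets")
    (tid,) = struct.unpack("!H", value)
    if tid == 0:
        raise ValueError("tunnel ID must be non-zero")
    return tid


# M-bit set, Length 8, Vendor ID 0, placeholder type, tunnel ID 0x1234.
tid = parse_old_assigned_tunnel_id(bytes.fromhex("80080000fff11234"))
```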

6.3. Fail-over AVP

   The Fail-over AVP, Attribute Type [TBD], indicates to the peer that
   a fail-over has occurred on the node and that the node would like
   the peer to restart tunnel establishment while preserving all
   sessions that were established.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |1|H| rsvd  |      Length       |           Vendor ID [IETF]    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         Attribute Type [TBD]  |        Attribute Value        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   The value of this attribute should be the same as the value
   advertised by the peer in the Fail-over Capability AVP defined in
   section 6.1. It indicates the level of fail-over support the peer
   needs in a fail-over situation. The only valid value of the
   attribute is 1, because a peer will not advertise anything other
   than 1. Higher levels of Fail-over AVP behavior may be defined in the
   future to support further requirements. This document describes the
   mechanism only for level 1 fail-over support.

   When used in a fail-over situation, this AVP MUST be included in the
   SCCRQ. This AVP is mandatory (the M-bit MUST be set to 1). The AVP
   MAY be hidden (the H-bit set to 0 or 1).

7. Security considerations

   The fail-over mechanism does not introduce any additional security
   problems. To prevent potential misuse of a fail-over situation,
   tunnel authentication is recommended.

8. Future work

   Future work consists of defining the behavior between peers for new
   levels of fail-over support.

9. IANA Considerations

   To be completed.

10. Acknowledgments

   Many thanks to Mark Townsley, Keyur Parikh, Andy Kocinski and 
   Reinaldo Penno for their valuable comments.

11. References

   [1] Townsley, W., et al., "Layer Two Tunneling Protocol "L2TP"",
       RFC 2661, August 1999.

   [2] Simpson, W., "The Point-to-Point Protocol (PPP)", STD 51,
       RFC 1661, July 1994.

12. Authors' Addresses

   Vipin Jain
   Nortel Networks, Inc.
   2305 Mission College Blvd
   Santa Clara, CA 95054
   Phone: +1 408.565.2636
   Email: vipin@nortelnetworks.com

Appendix A

This section lists some design considerations:

1. Why tunnel-level granularity? Why not system-wide? Why not per
   session?

Tunnel-level granularity is preferred over system-level or
session-level granularity for the following reasons:
 o When fail-over occurs on a system from which only one tunnel is
   established, only that tunnel needs fail-over support.
 o Some implementations may want to provide resiliency only for a
   specific set of tunnels (for example, due to QoS agreements) and
   not for all tunnels.
 o Session-level granularity would require far more message exchange
   without any real advantage.


Appendix B

This section contains examples of how the L2TP control channel
recovers from various fail-over situations.

2.1. ICRQ sent:

LAC                                          LNS
                                     
Sid =x, tid=y  ---- ICRQ (Ns=a, Nr=b) --X    Fail-over occurs
                                             before receiving ICRQ
new tid=y1     <--- SCCRQ (Ns=0, Nr=0)-----  Send SCCRQ with
                                             Fail-over AVP
               -----SCCRP (Ns=0, Nr=1) --->  New LAC tid noted here
               <--- SCCN (Ns=1, Nr=1) -----     
               ----- ZLB (Ns=1, Nr=2)----->     
sid=x, tid=y1  --RESEND ICRQ (Ns=1,Nr=2)-->  Valid ICRQ, send ICRP
               <---- ICRP (Ns=2, Nr=2) ----     

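The Ns/Nr values in the dialogue above can be reproduced with a
minimal Python sketch. It assumes the RFC 2661 counting rules: Ns
advances after every non-ZLB control message, a ZLB carries the
current Ns without consuming it, and Nr is the next Ns expected from
the peer. Both ends start at 0 on the new tunnel. The
`ControlChannel` class is hypothetical and omits retransmission
timers and out-of-order queuing.

```python
class ControlChannel:
    """Simplified Ns/Nr counters for one end of a control connection."""

    def __init__(self) -> None:
        self.ns = 0  # both ends restart from 0 on the new tunnel
        self.nr = 0

    def send(self, zlb: bool = False) -> tuple:
        header = (self.ns, self.nr)
        if not zlb:  # a ZLB does not consume a sequence number
            self.ns = (self.ns + 1) % 0x10000
        return header

    def receive(self, ns: int, zlb: bool = False) -> None:
        if not zlb and ns == self.nr:  # in-order, non-ZLB message
            self.nr = (self.nr + 1) % 0x10000


def replay_icrq_example() -> list:
    lac, lns = ControlChannel(), ControlChannel()
    log = []

    def transfer(name, src, dst, zlb=False):
        ns, nr = src.send(zlb)
        dst.receive(ns, zlb)
        log.append((name, ns, nr))

    transfer("SCCRQ", lns, lac)          # the failed LNS restarts the tunnel
    transfer("SCCRP", lac, lns)
    transfer("SCCN", lns, lac)
    transfer("ZLB", lac, lns, zlb=True)
    transfer("ICRQ", lac, lns)           # LAC resends its unacknowledged ICRQ
    transfer("ICRP", lns, lac)
    return log


for name, ns, nr in replay_icrq_example():
    print(f"{name:5} Ns={ns} Nr={nr}")
```

The printed values match the dialogue: SCCRQ (0,0), SCCRP (0,1),
SCCN (1,1), ZLB (1,2), ICRQ (1,2) and ICRP (2,2).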
2.2. ICCN sent:

LAC                                          LNS
                                     
Sid =x, tid=y  ---- ICRQ (Ns=a, Nr=b) ---->  Valid ICRQ
               <--- ICRP (Ns=b, Nr=a+1) ---  Send ICRP and save 
                                             the session's state
               --- ICCN (Ns=a+1, Nr=b+1)--X  Fail-over occurs
                                             before receiving ICCN
new tid=y1     <--- SCCRQ (Ns=0, Nr=0)-----  Send SCCRQ with
                                             Fail-over AVP
               -----SCCRP (Ns=0, Nr=1) --->  New LAC tid y2 noted 
               <--- SCCN (Ns=1, Nr=1) -----     
               ----- ZLB (Ns=1, Nr=2)----->     
Sid =x, tid=y1 ---- ICCN (Ns=1, Nr=2) ---->  LNS from its previous 
                                             Saved state knows it 
                                             had sent ICRP
               <---  ZLB (Ns=2, Nr=2)------- tid = y2

2.3. ICRP sent:

   LAC                                          LNS
                                     
Save session state --- ICRQ (Ns=a, Nr=b) ---->  Valid ICRQ, Send ICRP
Fail-over occurs   X-- ICRP (Ns=b, Nr=a+1) ---  Sid =x, tid=y
before receiving ICRP
Send SCCRQ with   ---- SCCRQ (Ns=0, Nr=0)-----> new tid=y1
Fail-over AVP
New LNS tid y2    <-----SCCRP (Ns=0, Nr=1)----    
noted here       
                  ----- SCCN (Ns=1, Nr=1) ---->     
                  <----- ZLB (Ns=1, Nr=2)------     
sid =x, tid=y1    <---- ICRP (Ns=1, Nr=2) ----- LNS resends 
                                                unacknowledged ICRP
                  ----- ICCN (Ns=2, Nr=2) ----> sid = x, tid = y2


2.4. ICCN Acked:

LAC                                             LNS
                                     
Save session state --- ICRQ (Ns=a, Nr=b) ---->  Valid ICRQ, Send ICRP
                   <--- ICRP (Ns=b, Nr=a+1)---  Sid =x, tid=y
Save session state --- ICCN (Ns=a+1, Nr=b+1)->  Valid ICCN, Send ZLB
Fail-over occurs   X--- ZLB (Ns=b+1, Nr=a+2)--
before receiving ZLB

Send SCCRQ with   ---- SCCRQ (Ns=0, Nr=0)-----> new tid=y1
Fail-over AVP
New LNS tid y2    <-----SCCRP (Ns=0, Nr=1)----    
noted here       
                  ----- SCCN (Ns=1, Nr=1) ---->     
                  <----- ZLB (Ns=1, Nr=2)------     
LAC is not required                             LNS is not required to 
to resend anything                              resend ZLB  

2.5. CDN Sent:

   LAC                                          LNS
                                     
   Sid =x, tid=y  ---- CDN (Ns=a, Nr=b) ---X    Fail-over occurs
                                                before receiving CDN
   new tid=y1     <--- SCCRQ (Ns=0, Nr=0)-----  Send SCCRQ with
                                                Fail-over AVP
                  -----SCCRP (Ns=0, Nr=1) --->  New LAC tid noted here
                  <--- SCCN (Ns=1, Nr=1) -----     
                  ----- ZLB (Ns=1, Nr=2)----->     
   sid=x, tid=y1  ---RESEND CDN (Ns=1,Nr=2)-->  Valid CDN, send ZLB Ack
                  <---- ZLB (Ns=2, Nr=2) ----     

2.6. CDN Sent by LNS:

LAC                                             LNS

                  <--- CDN (Ns=b, Nr=a+1) ---   Sid=x, tid=y, LNS
                                                remembers
                                                un-acknowledged CDNs
                                                here
                  --- ZLB Ack (Ns=a+1, Nr=b+1)-X Fail-over occurs
                                                 before receiving ZLB
new tid=y1        <--- SCCRQ (Ns=0, Nr=0)-----  Send SCCRQ with
                                                Fail-over AVP
                  -----SCCRP (Ns=0, Nr=1) --->  New LAC tid noted here
                  <--- SCCN (Ns=1, Nr=1) -----
                  ----- ZLB (Ns=1, Nr=2)----->
sid=x, tid=y1     <--RESEND CDN (Ns=2, Nr=1)--- LNS resends CDN
If LAC finds      ----- ZLB (Ns=1, Nr=3)----->
a session with sid=x,
it deletes the session.
