Internet Draft                                            Srinivas_Pitta
Expires: October 2003                                 Wipro Technologies
                                                              April 2003


                    Redundant Fault Tolerant Configurations 
  
                   draft-pitta-redundant-fault-tol-conf-00.txt

Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that other
   groups may also distribute working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on October 20, 2003.

Copyright Notice

   Copyright (C) The Internet Society (2003). All Rights Reserved.

Abstract

   A fault-tolerant system is one that is able to deliver its normal
   service in the presence of unexpected errors, and to recover from
   those errors and restore normal operation. Systems whose behavior
   must be predictable in nearly every possible situation are often
   deployed in redundant fault-tolerant configurations. This document
   defines the redundant fault-tolerant configurations that make such
   systems dependable, or highly available.


Srinivas_Pitta         Expires October 2003                     [Page 1]

Internet-Draft      Redundant Fault Tolerant Configurations   April 2003


Table of Contents

   1.  Introduction
     1.1  Definitions
   2.  Conventions
   3.  Configurations
     3.1  Hot Standby Redundant Fault Tolerant Configuration
     3.2  Warm Standby Redundant Fault Tolerant Configuration
     3.3  Cold Standby Redundant Fault Tolerant Configuration
   Security Considerations
   Normative References
   Author's Addresses
   Intellectual Property and Copyright Statements

1. Introduction

   This document gives the definitions of Hot, Warm and Cold standby
   redundant fault-tolerant configurations. It makes no assumptions
   about how the functionality is achieved; it can be implemented in
   hardware, software, or a combination of both. The fault-tolerant
   configurations described in this document do not restrict the user
   to a single standby set; more than one standby set may be used.

1.1 Definitions

   Component - A component can be a collection of hardware or software
   entities, or a combination of both.

   up-time - A predetermined period of time for which a system is
   expected to be up and running and to provide complete functionality
   to the user.

   dependable system - A system that is able to provide full services
   without any downtime for a given period of time (up-time).

   Set - A set is a collection of hardware and software components.
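
   As an illustration of the "dependable system" definition, the
   Python sketch below checks whether any recorded outage overlaps the
   up-time window. It assumes downtime is logged as outage intervals in
   seconds and treats up-time as a half-open interval; the `Outage`
   type and `is_dependable` function are hypothetical names, not part
   of any standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outage:
    """Hypothetical record of one interval (in seconds) during which the
    system could not provide complete functionality."""
    start: float
    end: float


def is_dependable(outages: list[Outage], uptime_start: float,
                  uptime_end: float) -> bool:
    """Return True if no outage overlaps the up-time window
    [uptime_start, uptime_end), i.e. full service without any downtime."""
    for o in outages:
        # Two half-open intervals overlap when each starts before the other ends.
        if o.start < uptime_end and uptime_start < o.end:
            return False
    return True


# An outage after the up-time window does not count against it.
print(is_dependable([Outage(120.0, 125.0)], 0.0, 100.0))  # True
print(is_dependable([Outage(50.0, 51.0)], 0.0, 100.0))    # False
```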


2. Conventions

   The keywords MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD,
   SHOULD NOT, RECOMMENDED, NOT RECOMMENDED, MAY, and OPTIONAL, when
   they appear in this document, are to be interpreted as described in
   [2].


3. Configurations


   Because hardware and software components cannot be produced free of
   errors, the up-time of even a dependable system cannot be assured by
   component quality alone. One of the most important factors in
   building a dependable, mission-critical system is therefore fault
   tolerance. The primary purpose of a fault-tolerant system is to
   ensure that the system remains operational for the period of time
   (up-time) for which it is designed. Failures that occur during this
   period should not prevent the system from rendering its services,
   and the system should continue to provide the complete functionality
   for which it is designed. For some applications safety is more
   important than reliability, and fault-tolerance techniques help those
   applications prevent catastrophes. Applications running over a
   fault-tolerant system are often called High-Availability
   applications.

   Redundancy can be defined as the use of resources beyond the minimum
   needed to deliver a specific functionality or service to the user.
   Fault tolerance is achieved through redundancy in the hardware and
   software: a duplicate set of hardware and software is provided in
   addition to the primary, or principal, set. The primary set actively
   processes the inputs entering the system until a failure occurs,
   after which the duplicate, or secondary (standby), set takes over the
   system and starts to provide the functionality.

   Redundant fault-tolerant systems come in three different 
   configurations:

      1) Hot Standby
      2) Warm Standby
      3) Cold Standby

   Under normal operation, the active set (one or more processors and
   software components) controls the system. The active set processes
   all the traffic (including provisioning, data, and control traffic)
   and provides complete functionality to the user. If the active set
   can no longer deliver the functionality because of a failure, the
   standby set takes over the system and starts to provide the
   functionality. The three configurations differ in how the input
   enters the system and where it is processed, and in where the
   information required to make the system fault-tolerant is processed
   and maintained.
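
   The distinction drawn above, between where the inputs enter and
   where the standby's state is maintained, can be captured in a small
   data model. The following Python sketch summarizes the three
   configurations as described in Sections 3.1 to 3.3; the class and
   field names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class StateSource(Enum):
    """Where the standby set obtains the state it needs to take over."""
    OWN_PROCESSING = "standby processes the same inputs itself"
    STATE_UPDATES = "active set sends state updates (check-pointing)"
    RELIABLE_STORAGE = "active set writes state to reliable storage"


@dataclass(frozen=True)
class Configuration:
    name: str
    inputs_to_standby: bool    # does the standby set receive the inputs?
    state_source: StateSource  # how the standby set's state is maintained


CONFIGURATIONS = [
    Configuration("Hot Standby", True, StateSource.OWN_PROCESSING),
    Configuration("Warm Standby", False, StateSource.STATE_UPDATES),
    Configuration("Cold Standby", False, StateSource.RELIABLE_STORAGE),
]

for c in CONFIGURATIONS:
    print(f"{c.name}: inputs to standby={c.inputs_to_standby}, "
          f"state via {c.state_source.value}")
```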


3.1 Hot Standby Redundant Fault Tolerant Configuration  


   In Hot standby systems, both the active and the standby sets receive
   all the inputs entering the system, and both sets process the input
   in exactly the same way. A selector receives the output from both
   sets; it suppresses the output from the standby set, so that only
   the output from the active set is used. When the active set
   encounters a failure that prevents it from delivering its intended
   functionality, the standby set takes over the system and continues
   to provide the functionality. The decision to switch to the standby
   set when a fault occurs can be made by the standby set or by another
   processor that actively monitors both the active and standby sets.
   In some cases, if the active set determines that some of its
   components are in bad shape and that it will no longer be able to
   continue, the active set can give up control of the system to the
   standby set.
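
   The hot standby data path (the duplication and selection functions
   shown in Fig 1) can be sketched as follows. This is a minimal Python
   model, assuming each set is a plain function from input to output;
   `HotStandbyPair` and its methods are hypothetical. Who calls
   `switchover()` (the standby set, a monitoring processor, or the
   active set itself) is left to the implementation, as in the text.

```python
from typing import Callable

# Hypothetical model: each set maps an input value to an output value.
Processor = Callable[[int], int]


class HotStandbyPair:
    """Duplication function feeds every input to both sets; the
    selection function forwards only the active set's output."""

    def __init__(self, active: Processor, standby: Processor) -> None:
        self.sets = [active, standby]
        self.active_index = 0  # index of the set whose output is used

    def switchover(self) -> None:
        """Make the other set the active one."""
        self.active_index = 1 - self.active_index

    def process(self, value: int) -> int:
        # Duplication function: both sets receive and process the input,
        # so both keep their state current.
        outputs = [s(value) for s in self.sets]
        # Selection function: suppress the standby output.
        return outputs[self.active_index]


pair = HotStandbyPair(lambda x: x * 2, lambda x: x * 2)
print(pair.process(3))  # 6, taken from the active set
pair.switchover()       # e.g. the active set gives up control
print(pair.process(4))  # 8, now taken from the former standby set
```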

   In this configuration, the hardware and software states of the
   active and standby sets are expected to be identical, because both
   sets receive all the input traffic. If the active set fails at any
   time, the system can continue to provide the intended functionality
   without any disruption. In this arrangement it is sufficient for the
   selection function to switch to the standby output within the
   deadline set by the SONET standard GR-253-CORE [3].
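
   Whether a switchover meets its deadline can be checked by timing the
   switch action. The sketch below assumes a 50 ms limit, the
   protection-switching completion time commonly associated with
   SONET; that value is an assumption here and should be confirmed
   against GR-253-CORE. It measures only the switch action itself, not
   the time taken to detect the fault, and `timed_switchover` is a
   hypothetical helper.

```python
import time
from typing import Callable

# ASSUMPTION: 50 ms switch completion time, as commonly cited for SONET
# protection switching; confirm the exact figure against GR-253-CORE.
SWITCH_DEADLINE_S = 0.050


def timed_switchover(switch: Callable[[], None],
                     deadline_s: float = SWITCH_DEADLINE_S) -> tuple[float, bool]:
    """Run a switchover action; return (elapsed seconds, met deadline)."""
    start = time.perf_counter()
    switch()
    elapsed = time.perf_counter() - start
    return elapsed, elapsed <= deadline_s


elapsed, ok = timed_switchover(lambda: None)
print(f"switchover took {elapsed * 1000:.3f} ms, within deadline: {ok}")
```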

   If hot insertion needs to be supported for some or all of the
   components in a set, the implementation should take care of
   restoring the re-inserted component to the same state as the active
   one. The re-inserted component can be declared redundant only after
   it has exactly replicated the hardware and software states of the
   current active component. Because the active and standby sets in
   this configuration receive exactly the same stimuli, or inputs,
   their hardware and software states are normally identical. However,
   the two sets may not be in the same state in the following cases.

   a) If the processing by the system is dependent on the
      surrounding environment.

      For example, even though the two sets receive the same inputs,
      if their processing takes into account environmental variables
      such as temperature or pressure, the outputs from the two sets
      may differ and the sets may not be in the same state as
      expected.

   b) If the two sets have different versions of the hardware and 
      software and the system cannot guarantee the same output for 
      a given input.

      The rationale for the use of multiple versions is the expectation
      that components built differently (e.g., by different designers,
      with different algorithms, or with different design tools) should
      fail differently. Therefore, if one version fails on a particular
      input, at least one of the alternate versions should be able to
      provide an appropriate output. However, if sets with different
      versions cannot guarantee the same output for a given input, the
      states of the two sets may not be as expected.
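
   The rule that a re-inserted component is declared redundant only
   after its state exactly matches the active one, and the way case a)
   can break that match, can be sketched by comparing state digests. In
   this Python sketch, set state is assumed to be a JSON-serializable
   dictionary, and `process` is a hypothetical environment-dependent
   processing step: each set reads its own temperature, so identical
   inputs still produce different states.

```python
import hashlib
import json
from typing import Any


def state_digest(state: dict[str, Any]) -> str:
    """Hash a set's state; sort_keys makes the encoding order-independent."""
    encoded = json.dumps(state, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def can_declare_redundant(active_state: dict[str, Any],
                          other_state: dict[str, Any]) -> bool:
    """A component may be declared redundant only when its state exactly
    matches the state of the current active component."""
    return state_digest(active_state) == state_digest(other_state)


def process(state: dict[str, Any], value: int, temperature: float) -> None:
    """Hypothetical processing step that depends on the environment."""
    state["total"] = state.get("total", 0) + value
    state["compensated"] = round(value * (1 + temperature / 1000), 3)


active, standby = {}, {}
for v in (10, 20):
    process(active, v, temperature=25.0)   # each set reads its own sensor
    process(standby, v, temperature=27.5)
print(can_declare_redundant(active, standby))  # False: the states diverged
```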
           

                                 
                                 |
                                \|/  Inputs
                                 V
                           --------------
                          | Duplication  |
                          | Function     |
                           --------------
                                 |
                                \|/
                                 V
                          ----------------
                          |              |
                         \|/            \|/
                          V              V
                  -------------       -------------
                 |   Active    |     |   Standby   |
                 |     Set     |     |     Set     |
                  -------------       -------------
                          |               |
                         \|/             \|/
                          V               V
                        --------------------
                                 |
                                \|/
                                 V
                            -------------
                           |  Selection  |
                           |  Function   |
                            -------------
                                 |
                                \|/ Outputs
                                 V


        Fig 1: Hot Standby Redundant Fault-tolerant configuration


3.2 Warm Standby Redundant Fault Tolerant Configuration

 
   In Warm standby systems, only the active set receives the inputs
   entering the system, and the active set actively processes all of
   them. Once the inputs are processed, the active set sends the
   resulting internal state changes to the standby set. This process of
   the active set updating the standby set with its internal state
   changes is called state updating, or check-pointing. Because of
   these state updates, the hardware and software states (database) of
   the active and standby sets are kept nearly identical, so the
   standby set is ready to take over the system at any time.
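
   A minimal sketch of check-pointing in Python, assuming inputs can be
   modeled as key/value provisioning requests and that the state update
   is simply the set of keys the input changed. `ActiveSet` and
   `StandbySet` are hypothetical names; a real implementation would
   also need to handle lost or reordered updates, which this sketch
   does not.

```python
from typing import Any


class StandbySet:
    """Holds a replica of the active set's state, kept current by state
    updates (check-pointing) rather than by processing inputs."""

    def __init__(self) -> None:
        self.state: dict[str, Any] = {}

    def apply_update(self, changes: dict[str, Any]) -> None:
        self.state.update(changes)


class ActiveSet:
    def __init__(self, standby: StandbySet) -> None:
        self.state: dict[str, Any] = {}
        self.standby = standby

    def handle_input(self, key: str, value: Any) -> None:
        # 1. Only the active set receives and processes the input.
        changes = {key: value}
        self.state.update(changes)
        # 2. Once processed, send only the state changes to the standby.
        self.standby.apply_update(changes)


standby = StandbySet()
active = ActiveSet(standby)
active.handle_input("circuit-7", "provisioned")
print(standby.state == active.state)  # True
```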
   
   If the active set fails, or can no longer provide the intended
   functionality to the user, the standby set takes over the system and
   continues to provide the functionality. The decision to switch to
   the standby set when a fault occurs can be made by the standby set
   or by another processor that actively monitors both the active and
   standby sets. In some cases, if the active set determines that some
   of its components are in bad shape and that it will no longer be
   able to continue, the active set can give up control of the system
   to the standby set. In this arrangement it is sufficient that the
   standby set be able to take over the system within the deadline set
   by the SONET standard GR-253-CORE [3].
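
   One common way for the standby set or a separate processor to decide
   that the active set has failed is a heartbeat timeout; heartbeats
   are an assumption here, since this document does not prescribe a
   detection mechanism. The Python sketch below declares a failure
   after a configurable number of consecutive missed heartbeat
   intervals, and also lets the active set give up control
   voluntarily. All names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class FailoverMonitor:
    """Hypothetical monitor run by the standby set or another processor.
    ASSUMPTION: failure is detected by missed heartbeats."""
    max_missed: int = 3
    missed: int = 0
    active: str = "A"
    standby: str = "B"
    log: list[str] = field(default_factory=list)

    def on_interval(self, heartbeat_received: bool) -> None:
        """Call once per heartbeat interval."""
        self.missed = 0 if heartbeat_received else self.missed + 1
        if self.missed >= self.max_missed:
            self.switchover("missed heartbeats")

    def give_up_control(self) -> None:
        """Called by the active set when it finds its components failing."""
        self.switchover("active set gave up control")

    def switchover(self, reason: str) -> None:
        self.active, self.standby = self.standby, self.active
        self.missed = 0
        self.log.append(f"switched to {self.active}: {reason}")


m = FailoverMonitor()
for beat in (True, False, False, False):
    m.on_interval(beat)
print(m.active, m.log)  # B ['switched to B: missed heartbeats']
```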

   If hot insertion needs to be supported for some or all of the
   components in a set, the implementation should take care of
   replicating the hardware and software states from the active set to
   the re-inserted (standby) components. This process of replicating
   the active component's states to the standby component is called
   cloning. The re-inserted component should be declared redundant only
   after the hardware and software states of the current active
   component have been cloned to it.
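
   Cloning must account for state changes the active set makes while
   the copy is in progress. The Python sketch below assumes a
   snapshot-then-catch-up approach: copy a snapshot of the active
   state, apply the updates produced since the snapshot, and declare
   the component redundant only if the result matches the current
   active state. `Replica` and `clone` are hypothetical names.

```python
import copy
from typing import Any


class Replica:
    """A re-inserted (standby) component that is being cloned."""

    def __init__(self) -> None:
        self.state: dict[str, Any] = {}
        self.redundant = False  # not yet allowed to take over


def clone(snapshot: dict[str, Any], pending: list[dict[str, Any]],
          active_state: dict[str, Any], replica: Replica) -> bool:
    """`snapshot` is the active state when cloning started; `pending` holds
    the updates produced since then. The replica is declared redundant only
    if, after catching up, it exactly matches the current active state."""
    replica.state = copy.deepcopy(snapshot)
    for changes in pending:  # apply later updates in order
        replica.state.update(changes)
    replica.redundant = replica.state == active_state
    return replica.redundant


active_state = {"circuit-1": "up"}
snapshot = copy.deepcopy(active_state)  # cloning begins
active_state["circuit-2"] = "up"        # active keeps processing meanwhile
replica = Replica()
print(clone(snapshot, [{"circuit-2": "up"}], active_state, replica))  # True
```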

   In this configuration, instead of cloning the internal states from
   the active set to the standby set, the actual inputs (stimuli) can be
   cloned to the standby set so that the standby set reaches exactly the
   same state. However, the two sets may not be in the same state in
   the following cases.

   a) If the processing by the system is dependent on the
      surrounding environment.

      For example, even though the two sets receive the same inputs,
      if their processing takes into account environmental variables
      such as temperature or pressure, the outputs from the two sets
      may differ and the sets may not be in the same state as
      expected.
   
   b) If the two sets have different versions of the hardware and 
      software and the system cannot guarantee the same output for a 
      given input.

      The rationale for the use of multiple versions is the expectation
      that components built differently (e.g., by different designers,
      with different algorithms, or with different design tools) should
      fail differently. Therefore, if one version fails on a particular
      input, at least one of the alternate versions should be able to
      provide an appropriate output. However, if sets with different
      versions cannot guarantee the same output for a given input, the
      states of the two sets may not be as expected.
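
   The alternative of cloning the inputs rather than the internal
   states amounts to replaying the same input sequence on the standby
   set. The Python sketch below assumes a deterministic processing step
   (`apply_input`, a hypothetical name), which is exactly what cases a)
   and b) violate: with environment-dependent processing or different
   versions, replaying the same inputs would not guarantee the same
   state.

```python
from typing import Iterable


def apply_input(state: dict[str, int], key: str, amount: int) -> None:
    """Hypothetical deterministic processing step: the same state and the
    same input always produce the same new state."""
    state[key] = state.get(key, 0) + amount


def replay(inputs: Iterable[tuple[str, int]]) -> dict[str, int]:
    """Rebuild a set's state by processing the cloned inputs in order."""
    state: dict[str, int] = {}
    for key, amount in inputs:
        apply_input(state, key, amount)
    return state


input_log = [("port-1", 5), ("port-2", 3), ("port-1", -2)]
active_state = replay(input_log)   # what the active set computed
standby_state = replay(input_log)  # standby fed the same cloned inputs
print(active_state == standby_state, active_state)
# True {'port-1': 3, 'port-2': 3}
```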




              -------------      State       -------------
      Inputs |    Active   |    Updates     |   Standby   |
    -------->|     Set     |--------------> |     Set     |
              -------------                  -------------
                    |                              
                    |
                   \|/ Outputs
                    V


    Fig 2: Warm Standby Redundant Fault-tolerant configuration



3.3 Cold Standby Redundant Fault Tolerant Configuration

   In Cold standby systems, only the active set receives the inputs
   entering the system, and the active set actively processes all of
   them. Once the inputs are processed, the active set writes the
   resulting internal state changes to a reliable storage medium. The
   standby set may or may not be powered up. The standby set cannot take
   over the system until it has been populated with all the internal
   state changes from the storage medium.

   If the active set fails, or can no longer provide the intended
   functionality to the user, the standby set may be brought into the
   operational state. The standby set is populated with all the state
   updates from the storage medium, and all the inputs are then
   directed to the standby set. From then on, the standby set provides
   the functionality to the user.
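
   The cold standby flow of writing state to reliable storage and later
   populating the standby set from it can be sketched with a file-based
   checkpoint. This Python sketch assumes a local file stands in for
   the reliable storage medium and that the state is JSON-serializable;
   writing to a temporary file and then renaming it is one common way
   to avoid leaving a half-written checkpoint. `write_checkpoint` and
   `restore_checkpoint` are hypothetical helpers.

```python
import json
import os
import tempfile
from typing import Any


def write_checkpoint(path: str, state: dict[str, Any]) -> None:
    """Active set: persist its internal state to the storage medium."""
    directory = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())  # push the data down to the storage device
    os.replace(tmp, path)     # replace the old checkpoint (atomic on POSIX)


def restore_checkpoint(path: str) -> dict[str, Any]:
    """Standby set: populate itself from storage before taking over."""
    with open(path) as f:
        return json.load(f)


with tempfile.TemporaryDirectory() as d:
    ckpt = os.path.join(d, "state.json")
    write_checkpoint(ckpt, {"sessions": 12, "mode": "normal"})
    # ... the active set fails and the standby set is powered up ...
    standby_state = restore_checkpoint(ckpt)
    print(standby_state)  # {'sessions': 12, 'mode': 'normal'}
    # Only after restoring would the inputs be redirected to the standby.
```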

   The decision to switch to the standby set when a fault occurs on the
   active set can be made by the standby set, if it is powered up, or
   by another processor that actively monitors the active set, if the
   standby set is not powered up. In some cases, if the active set
   determines that some of its components are in bad shape and that it
   will no longer be able to continue, the active set can give up
   control of the system to the standby set. In either case, because
   the standby set may first have to be powered up and populated from
   the storage medium, in this configuration it may not be able to
   take over the system within the deadline set by the SONET standard
   GR-253-CORE [3].
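
   Why a cold standby may miss the switchover deadline can be seen by
   adding up the takeover steps: powering up (if needed), restoring
   state from storage, and redirecting the inputs. The Python sketch
   below does that arithmetic and also encodes who makes the switchover
   decision. The timing values in the usage example are purely
   illustrative, not measurements, and the 50 ms deadline is an
   assumed figure commonly associated with SONET protection switching.
   `ColdStandby` and `meets_deadline` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColdStandby:
    powered_up: bool
    power_up_ms: float  # hypothetical time to boot the standby set
    restore_ms: float   # hypothetical time to load state from storage
    redirect_ms: float  # hypothetical time to redirect inputs to standby

    def decision_maker(self) -> str:
        """Who decides to switch over, per the text above."""
        return "standby set" if self.powered_up else "monitoring processor"

    def takeover_ms(self) -> float:
        """Total takeover time: boot (if needed) + restore + redirect."""
        boot = 0.0 if self.powered_up else self.power_up_ms
        return boot + self.restore_ms + self.redirect_ms


def meets_deadline(s: ColdStandby, deadline_ms: float) -> bool:
    return s.takeover_ms() <= deadline_ms


# Illustrative numbers only; real values depend entirely on the system.
# ASSUMPTION: 50 ms deadline; confirm against GR-253-CORE.
unpowered = ColdStandby(False, power_up_ms=30_000.0, restore_ms=2_000.0,
                        redirect_ms=10.0)
print(unpowered.decision_maker(), unpowered.takeover_ms())
# monitoring processor 32010.0
print(meets_deadline(unpowered, deadline_ms=50.0))  # False
```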

   Hot insertion is not of great concern in this configuration, because
   there is little interaction between the active set and the standby
   set.


           ----------   State    / ----------- \   State    ------------
   Inputs |  Active  |  Updates  |  Reliable   |  Updates |  Standby   |
   ------>|   Set    |---------->|  Storage    |--------->|    Set     |
           ----------            \ ----------- /           ------------
                |
                |
               \|/ Outputs
                V

   Fig 3a: Cold Standby Redundant Fault-tolerant configuration 
           (Powered Up Scenario)


                    |                              |
                    |                              |
                   \|/ Inputs                     \|/ Inputs         
                    V                              V     
              -------------                  -------------
             |    Active   |                |   Standby   |
             |     Set     |                |     Set     |
              -------------                  -------------
                    |                              |
                    |                              |            
                   \|/ Outputs                    \|/ Outputs 
                    V                              V         


   Fig 3b: Cold Standby Redundant Fault-tolerant configuration 
           (Not Powered Up Scenario)


Security Considerations

   None.


Normative References

   [1]  Bradner, S., "The Internet Standards Process -- Revision 3", BCP
        9, RFC 2026, October 1996.

   [2]  Bradner, S., "Key words for use in RFCs to Indicate Requirement
        Levels", BCP 14, RFC 2119, March 1997.

   [3]  Telcordia Technologies, "Synchronous Optical Network (SONET)
        Transport Systems: Common Generic Criteria", GR-253-CORE,
        September 2000.




Author's Addresses

   Srinivas Pitta
   Wipro Technologies
   1300 Crittenden Lane, 
   2nd Floor, Mountain View 
   CA - 94043, USA

   Phone: 
   EMail: Srinivas.Pitta@Wipro.com

   
Intellectual Property Statement

   The IETF takes no position regarding the validity or scope of any
   intellectual property or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; neither does it represent that it
   has made any effort to identify any such rights. Information on the
   IETF's procedures with respect to rights in standards-track and
   standards-related documentation can be found in BCP-11. Copies of
   claims of rights made available for publication and any assurances of
   licenses to be made available, or the result of an attempt made to
   obtain a general license or permission for the use of such
   proprietary rights by implementors or users of this specification can
   be obtained from the IETF Secretariat.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights which may cover technology that may be required to practice
   this standard. Please address the information to the IETF Executive
   Director.


Full Copyright Statement

   Copyright (C) The Internet Society (2003). All Rights Reserved.

   This document and translations of it may be copied and furnished to
   others, and derivative works that comment on or otherwise explain it
   or assist in its implementation may be prepared, copied, published
   and distributed, in whole or in part, without restriction of any
   kind, provided that the above copyright notice and this paragraph are
   included on all such copies and derivative works. However, this
   document itself may not be modified in any way, such as by removing
   the copyright notice or references to the Internet Society or other
   Internet organizations, except as needed for the purpose of
   developing Internet standards in which case the procedures for
   copyrights defined in the Internet Standards process must be
   followed, or as required to translate it into languages other than
   English.


   The limited permissions granted above are perpetual and will not be
   revoked by the Internet Society or its successors or assignees.

   This document and the information contained herein is provided on an
   "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
   TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
   BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
   HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
   MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.


Acknowledgement

   Funding for the RFC Editor function is currently provided by the
   Internet Society.