S. Bailey    (Sandburst)
Internet-draft Expires: July 2002

             The Direct Data Placement Protocol (DDPP) Core
                     draft-bailey-roi-ddpp-core-00


Status of this Memo

     This document is an Internet-Draft and is in full conformance with
     all provisions of Section 10 of RFC2026.

     Internet-Drafts are working documents of the Internet Engineering
     Task Force (IETF), its areas, and its working groups.  Note that
     other groups may also distribute working documents as Internet-
     Drafts.

     Internet-Drafts are draft documents valid for a maximum of six
     months and may be updated, replaced, or obsoleted by other
     documents at any time.  It is inappropriate to use Internet-Drafts
     as reference material or to cite them other than as "work in
     progress."

     The list of current Internet-Drafts can be accessed at
     http://www.ietf.org/ietf/1id-abstracts.txt

     The list of Internet-Draft Shadow Directories can be accessed at
     http://www.ietf.org/shadow.html.

Copyright Notice

     Copyright (C) The Internet Society (2002). All Rights Reserved.


Abstract


     This document defines the core of a Direct Data Placement Protocol
     (DDPP) to run on Internet Protocol-suite transport protocols.  The
     DDPP core is mapped to specific transport protocols in separate
     documents.


Table Of Contents

     1.   Introduction . . . . . . . . . . . . . . . . . . . . . . .   2
     2.   DDP-Decorated Messages In DDPP . . . . . . . . . . . . . .   2
     2.1. Splitting DDP-Decorated Messages . . . . . . . . . . . . .   2



Bailey                      Expires July 2002                   [Page 1]

Internet-Draft             DDPP Protocol Core                12 Feb 2002


     2.2. DDP-decoration Structure . . . . . . . . . . . . . . . . .   3
     3.   Operation Ordering In DDPP . . . . . . . . . . . . . . . .   4
     3.1. Ordering On Reliable, Ordered Transports . . . . . . . . .   6
     3.2. Ordering On Reliable, Unordered Transports . . . . . . . .   6
     3.3. Ordering On Unreliable, Ordered Transports . . . . . . . .   7
     3.4. Ordering On Unreliable, Unordered Transports . . . . . . .   7
     4.   Transport Topology In DDPP . . . . . . . . . . . . . . . .   7
     5.   Negotiating DDPP . . . . . . . . . . . . . . . . . . . . .   8
     6.   Security Considerations  . . . . . . . . . . . . . . . . .   8
     7.   IANA Considerations  . . . . . . . . . . . . . . . . . . .   8
     8.   References . . . . . . . . . . . . . . . . . . . . . . . .   8
          Author's Address . . . . . . . . . . . . . . . . . . . . .   9
          Full Copyright Statement . . . . . . . . . . . . . . . . .   9



1.  Introduction

     This document defines the core of a Direct Data Placement Protocol
     (DDPP) to run on Internet Protocol-suite transport protocols.  The
     DDPP core is mapped to specific transport protocols in separate
     documents.

     DDPP follows the architecture and terminology of `The Architecture
     of Direct Data Placement (DDP) And Remote Direct Memory Access
     (RDMA) On Internet Protocols' (DRARCH) [DRARCH].  A thorough
     understanding of DRARCH is necessary to understand this document.

2.  DDP-Decorated Messages In DDPP

     DDP-decorated messages allow a receiving network interface to
     directly place the data in a client protocol buffer.

     A DDP-decorated message submitted to DDPP by a client protocol may
     be split into a group of smaller DDP-decorated messages which are
     each submitted to the transport.  Each DDP-decorated message
     submitted to the transport carries its own, complete DDP-decoration
     information.

2.1.  Splitting DDP-Decorated Messages

     DDPP processes a client protocol request to send a DDP-decorated
     message of arbitrary length by potentially sending a group of
     smaller DDP-decorated messages with equivalent content.  A group of
     DDP-decorated messages corresponding to a client protocol request:

     o    MAY be sent in any order.  For example, 1000 octets of DDP-
          decorated data could be sent as two messages, the first
          containing octets 500-999, and the second containing octets
          0-499.

     o    MUST request a reception indication in the last message with
          the client protocol-supplied message identifier, if the client
          protocol requested a reception indication.

     o    MUST NOT request a reception indication in any message other
          than the final one.

     DDPP mappings to unreliable or unordered transports MUST provide
     client protocols a way to ensure DDP-decorated messages are sent
     atomically or not at all, when the client protocol requests this
     behavior; for example, by defining a DDP-decorated sending
     operation that returns an error if the message cannot be sent
     atomically.  DDPP on a reliable, ordered transport MAY also provide
     this capability.
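
     The splitting rules above can be sketched as follows.  This is a
     minimal illustration, not part of the protocol definition: the
     names DecoratedMessage, split_message, max_payload and atomic are
     hypothetical, and raising an error is only one possible way to
     report that a message cannot be sent atomically.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DecoratedMessage:
    """Hypothetical in-memory form of a DDP-decorated message."""
    stag: int
    offset: int            # buffer offset of the first payload octet
    payload: bytes
    notify: bool = False   # reception indication requested
    message_id: int = 0


def split_message(msg: DecoratedMessage, max_payload: int,
                  atomic: bool = False) -> List[DecoratedMessage]:
    """Split msg into messages carrying at most max_payload octets each."""
    if max_payload <= 0:
        raise ValueError("max_payload must be positive")
    if len(msg.payload) <= max_payload:
        return [msg]
    if atomic:
        # Assumed error path for an "atomically, or not at all" request.
        raise ValueError("message cannot be sent atomically")
    pieces = []
    for start in range(0, len(msg.payload), max_payload):
        chunk = msg.payload[start:start + max_payload]
        # Each piece carries its own complete decoration.
        pieces.append(DecoratedMessage(msg.stag, msg.offset + start, chunk,
                                       notify=False,
                                       message_id=msg.message_id))
    # Only the last message of the group may request the indication.
    pieces[-1].notify = msg.notify
    return pieces
```

     The sketch emits the pieces in ascending offset order.  A sender
     that transmits them in another order would have to move the
     notification request to whichever piece it sends last.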

2.2.  DDP-decoration Structure

     There are two DDP-decoration elements which appear `on the wire': a
     buffer address, composed of a steering tag and a buffer offset, and
     notification information, composed of a notification request flag
     and a message identifier.  DDPP organizes these as:


     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |N|                    Message Identifier                       |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |                             STag                              |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     |                                                               |
     +                            Offset                             +
     |                                                               |
     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+


     N - Notify Flag : 1 bit (boolean flag)

          If set to 1, notify the client protocol of the reception of
          this DDP-decorated message.

     Message Identifier: 31 bits (unsigned integer)

          Passed to the client protocol when the Notify Flag is set.
          The Message Identifier is opaque to DDPP and can be structured
          in any way by the client protocol.  The Message Identifier
          field is ignored by DDPP on DDP-decorated messages without the
          Notify Flag set and may be set to any value by the sending
          DDPP implementation.  For example, if a DDP-decorated message
          is split into several smaller DDP-decorated messages, the
          Message Identifier field in each might contain the same value,
          even though the Notify Flag is only set in the last message.

     STag: 32 bits (unsigned integer)

          The steering tag identifying the destination buffer into which
          to place the contents of a DDP-decorated message.

     Offset: 64 bits (unsigned integer)

          The offset in the destination buffer at which to begin placing
          the contents of a DDP-decorated message.

     A DDPP transport mapping MAY arrange these components differently,
     but all four components MUST be present, or directly computable
     from information available in every transport message containing a
     DDP-decorated message.  The STag MUST be 32 bits, and the Offset
     MUST be 64 bits.  The Message Identifier MUST be at least 15 bits
     and SHOULD be at least 31 bits.
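
     As an illustration, the 16-octet layout shown in the diagram could
     be encoded and decoded as in the sketch below.  The function names
     are hypothetical.  The sketch assumes network (big-endian) byte
     order and that the Notify Flag occupies the most significant bit
     of the first word, following the usual reading of such diagrams;
     a transport mapping may arrange the components differently.

```python
import struct

# First word: N flag plus 31-bit Message Identifier; then a 32-bit STag
# and a 64-bit Offset.  "!" selects network byte order (an assumption).
_DECORATION = struct.Struct("!IIQ")


def pack_decoration(notify: bool, message_id: int,
                    stag: int, offset: int) -> bytes:
    """Encode the four decoration components into 16 octets."""
    if not 0 <= message_id < 2 ** 31:
        raise ValueError("Message Identifier must fit in 31 bits")
    first_word = (int(notify) << 31) | message_id
    # struct raises struct.error if stag or offset is out of range.
    return _DECORATION.pack(first_word, stag, offset)


def unpack_decoration(data: bytes):
    """Return (notify, message_id, stag, offset) from the first 16 octets."""
    first_word, stag, offset = _DECORATION.unpack_from(data)
    return bool(first_word >> 31), first_word & 0x7FFFFFFF, stag, offset
```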

3.  Operation Ordering In DDPP

     The ordering among:

     o    set()s,

     o    undecorated messages, and

     o    DDP-decorated message reception indications

     and their relationship to corresponding operations on the sender is
     defined in DDPP according to underlying transport characteristics:

     o    reliable or unreliable, and

     o    ordered or unordered.

     A primary principle in DDPP is that absolutely minimal restrictions
     are imposed on ordering among set()s.

     One view of the ordering rules of DDPP is that messages are passed
     to DDPP by the transport, and DDPP can accumulate and process these
     messages in any way, and in any order, as long as it conforms to
     the rules defined below.  Such accumulation and haphazard,
     nondeterministic processing of DDPP messages may seem unlikely in
     any one real implementation, but it reflects the range of behaviors
     exhibited across different implementations.  Such liberal rules
     permit very efficient implementations that do not violate transport
     semantics either in the transport interface to DDPP, or in the
     `pass-through' of transport semantics to the client protocol.

     Operation ordering in DDPP is defined in terms of:

     o    `submission' of messages to DDPP by the client protocol on the
          sender,

     o    `reception' of messages by DDPP from the transport on the
          receiver, and

     o    `delivery' of DDP-decorated message reception indications and
          undecorated messages to the client protocol on the receiver.

     For reception properties, DDP-decorated messages resulting from
     splitting a single client protocol message are all considered to be
     separate messages.

     A set() to buffer `b', address `a', with value `v' `corresponds' to
     a DDP-decorated message `m' if m also targets buffer b, address a
     with value v.  A set() which corresponds to a DDP-decorated message
     that has been submitted to DDPP by the client protocol is called a
     `corresponding set()'.

     Regardless of transport characteristics, DDPP:

     o    MUST only perform corresponding set()s,

     o    MAY perform a corresponding set() more than once,

     o    MAY perform corresponding set()s in any order,

     o    MUST perform set(a,v) for every (a,v) that corresponds to a
          received message `m' before m's reception indication (if any)
          is delivered,

     o    MUST only perform set()s on registered buffers.

     If the transport is ordered, DDPP:

     o    MUST only perform set()s that correspond to messages that
          follow all delivered reception indications and all delivered
          undecorated messages.

     If the transport is reliable, DDPP:

     o    MUST only perform set()s that correspond to messages for which
          a reception indication has not yet been delivered.
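
     The transport-independent rules above can be illustrated with a
     small receiver model.  This is a sketch under stated assumptions:
     the Receiver class and its method names are hypothetical, a
     registered buffer is modeled as a bytearray keyed by STag, and a
     message that targets an unregistered buffer or exceeds its bounds
     is simply dropped.

```python
class Receiver:
    """Toy model of placement followed by reception indication."""

    def __init__(self):
        self.buffers = {}       # STag -> bytearray (registered buffers)
        self.indications = []   # Message Identifiers delivered to the client

    def register(self, stag, size):
        self.buffers[stag] = bytearray(size)

    def receive(self, stag, offset, payload, notify=False, message_id=0):
        buf = self.buffers.get(stag)
        if buf is None or offset < 0 or offset + len(payload) > len(buf):
            # Never perform a set() outside a registered buffer.
            return False
        # Perform every set() corresponding to this message ...
        buf[offset:offset + len(payload)] = payload
        # ... before its reception indication (if any) is delivered.
        if notify:
            self.indications.append(message_id)
        return True
```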

3.1.  Ordering On Reliable, Ordered Transports

     On a reliable, ordered transport, DDPP:

     o    MUST NOT deliver a reception indication more than once,

     o    MUST NOT deliver a reception indication before all preceding
          reception indications and undecorated messages are delivered,

     o    MUST NOT deliver an undecorated message more than once,

     o    MUST NOT deliver an undecorated message before all preceding
          reception indications and undecorated messages are delivered,

     o    MUST perform set(a,v) for every (a,v) that corresponds to a
          received message before a subsequent reception indication or
          undecorated message is delivered.

     These rules allow subsequent reception indications and subsequent
     undecorated messages to act as implicit reception indications:
     delivery of a subsequent reception indication or subsequent
     undecorated message implies all set()s corresponding to preceding
     DDP-decorated messages have been performed.

     For a reliable, ordered transport, delivery of the reception
     indication on the last of a group of DDP-decorated messages sent in
     place of a single client protocol message is equivalent to delivery
     of a reception indication for a single DDP-decorated message
     carrying the same data.
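
     The implicit reception indication described above can be sketched
     with a receiver that defers placement.  The class and method names
     are hypothetical, and the sketch assumes the transport hands over
     messages in order, exactly once, and that every STag refers to a
     registered buffer that is large enough.

```python
class OrderedReceiver:
    """Defers set()s, but flushes them before any delivery."""

    def __init__(self, buffers):
        self.buffers = buffers   # STag -> bytearray (registered buffers)
        self.pending = []        # received messages not yet placed
        self.delivered = []      # what the client protocol sees, in order

    def _flush(self):
        # Perform all outstanding set()s; their relative order is free.
        for stag, offset, payload in self.pending:
            self.buffers[stag][offset:offset + len(payload)] = payload
        self.pending.clear()

    def receive_decorated(self, stag, offset, payload,
                          notify=False, message_id=0):
        self.pending.append((stag, offset, payload))
        if notify:
            self._flush()
            self.delivered.append(("indication", message_id))

    def receive_undecorated(self, payload):
        # An undecorated message acts as an implicit reception
        # indication for every preceding DDP-decorated message.
        self._flush()
        self.delivered.append(("undecorated", payload))
```

     In this model the client protocol can rely on all earlier data
     being in place as soon as it sees any later indication or
     undecorated message, even though no per-message indication was
     requested.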

3.2.  Ordering On Reliable, Unordered Transports

     On a reliable, unordered transport, DDPP:

     o    MUST NOT deliver a reception indication more than once,

     o    MUST NOT deliver an undecorated message more than once.

3.3.  Ordering On Unreliable, Ordered Transports

     On an unreliable, ordered transport, DDPP:

     o    MUST NOT deliver a reception indication more than once,

     o    MUST NOT deliver a reception indication before a preceding
          reception indication or undecorated message,

     o    MUST NOT deliver an undecorated message more than once,

     o    MUST NOT deliver an undecorated message before a preceding
          reception indication or undecorated message.
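
     One way to honor these delivery rules is to drop anything that
     arrives at or behind the newest item already delivered.  The
     sketch below assumes the transport mapping exposes a
     monotonically increasing sequence number per message; that
     number, and the names used here, are hypothetical and not defined
     by this document.  Sequence number wraparound is ignored.

```python
class OrderedDeliveryFilter:
    """Decides whether an indication or undecorated message is delivered."""

    def __init__(self):
        self.highest_delivered = -1

    def should_deliver(self, seq):
        # A duplicate, or a message preceding something already
        # delivered, is suppressed.  Lost messages simply leave gaps.
        if seq <= self.highest_delivered:
            return False
        self.highest_delivered = seq
        return True
```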

3.4.  Ordering On Unreliable, Unordered Transports

     On an unreliable, unordered transport, in general, no additional,
     transport-dependent rules apply to DDPP.

     Particular unreliable, unordered transports may have additional
     characteristics that permit useful ordering properties.  For
     example, a DDPP mapping to an unreliable datagram protocol on a
     network with a maximum datagram lifetime of `MDL' could define, as
     a function of MDL, the maximum time between submitting a DDP-
     decorated message and a set() that corresponds to it.

     Unregistering a buffer is another way for a receiver to limit the
     maximum time between submitting a DDP-decorated message and a set()
     that corresponds to it.  However, if another buffer is registered
     subsequently with the same STag, set()s may be performed on the new
     buffer that were destined for the old one.  One possible way of
     preventing immediate reuse of STags is to give the client protocol
     some control over STags assigned to registered buffers.
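
     The following sketch shows one conceivable form of such control:
     the client protocol proposes the STag, and the registry refuses
     an STag that was unregistered too recently.  The StagRegistry
     class, the quarantine period and the explicit `now' argument are
     assumptions made for illustration; a mapping might tie the
     quarantine period to MDL.

```python
class StagRegistry:
    """Registers client-chosen STags, refusing immediate reuse."""

    def __init__(self, quarantine):
        self.quarantine = quarantine   # seconds a retired STag stays unusable
        self.active = {}               # STag -> bytearray
        self.retired_at = {}           # STag -> time it was unregistered

    def register(self, stag, size, now):
        if stag in self.active:
            raise ValueError("STag already registered")
        retired = self.retired_at.get(stag)
        if retired is not None and now - retired < self.quarantine:
            # Stale messages for the old buffer may still be in flight.
            raise ValueError("STag was unregistered too recently")
        self.retired_at.pop(stag, None)
        self.active[stag] = bytearray(size)

    def unregister(self, stag, now):
        del self.active[stag]
        self.retired_at[stag] = now
```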

4.  Transport Topology In DDPP

     Transports support some combination of:

     o    single source, or multisource,

     o    single destination, or multidestination (multicast or
          anycast).

     No special considerations apply to DDPP on multisource transports.

     DDPP on multidestination transports must ensure that DDP-decorated
     messages destined for many receivers can be placed in the
     appropriate buffer on each receiver.  The two tools for doing this
     are:

     o    different receivers assigning the same buffer address (STag
          and Offset) when registering the buffer,

     o    senders sending several messages with the same contents and
          different buffer addresses.

     A DDPP multicast transport mapping could use either of these
     techniques, or both in combination.  However, if no receivers
     assign the same buffer address, there will be no economy of data
     transport compared to using a single destination transport.  Any
     DDPP multicast transport mapping must carefully trade off the
     implementation restrictions resulting from requiring control of
     buffer address assignment, and the benefits of multicast data
     transport.  For example, it might be reasonable to expect support
     for a small set of distinguished multicast buffer addresses by any
     multicast-capable DDPP implementation.  This would be analogous to
     the small set of distinguished multicast network addresses within
     the larger network address space.
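
     The trade-off can be made concrete with a small sketch that
     combines the two techniques: receivers are grouped by the buffer
     address they assigned, and one message is sent per distinct
     address.  The function name and the receiver labels are
     hypothetical.

```python
from collections import defaultdict


def plan_multicast(addresses):
    """Group receivers by buffer address.

    addresses maps a receiver label to its (STag, Offset) pair.  The
    result maps each distinct buffer address to the receivers that
    share it; one message is needed per key.
    """
    groups = defaultdict(list)
    for receiver, address in addresses.items():
        groups[address].append(receiver)
    return dict(groups)
```

     If every receiver assigned a different buffer address, the plan
     contains as many messages as receivers, which is the case with no
     economy of data transport.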

     A DDPP anycast transport must ensure that all different receivers
     assign the same buffer address, because the choice of destination
     may be beyond the control of the data source.

5.  Negotiating DDPP

     Negotiating the use of DDPP is the sole responsibility of the
     client protocol.  Note that DDPP is a simplex protocol and MAY be
     enabled in only one direction by a pair of participants.  Some
     client protocols (e.g. RDMA) MAY choose to require DDPP a priori,
     while others MAY define an in- or out-of-band negotiation process
     to dynamically enable DDPP per sender/receiver pair.

6.  Security Considerations

     [TODO]

7.  IANA Considerations

     [TODO]

8.  References

     [DRARCH]
          Bailey, S., "The Architecture of Direct Data Placement (DDP)
          And Remote Direct Memory Access (RDMA) On Internet Protocols",
          February 2002.
          http://www.cs.uchicago.edu/~steph/draft-bailey-roi-ddp-rdma-arch-00.txt

Author's Address


     Stephen Bailey
     Sandburst Corporation
     600 Federal Street
     Andover, MA  01810
     USA

     Email: steph@sandburst.com


Full Copyright Statement

     Copyright (C) The Internet Society (2002). All Rights Reserved.

     This document and translations of it may be copied and furnished to
     others, and derivative works that comment on or otherwise explain
     it or assist in its implementation may be prepared, copied,
     published and distributed, in whole or in part, without restriction
     of any kind, provided that the above copyright notice and this
     paragraph are included on all such copies and derivative works.
     However, this document itself may not be modified in any way, such
     as by removing the copyright notice or references to the Internet
     Society or other Internet organizations, except as needed for the
     purpose of developing Internet standards in which case the
     procedures for copyrights defined in the Internet Standards process
     must be followed, or as required to translate it into languages
     other than English.

     The limited permissions granted above are perpetual and will not be
     revoked by the Internet Society or its successors or assigns.

     This document and the information contained herein is provided on
     an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET
     ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR
     IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
     THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
     WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.