S. Bailey (Sandburst)
Internet-draft Expires: July 2002
The Direct Data Placement Protocol (DDPP) Core
draft-bailey-roi-ddpp-core-00
Status of this Memo
This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC2026.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.
Internet-Drafts are draft documents valid for a maximum of six
months and may be updated, replaced, or obsoleted by other
documents at any time. It is inappropriate to use Internet-Drafts
as reference material or to cite them other than as "work in
progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
Copyright Notice
Copyright (C) The Internet Society (2002). All Rights Reserved.
Abstract
This document defines the core of a Direct Data Placement Protocol
(DDPP) to run on Internet Protocol-suite transport protocols. The
DDPP core is mapped to specific transport protocols in separate
documents.
Table Of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . 2
2. DDP-Decorated Messages In DDPP . . . . . . . . . . . . . . 2
2.1. Splitting DDP-Decorated Messages . . . . . . . . . . . . . 2
Bailey Expires July 2002 [Page 1]
Internet-Draft DDPP Protocol Core 12 Feb 2002
2.2. DDP-decoration Structure . . . . . . . . . . . . . . . . . 3
3. Operation Ordering In DDPP . . . . . . . . . . . . . . . . 4
3.1. Ordering On Reliable, Ordered Transports . . . . . . . . . 6
3.2. Ordering On Reliable, Unordered Transports . . . . . . . . 6
3.3. Ordering On Unreliable, Ordered Transports . . . . . . . . 7
3.4. Ordering On Unreliable, Unordered Transports . . . . . . . 7
4. Transport Topology In DDPP . . . . . . . . . . . . . . . . 7
5. Negotiating DDPP . . . . . . . . . . . . . . . . . . . . . 8
6. Security Considerations . . . . . . . . . . . . . . . . . 8
7. IANA Considerations . . . . . . . . . . . . . . . . . . . 8
References . . . . . . . . . . . . . . . . . . . . . . . . 8
Author's Address . . . . . . . . . . . . . . . . . . . . . 9
Full Copyright Statement . . . . . . . . . . . . . . . . . 9
1. Introduction
This document defines the core of a Direct Data Placement Protocol
(DDPP) to run on Internet Protocol-suite transport protocols. The
DDPP core is mapped to specific transport protocols in separate
documents.
DDPP follows the architecture and terminology of `The Architecture
of Direct Data Placement (DDP) And Remote Direct Memory Access
(RDMA) On Internet Protocols' (DRARCH) [DRARCH]. A thorough
understanding of DRARCH is necessary to understand this document.
2. DDP-Decorated Messages In DDPP
DDP-decorated messages allow a receiving network interface to
directly place the data in a client protocol buffer.
A DDP-decorated message submitted to DDPP by a client protocol may
be split into a group of smaller DDP-decorated messages which are
each submitted to the transport. Each DDP-decorated message
submitted to the transport carries its own, complete DDP-decoration
information.
2.1. Splitting DDP-Decorated Messages
DDPP processes a client protocol request to send a DDP-decorated
message of arbitrary length by potentially sending a group of
smaller DDP-decorated messages with equivalent content. A group of
DDP-decorated messages corresponding to a client protocol request:
o MAY be sent in any order. For example, 1000 octets of DDP-
decorated data could be sent as two messages, the first
containing octets 500-999, and the second containing octets
0-499.
o MUST request a reception indication in the last message with
the client protocol-supplied message identifier, if the client
protocol requested a reception indication.
o MUST NOT request a reception indication in any message other
than the final one.
DDPP mappings to unreliable or unordered transports MUST provide
client protocols a way to ensure DDP-decorated messages are sent
atomically or not at all when the client protocol requests this
behavior; for example, by defining a DDP-decorated sending
operation that returns an error if the message cannot be sent
atomically. DDPP on a reliable, ordered transport MAY also provide
this capability.
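The splitting rules of this section can be illustrated with a short
sketch. It is not part of the protocol definition: the DdpMessage
type, the split_message() function and its max_payload parameter
are hypothetical names chosen for illustration, and the handling of
a zero-length request is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DdpMessage:
    """One DDP-decorated message as handed to the transport (hypothetical)."""
    notify: bool    # reception indication requested
    msg_id: int     # client protocol-supplied message identifier
    stag: int       # steering tag of the destination buffer
    offset: int     # offset in the destination buffer
    payload: bytes


def split_message(stag: int, offset: int, payload: bytes, msg_id: int,
                  want_indication: bool, max_payload: int) -> List[DdpMessage]:
    """Split one client request into messages of at most max_payload octets."""
    if max_payload <= 0:
        raise ValueError("max_payload must be positive")
    pieces = [
        DdpMessage(False, msg_id, stag, offset + start,
                   payload[start:start + max_payload])
        for start in range(0, len(payload), max_payload)
    ]
    if not pieces:
        # Assumption: a zero-length request still produces one message so
        # that a requested reception indication can be carried.
        pieces = [DdpMessage(False, msg_id, stag, offset, b"")]
    # The group MAY be sent in any order. The list is reversed here only to
    # show that "last" means last sent, not highest offset.
    pieces.reverse()
    # Only the last message sent may request a reception indication.
    pieces[-1].notify = want_indication
    return pieces
```

With 1000 octets and max_payload set to 500, this sketch yields the
ordering used in the example above: octets 500-999 first, then
octets 0-499, with the reception indication requested only in the
second message.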
2.2. DDP-decoration Structure
There are two DDP-decoration elements which appear `on the wire': a
buffer address, composed of a steering tag and a buffer offset, and
notification information, composed of a notification request flag
and a message identifier. DDPP organizes these as:
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|N|                     Message Identifier                      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                             STag                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
+                            Offset                             +
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
N - Notify Flag : 1 bit (boolean flag)
If set to 1, notify the client protocol of the reception of
this DDP-decorated message.
Message Identifier: 31 bits (unsigned integer)
Passed to the client protocol when the Notify Flag is set.
The Message Identifier is opaque to DDPP and can be structured
in any way by the client protocol. The Message Identifier
field is ignored by DDPP on DDP-decorated messages without the
Notify Flag set and may be set to any value by the sending
DDPP implementation. For example, if a DDP-decorated message
is split into several smaller DDP-decorated messages, the
Message Identifier field in each might contain the same value,
even though the Notify Flag is set only in the last message.
STag: 32 bits (unsigned integer)
The steering tag identifying the destination buffer into which
to place the contents of a DDP-decorated message.
Offset: 64 bits (unsigned integer)
The offset in the destination buffer at which to begin placing
the contents of a DDP-decorated message.
A DDPP transport mapping MAY arrange these components differently,
but all four components MUST be present, or directly computable
from information available in every transport message containing a
DDP-decorated message. The STag MUST be 32 bits, and the Offset
MUST be 64 bits. The Message Identifier MUST be at least 15 bits
and SHOULD be at least 31 bits.
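A minimal sketch of encoding and decoding the layout shown in the
diagram follows. It assumes network byte order (big-endian), which
the diagram suggests but this document does not state, and the
function names pack_decoration() and unpack_decoration() are
hypothetical. A transport mapping that arranges the components
differently would need a different encoding.

```python
import struct

# Assumption: fields are carried in network byte order (big-endian).
# Layout: 32-bit word holding N + Message Identifier, 32-bit STag,
# 64-bit Offset.
_DECORATION = struct.Struct("!IIQ")
DECORATION_LEN = _DECORATION.size  # 16 octets

_NOTIFY_BIT = 0x80000000   # N is the most significant bit of the first word
_MSG_ID_MASK = 0x7FFFFFFF  # Message Identifier is the remaining 31 bits


def pack_decoration(notify: bool, msg_id: int, stag: int, offset: int) -> bytes:
    """Encode the four DDP-decoration components into 16 octets."""
    if not 0 <= msg_id <= _MSG_ID_MASK:
        raise ValueError("Message Identifier must fit in 31 bits")
    word0 = (_NOTIFY_BIT if notify else 0) | msg_id
    # struct raises struct.error if stag or offset exceed 32 or 64 bits.
    return _DECORATION.pack(word0, stag, offset)


def unpack_decoration(data: bytes):
    """Decode (notify, msg_id, stag, offset) from the start of data."""
    word0, stag, offset = _DECORATION.unpack_from(data)
    return bool(word0 & _NOTIFY_BIT), word0 & _MSG_ID_MASK, stag, offset
```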
3. Operation Ordering In DDPP
The ordering among:
o set()s,
o undecorated messages, and
o DDP-decorated message reception indications
and their relationship to corresponding operations on the sender is
defined in DDPP according to underlying transport characteristics:
o reliable or unreliable, and
o ordered or unordered.
A primary principle in DDPP is that absolutely minimal restrictions
are imposed on ordering among set()s.
One view of the ordering rules of DDPP is that messages are passed
to DDPP by the transport, and DDPP can accumulate and process these
messages in any way, and in any order, as long as it conforms to
the rules defined below. Such accumulation and haphazard,
nondeterministic processing of DDPP messages may seem unlikely in
any single real implementation, but it reflects the range of
behaviors exhibited across a wide range of implementations. These
liberal rules permit very efficient
implementations that do not violate transport semantics either in
the transport interface to DDPP, or in the `pass-through' of
transport semantics to the client protocol.
Operation ordering in DDPP is defined in terms of:
o `submission' of messages to DDPP by the client protocol at the
sender,
o `reception' of messages by DDPP from the transport at the
receiver, and
o `delivery' of DDP-decorated message reception indications and
undecorated messages to the client protocol at the receiver.
For reception properties, DDP-decorated messages resulting from
splitting a single client protocol message are all considered to be
separate messages.
A set() to buffer `b', address `a', with value `v' `corresponds' to
a DDP-decorated message `m' if m also targets buffer b, address a
with value v. A set() which corresponds to a DDP-decorated message
that has been submitted to DDPP by the client protocol is called a
`corresponding set()'.
Regardless of transport characteristics, DDPP:
o MUST only perform corresponding set()s,
o MAY perform a corresponding set() more than once,
o MAY perform corresponding set()s in any order,
o MUST perform set(a,v) for every (a,v) that corresponds to a
received message `m' before m's reception indication (if any)
is delivered,
o MUST only perform set()s on registered buffers.
If the transport is ordered, DDPP:
o MUST only perform set()s that correspond to messages that
follow all delivered reception indications and all delivered
undecorated messages.
If the transport is reliable, DDPP:
o MUST only perform set()s that correspond to messages for which
a reception indication has not yet been delivered.
3.1. Ordering On Reliable, Ordered Transports
On a reliable, ordered transport, DDPP:
o MUST NOT deliver a reception indication more than once,
o MUST NOT deliver a reception indication before all preceding
reception indications and undecorated messages are delivered,
o MUST NOT deliver an undecorated message more than once,
o MUST NOT deliver an undecorated message before all preceding
reception indications and undecorated messages are delivered,
o MUST perform set(a,v) for every (a,v) that corresponds to a
received message before a subsequent reception indication or
undecorated message is delivered.
These rules allow subsequent reception indications and subsequent
undecorated messages to act as implicit reception indications:
delivery of a subsequent reception indication or subsequent
undecorated message implies all set()s corresponding to preceding
DDP-decorated messages have been performed.
For a reliable, ordered transport, delivery of the reception
indication on the last of a group of DDP-decorated messages sent in
place of a single client protocol message is equivalent to delivery
of a reception indication for a single DDP-decorated message
carrying the same data.
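One simple way to satisfy the rules of this section is to process
each message completely, in transport order, before looking at the
next one. The sketch below models that approach. It is only one of
many conforming behaviors, since DDPP permits far more liberal
processing. The class and method names are hypothetical, and
raising an exception for a message that does not target a
registered buffer is an assumption of the sketch, not a behavior
defined by this document.

```python
from typing import Callable, Dict


class OrderedReliableReceiver:
    """Toy DDPP receiver for a reliable, ordered transport (hypothetical)."""

    def __init__(self, deliver: Callable[[str, object], None]) -> None:
        self._buffers: Dict[int, bytearray] = {}  # STag -> registered buffer
        self._deliver = deliver                   # hands items to the client

    def register(self, stag: int, size: int) -> None:
        self._buffers[stag] = bytearray(size)

    def contents(self, stag: int) -> bytes:
        return bytes(self._buffers[stag])

    def on_decorated(self, notify: bool, msg_id: int, stag: int,
                     offset: int, payload: bytes) -> None:
        buf = self._buffers.get(stag)
        # set()s are only performed on registered buffers. Rejecting the
        # message with an exception is an assumption of this sketch.
        if buf is None or offset + len(payload) > len(buf):
            raise ValueError("message does not target a registered buffer")
        # Perform every corresponding set() before anything later is delivered.
        buf[offset:offset + len(payload)] = payload
        if notify:
            self._deliver("indication", msg_id)

    def on_undecorated(self, message: bytes) -> None:
        # All earlier placements are complete, so this delivery also acts
        # as an implicit reception indication for them.
        self._deliver("undecorated", message)
```

Because the transport hands over each message exactly once and in
order, and each message is fully placed before the next is
examined, no reception indication or undecorated message is
delivered twice or ahead of a preceding one.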
3.2. Ordering On Reliable, Unordered Transports
On a reliable, unordered transport, DDPP:
o MUST NOT deliver a reception indication more than once,
o MUST NOT deliver an undecorated message more than once.
3.3. Ordering On Unreliable, Ordered Transports
On an unreliable, ordered transport, DDPP:
o MUST NOT deliver a reception indication more than once,
o MUST NOT deliver a reception indication before a preceding
reception indication or undecorated message,
o MUST NOT deliver an undecorated message more than once,
o MUST NOT deliver an undecorated message before a preceding
reception indication or undecorated message.
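The delivery rules above can be sketched as a filter over a
per-message sequence number. This assumes the transport mapping
exposes such a number, which this document does not define; the
class name and the accept() method are likewise hypothetical.

```python
class UnreliableOrderedDelivery:
    """Decides which received items may be delivered to the client protocol.

    Assumption: each reception indication or undecorated message carries a
    sequence number, assigned in submission order, supplied by the mapping.
    """

    def __init__(self) -> None:
        self._last_delivered = -1

    def accept(self, seq: int) -> bool:
        """Return True if the item with this sequence number may be delivered."""
        if seq <= self._last_delivered:
            # Either a duplicate, or an item that precedes something already
            # delivered. Delivering it would break one of the rules above.
            return False
        # Gaps are permitted: the transport is unreliable, so missing items
        # are simply never delivered.
        self._last_delivered = seq
        return True
```

An item that arrives after a later item has been delivered is
dropped rather than delivered out of order, which an unreliable
transport permits.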
3.4. Ordering On Unreliable, Unordered Transports
On an unreliable, unordered transport, in general, no additional,
transport-dependent rules apply to DDPP.
Particular unreliable, unordered transports may have additional
characteristics that permit useful ordering properties. For
example, a DDPP mapping to an unreliable datagram protocol on a
network with a maximum datagram lifetime of `MDL' could define, as
a function of MDL, the maximum time between submitting a DDP-
decorated message, and a set() that corresponds to it.
Unregistering a buffer is another way for a receiver to limit the
maximum time between submitting a DDP-decorated message and a set()
that corresponds to it. However, if another buffer is registered
subsequently with the same STag, set()s may be performed on the new
buffer that were destined for the old one. One possible way of
preventing immediate reuse of STags is to give the client protocol
some control over STags assigned to registered buffers.
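As an illustration of client control over STags, the sketch below
lets the client protocol choose the STag at registration time and
refuses an STag that was unregistered too recently. The quarantine
interval, the StagRegistry class and its methods are assumptions
made for this example. A mapping might, for instance, derive the
interval from a bound such as MDL.

```python
import time
from typing import Callable, Dict


class StagRegistry:
    """Buffer registry with client-chosen STags (hypothetical sketch)."""

    def __init__(self, quarantine_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._quarantine = quarantine_seconds
        self._clock = clock
        self._active: Dict[int, bytearray] = {}   # STag -> registered buffer
        self._retired_at: Dict[int, float] = {}   # STag -> unregistration time

    def register(self, stag: int, size: int) -> bytearray:
        """Register a new buffer under a client-chosen STag."""
        if stag in self._active:
            raise ValueError("STag already in use")
        retired = self._retired_at.get(stag)
        if retired is not None and self._clock() - retired < self._quarantine:
            # Stale messages aimed at the old buffer may still arrive.
            raise ValueError("STag was unregistered too recently")
        self._retired_at.pop(stag, None)
        self._active[stag] = bytearray(size)
        return self._active[stag]

    def unregister(self, stag: int) -> None:
        """Unregister a buffer and start the quarantine for its STag."""
        del self._active[stag]
        self._retired_at[stag] = self._clock()
```

The clock is injectable so the behavior can be exercised without
waiting for real time to pass.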
4. Transport Topology In DDPP
Transports support some combination of:
o single source, or multisource,
o single destination, or multidestination (multicast or
anycast).
No special considerations apply to DDPP on multisource transports.
DDPP on multidestination transports must ensure that DDP-decorated
messages destined for many receivers can be placed in the
appropriate buffer on each receiver. The two tools for doing this
are:
o different receivers assigning the same buffer address (STag
and Offset) when registering the buffer,
o senders sending several messages with the same contents and
different buffer addresses.
A DDPP multicast transport mapping could use either of these
techniques, or both in combination. However, if no receivers
assign the same buffer address, there will be no economy of data
transport compared to using a single destination transport. Any
DDPP multicast transport mapping must carefully trade off the
implementation restrictions resulting from requiring control of
buffer address assignment, and the benefits of multicast data
transport. For example, it might be reasonable to expect support
for a small set of distinguished multicast buffer addresses by any
multicast-capable DDPP implementation. This would be analogous to
the small set of distinguished multicast network addresses within
the larger network address space.
A DDPP anycast transport must ensure that all different receivers
assign the same buffer address, because the choice of destination
may be beyond the control of the data source.
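The trade-off between the two tools can be made concrete by
grouping receivers according to the buffer address each has
assigned: one message is needed per distinct address. The sketch
below is illustrative only. The function name and the use of
strings to identify receivers are assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

BufferAddress = Tuple[int, int]  # (STag, Offset)


def plan_multidestination_send(
    receiver_addresses: Dict[str, BufferAddress],
) -> Dict[BufferAddress, List[str]]:
    """Group receivers by buffer address; one message is needed per group."""
    groups: Dict[BufferAddress, List[str]] = defaultdict(list)
    for receiver, address in receiver_addresses.items():
        groups[address].append(receiver)
    return dict(groups)
```

If every receiver assigns the same buffer address, the plan holds a
single group and one multidestination message suffices. If no two
receivers share an address, the number of groups equals the number
of receivers, and nothing is saved over single destination
transport.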
5. Negotiating DDPP
Negotiating the use of DDPP is the sole responsibility of the
client protocol. Note that DDPP is a simplex protocol and MAY be
enabled in only one direction by a pair of participants. Some
client protocols (e.g. RDMA) MAY choose to require DDPP a priori,
while others MAY define an in- or out-of-band negotiation process
to dynamically enable DDPP per sender/receiver pair.
6. Security Considerations
[TODO]
7. IANA Considerations
[TODO]
8. References
[DRARCH]
Bailey, S., "The Architecture of Direct Data Placement (DDP)
And Remote Direct Memory Access (RDMA) On Internet Protocols",
February 2002.
http://www.cs.uchicago.edu/~steph/draft-bailey-roi-ddp-rdma-arch-00.txt
Author's Address
Stephen Bailey
Sandburst Corporation
600 Federal Street
Andover, MA 01810
USA
Email: steph@sandburst.com
Full Copyright Statement
Copyright (C) The Internet Society (2002). All Rights Reserved.
This document and translations of it may be copied and furnished to
others, and derivative works that comment on or otherwise explain
it or assist in its implementation may be prepared, copied,
published and distributed, in whole or in part, without restriction
of any kind, provided that the above copyright notice and this
paragraph are included on all such copies and derivative works.
However, this document itself may not be modified in any way, such
as by removing the copyright notice or references to the Internet
Society or other Internet organizations, except as needed for the
purpose of developing Internet standards in which case the
procedures for copyrights defined in the Internet Standards process
must be followed, or as required to translate it into languages
other than English.
The limited permissions granted above are perpetual and will not be
revoked by the Internet Society or its successors or assigns.
This document and the information contained herein is provided on
an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET
ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.