                                                             D. Connolly
Internet-Draft                           World Wide Web Consortium (W3C)
                                                             L. Masinter
                                                       Xerox Corporation
                                                      September 21, 1999

draft-connolly-text-html-00.txt
Obsoletes: RFC 1866, RFC 2070, RFC 1980, RFC 1867, RFC 1942

                      The 'text/html' Media Type

Status of this Memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC 2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other
   documents at any time. It is inappropriate to use Internet-Drafts
   as reference material or to cite them other than as ``work in
   progress''.

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

Copyright Notice

   Copyright (C) The Internet Society (1999).  All Rights Reserved.

Abstract

   This document summarizes the history of HTML development, and
   defines the "text/html" MIME type by pointing to the relevant W3C
   recommendations. It is intended to obsolete the previous IETF
   documents defining HTML, including RFC 1866, RFC 1867, RFC 1980,
   RFC 1942 and RFC 2070.

   This document was prepared at the request of the W3C HTML working
   group. Please send comments to www-html@w3.org, a public mailing
   list with archive at
   <http://lists.w3.org/Archives/Public/www-html/>.

1. Introduction and background

   The text/html media type was originally defined in [HTML20] in
   November 1995. Extensions to HTML were proposed in [HTML30],
   [UPLOAD], [TABLES], [CLIMAPS], and [I18N].

   The IETF HTML working group closed in September 1996, and work on
   HTML moved to the World Wide Web Consortium (W3C). The proposed
   extensions were incorporated, to some extent, in [HTML32], and to
   a larger extent, in [HTML40]. The definition of
   multipart/form-data from [UPLOAD] was described in [FORMDATA]. In
   addition, a reformulation of HTML 4.0 in XML 1.0 is being
   developed [XHTML1].

   [HTML32] notes "This specification defines HTML version 3.2. HTML
   3.2 aims to capture recommended practice as of early '96 and as
   such to be used as a replacement for HTML 2.0 (RFC 1866)."
   Subsequent specifications for HTML describe the differences
   in each version.

2. Definition for 'text/html'

   The 'text/html' media type is now defined by W3C recommendations;
   the latest published version is [HTML40]. As of this writing, a
   revision, HTML 4.01 [HTML401], is being developed. In both [HTML40]
   and [HTML401].

   The "text/html" media type may be used to refer to any W3C
   published version of HTML; the versions are distinguishable by the
   "DOCTYPE" declaration contained within them. In addition, [XHTML1]
   defines a profile of use of XHTML which is compatible with HTML
   4.0, and which may also be served as 'text/html'.
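
   As an illustration of distinguishing versions by the DOCTYPE
   declaration, the following Python sketch extracts a version label
   from the public identifier of a DOCTYPE. The function name
   html_version and both regular expressions are assumptions made for
   this example and are not part of any specification. Documents
   without a public identifier, or with an identifier that carries no
   version number, yield no label.

```python
import re
from typing import Optional

# Matches <!DOCTYPE html PUBLIC "..."> case-insensitively and captures
# the quoted public identifier.
_DOCTYPE = re.compile(r'<!DOCTYPE\s+html\s+PUBLIC\s+"([^"]*)"', re.IGNORECASE)

# Picks the language name and version number out of a public
# identifier such as "-//W3C//DTD HTML 4.0//EN".
_VERSION = re.compile(r"DTD\s+(X?HTML)\s+(\d+\.\d+)")


def html_version(document: str) -> Optional[str]:
    """Return a label such as "HTML 4.0", or None if none is found."""
    doctype = _DOCTYPE.search(document)
    if doctype is None:
        return None
    version = _VERSION.search(doctype.group(1))
    if version is None:
        return None
    return f"{version.group(1)} {version.group(2)}"
```

   Under these assumptions, a document beginning with
   <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0//EN"> is labeled
   "HTML 4.0". The public identifiers used with this sketch should be
   checked against the respective specifications.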

   MIME media type name:      text
   MIME subtype name:         html
   Required parameters:       none
   Optional parameters:       charset

     The optional parameter "charset" refers to the character encoding
     used to represent the HTML document as a sequence of bytes. Any
     registered IANA charset may be used, but UTF-8 is preferred.
     Although this parameter is optional, it is recommended that it
     always be present.
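
     A recipient might read the charset parameter from a Content-Type
     header value as in the following Python sketch. It relies on the
     standard library class email.message.Message; the function name
     charset_of is an assumption for this example. What a recipient
     should do when the parameter is absent is outside the scope of
     the sketch, which simply returns None in that case.

```python
from email.message import Message
from typing import Optional


def charset_of(content_type: str) -> Optional[str]:
    """Return the lower-cased charset parameter, or None if absent.

    Raises ValueError if the media type is not text/html.
    """
    message = Message()
    message["Content-Type"] = content_type
    # get_content_type() normalizes the type/subtype to lower case.
    if message.get_content_type() != "text/html":
        raise ValueError("not a text/html media type: " + content_type)
    # get_content_charset() strips quotes and lower-cases the value.
    return message.get_content_charset()
```

     For example, charset_of("text/html; charset=UTF-8") returns
     "utf-8".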

   Encoding considerations:

     Because HTML itself allows non-ASCII characters to be written
     as character entity references, text/html documents with a wide
     character repertoire can often be transported without a content
     transfer encoding. Otherwise, transport of text/html using
     charsets other than US-ASCII may require base64 or
     quoted-printable encoding for 7-bit channels.
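
     Transport of a wide repertoire as plain US-ASCII can be sketched
     in Python as follows. This is a minimal illustration, assuming
     the input is already HTML markup, so markup characters such as
     "<" and "&" are left untouched. The function name to_ascii_html
     is hypothetical. Python's "xmlcharrefreplace" error handler emits
     numeric character references such as "&#233;" rather than named
     entity references such as "&eacute;".

```python
def to_ascii_html(markup: str) -> bytes:
    """Encode HTML markup as US-ASCII bytes.

    Characters outside US-ASCII are replaced by numeric character
    references, so the result contains only 7-bit bytes.
    """
    return markup.encode("ascii", errors="xmlcharrefreplace")
```

     For example, to_ascii_html("café") returns b"caf&#233;".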

   Security considerations:

     See section 3 of this document.

   Additional information:

     Magic number:

       There is no single initial string that is always present for
       HTML files.

       Almost all HTML files have the string "<html" or "<HTML" near
       the beginning of the file.

       Documents conformant to HTML 2.0, HTML 3.2 and HTML 4.0 will
       start with a DOCTYPE declaration "<!DOCTYPE HTML" near the
       beginning, before the "<html". These dialects are case
       insensitive.  Files may start with white space, comments
       (introduced by "<!--" ), or processing instructions (introduced
       by "<?") prior to the DOCTYPE declaration.

       XHTML documents (optionally) start with an XML declaration
       which begins with "<?xml" and are required to have a DOCTYPE
       declaration "<!DOCTYPE html".
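
       Since there is no single magic number, detection is
       necessarily heuristic. The following Python sketch looks for
       the strings described above near the start of the data,
       ignoring case. The function name looks_like_html and the
       1024-byte window are assumptions chosen for this example; the
       text above says only "near the beginning". The sketch can
       report false positives, for instance on a plain text file that
       happens to mention "<html".

```python
def looks_like_html(data: bytes, window: int = 1024) -> bool:
    """Heuristically decide whether data looks like an HTML document.

    Only the first `window` bytes are examined. The comparison is
    case-insensitive, so "<!DOCTYPE HTML", "<!DOCTYPE html", "<html"
    and "<HTML" are all recognized, including when preceded by white
    space, comments, or an XML declaration.
    """
    head = data[:window].lower()
    return b"<!doctype html" in head or b"<html" in head
```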

     File extension:

       The file extensions 'htm' or 'html' are commonly used, but
       other extensions denoting file formats for preprocessing are
       also common.

     Macintosh File Type code: HTML

3. Security Considerations

   [HTML40], section B.10, notes various security issues with
   interpreting anchors and forms in HTML documents.

   In addition, the introduction of scripting languages and
   interactive capabilities in HTML 4.0 introduced a number of
   security risks associated with the automatic execution of programs
   written by the sender but interpreted by the recipient.  User
   agents executing such scripts or programs must be extremely careful
   to ensure that untrusted software is executed in a protected
   environment.

4. Authors' Addresses

   Daniel W. Connolly
   World Wide Web Consortium (W3C)
   MIT Laboratory for Computer Science
   545 Technology Square
   Cambridge, MA 02139, U.S.A.
   phone:+1-512-310-2971
   mailto:connolly@w3.org
   http://www.w3.org/People/Connolly/

   Larry Masinter
   Palo Alto Research Center
   Xerox Corporation
   3333 Coyote Hill Road
   Palo Alto, CA 94304
   mailto:masinter@parc.xerox.com
   http://purl.org/NET/masinter

5. References

[HTML30] "HyperText Markup Language Specification Version 3.0." Dave
         Raggett, September 1995. Internet Draft (expired). Available
         at <http://www.w3.org/MarkUp/html3/CoverPage>.

[HTML20] "Hypertext Markup Language - 2.0." T. Berners-Lee &
         D. Connolly. RFC 1866. November 1995. Additional information
         available at <http://www.w3.org/MarkUp/html-spec/>.

[UPLOAD] "Form-based File Upload in HTML." E. Nebel & L. Masinter. RFC
         1867. November 1995.

[TABLES] "HTML Tables." D. Raggett. RFC 1942. May 1996.

[CLIMAPS] "A Proposed Extension to HTML : Client-Side Image Maps."
         J. Seidman. RFC 1980. August 1996.

[HTML32] "HTML 3.2 Reference Specification." Dave Raggett. W3C
         Recommendation. 14 January 1997. Available at
         <http://www.w3.org/TR/REC-html32>.

[I18N] "Internationalization of the Hypertext Markup Language."
         F. Yergeau, G. Nicol, G. Adams, M. Duerst. RFC 2070. January
         1997.

[FORMDATA] "Returning Values from Forms: multipart/form-data."
         L. Masinter. RFC 2388. August 1998.

[HTML40] "HTML 4.0 Specification." D. Raggett, A. Le Hors,
         I. Jacobs. W3C Recommendation. 18 December 1997. Available
         at <http://www.w3.org/TR/REC-html40>.

[HTML401] "HTML 4.01 Specification." D. Raggett, A. Le Hors,
         I. Jacobs.  W3C Proposed Recommendation (work in progress),
         August 1999. Available at
         <http://www.w3.org/TR/1999/PR-html40-19990824>.

[XHTML1] "XHTML 1.0: The Extensible HyperText Markup Language: A
         Reformulation of HTML 4.0 in XML 1.0." W3C HTML Working
         Group. W3C Proposed Recommendation (work in progress). August
         1999. Available at <http://www.w3.org/TR/xhtml1>.