10. A Vocabulary for the Contents of String-Encoded Data
10.1. Foreword
Annotations defined in this section indicate that an instance contains non-JSON data encoded in a JSON string.
These properties provide additional information required to interpret JSON data as rich multimedia documents. They describe the type of content, how it is encoded, and/or how it may be validated. They do not function as validation assertions; a malformed string-encoded document MUST NOT cause the containing instance to be considered invalid.
Meta-schemas that do not use "$vocabulary" SHOULD be considered to require this vocabulary as if its URI were present with a value of true.
The current URI for this vocabulary, known as the Content vocabulary, is:
<https://json-schema.org/draft/2020-12/vocab/content>.
The current URI for the corresponding meta-schema is:
10.2. Implementation Requirements
Due to security and performance concerns, as well as the open-ended nature of possible content types, implementations MUST NOT automatically decode, parse, and/or validate the string contents by default. This additionally supports the use case of embedded documents intended for processing by a different consumer than that which processed the containing document.
All keywords in this section apply only to strings, and have no effect on other data types.
Implementations MAY offer the ability to decode, parse, and/or validate the string contents automatically. However, they MUST NOT perform these operations by default, and MUST provide the validation result of each string-encoded document separately from the enclosing document. This process SHOULD be equivalent to fully evaluating the instance against the original schema, followed by using the annotations to decode, parse, and/or validate each string-encoded document.
For now, the exact mechanism of performing and returning parsed data and/or validation results from such an automatic decoding, parsing, and validating feature is left unspecified. Should such a feature prove popular, it may be specified more thoroughly in a future draft.
See also the Security Considerations (Section 16) for possible vulnerabilities introduced by automatically processing inputs according to these keywords.
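The opt-in behavior described in this section could be sketched as follows. This is a non-normative illustration only: the names "ContentResult", "process_content", and the "enabled" flag are hypothetical, and the sketch assumes the content annotations have already been collected by evaluating the instance against the schema. Only "base64" and "application/json" are handled.

```python
import base64
import binascii
import json
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ContentResult:
    """Outcome for one string-encoded document (hypothetical type)."""
    ok: bool
    value: Any = None
    error: Optional[str] = None


def process_content(instance: str, annotations: dict,
                    enabled: bool = False) -> Optional[ContentResult]:
    """Decode and parse a string using previously collected annotations.

    Nothing is decoded unless the caller opts in via `enabled`, and the
    outcome is returned as its own object rather than being folded into
    the validation result of the enclosing document.
    """
    if not enabled:
        return None  # default: no automatic decoding or parsing
    try:
        data = instance.encode("utf-8")
        if annotations.get("contentEncoding") == "base64":
            data = base64.b64decode(instance, validate=True)
        if annotations.get("contentMediaType") == "application/json":
            return ContentResult(ok=True, value=json.loads(data))
        return ContentResult(ok=True, value=data)
    except (binascii.Error, ValueError) as exc:
        # A malformed embedded document is reported separately; it does
        # not make the containing instance invalid.
        return ContentResult(ok=False, error=str(exc))
```

A caller that leaves "enabled" at its default receives no content result at all, which mirrors the requirement that these operations are never performed by default.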
10.3. "contentEncoding"
If the instance value is a string, this property defines that the string SHOULD be interpreted as encoded binary data and decoded using the encoding named by this property.
Possible values indicating base 16, 32, and 64 encodings with several variations are listed in [RFC4648]. Additionally, sections 6.7 and 6.8 of [RFC2045] provide encodings used in MIME. This keyword is derived from MIME's Content-Transfer-Encoding header, which was designed to map binary data into ASCII characters. It is not related to HTTP's Content-Encoding header, which is used to encode (e.g. compress or encrypt) the content of HTTP requests and responses.
As "base64" is defined in both RFCs, the definition from RFC 4648 SHOULD be assumed unless the string is specifically intended for use in a MIME context. Note that all of these encodings result in strings consisting only of 7-bit ASCII characters. Therefore, this keyword has no meaning for strings containing characters outside of that range.
If this keyword is absent, but "contentMediaType" is present, this indicates that the encoding is the identity encoding, meaning that no transformation was needed in order to represent the content in a UTF-8 string.
The value of this property MUST be a string.
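As a non-normative sketch, an application that has opted in to decoding might map "contentEncoding" values to decoders as shown below. The function name "decode_content" and the particular set of supported encoding names are assumptions of this sketch. An absent keyword is treated as the identity encoding, and strings containing non-ASCII characters are rejected when an encoding is named.

```python
import base64
from typing import Optional

# Decoders keyed by "contentEncoding" value. The choice of names here is
# an assumption of this sketch; RFC 4648 semantics are used for "base64".
DECODERS = {
    "base16": lambda s: base64.b16decode(s),
    "base32": lambda s: base64.b32decode(s),
    "base64": lambda s: base64.b64decode(s, validate=True),
}


def decode_content(value: str, content_encoding: Optional[str] = None) -> bytes:
    """Return the bytes represented by a string-encoded value."""
    if content_encoding is None:
        # Identity encoding: the content is the UTF-8 string itself.
        return value.encode("utf-8")
    if not value.isascii():
        # These encodings only ever produce 7-bit ASCII characters.
        raise ValueError("encoded content must be 7-bit ASCII")
    decoder = DECODERS.get(content_encoding)
    if decoder is None:
        raise ValueError(f"unsupported contentEncoding: {content_encoding}")
    return decoder(value)
```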
10.4. "contentMediaType"
If the instance value is a string, this property indicates the media type of the contents of the string. If "contentEncoding" is present, this property describes the decoded string.
The value of this property MUST be a string, which MUST be a media type, as defined by [RFC2046].
10.5. "contentSchema"
If the instance value is a string, and if "contentMediaType" is present, this property contains a schema which describes the structure of the string.
This keyword MAY be used with any media type that can be mapped into JSON Schema's data model.
The value of this property MUST be a valid JSON schema. It SHOULD be ignored if "contentMediaType" is not present.
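The interaction between "contentMediaType" and "contentSchema" could be sketched as follows. This is a non-normative illustration: "check_content" is a hypothetical helper, the "validate" callable stands in for whatever JSON Schema evaluator the application uses, and the sketch handles only "application/json" content with no "contentEncoding".

```python
import json
from typing import Any, Callable, Optional


def check_content(instance: Any, schema: dict,
                  validate: Callable[[Any, dict], bool]) -> Optional[bool]:
    """Apply "contentSchema" to a string-encoded JSON document.

    Returns None when the keyword does not apply, otherwise the result
    of validating the parsed content against "contentSchema".
    """
    if not isinstance(instance, str):
        return None  # the keywords have no effect on non-strings
    media_type = schema.get("contentMediaType")
    content_schema = schema.get("contentSchema")
    if media_type is None or content_schema is None:
        return None  # "contentSchema" is ignored without a media type
    if media_type != "application/json":
        return None  # this sketch only maps JSON into the data model
    try:
        document = json.loads(instance)
    except ValueError:
        return False  # the embedded document is malformed
    return validate(document, content_schema)
```

The result is intended to be reported separately from the validation outcome of the containing instance.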
10.6. Example
Here is an example schema, illustrating the use of "contentEncoding" and "contentMediaType":
{
    "type": "string",
    "contentEncoding": "base64",
    "contentMediaType": "image/png"
}

Instances described by this schema are expected to be strings, and their values should be interpretable as base64-encoded PNG images.
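An application that chooses to inspect such an instance might do so along the following lines. This is a non-normative sketch: "looks_like_base64_png" is a hypothetical helper, and it only checks the eight-byte PNG signature after decoding rather than verifying the whole image.

```python
import base64
import binascii

# The fixed eight-byte signature that begins every PNG datastream.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def looks_like_base64_png(value: str) -> bool:
    """Return True if `value` decodes as base64 and starts like a PNG."""
    try:
        data = base64.b64decode(value, validate=True)
    except (binascii.Error, ValueError):
        return False  # not valid base64 (or not ASCII)
    return data.startswith(PNG_SIGNATURE)
```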
Another example:
{
    "type": "string",
    "contentMediaType": "text/html"
}

Instances described by this schema are expected to be strings containing HTML, using whatever character set the JSON string was decoded into. Per [RFC8259], Section 8.1, outside of an entirely closed system, this MUST be UTF-8.
This example describes a JWT that is MACed using the HMAC SHA-256 algorithm, and requires the "iss" and "exp" fields in its claim set.
{
    "type": "string",
    "contentMediaType": "application/jwt",
    "contentSchema": {
        "type": "array",
        "minItems": 2,
        "prefixItems": [
            {
                "const": {
                    "typ": "JWT",
                    "alg": "HS256"
                }
            },
            {
                "type": "object",
                "required": ["iss", "exp"],
                "properties": {
                    "iss": {"type": "string"},
                    "exp": {"type": "integer"}
                }
            }
        ]
    }
}

Note that "contentEncoding" does not appear. While the "application/jwt" media type makes use of base64url encoding, that is defined by the media type, which determines how the JWT string is decoded into a list of two JSON data structures: first the header, and then the payload. Since the JWT media type ensures that the JWT can be represented in a JSON string, there is no need for further encoding or decoding.
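The mapping from a JWT string to the two-element list that "contentSchema" describes could be sketched as follows. This is a non-normative illustration: "jwt_to_content" and "b64url_decode" are hypothetical helpers, the sketch assumes the compact three-segment serialization, and it does not verify the MAC.

```python
import base64
import json


def b64url_decode(segment: str) -> bytes:
    """Decode an unpadded base64url segment."""
    padding = "=" * (-len(segment) % 4)  # restore the stripped padding
    return base64.urlsafe_b64decode(segment + padding)


def jwt_to_content(token: str) -> list:
    """Map a compact JWT to [header, payload] for use with "contentSchema".

    The signature segment is split off but NOT verified here.
    """
    header_b64, payload_b64, _signature = token.split(".")
    return [
        json.loads(b64url_decode(header_b64)),
        json.loads(b64url_decode(payload_b64)),
    ]
```

The resulting list is what the "prefixItems" subschemas in the example would be applied to: the first item is the header and the second is the claim set.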