mscharhag, Programming and Stuff

A blog about programming and software development topics, mostly focused on Java technologies including Java EE, Spring and Grails.

Wednesday, 6 October, 2021

Media types and the Content-Type header

A media type (formerly known as MIME type) is an identifier for file formats and format contents. Media types are used by various internet technologies, such as e-mail and HTTP.

A media type consists of a type and a subtype. It can optionally contain a suffix and one or more parameters. Media types follow this syntax:

type "/" [tree "."] subtype ["+" suffix]* [";" parameter]*
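
To make the syntax more concrete, here is a minimal Java sketch (Java 16+ for the record) that splits a media type string into the parts named above. It is a simplification, not a complete RFC 6838 parser: it assumes the tree is everything before the first "." of the subtype, and it splits parameters naively on ";" and "=", so quoted parameter values containing those characters are not handled. The MediaTypeParser class and its MediaType record are hypothetical names.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class MediaTypeParser {

    /** The parts of a media type, following the syntax shown above. */
    public record MediaType(String type, String tree, String subtype,
                            List<String> suffixes, Map<String, String> parameters) {}

    public static MediaType parse(String value) {
        // Everything after the first ';' is treated as parameters
        String[] sections = value.split(";");
        String essence = sections[0].trim();

        int slash = essence.indexOf('/');
        if (slash <= 0 || slash == essence.length() - 1) {
            throw new IllegalArgumentException("Not a media type: " + value);
        }
        // Type and subtype names are case-insensitive, so normalize to lower case
        String type = essence.substring(0, slash).toLowerCase(Locale.ROOT);
        String rest = essence.substring(slash + 1).toLowerCase(Locale.ROOT);

        // Every '+'-separated part after the first one is a suffix
        String[] plusParts = rest.split("\\+");
        String subtype = plusParts[0];
        List<String> suffixes = List.copyOf(Arrays.asList(plusParts).subList(1, plusParts.length));

        // Assumption: the tree is the text before the first '.' (e.g. "vnd")
        String tree = null;
        int dot = subtype.indexOf('.');
        if (dot > 0) {
            tree = subtype.substring(0, dot);
            subtype = subtype.substring(dot + 1);
        }

        // Naive parameter parsing: name=value pairs, names lower-cased, values kept as-is
        Map<String, String> parameters = new LinkedHashMap<>();
        for (int i = 1; i < sections.length; i++) {
            String[] kv = sections[i].split("=", 2);
            if (kv.length == 2) {
                parameters.put(kv[0].trim().toLowerCase(Locale.ROOT), kv[1].trim());
            }
        }
        return new MediaType(type, tree, subtype, suffixes, parameters);
    }

    public static void main(String[] args) {
        System.out.println(parse("text/html; charset=UTF-8"));
        System.out.println(parse("image/svg+xml"));
        System.out.println(parse("application/vnd.apache.thrift.binary"));
    }
}
```

Running main prints the parsed parts of the three media types used as examples in this post.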

For example, the media type for JSON documents is:

application/json

It consists of the type application with the subtype json.

An HTML document with UTF-8 encoding can be expressed as:

text/html; charset=UTF-8

Here we have the type text, the subtype html and a parameter charset=UTF-8 indicating UTF-8 character encoding.
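
The charset parameter matters when the body is turned back into text. The following sketch uses a hypothetical charsetOf helper that reads the charset parameter from a Content-Type value and falls back to a caller-supplied charset when the parameter is missing. The ISO-8859-1 fallback in main is only there to show what happens with the wrong charset: the same seven bytes decode to five characters with UTF-8, but to seven characters with ISO-8859-1.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetFromContentType {

    // Hypothetical helper: returns the charset named by the "charset" parameter,
    // or the given fallback if the parameter is missing.
    static Charset charsetOf(String contentType, Charset fallback) {
        for (String part : contentType.split(";")) {
            String[] kv = part.trim().split("=", 2);
            if (kv.length == 2 && kv[0].trim().equalsIgnoreCase("charset")) {
                // Strip optional quotes, e.g. charset="UTF-8"
                String name = kv[1].trim().replace("\"", "");
                return Charset.forName(name);
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        // "Grüße" encoded as UTF-8 is 7 bytes (ü and ß take two bytes each)
        byte[] body = "Gr\u00fc\u00dfe".getBytes(StandardCharsets.UTF_8);

        String right = new String(body, charsetOf("text/html; charset=UTF-8", StandardCharsets.ISO_8859_1));
        String wrong = new String(body, charsetOf("text/html", StandardCharsets.ISO_8859_1));

        System.out.println(right.length()); // 5
        System.out.println(wrong.length()); // 7
    }
}
```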

A suffix can be used to specify the underlying format of a media type. For example, SVG images use the media type:

image/svg+xml

The type is image, svg is the subtype and xml the suffix. The suffix tells us that the SVG file format is based on XML.

Note that subtypes can be organized in a hierarchical tree structure. For example, Apache Thrift's binary format uses the following media type:

application/vnd.apache.thrift.binary

vnd is a standardized prefix indicating that this is a vendor-specific media type.

The Content-Type header

In HTTP, any message that contains an entity-body should include a Content-Type header defining the media type of the body.
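
For example, with the java.net.http client that ships with Java 11 and later, a request with a JSON body can declare its media type as shown below. The request builder does not derive the media type from the body, so the header is set explicitly. The endpoint URI is a placeholder, and the request is only built, not sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class JsonPostRequest {

    // Builds (but does not send) a POST request with a JSON body.
    // https://example.com/api/orders is a placeholder endpoint.
    static HttpRequest buildRequest(String json) {
        return HttpRequest.newBuilder(URI.create("https://example.com/api/orders"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(json))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = buildRequest("{\"id\": 42}");
        System.out.println(request.headers().firstValue("Content-Type").orElse("<none>")); // application/json
    }
}
```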

RFC 2616 (HTTP/1.1) says:

Any HTTP/1.1 message containing an entity-body SHOULD include a Content-Type header field defining the media type of that body. If and only if the media type is not given by a Content-Type field, the recipient MAY attempt to guess the media type via inspection of its content and/or the name extension(s) of the URI used to identify the resource. If the media type remains unknown, the recipient SHOULD treat it as type "application/octet-stream".

The RFC allows the recipient to guess the media type if the Content-Type header is not present. However, this should always be avoided.
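
On the receiving side, one way to follow this advice is to skip guessing entirely: use the Content-Type header when it is present and otherwise treat the body as application/octet-stream, the type the RFC prescribes when the media type remains unknown. A minimal sketch, with a hypothetical helper name:

```java
public class MediaTypeFallback {

    static final String DEFAULT_MEDIA_TYPE = "application/octet-stream";

    // Hypothetical helper: trust the Content-Type header if present;
    // otherwise do not inspect the content or URI, just use the default.
    static String effectiveMediaType(String contentTypeHeader) {
        if (contentTypeHeader == null || contentTypeHeader.isBlank()) {
            return DEFAULT_MEDIA_TYPE;
        }
        return contentTypeHeader.trim();
    }

    public static void main(String[] args) {
        System.out.println(effectiveMediaType("application/json")); // application/json
        System.out.println(effectiveMediaType(null));               // application/octet-stream
    }
}
```

Treating unknown content as opaque bytes is the conservative choice: the application never acts on a format it only assumed.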

Guessing the media type of a piece of data is called content sniffing (or MIME sniffing). This practice was (and sometimes still is) used by web browsers and has led to multiple security vulnerabilities. To explicitly tell browsers not to guess certain media types, the following header can be added:

X-Content-Type-Options: nosniff
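
As an example, a handler for the JDK's built-in com.sun.net.httpserver server could set both headers before sending the response. This is only a sketch: port 8080, the /hello path and the setContentHeaders helper are made up for illustration.

```java
import com.sun.net.httpserver.Headers;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

public class NoSniffServer {

    // Sets an explicit media type and tells browsers not to sniff the content.
    static void setContentHeaders(Headers headers, String mediaType) {
        headers.set("Content-Type", mediaType);
        headers.set("X-Content-Type-Options", "nosniff");
    }

    public static void main(String[] args) throws IOException {
        // Port 8080 and the /hello path are arbitrary example values
        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
        server.createContext("/hello", exchange -> {
            byte[] body = "<p>Hello</p>".getBytes(StandardCharsets.UTF_8);
            setContentHeaders(exchange.getResponseHeaders(), "text/html; charset=UTF-8");
            exchange.sendResponseHeaders(200, body.length);
            // Closing the response body stream completes the exchange
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        System.out.println("Listening on http://localhost:8080/hello");
    }
}
```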

Note that the Content-Type header always contains the media type of the original resource, before any content encoding is applied. Content encoding (like gzip compression) is indicated by the Content-Encoding header.
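
To illustrate the difference, this sketch gzip-compresses a JSON request body with java.util.zip. The Content-Type stays application/json because it describes the original JSON document, while Content-Encoding: gzip tells the receiver how the bytes were encoded. It assumes a placeholder endpoint that accepts gzip-encoded request bodies, which is not something every server supports.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipJsonRequest {

    static byte[] gzip(byte[] data) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write(data);
        } catch (IOException e) {
            // Not expected with in-memory streams
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static byte[] gunzip(byte[] data) {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(data))) {
            return in.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // The body is compressed, but Content-Type still names the original format (JSON).
    static HttpRequest buildRequest(byte[] compressedJson) {
        return HttpRequest.newBuilder(URI.create("https://example.com/api/orders"))
                .header("Content-Type", "application/json")
                .header("Content-Encoding", "gzip")
                .POST(HttpRequest.BodyPublishers.ofByteArray(compressedJson))
                .build();
    }

    public static void main(String[] args) {
        byte[] json = "{\"id\": 42}".getBytes(StandardCharsets.UTF_8);
        byte[] compressed = gzip(json);
        HttpRequest request = buildRequest(compressed);

        System.out.println(request.headers().firstValue("Content-Type").orElse("<none>"));     // application/json
        System.out.println(request.headers().firstValue("Content-Encoding").orElse("<none>")); // gzip
        // A receiver first reverses the gzip encoding, then parses the bytes as JSON
        System.out.println(new String(gunzip(compressed), StandardCharsets.UTF_8));            // {"id": 42}
    }
}
```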
