Living Standard — Last Updated 7 March 2021
This section only describes the rules for resources labeled with an HTML MIME type . Rules for XML resources are discussed in the section below entitled " The XML syntax ".
This section only applies to documents, authoring tools, and markup generators. In particular, it does not apply to conformance checkers; conformance checkers must use the requirements given in the next section ("parsing HTML documents").
Documents must consist of the following parts, in the given order:
html
element
.
The various types of content mentioned above are described in the next few sections.
In addition, there are some restrictions on how character encoding declarations are to be serialized, as discussed in the section on that topic.
ASCII
whitespace
before
the
html
element,
at
the
start
of
the
html
element
and
before
the
head
element,
will
be
dropped
when
the
document
is
parsed;
ASCII
whitespace
after
the
html
element
will
be
parsed
as
if
it
were
at
the
end
of
the
body
element.
Thus,
ASCII
whitespace
around
the
document
element
does
not
round-trip.
It
is
suggested
that
newlines
be
inserted
after
the
DOCTYPE,
after
any
comments
that
are
before
the
document
element,
after
the
html
element's
start
tag
(if
it
is
not
omitted
),
and
after
any
comments
that
are
inside
the
html
element
but
before
the
head
element.
Many strings in the HTML syntax (e.g. the names of elements and their attributes) are case-insensitive, but only for ASCII upper alphas and ASCII lower alphas . For convenience, in this section this is just referred to as "case-insensitive".
A DOCTYPE is a required preamble.
DOCTYPEs are required for legacy reasons. When omitted, browsers tend to use a different rendering mode that is incompatible with some specifications. Including the DOCTYPE in a document ensures that the browser makes a best-effort attempt at following the relevant specifications.
A DOCTYPE must consist of the following components, in this order:
<!DOCTYPE
".
html
".
In
other
words,
<!DOCTYPE
html>
,
case-insensitively.
For
the
purposes
of
HTML
generators
that
cannot
output
HTML
markup
with
the
short
DOCTYPE
"
<!DOCTYPE
html>
",
a
DOCTYPE
legacy
string
may
be
inserted
into
the
DOCTYPE
(in
the
position
defined
above).
This
string
must
consist
of:
SYSTEM
".
about:legacy-compat
".
In
other
words,
<!DOCTYPE
html
SYSTEM
"about:legacy-compat">
or
<!DOCTYPE
html
SYSTEM
'about:legacy-compat'>
,
case-insensitively
except
for
the
part
in
single
or
double
quotes.
The DOCTYPE legacy string should not be used unless the document is generated from a system that cannot output the shorter string.
There
are
six
different
kinds
of
elements
:
void
elements
,
the
template
element
,
raw
text
elements
,
escapable
raw
text
elements
,
foreign
elements
,
and
normal
elements
.
area
,
base
,
br
,
col
,
embed
,
hr
,
img
,
input
,
link
,
meta
,
param
,
source
,
track
,
wbr
template
element
template
script
,
style
textarea
,
title
Tags are used to delimit the start and end of elements in the markup. Raw text , escapable raw text , and normal elements have a start tag to indicate where they begin, and an end tag to indicate where they end. The start and end tags of certain normal elements can be omitted , as described below in the section on optional tags . Those that cannot be omitted must not be omitted. Void elements only have a start tag; end tags must not be specified for void elements . Foreign elements must either have a start tag and an end tag, or a start tag that is marked as self-closing, in which case they must not have an end tag.
The contents of the element must be placed between just after the start tag (which might be implied, in certain cases ) and just before the end tag (which again, might be implied in certain cases ). The exact allowed contents of each individual element depend on the content model of that element, as described earlier in this specification. Elements must not contain content that their content model disallows. In addition to the restrictions placed on the contents by those content models, however, the five types of elements have additional syntactic requirements.
Void elements can't have any contents (since there's no end tag, no content can be put between the start tag and the end tag).
The
template
element
can
have
template
contents
,
but
such
template
contents
are
not
children
of
the
template
element
itself.
Instead,
they
are
stored
in
a
DocumentFragment
associated
with
a
different
Document
—
without
a
browsing
context
—
so
as
to
avoid
the
template
contents
interfering
with
the
main
Document
.
The
markup
for
the
template
contents
of
a
template
element
is
placed
just
after
the
template
element's
start
tag
and
just
before
template
element's
end
tag
(as
with
other
elements),
and
may
consist
of
any
text
,
character
references
,
elements
,
and
comments
,
but
the
text
must
not
contain
the
character
U+003C
LESS-THAN
SIGN
(<)
or
an
ambiguous
ampersand
.
Raw text elements can have text , though it has restrictions described below.
Escapable raw text elements can have text and character references , but the text must not contain an ambiguous ampersand . There are also further restrictions described below.
Foreign elements whose start tag is marked as self-closing can't have any contents (since, again, as there's no end tag, no content can be put between the start tag and the end tag). Foreign elements whose start tag is not marked as self-closing can have text , character references , CDATA sections , other elements , and comments , but the text must not contain the character U+003C LESS-THAN SIGN (<) or an ambiguous ampersand .
The HTML syntax does not support namespace declarations, even in foreign elements .
For instance, consider the following HTML fragment:
<p>
<svg>
<metadata>
<!-- this is invalid -->
<cdr:license xmlns:cdr="https://www.example.com/cdr/metadata" name="MIT"/>
</metadata>
</svg>
</p>
The
innermost
element,
cdr:license
,
is
actually
in
the
SVG
namespace,
as
the
"
xmlns:cdr
"
attribute
has
no
effect
(unlike
in
XML).
In
fact,
as
the
comment
in
the
fragment
above
says,
the
fragment
is
actually
non-conforming.
This
is
because
SVG
2
does
not
define
any
elements
called
"
cdr:license
"
in
the
SVG
namespace.
Normal elements can have text , character references , other elements , and comments , but the text must not contain the character U+003C LESS-THAN SIGN (<) or an ambiguous ampersand . Some normal elements also have yet more restrictions on what content they are allowed to hold, beyond the restrictions imposed by the content model and those described in this paragraph. Those restrictions are described below.
Tags contain a tag name , giving the element's name. HTML elements all have names that only use ASCII alphanumerics . In the HTML syntax, tag names, even those for foreign elements , may be written with any mix of lower- and uppercase letters that, when converted to all-lowercase, matches the element's tag name; tag names are case-insensitive.
Start tags must have the following format:
End tags must have the following format:
Attributes for an element are expressed inside the element's start tag.
Attributes have a name and a value. Attribute names must consist of one or more characters other than controls , U+0020 SPACE, U+0022 ("), U+0027 ('), U+003E (>), U+002F (/), U+003D (=), and noncharacters . In the HTML syntax, attribute names, even those for foreign elements , may be written with any mix of ASCII lower and ASCII upper alphas .
Attribute values are a mixture of text and character references , except with the additional restriction that the text cannot contain an ambiguous ampersand .
Attributes can be specified in four different ways:
Just the attribute name . The value is implicitly the empty string.
In
the
following
example,
the
disabled
attribute
is
given
with
the
empty
attribute
syntax:
<input
disabled
>
If an attribute using the empty attribute syntax is to be followed by another attribute, then there must be ASCII whitespace separating the two.
The attribute name , followed by zero or more ASCII whitespace , followed by a single U+003D EQUALS SIGN character, followed by zero or more ASCII whitespace , followed by the attribute value , which, in addition to the requirements given above for attribute values, must not contain any literal ASCII whitespace , any U+0022 QUOTATION MARK characters ("), U+0027 APOSTROPHE characters ('), U+003D EQUALS SIGN characters (=), U+003C LESS-THAN SIGN characters (<), U+003E GREATER-THAN SIGN characters (>), or U+0060 GRAVE ACCENT characters (`), and must not be the empty string.
In
the
following
example,
the
value
attribute
is
given
with
the
unquoted
attribute
value
syntax:
<input
value=yes
>
If an attribute using the unquoted attribute syntax is to be followed by another attribute or by the optional U+002F SOLIDUS character (/) allowed in step 6 of the start tag syntax above, then there must be ASCII whitespace separating the two.
The attribute name , followed by zero or more ASCII whitespace , followed by a single U+003D EQUALS SIGN character, followed by zero or more ASCII whitespace , followed by a single U+0027 APOSTROPHE character ('), followed by the attribute value , which, in addition to the requirements given above for attribute values, must not contain any literal U+0027 APOSTROPHE characters ('), and finally followed by a second single U+0027 APOSTROPHE character (').
In
the
following
example,
the
type
attribute
is
given
with
the
single-quoted
attribute
value
syntax:
<input
type='checkbox'
>
If an attribute using the single-quoted attribute syntax is to be followed by another attribute, then there must be ASCII whitespace separating the two.
The attribute name , followed by zero or more ASCII whitespace , followed by a single U+003D EQUALS SIGN character, followed by zero or more ASCII whitespace , followed by a single U+0022 QUOTATION MARK character ("), followed by the attribute value , which, in addition to the requirements given above for attribute values, must not contain any literal U+0022 QUOTATION MARK characters ("), and finally followed by a second single U+0022 QUOTATION MARK character (").
In
the
following
example,
the
name
attribute
is
given
with
the
double-quoted
attribute
value
syntax:
<input
name="be
evil"
>
If an attribute using the double-quoted attribute syntax is to be followed by another attribute, then there must be ASCII whitespace separating the two.
There must never be two or more attributes on the same start tag whose names are an ASCII case-insensitive match for each other.
When a foreign element has one of the namespaced attributes given by the local name and namespace of the first and second cells of a row from the following table, it must be written using the name given by the third cell from the same row.
Local name | Namespace | Attribute name |
---|---|---|
actuate
| XLink namespace |
xlink:actuate
|
arcrole
| XLink namespace |
xlink:arcrole
|
href
| XLink namespace |
xlink:href
|
role
| XLink namespace |
xlink:role
|
show
| XLink namespace |
xlink:show
|
title
| XLink namespace |
xlink:title
|
type
| XLink namespace |
xlink:type
|
lang
| XML namespace |
xml:lang
|
space
| XML namespace |
xml:space
|
xmlns
| XMLNS namespace |
xmlns
|
xlink
| XMLNS namespace |
xmlns:xlink
|
No other namespaced attribute can be expressed in the HTML syntax .
Whether the attributes in the table above are conforming or not is defined by other specifications (e.g. SVG 2 and MathML ); this section only describes the syntax rules if the attributes are serialized using the HTML syntax.
Certain tags can be omitted .
Omitting
an
element's
start
tag
in
the
situations
described
below
does
not
mean
the
element
is
not
present;
it
is
implied,
but
it
is
still
there.
For
example,
an
HTML
document
always
has
a
root
html
element,
even
if
the
string
<html>
doesn't
appear
anywhere
in
the
markup.
An
html
element's
start
tag
may
be
omitted
if
the
first
thing
inside
the
html
element
is
not
a
comment
.
For
example,
in
the
following
case
it's
ok
to
remove
the
"
<html>
"
tag:
<!DOCTYPE HTML>
<html>
<head>
<title>Hello</title>
</head>
<body>
<p>Welcome to this example.</p>
</body>
</html>
Doing so would make the document look like this:
<!DOCTYPE HTML>
<head>
<title>Hello</title>
</head>
<body>
<p>Welcome to this example.</p>
</body>
</html>
This has the exact same DOM. In particular, note that whitespace around the document element is ignored by the parser. The following example would also have the exact same DOM:
<!DOCTYPE HTML><head>
<title>Hello</title>
</head>
<body>
<p>Welcome to this example.</p>
</body>
</html>
However,
in
the
following
example,
removing
the
start
tag
moves
the
comment
to
before
the
html
element:
<!DOCTYPE HTML>
<html>
<!-- where is this comment in the DOM? -->
<head>
<title>Hello</title>
</head>
<body>
<p>Welcome to this example.</p>
</body>
</html>
With the tag removed, the document actually turns into the same as this:
<!DOCTYPE HTML>
<!-- where is this comment in the DOM? -->
<html>
<head>
<title>Hello</title>
</head>
<body>
<p>Welcome to this example.</p>
</body>
</html>
This is why the tag can only be removed if it is not followed by a comment: removing the tag when there is a comment there changes the document's resulting parse tree. Of course, if the position of the comment does not matter, then the tag can be omitted, as if the comment had been moved to before the start tag in the first place.
An
html
element's
end
tag
may
be
omitted
if
the
html
element
is
not
immediately
followed
by
a
comment
.
A
head
element's
start
tag
may
be
omitted
if
the
element
is
empty,
or
if
the
first
thing
inside
the
head
element
is
an
element.
A
head
element's
end
tag
may
be
omitted
if
the
head
element
is
not
immediately
followed
by
ASCII
whitespace
or
a
comment
.
A
body
element's
start
tag
may
be
omitted
if
the
element
is
empty,
or
if
the
first
thing
inside
the
body
element
is
not
ASCII
whitespace
or
a
comment
,
except
if
the
first
thing
inside
the
body
element
is
a
meta
,
link
,
script
,
style
,
or
template
element.
A
body
element's
end
tag
may
be
omitted
if
the
body
element
is
not
immediately
followed
by
a
comment
.
Note
that
in
the
example
above,
the
head
element
start
and
end
tags,
and
the
body
element
start
tag,
can't
be
omitted,
because
they
are
surrounded
by
whitespace:
<!DOCTYPE HTML>
<html>
<head>
<title>Hello</title>
</head>
<body>
<p>Welcome to this example.</p>
</body>
</html>
(The
body
and
html
element
end
tags
could
be
omitted
without
trouble;
any
spaces
after
those
get
parsed
into
the
body
element
anyway.)
Usually, however, whitespace isn't an issue. If we first remove the whitespace we don't care about:
<!DOCTYPE
HTML><html><head><title>Hello</title></head><body><p>Welcome
to
this
example.</p></body></html>
Then we can omit a number of tags without affecting the DOM:
<!DOCTYPE
HTML><title>Hello</title><p>Welcome
to
this
example.</p>
At that point, we can also add some whitespace back:
<!DOCTYPE HTML>
<title>Hello</title>
<p>Welcome
to
this
example.</p>
This
would
be
equivalent
to
this
document,
with
the
omitted
tags
shown
in
their
parser-implied
positions;
the
only
whitespace
text
node
that
results
from
this
is
the
newline
at
the
end
of
the
head
element:
<!DOCTYPE HTML>
<html><head><title>Hello</title>
</head><body>
<p>Welcome
to
this
example.</p>
</body></html>
An
li
element's
end
tag
may
be
omitted
if
the
li
element
is
immediately
followed
by
another
li
element
or
if
there
is
no
more
content
in
the
parent
element.
A
dt
element's
end
tag
may
be
omitted
if
the
dt
element
is
immediately
followed
by
another
dt
element
or
a
dd
element.
A
dd
element's
end
tag
may
be
omitted
if
the
dd
element
is
immediately
followed
by
another
dd
element
or
a
dt
element,
or
if
there
is
no
more
content
in
the
parent
element.
A
p
element's
end
tag
may
be
omitted
if
the
p
element
is
immediately
followed
by
an
address
,
article
,
aside
,
blockquote
,
details
,
div
,
dl
,
fieldset
,
figcaption
,
figure
,
footer
,
form
,
h1
,
h2
,
h3
,
h4
,
h5
,
h6
,
header
,
,
hgroup
hr
,
main
,
menu
,
nav
,
ol
,
p
,
pre
,
section
,
table
,
or
ul
element,
or
if
there
is
no
more
content
in
the
parent
element
and
the
parent
element
is
an
HTML
element
that
is
not
an
a
,
audio
,
del
,
ins
,
map
,
noscript
,
or
video
element,
or
an
autonomous
custom
element
.
We can thus simplify the earlier example further:
<!DOCTYPE
HTML><title>Hello</title><p>Welcome
to
this
example.
An
rt
element's
end
tag
may
be
omitted
if
the
rt
element
is
immediately
followed
by
an
rt
or
rp
element,
or
if
there
is
no
more
content
in
the
parent
element.
An
rp
element's
end
tag
may
be
omitted
if
the
rp
element
is
immediately
followed
by
an
rt
or
rp
element,
or
if
there
is
no
more
content
in
the
parent
element.
An
optgroup
element's
end
tag
may
be
omitted
if
the
optgroup
element
is
immediately
followed
by
another
optgroup
element,
or
if
there
is
no
more
content
in
the
parent
element.
An
option
element's
end
tag
may
be
omitted
if
the
option
element
is
immediately
followed
by
another
option
element,
or
if
it
is
immediately
followed
by
an
optgroup
element,
or
if
there
is
no
more
content
in
the
parent
element.
A
colgroup
element's
start
tag
may
be
omitted
if
the
first
thing
inside
the
colgroup
element
is
a
col
element,
and
if
the
element
is
not
immediately
preceded
by
another
colgroup
element
whose
end
tag
has
been
omitted.
(It
can't
be
omitted
if
the
element
is
empty.)
A
colgroup
element's
end
tag
may
be
omitted
if
the
colgroup
element
is
not
immediately
followed
by
ASCII
whitespace
or
a
comment
.
A
caption
element's
end
tag
may
be
omitted
if
the
caption
element
is
not
immediately
followed
by
ASCII
whitespace
or
a
comment
.
A
thead
element's
end
tag
may
be
omitted
if
the
thead
element
is
immediately
followed
by
a
tbody
or
tfoot
element.
A
tbody
element's
start
tag
may
be
omitted
if
the
first
thing
inside
the
tbody
element
is
a
tr
element,
and
if
the
element
is
not
immediately
preceded
by
a
tbody
,
thead
,
or
tfoot
element
whose
end
tag
has
been
omitted.
(It
can't
be
omitted
if
the
element
is
empty.)
A
tbody
element's
end
tag
may
be
omitted
if
the
tbody
element
is
immediately
followed
by
a
tbody
or
tfoot
element,
or
if
there
is
no
more
content
in
the
parent
element.
A
tfoot
element's
end
tag
may
be
omitted
if
there
is
no
more
content
in
the
parent
element.
A
tr
element's
end
tag
may
be
omitted
if
the
tr
element
is
immediately
followed
by
another
tr
element,
or
if
there
is
no
more
content
in
the
parent
element.
A
td
element's
end
tag
may
be
omitted
if
the
td
element
is
immediately
followed
by
a
td
or
th
element,
or
if
there
is
no
more
content
in
the
parent
element.
A
th
element's
end
tag
may
be
omitted
if
the
th
element
is
immediately
followed
by
a
td
or
th
element,
or
if
there
is
no
more
content
in
the
parent
element.
The ability to omit all these table-related tags makes table markup much terser.
Take this example:
<table>
<caption>37547 TEE Electric Powered Rail Car Train Functions (Abbreviated)</caption>
<colgroup><col><col><col></colgroup>
<thead>
<tr>
<th>Function</th>
<th>Control Unit</th>
<th>Central Station</th>
</tr>
</thead>
<tbody>
<tr>
<td>Headlights</td>
<td>✔</td>
<td>✔</td>
</tr>
<tr>
<td>Interior Lights</td>
<td>✔</td>
<td>✔</td>
</tr>
<tr>
<td>Electric locomotive operating sounds</td>
<td>✔</td>
<td>✔</td>
</tr>
<tr>
<td>Engineer's cab lighting</td>
<td></td>
<td>✔</td>
</tr>
<tr>
<td>Station Announcements - Swiss</td>
<td></td>
<td>✔</td>
</tr>
</tbody>
</table>
The exact same table, modulo some whitespace differences, could be marked up as follows:
<table>
<caption>37547 TEE Electric Powered Rail Car Train Functions (Abbreviated)
<colgroup><col><col><col>
<thead>
<tr>
<th>Function
<th>Control Unit
<th>Central Station
<tbody>
<tr>
<td>Headlights
<td>✔
<td>✔
<tr>
<td>Interior Lights
<td>✔
<td>✔
<tr>
<td>Electric locomotive operating sounds
<td>✔
<td>✔
<tr>
<td>Engineer's cab lighting
<td>
<td>✔
<tr>
<td>Station Announcements - Swiss
<td>
<td>✔
</table>
Since the cells take up much less room this way, this can be made even terser by having each row on one line:
<table>
<caption>37547 TEE Electric Powered Rail Car Train Functions (Abbreviated)
<colgroup><col><col><col>
<thead>
<tr> <th>Function <th>Control Unit <th>Central Station
<tbody>
<tr> <td>Headlights <td>✔ <td>✔
<tr> <td>Interior Lights <td>✔ <td>✔
<tr> <td>Electric locomotive operating sounds <td>✔ <td>✔
<tr> <td>Engineer's cab lighting <td> <td>✔
<tr> <td>Station Announcements - Swiss <td> <td>✔
</table>
The only differences between these tables, at the DOM level, is with the precise position of the (in any case semantically-neutral) whitespace.
However , a start tag must never be omitted if it has any attributes.
Returning to the earlier example with all the whitespace removed and then all the optional tags removed:
<!DOCTYPE
HTML><title>Hello</title><p>Welcome
to
this
example.
If
the
body
element
in
this
example
had
to
have
a
class
attribute
and
the
html
element
had
to
have
a
lang
attribute,
the
markup
would
have
to
become:
<!DOCTYPE
HTML><html
lang="en"><title>Hello</title><body
class="demo"><p>Welcome
to
this
example.
This section assumes that the document is conforming, in particular, that there are no content model violations. Omitting tags in the fashion described in this section in a document that does not conform to the content models described in this specification is likely to result in unexpected DOM differences (this is, in part, what the content models are designed to avoid).
For historical reasons, certain elements have extra restrictions beyond even the restrictions given by their content model.
A
table
element
must
not
contain
tr
elements,
even
though
these
elements
are
technically
allowed
inside
table
elements
according
to
the
content
models
described
in
this
specification.
(If
a
tr
element
is
put
inside
a
table
in
the
markup,
it
will
in
fact
imply
a
tbody
start
tag
before
it.)
A
single
newline
may
be
placed
immediately
after
the
start
tag
of
pre
and
textarea
elements.
This
does
not
affect
the
processing
of
the
element.
The
otherwise
optional
newline
must
be
included
if
the
element's
contents
themselves
start
with
a
newline
(because
otherwise
the
leading
newline
in
the
contents
would
be
treated
like
the
optional
newline,
and
ignored).
The
text
in
raw
text
and
escapable
raw
text
elements
must
not
contain
any
occurrences
of
the
string
"
</
"
(U+003C
LESS-THAN
SIGN,
U+002F
SOLIDUS)
followed
by
characters
that
case-insensitively
match
the
tag
name
of
the
element
followed
by
one
of
U+0009
CHARACTER
TABULATION
(tab),
U+000A
LINE
FEED
(LF),
U+000C
FORM
FEED
(FF),
U+000D
CARRIAGE
RETURN
(CR),
U+0020
SPACE,
U+003E
GREATER-THAN
SIGN
(>),
or
U+002F
SOLIDUS
(/).
Text is allowed inside elements, attribute values, and comments. Extra constraints are placed on what is and what is not allowed in text based on where the text is to be put, as described in the other sections.
Newlines in HTML may be represented either as U+000D CARRIAGE RETURN (CR) characters, U+000A LINE FEED (LF) characters, or pairs of U+000D CARRIAGE RETURN (CR), U+000A LINE FEED (LF) characters in that order.
Where character references are allowed, a character reference of a U+000A LINE FEED (LF) character (but not a U+000D CARRIAGE RETURN (CR) character) also represents a newline .
In certain cases described in other sections, text may be mixed with character references . These can be used to escape characters that couldn't otherwise legally be included in text .
Character references must start with a U+0026 AMPERSAND character (&). Following this, there are three possible kinds of character references:
The numeric character reference forms described above are allowed to reference any code point excluding U+000D CR, noncharacters , and controls other than ASCII whitespace .
An ambiguous ampersand is a U+0026 AMPERSAND character (&) that is followed by one or more ASCII alphanumerics , followed by a U+003B SEMICOLON character (;), where these characters do not match any of the names given in the named character references section.
CDATA sections must consist of the following components, in this order:
<![CDATA[
".
]]>
".
]]>
".
CDATA
sections
can
only
be
used
in
foreign
content
(MathML
or
SVG).
In
this
example,
a
CDATA
section
is
used
to
escape
the
contents
of
a
MathML
ms
element:
<p>You can add a string to a number, but this stringifies the number:</p>
<math>
<ms><![CDATA[x<y]]></ms>
<mo>+</mo>
<mn>3</mn>
<mo>=</mo>
<ms><![CDATA[x<y3]]></ms>
</math>
Comments must have the following format:
<!--
".
>
",
nor
start
with
the
string
"
->
",
nor
contain
the
strings
"
<!--
",
"
-->
",
or
"
--!>
",
nor
end
with
the
string
"
<!-
".
-->
".
The
text
is
allowed
to
end
with
the
string
"
<!
",
as
in
<!--My
favorite
operators
are
>
and
<!-->
.