Living Standard — Last Updated 1 October 2025
This section only applies to user agents, data mining tools, and conformance checkers.
The rules for parsing XML documents into DOM trees are covered by the next section, entitled "The XML syntax".
User agents must use the parsing rules described in this section to generate the DOM trees from
  text/html resources. Together, these rules define what is referred to as the
  HTML parser.
While the HTML syntax described in this specification bears a close resemblance to SGML and XML, it is a separate language with its own parsing rules.
Some earlier versions of HTML (in particular from HTML2 to HTML4) were based on SGML and used SGML parsing rules. However, few (if any) web browsers ever implemented true SGML parsing for HTML documents; the only user agents to strictly handle HTML as an SGML application have historically been validators. The resulting confusion — with validators claiming documents to have one representation while widely deployed web browsers interoperably implemented a different representation — has wasted decades of productivity. This version of HTML thus returns to a non-SGML basis.
For the purposes of conformance checkers, if a resource is determined to be in the HTML syntax, then it is an HTML document.
As stated in the terminology section,
  references to element types that do not explicitly specify a
  namespace always refer to elements in the HTML namespace. For example, if the spec
  talks about "a menu element", then that is an element with the local name "menu", the namespace "http://www.w3.org/1999/xhtml", and
  the interface HTMLMenuElement. Where possible, references to such elements are
  hyperlinked to their definition.
The input to the HTML parsing process consists of a stream of code
  points, which is passed through a tokenization stage followed by a tree
  construction stage. The output is a Document object.
Implementations that do not support scripting do not
  have to actually create a DOM Document object, but the DOM tree in such cases is
  still used as the model for the rest of the specification.
In the common case, the data handled by the tokenization stage comes from the network, but
  it can also come from script running in the user
  agent, e.g. using the document.write() API.
There is only one set of states for the tokenizer stage and the tree construction stage, but the tree construction stage is reentrant, meaning that while the tree construction stage is handling one token, the tokenizer might be resumed, causing further tokens to be emitted and processed before the first token's processing is complete.
In the following example, the tree construction stage will be called upon to handle a "p" start tag token while handling the "script" end tag token:
...
<script>
 document.write('<p>');
</script>
...To handle these cases, parsers have a script nesting level, which must be initially set to zero, and a parser pause flag, which must be initially set to false.
This specification defines the parsing rules for HTML documents, whether they are syntactically correct or not. Certain points in the parsing algorithm are said to be parse errors. The error handling for parse errors is well-defined (that's the processing rules described throughout this specification), but user agents, while parsing an HTML document, may abort the parser at the first parse error that they encounter for which they do not wish to apply the rules described in this specification.
Conformance checkers must report at least one parse error condition to the user if one or more parse error conditions exist in the document and must not report parse error conditions if none exist in the document. Conformance checkers may report more than one parse error condition if more than one parse error condition exists in the document.
Parse errors are only errors with the syntax of HTML. In addition to checking for parse errors, conformance checkers will also verify that the document obeys all the other conformance requirements described in this specification.
Some parse errors have dedicated codes outlined in the table below that should be used by conformance checkers in reports.
Error descriptions in the table below are non-normative.
| Code | Description | 
|---|---|
| abrupt-closing-of-empty-comment | This error occurs if the parser encounters an empty comment that is abruptly closed by a U+003E (>) code
     point (i.e.,  | 
| abrupt-doctype-public-identifier | This error occurs if the parser encounters a U+003E (>) code point in the
     DOCTYPE public identifier (e.g.,  | 
| abrupt-doctype-system-identifier | This error occurs if the parser encounters a U+003E (>) code point in the
     DOCTYPE system identifier (e.g.,  | 
| absence-of-digits-in-numeric-character-reference | This error occurs if the parser encounters a numeric character reference that doesn't contain any digits (e.g.,  | 
| cdata-in-html-content | This error occurs if the parser encounters a CDATA
     section outside of foreign content (SVG or MathML). The parser treats such CDATA
     sections (including leading " | 
| character-reference-outside-unicode-range | This error occurs if the parser encounters a numeric character reference that references a code point that is greater than the valid Unicode range. The parser resolves such a character reference to a U+FFFD REPLACEMENT CHARACTER. | 
| control-character-in-input-stream | This error occurs if the input stream contains a control code point that is not ASCII whitespace or U+0000 NULL. Such code points are parsed as-is and usually, where parsing rules don't apply any additional restrictions, make their way into the DOM. | 
| control-character-reference | This error occurs if the parser encounters a numeric character reference that references a control code point that is not ASCII whitespace or is a U+000D CARRIAGE RETURN. The parser resolves such character references as-is except C1 control references that are replaced according to the numeric character reference end state. | 
| duplicate-attribute | This error occurs if the parser encounters an attribute in a tag that already has an attribute with the same name. The parser ignores all such duplicate occurrences of the attribute. | 
| end-tag-with-attributes | This error occurs if the parser encounters an end tag with attributes. Attributes in end tags are ignored and do not make their way into the DOM. | 
| end-tag-with-trailing-solidus | This error occurs if the parser encounters an end
     tag that has a U+002F (/) code point right before the closing U+003E (>)
     code point (e.g.,  | 
| eof-before-tag-name | This error occurs if the parser encounters the end of the input stream
     where a tag name is expected. In this case the parser treats the beginning of a start tag (i.e.,  | 
| eof-in-cdata | This error occurs if the parser encounters the end of the input stream in a CDATA section. The parser treats such CDATA sections as if they are closed immediately before the end of the input stream. | 
| eof-in-comment | This error occurs if the parser encounters the end of the input stream in a comment. The parser treats such comments as if they are closed immediately before the end of the input stream. | 
| eof-in-doctype | This error occurs if the parser encounters the end of the input stream in a DOCTYPE. In such a case, if the DOCTYPE is correctly placed as a
     document preamble, the parser sets the  | 
| eof-in-script-html-comment-like-text | This error occurs if the parser encounters the end of the input stream in text
      that resembles an HTML comment inside
       Syntactic structures that resemble HTML comments in  | 
| eof-in-tag | This error occurs if the parser encounters the end of the input stream in a
     start tag or an end
     tag (e.g.,  | 
| incorrectly-closed-comment | This error occurs if the parser encounters a comment that is closed by the " | 
| incorrectly-opened-comment | This error occurs if the parser encounters the " One possible cause of this error is using an XML markup declaration (e.g.,
       | 
| invalid-character-sequence-after-doctype-name | This error occurs if the parser encounters any code point sequence other
     than " | 
| invalid-first-character-of-tag-name | This error occurs if the parser encounters a code point that is not an ASCII alpha where first code point of a start tag name or an end tag name is expected. If a start tag was expected such code point and a preceding U+003C (<) is treated as text content, and all content that follows is treated as markup. Whereas, if an end tag was expected, such code point and all content that follows up to a U+003E (>) code point (if present) or to the end of the input stream is treated as a comment. For example, consider the following markup: This will be parsed into: While the first code point of a tag name is limited to an ASCII alpha, a wide range of code points (including ASCII digits) is allowed in subsequent positions. | 
| missing-attribute-value | This error occurs if the parser encounters a U+003E (>) code point where an
     attribute value is expected (e.g.,  | 
| missing-doctype-name | This error occurs if the parser encounters a DOCTYPE that is missing a name (e.g.,  | 
| missing-doctype-public-identifier | This error occurs if the parser encounters a U+003E (>) code point where
     start of the DOCTYPE public identifier is expected (e.g.,
      | 
| missing-doctype-system-identifier | This error occurs if the parser encounters a U+003E (>) code point where
     start of the DOCTYPE system identifier is expected (e.g.,
      | 
| missing-end-tag-name | This error occurs if the parser encounters a U+003E (>) code point where an
     end tag name is expected, i.e.,  | 
| missing-quote-before-doctype-public-identifier | This error occurs if the parser encounters the DOCTYPE public identifier that is not preceded by a quote (e.g.,
      | 
| missing-quote-before-doctype-system-identifier | This error occurs if the parser encounters the DOCTYPE system identifier that is not preceded by a quote (e.g.,
      | 
| missing-semicolon-after-character-reference | This error occurs if the parser encounters a character reference that is not terminated by a U+003B (;) code point. The parser behaves the same as if the character reference is terminated by the U+003B (;) code point. Most named character references require a terminating U+003B (;) code point. Those that don't might get resolved as a longer named character reference in certain ambiguous scenarios. For example,  | 
| missing-whitespace-after-doctype-public-keyword | This error occurs if the parser encounters a DOCTYPE whose " | 
| missing-whitespace-after-doctype-system-keyword | This error occurs if the parser encounters a DOCTYPE whose " | 
| missing-whitespace-before-doctype-name | This error occurs if the parser encounters a DOCTYPE whose " | 
| missing-whitespace-between-attributes | This error occurs if the parser encounters attributes that are not separated by ASCII
     whitespace (e.g.,  | 
| missing-whitespace-between-doctype-public-and-system-identifiers | This error occurs if the parser encounters a DOCTYPE whose public and system identifiers are not separated by ASCII whitespace. In this case the parser behaves as if ASCII whitespace is present. | 
| nested-comment | This error occurs if the parser encounters a nested comment (e.g.,  | 
| noncharacter-character-reference | This error occurs if the parser encounters a numeric character reference that references a noncharacter. The parser resolves such character references as-is. | 
| noncharacter-in-input-stream | This error occurs if the input stream contains a noncharacter. Such code points are parsed as-is and usually, where parsing rules don't apply any additional restrictions, make their way into the DOM. | 
| non-void-html-element-start-tag-with-trailing-solidus | This error occurs if the parser encounters a start tag for an element that is not in the list of void elements or is not a part of foreign content (i.e., not an SVG or MathML element) that has a U+002F (/) code point right before the closing U+003E (>) code point. The parser behaves as if the U+002F (/) is not present. For example, consider the following markup: This will be parsed into: The trailing U+002F (/) in a start tag name can be used only in foreign content to specify self-closing tags. (Self-closing tags don't exist in HTML.) It is also allowed for void elements, but doesn't have any effect in this case. | 
| null-character-reference | This error occurs if the parser encounters a numeric character reference that references a U+0000 NULL code point. The parser resolves such character references to a U+FFFD REPLACEMENT CHARACTER. | 
| surrogate-character-reference | This error occurs if the parser encounters a numeric character reference that references a surrogate. The parser resolves such character references to a U+FFFD REPLACEMENT CHARACTER. | 
| surrogate-in-input-stream | This error occurs if the input stream contains a surrogate. Such code points are parsed as-is and usually, where parsing rules don't apply any additional restrictions, make their way into the DOM. Surrogates can only find their way into the input stream via script APIs such
      as  | 
| unexpected-character-after-doctype-system-identifier | This error occurs if the parser encounters any code points other than ASCII whitespace or closing U+003E (>) after the DOCTYPE system identifier. The parser ignores these code points. | 
| unexpected-character-in-attribute-name | This error occurs if the parser encounters a U+0022 ("), U+0027 ('), or U+003C (<) code point in an attribute name. The parser includes such code points in the attribute name. Code points that trigger this error are usually a part of another syntactic construct and can be a sign of a typo around the attribute name. For example, consider the following markup: Due to a forgotten U+003E (>) code point after  As another example of this error, consider the following markup: Due to a forgotten U+003D (=) code point between an attribute name and value the parser
       treats this markup as a  | 
| unexpected-character-in-unquoted-attribute-value | This error occurs if the parser encounters a U+0022 ("), U+0027 ('), U+003C (<), U+003D (=), or U+0060 (`) code point in an unquoted attribute value. The parser includes such code points in the attribute value. Code points that trigger this error are usually a part of another syntactic construct and can be a sign of a typo around the attribute value. U+0060 (`) is in the list of code points that trigger this error because certain legacy user agents treat it as a quote. For example, consider the following markup: Due to a misplaced U+0027 (') code point the parser sets the value of the " | 
| unexpected-equals-sign-before-attribute-name | This error occurs if the parser encounters a U+003D (=) code point before an attribute name. In this case the parser treats U+003D (=) as the first code point of the attribute name. The common reason for this error is a forgotten attribute name. For example, consider the following markup: Due to a forgotten attribute name the parser treats this markup as a  | 
| unexpected-null-character | This error occurs if the parser encounters a U+0000 NULL code point in the input stream in certain positions. In general, such code points are either ignored or, for security reasons, replaced with a U+FFFD REPLACEMENT CHARACTER. | 
| unexpected-question-mark-instead-of-tag-name | This error occurs if the parser encounters a U+003F (?) code point where first code point of a start tag name is expected. The U+003F (?) and all content that follows up to a U+003E (>) code point (if present) or to the end of the input stream is treated as a comment. For example, consider the following markup: This will be parsed into: The common reason for this error is an XML processing instruction (e.g.,  | 
| unexpected-solidus-in-tag | This error occurs if the parser encounters a U+002F (/) code point that is
     not a part of a quoted attribute value and not
     immediately followed by a U+003E (>) code point in a tag (e.g.,  | 
| unknown-named-character-reference | This error occurs if the parser encounters an ambiguous ampersand. In this case the parser doesn't resolve the character reference. | 
The stream of code points that comprises the input to the tokenization stage will be initially seen by the user agent as a stream of bytes (typically coming over the network or from the local file system). The bytes encode the actual characters according to a particular character encoding, which the user agent uses to decode the bytes into characters.
For XML documents, the algorithm user agents are required to use to determine the character encoding is given by XML. This section does not apply to XML documents. [XML]
Usually, the encoding sniffing algorithm defined below is used to determine the character encoding.
Given a character encoding, the bytes in the input byte stream must be converted to characters for the tokenizer's input stream, by passing the input byte stream and character encoding to decode.
A leading Byte Order Mark (BOM) causes the character encoding argument to be ignored and will itself be skipped.
Bytes or sequences of bytes in the original byte stream that did not conform to the Encoding standard (e.g. invalid UTF-8 byte sequences in a UTF-8 input byte stream) are errors that conformance checkers are expected to report. [ENCODING]
The decoder algorithms describe how to handle invalid input; for security reasons, it is imperative that those rules be followed precisely. Differences in how invalid byte sequences are handled can result in, amongst other problems, script injection vulnerabilities ("XSS").
When the HTML parser is decoding an input byte stream, it uses a character encoding and a confidence. The confidence is either tentative, certain, or irrelevant. The encoding used, and whether the confidence in that encoding is tentative or certain, is used during the parsing to determine whether to change the encoding. If no encoding is necessary, e.g. because the parser is operating on a Unicode stream and doesn't have to use a character encoding at all, then the confidence is irrelevant.
Some algorithms feed the parser by directly adding characters to the input stream rather than adding bytes to the input byte stream.
When the HTML parser is to operate on an input byte stream that has a known definite encoding, then the character encoding is that encoding and the confidence is certain.
In some cases, it might be impractical to unambiguously determine the encoding before parsing the document. Because of this, this specification provides for a two-pass mechanism with an optional pre-scan. Implementations are allowed, as described below, to apply a simplified parsing algorithm to whatever bytes they have available before beginning to parse the document. Then, the real parser is started, using a tentative encoding derived from this pre-parse and other out-of-band metadata. If, while the document is being loaded, the user agent discovers a character encoding declaration that conflicts with this information, then the parser can get reinvoked to perform a parse of the document with the real encoding.
User agents must use the following algorithm, called the encoding sniffing algorithm, to determine the character encoding to use when decoding a document in the first pass. This algorithm takes as input any out-of-band metadata available to the user agent (e.g. the Content-Type metadata of the document) and all the bytes available so far, and returns a character encoding and a confidence that is either tentative or certain.
If the result of BOM sniffing is an encoding, return that encoding with confidence certain.
Although the decode algorithm will itself change the encoding to use based on the presence of a byte order mark, this algorithm sniffs the BOM as well in order to set the correct document's character encoding and confidence.
If the user has explicitly instructed the user agent to override the document's character encoding with a specific encoding, optionally return that encoding with the confidence certain.
Typically, user agents remember such user requests across sessions, and in some
    cases apply them to documents in iframes as well.
The user agent may wait for more bytes of the resource to be available, either in this step or at any later step in this algorithm. For instance, a user agent might wait 500ms or 1024 bytes, whichever came first. In general preparsing the source to find the encoding improves performance, as it reduces the need to throw away the data structures used when parsing upon finding the encoding information. However, if the user agent delays too long to obtain data to determine the encoding, then the cost of the delay could outweigh any performance improvements from the preparse.
The authoring conformance requirements for character encoding declarations limit them to only appearing in the first 1024 bytes. User agents are therefore encouraged to use the prescan algorithm below (as invoked by these steps) on the first 1024 bytes, but not to stall beyond that.
If the transport layer specifies a character encoding, and it is supported, return that encoding with the confidence certain.
Optionally, prescan the byte stream to determine its encoding, with the end condition being when the user agent decides that scanning further bytes would not be efficient. User agents are encouraged to only prescan the first 1024 bytes. User agents may decide that scanning any bytes is not efficient, in which case these substeps are entirely skipped.
The aforementioned algorithm returns either a character encoding or failure. If it returns a character encoding, then return the same encoding, with confidence tentative.
If the HTML parser for which this algorithm is being run is associated with a
    Document d whose container
    document is non-null, then:
Let parentDocument be d's container document.
If parentDocument's origin is same origin with d's origin and parentDocument's character encoding is not UTF-16BE/LE, then return parentDocument's character encoding, with the confidence tentative.
Otherwise, if the user agent has information on the likely encoding for this page, e.g. based on the encoding of the page when it was last visited, then return that encoding, with the confidence tentative.
The user agent may attempt to autodetect the character encoding from applying frequency analysis or other algorithms to the data stream. Such algorithms may use information about the resource other than the resource's contents, including the address of the resource. If autodetection succeeds in determining a character encoding, and that encoding is a supported encoding, then return that encoding, with the confidence tentative. [UNIVCHARDET]
User agents are generally discouraged from attempting to autodetect encodings for resources obtained over the network, since doing so involves inherently non-interoperable heuristics. Attempting to detect encodings based on an HTML document's preamble is especially tricky since HTML markup typically uses only ASCII characters, and HTML documents tend to begin with a lot of markup rather than with text content.
The UTF-8 encoding has a highly detectable bit pattern. Files from the local file system that contain bytes with values greater than 0x7F which match the UTF-8 pattern are very likely to be UTF-8, while documents with byte sequences that do not match it are very likely not. When a user agent can examine the whole file, rather than just the preamble, detecting for UTF-8 specifically can be especially effective. [PPUTF8] [UTF8DET]
Otherwise, return an implementation-defined or user-specified default character encoding, with the confidence tentative.
In controlled environments or in environments where the encoding of documents can be
    prescribed (for example, for user agents intended for dedicated use in new networks), the
    comprehensive UTF-8 encoding is suggested.
In other environments, the default encoding is typically dependent on the user's locale (an approximation of the languages, and thus often encodings, of the pages that the user is likely to frequent). The following table gives suggested defaults based on the user's locale, for compatibility with legacy content. Locales are identified by BCP 47 language tags. [BCP47] [ENCODING]
| Locale language | Suggested default encoding | |
|---|---|---|
| ar | Arabic | windows-1256 | 
| az | Azeri | windows-1254 | 
| ba | Bashkir | windows-1251 | 
| be | Belarusian | windows-1251 | 
| bg | Bulgarian | windows-1251 | 
| cs | Czech | windows-1250 | 
| el | Greek | ISO-8859-7 | 
| et | Estonian | windows-1257 | 
| fa | Persian | windows-1256 | 
| he | Hebrew | windows-1255 | 
| hr | Croatian | windows-1250 | 
| hu | Hungarian | ISO-8859-2 | 
| ja | Japanese | Shift_JIS | 
| kk | Kazakh | windows-1251 | 
| ko | Korean | EUC-KR | 
| ku | Kurdish | windows-1254 | 
| ky | Kyrgyz | windows-1251 | 
| lt | Lithuanian | windows-1257 | 
| lv | Latvian | windows-1257 | 
| mk | Macedonian | windows-1251 | 
| pl | Polish | ISO-8859-2 | 
| ru | Russian | windows-1251 | 
| sah | Yakut | windows-1251 | 
| sk | Slovak | windows-1250 | 
| sl | Slovenian | ISO-8859-2 | 
| sr | Serbian | windows-1251 | 
| tg | Tajik | windows-1251 | 
| th | Thai | windows-874 | 
| tr | Turkish | windows-1254 | 
| tt | Tatar | windows-1251 | 
| uk | Ukrainian | windows-1251 | 
| vi | Vietnamese | windows-1258 | 
| zh-Hans, zh-CN, zh-SG | Chinese, Simplified | GBK | 
| zh-Hant, zh-HK, zh-MO, zh-TW | Chinese, Traditional | Big5 | 
| All other locales | windows-1252 | |
The contents of this table are derived from the intersection of Windows, Chrome, and Firefox defaults.
The document's character encoding must immediately be set to the value returned from this algorithm, at the same time as the user agent uses the returned value to select the decoder to use for the input byte stream.
When an algorithm requires a user agent to prescan a byte stream to determine its encoding, given some defined end condition, then it must run the following steps. If at any point during these steps (including during instances of the get an attribute algorithm invoked by this one) the user agent either runs out of bytes (meaning the position pointer created in the first step below goes beyond the end of the byte stream obtained so far) or reaches its end condition, then abort the prescan a byte stream to determine its encoding algorithm and return the result get an XML encoding applied to the same bytes that the prescan a byte stream to determine its encoding algorithm was applied to. Otherwise, these steps will return a character encoding.
Let position be a pointer to a byte in the input byte stream, initially pointing at the first byte.
Prescan for UTF-16 XML declarations: If position points to:
Return UTF-16LE.
Return UTF-16BE.
For historical reasons, the prefix is two bytes longer than in Appendix F of XML and the encoding name is not checked.
Loop: If position points to:
<!--`)Advance the position pointer so that it points at the first 0x3E byte which is preceded by two 0x2D bytes (i.e. at the end of an ASCII '-->' sequence) and comes after the 0x3C byte that was found. (The two 0x2D bytes can be the same as those in the '<!--' sequence.)
Advance the position pointer so that it points at the next 0x09, 0x0A, 0x0C, 0x0D, 0x20, or 0x2F byte (the one in sequence of characters matched above).
Let attribute list be an empty list of strings.
Let got pragma be false.
Let need pragma be null.
Let charset be the null value (which, for the purposes of this algorithm, is distinct from an unrecognized encoding or the empty string).
Attributes: Get an attribute and its value. If no attribute was sniffed, then jump to the processing step below.
If the attribute's name is already in attribute list, then return to the step labeled attributes.
Add the attribute's name to attribute list.
Run the appropriate step from the following list, if one applies:
http-equiv"If the attribute's value is "content-type", then set got pragma to true.
content"Apply the algorithm for extracting a character encoding from a
         meta element, giving the attribute's value as the string to parse. If a
         character encoding is returned, and if charset is still set to null,
         let charset be the encoding returned, and set need
         pragma to true.
charset"Let charset be the result of getting an encoding from the attribute's value, and set need pragma to false.
Return to the step labeled attributes.
Processing: If need pragma is null, then jump to the step below labeled next byte.
If need pragma is true but got pragma is false, then jump to the step below labeled next byte.
If charset is failure, then jump to the step below labeled next byte.
If charset is UTF-16BE/LE, then set charset to UTF-8.
If charset is x-user-defined, then set charset to windows-1252.
Return charset.
Advance the position pointer so that it points at the next 0x09 (HT), 0x0A (LF), 0x0C (FF), 0x0D (CR), 0x20 (SP), or 0x3E (>) byte.
Repeatedly get an attribute until no further attributes can be found, then jump to the step below labeled next byte.
<!`)</`)<?`)Advance the position pointer so that it points at the first 0x3E byte (>) that comes after the 0x3C byte that was found.
Do nothing with that byte.
When the prescan a byte stream to determine its encoding algorithm says to get an attribute, it means doing this:
If the byte at position is one of 0x09 (HT), 0x0A (LF), 0x0C (FF), 0x0D (CR), 0x20 (SP), or 0x2F (/), then advance position to the next byte and redo this step.
If the byte at position is 0x3E (>), then abort the get an attribute algorithm. There isn't one.
Otherwise, the byte at position is the start of the attribute name. Let attribute name and attribute value be the empty string.
Process the byte at position as follows:
Advance position to the next byte and return to the previous step.
Spaces: If the byte at position is one of 0x09 (HT), 0x0A (LF), 0x0C (FF), 0x0D (CR), or 0x20 (SP), then advance position to the next byte, then, repeat this step.
If the byte at position is not 0x3D (=), abort the get an attribute algorithm. The attribute's name is the value of attribute name, its value is the empty string.
Advance position past the 0x3D (=) byte.
Value: If the byte at position is one of 0x09 (HT), 0x0A (LF), 0x0C (FF), 0x0D (CR), or 0x20 (SP), then advance position to the next byte, then, repeat this step.
Process the byte at position as follows:
Process the byte at position as follows:
Advance position to the next byte and return to the previous step.
When the prescan a byte stream to determine its encoding algorithm is aborted without returning an encoding, get an XML encoding means doing this.
Looking for syntax resembling an XML declaration, even in text/html,
  is necessary for compatibility with existing content.
Let encodingPosition be a pointer to the start of the stream.
If encodingPosition does not point to the start of a byte sequence 0x3C, 0x3F,
   0x78, 0x6D, 0x6C (`<?xml`), then return failure.
Let xmlDeclarationEnd be a pointer to the next byte in the input byte stream which is 0x3E (>). If there is no such byte, then return failure.
Set encodingPosition to the position of the first occurrence of the subsequence
   of bytes 0x65, 0x6E, 0x63, 0x6F, 0x64, 0x69, 0x6E, 0x67 (`encoding`) at or
   after the current encodingPosition. If there is no such sequence, then return
   failure.
Advance encodingPosition past the 0x67 (g) byte.
While the byte at encodingPosition is less than or equal to 0x20 (i.e., it is either an ASCII space or control character), advance encodingPosition to the next byte.
If the byte at encodingPosition is not 0x3D (=), then return failure.
Advance encodingPosition to the next byte.
While the byte at encodingPosition is less than or equal to 0x20 (i.e., it is either an ASCII space or control character), advance encodingPosition to the next byte.
Let quoteMark be the byte at encodingPosition.
If quoteMark is not either 0x22 (") or 0x27 ('), then return failure.
Advance encodingPosition to the next byte.
Let encodingEndPosition be the position of the next occurrence of quoteMark at or after encodingPosition. If quoteMark does not occur again, then return failure.
Let potentialEncoding be the sequence of the bytes between encodingPosition (inclusive) and encodingEndPosition (exclusive).
If potentialEncoding contains one or more bytes whose byte value is 0x20 or below, then return failure.
Let encoding be the result of getting an encoding given potentialEncoding isomorphic decoded.
If the encoding is UTF-16BE/LE, then change it to UTF-8.
Return encoding.
For the sake of interoperability, user agents should not use a pre-scan algorithm that returns different results than the one described above. (But, if you do, please at least let us know, so that we can improve this algorithm and benefit everyone...)
User agents must support the encodings defined in Encoding, including, but not limited to, UTF-8, ISO-8859-2, ISO-8859-7, ISO-8859-8, windows-874, windows-1250, windows-1251, windows-1252, windows-1254, windows-1255, windows-1256, windows-1257, windows-1258, GBK, Big5, ISO-2022-JP, Shift_JIS, EUC-KR, UTF-16BE, UTF-16LE, UTF-16BE/LE, and x-user-defined. User agents must not support other encodings.
The above prohibits supporting, for example, CESU-8, UTF-7, BOCU-1, SCSU, EBCDIC, and UTF-32. This specification does not make any attempt to support prohibited encodings in its algorithms; support and use of prohibited encodings would thus lead to unexpected behavior. [CESU8] [UTF7] [BOCU1] [SCSU]
When the parser requires the user agent to change the encoding, it must run the following steps. This might happen if the encoding sniffing algorithm described above failed to find a character encoding, or if it found a character encoding that was not the actual encoding of the file.
If the encoding that is already being used to interpret the input stream is UTF-16BE/LE, then set the confidence to certain and return. The new encoding is ignored; if it was anything but the same encoding, then it would be clearly incorrect.
If the new encoding is UTF-16BE/LE, then change it to UTF-8.
If the new encoding is x-user-defined, then change it to windows-1252.
If the new encoding is identical or equivalent to the encoding that is already being used to interpret the input stream, then set the confidence to certain and return. This happens when the encoding information found in the file matches what the encoding sniffing algorithm determined to be the encoding, and in the second pass through the parser if the first pass found that the encoding sniffing algorithm described in the earlier section failed to find the right encoding.
If all the bytes up to the last byte converted by the current decoder have the same Unicode interpretations in both the current encoding and the new encoding, and if the user agent supports changing the converter on the fly, then the user agent may change to the new converter for the encoding on the fly. Set the document's character encoding and the encoding used to convert the input stream to the new encoding, set the confidence to certain, and return.
Otherwise, restart the navigate algorithm, with historyHandling set to "replace" and other inputs kept the same, but
   this time skip the encoding sniffing algorithm and instead just set the encoding to
   the new encoding and the confidence to
   certain. Whenever possible, this should be done without actually contacting the network
   layer (the bytes should be re-parsed from memory), even if, e.g., the document is marked as not
   being cacheable. If this is not possible and contacting the network layer would involve repeating
   a request that uses a method other than `GET`, then instead set the confidence to certain and ignore the new
   encoding. The resource will be misinterpreted. User agents may notify the user of the situation,
   to aid in application development.
This algorithm is only invoked when a new encoding is found declared on a
  meta element.
The input stream consists of the characters pushed into it as the input byte stream is decoded or from the various APIs that directly manipulate the input stream.
Any occurrences of surrogates are surrogate-in-input-stream parse errors. Any occurrences of noncharacters are noncharacter-in-input-stream parse errors and any occurrences of controls other than ASCII whitespace and U+0000 NULL characters are control-character-in-input-stream parse errors.
The handling of U+0000 NULL characters varies based on where the characters are found and happens at the later stages of the parsing. They are either ignored or, for security reasons, replaced with a U+FFFD REPLACEMENT CHARACTER. This handling is, by necessity, spread across both the tokenization stage and the tree construction stage.
Before the tokenization stage, the input stream must be preprocessed by normalizing newlines. Thus, newlines in HTML DOMs are represented by U+000A LF characters, and there are never any U+000D CR characters in the input to the tokenization stage.
The next input character is the first character in the input stream that has not yet been consumed or explicitly ignored by the requirements in this section. Initially, the next input character is the first character in the input. The current input character is the last character to have been consumed.
The insertion point is the position (just before a character or just before the end
  of the input stream) where content inserted using document.write() is actually inserted. The insertion point is
  relative to the position of the character immediately after it, it is not an absolute offset into
  the input stream. Initially, the insertion point is undefined.
The "EOF" character in the tables below is a conceptual character representing the end of the
  input stream. If the parser is a script-created parser, then the end of
  the input stream is reached when an explicit "EOF" character (inserted by
  the document.close() method) is consumed. Otherwise, the
  "EOF" character is not a real character in the stream, but rather the lack of any further
  characters.
The insertion mode is a state variable that controls the primary operation of the tree construction stage.
Initially, the insertion mode is "initial". It can change to "before html", "before head", "in head", "in head noscript", "after head", "in body", "text", "in table", "in table text", "in caption", "in column group", "in table body", "in row", "in cell", "in template", "after body", "in frameset", "after frameset", "after after body", and "after after frameset" during the course of the parsing, as described in the tree construction stage. The insertion mode affects how tokens are processed and whether CDATA sections are supported.
Several of these modes, namely "in head", "in body", and "in table", are special, in that the other modes defer to them at various times. When the algorithm below says that the user agent is to do something "using the rules for the m insertion mode", where m is one of these modes, the user agent must use the rules described under the m insertion mode's section, but must leave the insertion mode unchanged unless the rules in m themselves switch the insertion mode to a new value.
When the insertion mode is switched to "text" or "in table text", the original insertion mode is also set. This is the insertion mode to which the tree construction stage will return.
Similarly, to parse nested template elements, a stack of template insertion
  modes is used. It is initially empty. The current template insertion mode is the
  insertion mode that was most recently added to the stack of template insertion modes.
  The algorithms in the sections below will push insertion modes onto this stack, meaning
  that the specified insertion mode is to be added to the stack, and pop insertion modes from
  the stack, which means that the most recently added insertion mode must be removed from the
  stack.
When the steps below require the UA to reset the insertion mode appropriately, it means the UA must follow these steps:
Let last be false.
Let node be the last node in the stack of open elements.
Loop: If node is the first node in the stack of open elements, then set last to true, and, if the parser was created as part of the HTML fragment parsing algorithm (fragment case), set node to the context element passed to that algorithm.
If node is a td or th element and last is
   false, then switch the insertion mode to "in
   cell" and return.
If node is a tr element, then switch the insertion
   mode to "in row" and return.
If node is a tbody, thead, or
   tfoot element, then switch the insertion mode to "in table body" and return.
If node is a caption element, then switch the
   insertion mode to "in caption" and
   return.
If node is a colgroup element, then switch the
   insertion mode to "in column
   group" and return.
If node is a table element, then switch the
   insertion mode to "in table" and
   return.
If node is a template element, then switch the
   insertion mode to the current template insertion mode and
   return.
If node is a head element and last is
   false, then switch the insertion mode to "in
   head" and return.
If node is a body element, then switch the
   insertion mode to "in body" and
   return.
If node is a frameset element, then switch the
   insertion mode to "in frameset" and
   return. (fragment case)
If node is an html element, run these substeps:
If the head element pointer is null, switch the
     insertion mode to "before head"
     and return. (fragment case)
Otherwise, the head element pointer is not null, switch the
     insertion mode to "after head" and
     return.
If last is true, then switch the insertion mode to "in body" and return. (fragment case)
Let node now be the node before node in the stack of open elements.
Return to the step labeled loop.
Initially, the stack of open elements is empty. The stack grows downwards; the topmost node on the stack is the first one added to the stack, and the bottommost node of the stack is the most recently added node in the stack (notwithstanding when the stack is manipulated in a random access fashion as part of the handling for misnested tags).
The "before html"
  insertion mode creates the html document element, which is
  then added to the stack.
In the fragment case, the stack of open elements is
  initialized to contain an html element that is created as part of that algorithm. (The fragment case skips the
  "before html" insertion mode.)
The html node, however it is created, is the topmost node of the stack. It only
  gets popped off the stack when the parser finishes.
The current node is the bottommost node in this stack of open elements.
The adjusted current node is the context element if the parser was created as part of the HTML fragment parsing algorithm and the stack of open elements has only one element in it (fragment case); otherwise, the adjusted current node is the current node.
When the current node is removed from the stack of open elements, process internal resource links given the current node's node document.
Elements in the stack of open elements fall into the following categories:
The following elements have varying levels of special parsing rules: HTML's
    address, applet, area, article,
    aside, base, basefont, bgsound,
    blockquote, body, br, button,
    caption, center, col, colgroup,
    dd, details, dir, div, dl,
    dt, embed, fieldset, figcaption,
    figure, footer, form, frame,
    frameset, h1, h2, h3, h4,
    h5, h6, head, header, hgroup,
    hr, html, iframe, 
    img, input, keygen, li, link,
    listing, main, marquee, menu,
    meta, nav, noembed, noframes,
    noscript, object, ol, p,
    param, plaintext, pre, script,
    search, section, select, source,
    style, summary, table, tbody,
    td, template, textarea, tfoot,
    th, thead, title, tr, track,
    ul, wbr, xmp; MathML mi,
    MathML mo, MathML mn, MathML
    ms, MathML mtext, and MathML
    annotation-xml; and SVG foreignObject, SVG
    desc, and SVG title.
An image start tag token is handled by the tree builder,
    but it is not in this list because it is not an element; it gets turned into an img
    element.
The following HTML elements are those that end up in the list of active formatting
   elements: a, b, big, code,
   em, font, i, nobr, s,
   small, strike, strong, tt, and
   u.
All other elements found while parsing an HTML document.
Typically, the special elements have the start and end tag tokens
  handled specifically, while ordinary elements' tokens fall into "any other start tag"
  and "any other end tag" clauses, and some parts of the tree builder check if a particular element
  in the stack of open elements is in the special category. However, some
  elements (e.g., the option element) have their start or end tag tokens handled
  specifically, but are still not in the special category, so that they get the
  ordinary handling elsewhere.
The stack of open elements is said to have an element target node in a specific scope consisting of a list of element types list when the following algorithm terminates in a match state:
Initialize node to be the current node (the bottommost node of the stack).
If node is target node, terminate in a match state.
Otherwise, if node is one of the element types in list, terminate in a failure state.
Otherwise, set node to the previous entry in the stack of open
   elements and return to step 2. (This will never fail, since the loop will always terminate
   in the previous step if the top of the stack — an html element — is
   reached.)
The stack of open elements is said to have a particular element in scope when it has that element in the specific scope consisting of the following element types:
The stack of open elements is said to have a particular element in list item scope when it has that element in the specific scope consisting of the following element types:
ol in the HTML namespaceul in the HTML namespaceThe stack of open elements is said to have a particular element in button scope when it has that element in the specific scope consisting of the following element types:
button in the HTML namespaceThe stack of open elements is said to have a particular element in table scope when it has that element in the specific scope consisting of the following element types:
html in the HTML namespacetable in the HTML namespacetemplate in the HTML namespaceNothing happens if at any time any of the elements in the stack of open elements
  are moved to a new location in, or removed from, the Document tree. In particular,
  the stack is not changed in this situation. This can cause, amongst other strange effects, content
  to be appended to nodes that are no longer in the DOM.
In some cases (namely, when closing misnested formatting elements), the stack is manipulated in a random-access fashion.
Initially, the list of active formatting elements is empty. It is used to handle mis-nested formatting element tags.
The list contains elements in the formatting category, and markers. The markers are inserted when entering applet,
  object, marquee, template, td,
  th, and caption elements, and are used to prevent formatting from
  "leaking" into applet, object, marquee,
  template, td, th, and caption elements.
In addition, each element in the list of active formatting elements is associated with the token for which it was created, so that further elements can be created for that token if necessary.
When the steps below require the UA to push onto the list of active formatting elements an element element, the UA must perform the following steps:
If there are already three elements in the list of active formatting elements after the last marker, if any, or anywhere in the list if there are no markers, that have the same tag name, namespace, and attributes as element, then remove the earliest such element from the list of active formatting elements. For these purposes, the attributes must be compared as they were when the elements were created by the parser; two elements have the same attributes if all their parsed attributes can be paired such that the two attributes in each pair have identical names, namespaces, and values (the order of the attributes does not matter).
This is the Noah's Ark clause. But with three per family instead of two.
Add element to the list of active formatting elements.
When the steps below require the UA to reconstruct the active formatting elements, the UA must perform the following steps:
If there are no entries in the list of active formatting elements, then there is nothing to reconstruct; stop this algorithm.
If the last (most recently added) entry in the list of active formatting elements is a marker, or if it is an element that is in the stack of open elements, then there is nothing to reconstruct; stop this algorithm.
Let entry be the last (most recently added) element in the list of active formatting elements.
Rewind: If there are no entries before entry in the list of active formatting elements, then jump to the step labeled create.
Let entry be the entry one earlier than entry in the list of active formatting elements.
If entry is neither a marker nor an element that is also in the stack of open elements, go to the step labeled rewind.
Advance: Let entry be the element one later than entry in the list of active formatting elements.
Create: Insert an HTML element for the token for which the element entry was created, to obtain new element.
Replace the entry for entry in the list with an entry for new element.
If the entry for new element in the list of active formatting elements is not the last entry in the list, return to the step labeled advance.
This has the effect of reopening all the formatting elements that were opened in the current body, cell, or caption (whichever is youngest) that haven't been explicitly closed.
The way this specification is written, the list of active formatting elements always consists of elements in chronological order with the least recently added element first and the most recently added element last (except for while steps 7 to 10 of the above algorithm are being executed, of course).
When the steps below require the UA to clear the list of active formatting elements up to the last marker, the UA must perform the following steps:
Let entry be the last (most recently added) entry in the list of active formatting elements.
Remove entry from the list of active formatting elements.
If entry was a marker, then stop the algorithm at this point. The list has been cleared up to the last marker.
Go to step 1.
Initially, the head element pointer and the form element pointer are both null.
Once a head element has been parsed (whether implicitly or explicitly) the
  head element pointer gets set to point to this node.
The form element pointer points to the last
  form element that was opened and whose end tag has not yet been seen. It is used to
  make form controls associate with forms in the face of dramatically bad markup, for historical
  reasons. It is ignored inside template elements.
The scripting flag is set to "enabled" if scripting
  was enabled for the Document with which the parser is associated when the
  parser was created, and "disabled" otherwise.
The scripting flag can be enabled even when the parser was created as
  part of the HTML fragment parsing algorithm, even though script elements
  don't execute in that case.
The frameset-ok flag is set to "ok" when the parser is created. It is set to "not ok" after certain tokens are seen.
Implementations must act as if they used the following state machine to tokenize HTML. The state machine must start in the data state. Most states consume a single character, which may have various side-effects, and either switches the state machine to a new state to reconsume the current input character, or switches it to a new state to consume the next character, or stays in the same state to consume the next character. Some states have more complicated behavior and can consume several characters before switching to another state. In some cases, the tokenizer state is also changed by the tree construction stage.
When a state says to reconsume a matched character in a specified state, that means to switch to that state, but when it attempts to consume the next input character, provide it with the current input character instead.
The exact behavior of certain states depends on the insertion mode and the stack of open elements. Certain states also use a temporary buffer to track progress, and the character reference state uses a return state to return to the state it was invoked from.
The output of the tokenization step is a series of zero or more of the following tokens: DOCTYPE, start tag, end tag, comment, character, end-of-file. DOCTYPE tokens have a name, a public identifier, a system identifier, and a force-quirks flag. When a DOCTYPE token is created, its name, public identifier, and system identifier must be marked as missing (which is a distinct state from the empty string), and the force-quirks flag must be set to off (its other state is on). Start and end tag tokens have a tag name, a self-closing flag, and a list of attributes, each of which has a name and a value. When a start or end tag token is created, its self-closing flag must be unset (its other state is that it be set), and its attributes list must be empty. Comment and character tokens have data.
When a token is emitted, it must immediately be handled by the tree construction
  stage. The tree construction stage can affect the state of the tokenization stage, and can insert
  additional characters into the stream. (For example, the script element can result in
  scripts executing and using the dynamic markup insertion APIs to insert characters
  into the stream being tokenized.)
Creating a token and emitting it are distinct actions. It is possible for a token to be created but implicitly abandoned (never emitted), e.g. if the file ends unexpectedly while processing the characters that are being parsed into a start tag token.
When a start tag token is emitted with its self-closing flag set, if the flag is not acknowledged when it is processed by the tree construction stage, that is a non-void-html-element-start-tag-with-trailing-solidus parse error.
When an end tag token is emitted with attributes, that is an end-tag-with-attributes parse error.
When an end tag token is emitted with its self-closing flag set, that is an end-tag-with-trailing-solidus parse error.
An appropriate end tag token is an end tag token whose tag name matches the tag name of the last start tag to have been emitted from this tokenizer, if any. If no start tag has been emitted from this tokenizer, then no end tag token is appropriate.
A character reference is said to be consumed as part of an attribute if the return state is either attribute value (double-quoted) state, attribute value (single-quoted) state, or attribute value (unquoted) state.
When a state says to flush code points consumed as a character reference, it means that for each code point in the temporary buffer (in the order they were added to the buffer), the user agent must append the code point from the buffer to the current attribute's value if the character reference was consumed as part of an attribute, or emit the code point as a character token otherwise.
Before each step of the tokenizer, the user agent must first check the parser pause flag. If it is true, then the tokenizer must abort the processing of any nested invocations of the tokenizer, yielding control back to the caller.
The tokenizer state machine consists of the states defined in the following subsections.
Consume the next input character:
Consume the next input character:
Consume the next input character:
Consume the next input character:
Consume the next input character:
Consume the next input character:
Consume the next input character:
Consume the next input character:
Consume the next input character:
Consume the next input character:
Consume the next input character:
Consume the next input character:
Consume the next input character:
Consume the next input character:
Consume the next input character:
Consume the next input character:
Consume the next input character:
Consume the next input character:
Consume the next input character:
Consume the next input character:
Consume the next input character:
Consume the next input character:
Consume the next input character:
Consume the next input character:
Consume the next input character:
Consume the next input character:
script", then switch to the script data double escaped state.
   Otherwise, switch to the script data escaped state. Emit the current input
   character as a character token.Consume the next input character:
Consume the next input character:
Consume the next input character:
Consume the next input character:
Consume the next input character:
script", then switch to the script data escaped state. Otherwise,
   switch to the script data double escaped state. Emit the current input
   character as a character token.Consume the next input character:
Consume the next input character:
When the user agent leaves the attribute name state (and before emitting the tag token, if appropriate), the complete attribute's name must be compared to the other attributes on the same token; if there is already an attribute on the token with the exact same name, then this is a duplicate-attribute parse error and the new attribute must be removed from the token.
If an attribute is so removed from a token, it, and the value that gets associated with it, if any, are never subsequently used by the parser, and are therefore effectively discarded. Removing the attribute in this way does not change its status as the "current attribute" for the purposes of the tokenizer, however.
Consume the next input character:
Consume the next input character:
Consume the next input character:
Consume the next input character:
Consume the next input character:
Consume the next input character:
Consume the next input character:
Consume the next input character:
If the next few characters are:
Consume the next input character:
Consume the next input character:
Consume the next input character:
Consume the next input character:
Consume the next input character:
Consume the next input character:
Consume the next input character:
Consume the next input character:
Consume the next input character:
Consume the next input character:
Consume the next input character:
Consume the next input character:
Consume the next input character:
Consume the next input character:
If the six characters starting from the current input character are an ASCII case-insensitive match for the word "PUBLIC", then consume those characters and switch to the after DOCTYPE public keyword state.
Otherwise, if the six characters starting from the current input character are an ASCII case-insensitive match for the word "SYSTEM", then consume those characters and switch to the after DOCTYPE system keyword state.
Otherwise, this is an invalid-character-sequence-after-doctype-name parse error. Set the current DOCTYPE token's force-quirks flag to on. Reconsume in the bogus DOCTYPE state.
Consume the next input character:
Consume the next input character:
Consume the next input character:
Consume the next input character:
Consume the next input character:
Consume the next input character:
Consume the next input character:
Consume the next input character:
Consume the next input character:
Consume the next input character:
Consume the next input character:
Consume the next input character:
Consume the next input character:
U+0000 NULL characters are handled in the tree construction stage, as part of the in foreign content insertion mode, which is the only place where CDATA sections can appear.
Consume the next input character:
Consume the next input character:
Set the temporary buffer to the empty string. Append a U+0026 AMPERSAND (&) character to the temporary buffer. Consume the next input character:
Consume the maximum number of characters possible, where the consumed characters are one of the identifiers in the first column of the named character references table. Append each character to the temporary buffer when it's consumed.
If the character reference was consumed as part of an attribute, and the last character matched is not a U+003B SEMICOLON character (;), and the next input character is either a U+003D EQUALS SIGN character (=) or an ASCII alphanumeric, then, for historical reasons, flush code points consumed as a character reference and switch to the return state.
Otherwise:
If the last character matched is not a U+003B SEMICOLON character (;), then this is a missing-semicolon-after-character-reference parse error.
Set the temporary buffer to the empty string. Append one or two characters corresponding to the character reference name (as given by the second column of the named character references table) to the temporary buffer.
If the markup contains (not in an attribute) the string I'm ¬it; I
    tell you, the character reference is parsed as "not", as in, I'm ¬it;
    I tell you (and this is a parse error). But if the markup was I'm
    ∉ I tell you, the character reference would be parsed as "notin;", resulting
    in I'm ∉ I tell you (and no parse error).
However, if the markup contains the string I'm ¬it; I tell you
    in an attribute, no character reference is parsed and string remains intact (and there is no
    parse error).
Consume the next input character:
Set the character reference code to zero (0).
Consume the next input character:
Consume the next input character:
Consume the next input character:
Consume the next input character:
Consume the next input character:
Check the character reference code:
If the number is 0x00, then this is a null-character-reference parse error. Set the character reference code to 0xFFFD.
If the number is greater than 0x10FFFF, then this is a character-reference-outside-unicode-range parse error. Set the character reference code to 0xFFFD.
If the number is a surrogate, then this is a surrogate-character-reference parse error. Set the character reference code to 0xFFFD.
If the number is a noncharacter, then this is a noncharacter-character-reference parse error.
If the number is 0x0D, or a control that's not ASCII whitespace, then this is a control-character-reference parse error. If the number is one of the numbers in the first column of the following table, then find the row with that number in the first column, and set the character reference code to the number in the second column of that row.
| Number | Code point | |
|---|---|---|
| 0x80 | 0x20AC | EURO SIGN (€) | 
| 0x82 | 0x201A | SINGLE LOW-9 QUOTATION MARK (‚) | 
| 0x83 | 0x0192 | LATIN SMALL LETTER F WITH HOOK (ƒ) | 
| 0x84 | 0x201E | DOUBLE LOW-9 QUOTATION MARK („) | 
| 0x85 | 0x2026 | HORIZONTAL ELLIPSIS (…) | 
| 0x86 | 0x2020 | DAGGER (†) | 
| 0x87 | 0x2021 | DOUBLE DAGGER (‡) | 
| 0x88 | 0x02C6 | MODIFIER LETTER CIRCUMFLEX ACCENT (ˆ) | 
| 0x89 | 0x2030 | PER MILLE SIGN (‰) | 
| 0x8A | 0x0160 | LATIN CAPITAL LETTER S WITH CARON (Š) | 
| 0x8B | 0x2039 | SINGLE LEFT-POINTING ANGLE QUOTATION MARK (‹) | 
| 0x8C | 0x0152 | LATIN CAPITAL LIGATURE OE (Œ) | 
| 0x8E | 0x017D | LATIN CAPITAL LETTER Z WITH CARON (Ž) | 
| 0x91 | 0x2018 | LEFT SINGLE QUOTATION MARK (‘) | 
| 0x92 | 0x2019 | RIGHT SINGLE QUOTATION MARK (’) | 
| 0x93 | 0x201C | LEFT DOUBLE QUOTATION MARK (“) | 
| 0x94 | 0x201D | RIGHT DOUBLE QUOTATION MARK (”) | 
| 0x95 | 0x2022 | BULLET (•) | 
| 0x96 | 0x2013 | EN DASH (–) | 
| 0x97 | 0x2014 | EM DASH (—) | 
| 0x98 | 0x02DC | SMALL TILDE (˜) | 
| 0x99 | 0x2122 | TRADE MARK SIGN (™) | 
| 0x9A | 0x0161 | LATIN SMALL LETTER S WITH CARON (š) | 
| 0x9B | 0x203A | SINGLE RIGHT-POINTING ANGLE QUOTATION MARK (›) | 
| 0x9C | 0x0153 | LATIN SMALL LIGATURE OE (œ) | 
| 0x9E | 0x017E | LATIN SMALL LETTER Z WITH CARON (ž) | 
| 0x9F | 0x0178 | LATIN CAPITAL LETTER Y WITH DIAERESIS (Ÿ) | 
Set the temporary buffer to the empty string. Append a code point equal to the character reference code to the temporary buffer. Flush code points consumed as a character reference. Switch to the return state.
The input to the tree construction stage is a sequence of tokens from the
  tokenization stage. The tree construction stage is associated with a DOM
  Document object when a parser is created. The "output" of this stage consists of
  dynamically modifying or extending that document's DOM tree.
This specification does not define when an interactive user agent has to render the
  Document so that it is available to the user, or when it has to begin accepting user
  input.
As each token is emitted from the tokenizer, the user agent must follow the appropriate steps from the following list, known as the tree construction dispatcher:
annotation-xml element and the token is a start tag whose tag name is "svg"The next token is the token that is about to be processed by the tree construction dispatcher (even if the token is subsequently just ignored).
A node is a MathML text integration point if it is one of the following elements:
mi elementmo elementmn elementms elementmtext elementA node is an HTML integration point if it is one of the following elements:
annotation-xml element whose start tag token had an
   attribute with the name "encoding" whose value was an ASCII case-insensitive match
   for the string "text/html"annotation-xml element whose start tag token had an
   attribute with the name "encoding" whose value was an ASCII case-insensitive match
   for the string "application/xhtml+xml"foreignObject elementdesc elementtitle elementIf the node in question is the context element passed to the HTML fragment parsing algorithm, then the start tag token for that element is the "fake" token created during by that HTML fragment parsing algorithm.
Not all of the tag names mentioned below are conformant tag names in this specification; many are included to handle legacy content. They still form part of the algorithm that implementations are required to implement to claim conformance.
The algorithm described below places no limit on the depth of the DOM tree
  generated, or on the length of tag names, attribute names, attribute values, Text
  nodes, etc. While implementers are encouraged to avoid arbitrary limits, it is
  recognized that practical concerns will likely force user agents to impose nesting depth
  constraints.
While the parser is processing a token, it can enable or disable foster parenting. This affects the following algorithm.
The appropriate place for inserting a node, optionally using a particular override target, is the position in an element returned by running the following steps:
If there was an override target specified, then let target be the override target.
Otherwise, let target be the current node.
Determine the adjusted insertion location using the first matching steps from the following list:
table, tbody, tfoot,
     thead, or tr elementFoster parenting happens when content is misnested in tables.
Run these substeps:
Let last template be the last template element in the
       stack of open elements, if any.
Let last table be the last table element in the
       stack of open elements, if any.
If there is a last template and either there is no last table, or there is one, but last template is lower (more recently added) than last table in the stack of open elements, then: let adjusted insertion location be inside last template's template contents, after its last child (if any), and abort these steps.
If there is no last table,  then let adjusted insertion
       location be inside the first element in the stack of open elements (the
       html element), after its last child (if any), and abort these steps.
       (fragment case)
If last table has a parent node, then let adjusted insertion location be inside last table's parent node, immediately before last table, and abort these steps.
Let previous element be the element immediately above last table in the stack of open elements.
Let adjusted insertion location be inside previous element, after its last child (if any).
These steps are involved in part because it's possible for elements, the
      table element in this case in particular, to have been moved by a script around
      in the DOM, or indeed removed from the DOM entirely, after the element was inserted by the
      parser.
Let adjusted insertion location be inside target, after its last child (if any).
If the adjusted insertion location is inside a template
    element, let it instead be inside the template element's template
    contents, after its last child (if any).
Return the adjusted insertion location.
To create an element for a token, given a
  token token, a string namespace, and a Node object
  intendedParent:
If the active speculative HTML parser is not null, then return the result of creating a speculative mock element given namespace, token's tag name, and token's attributes.
Otherwise, optionally create a speculative mock element given namespace, token's tag name, and token's attributes.
The result is not used. This step allows for a speculative fetch to be initiated from non-speculative parsing. The fetch is still speculative at this point, because, for example, by the time the element is inserted, intended parent might have been removed from the document.
Let document be intendedParent's node document.
Let localName be token's tag name.
Let is be the value of the "is" attribute in
   token, if such an attribute exists; otherwise null.
Let registry be the result of looking up a custom element registry given intendedParent.
Let definition be the result of looking up a custom element definition given registry, namespace, localName, and is.
Let willExecuteScript be true if definition is non-null and the parser was not created as part of the HTML fragment parsing algorithm; otherwise false.
If willExecuteScript is true:
Increment document's throw-on-dynamic-markup-insertion counter.
If the JavaScript execution context stack is empty, then perform a microtask checkpoint.
Push a new element queue onto document's relevant agent's custom element reactions stack.
Let element be the result of creating an element given document, localName, namespace, null, is, willExecuteScript, and registry.
This will cause custom element constructors to run, if willExecuteScript is true. However, since we incremented the throw-on-dynamic-markup-insertion counter, this cannot cause new characters to be inserted into the tokenizer, or the document to be blown away.
Append each attribute in the given token to element.
This can enqueue a custom element callback reaction for the
    attributeChangedCallback, which might run immediately (in the next
    step).
Even though the is attribute governs the creation of a customized built-in element, it is
    not present during the execution of the relevant custom element constructor; it is
    appended in this step, along with all other attributes.
If willExecuteScript is true:
Let queue be the result of popping from document's relevant agent's custom element reactions stack. (This will be the same element queue as was pushed above.)
Invoke custom element reactions in queue.
Decrement document's throw-on-dynamic-markup-insertion counter.
If element has an xmlns attribute in the XMLNS
   namespace whose value is not exactly the same as the element's namespace, that is a
   parse error. Similarly, if element has an xmlns:xlink attribute in the XMLNS namespace whose value is not the
   XLink Namespace, that is a parse error.
If element is a resettable element and not a form-associated custom element, then invoke its reset algorithm. (This initializes the element's value and checkedness based on the element's attributes.)
If element is a form-associated element and not a
   form-associated custom element, the
   form element pointer is not null, there is no
   template element on the stack of open elements, element is
   either not listed or doesn't have a form attribute, and the intendedParent is in the same
   tree as the element pointed to by the form element
   pointer, then associate element
   with the form element pointed to by the form element
   pointer and set element's parser inserted flag.
Return element.
To insert an element at the adjusted insertion location with an element element:
Let the adjusted insertion location be the appropriate place for inserting a node.
If it is not possible to insert element at the adjusted insertion location, abort these steps.
If the parser was not created as part of the HTML fragment parsing algorithm, then push a new element queue onto element's relevant agent's custom element reactions stack.
Insert element at the adjusted insertion location.
If the parser was not created as part of the HTML fragment parsing algorithm, then pop the element queue from element's relevant agent's custom element reactions stack, and invoke custom element reactions in that queue.
If the adjusted insertion location cannot accept more elements, e.g.,
  because it's a Document that already has an element child, then element is
  dropped on the floor.
To insert a foreign element, given a token token, a string namespace, and a boolean onlyAddToElementStack:
Let the adjustedInsertionLocation be the appropriate place for inserting a node.
Let element be the result of creating an element for the token given token, namespace, and the element in which the adjustedInsertionLocation finds itself.
If onlyAddToElementStack is false, then run insert an element at the adjusted insertion location with element.
Push element onto the stack of open elements so that it is the new current node.
Return element.
To insert an HTML element given a token token: insert a foreign element given token, the HTML namespace, and false.
When the steps below require the user agent to adjust MathML attributes for a token,
  then, if the token has an attribute named definitionurl, change its name to
  definitionURL (note the case difference).
When the steps below require the user agent to adjust SVG attributes for a token, then, for each attribute on the token whose attribute name is one of the ones in the first column of the following table, change the attribute's name to the name given in the corresponding cell in the second column. (This fixes the case of SVG attributes that are not all lowercase.)
| Attribute name on token | Attribute name on element | 
|---|---|
| attributename | attributeName | 
| attributetype | attributeType | 
| basefrequency | baseFrequency | 
| baseprofile | baseProfile | 
| calcmode | calcMode | 
| clippathunits | clipPathUnits | 
| diffuseconstant | diffuseConstant | 
| edgemode | edgeMode | 
| filterunits | filterUnits | 
| glyphref | glyphRef | 
| gradienttransform | gradientTransform | 
| gradientunits | gradientUnits | 
| kernelmatrix | kernelMatrix | 
| kernelunitlength | kernelUnitLength | 
| keypoints | keyPoints | 
| keysplines | keySplines | 
| keytimes | keyTimes | 
| lengthadjust | lengthAdjust | 
| limitingconeangle | limitingConeAngle | 
| markerheight | markerHeight | 
| markerunits | markerUnits | 
| markerwidth | markerWidth | 
| maskcontentunits | maskContentUnits | 
| maskunits | maskUnits | 
| numoctaves | numOctaves | 
| pathlength | pathLength | 
| patterncontentunits | patternContentUnits | 
| patterntransform | patternTransform | 
| patternunits | patternUnits | 
| pointsatx | pointsAtX | 
| pointsaty | pointsAtY | 
| pointsatz | pointsAtZ | 
| preservealpha | preserveAlpha | 
| preserveaspectratio | preserveAspectRatio | 
| primitiveunits | primitiveUnits | 
| refx | refX | 
| refy | refY | 
| repeatcount | repeatCount | 
| repeatdur | repeatDur | 
| requiredextensions | requiredExtensions | 
| requiredfeatures | requiredFeatures | 
| specularconstant | specularConstant | 
| specularexponent | specularExponent | 
| spreadmethod | spreadMethod | 
| startoffset | startOffset | 
| stddeviation | stdDeviation | 
| stitchtiles | stitchTiles | 
| surfacescale | surfaceScale | 
| systemlanguage | systemLanguage | 
| tablevalues | tableValues | 
| targetx | targetX | 
| targety | targetY | 
| textlength | textLength | 
| viewbox | viewBox | 
| viewtarget | viewTarget | 
| xchannelselector | xChannelSelector | 
| ychannelselector | yChannelSelector | 
| zoomandpan | zoomAndPan | 
When the steps below require the user agent to adjust foreign attributes for a
  token, then, if any of the attributes on the token match the strings given in the first column of
  the following table, let the attribute be a namespaced attribute, with the prefix being the string
  given in the corresponding cell in the second column, the local name being the string given in the
  corresponding cell in the third column, and the namespace being the namespace given in the
  corresponding cell in the fourth column. (This fixes the use of namespaced attributes, in
  particular lang attributes in the XML
  namespace.)
| Attribute name | Prefix | Local name | Namespace | 
|---|---|---|---|
| xlink:actuate | xlink | actuate | XLink namespace | 
| xlink:arcrole | xlink | arcrole | XLink namespace | 
| xlink:href | xlink | href | XLink namespace | 
| xlink:role | xlink | role | XLink namespace | 
| xlink:show | xlink | show | XLink namespace | 
| xlink:title | xlink | title | XLink namespace | 
| xlink:type | xlink | type | XLink namespace | 
| xml:lang | xml | lang | XML namespace | 
| xml:space | xml | space | XML namespace | 
| xmlns | (none) | xmlns | XMLNS namespace | 
| xmlns:xlink | xmlns | xlink | XMLNS namespace | 
When the steps below require the user agent to insert a character while processing a token, the user agent must run the following steps:
Let data be the characters passed to the algorithm, or, if no characters were explicitly specified, the character of the character token being processed.
Let the adjusted insertion location be the appropriate place for inserting a node.
If the adjusted insertion location is in a Document node,
    then return.
    
The DOM will not let Document nodes have Text node
    children, so they are dropped on the floor.
If there is a Text node immediately before the adjusted insertion
    location, then append data to that Text node's data.
Otherwise, create a new Text node whose data is data and
    whose node document is the same as that of the
    element in which the adjusted insertion location finds itself, and insert
    the newly created node at the adjusted insertion location.
Here are some sample inputs to the parser and the corresponding number of Text
   nodes that they result in, assuming a user agent that executes scripts.
| Input | Number of Textnodes | 
|---|---|
|  | One Textnode in the document, containing "AB". | 
|  | Three Textnodes; "A" before the script, the script's contents, and "BC" after the script (the parser appends to theTextnode created by the script). | 
|  | Two adjacent Textnodes in the document, containing "A" and "BC". | 
|  | One Textnode before the table, containing "ABCD". (This is caused by foster parenting.) | 
|  | One Textnode before the table, containing "A B C" (A-space-B-space-C). (This is caused by foster parenting.) | 
|  | One Textnode before the table, containing "A BC" (A-space-B-C), and oneTextnode inside the table (as a child of atbody) with a single space character. (Space characters separated from non-space characters by non-character tokens are not affected by foster parenting, even if those other tokens then get ignored.) | 
When the steps below require the user agent to insert a comment while processing a comment token, optionally with an explicit insertion position position, the user agent must run the following steps:
Let data be the data given in the comment token being processed.
If position was specified, then let the adjusted insertion location be position. Otherwise, let adjusted insertion location be the appropriate place for inserting a node.
Create a Comment node whose data attribute is set to
   data and whose node document is
   the same as that of the node in which the adjusted insertion location finds
   itself.
Insert the newly created node at the adjusted insertion location.
The generic raw text element parsing algorithm and the generic RCDATA element parsing algorithm consist of the following steps. These algorithms are always invoked in response to a start tag token.
Insert an HTML element for the token.
If the algorithm that was invoked is the generic raw text element parsing algorithm, switch the tokenizer to the RAWTEXT state; otherwise the algorithm invoked was the generic RCDATA element parsing algorithm, switch the tokenizer to the RCDATA state.
Set the original insertion mode to the current insertion mode.
Then, switch the insertion mode to "text".
When the steps below require the UA to generate implied end tags, then, while the
  current node is a dd element, a dt element, an
  li element, an optgroup element, an option element, a
  p element, an rb element, an rp element, an rt
  element, or an rtc element, the UA must pop the current node off the
  stack of open elements.
If a step requires the UA to generate implied end tags but lists an element to exclude from the process, then the UA must perform the above steps as if that element was not in the above list.
When the steps below require the UA to generate all implied end tags thoroughly,
  then, while the current node is a caption element, a
  colgroup element, a dd element, a dt element, an
  li element, an optgroup element, an option element, a
  p element, an rb element, an rp element, an rt
  element, an rtc element, a tbody element, a td element, a
  tfoot element, a th element, a thead element, or a
  tr element, the UA must pop the current node off the
  stack of open elements.
A Document object has an associated parser cannot change the mode flag
  (a boolean). It is initially false.
When the user agent is to apply the rules for the "initial" insertion mode, the user agent must handle the token as follows:
Ignore the token.
Insert a comment as the last child of the Document object.
If the DOCTYPE token's name is not "html", or the token's public
    identifier is not missing, or the token's system identifier is neither missing nor
    "about:legacy-compat", then there is a parse error.
Append a DocumentType node to the Document node, with its name set to the name given in the DOCTYPE token, or the
    empty string if the name was missing; its public
    ID set to the public identifier given in the DOCTYPE token, or the empty string if the
    public identifier was missing; and its system ID
    set to the system identifier given in the DOCTYPE token, or the empty string if the system
    identifier was missing.
This also ensures that the DocumentType node is returned as the
    value of the doctype attribute of the
    Document object.
Then, if the document is not an iframe
    srcdoc document, and the parser cannot
    change the mode flag is false, and the DOCTYPE token matches one of the conditions in the
    following list, then set the Document to quirks mode:
html". -//W3O//DTD W3 HTML Strict 3.0//EN//" -/W3C/DTD HTML 4.0 Transitional/EN" HTML" http://www.ibm.com/data/dtd/v11/ibmxhtml1-transitional.dtd" +//Silmaril//dtd html Pro v0r11 19970101//" -//AS//DTD HTML 3.0 asWedit + extensions//" -//AdvaSoft Ltd//DTD HTML 3.0 asWedit + extensions//" -//IETF//DTD HTML 2.0 Level 1//" -//IETF//DTD HTML 2.0 Level 2//" -//IETF//DTD HTML 2.0 Strict Level 1//" -//IETF//DTD HTML 2.0 Strict Level 2//" -//IETF//DTD HTML 2.0 Strict//" -//IETF//DTD HTML 2.0//" -//IETF//DTD HTML 2.1E//" -//IETF//DTD HTML 3.0//" -//IETF//DTD HTML 3.2 Final//" -//IETF//DTD HTML 3.2//" -//IETF//DTD HTML 3//" -//IETF//DTD HTML Level 0//" -//IETF//DTD HTML Level 1//" -//IETF//DTD HTML Level 2//" -//IETF//DTD HTML Level 3//" -//IETF//DTD HTML Strict Level 0//" -//IETF//DTD HTML Strict Level 1//" -//IETF//DTD HTML Strict Level 2//" -//IETF//DTD HTML Strict Level 3//" -//IETF//DTD HTML Strict//" -//IETF//DTD HTML//" -//Metrius//DTD Metrius Presentational//" -//Microsoft//DTD Internet Explorer 2.0 HTML Strict//" -//Microsoft//DTD Internet Explorer 2.0 HTML//" -//Microsoft//DTD Internet Explorer 2.0 Tables//" -//Microsoft//DTD Internet Explorer 3.0 HTML Strict//" -//Microsoft//DTD Internet Explorer 3.0 HTML//" -//Microsoft//DTD Internet Explorer 3.0 Tables//" -//Netscape Comm. Corp.//DTD HTML//" -//Netscape Comm. Corp.//DTD Strict HTML//" -//O'Reilly and Associates//DTD HTML 2.0//" -//O'Reilly and Associates//DTD HTML Extended 1.0//" -//O'Reilly and Associates//DTD HTML Extended Relaxed 1.0//" -//SQ//DTD HTML 2.0 HoTMetaL + extensions//" -//SoftQuad Software//DTD HoTMetaL PRO 6.0::19990601::extensions to HTML 4.0//" -//SoftQuad//DTD HoTMetaL PRO 4.0::19971010::extensions to HTML 4.0//" -//Spyglass//DTD HTML 2.0 Extended//" -//Sun Microsystems Corp.//DTD HotJava HTML//" -//Sun Microsystems Corp.//DTD HotJava Strict HTML//" -//W3C//DTD HTML 3 1995-03-24//" -//W3C//DTD HTML 3.2 Draft//" -//W3C//DTD HTML 3.2 Final//" -//W3C//DTD HTML 3.2//" -//W3C//DTD HTML 3.2S Draft//" -//W3C//DTD HTML 4.0 Frameset//" -//W3C//DTD HTML 4.0 Transitional//" -//W3C//DTD HTML Experimental 19960712//" -//W3C//DTD HTML Experimental 970421//" -//W3C//DTD W3 HTML//" -//W3O//DTD W3 HTML 3.0//" -//WebTechs//DTD Mozilla HTML 2.0//" -//WebTechs//DTD Mozilla HTML//" -//W3C//DTD HTML 4.01 Frameset//" -//W3C//DTD HTML 4.01 Transitional//" Otherwise, if the document is not an iframe srcdoc document, and the parser cannot change
    the mode flag is false, and the DOCTYPE token matches one of the conditions in the
    following list, then set the Document to limited-quirks mode:
-//W3C//DTD XHTML 1.0 Frameset//" -//W3C//DTD XHTML 1.0 Transitional//" -//W3C//DTD HTML 4.01 Frameset//" -//W3C//DTD HTML 4.01 Transitional//" The system identifier and public identifier strings must be compared to the values given in the lists above in an ASCII case-insensitive manner. A system identifier whose value is the empty string is not considered missing for the purposes of the conditions above.
Then, switch the insertion mode to "before html".
If the document is not an iframe srcdoc document, then this is a parse
    error; if the parser cannot change the mode flag is false, set the
    Document to quirks mode.
In any case, switch the insertion mode to "before html", then reprocess the token.
When the user agent is to apply the rules for the "before html" insertion mode, the user agent must handle the token as follows:
Parse error. Ignore the token.
Insert a comment as the last child of the Document object.
Ignore the token.
Create an element for the token in the HTML namespace, with the
    Document as the intended parent. Append it to the Document object. Put
    this element in the stack of open elements.
Switch the insertion mode to "before head".
Act as described in the "anything else" entry below.
Parse error. Ignore the token.
Create an html element whose node document is the Document object. Append
    it to the Document object. Put this element in the stack of open
    elements.
Switch the insertion mode to "before head", then reprocess the token.
The document element can end up being removed from the Document
  object, e.g. by scripts; nothing in particular happens in such cases, content continues being
  appended to the nodes as described in the next section.
When the user agent is to apply the rules for the "before head" insertion mode, the user agent must handle the token as follows:
Ignore the token.
Parse error. Ignore the token.
Process the token using the rules for the "in body" insertion mode.
Insert an HTML element for the token.
Set the head element pointer to the newly created
    head element.
Switch the insertion mode to "in head".
Act as described in the "anything else" entry below.
Parse error. Ignore the token.
Insert an HTML element for a "head" start tag token with no attributes.
Set the head element pointer to the newly created
    head element.
Switch the insertion mode to "in head".
Reprocess the current token.
When the user agent is to apply the rules for the "in head" insertion mode, the user agent must handle the token as follows:
Parse error. Ignore the token.
Process the token using the rules for the "in body" insertion mode.
Insert an HTML element for the token. Immediately pop the current node off the stack of open elements.
Acknowledge the token's self-closing flag, if it is set.
Insert an HTML element for the token. Immediately pop the current node off the stack of open elements.
Acknowledge the token's self-closing flag, if it is set.
If the active speculative HTML parser is null, then:
If the element has a charset attribute, and getting an encoding from
     its value results in an encoding, and the
     confidence is currently tentative,
     then change the encoding to the resulting encoding.
Otherwise, if the element has an http-equiv
     attribute whose value is an ASCII case-insensitive match for the string "Content-Type", and the element has a content attribute, and applying the algorithm for
     extracting a character encoding from a meta element to that attribute's
     value returns an encoding, and the
     confidence is currently tentative,
     then change the encoding to the extracted encoding.
The speculative HTML parser doesn't speculatively apply character encoding declarations in order to reduce implementation complexity.
Follow the generic RCDATA element parsing algorithm.
Follow the generic raw text element parsing algorithm.
Insert an HTML element for the token.
Switch the insertion mode to "in head noscript".
Run these steps:
Let the adjusted insertion location be the appropriate place for inserting a node.
Create an element for the token in the HTML namespace, with the intended parent being the element in which the adjusted insertion location finds itself.
Set the element's parser document to the Document, and set the
      element's force async to false.
This ensures that, if the script is external, any document.write() calls in the script will execute in-line,
      instead of blowing the document away, as would happen in most other cases. It also prevents
      the script from executing until the end tag is seen.
If the parser was created as part of the HTML fragment parsing algorithm,
     then set the script element's already started to true.
     (fragment case)
If the parser was invoked via the document.write() or document.writeln() methods, then optionally set the
     script element's already started to true. (For example, the user
     agent might use this clause to prevent execution of cross-origin
     scripts inserted via document.write() under slow
     network conditions, or when the page has already taken a long time to load.)
Insert the newly created element at the adjusted insertion location.
Push the element onto the stack of open elements so that it is the new current node.
Switch the tokenizer to the script data state.
Set the original insertion mode to the current insertion mode.
Switch the insertion mode to "text".
Pop the current node (which will be the head element) off the
    stack of open elements.
Switch the insertion mode to "after head".
Act as described in the "anything else" entry below.
Run these steps:
Let templateStartTag be the start tag.
Insert a marker at the end of the list of active formatting elements.
Set the frameset-ok flag to "not ok".
Switch the insertion mode to "in template".
Push "in template" onto the stack of template insertion modes so that it is the new current template insertion mode.
Let the adjustedInsertionLocation be the appropriate place for inserting a node.
Let intendedParent be the element in which the adjustedInsertionLocation finds itself.
Let document be intendedParent's node document.
If any of the following are false:
shadowrootmode is not in the None state;then insert an HTML element for the token.
Otherwise:
Let declarativeShadowHostElement be adjusted current node.
Let template be the result of insert a foreign element for templateStartTag, with HTML namespace and true.
Let mode be templateStartTag's shadowrootmode attribute's value.
Let clonable be true if templateStartTag has a shadowrootclonable attribute; otherwise
       false.
Let serializable be true if templateStartTag has a shadowrootserializable attribute;
       otherwise false.
Let delegatesFocus be true if templateStartTag has a shadowrootdelegatesfocus attribute;
       otherwise false.
If declarativeShadowHostElement is a shadow host, then insert an element at the adjusted insertion location with template.
Otherwise:
Let registry be declarativeShadowHostElement's custom element registry.
If templateStartTag has a shadowrootcustomelementregistry
         attribute, then set registry to null.
Attach a shadow root with
          declarativeShadowHostElement, mode, clonable,
          serializable, delegatesFocus, "named", and
          registry.
If an exception is thrown, then catch it and:
Insert an element at the adjusted insertion location with template.
The user agent may report an error to the developer console.
Return.
Let shadow be declarativeShadowHostElement's shadow root.
Set shadow's declarative to true.
Set template's template contents property to shadow.
Set shadow's available to element internals to true.
If templateStartTag has a shadowrootcustomelementregistry
         attribute, then set shadow's keep custom element registry null to
         true.
If there is no template element on the stack of open elements, then
    this is a parse error; ignore the token.
Otherwise, run these steps:
If the current node is not a template element, then this is a
     parse error.
Pop elements from the stack of open elements until a template
     element has been popped from the stack.
Pop the current template insertion mode off the stack of template insertion modes.
Parse error. Ignore the token.
Pop the current node (which will be the head element) off the
    stack of open elements.
Switch the insertion mode to "after head".
Reprocess the token.
When the user agent is to apply the rules for the "in head noscript" insertion mode, the user agent must handle the token as follows:
Parse error. Ignore the token.
Process the token using the rules for the "in body" insertion mode.
Pop the current node (which will be a noscript element) from the
    stack of open elements; the new current node will be a
    head element.
Switch the insertion mode to "in head".
Process the token using the rules for the "in head" insertion mode.
Act as described in the "anything else" entry below.
Parse error. Ignore the token.
Pop the current node (which will be a noscript element) from the
    stack of open elements; the new current node will be a
    head element.
Switch the insertion mode to "in head".
Reprocess the token.
When the user agent is to apply the rules for the "after head" insertion mode, the user agent must handle the token as follows:
Parse error. Ignore the token.
Process the token using the rules for the "in body" insertion mode.
Insert an HTML element for the token.
Set the frameset-ok flag to "not ok".
Switch the insertion mode to "in body".
Insert an HTML element for the token.
Switch the insertion mode to "in frameset".
Push the node pointed to by the head element pointer onto
    the stack of open elements.
Process the token using the rules for the "in head" insertion mode.
Remove the node pointed to by the head element pointer
    from the stack of open elements. (It might not be the current node at
    this point.)
The head element pointer cannot be null at
    this point.
Process the token using the rules for the "in head" insertion mode.
Act as described in the "anything else" entry below.
Parse error. Ignore the token.
Insert an HTML element for a "body" start tag token with no attributes.
Switch the insertion mode to "in body".
Reprocess the current token.
When the user agent is to apply the rules for the "in body" insertion mode, the user agent must handle the token as follows:
Parse error. Ignore the token.
Reconstruct the active formatting elements, if any.
Set the frameset-ok flag to "not ok".
Parse error. Ignore the token.
If there is a template element on the stack of open elements, then
    ignore the token.
Otherwise, for each attribute on the token, check to see if the attribute is already present on the top element of the stack of open elements. If it is not, add the attribute and its corresponding value to that element.
Process the token using the rules for the "in head" insertion mode.
If the stack of open elements has only one node on it, or if the second element
    on the stack of open elements is not a body element, or if there is a
    template element on the stack of open elements, then ignore the token.
    (fragment case or there is a template element on the stack)
Otherwise, set the frameset-ok flag to "not ok"; then, for each attribute on the
    token, check to see if the attribute is already present on the body element (the
    second element) on the stack of open elements, and if it is not, add the attribute
    and its corresponding value to that element.
If the stack of open elements has only one node on it, or if the second element
    on the stack of open elements is not a body element, then ignore the
    token. (fragment case or there is a template element on the
    stack)
If the frameset-ok flag is set to "not ok", ignore the token.
Otherwise, run the following steps:
Remove the second element on the stack of open elements from its parent node, if it has one.
Pop all the nodes from the bottom of the stack of open elements, from the
     current node up to, but not including, the root html element.
Insert an HTML element for the token.
Switch the insertion mode to "in frameset".
If the stack of template insertion modes is not empty, then process the token using the rules for the "in template" insertion mode.
Otherwise, follow these steps:
If there is a node in the stack of open elements that is not either a
     dd element, a dt element, an li element, an
     optgroup element, an option element, a p element, an
     rb element, an rp element, an rt element, an
     rtc element, a tbody element, a td element, a
     tfoot element, a th element, a thead element, a
     tr element, the body element, or the html element, then
     this is a parse error.
If the stack of open elements does not have a body element in scope, this is a parse error;
    ignore the token.
Otherwise, if there is a node in the stack of open elements that is not either a
    dd element, a dt element, an li element, an
    optgroup element, an option element, a p element, an
    rb element, an rp element, an rt element, an
    rtc element, a tbody element, a td element, a
    tfoot element, a th element, a thead element, a
    tr element, the body element, or the html element, then
    this is a parse error.
Switch the insertion mode to "after body".
If the stack of open elements does not have a body element in scope, this is a parse error;
    ignore the token.
Otherwise, if there is a node in the stack of open elements that is not either a
    dd element, a dt element, an li element, an
    optgroup element, an option element, a p element, an
    rb element, an rp element, an rt element, an
    rtc element, a tbody element, a td element, a
    tfoot element, a th element, a thead element, a
    tr element, the body element, or the html element, then
    this is a parse error.
Switch the insertion mode to "after body".
Reprocess the token.
If the stack of open elements has a
    p element in button scope, then close a p
    element.
Insert an HTML element for the token.
If the stack of open elements has a
    p element in button scope, then close a p
    element.
If the current node is an HTML element whose tag name is one of "h1", "h2", "h3", "h4", "h5", or "h6", then this is a parse error; pop the current node off the stack of open elements.
Insert an HTML element for the token.
If the stack of open elements has
    a p element in button scope, then close a p
    element.
Insert an HTML element for the token.
If the next token is a U+000A LINE FEED (LF) character token, then ignore that
    token and move on to the next one. (Newlines at the start of pre blocks are ignored
    as an authoring convenience.)
Set the frameset-ok flag to "not ok".
If the form element pointer is not null, and there is
    no template element on the stack of open elements, then this is a
    parse error; ignore the token.
Otherwise:
If the stack of open elements has
     a p element in button scope, then close a p
     element.
Insert an HTML element for the token, and, if there is no template
     element on the stack of open elements, set the form element pointer to point to the element created.
Run these steps:
Set the frameset-ok flag to "not ok".
Initialize node to be the current node (the bottommost node of the stack).
Loop: If node is an li element, then run these
      substeps:
Generate implied end tags, except for li elements.
If the current node is not an li element, then this is a
       parse error.
Pop elements from the stack of open elements until an li
       element has been popped from the stack.
Jump to the step labeled done below.
If node is in the special category, but is not an
     address, div, or p element, then jump to the step
     labeled done below.
Otherwise, set node to the previous entry in the stack of open elements and return to the step labeled loop.
Done: If the stack of open elements has a p element in button scope, then close a
     p element.
Finally, insert an HTML element for the token.
Run these steps:
Set the frameset-ok flag to "not ok".
Initialize node to be the current node (the bottommost node of the stack).
Loop: If node is a dd element, then run these
      substeps:
Generate implied end tags, except for dd elements.
If the current node is not a dd element, then this is a
       parse error.
Pop elements from the stack of open elements until a dd
       element has been popped from the stack.
Jump to the step labeled done below.
If node is a dt element, then run these substeps:
Generate implied end tags, except for dt elements.
If the current node is not a dt element, then this is a
       parse error.
Pop elements from the stack of open elements until a dt
       element has been popped from the stack.
Jump to the step labeled done below.
If node is in the special category, but is not an
     address, div, or p element, then jump to the step
     labeled done below.
Otherwise, set node to the previous entry in the stack of open elements and return to the step labeled loop.
Done: If the stack of open elements has a p element in button scope, then close a
     p element.
Finally, insert an HTML element for the token.
If the stack of open elements has a
    p element in button scope, then close a p
    element.
Insert an HTML element for the token.
Switch the tokenizer to the PLAINTEXT state.
Once a start tag with the tag name "plaintext" has been seen, all remaining
    tokens will be character tokens (and a final end-of-file token) because there is no way to
    switch the tokenizer out of the PLAINTEXT state. However, as the tree builder
    remains in its existing insertion mode, it might reconstruct the active formatting
    elements while processing those character tokens. This means that the parser can
    insert other elements into the plaintext element.
If the stack of open elements has a
      button element in scope, then run these substeps:
Pop elements from the stack of open elements until a button
       element has been popped from the stack.
Insert an HTML element for the token.
Set the frameset-ok flag to "not ok".
If the stack of open elements does not have an element in scope that is an HTML element with the same tag name as that of the token, then this is a parse error; ignore the token.
Otherwise, run these steps:
If the current node is not an HTML element with the same tag name as that of the token, then this is a parse error.
Pop elements from the stack of open elements until an HTML element with the same tag name as the token has been popped from the stack.
If there is no template element on the stack of open elements, then
    run these substeps:
Let node be the element that the form
     element pointer is set to, or null if it is not set to an element.
Set the form element pointer to null.
If node is null or if the stack of open elements does not have node in scope, then this is a parse error; return and ignore the token.
If the current node is not node, then this is a parse error.
Remove node from the stack of open elements.
If there is a template element on the stack of open
    elements, then run these substeps instead:
If the stack of open elements does not have a form element in scope, then this is a parse
     error; return and ignore the token.
If the current node is not a form element, then this is a
     parse error.
Pop elements from the stack of open elements until a form
     element has been popped from the stack.
If the stack of open elements does not have a p element in button scope, then this is a parse
    error; insert an HTML element for a "p" start tag token with no
    attributes.
If the stack of open elements does not have an li element in list item scope, then this is a parse
    error; ignore the token.
Otherwise, run these steps:
Generate implied end tags, except for li elements.
If the current node is not an li element, then this is a
     parse error.
Pop elements from the stack of open elements until an li
     element has been popped from the stack.
If the stack of open elements does not have an element in scope that is an HTML element with the same tag name as that of the token, then this is a parse error; ignore the token.
Otherwise, run these steps:
Generate implied end tags, except for HTML elements with the same tag name as the token.
If the current node is not an HTML element with the same tag name as that of the token, then this is a parse error.
Pop elements from the stack of open elements until an HTML element with the same tag name as the token has been popped from the stack.
If the stack of open elements does not have an element in scope that is an HTML element and whose tag name is one of "h1", "h2", "h3", "h4", "h5", or "h6", then this is a parse error; ignore the token.
Otherwise, run these steps:
If the current node is not an HTML element with the same tag name as that of the token, then this is a parse error.
Pop elements from the stack of open elements until an HTML element whose tag name is one of "h1", "h2", "h3", "h4", "h5", or "h6" has been popped from the stack.
Take a deep breath, then act as described in the "any other end tag" entry below.
If the list of active formatting elements contains an a element
    between the end of the list and the last marker on
    the list (or the start of the list if there is no marker on the list), then this is a parse
    error; run the adoption agency algorithm for the token, then remove that
    element from the list of active formatting elements and the stack of open
    elements if the adoption agency algorithm didn't already remove it (it might
    not have if the element is not in table
    scope).
In the non-conforming stream
    <a href="a">a<table><a href="b">b</table>x, the first
    a element would be closed upon seeing the second one, and the "x" character would
    be inside a link to "b", not to "a". This is despite the fact that the outer a
    element is not in table scope (meaning that a regular </a> end tag at the start
    of the table wouldn't close the outer a element). The result is that the two
    a elements are indirectly nested inside each other — non-conforming markup
    will often result in non-conforming DOMs when parsed.
Reconstruct the active formatting elements, if any.
Insert an HTML element for the token. Push onto the list of active formatting elements that element.
Reconstruct the active formatting elements, if any.
Insert an HTML element for the token. Push onto the list of active formatting elements that element.
Reconstruct the active formatting elements, if any.
If the stack of open elements has a
    nobr element in scope, then this is a parse error; run the
    adoption agency algorithm for the token, then once again reconstruct the
    active formatting elements, if any.
Insert an HTML element for the token. Push onto the list of active formatting elements that element.
Run the adoption agency algorithm for the token.
Reconstruct the active formatting elements, if any.
Insert an HTML element for the token.
Insert a marker at the end of the list of active formatting elements.
Set the frameset-ok flag to "not ok".
If the stack of open elements does not have an element in scope that is an HTML element with the same tag name as that of the token, then this is a parse error; ignore the token.
Otherwise, run these steps:
If the current node is not an HTML element with the same tag name as that of the token, then this is a parse error.
Pop elements from the stack of open elements until an HTML element with the same tag name as the token has been popped from the stack.
If the Document is not set to quirks mode, and the
    stack of open elements has a
    p element in button scope, then close a p
    element.
Insert an HTML element for the token.
Set the frameset-ok flag to "not ok".
Switch the insertion mode to "in table".
Parse error. Drop the attributes from the token, and act as described in the next entry; i.e. act as if this was a "br" start tag token with no attributes, rather than the end tag token that it actually is.
Reconstruct the active formatting elements, if any.
Insert an HTML element for the token. Immediately pop the current node off the stack of open elements.
Acknowledge the token's self-closing flag, if it is set.
Set the frameset-ok flag to "not ok".
If the parser was created as part of the HTML fragment parsing algorithm
    (fragment case) and the context
    element passed to that algorithm is a select element:
Ignore the token.
Return.
If the stack of open elements has a
    select element in scope:
Pop elements from the stack of open elements until a select
     element has been popped from the stack.
Reconstruct the active formatting elements, if any.
Insert an HTML element for the token. Immediately pop the current node off the stack of open elements.
Acknowledge the token's self-closing flag, if it is set.
If the token does not have an attribute with the name "type", or if it does, but that
    attribute's value is not an ASCII case-insensitive match for the string "hidden", then: set the frameset-ok flag to "not ok".
Insert an HTML element for the token. Immediately pop the current node off the stack of open elements.
Acknowledge the token's self-closing flag, if it is set.
If the stack of open elements has a
    p element in button scope, then close a p
    element.
If the stack of open elements has a
    select element in scope:
If the stack of open elements has an
     option element in scope or has an
     optgroup element in scope, then this is a parse
     error.
Insert an HTML element for the token. Immediately pop the current node off the stack of open elements.
Acknowledge the token's self-closing flag, if it is set.
Set the frameset-ok flag to "not ok".
Parse error. Change the token's tag name to "img" and reprocess it. (Don't ask.)
Run these steps:
Insert an HTML element for the token.
If the next token is a U+000A LINE FEED (LF) character token, then ignore
     that token and move on to the next one. (Newlines at the start of textarea
     elements are ignored as an authoring convenience.)
Switch the tokenizer to the RCDATA state.
Set the original insertion mode to the current insertion mode.
Set the frameset-ok flag to "not ok".
Switch the insertion mode to "text".
If the stack of open elements has a
    p element in button scope, then close a p
    element.
Reconstruct the active formatting elements, if any.
Set the frameset-ok flag to "not ok".
Follow the generic raw text element parsing algorithm.
Set the frameset-ok flag to "not ok".
Follow the generic raw text element parsing algorithm.
Follow the generic raw text element parsing algorithm.
If the parser was created as part of the HTML fragment parsing algorithm
    (fragment case) and the context
    element passed to that algorithm is a select element:
Ignore the token.
Otherwise, if the stack of open elements has a select element in scope:
Ignore the token.
Pop elements from the stack of open elements until a select
     element has been popped from the stack.
Otherwise:
Insert an HTML element for the token.
Set the frameset-ok flag to "not ok".
If the stack of open elements has a
    select element in scope:
Generate implied end tags except for optgroup
     elements.
If the stack of open elements has an
     option element in scope, then this is a parse error.
Otherwise:
If the current node is an option element, then pop the
     current node off the stack of open elements.
Reconstruct the active formatting elements, if any.
Insert an HTML element for the token.
If the stack of open elements has a
    select element in scope:
If the stack of open elements has an
     option element in scope or has an
     optgroup element in scope, then this is a parse
     error.
Otherwise, if the current node is an option element, then pop the
    current node from the stack of open elements.
Reconstruct the active formatting elements, if any.
Insert an HTML element for the token.
Let option be the first option element in the stack of open
    elements.
Run the steps for "any other end tag."
If option is no longer in the stack of open elements, then run maybe clone an option into selectedcontent given option.
If the stack of open elements has a
    ruby element in scope, then generate implied end tags. If the
    current node is not now a ruby element, this is a
    parse error.
Insert an HTML element for the token.
If the stack of open elements has a
    ruby element in scope, then generate implied end tags, except
    for rtc elements. If the current node is not now a rtc
    element or a ruby element, this is a parse error.
Insert an HTML element for the token.
Reconstruct the active formatting elements, if any.
Adjust MathML attributes for the token. (This fixes the case of MathML attributes that are not all lowercase.)
Adjust foreign attributes for the token. (This fixes the use of namespaced attributes, in particular XLink.)
Insert a foreign element for the token, with MathML namespace and false.
If the token has its self-closing flag set, pop the current node off the stack of open elements and acknowledge the token's self-closing flag.
Reconstruct the active formatting elements, if any.
Adjust SVG attributes for the token. (This fixes the case of SVG attributes that are not all lowercase.)
Adjust foreign attributes for the token. (This fixes the use of namespaced attributes, in particular XLink in SVG.)
Insert a foreign element for the token, with SVG namespace and false.
If the token has its self-closing flag set, pop the current node off the stack of open elements and acknowledge the token's self-closing flag.
Parse error. Ignore the token.
Reconstruct the active formatting elements, if any.
Insert an HTML element for the token.
This element will be an ordinary element. With one exception: if
    the scripting flag is disabled, it can also be a noscript element.
Run these steps:
Initialize node to be the current node (the bottommost node of the stack).
Loop: If node is an HTML element with the same tag name as the token, then:
Generate implied end tags, except for HTML elements with the same tag name as the token.
If node is not the current node, then this is a parse error.
Pop all the nodes from the current node up to node, including node, then stop these steps.
Otherwise, if node is in the special category, then this is a parse error; ignore the token, and return.
Set node to the previous entry in the stack of open elements.
Return to the step labeled loop.
When the steps above say the user agent is to close a p element, it
  means that the user agent must run the following steps:
Generate implied end tags, except for p elements.
If the current node is not a p element, then this is a
   parse error.
Pop elements from the stack of open elements until a p element
   has been popped from the stack.
The adoption agency algorithm, which takes as its only argument a token token for which the algorithm is being run, consists of the following steps:
Let subject be token's tag name.
If the current node is an HTML element whose tag name is subject, and the current node is not in the list of active formatting elements, then pop the current node off the stack of open elements and return.
Let outerLoopCounter be 0.
While true:
If outerLoopCounter is greater than or equal to 8, then return.
Increment outerLoopCounter by 1.
Let formattingElement be the last element in the list of active formatting elements that:
If there is no such element, then return and instead act as described in the "any other end tag" entry above.
If formattingElement is not in the stack of open elements, then this is a parse error; remove the element from the list, and return.
If formattingElement is in the stack of open elements, but the element is not in scope, then this is a parse error; return.
If formattingElement is not the current node, this is a parse error. (But do not return.)
Let furthestBlock be the topmost node in the stack of open elements that is lower in the stack than formattingElement, and is an element in the special category. There might not be one.
If there is no furthestBlock, then the UA must first pop all the nodes from the bottom of the stack of open elements, from the current node up to and including formattingElement, then remove formattingElement from the list of active formatting elements, and finally return.
Let commonAncestor be the element immediately above formattingElement in the stack of open elements.
Let a bookmark note the position of formattingElement in the list of active formatting elements relative to the elements on either side of it in the list.
Let node and lastNode be furthestBlock.
Let innerLoopCounter be 0.
While true:
Increment innerLoopCounter by 1.
Let node be the element immediately above node in the stack of open elements, or if node is no longer in the stack of open elements (e.g. because it got removed by this algorithm), the element that was immediately above node in the stack of open elements before node was removed.
If node is formattingElement, then break.
If innerLoopCounter is greater than 3 and node is in the list of active formatting elements, then remove node from the list of active formatting elements.
If node is not in the list of active formatting elements, then remove node from the stack of open elements and continue.
Create an element for the token for which the element node was created, in the HTML namespace, with commonAncestor as the intended parent; replace the entry for node in the list of active formatting elements with an entry for the new element, replace the entry for node in the stack of open elements with an entry for the new element, and let node be the new element.
If lastNode is furthestBlock, then move the aforementioned bookmark to be immediately after the new node in the list of active formatting elements.
Append lastNode to node.
Set lastNode to node.
Insert whatever lastNode ended up being in the previous step at the appropriate place for inserting a node, but using commonAncestor as the override target.
Create an element for the token for which formattingElement was created, in the HTML namespace, with furthestBlock as the intended parent.
Take all of the child nodes of furthestBlock and append them to the element created in the last step.
Append that new element to furthestBlock.
Remove formattingElement from the list of active formatting elements, and insert the new element into the list of active formatting elements at the position of the aforementioned bookmark.
Remove formattingElement from the stack of open elements, and insert the new element into the stack of open elements immediately below the position of furthestBlock in that stack.
This algorithm's name, the "adoption agency algorithm", comes from the way it causes elements to change parents, and is in contrast with other possible algorithms for dealing with misnested content.
When the user agent is to apply the rules for the "text" insertion mode, the user agent must handle the token as follows:
This can never be a U+0000 NULL character; the tokenizer converts those to U+FFFD REPLACEMENT CHARACTER characters.
If the current node is a script element, then set its already
    started to true.
Pop the current node off the stack of open elements.
Switch the insertion mode to the original insertion mode and reprocess the token.
If the active speculative HTML parser is null and the JavaScript execution context stack is empty, then perform a microtask checkpoint.
Let script be the current node (which will be a
    script element).
Pop the current node off the stack of open elements.
Switch the insertion mode to the original insertion mode.
Let the old insertion point have the same value as the current insertion point. Let the insertion point be just before the next input character.
Increment the parser's script nesting level by one.
If the active speculative HTML parser is null, then prepare the script element script. This might cause some script to execute, which might cause new characters to be inserted into the tokenizer, and might cause the tokenizer to output more tokens, resulting in a reentrant invocation of the parser.
Decrement the parser's script nesting level by one. If the parser's script nesting level is zero, then set the parser pause flag to false.
Let the insertion point have the value of the old insertion point. (In other words, restore the insertion point to its previous value. This value might be the "undefined" value.)
At this stage, if the pending parsing-blocking script is not null, then:
Set the parser pause flag to true, and abort the processing of any nested invocations of the tokenizer, yielding control back to the caller. (Tokenization will resume when the caller returns to the "outer" tree construction stage.)
The tree construction stage of this particular parser is being called reentrantly, say from a call to document.write().
While the pending parsing-blocking script is not null:
Let the script be the pending parsing-blocking script.
Set the pending parsing-blocking script to null.
Start the speculative HTML parser for this instance of the HTML parser.
Block the tokenizer for this instance of the HTML parser, such that the event loop will not run tasks that invoke the tokenizer.
If the parser's Document has a style sheet that is blocking
       scripts or the script's ready to be parser-executed is false:
       spin the event loop until the parser's Document has no style
       sheet that is blocking scripts and the script's ready to be
       parser-executed becomes true.
If this parser has been aborted in the meantime, return.
This could happen if, e.g., while the spin the event loop
        algorithm is running, the Document gets destroyed, or the document.open()
        method gets invoked on the Document.
Stop the speculative HTML parser for this instance of the HTML parser.
Unblock the tokenizer for this instance of the HTML parser, such that tasks that invoke the tokenizer can again be run.
Let the insertion point be just before the next input character.
Increment the parser's script nesting level by one (it should be zero before this step, so this sets it to one).
Execute the script element the script.
Decrement the parser's script nesting level by one. If the parser's script nesting level is zero (which it always should be at this point), then set the parser pause flag to false.
Let the insertion point be undefined again.
Pop the current node off the stack of open elements.
Switch the insertion mode to the original insertion mode.
When the user agent is to apply the rules for the "in table" insertion mode, the user agent must handle the token as follows:
table, tbody, template, tfoot, thead, or tr elementLet the pending table character tokens be an empty list of tokens.
Set the original insertion mode to the current insertion mode.
Switch the insertion mode to "in table text" and reprocess the token.
Parse error. Ignore the token.
Clear the stack back to a table context. (See below.)
Insert a marker at the end of the list of active formatting elements.
Insert an HTML element for the token, then switch the insertion mode to "in caption".
Clear the stack back to a table context. (See below.)
Insert an HTML element for the token, then switch the insertion mode to "in column group".
Clear the stack back to a table context. (See below.)
Insert an HTML element for a "colgroup" start tag token with no attributes, then switch the insertion mode to "in column group".
Reprocess the current token.
Clear the stack back to a table context. (See below.)
Insert an HTML element for the token, then switch the insertion mode to "in table body".
Clear the stack back to a table context. (See below.)
Insert an HTML element for a "tbody" start tag token with no attributes, then switch the insertion mode to "in table body".
Reprocess the current token.
If the stack of open elements does not have a table element in table scope, ignore the token.
Otherwise:
Pop elements from this stack until a table element has been popped from the
     stack.
Reprocess the token.
If the stack of open elements does not have a table element in table scope, this is a parse
    error; ignore the token.
Otherwise:
Pop elements from this stack until a table element has been popped from the
     stack.
Parse error. Ignore the token.
Process the token using the rules for the "in head" insertion mode.
If the token does not have an attribute with the name "type", or if it does, but that
    attribute's value is not an ASCII case-insensitive match for the string "hidden", then: act as described in the "anything else" entry below.
Otherwise:
Insert an HTML element for the token.
Pop that input element off the stack of open elements.
Acknowledge the token's self-closing flag, if it is set.
If there is a template element on the stack of open elements, or if
    the form element pointer is not null, ignore the
    token.
Otherwise:
Insert an HTML element for the token, and set the form element pointer to point to the element created.
Pop that form element off the stack of open elements.
Process the token using the rules for the "in body" insertion mode.
Parse error. Enable foster parenting, process the token using the rules for the "in body" insertion mode, and then disable foster parenting.
When the steps above require the UA to clear the stack back to a table context, it
  means that the UA must, while the current node is not a table,
  template, or html element, pop elements from the stack of open
  elements.
This is the same list of elements as used in the has an element in table scope steps.
The current node being an html element after this
  process is a fragment case.
When the user agent is to apply the rules for the "in table text" insertion mode, the user agent must handle the token as follows:
Parse error. Ignore the token.
Append the character token to the pending table character tokens list.
If any of the tokens in the pending table character tokens list are character tokens that are not ASCII whitespace, then this is a parse error: reprocess the character tokens in the pending table character tokens list using the rules given in the "anything else" entry in the "in table" insertion mode.
Otherwise, insert the characters given by the pending table character tokens list.
Switch the insertion mode to the original insertion mode and reprocess the token.
When the user agent is to apply the rules for the "in caption" insertion mode, the user agent must handle the token as follows:
If the stack of open elements does not have a caption element in table scope, this is a parse
    error; ignore the token. (fragment case)
Otherwise:
Now, if the current node is not a caption element, then this is a
     parse error.
Pop elements from this stack until a caption element has been popped from the
     stack.
Clear the list of active formatting elements up to the last marker.
Switch the insertion mode to "in table".
If the stack of open elements does not have a caption element in table scope, this is a parse
    error; ignore the token. (fragment case)
Otherwise:
Now, if the current node is not a caption element, then this is a
     parse error.
Pop elements from this stack until a caption element has been popped from the
     stack.
Clear the list of active formatting elements up to the last marker.
Switch the insertion mode to "in table".
Reprocess the token.
Parse error. Ignore the token.
Process the token using the rules for the "in body" insertion mode.
When the user agent is to apply the rules for the "in column group" insertion mode, the user agent must handle the token as follows:
Parse error. Ignore the token.
Process the token using the rules for the "in body" insertion mode.
Insert an HTML element for the token. Immediately pop the current node off the stack of open elements.
Acknowledge the token's self-closing flag, if it is set.
If the current node is not a colgroup element, then this is a
    parse error; ignore the token.
Otherwise, pop the current node from the stack of open elements. Switch the insertion mode to "in table".
Parse error. Ignore the token.
Process the token using the rules for the "in head" insertion mode.
Process the token using the rules for the "in body" insertion mode.
If the current node is not a colgroup element, then this is a
    parse error; ignore the token.
Otherwise, pop the current node from the stack of open elements.
Switch the insertion mode to "in table".
Reprocess the token.
When the user agent is to apply the rules for the "in table body" insertion mode, the user agent must handle the token as follows:
Clear the stack back to a table body context. (See below.)
Insert an HTML element for the token, then switch the insertion mode to "in row".
Clear the stack back to a table body context. (See below.)
Insert an HTML element for a "tr" start tag token with no attributes, then switch the insertion mode to "in row".
Reprocess the current token.
If the stack of open elements does not have an element in table scope that is an HTML element with the same tag name as the token, this is a parse error; ignore the token.
Otherwise:
Clear the stack back to a table body context. (See below.)
Pop the current node from the stack of open elements. Switch the insertion mode to "in table".
If the stack of open elements does not have a tbody, thead, or tfoot element in table
    scope, this is a parse error; ignore the token.
Otherwise:
Clear the stack back to a table body context. (See below.)
Pop the current node from the stack of open elements. Switch the insertion mode to "in table".
Reprocess the token.
Parse error. Ignore the token.
Process the token using the rules for the "in table" insertion mode.
When the steps above require the UA to clear the stack back to a table body context,
  it means that the UA must, while the current node is not a tbody,
  tfoot, thead, template, or html element, pop
  elements from the stack of open elements.
The current node being an html element after this
  process is a fragment case.
When the user agent is to apply the rules for the "in row" insertion mode, the user agent must handle the token as follows:
Clear the stack back to a table row context. (See below.)
Insert an HTML element for the token, then switch the insertion mode to "in cell".
Insert a marker at the end of the list of active formatting elements.
If the stack of open elements does not have a tr element in table scope, this is a parse error;
    ignore the token.
Otherwise:
Clear the stack back to a table row context. (See below.)
Pop the current node (which will be a tr element) from the
     stack of open elements. Switch the insertion mode to "in table body".
If the stack of open elements does not have a tr element in table scope, this is a parse error;
    ignore the token.
Otherwise:
Clear the stack back to a table row context. (See below.)
Pop the current node (which will be a tr element) from the
     stack of open elements. Switch the insertion mode to "in table body".
Reprocess the token.
If the stack of open elements does not have an element in table scope that is an HTML element with the same tag name as the token, this is a parse error; ignore the token.
If the stack of open elements does not have a tr element in table scope, ignore the token.
Otherwise:
Clear the stack back to a table row context. (See below.)
Pop the current node (which will be a tr element) from the
     stack of open elements. Switch the insertion mode to "in table body".
Reprocess the token.
Parse error. Ignore the token.
Process the token using the rules for the "in table" insertion mode.
When the steps above require the UA to clear the stack back to a table row context,
  it means that the UA must, while the current node is not a tr,
  template, or html element, pop elements from the stack of open
  elements.
The current node being an html element after this
  process is a fragment case.
When the user agent is to apply the rules for the "in cell" insertion mode, the user agent must handle the token as follows:
If the stack of open elements does not have an element in table scope that is an HTML element with the same tag name as that of the token, then this is a parse error; ignore the token.
Otherwise:
Now, if the current node is not an HTML element with the same tag name as the token, then this is a parse error.
Pop elements from the stack of open elements until an HTML element with the same tag name as the token has been popped from the stack.
Clear the list of active formatting elements up to the last marker.
Switch the insertion mode to "in row".
Assert: The stack of open elements has a td or th element in table scope.
Close the cell (see below) and reprocess the token.
Parse error. Ignore the token.
If the stack of open elements does not have an element in table scope that is an HTML element with the same tag name as that of the token, then this is a parse error; ignore the token.
Otherwise, close the cell (see below) and reprocess the token.
Process the token using the rules for the "in body" insertion mode.
Where the steps above say to close the cell, they mean to run the following algorithm:
If the current node is not now a td element or a th
   element, then this is a parse error.
Pop elements from the stack of open elements until a td
   element or a th element has been popped from the stack.
Clear the list of active formatting elements up to the last marker.
Switch the insertion mode to "in row".
The stack of open elements cannot have both a td and a
  th element in table scope at the
  same time, nor can it have neither when the close the cell algorithm is invoked.
When the user agent is to apply the rules for the "in template" insertion mode, the user agent must handle the token as follows:
Process the token using the rules for the "in body" insertion mode.
Process the token using the rules for the "in head" insertion mode.
Pop the current template insertion mode off the stack of template insertion modes.
Push "in table" onto the stack of template insertion modes so that it is the new current template insertion mode.
Switch the insertion mode to "in table", and reprocess the token.
Pop the current template insertion mode off the stack of template insertion modes.
Push "in column group" onto the stack of template insertion modes so that it is the new current template insertion mode.
Switch the insertion mode to "in column group", and reprocess the token.
Pop the current template insertion mode off the stack of template insertion modes.
Push "in table body" onto the stack of template insertion modes so that it is the new current template insertion mode.
Switch the insertion mode to "in table body", and reprocess the token.
Pop the current template insertion mode off the stack of template insertion modes.
Push "in row" onto the stack of template insertion modes so that it is the new current template insertion mode.
Switch the insertion mode to "in row", and reprocess the token.
Pop the current template insertion mode off the stack of template insertion modes.
Push "in body" onto the stack of template insertion modes so that it is the new current template insertion mode.
Switch the insertion mode to "in body", and reprocess the token.
Parse error. Ignore the token.
If there is no template element on the stack of open elements, then
    stop parsing. (fragment case)
Otherwise, this is a parse error.
Pop elements from the stack of open elements until a template
    element has been popped from the stack.
Clear the list of active formatting elements up to the last marker.
Pop the current template insertion mode off the stack of template insertion modes.
Reset the insertion mode appropriately.
Reprocess the token.
When the user agent is to apply the rules for the "after body" insertion mode, the user agent must handle the token as follows:
Process the token using the rules for the "in body" insertion mode.
Insert a comment as the last child of the first element in the stack of
    open elements (the html element).
Parse error. Ignore the token.
Process the token using the rules for the "in body" insertion mode.
If the parser was created as part of the HTML fragment parsing algorithm, this is a parse error; ignore the token. (fragment case)
Otherwise, switch the insertion mode to "after after body".
Parse error. Switch the insertion mode to "in body" and reprocess the token.
When the user agent is to apply the rules for the "in frameset" insertion mode, the user agent must handle the token as follows:
Parse error. Ignore the token.
Process the token using the rules for the "in body" insertion mode.
Insert an HTML element for the token.
If the current node is the root html element, then this is a
    parse error; ignore the token. (fragment case)
Otherwise, pop the current node from the stack of open elements.
If the parser was not created as part of the HTML fragment parsing algorithm
    (fragment case), and the current node is no longer a
    frameset element, then switch the insertion mode to "after frameset".
Insert an HTML element for the token. Immediately pop the current node off the stack of open elements.
Acknowledge the token's self-closing flag, if it is set.
Process the token using the rules for the "in head" insertion mode.
If the current node is not the root html element, then this is a
    parse error.
The current node can only be the root
    html element in the fragment case.
Parse error. Ignore the token.
When the user agent is to apply the rules for the "after frameset" insertion mode, the user agent must handle the token as follows:
Parse error. Ignore the token.
Process the token using the rules for the "in body" insertion mode.
Switch the insertion mode to "after after frameset".
Process the token using the rules for the "in head" insertion mode.
Parse error. Ignore the token.
When the user agent is to apply the rules for the "after after body" insertion mode, the user agent must handle the token as follows:
Insert a comment as the last child of the Document object.
Process the token using the rules for the "in body" insertion mode.
Parse error. Switch the insertion mode to "in body" and reprocess the token.
When the user agent is to apply the rules for the "after after frameset" insertion mode, the user agent must handle the token as follows:
Insert a comment as the last child of the Document object.
Process the token using the rules for the "in body" insertion mode.
Process the token using the rules for the "in head" insertion mode.
Parse error. Ignore the token.
When the user agent is to apply the rules for parsing tokens in foreign content, the user agent must handle the token as follows:
Parse error. Insert a U+FFFD REPLACEMENT CHARACTER character.
Set the frameset-ok flag to "not ok".
Parse error. Ignore the token.
While the current node is not a MathML text integration point, an HTML integration point, or an element in the HTML namespace, pop elements from the stack of open elements.
Reprocess the token according to the rules given in the section corresponding to the current insertion mode in HTML content.
If the adjusted current node is an element in the MathML namespace, adjust MathML attributes for the token. (This fixes the case of MathML attributes that are not all lowercase.)
If the adjusted current node is an element in the SVG namespace, and the token's tag name is one of the ones in the first column of the following table, change the tag name to the name given in the corresponding cell in the second column. (This fixes the case of SVG elements that are not all lowercase.)
| Tag name | Element name | 
|---|---|
| altglyph | altGlyph | 
| altglyphdef | altGlyphDef | 
| altglyphitem | altGlyphItem | 
| animatecolor | animateColor | 
| animatemotion | animateMotion | 
| animatetransform | animateTransform | 
| clippath | clipPath | 
| feblend | feBlend | 
| fecolormatrix | feColorMatrix | 
| fecomponenttransfer | feComponentTransfer | 
| fecomposite | feComposite | 
| feconvolvematrix | feConvolveMatrix | 
| fediffuselighting | feDiffuseLighting | 
| fedisplacementmap | feDisplacementMap | 
| fedistantlight | feDistantLight | 
| fedropshadow | feDropShadow | 
| feflood | feFlood | 
| fefunca | feFuncA | 
| fefuncb | feFuncB | 
| fefuncg | feFuncG | 
| fefuncr | feFuncR | 
| fegaussianblur | feGaussianBlur | 
| feimage | feImage | 
| femerge | feMerge | 
| femergenode | feMergeNode | 
| femorphology | feMorphology | 
| feoffset | feOffset | 
| fepointlight | fePointLight | 
| fespecularlighting | feSpecularLighting | 
| fespotlight | feSpotLight | 
| fetile | feTile | 
| feturbulence | feTurbulence | 
| foreignobject | foreignObject | 
| glyphref | glyphRef | 
| lineargradient | linearGradient | 
| radialgradient | radialGradient | 
| textpath | textPath | 
If the adjusted current node is an element in the SVG namespace, adjust SVG attributes for the token. (This fixes the case of SVG attributes that are not all lowercase.)
Adjust foreign attributes for the token. (This fixes the use of namespaced attributes, in particular XLink in SVG.)
Insert a foreign element for the token, with the adjusted current node's namespace and false.
If the token has its self-closing flag set, then run the appropriate steps from the following list:
Acknowledge the token's self-closing flag, and then act as described in the steps for a "script" end tag below.
Pop the current node off the stack of open elements and acknowledge the token's self-closing flag.
script elementPop the current node off the stack of open elements.
Let the old insertion point have the same value as the current insertion point. Let the insertion point be just before the next input character.
Increment the parser's script nesting level by one. Set the parser pause flag to true.
If the active speculative HTML parser is null and the user agent supports SVG,
    then Process the
    SVG script element according to the SVG rules. [SVG]
Even if this causes new characters to be inserted into the tokenizer, the parser will not be executed reentrantly, since the parser pause flag is true.
Decrement the parser's script nesting level by one. If the parser's script nesting level is zero, then set the parser pause flag to false.
Let the insertion point have the value of the old insertion point. (In other words, restore the insertion point to its previous value. This value might be the "undefined" value.)
Run these steps:
Initialize node to be the current node (the bottommost node of the stack).
If node's tag name, converted to ASCII lowercase, is not the same as the tag name of the token, then this is a parse error.
Loop: If node is the topmost element in the stack of open elements, then return. (fragment case)
If node's tag name, converted to ASCII lowercase, is the same as the tag name of the token, pop elements from the stack of open elements until node has been popped from the stack, and then return.
Set node to the previous entry in the stack of open elements.
If node is not an element in the HTML namespace, return to the step labeled loop.
Otherwise, process the token according to the rules given in the section corresponding to the current insertion mode in HTML content.
Document/DOMContentLoaded_event
Support in all current engines.
Support in all current engines.
Once the user agent stops parsing the document, the user agent must run the following steps:
If the active speculative HTML parser is not null, then stop the speculative HTML parser and return.
Set the insertion point to undefined.
Update the current document readiness to "interactive".
Pop all the nodes off the stack of open elements.
While the list of scripts that will execute when the document has finished parsing is not empty:
Spin the event loop until the first script in the list
     of scripts that will execute when the document has finished parsing has its ready
     to be parser-executed set to true and the parser's Document
     has no style sheet that is blocking scripts.
Execute the script element given by the first script in
     the list of scripts that will execute when the document has finished
     parsing.
Remove the first script element from the list of scripts that will
     execute when the document has finished parsing (i.e. shift out the first entry in the
     list).
Queue a global task on the DOM manipulation task source given the
    Document's relevant global object to run the following substeps:
Set the Document's load timing info's DOM content loaded
     event start time to the current high resolution time given the
     Document's relevant global object.
Fire an event named DOMContentLoaded at the Document
     object, with its bubbles attribute initialized to
     true.
Set the Document's load timing info's DOM content loaded
     event end time to the current high resolution time given the
     Document's relevant global object.
Enable the client message queue of the
     ServiceWorkerContainer object whose associated service worker client is the
     Document object's relevant settings object.
Invoke WebDriver BiDi DOM content loaded with the Document's
     browsing context, and a new WebDriver BiDi
     navigation status whose id is the
     Document object's during-loading
     navigation ID for WebDriver BiDi, status
     is "pending", and url is the Document object's URL.
Spin the event loop until the set of scripts that will execute as soon as possible and the list of scripts that will execute in order as soon as possible are empty.
Spin the event loop until there is nothing that delays the load event in the Document.
Queue a global task on the DOM manipulation task source given the
    Document's relevant global object to run the following steps:
Update the current document readiness to "complete".
If the Document object's browsing
     context is null, then abort these steps.
Let window be the Document's relevant global
     object.
Set the Document's load timing info's load event start
     time to the current high resolution time given window.
Fire an event named load at window, with legacy target override
     flag set.
Invoke WebDriver BiDi load complete with the Document's browsing context, and a new WebDriver BiDi
     navigation status whose id is the
     Document object's during-loading
     navigation ID for WebDriver BiDi, status
     is "complete", and url is the Document object's URL.
Set the Document object's during-loading navigation ID for WebDriver BiDi
     to null.
Set the Document's load timing info's load event end
     time to the current high resolution time given window.
Assert: Document's page showing is
     false.
Set the Document's page showing to true.
Fire a page transition event named pageshow at window with false.
If the Document's print when loaded flag is set, then run the
   printing steps.
The Document is now ready for post-load tasks.
When the user agent is to abort a parser, it must run the following steps:
Throw away any pending content in the input stream, and discard any future content that would have been added to it.
Stop the speculative HTML parser for this HTML parser.
Update the current document readiness to "interactive".
Pop all the nodes off the stack of open elements.
Update the current document readiness to "complete".
User agents may implement an optimization, as described in this section, to speculatively fetch resources that are declared in the HTML markup while the HTML parser is waiting for a pending parsing-blocking script to be fetched and executed, or during normal parsing, at the time an element is created for a token. While this optimization is not defined in precise detail, there are some rules to consider for interoperability.
Each HTML parser can have an active speculative HTML parser. It is initially null.
The speculative HTML parser must act like the normal HTML parser (e.g., the tree builder rules apply), with some exceptions:
The state of the normal HTML parser and the document itself must not be affected.
For example, the next input character or the stack of open elements for the normal HTML parser is not affected by the speculative HTML parser.
Bytes pushed into the HTML parser's input byte stream must also be pushed into the speculative HTML parser's input byte stream. Bytes read from the streams must be independent.
The result of the speculative parsing is primarily a series of speculative fetches. Which kinds of resources to speculatively fetch is implementation-defined, but user agents must not speculatively fetch resources that would not be fetched with the normal HTML parser, under the assumption that the script that is blocking the HTML parser does nothing.
It is possible that the same markup is seen multiple times from the speculative HTML parser and then the normal HTML parser. It is expected that duplicated fetches will be prevented by caching rules, which are not yet fully specified.
A speculative fetch for a speculative mock element element must follow these rules:
Should some of these things be applied to the document "for real", even though they are found speculatively?
If the speculative HTML parser encounters one of the following elements, then act as if that element is processed for the purpose of its effect of subsequent speculative fetches.
base element.meta element whose http-equiv
     attribute is in the Content
     security policy state.meta element whose name attribute is an
     ASCII case-insensitive match for "referrer".meta element whose name attribute is an
     ASCII case-insensitive match for "viewport". (This can
     affect whether a media query list matches the environment.) [CSSDEVICEADAPT]Let url be the URL that element would fetch if it was processed normally. If there is no such URL or if it is the empty string, then do nothing. Otherwise, if url is already in the list of speculative fetch URLs, then do nothing. Otherwise, fetch url as if the element was processed normally, and add url to the list of speculative fetch URLs.
Each Document has a list of speculative fetch URLs, which is a
  list of URLs, initially empty.
To start the speculative HTML parser for an instance of an HTML parser parser:
Optionally, return.
This step allows user agents to opt out of speculative HTML parsing.
If parser's active speculative HTML parser is not null, then stop the speculative HTML parser for parser.
This can happen when document.write()
    writes another parser-blocking script. For simplicity, this specification always restarts
    speculative parsing, but user agents can implement a more efficient strategy, so long as the end
    result is equivalent.
Let speculativeParser be a new speculative HTML parser, with the same state as parser.
Let speculativeDoc be a new isomorphic representation of parser's
   Document, where all elements are instead speculative mock elements. Let speculativeParser parse into
   speculativeDoc.
Set parser's active speculative HTML parser to speculativeParser.
In parallel, run speculativeParser until it is stopped or until it reaches the end of its input stream.
To stop the speculative HTML parser for an instance of an HTML parser parser:
Let speculativeParser be parser's active speculative HTML parser.
If speculativeParser is null, then return.
Throw away any pending content in speculativeParser's input stream, and discard any future content that would have been added to it.
Set parser's active speculative HTML parser to null.
The speculative HTML parser will create speculative mock elements instead of normal elements. DOM operations that the tree builder normally does on elements are expected to work appropriately on speculative mock elements.
A speculative mock element is a struct with the following items:
A string namespace, corresponding to an element's namespace.
A string local name, corresponding to an element's local name.
A list attribute list, corresponding to an element's attribute list.
To create a speculative mock element given a namespace, tagName, and attributes:
Let element be a new speculative mock element.
Set element's namespace to namespace.
Set element's local name to tagName.
Set element's attribute list to attributes.
Optionally, perform a speculative fetch for element.
Return element.
When the tree builder says to insert an element into a template element's
  template contents, if that is a speculative mock element, and the
  template element's template contents is not a
  ShadowRoot node, instead do nothing. URLs found speculatively inside
  non-declarative-shadow-root template elements might themselves be templates, and must
  not be speculatively fetched.
When an application uses an HTML parser in conjunction with an XML pipeline, it is
  possible that the constructed DOM is not compatible with the XML tool chain in certain subtle
  ways. For example, an XML toolchain might not be able to represent attributes with the name xmlns, since they conflict with the Namespaces in XML syntax. There is also some
  data that the HTML parser generates that isn't included in the DOM itself. This
  section specifies some rules for handling these issues.
If the XML API being used doesn't support DOCTYPEs, the tool may drop DOCTYPEs altogether.
If the XML API doesn't support attributes in no namespace that are named "xmlns", attributes whose names start with "xmlns:", or
  attributes in the XMLNS namespace, then the tool may drop such attributes.
The tool may annotate the output with any namespace declarations required for proper operation.
If the XML API being used restricts the allowable characters in the local names of elements and attributes, then the tool may map all element and attribute local names that the API wouldn't support to a set of names that are allowed, by replacing any character that isn't supported with the uppercase letter U and the six digits of the character's code point when expressed in hexadecimal, using digits 0-9 and capital letters A-F as the symbols, in increasing numeric order.
For example, the element name foo<bar, which can be
  output by the HTML parser, though it is neither a legal HTML element name nor a
  well-formed XML element name, would be converted into fooU00003Cbar, which
  is a well-formed XML element name (though it's still not legal in HTML by any means).
As another example, consider the attribute xlink:href.
  Used on a MathML element, it becomes, after being adjusted, an attribute with a prefix "xlink" and a local
  name "href". However, used on an HTML element, it becomes an attribute with
  no prefix and the local name "xlink:href", which is not a valid NCName, and
  thus might not be accepted by an XML API. It could thus get converted, becoming "xlinkU00003Ahref".
The resulting names from this conversion conveniently can't clash with any attribute generated by the HTML parser, since those are all either lowercase or those listed in the adjust foreign attributes algorithm's table.
If the XML API restricts comments from having two consecutive U+002D HYPHEN-MINUS characters (--), the tool may insert a single U+0020 SPACE character between any such offending characters.
If the XML API restricts comments from ending in a U+002D HYPHEN-MINUS character (-), the tool may insert a single U+0020 SPACE character at the end of such comments.
If the XML API restricts allowed characters in character data, attribute values, or comments, the tool may replace any U+000C FORM FEED (FF) character with a U+0020 SPACE character, and any other literal non-XML character with a U+FFFD REPLACEMENT CHARACTER.
If the tool has no way to convey out-of-band information, then the tool may drop the following information:
form
   element ancestor (use of the form element pointer in the parser)template elements.The mutations allowed by this section apply after the HTML
  parser's rules have been applied. For example, a <a::> start tag
  will be closed by a </a::> end tag, and never by a </aU00003AU00003A> end tag, even if the user agent is using the rules above to
  then generate an actual element in the DOM with the name aU00003AU00003A for
  that start tag.
This section is non-normative.
This section examines some erroneous markup and discusses how the HTML parser handles these cases.
This section is non-normative.
The most-often discussed example of erroneous markup is as follows:
<p>1<b>2<i>3</b>4</i>5</p>The parsing of this markup is straightforward up to the "3". At this point, the DOM looks like this:
Here, the stack of open elements has five elements on it: html,
  body, p, b, and i. The list of active
  formatting elements just has two: b and i. The insertion
  mode is "in body".
Upon receiving the end tag token with the tag name "b", the "adoption
  agency algorithm" is invoked. This is a simple case, in that the formattingElement
  is the b element, and there is no furthest block.
  Thus, the stack of open elements ends up with just three elements: html,
  body, and p, while the list of active formatting elements
  has just one: i. The DOM tree is unmodified at this point.
The next token is a character ("4"), triggers the reconstruction of the active formatting elements, in this case just
  the i element. A new i element is thus created for the "4"
  Text node. After the end tag token for the "i" is also received, and the "5"
  Text node is inserted, the DOM looks as follows:
This section is non-normative.
A case similar to the previous one is the following:
<b>1<p>2</b>3</p>Up to the "2" the parsing here is straightforward:
The interesting part is when the end tag token with the tag name "b" is parsed.
Before that token is seen, the stack of open elements has four elements on it:
  html, body, b, and p. The list of active
  formatting elements just has the one: b. The insertion mode is
  "in body".
Upon receiving the end tag token with the tag name "b", the "adoption
  agency algorithm" is invoked, as in the previous example. However, in this case, there
  is a furthest block, namely the p element. Thus, this
  time the adoption agency algorithm isn't skipped over.
The common ancestor is the body element. A conceptual
  "bookmark" marks the position of the b in the list of active formatting
  elements, but since that list has only one element in it, the bookmark won't have much
  effect.
As the algorithm progresses, node ends up set to the formatting element
  (b), and last node ends up set to the furthest
  block (p).
The last node gets appended (moved) to the common ancestor, so that the DOM looks like:
A new b element is created, and the children of the p element are
  moved to it:
Finally, the new b element is appended to the p element, so that the
  DOM looks like:
The b element is removed from the list of active formatting elements
  and the stack of open elements, so that when the "3" is parsed, it is appended to the
  p element:
This section is non-normative.
Error handling in tables is, for historical reasons, especially strange. For example, consider the following markup:
<table><b><tr><td>aaa</td></tr>bbb</table>cccThe highlighted b element start tag is not allowed directly inside a table like
  that, and the parser handles this case by placing the element before the table. (This is
  called foster parenting.) This can be seen by examining the DOM tree
  as it stands just after the table element's start tag has been seen:
...and then immediately after the b element start tag has been seen:
At this point, the stack of open elements has on it the elements
  html, body, table, and b (in that order,
  despite the resulting DOM tree); the list of active formatting elements just has the
  b element in it; and the insertion mode is "in table".
The tr start tag causes the b element to be popped off the stack and
  a tbody start tag to be implied; the tbody and tr elements
  are then handled in a rather straight-forward manner, taking the parser through the "in table body" and "in row" insertion modes, after which the DOM looks as follows:
Here, the stack of open elements has on it the elements html,
  body, table, tbody, and tr; the list of
  active formatting elements still has the b element in it; and the
  insertion mode is "in row".
The td element start tag token, after putting a td element on the
  tree, puts a marker on the list of active
  formatting elements (it also switches to the "in
  cell" insertion mode).
The marker means that when the "aaa" character
  tokens are seen, no b element is created to hold the resulting Text
  node:
The end tags are handled in a straight-forward manner; after handling them, the stack of
  open elements has on it the elements html, body,
  table, and tbody; the list of active formatting elements
  still has the b element in it (the marker
  having been removed by the "td" end tag token); and the insertion mode is "in table body".
Thus it is that the "bbb" character tokens are found. These trigger the "in table text" insertion mode to be used (with the original
  insertion mode set to "in table body").
  The character tokens are collected, and when the next token (the table element end
  tag) is seen, they are processed as a group. Since they are not all spaces, they are handled as
  per the "anything else" rules in the "in table"
  insertion mode, which defer to the "in body"
  insertion mode but with foster parenting.
When the active formatting elements
  are reconstructed, a b element is created and foster parented, and then the "bbb" Text node is appended to it:
The stack of open elements has on it the elements html,
  body, table, tbody, and the new b (again, note
  that this doesn't match the resulting tree!); the list of active formatting elements
  has the new b element in it; and the insertion mode is still "in table body".
Had the character tokens been only ASCII whitespace instead of "bbb", then that
  ASCII whitespace would just be appended to the tbody element.
Finally, the table is closed by a "table" end tag. This pops all the nodes from
  the stack of open elements up to and including the table element, but it
  doesn't affect the list of active formatting elements, so the "ccc" character tokens
  after the table result in yet another b element being created, this time after the
  table:
This section is non-normative.
Consider the following markup, which for this example we will assume is the document with
  URL https://example.com/inner, being rendered as the content of
  an iframe in another document with the URL https://example.com/outer:
<div id=a>
 <script>
  var div = document.getElementById('a');
  parent.document.body.appendChild(div);
 </script>
 <script>
  alert(document.URL);
 </script>
</div>
<script>
 alert(document.URL);
</script>Up to the first "script" end tag, before the script is parsed, the result is relatively straightforward:
After the script is parsed, though, the div element and its child
  script element are gone:
They are, at this point, in the Document of the aforementioned outer
  browsing context. However, the stack of open elements still contains
  the div element.
Thus, when the second script element is parsed, it is inserted into the outer
  Document object.
Those parsed into different Documents than the one the parser was created for do
  not execute, so the first alert does not show.
Once the div element's end tag is parsed, the div element is popped
  off the stack, and so the next script element is in the inner
  Document:
This script does execute, resulting in an alert that says "https://example.com/inner".
This section is non-normative.
Elaborating on the example in the previous section, consider the case where the second
  script element is an external script (i.e. one with a src attribute). Since the element was not in the parser's
  Document when it was created, that external script is not even downloaded.
In a case where a script element with a src
  attribute is parsed normally into its parser's Document, but while the external
  script is being downloaded, the element is moved to another document, the script continues to
  download, but does not execute.
In general, moving script elements between Documents is
  considered a bad practice.
This section is non-normative.
The following markup shows how nested formatting elements (such as b) get
  collected and continue to be applied even as the elements they are contained in are closed, but
  that excessive duplicates are thrown away.
<!DOCTYPE html>
<p><b class=x><b class=x><b><b class=x><b class=x><b>X
<p>X
<p><b><b class=x><b>X
<p></b></b></b></b></b></b>XThe resulting DOM tree is as follows:
htmlhtmlNote how the second p element in the markup has no explicit b
  elements, but in the resulting DOM, up to three of each kind of formatting element (in this case
  three b elements with the class attribute, and two unadorned b elements)
  get reconstructed before the element's "X".
Also note how this means that in the final paragraph only six b end tags are
  needed to completely clear the list of active formatting elements, even though nine
  b start tags have been seen up to this point.
For the purposes of the following algorithm, an element serializes as void if its
  element type is one of the void elements, or is basefont,
  bgsound, frame, keygen, or param.
The following steps form the HTML fragment serialization algorithm. The algorithm takes as input a DOM
  Element, Document, or DocumentFragment referred to as
  the node, a boolean serializableShadowRoots, and a
  sequence<ShadowRoot> shadowRoots, and returns a string.
This algorithm serializes the children of the node being serialized, not the node itself.
If the node serializes as void, then return the empty string.
Let s be a string, and initialize it to the empty string.
If the node is a template element, then let the node instead be the template element's template
   contents (a DocumentFragment node).
If current node is a shadow host, then:
Let shadow be current node's shadow root.
If one of the following is true:
serializableShadowRoots is true and shadow's serializable is true; or
shadowRoots contains shadow,
then:
Append "<template shadowrootmode="".
If shadow's mode
       is "open", then append "open".
       Otherwise, append "closed".
Append """.
If shadow's delegates focus is set, then append
       " shadowrootdelegatesfocus=""".
If shadow's serializable is set, then append
       " shadowrootserializable=""".
If shadow's clonable is set, then append
       " shadowrootclonable=""".
If current node's custom
       element registry is not shadow's custom element registry, then append
       " shadowrootcustomelementregistry=""".
Append ">".
Append the value of running the HTML fragment serialization algorithm with shadow, serializableShadowRoots, and shadowRoots (thus recursing into this algorithm for that element).
Append "</template>".
For each child node of the node, in tree order, run the following steps:
Let current node be the child node being processed.
Append the appropriate string from the following list to s:
ElementIf current node is an element in the HTML namespace, the MathML namespace, or the SVG namespace, then let tagname be current node's local name. Otherwise, let tagname be current node's qualified name.
Append a U+003C LESS-THAN SIGN character (<), followed by tagname.
For HTML elements created by the HTML parser or
        createElement(), tagname will be
        lowercase.
If current node's is value is not null, and the element does not have an is attribute in its attribute list, then append the string " is="", followed by current node's is value escaped as described below in attribute mode,
        followed by a U+0022 QUOTATION MARK character (").
For each attribute that the element has, append a U+0020 SPACE character, the attribute's serialized name as described below, a U+003D EQUALS SIGN character (=), a U+0022 QUOTATION MARK character ("), the attribute's value, escaped as described below in attribute mode, and a second U+0022 QUOTATION MARK character (").
An attribute's serialized name for the purposes of the previous paragraph must be determined as follows:
The attribute's serialized name is the attribute's local name.
For attributes on HTML elements set by the HTML
          parser or by setAttribute(), the
          local name will be lowercase.
The attribute's serialized name is the string "xml:" followed
         by the attribute's local name.
xmlnsThe attribute's serialized name is the string "xmlns".
xmlnsThe attribute's serialized name is the string "xmlns:"
         followed by the attribute's local name.
The attribute's serialized name is the string "xlink:"
         followed by the attribute's local name.
The attribute's serialized name is the attribute's qualified name.
While the exact order of attributes is implementation-defined, and may depend on factors such as the order that the attributes were given in the original markup, the sort order must be stable, such that consecutive invocations of this algorithm serialize an element's attributes in the same order.
Append a U+003E GREATER-THAN SIGN character (>).
If current node serializes as void, then continue on to the next child node at this point.
Append the value of running the HTML fragment serialization algorithm with current node, serializableShadowRoots, and shadowRoots (thus recursing into this algorithm for that node), followed by a U+003C LESS-THAN SIGN character (<), a U+002F SOLIDUS character (/), tagname again, and finally a U+003E GREATER-THAN SIGN character (>).
Text nodeIf the parent of current node is a style,
        script, xmp, iframe, noembed,
        noframes, or plaintext element, or if the parent of current node is a noscript element and scripting is enabled for the node, then append the value of
        current node's data literally.
Otherwise, append the value of current node's data, escaped as described below.
CommentAppend "<!--" (U+003C LESS-THAN SIGN, U+0021
        EXCLAMATION MARK, U+002D HYPHEN-MINUS, U+002D HYPHEN-MINUS), followed by the value of
        current node's data, followed by the
        literal string "-->" (U+002D HYPHEN-MINUS, U+002D HYPHEN-MINUS,
        U+003E GREATER-THAN SIGN).
ProcessingInstructionAppend "<?" (U+003C LESS-THAN SIGN, U+003F
        QUESTION MARK), followed by the value of current node's target IDL attribute, followed by a single U+0020 SPACE character, followed
        by the value of current node's data,
        followed by a single U+003E GREATER-THAN SIGN character (>).
DocumentTypeAppend "<!DOCTYPE" (U+003C LESS-THAN SIGN, U+0021
        EXCLAMATION MARK, U+0044 LATIN CAPITAL LETTER D, U+004F LATIN CAPITAL LETTER O, U+0043 LATIN
        CAPITAL LETTER C, U+0054 LATIN CAPITAL LETTER T, U+0059 LATIN CAPITAL LETTER Y, U+0050 LATIN
        CAPITAL LETTER P, U+0045 LATIN CAPITAL LETTER E), followed by a space (U+0020 SPACE),
        followed by the value of current node's name, followed by ">"
        (U+003E GREATER-THAN SIGN).
Return s.
It is possible that the output of this algorithm, if parsed with an HTML parser, will not return the original tree structure. Tree structures that do not roundtrip a serialize and reparse step can also be produced by the HTML parser itself, although such cases are typically non-conforming.
For instance, if a textarea element to which a Comment
   node has been appended is serialized and the output is then reparsed, the comment will end up
   being displayed in the text control. Similarly, if, as a result of DOM manipulation, an element
   contains a comment that contains "-->", then when the result of
   serializing the element is parsed, the comment will be truncated at that point and the rest of
   the comment will be interpreted as markup. More examples would be making a script
   element contain a Text node with the text string "</script>", or having a p element that contains a
   ul element (as the ul element's start
   tag would imply the end tag for the p).
This can enable cross-site scripting attacks. An example of this would be a page that lets the
   user enter some font family names that are then inserted into a CSS style block via
   the DOM and which then uses the innerHTML IDL
   attribute to get the HTML serialization of that style element: if the user enters
   "</style><script>attack</script>" as a font family name, innerHTML will return markup that, if parsed in a different
   context, would contain a script node, even though no script node
   existed in the original DOM.
For example, consider the following markup:
<form id="outer"><div></form><form id="inner"><input>This will be parsed into:
The input element will be associated with the inner form element.
   Now, if this tree structure is serialized and reparsed, the <form
   id="inner"> start tag will be ignored, and so the input element will be
   associated with the outer form element instead.
<html><head></head><body><form id="outer"><div><form id="inner"><input></form></div></form></body></html>As another example, consider the following markup:
<a><table><a>This will be parsed into:
That is, the a elements are nested, because the second a element is
   foster parented. After a serialize-reparse roundtrip, the
   a elements and the table element would all be siblings, because the
   second <a> start tag implicitly closes the first a
   element.
<html><head></head><body><a><a></a><table></table></a></body></html>For historical reasons, this algorithm does not round-trip an initial U+000A (LF) character in
  pre, textarea, or listing elements, even though (in the
  first two cases) the markup being round-tripped can be conforming. The HTML parser
  will drop such a character during parsing, but this algorithm does not serialize an extra
  U+000A (LF) character.
For example, consider the following markup:
<pre>
Hello.</pre>When this document is first parsed, the pre element's child text
   content starts with a single newline character. After a serialize-reparse roundtrip, the
   pre element's child text content is simply "Hello.".
Because of the special role of the is attribute in signaling the creation of customized built-in elements, in that it provides a mechanism for parsed
  HTML to set the element's is
  value, we special-case its handling during serialization. This ensures that an element's
  is value is preserved
  through serialize-parse roundtrips.
When creating a customized built-in element via the parser, a developer uses the
   is attribute directly; in such cases serialize-parse roundtrips
   work fine.
<script>
window.SuperP = class extends HTMLParagraphElement {};
customElements.define("super-p", SuperP, { extends: "p" });
</script>
<div id="container"><p is="super-p">Superb!</p></div>
<script>
console.log(container.innerHTML); // <p is="super-p">
container.innerHTML = container.innerHTML;
console.log(container.innerHTML); // <p is="super-p">
console.assert(container.firstChild instanceof SuperP);
</script>But when creating a customized built-in element via its constructor or via createElement(), the is
   attribute is not added. Instead, the is value (which is what the custom elements machinery uses) is set
   without intermediating through an attribute.
<script>
container.innerHTML = "";
const p = document.createElement("p", { is: "super-p" });
container.appendChild(p);
// The is attribute is not present in the DOM:
console.assert(!p.hasAttribute("is"));
// But the element is still a super-p:
console.assert(p instanceof SuperP);
</script>To ensure that serialize-parse roundtrips still work, the serialization process explicitly
   writes out the element's is
   value as an is attribute:
<script>
console.log(container.innerHTML); // <p is="super-p">
container.innerHTML = container.innerHTML;
console.log(container.innerHTML); // <p is="super-p">
console.assert(container.firstChild instanceof SuperP);
</script>Escaping a string (for the purposes of the algorithm above) consists of running the following steps:
Replace any occurrence of the "&" character by the string "&".
Replace any occurrences of the U+00A0 NO-BREAK SPACE character by the string " ".
Replace any occurrences of the "<" character by the string
   "<".
Replace any occurrences of the ">" character by the string
   ">".
If the algorithm was invoked in the attribute mode, then replace any occurrences of
   the """ character by the string """.
The HTML fragment parsing algorithm, given an Element node context, string input, and an
  optional boolean allowDeclarativeShadowRoots (default false) is the following steps.
  They return a list of zero or more nodes.
Parts marked fragment case in algorithms in the HTML parser section are parts that only occur if the parser was created for the purposes of this algorithm. The algorithms have been annotated with such markings for informational purposes only; such markings have no normative weight. If it is possible for a condition described as a fragment case to occur even when the parser wasn't created for the purposes of handling this algorithm, then that is an error in the specification.
If context's node document is
   in quirks mode, then set document's mode to "quirks".
Otherwise, if context's node
   document is in limited-quirks mode, then set document's mode to "limited-quirks".
If allowDeclarativeShadowRoots is true, then set document's allow declarative shadow roots to true.
Create a new HTML parser, and associate it with document.
Set the state of the HTML parser's tokenization stage as follows, switching on the context element:
titletextareastylexmpiframenoembednoframesscriptnoscriptplaintextFor performance reasons, an implementation that does not report errors and that uses the actual state machine described in this specification directly could use the PLAINTEXT state instead of the RAWTEXT and script data states where those are mentioned in the list above. Except for rules regarding parse errors, they are equivalent, since there is no appropriate end tag token in the fragment case, yet they involve far fewer state transitions.
Let root be the result of creating an
   element given document, "html", the HTML
   namespace, null, null, false, and context's custom element registry.
Append root to document.
Set up the HTML parser's stack of open elements so that it contains just the single element root.
If context is a template
   element, then push "in template" onto the
   stack of template insertion modes so that it is the new current template
   insertion mode.
Create a start tag token whose name is the local name of context and whose attributes are the attributes of context.
Let this start tag token be the start tag token of context; e.g. for the purposes of determining if it is an HTML integration point.
Reset the parser's insertion mode appropriately.
The parser will reference the context element as part of that algorithm.
Set the HTML parser's form element pointer to the
   nearest node to context that is a
   form element (going straight up the ancestor chain, and including the element
   itself, if it is a form element), if any. (If there is no such form
   element, the form element pointer keeps its initial value,
   null.)
Place the input into the input stream for the HTML parser just created. The encoding confidence is irrelevant.
Start the HTML parser and let it run until it has consumed all the characters just inserted into the input stream.
Return root's children, in tree order.