Living Standard — Last Updated 13 May 2026
APIs for dynamically inserting markup into the document interact with the parser, and thus their behavior varies depending on whether they are used with HTML documents (and the HTML parser) or XML documents (and the XML parser).
Document objects have a throw-on-dynamic-markup-insertion counter,
which is used in conjunction with the create an element for the token algorithm to
prevent custom element constructors from being
able to use document.open(), document.close(), and document.write() when they are invoked by the parser.
Initially, the counter must be set to zero.
document = document.open()
Causes the Document to be replaced in-place, as if it was a new
Document object, but reusing the previous object, which is then returned.
The resulting Document has an HTML parser associated with it, which can be given
data to parse using document.write().
The method has no effect if the Document is still being parsed.
Throws an "InvalidStateError" DOMException if the
Document is an XML document.
Throws an "InvalidStateError" DOMException if the
parser is currently executing a custom element constructor.
window = document.open(url, name, features)
Works like the window.open() method.
Document objects have an active parser was aborted boolean, which is
used to prevent scripts from invoking the document.open()
and document.write() methods (directly or indirectly)
after the document's active parser has been aborted. It is initially false.
The document open steps, given a document, are as follows:
If document is an XML document, then throw
an "InvalidStateError" DOMException.
If document's throw-on-dynamic-markup-insertion counter is greater
than 0, then throw an "InvalidStateError"
DOMException.
Let entryDocument be the entry global object's associated Document.
If document's origin is not
same origin to entryDocument's origin, then throw a
"SecurityError" DOMException.
If document has an active parser whose script nesting level is greater than 0, then return document.
This basically causes document.open() to
be ignored when it's called in an inline script found during parsing, while still letting it
have an effect when called from a non-parser task such as a timer callback or event handler.
Similarly, if document's unload counter is greater than 0, then return document.
This basically causes document.open() to
be ignored when it's called from a beforeunload, pagehide, or unload event
handler while the Document is being unloaded.
If document's active parser was aborted is true, then return document.
This notably causes document.open() to
be ignored if it is called after a navigation has started, but
only during the initial parse. See issue
#4723 for more background.
If document's node navigable is non-null and document's node navigable's ongoing navigation is a navigation ID, then stop loading document's node navigable.
For each shadow-including inclusive descendant node of document, erase all event listeners and handlers given node.
If document is the associated
Document of document's relevant global object, then
erase all event listeners and handlers given document's relevant
global object.
Replace all with null within document.
If document is fully active, then:
Let newURL be a copy of entryDocument's URL.
If entryDocument is not document, then set newURL's fragment to null.
Run the URL and history update steps with document and newURL.
Set document's is initial about:blank to
false.
If document's iframe load in progress flag is set, then set document's mute iframe load flag.
Set document to no-quirks mode.
Create a new HTML parser and associate it with document. This is a
script-created parser (meaning that it can be closed by the document.open() and document.close() methods, and that the tokenizer will wait for
an explicit call to document.close() before emitting an
end-of-file token). The encoding confidence is
irrelevant.
Set the insertion point to point at just before the end of the input stream (which at this point will be empty).
Update the current document readiness of document to "loading".
This causes a readystatechange
event to fire, but the event is actually unobservable to author code, because of the previous
step which erased all event listeners and
handlers that could observe it.
Return document.
The document open steps do not affect whether a Document
is ready for post-load tasks or completely loaded.
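This example is non-normative. The order of the guard clauses in the document open steps can be summarized with a small model. It is only a sketch: documentOpenOutcome and the fields of its state argument are hypothetical names standing in for the concepts above, not real DOM properties.

```javascript
// Hypothetical model of the guard clauses in the document open steps.
// Each field of `state` mirrors one spec concept; none is a real DOM property.
function documentOpenOutcome(state) {
  if (state.isXMLDocument) return "throw InvalidStateError";
  if (state.throwOnDynamicMarkupInsertionCounter > 0) return "throw InvalidStateError";
  if (!state.sameOriginAsEntryDocument) return "throw SecurityError";
  // The remaining guards return the document unchanged instead of throwing.
  if (state.hasActiveParser && state.scriptNestingLevel > 0) return "ignored";
  if (state.unloadCounter > 0) return "ignored";
  if (state.activeParserWasAborted) return "ignored";
  return "opened";
}

const baseline = {
  isXMLDocument: false,
  throwOnDynamicMarkupInsertionCounter: 0,
  sameOriginAsEntryDocument: true,
  hasActiveParser: false,
  scriptNestingLevel: 0,
  unloadCounter: 0,
  activeParserWasAborted: false,
};

console.log(documentOpenOutcome(baseline)); // "opened"
// An inline script found during parsing: the call is ignored.
console.log(documentOpenOutcome({ ...baseline, hasActiveParser: true, scriptNestingLevel: 1 })); // "ignored"
```

The model stops at the last guard; the steps that replace the document's contents and create the script-created parser are not represented.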
The open(unused1,
unused2) method must return the result of running the document open
steps with this.
The unused1 and
unused2 arguments are ignored, but kept in the IDL to allow code that calls the
function with one or two arguments to continue working. They are necessary due to Web IDL
overload resolution algorithm rules, which would throw a TypeError
exception for such calls had the arguments not been there. whatwg/webidl issue #581 investigates
changing the algorithm to allow for their removal. [WEBIDL]
The open(url,
name, features) method must run these steps:
If this is not fully active, then throw an
"InvalidAccessError" DOMException.
Return the result of running the window open steps with url, name, and features.
document.close()
Closes the input stream that was opened by the document.open() method.
Throws an "InvalidStateError" DOMException if the
Document is an XML document.
Throws an "InvalidStateError" DOMException if the
parser is currently executing a custom element constructor.
The close() method must run the following
steps:
If this is an XML document, then throw
an "InvalidStateError" DOMException.
If this's throw-on-dynamic-markup-insertion counter is greater
than zero, then throw an "InvalidStateError"
DOMException.
If there is no script-created parser associated with this, then return.
Insert an explicit "EOF" character at the end of the parser's input stream.
If this's pending parsing-blocking script is not null, then return.
Run the tokenizer, processing resulting tokens as they are emitted, and stopping when the tokenizer reaches the explicit "EOF" character or spins the event loop.
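This example is non-normative. A script-created parser session usually pairs document.open() with document.close(). The sketch below uses a hypothetical helper named replaceDocumentContents; it accepts any object with open(), write(), and close() methods, so it can be exercised outside a browser.

```javascript
// Hypothetical helper: replaces a document's contents through a
// script-created parser. `doc` is expected to behave like an HTML Document.
function replaceDocumentContents(doc, chunks) {
  doc.open(); // creates the script-created parser
  try {
    for (const chunk of chunks) {
      doc.write(chunk); // feeds markup into the parser's input stream
    }
  } finally {
    doc.close(); // inserts the explicit "EOF" so the tokenizer can finish
  }
}

// In a browser, called from a non-parser task such as an event handler:
//   replaceDocumentContents(document, ["<!DOCTYPE html>", "<p>Hello</p>"]);
```

Calling close() in a finally block is a design choice of this sketch: without the explicit "EOF", the tokenizer of a script-created parser keeps waiting for more input.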
document.write()
document.write(...text)
In general, adds the given string(s) to the Document's input stream.
This method has very idiosyncratic behavior. In some cases, this method can
affect the state of the HTML parser while the parser is running, resulting in a DOM
that does not correspond to the source of the document (e.g. if the string written is the string
"<plaintext>" or "<!--"). In other cases,
the call can clear the current page first, as if document.open() had been called. In yet more cases, the method
is simply ignored, or throws an exception. User agents are explicitly allowed to avoid executing
script elements inserted via this method. And to make matters even worse, the
exact behavior of this method can in some cases be dependent on network latency, which can lead to failures that are very hard to debug. For all these reasons, use
of this method is strongly discouraged.
Throws an "InvalidStateError" DOMException when
invoked on XML documents.
Throws an "InvalidStateError" DOMException if the
parser is currently executing a custom element constructor.
This method performs no sanitization to remove potentially-dangerous elements
and attributes like script or event handler content attributes.
Document objects have an ignore-destructive-writes counter, which is
used in conjunction with the processing of script elements to prevent external
scripts from being able to use document.write() to blow
away the document by implicitly calling document.open().
Initially, the counter must be set to zero.
The document write steps, given a Document object document,
a list text, a boolean lineFeed, and a string sink, are as
follows:
Let string be the empty string.
Let isTrusted be false if text contains a string; otherwise true.
For each value of text:
If value is a TrustedHTML object, then
append value's associated data to
string.
Otherwise, append value to string.
If isTrusted is false, set string to the result of invoking the
get trusted type compliant string algorithm with
TrustedHTML, document's relevant global
object, string, sink, and "script".
If lineFeed is true, append U+000A LINE FEED to string.
If document is an XML document, then throw
an "InvalidStateError" DOMException.
If document's throw-on-dynamic-markup-insertion counter is greater
than 0, then throw an "InvalidStateError"
DOMException.
If document's active parser was aborted is true, then return.
If the insertion point is undefined, then:
If document's unload counter is greater than 0 or document's ignore-destructive-writes counter is greater than 0, then return.
Run the document open steps with document.
Insert string into the input stream just before the insertion point.
If document's pending parsing-blocking script is null, then have the
HTML parser process string, one code point at a time, processing
resulting tokens as they are emitted, and stopping when the tokenizer reaches the insertion
point or when the processing of the tokenizer is aborted by the tree construction stage (this
can happen if a script end tag token is emitted by the tokenizer).
If the document.write() method was
called from script executing inline (i.e. executing because the parser parsed a set of
script tags), then this is a reentrant invocation of the
parser. If the parser pause flag is set, the tokenizer will abort immediately
and no HTML will be parsed, per the tokenizer's parser pause
flag check.
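This example is non-normative. The decisions that the document write steps make before any markup reaches the parser can be sketched as a pure function. documentWriteOutcome and the fields of state are hypothetical names for the concepts above; the Trusted Types check is left out.

```javascript
// Hypothetical model of the decisions made by the document write steps.
function documentWriteOutcome(state) {
  if (state.isXMLDocument) return "throw InvalidStateError";
  if (state.throwOnDynamicMarkupInsertionCounter > 0) return "throw InvalidStateError";
  if (state.activeParserWasAborted) return "ignored";
  if (!state.insertionPointDefined) {
    // No parser is accepting input, so writing would need to reopen the document.
    if (state.unloadCounter > 0 || state.ignoreDestructiveWritesCounter > 0) {
      return "ignored";
    }
    return "open then write"; // implicit document.open() clears the page first
  }
  return "write";
}

const duringParse = {
  isXMLDocument: false,
  throwOnDynamicMarkupInsertionCounter: 0,
  activeParserWasAborted: false,
  insertionPointDefined: true,
  unloadCounter: 0,
  ignoreDestructiveWritesCounter: 0,
};

console.log(documentWriteOutcome(duringParse)); // "write"
console.log(documentWriteOutcome({ ...duringParse, insertionPointDefined: false })); // "open then write"
```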
The document.write(...text) method steps are
to run the document write steps with this, text, false, and
"Document write".
document.writeln()
document.writeln(...text)
Adds the given string(s) to the Document's input stream, followed by a newline
character. If necessary, calls the open() method
implicitly first.
This method has very idiosyncratic behavior. Use of this
method is strongly discouraged, for the same reasons as document.write().
Throws an "InvalidStateError" DOMException when
invoked on XML documents.
Throws an "InvalidStateError" DOMException if the
parser is currently executing a custom element constructor.
This method performs no sanitization to remove potentially-dangerous elements
and attributes like script or event handler content attributes.
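This example is non-normative. The only difference between document.write() and document.writeln() is the trailing newline. The sketch below models how the arguments are joined; assembleWriteString is a hypothetical name, and objects of the form { data } stand in for TrustedHTML values.

```javascript
// Hypothetical model of how the written string is assembled.
// Plain strings and TrustedHTML stand-ins ({ data }) are concatenated in
// order; writeln appends a single U+000A LINE FEED at the end.
function assembleWriteString(text, lineFeed) {
  let string = "";
  for (const value of text) {
    string += typeof value === "string" ? value : value.data;
  }
  if (lineFeed) string += "\n";
  return string;
}

console.log(JSON.stringify(assembleWriteString(["<p>", "x", "</p>"], false))); // "<p>x</p>"
console.log(JSON.stringify(assembleWriteString(["<p>", "x", "</p>"], true)));  // "<p>x</p>\n"
```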
The document.writeln(...text) method steps are
to run the document write steps with this, text, true, and
"Document writeln".
partial interface Element {
[CEReactions] undefined setHTMLUnsafe((TrustedHTML or DOMString) html, optional SetHTMLUnsafeOptions options = {});
[CEReactions] undefined setHTML(DOMString html, optional SetHTMLOptions options = {});
DOMString getHTML(optional GetHTMLOptions options = {});
[CEReactions] attribute (TrustedHTML or [LegacyNullToEmptyString] DOMString) innerHTML;
[CEReactions] attribute (TrustedHTML or [LegacyNullToEmptyString] DOMString) outerHTML;
[CEReactions] undefined insertAdjacentHTML(DOMString position, (TrustedHTML or DOMString) string);
};
partial interface ShadowRoot {
[CEReactions] undefined setHTMLUnsafe((TrustedHTML or DOMString) html, optional SetHTMLUnsafeOptions options = {});
[CEReactions] undefined setHTML(DOMString html, optional SetHTMLOptions options = {});
DOMString getHTML(optional GetHTMLOptions options = {});
[CEReactions] attribute (TrustedHTML or [LegacyNullToEmptyString] DOMString) innerHTML;
};
enum SanitizerPresets { "default" };
dictionary SetHTMLOptions {
(Sanitizer or SanitizerConfig or SanitizerPresets) sanitizer = "default";
};
dictionary SetHTMLUnsafeOptions {
(Sanitizer or SanitizerConfig or SanitizerPresets) sanitizer = {};
};
dictionary GetHTMLOptions {
boolean serializableShadowRoots = false;
sequence<ShadowRoot> shadowRoots = [];
};
DOMParser interface
The DOMParser interface allows authors to create new Document objects
by parsing strings, as either HTML or XML.
parser = new DOMParser()
Constructs a new DOMParser object.
document = parser.parseFromString(string, type)
Parses string using either the HTML or XML parser, according to type,
and returns the resulting Document. type can be "text/html"
(which will invoke the HTML parser), or any of "text/xml",
"application/xml", "application/xhtml+xml", or
"image/svg+xml" (which will invoke the XML parser).
For the XML parser, if string cannot be parsed, then the returned
Document will contain elements describing the resulting error.
Note that script elements are not evaluated during parsing, and the resulting
document's encoding will always be
UTF-8. The document's URL will be
inherited from parser's relevant global object.
Values other than the above for type will cause a TypeError exception
to be thrown.
The design of DOMParser, as a class that needs to be constructed and
then have its parseFromString() method
called, is an unfortunate historical artifact. If we were designing this functionality today it
would be a standalone function. For parsing HTML, the modern alternative is Document.parseHTMLUnsafe().
This method performs no sanitization to remove potentially-dangerous elements
and attributes like script or event handler content attributes.
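This example is non-normative. Because an unsupported type causes a TypeError, callers sometimes validate the type before parsing. The sketch below assumes a hypothetical wrapper named parseMarkup that takes a DOMParser-like object from the caller.

```javascript
// The five values accepted for the `type` argument.
const DOM_PARSER_TYPES = new Set([
  "text/html",
  "text/xml",
  "application/xml",
  "application/xhtml+xml",
  "image/svg+xml",
]);

// Hypothetical wrapper: validates `type` up front, then delegates to a
// DOMParser-like object supplied by the caller.
function parseMarkup(parser, string, type) {
  if (!DOM_PARSER_TYPES.has(type)) {
    throw new TypeError(`Unsupported type: ${type}`);
  }
  return parser.parseFromString(string, type);
}

// In a browser:
//   const doc = parseMarkup(new DOMParser(), "<p>Hi</p>", "text/html");
```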
[Exposed=Window]
interface DOMParser {
constructor();
[NewObject] Document parseFromString((TrustedHTML or DOMString) string, DOMParserSupportedType type);
};
enum DOMParserSupportedType {
"text/html",
"text/xml",
"application/xml",
"application/xhtml+xml",
"image/svg+xml"
};
The new DOMParser() constructor
steps are to do nothing.
The parseFromString(string,
type) method steps are:
Let compliantString be the result of invoking the get trusted type compliant string algorithm with TrustedHTML, this's relevant global
object, string, "DOMParser parseFromString", and "script".
Let document be a new Document, whose content type is type and URL is this's relevant global object's associated Document's URL.
The document's encoding will
be left as its default, of UTF-8. In particular, any XML declarations or
meta elements found while parsing compliantString will have no effect.
Switch on type:
"text/html"
Parse HTML from a string given document and compliantString.
Since document does not have a browsing context, scripting is disabled.
Otherwise
Create an XML parser parser, associated with document, and with XML scripting support disabled.
Parse compliantString using parser.
If the previous step resulted in an XML well-formedness or XML namespace well-formedness error, then:
Assert: document has no child nodes.
Let root be the result of creating an
element given document, "parsererror", and "http://www.mozilla.org/newlayout/xml/parsererror.xml".
Optionally, add attributes or children to root to describe the nature of the parsing error.
Append root to document.
Return document.
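This example is non-normative. For XML types, a failed parse does not throw; it yields a document whose root is a parsererror element. The sketch below checks only for the shape described in the steps above. isParserErrorDocument is a hypothetical helper, and whether a given implementation reports errors in exactly this shape is an assumption.

```javascript
// Namespace used for the "parsererror" root element in the steps above.
const PARSER_ERROR_NS = "http://www.mozilla.org/newlayout/xml/parsererror.xml";

// Hypothetical helper: true if `doc` looks like the error document that
// parseFromString() returns for XML input that is not well-formed.
function isParserErrorDocument(doc) {
  const root = doc.documentElement;
  return (
    root !== null &&
    root.localName === "parsererror" &&
    root.namespaceURI === PARSER_ERROR_NS
  );
}

// In a browser:
//   const doc = new DOMParser().parseFromString("<a><b></a>", "text/xml");
//   if (isParserErrorDocument(doc)) { /* handle the parse failure */ }
```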
To parse HTML from a string, given a Document document and a
string html:
Set document's type to "html".
Create an HTML parser parser, associated with document.
Place html into the input stream for parser. The encoding confidence is irrelevant.
Start parser and let it run until it has consumed all the characters just inserted into the input stream.
This might mutate the document's mode.
element.setHTMLUnsafe(html, options)
Parses html using the HTML parser with options options, and replaces
the children of element with the result. element provides context for the
HTML parser. If the options dictionary contains a "sanitizer" member, it is used to
sanitize the parsed fragment before it is inserted into element.
shadowRoot.setHTMLUnsafe(html, options)
Parses html using the HTML parser with options options, and replaces
the children of shadowRoot with the result. shadowRoot's host provides context for the HTML parser. If the
options dictionary contains a "sanitizer" member, it is used to
sanitize the parsed fragment before it is inserted into shadowRoot.
element.setHTML(html, options)
Parses html using the HTML parser with options options, and replaces
the children of element with the result. element provides context for the
HTML parser. The parsed fragment is sanitized based on the
options's "sanitizer" member, and
unsafe content is removed.
shadowRoot.setHTML(html, options)
Parses html using the HTML parser with options options, and replaces
the children of shadowRoot with the result. shadowRoot's host provides context for the HTML parser. The
parsed fragment is sanitized based on the options's
"sanitizer" member, and unsafe content is removed.
doc = Document.parseHTMLUnsafe(html, options)
Parses html using the HTML parser with options options, and returns the
resulting Document.
Note that script elements are not evaluated during parsing, and the resulting
document's encoding will always be
UTF-8. The document's URL will be
about:blank. If the options dictionary contains a "sanitizer" member, it is used to
sanitize the resulting DOM.
Parses html using the HTML parser with options options, and replaces the children of the element or shadow root with the result.
doc = Document.parseHTML(html, options)
Parses html using the HTML parser with options options, and returns a
new Document containing the result. The resulting document is sanitized based on the options's "sanitizer" member, and unsafe content is removed.
The methods with an Unsafe suffix perform no
sanitization to remove potentially-dangerous elements and attributes like script or
event handler content attributes.
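This example is non-normative. When markup comes from an untrusted source, the sanitizing method is the natural choice. The sketch below assumes a hypothetical helper named renderUntrusted; falling back to textContent when setHTML() is unavailable is a design choice of the sketch, not something this specification requires.

```javascript
// Hypothetical helper: inserts untrusted markup using the sanitizing method
// when the element provides it, and falls back to plain text otherwise.
function renderUntrusted(element, html) {
  if (typeof element.setHTML === "function") {
    element.setHTML(html); // sanitized using the "default" sanitizer
    return "sanitized";
  }
  element.textContent = html; // no markup is parsed at all
  return "text";
}

// In a browser:
//   renderUntrusted(document.querySelector("#comment"), userSuppliedHTML);
```

The fallback deliberately avoids setHTMLUnsafe() and innerHTML, because neither removes script-executing markup by default.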
Element's setHTMLUnsafe(html, options)
method steps are:
Let compliantHTML be the result of invoking the get trusted type compliant string algorithm with TrustedHTML, this's relevant global
object, html, "Element setHTMLUnsafe", and "script".
Let target be this's template contents if
this is a template element; otherwise this.
Set and filter HTML given target, this, compliantHTML, options, and false.
ShadowRoot's setHTMLUnsafe(html,
options) method steps are:
Let compliantHTML be the result of invoking the get trusted type compliant string algorithm with TrustedHTML, this's relevant global
object, html, "ShadowRoot setHTMLUnsafe", and "script".
Set and filter HTML given this, this's shadow host, compliantHTML, options, and false.
Element's setHTML(html, options) method
steps are:
Let target be this's template contents if
this is a template element; otherwise this.
Set and filter HTML given target, this, html, options, and true.
ShadowRoot's setHTML(html, options) method
steps are:
Set and filter HTML given this, this's shadow host, html, options, and true.
The static parseHTMLUnsafe(html, options)
method steps are:
Let compliantHTML be the result of invoking the get trusted type compliant string algorithm with TrustedHTML, this's relevant global
object, html, "Document parseHTMLUnsafe", and "script".
Let document be a new Document, whose content type is "text/html".
Since document does not have a browsing context, scripting is disabled.
Set document's allow declarative shadow roots to true.
Parse HTML from a string given document and compliantHTML.
Let sanitizer be the result of calling get a sanitizer instance from options with options and false.
Call sanitize on document with sanitizer and false.
Return document.
The static parseHTML(html,
options) method steps are:
Let document be a new Document, whose content type is "text/html".
Since document does not have a browsing context, scripting is disabled.
Set document's allow declarative shadow roots to true.
Parse HTML from a string given document and html.
Let sanitizer be the result of calling get a sanitizer instance from options with options and true.
Call sanitize on document with sanitizer and true.
Return document.
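This example is non-normative. Code that needs to run where Document.parseHTMLUnsafe() is not yet available can fall back to DOMParser. The sketch below assumes a hypothetical helper named parseHTMLDocument; scope stands for a global-like object such as window.

```javascript
// Hypothetical helper: parses `html` into a new Document, preferring the
// static method and falling back to DOMParser.
function parseHTMLDocument(scope, html) {
  if (scope.Document && typeof scope.Document.parseHTMLUnsafe === "function") {
    return scope.Document.parseHTMLUnsafe(html);
  }
  return new scope.DOMParser().parseFromString(html, "text/html");
}

// In a browser:
//   const doc = parseHTMLDocument(window, "<p>Hi</p>");
```

The two paths are not identical. Per the steps above, parseHTMLUnsafe() sets allow declarative shadow roots to true and produces a document whose URL is about:blank, while a document created by DOMParser takes its URL from the parser's relevant global object.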
html = element.getHTML({ serializableShadowRoots, shadowRoots })
Returns the result of serializing element to HTML. Shadow roots within element are serialized according to the provided options:
If serializableShadowRoots is true, then all shadow roots marked as serializable are serialized.
If the shadowRoots array is provided, then all shadow roots specified in the array are serialized, regardless of whether or not they are marked as serializable.
If neither option is provided, then no shadow roots are serialized.
html = shadowRoot.getHTML({ serializableShadowRoots, shadowRoots })
Returns the result of serializing shadowRoot to HTML, using its shadow host as the context element. Shadow roots within shadowRoot are serialized according to the provided options, as above.
Element's getHTML(options) method steps
are to return the result of running the HTML fragment serialization algorithm with
this, options["serializableShadowRoots"],
and options["shadowRoots"].
ShadowRoot's getHTML(options) method steps
are to return the result of running the HTML fragment serialization algorithm with
this, options["serializableShadowRoots"],
and options["shadowRoots"].
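This example is non-normative. The rule for which shadow roots appear in the output of getHTML() can be sketched as a predicate. shouldSerializeShadowRoot is a hypothetical name; the objects passed to it only need a serializable property, as ShadowRoot has.

```javascript
// Hypothetical model: should `shadowRoot` be included in the output?
function shouldSerializeShadowRoot(shadowRoot, options = {}) {
  const { serializableShadowRoots = false, shadowRoots = [] } = options;
  // Roots marked as serializable are included when the flag is set.
  if (serializableShadowRoots && shadowRoot.serializable) return true;
  // Explicitly listed roots are included regardless of the marker.
  return shadowRoots.includes(shadowRoot);
}

const marked = { serializable: true };
const unmarked = { serializable: false };

console.log(shouldSerializeShadowRoot(marked)); // false
console.log(shouldSerializeShadowRoot(marked, { serializableShadowRoots: true })); // true
console.log(shouldSerializeShadowRoot(unmarked, { shadowRoots: [unmarked] })); // true
```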
innerHTML property
The innerHTML property has a number of outstanding issues
in the DOM Parsing and Serialization issue
tracker, documenting various problems with its specification.
element.innerHTML
Returns a fragment of HTML or XML that represents the element's contents.
In the case of an XML document, throws an "InvalidStateError"
DOMException if the element cannot be serialized to XML.
element.innerHTML = value
Replaces the contents of the element with nodes parsed from the given string.
In the case of an XML document, throws a "SyntaxError"
DOMException if the given string is not well-formed.
shadowRoot.innerHTML
Returns a fragment of HTML that represents the shadow root's contents.
shadowRoot.innerHTML = value
Replaces the contents of the shadow root with nodes parsed from the given string.
These properties' setters perform no sanitization to remove
potentially-dangerous elements and attributes like script or event handler
content attributes.
The fragment serializing algorithm steps, given an Element,
Document, or DocumentFragment node and a boolean require
well-formed, are:
Let context document be node's node document.
If context document is an HTML document, return the result of running the HTML fragment serialization algorithm with node, false, and « ».
Return the XML serialization of node given require well-formed.
The fragment parsing algorithm steps, given an Element
context, a string markup, and an optional parser scripting mode
scriptingMode (default Inert), are:
Let newChildren be null.
If context's node document is an XML document, then set newChildren to the result of invoking the XML fragment parsing algorithm given context and markup.
Otherwise, set newChildren to the result of invoking the HTML fragment parsing algorithm given context, markup, false, and scriptingMode.
Let fragment be a new DocumentFragment whose node
document is context's node document.
For each node of newChildren, in tree order: append node to fragment.
This ensures the node document for the new nodes is correct.
Return fragment.
Element's innerHTML getter steps are to return the result of
running the fragment serializing algorithm steps with this and true.
ShadowRoot's innerHTML getter steps are to return the result of
running the fragment serializing algorithm steps with this and true.
Element's innerHTML setter steps
are:
Let compliantString be the result of invoking the get trusted type compliant string algorithm with TrustedHTML, this's relevant global
object, the given value, "Element innerHTML", and "script".
Let context be this.
Let fragment be the result of invoking the fragment parsing algorithm steps with context and compliantString.
If context is a template element, then set context to
the template element's template contents (a
DocumentFragment).
Setting innerHTML on a
template element will replace all the nodes in its template contents
rather than its children.
Replace all with fragment within context.
ShadowRoot's innerHTML setter
steps are:
Let compliantString be the result of invoking the get trusted type compliant string algorithm with TrustedHTML, this's relevant global
object, the given value, "ShadowRoot innerHTML", and "script".
Let context be this's host.
Let fragment be the result of invoking the fragment parsing algorithm steps with context and compliantString.
Replace all with fragment within this.
outerHTML property
The outerHTML property has a number of outstanding issues
in the DOM Parsing and Serialization issue
tracker, documenting various problems with its specification.
element.outerHTML
Returns a fragment of HTML or XML that represents the element and its contents.
In the case of an XML document, throws an "InvalidStateError"
DOMException if the element cannot be serialized to XML.
element.outerHTML = value
Replaces the element with nodes parsed from the given string.
In the case of an XML document, throws a "SyntaxError"
DOMException if the given string is not well-formed.
Throws a "NoModificationAllowedError" DOMException if
the parent of the element is a Document.
This property's setter performs no sanitization to remove potentially-dangerous
elements and attributes like script or event handler content
attributes.
Element's outerHTML getter steps are:
Let element be a fictional node whose only child is this.
Return the result of running the fragment serializing algorithm steps with element and true.
Element's outerHTML setter steps
are:
Let compliantString be the result of invoking the get trusted type compliant string algorithm with TrustedHTML, this's relevant global
object, the given value, "Element outerHTML", and "script".
Let parent be this's parent.
If parent is null, return. There would be no way to obtain a reference to the nodes created even if the remaining steps were run.
If parent is a Document, throw a
"NoModificationAllowedError" DOMException.
If parent is a DocumentFragment, set parent to the
result of creating an element given this's
node document, "body", and the HTML
namespace.
Let fragment be the result of invoking the fragment parsing algorithm steps given parent and compliantString.
Replace this with fragment within this's parent.
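This example is non-normative. The outerHTML setter silently does nothing for an element without a parent and throws for a child of a Document. The sketch below assumes a hypothetical wrapper named replaceWithMarkup that reports those two cases instead.

```javascript
// Hypothetical wrapper around the outerHTML setter. It reports the two
// special cases described above rather than returning silently or throwing.
function replaceWithMarkup(element, markup) {
  const parent = element.parentNode;
  if (parent === null) return "detached"; // the setter would do nothing
  if (parent.nodeType === 9) return "document-child"; // 9 is Node.DOCUMENT_NODE; the setter would throw
  element.outerHTML = markup;
  return "replaced";
}

// In a browser:
//   replaceWithMarkup(document.querySelector("#old"), "<section>New</section>");
```

After a successful replacement, the original element is no longer in the tree, so any reference held by the caller points at a detached node.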
insertAdjacentHTML() method
The insertAdjacentHTML()
method has a number of outstanding issues in the DOM Parsing and Serialization issue tracker, documenting various problems
with its specification.
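element.insertAdjacentHTML(position, string)
Parses string as HTML or XML and inserts the resulting nodes into the tree in the position given by the position argument, as follows:
"beforebegin"
Before the element itself (i.e., after element's previous sibling)
"afterbegin"
Just inside the element, before its first child.
"beforeend"
Just inside the element, after its last child.
"afterend"
After the element itself (i.e., before element's next sibling)
Throws a "SyntaxError" DOMException if the arguments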
have invalid values (e.g., in the case of an XML document,
if the given string is not well-formed).
Throws a "NoModificationAllowedError" DOMException
if the given position isn't possible (e.g. inserting elements after the root element of a
Document).
This method performs no sanitization to remove potentially-dangerous elements
and attributes like script or event handler content attributes.
Element's insertAdjacentHTML(position,
string) method steps are:
Let compliantString be the result of invoking the get trusted type compliant string algorithm with TrustedHTML, this's relevant global
object, string, "Element insertAdjacentHTML", and "script".
Let context be null.
Use the first matching item from this list:
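If position is an ASCII case-insensitive match for the string "beforebegin"
If position is an ASCII case-insensitive match for the string "afterend"
Set context to this's parent.
If context is null or a Document, throw a
"NoModificationAllowedError" DOMException.
If position is an ASCII case-insensitive match for the string "afterbegin"
If position is an ASCII case-insensitive match for the string "beforeend"
Set context to this.
Otherwise
Throw a "SyntaxError" DOMException.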
If context is not an Element or all of the following are true:
context's node document is an HTML document;
context's local name is
"html"; and
context's namespace is the HTML namespace,
then set context to the result of creating an
element given this's node document, "body", and the HTML namespace.
Let fragment be the result of invoking the fragment parsing algorithm steps with context and compliantString.
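Use the first matching item from this list:
If position is an ASCII case-insensitive match for the string "beforebegin"
Insert fragment into this's parent before this.
If position is an ASCII case-insensitive match for the string "afterbegin"
Insert fragment into this before its first child.
If position is an ASCII case-insensitive match for the string "beforeend"
Append fragment to this.
If position is an ASCII case-insensitive match for the string "afterend"
Insert fragment into this's parent before this's next sibling.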
As with other direct Node-manipulation APIs (and unlike innerHTML), insertAdjacentHTML() does not include any special
handling for template elements. In most cases you will want to use templateEl.content.insertAdjacentHTML() instead of directly
manipulating the child nodes of a template element.
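This example is non-normative. The position argument determines which node supplies the parsing context: the element's parent for the two sibling positions, and the element itself for the two child positions. The sketch below models that choice. contextForPosition is a hypothetical name, and it assumes positions are compared case-insensitively.

```javascript
// Hypothetical model: which node supplies the parsing context for a position.
function contextForPosition(position) {
  switch (String(position).toLowerCase()) {
    case "beforebegin":
    case "afterend":
      return "parent"; // new nodes become siblings of the element
    case "afterbegin":
    case "beforeend":
      return "element"; // new nodes become children of the element
    default:
      throw new SyntaxError(`Invalid position: ${position}`);
  }
}

console.log(contextForPosition("beforeend")); // "element"
console.log(contextForPosition("AfterEnd")); // "parent"
```

This explains why the sibling positions fail on a root element: its parent is a Document, which cannot serve as the context.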
createContextualFragment() method
The createContextualFragment() method has a number
of outstanding issues in the DOM Parsing and Serialization issue tracker, documenting various problems
with its specification.
docFragment = range.createContextualFragment(string)
Returns a DocumentFragment created from the markup string string using
range's start node as the context in
which fragment is parsed.
This method performs no sanitization to remove potentially-dangerous elements
and attributes like script or event handler content attributes.
partial interface Range {
[CEReactions, NewObject] DocumentFragment createContextualFragment((TrustedHTML or DOMString) string);
};
Range's createContextualFragment(string)
method steps are:
Let compliantString be the result of invoking the get
trusted type compliant string algorithm with TrustedHTML, this's relevant global
object, string, "Range createContextualFragment", and
"script".
Let node be this's start node.
Let element be null.
If node implements Element, set element
to node.
Otherwise, if node implements Text or
Comment, set element to node's parent
element.
If element is null or all of the following are true:
element's node document is an HTML document;
element's local name is
"html"; and
element's namespace is the HTML namespace,
then set element to the result of creating an
element given this's node document, "body", and the HTML namespace.
Return the result of invoking the fragment parsing algorithm steps with element, compliantString, and Fragment.
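This example is non-normative. Because the range's start node supplies the context, callers usually position the range before parsing. The sketch below assumes a hypothetical helper named fragmentFromMarkup; doc must provide createRange(), as Document does.

```javascript
// Hypothetical helper: parses `markup` using `contextNode` as the context.
function fragmentFromMarkup(doc, contextNode, markup) {
  const range = doc.createRange();
  // selectNodeContents() places the range's start inside contextNode,
  // so contextNode becomes the range's start node.
  range.selectNodeContents(contextNode);
  return range.createContextualFragment(markup);
}

// In a browser, parsing table rows in the context of a tbody element:
//   const tbody = document.querySelector("tbody");
//   tbody.append(fragmentFromMarkup(document, tbody, "<tr><td>1</td></tr>"));
```

Choosing the context matters for markup such as table rows, which the HTML parser handles differently depending on the context element.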
XMLSerializer interface
The XMLSerializer interface has a number of outstanding issues in the
DOM Parsing and Serialization issue tracker, documenting various problems
with its specification. The remainder of DOM Parsing and Serialization will be
gradually upstreamed to this specification.
xmlSerializer = new XMLSerializer()
Constructs a new XMLSerializer object.
string = xmlSerializer.serializeToString(root)
Returns the result of serializing root to XML.
Throws an "InvalidStateError" DOMException if
root cannot be serialized to XML.
The design of XMLSerializer, as a class that needs to be constructed
and then have its serializeToString()
method called, is an unfortunate historical artifact. If we were designing this functionality
today it would be a standalone function.
[Exposed=Window]
interface XMLSerializer {
constructor();
DOMString serializeToString(Node root);
};
The new XMLSerializer()
constructor steps are to do nothing.
The serializeToString(root)
method steps are:
Return the XML serialization of root given false.
This section is non-normative.
Web applications often need to process untrusted HTML strings, such as when rendering user-generated content or using client-side templates. Safely inserting these strings into the DOM requires careful sanitization to prevent DOM-based cross-site scripting (XSS) attacks.
HTML sanitization provides a native mechanism for safely parsing and sanitizing HTML strings. Because it uses the user agent's own HTML parser, the sanitized output accurately reflects how the browser will render the content, which prevents script execution and mitigates advanced attacks such as script gadgets.
These APIs offer functionality to parse a string containing HTML into a DOM tree, and to filter the resulting tree according to a user-supplied configuration. The methods come in two main flavors: "safe" and "unsafe".
The "safe" methods will not generate any markup that executes script. That is, they are intended to be safe from XSS. The "unsafe" methods will parse and filter based on the provided configuration, but do not have the same safety guarantees by default.
Sanitizer interface
[Exposed=Window]
interface Sanitizer {
constructor(optional (SanitizerConfig or SanitizerPresets) configuration = "default");
// Query configuration:
SanitizerConfig get();
// Modify a Sanitizer's lists and fields:
boolean allowElement(SanitizerElementWithAttributes element);
boolean removeElement(SanitizerElement element);
boolean replaceElementWithChildren(SanitizerElement element);
boolean allowProcessingInstruction(SanitizerPI pi);
boolean removeProcessingInstruction(SanitizerPI pi);
boolean allowAttribute(SanitizerAttribute attribute);
boolean removeAttribute(SanitizerAttribute attribute);
boolean setComments(boolean allow);
boolean setDataAttributes(boolean allow);
// Remove markup that executes script.
boolean removeUnsafe();
};
config = sanitizer.get()
Returns a copy of the sanitizer's configuration.
sanitizer.allowElement(element)
Ensures that the sanitizer configuration allows the specified element.
sanitizer.removeElement(element)
Ensures that the sanitizer configuration blocks the specified element.
sanitizer.replaceElementWithChildren(element)
Configures the sanitizer to remove the specified element but keep its child nodes.
sanitizer.allowAttribute(attribute)
Configures the sanitizer to allow the specified attribute globally.
sanitizer.removeAttribute(attribute)
Configures the sanitizer to block the specified attribute globally.
sanitizer.allowProcessingInstruction(pi)
Configures the sanitizer to allow the specified processing instruction.
sanitizer.removeProcessingInstruction(pi)
Configures the sanitizer to block the specified processing instruction.
sanitizer.setComments(allow)
Sets whether the sanitizer preserves comments.
sanitizer.setDataAttributes(allow)
Sets whether the sanitizer preserves custom data attributes (e.g., data-*).
sanitizer.removeUnsafe()
Modifies the configuration to automatically remove elements and attributes that are considered unsafe.
A Sanitizer has an associated configuration (a
SanitizerConfig).
The new
Sanitizer(configuration) constructor steps are:
If configuration is a SanitizerPresets string, then:
Set configuration to the built-in safe default configuration.
To configure a Sanitizer
sanitizer, given a dictionary configuration and a boolean
allowCommentsPIsAndDataAttributes:
Canonicalize the configuration configuration with allowCommentsPIsAndDataAttributes.
Set sanitizer's configuration to configuration.
To canonicalize the configuration SanitizerConfig configuration with a boolean allowCommentsPIsAndDataAttributes:
For each member of configuration that is a list of strings:
Replace each string in member with the result of canonicalizing it.
If neither configuration["elements"] nor configuration["removeElements"] exists, then set configuration["removeElements"] to an empty list.
If neither configuration["attributes"] nor configuration["removeAttributes"] exists, then set configuration["removeAttributes"] to an empty
list.
If neither configuration["processingInstructions"] nor
configuration["removeProcessingInstructions"]
exists, then:
If allowCommentsPIsAndDataAttributes is true, then set
configuration["removeProcessingInstructions"]
to an empty list.
Otherwise, set configuration["processingInstructions"] to an empty
list.
If configuration["elements"]
exists, then:
Let newElements be « ».
For each element of
configuration["elements"], append the result of canonicalizing element to newElements.
Set configuration["elements"] to newElements.
If configuration["removeElements"] exists, then set configuration["removeElements"] to the result of canonicalizing
configuration["removeElements"].
If configuration["attributes"] exists, then set configuration["attributes"] to the result of canonicalizing configuration["attributes"].
If configuration["removeAttributes"] exists, then set configuration["removeAttributes"] to the result of canonicalizing configuration["removeAttributes"].
If configuration["replaceWithChildrenElements"]
exists, then set configuration["replaceWithChildrenElements"] to
the result of canonicalizing
configuration["replaceWithChildrenElements"].
If configuration["processingInstructions"] exists, then set configuration["processingInstructions"] to the result
of canonicalizing
configuration["processingInstructions"].
If configuration["removeProcessingInstructions"]
exists, then set configuration["removeProcessingInstructions"]
to the result of canonicalizing
configuration["removeProcessingInstructions"].
If configuration["comments"]
does not exist, then set it to
allowCommentsPIsAndDataAttributes.
If configuration["attributes"] exists and configuration["dataAttributes"] does not exist, then set configuration["dataAttributes"] to allowCommentsPIsAndDataAttributes.
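The defaulting part of the steps above can be sketched on a plain object. This is an illustration only: applyConfigDefaults is a hypothetical name, the canonicalization of individual names is left out, and a member is treated as existing whenever the key is present on the object.

```javascript
// Sketch of the defaulting steps only; name canonicalization is omitted.
// Assumption: "exists" is modeled as "the key is present on the object".
function applyConfigDefaults(config, allowCommentsPIsAndDataAttributes) {
  const result = { ...config };
  if (!("elements" in result) && !("removeElements" in result)) {
    result.removeElements = [];
  }
  if (!("attributes" in result) && !("removeAttributes" in result)) {
    result.removeAttributes = [];
  }
  if (!("processingInstructions" in result) &&
      !("removeProcessingInstructions" in result)) {
    if (allowCommentsPIsAndDataAttributes) {
      // Nothing is removed: every processing instruction is kept.
      result.removeProcessingInstructions = [];
    } else {
      // Nothing is allowed: every processing instruction is dropped.
      result.processingInstructions = [];
    }
  }
  if (!("comments" in result)) {
    result.comments = allowCommentsPIsAndDataAttributes;
  }
  // dataAttributes is only defaulted when a global allow-list is in use.
  if ("attributes" in result && !("dataAttributes" in result)) {
    result.dataAttributes = allowCommentsPIsAndDataAttributes;
  }
  return result;
}
```

With an empty input and the flag set to true, the sketch yields empty remove-lists and comments set to true, which matches the statement that an empty configuration allows everything.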
To canonicalize a sanitizer list list:
Let newList be « ».
For each item in list, append the result of canonicalizing item to newList.
Return newList.
To canonicalize a processing instruction list list:
Let newList be « ».
For each item in list, append the result of canonicalizing item to newList.
Return newList.
To canonicalize a processing instruction given a SanitizerPI
pi:
To canonicalize a sanitizer name given a DOMString or dictionary name, and a default namespace
defaultNamespace (default null):
To canonicalize a sanitizer element given a SanitizerElement
element:
Return the result of canonicalizing element with the HTML namespace as the default namespace.
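As an illustration of how a default namespace is applied, the following sketch models name canonicalization on plain objects. The detailed steps for canonicalizing a sanitizer name are not listed above, so the behavior here is an assumption: a bare string is taken as shorthand for a dictionary with the default namespace, and a dictionary keeps its own name and namespace. The function names are hypothetical.

```javascript
const HTML_NAMESPACE = "http://www.w3.org/1999/xhtml";

// Assumption: a bare string is shorthand for { name, namespace: default }.
function canonicalizeName(name, defaultNamespace = null) {
  if (typeof name === "string") {
    return { name, namespace: defaultNamespace };
  }
  // Assumption: a dictionary without a namespace member uses the default.
  const namespace = "namespace" in name ? name.namespace : defaultNamespace;
  return { name: name.name, namespace };
}

// Elements default to the HTML namespace.
function canonicalizeElement(element) {
  return canonicalizeName(element, HTML_NAMESPACE);
}

// Attributes default to the null namespace, mirroring the dictionary default
// declared for SanitizerAttributeNamespace.
function canonicalizeAttribute(attribute) {
  return canonicalizeName(attribute, null);
}
```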
To canonicalize a sanitizer element list list:
Let newList be « ».
For each item in list, append the result of canonicalizing item to newList.
Return newList.
To find the canonicalized intersection of lists A and B:
Let setA be « ».
Let setB be « ».
For each entry of A, append the result of canonicalizing entry to setA.
For each entry of B, append the result of canonicalizing entry to setB.
Return the intersection of setA and setB.
The get() method
steps are:
Outside of the get() method, the order of
the Sanitizer's elements and attributes is unobservable. By explicitly sorting the
result of this method, we give implementations the opportunity to optimize by, for example, using
unordered sets internally.
Let config be this's configuration.
If config["elements"] exists, then:
For each element of config["elements"]:
If element["attributes"] exists, then set element["attributes"] to the
result of sort in ascending order element["attributes"], with
compare sanitizer items.
If element["removeAttributes"]
exists, then set element["removeAttributes"]
to the result of sort in ascending order
element["removeAttributes"],
with compare sanitizer items.
Set config["elements"] to
the result of sort in ascending order config["elements"], with compare sanitizer
items.
Otherwise:
Set config["removeElements"] to the result of sort in ascending order config["removeElements"], with compare
sanitizer items.
If config["replaceWithChildrenElements"]
exists, then set config["replaceWithChildrenElements"] to
the result of sort in ascending order config["replaceWithChildrenElements"],
with compare sanitizer items.
If config["processingInstructions"] exists, then set config["processingInstructions"] to the result
of sort in ascending order config["processingInstructions"], with
piA["target"] being
code unit less than piB["target"].
Otherwise:
Set config["removeProcessingInstructions"]
to the result of sort in ascending order config["removeProcessingInstructions"],
with piA["target"]
being code unit less than piB["target"].
If config["attributes"]
exists, then set config["attributes"] to the result of sort in ascending order config["attributes"], with compare sanitizer
items.
Otherwise:
Set config["removeAttributes"] to the result of sort in ascending order config["removeAttributes"], with compare
sanitizer items.
Return config.
The allowElement(element) method steps
are:
Let configuration be this's configuration.
Set element to the result of canonicalizing element.
If configuration["elements"]
exists, then:
Let modified be the result of removing
element from configuration["replaceWithChildrenElements"].
If configuration["attributes"] exists, then:
If element["attributes"] exists, then:
Set element["attributes"] to the
result of removing duplicates from
element["attributes"].
Set element["attributes"] to the
difference of element["attributes"] and
configuration["attributes"].
If configuration["dataAttributes"] is true, then remove all items item from element["attributes"] where
item is a custom data attribute.
If element["removeAttributes"]
exists, then:
Set element["removeAttributes"]
to the result of removing duplicates from
element["removeAttributes"].
Set element["removeAttributes"]
to the intersection of
element["removeAttributes"]
and configuration["attributes"].
Otherwise:
If element["attributes"] exists, then:
Set element["attributes"] to the
result of removing duplicates from
element["attributes"].
Set element["attributes"] to the
difference of element["attributes"] and
element["removeAttributes"]
(or an empty list if it does not exist).
Remove element["removeAttributes"].
Set element["attributes"] to the
difference of element["attributes"] and
configuration["removeAttributes"].
If element["removeAttributes"]
exists, then:
Set element["removeAttributes"]
to the result of removing duplicates from
element["removeAttributes"].
Set element["removeAttributes"]
to the difference of element["removeAttributes"]
and configuration["removeAttributes"].
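If configuration["elements"]
does not contain element, then:
Append element to configuration["elements"].
Return true.
Let current element be the item in configuration["elements"] whose name member is element's name member and whose namespace member is
element's namespace
member.
If element is equal to current element, then return modified.
Remove current element from configuration["elements"].
Append element to configuration["elements"].
Return true.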
Otherwise:
If element["attributes"] exists or element["removeAttributes"]
(or an empty list if it does not exist) is not empty, then return false.
Let modified be the result of removing
element from configuration["replaceWithChildrenElements"].
If configuration["removeElements"] does not contain element, then return modified.
Remove element from
configuration["removeElements"].
Return true.
The removeElement(element) method steps
are to return the result of removing
element from this's configuration.
The replaceElementWithChildren(element)
method steps are:
Let configuration be this's configuration.
Set element to the result of canonicalizing element.
If the built-in non-replaceable elements list contains element, then return false.
Let modified be the result of removing
element from configuration["elements"].
If removing element from
configuration["removeElements"] is true, then set
modified to true.
If configuration["replaceWithChildrenElements"]
does not contain element, then:
Append element to
configuration["replaceWithChildrenElements"].
Return true.
Return modified.
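The steps above can be modeled on a plain configuration object as follows. This is a sketch under stated assumptions: the element is already canonical (a name and a namespace), the function and helper names are hypothetical, and a missing replaceWithChildrenElements list is created on demand, which the steps above do not spell out.

```javascript
const HTML_NS = "http://www.w3.org/1999/xhtml";
const SVG_NS = "http://www.w3.org/2000/svg";
const MATHML_NS = "http://www.w3.org/1998/Math/MathML";

// The built-in non-replaceable elements list.
const NON_REPLACEABLE = [
  { name: "html", namespace: HTML_NS },
  { name: "svg", namespace: SVG_NS },
  { name: "math", namespace: MATHML_NS },
];

const same = (a, b) => a.name === b.name && a.namespace === b.namespace;

// Removes item from config[key]; returns true if anything was removed.
function removeFromList(config, key, item) {
  const list = config[key];
  if (!list) return false;
  const kept = list.filter((entry) => !same(entry, item));
  config[key] = kept;
  return kept.length !== list.length;
}

function replaceElementWithChildren(config, element) {
  if (NON_REPLACEABLE.some((entry) => same(entry, element))) return false;
  let modified = removeFromList(config, "elements", element);
  if (removeFromList(config, "removeElements", element)) modified = true;
  // Assumption: create the list if the configuration does not have one yet.
  if (!config.replaceWithChildrenElements) config.replaceWithChildrenElements = [];
  if (!config.replaceWithChildrenElements.some((entry) => same(entry, element))) {
    config.replaceWithChildrenElements.push(element);
    return true;
  }
  return modified;
}
```

Calling the function twice with the same element returns true and then false, since the second call changes nothing.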
The allowAttribute(attribute) method
steps are:
Let configuration be this's configuration.
Set attribute to the result of canonicalizing attribute.
If configuration["attributes"] exists, then:
If configuration["dataAttributes"] is true and
attribute is a custom data attribute, then return false.
If configuration["attributes"] contains attribute, then return false.
If configuration["elements"]
exists, then:
For each element in
configuration["elements"]:
If element["attributes"] (or an
empty list if it does not exist) contains
attribute, then remove attribute
from element["attributes"].
Append attribute to
configuration["attributes"].
Return true.
Otherwise:
If configuration["removeAttributes"] does not contain attribute, then return false.
Remove attribute from
configuration["removeAttributes"].
Return true.
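A minimal model of these steps on a plain configuration object is sketched below. It assumes that attribute is already canonical (a name and a namespace) and uses a simplified test for custom data attributes (a "data-" prefix in the null namespace). The helper names are hypothetical.

```javascript
// Entries are assumed to be canonical already: { name, namespace }.
const sameName = (a, b) => a.name === b.name && a.namespace === b.namespace;
const contains = (list, item) => (list ?? []).some((entry) => sameName(entry, item));
const without = (list, item) => list.filter((entry) => !sameName(entry, item));
// Simplified test for a custom data attribute: "data-" prefix, null namespace.
const isDataAttribute = (attr) =>
  attr.namespace === null && attr.name.startsWith("data-");

function allowAttribute(config, attribute) {
  if ("attributes" in config) {
    // Already covered by the dataAttributes flag or by the global allow-list.
    if (config.dataAttributes === true && isDataAttribute(attribute)) return false;
    if (contains(config.attributes, attribute)) return false;
    // Drop per-element copies so global and local allow-lists stay disjoint.
    for (const element of config.elements ?? []) {
      if (contains(element.attributes, attribute)) {
        element.attributes = without(element.attributes, attribute);
      }
    }
    config.attributes.push(attribute);
    return true;
  }
  // Remove-list mode: allowing means taking the attribute off the remove-list.
  if (!contains(config.removeAttributes, attribute)) return false;
  config.removeAttributes = without(config.removeAttributes, attribute);
  return true;
}
```

The return value reports whether the configuration changed, so allowing an attribute that is already allowed returns false.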
The removeAttribute(attribute) method
steps are to return the result of removing
attribute from this's
configuration.
The setComments(allow) method steps
are:
The setDataAttributes(allow) method
steps are:
Let configuration be this's configuration.
If configuration["attributes"] does not exist, then return false.
If configuration["dataAttributes"] exists and is equal to allow, then return false.
If allow is true, then:
If configuration["elements"]
exists, then:
For each element of
configuration["elements"]:
If element["attributes"] exists, then remove all items
item from element["attributes"] where
item is a custom data attribute.
Remove all items item from
configuration["attributes"]
where item is a custom data attribute.
Set configuration["dataAttributes"] to allow.
Return true.
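The steps above can be sketched on a plain configuration object. The sketch assumes canonical attribute entries (a name and a namespace) and a simplified test for custom data attributes (a "data-" prefix in the null namespace).

```javascript
// Simplified test for a custom data attribute: "data-" prefix, null namespace.
const isDataAttribute = (attr) =>
  attr.namespace === null && attr.name.startsWith("data-");

function setDataAttributes(config, allow) {
  // The flag only has meaning alongside a global attribute allow-list.
  if (!("attributes" in config)) return false;
  if ("dataAttributes" in config && config.dataAttributes === allow) return false;
  if (allow) {
    // Explicit data-* entries become redundant once the flag is on.
    for (const element of config.elements ?? []) {
      if ("attributes" in element) {
        element.attributes = element.attributes.filter((a) => !isDataAttribute(a));
      }
    }
    config.attributes = config.attributes.filter((a) => !isDataAttribute(a));
  }
  config.dataAttributes = allow;
  return true;
}
```

Removing the explicit data-* entries when the flag is turned on keeps the configuration free of duplicates between the flag and the allow-lists.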
The allowProcessingInstruction(pi)
method steps are:
Let configuration be this's configuration.
Set pi to the result of canonicalizing pi.
If configuration["processingInstructions"] exists, then:
If configuration["processingInstructions"] contains pi, then return false.
Append pi to
configuration["processingInstructions"].
Return true.
Otherwise:
If configuration["removeProcessingInstructions"]
contains pi, then:
Remove pi from
configuration["removeProcessingInstructions"].
Return true.
Return false.
The removeProcessingInstruction(pi)
method steps are:
Let configuration be this's configuration.
Set pi to the result of canonicalizing pi.
If configuration["processingInstructions"] exists, then:
If configuration["processingInstructions"] contains pi, then:
Remove pi from
configuration["processingInstructions"].
Return true.
Return false.
Otherwise:
If configuration["removeProcessingInstructions"]
contains pi, then return false.
Append pi to
configuration["removeProcessingInstructions"].
Return true.
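The two processing instruction methods mirror each other, which is easiest to see side by side. The sketch below models them on a plain configuration object, assuming pi is already canonical (an object with a target member) and that the configuration has exactly one of the two lists, as the canonicalization steps arrange.

```javascript
const hasTarget = (list, pi) => list.some((entry) => entry.target === pi.target);
const withoutTarget = (list, pi) => list.filter((entry) => entry.target !== pi.target);

function allowProcessingInstruction(config, pi) {
  if ("processingInstructions" in config) {
    // Allow-list mode: add the target unless it is already listed.
    if (hasTarget(config.processingInstructions, pi)) return false;
    config.processingInstructions.push(pi);
    return true;
  }
  // Remove-list mode: allowing means taking the target off the remove-list.
  if (hasTarget(config.removeProcessingInstructions, pi)) {
    config.removeProcessingInstructions =
      withoutTarget(config.removeProcessingInstructions, pi);
    return true;
  }
  return false;
}

function removeProcessingInstruction(config, pi) {
  if ("processingInstructions" in config) {
    // Allow-list mode: removing means taking the target off the allow-list.
    if (hasTarget(config.processingInstructions, pi)) {
      config.processingInstructions =
        withoutTarget(config.processingInstructions, pi);
      return true;
    }
    return false;
  }
  // Remove-list mode: add the target unless it is already listed.
  if (hasTarget(config.removeProcessingInstructions, pi)) return false;
  config.removeProcessingInstructions.push(pi);
  return true;
}
```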
The removeUnsafe() method steps are to return the result of removing unsafe from this's
configuration.
dictionary SanitizerElementNamespace {
required DOMString name;
DOMString? _namespace = "http://www.w3.org/1999/xhtml";
};
// Used by "elements"
dictionary SanitizerElementNamespaceWithAttributes : SanitizerElementNamespace {
sequence<SanitizerAttribute> attributes;
sequence<SanitizerAttribute> removeAttributes;
};
dictionary SanitizerAttributeNamespace {
required DOMString name;
DOMString? _namespace = null;
};
dictionary SanitizerProcessingInstruction {
required DOMString target;
};
typedef (DOMString or SanitizerElementNamespace) SanitizerElement;
typedef (DOMString or SanitizerElementNamespaceWithAttributes) SanitizerElementWithAttributes;
typedef (DOMString or SanitizerProcessingInstruction) SanitizerPI;
typedef (DOMString or SanitizerAttributeNamespace) SanitizerAttribute;
dictionary SanitizerConfig {
sequence<SanitizerElementWithAttributes> elements;
sequence<SanitizerElement> removeElements;
sequence<SanitizerElement> replaceWithChildrenElements;
sequence<SanitizerProcessingInstruction> processingInstructions;
sequence<SanitizerProcessingInstruction> removeProcessingInstructions;
sequence<SanitizerAttribute> attributes;
sequence<SanitizerAttribute> removeAttributes;
boolean comments;
boolean dataAttributes;
};
SanitizerElementNamespace, SanitizerAttributeNamespace,
SanitizerAttribute, and SanitizerProcessingInstruction dictionaries are
considered equal when all of their members are equal.
Equality should be defined in the infra spec instead. See issue #664.
This section is non-normative.
Configurations can and ought to be modified by developers to suit their purposes. Options are
to write a new SanitizerConfig dictionary from scratch, to modify an existing
Sanitizer's configuration by using the modifier methods, or to get() an existing Sanitizer's
configuration as a dictionary and modify the dictionary and then create a new
Sanitizer with it.
An empty configuration allows everything (when called with the "unsafe" methods like setHTMLUnsafe()). The "default" preset refers to the built-in safe default
configuration. Note that "safe" and "unsafe" sanitizer methods have different defaults.
Not all configuration dictionaries are valid. A valid configuration avoids redundancy (like specifying the same element to be allowed twice) and contradictions (like specifying an element to be both removed and allowed).
Several conditions need to hold for a configuration to be valid:
Mixing global allow- and remove-lists:
elements or removeElements can exist, but not both. If
both are missing, this is equivalent to removeElements being an empty list.
attributes or removeAttributes can exist, but not both.
If both are missing, this is equivalent to removeAttributes being an empty
list.
dataAttributes is conceptually
an extension of the attributes allow-list.
The dataAttributes member is only
allowed when an attributes list is
used.
Duplicate entries between different global lists:
There are no duplicate entries (i.e., no same elements) between elements, removeElements, or replaceWithChildrenElements.
There are no duplicate entries (i.e., no same attributes) between attributes or removeAttributes.
Mixing local allow- and remove-lists on the same element:
When an attributes list exists,
both, either or none of the attributes and removeAttributes
lists are allowed on the same element.
When a removeAttributes list
exists, either or none of the attributes and removeAttributes
lists are allowed on the same element, but not both.
Duplicate entries on the same element:
There are no duplicate entries between attributes and removeAttributes
on the same element.
No element from the built-in non-replaceable elements list appears in replaceWithChildrenElements,
since replacing these elements with their children could lead to re-parsing issues or invalid
node trees.
The elements element allow-list can also
specify allowing or removing attributes for a given element. This is meant to mirror this
standard's structure, which knows both global attributes as well as local attributes
that apply to a specific element. Global and local attributes can be mixed, but note that
ambiguous configurations where a particular attribute would be allowed by one list and forbidden
by another, are generally invalid.
| | global attributes | global removeAttributes |
|---|---|---|
| local attributes | An attribute is allowed if it matches either list. No duplicates are allowed. | An attribute is only allowed if it's in the local allow list. No duplicate entries between global remove and local allow lists are allowed. Note that the global remove list has no function for this particular element, but can apply to other elements that do not have a local allow list. |
| local removeAttributes | An attribute is allowed if it's in the global allow-list, but not in the local remove-list. Local remove has to be a subset of the global allow lists. | An attribute is allowed if it is in neither list. No duplicate entries between global remove and local remove lists are allowed. |
Please note the asymmetry where mostly no duplicates between global and per-element lists are permitted, but in the case of a global allow-list and a per-element remove-list the latter has to be a subset of the former. An excerpt of the table above, only focusing on duplicates, is as follows:
| | global attributes | global removeAttributes |
|---|---|---|
| local attributes | No duplicates are allowed. | No duplicates are allowed. |
| local removeAttributes | Local remove has to be a subset of the global allow lists. | No duplicates are allowed. |
The dataAttributes setting allows
custom data attributes. The rules above easily extend
to custom data attributes if one considers dataAttributes to be an allow-list:
| | global attributes and dataAttributes set |
|---|---|
| local attributes | All custom data attributes are allowed. No custom data attributes can be listed in any allow-list, as that would mean a duplicate entry. |
| local removeAttributes | A custom data attribute is allowed, unless it's listed in the local remove-list. No custom data attribute can be listed in the global allow-list, as that would mean a duplicate entry. |
Putting these rules in words:
Duplicates and interactions between global and local lists:
If a global attributes allow list
exists, then for each element's local lists:
If a local attributes allow list
exists, there can be no duplicate entries between these lists.
If a local removeAttributes
remove list exists, then all its entries also need to be listed in the global attributes allow list.
If dataAttributes is true,
then no custom data attributes can be listed in
any of the allow-lists.
If a global removeAttributes
remove list exists, then:
If a local attributes allow list
exists, there can be no duplicate entries between these lists.
If a local removeAttributes
remove list exists, there can be no duplicate entries between these lists.
A local attributes allow list
and a local removeAttributes
remove list cannot both exist.
dataAttributes has to be
false.
To set and filter HTML, given an Element or
DocumentFragment target, an Element
contextElement, a string html, a dictionary options,
and a boolean safe:
If all of the following are true:
safe is true;
contextElement's local name
is "script"; and
contextElement's namespace is the HTML namespace or the SVG namespace,
then return.
Let sanitizer be the result of getting a sanitizer instance from options given options and safe.
Let newChildren be the result of parsing a fragment given contextElement, html, and true.
Let fragment be a new DocumentFragment whose node
document is contextElement's node document.
For each node of newChildren, append node to fragment.
Sanitize fragment given sanitizer and safe.
Replace all with fragment within target.
To get a sanitizer instance from options, given a dictionary options and a boolean safe:
Let sanitizerSpec be "default".
If options["sanitizer"]
exists, then set sanitizerSpec to
options["sanitizer"].
Assert: sanitizerSpec is either a Sanitizer instance,
a SanitizerPresets member, or a SanitizerConfig dictionary.
If sanitizerSpec is a string, then:
Set sanitizerSpec to the built-in safe default configuration.
If sanitizerSpec is a dictionary, then:
Return sanitizerSpec.
To sanitize a Node node with a Sanitizer
sanitizer and a boolean safe:
Let configuration be sanitizer's configuration.
If safe is true, then remove unsafe from configuration.
Perform the inner sanitize steps on node given configuration and safe.
To perform the inner sanitize steps on a Node node, given a
SanitizerConfig configuration, and a boolean
handleJavascriptNavigationUrls:
For each child of node's children:
Assert: child is a Text, Comment,
Element, ProcessingInstruction, or DocumentType
node.
If child is a DocumentType or Text node, then
continue.
If child is a Comment node, then:
If configuration["comments"] is not true, then remove child.
If child is a ProcessingInstruction node, then:
Let piTarget be child's target.
If configuration["processingInstructions"] exists, then:
If configuration["processingInstructions"] does
not contain piTarget, then remove child.
Otherwise:
If configuration["removeProcessingInstructions"]
contains piTarget, then remove child.
Otherwise:
Let elementName be a SanitizerElementNamespace with child's local name and namespace.
If configuration["replaceWithChildrenElements"]
exists and configuration["replaceWithChildrenElements"]
contains elementName, then:
Sanitize child given configuration and handleJavascriptNavigationUrls.
Let fragment be a new DocumentFragment whose node
document is node's node document.
For each innerChild of child's children, append innerChild to fragment.
Replace child with fragment within node. Assert that this did not throw.
Otherwise:
If configuration["removeElements"] contains elementName, then remove child and
continue.
If elementName is a template element in the HTML
namespace, then sanitize child's
template contents given configuration and
handleJavascriptNavigationUrls.
If child is a shadow host, then sanitize child's shadow root given configuration and handleJavascriptNavigationUrls.
Let elementWithLocalAttributes be « ».
If configuration["elements"] exists and configuration["elements"] contains elementName, then set
elementWithLocalAttributes to configuration["elements"][elementName].
For each attribute in child's attribute list:
Let attrName be a SanitizerAttributeNamespace with attribute's local name and namespace.
If elementWithLocalAttributes["removeAttributes"]
exists and elementWithLocalAttributes["removeAttributes"]
contains attrName, then remove
attribute from child's attribute list.
Otherwise, if configuration["attributes"] exists, then:
If configuration["attributes"] does not contain attrName and
elementWithLocalAttributes["attributes"] does
not contain attrName, and if "data-" is not a prefix of attribute's local name or attribute's namespace is not null or
configuration["dataAttributes"] is false, then remove
attribute from child's attribute list.
Otherwise:
If elementWithLocalAttributes["attributes"] exists and elementWithLocalAttributes["attributes"] does
not contain attrName, then remove
attribute from child's attribute list.
Otherwise, if configuration["removeAttributes"] contains attrName, then remove
attribute from child's attribute list.
If handleJavascriptNavigationUrls is true, then:
If the pair (elementName, attrName) matches an entry in the
built-in navigating URL attributes list, and if attribute
contains a javascript: URL, then remove attribute
from child's attribute list.
If child's namespace is
the MathML Namespace, attribute's local name is "href",
and attribute's namespace is
null or the XLink namespace, and attribute contains a
javascript: URL, then remove attribute from child's
attribute list.
If the built-in animating URL attributes list contains the pair (elementName, attrName), and
attribute's value is "href" or "xlink:href", then remove
attribute from child's attribute list.
Sanitize child given configuration and handleJavascriptNavigationUrls.
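The per-attribute allow and remove decision in the steps above can be condensed into a single predicate. The sketch below covers only that decision and leaves out the javascript: URL handling. It assumes canonical entries (a name and a namespace); shouldRemoveAttribute is a hypothetical name, and elementEntry stands for the matching entry of configuration["elements"], or an empty object when there is none.

```javascript
// Entries are assumed to be canonical: { name, namespace }.
const contains = (list, item) =>
  (list ?? []).some((e) => e.name === item.name && e.namespace === item.namespace);

// elementEntry: the matching entry of config.elements, or {} when there is none.
function shouldRemoveAttribute(config, elementEntry, attr) {
  // A local remove-list always wins.
  if (contains(elementEntry.removeAttributes, attr)) return true;
  if ("attributes" in config) {
    // Global allow-list mode: keep the attribute if either allow-list names
    // it, or if it is a data-* attribute and dataAttributes is enabled.
    const listed =
      contains(config.attributes, attr) || contains(elementEntry.attributes, attr);
    const allowedAsDataAttribute =
      attr.namespace === null &&
      attr.name.startsWith("data-") &&
      config.dataAttributes === true;
    return !listed && !allowedAsDataAttribute;
  }
  // Global remove-list mode: a local allow-list, when present, is exhaustive.
  if ("attributes" in elementEntry && !contains(elementEntry.attributes, attr)) {
    return true;
  }
  return contains(config.removeAttributes, attr);
}
```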
To determine whether an attribute attribute contains a javascript:
URL:
Let url be the result of running the basic URL parser on attribute's value.
If url is failure, then return false.
Return true if url's scheme is "javascript", and false otherwise.
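In script, the same check can be approximated with the URL constructor. This is a sketch that assumes calling new URL() without a base URL is an adequate stand-in for running the basic URL parser on the attribute's value; containsJavascriptUrl is a hypothetical name.

```javascript
// Returns true when value parses as an absolute URL whose scheme is "javascript".
function containsJavascriptUrl(value) {
  let url;
  try {
    // No base URL is supplied, so relative references fail to parse.
    url = new URL(value);
  } catch {
    return false; // parse failure: not a javascript: URL
  }
  // The protocol getter returns the scheme followed by ":".
  return url.protocol === "javascript:";
}
```

Going through the parser rather than matching on a string prefix matters because the parser trims leading whitespace, removes ASCII tabs and newlines, and lowercases the scheme, so inputs such as "  JavaScript:alert(1)" and "java\tscript:alert(1)" are both detected.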
To remove an element element from a SanitizerConfig configuration:
Set element to the result of canonicalizing element.
Let modified be the result of removing
element from configuration["replaceWithChildrenElements"].
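If configuration["elements"]
exists, then:
If configuration["elements"] contains element, then:
Remove element from
configuration["elements"].
Return true.
Return modified.
Otherwise: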
If configuration["removeElements"] contains element, then return modified.
Add element to
configuration["removeElements"].
Return true.
To remove an attribute attribute from a SanitizerConfig configuration:
Set attribute to the result of canonicalizing attribute.
If configuration["attributes"] exists, then:
Let modified be the result of removing
attribute from configuration["attributes"].
If configuration["elements"]
exists, then:
For each element of
configuration["elements"]:
If element["attributes"] (or an
empty list if it does not exist) contains
attribute, then:
Set modified to true.
Remove attribute from
element["attributes"].
If element["removeAttributes"]
(or an empty list if it does not exist) contains
attribute, then:
Assert: modified is true.
Remove attribute from
element["removeAttributes"].
Return modified.
Otherwise:
If configuration["removeAttributes"] contains attribute, then return false.
If configuration["elements"]
exists, then:
For each element in
configuration["elements"]:
If element["attributes"] (or an
empty list if it does not exist) contains
attribute, then remove attribute
from element["attributes"].
If element["removeAttributes"]
(or an empty list if it does not exist) contains
attribute, then remove attribute
from element["removeAttributes"].
Add attribute to
configuration["removeAttributes"].
Return true.
To remove unsafe from a SanitizerConfig configuration:
Let result be false.
For each element in built-in safe
baseline configuration["removeElements"]:
If removing element from configuration is true, then set result to true.
For each attribute in built-in safe
baseline configuration["removeAttributes"]:
If removing attribute from configuration is true, then set result to true.
For each attribute that is an event handler content attribute:
If removing attribute from configuration is true, then set result to true.
Return result.
To compare sanitizer items itemA and itemB:
Let namespaceA be itemA["namespace"].
Let namespaceB be itemB["namespace"].
If namespaceA is null, then:
If namespaceB is not null, then return true.
Otherwise:
If namespaceB is null, then return false.
If namespaceA is code unit less than namespaceB, then return true.
If namespaceA is not namespaceB, then return false.
If itemA["name"] is
code unit less than itemB["name"], then return true.
Return false.
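The comparison above is a "less than" predicate: null namespaces sort first, then namespaces in code unit order, then names. A sketch in script follows, together with an adapter for Array.prototype.sort, which expects a three-way comparator. The function names are hypothetical, and the sketch relies on the < operator on strings comparing by UTF-16 code units.

```javascript
// Returns true when itemA sorts before itemB.
function compareSanitizerItems(itemA, itemB) {
  const namespaceA = itemA.namespace;
  const namespaceB = itemB.namespace;
  if (namespaceA === null) {
    // A null namespace sorts before any non-null namespace.
    if (namespaceB !== null) return true;
  } else {
    if (namespaceB === null) return false;
    if (namespaceA < namespaceB) return true;
    if (namespaceA !== namespaceB) return false;
  }
  // Namespaces are equal: fall back to the name.
  return itemA.name < itemB.name;
}

// Adapts the predicate to the three-way comparator that sort() expects.
function sortSanitizerItems(items) {
  return [...items].sort((a, b) =>
    compareSanitizerItems(a, b) ? -1 : compareSanitizerItems(b, a) ? 1 : 0);
}
```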
To canonicalize a SanitizerElementWithAttributes element:
Let result be the result of canonicalizing element.
If element is a dictionary, then:
If element["attributes"] exists, then set result["attributes"] to the
result of canonicalizing
element["attributes"].
If element["removeAttributes"]
exists, then set result["removeAttributes"]
to the result of canonicalizing
element["removeAttributes"].
If neither result["attributes"] nor
result["removeAttributes"]
exists, then set result["removeAttributes"]
to an empty list.
Return result.
To determine whether a canonical SanitizerConfig config is valid:
It's expected that the configuration being passed in has previously been run through the canonicalize the configuration steps. We will simply assert conditions that that algorithm is guaranteed to establish.
Assert: config["elements"] exists
or config["removeElements"]
exists.
If config["elements"] exists and config["removeElements"] exists, then return false.
Assert: Either config["processingInstructions"] exists or config["removeProcessingInstructions"]
exists.
If config["processingInstructions"] exists and config["removeProcessingInstructions"]
exists, then return false.
Assert: Either config["attributes"] exists or config["removeAttributes"] exists.
If config["attributes"]
exists and config["removeAttributes"] exists, then return false.
Assert: All SanitizerElementNamespaceWithAttributes, SanitizerElementNamespace, SanitizerProcessingInstruction, and SanitizerAttributeNamespace items in config are canonical, meaning they have been run through canonicalizing, as appropriate.
If config["elements"] exists:
If config["elements"]
has duplicates, then return false.
Otherwise:
If config["removeElements"] has duplicates, then return false.
If config["replaceWithChildrenElements"]
exists and has
duplicates, then return false.
If config["processingInstructions"] exists:
If config["processingInstructions"] has duplicate targets, then return false.
Otherwise:
If config["removeProcessingInstructions"]
has duplicates, then return false.
If config["attributes"] exists:
If config["attributes"]
has duplicates, then return false.
Otherwise:
If config["removeAttributes"] has duplicates, then return false.
If config["replaceWithChildrenElements"]
exists:
For each element of config["replaceWithChildrenElements"]:
If the built-in non-replaceable elements list contains element, then return false.
If config["elements"] exists:
If the intersection of
config["elements"] and
config["replaceWithChildrenElements"]
is not empty, then return false.
Otherwise:
If the intersection of
config["removeElements"]
and config["replaceWithChildrenElements"]
is not empty, then return false.
If config["attributes"] exists:
Assert: config["dataAttributes"] exists.
For each element of
config["elements"]:
If element["attributes"] exists and element["attributes"] has duplicates, then return false.
If element["removeAttributes"]
exists and element["removeAttributes"]
has duplicates, then return false.
If the intersection of
config["attributes"] and
element["attributes"] (or an
empty list if it does not exist) is not empty, then return false.
If element["removeAttributes"]
(or an empty list if it does not exist) is not a subset of
config["attributes"], then
return false.
If config["dataAttributes"] is true and
element["attributes"]
contains a custom data attribute, then return false.
If config["dataAttributes"] is true and
config["attributes"] contains a
custom data attribute, then return false.
Otherwise:
For each element of
config["elements"]:
If element["attributes"] exists and element["removeAttributes"]
exists, then return false.
If element["attributes"] exists and element["attributes"] has duplicates, then return false.
If element["removeAttributes"]
exists and element["removeAttributes"]
has duplicates, then return false.
If the intersection of
config["removeAttributes"] and
element["attributes"] (or an
empty list if it does not exist) is not empty, then return false.
If the intersection of
config["removeAttributes"] and
element["removeAttributes"]
(or an empty list if it does not exist) is not empty, then return false.
If config["dataAttributes"] exists, then return false.
Return true.
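A partial model of the validity check is sketched below on plain objects. It covers the element and attribute rules only, and omits the assertions, the processing instruction checks, and the non-replaceable elements check. It assumes canonical entries (a name and a namespace) and a simplified test for custom data attributes; isValidConfig and the helpers are hypothetical names.

```javascript
const same = (a, b) => a.name === b.name && a.namespace === b.namespace;
const contains = (list, item) => (list ?? []).some((e) => same(e, item));
const hasDuplicates = (list) =>
  list.some((item, i) => list.findIndex((e) => same(e, item)) !== i);
const intersects = (a, b) => (a ?? []).some((item) => contains(b, item));
const isData = (attr) => attr.namespace === null && attr.name.startsWith("data-");

function isValidConfig(config) {
  // An allow-list and a remove-list of the same kind are mutually exclusive.
  if ("elements" in config && "removeElements" in config) return false;
  if ("attributes" in config && "removeAttributes" in config) return false;
  for (const key of ["elements", "removeElements", "replaceWithChildrenElements",
                     "attributes", "removeAttributes"]) {
    if (key in config && hasDuplicates(config[key])) return false;
  }
  // No element may be both replaced with its children and allowed or removed.
  if (intersects(config.replaceWithChildrenElements,
                 config.elements ?? config.removeElements)) return false;
  for (const element of config.elements ?? []) {
    if (hasDuplicates(element.attributes ?? []) ||
        hasDuplicates(element.removeAttributes ?? [])) return false;
    if ("attributes" in config) {
      if (intersects(config.attributes, element.attributes)) return false;
      // A local remove-list must be a subset of the global allow-list.
      if (!(element.removeAttributes ?? []).every((a) => contains(config.attributes, a))) {
        return false;
      }
      if (config.dataAttributes === true && (element.attributes ?? []).some(isData)) {
        return false;
      }
    } else {
      if ("attributes" in element && "removeAttributes" in element) return false;
      if (intersects(config.removeAttributes, element.attributes)) return false;
      if (intersects(config.removeAttributes, element.removeAttributes)) return false;
    }
  }
  if ("attributes" in config) {
    if (config.dataAttributes === true && config.attributes.some(isData)) return false;
  } else if ("dataAttributes" in config) {
    return false; // dataAttributes requires a global allow-list
  }
  return true;
}
```

For example, a configuration that lists href in both the global attributes list and an element's local attributes list is rejected as a duplicate, while one that lists href in the global allow-list and in the same element's local removeAttributes list is accepted, reflecting the asymmetry described in the configuration overview.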
When specified, the safe sanitization criteria
for each element defines whether the element is removed or
included by default when performing safe
sanitization. When unspecified, the element is not included by default, but can still be added by
a SanitizerConfig.
The built-in safe baseline configuration is a SanitizerConfig. Its
removeElements list consists of all HTML
elements normatively marked as Removed within their
individual definitions, along with the obsolete frame element, and the
script and use SVG elements, and its removeAttributes list is empty.
Event handler content attributes are automatically removed by the remove
unsafe algorithm during safe sanitization, so the effective baseline behaves as if they
were included in the removeAttributes
list.
The built-in safe default configuration is a SanitizerConfig whose members are initialized as follows:
processingInstructions
attributes
comments
dataAttributes
elements
A table of SVG & MathML elements that are included by default in the built-in safe default configuration:
The built-in navigating URL attributes list corresponds to all HTML elements marked with Navigating URL attributes in their normative definitions, as well as elements corresponding to the following table:
| Element | Element Namespace | Attribute | Attribute Namespace |
|---|---|---|---|
| a | SVG | href | |
| a | SVG | href | XLink |
The built-in animating URL attributes list corresponds to the following table:
| Element | Element Namespace | Attribute |
|---|---|---|
| animate | SVG | attributeName |
| animateTransform | SVG | attributeName |
| set | SVG | attributeName |
The built-in non-replaceable elements list contains elements that must not be replaced with their children, as doing so can lead to re-parsing issues or an invalid node tree. It is the following list of SanitizerElementNamespace dictionaries:
| Element | Element Namespace |
|---|---|
| html | HTML |
| svg | SVG |
| math | MathML |
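As an illustration only, the list above and the check that a replaceWithChildrenElements list contains none of its entries might be modeled as follows. The sketch assumes each element is an object with name and namespace members and uses the short namespace labels from the table rather than namespace URIs. The names NON_REPLACEABLE_ELEMENTS, isNonReplaceable and replaceListIsAllowed are hypothetical.

```javascript
// Assumption: namespaces are written as the short labels used in the table.
const NON_REPLACEABLE_ELEMENTS = [
  { name: "html", namespace: "HTML" },
  { name: "svg", namespace: "SVG" },
  { name: "math", namespace: "MathML" },
];

// Two entries match only when both the name and the namespace agree.
const sameElement = (a, b) =>
  a.name === b.name && a.namespace === b.namespace;

function isNonReplaceable(element) {
  return NON_REPLACEABLE_ELEMENTS.some((entry) => sameElement(entry, element));
}

// Returns false when any requested element must not be replaced
// with its children.
function replaceListIsAllowed(replaceWithChildrenElements) {
  return !replaceWithChildrenElements.some(isNonReplaceable);
}
```

Comparing on both name and namespace matters here: an element named svg in the HTML namespace is a different entry from the SVG root element and is not covered by the list in this model.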
This section is non-normative.
The Sanitizer API is intended to prevent DOM-based cross-site scripting by traversing supplied HTML content and removing elements and attributes according to a configuration. The API is designed so that a Sanitizer object that leaves script-capable markup in place cannot be constructed; if one could be constructed, that would be a bug in the threat model. That said, there are security issues that correct usage of the Sanitizer API cannot protect against. These scenarios are laid out in the following sections.
This section is non-normative.
The Sanitizer API operates solely in the DOM and adds a capability to traverse and filter an
existing DocumentFragment. The Sanitizer does not address server-side reflected or
stored XSS.
This section is non-normative.
DOM clobbering describes an attack in which malicious HTML confuses an application by naming
elements through id or name
attributes such that properties like children of an HTML element in the DOM are overshadowed by
the malicious content. The Sanitizer API does not protect against DOM clobbering attacks in its
default state, but can be configured to remove id and name attributes.
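One possible way to apply such a configuration is sketched below. It assumes that a Sanitizer constructed without arguments starts from the built-in safe default configuration, that the object offers a removeAttribute() method, and that setHTML() accepts a sanitizer option; the function names createClobberResistantSanitizer and setHTMLWithoutIdAndName are hypothetical. The sketch returns null or throws where the API is not available.

```javascript
// Builds a Sanitizer that additionally drops id and name attributes.
// Returns null where no global Sanitizer exists (an assumption-friendly
// fallback, e.g. outside a supporting browser).
function createClobberResistantSanitizer() {
  if (typeof Sanitizer === "undefined") return null;
  const sanitizer = new Sanitizer(); // assumed: default configuration
  sanitizer.removeAttribute("id");
  sanitizer.removeAttribute("name");
  return sanitizer;
}

// Parses untrusted markup into target without id or name attributes.
function setHTMLWithoutIdAndName(target, untrustedMarkup) {
  const sanitizer = createClobberResistantSanitizer();
  if (sanitizer === null || typeof target.setHTML !== "function") {
    throw new Error("Sanitizer API is not available in this environment");
  }
  target.setHTML(untrustedMarkup, { sanitizer });
}
```

Removing id and name everywhere is a blunt choice: it also breaks fragment links and form controls that rely on those attributes, so an application may prefer to remove them only for content from untrusted sources.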
Script gadgets are a technique in which an attacker uses existing application code from popular JavaScript libraries to cause their own code to execute. This is often done by injecting innocent-looking code or seemingly inert DOM nodes that are only parsed and interpreted by a framework, which then executes JavaScript based on that input.
The Sanitizer API cannot prevent these attacks, but it requires page authors to explicitly allow
unknown elements in general. Authors are additionally required to explicitly configure unknown
attributes and elements, as well as markup that is known to be widely used for templating and
framework-specific code, such as data-* and slot attributes and elements like slot and
template. These restrictions are not exhaustive, and page authors are encouraged to examine
their third-party libraries for this behavior.
Mutation XSS or mXSS describes an attack that exploits cases where the parsed DOM structure is not the same after serializing and parsing again, in order to bypass sanitization that happens before serialization. One way to carry out such an attack is to rely on the change of parsing behavior for foreign content or mis-nested tags. The Sanitizer API offers only functions that turn a string into a node tree. The context is supplied implicitly by all sanitizer functions: setHTML() uses the current element; Document.parseHTML() creates a new document. Therefore the Sanitizer API is not directly affected by mutation XSS. If a developer were to retrieve a sanitized node tree as a string, e.g. via .innerHTML, and then parse it again, mutation XSS can occur. This practice is strongly discouraged. If processing or passing HTML as a string is necessary after all, then any such string should be considered untrusted and re-sanitized when it is inserted into the DOM. In other words, a sanitized and then serialized HTML tree can no longer be considered sanitized. A more complete treatment of mXSS can be found in [MXSS].
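The re-sanitization advice can be illustrated with a minimal sketch. It assumes a source element whose content was sanitized earlier and a target element that supports setHTML(); the function name copyWithResanitization is hypothetical.

```javascript
// Discouraged pattern: target.innerHTML = source.innerHTML;
// The serialized string may parse differently the second time.
function copyWithResanitization(source, target) {
  // Once serialized, the markup is treated as untrusted again.
  const serialized = source.innerHTML;
  if (typeof target.setHTML !== "function") {
    throw new Error("setHTML() is not available on the target");
  }
  // setHTML() parses in the context of target and sanitizes the result,
  // so the string is never inserted without sanitization.
  target.setHTML(serialized);
  return serialized;
}
```

Where both nodes live in the same document, moving or cloning the sanitized nodes directly avoids the serialization round trip altogether.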