URL (PR #763)

PR Preview — Last Updated

Participate:
GitHub whatwg/url (new issue, open issues)
Chat on Matrix
Commits:
GitHub whatwg/url/commits
Go to the living standard
@urlstandard
Tests:
web-platform-tests url/ (ongoing work)
Translations (non-normative):
日本語 (Japanese)
This is a pull request preview of the standard

This document contains the contents of the standard as modified by pull request #763, and should only be used as a preview.

Do not attempt to implement this version of the standard. Do not reference this version as authoritative in any way. Instead, see https://url.spec.whatwg.org/ for the living standard.

Abstract

The URL Standard defines URLs, domains, IP addresses, the application/x-www-form-urlencoded format, and their API.

Goals

The URL standard takes the following approach towards making URLs fully interoperable:

As the editors learn more about the subject matter the goals might increase in scope somewhat.

1. Infrastructure

This specification depends on Infra. [INFRA]

Some terms used in this specification are defined in the following standards and specifications:


To serialize an integer, represent it as the shortest possible decimal number.

1.1. Writing

A validation error indicates a mismatch between input and valid input. User agents, especially conformance checkers, are encouraged to report them somewhere.

A validation error does not mean that the parser terminates. Termination of a parser is always stated explicitly, e.g., through a return statement.

It is useful to signal validation errors as error-handling can be non-intuitive, legacy user agents might not implement correct error-handling, and the intent of what is written might be unclear to other developers.

Error type / Error description / Failure ("Yes" if the error also causes failure, "·" if it does not)
IDNA
domain-to-ASCII

Unicode ToASCII records an error or returns the empty string. [UTS46]

If details about Unicode ToASCII errors are recorded, user agents are encouraged to pass those along.

Yes
domain-to-Unicode

Unicode ToUnicode records an error. [UTS46]

The same considerations as with domain-to-ASCII apply.

·
Host parsing
domain-invalid-code-point

The input’s host contains a forbidden domain code point.

Hosts are percent-decoded before being processed when the URL is special, which would result in the following host portion becoming "exa#mple.org" and thus triggering this error.

"https://exa%23mple.org"

Yes
host-invalid-code-point

An opaque host (in a URL that is not special) contains a forbidden host code point.

"foo://exa[mple.org"

Yes
IPv4-empty-part

An IPv4 address ends with a U+002E (.).

"https://127.0.0.1./"

·
IPv4-too-many-parts

An IPv4 address does not consist of exactly 4 parts.

"https://1.2.3.4.5/"

Yes
IPv4-non-numeric-part

An IPv4 address part is not numeric.

"https://test.42"

Yes
IPv4-non-decimal-part

The IPv4 address contains numbers expressed using hexadecimal or octal digits.

"https://127.0.0x0.1"

·
IPv4-out-of-range-part

An IPv4 address part exceeds 255.

"https://255.255.4000.1"

Yes (only if applicable to the last part)
IPv6-unclosed

An IPv6 address is missing the closing U+005D (]).

"https://[::1"

Yes
IPv6-invalid-compression

An IPv6 address begins with improper compression.

"https://[:1]"

Yes
IPv6-too-many-pieces

An IPv6 address contains more than 8 pieces.

"https://[1:2:3:4:5:6:7:8:9]"

Yes
IPv6-multiple-compression

An IPv6 address is compressed in more than one spot.

"https://[1::1::1]"

Yes
IPv6-invalid-code-point

An IPv6 address contains a code point that is neither an ASCII hex digit nor a U+003A (:), or it ends unexpectedly.

"https://[1:2:3!:4]"

"https://[1:2:3:]"

Yes
IPv6-too-few-pieces

An uncompressed IPv6 address contains fewer than 8 pieces.

"https://[1:2:3]"

Yes
IPv4-in-IPv6-too-many-pieces

An IPv6 address with IPv4 address syntax: the IPv6 address has more than 6 pieces.

"https://[1:1:1:1:1:1:1:127.0.0.1]"

Yes
IPv4-in-IPv6-invalid-code-point

An IPv6 address with IPv4 address syntax:

  - An IPv4 part is empty or contains a non-ASCII digit.
  - An IPv4 part contains a leading 0.
  - There are too many IPv4 parts.

"https://[ffff::.0.0.1]"

"https://[ffff::127.0.xyz.1]"

"https://[ffff::127.0xyz]"

"https://[ffff::127.00.0.1]"

"https://[ffff::127.0.0.1.2]"

Yes
IPv4-in-IPv6-out-of-range-part

An IPv6 address with IPv4 address syntax: an IPv4 part exceeds 255.

"https://[ffff::127.0.0.4000]"

Yes
IPv4-in-IPv6-too-few-parts

An IPv6 address with IPv4 address syntax: an IPv4 address contains too few parts.

"https://[ffff::127.0.0]"

Yes
URL parsing
invalid-URL-unit

A code point is found that is not a URL unit.

"https://example.org/>"

" https://example.org "

"ht
tps://example.org"

"https://example.org/%s"

·
special-scheme-missing-following-solidus

The input’s scheme is not followed by "//".

"file:c:/my-secret-folder"

"https:example.org"

const url = new URL("https:foo.html", "https://example.org/");

·
missing-scheme-non-relative-URL

The input is missing a scheme, because it does not begin with an ASCII alpha, and either no base URL was provided or the base URL cannot be used as a base URL because it has an opaque path.

Input’s scheme is missing and no base URL is given:

const url = new URL("💩");

Input’s scheme is missing, but the base URL has an opaque path:

const url = new URL("💩", "mailto:user@example.org");

Yes
invalid-reverse-solidus

The URL has a special scheme and it uses U+005C (\) instead of U+002F (/).

"https://example.org\path\to\file"

·
invalid-credentials

The input includes credentials.

"https://user@example.org"

"https://user:pass@"

Yes (only if there is no host)
host-missing

The input has a special scheme, but does not contain a host.

"https://#fragment"

"https://:443"

Yes
port-out-of-range

The input’s port is too big.

"https://example.org:70000"

Yes
port-invalid

The input’s port is invalid.

"https://example.org:7z"

Yes
file-invalid-Windows-drive-letter

The input is a relative-URL string that starts with a Windows drive letter and the base URL’s scheme is "file".

const url = new URL("/c:/path/to/file", "file:///c:/");

·
file-invalid-Windows-drive-letter-host

A file: URL’s host is a Windows drive letter.

"file://c:"

·
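The Failure column can be observed from script: the URL constructor throws a TypeError when parsing returns failure, while validation errors that do not cause failure are not reported by the API at all. The sketch below assumes a runtime (such as a current browser or Node.js) whose URL implementation follows this standard; the parses() helper and the rows array are illustrative, with rows taken from the examples above.

```javascript
// Returns true if input (optionally resolved against base) parses, false if the
// URL constructor reports failure. Non-fatal validation errors are invisible here.
function parses(input, base) {
  try {
    new URL(input, base);
    return true;
  } catch (error) {
    if (error instanceof TypeError) return false;
    throw error;
  }
}

// [input, base, expected failure?], drawn from the examples above.
const rows = [
  ["https://exa%23mple.org", undefined, true],              // domain-invalid-code-point
  ["https://127.0.0x0.1", undefined, false],                // IPv4-non-decimal-part
  ["https://[::1", undefined, true],                        // IPv6-unclosed
  ["https:example.org", undefined, false],                  // special-scheme-missing-following-solidus
  ["💩", undefined, true],                                   // missing-scheme-non-relative-URL
  ["💩", "mailto:user@example.org", true],                   // missing-scheme-non-relative-URL
  ["https://example.org\\path\\to\\file", undefined, false], // invalid-reverse-solidus
  ["https://#fragment", undefined, true],                   // host-missing
  ["https://example.org:70000", undefined, true],           // port-out-of-range
  ["https://example.org:7z", undefined, true],              // port-invalid
];

for (const [input, base, expectFailure] of rows) {
  const failed = !parses(input, base);
  const note = failed === expectFailure ? "" : " (differs from the table)";
  console.log(`${JSON.stringify(input)}: ${failed ? "failure" : "parsed"}${note}`);
}
```

Only failure is checked here; detecting the non-fatal errors marked "·" would need a parser that surfaces validation errors, which the URL API does not do.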

1.2. Parsers

The EOF code point is a conceptual code point that signifies the end of a string or code point stream.

A pointer for a string input is an integer that points to a code point within input. Initially it points to the start of input. If it is −1 it points nowhere. If it is greater than or equal to input’s code point length, it points to the EOF code point.

When a pointer is used, c references the code point the pointer points to as long as it does not point nowhere. When the pointer points to nowhere, c cannot be used.

When a pointer is used, remaining references the code point substring from the pointer + 1 to the end of the string, as long as c is not the EOF code point. When c is the EOF code point, remaining cannot be used.

If "mailto:username@example" is a string being processed and a pointer points to @, c is U+0040 (@) and remaining is "example".

If the empty string is being processed and a pointer points to the start and is then decreased by 1, using c or remaining would be an error.
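A minimal sketch of the pointer, c, and remaining conventions, modelled over an array of code points so that a code point outside the BMP counts as one position. The CodePointPointer class and the EOF symbol are illustrative names, not part of this standard, and "cannot be used" is modelled as throwing.

```javascript
// Sentinel standing in for the conceptual EOF code point.
const EOF = Symbol("EOF code point");

class CodePointPointer {
  constructor(input) {
    this.codePoints = Array.from(input); // splits by code point, not UTF-16 code unit
    this.pointer = 0;                    // initially points to the start
  }

  // c: the code point the pointer points to, or EOF at or past the end.
  get c() {
    if (this.pointer === -1) throw new Error("pointer points nowhere");
    if (this.pointer >= this.codePoints.length) return EOF;
    return this.codePoints[this.pointer];
  }

  // remaining: everything after c; unusable when c is EOF (or points nowhere).
  get remaining() {
    if (this.c === EOF) throw new Error("remaining cannot be used at EOF");
    return this.codePoints.slice(this.pointer + 1).join("");
  }
}

const ptr = new CodePointPointer("mailto:username@example");
ptr.pointer = ptr.codePoints.indexOf("@");
console.log(ptr.c, ptr.remaining); // @ example
```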

1.3. Percent-encoded bytes

A percent-encoded byte is U+0025 (%), followed by two ASCII hex digits .

It is generally a good idea for sequences of percent-encoded bytes to be such that, when percent-decoded and then passed to UTF-8 decode without BOM or fail, they do not end up as failure. How important this is depends on where the percent-encoded bytes are used. E.g., for the host parser not following this advice is fatal, whereas for URL rendering the percent-encoded bytes would not be rendered percent-decoded.

To percent-encode a byte byte, return a string consisting of U+0025 (%), followed by two ASCII upper hex digits representing byte.
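As a small sketch (the function name is illustrative), percent-encoding a single byte amounts to formatting it as two uppercase hexadecimal digits after "%":

```javascript
// Percent-encode one byte (an integer 0..255) as "%" plus two ASCII upper hex digits.
function percentEncodeByte(byte) {
  if (!Number.isInteger(byte) || byte < 0 || byte > 0xff) {
    throw new RangeError(`not a byte: ${byte}`);
  }
  return "%" + byte.toString(16).toUpperCase().padStart(2, "0");
}

console.log(percentEncodeByte(0x23)); // %23
console.log(percentEncodeByte(0x7f)); // %7F
```

The padStart call keeps bytes below 0x10 at two digits, e.g. 0x0A becomes "%0A".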

To percent-decode a byte sequence input, run these steps:

Using anything but UTF-8 decode without BOM when input contains bytes that are not ASCII bytes might be insecure and is not recommended.

  1. Let output be an empty byte sequence.

  2. For each byte byte in input:

    1. If byte is not 0x25 (%), then append byte to output.

    2. Otherwise, if byte is 0x25 (%) and the next two bytes after byte in input are not in the ranges 0x30 (0) to 0x39 (9), 0x41 (A) to 0x46 (F), and 0x61 (a) to 0x66 (f), all inclusive, append byte to output.

    3. Otherwise:

      1. Let bytePoint be the two bytes after byte in input, decoded, and then interpreted as a hexadecimal number.

      2. Append a byte whose value is bytePoint to output.

      3. Skip the next two bytes in input.

  3. Return output.

To percent-decode a scalar value string input:

  1. Let bytes be the UTF-8 encoding of input.

  2. Return the percent-decoding of bytes.

In general, percent-encoding results in a string with more U+0025 (%) code points than the input, and percent-decoding results in a byte sequence with fewer 0x25 (%) bytes than the input.
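The two percent-decode algorithms above can be sketched as follows, representing byte sequences as Uint8Array values and using the standard TextEncoder for the UTF-8 encoding step. The function names are illustrative.

```javascript
// True if b is the byte value of an ASCII hex digit: 0-9, A-F, or a-f.
function isAsciiHexDigitByte(b) {
  return (b >= 0x30 && b <= 0x39) || (b >= 0x41 && b <= 0x46) || (b >= 0x61 && b <= 0x66);
}

// Percent-decode a byte sequence; a "%" not followed by two hex digits is kept as-is.
function percentDecodeBytes(input) {
  const output = [];
  for (let i = 0; i < input.length; i++) {
    const byte = input[i];
    if (byte === 0x25 && i + 2 < input.length &&
        isAsciiHexDigitByte(input[i + 1]) && isAsciiHexDigitByte(input[i + 2])) {
      // bytePoint: the two following bytes read as a hexadecimal number
      output.push(parseInt(String.fromCharCode(input[i + 1], input[i + 2]), 16));
      i += 2; // skip the two hex digits
    } else {
      output.push(byte);
    }
  }
  return Uint8Array.from(output);
}

// Percent-decode a scalar value string: UTF-8 encode it, then percent-decode the bytes.
function percentDecodeString(input) {
  return percentDecodeBytes(new TextEncoder().encode(input));
}

console.log(new TextDecoder().decode(percentDecodeString("%25%s%1G"))); // %%s%1G
```

The result is a byte sequence, not a string; decoding it back to text (as the last line does) is a separate step that the caller chooses.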


The C0 control percent-encode set is the C0 controls and all code points greater than U+007E (~).

The fragment percent-encode set is the C0 control percent-encode set and U+0020 SPACE, U+0022 ("), U+003C (<), U+003E (>), and U+0060 (`).

The query percent-encode set is the C0 control percent-encode set and U+0020 SPACE, U+0022 ("), U+0023 (#), U+003C (<), and U+003E (>).

The query percent-encode set cannot be defined in terms of the fragment percent-encode set due to the omission of U+0060 (`).

The special-query percent-encode set is the query percent-encode set and U+0027 (').

The path percent-encode set is the query percent-encode set and U+003F (?), U+0060 (`), U+007B ({), and U+007D (}).

The userinfo percent-encode set is the path percent-encode set and U+002F (/), U+003A (:), U+003B (;), U+003D (=), U+0040 (@), U+005B ([) to U+005E (^), inclusive, and U+007C (|).

The component percent-encode set is the userinfo percent-encode set and U+0024 ($) to U+0026 (&), inclusive, U+002B (+), and U+002C (,).

This is used by HTML for registerProtocolHandler(), and could also be used by other standards to percent-encode data that can then be embedded in a URL’s path, query, or fragment; or in an opaque host. Using it with UTF-8 percent-encode gives identical results to JavaScript’s encodeURIComponent() [sic]. [HTML] [ECMA-262]

The application/x-www-form-urlencoded percent-encode set is the component percent-encode set and U+0021 (!), U+0027 (') to U+0029 RIGHT PARENTHESIS, inclusive, and U+007E (~).

The application/x-www-form-urlencoded percent-encode set contains all code points, except the ASCII alphanumeric, U+002A (*), U+002D (-), U+002E (.), and U+005F (_).
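One way to sketch these sets is as predicates over code points, each built from the previous one exactly as the definitions above chain them. The utf8PercentEncode helper is an illustrative shortcut for UTF-8 percent-encode: it relies on every set including all non-ASCII code points, so each UTF-8 byte can be tested directly as its isomorphic code point. With the component set it should agree with encodeURIComponent() on well-formed input, as the note above states.

```javascript
// Each percent-encode set is modelled as a predicate: does the set contain cp?
const inC0ControlSet = (cp) => cp <= 0x1f || cp > 0x7e;

// Returns a predicate for `base` plus the listed ASCII code points.
function withAscii(base, chars) {
  const extra = new Set(Array.from(chars, (ch) => ch.codePointAt(0)));
  return (cp) => base(cp) || extra.has(cp);
}

const inFragmentSet = withAscii(inC0ControlSet, ' "<>`');
const inQuerySet = withAscii(inC0ControlSet, ' "#<>');
const inSpecialQuerySet = withAscii(inQuerySet, "'");
const inPathSet = withAscii(inQuerySet, "?`{}");
const inUserinfoSet = withAscii(inPathSet, "/:;=@[\\]^|");
const inComponentSet = withAscii(inUserinfoSet, "$%&+,");
const inFormUrlencodedSet = withAscii(inComponentSet, "!'()~");

// UTF-8 percent-encode: encode to UTF-8, then percent-encode each byte whose
// isomorphic code point is in the set. Bytes >= 0x80 are in every set above.
function utf8PercentEncode(input, inSet) {
  let output = "";
  for (const byte of new TextEncoder().encode(input)) {
    output += inSet(byte)
      ? "%" + byte.toString(16).toUpperCase().padStart(2, "0")
      : String.fromCharCode(byte);
  }
  return output;
}

console.log(utf8PercentEncode("Say what‽", inUserinfoSet)); // Say%20what%E2%80%BD
console.log(utf8PercentEncode("a&b=c d", inComponentSet) === encodeURIComponent("a&b=c d")); // true
```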

To percent-encode after encoding, given an encoding encoding, scalar value string input, a percentEncodeSet, and an optional boolean spaceAsPlus (default false):

  1. Let encoder be the result of getting an encoder from encoding.

  2. Let inputQueue be input converted to an I/O queue.

  3. Let output be the empty string.

  4. Let potentialError be 0.

    This needs to be a non-null value to initiate the subsequent while loop.

  5. While potentialError is non-null:

    1. Let encodeOutput be an empty I/O queue.

    2. Set potentialError to the result of running encode or fail with inputQueue, encoder, and encodeOutput.

    3. For each byte of encodeOutput converted to a byte sequence:

      1. If spaceAsPlus is true and byte is 0x20 (SP), then append U+002B (+) to output and continue.

      2. Let isomorph be a code point whose value is byte’s value.

      3. Assert: percentEncodeSet includes all non-ASCII code points.

      4. If isomorph is not in percentEncodeSet, then append isomorph to output.

      5. Otherwise, percent-encode byte and append the result to output.

    4. If potentialError is non-null, then append "%26%23", followed by the shortest sequence of ASCII digits representing potentialError in base ten, followed by "%3B", to output.

      This can happen when encoding is not UTF-8.

  6. Return output.

Of the possible values for the percentEncodeSet argument, only two end up encoding U+0025 (%) and thus give “roundtripable data”: the component percent-encode set and the application/x-www-form-urlencoded percent-encode set. The other values for the percentEncodeSet argument, which happen to be used by the URL parser, leave U+0025 (%) untouched; as such, it needs to be percent-encoded first in order to be properly represented.
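The loop above can be sketched with a stand-in encoder. Assumption: instead of the Encoding Standard's encoder objects, encodeCodePoint is a hypothetical callback that returns an array of byte values, or null when the code point cannot be represented (the role played by encode or fail returning an error). Processing one code point at a time matches the algorithm for stateless encodings such as Shift_JIS, but not for stateful ones such as ISO-2022-JP, whose escape sequences depend on state carried across code points. The asciiOnly encoder is a toy for illustration only.

```javascript
// encodeCodePoint(cp) -> array of byte values, or null if cp is unencodable
// (a hypothetical interface, not the Encoding Standard's API).
function percentEncodeAfterEncoding(encodeCodePoint, input, inPercentEncodeSet, spaceAsPlus = false) {
  let output = "";
  for (const ch of input) { // iterates by code point
    const codePoint = ch.codePointAt(0);
    const bytes = encodeCodePoint(codePoint);
    if (bytes === null) {
      // potentialError: "&#", the decimal code point, ";", all percent-encoded
      output += "%26%23" + codePoint.toString(10) + "%3B";
      continue;
    }
    for (const byte of bytes) {
      if (spaceAsPlus && byte === 0x20) {
        output += "+";
        continue;
      }
      // The isomorph is the code point with the same value as byte.
      output += inPercentEncodeSet(byte)
        ? "%" + byte.toString(16).toUpperCase().padStart(2, "0")
        : String.fromCharCode(byte);
    }
  }
  return output;
}

// Toy encoder for illustration only: ASCII encodes to itself, anything else fails.
const asciiOnly = (cp) => (cp < 0x80 ? [cp] : null);

// The userinfo percent-encode set as a predicate.
const userinfoAscii = new Set(Array.from(' "#<>?`{}/:;=@[\\]^|', (c) => c.codePointAt(0)));
const inUserinfoPercentEncodeSet = (cp) => cp <= 0x1f || cp > 0x7e || userinfoAscii.has(cp);

console.log(percentEncodeAfterEncoding(asciiOnly, "a ‽", inUserinfoPercentEncodeSet)); // a%20%26%238253%3B
```

U+203D (‽) is 8253 in decimal, which is where "%26%238253%3B" in the examples that follow comes from.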

To UTF-8 percent-encode a scalar value scalarValue using a percentEncodeSet, return the result of running percent-encode after encoding with UTF-8, scalarValue as a string, and percentEncodeSet.

To UTF-8 percent-encode a scalar value string input using a percentEncodeSet, return the result of running percent-encode after encoding with UTF-8, input, and percentEncodeSet.


Here is a summary, by way of example, of the operations defined above:

Operation | Input | Output
Percent-encode input | 0x23 | "%23"
Percent-encode input | 0x7F | "%7F"
Percent-decode input | `%25%s%1G` | `%%s%1G`
Percent-decode input | "‽%25%2E" | 0xE2 0x80 0xBD 0x25 0x2E
Percent-encode after encoding with Shift_JIS, input, and the userinfo percent-encode set | " " | "%20"
Percent-encode after encoding with Shift_JIS, input, and the userinfo percent-encode set | "≡" | "%81%DF"
Percent-encode after encoding with Shift_JIS, input, and the userinfo percent-encode set | "‽" | "%26%238253%3B"
Percent-encode after encoding with ISO-2022-JP, input, and the userinfo percent-encode set | "¥" | "%1B(J\%1B(B"
Percent-encode after encoding with Shift_JIS, input, the userinfo percent-encode set, and true | "1+1 ≡ 2%20‽" | "1+1+%81%DF+2%20%26%238253%3B"
UTF-8 percent-encode input using the userinfo percent-encode set | U+2261 (≡) | "%E2%89%A1"
UTF-8 percent-encode input using the userinfo percent-encode set | U+203D (‽) | "%E2%80%BD"
UTF-8 percent-encode input using the userinfo percent-encode set | "Say what‽" | "Say%20what%E2%80%BD"

2. Security considerations

The security of a URL is a function of its environment. Care is to be taken when rendering, interpreting, and passing URLs around.

When rendering and allocating new URLs, "spoofing" needs to be considered: an attack whereby one host or URL can be confused for another. For instance, consider how 1/l/I, m/rn/rri, 0/O, and а/a can all appear eerily similar. Or worse, consider how U+202A LEFT-TO-RIGHT EMBEDDING and similar code points are invisible. [UTR36]

When passing a URL from party A to B, both need to carefully consider what is happening. A might end up leaking data it does not want to leak. B might receive input it did not expect and take an action that harms the user. In particular, B should never trust A, as at some point URLs from A can come from untrusted sources.

3. Hosts (domains and IP addresses)

At a high level, a host, valid host string, host parser, and host serializer relate as follows:

  - The host parser takes a scalar value string and returns either failure or a host.

  - A host can be seen as the in-memory representation.

  - A valid host string defines what input would not trigger a validation error or failure when given to the host parser.

  - The host serializer takes a host and returns an ASCII string.

A parse-serialize roundtrip gives the following results, depending on the isNotSpecial argument to the host parser:

Input | Output (isNotSpecial = false) | Output (isNotSpecial = true)
EXAMPLE.COM | example.com (domain) | EXAMPLE.COM (opaque host)
example%2Ecom | example.com (domain) | example%2Ecom (opaque host)
faß.example | xn--fa-hia.example (domain) | fa%C3%9F.example (opaque host)
0 | 0.0.0.0 (IPv4) | 0 (opaque host)
%30 | 0.0.0.0 (IPv4) | %30 (opaque host)
0x | 0.0.0.0 (IPv4) | 0x (opaque host)
0xffffffff | 255.255.255.255 (IPv4) | 0xffffffff (opaque host)
[0:0::1] | [::1] (IPv6) | [::1] (IPv6)
[0:0::1%5D | Failure | Failure
[0:0::%31] | Failure | Failure
09 | Failure | 09 (opaque host)
example.255 | Failure | example.255 (opaque host)
example^example | Failure | Failure
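Through the URL API, a special scheme such as https corresponds to running the host parser with isNotSpecial false, and a non-special scheme corresponds to isNotSpecial true. The sketch below assumes a runtime whose URL implementation follows this standard; "foo" is an arbitrary non-special scheme chosen for illustration.

```javascript
// Print the hostname produced for a special ("https") and a non-special ("foo") scheme.
for (const host of ["EXAMPLE.COM", "faß.example", "0xffffffff", "[0:0::1]"]) {
  const special = new URL(`https://${host}/`).hostname;
  const notSpecial = new URL(`foo://${host}/`).hostname;
  console.log(`${host}\t${special}\t${notSpecial}`);
}
```

Inputs from the Failure rows would make the URL constructor throw instead, so they are left out of the loop.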

3.1. Host representation

A host is a domain, an IP address, an opaque host, or an empty host. Typically a host serves as a network address, but it is sometimes used as an opaque identifier in URLs where a network address is not necessary.

A typical URL whose host is an opaque host is git://github.com/whatwg/url.git.

The RFCs referenced in the paragraphs below are for informative purposes only. They have no influence on host writing, parsing, and serialization, unless stated otherwise in the sections that follow.

A domain is a non-empty ASCII string that identifies a realm within a network. [RFC1034]

The domain labels of a domain domain are the result of strictly splitting domain on U+002E (.).

The example.com and example.com. domains are not equivalent and typically treated as distinct.

An IP address is an IPv4 address or an IPv6 address.

An IPv4 address is a 32-bit unsigned integer that identifies a network address. [RFC791]

An IPv6 address is a 128-bit unsigned integer that identifies a network address. For the purposes of this standard it is represented as a list of eight 16-bit unsigned integers, also known as IPv6 pieces. [RFC4291]

Support for <zone_id> is intentionally omitted.

An opaque host is a non-empty ASCII string that can be used for further processing.

An empty host is the empty string.
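A short sketch of the numeric representations described above, assuming an IPv4 address held in a JavaScript number and an IPv6 address held as a 128-bit BigInt; the helper names are illustrative. It only splits the integers into octets or IPv6 pieces and is not the host serializer.

```javascript
// Split a 32-bit unsigned IPv4 address into its four octets, most significant first.
function ipv4ToOctets(address) {
  const n = address >>> 0; // coerce to an unsigned 32-bit integer
  return [n >>> 24, (n >>> 16) & 0xff, (n >>> 8) & 0xff, n & 0xff];
}

// Split a 128-bit IPv6 address into eight 16-bit IPv6 pieces, most significant first.
function ipv6PiecesFromBigInt(value) {
  const pieces = [];
  for (let i = 7; i >= 0; i--) {
    pieces.push(Number((value >> BigInt(16 * i)) & 0xffffn));
  }
  return pieces;
}

console.log(ipv4ToOctets(0x7f000001).join(".")); // 127.0.0.1
console.log(ipv6PiecesFromBigInt(1n));           // [0, 0, 0, 0, 0, 0, 0, 1]
```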

3.2. Host miscellaneous

A forbidden host code point is U+0000 NULL, U+0009 TAB, U+000A LF, U+000D CR, U+0020 SPACE, U+0023 (#), U+002F (/), U+003A (:), U+003C (<), U+003E (>), U+003F (?), U+0040 (@), U+005B ([), U+005C (\), U+005D (]), U+005E (^), or U+007C (|).

A forbidden domain code point is a forbidden host code point, a C0 control, U+0025 (%), or U+007F DELETE.
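The two definitions above translate directly into predicates; this sketch uses illustrative function names and takes code points as integers.

```javascript
// Forbidden host code points, as listed above.
const FORBIDDEN_HOST_CODE_POINTS = new Set([
  0x0000, 0x0009, 0x000a, 0x000d, 0x0020, 0x0023, 0x002f, 0x003a, 0x003c,
  0x003e, 0x003f, 0x0040, 0x005b, 0x005c, 0x005d, 0x005e, 0x007c,
]);

function isForbiddenHostCodePoint(cp) {
  return FORBIDDEN_HOST_CODE_POINTS.has(cp);
}

// Adds C0 controls (U+0000 to U+001F), U+0025 (%), and U+007F DELETE.
function isForbiddenDomainCodePoint(cp) {
  return isForbiddenHostCodePoint(cp) || cp <= 0x1f || cp === 0x25 || cp === 0x7f;
}

const hasForbiddenDomainCodePoint = (s) =>
  Array.from(s).some((ch) => isForbiddenDomainCodePoint(ch.codePointAt(0)));

console.log(hasForbiddenDomainCodePoint("exa#mple.org")); // true
console.log(hasForbiddenDomainCodePoint("example.org"));  // false
```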

To obtain the public suffix of a host host, run these steps. They return null or a domain representing a portion of host that is included on the Public Suffix List. [PSL]

  1. If host is not a domain, then return null.

  2. Let trailingDot be "." if host ends with "."; otherwise the empty string.

  3. Let publicSuffix be the public suffix determined by running the Public Suffix List algorithm with host as domain. [PSL]

  4. Assert: publicSuffix is an ASCII string that does not end with ".".

  5. Return publicSuffix and trailingDot concatenated.

To obtain the registrable domain of a host host, run these steps. They return null or a domain formed by host’s public suffix and the domain label preceding it, if any.

  1. If host’s public suffix is null or host’s public suffix equals host, then return null.

  2. Let trailingDot be "." if host ends with "."; otherwise the empty string.

  3. Let registrableDomain be the registrable domain determined by running the Public Suffix List algorithm with host as domain. [PSL]

  4. Assert: registrableDomain is an ASCII string that does not end with ".".

  5. Return registrableDomain and trailingDot concatenated.

Host input | Public suffix | Registrable domain
com | com | null
example.com | com | example.com
www.example.com | com | example.com
sub.www.example.com | com | example.com
EXAMPLE.COM | com | example.com
example.com. | com. | example.com.
github.io | github.io | null
whatwg.github.io | github.io | whatwg.github.io
إختبار | xn--kgbechtv | null
example.إختبار | xn--kgbechtv | example.xn--kgbechtv
sub.example.إختبار | xn--kgbechtv | example.xn--kgbechtv
[2001:0db8:85a3:0000:0000:8a2e:0370:7334] | null | null
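A minimal sketch of the two algorithms above, focusing on the trailing-dot handling and the null cases. Assumptions: hosts arrive already parsed and serialized (so domains are lowercase ASCII), HYPOTHETICAL_RULES is a tiny made-up stand-in for the real Public Suffix List, and wildcard and exception rules are not supported; only the PSL's implicit "*" default (an unlisted top-level label is a public suffix) is modelled.

```javascript
// A tiny, made-up rule set standing in for the Public Suffix List.
const HYPOTHETICAL_RULES = new Set(["com", "io", "github.io"]);

// Index of the first label of the longest matching rule; default rule "*".
function publicSuffixStart(labels) {
  for (let i = 0; i < labels.length; i++) {
    if (HYPOTHETICAL_RULES.has(labels.slice(i).join("."))) return i;
  }
  return labels.length - 1;
}

function splitHost(host) {
  const trailingDot = host.endsWith(".") ? "." : "";
  const labels = (trailingDot ? host.slice(0, -1) : host).split(".");
  return { trailingDot, labels };
}

function obtainPublicSuffix(host) {
  // Crude stand-in for "host is not a domain": only bracketed IPv6 is detected.
  if (host.startsWith("[")) return null;
  const { trailingDot, labels } = splitHost(host);
  return labels.slice(publicSuffixStart(labels)).join(".") + trailingDot;
}

function obtainRegistrableDomain(host) {
  const publicSuffix = obtainPublicSuffix(host);
  if (publicSuffix === null || publicSuffix === host) return null;
  const { trailingDot, labels } = splitHost(host);
  // The public suffix plus the one label preceding it.
  return labels.slice(publicSuffixStart(labels) - 1).join(".") + trailingDot;
}

console.log(obtainPublicSuffix("example.com."), obtainRegistrableDomain("example.com.")); // com. example.com.
```

Real code would consult an up-to-date copy of the Public Suffix List rather than a hard-coded set, which is also why, as the next note explains, results can differ between clients.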

Specifications should prefer the origin concept for security decisions. The notion of "public suffix" and "registrable domain" cannot be relied upon to provide a hard security boundary, as the public suffix list will diverge from client to client. Specifications which ignore this advice are encouraged to carefully consider whether URLs' schemes ought to be incorporated into any decisions made, i.e. whether to use the same site or schemelessly same site concepts.

3.3. IDNA

The domain to ASCII algorithm, given a string domain and a boolean beStrict, runs these steps:

  1. Let result be the result of running Unicode ToASCII with domain_name set to domain, UseSTD3ASCIIRules set to beStrict, CheckHyphens set to false, CheckBidi set to true, CheckJoiners set to true, Transitional_Processing set to false, and VerifyDnsLength set to beStrict. [UTS46]

    If beStrict is false, domain is an ASCII string, and strictly splitting domain on U+002E (.) does not produce any item that starts with an ASCII case-insensitive match for "xn--", this step is equivalent to ASCII lowercasing domain.

  2. If result is a failure value, domain-to-ASCII validation error, return failure.

  3. If result is the empty string, domain-to-ASCII validation error, return failure.

  4. Return result.

This document and the web platform at large use Unicode IDNA Compatibility Processing and not IDNA2008. For instance, ☕.example becomes xn--53h.example and not failure. [UTS46] [RFC5890]
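The shortcut described in the note under step 1 can be sketched as a guard that either returns the ASCII-lowercased domain or reports that full Unicode ToASCII processing is needed. The function name and the null return are illustrative, and the UTS #46 processing itself is deliberately not implemented here.

```javascript
// Returns the ASCII-lowercased domain when the shortcut applies, or null when
// full Unicode ToASCII processing (not shown) would be required.
function domainToAsciiShortcut(domain, beStrict) {
  if (beStrict) return null;
  if (!/^[\x00-\x7F]*$/.test(domain)) return null; // not an ASCII string
  const labels = domain.split(".");                // strictly split on U+002E (.)
  if (labels.some((label) => label.toLowerCase().startsWith("xn--"))) return null;
  return domain.toLowerCase();                     // only A-Z can change here
}

console.log(domainToAsciiShortcut("EXAMPLE.COM", false)); // example.com
console.log(domainToAsciiShortcut("☕.example", false));   // null
```

Steps 2 and 3 still apply to the shortcut's result; for example, an empty domain would still end in failure.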

The domain to Unicode algorithm, given a domain domain and a boolean beStrict, runs these steps:

  1. Let result be the result of running Unicode ToUnicode with domain_name set to domain, CheckHyphens set to false, CheckBidi set to true, CheckJoiners set to true, UseSTD3ASCIIRules set to beStrict, and Transitional_Processing set to false. [UTS46]

  2. Signify domain-to-Unicode validation errors for any returned errors, and then, return result.

3.4. Host writing

A valid host string must be a valid domain string, a valid IPv4-address string, or: U+005B ([), followed by a valid IPv6-address string, followed by U+005D (]).

A domain is a valid domain if these steps return success:

  1. Let result be the result of running domain to ASCII with domain and true.

  2. If result is failure, then return failure.

  3. Set result to the result of running domain to Unicode with result and true.

  4. If result contains any errors, return failure.

  5. Return success.

Ideally we define this in terms of a sequence of code points that make up a valid domain rather than through a whack-a-mole: issue 245.

A valid domain string must be a string that is a valid domain.

A valid IPv4-address string must be four shortest possible strings of ASCII digits, representing a decimal number in the range 0 to 255, inclusive, separated from each other by U+002E (.).
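A sketch of the valid IPv4-address string check (the function name is illustrative). "Shortest possible" rules out leading zeros, so each part must be "0" or start with a non-zero digit.

```javascript
// True if input is exactly four shortest-form decimal numbers 0-255 joined by ".".
function isValidIPv4AddressString(input) {
  const parts = input.split(".");
  return parts.length === 4 &&
    parts.every((part) => /^(0|[1-9][0-9]*)$/.test(part) && Number(part) <= 255);
}

console.log(isValidIPv4AddressString("127.0.0.1"));  // true
console.log(isValidIPv4AddressString("127.0.0.01")); // false
```

Strings such as "127.1" or "0x7f.0.0.1" fail this check even though the IPv4 parser accepts them (with validation errors).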

A valid IPv6-address string is defined in the "Text Representation of Addresses" chapter of IP Version 6 Addressing Architecture. [RFC4291]

A valid opaque-host string must be one of the following:

  - one or more URL units excluding forbidden host code points

  - U+005B ([), followed by a valid IPv6-address string, followed by U+005D (]).

This is not part of the definition of valid host string as it requires context to be distinguished.

3.5. Host parsing

The host parser takes a scalar value string input with an optional boolean isNotSpecial (default false), and then runs these steps. They return failure or a host.

  1. If input starts with U+005B ([), then:

    1. If input does not end with U+005D (]), IPv6-unclosed validation error, return failure.

    2. Return the result of IPv6 parsing input with its leading U+005B ([) and trailing U+005D (]) removed.

  2. If isNotSpecial is true, then return the result of opaque-host parsing input.

  3. Assert: input is not the empty string.

  4. Let domain be the result of running UTF-8 decode without BOM on the percent-decoding of input.

    Alternatively UTF-8 decode without BOM or fail can be used, coupled with an early return for failure, as domain to ASCII fails on U+FFFD (�).

  5. Let asciiDomain be the result of running domain to ASCII with domain and false.

  6. If asciiDomain is failure, then return failure.

  7. If asciiDomain contains a forbidden domain code point, domain-invalid-code-point validation error, return failure.

  8. If asciiDomain ends in a number, then return the result of IPv4 parsing asciiDomain.

  9. Return asciiDomain.
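The host parser's control flow can be sketched as below. The sub-algorithms it calls are defined elsewhere, so this sketch receives them through a hypothetical helpers object instead of implementing them, and FAILURE is an illustrative stand-in for failure. For step 4 it uses TextDecoder with ignoreBOM: true, which keeps a leading byte order mark rather than stripping it and replaces invalid sequences with U+FFFD, matching UTF-8 decode without BOM.

```javascript
const FAILURE = Symbol("failure");

// helpers (hypothetical): ipv6Parse, opaqueHostParse, percentDecodeString,
// domainToAscii, isForbiddenDomainCodePoint, endsInANumber, ipv4Parse.
function hostParse(input, isNotSpecial, helpers) {
  // Step 1: bracketed input is an IPv6 address.
  if (input.startsWith("[")) {
    if (!input.endsWith("]")) return FAILURE; // IPv6-unclosed validation error
    return helpers.ipv6Parse(input.slice(1, -1));
  }
  // Step 2: non-special URLs get an opaque host.
  if (isNotSpecial) return helpers.opaqueHostParse(input);
  // Step 3.
  if (input === "") throw new Error("assertion failed: input is not the empty string");
  // Step 4: percent-decode, then UTF-8 decode without BOM.
  const domain = new TextDecoder("utf-8", { ignoreBOM: true })
    .decode(helpers.percentDecodeString(input));
  // Steps 5 and 6.
  const asciiDomain = helpers.domainToAscii(domain, false);
  if (asciiDomain === FAILURE) return FAILURE;
  // Step 7: domain-invalid-code-point validation error.
  for (const ch of asciiDomain) {
    if (helpers.isForbiddenDomainCodePoint(ch.codePointAt(0))) return FAILURE;
  }
  // Steps 8 and 9.
  if (helpers.endsInANumber(asciiDomain)) return helpers.ipv4Parse(asciiDomain);
  return asciiDomain;
}
```

With conforming helpers plugged in, hostParse("EXAMPLE.COM", false, helpers) would yield "example.com", in line with the roundtrip table earlier in this section.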


The ends in a number checker takes an ASCII string input and then runs these steps. They return a boolean.

  1. Let parts be the result of strictly splitting input on U+002E (.).

  2. If the last item in parts is the empty string, then:

    1. If parts’s size is 1, then return false.

    2. Remove the last item from parts.

  3. Let last be the last item in parts.

  4. If last is non-empty and contains only ASCII digits, then return true.

    The erroneous input "09" will be caught by the IPv4 parser at a later stage.

  5. If parsing last as an IPv4 number does not return failure, then return true.

    This is equivalent to checking that last is "0X" or "0x", followed by zero or more ASCII hex digits.

  6. Return false.
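A sketch of the ends in a number checker (the function name is illustrative). Rather than calling the IPv4 number parser, step 5 uses the equivalence stated in the note: "0x" or "0X" followed by zero or more ASCII hex digits.

```javascript
function endsInANumber(input) {
  const parts = input.split("."); // strictly split on U+002E (.)
  if (parts[parts.length - 1] === "") {
    if (parts.length === 1) return false;
    parts.pop();
  }
  const last = parts[parts.length - 1];
  if (/^[0-9]+$/.test(last)) return true;             // step 4: non-empty, ASCII digits only
  if (/^0[xX][0-9A-Fa-f]*$/.test(last)) return true;  // step 5, via the note's equivalence
  return false;
}

console.log(endsInANumber("example.255")); // true
console.log(endsInANumber("example.com")); // false
```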

The IPv4 parser takes an ASCII string input and then runs these steps. They return failure or an IPv4 address.

The IPv4 parser is not to be invoked directly. Instead check that the return value of the host parser is an IPv4 address.

  1. Let parts be the result of strictly splitting input on U+002E (.).

  2. If the last item in parts is the empty string, then:

    1. IPv4-empty-part validation error.

    2. If parts’s size is greater than 1, then remove the last item from parts.

  3. If parts’s size is greater than 4, IPv4-too-many-parts validation error, return failure.

  4. Let numbers be an empty list.

  5. For each part of parts:

    1. Let result be the result of parsing part.

    2. If result is failure, IPv4-non-numeric-part validation error, return failure.

    3. If result[1] is true, IPv4-non-decimal-part validation error.

    4. Append result[0] to numbers.

  6. If any item in numbers is greater than 255, IPv4-out-of-range-part validation error.

  7. If any but the last item in numbers is greater than 255, then return failure.

  8. If the last item in numbers is greater than or equal to $256^{(5 - \text{numbers’s size})}$, then return failure.

  9. Let ipv4 be the last item in numbers.

  10. Remove the last item from numbers.

  11. Let counter be 0.

  12. For each n of numbers:

    1. Increment ipv4 by $n \times 256^{(3 - \text{counter})}$.

    2. Increment counter by 1.

  13. Return ipv4.
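A minimal Python sketch of the IPv4 parser steps above, for illustration only. Assumptions: the address is returned as a plain `int`, `None` models failure, and validation errors are appended as strings to a hypothetical `errors` list. The number parser is repeated so the block stands alone.

```python
from typing import List, Optional, Tuple

_RADIX_DIGITS = {8: "01234567", 10: "0123456789", 16: "0123456789abcdefABCDEF"}


def parse_ipv4_number(s: str) -> Optional[Tuple[int, bool]]:
    # Returns (value, validationError), or None for failure.
    if s == "":
        return None
    validation_error, radix = False, 10
    if len(s) >= 2 and s[:2] in ("0x", "0X"):
        validation_error, s, radix = True, s[2:], 16
    elif len(s) >= 2 and s[0] == "0":
        validation_error, s, radix = True, s[1:], 8
    if s == "":
        return (0, True)
    if any(ch not in _RADIX_DIGITS[radix] for ch in s):
        return None
    return (int(s, radix), validation_error)


def parse_ipv4(s: str, errors: Optional[List[str]] = None) -> Optional[int]:
    """Sketch of the IPv4 parser: a 32-bit integer, or None for failure."""
    if errors is None:
        errors = []
    parts = s.split(".")
    if parts[-1] == "":
        errors.append("IPv4-empty-part")
        if len(parts) > 1:
            parts.pop()
    if len(parts) > 4:
        errors.append("IPv4-too-many-parts")
        return None
    numbers = []
    for part in parts:
        result = parse_ipv4_number(part)
        if result is None:
            errors.append("IPv4-non-numeric-part")
            return None
        if result[1]:
            errors.append("IPv4-non-decimal-part")
        numbers.append(result[0])
    if any(n > 255 for n in numbers):
        errors.append("IPv4-out-of-range-part")
    if any(n > 255 for n in numbers[:-1]):
        return None
    # The last number may fill every byte not claimed by the earlier parts.
    if numbers[-1] >= 256 ** (5 - len(numbers)):
        return None
    ipv4 = numbers.pop()
    for counter, n in enumerate(numbers):
        ipv4 += n * 256 ** (3 - counter)
    return ipv4
```

For instance, `parse_ipv4("0x7f.1")` yields the same address as `"127.0.0.1"` (2130706433) while recording an `IPv4-non-decimal-part` validation error.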

The IPv4 number parser takes an ASCII string input and then runs these steps. They return failure or a tuple of a number and a boolean.

  1. If input is the empty string, then return failure.

  2. Let validationError be false.

  3. Let R be 10.

  4. If input contains at least two code points and the first two code points are either "0X" or "0x", then:

    1. Set validationError to true.

    2. Remove the first two code points from input.

    3. Set R to 16.

  5. Otherwise, if input contains at least two code points and the first code point is U+0030 (0), then:

    1. Set validationError to true.

    2. Remove the first code point from input.

    3. Set R to 8.

  6. If input is the empty string, then return (0, true).

  7. If input contains a code point that is not a radix-R digit, then return failure.

  8. Let output be the mathematical integer value that is represented by input in radix-R notation, using ASCII hex digits for digits with values 0 through 15.

  9. Return (output, validationError).
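The number parser on its own, as a short Python sketch. The function name is hypothetical and `None` stands in for failure; the returned tuple mirrors (output, validationError).

```python
from typing import Optional, Tuple

# Digits permitted for each radix R the parser can select.
_RADIX_DIGITS = {8: "01234567", 10: "0123456789", 16: "0123456789abcdefABCDEF"}


def parse_ipv4_number(s: str) -> Optional[Tuple[int, bool]]:
    """Sketch of the IPv4 number parser."""
    if s == "":
        return None
    validation_error = False
    radix = 10
    if len(s) >= 2 and s[:2] in ("0x", "0X"):
        validation_error = True
        s = s[2:]
        radix = 16
    elif len(s) >= 2 and s[0] == "0":
        validation_error = True
        s = s[1:]
        radix = 8
    if s == "":
        return (0, True)
    # Reject anything that is not a radix-R digit before handing off to int().
    if any(ch not in _RADIX_DIGITS[radix] for ch in s):
        return None
    return (int(s, radix), validation_error)
```

Checking the digits explicitly before calling `int()` matters: `int()` alone would also accept inputs such as `"1_0"`, `" 10"`, or `"+10"`, which this parser must reject.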


The IPv6 parser takes a scalar value string input and then runs these steps. They return failure or an IPv6 address.

The IPv6 parser could in theory be invoked directly, but please discuss actually doing that with the editors of this document first.

  1. Let address be a new IPv6 address whose IPv6 pieces are all 0.

  2. Let pieceIndex be 0.

  3. Let compress be null.

  4. Let pointer be a pointer for input.

  5. If c is U+003A (:), then:

    1. If remaining does not start with U+003A (:), IPv6-invalid-compression validation error, return failure.

    2. Increase pointer by 2.

    3. Increase pieceIndex by 1 and then set compress to pieceIndex.

  6. While c is not the EOF code point:

    1. If pieceIndex is 8, IPv6-too-many-pieces validation error, return failure.

    2. If c is U+003A (:), then:

      1. If compress is non-null, IPv6-multiple-compression validation error, return failure.

      2. Increase pointer and pieceIndex by 1, set compress to pieceIndex, and then continue.

    3. Let value and length be 0.

    4. While length is less than 4 and c is an ASCII hex digit, set value to value × 0x10 + c interpreted as hexadecimal number, and increase pointer and length by 1.

    5. If c is U+002E (.), then:

      1. If length is 0, IPv4-in-IPv6-invalid-code-point validation error, return failure.

      2. Decrease pointer by length.

      3. If pieceIndex is greater than 6, IPv4-in-IPv6-too-many-pieces validation error, return failure.

      4. Let numbersSeen be 0.

      5. While c is not the EOF code point:

        1. Let ipv4Piece be null.

        2. If numbersSeen is greater than 0, then:

          1. If c is a U+002E (.) and numbersSeen is less than 4, then increase pointer by 1.

          2. Otherwise, IPv4-in-IPv6-invalid-code-point validation error, return failure.

        3. If c is not an ASCII digit, IPv4-in-IPv6-invalid-code-point validation error, return failure.

        4. While c is an ASCII digit:

          1. Let number be c interpreted as decimal number.

          2. If ipv4Piece is null, then set ipv4Piece to number.

            Otherwise, if ipv4Piece is 0, IPv4-in-IPv6-invalid-code-point validation error, return failure.

            Otherwise, set ipv4Piece to ipv4Piece × 10 + number.

          3. If ipv4Piece is greater than 255, IPv4-in-IPv6-out-of-range-part validation error, return failure.

          4. Increase pointer by 1.

        5. Set address[pieceIndex] to address[pieceIndex] × 0x100 + ipv4Piece.

        6. Increase numbersSeen by 1.

        7. If numbersSeen is 2 or 4, then increase pieceIndex by 1.

      6. If numbersSeen is not 4, IPv4-in-IPv6-too-few-parts validation error, return failure.

      7. Break.

    6. Otherwise, if c is U+003A (:):

      1. Increase pointer by 1.

      2. If c is the EOF code point, IPv6-invalid-code-point validation error, return failure.

    7. Otherwise, if c is not the EOF code point, IPv6-invalid-code-point validation error, return failure.

    8. Set address[pieceIndex] to value.

    9. Increase pieceIndex by 1.

  7. If compress is non-null, then:

    1. Let swaps be pieceIndex − compress.

    2. Set pieceIndex to 7.

    3. While pieceIndex is not 0 and swaps is greater than 0, swap address[pieceIndex] with address[compress + swaps − 1], and then decrease both pieceIndex and swaps by 1.

  8. Otherwise, if compress is null and pieceIndex is not 8, IPv6-too-few-pieces validation error, return failure.

  9. Return address.
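The IPv6 parser above can be transcribed into Python roughly as follows. This is a sketch, not a reference implementation: it assumes the address is a list of eight integers, `None` models failure, the empty string `""` stands in for the EOF code point, and validation error names appear only as comments. The function and helper names are hypothetical.

```python
from typing import List, Optional

EOF = ""  # stands in for the EOF code point
_HEX = "0123456789abcdefABCDEF"
_DIGITS = "0123456789"


def _is_hex(ch: str) -> bool:
    # Guard against EOF: "" is a substring of every Python string.
    return ch != EOF and ch in _HEX


def _is_digit(ch: str) -> bool:
    return ch != EOF and ch in _DIGITS


def parse_ipv6(s: str) -> Optional[List[int]]:
    """Sketch of the IPv6 parser: eight 16-bit pieces, or None for failure."""
    address = [0] * 8
    piece_index = 0
    compress = None
    p = 0

    def c() -> str:
        return s[p] if p < len(s) else EOF

    if c() == ":":
        if not s.startswith(":", p + 1):
            return None  # IPv6-invalid-compression
        p += 2
        piece_index += 1
        compress = piece_index
    while c() != EOF:
        if piece_index == 8:
            return None  # IPv6-too-many-pieces
        if c() == ":":
            if compress is not None:
                return None  # IPv6-multiple-compression
            p += 1
            piece_index += 1
            compress = piece_index
            continue
        value = length = 0
        while length < 4 and _is_hex(c()):
            value = value * 0x10 + int(c(), 16)
            p += 1
            length += 1
        if c() == ".":
            if length == 0:
                return None  # IPv4-in-IPv6-invalid-code-point
            p -= length
            if piece_index > 6:
                return None  # IPv4-in-IPv6-too-many-pieces
            numbers_seen = 0
            while c() != EOF:
                ipv4_piece = None
                if numbers_seen > 0:
                    if c() == "." and numbers_seen < 4:
                        p += 1
                    else:
                        return None  # IPv4-in-IPv6-invalid-code-point
                if not _is_digit(c()):
                    return None  # IPv4-in-IPv6-invalid-code-point
                while _is_digit(c()):
                    number = int(c())
                    if ipv4_piece is None:
                        ipv4_piece = number
                    elif ipv4_piece == 0:
                        return None  # leading zero: IPv4-in-IPv6-invalid-code-point
                    else:
                        ipv4_piece = ipv4_piece * 10 + number
                    if ipv4_piece > 255:
                        return None  # IPv4-in-IPv6-out-of-range-part
                    p += 1
                address[piece_index] = address[piece_index] * 0x100 + ipv4_piece
                numbers_seen += 1
                if numbers_seen in (2, 4):
                    piece_index += 1
            if numbers_seen != 4:
                return None  # IPv4-in-IPv6-too-few-parts
            break  # leaves the outer loop, skipping the two steps below
        elif c() == ":":
            p += 1
            if c() == EOF:
                return None  # IPv6-invalid-code-point
        elif c() != EOF:
            return None  # IPv6-invalid-code-point
        address[piece_index] = value
        piece_index += 1
    if compress is not None:
        # Shift the pieces after the "::" to the end of the address.
        swaps = piece_index - compress
        piece_index = 7
        while piece_index != 0 and swaps > 0:
            j = compress + swaps - 1
            address[piece_index], address[j] = address[j], address[piece_index]
            piece_index -= 1
            swaps -= 1
    elif piece_index != 8:
        return None  # IPv6-too-few-pieces
    return address
```

Usage: `parse_ipv6("::ffff:192.168.0.1")` returns `[0, 0, 0, 0, 0, 0xffff, 0xc0a8, 0x1]`; the dotted tail fills two pieces.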


The opaque-host parser takes a scalar value string input, and then runs these steps. They return failure or an opaque host.

  1. If input contains a forbidden host code point, host-invalid-code-point validation error, return failure.

  2. If input contains a code point that is not a URL code point and not U+0025 (%), invalid-URL-unit validation error.

  3. If input contains a U+0025 (%) and the two code points following it are not ASCII hex digits, invalid-URL-unit validation error.

  4. Return the result of running UTF-8 percent-encode on input using the C0 control percent-encode set.
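A compact Python sketch of the opaque-host parser. Assumptions, labeled: the forbidden host code point list is transcribed from the URL Standard's definition (which is not part of this section), `None` models failure, and steps 2 and 3 are omitted because they only report validation errors and never change the result.

```python
# Forbidden host code points (assumed from the URL Standard's definition):
# U+0000, TAB, LF, CR, SPACE, # / : < > ? @ [ \ ] ^ |
_FORBIDDEN_HOST = set("\x00\t\n\r #/:<>?@[\\]^|")


def _in_c0_control_percent_encode_set(ch: str) -> bool:
    # C0 controls and every code point greater than U+007E (~).
    return ord(ch) <= 0x1F or ord(ch) > 0x7E


def utf8_percent_encode(s: str) -> str:
    out = []
    for ch in s:
        if _in_c0_control_percent_encode_set(ch):
            # Each UTF-8 byte becomes %XX with uppercase hex digits.
            out.extend(f"%{b:02X}" for b in ch.encode("utf-8"))
        else:
            out.append(ch)
    return "".join(out)


def parse_opaque_host(s: str):
    """Sketch: returns the opaque host string, or None for failure."""
    if any(ch in _FORBIDDEN_HOST for ch in s):
        return None  # host-invalid-code-point
    return utf8_percent_encode(s)
```

Note that a stray `%` survives unchanged (`"a%zz"` stays `"a%zz"`); it only triggers an invalid-URL-unit validation error in the full algorithm.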

3.6. Host serializing

The host serializer takes a host host and then runs these steps. They return an ASCII string.

  1. If host is an IPv4 address, return the result of running the IPv4 serializer on host.

  2. Otherwise, if host is an IPv6 address, return U+005B ([), followed by the result of running the IPv6 serializer on host, followed by U+005D (]).

  3. Otherwise, host is a domain, opaque host, or empty host; return host.

The IPv4 serializer takes an IPv4 address address and then runs these steps. They return an ASCII string.

  1. Let output be the empty string.

  2. Let n be the value of address.

  3. For each i in the range 1 to 4, inclusive:

    1. Prepend n % 256, serialized, to output.

    2. If i is not 4, then prepend U+002E (.) to output.

    3. Set n to floor(n / 256).

  4. Return output.
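The IPv4 serializer steps map directly onto a short loop. A minimal Python sketch, assuming the IPv4 address is held as a non-negative 32-bit `int`; the function name is hypothetical.

```python
def serialize_ipv4(address: int) -> str:
    """Sketch of the IPv4 serializer for a 32-bit integer address."""
    output = ""
    n = address
    for i in range(1, 5):
        # Prepend the lowest remaining byte, written in decimal.
        output = str(n % 256) + output
        if i != 4:
            output = "." + output
        n //= 256  # floor(n / 256) for non-negative n
    return output
```

Because bytes are prepended starting from the least significant one, `serialize_ipv4(2130706433)` produces `"127.0.0.1"`.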

The IPv6 serializer takes an IPv6 address address and then runs these steps. They return an ASCII string.

  1. Let output be the empty string.

  2. Let compress be an index to the first IPv6 piece in the first of the longest sequences of address’s IPv6 pieces that are 0.

    In 0:f:0:0:f:f:0:0 it would point to the second 0.

  3. If there is no sequence of address’s IPv6 pieces that are 0 that is longer than 1, then set compress to null.

  4. Let ignore0 be false.

  5. For each pieceIndex in the range 0 to 7, inclusive:

    1. If ignore0 is true and address[pieceIndex] is 0, then continue.

    2. Otherwise, if ignore0 is true, set ignore0 to false.

    3. If compress is pieceIndex, then:

      1. Let separator be "::" if pieceIndex is 0, and U+003A (:) otherwise.

      2. Append separator to output.

      3. Set ignore0 to true and continue.

    4. Append address[pieceIndex], represented as the shortest possible lowercase hexadecimal number, to output.

    5. If pieceIndex is not 7, then append U+003A (:) to output.

  6. Return output.

This algorithm requires the recommendation from A Recommendation for IPv6 Address Text Representation. [RFC5952]
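A Python sketch of the IPv6 serializer, for illustration. It assumes the address is a list of eight integers; `find_compress` is a hypothetical helper implementing steps 2 and 3 (the first of the longest zero runs, only if longer than 1).

```python
from typing import List, Optional


def find_compress(pieces: List[int]) -> Optional[int]:
    """Index of the first piece in the first longest run of zeros longer than 1."""
    best_start, best_len = None, 1  # runs of length 1 never qualify
    i = 0
    while i < 8:
        if pieces[i] != 0:
            i += 1
            continue
        j = i
        while j < 8 and pieces[j] == 0:
            j += 1
        if j - i > best_len:  # strict ">" keeps the first of equally long runs
            best_start, best_len = i, j - i
        i = j
    return best_start


def serialize_ipv6(pieces: List[int]) -> str:
    """Sketch of the IPv6 serializer for eight 16-bit pieces."""
    output = ""
    compress = find_compress(pieces)
    ignore0 = False
    for piece_index in range(8):
        if ignore0 and pieces[piece_index] == 0:
            continue
        elif ignore0:
            ignore0 = False
        if compress == piece_index:
            output += "::" if piece_index == 0 else ":"
            ignore0 = True
            continue
        output += format(pieces[piece_index], "x")  # shortest lowercase hex
        if piece_index != 7:
            output += ":"
    return output
```

Since the pieces are always written in hex, an IPv4-mapped address such as `[0, 0, 0, 0, 0, 0xffff, 0xc0a8, 1]` serializes as `"::ffff:c0a8:1"`, with no dotted-decimal tail.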

3.7. Host equivalence

To determine whether a host A equals B, return true if A is B, and false otherwise.

Certificate comparison requires a host equivalence check that ignores the trailing dot of a domain (if any). However, those hosts have also various other facets enforced, such as DNS length, that are not enforced here, as URLs do not enforce them. If anyone has a good suggestion for how to bring these two closer together, or what a good unified model would be, please file an issue.

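4. URLs

At a high level, a URL, valid URL string, URL parser, and URL serializer relate as follows:

  - The URL parser takes an arbitrary scalar value string and returns either failure or a URL. It might also record zero or more validation errors.

  - A URL can be seen as the in-memory representation.

  - A valid URL string defines what input would not trigger a validation error or failure when given to the URL parser, i.e., input that would be considered conforming or valid.

  - The URL serializer takes a URL and returns an ASCII string. (If that string is then parsed, the result will equal the URL that was serialized.) The output of the URL serializer is not always a valid URL string.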

Input | Base | Valid | Output
https:example.org |  |  | https://example.org/
https://////example.com/// |  |  | https://example.com///
https://example.com/././foo |  |  | https://example.com/foo
hello:world | https://example.com/ |  | hello:world
https:example.org | https://example.com/ |  | https://example.com/example.org
\example\..\demo/.\ | https://example.com/ |  | https://example.com/demo/
example | https://example.com/demo |  | https://example.com/example
file:///C|/demo |  |  | file:///C:/demo
.. | file:///C:/demo |  | file:///C:/
file://loc%61lhost/ |  |  | file:///
https://user:password@example.org/ |  |  | https://user:password@example.org/
https://example.org/foo bar |  | ❌ | https://example.org/foo%20bar
https://EXAMPLE.com/../x |  |  | https://example.com/x
https://ex ample.org/ |  | ❌ | Failure
example |  | ❌, due to lack of base | Failure
https://example.com:demo |  | ❌ | Failure
http://[www.example.com]/ |  | ❌ | Failure
https://example.org// |  |  | https://example.org//
https://example.com/[]?[]#[] |  |  | https://example.com/[]?[]#[]
https://example/%?%#% |  |  | https://example/%?%#%
https://example/%25?%25#%25 |  |  | https://example/%25?%25#%25

The base and output URL are represented in serialized form for brevity.

4.1. URL representation

A URL is a struct that represents a universal identifier. To disambiguate it from a valid URL string, it can also be referred to as a URL record.

A URL’s scheme is an ASCII string that identifies the type of URL and can be used to dispatch a URL for further processing after parsing. It is initially the empty string.

A URL’s username is an ASCII string identifying a username. It is initially the empty string.

A URL’s password is an ASCII string identifying a password. It is initially the empty string.

A URL’s host is null or a host. It is initially null.

The following table lists the allowed combinations of a URL’s scheme and host.

Scheme | Host: domain | IPv4 address | IPv6 address | opaque host | empty host | null
Special schemes excluding "file" | ✅ | ✅ | ✅ | ❌ | ❌ | ❌
"file" | ✅ | ✅ | ✅ | ❌ | ✅ | ❌
Others | ❌ | ❌ | ✅ | ✅ | ✅ | ✅

A URL’s port is either null or a 16-bit unsigned integer that identifies a networking port. It is initially null.

A URL’s path is either a URL path segment or a list of zero or more URL path segments, usually identifying a location. It is initially « ».

A special URL’s path is always a list, i.e., it is never opaque.

A URL’s query is either null or an ASCII string. It is initially null.

A URL’s fragment is either null or an ASCII string that can be used for further processing on the resource the URL’s other components identify. It is initially null.

A URL also has an associated blob URL entry that is either null or a blob URL entry. It is initially null.

This is used to support caching the object a "blob" URL refers to as well as its origin. It is important that these are cached as the URL might be removed from the blob URL store between parsing and fetching, while fetching will still need to succeed.
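To make the record's shape concrete, here is a hypothetical Python model of a URL record with the initial values listed above. It is a sketch only: host types and blob URL entries are left abstract as `Any`, an opaque path is modeled as a single `str`, and the port range check runs only at construction.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional, Union


@dataclass
class URLRecord:
    """Hypothetical Python model of a URL record; field names are illustrative."""

    scheme: str = ""
    username: str = ""
    password: str = ""
    host: Optional[Any] = None  # null or a host; host types are left abstract here
    port: Optional[int] = None  # null or a 16-bit unsigned integer
    # A list of URL path segments, or a single segment (str) for an opaque path.
    path: Union[str, List[str]] = field(default_factory=list)
    query: Optional[str] = None
    fragment: Optional[str] = None
    blob_url_entry: Optional[Any] = None

    def __post_init__(self) -> None:
        if self.port is not None and not 0 <= self.port <= 0xFFFF:
            raise ValueError("port must be null or a 16-bit unsigned integer")

    def has_opaque_path(self) -> bool:
        return isinstance(self.path, str)
```

Using `field(default_factory=list)` gives every record its own empty path list, matching the initial value « » without sharing one list between instances.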


The following table lists how valid URL strings, when parsed, map to a URL’s components. Username, password, and blob URL entry are omitted; in the examples below they are the empty string, the empty string, and null, respectively.

Input | Scheme | Host | Port | Path | Query | Fragment
https://example.com/ | "https" | "example.com" | null | « the empty string » | null | null
https://localhost:8000/search?q=text#hello | "https" | "localhost" | 8000 | « "search" » | "q=text" | "hello"
urn:isbn:9780307476463 | "urn" | null | null | "isbn:9780307476463" | null | null
file:///ada/Analytical%20Engine/README.md | "file" | null | null | « "ada", "Analytical%20Engine", "README.md" » | null | null

A URL path segment is an ASCII string . It commonly refers to a directory or a file, but has no predefined meaning.

A single-dot URL path segment is a URL path segment that is "." or an ASCII case-insensitive match for "%2e".

A double-dot URL path segment is a URL path segment that is ".." or an ASCII case-insensitive match for ".%2e", "%2e.", or "%2e%2e".
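The two segment definitions above as Python predicates. A minimal sketch with hypothetical names; it lowercases only ASCII letters so that the comparison is ASCII case-insensitive rather than a full Unicode case fold.

```python
def _ascii_lowercase(s: str) -> str:
    # Lowercase only ASCII upper alphas, as "ASCII case-insensitive" requires.
    return "".join(chr(ord(ch) + 0x20) if "A" <= ch <= "Z" else ch for ch in s)


def is_single_dot_segment(segment: str) -> bool:
    return segment == "." or _ascii_lowercase(segment) == "%2e"


def is_double_dot_segment(segment: str) -> bool:
    return segment == ".." or _ascii_lowercase(segment) in (".%2e", "%2e.", "%2e%2e")
```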

4.2. URL miscellaneous

A special scheme is an ASCII string that is listed in the first column of the following table. The default port for a special scheme is listed in the second column on the same row. The default port for any other ASCII string is null.

Special scheme | Default port
"ftp" | 21
"file" | null
"http" | 80
"https" | 443
"ws" | 80
"wss" | 443

A URL is special if its scheme is a special scheme. A URL is not special if its scheme is not a special scheme.

A URL includes credentials if its username or password is not the empty string.

A URL has an opaque path if its path is a URL path segment.

A URL cannot have a username/password/port if its host is null or the empty string, or its scheme is "file".
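The table and definitions above reduce to a few lookups. A hedged Python sketch: the function names are hypothetical, schemes are assumed to be already lowercased (as the parser produces them), and hosts are modeled as `str` or `None`, with `""` standing for the empty host.

```python
from typing import Optional

# Special schemes mapped to their default ports, transcribed from the table above.
SPECIAL_SCHEME_DEFAULT_PORTS = {
    "ftp": 21,
    "file": None,
    "http": 80,
    "https": 443,
    "ws": 80,
    "wss": 443,
}


def is_special_scheme(scheme: str) -> bool:
    return scheme in SPECIAL_SCHEME_DEFAULT_PORTS


def default_port(scheme: str) -> Optional[int]:
    # Any non-special scheme also has a null default port.
    return SPECIAL_SCHEME_DEFAULT_PORTS.get(scheme)


def includes_credentials(username: str, password: str) -> bool:
    return username != "" or password != ""


def cannot_have_username_password_port(scheme: str, host: Optional[str]) -> bool:
    # Hypothetical host model: None is null and "" is the empty host.
    return host is None or host == "" or scheme == "file"
```

Note that `default_port` returns `None` both for `"file"` and for any non-special scheme, which is exactly what the definition says; use `is_special_scheme` to tell the two apart.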

A URL can be designated as base URL.

A base URL is useful for the URL parser when the input might be a relative-URL string .


A Windows drive letter is two code points, of which the first is an ASCII alpha and the second is either U+003A (:) or U+007C (|).

A normalized Windows drive letter is a Windows drive letter of which the second code point is U+003A (:).

As per the URL writing section, only a normalized Windows drive letter is conforming.
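The drive-letter definitions above, together with the "starts with a Windows drive letter" check defined next, can be sketched in Python. The sketch assumes the check's conditions as stated in the URL Standard: the string has at least two code points, its first two code points are a Windows drive letter, and it either has exactly two code points or its third code point is U+002F (/), U+005C (\), U+003F (?), or U+0023 (#). All function names are hypothetical.

```python
_ASCII_ALPHA = frozenset("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ")


def is_windows_drive_letter(s: str) -> bool:
    # Exactly two code points: an ASCII alpha, then ":" or "|".
    return len(s) == 2 and s[0] in _ASCII_ALPHA and s[1] in (":", "|")


def is_normalized_windows_drive_letter(s: str) -> bool:
    return is_windows_drive_letter(s) and s[1] == ":"


def starts_with_windows_drive_letter(s: str) -> bool:
    # Assumed conditions (see lead-in): length >= 2, a drive letter prefix,
    # and either nothing else or "/", "\", "?" or "#" right after it.
    return (
        len(s) >= 2
        and is_windows_drive_letter(s[:2])
        and (len(s) == 2 or s[2] in ("/", "\\", "?", "#"))
    )
```

The results for "c:", "c:/", and "c:a" agree with the examples table that follows.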

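A string starts with a Windows drive letter if all of the following are true:

  - its length is greater than or equal to 2;
  - its first two code points are a Windows drive letter;
  - its length is 2 or its third code point is U+002F (/), U+005C (\), U+003F (?), or U+0023 (#).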

String | Starts with a Windows drive letter
"c:" | ✅
"c:/" | ✅
"c:a" | ❌

To shorten a url’s path:

  1. Assert: url does not have an opaque path.

  2. Let path be url’s path.

  3. If url’s scheme is "file", path’s size is 1, and path[0] is a normalized Windows drive letter, then return.

  4. Remove path’s last item, if any.
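A minimal Python sketch of shortening a path, assuming the path is a mutable list of segment strings (so it is not opaque) and that the URL's scheme is passed alongside it. The names are hypothetical; the drive-letter check is repeated so the block stands alone.

```python
from typing import List

_ASCII_ALPHA = frozenset("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ")


def is_normalized_windows_drive_letter(s: str) -> bool:
    return len(s) == 2 and s[0] in _ASCII_ALPHA and s[1] == ":"


def shorten_path(scheme: str, path: List[str]) -> None:
    """Sketch of "shorten a url's path", mutating `path` in place."""
    # A list path means the URL does not have an opaque path.
    assert isinstance(path, list), "opaque paths cannot be shortened"
    # A lone drive letter in a file URL is never removed.
    if scheme == "file" and len(path) == 1 and is_normalized_windows_drive_letter(path[0]):
        return
    if path:
        path.pop()
```

This is why resolving ".." against "file:///C:/" keeps the drive letter in place rather than producing "file:///".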

4.3. URL writing

A valid URL string must be either a relative-URL-with-fragment string or an absolute-URL-with-fragment string.

An absolute-URL-with-fragment string must be an absolute-URL string, optionally followed by U+0023 (#) and a URL-fragment string.

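An absolute-URL string must be one of the following:

  - a URL-scheme string that is an ASCII case-insensitive match for a special scheme and not an ASCII case-insensitive match for "file", followed by U+003A (:) and a scheme-relative-special-URL string
  - a URL-scheme string that is not an ASCII case-insensitive match for a special scheme, followed by U+003A (:) and a relative-URL string
  - a URL-scheme string that is an ASCII case-insensitive match for "file", followed by U+003A (:) and a scheme-relative-file-URL string

Any of these can optionally be followed by U+003F (?) and a URL-query string.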

A URL-scheme string must be one ASCII alpha, followed by zero or more of ASCII alphanumeric, U+002B (+), U+002D (-), and U+002E (.). Schemes should be registered in the IANA URI [sic] Schemes registry. [IANA-URI-SCHEMES] [RFC7595]

A relative-URL-with-fragment string must be a relative-URL string, optionally followed by U+0023 (#) and a URL-fragment string.

A relative-URL string must be one of the following, switching on base URL’s scheme:

A special scheme that is not "file"

  - a scheme-relative-special-URL string
  - a path-absolute-URL string
  - a path-relative-scheme-less-URL string

"file"

  - a scheme-relative-file-URL string
  - a path-absolute-URL string if base URL’s host is an empty host
  - a path-absolute-non-Windows-file-URL string if base URL’s host is not an empty host
  - a path-relative-scheme-less-URL string

Otherwise

  - a scheme-relative-URL string
  - a path-absolute-URL string
  - a path-relative-scheme-less-URL string

Any of these can optionally be followed by U+003F (?) and a URL-query string.

A non-null base URL is necessary when parsing a relative-URL string.

A scheme-relative-special-URL string must be "//", followed by a valid host string, optionally followed by U+003A (:) and a URL-port string, optionally followed by a path-absolute-URL string.

A URL-port string must be one of the following:

  - the empty string
  - one or more ASCII digits representing a decimal number no greater than $2^{16} - 1$.

A scheme-relative-URL string must be "//", followed by an opaque-host-and-port string, optionally followed by a path-absolute-URL string.

An opaque-host-and-port string must be either the empty string or: a valid opaque-host string , optionally followed by U+003A (:) and a URL-port string .

A scheme-relative-file-URL string must be "//", followed by one of the following:

  - a valid host string, optionally followed by a path-absolute-non-Windows-file-URL string
  - a path-absolute-URL string

A path-absolute-URL string must be U+002F (/) followed by a path-relative-URL string.

A path-absolute-non-Windows-file-URL string must be a path-absolute-URL string that does not start with: U+002F (/), followed by a Windows drive letter, followed by U+002F (/).

A path-relative-URL string must be zero or more URL-path-segment strings, separated from each other by U+002F (/), and not start with U+002F (/).

A path-relative-scheme-less-URL string must be a path-relative-URL string that does not start with: a URL-scheme string, followed by U+003A (:).

A URL-path-segment string must be one of the following:

  - zero or more URL units excluding U+002F (/) and U+003F (?), that together are not a single-dot URL path segment or a double-dot URL path segment
  - a single-dot URL path segment
  - a double-dot URL path segment

A URL-query string must be zero or more URL units.

A URL-fragment string must be zero or more URL units.

The URL code points are ASCII alphanumeric, U+0021 (!), U+0024 ($), U+0026 (&), U+0027 ('), U+0028 LEFT PARENTHESIS, U+0029 RIGHT PARENTHESIS, U+002A (*), U+002B (+), U+002C (,), U+002D (-), U+002E (.), U+002F (/), U+003A (:), U+003B (;), U+003D (=), U+003F (?), U+0040 (@), U+005F (_), U+007E (~), and code points in the range U+00A0 to U+10FFFD, inclusive, excluding surrogates and noncharacters.
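The URL code point definition above can be turned into a single predicate. A hedged Python sketch; the function names are hypothetical, and noncharacters are computed as U+FDD0 to U+FDEF plus the last two code points of every plane.

```python
import string

_ASCII_URL_CODE_POINTS = frozenset(string.ascii_letters + string.digits + "!$&'()*+,-./:;=?@_~")


def is_noncharacter(cp: int) -> bool:
    # U+FDD0..U+FDEF, plus U+xFFFE and U+xFFFF in every plane.
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def is_url_code_point(ch: str) -> bool:
    if ch in _ASCII_URL_CODE_POINTS:
        return True
    cp = ord(ch)
    if not 0xA0 <= cp <= 0x10FFFD:
        return False
    if 0xD800 <= cp <= 0xDFFF:  # surrogates
        return False
    return not is_noncharacter(cp)
```

"%" is deliberately not a URL code point; percent-encoded bytes are admitted separately as URL units.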

Code points greater than U+007F DELETE will be converted to percent-encoded bytes by the URL parser .

In HTML, when the document encoding is a legacy encoding, code points in the URL-query string that are higher than U+007F DELETE will be converted to percent-encoded bytes using the document’s encoding . This can cause problems if a URL that works in one document is copied to another document that uses a different document encoding. Using the UTF-8 encoding everywhere solves this problem.


For example, consider this HTML document:

<!doctype html>
<meta charset="windows-1252">
<a href="?sm&ouml;rg&aring;sbord">Test</a>



Since the document encoding is windows-1252, the link’s URL’s query will be "sm%F6rg%E5sbord". If the document encoding had been UTF-8, it would instead be "sm%C3%B6rg%C3%A5sbord".
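The difference comes entirely from which bytes each encoding produces for "ö" and "å". The Python sketch below reproduces both query strings. It is a simplification, not the standard's algorithm: it percent-encodes only code points above U+007F (the real query percent-encode set is larger), and it does not handle code points the legacy encoding cannot represent (Python raises `UnicodeEncodeError` instead). The function name is hypothetical.

```python
def percent_encode_non_ascii(query: str, encoding: str) -> str:
    """Simplified sketch: percent-encode only code points above U+007F."""
    out = []
    for ch in query:
        if ord(ch) > 0x7F:
            # Encode with the document's encoding, then %XX each resulting byte.
            out.extend(f"%{b:02X}" for b in ch.encode(encoding))
        else:
            out.append(ch)
    return "".join(out)


query = "sm\u00f6rg\u00e5sbord"  # what &ouml; and &aring; decode to
print(percent_encode_non_ascii(query, "cp1252"))  # sm%F6rg%E5sbord
print(percent_encode_non_ascii(query, "utf-8"))   # sm%C3%B6rg%C3%A5sbord
```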

The URL units are URL code points and percent-encoded bytes .

Percent-encoded bytes can be used to encode code points that are not URL code points or are excluded from being written.


There is no way to express a username or password of a URL record within a valid URL string.

4.4. URL parsing

The URL parser takes a scalar value string input, with an optional null or base URL base (default null) and an optional encoding encoding (default UTF-8), and then runs these steps:

Non-web-browser implementations only need to implement the basic URL parser .

How user input in the web browser’s address bar is converted to a URL record is out of scope of this standard. This standard does include URL rendering requirements as they pertain to trust decisions.

  1. Let url be the result of running the basic URL parser on input with base and encoding.

  2. If url is failure, return failure.

  3. If url’s scheme is not "blob", return url.

  4. Set url’s blob URL entry to the result of resolving the blob URL url, if that did not return failure, and null otherwise.

  5. Return url.


The basic URL parser takes a scalar value string input, with an optional null or base URL base (default null), an optional encoding encoding (default UTF-8), an optional URL url, and an optional state override state override, and then runs these steps:

The encoding argument is a legacy concept only relevant for HTML. The url and state override arguments are only for use by various APIs. [HTML]

When the url and state override arguments are not passed, the basic URL parser returns either a new URL or failure. If they are passed, the algorithm modifies the passed url and can terminate without returning anything.

  1. If url is not given:

    1. Set url to a new URL.

    2. If input contains any leading or trailing C0 control or space, invalid-URL-unit validation error.

    3. Remove any leading and trailing C0 control or space from input.

  2. If input contains any ASCII tab or newline, invalid-URL-unit validation error.

  3. Remove all ASCII tab or newline from input.

  4. Let state be state override if given, or scheme start state otherwise.

  5. Set encoding to the result of getting an output encoding from encoding.

  6. Let buffer be the empty string.

  7. Let atSignSeen, insideBrackets, and passwordTokenSeen be false.

  8. Let pointer be a pointer for input.

  9. Keep running the following state machine by switching on state. If after a run pointer points to the EOF code point, go to the next step. Otherwise, increase pointer by 1 and continue with the state machine.

    scheme start state
    1. If c is an ASCII alpha, append c, lowercased, to buffer, and set state to scheme state.

    2. Otherwise, if state override is not given, set state to no scheme state and decrease pointer by 1.

    3. Otherwise, return failure.

      This indication of failure is used exclusively by the Location object’s protocol setter.

    scheme state
    1. If c is an ASCII alphanumeric, U+002B (+), U+002D (-), or U+002E (.), append c, lowercased, to buffer.

    2. Otherwise, if c is U+003A (:), then:

      1. If state override is given, then:

        1. If url’s scheme is a special scheme and buffer is not a special scheme, then return.

        2. If url’s scheme is not a special scheme and buffer is a special scheme, then return.

        3. If url includes credentials or has a non-null port, and buffer is "file", then return.

        4. If url’s scheme is "file" and its host is an empty host, then return.

      2. Set url’s scheme to buffer.

      3. If state override is given, then:

        1. If url’s port is url’s scheme’s default port, then set url’s port to null.

        2. Return.

      4. Set buffer to the empty string.

      5. If url’s scheme is "file", then:

        1. If remaining does not start with "//", special-scheme-missing-following-solidus validation error.

        2. Set state to file state.

      6. Otherwise, if url is special, base is non-null, and base’s scheme is url’s scheme:

        1. Assert: base is special (and therefore does not have an opaque path).

        2. Set state to special relative or authority state.

      7. Otherwise, if url is special, set state to special authority slashes state.

      8. Otherwise, if remaining starts with an U+002F (/), set state to path or authority state and increase pointer by 1.

      9. Otherwise, set url’s path to the empty string and set state to opaque path state.

    3. Otherwise, if state override is not given, set buffer to the empty string, state to no scheme state, and start over (from the first code point in input).

    4. Otherwise, return failure.

      This indication of failure is used exclusively by the Location object’s protocol setter. Furthermore, the non-failure termination earlier in this state is an intentional difference for defining that setter.
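When state override is given, the four early returns in the scheme state decide whether a scheme change happens at all. The following is a minimal, non-authoritative JavaScript sketch of just those checks. It assumes a hypothetical plain-object URL record with `scheme`, `username`, `password`, `host`, and `port` fields; `schemeChangeAllowed` is an illustrative name, not an API defined by this standard.

```javascript
// Special schemes and their default ports, as listed by this standard.
const SPECIAL_SCHEME_PORTS = new Map([
  ["ftp", 21], ["file", null], ["http", 80],
  ["https", 443], ["ws", 80], ["wss", 443],
]);

// Hypothetical helper: returns false in exactly the cases where the scheme
// state returns early (leaving url unchanged) when state override is given.
function schemeChangeAllowed(url, buffer) {
  const urlIsSpecial = SPECIAL_SCHEME_PORTS.has(url.scheme);
  const bufferIsSpecial = SPECIAL_SCHEME_PORTS.has(buffer);
  // Sub-steps 1 and 2: special and non-special schemes cannot be swapped.
  if (urlIsSpecial !== bufferIsSpecial) return false;
  // Sub-step 3: a "file" URL cannot carry credentials or a port.
  const includesCredentials = url.username !== "" || url.password !== "";
  if ((includesCredentials || url.port !== null) && buffer === "file") return false;
  // Sub-step 4: a "file" URL with an empty host keeps its scheme.
  if (url.scheme === "file" && url.host === "") return false;
  return true;
}
```

For example, a record for https://example.org/ may switch to "http" but not to "foo".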

    no scheme state
    1. If base is null, or base has an opaque path and c is not U+0023 (#), missing-scheme-non-relative-URL validation error, return failure.

    2. Otherwise, if base has an opaque path and c is U+0023 (#), set url’s scheme to base’s scheme, url’s path to base’s path, url’s query to base’s query, url’s fragment to the empty string, and set state to fragment state.

    3. Otherwise, if base’s scheme is not "file", set state to relative state and decrease pointer by 1.

    4. Otherwise, set state to file state and decrease pointer by 1.
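The no scheme state can be observed through the URL constructor defined in the API section. In this sketch, `tryParse` is a hypothetical helper that returns null instead of throwing; the results shown follow from steps 1 and 2 when base has an opaque path.

```javascript
// Hypothetical helper: parse input against base, returning null on failure.
function tryParse(input, base) {
  try {
    return new URL(input, base);
  } catch {
    return null;
  }
}

const opaqueBase = "mailto:someone@example.org";

// Step 2: a fragment-only input is allowed against an opaque-path base.
const withFragment = tryParse("#greeting", opaqueBase);

// Step 1: any other schemeless input against an opaque-path base fails.
const relativePath = tryParse("other", opaqueBase);

// Step 1: a schemeless input without any base also fails.
const noBase = tryParse("other");
```

Here `withFragment.href` is "mailto:someone@example.org#greeting", while the other two are null.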

    special relative or authority state
    1. If c is U+002F (/) and remaining starts with U+002F (/), then set state to special authority ignore slashes state and increase pointer by 1.

    2. Otherwise, special-scheme-missing-following-solidus validation error, set state to relative state and decrease pointer by 1.

    path or authority state
    1. If c is U+002F (/), then set state to authority state.

    2. Otherwise, set state to path state, and decrease pointer by 1.

    relative state
    1. Assert: base’s scheme is not "file".

    2. Set url’s scheme to base’s scheme.

    3. If c is U+002F (/), then set state to relative slash state.

    4. Otherwise, if url is special and c is U+005C (\), invalid-reverse-solidus validation error, set state to relative slash state.

    5. Otherwise:

      1. Set url’s username to base’s username, url’s password to base’s password, url’s host to base’s host, url’s port to base’s port, url’s path to a clone of base’s path, and url’s query to base’s query.

      2. If c is U+003F (?), then set url’s query to the empty string, and state to query state.

      3. Otherwise, if c is U+0023 (#), set url’s fragment to the empty string and state to fragment state.

      4. Otherwise, if c is not the EOF code point:

        1. Set url’s query to null.

        2. Shorten url’s path.

        3. Set state to path state and decrease pointer by 1.
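Step 5 of the relative state determines how much of base survives. This JavaScript example, using the URL API from the API section with an illustrative base URL, shows the three sub-cases:

```javascript
const base = "https://example.org/a/b?x=1#top";

// Step 5.2: "?" keeps base's path but starts a new query (no fragment).
const queryOnly = new URL("?y=2", base).href;

// Step 5.3: "#" keeps base's path and query, replacing only the fragment.
const fragmentOnly = new URL("#end", base).href;

// Step 5.4: any other code point drops the query and shortens the path,
// so the last segment ("b") is replaced by the input.
const sibling = new URL("c", base).href;
```

The results are "https://example.org/a/b?y=2", "https://example.org/a/b?x=1#end", and "https://example.org/a/c" respectively.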

    relative slash state
    1. If url is special and c is U+002F (/) or U+005C (\), then:

      1. If c is U+005C (\), invalid-reverse-solidus validation error.

      2. Set state to special authority ignore slashes state.

    2. Otherwise, if c is U+002F (/), then set state to authority state.

    3. Otherwise, set url’s username to base’s username, url’s password to base’s password, url’s host to base’s host, url’s port to base’s port, state to path state, and then, decrease pointer by 1.

    special authority slashes state
    1. If c is U+002F (/) and remaining starts with U+002F (/), then set state to special authority ignore slashes state and increase pointer by 1.

    2. Otherwise, special-scheme-missing-following-solidus validation error, set state to special authority ignore slashes state and decrease pointer by 1.

    special authority ignore slashes state
    1. If c is neither U+002F (/) nor U+005C (\), then set state to authority state and decrease pointer by 1.

    2. Otherwise, special-scheme-missing-following-solidus validation error.
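The slash-handling states above are lenient for special URLs: a missing or extra slash is only a validation error. The following JavaScript inputs, chosen for illustration, each pass through these states when given to the URL constructor:

```javascript
// Missing slashes after a special scheme are tolerated.
const missingSlashes = new URL("https:example.org/path").href;

// Any run of "/" and "\" before the authority is skipped for special URLs.
const extraSlashes = new URL("https:\\\\//example.org/path").href;

// A scheme-relative input against a special base replaces the host
// (relative state, then relative slash state).
const schemeRelative = new URL("//other.example/x", "https://example.org/a").href;
```

The first two both serialize as "https://example.org/path"; the third becomes "https://other.example/x".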

    authority state
    1. If c is U+0040 (@), then:

      1. Invalid-credentials validation error.

      2. If atSignSeen is true, then prepend "%40" to buffer.

      3. Set atSignSeen to true.

      4. For each codePoint in buffer:

        1. If codePoint is U+003A (:) and passwordTokenSeen is false, then set passwordTokenSeen to true and continue.

        2. Let encodedCodePoints be the result of running UTF-8 percent-encode codePoint using the userinfo percent-encode set.

        3. If passwordTokenSeen is true, then append encodedCodePoints to url’s password.

        4. Otherwise, append encodedCodePoints to url’s username.

      5. Set buffer to the empty string.

    2. Otherwise, if one of the following is true:

      - c is the EOF code point, U+002F (/), U+003F (?), or U+0023 (#)
      - url is special and c is U+005C (\)

      then:

      1. If atSignSeen is true and buffer is the empty string, invalid-credentials validation error, return failure.

      2. Decrease pointer by buffer’s code point length + 1, set buffer to the empty string, and set state to host state.

    3. Otherwise, append c to buffer.
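Because every U+0040 (@) flushes buffer into the credentials, the last "@" is the one that separates userinfo from the host, and earlier ones end up encoded as "%40". The following JavaScript example, with illustrative inputs, demonstrates this and the failure case in step 2.1:

```javascript
// "p@ss" is split across two "@" runs; the second run is prefixed with "%40".
const url = new URL("https://user:p@ss@example.org/");

// An "@" followed directly by the end of the authority is a failure.
let emptyHostFailed = false;
try {
  new URL("https://user@/path");
} catch (e) {
  emptyHostFailed = e instanceof TypeError;
}
```

Here `url.username` is "user", `url.password` is "p%40ss", and `url.host` is "example.org".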

    host state
    hostname state
    1. If state override is given and url’s scheme is "file", then decrease pointer by 1 and set state to file host state.

    2. Otherwise, if c is U+003A (:) and insideBrackets is false, then:

      1. If buffer is the empty string, host-missing validation error, return failure.

      2. If state override is given and state override is hostname state, then return.

      3. Let host be the result of host parsing buffer with url is not special.

      4. If host is failure, then return failure.

      5. Set url’s host to host, buffer to the empty string, and state to port state.

    3. Otherwise, if one of the following is true:

      - c is the EOF code point, U+002F (/), U+003F (?), or U+0023 (#)
      - url is special and c is U+005C (\)

      then decrease pointer by 1, and then:

      1. If url is special and buffer is the empty string, host-missing validation error, return failure.

      2. Otherwise, if state override is given, buffer is the empty string, and either url includes credentials or url’s port is non-null, return.

      3. Let host be the result of host parsing buffer with url is not special.

      4. If host is failure, then return failure.

      5. Set url’s host to host, buffer to the empty string, and state to path start state.

      6. If state override is given, then return.

    4. Otherwise:

      1. If c is U+005B ([), then set insideBrackets to true.

      2. If c is U+005D (]), then set insideBrackets to false.

      3. Append c to buffer.
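The insideBrackets flag is what lets an IPv6 address contain U+003A (:) without starting the port. The following JavaScript example uses illustrative hosts:

```javascript
// Host parsing lowercases domains; the first ":" outside brackets starts the port.
const domain = new URL("https://EXAMPLE.org:8080/");

// Inside "[" and "]", ":" does not end the host.
const ipv6 = new URL("http://[::1]:8080/");

// For a special URL, an empty host before ":" is a host-missing failure.
let hostMissing = false;
try {
  new URL("https://:443/");
} catch {
  hostMissing = true;
}
```

`domain.host` is "example.org:8080", and `ipv6.hostname` is "[::1]" with `ipv6.port` "8080".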

    port state
    1. If c is an ASCII digit, append c to buffer.

    2. Otherwise, if one of the following is true:

      - c is the EOF code point, U+002F (/), U+003F (?), or U+0023 (#)
      - url is special and c is U+005C (\)
      - state override is given

      then:

      1. If buffer is not the empty string, then:

        1. Let port be the mathematical integer value that is represented by buffer in radix-10 using ASCII digits for digits with values 0 through 9.

        2. If port is greater than 2¹⁶ − 1, port-out-of-range validation error, return failure.

        3. Set url’s port to null, if port is url’s scheme’s default port; otherwise to port.

        4. Set buffer to the empty string.

      2. If state override is given, then return.

      3. Set state to path start state and decrease pointer by 1.

    3. Otherwise, port-invalid validation error, return failure.
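Step 2.1 interprets the digits as a plain decimal integer, so leading zeros are accepted, and a port equal to the scheme's default is stored as null. A minimal JavaScript sketch of that step follows; `portFromBuffer` is a hypothetical name, and it throws where the standard returns failure.

```javascript
// Hypothetical helper mirroring port state step 2.1. buffer is a non-empty
// string of ASCII digits; defaultPort is the scheme's default port or null.
// Returns the value to store in url's port (null meaning "default").
function portFromBuffer(buffer, defaultPort) {
  if (!/^[0-9]+$/.test(buffer)) {
    throw new TypeError("buffer must be a non-empty string of ASCII digits");
  }
  // Radix-10 integer value; leading zeros are allowed.
  const port = Number.parseInt(buffer, 10);
  // 2^16 - 1 = 65535 is the largest valid port.
  if (port > 65535) throw new RangeError("port-out-of-range");
  return port === defaultPort ? null : port;
}

// The same behavior, observed through the URL API.
const padded = new URL("https://example.org:0008080/").port;
const defaulted = new URL("https://example.org:443/").port;
```

`padded` is "8080" and `defaulted` is the empty string, because the port was set to null.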

    file state
    1. Set url’s scheme to "file".

    2. Set url’s host to the empty string.

    3. If c is U+002F (/) or U+005C (\), then:

      1. If c is U+005C (\), invalid-reverse-solidus validation error.

      2. Set state to file slash state.

    4. Otherwise, if base is non-null and base’s scheme is "file":

      1. Set url’s host to base’s host, url’s path to a clone of base’s path, and url’s query to base’s query.

      2. If c is U+003F (?), then set url’s query to the empty string and state to query state.

      3. Otherwise, if c is U+0023 (#), set url’s fragment to the empty string and state to fragment state.

      4. Otherwise, if c is not the EOF code point:

        1. Set url’s query to null.

        2. If the code point substring from pointer to the end of input does not start with a Windows drive letter, then shorten url’s path.

        3. Otherwise:

          1. File-invalid-Windows-drive-letter validation error.

          2. Set url’s path to « ».

          This is a (platform-independent) Windows drive letter quirk.

        4. Set state to path state and decrease pointer by 1.

    5. Otherwise, set state to path state, and decrease pointer by 1.

    file slash state
    1. If c is U+002F (/) or U+005C (\), then:

      1. If c is U+005C (\), invalid-reverse-solidus validation error.

      2. Set state to file host state.

    2. Otherwise:

      1. If base is non-null and base’s scheme is "file", then:

        1. Set url’s host to base’s host.

        2. If the code point substring from pointer to the end of input does not start with a Windows drive letter and base’s path[0] is a normalized Windows drive letter, then append base’s path[0] to url’s path.

          This is a (platform-independent) Windows drive letter quirk.

      2. Set state to path state, and decrease pointer by 1.

    file host state
    1. If c is the EOF code point, U+002F (/), U+005C (\), U+003F (?), or U+0023 (#), then decrease pointer by 1 and then:

      1. If state override is not given and buffer is a Windows drive letter, file-invalid-Windows-drive-letter-host validation error, set state to path state.

        This is a (platform-independent) Windows drive letter quirk. buffer is not reset here and instead used in the path state.

      2. Otherwise, if buffer is the empty string, then:

        1. Set url’s host to the empty string.

        2. If state override is given, then return.

        3. Set state to path start state.

      3. Otherwise, run these steps:

        1. Let host be the result of host parsing buffer with url is not special.

        2. If host is failure, then return failure.

        3. If host is "localhost", then set host to the empty string.

        4. Set url’s host to host.

        5. If state override is given, then return.

        6. Set buffer to the empty string and state to path start state.

    2. Otherwise, append c to buffer.
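The three file states can be exercised with the URL constructor. The following JavaScript example uses illustrative paths; it relies on "localhost" being mapped to the empty host, on backslashes acting as slashes, and on file state step 4 shortening base's path.

```javascript
// File host state step 1.3.3: "localhost" becomes the empty host.
const localhost = new URL("file://localhost/etc/hosts");

// File state and file slash state accept "\" in place of "/".
const backslashes = new URL("file:\\\\server\\share\\doc.txt");

// File state step 4: a relative input against a file base reuses its path.
const relative = new URL("notes.txt", "file:///home/user/todo.txt");
```

These serialize as "file:///etc/hosts", "file://server/share/doc.txt", and "file:///home/user/notes.txt".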

    path start state
    1. If url is special, then:

      1. If c is U+005C (\), invalid-reverse-solidus validation error.

      2. Set state to path state.

      3. If c is neither U+002F (/) nor U+005C (\), then decrease pointer by 1.

    2. Otherwise, if state override is not given and c is U+003F (?), set url’s query to the empty string and state to query state.

    3. Otherwise, if state override is not given and c is U+0023 (#), set url’s fragment to the empty string and state to fragment state.

    4. Otherwise, if c is not the EOF code point:

      1. Set state to path state.

      2. If c is not U+002F (/), then decrease pointer by 1.

    5. Otherwise, if state override is given and url’s host is null, append the empty string to url’s path.

    path state
    1. If one of the following is true:

      - c is the EOF code point or U+002F (/)
      - url is special and c is U+005C (\)
      - state override is not given and c is U+003F (?) or U+0023 (#)

      then:

      1. If url is special and c is U+005C (\), invalid-reverse-solidus validation error.

      2. If buffer is a double-dot URL path segment, then:

        1. Shorten url’s path.

        2. If neither c is U+002F (/), nor url is special and c is U+005C (\), append the empty string to url’s path.

          This means that for input /usr/.. the result is / and not a lack of a path.

      3. Otherwise, if buffer is a single-dot URL path segment and if neither c is U+002F (/), nor url is special and c is U+005C (\), append the empty string to url’s path.

      4. Otherwise, if buffer is not a single-dot URL path segment, then:

        1. If url’s scheme is "file", url’s path is empty, and buffer is a Windows drive letter, then replace the second code point in buffer with U+003A (:).

          This is a (platform-independent) Windows drive letter quirk.

        2. Append buffer to url’s path.

      5. Set buffer to the empty string.

      6. If c is U+003F (?), then set url’s query to the empty string and state to query state.

      7. If c is U+0023 (#), then set url’s fragment to the empty string and state to fragment state.

    2. Otherwise, run these steps:

      1. If c is not a URL code point and not U+0025 (%), invalid-URL-unit validation error.

      2. If c is U+0025 (%) and remaining does not start with two ASCII hex digits, invalid-URL-unit validation error.

      3. UTF-8 percent-encode c using the path percent-encode set and append the result to buffer.
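Steps 1.2 through 1.4 are where dot segments are resolved. The following JavaScript sketch applies only those rules to a path string. It assumes a special, non-"file" URL whose path starts with "/" and contains no "\", "?", or "#"; `removeDotSegments` is a hypothetical name, and the Windows drive letter handling of "shorten" is deliberately omitted.

```javascript
// Single-dot and double-dot URL path segments, compared ASCII case-insensitively.
const SINGLE_DOT = new Set([".", "%2e"]);
const DOUBLE_DOT = new Set(["..", ".%2e", "%2e.", "%2e%2e"]);

// Hypothetical helper: the last segment plays the role of "c is the EOF code
// point"; every other segment is terminated by "/".
function removeDotSegments(path) {
  const segments = path.slice(1).split("/");
  const output = [];
  segments.forEach((segment, index) => {
    const atEnd = index === segments.length - 1;
    const lower = segment.toLowerCase();
    if (DOUBLE_DOT.has(lower)) {
      output.pop(); // shorten the path (file quirk ignored)
      if (atEnd) output.push(""); // so "/usr/.." yields "/"
    } else if (SINGLE_DOT.has(lower)) {
      if (atEnd) output.push("");
    } else {
      output.push(segment);
    }
  });
  // Serialize as the URL path serializer does for a non-opaque path.
  return output.map((s) => "/" + s).join("");
}
```

For example, `removeDotSegments("/a/./b/../c")` returns "/a/c", matching `new URL("https://example.org/a/./b/../c").pathname`.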

    opaque path state
    1. If c is U+003F (?), then set url’s query to the empty string and state to query state.

    2. Otherwise, if c is U+0023 (#), then set url’s fragment to the empty string and state to fragment state.

    3. Otherwise:

      1. If c is not the EOF code point, not a URL code point, and not U+0025 (%), invalid-URL-unit validation error.

      2. If c is U+0025 (%) and remaining does not start with two ASCII hex digits, invalid-URL-unit validation error.

      3. If c is not the EOF code point, UTF-8 percent-encode c using the C0 control percent-encode set and append the result to url’s path.
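An opaque path is kept almost verbatim: there is no dot-segment handling or case folding, and only C0 controls and code points above U+007E are percent-encoded. The following JavaScript example uses illustrative non-special URLs:

```javascript
// Case is preserved and "?" still starts the query.
const mail = new URL("mailto:Someone@Example.org?subject=Hi");

// "/../" inside an opaque path is not resolved.
const js = new URL("javascript:alert(1)/../x");

// Non-ASCII code points are UTF-8 percent-encoded (é is C3 A9).
const nonAscii = new URL("sms:+1555é");
```

`mail.pathname` is "Someone@Example.org", `js.pathname` is "alert(1)/../x", and `nonAscii.pathname` is "+1555%C3%A9".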

    query state
    1. If encoding is not UTF-8 and one of the following is true:

      - url is not special
      - url’s scheme is "ws" or "wss"

      then set encoding to UTF-8.

    2. If one of the following is true:

      - state override is not given and c is U+0023 (#)
      - c is the EOF code point

      then:

      1. Let queryPercentEncodeSet be the special-query percent-encode set if url is special; otherwise the query percent-encode set.

      2. Percent-encode after encoding, with encoding, buffer, and queryPercentEncodeSet, and append the result to url’s query.

        This operation cannot be invoked code-point-for-code-point due to the stateful ISO-2022-JP encoder.

      3. Set buffer to the empty string.

      4. If c is U+0023 (#), then set url’s fragment to the empty string and state to fragment state.

    3. Otherwise, if c is not the EOF code point:

      1. If c is not a URL code point and not U+0025 (%), invalid-URL-unit validation error.

      2. If c is U+0025 (%) and remaining does not start with two ASCII hex digits, invalid-URL-unit validation error.

      3. Append c to buffer.
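The only difference between the special-query and query percent-encode sets is U+0027 ('), which step 2.1 selects based on whether url is special. The following JavaScript example compares a special and a non-special URL with the same illustrative query:

```javascript
// Special URL: "'" is percent-encoded along with the space.
const special = new URL("https://example.org/?q=it's here");

// Non-special URL: "'" is left as-is.
const nonSpecial = new URL("foo://example.org/?q=it's here");
```

`special.search` is "?q=it%27s%20here" and `nonSpecial.search` is "?q=it's%20here".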

    fragment state
    1. If c is not the EOF code point, then:

      1. If c is not a URL code point and not U+0025 (%), invalid-URL-unit validation error.

      2. If c is U+0025 (%) and remaining does not start with two ASCII hex digits, invalid-URL-unit validation error.

      3. UTF-8 percent-encode c using the fragment percent-encode set and append the result to url’s fragment.

  10. Return url.
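In the fragment state, only code points in the fragment percent-encode set are encoded, and an existing "%XX" sequence is copied unchanged (it only triggers a validation error when malformed). The following JavaScript example uses illustrative fragments:

```javascript
// Space, "<", ">", and "`" are in the fragment percent-encode set.
const url = new URL("https://example.org/#a b<c>`d");

// "%" is not in the set, so "%41" is not double-encoded.
const kept = new URL("https://example.org/#%41");
```

`url.hash` is "#a%20b%3Cc%3E%60d" and `kept.hash` is "#%41".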


To set the username given a url and username, set url’s username to the result of running UTF-8 percent-encode on username using the userinfo percent-encode set.

To set the password given a url and password, set url’s password to the result of running UTF-8 percent-encode on password using the userinfo percent-encode set.
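The username and password setters of the URL API are built on these two algorithms, so their encoding can be observed directly. The following JavaScript example uses illustrative credentials:

```javascript
const url = new URL("https://example.org/");

// " " and ":" are in the userinfo percent-encode set.
url.username = "a b:c";

// "@" and "/" are in the userinfo percent-encode set as well.
url.password = "p@ss/word";
```

Afterwards `url.href` is "https://a%20b%3Ac:p%40ss%2Fword@example.org/".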

4.5. URL serializing

The URL serializer takes a URL url, with an optional boolean exclude fragment (default false), and then runs these steps. They return an ASCII string.

  1. Let output be url’s scheme and U+003A (:) concatenated.

  2. If url’s host is non-null:

    1. Append "//" to output.

    2. If url includes credentials, then:

      1. Append url’s username to output.

      2. If url’s password is not the empty string, then append U+003A (:), followed by url’s password, to output.

      3. Append U+0040 (@) to output.

    3. Append url’s host, serialized, to output.

    4. If url’s port is non-null, append U+003A (:) followed by url’s port, serialized, to output.

  3. If url’s host is null, url does not have an opaque path, url’s path’s size is greater than 1, and url’s path[0] is the empty string, then append U+002F (/) followed by U+002E (.) to output.

    This prevents web+demo:/.//not-a-host/ or web+demo:/path/..//not-a-host/, when parsed and then serialized, from ending up as web+demo://not-a-host/ (they end up as web+demo:/.//not-a-host/).

  4. Append the result of URL path serializing url to output.

  5. If url’s query is non-null, append U+003F (?), followed by url’s query, to output.

  6. If exclude fragment is false and url’s fragment is non-null, then append U+0023 (#), followed by url’s fragment, to output.

  7. Return output.

The URL path serializer takes a URL url and then runs these steps. They return an ASCII string.

  1. If url has an opaque path, then return url’s path.

  2. Let output be the empty string.

  3. For each segment of url’s path: append U+002F (/) followed by segment to output.

  4. Return output.
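The two serializers can be sketched in JavaScript as follows. This is an illustration under stated assumptions, not a reference implementation: the URL record is a hypothetical plain object with `scheme`, `username`, `password`, `host` (already serialized, or null), `port`, `path` (a string for an opaque path, otherwise an array of segments), `query`, and `fragment`.

```javascript
// Hypothetical URL path serializer over the plain-object record.
function serializePath(url) {
  if (typeof url.path === "string") return url.path; // opaque path
  return url.path.map((segment) => "/" + segment).join("");
}

// Hypothetical URL serializer following steps 1-7 above.
function serializeURL(url, excludeFragment = false) {
  let output = url.scheme + ":";
  if (url.host !== null) {
    output += "//";
    if (url.username !== "" || url.password !== "") {
      output += url.username;
      if (url.password !== "") output += ":" + url.password;
      output += "@";
    }
    output += url.host;
    if (url.port !== null) output += ":" + String(url.port);
  }
  // Step 3: keep a leading empty segment from being read as an authority.
  if (url.host === null && Array.isArray(url.path) &&
      url.path.length > 1 && url.path[0] === "") {
    output += "/.";
  }
  output += serializePath(url);
  if (url.query !== null) output += "?" + url.query;
  if (!excludeFragment && url.fragment !== null) output += "#" + url.fragment;
  return output;
}
```

A host-less record with path ["", "not-a-host", ""] and scheme "web+demo" serializes as "web+demo:/.//not-a-host/", as described in the note to step 3.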

4.6. URL equivalence

To determine whether a URL A equals URL B, with an optional boolean exclude fragments (default false), run these steps:

  1. Let serializedA be the result of serializing A, with exclude fragment set to exclude fragments.

  2. Let serializedB be the result of serializing B, with exclude fragment set to exclude fragments.

  3. Return true if serializedA is serializedB; otherwise false.
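With the URL API, `href` is the URL serializer's output, so equivalence reduces to a string comparison. This JavaScript sketch (`urlEquals` is a hypothetical name) emulates exclude fragment by cutting `href` at its first "#". That assumes a serialized URL's first "#" always starts the fragment, which holds because "#" is percent-encoded or rejected in every earlier component.

```javascript
// Serialize a URL object, optionally excluding the fragment.
function serialize(url, excludeFragment) {
  const href = url.href;
  const hashIndex = href.indexOf("#");
  return excludeFragment && hashIndex !== -1 ? href.slice(0, hashIndex) : href;
}

// Hypothetical helper implementing the three steps above.
function urlEquals(a, b, excludeFragments = false) {
  return serialize(a, excludeFragments) === serialize(b, excludeFragments);
}
```

Because parsing normalizes case and default ports, `urlEquals(new URL("HTTPS://EXAMPLE.org:443/a"), new URL("https://example.org/a"))` is true.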

4.7. Origin

See origin’s definition in HTML for the necessary background information. [HTML]

The origin of a URL url is the origin returned by running these steps, switching on url’s scheme:

"blob"
  1. If url’s blob URL entry is non-null, then return url’s blob URL entry’s environment’s origin.

  2. Let pathURL be the result of parsing the result of URL path serializing url.

  3. If pathURL is failure, then return a new opaque origin.

  4. Return pathURL’s origin.

The origin of blob:https://whatwg.org/d0360e2f-caee-469f-9a2f-87d5b0456f6f is the tuple origin ("https", "whatwg.org", null, null).

"ftp"
"http"
"https"
"ws"
"wss"

Return the tuple origin (url’s scheme, url’s host, url’s port, null).

"file"

Unfortunate as it is, this is left as an exercise to the reader. When in doubt, return a new opaque origin.

Otherwise

Return a new opaque origin.

This does indeed mean that these URLs cannot be same origin with themselves.
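The switch above can be sketched on top of the URL API, returning the serialization of the origin (opaque origins serialize as "null"). `serializedOrigin` is a hypothetical helper. It assumes no blob URL entry exists and treats "file" as opaque, which is one permissible choice since the standard leaves that case open.

```javascript
const TUPLE_SCHEMES = new Set(["ftp:", "http:", "https:", "ws:", "wss:"]);

// Hypothetical helper: serialized origin of a URL object.
function serializedOrigin(url) {
  if (url.protocol === "blob:") {
    let pathURL;
    try {
      // For blob: URLs, pathname is the URL path serialization (opaque path).
      pathURL = new URL(url.pathname);
    } catch {
      return "null"; // failure: a new opaque origin
    }
    return serializedOrigin(pathURL);
  }
  if (TUPLE_SCHEMES.has(url.protocol)) {
    // url.host omits the port when it is null (the scheme's default).
    return url.protocol + "//" + url.host;
  }
  return "null"; // "file" and everything else: a new opaque origin
}
```

For the blob: example above this returns "https://whatwg.org".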

4.8. URL rendering

A URL should be rendered in its serialized form, with modifications described below, when the primary purpose of displaying a URL is to have the user make a security or trust decision. For example, users are expected to make trust decisions based on a URL rendered in the browser address bar.

4.8.1. Simplify non-human-readable or irrelevant components

Remove components that can provide opportunities for spoofing or distract from security-relevant information:

4.8.2. Elision

In a space-constrained display, URLs should be elided carefully to avoid misleading the user when making a security decision:

4.8.3. Internationalization and special characters

Internationalized domain names (IDNs), special characters, and bidirectional text should be handled with care to prevent spoofing:

5. application/x-www-form-urlencoded

The application/x-www-form-urlencoded format provides a way to encode a list of tuples, each consisting of a name and a value.

The application/x-www-form-urlencoded format is in many ways an aberrant monstrosity, the result of many years of implementation accidents and compromises leading to a set of requirements necessary for interoperability, but in no way representing good design practices. In particular, readers are cautioned to pay close attention to the twisted details involving repeated (and in some cases nested) conversions between character encodings and byte sequences. Unfortunately the format is in widespread use due to the prevalence of HTML forms. [HTML]

5.1. application/x-www-form-urlencoded parsing

A legacy server-oriented implementation might have to support encodings other than UTF-8 as well as have special logic for tuples of which the name is `_charset_`. Such logic is not described here as only UTF-8 is conforming.

The application/x-www-form-urlencoded parser takes a byte sequence input, and then runs these steps:

  1. Let sequences be the result of splitting input on 0x26 (&).

  2. Let output be an initially empty list of name-value tuples where both name and value hold a string.

  3. For each byte sequence bytes in sequences:

    1. If bytes is the empty byte sequence, then continue.

    2. If bytes contains a 0x3D (=), then let name be the bytes from the start of bytes up to but excluding its first 0x3D (=), and let value be the bytes, if any, after the first 0x3D (=) up to the end of bytes. If 0x3D (=) is the first byte, then name will be the empty byte sequence. If it is the last, then value will be the empty byte sequence.

    3. Otherwise, let name have the value of bytes and let value be the empty byte sequence.

    4. Replace any 0x2B (+) in name and value with 0x20 (SP).

    5. Let nameString and valueString be the result of running UTF-8 decode without BOM on the percent-decoding of name and value, respectively.

    6. Append (nameString, valueString) to output.

  4. Return output.
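The parser operates on bytes rather than code points, which matters for "+" replacement and percent-decoding. The following JavaScript sketch mirrors the steps; it takes a string and UTF-8 encodes it first, as the string parser hook in 5.3 does. The function names are hypothetical, and `TextDecoder` with `ignoreBOM: true` is used to approximate "UTF-8 decode without BOM" (a leading BOM is kept, and invalid bytes become U+FFFD).

```javascript
const isHexDigit = (b) =>
  (b >= 0x30 && b <= 0x39) || (b >= 0x41 && b <= 0x46) || (b >= 0x61 && b <= 0x66);

// Percent-decode bytes; a "%" not followed by two hex digits is kept as-is.
function percentDecode(bytes) {
  const out = [];
  for (let i = 0; i < bytes.length; i++) {
    if (bytes[i] === 0x25 && i + 2 < bytes.length &&
        isHexDigit(bytes[i + 1]) && isHexDigit(bytes[i + 2])) {
      out.push(Number.parseInt(String.fromCharCode(bytes[i + 1], bytes[i + 2]), 16));
      i += 2;
    } else {
      out.push(bytes[i]);
    }
  }
  return Uint8Array.from(out);
}

const utf8Decoder = new TextDecoder("utf-8", { ignoreBOM: true });
const plusToSpace = (b) => Uint8Array.from(b, (x) => (x === 0x2b ? 0x20 : x));
const decode = (b) => utf8Decoder.decode(percentDecode(plusToSpace(b)));

// Returns a list of [name, value] pairs.
function parseFormUrlencoded(input) {
  const bytes = new TextEncoder().encode(input);
  const output = [];
  let start = 0;
  for (let i = 0; i <= bytes.length; i++) {
    if (i < bytes.length && bytes[i] !== 0x26) continue; // split on "&"
    const sequence = bytes.subarray(start, i);
    start = i + 1;
    if (sequence.length === 0) continue;
    const eq = sequence.indexOf(0x3d); // first "="
    const name = eq === -1 ? sequence : sequence.subarray(0, eq);
    const value = eq === -1 ? new Uint8Array(0) : sequence.subarray(eq + 1);
    output.push([decode(name), decode(value)]);
  }
  return output;
}
```

The result should agree with `[...new URLSearchParams(input)]`, which is specified in terms of this parser.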

5.2. application/x-www-form-urlencoded serializing

The application/x-www-form-urlencoded serializer takes a list of name-value tuples tuples, with an optional encoding encoding (default UTF-8), and then runs these steps. They return an ASCII string.

  1. Set encoding to the result of getting an output encoding from encoding.

  2. Let output be the empty string.

  3. For each tuple of tuples:

    1. Assert: tuple’s name and tuple’s value are scalar value strings.

    2. Let name be the result of running percent-encode after encoding with encoding, tuple’s name, the application/x-www-form-urlencoded percent-encode set, and true.

    3. Let value be the result of running percent-encode after encoding with encoding, tuple’s value, the application/x-www-form-urlencoded percent-encode set, and true.

    4. If output is not the empty string, then append U+0026 (&) to output.

    5. Append name, followed by U+003D (=), followed by value, to output.

  4. Return output.
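For the default UTF-8 encoding, the serializer leaves only ASCII alphanumerics, "*", "-", ".", and "_" unencoded, and the final `true` argument (space as plus) turns U+0020 into "+". The following JavaScript sketch covers only that UTF-8 case; the function names are hypothetical.

```javascript
// Bytes left unencoded by the application/x-www-form-urlencoded set.
const UNENCODED = /^[A-Za-z0-9*\-._]$/;

// Percent-encode one string after UTF-8 encoding, with space as "+".
function encodeComponent(string) {
  let result = "";
  for (const byte of new TextEncoder().encode(string)) {
    const char = String.fromCharCode(byte);
    if (byte === 0x20) result += "+";
    else if (UNENCODED.test(char)) result += char;
    else result += "%" + byte.toString(16).toUpperCase().padStart(2, "0");
  }
  return result;
}

// Each tuple contributes "name=value"; tuples are joined with "&".
function serializeFormUrlencoded(tuples) {
  return tuples
    .map(([name, value]) => encodeComponent(name) + "=" + encodeComponent(value))
    .join("&");
}
```

For example, `serializeFormUrlencoded([["a b", "ü~!"], ["x", "*-._"]])` returns "a+b=%C3%BC%7E%21&x=*-._", the same string `URLSearchParams` produces for those tuples.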

5.3. Hooks

The application/x-www-form-urlencoded string parser takes a scalar value string input, UTF-8 encodes it, and then returns the result of application/x-www-form-urlencoded parsing it.

6. API

This section uses terminology from Web IDL. Browser user agents must support this API. JavaScript implementations should support this API. Other user agents or programming languages are encouraged to use an API suitable to their needs, which might not be this one. [WEBIDL]

6.1. URL class

[Exposed=*,
 LegacyWindowAlias=webkitURL]
interface URL {
  constructor(USVString url, optional USVString base);

  static boolean canParse(USVString url, optional USVString base);

  stringifier attribute USVString href;
  readonly attribute USVString origin;
           attribute USVString protocol;
           attribute USVString username;
           attribute USVString password;
           attribute USVString host;
           attribute USVString hostname;
           attribute USVString port;
           attribute USVString pathname;
           attribute USVString search;
  [SameObject] readonly attribute URLSearchParams searchParams;
           attribute USVString hash;

  USVString toJSON();
};
A URL object has an associated:

  - URL: a URL.
  - query object: a URLSearchParams object.

To potentially strip trailing spaces from an opaque path given a URL object url:

  1. If url’s URL does not have an opaque path, then return.

  2. If url’s URL’s fragment is non-null, then return.

  3. If url’s URL’s query is non-null, then return.

  4. Remove all trailing U+0020 SPACE code points from url’s URL’s path.

The API URL parser takes a scalar value string url and an optional null-or-scalar value string base (default null), and then runs these steps:

  1. Let parsedBase be null.

  2. If base is non-null:

    1. Set parsedBase to the result of running the basic URL parser on base.

    2. If parsedBase is failure, then return failure.

  3. Return the result of running the basic URL parser on url with parsedBase.


The new URL(url, base) constructor steps are:

  1. Let parsedURL be the result of running the API URL parser on url with base, if given.

  2. If parsedURL is failure, then throw a TypeError.

  3. Let query be parsedURL’s query, if that is non-null, and the empty string otherwise.

  4. Set this’s URL to parsedURL.

  5. Set this’s query object to a new URLSearchParams object.

  6. Initialize this’s query object with query.

  7. Set this’s query object’s URL object to this.

To parse a string into a URL without using a base URL, invoke the URL constructor with a single argument:

var input = "https://example.org/💩",
    url = new URL(input)
url.pathname // "/%F0%9F%92%A9"


This throws an exception if the input is a relative-URL string:

try {
  var url = new URL("/🍣🍺")
} catch(e) {
  // that happened
}


For those cases a 
base URL

For those cases a base URL is necessary:

var input = "/🍣🍺",
    url = new URL(input, document.baseURI)
url.href // "https://url.spec.whatwg.org/%F0%9F%8D%A3%F0%9F%8D%BA"

A URL object can be used as a base URL (as the IDL requires a string as argument, a URL object stringifies to its href getter return value):

var url = new URL("🏳️‍🌈", new URL("https://pride.example/hello-world"))
url.pathname // "/%F0%9F%8F%B3%EF%B8%8F%E2%80%8D%F0%9F%8C%88"

The static canParse(url, base) method steps are:

  1. Let parsedURL be the result of running the API URL parser on url with base, if given.

  2. If parsedURL is failure, then return false.

  3. Return true.
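Since canParse() runs the same API URL parser as the constructor, its result should match whether the constructor throws. The hypothetical `canParseSketch` below relies on that equivalence as a fallback for environments without URL.canParse(); it is an approximation, since a native canParse() never needs to create a URL object or throw.

```javascript
// Hypothetical fallback: true exactly when the URL constructor would not throw.
function canParseSketch(url, base) {
  try {
    // Passing undefined for base is equivalent to omitting it.
    new URL(url, base);
    return true;
  } catch {
    return false;
  }
}

canParseSketch("https://example.org/");             // true
canParseSketch("/relative");                        // false
canParseSketch("/relative", "https://example.org"); // true
```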


The href getter steps and the toJSON() method steps are to return the serialization of this’s URL.
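For example, href, stringification, and toJSON() all produce the serialization, which reflects the parser's normalizations (lowercased scheme and host, default port omitted, dot segments resolved):

```javascript
const url = new URL("HTTPS://Example.ORG:443/a/../b");
url.href;                // "https://example.org/b"
String(url);             // "https://example.org/b" (href is the stringifier)
JSON.stringify({ url }); // '{"url":"https://example.org/b"}' via toJSON()
```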

The href setter steps are:

  1. Let parsedURL be the result of running the basic URL parser on the given value.

  2. If parsedURL is failure, then throw a TypeError.

  3. Set this’s URL to parsedURL.

  4. Empty this’s query object’s list.

  5. Let query be this’s URL’s query.

  6. If query is non-null, then set this’s query object’s list to the result of parsing query.
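The following usage example illustrates the setter steps, in particular that the same query object is emptied and repopulated, and that a failed parse leaves the URL unchanged:

```javascript
const url = new URL("https://example.org/?a=1");
const params = url.searchParams;

url.href = "https://example.com/?b=2";
params.has("a");   // false: the list was emptied (step 4)
params.get("b");   // "2": repopulated from the new query (step 6)

url.href = "https://example.com/"; // no query this time
params.toString(); // "": the query is null, so the list stays empty

try {
  url.href = "/relative"; // cannot be parsed without a base
} catch (e) {
  e instanceof TypeError; // true (step 2)
}
url.href;          // "https://example.com/": unchanged after the failure
```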

The origin getter steps are to return the serialization of this’s URL’s origin. [HTML]
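A few examples of origin serialization (the serialization itself is defined in HTML); a data: URL has an opaque origin, which serializes as "null":

```javascript
new URL("https://example.org:443/path").origin; // "https://example.org"
new URL("http://example.org:8080/").origin;     // "http://example.org:8080"
new URL("data:text/plain,hello").origin;        // "null" (opaque origin)
```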

The protocol getter steps are to return this’s URL’s scheme, followed by U+003A (:).

The protocol setter steps are to basic URL parse the given value, followed by U+003A (:), with this’s URL as url and scheme start state as state override.
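The following examples illustrate the protocol getter and setter. Because the setter appends U+003A (:) itself and the scheme state stops at the first colon, a trailing colon in the given value is optional. Switching between a special and a non-special scheme is refused by the scheme state and leaves the URL unchanged:

```javascript
const url = new URL("https://example.org/");
url.protocol;        // "https:"

// The setter appends ":" itself; "http" and "http:" behave the same.
url.protocol = "http";
url.href;            // "http://example.org/"

// A special-to-non-special change is refused, leaving the URL unchanged.
url.protocol = "foo";
url.protocol;        // "http:"
```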

The username getter steps are to return this’s URL’s username.

The username setter steps are:

  1. If this’s URL cannot have a username/password/port, then return.

  2. Set the username given this’s URL and the given value.

The password getter steps are to return this’s URL’s password.

The password setter steps are:

  1. If this’s URL cannot have a username/password/port, then return.

  2. Set the password given this’s URL and the given value.
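The username and password setters percent-encode using the userinfo percent-encode set, and are no-ops for URLs that cannot have a username/password/port, such as file: URLs:

```javascript
const url = new URL("https://example.org/");
url.username = "alice";
url.password = "p@ss"; // "@" is in the userinfo percent-encode set
url.href;              // "https://alice:p%40ss@example.org/"

// file: URLs cannot have a username/password/port, so the setter returns early.
const file = new URL("file:///tmp/notes.txt");
file.username = "alice";
file.username;         // ""
```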

The host getter steps are:

  1. Let url be this’s URL.

  2. If url’s host is null, then return the empty string.

  3. If url’s port is null, return url’s host, serialized.

  4. Return url’s host, serialized, followed by U+003A (:) and url’s port, serialized.
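The host getter steps can be sketched as a small function. `hostGetter` and its `{ host, port }` argument are hypothetical: host is assumed to be already serialized (host serialization itself is not modeled) and port is an integer or null.

```javascript
// Hypothetical model of the host getter over a simplified record.
function hostGetter({ host, port }) {
  if (host === null) return "";   // step 2
  if (port === null) return host; // step 3
  return `${host}:${port}`;       // step 4: the integer is serialized in decimal
}

hostGetter({ host: "example.org", port: null }); // "example.org"
hostGetter({ host: "[::1]", port: 8080 });       // "[::1]:8080"
hostGetter({ host: null, port: null });          // ""
```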

The host setter steps are:

  1. If this’s URL has an opaque path, then return.

  2. Basic URL parse the given value with this’s URL as url and host state as state override.

If the given value for the host setter lacks a port, this’s URL’s port will not change. This can be unexpected, as the host getter does return a URL-port string, so one might have assumed the setter to always "reset" both.
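The port-retention behavior described above can be observed directly; clearing the port explicitly is one way to reset both:

```javascript
const url = new URL("https://example.org:8080/");
url.host = "example.com";      // no port in the given value
url.host;                      // "example.com:8080": the port survived
url.host = "example.net:9090"; // a port in the given value replaces it
url.host;                      // "example.net:9090"

// To reset both, clear the port explicitly as well.
url.host = "example.com";
url.port = "";
url.host;                      // "example.com"
```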

The hostname getter steps are:

  1. If this’s URL’s host is null, then return the empty string.

  2. Return this’s URL’s host, serialized.

The hostname setter steps are:

  1. If this’s URL has an opaque path, then return.

  2. Basic URL parse the given value with this’s URL as url and hostname state as state override.
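Unlike host, the hostname getter never includes the port, and the hostname setter leaves the port untouched:

```javascript
const url = new URL("https://example.org:8080/path");
url.hostname;                 // "example.org": never includes the port
url.hostname = "example.com"; // the port is left alone
url.href;                     // "https://example.com:8080/path"

// A URL with an opaque path ignores the setter (step 1).
const mail = new URL("mailto:someone@example.org");
mail.hostname = "example.com";
mail.href;                    // "mailto:someone@example.org"
```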

The port getter steps are:

  1. If this’s URL’s port is null, then return the empty string.

  2. Return this’s URL’s port, serialized.

The port setter steps are:

  1. If this’s URL cannot have a username/password/port, then return.

  2. If the given value is the empty string, then set this’s URL’s port to null.

  3. Otherwise, basic URL parse the given value with this’s URL as url and port state as state override.
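The following examples illustrate the port getter and setter. Setting the scheme's default port stores null, so the getter then returns the empty string:

```javascript
const url = new URL("https://example.org:8080/");
url.port;          // "8080"
url.port = "443";  // the default port for https is stored as null
url.port;          // ""
url.port = "8443";
url.href;          // "https://example.org:8443/"
url.port = "";     // step 2: the empty string resets the port to null
url.href;          // "https://example.org/"

const file = new URL("file:///etc/hosts");
file.port = "8080"; // ignored (step 1): file URLs cannot have a port
file.port;          // ""
```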

The pathname getter steps are to return the result of URL path serializing this’s URL.

The pathname setter steps are:

  1. If this’s URL has an opaque path, then return.

  2. Empty this’s URL’s path.

  3. Basic URL parse the given value with this’s URL as url and path start state as state override.
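Because the setter empties the path before parsing, the given value replaces the whole path; code points in the path percent-encode set, such as U+0020 SPACE, are percent-encoded:

```javascript
const url = new URL("https://example.org/a/b?q=1");
url.pathname = "c d/e"; // the old path is emptied first (step 2)
url.pathname;           // "/c%20d/e"
url.href;               // "https://example.org/c%20d/e?q=1"

// Opaque paths cannot be replaced through this setter (step 1).
const mail = new URL("mailto:someone@example.org");
mail.pathname = "other";
mail.pathname;          // "someone@example.org"
```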

The search getter steps are:

  1. If this’s URL’s query is either null or the empty string, then return the empty string.

  2. Return U+003F (?), followed by this’s URL’s query.

The search setter steps are:

  1. Let url be this’s URL.

  2. If the given value is the empty string:

    1. Set url’s query to null.

    2. Empty this’s query object’s list.

    3. Potentially strip trailing spaces from an opaque path with this.

    4. Return.

  3. Let input be the given value with a single leading U+003F (?) removed, if any.

  4. Set url’s query to the empty string.

  5. Basic URL parse input with url as url and query state as state override.

  6. Set this’s query object’s list to the result of parsing input.
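The following examples trace the search getter and setter steps, including the distinction between a null query and an empty one:

```javascript
const url = new URL("https://example.org/");
url.search = "?a=1&b=2";     // a single leading "?" is removed (step 3)
url.searchParams.get("b");   // "2": the list is re-parsed (step 6)
url.search = "c=3";          // the "?" may also be omitted
url.search;                  // "?c=3"

url.search = "";             // step 2: the query becomes null
url.href;                    // "https://example.org/"
url.searchParams.toString(); // "": the list was emptied

// The getter returns "" for an empty (but non-null) query as well.
const empty = new URL("https://example.org/?");
empty.search;                // ""
empty.href;                  // "https://example.org/?"
```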

The search setter has the potential to remove trailing U+0020 SPACE code points from this’s URL’s path. It does this so that running the URL parser on the output of running the URL serializer on this’s URL does not yield a URL that is not equal to this’s URL.

The searchParams getter steps are to return this’s query object.

The hash getter steps are:

  1. If this’s URL’s fragment is either null or the empty string, then return the empty string.

  2. Return U+0023 (#), followed by this’s URL’s fragment.

The hash setter steps are:

  1. If the given value is the empty string:

    1. Set this’s URL’s fragment to null.

    2. Potentially strip trailing spaces from an opaque path with this.

    3. Return.

  2. Let input be the given value with a single leading U+0023 (#) removed, if any.

  3. Set this’s URL’s fragment to the empty string.

  4. Basic URL parse input with this’s URL as url and fragment state as state override.

The hash setter has the potential to change this’s URL’s path in a manner equivalent to the search setter.
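The hash getter and setter behave analogously to search; code points in the fragment percent-encode set, such as U+0020 SPACE, are percent-encoded:

```javascript
const url = new URL("https://example.org/page");
url.hash = "section 2"; // no leading "#" needed
url.hash;               // "#section%202"
url.hash = "#top";      // a single leading "#" is removed (step 2)
url.href;               // "https://example.org/page#top"
url.hash = "";          // step 1: the fragment becomes null
url.href;               // "https://example.org/page"
```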

6.2. URLSearchParams class

[Exposed=*]
interface URLSearchParams {
  constructor(optional (sequence<sequence<USVString>> or record<USVString, USVString> or USVString) init = "");

  readonly attribute unsigned long size;

  undefined append(USVString name, USVString value);
  undefined delete(USVString name);
  USVString? get(USVString name);
  sequence<USVString> getAll(USVString name);
  boolean has(USVString name);
  undefined set(USVString name, USVString value);

  undefined sort();

  iterable<USVString, USVString>;
  stringifier;
};
Constructing and stringifying a URLSearchParams object is fairly straightforward:

let params = new URLSearchParams({key: "730d67"})
params.toString() // "key=730d67"

As a URLSearchParams object uses the application/x-www-form-urlencoded format underneath, there are some differences in how it encodes certain code points compared to a URL object (including href and search). This can be especially surprising when using searchParams to operate on a URL’s query.

const url = new URL('https://example.com/?a=b ~');
console.log(url.href);   // "https://example.com/?a=b%20~"
url.searchParams.sort();
console.log(url.href);   // "https://example.com/?a=b+%7E"

const url = new URL('https://example.com/?a=~&b=%7E');
console.log(url.search);                // "?a=~&b=%7E"
console.log(url.searchParams.get('a')); // "~"
console.log(url.searchParams.get('b')); // "~"



URLSearchParams objects will percent-encode anything in the application/x-www-form-urlencoded percent-encode set, and will encode U+0020 SPACE as U+002B (+).

Ignoring encodings (use UTF-8), search will percent-encode anything in the query percent-encode set or the special-query percent-encode set (depending on whether or not the URL is special).
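To make the difference concrete, the sketch below approximates how the application/x-www-form-urlencoded serializer encodes a single string, assuming UTF-8: only ASCII alphanumerics and U+002A (*), U+002D (-), U+002E (.), and U+005F (_) are left as-is, U+0020 SPACE becomes U+002B (+), and every other byte is percent-encoded. The function name `formEncodeSketch` is hypothetical; its output can be compared with what URLSearchParams produces.

```javascript
// Hypothetical helper mirroring the application/x-www-form-urlencoded
// byte serializer, assuming UTF-8 as the encoding.
function formEncodeSketch(input) {
  let output = "";
  for (const byte of new TextEncoder().encode(input)) {
    if (byte === 0x20) {
      output += "+"; // U+0020 SPACE is encoded as U+002B (+)
    } else if (
      (byte >= 0x30 && byte <= 0x39) || // 0-9
      (byte >= 0x41 && byte <= 0x5a) || // A-Z
      (byte >= 0x61 && byte <= 0x7a) || // a-z
      byte === 0x2a || byte === 0x2d || byte === 0x2e || byte === 0x5f // * - . _
    ) {
      output += String.fromCharCode(byte);
    } else {
      // Everything else is percent-encoded with uppercase hex digits.
      output += "%" + byte.toString(16).toUpperCase().padStart(2, "0");
    }
  }
  return output;
}

formEncodeSketch("a b~*!");                         // "a+b%7E*%21"
new URLSearchParams([["k", "a b~*!"]]).toString(); // "k=a+b%7E*%21"
```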

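A URLSearchParams object has an associated list, a list of name-value tuples that is initially empty, and an associated URL object, which is either null or a URL object and is initially null.

A URLSearchParams object with a non-null URL object has the potential to change that object’s path in a manner equivalent to the URL object’s search and hash setters.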

To initialize a URLSearchParams object query with init, run these steps:

  1. If init is a sequence, then for each innerSequence of init:

    1. If innerSequence’s size is not 2, then throw a TypeError.

    2. Append (innerSequence[0], innerSequence[1]) to query’s list.

  2. Otherwise, if init is a record, then for each name → value of init, append (name, value) to query’s list.

  3. Otherwise:

    1. Assert: init is a string.

    2. Set query’s list to the result of parsing init.
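The three branches of initialize can be sketched in JavaScript. This is an approximation built on the assumptions listed in the comments; the function name `initializeSketch` is hypothetical, and it returns the list as an array of [name, value] tuples rather than filling in a URLSearchParams object.

```javascript
// Hypothetical sketch of "initialize". Simplifying assumptions:
// - Array.isArray stands in for Web IDL sequence detection (real
//   implementations accept any iterable, such as a Map);
// - any other non-null object stands in for a record;
// - the built-in URLSearchParams stands in for the urlencoded parser.
function initializeSketch(init) {
  const list = [];
  if (Array.isArray(init)) {
    for (const innerSequence of init) {
      if (innerSequence.length !== 2) {
        throw new TypeError("Each name-value pair must have exactly two items");
      }
      list.push([String(innerSequence[0]), String(innerSequence[1])]);
    }
  } else if (init !== null && typeof init === "object") {
    for (const [name, value] of Object.entries(init)) {
      list.push([name, String(value)]);
    }
  } else {
    // The leading "&" yields an empty sequence, which the parser skips; it also
    // stops the constructor from dropping a leading "?", which initialize
    // itself never does.
    for (const tuple of new URLSearchParams("&" + String(init))) {
      list.push(tuple);
    }
  }
  return list;
}

initializeSketch([["a", "1"], ["a", "2"]]); // [["a", "1"], ["a", "2"]]
initializeSketch({ b: "3" });               // [["b", "3"]]
initializeSketch("c=4&d=5+6");              // [["c", "4"], ["d", "5 6"]]
initializeSketch("?x=1");                   // [["?x", "1"]]
```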

To update a URLSearchParams object query, run these steps:

  1. If query’s URL object is null, then return.

  2. Let serializedQuery be the serialization of query’s list.

  3. If serializedQuery is the empty string, then set serializedQuery to null.

  4. Set query’s URL object’s URL’s query to serializedQuery.

  5. If serializedQuery is null, then potentially strip trailing spaces from an opaque path with query’s URL object.
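The update steps are what keep a URL and its searchParams in sync. The following example shows a serialized list becoming the URL's query, and an empty list resulting in a null query:

```javascript
const url = new URL("https://example.org/?a=1");
url.searchParams.append("b", "two words");
url.href;              // "https://example.org/?a=1&b=two+words"

// Removing every tuple serializes to "", which becomes a null query (step 3).
url.searchParams.delete("a");
url.searchParams.delete("b");
url.href;              // "https://example.org/"

// A standalone object has a null URL object, so update returns early (step 1).
const standalone = new URLSearchParams("x=1");
standalone.append("y", "2");
standalone.toString(); // "x=1&y=2"
```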

The new URLSearchParams(init) constructor steps are:

  1. If init is a string and starts with U+003F (?), then remove the first code point from init.

  2. Initialize this with init.
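The constructor accepts the three init forms handled by initialize; only a single leading U+003F (?) is removed, and only when init is a string:

```javascript
new URLSearchParams("?a=1&b=2").get("a");                  // "1": leading "?" dropped
new URLSearchParams("??a=1").has("?a");                    // true: only one "?" dropped
new URLSearchParams([["a", "1"], ["a", "2"]]).getAll("a"); // ["1", "2"]
new URLSearchParams({ a: "1", b: "2" }).toString();        // "a=1&b=2"
new URLSearchParams(new Map([["m", "1"]])).get("m");       // "1": an iterable of pairs
try {
  new URLSearchParams([["a", "1", "extra"]]); // inner sequence of size 3
} catch (e) {
  e instanceof TypeError;                                  // true
}
```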

The size getter steps are to return this’s list’s size.

The append(name, value) method steps are:

  1. Append (name, value) to this’s list.

  2. Update this.

The delete(name) method steps are:

  1. Remove all tuples whose name is name from this’s list.

  2. Update this.

The get(name) method steps are to return the value of the first tuple whose name is name in this’s list, if there is such a tuple; otherwise null.

The getAll(name) method steps are to return the values of all tuples whose name is name in this’s list, in list order; otherwise the empty sequence.

The has(name) method steps are to return true if there is a tuple whose name is name in this’s list; otherwise false.

The set(name, value) method steps are:

  1. If this’s list contains any tuples whose name is name, then set the value of the first such tuple to value and remove the others.

  2. Otherwise, append (name, value) to this’s list.

  3. Update this.
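The following examples illustrate how the methods above treat repeated names:

```javascript
const params = new URLSearchParams("a=1&b=2&a=3");
params.get("a");          // "1": the first matching tuple
params.getAll("a");       // ["1", "3"]
params.get("missing");    // null
params.getAll("missing"); // []
params.has("b");          // true

params.set("a", "x");     // first "a" updated in place, the others removed
params.toString();        // "a=x&b=2"

params.append("a", "y");  // append never replaces
params.toString();        // "a=x&b=2&a=y"

params.delete("a");       // removes every tuple named "a"
params.toString();        // "b=2"
```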


It can be useful to sort the name-value tuples in a URLSearchParams object, in particular to increase cache hits. This can be accomplished through invoking the sort() method:

const url = new URL("https://example.org/?q=🏳️‍🌈&key=e1f7bc78");
url.searchParams.sort();
url.search; // "?key=e1f7bc78&q=%F0%9F%8F%B3%EF%B8%8F%E2%80%8D%F0%9F%8C%88"

To avoid altering the original input, e.g., for comparison purposes, construct a new URLSearchParams object:

const sorted = new URLSearchParams(url.search)
sorted.sort()

The sort() method steps are:

  1. Sort all tuples in this’s list, if any, by their names. Sorting must be done by comparison of code units. The relative order between tuples with equal names must be preserved.

  2. Update this.
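The sort steps can be approximated on a plain array of tuples. The helper `sortTuples` is hypothetical; it assumes an engine where Array.prototype.sort is stable (required since ECMAScript 2019) and relies on JavaScript's string comparison operators, which compare UTF-16 code units, as the steps require.

```javascript
// Hypothetical sketch: stable sort of [name, value] tuples by name,
// comparing names code unit by code unit.
function sortTuples(list) {
  return [...list].sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
}

sortTuples([["z", "1"], ["a", "2"], ["z", "0"], ["a", "1"]]);
// [["a", "2"], ["a", "1"], ["z", "1"], ["z", "0"]]: equal names keep their order

// Code-unit order differs from code-point order for astral characters:
// "🌈" begins with the surrogate 0xD83C, which sorts before U+FFFD.
sortTuples([["\uFFFD", "1"], ["🌈", "2"]]).map(([name]) => name); // ["🌈", "\uFFFD"]
```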


The value pairs to iterate over are this’s list’s tuples with the key being the name and the value being the value.

The stringification behavior steps are to return the serialization of this’s list.
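Iteration yields the tuples in list order, which matters when a name occurs more than once; converting to a plain object, for example, keeps only the last value for each name:

```javascript
const params = new URLSearchParams("a=1&b=2&a=3");
for (const [name, value] of params) {
  console.log(name, value); // "a 1", then "b 2", then "a 3" (list order)
}
[...params.keys()];         // ["a", "b", "a"]
String(params);             // "a=1&b=2&a=3"

// A plain object cannot hold repeated keys: the last value wins.
Object.fromEntries(params); // { a: "3", b: "2" }
```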

6.3. URL APIs elsewhere

A standard that exposes URLs should expose the URL as a string (by serializing an internal URL). A standard should not expose a URL using a URL object. URL objects are meant for URL manipulation. In IDL the USVString type should be used.

The higher-level notion here is that values are to be exposed as immutable data structures.

If a standard decides to use a variant of the name "URL" for a feature it defines, it should name such a feature "url" (i.e., lowercase and with an "l" at the end). Names such as "URL", "URI", and "IRI" should not be used. However, if the name is a compound, "URL" (i.e., uppercase) is preferred, e.g., "newURL" and "oldURL".

The EventSource and HashChangeEvent interfaces in HTML are examples of proper naming. [HTML]
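As an illustration of this guidance in JavaScript terms, the hypothetical class below (`ExampleLinkEvent` is invented for this sketch, loosely modeled on the oldURL/newURL naming used by HashChangeEvent) parses its input once but exposes only serializations:

```javascript
// Hypothetical API following the guidance above: URLs are exposed as
// immutable strings (serializations), never as URL objects.
class ExampleLinkEvent {
  #oldURL;
  #newURL;

  constructor(oldURL, newURL) {
    // Parse internally so invalid input fails early (with a TypeError),
    // then keep only the serialization.
    this.#oldURL = new URL(oldURL).href;
    this.#newURL = new URL(newURL).href;
  }

  // Compound names use uppercase "URL", as with HashChangeEvent.
  get oldURL() { return this.#oldURL; }
  get newURL() { return this.#newURL; }
}

const event = new ExampleLinkEvent("https://example.org/#a", "https://example.org/#b");
event.newURL;        // "https://example.org/#b"
typeof event.newURL; // "string"
```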

Acknowledgments

There have been a lot of people that have helped make URLs more interoperable over the years and thereby furthered the goals of this standard. Likewise many people have helped make this standard what it is today.

With that, many thanks to 100の人, Adam Barth, Addison Phillips, Adrián Chaves, Albert Wiersch, Alex Christensen, Alexandre Morgaut, Alexis Hunt, Alwin Blok, Andrew Sullivan, Arkadiusz Michalski, Behnam Esfahbod, Bobby Holley, Boris Zbarsky, Brad Hill, Brandon Ross, Chris Dumez, Chris Rebert, Corey Farwell, Dan Appelquist, Daniel Bratell, Daniel Stenberg, David Burns, David Håsäther, David Sheets, David Singer, David Walp, Domenic Denicola, Emily Schechter, Emily Stark, Eric Lawrence, Erik Arvidsson, Gavin Carothers, Geoff Richards, Glenn Maynard, Gordon P. Hemsley, hemanth, Henri Sivonen, Ian Hickson, Ilya Grigorik, Italo A. Casas, Jakub Gieryluk, James Graham, James Manger, James Ross, Jeff Hodges, Jeffrey Posnick, Jeffrey Yasskin, Joe Duarte, Joshua Bell, Jxck, Karl Wagner, 田村健人 (Kent TAMURA), Kevin Grandon, Kornel Lesiński, Larry Masinter, Leif Halvard Silli, Mark Amery, Mark Davis, Marcos Cáceres, Marijn Kruisselbrink, Martin Dürst, Mathias Bynens, Matt Falkenhagen, Matt Giuca, Michael Peick, Michael™ Smith, Michal Bukovský, Michel Suignard, Mikaël Geljić, Noah Levitt, Peter Occil, Philip Jägenstedt, Philippe Ombredanne, Prayag Verma, Rimas Misevičius, Robert Kieffer, Rodney Rehm, Roy Fielding, Ryan Sleevi, Sam Ruby, Sam Sneddon, Santiago M. Mola, Sebastian Mayr, Simon Pieters, Simon Sapin, Steven Vachon, Stuart Cook, Sven Uhlig, Tab Atkins, 吉野剛史 (Takeshi Yoshino), Tantek Çelik, Tiancheng "Timothy" Gu, Tim Berners-Lee, 簡冠庭 (Tim Guan-tin Chien), Titi_Alone, Tomek Wytrębowicz, Trevor Rowbotham, Tristan Seligmann, Valentin Gosu, Vyacheslav Matva, Wei Wang, Wolf Lammen, 山岸和利 (Yamagishi Kazutoshi), Yongsheng Zhang, 成瀬ゆい (Yui Naruse), and zealousidealroll for being awesome!

This standard is written by Anne van Kesteren (Apple, annevk@annevk.nl).

Intellectual property rights

Copyright © WHATWG (Apple, Google, Mozilla, Microsoft). This work is licensed under a Creative Commons Attribution 4.0 International License. To the extent portions of it are incorporated into source code, such portions in the source code are licensed under the BSD 3-Clause License instead.

This is the Living Standard. Those interested in the patent-review version should view the Living Standard Review Draft.


References

Normative References

[BIDI]
Mark Davis; Ken Whistler. Unicode Bidirectional Algorithm. 16 August 2022. Unicode Standard Annex #9. URL: https://www.unicode.org/reports/tr9/tr9-46.html
[ENCODING]
Anne van Kesteren. Encoding Standard. Living Standard. URL: https://encoding.spec.whatwg.org/
[FILEAPI]
Marijn Kruisselbrink. File API. URL: https://w3c.github.io/FileAPI/
[HTML]
Anne van Kesteren; et al. HTML Standard. Living Standard. URL: https://html.spec.whatwg.org/multipage/
[IANA-URI-SCHEMES]
Uniform Resource Identifier (URI) Schemes. URL: https://www.iana.org/assignments/uri-schemes/uri-schemes.xhtml
[INFRA]
Anne van Kesteren; Domenic Denicola. Infra Standard. Living Standard. URL: https://infra.spec.whatwg.org/
[PSL]
Public Suffix List. Mozilla Foundation.
[RFC4291]
R. Hinden; S. Deering. IP Version 6 Addressing Architecture. February 2006. Draft Standard. URL: https://www.rfc-editor.org/rfc/rfc4291
[UTS46]
Mark Davis; Michel Suignard. Unicode IDNA Compatibility Processing. 26 August 2022. Unicode Technical Standard #46. URL: https://www.unicode.org/reports/tr46/tr46-29.html
[WEBIDL]
Edgar Chen; Timothy Gu. Web IDL Standard. Living Standard. URL: https://webidl.spec.whatwg.org/

Informative References

[ECMA-262]
ECMAScript Language Specification. URL: https://tc39.es/ecma262/multipage/
[IDNFAQ]
Internationalized Domain Names (IDN) FAQ. URL: https://unicode.org/faq/idn.html
[RFC1034]
P. Mockapetris. Domain names - concepts and facilities. November 1987. Internet Standard. URL: https://www.rfc-editor.org/rfc/rfc1034
[RFC3986]
T. Berners-Lee; R. Fielding; L. Masinter. Uniform Resource Identifier (URI): Generic Syntax. January 2005. Internet Standard. URL: https://www.rfc-editor.org/rfc/rfc3986
[RFC3987]
M. Duerst; M. Suignard. Internationalized Resource Identifiers (IRIs). January 2005. Proposed Standard. URL: https://www.rfc-editor.org/rfc/rfc3987
[RFC5890]
J. Klensin. Internationalized Domain Names for Applications (IDNA): Definitions and Document Framework. August 2010. Proposed Standard. URL: https://www.rfc-editor.org/rfc/rfc5890
[RFC5952]
S. Kawamura; M. Kawashima. A Recommendation for IPv6 Address Text Representation. August 2010. Proposed Standard. URL: https://www.rfc-editor.org/rfc/rfc5952
[RFC6454]
A. Barth. The Web Origin Concept. December 2011. Proposed Standard. URL: https://www.rfc-editor.org/rfc/rfc6454
[RFC7595]
D. Thaler, Ed.; T. Hansen; T. Hardie. Guidelines and Registration Procedures for URI Schemes. June 2015. Best Current Practice. URL: https://www.rfc-editor.org/rfc/rfc7595
[RFC791]
J. Postel. Internet Protocol. September 1981. Internet Standard. URL: https://www.rfc-editor.org/rfc/rfc791
[UTR36]
Mark Davis; Michel Suignard. Unicode Security Considerations. 19 September 2014. Unicode Technical Report #36. URL: https://www.unicode.org/reports/tr36/tr36-15.html
[UTS39]
Mark Davis; Michel Suignard. Unicode Security Mechanisms. 26 August 2022. Unicode Technical Standard #39. URL: https://www.unicode.org/reports/tr39/tr39-26.html

IDL Index

[Exposed=*,
 LegacyWindowAlias=webkitURL]
interface URL {
  constructor(USVString url, optional USVString base);
  static boolean canParse(USVString url, optional USVString base);
  stringifier attribute USVString href;
  readonly attribute USVString origin;
           attribute USVString protocol;
           attribute USVString username;
           attribute USVString password;
           attribute USVString host;
           attribute USVString hostname;
           attribute USVString port;
           attribute USVString pathname;
           attribute USVString search;
  [SameObject] readonly attribute URLSearchParams searchParams;
           attribute USVString hash;
  USVString toJSON();
};
[Exposed=*]
interface URLSearchParams {
  constructor(optional (sequence<sequence<USVString>> or record<USVString, USVString> or USVString) init = "");
  readonly attribute unsigned long size;
  undefined append(USVString name, USVString value);
  undefined delete(USVString name);
  USVString? get(USVString name);
  sequence<USVString> getAll(USVString name);
  boolean has(USVString name);
  undefined set(USVString name, USVString value);
  undefined sort();
  iterable<USVString, USVString>;
  stringifier;
};
Info about the 'serialize an integer' definition.

MDN

URL/URL In all current engines.

In all current engines.

Firefox 26+ Safari 14.1+ Chrome 19+
Opera ? Edge 79+
Edge (Legacy) Edge (Legacy) 12+ IE None
Firefox for Android Firefox for Android ? iOS Safari iOS Safari ? Chrome for Android Chrome for Android ? Android WebView Android WebView ? Samsung Internet Samsung Internet ? Opera Mobile Opera Mobile ?
Node.js 10.0.0+
MDN

URL/hash In all current engines.

In all current engines.

Firefox 22+ Safari 7+ Chrome 32+
Opera ? Edge 79+
Edge (Legacy) Edge (Legacy) 13+ IE None
Firefox for Android Firefox for Android ? iOS Safari iOS Safari ? Chrome for Android Chrome for Android ? Android WebView Android WebView ? Samsung Internet Samsung Internet ? Opera Mobile Opera Mobile ?
Node.js 7.0.0+
MDN

URL/host In all current engines.

In all current engines.

Firefox 22+ Safari 7+ Chrome 32+
Opera ? Edge 79+
Edge (Legacy) Edge (Legacy) 13+ IE None
Firefox for Android Firefox for Android ? iOS Safari iOS Safari ? Chrome for Android Chrome for Android ? Android WebView Android WebView ? Samsung Internet Samsung Internet ? Opera Mobile Opera Mobile ?
Node.js 7.0.0+
MDN

URL/hostname In all current engines.

In all current engines.

Firefox 22+ Safari 10+ Chrome 32+
Opera ? Edge 79+
Edge (Legacy) Edge (Legacy) 13+ IE None
Firefox for Android Firefox for Android ? iOS Safari iOS Safari ? Chrome for Android Chrome for Android ? Android WebView Android WebView ? Samsung Internet Samsung Internet ? Opera Mobile Opera Mobile ?
Node.js 7.0.0+
MDN

URL/href In all current engines.

In all current engines.

Firefox 22+ Safari 10+ Chrome 32+
Opera ? Edge 79+
Edge (Legacy) Edge (Legacy) 13+ IE None
Firefox for Android Firefox for Android ? iOS Safari iOS Safari ? Chrome for Android Chrome for Android ? Android WebView Android WebView ? Samsung Internet Samsung Internet ? Opera Mobile Opera Mobile ?
Node.js 7.0.0+
MDN

URL/origin In all current engines.

In all current engines.

Firefox 26+ Safari 10+ Chrome 32+
Opera ? Edge 79+
Edge (Legacy) Edge (Legacy) 12+ IE None
Firefox for Android Firefox for Android ? iOS Safari iOS Safari ? Chrome for Android Chrome for Android ? Android WebView Android WebView ? Samsung Internet Samsung Internet 6.0+ Opera Mobile Opera Mobile ?
Node.js 7.0.0+
MDN

URL/password In all current engines.

In all current engines.

Firefox 26+ Safari 10+ Chrome 32+
Opera ? Edge 79+
Edge (Legacy) Edge (Legacy) 12+ IE None
Firefox for Android Firefox for Android ? iOS Safari iOS Safari ? Chrome for Android Chrome for Android ? Android WebView Android WebView ? Samsung Internet Samsung Internet 6.0+ Opera Mobile Opera Mobile ?
Node.js 7.0.0+
MDN

URL/pathname In all current engines.

In all current engines.

Firefox 22+ Safari 10+ Chrome 32+
Opera ? Edge 79+
Edge (Legacy) Edge (Legacy) 13+ IE None
Firefox for Android Firefox for Android ? iOS Safari iOS Safari ? Chrome for Android Chrome for Android ? Android WebView Android WebView ? Samsung Internet Samsung Internet ? Opera Mobile Opera Mobile ?
Node.js 7.0.0+
MDN

URL/port In all current engines.

In all current engines.

Firefox 22+ Safari 10+ Chrome 32+
Opera ? Edge 79+
Edge (Legacy) Edge (Legacy) 13+ IE None
Firefox for Android Firefox for Android ? iOS Safari iOS Safari ? Chrome for Android Chrome for Android ? Android WebView Android WebView ? Samsung Internet Samsung Internet ? Opera Mobile Opera Mobile ?
Node.js 7.0.0+
MDN

URL/protocol In all current engines.

In all current engines.

Firefox 22+ Safari 10+ Chrome 32+
Opera ? Edge 79+
Edge (Legacy) Edge (Legacy) 13+ IE None
Firefox for Android Firefox for Android ? iOS Safari iOS Safari ? Chrome for Android Chrome for Android ? Android WebView Android WebView ? Samsung Internet Samsung Internet ? Opera Mobile Opera Mobile ?
Node.js 7.0.0+
MDN

URL/search In all current engines.

In all current engines.

Firefox 22+ Safari 10+ Chrome 32+
Opera ? Edge 79+
Edge (Legacy) Edge (Legacy) 13+ IE None
Firefox for Android Firefox for Android ? iOS Safari iOS Safari ? Chrome for Android Chrome for Android ? Android WebView Android WebView ? Samsung Internet Samsung Internet ? Opera Mobile Opera Mobile ?
Node.js 7.0.0+
MDN

URL/searchParams In all current engines.

In all current engines.

Firefox 29+ Safari 10+ Chrome 51+
Opera ? Edge 79+
Edge (Legacy) Edge (Legacy) 17+ IE None
Firefox for Android Firefox for Android ? iOS Safari iOS Safari ? Chrome for Android Chrome for Android ? Android WebView Android WebView ? Samsung Internet Samsung Internet ? Opera Mobile Opera Mobile ?
Node.js 7.5.0+
MDN

URL/toJSON In all current engines.

In all current engines.

Firefox 54+ Safari 11+ Chrome 71+
Opera ? Edge 79+
Edge (Legacy) Edge (Legacy) 17+ IE None
Firefox for Android Firefox for Android ? iOS Safari iOS Safari ? Chrome for Android Chrome for Android ? Android WebView Android WebView ? Samsung Internet Samsung Internet ? Opera Mobile Opera Mobile ?
Node.js 7.7.0+
MDN

URL/toString In all current engines.

In all current engines.

Firefox 54+ Safari 7+ Chrome 19+
Opera ? Edge 79+
Edge (Legacy) Edge (Legacy) 17+ IE None
Firefox for Android Firefox for Android ? iOS Safari iOS Safari ? Chrome for Android Chrome for Android ? Android WebView Android WebView ? Samsung Internet Samsung Internet 6.0+ Opera Mobile Opera Mobile ?
Node.js 7.0.0+
MDN

URL/username In all current engines.

In all current engines.

Firefox 26+ Safari 10+ Chrome 32+
Opera ? Edge 79+
Edge (Legacy) Edge (Legacy) 12+ IE None
Firefox for Android Firefox for Android ? iOS Safari iOS Safari ? Chrome for Android Chrome for Android ? Android WebView Android WebView ? Samsung Internet Samsung Internet 6.0+ Opera Mobile Opera Mobile ?
Node.js 7.0.0+
MDN

URL In all current engines.

In all current engines.

Firefox 19+ Safari 7+ Chrome 32+
Opera ? Edge 79+
Edge (Legacy) Edge (Legacy) 12+ IE 10+
Firefox for Android Firefox for Android ? iOS Safari iOS Safari ? Chrome for Android Chrome for Android ? Android WebView Android WebView 4.4+ Samsung Internet Samsung Internet ? Opera Mobile Opera Mobile ?
Node.js 10.0.0+
MDN

URLSearchParams/URLSearchParams In all current engines.

In all current engines.

Firefox 29+ Safari 10.1+ Chrome 49+
Opera ? Edge 79+
Edge (Legacy) Edge (Legacy) 17+ IE None
Firefox for Android Firefox for Android ? iOS Safari iOS Safari ? Chrome for Android Chrome for Android ? Android WebView Android WebView ? Samsung Internet Samsung Internet ? Opera Mobile Opera Mobile ?
Node.js 7.5.0+
MDN

URLSearchParams/append In all current engines.

In all current engines.

Firefox 29+ Safari 10.1+ Chrome 49+
Opera ? Edge 79+
Edge (Legacy) Edge (Legacy) 17+ IE None
Firefox for Android Firefox for Android ? iOS Safari iOS Safari ? Chrome for Android Chrome for Android ? Android WebView Android WebView ? Samsung Internet Samsung Internet ? Opera Mobile Opera Mobile ?
Node.js 7.5.0+
MDN

URLSearchParams/delete In all current engines.

Firefox 29+ Safari 14+ Chrome 49+
Opera ? Edge 79+
Edge (Legacy) 17+ IE None
Firefox for Android ? iOS Safari ? Chrome for Android ? Android WebView ? Samsung Internet ? Opera Mobile ?
Node.js 7.5.0+
MDN

URLSearchParams/get In all current engines.

Firefox 29+ Safari 10.1+ Chrome 49+
Opera ? Edge 79+
Edge (Legacy) 17+ IE None
Firefox for Android ? iOS Safari ? Chrome for Android ? Android WebView ? Samsung Internet ? Opera Mobile ?
Node.js 7.5.0+
MDN

URLSearchParams/getAll In all current engines.

Firefox 29+ Safari 10.1+ Chrome 49+
Opera ? Edge 79+
Edge (Legacy) 17+ IE None
Firefox for Android ? iOS Safari ? Chrome for Android ? Android WebView ? Samsung Internet ? Opera Mobile ?
Node.js 7.5.0+
MDN

URLSearchParams/has In all current engines.

Firefox 29+ Safari 10.1+ Chrome 49+
Opera ? Edge 79+
Edge (Legacy) 17+ IE None
Firefox for Android ? iOS Safari ? Chrome for Android ? Android WebView ? Samsung Internet ? Opera Mobile ?
Node.js 7.5.0+
MDN

URLSearchParams/set In all current engines.

Firefox 29+ Safari 10.1+ Chrome 49+
Opera ? Edge 79+
Edge (Legacy) 17+ IE None
Firefox for Android ? iOS Safari ? Chrome for Android ? Android WebView ? Samsung Internet ? Opera Mobile ?
Node.js 7.5.0+
MDN

URLSearchParams/sort In all current engines.

Firefox 54+ Safari 11+ Chrome 61+
Opera ? Edge 79+
Edge (Legacy) 17+ IE None
Firefox for Android ? iOS Safari ? Chrome for Android ? Android WebView ? Samsung Internet ? Opera Mobile ?
Node.js 7.7.0+
MDN

URLSearchParams/toString In all current engines.

Firefox 29+ Safari 10.1+ Chrome 49+
Opera ? Edge 79+
Edge (Legacy) 17+ IE None
Firefox for Android ? iOS Safari ? Chrome for Android ? Android WebView ? Samsung Internet ? Opera Mobile ?
Node.js 7.5.0+
MDN

URLSearchParams In all current engines.

Firefox 29+ Safari 10.1+ Chrome 49+
Opera ? Edge 79+
Edge (Legacy) 17+ IE None
Firefox for Android ? iOS Safari ? Chrome for Android ? Android WebView ? Samsung Internet ? Opera Mobile ?
Node.js 10.0.0+