Goals
The URL standard takes the following approach towards making URLs fully interoperable:
-
Align RFC 3986 and RFC 3987 with contemporary implementations and obsolete them in the process. (E.g., spaces, other "illegal" code points, query encoding, equality, canonicalization, are all concepts not entirely shared, or defined.) URL parsing needs to become as solid as HTML parsing. [RFC3986] [RFC3987]
-
Standardize on the term URL. URI and IRI are just confusing. In practice a single algorithm is used for both so keeping them distinct is not helping anyone. URL also easily wins the search result popularity contest .
-
Supplanting Origin of a URI [sic] . [RFC6454]
-
Define URL’s existing JavaScript API in full detail and add enhancements to make it easier to work with. Add a new URL object as well for URL manipulation without usage of HTML elements. (Useful for JavaScript worker environments.)
-
Ensure the combination of parser, serializer, and API guarantee idempotence. For example, a non-failure result of a parse-then-serialize operation will not change with any further parse-then-serialize operations applied to it. Similarly, manipulating a non-failure result through the API will not change from applying any number of serialize-then-parse operations to it.
As the editors learn more about the subject matter the goals might increase in scope somewhat.
1. Infrastructure
This specification depends on the Infra Standard. [INFRA]
Some terms used in this specification are defined in the following standards and specifications:
- DOM Standard [DOM]
- Encoding Standard [ENCODING]
- File API [FILEAPI]
- HTML Standard [HTML]
- Media Source Extensions [MEDIA-SOURCE]
- Unicode IDNA Compatibility Processing [UTS46]
- Web IDL [WEBIDL]
To serialize an integer , represent it as the shortest possible decimal number.
1.1. Writing
A validation error indicates a mismatch between input and valid input. User agents, especially conformance checkers, are encouraged to report them somewhere.
A validation error does not mean that the parser terminates. Termination of a parser is always stated explicitly, e.g., through a return statement.
It is useful to signal validation errors as error-handling can be non-intuitive, legacy user agents might not implement correct error-handling, and the intent of what is written might be unclear to other developers.
1.2. Parsers
The EOF code point is a conceptual code point that signifies the end of a string or code point stream.
A pointer for a string input is an integer that points to a code point within input . Initially it points to the start of input . If it is −1 it points nowhere. If it is greater than or equal to input ’s code point length , it points to the EOF code point .
When a pointer is used, c references the code point the pointer points to as long as it does not point nowhere. When the pointer points to nowhere c cannot be used.
When a pointer is used, remaining references the substring after c as long as c is not the EOF code point . When c is the EOF code point remaining cannot be used.
If "mailto:username@example" is a string being processed and a pointer points to @, c is U+0040 (@) and remaining is "example".
If the empty string is being processed and a pointer points to the start and is then decreased by 1, using c or remaining would be an error.
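The pointer vocabulary above can be illustrated with a minimal Python sketch. The class name `Pointer`, the `EOF` sentinel, and the use of `ValueError` for the "cannot be used" cases are assumptions made for illustration; they are not defined by this standard.

```python
EOF = None  # hypothetical stand-in for the conceptual EOF code point


class Pointer:
    """Tracks a position in `text`; an index of -1 means it points nowhere."""

    def __init__(self, text: str) -> None:
        self.text = text
        self.index = 0  # initially points to the start of the input

    @property
    def c(self):
        if self.index < 0:
            raise ValueError("pointer points nowhere; c cannot be used")
        if self.index >= len(self.text):
            return EOF
        return self.text[self.index]

    @property
    def remaining(self) -> str:
        if self.c is EOF:
            raise ValueError("c is the EOF code point; remaining cannot be used")
        return self.text[self.index + 1:]
```

Python strings are indexed by code point, so `len(text)` matches the code point length used in the definitions above.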
1.3. Percent-encoded bytes
A percent-encoded byte is U+0025 (%), followed by two ASCII hex digits . Sequences of percent-encoded bytes , percent-decoded , should not cause UTF-8 decode without BOM or fail to return failure.
To percent-encode a byte byte , return a string consisting of U+0025 (%), followed by two ASCII upper hex digits representing byte .
To percent-decode a byte sequence input , run these steps:
Using anything but UTF-8 decode without BOM when input contains bytes that are not ASCII bytes might be insecure and is not recommended.
-
Let output be an empty byte sequence .
-
For each byte byte in input :
-
If byte is not 0x25 (%), then append byte to output .
-
Otherwise, if byte is 0x25 (%) and the next two bytes after byte in input are not in the ranges 0x30 (0) to 0x39 (9), 0x41 (A) to 0x46 (F), and 0x61 (a) to 0x66 (f), all inclusive, append byte to output .
-
Otherwise:
-
Let bytePoint be the two bytes after byte in input , decoded , and then interpreted as hexadecimal number.
-
Append a byte whose value is bytePoint to output .
-
Skip the next two bytes in input .
-
-
-
Return output .
To percent-decode a string input , run these steps:
-
Let bytes be the UTF-8 encoding of input .
-
Return the percent-decoding of bytes .
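The two percent-decode algorithms above could be sketched in Python roughly as follows. The function names are illustrative assumptions, and the loop uses an explicit index instead of "skip the next two bytes".

```python
HEX_DIGITS = b"0123456789ABCDEFabcdef"


def percent_decode(data: bytes) -> bytes:
    """Percent-decode a byte sequence; malformed escapes are kept as-is."""
    output = bytearray()
    i = 0
    while i < len(data):
        byte = data[i]
        two = data[i + 1:i + 3]
        if byte == 0x25 and len(two) == 2 and all(b in HEX_DIGITS for b in two):
            # "%" followed by two hex digits: emit the byte they denote.
            output.append(int(two.decode("ascii"), 16))
            i += 3
        else:
            # Any other byte, including a "%" without two hex digits after it.
            output.append(byte)
            i += 1
    return bytes(output)


def percent_decode_string(text: str) -> bytes:
    """Percent-decode a string by first taking its UTF-8 encoding."""
    return percent_decode(text.encode("utf-8"))
```

Note that the result is a byte sequence; turning it back into a string is a separate decoding step.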
In general, percent-encoding results in a string with more U+0025 (%) code points than the input, and percent-decoding results in a byte sequence with fewer 0x25 (%) bytes than the input.
The C0 control percent-encode set are the C0 controls and all code points greater than U+007E (~).
The fragment percent-encode set is the C0 control percent-encode set and U+0020 SPACE, U+0022 ("), U+003C (<), U+003E (>), and U+0060 (`).
The query percent-encode set is the C0 control percent-encode set and U+0020 SPACE, U+0022 ("), U+0023 (#), U+003C (<), and U+003E (>).
The query percent-encode set cannot be defined in terms of the fragment percent-encode set due to the omission of U+0060 (`).
The special-query percent-encode set is the query percent-encode set and U+0027 (').
The path percent-encode set is the query percent-encode set and U+003F (?), U+0060 (`), U+007B ({), and U+007D (}).
The userinfo percent-encode set is the path percent-encode set and U+002F (/), U+003A (:), U+003B (;), U+003D (=), U+0040 (@), U+005B ([) to U+005E (^), inclusive, and U+007C (|).
The component percent-encode set is the userinfo percent-encode set and U+0024 ($) to U+0026 (&), inclusive, U+002B (+), and U+002C (,).
This is used by HTML for registerProtocolHandler(), and could also be used by other standards to percent-encode data that can then be embedded in a URL’s path, query, or fragment. Using it with UTF-8 percent-encode gives identical results to JavaScript’s encodeURIComponent() [sic]. [HTML] [ECMA-262]
The application/x-www-form-urlencoded percent-encode set is the component percent-encode set and U+0021 (!), U+0027 (') to U+0029 RIGHT PARENTHESIS, inclusive, and U+007E (~).
The application/x-www-form-urlencoded percent-encode set contains all code points, except the ASCII alphanumeric, U+002A (*), U+002D (-), U+002E (.), and U+005F (_).
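As an illustration, the layered definitions of the percent-encode sets can be written as Python sets. This is only a sketch: the helper `ascii_set` and the constant names are assumptions, and only the ASCII portion of each set is materialized (every set also contains all code points greater than U+007E, which `in_set` handles separately).

```python
def ascii_set(*parts):
    """Union of single characters and inclusive (first, last) ranges."""
    out = set()
    for part in parts:
        if isinstance(part, tuple):
            out |= {chr(cp) for cp in range(ord(part[0]), ord(part[1]) + 1)}
        else:
            out |= set(part)
    return out


C0_CONTROL = ascii_set(("\x00", "\x1f"), "\x7f")
FRAGMENT = C0_CONTROL | ascii_set(' "<>`')
QUERY = C0_CONTROL | ascii_set(' "#<>')
SPECIAL_QUERY = QUERY | ascii_set("'")
PATH = QUERY | ascii_set("?`{}")
USERINFO = PATH | ascii_set("/:;=@", ("[", "^"), "|")
COMPONENT = USERINFO | ascii_set(("$", "&"), "+,")
FORM_URLENCODED = COMPONENT | ascii_set("!", ("'", ")"), "~")


def in_set(code_point: str, encode_set) -> bool:
    """True if code_point belongs to the given percent-encode set."""
    return ord(code_point) > 0x7E or code_point in encode_set


# ASCII code points left untouched by the form-urlencoded set.
ASCII = {chr(cp) for cp in range(0x80)}
NOT_FORM_ENCODED = "".join(sorted(ASCII - FORM_URLENCODED))
```

Computing `NOT_FORM_ENCODED` gives a quick way to compare the construction against the prose statement that only ASCII alphanumerics, `*`, `-`, `.`, and `_` are excluded.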
To percent-encode after encoding , given an encoding encoding , code point codePoint , and a percentEncodeSet , run these steps:
-
Let bytes be the result of encoding codePoint using encoding .
-
If bytes starts with 0x26 (&) 0x23 (#) and ends with 0x3B (;), then:
-
Let output be bytes , isomorphic decoded .
-
Replace the first two code points of output with "%26%23".
-
Replace the last code point of output with "%3B".
-
Return output .
This can happen when encoding is not UTF-8 .
-
-
Let output be the empty string.
-
For each byte of bytes :
-
Let isomorph be a code point whose value is byte ’s value .
-
Assert: percentEncodeSet includes all non- ASCII code points .
-
If isomorph is not in percentEncodeSet , then append isomorph to output .
-
Otherwise, percent-encode byte and append the result to output .
-
-
Return output .
To percent-encode after encoding , given an encoding encoding , string input , a percentEncodeSet , and a boolean spaceAsPlus , run these steps:
-
Let output be the empty string.
-
For each codePoint of input :
-
If spaceAsPlus is true and codePoint is U+0020, then append U+002B (+) to output .
-
Otherwise, run percent-encode after encoding with encoding , codePoint , and percentEncodeSet , and append the result to output .
-
-
Return output .
To UTF-8 percent-encode a code point codePoint using a percentEncodeSet , return the result of running percent-encode after encoding with UTF-8 , codePoint , and percentEncodeSet .
To UTF-8 percent-encode a string input using a percentEncodeSet , return the result of running percent-encode after encoding with UTF-8 , input , percentEncodeSet , and false.
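A rough Python sketch of the percent-encode after encoding algorithms follows. It assumes Python's built-in codecs are an acceptable approximation of the Encoding Standard encoders (they differ for some legacy encodings), and it relies on Python's `xmlcharrefreplace` error handler to produce the `&#…;` form for unencodable code points. The function names and the `in_userinfo_set` predicate are illustrative.

```python
def in_userinfo_set(value: int) -> bool:
    """ASCII part of the userinfo percent-encode set, plus everything above U+007E."""
    return value <= 0x1F or value > 0x7E or chr(value) in ' "#<>?`{}/:;=@[\\]^|'


def percent_encode_byte(byte: int) -> str:
    return f"%{byte:02X}"


def percent_encode_code_point(encoding: str, code_point: str, in_set) -> str:
    # Unencodable code points come back as b"&#NNNN;" with this handler.
    data = code_point.encode(encoding, errors="xmlcharrefreplace")
    if data.startswith(b"&#") and data.endswith(b";"):
        return "%26%23" + data[2:-1].decode("latin-1") + "%3B"
    output = []
    for byte in data:
        # Each byte is treated as the code point with the same value.
        output.append(percent_encode_byte(byte) if in_set(byte) else chr(byte))
    return "".join(output)


def percent_encode_string(encoding: str, text: str, in_set, space_as_plus: bool = False) -> str:
    output = []
    for code_point in text:
        if space_as_plus and code_point == " ":
            output.append("+")
        else:
            output.append(percent_encode_code_point(encoding, code_point, in_set))
    return "".join(output)


def utf8_percent_encode(text: str, in_set) -> str:
    return percent_encode_string("utf-8", text, in_set, False)
```

For example, `utf8_percent_encode("Say what‽", in_userinfo_set)` produces `"Say%20what%E2%80%BD"`.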
Here is a summary, by way of example, of the operations defined above:
Operation | Input | Output |
---|---|---|
Percent-encode input | 0x7F | "%7F" |
Percent-decode input | `%25%s%1G` | `%%s%1G` |
Percent-decode input | "‽%25%2E" | 0xE2 0x80 0xBD 0x25 0x2E |
Percent-encode after encoding with Shift_JIS, input, and the userinfo percent-encode set | U+0020 | "%20" |
Percent-encode after encoding with Shift_JIS, input, and the userinfo percent-encode set | U+2261 (≡) | "%81%DF" |
Percent-encode after encoding with Shift_JIS, input, and the userinfo percent-encode set | U+203D (‽) | "%26%238253%3B" |
Percent-encode after encoding with ISO-2022-JP, input, and the userinfo percent-encode set | U+00A5 (¥) | "%1B(J\%1B(B" |
Percent-encode after encoding with Shift_JIS, input, the userinfo percent-encode set, and true | "1+1 ≡ 2%20‽" | "1+1+%81%DF+2%20%26%238253%3B" |
UTF-8 percent-encode input using the userinfo percent-encode set | U+2261 (≡) | "%E2%89%A1" |
UTF-8 percent-encode input using the userinfo percent-encode set | U+203D (‽) | "%E2%80%BD" |
UTF-8 percent-encode input using the userinfo percent-encode set | "Say what‽" | "Say%20what%E2%80%BD" |
2. Security considerations
The security of a URL is a function of its environment. Care is to be taken when rendering, interpreting, and passing URLs around.
When rendering and allocating new URLs, "spoofing" needs to be considered: an attack whereby one host or URL can be confused for another. For instance, consider how 1/l/I, m/rn/rri, 0/O, and а/a can all appear eerily similar. Or worse, consider how U+202A LEFT-TO-RIGHT EMBEDDING and similar code points are invisible. [UTR36]
When passing a URL from party A to B , both need to carefully consider what is happening. A might end up leaking data it does not want to leak. B might receive input it did not expect and take an action that harms the user. In particular, B should never trust A , as at some point URLs from A can come from untrusted sources.
3. Hosts (domains and IP addresses)
At a high level, a host , valid host string , host parser , and host serializer relate as follows:
-
The host parser takes an arbitrary string and returns either failure or a host .
-
A host can be seen as the in-memory representation.
-
A valid host string defines what input would not trigger a validation error or failure when given to the host parser . I.e., input that would be considered conforming or valid.
-
The host serializer takes a host and returns a string. (If that string is then parsed , the result will equal the host that was serialized .)
3.1. Host representation
A host is a domain , an IPv4 address , an IPv6 address , an opaque host , or an empty host . Typically a host serves as a network address, but it is sometimes used as opaque identifier in URLs where a network address is not necessary.
The RFCs referenced in the paragraphs below are for informative purposes only. They have no influence on host writing, parsing, and serialization, unless stated otherwise in the sections that follow.
A domain is a non-empty ASCII string that identifies a realm within a network. [RFC1034]
The example.com and example.com. domains are not equivalent and typically treated as distinct.
An IPv4 address is a 32-bit unsigned integer that identifies a network address. [RFC791]
An IPv6 address is a 128-bit unsigned integer that identifies a network address. For the purposes of this standard it is represented as a list of eight 16-bit unsigned integers, also known as IPv6 pieces . [RFC4291]
Support for <zone_id> is intentionally omitted.
An opaque host is a non-empty ASCII string that can be used for further processing.
An empty host is the empty string.
3.2. Host miscellaneous
A forbidden host code point is U+0000 NULL, U+0009 TAB, U+000A LF, U+000D CR, U+0020 SPACE, U+0023 (#), U+0025 (%), U+002F (/), U+003A (:), U+003C (<), U+003E (>), U+003F (?), U+0040 (@), U+005B ([), U+005C (\), U+005D (]), or U+005E (^).
A host ’s public suffix is the portion of a host which is included on the Public Suffix List . To obtain host ’s public suffix , run these steps: [PSL]
-
If host is not a domain , then return null.
-
Let publicSuffix be the public suffix determined by running the Public Suffix List algorithm with host as domain. [PSL]
-
Assert: publicSuffix is an ASCII string .
-
Return publicSuffix .
A host ’s registrable domain is a domain formed by the most specific public suffix, along with the domain label immediately preceding it, if any. To obtain host ’s registrable domain , run these steps:
-
If host ’s public suffix is null or host ’s public suffix equals host , then return null.
-
Let registrableDomain be the registrable domain determined by running the Public Suffix List algorithm with host as domain. [PSL]
-
Assert: registrableDomain is an ASCII string .
-
Return registrableDomain .
Host input | Public suffix | Registrable domain |
---|---|---|
com | com | null |
example.com | com | example.com |
www.example.com | com | example.com |
sub.www.example.com | com | example.com |
EXAMPLE.COM | com | example.com |
github.io | github.io | null |
whatwg.github.io | github.io | whatwg.github.io |
إختبار | xn--kgbechtv | null |
example.إختبار | xn--kgbechtv | example.xn--kgbechtv |
sub.example.إختبار | xn--kgbechtv | example.xn--kgbechtv |
[2001:0db8:85a3:0000:0000:8a2e:0370:7334] | null | null |
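The relationship between a domain, its public suffix, and its registrable domain can be sketched as below. This is a simplification under loud assumptions: `SUFFIXES` is a hypothetical miniature list rather than the real Public Suffix List, wildcard and exception rules are ignored, and the input is assumed to already be a domain in ASCII form (non-domain hosts such as IP addresses would yield null before reaching this code).

```python
# Hypothetical miniature suffix list for illustration only.
SUFFIXES = {"com", "io", "github.io", "xn--kgbechtv"}


def public_suffix(domain: str):
    """Return the longest listed suffix of `domain`, or None if none matches."""
    labels = domain.lower().split(".")
    for start in range(len(labels)):
        # Candidates run from longest to shortest, so the first hit is the
        # most specific suffix.
        candidate = ".".join(labels[start:])
        if candidate in SUFFIXES:
            return candidate
    return None


def registrable_domain(domain: str):
    """Return the public suffix plus the label right before it, or None."""
    suffix = public_suffix(domain)
    if suffix is None or suffix == domain.lower():
        return None
    labels = domain.lower().split(".")
    suffix_label_count = suffix.count(".") + 1
    return ".".join(labels[-(suffix_label_count + 1):])
```

With this toy list, `registrable_domain("sub.www.example.com")` gives `"example.com"` and `registrable_domain("github.io")` gives `None`, in line with the table above.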
Specifications should prefer the origin concept for security decisions. The notion of " public suffix " and " registrable domain " cannot be relied upon to provide a hard security boundary, as the public suffix list will diverge from client to client. Specifications which ignore this advice are encouraged to carefully consider whether URLs' schemes ought to be incorporated into any decisions made, i.e. whether to use the same site or schemelessly same site concepts.
3.3. IDNA
The domain to ASCII algorithm, given a string domain and optionally a boolean beStrict , runs these steps:
-
If beStrict is not given, set it to false.
-
Let result be the result of running Unicode ToASCII with domain_name set to domain , UseSTD3ASCIIRules set to beStrict , CheckHyphens set to false, CheckBidi set to true, CheckJoiners set to true, Transitional_Processing set to false, and VerifyDnsLength set to beStrict .
-
If result is a failure value, validation error , return failure.
-
If result is the empty string, validation error , return failure.
-
Return result .
The domain to Unicode algorithm, given a domain domain , runs these steps:
-
Let result be the result of running Unicode ToUnicode with domain_name set to domain , CheckHyphens set to false, CheckBidi set to true, CheckJoiners set to true, UseSTD3ASCIIRules set to false, and Transitional_Processing set to false.
-
Signify validation errors for any returned errors, and then, return result .
3.4. Host writing
A valid host string must be a valid domain string , a valid IPv4-address string , or: U+005B ([), followed by a valid IPv6-address string , followed by U+005D (]).
A domain is a valid domain if these steps return success:
-
Let result be the result of running domain to ASCII with domain and true.
-
If result is failure, then return failure.
-
Set result to the result of running Unicode ToUnicode with domain_name set to result , CheckHyphens set to false, CheckBidi set to true, CheckJoiners set to true, UseSTD3ASCIIRules set to true, and Transitional_Processing set to false.
-
If result contains any errors, return failure.
-
Return success.
Ideally we define this in terms of a sequence of code points that make up a valid domain rather than through a whack-a-mole: issue 245 .
A valid domain string must be a string that is a valid domain .
A valid IPv4-address string must be four sequences of up to three ASCII digits per sequence, each representing a decimal number no greater than 255, and separated from each other by U+002E (.).
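A small Python check for the valid IPv4-address string definition might look like this. It assumes "up to three ASCII digits" means one to three digits, and that leading zeros are acceptable as long as the decimal value does not exceed 255; both readings are interpretations, not statements from this standard.

```python
import re

_PART = re.compile(r"[0-9]{1,3}")


def is_valid_ipv4_address_string(text: str) -> bool:
    """Four dot-separated runs of 1 to 3 ASCII digits, each at most 255."""
    parts = text.split(".")
    if len(parts) != 4:
        return False
    return all(_PART.fullmatch(p) is not None and int(p) <= 255 for p in parts)
```

This is deliberately stricter than the IPv4 parser, which also accepts hexadecimal, octal, and shortened forms while flagging validation errors.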
A valid IPv6-address string is defined in the "Text Representation of Addresses" chapter of IP Version 6 Addressing Architecture . [RFC4291]
A valid opaque-host string must be one of the following:
-
one or more URL units excluding forbidden host code points
-
U+005B ([), followed by a valid IPv6-address string , followed by U+005D (]).
This is not part of the definition of valid host string as it requires context to be distinguished.
3.5. Host parsing
The host parser takes a string input with an optional boolean isNotSpecial , and then runs these steps:
-
If isNotSpecial is not given, then set isNotSpecial to false.
-
If input starts with U+005B ([), then:
-
If input does not end with U+005D (]), validation error , return failure.
-
Return the result of IPv6 parsing input with its leading U+005B ([) and trailing U+005D (]) removed.
-
-
If isNotSpecial is true, then return the result of opaque-host parsing input .
-
Assert: input is not the empty string.
-
Let domain be the result of running UTF-8 decode without BOM on the percent-decoding of input .
Alternatively UTF-8 decode without BOM or fail can be used, coupled with an early return for failure, as domain to ASCII fails on U+FFFD REPLACEMENT CHARACTER.
-
Let asciiDomain be the result of running domain to ASCII on domain .
-
If asciiDomain is failure, validation error , return failure.
-
If asciiDomain contains a forbidden host code point , validation error , return failure.
-
Let ipv4Host be the result of IPv4 parsing asciiDomain .
-
If ipv4Host is an IPv4 address or failure, return ipv4Host .
-
Return asciiDomain .
The IPv4 parser takes a string input and then runs these steps:
-
Let validationError be false.
This uses validationError to track validation errors to avoid reporting them before we are confident we want to parse input as an IPv4 address as the host parser almost always invokes the IPv4 parser .
-
Let parts be the result of strictly splitting input on U+002E (.).
-
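If the last item in parts is the empty string, then:
-
Set validationError to true.
-
If parts ’s size is greater than 1, then remove the last item from parts .
-
-
If parts ’s size is greater than 4, then return input .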
-
Let numbers be an empty list .
-
For each part of parts :
-
If part is the empty string, then return input .
0..0x300 is a domain, not an IPv4 address.
-
Let result be the result of parsing part .
-
If result is failure, then return input .
-
If result [1] is true, then set validationError to true.
-
Append result [0] to numbers .
-
-
If validationError is true, validation error .
At this point each part was parsed into a number and input will be treated as an IPv4 address (or failure). And therefore error reporting resumes.
-
If any item in numbers is greater than 255, validation error .
-
If any but the last item in numbers is greater than 255, then return failure.
-
If the last item in numbers is greater than or equal to 256^(5 − numbers ’s size), validation error , return failure.
-
Let ipv4 be the last item in numbers .
-
Remove the last item from numbers .
-
Let counter be 0.
-
For each n of numbers :
-
Increment ipv4 by n × 256^(3 − counter).
-
Increment counter by 1.
-
-
Return ipv4 .
The IPv4 number parser takes a string input and then runs these steps:
-
Let validationError be false.
-
Let R be 10.
-
If input contains at least two code points and the first two code points are either "0x" or "0X", then:
-
Set validationError to true.
-
Remove the first two code points from input .
-
Set R to 16.
-
-
Otherwise, if input contains at least two code points and the first code point is U+0030 (0), then:
-
Set validationError to true.
-
Remove the first code point from input .
-
Set R to 8.
-
-
If input is the empty string, then return (0, true).
-
If input contains a code point that is not a radix- R digit, then return failure.
-
Let output be the mathematical integer value that is represented by input in radix- R notation, using ASCII hex digits for digits with values 0 through 15.
-
Return ( output , validationError ).
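The IPv4 parser and IPv4 number parser can be sketched together in Python. This is a non-normative illustration with several assumptions: validation errors are tracked only in the number parser's return value and never reported, `FAILURE` is a hypothetical sentinel, a single trailing empty part is dropped before the size check, the number parser always returns a (value, flag) pair, and the last number is removed from the list before the remaining numbers are added in (without that removal the last number would be counted twice).

```python
FAILURE = object()  # hypothetical sentinel for "failure"

_DIGITS = {8: "01234567", 10: "0123456789", 16: "0123456789abcdefABCDEF"}


def parse_ipv4_number(text: str):
    """Return (value, validation_error) or FAILURE."""
    validation_error = False
    radix = 10
    if len(text) >= 2 and text[:2] in ("0x", "0X"):
        validation_error, text, radix = True, text[2:], 16
    elif len(text) >= 2 and text[0] == "0":
        validation_error, text, radix = True, text[1:], 8
    if text == "":
        return 0, validation_error
    if any(ch not in _DIGITS[radix] for ch in text):
        return FAILURE
    return int(text, radix), validation_error


def parse_ipv4(text: str):
    """Return an int (IPv4 address), the input itself (not IPv4), or FAILURE."""
    parts = text.split(".")
    if parts[-1] == "" and len(parts) > 1:
        parts.pop()  # tolerate one trailing dot
    if len(parts) > 4:
        return text
    numbers = []
    for part in parts:
        if part == "":
            return text
        result = parse_ipv4_number(part)
        if result is FAILURE:
            return text
        numbers.append(result[0])
    if any(n > 255 for n in numbers[:-1]):
        return FAILURE
    if numbers[-1] >= 256 ** (5 - len(numbers)):
        return FAILURE
    ipv4 = numbers.pop()  # the last number fills all remaining low-order bytes
    for counter, n in enumerate(numbers):
        ipv4 += n * 256 ** (3 - counter)
    return ipv4
```

For instance, `parse_ipv4("0x7f.1")` yields the same integer as `parse_ipv4("127.0.0.1")`, because the last number fills every byte not claimed by an earlier part.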
The IPv6 parser takes a string input and then runs these steps:
-
Let address be a new IPv6 address whose IPv6 pieces are all 0.
-
Let pieceIndex be 0.
-
Let compress be null.
-
Let pointer be a pointer for input .
-
If c is U+003A (:), then:
-
If remaining does not start with U+003A (:), validation error , return failure.
-
Increase pointer by 2.
-
Increase pieceIndex by 1 and then set compress to pieceIndex .
-
-
While c is not the EOF code point :
-
If pieceIndex is 8, validation error , return failure.
-
If c is U+003A (:), then:
-
If compress is non-null, validation error , return failure.
-
Increase pointer and pieceIndex by 1, set compress to pieceIndex , and then continue .
-
-
Let value and length be 0.
-
While length is less than 4 and c is an ASCII hex digit , set value to value × 0x10 + c interpreted as hexadecimal number, and increase pointer and length by 1.
-
If c is U+002E (.), then:
-
If length is 0, validation error , return failure.
-
Decrease pointer by length .
-
If pieceIndex is greater than 6, validation error , return failure.
-
Let numbersSeen be 0.
-
While c is not the EOF code point :
-
Let ipv4Piece be null.
-
If numbersSeen is greater than 0, then:
-
If c is a U+002E (.) and numbersSeen is less than 4, then increase pointer by 1.
-
Otherwise, validation error , return failure.
-
-
If c is not an ASCII digit , validation error , return failure.
-
While c is an ASCII digit :
-
Let number be c interpreted as decimal number.
-
If ipv4Piece is null, then set ipv4Piece to number .
Otherwise, if ipv4Piece is 0, validation error , return failure.
Otherwise, set ipv4Piece to ipv4Piece × 10 + number .
-
If ipv4Piece is greater than 255, validation error , return failure.
-
Increase pointer by 1.
-
-
Set address [ pieceIndex ] to address [ pieceIndex ] × 0x100 + ipv4Piece .
-
Increase numbersSeen by 1.
-
If numbersSeen is 2 or 4, then increase pieceIndex by 1.
-
-
If numbersSeen is not 4, validation error , return failure.
-
Break .
-
-
Otherwise, if c is U+003A (:):
-
Increase pointer by 1.
-
If c is the EOF code point , validation error , return failure.
-
-
Otherwise, if c is not the EOF code point , validation error , return failure.
-
Set address [ pieceIndex ] to value .
-
Increase pieceIndex by 1.
-
-
If compress is non-null, then:
-
Let swaps be pieceIndex − compress .
-
Set pieceIndex to 7.
-
While pieceIndex is not 0 and swaps is greater than 0, swap address [ pieceIndex ] with address [ compress + swaps − 1], and then decrease both pieceIndex and swaps by 1.
-
-
Otherwise, if compress is null and pieceIndex is not 8, validation error , return failure.
-
Return address .
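The IPv6 parser steps translate fairly directly into Python. The sketch below is illustrative only: it returns `None` for failure, does not report validation errors, and uses a local helper `at` (an assumption of this sketch) that returns `None` in place of the EOF code point.

```python
HEX = "0123456789abcdefABCDEF"
DEC = "0123456789"


def parse_ipv6(text: str):
    """Return a list of eight 16-bit pieces, or None on failure."""
    def at(i):
        return text[i] if i < len(text) else None  # None stands for EOF

    address = [0] * 8
    piece_index, compress, p = 0, None, 0
    if at(p) == ":":
        if at(p + 1) != ":":
            return None
        p += 2
        piece_index += 1
        compress = piece_index
    while at(p) is not None:
        if piece_index == 8:
            return None
        if at(p) == ":":
            if compress is not None:
                return None
            p += 1
            piece_index += 1
            compress = piece_index
            continue
        value = length = 0
        while length < 4 and at(p) is not None and at(p) in HEX:
            value = value * 0x10 + int(at(p), 16)
            p += 1
            length += 1
        if at(p) == ".":
            # Embedded IPv4 suffix: rewind and read four decimal numbers.
            if length == 0 or piece_index > 6:
                return None
            p -= length
            numbers_seen = 0
            while at(p) is not None:
                ipv4_piece = None
                if numbers_seen > 0:
                    if at(p) == "." and numbers_seen < 4:
                        p += 1
                    else:
                        return None
                if at(p) is None or at(p) not in DEC:
                    return None
                while at(p) is not None and at(p) in DEC:
                    number = int(at(p))
                    if ipv4_piece is None:
                        ipv4_piece = number
                    elif ipv4_piece == 0:
                        return None  # leading zero
                    else:
                        ipv4_piece = ipv4_piece * 10 + number
                    if ipv4_piece > 255:
                        return None
                    p += 1
                address[piece_index] = address[piece_index] * 0x100 + ipv4_piece
                numbers_seen += 1
                if numbers_seen in (2, 4):
                    piece_index += 1
            if numbers_seen != 4:
                return None
            break
        elif at(p) == ":":
            p += 1
            if at(p) is None:
                return None
        elif at(p) is not None:
            return None
        address[piece_index] = value
        piece_index += 1
    if compress is not None:
        # Move the pieces parsed after "::" to the end of the address.
        swaps = piece_index - compress
        piece_index = 7
        while piece_index != 0 and swaps > 0:
            j = compress + swaps - 1
            address[piece_index], address[j] = address[j], address[piece_index]
            piece_index -= 1
            swaps -= 1
    elif piece_index != 8:
        return None
    return address
```

The input is expected without the surrounding square brackets, as in the host parser above, which strips them before invoking the IPv6 parser.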
The opaque-host parser takes a string input , and then runs these steps:
-
If input contains a forbidden host code point excluding U+0025 (%), validation error , return failure.
-
If input contains a code point that is not a URL code point and not U+0025 (%), validation error .
-
If input contains a U+0025 (%) and the two code points following it are not ASCII hex digits , validation error .
-
Return the result of running UTF-8 percent-encode on input using the C0 control percent-encode set .
3.6. Host serializing
The host serializer takes a host host and then runs these steps:
-
If host is an IPv4 address , return the result of running the IPv4 serializer on host .
-
Otherwise, if host is an IPv6 address , return U+005B ([), followed by the result of running the IPv6 serializer on host , followed by U+005D (]).
-
Otherwise, host is a domain , opaque host , or empty host , return host .
The IPv4 serializer takes an IPv4 address address and then runs these steps:
-
Let output be the empty string.
-
Let n be the value of address .
-
For each i in the range 1 to 4, inclusive:
-
Prepend n % 256, serialized , to output .
-
If i is not 4, then prepend U+002E (.) to output .
-
Set n to floor( n / 256).
-
-
Return output .
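A direct Python transcription of the IPv4 serializer, offered as a sketch; the function name is an assumption.

```python
def serialize_ipv4(address: int) -> str:
    """Serialize a 32-bit integer as dotted-decimal, lowest byte first."""
    output = ""
    n = address
    for i in range(1, 5):
        # Prepend the lowest remaining byte, then the separator.
        output = str(n % 256) + output
        if i != 4:
            output = "." + output
        n //= 256
    return output
```

Because each step prepends, the bytes end up in network order even though they are extracted from least to most significant.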
The IPv6 serializer takes an IPv6 address address and then runs these steps:
-
Let output be the empty string.
-
Let compress be an index to the first IPv6 piece in the first longest sequences of address ’s IPv6 pieces that are 0.
-
If there is no sequence of address ’s IPv6 pieces that are 0 that is longer than 1, then set compress to null.
-
Let ignore0 be false.
-
For each pieceIndex in the range 0 to 7, inclusive:
-
If ignore0 is true and address [ pieceIndex ] is 0, then continue .
-
Otherwise, if ignore0 is true, set ignore0 to false.
-
If compress is pieceIndex , then:
-
Let separator be "::" if pieceIndex is 0, and U+003A (:) otherwise.
-
Append separator to output .
-
Set ignore0 to true and continue .
-
-
Append address [ pieceIndex ], represented as the shortest possible lowercase hexadecimal number, to output .
-
If pieceIndex is not 7, then append U+003A (:) to output .
-
-
Return output .
This algorithm requires the recommendation from A Recommendation for IPv6 Address Text Representation. [RFC5952]
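The IPv6 serializer, including the choice of which run of zero pieces to compress, might be sketched in Python as follows. The helper name `find_compress` is an assumption; it returns the index of the first longest run of zero pieces whose length is greater than 1, or `None`.

```python
def find_compress(address):
    """Index of the first longest run of 0 pieces longer than 1, else None."""
    best_start, best_len = None, 1
    run_start, run_len = None, 0
    for i, piece in enumerate(address):
        if piece == 0:
            if run_len == 0:
                run_start = i
            run_len += 1
            if run_len > best_len:  # strict ">" keeps the first longest run
                best_start, best_len = run_start, run_len
        else:
            run_len = 0
    return best_start


def serialize_ipv6(address) -> str:
    output = ""
    compress = find_compress(address)
    ignore0 = False
    for piece_index in range(8):
        if ignore0 and address[piece_index] == 0:
            continue
        elif ignore0:
            ignore0 = False
        if compress == piece_index:
            output += "::" if piece_index == 0 else ":"
            ignore0 = True
            continue
        output += format(address[piece_index], "x")  # shortest lowercase hex
        if piece_index != 7:
            output += ":"
    return output
```

The surrounding square brackets are added by the host serializer, not by this function.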
3.7. Host equivalence
To determine whether a host A equals B , return true if A is B , and false otherwise.
Certificate comparison requires a host equivalence check that ignores the trailing dot of a domain (if any). However, those hosts have also various other facets enforced, such as DNS length, that are not enforced here, as URLs do not enforce them. If anyone has a good suggestion for how to bring these two closer together, or what a good unified model would be, please file an issue.
4. URLs
At a high level, a URL , valid URL string , URL parser , and URL serializer relate as follows:
-
The URL parser takes an arbitrary string and returns either failure or a URL .
-
A URL can be seen as the in-memory representation.
-
A valid URL string defines what input would not trigger a validation error or failure when given to the URL parser . I.e., input that would be considered conforming or valid.
-
The URL serializer takes a URL and returns an ASCII string . (If that string is then parsed , the result will equal the URL that was serialized .)
Input | Base | Valid | Output |
---|---|---|---|
https:example.org | | ❌ | https://example.org/ |
https://////example.com/// | | ❌ | https://example.com/// |
https://example.com/././foo | | ✅ | https://example.com/foo |
hello:world | https://example.com/ | ✅ | hello:world |
https:example.org | https://example.com/ | ❌ | https://example.com/example.org |
\example\..\demo/.\ | https://example.com/ | ❌ | https://example.com/demo/ |
example | https://example.com/demo | ✅ | https://example.com/example |
file:///C|/demo | | ❌ | file:///C:/demo |
.. | file:///C:/demo | ✅ | file:///C:/ |
file://loc%61lhost/ | | ✅ | file:/// |
https://user:password@example.org/ | | ❌ | https://user:password@example.org/ |
https://example.org/foo bar | | ❌ | https://example.org/foo%20bar |
https://EXAMPLE.com/../x | | ✅ | https://example.com/x |
https://ex ample.org/ | | ❌ | Failure |
example | | ❌, due to lack of base | Failure |
https://example.com:demo | | ❌ | Failure |
http://[www.example.com]/ | | ❌ | Failure |
https://example.org// | | ✅ | https://example.org// |
The base and output URL are represented in serialized form for brevity.
4.1. URL representation
A URL is a universal identifier. To disambiguate from a valid URL string it can also be referred to as a URL record .
A URL ’s scheme is an ASCII string that identifies the type of URL and can be used to dispatch a URL for further processing after parsing . It is initially the empty string.
A URL ’s username is an ASCII string identifying a username. It is initially the empty string.
A URL ’s password is an ASCII string identifying a password. It is initially the empty string.
A URL ’s host is null or a host . It is initially null.
The following table lists allowed URL ’s scheme / host combinations.
scheme / host | domain | IPv4 address | IPv6 address | opaque host | empty host | null |
---|---|---|---|---|---|---|
Special schemes excluding "file" | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ |
"file" | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ |
Others | ❌ | ❌ | ✅ | ✅ | ✅ | ✅ |
A URL ’s port is either null or a 16-bit unsigned integer that identifies a networking port. It is initially null.
A URL ’s path is a list of zero or more ASCII strings , usually identifying a location in hierarchical form. It is initially empty.
A special URL always has a non-empty path .
A URL ’s query is either null or an ASCII string . It is initially null.
A URL ’s fragment is either null or an ASCII string that can be used for further processing on the resource the URL ’s other components identify. It is initially null.
A URL also has an associated cannot-be-a-base-URL flag . It is initially unset.
A URL also has an associated blob URL entry that is either null or a blob URL entry . It is initially null.
This is used to support caching the object a "blob" URL refers to as well as its origin. It is important that these are cached as the URL might be removed from the blob URL store between parsing and fetching, while fetching will still need to succeed.
The following table lists how valid URL strings , when parsed , map to a URL ’s components. Username , password , and blob URL entry are omitted; in the examples below they are the empty string, the empty string, and null, respectively.
Input | Scheme | Host | Port | Path | Query | Fragment | Cannot-be-a-base-URL flag |
---|---|---|---|---|---|---|---|
https://example.com/ | "https" | "example.com" | null | « the empty string » | null | null | unset |
https://localhost:8000/search?q=text#hello | "https" | "localhost" | 8000 | « "search" » | "q=text" | "hello" | unset |
urn:isbn:9780307476463 | "urn" | null | null | « "isbn:9780307476463" » | null | null | set |
file:///ada/Analytical%20Engine/README.md | "file" | null | null | « "ada", "Analytical%20Engine", "README.md" » | null | null | unset |
4.2. URL miscellaneous
A special scheme is a scheme listed in the first column of the following table. A default port is a special scheme ’s corresponding port and is listed in the second column on the same row.
scheme | port |
---|---|
"ftp" | 21 |
"file" | null |
"http" | 80 |
"https" | 443 |
"ws" | 80 |
"wss" | 443 |
A URL is special if its scheme is a special scheme . A URL is not special if its scheme is not a special scheme .
A URL includes credentials if its username or password is not the empty string.
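The special-scheme table and the two predicates just defined could be modeled as follows. The `URLRecord` dataclass is a hypothetical, reduced stand-in for a URL record (only the fields needed here), not the full representation described in this standard.

```python
from dataclasses import dataclass

# Special schemes and their default ports, as listed in the table above.
DEFAULT_PORTS = {
    "ftp": 21,
    "file": None,
    "http": 80,
    "https": 443,
    "ws": 80,
    "wss": 443,
}


@dataclass
class URLRecord:
    """Reduced, illustrative URL record."""
    scheme: str = ""
    username: str = ""
    password: str = ""


def is_special(url: URLRecord) -> bool:
    return url.scheme in DEFAULT_PORTS


def includes_credentials(url: URLRecord) -> bool:
    return url.username != "" or url.password != ""
```

Using a dictionary keeps the special-scheme test and the default-port lookup in one place, which mirrors how the table ties the two concepts together.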
A URL cannot have a username/password/port if its host is null or the empty string, its cannot-be-a-base-URL flag is set, or its scheme is "file".
A URL can be designated as base URL .
A base URL is useful for the URL parser when the input might be a relative-URL string .
A Windows drive letter is two code points, of which the first is an ASCII alpha and the second is either U+003A (:) or U+007C (|).
A normalized Windows drive letter is a Windows drive letter of which the second code point is U+003A (:).
As per the URL writing section, only a normalized Windows drive letter is conforming.
A string starts with a Windows drive letter if all of the following are true:
- its length is greater than or equal to 2
- its first two code points are a Windows drive letter
- its length is 2 or its third code point is U+002F (/), U+005C (\), U+003F (?), or U+0023 (#).
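The three Windows drive letter definitions can be expressed as small Python predicates; the function names are illustrative.

```python
import string


def is_windows_drive_letter(s: str) -> bool:
    """Two code points: an ASCII alpha followed by ':' or '|'."""
    return len(s) == 2 and s[0] in string.ascii_letters and s[1] in ":|"


def is_normalized_windows_drive_letter(s: str) -> bool:
    return is_windows_drive_letter(s) and s[1] == ":"


def starts_with_windows_drive_letter(s: str) -> bool:
    return (
        len(s) >= 2
        and is_windows_drive_letter(s[:2])
        and (len(s) == 2 or s[2] in "/\\?#")
    )
```

The third condition is what keeps a string such as `C:demo` from being treated as starting with a drive letter.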
To shorten a url ’s path :
-
Let path be url ’s path .
-
If path is empty , then return.
-
If url’s scheme is "file", path’s size is 1, and path[0] is a normalized Windows drive letter, then return.
-
Remove path’s last item.
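Shortening a path can be sketched in Python as below. The function takes the scheme and the path list directly instead of a full URL record, which is a simplification made for this example.

```python
import string


def is_normalized_windows_drive_letter(s: str) -> bool:
    return len(s) == 2 and s[0] in string.ascii_letters and s[1] == ":"


def shorten_path(scheme: str, path: list) -> None:
    """Remove the last path segment in place, keeping a lone file drive letter."""
    if not path:
        return
    if scheme == "file" and len(path) == 1 and is_normalized_windows_drive_letter(path[0]):
        return
    path.pop()
```

The drive-letter guard is what makes `..` resolved against `file:///C:/demo` stop at `file:///C:/` instead of dropping the drive.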
4.3. URL writing
A valid URL string must be either a relative-URL-with-fragment string or an absolute-URL-with-fragment string .
An absolute-URL-with-fragment string must be an absolute-URL string , optionally followed by U+0023 (#) and a URL-fragment string .
An absolute-URL string must be one of the following:
-
a URL-scheme string that is an ASCII case-insensitive match for a special scheme and not an ASCII case-insensitive match for "file", followed by U+003A (:) and a scheme-relative-special-URL string
-
a URL-scheme string that is not an ASCII case-insensitive match for a special scheme, followed by U+003A (:) and a relative-URL string
-
a URL-scheme string that is an ASCII case-insensitive match for "file", followed by U+003A (:) and a scheme-relative-file-URL string
any optionally followed by U+003F (?) and a URL-query string .
A URL-scheme string must be one ASCII alpha , followed by zero or more of ASCII alphanumeric , U+002B (+), U+002D (-), and U+002E (.). Schemes should be registered in the IANA URI [sic] Schemes registry. [IANA-URI-SCHEMES] [RFC7595]
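The URL-scheme string grammar maps directly onto a regular expression. The helper name below is hypothetical, and the check covers syntax only, not registration with IANA.

```javascript
// One ASCII alpha, then zero or more ASCII alphanumerics, "+", "-", or ".".
function isURLSchemeString(s) {
  return /^[A-Za-z][A-Za-z0-9+\-.]*$/.test(s);
}
```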
A relative-URL-with-fragment string must be a relative-URL string , optionally followed by U+0023 (#) and a URL-fragment string .
A relative-URL string must be one of the following, switching on base URL ’s scheme :
- A special scheme that is not "file"
-
a scheme-relative-special-URL string
- "file"
-
a scheme-relative-file-URL string
a path-absolute-URL string if base URL ’s host is an empty host
a path-absolute-non-Windows-file-URL string if base URL ’s host is not an empty host
- Otherwise
-
any optionally followed by U+003F (?) and a URL-query string .
A non-null base URL is necessary when parsing a relative-URL string .
A scheme-relative-special-URL string must be "//", followed by a valid host string, optionally followed by U+003A (:) and a URL-port string, optionally followed by a path-absolute-URL string.
A URL-port string must be one of the following:
-
the empty string
-
one or more ASCII digits representing a decimal number no greater than 2^16 − 1.
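As a sketch, the two alternatives for a URL-port string can be checked as follows (the function name is hypothetical):

```javascript
// Either the empty string, or ASCII digits whose decimal value is at most 65535.
function isURLPortString(s) {
  if (s === "") return true;
  return /^[0-9]+$/.test(s) && Number(s) <= 65535;
}
```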
A scheme-relative-URL string must be "//", followed by an opaque-host-and-port string, optionally followed by a path-absolute-URL string.
An opaque-host-and-port string must be either the empty string or: a valid opaque-host string , optionally followed by U+003A (:) and a URL-port string .
A scheme-relative-file-URL string must be "//", followed by one of the following:
-
a valid host string , optionally followed by a path-absolute-non-Windows-file-URL string
A path-absolute-URL string must be U+002F (/) followed by a path-relative-URL string .
A path-absolute-non-Windows-file-URL string must be a path-absolute-URL string that does not start with: U+002F (/), followed by a Windows drive letter , followed by U+002F (/).
A path-relative-URL string must be zero or more URL-path-segment strings , separated from each other by U+002F (/), and not start with U+002F (/).
A path-relative-scheme-less-URL string must be a path-relative-URL string that does not start with: a URL-scheme string , followed by U+003A (:).
A URL-path-segment string must be one of the following:
-
zero or more URL units excluding U+002F (/) and U+003F (?), that together are not a single-dot path segment or a double-dot path segment .
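The single-dot and double-dot path segment tests that this rule relies on are ASCII case-insensitive comparisons against a few fixed strings. A minimal sketch with hypothetical helper names; the last function uses the standard `URL` constructor to show that a percent-encoded double-dot segment still shortens the path when parsed.

```javascript
// "." or an ASCII case-insensitive match for "%2e".
const isSingleDotPathSegment = (segment) =>
  segment === "." || segment.toLowerCase() === "%2e";

// ".." or an ASCII case-insensitive match for ".%2e", "%2e.", or "%2e%2e".
const isDoubleDotPathSegment = (segment) =>
  ["..", ".%2e", "%2e.", "%2e%2e"].includes(segment.toLowerCase());

// Parses an absolute URL and returns the resulting pathname.
function pathnameOf(input) {
  return new URL(input).pathname;
}
```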
A single-dot path segment must be "." or an ASCII case-insensitive match for "%2e".
A double-dot path segment must be ".." or an ASCII case-insensitive match for ".%2e", "%2e.", or "%2e%2e".
A URL-query string must be zero or more URL units .
A URL-fragment string must be zero or more URL units .
The URL code points are ASCII alphanumeric , U+0021 (!), U+0024 ($), U+0026 (&), U+0027 ('), U+0028 LEFT PARENTHESIS, U+0029 RIGHT PARENTHESIS, U+002A (*), U+002B (+), U+002C (,), U+002D (-), U+002E (.), U+002F (/), U+003A (:), U+003B (;), U+003D (=), U+003F (?), U+0040 (@), U+005F (_), U+007E (~), and code points in the range U+00A0 to U+10FFFD, inclusive, excluding surrogates and noncharacters .
Code points greater than U+007F DELETE will be converted to percent-encoded bytes by the URL parser .
In HTML, when the document encoding is a legacy encoding, code points in the URL-query string that are higher than U+007F DELETE will be converted to percent-encoded bytes using the document’s encoding . This can cause problems if a URL that works in one document is copied to another document that uses a different document encoding. Using the UTF-8 encoding everywhere solves this problem.
For example, consider this HTML document:
<!doctype html>
<meta charset="windows-1252">
<a href="?smörgåsbord">Test</a>
Since the document encoding is windows-1252, the link’s URL’s query will be "sm%F6rg%E5sbord".
If the document encoding had been UTF-8, it would instead be "sm%C3%B6rg%C3%A5sbord".
The URL units are URL code points and percent-encoded bytes .
Percent-encoded bytes can be used to encode code points that are not URL code points or are excluded from being written.
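The two query strings from the smörgåsbord example earlier in this section can be reproduced with a short sketch. Both helpers are hypothetical and deliberately simplified: they percent-encode only bytes at or above 0x80, and the windows-1252 helper is valid only for ASCII and code points U+00A0 to U+00FF, where the windows-1252 byte equals the code point.

```javascript
// Percent-encodes every byte >= 0x80 and passes ASCII bytes through unchanged.
function percentEncodeNonASCIIBytes(bytes) {
  return Array.from(bytes, (b) =>
    b < 0x80 ? String.fromCharCode(b) : "%" + b.toString(16).toUpperCase()
  ).join("");
}

// Simplified windows-1252 encoder: assumes every code point is ASCII or in
// U+00A0..U+00FF, where windows-1252 and the code point value coincide.
function encodeWindows1252Subset(str) {
  return Uint8Array.from(str, (ch) => ch.codePointAt(0));
}

const query = "sm\u00F6rg\u00E5sbord"; // "smörgåsbord"
const asWindows1252 = percentEncodeNonASCIIBytes(encodeWindows1252Subset(query));
const asUTF8 = percentEncodeNonASCIIBytes(new TextEncoder().encode(query));
// asWindows1252 is "sm%F6rg%E5sbord"; asUTF8 is "sm%C3%B6rg%C3%A5sbord"
```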
There is no way to express a username or password of a URL record within a valid URL string .
4.4. URL parsing
The URL parser takes a string input , with an optional base URL base and an optional encoding encoding override , and then runs these steps:
Non-web-browser implementations only need to implement the basic URL parser .
How user input in the web browser’s address bar is converted to a URL record is out of scope for this standard. This standard does include URL rendering requirements as they pertain to trust decisions.
-
Let url be the result of running the basic URL parser on input with base , and encoding override as provided.
-
If url is failure, return failure.
-
If url’s scheme is not "blob", return url.
-
Set url ’s blob URL entry to the result of resolving the blob URL url , if that did not return failure, and null otherwise.
-
Return url .
The basic URL parser takes a string input , optionally with a base URL base , optionally with an encoding encoding override , optionally with a URL url and a state override state override , and then runs these steps:
The encoding override argument is a legacy concept only relevant for HTML. The url and state override arguments are only for use by various APIs. [HTML]
When the url and state override arguments are not passed, the basic URL parser returns either a new URL or failure. If they are passed, the algorithm modifies the passed url and can terminate without returning anything.
-
If url is not given:
-
Set url to a new URL .
-
If input contains any leading or trailing C0 control or space , validation error .
-
Remove any leading and trailing C0 control or space from input .
-
-
If input contains any ASCII tab or newline , validation error .
-
Remove all ASCII tab or newline from input .
-
Let state be state override if given, or scheme start state otherwise.
-
If base is not given, set it to null.
-
Let encoding be UTF-8 .
-
If encoding override is given, set encoding to the result of getting an output encoding from encoding override .
-
Let buffer be the empty string.
-
Let the @ flag , [] flag , and passwordTokenSeenFlag be unset.
-
Let pointer be a pointer for input .
-
Keep running the following state machine by switching on state . If after a run pointer points to the EOF code point , go to the next step. Otherwise, increase pointer by 1 and continue with the state machine.
- scheme start state
-
-
If c is an ASCII alpha , append c , lowercased , to buffer , and set state to scheme state .
-
Otherwise, if state override is not given, set state to no scheme state and decrease pointer by 1.
-
Otherwise, validation error , return failure.
This indication of failure is used exclusively by the Location object’s protocol setter.
-
- scheme state
-
-
If c is an ASCII alphanumeric , U+002B (+), U+002D (-), or U+002E (.), append c , lowercased , to buffer .
-
Otherwise, if c is U+003A (:), then:
-
If state override is given, then:
-
If url ’s scheme is a special scheme and buffer is not a special scheme , then return.
-
If url ’s scheme is not a special scheme and buffer is a special scheme , then return.
-
If url includes credentials or has a non-null port, and buffer is "file", then return.
-
If url’s scheme is "file" and its host is an empty host or null, then return.
-
-
Set url ’s scheme to buffer .
-
If state override is given, then:
-
If url ’s port is url ’s scheme ’s default port , then set url ’s port to null.
-
Return.
-
-
Set buffer to the empty string.
-
If url’s scheme is "file", then:
-
If remaining does not start with "//", validation error.
-
Set state to file state .
-
-
Otherwise, if url is special , base is non-null, and base ’s scheme is equal to url ’s scheme , set state to special relative or authority state .
This means that base ’s cannot-be-a-base-URL flag is unset.
-
Otherwise, if url is special , set state to special authority slashes state .
-
Otherwise, if remaining starts with a U+002F (/), set state to path or authority state and increase pointer by 1.
-
Otherwise, set url ’s cannot-be-a-base-URL flag , append an empty string to url ’s path , and set state to cannot-be-a-base-URL path state .
-
-
Otherwise, if state override is not given, set buffer to the empty string, state to no scheme state , and start over (from the first code point in input ).
-
Otherwise, validation error , return failure.
This indication of failure is used exclusively by the Location object’s protocol setter. Furthermore, the non-failure termination earlier in this state is an intentional difference for defining that setter.
-
- no scheme state
-
-
If base is null, or base ’s cannot-be-a-base-URL flag is set and c is not U+0023 (#), validation error , return failure.
-
Otherwise, if base ’s cannot-be-a-base-URL flag is set and c is U+0023 (#), set url ’s scheme to base ’s scheme , url ’s path to a clone of base ’s path , url ’s query to base ’s query , url ’s fragment to the empty string, set url ’s cannot-be-a-base-URL flag , and set state to fragment state .
-
Otherwise, if base’s scheme is not "file", set state to relative state and decrease pointer by 1.
-
Otherwise, set state to file state and decrease pointer by 1.
-
- special relative or authority state
-
-
If c is U+002F (/) and remaining starts with U+002F (/), then set state to special authority ignore slashes state and increase pointer by 1.
-
Otherwise, validation error , set state to relative state and decrease pointer by 1.
-
- path or authority state
-
-
If c is U+002F (/), then set state to authority state .
-
Otherwise, set state to path state , and decrease pointer by 1.
-
- relative state
-
-
If c is U+002F (/), then set state to relative slash state .
-
Otherwise, if url is special and c is U+005C (\), validation error , set state to relative slash state .
-
Otherwise:
-
Set url ’s username to base ’s username , url ’s password to base ’s password , url ’s host to base ’s host , url ’s port to base ’s port , url ’s path to a clone of base ’s path , and url ’s query to base ’s query .
-
If c is U+003F (?), then set url ’s query to the empty string, and state to query state .
-
Otherwise, if c is U+0023 (#), set url ’s fragment to the empty string and state to fragment state .
-
Otherwise, if c is not the EOF code point :
-
Set url ’s query to null.
-
Set state to path state and decrease pointer by 1.
-
-
- relative slash state
-
-
If url is special and c is U+002F (/) or U+005C (\), then:
-
If c is U+005C (\), validation error .
-
Set state to special authority ignore slashes state .
-
-
Otherwise, if c is U+002F (/), then set state to authority state .
-
Otherwise, set url ’s username to base ’s username , url ’s password to base ’s password , url ’s host to base ’s host , url ’s port to base ’s port , state to path state , and then, decrease pointer by 1.
-
- special authority slashes state
-
-
If c is U+002F (/) and remaining starts with U+002F (/), then set state to special authority ignore slashes state and increase pointer by 1.
-
Otherwise, validation error , set state to special authority ignore slashes state and decrease pointer by 1.
-
- special authority ignore slashes state
-
-
If c is neither U+002F (/) nor U+005C (\), then set state to authority state and decrease pointer by 1.
-
Otherwise, validation error .
-
- authority state
-
-
If c is U+0040 (@), then:
-
If the @ flag is set, prepend "%40" to buffer.
-
Set the @ flag .
-
For each codePoint in buffer :
-
If codePoint is U+003A (:) and passwordTokenSeenFlag is unset, then set passwordTokenSeenFlag and continue .
-
Let encodedCodePoints be the result of running UTF-8 percent-encode codePoint using the userinfo percent-encode set .
-
If passwordTokenSeenFlag is set, then append encodedCodePoints to url ’s password .
-
Otherwise, append encodedCodePoints to url ’s username .
-
-
Set buffer to the empty string.
-
Otherwise, if one of the following is true:
-
c is the EOF code point , U+002F (/), U+003F (?), or U+0023 (#)
-
url is special and c is U+005C (\)
then:
-
If @ flag is set and buffer is the empty string, validation error , return failure.
-
Decrease pointer by the number of code points in buffer plus one, set buffer to the empty string, and set state to host state .
-
-
Otherwise, append c to buffer .
-
- host state
- hostname state
-
-
If state override is given and url’s scheme is "file", then decrease pointer by 1 and set state to file host state.
-
Otherwise, if c is U+003A (:) and the [] flag is unset, then:
-
If buffer is the empty string, validation error , return failure.
-
Let host be the result of host parsing buffer with url is not special .
-
If host is failure, then return failure.
-
Set url ’s host to host , buffer to the empty string, and state to port state .
-
If state override is given and state override is hostname state , then return.
-
-
Otherwise, if one of the following is true:
-
c is the EOF code point , U+002F (/), U+003F (?), or U+0023 (#)
-
url is special and c is U+005C (\)
then decrease pointer by 1, and then:
-
If url is special and buffer is the empty string, validation error , return failure.
-
Otherwise, if state override is given, buffer is the empty string, and either url includes credentials or url ’s port is non-null, validation error , return.
-
Let host be the result of host parsing buffer with url is not special .
-
If host is failure, then return failure.
-
Set url ’s host to host , buffer to the empty string, and state to path start state .
-
If state override is given, then return.
-
-
Otherwise:
-
- port state
-
-
If c is an ASCII digit , append c to buffer .
-
Otherwise, if one of the following is true:
-
c is the EOF code point , U+002F (/), U+003F (?), or U+0023 (#)
-
url is special and c is U+005C (\)
-
state override is given
then:
-
If buffer is not the empty string, then:
-
Let port be the mathematical integer value that is represented by buffer in radix-10 using ASCII digits for digits with values 0 through 9.
-
If port is greater than 2^16 − 1, validation error, return failure.
-
Set url ’s port to null, if port is url ’s scheme ’s default port , and to port otherwise.
-
Set buffer to the empty string.
-
-
If state override is given, then return.
-
Set state to path start state and decrease pointer by 1.
-
-
Otherwise, validation error , return failure.
-
- file state
-
-
Set url’s scheme to "file".
-
If c is U+002F (/) or U+005C (\), then:
-
If c is U+005C (\), validation error .
-
Set state to file slash state .
-
-
Otherwise, if base is non-null and base’s scheme is "file":
-
Set url ’s host to base ’s host , url ’s path to a clone of base ’s path , and url ’s query to base ’s query .
-
If c is U+003F (?), then set url ’s query to the empty string and state to query state .
-
Otherwise, if c is U+0023 (#), set url ’s fragment to the empty string and state to fragment state .
-
Otherwise, if c is not the EOF code point :
-
Set url ’s query to null.
-
If the substring from pointer in input does not start with a Windows drive letter , then shorten url ’s path .
-
Otherwise:
-
Set url’s host to null and url’s path to an empty list.
This is a (platform-independent) Windows drive letter quirk.
-
Set state to path state and decrease pointer by 1.
-
-
-
Otherwise, set state to path state , and decrease pointer by 1.
-
- file slash state
-
-
If c is U+002F (/) or U+005C (\), then:
-
If c is U+005C (\), validation error .
-
Set state to file host state .
-
-
Otherwise:
-
If base is non-null, base’s scheme is "file", and the substring from pointer in input does not start with a Windows drive letter, then:
-
If base’s path[0] is a normalized Windows drive letter, then append base’s path[0] to url’s path.
This is a (platform-independent) Windows drive letter quirk.
Both url’s and base’s host are null under these conditions and therefore not copied.
-
Otherwise, set url’s host to base’s host.
-
Set state to path state , and decrease pointer by 1.
-
-
- file host state
-
-
If c is the EOF code point , U+002F (/), U+005C (\), U+003F (?), or U+0023 (#), then decrease pointer by 1 and then:
-
If state override is not given and buffer is a Windows drive letter , validation error , set state to path state .
This is a (platform-independent) Windows drive letter quirk. buffer is not reset here and instead used in the path state .
-
Otherwise, if buffer is the empty string, then:
-
Set url ’s host to the empty string.
-
If state override is given, then return.
-
Set state to path start state .
-
-
Otherwise, run these steps:
-
Let host be the result of host parsing buffer with url is not special .
-
If host is failure, then return failure.
-
If host is "localhost", then set host to the empty string.
-
Set url ’s host to host .
-
If state override is given, then return.
-
Set buffer to the empty string and state to path start state .
-
-
-
Otherwise, append c to buffer .
-
- path start state
-
-
If url is special , then:
-
If c is U+005C (\), validation error .
-
Set state to path state .
-
If c is neither U+002F (/) nor U+005C (\), then decrease pointer by 1.
-
-
Otherwise, if state override is not given and c is U+003F (?), set url ’s query to the empty string and state to query state .
-
Otherwise, if state override is not given and c is U+0023 (#), set url ’s fragment to the empty string and state to fragment state .
-
Otherwise, if c is not the EOF code point :
-
Set state to path state .
-
If c is not U+002F (/), then decrease pointer by 1.
-
-
- path state
-
-
If one of the following is true:
-
c is the EOF code point or U+002F (/)
-
url is special and c is U+005C (\)
-
state override is not given and c is U+003F (?) or U+0023 (#)
then:
-
If url is special and c is U+005C (\), validation error .
-
If buffer is a double-dot path segment , then:
-
Otherwise, if buffer is a single-dot path segment and if neither c is U+002F (/), nor url is special and c is U+005C (\), append the empty string to url ’s path .
-
Otherwise, if buffer is not a single-dot path segment , then:
-
If url’s scheme is "file", url’s path is empty, and buffer is a Windows drive letter, then:
-
If url’s host is neither the empty string nor null, validation error, set url’s host to the empty string.
-
Replace the second code point in buffer with U+003A (:).
This is a (platform-independent) Windows drive letter quirk.
-
-
Set buffer to the empty string.
-
If url’s scheme is "file" and c is the EOF code point, U+003F (?), or U+0023 (#), then while url’s path’s size is greater than 1 and url’s path[0] is the empty string, validation error, remove the first item from url’s path.
-
If c is U+003F (?), then set url’s query to the empty string and state to query state.
-
If c is U+0023 (#), then set url ’s fragment to the empty string and state to fragment state .
-
-
Otherwise, run these steps:
-
If c is not a URL code point and not U+0025 (%), validation error .
-
If c is U+0025 (%) and remaining does not start with two ASCII hex digits , validation error .
-
UTF-8 percent-encode c using the path percent-encode set and append the result to buffer .
-
-
- cannot-be-a-base-URL path state
-
-
If c is U+003F (?), then set url ’s query to the empty string and state to query state .
-
Otherwise, if c is U+0023 (#), then set url ’s fragment to the empty string and state to fragment state .
-
Otherwise:
-
If c is not the EOF code point , not a URL code point , and not U+0025 (%), validation error .
-
If c is U+0025 (%) and remaining does not start with two ASCII hex digits , validation error .
-
If c is not the EOF code point , UTF-8 percent-encode c using the C0 control percent-encode set and append the result to url ’s path [0].
-
-
- query state
-
-
If encoding is not UTF-8 and one of the following is true:
-
url is not special
-
url’s scheme is "ws" or "wss"
then set encoding to UTF-8 .
-
-
If state override is not given and c is U+0023 (#), then set url ’s fragment to the empty string and state to fragment state .
-
Otherwise, if c is not the EOF code point :
-
If c is not a URL code point and not U+0025 (%), validation error .
-
If c is U+0025 (%) and remaining does not start with two ASCII hex digits , validation error .
-
Let queryPercentEncodeSet be the special-query percent-encode set if url is special ; otherwise the query percent-encode set .
-
Percent-encode after encoding , with encoding , c , and queryPercentEncodeSet , and append the result to url ’s query .
-
-
- fragment state
-
-
If c is not the EOF code point , then:
-
If c is not a URL code point and not U+0025 (%), validation error .
-
If c is U+0025 (%) and remaining does not start with two ASCII hex digits , validation error .
-
UTF-8 percent-encode c using the fragment percent-encode set and append the result to url ’s fragment .
-
-
-
Return url .
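Several effects of the state machine can be observed through the standard `URL` constructor. The sketch below is only a demonstration: `parseToHref` is a hypothetical helper that maps failure (a thrown `TypeError`) to null, and the inputs are made up for illustration.

```javascript
// Returns the serialized URL, or null when parsing fails.
function parseToHref(input, base) {
  try {
    return new URL(input, base).href;
  } catch {
    return null;
  }
}

// Scheme and host are lowercased; single-dot and double-dot segments are resolved.
const normalized = parseToHref("HTTP://EXAMPLE.com/a/./b/../c");
// Leading/trailing spaces are stripped; ASCII tab and newline are removed.
const stripped = parseToHref("  https://example.org/\tpa\nth  ");
// In special URLs, U+005C (\) is treated like U+002F (/).
const backslashes = parseToHref("https:\\\\example.org\\foo");
// A port equal to the scheme's default port is stored as null.
const defaultPort = parseToHref("http://example.com:80/");
// No scheme and no base: the no scheme state returns failure.
const noBase = parseToHref("/relative/path");
// Port above 2^16 - 1: the port state returns failure.
const badPort = parseToHref("http://example.com:65536/");
```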
To set the username given a url and username , set url ’s username to the result of running UTF-8 percent-encode on username using the userinfo percent-encode set .
To set the password given a url and password , set url ’s password to the result of running UTF-8 percent-encode on password using the userinfo percent-encode set .
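The effect of the userinfo percent-encode set is visible through the `username` and `password` setters of the `URL` API, which use these two algorithms. A small sketch; the helper name and the credential values are illustrative assumptions.

```javascript
// Returns a URL object for href with the given credentials applied.
function withCredentials(href, username, password) {
  const url = new URL(href);
  url.username = username; // space, ":" and similar are percent-encoded
  url.password = password; // "@" and "/" are percent-encoded
  return url;
}

const credentialed = withCredentials("https://example.org/", "a b:c", "p@ss/word");
// credentialed.href is "https://a%20b%3Ac:p%40ss%2Fword@example.org/"
```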
4.5. URL serializing
The URL serializer takes a URL url , an optional exclude fragment flag , and then runs these steps, returning an ASCII string :
-
Let output be url ’s scheme and U+003A (:) concatenated.
-
If url ’s host is non-null:
-
Append "//" to output.
-
If url includes credentials , then:
-
Append url ’s host , serialized , to output .
-
If url ’s port is non-null, append U+003A (:) followed by url ’s port , serialized , to output .
-
-
Otherwise, if url’s host is null and url’s scheme is "file", append "//" to output.
-
If url ’s cannot-be-a-base-URL flag is set, append url ’s path [0] to output .
-
Otherwise:
-
If url ’s host is null, url ’s path ’s size is greater than 1, and url ’s path [0] is the empty string, then append U+002F (/) followed by U+002E (.) to output .
-
For each segment of url ’s path : append U+002F (/) followed by segment to output .
This prevents web+demo:/.//not-a-host/ or web+demo:/path/..//not-a-host/, when parsed and then serialized, from ending up as web+demo://not-a-host/ (they end up as web+demo:/.//not-a-host/).
-
-
If url ’s query is non-null, append U+003F (?), followed by url ’s query , to output .
-
If the exclude fragment flag is unset and url ’s fragment is non-null, append U+0023 (#), followed by url ’s fragment , to output .
-
Return output .
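The serializer steps can be sketched over a plain-object URL record. The record shape (`scheme`, `username`, `password`, `host`, `port`, `path`, `query`, `fragment`, `cannotBeABaseURL`) is a hypothetical model, the host is assumed to be already serialized to a string, and the credentials branch is filled in as an assumption because its sub-steps are not shown in the text above.

```javascript
function serializeURL(url, excludeFragment = false) {
  let output = url.scheme + ":";
  if (url.host !== null) {
    output += "//";
    if (url.username !== "" || url.password !== "") {
      // Assumption: username, then ":" and password if non-empty, then "@".
      output += url.username;
      if (url.password !== "") output += ":" + url.password;
      output += "@";
    }
    output += url.host; // assumed to be a serialized host string
    if (url.port !== null) output += ":" + String(url.port);
  } else if (url.scheme === "file") {
    output += "//";
  }
  if (url.cannotBeABaseURL) {
    output += url.path[0];
  } else {
    // Guard against a path that would otherwise be re-parsed as a host.
    if (url.host === null && url.path.length > 1 && url.path[0] === "") {
      output += "/.";
    }
    for (const segment of url.path) output += "/" + segment;
  }
  if (url.query !== null) output += "?" + url.query;
  if (!excludeFragment && url.fragment !== null) output += "#" + url.fragment;
  return output;
}
```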
4.6. URL equivalence
To determine whether a URL A equals B , optionally with an exclude fragments flag , run these steps:
-
Let serializedA be the result of serializing A , with the exclude fragment flag set if the exclude fragments flag is set.
-
Let serializedB be the result of serializing B , with the exclude fragment flag set if the exclude fragments flag is set.
-
Return true if serializedA is serializedB , and false otherwise.
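With `URL` objects, the equivalence check amounts to comparing serializations. A minimal sketch with hypothetical function names; clearing `hash` on a copy is used here as a stand-in for the exclude fragment flag.

```javascript
// Serializes a URL object without its fragment, leaving the original untouched.
function serializeWithoutFragment(url) {
  const copy = new URL(url.href);
  copy.hash = ""; // an empty hash sets the fragment to null
  return copy.href;
}

function urlEquals(a, b, excludeFragments = false) {
  const serializedA = excludeFragments ? serializeWithoutFragment(a) : a.href;
  const serializedB = excludeFragments ? serializeWithoutFragment(b) : b.href;
  return serializedA === serializedB;
}
```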
4.7. Origin
See origin ’s definition in HTML for the necessary background information. [HTML]
A URL ’s origin is the origin returned by running these steps, switching on URL ’s scheme :
- "blob"
-
-
If URL ’s blob URL entry is non-null, then return URL ’s blob URL entry ’s environment ’s origin .
-
Return a new opaque origin , if url is failure, and url ’s origin otherwise.
The origin of blob:https://whatwg.org/d0360e2f-caee-469f-9a2f-87d5b0456f6f is the tuple (https, whatwg.org, null, null).
- "ftp"
- "http"
- "https"
- "ws"
- "wss"
-
Return a tuple consisting of URL ’s scheme , URL ’s host , URL ’s port , and null.
- "file"
-
Unfortunate as it is, this is left as an exercise to the reader. When in doubt, return a new opaque origin .
- Otherwise
-
Return a new opaque origin .
This does indeed mean that these URLs cannot be same-origin with themselves.
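The `origin` getter of the `URL` API exposes the serialized result of these steps; an opaque origin serializes as the string "null". A brief sketch (the helper name is hypothetical, and the "file" case is omitted because its result is left to implementations):

```javascript
// Returns the serialized origin of an absolute URL string.
function originOf(input) {
  return new URL(input).origin;
}

const tupleOrigin = originOf("https://example.org:443/a"); // default port becomes null
const withPort = originOf("http://example.org:8080/");
const opaque = originOf("data:text/plain,hi"); // opaque origin
const blobOrigin = originOf("blob:https://whatwg.org/d0360e2f-caee-469f-9a2f-87d5b0456f6f");
```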
4.8. URL rendering
A URL should be rendered in its serialized form, with modifications described below, when the primary purpose of displaying a URL is to have the user make a security or trust decision. For example, users are expected to make trust decisions based on a URL rendered in the browser address bar.
4.8.1. Simplify non-human-readable or irrelevant components
Remove components that can provide opportunities for spoofing or distract from security-relevant information:
-
Browsers may render only a URL’s host in places where it is important for users to distinguish between the host and other parts of the URL such as the path. Browsers may consider simplifying the host further to draw attention to its registrable domain. For example, browsers may omit a leading www or m domain label to simplify the host, or display its registrable domain only to remove spoofing opportunities posed by subdomains (e.g., https://examplecorp.attacker.com/).
-
Browsers should not render a URL’s username and password, as they can be mistaken for a URL’s host (e.g., https://examplecorp.com@attacker.example/).
-
Browsers may render a URL without its scheme if the display surface only ever permits a single scheme (such as a browser feature that omits https:// because it is only enabled for secure origins). Otherwise, the scheme may be replaced or supplemented with a human-readable string (e.g., "Not secure"), a security indicator icon, or both.
4.8.2. Elision
In a space-constrained display, URLs should be elided carefully to avoid misleading the user when making a security decision:
-
Browsers should ensure that at least the registrable domain can be shown when the URL is rendered (to avoid showing, e.g., ...examplecorp.com when loading https://not-really-examplecorp.com/).
-
When the full host cannot be rendered, browsers should elide domain labels starting from the lowest-level domain label. For example, examplecorp.com.evil.com should be elided as ...com.evil.com, not examplecorp.com... (Note that bidirectional text means that the lowest-level domain label may not appear on the left.)
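One possible way to apply the elision guidance is sketched below. It is a rough illustration under loud assumptions: `elideHost` is a hypothetical helper, the width budget is measured in characters, the number of labels in the registrable domain is supplied by the caller (a real implementation would consult the Public Suffix List), and bidirectional text is not handled.

```javascript
// Drops lowest-level labels (leftmost in the string) until the host fits,
// but never drops labels that belong to the registrable domain.
function elideHost(host, maxLength, registrableLabels = 2) {
  if (host.length <= maxLength) return host;
  const labels = host.split(".");
  let elided = false;
  while (labels.length > registrableLabels) {
    labels.shift();
    elided = true;
    const candidate = "..." + labels.join(".");
    if (candidate.length <= maxLength) return candidate;
  }
  // The registrable domain is kept even if it exceeds the budget.
  return (elided ? "..." : "") + labels.join(".");
}
```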
4.8.3. Internationalization and special characters
Internationalized domain names (IDNs), special characters, and bidirectional text should be handled with care to prevent spoofing:
-
Browsers should render a URL ’s host using domain to Unicode .
Note that various characters can be used in homograph spoofing attacks. Consider detecting confusable characters and warning when they are in use. [IDNFAQ] [UTS39]
-
URLs are particularly prone to confusion between host and path when they contain bidirectional text, so in this case it is particularly advisable to only render a URL’s host . For readability, other parts of the URL , if rendered, should have their sequences of percent-encoded bytes replaced with code points resulting from percent-decoding those sequences converted to bytes, unless that renders those sequences invisible. Browsers may choose to not decode certain sequences that present spoofing risks (e.g., U+1F512 (🔒)).
-
Browsers should render bidirectional text as if it were in a left-to-right embedding. [BIDI]
Unfortunately, as rendered URLs are strings and can appear anywhere, a specific bidirectional algorithm for rendered URLs would not see wide adoption. Bidirectional text interacts with the parts of a URL in ways that can cause the rendering to be different from the model. Users of bidirectional languages can come to expect this, particularly in plain text environments.
5. application/x-www-form-urlencoded
The application/x-www-form-urlencoded format provides a way to encode name-value pairs.
The application/x-www-form-urlencoded format is in many ways an aberrant monstrosity, the result of many years of implementation accidents and compromises leading to a set of requirements necessary for interoperability, but in no way representing good design practices. In particular, readers are cautioned to pay close attention to the twisted details involving repeated (and in some cases nested) conversions between character encodings and byte sequences. Unfortunately the format is in widespread use due to the prevalence of HTML forms. [HTML]
5.1. application/x-www-form-urlencoded parsing
A legacy server-oriented implementation might have to support encodings other than UTF-8 as well as have special logic for tuples of which the name is `_charset`. Such logic is not described here as only UTF-8 is conforming.
The application/x-www-form-urlencoded parser takes a byte sequence input, and then runs these steps:
-
Let sequences be the result of splitting input on 0x26 (&).
-
Let output be an initially empty list of name-value tuples where both name and value hold a string.
-
For each byte sequence bytes in sequences :
-
If bytes is the empty byte sequence, then continue .
-
If bytes contains a 0x3D (=), then let name be the bytes from the start of bytes up to but excluding its first 0x3D (=), and let value be the bytes, if any, after the first 0x3D (=) up to the end of bytes . If 0x3D (=) is the first byte, then name will be the empty byte sequence. If it is the last, then value will be the empty byte sequence.
-
Otherwise, let name have the value of bytes and let value be the empty byte sequence.
-
Replace any 0x2B (+) in name and value with 0x20 (SP).
-
Let nameString and valueString be the result of running UTF-8 decode without BOM on the percent-decoding of name and value , respectively.
-
Append ( nameString , valueString ) to output .
-
-
Return output .
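The parser steps translate fairly directly into code operating on a `Uint8Array`. This is a sketch rather than a reference implementation: the function names are hypothetical, and `TextDecoder` with `ignoreBOM: true` is used as an approximation of "UTF-8 decode without BOM" (the byte order mark is not stripped).

```javascript
const isHexDigit = (b) =>
  (b >= 0x30 && b <= 0x39) || (b >= 0x41 && b <= 0x46) || (b >= 0x61 && b <= 0x66);

// Replaces each "%" followed by two ASCII hex digits with the byte they denote.
function percentDecode(bytes) {
  const out = [];
  for (let i = 0; i < bytes.length; i++) {
    if (bytes[i] === 0x25 && i + 2 < bytes.length &&
        isHexDigit(bytes[i + 1]) && isHexDigit(bytes[i + 2])) {
      out.push(parseInt(String.fromCharCode(bytes[i + 1], bytes[i + 2]), 16));
      i += 2;
    } else {
      out.push(bytes[i]);
    }
  }
  return Uint8Array.from(out);
}

function parseUrlencoded(input) {
  const decoder = new TextDecoder("utf-8", { ignoreBOM: true });
  const plusToSpace = (seq) => seq.map((b) => (b === 0x2b ? 0x20 : b));
  const output = [];
  let start = 0;
  for (let i = 0; i <= input.length; i++) {
    if (i !== input.length && input[i] !== 0x26) continue; // split on "&"
    const bytes = input.subarray(start, i);
    start = i + 1;
    if (bytes.length === 0) continue;
    const eq = bytes.indexOf(0x3d); // first "="
    const name = eq === -1 ? bytes : bytes.subarray(0, eq);
    const value = eq === -1 ? new Uint8Array(0) : bytes.subarray(eq + 1);
    output.push([
      decoder.decode(percentDecode(plusToSpace(name))),
      decoder.decode(percentDecode(plusToSpace(value))),
    ]);
  }
  return output;
}
```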
5.2. application/x-www-form-urlencoded serializing
The application/x-www-form-urlencoded serializer takes a list of name-value tuples tuples, optionally with an encoding encoding override, and then runs these steps:
-
Let encoding be UTF-8 .
-
If encoding override is given, then set encoding to the result of getting an output encoding from encoding override .
-
Let output be the empty string.
-
For each tuple of tuples :
-
Let name be the result of running percent-encode after encoding with encoding, tuple’s name, the application/x-www-form-urlencoded percent-encode set, and true.
-
Let value be tuple ’s value.
-
If value is a file, then set value to value ’s filename.
-
Set value to the result of running percent-encode after encoding with encoding, value, the application/x-www-form-urlencoded percent-encode set, and true.
-
If tuple is not tuples [0], then append U+0026 (&) to output .
- Append name , followed by U+003D (=), followed by value , to output .
-
- Return output .
HTML invokes this algorithm with values that are files. [HTML]
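A UTF-8-only sketch of the serializer follows. It makes several simplifying assumptions: no encoding override, no file values, and the application/x-www-form-urlencoded percent-encode set is taken to leave only ASCII alphanumerics, "*", "-", ".", and "_" unencoded, with U+0020 SPACE written as "+". The function names are hypothetical.

```javascript
// Percent-encodes the UTF-8 bytes of str, writing spaces as "+".
function urlencodedPercentEncode(str) {
  let out = "";
  for (const b of new TextEncoder().encode(str)) {
    const ch = String.fromCharCode(b);
    if (b === 0x20) out += "+";
    else if (/[A-Za-z0-9*\-._]/.test(ch)) out += ch;
    else out += "%" + b.toString(16).toUpperCase().padStart(2, "0");
  }
  return out;
}

// tuples: an array of [name, value] string pairs.
function serializeUrlencoded(tuples) {
  return tuples
    .map(([name, value]) =>
      urlencodedPercentEncode(name) + "=" + urlencodedPercentEncode(value))
    .join("&");
}
```

In environments that provide `URLSearchParams`, its `toString()` method produces the same output for string pairs.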
5.3. Hooks
The application/x-www-form-urlencoded string parser takes a string input, UTF-8 encodes it, and then returns the result of application/x-www-form-urlencoded parsing it.
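In script, the closest readily available counterpart to this hook is the `URLSearchParams` constructor, which strips one leading "?" and then applies the string parser. A short sketch with a hypothetical wrapper name:

```javascript
// Returns the parsed name-value pairs as an array of [name, value] arrays.
function parseUrlencodedString(input) {
  return [...new URLSearchParams(input)];
}

const pairsFromString = parseUrlencodedString("a=1&b=x+y&c");
// pairsFromString holds the tuples ("a", "1"), ("b", "x y"), and ("c", "")
```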
6. API
6.1. URL class
[Exposed=(Window,Worker),
 LegacyWindowAlias=webkitURL]
interface URL {
  constructor(USVString url, optional USVString base);

  stringifier attribute USVString href;
  readonly attribute USVString origin;
  attribute USVString protocol;
  attribute USVString username;
  attribute USVString password;
  attribute USVString host;
  attribute USVString hostname;
  attribute USVString port;
  attribute USVString pathname;
  attribute USVString search;
  [SameObject] readonly attribute URLSearchParams searchParams;
  attribute USVString hash;

  USVString toJSON();
};
A URL object has an associated:
- URL: a URL.
- query object: a URLSearchParams object.
The new URL(url, base) constructor steps are:
- Let parsedBase be null.
- If base is given, then:
  - Let parsedBase be the result of running the basic URL parser on base.
  - If parsedBase is failure, then throw a TypeError.
- Let parsedURL be the result of running the basic URL parser on url with parsedBase.
- If parsedURL is failure, then throw a TypeError.
- Let query be parsedURL’s query, if that is non-null, and the empty string otherwise.
- Set this’s URL to parsedURL.
- Set this’s query object to a new URLSearchParams object.
- Initialize this’s query object with query.
- Set this’s query object’s URL object to this.
To parse a string into a URL without using a base URL, invoke the URL constructor with a single argument:
var input = "https://example.org/💩",
    url = new URL(input)
url.pathname // "/%F0%9F%92%A9"
This throws an exception if the input is not an absolute-URL-with-fragment string :
try {
  var url = new URL("/🍣🍺")
} catch (e) {
  // that happened
}
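Where an exception is inconvenient, the constructor can be wrapped. The following is a minimal sketch; tryParseURL is a hypothetical helper name, not something this standard defines, and the base URL used is only an example.

```javascript
// Hypothetical helper: returns a URL object, or null if parsing fails.
function tryParseURL(input, base) {
  try {
    return base === undefined ? new URL(input) : new URL(input, base);
  } catch (e) {
    if (e instanceof TypeError) return null; // parse failure
    throw e; // anything else is unexpected
  }
}

const absolute = tryParseURL("https://example.org/💩");
const relativeOnly = tryParseURL("/🍣🍺"); // null, no base URL was given
const withBase = tryParseURL("/🍣🍺", "https://example.org/dir/file");
const badBase = tryParseURL("/🍣🍺", "not a url"); // null, the base itself fails to parse
```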
A base URL is necessary if the input is a relative-URL string :
var input = "/🍣🍺",
    url = new URL(input, document.baseURI)
url.href // "https://url.spec.whatwg.org/%F0%9F%8D%A3%F0%9F%8D%BA"
A URL object can be used as base URL (while IDL requires a string as argument, a URL object stringifies to its href getter return value):
var url = new URL( "🏳️🌈" , new URL( "https://pride.example/hello-world" ))
url.pathname // "/%F0%9F%8F%B3%EF%B8%8F%E2%80%8D%F0%9F%8C%88"
The href getter steps and the toJSON() method steps are to return the serialization of this’s URL.
The href setter steps are:
- Let parsedURL be the result of running the basic URL parser on the given value.
- If parsedURL is failure, then throw a TypeError.
- Set this’s URL to parsedURL.
- Empty this’s query object’s list.
- Let query be this’s URL’s query.
- If query is non-null, then set this’s query object’s list to the result of parsing query.
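A small sketch of how this surfaces in script, assuming an implementation that follows these steps (the example URLs are arbitrary): assigning href keeps the same searchParams object but replaces its list, and a value that fails to parse without a base throws.

```javascript
const url = new URL("https://example.org/?a=1");
const params = url.searchParams;

url.href = "https://example.com/path?b=2";
const sameObject = params === url.searchParams; // true, only the list is replaced
const oldValue = params.get("a"); // null
const newValue = params.get("b"); // "2"

let threw = false;
try {
  url.href = "/no-base-here"; // the href setter has no base URL to resolve against
} catch (e) {
  threw = e instanceof TypeError;
}

// toJSON() returns the same string as href, so JSON.stringify() embeds the serialization.
const json = JSON.stringify({ link: url });
```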
The origin getter steps are to return the serialization of this’s URL’s origin. [HTML]
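For illustration, a few values the origin getter is expected to produce. The origin concept itself is defined outside this section, so treat this as a sketch: a default port does not appear because the URL’s port is null, and an opaque origin serializes as "null".

```javascript
const defaultPort = new URL("https://example.org:443/a?b#c").origin; // "https://example.org"
const customPort = new URL("http://example.org:8080/").origin; // "http://example.org:8080"
const opaque = new URL("data:text/plain,hello").origin; // "null"
```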
The protocol getter steps are to return this’s URL’s scheme, followed by U+003A (:).
The protocol setter steps are to basic URL parse the given value, followed by U+003A (:), with this’s URL as url and scheme start state as state override.
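A short sketch of the protocol setter in use. The last assignment relies on the basic URL parser (defined earlier in this standard) declining to switch between a special and a non-special scheme when a state override is given.

```javascript
const url = new URL("https://example.org/");
const before = url.protocol; // "https:"

url.protocol = "http"; // the trailing ":" is appended by the setter
const afterSwitch = url.href; // "http://example.org/"

url.protocol = "mailto"; // special to non-special: ignored
const afterIgnored = url.protocol; // "http:"
```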
The username getter steps are to return this’s URL’s username.
The username setter steps are:
- If this’s URL cannot have a username/password/port, then return.
- Set the username given this’s URL and the given value.
The password getter steps are to return this’s URL’s password.
The password setter steps are:
- If this’s URL cannot have a username/password/port, then return.
- Set the password given this’s URL and the given value.
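As a sketch of both setters: values are percent-encoded when set, and a URL that cannot have a username/password/port (here a mailto: URL, which has no host) is left untouched. The credentials are placeholders.

```javascript
const url = new URL("https://example.org/");
url.username = "user name";
url.password = "p@ss:word";
const credentials = [url.username, url.password]; // ["user%20name", "p%40ss%3Aword"]
const href = url.href; // "https://user%20name:p%40ss%3Aword@example.org/"

const mail = new URL("mailto:someone@example.org");
mail.username = "ignored"; // no host, so the setter returns early
const mailUsername = mail.username; // ""
```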
The host getter steps are:
- Let url be this’s URL.
- If url’s host is null, then return the empty string.
- If url’s port is null, return url’s host, serialized.
- Return url’s host, serialized, followed by U+003A (:) and url’s port, serialized.
The host setter steps are:
- If this’s URL’s cannot-be-a-base-URL flag is set, then return.
- Basic URL parse the given value with this’s URL as url and host state as state override.
If the given value for the host setter lacks a port, this’s URL’s port will not change. This can be unexpected as the host getter does return a URL-port string, so one might have assumed the setter to always "reset" both.
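The note above can be illustrated with a brief sketch (host names and ports are arbitrary):

```javascript
const url = new URL("https://example.org:8080/");

url.host = "example.com"; // no port in the given value
const portKept = url.host; // "example.com:8080", the port did not change

url.host = "example.net:9090"; // host and port given together
const bothChanged = url.host; // "example.net:9090"
```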
The hostname getter steps are:
- If this’s URL’s host is null, then return the empty string.
- Return this’s URL’s host, serialized.
The hostname setter steps are:
- If this’s URL’s cannot-be-a-base-URL flag is set, then return.
- Basic URL parse the given value with this’s URL as url and hostname state as state override.
The port getter steps are:
- If this’s URL’s port is null, then return the empty string.
- Return this’s URL’s port, serialized.
The port setter steps are:
- If this’s URL cannot have a username/password/port, then return.
- If the given value is the empty string, then set this’s URL’s port to null.
- Otherwise, basic URL parse the given value with this’s URL as url and port state as state override.
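A sketch of the hostname and port accessors together. It assumes the basic URL parser’s behavior of storing a scheme’s default port as null, which is why "443" on an https: URL reads back as the empty string.

```javascript
const url = new URL("https://example.org:8080/");
const parts = [url.hostname, url.port]; // ["example.org", "8080"]

url.port = "443"; // default port for https, stored as null
const defaultPort = url.port; // ""

url.port = "8443";
const customHost = url.host; // "example.org:8443"

url.port = ""; // the empty string resets the port to null
url.hostname = "example.com";
const finalHref = url.href; // "https://example.com/"
```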
The pathname getter steps are:
- If this’s URL’s cannot-be-a-base-URL flag is set, then return this’s URL’s path[0].
- If this’s URL’s path is empty, then return the empty string.
- Return U+002F (/), followed by the strings in this’s URL’s path (including empty strings), if any, separated from each other by U+002F (/).
The pathname setter steps are:
- If this’s URL’s cannot-be-a-base-URL flag is set, then return.
- Basic URL parse the given value with this’s URL as url and path start state as state override.
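A brief sketch of both branches. The value given to the setter goes through the parser, so dot segments are resolved and spaces are percent-encoded, while a URL whose cannot-be-a-base-URL flag is set (mailto: here) ignores the setter.

```javascript
const url = new URL("https://example.org/a/b");
url.pathname = "/docs/../c d";
const normalized = url.pathname; // "/c%20d"

const mail = new URL("mailto:someone@example.org");
const opaquePath = mail.pathname; // "someone@example.org", that is, path[0]
mail.pathname = "/ignored"; // cannot-be-a-base-URL flag is set: no effect
const unchanged = mail.href; // "mailto:someone@example.org"
```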
The search getter steps are:
- If this’s URL’s query is either null or the empty string, then return the empty string.
- Return U+003F (?), followed by this’s URL’s query.
The search setter steps are:
- Let url be this’s URL.
- If the given value is the empty string, set url’s query to null, empty this’s query object’s list, and then return.
- Let input be the given value with a single leading U+003F (?) removed, if any.
- Set url’s query to the empty string.
- Basic URL parse input with url as url and query state as state override.
- Set this’s query object’s list to the result of parsing input.
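A sketch of how the setter keeps the query object in step with the URL’s query (parameter names are arbitrary):

```javascript
const url = new URL("https://example.org/?a=1");
const params = url.searchParams;

url.search = "b=2&c=3"; // a leading "?" is optional
const search = url.search; // "?b=2&c=3"
const seesNew = params.get("c"); // "3"
const seesOld = params.has("a"); // false

url.search = ""; // the query becomes null and the list is emptied
const href = url.href; // "https://example.org/"
const remaining = [...params].length; // 0
```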
The searchParams getter steps are to return this’s query object.
The hash getter steps are:
- If this’s URL’s fragment is either null or the empty string, then return the empty string.
- Return U+0023 (#), followed by this’s URL’s fragment.
The hash setter steps are:
- If the given value is the empty string, then set this’s URL’s fragment to null and return.
- Let input be the given value with a single leading U+0023 (#) removed, if any.
- Basic URL parse input with this’s URL as url and fragment state as state override.
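A short sketch of the hash accessors; the fragment values are arbitrary, and the percent-encoding of the space comes from the parser’s fragment state.

```javascript
const url = new URL("https://example.org/page");

url.hash = "section 1"; // a leading "#" is optional
const encoded = url.hash; // "#section%201"

url.hash = "#top";
const withHash = url.href; // "https://example.org/page#top"

url.hash = ""; // the fragment becomes null
const cleared = [url.hash, url.href]; // ["", "https://example.org/page"]
```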
6.2. URLSearchParams class
[Exposed=(Window,Worker)]
interface URLSearchParams {
  constructor(optional (sequence<sequence<USVString>> or record<USVString, USVString> or USVString) init = "");

  undefined append(USVString name, USVString value);
  undefined delete(USVString name);
  USVString? get(USVString name);
  sequence<USVString> getAll(USVString name);
  boolean has(USVString name);
  undefined set(USVString name, USVString value);

  undefined sort();

  iterable<USVString, USVString>;
  stringifier;
};
Constructing and stringifying a URLSearchParams object is fairly straightforward:
let params = new URLSearchParams({key: "730d67"})
params.toString() // "key=730d67"
As a URLSearchParams object uses the application/x-www-form-urlencoded format underneath there are some differences with how it encodes certain code points compared to a URL object (including href and search). This can be especially surprising when using searchParams to operate on a URL’s query.
const url = new URL('https://example.com/?a=b ~');
console.log(url.href); // "https://example.com/?a=b%20~"
url.searchParams.sort();
console.log(url.href); // "https://example.com/?a=b+%7E"

const url = new URL('https://example.com/?a=~&b=%7E');
console.log(url.search); // "?a=~&b=%7E"
console.log(url.searchParams.get('a')); // "~"
console.log(url.searchParams.get('b')); // "~"
URLSearchParams objects will percent-encode anything in the application/x-www-form-urlencoded percent-encode set, and will encode U+0020 SPACE as U+002B (+).
Ignoring encodings (use UTF-8), search will percent-encode anything in the query percent-encode set or the special-query percent-encode set (depending on whether or not the URL is special).
A URLSearchParams object has an associated:
- list: a list of name-value pairs, initially empty.
- URL object: null or a URL object, initially null.
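To initialize a URLSearchParams object query with init, run these steps:
- If init is a sequence, then for each pair in init:
  - If pair does not contain exactly two items, then throw a TypeError.
  - Append a new name-value pair whose name is pair’s first item, and value is pair’s second item, to query’s list.
- Otherwise, if init is a record, then for each name → value of init, append a new name-value pair whose name is name and value is value, to query’s list.
- Otherwise:
  - Assert: init is a string.
  - Set query’s list to the result of parsing init.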
To update a URLSearchParams object query, run these steps:
- If query’s URL object is null, then return.
- Let serializedQuery be the serialization of query’s list.
- If serializedQuery is the empty string, then set serializedQuery to null.
- Set query’s URL object’s URL’s query to serializedQuery.
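The effect of the update steps can be observed from script. In this sketch (names and values are arbitrary), the query object obtained through searchParams writes back to its URL, a separately constructed URLSearchParams object has a null URL object and so changes nothing else, and an empty list leaves the URL without a query.

```javascript
const url = new URL("https://example.org/?a=1");

url.searchParams.append("b", "2");
const afterAppend = url.href; // "https://example.org/?a=1&b=2"

const detached = new URLSearchParams(url.search); // its URL object is null
detached.append("c", "3");
const stillTheSame = url.href; // "https://example.org/?a=1&b=2"

url.searchParams.delete("a");
url.searchParams.delete("b");
const afterEmpty = url.href; // "https://example.org/", the query is now null
```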
The new URLSearchParams(init) constructor steps are:
- If init is a string and starts with U+003F (?), then remove the first code point from init.
- Initialize this with init.
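The three kinds of init accepted by the IDL can be sketched as follows. The values are arbitrary; the TypeError for a malformed pair is what Web IDL-conformant implementations are expected to throw.

```javascript
const fromString = new URLSearchParams("?a=1&b=2"); // one leading "?" is removed
const fromPairs = new URLSearchParams([["a", "1"], ["a", "2"]]); // sequence of pairs
const fromRecord = new URLSearchParams({ key: "730d67" }); // record

const results = [
  fromString.get("a"), // "1"
  fromPairs.toString(), // "a=1&a=2"
  fromRecord.toString(), // "key=730d67"
];

// Only the first "?" is stripped, so a second one becomes part of the name.
const doubled = new URLSearchParams("??a=1").has("?a"); // true

let threw = false;
try {
  new URLSearchParams([["name-without-value"]]); // not exactly two items
} catch (e) {
  threw = e instanceof TypeError;
}
```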
The append(name, value) method steps are:
- Append a new name-value pair whose name is name and value is value, to this’s list.
- Update this.
The delete(name) method steps are:
- Remove all name-value pairs whose name is name from this’s list.
- Update this.
The get(name) method steps are to return the value of the first name-value pair whose name is name in this’s list, if there is such a pair, and null otherwise.
The getAll(name) method steps are to return the values of all name-value pairs whose name is name, in this’s list, in list order, and the empty sequence otherwise.
The has(name) method steps are to return true if there is a name-value pair whose name is name in this’s list, and false otherwise.
The set(name, value) method steps are:
- If this’s list contains any name-value pairs whose name is name, then set the value of the first such name-value pair to value and remove the others.
- Otherwise, append a new name-value pair whose name is name and value is value, to this’s list.
- Update this.
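The list-manipulation methods above can be exercised together in a short sketch (names and values are arbitrary):

```javascript
const params = new URLSearchParams("a=1&b=2&a=3");

const first = params.get("a"); // "1", the first matching pair
const all = params.getAll("a"); // ["1", "3"], in list order
const missing = params.get("zzz"); // null
const missingAll = params.getAll("zzz"); // []
const hasB = params.has("b"); // true

params.append("c", "4"); // a=1&b=2&a=3&c=4
params.set("a", "9"); // first "a" updated, the other "a" removed
const afterSet = params.toString(); // "a=9&b=2&c=4"

params.delete("b");
const afterDelete = params.toString(); // "a=9&c=4"
```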
It can be useful to sort the name-value pairs in a URLSearchParams object, in particular to increase cache hits. This can be accomplished through invoking the sort() method:
const url = new URL( "https://example.org/?q=🏳️🌈&key=e1f7bc78" );
url. searchParams. sort();
url.search; // "?key=e1f7bc78&q=%F0%9F%8F%B3%EF%B8%8F%E2%80%8D%F0%9F%8C%88"
To avoid altering the original input, e.g., for comparison purposes, construct a new URLSearchParams object:
const sorted = new URLSearchParams(url.search)
sorted.sort()
The sort() method steps are:
- Sort all name-value pairs, if any, by their names. Sorting must be done by comparison of code units. The relative order between name-value pairs with equal names must be preserved.
- Update this.
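Two properties of the sort are easy to overlook: it is stable, and it compares code units rather than code points. A sketch with arbitrary names: U+1F308 is encoded as the surrogate pair D83C DF08, so it sorts before U+FB03 even though its code point is larger.

```javascript
const params = new URLSearchParams("z=b&a=b&z=a&a=a");
params.sort();
const stable = params.toString(); // "a=b&a=a&z=b&z=a", equal names keep their order

const byCodeUnit = new URLSearchParams([["\uFB03", "1"], ["\u{1F308}", "2"]]);
byCodeUnit.sort();
const names = [...byCodeUnit.keys()]; // ["\u{1F308}", "\uFB03"]
```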
The value pairs to iterate over are this’s list’s name-value pairs with the key being the name and the value being the value.
The stringification behavior steps are to return the serialization of this’s list.
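Iteration and stringification, sketched with arbitrary data:

```javascript
const params = new URLSearchParams("a=1&b=2");

const seen = [];
for (const [name, value] of params) {
  seen.push(name + " -> " + value); // key is the name, value is the value
}

const asObject = Object.fromEntries(params); // { a: "1", b: "2" }
const asString = String(params); // "a=1&b=2", same as params.toString()
```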
6.3. URL APIs elsewhere
A standard that exposes URLs should expose the URL as a string (by serializing an internal URL). A standard should not expose a URL using a URL object. URL objects are meant for URL manipulation. In IDL the USVString type should be used.
The higher-level notion here is that values are to be exposed as immutable data structures.
If a standard decides to use a variant of the name "URL" for a feature it defines, it should name such a feature "url" (i.e., lowercase and with an "l" at the end). Names such as "URL", "URI", and "IRI" should not be used. However, if the name is a compound, "URL" (i.e., uppercase) is preferred, e.g., "newURL" and "oldURL".
The EventSource and HashChangeEvent interfaces in HTML are examples of proper naming. [HTML]
Acknowledgments
There have been a lot of people that have helped make URLs more interoperable over the years and thereby furthered the goals of this standard. Likewise many people have helped make this standard what it is today.
With that, many thanks to 100の人, Adam Barth, Addison Phillips, Albert Wiersch, Alex Christensen, Alexandre Morgaut, Alexis Hunt, Alwin Blok, Andrew Sullivan, Arkadiusz Michalski, Behnam Esfahbod, Bobby Holley, Boris Zbarsky, Brad Hill, Brandon Ross, Chris Dumez, Chris Rebert, Corey Farwell, Dan Appelquist, Daniel Bratell, Daniel Stenberg, David Burns, David Håsäther, David Sheets, David Singer, David Walp, Domenic Denicola, Emily Schechter, Emily Stark, Eric Lawrence, Erik Arvidsson, Gavin Carothers, Geoff Richards, Glenn Maynard, Gordon P. Hemsley, Henri Sivonen, Ian Hickson, Ilya Grigorik, Italo A. Casas, Jakub Gieryluk, James Graham, James Manger, James Ross, Jeff Hodges, Jeffrey Posnick, Jeffrey Yasskin, Joe Duarte, Joshua Bell, Jxck, 田村健人 (Kent TAMURA), Kevin Grandon, Kornel Lesiński, Larry Masinter, Leif Halvard Silli, Mark Amery, Mark Davis, Marcos Cáceres, Marijn Kruisselbrink, Martin Dürst, Mathias Bynens, Matt Falkenhagen, Matt Giuca, Michael Peick, Michael™ Smith, Michal Bukovský, Michel Suignard, Mikaël Geljić, Noah Levitt, Peter Occil, Philip Jägenstedt, Philippe Ombredanne, Prayag Verma, Rimas Misevičius, Robert Kieffer, Rodney Rehm, Roy Fielding, Ryan Sleevi, Sam Ruby, Sam Sneddon, Santiago M. Mola, Sebastian Mayr, Simon Pieters, Simon Sapin, Steven Vachon, Stuart Cook, Sven Uhlig, Tab Atkins, 吉野剛史 (Takeshi Yoshino), Tantek Çelik, Tiancheng "Timothy" Gu, Tim Berners-Lee, 簡冠庭 (Tim Guan-tin Chien), Titi_Alone, Tomek Wytrębowicz, Trevor Rowbotham, Tristan Seligmann, Valentin Gosu, Vyacheslav Matva, Wei Wang, 山岸和利 (Yamagishi Kazutoshi), Yongsheng Zhang, 成瀬ゆい (Yui Naruse), and zealousidealroll for being awesome!
This standard is written by Anne van Kesteren ( Mozilla , annevk@annevk.nl ).
Copyright © WHATWG (Apple, Google, Mozilla, Microsoft). This work is licensed under a Creative Commons Attribution 4.0 International License .