Goals
The URL standard takes the following approach towards making URLs fully interoperable:
- Align RFC 3986 and RFC 3987 with contemporary implementations and obsolete the RFCs in the process. (E.g., spaces, other "illegal" code points, query encoding, equality, canonicalization, are all concepts not entirely shared, or defined.) URL parsing needs to become as solid as HTML parsing. [RFC3986] [RFC3987]
- Standardize on the term URL. URI and IRI are just confusing. In practice a single algorithm is used for both so keeping them distinct is not helping anyone. URL also easily wins the search result popularity contest.
- Supplant Origin of a URI [sic]. [RFC6454]
- Define URL’s existing JavaScript API in full detail and add enhancements to make it easier to work with. Add a new URL object as well for URL manipulation without usage of HTML elements. (Useful for JavaScript worker environments.)
- Ensure the combination of parser, serializer, and API guarantee idempotence. For example, a non-failure result of a parse-then-serialize operation will not change with any further parse-then-serialize operations applied to it. Similarly, manipulating a non-failure result through the API will not change from applying any number of serialize-then-parse operations to it.
As the editors learn more about the subject matter the goals might increase in scope somewhat.
1. Infrastructure
This specification depends on Infra. [INFRA]
Some terms used in this specification are defined in the following standards and specifications:
- Encoding [ENCODING]
- File API [FILEAPI]
- HTML [HTML]
- Unicode IDNA Compatibility Processing [UTS46]
- Web IDL [WEBIDL]
To serialize an integer, represent it as the shortest possible decimal number.
1.1. Writing
A validation error indicates a mismatch between input and valid input. User agents, especially conformance checkers, are encouraged to report them somewhere.
A validation error does not mean that the parser terminates. Termination of a parser is always stated explicitly, e.g., through a return statement.
It is useful to signal validation errors as error-handling can be non-intuitive, legacy user agents might not implement correct error-handling, and the intent of what is written might be unclear to other developers.
| Error type | Error description | Failure |
|---|---|---|
| IDNA | | |
| domain-to-ASCII | Unicode ToASCII records an error or returns the empty string. [UTS46] If details about Unicode ToASCII errors are recorded, user agents are encouraged to pass those along. | Yes |
| domain-to-Unicode | Unicode ToUnicode records an error. [UTS46] The same considerations as with domain-to-ASCII apply. | · |
| Host parsing | | |
| domain-invalid-code-point | The input’s host contains a forbidden domain code point. Hosts are percent-decoded before being processed when the URL is special, which would result in the following host portion becoming " " | Yes |
| host-invalid-code-point | An opaque host (in a URL that is not special) contains a forbidden host code point. | Yes |
| IPv4-empty-part | An IPv4 address ends with a U+002E (.). | · |
| IPv4-too-many-parts | An IPv4 address does not consist of exactly 4 parts. | Yes |
| IPv4-non-numeric-part | An IPv4 address part is not numeric. | Yes |
| IPv4-non-decimal-part | The IPv4 address contains numbers expressed using hexadecimal or octal digits. | · |
| IPv4-out-of-range-part | An IPv4 address part exceeds 255. | Yes (only if applicable to the last part) |
| IPv6-unclosed | An IPv6 address is missing the closing U+005D (]). | Yes |
| IPv6-invalid-compression | An IPv6 address begins with improper compression. | Yes |
| IPv6-too-many-pieces | An IPv6 address contains more than 8 pieces. | Yes |
| IPv6-multiple-compression | An IPv6 address is compressed in more than one spot. | Yes |
| IPv6-invalid-code-point | An IPv6 address contains a code point that is neither an ASCII hex digit nor a U+003A (:). Or it unexpectedly ends. | Yes |
| IPv6-too-few-pieces | An uncompressed IPv6 address contains fewer than 8 pieces. | Yes |
| IPv4-in-IPv6-too-many-pieces | An IPv6 address with IPv4 address syntax: the IPv6 address has more than 6 pieces. | Yes |
| IPv4-in-IPv6-invalid-code-point | An IPv6 address with IPv4 address syntax: | Yes |
| IPv4-in-IPv6-out-of-range-part | An IPv6 address with IPv4 address syntax: an IPv4 part exceeds 255. | Yes |
| IPv4-in-IPv6-too-few-parts | An IPv6 address with IPv4 address syntax: an IPv4 address contains too few parts. | Yes |
| | | |
| invalid-URL-unit | A code point is found that is not a URL unit. | · |
| special-scheme-missing-following-solidus | The input’s scheme is not followed by " | · |
| missing-scheme-non-relative-URL | The input is missing a scheme | Yes |
| invalid-reverse-solidus | The URL has a special scheme and it uses U+005C (\) instead of U+002F (/). | · |
| invalid-credentials | The input includes credentials. | · |
| host-missing | The input has a special scheme, but does not contain a host. | Yes |
| port-out-of-range | The input’s port is too big. | Yes |
| port-invalid | The input’s port is invalid. | Yes |
| file-invalid-Windows-drive-letter | The input is a relative-URL string | · |
| file-invalid-Windows-drive-letter-host | A | · |
1.2. Parsers
The EOF code point is a conceptual code point that signifies the end of a string or code point stream.
A pointer for a string input is an integer that points to a code point within input. Initially it points to the start of input. If it is −1 it points nowhere. If it is greater than or equal to input’s code point length, it points to the EOF code point.
When a pointer is used, c references the code point the pointer points to as long as it does not point nowhere. When the pointer points to nowhere c cannot be used.
When a pointer is used, remaining references the code point substring from the pointer + 1 to the end of the string, as long as c is not the EOF code point. When c is the EOF code point remaining cannot be used.
If "mailto:username@example" is a string being processed and a pointer points to @, c is U+0040 (@) and remaining is "example".
If the empty string is being processed and a pointer points to the start and is then decreased by 1, using c or remaining would be an error.
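The pointer, c, and remaining concepts can be illustrated with a small sketch. This is not part of the standard: the `Pointer` class, its attribute names, and the `EOF` sentinel are hypothetical names chosen for illustration, and raising `ValueError` is just one way to model "cannot be used".

```python
EOF = object()  # hypothetical sentinel standing in for the EOF code point


class Pointer:
    """Illustrative pointer over the code points of a string."""

    def __init__(self, text: str) -> None:
        self.text = text
        self.index = 0  # initially points to the start of the input

    @property
    def c(self):
        # A negative index models "points nowhere".
        if self.index < 0:
            raise ValueError("pointer points nowhere; c cannot be used")
        # At or past the code point length, the pointer is at EOF.
        if self.index >= len(self.text):
            return EOF
        return self.text[self.index]

    @property
    def remaining(self) -> str:
        # remaining is only usable while c is not the EOF code point.
        if self.c is EOF:
            raise ValueError("c is the EOF code point; remaining cannot be used")
        return self.text[self.index + 1:]
```

Python strings are indexed by code point, which is why plain indexing is enough for this sketch.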
1.3. Percent-encoded bytes
A percent-encoded byte is U+0025 (%), followed by two ASCII hex digits.
It is generally a good idea for sequences of percent-encoded bytes to be such that, when percent-decoded and then passed to UTF-8 decode without BOM or fail, they do not end up as failure. How important this is depends on where the percent-encoded bytes are used. E.g., for the host parser not following this advice is fatal, whereas for URL rendering the percent-encoded bytes would not be rendered percent-decoded.
To percent-encode a byte byte, return a string consisting of U+0025 (%), followed by two ASCII upper hex digits representing byte.
To percent-decode a byte sequence input, run these steps:
Using anything but UTF-8 decode without BOM when input contains bytes that are not ASCII bytes might be insecure and is not recommended.
1. Let output be an empty byte sequence.
2. For each byte byte in input:
   1. If byte is not 0x25 (%), then append byte to output.
   2. Otherwise, if byte is 0x25 (%) and the next two bytes after byte in input are not in the ranges 0x30 (0) to 0x39 (9), 0x41 (A) to 0x46 (F), and 0x61 (a) to 0x66 (f), all inclusive, append byte to output.
   3. Otherwise:
      1. Let bytePoint be the two bytes after byte in input, decoded, and then interpreted as hexadecimal number.
      2. Append a byte whose value is bytePoint to output.
      3. Skip the next two bytes in input.
3. Return output.
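The percent-encode and percent-decode steps above can be sketched in Python as follows. This is a minimal illustration, not a normative implementation; the function names are made up for this example.

```python
HEX_DIGITS = b"0123456789ABCDEFabcdef"


def percent_encode_byte(byte: int) -> str:
    # U+0025 (%) followed by two ASCII upper hex digits.
    return f"%{byte:02X}"


def percent_decode(data: bytes) -> bytes:
    output = bytearray()
    i = 0
    while i < len(data):
        byte = data[i]
        pair = data[i + 1:i + 3]
        is_escape = (
            byte == 0x25
            and len(pair) == 2
            and all(b in HEX_DIGITS for b in pair)
        )
        if is_escape:
            # Interpret the two following bytes as a hexadecimal number
            # and skip past them.
            output.append(int(pair.decode("ascii"), 16))
            i += 3
        else:
            # Either not a "%", or a "%" not followed by two hex digits.
            output.append(byte)
            i += 1
    return bytes(output)


def percent_decode_string(text: str) -> bytes:
    # Percent-decoding a scalar value string goes through its UTF-8 encoding.
    return percent_decode(text.encode("utf-8"))
```

Note that a malformed sequence such as `%1G` is passed through unchanged rather than rejected.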
To percent-decode a scalar value string input:
1. Let bytes be the UTF-8 encoding of input.
2. Return the percent-decoding of bytes.
In general, percent-encoding results in a string with more U+0025 (%) code points than the input, and percent-decoding results in a byte sequence with fewer 0x25 (%) bytes than the input.
The C0 control percent-encode set are the C0 controls and all code points greater than U+007E (~).
The fragment percent-encode set is the C0 control percent-encode set and U+0020 SPACE, U+0022 ("), U+003C (<), U+003E (>), and U+0060 (`).
The query percent-encode set is the C0 control percent-encode set and U+0020 SPACE, U+0022 ("), U+0023 (#), U+003C (<), and U+003E (>).
The query percent-encode set cannot be defined in terms of the fragment percent-encode set due to the omission of U+0060 (`).
The special-query percent-encode set is the query percent-encode set and U+0027 (').
The path percent-encode set is the query percent-encode set and U+003F (?), U+0060 (`), U+007B ({), and U+007D (}).
The userinfo percent-encode set is the path percent-encode set and U+002F (/), U+003A (:), U+003B (;), U+003D (=), U+0040 (@), U+005B ([) to U+005E (^), inclusive, and U+007C (|).
The component percent-encode set is the userinfo percent-encode set and U+0024 ($) to U+0026 (&), inclusive, U+002B (+), and U+002C (,).
This is used by HTML for registerProtocolHandler(), and could also be used by other standards to percent-encode data that can then be embedded in a URL’s path, query, or fragment; or in an opaque host. Using it with UTF-8 percent-encode gives identical results to JavaScript’s encodeURIComponent() [sic]. [HTML] [ECMA-262]
The application/x-www-form-urlencoded percent-encode set is the component percent-encode set and U+0021 (!), U+0027 (') to U+0029 RIGHT PARENTHESIS, inclusive, and U+007E (~).
The application/x-www-form-urlencoded percent-encode set contains all code points, except the ASCII alphanumeric, U+002A (*), U+002D (-), U+002E (.), and U+005F (_).
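Because each percent-encode set is defined as an extension of a previous one, the definitions can be modelled as chained membership predicates. The sketch below is illustrative only; the function names and the `_extend` helper are hypothetical.

```python
def in_c0_control_set(cp: int) -> bool:
    # C0 controls (U+0000 to U+001F) and everything above U+007E (~).
    return cp <= 0x1F or cp > 0x7E


def _extend(base, extra: str):
    """Build a predicate for `base` plus the code points in `extra`."""
    extras = {ord(ch) for ch in extra}
    return lambda cp: base(cp) or cp in extras


in_fragment_set = _extend(in_c0_control_set, ' "<>`')
in_query_set = _extend(in_c0_control_set, ' "#<>')
in_special_query_set = _extend(in_query_set, "'")
in_path_set = _extend(in_query_set, "?`{}")
in_userinfo_set = _extend(in_path_set, "/:;=@[\\]^|")
in_component_set = _extend(in_userinfo_set, "$%&+,")
in_form_urlencoded_set = _extend(in_component_set, "!'()~")
```

Enumerating the ASCII range against `in_form_urlencoded_set` is a quick way to check the statement that only the ASCII alphanumerics, `*`, `-`, `.`, and `_` are left unencoded.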
To percent-encode after encoding, given an encoding encoding, scalar value string input, a percentEncodeSet, and an optional boolean spaceAsPlus (default false):
1. Let encoder be the result of getting an encoder from encoding.
2. Let inputQueue be input converted to an I/O queue.
3. Let output be the empty string.
4. Let potentialError be 0.
   This needs to be a non-null value to initiate the subsequent while loop.
5. While potentialError is non-null:
   1. Let encodeOutput be an empty I/O queue.
   2. Set potentialError to the result of running encode or fail with inputQueue, encoder, and encodeOutput.
   3. For each byte of encodeOutput converted to a byte sequence:
      1. If spaceAsPlus is true and byte is 0x20 (SP), then append U+002B (+) to output and continue.
      2. Let isomorph be a code point whose value is byte’s value.
      3. Assert: percentEncodeSet includes all non-ASCII code points.
      4. If isomorph is not in percentEncodeSet, then append isomorph to output.
      5. Otherwise, percent-encode byte and append the result to output.
   4. If potentialError is non-null, then append "%26%23", followed by the shortest sequence of ASCII digits representing potentialError in base ten, followed by "%3B", to output.
      This can happen when encoding is not UTF-8.
6. Return output.
Of the possible values for the percentEncodeSet argument only two end up encoding U+0025 (%) and thus give “roundtripable data”: component percent-encode set and application/x-www-form-urlencoded percent-encode set. The other values for the percentEncodeSet argument (which happen to be used by the URL parser) leave U+0025 (%) untouched and as such it needs to be percent-encoded first in order to be properly represented.
To UTF-8 percent-encode a scalar value scalarValue using a percentEncodeSet, return the result of running percent-encode after encoding with UTF-8, scalarValue as a string, and percentEncodeSet.
To UTF-8 percent-encode a scalar value string input using a percentEncodeSet, return the result of running percent-encode after encoding with UTF-8, input, and percentEncodeSet.
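The following sketch approximates percent-encode after encoding and UTF-8 percent-encode. It rests on several assumptions: Python's built-in codecs stand in for the Encoding Standard's encoders (they are close but not identical), an unencodable code point is detected per character via `UnicodeEncodeError`, and `in_userinfo_set` is a hypothetical predicate for the userinfo percent-encode set written out for this example.

```python
import codecs

_USERINFO_EXTRA = {ord(ch) for ch in ' "#<>?`{}/:;=@[\\]^|'}


def in_userinfo_set(cp: int) -> bool:
    # Userinfo percent-encode set, written out as a flat predicate.
    return cp <= 0x1F or cp > 0x7E or cp in _USERINFO_EXTRA


def percent_encode_after_encoding(encoding, text, in_set, space_as_plus=False):
    # Assumption: a Python incremental encoder approximates "encode or fail".
    encoder = codecs.getincrementalencoder(encoding)()
    output = []

    def emit(encoded: bytes) -> None:
        for byte in encoded:
            if space_as_plus and byte == 0x20:
                output.append("+")
            elif not in_set(byte):
                output.append(chr(byte))  # the "isomorph" code point
            else:
                output.append(f"%{byte:02X}")

    for ch in text:
        try:
            emit(encoder.encode(ch))
        except UnicodeEncodeError:
            # Unencodable code point: emit a percent-encoded "&#N;".
            encoder.reset()
            output.append(f"%26%23{ord(ch)}%3B")
    emit(encoder.encode("", final=True))  # flush any encoder state
    return "".join(output)


def utf8_percent_encode(text, in_set):
    return percent_encode_after_encoding("utf-8", text, in_set)
```

For example, `utf8_percent_encode("Say what‽", in_userinfo_set)` yields `"Say%20what%E2%80%BD"`, while a legacy encoding that cannot represent a code point falls back to the numeric reference form.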
Here is a summary, by way of example, of the operations defined above:
| Operation | Input | Output |
|---|---|---|
| Percent-encode input | 0x23 | "%23" |
| | 0x7F | "%7F" |
| Percent-decode input | `%25%s%1G` | `%%s%1G` |
| Percent-decode input | "‽%25%2E" | 0xE2 0x80 0xBD 0x25 0x2E |
| | " " | "%20" |
| | "≡" | "%81%DF" |
| | "‽" | "%26%238253%3B" |
| | "¥" | "%1B(J\%1B(B" |
| | | "1+1+%81%DF+2%20%26%238253%3B" |
| | U+2261 (≡) | "%E2%89%A1" |
| | U+203D (‽) | "%E2%80%BD" |
| | | "Say%20what%E2%80%BD" |
2. Security considerations
The security of a URL is a function of its environment. Care is to be taken when rendering, interpreting, and passing URLs around.
When rendering and allocating new URLs "spoofing" needs to be considered: an attack whereby one host or URL can be confused for another. For instance, consider how 1/l/I, m/rn/rri, 0/O, and а/a can all appear eerily similar. Or worse, consider how U+202A LEFT-TO-RIGHT EMBEDDING and similar code points are invisible. [UTR36]
When passing a URL from party A to B, both need to carefully consider what is happening. A might end up leaking data it does not want to leak. B might receive input it did not expect and take an action that harms the user. In particular, B should never trust A, as at some point URLs from A can come from untrusted sources.
3. Hosts (domains and IP addresses)
At a high level, a host, valid host string, host parser, and host serializer relate as follows:
- The host parser takes an arbitrary scalar value string and returns either failure or a host.
- A host can be seen as the in-memory representation.
- A valid host string defines what input would not trigger a validation error or failure when given to the host parser. I.e., input that would be considered conforming or valid.
- The host serializer takes a host and returns an ASCII string. (If that string is then parsed, the result will equal the host that was serialized.)
A parse-serialize roundtrip gives the following results, depending on the isNotSpecial argument to the host parser:
| Input | Output (isNotSpecial = false) | Output (isNotSpecial = true) |
|---|---|---|
| EXAMPLE.COM | example.com (domain) | EXAMPLE.COM (opaque host) |
| example%2Ecom | example.com (domain) | example%2Ecom (opaque host) |
| faß.example | xn--fa-hia.example (domain) | fa%C3%9F.example (opaque host) |
| 0 | 0.0.0.0 (IPv4) | 0 (opaque host) |
| %30 | 0.0.0.0 (IPv4) | %30 (opaque host) |
| 0x | 0.0.0.0 (IPv4) | 0x (opaque host) |
| 0xffffffff | 255.255.255.255 (IPv4) | 0xffffffff (opaque host) |
| [0:0::1] | [::1] (IPv6) | [::1] (IPv6) |
| [0:0::1%5D | Failure | Failure |
| [0:0::%31] | Failure | Failure |
| 09 | Failure | 09 (opaque host) |
| example.255 | Failure | example.255 (opaque host) |
| example^example | Failure | Failure |
3.1. Host representation
A host is a domain, an IP address, an opaque host, or an empty host. Typically a host serves as a network address, but it is sometimes used as an opaque identifier in URLs where a network address is not necessary.
A typical URL whose host is an opaque host is git://github.com/whatwg/url.git.
The RFCs referenced in the paragraphs below are for informative purposes only. They have no influence on host writing, parsing, and serialization, unless stated otherwise in the sections that follow.
A domain is a non-empty ASCII string that identifies a realm within a network. [RFC1034]
The domain labels of a domain domain are the result of strictly splitting domain on U+002E (.).
The example.com and example.com. domains are not equivalent and typically treated as distinct.
An IP address is an IPv4 address or an IPv6 address.
An IPv4 address is a 32-bit unsigned integer that identifies a network address. [RFC791]
An IPv6 address is a 128-bit unsigned integer that identifies a network address. For the purposes of this standard it is represented as a list of eight 16-bit unsigned integers, also known as IPv6 pieces. [RFC4291]
Support for <zone_id> is intentionally omitted.
An opaque host is a non-empty ASCII string that can be used for further processing.
An empty host is the empty string.
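Two of the representations above lend themselves to a short illustration: domain labels as a strict split on U+002E (.), and an IPv6 address as eight 16-bit pieces. The function names below are hypothetical, and the sketch assumes the 128-bit value is held in a Python integer.

```python
from typing import List


def domain_labels(domain: str) -> List[str]:
    # A strict split keeps empty items, so a trailing dot yields a final "".
    return domain.split(".")


def ipv6_pieces(address: int) -> List[int]:
    # Break a 128-bit integer into eight 16-bit pieces, most significant first.
    return [(address >> (16 * (7 - i))) & 0xFFFF for i in range(8)]
```

The trailing empty label is what distinguishes `example.com.` from `example.com` in this model.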
3.2. Host miscellaneous
A forbidden host code point is U+0000 NULL, U+0009 TAB, U+000A LF, U+000D CR, U+0020 SPACE, U+0023 (#), U+002F (/), U+003A (:), U+003C (<), U+003E (>), U+003F (?), U+0040 (@), U+005B ([), U+005C (\), U+005D (]), U+005E (^), or U+007C (|).
A forbidden domain code point is a forbidden host code point , a C0 control , U+0025 (%), or U+007F DELETE.
To obtain the public suffix of a host host, run these steps. They return null or a domain representing a portion of host that is included on the Public Suffix List. [PSL]
1. If host is not a domain, then return null.
2. Let trailingDot be "." if host ends with "."; otherwise the empty string.
3. Let publicSuffix be the public suffix determined by running the Public Suffix List algorithm with host as domain. [PSL]
4. Assert: publicSuffix is an ASCII string that does not end with ".".
5. Return publicSuffix and trailingDot concatenated.
To obtain the registrable domain of a host host, run these steps. They return null or a domain formed by host’s public suffix and the domain label preceding it, if any.
1. If host’s public suffix is null or host’s public suffix equals host, then return null.
2. Let trailingDot be "." if host ends with "."; otherwise the empty string.
3. Let registrableDomain be the registrable domain determined by running the Public Suffix List algorithm with host as domain. [PSL]
4. Assert: registrableDomain is an ASCII string that does not end with ".".
5. Return registrableDomain and trailingDot concatenated.
| Host input | Public suffix | Registrable domain |
|---|---|---|
| com | com | null |
| example.com | com | example.com |
| www.example.com | com | example.com |
| sub.www.example.com | com | example.com |
| EXAMPLE.COM | com | example.com |
| example.com. | com. | example.com. |
| github.io | github.io | null |
| whatwg.github.io | github.io | whatwg.github.io |
| إختبار | xn--kgbechtv | null |
| example.إختبار | xn--kgbechtv | example.xn--kgbechtv |
| sub.example.إختبار | xn--kgbechtv | example.xn--kgbechtv |
| [2001:0db8:85a3:0000:0000:8a2e:0370:7334] | null | null |
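The trailing-dot handling in the two algorithms can be sketched with a toy suffix list. Everything here is an assumption for illustration: `TOY_SUFFIXES` is a hypothetical three-entry stand-in for the real Public Suffix List, wildcard and exception rules are not modelled, and the input is assumed to already be a lowercase ASCII domain (so non-domain hosts and host parsing are out of scope).

```python
TOY_SUFFIXES = {"com", "github.io", "xn--kgbechtv"}  # hypothetical, not the real list


def _split_trailing_dot(host):
    if host.endswith("."):
        return host[:-1], "."
    return host, ""


def public_suffix(host):
    name, trailing_dot = _split_trailing_dot(host)
    labels = name.split(".")
    # Pick the longest listed suffix; fall back to the last label.
    for start in range(len(labels)):
        candidate = ".".join(labels[start:])
        if candidate in TOY_SUFFIXES:
            return candidate + trailing_dot
    return labels[-1] + trailing_dot


def registrable_domain(host):
    suffix = public_suffix(host)
    if suffix == host:
        return None  # the host is itself a public suffix
    name, trailing_dot = _split_trailing_dot(host)
    suffix_label_count = suffix.rstrip(".").count(".") + 1
    labels = name.split(".")
    # The public suffix plus the one label preceding it.
    return ".".join(labels[-(suffix_label_count + 1):]) + trailing_dot
```

With this toy list, `registrable_domain("example.com.")` keeps its trailing dot, matching the `example.com.` row of the table.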
Specifications should prefer the origin concept for security decisions. The notion of "public suffix" and "registrable domain" cannot be relied upon to provide a hard security boundary, as the public suffix list will diverge from client to client. Specifications which ignore this advice are encouraged to carefully consider whether URLs' schemes ought to be incorporated into any decisions made, i.e. whether to use the same site or schemelessly same site concepts.
3.3. IDNA
The domain to ASCII algorithm, given a string domain and a boolean beStrict, runs these steps:
1. Let result be the result of running Unicode ToASCII with domain_name set to domain, UseSTD3ASCIIRules set to beStrict, CheckHyphens set to false, CheckBidi set to true, CheckJoiners set to true, Transitional_Processing set to false, and VerifyDnsLength set to beStrict. [UTS46]
   If beStrict is false, domain is an ASCII string, and strictly splitting domain on U+002E (.) does not produce any item that starts with an ASCII case-insensitive match for "xn--", this step is equivalent to ASCII lowercasing domain.
2. If result is a failure value, domain-to-ASCII validation error, return failure.
3. If result is the empty string, domain-to-ASCII validation error, return failure.
4. Return result.
This document and the web platform at large use Unicode IDNA Compatibility Processing and not IDNA2008. For instance, ☕.example becomes xn--53h.example and not failure. [UTS46] [RFC5890]
The domain to Unicode algorithm, given a domain domain and a boolean beStrict, runs these steps:
1. Let result be the result of running Unicode ToUnicode with domain_name set to domain, CheckHyphens set to false, CheckBidi set to true, CheckJoiners set to true, UseSTD3ASCIIRules set to beStrict, and Transitional_Processing set to false. [UTS46]
2. Signify domain-to-Unicode validation errors for any returned errors, and then, return result.
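The shortcut noted in the first step of domain to ASCII (pure-ASCII input without "xn--" labels, non-strict mode) can be sketched as below. This covers only that shortcut and is not an implementation of Unicode ToASCII; the function name is hypothetical, and a return value of `None` here means "shortcut not applicable", not failure.

```python
from typing import Optional


def domain_to_ascii_fast_path(domain: str, be_strict: bool) -> Optional[str]:
    # The shortcut only applies when beStrict is false and the input is ASCII.
    if be_strict or not domain.isascii():
        return None
    # Any label starting with "xn--" (in any case) needs full processing.
    if any(label.lower().startswith("xn--") for label in domain.split(".")):
        return None
    # Equivalent to ASCII lowercasing the domain.
    return domain.lower()
```

The remaining steps of the algorithm still apply to the result, so an empty string produced this way would still be reported as failure.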
3.4. Host writing
A valid host string must be a valid domain string, a valid IPv4-address string, or: U+005B ([), followed by a valid IPv6-address string, followed by U+005D (]).
A domain is a valid domain if these steps return success:
1. Let result be the result of running domain to ASCII with domain and true.
2. If result is failure, then return failure.
3. Set result to the result of running domain to Unicode with result and true.
4. If result contains any errors, return failure.
5. Return success.
Ideally we define this in terms of a sequence of code points that make up a valid domain rather than through a whack-a-mole: issue 245.
A valid domain string must be a string that is a valid domain.
A valid IPv4-address string must be four shortest possible strings of ASCII digits, representing a decimal number in the range 0 to 255, inclusive, separated from each other by U+002E (.).
A valid IPv6-address string is defined in the "Text Representation of Addresses" chapter of IP Version 6 Addressing Architecture. [RFC4291]
A valid opaque-host string must be one of the following:
- one or more URL units excluding forbidden host code points
- U+005B ([), followed by a valid IPv6-address string, followed by U+005D (]).
This is not part of the definition of valid host string as it requires context to be distinguished.
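The definition of a valid IPv4-address string is strict enough to be expressed as a regular expression. The sketch below is one possible encoding of it; the names are hypothetical. "Shortest possible" is read as "no leading zeros", so `0` is accepted but `00` and `01` are not.

```python
import re

# One decimal part in the range 0 to 255 without leading zeros.
_PART = r"(?:0|[1-9][0-9]?|1[0-9][0-9]|2[0-4][0-9]|25[0-5])"
_VALID_IPV4 = re.compile(rf"{_PART}(?:\.{_PART}){{3}}")


def is_valid_ipv4_address_string(text: str) -> bool:
    # fullmatch ensures nothing precedes or follows the four parts.
    return _VALID_IPV4.fullmatch(text) is not None
```

This checks validity only. Inputs such as `0x7f.0.0.1` are not valid strings under this definition, even though the IPv4 parser accepts a wider range of inputs while signalling validation errors.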
3.5. Host parsing
The host parser takes a scalar value string input with an optional boolean isNotSpecial (default false), and then runs these steps. They return failure or a host.
1. If input starts with U+005B ([), then:
   1. If input does not end with U+005D (]), IPv6-unclosed validation error, return failure.
   2. Return the result of IPv6 parsing input with its leading U+005B ([) and trailing U+005D (]) removed.
2. If isNotSpecial is true, then return the result of opaque-host parsing input.
3. Assert: input is not the empty string.
4. Let domain be the result of running UTF-8 decode without BOM on the percent-decoding of input.
   Alternatively UTF-8 decode without BOM or fail can be used, coupled with an early return for failure, as domain to ASCII fails on U+FFFD (�).
5. Let asciiDomain be the result of running domain to ASCII with domain and false.
6. If asciiDomain is failure, then return failure.
7. If asciiDomain contains a forbidden domain code point, domain-invalid-code-point validation error, return failure.
8. If asciiDomain ends in a number, then return the result of IPv4 parsing asciiDomain.
9. Return asciiDomain.
The ends in a number checker takes an ASCII string input and then runs these steps. They return a boolean.
1. Let parts be the result of strictly splitting input on U+002E (.).
2. If the last item in parts is the empty string, then:
3. Let last be the last item in parts.
4. If last is non-empty and contains only ASCII digits, then return true.
   The erroneous input "09" will be caught by the IPv4 parser at a later stage.
5. If parsing last as an IPv4 number does not return failure, then return true.
   This is equivalent to checking that last is "0X" or "0x", followed by zero or more ASCII hex digits.
6. Return false.
The IPv4 parser takes an ASCII string input and then runs these steps. They return failure or an IPv4 address.
The IPv4 parser is not to be invoked directly. Instead check that the return value of the host parser is an IPv4 address.
1. Let parts be the result of strictly splitting input on U+002E (.).
2. If the last item in parts is the empty string, then:
3. If parts’s size is greater than 4, IPv4-too-many-parts validation error, return failure.
4. Let numbers be an empty list.
5. For each part of parts:
   1. Let result be the result of parsing part.
   2. If result is failure, IPv4-non-numeric-part validation error, return failure.
   3. If result[1] is true, IPv4-non-decimal-part validation error.
   4. Append result[0] to numbers.
6. If any item in numbers is greater than 255, IPv4-out-of-range-part validation error.
7. If any but the last item in numbers is greater than 255, then return failure.
8. If the last item in numbers is greater than or equal to 256^(5 − numbers’s size), then return failure.
9. Let ipv4 be the last item in numbers.
10. Remove the last item from numbers.
11. Let counter be 0.
12. For each n of numbers:
    1. Increment ipv4 by n × 256^(3 − counter).
    2. Increment counter by 1.
13. Return ipv4.
The IPv4 number parser takes an ASCII string input and then runs these steps. They return failure or a tuple of a number and a boolean.
1. If input is the empty string, then return failure.
2. Let validationError be false.
3. Let R be 10.
4. If input contains at least two code points and the first two code points are either "0X" or "0x", then:
   1. Set validationError to true.
   2. Remove the first two code points from input.
   3. Set R to 16.
5. Otherwise, if input contains at least two code points and the first code point is U+0030 (0), then:
   1. Set validationError to true.
   2. Remove the first code point from input.
   3. Set R to 8.
6. If input is the empty string, then return (0, true).
7. If input contains a code point that is not a radix-R digit, then return failure.
8. Let output be the mathematical integer value that is represented by input in radix-R notation, using ASCII hex digits for digits with values 0 through 15.
9. Return (output, validationError).
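The ends in a number checker, IPv4 parser, and IPv4 number parser can be sketched together in Python. This is an illustration under stated assumptions rather than a normative implementation: failure is modelled as `None`, validation errors are tracked only as the boolean the number parser returns, and the handling of a trailing empty part (dropping it, or returning false when it is the only part) is an assumption, since those sub-steps are not spelled out in the text above. All function names are hypothetical.

```python
from typing import Optional, Tuple


def parse_ipv4_number(text: str) -> Optional[Tuple[int, bool]]:
    if text == "":
        return None
    validation_error = False
    radix = 10
    if len(text) >= 2 and text[:2] in ("0x", "0X"):
        validation_error, text, radix = True, text[2:], 16
    elif len(text) >= 2 and text[0] == "0":
        validation_error, text, radix = True, text[1:], 8
    if text == "":
        return (0, True)
    # Check digits by hand: int() alone would accept signs and underscores.
    digits = "0123456789abcdef"[:radix]
    if any(ch not in digits for ch in text.lower()):
        return None
    return (int(text, radix), validation_error)


def ends_in_a_number(text: str) -> bool:
    parts = text.split(".")
    if parts[-1] == "":
        if len(parts) == 1:  # assumption: nothing but an empty part
            return False
        parts.pop()  # assumption: ignore one trailing empty part
    last = parts[-1]
    if last != "" and all(ch in "0123456789" for ch in last):
        return True
    return parse_ipv4_number(last) is not None


def parse_ipv4(text: str) -> Optional[int]:
    parts = text.split(".")
    if parts[-1] == "" and len(parts) > 1:
        parts.pop()  # assumption: ignore one trailing empty part
    if len(parts) > 4:
        return None
    numbers = []
    for part in parts:
        result = parse_ipv4_number(part)
        if result is None:
            return None
        numbers.append(result[0])
    if any(n > 255 for n in numbers[:-1]):
        return None
    if numbers[-1] >= 256 ** (5 - len(numbers)):
        return None
    ipv4 = numbers.pop()  # the last number fills all remaining low-order bytes
    for counter, n in enumerate(numbers):
        ipv4 += n * 256 ** (3 - counter)
    return ipv4
```

The last part is allowed to exceed 255 because it absorbs every byte not claimed by earlier parts, which is why `0xffffffff` parses to the same address as `255.255.255.255`.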
The
IPv6
parser
takes
a
scalar
value
string
input
IPv6 address
The
IPv6 parser
Let
and
then
runs
these
steps.
They
return
failure
or
an
IPv6
address
IPv6 address
IPv6 pieces
Let
.
The IPv6 parser could in theory be invoked directly, but please discuss actually doing that with the editors of this document first.
Let address be a new IPv6 address whose IPv6 pieces are all 0.
Let pieceIndex be 0.
Let compress be null.
Let pointer be a pointer for input.
If c is U+003A (:), then:
- If remaining does not start with U+003A (:), IPv6-invalid-compression validation error, return failure.
- Increase pointer by 2.
- Increase pieceIndex by 1 and then set compress to pieceIndex.
While c is not the EOF code point:
- If pieceIndex is 8, IPv6-too-many-pieces validation error, return failure.
- If c is U+003A (:), then:
  - If compress is non-null, IPv6-multiple-compression validation error, return failure.
  - Increase pointer and pieceIndex by 1, set compress to pieceIndex, and then continue.
- Let value and length be 0.
- While length is less than 4 and c is an ASCII hex digit, set value to value × 0x10 + c interpreted as hexadecimal number, and increase pointer and length by 1.
- If c is U+002E (.), then:
  - If length is 0, IPv4-in-IPv6-invalid-code-point validation error, return failure.
  - Decrease pointer by length.
  - If pieceIndex is greater than 6, IPv4-in-IPv6-too-many-pieces validation error, return failure.
  - Let numbersSeen be 0.
  - While c is not the EOF code point:
    - Let ipv4Piece be null.
    - If numbersSeen is greater than 0, then:
      - If c is a U+002E (.) and numbersSeen is less than 4, then increase pointer by 1.
      - Otherwise, IPv4-in-IPv6-invalid-code-point validation error, return failure.
    - If c is not an ASCII digit, IPv4-in-IPv6-invalid-code-point validation error, return failure.
    - While c is an ASCII digit:
      - Let number be c interpreted as decimal number.
      - If ipv4Piece is null, then set ipv4Piece to number.
        Otherwise, if ipv4Piece is 0, IPv4-in-IPv6-invalid-code-point validation error, return failure.
        Otherwise, set ipv4Piece to ipv4Piece × 10 + number.
      - If ipv4Piece is greater than 255, IPv4-in-IPv6-out-of-range-part validation error, return failure.
      - Increase pointer by 1.
    - Set address[pieceIndex] to address[pieceIndex] × 0x100 + ipv4Piece.
    - Increase numbersSeen by 1.
    - If numbersSeen is 2 or 4, then increase pieceIndex by 1.
  - If numbersSeen is not 4, IPv4-in-IPv6-too-few-parts validation error, return failure.
  - Break.
- Otherwise, if c is U+003A (:):
  - Increase pointer by 1.
  - If c is the EOF code point, IPv6-invalid-code-point validation error, return failure.
- Otherwise, if c is not the EOF code point, IPv6-invalid-code-point validation error, return failure.
- Set address[pieceIndex] to value.
- Increase pieceIndex by 1.
If compress is non-null, then:
- Let swaps be pieceIndex − compress.
- Set pieceIndex to 7.
- While pieceIndex is not 0 and swaps is greater than 0, swap address[pieceIndex] with address[compress + swaps − 1], and then decrease both pieceIndex and swaps by 1.
Otherwise, if compress is null and pieceIndex is not 8, IPv6-too-few-pieces validation error, return failure.
Return address.
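The IPv6 parser steps above can be followed more easily in code. The following is a minimal Python sketch, not a conforming implementation: the function name `parse_ipv6`, the use of `None` for both failure and the EOF code point, and the omission of validation error reporting are all assumptions made for illustration.

```python
from typing import List, Optional

HEX_DIGITS = "0123456789abcdefABCDEF"
DIGITS = "0123456789"


def parse_ipv6(input: str) -> Optional[List[int]]:
    """Return the eight 16-bit pieces, or None where the steps return failure."""
    address = [0] * 8
    piece_index = 0
    compress = None
    pointer = 0

    def c() -> Optional[str]:
        # None stands in for the EOF code point (an assumption of this sketch).
        return input[pointer] if pointer < len(input) else None

    if c() == ":":
        # "remaining" is everything after the code point at pointer.
        if not input.startswith(":", pointer + 1):
            return None
        pointer += 2
        piece_index += 1
        compress = piece_index

    while c() is not None:
        if piece_index == 8:
            return None
        if c() == ":":
            if compress is not None:
                return None
            pointer += 1
            piece_index += 1
            compress = piece_index
            continue
        value = length = 0
        while length < 4 and c() is not None and c() in HEX_DIGITS:
            value = value * 0x10 + int(c(), 16)
            pointer += 1
            length += 1
        if c() == ".":
            # Embedded IPv4 address: rewind and read four decimal parts.
            if length == 0:
                return None
            pointer -= length
            if piece_index > 6:
                return None
            numbers_seen = 0
            while c() is not None:
                ipv4_piece = None
                if numbers_seen > 0:
                    if c() == "." and numbers_seen < 4:
                        pointer += 1
                    else:
                        return None
                if c() is None or c() not in DIGITS:
                    return None
                while c() is not None and c() in DIGITS:
                    number = int(c())
                    if ipv4_piece is None:
                        ipv4_piece = number
                    elif ipv4_piece == 0:
                        return None  # leading zero in an IPv4 part
                    else:
                        ipv4_piece = ipv4_piece * 10 + number
                    if ipv4_piece > 255:
                        return None
                    pointer += 1
                address[piece_index] = address[piece_index] * 0x100 + ipv4_piece
                numbers_seen += 1
                if numbers_seen in (2, 4):
                    piece_index += 1
            if numbers_seen != 4:
                return None
            break
        elif c() == ":":
            pointer += 1
            if c() is None:
                return None
        elif c() is not None:
            return None
        address[piece_index] = value
        piece_index += 1

    if compress is not None:
        # Move the pieces parsed after "::" to the end of the address.
        swaps = piece_index - compress
        piece_index = 7
        while piece_index != 0 and swaps > 0:
            j = compress + swaps - 1
            address[piece_index], address[j] = address[j], address[piece_index]
            piece_index -= 1
            swaps -= 1
    elif piece_index != 8:
        return None
    return address
```

For instance, `parse_ipv6("::1")` yields seven zero pieces followed by 1, while `parse_ipv6("1::2::3")` yields `None` because of the second compression.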
The opaque-host parser takes a scalar value string input, and then runs these steps. They return failure or an opaque host.
If input contains a forbidden host code point, host-invalid-code-point validation error, return failure.
If input contains a code point that is not a URL code point and not U+0025 (%), invalid-URL-unit validation error.
If input contains a U+0025 (%) and the two code points following it are not ASCII hex digits, invalid-URL-unit validation error.
Return the result of running UTF-8 percent-encode on input using the C0 control percent-encode set.
3.6. Host serializing
The host serializer takes a host host and then runs these steps. They return an ASCII string.
If host is an IPv4 address, return the result of running the IPv4 serializer on host.
Otherwise, if host is an IPv6 address, return U+005B ([), followed by the result of running the IPv6 serializer on host, followed by U+005D (]).
Otherwise, host is a domain, opaque host, or empty host, return host.
The IPv4 serializer takes an IPv4 address address and then runs these steps. They return an ASCII string.
Let output be the empty string.
Let n be the value of address.
For each i in the range 1 to 4, inclusive:
- Prepend n % 256, serialized, to output.
- If i is not 4, then prepend U+002E (.) to output.
- Set n to floor(n / 256).
Return output.
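As a rough illustration of the IPv4 serializer steps, here is a minimal Python sketch. It assumes the IPv4 address is held as a plain non-negative integer; the function name `serialize_ipv4` is hypothetical.

```python
def serialize_ipv4(address: int) -> str:
    """Serialize a 32-bit IPv4 address, least significant byte first."""
    output = ""
    n = address
    for i in range(1, 5):
        # Prepend the lowest remaining byte as a decimal number.
        output = str(n % 256) + output
        if i != 4:
            output = "." + output
        n = n // 256
    return output
```

Because each byte is prepended, the most significant byte ends up first in the result, so `serialize_ipv4(0xC0A80001)` gives `"192.168.0.1"`.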
The IPv6 serializer takes an IPv6 address address and then runs these steps. They return an ASCII string.
Let output be the empty string.
Let compress be an index to the first IPv6 piece in the first longest sequence of address’s IPv6 pieces that are 0.
In 0:f:0:0:f:f:0:0 it would point to the second 0.
If there is no sequence of address’s IPv6 pieces that are 0 that is longer than 1, then set compress to null.
Let ignore0 be false.
For each pieceIndex in the range 0 to 7, inclusive:
- If ignore0 is true and address[pieceIndex] is 0, then continue.
- Otherwise, if ignore0 is true, set ignore0 to false.
- If compress is pieceIndex, then:
  - Let separator be "::" if pieceIndex is 0, and U+003A (:) otherwise.
  - Append separator to output.
  - Set ignore0 to true and continue.
- Append address[pieceIndex], represented as the shortest possible lowercase hexadecimal number, to output.
- If pieceIndex is not 7, then append U+003A (:) to output.
Return output.
This algorithm requires the recommendation from A Recommendation for IPv6 Address Text Representation. [RFC5952]
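The IPv6 serializer can be sketched in Python as follows. This is an illustration under stated assumptions: the address is a list of eight integers, and the helper `find_compress` (a hypothetical name) implements the "first longest sequence of zero pieces longer than 1" rule.

```python
from typing import List, Optional


def find_compress(address: List[int]) -> Optional[int]:
    """Index of the first piece of the first longest run of zeros (length > 1)."""
    best_start, best_len = None, 1  # only runs longer than 1 qualify
    run_start, run_len = None, 0
    for i, piece in enumerate(address):
        if piece == 0:
            if run_len == 0:
                run_start = i
            run_len += 1
            # Strictly greater keeps the first of several equally long runs.
            if run_len > best_len:
                best_start, best_len = run_start, run_len
        else:
            run_len = 0
    return best_start


def serialize_ipv6(address: List[int]) -> str:
    output = ""
    compress = find_compress(address)
    ignore0 = False
    for piece_index in range(8):
        if ignore0 and address[piece_index] == 0:
            continue
        elif ignore0:
            ignore0 = False
        if compress == piece_index:
            output += "::" if piece_index == 0 else ":"
            ignore0 = True
            continue
        # Shortest possible lowercase hexadecimal representation.
        output += format(address[piece_index], "x")
        if piece_index != 7:
            output += ":"
    return output
```

With the address from the example above, `serialize_ipv6([0, 0xf, 0, 0, 0xf, 0xf, 0, 0])` compresses the run starting at the second 0 and leaves the trailing run uncompressed.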
3.7. Host equivalence
To determine whether a host A equals host B, return true if A is B, and false otherwise.
Certificate comparison requires a host equivalence check that ignores the trailing dot of a domain (if any). However, those hosts also have various other facets enforced, such as DNS length, that are not enforced here, as URLs do not enforce them. If anyone has a good suggestion for how to bring these two closer together, or what a good unified model would be, please file an issue.
4. URLs
At a high level, a URL, valid URL string, URL parser, and URL serializer relate as follows:
- The URL parser takes an arbitrary scalar value string and returns either failure or a URL. It might also record zero or more validation errors.
- A URL can be seen as the in-memory representation.
- A valid URL string defines what input would not trigger a validation error or failure when given to the URL parser. I.e., input that would be considered conforming or valid.
- The URL serializer takes a URL and returns an ASCII string. (If that string is then parsed, the result will equal the URL that was serialized.) The output of the URL serializer is not always a valid URL string.
| Input | Base | Valid | Output |
|---|---|---|---|
| https:example.org | | ❌ | https://example.org/ |
| https://////example.com/// | | ❌ | https://example.com/// |
| https://example.com/././foo | | ✅ | https://example.com/foo |
| hello:world | https://example.com/ | ✅ | hello:world |
| https:example.org | https://example.com/ | ❌ | https://example.com/example.org |
| \example\..\demo/.\ | https://example.com/ | ❌ | https://example.com/demo/ |
| example | https://example.com/demo | ✅ | https://example.com/example |
| file:///C\|/demo | | ❌ | file:///C:/demo |
| .. | file:///C:/demo | ✅ | file:///C:/ |
| file://loc%61lhost/ | | ✅ | file:/// |
| https://user:password@example.org/ | | ❌ | https://user:password@example.org/ |
| | | ❌ | https://example.org/foo%20bar |
| https://EXAMPLE.com/../x | | ✅ | https://example.com/x |
| | | ❌ | Failure |
| example | | ❌, due to lack of base | Failure |
| https://example.com:demo | | ❌ | Failure |
| http://[www.example.com]/ | | ❌ | Failure |
| https://example.org// | | ✅ | https://example.org// |
| https://example.com/[]?[]#[] | | ❌ | https://example.com/[]?[]#[] |
| https://example/%?%#% | | ❌ | https://example/%?%#% |
| https://example/%25?%25#%25 | | ✅ | https://example/%25?%25#%25 |
The base and output URL are represented in serialized form for brevity.
4.1. URL representation
A URL is a struct that represents a universal identifier. To disambiguate from a valid URL string it can also be referred to as a URL record.
A URL’s scheme is an ASCII string that identifies the type of URL and can be used to dispatch a URL for further processing after parsing. It is initially the empty string.
A URL’s username is an ASCII string identifying a username. It is initially the empty string.
A URL’s password is an ASCII string identifying a password. It is initially the empty string.
A URL’s host is null or a host. It is initially null.
The following table lists allowed URL’s scheme / host combinations.
| scheme / host | domain | IPv4 address | IPv6 address | opaque host | empty host | null |
|---|---|---|---|---|---|---|
| Special schemes excluding "file" | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ |
| "file" | ✅ | ✅ | ✅ | ❌ | ✅ | ❌ |
| Others | ❌ | ❌ | ✅ | ✅ | ✅ | ✅ |
A URL’s port is either null or a 16-bit unsigned integer that identifies a networking port. It is initially null.
A URL’s path is either a URL path segment or a list of zero or more URL path segments, usually identifying a location. It is initially « ».
A special URL’s path is always a list, i.e., it is never opaque.
A URL’s query is either null or an ASCII string. It is initially null.
A URL’s fragment is either null or an ASCII string that can be used for further processing on the resource the URL’s other components identify. It is initially null.
A URL also has an associated blob URL entry that is either null or a blob URL entry. It is initially null.
This is used to support caching the object a "blob" URL refers to as well as its origin. It is important that these are cached as the URL might be removed from the blob URL store between parsing and fetching, while fetching will still need to succeed.
The following table lists how valid URL strings, when parsed, map to a URL’s components. Username, password, and blob URL entry are omitted; in the examples below they are the empty string, the empty string, and null, respectively.
| Input | Scheme | Host | Port | Path | Query | Fragment |
|---|---|---|---|---|---|---|
| https://example.com/ | "https" | "example.com" | null | « the empty string » | null | null |
| https://localhost:8000/search?q=text#hello | "https" | "localhost" | 8000 | « "search" » | "q=text" | "hello" |
| urn:isbn:9780307476463 | "urn" | null | null | "isbn:9780307476463" | null | null |
| | "file" | null | null | « "ada", "Analytical%20Engine", "README.md" » | null | null |
A URL path segment is an ASCII string. It commonly refers to a directory or a file, but has no predefined meaning.
A single-dot URL path segment is a URL path segment that is "." or an ASCII case-insensitive match for "%2e".
A double-dot URL path segment is a URL path segment that is ".." or an ASCII case-insensitive match for ".%2e", "%2e.", or "%2e%2e".
4.2. URL miscellaneous
A special scheme is an ASCII string that is listed in the first column of the following table. The default port for a special scheme is listed in the second column on the same row. The default port for any other ASCII string is null.
| Special scheme | Default port |
|---|---|
| "ftp" | 21 |
| "file" | null |
| "http" | 80 |
| "https" | 443 |
| "ws" | 80 |
| "wss" | 443 |
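The special scheme table lends itself to a simple lookup. The sketch below is illustrative only; the names `SPECIAL_SCHEMES`, `is_special_scheme`, and `default_port` are hypothetical, and null is modeled as Python's `None`.

```python
from typing import Optional

# Special schemes mapped to their default ports (None models null).
SPECIAL_SCHEMES = {
    "ftp": 21,
    "file": None,
    "http": 80,
    "https": 443,
    "ws": 80,
    "wss": 443,
}


def is_special_scheme(scheme: str) -> bool:
    return scheme in SPECIAL_SCHEMES


def default_port(scheme: str) -> Optional[int]:
    # Any ASCII string that is not a special scheme has a null default port.
    return SPECIAL_SCHEMES.get(scheme)
```

Note that "file" is special even though its default port is null, so membership in the table and a non-null default port have to be checked separately.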
A URL is special if its scheme is a special scheme. A URL is not special if its scheme is not a special scheme.
A URL includes credentials if its username or password is not the empty string.
A URL has an opaque path if its path is a URL path segment.
A URL cannot have a username/password/port if its host is null or the empty string, or its scheme is "file".
A URL can be designated as base URL.
A base URL is useful for the URL parser when the input might be a relative-URL string.
A Windows drive letter is two code points, of which the first is an ASCII alpha and the second is either U+003A (:) or U+007C (|).
A normalized Windows drive letter is a Windows drive letter of which the second code point is U+003A (:).
As per the URL writing section, only a normalized Windows drive letter is conforming.
A string starts with a Windows drive letter if all of the following are true:
- its length is greater than or equal to 2
- its first two code points are a Windows drive letter
- its length is 2 or its third code point is U+002F (/), U+005C (\), U+003F (?), or U+0023 (#).
| String | Starts with a Windows drive letter |
|---|---|
| "c:" | ✅ |
| "c:/" | ✅ |
| "c:a" | ❌ |
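The three Windows drive letter definitions translate fairly directly into predicates. This is a minimal Python sketch with hypothetical function names, reproducing the rows of the table above.

```python
import string

ASCII_ALPHA = set(string.ascii_letters)


def is_windows_drive_letter(s: str) -> bool:
    # Exactly two code points: an ASCII alpha, then ":" or "|".
    return len(s) == 2 and s[0] in ASCII_ALPHA and s[1] in ":|"


def is_normalized_windows_drive_letter(s: str) -> bool:
    return is_windows_drive_letter(s) and s[1] == ":"


def starts_with_windows_drive_letter(s: str) -> bool:
    return (
        len(s) >= 2
        and is_windows_drive_letter(s[:2])
        and (len(s) == 2 or s[2] in "/\\?#")
    )
```

The third condition is what rejects "c:a": a drive letter only counts when it ends the string or is followed by a path, query, or fragment delimiter.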
To shorten a url’s path:
Assert: url does not have an opaque path.
Let path be url’s path.
If url’s scheme is "file", path’s size is 1, and path[0] is a normalized Windows drive letter, then return.
Remove path’s last item, if any.
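A small Python sketch of the shorten operation follows. It assumes a non-opaque path modeled as a mutable list of strings and passes the scheme separately instead of a full URL record; `shorten_path` and its helper are hypothetical names.

```python
import string
from typing import List


def is_normalized_windows_drive_letter(s: str) -> bool:
    return len(s) == 2 and s[0] in string.ascii_letters and s[1] == ":"


def shorten_path(scheme: str, path: List[str]) -> None:
    """Remove the last path segment in place, keeping a lone file drive letter."""
    if (
        scheme == "file"
        and len(path) == 1
        and is_normalized_windows_drive_letter(path[0])
    ):
        return
    if path:
        path.pop()
```

The drive-letter guard is why ".." resolved against file:///C:/demo yields file:///C:/ and can never climb above the drive.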
4.3. URL writing
A valid URL string must be either a relative-URL-with-fragment string or an absolute-URL-with-fragment string.
An absolute-URL-with-fragment string must be an absolute-URL string, optionally followed by U+0023 (#) and a URL-fragment string.
An absolute-URL string must be one of the following:
- a URL-scheme string that is an ASCII case-insensitive match for a special scheme and not an ASCII case-insensitive match for "file", followed by U+003A (:) and a scheme-relative-special-URL string
- a URL-scheme string that is not an ASCII case-insensitive match for a special scheme, followed by U+003A (:) and a relative-URL string
- a URL-scheme string that is an ASCII case-insensitive match for "file", followed by U+003A (:) and a scheme-relative-file-URL string
any optionally followed by U+003F (?) and a URL-query string.
A URL-scheme string must be one ASCII alpha, followed by zero or more of ASCII alphanumeric, U+002B (+), U+002D (-), and U+002E (.).
Schemes should be registered in the IANA URI [sic] Schemes registry. [IANA-URI-SCHEMES] [RFC7595]
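The URL-scheme string grammar above is small enough to express as a regular expression. The following Python sketch is one possible rendering; the names `URL_SCHEME` and `is_url_scheme_string` are hypothetical.

```python
import re

# One ASCII alpha, then zero or more of ASCII alphanumeric, "+", "-", ".".
URL_SCHEME = re.compile(r"[A-Za-z][A-Za-z0-9+\-.]*")


def is_url_scheme_string(s: str) -> bool:
    return URL_SCHEME.fullmatch(s) is not None
```

Using `fullmatch` matters here: a plain `match` would accept any string that merely begins with a valid scheme.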
A relative-URL-with-fragment string must be a relative-URL string, optionally followed by U+0023 (#) and a URL-fragment string.
A relative-URL string must be one of the following, switching on base URL’s scheme:
- A special scheme that is not "file"
  - a scheme-relative-special-URL string
  - a path-absolute-URL string
  - a path-relative-scheme-less-URL string
- "file"
  - a scheme-relative-file-URL string
  - a path-absolute-URL string if base URL’s host is an empty host
  - a path-absolute-non-Windows-file-URL string if base URL’s host is not an empty host
  - a path-relative-scheme-less-URL string
- Otherwise
  - a scheme-relative-URL string
  - a path-absolute-URL string
  - a path-relative-scheme-less-URL string
any optionally followed by U+003F (?) and a URL-query string.
A non-null base URL is necessary when parsing a relative-URL string.
A scheme-relative-special-URL string must be "//", followed by a valid host string, optionally followed by U+003A (:) and a URL-port string, optionally followed by a path-absolute-URL string.
A URL-port string must be one of the following:
- the empty string
- one or more ASCII digits representing a decimal number no greater than 2^16 − 1.
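A check for the URL-port string grammar might look like the Python sketch below. The function name `is_url_port_string` is hypothetical, and accepting leading zeros is this sketch's reading of "one or more ASCII digits representing a decimal number".

```python
def is_url_port_string(s: str) -> bool:
    if s == "":
        return True
    # Only ASCII digits are allowed; str.isdigit() would also accept
    # non-ASCII digits, so membership is tested explicitly.
    if not all(ch in "0123456789" for ch in s):
        return False
    # Leading zeros do not change the value; stripping them first keeps
    # the integer conversion small even for very long inputs.
    digits = s.lstrip("0")
    return len(digits) <= 5 and int(digits or "0") <= 2**16 - 1
```

The upper bound matches the definition of a URL’s port as a 16-bit unsigned integer.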
A scheme-relative-URL string must be "//", followed by an opaque-host-and-port string, optionally followed by a path-absolute-URL string.
An opaque-host-and-port string must be either the empty string or: a valid opaque-host string, optionally followed by U+003A (:) and a URL-port string.
A scheme-relative-file-URL string must be "//", followed by one of the following:
- a valid host string, optionally followed by a path-absolute-non-Windows-file-URL string
- a path-absolute-URL string.
A path-absolute-URL string must be U+002F (/) followed by a path-relative-URL string.
A path-absolute-non-Windows-file-URL string must be a path-absolute-URL string that does not start with: U+002F (/), followed by a Windows drive letter, followed by U+002F (/).
A path-relative-URL string must be zero or more URL-path-segment strings, separated from each other by U+002F (/), and not start with U+002F (/).
A path-relative-scheme-less-URL string must be a path-relative-URL string that does not start with: a URL-scheme string, followed by U+003A (:).
A URL-path-segment string must be one of the following:
- zero or more URL units excluding U+002F (/) and U+003F (?), that together are not a single-dot URL path segment or a double-dot URL path segment
- a single-dot URL path segment
- a double-dot URL path segment.
A URL-query string must be zero or more URL units.
A URL-fragment string must be zero or more URL units.
The URL code points are ASCII alphanumeric, U+0021 (!), U+0024 ($), U+0026 (&), U+0027 ('), U+0028 LEFT PARENTHESIS, U+0029 RIGHT PARENTHESIS, U+002A (*), U+002B (+), U+002C (,), U+002D (-), U+002E (.), U+002F (/), U+003A (:), U+003B (;), U+003D (=), U+003F (?), U+0040 (@), U+005F (_), U+007E (~), and code points in the range U+00A0 to U+10FFFD, inclusive, excluding surrogates and noncharacters.
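The URL code point set can be tested with a short predicate. The Python sketch below is illustrative; `is_url_code_point` and `is_noncharacter` are hypothetical names, and the noncharacter test follows the usual Unicode definition (U+FDD0 to U+FDEF, plus the last two code points of every plane), which is not spelled out in this section.

```python
def is_noncharacter(cp: int) -> bool:
    # U+FDD0..U+FDEF, and U+xxFFFE / U+xxFFFF in every plane.
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def is_url_code_point(ch: str) -> bool:
    """Return True if the single code point ch is a URL code point."""
    cp = ord(ch)
    if ch.isascii() and ch.isalnum():
        return True
    if ch in "!$&'()*+,-./:;=?@_~":
        return True
    return (
        0xA0 <= cp <= 0x10FFFD
        and not (0xD800 <= cp <= 0xDFFF)  # surrogates
        and not is_noncharacter(cp)
    )
```

Note that U+0025 (%) is not itself a URL code point; it only appears in valid URL strings as part of a percent-encoded byte.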
Code points greater than U+007F DELETE will be converted to percent-encoded bytes by the URL parser.
In HTML, when the document encoding is a legacy encoding, code points in the URL-query string that are higher than U+007F DELETE will be converted to percent-encoded bytes using the document’s encoding. This can cause problems if a URL that works in one document is copied to another document that uses a different document encoding. Using the UTF-8 encoding everywhere solves this problem.
For example, consider this HTML document:
<!doctype html>
<meta charset="windows-1252">
<a href="?smörgåsbord">Test</a>
Since the document encoding is windows-1252, the link’s URL’s query will be "sm%F6rg%E5sbord". If the document encoding had been UTF-8, it would instead be "sm%C3%B6rg%C3%A5sbord".
The URL units are URL code points and percent-encoded bytes.
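The effect of the document encoding on percent-encoded bytes, as in the windows-1252 example above, can be reproduced with Python's standard `urllib.parse.quote`. This is only an illustration of how the same code points become different bytes under different encodings; `quote` uses its own set of safe characters and is not the percent-encode algorithm of this standard.

```python
from urllib.parse import quote

query = "smörgåsbord"

# ö and å are single bytes (0xF6, 0xE5) in windows-1252.
legacy = quote(query, encoding="windows-1252")

# In UTF-8 each of them takes two bytes (0xC3 0xB6 and 0xC3 0xA5).
utf8 = quote(query, encoding="utf-8")

print(legacy)  # sm%F6rg%E5sbord
print(utf8)    # sm%C3%B6rg%C3%A5sbord
```

A server that decodes the query as UTF-8 would not recover the original text from the first form, which is the interoperability problem the note describes.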
Percent-encoded bytes can be used to encode code points that are not URL code points or are excluded from being written.
There is no way to express a username or password of a URL record within a valid URL string.
4.4. URL parsing
The URL parser takes a scalar value string input, with an optional null or base URL base (default null) and an optional encoding encoding (default UTF-8), and then runs these steps:
Non-web-browser implementations only need to implement the basic URL parser.
How user input in the web browser’s address bar is converted to a URL record is out-of-scope of this standard. This standard does include URL rendering requirements as they pertain to trust decisions.
Let url be the result of running the basic URL parser on input with base and encoding.
If url is failure, return failure.
If url’s scheme is not "blob", return url.
Set url’s blob URL entry to the result of resolving the blob URL url, if that did not return failure, and null otherwise.
Return url.
The basic URL parser takes a scalar value string input, with an optional null or base URL base (default null), an optional encoding encoding (default UTF-8), an optional URL url, and an optional state override state override, and then runs these steps:
The encoding argument is a legacy concept only relevant for HTML. The url and state override arguments are only for use by various APIs. [HTML]
When the url and state override arguments are not passed, the basic URL parser returns either a new URL or failure. If they are passed, the algorithm modifies the passed url and can terminate without returning anything.
If url is not given:
- Set url to a new URL.
- If input contains any leading or trailing C0 control or space, invalid-URL-unit validation error.
- Remove any leading and trailing C0 control or space from input.
If input contains any ASCII tab or newline, invalid-URL-unit validation error.
Remove all ASCII tab or newline from input.
Let state be state override if given, or scheme start state otherwise.
Set encoding to the result of getting an output encoding from encoding.
Let buffer be the empty string.
Let atSignSeen, insideBrackets, and passwordTokenSeen be false.
Let pointer be a pointer for input.
Keep running the following state machine by switching on state. If after a run pointer points to the EOF code point, go to the next step. Otherwise, increase pointer by 1 and continue with the state machine.
scheme start state
- If c is an ASCII alpha, append c, lowercased, to buffer, and set state to scheme state.
- Otherwise, if state override is not given, set state to no scheme state and decrease pointer by 1.
- Otherwise, return failure.
  This indication of failure is used exclusively by the Location object’s protocol setter.
scheme state
- If c is an ASCII alphanumeric, U+002B (+), U+002D (-), or U+002E (.), append c, lowercased, to buffer.
- Otherwise, if c is U+003A (:), then:
  - If state override is given, then:
    - If url’s scheme is a special scheme and buffer is not a special scheme, then return.
    - If url’s scheme is not a special scheme and buffer is a special scheme, then return.
    - If url includes credentials or has a non-null port, and buffer is "file", then return.
    - If url’s scheme is "file" and its host is an empty host, then return.
  - Set url’s scheme to buffer.
  - If state override is given, then:
    - If url’s port is url’s scheme’s default port, then set url’s port to null.
    - Return.
  - Set buffer to the empty string.
  - If url’s scheme is "file", then:
    - If remaining does not start with "//", special-scheme-missing-following-solidus validation error.
    - Set state to file state.
  - Otherwise, if url is special, base is non-null, and base’s scheme is url’s scheme:
    - Assert: base is special (and therefore does not have an opaque path).
    - Set state to special relative or authority state.
  - Otherwise, if url is special, set state to special authority slashes state.
  - Otherwise, if remaining starts with an U+002F (/), set state to path or authority state and increase pointer by 1.
  - Otherwise, set url’s path to the empty string and set state to opaque path state.
- Otherwise, if state override is not given, set buffer to the empty string, state to no scheme state, and start over (from the first code point in input).
- Otherwise, return failure.
  This indication of failure is used exclusively by the Location object’s protocol setter. Furthermore, the non-failure termination earlier in this state is an intentional difference for defining that setter.
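To make the interplay of the scheme start state and the scheme state more concrete, here is a deliberately reduced Python sketch. It only covers the case where no state override is given, and it stops once the scheme is known instead of dispatching to the follow-up states. The function name `extract_scheme` and its return convention (a `(scheme, rest)` tuple, or `None` where the parser would fall back to the no scheme state) are assumptions of this sketch.

```python
import string
from typing import Optional, Tuple

ASCII_ALPHA = set(string.ascii_letters)
ASCII_ALPHANUMERIC = ASCII_ALPHA | set(string.digits)


def extract_scheme(input: str) -> Optional[Tuple[str, str]]:
    """Return (lowercased scheme, remaining input after ':'), or None."""
    # scheme start state: the first code point must be an ASCII alpha.
    if not input or input[0] not in ASCII_ALPHA:
        return None
    buffer = input[0].lower()
    # scheme state: collect code points until ':' or an invalid one.
    for i in range(1, len(input)):
        c = input[i]
        if c in ASCII_ALPHANUMERIC or c in "+-.":
            buffer += c.lower()
        elif c == ":":
            return buffer, input[i + 1:]
        else:
            # The parser would start over in the no scheme state here.
            return None
    # Reaching EOF without ':' also leads to the no scheme state.
    return None
```

For example, `extract_scheme("HTTPS://example.com/")` gives `("https", "//example.com/")`, whereas `extract_scheme("example")` gives `None`, matching the earlier table row where "example" without a base fails.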
-
no scheme state Ifno scheme state If base is null, or base
opaque pathhas an opaque path and c is not U+0023 (#), missing-scheme-non-relative-URLvalidation error Otherwise, ifvalidation error , return failure.Otherwise, if base
opaque pathhas an opaque path and c is U+0023 (#), set url’s’s scheme to base’s’s scheme , url’s’s path to base’s’s path , url’s’s query to base’s’s query , url’s’s fragment to the empty string, and set statefragment state Otherwise, ifto fragment state .Otherwise, if base
’s’s schemeis not "is not "file", set", set staterelative stateto relative state and decrease pointerOtherwise, setby 1.Otherwise, set state
file stateto file state and decrease pointer by 1.
-
special relative or authority state Ifspecial relative or authority state If c is U+002F (/) and remaining starts with U+002F (/), then set state
special authority ignore slashes stateto special authority ignore slashes state and increase pointerOtherwise,by 1.Otherwise, special-scheme-missing-following-solidus
validation errorvalidation error , set staterelative stateto relative state and decrease pointer by 1.
-
path or authority state Ifpath or authority state If c is U+002F (/), then set state
authority state Otherwise, setto authority statepath state.-
Otherwise, set state to path state , and decrease pointer by 1.
-
relative state Assert:relative state Assert: base
’s’s schemeis not "is not "file". Set".-
If c is U+002F (/), then set state
relative slash state Otherwise, ifto relative slash state . Otherwise, if url
is specialis special and c is U+005C (\), invalid-reverse-solidusvalidation errorvalidation error , set staterelative slash state Otherwise: Setto relative slash state .Otherwise:
Set url
’s’s username to base’s’s username , url’s’s password to base’s’s password , url’s’s host to base’s’s host , url’s’s port to base’s’s port , url’s’s path to a clone of base’s’s path , and url’s’s query to base’s’s queryIf.-
If c is U+003F (?), then set url
’s’s query to the empty string, and statequery state Otherwise, ifto query state . Otherwise, if c is U+0023 (#), set url
’s’s fragment to the empty string and statefragment state Otherwise, ifto fragment state .Otherwise, if c
EOF code point Setis not the EOF code point :Set url
’s’s query to null.-
Set state
path stateto path state and decrease pointer by 1.
-
relative slash state Ifrelative slash state If url
is specialis special and cIfis U+002F (/) or U+005C (\), then:If c is U+005C (\), invalid-reverse-solidus
validation error Setvalidation error .Set state
special authority ignore slashes state Otherwise, ifto special authority ignore slashes state .
Otherwise, if c is U+002F (/), then set state
authority state Otherwise, setto authority state .Otherwise, set url
’s’s username to base’s’s username , url’s’s password to base’s’s password , url’s’s host to base’s’s host , url’s’s port to base’s’s port , statepath stateto path state , and then, decrease pointer by 1.
-
special authority slashes state Ifspecial authority slashes state If c is U+002F (/) and remaining starts with U+002F (/), then set state
special authority ignore slashes stateto special authority ignore slashes state and increase pointerOtherwise,by 1.Otherwise, special-scheme-missing-following-solidus
validation errorvalidation error , set statespecial authority ignore slashes stateto special authority ignore slashes state and decrease pointer by 1.
-
special authority ignore slashes state
  - If c is neither U+002F (/) nor U+005C (\), then set state to authority state and decrease pointer by 1.
  - Otherwise, special-scheme-missing-following-solidus validation error.
-
authority state
  - If c is U+0040 (@), then:
    - If atSignSeen is true, then prepend "%40" to buffer.
    - Set atSignSeen to true.
    - For each codePoint in buffer:
      - If codePoint is U+003A (:) and passwordTokenSeen is false, then set passwordTokenSeen to true and continue.
      - Let encodedCodePoints be the result of running UTF-8 percent-encode codePoint using the userinfo percent-encode set.
      - If passwordTokenSeen is true, then append encodedCodePoints to url’s password.
      - Otherwise, append encodedCodePoints to url’s username.
    - Set buffer to the empty string.
  - Otherwise, if one of the following is true:
    - c is the EOF code point, U+002F (/), U+003F (?), or U+0023 (#)
    - url is special and c is U+005C (\)
    then:
    - If atSignSeen is true and buffer is the empty string, invalid-credentials validation error, return failure.
    - Decrease pointer by buffer’s code point length + 1, set buffer to the empty string, and set state to host state.
  - Otherwise, append c to buffer.
-
host state
hostname state
  - If state override is given and url’s scheme is "file", then decrease pointer by 1 and set state to file host state.
  - Otherwise, if c is U+003A (:) and insideBrackets is false, then:
    - If buffer is the empty string, host-missing validation error, return failure.
    - If state override is given and state override is hostname state, then return.
    - Let host be the result of host parsing buffer with url is not special.
    - If host is failure, then return failure.
    - Set url’s host to host, buffer to the empty string, and state to port state.
  - Otherwise, if one of the following is true:
    - c is the EOF code point, U+002F (/), U+003F (?), or U+0023 (#)
    - url is special and c is U+005C (\)
    then decrease pointer by 1, and then:
    - If url is special and buffer is the empty string, host-missing validation error, return failure.
    - Otherwise, if state override is given, buffer is the empty string, and either url includes credentials or url’s port is non-null, return.
    - Let host be the result of host parsing buffer with url is not special.
    - If host is failure, then return failure.
    - Set url’s host to host, buffer to the empty string, and state to path start state.
    - If state override is given, then return.
  - Otherwise:
    - If c is U+005B ([), then set insideBrackets to true.
    - If c is U+005D (]), then set insideBrackets to false.
    - Append c to buffer.
-
port state
  - If c is an ASCII digit, append c to buffer.
  - Otherwise, if one of the following is true:
    - c is the EOF code point, U+002F (/), U+003F (?), or U+0023 (#)
    - url is special and c is U+005C (\)
    - state override is given
    then:
    - If buffer is not the empty string, then:
      - Let port be the mathematical integer value that is represented by buffer in radix-10 using ASCII digits for digits with values 0 through 9.
      - If port is greater than 2^16 − 1, port-out-of-range validation error, return failure.
      - Set url’s port to null, if port is url’s scheme’s default port; otherwise to port.
      - Set buffer to the empty string.
    - If state override is given, then return.
    - Set state to path start state and decrease pointer by 1.
  - Otherwise, port-invalid validation error, return failure.
-
file state
  - Set url’s scheme to "file".
  - Set url’s host to the empty string.
  - If c is U+002F (/) or U+005C (\), then:
    - If c is U+005C (\), invalid-reverse-solidus validation error.
    - Set state to file slash state.
  - Otherwise, if base is non-null and base’s scheme is "file":
    - Set url’s host to base’s host, url’s path to a clone of base’s path, and url’s query to base’s query.
    - If c is U+003F (?), then set url’s query to the empty string and state to query state.
    - Otherwise, if c is U+0023 (#), set url’s fragment to the empty string and state to fragment state.
    - Otherwise, if c is not the EOF code point:
      - Set url’s query to null.
      - If the code point substring from pointer to the end of input does not start with a Windows drive letter, then shorten url’s path.
      - Otherwise:
        - File-invalid-Windows-drive-letter validation error.
        - Set url’s path to « ».
        This is a (platform-independent) Windows drive letter quirk.
      - Set state to path state and decrease pointer by 1.
  - Otherwise, set state to path state, and decrease pointer by 1.
-
file slash state
  - If c is U+002F (/) or U+005C (\), then:
    - If c is U+005C (\), invalid-reverse-solidus validation error.
    - Set state to file host state.
  - Otherwise:
    - If base is non-null and base’s scheme is "file", then:
      - Set url’s host to base’s host.
      - If the code point substring from pointer to the end of input does not start with a Windows drive letter and base’s path[0] is a normalized Windows drive letter, then append base’s path[0] to url’s path.
        This is a (platform-independent) Windows drive letter quirk.
    - Set state to path state, and decrease pointer by 1.
-
file host state
  - If c is the EOF code point, U+002F (/), U+005C (\), U+003F (?), or U+0023 (#), then decrease pointer by 1 and then:
    - If state override is not given and buffer is a Windows drive letter, file-invalid-Windows-drive-letter-host validation error, set state to path state.
      This is a (platform-independent) Windows drive letter quirk. buffer is not reset here and instead used in the path state.
    - Otherwise, if buffer is the empty string, then:
      - Set url’s host to the empty string.
      - If state override is given, then return.
      - Set state to path start state.
    - Otherwise, run these steps:
      - Let host be the result of host parsing buffer with url is not special.
      - If host is failure, then return failure.
      - If host is "localhost", then set host to the empty string.
      - Set url’s host to host.
      - If state override is given, then return.
      - Set buffer to the empty string and state to path start state.
  - Otherwise, append c to buffer.
-
path start state
  - If url is special, then:
    - If c is U+005C (\), invalid-reverse-solidus validation error.
    - Set state to path state.
    - If c is neither U+002F (/) nor U+005C (\), then decrease pointer by 1.
  - Otherwise, if state override is not given and c is U+003F (?), set url’s query to the empty string and state to query state.
  - Otherwise, if state override is not given and c is U+0023 (#), set url’s fragment to the empty string and state to fragment state.
  - Otherwise, if c is not the EOF code point:
    - Set state to path state.
    - If c is not U+002F (/), then decrease pointer by 1.
  - Otherwise, if state override is given and url’s host is null, append the empty string to url’s path.
-
path state
  - If one of the following is true:
    - c is the EOF code point or U+002F (/)
    - url is special and c is U+005C (\)
    - state override is not given and c is U+003F (?) or U+0023 (#)
    then:
    - If url is special and c is U+005C (\), invalid-reverse-solidus validation error.
    - If buffer is a double-dot URL path segment, then:
      - Shorten url’s path.
      - If neither c is U+002F (/), nor url is special and c is U+005C (\), append the empty string to url’s path.
        This means that for input /usr/.. the result is / and not a lack of a path.
    - Otherwise, if buffer is a single-dot URL path segment and if neither c is U+002F (/), nor url is special and c is U+005C (\), append the empty string to url’s path.
    - Otherwise, if buffer is not a single-dot URL path segment, then:
      - If url’s scheme is "file", url’s path is empty, and buffer is a Windows drive letter, then replace the second code point in buffer with U+003A (:).
        This is a (platform-independent) Windows drive letter quirk.
      - Append buffer to url’s path.
    - Set buffer to the empty string.
    - If c is U+003F (?), then set url’s query to the empty string and state to query state.
    - If c is U+0023 (#), then set url’s fragment to the empty string and state to fragment state.
  - Otherwise, run these steps:
    - If c is not a URL code point and not U+0025 (%), invalid-URL-unit validation error.
    - If c is U+0025 (%) and remaining does not start with two ASCII hex digits, invalid-URL-unit validation error.
    - UTF-8 percent-encode c using the path percent-encode set and append the result to buffer.
-
opaque path state
  - If c is U+003F (?), then set url’s query to the empty string and state to query state.
  - Otherwise, if c is U+0023 (#), then set url’s fragment to the empty string and state to fragment state.
  - Otherwise:
    - If c is not the EOF code point, not a URL code point, and not U+0025 (%), invalid-URL-unit validation error.
    - If c is U+0025 (%) and remaining does not start with two ASCII hex digits, invalid-URL-unit validation error.
    - If c is not the EOF code point, UTF-8 percent-encode c using the C0 control percent-encode set and append the result to url’s path.
-
query state
  - If encoding is not UTF-8 and one of the following is true:
    - url is not special
    - url’s scheme is "ws" or "wss"
    then set encoding to UTF-8.
  - If one of the following is true:
    - state override is not given and c is U+0023 (#)
    - c is the EOF code point
    then:
    - Let queryPercentEncodeSet be the special-query percent-encode set if url is special; otherwise the query percent-encode set.
    - Percent-encode after encoding, with encoding, buffer, and queryPercentEncodeSet, and append the result to url’s query.
      This operation cannot be invoked code-point-for-code-point due to the stateful ISO-2022-JP encoder.
    - Set buffer to the empty string.
    - If c is U+0023 (#), then set url’s fragment to the empty string and state to fragment state.
  - Otherwise, if c is not the EOF code point:
    - If c is not a URL code point and not U+0025 (%), invalid-URL-unit validation error.
    - If c is U+0025 (%) and remaining does not start with two ASCII hex digits, invalid-URL-unit validation error.
    - Append c to buffer.
-
fragment state
  - If c is not the EOF code point, then:
    - If c is not a URL code point and not U+0025 (%), invalid-URL-unit validation error.
    - If c is U+0025 (%) and remaining does not start with two ASCII hex digits, invalid-URL-unit validation error.
    - UTF-8 percent-encode c using the fragment percent-encode set and append the result to url’s fragment.
-
-
Return url.
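Several of the states above can be observed through the URL class defined later in this standard. The following is a minimal sketch, assuming a JavaScript environment with a conforming global URL implementation; the input strings and variable names are made up for illustration.

```javascript
// Authority state: the first ":" separates username from password; later ones are percent-encoded.
// Port state: a default port (443 for https) is stored as null, so the port getter returns "".
// Path state: single-dot and double-dot segments are resolved while parsing.
const parsed = new URL("https://user:pa:ss@EXAMPLE.org:443/a/./b/../c");
console.log(parsed.username); // "user"
console.log(parsed.password); // "pa%3Ass"
console.log(parsed.host);     // "example.org"
console.log(parsed.port);     // ""
console.log(parsed.pathname); // "/a/c"

// For special schemes U+005C (\) is treated like U+002F (/), with only a validation error.
const backslashes = new URL("https:\\\\example.org\\x");
console.log(backslashes.href); // "https://example.org/x"

// Port state: a value greater than 2^16 − 1 is a failure, surfaced by the constructor as a TypeError.
let portFailure = false;
try {
  new URL("https://example.org:65536/");
} catch (e) {
  portFailure = e instanceof TypeError;
}
console.log(portFailure); // true
```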
To set the username given a url and username, set url’s username to the result of running UTF-8 percent-encode on username using the userinfo percent-encode set.
To set the password given a url and password, set url’s password to the result of running UTF-8 percent-encode on password using the userinfo percent-encode set.
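As an illustration of the two algorithms above, the username and password setters of the URL class percent-encode their input with the userinfo percent-encode set. This is a sketch assuming a conforming global URL; the credential values are invented.

```javascript
const withCredentials = new URL("https://example.org/");
withCredentials.username = "user name";  // U+0020 is in the userinfo percent-encode set
withCredentials.password = "p@ss:w/rd";  // so are "@", ":" and "/"

console.log(withCredentials.username); // "user%20name"
console.log(withCredentials.password); // "p%40ss%3Aw%2Frd"
console.log(withCredentials.href);     // "https://user%20name:p%40ss%3Aw%2Frd@example.org/"
```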
4.5. URL serializing
The URL serializer takes a URL url, with an optional boolean exclude fragment (default false), and then runs these steps. They return an ASCII string.
- Let output be url’s scheme and U+003A (:) concatenated.
- If url’s host is non-null:
  - Append "//" to output.
  - If url includes credentials, then:
    - Append url’s username to output.
    - If url’s password is not the empty string, then append U+003A (:), followed by url’s password, to output.
    - Append U+0040 (@) to output.
  - Append url’s host, serialized, to output.
  - If url’s port is non-null, append U+003A (:) followed by url’s port, serialized, to output.
- If url’s host is null, url does not have an opaque path, url’s path’s size is greater than 1, and url’s path[0] is the empty string, then append U+002F (/) followed by U+002E (.) to output.
  This prevents web+demo:/.//not-a-host/ or web+demo:/path/..//not-a-host/, when parsed and then serialized, from ending up as web+demo://not-a-host/ (they end up as web+demo:/.//not-a-host/).
- Append the result of URL path serializing url to output.
- If url’s query is non-null, append U+003F (?), followed by url’s query, to output.
- If exclude fragment is false and url’s fragment is non-null, then append U+0023 (#), followed by url’s fragment, to output.
- Return output.
The URL path serializer takes a URL url and then runs these steps. They return an ASCII string.
- If url has an opaque path, then return url’s path.
- Let output be the empty string.
- For each segment of url’s path: append U+002F (/) followed by segment to output.
- Return output.
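The URL serializer and URL path serializer can be sketched over a plain object standing in for a URL record. The record shape used here (scheme, username, password, host, port, path, query, fragment) is an assumption for illustration, as is treating host as an already-serialized string and an opaque path as a string; the credentials branch follows the standard's definition of "includes credentials" (a non-empty username or password).

```javascript
// Hypothetical record: path is a string when opaque, otherwise an array of segments.
function serializePath(url) {
  if (typeof url.path === "string") return url.path;
  let output = "";
  for (const segment of url.path) output += "/" + segment;
  return output;
}

function serializeURL(url, excludeFragment = false) {
  let output = url.scheme + ":";
  if (url.host !== null) {
    output += "//";
    if (url.username !== "" || url.password !== "") {
      output += url.username;
      if (url.password !== "") output += ":" + url.password;
      output += "@";
    }
    output += url.host; // assumed to be serialized already
    if (url.port !== null) output += ":" + String(url.port);
  }
  // Guard against a path starting with "//" being reparsed as an authority.
  if (url.host === null && typeof url.path !== "string" &&
      url.path.length > 1 && url.path[0] === "") {
    output += "/.";
  }
  output += serializePath(url);
  if (url.query !== null) output += "?" + url.query;
  if (!excludeFragment && url.fragment !== null) output += "#" + url.fragment;
  return output;
}

const demoRecord = {
  scheme: "web+demo", username: "", password: "", host: null, port: null,
  path: ["", "not-a-host", ""], query: null, fragment: null,
};
console.log(serializeURL(demoRecord)); // "web+demo:/.//not-a-host/"
```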
4.6. URL equivalence
To determine whether a URL A equals URL B, with an optional boolean exclude fragments (default false), run these steps:
- Let serializedA be the result of serializing A, with exclude fragment set to exclude fragments.
- Let serializedB be the result of serializing B, with exclude fragment set to exclude fragments.
- Return true if serializedA is serializedB; otherwise false.
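Equivalence can be approximated in script by comparing serializations. The helper below is a sketch, not part of the API: the name urlEquals is hypothetical, and it emulates exclude fragments by clearing the hash on a copy.

```javascript
function urlEquals(a, b, excludeFragments = false) {
  const serialize = (input) => {
    const copy = new URL(input);          // parse (or copy) so the caller's object is untouched
    if (excludeFragments) copy.hash = ""; // setting the empty string removes the fragment
    return copy.href;
  };
  return serialize(a) === serialize(b);
}

console.log(urlEquals("https://example.org/a#x", "https://example.org/a#y"));       // false
console.log(urlEquals("https://example.org/a#x", "https://example.org/a#y", true)); // true
console.log(urlEquals("HTTPS://EXAMPLE.org:443/a", "https://example.org/a"));       // true
```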
4.7. Origin
See origin’s definition in HTML for the necessary background information. [HTML]
The origin of a URL url is the origin returned by running these steps, switching on url’s scheme:
-
"blob"
  - If url’s blob URL entry is non-null, then return url’s blob URL entry’s environment’s origin.
  - Let pathURL be the result of parsing the result of URL path serializing url.
  - If pathURL is failure, then return a new opaque origin.
  - Return pathURL’s origin.
  The origin of blob:https://whatwg.org/d0360e2f-caee-469f-9a2f-87d5b0456f6f is the tuple origin ("https", "whatwg.org", null, null).
-
"ftp"
"http"
"https"
"ws"
"wss"
  Return the tuple origin (url’s scheme, url’s host, url’s port, null).
-
"file"
  Unfortunate as it is, this is left as an exercise to the reader. When in doubt, return a new opaque origin.
-
Otherwise
  Return a new opaque origin.
  This does indeed mean that these URLs cannot be same origin with themselves.
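The switch above is visible through the origin getter of the URL class, which returns the serialization of the origin (an opaque origin serializes as "null"). A short sketch, assuming a conforming global URL; "file" is left out because its result is implementation-defined.

```javascript
const origins = {
  tuple: new URL("https://example.org:8443/p").origin,
  defaultPort: new URL("https://example.org:443/").origin,
  blob: new URL("blob:https://whatwg.org/d0360e2f-caee-469f-9a2f-87d5b0456f6f").origin,
  other: new URL("web+demo:/x").origin,
};
console.log(origins.tuple);       // "https://example.org:8443"
console.log(origins.defaultPort); // "https://example.org"
console.log(origins.blob);        // "https://whatwg.org"
console.log(origins.other);       // "null"
```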
4.8. URL rendering
A URL should be rendered in its serialized form, with modifications described below, when the primary purpose of displaying a URL is to have the user make a security or trust decision. For example, users are expected to make trust decisions based on a URL rendered in the browser address bar.
4.8.1. Simplify non-human-readable or irrelevant components
Remove components that can provide opportunities for spoofing or distract from security-relevant information:
- Browsers may render only a URL’s host in places where it is important for end users to distinguish between the host and other parts of the URL such as the path. Browsers may consider simplifying the host further to draw attention to its registrable domain. For example, browsers may omit a leading www or m domain label to simplify the host, or display its registrable domain only to remove spoofing opportunities posed by subdomains (e.g., https://examplecorp.attacker.com/).
- Browsers should not render a URL’s username and password, as they can be mistaken for a URL’s host (e.g., https://examplecorp.com@attacker.example/).
- Browsers may render a URL without its scheme if the display surface only ever permits a single scheme (such as a browser feature that omits https:// because it is only enabled for secure origins). Otherwise, the scheme may be replaced or supplemented with a human-readable string (e.g., "Not secure"), a security indicator icon, or both.
4.8.2. Elision
In a space-constrained display, URLs should be elided carefully to avoid misleading the user when making a security decision:
- Browsers should ensure that at least the registrable domain can be shown when the URL is rendered (to avoid showing, e.g., ...examplecorp.com when loading https://not-really-examplecorp.com/).
- When the full host cannot be rendered, browsers should elide domain labels starting from the lowest-level domain label. For example, examplecorp.com.evil.com should be elided as ...com.evil.com, not examplecorp.com.... (Note that bidirectional text means that the lowest-level domain label may not appear on the left.)
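One way to elide from the lowest-level domain label is sketched below. The function name elideHost and its parameters are hypothetical, and keeping the last two labels is only a stand-in for the registrable domain; a real implementation would consult the Public Suffix List instead.

```javascript
function elideHost(host, maxLength, minLabels = 2) {
  if (host.length <= maxLength) return host;
  const labels = host.split(".");
  // Drop labels from the left (lowest-level first), never going below minLabels.
  while (labels.length > minLabels) {
    labels.shift();
    const candidate = "..." + labels.join(".");
    if (candidate.length <= maxLength) return candidate;
  }
  // Still too long: show the assumed registrable domain anyway rather than truncate it.
  return "..." + labels.join(".");
}

console.log(elideHost("examplecorp.com.evil.com", 16)); // "...com.evil.com"
```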
4.8.3. Internationalization and special characters
Internationalized domain names (IDNs), special characters, and bidirectional text should be handled with care to prevent spoofing:
- Browsers should render a URL’s host by running domain to Unicode with the URL’s host and false.
  Various characters can be used in homograph spoofing attacks. Consider detecting confusable characters and warning when they are in use. [IDNFAQ] [UTS39]
- URLs are particularly prone to confusion between host and path when they contain bidirectional text, so in this case it is particularly advisable to only render a URL’s host. For readability, other parts of the URL, if rendered, should have their sequences of percent-encoded bytes replaced with code points resulting from running UTF-8 decode without BOM on the percent-decoding of those sequences, unless that renders those sequences invisible. Browsers may choose to not decode certain sequences that present spoofing risks (e.g., U+1F512 (🔒)).
- Browsers should render bidirectional text as if it were in a left-to-right embedding. [BIDI]
  Unfortunately, as rendered URLs are strings and can appear anywhere, a specific bidirectional algorithm for rendered URLs would not see wide adoption. Bidirectional text interacts with the parts of a URL in ways that can cause the rendering to be different from the model. Users of bidirectional languages can come to expect this, particularly in plain text environments.
5. application/x-www-form-urlencoded
The application/x-www-form-urlencoded format provides a way to encode a list of tuples, each consisting of a name and a value.
The application/x-www-form-urlencoded format is in many ways an aberrant monstrosity, the result of many years of implementation accidents and compromises leading to a set of requirements necessary for interoperability, but in no way representing good design practices. In particular, readers are cautioned to pay close attention to the twisted details involving repeated (and in some cases nested) conversions between character encodings and byte sequences. Unfortunately the format is in widespread use due to the prevalence of HTML forms. [HTML]
5.1. application/x-www-form-urlencoded parsing
A legacy server-oriented implementation might have to support encodings other than UTF-8 as well as have special logic for tuples of which the name is `_charset`. Such logic is not described here as only UTF-8 is conforming.
The application/x-www-form-urlencoded parser takes a byte sequence input, and then runs these steps:
- Let sequences be the result of splitting input on 0x26 (&).
- Let output be an initially empty list of name-value tuples where both name and value hold a string.
- For each byte sequence bytes in sequences:
  - If bytes is the empty byte sequence, then continue.
  - If bytes contains a 0x3D (=), then let name be the bytes from the start of bytes up to but excluding its first 0x3D (=), and let value be the bytes, if any, after the first 0x3D (=) up to the end of bytes. If 0x3D (=) is the first byte, then name will be the empty byte sequence. If it is the last, then value will be the empty byte sequence.
  - Otherwise, let name have the value of bytes and let value be the empty byte sequence.
  - Replace any 0x2B (+) in name and value with 0x20 (SP).
  - Let nameString and valueString be the result of running UTF-8 decode without BOM on the percent-decoding of name and value, respectively.
  - Append (nameString, valueString) to output.
- Return output.
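The parser steps above translate fairly directly into code. The following is a sketch over a Uint8Array, with hypothetical helper names (isHexDigit, percentDecodeBytes, parseUrlencodedBytes); it assumes global TextEncoder and TextDecoder, and uses ignoreBOM: true so that a leading BOM is kept, as "UTF-8 decode without BOM" requires.

```javascript
const isHexDigit = (b) =>
  (b >= 0x30 && b <= 0x39) || (b >= 0x41 && b <= 0x46) || (b >= 0x61 && b <= 0x66);

// Percent-decode a byte sequence; a "%" not followed by two hex digits is kept as-is.
function percentDecodeBytes(bytes) {
  const out = [];
  for (let i = 0; i < bytes.length; i++) {
    if (bytes[i] === 0x25 && i + 2 < bytes.length &&
        isHexDigit(bytes[i + 1]) && isHexDigit(bytes[i + 2])) {
      out.push(parseInt(String.fromCharCode(bytes[i + 1], bytes[i + 2]), 16));
      i += 2;
    } else {
      out.push(bytes[i]);
    }
  }
  return Uint8Array.from(out);
}

function parseUrlencodedBytes(input) {
  const decoder = new TextDecoder("utf-8", { ignoreBOM: true });
  // Replace 0x2B (+) with 0x20 (SP) first, then percent-decode, then UTF-8 decode.
  const decode = (bytes) =>
    decoder.decode(percentDecodeBytes(bytes.map((b) => (b === 0x2b ? 0x20 : b))));
  const output = [];
  let start = 0;
  for (let i = 0; i <= input.length; i++) {
    if (i < input.length && input[i] !== 0x26) continue; // split on 0x26 (&)
    const bytes = input.subarray(start, i);
    start = i + 1;
    if (bytes.length === 0) continue;
    const eq = bytes.indexOf(0x3d); // first 0x3D (=), if any
    const name = eq === -1 ? bytes : bytes.subarray(0, eq);
    const value = eq === -1 ? new Uint8Array(0) : bytes.subarray(eq + 1);
    output.push([decode(name), decode(value)]);
  }
  return output;
}

const pairs = parseUrlencodedBytes(
  new TextEncoder().encode("a=1&&b=x+y&c&=d&p=%2B&e=%F0%9F%92%A9"));
console.log(pairs);
// [["a","1"], ["b","x y"], ["c",""], ["","d"], ["p","+"], ["e","💩"]]
```

Because the plus replacement happens before percent-decoding, a literal plus sign has to be sent as %2B.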
5.2. application/x-www-form-urlencoded serializing
The application/x-www-form-urlencoded serializer takes a list of name-value tuples tuples, with an optional encoding encoding (default UTF-8), and then runs these steps. They return an ASCII string.
- Set encoding to the result of getting an output encoding from encoding.
- Let output be the empty string.
- For each tuple of tuples:
  - Assert: tuple’s name and tuple’s value are scalar value strings.
  - Let name be the result of running percent-encode after encoding with encoding, tuple’s name, the application/x-www-form-urlencoded percent-encode set, and true.
  - Let value be the result of running percent-encode after encoding with encoding, tuple’s value, the application/x-www-form-urlencoded percent-encode set, and true.
  - If output is not the empty string, then append U+0026 (&) to output.
  - Append name, followed by U+003D (=), followed by value, to output.
- Return output.
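In script, the serializer with its default UTF-8 encoding is what URLSearchParams stringification uses. A brief sketch, assuming a conforming global URLSearchParams; the tuples are invented.

```javascript
const serialized = new URLSearchParams([
  ["name", "a b"],   // U+0020 becomes "+" (the final "true" argument above)
  ["v", "x&y=z"],    // "&" and "=" are percent-encoded inside names and values
  ["t", "a~b"],      // "~" is in the application/x-www-form-urlencoded percent-encode set
  ["e", "💩"],       // non-ASCII is UTF-8 encoded, then percent-encoded
]).toString();

console.log(serialized); // "name=a+b&v=x%26y%3Dz&t=a%7Eb&e=%F0%9F%92%A9"
```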
5.3. Hooks
The application/x-www-form-urlencoded string parser takes a scalar value string input, UTF-8 encodes it, and then returns the result of application/x-www-form-urlencoded parsing it.
6. API
This section uses terminology from Web IDL. Browser user agents must support this API. JavaScript implementations should support this API. Other user agents or programming languages are encouraged to use an API suitable to their needs, which might not be this one. [WEBIDL]
6.1. URL class
    [Exposed=*,
     LegacyWindowAlias=webkitURL]
    interface URL {
      constructor(USVString url, optional USVString base);
      static boolean canParse(USVString url, optional USVString base);
      stringifier attribute USVString href;
      readonly attribute USVString origin;
      attribute USVString protocol;
      attribute USVString username;
      attribute USVString password;
      attribute USVString host;
      attribute USVString hostname;
      attribute USVString port;
      attribute USVString pathname;
      attribute USVString search;
      [SameObject] readonly attribute URLSearchParams searchParams;
      attribute USVString hash;
      USVString toJSON();
    };
A URL object has an associated:
- URL: a URL.
- query object: a URLSearchParams object.
To potentially strip trailing spaces from an opaque path given a URL object url:
- If url’s URL does not have an opaque path, then return.
- If url’s URL’s fragment is non-null, then return.
- If url’s URL’s query is non-null, then return.
- Remove all trailing U+0020 SPACE code points from url’s URL’s path.
The API URL parser takes a scalar value string url and an optional null-or-scalar value string base (default null), and then runs these steps:
- Let parsedBase be null.
- If base is non-null:
  - Set parsedBase to the result of running the basic URL parser on base.
  - If parsedBase is failure, then return failure.
- Return the result of running the basic URL parser on url with parsedBase.
The new URL(url, base) constructor steps are:
- Let parsedURL be the result of running the API URL parser on url with base, if given.
- If parsedURL is failure, then throw a TypeError.
- Let query be parsedURL’s query, if that is non-null, and the empty string otherwise.
- Set this’s URL to parsedURL.
- Set this’s query object to a new URLSearchParams object.
- Initialize this’s query object with query.
- Set this’s query object’s URL object to this.
To parse a string into a URL without using a base URL, invoke the URL constructor with a single argument:
    var input = "https://example.org/💩",
        url = new URL(input)
    url.pathname // "/%F0%9F%92%A9"
This throws an exception if the input is a relative-URL string:
    try {
      var url = new URL("/🍣🍺")
    } catch(e) {
      // that happened
    }
For those cases a base URL is necessary:
    var input = "/🍣🍺",
        url = new URL(input, document.baseURI)
    url.href // "https://url.spec.whatwg.org/%F0%9F%8D%A3%F0%9F%8D%BA"
A URL object can be used as a base URL (as the IDL requires a string as argument, a URL object stringifies to its href getter return value):
    var url = new URL("🏳️‍🌈", new URL("https://pride.example/hello-world"))
    url.pathname // "/%F0%9F%8F%B3%EF%B8%8F%E2%80%8D%F0%9F%8C%88"
The static canParse(url, base) method steps are:
- Let parsedURL be the result of running the API URL parser on url with base, if given.
- If parsedURL is failure, then return false.
- Return true.
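Where the static method is not available, the same yes/no answer can be obtained from the constructor, since it throws exactly when the API URL parser returns failure. The helper name canParseURL below is hypothetical; this is a fallback sketch, not the method's definition.

```javascript
function canParseURL(url, base) {
  try {
    new URL(url, base); // passing undefined as base behaves like omitting it
    return true;
  } catch {
    return false;
  }
}

console.log(canParseURL("https://example.org/"));     // true
console.log(canParseURL("/x"));                        // false, relative input with no base
console.log(canParseURL("/x", "https://example.org")); // true
console.log(canParseURL("/x", "also relative"));       // false, the base itself fails to parse
```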
The href getter steps and the toJSON() method steps are to return the serialization of this’s URL.
The href setter steps are:
- Let parsedURL be the result of running the basic URL parser on the given value.
- If parsedURL is failure, then throw a TypeError.
- Set this’s URL to parsedURL.
- Empty this’s query object’s list.
- Let query be this’s URL’s query.
- If query is non-null, then set this’s query object’s list to the result of parsing query.
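A short sketch of the href setter's observable behavior, assuming a conforming global URL: the existing searchParams object is kept and refilled, and a value that fails to parse throws without changing the URL. Variable names are illustrative.

```javascript
const page = new URL("https://example.org/?a=1");
const params = page.searchParams;

page.href = "https://example.com/path?b=2";
console.log(page.searchParams === params); // true, same object with its list replaced
console.log(params.get("a"));              // null
console.log(params.get("b"));              // "2"

let hrefThrew = false;
try {
  page.href = "/relative"; // the setter does not use a base URL
} catch (e) {
  hrefThrew = e instanceof TypeError;
}
console.log(hrefThrew); // true
console.log(page.href); // "https://example.com/path?b=2"
```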
The origin getter steps are to return the serialization of this’s URL’s origin. [HTML]
The protocol getter steps are to return this’s URL’s scheme, followed by U+003A (:).
The protocol setter steps are to basic URL parse the given value, followed by U+003A (:), with this’s URL as url and scheme start state as state override.
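For illustration, a sketch of the protocol setter assuming a conforming global URL. The second assignment relies on the basic URL parser's rule, defined earlier in the standard, that a state override cannot switch between special and non-special schemes.

```javascript
const proto = new URL("https://example.org/");

proto.protocol = "http"; // the setter appends U+003A (:) itself
console.log(proto.protocol); // "http:"
console.log(proto.href);     // "http://example.org/"

proto.protocol = "web+demo"; // special to non-special: ignored
console.log(proto.protocol); // "http:"
```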
The username getter steps are to return this’s URL’s username.
The username setter steps are:
- If this’s URL cannot have a username/password/port, then return.
- Set the username given this’s URL and the given value.
The password getter steps are to return this’s URL’s password.
The password setter steps are:
- If this’s URL cannot have a username/password/port, then return.
- Set the password given this’s URL and the given value.
The host getter steps are:
- Let url be this’s URL.
- If url’s host is null, then return the empty string.
- If url’s port is null, return url’s host, serialized.
- Return url’s host, serialized, followed by U+003A (:) and url’s port, serialized.
The
host
setter
steps
are:
If
If this
’s’s URLopaque pathhas an opaque path , then return.Basic URL parseBasic URL parse the given value with this’s’s URL as urlhost stateand host state asstate overridestate overrideIf the given value for the.
If the given value for the host setter lacks a port, this’s URL’s port will not change. This can be unexpected, as the host getter does return a URL-port string, so one might have assumed the setter to always "reset" both.
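The asymmetry between the host getter and setter is easy to see in practice. A minimal sketch, assuming a global URL class and illustrative host names:

```javascript
const url = new URL("https://example.com:8080/");
const hostBefore = url.host; // "example.com:8080"

// No port in the new value: only the host changes, the port stays.
url.host = "example.org";
const hostAfter = url.host; // "example.org:8080"

// A value that includes a port updates both.
url.host = "example.net:9090";
const hostWithPort = url.host;     // "example.net:9090"
const hostnameOnly = url.hostname; // "example.net"
```

Code that needs to drop the port as well can assign the empty string to port after setting host.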
The hostname getter steps are:
- If this’s URL’s host is null, then return the empty string.
- Return this’s URL’s host, serialized.
The hostname setter steps are:
- If this’s URL has an opaque path, then return.
- Basic URL parse the given value with this’s URL as url and hostname state as state override.
The port getter steps are:
- If this’s URL’s port is null, then return the empty string.
- Return this’s URL’s port, serialized.
The port setter steps are:
- If this’s URL cannot have a username/password/port, then return.
- If the given value is the empty string, then set this’s URL’s port to null.
- Otherwise, basic URL parse the given value with this’s URL as url and port state as state override.
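The port getter and setter can be sketched as follows, assuming a global URL class. Note that the handling of a scheme's default port happens in the URL parser's port state, not in the setter steps themselves.

```javascript
const url = new URL("https://example.com:8443/");
const explicitPort = url.port; // "8443"

// 443 is the default port for https, so the parser stores null
// and the getter returns the empty string.
url.port = "443";
const defaultPort = url.port; // ""

// The empty string explicitly resets the port to null.
url.port = "8080";
url.port = "";
const clearedPort = url.port;  // ""
const clearedHref = url.href;  // "https://example.com/"
```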
The pathname getter steps are to return the result of URL path serializing this’s URL.
The pathname setter steps are:
- If this’s URL has an opaque path, then return.
- Basic URL parse the given value with this’s URL as url and path start state as state override.
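A small sketch of the pathname setter, assuming a global URL class. The mailto URL is used here only as a convenient example of a URL with an opaque path.

```javascript
const url = new URL("https://example.com/a/b?x=1");

// The new path is parsed, so a space is percent-encoded;
// the query is left alone.
url.pathname = "/c d";
const encodedPath = url.pathname; // "/c%20d"
const fullHref = url.href;        // "https://example.com/c%20d?x=1"

// A URL with an opaque path ignores the setter entirely.
const mail = new URL("mailto:user@example.com");
mail.pathname = "someone@example.org";
const opaquePath = mail.pathname; // "user@example.com"
```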
The search getter steps are:
- If this’s URL’s query is either null or the empty string, then return the empty string.
- Return U+003F (?), followed by this’s URL’s query.
The search setter steps are:
- Let url be this’s URL.
- If the given value is the empty string:
  - Set url’s query to null.
  - Empty this’s query object’s list.
  - Potentially strip trailing spaces from an opaque path with this.
  - Return.
- Let input be the given value with a single leading U+003F (?) removed, if any.
- Set url’s query to the empty string.
- Basic URL parse input with url as url and query state as state override.
- Set this’s query object’s list to the result of parsing input.
The search setter has the potential to remove trailing U+0020 SPACE code points from this’s URL’s path. It does this so that running the URL parser on the output of running the URL serializer on this’s URL yields a URL that is equal.
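The search setter steps can be illustrated with the sketch below, which assumes a global URL class and uses illustrative parameter names.

```javascript
const url = new URL("https://example.com/path");

// A single leading "?" is removed before parsing.
url.search = "?a=1&b=2";
const withQuery = url.href;              // "https://example.com/path?a=1&b=2"
const valueOfB = url.searchParams.get("b"); // "2"

// The leading "?" is optional; the query object is rebuilt each time.
url.search = "c=3";
const searchValue = url.search;          // "?c=3"
const valueOfA = url.searchParams.get("a"); // null

// The empty string sets the query to null and empties the list.
url.search = "";
const withoutQuery = url.href;           // "https://example.com/path"
const remaining = [...url.searchParams].length; // 0
```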
The searchParams getter steps are to return this’s query object.
The hash getter steps are:
- If this’s URL’s fragment is either null or the empty string, then return the empty string.
- Return U+0023 (#), followed by this’s URL’s fragment.
The hash setter steps are:
- If the given value is the empty string:
  - Set this’s URL’s fragment to null.
  - Potentially strip trailing spaces from an opaque path with this.
  - Return.
- Let input be the given value with a single leading U+0023 (#) removed, if any.
- Set this’s URL’s fragment to the empty string.
- Basic URL parse input with this’s URL as url and fragment state as state override.
The hash setter has the potential to change this’s URL’s path in a manner equivalent to the search setter.
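The hash getter and setter behave much like their search counterparts. A minimal sketch, assuming a global URL class:

```javascript
const url = new URL("https://example.com/doc");

// A single leading "#" is removed, then the rest is parsed as a fragment;
// a space is percent-encoded.
url.hash = "#section 1";
const encodedHash = url.hash; // "#section%201"

// The leading "#" is optional.
url.hash = "top";
const plainHash = url.hash; // "#top"

// The empty string sets the fragment to null.
url.hash = "";
const hrefWithoutHash = url.href; // "https://example.com/doc"
```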
6.2. URLSearchParams class
[Exposed=*]
interface URLSearchParams {
  constructor(optional (sequence<sequence<USVString>> or record<USVString, USVString> or USVString) init = "");

  readonly attribute unsigned long size;

  undefined append(USVString name, USVString value);
  undefined delete(USVString name);
  USVString? get(USVString name);
  sequence<USVString> getAll(USVString name);
  boolean has(USVString name);
  undefined set(USVString name, USVString value);

  undefined sort();

  iterable<USVString, USVString>;
  stringifier;
};
Constructing and stringifying a URLSearchParams object is fairly straightforward:
let params = new URLSearchParams({ key: "730d67" });
params.toString(); // "key=730d67"
As a URLSearchParams object uses the application/x-www-form-urlencoded format underneath, there are some differences in how it encodes certain code points compared to a URL object (including href and search). This can be especially surprising when using searchParams to operate on a URL’s query.
const url = new URL('https://example.com/?a=b ~');
console.log(url.href); // "https://example.com/?a=b%20~"
url.searchParams.sort();
console.log(url.href); // "https://example.com/?a=b+%7E"
const url = new URL('https://example.com/?a=~&b=%7E');
console.log(url.search); // "?a=~&b=%7E"
console.log(url.searchParams.get('a')); // "~"
console.log(url.searchParams.get('b')); // "~"
URLSearchParams objects will percent-encode anything in the application/x-www-form-urlencoded percent-encode set, and will encode U+0020 SPACE as U+002B (+).
Ignoring encodings (use UTF-8), search will percent-encode anything in the query percent-encode set or the special-query percent-encode set (depending on whether or not the URL is special).
A URLSearchParams object has an associated:
- list: a list of tuples each consisting of a name and a value, initially empty.
- URL object: null or a URL object, initially null.
A URLSearchParams object with a non-null URL object has the potential to change that object’s path in a manner equivalent to the URL object’s search and hash setters.
To initialize a URLSearchParams object query with init, run these steps:
- If init is a sequence, then for each innerSequence of init:
  - If innerSequence’s size is not 2, then throw a TypeError.
  - Append (innerSequence[0], innerSequence[1]) to query’s list.
- Otherwise, if init is a record, then for each name → value of init, append (name, value) to query’s list.
- Otherwise:
  - Assert: init is a string.
  - Set query’s list to the result of parsing init.
To update a URLSearchParams object query, run these steps:
- If query’s URL object is null, then return.
- Let serializedQuery be the serialization of query’s list.
- If serializedQuery is the empty string, then set serializedQuery to null.
- Set query’s URL object’s URL’s query to serializedQuery.
- If serializedQuery is null, then potentially strip trailing spaces from an opaque path with query’s URL object.
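The effect of the update steps is visible whenever a URLSearchParams object obtained from a URL is mutated. The sketch below assumes a global URL class and relies on the mutating methods invoking the update steps, which is how they are defined in the full standard.

```javascript
const url = new URL("https://example.com/?a=1");
const params = url.searchParams;

// Mutating the query object writes the serialized list back to the URL.
params.append("b", "x y");
const afterAppend = url.href; // "https://example.com/?a=1&b=x+y"

// Once the list is empty the query becomes null, so no "?" remains.
params.delete("a");
params.delete("b");
const afterEmpty = url.href;    // "https://example.com/"
const emptySearch = url.search; // ""

// A standalone object has a null URL object, so nothing else is updated.
const standalone = new URLSearchParams("a=1");
standalone.append("b", "2");
const standaloneText = standalone.toString(); // "a=1&b=2"
```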
The new URLSearchParams(init) constructor steps are:
- If init is a string and starts with U+003F (?), then remove the first code point from init.
- Initialize this with init.
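The three accepted forms of init, and the error case from the initialize steps, can be sketched as follows. This assumes a global URLSearchParams class; the names and values are illustrative.

```javascript
// A string (one leading "?" is stripped), a record, and a sequence of pairs
// all produce the same list here.
const fromString = new URLSearchParams("?a=1&b=2").toString();
const fromRecord = new URLSearchParams({ a: "1", b: "2" }).toString();
const fromPairs = new URLSearchParams([["a", "1"], ["b", "2"]]).toString();

// An inner sequence whose size is not 2 throws a TypeError.
let badPairThrew = false;
try {
  new URLSearchParams([["a", "1", "extra"]]);
} catch (error) {
  badPairThrew = error instanceof TypeError;
}

// Only the first "?" is removed; a second one becomes part of the name.
const doubledNames = [...new URLSearchParams("??a=1").keys()]; // ["?a"]
```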
The size getter steps are to return this’s list’s size.
The append(name, value) method steps are:
- Append (name, value) to this’s list.
- Update this.
The delete(name) method steps are:
- Remove all tuples whose name is name from this’s list.
- Update this.
The get(name) method steps are to return the value of the first tuple whose name is name in this’s list, if there is such a tuple; otherwise null.
The getAll(name) method steps are to return the values of all tuples whose name is name in this’s list, in list order; otherwise the empty sequence.
The has(name) method steps are to return true if there is a tuple whose name is name in this’s list; otherwise false.
The set(name, value) method steps are:
- If this’s list contains any tuples whose name is name, then set the value of the first such tuple to value and remove the others.
- Otherwise, append (name, value) to this’s list.
- Update this.
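The list-manipulation methods can be tried together in a short sketch. It assumes a global URLSearchParams class; the parameter names are arbitrary.

```javascript
const params = new URLSearchParams("a=1&b=2&a=3");

const firstA = params.get("a");        // "1" (first matching tuple)
const everyA = params.getAll("a");     // ["1", "3"] in list order
const missing = params.get("missing"); // null
const hasB = params.has("b");          // true

// set() overwrites the first "a" in place and removes the other one.
params.set("a", "9");
const afterSet = params.toString(); // "a=9&b=2"

// set() on an absent name behaves like append().
params.set("c", "4");
params.append("b", "5");
const afterAppend = params.toString(); // "a=9&b=2&c=4&b=5"

// delete() removes every tuple with that name.
params.delete("b");
const afterDelete = params.toString(); // "a=9&c=4"
```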
It can be useful to sort the name-value tuples in a URLSearchParams object, in particular to increase cache hits. This can be accomplished through invoking the sort() method:
const url = new URL("https://example.org/?q=🏳️‍🌈&key=e1f7bc78");
url.searchParams.sort();
url.search; // "?key=e1f7bc78&q=%F0%9F%8F%B3%EF%B8%8F%E2%80%8D%F0%9F%8C%88"
To avoid altering the original input, e.g., for comparison purposes, construct a new URLSearchParams object:
const sorted = new URLSearchParams(url.search);
sorted.sort();
The sort() method steps are:
- Sort all tuples in this’s list, if any, by their names. Sorting must be done by comparison of code units. The relative order between tuples with equal names must be preserved.
- Update this.
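Two properties of the sort steps are worth seeing in action: the sort is stable, and names are compared by UTF-16 code units rather than by code points. The sketch below assumes a global URLSearchParams class; the two non-ASCII names were picked only because their code-unit order differs from their code-point order.

```javascript
// Stability: the two "a" tuples keep their original relative order.
const params = new URLSearchParams("b=2&a=1&a=0");
params.sort();
const stableResult = params.toString(); // "a=1&a=0&b=2"

// Code-unit comparison: U+1F308 is encoded as 0xD83C 0xDF08, and 0xD83C
// is less than 0xFB03, so it sorts first even though its code point
// is greater than U+FB03.
const mixed = new URLSearchParams();
mixed.append("\uFB03", "ligature");
mixed.append("\u{1F308}", "rainbow");
mixed.sort();
const codeUnitOrder = [...mixed.values()]; // ["rainbow", "ligature"]
```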
The value pairs to iterate over are this’s list’s tuples with the key being the name and the value being the value.
The stringification behavior steps are to return the serialization of this’s list.
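Iteration and stringification might look like the following sketch, which assumes a global URLSearchParams class. Iteration yields the decoded name-value tuples in list order, while stringification yields the application/x-www-form-urlencoded serialization.

```javascript
const params = new URLSearchParams("a=1&b=x y&a=2");

// Each iteration step yields a [name, value] pair.
const pairs = [];
for (const [name, value] of params) {
  pairs.push(`${name}=${value}`);
}
// pairs is ["a=1", "b=x y", "a=2"]

// String conversion uses the stringifier, which serializes the list;
// the space is encoded as "+".
const serialized = String(params); // "a=1&b=x+y&a=2"
const sameAsToString = serialized === params.toString(); // true
```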
6.3. URL APIs elsewhere
A standard that exposes URLs should expose the URL as a string (by serializing an internal URL). A standard should not expose a URL using a URL object. URL objects are meant for URL manipulation. In IDL the USVString type should be used.
The higher-level notion here is that values are to be exposed as immutable data structures.
If a standard decides to use a variant of the name "URL" for a feature it defines, it should name such a feature "url" (i.e., lowercase and with an "l" at the end). Names such as "URL", "URI", and "IRI" should not be used. However, if the name is a compound, "URL" (i.e., uppercase) is preferred, e.g., "newURL" and "oldURL".
The EventSource and HashChangeEvent interfaces in HTML are examples of proper naming. [HTML]
Acknowledgments
There have been a lot of people that have helped make URLs more interoperable over the years and thereby furthered the goals of this standard. Likewise many people have helped make this standard what it is today.
With that, many thanks to 100の人, Adam Barth, Addison Phillips, Adrián Chaves, Albert Wiersch, Alex Christensen, Alexis Hunt, Alexandre Morgaut, Alexis Hunt, Alwin Blok, Andrew Sullivan, Arkadiusz Michalski, Behnam Esfahbod, Bobby Holley, Boris Zbarsky, Brad Hill, Brandon Ross, Chris Dumez, Chris Rebert, Corey Farwell, Dan Appelquist, Daniel Bratell, Daniel Stenberg, David Burns, David Håsäther, David Sheets, David Singer, David Walp, Domenic Denicola, Emily Schechter, Emily Stark, Eric Lawrence, Erik Arvidsson, Gavin Carothers, Geoff Richards, Glenn Maynard, Gordon P. Hemsley, hemanth, Henri Sivonen, Ian Hickson, Ilya Grigorik, Italo A. Casas, Jakub Gieryluk, James Graham, James Manger, James Ross, Jeff Hodges, Jeffrey Posnick, Jeffrey Yasskin, Joe Duarte, Joshua Bell, Jxck, Karl Wagner, 田村健人 (Kent TAMURA), Kevin Grandon, Kornel Lesiński, Larry Masinter, Leif Halvard Silli, Mark Amery, Mark Davis, Marcos Cáceres, Marijn Kruisselbrink, Martin Dürst, Mathias Bynens, Matt Falkenhagen, Matt Giuca, Michael Peick, Michael™ Smith, Michal Bukovský, Michel Suignard, Mikaël Geljić, Noah Levitt, Peter Occil, Philip Jägenstedt, Philippe Ombredanne, Prayag Verma, Rimas Misevičius, Robert Kieffer, Rodney Rehm, Roy Fielding, Ryan Sleevi, Sam Ruby, Sam Sneddon, Santiago M. Mola, Sebastian Mayr, Simon Pieters, Simon Sapin, Steven Vachon, Stuart Cook, Sven Uhlig, Tab Atkins, 吉野剛史 (Takeshi Yoshino), Tantek Çelik, Tiancheng "Timothy" Gu, Tim Berners-Lee, 簡冠庭 (Tim Guan-tin Chien), Titi_Alone, Tomek Wytrębowicz, Trevor Rowbotham, Tristan Seligmann, Valentin Gosu, Vyacheslav Matva, Wei Wang, Wolf Lammen, 山岸和利 (Yamagishi Kazutoshi), Yongsheng Zhang, 成瀬ゆい (Yui Naruse), and zealousidealroll for being awesome!
This standard is written by Anne van Kesteren (Apple, annevk@annevk.nl).
Intellectual property rights
Copyright © WHATWG (Apple, Google, Mozilla, Microsoft). This work is licensed under a Creative Commons Attribution 4.0 International License . To the extent portions of it are incorporated into source code, such portions in the source code are licensed under the BSD 3-Clause License instead.
This is the Living Standard. Those interested in the patent-review version should view the Living Standard Review Draft .