1. URL patterns
1.1. Introduction
A URL pattern consists of several components, each of which represents a pattern which could be matched against the corresponding component of a URL.
It can be constructed using a string for each component, or from a shorthand string. It can optionally be resolved relative to a base URL.
The shorthand "https://example.com/:category/*" corresponds to the following components:
- protocol: "https"
- username: "*"
- password: "*"
- hostname: "example.com"
- port: ""
- pathname: "/:category/*"
- search: "*"
- hash: "*"
It matches the following URLs:
- https://example.com/products/
- https://example.com/blog/our-greatest-product-ever
It does not match the following URLs:
- https://example.com/
- http://example.com/products/
- https://example.com:8443/blog/our-greatest-product-ever
This is a fairly simple pattern which requires most components to either match an exact string or allow any string ("*"). The pathname component matches any path with at least two /-separated path components, the first of which is captured as "category".
The shorthand "http{s}?://{:subdomain.}?shop.example/products/:id([0-9]+)#reviews" corresponds to the following components:
- protocol: "http{s}?"
- username: "*"
- password: "*"
- hostname: "{:subdomain.}?shop.example"
- port: ""
- pathname: "/products/:id([0-9]+)"
- search: ""
- hash: "reviews"
It matches the following URLs:
- https://shop.example/products/74205#reviews
- https://kathryn@voyager.shop.example/products/74656#reviews
- http://insecure.shop.example/products/1701#reviews
It does not match the following URLs:
- https://shop.example/products/2000
- http://shop.example:8080/products/0#reviews
- https://nx.shop.example/products/01?speed=5#reviews
- https://shop.example/products/chair#reviews
This is a more complicated pattern which includes:
- optional parts marked with ? (braces are needed to make it unambiguous exactly what is optional), and
- a regexp part named "id" which uses a regular expression to define what sorts of substrings match (the parentheses are necessary to mark it as a regular expression, and are not part of the regexp itself).
The shorthand "../admin/*" with the base URL "https://discussion.example/forum/?page=2" corresponds to the following components:
- protocol: "https"
- username: "*"
- password: "*"
- hostname: "discussion.example"
- port: ""
- pathname: "/admin/*"
- search: "*"
- hash: "*"
It matches the following URLs:
- https://discussion.example/admin/
- https://edd:librarian@discussion.example/admin/update?id=1
It does not match the following URLs:
- https://discussion.example/forum/admin/
- http://discussion.example:8080/admin/update?id=1
This pattern demonstrates how pathnames are resolved relative to a base URL, in a similar way to relative URLs.
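The first example above can be tried out with a short TypeScript sketch. It assumes the host environment exposes a global URLPattern constructor; the helper name categoryOf and the local type aliases are hypothetical and exist only for this illustration. When the constructor is unavailable, the helper returns undefined instead of failing.

```typescript
// Minimal local typing for the parts of the API used here (an assumption,
// so the sketch compiles even without URLPattern type declarations).
type IntroPattern = {
  exec(
    input: string,
  ): { pathname: { groups: Record<string, string | undefined> } } | null;
};
type IntroPatternCtor = new (pattern: string) => IntroPattern;

const IntroPatternImpl = (
  globalThis as unknown as { URLPattern?: IntroPatternCtor }
).URLPattern;

// Returns the captured "category" group, null when the URL does not match,
// or undefined when URLPattern is not available in this environment.
function categoryOf(url: string): string | null | undefined {
  if (IntroPatternImpl === undefined) return undefined;
  const pattern = new IntroPatternImpl("https://example.com/:category/*");
  const result = pattern.exec(url);
  if (result === null) return null;
  return result.pathname.groups["category"] ?? null;
}
```

Under those assumptions, categoryOf("https://example.com/products/") yields "products", while the non-matching URLs listed above yield null.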
1.2. The URLPattern class
typedef (USVString or URLPatternInit) URLPatternInput;

[Exposed=(Window,Worker)]
interface URLPattern {
  constructor(URLPatternInput input, USVString baseURL, optional URLPatternOptions options = {});
  constructor(optional URLPatternInput input = {}, optional URLPatternOptions options = {});

  boolean test(optional URLPatternInput input = {}, optional USVString baseURL);

  URLPatternResult? exec(optional URLPatternInput input = {}, optional USVString baseURL);

  readonly attribute USVString protocol;
  readonly attribute USVString username;
  readonly attribute USVString password;
  readonly attribute USVString hostname;
  readonly attribute USVString port;
  readonly attribute USVString pathname;
  readonly attribute USVString search;
  readonly attribute USVString hash;

  readonly attribute boolean hasRegExpGroups;
};

dictionary URLPatternInit {
  USVString protocol;
  USVString username;
  USVString password;
  USVString hostname;
  USVString port;
  USVString pathname;
  USVString search;
  USVString hash;
  USVString baseURL;
};

dictionary URLPatternOptions {
  boolean ignoreCase = false;
};

dictionary URLPatternResult {
  sequence<URLPatternInput> inputs;

  URLPatternComponentResult protocol;
  URLPatternComponentResult username;
  URLPatternComponentResult password;
  URLPatternComponentResult hostname;
  URLPatternComponentResult port;
  URLPatternComponentResult pathname;
  URLPatternComponentResult search;
  URLPatternComponentResult hash;
};

dictionary URLPatternComponentResult {
  USVString input;
  record<USVString, (USVString or undefined)> groups;
};
Each URLPattern has an associated URL pattern, a URL pattern.
- urlPattern = new URLPattern(input)
  Constructs a new URLPattern object. The input is an object containing separate patterns for each URL component; e.g. hostname, pathname, etc. Missing components will default to a wildcard pattern. In addition, input can contain a baseURL property that provides static text patterns for any missing components.
- urlPattern = new URLPattern(patternString, baseURL)
  Constructs a new URLPattern object. patternString is a URL string containing pattern syntax for one or more components. If baseURL is provided, then patternString can be relative. This constructor will always set at least an empty string value and does not default any components to wildcard patterns.
- urlPattern = new URLPattern(input, options)
  Constructs a new URLPattern object. options is an object containing additional configuration options that can affect how the components are matched. Currently it has only one property, ignoreCase, which can be set to true to enable case-insensitive matching.
  Note that by default, that is in the absence of the options argument, matching is always case-sensitive.
- urlPattern = new URLPattern(patternString, baseURL, options)
  Constructs a new URLPattern object. This overload supports a URLPatternOptions object when constructing a pattern from a patternString, which describes the patterns for individual components, and a base URL.
- matches = urlPattern.test(input)
  Tests if urlPattern matches the given arguments. The input is an object containing strings representing each URL component; e.g. hostname, pathname, etc. Missing components are treated as empty strings. In addition, input can contain a baseURL property that provides values for any missing components. If urlPattern matches the input on a component-by-component basis then true is returned. Otherwise, false is returned.
- matches = urlPattern.test(url, baseURL)
  Tests if urlPattern matches the given arguments. url is a URL string. If baseURL is provided, then url can be relative. If urlPattern matches the input on a component-by-component basis then true is returned. Otherwise, false is returned.
- result = urlPattern.exec(input)
  Executes the urlPattern against the given arguments. The input is an object containing strings representing each URL component; e.g. hostname, pathname, etc. Missing components are treated as empty strings. In addition, input can contain a baseURL property that provides values for any missing components. If urlPattern matches the input on a component-by-component basis then an object is returned containing the results. Matched group values are contained in per-component group objects within the result object; e.g. result.pathname.groups.id. If urlPattern does not match the input, then result is null.
- result = urlPattern.exec(url, baseURL)
  Executes the urlPattern against the given arguments. url is a URL string. If baseURL is provided, then url can be relative. If urlPattern matches the input on a component-by-component basis then an object is returned containing the results. Matched group values are contained in per-component group objects within the result object; e.g. result.pathname.groups.id. If urlPattern does not match the input, then result is null.
- urlPattern.protocol
  Returns urlPattern’s normalized protocol pattern string.
- urlPattern.username
  Returns urlPattern’s normalized username pattern string.
- urlPattern.password
  Returns urlPattern’s normalized password pattern string.
- urlPattern.hostname
  Returns urlPattern’s normalized hostname pattern string.
- urlPattern.port
  Returns urlPattern’s normalized port pattern string.
- urlPattern.pathname
  Returns urlPattern’s normalized pathname pattern string.
- urlPattern.search
  Returns urlPattern’s normalized search pattern string.
- urlPattern.hash
  Returns urlPattern’s normalized hash pattern string.
- urlPattern.hasRegExpGroups
  Returns whether urlPattern contains one or more groups which use regular expression matching.
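The effect of the ignoreCase option described above can be illustrated with a small TypeScript sketch. It assumes a global URLPattern constructor is available; the helper name matchesProductPath, the pattern "/Products/:id", and the local type aliases are hypothetical. The helper returns undefined when the API is missing.

```typescript
// Local typing for the constructor overload used here (an assumption, so the
// sketch compiles without URLPattern type declarations).
type CasePatternInit = { pathname?: string };
type CasePatternOptions = { ignoreCase?: boolean };
type CasePatternCtor = new (
  input: CasePatternInit,
  options?: CasePatternOptions,
) => { test(input: CasePatternInit): boolean };

const CasePatternImpl = (
  globalThis as unknown as { URLPattern?: CasePatternCtor }
).URLPattern;

// Tests a pathname against "/Products/:id", with or without case folding.
function matchesProductPath(
  path: string,
  ignoreCase: boolean,
): boolean | undefined {
  if (CasePatternImpl === undefined) return undefined;
  const pattern = new CasePatternImpl(
    { pathname: "/Products/:id" },
    { ignoreCase },
  );
  return pattern.test({ pathname: path });
}
```

With ignoreCase set to true the lowercase path "/products/42" is expected to match; with the default case-sensitive behavior it is not.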
The new URLPattern(input, baseURL, options) constructor steps are:
- Run initialize given this, input, baseURL, and options.
The new URLPattern(input, options) constructor steps are:
- Run initialize given this, input, null, and options.
To initialize a URLPattern given a URLPattern this, URLPatternInput input, string or null baseURL, and URLPatternOptions options:
- Set this’s associated URL pattern to the result of create given input, baseURL, and options.
The protocol getter steps are:
- Return this’s associated URL pattern’s protocol component’s pattern string.
The username getter steps are:
- Return this’s associated URL pattern’s username component’s pattern string.
The password getter steps are:
- Return this’s associated URL pattern’s password component’s pattern string.
The hostname getter steps are:
- Return this’s associated URL pattern’s hostname component’s pattern string.
The port getter steps are:
- Return this’s associated URL pattern’s port component’s pattern string.
The pathname getter steps are:
- Return this’s associated URL pattern’s pathname component’s pattern string.
The search getter steps are:
- Return this’s associated URL pattern’s search component’s pattern string.
The hash getter steps are:
- Return this’s associated URL pattern’s hash component’s pattern string.
The hasRegExpGroups getter steps are:
- If this’s associated URL pattern’s has regexp groups, then return true.
- Return false.
The test(input, baseURL) method steps are:
- Let result be the result of match given this’s associated URL pattern, input, and baseURL if given.
- If result is null, return false.
- Return true.
The exec(input, baseURL) method steps are:
- Return the result of match given this’s associated URL pattern, input, and baseURL if given.
1.3. The URL pattern struct
A URL pattern is a struct with the following items:
- protocol component, a component
- username component, a component
- password component, a component
- hostname component, a component
- port component, a component
- pathname component, a component
- search component, a component
- hash component, a component
A component is a struct with the following items:
- pattern string, a well formed pattern string
- regular expression, a RegExp
- group name list, a list of strings
- has regexp groups, a boolean
1.4. High-level operations
To create a URL pattern given a URLPatternInput input, string or null baseURL, and URLPatternOptions options:
- Let init be null.
- If input is a scalar value string then:
- Otherwise:
  - Assert: input is a URLPatternInit.
  - If baseURL is not null, then throw a TypeError.
  - Set init to input.
- Let processedInit be the result of process a URLPatternInit given init, "pattern", null, null, null, null, null, null, null, and null.
- For each componentName of « "protocol", "username", "password", "hostname", "port", "pathname", "search", "hash" »:
- If processedInit["protocol"] is a special scheme and processedInit["port"] is a string which represents its corresponding default port in radix-10 using ASCII digits then set processedInit["port"] to the empty string.
- Let urlPattern be a new URL pattern.
- Set urlPattern’s protocol component to the result of compiling a component given processedInit["protocol"], canonicalize a protocol, and default options.
- Set urlPattern’s username component to the result of compiling a component given processedInit["username"], canonicalize a username, and default options.
- Set urlPattern’s password component to the result of compiling a component given processedInit["password"], canonicalize a password, and default options.
- If the result of running hostname pattern is an IPv6 address given processedInit["hostname"] is true, then set urlPattern’s hostname component to the result of compiling a component given processedInit["hostname"], canonicalize an IPv6 hostname, and hostname options.
- Otherwise, if the result of running protocol component matches a special scheme given urlPattern’s protocol component is true, or urlPattern’s protocol component’s pattern string is "*", then set urlPattern’s hostname component to the result of compiling a component given processedInit["hostname"], canonicalize a domain name, and hostname options.
- Otherwise, set urlPattern’s hostname component to the result of compiling a component given processedInit["hostname"], canonicalize a hostname, and hostname options.
- Set urlPattern’s port component to the result of compiling a component given processedInit["port"], canonicalize a port, and default options.
- Let compileOptions be a copy of the default options with the ignore case property set to options["ignoreCase"].
- If the result of running protocol component matches a special scheme given urlPattern’s protocol component is true, then:
  - Let pathCompileOptions be a copy of the pathname options with the ignore case property set to options["ignoreCase"].
  - Set urlPattern’s pathname component to the result of compiling a component given processedInit["pathname"], canonicalize a pathname, and pathCompileOptions.
- Otherwise set urlPattern’s pathname component to the result of compiling a component given processedInit["pathname"], canonicalize an opaque pathname, and compileOptions.
- Set urlPattern’s search component to the result of compiling a component given processedInit["search"], canonicalize a search, and compileOptions.
- Set urlPattern’s hash component to the result of compiling a component given processedInit["hash"], canonicalize a hash, and compileOptions.
- Return urlPattern.
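The default-port step in the algorithm above can be sketched in TypeScript as follows. The function name elideDefaultPort is hypothetical; the port table lists the special schemes and default ports defined by the URL Standard. This is an illustration of that single step only, not of the whole create algorithm.

```typescript
// Special schemes and their default ports, per the URL Standard.
// "file" is a special scheme without a default port.
const SPECIAL_SCHEME_DEFAULT_PORTS = new Map<string, number | null>([
  ["ftp", 21],
  ["file", null],
  ["http", 80],
  ["https", 443],
  ["ws", 80],
  ["wss", 443],
]);

// If protocol is literally a special scheme and port is the radix-10 form of
// its default port, the port pattern becomes the empty string.
function elideDefaultPort(protocol: string, port: string): string {
  const defaultPort = SPECIAL_SCHEME_DEFAULT_PORTS.get(protocol);
  if (
    defaultPort !== undefined &&
    defaultPort !== null &&
    port === String(defaultPort)
  ) {
    return "";
  }
  return port;
}
```

Note that the comparison is against the protocol pattern string itself, so in this sketch a pattern such as "http{s}?" is not treated as a special scheme and its port is left unchanged.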
To perform a match given a URL pattern urlPattern, a URLPatternInput or URL input, and an optional string baseURLString:
- Let protocol be the empty string.
- Let username be the empty string.
- Let password be the empty string.
- Let hostname be the empty string.
- Let port be the empty string.
- Let pathname be the empty string.
- Let search be the empty string.
- Let hash be the empty string.
- Let inputs be an empty list.
- If input is a URL, then append the serialization of input to inputs.
- Otherwise, append input to inputs.
- If input is a URLPatternInit then:
  - If baseURLString was given, throw a TypeError.
  - Let applyResult be the result of process a URLPatternInit given input, "url", protocol, username, password, hostname, port, pathname, search, and hash. If this throws an exception, catch it, and return null.
  - Set protocol to applyResult["protocol"].
  - Set username to applyResult["username"].
  - Set password to applyResult["password"].
  - Set hostname to applyResult["hostname"].
  - Set port to applyResult["port"].
  - Set pathname to applyResult["pathname"].
  - Set search to applyResult["search"].
  - Set hash to applyResult["hash"].
- Otherwise:
  - Let url be input.
  - If input is a USVString:
    - Let baseURL be null.
    - If baseURLString was given, then:
      - Set baseURL to the result of running the basic URL parser on baseURLString.
      - If baseURL is failure, return null.
      - Append baseURLString to inputs.
    - Set url to the result of running the basic URL parser on input with baseURL.
    - If url is failure, return null.
  - Set protocol to url’s scheme.
  - Set username to url’s username.
  - Set password to url’s password.
  - Set hostname to url’s host, serialized, or the empty string if the value is null.
  - Set port to url’s port, serialized, or the empty string if the value is null.
  - Set pathname to the result of URL path serializing url.
  - Set search to url’s query or the empty string if the value is null.
  - Set hash to url’s fragment or the empty string if the value is null.
- Let protocolExecResult be RegExpBuiltinExec(urlPattern’s protocol component’s regular expression, protocol).
- Let usernameExecResult be RegExpBuiltinExec(urlPattern’s username component’s regular expression, username).
- Let passwordExecResult be RegExpBuiltinExec(urlPattern’s password component’s regular expression, password).
- Let hostnameExecResult be RegExpBuiltinExec(urlPattern’s hostname component’s regular expression, hostname).
- Let portExecResult be RegExpBuiltinExec(urlPattern’s port component’s regular expression, port).
- Let pathnameExecResult be RegExpBuiltinExec(urlPattern’s pathname component’s regular expression, pathname).
- Let searchExecResult be RegExpBuiltinExec(urlPattern’s search component’s regular expression, search).
- Let hashExecResult be RegExpBuiltinExec(urlPattern’s hash component’s regular expression, hash).
- If protocolExecResult, usernameExecResult, passwordExecResult, hostnameExecResult, portExecResult, pathnameExecResult, searchExecResult, or hashExecResult are null then return null.
- Let result be a new URLPatternResult.
- Set result["inputs"] to inputs.
- Set result["protocol"] to the result of creating a component match result given urlPattern’s protocol component, protocol, and protocolExecResult.
- Set result["username"] to the result of creating a component match result given urlPattern’s username component, username, and usernameExecResult.
- Set result["password"] to the result of creating a component match result given urlPattern’s password component, password, and passwordExecResult.
- Set result["hostname"] to the result of creating a component match result given urlPattern’s hostname component, hostname, and hostnameExecResult.
- Set result["port"] to the result of creating a component match result given urlPattern’s port component, port, and portExecResult.
- Set result["pathname"] to the result of creating a component match result given urlPattern’s pathname component, pathname, and pathnameExecResult.
- Set result["search"] to the result of creating a component match result given urlPattern’s search component, search, and searchExecResult.
- Set result["hash"] to the result of creating a component match result given urlPattern’s hash component, hash, and hashExecResult.
- Return result.
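The all-or-nothing character of the match steps above (every one of the eight components has to match, otherwise the result is null) can be sketched in TypeScript. The name execAllComponents and the record-based inputs are hypothetical simplifications; the sketch assumes each component has already been compiled to a RegExp without the global or sticky flag, and that the input has already been split into component strings.

```typescript
const MATCH_COMPONENT_NAMES = [
  "protocol", "username", "password", "hostname",
  "port", "pathname", "search", "hash",
] as const;
type MatchComponentName = (typeof MATCH_COMPONENT_NAMES)[number];

// Runs each component's regular expression against the corresponding input
// string. Returns null as soon as any component fails to match, which is
// observably the same as executing all eight and then checking for null.
function execAllComponents(
  regexps: Record<MatchComponentName, RegExp>,
  values: Record<MatchComponentName, string>,
): Record<MatchComponentName, RegExpExecArray> | null {
  const results = {} as Record<MatchComponentName, RegExpExecArray>;
  for (const name of MATCH_COMPONENT_NAMES) {
    const execResult = regexps[name].exec(values[name]);
    if (execResult === null) return null;
    results[name] = execResult;
  }
  return results;
}
```

The per-component exec results returned here are what a later step would turn into the groups records of the final result object.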
A URL pattern urlPattern has regexp groups if the following steps return true:
- If urlPattern’s protocol component has regexp groups is true, then return true.
- If urlPattern’s username component has regexp groups is true, then return true.
- If urlPattern’s password component has regexp groups is true, then return true.
- If urlPattern’s hostname component has regexp groups is true, then return true.
- If urlPattern’s port component has regexp groups is true, then return true.
- If urlPattern’s pathname component has regexp groups is true, then return true.
- If urlPattern’s search component has regexp groups is true, then return true.
- If urlPattern’s hash component has regexp groups is true, then return true.
- Return false.
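As a rough illustration, the URL pattern and component structs and the check above could be modelled in TypeScript like this. The property names are camel-cased renderings of the struct items, and the type names are hypothetical.

```typescript
const STRUCT_COMPONENT_NAMES = [
  "protocol", "username", "password", "hostname",
  "port", "pathname", "search", "hash",
] as const;
type StructComponentName = (typeof STRUCT_COMPONENT_NAMES)[number];

// A component: pattern string, regular expression, group name list, and
// the has regexp groups boolean.
interface Component {
  patternString: string;
  regularExpression: RegExp;
  groupNameList: string[];
  hasRegExpGroups: boolean;
}

// A URL pattern: one component per URL component.
type UrlPatternStruct = Record<StructComponentName, Component>;

// True if any of the eight components has regexp groups.
function hasRegExpGroups(urlPattern: UrlPatternStruct): boolean {
  return STRUCT_COMPONENT_NAMES.some(
    (name) => urlPattern[name].hasRegExpGroups,
  );
}
```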
1.5. Internals
To compile a component given a string input, encoding callback, and options:
- Let part list be the result of running parse a pattern string given input, options, and encoding callback.
- Let (regular expression string, name list) be the result of running generate a regular expression and name list given part list and options.
- Let flags be an empty string.
- If options’s ignore case is true then set flags to "vi".
- Otherwise set flags to "v".
- Let regular expression be RegExpCreate(regular expression string, flags). If this throws an exception, catch it, and throw a TypeError.
  The specification uses regular expressions to perform all matching, but this is not mandated. Implementations are free to perform matching directly against the part list when possible; e.g. when there are no custom regexp matching groups. If there are custom regular expressions, however, it’s important that they be immediately evaluated in the compile a component algorithm so an error can be thrown if they are invalid.
- Let pattern string be the result of running generate a pattern string given part list and options.
- Let has regexp groups be false.
- For each part of part list:
- Return a new component whose pattern string is pattern string, regular expression is regular expression, group name list is name list, and has regexp groups is has regexp groups.
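The flag selection and error conversion in the steps above might look like the following in TypeScript. The function name createComponentRegExp is hypothetical, and the sketch assumes a JavaScript engine that supports the "v" (unicodeSets) regular expression flag.

```typescript
// Builds the component's RegExp with "v" or "vi" flags and converts any
// failure (for example a SyntaxError from an invalid custom regexp) into a
// TypeError, so that invalid patterns are rejected at compile time.
function createComponentRegExp(
  regularExpressionString: string,
  ignoreCase: boolean,
): RegExp {
  const flags = ignoreCase ? "vi" : "v";
  try {
    return new RegExp(regularExpressionString, flags);
  } catch {
    throw new TypeError(
      `Invalid regular expression in pattern: ${regularExpressionString}`,
    );
  }
}
```

Creating the RegExp eagerly, rather than on first match, is what lets an invalid custom regular expression surface as an error when the pattern is constructed.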
To create a component match result given a component component, a string input, and execResult:
- Let result be a new URLPatternComponentResult.
- Set result["input"] to input.
- Let groups be a record<USVString, (USVString or undefined)>.
- Let index be 1.
- While index is less than Get(execResult, "length"):
  - Let name be component’s group name list[index − 1].
  - Set groups[name] to value.
  - Increment index by 1.
- Set result["groups"] to groups.
- Return result.
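The group-collection loop above can be sketched in TypeScript. The sketch assumes that value is the capture at position index of the exec result, which the steps as shown here do not spell out; the names ComponentResult and createComponentMatchResult are hypothetical, and the component is reduced to its group name list.

```typescript
interface ComponentResult {
  input: string;
  groups: Record<string, string | undefined>;
}

// Pairs each capture (starting at index 1) with the group name at
// index - 1. Unmatched optional groups are recorded as undefined.
function createComponentMatchResult(
  groupNameList: readonly string[],
  input: string,
  execResult: RegExpExecArray,
): ComponentResult {
  const groups: Record<string, string | undefined> = {};
  for (let index = 1; index < execResult.length; index++) {
    const name = groupNameList[index - 1];
    if (name === undefined) continue; // defensive: more captures than names
    // Assumption: value is the capture at this index.
    groups[name] = execResult[index];
  }
  return { input, groups };
}
```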
The default options is an options struct with delimiter code point set to the empty string and prefix code point set to the empty string.
The hostname options is an options struct with delimiter code point set to "." and prefix code point set to the empty string.
The pathname options is an options struct with delimiter code point set to "/" and prefix code point set to "/".
To determine if a protocol component matches a special scheme given a component protocol component:
- Let special scheme list be a list populated with all of the special schemes.
- For each scheme of special scheme list:
  - Let test result be RegExpBuiltinExec(protocol component’s regular expression, scheme).
  - If test result is not null, then return true.
- Return false.
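A minimal TypeScript sketch of this check follows. The function name is hypothetical, the list of special schemes is the one defined by the URL Standard, and the protocol component is reduced to its regular expression, which is assumed not to use the global or sticky flag.

```typescript
// The special schemes defined by the URL Standard.
const SPECIAL_SCHEMES = ["ftp", "file", "http", "https", "ws", "wss"] as const;

// True if the protocol component's regular expression matches at least one
// special scheme.
function protocolMatchesSpecialScheme(protocolRegExp: RegExp): boolean {
  return SPECIAL_SCHEMES.some((scheme) => protocolRegExp.exec(scheme) !== null);
}
```

Because the check runs the compiled regular expression against each scheme, a pattern such as "http{s}?" or "*" counts as matching a special scheme even though its pattern string is not itself a scheme name.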
To determine if a hostname pattern is an IPv6 address given a pattern string input:
- If input’s code point length is less than 2, then return false.
- Let input code points be input interpreted as a list of code points.
- If input code points[0] is U+005B ([), then return true.
- If input code points[0] is U+007B ({) and input code points[1] is U+005B ([), then return true.
- If input code points[0] is U+005C (\) and input code points[1] is U+005B ([), then return true.
- Return false.
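These steps translate almost directly into TypeScript. The function name below is hypothetical; Array.from is used so that the input is examined by code point rather than by UTF-16 code unit.

```typescript
// Heuristic from the steps above: a hostname pattern is treated as an IPv6
// address if it starts with "[", "{[", or "\[".
function hostnamePatternIsIPv6Address(input: string): boolean {
  const codePoints = Array.from(input); // one entry per code point
  if (codePoints.length < 2) return false;
  if (codePoints[0] === "[") return true;
  if (codePoints[0] === "{" && codePoints[1] === "[") return true;
  if (codePoints[0] === "\\" && codePoints[1] === "[") return true;
  return false;
}
```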
1.6. Constructor string parsing
A constructor string parser is a struct.
A constructor string parser has an associated input, a string, which must be set upon creation.
A constructor string parser has an associated token list, a token list, which must be set upon creation.
A constructor string parser has an associated result, a URLPatternInit, initially set to a new URLPatternInit.
A constructor string parser has an associated component start, a number, initially set to 0.
A constructor string parser has an associated token index, a number, initially set to 0.
A constructor string parser has an associated token increment, a number, initially set to 1.
A constructor string parser has an associated group depth, a number, initially set to 0.
A constructor string parser has an associated hostname IPv6 bracket depth, a number, initially set to 0.
A constructor string parser has an associated protocol matches a special scheme flag, a boolean, initially set to false.
A constructor string parser has an associated state, a string, initially set to "init". It must be one of the following:
- "init"
- "protocol"
- "authority"
- "username"
- "password"
- "hostname"
- "port"
- "pathname"
- "search"
- "hash"
- "done"
The URLPattern constructor string algorithm is very similar to the basic URL parser algorithm, but some differences prevent us from using that algorithm directly.
First, the URLPattern constructor string parser operates on tokens generated using the "lenient" tokenize policy. In contrast, the basic URL parser operates on code points. Operating on tokens allows the URLPattern constructor string parser to more easily distinguish between code points that are significant pattern syntax and code points that might be a URL component separator. For example, it makes it trivial to handle named groups like ":hmm" in "https://a.c:hmm.example.com:8080" without getting confused with the port number.
Second, the URLPattern constructor string parser needs to avoid applying URL canonicalization to all code points like basic URL parser does. Instead we perform canonicalization on only parts of the pattern string we know are safe later when compiling each component pattern string.
Finally, the URLPattern constructor string parser does not handle some parts of the basic URL parser state machine. For example, it does not treat backslashes specially as they would all be treated as pattern characters and would require excessive escaping. In addition, this parser might not handle some more esoteric parts of the URL parsing algorithm like file URLs with a hostname. The goal with this parser was to handle the most common URLs while allowing any niche case to be handled instead via the URLPatternInit constructor.
In the constructor string algorithm, the pathname, search, and hash are wildcarded if earlier components are specified but later ones are not. For example, "https://example.com/foo" matches any search and any hash. Similarly, "https://example.com" matches any URL on that origin. This is analogous to the notion of a more specific component in the notes about process a URLPatternInit (e.g., a search is more specific than a pathname), but the constructor syntax only has a few cases where it is possible to specify a more specific component without also specifying the less specific components.
The username and password components are always wildcard unless they are explicitly specified.
If a hostname is specified and the port is not, the port is assumed to be the default port. If authors want to match any port, they have to write :* explicitly. For example, "https://*" is any HTTPS origin on port 443, and "https://*:*" is any HTTPS origin on any port.
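The port defaulting described above can be observed with a short TypeScript sketch. It assumes a global URLPattern constructor is available and that it implements the behavior described in this section; the helper name matchesOrigin and the local type aliases are hypothetical. The helper returns undefined when the API is missing.

```typescript
// Local typing for the string constructor and test method (an assumption,
// so the sketch compiles without URLPattern type declarations).
type PortPatternCtor = new (pattern: string) => { test(url: string): boolean };

const PortPatternImpl = (
  globalThis as unknown as { URLPattern?: PortPatternCtor }
).URLPattern;

// Tests a URL against a constructor string pattern.
function matchesOrigin(pattern: string, url: string): boolean | undefined {
  if (PortPatternImpl === undefined) return undefined;
  return new PortPatternImpl(pattern).test(url);
}
```

Under those assumptions, "https://*" should accept https://example.com/ but reject https://example.com:8443/, while "https://*:*" should accept both.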
-
Let parser be a new constructor string parser whose input is input and token list is the result of running tokenize given input and "
lenient
". -
While parser ’s token index is less than parser ’s token list size :
-
Set parser ’s token increment to 1.
On every iteration of the parse loop the parser ’s token index will be incremented by its token increment value. Typically this means incrementing by 1, but at certain times it is set to zero. The token increment is then always reset back to 1 at the top of the loop.
-
If parser ’s token list [ parser ’s token index ]'s type is "
end
" then:-
If parser ’s state is "
init
":If we reached the end of the string in the "
init
" state , then we failed to find a protocol terminator and this has to be a relative URLPattern constructor string.-
Run rewind given parser .
We next determine at which component the relative pattern begins. Relative pathnames are most common, but URLs and URLPattern constructor strings can begin with the search or hash components as well.
-
If the result of running is a hash prefix given parser is true, then run change state given parser , "
hash
" and 1. -
Otherwise if the result of running is a search prefix given parser is true:
-
Run change state given parser , "
search
" and 1.
-
-
Otherwise:
-
Run change state given parser , "
pathname
" and 0.
-
-
Increment parser ’s token index by parser ’s token increment .
-
Continue .
-
-
If parser ’s state is "
authority
":If we reached the end of the string in the "
authority
" state , then we failed to find an "@
". Therefore there is no username or password.-
Run rewind and set state given parser , and "
hostname
". -
Increment parser ’s token index by parser ’s token increment .
-
Continue .
-
-
Run change state given parser , "
done
" and 0. -
Break .
-
-
If the result of running is a group open given parser is true:
We ignore all code points within "
{ ... }
" pattern groupings. It would not make sense to allow a URL component boundary to lie within a grouping; e.g. "https://example.c{om/fo}o
". While not supported within well formed pattern strings , we handle nested groupings here to avoid parser confusion.It is not necessary to perform this logic for regexp or named groups since those values are collapsed into individual tokens by the tokenize algorithm.
-
Increment parser ’s group depth by 1.
-
Increment parser ’s token index by parser ’s token increment .
-
Continue .
-
-
If parser ’s group depth is greater than 0:
-
If the result of running is a group close given parser is true, then decrement parser ’s group depth by 1.
-
Otherwise:
-
Increment parser ’s token index by parser ’s token increment .
-
Continue .
-
-
-
Switch on parser ’s state and run the associated steps:
-
"
init
" -
-
If the result of running is a protocol suffix given parser is true:
-
Run rewind and set state given parser and "
protocol
".
-
-
-
"
protocol
" -
-
If the result of running is a protocol suffix given parser is true:
-
Run compute protocol matches a special scheme flag given parser .
We need to eagerly compile the protocol component to determine if it matches any special schemes . If it does then certain special rules apply. It determines if the pathname defaults to a "
/
" and also whether we will look for the username, password, hostname, and port components. Authority slashes can also cause us to look for these components as well. Otherwise we treat this as an "opaque path URL" and go straight to the pathname component. -
Let next state be "
pathname
". -
Let skip be 1.
-
If the result of running next is authority slashes given parser is true:
-
Set next state to "
authority
". -
Set skip to 3.
-
-
Otherwise if parser ’s protocol matches a special scheme flag is true, then set next state to "
authority
". -
Run change state given parser , next state , and skip .
-
-
-
"
authority
" -
-
If the result of running is an identity terminator given parser is true, then run rewind and set state given parser and "
username
". -
Otherwise if any of the following are true:
- the result of running is a pathname start given parser ;
- the result of running is a search prefix given parser ; or
- the result of running is a hash prefix given parser ,
then run rewind and set state given parser and "
hostname
".
- "username"
  - If the result of running is a password prefix given parser is true, then run change state given parser, "password", and 1.
  - Otherwise if the result of running is an identity terminator given parser is true, then run change state given parser, "hostname", and 1.
- "password"
  - If the result of running is an identity terminator given parser is true, then run change state given parser, "hostname", and 1.
- "hostname"
  - If the result of running is an IPv6 open given parser is true, then increment parser’s hostname IPv6 bracket depth by 1.
  - Otherwise if the result of running is an IPv6 close given parser is true, then decrement parser’s hostname IPv6 bracket depth by 1.
  - Otherwise if the result of running is a port prefix given parser is true and parser’s hostname IPv6 bracket depth is zero, then run change state given parser, "port", and 1.
  - Otherwise if the result of running is a pathname start given parser is true, then run change state given parser, "pathname", and 0.
  - Otherwise if the result of running is a search prefix given parser is true, then run change state given parser, "search", and 1.
  - Otherwise if the result of running is a hash prefix given parser is true, then run change state given parser, "hash", and 1.
- "port"
  - If the result of running is a pathname start given parser is true, then run change state given parser, "pathname", and 0.
  - Otherwise if the result of running is a search prefix given parser is true, then run change state given parser, "search", and 1.
  - Otherwise if the result of running is a hash prefix given parser is true, then run change state given parser, "hash", and 1.
- "pathname"
  - If the result of running is a search prefix given parser is true, then run change state given parser, "search", and 1.
  - Otherwise if the result of running is a hash prefix given parser is true, then run change state given parser, "hash", and 1.
- "search"
  - If the result of running is a hash prefix given parser is true, then run change state given parser, "hash", and 1.
- "hash"
  - Do nothing.
- "done"
  - Assert: This step is never reached.
- Increment parser’s token index by parser’s token increment.
- If parser’s result contains "hostname" and not "port", then set parser’s result["port"] to the empty string.
  This is special-cased because when an author does not specify a port, they usually intend the default port. If any port is acceptable, the author can specify it as a wildcard explicitly. For example, "https://example.com/*" does not match URLs beginning with "https://example.com:8443/", which is a different origin.
- Return parser’s result.
- If parser’s state is not "init", not "authority", and not "done", then set parser’s result[parser’s state] to the result of running make a component string given parser.
- If parser’s state is not "init" and new state is not "done", then:
  - If parser’s state is "protocol", "authority", "username", or "password"; new state is "port", "pathname", "search", or "hash"; and parser’s result["hostname"] does not exist, then set parser’s result["hostname"] to the empty string.
  - If parser’s state is "protocol", "authority", "username", "password", "hostname", or "port"; new state is "search" or "hash"; and parser’s result["pathname"] does not exist, then:
    - If parser’s protocol matches a special scheme flag is true, then set parser’s result["pathname"] to "/".
    - Otherwise, set parser’s result["pathname"] to the empty string.
  - If parser’s state is "protocol", "authority", "username", "password", "hostname", "port", or "pathname"; new state is "hash"; and parser’s result["search"] does not exist, then set parser’s result["search"] to the empty string.
- Set parser’s state to new state.
- Increment parser’s token index by skip.
- Set parser’s component start to parser’s token index.
- Set parser’s token increment to 0.
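The defaulting rules in the steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the algorithm in full: the function name `fill_defaults`, the plain-dict result, and the `special_scheme` parameter (standing in for the protocol matches a special scheme flag) are introduced here, and the step that stores the finished component string is left out.

```python
# Hypothetical helper: only the "does not exist" defaulting rules are modelled.
BEFORE_HOST = {"protocol", "authority", "username", "password"}
BEFORE_PATH = BEFORE_HOST | {"hostname", "port"}
BEFORE_SEARCH = BEFORE_PATH | {"pathname"}


def fill_defaults(result, state, new_state, special_scheme):
    """Fill in components that are skipped when moving from state to new_state."""
    if state == "init" or new_state == "done":
        return result
    # Skipping over the hostname: it defaults to the empty string.
    if (state in BEFORE_HOST
            and new_state in {"port", "pathname", "search", "hash"}
            and "hostname" not in result):
        result["hostname"] = ""
    # Skipping over the pathname: "/" for special schemes, otherwise empty.
    if (state in BEFORE_PATH
            and new_state in {"search", "hash"}
            and "pathname" not in result):
        result["pathname"] = "/" if special_scheme else ""
    # Skipping over the search: it defaults to the empty string.
    if (state in BEFORE_SEARCH
            and new_state == "hash"
            and "search" not in result):
        result["search"] = ""
    return result


# Moving from the hostname straight to the hash, as in "https://example.com#top".
filled = fill_defaults({"protocol": "https", "hostname": "example.com"},
                       "hostname", "hash", special_scheme=True)
print(filled)
```

In this sketch, the skipped pathname and search are filled with exact values rather than left as wildcards, which is why a constructor string that omits them only matches URLs with the default pathname and an empty search.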
-
Set parser ’s token index to parser ’s component start .
-
Set parser ’s token increment to 0.
-
If index is less than parser ’s token list ’s size , then return parser ’s token list [ index ].
-
Assert : parser ’s token list ’s size is greater than or equal to 1.
-
Let last index be parser ’s token list ’s size − 1.
-
Let token be parser ’s token list [ last index ].
-
Return token .
- Let token be the result of running get a safe token given parser and index.
- If token’s value is not value, then return false.
- If any of the following are true:
  - token’s type is "char";
  - token’s type is "escaped-char"; or
  - token’s type is "invalid-char",
  then return true.
- Return false.
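The two helper algorithms above (get a safe token, and is a non-special pattern char) are small enough to sketch directly. The following Python is illustrative only: the `Token` dataclass, the function names, and the hand-built token list are assumptions made for this example. The token list is what a "lenient" tokenization of "a:/x" would plausibly look like, where the ":" cannot start a name and so becomes an "invalid-char" token.

```python
from dataclasses import dataclass


@dataclass
class Token:
    type: str
    index: int
    value: str


def get_safe_token(tokens, index):
    """Return tokens[index], falling back to the last token past the end."""
    if index < len(tokens):
        return tokens[index]
    assert len(tokens) >= 1
    return tokens[-1]


def is_non_special_pattern_char(tokens, index, value):
    """True if the token at index is a plain character equal to value."""
    token = get_safe_token(tokens, index)
    if token.value != value:
        return False
    return token.type in {"char", "escaped-char", "invalid-char"}


# Hand-built token list for the input "a:/x" (assumed lenient tokenization).
tokens = [
    Token("char", 0, "a"),
    Token("invalid-char", 1, ":"),
    Token("char", 2, "/"),
    Token("char", 3, "x"),
    Token("end", 4, ""),
]

print(is_non_special_pattern_char(tokens, 1, ":"))
print(get_safe_token(tokens, 99).type)
```

Accepting "invalid-char" here is what lets the constructor string parser recognize the ":" in "https://" as a protocol suffix even though the tokenizer could not read it as the start of a name.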
- Return the result of running is a non-special pattern char given parser, parser’s token index, and ":".

- If the result of running is a non-special pattern char given parser, parser’s token index + 1, and "/" is false, then return false.
- If the result of running is a non-special pattern char given parser, parser’s token index + 2, and "/" is false, then return false.
- Return true.

- Return the result of running is a non-special pattern char given parser, parser’s token index, and "@".

- Return the result of running is a non-special pattern char given parser, parser’s token index, and ":".

- Return the result of running is a non-special pattern char given parser, parser’s token index, and ":".

- Return the result of running is a non-special pattern char given parser, parser’s token index, and "/".
- If the result of running is a non-special pattern char given parser, parser’s token index, and "?" is true, then return true.
- If parser’s token list[parser’s token index]’s value is not "?", then return false.
Let previous index be parser ’s token index − 1.
-
If previous index is less than 0, then return true.
-
Let previous token be the result of running get a safe token given parser and previous index .
-
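If any of the following are true, then return false:
- previous token’s type is "name";
- previous token’s type is "regexp";
- previous token’s type is "close"; or
- previous token’s type is "asterisk".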
-
Return true.
- Return the result of running is a non-special pattern char given parser, parser’s token index, and "#".

- If parser’s token list[parser’s token index]’s type is "open", then return true.
- Otherwise return false.

- If parser’s token list[parser’s token index]’s type is "close", then return true.
- Otherwise return false.

- Return the result of running is a non-special pattern char given parser, parser’s token index, and "[".

- Return the result of running is a non-special pattern char given parser, parser’s token index, and "]".
-
Assert : parser ’s token index is less than parser ’s token list ’s size .
-
Let token be parser ’s token list [ parser ’s token index ].
-
Let component start token be the result of running get a safe token given parser and parser ’s component start .
-
Let component start input index be component start token ’s index .
-
Let end index be token ’s index .
-
Return the code point substring from component start input index to end index within parser ’s input .
-
Let protocol string be the result of running make a component string given parser .
-
Let protocol component be the result of compiling a component given protocol string , canonicalize a protocol , and default options .
-
If the result of running protocol component matches a special scheme given protocol component is true, then set parser ’s protocol matches a special scheme flag to true.
2. Pattern strings
A pattern string is a string that is written to match a set of target strings. A well formed pattern string conforms to a particular pattern syntax. This pattern syntax is directly based on the syntax used by the popular path-to-regexp JavaScript library.
It can be parsed to produce a part list which describes, in order, what must appear in a component string for the pattern string to match.
A pattern string can contain capture groups, which by default match the shortest possible string, up to a component-specific separator (/ in the pathname, . in the hostname). For example, the pathname pattern "/blog/:title" will match "/blog/hello-world" but not "/blog/2012/02".
A regular expression enclosed in parentheses can also be used instead, so the pathname pattern "/blog/:year(\\d+)/:month(\\d+)" will match "/blog/2012/02".
A group can also be made optional, or repeated, by using a modifier. For example, the pathname pattern "/products/:id?" will match both "/products" and "/products/2" (but not "/products/"). In the pathname specifically, groups automatically require a leading /; to avoid this, the group can be explicitly delimited, as in the pathname pattern "/products/{:id}?".
A full wildcard * can also be used to match as much as possible, as in the pathname pattern "/products/*".
2.1. Parsing pattern strings
2.1.1. Tokens
A token list is a list containing zero or more token structs .
A token is a struct representing a single lexical token within a pattern string .
A token has an associated type, a string, initially "invalid-char". It must be one of the following:
- "open" - The token represents a U+007B ({) code point.
- "close" - The token represents a U+007D (}) code point.
- "regexp" - The token represents a string of the form "(<regular expression>)". The regular expression is required to consist of only ASCII code points.
- "name" - The token represents a string of the form ":<name>". The name value is restricted to code points that are consistent with JavaScript identifiers.
- "char" - The token represents a valid pattern code point without any special syntactical meaning.
- "escaped-char" - The token represents a code point escaped using a backslash like "\<char>".
- "other-modifier" - The token represents a matching group modifier that is either the U+003F (?) or U+002B (+) code points.
- "asterisk" - The token represents a U+002A (*) code point that can be either a wildcard matching group or a matching group modifier.
- "end" - The token represents the end of the pattern string.
- "invalid-char" - The token represents a code point that is invalid in the pattern. This could be because of the code point value itself or due to its location within the pattern relative to other syntactic elements.
A token has an associated index , a number, initially 0. It is the position of the first code point in the pattern string represented by the token .
A token has an associated value , a string, initially the empty string. It contains the code points from the pattern string represented by the token .
2.1.2. Tokenizing
A tokenize policy is a string that must be either "strict" or "lenient".
A tokenizer is a struct.
A tokenizer has an associated input, a pattern string, initially the empty string.
A tokenizer has an associated policy, a tokenize policy, initially "strict".
A tokenizer has an associated token list, a token list, initially an empty list.
A tokenizer has an associated index, a number, initially 0.
A tokenizer has an associated next index, a number, initially 0.
A tokenizer has an associated code point, a Unicode code point, initially null.
- Let tokenizer be a new tokenizer.
- Set tokenizer’s input to input.
- Set tokenizer’s policy to policy.
- While tokenizer’s index is less than tokenizer’s input’s code point length:
  - Run seek and get the next code point given tokenizer and tokenizer’s index.
  - If tokenizer’s code point is U+002A (*):
    - Run add a token with default position and length given tokenizer and "asterisk".
    - Continue.
  - If tokenizer’s code point is U+002B (+) or U+003F (?):
    - Run add a token with default position and length given tokenizer and "other-modifier".
    - Continue.
  - If tokenizer’s code point is U+005C (\):
    - If tokenizer’s index is equal to tokenizer’s input’s code point length − 1:
      - Run process a tokenizing error given tokenizer, tokenizer’s next index, and tokenizer’s index.
      - Continue.
    - Let escaped index be tokenizer’s next index.
    - Run get the next code point given tokenizer.
    - Run add a token with default length given tokenizer, "escaped-char", tokenizer’s next index, and escaped index.
    - Continue.
  - If tokenizer’s code point is U+007B ({):
    - Run add a token with default position and length given tokenizer and "open".
    - Continue.
  - If tokenizer’s code point is U+007D (}):
    - Run add a token with default position and length given tokenizer and "close".
    - Continue.
  - If tokenizer’s code point is U+003A (:):
    - Let name position be tokenizer’s next index.
    - Let name start be name position.
    - While name position is less than tokenizer’s input’s code point length:
      - Run seek and get the next code point given tokenizer and name position.
      - Let first code point be true if name position equals name start and false otherwise.
      - Let valid code point be the result of running is a valid name code point given tokenizer’s code point and first code point.
      - If valid code point is false break.
      - Set name position to tokenizer’s next index.
    - If name position is less than or equal to name start:
      - Run process a tokenizing error given tokenizer, name start, and tokenizer’s index.
      - Continue.
    - Run add a token with default length given tokenizer, "name", name position, and name start.
    - Continue.
  - If tokenizer’s code point is U+0028 ((): 
    - Let depth be 1.
    - Let regexp position be tokenizer’s next index.
    - Let regexp start be regexp position.
    - Let error be false.
    - While regexp position is less than tokenizer’s input’s code point length:
      - Run seek and get the next code point given tokenizer and regexp position.
      - If tokenizer’s code point is not an ASCII code point:
        - Run process a tokenizing error given tokenizer, regexp start, and tokenizer’s index.
        - Set error to true.
        - Break.
      - If regexp position equals regexp start and tokenizer’s code point is U+003F (?):
        - Run process a tokenizing error given tokenizer, regexp start, and tokenizer’s index.
        - Set error to true.
        - Break.
      - If tokenizer’s code point is U+005C (\):
        - If regexp position equals tokenizer’s input’s code point length − 1:
          - Run process a tokenizing error given tokenizer, regexp start, and tokenizer’s index.
          - Set error to true.
        - Run get the next code point given tokenizer.
        - If tokenizer’s code point is not an ASCII code point:
          - Run process a tokenizing error given tokenizer, regexp start, and tokenizer’s index.
          - Set error to true.
          - Break.
        - Set regexp position to tokenizer’s next index.
        - Continue.
      - If tokenizer’s code point is U+0029 ()):
        - Decrement depth by 1.
        - If depth is 0:
          - Set regexp position to tokenizer’s next index.
          - Break.
      - Otherwise if tokenizer’s code point is U+0028 ((): 
        - Increment depth by 1.
        - If regexp position equals tokenizer’s input’s code point length − 1:
          - Run process a tokenizing error given tokenizer, regexp start, and tokenizer’s index.
          - Set error to true.
        - Let temporary position be tokenizer’s next index.
        - Run get the next code point given tokenizer.
        - If tokenizer’s code point is not U+003F (?):
          - Run process a tokenizing error given tokenizer, regexp start, and tokenizer’s index.
          - Set error to true.
          - Break.
        - Set tokenizer’s next index to temporary position.
      - Set regexp position to tokenizer’s next index.
    - If error is true continue.
    - If depth is not zero:
      - Run process a tokenizing error given tokenizer, regexp start, and tokenizer’s index.
      - Continue.
    - Let regexp length be regexp position − regexp start − 1.
    - If regexp length is zero:
      - Run process a tokenizing error given tokenizer, regexp start, and tokenizer’s index.
      - Continue.
    - Run add a token given tokenizer, "regexp", regexp position, regexp start, and regexp length.
    - Continue.
  - Run add a token with default position and length given tokenizer and "char".
- Run add a token with default length given tokenizer, "end", tokenizer’s index, and tokenizer’s index.
- Return tokenizer’s token list.
-
Set tokenizer ’s code point to the Unicode code point in tokenizer ’s input at the position indicated by tokenizer ’s next index .
-
Increment tokenizer ’s next index by 1.
-
Set tokenizer ’s next index to index .
-
Run get the next code point given tokenizer .
-
Let token be a new token .
-
Set token ’s type to type .
-
Set token ’s value to the code point substring from value position with length value length within tokenizer ’s input .
-
Append token to the back of tokenizer ’s token list .
-
Set tokenizer ’s index to next position .
-
Let computed length be next position − value position .
-
Run add a token given tokenizer , type , next position , value position , and computed length .
-
Run add a token with default length given tokenizer , type , tokenizer ’s next index , and tokenizer ’s index .
-
If tokenizer’s policy is "strict", then throw a TypeError.
-
Run add a token with default length given tokenizer, "invalid-char", next position, and value position.
-
If first is true return the result of checking if code point is contained in the IdentifierStart set of code points.
-
Otherwise return the result of checking if code point is contained in the IdentifierPart set of code points.
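To make the tokenizing steps above easier to follow, here is a compact Python sketch of the same state machine. It is a simplified illustration, not a conformance-grade implementation. The names `Token`, `tokenize`, `scan_regexp`, and `is_valid_name_code_point` are chosen for this example; `str.isidentifier()` is only an approximation of the IdentifierStart and IdentifierPart sets; each token is assumed to record the tokenizer index at which it started; and every error inside a regexp group is treated as ending the scan immediately.

```python
from dataclasses import dataclass


@dataclass
class Token:
    type: str
    index: int
    value: str


def is_valid_name_code_point(ch, first):
    # Assumption: str.isidentifier() approximates IdentifierStart and
    # IdentifierPart; "$" is added because JavaScript identifiers allow it.
    if ch == "$":
        return True
    return ch.isidentifier() if first else ("a" + ch).isidentifier()


def scan_regexp(text, start):
    """Return the position just past the closing ")" or None on any error."""
    depth, pos, n = 1, start, len(text)
    while pos < n:
        c = text[pos]
        if not c.isascii() or (pos == start and c == "?"):
            return None
        if c == "\\":
            if pos == n - 1 or not text[pos + 1].isascii():
                return None
            pos += 2  # skip the backslash and the escaped code point
            continue
        if c == ")":
            depth -= 1
            if depth == 0:
                return pos + 1
        elif c == "(":
            depth += 1
            # Nested groups are only accepted in the "(?" form.
            if pos == n - 1 or text[pos + 1] != "?":
                return None
        pos += 1
    return None  # ran out of input with an unclosed "("


def tokenize(text, policy="strict"):
    tokens, i, n = [], 0, len(text)

    def add(kind, next_pos, value_pos, length=None):
        # Record a token starting at the current index, then advance.
        nonlocal i
        if length is None:
            length = next_pos - value_pos
        tokens.append(Token(kind, i, text[value_pos:value_pos + length]))
        i = next_pos

    def error(next_pos, value_pos):
        if policy == "strict":
            raise TypeError(f"invalid pattern syntax near index {i}")
        add("invalid-char", next_pos, value_pos)

    while i < n:
        ch = text[i]
        if ch == "*":
            add("asterisk", i + 1, i)
        elif ch in "+?":
            add("other-modifier", i + 1, i)
        elif ch == "\\":
            if i == n - 1:
                error(i + 1, i)
            else:
                add("escaped-char", i + 2, i + 1)
        elif ch in "{}":
            add("open" if ch == "{" else "close", i + 1, i)
        elif ch == ":":
            pos = start = i + 1
            while pos < n and is_valid_name_code_point(text[pos], pos == start):
                pos += 1
            if pos <= start:
                error(start, i)
            else:
                add("name", pos, start)
        elif ch == "(":
            start = i + 1
            end = scan_regexp(text, start)
            if end is None or end - start - 1 == 0:
                error(start, i)
            else:
                add("regexp", end, start, end - start - 1)
        else:
            add("char", i + 1, i)
    add("end", i, i)
    return tokens


tokens = tokenize("/:id(\\d+)?")
lenient = tokenize("https://", "lenient")
print([(t.type, t.value) for t in tokens])
```

Note that the value of a "name" token excludes the leading ":" and the value of a "regexp" token excludes the enclosing parentheses, which follows from the value positions passed in the steps above. With the "lenient" policy, the ":" in "https://" becomes an "invalid-char" token rather than raising an error.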
2.1.3. Parts
A part list is a list of zero or more parts .
A part is a struct representing one piece of a parser pattern string . It can contain at most one matching group, a fixed text prefix, a fixed text suffix, and a modifier. It can contain as little as a single fixed text string or a single matching group.
A part has an associated type, a string, which must be set upon creation. It must be one of the following:
- "fixed-text" - The part represents a simple fixed text string.
- "regexp" - The part represents a matching group with a custom regular expression.
- "segment-wildcard" - The part represents a matching group that matches code points up to the next separator code point. This is typically used for a named group like ":foo" that does not have a custom regular expression.
- "full-wildcard" - The part represents a matching group that greedily matches all code points. This is typically used for the "*" wildcard matching group.
A part has an associated value , a string, which must be set upon creation.
A part has an associated modifier, a string, which must be set upon creation. It must be one of the following:
- "none" - The part does not have a modifier.
- "optional" - The part has an optional modifier indicated by the U+003F (?) code point.
- "zero-or-more" - The part has a "zero or more" modifier indicated by the U+002A (*) code point.
- "one-or-more" - The part has a "one or more" modifier indicated by the U+002B (+) code point.
A part has an associated name , a string, initially the empty string.
A part has an associated prefix , a string, initially the empty string.
A part has an associated suffix , a string, initially the empty string.
2.1.4. Options
An options struct contains different settings that control how a pattern string behaves. These options originally come from path-to-regexp. We only include the options that are modified within the URLPattern specification and exclude the other options. For the purposes of comparison, this specification acts like path-to-regexp where strict, start, and end are always set to false.
An options struct has an associated delimiter code point, a string, which must be set upon creation. It must contain one ASCII code point or the empty string. This code point is treated as a segment separator and is used for determining how far a :foo named group should match by default. For example, if the delimiter code point is "/" then "/:foo" will match "/bar", but not "/bar/baz". If the delimiter code point is the empty string then the example pattern would match both strings.
An options struct has an associated prefix code point, a string, which must be set upon creation. It must contain one ASCII code point or the empty string. The code point is treated as an automatic prefix if found immediately preceding a match group. This matters when a match group is modified to be optional or repeating. For example, if the prefix code point is "/" then "/foo/:bar?/baz" will treat the "/" before ":bar" as a prefix that becomes optional along with the named group. So in this example the pattern would match "/foo/baz".
An options struct has an associated ignore case, a boolean, which must be set upon creation. It defaults to false. When it is true, matching is case-insensitive; when it is false, matching is case-sensitive. For the purpose of comparison, this can be thought of as the negated sensitive option in path-to-regexp.
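The effect of the prefix code point and ignore case settings can be seen with a small Python experiment. Everything here is an assumption made for illustration: the `Options` dataclass and `compile_with` helper are hypothetical, and the regular expressions are written by hand to approximate what "/foo/:bar?/baz" means with and without "/" as the prefix code point. They are not produced by the algorithms in this specification.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Options:
    delimiter_code_point: str = ""
    prefix_code_point: str = ""
    ignore_case: bool = False


def compile_with(options, regexp_source):
    """Compile a hand-written regexp, honouring only the ignore case flag."""
    flags = re.IGNORECASE if options.ignore_case else 0
    return re.compile(regexp_source, flags)


pathname = Options(delimiter_code_point="/", prefix_code_point="/")

# "/foo/:bar?/baz" when "/" is the prefix code point: the "/" before ":bar"
# belongs to the optional group, so it disappears together with the group.
with_prefix = compile_with(pathname, r"^/foo(?:/([^/]+?))?/baz$")

# The same pattern with no prefix code point: only the group itself is
# optional, so both slashes around it stay mandatory.
without_prefix = compile_with(Options(delimiter_code_point="/"),
                              r"^/foo/([^/]+?)?/baz$")

case_insensitive = compile_with(Options(ignore_case=True), r"^/foo$")

print(bool(with_prefix.match("/foo/baz")))
print(bool(without_prefix.match("/foo/baz")))
```

The first expression accepts "/foo/baz" while the second only accepts "/foo//baz" or "/foo/x/baz", which is the difference the prefix code point paragraph describes.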
2.1.5. Parsing
A pattern parser is a struct .
A pattern parser has an associated token list , a token list , initially an empty list .
A pattern parser has an associated encoding callback, an encoding callback, that must be set upon creation.
A pattern parser has an associated segment wildcard regexp , a string, that must be set upon creation.
A pattern parser has an associated part list , a part list , initially an empty list .
A pattern parser has an associated pending fixed value , a string, initially the empty string.
A pattern parser has an associated index , a number, initially 0.
A pattern parser has an associated next numeric name , a number, initially 0.
- Let parser be a new pattern parser whose encoding callback is encoding callback and segment wildcard regexp is the result of running generate a segment wildcard regexp given options.
- Set parser’s token list to the result of running tokenize given input and "strict".
- While parser’s index is less than parser’s token list’s size:
  This first section is looking for the sequence: <prefix char><name><regexp><modifier>. There could be zero to all of these tokens.
  - "/:foo(bar)?" - All four tokens.
  - "/" - One "char" token.
  - ":foo" - One "name" token.
  - "(bar)" - One "regexp" token.
  - "/:foo" - "char" and "name" tokens.
  - "/(bar)" - "char" and "regexp" tokens.
  - "/:foo?" - "char", "name", and "other-modifier" tokens.
  - "/(bar)?" - "char", "regexp", and "other-modifier" tokens.

  - Let char token be the result of running try to consume a token given parser and "char".
  - Let name token be the result of running try to consume a token given parser and "name".
  - Let regexp or wildcard token be the result of running try to consume a regexp or wildcard token given parser and name token.
  - If name token is not null or regexp or wildcard token is not null:
    If there is a matching group, we need to add the part immediately.
    - Let prefix be the empty string.
    - If char token is not null then set prefix to char token’s value.
    - If prefix is not the empty string and not options’s prefix code point:
      - Append prefix to the end of parser’s pending fixed value.
      - Set prefix to the empty string.
    - Run maybe add a part from the pending fixed value given parser.
    - Let modifier token be the result of running try to consume a modifier token given parser.
    - Run add a part given parser, prefix, name token, regexp or wildcard token, the empty string, and modifier token.
    - Continue.
  - Let fixed token be char token.
    If there was no matching group, then we need to buffer any fixed text. We want to collect as much text as possible before adding it as a "fixed-text" part.
  - If fixed token is null, then set fixed token to the result of running try to consume a token given parser and "escaped-char".
  - If fixed token is not null:
    - Append fixed token’s value to parser’s pending fixed value.
    - Continue.
  - Let open token be the result of running try to consume a token given parser and "open".
  - If open token is not null:
    - Let prefix be the result of running consume text given parser.
    - Set name token to the result of running try to consume a token given parser and "name".
    - Set regexp or wildcard token to the result of running try to consume a regexp or wildcard token given parser and name token.
    - Let suffix be the result of running consume text given parser.
    - Run consume a required token given parser and "close".
    - Let modifier token be the result of running try to consume a modifier token given parser.
    - Run add a part given parser, prefix, name token, regexp or wildcard token, suffix, and modifier token.
    - Continue.
  - Run maybe add a part from the pending fixed value given parser.
  - Run consume a required token given parser and "end".
- Return parser’s part list.
The full wildcard regexp value is the string ".*".
- Let result be "[^".
- Append the result of running escape a regexp string given options’s delimiter code point to the end of result.
- Append "]+?" to the end of result.
- Return result.
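The segment wildcard steps above amount to a one-line string construction. The Python sketch below shows the resulting strings for a few delimiter code points; the function names are chosen for this example, and the escaping helper is a condensed stand-in for escape a regexp string. One caveat worth stating: the output for an empty delimiter, "[^]+?", is meaningful in JavaScript regular expression syntax (where "[^]" matches any code point) but is rejected by Python's `re` module, so only the other cases are compiled here.

```python
import re

# Code points that the sketch escapes with a backslash.
REGEXP_SPECIAL = set(".+*?^${}()[]|/\\")

FULL_WILDCARD_REGEXP_VALUE = ".*"


def escape_regexp_string(text):
    return "".join("\\" + c if c in REGEXP_SPECIAL else c for c in text)


def generate_segment_wildcard_regexp(delimiter_code_point):
    """Build a lazy "anything but the delimiter" expression."""
    return "[^" + escape_regexp_string(delimiter_code_point) + "]+?"


pathname_wildcard = generate_segment_wildcard_regexp("/")
hostname_wildcard = generate_segment_wildcard_regexp(".")
no_delimiter_wildcard = generate_segment_wildcard_regexp("")

# "/:foo" with "/" as the delimiter: one segment only.
segment = re.compile("^/(" + pathname_wildcard + ")$")
print(pathname_wildcard, hostname_wildcard, no_delimiter_wildcard)
print(bool(segment.match("/bar")), bool(segment.match("/bar/baz")))
```

The lazy quantifier "+?" is what makes a named group match the shortest possible string, while the negated character class stops it at the next separator.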
- Assert: parser’s index is less than parser’s token list’s size.
- Let next token be parser’s token list[parser’s index].
- If next token’s type is not type return null.
- Increment parser’s index by 1.
- Return next token.

- Let token be the result of running try to consume a token given parser and "other-modifier".
- If token is not null, then return token.
- Set token to the result of running try to consume a token given parser and "asterisk".
- Return token.

- Let token be the result of running try to consume a token given parser and "regexp".
- If name token is null and token is null, then set token to the result of running try to consume a token given parser and "asterisk".
- Return token.

- Let result be the result of running try to consume a token given parser and type.
- If result is null, then throw a TypeError.
- Return result.

- Let result be the empty string.
- While true:
  - Let token be the result of running try to consume a token given parser and "char".
  - If token is null, then set token to the result of running try to consume a token given parser and "escaped-char".
  - If token is null, then break.
  - Append token’s value to the end of result.
- Return result.

- If parser’s pending fixed value is the empty string, then return.
- Let encoded value be the result of running parser’s encoding callback given parser’s pending fixed value.
- Set parser’s pending fixed value to the empty string.
- Let part be a new part whose type is "fixed-text", value is encoded value, and modifier is "none".
- Let modifier be "none".
- If modifier token is not null:
  - If modifier token’s value is "?" then set modifier to "optional".
  - Otherwise if modifier token’s value is "*" then set modifier to "zero-or-more".
  - Otherwise if modifier token’s value is "+" then set modifier to "one-or-more".
- If name token is null and regexp or wildcard token is null and modifier is "none":
  This was a "{foo}" grouping. We add this to the pending fixed value so that it will be combined with any previous or subsequent text.
  - Append prefix to the end of parser’s pending fixed value.
  - Return.
- Run maybe add a part from the pending fixed value given parser.
- If name token is null and regexp or wildcard token is null:
  This was a "{foo}?" grouping. The modifier means we cannot combine it with other text. Therefore we add it as a part immediately.
  - Assert: suffix is the empty string.
  - If prefix is the empty string, then return.
  - Let encoded value be the result of running parser’s encoding callback given prefix.
  - Let part be a new part whose type is "fixed-text", value is encoded value, and modifier is modifier.
  - Return.
- Let regexp value be the empty string.
  Next, we convert the regexp or wildcard token into a regular expression.
- If regexp or wildcard token is null, then set regexp value to parser’s segment wildcard regexp.
- Otherwise if regexp or wildcard token’s type is "asterisk", then set regexp value to the full wildcard regexp value.
- Otherwise set regexp value to regexp or wildcard token’s value.
- Let type be "regexp".
  Next, we convert regexp value into a part type. We make sure to go to a regular expression first so that an equivalent "regexp" token will be treated the same as a "name" or "asterisk" token.
- If regexp value is parser’s segment wildcard regexp:
  - Set type to "segment-wildcard".
  - Set regexp value to the empty string.
- Otherwise if regexp value is the full wildcard regexp value:
  - Set type to "full-wildcard".
  - Set regexp value to the empty string.
- Let name be the empty string.
  Next, we determine the part name. This can be explicitly provided by a "name" token or be automatically assigned.
- If name token is not null, then set name to name token’s value.
- Otherwise if regexp or wildcard token is not null:
  - Set name to parser’s next numeric name, serialized.
  - Increment parser’s next numeric name by 1.
- If the result of running is a duplicate name given parser and name is true, then throw a TypeError.
- Let encoded prefix be the result of running parser’s encoding callback given prefix.
  Finally, we encode the fixed text values and create the part.
- Let encoded suffix be the result of running parser’s encoding callback given suffix.
- Let part be a new part whose type is type, value is regexp value, modifier is modifier, name is name, prefix is encoded prefix, and suffix is encoded suffix.
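The type and name resolution in the middle of the steps above is easiest to see in isolation. The Python sketch below covers only that portion, assuming at least one of the two tokens is present. The `Token` dataclass, the function name `classify_group`, and its parameters are hypothetical; the caller is assumed to handle the modifier, the duplicate name check, the encoding callback, and the increment of the next numeric name.

```python
from dataclasses import dataclass

FULL_WILDCARD_REGEXP_VALUE = ".*"


@dataclass
class Token:
    type: str
    value: str


def classify_group(name_token, regexp_or_wildcard_token,
                   segment_wildcard_regexp, next_numeric_name):
    """Return (part type, regexp value, name) for a matching group."""
    # First normalize every kind of group to a regular expression string.
    if regexp_or_wildcard_token is None:
        regexp_value = segment_wildcard_regexp
    elif regexp_or_wildcard_token.type == "asterisk":
        regexp_value = FULL_WILDCARD_REGEXP_VALUE
    else:
        regexp_value = regexp_or_wildcard_token.value

    # Then map well-known expressions back to the dedicated part types, so
    # "(.*)" and "*" end up classified identically.
    part_type = "regexp"
    if regexp_value == segment_wildcard_regexp:
        part_type, regexp_value = "segment-wildcard", ""
    elif regexp_value == FULL_WILDCARD_REGEXP_VALUE:
        part_type, regexp_value = "full-wildcard", ""

    # Named groups keep their name; anonymous ones get a numeric name.
    name = name_token.value if name_token is not None else str(next_numeric_name)
    return part_type, regexp_value, name


SEGMENT = "[^\\/]+?"
print(classify_group(Token("name", "foo"), None, SEGMENT, 0))
print(classify_group(None, Token("regexp", ".*"), SEGMENT, 0))
print(classify_group(None, Token("asterisk", "*"), SEGMENT, 1))
```

Going through the regular expression string first is the design choice the note in the steps calls out: a hand-written "(.*)" or "([^\/]+?)" is indistinguishable from "*" or ":foo" once classified.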
2.2. Converting part lists to regular expressions
-
Let result be "
^
". -
Let name list be a new list .
-
For each part of part list :
-
If part ’s type is "
fixed-text
":-
If part ’s modifier is "
none
", then append the result of running escape a regexp string given part ’s value to the end of result . -
Otherwise:
A "
fixed-text
" part with a modifier uses a non capturing group. It uses the following form.(?:<fixed text>)<modifier>
-
Append "
(?:
" to the end of result . -
Append the result of running escape a regexp string given part ’s value to the end of result .
-
Append "
)
" to the end of result . -
Append the result of running convert a modifier to a string given part ’s modifier to the end of result .
-
-
Continue .
-
-
Append part ’s name to name list .
We collect the list of matching group names in a parallel list. This is largely done for legacy reasons to match path-to-regexp . We could attempt to convert this to use regular expression named captured groups, but given the complexity of this algorithm there is a real risk of introducing unintended bugs. In addition, if we ever end up exposing the generated regular expressions to the web we would like to maintain compability with path-to-regexp which has indicated its unlikely to switch to using named capture groups.
-
Let regexp value be part ’s value .
-
If part ’s type is "
segment-wildcard
", then set regexp value to the result of running generate a segment wildcard regexp given options . -
Otherwise if part ’s type is "
full-wildcard
", then set regexp value to full wildcard regexp value . -
If part ’s prefix is the empty string and part ’s suffix is the empty string:
If there is no prefix or suffix then generation depends on the modifier. If there is no modifier or just the optional modifier, it uses the following simple form:
(<regexp value>)<modifier>
If there is a repeating modifier, however, we will use the more complex form:
((?:<regexp value>)<modifier>)
-
If part ’s modifier is "
none
" or "optional
", then:-
Append "
(
" to the end of result . -
Append regexp value to the end of result .
-
Append "
)
" to the end of result . -
Append the result of running convert a modifier to a string given part ’s modifier to the end of result .
-
-
Otherwise:
-
Append "
((?:
" to the end of result . -
Append regexp value to the end of result .
-
Append "
)
" to the end of result . -
Append the result of running convert a modifier to a string given part ’s modifier to the end of result .
-
Append "
)
" to the end of result .
-
-
Continue .
-
-
If part ’s modifier is "
none
" or "optional
":This section handles non-repeating parts with a prefix or suffix . There is an inner capturing group that contains the primary regexp value . The inner group is then combined with the prefix or suffix in an outer non-capturing group. Finally the modifier is applied. The resulting form is as follows.
(?:<prefix>(<regexp value>)<suffix>)<modifier>
-
Append "
(?:
" to the end of result . -
Append the result of running escape a regexp string given part ’s prefix to the end of result .
-
Append "
(
" to the end of result . -
Append regexp value to the end of result .
-
Append "
)
" to the end of result . -
Append the result of running escape a regexp string given part ’s suffix to the end of result .
-
Append "
)
" to the end of result . -
Append the result of running convert a modifier to a string given part ’s modifier to the end of result .
-
Continue .
-
-
Assert : part ’s modifier is "
zero-or-more
" or "one-or-more
". -
Assert : part ’s prefix is not the empty string or part ’s suffix is not the empty string.
Repeating parts with a prefix or suffix are dramatically more complicated. We want to exclude the initial prefix and the final suffix , but include them between any repeated elements. To achieve this we provide a separate initial expression that excludes the prefix . Then the expression is duplicated with the prefix / suffix values included in an optional repeating element. If zero values are permitted then a final optional modifier can be appended. The resulting form is as follows.
(?:<prefix>((?:<regexp value>)(?:<suffix><prefix>(?:<regexp value>))*)<suffix>)?
-
Append "
(?:
" to the end of result . -
Append the result of running escape a regexp string given part ’s prefix to the end of result .
-
Append "
((?:
" to the end of result . -
Append regexp value to the end of result .
-
Append "
)(?:
" to the end of result . -
Append the result of running escape a regexp string given part ’s suffix to the end of result .
-
Append the result of running escape a regexp string given part ’s prefix to the end of result .
-
Append "
(?:
" to the end of result . -
Append regexp value to the end of result .
-
Append "
))*)
" to the end of result . -
Append the result of running escape a regexp string given part ’s suffix to the end of result .
-
Append "
)
" to the end of result . -
If part ’s modifier is "
zero-or-more
" then append "?
" to the end of result .
-
-
Append "
$
" to the end of result . -
Return ( result , name list ).
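The two wrapped forms described above can be tried out directly. The following non-normative TypeScript sketch is illustrative only: `wrapPart` is a hypothetical helper, not an algorithm defined by this specification, and it assumes that the prefix and suffix have already been passed through escape a regexp string and that at least one of them is non-empty.

```typescript
type Modifier = "none" | "optional" | "zero-or-more" | "one-or-more";

// Hypothetical helper: wraps a regexp value with an already-escaped prefix
// and/or suffix, following the two forms shown in the steps above.
function wrapPart(prefix: string, regexpValue: string, suffix: string, modifier: Modifier): string {
  if (modifier === "none" || modifier === "optional") {
    // (?:<prefix>(<regexp value>)<suffix>)<modifier>
    const tail = modifier === "optional" ? "?" : "";
    return "(?:" + prefix + "(" + regexpValue + ")" + suffix + ")" + tail;
  }
  // (?:<prefix>((?:<regexp value>)(?:<suffix><prefix>(?:<regexp value>))*)<suffix>)?
  const body =
    "(?:" + prefix + "((?:" + regexpValue + ")(?:" + suffix + prefix +
    "(?:" + regexpValue + "))*)" + suffix + ")";
  return modifier === "zero-or-more" ? body + "?" : body;
}

// A one-or-more part with prefix "/" and a segment-like regexp value.
const repeatedSource = "^" + wrapPart("\\/", "[^\\/]+?", "", "one-or-more") + "$";
const repeatedMatch = new RegExp(repeatedSource).exec("/a/b/c");
// The single capture excludes the initial prefix but keeps the inner ones.
console.log(repeatedMatch ? repeatedMatch[1] : null); // "a/b/c"
```

The repeated form keeps exactly one capturing group, so the number of captures stays aligned with the name list regardless of how many times the part repeats.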
To escape a regexp string given a string input:
-
Assert : input is an ASCII string .
-
Let result be the empty string.
-
Let index be 0.
-
While index is less than input ’s length :
-
Let c be input [ index ].
-
Increment index by 1.
-
If c is one of:
- U+002E (.);
- U+002B (+);
- U+002A (*);
- U+003F (?);
- U+005E (^);
- U+0024 ($);
- U+007B ({);
- U+007D (});
- U+0028 (();
- U+0029 ());
- U+005B ([);
- U+005D (]);
- U+007C (|);
- U+002F (/); or
- U+005C (\),
then append "\" to the end of result.
-
Append c to the end of result .
-
-
Return result .
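As a non-normative illustration, the steps above can be sketched in TypeScript as follows. The function name `escapeRegexpString` is chosen here for readability and is not an API exposed by this specification; the sketch assumes its input is an ASCII string, as the algorithm asserts.

```typescript
// Code points that the steps above prefix with a backslash.
const REGEXP_SPECIAL = ".+*?^${}()[]|/\\";

function escapeRegexpString(input: string): string {
  let result = "";
  for (let index = 0; index < input.length; index++) {
    const c = input.charAt(index);
    if (REGEXP_SPECIAL.indexOf(c) !== -1) {
      result += "\\";
    }
    result += c;
  }
  return result;
}

console.log(escapeRegexpString("a.b/c")); // a\.b\/c
```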
2.3. Converting part lists to pattern strings
To generate a pattern string given a part list part list and options options:
-
Let result be the empty string.
-
Let index list be the result of getting the indices for part list .
-
For each index of index list :
-
Let part be part list [ index ].
-
Let previous part be part list [ index - 1] if index is greater than 0, otherwise let it be null.
-
Let next part be part list [ index + 1] if index is less than index list ’s size - 1, otherwise let it be null.
-
If part ’s type is "
fixed-text
" then:-
If part ’s modifier is "
none
" then:-
Append the result of running escape a pattern string given part ’s value to the end of result .
-
Continue .
-
-
Append "
{
" to the end of result . -
Append the result of running escape a pattern string given part ’s value to the end of result .
-
Append "
}
" to the end of result . -
Append the result of running convert a modifier to a string given part ’s modifier to the end of result .
-
Continue .
-
-
Let custom name be true if part ’s name [0] is not an ASCII digit ; otherwise false.
-
Let needs grouping be true if at least one of the following are true, otherwise let it be false:
- part ’s suffix is not the empty string.
- part ’s prefix is not the empty string and is not options ’s prefix code point .
-
If all of the following are true:
- needs grouping is false; and
- custom name is true; and
- part’s type is "segment-wildcard"; and
- part’s modifier is "none"; and
- next part is not null; and
- next part’s prefix is the empty string; and
- next part’s suffix is the empty string
then:
-
If next part ’s type is "
fixed-text
":-
Set needs grouping to true if the result of running is a valid name code point given next part ’s value ’s first code point and the boolean false is true.
-
-
Otherwise:
-
Set needs grouping to true if next part ’s name [0] is an ASCII digit .
-
-
If all of the following are true:
- needs grouping is false; and
- part ’s prefix is the empty string; and
- previous part is not null; and
- previous part’s type is "fixed-text"; and
- previous part’s value’s last code point is options’s prefix code point,
then set needs grouping to true.
-
If needs grouping is true, then append "
{
" to the end of result . -
Append the result of running escape a pattern string given part ’s prefix to the end of result .
-
If custom name is true:
-
Append "
:
" to the end of result . -
Append part ’s name to the end of result .
-
-
If part ’s type is "
regexp
" then:-
Append "
(
" to the end of result . -
Append part ’s value to the end of result .
-
Append "
)
" to the end of result .
-
-
Otherwise if part ’s type is "
segment-wildcard
" and custom name is false:-
Append "
(
" to the end of result . -
Append the result of running generate a segment wildcard regexp given options to the end of result .
-
Append "
)
" to the end of result .
-
-
Otherwise if part ’s type is "
full-wildcard
":-
If custom name is false and one of the following is true:
- previous part is null; or
- previous part’s type is "fixed-text"; or
- previous part’s modifier is not "none"; or
- needs grouping is true; or
- part’s prefix is not the empty string
then append "*" to the end of result.
-
Otherwise:
-
Append "
(
" to the end of result . -
Append full wildcard regexp value to the end of result .
-
Append "
)
" to the end of result .
-
-
-
If all of the following are true:
- part’s type is "segment-wildcard"; and
- custom name is true; and
- part’s suffix is not the empty string; and
- The result of running is a valid name code point given part’s suffix’s first code point and the boolean false is true
then append U+005C (\) to the end of result.
-
Append the result of running escape a pattern string given part ’s suffix to the end of result .
-
If needs grouping is true, then append "
}
" to the end of result . -
Append the result of running convert a modifier to a string given part ’s modifier to the end of result .
-
-
Return result .
To escape a pattern string given a string input:
-
Assert : input is an ASCII string .
-
Let result be the empty string.
-
Let index be 0.
-
While index is less than input ’s length :
-
Let c be input [ index ].
-
Increment index by 1.
-
If c is one of:
- U+002B (+);
- U+002A (*);
- U+003F (?);
- U+003A (:);
- U+007B ({);
- U+007D (});
- U+0028 (();
- U+0029 ()); or
- U+005C (\),
then append U+005C (\) to the end of result.
-
Append c to the end of result .
-
-
Return result .
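The steps above differ from regexp escaping only in the set of code points that are escaped. A non-normative TypeScript sketch follows; the name `escapePatternString` is illustrative, and the input is assumed to be an ASCII string.

```typescript
// Code points with special meaning in pattern string syntax.
const PATTERN_SPECIAL = "+*?:{}()\\";

function escapePatternString(input: string): string {
  let result = "";
  for (let index = 0; index < input.length; index++) {
    const c = input.charAt(index);
    if (PATTERN_SPECIAL.indexOf(c) !== -1) {
      result += "\\";
    }
    result += c;
  }
  return result;
}

console.log(escapePatternString("/a:b(c)")); // /a\:b\(c\)
```

Note that U+002F (/) and U+002E (.) are left untouched here, since they carry no special meaning in a pattern string.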
To convert a modifier to a string given a modifier modifier:
-
If modifier is "
zero-or-more
", then return "*
". -
If modifier is "
optional
", then return "?
". -
If modifier is "
one-or-more
", then return "+
". -
Return the empty string.
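A non-normative TypeScript sketch of the mapping above is shown below, together with the way a fixed-text part with a modifier is serialized inside braces. The names `convertModifierToString` and `serializeFixedText` are illustrative assumptions, and `serializeFixedText` expects a value that has already been escaped as a pattern string.

```typescript
type Modifier = "none" | "optional" | "zero-or-more" | "one-or-more";

function convertModifierToString(modifier: Modifier): string {
  if (modifier === "zero-or-more") return "*";
  if (modifier === "optional") return "?";
  if (modifier === "one-or-more") return "+";
  return "";
}

// Hypothetical helper: a fixed-text part is written bare when it has no
// modifier, and as "{" value "}" modifier otherwise.
function serializeFixedText(escapedValue: string, modifier: Modifier): string {
  if (modifier === "none") return escapedValue;
  return "{" + escapedValue + "}" + convertModifierToString(modifier);
}

console.log(serializeFixedText("/index.html", "optional")); // {/index.html}?
```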
3. Canonicalization
3.1. Encoding callbacks
To canonicalize a protocol given a string value:
-
If value is the empty string, return value .
-
Let dummyURL be a new URL record .
-
Let parseResult be the result of running the basic URL parser given value followed by "://dummy.test", with dummyURL as url.
Note, state override is not used here because it enforces restrictions that are only appropriate for the protocol setter. Instead we use the protocol to parse a dummy URL using the normal parsing entry point.
-
If parseResult is failure, then throw a
TypeError
. -
Return dummyURL ’s scheme .
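The dummy URL technique above can be approximated with the `URL` class from the URL Standard. This is a non-normative sketch: `canonicalizeProtocol` is an illustrative name, and it relies on the `URL` constructor throwing when parsing fails, which stands in for the failure result of the basic URL parser.

```typescript
function canonicalizeProtocol(value: string): string {
  if (value === "") return value;
  let dummyURL: URL;
  try {
    // Parse the candidate protocol as the scheme of a throwaway URL.
    dummyURL = new URL(value + "://dummy.test");
  } catch {
    throw new TypeError("Invalid protocol: " + value);
  }
  // The protocol getter includes a trailing ":"; the scheme does not.
  return dummyURL.protocol.slice(0, -1);
}

console.log(canonicalizeProtocol("HTTPS")); // https
```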
To canonicalize a username given a string value:
-
If value is the empty string, return value .
-
Let dummyURL be a new URL record .
-
Set the username given dummyURL and value .
-
Return dummyURL ’s username .
To canonicalize a password given a string value:
-
If value is the empty string, return value .
-
Let dummyURL be a new URL record .
-
Set the password given dummyURL and value .
-
Return dummyURL ’s password .
To canonicalize a hostname given a string value and optionally a string protocolValue:
-
If value is the empty string, return value .
-
Let dummyURL be a new URL record .
-
If protocolValue was given, then set dummyURL ’s scheme to protocolValue .
We set the URL record’s scheme in order for the basic URL parser to recognize and normalize non-opaque hostname values.
-
Let parseResult be the result of running the basic URL parser given value with dummyURL as url and hostname state as state override.
-
If parseResult is failure, then throw a
TypeError
. -
Return dummyURL ’s host , serialized , or empty string if it is null.
To canonicalize a domain name given a string value:
-
Return the result of running canonicalize a hostname given value and "https".
To canonicalize an IPv6 hostname given a string value:
-
Let result be the empty string.
-
For each code point in value interpreted as a list of code points :
-
If all of the following are true:
- code point is not an ASCII hex digit;
- code point is not U+005B ([);
- code point is not U+005D (]); and
- code point is not U+003A (:),
then throw a TypeError.
-
Append the result of running ASCII lowercase given code point to the end of result .
-
-
Return result .
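The steps above amount to a character whitelist followed by ASCII lowercasing. A non-normative TypeScript sketch, using the illustrative name `canonicalizeIPv6Hostname`:

```typescript
function canonicalizeIPv6Hostname(value: string): string {
  let result = "";
  for (let index = 0; index < value.length; index++) {
    const c = value.charAt(index);
    // Only ASCII hex digits, "[", "]" and ":" are permitted.
    if (!/^[0-9A-Fa-f\[\]:]$/.test(c)) {
      throw new TypeError("Invalid IPv6 hostname code point: " + c);
    }
    result += c.toLowerCase();
  }
  return result;
}

console.log(canonicalizeIPv6Hostname("[2001:DB8::1]")); // [2001:db8::1]
```

This check validates only the set of code points, not the address syntax itself.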
To canonicalize a port given a string portValue and optionally a string protocolValue:
-
If portValue is the empty string, return portValue .
-
Let dummyURL be a new URL record .
-
If protocolValue was given, then set dummyURL ’s scheme to protocolValue .
Note, we set the URL record ’s scheme in order for the basic URL parser to recognize and normalize default port values.
-
Let parseResult be the result of running basic URL parser given portValue with dummyURL as url and port state as state override .
-
If parseResult is failure, then throw a
TypeError
. -
Return dummyURL ’s port , serialized , or empty string if it is null.
To canonicalize a pathname given a string value:
-
If value is the empty string, then return value .
-
Let leading slash be true if the first code point in value is U+002F (/) and otherwise false.
-
Let modified value be "/-" if leading slash is false and otherwise the empty string.
The URL parser will automatically prepend a leading slash to the canonicalized pathname. This does not work here unfortunately. This algorithm is called for pieces of the pathname, instead of the entire pathname, when used as an encoding callback. Therefore we disable the prepending of the slash by inserting our own. An additional character is also inserted here in order to avoid inadvertently collapsing a leading dot due to the fake leading slash being interpreted as a "/." sequence. These inserted characters are then removed from the result below.
Note, implementations are free to simply disable slash prepending in their URL parsing code instead of paying the performance penalty of inserting and removing characters in this algorithm.
-
Append value to the end of modified value .
-
Let dummyURL be a new URL record .
-
Let parseResult be the result of running basic URL parser given modified value with dummyURL as url and path start state as state override .
-
If parseResult is failure, then throw a
TypeError
. -
Let result be the result of URL path serializing dummyURL .
-
If leading slash is false, then set result to the code point substring from 2 to the end of the string within result .
-
Return result .
To canonicalize an opaque pathname given a string value:
-
If value is the empty string, return value .
-
Let dummyURL be a new URL record .
-
Set dummyURL ’s path to the empty string.
-
Let parseResult be the result of running URL parsing given value with dummyURL as url and opaque path state as state override .
-
If parseResult is failure, then throw a
TypeError
. -
Return the result of URL path serializing dummyURL .
To canonicalize a search given a string value:
-
If value is the empty string, return value .
-
Let dummyURL be a new URL record .
-
Set dummyURL ’s query to the empty string.
-
Let parseResult be the result of running basic URL parser given value with dummyURL as url and query state as state override .
-
If parseResult is failure, then throw a
TypeError
. -
Return dummyURL ’s query .
To canonicalize a hash given a string value:
-
If value is the empty string, return value .
-
Let dummyURL be a new URL record .
-
Set dummyURL ’s fragment to the empty string.
-
Let parseResult be the result of running basic URL parser given value with dummyURL as url and fragment state as state override .
-
If parseResult is failure, then throw a
TypeError
. -
Return dummyURL ’s fragment .
3.2. URLPatternInit processing
To process a URLPatternInit given a URLPatternInit init, a string type, a string or null protocol, a string or null username, a string or null password, a string or null hostname, a string or null port, a string or null pathname, a string or null search, and a string or null hash:
-
Let result be the result of creating a new
URLPatternInit
. -
If protocol is not null, set result ["
protocol
"] to protocol . -
If username is not null, set result ["
username
"] to username . -
If password is not null, set result ["
password
"] to password . -
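If hostname is not null, set result["hostname"] to hostname.
-
If port is not null, set result["port"] to port.
-
If pathname is not null, set result["pathname"] to pathname.
-
If search is not null, set result["search"] to search.
-
If hash is not null, set result["hash"] to hash.
-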
Let baseURL be null.
-
If init["baseURL"] exists:
The base URL can be used to supply additional context, but for each component, if init includes a component which is at least as specific as one in the base URL, that base URL component is not inherited.
A component is more specific if it appears later in one of the following two lists (which are very similar to the order they appear in the URL syntax):
-
protocol, hostname, port, pathname, search, hash
-
protocol, hostname, port, username, password
Username and password are also never inherited from a base URL when constructing a URLPattern. (They are, however, inherited from the base URL when parsing a URL supplied as an argument to test() or exec().)
-
Set baseURL to the result of running the basic URL parser on init ["
baseURL
"]. -
If baseURL is failure, then throw a
TypeError
. -
If init ["
protocol
"] does not exist , then set result ["protocol
"] to the result of processing a base URL string given baseURL ’s scheme and type . -
If type is not "
pattern
" and init contains none of "protocol
", "hostname
", "port
" and "username
", then set result ["username
"] to the result of processing a base URL string given baseURL ’s username and type . -
If type is not "
pattern
" and init contains none of "protocol
", "hostname
", "port
", "username
" and "password
", then set result ["password
"] to the result of processing a base URL string given baseURL ’s password and type . -
If init contains neither "
protocol
" nor "hostname
", then:-
Let baseHost be the empty string.
-
If baseURL ’s host is not null, then set baseHost to its serialization .
-
Set result ["
hostname
"] to the result of processing a base URL string given baseHost and type .
-
-
If init contains none of "
protocol
", "hostname
", and "port
", then:-
If baseURL ’s port is null, then set result ["
port
"] to the empty string. -
Otherwise, set result ["
port
"] to baseURL ’s port , serialized .
-
-
If init contains none of "
protocol
", "hostname
", "port
", and "pathname
", then set result ["pathname
"] to the result of processing a base URL string given the result of URL path serializing baseURL and type . -
If init contains none of "
protocol
", "hostname
", "port
", "pathname
", and "search
", then:-
Let baseQuery be baseURL ’s query .
-
If baseQuery is null, then set baseQuery to the empty string.
-
Set result ["
search
"] to the result of processing a base URL string given baseQuery and type .
-
-
If init contains none of "
protocol
", "hostname
", "port
", "pathname
", "search
", and "hash
", then:-
Let baseFragment be baseURL ’s fragment .
-
If baseFragment is null, then set baseFragment to the empty string.
-
Set result ["
hash
"] to the result of processing a base URL string given baseFragment and type .
-
-
-
If init ["
protocol
"] exists , then set result ["protocol
"] to the result of process protocol for init given init ["protocol
"] and type . -
If init ["
username
"] exists , then set result ["username
"] to the result of process username for init given init ["username
"] and type . -
If init ["
password
"] exists , then set result ["password
"] to the result of process password for init given init ["password
"] and type . -
If init["hostname"] exists, then set result["hostname"] to the result of process hostname for init given init["hostname"], result["protocol"], and type.
-
If init ["
port
"] exists , then set result ["port
"] to the result of process port for init given init ["port
"], result ["protocol
"], and type . -
If init["pathname"] exists:
-
Set result["pathname"] to init["pathname"].
-
If the following are all true:
- baseURL is not null;
- baseURL does not have an opaque path; and
- the result of running is an absolute pathname given result["pathname"] and type is false,
then:
-
Let baseURLPath be the result of running process a base URL string given the result of URL path serializing baseURL and type .
-
Let slash index be the index of the last U+002F (
/
) code point found in baseURLPath , interpreted as a sequence of code points , or null if there are no instances of the code point. -
If slash index is not null:
-
Let new pathname be the code point substring from 0 to slash index + 1 within baseURLPath .
-
Append result ["
pathname
"] to the end of new pathname . -
Set result ["
pathname
"] to new pathname .
-
-
Set result ["
pathname
"] to the result of process pathname for init given result ["pathname
"], result ["protocol
"], and type .
-
If init ["
search
"] exists then set result ["search
"] to the result of process search for init given init ["search
"] and type . -
If init ["
hash
"] exists then set result ["hash
"] to the result of process hash for init given init ["hash
"] and type . -
Return result .
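The base URL inheritance rules in the steps above can be summarized as a pure function from the set of components present in init to the set of components taken from the base URL. The following TypeScript sketch is non-normative; `inheritedFromBase` and the `Component` type are illustrative names, and the sketch deliberately ignores the separate handling of relative pathnames.

```typescript
type Component =
  | "protocol" | "username" | "password" | "hostname"
  | "port" | "pathname" | "search" | "hash";

// Returns the components that would be copied from the base URL, given the
// components that init itself specifies.
function inheritedFromBase(present: Component[], type: "pattern" | "url"): Component[] {
  const hasNone = (names: Component[]): boolean =>
    names.every((name) => present.indexOf(name) === -1);
  const inherited: Component[] = [];
  if (hasNone(["protocol"])) inherited.push("protocol");
  // Username and password are never inherited when type is "pattern".
  if (type !== "pattern" && hasNone(["protocol", "hostname", "port", "username"])) {
    inherited.push("username");
  }
  if (type !== "pattern" && hasNone(["protocol", "hostname", "port", "username", "password"])) {
    inherited.push("password");
  }
  if (hasNone(["protocol", "hostname"])) inherited.push("hostname");
  if (hasNone(["protocol", "hostname", "port"])) inherited.push("port");
  if (hasNone(["protocol", "hostname", "port", "pathname"])) inherited.push("pathname");
  if (hasNone(["protocol", "hostname", "port", "pathname", "search"])) inherited.push("search");
  if (hasNone(["protocol", "hostname", "port", "pathname", "search", "hash"])) inherited.push("hash");
  return inherited;
}

console.log(inheritedFromBase(["pathname"], "pattern")); // [ 'protocol', 'hostname', 'port' ]
```

For example, an init that only specifies a pathname inherits the protocol, hostname, and port from the base URL, but not its search or hash, because those are less specific than the pathname.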
To process a base URL string given a scalar value string input and a string type:
-
Assert : input is not null.
-
If type is not "
pattern
" return input . -
Return the result of escaping a pattern string given input .
To determine if a string input is an absolute pathname given a string type:
-
If input is the empty string, then return false.
-
If input [0] is U+002F (
/
), then return true. -
If type is "
url
", then return false. -
If input ’s code point length is less than 2, then return false.
-
If input [0] is U+005C (
\
) and input [1] is U+002F (/
), then return true. -
If input [0] is U+007B (
{
) and input [1] is U+002F (/
), then return true. -
Return false.
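A non-normative TypeScript sketch of the check above follows. The name `isAbsolutePathname` is illustrative; the sketch uses UTF-16 code unit indexing, which is sufficient here because only ASCII code points are compared.

```typescript
function isAbsolutePathname(input: string, type: "pattern" | "url"): boolean {
  if (input === "") return false;
  if (input.charAt(0) === "/") return true;
  // For URL inputs only a literal leading slash counts.
  if (type === "url") return false;
  if (input.length < 2) return false;
  const first = input.charAt(0);
  const second = input.charAt(1);
  // In pattern syntax, an escaped slash or a group starting with a slash
  // also makes the pathname absolute.
  if ((first === "\\" || first === "{") && second === "/") return true;
  return false;
}

console.log(isAbsolutePathname("{/books}?", "pattern")); // true
```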
To process protocol for init given a string value and a string type:
-
Let strippedValue be the given value with a single trailing U+003A (
:
) removed, if any. -
If type is "
pattern
" then return strippedValue . -
Return the result of running canonicalize a protocol given strippedValue .
To process username for init given a string value and a string type:
-
If type is "
pattern
" then return value . -
Return the result of running canonicalize a username given value .
To process password for init given a string value and a string type:
-
If type is "
pattern
" then return value . -
Return the result of running canonicalize a password given value .
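To process hostname for init given a string hostnameValue, a string protocolValue, and a string type:
-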
If type is "pattern" then return hostnameValue.
-
If protocolValue is a special scheme or the empty string, then return the result of running canonicalize a domain name given hostnameValue.
If the protocolValue is the empty string then no value was provided for protocol in the constructor dictionary. Normally we do not special case empty string dictionary values, but in this case we treat it as a special scheme in order to default to the most common hostname canonicalization.
-
Return the result of running canonicalize a hostname given hostnameValue.
To process port for init given a string portValue, a string protocolValue, and a string type:
-
If type is "
pattern
" then return portValue . -
Return the result of running canonicalize a port given portValue and protocolValue .
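To process pathname for init given a string pathnameValue, a string protocolValue, and a string type:
-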
If type is "pattern" then return pathnameValue.
-
If protocolValue is a special scheme or the empty string, then return the result of running canonicalize a pathname given pathnameValue.
If the protocolValue is the empty string then no value was provided for protocol in the constructor dictionary. Normally we do not special case empty string dictionary values, but in this case we treat it as a special scheme in order to default to the most common pathname canonicalization.
-
Return the result of running canonicalize an opaque pathname given pathnameValue.
To process search for init given a string value and a string type:
-
Let strippedValue be the given value with a single leading U+003F (
?
) removed, if any. -
If type is "
pattern
" then return strippedValue . -
Return the result of running canonicalize a search given strippedValue .
To process hash for init given a string value and a string type:
-
Let strippedValue be the given value with a single leading U+0023 (
#
) removed, if any. -
If type is "
pattern
" then return strippedValue . -
Return the result of running canonicalize a hash given strippedValue .
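The protocol, search, and hash steps above each strip at most one delimiter before any canonicalization. The following non-normative TypeScript sketch covers only the branch where type is "pattern", in which the stripped value is returned as is; `processForPatternInit` is a hypothetical helper name.

```typescript
type StrippedComponent = "protocol" | "search" | "hash";

// Hypothetical helper covering only the type "pattern" branch, where no
// canonicalization is applied after stripping.
function processForPatternInit(component: StrippedComponent, value: string): string {
  if (component === "protocol") {
    // Remove a single trailing ":" if present.
    return value.charAt(value.length - 1) === ":" ? value.slice(0, -1) : value;
  }
  const marker = component === "search" ? "?" : "#";
  // Remove a single leading "?" or "#" if present.
  return value.charAt(0) === marker ? value.slice(1) : value;
}

console.log(processForPatternInit("search", "?q=:term")); // q=:term
```

Only one delimiter is removed, so a value such as "##x" for the hash keeps its second "#".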
4. Using URL patterns in other specifications
To promote consistency on the web platform, other documents integrating with this specification should adhere to the following guidelines, unless there is good reason to diverge.
-
Accept shorthands . Most author patterns will be simple and straightforward. Accordingly, APIs should accept shorthands for those common cases and avoid the need for authors to take additional steps to transform these into complete
URLPattern
objects. -
Respect the base URL . Just as URLs are generally parsed relative to a base URL for their environment (most commonly, a document base URL ), URL patterns should respect this as well. The
URLPattern
constructor itself is an exception because it directly exposes the concept itself, similar to how the URL constructor does not respect the base URL even though the rest of the platform does. -
Be clear about regexp groups . Some APIs may benefit from only allowing URL patterns which do not have regexp groups , for example, because user agents are likely to implement them in a different thread or process from those executing author script, and because of security or performance concerns, a JavaScript engine would not ordinarily run there. If so, this should be clearly documented (with reference to has regexp groups ) and the operation should report an error as soon as possible (e.g., by throwing a JavaScript exception). If possible, this should be feature-detectable to allow for the possibility of this constraint being lifted in the future. Avoid creating different subsets of URL patterns without consulting the editors of this specification.
-
Be clear about what URLs will be matched . For instance, algorithms during fetching are likely to operate on URLs with no fragment . If so, the specification should be clear that this is the case, and may advise showing a developer warning if a pattern which cannot match (e.g., because it requires a non-empty fragment) is used.
4.1. Integrating with JavaScript APIs
typedef (USVString or URLPatternInit or URLPattern) URLPatternCompatible;
JavaScript APIs should accept all of:
- a URLPattern object
- a dictionary-like object which specifies the components required to construct a pattern
- a string (in the constructor string syntax)
To accomplish this, specifications should accept URLPatternCompatible as an argument to an operation or dictionary member, and process it using the following algorithm, using the appropriate environment settings object’s API base URL or equivalent.
To build a URLPattern object from a Web IDL value URLPatternCompatible input given URL baseURL and realm realm, perform the following steps:
-
If the specific type of input is
URLPattern
:-
Return input .
-
-
Otherwise:
-
Let pattern be a new
URLPattern
with realm . -
Set pattern ’s associated URL pattern to the result of building a URL pattern from a Web IDL value given input and baseURL .
-
Return pattern .
-
To build a URL pattern from a Web IDL value URLPatternCompatible input given URL baseURL, perform the following steps:
-
If the specific type of input is
URLPattern
:-
Return input ’s associated URL pattern .
-
-
Otherwise, if the specific type of input is
URLPatternInit
: -
Otherwise:
-
Assert : The specific type of input is
USVString
. -
Return the result of creating a URL pattern given input , the serialization of baseURL , and an empty map .
-
This allows authors to concisely specify most patterns, and use the constructor to access uncommon options if necessary. The implicit use of the base URL is similar to, and consistent with, HTML ’s parse a URL algorithm. [HTML]
4.2. Integrating with JSON data formats
JSON data formats which include URL patterns should mirror the behavior of JavaScript APIs and accept both:
-
an object which specifies the components required to construct a pattern
-
a string (in the constructor string syntax )
If a specification has an Infra value (e.g., after using parse a JSON string to an Infra value ), use the following algorithm, using the appropriate base URL (by default, the URL of the JSON resource). [INFRA]
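To build a URL pattern from an Infra value rawPattern given URL baseURL, perform the following steps:
-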
Let serializedBaseURL be the serialization of baseURL .
-
If rawPattern is a string , then:
-
Otherwise, if rawPattern is a map , then:
-
Let init be «[ "baseURL" → serializedBaseURL ]», representing a dictionary of type URLPatternInit.
-
For each key → value of rawPattern:
-
If key is not the identifier of a dictionary member of URLPatternInit or one of its inherited dictionaries, value is not a string, or the member’s type is not declared to be USVString, then return null.
This will need to be updated if URLPatternInit gains members of other types.
A future version of this specification might also have a less strict mode, if that proves useful to other specifications.
-
Set init [ key ] to value .
-
-
Return the result of creating a URL pattern given init , null, and an empty map .
It might become necessary in the future to plumb non-empty options here.
-
-
Otherwise, return null.
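The map branch of the steps above is essentially a strict validation pass. The following non-normative TypeScript sketch shows that pass in isolation; `initFromMap` is a hypothetical helper, and the `STRING_MEMBERS` list is an assumption that mirrors the members of the URLPatternInit dictionary, all of which are taken to be of type USVString.

```typescript
// Assumed member names of URLPatternInit, all of type USVString.
const STRING_MEMBERS = [
  "protocol", "username", "password", "hostname", "port",
  "pathname", "search", "hash", "baseURL",
];

// Returns the init dictionary to pass on to pattern creation, or null if
// rawPattern has an unknown key or a non-string value.
function initFromMap(
  rawPattern: { [key: string]: unknown },
  serializedBaseURL: string
): { [key: string]: string } | null {
  const init: { [key: string]: string } = { baseURL: serializedBaseURL };
  for (const key of Object.keys(rawPattern)) {
    const value = rawPattern[key];
    if (STRING_MEMBERS.indexOf(key) === -1 || typeof value !== "string") {
      return null;
    }
    init[key] = value;
  }
  return init;
}

console.log(initFromMap({ pathname: "/products/:id" }, "https://example.com/rules.json"));
```

Because "baseURL" is itself a member of the dictionary, a "baseURL" key in rawPattern replaces the default taken from the JSON resource.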
Specifications may wish to leave room in their formats to accept options for URLPatternOptions, override the base URL, or similar, since it is not possible to construct a URLPattern object directly in this case, unlike in a JavaScript API. For example, Speculation Rules accepts a "relative_to" key which can be used to switch to using the document base URL instead of the JSON resource’s URL. [SPECULATION-RULES]
4.3. Integrating with HTTP header fields
HTTP headers which include URL patterns should accept a string in the constructor string syntax , likely as part of a structured field [RFC9651] .
Specifications for HTTP headers should operate on URL patterns (e.g., using the match algorithm) rather than URLPattern objects (which imply the existence of a JavaScript realm).
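To build a URL pattern from an HTTP structured field value rawPattern given URL baseURL, perform the following steps:
-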
Let serializedBaseURL be the serialization of baseURL .
-
Return the result of creating a URL pattern given rawPattern , serializedBaseURL , and an empty map .
Acknowledgments
The editors would like to thank Alex Russell, Anne van Kesteren, Asa Kusuma, Blake Embrey, Cyrus Kasaaian, Daniel Murphy, Darwin Huang, Devlin Cronin, Domenic Denicola, Dominick Ng, Jake Archibald, Jeffrey Posnick, Jeremy Roman, Jimmy Shen, Joe Gregorio, Joshua Bell, Kenichi Ishibashi, Kenji Baheux, Kenneth Rohde Christiansen, Kingsley Ngan, Kinuko Yasuda, L. David Baron, Luca Casonato, Łukasz Anforowicz, Makoto Shimazu, Marijn Kruisselbrink, Matt Falkenhagen, Matt Giuca, Michael Landry, R. Samuel Klatchko, Rajesh Jagannathan, Ralph Chelala, Sangwhan Moon, Sayan Pal, Victor Costan, Yoshisato Yanagisawa, Youenn Fablet, and Yves-Marie K. Rinquin for their contributions to this specification.
Special thanks to Blake Embrey and the other pillarjs/path-to-regexp contributors for building an excellent open source library that so many have found useful.
Also, special thanks to Kenneth Rohde Christiansen for his work on the polyfill. He put in extensive work to adapt to the changing URLPattern API.
This standard is written by Ben Kelly ( Google , wanderview@chromium.org ), Jeremy Roman ( Google , jbroman@chromium.org ), and 宍戸俊哉 (Shunya Shishido, Google , sisidovski@chromium.org ).
Intellectual property rights
Copyright © WHATWG (Apple, Google, Mozilla, Microsoft). This work is licensed under a Creative Commons Attribution 4.0 International License . To the extent portions of it are incorporated into source code, such portions in the source code are licensed under the BSD 3-Clause License instead.
This is the Living Standard. Those interested in the patent-review version should view the Living Standard Review Draft .