Parse (absolute and relative) URLs.

The urlparse module is based upon the following RFC specifications.

RFC 3986 (STD66): "Uniform Resource Identifiers" by T. Berners-Lee, R. Fielding and L. Masinter, January 2005.

RFC 2732: "Format for Literal IPv6 Addresses in URL's" by R. Hinden, B. Carpenter and L. Masinter, December 1999.

RFC 2396: "Uniform Resource Identifiers (URI): Generic Syntax" by T. Berners-Lee, R. Fielding, and L. Masinter, August 1998.

RFC 2368: "The mailto URL scheme" by P. Hoffman, L. Masinter, J. Zawinski, July 1998.

RFC 1808: "Relative Uniform Resource Locators" by R. Fielding, UC Irvine, June 1995.

RFC 1738: "Uniform Resource Locators (URL)" by T. Berners-Lee, L. Masinter, M. McCahill, December 1994.

RFC 3986 is considered the current standard, and any future changes to the urlparse module should conform with it. The urlparse module is currently not entirely compliant with this RFC: de facto parsing scenarios and backward compatibility mean that some parsing quirks from older RFCs are retained. The test cases in test_urlparse.py provide a good indicator of parsing behavior.

Public names: urlparse, urlunparse, urljoin, urldefrag, urlsplit, urlunsplit, urlencode, parse_qs, parse_qsl, quote, quote_plus, quote_from_bytes, unquote, unquote_plus, unquote_to_bytes, DefragResult, ParseResult, SplitResult, DefragResultBytes, ParseResultBytes, SplitResultBytes.

Schemes named in the module's scheme tables: ftp, http, gopher, nntp, imap, wais, file, https, shttp, mms, prospero, rtsp, rtspu, sftp, svn, svn+ssh, ws, wss, telnet, snews, rsync, nfs, git, git+ssh, hdl, sip, sips, tel, mailto, news.

clear_cache()
    Clear the parse cache and the quoters cache.

_ResultMixinStr
    Standard approach to encoding parsed results from str to bytes.

_ResultMixinBytes
    Standard approach to decoding parsed results from bytes to str.

_NetlocResultMixinBase
    Shared methods for the parsed result objects containing a netloc element. Provides the username, password, hostname and port properties. hostname is lowercased (any IPv6 zone after '%' keeps its case). port raises ValueError if it cannot be cast to an integer or is outside the range 0-65535.

urlparse(url, scheme='', allow_fragments=True)
    Parse a URL into 6 components:
    <scheme>://<netloc>/<path>;<params>?<query>#<fragment>
    Return a 6-tuple: (scheme, netloc, path, params, query, fragment). Note that we don't break the components up in smaller bits (e.g. netloc is a single string) and we don't expand % escapes.

urlsplit(url, scheme='', allow_fragments=True)
    Parse a URL into 5 components:
    <scheme>://<netloc>/<path>?<query>#<fragment>
    Return a 5-tuple: (scheme, netloc, path, query, fragment). Note that we don't break the components up in smaller bits (e.g. netloc is a single string) and we don't expand % escapes. Raises ValueError ("Invalid IPv6 URL") when the brackets in netloc are unbalanced, and when netloc contains invalid characters under NFKC normalization.

urlunparse(components)
    Put a parsed URL back together again. This may result in a slightly different, but equivalent URL, if the URL that was parsed originally had redundant delimiters, e.g. a ? with an empty query (the draft states that these are equivalent).

urlunsplit(components)
    Combine the elements of a tuple as returned by urlsplit() into a complete URL as a string. The data argument can be any five-item iterable. This may result in a slightly different, but equivalent URL, if the URL that was parsed originally had unnecessary delimiters (for example, a ? with an empty query; the RFC states that these are equivalent).

urljoin(base, url, allow_fragments=True)
    Join a base URL and a possibly relative URL to form an absolute interpretation of the latter.

urldefrag(url)
    Removes any existing fragment from URL. Returns a tuple of the defragmented URL and the fragment. If the URL contained no fragments, the second element is the empty string.

unquote_to_bytes(string)
    unquote_to_bytes('abc%20def') -> b'abc def'.

unquote(string, encoding='utf-8', errors='replace')
    Replace %xx escapes by their single-character equivalent. The optional encoding and errors parameters specify how to decode percent-encoded sequences into Unicode characters, as accepted by the bytes.decode() method. By default, percent-encoded sequences are decoded with UTF-8, and invalid sequences are replaced by a placeholder character.
    unquote('abc%20def') -> 'abc def'.

parse_qs(qs, keep_blank_values=False, strict_parsing=False, encoding='utf-8', errors='replace', max_num_fields=None)
    Parse a query given as a string argument.
    Arguments:
    qs: percent-encoded query string to be parsed.
    keep_blank_values: flag indicating whether blank values in percent-encoded queries should be treated as blank strings. A true value indicates that blanks should be retained as blank strings. The default false value indicates that blank values are to be ignored and treated as if they were not included.
    strict_parsing: flag indicating what to do with parsing errors. If false (the default), errors are silently ignored. If true, errors raise a ValueError exception.
    encoding and errors: specify how to decode percent-encoded sequences into Unicode characters, as accepted by the bytes.decode() method.
    max_num_fields: int. If set, a ValueError is raised if more than max_num_fields fields are read by parse_qsl().
    Returns a dictionary. parse_qs() collects the (name, value) pairs produced by parse_qsl() into that dictionary, one list of values per name.

unquote_plus(string, encoding='utf-8', errors='replace')
    Like unquote(), but also replace plus signs by spaces, as required for unquoting HTML form values.
    unquote_plus('%7e/abc+def') -> '~/abc def'

Quoter
    A mapping from bytes (in range(0,256)) to strings. String values are percent-encoded byte values, unless the key < 128, and in the "safe" set (either the specified safe set, or default set).

quote(string, safe='/', encoding=None, errors=None)
    quote('abc def') -> 'abc%20def'
    Each part of a URL, e.g. the path info, the query, etc., has a different set of reserved characters that must be quoted. The quote function offers a cautious (not minimal) way to quote a string for most of these parts.
    RFC 3986 Uniform Resource Identifier (URI): Generic Syntax lists the following (un)reserved characters.
    unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
    reserved = gen-delims / sub-delims
    gen-delims = ":" / "/" / "?" / "#" / "[" / "]" / "@"
    sub-delims = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="
    Each of the reserved characters is reserved in some component of a URL, but not necessarily in all of them.
    The quote function %-escapes all characters that are neither in the unreserved chars ("always safe") nor the additional chars set via the safe arg.
    The default for the safe arg is '/'. The character is reserved, but in typical usage the quote function is being called on a path where the existing slash characters are to be preserved.
    Python 3.7 updates from using RFC 2396 to RFC 3986 to quote URL strings. Now, "~" is included in the set of unreserved characters.
    string and safe may be either str or bytes objects. encoding and errors must not be specified if string is a bytes object (a TypeError is raised).
    The optional encoding and errors parameters specify how to deal with non-ASCII characters, as accepted by the str.encode method. By default, encoding='utf-8' (characters are encoded with UTF-8), and errors='strict' (unsupported characters raise a UnicodeEncodeError).

quote_plus(string, safe='', encoding=None, errors=None)
    Like quote(), but also replace ' ' with '+', as required for quoting HTML form values. Plus signs in the original string are escaped unless they are included in safe. Unlike quote(), safe does not default to '/'.

quote_from_bytes(bs, safe='/')
    Like quote(), but accepts a bytes object rather than a str, and does not perform string-to-bytes encoding. It always returns an ASCII string.
    quote_from_bytes(b'abc def?') -> 'abc%20def%3F'

urlencode(query, doseq=False, safe='', encoding=None, errors=None, quote_via=quote_plus)
    Encode a dict or sequence of two-element tuples into a URL query string.
    If any values in the query arg are sequences and doseq is true, each sequence element is converted to a separate parameter.
    If the query arg is a sequence of two-element tuples, the order of the parameters in the output will match the order of parameters in the input.
    The components of a query arg may each be either a string or a bytes type.
    The safe, encoding, and errors parameters are passed down to the function specified by quote_via (encoding and errors only if a component is a str).

to_bytes(url)
    to_bytes(u"URL") --> 'URL'. Raises UnicodeError if the URL contains non-ASCII characters. urllib.parse.to_bytes() is deprecated as of 3.8.

unwrap(url)
    Transform a string like '<URL:scheme://host/path>' into 'scheme://host/path'. The string is returned unchanged if it's not a wrapped URL.

The helpers below are deprecated as of 3.8. Each emits a DeprecationWarning recommending urllib.parse.urlparse() instead, except splitvalue(), which recommends urllib.parse.parse_qsl().

splittype(url)
    splittype('type:opaquestring') --> 'type', 'opaquestring'.

splithost(url)
    splithost('//host[:port]/path') --> 'host[:port]', '/path'.

splituser(host)
    splituser('user[:passwd]@host[:port]') --> 'user[:passwd]', 'host[:port]'.

splitpasswd(user)
    splitpasswd('user:passwd') --> 'user', 'passwd'.

splitport(host)
    splitport('host:port') --> 'host', 'port'.

splitnport(host, defport=-1)
    Split host and port, returning numeric port. Return given default port if no ':' found; defaults to -1. Return numerical port if a valid number is found after ':'. Return None if ':' but not a valid number.

splitquery(url)
    splitquery('/path?query') --> '/path', 'query'.

splittag(url)
    splittag('/path#tag') --> '/path', 'tag'.

splitattr(url)
    splitattr('/path;attr1=value1;attr2=value2;...') --> '/path', ['attr1=value1', 'attr2=value2', ...].

splitvalue(attr)
    splitvalue('attr=value') --> 'attr', 'value'.
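
The functions documented above can be exercised with a short sketch like the following. It is only an illustration: the URLs, the credentials and the variable names are hypothetical and were chosen for the example. It uses only the public, non-deprecated names of urllib.parse.

```python
from urllib.parse import (
    parse_qs, parse_qsl, quote, quote_from_bytes, quote_plus,
    unquote, unquote_plus, urldefrag, urlencode, urljoin,
    urlsplit, urlunsplit,
)

# Hypothetical URL, used only for illustration.
url = "http://user:pw@Example.COM:8080/a/b;p=1?x=1&x=2&y=#frag"

# urlsplit() returns a 5-tuple; netloc stays a single string,
# and the params (";p=1") remain part of the path.
parts = urlsplit(url)
netloc = parts.netloc        # 'user:pw@Example.COM:8080'
path = parts.path            # '/a/b;p=1'

# The netloc helper properties break the netloc down on demand.
host = parts.hostname        # lowercased: 'example.com'
port = parts.port            # 8080 (an int)

# urlunsplit() reassembles the tuple; here the round trip is exact.
rebuilt = urlunsplit(parts)

# urljoin() resolves a relative reference against a base URL.
joined = urljoin("http://example.com/a/b/c", "../d")

# urldefrag() separates the fragment from the rest of the URL.
bare, fragment = urldefrag("http://example.com/p#top")

# Quoting: quote() keeps '/' by default, quote_plus() does not
# and maps spaces to '+'.
q1 = quote("abc def")                  # 'abc%20def'
q2 = quote_plus("a b+c/d")             # 'a+b%2Bc%2Fd'
q3 = quote_from_bytes(b"abc def?")     # 'abc%20def%3F'
u1 = unquote("abc%20def")              # 'abc def'
u2 = unquote_plus("%7e/abc+def")       # '~/abc def'

# Query strings: blank values are dropped unless keep_blank_values is true.
qs_default = parse_qs(parts.query)
qs_blank = parse_qs(parts.query, keep_blank_values=True)
pairs = parse_qsl("x=1&x=2")

# urlencode() with doseq=True expands sequence values into repeated keys.
encoded = urlencode({"q": "a b", "tag": ["x", "y"]}, doseq=True)
```

Note that the result objects are named tuples, so `parts` can be unpacked positionally or read by attribute. The hostname and port properties are computed from netloc each time they are accessed.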