Parsing URLs with the urllib.parse library
source link: https://www.lujun9972.win/blog/2018/02/22/%E4%BD%BF%E7%94%A8urlliib.parse%E5%BA%93%E8%A7%A3%E6%9E%90url/index.html
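Parsing URLs
The urlparse() function parses a URL into a ParseResult object. The object contains six elements:
- scheme: the protocol
- netloc: the network location (user info, host and port)
- path: the path
- params: the path parameters
- query: the query string
- fragment: the fragment identifier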
from urllib.parse import urlparse
url = 'http://user:pwd@domain:80/path;params?query=queryarg#fragment'
parsed_result = urlparse(url)
print('parsed_result contains', len(parsed_result), 'elements')
print(parsed_result)

Output:
parsed_result contains 6 elements
ParseResult(scheme='http', netloc='user:pwd@domain:80', path='/path', params='params', query='query=queryarg', fragment='fragment')
ParseResult is a subclass of namedtuple, so each part of the URL can be accessed either by index or by named attribute.
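A minimal sketch of the two access styles, reusing the example URL from above. The variable name parsed is only illustrative.

```python
from urllib.parse import urlparse

url = 'http://user:pwd@domain:80/path;params?query=queryarg#fragment'
parsed = urlparse(url)

# Index access and attribute access return the same values.
print(parsed[0] == parsed.scheme)
print(parsed[2] == parsed.path)

# Because the result is a tuple, it can also be unpacked in one statement.
scheme, netloc, path, params, query, fragment = parsed
print(scheme, netloc, path)
```

Attribute access is usually easier to read; index access and unpacking are handy when the result is passed to code that expects a plain tuple.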
For convenience, ParseResult also provides the username, password, hostname and port attributes, which break netloc down further.
print('scheme  :', parsed_result.scheme)
print('netloc  :', parsed_result.netloc)
print('path    :', parsed_result.path)
print('params  :', parsed_result.params)
print('query   :', parsed_result.query)
print('fragment:', parsed_result.fragment)
print('username:', parsed_result.username)
print('password:', parsed_result.password)
print('hostname:', parsed_result.hostname)
print('port    :', parsed_result.port)

Output:
scheme  : http
netloc  : user:pwd@domain:80
path    : /path
params  : params
query   : query=queryarg
fragment: fragment
username: user
password: pwd
hostname: domain
port    : 80
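The netloc sub-attributes behave a little differently from the six tuple elements when a part is missing. The sketch below uses two made-up URLs (Example.COM and domain:8080 are placeholders, not from the article) to show what comes back in that case.

```python
from urllib.parse import urlparse

# No credentials, no explicit port, mixed-case host name.
bare = urlparse('http://Example.COM/index.html')

print(bare.netloc)    # netloc is kept exactly as written
print(bare.hostname)  # hostname is normalised to lower case
print(bare.username)  # None: the URL carries no user info
print(bare.password)  # None
print(bare.port)      # None: no explicit port in the URL

# When a port is present it is returned as an int, not a str.
with_port = urlparse('http://domain:8080/')
print(type(with_port.port).__name__, with_port.port)
```

Note that a missing port is reported as None rather than the scheme's default, so code that needs a port number has to supply the default itself.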
Besides urlparse(), there is a similar function, urlsplit(), that also splits a URL. The difference is that urlsplit() does not separate the path parameters (params) from the path.
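When the path portion of a URL contains parameters on more than one segment, parsing with urlparse() is problematic: only the parameters of the last segment are split out, while the earlier ones stay in path.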
url = 'http://user:pwd@domain:80/path1;params1/path2;params2?query=queryarg#fragment'
parsed_result = urlparse(url)
print(parsed_result)
print('parsed.path   :', parsed_result.path)
print('parsed.params :', parsed_result.params)

Output:
ParseResult(scheme='http', netloc='user:pwd@domain:80', path='/path1;params1/path2', params='params2', query='query=queryarg', fragment='fragment')
parsed.path   : /path1;params1/path2
parsed.params : params2
In this case, use urlsplit() instead, which leaves the whole path intact:
from urllib.parse import urlsplit
split_result = urlsplit(url)
print(split_result)
print('split.path :', split_result.path)
# SplitResult has no params attribute

Output:
SplitResult(scheme='http', netloc='user:pwd@domain:80', path='/path1;params1/path2;params2', query='query=queryarg', fragment='fragment')
split.path : /path1;params1/path2;params2
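Since urlsplit() returns the path untouched, the per-segment parameters still have to be separated by hand if they are needed. One possible way is sketched below. The helper split_segment_params is hypothetical (it is not part of urllib.parse), and it simplifies by ignoring empty segments.

```python
from urllib.parse import urlsplit


def split_segment_params(path):
    """Hypothetical helper: return a (name, [params, ...]) pair per path segment."""
    result = []
    for segment in path.split('/'):
        if not segment:
            continue  # skip empty pieces, e.g. the one before the leading '/'
        # The first ';'-separated piece is the segment name, the rest are its params.
        name, *params = segment.split(';')
        result.append((name, params))
    return result


url = 'http://user:pwd@domain:80/path1;params1/path2;params2?query=queryarg#fragment'
segments = split_segment_params(urlsplit(url).path)
print(segments)  # [('path1', ['params1']), ('path2', ['params2'])]
```

Doing the split on the urlsplit() path keeps every segment's parameters, whereas the urlparse() result would already have lost the association for the last segment.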
If you only need to split off the fragment identifier at the end of a URL, use the urldefrag() function.
from urllib.parse import urldefrag
url = 'http://user:pwd@domain:80/path1;params1/path2;params2?query=queryarg#fragment'
d = urldefrag(url)
print(d)
print('url     :', d.url)
print('fragment:', d.fragment)

Output:
DefragResult(url='http://user:pwd@domain:80/path1;params1/path2;params2?query=queryarg', fragment='fragment')
url     : http://user:pwd@domain:80/path1;params1/path2;params2?query=queryarg
fragment: fragment
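DefragResult is also a named tuple, so it can be unpacked directly. The short sketch below uses two made-up URLs (not from the article) to show this, along with what urldefrag() returns when the URL has no fragment at all.

```python
from urllib.parse import urldefrag

# DefragResult is a two-element named tuple: (url, fragment).
base1, frag1 = urldefrag('http://domain/path?query=queryarg#section-2')
print(base1)
print(frag1)

# Without a '#', the URL comes back unchanged and the fragment is an empty string.
base2, frag2 = urldefrag('http://domain/path?query=queryarg')
print(base2)
print(repr(frag2))
```

Checking for an empty fragment string is therefore enough to tell whether the URL carried one.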