source link: https://www.v2ex.com/t/846110
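I've run into a strange problem: when I use XPath to grab the table tag with id 'wc_channels' from this site, the result runs from the table tag all the way to the end of the HTML file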
tg11 · 18 hours 26 minutes ago · 374 views

import ssl
from urllib import request
from lxml import etree

def urllibGet(url):
    # Disable certificate verification globally
    ssl._create_default_https_context = ssl._create_unverified_context
    headers = {
        'Host': 'www.livesoccertv.com',
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:77.0) Gecko/20100101 Firefox/77.0',
    }
    req = request.Request(url=url, headers=headers)
    res = request.urlopen(req)
    return res.read().decode('utf-8')

res = urllibGet('https://www.livesoccertv.com/match/4027329/aston-villa-vs-tottenham-hotspur/')
html = etree.HTML(res)
channel_list = html.xpath('//table[@id="wc_channels"]')
html_str = etree.tostring(channel_list[0], encoding='utf-8').decode()
# Write the serialized table out for inspection
with open('content.html', 'w', encoding='utf-8') as f:
    f.write(html_str)
Did I write something wrong somewhere?