
 2 years ago
source link: https://www.v2ex.com/t/846110


A strange problem: using XPath to grab the table tag with id 'wc_channels' on this site returns everything from the table tag all the way to the end of the HTML file

  tg11 · 18 hours 26 minutes ago · 374 views
import ssl
from urllib import request
from lxml import etree


def urllibGet(url):
    # Disable certificate verification globally
    ssl._create_default_https_context = ssl._create_unverified_context
    headers = {
        'Host': 'www.livesoccertv.com',
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:77.0) Gecko/20100101 Firefox/77.0',
    }
    req = request.Request(url=url, headers=headers)
    res = request.urlopen(req)
    return res.read().decode('utf-8')


res = urllibGet('https://www.livesoccertv.com/match/4027329/aston-villa-vs-tottenham-hotspur/')
html = etree.HTML(res)
channel_list = html.xpath('//table[@id="wc_channels"]')
html_str = etree.tostring(channel_list[0], encoding='utf-8').decode()
# Write with an explicit encoding and close the file when done
with open('content.html', 'w', encoding='utf-8') as f:
    f.write(html_str)

Did I get something wrong somewhere?

