A strange problem: when I use XPath to scrape the table tag with id 'wc_channels' from this site, it ...

2 years ago

source link: https://www.v2ex.com/t/846110

A strange problem: when I use XPath to scrape the table tag with id 'wc_channels' from this site, the result runs from the table tag all the way to the end of the HTML file

tg11 · 18 hours 26 minutes ago · 374 views

import ssl
from urllib import request
from lxml import etree


def urllibGet(url):
    # Disable HTTPS certificate verification globally (insecure; for debugging only)
    ssl._create_default_https_context = ssl._create_unverified_context
    headers = {
        'Host': 'www.livesoccertv.com',
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:77.0) Gecko/20100101 Firefox/77.0',
    }
    # Build the request object first, then open it to get the response
    req = request.Request(url=url, headers=headers)
    res = request.urlopen(req)
    return res.read().decode('utf-8')


res = urllibGet('https://www.livesoccertv.com/match/4027329/aston-villa-vs-tottenham-hotspur/')
html = etree.HTML(res)
channel_list = html.xpath('//table[@id="wc_channels"]')
html_str = etree.tostring(channel_list[0], encoding='utf-8').decode()
# Write with an explicit encoding and close the file when done
with open('content.html', 'w', encoding='utf-8') as f:
    f.write(html_str)

Did I write something wrong?
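
One way to narrow this down is to check whether the raw page source ever closes the table. This is only a guess at the cause, not something the post confirms: if the markup is malformed and the `wc_channels` table has no matching `</table>`, a recovering HTML parser may extend the element to the end of the document, which would look exactly like the output described. The sketch below uses only the standard library; `TableBalanceChecker` and `table_is_closed` are hypothetical helper names, and the two sample strings are made up for illustration.

```python
from html.parser import HTMLParser


class TableBalanceChecker(HTMLParser):
    """Track whether the <table> with a given id is closed in the raw markup."""

    def __init__(self, table_id):
        super().__init__()
        self.table_id = table_id
        self.depth = 0        # nesting depth of <table> tags, counted from the target
        self.found = False    # the target table's start tag was seen
        self.closed = False   # the target table's matching </table> was seen

    def handle_starttag(self, tag, attrs):
        if tag != 'table':
            return
        if self.depth > 0:
            # A nested table inside the target table
            self.depth += 1
        elif not self.found and dict(attrs).get('id') == self.table_id:
            self.found = True
            self.depth = 1

    def handle_endtag(self, tag):
        if tag == 'table' and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                self.closed = True


def table_is_closed(html_text, table_id):
    """Return True only if the table exists and has a matching </table>."""
    checker = TableBalanceChecker(table_id)
    checker.feed(html_text)
    checker.close()
    return checker.found and checker.closed


# Made-up samples: one well-formed, one missing the closing tag
well_formed = '<table id="wc_channels"><tr><td>a</td></tr></table><p>after</p>'
broken = '<table id="wc_channels"><tr><td>a</td></tr><p>after</p>'

print(table_is_closed(well_formed, 'wc_channels'))
print(table_is_closed(broken, 'wc_channels'))
```

Calling `table_is_closed(res, 'wc_channels')` on the string returned by `urllibGet` would show whether the source itself is at fault. If the table is closed in the source, the cause lies elsewhere, for example in invalid nesting that this simple check does not detect.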
