GNE v0.1 officially released: build a general-purpose news site crawler in 4 lines of code

 4 years ago
source link: https://juejin.im/post/5e0c7c546fb9a047f66eafea
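GNE (GeneralNewsExtractor) is a general-purpose module for extracting the main content of news web pages. Given the HTML of a news page, it outputs the body text, title, author, publication time, the URLs of images in the body, and the source code of the tag that contains the body. When extracting from Toutiao (今日头条), NetEase News, Gamersky (游民星空), Guancha (观察者网), ifeng.com, Tencent News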
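
To make the "HTML in, structured fields out" contract described above concrete, here is a minimal standard-library sketch. It is not GNE and does not reproduce GNE's extraction algorithm: it naively collects the `<title>` text, all `<p>` text, and every `<img src>`. The names `ToyNewsParser` and `toy_extract`, and the dictionary keys, are hypothetical and chosen only to mirror the kinds of output listed above.

```python
from html.parser import HTMLParser


class ToyNewsParser(HTMLParser):
    """Hypothetical helper: collects <title> text, <p> text, and <img src> values."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.paragraphs = []
        self.images = []
        self._in_title = False
        self._in_p = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "p":
            self._in_p = True
            self.paragraphs.append("")  # start a new paragraph buffer
        elif tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.images.append(src)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "p":
            self._in_p = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._in_p:
            self.paragraphs[-1] += data


def toy_extract(html):
    """Return a dict of fields from a page's HTML (toy stand-in, not GNE)."""
    parser = ToyNewsParser()
    parser.feed(html)
    parser.close()
    content = "\n".join(p.strip() for p in parser.paragraphs if p.strip())
    return {
        "title": parser.title.strip(),
        "content": content,
        "images": parser.images,
    }


sample = (
    "<html><head><title>Example headline</title></head><body>"
    "<p>First paragraph.</p>"
    "<img src='https://example.com/a.jpg'>"
    "<p>Second paragraph.</p>"
    "</body></html>"
)
result = toy_extract(sample)
print(result)
```

A real news page contains navigation, ads, and comment blocks inside `<p>` tags as well, which is why a dedicated extractor is needed to locate the actual article body; this sketch only illustrates the shape of the input and output.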
