.NET Text Extraction Library Thrinax (1): Block-Based Web Page Body Text Extraction | 地平线

 6 years ago
source link: https://www.tnidea.com/new-netcore-unstructured-text-capture-lib-thrinax.html?
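It's been a while. This time I'm bringing you a brand-new .NET library for extracting information from Chinese web pages: Thrinax. The library's goal is to reliably obtain the useful information in web pages in a simple way that requires little manual involvement. This will be a series of articles, and the library will keep improving as the articles are written. Today's post is the first: block-based information extraction from detail pages.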
