
Why does sensitive-word monitoring always require Chinese word segmentation first?

 6 years ago
source link: https://www.v2ex.com/t/450304
Python - @Ranyxr - Because of a company requirement, I built a sensitive-word monitoring feature, implemented like this:

1. A crawler fetches the page source.
2. All Chinese text is extracted from the source (data cleaning).
3. The Chinese text is segmented (using Python's jieba).
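The question in the title — why segment at all — comes down to false positives: naive substring matching can flag a sensitive word that spans a word boundary. Below is a minimal stdlib-only sketch of steps 2–3 of the pipeline; the regex range, the sample sentence, and the hard-coded token list (standing in for jieba's output, since the original uses jieba) are illustrative assumptions, not the poster's actual code.

```python
import re

def extract_chinese(html: str) -> str:
    """Step 2: keep only CJK characters from raw page source (data cleaning)."""
    return "".join(re.findall(r"[\u4e00-\u9fff]+", html))

def naive_hits(text: str, words: set[str]) -> set[str]:
    """Substring matching with no segmentation."""
    return {w for w in words if w in text}

SENSITIVE = {"中出"}  # hypothetical watch-list entry

text = extract_chinese("<p>我们中出了一个叛徒</p>")
# Without segmentation, "中出" matches across the 中/出 word boundary:
print(naive_hits(text, SENSITIVE))  # → {'中出'}, a false positive

# Step 3: a segmenter (jieba in the original pipeline) would split the
# sentence into tokens, e.g. 我们/中/出/了/一个/叛徒 (hard-coded here):
tokens = ["我们", "中", "出", "了", "一个", "叛徒"]
print({w for w in SENSITIVE if w in tokens})  # → set(), no false positive
```

Matching against segmented tokens rather than the raw character stream is what makes the jieba step worthwhile: the watch-list entry only fires when it appears as an actual word.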
