18

机器自动摘要 - sumy 简单使用

 3 years ago
source link: http://blog.ilibrary.me/2017/05/04/%E6%9C%BA%E5%99%A8%E8%87%AA%E5%8A%A8%E6%91%98%E8%A6%81-sumy-%E7%AE%80%E5%8D%95%E4%BD%BF%E7%94%A8
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
欢迎转载,请支持原创,保留原文链接:blog.ilibrary.me

#sumy 是什么 sumy 是一个机器摘要库,实现了LSA-5, LexRank, TextRank, Luhn, KL, Sum-Basic等摘要算法。

sumy 提供命令行程序供直接使用,也提供python库供代码集成.

sumy命令行怎么用

$ sumy lex-rank --length=10 --url=http://en.wikipedia.org/wiki/Automatic_summarization # what's summarization?
$ sumy luhn --language=czech --url=http://www.zdrojak.cz/clanky/automaticke-zabezpeceni/
$ sumy edmundson --language=czech --length=3% --url=http://cs.wikipedia.org/wiki/Bitva_u_Lipan
$ sumy_eval lex-rank reference_summary.txt --url=http://en.wikipedia.org/wiki/Automatic_summarization
$ sumy_eval lsa reference_summary.txt --language=czech --url=http://www.zdrojak.cz/clanky/automaticke-zabezpeceni/
$ sumy_eval edmundson reference_summary.txt --language=czech --url=http://cs.wikipedia.org/wiki/Bitva_u_Lipan

$ sumy --help # for more info

sumy python怎么用

简单粗暴,遍历各种算法。

def summarizations(originalText):
    for sum in [LSA, LexRank, Luhn, TextRank, SumBasic, KL]:
        parser = PlaintextParser.from_string(originalText, Tokenizer(LANGUAGE))
        # or for plain text files
        # parser = PlaintextParser.from_file("document.txt", Tokenizer(LANGUAGE))
        stemmer = Stemmer(LANGUAGE)

        summarizer = sum(stemmer)
        summarizer.stop_words = get_stop_words(LANGUAGE)

        
        for sentence in summarizer(parser.document, SENTENCES_COUNT):
            print(sentence)

sumy 效果怎样

Lex-rank, text-rank, luhn这三种效果感觉要优于其它算法。总体上,能用,但是和人工摘要还是差了些。


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK