
GitHub - chakki-works/sumeval: Well tested & Multi-language evaluation frame...

 6 years ago
source link: https://github.com/chakki-works/sumeval

Well tested & Multi-language
evaluation framework for Text Summarization.

  • Well tested
  • Multi-language
    • English and Japanese are both supported, and other languages can be added easily.

Of course, the implementation is pure Python!

How to use

from sumeval.metrics.rouge import RougeCalculator


rouge = RougeCalculator(stopwords=True, lang="en")

rouge_1 = rouge.rouge_n(
            summary="I went to the Mars from my living town.",
            references="I went to Mars",
            n=1)

rouge_2 = rouge.rouge_n(
            summary="I went to the Mars from my living town.",
            references=["I went to Mars", "It's my living town"],
            n=2)

rouge_l = rouge.rouge_l(
            summary="I went to the Mars from my living town.",
            references=["I went to Mars", "It's my living town"])

# You need spaCy to calculate ROUGE-BE

rouge_be = rouge.rouge_be(
            summary="I went to the Mars from my living town.",
            references=["I went to Mars", "It's my living town"])

print("ROUGE-1: {}, ROUGE-2: {}, ROUGE-L: {}, ROUGE-BE: {}".format(
    rouge_1, rouge_2, rouge_l, rouge_be
).replace(", ", "\n"))

# BLEU uses a separate calculator class.
from sumeval.metrics.bleu import BLEUCalculator


bleu = BLEUCalculator()
score = bleu.bleu("I am waiting on the beach",
                  "He is walking on the beach")

bleu_ja = BLEUCalculator(lang="ja")
score_ja = bleu_ja.bleu("私はビーチで待ってる", "彼がベンチで待ってる")
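
To make it clearer what a ROUGE-N score measures, here is a minimal sketch of the calculation for a single reference. It is not sumeval's own code: the function name rouge_n_sketch is hypothetical, tokenization is plain lowercased whitespace splitting, and stopword removal and stemming are skipped. The alpha parameter is assumed to weight precision against recall, with 0.5 giving the usual F1.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_sketch(summary, reference, n=1, alpha=0.5):
    """Illustrative ROUGE-N for one reference (hypothetical helper).

    Assumes lowercased whitespace tokenization, no stopword removal,
    and no stemming. alpha=1 returns precision, alpha=0 returns recall.
    """
    summary_ngrams = ngrams(summary.lower().split(), n)
    reference_ngrams = ngrams(reference.lower().split(), n)
    # Clipped overlap: an n-gram counts at most as often as it occurs in both texts.
    matches = sum((summary_ngrams & reference_ngrams).values())
    summary_total = sum(summary_ngrams.values())
    reference_total = sum(reference_ngrams.values())
    if matches == 0 or summary_total == 0 or reference_total == 0:
        return 0.0
    precision = matches / summary_total
    recall = matches / reference_total
    return precision * recall / ((1 - alpha) * precision + alpha * recall)


unigram_score = rouge_n_sketch("the cat sat on the mat", "the cat lay on the mat", n=1)
bigram_score = rouge_n_sketch("the cat sat on the mat", "the cat lay on the mat", n=2)
print(unigram_score, bigram_score)
```

Scores from RougeCalculator can differ from this sketch because the library applies its own tokenizer and, when enabled, stopword removal and stemming.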

From the command line

sumeval r-nlb "I'm living New York its my home town so awesome" "My home town is awesome"

Output:

{
  "options": {
    "stopwords": true,
    "stemming": false,
    "word_limit": -1,
    "length_limit": -1,
    "alpha": 0.5,
    "input-summary": "I'm living New York its my home town so awesome",
    "input-references": [
      "My home town is awesome"
    ]
  },
  "averages": {
    "ROUGE-1": 0.7499999999999999,
    "ROUGE-2": 0.6666666666666666,
    "ROUGE-L": 0.7499999999999999,
    "ROUGE-BE": 0
  },
  "scores": [
    {
      "ROUGE-1": 0.7499999999999999,
      "ROUGE-2": 0.6666666666666666,
      "ROUGE-L": 0.7499999999999999,
      "ROUGE-BE": 0
    }
  ]
}
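
Because the command prints JSON, the result is easy to post-process. The sketch below parses an abbreviated copy of the output above with the standard json module and rounds the averages. Reading "scores" as one entry per scored summary is an assumption drawn from the shape of the output, not from the sumeval documentation.

```python
import json

# Abbreviated copy of the JSON printed by the command above.
raw_output = """
{
  "options": {"stopwords": true, "stemming": false, "alpha": 0.5},
  "averages": {"ROUGE-1": 0.7499999999999999, "ROUGE-2": 0.6666666666666666,
               "ROUGE-L": 0.7499999999999999, "ROUGE-BE": 0},
  "scores": [
    {"ROUGE-1": 0.7499999999999999, "ROUGE-2": 0.6666666666666666,
     "ROUGE-L": 0.7499999999999999, "ROUGE-BE": 0}
  ]
}
"""

report = json.loads(raw_output)

# Round away floating-point noise such as 0.7499999999999999.
rounded = {metric: round(value, 3) for metric, value in report["averages"].items()}
for metric, value in rounded.items():
    print(f"{metric}: {value:.3f}")
```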

You can also use file input. For details, run sumeval -h.

Install

pip install sumeval

Dependencies

  • BLEU depends on SacréBLEU.
  • To calculate ROUGE-BE, spaCy is required.
  • To use lang ja, janome or MeCab is required.
    • To calculate ROUGE-BE for Japanese, GiNZA is also required.
  • To use lang zh, jieba is required.
    • To calculate ROUGE-BE for Chinese, pyhanlp is also required.
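
Since most of these dependencies are optional, it can help to check which ones are present before choosing a metric or language. The sketch below uses only the standard library. The helper available_features is hypothetical and not part of sumeval, and the import names in the table are assumptions based on each project's usual module name.

```python
from importlib.util import find_spec

# Feature -> candidate import names (assumed; any one of them is enough).
OPTIONAL_MODULES = {
    "BLEU": ["sacrebleu"],
    "ROUGE-BE": ["spacy"],
    "lang ja": ["janome", "MeCab"],
    "lang ja + ROUGE-BE": ["ginza"],
    "lang zh": ["jieba"],
    "lang zh + ROUGE-BE": ["pyhanlp"],
}


def available_features(modules=None):
    """Map each feature to True if at least one of its modules is importable."""
    if modules is None:
        modules = OPTIONAL_MODULES
    return {
        feature: any(find_spec(name) is not None for name in names)
        for feature, names in modules.items()
    }


for feature, ok in available_features().items():
    print(f"{feature}: {'available' if ok else 'missing'}")
```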

sumeval uses two packages to verify its scores in tests.

  • pythonrouge
    • It calls the original Perl script.
    • pip install git+https://github.com/tagucci/pythonrouge.git
  • rougescore
    • It is a simple Python implementation of the ROUGE score.
    • pip install git+https://github.com/bdusell/rougescore.git

Contributions are welcome!

Add supported language

The tokenization and dependency parsing for each language are located in sumeval/metrics/lang.

You can add a language class by inheriting from BaseLang.
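
As a rough illustration of that pattern, the sketch below defines a language class on top of a stand-in base class. BaseLangStandIn, its constructor argument, and the tokenize method are all hypothetical and only mirror the idea. The real BaseLang in sumeval/metrics/lang may require different methods, so check its source before writing an actual language class.

```python
import re


class BaseLangStandIn:
    """Hypothetical stand-in for sumeval's BaseLang; the real interface may differ."""

    def __init__(self, code):
        self.lang = code

    def tokenize(self, text):
        raise NotImplementedError("Subclasses must implement tokenize().")


class LangDE(BaseLangStandIn):
    """Illustrative German language class with a simple regex tokenizer."""

    def __init__(self):
        super().__init__("de")

    def tokenize(self, text):
        # \w+ matches runs of word characters, including letters with umlauts.
        return re.findall(r"\w+", text)


tokens = LangDE().tokenize("Ich wohne in München.")
print(tokens)
```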

