nltk.translate.nist

nltk.translate.nist(references, hypothesis, n=5)

Calculate the NIST score as described in George Doddington. 2002. “Automatic evaluation of machine translation quality using n-gram co-occurrence statistics.” Proceedings of HLT. Morgan Kaufmann Publishers Inc. https://dl.acm.org/citation.cfm?id=1289189.1289273

DARPA commissioned NIST to develop an MT evaluation facility based on the BLEU score. The official script used by NIST to compute both the BLEU and NIST scores is mteval-14.pl. The main differences between the two scores are:

  • BLEU uses the geometric mean of the n-gram overlaps; NIST uses the arithmetic mean.

  • NIST has a different brevity penalty.

  • The NIST score from mteval-14.pl has a self-contained tokenizer.

Note: mteval-14.pl includes a smoothing function for the BLEU score that is NOT used in the NIST score computation.
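The first two differences listed above can be illustrated with a minimal, standard-library sketch. It is not the code used by NLTK or mteval-14.pl: all function names below are hypothetical, the per-order precisions are made-up numbers, and the information weighting that Doddington's paper applies to matching n-grams is deliberately left out. The NIST penalty follows the form given in the paper, with the constant chosen so that the penalty is 0.5 when the hypothesis is two thirds as long as the reference.

```python
import math


def geometric_mean(precisions):
    """BLEU-style combination: any zero precision zeroes the result."""
    if min(precisions) <= 0:
        return 0.0
    return math.exp(sum(math.log(p) for p in precisions) / len(precisions))


def arithmetic_mean(precisions):
    """NIST-style combination (simplified: no information weights)."""
    return sum(precisions) / len(precisions)


def bleu_brevity_penalty(ref_len, hyp_len):
    """BLEU penalty: exp(1 - r/c) when the hypothesis is shorter."""
    if hyp_len >= ref_len:
        return 1.0
    if hyp_len == 0:
        return 0.0
    return math.exp(1 - ref_len / hyp_len)


def nist_length_penalty(ref_len, hyp_len):
    """NIST penalty: exp(beta * log(ratio)^2) for 0 < ratio < 1."""
    ratio = hyp_len / ref_len
    if 0 < ratio < 1:
        # beta is fixed so that a ratio of 2/3 yields a penalty of 0.5.
        beta = math.log(0.5) / math.log(1.5) ** 2
        return math.exp(beta * math.log(ratio) ** 2)
    return max(min(ratio, 1.0), 0.0)


# Made-up per-order precisions for orders 1..4; the last order has no match.
precisions = [0.8, 0.5, 0.25, 0.0]
print(geometric_mean(precisions))   # zero, because one order has no match
print(arithmetic_mean(precisions))  # still positive

# A hypothesis two thirds as long as the reference.
print(bleu_brevity_penalty(ref_len=18, hyp_len=12))
print(nist_length_penalty(ref_len=18, hyp_len=12))
```

With these inputs the geometric mean collapses to zero while the arithmetic mean does not, which is one reason unsmoothed sentence-level BLEU can be harsh on short hypotheses.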

>>> from nltk.translate.nist_score import sentence_nist
>>> hypothesis1 = ['It', 'is', 'a', 'guide', 'to', 'action', 'which',
...               'ensures', 'that', 'the', 'military', 'always',
...               'obeys', 'the', 'commands', 'of', 'the', 'party']
>>> hypothesis2 = ['It', 'is', 'to', 'insure', 'the', 'troops',
...               'forever', 'hearing', 'the', 'activity', 'guidebook',
...               'that', 'party', 'direct']
>>> reference1 = ['It', 'is', 'a', 'guide', 'to', 'action', 'that',
...               'ensures', 'that', 'the', 'military', 'will', 'forever',
...               'heed', 'Party', 'commands']
>>> reference2 = ['It', 'is', 'the', 'guiding', 'principle', 'which',
...               'guarantees', 'the', 'military', 'forces', 'always',
...               'being', 'under', 'the', 'command', 'of', 'the',
...               'Party']
>>> reference3 = ['It', 'is', 'the', 'practical', 'guide', 'for', 'the',
...               'army', 'always', 'to', 'heed', 'the', 'directions',
...               'of', 'the', 'party']
>>> sentence_nist([reference1, reference2, reference3], hypothesis1) 
3.3709...
>>> sentence_nist([reference1, reference2, reference3], hypothesis2) 
1.4619...

Parameters
  • references (list(list(str))) – reference sentences

  • hypothesis (list(str)) – a hypothesis sentence

  • n (int) – highest n-gram order
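
To make the role of n concrete, the sketch below lists which n-grams of a tokenized hypothesis would be considered for each order from 1 up to n. The helper ngrams_up_to is hypothetical and written only for illustration; it is not part of NLTK and does not compute a score.

```python
def ngrams_up_to(tokens, n):
    """Return {order: list of n-gram tuples} for orders 1..n."""
    return {
        order: [
            tuple(tokens[i:i + order])
            for i in range(len(tokens) - order + 1)
        ]
        for order in range(1, n + 1)
    }


# A hypothesis is a list of string tokens, as in the Parameters list.
hypothesis = ['It', 'is', 'a', 'guide', 'to', 'action']
by_order = ngrams_up_to(hypothesis, n=5)

for order, grams in by_order.items():
    # Higher orders yield fewer n-grams for the same sentence.
    print(order, len(grams), grams[0])
```

A sentence with fewer than n tokens has no n-grams at the highest orders, so very short hypotheses contribute nothing for those orders.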