nltk.translate.StackDecoder¶

class nltk.translate.StackDecoder[source]¶

Bases: object

Phrase-based stack decoder for machine translation

>>> from nltk.translate import PhraseTable
>>> phrase_table = PhraseTable()
>>> phrase_table.add(('niemand',), ('nobody',), log(0.8))
>>> phrase_table.add(('niemand',), ('no', 'one'), log(0.2))
>>> phrase_table.add(('erwartet',), ('expects',), log(0.8))
>>> phrase_table.add(('erwartet',), ('expecting',), log(0.2))
>>> phrase_table.add(('niemand', 'erwartet'), ('one', 'does', 'not', 'expect'), log(0.1))
>>> phrase_table.add(('die', 'spanische', 'inquisition'), ('the', 'spanish', 'inquisition'), log(0.8))
>>> phrase_table.add(('!',), ('!',), log(0.8))

>>> #  nltk.model should be used here once it is implemented
>>> from collections import defaultdict
>>> language_prob = defaultdict(lambda: -999.0)
>>> language_prob[('nobody',)] = log(0.5)
>>> language_prob[('expects',)] = log(0.4)
>>> language_prob[('the', 'spanish', 'inquisition')] = log(0.2)
>>> language_prob[('!',)] = log(0.1)
>>> language_model = type('',(object,),{'probability_change': lambda self, context, phrase: language_prob[phrase], 'probability': lambda self, phrase: language_prob[phrase]})()

>>> stack_decoder = StackDecoder(phrase_table, language_model)

>>> stack_decoder.translate(['niemand', 'erwartet', 'die', 'spanische', 'inquisition', '!'])
['nobody', 'expects', 'the', 'spanish', 'inquisition', '!']

__init__(phrase_table, language_model)[source]¶

Parameters

phrase_table (PhraseTable) – Table of translations for source language phrases and the log probabilities for those translations.
language_model (object) – Target language model. Must define a probability_change method that calculates the change in log probability of a sentence, if a given string is appended to it. This interface is experimental and will likely be replaced with nltk.model once it is implemented.

word_penalty¶

float: Influences the translation length exponentially.: If positive, shorter translations are preferred. If negative, longer translations are preferred. If zero, no penalty is applied.

beam_threshold¶

float: Hypotheses that score below this factor of the best: hypothesis in a stack are dropped from consideration. Value between 0.0 and 1.0.

stack_size¶

int: Maximum number of hypotheses to consider in a stack.: Higher values increase the likelihood of a good translation, but increases processing time.

property distortion_factor¶

float: Amount of reordering of source phrases.: Lower values favour monotone translation, suitable when word order is similar for both source and target languages. Value between 0.0 and 1.0. Default 0.5.

translate(src_sentence)[source]¶

Parameters: src_sentence (list(str)) – Sentence to be translated
Returns: Translated sentence
Return type: list(str)

find_all_src_phrases(src_sentence)[source]¶

Finds all subsequences in src_sentence that have a phrase translation in the translation table

Returns: Subsequences that have a phrase translation, represented as a table of lists of end positions. For example, if result[2] is [5, 6, 9], then there are three phrases starting from position 2 in src_sentence, ending at positions 5, 6, and 9 exclusive. The list of ending positions are in ascending order.
Return type: list(list(int))

compute_future_scores(src_sentence)[source]¶

Determines the approximate scores for translating every subsequence in src_sentence

Future scores can be used a look-ahead to determine the difficulty of translating the remaining parts of a src_sentence.

Returns: Scores of subsequences referenced by their start and end positions. For example, result[2][5] is the score of the subsequence covering positions 2, 3, and 4.
Return type: dict(int: (dict(int): float))

future_score(hypothesis, future_score_table, sentence_length)[source]¶: Determines the approximate score for translating the untranslated words in hypothesis

expansion_score(hypothesis, translation_option, src_phrase_span)[source]¶

Calculate the score of expanding hypothesis with translation_option

Parameters

hypothesis (_Hypothesis) – Hypothesis being expanded
translation_option (PhraseTableEntry) – Information about the proposed expansion
src_phrase_span (tuple(int, int)) – Word position span of the source phrase

distortion_score(hypothesis, next_src_phrase_span)[source]¶

static valid_phrases(all_phrases_from, hypothesis)[source]¶

Extract phrases from all_phrases_from that contains words that have not been translated by hypothesis

Parameters: all_phrases_from (list(list(int))) – Phrases represented by their spans, in the same format as the return value of find_all_src_phrases
Returns: A list of phrases, represented by their spans, that cover untranslated positions.
Return type: list(tuple(int, int))

NLTK

Documentation

nltk.translate.StackDecoder¶