nltk.translate.extract

nltk.translate.extract(f_start, f_end, e_start, e_end, alignment, f_aligned, srctext, trgtext, srclen, trglen, max_phrase_length)

This function checks a candidate phrase pair for alignment point consistency and extracts the phrase pairs that pass the check.

A phrase pair (e, f) is consistent with an alignment A if and only if:

No English words in the phrase pair are aligned to words outside it.

    ∀ e_i ∈ e, (e_i, f_j) ∈ A ⇒ f_j ∈ f

No Foreign words in the phrase pair are aligned to words outside it.

    ∀ f_j ∈ f, (e_i, f_j) ∈ A ⇒ e_i ∈ e

The phrase pair contains at least one alignment point.

    ∃ e_i ∈ e, f_j ∈ f s.t. (e_i, f_j) ∈ A
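The three conditions above can be sketched as a single pass over the alignment points. This is a minimal illustration, not NLTK's implementation: is_consistent is a hypothetical helper, and it assumes that the alignment is a list of (e, f) index pairs and that both spans are inclusive.

```python
def is_consistent(e_start, e_end, f_start, f_end, alignment):
    """Check the three consistency conditions for one phrase pair.

    Hypothetical helper for illustration only (not part of NLTK).
    Assumes alignment is a list of (e, f) index pairs and that the
    spans e_start..e_end and f_start..f_end are inclusive.
    """
    has_point = False
    for e, f in alignment:
        e_inside = e_start <= e <= e_end
        f_inside = f_start <= f <= f_end
        # Conditions 1 and 2: a point may not link a word inside the
        # pair to a word outside it, in either direction.
        if e_inside != f_inside:
            return False
        # Condition 3: remember that at least one point lies inside.
        if e_inside and f_inside:
            has_point = True
    return has_point


# Made-up alignment: e word 1 is aligned to f words 1 and 2.
alignment = [(0, 0), (1, 1), (1, 2), (2, 3)]

print(is_consistent(1, 1, 1, 2, alignment))  # covers both points of e word 1
print(is_consistent(1, 1, 1, 1, alignment))  # leaves out f word 2
```

The first call accepts the pair because every point touching either span lies inside both; the second rejects it because the point (1, 2) links an English word inside the pair to a Foreign word outside it.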

Parameters

f_start (int) – Starting index of the possible foreign language phrases
f_end (int) – End index of the possible foreign language phrases
e_start (int) – Starting index of the possible source language phrases
e_end (int) – End index of the possible source language phrases
srctext (list) – The source language tokens, a list of strings.
trgtext (list) – The target language tokens, a list of strings.
srclen (int) – The number of tokens in srctext.
trglen (int) – The number of tokens in trgtext.
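As a rough illustration of how arguments of this shape might be prepared, the sketch below derives srclen and trglen from the token lists and finds the smallest target span covering every word aligned to a given source span. The helper target_span, the sample sentences, and the alignment are all made up for this example, and it assumes that e indices refer to srctext and f indices to trgtext.

```python
def target_span(e_start, e_end, alignment, trglen):
    """Smallest inclusive f span covering all words aligned to e_start..e_end.

    Hypothetical helper for illustration only (not part of NLTK).
    Returns (trglen - 1, -1) when no alignment point touches the span.
    """
    f_start, f_end = trglen - 1, -1
    for e, f in alignment:
        if e_start <= e <= e_end:
            f_start = min(f, f_start)
            f_end = max(f, f_end)
    return f_start, f_end


# Made-up example data: token lists and (e, f) alignment points.
srctext = ["the", "green", "house"]
trgtext = ["la", "casa", "verde"]
alignment = [(0, 0), (1, 2), (2, 1)]

srclen, trglen = len(srctext), len(trgtext)

# Source words 1..2 ("green house") map to target words 1..2 ("casa verde").
f_start, f_end = target_span(1, 2, alignment, trglen)
print(f_start, f_end)
print(trgtext[f_start:f_end + 1])
```

A span returned with f_end below f_start signals that no target word is aligned to the chosen source span, so no phrase pair can be formed from it.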
