nltk.translate.extract

nltk.translate.extract(f_start, f_end, e_start, e_end, alignment, f_aligned, srctext, trgtext, srclen, trglen, max_phrase_length)

This function checks the alignment points for consistency and extracts phrase pairs from the consistent chunk.

A phrase pair (ē, f̄) is consistent with an alignment A if and only if:

  1. No English words in the phrase pair are aligned to words outside it.

    ∀ e_i ∈ ē, (e_i, f_j) ∈ A ⇒ f_j ∈ f̄

  2. No foreign words in the phrase pair are aligned to words outside it.

    ∀ f_j ∈ f̄, (e_i, f_j) ∈ A ⇒ e_i ∈ ē

  3. The phrase pair contains at least one alignment point.

    ∃ e_i ∈ ē, f_j ∈ f̄ s.t. (e_i, f_j) ∈ A
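The three conditions above can be checked directly against a set of alignment points. The following is a minimal sketch, not the NLTK implementation: `is_consistent` is a hypothetical helper, spans are assumed to be inclusive zero-based `(start, end)` index pairs, and the alignment is assumed to be a collection of `(e_index, f_index)` tuples. The toy alignment is made up for illustration.

```python
def is_consistent(e_span, f_span, alignment):
    """Return True if the phrase pair covering e_span and f_span is
    consistent with the alignment (conditions 1-3 of the definition).

    Assumption: spans are inclusive (start, end) pairs of zero-based
    indices, and alignment holds (e_index, f_index) tuples.
    """
    e_lo, e_hi = e_span
    f_lo, f_hi = f_span
    has_point = False
    for i, j in alignment:
        e_inside = e_lo <= i <= e_hi
        f_inside = f_lo <= j <= f_hi
        # Conditions 1 and 2: a point may not have one end inside the
        # phrase pair and the other end outside it.
        if e_inside != f_inside:
            return False
        if e_inside:
            has_point = True
    # Condition 3: at least one alignment point lies inside the pair.
    return has_point


# Toy alignment (illustrative data): e word 2 and f word 3 are unaligned.
alignment = {(0, 0), (1, 1), (1, 2), (3, 4)}

print(is_consistent((1, 1), (1, 2), alignment))  # True
print(is_consistent((1, 1), (1, 1), alignment))  # False: (1, 2) leaves the pair
print(is_consistent((2, 2), (3, 3), alignment))  # False: no alignment point
```

A single pass suffices because conditions 1 and 2 together say that every alignment point must lie either entirely inside or entirely outside the phrase pair.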

Parameters
  • f_start (int) – Starting index of the possible foreign language phrases

  • f_end (int) – End index of the possible foreign language phrases

  • e_start (int) – Starting index of the possible source language phrases

  • e_end (int) – End index of the possible source language phrases

  • srctext (list) – The source language tokens, a list of strings.

  • trgtext (list) – The target language tokens, a list of strings.

  • srclen (int) – The number of source language tokens.

  • trglen (int) – The number of target language tokens.
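To make the parameters concrete, here is a hedged sketch of how a caller might prepare them for one candidate source span. It does not call NLTK. `foreign_span` is a hypothetical helper, indices are assumed to be zero-based and inclusive, the alignment is assumed to be a list of `(source index, target index)` pairs, and `f_aligned` is assumed to be the list of target indices that appear in the alignment. The sentence pair and its alignment are illustrative data.

```python
def foreign_span(e_start, e_end, alignment, trglen):
    """Smallest target span covering every word aligned to e_start..e_end.

    Hypothetical helper. Returns (trglen - 1, -1) when no source word in
    the span is aligned, so f_end < 0 signals "nothing to extract".
    """
    f_start, f_end = trglen - 1, -1
    for e, f in alignment:
        if e_start <= e <= e_end:
            f_start = min(f_start, f)
            f_end = max(f_end, f)
    return f_start, f_end


srctext = "michael assumes that he will stay in the house".split()
trgtext = "michael geht davon aus , dass er im haus bleibt".split()
alignment = [(0, 0), (1, 1), (1, 2), (1, 3), (2, 5), (3, 6),
             (4, 9), (5, 9), (6, 7), (7, 7), (8, 8)]

srclen, trglen = len(srctext), len(trgtext)
# Assumed meaning of f_aligned: target indices with at least one alignment point.
f_aligned = [j for _, j in alignment]

e_start, e_end = 1, 1  # the source word "assumes"
f_start, f_end = foreign_span(e_start, e_end, alignment, trglen)
print(f_start, f_end)                      # 1 3
print(trgtext[f_start:f_end + 1])          # ['geht', 'davon', 'aus']
```

The resulting `f_start`, `f_end`, `e_start`, `e_end`, `srclen`, and `trglen` values line up with the integer parameters listed above. How `max_phrase_length` limits the candidate spans is not covered by this sketch.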