August 7, 2017

Lemmatization Process & Principles


A Quickstart guide to lemmatizing texts for inclusion in the Bridge is available here.

You can find complete instructions for lemmatizing your text here.

Lemmatization Principles

  1. Substantives are lemmatized to the adjective TITLE unless the substantive has an independent meaning that is unintelligible from the adjective.
  2. Participles are lemmatized to the verb TITLE unless the participle has an independent meaning that is unintelligible from the verb.
  3. Cardinal & ordinal numbers are lemmatized to the cardinal TITLE for numbers greater than 3.
  4. For compound numbers, lemmatize each component separately, e.g. 72 is lemmatized as two words, SEPTVAGINTA and DVO.
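
Principles 3 and 4 can be illustrated with a minimal Python sketch. It is not the Bridge's actual tooling: the lookup table, the function names, and the TITLE convention (upper case with U written as V, inferred from SEPTVAGINTA and DVO above) are all assumptions made for illustration. The sketch also assumes its input is already in dictionary form, and that ordinals for 1 through 3 keep their own TITLE.

```python
# Hypothetical lookup: ordinal -> cardinal, only for numbers greater than 3.
# A real table would need every ordinal; these three are for illustration.
ORDINAL_TO_CARDINAL = {
    "quartus": "quattuor",
    "quintus": "quinque",
    "sextus": "sex",
}


def to_title(form):
    """Assumed TITLE convention: upper case, with U written as V."""
    return form.upper().replace("U", "V")


def lemmatize_number(tokens):
    """Return one TITLE per component of a (possibly compound) number."""
    titles = []
    for token in tokens:
        form = token.lower()
        # Ordinals above 3 fall back to their cardinal (principle 3);
        # any other form keeps its own TITLE.
        base = ORDINAL_TO_CARDINAL.get(form, form)
        titles.append(to_title(base))
    return titles


print(lemmatize_number(["septuaginta", "duo"]))  # ['SEPTVAGINTA', 'DVO']
print(lemmatize_number(["quintus"]))             # ['QVINQVE']
print(lemmatize_number(["tertius"]))             # ['TERTIVS']
```

Each component of a compound number is passed as its own token, so 72 yields two TITLEs rather than one.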

Morphological Categories for Latin Data

  1. Noun
    1. First Declension
    2. Second Declension
    3. Third Declension
    4. Fourth Declension
    5. Fifth Declension
    6. Irregular or Indeclinable
  2. Adjective
    1. 1st/2nd Declension
    2. Third Declension
  3. Number
  4. Pronoun
  5. Verb
    1. First Conjugation
    2. Second Conjugation
    3. Third Conjugation and Third Conjugation -io
    4. Fourth Conjugation
    5. Irregular
  6. Adverb
  7. Preposition
  8. Conjunction
  9. Interjection
  10. Idioms
  11. Prefixes & Suffixes
  12. Abbreviations

Outstanding Editorial Questions/Data inconsistencies

  1. Provide full principal parts (amo amare amavi amatus) or abbreviated (amo -are -avi -atus)?
  2. What to do about the fourth principal part of verbs: -us or -um?
  3. Should non-idiomatic but common forms such as salve, vale, or XAIPE receive their own entries, or be combined into the main entry for the verb, with salve, etc. mentioned in the definition?
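
For question 1, the two formats are closely related: for regular verbs the abbreviated form can be derived from the full one. The following Python sketch shows one possible way to do that, assuming the parts are separated by spaces and the shared stem is simply the longest common prefix. The function name is hypothetical and this is not an existing Bridge utility.

```python
import os.path


def abbreviate_principal_parts(full):
    """Abbreviate space-separated principal parts, e.g. for amo.

    The first part is kept whole; every later part is reduced to a
    hyphen plus whatever follows the stem shared by all parts.
    """
    parts = full.split()
    # commonprefix compares character by character, so it works on any
    # list of strings, not only on file paths.
    stem = os.path.commonprefix(parts)
    endings = ["-" + part[len(stem):] for part in parts[1:]]
    return " ".join([parts[0]] + endings)


print(abbreviate_principal_parts("amo amare amavi amatus"))
# amo -are -avi -atus
print(abbreviate_principal_parts("moneo monere monui monitus"))
# moneo -ere -ui -itus
```

The common-prefix shortcut only suits verbs whose stem is stable across all four parts. Verbs with stem changes (rego regere rexi rectus, for instance) would not come out in the conventional abbreviation, so the reverse derivation from abbreviated to full is the harder direction and may be a reason to store full principal parts.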