August 7, 2017

Lemmatization Process & Principles


A Quickstart guide to lemmatizing texts for inclusion in the Bridge is available here.

You can find complete instructions for lemmatizing your text here.

Lemmatization Principles

  1. Substantives are lemmatized to the adjective TITLE unless the substantive has an independent meaning that is unintelligible from the adjective. This includes ethnonyms (e.g. ACARNANIS/A > ACARNANES/N; ROMANVS/A > ROMANI/N), even if the adjective form is unattested.
  2. Participles (including perfect passive) are lemmatized to the verb TITLE unless the participle has an independent meaning that is unintelligible from the verb or the verb is not independently extant.
  3. Cardinal & ordinal numbers are both lemmatized to the cardinal TITLE for numbers greater than 3.
  4. For compound numbers, lemmatize each component, e.g. 72 = two words, SEPTVAGINTA and DVO.
  5. Collateral forms (e.g. different spellings, deponent forms with the same meaning) are lemmatized to the main TITLE where possible.
  6. Abstractions (e.g. “Luxury”) are generally lemmatized not to a separate proper-name TITLE but to the general TITLE for the noun.
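
Rules 3 and 4 can be sketched in code. The following Python is a minimal illustration, not the Bridge's actual tooling: the table ORDINAL_TO_CARDINAL and both function names are hypothetical, the table is deliberately incomplete, and the input is assumed to be dictionary forms rather than inflected tokens. It also assumes that the ordinals for 1 to 3 keep their own TITLE, which is one reading of rule 3.

```python
# Hypothetical, incomplete lookup from ordinal TITLE to cardinal TITLE (rule 3).
# Ordinals for 1-3 (PRIMVS, SECVNDVS, TERTIVS) are left out on purpose,
# on the assumption that they keep their own TITLE.
ORDINAL_TO_CARDINAL = {
    "QVARTVS": "QVATTVOR",
    "QVINTVS": "QVINQVE",
    "SEXTVS": "SEX",
    "SEPTIMVS": "SEPTEM",
    "DECIMVS": "DECEM",
    "VICESIMVS": "VIGINTI",
}


def title_for_number_word(word: str) -> str:
    """Return the TITLE for one number word given in dictionary form."""
    # TITLEs in this document are upper case with V for U.
    key = word.upper().replace("U", "V")
    # Ordinals above 3 fold into the cardinal; anything else keeps its own TITLE.
    return ORDINAL_TO_CARDINAL.get(key, key)


def titles_for_compound(phrase: str) -> list[str]:
    """Rule 4: lemmatize each component of a compound number separately."""
    return [title_for_number_word(word) for word in phrase.split()]


if __name__ == "__main__":
    print(titles_for_compound("septuaginta duo"))
    print(title_for_number_word("quintus"))
```

Under these assumptions, "septuaginta duo" yields the two TITLEs SEPTVAGINTA and DVO, and "quintus" is filed under QVINQVE.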

Morphological Categories for Latin Data

  1. Noun
    1. First Declension
    2. Second Declension
    3. Third Declension
    4. Fourth Declension
    5. Fifth Declension
    6. Irregular or Indeclinable
  2. Adjective
    1. 1st/2nd Declension
    2. Third Declension
  3. Number
  4. Pronoun
  5. Verb
    1. First Conjugation
    2. Second Conjugation
    3. Third Conjugation and Third Conjugation -io
    4. Fourth Conjugation
    5. Irregular
  6. Adverb
  7. Preposition
  8. Conjunction
  9. Interjection
  10. Idioms
  11. Prefixes & Suffixes
  12. Abbreviations

Outstanding Editorial Questions/Data Inconsistencies

  1. Provide full principal parts (amo amare amavi amatus) or abbreviated (amo -are -avi -atus)?
  2. What to do about the fourth principal part of verbs: -us or -um?
  3. Should common but non-idiomatic forms such as salve, vale, or XAIPE receive their own entries, or be combined into the main entry for the verb, with salve etc. mentioned in the definition?
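
One consideration for question 1 is that an abbreviated form can be generated from full principal parts, whereas the reverse requires knowing the stem. The sketch below is a naive common-prefix heuristic, not a proposal for the Bridge's data format; the function name is hypothetical.

```python
import os


def abbreviate_principal_parts(full: str) -> str:
    """Abbreviate principal parts by stripping the prefix shared by all parts."""
    parts = full.split()
    if len(parts) < 2:
        return full
    # Character-wise longest prefix common to every principal part.
    stem = os.path.commonprefix(parts)
    # With little or no shared stem (e.g. suppletive verbs), keep the full forms.
    if len(stem) < 2:
        return full
    # Keep the first part whole; reduce the others to "-" plus their ending.
    return " ".join([parts[0]] + ["-" + part[len(stem):] for part in parts[1:]])


if __name__ == "__main__":
    print(abbreviate_principal_parts("amo amare amavi amatus"))
    print(abbreviate_principal_parts("fero ferre tuli latus"))
```

For "amo amare amavi amatus" this produces "amo -are -avi -atus", and "fero ferre tuli latus" is returned unchanged. The heuristic is crude for verbs whose stem changes: "dico dicere dixi dictus" becomes "dico -cere -xi -ctus", which is not a conventional abbreviation, so a real implementation would need conjugation-aware rules.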