QUICKSTART: Guide to Lemmatizing a Text

0. What happens before you begin

Someone has processed a plain text file (*.txt) of your text using Bridge Tools.

For now, this is typically done by Bret Mulligan, using the Bridge Tools Scripts (available to the public on GitHub/Git-Classical/Bridge).

An alpha version of Bridge Tools is now available, which allows for the lemmatization of Latin and Greek texts.

1. Getting to Know the Lemmatization Sheet

You now have a lemmatization spreadsheet that can be worked on in Excel, Google Docs, or any major spreadsheet program. By default, unambiguous words (i.e., non-homonyms) will have been lemmatized by the lemmatization program, allowing you to focus on the real work that requires human judgment.

Figure 1. The TEXT Sheet in the Lemmatization Spreadsheet

The Columns

CHECK: This cell checks the entry in TITLE (Column B) to see if it is already in The Bridge Dictionary. When you enter a valid TITLE in Column B, this cell will fill with the matching entry when you SAVE the file. The cell will read #N/A if the TITLE is not valid.

TITLE: Where you will lemmatize the text by adding the UNIQUE ID that matches the inflected form in TEXT (Column C) with the word in the DICTIONARY.  Blank cells (which need TITLEs) should appear yellow.

TEXT: your text runs down the sheet in this column. NOTE: Latin words that end in -N or -QUE might have been split into multiple rows. If so, lemmatize the actual form and delete the superfluous row.

LOCATION: the book, chapter, poem, line, or section in which the word appears. This will be automatically created by the program that generated the spreadsheet but should be checked.

RUNNING COUNT: allows you to sort the spreadsheet back to text order. If you add any words, be sure to give each new row a value between those in the rows above and below (for example, 41.5 for a row inserted between 41 and 42).

DISPLAY LEMMA: the principal parts of the word; this will automatically fill for valid TITLEs when you save the file. DO NOT MODIFY THIS ENTRY.

SHORTDEF: a succinct definition of the word; this will automatically fill for valid TITLEs when you save the file. DO NOT MODIFY THIS ENTRY.

LONGDEF: a more expansive definition of the word; this will automatically fill for valid TITLEs when you save the file. DO NOT MODIFY THIS ENTRY.

LOCALDEF: [optional] if you are adding custom definitions for your text, they will display here; this will automatically fill for valid TITLEs when you save the file. DO NOT MODIFY THIS ENTRY.

PROBLEM: Add notes if you detect any problems with the TEXT, DISPLAY LEMMA, SHORTDEF, or LONGDEF of a word.
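The behavior of the CHECK column and the auto-filled columns can be pictured as a simple lookup against the DICTIONARY sheet. The sketch below is only an illustration in Python, not the spreadsheet's actual formulas (which this guide does not show); the function names and the two sample entries are hypothetical.

```python
# Hypothetical stand-in for the DICTIONARY sheet: TITLE -> (DISPLAY LEMMA, SHORTDEF).
# These two entries are made-up illustrations, not real Bridge Dictionary rows.
DICTIONARY = {
    "AMO": ("amō, amāre, amāvī, amātum", "to love"),
    "ROSA": ("rosa, rosae f.", "rose"),
}


def check_title(title, dictionary):
    """Mimic the CHECK cell: echo a valid TITLE, or return "#N/A" for an unknown one."""
    return title if title in dictionary else "#N/A"


def fill_row(title, dictionary):
    """Return (CHECK, DISPLAY LEMMA, SHORTDEF) as they might appear for one TEXT row."""
    check = check_title(title, dictionary)
    if check == "#N/A":
        # An invalid TITLE leaves the auto-filled columns empty.
        return (check, "", "")
    display_lemma, shortdef = dictionary[title]
    return (check, display_lemma, shortdef)


print(fill_row("AMO", DICTIONARY))
print(fill_row("AMOR", DICTIONARY))
```

In this sketch a TITLE that is missing from the dictionary produces "#N/A" in CHECK and leaves the other columns blank, which is the signal to revisit that row.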

The Sheets

If you look at the bottom of the spreadsheet you will see three sheets: TEXT (the sheet you’ll be working in), DICTIONARY (a local copy of the Bridge Dictionary), and QUICKSTART (which contains a link to this page).

Figure 2. The sheets in the Lemmatization Spreadsheet

3. Lemmatize your passage by identifying each word and adding new vocabulary to the DICTIONARY.

Read through every entry, manually checking those words that have been lemmatized and lemmatizing the ambiguous forms that were not auto-lemmatized.
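If you would like a list of the rows that still need a TITLE before you start, one option is to export the TEXT sheet as CSV and filter it. The following is a minimal sketch that assumes a CSV export whose header row uses the column names described above; the sample data and the function name are hypothetical.

```python
import csv
import io

# Made-up sample standing in for a CSV export of the TEXT sheet.
SAMPLE = """CHECK,TITLE,TEXT,LOCATION,RUNNING COUNT
ARMA,ARMA,arma,1.1,1
,,virumque,1.1,2
CANO,CANO,cano,1.1,3
"""


def rows_needing_titles(csv_text):
    """Return the rows whose TITLE cell is blank, in their original order."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if not row["TITLE"].strip()]


for row in rows_needing_titles(SAMPLE):
    print(row["RUNNING COUNT"], row["LOCATION"], row["TEXT"])
```

This only finds blank cells; words that were auto-lemmatized still need to be checked by eye.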

BEFORE YOU BEGIN, WATCH THIS SHORT (SILENT) VIDEO TO SEE LEMMATIZATION IN ACTION

Lemmatizing requires you to add the correct TITLE to the TITLE Column (B). A TITLE is either a Known Lemma or a New Lemma. First we will discuss Known Lemmas, then how to handle New Lemmas.

At the start you’ll need to find the correct TITLEs in the DICTIONARY sheet. When you start typing in a cell in Column B, possibilities will be suggested if that TITLE already appears in the TEXT sheet. After a little while you’ll have a sense of what form a TITLE may take and the process can move quite quickly.

Note that TITLES follow a standard orthography and format:

    • TITLES are always ALL-CAPS
    • U’s are always V’s; J’s are I’s; e.g., the TITLE for “abjuro” is ABIVRO.
    • homonyms are distinguished by /1, /2, etc. These are usually ranked in a rational order (nouns, adjectives, numbers, pronouns, verbs, adverbs, prepositions, other), but unless you are absolutely certain about the TITLE, please verify it by looking at the DISPLAY LEMMA and DEFs (after you save your file, these will populate automatically).
    • There are a few other suffixes to distinguish homonyms: e.g., /N for proper names; /A for proper adjectives.
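The orthography rules above are mechanical enough to express in a few lines. The sketch below is an illustration only (the function names are hypothetical and it is not part of Bridge Tools); it covers the Latin rules listed here and does not attempt to choose a homonym suffix, which still requires human judgment.

```python
def normalize_title(word):
    """Apply the TITLE orthography: ALL-CAPS, U written as V, J written as I."""
    upper = word.upper()
    return upper.replace("U", "V").replace("J", "I")


def split_title(title):
    """Split a TITLE into its base form and homonym suffix ("" if there is none)."""
    base, _, suffix = title.partition("/")
    return base, suffix


print(normalize_title("abjuro"))   # ABIVRO
print(split_title("BEVIS/N"))      # ('BEVIS', 'N')
print(split_title("ABIVRO"))       # ('ABIVRO', '')
```

A helper like this gives you a likely starting point for a search in the DICTIONARY sheet; the result should still be verified against the DISPLAY LEMMA and definitions.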

You can find complete instructions for lemmatizing your text here.

ADDING NEW LEMMAS: if a word is not in the DICTIONARY, first check and triple check that it is not in the DICTIONARY. Consider different spellings; try searching for a principal part (without macrons). If you are certain that the word is not in the DICTIONARY, then add it at the bottom of DICTIONARY. Preface the TITLE with three hashtags (###). Fill in the DISPLAY LEMMA, SHORTDEF, and LONGDEF. Don’t worry about the other columns; they will be generated automatically or must be added by the Project Director. By adding the ###, you will guarantee that your new entry will be vetted and added to the master Bridge DICTIONARY used by the Bridge Program.

E.g., suppose the proper name “Bevis” appears in your text but there is no BEVIS/N entry in the DICTIONARY. At the bottom of the DICTIONARY sheet, add:

Figure 3. Adding a new TITLE

In the TEXT sheet, be sure to refer to the new entry by the same TITLE, including the ### prefix.
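The ### convention makes new entries easy to pick out later. The sketch below assumes, following the instruction to preface the TITLE with three hashtags, that the new entry for “Bevis” would take the form ###BEVIS/N; the function names are hypothetical and this is not the harvesting code used by The Bridge.

```python
NEW_PREFIX = "###"


def mark_new(title):
    """Preface a TITLE with ### unless it is already marked as new."""
    return title if title.startswith(NEW_PREFIX) else NEW_PREFIX + title


def new_entries(titles):
    """Return only the TITLEs flagged as new, i.e. those starting with ###."""
    return [t for t in titles if t.startswith(NEW_PREFIX)]


titles = ["ABIVRO", mark_new("BEVIS/N"), "BRETVS/N"]
print(new_entries(titles))   # ['###BEVIS/N']
```

Because the TEXT sheet and the DICTIONARY sheet must agree, the marked form is the one to enter in both places.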

ADDING DATA TO BLANK TITLES: if a word appears in the DICTIONARY but without any other information (i.e., the TITLE is there, but it lacks dictionary entries and definitions), you can add the DISPLAY LEMMA (with macrons), SHORTDEF, and LONGDEF. I’ll be able to harvest these. But note: if information is already present, your changes cannot be harvested. Instead, make a note in the PROBLEM Column of DICTIONARY (Column N).

E.g., if the proper name “Bretus” appeared in your text and BRETVS/N appears in the DICTIONARY but without any other information, you would add the dictionary entry and definition in that row.

If you need to add dictionary entries for Latin texts, the fastest and most accurate way to do so is to copy them from LaNe*, which is available on Logeion.

Figure 4. Logeion

*  LaNe = Woordenboek Latijn/Nederlands, 6th revised edition 2014, a Latin-Dutch translation dictionary, originally based on Pons Globalwörterbuch Lateinisch-Deutsch (Klett) but with full coverage of all entries also contained in the Oxford Latin Dictionary.  It is the current gold standard for Latin vowel quantities.

ADDING CUSTOM DEFINITIONS [Optional]: if you are adding custom definitions for your text, be sure that the LOCALDEF for each word is the best definition for the word. Modify these as needed.

Logeion is also a great place to find/copy definitions (but be thoughtful about this; make sure that you include definitions relevant to your text).

Note that your spreadsheet may contain some additional columns that do not appear in the sheets described in these instructions. These contain morphological information and other coding that you can ignore.

4. Submit your lemmatization sheet

When you have finished lemmatizing your text, submit it to The Bridge. We will harvest the new information in your local DICTIONARY and add your text information to The Bridge!
