TIP of the day: Logos tagging #2 - surface, manuscript, lemma, root

MJ. Smith
edited November 20 in English Forum

In this set of tagging we run into an option other than tagging: NLP stemming. NLP stands for Natural Language Processing, and stemming is the process that a computer uses to "find all forms" of a word. If a language had no ambiguous forms and no irregularities, the computer would be able to assign lemmas and roots accurately. Because language is ambiguous and irregular, tagging is necessary to get close to 100% accuracy. Even so, NLP is often accurate enough to be useful.
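To make the contrast between stemming and tagging concrete, here is a minimal sketch of a suffix-stripping stemmer. It is only a toy written for this post: the suffix list and the names `toy_stem` and `same_stem` are my own assumptions, and it is not the algorithm Logos actually uses. It shows regular forms collapsing to one stem, an irregular form being missed, and two unrelated words being matched by mistake.

```python
# Toy suffix-stripping stemmer. Illustration only; NOT the Logos algorithm.
SUFFIXES = ("ing", "ed", "er", "es", "s")  # hypothetical list, checked in order


def toy_stem(word: str) -> str:
    """Strip one common English suffix, keeping at least three letters."""
    w = word.lower()
    for suffix in SUFFIXES:
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            return w[: -len(suffix)]
    return w


def same_stem(a: str, b: str) -> bool:
    """True when two surface words reduce to the same toy stem."""
    return toy_stem(a) == toy_stem(b)


# Regular forms share a stem, so a "find all forms" rule matches them.
regular = [toy_stem(w) for w in ("baptizing", "baptized", "baptizer")]

# Irregular forms are missed: "went" belongs with "going" but has no shared stem.
irregular_match = same_stem("went", "going")

# Ambiguous forms collide: "corner" and "corns" are unrelated but stem alike.
false_match = same_stem("corner", "corns")

print(regular, irregular_match, false_match)
```

The missed and false matches are exactly the cases where hand tagging is needed to get close to 100% accuracy.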

1. For purposes of this post, I set the inline reverse interlinear to show the elements under consideration - surface, manuscript, lemma and root.

2. As with speaker and addressee, there are visual filter options that act on the coding: Visual Filter --> Resource --> Corresponding words. In this example the "same surface text" option is selected.

Notice that the HCSB displays the Visual Filter even though it does not have a reverse interlinear. This tells us that NLP rather than tagging is involved. The matching of "baptizer" and "baptized" to "baptizing" indicates that a "match all forms" rule is in effect. Note: I have a query in Faithlife to verify that this is how "same surface text" is intended to work.

3. A second option is to match lemmas, i.e. the 3rd line in the interlinears.

Here the HCSB received no highlighting because the visual filter is dependent upon the reverse interlinear to provide the required coding. Note that when multiple words in the surface text translate the lemma, all the words are highlighted with the "main" word in a darker color.

4. The third option is to match roots (the 4th line of the reverse interlinear as displayed). The behavior is the same as for lemma except that the root is used (the lemma is the dictionary form; the root is a common base element shared by related lemmas). In this example, "to baptize" and "baptism" are different lemmas but they share a root.

5. One, two or all three of the Corresponding Words options may be requested at the same time. Note that different coloring is used to differentiate the types of correspondence.

6. Turning to the Context Menu (right-click) behavior, manuscript and lemma both provide links to the Perseus Web Lookup. Note that the Perseus web site also provides frequency statistics for the word for each book in the Perseus corpus.

7. The lemma form has an option to add the lemma to your last used word list.

8. The manuscript and lemma forms generate a list of dictionary entries which can be opened via the Context Menu. Note that dictionaries may use different lemmas for the same word depending upon their linguistic philosophy.

9. A wide variety of searches are based upon this information - surface and tagged. The options beginning with MORPH will open a Morph search rather than a basic search. From the results you can see whether they cover all resources (no tags), resources in a specific language (NLP), or are limited to interlinears.
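For readers who think in data structures, the lemma and root matching in steps 3 to 5 can be pictured roughly as follows. This is a sketch under stated assumptions, not how Logos stores its data: the `Token` record, the `correspondences` function and the transliterated tag values are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    surface: str  # word as it appears in the translation
    lemma: str    # dictionary form of the original-language word
    root: str     # base element shared by related lemmas


# Hypothetical, transliterated tag values; real tagging lives in the
# reverse interlinear and uses the original-language forms.
TOKENS = [
    Token("baptizing", "baptizo", "bapt"),
    Token("baptized", "baptizo", "bapt"),
    Token("baptism", "baptisma", "bapt"),
    Token("water", "hydor", "hydor"),
]


def correspondences(tokens, selected, levels=("lemma", "root")):
    """For each requested level, list the surface words sharing the selected tag."""
    result = {}
    for level in levels:
        target = getattr(selected, level)
        result[level] = [t.surface for t in tokens if getattr(t, level) == target]
    return result


# Select "baptizing" and request both lemma and root matching at once.
matches = correspondences(TOKENS, TOKENS[0])
print(matches["lemma"])  # words sharing the lemma
print(matches["root"])   # words sharing the root, a larger set
```

A Bible without a reverse interlinear has no such records to compare, which is consistent with the HCSB receiving no lemma or root highlighting.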
