TIP of the day - from the blogs: identifying potentially important words and concepts in a text
MJ. Smith
From Perseus:
The Vocabulary Tool is versatile and can be used in several ways to help you read a text in the Perseus Digital Library.
- A Comprehensive Vocabulary List for a Work: If you want a comprehensive vocabulary list that you can consult as you read and review a text, you should select the text that you are trying to read in the select box. Use the alphabetical sort order and show all words for the list size. This will produce a comprehensive list of words in alphabetical order that you can annotate and consult easily as you are reading a text.
- A List of Essential Words for an Author: If you want to improve your mastery of a particular Greek or Latin author, you should select all of the works by that author in the select box. Select weighted frequency as your sort order and top 40% or top 50% as your list size option. This will provide you with a list of 'essential words' that you should memorize to maximize your understanding of that author.
- A List of Basic Words for Intermediate-Level Reading: If you are an intermediate-level student, beginning to read unadapted texts, select five or six texts of interest in the select box. Select weighted frequency as your sort order and top 50% or top 60% as your list size option. This will give you a sense of the most important words in the language; when you are familiar with these words, you can begin reading, confident that you will know half to two-thirds of the words on a typical page.
- A List of Essential Words for a Comprehensive Greek or Latin Exam: If you are an advanced student preparing for comprehensive exams, select a large list of authors that are appropriate for the requirements of your exam in the language box. Select weighted frequency as your sort order and top 70% or top 80% as your list size option. This will provide you with a list of important words to help you prepare for your exam.
- A List of Key Words for a Text: If you want a quick overview of the potentially important words and concepts in a text, select the text that interests you with a sort order of key word score and a list size of top 10%. This will provide a short list of potentially important words to be aware of as you read the text.
- Word Frequency Tool (Greek or Latin): If you are searching for occurrences of specific Greek or Latin words, you may use the Word Frequency Tool. There are several options for displaying results. Sort Authors Alphabetically is the default option. Sort Authors by Type of Literature will sort results according to types such as comedy, history, tragedy, etc. Sort Authors by Date will list authors from the earliest work to the latest, based on the best evidence we have for each author. Words in Author will sort results from the author with the most words in Perseus to the author with the fewest. Maximum Instances will sort results starting with the author who has the most possible instances of the word; Minimum Instances reverses this list and starts with the fewest. Maximum Frequency/10K will sort the results from the highest relative frequency to the lowest; Minimum Frequency/10K reverses this list and begins with the lowest relative frequency.
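To see why the choice of sort order matters, here is a rough Python sketch of a few of the display options. It is not Perseus code: the `AuthorResult` class, `SORT_OPTIONS` table and `sort_results` function are hypothetical names made up for illustration. It uses the approximate pempô figures quoted further down in these notes (146 instances in about 107,000 words of Plutarch, 350 in about 312,000 words of Xenophon). The Minimum options are left out because the notes only give maximum counts.

```python
from dataclasses import dataclass


@dataclass
class AuthorResult:
    """One row of hypothetical Word Frequency Tool output for a single lemma."""
    name: str
    words_in_author: int  # approximate size of the author's corpus in Perseus
    max_instances: int    # all possible occurrences of the lemma (maximum count)

    @property
    def max_per_10k(self) -> float:
        # Maximum relative frequency: occurrences per 10,000 words.
        return self.max_instances * 10_000 / self.words_in_author


# Approximate figures for the Greek verb pempô, as quoted in the Perseus notes.
RESULTS = [
    AuthorResult("Plutarch", 107_000, 146),
    AuthorResult("Xenophon", 312_000, 350),
]

# Each display option maps to (sort key, descending?).
SORT_OPTIONS = {
    "Sort Authors Alphabetically": (lambda r: r.name, False),
    "Words in Author": (lambda r: r.words_in_author, True),
    "Maximum Instances": (lambda r: r.max_instances, True),
    "Maximum Frequency/10K": (lambda r: r.max_per_10k, True),
}


def sort_results(results, option):
    """Return author names ordered according to the chosen display option."""
    key, descending = SORT_OPTIONS[option]
    return [r.name for r in sorted(results, key=key, reverse=descending)]


for option in SORT_OPTIONS:
    print(f"{option}: {sort_results(RESULTS, option)}")
```

With these figures Xenophon comes first under Maximum Instances, but Plutarch comes first under Maximum Frequency/10K, because his corpus is much smaller.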
Why are there Maximum and Minimum Frequencies? Although Perseus can disambiguate the vast majority of Greek and Latin words, some forms may be derived from more than one lexicon entry. (For example, "flies" may be an instance of the verb "to fly" or the noun "fly", so Perseus would include it in the count for both words. On the other hand, there is no doubt that "sneezed" is a form of "to sneeze".) Where the maximum instances differ from the minimum, the maximum counts all possible occurrences of a given lemma, and the minimum counts only the occurrences that the computer has disambiguated. So all ambiguous forms are included in the maximum count and excluded from the minimum. The same is true of the relative frequency calculations.
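A small Python sketch of the maximum/minimum idea, using the "flies"/"sneezed" example. The `ANALYSES` table and `min_max_counts` function are hypothetical stand-ins, not Perseus data or code. It also simplifies things: here a form counts toward the minimum only if it has exactly one possible headword, whereas Perseus may also resolve some ambiguous forms from context.

```python
from collections import Counter

# Hypothetical morphological analyses (not Perseus data): each surface form
# maps to the set of dictionary headwords it could be derived from.
ANALYSES = {
    "flies": {"fly (verb)", "fly (noun)"},  # ambiguous: two headwords
    "flew": {"fly (verb)"},                 # unambiguous
    "sneezed": {"sneeze"},                  # unambiguous
}


def min_max_counts(tokens, analyses):
    """Count each headword's minimum and maximum occurrences in a token list.

    Simplification: a form counts toward the minimum only when it has a single
    possible headword; Perseus may also disambiguate some forms from context.
    """
    minimum, maximum = Counter(), Counter()
    for token in tokens:
        headwords = analyses[token]
        for headword in headwords:
            maximum[headword] += 1  # every possible headword gets credit
        if len(headwords) == 1:
            (only,) = headwords
            minimum[only] += 1      # only unambiguous forms are certain
    return minimum, maximum


text = ["flies", "flew", "sneezed", "flies"]
minimum, maximum = min_max_counts(text, ANALYSES)
for headword in sorted(maximum):
    print(f"{headword}: min {minimum[headword]}, max {maximum[headword]}")
```

For this toy text "fly (verb)" gets a minimum of 1 and a maximum of 3, "fly (noun)" gets 0 and 2, and "sneeze" gets 1 for both, since it is never ambiguous.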
What is a Weighted Frequency? A weighted frequency indicates whether the actual frequency count for a word (if it could be determined) is likely to be closer to the minimum or the maximum frequency score. The weighted frequency is determined by assigning a weight to each inflected form based on the number of possible dictionary forms from which the inflected form could be derived. For example, an unambiguous word has a weight of 1, a word that could be derived from two dictionary headwords receives a weight of 1/2, a word that could come from three different headwords receives a weight of 1/3, and so on. The weighted frequency is the sum of the weights for each inflected form that appears in a text. If the minimum, maximum, and weighted scores are all the same, you know the word is entirely unambiguous in all of its forms. If the weighted score equals the average of the minimum and maximum scores, every ambiguous form of the word could come from exactly two headwords; if, in addition, the minimum is zero, the word is ambiguous in all of its forms. The closer the weighted score is to the maximum, the fewer ambiguous forms there are, so the closer the actual count must be to the maximum; in every case the actual count lies between the minimum and the maximum.
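The weighting rule above (1 for an unambiguous form, 1/2 for two possible headwords, 1/3 for three, and so on) can be sketched in a few lines of Python. This is only an illustration under assumptions: `ANALYSES` and `weighted_counts` are hypothetical names, the English word list is made up, and `Fraction` is used simply to keep the weights exact.

```python
from fractions import Fraction

# Hypothetical analyses (not Perseus data): surface form -> possible headwords.
ANALYSES = {
    "flies": {"fly (verb)", "fly (noun)"},                  # weight 1/2 each
    "flew": {"fly (verb)"},                                 # weight 1
    "rose": {"rise (verb)", "rose (noun)", "rose (adj.)"},  # weight 1/3 each
}


def weighted_counts(tokens, analyses):
    """Sum, per headword, the weight 1/k of each token that has k possible headwords."""
    weighted = {}
    for token in tokens:
        headwords = analyses[token]
        weight = Fraction(1, len(headwords))
        for headword in headwords:
            weighted[headword] = weighted.get(headword, 0) + weight
    return weighted


text = ["flies", "flew", "rose", "flies"]
for headword, score in sorted(weighted_counts(text, ANALYSES).items()):
    print(f"{headword}: {score}")
```

Here "fly (verb)" scores 1/2 + 1 + 1/2 = 2, "fly (noun)" scores 1, and each reading of "rose" scores 1/3. Because each token's weights add up to 1, the weighted scores across all headwords always total the number of tokens in the text.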
Why use relative frequencies? Relative frequencies are based on occurrences of a given word per 10,000 words. For instance, Plutarch uses the Greek verb pempô 146 times, which is unimpressive compared with Xenophon's maximum of 350 times. Yet the corpus of Plutarch online in Perseus is about 107,000 words, compared with Xenophon's 312,000. So the relative frequency in Plutarch is 13.67 at its maximum, compared with Xenophon's maximum of 11.21. When comparing authors, it is more useful to know the relative frequency of a word than the raw count, since the sizes of the corpora vary.
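The per-10,000 arithmetic is just count × 10,000 / corpus size. A minimal Python check using the pempô figures above (`per_10k` is a hypothetical helper name):

```python
def per_10k(count: int, corpus_size: int) -> float:
    """Occurrences of a word per 10,000 words of running text."""
    return count * 10_000 / corpus_size


# Approximate maximum counts and corpus sizes quoted for pempô.
authors = {"Plutarch": (146, 107_000), "Xenophon": (350, 312_000)}
for name, (count, size) in authors.items():
    print(f"{name}: {per_10k(count, size):.2f} per 10,000 words")
```

With the rounded corpus sizes this prints 13.64 for Plutarch and 11.22 for Xenophon. These differ slightly from the 13.67 and 11.21 quoted above, presumably because the quoted corpus sizes are only approximate. The conclusion is the same either way: Plutarch uses the verb more often relative to the size of his corpus.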