Bug/Dataset Issue: Building a Word List from the Old Testament

Posts 87
Robert Kelbe | Forum Activity | Posted: Mon, Apr 6 2020 2:20 PM

I created a word list from the entire Old Testament (based on the NKJV) in order to study vocabulary. I noticed a few peculiarities so far...

The gloss for מַלְאָךְ misspells "God" - a simple spelling mistake: "messenger; messengers of God (prophets, priests, angels); angel of Go, Yahweh"

More worryingly, נוּס seems to be completely missing ("to flee, escape") even though it occurs 158 times in the NKJV.

In addition, there was at least one time where the word count did not match the results when I searched by lemma in the NKJV, which was strange.

Mostly, however, I am wondering why נוּס was missing and what else could possibly be missing.

Thank you,

Posts 29383
Forum MVP
MJ. Smith | Forum Activity | Replied: Mon, Apr 6 2020 3:23 PM

I hope you have reported via the typo option - then it helps the entire community.

When a lemma appears to be missing, go to an instance of it and use (a) interlinear, (b) information panel, or (c) context menu to see how Logos treats it. Assigning lemmas is not an exact science - rather there are multiple theories as to how to assign them. Read the Logos glossary to understand their approach.

One should expect the word count in the original-language text to differ from the English: the Hebrew or Greek is not exactly the text that the translators used, and where they made choices, those choices are not reflected back into the original-language text.

Orthodox Bishop Hilarion Alfeyev: "To be a theologian means to have experience of a personal encounter with God through prayer and worship."

Posts 25825
Forum MVP
Dave Hooton | Forum Activity | Replied: Mon, Apr 6 2020 5:44 PM

Robert Kelbe:
More worryingly, נוּס seems to be completely missing ("to flee, escape") even though it occurs 158 times in the NKJV.

נוס  flee; drive on**  does occur 158 times in NKJV. I did a Word List for Gen 14-39 and it was included with the correct count of 7.

** the gloss does vary e.g. BHW 4.18 has "to flee; to drive on"

Robert Kelbe:
Mostly, however, I am wondering why נוּס was missing and what else could possibly be missing.

Have you now found נוס  flee; drive on? Unless you are very familiar with Hebrew, the word is easy to miss in a large list. I would sort by Gloss (looking for 'flee') or Count (looking for those occurring 158x).

I would doubt that anything is missing.

How did you build the Word List, e.g. from Gen-Mal?

Robert Kelbe:
In addition, there was at least one time where the word count did not match the results when I searched by lemma in the NKJV, which was strange.

Can you provide examples?

Dave
===

Windows 10 & Android 8

Posts 87
Robert Kelbe | Forum Activity | Replied: Mon, Apr 6 2020 5:52 PM

Thank you, MJ and Dave!

Ahhh... נוּס was there. Long story... I had exported it into an Excel spreadsheet and accidentally had it filtered off. I also looked in the word list but I searched with a Shureq instead of a Vav. Sorry about that!
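For anyone who hits the same pointed-versus-unpointed mismatch, here is a minimal sketch of one way to compare Hebrew words while ignoring vowel points and dagesh. It assumes the word list has been exported to plain text; the function names `strip_points` and `find_lemma` are made up for illustration and have nothing to do with how Logos itself searches.

```python
import unicodedata

def strip_points(word: str) -> str:
    """Drop Unicode combining marks (vowel points, dagesh, accents)."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def find_lemma(query: str, lemmas: list[str]) -> list[str]:
    """Return the lemmas that match the query once pointing is ignored."""
    target = strip_points(query)
    return [lemma for lemma in lemmas if strip_points(lemma) == target]
```

With this, searching for the pointed form נוּס would still match an unpointed נוס entry, because both reduce to the same three consonants.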

Regarding submitting a typo, I can't figure out what resource it is in, in order to submit a typo. I don't think it is LTW. 

Posts 87
Robert Kelbe | Forum Activity | Replied: Mon, Apr 6 2020 6:04 PM

Regarding the count, here is an example: the word list says 166 (based on the NKJV), but when I right-click, select the lemma, and Search "Bible" it shows as having 158 matches (in the NKJV). So I don't understand why that is, but that is OK; I don't need to waste anyone else's time!

Posts 25825
Forum MVP
Dave Hooton | Forum Activity | Replied: Mon, Apr 6 2020 6:30 PM

Robert Kelbe:
Regarding submitting a typo, I can't figure out what resource it is in, in order to submit a typo.

Create a new thread with the title "BUG: Typo in gloss for מַלְאָךְ".

The gloss is a composite or truncation, so the typo is the phrase "angel of Go, Yahweh"  as Yahweh is never a gloss for this word (I think it comes from Lexham Analytic Lexicon of the Hebrew Bible which states "the angel of God/Yahweh"). Just include the BWS screenshot above, though.

Dave

Posts 25825
Forum MVP
Dave Hooton | Forum Activity | Replied: Mon, Apr 6 2020 7:24 PM

Robert Kelbe:
Regarding the count, here is an example: the word list says 166 (based on the NKJV), but when I right-click, select the lemma, and Search "Bible" it shows as having 158 matches (in the NKJV). So I don't understand why that is,

It is connected with the Hebrew Reverse Interlinear (complex to explain), so if you want a 'true'/reliable count build your Word List from a Hebrew Bible e.g. LHB, SESB.

EDIT: Having said that, the word count in LHB was also wrong (139 instead of 160). I might create a bug report.

Dave

Posts 29383
Forum MVP
MJ. Smith | Forum Activity | Replied: Mon, Apr 6 2020 7:40 PM

Robert Kelbe:

I don't need to waste anyone else's time!

Rarely a waste of time. We all learn tracking down the details.

Posts 87
Robert Kelbe | Forum Activity | Replied: Fri, Jul 10 2020 12:37 PM

I would love to figure out why there is the inconsistency in "count" and if it truly is an error, for Logos to make a formal bug report for it. Many people use their Bible software program for statistics of word count, etc. Here are two examples taken at random.

(The word list counts come from exporting the entire Bible in the NKJV as a word list. I confirmed by exporting Matt-Rev as a word list that the issue still exists.)

קֶ֫דֶם - word list: 74; actual: 61

προσκαρτερέω - word list: 18; actual: 10
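To put a number on how far off these are, here is a small sketch that computes the absolute and relative overcount for the two examples above. The counts are the ones quoted in this post; the function name `discrepancies` and the tuple layout are just illustrative choices.

```python
# (lemma, word list count, count from a lemma search), as quoted above.
examples = [
    ("קֶ֫דֶם", 74, 61),
    ("προσκαρτερέω", 18, 10),
]

def discrepancies(rows):
    """Return (lemma, overcount, overcount as % of the search count)."""
    result = []
    for lemma, listed, searched in rows:
        diff = listed - searched
        result.append((lemma, diff, round(100 * diff / searched, 1)))
    return result

for lemma, diff, pct in discrepancies(examples):
    print(f"{lemma}: +{diff} ({pct}% over the search count)")
```

For these two words the word list is 13 and 8 occurrences too high, which is roughly 21% and 80% over the search counts.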

Posts 25825
Forum MVP
Dave Hooton | Forum Activity | Replied: Sat, Jul 11 2020 3:24 PM

Robert Kelbe:
I would love to figure out why there is the inconsistency in "count" and if it truly is an error, for Logos to make a formal bug report for it.

At this time, I recommend that you create a separate BUG thread for this post to be noticed.

Dave

Posts 2234
LogosEmployee

Robert Kelbe:
I would love to figure out why there is the inconsistency in "count" and if it truly is an error, for Logos to make a formal bug report for it. Many people use their Bible software program for statistics of word count, etc.

I can't tell you at the moment exactly why the count is wrong, but I can tell you generally what is wrong along with how to work around the problem.

You are creating a list of lemmas from a translation, where zero or more words in English can map to zero or more words in the original language. Whatever mechanism the Word List is using to determine these counts is not accounting for this mapping in the right way. If you want to fix it, then create your Word List from an original language resource, such as the SBLGNT, and the counts will be correct.

FYI, if you are attempting to get counts of words, the Word List may not be the right tool for you. The Concordance tool may be a better fit for your needs.
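As a rough illustration of how a many-to-many mapping can inflate a count, here is a sketch with invented data. It is not a description of how the Word List actually works (that is not known from this thread); the `links` structure, the token ids, and both function names are hypothetical. It only shows that counting once per English word gives a different number than counting once per original-language word.

```python
from collections import Counter

# Hypothetical reverse-interlinear links: (english_word, source_token_id, lemma).
# A source_token_id of None marks an English word with nothing behind it.
links = [
    ("So", None, None),
    ("they", 1, "נוס"),
    ("fled", 1, "נוס"),   # two English words share one Hebrew token
    ("and", None, None),
    ("escaped", 2, "נוס"),
]

def count_by_english_word(links):
    """Count a lemma once for every English word linked to it."""
    return Counter(lemma for _, _, lemma in links if lemma is not None)

def count_by_source_token(links):
    """Count a lemma once for every distinct original-language token."""
    unique = {(tid, lemma) for _, tid, lemma in links if tid is not None}
    return Counter(lemma for _, lemma in unique)
```

In this toy data the first function reports 3 occurrences and the second reports 2, even though both read the same links.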

Posts 87
Robert Kelbe | Forum Activity | Replied: Sat, Jul 11 2020 5:35 PM

Thank you, Andrew, for responding. Actually, I didn't know about the concordance tool, so I appreciate you showing me that!

I am using the word list for studying vocabulary. I exported it into an Excel spreadsheet that I use for studying. I sort by "Count" to prioritize the most frequent words. You could say that it doesn't really matter if the count is off, since it is mostly correct, enough to get a rough order (although the error is large enough that the order is affected). However, if I know I've learned all the words with a count greater than a certain number, I would like to be able to input that number into the "READER'S EDITION" view in an interlinear and have the words actually correspond to the words I've already studied. There is enough discrepancy that that isn't the case. Finally, I generate some statistics based on the "Count". For example, I've memorized 820 lemmas, which is 15.1% of the total number of lemmas but 86.8% of the words in the New Testament based on "Count". The problem is that this is not true if the "Count" is inaccurate.

You said that the issue is likely caused by the fact that I am using the Greek text underlying an English translation (the NKJV). That may be true, but why is the concordance accurate for the NKJV? The first page of concordance results for the NKJV matches the first page for the Scrivener 1881 Greek New Testament. It seems to me that the way the word list generates its count should match the count in the concordance. To me, this does seem like a bug - probably not your most important one, but one I would like to see fixed eventually. Thank you!
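For reference, the kind of coverage statistic described above can be sketched like this. The counts and the set of known lemmas below are toy values, not figures from any real export, and `coverage` is just an illustrative name; the point is that the second percentage depends directly on every "Count" being right.

```python
def coverage(counts: dict[str, int], known: set[str]) -> tuple[float, float]:
    """Return (% of lemmas known, % of running words covered by known lemmas)."""
    total_lemmas = len(counts)
    total_words = sum(counts.values())
    known_lemmas = sum(1 for lemma in counts if lemma in known)
    known_words = sum(n for lemma, n in counts.items() if lemma in known)
    return (100 * known_lemmas / total_lemmas, 100 * known_words / total_words)

# Toy word list: lemma -> count (invented numbers).
counts = {"καί": 50, "λέγω": 30, "προσκαρτερέω": 10, "ἀγαπάω": 10}
known = {"καί", "λέγω"}

lemma_pct, word_pct = coverage(counts, known)
print(f"{lemma_pct:.1f}% of lemmas, {word_pct:.1f}% of words")
```

If any count in the export is inflated, the total number of running words changes and so does the second percentage, which is why the discrepancy matters for this kind of study.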
