Bug: 5.0a: Spark graph in BWS shows incorrect results

Mark Barnes
Mark Barnes Member Posts: 15,432 ✭✭✭

The spark graphs in the Bible Word Study are displaying incorrect results. You can demo it by running a BWS for logos. There is a serious discrepancy for NIV84 and NIV2011, and minor discrepancies for other versions.

Bug 1:

The additional problem for the NIVs appears to be that, for these two versions only, the spark graph displays the number of verses instead of the number of results. If the Translation section is set to NIV 1984, you'll see the spark graph shows 318. But run a morph search for logos in the NIV84, and you'll see it's 330 results in 318 verses.
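To make the results-versus-verses distinction concrete, here is a minimal Python sketch. It assumes the search hits are available as a list with one verse reference per occurrence (a hypothetical format, not a Logos export), and `count_results_and_verses` is a hypothetical helper. The data is a toy example: λόγος occurs three times in John 1:1 and once in John 1:14.

```python
from typing import List, Tuple

def count_results_and_verses(hits: List[str]) -> Tuple[int, int]:
    """Return (results, verses) for a list of hits.

    Each entry in `hits` is the verse reference of one occurrence, so a
    verse containing the word three times appears three times.
    """
    results = len(hits)       # every occurrence counts
    verses = len(set(hits))   # each verse counts once
    return results, verses

# Toy data: λόγος occurs three times in John 1:1 and once in John 1:14.
hits = ["Jn 1:1", "Jn 1:1", "Jn 1:1", "Jn 1:14"]
results, verses = count_results_and_verses(hits)
print(f"{results} results in {verses} verses")  # 4 results in 2 verses
```

A graph that reports the second number where a search reports the first would produce the kind of gap described above for NIV84 (318 versus 330).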

Bug 2:

But other versions are also wrong, or at least they don't give the same results as a morph search (although I'm wondering if the morph search is wrong). Morph searches for all versions return 330 results, but the spark graphs show different results for each version: 326 (RSVCE), 327 (KJV, NLT), 329 (ESV, NASB, NKJV, NRSV) and 330 (LEB).

Those morph searches look suspicious to me - but given bug 1, it could be the spark graph at fault.

This is my personal Faithlife account. On 1 March 2022, I started working for Faithlife, and have a new 'official' user account. Posts on this account shouldn't be taken as official Faithlife views!

Comments

  • MJ. Smith
    MJ. Smith MVP Posts: 55,539

    spark graphs

    i.e. what I have called density or distribution bars in the BWS documentation? The problem is a little deeper in that the concordance count agrees with the distribution bar. When does 328 = (318 OR 300)? ... see below as I did work out the answer.

    Orthodox Bishop Alfeyev: "To be a theologian means to have experience of a personal encounter with God through prayer and worship."; Orthodox proverb: "We know where the Church is, we do not know where it is not."

  • David Bailey
    David Bailey Member Posts: 654 ✭✭

    MJ and Mark,

     

    What I called "mini column chart" in http://community.logos.com/forums/t/61437.aspx is also called by others distribution graph, distribution bars, or spark graph.  In this post, I am calling these the (mini) column charts:

    The top mini column chart is from the Lemma section; the bottom column chart is from the Translation section of the BWS; both generated using the Greek word logos.

    Anyway, I was wondering in my other post about the number that is shown next to the column chart. I don't know how the software calculates "316" for the Greek word logos.

    My understanding is that this number reflects the number of occurrences of the Greek word found in the interlinear of an English bible. Which English bible?  In the Lemma section, I think it is the highest prioritized bible that has a corresponding Greek interlinear.  In the Translation section, I believe it is by default the bible that you used to launch the BWS (i.e. highlight Greek word in the interlinear, right-mouse click, select Lemma, select Bible Word Study or your customized BWS). I say by default because in the Translation section you can select another English bible. I provided suggestions in my other post.

    I did a Bible search on the Greek word logos within the NIV2011 bible and only in John's gospel.  The partial result is shown:

    According to my search, the Greek word logos occurs 40 times in 36 verses. When I manually counted the highlights above, I found 39 occurrences in 36 verses. Where is the missing occurrence?

    I searched the Greek word logos for the entire NT using the NIV2011 and got 330:

    For the NIV84, I got the same:

    I then clicked on the mini column chart to launch the Graph Bible Search Results and selected the Bar Chart. The Bar Chart agrees with my earlier search: the Greek word occurs 40 times in John:

    Using the Bar Chart above, I added up the numbers on all 24 bars (NT books), and the total is 330 for the NIV2011.

    I found out that not all the numbers in the Bar Charts agree with the individual numbers found in the mini column charts (distribution graph) in Lemma and Translation sections.  So, the Bar Chart is not a graphical representation of the Greek word occurrences shown in the mini column chart.  In fact, the Bar Charts always give a sum of 330 for ESV, NIV84, and NIV2011.  Probably because it's based on the same Greek text using the same morph searches?

    Mark,


    I agree that something is not correct in either the morph search result or the column chart summations, or both. But there is also a lack of understanding of what the software is doing to come up with the summations and how morph search relates to these.  Hopefully, someone from Logos can help us understand what is going on in the software.

    David

  • MJ. Smith
    MJ. Smith MVP Posts: 55,539

    Case #1: In Luke 23:9 the Greek word λόγος appears, so it is counted in the lemma search. However, the word is not translated in the NRSV (my test case) and therefore does not appear in the BWS for λόγος. While I understand the logic behind this, it decreases the usefulness of the BWS IMHO (okay, NSH). With a second case, Acts 20:24, the BWS and the lemma search agree that λόγος appears in 318 verses and is translated in 316 verses.

    Yes, this is now in the wiki facts

  • MJ. Smith
    MJ. Smith MVP Posts: 55,539

    Case #2: The morphological search will count a verse only once. Because the translation ring is subdivided by the word in the target language (English), a verse will be counted twice if the same original-language word has multiple translations in the same verse.

    Thus, for logos, the BWS references 316 verses but counts them as 328; the search references 318 verses, two of which contain an occurrence that is omitted in the English.
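    MJ's Case #1 and Case #2 explanation can be modelled in a few lines of Python. This is a sketch of the counting rule as she describes it, not Logos's implementation: it assumes each occurrence is recorded as a (verse, English rendering) pair, with None for an untranslated occurrence, and that the BWS counts one entry per distinct rendering in a verse. The toy verses V1 to V3 are hypothetical; the last lines apply the same bookkeeping to the NRSV figures given above.

```python
from typing import List, Optional, Tuple

# One occurrence of the Greek lemma: (verse, English rendering).
# None means the word is not translated in that verse (as in Luke 23:9, NRSV).
Occurrence = Tuple[str, Optional[str]]

def search_verses(occs: List[Occurrence]) -> int:
    """Morph search: each verse containing the lemma counts once."""
    return len({verse for verse, _ in occs})

def bws_verses(occs: List[Occurrence]) -> int:
    """BWS: only verses where the lemma is translated are referenced."""
    return len({verse for verse, eng in occs if eng is not None})

def bws_total(occs: List[Occurrence]) -> int:
    """BWS total (assumed rule): one entry per distinct rendering per verse."""
    return len({(verse, eng) for verse, eng in occs if eng is not None})

# Hypothetical toy data, not real Logos output.
occs = [
    ("V1", "word"), ("V1", "word"),     # same rendering twice: one entry
    ("V2", "word"), ("V2", "message"),  # two renderings: two entries
    ("V3", None),                       # untranslated: dropped from the BWS
]
print(search_verses(occs), bws_verses(occs), bws_total(occs))  # 3 2 3

# The same bookkeeping applied to the NRSV figures quoted in the thread.
search_verse_count, untranslated_verses, bws_count = 318, 2, 328
bws_verse_count = search_verse_count - untranslated_verses  # 316
extra_entries = bws_count - bws_verse_count                 # 12
print(bws_verse_count, extra_entries)  # 316 12
```

    Under this reading, the 12 extra entries come from verses where λόγος has more than one distinct English rendering; a verse with three renderings would contribute two of them.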

    Because the Passage list and BWS section have no search function and the Passage list drops duplicates, I don't know how to identify these, and therefore how to identify discrepancies and balance the counts, without exporting to Excel or something similar.

  • Jon
    Jon Member Posts: 767 ✭✭

    DBailey said:

    [image]

    At the risk of pointing out the obvious, the graphs are also wrong in the sense that the size of the bars bears no relation to the number of occurrences! Unless I'm missing the point of what the bars are meant to represent [:S]

  • MJ. Smith
    MJ. Smith MVP Posts: 55,539

    Jon said:

    At the risk of pointing out the obvious, the graphs are also wrong in the sense that the size of the bars bears no relation to the number of occurrences! Unless I'm missing the point of what the bars are meant to represent

    No, they are not wrong - they are normalized. See my write-up on the BWS. http://wiki.logos.com/Logos_5$3a_Bible_Word_Study  And some wondered why I went into such detail.[:P]

  • Jon
    Jon Member Posts: 767 ✭✭

    Doh, it was too obvious... Number of hits in book / number of words in book [:)]

  • MJ. Smith
    MJ. Smith MVP Posts: 55,539

    DBailey said:

    distribution bars

    I promise to be consistent in referring to them only as distribution or density bar (charts) depending upon which they are. Be glad I'm not prone to precision 'cause then they'd be normalized distribution or density bar (charts).

    DBailey said:

    I don't know how the software calculates "316" for the Greek word logos.

    I have explained the 316, 318 and 328 in the posts above case #1 and case #2. The information has been added to http://wiki.logos.com/Logos_5$3a_Bible_Word_Study

    DBailey said:

    In the Translation section, I believe it is by default the bible that you used to launch the BWS (i.e. highlight Greek word in the interlinear, right-mouse click, select Lemma, select Bible Word Study or your customized BWS).

    I'll double-check what happens when the BWS is launched from a non-prioritized reverse interlinear Bible.

    DBailey said:

    Where is the missing occurrence?

    While I did it for the NRSV, the procedure for identifying the "missing" occurrence is a simple symmetric difference between a passage list of the BWS and a passage list of the Search. It is a case where the word in Greek is not translated in the English.
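    The symmetric-difference check can be sketched in Python with set operations. This assumes each passage list has been exported as plain verse references, one per line; the export format and the `load_passages` helper are hypothetical. Sets also match the fact that the Passage list drops duplicates. Luke 23:9 and Acts 20:24 are the two NRSV verses from Case #1 where λόγος is untranslated; the other references are illustrative.

```python
from typing import Iterable, Set

def load_passages(lines: Iterable[str]) -> Set[str]:
    """Hypothetical helper: one verse reference per line, blanks ignored."""
    return {line.strip() for line in lines if line.strip()}

# Illustrative exports; in practice these would come from the two passage lists.
search_list = load_passages(["Jn 1:1", "Lk 23:9", "Acts 20:24", "Jn 1:14"])
bws_list = load_passages(["Jn 1:1", "Jn 1:14"])

# Verses that appear in exactly one of the two lists.
mismatches = search_list ^ bws_list
only_in_search = search_list - bws_list  # e.g. untranslated occurrences
only_in_bws = bws_list - search_list     # expected to be empty

for ref in sorted(mismatches):
    print(ref)
```

    `only_in_search` lists the untranslated occurrences; anything that turned up in `only_in_bws` would need a different explanation than a translation gap.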

    DBailey said:

    Hopefully, someone from Logos can help us understand what is going on in the software.

    I guess I can't be helpful 'cause I'm not from Logos. [:'(]  And people thought I was a pest when I forced Logos to divulge their secrets and make me able to match their counts and build their distribution and density bars.[;)]

  • David Bailey
    David Bailey Member Posts: 654 ✭✭

    MJ. Smith said:

    Case #2: The morphological search will count a verse only once. Because the translation ring is subdivided by the word in the target language (English), a verse will be counted twice if the same original-language word has multiple translations in the same verse.

    Thus, for logos, the BWS references 316 verses but counts them as 328; the search references 318 verses, two of which contain an occurrence that is omitted in the English.

    I think you are correct, MJ. That would explain the numbers. However, shouldn't the mini column chart in the Lemma section reflect the frequency of the word logos rather than count the number of verses that have logos?

    thanks,

    David

  • MJ. Smith
    MJ. Smith MVP Posts: 55,539

    DBailey said:

    So, the Bar Chart is not a graphical representation of the Greek word occurrences shown in the mini column chart.

    Correct. The Graph Bible Search function always takes its data from the Search not from a BWS. I've added this to the wiki. Also note the misleading titles - it is not the number of hits but the number of verses containing hits.

  • MJ. Smith
    MJ. Smith MVP Posts: 55,539

    Okay - are there any outstanding issues that I have not addressed? And given this thread, are there any additional suggestions on the form and content of the wiki page on the BWS?

  • David Bailey
    David Bailey Member Posts: 654 ✭✭

    MJ. Smith said:

    Jon said:

    At the risk of pointing out the obvious, the graphs are also wrong in the sense that the size of the bars bears no relation to the number of occurrences! Unless I'm missing the point of what the bars are meant to represent

    No, they are not wrong - they are normalized. See my write-up on the BWS. http://wiki.logos.com/Logos_5$3a_Bible_Word_Study  And some wondered why I went into such detail.[:P]

    The density of each of the small bars is represented by the height, as MJ stated. If you hover your pointer at or near the number in front of the mini column chart (distribution graph), you will see how the density of each bar is defined:

    The above mini column chart was generated using the NIV2011 and the Greek lemma logos.

    Example: In the Gospel of John

    Y is the total number of Greek words in that book

    X is the number of hits in John (40 hits)

    N is a normalization parameter

    Then the height of the bar for John is the normalized ratio (X/Y) x N
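    The formula above can be sketched in Python. The thread does not say how Logos chooses N, so this sketch assumes N scales the densest book to a fixed maximum height; `bar_heights` and the book counts are hypothetical. The example also illustrates Jon's point: a short book with fewer hits can get a taller bar than a long book with more hits.

```python
from typing import Dict

def bar_heights(hits: Dict[str, int], words: Dict[str, int],
                max_height: float = 1.0) -> Dict[str, float]:
    """Normalized density per book, (X / Y) * N.

    X = hits in the book, Y = total words in the book. This sketch assumes
    N = max_height / (largest density), so the densest book gets a bar of
    height max_height; the thread does not say how Logos picks N.
    """
    density = {book: hits[book] / words[book] for book in hits}  # X / Y
    peak = max(density.values())
    if peak == 0:
        return {book: 0.0 for book in density}
    # d / peak * max_height equals d * N with N = max_height / peak.
    return {book: d / peak * max_height for book, d in density.items()}

# Hypothetical counts, not real word counts for any book.
book_hits = {"Book A": 40, "Book B": 10}
book_words = {"Book A": 16000, "Book B": 2000}
print(bar_heights(book_hits, book_words))  # {'Book A': 0.5, 'Book B': 1.0}
```

    Book A has four times as many hits as Book B, yet its bar is half as tall, because the bar tracks hits per word rather than raw hits.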

    Using the Column (Bar) Chart in the Graph Bible Search Results, with the same Greek lemma and NIV2011, "show zero items" selected, and a statistics setting matching the above ratio, I can approximate the column chart densities:

    I do agree with Jon.  I wish the column charts were larger in size.

    David

  • MJ. Smith
    MJ. Smith MVP Posts: 55,539

    DBailey said:

    However, shouldn't the mini column chart in the Lemma section reflect the frequency of the word logos rather than count the number of verses that has logos?

    In this particular thread, because I've just made public (but not official) the L5 wiki page for the BWS, I see my role as explaining what Logos does. In pulling the documentation together I produced a number of error reports, and a few suggestions when Logos didn't consider it an error.[:)] Therefore, I'm staying out of "shouldn't" questions - others are free to push Logos in cases where they believe a function to be less than ideal. I think that in many ways the BWS is one of the most difficult sections of Logos, as it has to meld together resources that hold to differing levels of technicality, differing linguistic bases, differing base texts, differing textual analysis of scribal errors ... I'm still running into a number of cases where I cannot fully explain the results because my primary dictionaries have fundamental differences - or, occasionally, because Logos made a design decision that would never have entered my mind (e.g. treatment of roots of proper nouns), although I can justify the treatment once I've seen what they are doing.

  • MJ. Smith
    MJ. Smith MVP Posts: 55,539

    DBailey said:

    I do agree with Jon.  I wish the column charts were larger in size.

    There are several things that I know Logos has computed in order to generate what they show, that I would like to see the raw data on. And there are several graphics that I wish could be enlarged to make detail more visible. We need more people wanting to use text analytics (numerical and statistical techniques) to build the market for Logos.

  • Mark Barnes
    Mark Barnes Member Posts: 15,432 ✭✭✭

    Thanks, MJ, for your detective work. It does make sense, even though it's not intuitive.

    A simple rewording of the hover text over each spark graph would solve the problem, though.

  • Dave Hooton
    Dave Hooton MVP Posts: 36,339

    DBailey said:

    According to my search, the Greek word logos occurs 40 times in 36 verses.  When I counted manually the above highlights, I found 39 occurrences in 36 verses. Where is the missing occurrence?

    You probably missed the occurrence in Jn 15:20.

    The similar occurrences in Jn 19:8, 13 are easier to spot. But the total highlighted is 40.

    Dave
    ===

    Windows 11 & Android 13

  • Dave Hooton
    Dave Hooton MVP Posts: 36,339

    But other versions are also wrong, or at least they don't give the same results as a morph search (although I'm wondering if the morph search is wrong). Morph searches for all versions return 330 results, but the spark graphs show different results for each version: 326 (RSVCE), 327 (KJV, NLT), 329 (ESV, NASB, NKJV, NRSV) and 330 (LEB).

    Morph Search is fine. The number in the spark graphs is the number of times the word is translated in the RI. Search results shows the non-translated occurrences as per Jn 15:20 in my previous post.

    EDIT:  run a BWS on John only and you can easily account for the missing occurrences (ESV will show all 40, but NIV 2011 = 37, NIV84 = 38)
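    Dave's explanation reduces to a subtraction, sketched here in Python with his John-only figures. It assumes, as he says, that the BWS number counts only translated occurrences, so the gap from the 40 search hits is the number of untranslated occurrences. The NIV 2011 gap of 3 is consistent with the three occurrences discussed above (Jn 15:20; 19:8, 13), though that match is an inference rather than something stated in the thread.

```python
# Occurrences of λόγος in John per the morph search, and the BWS counts
# reported in this post for a BWS run on John only.
search_hits_john = 40
bws_john = {"ESV": 40, "NIV 2011": 37, "NIV84": 38}

# Assumption (from the explanation above): the BWS counts only translated
# occurrences, so the difference is the untranslated occurrences per version.
untranslated = {version: search_hits_john - count
                for version, count in bws_john.items()}
print(untranslated)  # {'ESV': 0, 'NIV 2011': 3, 'NIV84': 2}
```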

     

    Dave

  • David Bailey
    David Bailey Member Posts: 654 ✭✭

    MJ. Smith said:

    In this particular thread, because I've just made public (but not official) the L5 wiki page for the BWS, I see my role as explaining what Logos does. In pulling the documentation I produced a number of error reports and a few suggestions when Logos didn't consider it an error.[:)]

    Thanks for putting it all together MJ.  The L5 wiki page on BWS looks great - thanks for your hard work on it!

    David

  • David Bailey
    David Bailey Member Posts: 654 ✭✭

    You probably missed the occurrence in Jn 15:20

    I believe you are correct, Dave. Thanks for finding that.

    David

  • David Bailey
    David Bailey Member Posts: 654 ✭✭

    DBailey said:

    What I called "mini column chart" in http://community.logos.com/forums/t/61437.aspx is also called by others distribution graph, distribution bars, or spark graph.  In this post, I am calling these the (mini) column charts:

    Hmm, perhaps histogram is a better term to use for those mini column charts.