Feedback on Lemma in Passage

Mark Barnes
Mark Barnes Member Posts: 15,432 ✭✭✭
edited November 20 in English Forum

This looks like a really interesting feature that could certainly prove useful. However, I feel it still needs some refinement. In particular:

  1. The grouping needs to be at two levels, not just one. That is, if I select the 'lemma' view, it should group by lemma, but then group by resource within each lemma. Equally, if I select resource view, it should group first by resource, but then by lemma within each resource.
     • The first lemma I wanted to test it with was pneumatikos in 1 Cor 12:1. But there are several results in each commentary, which makes it very hard to scroll through everything.
     • I then tried the resource view. But in Billroth's Commentary vol 2, for example, occurrences of pneumatikon are scattered throughout the results rather than grouped together.
  2. In the lemma view, it would be much better if the results were ordered by prioritisation. I know you might say this is a search, and therefore results can't be ranked by prioritisation, but because it's a search within a milestone, I do think prioritisation should be used.
  3. In the resource view, the resources ought to be ranked by priority (or alphabetical order at least). At the moment they seem random.
  4. It would be nice to have a link to a search. I presume the search criteria is something like this: (πνευματικός OR πνευματικῶν) WITHIN {Milestone <Bible ~ 1 Cor 12:1>} within type:bible-commentary, but it would be nice not to have to re-type it. Searches (particularly inline searches) are particularly useful when you want to scan through the results in context.
  5. Given (4), above, it seems odd that this search needs to be performed on the Logos servers. A locally-based dataset of acceptable declensions for each lemma should be all that's needed to make this local. And that dataset, if it doesn't already exist, could be useful for all sorts of things.
  • This is my personal Faithlife account. On 1 March 2022, I started working for Faithlife, and have a new 'official' user account. Posts on this account shouldn't be taken as official Faithlife views!

    Comments

    • Rick Brannan (Logos)
      Rick Brannan (Logos) Member, Logos Employee Posts: 1,862

      Hi Mark.

      This one is pretty early in the process; I actually just ran it for the first time on the day the beta was delivered. There's more that'll happen with it; I'll let others interact regarding your suggestions (which are great).

      Regarding the data itself (items 4 & 5), we've actually done a ton of analysis on commentaries that contain Hebrew, Aramaic, and Greek strings. For that (rather large) set of commentaries, we've done some analysis of those strings to, where possible, lemmatize them with some attention to context and store the associations and their locations. In other words, there's more than a search on forms going on. The resultant dataset, which is what the online portion is using, is pretty sizeable. It really isn't feasible to deliver it to each user's installation. It may also be updated frequently; recent feedback indicates that, for a not insignificant proportion of users, large resources + frequent updates aren't a good combination.

      I'm really looking forward to seeing how this one develops, and how beta folks can help us tune the feature to maximize its role in study.

      Rick Brannan
      Data Wrangler, Faithlife
      My books in print

    • MJ. Smith
      MJ. Smith MVP Posts: 53,405

      Rick, is this using the language tags? I ask because of transliterations being grouped together which would have a negative impact on the resources that would be included.

      Orthodox Bishop Alfeyev: "To be a theologian means to have experience of a personal encounter with God through prayer and worship."; Orthodox proverb: "We know where the Church is, we do not know where it is not."

    • Rick Brannan (Logos)

      Hi M.J.

      MJ. Smith said:

      Rick, is this using the language tags? I ask because of transliterations being grouped together which would have a negative impact on the resources that would be included.

      Only actual Hebrew/Aramaic and Greek text is being evaluated; transliterations are not. It would be nice some day, but Hebrew transliterations are really hard to retrovert to actual Hebrew text that could be parsed — most Hebrew transliteration schemes actually employed are fairly lossy.
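      Rick's distinction between actual Hebrew/Aramaic or Greek text and transliterations can be illustrated with a script check. This is a minimal sketch, assuming that classifying each letter by its Unicode character name is an acceptable stand-in for whatever test (language tags or otherwise) the real pipeline uses; `script_of` is a hypothetical name. Aramaic and Hebrew share a script, so a check like this cannot tell them apart.

```python
import unicodedata


def script_of(text: str) -> str:
    """Classify a string as 'greek', 'hebrew', or 'other' from its letters' Unicode names.

    Combining marks (decomposed Greek accents, Hebrew vowel points) are not letters,
    so they are skipped; Aramaic written in Hebrew script is reported as 'hebrew'.
    """
    scripts = set()
    for ch in text:
        if not ch.isalpha():
            continue  # skip spaces, punctuation, digits and combining marks
        name = unicodedata.name(ch, "")
        if name.startswith("GREEK"):
            scripts.add("greek")
        elif name.startswith("HEBREW"):
            scripts.add("hebrew")
        else:
            scripts.add("other")  # e.g. Latin letters in a transliteration
    # Only a string written entirely in one script is classified as that script.
    return scripts.pop() if len(scripts) == 1 else "other"
```

      Here `script_of("πνευματικός")` returns `'greek'`, while the transliteration `script_of("pneumatikos")` returns `'other'` and would be skipped.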

      I'm not sure I understand what you're saying about "transliterations being grouped together", though. If I haven't answered your question, can you expand on it?

    • Eli Evans (Logos)
      Eli Evans (Logos) Member, Logos Employee Posts: 1,404

      The grouping needs to be at two levels, not just one. That is, if I select the 'lemma' view, it should group by lemma, but then group by resource within each lemma. Equally, if I select resource view, it should group first by resource, but then by lemma within each resource.

      I agree, and it's already in the works.
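      The two-level grouping described in item 1 amounts to nesting one grouping inside another, with the nesting order chosen by the view. A minimal sketch, assuming each hit is a hypothetical `(lemma, resource, location)` tuple; `group_two_levels` is illustrative, not the Logos implementation:

```python
from collections import defaultdict


def group_two_levels(hits, outer="lemma"):
    """Group hits by lemma then resource (outer='lemma'), or by resource then lemma.

    Each hit is assumed to be a (lemma, resource, location) tuple.
    """
    if outer not in ("lemma", "resource"):
        raise ValueError("outer must be 'lemma' or 'resource'")
    tree = defaultdict(lambda: defaultdict(list))
    for lemma, resource, location in hits:
        first, second = (lemma, resource) if outer == "lemma" else (resource, lemma)
        tree[first][second].append(location)
    # Convert to plain dicts so the result is easy to inspect and compare.
    return {key: dict(inner) for key, inner in tree.items()}
```

      Because Python dicts keep insertion order, lemmas and resources appear in the order their first hit arrived; ordering the keys (for example by priority) would be a separate step.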

      In the lemma view, it would be much better if the results were ordered by prioritisation. I know you might say this is a search, and therefore results can't be ranked by prioritisation, but because it's a search within a milestone, I do think prioritisation should be used.

      Good point. It may be deceptively difficult since the results are coming from a web service that may not have ready access to your personal priority rules. But we'll look into it.
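      Whatever the service can or cannot see, the ordering Mark asks for (priority first, alphabetical at least) reduces to a two-part sort key. A minimal sketch, assuming the user's priorities can be expressed as an ordered list of resource titles (actual priority rules may be richer); `order_resources` is hypothetical:

```python
def order_resources(resources, priority):
    """Order resources by a priority list, then alphabetically (case-insensitive) for the rest.

    priority is assumed to be an ordered list of resource titles, highest first.
    """
    rank = {title: i for i, title in enumerate(priority)}
    # False sorts before True, so prioritized titles come first, in priority order;
    # unprioritized titles all share rank 0 and fall back to alphabetical order.
    return sorted(resources, key=lambda r: (r not in rank, rank.get(r, 0), r.casefold()))
```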

      It would be nice to have a link to a search. I presume the search criteria is something like this: (πνευματικός OR πνευματικῶν) WITHIN {Milestone <Bible ~ 1 Cor 12:1>} within type:bible-commentary, but it would be nice not to have to re-type it. Searches (particularly inline searches) are particularly useful when you want to scan through the results in context.

      When/if we build a search provider, we'll include something like that.
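      A link of that kind would need to generate a query string from the lemma's forms and the passage. This sketch simply assembles the syntax Mark presumed in item 4; `build_passage_query` is a hypothetical helper, and as Eli goes on to explain, a surface-form query like this would not return the same results as the dataset.

```python
def build_passage_query(forms, reference, resource_type="bible-commentary"):
    """Assemble a query string in the shape Mark presumed (his guess, not a documented format)."""
    alternatives = " OR ".join(forms)
    # Doubled braces in the f-string produce literal { and } around the milestone clause.
    return f"({alternatives}) WITHIN {{Milestone <Bible ~ {reference}>}} within type:{resource_type}"
```

      Calling `build_passage_query(["πνευματικός", "πνευματικῶν"], "1 Cor 12:1")` reproduces the query quoted in item 4.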

      We aren't doing a search for multiple surface forms. Rather, Rick wrote a much more sophisticated heuristic to compare a given surface form against candidate lemmas from the passage in question. This cuts down on false positives due to homographs and false negatives for spelling variations. His dataset then assigns an exact lemma to the selection in the resource. In some cases, it could assign an exact instance of a lemma, though we don't leverage that data just yet.

      So your search would yield different results than the dataset.
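      The heuristic itself isn't described here, so the following is only a toy illustration of the general idea: restrict candidate lemmas to those in the passage, and compare forms loosely enough to tolerate spelling variation. It assumes each passage lemma comes with a list of known forms, and uses accent stripping plus case folding as a stand-in for real variant handling; `normalize` and `match_in_passage` are hypothetical.

```python
import unicodedata


def normalize(word: str) -> str:
    """Strip diacritics (accents, breathings) and casefold, so spelling variants compare equal."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch)).casefold()


def match_in_passage(surface, passage_lemmas):
    """Return the passage lemmas whose known forms match the surface form.

    passage_lemmas maps each lemma in the passage to its inflected forms
    (a hypothetical structure; real candidates would come from a tagged text).
    """
    target = normalize(surface)
    return [lemma for lemma, forms in passage_lemmas.items()
            if target in {normalize(form) for form in forms}]
```

      Restricting the candidates to the passage is what limits homograph false positives: a form can only match lemmas that actually occur in the verse being studied.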

      Given (4), above, it seems odd that this search needs to be performed on the Logos servers. A locally-based dataset of acceptable declensions for each lemma should be all that's needed to make this local. And that dataset, if it doesn't already exist, could be useful for all sorts of things.

      We pursued a solution that just maps inflections to lemmas. Such an approach is tolerable for NT Greek, less good for Greek beyond the NT, and embarrassingly awful for Hebrew and Aramaic. (We might still include this global mapping in the lemma list dataset for guessing lemmas for surface words in any resource. A sort of last-resort automatic parser, if you will.)
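      The simpler approach Eli describes, a single table mapping inflected forms to lemmas, can be sketched by inverting a lemma-to-forms list. This assumes accents are ignored when looking up forms (a hypothetical choice, to tolerate inconsistent accentuation), and the helper names are illustrative. Even a tiny Greek example shows the problem: εἰ ("if") and εἶ (a form of εἰμί) both reduce to the bare form ει.

```python
import unicodedata
from collections import defaultdict


def strip_marks(word: str) -> str:
    """Remove combining diacritics and casefold (an assumed lookup normalization)."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).casefold()


def build_form_index(lemma_forms):
    """Invert {lemma: [forms]} into {normalized form: set of lemmas}."""
    index = defaultdict(set)
    for lemma, forms in lemma_forms.items():
        for form in forms:
            index[strip_marks(form)].add(lemma)
    return index


def guess_lemmas(index, surface):
    """Last-resort guess: every lemma the bare form could belong to (possibly several)."""
    return sorted(index.get(strip_marks(surface), set()))
```

      With `idx = build_form_index({"εἰ": ["εἰ"], "εἰμί": ["εἰμί", "εἶ", "ἐστιν"]})`, `guess_lemmas(idx, "εἰ")` returns both lemmas; without context, a global table can only offer candidates.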

      But commentaries are a special case. So instead we opted for a process that maps specific selections in a resource to specific lemmas -- that is, as if the tagging had been done in situ in the resource. This results in a much more accurate but much larger data set.
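      By contrast, a selection-level dataset records a lemma for each tagged span in each resource, much as if the tag were markup in the book. A minimal sketch, assuming a hypothetical schema of resource ID plus character offsets; `SelectionTag`, `index_by_resource` and `lemma_at` are illustrative names. Because each occurrence gets its own record, identical strings in different places can carry different lemmas, which is also why such a dataset grows much larger than a form table.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SelectionTag:
    """One lemma assignment for a specific span in a specific resource (hypothetical schema)."""
    resource_id: str
    start: int  # character offset where the Greek/Hebrew string begins
    end: int    # character offset just past its end
    lemma: str


def index_by_resource(tags):
    """Group tags per resource, sorted by position."""
    by_resource = defaultdict(list)
    for tag in tags:
        by_resource[tag.resource_id].append(tag)
    for spans in by_resource.values():
        spans.sort(key=lambda t: t.start)
    return dict(by_resource)


def lemma_at(by_resource, resource_id, offset):
    """Return the lemma tagged at an offset in a resource, or None if nothing is tagged there."""
    for tag in by_resource.get(resource_id, []):
        if tag.start <= offset < tag.end:
            return tag.lemma
    return None
```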

      Given that, we can either a) create one very large (~ 1 gig and growing) supplemental data resource that covers every commentary we've ever produced, and then update it every time a new commentary is produced whether you buy it or not, or b) create one supplemental data resource for each commentary resource, or c) some number of supplemental data resources less than the number of commentary resources, say, by lumping commentary volumes together. A is a non-starter from the user perspective. B and C are a production and maintenance nightmare. It could be done, but then the feature would fail at cost/benefit and we wouldn't build it at all.

      On the other hand, the library services architecture we've been building to support the web app already keeps one automatically-updated gigantic index for every resource that will ever exist and filters out results for a given query based on the user's licenses. 
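      The architecture Eli describes, one shared index covering every resource with licensing applied per query, can be sketched as follows. The class and method names are hypothetical; the point is that adding a new commentary updates one index, while each user still sees only hits from resources they own.

```python
from collections import defaultdict


class SharedLemmaIndex:
    """One server-side index over every resource (hypothetical), filtered per user at query time."""

    def __init__(self):
        self._hits = defaultdict(list)  # lemma -> [(resource_id, location), ...]

    def add(self, lemma, resource_id, location):
        """Record one tagged occurrence; done once, centrally, for all users."""
        self._hits[lemma].append((resource_id, location))

    def query(self, lemma, licensed_resources):
        """Return only the hits in resources the user is licensed for."""
        return [(rid, loc) for rid, loc in self._hits.get(lemma, [])
                if rid in licensed_resources]
```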

    • Rick Brannan (Logos)

      In 6.8 Beta 2, the Lemma in Passage section is now on by default in Exegetical Guide and BWS. The Collection selector in the guide section is also now functional. Narrowing by commentary collection is one way to narrow hits that display for this section.

      Bonus: You can now add Lemma in Passage to a guide template without crashing. (yay!)

      BWS does not (yet) constrain to reference range; I'm unsure where that stands. 
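      Constraining results to a reference range, which BWS does not yet do, is essentially a range check on references. A minimal sketch, assuming verses are encoded as hypothetical `(book, chapter, verse)` tuples with books numbered in canonical order; because tuples compare element by element, an inclusive range test is a single chained comparison.

```python
from typing import NamedTuple


class Ref(NamedTuple):
    """A verse reference; book is an index into canonical order (hypothetical encoding)."""
    book: int
    chapter: int
    verse: int


def within_range(ref: Ref, start: Ref, end: Ref) -> bool:
    """True if ref falls in the inclusive range start..end (compares book, chapter, verse)."""
    return start <= ref <= end


def constrain_hits(hits, start, end):
    """Keep hits whose reference lies in the range; hits are (Ref, payload) pairs."""
    return [(ref, payload) for ref, payload in hits if within_range(ref, start, end)]
```

      With 1 Corinthians numbered 46 (its position in the Protestant canon), hits from 1 Cor 12 pass `constrain_hits` for the range 12:1 to 12:11, and hits from chapter 14 are dropped.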

    • Eli Evans (Logos)

      I'm unsure where that stands. 

      It's in progress. Should come up in a beta real soon now.

    • Mark Barnes

      Eli Evans said:

      We pursued a solution that just maps inflections to lemmas. Such an approach is tolerable for NT Greek, less good for Greek beyond the NT, and embarrassingly awful for Hebrew and Aramaic. (We might still include this global mapping in the lemma list dataset for guessing lemmas for surface words in any resource. A sort of last-resort automatic parser, if you will.)

      Thanks, Eli and Rick, for providing the detail about how this works. When I was doing my testing, it was in the NT, and as you said, this method was acceptable there. It makes sense that it would work less well in other corpora.

      It's an interesting discussion because it does show clearly why you have chosen to go with an online service rather than a local one, even when that choice isn't obvious to users.

      However, given what you've said, my preferred approach would be to add tagging to the existing lexicons and commentaries, so that your "supplemental data resource" is actually a part of the resources themselves. That way users could access it locally, but would only have that part of the database that was relevant to them.

      This desire on my part is not simply because I don't want to be tied to an online service (although that's a small factor). It's much more that having all these separate databases severely limits my ability to construct my own searches that use data from a variety of databases, rather than simply through the guides and tools.

      I'm very appreciative that this has been improving (particularly with entities, and labels for sermons and journal articles), and that the trend is often that data is exposed first through guides/tools, and later in search. But I want to encourage you to continue that trend and, when practically possible, to design datasets in a way that allows them to be integrated with the resources they point to, so that users can search them in their own libraries. Wonderful possibilities open up if we're able to combine Rick's Ancient Literature dataset with other searches, to search for all entities with particular characteristics, to use the Milestone search for headwords, or... well, you get the idea.

      Thank you for listening!

    • Eli Evans (Logos)

      However, given what you've said, my preferred approach would be to add tagging to the existing lexicons and commentaries, so that your "supplemental data resource" is actually a part of the resources themselves. That way users could access it locally, but would only have that part of the database that was relevant to them.

      That would be technologically simpler. However, doing that would require the book production team to do what we call a "recycle" on each of around 2,400 commentary volumes. Recycling a resource means taking it back to XML source files, adding markup, and then re-compiling and re-publishing the book. Depending on the circumstances under which each book was first built over the past 20 years, recycling can be a very smooth process, or a lossy and fragile one, with no obvious way to tell which. (A good rule of thumb for computer files is, "if it ain't broke, DON'T TOUCH IT!")

      This process, as I understand it, would take around two calendar years, or 100 volumes per month, and would impact the team's ability to do other more valuable things.

      A second reason, more pertinent to the business factors than tech, is that this data is value-added markup that we want to make available only to subscribers. If we embed it in the books, we currently have no way to list it as a separate line item for commerce. It's just part of the book, so 100% giveaway.

      So in the end, the choice wasn't really between "in the books" or "on the server," it was "on the server" or "not at all."

      It's much more that having all these separate databases severely limits my ability to construct my own searches that use data from a variety of databases, rather than simply through the guides and tools.

      That's the situation at the moment, but we do plan to make these searchable eventually. The same technology that allows search extensions ({...}) allows us to build custom search providers to search data that is not stored in the local index.
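      Eli doesn't describe how the search-extension machinery works internally, so this is only a generic illustration of the provider pattern he alludes to: a common interface, one implementation backed by a local index, and one that defers to a remote service (stubbed here with a plain function). All names are hypothetical.

```python
from typing import Callable, Protocol

Hit = tuple[str, int]  # (resource_id, location)


class SearchProvider(Protocol):
    """Anything that can answer a query with hits (hypothetical interface)."""

    def search(self, query: str) -> list[Hit]: ...


class LocalIndexProvider:
    def __init__(self, postings: dict[str, list[Hit]]):
        self.postings = postings  # term -> hits, standing in for the local index

    def search(self, query: str) -> list[Hit]:
        return self.postings.get(query, [])


class RemoteLemmaProvider:
    def __init__(self, fetch: Callable[[str], list[Hit]]):
        self.fetch = fetch  # callable standing in for a web-service request

    def search(self, query: str) -> list[Hit]:
        return self.fetch(query)


def run_search(providers: list[SearchProvider], query: str) -> list[Hit]:
    """Merge hits from every provider, dropping duplicates and sorting them."""
    merged: set[Hit] = set()
    for provider in providers:
        merged.update(provider.search(query))
    return sorted(merged)
```

      Merging through a set removes duplicate hits that both providers return.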

      But, as Bradley is fond of reminding me, the default state for any feature is to not exist. So that means that the immediate choice wasn't between "search" or "guides" but between "guides" or "nowhere."

      Hope that makes sense!

    • Eli Evans (Logos)

      I'm very appreciative that this has been improving (particularly with entities, and labels for sermons and journal articles), and that the trend is often that data is exposed first through guides/tools, and later in search. But I want to encourage you to continue that trend and, when practically possible, to design datasets in a way that allows them to be integrated with the resources they point to, so that users can search them in their own libraries. Wonderful possibilities open up if we're able to combine Rick's Ancient Literature dataset with other searches, to search for all entities with particular characteristics, to use the Milestone search for headwords, or... well, you get the idea.

      100% agreement. Recall that we shipped more than a few new datasets with Logos 5 that didn't become searchable at all until Logos 6. We've shortened that turnaround significantly. If that trend continues, which I want it to, then maybe someday we'll get to a point where a new, exotic dataset will be fully searchable on the same day as it releases.

    • Mark Barnes

      Eli Evans said:

      That's the situation at the moment, but we do plan to make these searchable eventually. The same technology that allows search extensions ({...}) allows us to build custom search providers to search data that is not stored in the local index.

      Thanks, Eli!

    • Mark Barnes

      Eli Evans said:

      The grouping needs to be at two levels, not just one. That is, if I select the 'lemma' view, it should group by lemma, but then group by resource within each lemma. Equally, if I select resource view, it should group first by resource, but then by lemma within each resource.

      I agree, and it's already in the works.

      This is much better in beta 3. Thanks!