Is INTERSECTS always superior to ANDEQUALS and WITHIN 0 WORDS for typical usage scenarios?

Page 1 of 1 (16 items)
This post has 15 Replies | 1 Follower

Posts 1499
Forum MVP
Fr Devin Roza | Forum Activity | Posted: Mon, Aug 29 2016 12:15 PM

Thanks for the new INTERSECTS operator. 

I've been running some tests, and I'm getting the impression that INTERSECTS is superior to ANDEQUALS at finding translated words, and that it is far superior to WITHIN 0 WORDS at finding other tags that correspond to words.

For example, to find everywhere that the ESV translates κύριος as Lord, up to now we would use lemma:κύριος ANDEQUALS Lord. The INTERSECTS search works even better, however, and finds one more correct hit: lemma:κύριος INTERSECTS Lord finds a correct hit in Matthew 21:42 that the ANDEQUALS search misses. (EDIT: Looks like this is because the ESV Interlinear classifies it as part of an "idiomatic expression". In Mt 21:42, though, the rendering of kurios as Lord is quite literal, so for almost all usage scenarios this would still be a disadvantage of ANDEQUALS, I think.)

Previously, if we wanted to find places where the word Lord is used to refer to <Person Jesus>, we would search for Lord WITHIN 0 WORDS <Person Jesus>. This works, but often returns "false hits" that are nearby the word "Lord." The search Lord INTERSECTS <Person Jesus> works perfectly, however, as best I can tell. 

Is this expected behavior? Can we think of INTERSECTS as replacing these two operators for all practical purposes? And can anyone think of any real-life usage scenarios where ANDEQUALS and WITHIN 0 WORDS would still be of use?

Posts 3619
Francis | Forum Activity | Replied: Mon, Aug 29 2016 1:08 PM

As far as I can tell ANDEQUALS works in morph search, but not INTERSECTS.

Posts 1499
Forum MVP
Fr Devin Roza | Forum Activity | Replied: Mon, Aug 29 2016 1:14 PM

Francis:

As far as I can tell ANDEQUALS works in morph search, but not INTERSECTS.

For me, INTERSECTS is working on the Morph tab. Note that this is a v7.1 Beta feature, in case you don't have the Beta installed.

Posts 3619
Francis | Forum Activity | Replied: Mon, Aug 29 2016 1:18 PM

Right... my mistake.

I cannot experiment with this right now because I don't want to go beta, but it seems to me that the idea of intersection is intrinsically broader than that of equation. Although I cannot think of a specific case right now, it would seem to me that in certain cases, intersection could produce broader results than what is expected if one's goal is to find equation. This being said, I am not sure I follow the reference to an idiomatic expression in the ESV as an explanation for the missing result when using ANDEQUALS: the surface text is what is being looked for by 'Lord' isn't it? And the underlying lemma certainly is kyrios. So I don't get why it's not a hit.

Posts 1499
Forum MVP
Fr Devin Roza | Forum Activity | Replied: Mon, Aug 29 2016 2:18 PM

Francis:

This being said, I am not sure I follow the reference to an idiomatic expression in the ESV as an explanation for the missing result when using ANDEQUALS: the surface text is what is being looked for by 'Lord' isn't it? And the underlying lemma certainly is kyrios. So I don't get why it's not a hit.

Here's an image that can help understand why ANDEQUALS comes up short in this particular case:

The text in the ESV interlinear is in italics, indicating this is an idiomatic expression. For idiomatic expressions, words have not been lined up one by one, but as a group. 

With ANDEQUALS, things are supposed to start and end at the same "location". This almost always occurs with lemmas. But with these idiomatic phrases, the tagging doesn't define which lemma corresponds to which word in the idiomatic phrase. So, even though they "intersect", they don't start and end at the same location, so ANDEQUALS doesn't find them. 

So, INTERSECTS gives more hits, normally correct hits. It might occasionally give a false positive. So: 

lemma:κύριος INTERSECTS doing

returns Matthew 21:42, because doing is also part of the idiomatic expression. 

But when we actually run real-life searches for words being translated, it's very hard for me to imagine a case where a false positive would occur with INTERSECTS (nobody searches for lemma:κύριος INTERSECTS doing in real life). It's much easier, I think, that you would miss hits with ANDEQUALS where you have idiomatic expressions, as in fact occurs in lemma:κύριος ANDEQUALS Lord.
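To make the Mt 21:42 case concrete, here is a minimal sketch in Python. It is only a toy model, not how Verbum actually indexes anything: the character offsets are invented, and `word_spans`, `same_span`, `overlaps` and `compare` are hypothetical helper names. The one assumption it encodes is the one described above, namely that the lemma is tagged to the whole idiomatic phrase rather than to a single English word.

```python
# Toy model only: offsets are invented, not read from any Logos/Verbum index.
IDIOM = "was the Lord's doing"

# Assumption: inside an idiomatic cell, lemma:κύριος is tagged to the WHOLE phrase.
LEMMA_SPAN = (0, len(IDIOM))  # (start, end), end exclusive


def word_spans(text):
    """Return {word: (start, end)} for each space-separated word in text."""
    spans = {}
    pos = 0
    for word in text.split(" "):
        spans[word] = (pos, pos + len(word))
        pos += len(word) + 1  # step over the separating space
    return spans


def same_span(a, b):
    """ANDEQUALS-style test: both hits start and end at the same place."""
    return a == b


def overlaps(a, b):
    """INTERSECTS-style test: the two hits share at least one character."""
    return a[0] < b[1] and b[0] < a[1]


def compare(text, lemma_span):
    """For each English word, report (matches_andequals, matches_intersects)."""
    return {
        word: (same_span(lemma_span, span), overlaps(lemma_span, span))
        for word, span in word_spans(text).items()
    }


if __name__ == "__main__":
    for word, (equal, overlap) in compare(IDIOM, LEMMA_SPAN).items():
        print(f"{word:8} ANDEQUALS={equal!s:5} INTERSECTS={overlap}")
```

In this model no single English word matches with the ANDEQUALS-style test, which is the missed hit for "Lord", while every word in the idiom matches with the INTERSECTS-style test, which is why "doing" shows up as a false positive.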

Posts 7906
LogosEmployee

(To the best of my knowledge; I could be mistaken on some of the finer points but will try to return and edit this post as necessary)

ANDEQUALS: Originally developed for English+Strongs (now English+lemma or lemma+morph) searching. Requires hits for the two terms to be exactly at the same offset and length. De-duplicates hits before returning them.

WITHIN 0 CHARS: Requires the two terms to be indexed at positions that differ by at most zero characters; in practice, this means intersection of hits. Returns all the hits without de-duplicating them. 

WITHIN 0 WORDS: Should be the same as WITHIN 0 CHARS (except measured in words), but it looks like I found a bug that returns false positives (possibly when word counts aren't known, e.g., for terms inside footnotes, which don't count as words in the surface text).

INTERSECTS: Requires the terms to have hits that intersect/overlap. Returns a new hit from just the overlapping portions of the hits and de-duplicates the results.
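The operator descriptions above can be sketched as a small Python model. This is an illustration under stated assumptions, not the actual search engine code: the `Hit` type and the three functions are hypothetical names, hits are treated as plain (offset, length) spans, "differ by at most zero characters" is read as strict overlap, and WITHIN 0 CHARS is assumed to report both hits of every overlapping pair.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    """A search hit as an (offset, length) span. Hypothetical type for this sketch."""
    offset: int
    length: int

    @property
    def end(self) -> int:
        return self.offset + self.length


def overlaps(a: Hit, b: Hit) -> bool:
    """True when the two spans share at least one character (assumed reading)."""
    return a.offset < b.end and b.offset < a.end


def and_equals(left: List[Hit], right: List[Hit]) -> List[Hit]:
    """Hits with exactly the same offset and length on both sides, de-duplicated."""
    return sorted(set(left) & set(right))


def within_zero_chars(left: List[Hit], right: List[Hit]) -> List[Hit]:
    """Both hits of every overlapping pair, with no de-duplication (assumption)."""
    hits = []
    for a in left:
        for b in right:
            if overlaps(a, b):
                hits.extend([a, b])
    return hits


def intersects(left: List[Hit], right: List[Hit]) -> List[Hit]:
    """Only the overlapping portion of each overlapping pair, de-duplicated."""
    found = set()
    for a in left:
        for b in right:
            if overlaps(a, b):
                start = max(a.offset, b.offset)
                found.add(Hit(start, min(a.end, b.end) - start))
    return sorted(found)


# Invented offsets: one lemma hit covers a whole phrase, one covers a single word.
lemma_hits = [Hit(0, 20), Hit(40, 4)]
word_hits = [Hit(8, 6), Hit(40, 4)]

if __name__ == "__main__":
    print("ANDEQUALS     ", and_equals(lemma_hits, word_hits))
    print("WITHIN 0 CHARS", within_zero_chars(lemma_hits, word_hits))
    print("INTERSECTS    ", intersects(lemma_hits, word_hits))
```

With these made-up hits, the ANDEQUALS model finds only the exactly aligned word, the INTERSECTS model also finds the word inside the phrase and reports just that word, and the WITHIN 0 CHARS model returns four undeduplicated hits including the full phrase.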

ANDEQUALS will remain the most "precise" operator (and should return a subset of the results from the other operators). All the operators can have false positives (for lemma@morph) when lemmas & morphs from two different Greek/Hebrew words are aligned to the same English word. ANDEQUALS can have false negatives (for English+lemma) when Greek/Hebrew is aligned to a phrase; the other operators can have false positives in that situation instead.

ANDEQUALS and INTERSECTS will probably return the result count you're expecting, i.e., the number of hits will be the number of distinct highlights shown in search results.

WITHIN 0 CHARS may still have some use for making matches from idioms more obvious because it will highlight the entirety of the matched idiom (e.g., "was the Lord's doing"), not just the smallest individual piece ("Lord" or "doing").

Probably a good rule of thumb is to always start with INTERSECTS by default, then consider switching to another operator if you're not getting the results you want, and if you understand why a different operator will give you different results?

Fr Devin Roza:
Can we think of INTERSECTS as replacing these two operators for all practical purposes?

I think so, yes.

Posts 3619
Francis | Forum Activity | Replied: Tue, Aug 30 2016 12:48 AM

Fr Devin Roza:
But with these idiomatic phrases, the tagging doesn't define which lemma corresponds to which word in the idiomatic phrase.

Yet the context menu does link up the surface text "Lord" with the lemma "Kyrios", and, as even your screenshot shows, each word in the idiomatic expression is also linked up with its corresponding lemma. Are you sure this is the reason or is it your hypothesis?

Posts 1499
Forum MVP
Fr Devin Roza | Forum Activity | Replied: Tue, Aug 30 2016 1:30 AM

Francis:

Fr Devin Roza:
But with these idiomatic phrases, the tagging doesn't define which lemma corresponds to which word in the idiomatic phrase.

Yet the context menu does link up the surface text "Lord" with the lemma "Kyrios", and, as even your screenshot shows, each word in the idiomatic expression is also linked up with its corresponding lemma. 

The screenshot above from the Interlinear shows that the entire English phrase is tagged to the entire Greek phrase. But the specific English words are not aligned to specific Greek words. If they happen to be in the "right order", it is pure coincidence. That is why they are all blue. To select one is to select all three.

This is also reflected when right clicking on the word "Lord's":

The word "Lord's" is tagged to all three Greek words. Verbum has no idea which of the three it actually corresponds to in the ESV. (Note this doesn't apply to the original Greek versions, where each word is individually tagged).

The ANDEQUALS search that reflects what the Interlinear displays in the ESV would be this one, which returns Mt 21:42:

"παρὰ κυρίου ἐγένετο" ANDEQUALS "was the Lord’s doing"

The single-word search lemma:κύριος ANDEQUALS Lord doesn't work here because of what Bradley described as: "Requires hits for the two terms to be exactly at the same offset and length."

Francis:

Are you sure this is the reason or is it your hypothesis?

I'm pretty sure (Bradley could confirm). But cf. below for an example that would seem to indicate this is only limited to phrases classified by the interlinear as idiomatic. Other phrases seem to be more flexible with ANDEQUALS.

Posts 1499
Forum MVP
Fr Devin Roza | Forum Activity | Replied: Tue, Aug 30 2016 1:33 AM

Bradley Grainger (Faithlife):

Probably a good rule of thumb is to always start with INTERSECTS by default, then consider switching to another operator if you're not getting the results you want, and if you understand why a different operator will give you different results?

Fr Devin Roza:
Can we think of INTERSECTS as replacing these two operators for all practical purposes?

I think so, yes.

Thanks, Bradley, for taking the time to put this together. This post is a great reference point.

And INTERSECTS is a fantastic operator - it really simplifies and improves a lot of searches! Much appreciated!

Posts 1499
Forum MVP
Fr Devin Roza | Forum Activity | Replied: Tue, Aug 30 2016 1:56 AM

Bradley Grainger (Faithlife):

ANDEQUALS can have false negatives (for English+lemma) when Greek/Hebrew is aligned to a phrase

"Lord's" in Mt 21:42 would seem to indicate pretty clearly that idiomatic phrases don't work well with ANDEQUALS.

But other phrases that aren't classified as idiomatic by the interlinear seem to be more flexible. For example, in Mt 21:42, the English "become" corresponds to "ἐγενήθη εἰς" in Greek, and the interlinear and right-click menu work similarly, with multiple words attached to the single English word.

Yet the ANDEQUALS is flexible. All six of these searches return Mt 21:42:

"ἐγενήθη εἰς" ANDEQUALS become

ἐγενήθη ANDEQUALS become

εἰς ANDEQUALS become

<Lemma = lbs/el/γίνομαι> ANDEQUALS become

<Lemma = lbs/el/εἰς> ANDEQUALS become

"<Lemma = lbs/el/γίνομαι> <Lemma = lbs/el/εἰς>" ANDEQUALS become

These would seem to go against the idea that ANDEQUALS "Requires hits for the two terms to be exactly at the same offset and length." Why do they work, when the idiomatic phrases do not? [EDIT: Found the answer; cf. the explanation in my next post below. As long as the English phrase is complete, the individual Greek words or the whole Greek phrase all match it with ANDEQUALS.]

Posts 3619
Francis | Forum Activity | Replied: Tue, Aug 30 2016 2:27 AM

Are there no other idiomatic expressions in the hits that are found? I tried to do a figurative language search, but the syntax in the help file does not work (even the examples provided do not work!) and I don't have time to figure it out just now (it's been a while since I've used dataset/types searches and I am rusty on proper syntax). My idea was to run the search you conducted initially (with ANDEQUALS) and then search the passage list for figurative language. This could confirm or rule out that the missing hit is due to figurative language. But of course, perhaps you have done this already or are completely sure for other, equally valid, reasons.

Posts 1499
Forum MVP
Fr Devin Roza | Forum Activity | Replied: Tue, Aug 30 2016 2:56 AM

Francis:

Are there no other idiomatic expressions in the hits that are found? 

Apparently not according to the ESV Interlinear! 

Here is another example, Mt 1:18, the expression "to be with child". The phrase is classified as an idiomatic expression in the Interlinears (you can see this because it is in italics).

The Greek is ἐν γαστρὶ ἔχουσα. And, just like with our previous example, a search like <Lemma = lbs/el/γαστήρ> ANDEQUALS child does not work. Individual Greek words are not aligned with individual English words. 

However, searching for the individual Greek words aligned with the entire English idiomatic expression does work with ANDEQUALS!

<Lemma = lbs/el/γαστήρ> ANDEQUALS "to be with child"

<Lemma = lbs/el/ἐν> ANDEQUALS "to be with child"

<Lemma = lbs/el/ἔχω> ANDEQUALS "to be with child"

So, with phrases that the interlinears classify as idiomatic expressions, each Greek word (as well as the entire Greek phrase) is lined up with the entire English phrase. They all "begin and end" at the same place that the entire English phrase does, and thus ANDEQUALS only works with the entire English phrase. 

And so, with the previous example from Mt 21:42, the following search works:

<Lemma = lbs/el/κύριος> ANDEQUALS "the Lord's doing"

While the search we actually care about in real life does not:

<Lemma = lbs/el/κύριος> ANDEQUALS Lord

This also corresponds to the example in Mt 21:42 of "become", where multiple Greek words align with a single English word. The English word is what everything is aligned to, and so ANDEQUALS finds it, either with the entire Greek phrase or with any of its parts.

Conclusion: Grateful for INTERSECTS, which always returns the hoped-for results in all these cases!

Posts 13126
Forum MVP
Mark Barnes | Forum Activity | Replied: Tue, Aug 30 2016 3:51 AM

Fr Devin Roza:
"Requires hits for the two terms to be exactly at the same offset and length." Why do they work, when the idiomatic phrases do not?

It's the 'length' of the interlinear cells that matters, I think, not the length of the word(s).

Posts 1499
Forum MVP
Fr Devin Roza | Forum Activity | Replied: Tue, Aug 30 2016 5:49 AM

Mark Barnes:

Fr Devin Roza:
"Requires hits for the two terms to be exactly at the same offset and length." Why do they work, when the idiomatic phrases do not?

It's the 'length' of the interlinear cells that matters, I think, not the length of the word(s).

That's my conclusion as well, with the added nuance (new for me) that the Surface Text interlinear cell determines the smallest possible "length". So, when multiple Greek words are inside a single interlinear cell, each Greek word individually has the same "length" as the Surface Text cell. 

Posts 7906
LogosEmployee

Fr Devin Roza:
So, when multiple Greek words are inside a single interlinear cell, each Greek word individually has the same "length" as the Surface Text cell. 

This is correct, and is the explanation for the behaviour you're observing. Offsets/lengths for indexing are derived from the surface text.
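As a rough illustration of "offsets/lengths are derived from the surface text", here is a Python sketch of how such an index could be built. Everything in it is an assumption made for illustration: the cell list is a simplified, invented stand-in for the ESV interlinear at Mt 21:42, `build_index` and `and_equals` are hypothetical names, and the model does no stemming (so it looks up "Lord's" rather than "Lord").

```python
# Toy model only: a simplified, invented stand-in for interlinear cells.
# Each cell is (surface text, Greek lemmas placed in that cell).
CELLS = [
    ("become", ["γίνομαι", "εἰς"]),
    ("was the Lord's doing", ["παρά", "κύριος", "γίνομαι"]),
]


def build_index(cells):
    """Return (surface_index, greek_index), each mapping term -> set of (offset, length)."""
    surface_index = {}
    greek_index = {}
    offset = 0
    for surface, lemmas in cells:
        cell_span = (offset, len(surface))
        # The whole cell text is searchable as a phrase at the cell's span.
        surface_index.setdefault(surface, set()).add(cell_span)
        # Each English word keeps its own span inside the cell.
        pos = offset
        for word in surface.split(" "):
            surface_index.setdefault(word, set()).add((pos, len(word)))
            pos += len(word) + 1
        # Every Greek lemma in the cell inherits the span of the surface-text cell.
        for lemma in lemmas:
            greek_index.setdefault(lemma, set()).add(cell_span)
        offset += len(surface) + 1
    return surface_index, greek_index


def and_equals(greek_index, surface_index, lemma, english):
    """ANDEQUALS-style match: spans that are identical on both sides."""
    return greek_index.get(lemma, set()) & surface_index.get(english, set())


SURFACE, GREEK = build_index(CELLS)

if __name__ == "__main__":
    for lemma, english in [
        ("εἰς", "become"),
        ("γίνομαι", "become"),
        ("κύριος", "Lord's"),
        ("κύριος", "was the Lord's doing"),
    ]:
        print(lemma, "ANDEQUALS", repr(english), "->", and_equals(GREEK, SURFACE, lemma, english))
```

In this model either lemma matches "become" because both inherit the one-word cell's span, κύριος matches the complete idiom for the same reason, and κύριος against the single word inside the idiom returns nothing.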

Posts 24450
Forum MVP
Dave Hooton | Forum Activity | Replied: Mon, Sep 5 2016 3:23 PM

Bradley Grainger (Faithlife):

Probably a good rule of thumb is to always start with INTERSECTS by default, then consider switching to another operator if you're not getting the results you want, and if you understand why a different operator will give you different results?

Fr Devin Roza:
Can we think of INTERSECTS as replacing these two operators for all practical purposes?

I think so, yes.

That would be my position for reverse interlinears.

Dave
===

Windows & Android
