Do proximity searches use the exact number of words?

Page 1 of 1 (10 items)
This post has 9 Replies | 0 Followers

Posts 725
Harry Hahne | Posted: Wed, Sep 22 2010 4:21 PM

In Logos 3, proximity searches (BEFORE, WITHIN, etc.) estimated "words" from character positions. The Logos 3 help says:

Terms in Libronix DLS resources are only indexed by character positions; actual word counts are not stored in the index. For this reason, word proximity is simulated at search time by using an average word length based on the language of the text being searched. For English, the average word length is assumed to be six characters plus an extra allowance for spaces, punctuation, and other potential intervening characters.
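As a rough illustration of the Logos 3 approach the help text describes, here is a minimal Python sketch that turns the character gap between two hits into an estimated word count. The help gives six characters per English word "plus an extra allowance" but does not say how large the allowance is, so `ALLOWANCE = 1` is a hypothetical value. Measuring from each hit's starting character offset is also an assumption, and the function names are made up for illustration.

```python
import math

# Assumed values for illustration only. The help text says six characters per
# English word "plus an extra allowance"; the allowance size is not stated.
AVG_WORD_LEN = 6
ALLOWANCE = 1  # hypothetical: one space/punctuation character per word


def estimated_word_gap(char_offset_a: int, char_offset_b: int) -> int:
    """Estimate how many 'words' apart two hits are from character offsets alone."""
    chars_per_word = AVG_WORD_LEN + ALLOWANCE
    return math.ceil(abs(char_offset_b - char_offset_a) / chars_per_word)


def within_words_simulated(char_offset_a: int, char_offset_b: int, n: int) -> bool:
    """Simulated WITHIN n WORDS: compares the estimate, not a real word count."""
    return estimated_word_gap(char_offset_a, char_offset_b) <= n


# One long intervening word inflates the estimate.
text = "the antidisestablishmentarianism cat"
a, b = text.find("the"), text.find("cat")
print(estimated_word_gap(a, b))  # 5, although "cat" is only 2 real words after "the"
```

Under these assumptions a simulated WITHIN 2 WORDS would miss that pair, which is the kind of discrepancy an exact word count avoids.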

Does anyone know whether this is still true in Logos 4, or do proximity searches in Logos 4 actually measure the proximity between words in actual words?

Posts 19689
Rosie Perera | Replied: Wed, Sep 22 2010 4:44 PM

I'm afraid I don't know. We'll probably need a Logos developer to answer this. But if I were implementing this, I'd probably do it the same way it was done in L3. I would think that a "word" proximity search would be relatively infrequent and wouldn't warrant the extra data storage needed to compute it exactly. You'd have to store not only the character offset but also the word position for every single occurrence of every single word in every single resource. It would also slow down indexing.
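To make the storage trade-off concrete, here is a toy inverted index in Python in which each occurrence records both a character offset and a word index. It is not how Logos actually stores anything: the `Posting` and `build_index` names and the rule that a word is a run of `\w` characters are assumptions for illustration. The `word_index` field is the extra per-occurrence value an exact word count would need.

```python
import re
from collections import defaultdict
from typing import NamedTuple


class Posting(NamedTuple):
    char_offset: int  # what the Logos 3 help says was stored
    word_index: int   # the extra per-occurrence value exact word proximity needs


def build_index(text: str) -> dict[str, list[Posting]]:
    """Toy inverted index: one Posting per occurrence of each word."""
    index: dict[str, list[Posting]] = defaultdict(list)
    # Assumption: a "word" is a run of \w characters; punctuation is skipped.
    for word_index, match in enumerate(re.finditer(r"\w+", text)):
        index[match.group().lower()].append(Posting(match.start(), word_index))
    return dict(index)


idx = build_index("the cat and the bag")
print(idx["the"])  # [Posting(char_offset=0, word_index=0), Posting(char_offset=12, word_index=3)]
```

Every occurrence carries one more integer, which is the cost Rosie describes; with word indices stored, word distance is a simple subtraction.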

Posts 13428
Mark Barnes | Replied: Wed, Sep 22 2010 4:48 PM

Given the fact that antidisestablishmentarianism WITHIN 3 WORDS nothing correctly returns "In fact, with the exception of “antidisestablishmentarianism,” which has nothing to do with theology, these are the longest and most confusing words I know…", I would say it does use the exact number of words.

This is my personal Faithlife account. On 1 March 2022, I started working for Faithlife, and have a new 'official' user account. Posts on this account shouldn't be taken as official Faithlife views!

Posts 27974
Forum MVP
Dave Hooton | Replied: Wed, Sep 22 2010 6:27 PM

Harry Hahne:
do proximity searches in Logos 4 actually measure the proximity between words as actual words?

Yes. That was clarified here! To be clear, though:

"the cat and the bag" ==> "bag" is 4 words from "the".

So proximity is distance (4 words) rather than separation (3 words).
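Dave's distinction can be stated as arithmetic on word positions. The short Python sketch below is an illustration under simple assumptions (whitespace tokenizing, hypothetical function names): distance is the difference of positions, separation is the number of words strictly between them, and taking the nearest "the" instead of the first gives a distance of 1.

```python
def word_positions(text: str, term: str) -> list[int]:
    """Word indices at which `term` occurs (simple whitespace tokenizing)."""
    return [i for i, w in enumerate(text.split()) if w == term]


def distance(i: int, j: int) -> int:
    """Proximity as Dave describes it: the difference of word positions."""
    return abs(j - i)


def separation(i: int, j: int) -> int:
    """Number of words strictly between the two positions."""
    return max(abs(j - i) - 1, 0)


text = "the cat and the bag"
the_pos = word_positions(text, "the")   # [0, 3]
bag = word_positions(text, "bag")[0]    # 4
print(distance(the_pos[0], bag), separation(the_pos[0], bag))  # 4 3
print(min(distance(t, bag) for t in the_pos))                   # 1
```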

Dave
===

Windows 11 & Android 8

Posts 19689
Rosie Perera | Replied: Wed, Sep 22 2010 6:37 PM

Mark Barnes:

Given the fact that antidisestablishmentarianism WITHIN 3 WORDS nothing correctly returns "In fact, with the exception of “antidisestablishmentarianism,” which has nothing to do with theology, these are the longest and most confusing words I know…", I would say it does use the exact number of words.

I would have thought you'd need to test it where antidisestablishmentarianism is one of the words in between, not one of the endpoints. That occurrence of antidisestablishmentarianism you found is still within about 3 x 6 letters of the word "nothing", so the example doesn't prove anything. However, I think you are right, because "Church of England" WITHIN 4 WORDS adjective finds

But "Church of England" WITHIN 3 WORDS adjective doesn't find it, which answers another question I've had about what "WITHIN" means.

antidisestablishmentarianism /ˌantɪdɪsɪˌstablɪʃm(ə)nˈtɛːrɪənɪz(ə)m/

noun rare opposition to the disestablishment of the Church of England.
derivatives antidisestablishmentarian noun & adjective
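The "Church of England" results can be checked the same way against the dictionary entry quoted above. This Python sketch counts only runs of letters as words (so "." and "&" drop out) and measures from the phrase's last word, "England"; both are assumptions, chosen because they reproduce what the thread reports (found at 4, not at 3). The helper names are hypothetical.

```python
import re

ENTRY = ("noun rare opposition to the disestablishment of the Church of England. "
         "derivatives antidisestablishmentarian noun & adjective")
PHRASE = ["Church", "of", "England"]


def tokens(text: str) -> list[str]:
    # Assumption: only runs of letters count as words, so "." and "&" vanish.
    return re.findall(r"[A-Za-z]+", text)


def phrase_start(toks: list[str], phrase: list[str]) -> int:
    """Word index where `phrase` begins, or -1 if it does not occur."""
    n = len(phrase)
    for i in range(len(toks) - n + 1):
        if toks[i:i + n] == phrase:
            return i
    return -1


toks = tokens(ENTRY)
start = phrase_start(toks, PHRASE)
end = start + len(PHRASE) - 1  # position of "England"
adj = toks.index("adjective")
print(adj - end)    # 4: found by WITHIN 4 WORDS, missed by WITHIN 3 WORDS
print(adj - start)  # 6: the distance if measured from "Church" instead
```

If "&" were counted as a word, "England" to "adjective" would be 5 and WITHIN 4 WORDS would miss it.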

Posts 27974
Forum MVP
Dave Hooton | Replied: Wed, Sep 22 2010 7:32 PM

Rosie Perera:
That occurrence of antidisestablishmentarianism you found is still within about 3 x 6 letters of the word nothing, so the example doesn't prove anything.

antidisestablishmentarianism WITHIN 3-3 WORDS nothing will establish that proximity is exactly 3 words.
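A range such as 3-3 can be read as a closed interval on word distance. The Python sketch below assumes that reading, treats a plain WITHIN n WORDS as the range 0 to n, and counts only runs of letters as words; all three are assumptions, not documented Logos behavior, and the function names are hypothetical. Applied to Mark's sentence it gives a distance of 3, consistent with the results reported in this thread (WITHIN 2 WORDS finds nothing, WITHIN 3 WORDS finds the sentence).

```python
import re
from typing import Optional

MARK = ("In fact, with the exception of “antidisestablishmentarianism,” which has "
        "nothing to do with theology, these are the longest and most confusing "
        "words I know…")


def word_distance(text: str, a: str, b: str) -> Optional[int]:
    """Smallest difference in word positions between terms a and b, or None."""
    # Assumption: a word is a run of ASCII letters, so commas, quotes and "…" are skipped.
    toks = [w.lower() for w in re.findall(r"[A-Za-z]+", text)]
    pos_a = [i for i, w in enumerate(toks) if w == a]
    pos_b = [i for i, w in enumerate(toks) if w == b]
    return min((abs(i - j) for i in pos_a for j in pos_b), default=None)


def within(d: int, lo: int, hi: int) -> bool:
    """Hypothetical reading of WITHIN lo-hi WORDS as lo <= d <= hi."""
    return lo <= d <= hi


d = word_distance(MARK, "antidisestablishmentarianism", "nothing")
print(d)                # 3
print(within(d, 0, 2))  # False: like WITHIN 2 WORDS
print(within(d, 3, 3))  # True: like WITHIN 3-3 WORDS
```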

Rosie Perera:

"Church of England" WITHIN 4 WORDS adjective finds

But "Church of England" WITHIN 3 WORDS adjective doesn't find it,


"Church of England" WITHIN 4-4 WORDS adjective should establish that  - and &  are disregarded (not words).

Dave

Posts 725
Harry Hahne | Replied: Wed, Sep 22 2010 9:55 PM

Dave Hooton:

Thanks for the information. It is so good to hear that they no longer use the character-based approach. I wish this type of thing was clearly documented.

Dave Hooton:

"the cat and the bag" ==> "bag" is 4 words from "the".

So proximity is distance (4 words) rather than separation (3 words).

This is a very important distinction.

Posts 19689
Rosie Perera | Replied: Wed, Sep 22 2010 10:04 PM

Dave Hooton:
"the cat and the bag"   ==>   "bag" is 4 words from "the"

Actually "bag" is only 1 word from "the" :P

Posts 27974
Forum MVP
Dave Hooton | Replied: Wed, Sep 22 2010 10:39 PM

Rosie Perera:
Actually "bag" is only 1 word from "the" :P

Don't let the cat out of (the) bag :(

Dave

Posts 3004
Forum MVP
Jacob Hantla | Replied: Wed, Sep 22 2010 11:24 PM

Mark Barnes:

Given the fact that antidisestablishmentarianism WITHIN 3 WORDS nothing correctly returns "In fact, with the exception of “antidisestablishmentarianism,” which has nothing to do with theology, these are the longest and most confusing words I know…", I would say it does use the exact number of words.

Yup, because antidisestablishmentarianism WITHIN 2 WORDS nothing returns 0 results, whereas antidisestablishmentarianism WITHIN 3 WORDS nothing returns 1.

Jacob Hantla
Pastor/Elder, Grace Bible Church
gbcaz.org
