I remember that in L3, a word in the search YYY WITHIN 5 WORDS ZZZ was defined as 5 characters, or something like that. In L4, is it still defined this way, or by the grammatical definition of a word?
My guess would be that it hasn't changed.
While I can't answer the question, I believe something has changed. I remember Dave Hooton being excited that WITHIN, BEFORE, etc. were now counting "real" words (or something to that effect), which was an improvement over 3.0.
I remember Dave Hooton being excited about the fact that WITHIN, BEFORE, etc. were now counting "real" words
Yup, I am still excited. L4 counts grammatical words. You can try a simple test:
Logos 4 uses the Unicode word-breaking algorithm (http://unicode.org/reports/tr29/) to split text into words when indexing. (Putting it simplistically, it splits on spaces, but there are some special cases: colons between letters break a token into separate words, but colons between numbers don't, and the parts of a hyphenated phrase are treated as separate words. There is currently no CJKV support (e.g., bigram indexing); Asian languages typically require a dictionary for good word-breaking, and we haven't implemented that.) There are also some smarts to ignore footnote characters when counting words in the surface text, except when those superscripted characters are significant (think "P71").
In general, though, it should do what you expect, so the "WORDS" unit in Logos 4 searches counts actual words; it is no longer a character-based simulation.
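To make the "special cases" concrete, here is a toy sketch of the simplified rules described above (splitting on spaces and hyphens, breaking on a colon between letters but not between numbers). This is just an illustration under those stated assumptions, not Logos's actual implementation, which follows the full UAX #29 algorithm:

```python
import re

def simple_word_break(text):
    """Toy word-breaker: spaces and hyphens always break; a colon between
    letters breaks (e.g. a title like "re:subject"), but a colon between
    digits does not (e.g. the verse reference "3:16")."""
    words = []
    # Whitespace and hyphens are unconditional breaks.
    for token in re.split(r"[\s\-]+", text):
        if not token:
            continue
        # Split on a colon only when a letter sits on both sides of it.
        parts = re.split(r"(?<=[^\W\d_]):(?=[^\W\d_])", token)
        # Strip any leftover leading/trailing colons (e.g. "Note:").
        words.extend(p.strip(":") for p in parts if p.strip(":"))
    return words
```

For example, `simple_word_break("John 3:16 is well-known")` keeps "3:16" as one word but breaks "well-known" into two, matching the behavior described in the post.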
There is currently no CJKV support (e.g., bigram indexing); Asian languages typically require a dictionary for good word-breaking, and we haven't implemented that
RE: Japanese,
Is this for lack of a usable dictionary (Jim Breen's JDIC)? Is this on the future agenda?
Does Logos 4 search Hebrew by word-breaking? And is it only double-byte text that presents this problem in word-break searching?
I don't know what the roadmap is, but we worked closely with the Japan Bible Society to get the NIT etc. indexed correctly in LDLS3, so I expect a similar thing will happen in Logos 4.
Hebrew text is basically broken on spaces and punctuation (including maqqef and sof pasuq).
Double-byte text doesn't cause any problems in itself; it's the script being encoded that determines whether algorithmic word-breaking is feasible.
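The Hebrew rule above (break on spaces and punctuation, including maqqef and sof pasuq) can be sketched as a one-line splitter. This is an illustration of the described rule, not Logos's code; `HEBREW_BREAKS` and `split_hebrew` are names invented here:

```python
import re

# Maqqef (U+05BE) visually joins words, and sof pasuq (U+05C3) ends a
# verse; both act as word breaks alongside ordinary whitespace.
HEBREW_BREAKS = re.compile(r"[\s\u05BE\u05C3]+")

def split_hebrew(text):
    """Break Hebrew text on spaces, maqqef, and sof pasuq."""
    return [w for w in HEBREW_BREAKS.split(text) if w]
```

So a maqqef-joined pair like כָּל־הָאָרֶץ would index as two words, which is what you'd want for a WITHIN search.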
I am years behind on Unicode.
Can Logos not use the "Grapheme Cluster Boundaries" for searching? Must there be spaces? I thought Unicode Extended was supposed to solve all this.
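For what it's worth, grapheme cluster boundaries operate at the character level, not the word level, so they don't help with word search in space-less scripts. A small Python sketch (my own illustration, not anything from Logos) of the distinction:

```python
import unicodedata

# "e" + U+0301 COMBINING ACUTE ACCENT is two code points but one
# grapheme cluster -- the same user-perceived character as precomposed
# U+00E9. Grapheme boundaries tell you where *characters* begin and end;
# they say nothing about where *words* do. Word segmentation (UAX #29
# word boundaries) is a separate problem, and for scripts written
# without spaces it generally needs a dictionary.
decomposed = "e\u0301"
composed = "\u00E9"
assert unicodedata.normalize("NFC", decomposed) == composed
assert len(decomposed) == 2  # two code points, one grapheme cluster
```

That's why a dictionary (or bigram indexing) is still needed for Japanese even with full Unicode support.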
I was very excited to see the Greek/Japanese Interlinear in Logos. I hope someday to see it run in Version 4.