
Posts 30
Kent | Forum Activity | Posted: Wed, Jul 6 2016 2:52 AM

I have sometimes felt that Bible search in the CUV (both the Shen and Shangti versions) is wrong, and I have now found a good example that shows it. The results seem totally wrong and useless. Could someone have a look and fix it? Thank you.

1. Search for "瞎了眼" ("blinded in the eyes"): I get 4 results.

2. Search for "瞎了" ("went blind"): I get another 4 results.

3. Search for "瞎" ("blind"): I get only 2 results.
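
To spell out why these counts look wrong: every verse that contains 瞎了眼 also contains 瞎了 and 瞎, so under plain substring matching the set of hits can only grow as the query gets shorter. Here is a minimal Python sketch of that sanity check. The verse texts and the names `substring_hits` and `is_monotone` are made up for illustration; this is not a description of how Logos searches internally.

```python
def substring_hits(verses, query):
    """Return the references of verses whose text contains query literally."""
    return {ref for ref, text in verses.items() if query in text}


def is_monotone(verses, queries):
    """Check that each shorter query finds everything the longer one found.

    queries must be ordered longest first, each one a substring of the
    query before it.
    """
    results = [substring_hits(verses, q) for q in queries]
    return all(longer <= shorter for longer, shorter in zip(results, results[1:]))


# Hypothetical sample texts, not real verse data.
VERSES = {
    "v1": "他瞎了眼",
    "v2": "那人瞎了",
    "v3": "瞎子看見",
}

for q in ["瞎了眼", "瞎了", "瞎"]:
    print(q, sorted(substring_hits(VERSES, q)))

print(is_monotone(VERSES, ["瞎了眼", "瞎了", "瞎"]))  # True
```

A search that returns 4, 4 and then 2 hits for these three queries cannot pass this check, which is the symptom reported above.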

Posts 30
Kent | Forum Activity | Replied: Wed, Jul 6 2016 2:54 AM

I forgot to mention my version. It's Logos 6.12 SR-2.

Posts 9060
LogosEmployee

We have a case in our bug tracker for fixing problems searching Chinese text. I've added a link to this thread to the case.

Posts 1002
LimJK | Forum Activity | Replied: Sun, Nov 6 2016 6:00 PM

Bradley,

Adding another case for the search reliability issue:

I am trying to search for all occurrences of "国" (kingdom). For illustration I am limiting the search to Daniel chapter 2. The words I have circled in red are occurrences that the search missed.

By the way, when I try to select the single character "国" in the CUV text, a neighboring character always gets selected with it, e.g. "国度", "一国", etc. I am pointing this out because I think it may be related to the search issue (just a guess).

JK

MacBookPro Retina 15" Late 2013 2.6GHz RAM:16GB SSD:500GB macOS Sierra 10.12.3 | iPhone 7 Plus iOS 10.2.1

Posts 9060
LogosEmployee

We will be releasing some improvements for Chinese search in the next version of the program (7.2, currently in beta). See the release notes here and the screenshot below: https://wiki.logos.com/Logos_7.2_Beta_3 

Posts 504
Leo Wee Fah | Forum Activity | Replied: Sun, Nov 6 2016 6:46 PM

Yes

Posts 1002
LimJK | Forum Activity | Replied: Mon, Nov 7 2016 2:14 AM

Bradley,

Thanks for the heads up on beta ... look forward to that.

While I am at it, I noticed that text in the "notes" of the CUV Bible is not searched the way it is in the English Bibles. As an example, I searched for "定命" ("command" in English) and got no hits.

Thanks!

JK


Posts 9060
LogosEmployee

LimJK:
While I am at it, I noticed that text in the "notes" of the CUV Bible is not searched the way it is in the English Bibles.

Unless I'm misunderstanding, that behaviour is what I would expect, and matches English Bibles. A "Bible" search will not find results in translators' notes, but they can be found with a "Basic" search.

Posts 1002
LimJK | Forum Activity | Replied: Mon, Nov 7 2016 8:32 AM

Oops ... sorry,

I meant Basic search. See below: I searched Bible Text, Footnote Text and Translator's Note.

  • I have "Bible Text" selected just to show that the search is a valid search
  • Turning on "Footnote" and "Translator's note" yields no result; specifically, it misses the "定命" in the footnote on Dan 2:5.

JK


Posts 9060
LogosEmployee

I think this should also be fixed with the new search code.

Posts 1002
LimJK | Forum Activity | Replied: Mon, Nov 7 2016 8:57 AM

I suppose I should test this again after 7.2 is released. Just to complete the various permutations, so that your tester(s) can test these. Thanks.

  • First test: "定" was not picked up in the notes of Dan 2:5
  • Second test: "命" picks up Dan 2:5, but "定命" is missed; it sounds like 7.2 should resolve this
  • Third test: "定" AND "命" is missed
  • Fourth test: "定" WITHIN 2 WORDS "命" is missed

JK


Posts 1002
LimJK | Forum Activity | Replied: Mon, Nov 7 2016 8:58 AM

Bradley,

Thanks

JK


Posts 1509
PL | Forum Activity | Replied: Mon, Nov 7 2016 9:51 AM

Hi Bradley,

Let me know if I should move this to the Beta forum.

A search query consisting of consecutive Han characters without spaces is treated as an AND search for all of the characters, regardless of order, with the added consideration that any consecutive pairs will be merged into a single search hit. E.g. if A, B, and C are Han characters, then the search query ABC will return hits in articles or verses that contain A, B, AND C in any order, including consecutively. To restrict the result to just ABC you could specify the query as a phrase search “ABC”, although for ranked search results you should use ABC INTERSECTS “ABC”. This causes the search to use the better rankings of the search results for ABC, while limiting the results to the same set as the phrase search.
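
To make the three query forms concrete, here is a rough Python sketch of how I read the description above. It only models which documents match; ranking and hit merging are left out, and the function names and sample documents are my own assumptions rather than anything from the Logos code.

```python
def and_search(docs, query):
    """Bare ABC: match documents containing every character, in any order."""
    chars = set(query)
    return [ref for ref, text in docs.items() if chars <= set(text)]


def phrase_search(docs, query):
    """Quoted "ABC": match documents containing the exact character run."""
    return [ref for ref, text in docs.items() if query in text]


def intersects(docs, query):
    """ABC INTERSECTS "ABC": keep the order of the AND results, but only
    for documents that also satisfy the phrase search."""
    phrase = set(phrase_search(docs, query))
    return [ref for ref in and_search(docs, query) if ref in phrase]


# Hypothetical documents; 甲, 乙, 丙 stand in for the characters A, B, C.
DOCS = {
    "d1": "甲乙丙丁",  # contains the run 甲乙丙
    "d2": "丙乙甲",    # same characters, different order
    "d3": "甲乙",      # missing 丙
}

print(and_search(DOCS, "甲乙丙"))     # ['d1', 'd2']
print(phrase_search(DOCS, "甲乙丙"))  # ['d1']
print(intersects(DOCS, "甲乙丙"))     # ['d1']
```

In this toy model the bare query matches d2 even though the characters appear in reverse order, which is the behavior discussed next.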

This new search behavior in Beta is extremely confusing for Chinese users. It swings the pendulum from missing a lot of hits to returning too many hits. E.g. A search for 和平 ("peace") also returns verses with 和平安祭 ("and peace offering"). These results are confusing and misleading.

I think as a start, for Chinese searches you should at least do the same as Korean - default the search for ABC as ABC INTERSECTS "ABC".  A search for ABC XYZ should be interpreted as (ABC INTERSECTS "ABC") AND (XYZ INTERSECTS "XYZ").
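
As a sketch of the rewriting I am proposing, assuming the query is only split on whitespace: the helper name `rewrite_query` is hypothetical, and it does not handle terms that are already quoted or queries that already contain operators.

```python
def rewrite_query(query):
    """Rewrite each space-separated term T as: T INTERSECTS "T".

    Multiple terms are parenthesized and joined with AND.
    """
    parts = [f'{term} INTERSECTS "{term}"' for term in query.split()]
    if len(parts) == 1:
        return parts[0]
    return " AND ".join(f"({part})" for part in parts)


print(rewrite_query("ABC"))
# ABC INTERSECTS "ABC"
print(rewrite_query("ABC XYZ"))
# (ABC INTERSECTS "ABC") AND (XYZ INTERSECTS "XYZ")
```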

In addition, for the indexer to blindly index any two consecutive characters is a flawed approach, for two reasons:

1. Many Chinese terms (especially transliterated names) contain 3, 4, 5 characters.

2. Two consecutive characters may not necessarily imply they are related as one unit of meaning (as seen in the above 和平安祭 example).

You almost need to have a dictionary of multi-character terms (耶和華,和平,平安祭) for the indexer to know which consecutive characters form legitimate units of meaning, and which do not, in the absence of spaces in CJK languages.

Many Chinese input methods (輸入法) provide such a "dictionary" (詞典) of multi-character phrases, but I'm not sure if they can be leveraged for this purpose (technically and legally). I know Sougou Input Method 搜狗輸入法 has many user-created dictionaries 細胞詞典 and many users have created Christianity and Bible-related ones for other users to use.

My few cents. Thanks for trying to solve this hard problem with CJK languages. Prior to this I have avoided using Logos for Chinese Bible searches all these years because I know the search results are completely unreliable. Now with the Chinese Bronze version pending, this problem comes to the forefront.

Thanks,

Peter

Posts 89
LogosEmployee
Lawrence Rafferty | Forum Activity | Replied: Mon, Nov 7 2016 12:19 PM

PL:

This new search behavior in Beta is extremely confusing for Chinese users. It swings the pendulum from missing a lot of hits to returning too many hits. E.g. A search for 和平 ("peace") also returns verses with 和平安祭 ("and peace offering"). These results are confusing and misleading.

I think as a start, for Chinese searches you should at least do the same as Korean - default the search for ABC as ABC INTERSECTS "ABC".  A search for ABC XYZ should be interpreted as (ABC INTERSECTS "ABC") AND (XYZ INTERSECTS "XYZ").

Thank you for this feedback. If we change the search behavior to match Korean, you will also be forced to put spaces in the search query where you want to explicitly allow the terms to be treated separately. I take your feedback to mean that you would find this preferable: that you would rather be forced to always put spaces between words in your search query, for the benefit of not also getting unhelpful hits for every character in the search query. If I am mistaken please let me know.

PL:

In addition, for the indexer to blindly index any two consecutive characters is a flawed approach, for two reasons:

1. Many Chinese terms (especially transliterated names) contain 3, 4, 5 characters.

2. Two consecutive characters may not necessarily imply they are related as one unit of meaning (as seen in the above 和平安祭 example).

I think you misunderstand the reason for indexing every two consecutive characters. We index every *overlapping* two consecutive characters so we can merge the hits together into longer hits. That means we can find any one or two character word because it is in the search index, and we can find any longer word by merging the hits for the overlapping bigrams.
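
A simplified Python sketch of the merging idea, for illustration only: it indexes one string by character offset and chains overlapping bigram hits to locate a longer term. The function names and the sample text are assumptions, and the real index is of course more involved than this.

```python
from collections import defaultdict


def build_bigram_index(text):
    """Map every overlapping two-character window to its start offsets."""
    index = defaultdict(list)
    for i in range(len(text) - 1):
        index[text[i:i + 2]].append(i)
    return index


def find_by_merging(index, term):
    """Return start offsets of term by chaining overlapping bigram hits."""
    if len(term) < 2:
        raise ValueError("this sketch only handles terms of 2+ characters")
    bigrams = [term[k:k + 2] for k in range(len(term) - 1)]
    starts = set(index.get(bigrams[0], []))
    for k, bigram in enumerate(bigrams[1:], start=1):
        positions = set(index.get(bigram, []))
        # The k-th bigram of the term must begin k characters after the start.
        starts = {s for s in starts if s + k in positions}
    return sorted(starts)


# Hypothetical sample text, not a verse quotation.
index = build_bigram_index("獻和平安祭給耶和華")

print(find_by_merging(index, "和平"))    # [1]
print(find_by_merging(index, "平安祭"))  # [2]
print(find_by_merging(index, "耶和華"))  # [6]
```

The three-character terms are found even though only two-character windows were indexed, because the hits for 平安 and 安祭 (or 耶和 and 和華) sit at adjacent offsets and can be merged.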

PL:

You almost need to have a dictionary of multi-character terms (耶和華,和平,平安祭) for the indexer to know which consecutive characters form legitimate units of meaning, and which do not, in the absence of spaces in CJK languages.

Many Chinese input methods (輸入法) provide such a "dictionary" (詞典) of multi-character phrases, but I'm not sure if they can be leveraged for this purpose (technically and legally). I know Sougou Input Method 搜狗輸入法 has many user-created dictionaries 細胞詞典 and many users have created Christianity and Bible-related ones for other users to use.



The prior indexer was using ICU's word breaker, which uses a dictionary, which has its own set of problems.
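
As one illustration of the kind of problem a dictionary can introduce, here is a sketch of greedy forward maximum matching, a simple textbook segmenter. To be clear, this is not ICU's algorithm; it is only a stand-in showing that a dictionary does not settle ambiguous word breaks by itself. The function name is hypothetical and the dictionary is just the three terms mentioned above.

```python
def forward_max_match(text, dictionary):
    """Greedy segmentation: at each position take the longest dictionary
    word that matches, falling back to a single character."""
    max_len = max(map(len, dictionary), default=1)
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


DICTIONARY = {"耶和華", "和平", "平安祭"}

print(forward_max_match("耶和華的", DICTIONARY))  # ['耶和華', '的']
print(forward_max_match("和平安祭", DICTIONARY))  # ['和平', '安', '祭']
```

The second result picks 和平 first and so never sees 平安祭, even though both words are in the dictionary. A search for 平安祭 against words segmented this way would miss the occurrence.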

Our new approach is based on the scholarly research and software practices for CJK information retrieval. The options are basically to index unigrams, bigrams, or dictionary words, or some combination thereof, and the query parser must be coded to match the indexing method.

Our approach of indexing unigrams and overlapping bigrams solves the problem of unknowable word breaks by abandoning any attempt to predetermine them. Rather, we make sure we can find any string of characters the user wishes to find, in any combination. The main drawback to this approach is the loss of meaningful word based proximity searches, but that seems worth sacrificing for actually being able to find every occurrence of a given search term.
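
For a concrete picture of what gets indexed, here is a minimal sketch that lists the unigrams and overlapping bigrams for a run of Han characters. The function name `ngram_tokens` is made up for this post, and the sketch assumes the input is already a single run of Han characters with no spaces or punctuation.

```python
def ngram_tokens(run):
    """Return all unigrams followed by all overlapping bigrams of run."""
    unigrams = list(run)
    bigrams = [run[i:i + 2] for i in range(len(run) - 1)]
    return unigrams + bigrams


print(ngram_tokens("和平安祭"))
# ['和', '平', '安', '祭', '和平', '平安', '安祭']
```

Both 和平 and 平安 appear among the tokens, so no decision about where the word break falls is made at indexing time.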

I'd like to thank you again for your thoughtful feedback and assure you we will investigate changing the parsing to be more like Korean as you suggested.

Posts 1509
PL | Forum Activity | Replied: Mon, Nov 7 2016 1:12 PM

Hi Lawrence,

Thanks for your prompt response and for your work on behalf of CJK users. I deeply appreciate Logos paying attention and investing time and resources in this longstanding search reliability issue.

Before you make the change, I think we should hear the preference from other users first.

I spent some time playing with the Beta again, and it actually is more usable than I first thought:

- In most cases I can just do "ABC" instead of ABC INTERSECTS "ABC" if I don't care about ranking.

- I can easily solve the problem of 和平 finding 和平安祭 by searching for “和平” -平安祭 (the search engine even recognizes the curly quotation marks that my Chinese input method uses by default, which is very cool).

- Searching for 耶穌福音 (Jesus Gospel) finds verses with 耶穌 and/or 福音 in whichever order, which is fine. (With the proposed change, users will have to search for 耶穌 福音, which I'm not sure everyone is OK with. To find the exact string in that exact order, users will have to use quotation marks.)

- Searching for 和平 without quotation marks will find MANY extraneous verses with 和 (and) and 平 (flat) in whichever order, which is confusing and seems wrong, but this can be easily solved by using quotation marks, and it only happens with bigrams or trigrams where each of the characters is also a common word that can be used in totally different contexts. Such phrases may be rarer than I originally thought. I also tried 主的恩 (Lord's grace), which has the same issue, but again putting quotation marks around it solves it.

I also tried similar searches in other Chinese Bible search iOS apps and engines (e.g. Bible Gateway) and compared the results. It looks like 和平 trips up almost all of them.
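
Here is a small sketch of how I understand the “和平” -平安祭 query from the second point above: keep verses that contain the exact phrase and drop any verse that contains the excluded term. The verse-level exclusion is my assumption about how the minus sign behaves, and the function name and sample texts are made up.

```python
def search_phrase_excluding(verses, phrase, excluded):
    """Match verses containing phrase but not excluded.

    Straight and curly double quotes around the phrase are stripped.
    """
    phrase = phrase.strip('"“”')
    return [
        ref for ref, text in verses.items()
        if phrase in text and excluded not in text
    ]


# Hypothetical sample texts, not real verse data.
VERSES = {
    "v1": "我們追求和平",
    "v2": "獻燔祭和平安祭",
    "v3": "和平的君",
}

print(search_phrase_excluding(VERSES, "“和平”", "平安祭"))  # ['v1', 'v3']
```

One caveat under this assumption: a verse that contains both a genuine 和平 and 平安祭 would also be dropped.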

Other CJK users, please chime in?

Thanks again!

Peter

Posts 306
LogosEmployee
Philip Peng | Forum Activity | Replied: Mon, Nov 7 2016 5:38 PM

PL:

- Searching for 耶穌福音 (Jesus Gospel) finds verses with 耶穌 and/or 福音 in whichever order, which is fine. (With the proposed change, users will have to search for 耶穌 福音, which I'm not sure everyone is OK with. To find the exact string in that exact order, users will have to use quotation marks.)

Hi, Peter,

Lawrence and I have worked on this for some time, and we think the Korean style of putting a space between 耶穌 and 福音 might NOT be OK with other CJK users, since there is usually no space between Chinese characters in typing or writing. Therefore, Lawrence and I decided to use quotation marks for searching for an exact Chinese phrase. We plan to tell users in our instructions to use quotation marks for exact-phrase search results. However, we welcome suggestions and input from other Chinese users on your proposal. As Lawrence said, we will take all the suggestions into consideration in order to solve this Chinese search issue for the majority of Chinese users.

I very much appreciate your time and effort in helping us improve the Logos Bible Software user experience. We strive to give our Chinese users the best Bible software in the world. Thank you once again!

Best regards,

Philip

Posts 1509
PL | Forum Activity | Replied: Mon, Nov 7 2016 5:44 PM

Thank you Philip and Lawrence (and Logos!)  Either implementation will be acceptable to me. By using quotation marks, your search results are already better than the other apps/websites I've tested.

I'm eagerly looking forward to the Bronze package for a full Chinese Bible software experience!

Thank YOU for all of your diligence, professionalism, and service to the global Church!

Peter

Posts 306
LogosEmployee
Philip Peng | Forum Activity | Replied: Mon, Nov 7 2016 7:48 PM

PL:

Thank you Philip and Lawrence (and Logos!)  Either implementation will be acceptable to me. By using quotation marks, your search results are already better than the other apps/websites I've tested.

I'm eagerly looking forward to the Bronze package for a full Chinese Bible software experience!

Thank YOU for all of your diligence, professionalism, and service to the global Church!

Peter

Hi, Peter,

You are welcome.  Your encouragement and loyal support of Logos make our jobs more meaningful.  Thanks again for your contributions.

Best regards,

Philip

Posts 1002
LimJK | Forum Activity | Replied: Tue, Nov 8 2016 12:12 AM

Philip,

I have not participated in the beta for a long time. After hearing about 7.2 from Bradley above, I finally installed 7.2 RC1.

Chinese search is finally working now :)

See my input in the following post in the beta forum, about two words being searched as one string when I initiate the search from the CUV text.

http://community.logos.com/forums/p/132973/864140.aspx#864140 

JK


Posts 1002
LimJK | Forum Activity | Replied: Sun, Nov 13 2016 9:03 PM

Hi,

Can someone with a better command of the Chinese language chip in on this, so that Logos can consider fixing it before releasing 7.2? Chinese is my second language :)

http://community.logos.com/forums/t/133070.aspx 

JK

