SEARCH Punctuation question for Bradley, Dave or other super search nerd.

MJ. Smith
MJ. Smith MVP Posts: 53,857
edited November 2024 in English Forum

I just discovered that my second rule of building a search is wrong. It was my understanding that one could not use punctuation in a search term e.g. here's would be treated as heres. It was also my understanding that FileFormat.Info was our "official" super-nerd site to identify Unicode characters that are considered "punctuation".

I was wrong. "aint" and "ain't" produce different results - the apostrophe being treated as a legit letter although defined as punctuation at https://www.fileformat.info/info/unicode/category/Po/list.htm. What is the actual treatment of punctuation?

Orthodox Bishop Alfeyev: "To be a theologian means to have experience of a personal encounter with God through prayer and worship."; Orthodox proverb: "We know where the Church is, we do not know where it is not."

Comments

  • NB.Mick
    NB.Mick MVP Posts: 16,049

    MJ. Smith said:

    I just discovered that my second rule of building a search is wrong. It was my understanding that one could not use punctuation in a search term e.g. here's would be treated as heres. It was also my understanding that FileFormat.Info was our "official" super-nerd site to identify Unicode characters that are considered "punctuation".

    I was wrong. "aint" and "ain't" produce different results - the apostrophe being treated as a legit letter although defined as punctuation at https://www.fileformat.info/info/unicode/category/Po/list.htm. What is the actual treatment of punctuation?

    Not a super search nerd (I have never heard of your file reference), but I think it was Bradley who recently corrected my mistaken assumption that punctuation is simply left out: I was informed that this is only true between words, not within words. The apostrophe within "ain't" should therefore make a difference. I'll look up the thread later.

    Have joy in the Lord! Smile

  • MJ. Smith
    MJ. Smith MVP Posts: 53,857

    Thanks - that matches what I'm seeing but doesn't match some earlier answers in the forums.

  • NB.Mick
    NB.Mick MVP Posts: 16,049

    MJ. Smith said:

    Thanks - that matches what I'm seeing but doesn't match some earlier answers in the forums.

    The discussion was in a Faithlife Search group thread about a bug with multiple "OR"s (fixed in 8.13). The search there was for what in the end turned out to be "Beni Na'im", "Beni Naʿim", "Beni Na im", "Beni Naim". You can use OR instead of the commas now that the bug is fixed, but different-style apostrophes are different from each other, different from a space, and different from simply leaving them out, regardless of the match-all-forms setting. Bradley commented on my cited side remark as follows:

    "Intra-word punctuation has always been indexed."

    Actually that wasn't what I had remembered, but maybe there just was no need to discuss intra-word punctuation before.

  • Dave Hooton
    Dave Hooton MVP Posts: 35,878

    NB.Mick said:

    "Intra-word punctuation has always been indexed."

    The classic case is a search for God with Match all word Forms turned on, which returns "God's" as a result (as well as "gods"). To search for "God's" only, you have to enter God's and turn off Match all word Forms.

    Hyphenated words are also indexed, e.g. intra-text, and you can also find them with "intra text" (quotes included).

    Dave
    ===

    Windows 11 & Android 13

  • Keep Smiling 4 Jesus :)
    Keep Smiling 4 Jesus :) MVP Posts: 23,136

    To search for "God's" only, you have to enter God's, and turn off Match all word Forms.

    Alternative (with Match all word Forms checked) using an advanced search directive is:

    [match exact] God's

    [match all] God's

    Keep Smiling :)
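
For anyone who wants to see the rule from this thread in concrete form, here is a minimal Python sketch. It is only an illustration of the behaviour described above (punctuation between words is dropped, punctuation inside a word is kept, and different apostrophe characters stay distinct). It is not the actual Logos indexer; the names `INTRA_WORD`, `TOKEN_RE` and `tokenize` are made up for this example, and the choice of which characters count as intra-word punctuation is an assumption.

```python
import re

# Assumption for illustration: these characters are kept when they sit
# between two word characters (straight apostrophe, curly apostrophe, hyphen).
INTRA_WORD = "'\u2019-"

# A token is a run of word characters, optionally continued by one
# intra-word punctuation character followed by more word characters.
TOKEN_RE = re.compile(r"\w+(?:[" + re.escape(INTRA_WORD) + r"]\w+)*")


def tokenize(text):
    """Split text into lower-cased search terms.

    Punctuation between words is discarded; punctuation inside a word
    is preserved, so "ain't" and "aint" become different terms.
    """
    return [m.group(0).lower() for m in TOKEN_RE.finditer(text)]


if __name__ == "__main__":
    for sample in ["Ain't it so?", "aint",
                   "Beni Na'im", "Beni Na\u2019im", "Beni Na im", "Beni Naim"]:
        print(repr(sample), "->", tokenize(sample))
```

Under these assumptions the four "Beni ..." spellings each produce a different list of terms, which matches the observation that none of them finds the others. The sketch does not model Match all word Forms, and it does not model the fact that a hyphenated word such as intra-text is also found by the phrase "intra text".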