TIP of the day: Nitty-gritty of the very basic search

MJ. Smith
MJ. Smith MVP Posts: 53,407
edited November 20 in English Forum

This post serves as an addendum to "Tip of the day: basic search options (Search panel icon menu) and drop-down options".

Can I use "Match case" to identify all cases of "job" referring to employment?

Almost, but not quite. Because English capitalizes words in titles and at the start of a sentence, a capitalized "Job" may still refer to employment rather than being a proper noun (the book or person Job). So "Match case" with the capitalized form will pick up false positives, while "Match case" with the lowercase form will miss occurrences that happen to be capitalized.
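To see why neither setting separates the two senses, here is a minimal Python sketch of whole-word matching with and without case sensitivity. The sample sentences and the `matches`/`hits` helpers are hypothetical illustrations, not how Logos implements "Match case".

```python
import re

# Hypothetical sample sentences, made up for this illustration.
SENTENCES = [
    "We read the book of Job tonight.",  # proper name
    "Job hunting takes time.",           # employment, capitalized at sentence start
    "She left her job last year.",       # employment, lowercase
]

def matches(word: str, sentence: str, match_case: bool) -> bool:
    """Whole-word match; case-sensitive only when match_case is True."""
    flags = 0 if match_case else re.IGNORECASE
    return re.search(rf"\b{re.escape(word)}\b", sentence, flags) is not None

def hits(word: str, match_case: bool) -> list[str]:
    """All sample sentences containing the word under the given case rule."""
    return [s for s in SENTENCES if matches(word, s, match_case)]

# "Job" + Match case: the sentence-initial employment use, plus the proper name (false positive).
print(hits("Job", match_case=True))
# "job" + Match case: only the lowercase sentence; "Job hunting" is missed.
print(hits("job", match_case=True))
```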

Can I search for punctuation, for example to find all questions?

No. Punctuation, as defined by industry standards (Unicode character classes), is not indexed and therefore cannot be searched. See http://www.fileformat.info/ to determine whether a Unicode character is classed as punctuation.
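If you have Python available, the standard-library `unicodedata` module reports the same Unicode general category that fileformat.info shows; the classes beginning with "P" (Pc, Pd, Ps, Pe, Pi, Pf, Po) are the punctuation classes. A short sketch; the `is_punctuation` helper is my own, and the assumption is that Logos draws the line in the same place:

```python
import unicodedata

def is_punctuation(ch: str) -> bool:
    """True if the character's Unicode general category is a P* (punctuation) class."""
    return unicodedata.category(ch).startswith("P")

# The question mark, comma, apostrophe, asterisk and pilcrow all report Po;
# the letter "a" reports Ll (lowercase letter).
for ch in ["?", ",", "'", "*", "\u00b6", "a"]:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {unicodedata.category(ch)} "
          f"punctuation={is_punctuation(ch)}")
```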

In addition, some punctuation characters serve specific purposes within a search string, which led to this exchange:

NB.Mick said:

While the question mark works as a wildcard to find one character (counterintuitively, even within phrases),

Logos does not index punctuation characters, and they are ignored during a search. For example, "lord jesus christ" will find "Lord, Jesus Christ" in 1 Cor 8:6 (ESV). If you try to search for "lord, jesus christ", the comma is also ignored and you get the same results as before. If you search for "lord? jesus christ", you get zero results because the search engine is looking for a single non-punctuation character immediately after "lord", e.g. "lords jesus christ".
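A rough Python model of this behaviour: if the text is reduced to word tokens before searching, the comma in "Lord, Jesus Christ" simply disappears, and so does the comma in the query. The token pattern (letters and digits, with an apostrophe allowed inside a word) and the helper names are assumptions for illustration, not Logos's actual indexer. Wildcards are deliberately left out of this sketch.

```python
import re

# Assumed token pattern: runs of letters/digits, with an apostrophe allowed
# *inside* a word (so "God's" stays one token). Any other punctuation is dropped.
WORD = re.compile(r"[^\W_]+(?:'[^\W_]+)*")

def tokenize(text: str) -> list[str]:
    """Lowercase word tokens; stand-alone punctuation is discarded."""
    return [w.lower() for w in WORD.findall(text)]

def phrase_found(query: str, text: str) -> bool:
    """True if the query's tokens occur consecutively in the text's tokens.
    Wildcards (? and *) are NOT modelled here; they are treated as punctuation."""
    q, t = tokenize(query), tokenize(text)
    return any(t[i:i + len(q)] == q for i in range(len(t) - len(q) + 1))

sample = "Lord, Jesus Christ"  # the phrase as it appears in 1 Cor 8:6 (ESV), per the post

print(tokenize(sample))                            # ['lord', 'jesus', 'christ']
print(phrase_found("lord jesus christ", sample))   # True
print(phrase_found("lord, jesus christ", sample))  # True: the query's comma is dropped too
```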

If you search for "lord ? jesus christ", the engine is looking for a one-character word (e.g. "a") following "lord", and fails. But a search for "lord ??? christ" will find "lord and christ" because it is looking for a three-character word following "lord".

Now, whilst "lord? jesus christ" failed, a search for "lord* jesus christ" will succeed, giving the same results as "lord jesus christ" in this case. The * wildcard looks for a sequence of zero or more non-punctuation characters.

I trust that illustrates the purpose of a wildcard in a phrase (a wildcard character won't match punctuation!).

The engine does allow matches where punctuation is part of a word: e.g. type God and you will see a suggestion for God's. The former will search strictly for "God"; the latter will look only for "God's". "Match all word forms" will search for both (as well as "gods").

Finally, Lord BEFORE 3-3 CHARS “Jesus Christ” will find only “Lord, Jesus Christ” (in ESV) because the CHARACTER proximity (exactly 3 chars) includes the comma! Ergo, Lord BEFORE 2 CHARS “Jesus Christ” will find “Lord Jesus Christ” (without the comma).

EDIT: (subtle refinement)

A wildcard will match an apostrophe when it is part of a word, e.g. God?s will find God's.
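The wildcard rules in this exchange can be modelled by translating each query word into a pattern that must match one whole indexed token: ? becomes exactly one character and * becomes zero or more. Because stand-alone punctuation never makes it into the token list, neither wildcard can match it, while an apostrophe inside a token such as god's is fair game. A minimal Python sketch; the token lists are written out by hand under that indexing assumption, and `word_pattern`/`phrase_matches` are hypothetical helpers, not Logos's search engine:

```python
import re

def word_pattern(query_word: str) -> re.Pattern:
    """Turn one query word into a regex that must match a whole token.
    '?' -> exactly one character, '*' -> zero or more characters."""
    parts = []
    for ch in query_word.lower():
        if ch == "?":
            parts.append(".")
        elif ch == "*":
            parts.append(".*")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))

def phrase_matches(query: str, tokens: list[str]) -> bool:
    """Each query word must fully match one token, in consecutive order."""
    pats = [word_pattern(w) for w in query.split()]
    n = len(pats)
    return any(
        all(p.fullmatch(tok) for p, tok in zip(pats, tokens[i:i + n]))
        for i in range(len(tokens) - n + 1)
    )

# Token lists written out by hand (assumed indexing: punctuation dropped,
# in-word apostrophes kept).
LORD_JESUS_CHRIST = ["lord", "jesus", "christ"]  # from "Lord, Jesus Christ"
LORD_AND_CHRIST = ["lord", "and", "christ"]      # from "Lord and Christ"
GODS = ["god's"]                                 # from "God's"

print(phrase_matches("lord? jesus christ", LORD_JESUS_CHRIST))   # False
print(phrase_matches("lord ? jesus christ", LORD_JESUS_CHRIST))  # False
print(phrase_matches("lord* jesus christ", LORD_JESUS_CHRIST))   # True
print(phrase_matches("lord ??? christ", LORD_AND_CHRIST))        # True
print(phrase_matches("god?s", GODS))                             # True
```

Run against the examples above, this reproduces the results reported in the exchange, which suggests (but does not prove) that the engine treats wildcards as operating within a single indexed word.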

Note: you can use Find (a string search) to find punctuation.

The Pilcrow Sign, U+00B6, has character class "Punctuation, other". http://www.fileformat.info/info/unicode/char/b6/index.htm

Punctuation isn't indexed. You can use Ctrl+F Find.

I set "Match all forms", but "sit" misses "sat". Why?

"Match all forms" uses a stemming routine to identify what is or is not the "same form". These stemming routines are specific to a particular language (or set of languages) and generally do not handle exceptions such as irregular verb forms.

To handle irregular forms, list the forms separated by commas (e.g. sit, sat) in addition to setting "Match all forms".
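A toy illustration of why this happens: a rule-based stemmer strips regular suffixes, so sits and sitting reduce to sit, but no rule turns sat into sit. Listing both forms makes the search accept either stem. The `naive_stem` and `match_all_forms` functions below are deliberately simple, hypothetical stand-ins, not the routine Logos uses:

```python
def naive_stem(word: str) -> str:
    """Hypothetical toy stemmer: strips a few regular English suffixes only."""
    w = word.lower()
    if w.endswith("ing") and len(w) > 5:
        w = w[:-3]
        # Undo consonant doubling, Porter-style ("sitt" -> "sit", but "call" stays).
        if w[-1] == w[-2] and w[-1] not in "aeioulsz":
            w = w[:-1]
    elif w.endswith("s") and len(w) > 3:
        w = w[:-1]
    return w

def match_all_forms(query: str, words: list[str]) -> list[str]:
    """A word hits if its stem equals the stem of any comma-separated query term."""
    stems = {naive_stem(term.strip()) for term in query.split(",")}
    return [w for w in words if naive_stem(w) in stems]

WORDS = ["sit", "sits", "sitting", "sat", "seat"]

print(match_all_forms("sit", WORDS))       # ['sit', 'sits', 'sitting']  (sat missed)
print(match_all_forms("sit, sat", WORDS))  # ['sit', 'sits', 'sitting', 'sat']
```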

Can I run a string search in the Search panel?

Simple answer: no, the Search panel is word-oriented. However, the "Find (in this panel)" option provides a true string search.

Long answer: regular expression searches are no longer a supported feature, but they sometimes work. When they do, they give you a true string search.
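The difference between the two kinds of search, sketched in Python on a made-up sentence: a word-oriented search compares whole tokens (so an exact "lord" misses "overlord" and "lords", and punctuation is never seen), whereas a string search finds the character sequence anywhere, and a regular expression can even target punctuation such as the question mark. This only illustrates the concepts under those assumptions; it is not how the Search panel or Find is implemented:

```python
import re

# Hypothetical sample text, made up for this illustration.
TEXT = "Is the overlord here? The lords met. Was the Lord there?"

# Word-oriented search (assumed model): reduce the text to lowercase word
# tokens first, then compare whole tokens. Punctuation never reaches this step.
tokens = re.findall(r"[^\W_]+(?:'[^\W_]+)*", TEXT.lower())
word_hits = [t for t in tokens if t == "lord"]

# String search: every occurrence of the character sequence, even inside words.
string_hits = [m.start() for m in re.finditer("lord", TEXT, re.IGNORECASE)]

# A regular expression can target punctuation directly, e.g. whole questions.
questions = [q.strip() for q in re.findall(r"[^.?!]*\?", TEXT)]

print(word_hits)         # ['lord']
print(len(string_hits))  # 3  (overlord, lords, Lord)
print(questions)         # ['Is the overlord here?', 'Was the Lord there?']
```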

Do I have to enter the diacritical marks?

Testing, I get the same results with or without the diacritical marks - somewhat to my surprise.
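That result is consistent with the search folding diacritics away before comparing words. A minimal Python sketch of such folding, assuming Unicode decomposition (NFD) followed by removal of combining marks; the `fold` and `matches` helpers are hypothetical, and the `exact` flag only imitates the [match exact] override discussed in the comments:

```python
import unicodedata

def fold(s: str) -> str:
    """Remove diacritics: decompose (NFD), drop combining marks (Mn), recompose (NFC)."""
    decomposed = unicodedata.normalize("NFD", s)
    bare = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return unicodedata.normalize("NFC", bare)

def matches(query: str, word: str, exact: bool = False) -> bool:
    """Default: diacritic-insensitive. exact=True: diacritics must agree.
    (Case is ignored in both modes; that part is an assumption.)"""
    if exact:
        return query.casefold() == word.casefold()
    return fold(query).casefold() == fold(word).casefold()

print(fold("Erklärung"))                              # Erklarung
print(matches("Erklarung", "Erklärung"))              # True
print(matches("Erklarung", "Erklärung", exact=True))  # False
```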

I recall a recent exchange between Rosie and Bradley regarding Lord in all (small) caps, but I cannot locate it.

Have you found any other interesting details on how the very basic elements of the search work?

Orthodox Bishop Alfeyev: "To be a theologian means to have experience of a personal encounter with God through prayer and worship."; Orthodox proverb: "We know where the Church is, we do not know where it is not."

Comments

  • John Kight
    John Kight Member Posts: 1,618

    Thanks! This was a helpful explanation. 

    For book reviews and more visit sojotheo.com 

  • Mark Barnes
    Mark Barnes Member Posts: 15,432 ✭✭✭

    MJ. Smith said:

    Testing, I get the same results with or without the diacritical marks - somewhat to my surprise.

    You can override this behaviour if you wish: https://wiki.logos.com/Search_Matching_Commands 

    [match exact]Erklarung will NOT find Erklärung, and vice versa.

    This is my personal Faithlife account. On 1 March 2022, I started working for Faithlife, and have a new 'official' user account. Posts on this account shouldn't be taken as official Faithlife views!