Regular expression question for Logos


Posts 116
Matt | Forum Activity | Posted: Fri, Feb 19 2010 12:50 PM

Hello,

  By default, regular expressions are supposed to be case sensitive.  Hence, if one were searching for all instances of 'Father' or 'father', one would need to construct the regex as e.g. [Ff]{1}ather.  In Logos 4, if I enter the regex

/[F]{1}ather/

it returns instances of both 'Father' and 'father'.  Mind you, this should have nothing to do with whether or not 'Case sensitive' is selected in the search panel options.  Because it appeared that Logos has deviated from standard regular expression syntax, I worked around it by creating the following regular expression:

/(?-i)(?:[F]{1}ather)/

OK, it worked, but why the deviation from the standard?
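
For comparison, here is a minimal sketch of the same behavior in Python's `re` module. It is only a stand-in: Logos 4 is not assumed to use Python, the sample text is made up, and the "host turns ignore-case on" line is an assumption meant to mimic what the search panel appears to do. Python's `re` only supports switching a flag off in the scoped form `(?-i:...)` (Python 3.6+), so that form replaces the bare `(?-i)` used above.

```python
import re

# Hypothetical sample text, not taken from any Logos resource.
text = "Our Father; his father; FATHER"

# Python's re module is case sensitive unless a flag says otherwise.
default_hits = re.findall(r"[F]{1}ather", text)
either_hits = re.findall(r"[Ff]ather", text)

# Assumption: the host application compiles every pattern with ignore-case on.
host_hits = re.findall(r"[F]{1}ather", text, re.IGNORECASE)

# A scoped inline switch turns case sensitivity back on for the group only.
scoped_hits = re.findall(r"(?-i:[F]{1}ather)", text, re.IGNORECASE)

print(default_hits)  # ['Father']
print(either_hits)   # ['Father', 'father']
print(host_hits)     # ['Father', 'father', 'FATHER']
print(scoped_hits)   # ['Father']
```

Note that `{1}` is redundant in both patterns: `[F]ather` matches exactly the same text as `[F]{1}ather`.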

Also, I'd appreciate some help on query parsing.  Continuing the previous example, if I wanted to mix a regular expression and plain text in the same query, should I assume that the starting '/' and ending '/' delimit the regular expression, and that the delimited expression counts as a single term?  If so, this would explain the strange results I get if I run the following query:

/(?-i)(?:[F]{1})/ather

Finally, when is Logos going to present a help file for the search syntax that is at least as thorough as that which existed in Libronix 3?  Even a work in progress that lists what search functionality is missing in L4 that existed in L3 and the possible workarounds, new search syntax features, etc. would be much more helpful than nothing.

Thanks,

Matt

 

Specs:

  • Windows 7 x64
  • Quad Xeon 2.83 GHz x2
  • 16GB RAM
  • Nvidia 285 GTX 1GB VRAM
  • Logos 4.1 Platinum, SR-3, indexed
    Posts 132
    LogosEmployee
    Peter Venable | Forum Activity | Replied: Fri, Feb 19 2010 2:38 PM

    Yes, the slashes are delimiters, so I wouldn't expect /(?-i)(?:[F]{1})/ather to work.

    Posts 116
    Matt | Forum Activity | Replied: Fri, Feb 19 2010 2:51 PM

    Peter Venable:
    Yes, the slashes are delimiters, so I wouldn't expect /(?-i)(?:[F]{1})/ather to work.

    Hi Peter, thank you for the reply.  The query actually did work in the sense that Logos 4 accepted it as a valid query and returned 316,032 results in 149,251 articles.  What I'm trying to figure out is how the Logos 4 query parser actually parsed it.  I ran an experiment using the query

    /(?-i)(?:[F]{1})/ OR ather

    and Logos 4 returned 316,050 results in 149,263 articles.

    I then ran the query

    /(?-i)(?:[F]{1})/ AND ather

    and Logos 4 returned 19 results in 6 articles.

    So, as you can see, it appears to be some sort of amalgam between OR and AND.  These types of questions are important because the answers help the end user understand how queries work and what results to expect from them.

    Any feedback on my other questions in the original post?

    Thanks,

    Matt

    Posts 132
    LogosEmployee
    Peter Venable | Forum Activity | Replied: Fri, Feb 19 2010 3:23 PM

    Here are my results, with match all word forms on:

    /(?-i)(?:[F]{1})/ 260826

    /(?-i)(?:[F]{1})/ather 260826

    ather 13

    /(?-i)(?:[F]{1})/ OR ather 260893

    /(?-i)(?:[F]{1})/  ather  14

    /(?-i)(?:[F]{1})/  AND ather  14

    So... I conclude that /(?-i)(?:[F]{1})/ather is being interpreted as if it were /(?-i)(?:[F]{1})/

    Hmm... actually that makes (some) sense because arguments are allowed after the final slash.  For example /father/i means case-insensitive.  Invalid options are ignored.  Currently valid (though not necessarily supported) options are:

    • i ignore case
    • x ignore pattern whitespace
    • r right-to-left
    • n explicit capture
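
The parsing rule described above can be sketched in Python as follows. This is a guess at the behavior, not Logos code: the function name `split_slash_query` and the choice to treat the last '/' as the closing delimiter are assumptions, and only the four option letters come from the list above.

```python
# Option letters taken from the list above; everything else is hypothetical.
KNOWN_OPTIONS = {
    "i": "ignore case",
    "x": "ignore pattern whitespace",
    "r": "right-to-left",
    "n": "explicit capture",
}


def split_slash_query(query):
    """Split '/pattern/options' into (pattern, recognized option letters).

    Assumption: the last '/' closes the pattern, and any trailing character
    that is not a known option letter is silently ignored.
    """
    if not query.startswith("/"):
        raise ValueError("expected a query that starts with '/'")
    close = query.rfind("/")
    if close == 0:
        raise ValueError("missing closing '/'")
    pattern = query[1:close]
    trailing = query[close + 1:]
    options = [ch for ch in trailing if ch in KNOWN_OPTIONS]
    return pattern, options


print(split_slash_query("/father/i"))
print(split_slash_query("/(?-i)(?:[F]{1})/ather"))
```

Under this sketch, the trailing `ather` contributes no search term at all: 'a', 't', 'h' and 'e' are dropped as invalid options, and 'r' would be read as the right-to-left option. That is at least consistent with the two queries returning the same count, though whether Logos actually applies 'r' here is not confirmed.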

     

    Posts 116
    Matt | Forum Activity | Replied: Fri, Feb 19 2010 3:58 PM

    Peter Venable:
    So... I conclude that /(?-i)(?:[F]{1})/ather is being interpreted as if it were /(?-i)(?:[F]{1})/

    Great!  So, the closing '/' delimiter not only signals the end of the regular expression proper, but also, potentially, the beginning of the regular expression modifiers.

    Peter Venable:
    Currently valid (though not necessarily supported) options are

    If I'm not mistaken, you have a background in databases.  Imagine this:  You're presented with a wealth of data, and the SQL frontend you're using has no syntax guide, so you're left to figure out how SQL statements work and whether the frontend supports only a subset of valid SQL (such as WQL: WMI Query Language) or the full breadth and depth of SQL.  You can imagine the frustration that lack of documentation would cause anyone needing to query the data.  So, just so nothing is lost in all of these posts, let me, as clearly as I can, ask for these remaining questions to be answered.

    1. When is Logos going to document the search syntax and via that let us know what is both valid and supported by the software?  Is this even on a Logos to-do list?
    2. Why isn't case sensitivity turned on for regular expressions by default?  Is this an oversight that will be rectified in the future so that those of us familiar with regular expressions won't have to deal with the silly (?-i) syntax?
    3. Is there a way to use the 'i ignore case' option in the negative?  e.g. I just tried /[F]{1}ather/-i and it didn't work.

    Thanks,

    Matt

    Posts 132
    LogosEmployee
    Peter Venable | Forum Activity | Replied: Fri, Feb 19 2010 4:12 PM

    I can't say when these will get done but I can assure you they've been added to our issue tracking system.

    I think #2 and #3 are just two approaches to the same problem.  It was by design that case sensitivity uses the "Match case" option, but an oversight that there's no way to negate the ignore case option directly.  In my opinion, either ignoring the "Match case" option for regexes or supporting -i for case-sensitivity (or both) would suffice; I'm not sure which is better for the average user.

    Posts 116
    Matt | Forum Activity | Replied: Fri, Feb 19 2010 4:43 PM

    Peter Venable:
    I can't say when these will get done but I can assure you they've been added to our issue tracking system.

    Peter, that's great!  Just knowing that Logos understands that the syntax help needs to be written is encouraging in itself.  Still, I must protest: being able to rapidly and powerfully search one's digital library is arguably the raison d'être for having a digital library at all, so it would be wonderful if Logos would take the syntax documentation seriously and give it a much higher priority.

    As to your comment on #2 and #3, I agree that they are both solutions to the same problem.  I broke it out because I was generally interested in the ability to negate any of the regular expression options that you had listed (i.e. 'i' 'x' 'r' and 'n').

    Finally, I would argue that regular expression functionality in Logos 4 should match, as closely as possible, regular expression functionality and expected behavior as found in Python, C#, C++ (boost), etc., in that by default, match case is on.

    Thanks,

    Matt

    Posts 19710
    Rosie Perera | Forum Activity | Replied: Fri, Feb 19 2010 5:39 PM

    This is a fascinating thread. I am a power searcher and I didn't even know that Logos does regular expressions, so that shows the documentation needs to be written! The only Search syntax I knew of is documented on http://wiki.logos.com/Search_HELP.

    Matt, where did you find "the regular expression options...listed (i.e. 'i' 'x' 'r' and 'n')"?

    Posts 9205
    LogosEmployee
    Bradley Grainger | Forum Activity

    Regular expressions are undocumented because they're unsupported. They may work (to some degree) in the current beta (and, as you've found out, there are bugs), but they're not currently part of the official feature set. They may change or be removed in a future release.

    Posts 116
    Matt | Forum Activity | Replied: Fri, Feb 19 2010 8:05 PM

    Bradley Grainger:

    Regular expressions are undocumented because they're unsupported. They may work (to some degree) in the current beta (and, as you've found out, there are bugs), but they're not currently part of the official feature set. They may change or be removed in a future release.

    That's really disappointing and, if true, is quite poor customer service on the part of Logos.  One of the prime reasons I purchased the software was its regular expression support, as detailed in various posts about Libronix 3 that I had read prior to my purchase of Logos 4, such as the following:

    http://blog.logos.com/archives/2008/06/how_can_i_find_all_geminate_verbs_in_the_ot.html

    http://www.logos.com/support/lbs/HebrewRegularExpressions

    http://blog.logos.com/archives/2008/05/which_theologian_uses_the_most_latin_fun_with_data_types_and_regular_expressions.html

    http://blog.logos.com/archives/2008/05/understanding_data_types_language_data_types_1.html

    I had even read the coder's blog about .Net and regular expressions here:

    http://code.logos.com/blog/2008/07/net_regular_expressions_and_unicode.html

    Because Logos 4 was touted as being vastly superior in terms of searching, one would naturally expect that the search functionality would have been expanded not reduced.  A lot of what worked in Libronix 3 doesn't work, and the 'help' that exists in L4 regarding syntax is barely a skeleton.  I was depending upon what was written on the Logos 4 'missing features' page:

    Great News!

    Logos Bible Software 4 doesn't disable your Libronix DLS 3.x installation. So when you upgrade you don't lose anything—you can continue to use LDLS 3.x for anything that doesn't yet have a Logos 4 equivalent!

    What is the official and supported query syntax and what further features will be removed that existed in L3?  What will be the equivalent of regular expressions in Logos 4?

    Thanks,

    Matt

    Posts 116
    Matt | Forum Activity | Replied: Fri, Feb 19 2010 8:07 PM

    Rosie Perera:
    Matt, where did you find "the regular expression options...listed (i.e. 'i' 'x' 'r' and 'n')"?

    Hi Rosie, as I mentioned in my post to Bradley, I had read all about the Logos product support of regular expressions prior to purchasing, and you can follow those links to some really neat examples.  The  'i' 'x' 'r' and 'n' that you saw was something I had never seen before Peter pointed it out to me.  If you have L3, you can just open the help file to 'Advanced Searching' and you'll see the regular expression documentation in all its glory.

    Matt

    Posts 19710
    Rosie Perera | Forum Activity | Replied: Fri, Feb 19 2010 9:07 PM

    Matt:

    Hi Rosie, as I mentioned in my post to Bradley, I had read all about the Logos product support of regular expressions prior to purchasing, and you can follow those links to some really neat examples.  The  'i' 'x' 'r' and 'n' that you saw was something I had never seen before Peter pointed it out to me.  If you have L3, you can just open the help file to 'Advanced Searching' and you'll see the regular expression documentation in all its glory.

    Cool! Thanks. I wonder how many users know about this undocumented feature?

    Posts 28016
    Forum MVP
    Dave Hooton | Forum Activity | Replied: Fri, Feb 19 2010 10:04 PM

    Rosie Perera:
    Cool! Thanks. I wonder how many users know about this undocumented feature?

    There is so much of L4's Search that is incomplete or not properly implemented (equivalent to L3) that I hadn't worried about regexp! But L3's was always widely accessible and fairly useful.

    Dave
    ===

    Windows 11 & Android 8

    Posts 19710
    Rosie Perera | Forum Activity | Replied: Fri, Feb 19 2010 10:29 PM

    Dave Hooton:

    There is so much of L4's Search that is incomplete or not properly implemented (equivalent to L3)....

    I knew that nostem() was missing in L4, and I'd already noticed recently that the 'pipe' character ( | ) doesn't work as an equivalent to OR anymore, and was going to report that. But now that I look through the Advanced Searching help screen in L3, I realize that there is tons that's missing or at least not documented yet. I wish I had time to go try it all and see what's working and what's not, but I don't.

    Posts 107
    Andrew Malone | Forum Activity | Replied: Tue, Nov 15 2016 9:16 PM

    I'm just confirming that a number of regular expressions (at least the rudimentary ones that I've tested so far) seem to still be operable in L6.
