Indexing: time to think of a better approach

Page 1 of 2 (35 items) 1 2 Next >
This post has 34 Replies | 3 Followers

Posts 3770
Francis | Forum Activity | Posted: Fri, Sep 16 2016 1:44 AM

Okay, I will confess right from the start, I am totally ignorant of the technical aspects of indexing and what constraints it places on how it works in Logos in relation to other aspects of the software. I am simply speaking from the standpoint of experience: latest example, Wiersbe's study guide on Revelation from Vyrso. We're talking about a small book to start with, and Vyrso books, as I understand it, are precisely NOT tagged as Logos books are. Yet here it is, a long indexing session during which my computer, which otherwise performs very well, is annoyingly lagging. All this for this one little, untagged book? I turned it off at 15% so I could get some work done.

Sure, some will say: download later, pause indexing, change process prioritisation. There is something to be said for all this, but each of these workarounds has its disadvantages as well. And these are band-aids; they don't deal with the root problem.

I remember in Wordsearch -- I know, a considerably less complex piece of software -- that they would do incremental indexing. I was under the impression that this was the case, at least at some point, in Logos as well, and that one could rebuild or consolidate the index. But then if it is incremental, why so long for so little material and why does it use so much processing power? 

Well, I know all this can be explained. That's not my point however. I don't want to be told that the reason my car is slow is because it is designed to work with square wheels; what I want to know is whether, really, we cannot revisit the design so as to use the much more efficient circular ones. I find it hard to imagine that there cannot be a better way. 

BTW, performance has long been a problem with Logos, and one that comes back in the conversation with users of other competing software ("I used Logos in the past but it was so slow..."). The overall performance has improved a lot (speed of searches), but indexing is perhaps another relic of a problematic approach. This has been the case long before resources started to be super-tagged in relation to all kinds of datasets and types, so it's difficult to think this is the reason. I don't know, was it coded unpropitiously and now it is hard to change it? 

Posts 30215
Forum MVP
MJ. Smith | Forum Activity | Replied: Fri, Sep 16 2016 2:04 AM

Are you sure your system wasn't correcting a corrupt index? Small books usually are invisible for me when they index ... it is reindexing that brings me to a cccrrraaawwwlll....

Orthodox Bishop Hilarion Alfeyev: "To be a theologian means to have experience of a personal encounter with God through prayer and worship."

Posts 2652
Jan Krohn | Forum Activity | Replied: Fri, Sep 16 2016 2:13 AM

I had the same experience. The Wiersbe book was a pain to index.

Past IT Consultant. Past Mission Worker. Entrepreneur. Future Seminary Student.
Why Amazon sucks: Full background story of my legal dispute with the online giant

Posts 9414
Forum MVP
Bruce Dunning | Forum Activity | Replied: Fri, Sep 16 2016 5:14 AM

Jan Krohn:

I had the same experience. The Wiersbe book was a pain to index.

That's strange. I had no troubles when my Wiersbe book indexed.

Using adventure and community to challenge young people to continually say "yes" to God

Posts 3770
Francis | Forum Activity | Replied: Sat, Sep 17 2016 1:35 AM

Well, I cannot be sure that the system was not correcting a corrupt index, but the general experience is that indexing is a real drag and resource hog. I am sure this is not the case for everybody, or all the time, but it seems to me that there are enough complaints about that to show that it is not an exceptional case, but a situation that is problematic.

Posts 560
Glenn Crouch | Forum Activity | Replied: Sat, Sep 17 2016 1:55 AM

Bruce Dunning:

Jan Krohn:

I had the same experience. The Wiersbe book was a pain to index.

That's strange. I had no troubles when my Wiersbe book indexed.

Like Bruce, no problem at all with the Indexing of that book - quick and unobtrusive...

Pastor Glenn Crouch
St Paul's Lutheran Church
Kalgoorlie-Boulder, Western Australia

Posts 30215
Forum MVP
MJ. Smith | Forum Activity | Replied: Sat, Sep 17 2016 2:26 AM

Francis:
Well, I cannot be sure that the system was not correcting a corrupt index, but the general experience is that indexing is a real drag and resource hog.

I agree that when there is a real drag on performance it is often indexing ... or building menus the first time. But I'm not convinced that the problem is indexing per se ... I'm seeing too much that looks like the rush of purchases has generated a rush of corrupt indexes ... and until the corrupt indexes problem is resolved it is hard to judge how much improvement possibility exists in a standard update of indexes for new/updated resources.

Orthodox Bishop Hilarion Alfeyev: "To be a theologian means to have experience of a personal encounter with God through prayer and worship."

Posts 13411
Mark Barnes | Forum Activity | Replied: Sat, Sep 17 2016 3:26 AM

Francis:
Okay, I will confess right from the start, I am totally ignorant of the technical aspects of indexing and what constraints it places on how it works in Logos in relation to other aspects of the software. I am simply speaking from the standpoint of experience: latest example, Wiersbe's study guide on Revelation from Vyrso. We're talking about a small book to start with, and Vyrso books, as I understand it, are precisely NOT tagged as Logos books are. Yet here it is, a long indexing session during which my computer, which otherwise performs very well, is annoyingly lagging. All this for this one little, untagged book? I turned it off at 15% so I could get some work done.

There are two major stages to indexing:

  1. Generating an index for that book
  2. Merging the new index into the existing index.

Stage (1) probably took a few seconds for Wiersbe, which is true of most books (complex resources like Bibles might take a little longer). The slow part of the process is the merge. The time taken to merge is proportional to the size of the existing index, so users with large libraries will see long index times, even when adding only one small book.
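
To make the two stages concrete, here is a minimal sketch in Python. It is a toy inverted index, not how Logos actually stores its index; the function names `build_index` and `merge_indexes` and the sample documents are made up for illustration. The point it tries to show is that stage 1 only looks at the new book, while a merge that writes out a fresh combined index has to touch every entry of the existing index as well.

```python
from collections import defaultdict

def build_index(docs):
    """Stage 1: build an inverted index (word -> sorted doc ids) for the given docs only."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return {word: sorted(ids) for word, ids in index.items()}

def merge_indexes(existing, new):
    """Stage 2: write a fresh combined index; every existing entry gets copied."""
    merged = {}
    entries_touched = 0
    for word in sorted(set(existing) | set(new)):
        merged[word] = sorted(set(existing.get(word, [])) | set(new.get(word, [])))
        entries_touched += 1
    return merged, entries_touched

# Hypothetical library of two resources, plus one small new book.
existing = build_index({"bible": "in the beginning", "commentary": "the beginning of wisdom"})
new = build_index({"wiersbe": "revelation of the end"})
merged, touched = merge_indexes(existing, new)
```

In this toy the new book contributes only four words, yet the merge walks all seven entries of the combined index. With a large library the existing side dominates, which is consistent with the merge time depending on the size of the existing index rather than on the size of the new book.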

For example, the last indexing done on my system took 6.9s (one resource). The merge took 11m 14s. I looked through older log files to find an occasion where several resources were indexed at once, and found one where the initial index took 7m and 49s, whilst the merge took 14m and 54s.
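
Working through those figures as arithmetic (a quick sketch using only the timings quoted above; the variable names are just for illustration):

```python
def to_seconds(minutes, seconds):
    """Convert a minutes-and-seconds duration to seconds."""
    return minutes * 60 + seconds

single_build = 6.9                  # one resource: 6.9s to build its index
single_merge = to_seconds(11, 14)   # 674s to merge it
batch_build = to_seconds(7, 49)     # several resources: 469s to build
batch_merge = to_seconds(14, 54)    # 894s to merge them

# Fraction of the total indexing time spent in the merge stage.
merge_share_single = single_merge / (single_build + single_merge)
merge_share_batch = batch_merge / (batch_build + batch_merge)
```

For the single small resource the merge is roughly 99% of the total time. For the larger batch it is roughly 66%, and the merge itself grew by only about a third (674s to 894s) even though the build stage grew from seconds to minutes.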

That suggests that the most performant way of indexing is to only perform the merge periodically, and that's what actually happened in the early days of Logos 4 — you'd have a separate search section for newly indexed documents. But believe me, you don't want to go back there.
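
As a rough illustration of that deferred-merge approach (a sketch only; the `search` function, index layout and resource names are hypothetical and not taken from Logos): new books sit in a small separate index, and a search consults both indexes and reports the results in separate sections.

```python
def search(word, main_index, pending_index):
    """Look the word up in both indexes and report the results separately."""
    word = word.lower()
    return {
        "library": main_index.get(word, []),
        "newly indexed": pending_index.get(word, []),
    }

# Hypothetical data: a merged main index and a small not-yet-merged index.
main_index = {"grace": ["commentary-a", "dictionary-b"]}
pending_index = {"grace": ["wiersbe-revelation"]}
results = search("Grace", main_index, pending_index)
```

The expensive merge can then wait, at the cost of results that are split across two sections rather than ranked together.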

Limiting downloads to once a week would obviously help, too.

The easiest thing Logos could do (IMO) would be to have an option to automatically pause the indexer whilst the computer was in use. That should be very easy to implement - but it wouldn't help people who never leave their computer switched on and unattended.
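
A minimal sketch of what such an option might look like, assuming the work can be split into units and that the program has some way of asking how long the user has been idle. The `seconds_since_last_input` callable is hypothetical (the real call is OS-specific), and `run_when_idle` is an illustrative name, not anything from Logos.

```python
import time

def run_when_idle(work_items, seconds_since_last_input, idle_threshold=60.0,
                  poll_interval=5.0, sleep=time.sleep):
    """Process work items one at a time, waiting whenever the user is active."""
    done = []
    for item in work_items:
        while seconds_since_last_input() < idle_threshold:
            sleep(poll_interval)   # user is active: wait, then check again
        done.append(item())        # user is idle: do one unit of work
    return done

# Simulated idle readings (in seconds) instead of a real OS query.
readings = iter([10, 30, 120, 200])
sleeps = []
finished = run_when_idle(
    [lambda: "index book", lambda: "merge"],
    lambda: next(readings),
    sleep=sleeps.append,
)
```

Note that this only pauses between units of work; a unit that is already running carries on until it finishes.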

Francis:
BTW, performance has long been a problem with Logos, and one that comes back in the conversation with users of other competing software ("I used Logos in the past but it was so slow..."). The overall performance has improved a lot (speed of searches), but indexing is perhaps another relic of a problematic approach. This has been the case long before resources started to be super-tagged in relation to all kinds of datasets and types, so it's difficult to think this is the reason. I don't know, was it coded unpropitiously and now it is hard to change it? 

It's unfair to compare the speed of Logos' indexing with other apps like Accordance. I have a very small library in Accordance 10, and Accordance is horrible at searching your whole library. There's no "Ranked" view — you can only see search results per resource, and even the results from a single resource might be split across multiple headings which you can't see all at the same time (e.g. a Dictionary could have separate sections for "entry", "definition", "etymology", "quotations").

The problem is not "Logos is really bad at this". The problem is "Logos is trying to do search better than everyone else, and as a result you have to index."

Posts 2674
David Ames | Forum Activity | Replied: Sat, Sep 17 2016 5:03 AM

What if:

Message from Logos: there are new or updated resources ready to load - Is this a good time to load them?  

- if you answer no then they will not be loaded at this time

- if you answered no and then open the library, there will be a banner stating that there are resources waiting to be loaded.

Message from Logos: Logos needs to run the index program - is this a good time? 

- if you answer no then indexing will not run at this time

- if you did not let the indexer run, then all searches will have a banner stating that not all results will be shown.

Why: You just entered the dock [pulpit] and while giving your opening remarks and prayer you opened Logos  

Whoops - big download and index start.  And you just wanted to give your congregation the revelations that God had given you. [Your sermon]

At that time you do not need the new resources nor indexing of them.  Maybe this evening.   And maybe then you will rerun the search you did for that sermon just to see what new insights were added.

Posts 13411
Mark Barnes | Forum Activity | Replied: Sat, Sep 17 2016 5:13 AM

David Ames:

Message from Logos: there are new or updated resources ready to load - Is this a good time to load them?  

- if you answer no then they will not be loaded at this time

- if you answered no and then open the library, there will be a banner stating that there are resources waiting to be loaded.

This already happens if you turn Automatic Updating off.

Posts 28714
Forum MVP
JT (alabama24) | Forum Activity | Replied: Sat, Sep 17 2016 5:16 AM

Mark Barnes:
The problem is not "Logos is really bad at this". The problem is "Logos is trying to do search better than everyone else, and as a result you have to index."

Yes :)

OSX & iOS | Logs |  Install

Posts 1505
Rick Ausdahl | Forum Activity | Replied: Sat, Sep 17 2016 5:21 AM

Francis:

Sure, some will say: download later, pause indexing, change process prioritisation. There is something to be said for all this, but each of these workarounds has its disadvantages as well. And these are band-aids; they don't deal with the root problem.

I have absolutely no operating system expertise so maybe this is a technical impossibility, but I can't help but wonder... if Faithlife does not already do this, would it be possible to run the indexer as a separate executable that could have its CPU priority individually set (in Logos program settings), so that rather than having to either pause indexing entirely or deal with a sluggish Logos session, we could have something nearer to normal Logos speed/response while the indexer slowly goes about its business in the background?  I realize that would affect how quickly new resources were fully integrated into the entire index, but it would at least give the user more control over the impact on overall Logos performance.

Posts 5495
DIsciple II | Forum Activity | Replied: Sat, Sep 17 2016 5:27 AM

David Ames:
Message from Logos: Logos needs to run the index program - is this a good time? 

David Ames:
Message from Logos: there are new or updated resources ready to load - Is this a good time to load them?  

I like these ideas David

Also, what might be helpful in conjunction with these ideas is if, rather than simply being able to nominate certain hours during the day in which downloads occur, as is the current option in settings, we were able to set up a weekly schedule of hours when downloads can occur.  Using the scenario you raised, David, it might be desirable for a preacher to set up a schedule where they never download any updates at any time on a Sunday and, for example's sake, let's say Thursday, because on Thursday you are doing the bulk of your sermon preparation. But on Mondays and Tuesdays you are either in staff meetings or out doing visitations, so you are fine with downloads occurring at any time on these days, because whatever access you need to your library, your tablet and/or smartphone will suffice on those days if your library is indexing. On the other days of the week you might want to block out half of the day.

Equally, for different reasons, a professor / teacher or a student may want to block certain days / hours during the week when downloads should not occur, but be fine with them happening at any other time.
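
To sketch what such a weekly schedule could look like as a setting (purely illustrative; `SCHEDULE` and `downloads_allowed` are made-up names, and the hours simply mirror the preacher example above), one option is a table of allowed hour ranges per weekday:

```python
from datetime import datetime

# Hypothetical schedule: weekday (0 = Monday) -> allowed (start_hour, end_hour) windows.
SCHEDULE = {
    0: [(0, 24)],    # Monday: any time
    1: [(0, 24)],    # Tuesday: any time
    2: [(12, 24)],   # Wednesday: afternoon and evening only
    3: [],           # Thursday: sermon preparation, never
    4: [(12, 24)],   # Friday: afternoon and evening only
    5: [(12, 24)],   # Saturday: afternoon and evening only
    6: [],           # Sunday: never
}

def downloads_allowed(now, schedule=SCHEDULE):
    """Return True if the given datetime falls inside an allowed window."""
    windows = schedule.get(now.weekday(), [])
    return any(start <= now.hour < end for start, end in windows)

sunday_morning = downloads_allowed(datetime(2016, 9, 18, 10, 0))
monday_night = downloads_allowed(datetime(2016, 9, 19, 3, 0))
```

A professor or student would fill in the same table differently; the check itself stays the same.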

Posts 13411
Mark Barnes | Forum Activity | Replied: Sat, Sep 17 2016 6:12 AM

Rick Ausdahl:
if Faithlife does not already do this, would it be possible to run the indexer as a separate executable that could have its CPU priority individually set (in Logos program settings)

They already do both of these things. The indexer is a separate executable. All the threads begun by the indexer run at low priority.

The problem isn't even CPU for most users. CPU is unlikely to reach full utilisation (and therefore will have spare cycles for other uses). For the majority of users, it's SSD/HDD usage that will get maxed out and cause slow down elsewhere.

Posts 1505
Rick Ausdahl | Forum Activity | Replied: Sat, Sep 17 2016 7:21 AM

Mark Barnes:

Rick Ausdahl:
if Faithlife does not already do this, would it be possible to run the indexer as a separate executable that could have its CPU priority individually set (in Logos program settings)

They already do both of these things. The indexer is a separate executable. All the threads begun by the indexer run at low priority.

The problem isn't even CPU for most users. CPU is unlikely to reach full utilisation (and therefore will have spare cycles for other uses). For the majority of users, it's SSD/HDD usage that will get maxed out and cause slow down elsewhere.

My library is small (approx. 2500 resources) compared to the libraries many others have, so I've never been hit as hard by the indexer as those with large libraries.  But even with a smallish library, on the 6 year old hand-me-down laptop I inherited from my wife after getting her a new one, the time required for indexing and the impact on system response was painful.  Installing an SSD on the old machine did help general Logos performance considerably, but even then, with the older AMD Turion 64 x2 processor in that machine, the SSD upgrade was not enough to take the pain out of the performance hit when the indexer kicked in.  Things are much better (with my small library) after biting the bullet and purchasing a new machine for myself, but I feel the pain others are still going through.

I realize the storage drive--especially a traditional HDD--is the real bottleneck when it comes to indexing, but even so, I thought that if the CPU priority could be set low enough, that it might still reduce the impact on the drive, thereby increasing overall Logos performance.

Since the indexer is already a separate executable set to low priority and people still feel the pain, would it be possible to reduce the impact even further by having options to have an "intermittent" setting by which the indexer would only engage/run at set time intervals to slow it down even further?  Or... perhaps for those who are more concerned with Logos performance than say the performance of other apps like email and web browsing, to have an option that would only allow the indexer to run when the Logos app is not running.

In any event, my thoughts are not currently for myself, but for others who are more impacted by the indexer than I am.

Posts 11325
Denise | Forum Activity | Replied: Sat, Sep 17 2016 7:29 AM

I suppose the whole issue of 'Logos' is user-specific, far more than other programs. For example, for me, indexing is pretty transparent (mainly updates, few purchases), but program performance is approaching pre-L5 days (due to buffering changes that likely benefit most).

CPU-wise, the indexer on mine (w/SSD) runs at full CPU. Basically all or none. But while reading, transparent. If I was 'using' Logos, I might be frustrated.

Queries to download/index would be like Vista's security queries ... they'd highlight the problem. Badly.

Value-wise, I rarely search. But constantly use the CitedBy's, Info-panel, and right-click ... all indexer-based.

"God will save his fallen angels and their broken wings He'll mend."

Posts 13411
Mark Barnes | Forum Activity | Replied: Sat, Sep 17 2016 7:45 AM

Rick Ausdahl:
Since the indexer is already a separate executable set to low priority and people still feel the pain, would it be possible to reduce the impact even further by having options to have an "intermittent" setting by which the indexer would only engage/run at set time intervals to slow it down even further?

At least one of the problems here is that a large part of the time in merging is running a single SQLite query through a third-party library. If it was multiple queries, Logos could space them out to reduce the bottleneck. But there's no easy way of reducing the performance impact of a single command.
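
To illustrate the difference (a sketch only, using Python's standard `sqlite3` module; the `postings` and `new_postings` tables are hypothetical and are not the Logos schema, and a real full-text merge is more involved than copying rows): a single statement such as `INSERT INTO postings SELECT word, doc_id FROM new_postings` runs to completion in one go, whereas work split into batches leaves gaps in which the program can pause.

```python
import sqlite3
import time

def merge_in_batches(conn, batch_size=1000, pause=0.0):
    """Move rows from new_postings into postings one batch at a time."""
    batches = 0
    while True:
        with conn:  # one transaction per batch, committed on exit
            rows = conn.execute(
                "SELECT rowid, word, doc_id FROM new_postings ORDER BY rowid LIMIT ?",
                (batch_size,),
            ).fetchall()
            if not rows:
                return batches
            conn.executemany(
                "INSERT INTO postings (word, doc_id) VALUES (?, ?)",
                [(word, doc_id) for _, word, doc_id in rows],
            )
            conn.execute("DELETE FROM new_postings WHERE rowid <= ?", (rows[-1][0],))
        batches += 1
        time.sleep(pause)  # leave the disk to other programs between batches

# Hypothetical data: 2500 rows waiting to be merged.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE postings (word TEXT, doc_id TEXT)")
conn.execute("CREATE TABLE new_postings (word TEXT, doc_id TEXT)")
conn.executemany(
    "INSERT INTO new_postings VALUES (?, ?)",
    [("word%d" % i, "wiersbe") for i in range(2500)],
)
conn.commit()
batches_run = merge_in_batches(conn, batch_size=1000)
```

Whether the real merge could be broken up this way is exactly the open question; the sketch only shows why multiple queries would give the indexer somewhere to pause.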

Theoretically, reducing the dependence on SQLite would give Faithlife many more options, but it's unrealistic to think that Faithlife developers on their own could develop a more efficient database system than hundreds of SQLite developers — at least without a massive budget.

A more realistic option would be to switch to another third party library such as Sphinx, Lucene or Xapian. (Bradley has also shown interest in Gigablast in the past.) I don't know enough about those libraries to know how realistic that is, although Lucene has been ported to .NET, which might help. Faithlife do have developers with understanding in these areas, so we can only hope that they're working on it, or at least thinking about it. But it would mean a complete re-architecturing of the indexing system, which would be a major investment that may not be possible.

Posts 1319
Myke Harbuck | Forum Activity | Replied: Sat, Sep 17 2016 8:39 AM

Mark Barnes:

The problem is not "Logos is really bad at this". The problem is "Logos is trying to do search better than everyone else, and as a result you have to index."

I don't think this could have been said any better!! 

I, too, get frustrated with the amount of time indexing takes (I tend to think it takes longer on a Mac for some reason), but I know that it's because Logos is working behind the scenes to provide the best search experience possible, so I ignore the time it takes. 

Myke Harbuck
Lead Pastor, www.ByronCity.Church
Adjunct Professor, Georgia Military College

Posts 201
Stephen Terlizzi | Forum Activity | Replied: Sat, Sep 17 2016 9:36 AM

Mark Barnes:

But it would mean a complete re-architecturing of the indexing system, which would be a major investment that may not be possible.

I think it is very unlikely that Faithlife would re-architect its SQL data structure on the PC-based perpetual licensed product. There is too much product risk, and Faithlife doesn't even tag resources without getting enough pre-orders to cover at least some of the development costs. I think a more likely approach will be to use multi-model databases in the cloud to provide better search. However, this means that those improvements will be limited to Logos Now subscription customers. Such is the problem of using a version 7 product.

IMHO.

Agape,

Steve

Posts 82
Joseph | Forum Activity | Replied: Sat, Sep 17 2016 9:41 AM

Doesn't having a near fully developed web app (app.logos.com) solve this problem? 
