Indexing Again?

24 Comments

  • DominicM
    DominicM Member Posts: 2,995 ✭✭✭

    Yeah, I agree. If the indexer kept track of where it was by writing the last resource id it finished indexing to a .cfg file, it should be able to resume at the point it was terminated by the system.
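
    A minimal sketch of that checkpoint idea (illustrative only; the file name, section, and helper functions here are made up, not how Logos actually stores its state):

        import configparser

        CHECKPOINT = "indexer_state.cfg"   # hypothetical checkpoint file

        def load_last_indexed(path=CHECKPOINT):
            """Return the id of the last resource that finished indexing, or None."""
            config = configparser.ConfigParser()
            if config.read(path) and config.has_option("indexer", "last_resource_id"):
                return config.get("indexer", "last_resource_id")
            return None

        def save_last_indexed(resource_id, path=CHECKPOINT):
            """Record the last fully indexed resource so a restart can resume after it."""
            config = configparser.ConfigParser()
            config["indexer"] = {"last_resource_id": resource_id}
            with open(path, "w") as f:
                config.write(f)

        def index_library(resource_ids, index_one):
            """Index resources in order, skipping everything up to the saved checkpoint."""
            last = load_last_indexed()
            skipping = last is not None and last in resource_ids
            for rid in resource_ids:
                if skipping:
                    if rid == last:
                        skipping = False   # resume with the *next* resource
                    continue
                index_one(rid)             # the real indexing work for this resource
                save_last_indexed(rid)     # checkpoint only after it completes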

    Never Deprive Anyone of Hope.. It Might Be ALL They Have

  • Kevin A. Purcell
    Kevin A. Purcell Member Posts: 3,405 ✭✭✭

    While I'm hoping that the final version indexes much faster, can anyone give me an example of a piece of software they have purchased that required you to install it and then leave it alone for 8 hours to make it usable? Not even Windows 7, or Vista for that matter, required that much time for me to be able to use it.

    My hope is this is just a beta problem.

    Dr. Kevin Purcell, Director of Missions
    Brushy Mountain Baptist Association

    www.kevinpurcell.org

  • Simon’s Brother
    Simon’s Brother Member Posts: 6,816 ✭✭✭

    While I'm hoping that the final version indexes much faster, can anyone give me an example of a piece of software they have purchased that required you to install it and then leave it alone for 8 hours to make it usable? Not even Windows 7, or Vista for that matter, required that much time for me to be able to use it.

    My hope is this is just a beta problem.


    I can't think of any, Kevin, and particularly not one that requires it every time you add a new resource to the mix. Being able to search an entire library of 2000, 4000, or even 8000 resources in a blink is only the first part of the equation... you then need to be able to wade through and process those results. I haven't looked that much at search yet, as I've been trying to work out how to navigate the program first, but can you now discard individual results that you determine are not relevant, and can you save those 'edited' search results to come back to at a later date? It might be quick to generate the results, but not so quick to re-eliminate the 4,000 hits you've already determined are not relevant to what you are trying to achieve with the search... so in that situation saving the results does seem important.

  • Bob Pritchett
    Bob Pritchett Member, Logos Employee Posts: 2,280

    can anyone give me an example of a piece of software they have purchased that required you to install and then leave it alone for 8 hours to make it usable?

    Any desktop searching engine? :-)

    (To be fair, some of them appear to work immediately. But they're sometimes a day or more from truly doing what they're supposed to -- searching your whole hard drive and email archive.)

    We're working at optimizing indexing, and finding creative solutions, but we're up against limitations of time and space. If you want a product that lets you build a personal library of a unique collection of up to 10,000 books, and that searches it in seconds, you'll need an index. And your index will be different from everyone else's, so we've got to build it on your system. (Or pre-build everyone's index, and then give you a 5 gig download....after a few months, when we get to yours....)

    Or we could just make everyone use exactly the same library and deliver it pre-indexed....

    (BTW, Windows isn't ready to search your whole hard-drive instantly either; it takes time to index as well.)

    Alas... we're working on it, but there may not be a silver bullet for this.

  • Simon’s Brother
    Simon’s Brother Member Posts: 6,816 ✭✭✭

    Any desktop searching engine? :-)


    I'll concede on that one, but I never thought of them because I never found them of any value and turned them off pretty quickly, as they just seem to hog system resources. I seem to get better results looking for a file on my hard drive using the standard Windows search function... the indexer never seems to find the file.


  • Nigel Cunningham
    Nigel Cunningham Member Posts: 181 ✭✭

    Hi Bob.

    I don't think anyone is suggesting that the index should be prebuilt. We're just wondering whether adding X resources to a library of Y books really requires reindexing the Y books as well as the X books. Surely you'd be able to reuse just about all of the information from the previous version of the index?

    Regards,

    Nigel

  • Alex Scott
    Alex Scott Member Posts: 718

    If you're concerned about interruptions from Microsoft updates, why not change it to notification of updates instead of automatic install? That's what I use. They notify you with a shield on the task bar and then you choose the time to download and install them. You also get the option to choose whether you want all the updates, too.

    Longtime Logos user (more than $30,000 in purchases) - now a second class user because I won't pay them more every month or year.

  • Jacob Hantla
    Jacob Hantla Member, MVP Posts: 3,871 ✭✭✭

    If you're concerned about interruptions from Microsoft updates, why not change it to notification of updates instead of automatic install? That's what I use. They notify you with a shield on the task bar and then you choose the time to download and install them. You also get the option to choose whether you want all the updates, too.

    You're right, Alex; that would be easy for us to do. But the point of what we are doing, I think, is to "be normal users" who do "normal user type things" and then report all the annoyances that normal users would experience, so that those annoyances can be minimized and the user has to do as few workarounds as possible.

    Jacob Hantla
    Pastor/Elder, Grace Bible Church
    gbcaz.org

  • Mark Smith
    Mark Smith Member, MVP Posts: 11,790 ✭✭✭

    While I don't run too many searches in L3 that take a long time I'd be glad to pay some price for very rapid searches in L4. However, I'd like to have some control over when reindexing occurs as others have said. Since it takes a long time, I'd like to schedule it when I don't care that it is running in the background.

    If a way can be made to just incrementally add to the index file rather than starting all over again, that would seem to be a very good solution as well. Do give us some control over reindexing in any event, please.

    Pastor, North Park Baptist Church

    Bridgeport, CT USA

  • Bob Pritchett
    Bob Pritchett Member, Logos Employee Posts: 2,280

    We could save some time by merging new books into existing indexes, but less than you'd think. It also makes the process slower and takes even more disk space.
    Indexing everything at once lets us store everything very tightly, and retrieve it quickly.

    Systems that support deletion and insertion are slower and take more space.


    Massive simplification:


    If I build an index of "red", "blue", and "green" in documents 1-8, it looks like this on the hard drive:


    blue:123568;green:1245678;red:2357

    Minimal space, quick to access. (MUCH smaller than a relational database.) Now, add "cyan" as a new word, in alphabetical order, and add document 9, which has a hit for all four colors.


    blue:1235689;cyan:9;green:12456789;red:23579

    Almost every byte of the file (which could be gigabytes) had to be examined, compared, and moved. Of course there are techniques, like leaving space for insertions, moving data to new pages, etc. But very quickly those things double/triple the size of the database. Fine normally, but we're already at multiple gigabytes for many users. And, the bigger the file (even if it's just padded with space, or indexes to allow things to move around) the more bytes we read off the hard disk, which is the slowest part of the entire process.
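
    Here's that simplification as a toy Python sketch, just to make the rewrite cost concrete (single-digit document ids to match the example above; our real index format is nothing this naive):

        def write_index(index, path):
            """Serialize {word: [doc ids]} as 'blue:123568;green:1245678;...' in one flat run."""
            entries = (word + ":" + "".join(str(d) for d in sorted(docs))
                       for word, docs in sorted(index.items()))
            with open(path, "w") as f:
                f.write(";".join(entries))

        def add_document(path, doc_id, words):
            """Adding one document means reading, updating, and rewriting the whole file."""
            index = {}
            with open(path) as f:
                for entry in f.read().split(";"):
                    word, docs = entry.split(":")
                    index[word] = [int(c) for c in docs]   # single-digit ids, toy format only
            for word in words:
                index.setdefault(word, []).append(doc_id)
            write_index(index, path)                       # every byte gets written back out

        # write_index({"blue": [1,2,3,5,6,8], "green": [1,2,4,5,6,7,8], "red": [2,3,5,7]}, "idx.txt")
        # add_document("idx.txt", 9, ["blue", "cyan", "green", "red"])
        # idx.txt is now: blue:1235689;cyan:9;green:12456789;red:23579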

    Of course merging the index skips the "reading the books" stage, and that has some benefits, but it may not compensate enough. We'll continue to look at it. We're doing lots of testing (and research) to find the optimal solution.

  • Nigel Cunningham
    Nigel Cunningham Member Posts: 181 ✭✭

    Thanks for the reply Bob.

    Makes a lot of sense.

    Nigel

  • spitzerpl
    spitzerpl Member Posts: 4,998

    We could save some time by merging new books into existing indexes, but less than you'd think. It also makes the process slower and takes even more disk space.

    ...

    Of course merging the index skips the "reading the books" stage, and that has some benefits, but it may not compensate enough. We'll continue to look at it. We're doing lots of testing (and research) to find the optimal solution.

    I admit my complete ignorance here. Google indexes massive amounts of information. What is the difference that enables them to get info so quickly? Is it 1) one of their secrets, 2) money and resources, 3) having it on servers vs. local machines, 4) the type of info they categorize, or 5) a bit of all the above?

  • Bob Pritchett
    Bob Pritchett Member, Logos Employee Posts: 2,280

    I doubt their indexing is actually faster; they're brute forcing with lots of machines and memory. (And you aren't watching their indexing speed, just their searching speed. Our searching speed is pretty good, too.)

    Google literally keeps everything in memory, distributed over a massive number of computers. (See http://www.labnol.org/internet/search/google-query-uses-1000-machines/7433/, which reveals that a single Google search involves 1,000 machines and that the entire search index (which means, essentially, the entire Internet) is in memory.)
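
    To picture that "everything in memory, spread over many machines" pattern, here's a toy scatter-gather sketch (plain Python threads standing in for separate servers; the shard contents are invented, and this is obviously nothing like Google's or our actual code):

        from concurrent.futures import ThreadPoolExecutor
        import heapq

        # Each "shard" holds its slice of the index entirely in memory.
        shards = [
            {"grace": [(0.9, "doc-a"), (0.4, "doc-b")]},
            {"grace": [(0.7, "doc-c")], "mercy": [(0.8, "doc-d")]},
        ]

        def search_shard(shard, term):
            """One machine answers for only its own slice; no disk is touched."""
            return shard.get(term, [])

        def search(term, top_k=10):
            """Fan the query out to every shard, then merge the partial results."""
            with ThreadPoolExecutor(max_workers=len(shards)) as pool:
                partials = list(pool.map(search_shard, shards, [term] * len(shards)))
            merged = heapq.merge(*(sorted(p, reverse=True) for p in partials), reverse=True)
            return list(merged)[:top_k]

        print(search("grace"))   # [(0.9, 'doc-a'), (0.7, 'doc-c'), (0.4, 'doc-b')]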

    On a powerful machine here we could index every book we have overnight. (I'm pretty certain; Bradley can correct me if I'm off.) Then we could serve up speedy searches over the web, just like Google. But remember, Google A) sends you just the top 10 results on the first page, B) doesn't filter the results to your particular list of websites.

    We actually thought about hosting the whole index online and serving results over the web, but users have told us they want to be able to search without being connected to the Internet, and people want to see just results from their books, not a majority of the results coming from books they haven't purchased and can't access. And you all seem to want to see more than 10 results. :-)

    Each machine has different hardware specs, too; a particular limitation is hard disk speed. Notebooks often have slower drives (5400 RPM) than desktops (7200 RPM), for example.

    But if you want to put us up against Google, I'm ready. To make it fair though, you've got to ask them to index the Internet on your machine or else give us 1,000 machines to store everything in memory. :-)

  • Simon’s Brother
    Simon’s Brother Member Posts: 6,816 ✭✭✭

    If you're concerned about interruptions from Microsoft updates, why not change it to notification of updates instead of automatic install? That's what I use. They notify you with a shield on the task bar and then you choose the time to download and install them. You also get the option to choose whether you want all the updates, too.


    This is what I use also, but I am thinking of the user Bob is trying to target; they may not know about that or even feel comfortable with it... it is a great suggestion though, Alex, for those of us who are beta testing...

  • spitzerpl
    spitzerpl Member Posts: 4,998

    But if you want to put us up against Google, I'm ready. To make it fair though, you've got to ask them to index the Internet on your machine or else give us 1,000 machines to store everything in memory. :-)

    I've always wondered how they did it. Thanks for the explanation. Though I don't agree with every book you offer, Google would lose hands down on their ability to deliver the truth :-)

    If you did want to do some over-the-web results, you could have it as a separate search, sort of like your Library search in 3.0. That would give the same benefits we gained from being able to search locked books in 3.0 without bogging down our computers.

    I in no way want you to compete with Google. I would hate to know what you would have to charge for books then!

  • Simon’s Brother
    Simon’s Brother Member Posts: 6,816 ✭✭✭

    ...

    Of course merging the index skips the "reading the books" stage, and that has some benefits, but it may not compensate enough. We'll continue to look at it. We're doing lots of testing (and research) to find the optimal solution.


    Thanks for bearing with us, Bob. Even if I am pushing hard on some of these things, I do appreciate the effort put into this release...

  • Nigel Cunningham
    Nigel Cunningham Member Posts: 181 ✭✭

    The amount of raw processing power is certainly important, but let's not also forget that your algorithms and data structures make an even bigger difference, especially as the volume of data increases. You've clearly picked algorithms and data structures that give excellent searching capability but - dare I say it - have a horrible insertion cost at the moment!

    By the way, re the 1000 machines, I think you'd have to give us a thousand machines each as part of our book purchases, not us give them to you! :)

    Nigel

  • Damian McGrath
    Damian McGrath Member Posts: 3,051 ✭✭✭

    The saga of reindexing comes to a conclusion:

    My notebook finished indexing my library in 11 1/2 hours. That's a big difference from my netbook (19 1/2 hrs). The compacting took 2 hours and 20 minutes, compared to 4 1/2 hours on the netbook.

    How much of a difference does having a 7200 RPM disk drive make? Everything else is about equal.

    It took 46 minutes to index the Logos 4 version of the LXX - that's an enormous amount of time for a newly re-written piece. Mind you, it took 1 hr and 46 minutes on my netbook - whoa! Three other books took over an hour to index on the netbook...

    I really don't want to go through this again in a real hurry.


  • Bradley Grainger (Logos)
    Bradley Grainger (Logos) Administrator, Logos Employee Posts: 11,950

    It took 46 minutes to index the Logos 4 version of the LXX - that's an enormous amount of time for a newly re-written piece. Mind you, it took 1 hr and 46 minutes on my netbook - whoa! Three other books took over an hour to index on the netbook...

    All resources with associated reverse interlinears index incredibly slowly in Beta 1—sometimes 20x slower than the equivalent Bible without a reverse interlinear. This is probably the highest priority indexing speed bug that we're working on, but it probably won't be fixed for Beta 2. (Assuming that without the reverse interlinear it would take 5 minutes, and assuming that you have the LXX, ESV, NRSV, NASB, NKJV, and LEB reverse interlinears, you could probably see a speed-up of about four hours if reverse interlinear indexing can be brought close to the speed of indexing normal resources.)
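
    Spelling that estimate out (a back-of-the-envelope sketch; the per-resource times are the assumptions above, not measurements):

        # Rough arithmetic behind the "about four hours" figure.
        minutes_with_ri = 45     # roughly what Damian reported for the LXX reverse interlinear
        minutes_without_ri = 5   # assumed time for the same resource with no reverse interlinear
        ri_resources = 6         # LXX, ESV, NRSV, NASB, NKJV, LEB

        saved = ri_resources * (minutes_with_ri - minutes_without_ri)
        print(saved / 60)        # 4.0 hours saved if RI indexing matches normal resources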

  • Damian McGrath
    Damian McGrath Member Posts: 3,051 ✭✭✭

    (Assuming that without the reverse interlinear it would take 5 minutes, and assuming that you have the LXX, ESV, NRSV, NASB, NKJV, and LEB reverse interlinears, you could probably see a speed-up of about four hours if reverse interlinear indexing can be brought close to the speed of indexing normal resources.)

    I have them all. Watching the log file for indexing I could see how much time they took. Bring on Beta 3!

  • Kevin A. Purcell
    Kevin A. Purcell Member Posts: 3,405 ✭✭✭

    Forgive my ignorance; I know what I am about to ask may be like asking why Detroit doesn't deliver cars with enough gas to run forever, but I'll ask.

    Why can't you deliver the program preindexed? Index on your end and then ship it so the index files are already in place. Is there a technical reason for not doing this? Is it because everyone has a different library?

    If it is the latter then why can't you just index each book and tell the database to add that book once I have it installed?

    I'm obviously not a programmer. I'm sure someone on your end has to be as smart as me, or more likely smarter, and had to think of this. But I wanted to ask.

    Dr. Kevin Purcell, Director of Missions
    Brushy Mountain Baptist Association

    www.kevinpurcell.org

  • Bob Pritchett
    Bob Pritchett Member, Logos Employee Posts: 2,280

    Why can't you deliver the program preindexed?

    Yes, it's because everyone has a different library. We intend to deliver a pre-built index for each core collection, but users who have purchased other books will need to build a new index to incorporate those other books.

  • Dave Hooton
    Dave Hooton Member, MVP Posts: 35,672 ✭✭✭

    I had an update when I started up my laptop today.

    Please describe the process and how you kept track of its progress. What was updated?

    Dave
    ===

    Windows 11 & Android 13

  • Jacob Hantla
    Jacob Hantla Member, MVP Posts: 3,871 ✭✭✭

    Bob-

    I don't think my problem is the indexing per se; it's that while it is indexing, it kills the performance of other apps, makes my computer run hot, and hurts battery life.


    Would it be possible to run the reindexing as a lower-priority operation in the background? I think we're understanding and appreciative of the initial indexing process; it is the computer-slowing reindexing that is the pain.
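
    Something like this is all I mean (an illustrative sketch of a worker process deprioritizing itself, using the third-party psutil package; I have no idea how the real indexer is structured):

        import sys
        import psutil  # third-party: pip install psutil

        def deprioritize_current_process():
            """Ask the OS to schedule this process behind interactive applications."""
            proc = psutil.Process()
            if sys.platform == "win32":
                proc.nice(psutil.BELOW_NORMAL_PRIORITY_CLASS)  # Windows priority class
            else:
                proc.nice(10)  # Unix niceness: a higher value means a lower priority

        if __name__ == "__main__":
            deprioritize_current_process()
            # ...long-running indexing work would go here...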


    Jacob Hantla
    Pastor/Elder, Grace Bible Church
    gbcaz.org

  • Terri & Keith
    Terri & Keith Member Posts: 17

    Indexing resources never finishes. I shut off automatic hibernate, standby, and updates, which appears to stop the indexing short of completion, only to start over again at startup. Returning hours later expecting indexing to be finished, I see no indication it finished. Then, when initiating a search, a note indicates the resources are not indexed. When I open Logos 4, indexing begins again. I am looking for a post on the ABC's of indexing with a notebook computer. BTW, on my tower office computer there is no problem; all resources are indexed. Please direct me to what I should do or look for so indexing will finish on my notebook computer. Thank you.

  • Terri & Keith
    Terri & Keith Member Posts: 17

    While the notebook computer is indexing the resource files, I have been working on an essay, watching the resource counter decrease. But the counter just disappeared, and when trying to do a search I get an indication that the resource files are being indexed. I do not believe the files are being indexed. The computer sat running for days with no change in this condition.

  • Justin Langley
    Justin Langley Member Posts: 19 ✭✭

    Ditto to Keith's post.

    I've tried 4 times now, and each time indexing starts over.

    And, I haven't added any new resources since I got Logos4.

    --Justin Langley

    Windows XP Home, running via Parallels on Macbook Pro
    2.4 GHz Intel Core 2 Duo
    2 GB 667 MHz DDR2 SDRAM

  • Terri & Keith
    Terri & Keith Member Posts: 17

    It just occurred to me that maybe, just maybe, the reason why the notebook computer will not allow the indexing to complete is because I am using "slimbrowser" for my internet software. Would this make a difference? I am not a computer kid, just asking. Thank you.

  • J.R. Miller
    J.R. Miller Member Posts: 3,566 ✭✭✭

    Keith, from what I understand, v4 has no connection to your browser, so that should not matter anymore.

    My Books in Logos & FREE Training

  • John Minter
    John Minter Member Posts: 16 ✭✭

    Bob, you guys REALLY need to rethink this indexing scheme. Users just WILL NOT put up with this multi-hour re-index every time a resource is changed. It is a major time waster for people who just want to get work done. I have to tell you that my irritation with indexing eclipses my appreciation of the work you put into the new features in 4.0.