Collections Performance hit question

MVP Posts: 2,485
edited November 2024 in English Forum

Considering that collections can be a major culprit for performance degradation, I'm wondering whether it makes a difference how a collection is defined. I can think of at least 3 distinct (but not mutually exclusive) ways to create a collection:

1) Rule-based using Title, Series, etc.

2) Rule-based using (only) MyTag:x

3) "plus these resources"

I would love to know if any of these makes a difference in performance vs. another way.

Comments

  • MVP Posts: 36,158

    2) Rule-based using (only) MyTag:x

    That is the most efficient, as it eliminates the multiple instances of titles, authors, series, etc. from 1). Evaluating those rules takes time. You can start with predefined collections formatted like 1) and then tag multiple resources at once in Library, e.g. by limiting Library to the collection.

    3) is usually needed with 1), but is unnecessary if you use 2).

    2) will always require manual updating as you add resources to Library (including monthly review ones if you care about temporary resources). 1) + 3) will require a review to ensure the (dynamic) rules cover a new resource.

    Dave
    ===

    Windows 11 & Android 13

  • Member Posts: 941 ✭✭

    I cannot speak regarding performance hit one way or the other, but I have always made extensive use of mytag: whenever I obtain new resources and have largely found that this removes the need to use collections. If I am searching for something I will just search the resources I have tagged with whatever rather than searching a collection. I also use mytag: when creating custom guides. That said, about once every year or two I will download the latest collection rules that Mark Barnes and others have helped create regarding categorizing commentaries and denominational affiliations of various authors. I will then batch update my tagging as needed, delete the collections I downloaded, and go on with life. Does it take more effort on the front end? Absolutely. Would I do it again? Absolutely.

  • MVP Posts: 2,485

    That is the most efficient

    Thanks, Dave! Is this true even over 3)? I suspected that 2) had the biggest performance hit, but thought that perhaps 3) by itself would be the most efficient since it needn't "evaluate" anything. 3) can be achieved in a similar fashion to bulk tagging, via the "save as collection" option in the library, so it's not necessarily effort-prohibitive.

  • MVP Posts: 2,485

    Another question: Do collections affect performance even if not being used in a layout? If so, is it only during startup?

  • Member Posts: 15,432 ✭✭✭

    I would love to know if any of these makes a difference in performance vs. another way.

    If you have permanent logging enabled, slow (> 250ms) collection rules are logged. Search for lines containing the text "Searching for all records matching: ".
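For anyone who would rather pull those lines out of a log than scroll for them, here is a minimal Python sketch. It does nothing more than a substring match on the text quoted above, and it assumes nothing about the rest of the line format. The function names are made up for illustration, and the path in the usage comment is a placeholder for wherever your own log file lives.

```python
from pathlib import Path
from typing import Iterable, List

# Marker text taken from the post above; everything else on the line is left untouched.
MARKER = "Searching for all records matching: "


def find_slow_rule_lines(lines: Iterable[str]) -> List[str]:
    """Return every line that contains the marker, without its trailing newline."""
    return [line.rstrip("\n") for line in lines if MARKER in line]


def scan_log_file(path: Path) -> List[str]:
    """Scan one log file; undecodable bytes are replaced rather than raising."""
    with path.open(encoding="utf-8", errors="replace") as handle:
        return find_slow_rule_lines(handle)


# Usage (placeholder path, substitute your own log file):
# for hit in scan_log_file(Path("path/to/your.log")):
#     print(hit)
```

Each matching line corresponds to a collection rule that crossed the slow threshold, so the rules that show up most often are the first candidates for simplifying.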

    Large collections will cause slow-downs in two main places:

    • Startup
    • Any time anything changes in your library (new resources, tags/ratings added, community tags downloaded, metadata updated, etc.)

    At this time all collections are calculated, regardless of whether they're in use.

    On my spinning HDD, my big commentary collection rules take around 1.5s to 7s to evaluate. On my SSD, it's 0.2s to 5s. That's with a very large library, so most users will see much less of a performance hit. It's also one of a series of operations that happens in parallel, so the real-time delay is probably much less than this.

    However, bear in mind that every time you add tags, your entire library catalog has to rebuild (even if you just add one tag to one resource). This also triggers rebuilding all the collections. Rebuilding the library catalog can take several minutes if you have lots of tags and lots of resources. Then they've all got to sync as well. I therefore wouldn't recommend frequent tagging if you're trying to optimise for speed.

    In testing, I added a tag to 1,000 resources. It took just over 2 minutes to update the library catalog, and a further minute to sync. However, building the collection took less than 250ms. The same collection generated by rules takes around 1-2s on my spinning HDD. It's not possible to test how long it would take to build the collection with "plus these resources", as that isn't logged, although one would imagine it would be even quicker than tags.

    So it comes down to whether you prefer a one-off delay of 2 minutes, or a repeated delay (on startup and library update) of 1 second. I prefer the latter.
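To put rough numbers on that trade-off, here is a small sketch of the break-even arithmetic. The inputs are read off the figures above: about 2 minutes for the catalog plus 1 minute to sync, 1-2s for the rule-based collection, and 250ms as an upper bound for the tagged one. The 1.5s midpoint and the function name are assumptions for illustration, and the result will differ with other hardware and library sizes.

```python
import math


def break_even_rebuilds(one_off_cost_s: float,
                        rule_rebuild_s: float,
                        tag_rebuild_s: float) -> int:
    """Number of collection rebuilds after which the one-off tagging cost is repaid."""
    saving_per_rebuild = rule_rebuild_s - tag_rebuild_s
    if saving_per_rebuild <= 0:
        raise ValueError("tag-based rebuild must be faster than rule-based rebuild")
    return math.ceil(one_off_cost_s / saving_per_rebuild)


# Assumed inputs: 120s catalog update + 60s sync, 1.5s rule rebuild, 0.25s tag rebuild.
one_off_cost_s = 120 + 60
print(break_even_rebuilds(one_off_cost_s, 1.5, 0.25))  # 144
```

With these inputs the one-off tagging cost is only recovered after 144 startups or library updates. Since collection evaluation runs in parallel with other operations, the real saving per rebuild is probably smaller, which pushes the break-even point further out.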

    PS - I performed testing with my Expository Collections rule.

    This is my personal Faithlife account. On 1 March 2022, I started working for Faithlife, and have a new 'official' user account. Posts on this account shouldn't be taken as official Faithlife views!

  • MVP Posts: 2,485

    Thanks for your detailed response, Mark! 

    If the "library updating" banner is any indication, it only took 18 seconds to update after I added a tag to 1k resources (I "only" have ~7100 resources). 

    What I'm considering doing (feedback welcome) is to have a note that contains any collection rule I may want to update/modify. I can simply copy/paste one of these into my library window and choose "save as collection". This would arguably give me the best (ongoing) performance while not being a terrible task to maintain/update.

  • Member Posts: 15,432 ✭✭✭

    If the "library updating" banner is any indication, it only took 18 seconds to update after I added a tag to 1k resources (I "only" have ~7100 resources)

    My library's three times the size, but it took six times as long, possibly because I tested it on an HDD rather than an SSD.

    What I'm considering doing (feedback welcome) is to have a note that contains any collection rule I may want to update/modify. I can simply copy/paste one of these into my library window and choose "save as collection". This would arguably give me the best (ongoing) performance while not being a terrible task to maintain/update.

    The only disadvantage of this is that if you have (e.g.) custom guides that use collections, you're going to need to re-edit the templates every time you create a new collection in this way. You'll obviously need to delete the old collections, too.

    It would certainly be interesting to know whether replacing all your dynamic collections with static ones gives a noticeable enough improvement to make it worthwhile. Even on my HDD, Logos starts in 30s. Shaving even 30% off that time wouldn't be worthwhile for me.

