Can we get logging to default to "on"?

Todd Phillips
Todd Phillips Member Posts: 6,737 ✭✭✭

I'm not seeing the wisdom to not having logging on by default.  The Mac version does log by default.  Every time someone comes here with a problem, we have to explain how to turn on logging. 

Could this be changed for the next release?  or am I just missing something?

MacBook Pro (2019), ThinkPad E540

Comments

  • TCBlack
    TCBlack Member Posts: 10,980 ✭✭✭

    I would agree that it needs to be on by default in every beta version.  But  not for gold releases.  

    Logging does slow down the system even by a small degree, and on older systems (more than a year or two old) that seems to be significant.  

    Hmm Sarcasm is my love language. Obviously I love you. 

  • Danny Parker
    Danny Parker Member Posts: 437 ✭✭

    Disagree. Performance is a major issue for all users. Why confuse new beta performance with logging slow downs? Maybe need to make instructions more visible instead.

  • Danny Parker
    Danny Parker Member Posts: 437 ✭✭

    Maybe make it a sticky on the Beta forum, or include instructions (or a pointer to them)  in every release announcement.

  • Todd Phillips
    Todd Phillips Member Posts: 6,737 ✭✭✭

    Disagree. Performance is a major issue for all users. Why confuse new beta performance with logging slow downs? Maybe need to make instructions more visible instead.

    Well, I haven't ever had logging off long enough to even know if there is a performance benefit.  But logging is necessary enough (even for non-beta users) that if it isn't on by default, it should have a easily seen toggle switch in the Program Settings. (Which I would argue should be set to "on" by default, but easily switched to off)

    MacBook Pro (2019), ThinkPad E540

  • Bradley Grainger (Logos)
    Bradley Grainger (Logos) Administrator, Logos Employee Posts: 12,072

    But logging is necessary enough (even for non-beta users)

    The most useful logs that we developers get are the Logos4Crash.txt reports, which are always generated whether logging is on or off. (From my perspective, the main logs often contain a lot of noise, don't necessarily help pinpoint the problem, and do slow down the application. They are still useful in some circumstances, though.)

  • Mark Barnes
    Mark Barnes Member Posts: 15,432 ✭✭✭

    Bradley's right - from a developers point of view, but not from a support point of view, which is Todd's point.

    I wouldn't want logs turned on my default. Useful logs would soon get over-written by less useful ones. But it would be nice to (a) have logging on by default in beta builds, and (b) make it easier to turn on for normal builds.

    This is my personal Faithlife account. On 1 March 2022, I started working for Faithlife, and have a new 'official' user account. Posts on this account shouldn't be taken as official Faithlife views!

  • Dave Hooton
    Dave Hooton MVP Posts: 36,106

    Maybe make it a sticky on the Beta forum

    Logs are also relevant outside the beta forum but logging should not be ON by default for reasons already stated.

    Have a sticky in each relevant Forum (PC Beta, Logos4 as a minimum) with a title like "Problem reporting: How to produce diagnostic logs". The content should cover:-

    • Why logs are needed
    • Where to find crash reports
    • How and when to enable permanent logging eg. when asked or if a crash does not occur
    • How to upload (compressed) logs

    From my (support) point of view logs tell me far more than a crash report especially with minimum information from the user.

    Dave
    ===

    Windows 11 & Android 13

  • Dave Hooton
    Dave Hooton MVP Posts: 36,106

    The most useful logs that we developers get are the Logos4Crash.txt reports

    We know that developers prefer the application to Crash rather than produce a useful message[:)] Will there come a time when crashing can be obviated?

    Dave
    ===

    Windows 11 & Android 13

  • Todd Phillips
    Todd Phillips Member Posts: 6,737 ✭✭✭

    The most useful logs that we developers get are the Logos4Crash.txt reports, which are always generated whether logging is on or off. (From my perspective, the main logs often contain a lot of noise, don't necessarily help pinpoint the problem, and do slow down the application. They are still useful in some circumstances, though.)

    Not every problem is a crash, especially when it comes to performance issues and indexing problems.  Plus, crashes usually get quick attention from the devs, but it's those other nagging problems that the forum users often have to analyze ourselves before Logos staff gets a chance to look at it.

    MacBook Pro (2019), ThinkPad E540

  • Dave Hooton
    Dave Hooton MVP Posts: 36,106

    Dave
    ===

    Windows 11 & Android 13

  • JimTowler
    JimTowler Member Posts: 1,383 ✭✭✭

    We know that developers prefer the application to Crash rather than produce a useful messageSmile Will there come a time when crashing can be obviated?

    For me, its NASTY that Logos4 always crashes, rather than log something and continue to work correctly (except for whatever thing just failed to load/display/save/whatever)

    Please Logos: Fix the application so its not always crashing.

  • The Mac version does log by default.

    Earlier Logos 4 Mac Alpha releases did not log by default - changed in Alpha 18 (no longer needed to download nlog.config file and move to logging folder).

    Wonder about set update channel command enhancement - if Beta, enable logging.  If changing from Beta, disable logging.

    Keep Smiling [:)]

  • Bradley Grainger (Logos)
    Bradley Grainger (Logos) Administrator, Logos Employee Posts: 12,072

    We know that developers prefer the application to Crash rather than produce a useful messageSmile Will there come a time when crashing can be obviated?

    We actually prefer it to crash than corrupt your data. [:)]

    By definition, when the application crashes, it's in an unanticipated and unknown state."Recovering from a crash would need to move the app from an arbitrary unknown state to a known good state. A botched attempt at recovery could persist bad data to disk (or worse, the sync store). Yes, there is some data loss when the app crashes (e.g., current layout isn't saved), but there's little chance of permanent damage, because all further action is halted.

    The beta can be very unstable, but data from Windows Error Reporting indicates that actual crashes in the released version are comparatively rare; we use feedback from the beta and from WER to create service releases to reduce even further the incidence of crashing.

  • Rosie Perera
    Rosie Perera Member Posts: 26,194 ✭✭✭✭✭


    We actually prefer it to crash than corrupt your data. Smile

    Would you like me to add this post to the "Logos Speaks" page on the wiki? [:O]

    I'd prefer it to not crash and not corrupt my data. Where I come from, both would be considered Level 1 bugs (highest priority, show-stopping, even recall if it's a common scenario).

  • MJ. Smith
    MJ. Smith MVP Posts: 54,787

    Where I come from, both would be considered Level 1 bugs (highest priority, show-stopping, even recall if it's a common scenario).

    That has not been my experience with PC software.  Generally, I have a reasonable crash rate in the betas  - my complaint is with the length of time I am down waiting for the standard Windows error report to generate. I also get frustrated at the errors that slip through regression testing. But I think Logos responds very quickly to production crashes now that the product is stable. That said, if I ran Logos development and testing and had an infinite budget I would choose a different development strategy.

    Orthodox Bishop Alfeyev: "To be a theologian means to have experience of a personal encounter with God through prayer and worship."; Orthodox proverb: "We know where the Church is, we do not know where it is not."

  • Dave Hooton
    Dave Hooton MVP Posts: 36,106

    We actually prefer it to crash than corrupt your data.

    One can conclude that crashes occur only when data corruption is possible; yet many crashes occur because of an invalid state on or after a file read eg. Biblical People crashing on "Holy Spirit". It should be possible to recover gracefully such that one can continue to work from a known state or at least prevent repeated crashing because that name has become the default for accessing BP from the menu. This was the thrust of my question.

    Dave
    ===

    Windows 11 & Android 13

  • MJ. Smith
    MJ. Smith MVP Posts: 54,787

    yet many crashes occur because of an invalid state on or after a file read eg. Biblical People crashing on "Holy Spirit"

    This is why a TRY-CATCH is my favorite construct. And why testing requires careful analysis to create the test cases.

    Orthodox Bishop Alfeyev: "To be a theologian means to have experience of a personal encounter with God through prayer and worship."; Orthodox proverb: "We know where the Church is, we do not know where it is not."

  • Bradley Grainger (Logos)
    Bradley Grainger (Logos) Administrator, Logos Employee Posts: 12,072

    We actually prefer it to crash than corrupt your data.

    One can conclude that crashes occur only when data corruption is possible; yet many crashes occur because of an invalid state on or after a file read eg. Biblical People crashing on "Holy Spirit". It should be possible to recover gracefully such that one can continue to work from a known state or at least prevent repeated crashing because that name has become the default for accessing BP from the menu. This was the thrust of my question.

    We looked into what it would take to make Logos 4 more like Chrome (where each tab is isolated in a separate process, and if one crashes, just that tab can be closed, not the entire application). Unfortunately, it would be a much bigger undertaking than we're able to tackle right now, so we have to stick with the IE7 model (where a crash takes down the entire application).

  • Dave Hooton
    Dave Hooton MVP Posts: 36,106

    so we have to stick with the IE7 model (where a crash takes down the entire application).

    Are you stating that a developer cannot manage the conditions for a crash irrespective of the model ie. you cannot interrogate a Null Exception before it crashes?

    Dave
    ===

    Windows 11 & Android 13

  • Bradley Grainger (Logos)
    Bradley Grainger (Logos) Administrator, Logos Employee Posts: 12,072

    so we have to stick with the IE7 model (where a crash takes down the entire application).

    Are you stating that a developer cannot manage the conditions for a crash irrespective of the model ie. you cannot interrogate a Null Exception before it crashes?

    Sure, we could catch all NullReferenceExceptions... but then what? It's not as bad as C++, where the offending code might have trashed some other part of the program's memory, but the application is still in a rather unknown state. There's not really a simple way to retry the operation and see if it succeeds this time. And simply ignoring the exception leaves open the possibility that internal data structures have been partially updated and could cause even worse problems.

    If the crashing code has entered a mutex, should you release it or keep holding it when handling the exception? If you release it, data structures that were intended to be updated atomically will now be visible with partial changes, a situation that none of the code in the application will have been tested against. If you keep the mutex, you've now lost any chance to release it and the app will hang as soon as some other thread tries to acquire it.

    Then there are certain types of exceptions that are really not able to be handled, e.g., OutOfMemoryException or StackOverflowException.  In these cases, there may not even be enough system resources to run our exception handling code. In fact, .NET 4 has been changed to treat these as "critical exceptions" and not allow user code to handle them by default.

    Anyway, if it were an easy and safe problem to solve, we would solve it; I don't believe it is.

  • Rosie Perera
    Rosie Perera Member Posts: 26,194 ✭✭✭✭✭


    Sure, we could catch all NullReferenceExceptions... but then what? It's not as bad as C++, where the offending code might have trashed some other part of the program's memory, but the application is still in a rather unknown state. There's not really a simple way to retry the operation and see if it succeeds this time. And simply ignoring the exception leaves open the possibility that internal data structures have been partially updated and could cause even worse problems.

    Aren't nearly all NullReferenceExceptions caused by faulty code in the application anyway? Some pointer ended up null which shouldn't have been and then the program is trying to reference it? The point of catching the NullReferenceExceptions isn't so you can then fail gracefully -- it might indeed be impossible -- but it's so that you can find the earlier bug that led to the null reference in the first place. We used to catch these in MS Word by throwing up assertion failures (in the debug version only) whenever any situtation happend that we hadn't expected to ever occur and which we hadn't written code to handle. In other words, if we expected that some pointer shouldn't be zero because we were about to reference it, we'd write Assert(pFoo != NULL). Assert was a macro that resolved to putting up a message box telling us the offending line in the code. So during the testing/debugging phase, if we ever saw an asssertion failure (which might very well be followed by a crash) we would know "Ah! I guess that situation can come up after all, and we'll have to write code to handle it." We had Asserts all over the place in the code. Much better to get annoying assertion failures in your face while testing than to have the program crash due to an unexpected situation that users then have to see. You want to not let users ever see the program crash, as that reflects very badly on the skill of the developers.

    Then there are certain types of exceptions that are really not able to be handled, e.g., OutOfMemoryException or StackOverflowException.  In these cases, there may not even be enough system resources to run our exception handling code. In fact, .NET 4 has been changed to treat these as "critical exceptions" and not allow user code to handle them by default.

    The way to handle that is allocate some memory reserve at program start up which you can release if you ever get into a serious out-of-memory situation. It will give you just enough memory to put up a message to the user saying memory is running low, please close the program and reboot your computer as soon as you can. It's better to let the user know why something is happening and give him/her a sense of control over it than have the program just fail catastrophically.

  • Bradley Grainger (Logos)
    Bradley Grainger (Logos) Administrator, Logos Employee Posts: 12,072

    Aren't nearly all NullReferenceExceptions caused by faulty code in the application anyway? Some pointer ended up null which shouldn't have been and then the program is trying to reference it? The point of catching the NullReferenceExceptions isn't so you can then fail gracefully -- it might indeed be impossible -- but it's so that you can find the earlier bug that led to the null reference in the first place. We used to catch these in MS Word by throwing up assertion failures (in the debug version only) whenever any situtation happend that we hadn't expected to ever occur and which we hadn't written code to handle. In other words, if we expected that some pointer shouldn't be zero because we were about to reference it, we'd write Assert(pFoo != NULL). Assert was a macro that resolved to putting up a message box telling us the offending line in the code. So during the testing/debugging phase, if we ever saw an asssertion failure (which might very well be followed by a crash) we would know "Ah! I guess that situation can come up after all, and we'll have to write code to handle it." We had Asserts all over the place in the code. Much better to get annoying assertion failures in your face while testing than to have the program crash due to an unexpected situation that users then have to see. You want to not let users ever see the program crash, as that reflects very badly on the skill of the developers.

    We have assertions and use them to find and fix bugs. In a Release build, they get compiled out, so this code will still crash:

    ASSERT(pFoo != NULL);
    pFoo->Method(); // assume pFoo uses 'this'

    This seems orthogonal to the original topic, which was whether the second line should still be allowed to crash the program in a Release build. Not catching the exception it throws allows us to quickly discover the cause of the bug when it's reported from the field (via Logos4Crash.txt or WER).

    The way to handle that is allocate some memory reserve at program start up which you can release if you ever get into a serious out-of-memory situation. It will give you just enough memory to put up a message to the user saying memory is running low, please close the program and reboot your computer as soon as you can. It's better to let the user know why something is happening and give him/her a sense of control over it than have the program just fail catastrophically.

    This is extremely difficult in a managed environment, where the runtime requires memory to JIT the error-handling code in order to run it. You can try to use Constrained Execution Regions, but those are very limited in what they can do.

  • Bradley Grainger (Logos)
    Bradley Grainger (Logos) Administrator, Logos Employee Posts: 12,072

    This seems orthogonal to the original topic

    Actually, I guess the original topic was about having logging on by default... I was referring to the first tangent I went off on... [:)]

  • Dave Hooton
    Dave Hooton MVP Posts: 36,106

    Anyway, if it were an easy and safe problem to solve, we would solve it; I don't believe it is.

    I appreciate your candour Bradley. It's difficult for us back seat experts to comprehend the problems posed by multi-threaded, object-oriented API's.

    Dave
    ===

    Windows 11 & Android 13

  • Rosie Perera
    Rosie Perera Member Posts: 26,194 ✭✭✭✭✭


    We have assertions and use them to find and fix bugs. In a Release build, they get compiled out, so this code will still crash:

    ASSERT(pFoo != NULL);
    pFoo->Method(); // assume pFoo uses 'this'

    This seems orthogonal to the original topic, which was whether the second line should still be allowed to crash the program in a Release build. Not catching the exception it throws allows us to quickly discover the cause of the bug when it's reported from the field (via Logos4Crash.txt or WER).

    Thanks for the explanation. You're right, it is orthogonal to the original topic, and forgive me for presuming to tell you how to do your job. I've been out of development for many years now and never did work on a multi-threaded app, so I do not understand the complexities. I have no idea what a "mutex" is, for example. Sounds like you are handling this the best you can. I dare say Word would have crashed in the release build too, if the possiblity of a null reference hadn't been caught during testing.

    I'm grateful for how knowledgeable you Logos developers are and how responsive you are to our bug reports. Keep up the good work!

  • Bradley Grainger (Logos)
    Bradley Grainger (Logos) Administrator, Logos Employee Posts: 12,072

    I appreciate your candour Bradley. It's difficult for us back seat experts to comprehend the problems posed by multi-threaded, object-oriented API's.

    I realised that I might have come across as "shutting the conversation down". And in fact, I did derail the spirit of the conversation onto a fairly esoteric topic of "how should exceptions be handled" (which I would argue is a subject for debate amonst the programmers). However, the bigger question, which is fair for you (as end users to ask) is "Why are there so many uncaught exceptions in the first place, and can their frequency be reduced?"

    In many places, the Logos 4 code is not as robust as it should be when it encounters an unexpected situation. It's as if you typed "Gen 1:100" into Copy Bible Verses and the program crashed because there's no verse 100 in Genesis chapter 1; it's not acceptable behaviour (and it's good for you to point this out). While the input is not 100% correct, there is a fairly easy and obvious fallback behaviour that can executed when this data is encountered. The problem is that each of these situations has to be handled individually (because the right fallback often depends entirely on context), and failure (on the programmer's part) to plan for a contingency often results in the dreaded unhandled exception.

    We're slowly discovering and correcting all these oversights; I'd love to get the code to the place where every reasonable erroneous input is handled without crashing, and only the truly unpredictable (and really unrecoverable) situations (e.g., cosmic ray flipping a memory bit, user yanking the hard drive out of the computer while the program is running, etc.) result in exceptions and crashes.

    In the meantime, the beta programme is a great help for us, as it exercises the code on 300+ hardware/software/OS configurations we may not have tested ourselves, it runs the program with real-world documents (not just mocked up test data), it (probably) runs under many more combinations of Program Settings than we test with, it runs the network client code outside our network, and in general executes the code in many more environments than we can create in our (small) test lab. This, more often than not, exposes a code path that was not anticipated here, resulting in annoying crashes. Frustrating as those may be, they help us create a far more stable "gold master" product for the majority of our customers.

  • Bradley Grainger (Logos)
    Bradley Grainger (Logos) Administrator, Logos Employee Posts: 12,072

    Rosie,

    As I just posted, I myself twisted the conversation away from the original (and much more important) topic of "why can't Logos 4 be more stable", and took us down a somewhat unproductive rabbit trail; I apologise for that.

    Thanks for all the hard work you (and others) do to uncover places where we haven't anticipated the crazy things you'll enter into the software, and where we need to be more robust in handling user input. [;)]

    I hope I can explain a mutex fairly simply: a mutex (or "critical section", or sometimes a "lock") allows a program to ensure that only one function is executing at a given time; it mutually excludes other functions from running. A simple example is a list of data. Assume that we have an array of items, and a count of how many indexes in that array are valid (the list can grow and shrink at runtime). To add an item to the list, we need to set "data[count + 1]" and increment "count". These are two separate operations, and it would be bad for a separate thread running in the background to access the list in between these two operations. If count was incremented first, another thread trying to read the last item in the list might read it before it was set by the first thread. Alternatively, if two threads are adding to the list simultaneously, they might obliterate each other's new value of "count", causing an item to be lost.

    A mutex lets the program ensure that only one thread accesses the list's data structures at once. If one thread is adding an item, the thread that wants to read it will have to wait until the first thread finishes. (Taking control of the mutex, and waiting to receive a notification when it's available are handled by the OS.) Using the mutex is voluntary, so each function that manipulates shared data needs to ensure it acquires all the mutexes it needs to do its job; failing to do so is a bug (and often one that doesn't appear until someone's running the program under heavy load on an 8-core processor...).

    Back to the problem with exceptions: if a thread that throws an exception was holding a mutex, then it's extremely likely that it's updated a part of shared data, but hasn't finished updating it. In the example above, maybe it incremented "count", but didn't set a new item in the "data" array. Or maybe it set a new item, but didn't update "count" yet. Either way, if the exception handling code releases the mutex, it very well could leave this shared data in a partially-updated state, which is likely to crash the next piece of code that tries to use it. If eventually you get to a point where the program can't recover from the errors that are occurring, the snowball that started the avalanche may have been lost minutes or hours ago. It's much easier for us to correct the original bug if the program crashes right away.

    The point of this is that it's difficult to recover from an arbitrary exception. However, as I previously posted, in many cases the exception could be avoided entirely with more robust code, and that's the path we're trying to go down. While it might be nice to simply ignore errors and try to continue, we believe it's ultimately better to crash in an unexpected situation so that the root cause can be found and corrected more quickly.

  • Rosie Perera
    Rosie Perera Member Posts: 26,194 ✭✭✭✭✭

    Bradley,

    Thanks for that good explanation. I'd heard the term "semaphore" before for what you describe as a mutex. I don't know much about database programming, but I seem to recall that when there was the chance of data becoming unstable if a crash happened part way through an operation, the software used a "commit" method to make sure it did all its changing in one fell swoop. That way if something failed during the intermediate stages, the data would be rolled back to the previous stable state before bailing out entirely. I'm not sure if any of this has relevance for the multi-thread situation. Again, we're probably way off track from what this thread started out being about...

  • MJ. Smith
    MJ. Smith MVP Posts: 54,787

    Rosie,

    The handling of potentially unstable data varies among the various db's. I believe that SQLite uses a fairly decent rollback process within a failed transaction. So the proper handling of errors becomes a very careful analysis of when to enter (and exit) transaction state to ensure that a rollback is the complete unit. http://www.sqlite.org/lang_transaction.html  gives a good overview of the process involved.

    Orthodox Bishop Alfeyev: "To be a theologian means to have experience of a personal encounter with God through prayer and worship."; Orthodox proverb: "We know where the Church is, we do not know where it is not."

  • NetworkGeek
    NetworkGeek Member Posts: 3,734 ✭✭✭

    I'd heard the term "semaphore" before for what you describe as a mutex

    Rosie, a semaphore is the more general design, controlling access to a resource for 'n' number of resources at a time (for example you could design something that allowed access to three threads at a time).  A mutex is a semaphore where n=1 (access by 1 thread at a time).

This discussion has been closed.