Some info on community rating

Kolen Cheung
Kolen Cheung Member Posts: 1,096 ✭✭✭
edited November 2024 in English Forum

Here’s just a note on what I’ve found out:

  • community rating is a float with 1 d.p. (e.g. 3.5, 4.6, etc.)
  • you can filter with rating:2.1. If you filter with rating:2, you’re going to miss all the other ratings that would round to 2, i.e. rating:2 doesn’t include 1.9, etc.
  • currently comparison operators don’t work on community rating. I consider that a bug: https://community.logos.com/forums/t/179258.aspx
  • using 4904 resources as a sample, one can find the PDF and KDE of this:
[Figure: Logos Community Rating PDF and KDE]

This clearly shows 2 clusters, and one can use it to “discern between good and bad” and add that as a collection rule or search rule, for example. (Only once the bug mentioned above is fixed, so that rating>2.1 can be used.)
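As a rough illustration of how such a cut between two clusters can be located, here is a minimal sketch. It does not use the real 4904-resource sample: the ratings are synthetic, the bandwidth of 0.15 is an arbitrary assumption, and the helper names are made up for this example. It builds a Gaussian kernel density estimate by hand and picks the grid point with the lowest density between the two modes.

```python
import math
import random

def gaussian_kde(samples, bandwidth):
    """Return a function that estimates the density of `samples` with Gaussian kernels."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density

def valley_between(density, lo, hi, step=0.1):
    """Grid point in [lo, hi] (1 d.p. steps) with the lowest estimated density."""
    n = int(round((hi - lo) / step))
    grid = [round(lo + k * step, 1) for k in range(n + 1)]
    return min(grid, key=density)

# Synthetic stand-in for the real data: two made-up clusters of 1-d.p. ratings.
rng = random.Random(0)
ratings = [round(rng.gauss(1.8, 0.15), 1) for _ in range(300)]
ratings += [round(rng.gauss(3.8, 0.4), 1) for _ in range(700)]

kde = gaussian_kde(ratings, bandwidth=0.15)   # bandwidth is an assumption
cut = valley_between(kde, lo=1.8, hi=3.8)     # search between the two assumed modes
print("valley between the two clusters:", cut)
```

With real data the location of the valley depends on the library being sampled and on the bandwidth, so the cut should be checked by eye against the plot.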

The original data was entered manually by filtering in the Library. This is an example of what would become much easier if a scriptable API to interface with Logos were provided. (Imagine this analysis being updated whenever new resources are added or the community ratings themselves are updated.)

P.S. if one pays very close attention, there might be 3 clusters. Or it may just be noise. Can’t tell unless more data is fed in!

Comments

  • NB.Mick
    NB.Mick MVP Posts: 16,045

    This clearly shows 2 clusters, and one can use it to “discern between good and bad” and add that as a collection rule or search rule for example

    I don't think so. Many users don't use rating at all. From forum discussions it seems that those who do, especially the oldtimers (and thus sometimes serving as a role model for the others) apply a system that uses 2 as a default rating - this dates from the times when you couldn't see "added" dates and thus needed a way to flag resources as "has been part of my library last I looked" (I use the mytag:mystuff for the same reason). For people using this scheme, 2 is not a rating. Some may then start and use 3 through 5 in the way the "Amazon-model" 1 through 5 is used by others (but then some may use ratings like European school grades: 1 best, 5 worst - you find those inconsistencies on Amazon as well).

    The curve you depicted looks a lot like a bell curve around the 3, overlaid with those special 2s, so I would not use it to discern between good and bad.

    Have joy in the Lord!

  • Kolen Cheung
    Kolen Cheung Member Posts: 1,096 ✭✭✭

    Note that:

    • what you’re saying is very different from what the graph is saying. Remember that each data point is the community rating (I guess the average over all participating users) of a given resource in my library. The existence of a bunch of special 2s from a subset of users will of course skew things, but mainly it pulls the averages towards 2 (i.e. you can imagine the variance around 2 being smaller than expected); averaging and focusing only on the relative order largely overcome this
    • I have also manually compared my own ratings with the community ones. Note that for any data analysis to make sense, a human has to check that it makes some sort of sense, which is beyond what the data could tell you (because the human has prior knowledge, e.g. on what’s good and bad)
  • PetahChristian
    PetahChristian Member Posts: 4,636 ✭✭✭

    NB.Mick said:

    Many users don't use rating at all.

    Guilty. I happen to have rated two books, out of thousands.

    Thanks to FL for including Carta and a Hebrew audio bible in Logos 9!

  • PetahChristian
    PetahChristian Member Posts: 4,636 ✭✭✭

    the human has prior knowledge eg on what's good and bad

    We can all think of cases where we liked a book that others didn't tend to like.

    We don't tend to be a good judge of what's good and bad. Relative vs. absolute, subjective vs. objective, and so on.

  • Kolen Cheung
    Kolen Cheung Member Posts: 1,096 ✭✭✭

    An analogy would be ratings from IMDb. Given an IMDb score (a community rating), does a high score make it more likely that one, or more personally I, will like the movie or TV show? In my experience it’s true for me most of the time, but there are rare cases where I absolutely hate a movie with a very high IMDb rating.

    Lessons here:

    • One should first check whether the community rating closely follows one’s taste. In my case, a manual check shows that IMDb and Logos community ratings both resemble my taste to a large degree. This assumption can break, e.g. if a non-native English speaker born outside the US is shown a very good TV show about US politics, they are very likely to absolutely hate it.
    • This predictor (using community rating to predict personal taste) is virtually never going to be 100% accurate, as with anything involving statistics.
    • Whether this is useful or not depends, especially on the user. There’s another thread where someone is essentially dismissing the value of community tags. The logic is the same: statistics never reflect 100% correctly what I think about something (rating, tag, etc.). I think maybe many people here are spoiled by the Logos style of doing things, i.e. Logos spends a lot of resources on manually tagging things to create accurate datasets. The quality of this is superior to anything done automatically (such as using natural language processing). So perhaps Logos users in particular have a larger tendency not to accept that a certain search might give them fuzzy results.
    • Yes, statistics never reflect 100% of your judgement on the issue. But then who is to say that one’s judgement can be trusted? I wonder how Logos users normally rate their resources, and how many of them they rate. For me, I’d like to rate all of them. The problem is that one hasn’t read all of them but needs to rate all of them. Which should one do first? Rating all of them is doable though tedious; reading each one first and then rating it is just impossible (given that typical Logos users buy base packages and have thousands of resources). Now why would I trust my own rating if I haven’t even finished reading the resource? One may base a rating on word of mouth, but the same question can be asked (how can you be sure that, say, D.A. Carson’s taste reflects yours?); or base it on reading an excerpt (it is just arrogant to think that this can result in an accurate rating).
    • What I personally did was to revisit my own ratings vs. the community ratings, and essentially they mostly agree (in terms of trend, e.g. most of my 3s fall around the community’s 4), and for those where they don’t agree, I’ve found they are mostly my mistakes. E.g. I rated something I wouldn’t use at all as 1, where one such resource is very reputable and has a community rating of 4. Another instance is that I used relative ratings to order which King James Versions I think are better. So I gave ratings of 2, 3 and 4 to 3 of them, but in reality they don’t differ enough to justify that much of a difference in score.
    • With enough data, much of the noise and bias can be removed. E.g. if the whole database of user ratings were given, such as the rating distribution per resource over all users, you could do a ton of data cleaning to remove bias. E.g. with the prior knowledge that some Logos users by convention use 2 as a default (default-value bias is very common and is almost always one of the first things people try to remove), one would first perform a data analysis to detect that and remove it (again, it won’t be perfect, but it reduces bias).
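    To make the last point concrete, here is a minimal sketch of one possible cleaning step, assuming per-user ratings were available (they are not today). The (user, resource, rating) schema, the thresholds `min_share` and `min_count`, and the function names are all hypothetical. A rating is treated as a personal default when it is the user's most common value and covers most of that user's ratings; those entries are dropped before averaging.

```python
from collections import Counter, defaultdict

def drop_default_ratings(scores, min_share=0.6, min_count=5):
    """Remove ratings that look like a user's personal default value.

    scores: list of (user, resource, rating) tuples (hypothetical schema).
    A value counts as a default when it is the user's most common rating and
    makes up at least `min_share` of that user's `min_count` or more ratings.
    """
    by_user = defaultdict(list)
    for user, _, rating in scores:
        by_user[user].append(rating)

    defaults = {}
    for user, ratings in by_user.items():
        value, count = Counter(ratings).most_common(1)[0]
        if len(ratings) >= min_count and count / len(ratings) >= min_share:
            defaults[user] = value

    # Users without a detected default map to None, so all their ratings are kept.
    return [(u, r, s) for u, r, s in scores if defaults.get(u) != s]

def mean_per_resource(scores):
    """Average rating per resource."""
    totals = defaultdict(lambda: [0.0, 0])
    for _, resource, rating in scores:
        totals[resource][0] += rating
        totals[resource][1] += 1
    return {res: total / n for res, (total, n) in totals.items()}

# Made-up data: "ann" flags almost everything with 2; "bob" and "cy" rate normally.
scores = [("ann", f"book{k}", 2) for k in range(1, 9)] + [("ann", "book9", 5)]
scores += [("bob", "book1", 4), ("bob", "book2", 5), ("bob", "book3", 3)]
scores += [("cy", "book1", 5), ("cy", "book2", 4)]

raw = mean_per_resource(scores)
cleaned = mean_per_resource(drop_default_ratings(scores))
print("book1 raw vs cleaned:", raw["book1"], cleaned["book1"])
```

    This heuristic also throws away genuine ratings that happen to equal the default, which is one reason such cleaning reduces bias without removing it.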

    So take this thread as a glimpse of what data analysis of Logos/Faithlife’s data can provide, and of how an API would enable it, rather than as a final verdict.

    P.S. it is not impossible for a default value of 2 to strongly skew the averages. Imagine there’s a resource that most Logos users don’t read or rate at all, and to which they assign a default value of 2. Then genuine ratings are few, hence the mean is essentially determined by the default. However, this doesn’t prevent one from concluding that there are 2 clusters here (one may argue from the graph that, luckily, the real ratings seem to tend to be high, i.e. centered and spread well above 2, so that this default bias doesn’t mingle with the actual distribution), nor does it take away the usefulness of this information. One can infer that, since these resources are mostly not read and rated by the general Logos community, they might not be worth my time exploring. Not that you won’t find gems there, just that it is less likely.

  • DMB
    DMB Member Posts: 13,787 ✭✭✭

    Well, Kolen, your work is admirable but, I tend to agree with NB. When they first intro'd, people were indeed using them in code-like fashion. The first rule in analytics is to know the basis for input, before interpreting output.

    Also, I'm not quite sure what the community rating is, vs the Logos.com rating. The latter seems almost useless, since Logos oddly tries to not offer books people don't like. So, best I can see, books are either 4.8 (1-5) or unrated. Unrated tends to be not popular, instead of not good.

    "If myth is ideology in narrative form, then scholarship is myth with footnotes." B. Lincolm 1999.

  • Kolen Cheung
    Kolen Cheung Member Posts: 1,096 ✭✭✭

    Denise said:

    Well, Kolen, your work is admirable but, I tend to agree with NB. When they first intro'd, people were indeed using them in code-like fashion. The first rule in analytics is to know the basis for input, before interpreting output.

    I think you didn't read my last reply and don't really understand what's going on... (Hint: read the P.S. part.)

  • Kolen Cheung
    Kolen Cheung Member Posts: 1,096 ✭✭✭

    I think the right takeaways are these:

    • there is discernible clustering; how you label the 2 clusters is entirely subjective. I would personally label them as useful or not useful. Again, any classification of this sort is not going to be 100% accurate, but it is a good prior for your next move (think Bayesian)

    • more data is better, i.e. if the whole database were given, one could perform EDA (exploratory data analysis) to discover anything strange, mark those values as defaults and perform data cleaning. But given the currently available information, this is as good as one can get. (Think Bayesian again, about what your current prior, data, and theory are)

      • so this actually supports the idea of letting users tap into the APIs (Logos and/or web, just like how Twitter, for example, has released API keys)
  • Kolen Cheung
    Kolen Cheung Member Posts: 1,096 ✭✭✭

    First, the bug I mentioned isn’t really a bug but wrong syntax.

    Case in point: rating:<2.2 ANDNOT rating:0 results in 100% agreement with my own tagging, i.e. if I use that as a rejection criterion (labeling “bad”), I’d have no false positives.

    Note that 2.1 is the minimum between the 2 clusters and is used as a simple cut for the clustering decision. Note that we aren’t even trying to do logistic regression with my own ratings as supervised labels. The point is really to get something very useful in a short amount of time (compared to e.g. going through all resources and rating them myself).
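    The check described above can be sketched as follows. The rows are made up, the function names are hypothetical, and two things are assumptions rather than facts from this thread: that a community rating of 0 means "not rated", and that my own rating of 3 or more means "keep". The rule mirrors `rating:<2.2 ANDNOT rating:0` and reports the resources it would wrongly reject.

```python
def rejected_by_rule(community_rating, cut=2.2):
    """Mirror of `rating:<2.2 ANDNOT rating:0`: below the cut, but actually rated."""
    return community_rating != 0 and community_rating < cut

def false_positives(library, cut=2.2, keep_from=3):
    """Titles the rule rejects although my own rating says to keep them."""
    return [title for title, community, mine in library
            if rejected_by_rule(community, cut) and mine >= keep_from]

# Made-up (title, community rating, my rating) rows; 0 is assumed to mean "unrated".
library = [
    ("A", 1.9, 1), ("B", 2.1, 2), ("C", 2.2, 4),
    ("D", 3.7, 3), ("E", 0, 5), ("F", 4.4, 5),
]

print(false_positives(library, cut=2.2))
print(false_positives(library, cut=2.3))
```

    Moving the cut by a single step of 0.1 is enough to change the result in this toy library, which is why the cut has to be verified against one's own ratings.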

    It’s never completely accurate but a tradeoff. Whether that tradeoff is good enough is a matter of opinion and also depends on how closely the community rating matches an individual’s taste.

    I kind of regret posting it here because there are often people who don’t understand and/or don’t get it. I’d have saved more time by just doing the analysis and using it on my own.

  • David Ames
    David Ames Member Posts: 2,977 ✭✭✭

    NB.Mick said:

    I don't think so. Many users don't use rating at all. From forum discussions it seems that those who do, especially the oldtimers (and thus sometimes serving as a role model for the others) apply a system that uses 2 as a default rating 

    Yes, some "oldtimers" use the "rating system" in a way that has nothing to do with how good the resource is but with whether they want it to be included in a search.   [[0 New resource, 1 don't include, though 5 for PBB as in if I took the time to input that then I want it in all searches.]]

    What if Logos looked at our flag on report community ratings. Then if we do not want to see how the community reports ratings then don't include our rating in the community ratings for others.     

  • Kolen Cheung
    Kolen Cheung Member Posts: 1,096 ✭✭✭

    NB.Mick said:

    I don't think so. Many users don't use rating at all. From forum discussions it seems that those who do, especially the oldtimers (and thus sometimes serving as a role model for the others) apply a system that uses 2 as a default rating 

    Yes, some "oldtimers" use the "rating system" in a way that has nothing to do with how good the resource is but with whether they want it to be included in a search.   [[0 New resource, 1 don't include, though 5 for PBB as in if I took the time to input that then I want it in all searches.]]

    What if Logos looked at our flag on report community ratings. Then if we do not want to see how the community reports ratings then don't include our rating in the community ratings for others.     

    Note that the discussion on the default of 2 is quite irrelevant here. His observation is based on a misunderstanding of the graph itself (what distribution are you really seeing?)

    Equation-wise, one can model this bias:

    \mu_\text{measured} = 2 p + (1 - p) \mu_\text{true}

    Note that

    • we are focusing on 1 single resource, and the mean is over the entire user pool Logos has access to in order to calculate the community rating.
    • this is not an assumption but can be derived, assuming some users rated the resource at default 2
    • p depends on resource. i.e. if i is the resource index, \mu_{i, \text{measured}} = 2 p_i + (1 - p_i) \mu_{i, \text{true}}
    • p can be interpreted as the probability that a user rated that resource at the default 2, i.e. when p = 0, the estimator is unbiased. When p \neq 0, it is not difficult to see that the estimator is biased towards 2.
    • the distribution you’re seeing is \mu_{i, \text{measured}}, not s_{ij}, where s is the score, i is the resource index, j is the user index. You don’t get to see j; Logos doesn’t release this dependency.
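    For completeness, the derivation can be sketched like this. The symbol N_i (the number of users who rated resource i) is introduced only for this sketch, and it assumes a fraction p_i of them left the rating at the default 2 while the rest gave genuine ratings with mean \mu_{i, \text{true}}.

```latex
\begin{align}
\mu_{i,\text{measured}}
  &= \frac{1}{N_i} \sum_j s_{ij}
   = \frac{1}{N_i} \left[ 2\, p_i N_i + (1 - p_i)\, N_i\, \mu_{i,\text{true}} \right]
   = 2 p_i + (1 - p_i)\, \mu_{i,\text{true}} \\
\mu_{i,\text{measured}} - 2
  &= (1 - p_i) \left( \mu_{i,\text{true}} - 2 \right)
\end{align}
```

    The second line says the measured mean stays on the same side of 2 as the true mean, with the distance shrunk by the factor (1 - p_i). For example, p_i = 0.5 and \mu_{i, \text{true}} = 4 give a measured mean of 3.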

    Now, another question is how p_i is distributed across resources. If it doesn’t vary too much, and is sort of random, its effect on \mu_{i, \text{measured}} is just to bias it towards 2, i.e. to narrow its spread around 2, without changing the relative order much.

    However, if p_i correlates strongly with certain resources i, the situation can be different. The most extreme scenario one can imagine is this: there are 2 kinds of resources, one that Logos users generally care about and one that they don’t, such as free Faithlife eBooks. For the latter kind, p_i might be distributed much closer to 1, i.e. the estimator is almost insensitive to the actual mean. In this case, it will result in a strong cluster around 2.
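    Both scenarios can be illustrated with a few lines of code. This is only a sketch of the model above: the true means and the values of p are invented for illustration, and `measured_mean` is a name made up here.

```python
def measured_mean(true_mean, p, default=2.0):
    """Mean after a fraction p of raters leave the resource at the default value."""
    return default * p + (1.0 - p) * true_mean

true_means = {"A": 2.5, "B": 3.4, "C": 4.6}   # made-up true averages

# Same p for every resource: values shrink towards 2, the ranking is unchanged.
same_p = {name: measured_mean(mu, p=0.3) for name, mu in true_means.items()}
print(sorted(same_p, key=same_p.get))

# p strongly tied to the resource (B is mostly left at the default): ranking changes.
varying_p = {"A": 0.0, "B": 0.9, "C": 0.2}
mixed = {name: measured_mean(mu, varying_p[name]) for name, mu in true_means.items()}
print(sorted(mixed, key=mixed.get))
```

    In the second case resource B drops below A even though its true mean is higher, because its measured mean is dominated by the default.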

    Essentially this is the detailed version of the counterpoint I loosely described in words above. Even in this case, how would one label the kind of resource for which most users who rated it didn’t even bother changing it from the default 2? “Good” or “bad” may not be the right words, but you get the idea: these are resources I shouldn’t spend too much time caring about either.

    Also, try to think about this: why would the conventional wisdom be for the default value to be 2? It reflects the relative ranking one would place between something one cares about and something one doesn’t. That’s why you don’t default to 5: a resource must be pretty special to be regarded like that. That’s also why you don’t default to 1: it must be pretty special to be trashed like that.

    And in reality no one really cares why it clusters like this, as long as it is a good predictor. As I’ve already mentioned, the minimum between the 2 clusters is at 2.1. Community rating < 2.2 has 100% agreement with my own manual rating (i.e. 1 or 2). As soon as I move it to community rating < 2.3, my rating no longer agrees (2 resources that I rated 4 have a community rating of 2.2).

    So, case in point, it is entirely useful to me. Whether or not it is useful to you requires your own experiment, because the last verification step requires your own ratings as the supervised labels. (And the crossing point between the clusters depends on your library, i.e. the distribution of average ratings across all your available resources.)

    And please, if you can’t follow the arguments/equations above, don’t even try to argue at all. This is very introductory probability; if one can’t follow it, one almost certainly does not know how to interpret even what the graph means. Even for those who understand, this is as much as one can do with the available data. Unless we are given more data (the j index above), not much more can be said.

    P.S. there’s a more subtle argument that the default value might not be at fault here. If you follow the model above, you’ll find that (with a simple assumption on \mu_{i, \text{true}}) the left cluster should peak above 2, not below it.

  • scooter
    scooter Member Posts: 781

    U de man!!

  • David Ames
    David Ames Member Posts: 2,977 ✭✭✭

    Note that the discussion on the default of 2 is quite irrelevant here. 

    72% of my library is rated at my normal of 3.  4% at 4 [[all of my PBBs]]   23% at 1 [[most of my non-Biblical resources like the Harvard Classics]]

    If you want true user ratings we need to remove people like me that have not rated their library the way 'normal' people are 'expected' to do.  

    I assume that most of them have set community ratings to no so they could be ignored in the community ratings if any one chose to do so.

    [[If I am not interested in the community ratings [Com. Ratings => 'No'] then ignore my resource ratings when looking at what the community likes]] 

  • Kolen Cheung
    Kolen Cheung Member Posts: 1,096 ✭✭✭

    72% of my library is rated at my normal of 3. 4% at 4 [[all of my PBBs]] 23% at 1 [[most of my non-Biblical resources like the Harvard Classics]]

    If you want true user ratings we need to remove people like me that have not rated their library the way 'normal' people are 'expected' to do.

    That won't matter; you're just noise in the sea of user ratings. In general, "isolated cases" like your ratings are part of the systematic bias. Without the j index mentioned above, there's no way to estimate the systematic bias introduced; that's why other kinds of checks are needed to make sure the data isn't completely garbage.

    Luckily, in your case, considering only Christian resources, you have introduced no bias at all in the relative ordering of the final scores. Also your bias towards non biblical resources reflects what this community generally thinks. Luckily I am not in the minority of classics scholars using Noet.

  • MJ. Smith
    MJ. Smith MVP Posts: 53,774

    Also your bias towards non biblical resources reflects what this community generally thinks.

    Are you considering liturgical books, church fathers, histories, Judaica, ancient literature etc. as biblical or non biblical when you make this statement?

    Orthodox Bishop Alfeyev: "To be a theologian means to have experience of a personal encounter with God through prayer and worship."; Orthodox proverb: "We know where the Church is, we do not know where it is not."

  • Kolen Cheung
    Kolen Cheung Member Posts: 1,096 ✭✭✭

    I don’t care. Think of it as an IMDb rating: it reflects what the community thinks. So what’s that community? Presumably the majority are white, North American, men, movie lovers, etc.

    The sample is not an SRS (simple random sample); it is biased towards that community. So it reflects that community’s general taste. I’ll consult it and make it part of my decision on whether I should watch something or not (together with the “community tag”, so to speak). Sometimes I watched something based on a very high score and deemed it one of the worst I’ve ever watched. And I’m sure I have missed some hidden gems (I once watched a comedy trashed on IMDb and found it very unique and underrated). But one has to start somewhere (as prior information) to bet on the next move. It could be wrong, but getting it right can be more costly.

    Another way to think about it is this: it is ridiculous to trust one’s own rating more than the community one. The practice of rating most if not all of one’s resources when buying a package is arrogant, to say the least. It is as if I went ahead and rated all the movies on IMDb before I had even watched them in full (we might read excerpts or watch trailers).

    In this regard, Faithlife TV’s rating system can be much better. Instead of rating everything before one has watched anything (I don’t think anyone would do that), they prompt you to rate something once you finish it. Memory is fresh, the rating is genuine and much more trustworthy. (But people might be biased towards just rating 4 or 5 there, as shown in Faithlife’s timeline. Even so it doesn’t matter; it is like a binary: 5 is like, 4 is not like (but not so bad that one doesn’t even finish it).)

    After this exercise (of comparing my own ratings to the community ones), I decided to remove all my original ratings and will add them back one by one, only for those resources where I really know how good they are. Having a community rating frees one from forcing oneself to pull a rating out of somewhere just to get started (e.g. on prioritizing resources). In the past I consulted e.g. best commentaries.com for help to rate resources; this is laborious and at best a white lie, because I genuinely had no idea whether a resource was good and just took someone else’s rating and pretended it was my own.

  • MJ. Smith
    MJ. Smith MVP Posts: 53,774

    Another way to think about it is this: it is ridiculous to trust one’s own rating more than the community one.

    Actually, for those of us who are not part of your presumed community but are Catholic, Orthodox, Latter Day Saints, Messianic Jewish . . . the reverse is true. That is why I asked the initial question.

  • Kolen Cheung
    Kolen Cheung Member Posts: 1,096 ✭✭✭

    I think the point is that I don’t care as long as the data don’t carry that information. As suggested earlier on, in principle one can use the denomination declared in one’s Faithlife profile to “slice” through the data. One can also infer some very interesting things, such as whether the rating of a particular resource varies a lot between denominations, i.e. whether that resource is very “opinionated” or not.
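    A minimal sketch of such a slice, assuming per-user ratings tagged with a group label were available (they are not released today). The (group, rating) schema, the neutral group names and the function name are hypothetical, and the numbers are made up. The spread of the per-group means is one simple way to quantify how “opinionated” a resource is.

```python
from collections import defaultdict
from statistics import mean, pstdev

def spread_across_groups(ratings):
    """Per-group mean ratings for one resource, and the standard deviation of those means.

    ratings: list of (group, rating) pairs, e.g. group = declared denomination.
    """
    by_group = defaultdict(list)
    for group, rating in ratings:
        by_group[group].append(rating)
    group_means = {g: mean(v) for g, v in by_group.items()}
    return group_means, pstdev(list(group_means.values()))

# Made-up ratings for two resources, sliced by three hypothetical groups.
consensus = [("group_a", 4), ("group_a", 5), ("group_b", 4),
             ("group_b", 5), ("group_c", 5), ("group_c", 4)]
divisive = [("group_a", 5), ("group_a", 5), ("group_b", 1),
            ("group_b", 2), ("group_c", 3), ("group_c", 3)]

_, consensus_spread = spread_across_groups(consensus)
divisive_means, divisive_spread = spread_across_groups(divisive)
print("consensus spread:", consensus_spread)
print("divisive spread:", divisive_spread, divisive_means)
```

    A large spread only shows that the groups disagree; it says nothing about which group is right, and small groups would make the per-group means noisy.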

    One can even ask whether a denominational package is targeting the right population, or to put it in reverse, design a denominational package based on that data.

    But one can also argue that if a resource is rated as good across denominations and disciplines, then it must be a good and objective one. That’s why scoring high is difficult (I think the highest average among my resources is around 4.6): the ceiling is at 5, but there can be many reasons for someone to downvote. (That’s also why the right tail, say above 3.8 or 4, is very smooth and exponential-like.)