Wanna find typos in your resources? Search for "modem"

Page 2 of 3 (46 items) < Previous 1 2 3 Next >
This post has 45 Replies | 3 Followers

Posts 5424
DIsciple II | Forum Activity | Replied: Sun, Dec 9 2012 1:09 PM

Paul Newsome:

I believe I've found another high frequency OCR error word.

My pastor quoted Spurgeon comparing clowns to certain pastors, and upon searching for the quote in my library, I noticed many instances of the word "down" interpreted as the word "clown".

A search of my library returns 905 results for the word "clown". Many of those in the older public domain works, though highly comical, are clearly errors.

Can anyone else confirm this?

I can see how down would become clown .....Stick out tongue

Here's some more examples:

 

And speaking of clowns, in case you're wondering why clown ministries are never asked to make a return visit to churches, the TOP 7 Bible Humor Lists offers the following suggestions:

REASONS CHURCHES DON’T ASK CLOWN MINISTRIES TO RETURN

7 They force people to smile at 10 A.M. on Sunday.

6 It’s difficult to say with a straight face: “The sermon today will be brought to you by Brother Dimples.”

5 Those balloon sculptures of the Last Supper just take too long to construct.

4 Clowns wearing blue curly wigs might be confused with the older sisters.

3 Seltzer–water baptism is not recognized by your denomination.

2 Dribble glasses may be used for communion.

1 The ushers don’t appreciate all the Monopoly money in the offering plates.

 Anderson, R., & Veerman, D. 1999. Bible humor top 7 lists (electronic ed.) (222). Word Pub.: Nashville

 

Posts 3163
Dominick Sela | Forum Activity | Replied: Sun, Dec 9 2012 2:22 PM

Posts 47
JD | Forum Activity | Replied: Sun, Dec 9 2012 3:42 PM

Ahhnold says: Fohhget "modem"  !  Ancient technologies revealed... Fighter jets in the Bible !!!  Big Smile

Posts 15805
Forum MVP
Keep Smiling 4 Jesus :) | Forum Activity | Replied: Sun, Dec 9 2012 3:57 PM

Andrew Mckenzie:

I can see how down would become clown .....Stick out tongue

Concur about down becoming clown and modern becoming modem:

After reading about "scannos", wonder about "be" for "he", "arid" for "and", ... => https://wiki.benetech.org/display/BSO/4.2+C.+10.+Do+a+Spell+Check+to+catch+typical+scannos

Keep Smiling Smile

Posts 18903
Rosie Perera | Forum Activity | Replied: Sun, Dec 9 2012 4:17 PM

I once started a huge systematic check for lower-case L's that had scanned as the number 1 and upper-case letter O's that had scanned as number 0 in the first letter or two of words. (E.g., S0ren Kierkegaard; just found a bunch of them and reported them). I don't think I ever made it all the way through, but I started by typing into the Search tab a1a, a1b, a1c, etc. and seeing what matched each time. When I found what looked like it was meant to be a real word but which had a digit in place of a letter, I'd search my whole library for that "word" and then report all the instances of it as typos. There were hundreds and hundreds of these. It's the kind of thing one could spend days and days on and find tons of typos. They could probably do it faster if they wanted to automate it. But frankly, I don't think typos are that high a priority to them.
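A minimal sketch of how that kind of check might be automated, assuming the resource text can be exported as plain text (nothing here uses a Logos API; the function names and the ordinal filter are illustrative assumptions). It flags tokens that mix letters with the digits 0 or 1, such as S0ren, and guesses the intended word by mapping 0 to o and 1 to l:

```python
import re
from collections import Counter

# A token is suspicious when it consists only of letters plus the digits 0/1
# and contains at least one of each. Pure numbers such as "1843" never match.
SUSPECT = re.compile(r"\b(?=\w*[A-Za-z])(?=\w*[01])[A-Za-z01]+\b")

# Ordinals such as "1st" or "10th" are legitimate, so they are filtered out
# (an assumption; a real pass would probably need a longer allow-list).
ORDINAL = re.compile(r"\d+(?:st|nd|rd|th)", re.IGNORECASE)


def find_digit_scannos(text):
    """Count tokens that look like words with 0 or 1 in place of a letter."""
    tokens = SUSPECT.findall(text)
    return Counter(t for t in tokens if not ORDINAL.fullmatch(t))


def suggest_fix(token):
    """Guess the intended word by mapping 0 -> o and 1 -> l."""
    return token.replace("0", "o").replace("1", "l")


if __name__ == "__main__":
    sample = "S0ren Kierkegaard wrote in 1843; a1so see the 10th chapter."
    for token, count in find_digit_scannos(sample).items():
        print(token, count, suggest_fix(token))
```

Every hit would still need a human look before being reported as a typo, since the suggested fix is only a guess.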

Posts 5424
DIsciple II | Forum Activity | Replied: Sun, Dec 9 2012 5:29 PM

With the release of all the new data sets we can see why typos haven't been a priority.  I don't find them a problem but report what I come across.  My only concern would be getting incorrect hyperlinks fixed, as this affects function.  Typos, on the other hand, can improve the daily smile factor, so they are less of a priority.

Posts 18903
Rosie Perera | Forum Activity | Replied: Sun, Dec 9 2012 5:45 PM

Andrew Mckenzie:

With the release of all the new data sets we can see why typos haven't been a priority.  I don't find them a problem but report what I come across.  My only concern would be getting incorrect hyperlinks fixed, as this affects function.  Typos, on the other hand, can improve the daily smile factor, so they are less of a priority.

Typos aren't a problem for me when I'm reading, because I know what was intended. But they're a problem when I'm searching. For example, if I were searching for Soren Kierkegaard, it wouldn't find any occurrences that were spelled S0ren Kierkegaard.

Posts 1674
Paul N | Forum Activity | Replied: Sun, Dec 9 2012 8:08 PM

I think this would be the greatest incentive for Logos to focus on correcting such scanning errors/inconsistencies as they are found.  The percentage of error is incredibly small, yet they still cause inaccuracies in searches.

Posts 8893
fgh | Forum Activity | Replied: Mon, Dec 10 2012 4:43 AM

Keep Smiling 4 Jesus :):
After reading about "scannos", wonder about "be" for "he", "arid" for "and", ...

There are plenty of words that can easily become other words. The thing with modem and clown is that in Logos they're almost certain to be errors, so it shouldn't be that hard to get rid of them. And yet there have been "modem"s even in resources released this autumn.

Be vs he, on the other hand, requires careful reading by a real person.
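For words like modem and clown, which are almost always errors in this kind of library, a pass along the following lines could list candidates for review. This is only a sketch under stated assumptions: the `SUSPECTS` table is a hypothetical two-entry list taken from this thread, and the input is assumed to be plain exported text rather than anything Logos-specific.

```python
import re

# Hypothetical suspect list: valid English words that, per this thread, are
# usually OCR misreads in a theological library ("rn" read as "m", "d" as "cl").
SUSPECTS = {"modem": "modern", "clown": "down"}

PATTERN = re.compile(r"\b(" + "|".join(SUSPECTS) + r")\b", re.IGNORECASE)


def flag_suspects(text, width=20):
    """Yield (found, likely_intended, context) for each suspect word in text."""
    for m in PATTERN.finditer(text):
        start = max(0, m.start() - width)
        end = min(len(text), m.end() + width)
        yield m.group(0), SUSPECTS[m.group(0).lower()], text[start:end]


if __name__ == "__main__":
    sample = "He sat clown beside a modem theologian."
    for found, intended, context in flag_suspects(sample):
        print(f"{found!r} -> {intended!r}? ...{context}...")
```

The context snippet is there because some hits will be real clowns or real modems, so each one still needs a quick human decision.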

Andrew Mckenzie:
I don't find them a problem (...) they are less of a priority

They should be a very high priority. Logos sells their products with the claim that the software will find "every" occurrence of the words we search for. As long as resources are full of typos, that's not true. And missing the most relevant search result because of a typo isn't fun.

Paul Newsome:
The percentage of error is incredibly small,

Unfortunately, in some resources it's incredibly large.

"The Christian way of life isn't so much an assignment to be performed, as a gift to be received."  Wilfrid Stinissen

Mac Pro OS 10.9.

Posts 5424
DIsciple II | Forum Activity | Replied: Mon, Dec 10 2012 11:12 AM

I'm not suggesting they shouldn't be a priority; it's just that if I had to make a choice, I'd much rather see hyperlinks fixed. But I do take seriously others' view that typos are a significant problem for searching.  For the way I use Logos, this comes in a very close second.

Posts 3163
Dominick Sela | Forum Activity | Replied: Mon, Dec 10 2012 4:28 PM

This has of course been discussed in the past.  The only way to really eliminate it long term IMHO is to develop an automated process where the user community can be involved in actually updating with the fixes - kind of like a wiki, or an open source process. It would have to be rigorous and secure; maybe users have to be authorized for permission to supply updates, maybe users can opt in to accept the typo updates or not... but I think expecting Logos to invest in fixing the .001% of characters that are typos is very unrealistic. The problem should be addressed though. It's one of those things where the aggravation factor for customers is greater than the benefit to Logos of investing in fixing them - the dollars to fix them come away from something likely more important.

Posts 292
Hapax Legomena | Forum Activity | Replied: Mon, Dec 10 2012 5:01 PM

I tried to find some of these errors, but what is it about Catholic theologians and clowns?

 

Posts 1692
Ken McGuire | Forum Activity | Replied: Tue, Dec 11 2012 7:46 AM

Another common OCR error I have found in producing PBs is "m" for "in".  I try to search for "m" in Word before releasing them to the public.  It is harder to do this search within Logos.

The Gospel is not ... a "new law," on the contrary, ... a "new life." - William Julius Mann

L8 Anglican, Lutheran and Orthodox Silver, Reformed Basic, Academic Essentials

L7 Lutheran Gold, Anglican Bronze

Posts 3163
Dominick Sela | Forum Activity | Replied: Tue, Dec 11 2012 1:30 PM

I have almost 200 PBs, and about 20 of them had the modem for modern error. I corrected those!  Check your own PBs...

Posts 5615
Todd Phillips | Forum Activity | Replied: Wed, Dec 12 2012 12:09 PM

After much deliberation, I have concluded that the word "modem" should be thrown out of the English language.  Before long, analog carrier signals will become obsolete. Un-naming the "modem" will hasten the process, which is in everyone's best interest---both for increased network bandwidth and for OCR accuracy.

Wiki Links: Enabling Logging / Detailed Search Help - MacBook Pro (2014), ThinkPad E570

Posts 268
JC54 | Forum Activity | Replied: Wed, Dec 12 2012 2:22 PM

I have 543 clowns, but looking them over, most of them seem to be real clowns.

Posts 18903
Rosie Perera | Forum Activity | Replied: Wed, Dec 12 2012 2:52 PM

Todd Phillips:
After much deliberation, I have concluded that the word "modem" should be thrown out of the English language.  Before long, analog carrier signals will become obsolete. Un-naming the "modem" will hasten the process, which is in everyone's best interest---both for increased network bandwidth and for OCR accuracy.

Throwing the word modem out of the dictionary won't solve the OCR accuracy problem. I encounter tons of OCR errors in creating my PBs where the "words" are not anything recognized in any dictionary. Mixtures of letters and numbers, etc.

Posts 5615
Todd Phillips | Forum Activity | Replied: Wed, Dec 12 2012 2:58 PM

I didn't say it would solve all OCR problems. It would just solve the Modem problem...every little bit of accuracy counts.

(You do realize I was being tongue-in-cheek?) Stick out tongue

Posts 59
Eric Ruhnow | Forum Activity | Replied: Wed, Dec 12 2012 5:31 PM

The infamous internet transpose "teh" makes it in a few times as well. Once you weed out the use of it in pronunciations, there are several in the Perseus collection.

Even have it in one of the Bibles (NCV). Thankfully the error is NOT in the Bible text, but the introduction:

 

What Is the Bible About?

The Bible cannot be considered just a “crutch” that you can turn to when the pressures of life overwhelm you. It is a supernatural book that has survivied and thrived through centuries of being scoffed at, ridiculed, and banned. Kings have branded it as illegal, and countless lives have been martyred because they had teh courage to stand by its truths. For the millions and millions of peole who have tested its answers to life’s questions and found them true, there is only one conclusion—the Bible is God’s book. Every word is inspired by him and reveals something very important about him. From these pages we hear the voice of God.

 

Lenovo TS130 Xeon E3-1245V2 | 20GB | 256 GB SSD (OS and Logos) | 3TB WD Red | Windows 10 Pro x64

L4 & L5 Platinum, L6 Gold, L5 Reformed Gold, L6 Reformed Bronze, L7 Lutheran Silver, L7 Reformed Starter, L7 Full Feature Set

Posts 18903
Rosie Perera | Forum Activity | Replied: Wed, Dec 12 2012 6:52 PM

I know you were being tongue-in-cheek, but I like being pedantic about imprecision even with tongue-in-cheek posts. You see, it wouldn't even solve the Modem problem. My point was that OCR doesn't care if something is in the dictionary or not (it seems to have no problem with S0ren Kierkegaard for example; the o in Soren being a zero). If Modem isn't in the dictionary, it could still show up instead of Modern in an OCR'ed text. So there!
