Bob,
As a Logos user from the days of Logos 1 when 50 books was a complete library, I have a question.
Years ago I imagined you slaving away in a garage somewhere, manually scanning books into the latest $100 OCR program, and then doing your computer magic to "Logosise" the resource.
I was wondering if you would like to give us a little history of how you digitised resources in the past, what OCR programs you used, and how you do it now.
I don't expect any trade secrets to be revealed. I am not a spy for a competitor, just curious.
Thx
Stephen Miller
Sydney, Australia
Bob isn't involved in the current conversion process these days, but I can speak to it briefly.
I hope that helps.
I recall reading on these boards that, given the error rate of even the best OCR, the double-key method actually beats OCR-and-correct once the manual effort of fixing those errors is factored in: it costs less time and/or less money, I forget which.
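For anyone curious how double keying catches errors, here is a minimal sketch. It assumes the common setup (not confirmed in this thread as Logos' vendor workflow): two typists key the same page independently, and a program flags every word where their transcriptions disagree so a human can check the print. The function name `compare_keyings` is hypothetical.

```python
import difflib

def compare_keyings(first: str, second: str) -> list[tuple[int, str, str]]:
    """Hypothetical helper: return (word_index, first_text, second_text)
    for each span where two independent keyings of a page disagree."""
    a = first.split()
    b = second.split()
    disagreements = []
    # Align the two word sequences; every non-"equal" opcode is a disagreement.
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            disagreements.append((i1, " ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return disagreements

# One typist reads "et", the other "ct": the mismatch is flagged for review.
print(compare_keyings("arma virumque cano et troiae", "arma virumque cano ct troiae"))

# Both typists read "ct" off a badly printed "e": nothing is flagged.
print(compare_keyings("cano ct troiae", "cano ct troiae"))
```

The second call shows the method's blind spot: if the print makes an "e" look exactly like a "c", both typists can make the same mistake and the comparison has nothing to flag.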
Kyle,
Thanks for the modern info. I am amazed that the resources are copied by hand... have we gone back to the medieval monasteries?
I'd love others to fill in more info.
Stephen
Kyle G. Anderson: Of course it's dependent on the source material. I recall a recent case where the output for a Latin text of Jerome was quite bad. Unfortunately this was due to a print edition that was lacking. A user alerted us to this and pointed us to a better-quality PDF, and we were able to fix it.
I believe you're referring to the Augustine Loeb volume, Kyle, not the Jerome, which is still waiting to be cleaned up. ;)
Are you sure that the Latin texts were double-keyed as you describe? I find it hard to believe that two different typists would both type "ct" instead of "et" twenty-five times in the one Jerome volume alone. (You can see what I mean by doing an inline search for " ct " or "dc", which should be " et " and "de" respectively.) Granted, the Augustine text was worse. :)
...
Stephen Miller: I am amazed that the resources are copied by hand... have we gone back to the medieval monasteries?
Given that the copyists are typing on a computer keyboard, there's nothing medieval about it. The human brain remains a better processor for many cognitive tasks, and optical character recognition (aka "reading") is one of those.
Greg F: Are you sure that the Latin texts were double-keyed as you describe? I find it hard to believe that two different typists would both type "ct" instead of "et" twenty-five times in the one Jerome volume alone.
Yes. Places that do projects like this key purely by recognizing the shapes of characters. While their typists may have an excellent grasp of the nuances of English, that may not be true for languages like Latin or Greek, which are read by a much, much, much smaller section of the population. I distinctly recall the book you are mentioning. I know enough Latin to be able to say "that's not right," but was astounded to discover that in the print we used, the "e's" looked exactly like "c's". If anything, I put the fault on us for not catching that in the print to begin with. Believe me, there's some truly awful print out there, and we've had to reject a great deal of it.
Kyle G. Anderson: I know enough Latin to be able to say "that's not right" but was astounded to discover that in the print that we used the "e's" looked exactly like "c's".
I wonder if you were looking at a book that had been OCR'd and typeset by someone else before you got it. I once bought a print book on Amazon. It was a modern reprint of an old book, long out of print. But it had been OCR'd and re-typeset, presumably by an automatic process, and it was almost unreadable in places, particularly in non-English passages.
Here's what the frontispiece says (which wasn't made clear before purchase, of course):
Not sure I'd feel confident paying someone who offered to "proof read"....
Abram K-J: Pastor, Writer, Freelance Editor, Youth Ministry Consultant Blog: Words on the Word
Abram K-J: paying someone who offered to "proof read".
Hey, my sister did that for an educational software firm for many years ...
MJ. Smith: Abram K-J: paying someone who offered to "proof read". Hey, my sister did that for an educational software firm for many years ...
"proof read" vs. "proofread" ;) I was one for a few years at an advertising firm. You can't tell by my typing in general, however...