Serious problems with Latin OCR in Loeb volumes

I was doing a close reading of Book 7 of Augustine's Confessions using the Latin Loeb volumes, and I was dismayed at the number of OCR errors in that one book alone. The main problem appears to be mistaking "c" and "e", but there are other problems.
This produces errors such as "ct" instead of "et" and "dcus" instead of "deus". Please see the screenshot below; I probably missed a few mistakes here and there.
I then went to my library and did a search for "ct" and found that the problem with poor OCR scans is pretty widespread in the Loeb volumes.
I'm not really sure how Logos could resolve this problem in quality control: are there Latin spell checkers available? But it certainly diminishes the value of Logos' Latin offerings if there are that many mistakes per page.
Comments
-
Yeah, that's an issue. I've often wondered if Latin spell checkers were applied to these scans, but hadn't begun looking closely, yet. I'll be interested to see what answers arise from this question.
0 -
FWIW, the Perseus volumes appear not to have the "ct" problem. It appears here and there, but they tend to be abbreviations, and not misspellings. Given the number of texts scanned by Perseus for their Latin texts, they must have their quality control down. So it should be possible to produce correctly spelled Latin texts en masse.
However, other texts done by Logos (or outsourced?) appear to have the same problem: Summa Theologica, Summa Doctrinae Christianae, old NT apparatuses, etc.
If a text shows up with "ct", you can probably bet that there will be a good number of other OCR errors.
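For anyone who wants to run this check outside of Logos, here is a minimal sketch of the "search for ct" heuristic in Python. It assumes you have the Latin as plain text; the `SUSPECTS` table and the function names are my own invention, the sample string is made up, and a standalone "ct" can still be a legitimate abbreviation, so treat the counts as a hint rather than proof.

```python
import re
from collections import Counter

# Hypothetical table of suspect tokens and their likely intended forms.
# "ct" and "dcus" are the two examples from this thread; extend as needed.
SUSPECTS = {"ct": "et", "dcus": "deus"}

WORD_RE = re.compile(r"[A-Za-z]+")


def suspect_counts(text):
    """Return (counts of suspect whole words, total word count)."""
    words = [w.lower() for w in WORD_RE.findall(text)]
    counts = Counter(w for w in words if w in SUSPECTS)
    return dict(counts), len(words)


def suspects_per_1000_words(text):
    """Rough density of suspect tokens, for comparing volumes."""
    counts, total = suspect_counts(text)
    if total == 0:
        return 0.0
    return 1000.0 * sum(counts.values()) / total


# Made-up sample line, not taken from any edition.
sample = "ct inquirebam unde malum, ct non erat exitus. dcus meus et dominus."
counts, total = suspect_counts(sample)
```

A volume with a high density of these tokens is a good candidate for a closer look at other c/e confusions.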
0 -
Faithlife, any comments?
“The trouble is that everyone talks about reforming others and no one thinks about reforming himself.” St. Peter of Alcántara
0 -
Greg--I'm going to be looking into this further. I hope to have a full answer/solution by tomorrow.
Two things until then:
- This resource wasn't an OCR scan. It was double keyed with a diff run against the two files.
- I looked at the typos you reported in the first column and compared them to the original print source. Unfortunately, the characters in the Logos edition accurately reflect what is found in the print edition we used as source material.
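For readers unfamiliar with double keying, here is a rough sketch of the idea in Python. This is only an illustration using the standard `difflib` module, not the actual tooling used on this resource; the function name and the sample strings are hypothetical.

```python
import difflib


def keying_disagreements(first_pass, second_pass):
    """List the word spans where two independent keyings differ."""
    a = first_pass.split()
    b = second_pass.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    diffs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Record what each typist keyed for the disputed span.
            diffs.append((" ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return diffs


# One typist slips: the diff flags the word for review.
one_slip = keying_disagreements("et deus meus", "et dcus meus")

# Both typists key the same misreading: the diff reports nothing.
same_misreading = keying_disagreements("ct dcus meus", "ct dcus meus")
```

The second case shows the limit of the method: a diff only surfaces places where the two keyings disagree, so anything both typists read the same way from the source passes through unflagged.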
0 -
Hi Kyle,
Thanks for your message. I find it very surprising that there would be so many spelling mistakes in a Loeb volume, even an early one. Could you point me to the file you were using? I'm looking at my print copy of my Loeb volume (the original Watts translation) reprinted in 2006 but based on the 1912 text, and I'm not seeing any of these spelling mistakes.
You can have a look at a clean copy from 1950 here: https://archive.org/stream/staugustinesconf01augu#page/332/mode/2up
Page 332, no spelling mistakes.
0 -
Kyle G. Anderson said:
I looked at the typos you reported in the first column and compared them to the original print source. Unfortunately, the characters in the Logos edition accurately reflect what is found in the print edition we used as source material.
Greg F said:
I'm looking at my print copy of my Loeb volume (the original Watts translation) reprinted in 2006 but based on the 1912 text, and I'm not seeing any of these spelling mistakes.
Looks to me like FL used a print book that was itself an OCR edition. Is that the case?
0 -
Kyle G. Anderson said:
Greg--I'm going to be looking into this further. I hope to have a full answer/solution by tomorrow.
Two things until then:
- This resource wasn't an OCR scan. It was double keyed with a diff run against the two files.
- I looked at the typos you reported in the first column and compared them to the original print source. Unfortunately, the characters in the Logos edition accurately reflect what is found in the print edition we used as source material.
This is one case where I think Logos should set aside its usual policy of not correcting errors found in the original since we all know that Augustine would not have made such errors. The original should therefore be Augustine rather than the print edition used.
george
gfsomsel
יְמֵי־שְׁנוֹתֵינוּ בָהֶם שִׁבְעִים שָׁנָה וְאִם בִּגְבוּרֹת שְׁמוֹנִים שָׁנָה וְרָהְבָּם עָמָל וָאָוֶן
0 -
George Somsel said:
This is one case where I think Logos should set aside its usual policy of not correcting errors found in the original since we all know that Augustine would not have made such errors. The original should therefore be Augustine rather than the print edition used.
I think the real issue here is why they were using such a poor base text to begin with. I've never heard of a Loeb text having that many spelling mistakes. Classicists are sticklers for these kinds of things, and it would have gotten around by now if there were an Augustine Confessions that was that ugly.
But as I mentioned in my OP, the problem is endemic to the Latin Loeb volumes in the Logos environment, not just to the Confessions. See Jerome's Letters in the second screenshot above.
In my opinion this is less about correcting the Augustine volume than it is about addressing a serious quality control issue that affects multiple Latin texts.
0 -
Greg F said:
Hi Kyle,
Thanks for your message. I find it very surprising that there would be so many spelling mistakes in a Loeb volume, even an early one. Could you point me to the file you were using? I'm looking at my print copy of my Loeb volume (the original Watts translation) reprinted in 2006 but based on the 1912 text, and I'm not seeing any of these spelling mistakes.
You can have a look at a clean copy from 1950 here: https://archive.org/stream/staugustinesconf01augu#page/332/mode/2up
Page 332, no spelling mistakes.
Very interesting. Thank you for providing me that link. We used the same 1950 copy that you provided in the link. However, two interesting points emerge: 1) the copy at that link is significantly cleaner than the one we used to key the book. I can see the typos now. I could not before. 2) Unfortunately, we acquired our materials before archive.org's cleaner copy was available to us.
Because we have a cleaner copy we'll be able to go back and analyze the text and fix the mistakes.
0 -
Thank you, Kyle, for your attention to this. And thank you Greg for bringing it up.
0 -
I just got an update on the two volumes of Augustine's Confessions, and the spelling errors appear to have been fixed in the Latin! Thank you Faithlife!
The problem persists in other volumes (in particular in the Select Letters of St. Jerome), but Augustine's texts were the worst off. As I wrote above, you can find which Loeb volumes probably have OCR errors by searching for "ct", which should generally be "et".
In any case, thank you again for fixing these!
0 -
I might take this opportunity to point out that the fixed spelling of the Latin (thank you!) unfortunately broke the automatic morph tagging on the spell-checked words. I believe the automatic tagging needs to be re-run (?) on this resource.
0 -
Greg F said:
I might take this opportunity to point out that the fixed spelling of the Latin (thank you!) unfortunately broke the automatic morph tagging on the spell-checked words. I believe the automatic tagging needs to be re-run (?) on this resource.
Thanks for pointing that out. Could you give a specific example of where that is happening? It will help me eliminate other potential problems.
FYI: it should be an easy fix as well. If it is what I think it is we'll have an update out with the next batch of updates--a week from Monday.
0 -
Hi Kyle, sorry for my late reply. If you look at the screenshot I posted above, all of the words highlighted in red (that is, the previously misspelled words that were fixed in the last update) appear not to be tagged with the automatic morphology.
Basically everything that was fixed appears to be broken, morphologically speaking.
0 -
Thanks Greg. That's exactly what I needed to know.
I updated the morph tagging. We'll push out the update on October 17.
0