Typos in BHS

Page 1 of 2 (22 items) 1 2 Next >
This post has 21 Replies | 1 Follower

Posts 2465
Lee | Forum Activity | Posted: Wed, Mar 19 2014 9:04 PM

-voth becomes -oth, as seen here, in both the SESB and WIVU versions of BHS.

Posts 433
Vincent Setterholm | Forum Activity | Replied: Wed, Mar 19 2014 11:33 PM

It's not a typo. The typographic distinction between holem-waw and waw+holem, where the latter has the holem dot shifted slightly to the left, is an optional convention. There was no support for it until Unicode 5, and as such it has some compatibility issues. These are not just at the level of fonts that don't support the new holem code point (SBL Hebrew does), but also issues relating to how you search the new mark: is the user required to type the correct holem in order to find the word they are looking for, or are the marks treated identically for the purposes of indexing/searching? It seems obvious that for 99% of users, 99% of the time, you'd want to treat these marks as the same for everything but display, but do we need to encode a new match syntax to make it possible to find just the one or the other for the people who care? Does '[match all]' really match all, or is this left as an exception to the rule without another flag like '[match all, holem]'? How about for KeyLinking and lexicon navigation: do we need to write additional code to support working around the change? Does the text comparison tool need code to ignore these cosmetic differences when comparing two texts? It might turn out that none of that is terribly difficult to handle, but it would all have to be tested.
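One way to get "the same for everything but display" is to keep the stored text exactly as printed and fold the two holem code points together only when building search and comparison keys. This is a minimal sketch under that assumption, using Python's standard `unicodedata` module; `search_key`, `same_for_search` and the `distinguish_holem` switch (a stand-in for something like '[match all, holem]') are hypothetical names, not Logos APIs.

```python
import unicodedata

HOLEM = "\u05B9"            # HEBREW POINT HOLAM
HOLEM_HASER_VAV = "\u05BA"  # HEBREW POINT HOLAM HASER FOR VAV (added in Unicode 5.0)


def search_key(text: str, distinguish_holem: bool = False) -> str:
    """Return an index/search key for a pointed Hebrew string.

    NFD puts combining marks into canonical order, so two encodings that differ
    only in mark order (e.g. dagesh before or after holem) get the same key.
    Unless the caller opts in, U+05BA is then folded into U+05B9, since the
    two dots never distinguish one word from another.
    """
    key = unicodedata.normalize("NFD", text)
    if not distinguish_holem:
        key = key.replace(HOLEM_HASER_VAV, HOLEM)
    return key


def same_for_search(a: str, b: str, distinguish_holem: bool = False) -> bool:
    """True if a and b should be treated as the same word when searching or comparing."""
    return search_key(a, distinguish_holem) == search_key(b, distinguish_holem)


# mitsvot in the common encoding (waw + U+05B9) and the BHS-style one (waw + U+05BA)
common = "\u05DE\u05B4\u05E6\u05B0\u05D5\u05B9\u05EA"
bhs_style = "\u05DE\u05B4\u05E6\u05B0\u05D5\u05BA\u05EA"

print(same_for_search(common, bhs_style))                          # True
print(same_for_search(common, bhs_style, distinguish_holem=True))  # False
# waw + dagesh + holem vs. waw + holem + dagesh: same key after canonical reordering
print(same_for_search("\u05D5\u05BC\u05B9", "\u05D5\u05B9\u05BC"))  # True
```

Because only the keys are folded, the displayed text could follow BHS exactly while a query typed with either holem still finds the word.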

A bigger issue for us is that none of our databases come from Unicode source files and they don't distinguish between the two holem characters, so it would be an exercise for us (OK, me) to determine which character to use where. Now I might be wrong about that - some of the source files might encode the difference, but in a fairly non-obvious, non-Unicode way, where the trick is to look at each holem and ask 'is this on a waw (checking for intervening characters, like dagesh)?' and then treat it differently if the answer is 'yes' because the combination holem-waw is consistently handled with a completely different mark. Or perhaps the source file encodes the difference in order (OW vs. WO) and our code carefully obliterates this difference to get the characters in proper Unicode order, which would mean rewriting the code we use to convert the source files to intercept 'WO' differently without breaking the proper conversion of 'OW'. In the absence of a consistent code in the source files, I could easily write a script that could catch 95% or more of these, but then you run into sequences like hireq+yodh+waw+holem and there isn't a programmatic way to know if this is i/yo or iy/wo. If you're lucky, there might be the right type of accent directly on the yodh or the waw which clears the matter up, but if not, I'd have to appeal to some other authority to look up which way to read it. Which, of course, I could do. My point isn't to make this seem unsolvable, just time consuming, and I haven't decided if the pain is worth it for a non-standard typographic feature that is only partially supported. In your Genesis example, there is only one way to read that word - it's really only in the tricky sequences where the effort pays off, so an automated solution that only gets it right, say, 98% of the time isn't really very compelling to me.
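The "look at each holem on a waw" script described above might be sketched like this. Everything here is an illustrative assumption, not Logos's converter: the function names are hypothetical, the input is assumed to be a single pointed word in the legacy encoding (every holem stored as U+05B9 after the waw), and the rule is deliberately simple. If the letter before the waw already carries a vowel, the waw is consonantal and gets U+05BA; if it carries none, the waw is a vowel letter (holem male) and keeps U+05B9; the hireq+yodh+waw+holem pattern and a word-initial waw are flagged for a human rather than guessed.

```python
WAW, YODH = "\u05D5", "\u05D9"
HOLEM = "\u05B9"            # HEBREW POINT HOLAM (the only holem in the legacy data)
HOLEM_HASER_VAV = "\u05BA"  # HEBREW POINT HOLAM HASER FOR VAV
HIREQ = "\u05B4"
# Vowel points: sheva (U+05B0) through qubuts (U+05BB), plus qamats qatan (U+05C7).
# Dagesh, meteg, shin/sin dots and accents are deliberately not counted as vowels.
VOWELS = {chr(c) for c in range(0x05B0, 0x05BC)} | {"\u05C7"}


def clusters(word):
    """Split one pointed word into [letter, marks] pairs."""
    out = []
    for ch in word:
        if "\u05D0" <= ch <= "\u05EA" or not out:  # a Hebrew letter starts a new cluster
            out.append([ch, ""])
        else:
            out[-1][1] += ch  # any other character is a mark on the current letter
    return out


def classify(word):
    """Return (cluster index, verdict) for every waw that carries U+05B9.

    'holem male'  -> preceding letter has no vowel, so the waw is a vowel letter: keep U+05B9
    'consonantal' -> preceding letter already has a vowel, so the holem is the waw's own: use U+05BA
    'ambiguous'   -> word-initial waw, or hireq + yodh + waw + holem (i/yo vs. iy/wo)
    """
    cl = clusters(word)
    verdicts = []
    for i, (letter, marks) in enumerate(cl):
        # Checking the whole mark string skips intervening dagesh or accents.
        if letter != WAW or HOLEM not in marks:
            continue
        if i == 0:
            verdicts.append((i, "ambiguous"))
            continue
        prev_letter, prev_marks = cl[i - 1]
        if VOWELS & set(prev_marks):
            verdicts.append((i, "consonantal"))
        elif prev_letter == YODH and i >= 2 and HIREQ in cl[i - 2][1]:
            verdicts.append((i, "ambiguous"))
        else:
            verdicts.append((i, "holem male"))
    return verdicts


def convert(word):
    """Swap in U+05BA only where the verdict is 'consonantal'; everything else is left alone."""
    cl = clusters(word)
    for i, verdict in classify(word):
        if verdict == "consonantal":
            cl[i][1] = cl[i][1].replace(HOLEM, HOLEM_HASER_VAV)
    return "".join(letter + marks for letter, marks in cl)
```

Leaving the "ambiguous" cases untouched mirrors the post's point: the easy majority can be automated, but the tricky sequences still need another authority.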

But I'll think about it some more. I'm supposed to be getting new SESB files sometime soon, and that'll give me an excuse to look at those source files again and see if they have actually made all the editorial decisions in a way that we simply ignored because before Unicode 5 there was only one proper way to handle this. If so, maybe I'll try building it with two different holem code points and see how well it works in our system.

 

Posts 2465
Lee | Forum Activity | Replied: Thu, Mar 20 2014 12:18 AM

I had a peek at a certain bible application (a 2004 version), and see that Gen 37:10 is set out correctly.

It's not my place to confirm whether their database is Unicode-kosher or not. If it is, I'm not sure whether their database was only patched post-Unicode 5 (2006) to show it correctly (what I know is that database patches come fast and furious). Fact is, they've got the text right.

I expected a "thanks for the typo report". Never expected the kind of reply I got. You want to market a "program for scholars etc..." like that?

Vincent Setterholm:
I could easily write a script that could catch 95% or more of these, but then you run into sequences like hireq+yodh+waw+holem and there isn't a programmatic way to know if this is i/yo or iy/wo. If you're lucky, there might be the right type of accent directly on the yodh or the waw which clears the matter up, but if not, I'd have to appeal to some other authority to look up which way to read it.

I'd be happy enough with an accurate digital representation of the printed version!

Vincent Setterholm:
My point isn't to make this seem unsolvable, just time consuming and I haven't decided if the pain is worth it for a non-standard typographic feature that is only partially supported.

Vav-holem-tav a partially supported typographic feature? I thought rather naively that Logos is here to save my time, not Logos'.

Posts 433
Vincent Setterholm | Forum Activity | Replied: Thu, Mar 20 2014 3:05 AM

Sorry, Lee. I mistakenly thought you'd find the details interesting. I didn't mean to give the impression that I was minimizing your request. I did say I'd look into it.

Posts 418
davidphillips | Forum Activity | Replied: Thu, Mar 20 2014 3:14 AM

Vincent,

Thanks so much for the detailed response! One of my favorite things about Logos as a company is that you guys do such a good job of giving thoughtful and thorough responses to questions that are asked. It really is appreciated. Keep up the great work!

Posts 2465
Lee | Forum Activity | Replied: Thu, Mar 20 2014 4:51 AM

Vincent Setterholm:

Sorry, Lee. I mistakenly thought you'd find the details interesting. I didn't mean to give the impression that I was minimizing your request. I did say I'd look into it.

Apology accepted. I did find the details interesting. Unicode has had a fun ride with many languages! But some of the thinking expressed in your post, I'll just state I'm not in agreement with. I do hope something can be done to make the database more precise, as precise as possible, in due course.

Posts 465
Brian W. Davidson | Forum Activity | Replied: Thu, Mar 20 2014 6:23 AM

Vincent, thanks for the details. VERY interesting.

Posts 433
Vincent Setterholm | Forum Activity | Replied: Thu, Mar 20 2014 7:05 AM

Lee:

But some of the thinking, I'll just say I'm not in agreement with.

OK, I'll try to make my thinking a bit clearer.

Flip to Google and enter the following searches:

מִצְוֹת

מִצְוֺת

The first one gets 'about 81,400' hits, the second gets 1,230 hits. Both encodings (U+05D5, U+05B9 and U+05D5, U+05BA) are technically supported by the Unicode standard (as of version 5), but the first (the way we have it now) is far more common, and despite Google's massive development team, they obviously haven't solved the 'problem' that these are two valid ways of encoding the same word (both searches should return the same count: 'about 82,600'). That's what I meant by 'partially supported'. In most areas of computer development, 8 years is an eternity, but support for complex, minority scripts moves at a different pace. Of course, as a bible software company, we care about biblical Hebrew more than Google does, and we could fix the things under our control whether or not the resulting text plays well with other software. But the point of my first paragraph was 1) the way we have it is actually standard in all three senses: it adheres to the Unicode standard (as does the alternative); it's the way most people encode these letters; and it's what most print books do (grab a print copy of BDB, for example, and you won't find any orthographic variation between holem-waw and waw+holem); and 2) implementing the less common orthography would require some testing and possibly some coding to make sure we weren't wrecking other parts of the user experience by adopting this convention.
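The two Google queries differ in a single code point. A quick check with Python's standard `unicodedata` module (an illustration, not anything from the post) suggests why normalization alone won't merge them: none of the four Unicode normalization forms maps U+05BA onto U+05B9, so any folding has to be done deliberately. The arithmetic behind 'about 82,600' is included as well; the helper names are hypothetical.

```python
import unicodedata

FORMS = ("NFC", "NFD", "NFKC", "NFKD")

common = "\u05DE\u05B4\u05E6\u05B0\u05D5\u05B9\u05EA"     # first query: waw + U+05B9
bhs_style = "\u05DE\u05B4\u05E6\u05B0\u05D5\u05BA\u05EA"  # second query: waw + U+05BA


def first_difference(a, b):
    """Return (index, code point in a, code point in b) for the first mismatch, or None."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i, f"U+{ord(x):04X}", f"U+{ord(y):04X}"
    return None


def unified_by(form, a, b):
    """True if normalizing both strings to `form` makes them identical."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


print(first_difference(common, bhs_style))                   # (5, 'U+05B9', 'U+05BA')
print({f: unified_by(f, common, bhs_style) for f in FORMS})  # False for every form
print(81_400 + 1_230)                                        # 82630, i.e. "about 82,600"
```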

I guess I can understand a little why you misunderstood my comment about time; you don't know me. If you did, you'd realize the irony of thinking that I wouldn't do something just because it would take a long time. I'm the guy who spent 9 months lining up different bibles verse by verse so that Logos 4 could do a bang-up job of syncing them. However, I'm sure you can appreciate that there are trade-offs that have to be made, both for my time and for development time. Maybe it was bad form for me to mention it? You may have a point that I'm no good at marketing.

Posts 2465
Lee | Forum Activity | Replied: Thu, Mar 20 2014 7:15 AM

And I'll state my way of thinking.

Forget Google. Just look at any print copy of BHS. They have a typo? Follow it. They print holem-vav? That's the way it will show up. They print vav-holem? By golly that's what we'll do. That's the gold standard.

If I need to draw any comparison about relative standards, I would not look at a search engine. I would look at some rival product(s) or contemporary language references.

My point is not that you're no good at marketing. I didn't mean to take a dig at you personally, and I am sorry if I came across that way. My point is that if the marketing is to be consistent, and considering Unicode is not really an issue, it needs to be corrected.

Posts 433
Vincent Setterholm | Forum Activity | Replied: Thu, Mar 20 2014 8:03 AM

Leaving aside that I could find fifty contemporary language references on my shelf that don't follow the BHS convention, let me ask you this:

Would you still advise making the change in the BHS SESB if as a result you could no longer right-click the affected words, run a search on your entire library, and expect to see ANY hits outside of that Bible? If the answer is 'yes', then we do have a difference of opinion, because I don't think it's worth breaking basic functionality over a minor orthographic difference. If you answer 'no', then we're actually on the same page - it's a desirable change as long as it doesn't break basic functionality.

The Google example just illustrates that this is something that has to be tested with any off-the-shelf components we use in our development (on five different platforms) and so at that level, it doesn't matter that the Unicode Standard specifies either encoding if the tools for processing Unicode don't play well with the newer code point.
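The kind of testing described here can be sketched as a small regression check that a search component returns the same hits whichever holem the query uses. In this sketch Python's plain substring test stands in for an off-the-shelf component; `naive_search`, `folded_search`, `holem_insensitive` and the two-document "library" are hypothetical.

```python
COMMON = "\u05DE\u05B4\u05E6\u05B0\u05D5\u05B9\u05EA"     # waw + U+05B9
BHS_STYLE = "\u05DE\u05B4\u05E6\u05B0\u05D5\u05BA\u05EA"  # waw + U+05BA

# A toy "library": one resource in the common encoding, one in the BHS style.
library = [
    "lexicon entry: " + COMMON,
    "BHS text: " + BHS_STYLE,
]


def naive_search(query, docs):
    """Exact substring match, standing in for a component with no holem folding."""
    return {i for i, doc in enumerate(docs) if query in doc}


def folded_search(query, docs):
    """Same match, but U+05BA is folded into U+05B9 on both sides first."""
    def fold(s):
        return s.replace("\u05BA", "\u05B9")
    q = fold(query)
    return {i for i, doc in enumerate(docs) if q in fold(doc)}


def holem_insensitive(search, docs):
    """Regression check: a query in either encoding must return the same hits."""
    return search(COMMON, docs) == search(BHS_STYLE, docs)


print(naive_search(BHS_STYLE, library))           # {1}: no hits outside the BHS-style text
print(holem_insensitive(naive_search, library))   # False
print(folded_search(BHS_STYLE, library))          # {0, 1}
print(holem_insensitive(folded_search, library))  # True
```

The naive component reproduces the right-click scenario from the previous post: searching a BHS-style word finds nothing in resources that use the common encoding.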

Posts 2465
Lee | Forum Activity | Replied: Thu, Mar 20 2014 8:18 AM

Vincent Setterholm:

Would you still advise making the change in the BHS SESB if as a result you could no longer right-click the affected words, run a search on your entire library, and expect to see ANY hits outside of that Bible? If the answer is 'yes', then we do have a difference of opinion, because I don't think it's worth breaking basic functionality over a minor orthographic difference. If you answer 'no', then we're actually on the same page - it's a desirable change as long as it doesn't break basic functionality.

That's a relevant consideration. It used to be, for example, that in early days Adobe Reader had problems with perfectly kosher Unicode content, even Latin character ligatures.

However, they fixed it a few years ago. Somehow or other, it's no longer an issue, as long as there is Unicode compliance.

I also see that the issue doesn't affect a particular piece of software that we both don't want to talk about.

You invited me to see things from your perspective. I'd like to turn it around.

A user runs a search, and it does not distinguish vav-holem, holem-vav? Alright, that may not be so important. Now the actual event that led up to this: he exports the verse, and it gets printed. And later, embarrassingly, the typo is seen by some people. Where did I get this? People are interested to know.

Now put yourself in my shoes, and tell me how I can convince experts that Logos is a serious piece of software if this orthographic representation stays despite the company being aware of it. It is NOT a typo? BDB? Google? Search compatibility? 50 contemporary language references say it doesn't matter?

And how do I answer people who advise: don't rely too much on the software if you're into serious work?

Posts 9945
George Somsel | Forum Activity | Replied: Thu, Mar 20 2014 8:56 AM

Lee:

Vincent Setterholm:

Would you still advise making the change in the BHS SESB if as a result you could no longer right-click the affected words, run a search on your entire library, and expect to see ANY hits outside of that Bible? If the answer is 'yes', then we do have a difference of opinion, because I don't think it's worth breaking basic functionality over a minor orthographic difference. If you answer 'no', then we're actually on the same page - it's a desirable change as long as it doesn't break basic functionality.

That's a relevant consideration. It used to be, for example, that in early days Adobe Reader had problems with perfectly kosher Unicode content, even Latin character ligatures.

However, they fixed it a few years ago. Somehow or other, it's no longer an issue, as long as there is Unicode compliance.

I also see that the issue doesn't affect a particular piece of software that we both don't want to talk about.

Now, let's look at it through the customer's eyes. He runs a search, and it does not distinguish vav-holem, holem-vav? Alright, that may not be so important. Now the actual event that led up to this: he exports the verse, and it gets printed. And later, embarrassingly, the typo is seen by some people. Where did I get this? People are interested to know.

Now put yourself in my shoes, and tell me how I can convince experts that Logos is a serious piece of software if this orthographic representation stays despite the company being aware of it. It is NOT a typo? BDB? Google? Search compatibility? 50 contemporary language references say you're wrong?

And how do I answer people who advise: don't rely too much on the software if you're into serious work?

My advice to you is to actually LOOK at the text and determine the order.  It shouldn't take more than a moment to determine which is correct.  If what you copied doesn't appear to be correct, change it.  It has taken more time to discuss the matter than it would take to correct the text.

george
gfsomsel

יְמֵי־שְׁנוֹתֵינוּ בָהֶם שִׁבְעִים שָׁנָה וְאִם בִּגְבוּרֹת שְׁמוֹנִים שָׁנָה וְרָהְבָּם עָמָל וָאָוֶן

Posts 2465
Lee | Forum Activity | Replied: Thu, Mar 20 2014 8:59 AM

If you want to offer real advice, George, stick to the issues or offer some substantiation please.

The discussion taking place between myself and V.S. is whether this typo is a typo at all, or whether it is something that we can all live with, or the extent to which we can expect conformance with texts and standards. The usual, civil discussion.

If on the other hand you do not have real help or goodwill to offer, please take your own advice that you rudely threw at me in another thread. It was so rude, I won't even repeat it.

Posts 9945
George Somsel | Forum Activity | Replied: Thu, Mar 20 2014 9:03 AM

Lee:

If you want to offer real advice, George, stick to the issues or offer some substantiation please.

The discussion taking place between myself and V.S. is whether this typo is a typo at all, or whether it is something that we can all live with.

If you do not have substantial help or goodwill to offer, please take your own advice that you rudely threw at me in another thread. It was so rude, I won't even repeat it.

But not nearly so rude as you.

george

Posts 433
Vincent Setterholm | Forum Activity | Replied: Thu, Mar 20 2014 9:05 AM

Re: 'typo'. To quote the Princess Bride, "You keep using that word. I do not think it means what you think it means." :)  I'm suggesting that it is an orthographic variant. Hebrew is full of those. You can spell 'Moses' without a holem at all, and it isn't a 'typo' - the holem can be 'invisible' when next to a sin or shin dot, following one orthographic convention. Not all Bibles (or fonts, or for that matter manuscripts) shift the furtive patach or the hireq under the mem in Jerusalem to the right. Not doing so is an orthographic variant, but it isn't a 'typo'. (Though in the case of 'Jerusalem', in a digital library I would consider it a 'typo' if the reason the hireq wasn't shifted was because of the order in which the characters were encoded, since that could break a search - a similar case could be made for standardizing 'Moses' in a digital edition no matter which convention a particular print version followed, since that spelling change could affect search results as well. But unfortunately you can't fix that one programmatically, or you'll wipe out the instances in some grammars that are trying to illustrate that very convention!) I haven't tried to conduct a statistical survey, but I'd hazard that most Hebrew reference works don't follow the BHS convention re: waw+holem, and no one seems bothered by it. To my knowledge, we've never gotten a typo report on a lexicon or grammar for it not following the BHS convention. Scholars are used to seeing it either way. The world keeps spinning. 

In your scenario, you started with 'he runs a search'. How did he run the search? Did he type it in? I'd guess that the majority of Unicode Hebrew keyboards in the world (starting with all the Israeli Standard keyboards) don't support the new(ish) holem code point, so how did he type it in (if he was only looking for waw+holem)? There are no two words that are only distinguished from each other by which holem dot you use, so how did our hypothetical customer get a wrong search hit? (I know: you didn't say he did get bad search results, but rather that the correct hits appeared but displayed with a 'typo'. But since the two dots never distinguish between words, it is actually easier to search if you only use the more common code point - unless your indexer is smarter than Google's and finds both holems no matter which one you key in.)

For the record, despite Google's failure, I don't actually know that doing something smarter here will be hard for us. I haven't looked into it yet. It could be simple, it could be a frightful mess. I'm withholding judgment.

Posts 2465
Lee | Forum Activity | Replied: Thu, Mar 20 2014 9:13 AM

Vincent, somebody is now crowding into this thread, and intentionally fanning a flame-war.

So I'll stick to what you're saying and end right here.

For me it's plain simple. A typo, in the sense of a digitization of a physical published work, is a misrepresentation of that work. The work in question here is not any other work, but BHS, which scrupulously observes these infinitesimally fine distinctions.

For me, if this misrepresentation comes directly from the publisher, then maybe, arguably, it's not a typo (I suppose; I don't really know?). If not, a rose by any other name...

Speaking as an end-user, it is comforting for me that another piece of software seems to have it sorted out.

Now, if you'll excuse me, I have a presentation to prepare. Have a blessed day.

Posts 9945
George Somsel | Forum Activity | Replied: Thu, Mar 20 2014 9:32 AM

Lee:
Vincent, somebody is now crowding into this thread, and intentionally fanning a flame-war.

In case no one has thus far informed you, when you post to the forum it is open for all to comment. If you don't want someone else "crowding" in, write an email. As for "fanning a flame-war", what I wrote was simply common-sense advice, but perhaps you have no common sense.

george

Posts 1674
Paul N | Forum Activity | Replied: Thu, Mar 20 2014 9:48 AM

Could some form of visual filter be a work around for Lee?

Posts 2465
Lee | Forum Activity | Replied: Sun, Mar 30 2014 9:56 AM

Revisiting this topic for a while.

Vincent seems to be saying that if the database is rectified it could interfere with the Logos search engine convention. I think that's putting the cart before the horse, design-wise.

Also, I don't think a comparison with Google is all that valid either. Google searches aim to be inclusive, to catch typos and variants. So if you search colour you're going to get entries for color as well. That doesn't mean that you should digitize a book with "colour" in the original to "color", or vice versa. (And in BHS -oth and -voth are clearly distinguished.)

Oh well, I've grown used to this now, and take the advice of my luddite friends with more respect.

Posts 433
Vincent Setterholm | Forum Activity | Replied: Sun, Mar 30 2014 11:39 AM

Lee:

Vincent seems to be saying that if the database is rectified it could interfere with the Logos search engine convention. I think that's putting the cart before the horse, design-wise.

That's not a fair summary of what I've written. You're making it sound as if I'm making excuses for not looking into this when I've repeatedly written that I will. Furthermore, the very definition of 'putting the cart before the horse, design-wise' would be to make this, frankly cosmetic, change to the resources without first making sure that our architecture supports it.

I was quite happy to give you the last word on this thread. I don't know what this new post was supposed to contribute to the discussion. In hindsight, I should have kept it simple: Thank you for your suggestion. We appreciate it.
