[ACK] BUG: Pashta's tagged as Qadma's (Hebrew Cantillation Interactive)


Posts 1934
Forum MVP
Reuben Helmuth | Forum Activity | Posted: Thu, Apr 16 2015 10:47 AM

It seems that Pashtas are consistently tagged as Qadma (see the screenshot, which also shows the problem of the Mahpakh not showing up).

I'm not sure why this would be: although the shape is the same, the two are separate Unicode characters...

HEBREW ACCENT QADMA Unicode: U+05A8, UTF-8: D6 A8
֙ HEBREW ACCENT PASHTA Unicode: U+0599, UTF-8: D6 99
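
For anyone who wants to check these two code points without a character viewer, here is a minimal Python sketch using the standard `unicodedata` module. The helper name `describe` is made up for illustration; it just reports the code point, the Unicode name, and the UTF-8 bytes.

```python
import unicodedata


def describe(ch):
    """Return (code point, Unicode name, UTF-8 bytes as hex) for one character."""
    code_point = f"U+{ord(ch):04X}"
    name = unicodedata.name(ch)
    utf8_hex = ch.encode("utf-8").hex(" ").upper()
    return code_point, name, utf8_hex


# The two accents discussed in this post, written as escapes so that
# the combining marks do not attach to neighboring characters.
for accent in ("\u05A8", "\u0599"):
    print(*describe(accent), sep=" | ")
```

Both marks look the same in most fonts, so comparing the code points is the reliable way to tell which one a text actually contains.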
Posts 1934
Forum MVP
Reuben Helmuth | Forum Activity | Replied: Fri, Apr 17 2015 3:35 AM

Eli or Bradley,

Could one of you verify this bug?

Thanks!

Posts 776
Lew Worthington | Forum Activity | Replied: Fri, Apr 17 2015 4:37 AM

Indeed. I'm not at my PC, but is it this way throughout?

Reuben_BT:

the two are separate Unicode characters

I had no idea.

Posts 791
LogosEmployee
Eli Evans (Faithlife) | Forum Activity | Replied: Fri, Apr 17 2015 5:36 AM

I had to do a little research, but I'll confirm: It's a bug.

Turns out, there are two Unicode characters, but none of our encoded Hebrew texts uses them both. (U+0599 appears, but U+05A8 does not.) This goes all the way back to the 70's when the texts were first keyed in.

So the cantillation tool must use a heuristic to differentiate them. After some (re-)reading, I understand that Pashta is always on the last consonant, and Qadma is placed elsewhere. Where the stressed syllable is not the last consonant but Pashta is required, the mark is doubled, appearing once on the stressed syllable and again on the final consonant to definitely indicate Pashta. (Please correct me if that understanding is wrong. We used Helmut Richter's website as a primary reference, and it is very confusing on this point.)

We are correctly recognizing Pashta on words where the mark is doubled (eg, Isaiah 2:3), but failing when it is a single mark placed on the last consonant (eg, Isaiah 1:11). Half right is also half wrong.

I'll get this filed as a bug. Thank you very much for your attention to detail!

Posts 1934
Forum MVP
Reuben Helmuth | Forum Activity | Replied: Fri, Apr 17 2015 6:26 AM

Eli Evans:
I had to do a little research, but I'll confirm: It's a bug.

Thanks!

Eli Evans:
there are two Unicode characters, but none of our encoded Hebrew texts uses them both.

Not sure how... when I copy either one (from BHW) into character viewer, they get identified correctly. Perhaps it's different in LHB(?)

Eli Evans:

I understand that Pashta is always on the last consonant, and Qadma is placed elsewhere. Where the stressed syllable is not the last consonant but Pashta is required, the mark is doubled, appearing once on the stressed syllable and again on the final consonant to definitely indicate Pashta. (Please correct me if that understanding is wrong. We used Helmut Richter's website as a primary reference, and it is very confusing on this point.)

We are correctly recognizing Pashta on words where the mark is doubled (eg, Isaiah 2:3), but failing when it is a single mark placed on the last consonant (eg, Isaiah 1:11). Half right is also half wrong.

It's true that Pashta is always placed on the last consonant, but it doesn't follow that Qadma occurs only elsewhere. The difference is that Pashta is post-positive which means that it'll occur at the very END of the word (not OVER the last consonant, but at the END of it). Qadma, on the other hand will occur directly over the consonant to which it's "attached". Because of Pashta being post-positive (and being forced to occur at the end), whenever a word being governed by pashta has penultimate stress, it gets doubled to indicate this (the primary mark is actually the end one and the secondary one is to indicate the stress). 
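
To make the distinction concrete, here is a rough Python sketch. It assumes the text encodes the two accents with their separate code points (U+0599 and U+05A8), so no positional guessing is needed. The function name `classify` and the sample strings are invented for illustration and are not taken from any Logos resource.

```python
PASHTA = "\u0599"  # HEBREW ACCENT PASHTA
QADMA = "\u05A8"   # HEBREW ACCENT QADMA


def classify(word):
    """Label a single word by the pashta/qadma marks it carries."""
    pashta_count = word.count(PASHTA)
    if pashta_count >= 2:
        # Doubled mark: the final one is the post-positive pashta,
        # the earlier one shows where the stress falls.
        return "pashta (doubled, penultimate stress)"
    if pashta_count == 1:
        return "pashta"
    if QADMA in word:
        return "qadma"
    return None


# Illustrative word modeled on a doubled pashta: the mark appears once
# on the stressed syllable and again at the end of the word.
doubled = "\u05D4\u05B7\u05DE\u05BC\u05B7\u0599\u05D9\u05B4\u05DD\u0599"
print(classify(doubled))
```

If a text used only one of the two code points for both accents, this would not be enough, and the position of the mark would have to come from somewhere else.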

After looking more closely through some text (before responding to you, Lew), I discovered more problems. Sorry Eli!!

At first as I started in Gen. 1:1ff. I thought "oh double pashta gets recognized correctly" (vs 2), then I came across vs. 7 and initially thought I'd found a case of a single pashta getting recognized, until I glanced at the text in BHW again and realized that it WAS doubled. Upon looking very closely I found this crazy little dot next to the Dagesh in the Mem. Apparently this is a result of the pashta. You can also see that the mahpakh is not showing and that the dagesh in the Bet appears "out of focus." This, I assume, is also a result of the accent.

Here's a clipping of the same words from BHW.

Next in vss. 5 & 10 (identical pattern), I found Paseq is listed (it's NOT an accent mark, though an identical line IS a component of L'garmeh & Shalshelet). But then again, I found that Meteg and Maqqef are also listed, even though they aren't properly part of the cantillation system. Given then, that Paseq might continue to be listed (I don't think it should), it should at least be listed in the correct order (after Mahpakh). 

The bigger problem evident in this screenshot is that Pashta got totally ignored (not even identified as Qadma). 

I just noticed that this clipping is perfect for illustrating the different position of Qadma/Pashta!... Both accents occur at a ר, but one can see quite easily that Qadma is centered over the ר, while pashta is not. Assuming the first word ויקרא was spelled ויקר but with the same vowels, the Qadma would be on a final ר like the pashta, but could still be identified as Qadma because of its position over the consonant.

Blessings to you as you work on this bug!

Posts 791
LogosEmployee
Eli Evans (Faithlife) | Forum Activity | Replied: Fri, Apr 17 2015 7:40 AM

Reuben_BT:
It's true that Pashta is always placed on the last consonant, but it doesn't follow that Qadma occurs only elsewhere. The difference is that Pashta is post-positive which means that it'll occur at the very END of the word (not OVER the last consonant, but at the END of it). Qadma, on the other hand will occur directly over the consonant to which it's "attached". Because of Pashta being post-positive (and being forced to occur at the end), whenever a word being governed by pashta has penultimate stress, it gets doubled to indicate this (the primary mark is actually the end one and the secondary one is to indicate the stress).

Ah, okay. That makes sense. Thanks for the primer! That means we can't fix the tool without also fixing the text, which is good to know. (Yes, it uses LHB not BHW. That should be noted in the help text, but isn't. Another thing to fix.)

Reuben_BT:
Upon looking very closely I found this crazy little dot next to the Dagesh in the Mem.

Hm. I am not seeing that "crazy dot," either in the LHB resource or in the cantillation tool. I'm on a PC, which may explain the difference if you're on a Mac?

FWIW, we decided to document marks that are not technically part of the nikudim/trop such as paseq and metheg as a help to students who are not as familiar with the whole system of marks. Even so, I agree that we should list them in the order of appearance. 
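
As a rough illustration of "order of appearance", here is a small Python sketch that walks a string and reports every Hebrew accent, point, or punctuation mark in the order it is encoded. The function name `marks_in_order` is hypothetical, and the code point range U+0591 to U+05C7 is my assumption for "marks that are not letters"; it is not how the interactive is actually implemented.

```python
import unicodedata


def marks_in_order(text):
    """List the Unicode names of Hebrew marks in the order they appear."""
    found = []
    for ch in text:
        # U+0591..U+05C7 covers the accents, the points, and punctuation
        # such as maqaf, paseq, and sof pasuq; letters start at U+05D0.
        if "\u0591" <= ch <= "\u05C7":
            found.append(unicodedata.name(ch, f"U+{ord(ch):04X}"))
    return found


# Made-up sample: alef + qadma, then paseq, then bet + pashta.
sample = "\u05D0\u05A8\u05C0\u05D1\u0599"
for name in marks_in_order(sample):
    print(name)
```

Because the Unicode names start with ACCENT, POINT, or PUNCTUATION, the same list could also be used to keep paseq, meteg, and maqqef apart from the accents proper.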

Posts 1934
Forum MVP
Reuben Helmuth | Forum Activity | Replied: Fri, Apr 17 2015 11:02 AM

Eli, I don't understand where the discrepancy is...I copy/pasted from LHB into character viewer and both were recognized correctly (see screenshots).

Perhaps it's a Mac/Windows disparity(?) I am on a Mac with OS X 10.10.3

Eli Evans:
we decided to document marks that are not technically part of the nikudim/trop such as paseq and metheg as a help to students who are not as familiar with the whole system of marks

I thought that's probably what happened. Might I suggest assigning other values to these instead of "conjunctive" in order to 'help' but not 'mislead'? Big Smile

Posts 1934
Forum MVP
Reuben Helmuth | Forum Activity | Replied: Fri, Apr 17 2015 11:34 AM

Eli Evans:
I am not seeing that "crazy dot," either in the LHB resource or in the cantillation tool.

Everything is fine in the text, it's only in the cantillation tool that things go "wacky."

So I've tried to track down what's going on... From what I can determine, every time an accent mark falls on a consonant that has a dagesh (or sin/shin dot), the accent mark is converted to a dot and displayed just slightly to the left of the dagesh (or shin dot). This happens regardless of which accent mark it is.

In the process of my investigation (I REALLY hate to do this! Confused) ...yes...I found other problems.

1) The small vertical dash that occurs on the last word of EVERY verse is a Silluq, even though it looks identical to Meteg. I'm not aware that there is a Unicode distinction, but it wouldn't be hard to write a heuristic for this since it Always occurs Only on the Last word of Every verse. This is an important distinction since Silluq is arguably the most important of all the accents while Meteg is not even part of the accents.

2) While I was happy to discover that the tool had link-sets, I was sorry to find that it didn't work (either following OR leading).
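
The Silluq heuristic from point 1 could be sketched roughly like this in Python. It assumes one verse per string, whitespace between words, and that both marks are encoded as U+05BD (named HEBREW POINT METEG in Unicode). The function name `label_vertical_strokes` is made up, and treating the last such mark in the final word as the Silluq is my simplification, not a tested rule.

```python
METEG = "\u05BD"      # HEBREW POINT METEG; the same code point serves for silluq
SOF_PASUQ = "\u05C3"  # HEBREW PUNCTUATION SOF PASUQ


def label_vertical_strokes(verse):
    """Return (word_index, char_index, label) for each U+05BD in a verse."""
    # Ignore a sof pasuq that stands alone as its own token.
    words = [w for w in verse.split() if w.strip(SOF_PASUQ)]
    labels = []
    for word_index, word in enumerate(words):
        positions = [i for i, ch in enumerate(word) if ch == METEG]
        is_last_word = word_index == len(words) - 1
        for i in positions:
            # Assumption: only the final stroke of the final word is silluq.
            is_silluq = is_last_word and i == positions[-1]
            labels.append((word_index, i, "silluq" if is_silluq else "meteg"))
    return labels


# Made-up two-word "verse": each word carries one vertical stroke.
verse = "\u05D0\u05BD\u05D1 \u05D2\u05BD\u05D3\u05C3"
print(label_vertical_strokes(verse))
```

Texts that append other material after the last word of a verse would need extra handling before this rule could be trusted.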

I'm really sorry for heaping on problems! Honestly, I'm only trying to help! IndifferentTongue Tied

Posts 791
LogosEmployee
Eli Evans (Faithlife) | Forum Activity | Replied: Fri, Apr 17 2015 11:43 AM

Reuben_BT:

Eli Evans:
we decided to document marks that are not technically part of the nikudim/trop such as paseq and metheg as a help to students who are not as familiar with the whole system of marks

I thought that's probably what happened. Might I suggest assigning other values to these instead of "conjunctive" in order to 'help' but not 'mislead'? Big Smile

We followed Richter's Mechon Mamre paper on this point. http://www.mechon-mamre.org/c/hr/tables.htm#con21

As you see it, does this go beyond "difference of opinion" all the way to "plain wrong"? I'm happy to change it, but I'd prefer not to disagree with the cited sources unless it's unavoidable. (Or there may be a better source to consult/cite.)

Posts 791
LogosEmployee
Eli Evans (Faithlife) | Forum Activity | Replied: Fri, Apr 17 2015 11:48 AM

Reuben_BT:
1) The small vertical dash that occurs on the last word of EVERY verse is a Silluq, even though it looks identical to Meteg. I'm not aware that there is a Unicode distinction, but it wouldn't be hard to write a heuristic for this since it Always occurs Only on the Last word of Every verse. This is an important distinction since Silluq is arguably the most important of all the accents while Meteg is not even part of the accents.

Easily changed. We're currently listing "Silluq" as "Sof Pasuq + Meteg" and no, I agree that's not very correct. Even according to Richter that's wrong. Smile

Reuben_BT:
2) While I was happy to discover that the tool had link-sets, I was sorry to find that it didn't work (either following OR leading).

Interactives don't support panel linking. It's a known bug that we included the UI for them in L6 when we shouldn't have. (Although we do want to add link sets in the future.)

Reuben_BT:
I'm really sorry for heaping on problems! Honestly, I'm only trying to help! IndifferentTongue Tied

It's very much appreciated! Very much. This is a complex, highly technical subject, and it's extremely helpful to have multiple expert opinions. (I only wish we had gotten it 100% right the first time.) Heap away, because it will be easier to fix all of these problems in one fixer-upper session.

Posts 791
LogosEmployee
Eli Evans (Faithlife) | Forum Activity | Replied: Fri, Apr 17 2015 11:57 AM

Reuben_BT:
Eli, I don't understand where the discrepancy is...I copy/pasted from LHB into character viewer and both were recognized correctly (see screenshots)

I'm going to plead insufficient caffeine, because now I find both U+0599 and U+05A8 in LHB, too. (Thanks for double-checking me, though!)

Posts 1934
Forum MVP
Reuben Helmuth | Forum Activity | Replied: Fri, Apr 17 2015 12:01 PM

Eli Evans:

We followed Richter's Mechon Mamre paper on this point. http://www.mechon-mamre.org/c/hr/tables.htm#con21

As you see it, does this go beyond "difference of opinion" all the way to "plain wrong"? I'm happy to change it, but I'd prefer not to disagree with the cited sources unless it's unavoidable. (Or there may be a better source to consult/cite.)

Thanks for the link BTW! Very nice! I hadn't seen this one before. The way I read the table/legend I totally agree with it. It looks like the very three that I "complained" about, he lists as "other characters"!

Posts 1934
Forum MVP
Reuben Helmuth | Forum Activity | Replied: Fri, Apr 17 2015 12:02 PM

Eli Evans:
I'm going to plead insufficient caffeine
Big Smile

Posts 791
LogosEmployee
Eli Evans (Faithlife) | Forum Activity | Replied: Fri, Apr 17 2015 12:03 PM

Reuben_BT:
It looks like the very three that I "complained" about, he lists as "other characters"!

Touche! Happy to make that change, then.

Posts 1934
Forum MVP
Reuben Helmuth | Forum Activity | Replied: Fri, Apr 17 2015 12:07 PM

Eli Evans:
Touche!

AHHH! That felt good. LOL

Eli Evans:
Happy to make that change, then.

Appreciate the responsiveness! I'll be leaving you in peace (or pain Wink) now. It's past 10pm here (Israel).

Posts 1934
Forum MVP
Reuben Helmuth | Forum Activity | Replied: Wed, Apr 22 2015 9:39 AM

Eli Evans:
Hm. I am not seeing that "crazy dot," either in the LHB resource or in the cantillation tool. I'm on a PC, which may explain the difference if you're on a Mac?

Hey Eli, I discovered something that might help in tracking down the display issue! When copy/pasting into Pages, I noticed that the same issue showed up where consonants that had both a dagesh/shin dot & an accent simply showed an offset dot or dagesh. I then tried deleting and typing in, which is when I discovered that the issue only occurred when the dagesh/dot was typed first followed by the accent. When TYPING THE ACCENT FIRST, EVERYTHING WAS GREAT! Apparently the order of entry needs to be Consonant>Vowel>Accent>Dagesh/Dot.
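
In case it helps with testing, the entry order described above (Consonant > Vowel > Accent > Dagesh/Dot) can be sketched in Python as a reordering pass. This only illustrates that order; whether it cures the display problem in the tool is unverified. The function name `dots_last` is made up.

```python
import unicodedata

# Marks to move to the end of each cluster: dagesh, shin dot, sin dot.
DOTS = {"\u05BC", "\u05C1", "\u05C2"}


def dots_last(text):
    """Within each consonant's run of marks, move dagesh/shin/sin dots last."""
    out = []
    cluster_marks = []

    def flush():
        # Keep the relative order of the other marks, then append the dots.
        others = [m for m in cluster_marks if m not in DOTS]
        dots = [m for m in cluster_marks if m in DOTS]
        out.extend(others + dots)
        cluster_marks.clear()

    for ch in text:
        if unicodedata.category(ch) == "Mn":  # nonspacing combining mark
            cluster_marks.append(ch)
        else:
            flush()
            out.append(ch)
    flush()
    return "".join(out)


# Mem typed as consonant, dagesh, patah, pashta (dagesh before the accent).
typed = "\u05DE\u05BC\u05B7\u0599"
print([f"U+{ord(c):04X}" for c in dots_last(typed)])
```

One caveat: a later Unicode normalization step (NFC or NFD) reorders combining marks by their combining class, so it may undo this ordering.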

Hopefully this helps catch the critter! Big Smile

Posts 791
LogosEmployee
Eli Evans (Faithlife) | Forum Activity | Replied: Wed, Apr 22 2015 9:47 AM

Yes, encoding order is very important for Hebrew fonts, and that's probably it. Good catch, thanks!

Posts 1934
Forum MVP
Reuben Helmuth | Forum Activity | Replied: Wed, May 20 2015 10:21 AM

This still doesn't seem to be fixed in 6.3. Is it still on the table, or is it under the rug? Wink

Posts 791
LogosEmployee
Eli Evans (Faithlife) | Forum Activity | Replied: Wed, May 20 2015 5:08 PM

More like on the horizon. Smile
