jed-users mailing list


Re: Non-ascii chars in UTF-8 mode not bindable


G. Milde <milde@xxxxxxxxxxxxxxxxxxxxx> wrote:
>I did some more testing to track down the problem:

There is really no mystery to what is happening.  The fact is that the
keymap routines use byte semantics.  In general, a keymap is a series
of 256-element lookup tables, one slot per possible byte value.  If
the tables were naively expanded from 256 entries to one entry per
allowable Unicode character (about 1.1 million), then the tables would
be unacceptably large.  The work-around that I posted (and later
corrected) avoids this problem.
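
To put rough numbers on the table-size argument, here is a small
Python sketch.  The 8-byte entry size is an assumption (one pointer
per slot on a 64-bit machine), not a figure taken from the jed
sources, and the names are made up for illustration.

```python
# Rough size comparison for one flat lookup table.
# ENTRY_BYTES is an assumption: one 8-byte pointer per slot.
ENTRY_BYTES = 8
BYTE_TABLE_SLOTS = 256            # one slot per byte value 0x00..0xff
UNICODE_TABLE_SLOTS = 0x110000    # one slot per code point U+0000..U+10FFFF


def table_bytes(slots, entry_bytes=ENTRY_BYTES):
    """Memory needed for a single flat table with the given slot count."""
    return slots * entry_bytes


byte_table = table_bytes(BYTE_TABLE_SLOTS)
unicode_table = table_bytes(UNICODE_TABLE_SLOTS)

print(byte_table)                   # 2048 bytes (2 KiB)
print(unicode_table)                # 8912896 bytes (8.5 MiB)
print(unicode_table // byte_table)  # 4352 times larger
```

Since every prefix key gets its own table, a cost of several megabytes
per table would add up quickly, which is why the tables stay at 256
entries.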

In the case you considered, the four keys correspond to the following
byte strings:

   Key: ´ : "\c2\b4"
   Key: ¬ : "\c2\ac"
   Key: ° : "\c2\b0"
   Key: ¼ : "\c2\bc"

By default, the bytes 0xc2, 0xb4, 0xac, 0xb0, and 0xbc are each bound
to "self_insert_cmd".  So when the editor sees a byte sequence such as
0xc2 0xb4, it simply inserts both bytes into the buffer.  When in
UTF-8 mode, this combination is interpreted as the single Unicode
character '´' (\u{00b4}).
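
The byte strings listed above are just the UTF-8 encodings of the four
characters, which can be checked outside the editor.  The following
Python sketch is only a verification aid and has nothing to do with
jed's own code; the helper name utf8_bytes is made up here.

```python
# The four keys from the list above, written as Unicode escapes.
KEYS = ["\u00b4", "\u00ac", "\u00b0", "\u00bc"]


def utf8_bytes(ch):
    """Return the UTF-8 encoding of a single character."""
    return ch.encode("utf-8")


for ch in KEYS:
    raw = utf8_bytes(ch)
    hex_bytes = " ".join(f"0x{b:02x}" for b in raw)
    print(f"U+{ord(ch):04X} -> {hex_bytes}")

# Decoding the two bytes together gives back one character.
assert b"\xc2\xb4".decode("utf-8") == "\u00b4"
```

All four encodings share the lead byte 0xc2, which is why a binding
for one of them affects the others.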

When you bound the byte sequence "\c2\b4" to something, that
effectively created a keymap for sequences beginning with 0xc2.  As a
result, 0xc2 was no longer bound to "self_insert_cmd", and a sequence
such as "\c2\bc" would not do anything since "\bc" is unbound in the
0xc2-based keymap.
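
The effect can be reproduced with a toy model of nested 256-entry
tables.  This Python sketch is not jed's implementation: the functions
new_keymap, setkey, and dispatch, and the command name "my_command",
are hypothetical and only mimic the behavior described above.

```python
SELF_INSERT = "self_insert_cmd"


def new_keymap(default=None):
    """A keymap is a 256-slot table indexed by byte value."""
    return [default] * 256


def setkey(keymap, seq, command):
    """Bind the byte sequence seq, creating sub-keymaps for its prefix bytes."""
    table = keymap
    for byte in seq[:-1]:
        if not isinstance(table[byte], list):
            # The prefix byte loses its old binding (e.g. self-insert);
            # the new sub-keymap starts out with every slot unbound.
            table[byte] = new_keymap()
        table = table[byte]
    table[seq[-1]] = command


def dispatch(keymap, data):
    """Feed bytes through the keymap; return (sequence, command) pairs."""
    results = []
    table, pending = keymap, b""
    for byte in data:
        pending += bytes([byte])
        entry = table[byte]
        if isinstance(entry, list):
            table = entry  # prefix key: wait for the next byte
            continue
        results.append((pending, entry))  # None means "unbound"
        table, pending = keymap, b""
    return results


root = new_keymap(SELF_INSERT)
before = dispatch(root, b"\xc2\xb4")      # two separate self-inserts
setkey(root, b"\xc2\xb4", "my_command")
bound = dispatch(root, b"\xc2\xb4")       # one two-byte binding
broken = dispatch(root, b"\xc2\xbc")      # 0xbc unbound under 0xc2
print(before, bound, broken, sep="\n")
```

In this model the last call yields a single unbound sequence, which
matches the observation that the other 0xc2-prefixed keys stop
inserting anything once one of them is bound.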

At some point, I will integrate the work-around that I posted into the
setkey functions.  I posted the S-Lang version to give others an
immediate solution to the problem, although I suspect only a few will
ever run into this issue.

I hope this clarifies things a bit.

Thanks,
--John


