- Subject: Re: some UTF8 problems, one fixed
- From: "G. Milde" <milde@xxxxxxxxxxxxxxxxxxxxx>
- Date: Tue, 3 Jun 2008 09:37:16 +0200
Dear John,
thanks a lot for the fast reply.
On 29.05.08, John E. Davis wrote:
> G. Milde <milde@xxxxxxxxxxxxxxxxxxxxx> wrote:
> > a) dabbrev() does not work with words containing umlauts (or other 2-byte
> > chars) (Jörg reported this).
> I must have overlooked this problem. Try using the latest snapshot
> from the svn repository.
It works like a charm, thanks.
> > b) quoted_insert inserts just one byte
> > so M-x quoted-insert ² is inserted as Â� (where the last character is
> > illegal in utf8).
> I do not regard this as a bug since the purpose of this function is to
> allow bytes to be inserted into the buffer.
However, in UTF8 mode, quoted-insert inserts 2 bytes into the buffer
if the next char in the input stream is > 127, i.e.
M-x quoted-insert §
inserts the bytes [195, 130, 167] while the input char is [194, 167].
If the purpose were the direct insertion of *bytes*, I would expect it
to insert the first byte as-is even in UTF8 mode.
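
The byte values above can be reproduced with a small Python sketch. It is
only a model of the observed behaviour (the first input byte is treated as
a character code and re-encoded, the rest is passed through), not Jed's
actual implementation; the function name is made up for illustration.

```python
def quoted_insert_bytes(ch: str) -> list[int]:
    """Model of the observed behaviour: hypothetical helper, not Jed code."""
    raw = ch.encode("utf-8")              # bytes arriving in the input stream
    # The first byte is taken as a character code and encoded as UTF-8 ...
    first = chr(raw[0]).encode("utf-8")
    # ... while the remaining bytes end up in the buffer unchanged.
    return list(first) + list(raw[1:])


print(list("§".encode("utf-8")))   # the input char as UTF-8 bytes
print(quoted_insert_bytes("§"))    # what ends up in the buffer
```

The same model applied to ² gives the bytes for Â followed by a lone
continuation byte, which matches the Â� seen in b).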
OTOH, I would prefer if quoted_insert could be used for inserting
*characters*, as my most frequent use case up to now has been inserting a
character with a simple keypress, overriding its key binding. A prominent
example is the key bound to "quoted_insert" itself, but also the "smart
quotes", bra and ket {}, or control keys.
> If you want to quoted-insert a specific character code then prefix the
> call to quoted_prefix using the character decimal code, e.g.,
> ESC 1 7 8 `
> will cause a ² to be inserted, since its decimal code is 178.
For this I need to know the decimal code, while typing
`§
to override the binding of § is far easier to remember.
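
To illustrate the lookup step this requires, here is a minimal Python
sketch that derives the key sequence from a character. The helper name is
hypothetical, and it assumes the prefix-argument form John describes
(ESC, decimal digits, then the quoted_insert key) works the same way for
other characters.

```python
def quoted_insert_keys(ch: str, quote_key: str = "`") -> str:
    """Hypothetical helper: key sequence for inserting ch by decimal code."""
    digits = " ".join(str(ord(ch)))   # decimal character code, one key per digit
    return f"ESC {digits} {quote_key}"


print(quoted_insert_keys("²"))
print(quoted_insert_keys("§"))
```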
My most urgent problem is that the backtick ` is heavily used in
reStructuredText syntax and hence I would like to use another key for the
quoted_prefix. My choice of the degree sign ° worked nicely in latin-1
encoded texts, but with utf8 it no longer does: pressing °° results in Â�.
OTOH, the set of rarely used 7-bit characters that are accessible from a
German keyboard is restricted (well, rather empty).
> > c) self_insert_cmd sometimes inserts an invalid (latin-1) char
> > (setkey(" <ch>", "<ch>"); works)
> I am unable to reproduce this. Do you have a specific example that
> illustrates the problem.
After evaluating
foreach $1 ([160:255])
setkey("self_insert_cmd", char($1));
all characters in the range 192:255 insert themselves as invalid one-byte
chars (a la latin1) while characters in the range 160:191 insert as valid
utf8 encoded two-byte chars. In other words, the prefix Â [194] works
nicely while the prefix Ã [195] is missing.
Tested with `xjed -n`, Jed Version: 0.99.18,
S-Lang Version: 2.1.3 on Debian/testing
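
The split at 192 coincides with the change of the UTF-8 lead byte, as this
small Python sketch shows (plain UTF-8 arithmetic, independent of Jed; the
function name is made up for illustration):

```python
def lead_byte_ranges(lo: int, hi: int) -> dict[int, tuple[int, int]]:
    """Group code points lo..hi by the first byte of their UTF-8 encoding."""
    groups: dict[int, list[int]] = {}
    for cp in range(lo, hi + 1):
        lead = chr(cp).encode("utf-8")[0]
        groups.setdefault(lead, []).append(cp)
    # Report each lead byte with the first and last code point it covers.
    return {lead: (cps[0], cps[-1]) for lead, cps in groups.items()}


print(lead_byte_ranges(160, 255))
```

Code points 160:191 share the lead byte 194 (Â in latin-1), code points
192:255 share the lead byte 195 (Ã).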
> > d) as soon as one 2-Byte character is defined (with setkey(),
> > definekey(), ...), most (all?) other characters that start with the
> > same byte in utf8 encoding lose their self-insert default behaviour.
> That is the way the keymaps work--- they use byte-semantics.
This is a pity, but as a workaround exists I can live with it.
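
For the record, a toy Python model of how I understand the byte semantics.
It is an assumption about the mechanism, not the real keymap code: as soon
as any binding starts with a given byte, that byte acts as a prefix key and
only explicitly bound continuations are defined.

```python
def lookup(keymap: dict[bytes, str], keys: bytes) -> str:
    """Toy byte-wise keymap lookup (hypothetical model, not Jed's code)."""
    if keys in keymap:
        return keymap[keys]
    # Any binding starting with the same first byte turns it into a prefix,
    # so unbound sequences under that prefix lose their default.
    if any(k[:1] == keys[:1] for k in keymap):
        return "undefined"
    return "self_insert_cmd"


keymap = {"ä".encode("utf-8"): "my_function"}   # one 2-byte char gets bound
print(lookup(keymap, "ö".encode("utf-8")))      # same lead byte as ä
print(lookup(keymap, "§".encode("utf-8")))      # different lead byte

# Workaround in this model: bind the affected characters explicitly.
keymap["ö".encode("utf-8")] = "self_insert_cmd"
print(lookup(keymap, "ö".encode("utf-8")))
```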
Thanks for your patience,
Günter