jed-users mailing list


Re: some UTF8 problems, one fixed

Subject: Re: some UTF8 problems, one fixed
From: "G. Milde" <milde@xxxxxxxxxxxxxxxxxxxxx>
Date: Tue, 3 Jun 2008 09:37:16 +0200

Dear John,

thanks a lot for the fast reply.

On 29.05.08, John E. Davis wrote:
> G. Milde <milde@xxxxxxxxxxxxxxxxxxxxx> wrote:
> > a) dabrev() does not work with words containing umlauts (or other 2-byte
> >    chars) (Jörg reported this).

> I must have overlooked this problem.  Try using the latest snapshot
> from the svn repository.

It works like a charm, thanks.

> > b) quoted_insert inserts just one byte
> >    so M-x quoted-insert ² is inserted as Â� (where the last character is
> >    illegal in utf8).

> I do not regard this as a bug since the purpose of this function is to
> allow bytes to be inserted into the buffer.

However, in UTF8 mode, quoted-insert inserts 2 bytes into the buffer 
if the next char in the input stream is > 127, i.e. 

  M-x quoted-insert §          

inserts the bytes [195, 130, 167] while the input char is [194, 167].

If the purpose were the direct insertion of *bytes*, I would expect it
to insert the first byte as-is even in UTF8 mode.
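
The reported byte values can be reproduced with a few lines of Python. This is only an illustration of the arithmetic, not Jed's actual code path; the explanation that the first byte is treated as a character code and re-encoded is an assumption that happens to match the observed output.

```python
# Illustration only (Python, not S-Lang): reproduce the byte values above.
section = "§"                      # U+00A7
typed = section.encode("utf-8")    # bytes that arrive on the input stream
print(list(typed))                 # [194, 167]

# Assumption: the first byte (194) is taken as a character code and
# re-encoded as UTF-8; the second byte (167) then lands in the buffer as-is.
first_as_char = chr(typed[0]).encode("utf-8")
inserted = first_as_char + typed[1:]
print(list(inserted))              # [195, 130, 167]
```

The trailing 167 is a lone continuation byte, which is why the result is displayed as Â followed by an invalid character.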

OTOH, I would prefer if quoted_insert could be used for inserting
*characters*, as my most frequent use-case up to now was the insertion of
characters by simple keypress overriding the key-binding. A famous
example being the binding of "quoted_insert" itself, but also the "smart
quotes", bra and ket {}, or control-keys.

> If you want to quoted-insert a specific character code then prefix the
> call to quoted_prefix using the character decimal code, e.g., 

>    ESC 1 7 8 `

> will cause a ² to be inserted, since its decimal code is 178.

For this I need to know the decimal code, while typing 
 `§ 
to override the binding of § is far easier to remember.
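
For reference, the decimal code needed for the ESC-prefix method can be looked up outside the editor. A minimal Python sketch (the helper name `describe` is made up for this example):

```python
# Illustration only: look up the decimal code and UTF-8 bytes of a character.
def describe(ch):
    """Return (decimal code point, list of UTF-8 byte values) for ch."""
    return ord(ch), list(ch.encode("utf-8"))

for ch in "²§°":
    code, utf8_bytes = describe(ch)
    print(ch, code, utf8_bytes)
```

For ² this gives the decimal code 178 mentioned above.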

My most urgent problem is that the backtick ` is heavily used in
reStructuredText syntax, and hence I would like to use another key for the
quoted_prefix. My choice of the degree sign ° worked nicely in latin-1
encoded texts, but with utf8 it no longer does: pressing °° results in Â�.
OTOH, the set of rarely used 7-bit characters that are accessible from a
German keyboard is restricted (well, rather empty).

> > c) self_insert_cmd sometimes inserts an invalid (latin-1) char
> >    (setkey(" <ch>", "<ch>"); works)

> I am unable to reproduce this.  Do you have a specific example that
> illustrates the problem?

After evaluating

foreach $1 ([160:255])
   setkey("self_insert_cmd", char($1));

all characters in the range 192:255 insert themselves as invalid one-byte
chars (a la latin1), while characters in the range 160:191 insert as valid
utf8-encoded two-byte chars. In other words, the prefix Â [194] works fine
while the prefix Ã [195] is missing.
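
The split between the two ranges matches the UTF-8 lead byte of each character, as this small Python check shows (illustration only; `lead_byte` is a helper name invented here):

```python
# Illustration only: group code points 160..255 by their UTF-8 lead byte.
def lead_byte(code_point):
    """Return the first byte of the UTF-8 encoding of code_point."""
    return chr(code_point).encode("utf-8")[0]

with_prefix_194 = [cp for cp in range(160, 256) if lead_byte(cp) == 194]
with_prefix_195 = [cp for cp in range(160, 256) if lead_byte(cp) == 195]

print(with_prefix_194[0], with_prefix_194[-1])   # 160 191
print(with_prefix_195[0], with_prefix_195[-1])   # 192 255
```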

Tested with `xjed -n`, Jed Version: 0.99.18, 
S-Lang Version: 2.1.3 on Debian/testing

> > d) as soon as one 2-Byte character is defined (with setkey(),
> >    definekey(), ...), most (all?) other characters that start with the
> >    same byte in utf8 encoding lose their self-insert default behaviour.

> That is the way the keymaps work--- they use byte-semantics.

This is a pity, but as a workaround exists I can live with it.
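
To make the effect of byte semantics concrete, here is a toy model in Python. It is not how Jed implements keymaps; the class `ByteKeymap` and its lookup rule are assumptions chosen only to mimic the behaviour described in d).

```python
# Toy model only: a keymap keyed on byte sequences, not on characters.
class ByteKeymap:
    def __init__(self):
        self.bindings = {}            # bytes -> action name

    def setkey(self, action, key):
        self.bindings[key.encode("utf-8")] = action

    def lookup(self, key):
        seq = key.encode("utf-8")
        if seq in self.bindings:
            return self.bindings[seq]
        # Assumption: once any multi-byte binding starts with this lead
        # byte, the lead byte acts as a prefix key, so unbound sequences
        # under it no longer fall back to self-insert.
        lead = seq[:1]
        if any(len(b) > 1 and b[:1] == lead for b in self.bindings):
            return None
        return "self_insert_cmd"

keymap = ByteKeymap()
keymap.setkey("quoted_insert", "°")   # ° is [194, 176] in UTF-8
print(keymap.lookup("°"))             # bound explicitly
print(keymap.lookup("§"))             # shares lead byte 194 with °
print(keymap.lookup("ä"))             # lead byte 195, unaffected
```

In this model, binding ° turns byte 194 into a prefix, so § loses its default while ä keeps it.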

Thanks for your patience,

Günter
