jed-users mailing list


Re: UTF-8 and Regular Expressions


Hello G.,

"G. Milde" <milde@xxxxxxxxxxxxxxxxxxxxx> wrote:
> On 11.04.07, John E. Davis wrote:
>> In testing, one problem has come up: When used in UTF-8 mode, PCRE
>> cannot tolerate malformed text.  This can be a problem when jed is
>> running in UTF-8 mode, but one is editing text in some other encoding,
>> e.g., ISO-Latin-1.
>
> However, when editing text in Jed-U, "the right thing" would be to
> convert it transparently to UTF-8 in a find_file_hook and re-convert back
> when saving (analog to compress.sl). 

But UTF-8 text can itself contain malformed sequences; see UTF-8-test.txt.

What would happen if PCRE sees such a malformed sequence? Would jed
die? Wouldn't it be possible to throw an exception (InvalidUTF8Error)
instead?
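
To make the question concrete, here is a rough sketch in Python (not
S-Lang, and not anything jed or PCRE provides as far as I know). It
assumes the text is checked once before it is handed to the regexp
engine. The exception name InvalidUTF8Error is just the hypothetical
name from above, and check_utf8 is a made-up helper.

```python
class InvalidUTF8Error(ValueError):
    """Hypothetical exception name, taken from the question above."""


def check_utf8(data: bytes) -> str:
    """Decode data as strict UTF-8 or raise InvalidUTF8Error.

    Python's strict decoder rejects stray continuation bytes,
    truncated sequences, overlong forms and encoded surrogates.
    """
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError as exc:
        # exc.start is the byte offset of the first offending byte.
        raise InvalidUTF8Error(
            f"malformed UTF-8 at byte offset {exc.start}"
        ) from exc
```

With this, well-formed input such as b"J\xc3\xb6rg" is returned as text,
while the ISO-Latin-1 bytes b"J\xf6rg" raise the exception instead of
reaching the matcher.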

> Conversion could be done by `iconv`, `recode` or (from|to latin-1) a

Are those converters accessible from S-Lang? (I mean iconv(3).) That
would be great, but I doubt it. They aren't available everywhere.
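
For illustration only, this is roughly what the transparent conversion
would amount to, again sketched in Python rather than S-Lang. It
assumes the file on disk is ISO-Latin-1; the function names are made
up, and nothing here is an existing jed or S-Lang interface.

```python
def load_as_utf8(raw: bytes, encoding: str = "iso-8859-1") -> bytes:
    """Convert file bytes to UTF-8, as a find-file hook might."""
    return raw.decode(encoding).encode("utf-8")


def save_from_utf8(buf: bytes, encoding: str = "iso-8859-1") -> bytes:
    """Convert buffer bytes back to the file's encoding before saving.

    Raises UnicodeEncodeError if the buffer contains a character
    that the target encoding cannot represent.
    """
    return buf.decode("utf-8").encode(encoding)
```

Loading never fails for ISO-Latin-1, because every byte value maps to a
character. Saving can fail as soon as a character outside Latin-1 has
been typed into the buffer, so a real hook would need a policy for
that case.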

Bye, Jörg.
-- 
Mathematicians eating cake (from real life):
J: I suppose you're working out how to divide the piece optimally?
K: Yes, I'm just applying the simplex algorithm to it.
C: Look, you've already got four corners there.


