jed-users mailing list


RE: UTF-8 and Regular Expressions


My 0.02€ here:

On Wed, 2007-04-18 at 09:19 +0200, G. Milde wrote:
> I do have the impression that it would be quite surprising
> if, in UTF-8 mode, re_search_forward with the pattern "f..r" matched "för"
> (with ö == U+00F6 LATIN SMALL LETTER O WITH DIAERESIS) while the pattern
> "f[oö]r" were invalid (or did not match "för").
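The effect described above can be reproduced outside jed. This is a minimal sketch using Python's `re` module, not jed's own regex engine; it only assumes, as an analogy, that a byte-oriented matcher treats each byte of a UTF-8 string as one "character". Under that assumption "f..r" matches "för" while "f[oö]r" does not.

```python
import re

# "för" in UTF-8: ö (U+00F6) is stored as the two bytes 0xC3 0xB6.
text_bytes = "för".encode("utf-8")   # b'f\xc3\xb6r'
text_str = "för"                     # decoded text: one code point per character

# Byte-oriented matching: "." consumes exactly one byte,
# so a single dot cannot span the two-byte "ö".
print(re.search(rb"f.r", text_bytes) is None)    # True
print(re.search(rb"f..r", text_bytes) is None)   # False: two dots cover 0xC3 0xB6

# Written against bytes, the class [oö] becomes the set {o, 0xC3, 0xB6};
# each member matches one byte only, so "för" is not matched.
cls = "f[oö]r".encode("utf-8")
print(re.search(cls, text_bytes) is None)        # True

# Character-oriented matching: "." and [oö] each cover one code point.
print(re.search(r"f.r", text_str) is None)       # False
print(re.search(r"f[oö]r", text_str) is None)    # False
```

Decoding to characters before matching is what makes "." and character classes behave as a user would expect.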

I completely agree with this statement... it bit me a couple of weeks
ago, and I am almost inclined to call it a bug. In jed 0.99.18U, if I
call re_search_forward with the pattern "e.e", "eñe" is not matched. If I
search with "e..e", "eñe" IS matched, but the highlighted region extends
one character past the final "e". That's quite confusing.
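One plausible explanation for the oversized highlight is a mix-up between byte length and character length. This is a hypothesis, not something confirmed from jed's source. The Python sketch below (again using `re`, not jed) shows that "e..e" matches 4 bytes of "eñe" even though only 3 characters are visible; if those 4 units were highlighted as characters, the highlight would cover one extra character after the final "e". The sample line is illustrative.

```python
import re

text = "eñe"                       # ñ is U+00F1: two bytes in UTF-8
data = text.encode("utf-8")        # b'e\xc3\xb1e': 4 bytes, 3 characters

# On raw bytes, one dot per character is not enough for "ñ" ...
print(re.fullmatch(rb"e.e", data) is None)   # True
# ... but two dots are.
m = re.fullmatch(rb"e..e", data)

byte_len = m.end() - m.start()     # 4 bytes matched
char_len = len(text)               # 3 characters on screen

# Hypothetical: highlighting byte_len *characters* from the match start
# overshoots the match by one character on this illustrative line.
line = "eñe and more"
print(repr(line[:byte_len]))       # 'eñe '
```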

By the way, I discovered that in this same version, if you load
"iso-latin1.sl" while in UTF-8 mode, dabbrev stops working (it says
something like "invalid utf-8 sequence" every time you call it). Just a
note so that hapless people like me can find it by googling...

On Wed, 2007-04-18 at 17:18 +0200, SANGOI DINO LEONARDO wrote:
> Anyway, I use regular expressions a lot, and I don't remember ever
> needing utf-8 support or having a problem because it was missing.
> 
> But probably I'm not a good test case: in Italian we have very few
> characters outside the ASCII (7 bit) set. 

I understand, being Italian myself; but living in Spain and
writing often in Spanish (where ñ and accented characters are much more
common and much more important than in Italian), I noticed it more.
On the other hand, it's also true that I use re_* mainly when
programming, in pure ASCII, so in practice the problem is quite rare.

Romano



