jed-users mailing list

Re: Jed and utf-8... a pre-pre-pre-plea :-)


On Thu, Jun 19, 2003 at 09:15:59AM +0200, Günter Milde wrote:

> If I understood well, doing set_charset("latin1") would also mean that the 
> buffer is saved in latin1 encoding, i.e. although you change bytes in the
> buffer, doing find_file("test.tex"); set_charset("latin1");
> save_buffer() will not change the file.

Yes, if test.tex was originally in latin1. I think the problem arises if I
have some .tex files in latin1, some in latin2, and some in latin9. Maybe we
could have a "raw8bit" mode that works as if the locale were not utf-8... but
that's tricky at least...

> Of course one can then have a latex_mode_hook that calls
> set_charset("latin1"), so everything is transparent and fairly automatic.

Best would be a function that searches for "\usepackage[charset]{inputenc}"
and sets the charset appropriately.
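
To make the idea concrete, here is a minimal sketch of such a search, written
in Python rather than in Jed's S-Lang so that it can be run standalone. The
function name detect_inputenc is hypothetical, and the comment stripping is
deliberately crude (it does not handle an escaped \%).

```python
import re

# Matches \usepackage[<charset>]{inputenc}, tolerating extra whitespace.
_INPUTENC = re.compile(
    r"\\usepackage\s*\[\s*([^\]\s]+)\s*\]\s*\{\s*inputenc\s*\}"
)

def detect_inputenc(text):
    """Return the option given to inputenc, or None if it is not loaded.

    Hypothetical helper: a mode hook could pass the result to whatever
    charset-setting function the editor ends up providing.
    """
    for line in text.splitlines():
        # Crude TeX comment stripping: drop everything after the first %.
        line = line.split("%", 1)[0]
        match = _INPUTENC.search(line)
        if match:
            return match.group(1)
    return None
```

A latex_mode_hook could call something like this on the preamble and fall
back to a default charset when it returns None.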

> 
> > One of the things that I need to think about is an "API" for the
> > charset-mapping functions.  I mentioned one function, "set_charset",
> > but perhaps there should be more.  Can you think of other
> > charset-related functions that may be useful?  "get_charset" comes to
> > mind.

When converting from utf-8 to an 8-bit charset, the rules differ with the
charset in use. I do not think that you will implement all the conversions,
so a mechanism (hooks) for adding conversion functions from a generic charset
to utf-8 and back should be implemented, with the possibility of signalling
errors. For example, if I type a "long umlaut uppercase U" (TeX \H{U},
U+0170, utf-8 0xc5 0xb0), it can be translated to latin2 (0xdb) but _not_ to
latin1, where the same code is \^U ... in this case, for example, yudit
writes an ASCII \u0170 (the unicode value) to the output file. This could be
a point where a hook would be useful: what to do with characters that are
unrepresentable in the requested file charset. That would be really nice for
TeX things, since you can fake almost all composing accents with macros,
without the need for a special font.

I do not know if I have explained myself well... trying to summarize (it's a
horrible interface, but I just want to give the idea):

add_translation_from_charset_to_utf8(charset,f_one);
add_translation_from_utf8_to_charset(charset,f_two,f_three);

f_one gets a byte and outputs a sequence of bytes
f_two gets a sequence of bytes representing a utf-8 char and outputs a byte
      if possible, or calls f_three if impossible
f_three gets a sequence of bytes representing a utf-8 char that has no
      representation in the charset and does something.

f_one is called when reading a file, the other two when writing it.
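
The interface above could be sketched as follows. This is Python, not S-Lang,
and only an illustration: the two add_translation_* names mirror the ones I
proposed, while read_bytes, write_text and the latin1 helpers are
hypothetical. As a simplification, the driver calls f_three when f_two
returns None, instead of f_two calling it directly.

```python
# Hypothetical registries; nothing here exists in Jed.
_to_utf8 = {}     # charset -> f_one
_from_utf8 = {}   # charset -> (f_two, f_three)

def add_translation_from_charset_to_utf8(charset, f_one):
    _to_utf8[charset] = f_one

def add_translation_from_utf8_to_charset(charset, f_two, f_three):
    _from_utf8[charset] = (f_two, f_three)

def read_bytes(charset, raw):
    """Reading a file: f_one maps each 8-bit byte to a utf-8 sequence."""
    f_one = _to_utf8[charset]
    return b"".join(f_one(bytes([b])) for b in raw)

def write_text(charset, utf8_bytes):
    """Writing a file: f_two per character, f_three when f_two gives up."""
    f_two, f_three = _from_utf8[charset]
    out = bytearray()
    for char in utf8_bytes.decode("utf-8"):
        seq = char.encode("utf-8")
        byte = f_two(seq)
        out += byte if byte is not None else f_three(seq)
    return bytes(out)

# Example functions for latin1, built on Python's own codecs (an assumption).
def latin1_f_one(byte):
    return byte.decode("iso-8859-1").encode("utf-8")

def latin1_f_two(seq):
    try:
        return seq.decode("utf-8").encode("iso-8859-1")
    except UnicodeEncodeError:
        return None  # not representable in latin1

def escape_f_three(seq):
    # Same fallback as described for yudit: write an ASCII \uXXXX escape.
    return ("\\u%04X" % ord(seq.decode("utf-8"))).encode("ascii")

add_translation_from_charset_to_utf8("latin1", latin1_f_one)
add_translation_from_utf8_to_charset("latin1", latin1_f_two, escape_f_three)
```

For TeX files, f_three could instead emit a macro such as \H{U}, which is
the "fake the accent with macros" case mentioned above.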

More trivially, a function to look up between short names ("latin1",
"latin9") and canonical names ("iso-8859-1", "iso-8859-15" respectively).
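
A minimal sketch of such a lookup, again in Python and with hypothetical
function names; the table holds only the three charsets mentioned in this
message and would need to be extended.

```python
# Short name -> canonical name; deliberately incomplete.
_CANONICAL = {
    "latin1": "iso-8859-1",
    "latin2": "iso-8859-2",
    "latin9": "iso-8859-15",
}
# Reverse table, derived so the two directions cannot drift apart.
_SHORT = {canonical: short for short, canonical in _CANONICAL.items()}

def canonical_charset(name):
    """Map a short name to its canonical name; unknown names pass through."""
    key = name.strip().lower()
    return _CANONICAL.get(key, key)

def short_charset(name):
    """Map a canonical name to its short name; unknown names pass through."""
    key = name.strip().lower()
    return _SHORT.get(key, key)
```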


> As well as a Buffers>Character_Set menu entry...

Yes. And 

        * a "buffer-hook" like -*-charset: bla-*- in the first lines of the
          file
        * a %<something> format for showing the buffer charset (well, it
          should be the "charset of the file associated with the buffer",
          but well) in the status line.
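
For the first item, a sketch of how the -*-charset: bla-*- cookie could be
read, in Python with a hypothetical function name. It assumes the cookie sits
within the first two lines and that fields are separated by ";", as in
Emacs-style file variables; neither is settled here.

```python
import re

# Text between a pair of -*- markers on one line.
_COOKIE = re.compile(r"-\*-(.*?)-\*-")

def charset_cookie(text, max_lines=2):
    """Return the charset named in a -*- ... -*- cookie, or None."""
    for line in text.splitlines()[:max_lines]:
        match = _COOKIE.search(line)
        if not match:
            continue
        # Fields look like "key: value" and are separated by ";".
        for field in match.group(1).split(";"):
            key, sep, value = field.partition(":")
            if sep and key.strip().lower() == "charset":
                return value.strip()
    return None
```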

have a nice day,
                   Romano 
    
-- 
Romano Giannetti             -  Univ. Pontificia Comillas (Madrid, Spain)
Electronic Engineer - phone +34 915 422 800 ext 2416  fax +34 915 411 132


