jed-users mailing list

How to work in mixed-encoding environments?


Some thoughts...

In a mixed-encoding environment, JED should be able to handle multiple
encodings. It should be able to load a file in any encoding, display it
properly, modify it properly, and save it in the same encoding or convert
it to another one if the user so wishes.

There are at least two possibilities:

  1. JED loads each file into a buffer as it is. The buffer has a variable
  that describes its current encoding.

  2. JED loads each file into a buffer and converts it to UTF-8. The buffer
  has a variable that describes the encoding of the original file.

The first case makes loading the file into memory and saving it simpler,
but the internal operations get more complicated:

 * conversion upon every display operation

 * conversion when moving data between buffers (copy, bufsubstr, ...)

 * when a user wants to change the encoding of the buffer, the
   whole buffer must be re-processed
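
To illustrate the second point, here is a minimal Python sketch (not JED or
S-Lang code) of what a copy between two buffers would involve in case 1. The
function name copy_between_buffers is hypothetical; the only assumption is
that each buffer keeps its raw bytes plus the name of its encoding.

```python
def copy_between_buffers(data, src_encoding, dst_encoding):
    """Convert bytes copied from one buffer into the target buffer's encoding.

    In case 1 every buffer stores raw bytes in its own encoding, so text
    cannot be pasted as-is unless both encodings happen to match.
    """
    if src_encoding == dst_encoding:
        return data
    # Decode with the source buffer's encoding, re-encode for the target.
    return data.decode(src_encoding).encode(dst_encoding)
```

The encode step raises UnicodeEncodeError when the target encoding has no
representation for a character, which is one more situation case 1 would
have to handle on every such operation.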

In the second case, there is only the conversion when loading or saving
the file.

If a user wants to change the encoding of the (saved) file, only one
buffer variable has to be changed; the conversion itself is still done
only at save time:

  change_buffer_encoding(new_encoding)
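
A minimal Python sketch of case 2, assuming a buffer is just its text plus
one encoding variable. Python's str stands in for the internal UTF-8
representation, and Buffer, load_file and save_file are hypothetical names
used only for illustration; change_buffer_encoding is the function named
above.

```python
from dataclasses import dataclass


@dataclass
class Buffer:
    text: str      # internal representation (stand-in for UTF-8)
    encoding: str  # encoding used when the buffer is written to disk


def load_file(path, encoding):
    """Read the raw bytes and convert them once, at load time."""
    with open(path, "rb") as f:
        raw = f.read()
    return Buffer(text=raw.decode(encoding), encoding=encoding)


def change_buffer_encoding(buf, new_encoding):
    """Only the buffer variable changes; the text is left untouched."""
    buf.encoding = new_encoding


def save_file(buf, path):
    """Convert once, at save time, into whatever encoding is set now."""
    with open(path, "wb") as f:
        f.write(buf.text.encode(buf.encoding))
```

Loading a Latin-1 file, calling change_buffer_encoding(buf, "utf-8") and
saving would then write the same text as UTF-8 without touching the buffer
contents in between.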

The buffer has to be re-processed only if the encoding was wrongly assumed
at load time. In this case the conversions UTF-8 -> wrong_encoding (which
recovers the original bytes) and right_encoding -> UTF-8 are performed:

  recode_buffer(wrong_encoding, right_encoding)
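
The two conversions can be sketched in Python as follows. This is only an
illustration of the idea: the text is passed in explicitly here, and a
Python str stands in for the UTF-8 buffer contents.

```python
def recode_buffer(text, wrong_encoding, right_encoding):
    """Undo a wrong guess made at load time.

    Step 1 (UTF-8 -> wrong_encoding) recovers the bytes that were on disk.
    Step 2 (right_encoding -> UTF-8) decodes those bytes correctly.
    """
    original_bytes = text.encode(wrong_encoding)
    return original_bytes.decode(right_encoding)


# A UTF-8 file containing U+010D was loaded as ISO-8859-1 by mistake.
garbled = b"\xc4\x8d".decode("iso-8859-1")
fixed = recode_buffer(garbled, "iso-8859-1", "utf-8")
```

The round trip is lossless only if the wrongly assumed encoding could decode
every byte of the file; ISO-8859-1 can, but an encoding with undefined byte
values would already have failed or substituted characters at load time.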



Automatic detection of file encoding:

*  If a file contains a modeline
     -*- coding: coding_name -*-
   the detection is trivial (but should the declared encoding be
   verified?). Such a modeline already has a meaning in Python.

*  If there is no modeline, try to validate the file: is it a valid UTF-8 file?

*  Is the file encoded as ISO 10646 / Unicode / wide characters?

*  Assume some reasonable default (default system locale encoding, default
   JED encoding, ...) or ask the user

In case 2, once the encoding is detected, the loaded buffer should be
converted to UTF-8.
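
The detection steps above could be sketched like this in Python. Several
details are assumptions and not part of the proposal: the modeline is looked
for in the first two lines only, the wide encodings are recognised solely by
their byte-order mark, and the fallback default is an arbitrary choice.

```python
import codecs
import re

# Matches "-*- coding: name -*-" and captures the encoding name.
MODELINE = re.compile(rb"-\*-.*?coding:\s*([-\w.]+).*?-\*-")

# UTF-32 is listed first because its little-endian BOM starts with the
# UTF-16 little-endian BOM.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def detect_encoding(raw, default="iso-8859-1"):
    """Guess the encoding of the bytes in raw."""
    # 1. Modeline: trust it only if the name is known and the file decodes.
    head = b"\n".join(raw.splitlines()[:2])
    match = MODELINE.search(head)
    if match:
        name = match.group(1).decode("ascii")
        try:
            raw.decode(name)
            return name
        except (LookupError, UnicodeDecodeError):
            pass  # unknown or wrong declaration, keep guessing
    # 2. Is the file valid UTF-8?
    try:
        raw.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        pass
    # 3. A wide Unicode encoding, recognised by its byte-order mark?
    for bom, name in BOMS:
        if raw.startswith(bom):
            return name
    # 4. Fall back to a default (the editor could ask the user instead).
    return default
```

Note that a plain ASCII file passes the UTF-8 check, which is harmless since
ASCII is a subset of UTF-8.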



Encoding conversion

The library libiconv could be used to perform encoding conversions.



Binary files

Files loaded in binary mode should not be converted at all. They should be
treated as 7-bit ASCII, with bytes in the range 0x80-0xFF displayed as
<80> - <FF>. The buffer variable that describes the encoding says "binary".
The encoding of such buffers cannot be changed.
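
A small Python sketch of that display rule. The function name
display_binary is hypothetical, and control characters below 0x20 are passed
through unchanged here, although an editor would display those specially too.

```python
def display_binary(raw):
    """Render bytes for display: ASCII as-is, high bytes as <XX>."""
    parts = []
    for byte in raw:
        if byte >= 0x80:
            parts.append("<%02X>" % byte)  # e.g. 0xFF becomes "<FF>"
        else:
            parts.append(chr(byte))
    return "".join(parts)
```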



Marko Mahnic
