slang-users mailing list


[slang-users] Re: unicode (was Re: Minor error message change)


Hi,

I've personally had to deal with some of the utf8 issues recently as
the maintainer of DOSEMU. It already internally converts things from the 
DOS character set (say cp437) to an external character set (the LC_CTYPE 
one by default) via Unicode. So I'm looking forward to slang-2.0, because 
right now some cp437 characters always get lost.

John E. Davis wrote:

> I realize that
> converting from one character set to another is more or less a solved
> problem and as such, it is not an issue.  But as Pavel pointed out the
> terminal (xterm, rxvt, etc) is the problem.

As far as I can see, "luit" solves a large part of this problem by 
forcing the terminal to behave as specified in LC_CTYPE. xterm invokes 
luit automatically, so with xterm -u8 but an LC_CTYPE corresponding to 
ISO8859-1, it will still behave like a Latin-1 terminal.

> To allow me to deal with UTF-8 encoded files on a non-UTF-8 terminal, I 
> have added the ability to turn on or off support for UTF-8 in the 
> various slang layers.

I wonder if this is necessary. Unless I'm missing something, I would 
personally only distinguish between a "plain 8 bit mode" (the only thing 
Slang 1.x supports) and an LC_CTYPE mode, and not special-case UTF-8.

Internally it seems better to do everything using wchar_t, rather than 
UTF-8. If LC_CTYPE is not UTF-8 but your strings are then the C library 
will be thoroughly confused...

Then, to display a UTF-8 file on a latin terminal you could:
1. convert UTF-8 to wchar_t at the moment you read the file.
2. work internally using wchar_t.
3. convert using wcrtomb to the LC_CTYPE set.
4. display the resulting string using SLsmg_write_string() or similar.
(Steps 3 and 4 could be combined if a wide-character version of
 SLsmg_write_string() existed.)
This means that nowhere is it necessary to call nl_langinfo(CODESET), 
and nowhere do you actually need to *know* whether the current set is 
multibyte, UTF-8 or whatever else.
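
To make steps 1 to 3 concrete, here is a minimal sketch in C. The names
utf8_decode() and utf8_to_locale() are made up for illustration and are not
part of Slang. It assumes that wchar_t holds Unicode code points (as it does
with glibc), that the LC_CTYPE encoding is ASCII-compatible and stateless,
and that the caller has already done setlocale (LC_CTYPE, ""). The UTF-8
decoding is done by hand precisely because mbrtowc() would decode according
to LC_CTYPE, which is not UTF-8 in this scenario.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Step 1 (hypothetical helper): decode one UTF-8 sequence from s, with n
   bytes available.  Stores the code point in *out and returns the number
   of bytes consumed, or 0 for malformed or truncated input.
   Assumes wchar_t can hold any Unicode code point. */
static size_t utf8_decode(const unsigned char *s, size_t n, wchar_t *out)
{
    /* smallest code point that legitimately needs 1..4 bytes */
    static const unsigned long min_cp[] = { 0, 0, 0x80, 0x800, 0x10000 };
    unsigned long cp;
    size_t len, i;

    if (n == 0) return 0;
    if (s[0] < 0x80)                { cp = s[0];        len = 1; }
    else if ((s[0] & 0xE0) == 0xC0) { cp = s[0] & 0x1F; len = 2; }
    else if ((s[0] & 0xF0) == 0xE0) { cp = s[0] & 0x0F; len = 3; }
    else if ((s[0] & 0xF8) == 0xF0) { cp = s[0] & 0x07; len = 4; }
    else return 0;
    if (len > n) return 0;

    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80) return 0;   /* not a continuation byte */
        cp = (cp << 6) | (s[i] & 0x3F);
    }
    /* reject overlong forms, surrogates and out-of-range values */
    if (cp < min_cp[len] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;
    *out = (wchar_t)cp;
    return len;
}

/* Steps 1-3 (hypothetical helper): convert a NUL-terminated UTF-8 string
   to the LC_CTYPE encoding via wchar_t.  Characters that are malformed or
   not representable become '?'.  Returns the number of bytes written
   (excluding the NUL), or (size_t)-1 if dst is too small. */
static size_t utf8_to_locale(const char *src, char *dst, size_t dstlen)
{
    const unsigned char *s = (const unsigned char *)src;
    size_t remaining = strlen(src), used = 0;
    char buf[MB_LEN_MAX];
    mbstate_t st;

    if (dstlen == 0) return (size_t)-1;
    memset(&st, 0, sizeof st);
    while (remaining > 0) {
        wchar_t wc;
        size_t consumed = utf8_decode(s, remaining, &wc);
        size_t produced;

        if (consumed == 0) { wc = L'?'; consumed = 1; }
        produced = wcrtomb(buf, wc, &st);       /* step 3 */
        if (produced == (size_t)-1) {
            /* not representable in LC_CTYPE; reset the state and substitute */
            memset(&st, 0, sizeof st);
            buf[0] = '?';
            produced = 1;
        }
        if (used + produced + 1 > dstlen) return (size_t)-1;
        memcpy(dst + used, buf, produced);
        used += produced;
        s += consumed;
        remaining -= consumed;
    }
    dst[used] = '\0';
    return used;   /* step 4 would be SLsmg_write_string (dst) */
}
```

Note that nothing here calls nl_langinfo(CODESET): wcrtomb() alone decides
what the output bytes look like. A real implementation would keep the
wchar_t data around (step 2) instead of converting in one pass.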

However, my personal complaint about the patched Slang that various Linux 
distributors are shipping is that LC_CTYPE is the only way, and there is no 
way to do a "straight through" SLsmg_write_string(). For DOSEMU I'd like 
to be able to do this if I can switch the terminal into cp437 mode (as is 
possible on the Linux console).

So suppose I were able to do:
  SLsmg_8bit_enable (1);
then the internal representation could collapse into the current
internal representation, only using an 8bit character and colour inside
the structure, without any UCS conversion.
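
As a rough sketch of what I mean (all names below are hypothetical, and this
is not how Slang is actually implemented): in 8-bit mode a screen cell only
needs one byte for the character and one for the colour, and writing a
string is a plain byte copy with no conversion at all.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical 8-bit cell: colour in the high byte, character in the
   low byte.  None of these names are Slang API. */
typedef unsigned short cell8_t;

#define CELL8_BUILD(ch, colour) \
    ((cell8_t)((((colour) & 0xFFu) << 8) | ((ch) & 0xFFu)))
#define CELL8_CHAR(c)   ((unsigned char)((c) & 0xFFu))
#define CELL8_COLOUR(c) ((unsigned char)(((c) >> 8) & 0xFFu))

static int eight_bit_mode = 0;

/* Stand-in for the proposed SLsmg_8bit_enable (). */
static void screen_8bit_enable(int on)
{
    eight_bit_mode = (on != 0);
}

/* Copy the bytes of s straight into the row, one cell per byte, with no
   character set conversion.  Returns the number of cells written, or 0
   when 8-bit mode is off (the caller would then take the LC_CTYPE path). */
static size_t write_row_8bit(cell8_t *row, size_t ncols,
                             const char *s, unsigned char colour)
{
    size_t i, n;

    if (!eight_bit_mode) return 0;
    n = strlen(s);
    if (n > ncols) n = ncols;
    for (i = 0; i < n; i++)
        row[i] = CELL8_BUILD((unsigned char)s[i], colour);
    return n;
}
```

With the terminal switched into cp437 mode, bytes such as 0xB0 and 0xB1 (the
cp437 shade characters) would then reach the screen unchanged.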

Bart



