slang-users mailing list

[2004 Date Index] [2004 Thread Index] [Other years]
[Thread Prev] [Thread Next]      [Date Prev] [Date Next]

Re: [slang-users] Re: unicode (was Re: Minor error message change)


Bart Oldeman <bartoldeman@xxxxxxxxxxxxxxxxxxxxx> wrote:
>I've personally had to deal with some of the utf8 issues recently as
>the maintainer of DOSEMU. It already internally converts things from the 
>DOS character set (say cp437) to an external character set (the LC_CTYPE 
>one by default) via Unicode. So I'm looking forward to slang-2.0, as right
>now some cp437 characters always get lost.
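To make the DOSEMU path concrete: sending a DOS byte to the terminal is a two-step mapping, from a cp437 byte to a Unicode code point and then from that code point to the terminal's encoding. The sketch below assumes a UTF-8 terminal and uses a deliberately partial cp437 table with hypothetical function names. It is not DOSEMU's actual code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, deliberately PARTIAL table: maps a few cp437 bytes
   above 0x7F to Unicode code points. A real table covers 0x80-0xFF. */
uint32_t cp437_to_ucs(unsigned char c)
{
    if (c < 0x80)
        return c;               /* ASCII range is identical in both */
    switch (c) {
    case 0x80: return 0x00C7;   /* C with cedilla */
    case 0x81: return 0x00FC;   /* u with diaeresis */
    case 0x82: return 0x00E9;   /* e with acute */
    case 0xB0: return 0x2591;   /* light shade */
    case 0xB3: return 0x2502;   /* box drawing, vertical */
    case 0xC4: return 0x2500;   /* box drawing, horizontal */
    case 0xDA: return 0x250C;   /* box drawing, down and right */
    case 0xDB: return 0x2588;   /* full block */
    case 0xE1: return 0x00DF;   /* sharp s */
    default:   return 0xFFFD;   /* not in this partial table */
    }
}

/* Encode one code point (assumed valid, <= 0x10FFFF) as UTF-8.
   Returns the number of bytes written to out. */
size_t ucs_to_utf8(uint32_t cp, unsigned char *out)
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Convert n cp437 bytes to UTF-8. out must hold at least 3*n bytes,
   since every code point this table yields is <= U+FFFF. */
size_t cp437_to_utf8(const unsigned char *in, size_t n, unsigned char *out)
{
    size_t i, len = 0;
    for (i = 0; i < n; i++)
        len += ucs_to_utf8(cp437_to_ucs(in[i]), out + len);
    return len;
}
```

A byte that the table does not know becomes U+FFFD rather than being dropped, so a lossy conversion is at least visible on screen.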

You will definitely want to grab the first snapshot when it is
available to see how well it compiles against DOSEMU.  For one thing,
DOSEMU makes use of the SLtt_Use_Blink_For_ACS hack that was necessary
because slang 1 only supported up to 256 "color objects".  At the
moment, SLang 2 raises this limit to 512, making SLtt_Use_Blink_For_ACS
unnecessary and unsupported.  Eventually I intend to remove the arbitrary
limit on color objects altogether.
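Removing a fixed limit on color objects amounts to replacing a fixed-size array with one that grows on demand. The following is a minimal sketch using hypothetical types (ColorObject, ColorTable) and a doubling strategy; it illustrates the idea only and does not reflect how slang actually stores its color objects.

```c
#include <stdlib.h>

/* Hypothetical color object: foreground, background, attribute bits. */
typedef struct {
    int fg, bg;
    unsigned int attr;
} ColorObject;

/* Growable table: no compile-time cap on the number of objects. */
typedef struct {
    ColorObject *objs;
    size_t len, cap;
} ColorTable;

/* Append a color object, doubling capacity when full.
   Returns the new object's index, or -1 on allocation failure
   (in which case the table is left unchanged). */
long color_table_add(ColorTable *t, ColorObject c)
{
    if (t->len == t->cap) {
        /* start at 256, slang 1's old limit, then double */
        size_t ncap = t->cap ? 2 * t->cap : 256;
        ColorObject *p = realloc(t->objs, ncap * sizeof *p);
        if (p == NULL)
            return -1;
        t->objs = p;
        t->cap = ncap;
    }
    t->objs[t->len] = c;
    return (long)t->len++;
}

/* Release the table's storage and reset it to empty. */
void color_table_free(ColorTable *t)
{
    free(t->objs);
    t->objs = NULL;
    t->len = t->cap = 0;
}
```

Doubling keeps appends amortized O(1), and indices stay stable because objects are never reordered.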

>As far as I can see, "luit" solves a large part of this problem by
>forcing the terminal to behave as specified in LC_CTYPE. xterm invokes
>luit automatically, so with xterm -u8 but an LC_CTYPE corresponding to
>ISO8859-1, it will still behave like a Latin-1 terminal.

Keep in mind that slang is more than the SLsmg/SLtt interfaces.  It
also has a number of other features such as the interpreter, the
SLsearch interface, etc.  I can imagine some slang-based applications
that do not yet handle Unicode internally everywhere but want Unicode
support in the screen management, though not, e.g., in the interpreter.

>> To allow me to deal with UTF-8 encoded files on a non-UTF-8 terminal, I 
>> have added the ability to turn on or off support for UTF-8 in the 
>> various slang layers.
>
>I wonder if this is necessary. Unless I miss something I would personally 
>only distinguish between a "plain 8 bit mode" (the only thing Slang 1.x 
>supports), and an LC_CTYPE mode, and not special case UTF-8.

SLang 2 will support "plain 8 bit mode" and UTF-8, but no other
character sets.  And it will have this support on all OSs supported by
the interpreter.
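With only two modes, an application still has to decide which one matches the user's locale. A common POSIX approach, assumed here rather than taken from slang, is to call setlocale(LC_CTYPE, "") once at startup and inspect nl_langinfo(CODESET). The helper names below are hypothetical.

```c
#include <ctype.h>
#include <langinfo.h>
#include <stddef.h>

/* Return nonzero if a codeset name denotes UTF-8. Case and '-'/'_'
   separators are ignored, so "UTF-8", "utf8" and "UTF_8" all match. */
int codeset_is_utf8(const char *name)
{
    const char *want = "utf8";
    for (; *name != '\0'; name++) {
        if (*name == '-' || *name == '_')
            continue;
        if (*want == '\0' || tolower((unsigned char)*name) != *want)
            return 0;
        want++;
    }
    return *want == '\0';
}

/* Hypothetical helper: is the current LC_CTYPE locale UTF-8?
   The caller must have run setlocale(LC_CTYPE, "") beforehand,
   otherwise the program is still in the "C" locale. */
int locale_is_utf8(void)
{
    const char *cs = nl_langinfo(CODESET);
    return cs != NULL && codeset_is_utf8(cs);
}
```

The result could then be used to choose between the UTF-8 and plain 8-bit settings, e.g. as the argument to SLsmg_utf8_enable() mentioned later in this message.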

>Internally it seems better to do everything using wchar_t, rather than 
>UTF-8. If LC_CTYPE is not UTF-8 but your strings are then the C library 
>will be thoroughly confused...

I did not pursue this approach because I felt that it would have been
too much of a performance hit on the interpreter.  For example,
applications using the interpreter may implement intrinsic functions
that take string arguments.  SLang would be converting between char
and wchar_t all the time.
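Keeping strings in UTF-8 does not rule out character-level operations. Many of them can walk the bytes directly, because every UTF-8 character begins with exactly one byte that is not of the form 10xxxxxx. A minimal sketch, assuming valid UTF-8 input and using hypothetical function names:

```c
#include <stddef.h>

/* Count characters in a NUL-terminated UTF-8 string by counting every
   byte that is NOT a continuation byte (10xxxxxx). Assumes valid UTF-8.
   No wchar_t buffer or mbstowcs() call is involved. */
size_t utf8_strlen(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; s++)
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    return n;
}

/* Return a pointer to the start of character number idx (0-based),
   or to the terminating NUL if the string has fewer characters. */
const char *utf8_index(const char *s, size_t idx)
{
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80) {
            if (idx == 0)
                return s;
            idx--;
        }
    }
    return s;
}
```

Neither function allocates or converts, which avoids the per-call cost that a char/wchar_t round trip would add to every intrinsic taking a string argument.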

>However, what I personally miss about the patched Slang that various Linux
>distributors are shipping is that LC_CTYPE is the only way and there is no
>way to do a "straight through" SLsmg_write_string(). For DOSEMU I'd like 
>to be able to do this if I can switch the terminal into cp437 mode (as is 
>possible on the Linux console).
>
>So suppose I were able to do:
>  SLsmg_8bit_enable (1);
>then the internal representation could collapse into the current
>internal representation, only using an 8bit character and colour inside
>the structure, without any UCS conversion.

That is what SLsmg_utf8_enable(0) will provide.  It tells the SLsmg
interface that strings passed to it consist of simple 8-bit characters.
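Conceptually, the switch decides how the screen layer splits its input bytes into characters. The sketch below is a hypothetical illustration of that distinction, not slang's implementation: with utf8_mode set to 0, each byte is its own character and passes through unconverted, as DOSEMU wants for a cp437 console; with utf8_mode set to 1, multibyte sequences are decoded into code points.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode the next character at *p and advance *p past it.
   8-bit mode: every byte is one character, returned unchanged.
   UTF-8 mode: a multibyte sequence is decoded; a malformed or
   truncated sequence yields U+FFFD and consumes one byte.
   (Overlong forms are not rejected in this sketch.) */
uint32_t next_char(const unsigned char **p, int utf8_mode)
{
    const unsigned char *s = *p;
    uint32_t c = s[0];
    int extra, i;

    if (!utf8_mode || c < 0x80) {
        *p = s + 1;
        return c;
    }
    if ((c & 0xE0) == 0xC0)      { extra = 1; c &= 0x1F; }
    else if ((c & 0xF0) == 0xE0) { extra = 2; c &= 0x0F; }
    else if ((c & 0xF8) == 0xF0) { extra = 3; c &= 0x07; }
    else { *p = s + 1; return 0xFFFD; }

    for (i = 1; i <= extra; i++) {
        /* stops at a NUL too, since 0x00 is not a continuation byte */
        if ((s[i] & 0xC0) != 0x80) {
            *p = s + 1;
            return 0xFFFD;
        }
        c = (c << 6) | (s[i] & 0x3F);
    }
    *p = s + extra + 1;
    return c;
}

/* Number of screen characters in a NUL-terminated string. */
size_t count_chars(const char *str, int utf8_mode)
{
    const unsigned char *p = (const unsigned char *)str;
    size_t n = 0;
    while (*p != '\0') {
        (void)next_char(&p, utf8_mode);
        n++;
    }
    return n;
}
```

The same two bytes 0xC3 0xBC are two cp437 characters in 8-bit mode but a single u-umlaut in UTF-8 mode, which is exactly why the mode has to be set explicitly.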

Thanks for the feedback.
--John
