slang-users mailing list
[slang-users] Re: unicode (was Re: Minor error message change)

Subject: [slang-users] Re: unicode (was Re: Minor error message change)
From: "John E. Davis" <davis@xxxxxxxxxxxxx>
Date: Mon, 16 Aug 2004 11:58:34 -0400
[This followup is to comments made by Pavel a while back on the old
 slang mailing list.  I feel that his comments are right on target and are
 particularly relevant for the upcoming slang2 release.  For this
 reason, the original message is quoted in its entirety.]

Pavel Roskin <proski@xxxxxxx> wrote:
>On Mon, 30 Jun 2003, Goran Koruga wrote:
>
>> Have you looked at this URL :
>> http://www.cl.cam.ac.uk/~mgk25/unicode.html
>
>I'm looking at it now.
>
>> It suggests you use langinfo() to get info about it.
>
>I'm not sure it's relevant to the original question.  You probably mean
>this part:
>
>  To do this, on any X/Open compliant systems, where <langinfo.h> is
>  available, you can use a line such as
>
>    utf8_mode = (strcmp(nl_langinfo(CODESET), "UTF-8") == 0);
>
>  in order to detect whether the current locale uses the UTF-8 encoding.
>  You have of course to add a setlocale(LC_CTYPE, "") at the beginning of
>  your application to set the locale according to the environment
>  variables first.
>
>This is not related to the properties of the terminal.  Most likely it
>just takes the encoding from the LC_CTYPE locale, and failing that it
>probably returns the "default" encoding for the given language/locale
>combination.  As I said in my previous message, LC_CTYPE affects the
>assumed encoding of the data and of the regular expressions supplied by
>the program.
>
>There is no way for libc to know whether the terminal supports UTF-8
>output.
>
>It seems that we need some kind of "summit" on this topic, involving the
>major players in the i18n development and text terminal software on POSIX
>and Linux in particular (e.g. John E. Davis, Thomas Dickey, Bruno Haible,
>Ulrich Drepper, Eric Raymond, Ted Ts'o).  It's time to create a better
>standard.
>
>The existing specifications are unclear and don't address several issues.
>The separation between locale categories is artificial, and it's even more
>artificial when encodings are added to them.
>
>It makes no sense for a user to set encodings for different locale
>categories.  The encodings should be set for the terminal (the only
>setting irrelevant for GUI), for the default text (e.g. if most of my files
>are in koi8-r, I set it as default), and maybe for legacy programs that
>use regular expressions but don't specify their charset.
>
>There should be a standard way to check the encoding of the terminal.
>Maybe it should be another capability or another environment variable.
>
>If we don't create such a standard, somebody will create it for us, poorly.
>And then we'll waste our time gluing ad-hoc solutions together.
>
>-- 
>Regards,
>Pavel Roskin
>
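The locale test quoted above can be wrapped in a small self-contained
function.  This is only a sketch: the name `locale_is_utf8` is
hypothetical, and, as Pavel points out, the result describes the
encoding of the LC_CTYPE locale, not whether the terminal can actually
display UTF-8.

```c
#include <langinfo.h>
#include <locale.h>
#include <string.h>

/* Hypothetical helper: set the LC_CTYPE locale to `name` and report
 * whether its codeset is UTF-8.  Pass "" to take the locale from the
 * LC_ALL / LC_CTYPE / LANG environment variables.  Returns 0 if the
 * locale is not available.  This says nothing about the terminal. */
static int locale_is_utf8 (const char *name)
{
   if (NULL == setlocale (LC_CTYPE, name))
     return 0;
   return 0 == strcmp (nl_langinfo (CODESET), "UTF-8");
}
```

A program would typically call `utf8_mode = locale_is_utf8 ("");` once at
startup, before doing any locale-dependent processing.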

I think that Pavel is correct regarding the existing locale
specifications, and I have encountered their weaknesses while adding
UTF-8 support to jed.  For example, I would like to edit and view UTF-8
documents while at the same time dealing with documents that use other
character sets, particularly the iso-latin character sets.  I realize
that converting from one character set to another is more or less a
solved problem and, as such, is not an issue.  But, as Pavel pointed
out, the terminal (xterm, rxvt, etc.) is the problem.

Since most of the time I deal with an iso-latin character set, I use
an ordinary xterm that has no Unicode support.  To allow me to deal
with UTF-8 encoded files on a non-UTF-8 terminal, I have added the
ability to turn UTF-8 support on or off in the various slang
layers.  For example, writing to the terminal using slang's SLsmg
functions involves at least two interfaces: the higher-level SLsmg
layer and the lower-level SLtt layer.  In an ordinary xterm,
the SLsmg layer would have UTF-8 support activated but the SLtt layer
would run with UTF-8 support deactivated.  When jed is started with
UTF-8 support, the interpreter will always run in UTF-8 mode.
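To make the "UTF-8 above, non-UTF-8 below" split concrete, here is one
plausible way a terminal layer with UTF-8 disabled could render UTF-8
text on an iso-latin-1 xterm.  It is a hypothetical sketch, not slang's
actual SLtt code: the function name is invented, and characters outside
Latin-1 (or malformed sequences) are simply replaced with '?'.

```c
#include <stddef.h>

/* Hypothetical: transcode NUL-terminated UTF-8 to ISO-8859-1 for a
 * terminal without UTF-8 support.  Code points above U+00FF, invalid
 * lead bytes, truncated sequences and overlong encodings of ASCII all
 * become '?'.  Writes at most outsize-1 bytes plus a NUL and returns
 * the number of bytes written (excluding the NUL). */
static size_t utf8_to_latin1 (const char *in, char *out, size_t outsize)
{
   const unsigned char *s = (const unsigned char *) in;
   size_t n = 0;

   if (outsize == 0)
     return 0;

   while ((*s != 0) && (n + 1 < outsize))
     {
        unsigned long cp;
        int extra, i, ok = 1;

        if (*s < 0x80) { cp = *s; extra = 0; }
        else if ((*s & 0xE0) == 0xC0) { cp = *s & 0x1F; extra = 1; }
        else if ((*s & 0xF0) == 0xE0) { cp = *s & 0x0F; extra = 2; }
        else if ((*s & 0xF8) == 0xF0) { cp = *s & 0x07; extra = 3; }
        else
          {                          /* stray continuation or bad lead byte */
             out[n++] = '?';
             s++;
             continue;
          }
        s++;

        for (i = 0; i < extra; i++)
          {
             if ((*s & 0xC0) != 0x80) /* truncated; also stops at the NUL */
               {
                  ok = 0;
                  break;
               }
             cp = (cp << 6) | (*s & 0x3F);
             s++;
          }

        if (extra == 0)
          out[n++] = (char) cp;
        else
          out[n++] = (ok && cp >= 0x80 && cp <= 0xFF) ? (char) cp : '?';
     }
   out[n] = '\0';
   return n;
}
```

With this approach the screen layer can keep working in UTF-8 while
only the final output step degrades characters that the terminal
cannot show.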

Currently I use the following bit of code in jed to activate slang's UTF-8
capabilities:

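   Jed_UTF8_Mode = SLutf8_enable (-1); /* base UTF-8 support from the locale */
   SLsmg_utf8_enable (1);              /* force the SLsmg layer into UTF-8 mode */
   SLinterp_utf8_enable (1);           /* force the interpreter into UTF-8 mode */
   Jed_UTF8_Mode = 1;                  /* force jed to use UTF-8 internally */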

SLutf8_enable is a slang function that activates base support for
UTF-8.  Passing -1, as I have done above, causes it to use the current
locale.  If the locale indicates UTF-8, or UTF-8 support is forced (by
passing +1 as the argument to SLutf8_enable), then all interfaces will
use UTF-8 unless told otherwise via calls to an interface-specific
SLxxx_utf8_enable function.  In the above, the SLsmg and SLinterp
layers are forced into UTF-8 mode by calls to the appropriate
layer-specific functions.  Since jed does not call SLtt_utf8_enable,
the terminal itself (SLtt layer) has its UTF-8 support activated
depending upon the locale.
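The resolution rules described above can be modelled in a few lines.
This is a toy model for illustration only, not slang's implementation:
the names `resolve_base`, `resolve_layer`, `resolve_all`,
`struct utf8_layers` and `UTF8_FOLLOW` are hypothetical, and it assumes
that a layer which was never set explicitly simply follows the base
setting.

```c
/* Hypothetical model of layered UTF-8 activation.  A request of -1
 * means "follow the default"; 0 or 1 forces the setting off or on. */
#define UTF8_FOLLOW (-1)

struct utf8_layers
{
   int smg;     /* screen-management layer (SLsmg) */
   int tt;      /* low-level terminal layer (SLtt) */
   int interp;  /* interpreter (SLinterp) */
};

/* Base setting, in the spirit of SLutf8_enable: -1 defers to the locale. */
static int resolve_base (int request, int locale_is_utf8)
{
   return (request == UTF8_FOLLOW) ? locale_is_utf8 : (request != 0);
}

/* A layer follows the base unless it was set explicitly. */
static int resolve_layer (int request, int base)
{
   return (request == UTF8_FOLLOW) ? base : (request != 0);
}

static struct utf8_layers resolve_all (int base_request, int locale_is_utf8,
                                       int smg_req, int tt_req, int interp_req)
{
   struct utf8_layers l;
   int base = resolve_base (base_request, locale_is_utf8);

   l.smg = resolve_layer (smg_req, base);
   l.tt = resolve_layer (tt_req, base);
   l.interp = resolve_layer (interp_req, base);
   return l;
}
```

Under this model, jed's setup in an iso-latin locale corresponds to
`resolve_all (-1, 0, 1, UTF8_FOLLOW, 1)`: SLsmg and SLinterp end up in
UTF-8 mode while SLtt stays in 8-bit mode, matching the ordinary-xterm
case described earlier.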

The above outlines how I have handled UTF-8 activation for jed, but
this may not be the best way to do it and may change if better
suggestions come along.  Nevertheless, I wanted to illustrate that the
slang library has provisions for fine-grain control of its UTF-8
support, and it should be fairly easy to accommodate a "better standard"
if one arises.

Sometime soon I will announce the availability of a slang 2 snapshot.
Thanks,
--John

_______________________________________________
To unsubscribe, visit http://jedsoft.org/slang/mailinglists.html
Follow-Ups:
- RE: [slang-users] Re: unicode (was Re: Minor error message change)
  - From: David Somers