John E. Davis spake unto us the following wisdom:
> [4:16pm] /tmp>gcc foo.c
> [4:17pm] /tmp>./a.out
> UTF-8
> [4:17pm] /tmp>printenv LANG
> en_US.UTF-8
>
> For this reason, I think that something like this might work:
>
>    define get_encoding ()
>    {
>       if (_slang_utf8_ok) return "UTF-8";
>
>       variable lang = getenv ("LANG");
>       if (lang == NULL)
>         return NULL;
>       variable fields = strchop (lang, '.', 0);
>       if (2 == length (fields))
>         return fields[1];
>       return NULL;
>    }

Unfortunately, this won't be reliable, for several reasons.

One, locale names are up to the system to at least some extent, so a
system can choose to stuff other information in that space (look at the
list of locales on a non-Linux, non-386BSD-derived system; they're often
weird and wonderful).

Two, even on systems with regular locale name syntax like the above, the
character set is not always present. The "C" locale, for example, is
required to exist, and its associated character set will always (if I'm
not mistaken) be whatever the current system calls ASCII
("ANSI_X3.4-1968" on recent glibc, "646" on Solaris, etc.).

Finally, even when that string does represent the character set in some
way, it may not be in the canonical form required by the system iconv.
(GNU iconv is pretty liberal in what it accepts, but many other systems
are much more strict. UTF-8, UTF8, and utf8 may not all be valid
encodings on all systems, for example.)

Ethan

-- 
The laws that forbid the carrying of arms are laws [that have no remedy
for evils]. They disarm only those who are neither inclined nor
determined to commit crimes.
    -- Cesare Beccaria, "On Crimes and Punishments", 1764
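
The failure modes described above can be illustrated with a short C sketch. It assumes a POSIX system that provides nl_langinfo(CODESET). The helper names codeset_from_name and system_codeset are hypothetical, and the thread does not say whether the quoted foo.c worked this way. The first function mirrors the quoted get_encoding() by splitting the locale name at the dot; the second asks the C library for the codeset instead.

```c
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper mirroring the quoted get_encoding(): return the
 * text after the '.' when the name contains exactly one dot, else NULL. */
const char *codeset_from_name(const char *name)
{
    const char *dot;

    if (name == NULL)
        return NULL;
    dot = strchr(name, '.');
    if (dot == NULL || strchr(dot + 1, '.') != NULL)
        return NULL;
    return dot + 1;
}

/* Hypothetical helper: ask the C library which codeset the environment's
 * LC_CTYPE locale uses. Assumes POSIX nl_langinfo(). If setlocale() fails,
 * the "C" locale stays in effect and its codeset name is returned. */
const char *system_codeset(void)
{
    setlocale(LC_CTYPE, "");
    return nl_langinfo(CODESET);
}
```

With these definitions, codeset_from_name("C") returns NULL even though the "C" locale does have a character set, and codeset_from_name("de_DE.ISO-8859-15@euro") returns "ISO-8859-15@euro" with the modifier still attached. The string returned by system_codeset() is whatever the system calls the codeset, so it is still not guaranteed to be a name that the local iconv accepts.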