From 390955cbdeb674bead490fc3f74a8a0893ea83cf Mon Sep 17 00:00:00 2001
From: Ulrich Drepper <drepper@redhat.com>
Date: Mon, 11 Jan 1999 20:13:43 +0000
Subject: Update.

1999-01-11  Ulrich Drepper  <drepper@cygnus.com>

	* ctype/Versions [GLIBC_2.0]: Export __ctype32_b.
	* include/wctype.h: Declare __iswspace.
	* stdio-common/vfscanf.c (__vfscanf): Use __iswspace instead of
	iswspace.
	* wctype/Makefile (routines): Add wcextra_l.
	* wctype/wcextra.c (iswblank): Implement function here and don't use
	__iswctype.
	(__iswblank_l):  Move definition to...
	* wctype/wcextra_l.c: ...here.  New file.
	* wctype/wcfuncs.c: Really implement functions and don't call
	__iswctype or __towctrans.
	* wctype/wctype.h: Change isw* and tow* macros.  Don't call
	__iswctype or __towctrans.  Instead optimize constant argument case.

	* iconv/gconv.h: Fix typos.

	* iconv/skeleton.c: Fix typos.  Optimize init function a bit.
	Correctly emit escape sequence to return to initial state in
	conversion function.

	* iconvdata/iso-2022-jp.c (gconv_init): Correctly initialize
	max_needed_to element.

	* manual/mbyte.texi: Removed.  This is now described in charset.texi.
	* manual/charset.texi: New file.
	* manual/Makefile (chapters): Replace mbyte by charset.
	* manual/ctype.texi: Document wide character functions.
	* manual/intro.texi: Fix reference to mbyte chapter.
	* manual/lang.texi: Likewise.
	* manual/locale.texi: Likewise.
	* manual/stdio.texi: Likewise.
	* manual/string.texi: Fix @node line for new charset chapter.
	* manual/libc.texinfo (UPDATED): Updated.  Also update copyright years.
	* manual/memory.texi (savestring): Optimize code to give a good
	example.

	* manual/filesys.texi: Fix wording.  Patches by Jim Meyering.

	* nscd/nscd_getgr_r.c: Include stdint.h to get uintptr_t definition.
	* nscd/nscd_getpw_r.c: Likewise.
	* nscd/nscd_gethst_r.c: Likewise.

	* stdlib/strtold_l.c: Always include xlocale.h.

1999-01-11  Geoffrey Keating  <geoffk@ozemail.com.au>

	* stdlib/fpioconst.h (LDBL_MAX_10_EXP_LOG): Define to be same as
	DBL_MAX_10_EXP_LOG if there is no long double.
	(_fpioconst_pow10): Always use size as LDBL_MAX_10_EXP_LOG to match
	printf_fp.c.

1999-01-10  Andreas Jaeger  <aj@arthur.rhein-neckar.de>

	* timezone/Makefile ($(testdata)/GB): Changed to ...
	($(testdata)/Europe/London): ... for tst-timezone test.
	($(objpfx)tst-timezone.out): Change GB to Europe/London.

	* timezone/tst-timezone.c (main): Enable DST switching test,
	change GB to Europe/London.

1999-01-10  Philip Blundell  <philb@gnu.org>

	* socket/Makefile (headers): Remove bits/sockunion.h.

1999-01-09  Philip Blundell  <philb@gnu.org>

	* socket/sys/socket.h: Don't include <bits/sockunion.h>.
	* sysdeps/generic/bits/sockunion.h: Deleted.
	* sysdeps/unix/sysv/linux/bits/sockunion.h: Likewise.

1999-01-08  H.J. Lu  <hjl@gnu.org>

	* io/fts.c (fts_close): Don't access memory after having it freed.
---
 ChangeLog                                |   76 +
 ctype/Versions                           |    3 +-
 iconv/gconv.h                            |    6 +-
 iconv/skeleton.c                         |   44 +-
 iconvdata/iso-2022-jp.c                  |    6 +-
 include/wctype.h                         |    6 +
 io/fts.c                                 |   12 +-
 manual/Makefile                          |    4 +-
 manual/chapters.texi                     |    3 +-
 manual/charset.texi                      | 2846 ++++++++++++++++++++++++++++++
 manual/ctype.texi                        |  521 +++++-
 manual/filesys.texi                      |    4 +-
 manual/intro.texi                        |    2 +-
 manual/lang.texi                         |    2 +-
 manual/libc.texinfo                      |    4 +-
 manual/locale.texi                       |    6 +-
 manual/memory.texi                       |    3 +-
 manual/stdio.texi                        |    8 +-
 manual/string.texi                       |    2 +-
 manual/texis                             |    2 +-
 manual/top-menu.texi                     |   70 +-
 nscd/nscd_getgr_r.c                      |    3 +-
 nscd/nscd_gethst_r.c                     |    3 +-
 nscd/nscd_getpw_r.c                      |    3 +-
 socket/Makefile                          |    5 +-
 socket/sys/socket.h                      |    5 +-
 stdio-common/vfscanf.c                   |    4 +-
 stdlib/fpioconst.h                       |   12 +-
 stdlib/strtold_l.c                       |    4 +-
 sysdeps/generic/bits/sockunion.h         |   40 -
 sysdeps/unix/sysv/linux/bits/sockunion.h |   48 -
 timezone/Makefile                        |    7 +-
 timezone/tst-timezone.c                  |   12 +-
 wctype/Makefile                          |    4 +-
 wctype/wcextra.c                         |   18 +-
 wctype/wcextra_l.c                       |   43 +
 wctype/wcfuncs.c                         |   50 +-
 wctype/wctype.h                          |  108 +-
 38 files changed, 3736 insertions(+), 263 deletions(-)
 create mode 100644 manual/charset.texi
 delete mode 100644 sysdeps/generic/bits/sockunion.h
 delete mode 100644 sysdeps/unix/sysv/linux/bits/sockunion.h
 create mode 100644 wctype/wcextra_l.c

diff --git a/ChangeLog b/ChangeLog
index 159bd65c51..0515d68376 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,79 @@
+1999-01-11  Ulrich Drepper  <drepper@cygnus.com>
+
+	* ctype/Versions [GLIBC_2.0]: Export __ctype32_b.
+	* include/wctype.h: Declare __iswspace.
+	* stdio-common/vfscanf.c (__vfscanf): Use __iswspace instead of
+	iswspace.
+	* wctype/Makefile (routines): Add wcextra_l.
+	* wctype/wcextra.c (iswblank): Implement function here and don't use
+	__iswctype.
+	(__iswblank_l):  Move definition to...
+	* wctype/wcextra_l.c: ...here.  New file.
+	* wctype/wcfuncs.c: Really implement functions and don't call
+	__iswctype or __towctrans.
+	* wctype/wctype.h: Change isw* and tow* macros.  Don't call
+	__iswctype or __towctrans.  Instead optimize constant argument case.
+
+	* iconv/gconv.h: Fix typos.
+
+	* iconv/skeleton.c: Fix typos.  Optimize init function a bit.
+	Correctly emit escape sequence to return to initial state in
+	conversion function.
+
+	* iconvdata/iso-2022-jp.c (gconv_init): Correctly initialize
+	max_needed_to element.
+
+	* manual/mbyte.texi: Removed.  This is now described in charset.texi.
+	* manual/charset.texi: New file.
+	* manual/Makefile (chapters): Replace mbyte by charset.
+	* manual/ctype.texi: Document wide character functions.
+	* manual/intro.texi: Fix reference to mbyte chapter.
+	* manual/lang.texi: Likewise.
+	* manual/locale.texi: Likewise.
+	* manual/stdio.texi: Likewise.
+	* manual/string.texi: Fix @node line for new charset chapter.
+	* manual/libc.texinfo (UPDATED): Updated.  Also update copyright years.
+	* manual/memory.texi (savestring): Optimize code to give a good
+	example.
+
+	* manual/filesys.texi: Fix wording.  Patches by Jim Meyering.
+
+	* nscd/nscd_getgr_r.c: Include stdint.h to get uintptr_t definition.
+	* nscd/nscd_getpw_r.c: Likewise.
+	* nscd/nscd_gethst_r.c: Likewise.
+
+	* stdlib/strtold_l.c: Always include xlocale.h.
+
+1999-01-11  Geoffrey Keating  <geoffk@ozemail.com.au>
+
+	* stdlib/fpioconst.h (LDBL_MAX_10_EXP_LOG): Define to be same as
+	DBL_MAX_10_EXP_LOG if there is no long double.
+	(_fpioconst_pow10): Always use size as LDBL_MAX_10_EXP_LOG to match
+	printf_fp.c.
+
+1999-01-10  Andreas Jaeger  <aj@arthur.rhein-neckar.de>
+
+	* timezone/Makefile ($(testdata)/GB): Changed to ...
+	($(testdata)/Europe/London): ... for tst-timezone test.
+	($(objpfx)tst-timezone.out): Change GB to Europe/London.
+
+	* timezone/tst-timezone.c (main): Enable DST switching test,
+	change GB to Europe/London.
+
+1999-01-10  Philip Blundell  <philb@gnu.org>
+
+	* socket/Makefile (headers): Remove bits/sockunion.h.
+
+1999-01-09  Philip Blundell  <philb@gnu.org>
+
+	* socket/sys/socket.h: Don't include <bits/sockunion.h>.
+	* sysdeps/generic/bits/sockunion.h: Deleted.
+	* sysdeps/unix/sysv/linux/bits/sockunion.h: Likewise.
+
+1999-01-08  H.J. Lu  <hjl@gnu.org>
+
+	* io/fts.c (fts_close): Don't access memory after having it freed.
+
 1998-01-08  Andreas Schwab  <schwab@issan.cs.uni-dortmund.de>
 
 	* manual/Makefile (stamp-summary): Remove space after -t option
diff --git a/ctype/Versions b/ctype/Versions
index 56647bd784..6110f848c8 100644
--- a/ctype/Versions
+++ b/ctype/Versions
@@ -1,7 +1,8 @@
 libc {
   GLIBC_2.0 {
     # global variables
-    __ctype_b; __ctype_tolower; __ctype_toupper; _tolower; _toupper;
+    __ctype_b; __ctype32_b; __ctype_tolower; __ctype_toupper;
+    _tolower; _toupper;
 
     # i*
     isalnum; isalpha; isascii; isblank; iscntrl; isdigit; isgraph; islower;
diff --git a/iconv/gconv.h b/iconv/gconv.h
index 3f787c5e1c..66c34aa928 100644
--- a/iconv/gconv.h
+++ b/iconv/gconv.h
@@ -1,4 +1,4 @@
-/* Copyright (C) 1997, 1998 Free Software Foundation, Inc.
+/* Copyright (C) 1997, 1998, 1999 Free Software Foundation, Inc.
    This file is part of the GNU C Library.
 
    The GNU C Library is free software; you can redistribute it and/or
@@ -69,7 +69,7 @@ typedef void (*gconv_end_fct) __PMT ((struct gconv_step *));
 struct gconv_step
 {
   struct gconv_loaded_object *shlib_handle;
-  const char *modname;
+  __const char *modname;
 
   int counter;
 
@@ -104,7 +104,7 @@ struct gconv_step_data
   int is_last;
 
   /* Counter for number of invocations of the module function for this
-     desriptor.  */
+     descriptor.  */
   int invocation_counter;
 
   /* Flag whether this is an internal use of the module (in the mb*towc*
diff --git a/iconv/skeleton.c b/iconv/skeleton.c
index 4ed16d6e68..c124eb1e07 100644
--- a/iconv/skeleton.c
+++ b/iconv/skeleton.c
@@ -1,5 +1,5 @@
 /* Skeleton for a conversion module.
-   Copyright (C) 1998 Free Software Foundation, Inc.
+   Copyright (C) 1998, 1999 Free Software Foundation, Inc.
    This file is part of the GNU C Library.
    Contributed by Ulrich Drepper <drepper@cygnus.com>, 1998.
 
@@ -119,7 +119,7 @@ static int to_object;
    character set we we can define RESET_INPUT_BUFFER is necessary.  */
 #if !defined RESET_INPUT_BUFFER && !defined SAVE_RESET_STATE
 # if MIN_NEEDED_FROM == MAX_NEEDED_FROM && MIN_NEEDED_TO == MAX_NEEDED_TO
-/* We have to used these `if's here since the compiler cannot know that
+/* We have to use these `if's here since the compiler cannot know that
    (outbuf - outerr) is always divisible by MIN_NEEDED_TO.  */
 #  define RESET_INPUT_BUFFER \
   if (MIN_NEEDED_FROM % MIN_NEEDED_TO == 0)				      \
@@ -144,26 +144,25 @@ gconv_init (struct gconv_step *step)
 {
   /* Determine which direction.  */
   if (__strcasecmp (step->from_name, CHARSET_NAME) == 0)
-    step->data = &from_object;
-  else if (__strcasecmp (step->to_name, CHARSET_NAME) == 0)
-    step->data = &to_object;
-  else
-    return GCONV_NOCONV;
-
-  if (step->data == &from_object)
     {
+      step->data = &from_object;
+
       step->min_needed_from = MIN_NEEDED_FROM;
       step->max_needed_from = MAX_NEEDED_FROM;
       step->min_needed_to = MIN_NEEDED_TO;
       step->max_needed_to = MAX_NEEDED_TO;
     }
-  else
+  else if (__strcasecmp (step->to_name, CHARSET_NAME) == 0)
     {
+      step->data = &to_object;
+
       step->min_needed_from = MIN_NEEDED_TO;
       step->max_needed_from = MAX_NEEDED_TO;
       step->min_needed_to = MIN_NEEDED_FROM;
       step->max_needed_to = MAX_NEEDED_FROM;
     }
+  else
+    return GCONV_NOCONV;
 
 #ifdef RESET_STATE
   step->stateful = 1;
@@ -210,22 +209,17 @@ FUNCTION_NAME (struct gconv_step *step, struct gconv_step_data *data,
      dropped.  */
   if (do_flush)
     {
-      /* Call the steps down the chain if there are any.  */
-      if (data->is_last)
-	status = GCONV_OK;
-      else
-	{
-#ifdef EMIT_SHIFT_TO_INIT
-	  status = GCONV_OK;
+      status = GCONV_OK;
 
-	  EMIT_SHIFT_TO_INIT;
-
-	  if (status == GCONV_OK)
+#ifdef EMIT_SHIFT_TO_INIT
+      /* Emit the escape sequence to reset the state.  */
+      EMIT_SHIFT_TO_INIT;
 #endif
-	    /* Give the modules below the same chance.  */
-	    status = DL_CALL_FCT (fct, (next_step, next_data, NULL, NULL,
-					written, 1));
-	}
+      /* Call the steps down the chain if there are any but only if we
+         successfully emitted the escape sequence.  */
+      if (status == GCONV_OK && ! data->is_last)
+	status = DL_CALL_FCT (fct, (next_step, next_data, NULL, NULL,
+				    written, 1));
     }
   else
     {
@@ -271,7 +265,7 @@ FUNCTION_NAME (struct gconv_step *step, struct gconv_step_data *data,
 			      data->statep, step->data, &converted
 			      EXTRA_LOOP_ARGS);
 
-	  /* If this is the last step leave the loop, there is nothgin
+	  /* If this is the last step leave the loop, there is nothing
              we can do.  */
 	  if (data->is_last)
 	    {
diff --git a/iconvdata/iso-2022-jp.c b/iconvdata/iso-2022-jp.c
index 36465ccd45..a7ec09b32d 100644
--- a/iconvdata/iso-2022-jp.c
+++ b/iconvdata/iso-2022-jp.c
@@ -1,5 +1,5 @@
 /* Conversion module for ISO-2022-JP.
-   Copyright (C) 1998 Free Software Foundation, Inc.
+   Copyright (C) 1998, 1999 Free Software Foundation, Inc.
    This file is part of the GNU C Library.
    Contributed by Ulrich Drepper <drepper@cygnus.com>, 1998.
 
@@ -149,14 +149,14 @@ gconv_init (struct gconv_step *step)
 	      step->min_needed_from = MIN_NEEDED_FROM;
 	      step->max_needed_from = MAX_NEEDED_FROM;
 	      step->min_needed_to = MIN_NEEDED_TO;
-	      step->max_needed_to = MIN_NEEDED_TO;
+	      step->max_needed_to = MAX_NEEDED_TO;
 	    }
 	  else
 	    {
 	      step->min_needed_from = MIN_NEEDED_TO;
 	      step->max_needed_from = MAX_NEEDED_TO;
 	      step->min_needed_to = MIN_NEEDED_FROM;
-	      step->max_needed_to = MIN_NEEDED_FROM + 2;
+	      step->max_needed_to = MAX_NEEDED_FROM + 2;
 	    }
 
 	  /* Yes, this is a stateful encoding.  */
diff --git a/include/wctype.h b/include/wctype.h
index c76f50c866..f93ec64abc 100644
--- a/include/wctype.h
+++ b/include/wctype.h
@@ -1 +1,7 @@
+#ifndef _WCTYPE_H
+
 #include <wctype/wctype.h>
+
+extern int __iswspace __P ((wint_t __wc));
+
+#endif
diff --git a/io/fts.c b/io/fts.c
index 4ce6527441..cf52d9e299 100644
--- a/io/fts.c
+++ b/io/fts.c
@@ -231,6 +231,7 @@ fts_close(sp)
 {
 	register FTSENT *freep, *p;
 	int saved_errno;
+	int retval = 0;
 
 	/*
 	 * This still works if we haven't read anything -- the dummy structure
@@ -259,15 +260,16 @@ fts_close(sp)
 		(void)__close(sp->fts_rfd);
 	}
 
-	/* Free up the stream pointer. */
-	free(sp);
-
 	/* Set errno and return. */
 	if (!ISSET(FTS_NOCHDIR) && saved_errno) {
 		__set_errno (saved_errno);
-		return (-1);
+		retval = -1;
 	}
-	return (0);
+
+	/* Free up the stream pointer. */
+	free (sp);
+
+	return retval;
 }
 
 /*
diff --git a/manual/Makefile b/manual/Makefile
index e0dad4792c..8eb4d5b69e 100644
--- a/manual/Makefile
+++ b/manual/Makefile
@@ -49,7 +49,7 @@ endif
 mkinstalldirs = $(..)scripts/mkinstalldirs
 
 chapters = $(addsuffix .texi, \
-		       intro errno memory ctype string mbyte locale	\
+		       intro errno memory ctype string charset locale	\
 		       message search pattern io stdio llio filesys	\
 		       pipe socket terminal math arith time setjmp	\
 		       signal startup process job nss users sysinfo conf)
@@ -74,7 +74,7 @@ libc.dvi: texinfo.tex
 # Generate the summary from the Texinfo source files for each chapter.
 summary.texi: stamp-summary ;
 stamp-summary: summary.awk $(filter-out summary.texi, $(texis))
-	$(AWK) -f $^ | sort -t'^L' -df +0 -1 | tr '\014' '\012' > summary-tmp
+	$(AWK) -f $^ | sort -t'' -df +0 -1 | tr '\014' '\012' > summary-tmp
 	$(move-if-change) summary-tmp summary.texi
 	touch $@
 
diff --git a/manual/chapters.texi b/manual/chapters.texi
index a5a8a57903..bf7c4c01e0 100644
--- a/manual/chapters.texi
+++ b/manual/chapters.texi
@@ -3,7 +3,7 @@
 @include memory.texi
 @include ctype.texi
 @include string.texi
-@include mbyte.texi
+@include charset.texi
 @include locale.texi
 @include message.texi
 @include search.texi
@@ -27,6 +27,7 @@
 @include users.texi
 @include sysinfo.texi
 @include conf.texi
+@include ../crypt/crypt.texi
 @include ../linuxthreads/linuxthreads.texi
 @include lang.texi
 @include header.texi
diff --git a/manual/charset.texi b/manual/charset.texi
new file mode 100644
index 0000000000..6179128e3c
--- /dev/null
+++ b/manual/charset.texi
@@ -0,0 +1,2846 @@
+@node Character Set Handling, Locales, String and Array Utilities, Top
+@c %MENU% Support for extended character sets
+@chapter Character Set Handling
+
+@ifnottex
+@macro cal{text}
+\text\
+@end macro
+@end ifnottex
+
+Character sets used in the early days of computing had only six, seven,
+or eight bits for each character: never more bits than fit into one
+byte, which nowadays is almost exclusively @w{8 bits} wide.  This
+leads to problems as soon as the characters needed at one time can no
+longer all be represented by the at most 256 available values.  This
+chapter describes the functionality added to the C library to overcome
+this problem.
+
+@menu
+* Extended Char Intro::              Introduction to Extended Characters.
+* Charset Function Overview::        Overview of Character Handling
+                                      Functions.
+* Restartable multibyte conversion:: Restartable multibyte conversion
+                                      Functions.
+* Non-reentrant Conversion::         Non-reentrant Conversion Function.
+* Generic Charset Conversion::       Generic Charset Conversion.
+@end menu
+
+
+@node Extended Char Intro
+@section Introduction to Extended Characters
+
+People came up with a variety of solutions to overcome the limitations
+of character sets with a 1:1 relation between bytes and characters.
+The remainder of this section gives a few examples to help explain the
+design decisions made while developing the functionality of the @w{C
+library} that supports them.
+
+@cindex internal representation
+A distinction we have to make right away is between internal and
+external representation.  @dfn{Internal representation} means the
+representation used by a program while keeping the text in memory.
+External representations are used when text is stored or transmitted
+through whatever communication channel.
+
+Traditionally there was no difference between the two representations.
+It was equally comfortable and useful to use the same one-byte
+representation internally and externally.  This changed with the
+advent of more and larger character sets.
+
+One of the problems to overcome with the internal representation is
+handling text which was externally encoded using different character
+sets.  Consider a program which reads two texts and compares them using
+some metric.  The comparison can only be done usefully if both texts
+are kept internally in a common format.
+
+@cindex wide character
+For such a common format (i.e., a common character set) eight bits are
+certainly not enough.  So the smallest entity has to grow: @dfn{wide
+characters} are used.  Instead of one byte, two or four bytes are used
+for each character (three bytes are awkward to address in memory, and
+more than four bytes do not seem to be necessary).
+
+@cindex Unicode
+@cindex ISO 10646
+As shown in some other part of this manual
+@c !!! Ahem, wide char string functions are not yet covered -- drepper
+there exists a completely new family of functions which can handle texts
+of this kind in memory.  The most commonly used character sets for such
+internal wide character representations are Unicode and @w{ISO 10646}.
+The former is a subset of the latter and is used when wide characters
+are chosen to be 2 bytes (@math{= 16} bits) wide.  The standard names of the
+@cindex UCS2
+@cindex UCS4
+encodings used in these cases are UCS2 (@math{= 16} bits) and UCS4
+(@math{= 32} bits).
+
+The @code{char} type is clearly not suitable for representing wide
+characters.  For this reason the @w{ISO C} standard introduces a new
+type which is designed to hold one character of a wide character
+string.  To maintain the similarity with the @code{char} functions there
+is also a type corresponding to @code{int} for those functions which
+take a single wide character.
+
+@comment stddef.h
+@comment ISO
+@deftp {Data type} wchar_t
+This data type is used as the base type for wide character strings.
+I.e., arrays of objects of this type are the equivalent of @code{char[]}
+for multibyte character strings.  The type is defined in @file{stddef.h}.
+
+The @w{ISO C89} standard, where this type was introduced, does not say
+anything specific about the representation.  It only requires that this
+type is capable of storing all elements of the basic character set.
+Therefore it would be legitimate to define @code{wchar_t} as
+@code{char}.  This might make sense for embedded systems.
+
+But on GNU systems this type is always 32 bits wide.  It is therefore
+capable of representing all UCS4 values and so covers all of @w{ISO
+10646}.  Some Unix systems define @code{wchar_t} as a 16-bit type and
+thereby follow Unicode very strictly.  This is perfectly fine with the
+standard, but it also means that to represent all characters from
+Unicode and @w{ISO 10646} one has to use surrogate characters, which is
+in fact a multi-wide-character encoding.  This contradicts the purpose
+of the @code{wchar_t} type.
+@end deftp
+
+@comment wchar.h
+@comment ISO
+@deftp {Data type} wint_t
+@code{wint_t} is a data type used for parameters and variables which
+contain a single wide character.  As the name suggests, it is the
+equivalent of @code{int} when using normal @code{char} strings.  The
+types @code{wchar_t} and @code{wint_t} often have the same
+representation if their size is 32 bits, but if @code{wchar_t} is
+defined as @code{char} the type @code{wint_t} must be defined as
+@code{int} due to parameter promotion.
+
+@pindex wchar.h
+This type is defined in @file{wchar.h} and was introduced in
+@w{Amendment 1} to @w{ISO C89}.
+@end deftp
+
+As for the @code{char} data type, there also exist macros specifying
+the minimum and maximum value representable in an object of type
+@code{wchar_t}.
+
+@comment wchar.h
+@comment ISO
+@deftypevr Macro wint_t WCHAR_MIN
+The macro @code{WCHAR_MIN} evaluates to the minimum value representable
+by an object of type @code{wchar_t}.
+
+This macro was introduced in @w{Amendment 1} to @w{ISO C89}.
+@end deftypevr
+
+@comment wchar.h
+@comment ISO
+@deftypevr Macro wint_t WCHAR_MAX
+The macro @code{WCHAR_MAX} evaluates to the maximum value representable
+by an object of type @code{wchar_t}.
+
+This macro was introduced in @w{Amendment 1} to @w{ISO C89}.
+@end deftypevr
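+
+As an illustration, here is a minimal sketch, assuming a hosted C
+compiler, of how a program can check whether @code{wchar_t} is wide
+enough to hold every 31-bit UCS4 value.  The function
+@code{wchar_covers_ucs4} is a hypothetical name used only for this
+example.
+
+@smallexample
+#include <limits.h>
+#include <wchar.h>
+
+/* Hypothetical helper: return 1 if wchar_t has at least 32 bits and
+   WCHAR_MAX reaches 0x7fffffff, the largest 31-bit UCS4 value.  */
+int
+wchar_covers_ucs4 (void)
+@{
+  return (sizeof (wchar_t) * CHAR_BIT >= 32
+          && (unsigned long) WCHAR_MAX >= 0x7fffffffUL);
+@}
+@end smallexample
+
+@noindent
+On GNU systems this function returns 1.  On a system with a 16-bit
+@code{wchar_t} it returns 0.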
+
+Another special wide character value is the equivalent of @code{EOF}.
+
+@comment wchar.h
+@comment ISO
+@deftypevr Macro wint_t WEOF
+The macro @code{WEOF} evaluates to a constant expression of type
+@code{wint_t} whose value is different from any member of the extended
+character set.
+
+@code{WEOF} need not be the same value as @code{EOF} and unlike
+@code{EOF} it also need @emph{not} be negative.  I.e., sloppy code like
+
+@smallexample
+@{
+  int c;
+  ...
+  while ((c = getc (fp)) >= 0)
+    ...
+@}
+@end smallexample
+
+@noindent
+has to be rewritten to explicitly use @code{WEOF} when wide characters
+are used.
+
+@smallexample
+@{
+  wint_t c;
+  ...
+  while ((c = getwc (fp)) != WEOF)
+    ...
+@}
+@end smallexample
+
+@pindex wchar.h
+This macro was introduced in @w{Amendment 1} to @w{ISO C89} and is
+defined in @file{wchar.h}.
+@end deftypevr
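+
+The corrected loop can be written as a complete function.  This is only
+an illustration: the name @code{count_wide_chars} is hypothetical.
+Because @code{getwc} returns @code{WEOF} on a read error as well as at
+end of file, a real program should check @code{ferror} afterwards.
+
+@smallexample
+#include <stdio.h>
+#include <wchar.h>
+
+/* Hypothetical helper: count the wide characters in FP up to WEOF.  */
+size_t
+count_wide_chars (FILE *fp)
+@{
+  size_t n = 0;
+  wint_t c;
+
+  /* Compare with WEOF explicitly; WEOF need not be negative.  */
+  while ((c = getwc (fp)) != WEOF)
+    ++n;
+
+  return n;
+@}
+@end smallexample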
+
+
+These internal representations present problems when it comes to
+storing and transmitting text.  Since a single wide character consists
+of more than one byte, it is affected by byte ordering: machines with
+different endianness would see different values when accessing the same
+data.  This also applies to communication protocols, which are all
+byte-based, so the sender has to decide how to split a wide character
+into bytes.  Last but not least, wide characters often require more
+storage space than a customized byte-oriented character set.
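+
+One way to cope with byte order is to fix it for the external
+representation.  The sketch below assumes the C99 type @code{uint32_t}
+from @file{stdint.h}.  It stores a UCS4 value most significant byte
+first, regardless of the host's endianness.  Big-endian order is only
+one possible convention, and the helper names are hypothetical.
+
+@smallexample
+#include <stdint.h>
+
+/* Hypothetical helper: store WC in BUF, most significant byte first.  */
+void
+put_ucs4_be (uint32_t wc, unsigned char buf[4])
+@{
+  buf[0] = (wc >> 24) & 0xff;
+  buf[1] = (wc >> 16) & 0xff;
+  buf[2] = (wc >> 8) & 0xff;
+  buf[3] = wc & 0xff;
+@}
+
+/* Hypothetical inverse: read a big-endian UCS4 value from BUF.  */
+uint32_t
+get_ucs4_be (const unsigned char buf[4])
+@{
+  return ((uint32_t) buf[0] << 24) | ((uint32_t) buf[1] << 16)
+         | ((uint32_t) buf[2] << 8) | (uint32_t) buf[3];
+@}
+@end smallexample
+
+@noindent
+Reading the bytes back with @code{get_ucs4_be} yields the same value on
+little-endian and big-endian machines alike.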
+
+@cindex multibyte character
+This is why an external encoding which differs from the internal
+encoding is usually used if the latter is UCS2 or UCS4.  The external
+encoding is byte-based and can be chosen appropriately for the
+environment and for the texts to be handled.  There is a variety of
+character sets which can be used, too many to be covered completely
+here, so we restrict ourselves to a description of the major groups.
+All of the ASCII-based character sets fulfill one requirement: they are
+``filesystem safe''.  This means that the character @code{'/'} is used
+in the encoding @emph{only} to represent itself.  Things are a bit
+different for character sets like EBCDIC, but if the operating system
+does not understand EBCDIC directly the parameters to system calls have
+to be converted first anyhow.
+
+@itemize @bullet
+@item
+The simplest character sets are one-byte character sets.  There can be
+at most 256 characters (for @w{8 bit} character sets), which is not
+sufficient to cover all languages but might be sufficient to handle a
+specific text.  Another reason to choose such a character set is
+constraints imposed by interaction with other programs.
+
+@cindex ISO 2022
+@item
+The @w{ISO 2022} standard defines a mechanism for extended character
+sets where one character @emph{can} be represented by more than one
+byte.  This is achieved by associating a state with the text.  The text
+can contain embedded characters which change the state.  Each byte in
+the text might have a different interpretation in each state.  The
+state might even influence whether a given byte stands for a character
+on its own or whether it has to be combined with some more bytes.
+
+@cindex EUC
+@cindex SJIS
+In most uses of @w{ISO 2022} the defined character sets do not allow
+state changes which cover more than the next character.  This has the
+big advantage that whenever one can identify the beginning of the byte
+sequence of a character one can interpret a text correctly.  Examples of
+character sets using this policy are the various EUC character sets
+(used by Sun's operating systems, EUC-JP, EUC-KR, EUC-TW, and EUC-CN)
+or SJIS (Shift JIS, a Japanese encoding).
+
+But there are also character sets using a state which is valid for more
+than one character and has to be changed by another byte sequence.
+Examples of this are ISO-2022-JP, ISO-2022-KR, and ISO-2022-CN.
+
+@item
+@cindex ISO 6937
+Early attempts to adapt 8-bit character sets to other languages using
+the Roman alphabet led to character sets like @w{ISO 6937}.  Here bytes
+representing characters like the acute accent do not produce output on
+their own; one has to combine them with other characters.  E.g., the
+byte sequence @code{0xc2 0x61} (non-spacing acute accent, followed by
+lower-case `a') yields the ``small a with acute'' character.  To get the
+acute accent character on its own one has to write @code{0xc2 0x20}
+(the non-spacing acute followed by a space).
+
+This type of character set is quite frequently used in embedded
+systems such as video text.
+
+@item
+@cindex UTF-8
+Instead of converting the Unicode or @w{ISO 10646} text used internally
+into some other character set, it is often sufficient to simply use an
+encoding of it other than UCS2/UCS4.  The Unicode and @w{ISO 10646}
+standards even specify such an encoding: UTF-8.  This encoding is able
+to represent all 31 bits of @w{ISO 10646} in a byte string of length
+one to six.
+
+@cindex UTF-7
+There were a few other attempts to encode @w{ISO 10646} such as UTF-7
+but UTF-8 is today the only encoding which should be used.  In fact,
+UTF-8 will hopefully soon be the only external which has to be
+supported.  It proofs to be universally usable and the only disadvantage
+is that it favor Latin languages very much by making the byte string
+representation of other scripts (Cyrillic, Greek, Asian scripts) longer
+than necessary if using a specific character set for these scripts.  But
+with methods like the Unicode compression scheme one can overcome these
+problems and the ever growing memory and storage capacities do the rest.
+@end itemize
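+
+A short encoder makes the UTF-8 scheme concrete.  The following is
+only a sketch.  It assumes the C99 type @code{uint32_t} and handles
+31-bit values using sequences of one to six bytes.  The name
+@code{ucs4_to_utf8} is hypothetical and is not a C library interface.
+Note that ASCII values, including @code{'/'}, are stored unchanged as
+single bytes.
+
+@smallexample
+#include <stddef.h>
+#include <stdint.h>
+
+/* Hypothetical helper: encode the 31-bit value WC as UTF-8 into BUF,
+   which must have room for 6 bytes.  Returns the number of bytes
+   written, or 0 if WC does not fit into 31 bits.  */
+size_t
+ucs4_to_utf8 (uint32_t wc, unsigned char *buf)
+@{
+  size_t len, i;
+  unsigned char lead;
+
+  if (wc < 0x80)
+    @{
+      buf[0] = (unsigned char) wc;   /* ASCII stays a single byte.  */
+      return 1;
+    @}
+  else if (wc < 0x800)
+    len = 2, lead = 0xc0;
+  else if (wc < 0x10000)
+    len = 3, lead = 0xe0;
+  else if (wc < 0x200000)
+    len = 4, lead = 0xf0;
+  else if (wc < 0x4000000)
+    len = 5, lead = 0xf8;
+  else if (wc < 0x80000000)
+    len = 6, lead = 0xfc;
+  else
+    return 0;
+
+  /* Each continuation byte carries six bits, lowest bits last.  */
+  for (i = len - 1; i > 0; --i)
+    @{
+      buf[i] = (unsigned char) (0x80 | (wc & 0x3f));
+      wc >>= 6;
+    @}
+
+  /* The remaining high-order bits go into the lead byte.  */
+  buf[0] = (unsigned char) (lead | wc);
+  return len;
+@}
+@end smallexample
+
+@noindent
+For example, the euro sign U+20AC is encoded as the three bytes
+@code{0xe2 0x82 0xac}.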
+
+The remaining question is how to select the character set or encoding
+to use.  The answer is mostly that you cannot decide it yourself: it is
+decided by the developers of the system or by the majority of the
+users.  Since the goal is interoperability, one has to use whatever the
+people one works with use.  If there are no constraints, the selection
+is based on the requirements of the expected circle of users.  E.g., if
+a project is expected to be used only in, say, Russia, it is fine to use
+KOI8-R or a similar character set.  But if at the same time people
+from, say, Greece are participating, one should use a character set
+which allows all people to collaborate.
+
+A general piece of advice here is: go with the most general character
+set, namely @w{ISO 10646}.  Use UTF-8 as the external encoding, and
+problems with users not being able to use their own language adequately
+are a thing of the past.
+
+One final comment about the choice of the wide character representation
+is necessary at this point.  We have said above that the natural choice
+is using Unicode or @w{ISO 10646}.  This is not required by any
+standard, though: the @w{ISO C} standard does not specify anything about
+the representation of the @code{wchar_t} type, and there might be
+systems whose developers decided differently.  Therefore one should
+avoid making assumptions about the wide character representation as
+much as possible, although GNU systems will always work as described
+above.  If the programmer uses only the functions provided by the C
+library to handle wide character strings there should not be any
+compatibility problems with other systems.
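+
+When a program does rely on @code{wchar_t} values being @w{ISO 10646}
+code points, it can at least make that dependency explicit.  The sketch
+below assumes an @w{ISO C99} implementation, which defines the macro
+@code{__STDC_ISO_10646__} when @code{wchar_t} values are @w{ISO 10646}
+code points.  The function name is hypothetical.
+
+@smallexample
+#include <wchar.h>
+
+/* Hypothetical helper: store the code point CP in *OUT only if the
+   implementation promises that wchar_t values are ISO 10646 code
+   points.  Returns 0 on success and -1 otherwise.  */
+int
+code_point_to_wchar (unsigned long cp, wchar_t *out)
+@{
+#ifdef __STDC_ISO_10646__
+  *out = (wchar_t) cp;
+  return 0;
+#else
+  (void) cp;
+  (void) out;
+  return -1;
+#endif
+@}
+@end smallexample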
+
+@node Charset Function Overview
+@section Overview of Character Handling Functions
+
+A Unix @w{C library} contains three different sets of functions in two
+families for handling character set conversion.  One function family
+is specified in the @w{ISO C} standard and is therefore portable even
+beyond the Unix world.
+
+The most commonly known set of functions, coming from the @w{ISO C89}
+standard, is unfortunately the least useful one.  In fact, these
+functions should be avoided whenever possible, especially when
+developing libraries (as opposed to applications).
+
+The second family of functions was introduced in the early Unix
+standards (XPG2) and is still part of the latest and greatest Unix
+standard, @w{Unix 98}.  It is also the most powerful and useful set of
+functions.  But we will start with the functions defined in
+@w{Amendment 1} to @w{ISO C89}.
+
+@node Restartable multibyte conversion
+@section Restartable Multibyte Conversion Functions
+
+The @w{ISO C} standard defines functions to convert strings from a
+multibyte representation to wide character strings.  There are a number
+of peculiarities:
+
+@itemize @bullet
+@item
+The character set assumed for the multibyte encoding is not specified
+as an argument to the functions.  Instead the character set specified by
+the @code{LC_CTYPE} category of the current locale is used; see
+@ref{Locale Categories}.
+
+@item
+The functions handling more than one character at a time require NUL
+terminated strings as the argument.  I.e., converting blocks of text
+does not work unless one can add a NUL byte at an appropriate place.
+The GNU C library contains some extensions to the standard which allow
+specifying a size, but basically they also expect terminated strings.
+@end itemize
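+
+The following minimal sketch shows both points.  It converts a
+NUL-terminated multibyte string, interpreted according to the
+@code{LC_CTYPE} category of the current locale, using the restartable
+function @code{mbsrtowcs}.  A first call with a null destination only
+counts the characters.  The name @code{to_wide} is hypothetical.  The
+caller is assumed to have called @code{setlocale} beforehand if a
+locale other than @code{"C"} is wanted.
+
+@smallexample
+#include <stdlib.h>
+#include <string.h>
+#include <wchar.h>
+
+/* Hypothetical helper: return a newly allocated wide character copy of
+   the NUL-terminated multibyte string S, or NULL if S contains an
+   invalid sequence or memory is exhausted.  */
+wchar_t *
+to_wide (const char *s)
+@{
+  mbstate_t state;
+  const char *p = s;
+  size_t len;
+  wchar_t *result;
+
+  /* First pass: count the wide characters, excluding the NUL.  */
+  memset (&state, '\0', sizeof state);
+  len = mbsrtowcs (NULL, &p, 0, &state);
+  if (len == (size_t) -1)
+    return NULL;
+
+  result = malloc ((len + 1) * sizeof (wchar_t));
+  if (result == NULL)
+    return NULL;
+
+  /* Second pass: convert for real, including the terminating NUL.  */
+  p = s;
+  memset (&state, '\0', sizeof state);
+  mbsrtowcs (result, &p, len + 1, &state);
+  return result;
+@}
+@end smallexample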
+
+Despite these limitations the @w{ISO C} functions can very well be used
+in many contexts.  In graphical user interfaces, for instance, it is not
+uncommon to have functions which require text to be displayed in a wide
+character string if it is not simple ASCII.  The text itself might come
+from a file with translations, and of course the user should decide
+about the current locale, which determines the translation and
+therefore also the external encoding used.  In such a situation (and
+many others) the functions described here are perfect.  If more freedom
+is needed while performing the conversion, take a look at the
+@code{iconv} functions (@pxref{Generic Charset Conversion}).
+
+@menu
+* Selecting the Conversion::     Selecting the conversion and its properties.
+* Keeping the state::            Representing the state of the conversion.
+* Converting a Character::       Converting Single Characters.
+* Converting Strings::           Converting Multibyte and Wide Character
+                                  Strings.
+* Multibyte Conversion Example:: A Complete Multibyte Conversion Example.
+@end menu
+
+@node Selecting the Conversion
+@subsection Selecting the conversion and its properties
+
+We already said above that the currently selected locale for the
+@code{LC_CTYPE} category determines the conversion which is performed
+by the functions we are about to describe.  Each locale uses its own
+character set (given as an argument to @code{localedef}) and this is the
+one assumed as the external multibyte encoding.  The wide character set
+is always UCS4.  This already shows the main limitation of these
+conversion functions: the external encoding cannot be chosen
+independently of the locale.
+
+A characteristic of each multibyte character set is the maximum number
+of bytes which can be necessary to represent one character.  This
+information is quite important when writing code which uses the
+conversion functions, as several of the examples below show.  The
+@w{ISO C} standard defines two macros which provide this information.
+
+
+@comment limits.h
+@comment ISO
+@deftypevr Macro int MB_LEN_MAX
+This macro specifies the maximum number of bytes in the multibyte
+sequence for a single character in any of the supported locales.  It is
+a compile-time constant and it is defined in @file{limits.h}.
+@pindex limits.h
+@end deftypevr
+
+@comment stdlib.h
+@comment ISO
+@deftypevr Macro int MB_CUR_MAX
+@code{MB_CUR_MAX} expands into a positive integer expression that is the
+maximum number of bytes in a multibyte character in the current locale.
+The value is never greater than @code{MB_LEN_MAX}.  Unlike
+@code{MB_LEN_MAX} this macro need not be a compile-time constant and in
+fact, in the GNU C library it is not.
+
+@pindex stdlib.h
+@code{MB_CUR_MAX} is defined in @file{stdlib.h}.
+@end deftypevr
+
+Two different macros are necessary since strictly @w{ISO C89} compilers
+do not allow variable length array definitions but still it is desirable
+to avoid dynamic allocation.  This incomplete piece of code shows the
+problem:
+
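+@smallexample
+@{
+  char buf[MB_LEN_MAX];
+  size_t len = 0;
+
+  while (! feof (fp))
+    @{
+      /* @r{Top up} buf @r{to} MB_CUR_MAX @r{bytes.}  */
+      len += fread (&buf[len], 1, MB_CUR_MAX - len, fp);
+      /* @r{... process} buf@r{, setting} used */
+      len -= used;
+      /* @r{Keep the unprocessed bytes at the front.}  */
+      memmove (buf, &buf[used], len);
+    @}
+@}
+@end smallexample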
+
+The code in the inner loop is expected to always have enough bytes in
+the array @var{buf} to convert one multibyte character.  The array
+@var{buf} has to be sized statically since many compilers do not allow a
+variable size.  The @code{fread} call makes sure that @code{MB_CUR_MAX}
+bytes are always available in @var{buf} (until the end of the file is
+reached).  Note that it is no problem if @code{MB_CUR_MAX} is not a
+compile-time constant.
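+
+The fragment above leaves out the conversion itself.  The following
+sketch fills it in with @code{mbrtowc} (described later in this chapter)
+to count the characters in a stream.  The name @code{count_mbchars} is
+hypothetical.  The sketch assumes, as @w{ISO C} specifies, that when
+@code{mbrtowc} returns @code{(size_t) -2} the bytes passed to it have
+been absorbed into the conversion state and can be dropped from the
+buffer.  It also assumes that a NUL character is a single byte, and it
+treats input which does not end in the initial state as an error.
+
+@smallexample
+#include <limits.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <wchar.h>
+
+/* Count the multibyte characters read from FP.  Returns
+   (size_t) -1 for invalid or truncated input.  */
+size_t
+count_mbchars (FILE *fp)
+@{
+  char buf[MB_LEN_MAX];   /* Statically sized.  */
+  size_t len = 0;         /* Bytes currently in buf.  */
+  size_t count = 0;
+  mbstate_t state;
+  memset (&state, '\0', sizeof (state));
+
+  while (1)
+    @{
+      size_t used;
+      /* Top up the buffer to MB_CUR_MAX bytes.  */
+      len += fread (&buf[len], 1, MB_CUR_MAX - len, fp);
+      if (len == 0)
+        /* End of input; fail if a character is unfinished.  */
+        return mbsinit (&state) ? count : (size_t) -1;
+      used = mbrtowc (NULL, buf, len, &state);
+      if (used == (size_t) -1)
+        return (size_t) -1;
+      if (used == (size_t) -2)
+        @{
+          /* All len bytes went into STATE; read more.  */
+          len = 0;
+          continue;
+        @}
+      if (used == 0)
+        used = 1;             /* A NUL character; assume one byte.  */
+      ++count;
+      len -= used;
+      memmove (buf, &buf[used], len);
+    @}
+@}
+@end smallexample
+
+Only @code{MB_LEN_MAX} is needed at compile time.  @code{MB_CUR_MAX} is
+evaluated again each time the buffer is refilled.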
+
+
+@node Keeping the state
+@subsection Representing the state of the conversion
+
+@cindex stateful
+In the introduction of this chapter it was said that certain character
+sets use a @dfn{stateful} encoding.  I.e., the encoded values depend in
+some way on the previous bytes in the text.
+
+Since the conversion functions allow converting a text in more than one
+step we must have a way to pass this information from one call of the
+functions to another.
+
+@comment wchar.h
+@comment ISO
+@deftp {Data type} mbstate_t
+@cindex shift state
+A variable of type @code{mbstate_t} can contain all the information
+about the @dfn{shift state} needed from one call to a conversion
+function to another.
+
+@pindex wchar.h
+This type is defined in @file{wchar.h}.  It was introduced in the second
+amendment to @w{ISO C89}.
+@end deftp
+
+To use objects of this type the programmer has to define such objects
+(normally as local variables on the stack) and pass a pointer to the
+object to the conversion functions.  This way the conversion function
+can update the object if the current multibyte character set is
+stateful.
+
+There is no specific function or initializer to put the state object in
+any specific state.  The rules are that the object should always
+represent the initial state before the first use and this is achieved by
+clearing the whole variable with code such as follows:
+
+@smallexample
+@{
+  mbstate_t state;
+  memset (&state, '\0', sizeof (state));
+  /* @r{from now on @var{state} can be used.}  */
+  ...
+@}
+@end smallexample
+
+When using the conversion functions to generate output it is often
+necessary to test whether the current state corresponds to the initial
+state.  This is necessary, for example, to decide whether or not to emit
+escape sequences to set the state to the initial state at certain
+sequence points.  Communication protocols often require this.
+
+@comment wchar.h
+@comment ISO
+@deftypefun int mbsinit (const mbstate_t *@var{ps})
+This function determines whether the state object pointed to by @var{ps}
+is in the initial state or not.  If @var{ps} is a null pointer or the
+object is in the initial state the return value is nonzero.  Otherwise
+it is zero.
+
+@pindex wchar.h
+This function was introduced in the second amendment to @w{ISO C89} and
+is declared in @file{wchar.h}.
+@end deftypefun
+
+Code using this function often looks similar to this:
+
+@smallexample
+@{
+  mbstate_t state;
+  memset (&state, '\0', sizeof (state));
+  /* @r{Use @var{state}.}  */
+  ...
+  if (! mbsinit (&state))
+    @{
+      /* @r{Emit code to return to initial state.}  */
+      fputs ("@r{whatever needed}", fp);
+    @}
+  ...
+@}
+@end smallexample
+
+@node Converting a Character
+@subsection Converting Single Characters
+
+The most fundamental of the conversion functions are those dealing with
+single characters.  Please note that this does not always mean single
+bytes.  But since there is very often a subset of the multibyte
+character set which consists of single byte sequences there are
+functions to help with converting bytes.  One very important and often
+applicable scenario is where ASCII is a subset of the multibyte
+character set.  I.e., all ASCII characters stand for themselves and all
+other characters have at least a first byte which is beyond the range
+@math{0} to @math{127}.
+
+@comment wchar.h
+@comment ISO
+@deftypefun wint_t btowc (int @var{c})
+The @code{btowc} function (``byte to wide character'') converts a valid
+single byte character in the initial shift state into the wide character
+equivalent using the conversion rules from the currently selected locale
+of the @code{LC_CTYPE} category.
+
+If @code{(unsigned char) @var{c}} is not a valid single-byte multibyte
+character or if @var{c} is @code{EOF} the function returns @code{WEOF}.
+
+Please note the restriction of @var{c} being tested for validity only in
+the initial shift state.  No @code{mbstate_t} object is used from which
+the state information could be taken, and the function also does not
+use any static state.
+
+@pindex wchar.h
+This function was introduced in the second amendment of @w{ISO C89} and
+is declared in @file{wchar.h}.
+@end deftypefun
+
+Despite the limitation that the single byte value always is interpreted
+in the initial state this function is actually useful most of the time.
+Most character sets are either entirely single-byte or they are
+extensions to ASCII.  In either case it is possible to write code like
+this (not that this specific example is useful):
+
+@smallexample
+wchar_t *
+itow (unsigned long int val)
+@{
+  static wchar_t buf[30];
+  wchar_t *wcp = &buf[29];
+  *wcp = L'\0';
+  while (val != 0)
+    @{
+      *--wcp = btowc ('0' + val % 10);
+      val /= 10;
+    @}
+  if (wcp == &buf[29])
+    *--wcp = btowc ('0');
+  return wcp;
+@}
+@end smallexample
+
+The question is why it is necessary to use such a complicated
+implementation and not simply cast @code{'0' + val % 10} to a wide
+character.  The answer is that there is no guarantee that the compiler
+knows about the wide character set used at runtime.  Even if on some
+system the wide character equivalent of a single-byte character is
+obtained by simply casting the single-byte character to @code{wchar_t},
+there is no guarantee that this is the case everywhere.
+
+There also is a function for the conversion in the other direction.
+
+@comment wchar.h
+@comment ISO
+@deftypefun int wctob (wint_t @var{c})
+The @code{wctob} function (``wide character to byte'') takes as the
+parameter a valid wide character.  If the multibyte representation for
+this character in the initial state is exactly one byte long the return
+value of this function is this byte (as an @code{unsigned char} value
+converted to @code{int}).  Otherwise the return value is @code{EOF}.
+
+@pindex wchar.h
+This function was introduced in the second amendment of @w{ISO C89} and
+is declared in @file{wchar.h}.
+@end deftypefun
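+
+As a small illustration of both directions, the hypothetical helper
+@code{is_single_byte_wc} below tests whether a wide character has a
+one-byte representation in the initial shift state of the current
+locale.  It is only a sketch.  It calls @code{wctob} and then checks the
+result by mapping the byte back with @code{btowc}.
+
+@smallexample
+#include <stdio.h>
+#include <wchar.h>
+
+/* Return nonzero if WC is represented by exactly one byte in the
+   initial shift state of the current LC_CTYPE locale.  */
+int
+is_single_byte_wc (wint_t wc)
+@{
+  int b = wctob (wc);
+  /* btowc maps the byte back; both directions must agree.  */
+  return b != EOF && btowc (b) == wc;
+@}
+@end smallexample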
+
+There are more general functions to convert a single character from
+multibyte representation to wide characters and vice versa.  These
+functions pose no limit on the length of the multibyte representation
+and they also do not require it to be in the initial state.
+
+@comment wchar.h
+@comment ISO
+@deftypefun size_t mbrtowc (wchar_t *restrict @var{pwc}, const char *restrict @var{s}, size_t @var{n}, mbstate_t *restrict @var{ps})
+@cindex stateful
+The @code{mbrtowc} function (``multibyte restartable to wide
+character'') converts the next multibyte character in the string pointed
+to by @var{s} into a wide character and stores it in the wide character
+string pointed to by @var{pwc}.  The conversion is performed according
+to the locale currently selected for the @code{LC_CTYPE} category.  If
+the character set for the locale is stateful the multibyte string is
+interpreted in the state represented by the object pointed to by
+@var{ps}.  If @var{ps} is a null pointer a static, internal state
+variable used only by the @code{mbrtowc} function is used.
+
+If the next multibyte character corresponds to the NUL wide character
+the return value of the function is @math{0} and the state object is
+afterwards in the initial state.  If the next @var{n} or fewer bytes
+form a correct multibyte character the return value is the number of
+bytes starting from @var{s} which form the multibyte character.  The
+conversion state is updated according to the bytes consumed in the
+conversion.  In both cases the wide character (either the @code{L'\0'}
+or the one found in the conversion) is stored in the object pointed to
+by @var{pwc} if @var{pwc} is not null.
+
+If the first @var{n} bytes of the multibyte string possibly form a valid
+multibyte character but more than @var{n} bytes are needed to complete
+it, the return value of the function is @code{(size_t) -2} and no value
+is stored.  Please note that this can happen even if @var{n} has a value
+greater than or equal to @code{MB_CUR_MAX} since the input might contain
+redundant shift sequences.
+
+If the first @var{n} bytes of the multibyte string cannot possibly form
+a valid multibyte character, no value is stored, the global variable
+@code{errno} is set to the value @code{EILSEQ}, and the function returns
+@code{(size_t) -1}.  The conversion state is afterwards undefined.
+
+@pindex wchar.h
+This function was introduced in the second amendment to @w{ISO C89} and
+is declared in @file{wchar.h}.
+@end deftypefun
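+
+Because @code{mbrtowc} takes the number of available bytes as an
+argument, it can also be used on buffers which are not NUL terminated.
+The sketch below, with the hypothetical name @code{mbbuflen}, counts the
+characters in the first @var{n} bytes of a buffer.  It assumes that the
+whole text is in the buffer, so an incomplete character at the end is
+treated as an error, and that a NUL character is a single byte.
+
+@smallexample
+#include <string.h>
+#include <wchar.h>
+
+/* Count the characters in the N bytes at S.  Returns (size_t) -1
+   for an invalid or incomplete multibyte sequence.  */
+size_t
+mbbuflen (const char *s, size_t n)
+@{
+  mbstate_t state;
+  size_t count = 0;
+  memset (&state, '\0', sizeof (state));
+  while (n > 0)
+    @{
+      size_t nbytes = mbrtowc (NULL, s, n, &state);
+      if (nbytes >= (size_t) -2)
+        return (size_t) -1;   /* Invalid or cut off.  */
+      if (nbytes == 0)
+        nbytes = 1;           /* Embedded NUL byte.  */
+      s += nbytes;
+      n -= nbytes;
+      ++count;
+    @}
+  return count;
+@}
+@end smallexample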
+
+Using this function is straightforward.  A function which copies a
+multibyte string into a wide character string while at the same time
+converting all lowercase characters into uppercase could look like this
+(this is not the final version, just an example; it does not check
+whether @code{malloc} succeeded and leaks memory if the input is invalid):
+
+@smallexample
+wchar_t *
+mbstouwcs (const char *s)
+@{
+  size_t len = strlen (s);
+  wchar_t *result = malloc ((len + 1) * sizeof (wchar_t));
+  wchar_t *wcp = result;
+  wchar_t tmp[1];
+  mbstate_t state;
+  size_t nbytes;
+  memset (&state, '\0', sizeof (state));
+  /* @r{@var{len} + 1 bytes remain, including the terminating NUL.}  */
+  while ((nbytes = mbrtowc (tmp, s, len + 1, &state)) > 0)
+    @{
+      if (nbytes >= (size_t) -2)
+        /* @r{Invalid input string.}  */
+        return NULL;
+      *wcp++ = towupper (tmp[0]);
+      len -= nbytes;
+      s += nbytes;
+    @}
+  *wcp = L'\0';
+  return result;
+@}
+@end smallexample
+
+The use of @code{mbrtowc} should be clear.  A single wide character is
+stored in @code{@var{tmp}[0]} and the number of consumed bytes is stored
+in the variable @var{nbytes}.  If the conversion was successful the
+uppercase variant of the wide character is stored in the @var{result}
+array and the pointer to the input string and the number of available
+bytes are adjusted.
+
+The only non-obvious thing about the function might be the way memory is
+allocated for the result.  The above code uses the fact that there can
+never be more wide characters in the converted results than there are
+bytes in the multibyte input string.  This method yields a
+pessimistic guess about the size of the result, and if many wide
+character strings have to be constructed this way or the strings are
+long, the extra memory required to store the wide character strings
+might be significant.  It would of course be possible to resize the
+allocated memory block to the correct size before returning it.  A
+better solution might be to allocate just the right amount of space for
+the result right away.  Unfortunately there is no function to compute
+the length of the wide character string directly from the multibyte
+string.  But there is a function which does part of the work.
+
+@comment wchar.h
+@comment ISO
+@deftypefun size_t mbrlen (const char *restrict @var{s}, size_t @var{n}, mbstate_t *@var{ps})
+The @code{mbrlen} function (``multibyte restartable length'') computes
+the number of at most @var{n} bytes starting at @var{s} which form the
+next valid and complete multibyte character.
+
+If the next multibyte character corresponds to the NUL wide character
+the return value is @math{0}.  If the next @var{n} bytes form a valid
+multibyte character the number of bytes belonging to this multibyte
+character byte sequence is returned.
+
+If the first @var{n} bytes possibly form a valid multibyte
+character but it is incomplete the return value is @code{(size_t) -2}.
+Otherwise the multibyte character sequence is invalid and the return
+value is @code{(size_t) -1}.
+
+The multibyte sequence is interpreted in the state represented by the
+object pointed to by @var{ps}.  If @var{ps} is a null pointer a state
+object local to @code{mbrlen} is used.
+
+@pindex wchar.h
+This function was introduced in the second amendment to @w{ISO C89} and
+is declared in @file{wchar.h}.
+@end deftypefun
+
+The attentive reader will of course note that @code{mbrlen} can be
+implemented as
+
+@smallexample
+mbrtowc (NULL, s, n, ps != NULL ? ps : &internal)
+@end smallexample
+
+This is true and in fact is mentioned in the official specification.
+Now, how can this function be used to determine the length of the wide
+character string created from a multibyte character string?  It is not
+directly usable but we can define a function @code{mbslen} using it:
+
+@smallexample
+size_t
+mbslen (const char *s)
+@{
+  mbstate_t state;
+  size_t result = 0;
+  size_t nbytes;
+  memset (&state, '\0', sizeof (state));
+  while ((nbytes = mbrlen (s, MB_LEN_MAX, &state)) > 0)
+    @{
+      if (nbytes >= (size_t) -2)
+        /* @r{Something is wrong.}  */
+        return (size_t) -1;
+      s += nbytes;
+      ++result;
+    @}
+  return result;
+@}
+@end smallexample
+
+This function simply calls @code{mbrlen} for each multibyte character
+in the string and counts the number of function calls.  Please note that
+we here use @code{MB_LEN_MAX} as the size argument in the @code{mbrlen}
+call.  This is OK since a) this value is at least the length of the
+longest multibyte character sequence and b) we know that the string
+@var{s} ends with a NUL byte which cannot be part of any other multibyte
+character sequence but the one representing the NUL wide character.
+Therefore the @code{mbrlen} function will never read invalid memory.
+
+Now that this function is available (just to make this clear, this
+function is @emph{not} part of the GNU C library) we can compute the
+number of bytes required to store the wide character string converted
+from the multibyte character string @var{s} using
+
+@smallexample
+wcs_bytes = (mbslen (s) + 1) * sizeof (wchar_t);
+@end smallexample
+
+Please note that the @code{mbslen} function is quite inefficient.  An
+implementation of @code{mbstouwcs} based on @code{mbslen} would have to
+perform the conversion of the multibyte character input string twice,
+and this conversion might be quite expensive.  So it is necessary to
+think about the consequences of using the easier but imprecise method
+before doing the work twice.
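+
+For comparison, the following sketch shows what such a two-pass version
+could look like.  The name @code{mbstouwcs_exact} is hypothetical.  The
+first loop repeats the body of @code{mbslen} so that the example is
+self-contained.  The second loop performs the actual conversion into a
+buffer of exactly the right size.
+
+@smallexample
+#include <limits.h>
+#include <stdlib.h>
+#include <string.h>
+#include <wchar.h>
+#include <wctype.h>
+
+wchar_t *
+mbstouwcs_exact (const char *s)
+@{
+  mbstate_t state;
+  const char *p = s;
+  size_t count = 0;
+  size_t nbytes;
+  wchar_t *result;
+  wchar_t *wcp;
+
+  /* First pass: validate and count, just like mbslen.  */
+  memset (&state, '\0', sizeof (state));
+  while ((nbytes = mbrlen (p, MB_LEN_MAX, &state)) > 0)
+    @{
+      if (nbytes >= (size_t) -2)
+        return NULL;
+      p += nbytes;
+      ++count;
+    @}
+
+  /* Exactly COUNT characters plus the terminating NUL.  */
+  result = malloc ((count + 1) * sizeof (wchar_t));
+  if (result == NULL)
+    return NULL;
+
+  /* Second pass: convert again, starting in the initial state.  */
+  memset (&state, '\0', sizeof (state));
+  wcp = result;
+  while ((nbytes = mbrtowc (wcp, s, MB_LEN_MAX, &state)) > 0)
+    @{
+      if (nbytes >= (size_t) -2)
+        @{
+          free (result);
+          return NULL;
+        @}
+      *wcp = towupper (*wcp);
+      ++wcp;
+      s += nbytes;
+    @}
+  *wcp = L'\0';
+  return result;
+@}
+@end smallexample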
+
+@comment wchar.h
+@comment ISO
+@deftypefun size_t wcrtomb (char *restrict @var{s}, wchar_t @var{wc}, mbstate_t *restrict @var{ps})
+The @code{wcrtomb} function (``wide character restartable to
+multibyte'') converts a single wide character into a multibyte string
+corresponding to that wide character.
+
+If @var{s} is a null pointer the function resets the state stored in
+the object pointed to by @var{ps} to the initial state.  This can also
+be achieved by a call like this:
+
+@smallexample
+wcrtomb (temp_buf, L'\0', ps)
+@end smallexample
+
+@noindent
+since when @var{s} is a null pointer @code{wcrtomb} performs as if it
+writes into an internal buffer which is guaranteed to be large enough.
+
+If @var{wc} is the NUL wide character @code{wcrtomb} stores in the
+string @var{s}, if necessary, a shift sequence to get the state
+@var{ps} into the initial state, followed by a single NUL byte.
+
+Otherwise a byte sequence (possibly including shift sequences) is
+written into the string @var{s}.  This of course only happens if
+@var{wc} is a valid wide character, i.e., it has a multibyte
+representation in the character set selected by the locale of the
+@code{LC_CTYPE} category.  If @var{wc} is not a valid wide character
+nothing is stored in the string @var{s}, @code{errno} is set to
+@code{EILSEQ}, the conversion state in @var{ps} is undefined and the
+return value is @code{(size_t) -1}.
+
+If no error occurred the function returns the number of bytes stored in
+the string @var{s}.  This includes all bytes representing shift
+sequences.
+
+One word about the interface of the function: there is no parameter
+specifying the length of the array @var{s}.  Instead the function
+assumes that there are at least @code{MB_CUR_MAX} bytes available since
+this is the maximum length of any byte sequence representing a single
+character.  So the caller has to make sure that there is enough space
+available, otherwise buffer overruns can occur.
+
+@pindex wchar.h
+This function was introduced in the second amendment to @w{ISO C} and is
+declared in @file{wchar.h}.
+@end deftypefun
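+
+The equivalence with a call converting @code{L'\0'} suggests one way to
+implement the ``emit code to return to initial state'' step of the
+@code{mbsinit} example above.  The helper @code{emit_initial_state} is
+hypothetical.  It assumes that @code{MB_LEN_MAX} bytes are enough for
+the shift sequence plus the NUL byte, and it writes everything except
+that NUL byte to the stream.
+
+@smallexample
+#include <limits.h>
+#include <stdio.h>
+#include <wchar.h>
+
+/* Write to FP the bytes needed to bring *PS back to the initial
+   shift state.  Returns 0 on success, -1 on error.  */
+int
+emit_initial_state (FILE *fp, mbstate_t *ps)
+@{
+  char buf[MB_LEN_MAX];
+  size_t n;
+
+  if (mbsinit (ps))
+    return 0;                 /* Nothing to do.  */
+  /* Converting L'\0' emits any shift sequence plus one NUL byte.  */
+  n = wcrtomb (buf, L'\0', ps);
+  if (n == (size_t) -1)
+    return -1;
+  /* Write everything except the trailing NUL byte.  */
+  if (fwrite (buf, 1, n - 1, fp) != n - 1)
+    return -1;
+  return 0;
+@}
+@end smallexample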
+
+Using this function is as easy as using @code{mbrtowc}.  The following
+example appends a wide character string to a multibyte character string.
+Again, the code is not really useful; it is only here to demonstrate
+the use and some problems.
+
+@smallexample
+char *
+mbscatwc (char *s, size_t len, const wchar_t *ws)
+@{
+  mbstate_t state;
+  char *wp = strchr (s, '\0');
+  len -= wp - s;
+  memset (&state, '\0', sizeof (state));
+  do
+    @{
+      size_t nbytes;
+      if (len < MB_CUR_MAX)
+        @{
+          /* @r{We cannot guarantee that the next}
+             @r{character fits into the buffer, so}
+             @r{return an error.}  */
+          errno = E2BIG;
+          return NULL;
+        @}
+      nbytes = wcrtomb (wp, *ws, &state);
+      if (nbytes == (size_t) -1)
+        /* @r{Error in the conversion.}  */
+        return NULL;
+      len -= nbytes;
+      wp += nbytes;
+    @}
+  while (*ws++ != L'\0');
+  return s;
+@}
+@end smallexample
+
+First the function has to find the end of the string currently in the
+array @var{s}.  The @code{strchr} call does this very efficiently since a
+requirement for multibyte character representations is that the NUL byte
+is never used except to represent itself (and in this context, the end
+of the string).
+
+After initializing the state object the loop is entered where the first
+task is to make sure there is enough room in the array @var{s}.  We
+abort if there are not at least @code{MB_CUR_MAX} bytes available.  This
+is not always optimal but we have no other choice.  We might have less
+than @code{MB_CUR_MAX} bytes available but the next multibyte character
+might also be only one byte long.  At the time the @code{wcrtomb} call
+returns it is too late to decide whether the buffer was large enough or
+not.  If this solution is really unsuitable there is a very slow but
+more accurate solution.
+
+@smallexample
+  ...
+  if (len < MB_CUR_MAX)
+    @{
+      mbstate_t temp_state;
+      char temp_buf[MB_LEN_MAX];
+      memcpy (&temp_state, &state, sizeof (state));
+      if (wcrtomb (temp_buf, *ws, &temp_state) > len)
+        @{
+          /* @r{We cannot guarantee that the next}
+             @r{character fits into the buffer, so}
+             @r{return an error.}  */
+          errno = E2BIG;
+          return NULL;
+        @}
+    @}
+  ...
+@end smallexample
+
+Here we perform the conversion into a temporary buffer which is large
+enough for any character, so that afterwards we are in the position to
+make an exact decision about the buffer size.  Please note that the
+destination cannot be a null pointer here: if @var{s} is a null pointer
+@code{wcrtomb} ignores @var{wc} and only computes the bytes needed to
+return to the initial state.  The most unusual thing about this piece of
+code certainly is the duplication of the conversion state object.  But
+think about it: if a change of the state is necessary to emit the next
+multibyte character we want to have the same shift state change
+performed in the real conversion.  Therefore we have to preserve the
+initial shift state information.
+
+There are certainly many more and even better solutions to this problem.
+This example is only meant for educational purposes.
+
+@node Converting Strings
+@subsection Converting Multibyte and Wide Character Strings
+
+The functions described in the previous section only convert a single
+character at a time.  Most operations to be performed in real-world
+programs involve strings and therefore the @w{ISO C} standard also
+defines conversions on entire strings.  The defined set of functions is
+quite limited, though.  Therefore the GNU C library contains a few
+extensions which are necessary in some important situations.
+
+@comment wchar.h
+@comment ISO
+@deftypefun size_t mbsrtowcs (wchar_t *restrict @var{dst}, const char **restrict @var{src}, size_t @var{len}, mbstate_t *restrict @var{ps})
+The @code{mbsrtowcs} function (``multibyte string restartable to wide
+character string'') converts a NUL-terminated multibyte character
+string at @code{*@var{src}} into an equivalent wide character string,
+including the NUL wide character at the end.  The conversion is started
+using the state information from the object pointed to by @var{ps} or
+from an internal object of @code{mbsrtowcs} if @var{ps} is a null
+pointer.  Before returning, the state object is updated to match the
+state after the last converted character.  The state is the initial
+state if the terminating NUL byte is reached and converted.
+
+If @var{dst} is not a null pointer the result is stored in the array
+pointed to by @var{dst}, otherwise the conversion result is not
+available since it is stored in an internal buffer.
+
+If @var{len} wide characters are stored in the array @var{dst} before
+reaching the end of the input string the conversion stops and @var{len}
+is returned.  If @var{dst} is a null pointer @var{len} is never checked.
+
+Another reason for a premature return from the function call is if the
+input string contains an invalid multibyte sequence.  In this case the
+global variable @code{errno} is set to @code{EILSEQ} and the function
+returns @code{(size_t) -1}.
+
+@c XXX The ISO C9x draft seems to have a problem here.  It says that PS
+@c is not updated if DST is NULL.  This is not said straight forward and
+@c none of the other functions is described like this.  It would make sense
+@c to define the function this way but I don't think it is meant like this.
+
+In all other cases the function returns the number of wide characters
+converted during this call, not counting a terminating NUL wide
+character.  If @var{dst} is not null @code{mbsrtowcs} stores in the
+pointer pointed to by @var{src} a null pointer (if the NUL byte in the
+input string was reached) or the address of the byte following the last
+converted multibyte character.
+
+@pindex wchar.h
+This function was introduced in the second amendment to @w{ISO C} and is
+declared in @file{wchar.h}.
+@end deftypefun
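+
+A common idiom with this interface is to call @code{mbsrtowcs} twice.
+The first call, with a null @var{dst}, determines how many wide
+characters are needed, and the second fills a buffer of exactly that
+size.  The sketch below, with the hypothetical name
+@code{mbs_to_wcs_alloc}, relies on the @w{ISO C} rule that the returned
+count does not include the terminating NUL wide character.  It restarts
+the second pass from a freshly cleared state object.
+
+@smallexample
+#include <stdlib.h>
+#include <string.h>
+#include <wchar.h>
+
+/* Convert the NUL-terminated multibyte string S into a newly
+   allocated wide character string.  Returns NULL on error.  */
+wchar_t *
+mbs_to_wcs_alloc (const char *s)
+@{
+  mbstate_t state;
+  const char *src = s;
+  size_t n;
+  wchar_t *result;
+
+  /* First pass with a null DST: only count the characters.  */
+  memset (&state, '\0', sizeof (state));
+  n = mbsrtowcs (NULL, &src, 0, &state);
+  if (n == (size_t) -1)
+    return NULL;
+
+  /* One more element for the terminating NUL wide character.  */
+  result = malloc ((n + 1) * sizeof (wchar_t));
+  if (result == NULL)
+    return NULL;
+
+  /* Second pass from the start, again in the initial state.  The
+     input was already validated, so this cannot fail.  */
+  src = s;
+  memset (&state, '\0', sizeof (state));
+  mbsrtowcs (result, &src, n + 1, &state);
+  return result;
+@}
+@end smallexample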
+
+The definition of @code{mbsrtowcs} has one limitation which has to be
+understood.  The requirement that @code{*@var{src}} has to be a NUL
+terminated string causes problems if one wants to convert buffers with
+text.  A buffer is normally not a collection of NUL terminated strings
+but instead a continuous collection of lines, separated by newline
+characters.  Now assume a function to convert one line from a buffer is
+needed.  Since the line is not NUL terminated the source pointer cannot
+directly point into the unmodified text buffer.  This means either one
+inserts the NUL byte at the appropriate place for the time of the
+@code{mbsrtowcs} function call (which is not doable for a read-only
+buffer or in a multi-threaded application) or one copies the line into
+an extra buffer where it can be terminated by a NUL byte.  Note that it
+is not in general possible to limit the number of characters to convert
+by setting the parameter @var{len} to any specific value.  Since it is
+not known how many bytes each multibyte character sequence is in length,
+one could only guess.
+
+@cindex stateful
+There is still a problem with the method of NUL-terminating a line right
+after the newline character which could lead to very strange results.
+As said in the description of the @code{mbsrtowcs} function above the
+conversion state is guaranteed to be in the initial shift state after
+processing the NUL byte at the end of the input string.  But this NUL
+byte is not really part of the text.  I.e., the conversion state after
+the newline in the original text could be different from the initial
+shift state and therefore the first character of the next line is
+encoded using this state.  But the state in question is never
+accessible to the user since the conversion stops after the NUL byte.
+Fortunately most stateful character sets in use today require that the
+shift state after a newline is the initial state but this is no
+guarantee.  Therefore simply NUL terminating a piece of a running text
+is not always an adequate solution.
+
+The generic conversion interface (@pxref{Generic Charset Conversion})
+does not have this limitation (it simply works on buffers, not strings)
+but there is another way.  The GNU C library contains a set of functions
+which take additional parameters specifying the maximal number of bytes
+which are consumed from the input string.  This way the problem
+described above could be solved by determining the line length and
+passing this length to the function.
+
+@comment wchar.h
+@comment ISO
+@deftypefun size_t wcsrtombs (char *restrict @var{dst}, const wchar_t **restrict @var{src}, size_t @var{len}, mbstate_t *restrict @var{ps})
+The @code{wcsrtombs} function (``wide character string restartable to
+multibyte string'') converts the NUL terminated wide character string at
+@code{*@var{src}} into an equivalent multibyte character string and
+stores the result in the array pointed to by @var{dst}.  The NUL wide
+character is also converted.  The conversion starts in the state
+described in the object pointed to by @var{ps} or by a state object
+local to @code{wcsrtombs} in case @var{ps} is a null pointer.  If
+@var{dst} is a null pointer the conversion is performed as usual but the
+result is not available.  If all characters of the input string were
+successfully converted and if @var{dst} is not a null pointer the
+pointer pointed to by @var{src} gets assigned a null pointer.
+
+If one of the wide characters in the input string has no valid multibyte
+character equivalent the conversion stops early, sets the global
+variable @code{errno} to @code{EILSEQ}, and returns @code{(size_t) -1}.
+
+Another reason for a premature stop is if @var{dst} is not a null
+pointer and the next converted character would require more than
+@var{len} bytes in total to the array @var{dst}.  In this case the
+pointer pointed to by @var{src} is assigned a value pointing to the wide
+character right after the last one successfully converted.
+
+Except in the case of an encoding error the return value of the function
+is the number of bytes in all the multibyte character sequences stored
+in @var{dst}, not counting a terminating NUL byte.  Before returning,
+the state in the object pointed to by @var{ps} (or the internal object
+in case @var{ps} is a null pointer) is updated to reflect the state
+after the last conversion.  The state is the initial shift state in
+case the terminating NUL wide character was converted.
+
+@pindex wchar.h
+This function was introduced in the second amendment to @w{ISO C} and is
+declared in @file{wchar.h}.
+@end deftypefun
+
+The restriction mentioned above for the @code{mbsrtowcs} function applies
+also here.  There is no possibility to directly control the number of
+input characters.  One has to place the NUL wide character at the
+correct place or control the consumed input indirectly via the available
+output array size (the @var{len} parameter).
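+
+Because @code{wcsrtombs} records in @code{*@var{src}} how far it got, a
+long wide character string can be converted piece by piece through a
+small buffer, with the @var{len} parameter bounding each step.  The
+following sketch, with the hypothetical name @code{fputwcs_mb}, writes a
+wide character string to a stream this way.  It assumes that a buffer
+of @code{2 * MB_LEN_MAX} bytes is always large enough for at least one
+character, and it stops with an error if no progress is made.
+
+@smallexample
+#include <limits.h>
+#include <stdio.h>
+#include <string.h>
+#include <wchar.h>
+
+/* Write the NUL-terminated wide string WS to FP as multibyte text,
+   using a small fixed buffer.  Returns 0 on success, -1 on error.  */
+int
+fputwcs_mb (const wchar_t *ws, FILE *fp)
+@{
+  char buf[2 * MB_LEN_MAX];
+  mbstate_t state;
+  memset (&state, '\0', sizeof (state));
+
+  /* wcsrtombs sets WS to a null pointer once the NUL is converted.  */
+  while (ws != NULL)
+    @{
+      size_t n = wcsrtombs (buf, &ws, sizeof (buf), &state);
+      if (n == (size_t) -1)
+        return -1;            /* No multibyte equivalent.  */
+      if (n == 0 && ws != NULL)
+        return -1;            /* No progress; buffer too small.  */
+      /* N does not count the terminating NUL byte.  */
+      if (fwrite (buf, 1, n, fp) != n)
+        return -1;
+    @}
+  return 0;
+@}
+@end smallexample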
+
+@comment wchar.h
+@comment GNU
+@deftypefun size_t mbsnrtowcs (wchar_t *restrict @var{dst}, const char **restrict @var{src}, size_t @var{nmc}, size_t @var{len}, mbstate_t *restrict @var{ps})
+The @code{mbsnrtowcs} function is very similar to the @code{mbsrtowcs}
+function.  All the parameters are the same except for @var{nmc} which is
+new.  The return value is the same as for @code{mbsrtowcs}.
+
+This new parameter specifies how many bytes at most can be used from the
+multibyte character string.  I.e., the multibyte character string
+@code{*@var{src}} need not be NUL terminated.  But if a NUL byte is
+found within the first @var{nmc} bytes of the string the conversion
+stops there.
+
+This function is a GNU extension.  It is meant to work around the
+problems mentioned above.  Now it is possible to convert a buffer with
+multibyte character text piece by piece without having to care about
+inserting NUL bytes and the effect of NUL bytes on the conversion state.
+@end deftypefun
+
+A function to convert a multibyte string into a wide character string
+and display it could be written like this (this is not a really useful
+example):
+
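+@smallexample
+void
+showmbs (const char *src, FILE *fp)
+@{
+  mbstate_t state;
+  int cnt = 0;
+  memset (&state, '\0', sizeof (state));
+  while (1)
+    @{
+      wchar_t linebuf[100];
+      const char *endp = strchr (src, '\n');
+      size_t n;
+
+      /* @r{Exit if there is no more line.}  */
+      if (endp == NULL)
+        break;
+
+      /* @r{Convert up to and including the newline so that it}
+         @r{also passes through the conversion state.}  */
+      n = mbsnrtowcs (linebuf, &src, endp + 1 - src, 99, &state);
+      if (n == (size_t) -1)
+        /* @r{Invalid multibyte sequence.}  */
+        break;
+      /* @r{Do not print the newline itself.}  */
+      if (n > 0 && linebuf[n - 1] == L'\n')
+        --n;
+      linebuf[n] = L'\0';
+      fprintf (fp, "line %d: \"%ls\"\n", ++cnt, linebuf);
+    @}
+@}
+@end smallexample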
+
+There is no longer a problem with the state after a call to
+@code{mbsnrtowcs}.  Since we do not insert characters into the string
+which were not there right from the beginning, and we use @var{state}
+only for the conversion of the given buffer, the state cannot get mixed
+up.
+
+@comment wchar.h
+@comment GNU
+@deftypefun size_t wcsnrtombs (char *restrict @var{dst}, const wchar_t **restrict @var{src}, size_t @var{nwc}, size_t @var{len}, mbstate_t *restrict @var{ps})
+The @code{wcsnrtombs} function implements the conversion from wide
+character strings to multibyte character strings.  It is similar to
+@code{wcsrtombs} but it takes, just like @code{mbsnrtowcs}, an extra
+parameter which specifies the length of the input string.
+
+No more than @var{nwc} wide characters from the input string
+@code{*@var{src}} are converted.  If the input string contains a NUL
+wide character in the first @var{nwc} characters the conversion stops
+at this place.
+
+This function is a GNU extension and, just like @code{mbsnrtowcs}, it
+helps in situations where no NUL terminated input strings are available.
+@end deftypefun
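+
+A minimal sketch of using @code{wcsnrtombs} on a wide character buffer
+which is not NUL terminated follows.  It converts at most @var{nwc} wide
+characters into a byte array and terminates the result.  The name
+@code{wcs_prefix_to_mbs} is hypothetical.  The sketch does not append a
+sequence returning to the initial shift state, so with a stateful
+character set the caller would have to add one (for example with
+@code{wcrtomb} and @code{L'\0'}).
+
+@smallexample
+#define _GNU_SOURCE   /* For wcsnrtombs in the GNU C library.  */
+#include <string.h>
+#include <wchar.h>
+
+/* Convert at most NWC wide characters at WS into DST, which has room
+   for SIZE bytes, and NUL-terminate the result.  Returns the number
+   of bytes stored before the NUL, or (size_t) -1 on error.  */
+size_t
+wcs_prefix_to_mbs (char *dst, size_t size, const wchar_t *ws, size_t nwc)
+@{
+  mbstate_t state;
+  size_t n;
+
+  if (size == 0)
+    return (size_t) -1;
+  memset (&state, '\0', sizeof (state));
+  /* Leave one byte for the NUL added below.  */
+  n = wcsnrtombs (dst, &ws, nwc, size - 1, &state);
+  if (n == (size_t) -1)
+    return n;
+  dst[n] = '\0';
+  return n;
+@}
+@end smallexample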
+
+
+@node Multibyte Conversion Example
+@subsection A Complete Multibyte Conversion Example
+
+The example programs given in the last sections are only brief and do
+not cont