| author | Ulrich Drepper <drepper@redhat.com> | 2001-02-12 08:22:23 +0000 |
|---|---|---|
| committer | Ulrich Drepper <drepper@redhat.com> | 2001-02-12 08:22:23 +0000 |
| commit | b5e73f5664cc2c3fce94162cdc6d97ac8232776f (patch) | |
| tree | 8cee0802d540f90594173597b35acb44139bbfa4 | |
| parent | 9279500aedf9d0e3382fd89023f42ae2d316533c (diff) | |
Document wide character stream functions.
| -rw-r--r-- | manual/stdio.texi | 632 |
1 files changed, 586 insertions, 46 deletions
diff --git a/manual/stdio.texi b/manual/stdio.texi index d6dada1cae..f4d44e1b9b 100644 --- a/manual/stdio.texi +++ b/manual/stdio.texi @@ -18,6 +18,7 @@ representing a communications channel to a file, device, or process. * Opening Streams:: How to create a stream to talk to a file. * Closing Streams:: Close a stream when you are finished with it. * Streams and Threads:: Issues with streams in threaded programs. +* Streams and I18N:: Streams in internationalized applications. * Simple Output:: Unformatted output by characters and lines. * Character Input:: Unformatted input by characters and words. * Line Input:: Reading a line or a record from a stream. @@ -116,8 +117,8 @@ described in @ref{File System Interface}.) Most other operating systems provide similar mechanisms, but the details of how to use them can vary. In the GNU C library, @code{stdin}, @code{stdout}, and @code{stderr} are -normal variables which you can set just like any others. For example, to redirect -the standard output to a file, you could do: +normal variables which you can set just like any others. For example, +to redirect the standard output to a file, you could do: @smallexample fclose (stdout); @@ -129,6 +130,9 @@ Note however, that in other systems @code{stdin}, @code{stdout}, and But you can use @code{freopen} to get the effect of closing one and reopening it. @xref{Opening Streams}. +The three streams @code{stdin}, @code{stdout}, and @code{stderr} are not +unoriented at program start (@pxref{Streams and I18N}). + @node Opening Streams @section Opening Streams @@ -637,6 +641,144 @@ This function is especially useful when program code has to be used which is written without knowledge about the @code{_unlocked} functions (or if the programmer was to lazy to use them). +@node Streams and I18N +@section Streams in Internationalized Applications + +@w{ISO C90} introduced the new type @code{wchar_t} to allow handling +larger character sets. 
What was missing was a possibility to output +strings of @code{wchar_t} directly. One had to convert them into +multibyte strings using @code{mbstowcs} (there was no @code{mbsrtowcs} +yet) and then use the normal stream functions. While this is doable it +is very cumbersome since performing the conversions is not trivial and +greatly increases program complexity and size. + +The Unix standard early on (I think in XPG4.2) introduced two additional +format specifiers for the @code{printf} and @code{scanf} families of +functions. Printing and reading of single wide characters was made +possible using the @code{%C} specifier and wide character strings can be +handled with @code{%S}. These modifiers behave just like @code{%c} and +@code{%s} only that they expect the corresponding argument to have the +wide character type and that the wide character and string are +transformed into/from multibyte strings before being used. + +This was a beginning but it is still not good enough. Not always is it +desirable to use @code{printf} and @code{scanf}. The other, smaller and +faster functions cannot handle wide characters. Second, it is not +possible to have a format string for @code{printf} and @code{scanf} +consisting of wide characters. The result is that format strings would +have to be generated if they have to contain non-basic characters. + +@cindex C++ streams +@cindex streams, C++ +In the @w{Amendment 1} to @w{ISO C90} a whole new set of functions was +added to solve the problem. Most of the stream functions got a +counterpart which take a wide character or wide character string instead +of a character or string respectively. The new functions operate on the +same streams (like @code{stdout}). This is different from the model of +the C++ runtime library where separate streams for wide and normal I/O +are used. 
+ +@cindex orientation, stream +@cindex stream orientation +Being able to use the same stream for wide and normal operations comes +with a restriction: a stream can be used either for wide operations or +for normal operations. Once it is decided there is no way back. Only a +call to @code{freopen} or @code{freopen64} can reset the +@dfn{orientation}. The orientation can be decided in three ways: + +@itemize @bullet +@item +If any of the normal character functions is used (this includes the +@code{fread} and @code{fwrite} functions) the steam is marked as not +wide oriented. + +@item +If any of the wide character functions is used the stream is marked as +wide oriented + +@item +The @code{fwide} function can be used to set the orientation either way. +@end itemize + +It is important to never mix the use of wide and not wide operations on +a stream. There are no diagnostics issued. The application behavior +will simply be strange or the application will simply crash. The +@code{fwide} function can help avoiding this. + +@comment wchar.h +@comment ISO +@deftypefun int fwide (FILE *@var{stream}, int @var{mode}) + +The @code{fwide} function can use used to set and query the state of the +orientation of the stream @var{stream}. If the @var{mode} parameter has +a positive value the streams get wide oriented, for negative values +narrow oriented. It is not possible to overwrite previous orientations +with @code{fwide}. I.e., if the stream @var{stream} was already +oriented before the call nothing is done. + +If @var{mode} is zero the current orientation state is queried and +nothing is changed. + +The @code{fwide} function returns a negative value, zero, or a positive +value if the stream is narrow, not at all, or wide oriented +respectively. + +This function was introduced in @w{Amendment 1} to @w{ISO C90} and is +declared in @file{wchar.h}. +@end deftypefun + +It is generally a good idea to orient a stream as early as possible. 
+This can prevent surprise especially for the standard streams +@code{stdin}, @code{stdout}, and @code{stderr}. If some library +function in some situations uses one of these streams and this use +orients the stream in a different way the rest of the application +expects it one might end up with hard to reproduce errors. Remember +that no errors are signal if the streams are used incorrectly. Leaving +a stream unoriented after creation is normally only necessary for +library functions which create streams which can be used in different +contexts. + +When writing code which uses streams and which can be used in different +contexts it is important to query the orientation of the stream before +using it (unless the rules of the library interface demand a specific +orientation). The following little, silly function illustrates this. + +@smallexample +void +print_f (FILE *fp) +@{ + if (fwide (fp, 0) > 0) + /* @r{Positive return value means wide orientation.} */ + fputwc (L'f', fp); + else + fputc ('f', fp); +@} +@end smallexample + +Note that in this case the function @code{print_f} decides about the +orientation of the stream if it was unoriented before (will not happen +if the advise above is followed). + +The encoding used for the @code{wchar_t} values is unspecified and the +user must not make any assumptions about it. For I/O of @code{wchar_t} +values this means that it is impossible to write these values directly +to the stream. This is not what follows from the @w{ISO C} locale model +either. What happens instead is that the bytes read from or written to +the underlying media are first converted into the internal encoding +chosen by the implementation for @code{wchar_t}. The external encoding +is determined by the @code{LC_CTYPE} category of the current locale or +by the @samp{ccs} part of the mode specification given to @code{fopen}, +@code{fopen64}, @code{freopen}, or @code{freopen64}. 
How and when the +conversion happens is unspecified and it happens invisible to the user. + +Since a stream is created in the unoriented state it has at that point +no conversion associated with it. The conversion which will be used is +determined by the @code{LC_CTYPE} category selected at the time the +stream is oriented. If the locales are changed at the runtime this +might produce surprising results unless one pays attention. This is +just another good reason to orient the stream explicitly as soon as +possible, perhaps with a call to @code{fwide}. + @node Simple Output @section Simple Output by Characters or Lines @@ -644,8 +786,10 @@ which is written without knowledge about the @code{_unlocked} functions This section describes functions for performing character- and line-oriented output. -These functions are declared in the header file @file{stdio.h}. +These narrow streams functions are declared in the header file +@file{stdio.h} and the wide stream functions in @file{wchar.h}. @pindex stdio.h +@pindex wchar.h @comment stdio.h @comment ISO @@ -656,6 +800,14 @@ The @code{fputc} function converts the character @var{c} to type character @var{c} is returned. @end deftypefun +@comment wchar.h +@comment ISO +@deftypefun wint_t fputwc (wchar_t @var{wc}, FILE *@var{stream}) +The @code{fputwc} function writes the wide character @var{wc} to the +stream @var{stream}. @code{WEOF} is returned if a write error occurs; +otherwise the character @var{wc} is returned. +@end deftypefun + @comment stdio.h @comment POSIX @deftypefun int fputc_unlocked (int @var{c}, FILE *@var{stream}) @@ -664,6 +816,16 @@ function except that it does not implicitly lock the stream if the state is @code{FSETLOCKING_INTERNAL}. 
@end deftypefun +@comment wchar.h +@comment POSIX +@deftypefun wint_t fputwc_unlocked (wint_t @var{wc}, FILE *@var{stream}) +The @code{fputwc_unlocked} function is equivalent to the @code{fputwc} +function except that it does not implicitly lock the stream if the state +is @code{FSETLOCKING_INTERNAL}. + +This function is a GNU extension. +@end deftypefun + @comment stdio.h @comment ISO @deftypefun int putc (int @var{c}, FILE *@var{stream}) @@ -674,6 +836,16 @@ general rule for macros. @code{putc} is usually the best function to use for writing a single character. @end deftypefun +@comment wchar.h +@comment ISO +@deftypefun wint_t putwc (wchar_t @var{wc}, FILE *@var{stream}) +This is just like @code{fputwc}, except that it can be implement as +a macro, making it faster. One consequence is that it may evaluate the +@var{stream} argument more than once, which is an exception to the +general rule for macros. @code{putwc} is usually the best function to +use for writing a single wide character. +@end deftypefun + @comment stdio.h @comment POSIX @deftypefun int putc_unlocked (int @var{c}, FILE *@var{stream}) @@ -682,6 +854,16 @@ function except that it does not implicitly lock the stream if the state is @code{FSETLOCKING_INTERNAL}. @end deftypefun +@comment wchar.h +@comment GNU +@deftypefun wint_t putwc_unlocked (wchar_t @var{wc}, FILE *@var{stream}) +The @code{putwc_unlocked} function is equivalent to the @code{putwc} +function except that it does not implicitly lock the stream if the state +is @code{FSETLOCKING_INTERNAL}. + +This function is a GNU extension. +@end deftypefun + @comment stdio.h @comment ISO @deftypefun int putchar (int @var{c}) @@ -689,6 +871,13 @@ The @code{putchar} function is equivalent to @code{putc} with @code{stdout} as the value of the @var{stream} argument. 
@end deftypefun +@comment wchar.h +@comment ISO +@deftypefun wint_t putchar (wchar_t @var{wc}) +The @code{putwchar} function is equivalent to @code{putwc} with +@code{stdout} as the value of the @var{stream} argument. +@end deftypefun + @comment stdio.h @comment POSIX @deftypefun int putchar_unlocked (int @var{c}) @@ -697,6 +886,16 @@ function except that it does not implicitly lock the stream if the state is @code{FSETLOCKING_INTERNAL}. @end deftypefun +@comment wchar.h +@comment GNU +@deftypefun wint_t putwchar_unlocked (wchar_t @var{wc}) +The @code{putwchar_unlocked} function is equivalent to the @code{putwchar} +function except that it does not implicitly lock the stream if the state +is @code{FSETLOCKING_INTERNAL}. + +This function is a GNU extension. +@end deftypefun + @comment stdio.h @comment ISO @deftypefun int fputs (const char *@var{s}, FILE *@var{stream}) @@ -720,6 +919,18 @@ fputs ("hungry?\n", stdout); outputs the text @samp{Are you hungry?} followed by a newline. @end deftypefun +@comment wchar.h +@comment ISO +@deftypefun int fputws (const wchar_t *@var{ws}, FILE *@var{stream}) +The function @code{fputws} writes the wide character string @var{ws} to +the stream @var{stream}. The terminating null character is not written. +This function does @emph{not} add a newline character, either. It +outputs only the characters in the string. + +This function returns @code{WEOF} if a write error occurs, and otherwise +a non-negative value. +@end deftypefun + @comment stdio.h @comment GNU @deftypefun int fputs_unlocked (const char *@var{s}, FILE *@var{stream}) @@ -730,6 +941,16 @@ is @code{FSETLOCKING_INTERNAL}. This function is a GNU extension. @end deftypefun +@comment wchar.h +@comment GNU +@deftypefun int fputws_unlocked (const wchar_t *@var{ws}, FILE *@var{stream}) +The @code{fputws_unlocked} function is equivalent to the @code{fputws} +function except that it does not implicitly lock the stream if the state +is @code{FSETLOCKING_INTERNAL}. 
+ +This function is a GNU extension. +@end deftypefun + @comment stdio.h @comment ISO @deftypefun int puts (const char *@var{s}) @@ -761,21 +982,25 @@ recommend you use @code{fwrite} instead (@pxref{Block Input/Output}). @section Character Input @cindex reading from a stream, by characters -This section describes functions for performing character-oriented input. -These functions are declared in the header file @file{stdio.h}. +This section describes functions for performing character-oriented +input. These narrow streams functions are declared in the header file +@file{stdio.h} and the wide character functions are declared in +@file{wchar.h}. @pindex stdio.h - -These functions return an @code{int} value that is either a character of -input, or the special value @code{EOF} (usually -1). It is important to -store the result of these functions in a variable of type @code{int} -instead of @code{char}, even when you plan to use it only as a -character. Storing @code{EOF} in a @code{char} variable truncates its -value to the size of a character, so that it is no longer -distinguishable from the valid character @samp{(char) -1}. So always -use an @code{int} for the result of @code{getc} and friends, and check -for @code{EOF} after the call; once you've verified that the result is -not @code{EOF}, you can be sure that it will fit in a @samp{char} -variable without loss of information. +@pindex wchar.h + +These functions return an @code{int} or @code{wint_t} value (for narrow +and wide stream functions respectively) that is either a character of +input, or the special value @code{EOF}/@code{WEOF} (usually -1). For +the narrow stream functions it is important to store the result of these +functions in a variable of type @code{int} instead of @code{char}, even +when you plan to use it only as a character. 
Storing @code{EOF} in a +@code{char} variable truncates its value to the size of a character, so +that it is no longer distinguishable from the valid character +@samp{(char) -1}. So always use an @code{int} for the result of +@code{getc} and friends, and check for @code{EOF} after the call; once +you've verified that the result is not @code{EOF}, you can be sure that +it will fit in a @samp{char} variable without loss of information. @comment stdio.h @comment ISO @@ -786,6 +1011,14 @@ the stream @var{stream} and returns its value, converted to an @code{EOF} is returned instead. @end deftypefun +@comment wchar.h +@comment ISO +@deftypefun wint_t fgetwc (FILE *@var{stream}) +This function reads the next wide character from the stream @var{stream} +and returns its value. If an end-of-file condition or read error +occurs, @code{WEOF} is returned instead. +@end deftypefun + @comment stdio.h @comment POSIX @deftypefun int fgetc_unlocked (FILE *@var{stream}) @@ -794,6 +1027,16 @@ function except that it does not implicitly lock the stream if the state is @code{FSETLOCKING_INTERNAL}. @end deftypefun +@comment wchar.h +@comment GNU +@deftypefun wint_t fgetwc_unlocked (FILE *@var{stream}) +The @code{fgetwc_unlocked} function is equivalent to the @code{fgetwc} +function except that it does not implicitly lock the stream if the state +is @code{FSETLOCKING_INTERNAL}. + +This function is a GNU extension. +@end deftypefun + @comment stdio.h @comment ISO @deftypefun int getc (FILE *@var{stream}) @@ -804,6 +1047,15 @@ optimized, so it is usually the best function to use to read a single character. @end deftypefun +@comment wchar.h +@comment ISO +@deftypefun wint_t getwc (FILE *@var{stream}) +This is just like @code{fgetwc}, except that it is permissible for it to +be implemented as a macro that evaluates the @var{stream} argument more +than once. @code{getwc} can be highly optimized, so it is usually the +best function to use to read a single wide character. 
+@end deftypefun + @comment stdio.h @comment POSIX @deftypefun int getc_unlocked (FILE *@var{stream}) @@ -812,6 +1064,16 @@ function except that it does not implicitly lock the stream if the state is @code{FSETLOCKING_INTERNAL}. @end deftypefun +@comment wchar.h +@comment GNU +@deftypefun wint_t getwc_unlocked (FILE *@var{stream}) +The @code{getwc_unlocked} function is equivalent to the @code{getwc} +function except that it does not implicitly lock the stream if the state +is @code{FSETLOCKING_INTERNAL}. + +This function is a GNU extension. +@end deftypefun + @comment stdio.h @comment ISO @deftypefun int getchar (void) @@ -819,6 +1081,13 @@ The @code{getchar} function is equivalent to @code{getc} with @code{stdin} as the value of the @var{stream} argument. @end deftypefun +@comment wchar.h +@comment ISO +@deftypefun wint_t getwchar (void) +The @code{getwchar} function is equivalent to @code{getwc} with @code{stdin} +as the value of the @var{stream} argument. +@end deftypefun + @comment stdio.h @comment POSIX @deftypefun int getchar_unlocked (void) @@ -827,9 +1096,20 @@ function except that it does not implicitly lock the stream if the state is @code{FSETLOCKING_INTERNAL}. @end deftypefun +@comment wchar.h +@comment GNU +@deftypefun wint_t getwchar_unlocked (void) +The @code{getwchar_unlocked} function is equivalent to the @code{getwchar} +function except that it does not implicitly lock the stream if the state +is @code{FSETLOCKING_INTERNAL}. + +This function is a GNU extension. +@end deftypefun + Here is an example of a function that does input using @code{fgetc}. It would work just as well using @code{getc} instead, or using -@code{getchar ()} instead of @w{@code{fgetc (stdin)}}. +@code{getchar ()} instead of @w{@code{fgetc (stdin)}}. The code would +also work the same for the wide character stream functions. @smallexample int @@ -873,7 +1153,7 @@ way to distinguish this from an input word with value -1. 
@node Line Input @section Line-Oriented Input -Since many programs interpret input on the basis of lines, it's +Since many programs interpret input on the basis of lines, it is convenient to have functions to read a line of text from a stream. Standard C has functions to do this, but they aren't very safe: null @@ -969,6 +1249,31 @@ a null character, you should either handle it properly or print a clear error message. We recommend using @code{getline} instead of @code{fgets}. @end deftypefun +@comment wchar.h +@comment ISO +@deftypefun {wchar_t *} fgetws (wchar_t *@var{ws}, int @var{count}, FILE *@var{stream}) +The @code{fgetws} function reads wide characters from the stream +@var{stream} up to and including a newline character and stores them in +the string @var{ws}, adding a null wide character to mark the end of the +string. You must supply @var{count} wide characters worth of space in +@var{ws}, but the number of characters read is at most @var{count} +@minus{} 1. The extra character space is used to hold the null wide +character at the end of the string. + +If the system is already at end of file when you call @code{fgetws}, then +the contents of the array @var{ws} are unchanged and a null pointer is +returned. A null pointer is also returned if a read error occurs. +Otherwise, the return value is the pointer @var{ws}. + +@strong{Warning:} If the input data has a null wide character (which are +null bytes in the input stream), you can't tell. So don't use +@code{fgetws} unless you know the data cannot contain a null. Don't use +it to read files edited by the user because, if the user inserts a null +character, you should either handle it properly or print a clear error +message. +@comment XXX We need getwline!!! +@end deftypefun + @comment stdio.h @comment GNU @deftypefun {char *} fgets_unlocked (char *@var{s}, int @var{count}, FILE *@var{stream}) @@ -979,6 +1284,16 @@ is @code{FSETLOCKING_INTERNAL}. This function is a GNU extension. 
@end deftypefun +@comment wchar.h +@comment GNU +@deftypefun {wchar_t *} fgetws_unlocked (wchar_t *@var{ws}, int @var{count}, FILE *@var{stream}) +The @code{fgetws_unlocked} function is equivalent to the @code{fgetws} +function except that it does not implicitly lock the stream if the state +is @code{FSETLOCKING_INTERNAL}. + +This function is a GNU extension. +@end deftypefun + @comment stdio.h @comment ISO @deftypefn {Deprecated function} {char *} gets (char *@var{s}) @@ -1105,6 +1420,13 @@ input available. After you read that character, trying to read again will encounter end of file. @end deftypefun +@comment wchar.h +@comment ISO +@deftypefun wint_t ungetwc (wint_t @var{wc}, FILE *@var{stream}) +The @code{ungetwc} function behaves just like @code{ungetc} just that it +pushes back a wide character. +@end deftypefun + Here is an example showing the use of @code{getc} and @code{ungetc} to skip over whitespace characters. When this function reaches a non-whitespace character, it unreads that character to be seen again on @@ -1463,9 +1785,17 @@ Conversions}, for details. @item @samp{%c} Print a single character. @xref{Other Output Conversions}. +@item @samp{%C} +This is an alias for @samp{%lc} which is supported for compatibility +with the Unix standard. + @item @samp{%s} Print a string. @xref{Other Output Conversions}. +@item @samp{%S} +This is an alias for @samp{%ls} which is supported for compatibility +with the Unix standard. + @item @samp{%p} Print the value of a pointer. @xref{Other Output Conversions}. @@ -1585,6 +1915,10 @@ Specifies that the argument is a @code{long int} or @code{unsigned long int}, as appropriate. Two @samp{l} characters is like the @samp{L} modifier, below. +If used with @samp{%c} or @samp{%s} the corresponding parameter is +considered as a wide character or wide character string respectively. +This use of @samp{l} was introduced in @w{Amendment 1} to @w{ISO C90}. 
+ @item L @itemx ll @itemx q @@ -1785,11 +2119,13 @@ Notice how the @samp{%g} conversion drops trailing zeros. This section describes miscellaneous conversions for @code{printf}. -The @samp{%c} conversion prints a single character. The @code{int} -argument is first converted to an @code{unsigned char}. The @samp{-} -flag can be used to specify left-justification in the field, but no -other flags are defined, and no precision or type modifier can be given. -For example: +The @samp{%c} conversion prints a single character. In case there is no +@samp{l} modifier the @code{int} argument is first converted to an +@code{unsigned char}. Then, if used in a wide stream function, the +character is converted into the corresponding wide character. The +@samp{-} flag can be used to specify left-justification in the field, +but no other flags are defined, and no precision or type modifier can be +given. For example: @smallexample printf ("%c%c%c%c%c", 'h', 'e', 'l', 'l', 'o'); @@ -1798,9 +2134,16 @@ printf ("%c%c%c%c%c", 'h', 'e', 'l', 'l', 'o'); @noindent prints @samp{hello}. -The @samp{%s} conversion prints a string. The corresponding argument -must be of type @code{char *} (or @code{const char *}). A precision can -be specified to indicate the maximum number of characters to write; +If there is a @samp{l} modifier present the argument is expected to be +of type @code{wint_t}. If used in a multibyte function the wide +character is converted into a multibyte character before being added to +the output. In this case more than one output byte can be produced. + +The @samp{%s} conversion prints a string. If no @samp{l} modifier is +present the corresponding argument must be of type @code{char *} (or +@code{const char *}). If used in a wide stream function the string is +first converted in a wide character string. 
A precision can be +specified to indicate the maximum number of characters to write; otherwise characters in the string up to but not including the terminating null character are written to the output stream. The @samp{-} flag can be used to specify left-justification in the field, @@ -1814,6 +2157,8 @@ printf ("%3s%-6s", "no", "where"); @noindent prints @samp{ nowhere }. +If there is a @samp{l} modifier present the argument is expected to be of type @code{wchar_t} (or @code{const wchar_t *}). + If you accidentally pass a null pointer as the argument for a @samp{%s} conversion, the GNU library prints it as @samp{(null)}. We think this is more useful than crashing. But it's not good practice to pass a null @@ -1911,6 +2256,15 @@ control of the template string @var{template} to the stream negative value if there was an output error. @end |
