manual/search.texi


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664

@node Searching and Sorting, Pattern Matching, Message Translation, Top
@c %MENU% General searching and sorting functions
@chapter Searching and Sorting

This chapter describes functions for searching and sorting arrays of
arbitrary objects.  You pass the appropriate comparison function to be
applied as an argument, along with the size of the objects in the array
and the total number of elements.

@menu
* Comparison Functions::        Defining how to compare two objects.
				 Since the sort and search facilities
                                 are general, you have to specify the
                                 ordering.
* Array Search Function::       The @code{bsearch} function.
* Array Sort Function::         The @code{qsort} function.
* Search/Sort Example::         An example program.
* Hash Search Function::        The @code{hsearch} function.
* Tree Search Function::        The @code{tsearch} function.
@end menu

@node Comparison Functions
@section Defining the Comparison Function
@cindex Comparison Function

In order to use the sorted array library functions, you have to describe
how to compare the elements of the array.

To do this, you supply a comparison function to compare two elements of
the array.  The library will call this function, passing as arguments
pointers to two array elements to be compared.  Your comparison function
should return a value the way @code{strcmp} (@pxref{String/Array
Comparison}) does: negative if the first argument is ``less'' than the
second, zero if they are ``equal'', and positive if the first argument
is ``greater''.

Here is an example of a comparison function which works with an array of
numbers of type @code{long int}:

@smallexample
int
compare_long_ints (const void *a, const void *b)
@{
  const long int *la = a;
  const long int *lb = b;

  return (*la > *lb) - (*la < *lb);
@}
@end smallexample

(The code would have to be more complicated for an array of @code{double},
to handle NaNs correctly.)

The header file @file{stdlib.h} defines a name for the data type of
comparison functions.  This type is a GNU extension.

@comment stdlib.h
@comment GNU
@tindex comparison_fn_t
@smallexample
int comparison_fn_t (const void *, const void *);
@end smallexample

@node Array Search Function
@section Array Search Function
@cindex search function (for arrays)
@cindex binary search function (for arrays)
@cindex array search function

Generally searching for a specific element in an array means that
potentially all elements must be checked.  @Theglibc{} contains
functions to perform linear search.  The prototypes for the following
two functions can be found in @file{search.h}.

@deftypefun {void *} lfind (const void *@var{key}, const void *@var{base}, size_t *@var{nmemb}, size_t @var{size}, comparison_fn_t @var{compar})
@standards{SVID, search.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
The @code{lfind} function searches in the array with @code{*@var{nmemb}}
elements of @var{size} bytes pointed to by @var{base} for an element
which matches the one pointed to by @var{key}.  The function pointed to
by @var{compar} is used to decide whether two elements match.

The return value is a pointer to the matching element in the array
starting at @var{base} if it is found.  If no matching element is
available @code{NULL} is returned.

The mean runtime of this function is proportional to @code{*@var{nmemb}/2},
assuming random elements of the array are searched for.  This
function should be used only if elements often get added to or deleted from
the array in which case it might not be useful to sort the array before
searching.
@end deftypefun

@deftypefun {void *} lsearch (const void *@var{key}, void *@var{base}, size_t *@var{nmemb}, size_t @var{size}, comparison_fn_t @var{compar})
@standards{SVID, search.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
@c A signal handler that interrupted an insertion and performed an
@c insertion itself would leave the array in a corrupt state (e.g. one
@c new element initialized twice, with parts of both initializations
@c prevailing, and another uninitialized element), but this is just a
@c special case of races on user-controlled objects, that have to be
@c avoided by users.

@c In case of cancellation, we know the array won't be left in a corrupt
@c state; the new element is initialized before the element count is
@c incremented, and the compiler can't reorder these operations because
@c it can't know that they don't alias.  So, we'll either cancel after
@c the increment and the initialization are both complete, or the
@c increment won't have taken place, and so how far the initialization
@c got doesn't matter.
The @code{lsearch} function is similar to the @code{lfind} function.  It
searches the given array for an element and returns it if found.  The
difference is that if no matching element is found the @code{lsearch}
function adds the object pointed to by @var{key} (with a size of
@var{size} bytes) at the end of the array and it increments the value of
@code{*@var{nmemb}} to reflect this addition.

This means for the caller that if it is not sure that the array contains
the element one is searching for the memory allocated for the array
starting at @var{base} must have room for at least @var{size} more
bytes.  If one is sure the element is in the array it is better to use
@code{lfind} so having more room in the array is always necessary when
calling @code{lsearch}.
@end deftypefun

To search a sorted or partially sorted array for an element matching the key,
use the @code{bsearch} function.  The prototype for this function is in
the header file @file{stdlib.h}.
@pindex stdlib.h

@deftypefun {void *} bsearch (const void *@var{key}, const void *@var{array}, size_t @var{count}, size_t @var{size}, comparison_fn_t @var{compare})
@standards{ISO, stdlib.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acsafe{}}
The @code{bsearch} function searches @var{array} for an element
that is equivalent to @var{key}.  The array contains @var{count} elements,
each of which is of size @var{size} bytes.

The @var{compare} function is used to perform the comparison.  This
function is called with arguments that point to the key and to an
array element, in that order, and should return an
integer less than, equal to, or greater than zero corresponding to
whether the key is considered less than, equal to, or greater than
the array element.  The function should not alter the array's contents,
and the same array element should always compare the same way with the key.

Although the array need not be completely sorted, it should be
partially sorted with respect to @var{key}.  That is, the array should
begin with elements that compare less than @var{key}, followed by
elements that compare equal to @var{key}, and ending with elements
that compare greater than @var{key}.  Any or all of these element
sequences can be empty.

The return value is a pointer to a matching array element, or a null
pointer if no match is found.  If the array contains more than one element
that matches, the one that is returned is unspecified.

This function derives its name from the fact that it is implemented
using the binary search algorithm.
@end deftypefun

@node Array Sort Function
@section Array Sort Function
@cindex sort function (for arrays)
@cindex quick sort function (for arrays)
@cindex array sort function

To sort an array using an arbitrary comparison function, use the
@code{qsort} function.  The prototype for this function is in
@file{stdlib.h}.
@pindex stdlib.h

@deftypefun void qsort (void *@var{array}, size_t @var{count}, size_t @var{size}, comparison_fn_t @var{compare})
@standards{ISO, stdlib.h}
@safety{@prelim{}@mtsafe{}@assafe{}@acunsafe{@acucorrupt{}}}
The @code{qsort} function sorts the array @var{array}.  The array
contains @var{count} elements, each of which is of size @var{size}.

The @var{compare} function is used to perform the comparison on the
array elements.  This function is called with two pointer arguments and
should return an integer less than, equal to, or greater than zero
corresponding to whether its first argument is considered less than,
equal to, or greater than its second argument.
The function must not alter the array's contents, and must define a
total ordering on the array elements, including any unusual values
such as floating-point NaN (@pxref{Infinity and NaN}).
Because the sorting process can move elements,
the function's return value must not depend on the element addresses
or the relative positions of elements within the array,
as these are meaningless while @code{qsort} is running.

@cindex stable sorting
@strong{Warning:} If two elements compare equal, their order after
sorting is unpredictable.  That is to say, the sorting is not stable.
This can make a difference when the comparison considers only part of
the elements and two elements that compare equal may differ in other
respects.  To ensure a stable sort in this situation, you can augment
each element with an appropriate tie-breaking value, such as its
original array index.

Here is a simple example of sorting an array of @code{long int} in numerical
order, using the comparison function defined above (@pxref{Comparison
Functions}):

@smallexample
@{
  long int *array;
  size_t nmemb;
  @dots{}
  qsort (array, nmemb, sizeof *array, compare_long_ints);
@}
@end smallexample

The @code{qsort} function derives its name from the fact that it was
originally implemented using the ``quick sort'' algorithm.

The implementation of @code{qsort} attempts to allocate auxiliary memory
and use the merge sort algorithm, without violating C standard requirement
that arguments passed to the comparison function point within the array.
If the memory allocation fails, @code{qsort} resorts to a slower algorithm.
@end deftypefun

@node Search/Sort Example
@section Searching and Sorting Example

Here is an example showing the use of @code{qsort} and @code{bsearch}
with an array of structures.  The elements of the array are sorted
by comparing their @code{name} fields with the @code{strcmp} function.
Then, we can look up individual elements based on their names.

@comment This example is dedicated to the memory of Jim Henson.  RIP.
@smallexample
@include search.c.texi
@end smallexample

@cindex Kermit the frog
The output from this program looks like:

@smallexample
Kermit, the frog
Piggy, the pig
Gonzo, the whatever
Fozzie, the bear
Sam, the eagle
Robin, the frog
Animal, the animal
Camilla, the chicken
Sweetums, the monster
Dr. Strangepork, the pig
Link Hogthrob, the pig
Zoot, the human
Dr. Bunsen Honeydew, the human
Beaker, the human
Swedish Chef, the human

Animal, the animal
Beaker, the human
Camilla, the chicken
Dr. Bunsen Honeydew, the human
Dr. Strangepork, the pig
Fozzie, the bear
Gonzo, the whatever
Kermit, the frog
Link Hogthrob, the pig
Piggy, the pig
Robin, the frog
Sam, the eagle
Swedish Chef, the human
Sweetums, the monster
Zoot, the human

Kermit, the frog
Gonzo, the whatever
Couldn't find Janice.
@end smallexample


@node Hash Search Function
@section The @code{hsearch} function.

The functions mentioned so far in this chapter are for searching in a sorted
or unsorted array.  There are other methods to organize information
which later should be searched.  The costs of insert, delete and search
differ.  One possible implementation is using hashing tables.
The following functions are declared in the header file @file{search.h}.

@deftypefun int hcreate (size_t @var{nel})
@standards{SVID, search.h}
@safety{@prelim{}@mtunsafe{@mtasurace{:hsearch}}@asunsafe{@ascuheap{}}@acunsafe{@acucorrupt{} @acsmem{}}}
@c hcreate @mtasurace:hsearch @ascuheap @acucorrupt @acsmem
@c  hcreate_r dup @mtsrace:htab @ascuheap @acucorrupt @acsmem
The @code{hcreate} function creates a hashing table which can contain at
least @var{nel} elements.  There is no possibility to grow this table so
it is necessary to choose the value for @var{nel} wisely.  The method
used to implement this function might make it necessary to make the
number of elements in the hashing table larger than the expected maximal
number of elements.  Hashing tables usually work inefficiently if they are
filled 80% or more.  The constant access time guaranteed by hashing can
only be achieved if few collisions exist.  See Knuth's ``The Art of
Computer Programming, Part 3: Searching and Sorting'' for more
information.

The weakest aspect of this function is that there can be at most one
hashing table used through the whole program.  The table is allocated
in local memory out of control of the programmer.  As an extension @theglibc{}
provides an additional set of functions with a reentrant
interface which provides a similar interface but which allows keeping
arbitrarily many hashing tables.

It is possible to use more than one hashing table in the program run if
the former table is first destroyed by a call to @code{hdestroy}.

The function returns a non-zero value if successful.  If it returns zero,
something went wrong.  This could either mean there is already a hashing
table in use or the program ran out of memory.
@end deftypefun

@deftypefun void hdestroy (void)
@standards{SVID, search.h}
@safety{@prelim{}@mtunsafe{@mtasurace{:hsearch}}@asunsafe{@ascuheap{}}@acunsafe{@acucorrupt{} @acsmem{}}}
@c hdestroy @mtasurace:hsearch @ascuheap @acucorrupt @acsmem
@c  hdestroy_r dup @mtsrace:htab @ascuheap @acucorrupt @acsmem
The @code{hdestroy} function can be used to free all the resources
allocated in a previous call of @code{hcreate}.  After a call to this
function it is again possible to call @code{hcreate} and allocate a new
table with possibly different size.

It is important to remember that the elements contained in the hashing
table at the time @code{hdestroy} is called are @emph{not} freed by this
function.  It is the responsibility of the program code to free those
strings (if necessary at all).  Freeing all the element memory is not
possible without extra, separately kept information since there is no
function to iterate through all available elements in the hashing table.
If it is really necessary to free a table and all elements the
programmer has to keep a list of all table elements and before calling
@code{hdestroy} s/he has to free all element's data using this list.
This is a very unpleasant mechanism and it also shows that this kind of
hashing table is mainly meant for tables which are created once and
used until the end of the program run.
@end deftypefun

Entries of the hashing table and keys for the search are defined using
this type:

@deftp {Data type} ENTRY
@table @code
@item char *key
Pointer to a zero-terminated string of characters describing the key for
the search or the element in the hashing table.

This is a limiting restriction of the functionality of the
@code{hsearch} functions: They can only be used for data sets which
use the NUL character always and solely to terminate keys.  It is not
possible to handle general binary data for keys.

@item void *data
Generic pointer for use by the application.  The hashing table
implementation preserves this pointer in entries, but does not use it
in any way otherwise.
@end table
@end deftp

@deftp {Data type} {struct entry}
The underlying type of @code{ENTRY}.
@end deftp

@