From fa8d436c87f156d18208df3819fecee9fc1dbd9e Mon Sep 17 00:00:00 2001
From: Ulrich Drepper <drepper@redhat.com>
Date: Tue, 29 Jan 2002 07:54:51 +0000
Subject: Update.

2002-01-18  Wolfram Gloger  <wg@malloc.de>

	* malloc/malloc.c: Rewrite, adapted from Doug Lea's malloc-2.7.0.c.
	* malloc/malloc.h: Likewise.
	* malloc/arena.c: New file.
	* malloc/hooks.c: New file.
	* malloc/tst-mallocstate.c: New file.
	* malloc/Makefile: Add new testcase tst-mallocstate.
	Add arena.c and hooks.c to distribute.  Fix commented CPPFLAGS.

2002-01-28  Ulrich Drepper  <drepper@redhat.com>

	* stdlib/msort.c: Remove last patch.  The optimization violates the
	same rule which qsort.c had problems with.

2002-01-27  Paul Eggert  <eggert@twinsun.com>

	* stdlib/qsort.c (_quicksort): Do not apply the comparison function
	to a pivot element that lies outside the array to be sorted, as
	ISO C99 requires that the comparison function be called only with
	addresses of array elements [PR libc/2880].
---
 malloc/malloc.c | 8084 ++++++++++++++++++++++++++++---------------------------
 1 file changed, 4167 insertions(+), 3917 deletions(-)

(limited to 'malloc/malloc.c')

diff --git a/malloc/malloc.c b/malloc/malloc.c
index 8279ddaf22..e663f84707 100644
--- a/malloc/malloc.c
+++ b/malloc/malloc.c
@@ -1,32 +1,47 @@
 /* Malloc implementation for multiple threads without lock contention.
    Copyright (C) 1996,1997,1998,1999,2000,2001 Free Software Foundation, Inc.
    This file is part of the GNU C Library.
-   Contributed by Wolfram Gloger <wmglo@dent.med.uni-muenchen.de>
-   and Doug Lea <dl@cs.oswego.edu>, 1996.
+   Contributed by Wolfram Gloger <wg@malloc.de>
+   and Doug Lea <dl@cs.oswego.edu>, 2001.
 
    The GNU C Library is free software; you can redistribute it and/or
-   modify it under the terms of the GNU Lesser General Public
-   License as published by the Free Software Foundation; either
-   version 2.1 of the License, or (at your option) any later version.
+   modify it under the terms of the GNU Library General Public License as
+   published by the Free Software Foundation; either version 2 of the
+   License, or (at your option) any later version.
 
    The GNU C Library is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-   Lesser General Public License for more details.
+   Library General Public License for more details.
 
-   You should have received a copy of the GNU Lesser General Public
-   License along with the GNU C Library; if not, write to the Free
-   Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA
-   02111-1307 USA.  */
+   You should have received a copy of the GNU Library General Public
+   License along with the GNU C Library; see the file COPYING.LIB.  If not,
+   write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330,
+   Boston, MA 02111-1307, USA.  */
 
-/* $Id$
+/*
+  This is a version (aka ptmalloc2) of malloc/free/realloc written by
+  Doug Lea and adapted to multiple threads/arenas by Wolfram Gloger.
+
+* Version ptmalloc2-20011215
+  $Id$
+  based on:
+  VERSION 2.7.0 Sun Mar 11 14:14:06 2001  Doug Lea  (dl at gee)
 
-  This work is mainly derived from malloc-2.6.4 by Doug Lea
-  <dl@cs.oswego.edu>, which is available from:
+   Note: There may be an updated version of this malloc obtainable at
+           http://www.malloc.de/malloc/ptmalloc2.tar.gz
+         Check before installing!
 
-                 ftp://g.oswego.edu/pub/misc/malloc.c
+* Quickstart
 
-  Most of the original comments are reproduced in the code below.
+  In order to compile this implementation, a Makefile is provided with
+  the ptmalloc2 distribution, which has pre-defined targets for some
+  popular systems (e.g. "make posix" for Posix threads).  All that is
+  typically required with regard to compiler flags is the selection of
+  the thread package via defining one out of USE_PTHREADS, USE_THR or
+  USE_SPROC.  Check the thread-m.h file for what effects this has.
+  Many/most systems will additionally require USE_TSD_DATA_HACK to be
+  defined, so this is the default for "make posix".
 
 * Why use this malloc?
 
@@ -34,85 +49,62 @@
   most tunable malloc ever written. However it is among the fastest
   while also being among the most space-conserving, portable and tunable.
   Consistent balance across these factors results in a good general-purpose
-  allocator. For a high-level description, see
-     http://g.oswego.edu/dl/html/malloc.html
-
-  On many systems, the standard malloc implementation is by itself not
-  thread-safe, and therefore wrapped with a single global lock around
-  all malloc-related functions.  In some applications, especially with
-  multiple available processors, this can lead to contention problems
-  and bad performance.  This malloc version was designed with the goal
-  to avoid waiting for locks as much as possible.  Statistics indicate
-  that this goal is achieved in many cases.
-
-* Synopsis of public routines
-
-  (Much fuller descriptions are contained in the program documentation below.)
-
-  ptmalloc_init();
-     Initialize global configuration.  When compiled for multiple threads,
-     this function must be called once before any other function in the
-     package.  It is not required otherwise.  It is called automatically
-     in the Linux/GNU C libray or when compiling with MALLOC_HOOKS.
-  malloc(size_t n);
-     Return a pointer to a newly allocated chunk of at least n bytes, or null
-     if no space is available.
-  free(Void_t* p);
-     Release the chunk of memory pointed to by p, or no effect if p is null.
-  realloc(Void_t* p, size_t n);
-     Return a pointer to a chunk of size n that contains the same data
-     as does chunk p up to the minimum of (n, p's size) bytes, or null
-     if no space is available. The returned pointer may or may not be
-     the same as p. If p is null, equivalent to malloc.  Unless the
-     #define REALLOC_ZERO_BYTES_FREES below is set, realloc with a
-     size argument of zero (re)allocates a minimum-sized chunk.
-  memalign(size_t alignment, size_t n);
-     Return a pointer to a newly allocated chunk of n bytes, aligned
-     in accord with the alignment argument, which must be a power of
-     two.
-  valloc(size_t n);
-     Equivalent to memalign(pagesize, n), where pagesize is the page
-     size of the system (or as near to this as can be figured out from
-     all the includes/defines below.)
-  pvalloc(size_t n);
-     Equivalent to valloc(minimum-page-that-holds(n)), that is,
-     round up n to nearest pagesize.
-  calloc(size_t unit, size_t quantity);
-     Returns a pointer to quantity * unit bytes, with all locations
-     set to zero.
-  cfree(Void_t* p);
-     Equivalent to free(p).
-  malloc_trim(size_t pad);
-     Release all but pad bytes of freed top-most memory back
-     to the system. Return 1 if successful, else 0.
-  malloc_usable_size(Void_t* p);
-     Report the number usable allocated bytes associated with allocated
-     chunk p. This may or may not report more bytes than were requested,
-     due to alignment and minimum size constraints.
-  malloc_stats();
-     Prints brief summary statistics on stderr.
-  mallinfo()
-     Returns (by copy) a struct containing various summary statistics.
-  mallopt(int parameter_number, int parameter_value)
-     Changes one of the tunable parameters described below. Returns
-     1 if successful in changing the parameter, else 0.
+  allocator for malloc-intensive programs.
+
+  The main properties of the algorithms are:
+  * For large (>= 512 bytes) requests, it is a pure best-fit allocator,
+    with ties normally decided via FIFO (i.e. least recently used).
+  * For small (<= 64 bytes by default) requests, it is a caching
+    allocator that maintains pools of quickly recycled chunks.
+  * In between, and for combinations of large and small requests, it does
+    the best it can trying to meet both goals at once.
+  * For very large requests (>= 128KB by default), it relies on system
+    memory mapping facilities, if supported.
+
+  For a longer but slightly out of date high-level description, see
+     http://gee.cs.oswego.edu/dl/html/malloc.html
+
+  You may already by default be using a C library containing a malloc
+  that is based on some version of this malloc (for example in
+  Linux). You might still want to use the one in this file in order to
+  customize settings or to avoid overheads associated with library
+  versions.
+
+* Contents, described in more detail in "description of public routines" below.
+
+  Standard (ANSI/SVID/...)  functions:
+    malloc(size_t n);
+    calloc(size_t n_elements, size_t element_size);
+    free(Void_t* p);
+    realloc(Void_t* p, size_t n);
+    memalign(size_t alignment, size_t n);
+    valloc(size_t n);
+    mallinfo()
+    mallopt(int parameter_number, int parameter_value)
+
+  Additional functions:
+    independent_calloc(size_t n_elements, size_t size, Void_t* chunks[]);
+    independent_comalloc(size_t n_elements, size_t sizes[], Void_t* chunks[]);
+    pvalloc(size_t n);
+    cfree(Void_t* p);
+    malloc_trim(size_t pad);
+    malloc_usable_size(Void_t* p);
+    malloc_stats();
 
 * Vital statistics:
 
-  Alignment:                            8-byte
-       8 byte alignment is currently hardwired into the design.  This
-       seems to suffice for all current machines and C compilers.
-
-  Assumed pointer representation:       4 or 8 bytes
-       Code for 8-byte pointers is untested by me but has worked
-       reliably by Wolfram Gloger, who contributed most of the
-       changes supporting this.
-
-  Assumed size_t  representation:       4 or 8 bytes
+  Supported pointer representation:       4 or 8 bytes
+  Supported size_t  representation:       4 or 8 bytes 
        Note that size_t is allowed to be 4 bytes even if pointers are 8.
+       You can adjust this by defining INTERNAL_SIZE_T.
+
+  Alignment:                              2 * sizeof(size_t) (default)
+       (i.e., 8-byte alignment with a 4-byte size_t). This suffices for
+       nearly all current machines and C compilers. However, you can
+       define MALLOC_ALIGNMENT to be wider than this if necessary.
 
-  Minimum overhead per allocated chunk: 4 or 8 bytes
-       Each malloced chunk has a hidden overhead of 4 bytes holding size
+  Minimum overhead per allocated chunk:   4 or 8 bytes
+       Each malloced chunk has a hidden word of overhead holding size
        and status information.
 
   Minimum allocated size: 4-byte ptrs:  16 bytes    (including 4 overhead)
@@ -120,182 +112,136 @@
 
        When a chunk is freed, 12 (for 4byte ptrs) or 20 (for 8 byte
        ptrs but 4 byte size) or 24 (for 8/8) additional bytes are
-       needed; 4 (8) for a trailing size field
-       and 8 (16) bytes for free list pointers. Thus, the minimum
-       allocatable size is 16/24/32 bytes.
+       needed; 4 (8) for a trailing size field and 8 (16) bytes for
+       free list pointers. Thus, the minimum allocatable size is
+       16/24/32 bytes.
 
        Even a request for zero bytes (i.e., malloc(0)) returns a
        pointer to something of the minimum allocatable size.
 
-  Maximum allocated size: 4-byte size_t: 2^31 -  8 bytes
-                          8-byte size_t: 2^63 - 16 bytes
+       The maximum overhead wastage (i.e., the number of bytes
+       allocated beyond those requested in malloc) is less than or equal
+       to the minimum size, except for requests >= mmap_threshold that
+       are serviced via mmap(), where the worst case wastage is 2 *
+       sizeof(size_t) bytes plus the remainder from a system page (the
+       minimal mmap unit); typically 4096 or 8192 bytes.
 
-       It is assumed that (possibly signed) size_t bit values suffice to
+  Maximum allocated size:  4-byte size_t: 2^32 minus about two pages 
+                           8-byte size_t: 2^64 minus about two pages
+
+       It is assumed that (possibly signed) size_t values suffice to
        represent chunk sizes. `Possibly signed' is due to the fact
        that `size_t' may be defined on a system as either a signed or
-       an unsigned type. To be conservative, values that would appear
-       as negative numbers are avoided.
-       Requests for sizes with a negative sign bit will return a
-       minimum-sized chunk.
-
-  Maximum overhead wastage per allocated chunk: normally 15 bytes
-
-       Alignment demands, plus the minimum allocatable size restriction
-       make the normal worst-case wastage 15 bytes (i.e., up to 15
-       more bytes will be allocated than were requested in malloc), with
-       two exceptions:
-         1. Because requests for zero bytes allocate non-zero space,
-            the worst case wastage for a request of zero bytes is 24 bytes.
-         2. For requests >= mmap_threshold that are serviced via
-            mmap(), the worst case wastage is 8 bytes plus the remainder
-            from a system page (the minimal mmap unit); typically 4096 bytes.
-
-* Limitations
-
-    Here are some features that are NOT currently supported
-
-    * No automated mechanism for fully checking that all accesses
-      to malloced memory stay within their bounds.
-    * No support for compaction.
+       an unsigned type. The ISO C standard says that it must be
+       unsigned, but a few systems are known not to adhere to this.
+       Additionally, even when size_t is unsigned, sbrk (which is by
+       default used to obtain memory from the system) accepts signed
+       arguments, and may not be able to handle size_t-wide arguments
+       with the sign bit set.  Generally, values that would
+       appear as negative after accounting for overhead and alignment
+       are supported only via mmap(), which does not have this
+       limitation.
+
+       Requests for sizes outside the allowed range will perform an optional
+       failure action and then return null. (Requests may also
+       fail because the system is out of memory.)
+
+  Thread-safety: thread-safe unless NO_THREADS is defined
+
+  Compliance: I believe it is compliant with the 1997 Single Unix Specification
+       (See http://www.opennc.org). Also SVID/XPG, ANSI C, and probably 
+       others as well.
 
 * Synopsis of compile-time options:
 
     People have reported using previous versions of this malloc on all
     versions of Unix, sometimes by tweaking some of the defines
     below. It has been tested most extensively on Solaris and
-    Linux. People have also reported adapting this malloc for use in
-    stand-alone embedded systems.
-
-    The implementation is in straight, hand-tuned ANSI C.  Among other
-    consequences, it uses a lot of macros.  Because of this, to be at
-    all usable, this code should be compiled using an optimizing compiler
-    (for example gcc -O2) that can simplify expressions and control
-    paths.
-
-  __STD_C                  (default: derived from C compiler defines)
-     Nonzero if using ANSI-standard C compiler, a C++ compiler, or
-     a C compiler sufficiently close to ANSI to get away with it.
-  MALLOC_DEBUG             (default: NOT defined)
-     Define to enable debugging. Adds fairly extensive assertion-based
-     checking to help track down memory errors, but noticeably slows down
-     execution.
-  MALLOC_HOOKS             (default: NOT defined)
-     Define to enable support run-time replacement of the allocation
-     functions through user-defined `hooks'.
-  REALLOC_ZERO_BYTES_FREES (default: defined)
-     Define this if you think that realloc(p, 0) should be equivalent
-     to free(p).  (The C standard requires this behaviour, therefore
-     it is the default.)  Otherwise, since malloc returns a unique
-     pointer for malloc(0), so does realloc(p, 0).
-  HAVE_MEMCPY               (default: defined)
-     Define if you are not otherwise using ANSI STD C, but still
-     have memcpy and memset in your C library and want to use them.
-     Otherwise, simple internal versions are supplied.
-  USE_MEMCPY               (default: 1 if HAVE_MEMCPY is defined, 0 otherwise)
-     Define as 1 if you want the C library versions of memset and
-     memcpy called in realloc and calloc (otherwise macro versions are used).
-     At least on some platforms, the simple macro versions usually
-     outperform libc versions.
-  HAVE_MMAP                 (default: defined as 1)
-     Define to non-zero to optionally make malloc() use mmap() to
-     allocate very large blocks.
-  HAVE_MREMAP                 (default: defined as 0 unless Linux libc set)
-     Define to non-zero to optionally make realloc() use mremap() to
-     reallocate very large blocks.
-  USE_ARENAS                (default: the same as HAVE_MMAP)
-     Enable support for multiple arenas, allocated using mmap().
-  malloc_getpagesize        (default: derived from system #includes)
-     Either a constant or routine call returning the system page size.
-  HAVE_USR_INCLUDE_MALLOC_H (default: NOT defined)
-     Optionally define if you are on a system with a /usr/include/malloc.h
-     that declares struct mallinfo. It is not at all necessary to
-     define this even if you do, but will ensure consistency.
-  INTERNAL_SIZE_T           (default: size_t)
-     Define to a 32-bit type (probably `unsigned int') if you are on a
-     64-bit machine, yet do not want or need to allow malloc requests of
-     greater than 2^31 to be handled. This saves space, especially for
-     very small chunks.
-  _LIBC                     (default: NOT defined)
-     Defined only when compiled as part of the Linux libc/glibc.
-     Also note that there is some odd internal name-mangling via defines
-     (for example, internally, `malloc' is named `mALLOc') needed
-     when compiling in this case. These look funny but don't otherwise
-     affect anything.
-  LACKS_UNISTD_H            (default: undefined)
-     Define this if your system does not have a <unistd.h>.
-  MORECORE                  (default: sbrk)
-     The name of the routine to call to obtain more memory from the system.
-  MORECORE_FAILURE          (default: -1)
-     The value returned upon failure of MORECORE.
-  MORECORE_CLEARS           (default 1)
-     The degree to which the routine mapped to MORECORE zeroes out
-     memory: never (0), only for newly allocated space (1) or always
-     (2).  The distinction between (1) and (2) is necessary because on
-     some systems, if the application first decrements and then
-     increments the break value, the contents of the reallocated space
-     are unspecified.
-  DEFAULT_TRIM_THRESHOLD
-  DEFAULT_TOP_PAD
-  DEFAULT_MMAP_THRESHOLD
-  DEFAULT_MMAP_MAX
-     Default values of tunable parameters (described in detail below)
-     controlling interaction with host system routines (sbrk, mmap, etc).
-     These values may also be changed dynamically via mallopt(). The
-     preset defaults are those that give best performance for typical
-     programs/systems.
-  DEFAULT_CHECK_ACTION
-     When the standard debugging hooks are in place, and a pointer is
-     detected as corrupt, do nothing (0), print an error message (1),
-     or call abort() (2).
-
-
-*/
+    Linux. It is also reported to work on WIN32 platforms.
+    People also report using it in stand-alone embedded systems.
+
+    The implementation is in straight, hand-tuned ANSI C.  It is not
+    at all modular. (Sorry!)  It uses a lot of macros.  To be at all
+    usable, this code should be compiled using an optimizing compiler
+    (for example gcc -O3) that can simplify expressions and control
+    paths. (FAQ: some macros import variables as arguments rather than
+    declare locals because people reported that some debuggers
+    otherwise get confused.)
+
+    OPTION                     DEFAULT VALUE
+
+    Compilation Environment options:
+
+    __STD_C                    derived from C compiler defines
+    WIN32                      NOT defined
+    HAVE_MEMCPY                defined
+    USE_MEMCPY                 1 if HAVE_MEMCPY is defined
+    HAVE_MMAP                  defined as 1 
+    MMAP_CLEARS                1
+    HAVE_MREMAP                0 unless linux defined
+    USE_ARENAS                 the same as HAVE_MMAP
+    malloc_getpagesize         derived from system #includes, or 4096 if not
+    HAVE_USR_INCLUDE_MALLOC_H  NOT defined
+    LACKS_UNISTD_H             NOT defined unless WIN32
+    LACKS_SYS_PARAM_H          NOT defined unless WIN32
+    LACKS_SYS_MMAN_H           NOT defined unless WIN32
+
+    Changing default word sizes:
+
+    INTERNAL_SIZE_T            size_t
+    MALLOC_ALIGNMENT           2 * sizeof(INTERNAL_SIZE_T)
+
+    Configuration and functionality options:
+
+    USE_DL_PREFIX              NOT defined
+    USE_PUBLIC_MALLOC_WRAPPERS NOT defined
+    USE_MALLOC_LOCK            NOT defined
+    MALLOC_DEBUG               NOT defined
+    REALLOC_ZERO_BYTES_FREES   1
+    MALLOC_FAILURE_ACTION      errno = ENOMEM, if __STD_C defined, else no-op
+    TRIM_FASTBINS              0
+
+    Options for customizing MORECORE:
+
+    MORECORE                   sbrk
+    MORECORE_FAILURE           -1
+    MORECORE_CONTIGUOUS        1 
+    MORECORE_CANNOT_TRIM       NOT defined
+    MORECORE_CLEARS            1
+    MMAP_AS_MORECORE_SIZE      (1024 * 1024) 
+
+    Tuning options that are also dynamically changeable via mallopt:
+
+    DEFAULT_MXFAST             64
+    DEFAULT_TRIM_THRESHOLD     128 * 1024
+    DEFAULT_TOP_PAD            0
+    DEFAULT_MMAP_THRESHOLD     128 * 1024
+    DEFAULT_MMAP_MAX           65536
+
+    There are several other #defined constants and macros that you
+    probably don't want to touch unless you are extending or adapting malloc.  */
 
 /*
-
-* Compile-time options for multiple threads:
-
-  USE_PTHREADS, USE_THR, USE_SPROC
-     Define one of these as 1 to select the thread interface:
-     POSIX threads, Solaris threads or SGI sproc's, respectively.
-     If none of these is defined as non-zero, you get a `normal'
-     malloc implementation which is not thread-safe.  Support for
-     multiple threads requires HAVE_MMAP=1.  As an exception, when
-     compiling for GNU libc, i.e. when _LIBC is defined, then none of
-     the USE_... symbols have to be defined.
-
-  HEAP_MIN_SIZE
-  HEAP_MAX_SIZE
-     When thread support is enabled, additional `heap's are created
-     with mmap calls.  These are limited in size; HEAP_MIN_SIZE should
-     be a multiple of the page size, while HEAP_MAX_SIZE must be a power
-     of two for alignment reasons.  HEAP_MAX_SIZE should be at least
-     twice as large as the mmap threshold.
-  THREAD_STATS
-     When this is defined as non-zero, some statistics on mutex locking
-     are computed.
-
+  __STD_C should be nonzero if using an ANSI-standard C compiler, a C++
+  compiler, or a C compiler sufficiently close to ANSI to get away
+  with it.
 */
 
-
-
-
-/* Preliminaries */
-
 #ifndef __STD_C
-#if defined (__STDC__)
-#define __STD_C     1
-#else
-#if __cplusplus
+#if defined(__STDC__) || defined(__cplusplus)
 #define __STD_C     1
 #else
 #define __STD_C     0
-#endif /*__cplusplus*/
-#endif /*__STDC__*/
+#endif 
 #endif /*__STD_C*/
 
+
+/*
+  Void_t* is the pointer type that malloc should say it returns
+*/
+
 #ifndef Void_t
-#if __STD_C
+#if (__STD_C || defined(WIN32))
 #define Void_t      void
 #else
 #define Void_t      char
@@ -303,57 +249,59 @@
 #endif /*Void_t*/
 
 #if __STD_C
-# include <stddef.h>   /* for size_t */
-# if defined _LIBC || defined MALLOC_HOOKS
-#  include <stdlib.h>  /* for getenv(), abort() */
-# endif
+#include <stddef.h>   /* for size_t */
+#include <stdlib.h>   /* for getenv(), abort() */
 #else
-# include <sys/types.h>
-# if defined _LIBC || defined MALLOC_HOOKS
-extern char* getenv();
-# endif
+#include <sys/types.h>
 #endif
 
-/* Macros for handling mutexes and thread-specific data.  This is
-   included early, because some thread-related header files (such as
-   pthread.h) should be included before any others. */
-#include "thread-m.h"
-
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <errno.h>
-#include <stdio.h>    /* needed for malloc_stats */
+/* define LACKS_UNISTD_H if your system does not have a <unistd.h>. */
 
+/* #define  LACKS_UNISTD_H */
 
-/*
-  Compile-time options
-*/
+#ifndef LACKS_UNISTD_H
+#include <unistd.h>
+#endif
 
+/* define LACKS_SYS_PARAM_H if your system does not have a <sys/param.h>. */
+
+/* #define  LACKS_SYS_PARAM_H */
+
+
+#include <stdio.h>    /* needed for malloc_stats */
+#include <errno.h>    /* needed for optional MALLOC_FAILURE_ACTION */
 
-/*
-    Debugging:
-
-    Because freed chunks may be overwritten with link fields, this
-    malloc will often die when freed memory is overwritten by user
-    programs.  This can be very effective (albeit in an annoying way)
-    in helping track down dangling pointers.
-
-    If you compile with -DMALLOC_DEBUG, a number of assertion checks are
-    enabled that will catch more memory errors. You probably won't be
-    able to make much sense of the actual assertion errors, but they
-    should help you locate incorrectly overwritten memory.  The
-    checking is fairly extensive, and will slow down execution
-    noticeably. Calling malloc_stats or mallinfo with MALLOC_DEBUG set will
-    attempt to check every non-mmapped allocated and free chunk in the
-    course of computing the summaries. (By nature, mmapped regions
-    cannot be checked very much automatically.)
-
-    Setting MALLOC_DEBUG may also be helpful if you are trying to modify
-    this code. The assertions in the check routines spell out in more
-    detail the assumptions and invariants underlying the algorithms.
 
+/*
+  Debugging:
+
+  Because freed chunks may be overwritten with bookkeeping fields, this
+  malloc will often die when freed memory is overwritten by user
+  programs.  This can be very effective (albeit in an annoying way)
+  in helping track down dangling pointers.
+
+  If you compile with -DMALLOC_DEBUG, a number of assertion checks are
+  enabled that will catch more memory errors. You probably won't be
+  able to make much sense of the actual assertion errors, but they
+  should help you locate incorrectly overwritten memory.  The checking
+  is fairly extensive, and will slow down execution
+  noticeably. Calling malloc_stats or mallinfo with MALLOC_DEBUG set
+  will attempt to check every non-mmapped allocated and free chunk in
+  the course of computing the summaries. (By nature, mmapped regions
+  cannot be checked very much automatically.)
+
+  Setting MALLOC_DEBUG may also be helpful if you are trying to modify
+  this code. The assertions in the check routines spell out in more
+  detail the assumptions and invariants underlying the algorithms.
+
+  Setting MALLOC_DEBUG does NOT provide an automated mechanism for
+  checking that all accesses to malloced memory stay within their
+  bounds. However, there are several add-ons and adaptations of this
+  or other mallocs available that do this.
 */
 
 #if MALLOC_DEBUG
@@ -365,42 +313,197 @@ extern "C" {
 
 /*
   INTERNAL_SIZE_T is the word-size used for internal bookkeeping
-  of chunk sizes. On a 64-bit machine, you can reduce malloc
-  overhead by defining INTERNAL_SIZE_T to be a 32 bit `unsigned int'
-  at the expense of not being able to handle requests greater than
-  2^31. This limitation is hardly ever a concern; you are encouraged
-  to set this. However, the default version is the same as size_t.
+  of chunk sizes.
+
+  The default version is the same as size_t.
+
+  While not strictly necessary, it is best to define this as an
+  unsigned type, even if size_t is a signed type. This may avoid some
+  artificial size limitations on some systems.
+
+  On a 64-bit machine, you may be able to reduce malloc overhead by
+  defining INTERNAL_SIZE_T to be a 32-bit `unsigned int' at the
+  expense of not being able to handle more than 2^32 bytes of malloced
+  space. If this limitation is acceptable, you are encouraged to set
+  this unless you are on a platform requiring 16-byte alignments. In
+  that case the alignment requirements turn out to negate any
+  potential advantages of decreasing size_t word size.
+
+  Implementors: Beware of the possible combinations of:
+     - INTERNAL_SIZE_T might be signed or unsigned, might be 32 or 64 bits,
+       and might be the same width as int or as long
+     - size_t might have a different width and signedness than INTERNAL_SIZE_T
+     - int and long might be 32 or 64 bits, and might be the same width
+  To deal with this, most comparisons and difference computations
+  among INTERNAL_SIZE_Ts should cast them to unsigned long, being
+  aware of the fact that casting an unsigned int to a wider long does
+  not sign-extend. (This also makes checking for negative numbers
+  awkward.) Some of these casts result in harmless compiler warnings
+  on some systems.
 */
 
 #ifndef INTERNAL_SIZE_T
 #define INTERNAL_SIZE_T size_t
 #endif
 
+/* The corresponding word size */
+#define SIZE_SZ                (sizeof(INTERNAL_SIZE_T))
+
+
+/*
+  MALLOC_ALIGNMENT is the minimum alignment for malloc'ed chunks.
+  It must be a power of two at least 2 * SIZE_SZ, even on machines
+  for which smaller alignments would suffice. It may be defined as
+  larger than this though. Note however that code and data structures
+  are optimized for the case of 8-byte alignment.
+*/
+
+
+#ifndef MALLOC_ALIGNMENT
+#define MALLOC_ALIGNMENT       (2 * SIZE_SZ)
+#endif
+
+/* The corresponding bit mask value */
+#define MALLOC_ALIGN_MASK      (MALLOC_ALIGNMENT - 1)
+
+
+
+/*
+  REALLOC_ZERO_BYTES_FREES should be set if a call to
+  realloc with zero bytes should be the same as a call to free.
+  This is required by the C standard. Otherwise, since this malloc
+  returns a unique pointer for malloc(0), so does realloc(p, 0).
+*/
+
+#ifndef REALLOC_ZERO_BYTES_FREES
+#define REALLOC_ZERO_BYTES_FREES 1
+#endif
+
+/*
+  TRIM_FASTBINS controls whether free() of a very small chunk can
+  immediately lead to trimming. Setting to true (1) can reduce memory
+  footprint, but will almost always slow down programs that use a lot
+  of small chunks.
+
+  Define this only if you are willing to give up some speed to more
+  aggressively reduce system-level memory footprint when releasing
+  memory in programs that use many small chunks.  You can get
+  essentially the same effect by setting MXFAST to 0, but this can
+  lead to even greater slowdowns in programs using many small chunks.
+  TRIM_FASTBINS is an in-between compile-time option that prevents
+  only those chunks bordering topmost memory from being placed in
+  fastbins.
+*/
+
+#ifndef TRIM_FASTBINS
+#define TRIM_FASTBINS  0
+#endif
+
+
 /*
-  REALLOC_ZERO_BYTES_FREES should be set if a call to realloc with
-  zero bytes should be the same as a call to free.  The C standard
-  requires this. Otherwise, since this malloc returns a unique pointer
-  for malloc(0), so does realloc(p, 0).
+  USE_DL_PREFIX will prefix all public routines with the string 'dl'.
+  This is necessary when you only want to use this malloc in one part 
+  of a program, using your regular system malloc elsewhere.
+*/
+
+/* #define USE_DL_PREFIX */
+
+
+/* 
+   Two-phase name translation.
+   All of the actual routines are given mangled names.
+   When wrappers are used, they become the public callable versions.
+   When USE_DL_PREFIX is defined, the callable names are prefixed.
 */
 
+#ifdef USE_DL_PREFIX
+#define public_cALLOc    dlcalloc
+#define public_fREe      dlfree
+#define public_cFREe     dlcfree
+#define public_mALLOc    dlmalloc
+#define public_mEMALIGn  dlmemalign
+#define public_rEALLOc   dlrealloc
+#define public_vALLOc    dlvalloc
+#define public_pVALLOc   dlpvalloc
+#define public_mALLINFo  dlmallinfo
+#define public_mALLOPt   dlmallopt
+#define public_mTRIm     dlmalloc_trim
+#define public_mSTATs    dlmalloc_stats
+#define public_mUSABLe   dlmalloc_usable_size
+#define public_iCALLOc   dlindependent_calloc
+#define public_iCOMALLOc dlindependent_comalloc
+#define public_gET_STATe dlget_state
+#define public_sET_STATe dlset_state
+#else /* USE_DL_PREFIX */
+#ifdef _LIBC
+
+/* Special defines for the GNU C library.  */
+#define public_cALLOc    __libc_calloc
+#define public_fREe      __libc_free
+#define public_cFREe     __libc_cfree
+#define public_mALLOc    __libc_malloc
+#define public_mEMALIGn  __libc_memalign
+#define public_rEALLOc   __libc_realloc
+#define public_vALLOc    __libc_valloc
+#define public_pVALLOc   __libc_pvalloc
+#define public_mALLINFo  __libc_mallinfo
+#define public_mALLOPt   __libc_mallopt
+#define public_mTRIm     __malloc_trim
+#define public_mSTATs    __malloc_stats
+#define public_mUSABLe   __malloc_usable_size
+#define public_iCALLOc   __libc_independent_calloc
+#define public_iCOMALLOc __libc_independent_comalloc
+#define public_gET_STATe __malloc_get_state
+#define public_sET_STATe __malloc_set_state
+#define malloc_getpagesize __getpagesize()
+#define open             __open
+#define mmap             __mmap
+#define munmap           __munmap
+#define mremap           __mremap
+#define mprotect         __mprotect
+#define MORECORE         (*__morecore)
+#define MORECORE_FAILURE 0
+
+Void_t * __default_morecore (ptrdiff_t);
+Void_t *(*__morecore)(ptrdiff_t) = __default_morecore;
 
-#define REALLOC_ZERO_BYTES_FREES
+#else /* !_LIBC */
+#define public_cALLOc    calloc
+#define public_fREe      free
+#define public_cFREe     cfree
+#define public_mALLOc    malloc
+#define public_mEMALIGn  memalign
+#define public_rEALLOc   realloc
+#define public_vALLOc    valloc
+#define public_pVALLOc   pvalloc
+#define public_mALLINFo  mallinfo
+#define public_mALLOPt   mallopt
+#define public_mTRIm     malloc_trim
+#define public_mSTATs    malloc_stats
+#define public_mUSABLe   malloc_usable_size
+#define public_iCALLOc   independent_calloc
+#define public_iCOMALLOc independent_comalloc
+#define public_gET_STATe malloc_get_state
+#define public_sET_STATe malloc_set_state
+#endif /* _LIBC */
+#endif /* USE_DL_PREFIX */
 
 
 /*
   HAVE_MEMCPY should be defined if you are not otherwise using
   ANSI STD C, but still have memcpy and memset in your C library
   and want to use them in calloc and realloc. Otherwise simple
-  macro versions are defined here.
+  macro versions are defined below.
 
   USE_MEMCPY should be defined as 1 if you actually want to
   have memset and memcpy called. People report that the macro
-  versions are often enough faster than libc versions on many
-  systems that it is better to use them.
-
+  versions are faster than libc versions on some systems.
+  
+  Even if USE_MEMCPY is set to 1, loops to copy/clear small chunks
+  (of <= 36 bytes) are manually unrolled in realloc and calloc.
 */
 
-#define HAVE_MEMCPY 1
+#define HAVE_MEMCPY
 
 #ifndef USE_MEMCPY
 #ifdef HAVE_MEMCPY
@@ -410,125 +513,161 @@ extern "C" {
 #endif
 #endif
 
+
 #if (__STD_C || defined(HAVE_MEMCPY))
 
+#ifdef WIN32
+/* On Win32 memset and memcpy are already declared in windows.h */
+#else
 #if __STD_C
 void* memset(void*, int, size_t);
 void* memcpy(void*, const void*, size_t);
-void* memmove(void*, const void*, size_t);
 #else
 Void_t* memset();
 Void_t* memcpy();
-Void_t* memmove();
+#endif
 #endif
 #endif
 
-/* The following macros are only invoked with (2n+1)-multiples of
-   INTERNAL_SIZE_T units, with a positive integer n. This is exploited
-   for fast inline execution when n is small.  If the regions to be
-   copied do overlap, the destination lies always _below_ the source.  */
+/*
+  MALLOC_FAILURE_ACTION is the action to take before "return 0" when
+  malloc is unable to return memory, either because memory is
+  exhausted or because of illegal arguments.
+
+  By default, sets errno to ENOMEM if __STD_C is set, else does nothing.
+*/
 
-#if USE_MEMCPY
+#ifndef MALLOC_FAILURE_ACTION
+#if __STD_C
+#define MALLOC_FAILURE_ACTION \
+   errno = ENOMEM;
 
-#define MALLOC_ZERO(charp, nbytes)                                            \
-do {                                                                          \
-  INTERNAL_SIZE_T mzsz = (nbytes);                                            \
-  if(mzsz <= 9*sizeof(mzsz)) {                                                \
-    INTERNAL_SIZE_T* mz = (INTERNAL_SIZE_T*) (charp);                         \
-    if(mzsz >= 5*sizeof(mzsz)) {     *mz++ = 0;                               \
-                                     *mz++ = 0;                               \
-      if(mzsz >= 7*sizeof(mzsz)) {   *mz++ = 0;                               \
-                                     *mz++ = 0;                               \
-        if(mzsz >= 9*sizeof(mzsz)) { *mz++ = 0;                               \
-                                     *mz++ = 0; }}}                           \
-                                     *mz++ = 0;                               \
-                                     *mz++ = 0;                               \
-                                     *mz   = 0;                               \
-  } else memset((charp), 0, mzsz);                                            \
-} while(0)
+#else
+#define MALLOC_FAILURE_ACTION
+#endif
+#endif
 
-/* If the regions overlap, dest is always _below_ src.  */
+/*
+  MORECORE-related declarations. By default, rely on sbrk.
+*/
 
-#define MALLOC_COPY(dest,src,nbytes,overlap)                                  \
-do {                                                                          \
-  INTERNAL_SIZE_T mcsz = (nbytes);                                            \
-  if(mcsz <= 9*sizeof(mcsz)) {                                                \
-    INTERNAL_SIZE_T* mcsrc = (INTERNAL_SIZE_T*) (src);                        \
-    INTERNAL_SIZE_T* mcdst = (INTERNAL_SIZE_T*) (dest);                       \
-    if(mcsz >= 5*sizeof(mcsz)) {     *mcdst++ = *mcsrc++;                     \
-                                     *mcdst++ = *mcsrc++;                     \
-      if(mcsz >= 7*sizeof(mcsz)) {   *mcdst++ = *mcsrc++;                     \
-                                     *mcdst++ = *mcsrc++;                     \
-        if(mcsz >= 9*sizeof(mcsz)) { *mcdst++ = *mcsrc++;                     \
-                                     *mcdst++ = *mcsrc++; }}}                 \
-                                     *mcdst++ = *mcsrc++;                     \
-                                     *mcdst++ = *mcsrc++;                     \
-                                     *mcdst   = *mcsrc  ;                     \
-  } else if(overlap)                                                          \
-    memmove(dest, src, mcsz);                                                 \
-  else                                                                        \
-    memcpy(dest, src, mcsz);                                                  \
-} while(0)
 
-#else /* !USE_MEMCPY */
+#ifdef LACKS_UNISTD_H
+#if !defined(__FreeBSD__) && !defined(__OpenBSD__) && !defined(__NetBSD__)
+#if __STD_C
+extern Void_t*     sbrk(ptrdiff_t);
+#else
+extern Void_t*     sbrk();
+#endif
+#endif
+#endif
 
-/* Use Duff's device for good zeroing/copying performance. */
+/*
+  MORECORE is the name of the routine to call to obtain more memory
+  from the system.  See below for general guidance on writing
+  alternative MORECORE functions, as well as a version for WIN32 and a
+  sample version for pre-OSX macos.
+*/
 
-#define MALLOC_ZERO(charp, nbytes)                                            \
-do {                                                                          \
-  INTERNAL_SIZE_T* mzp = (INTERNAL_SIZE_T*)(charp);                           \
-  long mctmp = (nbytes)/sizeof(INTERNAL_SIZE_T), mcn;                         \
-  if (mctmp < 8) mcn = 0; else { mcn = (mctmp-1)/8; mctmp %= 8; }             \
-  switch (mctmp) {                                                            \
-    case 0: for(;;) { *mzp++ = 0;                                             \
-    case 7:           *mzp++ = 0;                                             \
-    case 6:           *mzp++ = 0;                                             \
-    case 5:           *mzp++ = 0;                                             \
-    case 4:           *mzp++ = 0;                                             \
-    case 3:           *mzp++ = 0;                                             \
-    case 2:           *mzp++ = 0;                                             \
-    case 1:           *mzp++ = 0; if(mcn <= 0) break; mcn--; }                \
-  }                                                                           \
-} while(0)
+#ifndef MORECORE
+#define MORECORE sbrk
+#endif
 
-/* If the regions overlap, dest is always _below_ src.  */
+/*
+  MORECORE_FAILURE is the value returned upon failure of MORECORE
+  as well as mmap. Since it cannot be an otherwise valid memory address,
+  and must reflect values of standard sys calls, you probably ought not
+  try to redefine it.
+*/
 
-#define MALLOC_COPY(dest,src,nbytes,overlap)                                  \
-do {                                                                          \
-  INTERNAL_SIZE_T* mcsrc = (INTERNAL_SIZE_T*) src;                            \
-  INTERNAL_SIZE_T* mcdst = (INTERNAL_SIZE_T*) dest;                           \
-  long mctmp = (nbytes)/sizeof(INTERNAL_SIZE_T), mcn;                         \
-  if (mctmp < 8) mcn = 0; else { mcn = (mctmp-1)/8; mctmp %= 8; }             \
-  switch (mctmp) {                                                            \
-    case 0: for(;;) { *mcdst++ = *mcsrc++;                                    \
-    case 7:           *mcdst++ = *mcsrc++;                                    \
-    case 6:           *mcdst++ = *mcsrc++;                                    \
-    case 5:           *mcdst++ = *mcsrc++;                                    \
-    case 4:           *mcdst++ = *mcsrc++;                                    \
-    case 3:           *mcdst++ = *mcsrc++;                                    \
-    case 2:           *mcdst++ = *mcsrc++;                                    \
-    case 1:           *mcdst++ = *mcsrc++; if(mcn <= 0) break; mcn--; }       \
-  }                                                                           \
-} while(0)
+#ifndef MORECORE_FAILURE
+#define MORECORE_FAILURE (-1)
+#endif
+
+/*
+  If MORECORE_CONTIGUOUS is true, take advantage of the fact that
+  consecutive calls to MORECORE with positive arguments always return
+  contiguous increasing addresses.  This is true of unix sbrk.  Even
+  if not defined, when regions happen to be contiguous, malloc will
+  permit allocations spanning regions obtained from different
+  calls. But defining this when applicable enables some stronger
+  consistency checks and space efficiencies.
+*/
 
+#ifndef MORECORE_CONTIGUOUS
+#define MORECORE_CONTIGUOUS 1
 #endif
 
+/*
+  Define MORECORE_CANNOT_TRIM if your version of MORECORE
+  cannot release space back to the system when given negative
+  arguments. This is generally necessary only if you are using
+  a hand-crafted MORECORE function that cannot handle negative arguments.
+*/
+
+/* #define MORECORE_CANNOT_TRIM */
 
-#ifndef LACKS_UNISTD_H
-#  include <unistd.h>
+/*  MORECORE_CLEARS           (default 1)
+     The degree to which the routine mapped to MORECORE zeroes out
+     memory: never (0), only for newly allocated space (1) or always
+     (2).  The distinction between (1) and (2) is necessary because on
+     some systems, if the application first decrements and then
+     increments the break value, the contents of the reallocated space
+     are unspecified.
+*/
+
+#ifndef MORECORE_CLEARS
+#define MORECORE_CLEARS 1
 #endif
 
+
 /*
-  Define HAVE_MMAP to optionally make malloc() use mmap() to allocate
-  very large blocks.  These will be returned to the operating system
-  immediately after a free().  HAVE_MMAP is also a prerequisite to
-  support multiple `arenas' (see USE_ARENAS below).
+  Define HAVE_MMAP as true to optionally make malloc() use mmap() to
+  allocate very large blocks.  These will be returned to the
+  operating system immediately after a free(). Also, if mmap
+  is available, it is used as a backup strategy in cases where
+  MORECORE fails to provide space from the system.
+
+  This malloc is best tuned to work with mmap for large requests.
+  If you do not have mmap, operations involving very large chunks (1MB
+  or so) may be slower than you'd like.
 */
 
 #ifndef HAVE_MMAP
-# ifdef _POSIX_MAPPED_FILES
-#  define HAVE_MMAP 1
-# endif
+#define HAVE_MMAP 1
+
+/* 
+   Standard unix mmap using /dev/zero clears memory so calloc doesn't
+   need to.
+*/
+
+#ifndef MMAP_CLEARS
+#define MMAP_CLEARS 1
+#endif
+
+#else /* no mmap */
+#ifndef MMAP_CLEARS
+#define MMAP_CLEARS 0
+#endif
+#endif
+
+
+/* 
+   MMAP_AS_MORECORE_SIZE is the minimum mmap size argument to use if
+   sbrk fails, and mmap is used as a backup (which is done only if
+   HAVE_MMAP).  The value must be a multiple of page size.  This
+   backup strategy generally applies only when systems have "holes" in
+   address space, so sbrk cannot perform contiguous expansion, but
+   there is still space available on the system.  On systems for which
+   this is known to be useful (i.e. most Linux kernels), this occurs
+   only when programs allocate huge amounts of memory.  Between this,
+   and the fact that mmap regions tend to be limited, the size should
+   be large, to avoid too many mmap calls and thus avoid running out
+   of kernel resources.
+*/
+
+#ifndef MMAP_AS_MORECORE_SIZE
+#define MMAP_AS_MORECORE_SIZE (1024 * 1024)
 #endif
 
 /*
@@ -538,9 +677,14 @@ do {                                                                          \
 */
 
 #ifndef HAVE_MREMAP
-#define HAVE_MREMAP defined(__linux__)
+#ifdef linux
+#define HAVE_MREMAP 1
+#else
+#define HAVE_MREMAP 0
 #endif
 
+#endif /* HAVE_MMAP */
+
 /* Define USE_ARENAS to enable support for multiple `arenas'.  These
    are allocated using mmap(), are necessary for threads and
    occasionally useful to overcome address space limitations affecting
@@ -550,43 +694,32 @@ do {                                                                          \
 #define USE_ARENAS HAVE_MMAP
 #endif
 
-#if HAVE_MMAP
-
-#include <unistd.h>
-#include <fcntl.h>
-#include <sys/mman.h>
-
-#if !defined(MAP_ANONYMOUS) && defined(MAP_ANON)
-#define MAP_ANONYMOUS MAP_ANON
-#endif
-#if !defined(MAP_FAILED)
-#define MAP_FAILED ((char*)-1)
-#endif
-
-#ifndef MAP_NORESERVE
-# ifdef MAP_AUTORESRV
-#  define MAP_NORESERVE MAP_AUTORESRV
-# else
-#  define MAP_NORESERVE 0
-# endif
-#endif
-
-#endif /* HAVE_MMAP */
 
 /*
-  Access to system page size. To the extent possible, this malloc
-  manages memory from the system in page-size units.
-
-  The following mechanics for getpagesize were adapted from
-  bsd/gnu getpagesize.h
+  The system page size. To the extent possible, this malloc manages
+  memory from the system in page-size units.  Note that this value is
+  cached during initialization into a field of malloc_state. So even
+  if malloc_getpagesize is a function, it is only called once.
+
+  The following mechanics for getpagesize were adapted from bsd/gnu
+  getpagesize.h. If none of the system-probes here apply, a value of
+  4096 is used, which should be OK: If they don't apply, then using
+  the actual value probably doesn't impact performance.
 */
 
+
 #ifndef malloc_getpagesize
+
+#ifndef LACKS_UNISTD_H
+#  include <unistd.h>
+#endif
+
 #  ifdef _SC_PAGESIZE         /* some SVR4 systems omit an underscore */
 #    ifndef _SC_PAGE_SIZE
 #      define _SC_PAGE_SIZE _SC_PAGESIZE
 #    endif
 #  endif
+
 #  ifdef _SC_PAGE_SIZE
 #    define malloc_getpagesize sysconf(_SC_PAGE_SIZE)
 #  else
@@ -594,24 +727,30 @@ do {                                                                          \
        extern size_t getpagesize();
 #      define malloc_getpagesize getpagesize()
 #    else
-#      include <sys/param.h>
-#      ifdef EXEC_PAGESIZE
-#        define malloc_getpagesize EXEC_PAGESIZE
+#      ifdef WIN32 /* use supplied emulation of getpagesize */
+#        define malloc_getpagesize getpagesize() 
 #      else
-#        ifdef NBPG
-#          ifndef CLSIZE
-#            define malloc_getpagesize NBPG
-#          else
-#            define malloc_getpagesize (NBPG * CLSIZE)
-#          endif
+#        ifndef LACKS_SYS_PARAM_H
+#          include <sys/param.h>
+#        endif
+#        ifdef EXEC_PAGESIZE
+#          define malloc_getpagesize EXEC_PAGESIZE
 #        else
-#          ifdef NBPC
-#            define malloc_getpagesize NBPC
+#          ifdef NBPG
+#            ifndef CLSIZE
+#              define malloc_getpagesize NBPG
+#            else
+#              define malloc_getpagesize (NBPG * CLSIZE)
+#            endif
 #          else
-#            ifdef PAGESIZE
-#              define malloc_getpagesize PAGESIZE
+#            ifdef NBPC
+#              define malloc_getpagesize NBPC
 #            else
-#              define malloc_getpagesize (4096) /* just guess */
+#              ifdef PAGESIZE
+#                define malloc_getpagesize PAGESIZE
+#              else /* just guess */
+#                define malloc_getpagesize (4096) 
+#              endif
 #            endif
 #          endif
 #        endif
@@ -620,241 +759,656 @@ do {                                                                          \
 #  endif
 #endif
 
-
-
 /*
-
   This version of malloc supports the standard SVID/XPG mallinfo
-  routine that returns a struct containing the same kind of
-  information you can get from malloc_stats. It should work on
-  any SVID/XPG compliant system that has a /usr/include/malloc.h
-  defining struct mallinfo. (If you'd like to install such a thing
-  yourself, cut out the preliminary declarations as described above
-  and below and save them in a malloc.h file. But there's no
-  compelling reason to bother to do this.)
+  routine that returns a struct containing usage properties and
+  statistics. It should work on any SVID/XPG compliant system that has
+  a /usr/include/malloc.h defining struct mallinfo. (If you'd like to
+  install such a thing yourself, cut out the preliminary declarations
+  as described above and below and save them in a malloc.h file. But
+  there's no compelling reason to bother to do this.)
 
   The main declaration needed is the mallinfo struct that is returned
   (by-copy) by mallinfo().  The SVID/XPG malloinfo struct contains a
-  bunch of fields, most of which are not even meaningful in this
-  version of malloc. Some of these fields are are instead filled by
-  mallinfo() with other numbers that might possibly be of interest.
+  bunch of fields that are not even meaningful in this version of
+  malloc.  These fields are instead filled by mallinfo() with
+  other numbers that might be of interest.
 
   HAVE_USR_INCLUDE_MALLOC_H should be set if you have a
   /usr/include/malloc.h file that includes a declaration of struct
   mallinfo.  If so, it is included; else an SVID2/XPG2 compliant
   version is declared below.  These must be precisely the same for
-  mallinfo() to work.
-
+  mallinfo() to work.  The original SVID version of this struct,
+  defined on most systems with mallinfo, declares all fields as
+  ints. But some others define them as unsigned long. If your system
+  defines the fields using a type of different width than listed here,
+  you must #include your system version and #define
+  HAVE_USR_INCLUDE_MALLOC_H.
 */
 
 /* #define HAVE_USR_INCLUDE_MALLOC_H */
 
-#if HAVE_USR_INCLUDE_MALLOC_H
-# include "/usr/include/malloc.h"
-#else
-# ifdef _LIBC
-#  include "malloc.h"
-# else
-#  include "ptmalloc.h"
-# endif
+#ifdef HAVE_USR_INCLUDE_MALLOC_H
+#include "/usr/include/malloc.h"
 #endif
 
-#include <bp-checks.h>
 
-#ifndef DEFAULT_TRIM_THRESHOLD
-#define DEFAULT_TRIM_THRESHOLD (128 * 1024)
-#endif
+/* ---------- description of public routines ------------ */
 
 /*
-    M_TRIM_THRESHOLD is the maximum amount of unused top-most memory
-      to keep before releasing via malloc_trim in free().
-
-      Automatic trimming is mainly useful in long-lived programs.
-      Because trimming via sbrk can be slow on some systems, and can
-      sometimes be wasteful (in cases where programs immediately
-      afterward allocate more large chunks) the value should be high
-      enough so that your overall system performance would improve by
-      releasing.
-
-      The trim threshold and the mmap control parameters (see below)
-      can be traded off with one another. Trimming and mmapping are
-      two different ways of releasing unused memory back to the
-      system. Between these two, it is often possible to keep
-      system-level demands of a long-lived program down to a bare
-      minimum. For example, in one test suite of sessions measuring
-      the XF86 X server on Linux, using a trim threshold of 128K and a
-      mmap threshold of 192K led to near-minimal long term resource
-      consumption.
-
-      If you are using this malloc in a long-lived program, it should
-      pay to experiment with these values.  As a rough guide, you
-      might set to a value close to the average size of a process
-      (program) running on your system.  Releasing this much memory
-      would allow such a process to run in memory.  Generally, it's
-      worth it to tune for trimming rather than memory mapping when a
-      program undergoes phases where several large chunks are
-      allocated and released in ways that can reuse each other's
-      storage, perhaps mixed with phases where there are no such
-      chunks at all.  And in well-behaved long-lived programs,
-      controlling release of large blocks via trimming versus mapping
-      is usually faster.
-
-      However, in most programs, these parameters serve mainly as
-      protection against the system-level effects of carrying around
-      massive amounts of unneeded memory. Since frequent calls to
-      sbrk, mmap, and munmap otherwise degrade performance, the default
-      parameters are set to relatively high values that serve only as
-      safeguards.
-
-      The default trim value is high enough to cause trimming only in
-      fairly extreme (by current memory consumption standards) cases.
-      It must be greater than page size to have any useful effect.  To
-      disable trimming completely, you can set to (unsigned long)(-1);
-
-
+  malloc(size_t n)
+  Returns a pointer to a newly allocated chunk of at least n bytes, or null
+  if no space is available. Additionally, on failure, errno is
+  set to ENOMEM on ANSI C systems.
+
+  If n is zero, malloc returns a minimum-sized chunk. (The minimum
+  size is 16 bytes on most 32-bit systems, and 24 or 32 bytes on 64-bit
+  systems.)  On most systems, size_t is an unsigned type, so calls
+  with negative arguments are interpreted as requests for huge amounts
+  of space, which will often fail. The maximum supported value of n
+  differs across systems, but is in all cases less than the maximum
+  representable value of a size_t.
 */
+#if __STD_C
+Void_t*  public_mALLOc(size_t);
+#else
+Void_t*  public_mALLOc();
+#endif
 
+/*
+  free(Void_t* p)
+  Releases the chunk of memory pointed to by p, that had been previously
+  allocated using malloc or a related routine such as realloc.
+  It has no effect if p is null. It can have arbitrary (i.e., bad!)
+  effects if p has already been freed.
+
+  Unless disabled (using mallopt), freeing very large spaces will,
+  when possible, automatically trigger operations that give
+  back unused memory to the system, thus reducing program footprint.
+*/
+#if __STD_C
+void     public_fREe(Void_t*);
+#else
+void     public_fREe();
+#endif
 
-#ifndef DEFAULT_TOP_PAD
-#define DEFAULT_TOP_PAD        (0)
+/*
+  calloc(size_t n_elements, size_t element_size);
+  Returns a pointer to n_elements * element_size bytes, with all locations
+  set to zero.
+*/
+#if __STD_C
+Void_t*  public_cALLOc(size_t, size_t);
+#else
+Void_t*  public_cALLOc();
 #endif
 
 /*
-    M_TOP_PAD is the amount of extra `padding' space to allocate or
-      retain whenever sbrk is called. It is used in two ways internally:
+  realloc(Void_t* p, size_t n)
+  Returns a pointer to a chunk of size n that contains the same data
+  as does chunk p up to the minimum of (n, p's size) bytes, or null
+  if no space is available. 
 
-      * When sbrk is called to extend the top of the arena to satisfy
-        a new malloc request, this much padding is added to the sbrk
-        request.
+  The returned pointer may or may not be the same as p. The algorithm
+  prefers extending p when possible, otherwise it employs the
+  equivalent of a malloc-copy-free sequence.
 
-      * When malloc_trim is called automatically from free(),
-        it is used as the `pad' argument.
+  If p is null, realloc is equivalent to malloc.  
 
-      In both cases, the actual amount of padding is rounded
-      so that the end of the arena is always a system page boundary.
+  If space is not available, realloc returns null, errno is set (if on
+  ANSI) and p is NOT freed.
 
-      The main reason for using padding is to avoid calling sbrk so
-      often. Having even a small pad greatly reduces the likelihood
-      that nearly every malloc request during program start-up (or
-      after trimming) will invoke sbrk, which needlessly wastes
-      time.
+  If n is for fewer bytes than already held by p, the newly unused
+  space is lopped off and freed if possible.  Unless the #define
+  REALLOC_ZERO_BYTES_FREES is set, realloc with a size argument of
+  zero (re)allocates a minimum-sized chunk.
 
-      Automatic rounding-up to page-size units is normally sufficient
-      to avoid measurable overhead, so the default is 0.  However, in
-      systems where sbrk is relatively slow, it can pay to increase
-      this value, at the expense of carrying around more memory than
-      the program needs.
+  Large chunks that were internally obtained via mmap will always
+  be reallocated using malloc-copy-free sequences unless
+  the system supports MREMAP (currently only linux).
 
+  The old unix realloc convention of allowing the last-free'd chunk
+  to be used as an argument to realloc is not supported.
 */
+#if __STD_C
+Void_t*  public_rEALLOc(Void_t*, size_t);
+#else
+Void_t*  public_rEALLOc();
+#endif
 
-
-#ifndef DEFAULT_MMAP_THRESHOLD
-#define DEFAULT_MMAP_THRESHOLD (128 * 1024)
+/*
+  memalign(size_t alignment, size_t n);
+  Returns a pointer to a newly allocated chunk of n bytes, aligned
+  in accord with the alignment argument.
+
+  The alignment argument should be a power of two. If the argument is
+  not a power of two, the nearest greater power is used.
+  8-byte alignment is guaranteed by normal malloc calls, so don't
+  bother calling memalign with an argument of 8 or less.
+
+  Overreliance on memalign is a sure way to fragment space.
+*/
+#if __STD_C
+Void_t*  public_mEMALIGn(size_t, size_t);
+#else
+Void_t*  public_mEMALIGn();
 #endif
 
 /*
+  valloc(size_t n);
+  Equivalent to memalign(pagesize, n), where pagesize is the page
+  size of the system. If the pagesize is unknown, 4096 is used.
+*/
+#if __STD_C
+Void_t*  public_vALLOc(size_t);
+#else
+Void_t*  public_vALLOc();
+#endif
+
 
-    M_MMAP_THRESHOLD is the request size threshold for using mmap()
-      to service a request. Requests of at least this size that cannot
-      be allocated using already-existing space will be serviced via mmap.
-      (If enough normal freed space already exists it is used instead.)
-
-      Using mmap segregates relatively large chunks of memory so that
-      they can be individually obtained and released from the host
-      system. A request serviced through mmap is never reused by any
-      other request (at least not directly; the system may just so
-      happen to remap successive requests to the same locations).
-
-      Segregating space in this way has the benefit that mmapped space
-      can ALWAYS be individually released back to the system, which
-      helps keep the system level memory demands of a long-lived
-      program low. Mapped memory can never become `locked' between
-      other chunks, as can happen with normally allocated chunks, which
-      menas that even trimming via malloc_trim would not release them.
 
-      However, it has the disadvantages that:
+/*
+  mallopt(int parameter_number, int parameter_value)
+  Sets tunable parameters. The format is to provide a
+  (parameter-number, parameter-value) pair.  mallopt then sets the
+  corresponding parameter to the argument value if it can (i.e., so
+  long as the value is meaningful), and returns 1 if successful else
+  0.  SVID/XPG/ANSI defines four standard param numbers for mallopt,
+  normally defined in malloc.h.  Only one of these (M_MXFAST) is used
+  in this malloc. The others (M_NLBLKS, M_GRAIN, M_KEEP) don't apply,
+  so setting them has no effect. But this malloc also supports four
+  other options in mallopt. See below for details.  Briefly, supported
+  parameters are as follows (listed defaults are for "typical"
+  configurations).
+
+  Symbol            param #   default    allowed param values
+  M_MXFAST          1         64         0-80  (0 disables fastbins)
+  M_TRIM_THRESHOLD -1         128*1024   any   (-1U disables trimming)
+  M_TOP_PAD        -2         0          any  
+  M_MMAP_THRESHOLD -3         128*1024   any   (or 0 if no MMAP support)
+  M_MMAP_MAX       -4         65536      any   (0 disables use of mmap)
+*/
+#if __STD_C
+int      public_mALLOPt(int, int);
+#else
+int      public_mALLOPt();
+#endif
+
+
+/*
+  mallinfo()
+  Returns (by copy) a struct containing various summary statistics:
+
+  arena:     current total non-mmapped bytes allocated from system 
+  ordblks:   the number of free chunks 
+  smblks:    the number of fastbin blocks (i.e., small chunks that
+               have been freed but not reused or consolidated)
+  hblks:     current number of mmapped regions 
+  hblkhd:    total bytes held in mmapped regions 
+  usmblks:   the maximum total allocated space. This will be greater
+                than current total if trimming has occurred.
+  fsmblks:   total bytes held in fastbin blocks 
+  uordblks:  current total allocated space (normal or mmapped)
+  fordblks:  total free space 
+  keepcost:  the maximum number of bytes that could ideally be released
+               back to system via malloc_trim. ("ideally" means that
+               it ignores page restrictions etc.)
+
+  Because these fields are ints, but internal bookkeeping may
+  be kept as longs, the reported values may wrap around zero and 
+  thus be inaccurate.
+*/
+#if __STD_C
+struct mallinfo public_mALLINFo(void);
+#else
+struct mallinfo public_mALLINFo();
+#endif
 
-         1. The space cannot be reclaimed, consolidated, and then
-            used to service later requests, as happens with normal chunks.
-         2. It can lead to more wastage because of mmap page alignment
-            requirements
-         3. It causes malloc performance to be more dependent on host
-            system memory management support routines which may vary in
-            implementation quality and may impose arbitrary
-            limitations. Generally, servicing a request via normal
-            malloc steps is faster than going through a system's mmap.
+/*
+  independent_calloc(size_t n_elements, size_t element_size, Void_t* chunks[]);
+
+  independent_calloc is similar to calloc, but instead of returning a
+  single cleared space, it returns an array of pointers to n_elements
+  independent elements that can hold contents of size element_size, each
+  of which starts out cleared, and can be independently freed,
+  realloc'ed etc. The elements are guaranteed to be adjacently
+  allocated (this is not guaranteed to occur with multiple callocs or
+  mallocs), which may also improve cache locality in some
+  applications.
+
+  The "chunks" argument is optional (i.e., may be null, which is
+  probably the most typical usage). If it is null, the returned array
+  is itself dynamically allocated and should also be freed when it is
+  no longer needed. Otherwise, the chunks array must be at least
+  n_elements in length. It is filled in with the pointers to the
+  chunks.
+
+  In either case, independent_calloc returns this pointer array, or
+  null if the allocation failed.  If n_elements is zero and "chunks"
+  is null, it returns a chunk representing an array with zero elements
+  (which should be freed if not wanted).
+
+  Each element must be individually freed when it is no longer
+  needed. If you'd like to instead be able to free all at once, you
+  should instead use regular calloc and assign pointers into this
+  space to represent elements.  (In this case though, you cannot
+  independently free e