path: root/malloc/malloc.c
2025-04-16  malloc: move tcache_init out of hot tcache paths  [Cupertino Miranda, 1 file, -12/+6]
This patch moves all calls of tcache_init out of the tcache hot paths. There is no reason to initialize tcaches in the hot path, and we need to be able to check tcache != NULL in any case because of the tcache_thread_shutdown function, so moving tcache_init off the hot path can only be beneficial. The patch also removes the initialization of tcaches within the __libc_free call: it only makes sense to initialize tcaches for a thread after it calls one of the allocation functions. The patch also removes the save/restore of errno from the tcache_init code, as it is no longer needed.
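A minimal self-contained sketch of the shape of this change, with hypothetical names (cache, cache_init, alloc_slow stand in for glibc's tcache machinery): the hot path performs only a NULL test, and initialization happens on the slow path.

    #include <stddef.h>
    #include <stdlib.h>

    static __thread void **cache;   /* per-thread cache, NULL until first use */

    static void cache_init (void) { cache = calloc (8, sizeof (void *)); }

    static void *alloc_slow (size_t n)
    {
      if (cache == NULL)
        cache_init ();              /* lazy init, kept off the hot path */
      return malloc (n);
    }

    void *alloc (size_t n)
    {
      /* Hot path: a single pointer test, no call to cache_init.  */
      if (cache != NULL && cache[0] != NULL)
        {
          void *p = cache[0];
          cache[0] = NULL;
          return p;
        }
      return alloc_slow (n);
    }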
2025-04-15  malloc: Use tailcalls in __libc_free  [Wilco Dijkstra, 1 file, -7/+24]
Use tailcalls to avoid the overhead of a frame on the free fastpath. Move tcache initialization to _int_free_chunk(). Add malloc_printerr_tail(), which can be tail-called without forcing a frame in the caller the way noreturn functions do. Change tcache_double_free_verify() to retry via __libc_free() after clearing the key. Reviewed-by: Florian Weimer <fweimer@redhat.com> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
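A sketch of the tail-call pattern with stand-in names (report_error_tail mirrors the role of malloc_printerr_tail, not its implementation): because the error reporter is an ordinary function rather than a noreturn one, the compiler may turn both calls below into plain jumps and compile my_free without a stack frame.

    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    static void report_error_tail (const char *msg)
    {
      fputs (msg, stderr);
      abort ();
    }

    static void free_chunk_slow (void *p) { free (p); }

    void my_free (void *p)
    {
      if (p == NULL)
        return;
      if (((uintptr_t) p & 7) != 0)
        {
          /* Tail position: can be compiled as a jump, not a call.  */
          report_error_tail ("free(): invalid pointer\n");
          return;
        }
      free_chunk_slow (p);   /* likewise a tail call */
    }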
2025-04-15  malloc: Inline tcache_free  [Wilco Dijkstra, 1 file, -30/+14]
Inline tcache_free since it's only used by __libc_free. Add __glibc_likely for the tcache checks. Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2025-04-15  malloc: Improve free checks  [Wilco Dijkstra, 1 file, -9/+6]
The checks on size can be merged and use __builtin_add_overflow. Since tcache only handles small sizes (and rejects sizes < MINSIZE), delay this check until after tcache. Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
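A sketch of the merged check (hypothetical helper; the real test in _int_free_check differs in detail): __builtin_add_overflow folds the "pointer plus size wraps around the address space" test and the addition into one operation.

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Reject a chunk whose size would wrap around the address space or
       is below the minimum; one overflow test replaces two compares.  */
    static bool
    chunk_size_ok (uintptr_t chunk, size_t size, size_t minsize)
    {
      uintptr_t end;
      if (__builtin_add_overflow (chunk, size, &end))
        return false;
      return size >= minsize;
    }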
2025-04-15  malloc: Inline _int_free_check  [Wilco Dijkstra, 1 file, -19/+12]
Inline _int_free_check since it is only used by __libc_free. Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2025-04-14  malloc: Inline _int_free  [Wilco Dijkstra, 1 file, -27/+11]
Inline _int_free since it is a small function and only really used by __libc_free. Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2025-04-14  malloc: Move mmap code out of __libc_free hotpath  [Wilco Dijkstra, 1 file, -30/+40]
Currently __libc_free checks for a freed mmap chunk in the fast path. Also errno is always saved and restored to preserve it. Since mmap chunks are larger than the largest tcache chunk, it is safe to delay this and handle tcache, smallbin and medium bin blocks first. Move saving of errno to cases that actually need it. Remove a safety check that fails on mmap chunks and a check that mmap chunks cannot be added to tcache. Performance of bench-malloc-thread improves by 9.2% for 1 thread and 6.9% for 32 threads on Neoverse V2. Reviewed-by: DJ Delorie <dj@redhat.com> Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2025-03-28  malloc: Improve performance of __libc_malloc  [Wilco Dijkstra, 1 file, -13/+20]
Improve performance of __libc_malloc by splitting it into 2 parts: first handle the tcache fastpath, then do the rest in a separate tailcalled function. This results in significant performance gains since __libc_malloc doesn't need to set up a frame and we delay tcache initialization and setting of errno until later. On Neoverse V2, bench-malloc-simple improves by 6.7% overall (up to 8.5% for the ST case) and bench-malloc-thread improves by 20.3% for 1 thread and 14.4% for 32 threads. Reviewed-by: DJ Delorie <dj@redhat.com>
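A sketch of the two-part structure with stand-in names: the entry point handles only the cache fast path and tail-calls the remainder, so it can often be compiled without a frame, while initialization and errno work is deferred to the out-of-line part.

    #include <stddef.h>
    #include <stdlib.h>

    static __thread void *cache_slot;    /* stand-in for the tcache */

    static void *malloc_rest (size_t n)  /* tail-called slow part */
    {
      /* cache setup, errno handling, arena work, ... */
      return malloc (n);
    }

    void *my_malloc (size_t n)
    {
      if (cache_slot != NULL && n <= 64)
        {
          void *p = cache_slot;
          cache_slot = NULL;
          return p;                      /* frameless fast path */
        }
      return malloc_rest (n);            /* tail call */
    }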
2025-03-26  malloc: Use __always_inline for simple functions  [Wilco Dijkstra, 1 file, -5/+5]
Use __always_inline for small helper functions that are critical for performance. This ensures inlining always happens when expected. Performance of bench-malloc-simple improves by 0.6% on average on Neoverse V2. Reviewed-by: DJ Delorie <dj@redhat.com>
2025-03-25  malloc: Use _int_free_chunk for remainders  [Wilco Dijkstra, 1 file, -9/+6]
When splitting a chunk, release the tail part by calling _int_free_chunk. This avoids inserting random blocks into tcache that were never requested by the user; fragmentation will be worse if they are never used again. Note that if the tail is fairly small, we could avoid splitting it at all. Also remove an oddly placed initialization of tcache in __libc_realloc. Reviewed-by: DJ Delorie <dj@redhat.com>
2025-03-21  malloc: missing initialization of tcache in _mid_memalign  [Cupertino Miranda, 1 file, -0/+2]
_mid_memalign includes tcache code but does not attempt to initialize tcaches. Reviewed-by: DJ Delorie <dj@redhat.com>
2025-03-18  malloc: Improve csize2tidx  [Wilco Dijkstra, 1 file, -1/+1]
Remove the alignment rounding up from csize2tidx - this makes no sense since the input should be a chunk size. Removing it enables further optimizations, for example chunksize_nomask can be safely used and invalid sizes < MINSIZE are not mapped to a valid tidx. Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
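Roughly, the before/after of the macro (MINSIZE and MALLOC_ALIGNMENT shown with illustrative 64-bit values; the exact glibc definitions vary per target):

    #define MALLOC_ALIGNMENT 16   /* illustrative */
    #define MINSIZE 32            /* illustrative */

    /* Before: rounds up, which is pointless for an already-aligned chunk
       size, and maps some sizes just below MINSIZE to the valid index 0.  */
    #define csize2tidx_old(x) \
      (((x) - MINSIZE + MALLOC_ALIGNMENT - 1) / MALLOC_ALIGNMENT)

    /* After: no rounding; a size below MINSIZE underflows to a huge index
       that fails the bin bound check instead of aliasing bin 0.  */
    #define csize2tidx_new(x) (((x) - MINSIZE) / MALLOC_ALIGNMENT)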
2025-03-03  malloc: Add integrity check to largebin nextsizes  [Ben Kallus, 1 file, -0/+3]
If an attacker overwrites the bk_nextsize link in the first chunk of a largebin that later has a smaller chunk inserted into it, malloc will write a heap pointer into an attacker-controlled address [0]. This patch adds an integrity check to mitigate this attack. [0]: https://github.com/shellphish/how2heap/blob/master/glibc_2.39/large_bin_attack.c Signed-off-by: Ben Kallus <benjamin.p.kallus.gr@dartmouth.edu> Reviewed-by: DJ Delorie <dj@redhat.com>
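A self-contained illustration of the back-link check idea (hypothetical names and simplified structures, not the glibc code): before trusting a possibly attacker-controlled bk_nextsize link during insertion, verify that the node it reaches still points back.

    #include <stdio.h>
    #include <stdlib.h>

    struct lnode { struct lnode *fd_nextsize, *bk_nextsize; };

    static void
    insert_before_checked (struct lnode *victim, struct lnode *fwd)
    {
      /* Integrity check: fwd's backward neighbour must point at fwd.  */
      if (fwd->bk_nextsize->fd_nextsize != fwd)
        {
          fputs ("largebin double linked list corrupted (nextsize)\n", stderr);
          abort ();
        }
      victim->fd_nextsize = fwd;
      victim->bk_nextsize = fwd->bk_nextsize;
      fwd->bk_nextsize->fd_nextsize = victim;
      fwd->bk_nextsize = victim;
    }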
2025-02-13  malloc: Add size check when moving fastbin->tcache  [Ben Kallus, 1 file, -0/+3]
By overwriting a forward link in a fastbin chunk that is subsequently moved into the tcache, it's possible to get malloc to return an arbitrary address [0]. When a chunk is fetched from a fastbin, its size is checked against the expected chunk size for that fastbin (see malloc.c:3991). This patch adds a similar check for chunks being moved from a fastbin to tcache, which renders obsolete the exploitation technique described above. Now updated to use __glibc_unlikely instead of __builtin_expect, as requested. [0]: https://github.com/shellphish/how2heap/blob/master/glibc_2.39/fastbin_reverse_into_tcache.c Signed-off-by: Ben Kallus <benjamin.p.kallus.gr@dartmouth.edu> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
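A self-contained illustration of the check's idea (hypothetical names and simplified structures; see the commit for the real code): a chunk popped off a size-segregated list must still carry the size that list is supposed to hold, so a forged forward link cannot smuggle an arbitrary address into the tcache.

    #include <stddef.h>
    #include <stdio.h>
    #include <stdlib.h>

    struct chunk { size_t size; struct chunk *fd; };

    static size_t bin_index (size_t size) { return size / 16; }

    static struct chunk *
    pop_checked (struct chunk **bin, size_t idx)
    {
      struct chunk *c = *bin;
      if (c == NULL)
        return NULL;
      if (bin_index (c->size) != idx)     /* forged fd pointer?  */
        {
          fputs ("malloc(): chunk size mismatch in fastbin\n", stderr);
          abort ();
        }
      *bin = c->fd;
      return c;
    }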
2025-01-01  Update copyright dates with scripts/update-copyrights  [Paul Eggert, 1 file, -1/+1]
2024-12-11  malloc: Add tcache path for calloc  [Wangyang Guo, 1 file, -25/+64]
This commit adds tcache support in calloc(), which can largely improve the performance of small-size allocation, especially in multi-thread scenarios. tcache_available() and tcache_try_malloc() are split out as helper functions for better code reuse. Also fix the tst-safe-linking failure after enabling tcache: previously, calloc() was used as a way to bypass tcache in memory allocation and trigger the safe-linking check in the fastbins path. With tcache enabled, extra workarounds are needed to bypass tcache.

Result of the bench-calloc-thread benchmark (test platform: Xeon-8380; ratio is new/original time_per_iteration, lower is better):

Threads#   | Ratio
-----------|------
1 thread   | 0.656
4 threads  | 0.470

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2024-12-04  malloc: Optimize small memory clearing for calloc  [H.J. Lu, 1 file, -35/+1]
Add calloc-clear-memory.h to clear memory size up to 36 bytes (72 bytes on 64-bit targets) for calloc. Use repeated stores with 1 branch, instead of up to 3 branches. On x86-64, it is faster than memset since calling memset needs 1 indirect branch, 1 broadcast, and up to 4 branches. Signed-off-by: H.J. Lu <hjl.tools@gmail.com> Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
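A sketch of the single-branch clearing idea (not the actual calloc-clear-memory.h code): overlapping word stores clear any 2..9-word region, i.e. up to 72 bytes with 8-byte words, using exactly one branch.

    #include <stddef.h>

    static inline void
    clear_small (size_t *d, size_t nwords)   /* assumes 2 <= nwords <= 9 */
    {
      d[0] = 0;
      d[1] = 0;
      d[nwords - 2] = 0;
      d[nwords - 1] = 0;                     /* covers nwords <= 4 */
      if (nwords > 4)
        {
          d[2] = 0;                          /* front half */
          d[3] = 0;
          d[4] = 0;
          d[nwords - 5] = 0;                 /* back half, overlapping */
          d[nwords - 4] = 0;
          d[nwords - 3] = 0;
        }
    }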
2024-11-29  malloc: send freed small chunks to smallbin  [k4lizen, 1 file, -19/+34]
Large chunks get added to the unsorted bin since sorting them takes time; for small chunks the benefit of adding them to the unsorted bin is non-existent, and actually hurts performance. Splitting and malloc_consolidate still add small chunks to unsorted, but we can hint to the compiler that that is a relatively rare occurrence. Benchmarking shows this to be consistently good. Authored-by: k4lizen <k4lizen@proton.me> Signed-off-by: Aleksa Siriški <sir@tmina.org>
2024-11-27  malloc: Avoid func call for tcache quick path in free()  [Wangyang Guo, 1 file, -1/+1]
Tcache is an important optimization to accelerate memory free(); things within this code path should be kept as simple as possible. This commit removes the function call when free() invokes the tcache code path by inlining _int_free().

Result of the bench-malloc-thread benchmark (test platform: Xeon-8380; ratio is new/original time_per_iteration, lower is better):

Threads#   | Ratio
-----------|------
1 thread   | 0.879
4 threads  | 0.874

The performance data shows it can improve the bench-malloc-thread benchmark by ~12% in both single-thread and multi-thread scenarios. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2024-11-25  Silence most -Wzero-as-null-pointer-constant diagnostics  [Alejandro Colomar, 1 file, -28/+28]
Replace 0 by NULL and {0} by {}. Omit a few cases that aren't so trivial to fix. Link: <https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117059> Link: <https://software.codidact.com/posts/292718/292759#answer-292759> Signed-off-by: Alejandro Colomar <alx@kernel.org>
2024-11-25  malloc: Split _int_free() into 3 sub functions  [Wangyang Guo, 1 file, -49/+86]
Split _int_free() into 3 smaller functions for flexible combination (see the skeleton below):
* _int_free_check -- sanity check for free
* tcache_free -- free memory to tcache (quick path)
* _int_free_chunk -- free memory chunk (slow path)
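A skeleton of the split with simplified signatures (the real functions operate on glibc's arena and chunk types):

    #include <stdbool.h>
    #include <stddef.h>

    static void free_check (void *p, size_t size)   /* _int_free_check */
    {
      /* sanity checks: alignment, size bounds, double free, ...  */
    }

    static bool cache_free (void *p, size_t size)   /* tcache_free */
    {
      /* quick path: stash in the per-thread cache if it fits */
      return false;
    }

    static void chunk_free (void *p, size_t size)   /* _int_free_chunk */
    {
      /* slow path: bins, consolidation, unmapping */
    }

    void int_free (void *p, size_t size)
    {
      free_check (p, size);
      if (!cache_free (p, size))
        chunk_free (p, size);
    }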
2024-11-12  linux: Add support for getrandom vDSO  [Adhemerval Zanella, 1 file, -2/+2]
Linux 6.11 has getrandom() in vDSO. It operates on a thread-local opaque state allocated with mmap using flags specified by the vDSO.

Multiple states are allocated at once, as many as fit into a page, and these are held in an array of available states to be doled out to each thread upon first use, and recycled when a thread terminates. As these states run low, more are allocated.

To make this procedure async-signal-safe, a simple guard is used in the LSB of the opaque state address, falling back to the syscall if there's reentrancy contention. Also, _Fork() is handled by blocking signals on opaque state allocation (so _Fork() always sees a consistent state even if it interrupts a getrandom() call) and by iterating over the thread stack cache on reclaim_stack. Each opaque state will be in the free states list (grnd_alloc.states) or allocated to a running thread.

The cancellation is handled by always using GRND_NONBLOCK flags while calling the vDSO, and falling back to the cancellable syscall if the kernel returns EAGAIN (would block). Since getrandom is not defined by POSIX and cancellation is supported as an extension, the cancellation is handled as 'may occur' instead of 'shall occur' [1], meaning that if the vDSO does not block (the expected behavior) getrandom will not act as a cancellation entrypoint. It avoids a pthread_testcancel call on the fast path (different than 'shall occur' functions, like sem_wait()).

It is currently enabled for x86_64, which is available in Linux 6.11, and aarch64, powerpc32, powerpc64, loongarch64, and s390x, which are available in Linux 6.12.

Link: https://pubs.opengroup.org/onlinepubs/9799919799/nframe.html [1]
Co-developed-by: Jason A. Donenfeld <Jason@zx2c4.com>
Tested-by: Jason A. Donenfeld <Jason@zx2c4.com> # x86_64
Tested-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> # x86_64, aarch64
Tested-by: Xi Ruoyao <xry111@xry111.site> # x86_64, aarch64, loongarch64
Tested-by: Stefan Liebler <stli@linux.ibm.com> # s390x
2024-01-12  Make __getrandom_nocancel set errno and add a _nostatus version  [Xi Ruoyao, 1 file, -1/+3]
The __getrandom_nocancel function returns errors as negative values instead of setting errno. This is inconsistent with other _nocancel functions and it breaks "TEMP_FAILURE_RETRY (__getrandom_nocancel (p, n, 0))" in __arc4random_buf. Use INLINE_SYSCALL_CALL instead of INTERNAL_SYSCALL_CALL to fix this issue. But __getrandom_nocancel has been avoiding touching errno for a reason; see BZ 29624. So add a __getrandom_nocancel_nostatus function and use it in tcache_key_initialize. Signed-off-by: Xi Ruoyao <xry111@xry111.site> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> Signed-off-by: Andreas K. Hüttel <dilfridge@gentoo.org>
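For context, TEMP_FAILURE_RETRY retries only when the expression returns -1 and errno is EINTR; the macro below mirrors its glibc definition. A call that instead reports failure as a negative return value without touching errno never matches the retry condition, which is the breakage fixed here.

    #include <errno.h>

    #define MY_TEMP_FAILURE_RETRY(expression)            \
      (__extension__                                     \
        ({ long int __result;                            \
           do                                            \
             __result = (long int) (expression);         \
           while (__result == -1L && errno == EINTR);    \
           __result; }))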
2024-01-01  Update copyright dates with scripts/update-copyrights  [Paul Eggert, 1 file, -1/+1]
2023-11-07  malloc: Decorate malloc maps  [Adhemerval Zanella, 1 file, -0/+5]
Add anonymous mmap annotations on loader malloc, malloc when it allocates memory with mmap, and on malloc arena. The /proc/self/maps will now print:

  [anon: glibc: malloc arena]
  [anon: glibc: malloc]
  [anon: glibc: loader malloc]

On arena allocation, glibc annotates only the read/write mapping. Checked on x86_64-linux-gnu and aarch64-linux-gnu. Reviewed-by: DJ Delorie <dj@redhat.com>
2023-08-15  malloc: Remove bin scanning from memalign (bug 30723)  [Florian Weimer, 1 file, -164/+5]
On the test workload (mpv --cache=yes with VP9 video decoding), the bin scanning has a very poor success rate (less than 2%). The tcache scanning has about 50% success rate, so keep that. Update comments in malloc/tst-memalign-2 to indicate the purpose of the tests. Even with the scanning removed, the additional merging opportunities since commit 542b1105852568c3ebc712225ae78b ("malloc: Enable merging of remainders in memalign (bug 30723)") are sufficient to pass the existing large bins test. Remove leftover variables from _int_free from refactoring in the same commit. Reviewed-by: DJ Delorie <dj@redhat.com>
2023-08-11  malloc: Enable merging of remainders in memalign (bug 30723)  [Florian Weimer, 1 file, -76/+121]
Previously, calling _int_free from _int_memalign could put remainders into the tcache or into fastbins, where they are invisible to the low-level allocator. This results in missed merge opportunities because once these freed chunks become available to the low-level allocator, further memalign allocations (even of the same size) are likely obstructing merges. Furthermore, during forwards merging in _int_memalign, do not completely give up when the remainder is too small to serve as a chunk on its own. We can still give it back if it can be merged with the following unused chunk. This makes it more likely that memalign calls in a loop achieve a compact memory layout, independently of the initial heap layout. Drop some useless (unsigned long) casts along the way, and tweak the style to more closely match GNU on changed lines. Reviewed-by: DJ Delorie <dj@redhat.com>
2023-07-06  realloc: Limit chunk reuse to only growing requests [BZ #30579]  [Siddhesh Poyarekar, 1 file, -8/+15]
The trim_threshold is too aggressive a heuristic to decide if chunk reuse is OK for reallocated memory; for repeated small, shrinking allocations it leads to internal fragmentation and for repeated larger allocations that fragmentation may blow up even worse due to the dynamic nature of the threshold. Limit reuse only when it is within the alignment padding, which is 2 * size_t for heap allocations and a page size for mmapped allocations. There's the added wrinkle of THP, but this fix ignores it for now, pessimizing that case in favor of keeping fragmentation low. This resolves BZ #30579. Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org> Reported-by: Nicolas Dusart <nicolas@freedelity.be> Reported-by: Aurelien Jarno <aurelien@aurel32.net> Reviewed-by: Aurelien Jarno <aurelien@aurel32.net> Tested-by: Aurelien Jarno <aurelien@aurel32.net>
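A sketch of the tightened rule (illustrative constants; the real code uses glibc's SIZE_SZ and the chunk's mmap flag): a shrinking request reuses the chunk in place only if the slack stays within the allocator's own alignment padding.

    #include <stdbool.h>
    #include <stddef.h>

    static bool
    can_reuse (size_t oldsize, size_t newsize, bool is_mmapped,
               size_t pagesize)
    {
      if (newsize > oldsize)
        return false;                     /* growing: must reallocate */
      size_t padding = is_mmapped ? pagesize : 2 * sizeof (size_t);
      return oldsize - newsize <= padding;
    }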
2023-06-02  Fix all the remaining misspellings -- BZ 25337  [Paul Pluzhnikov, 1 file, -8/+8]
2023-05-08  aligned_alloc: conform to C17  [DJ Delorie, 1 file, -3/+23]
This patch adds strict checking for power-of-two alignments in aligned_alloc(), and updates the manual accordingly. Reviewed-by: Carlos O'Donell <carlos@redhat.com>
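A sketch of the C17 behaviour (simplified; glibc implements this inside its memalign path): reject alignments that are not powers of two.

    #include <errno.h>
    #include <stddef.h>
    #include <stdlib.h>

    void *
    my_aligned_alloc (size_t alignment, size_t size)
    {
      /* A power of two has exactly one bit set.  */
      if (alignment == 0 || (alignment & (alignment - 1)) != 0)
        {
          errno = EINVAL;
          return NULL;
        }
      return aligned_alloc (alignment, size);
    }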
2023-04-18  malloc: set NON_MAIN_ARENA flag for reclaimed memalign chunk (BZ #30101)  [DJ Delorie, 1 file, -76/+81]
Based on these comments in malloc.c:

  size field is or'ed with NON_MAIN_ARENA if the chunk was obtained
  from a non-main arena. This is only set immediately before handing
  the chunk to the user, if necessary.

  The NON_MAIN_ARENA flag is never set for unsorted chunks, so it
  does not have to be taken into account in size comparisons.

When we pull a chunk off the unsorted list (or any list) we need to make sure that flag is set properly before returning the chunk.

* Use the rounded-up size for chunk_ok_for_memalign().
* Do not scan the arena for reusable chunks if there's no arena.
* Account for chunk overhead when determining if a chunk is a reuse candidate.
* mcheck interferes with memalign, so skip mcheck variants of memalign tests.

Reviewed-by: Carlos O'Donell <carlos@redhat.com> Tested-by: Carlos O'Donell <carlos@redhat.com>
2023-03-29  memalign: Support scanning for aligned chunks.  [DJ Delorie, 1 file, -27/+233]
This patch adds a chunk scanning algorithm to the _int_memalign code path that reduces heap fragmentation by reusing already aligned chunks instead of always looking for chunks of larger sizes and splitting them. The tcache macros are extended to allow removing a chunk from the middle of the list. The goal is to fix the pathological use cases where heaps grow continuously in workloads that are heavy users of memalign. Note that tst-memalign-2 checks for tcache operation, which malloc-check bypasses. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2023-03-29  Remove --enable-tunables configure option  [Adhemerval Zanella Netto, 1 file, -11/+3]
And make tunables always supported. The configure option was added in glibc 2.25 and some features require it (such as the hwcap mask, huge pages support, and lock elision tuning). It also simplifies the build permutations.

Changes from v1:
* Remove glibc.rtld.dynamic_sort changes; it is orthogonal and needs more discussion.
* Clean up more code.

Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
2023-03-08  malloc: Fix transposed arguments in sysmalloc_mmap_fallback call  [Robert Morell, 1 file, -2/+2]
git commit 0849eed45daa ("malloc: Move MORECORE fallback mmap to sysmalloc_mmap_fallback") moved a block of code from sysmalloc to a new helper function sysmalloc_mmap_fallback(), but 'pagesize' is used for the 'minsize' argument and 'MMAP_AS_MORECORE_SIZE' for the 'pagesize' argument. Fixes: 0849eed45daa ("malloc: Move MORECORE fallback mmap to sysmalloc_mmap_fallback") Signed-off-by: Robert Morell <rmorell@nvidia.com> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2023-02-22  malloc: remove redundant check of unsorted bin corruption  [Ayush Mittal, 1 file, -2/+0]
* malloc/malloc.c (_int_malloc): remove redundant check of unsorted bin corruption

With commit b90ddd08f6dd688e651df9ee89ca3a69ff88cd0c (malloc: Additional checks for unsorted bin integrity), the same check of (bck->fd != victim) was added before the check for unsorted chunk corruption that had been introduced in bdc3009b8ff0effdbbfb05eb6b10966753cbf9b8 (Added check before removing from unsorted list):

  3773   if (__glibc_unlikely (bck->fd != victim)
  3774       || __glibc_unlikely (victim->fd != unsorted_chunks (av)))
  3775     malloc_printerr ("malloc(): unsorted double linked list corrupted");
  ...
  3815   /* remove from unsorted list */
  3816   if (__glibc_unlikely (bck->fd != victim))
  3817     malloc_printerr ("malloc(): corrupted unsorted chunks 3");
  3818   unsorted_chunks (av)->bk = bck;

So this extra check can be removed.

Signed-off-by: Maninder Singh <maninder1.s@samsung.com> Signed-off-by: Ayush Mittal <ayush.m@samsung.com> Reviewed-by: DJ Delorie <dj@redhat.com>
2023-01-06  Update copyright dates with scripts/update-copyrights  [Joseph Myers, 1 file, -1/+1]
2022-12-08  realloc: Return unchanged if request is within usable size  [Siddhesh Poyarekar, 1 file, -0/+10]
If there is enough space in the chunk to satisfy the new size, return the old pointer as is, thus avoiding any locks or reallocations. The only real place this has a benefit is in large chunks that tend to get satisfied with mmap, since there is a large enough spare size (up to a page) for it to matter. For allocations on heap, the extra size is typically barely a few bytes (up to 15) and it's unlikely that it would make much difference in performance. Also added a smoke test to ensure that the old pointer is returned unchanged if the new size to realloc is within usable size of the old pointer. Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org> Reviewed-by: DJ Delorie <dj@redhat.com>
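A sketch of the short-circuit using the public malloc_usable_size (the real code inspects the chunk header directly): if the current usable size already covers the request, hand back the old pointer untouched.

    #include <malloc.h>
    #include <stddef.h>
    #include <stdlib.h>

    void *
    my_realloc (void *oldmem, size_t bytes)
    {
      if (oldmem != NULL && bytes != 0
          && bytes <= malloc_usable_size (oldmem))
        return oldmem;               /* no locks, no copying */
      return realloc (oldmem, bytes);
    }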
2022-10-13  malloc: Switch global_max_fast to uint8_t  [Florian Weimer, 1 file, -1/+1]
MAX_FAST_SIZE is 160 at most, so a uint8_t is sufficient. This makes it harder to use memory corruption, by overwriting global_max_fast with a large value, to fundamentally alter malloc behavior. Reviewed-by: DJ Delorie <dj@redhat.com>
2022-09-26  Use atomic_exchange_release/acquire  [Wilco Dijkstra, 1 file, -1/+1]
Rename atomic_exchange_rel/acq to use atomic_exchange_release/acquire since these map to the standard C11 atomic builtins. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2022-09-22  malloc: Print error when oldsize is not equal to the current size.  [Qingqing Li, 1 file, -1/+2]
This is used to detect errors early. The read of the oldsize is not protected by any lock, so check this value to avoid causing bigger mistakes. Reviewed-by: DJ Delorie <dj@redhat.com>
2022-09-09  Use C11 atomics instead of atomic_decrement(_val)  [Wilco Dijkstra, 1 file, -1/+1]
Replace atomic_decrement and atomic_decrement_val with atomic_fetch_add_relaxed. Reviewed-by: DJ Delorie <dj@redhat.com>
2022-09-09  Use C11 atomics instead of atomic_add(_zero)  [Wilco Dijkstra, 1 file, -1/+1]
Replace atomic_add and atomic_add_zero with atomic_fetch_add_relaxed. Reviewed-by: DJ Delorie <dj@redhat.com>
2022-09-06  malloc: Use C11 atomics rather than atomic_exchange_and_add  [Wilco Dijkstra, 1 file, -3/+3]
Replace a few counters using atomic_exchange_and_add with atomic_fetch_add_relaxed. Reviewed-by: Florian Weimer <fweimer@redhat.com>
2022-08-15  malloc: Do not use MAP_NORESERVE to allocate heap segments  [Florian Weimer, 1 file, -4/+0]
Address space for heap segments is reserved in a mmap call with MAP_ANONYMOUS | MAP_PRIVATE and protection flags PROT_NONE. This reservation does not count against the RSS limit of the process or system. Backing memory is allocated using mprotect in alloc_new_heap and grow_heap, and at this point, the allocator expects the kernel to provide memory (subject to memory overcommit). The SIGSEGV that might be generated due to MAP_NORESERVE (according to the mmap manual page) does not seem to occur in practice; it's always SIGKILL from the OOM killer. Even if there is a way that SIGSEGV could be generated, it is confusing to applications that this only happens for secondary heaps, not for large mmap-based allocations, and not for the main arena. Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
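A sketch of the reserve-then-commit pattern described above (sizes illustrative): address space is reserved with PROT_NONE, then committed piecewise with mprotect as the heap grows.

    #include <stddef.h>
    #include <sys/mman.h>

    #define RESERVE_SIZE (64UL * 1024 * 1024)   /* illustrative */

    void *
    reserve_heap (size_t initial_size)
    {
      char *base = mmap (NULL, RESERVE_SIZE, PROT_NONE,
                         MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
      if (base == MAP_FAILED)
        return NULL;
      /* Commit the first part; grow_heap-style calls extend this later.  */
      if (mprotect (base, initial_size, PROT_READ | PROT_WRITE) != 0)
        {
          munmap (base, RESERVE_SIZE);
          return NULL;
        }
      return base;
    }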
2022-08-03  assert: Do not use stderr in libc-internal assert  [Florian Weimer, 1 file, -16/+0]
Redirect internal assertion failures to __libc_assert_fail, based on __libc_message, which writes directly to STDERR_FILENO and calls abort. Also disable message translation and reword the error message slightly (adjusting stdlib/tst-bz20544 accordingly). As a result of these changes, malloc no longer needs its own redefinition of __assert_fail. __libc_assert_fail needs to be stubbed out during rtld dependency analysis because the rtld rebuilds turn __libc_assert_fail into __assert_fail, which is unconditionally provided by elf/dl-minimal.c. This change is not possible for the public assert macro and its __assert_fail function because POSIX requires that the diagnostic is written to stderr. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2022-08-03  stdio: Clean up __libc_message after unconditional abort  [Florian Weimer, 1 file, -3/+2]
Since commit ec2c1fcefb200c6cb7e09553f3c6af8815013d83 ("malloc: Abort on heap corruption, without a backtrace [BZ #21754]"), __libc_message always terminates the process. Since commit a289ea09ea843ced6e5277c2f2e63c357bc7f9a3 ("Do not print backtraces on fatal glibc errors"), the backtrace facility has been removed. Therefore, remove enum __libc_message_action and the action argument of __libc_message, and mark __libc_message as _Noreturn. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2022-08-01  malloc: Use __getrandom_nocancel during tcache initialization  [Florian Weimer, 1 file, -1/+2]
Cancellation currently cannot happen at this point because dlopen as used by the unwind link always performs additional allocations for libgcc_s.so.1, even if it has been loaded already as a dependency of the main executable. But it seems prudent not to rely on this quirk. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2022-07-21  malloc: Simplify implementation of __malloc_assert  [Florian Weimer, 1 file]