path: root/sysdeps/generic
Age  Commit message  (Author, files changed, lines -/+)
2025-04-08  elf: Extend glibc.rtld.execstack tunable to force executable stack (BZ 32653)  (Adhemerval Zanella, 1 file, -0/+13)
From the bug report [1], multiple programs still need to dlopen shared libraries with either a missing PT_GNU_STACK or with the executable bit set. Although in some cases this seems to be hand-crafted assembly source without the required .note.GNU-stack marking (so the static linker is forced to set the stack executable if the ABI requires it), in other cases the library appears to use trampolines [2].

Unfortunately, READ_IMPLIES_EXEC is not an option, since on some ABIs (x86_64) the kernel clears the bit, making it unsupported. To avoid reinstating the broken code that changed stack permissions on dlopen (0ca8785a28), this patch extends the glibc.rtld.execstack tunable with an option to force an executable stack at program startup.

The tunable is a security issue because it defeats the PT_GNU_STACK hardening. It has the slight advantage of making the requirement explicit by the caller, and, as for other tunables, it is disabled for setuid binaries. A tunable also allows us to eventually remove it, although from previous experience that would take some time.

Checked on aarch64-linux-gnu, x86_64-linux-gnu, and i686-linux-gnu.

[1] https://sourceware.org/bugzilla/show_bug.cgi?id=32653 [2] https://github.com/conda-forge/ctng-compiler-activation-feedstock/issues/143 Reviewed-by: Sam James <sam@gentoo.org>
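A minimal sketch of the affected pattern: legacy.so is a hypothetical library lacking the .note.GNU-stack marking, and the tunable value 2 (force) is assumed from the glibc manual; run as e.g. GLIBC_TUNABLES=glibc.rtld.execstack=2 ./main legacy.so.

    #include <dlfcn.h>
    #include <stdio.h>

    int
    main (int argc, char **argv)
    {
      /* legacy.so stands in for a library that lacks .note.GNU-stack or
         has PT_GNU_STACK with the executable bit set.  */
      void *h = dlopen (argc > 1 ? argv[1] : "legacy.so", RTLD_NOW);
      if (h == NULL)
        {
          /* Without the tunable, dlopen refuses to make the stack
             executable and fails here (see the 2024-12-31 entry below).  */
          fprintf (stderr, "dlopen: %s\n", dlerror ());
          return 1;
        }
      dlclose (h);
      return 0;
    }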
2025-03-27  Implement C23 pown  (Joseph Myers, 5 files, -0/+16)
C23 adds various <math.h> function families originally defined in TS 18661-4. Add the pown functions, which are like pow but with an integer exponent. That exponent has type long long int in C23; it was intmax_t in TS 18661-4, and as with other interfaces changed after their initial appearance in the TS, I don't think we need to support the original version of the interface. The test inputs are based on the subset of test inputs for pow that use integer exponents that fit in long long.

As the first such template implementation that saves and restores the rounding mode internally (to avoid possible issues with directed rounding and intermediate overflows or underflows in the wrong rounding mode), support also needed to be added for using SET_RESTORE_ROUND* in such template function implementations. This required math-type-macros-float128.h to include <fenv_private.h>, so it can tell whether SET_RESTORE_ROUNDF128 is defined. In turn, the include order with <fenv_private.h> included before <math_private.h> broke loongarch builds, revealing that sysdeps/loongarch/math_private.h is really a fenv_private.h file (maybe implemented internally before the consistent split of those headers in 2018?) and needed to be renamed to fenv_private.h to avoid errors with duplicate macro definitions if <math_private.h> is included after <fenv_private.h>.

The underlying implementation uses __ieee754_pow functions (called more than once in some cases, where the exponent does not fit in the floating type). I expect a custom implementation for a given format, one that only handles integer exponents but handles larger exponents directly, could be faster and more accurate in some cases. I encourage searching for worst cases for ulp error in these implementations (necessarily non-exhaustively, given the size of the input space).

Tested for x86_64 and x86, and with build-many-glibcs.py.
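A short usage sketch, assuming a C23-enabled toolchain and a glibc new enough to expose the pown family:

    #include <math.h>
    #include <stdio.h>

    int
    main (void)
    {
      /* The exponent is long long int, so no rounding of the exponent
         can occur as it could with pow (x, (double) n).  */
      printf ("%g\n", pown (2.0, 10LL));    /* 1024 */
      printf ("%g\n", pownf (3.0f, -2LL));  /* 0.111111... */
      return 0;
    }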
2025-03-21  debug: Improve '%n' fortify detection (BZ 30932)  (Adhemerval Zanella, 1 file, -0/+14)
The 7bb8045ec0 patch made the '%n' fortify check ignore EMFILE errors while trying to open /proc/self/maps, and this added a security issue: EMFILE can be attacker-controlled, making the check ineffective in some cases. The EMFILE failure is reinstated, but with a different error message. Also, to reduce false positives of the hardening in cases where no new files can be opened, _dl_readonly_area now uses _dl_find_object to check whether the memory area is within a writable ELF segment. The procfs method is still used as a fallback. Checked on x86_64-linux-gnu and i686-linux-gnu. Reviewed-by: Arjun Shankar <arjun@redhat.com>
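_dl_find_object is the public GLIBC_2.35 interface named above; a minimal sketch of using it to ask which object maps a given address (the writable-segment test itself is glibc-internal and not shown):

    #define _GNU_SOURCE
    #include <dlfcn.h>
    #include <stdio.h>

    int
    main (void)
    {
      static const char fmt[] = "%d\n";   /* lives in a read-only segment */
      struct dl_find_object dfo;
      /* Returns 0 when some loaded ELF object maps the address.  */
      if (_dl_find_object ((void *) fmt, &dfo) == 0)
        printf ("mapped in [%p, %p)\n", dfo.dlfo_map_start, dfo.dlfo_map_end);
      return 0;
    }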
2025-03-21  Remove eloop-threshold.h  (Adhemerval Zanella, 2 files, -72/+42)
On both Linux and Hurd, __eloop_threshold() is always a constant (40 and 32 respectively), so there is no need to call __sysconf (_SC_SYMLOOP_MAX) in the Linux case (where SYMLOOP_MAX is not defined). To avoid a name clash with gnulib, the new file is named min-eloop-threshold.h. Checked on x86_64-linux-gnu and with a build for x86_64-gnu. Reviewed-by: DJ Delorie <dj@redhat.com>
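An illustrative sketch of how such a constant bounds symlink resolution; MIN_ELOOP_THRESHOLD and the (simplified) link handling are assumptions for illustration, not glibc's code:

    #include <errno.h>
    #include <limits.h>
    #include <string.h>
    #include <unistd.h>

    #define MIN_ELOOP_THRESHOLD 40   /* the Linux constant cited above */

    /* Follow symlinks at most MIN_ELOOP_THRESHOLD times, then give up
       with ELOOP; relative-symlink handling is omitted for brevity.  */
    static int
    follow_links (char *path, size_t size)
    {
      for (int i = 0; i < MIN_ELOOP_THRESHOLD; i++)
        {
          char buf[PATH_MAX];
          ssize_t n = readlink (path, buf, sizeof buf - 1);
          if (n < 0)
            return errno == EINVAL ? 0 : -1;  /* EINVAL: not a symlink */
          buf[n] = '\0';
          strncpy (path, buf, size - 1);
          path[size - 1] = '\0';
        }
      errno = ELOOP;
      return -1;
    }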
2025-03-13  elf: Canonicalize $ORIGIN in an explicit ld.so invocation [BZ 25263]  (Adhemerval Zanella, 2 files, -31/+4)
When an executable is invoked directly, we calculate $ORIGIN by calling readlink on /proc/self/exe, which the Linux kernel resolves to the target of any symlinks. However, if an executable is run through ld.so, we cannot use /proc/self/exe and instead use the path given as an argument. This leads to a different calculation of $ORIGIN, which is most notable in that it causes ldd to behave differently (e.g., by not finding a library) from directly running the program.

To make the behavior consistent, take advantage of the fact that the kernel also resolves /proc/self/fd/ symlinks to the target of any symlinks in the same manner, so once we have opened the main executable in order to load it, replace the user-provided path with the result of calling readlink("/proc/self/fd/N"). (On non-Linux platforms this resolution does not happen, so no behavior change is needed.)

__fd_to_filename requires _fitoa_word and _itoa_word, which for 32-bit targets pulls in a lot of definitions from _itoa.c (due to _ITOA_NEEDED being defined). To simplify the build, move the required function to a new file, _fitoa_word.c.

Checked on x86_64-linux-gnu and i686-linux-gnu. Co-authored-by: Geoffrey Thomas <geofft@ldpreload.com> Reviewed-by: Geoffrey Thomas <geofft@ldpreload.com> Tested-by: Geoffrey Thomas <geofft@ldpreload.com>
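A sketch of the canonicalization trick described above, assuming Linux /proc semantics:

    #include <fcntl.h>
    #include <limits.h>
    #include <stdio.h>
    #include <unistd.h>

    int
    main (int argc, char **argv)
    {
      if (argc < 2)
        return 2;
      int fd = open (argv[1], O_RDONLY);
      if (fd < 0)
        return 1;

      /* The kernel resolves /proc/self/fd/N to the fully symlink-resolved
         path of the opened file, just as it does for /proc/self/exe.  */
      char link[64], path[PATH_MAX];
      snprintf (link, sizeof link, "/proc/self/fd/%d", fd);
      ssize_t n = readlink (link, path, sizeof path - 1);
      if (n > 0)
        {
          path[n] = '\0';
          puts (path);
        }
      close (fd);
      return 0;
    }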
2025-03-12  math: Refactor how to use libm-test-ulps  (Adhemerval Zanella, 2 files, -6/+0)
The current approach tracks the maximum supported math errors by explicitly setting them per function and architecture. On newer implementations or new compiler versions, the file is updated with newer values if it shows higher results. The idea is to track the maximum known error, to update the manual with the obtained values.

Keeping libm-test-ulps under version control shows little value: updating it is usually a mechanical change done by the maintainer, for past releases it is usually ignored whether an ulp change resulted from a compiler regression, and the math tests already have a maximum ulp error that triggers a regression. This was shown by a recent update for the new acosf [1] implementation, which is correctly rounded: the libm-test-ulps change was indeed from a compiler issue.

This patch removes all arch-specific libm-test-ulps files, adds generic libm-test-ulps files where applicable, and changes their semantics. The generic files now track specific implementation constraints, such as whether the implementation is expected to be correctly rounded, or whether the system-specific version has different error expectations. Multiple libm-test-ulps files can now be defined, and system-specific files override the generic ones. This covers the case where an arch-specific implementation might show worse precision than the generic one, for instance, cbrtf on i686.

Regressions are only reported if the implementation shows errors larger than 9 ulps (13 for IBM long double) unless overridden by libm-test-ulps, and the maximum error is no longer printed at the end of tests. The regen-ulps rule is also removed, since it no longer makes sense to update libm-test-ulps automatically.

The manual error table is also removed; Paul Zimmermann and others have been tracking libm precision with a more comprehensive analysis for some releases, so link to that work instead.

[1] https://sourceware.org/git/?p=glibc.git;a=commit;h=9cc9f8e11e8fb8f54f1e84d9f024917634a78201
2025-03-05  Remove dl-procinfo.h  (Adhemerval Zanella, 1 file, -25/+0)
powerpc was the only architecture with arch-specific hooks for LD_SHOW_AUXV, and with the information moved to ld diagnostics there is no need to keep the _dl_procinfo hook. Checked with a build for all affected ABIs. Reviewed-by: Peter Bergner <bergner@linux.ibm.com>
2025-02-28  Remove unused dl-procinfo.h  (Wilco Dijkstra, 1 file, -6/+0)
Remove unused _dl_hwcap_string defines. As a result, many dl-procinfo.h headers can be removed. This also removes target-specific _dl_procinfo implementations which only printed HWCAP strings using _dl_hwcap_string. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2025-02-02  elf: Add l_soname accessor function for DT_SONAME values  (Florian Weimer, 1 file, -0/+12)
It's not necessary to introduce temporaries because the compiler is able to evaluate l_soname just once in constructs like:

  l_soname (l) != NULL && strcmp (l_soname (l), LIBC_SO) != 0
2025-02-02  elf: Split _dl_lookup_map, _dl_map_new_object from _dl_map_object  (Florian Weimer, 1 file, -0/+13)
So that they can eventually be called separately from dlopen.
2025-01-30  ld.so: Decorate BSS mappings  (Petr Malat, 1 file, -0/+12)
Decorate BSS mappings with [anon: glibc: .bss <file>], for example [anon: glibc: .bss /lib/libc.so.6]. The string ".bss" is already used by bionic so use the same, but add the filename as well. If the name would be longer than what the kernel allows, drop the directory part of the path. Refactor glibc.mem.decorate_maps check to a separate function and use it to avoid assembling a name, which would not be used later. Signed-off-by: Petr Malat <oss@malat.biz> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
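The underlying kernel mechanism is the PR_SET_VMA_ANON_NAME prctl (Linux 5.17+, and the kernel may be built without CONFIG_ANON_VMA_NAME); a minimal sketch of decorating an anonymous mapping, with an illustrative name string:

    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/prctl.h>   /* assumes headers new enough for PR_SET_VMA */

    int
    main (void)
    {
      size_t len = 1 << 20;
      void *p = mmap (NULL, len, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
      if (p == MAP_FAILED)
        return 1;

      /* After this, /proc/self/maps shows the region with an
         "[anon: glibc: .bss example]"-style annotation.  */
      if (prctl (PR_SET_VMA, PR_SET_VMA_ANON_NAME, p, len,
                 "glibc: .bss example") != 0)
        perror ("prctl");   /* kernel may lack CONFIG_ANON_VMA_NAME */
      getchar ();           /* pause to inspect /proc/self/maps */
      return 0;
    }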
2025-01-10  Add generic 'extra TLS'  (Michael Jeanson, 1 file, -0/+46)
Add the logic to append an 'extra TLS' block in the TLS block allocator with a generic stub implementation. The duplicated code in 'csu/libc-tls.c' and 'elf/dl-tls.c' is to handle both statically linked applications and the ELF dynamic loader. Signed-off-by: Michael Jeanson <mjeanson@efficios.com> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Reviewed-by: Florian Weimer <fweimer@redhat.com>
2025-01-09  elf: Always define TLS_TP_OFFSET  (Florian Weimer, 1 file, -0/+3)
This will be needed to compute __rseq_offset outside of the TLS relocation machinery. Reviewed-by: Michael Jeanson <mjeanson@efficios.com>
2025-01-09  Move <thread_pointer.h> to kernel-independent sysdeps directories  (Florian Weimer, 1 file, -0/+28)
Hurd is expected to use the same thread ABI as Linux. Reviewed-by: Michael Jeanson <mjeanson@efficios.com>
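On targets where the compiler exposes the thread pointer, such a header essentially boils down to the following sketch (assuming GCC/Clang __builtin_thread_pointer support; the per-arch glibc headers may use inline asm instead):

    #ifndef _THREAD_POINTER_H
    #define _THREAD_POINTER_H

    /* Return the current thread pointer; per the commit, the TLS ABI
       (and hence this value's meaning) is shared between Linux and Hurd.  */
    static inline void *
    __thread_pointer (void)
    {
      return __builtin_thread_pointer ();
    }

    #endif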
2025-01-02  elf: Introduce generic <dl-tls.h>  (Florian Weimer, 1 file, -2/+36)
On arc, the definition of TLS_DTV_UNALLOCATED now comes from <dl-dtv.h>. For x86-64 x32, a separate version is needed because unsigned long int is 32 bits on this target. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2025-01-01  Update copyright dates with scripts/update-copyrights  (Paul Eggert, 189 files, -189/+189)
2024-12-31  elf: Do not change stack permission on dlopen/dlmopen  (Adhemerval Zanella, 1 file, -18/+4)
If a shared library loaded with dlopen/dlmopen requires an executable stack, either implicitly because of a missing GNU_STACK ELF header (where the ABI default flags imply the executable bit) or explicitly because of the executable bit in GNU_STACK, the loader will try to set both the main thread stack and all thread stacks (from the pthread cache) as executable.

Besides the issue where any __nptl_change_stack_perm failure does not undo a previous executable transition (meaning that if the library fails to load, there can be thread stacks left executable), this behavior was used in a CVE [1] as a vector for RCE.

This patch changes that: if a shared library requires an executable stack and the current stack is not executable, dlopen fails. The change applies only to dynamically loaded modules; if the program or any dependency requires an executable stack, the loader will still change the main thread before program execution and any thread created with the default stack configuration.

[1] https://www.qualys.com/2023/07/19/cve-2023-38408/rce-openssh-forwarded-ssh-agent.txt

Checked on x86_64-linux-gnu and i686-linux-gnu. Reviewed-by: Florian Weimer <fweimer@redhat.com>
2024-12-23  include/sys/cdefs.h: Add __attribute_optimization_barrier__  (Adhemerval Zanella, 1 file, -1/+1)
Add __attribute_optimization_barrier__ to disable inlining and cloning on a function. For Clang, expand it to __attribute__ ((optnone)); otherwise, expand it to __attribute__ ((noinline, noclone)). Co-Authored-By: H.J. Lu <hjl.tools@gmail.com> Reviewed-by: Sam James <sam@gentoo.org>
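A sketch of the macro as described above (the conditional structure is assumed; the authoritative definition is in include/sys/cdefs.h):

    /* Keep a function out of line and un-cloned so it acts as an
       optimization barrier at its call sites.  */
    #if defined __clang__
    # define __attribute_optimization_barrier__ __attribute__ ((optnone))
    #else
    # define __attribute_optimization_barrier__ __attribute__ ((noinline, noclone))
    #endif

    int __attribute_optimization_barrier__
    keep_out_of_line (int x)
    {
      return x + 1;
    }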
2024-12-20  elf: Move _dl_rtld_map, _dl_rtld_audit_state out of GL  (Florian Weimer, 1 file, -10/+9)
This avoids immediate GLIBC_PRIVATE ABI issues if the size of struct link_map or struct auditstate changes. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2024-12-20  elf: Introduce is_rtld_link_map  (Florian Weimer, 1 file, -1/+15)
Unconditionally define it to false for static builds. This avoids the awkward use of weak_extern for _dl_rtld_map in checks that cannot possibly be true in static builds. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2024-12-19  elf: Remove code dependent on __rtld_lock_default_lock_recursive macro  (Florian Weimer, 1 file, -6/+0)
Neither NPTL nor Hurd define this macro anymore. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2024-12-04  malloc: Optimize small memory clearing for calloc  (H.J. Lu, 1 file, -0/+49)
Add calloc-clear-memory.h to clear memory of sizes up to 36 bytes (72 bytes on 64-bit targets) for calloc, using repeated stores with 1 branch instead of up to 3 branches. On x86-64, this is faster than memset, since calling memset needs 1 indirect branch, 1 broadcast, and up to 4 branches. Signed-off-by: H.J. Lu <hjl.tools@gmail.com> Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
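A simplified sketch of the overlapping-stores idea, not glibc's actual header: it covers only 1..16 bytes here, while calloc-clear-memory.h extends the pattern to 36/72 bytes:

    #include <stdint.h>
    #include <string.h>

    /* Zero 1..16 bytes; the paired stores may overlap, which is harmless,
       and each size bucket needs only one comparison.  */
    static inline void
    clear_small (char *p, size_t n)   /* requires 1 <= n <= 16 */
    {
      if (n >= 8)
        {
          uint64_t zero = 0;
          memcpy (p, &zero, 8);            /* head */
          memcpy (p + n - 8, &zero, 8);    /* tail, may overlap head */
        }
      else if (n >= 4)
        {
          uint32_t zero = 0;
          memcpy (p, &zero, 4);
          memcpy (p + n - 4, &zero, 4);
        }
      else
        {
          /* For n in 1..3 the three indices cover every byte.  */
          p[0] = 0;
          p[n / 2] = 0;
          p[n - 1] = 0;
        }
    }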
2024-12-02  elf: Consolidate stackinfo.h  (Adhemerval Zanella, 1 file, -3/+12)
And use a sane default for the generic implementation. Reviewed-by: Florian Weimer <fweimer@redhat.com>
2024-11-22  math: Use tanf from CORE-MATH  (Adhemerval Zanella, 1 file, -0/+150)
The CORE-MATH implementation is correctly rounded (for any rounding mode) and shows better performance than the generic tanf. The code was adapted to glibc style, to use the definitions of math_config.h, to remove errno handling, and to use a generic 128-bit routine for ABIs that do not support it natively.

Benchtests on x86_64 (Ryzen 9 5900X, gcc 14.2.1), aarch64 (neoverse1, gcc 13.2.1), and powerpc (POWER10, gcc 13.2.1):

  latency                  master     patched    improvement
  x86_64                   82.3961    54.8052    33.49%
  x86_64v2                 82.3415    54.8052    33.44%
  x86_64v3                 69.3661    50.4864    27.22%
  i686                     219.271    45.5396    79.23%
  aarch64                  29.2127    19.1951    34.29%
  power10                  19.5060    16.2760    16.56%

  reciprocal-throughput    master     patched    improvement
  x86_64                   28.3976    19.7334    30.51%
  x86_64v2                 28.4568    19.7334    30.65%
  x86_64v3                 21.1815    16.1811    23.61%
  i686                     105.016    15.1426    85.58%
  aarch64                  18.1573    10.7681    40.70%
  power10                  8.7207     8.7097     0.13%

Signed-off-by: Alexei Sibidanov <sibid@uvic.ca> Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr> Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> Reviewed-by: DJ Delorie <dj@redhat.com>
2024-11-12  linux: Add support for getrandom vDSO  (Adhemerval Zanella, 2 files, -1/+29)
Linux 6.11 has getrandom() in the vDSO. It operates on a thread-local opaque state allocated with mmap using flags specified by the vDSO.

Multiple states are allocated at once, as many as fit into a page, and these are held in an array of available states to be doled out to each thread upon first use, and recycled when a thread terminates. As these states run low, more are allocated.

To make this procedure async-signal-safe, a simple guard is used in the LSB of the opaque state address, falling back to the syscall if there's reentrancy contention. Also, _Fork() is handled by blocking signals on opaque state allocation (so _Fork() always sees a consistent state even if it interrupts a getrandom() call) and by iterating over the thread stack cache on reclaim_stack. Each opaque state will be in the free-states list (grnd_alloc.states) or allocated to a running thread.

Cancellation is handled by always using the GRND_NONBLOCK flag while calling the vDSO, and falling back to the cancellable syscall if the kernel returns EAGAIN (would block). Since getrandom is not defined by POSIX and cancellation is supported as an extension, cancellation is handled as 'may occur' instead of 'shall occur' [1], meaning that if the vDSO does not block (the expected behavior) getrandom will not act as a cancellation entrypoint. This avoids a pthread_testcancel call on the fast path (unlike 'shall occur' functions, such as sem_wait()).

It is currently enabled for x86_64, which is available in Linux 6.11, and for aarch64, powerpc32, powerpc64, loongarch64, and s390x, which are available in Linux 6.12.

Link: https://pubs.opengroup.org/onlinepubs/9799919799/nframe.html [1] Co-developed-by: Jason A. Donenfeld <Jason@zx2c4.com> Tested-by: Jason A. Donenfeld <Jason@zx2c4.com> # x86_64 Tested-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> # x86_64, aarch64 Tested-by: Xi Ruoyao <xry111@xry111.site> # x86_64, aarch64, loongarch64 Tested-by: Stefan Liebler <stli@linux.ibm.com> # s390x
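From the caller's perspective nothing changes: a plain getrandom() call is transparently serviced from the vDSO when kernel and glibc support it, and falls back to the syscall otherwise. A short sketch:

    #include <stdio.h>
    #include <sys/random.h>

    int
    main (void)
    {
      unsigned char buf[16];
      /* Flag semantics are unchanged; the vDSO path is an internal
         optimization, invisible to the application.  */
      ssize_t n = getrandom (buf, sizeof buf, 0);
      if (n != sizeof buf)
        return 1;
      for (int i = 0; i < (int) n; i++)
        printf ("%02x", buf[i]);
      putchar ('\n');
      return 0;
    }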
2024-11-06  elf: Introduce _dl_relocate_object_no_relro  (Florian Weimer, 1 file, -0/+7)
And make _dl_protect_relro apply RELRO conditionally. Reviewed-by: DJ Delorie <dj@redhat.com>
2024-10-08  stdlib: Make abort/_Exit AS-safe (BZ 26275)  (Adhemerval Zanella, 2 files, -2/+51)
The recursive lock used on abort does not synchronize with new process creation (either by fork-like interfaces or posix_spawn ones), nor is it reinitialized after fork(). Also, the SIGABRT unblock before raise() shows another race condition, where a fork or posix_spawn() call by another thread, just after the recursive lock release and before the SIGABRT signal, might create programs with an unexpected signal mask. With the default option (without POSIX_SPAWN_SETSIGDEF), the process can see SIG_DFL for SIGABRT where it should be SIG_IGN.

To make this AS-safe, raise() does not change the process signal mask, and an AS-safe lock is used if a SIGABRT handler is installed or the signal is blocked or ignored. With the signal mask change removal, there is no need to use a recursive lock. The lock is also taken on both _Fork() and posix_spawn(), to prevent the spawned process from seeing the abort handler as SIG_DFL.

A read-write lock is used to avoid serializing _Fork and posix_spawn execution. Both sigaction (SIGABRT) and abort() require taking the lock as a writer (since both change the disposition).

The fallback is also simplified: there is no need to use a loop of ABORT_INSTRUCTION after _exit() (if the syscall does not terminate the process, the system is broken).

The proposed fix changes how setjmp works in a SIGABRT handler, where glibc does not save the signal mask. So usage like the below will now always abort:

  static volatile int chk_fail_ok;
  static jmp_buf chk_fail_buf;

  static void
  handler (int sig)
  {
    if (chk_fail_ok)
      {
        chk_fail_ok = 0;
        longjmp (chk_fail_buf, 1);
      }
    else
      _exit (127);
  }
  [...]
  signal (SIGABRT, handler);
  [...]
  chk_fail_ok = 1;
  if (! setjmp (chk_fail_buf))
    {
      // Something that can call abort, like a failed fortify function.
      chk_fail_ok = 0;
      printf ("FAIL\n");
    }

Such cases will need to use sigsetjmp instead.

_dl_start_profile calls sigaction through _profil, and to avoid pulling abort() into the loader, the call is replaced with __libc_sigaction.

Checked on x86_64-linux-gnu and aarch64-linux-gnu. Reviewed-by: DJ Delorie <dj@redhat.com>
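For code affected by the setjmp behavior change above, a sketch of the sigsetjmp replacement, saving and restoring the signal mask explicitly:

    #include <setjmp.h>
    #include <signal.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    static volatile int chk_fail_ok;
    static sigjmp_buf chk_fail_buf;

    static void
    handler (int sig)
    {
      if (chk_fail_ok)
        {
          chk_fail_ok = 0;
          /* siglongjmp restores the mask saved by sigsetjmp (..., 1),
             so SIGABRT is unblocked again after the jump.  */
          siglongjmp (chk_fail_buf, 1);
        }
      else
        _exit (127);
    }

    int
    main (void)
    {
      signal (SIGABRT, handler);
      chk_fail_ok = 1;
      if (! sigsetjmp (chk_fail_buf, 1))   /* 1: save the signal mask */
        {
          abort ();                        /* stands in for a fortify failure */
        }
      else
        puts ("PASS");
      return 0;
    }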
2024-08-23  nptl: Fix Race conditions in pthread cancellation [BZ#12683]  (Adhemerval Zanella, 1 file, -0/+25)
The current racy approach is to enable asynchronous cancellation before making the syscall, restore the previous cancellation type once the syscall returns, and check in the cancellation entrypoint whether cancellation has happened. As described in BZ#12683, this approach shows 2 problems:

1. Cancellation can act after the syscall has returned from the kernel, but before userspace saves the return value. It might result in a resource leak if the syscall allocated a resource or had a side effect (partial read/write), and there is no way for the program to handle it with cancellation handlers.

2. If a signal is handled while the thread is blocked at a cancellable syscall, the entire signal handler runs with asynchronous cancellation enabled. This can lead to issues if the signal handler calls functions which are async-signal-safe but not async-cancel-safe.

For the cancellation to work correctly, there are 5 points at which the cancellation signal could arrive:

   [ ... )[ ... )[ syscall ]( ...
      1      2       3,4      5

1. Before the initial testcancel, e.g. [*... testcancel)
2. Between testcancel and syscall start, e.g. [testcancel...syscall start)
3. While the syscall is blocked and no side effects have yet taken place, e.g. [ syscall ]
4. Same as 3 but with side effects having occurred (e.g. a partial read or write).
5. After the syscall ends, e.g. (syscall end...*]

And libc wants to act on cancellation in cases 1, 2, and 3 but not in cases 4 or 5. For cases 4 and 5, the cancellation will eventually happen in the next cancellable entrypoint without any further external event.

The proposed solution for each case is:

1. Do a conditional branch based on whether the thread has received a cancellation request.

2. It can be caught by the signal handler determining that the saved program counter (from the ucontext_t) is in some address range beginning just before the "testcancel" and ending with the syscall instruction.

3. SIGCANCEL can be caught by the signal handler, which determines that the saved program counter (from the ucontext_t) is in the address range beginning just before "testcancel" and ending with the first uninterruptible (via a signal) syscall instruction that enters the kernel. In this case, except for certain syscalls that ALWAYS fail with EINTR even for non-interrupting signals, the kernel will reset the program counter to point at the syscall instruction during signal handling, so that the syscall is restarted when the signal handler returns. So, from the signal handler's standpoint, this looks the same as case 2, and thus it's taken care of.

4. For syscalls with side effects, the kernel cannot restart the syscall; when it's interrupted by a signal, the kernel must cause the syscall to return with whatever partial result is obtained (e.g. partial read or write). The saved program counter points just after the syscall instruction, so the signal handler won't act on cancellation.

5. This is similar to case 4, since the program counter is past the syscall instruction, so the signal handler won't act on cancellation.

So the proposed fixes are:

1. Remove the enable_asynccancel/disable_asynccancel function usage in the cancellable syscall definitions and instead make them call a common symbol that will check if cancellation is enabled (__syscall_cancel at nptl/cancellation.c), call the arch-specific cancellable entry point (__syscall_cancel_arch), and cancel the thread when required.

2. Provide an arch-specific generic system call wrapper function that contains global markers. These markers will be used in the SIGCANCEL signal handler to check if the interruption happened in a valid syscall and if the syscall has side effects. A reference implementation sysdeps/unix/sysv/linux/syscall_cancel.c is provided. However, the markers may not be set at the expected places depending on how INTERNAL_SYSCALL_NCS is implemented by the architecture. It is expected that all architectures add an arch-specific implementation.

3. Rewrite the SIGCANCEL asynchronous handler to check for both the cancelling type and whether the current IP from the signal handler falls between the global markers, and act accordingly.

4. Adjust libc code to replace LIBC_CANCEL_ASYNC/LIBC_CANCEL_RESET with the appropriate cancellable syscalls.

5. Adjust 'lowlevellock-futex.h' arch-specific implementations to provide cancellable futex calls.

Some architectures require specific support on syscall handling:

* On i386 the syscall cancel bridge needs to use the old int80 instruction, because with the optimized vDSO symbol the resulting PC value for an interrupted syscall points to an address outside the expected markers in __syscall_cancel_arch. It has been discussed in LKML [1] how the kernel could help userland accomplish this, but afaik the discussion has stalled. Also, sysenter should not be used directly by libc since its calling convention is set by the kernel depending on the underlying x86 chip (check kernel commit 30bfa7b3488bfb1bb75c9f50a5fcac1832970c60).

* mips o32 is the only kABI that requires 7-argument syscalls, and to avoid adding a requirement on all architectures to support it, mips support is added with extra internal defines.

Checked on aarch64-linux-gnu, arm-linux-gnueabihf, powerpc-linux-gnu, powerpc64-linux-gnu, powerpc64le-linux-gnu, i686-linux-gnu, and x86_64-linux-gnu.

[1] https://lkml.org/lkml/2016/3/8/1105 Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2024-08-19  elf: Make dl-fptr and dl-symaddr hppa specific  (Adhemerval Zanella, 1 file, -45/+0)
With the ia64 removal, function descriptor support is only used by HPPA, and new architectures do not seem to be leaning towards this design. Reviewed-by: Florian Weimer <fweimer@redhat.com>
2024-08-05  elf: Avoid re-initializing already allocated TLS in dlopen (bug 31717)  (Florian Weimer, 1 file, -7/+1)
The old code used l_init_called as an indicator of whether TLS initialization was complete. However, it is possible that TLS for an object is initialized, written to, and then dlopen for this object is called again, with l_init_called not true at this point. Previously, this resulted in TLS being initialized twice, discarding any interim writes (technically even introducing a use-after-free bug).

This commit introduces an explicit per-object flag, l_tls_in_slotinfo. It indicates whether _dl_add_to_slotinfo has been called for this object. This flag is used to avoid double-initialization of TLS. In update_tls_slotinfo, the first_static_tls micro-optimization is removed because preserving the initialization flag for subsequent use by the second loop for static TLS is a bit complicated, and another per-object flag does not seem worth it. Furthermore, the l_init_called flag is dropped from the second loop (for static TLS initialization) because l_need_tls_init on its own prevents double-initialization. The remaining l_init_called usage in resize_scopes and update_scopes is just an optimization due to the use of scope_has_map, so it is not changed in this commit.

The isupper check ensures that libc.so.6 TLS is not reverted. Such a revert happens if l_need_tls_init is not cleared in _dl_allocate_tls_init for the main_thread case, now that l_init_called is no longer checked in update_tls_slotinfo in elf/dl-open.c.

Reported-by: Jonathon Anderson <janderson@rice.edu> Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2024-08-05  elf: Clarify and invert second argument of _dl_allocate_tls_init  (Florian Weimer, 1 file, -3/+1)
Also remove an outdated comment: _dl_allocate_tls_init is called as part of pthread_create. Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2024-08-01  Add mremap tests  (H.J. Lu, 1 file, -0/+25)
Add tests for MREMAP_MAYMOVE and MREMAP_FIXED. On Linux, also test MREMAP_DONTUNMAP. Signed-off-by: H.J. Lu <hjl.tools@gmail.com> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2024-07-01  elf: Support recursive use of dynamic TLS in interposed malloc  (Florian Weimer, 1 file, -0/+14)
It turns out that quite a few applications use bundled mallocs that have been built to use global-dynamic TLS (instead of the recommended initial-exec TLS). The previous workaround from commit afe42e935b3ee97bac9a7064157587777259c60e ("elf: Avoid some free (NULL) calls in _dl_update_slotinfo") unfortunately does not fix all encountered cases.

This change avoids the TLS generation update for recursive use of TLS from a malloc that was called during a TLS update. This is possible because an interposed malloc has a fixed module ID and TLS slot (it cannot be unloaded). If an initially-loaded module ID is encountered in __tls_get_addr and the dynamic linker is already in the middle of a TLS update, use the outdated DTV, thus avoiding another call into malloc. It's still necessary to update the DTV to the most recent generation, to get out of the slow path, which is why the check for recursion is needed.

The bookkeeping is done using a global counter instead of a per-thread flag because TLS access in the dynamic linker is tricky.

All this will go away once the dynamic linker stops using malloc for TLS, likely as part of a change that pre-allocates all TLS during pthread_create/dlopen.

Fixes commit d2123d68275acc0f061e73d5f86ca504e0d5a344 ("elf: Fix slow tls access after dlopen [BZ #19924]"). Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
2024-06-18  elf: Remove HWCAP_IMPORTANT  (Stefan Liebler, 1 file, -3/+0)
Remove the definitions of HWCAP_IMPORTANT after the removal of LD_HWCAP_MASK / the glibc.cpu.hwcap_mask tunable.