path: root/sysdeps/x86
Age | Commit message | Author | Files | Lines
2025-04-12 | x86: Detect Intel Diamond Rapids | H.J. Lu | 1 | -0/+12
Detect Intel Diamond Rapids and tune it similarly to Intel Granite Rapids. Signed-off-by: H.J. Lu <hjl.tools@gmail.com> Reviewed-by: Sunil K Pandey <skpgkp2@gmail.com>
2025-04-11 | x86: Handle unknown Intel processor with default tuning | Sunil K Pandey | 1 | -144/+143
Enable default tuning for unknown Intel processors. Tested on x86, no regression. Co-Authored-By: H.J. Lu <hjl.tools@gmail.com> Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2025-04-10 | x86: Add ARL/PTL/CWF model detection support | Sunil K Pandey | 1 | -0/+10
- Add ARROWLAKE model detection.
- Add PANTHERLAKE model detection.
- Add CLEARWATERFOREST model detection.

Intel® Architecture Instruction Set Extensions Programming Reference https://cdrdv2.intel.com/v1/dl/getContent/671368 Section 1.2. No regression, validated model detection on SDE. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
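Model detection of this kind starts from the processor signature that CPUID leaf 1 returns in EAX. The sketch below decodes that signature into family, model and stepping using the field layout documented in Intel's SDM. The struct and function names are hypothetical, and this is not glibc's code. The test inputs are synthetic signatures chosen only to exercise the extended-model and extended-family rules.

```c
#include <stdint.h>

/* Hypothetical container for a decoded CPUID leaf 1 signature.  */
struct cpu_signature
{
  unsigned int family;
  unsigned int model;
  unsigned int stepping;
};

/* Decode display family/model/stepping from CPUID leaf 1 EAX.  */
static struct cpu_signature
decode_cpu_signature (uint32_t eax)
{
  struct cpu_signature sig;
  unsigned int base_family = (eax >> 8) & 0xf;
  unsigned int base_model = (eax >> 4) & 0xf;

  sig.stepping = eax & 0xf;
  sig.family = base_family;
  sig.model = base_model;

  /* The extended family field is only added when the base family is 0xf.  */
  if (base_family == 0xf)
    sig.family += (eax >> 20) & 0xff;

  /* The extended model field supplies the high nibble for families 0x6
     and 0xf.  */
  if (base_family == 0x6 || base_family == 0xf)
    sig.model += ((eax >> 16) & 0xf) << 4;

  return sig;
}
```

A model check then compares the decoded `family`/`model` pair against the values listed in the reference cited above.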
2025-04-05 | x86: Optimize xstate size calculation | Sunil K Pandey | 2 | -56/+24
Scan xstate IDs up to the maximum supported xstate ID. Remove the separate AMX xstate calculation. Instead, exclude the AMX space from the start of TILECFG to the end of TILEDATA in xsave_state_size. Validated on SKL/SKX/SPR/SDE by comparing the xsave state size with the output of the "ld.so --list-diagnostics" option; no regression. Co-Authored-By: H.J. Lu <hjl.tools@gmail.com> Reviewed-by: Sunil K Pandey <skpgkp2@gmail.com>
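The size rule described above can be sketched as follows: take the end of the highest-ending component as the full XSAVE size, then subtract the span from the start of TILECFG (component 17) to the end of TILEDATA (component 18). This is a minimal illustration under stated assumptions, not glibc's implementation:

- The per-component offset/size table is hypothetical input. In practice it would come from CPUID leaf 0xd sub-leaves.
- Component sizes are taken to be 0 when a component is not supported.

```c
#include <stddef.h>

/* Hypothetical offset/size of one component in the standard
   (non-compacted) XSAVE layout.  */
struct xstate_component
{
  unsigned int offset;
  unsigned int size;
};

enum { XSTATE_TILECFG = 17, XSTATE_TILEDATA = 18 };

/* Full XSAVE size minus the AMX span (TILECFG start .. TILEDATA end).  */
static unsigned int
xsave_size_without_amx (const struct xstate_component *c, size_t n)
{
  unsigned int total = 0;

  /* The full size is the end of the component that ends last.  */
  for (size_t i = 0; i < n; i++)
    if (c[i].size != 0 && c[i].offset + c[i].size > total)
      total = c[i].offset + c[i].size;

  /* Exclude the AMX region only if both AMX components are present.  */
  if (n > XSTATE_TILEDATA
      && c[XSTATE_TILECFG].size != 0 && c[XSTATE_TILEDATA].size != 0)
    {
      unsigned int amx_start = c[XSTATE_TILECFG].offset;
      unsigned int amx_end = c[XSTATE_TILEDATA].offset
                             + c[XSTATE_TILEDATA].size;
      total -= amx_end - amx_start;
    }
  return total;
}
```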
2025-03-31 | x86: Link tst-gnu2-tls2-x86-noxsave{,c,xsavec} with libpthread | Florian Weimer | 1 | -0/+3
This fixes a test build failure on Hurd. Fixes commit 145097dff170507fe73190e8e41194f5b5f7e6bf ("x86: Use separate variable for TLSDESC XSAVE/XSAVEC state size (bug 32810)"). Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2025-03-31 | Fix typo in comment | YLK | 1 | -1/+1
2025-03-29 | x86: Use separate variable for TLSDESC XSAVE/XSAVEC state size (bug 32810) | Florian Weimer | 8 | -7/+39
Previously, the initialization code reused the xsave_state_full_size member of struct cpu_features for the TLSDESC state size. However, the tunable processing code assumes that this member has the original XSAVE (non-compact) state size, so that it can use its value if XSAVEC is disabled via tunable. This change uses a separate variable and not a struct member because the value is only needed in ld.so and the static libc, but not in libc.so. As a result, struct cpu_features layout does not change, helping a future backport of this change. Fixes commit 9b7091415af47082664717210ac49d51551456ab ("x86-64: Update _dl_tlsdesc_dynamic to preserve AMX registers"). Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2025-03-29 | x86: Skip XSAVE state size reset if ISA level requires XSAVE | Florian Weimer | 1 | -0/+5
If we have to use XSAVE or XSAVEC trampolines, do not adjust the size information they need. Technically, it is an operator error to try to run with -XSAVE,-XSAVEC on such builds, but this change here disables some unnecessary code with higher ISA levels and simplifies testing. Related to commit befe2d3c4dec8be2cdd01a47132e47bdb7020922 ("x86-64: Don't use SSE resolvers for ISA level 3 or above"). Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2025-03-05 | Remove dl-procinfo.h | Adhemerval Zanella | 1 | -3/+0
powerpc was the only architecture with arch-specific hooks for LD_SHOW_AUXV, and with the information moved to ld diagnostics there is no need to keep the _dl_procinfo hook. Checked with a build for all affected ABIs. Reviewed-by: Peter Bergner <bergner@linux.ibm.com>
2025-02-28 | Remove unused dl-procinfo.h | Wilco Dijkstra | 3 | -49/+0
Remove unused _dl_hwcap_string defines. As a result many dl-procinfo.h headers can be removed. This also removes target specific _dl_procinfo implementations which only printed HWCAP strings using dl_hwcap_string. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2025-02-23 | math: Fix `unknown type name '__float128'` for clang 3.4 to 3.8.1 (bug 32694) | koraynilay | 1 | -2/+2
When compiling a program that includes <bits/floatn.h> using a clang version between 3.4 and 3.8.1 (inclusive), clang fails with `unknown type name '__float128'; did you mean '__cfloat128'?`. This change fixes the clang prerequisite macro call in floatn.h to check for clang 3.9 instead of 3.4, since support for __float128 was actually enabled in 3.9 by:

    commit 50f29e06a1b6a38f0bba9360cbff72c82d46cdd4
    Author: Nemanja Ivanovic <nemanja.i.ibm@gmail.com>
    Date: Wed Apr 13 09:49:45 2016 +0000

        Enable support for __float128 in Clang

This fixes bug 32694. Signed-off-by: koraynilay <koray.fra@gmail.com> Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
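A compiler-version prerequisite check of this kind typically packs major and minor into one integer, so that a single comparison orders versions. The sketch below illustrates that technique with a hypothetical helper. It is not glibc's actual macro. The `major << 16` packing assumes minor versions stay below 65536.

```c
/* Hypothetical: nonzero if version major.minor is at least
   req_major.req_minor.  */
static int
clang_version_at_least (unsigned int major, unsigned int minor,
                        unsigned int req_major, unsigned int req_minor)
{
  return ((major << 16) + minor) >= ((req_major << 16) + req_minor);
}
```

With the old requirement of 3.4, clang 3.8 passes the check even though it lacks `__float128`. Raising the requirement to 3.9 makes 3.4 through 3.8 fail it.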
2025-02-20 | x86 (__HAVE_FLOAT128): Defined to 0 for Intel SYCL compiler [BZ #32723] | H.J. Lu | 1 | -2/+6
The Intel compiler always defines __INTEL_LLVM_COMPILER. When SYCL is enabled by -fsycl, it also defines SYCL_LANGUAGE_VERSION. Since the Intel SYCL compiler doesn't support _Float128 (https://github.com/intel/llvm/issues/16903), define __HAVE_FLOAT128 to 0 for the Intel SYCL compiler. This fixes BZ #32723. Signed-off-by: H.J. Lu <hjl.tools@gmail.com> Reviewed-by: Sam James <sam@gentoo.org>
2025-01-09 | x86: Add missing #include <features.h> to <thread_pointer.h> | Florian Weimer | 1 | -0/+2
It is required for __GNUC_PREREQ. Reviewed-by: Michael Jeanson <mjeanson@efficios.com>
2025-01-09 | Move <thread_pointer.h> to kernel-independent sysdeps directories | Florian Weimer | 1 | -0/+0
Hurd is expected to use the same thread ABI as Linux. Reviewed-by: Michael Jeanson <mjeanson@efficios.com>
2025-01-01 | Update copyright dates with scripts/update-copyrights | Paul Eggert | 107 | -107/+107
2024-12-23 | include/sys/cdefs.h: Add __attribute_optimization_barrier__ | Adhemerval Zanella | 12 | -25/+25
Add __attribute_optimization_barrier__ to disable inlining and cloning on a function. For Clang, expand it to __attribute__ ((optnone)). Otherwise, expand it to __attribute__ ((noinline, noclone)). Co-Authored-By: H.J. Lu <hjl.tools@gmail.com> Reviewed-by: Sam James <sam@gentoo.org>
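A minimal sketch of the compiler-dependent expansion described above, using a hypothetical macro name (`example_optimization_barrier`) rather than the real cdefs.h definition. `optnone` is a Clang attribute. `noinline` and `noclone` are GCC function attributes.

```c
/* Hypothetical stand-in for __attribute_optimization_barrier__.  */
#ifdef __clang__
# define example_optimization_barrier __attribute__ ((optnone))
#else
# define example_optimization_barrier __attribute__ ((noinline, noclone))
#endif

/* The compiler may neither inline nor clone this function, so a caller
   (for example a test) exercises a genuine out-of-line call.  */
static example_optimization_barrier int
add_one (int x)
{
  return x + 1;
}
```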
2024-12-22 | x86: Define __HAVE_FLOAT128 for Clang and use __builtin_*f128 code path | Fangrui Song | 1 | -8/+16
Clang supports __builtin_fabsf128 (despite not supporting _Float128) but it does not support __builtin_fabsq. Fall back to `typedef __float128 _Float128;` if Clang is used. Originally developed by Fangrui Song <maskray@google.com>. Co-Authored-By: H.J. Lu <hjl.tools@gmail.com> Reviewed-by: Sam James <sam@gentoo.org>
2024-12-22 | x86: Use inhibit_stack_protector on tst-ifunc-isa.h | Adhemerval Zanella | 1 | -2/+3
Co-Authored-By: H.J. Lu <hjl.tools@gmail.com> Reviewed-by: Sam James <sam@gentoo.org>
2024-12-22 | x86: Include test-flt-eval-method-387 if -mfpmath=387 works | H.J. Lu | 3 | -1/+47
Since Clang doesn't support -mfpmath=387 on x86-64, include test-flt-eval-method-387 on x86 only if -mfpmath=387 works. Signed-off-by: H.J. Lu <hjl.tools@gmail.com> Reviewed-by: Sam James <sam@gentoo.org>
2024-12-20 | elf: Introduce is_rtld_link_map | Florian Weimer | 1 | -1/+1
Unconditionally define it to false for static builds. This avoids the awkward use of weak_extern for _dl_rtld_map in checks that cannot possibly be true in static builds. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
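The pattern can be sketched like this. In a shared build the predicate compares against the loader's own map object. In a static build it is a constant `false`, so the compiler can drop the guarded code without any weak reference. All names here (`example_link_map`, `example_rtld_map`, `example_is_rtld_link_map`) are hypothetical stand-ins, and `SHARED` is assumed to be the macro that distinguishes the two build modes.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the dynamic linker's link map type.  */
struct example_link_map
{
  const char *name;
};

#ifdef SHARED
/* Shared build: the loader's own map exists and can be compared.  */
extern struct example_link_map example_rtld_map;

static inline bool
example_is_rtld_link_map (const struct example_link_map *map)
{
  return map == &example_rtld_map;
}
#else
/* Static build: the check can never be true.  */
static inline bool
example_is_rtld_link_map (const struct example_link_map *map)
{
  (void) map;
  return false;
}
#endif
```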
2024-12-18 | sys/platform/x86.h: Do not depend on _Bool definition in C++ mode | H.J. Lu | 2 | -3/+3
Clang does not define _Bool for -std=c++98:

/usr/include/bits/platform/features.h:31:19: error: unknown type name '_Bool'
   31 | static __inline__ _Bool
      |                   ^

Change _Bool to bool to silence the clang++ error. Signed-off-by: H.J. Lu <hjl.tools@gmail.com> Reviewed-by: Florian Weimer <fweimer@redhat.com>
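A header-style inline predicate can be written to compile as both C and C++. C++ has a built-in `bool`, while C (C99 and later) gets `bool` from `<stdbool.h>`. Spelling the return type `_Bool` is what trips clang++ in C++98 mode. The function name and bit layout below are hypothetical, not the actual `<sys/platform/x86.h>` contents.

```c
/* C++ already has bool; C needs <stdbool.h> (pre-C23).  */
#ifndef __cplusplus
# include <stdbool.h>
#endif

/* Hypothetical: true if BIT (0..31) is set in WORD.  */
static __inline__ bool
example_feature_bit_set (unsigned int word, unsigned int bit)
{
  return (word >> bit) & 1;
}
```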
2024-12-17 | x86: Avoid integer truncation with large cache sizes (bug 32470) | Florian Weimer | 1 | -2/+2
Some hypervisors report an L3 cache size of 1 TiB. This results in some variables incorrectly being zeroed, causing crashes in memcpy/memmove because invariants are violated.
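A 1 TiB size is 2^40 bytes, which does not fit in a 32-bit `unsigned int`. Conversion to an unsigned type keeps the value modulo 2^32, and the low 32 bits of 2^40 are all zero. The sketch below shows the effect on a derived threshold. Both the function names and the "three quarters of the cache" formula are hypothetical, chosen only to illustrate the bug pattern. It also assumes a 32-bit `unsigned int`, as on x86.

```c
#include <stdint.h>

/* Bug pattern: a 32-bit intermediate keeps only the low 32 bits.  */
static uint64_t
threshold_via_uint (uint64_t cache_bytes)
{
  unsigned int shared = (unsigned int) cache_bytes;
  return (uint64_t) shared * 3 / 4;
}

/* Fixed pattern: keep the whole computation in 64 bits.  */
static uint64_t
threshold_via_uint64 (uint64_t cache_bytes)
{
  return cache_bytes * 3 / 4;
}
```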
2024-12-16 | Fix sysdeps/x86/fpu/Makefile: Split and sort tests | H.J. Lu | 1 | -1/+2
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
2024-12-16 | sysdeps/x86/fpu/Makefile: Split and sort tests | H.J. Lu | 1 | -2/+7
Split and sort tests in sysdeps/x86/fpu/Makefile. Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
2024-11-25 | Silence most -Wzero-as-null-pointer-constant diagnostics | Alejandro Colomar | 1 | -1/+1
Replace 0 by NULL and {0} by {}. Omit a few cases that aren't so trivial to fix. Link: <https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117059> Link: <https://software.codidact.com/posts/292718/292759#answer-292759> Signed-off-by: Alejandro Colomar <alx@kernel.org>
2024-08-26 | x86: Enable non-temporal memset for Hygon processors | Feifei Wang | 2 | -3/+8
This patch uses the 'Avoid_Non_Temporal_Memset' flag to access the non-temporal memset implementation for Hygon processors. Test results (new performance time / old performance time, with x86_memset_non_temporal_threshold = 8MB on each arch):

size   hygon1  hygon2  hygon3
1MB    0.994   1       1
4MB    0.996   1       0.990
8MB    0.670   1.312   0.737
16MB   0.343   0.822   0.390
32MB   0.355   0.830   0.401

For Hygon processors with this patch, non-temporal stores can improve performance by 20% - 65%. Signed-off-by: Feifei Wang <wangfeifei@hygon.cn> Reviewed-by: Jing Li <lijing@hygon.cn> Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2024-08-26 | x86: Add cache information support for Hygon processors | Feifei Wang | 1 | -0/+60
Add a Hygon branch in the dl_init_cacheinfo function to initialize cache size variables for Hygon processors. In addition, add a handle_hygon() function to get cache information. Signed-off-by: Feifei Wang <wangfeifei@hygon.cn> Reviewed-by: Jing Li <lijing@hygon.cn> Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2024-08-26 | x86: Add new architecture type for Hygon processors | Feifei Wang | 2 | -3/+17
Add a new architecture type, arch_kind_hygon, to split the Hygon branch from AMD. This lets Hygon processors use settings suited to their own characteristics. Signed-off-by: Feifei Wang <wangfeifei@hygon.cn> Reviewed-by: Jing Li <lijing@hygon.cn> Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2024-08-15 | x86: Add `Avoid_STOSB` tunable to allow NT memset without ERMS | Noah Goldstein | 5 | -7/+40
The goal of this flag is to allow targets which don't prefer/have ERMS to still access the non-temporal memset implementation. There are 4 cases for tuning memset:

1) `Avoid_STOSB && Avoid_Non_Temporal_Memset`
   - Memset with temporal stores.
2) `Avoid_STOSB && !Avoid_Non_Temporal_Memset`
   - Memset with temporal/non-temporal stores. The non-temporal path goes through the `rep stosb` path. We accomplish this by setting `x86_rep_stosb_threshold` to `x86_memset_non_temporal_threshold`.
3) `!Avoid_STOSB && Avoid_Non_Temporal_Memset`
   - Memset with temporal stores/`rep stosb`.
4) `!Avoid_STOSB && !Avoid_Non_Temporal_Memset`
   - Memset with temporal stores/`rep stosb`/non-temporal stores.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
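The four cases above can be modeled with two thresholds, where `SIZE_MAX` means "never take this path". Sizes above the `rep stosb` threshold enter that path, which in turn branches to the non-temporal loop above the non-temporal threshold. Setting the `rep stosb` threshold equal to the non-temporal threshold (case 2) therefore turns `rep stosb` into a pure gateway to the non-temporal loop. Case 1 falls out of the same assignment, since both thresholds become `SIZE_MAX`. This is a hypothetical model, not glibc's code. The example values (2048 bytes, 8 MiB) are illustrative inputs only.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum memset_strategy { MEMSET_TEMPORAL, MEMSET_REP_STOSB, MEMSET_NON_TEMPORAL };

struct memset_tuning
{
  size_t rep_stosb_threshold;
  size_t non_temporal_threshold;
};

/* Hypothetical: derive thresholds from the two Avoid_* flags.  */
static struct memset_tuning
memset_tuning_for (bool avoid_stosb, bool avoid_non_temporal,
                   size_t default_rep_stosb, size_t default_non_temporal)
{
  struct memset_tuning t = { default_rep_stosb, default_non_temporal };
  if (avoid_non_temporal)
    t.non_temporal_threshold = SIZE_MAX;
  if (avoid_stosb)
    /* Case 2: rep stosb entry only leads to the non-temporal loop.
       Case 1: both thresholds are SIZE_MAX, so stores stay temporal.  */
    t.rep_stosb_threshold = t.non_temporal_threshold;
  return t;
}

/* Hypothetical dispatch for a memset of N bytes.  */
static enum memset_strategy
memset_pick (const struct memset_tuning *t, size_t n)
{
  if (n <= t->rep_stosb_threshold)
    return MEMSET_TEMPORAL;
  if (n > t->non_temporal_threshold)
    return MEMSET_NON_TEMPORAL;
  return MEMSET_REP_STOSB;
}
```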
2024-08-15 | x86: Use `Avoid_Non_Temporal_Memset` to control non-temporal path | Noah Goldstein | 2 | -8/+23
This is just a refactor and there should be no behavioral change from this commit. The goal is to make `Avoid_Non_Temporal_Memset` a more universal knob for controlling whether we use non-temporal memset rather than having extra logic based on vendor. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2024-08-02 | x86: Tunables may incorrectly set Prefer_PMINUB_for_stringop (bug 32047) | Florian Weimer | 1 | -0/+1
Fixes commit 5bcf6265f215326d14dfacdce8532792c2c7f8f8 ("x86: Disable non-temporal memset on Skylake Server"). Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2024-08-02 | x86: Add missing switch/case fall-through markers to init_cpu_features | Florian Weimer | 1 | -0/+2
The commits introducing these fall-throughs intended them to happen. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2024-07-16 | x86: Disable non-temporal memset on Skylake Server | Noah Goldstein | 5 | -12/+26
The original commit enabling non-temporal memset on Skylake Server had erroneous benchmarks (actually done on ICX). Further benchmarks indicate non-temporal stores may in fact be a regression on Skylake Server. This commit may be over-cautious in some cases, but should avoid any regressions for 2.40. Tested using qemu on all x86_64 CPU architectures supported by both qemu and glibc. Reviewed-by: DJ Delorie <dj@redhat.com> Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2024-06-30 | x86: Set default non_temporal_threshold for Zhaoxin processors | MayShao-oc | 2 | -2/+5
Currently, 'non_temporal_threshold' is set to 'non_temporal_threshold_lowbound' on Zhaoxin processors without ERMS. The default 'non_temporal_threshold_lowbound' is too small for the KH-40000 and KX-7000 Zhaoxin processors, so this patch updates the value to 'shared / cachesize_non_temporal_divisor'. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
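The new default amounts to dividing the shared cache size by a divisor. The sketch below computes it that way and, as an added assumption, never lets the result drop below a lower bound. The function name, the clamp, and the test values are illustrative only, not glibc's actual defaults.

```c
#include <stddef.h>

/* Hypothetical: shared / divisor, clamped below at LOWBOUND.  A zero
   divisor is treated as "no division".  */
static size_t
example_non_temporal_threshold (size_t shared, size_t divisor,
                                size_t lowbound)
{
  size_t threshold = divisor != 0 ? shared / divisor : shared;
  return threshold < lowbound ? lowbound : threshold;
}
```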
2024-06-30 | x86: Set preferred CPU features on the KH-40000 and KX-7000 Zhaoxin processors | MayShao-oc | 1 | -16/+35
Fix code formatting under the Zhaoxin branch and add comments for different Zhaoxin models. Unaligned AVX loads are slower on KH-40000 and KX-7000, so disable AVX_Fast_Unaligned_Load. Enable the Prefer_No_VZEROUPPER and Fast_Unaligned_Load features to use the sse2_unaligned versions of memset, strcpy and strcat. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2024-06-18 | elf: Remove HWCAP_IMPORTANT | Stefan Liebler | 1 | -13/+0
Remove the definitions of HWCAP_IMPORTANT after the removal of LD_HWCAP_MASK / tunable glibc.cpu.hwcap_mask, where HWCAP_IMPORTANT was used as the default value. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2024-06-18 | elf: Remove _DL_PLATFORMS_COUNT | Stefan Liebler | 3 | -12/+4
Remove the definitions of _DL_PLATFORMS_COUNT as those are not used anymore after removal in elf/dl-cache.c:search_cache(). Note: On x86, we can also get rid of the definitions HWCAP_PLATFORMS_START and HWCAP_PLATFORMS_COUNT. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2024-06-18 | elf: Remove _DL_FIRST_PLATFORM | Stefan Liebler | 1 | -3/+0
Remove the definitions of _DL_FIRST_PLATFORM, as those were only used in the _DL_HWCAP_PLATFORM definitions and in _dl_string_platform(). Both were removed. Note: Removed on every architecture except powerpc, where _dl_string_platform() is still used. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2024-06-18 | elf: Remove _DL_HWCAP_PLATFORM | Stefan Liebler | 1 | -3/+0
Remove the definitions of _DL_HWCAP_PLATFORM as those are not used anymore after removal in elf/dl-cache.c:search_cache(). Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2024-06-18 | elf: Remove platform strings in dl-procinfo.c | Stefan Liebler | 1 | -16/+0
Remove the platform strings in dl-procinfo.c, where the implementation of _dl_string_platform() was also removed. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2024-06-18 | elf: Remove _dl_string_platform | Stefan Liebler | 1 | -15/+0
Apart from powerpc, where the returned integer is stored in the tcb, and the diagnostics output, there are no users anymore. Thus this patch removes the diagnostics output and _dl_string_platform for all other platforms. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2024-06-18 | x86: Remove HWCAP_START and HWCAP_COUNT | Stefan Liebler | 1 | -6/+0
Neither define is used anymore. Both were only used for _dl_string_hwcap(), which itself was removed with commit ab40f20364f4a417a63dd51fdd943742070bfe96 ("elf: Remove _dl_string_hwcap"). Just clean up. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2024-06-17 | Convert to autoconf 2.72 (vanilla release, no distribution patches) | Andreas K. Hüttel | 1 | -12/+16
As discussed at the patch review meeting Signed-off-by: Andreas K. Hüttel <dilfridge@gentoo.org> Reviewed-by: Simon Chopin <simon.chopin@canonical.com>
2024-06-14 | x86: Fix value for `x86_memset_non_temporal_threshold` when it is undesirable | Noah Goldstein | 1 | -3/+3
When we don't want to use non-temporal stores for memset, we set `x86_memset_non_temporal_threshold` to SIZE_MAX. The current code, however, was using `maximum_non_temporal_threshold` as the upper bound, which is `SIZE_MAX >> 4`, so we ended up with a value of `0`. The fix is to use `SIZE_MAX` as the upper bound when setting the tunable. Tested-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
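The core of the mismatch is that the "never" value, `SIZE_MAX`, lies outside a range whose upper bound is `SIZE_MAX >> 4`. The hypothetical bounds check below shows why widening the upper bound to `SIZE_MAX` lets the value be stored. It does not model how the out-of-range case ended up as `0`.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: a bounded setting accepts VALUE only within [MIN, MAX].  */
static bool
example_within_bounds (size_t value, size_t min, size_t max)
{
  return value >= min && value <= max;
}
```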
2024-06-12 | x86: Properly set x86 minimum ISA level [BZ #31883] | H.J. Lu | 3 | -3/+17
Properly set libc_cv_have_x86_isa_level in shell for MINIMUM_X86_ISA_LEVEL defined as

(__X86_ISA_V1 + __X86_ISA_V2 + __X86_ISA_V3 + __X86_ISA_V4)

Also set __X86_ISA_V2 to 1 for i386 if __GCC_HAVE_SYNC_COMPARE_AND_SWAP_8 is defined. There are no changes in config.h nor in config.make on x86-64. On i386, -march=x86-64-v2 with GCC generates

#define MINIMUM_X86_ISA_LEVEL 2

in config.h and

have-x86-isa-level = 2

in config.make. This fixes BZ #31883. Signed-off-by: H.J. Lu <hjl.tools@gmail.com> Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
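Because the minimum ISA level is a sum of 0/1 flags rather than a literal number, any consumer has to evaluate it arithmetically. The preprocessor does this naturally in `#if`. The sketch assumes the cumulative levels implied above (V2 implies V1, and so on) and the i386 `-march=x86-64-v2` case, where V1 and V2 are 1. The `EX_` names are hypothetical stand-ins for the real `__X86_ISA_Vn` macros.

```c
/* Hypothetical flags for an -march=x86-64-v2 build.  */
#define EX_ISA_V1 1
#define EX_ISA_V2 1
#define EX_ISA_V3 0
#define EX_ISA_V4 0

/* Same shape as the MINIMUM_X86_ISA_LEVEL definition above.  */
#define EX_MINIMUM_ISA_LEVEL (EX_ISA_V1 + EX_ISA_V2 + EX_ISA_V3 + EX_ISA_V4)

/* #if evaluates the sum, so the comparison works as a number.  */
#if EX_MINIMUM_ISA_LEVEL >= 2
# define EX_AT_LEAST_V2 1
#else
# define EX_AT_LEAST_V2 0
#endif
```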
2024-06-11 | x86: Properly set MINIMUM_X86_ISA_LEVEL for i386 [BZ #31867] | H.J. Lu | 2 | -4/+12
On i386, set the default minimum ISA level to 0, not 1 (baseline which includes SSE2). There are no changes in config.h nor in config.make on x86-64. This fixes BZ #31867. Signed-off-by: H.J. Lu <hjl.tools@gmail.com> Tested-by: Ian Jordan <immoloism@gmail.com> Reviewed-by: Sam James <sam@gentoo.org> Reviewed-by: Florian Weimer <fweimer@redhat.com>
2024-06-10 | x86: Enable non-temporal memset tunable for AMD | Joe Damato | 1 | -4/+4
In commit 46b5e98ef6f1 ("x86: Add seperate non-temporal tunable for memset") a tunable threshold for enabling non-temporal memset was added, but only for Intel hardware. Since that commit, new benchmark results suggest that non-temporal memset is beneficial on AMD as well, so allow this tunable to be set for AMD. See: https://docs.google.com/spreadsheets/d/1opzukzvum4n6-RUVHTGddV6RjAEil4P2uMjjQGLbLcU/edit?usp=sharing which has been updated to include data using different strategies for large memset on AMD Zen2, Zen3, and Zen4. Signed-off-by: Joe Damato <jdamato@fastly.com> Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2024-05-30 | x86: Add seperate non-temporal tunable for memset | Noah Goldstein | 5 | -2/+31
The tuning for non-temporal stores for memset vs. memcpy is not always the same. This includes both the exact value and whether non-temporal stores are profitable at all for a given arch. This patch adds `x86_memset_non_temporal_threshold`. Currently we disable non-temporal stores for non-Intel vendors, as the only benchmarks showing their benefit have been on Intel hardware. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2024-05-27 | i386: Disable Intel Xeon Phi tests for GCC 15 and above (BZ 31782) | Sunil K Pandey | 1 | -1/+7
This patch disables Intel Xeon Phi tests for GCC 15 and above. GCC 15 removed Intel Xeon Phi ISA support:

    commit e1a7e2c54d52d0ba374735e285b617af44841ace
    Author: Haochen Jiang <haochen.jiang@intel.com>
    Date: Mon May 20 10:43:44 2024 +0800

        i386: Remove Xeon Phi ISA support

Fixes BZ 31782. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2024-05-07 | support: Add envp argument to support_capture_subprogram | Adhemerval Zanella | 1 | -1/+1
So tests can specify a list of environment variables. Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>