path: root/sysdeps/aarch64/fpu/vecmath_config.h
2025-01-01  Update copyright dates with scripts/update-copyrights  (Paul Eggert; 1 file, -1/+1)
2024-11-01  AArch64: Remove SVE erf and erfc tables  (Joe Ramsay; 1 file, -20/+8)
By using mask-and-add instead of the shift-based index calculation, the routines can share the same table as the other variants with no performance degradation. The tables are renamed because of other changes in downstream AOR. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
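As a rough illustration of the indexing change (the names and constants below are invented, not the values in the AOR sources): a shift-only index ties each routine to its own table, while masking the relevant bits and adding a bias first maps the lookup onto the index space the sibling variants already use, so one table can serve all of them.

```c
#include <stdint.h>

/* Hypothetical sketch only: mask, bias and shift stand in for the
   real constants used by the SVE erf/erfc routines.  */
static inline uint64_t
shared_table_index (uint64_t ix, uint64_t mask, uint64_t bias, int shift)
{
  /* Shift-based (variant-specific range):  ix >> shift
     Mask-and-add (shared index space):     ((ix & mask) + bias) >> shift  */
  return ((ix & mask) + bias) >> shift;
}
```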
2024-05-21  aarch64/fpu: Add vector variants of pow  (Joe Ramsay; 1 file, -11/+31)
Also moves some includes around so that the duplicate definition of asuint64 can be removed. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
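For context, asuint64 is the usual bit-cast helper in this family of routines; a minimal sketch of the shape it takes (the authoritative definition is the one in the header itself):

```c
#include <stdint.h>

/* Reinterpret a double's bit pattern as a uint64_t via a union, the
   standard type-punning idiom used across glibc/AOR math code.  */
static inline uint64_t
asuint64 (double f)
{
  union { double f; uint64_t i; } u = { f };
  return u.i;
}
```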
2024-04-04  aarch64/fpu: Add vector variants of erfc  (Joe Ramsay; 1 file, -0/+16)
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
2024-04-04  aarch64/fpu: Add vector variants of cosh  (Joe Ramsay; 1 file, -0/+2)
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
2024-04-04  aarch64/fpu: Add vector variants of erf  (Joe Ramsay; 1 file, -0/+28)
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
2024-01-01  Update copyright dates with scripts/update-copyrights  (Paul Eggert; 1 file, -1/+1)
2023-11-10  aarch64: Add vector implementations of atan2 routines  (Joe Ramsay; 1 file, -0/+11)
2023-10-23  aarch64: Add vector implementations of log10 routines  (Joe Ramsay; 1 file, -0/+11)
A table is also added, which is shared between AdvSIMD and SVE log10.
2023-10-23  aarch64: Add vector implementations of log2 routines  (Joe Ramsay; 1 file, -0/+12)
A table is also added, which is shared between AdvSIMD and SVE log2.
2023-10-05  aarch64: Optimise vecmath logs  (Joe Ramsay; 1 file, -2/+4)
* Transpose table layout for improved memory access (see the sketch after this list)
* Use half-vector special comparisons for AdvSIMD
* Improve register use near special-case branches: due to the presence of a function call, the return value would get mov-d out of x0 in order to satisfy the PCS. Moving the final computation after the branch avoids this.
Also change SVE routines to use overloaded intrinsics for readability.
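A sketch of what transposing the table layout means here, with made-up sizes and field names: keeping each entry's fields adjacent turns two scattered loads per lookup into one contiguous access.

```c
/* Before (illustrative): parallel arrays, so one index touches two
   widely separated locations.  */
struct log_data_split
{
  double invc[128];
  double logc[128];
};

/* After (illustrative): paired entries, so invc and logc for a given
   index share a cache line and can be fetched together.  */
struct log_data_paired
{
  struct { double invc, logc; } table[128];
};
```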
2023-06-30  aarch64: Add vector implementations of exp routines  (Joe Ramsay; 1 file, -0/+3)
Optimised implementations for single and double precision, Advanced SIMD and SVE, copied from Arm Optimized Routines. As previously, data tables are used via a barrier to prevent overly aggressive constant inlining. Special-case handlers are marked NOINLINE to avoid incurring the penalty of switching call standards unnecessarily. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
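A minimal sketch of the NOINLINE pattern mentioned above (not the glibc code itself): keeping the scalar fallback out of line confines the cost of switching from the vector PCS to the base PCS to the rare lanes that need it.

```c
#include <math.h>

/* Out-of-line scalar fallback; the attribute stops GCC from inlining
   it into the vector routine, which would drag the caller-save/restore
   cost of the standard PCS into the hot path.  */
__attribute__ ((noinline)) static double
special_case (double x)
{
  return exp (x);
}
```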
2023-06-30  aarch64: Add vector implementations of log routines  (Joe Ramsay; 1 file, -0/+10)
Optimised implementations for single and double precision, Advanced SIMD and SVE, copied from Arm Optimized Routines. Log lookup table added as HIDDEN symbol to allow it to be shared between AdvSIMD and SVE variants. As previously, data tables are used via a barrier to prevent overly aggressive constant inlining. Special-case handlers are marked NOINLINE to avoid incurring the penalty of switching call standards unnecessarily. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
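The sharing mechanism is hidden visibility on the table symbol; a sketch of the idea (glibc spells the attribute via its attribute_hidden macro, and the actual struct layout differs):

```c
/* Hidden visibility keeps the symbol out of the public ABI while still
   letting the AdvSIMD and SVE translation units bind to the same table
   at link time.  */
extern const struct
{
  struct { double invc, logc; } table[128];  /* illustrative layout */
} __v_log_data __attribute__ ((visibility ("hidden")));
```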
2023-06-30  aarch64: Add vector implementations of cos routines  (Joe Ramsay; 1 file, -0/+38)
Replace the loop-over-scalar placeholder routines with optimised implementations from Arm Optimized Routines (AOR). Also add some headers containing utilities for aarch64 libmvec routines, and update libm-test-ulps.

Data tables for new routines are used via a pointer with a barrier on it, in order to prevent overly aggressive constant inlining in GCC. This allows a single adrp, combined with offset loads, to be used for every constant in the table.

Special-case handlers are marked NOINLINE in order to confine the save/restore overhead of switching from the vector to the normal calling standard. This way we only incur the extra memory access in the exceptional cases. NOINLINE definitions have been moved to math_private.h in order to reduce duplication.

AOR exposes a config option, WANT_SIMD_EXCEPT, to enable selective masking (and later fixing up) of invalid lanes, in order to trigger fp exceptions correctly (AdvSIMD only). This is tested and maintained in AOR, but is configured off at source level here for performance reasons. We keep the WANT_SIMD_EXCEPT blocks in routine sources to greatly simplify the upstreaming process from AOR to glibc.

Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
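A sketch of the pointer-barrier idiom described above, along the lines of the ptr_barrier macro in this header: an empty asm that takes the pointer as an in/out operand hides its value from the compiler, so table accesses cannot be constant-folded and each constant is reached with one adrp plus offset loads.

```c
/* Hide PTR's value from the compiler so loads through it cannot be
   folded into immediates; GCC then materialises the table address once
   (adrp) and uses plain offset loads for every entry.  */
#define ptr_barrier(ptr)                  \
  ({                                      \
    __typeof (ptr) __ptr = (ptr);         \
    __asm("" : "+r"(__ptr));              \
    __ptr;                                \
  })
```

Typical use is `const struct data *d = ptr_barrier (&data);` at the top of a routine, after which all table accesses go through d.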