| Age | Commit message (Collapse) | Author | Files | Lines |
|
Replace the loop-over-scalar placeholder routines with optimised
implementations from Arm Optimized Routines (AOR).
Also add some headers containing utilities for aarch64 libmvec
routines, and update libm-test-ulps.
Data tables for new routines are used via a pointer with a
barrier on it, in order to prevent overly aggressive constant
inlining in GCC. This allows a single adrp, combined with offset
loads, to be used for every constant in the table.
Special-case handlers are marked NOINLINE in order to confine the
save/restore overhead of switching from vector to normal calling
standard. This way we only incur the extra memory access in the
exceptional cases. NOINLINE definitions have been moved to
math_private.h in order to reduce duplication.
AOR exposes a config option, WANT_SIMD_EXCEPT, to enable
selective masking (and later fixing up) of invalid lanes, in
order to trigger fp exceptions correctly (AdvSIMD only). This is
tested and maintained in AOR, however it is configured off at
source level here for performance reasons. We keep the
WANT_SIMD_EXCEPT blocks in routine sources to greatly simplify
the upstreaming process from AOR to glibc.
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
|
|
Finally remove all mpa related files, headers, declarations, probes, unused
tables and update makefiles.
Reviewed-By: Paul Zimmermann <Paul.Zimmermann@inria.fr>
|
|
The __sin32 and __cos32 functions were only used in the now removed slow
path of asin and acos.
|
|
issignalingf is a very small function used in some areas where
better performance (and smaller code) might be helpful.
Create inline implementation for issignalingf.
Reviewed-by: Joseph Myers <joseph@codesourcery.com>
|
|
The algorithm is exp(y * log(x)), where log(x) is computed with about
1.3*2^-68 relative error (1.5*2^-68 without fma), returning the result
in two doubles, and the exp part uses the same algorithm (and lookup
tables) as exp, but takes the input as two doubles and a sign (to handle
negative bases with odd integer exponent). The __exp1 internal symbol
is no longer necessary.
There is separate code path when fma is not available but the worst case
error is about 0.54 ULP in both cases. The lookup table and consts for
log are 4168 bytes. The .rodata+.text is decreased by 37908 bytes on
aarch64. The non-nearest rounding error is less than 1 ULP.
Improvements on Cortex-A72 compared to current glibc master:
pow thruput: 2.40x in [0.01 11.1]x[0.01 11.1]
pow latency: 1.84x in [0.01 11.1]x[0.01 11.1]
Tested on
aarch64-linux-gnu (defined __FP_FAST_FMA, TOINT_INTRINSICS) and
arm-linux-gnueabihf (!defined __FP_FAST_FMA, !TOINT_INTRINSICS) and
x86_64-linux-gnu (!defined __FP_FAST_FMA, !TOINT_INTRINSICS) and
powerpc64le-linux-gnu (defined __FP_FAST_FMA, !TOINT_INTRINSICS) targets.
* NEWS: Mention pow improvements.
* math/Makefile (type-double-routines): Add e_pow_log_data.
* sysdeps/generic/math_private.h (__exp1): Remove.
* sysdeps/i386/fpu/e_pow_log_data.c: New file.
* sysdeps/ia64/fpu/e_pow_log_data.c: New file.
* sysdeps/ieee754/dbl-64/Makefile (CFLAGS-e_pow.c): Allow fma
contraction.
* sysdeps/ieee754/dbl-64/e_exp.c (__exp1): Remove.
(exp_inline): Remove.
(__ieee754_exp): Only single double input is handled.
* sysdeps/ieee754/dbl-64/e_pow.c: Rewrite.
* sysdeps/ieee754/dbl-64/e_pow_log_data.c: New file.
* sysdeps/ieee754/dbl-64/math_config.h (issignaling_inline): Define.
(__pow_log_data): Define.
* sysdeps/ieee754/dbl-64/upow.h: Remove.
* sysdeps/ieee754/dbl-64/upow.tbl: Remove.
* sysdeps/m68k/m680x0/fpu/e_pow_log_data.c: New file.
* sysdeps/x86_64/fpu/multiarch/Makefile (CFLAGS-e_pow-fma.c): Allow fma
contraction.
(CFLAGS-e_pow-fma4.c): Likewise.
|
|
Continuing the cleanup of math_private.h, with a view to it becoming
the header for the APIs defined therein and not also a header with
inline variants of math.h APIs, this patch moves inline definitions of
__isinff128 and fabsf128 to include/math.h, so that any users of
math.h in glibc automatically get the optimized functions rather than
quietly missing them if they do not also include math_private.h.
Tested for x86_64 and x86, and with build-many-glibcs.py with GCC 6.
There are changes to installed stripped libc.so on configurations with
distinct _Float128, because of __printf_fp_l code that now gets the
__isinff128 inline where previously it called the out-of-line
function because of the lack of a math_private.h call. It seems
appropriate that this code does get the inline (as it would
automatically with GCC 7 and later when the built-in function is used)
rather than being the only place in glibc that does not.
* sysdeps/generic/math_private.h
[__HAVE_DISTINCT_FLOAT128 && !__GNUC_PREREQ (7, 0)] (__isinff128):
Move this inline function ....
[__HAVE_DISTINCT_FLOAT128] (fabsf128): And this one ....
* include/math.h [!_ISOMAC]: To here....
|
|
Continuing the clean-up related to the catch-all math_private.h
header, this patch stops math_private.h from including fenv_private.h.
Instead, fenv_private.h is included directly from those users of
math_private.h that also used interfaces from fenv_private.h. No
attempt is made to remove unused includes of math_private.h, but that
is a natural followup.
(However, since math_private.h sometimes defines optimized versions of
math.h interfaces or __* variants thereof, as well as defining its own
interfaces, I think it might make sense to get all those optimized
versions included from include/math.h, not requiring a separate header
at all, before eliminating unused math_private.h includes - that
avoids a file quietly becoming less-optimized if someone adds a call
to one of those interfaces without restoring a math_private.h include
to that file.)
There is still a pitfall that if code uses plain fe* and __fe*
interfaces, but only includes fenv.h and not fenv_private.h or (before
this patch) math_private.h, it will compile on platforms with
exceptions and rounding modes but not get the optimized versions (and
possibly not compile) on platforms without exception and rounding mode
support, so making it easy to break the build for such platforms
accidentally.
I think it would be most natural to move the inlines / macros for fe*
and __fe* in the case of no exceptions and rounding modes into
include/fenv.h, so that all code including fenv.h with _ISOMAC not
defined automatically gets them. Then fenv_private.h would be purely
the header for the libc_fe*, SET_RESTORE_ROUND etc. internal
interfaces and the risk of breaking the build on other platforms than
the one you tested on because of a missing fenv_private.h include
would be much reduced (and there would be some unused fenv_private.h
includes to remove along with unused math_private.h includes).
Tested for x86_64 and x86, and tested with build-many-glibcs.py that
installed stripped shared libraries are unchanged by this patch.
* sysdeps/generic/math_private.h: Do not include <fenv_private.h>.
* math/fromfp.h: Include <fenv_private.h>.
* math/math-narrow.h: Likewise.
* math/s_cexp_template.c: Likewise.
* math/s_csin_template.c: Likewise.
* math/s_csinh_template.c: Likewise.
* math/s_ctan_template.c: Likewise.
* math/s_ctanh_template.c: Likewise.
* math/s_iseqsig_template.c: Likewise.
* math/w_acos_compat.c: Likewise.
* math/w_acosf_compat.c: Likewise.
* math/w_acosl_compat.c: Likewise.
* math/w_asin_compat.c: Likewise.
* math/w_asinf_compat.c: Likewise.
* math/w_asinl_compat.c: Likewise.
* math/w_ilogb_template.c: Likewise.
* math/w_j0_compat.c: Likewise.
* math/w_j0f_compat.c: Likewise.
* math/w_j0l_compat.c: Likewise.
* math/w_j1_compat.c: Likewise.
* math/w_j1f_compat.c: Likewise.
* math/w_j1l_compat.c: Likewise.
* math/w_jn_compat.c: Likewise.
* math/w_jnf_compat.c: Likewise.
* math/w_llogb_template.c: Likewise.
* math/w_log10_compat.c: Likewise.
* math/w_log10f_compat.c: Likewise.
* math/w_log10l_compat.c: Likewise.
* math/w_log2_compat.c: Likewise.
* math/w_log2f_compat.c: Likewise.
* math/w_log2l_compat.c: Likewise.
* math/w_log_compat.c: Likewise.
* math/w_logf_compat.c: Likewise.
* math/w_logl_compat.c: Likewise.
* sysdeps/aarch64/fpu/feholdexcpt.c: Likewise.
* sysdeps/aarch64/fpu/fesetround.c: Likewise.
* sysdeps/aarch64/fpu/fgetexcptflg.c: Likewise.
* sysdeps/aarch64/fpu/ftestexcept.c: Likewise.
* sysdeps/ieee754/dbl-64/e_atan2.c: Likewise.
* sysdeps/ieee754/dbl-64/e_exp.c: Likewise.
* sysdeps/ieee754/dbl-64/e_exp2.c: Likewise.
* sysdeps/ieee754/dbl-64/e_gamma_r.c: Likewise.
* sysdeps/ieee754/dbl-64/e_jn.c: Likewise.
* sysdeps/ieee754/dbl-64/e_pow.c: Likewise.
* sysdeps/ieee754/dbl-64/e_remainder.c: Likewise.
* sysdeps/ieee754/dbl-64/e_sqrt.c: Likewise.
* sysdeps/ieee754/dbl-64/gamma_product.c: Likewise.
* sysdeps/ieee754/dbl-64/lgamma_neg.c: Likewise.
* sysdeps/ieee754/dbl-64/s_atan.c: Likewise.
* sysdeps/ieee754/dbl-64/s_fma.c: Likewise.
* sysdeps/ieee754/dbl-64/s_fmaf.c: Likewise.
* sysdeps/ieee754/dbl-64/s_llrint.c: Likewise.
* sysdeps/ieee754/dbl-64/s_llround.c: Likewise.
* sysdeps/ieee754/dbl-64/s_lrint.c: Likewise.
* sysdeps/ieee754/dbl-64/s_lround.c: Likewise.
* sysdeps/ieee754/dbl-64/s_nearbyint.c: Likewise.
* sysdeps/ieee754/dbl-64/s_sin.c: Likewise.
* sysdeps/ieee754/dbl-64/s_sincos.c: Likewise.
* sysdeps/ieee754/dbl-64/s_tan.c: Likewise.
* sysdeps/ieee754/dbl-64/wordsize-64/s_lround.c: Likewise.
* sysdeps/ieee754/dbl-64/wordsize-64/s_nearbyint.c: Likewise.
* sysdeps/ieee754/dbl-64/x2y2m1.c: Likewise.
* sysdeps/ieee754/float128/float128_private.h: Likewise.
* sysdeps/ieee754/flt-32/e_gammaf_r.c: Likewise.
* sysdeps/ieee754/flt-32/e_j1f.c: Likewise.
* sysdeps/ieee754/flt-32/e_jnf.c: Likewise.
* sysdeps/ieee754/flt-32/lgamma_negf.c: Likewise.
* sysdeps/ieee754/flt-32/s_llrintf.c: Likewise.
* sysdeps/ieee754/flt-32/s_llroundf.c: Likewise.
* sysdeps/ieee754/flt-32/s_lrintf.c: Likewise.
* sysdeps/ieee754/flt-32/s_lroundf.c: Likewise.
* sysdeps/ieee754/flt-32/s_nearbyintf.c: Likewise.
* sysdeps/ieee754/k_standardl.c: Likewise.
* sysdeps/ieee754/ldbl-128/e_expl.c: Likewise.
* sysdeps/ieee754/ldbl-128/e_gammal_r.c: Likewise.
* sysdeps/ieee754/ldbl-128/e_j1l.c: Likewise.
* sysdeps/ieee754/ldbl-128/e_jnl.c: Likewise.
* sysdeps/ieee754/ldbl-128/gamma_productl.c: Likewise.
* sysdeps/ieee754/ldbl-128/lgamma_negl.c: Likewise.
* sysdeps/ieee754/ldbl-128/s_fmal.c: Likewise.
* sysdeps/ieee754/ldbl-128/s_llrintl.c: Likewise.
* sysdeps/ieee754/ldbl-128/s_llroundl.c: Likewise.
* sysdeps/ieee754/ldbl-128/s_lrintl.c: Likewise.
* sysdeps/ieee754/ldbl-128/s_lroundl.c: Likewise.
* sysdeps/ieee754/ldbl-128/s_nearbyintl.c: Likewise.
* sysdeps/ieee754/ldbl-128/x2y2m1l.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/e_expl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/e_gammal_r.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/e_j1l.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/e_jnl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/lgamma_negl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_fmal.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_llrintl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_llroundl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_lrintl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_lroundl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_rintl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/x2y2m1l.c: Likewise.
* sysdeps/ieee754/ldbl-96/e_gammal_r.c: Likewise.
* sysdeps/ieee754/ldbl-96/e_jnl.c: Likewise.
* sysdeps/ieee754/ldbl-96/gamma_productl.c: Likewise.
* sysdeps/ieee754/ldbl-96/lgamma_negl.c: Likewise.
* sysdeps/ieee754/ldbl-96/s_fma.c: Likewise.
* sysdeps/ieee754/ldbl-96/s_fmal.c: Likewise.
* sysdeps/ieee754/ldbl-96/s_llrintl.c: Likewise.
* sysdeps/ieee754/ldbl-96/s_llroundl.c: Likewise.
* sysdeps/ieee754/ldbl-96/s_lrintl.c: Likewise.
* sysdeps/ieee754/ldbl-96/s_lroundl.c: Likewise.
* sysdeps/ieee754/ldbl-96/x2y2m1l.c: Likewise.
* sysdeps/powerpc/fpu/e_sqrt.c: Likewise.
* sysdeps/powerpc/fpu/e_sqrtf.c: Likewise.
* sysdeps/riscv/rv64/rvd/s_ceil.c: Likewise.
* sysdeps/riscv/rv64/rvd/s_floor.c: Likewise.
* sysdeps/riscv/rv64/rvd/s_nearbyint.c: Likewise.
* sysdeps/riscv/rv64/rvd/s_round.c: Likewise.
* sysdeps/riscv/rv64/rvd/s_roundeven.c: Likewise.
* sysdeps/riscv/rv64/rvd/s_trunc.c: Likewise.
* sysdeps/riscv/rvd/s_finite.c: Likewise.
* sysdeps/riscv/rvd/s_fmax.c: Likewise.
* sysdeps/riscv/rvd/s_fmin.c: Likewise.
* sysdeps/riscv/rvd/s_fpclassify.c: Likewise.
* sysdeps/riscv/rvd/s_isinf.c: Likewise.
* sysdeps/riscv/rvd/s_isnan.c: Likewise.
* sysdeps/riscv/rvd/s_issignaling.c: Likewise.
* sysdeps/riscv/rvf/fegetround.c: Likewise.
* sysdeps/riscv/rvf/feholdexcpt.c: Likewise.
* sysdeps/riscv/rvf/fesetenv.c: Likewise.
* sysdeps/riscv/rvf/fesetround.c: Likewise.
* sysdeps/riscv/rvf/feupdateenv.c: Likewise.
* sysdeps/riscv/rvf/fgetexcptflg.c: Likewise.
* sysdeps/riscv/rvf/ftestexcept.c: Likewise.
* sysdeps/riscv/rvf/s_ceilf.c: Likewise.
* sysdeps/riscv/rvf/s_finitef.c: Likewise.
* sysdeps/riscv/rvf/s_floorf.c: Likewise.
* sysdeps/riscv/rvf/s_fmaxf.c: Likewise.
* sysdeps/riscv/rvf/s_fminf.c: Likewise.
* sysdeps/riscv/rvf/s_fpclassifyf.c: Likewise.
* sysdeps/riscv/rvf/s_isinff.c: Likewise.
* sysdeps/riscv/rvf/s_isnanf.c: Likewise.
* sysdeps/riscv/rvf/s_issignalingf.c: Likewise.
* sysdeps/riscv/rvf/s_nearbyintf.c: Likewise.
* sysdeps/riscv/rvf/s_roundevenf.c: Likewise.
* sysdeps/riscv/rvf/s_roundf.c: Likewise.
* sysdeps/riscv/rvf/s_truncf.c: Likewise.
|
|
On some architectures, the parts of math_private.h relating to the
floating-point environment are in a separate file fenv_private.h
included from math_private.h. As this is purely an
architecture-specific convention used by several architectures,
however, all such architectures still need their own math_private.h,
even if it has nothing to do beyond #include <fenv_private.h> and
peculiarity of including the i386 file directly instead of having a
shared file in sysdeps/x86.
This patch makes the fenv_private.h name an architecture-independent
convention in glibc. The include of fenv_private.h from
math_private.h becomes architecture-independent (until callers are
updated to include fenv_private.h directly so the include from
math_private.h is no longer needed). Some architecture math_private.h
headers are removed if no longer needed, or renamed to fenv_private.h
if all they define belongs in that header; architecture fenv_private.h
headers now do require #include_next <fenv_private.h>. The i386
fenv_private.h file moves to sysdeps/x86/fpu/ to reflect how it is
actually shared with x86_64. The generic math_private.h gets a new
include of <stdbool.h>, as needed for bool in some prototypes in that
header (previously that was indirectly included via include/fenv.h,
which now only gets included too late in math_private.h, after those
prototypes).
Tested for x86_64 and x86, and tested with build-many-glibcs.py that
installed stripped shared libraries are unchanged by the patch.
* sysdeps/aarch64/fpu/fenv_private.h: New file. Based on ....
* sysdeps/aarch64/fpu/math_private.h: ... this file. All contents
moved to fenv_private.h except for ...
(TOINT_INTRINSICS): Kept in math_private.h.
(roundtoint): Likewise.
(converttoint): Likewise.
* sysdeps/arm/fenv_private.h: Change multiple-include guard to
[ARM_FENV_PRIVATE_H]. Include next <fenv_private.h>.
* sysdeps/arm/math_private.h: Remove.
* sysdeps/generic/fenv_private.h: New file. Contents moved from
....
* sysdeps/generic/math_private.h: ... this file. Include
<stdbool.h>. Do not include <fenv.h> or <get-rounding-mode.h>.
Include <fenv_private.h>. Remove functions and macros moved to
fenv_private.h.
* sysdeps/i386/fpu/math_private.h: Remove.
* sysdeps/mips/math_private.h: Move to ....
* sysdeps/mips/fpu/fenv_private.h: ... here. Change
multiple-include guard to [MIPS_FENV_PRIVATE_H]. Remove
[__mips_hard_float] conditional. Include next <fenv_private.h>.
* sysdeps/powerpc/fpu/fenv_private.h: Change multiple-include
guard to [POWERPC_FENV_PRIVATE_H]. Include next <fenv_private.h>.
* sysdeps/powerpc/fpu/math_private.h: Do not include
<fenv_private.h>.
* sysdeps/riscv/rvf/math_private.h: Move to ....
* sysdeps/riscv/rvf/fenv_private.h: ... here. Change
multiple-include guard to [RISCV_FENV_PRIVATE_H]. Include next
<fenv_private.h>.
* sysdeps/sparc/fpu/fenv_private.h: Change multiple-include guard
to [SPARC_FENV_PRIVATE_H]. Include next <fenv_private.h>.
* sysdeps/sparc/fpu/math_private.h: Remove.
* sysdeps/i386/fpu/fenv_private.h: Move to ....
* sysdeps/x86/fpu/fenv_private.h: ... here. Change
multiple-include guard to [X86_FENV_PRIVATE_H]. Include next
<fenv_private.h>.
* sysdeps/x86_64/fpu/math_private.h: Do not include
<sysdeps/i386/fpu/fenv_private.h>.
|
|
Previously, only the SSE2 rounding mode was set, so the assembler
implementations using 387 were not following the expecting rounding
mode.
|
|
This patch continues the math_private.h cleanup by stopping
math_private.h from including math-barriers.h and making the users of
the barrier macros include the latter header directly. No attempt is
made to remove any math_private.h includes that are now unused, except
in strtod_l.c where that is done to avoid line number changes in
assertions, so that installed stripped shared libraries can be
compared before and after the patch. (I think the floating-point
environment support in math_private.h should also move out - some
architectures already have fenv_private.h as an architecture-internal
header included from their math_private.h - and after moving that out
might be a better time to identify unused math_private.h includes.)
Tested for x86_64 and x86, and tested with build-many-glibcs.py that
installed stripped shared libraries are unchanged by the patch.
* sysdeps/generic/math_private.h: Do not include
<math-barriers.h>.
* stdlib/strtod_l.c: Include <math-barriers.h> instead of
<math_private.h>.
* math/fromfp.h: Include <math-barriers.h>.
* math/math-narrow.h: Likewise.
* math/s_nextafter.c: Likewise.
* math/s_nexttowardf.c: Likewise.
* sysdeps/aarch64/fpu/s_llrint.c: Likewise.
* sysdeps/aarch64/fpu/s_llrintf.c: Likewise.
* sysdeps/aarch64/fpu/s_lrint.c: Likewise.
* sysdeps/aarch64/fpu/s_lrintf.c: Likewise.
* sysdeps/i386/fpu/s_nextafterl.c: Likewise.
* sysdeps/i386/fpu/s_nexttoward.c: Likewise.
* sysdeps/i386/fpu/s_nexttowardf.c: Likewise.
* sysdeps/ieee754/dbl-64/e_atan2.c: Likewise.
* sysdeps/ieee754/dbl-64/e_atanh.c: Likewise.
* sysdeps/ieee754/dbl-64/e_exp.c: Likewise.
* sysdeps/ieee754/dbl-64/e_exp2.c: Likewise.
* sysdeps/ieee754/dbl-64/e_j0.c: Likewise.
* sysdeps/ieee754/dbl-64/e_sqrt.c: Likewise.
* sysdeps/ieee754/dbl-64/s_expm1.c: Likewise.
* sysdeps/ieee754/dbl-64/s_fma.c: Likewise.
* sysdeps/ieee754/dbl-64/s_fmaf.c: Likewise.
* sysdeps/ieee754/dbl-64/s_log1p.c: Likewise.
* sysdeps/ieee754/dbl-64/s_nearbyint.c: Likewise.
* sysdeps/ieee754/dbl-64/wordsize-64/s_nearbyint.c: Likewise.
* sysdeps/ieee754/flt-32/e_atanhf.c: Likewise.
* sysdeps/ieee754/flt-32/e_j0f.c: Likewise.
* sysdeps/ieee754/flt-32/s_expm1f.c: Likewise.
* sysdeps/ieee754/flt-32/s_log1pf.c: Likewise.
* sysdeps/ieee754/flt-32/s_nearbyintf.c: Likewise.
* sysdeps/ieee754/flt-32/s_nextafterf.c: Likewise.
* sysdeps/ieee754/k_standardl.c: Likewise.
* sysdeps/ieee754/ldbl-128/e_asinl.c: Likewise.
* sysdeps/ieee754/ldbl-128/e_expl.c: Likewise.
* sysdeps/ieee754/ldbl-128/e_powl.c: Likewise.
* sysdeps/ieee754/ldbl-128/s_fmal.c: Likewise.
* sysdeps/ieee754/ldbl-128/s_nearbyintl.c: Likewise.
* sysdeps/ieee754/ldbl-128/s_nextafterl.c: Likewise.
* sysdeps/ieee754/ldbl-128/s_nexttoward.c: Likewise.
* sysdeps/ieee754/ldbl-128/s_nexttowardf.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/e_asinl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_fmal.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_nextafterl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_nexttoward.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_nexttowardf.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_rintl.c: Likewise.
* sysdeps/ieee754/ldbl-96/e_atanhl.c: Likewise.
* sysdeps/ieee754/ldbl-96/e_j0l.c: Likewise.
* sysdeps/ieee754/ldbl-96/s_fma.c: Likewise.
* sysdeps/ieee754/ldbl-96/s_fmal.c: Likewise.
* sysdeps/ieee754/ldbl-96/s_nexttoward.c: Likewise.
* sysdeps/ieee754/ldbl-96/s_nexttowardf.c: Likewise.
* sysdeps/ieee754/ldbl-opt/s_nexttowardfd.c: Likewise.
* sysdeps/m68k/m680x0/fpu/s_nextafterl.c: Likewise.
|
|
This patch continues cleaning up math_private.h by moving the
math_check_force_underflow set of macros to a separate header
math-underflow.h.
This header is included by the files that need it rather than from
math_private.h. Moving these macros to a separate file removes the
math_private.h uses of macros from float.h, so the inclusion of
float.h in math_private.h is also removed; files that were depending
on that inclusion are fixed to include float.h directly. The
inclusion of math-barriers.h from math_private.h will be removed in a
separate patch.
Tested for x86_64 and x86. Also tested with build-many-glibcs.py that
installed stripped shared libraries are unchanged by this patch.
* math/math-underflow.h: New file.
* sysdeps/generic/math_private.h: Do not include <float.h>.
(fabs_tg): Remove macro. Moved to math-underflow.h.
(min_of_type_f): Likewise.
(min_of_type_): Likewise.
(min_of_type_l): Likewise.
(min_of_type_f128): Likewise.
(min_of_type): Likewise.
(math_check_force_underflow): Likewise.
(math_check_force_underflow_nonneg): Likewise.
(math_check_force_underflow_complex): Likewise.
* math/e_exp2_template.c: Include <math-underflow.h>.
* math/k_casinh_template.c: Likewise.
* math/s_catan_template.c: Likewise.
* math/s_catanh_template.c: Likewise.
* math/s_ccosh_template.c: Likewise.
* math/s_cexp_template.c: Likewise.
* math/s_clog10_template.c: Likewise.
* math/s_clog_template.c: Likewise.
* math/s_csin_template.c: Likewise.
* math/s_csinh_template.c: Likewise.
* math/s_csqrt_template.c: Likewise.
* math/s_ctan_template.c: Likewise.
* math/s_ctanh_template.c: Likewise.
* sysdeps/ieee754/dbl-64/e_asin.c: Likewise.
* sysdeps/ieee754/dbl-64/e_atanh.c: Likewise.
* sysdeps/ieee754/dbl-64/e_exp2.c: Likewise.
* sysdeps/ieee754/dbl-64/e_gamma_r.c: Likewise.
* sysdeps/ieee754/dbl-64/e_hypot.c: Likewise.
* sysdeps/ieee754/dbl-64/e_j1.c: Likewise.
* sysdeps/ieee754/dbl-64/e_jn.c: Likewise.
* sysdeps/ieee754/dbl-64/e_pow.c: Likewise.
* sysdeps/ieee754/dbl-64/e_sinh.c: Likewise.
* sysdeps/ieee754/dbl-64/s_asinh.c: Likewise.
* sysdeps/ieee754/dbl-64/s_atan.c: Likewise.
* sysdeps/ieee754/dbl-64/s_erf.c: Likewise.
* sysdeps/ieee754/dbl-64/s_expm1.c: Likewise.
* sysdeps/ieee754/dbl-64/s_log1p.c: Likewise.
* sysdeps/ieee754/dbl-64/s_sin.c: Likewise.
* sysdeps/ieee754/dbl-64/s_sincos.c: Likewise.
* sysdeps/ieee754/dbl-64/s_tan.c: Likewise.
* sysdeps/ieee754/dbl-64/s_tanh.c: Likewise.
* sysdeps/ieee754/flt-32/e_asinf.c: Likewise.
* sysdeps/ieee754/flt-32/e_atanhf.c: Likewise.
* sysdeps/ieee754/flt-32/e_gammaf_r.c: Likewise.
* sysdeps/ieee754/flt-32/e_j1f.c: Likewise.
* sysdeps/ieee754/flt-32/e_jnf.c: Likewise.
* sysdeps/ieee754/flt-32/e_sinhf.c: Likewise.
* sysdeps/ieee754/flt-32/k_sinf.c: Likewise.
* sysdeps/ieee754/flt-32/k_tanf.c: Likewise.
* sysdeps/ieee754/flt-32/s_asinhf.c: Likewise.
* sysdeps/ieee754/flt-32/s_atanf.c: Likewise.
* sysdeps/ieee754/flt-32/s_erff.c: Likewise.
* sysdeps/ieee754/flt-32/s_expm1f.c: Likewise.
* sysdeps/ieee754/flt-32/s_log1pf.c: Likewise.
* sysdeps/ieee754/flt-32/s_tanhf.c: Likewise.
* sysdeps/ieee754/ldbl-128/e_asinl.c: Likewise.
* sysdeps/ieee754/ldbl-128/e_atanhl.c: Likewise.
* sysdeps/ieee754/ldbl-128/e_expl.c: Likewise.
* sysdeps/ieee754/ldbl-128/e_gammal_r.c: Likewise.
* sysdeps/ieee754/ldbl-128/e_hypotl.c: Likewise.
* sysdeps/ieee754/ldbl-128/e_j1l.c: Likewise.
* sysdeps/ieee754/ldbl-128/e_jnl.c: Likewise.
* sysdeps/ieee754/ldbl-128/e_sinhl.c: Likewise.
* sysdeps/ieee754/ldbl-128/k_sincosl.c: Likewise.
* sysdeps/ieee754/ldbl-128/k_sinl.c: Likewise.
* sysdeps/ieee754/ldbl-128/k_tanl.c: Likewise.
* sysdeps/ieee754/ldbl-128/s_asinhl.c: Likewise.
* sysdeps/ieee754/ldbl-128/s_atanl.c: Likewise.
* sysdeps/ieee754/ldbl-128/s_erfl.c: Likewise.
* sysdeps/ieee754/ldbl-128/s_expm1l.c: Likewise.
* sysdeps/ieee754/ldbl-128/s_log1pl.c: Likewise.
* sysdeps/ieee754/ldbl-128/s_tanhl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/e_asinl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/e_atanhl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/e_gammal_r.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/e_hypotl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/e_j1l.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/e_jnl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/e_powl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/e_sinhl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/k_sincosl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/k_sinl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/k_tanl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_asinhl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_atanl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_erfl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_fmal.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_tanhl.c: Likewise.
* sysdeps/ieee754/ldbl-96/e_asinl.c: Likewise.
* sysdeps/ieee754/ldbl-96/e_atanhl.c: Likewise.
* sysdeps/ieee754/ldbl-96/e_gammal_r.c: Likewise.
* sysdeps/ieee754/ldbl-96/e_hypotl.c: Likewise.
* sysdeps/ieee754/ldbl-96/e_j1l.c: Likewise.
* sysdeps/ieee754/ldbl-96/e_jnl.c: Likewise.
* sysdeps/ieee754/ldbl-96/e_sinhl.c: Likewise.
* sysdeps/ieee754/ldbl-96/k_sinl.c: Likewise.
* sysdeps/ieee754/ldbl-96/k_tanl.c: Likewise.
* sysdeps/ieee754/ldbl-96/s_asinhl.c: Likewise.
* sysdeps/ieee754/ldbl-96/s_erfl.c: Likewise.
* sysdeps/ieee754/ldbl-96/s_tanhl.c: Likewise.
* sysdeps/powerpc/fpu/e_hypot.c: Likewise.
* sysdeps/x86/fpu/powl_helper.c: Likewise.
* sysdeps/ieee754/dbl-64/s_nextup.c: Include <float.h>.
* sysdeps/ieee754/flt-32/s_nextupf.c: Likewise.
* sysdeps/ieee754/ldbl-128/s_nextupl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_nextupl.c: Likewise.
* sysdeps/ieee754/ldbl-96/s_nextupl.c: Likewise.
|
|
This patch continues cleaning up math_private.h by moving the
math_opt_barrier and math_force_eval macros to a separate header
math-barriers.h.
At present, those macros are inside a "#ifndef math_opt_barrier" in
math_private.h to allow architectures to override them and then use
a separate math-barriers.h header, no such #ifndef or #include_next is
needed; architectures just have their own alternative version of
math-barriers.h when providing their own optimized versions that avoid
going through memory unnecessarily. The generic math-barriers.h has a
comment added to document these two macros.
In this patch, math_private.h is made to #include <math-barriers.h>,
so files using these macros do not need updating yet. That is because
of uses of math_force_eval in math_check_force_underflow and
math_check_force_underflow_nonneg, which are still defined in
math_private.h. Once those are moved out to a separate header, that
separate header can be made to include <math-barriers.h>, as can the
other files directly using these barrier macros, and then the include
of <math-barriers.h> from math_private.h can be removed.
Tested for x86_64 and x86. Also tested with build-many-glibcs.py that
installed stripped shared libraries are unchanged by this patch.
* sysdeps/generic/math-barriers.h: New file.
* sysdeps/generic/math_private.h [!math_opt_barrier]
(math_opt_barrier): Move to math-barriers.h.
[!math_opt_barrier] (math_force_eval): Likewise.
* sysdeps/aarch64/fpu/math-barriers.h: New file.
* sysdeps/aarch64/fpu/math_private.h (math_opt_barrier): Move to
math-barriers.h.
(math_force_eval): Likewise.
* sysdeps/alpha/fpu/math-barriers.h: New file.
* sysdeps/alpha/fpu/math_private.h (math_opt_barrier): Move to
math-barriers.h.
(math_force_eval): Likewise.
* sysdeps/x86/fpu/math-barriers.h: New file.
* sysdeps/i386/fpu/fenv_private.h (math_opt_barrier): Move to
math-barriers.h.
(math_force_eval): Likewise.
* sysdeps/m68k/m680x0/fpu/math_private.h: Move to....
* sysdeps/m68k/m680x0/fpu/math-barriers.h: ... here. Adjust
multiple-include guard for rename.
* sysdeps/powerpc/fpu/math-barriers.h: New file.
* sysdeps/powerpc/fpu/math_private.h (math_opt_barrier): Move to
math-barriers.h.
(math_force_eval): Likewise.
|
|
This patch continues cleaning up the math_private.h header, which
contains lots of different definitions many of which are only needed
by a limited subset of files using that header (and some of which are
overridden by architectures that only want to override selected parts
of the header), by moving the math_narrow_eval macro out to a separate
math-narrow-eval.h header, only included by those files that need it.
That header is placed in include/ (since it's used in stdlib/, not
just files built in math/, but no sysdeps variants are needed at
present).
Tested for x86_64, and with build-many-glibcs.py. (Installed stripped
shared libraries change because of line numbers in assertions in
strtod_l.c.)
* include/math-narrow-eval.h: New file. Contents moved from ....
* sysdeps/generic/math_private.h: ... here.
(math_narrow_eval): Remove macro. Moved to math-narrow-eval.h.
[FLT_EVAL_METHOD != 0] (excess_precision): Likewise.
* math/s_fdim_template.c: Include <math-narrow-eval.h>.
* stdlib/strtod_l.c: Likewise.
* sysdeps/i386/fpu/s_f32xaddf64.c: Likewise.
* sysdeps/i386/fpu/s_f32xsubf64.c: Likewise.
* sysdeps/i386/fpu/s_fdim.c: Likewise.
* sysdeps/ieee754/dbl-64/e_cosh.c: Likewise.
* sysdeps/ieee754/dbl-64/e_gamma_r.c: Likewise.
* sysdeps/ieee754/dbl-64/e_j1.c: Likewise.
* sysdeps/ieee754/dbl-64/e_jn.c: Likewise.
* sysdeps/ieee754/dbl-64/e_lgamma_r.c: Likewise.
* sysdeps/ieee754/dbl-64/e_sinh.c: Likewise.
* sysdeps/ieee754/dbl-64/gamma_productf.c: Likewise.
* sysdeps/ieee754/dbl-64/k_rem_pio2.c: Likewise.
* sysdeps/ieee754/dbl-64/lgamma_neg.c: Likewise.
* sysdeps/ieee754/dbl-64/s_erf.c: Likewise.
* sysdeps/ieee754/dbl-64/s_llrint.c: Likewise.
* sysdeps/ieee754/dbl-64/s_lrint.c: Likewise.
* sysdeps/ieee754/flt-32/e_coshf.c: Likewise.
* sysdeps/ieee754/flt-32/e_exp2f.c: Likewise.
* sysdeps/ieee754/flt-32/e_expf.c: Likewise.
* sysdeps/ieee754/flt-32/e_gammaf_r.c: Likewise.
* sysdeps/ieee754/flt-32/e_j1f.c: Likewise.
* sysdeps/ieee754/flt-32/e_jnf.c: Likewise.
* sysdeps/ieee754/flt-32/e_lgammaf_r.c: Likewise.
* sysdeps/ieee754/flt-32/e_sinhf.c: Likewise.
* sysdeps/ieee754/flt-32/k_rem_pio2f.c: Likewise.
* sysdeps/ieee754/flt-32/lgamma_negf.c: Likewise.
* sysdeps/ieee754/flt-32/s_erff.c: Likewise.
* sysdeps/ieee754/flt-32/s_llrintf.c: Likewise.
* sysdeps/ieee754/flt-32/s_lrintf.c: Likewise.
* sysdeps/ieee754/ldbl-96/gamma_product.c: Likewise.
|
|
Remove the __slowexp code, so exp is no longer correctly rounded. The
result is computed to about 70 bits precision so the worst case ulp
error is about 0.500007 in nearest rounding mode.
* manual/probes.texi: Remove slowexp probes.
* math/Makefile: Remove slowexp.
* sysdeps/generic/math_private.h (__slowexp): Remove.
* sysdeps/ieee754/dbl-64/e_exp.c (__ieee754_exp): Remove __slowexp and
document error bounds.
* sysdeps/i386/fpu/slowexp.c: Remove.
* sysdeps/ia64/fpu/slowexp.c: Remove.
* sysdeps/ieee754/dbl-64/slowexp.c: Remove.
* sysdeps/ieee754/dbl-64/uexp.h (err_0): Remove.
* sysdeps/m68k/m680x0/fpu/slowexp.c: Remove.
* sysdeps/powerpc/power4/fpu/Makefile (CPPFLAGS-slowexp.c): Remove.
* sysdeps/x86_64/fpu/multiarch/Makefile: Remove slowexp-fma.
* sysdeps/x86_64/fpu/multiarch/e_exp-avx.c (__slowexp): Remove.
* sysdeps/x86_64/fpu/multiarch/e_exp-fma.c (__slowexp): Remove.
* sysdeps/x86_64/fpu/multiarch/e_exp-fma4.c (__slowexp): Remove.
* sysdeps/x86_64/fpu/multiarch/slowexp-avx.c: Remove.
* sysdeps/x86_64/fpu/multiarch/slowexp-fma.c: Remove.
* sysdeps/x86_64/fpu/multiarch/slowexp-fma4.c: Remove.
|
|
Remove the slow paths from pow. Like several other double precision math
functions, pow is exactly rounded. This is not required from math functions
and causes major overheads as it requires multiple fallbacks using higher
precision arithmetic if a result is close to 0.5ULP. Ridiculous slowdowns
of up to 100000x have been reported when the highest precision path triggers.
All GLIBC math tests pass on AArch64 and x64 (with ULP of pow set to 1).
The worst case error is ~0.506ULP. A simple test over a few hundred million
values shows pow is 10% faster on average. This fixes BZ #13932.
[BZ #13932]
* sysdeps/ieee754/dbl-64/uexp.h (err_1): Remove.
* benchtests/pow-inputs: Update comment for slow path cases.
* manual/probes.texi (slowpow_p10): Delete removed probe.
(slowpow_p10): Likewise.
* math/Makefile: Remove halfulp.c and slowpow.c.
* sysdeps/aarch64/libm-test-ulps: Set ULP of pow to 1.
* sysdeps/generic/math_private.h (__exp1): Remove error argument.
(__halfulp): Remove.
(__slowpow): Remove.
* sysdeps/i386/fpu/halfulp.c: Delete file.
* sysdeps/i386/fpu/slowpow.c: Likewise.
* sysdeps/ia64/fpu/halfulp.c: Likewise.
* sysdeps/ia64/fpu/slowpow.c: Likewise.
* sysdeps/ieee754/dbl-64/e_exp.c (__exp1): Remove error argument,
improve comments and add error analysis.
* sysdeps/ieee754/dbl-64/e_pow.c (__ieee754_pow): Add error analysis.
(power1): Remove function:
(log1): Remove error argument, add error analysis.
(my_log2): Remove function.
* sysdeps/ieee754/dbl-64/halfulp.c: Delete file.
* sysdeps/ieee754/dbl-64/slowpow.c: Likewise.
* sysdeps/m68k/m680x0/fpu/halfulp.c: Likewise.
* sysdeps/m68k/m680x0/fpu/slowpow.c: Likewise.
* sysdeps/powerpc/power4/fpu/Makefile: Remove CPPFLAGS-slowpow.c.
* sysdeps/x86_64/fpu/libm-test-ulps: Set ULP of pow to 1.
* sysdeps/x86_64/fpu/multiarch/Makefile: Remove slowpow-fma.c,
slowpow-fma4.c, halfulp-fma.c, halfulp-fma4.c.
* sysdeps/x86_64/fpu/multiarch/e_pow-fma.c (__slowpow): Remove define.
* sysdeps/x86_64/fpu/multiarch/e_pow-fma4.c (__slowpow): Likewise.
* sysdeps/x86_64/fpu/multiarch/halfulp-fma.c: Delete file.
* sysdeps/x86_64/fpu/multiarch/halfulp-fma4.c: Likewise.
* sysdeps/x86_64/fpu/multiarch/slowpow-fma.c: Likewise.
* sysdeps/x86_64/fpu/multiarch/slowpow-fma4.c: Likewise.
|
|
Continuing the process of improving and cleaning up the handling of
configurations lacking support for floating-point exceptions and
rounding modes, this patch adds trivial inline definitions of
feholdexcept and __feholdexcept to the set of inlines for such
configurations in math_private.h. These inlines were missing from the
tile version used as a basis for the previous inlines, despite a few
such function calls ending up in libm.so.
Tested with build-many-glibcs.py. As expected, installed stripped
shared libraries are unchanged for architectures supporting exceptions
and rounding modes, but changed for architectures lacking such
support.
* sysdeps/generic/math_private.h
[!FE_HAVE_ROUNDING_MODES && FE_ALL_EXCEPT == 0] (feholdexcept):
New inline functio |