author     Noah Goldstein <goldstein.w.n@gmail.com>  2022-06-06 21:11:30 -0700
committer  Noah Goldstein <goldstein.w.n@gmail.com>  2022-06-07 13:09:36 -0700
commit     731feee3869550e93177e604604c1765d81de571
tree       7939c015c885c704b33feab219ce3dac77050088 /include/alloc_buffer.h
parent     d0370d992e5e7b4a8843e8e130f6c86b483ab7d0
x86: Optimize memrchr-sse2.S

The new code:
    1. prioritizes smaller lengths more.
    2. optimizes target placement more carefully.
    3. reuses logic more.
    4. fixes up various inefficiencies in the logic.

The total code size saving is: 394 bytes
Geometric Mean of all benchmarks New / Old: 0.874

Regressions:
    1. The page cross case is now colder, especially re-entry from the
       page cross case if a match is not found in the first VEC
       (roughly 50%). My general opinion with this patch is this is
       acceptable given the "coldness" of this case (less than 4%) and
       generally performance improvement in the other far more common
       cases.

    2. There are some regressions 5-15% for medium/large user-arg
       lengths that have a match in the first VEC. This is because the
       logic was rewritten to optimize finds in the first VEC if the
       user-arg length is shorter (where we see roughly 20-50%
       performance improvements). It is not always the case this is a
       regression. My intuition is some frontend quirk is partially
       explaining the data although I haven't been able to find the
       root cause.

Full xcheck passes on x86_64.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Diffstat (limited to 'include/alloc_buffer.h')
0 files changed, 0 insertions, 0 deletions