| author | Noah Goldstein <goldstein.w.n@gmail.com> | 2022-06-06 21:11:30 -0700 |
|---|---|---|
| committer | Noah Goldstein <goldstein.w.n@gmail.com> | 2022-06-07 13:09:36 -0700 |
| commit | 731feee3869550e93177e604604c1765d81de571 (patch) | |
| tree | 7939c015c885c704b33feab219ce3dac77050088 /include/alloc_buffer.h | |
| parent | d0370d992e5e7b4a8843e8e130f6c86b483ab7d0 (diff) | |

x86: Optimize memrchr-sse2.S

The new code:
1. prioritizes smaller lengths more.
2. optimizes target placement more carefully.
3. reuses logic more.
4. fixes up various inefficiencies in the logic.
The total code size saving is: 394 bytes
Geometric Mean of all benchmarks New / Old: 0.874
Regressions:
1. The page cross case is now colder, especially re-entry from the
page cross case if a match is not found in the first VEC
(roughly 50%). In my opinion this is acceptable given the
"coldness" of this case (less than 4%) and the general
performance improvement in the other, far more common cases.
2. There are some regressions of 5-15% for medium/large user-arg
lengths that have a match in the first VEC. This is because the
logic was rewritten to optimize finds in the first VEC when the
user-arg length is shorter (where we see roughly 20-50%
performance improvements). This is not always a regression. My
intuition is that some frontend quirk partially explains the
data, although I haven't been able to find the root cause.
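
For readers unfamiliar with the terms used above, here is a minimal C sketch. It is not the code from this patch. It assumes that the "first VEC" is the 16-byte SSE2 vector ending at the last byte of the buffer, and that the "page cross case" is the situation where loading that vector would touch the preceding page. The names `memrchr_ref` and `last_vec_load_crosses_page` are hypothetical, and the 4 KiB page size is an assumption.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define VEC_SIZE 16    /* width of one SSE2 (xmm) register in bytes */
#define PAGE_SIZE 4096 /* assumption: 4 KiB pages */

/* Hypothetical scalar reference for what memrchr computes: a pointer
   to the last occurrence of C in the first N bytes of S, or NULL.  */
static void *
memrchr_ref (const void *s, int c, size_t n)
{
  const unsigned char *p = (const unsigned char *) s + n;
  while (n-- > 0)
    if (*--p == (unsigned char) c)
      return (void *) p;
  return NULL;
}

/* Hypothetical predicate: true if a VEC_SIZE-byte load covering
   [end - VEC_SIZE, end) would start on the page before the one that
   holds the last byte, end[-1].  END is the address one past the
   buffer, i.e. s + n, and must be nonzero.  */
static bool
last_vec_load_crosses_page (uintptr_t end)
{
  uintptr_t last = end - 1;
  return (last % PAGE_SIZE) < (VEC_SIZE - 1);
}
```

Under these assumptions, the predicate is true only when the last byte sits in the first 15 bytes of a page, which is one way to see why such a path can be treated as cold.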
Full xcheck passes on x86_64.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
