glibc.git

author	Adhemerval Zanella <adhemerval.zanella@linaro.org>	2024-02-08 10:08:38 -0300
committer	H.J. Lu <hjl.tools@gmail.com>	2024-02-13 08:49:12 -0800
commit	0c0d39fe4aeb0f69b26e76337c5dfd5530d5d44e (patch)
tree	90a312fbf1c501177bfd1e33b67ace7d62bbbd56 /scripts
parent	155bb9d036646138348fee0ac045de601811e0c5 (diff)

x86: Fix Zen3/Zen4 ERMS selection (BZ 30994)

The REP MOVSB usage on memcpy/memmove does not show much performance improvement on Zen3/Zen4 cores compared to the vectorized loops. Also, as reported in BZ 30994, if the source is aligned and the destination is not, the performance can be 20x slower.

The performance difference is noticeable with small buffer sizes, closer to the lower bound at which memcpy/memmove starts to use ERMS. The performance of REP MOVSB is similar to that of the vectorized instructions at the size limit (the L2 cache). Also, there is no drawback to multiple cores sharing the cache.

Checked on x86_64-linux-gnu on Zen3.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
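
For readers unfamiliar with how a size-based ERMS selection might look, the following is a minimal sketch and not the glibc implementation. It assumes a copy routine that picks between a vectorized loop, REP MOVSB, and non-temporal stores using two size thresholds. All names (copy_strategy, copy_thresholds, select_copy_strategy, disable_rep_movsb_window) are hypothetical, and the commit message does not state which values the fix actually uses.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names throughout; this only illustrates the idea of a
   size window in which REP MOVSB (ERMS) is chosen.  */
enum copy_strategy
{
  COPY_VECTOR_LOOP,   /* vectorized load/store loop */
  COPY_REP_MOVSB,     /* ERMS path */
  COPY_NON_TEMPORAL   /* streaming stores for very large copies */
};

struct copy_thresholds
{
  size_t rep_movsb_threshold;     /* lower bound of the ERMS window */
  size_t non_temporal_threshold;  /* upper bound of the ERMS window */
};

/* Pick a strategy for a copy of N bytes.  REP MOVSB is used only for
   rep_movsb_threshold <= N < non_temporal_threshold.  */
static enum copy_strategy
select_copy_strategy (const struct copy_thresholds *t, size_t n)
{
  assert (t->rep_movsb_threshold <= t->non_temporal_threshold);
  if (n >= t->non_temporal_threshold)
    return COPY_NON_TEMPORAL;
  if (n >= t->rep_movsb_threshold)
    return COPY_REP_MOVSB;
  return COPY_VECTOR_LOOP;
}

/* One assumed way to keep a core off the ERMS path: raise the lower
   bound to the upper bound so the window becomes empty.  */
static struct copy_thresholds
disable_rep_movsb_window (struct copy_thresholds t)
{
  t.rep_movsb_threshold = t.non_temporal_threshold;
  return t;
}
```

With illustrative thresholds of 2048 and 1048576 bytes, a 4096-byte copy selects COPY_REP_MOVSB. After disable_rep_movsb_window the same copy falls back to COPY_VECTOR_LOOP, which is the kind of outcome the message argues for on Zen3/Zen4.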

Diffstat (limited to 'scripts')

0 files changed, 0 insertions, 0 deletions