The purpose of this document is to assist folks in testing out my custom dj/malloc branch of the upstream GLIBC git repo. This COPR repo has pre-built RPMs for easy installation in a test environment.
See https://copr.fedorainfracloud.org/coprs/djdelorie/glibc_dj_malloc/ for links and other information.

To install on RHEL 7 (EPEL 7):

$ cd /etc/yum.repos.d/
$ wget https://copr.fedorainfracloud.org/coprs/djdelorie/glibc_dj_malloc/repo/epel-7/djdelorie-glibc_dj_malloc-epel-7.repo
$ yum update
$ init 6

To install on Fedora:

$ dnf copr enable djdelorie/glibc_dj_malloc
$ dnf clean all    (optional)
$ dnf update
$ init 6
To list any installed glibc packages that are not x86_64 (for example leftover i686 packages):

$ rpm -qa | grep ^glibc | grep -v x86_64

To verify that the new build is installed (the version string contains "dj"):

$ rpm -qa | grep glibc
glibc-all-langpacks-2.23.90-alphadj9.fc23.x86_64
glibc-2.23.90-alphadj9.fc23.x86_64
glibc-common-2.23.90-alphadj9.fc23.x86_64
One key new feature in this malloc is a high-speed trace buffer that records every malloc, free, etc. call with a minimum of added latency. This is an improvement over the existing trace feature for performance-critical applications. There is a private (i.e. glibc-internal) API for activating this buffer; it is enabled by preloading a provided DSO:
$ LD_PRELOAD=/lib64/libmtracectl.so ls
On 32-bit machines or machines with non-standard layouts, replace lib64 with lib, or whatever path you've installed into. I don't support those layouts, but you never know...
The trace data is written to a file in /tmp:

$ ls -l /tmp/mtrace-*
-rw-r--r--. 1 root root 12422 Jun 2 20:53 mtrace.out.1188
Each generated file is an architecture-specific binary file, with one record per traced call. Programs for processing these files are included in the COPR repo. Please make sure you process the files on the same architecture they were generated on.
If we ask you to send us a trace file, please rename and compress it to make the file easier to transfer and keep track of.
$ cd /tmp
$ gzip -9 mtrace.out.1188
$ mv mtrace.out.1188.gz f24-ls-fred.mtrace.gz    (or whatever name fits :)
Then mail f24-ls-fred.mtrace.gz to dj@redhat.com (or whoever is asking for it, of course)
This build also includes a set of tools to "play back" a recorded trace, which can be helpful in diagnosing memory-related performance issues. Such workloads might be locally generated as part of a benchmark suite, for example.
trace2wl outfile [infile ...]

If an infile is not provided, input is read from stdin.
$ trace2wl /tmp/ls.wl /tmp/mtrace-22172.out

The resulting file is a "workload" - a data file that tells the simulator how to play back all the malloc/free/etc. calls. It is not human-readable, but a compact binary file intended to be used only by the simulator.
trace_run workload.wl
Note: trace_run only works on Intel processors with the RDTSCP opcode, which is only available on reasonably modern processors. To see if your processor supports this opcode, look for the rdtscp CPU flag:

$ grep rdtscp /proc/cpuinfo

If you get "flags :" lines containing rdtscp, your processor supports it; if the command prints nothing, it does not.
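If you would rather check from code instead of /proc/cpuinfo, a minimal C sketch along these lines should work with GCC or Clang on x86. This helper is illustrative only and not part of the COPR packages; it relies on RDTSCP support being reported in CPUID leaf 0x80000001, EDX bit 27:

/* rdtscp_check.c - report whether this CPU has the RDTSCP instruction.
   Build with: gcc -o rdtscp_check rdtscp_check.c */
#include <stdio.h>
#include <cpuid.h>

int main(void)
{
    unsigned int eax, ebx, ecx, edx;

    /* Query the extended feature leaf; __get_cpuid returns 0 if the
       leaf is not available on this CPU. */
    if (!__get_cpuid(0x80000001, &eax, &ebx, &ecx, &edx)) {
        puts("CPUID leaf 0x80000001 not available; no rdtscp");
        return 1;
    }

    if (edx & (1u << 27)) {
        puts("rdtscp supported; trace_run should work here");
        return 0;
    }

    puts("rdtscp not supported; trace_run will not work on this CPU");
    return 1;
}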
$ trace_run /tmp/ls.wl
488,004 cycles
106 usec wall time
0 usec across 1 thread
0 Kb Max RSS (1,228 -> 1,228)
Avg malloc time:  385 in 154 calls
Avg calloc time:    0 in   1 calls
Avg realloc time:   0 in   1 calls
Avg free time:    194 in  14 calls
Total call time: 62,033 cycles

Note: see Practical Micro-Benchmarking with ltrace and sched to get more stable numbers.
The tcache (per-thread cache) can be tuned with two environment variables:

MALLOC_TCACHE_COUNT=count
MALLOC_TCACHE_MAX=bytes
count can be anything from 0 to whatever.
bytes can be anything from 0 to 63*2*sizeof(void *)-1 (503 for 32-bit, 1007 for 64-bit).
mallopt parameters are (private):
#define M_TCACHE_COUNT -9
#define M_TCACHE_MAX   -10
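If you need to set these limits from inside a program rather than from the environment, a minimal sketch along these lines should work, assuming this build's mallopt() accepts the private parameter values above. The count of 32 and the limit of 256 bytes are arbitrary example values, not recommendations:

#include <malloc.h>
#include <stdio.h>
#include <stdlib.h>

/* Private parameter values from this build; they are not defined in the
   public <malloc.h>. */
#define M_TCACHE_COUNT -9
#define M_TCACHE_MAX   -10

int main(void)
{
    /* Tune the tcache: the count limits how many chunks it may hold,
       and the max limits the largest chunk size (in bytes) it caches.
       mallopt() returns 0 if a parameter is not accepted. */
    if (mallopt(M_TCACHE_COUNT, 32) == 0)
        fprintf(stderr, "mallopt(M_TCACHE_COUNT) not accepted\n");
    if (mallopt(M_TCACHE_MAX, 256) == 0)
        fprintf(stderr, "mallopt(M_TCACHE_MAX) not accepted\n");

    /* Subsequent allocations run with the tuned cache settings. */
    void *p = malloc(100);
    free(p);
    return 0;
}

The same limits can also be applied without recompiling by exporting MALLOC_TCACHE_COUNT and MALLOC_TCACHE_MAX in the environment before starting the program.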
To go back to the stock glibc, first disable the COPR repo:

$ vi /etc/yum.repos.d/_copr_djdelorie-glibc_dj_malloc.repo

Change the enabled line from 1 to 0:

enabled=0

Then:

$ dnf --allowerasing downgrade glibc

(replace "dnf" with "yum" for RHEL 7)