Using the dj/malloc GLIBC COPR repo

The purpose of this document is to assist folks in testing out my custom dj/malloc branch of the upstream GLIBC git repo. This COPR repo has pre-built RPMs for easy installation in a test environment.

See https://copr.fedorainfracloud.org/coprs/djdelorie/glibc_dj_malloc/ for links and other information.

Installing the COPR Repo

$ cd /etc/yum.repos.d/

RHEL7

$ wget https://copr.fedorainfracloud.org/coprs/djdelorie/glibc_dj_malloc/repo/epel-7/djdelorie-glibc_dj_malloc-epel-7.repo

$ yum update

$ init 6

Fedora

$ dnf copr enable djdelorie/glibc_dj_malloc

$ dnf clean all  (optional)
$ dnf update

$ init 6

Missing Dependencies

If dnf complains about missing dependencies, check whether you have non-x86_64 variants of glibc installed. The command below lists them; remove any it reports:

$ rpm -qa | grep ^glibc | grep -v x86_64

Confirming Installation

$ rpm -qa | grep glibc
glibc-all-langpacks-2.23.90-alphadj9.fc23.x86_64
glibc-2.23.90-alphadj9.fc23.x86_64
glibc-common-2.23.90-alphadj9.fc23.x86_64
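If you want to script this check, here is a minimal sketch. It assumes the custom packages can be recognized by the "alphadj" tag seen in the sample output above; the function name all_glibc_from_copr is made up for illustration.

```shell
# Hypothetical helper: read package names on stdin and succeed only if at
# least one glibc package is listed and every glibc package carries the
# "alphadj" tag (an assumption based on the sample output above).
all_glibc_from_copr() {
    awk '/^glibc/ { seen = 1; if ($0 !~ /alphadj/) bad = 1 }
         END { exit ((seen && !bad) ? 0 : 1) }'
}
```

Typical use would be: rpm -qa | all_glibc_from_copr && echo "custom glibc installed"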

Capturing to the Trace Buffer

One key new feature in this malloc is a high-speed trace buffer that records every malloc, free, etc. call with a minimum of added latency. For performance-critical applications, this is an improvement over the existing trace feature. The buffer is activated through a private (i.e. glibc-internal) API, which is enabled via a provided DSO:

$ LD_PRELOAD=/lib64/libmtracectl.so ls

On 32-bit machines or machines with non-standard layouts, replace lib64 with lib or whatever path you've installed into. I don't support those configurations, but you never know...

$ ls -l /tmp/mtrace-*
-rw-r--r--. 1 root root 12422 Jun  2 20:53 mtrace.out.1188

Each generated file is an architecture-specific binary file with one record per trace entry. Some programs are included in the COPR repo to process the generated files. Please make sure you process these files on the same architecture they were generated on.
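For tracing more than one command, the preload step can be wrapped in a small function. This is only a sketch: trace_cmd and the MTRACE_DSO variable are names invented here, not part of the build, and the default path is the one used in the example above.

```shell
# Hypothetical helper: run a command with the trace DSO preloaded.
# MTRACE_DSO (a made-up variable) overrides the default DSO path.
trace_cmd() {
    dso="${MTRACE_DSO:-/lib64/libmtracectl.so}"
    if [ ! -r "$dso" ]; then
        # Refuse to run rather than silently producing no trace.
        echo "trace_cmd: $dso not found or not readable" >&2
        return 1
    fi
    LD_PRELOAD="$dso" "$@"
}
```

For example, "trace_cmd ls" is equivalent to the LD_PRELOAD line above, except that it fails early when the DSO is missing.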

Sending Us Trace Files

If we ask you to send us a trace file, please rename and compress it to make the file easier to transfer and keep track of.

$ cd /tmp
$ gzip -9 mtrace.out.1188
$ mv mtrace.out.1188.gz f24-ls-fred.mtrace.gz (or whatever name fits :)

Then mail f24-ls-fred.mtrace.gz to dj@redhat.com (or to whoever is asking for it, of course).
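The rename-and-compress steps can be combined into one helper. A minimal sketch follows; pack_trace is a made-up name, and unlike the gzip command above it writes a new file and leaves the original trace in place.

```shell
# Hypothetical helper: compress a trace file under a descriptive name.
# Usage: pack_trace TRACE_FILE LABEL    (e.g. LABEL = f24-ls-fred)
pack_trace() {
    trace="$1"
    label="$2"
    if [ ! -f "$trace" ] || [ -z "$label" ]; then
        echo "usage: pack_trace TRACE_FILE LABEL" >&2
        return 1
    fi
    out="$(dirname "$trace")/$label.mtrace.gz"
    # -9: best compression; -c: write to stdout so the original is kept.
    gzip -9 -c "$trace" > "$out" && echo "$out"
}
```

It prints the name of the compressed file, which is the one to attach to the mail.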

Workload Simulator

This build also includes a set of tools to "play back" a recorded trace, which can be helpful in diagnosing memory-related performance issues. Such workloads might be locally generated as part of a benchmark suite, for example.

trace2wl outfile [infile ...]

If an infile is not provided, input is read from stdin.

$ trace2wl /tmp/ls.wl /tmp/mtrace-22172.out

The resulting file is a "workload": a data file that tells the simulator how to play back all the malloc/free/etc. calls. This file is not human-readable; it is a compact binary data file intended to be used only by the simulator.

trace_run workload.wl

Note: trace_run only works on Intel processors with the RDTSCP opcode, which is only available on reasonably modern processors. To see if your processor supports this opcode, look for the rdtscp CPU flag:

$ grep rdtscp /proc/cpuinfo

If you get lines like "flags : " then you have support and trace_run will work. If the grep returns nothing, you don't.

$ trace_run /tmp/ls.wl
488,004 cycles
106 usec wall time
0 usec across 1 thread
0 Kb Max RSS (1,228 -> 1,228)

Avg malloc time:    385 in        154 calls
Avg calloc time:      0 in          1 calls
Avg realloc time:     0 in          1 calls
Avg free time:      194 in         14 calls
Total call time: 62,033 cycles

Note: see Practical Micro-Benchmarking with ltrace and sched to get more stable numbers.
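When trace_run is driven from a script, the rdtscp check described earlier can be turned into a guard. This is a sketch; has_rdtscp is a made-up name, and the optional file argument exists only so the function can be tried on a saved copy of cpuinfo.

```shell
# Hypothetical helper: succeed if the cpuinfo file lists the rdtscp flag.
# The argument defaults to /proc/cpuinfo.
has_rdtscp() {
    # -q: no output, exit status only; -w: match rdtscp as a whole word.
    grep -qw rdtscp "${1:-/proc/cpuinfo}"
}
```

A script could then run: has_rdtscp && trace_run /tmp/ls.wl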

Tunables

MALLOC_TCACHE_COUNT=count
MALLOC_TCACHE_MAX=bytes

count can be any integer from 0 upward.

bytes can be anything from 0 to 63*2*sizeof(void *)-1 (503 for 32-bit, 1007 for 64-bit).
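The upper bound for bytes follows directly from the formula above. As a quick check of the arithmetic (tcache_max_limit is a made-up name; the argument is the pointer size in bytes):

```shell
# Upper bound for MALLOC_TCACHE_MAX, from 63*2*sizeof(void *)-1.
# Argument: pointer size in bytes (4 on 32-bit, 8 on 64-bit).
tcache_max_limit() {
    echo $(( 63 * 2 * $1 - 1 ))
}

tcache_max_limit 4   # prints 503
tcache_max_limit 8   # prints 1007
```

Assuming the two tunables are read from the environment, as the NAME=value form suggests, a 64-bit test run might set MALLOC_TCACHE_MAX=1007 together with a MALLOC_TCACHE_COUNT of your choosing in front of the program being tested.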

The corresponding mallopt parameters (private) are:

#define M_TCACHE_COUNT  -9
#define M_TCACHE_MAX  -10

Uninstalling

To uninstall the custom build and revert to an official release, you "simply" disable the COPR repo and downgrade to the latest "released" version:

$ vi /etc/yum.repos.d/_copr_djdelorie-glibc_dj_malloc.repo

Change this line from 1 to 0:

  enabled=0

Then:

$ dnf --allowerasing downgrade glibc

(Replace "dnf" with "yum" for RHEL 7.)
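If you'd rather not edit the repo file by hand, the enabled flag can be flipped with sed. This is a sketch only: disable_repo is a made-up name, and it assumes GNU sed (for -i) and a line that reads exactly "enabled=1".

```shell
# Hypothetical helper: change enabled=1 to enabled=0 in a .repo file.
# Other lines (e.g. gpgcheck=1) are left untouched.
disable_repo() {
    sed -i 's/^enabled=1$/enabled=0/' "$1"
}
```

Run it as root on the repo file named above, then continue with the downgrade command.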