path: root/malloc/trace_run.c
2016-07-22  Add quick_run compilation mode.  (Carlos O'Donell, 1 file, -37/+69)
- Add quick_run compilation mode.
- Remove disabling of fast bins.
2016-07-22  More 32-bit fixes.  (DJ Delorie, 1 file, -2/+7)
Various fixes to handle traces and workloads bigger than 2 GB.
2016-07-19  Minor tweaks to trace_run and trace2wl  (DJ Delorie, 1 file, -10/+4)
- trace_run: fix the behavior when realloc returns NULL.
- trace2wl: hard stop on multi-level inversion; print the number of fixed inversions.
2016-07-18  Change trace_run from mmap to read  (DJ Delorie, 1 file, -51/+121)
To avoid huge memory requirements for huge workloads, and unreliable RSS measurements caused by maps that cannot be munlock'd, switch trace_run to a read-as-you-go design. Data is read per-thread, in 4 KiB or 64 KiB chunks (chosen based on workload size), into a fixed buffer.
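The read-as-you-go approach can be sketched as a plain chunked-read loop; `drain_chunked` and the buffer sizes here are illustrative stand-ins, not the actual trace_run symbols:

```c
/* Sketch of a read-as-you-go consumer: instead of mmap'ing the whole
 * workload, each thread pulls fixed-size chunks from its file descriptor
 * into a private buffer and processes them as they arrive. */
#include <unistd.h>
#include <stddef.h>

/* Drain fd in chunks of at most 'chunk' bytes; returns total bytes read.
 * A real replayer would parse trace records out of 'buf' here instead of
 * just counting bytes. */
size_t drain_chunked(int fd, unsigned char *buf, size_t chunk)
{
    size_t total = 0;
    for (;;) {
        ssize_t n = read(fd, buf, chunk);
        if (n <= 0)                 /* EOF or error: stop */
            break;
        total += (size_t)n;         /* process buf[0..n) here */
    }
    return total;
}
```

Because each thread owns its buffer, no locking is needed on the read path, and RSS stays bounded by the buffer size rather than the workload size.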
2016-07-16  Enhance the tracer with new data and fixes.  (Carlos O'Donell, 1 file, -13/+49)
* Increase trace entry to 64 bytes.

  This increases the trace entry to 64 bytes, still a proper multiple of the shared memory window size. Although the entry size has doubled, the on-disk format is still smaller than the ASCII version. In the future we may wish to add variable-sized records, but for now the simplicity of this method works well. With the extra bytes we:

  - Record internal size information for incoming (free) and outgoing chunks (malloc, calloc, realloc, etc.). This simplifies accounting of RSS usage and provides an extra cross-check between malloc and free based on internal chunk sizes.
  - Record alignment information for memalign and posix_memalign, continuing to extend the tracer to the full API.
  - Leave 128 bits of padding for future path uses, useful for more path information.

  Additionally, __MTB_TYPE_POSIX_MEMALIGN is added for the sole purpose of recording the trace, so that we can hard-fail in the workload converter when we see such an entry. Lastly, C_MEMALIGN, C_VALLOC, C_PVALLOC, and C_POSIX_MEMALIGN are added as workload entries for the sake of completeness.

  Builds on x86_64; capture looks good and it works.

* Teach trace_dump about the new entries.

  This teaches trace_dump about the new posix_memalign entry, as well as the new size2 and size3 fields. Tested by tracing a program that uses malloc, free, and memalign and verifying that the extra fields show the expected chunk sizes and alignments when dumped with trace_dump. Tested on x86_64 with no apparent problems.

* Teach trace2wl and trace_run about the new entries.

  (a) trace2wl changes: teach trace2wl how to output entries for valloc and pvalloc; it does so exactly as for malloc, since from the perspective of the API they are identical. Additionally, trace2wl is taught to output an event for memalign, storing alignment and size in the event record. Lastly, posix_memalign is detected and the converter aborts if it is seen. It is my opinion that we should not ignore this data during conversion; if we see a need for it we should implement it later.

  (b) trace_run changes: some cosmetic cleanup in printing 'pthread_t', which is always the address of the struct pthread in memory, so to make debugging easier we print the value as a hex pointer. Teach the simulator how to run memalign: with the newly recorded alignment information we double-check that the resulting memory is correctly aligned. We do not implement valloc and pvalloc; they will abort the simulator. This is incremental progress. Tested on x86_64 by converting and running a multithreaded test application that calls calloc, malloc, free, and memalign.

* Disable recursive traces and save new data.

  (a) Adds support for disabling recursively recorded traces, e.g. realloc calling malloc no longer produces both a realloc and a malloc trace event. We solve this with a per-thread variable that disables new trace creation but still allows path bits to be set. This lets us record the code paths taken while recording only one public API event.

  (b) Save internal chunk size information into trace events for all APIs. The most important is free, where we record the freed size; this makes it easier for tooling to compute a running "ideal RSS" value.

  Tested on x86_64 with some small applications and test programs.
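The per-thread suppression of recursive events can be modeled in a few lines; the flag, counter, and function names below are invented for illustration, not the tracer's actual variables:

```c
/* Toy model of recursive-trace suppression: an outer public API call
 * (e.g. realloc) sets a thread-local flag while tracing itself, so an
 * inner call (e.g. the malloc that realloc falls back to) records no
 * event of its own. */
#include <stddef.h>

static __thread int suppress;   /* nonzero while an outer event is open */
static __thread int recorded;   /* trace entries emitted by this thread */

static void record_event(void)
{
    if (!suppress)
        recorded++;             /* stand-in for writing one trace entry */
}

static void traced_malloc(void)
{
    record_event();
}

static void traced_realloc(void)
{
    int outer = !suppress;
    if (outer) {
        record_event();
        suppress = 1;           /* inner calls must not create new events */
    }
    traced_malloc();            /* models realloc calling malloc internally */
    if (outer)
        suppress = 0;
}
```

Since the flag is `__thread`, one thread's suppression never hides another thread's events; in the real tracer, path bits can still be set while the flag is up, so the code path is preserved even though only one event is emitted.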
2016-07-15  Fix NULL return value handling  (DJ Delorie, 1 file, -0/+16)
Decided that a call that returns NULL should be encoded in the workload, but that the simulator should simply skip those calls, rather than skipping them in the converter.
2016-07-12  Update to new binary file-based trace file.  (DJ Delorie, 1 file, -14/+1)
In order not to lose records, or to have to guess ahead of time how many records are needed, this switches to an mmap'd file for the trace buffer and grows it as needed. The trace2dat perl script is replaced by trace2wl, a C++ program that runs much faster and can handle the binary format.
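A growable, file-backed trace buffer of this kind can be sketched with `ftruncate` plus a remap; the doubling policy, struct layout, and names here are assumptions for illustration, not trace2wl's actual code:

```c
/* Sketch of a file-backed trace buffer that grows on demand: when the
 * next record would not fit, extend the file and remap it larger. */
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>
#include <stdlib.h>
#include <string.h>
#include <stddef.h>

struct tracebuf {
    int fd;               /* backing file */
    unsigned char *base;  /* current mapping (NULL before first append) */
    size_t cap;           /* mapped size */
    size_t used;          /* bytes written so far */
};

/* Append 'len' bytes; returns 0 on success, -1 on failure. */
static int tb_append(struct tracebuf *tb, const void *rec, size_t len)
{
    if (tb->used + len > tb->cap) {
        size_t newcap = tb->cap ? tb->cap * 2 : 4096;
        while (newcap < tb->used + len)
            newcap *= 2;
        if (ftruncate(tb->fd, (off_t)newcap) != 0)
            return -1;                      /* grow the backing file */
        if (tb->base)
            munmap(tb->base, tb->cap);
        tb->base = mmap(NULL, newcap, PROT_READ | PROT_WRITE,
                        MAP_SHARED, tb->fd, 0);
        if (tb->base == MAP_FAILED)
            return -1;
        tb->cap = newcap;
    }
    memcpy(tb->base + tb->used, rec, len);
    tb->used += len;
    return 0;
}
```

Because the mapping is MAP_SHARED, everything written survives in the file even if the process dies, which is exactly the "don't lose records" property the commit is after.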
2016-07-05  32-bit fixes, RSS tracking, free wiping.  (DJ Delorie, 1 file, -20/+82)
More 32-bit vs. 64-bit fixes. We now track "ideal RSS" and report its maximum versus what the kernel thinks our maximum RSS is. Memory is filled with a constant when freed.
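Wiping freed memory with a constant makes stale reads show up as an obvious pattern. A minimal sketch, assuming a glibc environment (`malloc_usable_size`) and an arbitrary 0xa5 fill byte, neither of which is taken from the actual trace_run code:

```c
/* Sketch of wipe-on-free: overwrite the whole usable region of a chunk
 * with a recognizable constant before returning it to the allocator. */
#include <malloc.h>   /* malloc_usable_size (glibc extension) */
#include <stdlib.h>
#include <string.h>

#define WIPE_BYTE 0xa5   /* arbitrary "poison" value for this sketch */

/* Fill the chunk's usable bytes; returns how many bytes were wiped. */
static size_t wipe(void *p)
{
    size_t n = malloc_usable_size(p);
    memset(p, WIPE_BYTE, n);
    return n;
}

static void wiping_free(void *p)
{
    if (p != NULL) {
        wipe(p);     /* any later use-after-free reads 0xa5 garbage */
        free(p);
    }
}
```

Note that the wipe covers the allocator's usable size, not just the requested size, so slack bytes at the end of the chunk are poisoned too.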
2016-06-30  Build fixes for in-tree and 32/64-bit  (DJ Delorie, 1 file, -31/+59)
- Expand the comments in mtrace-ctl.c to better explain how to use this tracing controller. The new docs assume the SO is built and installed.
- Build fixes for trace_run.c: additional build pedantry so that trace_run.c can be built with more warnings/errors turned on.
- Build/install trace_run and trace2dat. trace2dat takes dump files from mtrace-ctl.so and turns them into mmap'able data files for trace_run, which "plays back" the logged calls.
- 32-bit compatibility: redesign the tcache macros to account for differences between 64-bit and 32-bit systems.
2016-04-29  changes to per-thread cache algorithms  (DJ Delorie, 1 file, -7/+116)
Core algorithm changes:
* Per-thread cache is refilled from existing fastbins and smallbins instead of always needing a bigger chunk.
* Caches are linked, and a cache is cleaned up when its thread exits (incomplete for now; needed framework for the chunk scanner).
* Fixes to mutex placement - needed to sync chunk headers across threads.

Enabling the per-thread cache (tcache) gives about a 20-30% speedup at a 20-30% memory cost (due to fragmentation). Still working on that :-)

Debugging helpers (temporary):
* __malloc_scan_chunks() calls back to the app for each chunk in each heap.
* _m_printf() helper for "safe" printing within malloc.
* Lots of calls to the above, commented out, in case you need them.
* trace_run scans leftover chunks too.
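The core idea of a per-thread cache can be shown with a toy free list; this mirrors the concept only, and deliberately ignores glibc's real tcache layout, size bins, and chunk headers:

```c
/* Toy per-thread cache: frees push onto a thread-local list, and
 * allocations pop from it, so the fast path touches no shared state and
 * takes no lock.  Real tcache bins by size; this sketch uses one list. */
#include <stdlib.h>
#include <stddef.h>

struct tnode { struct tnode *next; };

static __thread struct tnode *tcache_head;  /* this thread's free list */
static __thread int tcache_count;           /* entries currently cached */

static void cached_free(void *p)
{
    struct tnode *n = p;            /* reuse the freed block as a link */
    n->next = tcache_head;
    tcache_head = n;
    tcache_count++;
}

static void *cached_alloc(size_t sz)
{
    if (tcache_head != NULL) {      /* fast path: pop from local cache */
        struct tnode *n = tcache_head;
        tcache_head = n->next;
        tcache_count--;
        return n;
    }
    /* slow path: fall back to the global allocator */
    if (sz < sizeof(struct tnode))
        sz = sizeof(struct tnode);
    return malloc(sz);
}
```

The speed/memory trade-off described above falls out of this structure: hits avoid the global allocator entirely, but every cached block is memory the rest of the process cannot reuse until the cache is flushed.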
2016-03-18  Switch to datafile-based simulation  (DJ Delorie, 1 file, -0/+392)
Compiling a 78,000,000-entry trace proved to be... difficult. No, impossible. Now the trace is distilled into a pseudo-code data file that can be mmap'd into trace_run's address space and interpreted.
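Interpreting a distilled workload amounts to a dispatch loop over opcode records; the opcode set, record layout, and slot table below are invented for illustration, not trace_run's actual format:

```c
/* Sketch of replaying a pseudo-code workload: each record names an
 * operation plus its operands, and a loop dispatches them in order.
 * A slot table maps trace-time pointers to replay-time pointers. */
#include <stdlib.h>
#include <stddef.h>

enum { OP_MALLOC, OP_FREE, OP_DONE };

struct op {
    int    code;   /* one of the OP_* values above */
    size_t size;   /* allocation size (OP_MALLOC only) */
    int    slot;   /* which pointer slot this op targets */
};

/* Replay 'prog' until OP_DONE; returns the number of ops executed. */
static int replay(const struct op *prog, void **slots)
{
    int n = 0;
    for (; prog->code != OP_DONE; prog++, n++) {
        switch (prog->code) {
        case OP_MALLOC:
            slots[prog->slot] = malloc(prog->size);
            break;
        case OP_FREE:
            free(slots[prog->slot]);
            slots[prog->slot] = NULL;
            break;
        }
    }
    return n;
}
```

Because the records are fixed-size structs, the whole program can be mmap'd and walked directly with no parsing pass, which is what makes a 78-million-entry workload tractable where compiling it was not.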