To avoid losing records, or having to guess ahead of time how many
records are needed, this switches to an mmap'd file for the trace
buffer, and grows it as needed.
The trace2dat Perl script is replaced with a trace2wl C++ program
that runs a lot faster and can handle the binary format.
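Roughly, the growable buffer works like the sketch below; the names
and the doubling policy are illustrative assumptions, not this
branch's actual code.

    /* Sketch only: append records to a file-backed mmap'd buffer,
       growing the file (and remapping) when it fills, so no record
       is ever dropped.  Assumes Linux (mremap); error handling is
       minimal.  */
    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    struct trace_buf
    {
      int fd;
      size_t size;   /* current mapping size in bytes */
      size_t used;   /* bytes written so far */
      char *base;    /* start of the mapping */
    };

    static int
    trace_open (struct trace_buf *tb, const char *path, size_t initial)
    {
      tb->fd = open (path, O_RDWR | O_CREAT | O_TRUNC, 0644);
      if (tb->fd < 0 || ftruncate (tb->fd, initial) < 0)
        return -1;
      tb->base = mmap (NULL, initial, PROT_READ | PROT_WRITE,
                       MAP_SHARED, tb->fd, 0);
      if (tb->base == MAP_FAILED)
        return -1;
      tb->size = initial;
      tb->used = 0;
      return 0;
    }

    static int
    trace_append (struct trace_buf *tb, const void *rec, size_t len)
    {
      if (tb->used + len > tb->size)
        {
          /* Double the file and the mapping until the record fits.  */
          size_t newsize = tb->size;
          while (tb->used + len > newsize)
            newsize *= 2;
          if (ftruncate (tb->fd, newsize) < 0)
            return -1;
          void *p = mremap (tb->base, tb->size, newsize, MREMAP_MAYMOVE);
          if (p == MAP_FAILED)
            return -1;
          tb->base = p;
          tb->size = newsize;
        }
      memcpy (tb->base + tb->used, rec, len);
      tb->used += len;
      return 0;
    }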
Core algorithm changes:
* Per-thread cache is refilled from existing fastbins and smallbins
  instead of always needing a bigger chunk (see the sketch below).
* Caches are linked, and a thread's cache is cleaned up when the thread
  exits (incomplete for now, but needed as a framework for the chunk
  scanner).
* Fixes to mutex placement - needed to sync chunk headers across
threads.
Enabling the per-thread cache (tcache) gives about a 20-30% speedup at
a 20-30% memory cost (due to fragmentation). Still working on that :-)
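A simplified, self-contained sketch of the refill idea from the first
bullet above: take chunks the arena already has (fastbins first, then
smallbins) before asking for anything bigger.  All names and the fill
count are made up for illustration; this is not glibc's code.

    struct chunk { struct chunk *next; };

    struct tcache_bin
    {
      struct chunk *head;
      unsigned count;
    };

    #define TCACHE_FILL 7   /* chunks to stash per refill (assumed) */

    /* Pop one chunk from a singly linked free list, or NULL.  */
    static struct chunk *
    list_pop (struct chunk **headp)
    {
      struct chunk *c = *headp;
      if (c != NULL)
        *headp = c->next;
      return c;
    }

    /* Refill one size class of the per-thread cache from the arena's
       existing fastbin and smallbin lists for that size class.  */
    static void
    tcache_refill (struct tcache_bin *bin,
                   struct chunk **fastbin, struct chunk **smallbin)
    {
      while (bin->count < TCACHE_FILL)
        {
          struct chunk *c = list_pop (fastbin);
          if (c == NULL)
            c = list_pop (smallbin);
          if (c == NULL)
            break;
          c->next = bin->head;
          bin->head = c;
          bin->count++;
        }
    }

Only when both lists run dry does the caller fall back to splitting a
bigger chunk or extending the heap, which is the case the first bullet
above is trying to avoid.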
Debugging helpers (temporary):
* __malloc_scan_chunks() calls back to the app for each chunk in each
  heap (example usage sketched after this list).
* _m_printf() helper for "safe" printing within malloc
* Lots of calls to the above, commented out, in case you need them.
* trace_run scans leftover chunks too.
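Example of how an app might use the chunk scanner.  The callback
signature below is only a guess for illustration and may not match
the real hook in this branch.

    #include <stddef.h>
    #include <stdio.h>

    /* Assumed prototype: one callback per chunk, passing its address,
       size, and an in-use flag.  The real interface may differ.  */
    extern void __malloc_scan_chunks (void (*cb) (void *chunk,
                                                  size_t size,
                                                  int inuse));

    static size_t free_chunks, free_bytes;

    static void
    count_free (void *chunk, size_t size, int inuse)
    {
      /* Avoid allocating (and re-entering malloc) inside the
         callback; just accumulate counters and print afterwards.  */
      if (!inuse)
        {
          free_chunks++;
          free_bytes += size;
        }
      (void) chunk;
    }

    int
    main (void)
    {
      __malloc_scan_chunks (count_free);
      printf ("%zu free chunks, %zu bytes\n", free_chunks, free_bytes);
      return 0;
    }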
Compiling a 78,000,000-entry trace proved to be... difficult.
No, impossible. Now the trace is distilled into a pseudo-code
data file that can be mmap'd into trace_run's address space
and interpreted.
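The general shape of such an interpreter might look like the sketch
below.  The opcodes and record layout are invented for illustration;
only the mmap-and-dispatch structure follows the description above.

    #include <fcntl.h>
    #include <stdint.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    /* Hypothetical opcodes for the distilled trace.  */
    enum { OP_END = 0, OP_MALLOC = 1, OP_FREE = 2 };

    static void *slots[65536];   /* hypothetical handle -> pointer map */

    int
    main (int argc, char **argv)
    {
      if (argc != 2)
        return 1;
      int fd = open (argv[1], O_RDONLY);
      struct stat st;
      if (fd < 0 || fstat (fd, &st) < 0)
        return 1;
      const uint8_t *prog = mmap (NULL, st.st_size, PROT_READ,
                                  MAP_PRIVATE, fd, 0);
      if (prog == MAP_FAILED)
        return 1;

      /* Walk the mmap'd program and replay each operation.  */
      const uint8_t *pc = prog;
      for (;;)
        {
          uint8_t op = *pc++;
          if (op == OP_END)
            break;
          if (op == OP_MALLOC)
            {
              /* hypothetical record: 2-byte handle, 4-byte size */
              uint16_t slot;
              uint32_t size;
              memcpy (&slot, pc, sizeof slot);
              memcpy (&size, pc + 2, sizeof size);
              pc += 6;
              slots[slot] = malloc (size);
            }
          else if (op == OP_FREE)
            {
              uint16_t slot;
              memcpy (&slot, pc, sizeof slot);
              pc += 2;
              free (slots[slot]);
              slots[slot] = NULL;
            }
        }
      munmap ((void *) prog, st.st_size);
      return 0;
    }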