Koute Bytehound Versions

A memory profiler for Linux.

0.11.0

1 year ago

Major changes:

  • Added support for _rjem_aligned_alloc (the jemalloc variant of aligned_alloc)
  • Initialization is now done eagerly in the static constructor, in case the profiled executable itself overrides all of the LD_PRELOAD-hooked functions so that none of them ever get called

0.10.0

1 year ago

Major changes:

  • Performance improvements; the CPU overhead for allocation-heavy, heavily multithreaded programs was cut by up to ~80%
  • You can now control whether child processes are profiled with the MEMORY_PROFILER_TRACK_CHILD_PROCESSES environment variable (disabled by default)
  • The fragmentation timeline was removed from the UI
  • mmap/munmap calls are now gathered by default (you can disable this with MEMORY_PROFILER_GATHER_MAPS)
  • Total actual memory usage is now gathered by periodically polling /proc/self/smaps
  • Maps can now be browsed in the UI and analyzed through the scripting API
  • Maps are now named according to their source using PR_SET_VMA_ANON_NAME (Linux 5.17 or newer; on older kernels this is emulated in user space)
  • Glibc-internal __mmap and __munmap are now hooked into
  • Bytehound-internal allocations now exclusively use mimalloc as their allocator
  • New scripting APIs:
    • AllocationList::only_alive_at
    • AllocationList::only_from_maps
    • Graph::start_at
    • Graph::end_at
    • Graph::show_address_space
    • Graph::show_rss
    • MapList
    • Map
  • Removed scripting APIs:
    • AllocationList::only_not_deallocated_after_at_least
    • AllocationList::only_not_deallocated_until_at_most
    • Graph::truncate_until
    • Graph::extend_until
  • Removed lifetime filters in the UI: only_not_deallocated_in_current_range, only_deallocated_in_current_range
  • Fixed a rare crash when profiling programs using jemalloc
  • Added support for aligned_alloc
  • Added support for memalign
  • Relative scale in the generated graphs is now always relative to the start of profiling
  • Gathered backtraces now include an extra Bytehound-specific frame at the bottom to indicate which function was called
  • Minor improvements to the UI
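
The smaps polling mentioned above can be illustrated with a minimal sketch. The notes do not say which smaps fields Bytehound reads or how often it polls; summing the Rss field of every mapping is an assumption made for illustration, and the function names are hypothetical.

```python
import re

# Matches lines such as "Rss:                 296 kB" in /proc/<pid>/smaps.
_RSS_LINE = re.compile(r"^Rss:\s+(\d+)\s+kB$", re.MULTILINE)


def total_rss_kib(smaps_text: str) -> int:
    """Sum the Rss field of every mapping in an smaps dump (in KiB)."""
    return sum(int(kib) for kib in _RSS_LINE.findall(smaps_text))


def poll_own_rss_kib(path: str = "/proc/self/smaps") -> int:
    """Read this process's smaps once; call it on a timer to build a timeline."""
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        return total_rss_kib(handle.read())
```

Keeping the parser separate from the file read makes it usable on an smaps dump captured from another process as well.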

0.9.0

1 year ago

Major changes:

  • Deallocation backtraces are now gathered by default; you can use the MEMORY_PROFILER_GRAB_BACKTRACES_ON_FREE environment variable to turn this off
  • Deallocation backtraces are now shown in the GUI for each allocation
  • Allocations can now be filtered according to where exactly they were deallocated
  • Allocations can now be filtered according to whether the last allocation in their realloc chain was leaked or not
  • Profiling of executables larger than 4GB is now supported
  • Profiling of executables using unprefixed jemalloc is now supported
  • New scripting APIs:
    • AllocationList::only_matching_deallocation_backtraces
    • AllocationList::only_not_matching_deallocation_backtraces
    • AllocationList::only_position_in_chain_at_least
    • AllocationList::only_position_in_chain_at_most
    • AllocationList::only_chain_leaked
  • The server subcommand of the CLI should now use less memory when loading large data files
  • The behavior of malloc_usable_size when called with a NULL argument now matches glibc
  • Rust 1.62 or newer is now required to build the crates; older versions might still work, but are not supported
  • The way the profiler is initialized was reworked; this should increase compatibility and might fix some of the crashes seen when trying to profile certain programs
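
The realloc-chain filters above can be pictured with a small sketch. It models a chain as the ordered list of allocations produced by successive realloc calls; the `Allocation` type, the function names, and the zero-based position numbering are assumptions for illustration, not Bytehound's data model.

```python
from typing import List, NamedTuple, Optional


class Allocation(NamedTuple):
    size: int
    deallocated_at: Optional[float]  # None means it was never freed


def chain_leaked(chain: List[Allocation]) -> bool:
    """A chain counts as leaked when its last allocation was never freed."""
    return chain[-1].deallocated_at is None


def in_position_range(chain: List[Allocation], at_least: int, at_most: int) -> List[Allocation]:
    """Keep the allocations whose position in the chain lies in [at_least, at_most]."""
    return [a for position, a in enumerate(chain) if at_least <= position <= at_most]
```

For example, a buffer grown twice by realloc and never freed forms a three-element chain for which `chain_leaked` returns `True`.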

0.8.0

2 years ago

Major changes:

  • Significantly lower CPU usage when temporary allocation culling is turned on
  • Each thread now has its own first-level backtrace cache; this might result in higher memory usage when profiling
  • The MEMORY_PROFILER_BACKTRACE_CACHE_SIZE environment variable knob was replaced with MEMORY_PROFILER_BACKTRACE_CACHE_SIZE_LEVEL_1 and MEMORY_PROFILER_BACKTRACE_CACHE_SIZE_LEVEL_2 to control the size of the per-thread caches and the global cache respectively
  • The MEMORY_PROFILER_PRECISE_TIMESTAMPS environment variable knob was removed (always gathering precise timestamps is fast enough on amd64)
  • The default value of MEMORY_PROFILER_TEMPORARY_ALLOCATION_PENDING_THRESHOLD is now unset, which means that allocations will be buffered indefinitely until they are either culled or have lived long enough to no longer be eligible for culling (this might increase memory usage in certain cases)
  • Backtraces are now not emitted for allocations which were completely culled
  • You can now see whether a given allocation was made through jemalloc, and filter according to that
  • You can now see when a given allocation group reached its maximum memory usage, and filter according to that
  • New scripting APIs:
    • Graph::show_memory_usage
    • Graph::show_live_allocations
    • Graph::show_new_allocations
    • Graph::show_deallocations
    • AllocationList::only_group_max_total_usage_first_seen_at_least
    • AllocationList::only_jemalloc
  • New subcommand: extract (will unpack all of the files embedded into a given data file)
  • The strip subcommand no longer buffers allocations indefinitely when using the --threshold option, which results in significantly lower memory usage when stripping huge data files from long profiling runs
  • malloc_usable_size now works properly when compiled with the jemalloc feature
  • reallocarray doesn't segfault anymore
  • The compilation should now work on distributions with an ancient version of Yarn
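
The two-level backtrace cache described above can be sketched roughly as follows. This is a conceptual illustration only: the class and method names are hypothetical, and the real caches are bounded in size (hence the two environment variables), whereas this sketch never evicts anything.

```python
import threading
from typing import Dict, Tuple

Backtrace = Tuple[int, ...]  # return addresses of one call stack


class BacktraceCache:
    """Per-thread level-1 caches in front of one shared level-2 cache."""

    def __init__(self) -> None:
        self._local = threading.local()
        self._global: Dict[Backtrace, int] = {}
        self._lock = threading.Lock()

    def intern(self, backtrace: Backtrace) -> int:
        """Return a stable ID for the backtrace, assigning one on first sight."""
        level_1 = getattr(self._local, "cache", None)
        if level_1 is None:
            level_1 = self._local.cache = {}
        backtrace_id = level_1.get(backtrace)
        if backtrace_id is None:
            # Level-1 miss: consult the shared cache, which needs a lock.
            with self._lock:
                backtrace_id = self._global.setdefault(backtrace, len(self._global))
            level_1[backtrace] = backtrace_id
        return backtrace_id
```

A hit in the per-thread cache takes no lock, which is the usual motivation for such a design; the cost is one extra dictionary per thread, consistent with the note about higher memory usage.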

0.7.0

2 years ago

Major changes:

  • The project was rebranded from memory-profiler to bytehound
  • Profiling of applications using jemalloc is now fully supported (AMD64-only, jemallocator crate only)
  • Added built-in scripting capabilities which can be used for automated analysis and report generation; those can be accessed through the script subcommand
  • Added a scripting console to the GUI
  • Added the ability to define programmatic filters in the GUI
  • Allocation graphs are now shown in the GUI when browsing through the allocations grouped by backtraces
  • Improved support for tracking and analyzing reallocations
  • Improved parallelization of the analyzer's internals, which should result in snappier behavior on modern multicore machines
  • The cutoff point for determining allocations' lifetime is now the end of profiling for those allocations which were never deallocated
  • The squeeze subcommand was renamed to strip
  • You can now use the strip subcommand to strip away only a subset of temporary allocations
  • Information about allocations culled at runtime is now emitted on a per-backtrace basis during profiling
  • Fixed an issue where the shadow stack based unwinding was incompatible with Rust's ABI in certain rare cases
  • mmap calls are now always gathered in order (if you have enabled their gathering)
  • Improved runtime backtrace deduplication which should result in smaller datafiles
  • Many other miscellaneous bugfixes
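
The lifetime cutoff and the partial stripping of temporary allocations can be illustrated with a short sketch. The `Allocation` type, the function names, and the exact meaning of the threshold (a minimum lifetime in seconds) are assumptions for illustration; the notes do not spell out how strip decides which allocations are temporary.

```python
from typing import List, NamedTuple, Optional


class Allocation(NamedTuple):
    allocated_at: float              # seconds since the start of profiling
    deallocated_at: Optional[float]  # None if the allocation was never freed


def lifetime(allocation: Allocation, profiling_ended_at: float) -> float:
    """Lifetime in seconds; never-freed allocations are cut off at the end of profiling."""
    end = allocation.deallocated_at
    if end is None:
        end = profiling_ended_at
    return end - allocation.allocated_at


def strip_short_lived(allocations: List[Allocation], threshold: float) -> List[Allocation]:
    """Drop freed allocations that lived for less than `threshold` seconds."""
    return [
        a for a in allocations
        if a.deallocated_at is None or a.deallocated_at - a.allocated_at >= threshold
    ]
```

Allocations that were never freed are always kept here, since they are the ones a leak investigation cares about.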

0.6.1

2 years ago

This is a bugfix release that fixes a possible deadlock when FDEs are dynamically registered at runtime.

0.6.0

2 years ago

Major changes:

  • Added a runtime backtrace cache; backtraces are now deduplicated when profiling, which results in less data being generated.
  • Added automatic culling of temporary allocations when running with MEMORY_PROFILER_CULL_TEMPORARY_ALLOCATIONS set to 1.
  • Added support for reallocarray.
  • Added support for unwinding through JITed code, provided the JIT compiler registers its unwinding tables through __register_frame.
  • Added support for unwinding through frames which require arbitrary DWARF expressions to be evaluated when resolving register values.
  • Added support for DWARF expressions that fetch memory.
  • Allocations are not tracked by their addresses anymore; they're now tracked by unique IDs, which fixes a race condition when multiple threads are simultaneously allocating and deallocating memory in quick succession.
  • mmap calls are now not gathered by default.
  • Rewrote TLS state management; some deallocations from TLS destructors which were previously missed by the profiler are now gathered.
  • When profiling is disabled at runtime the profiler no longer shuts down completely; it keeps gathering data for allocations which were made before it was disabled. When reenabled it no longer creates a new file, and instead keeps writing to the same file as before.
  • The profiler now requires Rust nightly to compile.
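
The switch from addresses to unique IDs can be motivated with a small sketch. It assumes a hypothetical event stream in which one thread's record of a free is written after another thread's record of a new allocation that reuses the same address; the event layout and function names are made up for illustration and are not Bytehound's file format.

```python
from typing import List, Set, Tuple

Event = Tuple[str, int, int]  # (kind, allocation ID, address)


def live_by_address(events: List[Event]) -> Set[int]:
    """Track live allocations by address; misbehaves when addresses are reused."""
    live: Set[int] = set()
    for kind, _allocation_id, address in events:
        if kind == "alloc":
            live.add(address)
        else:
            live.discard(address)
    return live


def live_by_id(events: List[Event]) -> Set[int]:
    """Track live allocations by unique ID; address reuse is harmless."""
    live: Set[int] = set()
    for kind, allocation_id, _address in events:
        if kind == "alloc":
            live.add(allocation_id)
        else:
            live.discard(allocation_id)
    return live


# Allocation 1 is freed and its address immediately reused by allocation 2,
# but the record of the free is written last.
events = [("alloc", 1, 0x1000), ("alloc", 2, 0x1000), ("free", 1, 0x1000)]
```

With these events, address-based tracking concludes that nothing is live, while ID-based tracking correctly reports allocation 2 as still alive.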

0.5.0

4 years ago

Major changes:

  • Shadow stack based unwinding is now supported on stable Rust and turned on by default.
  • Systems where perf_event_open is unavailable (e.g. unpatched MIPS64 systems, docker containers, etc.) are now supported.
  • The mechanism for exception handling when using shadow stack based unwinding was completely rewritten using proper landing pads.
  • Programs which call longjmp/setjmp are now partially supported when using shadow stack based unwinding.
  • Shared objects dynamically loaded through dlopen are now properly handled.
  • Rust symbol demangling is now supported.
  • Fixed an issue where calling backtrace on certain architectures while using shadow stack based unwinding would crash the program.
  • The profiler can now be compiled with the jemalloc feature to use jemalloc instead of the system allocator.
  • The profiler can now be started and stopped programmatically through memory_profiler_start and memory_profiler_stop functions exported by libmemory_profiler.so. Those are equivalent to controlling the profiler through signals.
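
As a rough sketch of the programmatic control mentioned above, the exported functions could be looked up in the current process and called only when the profiler is actually preloaded. The helper name is hypothetical, and the zero-argument, no-return-value signature is an assumption; the notes name the functions but not their prototypes.

```python
import ctypes


def call_if_exported(symbol: str) -> bool:
    """Call a zero-argument C function from the process's global namespace, if present."""
    process = ctypes.CDLL(None)  # symbols already loaded into this process
    try:
        function = getattr(process, symbol)
    except AttributeError:
        return False  # symbol not found, e.g. the profiler is not preloaded
    function.argtypes = []  # assumed: no arguments
    function.restype = None  # assumed: no return value
    function()
    return True
```

A script running under the profiler could then wrap the region of interest with `call_if_exported("memory_profiler_start")` and `call_if_exported("memory_profiler_stop")`, and would still run unchanged when the library is absent.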

0.4.0

4 years ago

Major changes:

  • The profiler can now be compiled on stable Rust, with the caveat that shadow stack based unwinding will then be disabled.
  • The profiler is now fully lazily initialized; if disabled with MEMORY_PROFILER_DISABLE_BY_DEFAULT the profiler will not initialize itself nor create an output file.
  • The signal handler registration can now be disabled with MEMORY_PROFILER_REGISTER_SIGUSR1 and MEMORY_PROFILER_REGISTER_SIGUSR2.
  • When the profiling is disabled at runtime it will more thoroughly deinitialize itself, and when reenabled it will create a new output file instead of continuing to write data to the old one.
  • The embedded server is now disabled by default and can be reenabled with the MEMORY_PROFILER_ENABLE_SERVER environment variable.
  • The base port of the embedded server can now be set with the MEMORY_PROFILER_BASE_SERVER_PORT environment variable.
  • The MEMORY_PROFILER_OUTPUT environment variable now supports an %n placeholder.
  • The GUI now has a graph which shows allocations and deallocations per second.
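
Since the profiler is configured entirely through environment variables, a launcher script can assemble them in one place. The sketch below is illustrative: the helper name is hypothetical, and the values it is shown with ("1" to enable a knob, a port number of 8100) are assumptions rather than documented settings.

```python
import os
from typing import Dict, Mapping, Optional


def profiler_environment(
    preload_path: str,
    knobs: Mapping[str, str],
    base: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Build an environment that preloads the profiler and sets its knobs."""
    env = dict(os.environ if base is None else base)
    env["LD_PRELOAD"] = preload_path
    for name, value in knobs.items():
        # Catch misspelled variable names early; every knob shares this prefix.
        if not name.startswith("MEMORY_PROFILER_"):
            raise ValueError(f"not a profiler knob: {name}")
        env[name] = value
    return env
```

The result can be passed to `subprocess.run(["./my-app"], env=env)`, for example with `preload_path="./libmemory_profiler.so"` and `knobs={"MEMORY_PROFILER_ENABLE_SERVER": "1"}`; `./my-app` stands in for whatever program is being profiled.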

0.3.0

4 years ago

Major changes:

  • More performance improvements. In the average case the cost of a single allocation was cut down to approximately 75%. Every thread now has its own unwind context, so stack traces can now be gathered in parallel.
  • The profiler should no longer crash on systems with a recent version of libstdc++ when a C++ exception is thrown.