Global Extensible Open Power Manager

v1.0.0+rc3

  • Wed Apr 3 2019 Christopher M. Cantalupo v1.0.0+rc3
  • Modified implementations and interfaces:
    • Finalized interfaces for 1.0.0 release.
    • Changed class naming scheme to drop "I" prefix from interface base classes and add "Imp" suffix to implementation classes.
    • Replaced ascend() and descend() Agent methods with a more fine-grained interface.
    • Modified MSRIOGroup to use JSON to store MSR data.
    • Updated utility classes for Agent interface changes.
    • Removed use of raw pointers from MSRIOGroup.
    • Added Helper function to list files in a directory.
    • Renamed split_string() to string_split().
    • Removed sort call from table dump since no longer needed.
    • Removed samples sent up tree from MonitorAgent.
    • Moved "PlatformTopo::m_domain_e" to a C enum "geopm_domain_e" in geopm_topo.h.
    • Changed GEOPM_DOMAIN_INVALID to -1 and shifted all other domain values by one.
    • Renamed all references to the PlatformTopo::m_domain_e enum to use geopm_domain_e.
    • Removed PlatformIO::num_signal() and PlatformIO::num_control() from public interface.
    • Renamed PlatformIO method is_domain_within() to is_nested_domain().
    • Moved geopm_region_info_s to geopm.h.
    • Renamed Agent::report_node() to report_host().
    • Removed ProfileIOGroup from installed headers.
    • Renamed CircularBufferImp to CircularBuffer.
    • Moved MSRSignal and MSRControl into their own files.
    • Moved Imp classes for installed classes to their own non-installed headers.
    • Moved SharedMemory and SharedMemoryUser classes into separate headers.
    • Introduced FrequencyGovernor that holds common code for setting frequency.
    • Updated EnergyEfficientAgent and FrequencyMapAgent to use FrequencyGovernor.
    • Replaced ascend() and descend() methods in all built-in agents to use new APIs.
    • Removed num_signal_pushed() and num_control_pushed() from public PlatformIO APIs.
    • Made tutorial shell scripts compatible with more shell variants.
  • Updated features:
    • Implemented and documented C wrappers for the PlatformIO class: geopm_pio_c(3).
    • Implemented and documented C wrappers for the PlatformTopo class: geopm_topo_c(3).
    • Changed implementation to stop sending messages about MPI regions nested inside of network hint regions.
    • Added command line option to geopmread(1) and geopmwrite(1) to create topology cache file.
    • Added make_unique and make_shared factory methods to all installed C++ header classes.
    • Added check for RAPL lock bit when using power controls.
    • Added UNCORE_RATIO_LIMIT MSR support for HSX, BDX, and SKX.
    • Added per-region power to Report.
    • Enabled MSRIOGroup to extend MSRs through JSON file at runtime located in GEOPM_PLUGIN_PATH.
    • Added MSR methods for parsing function and units strings.
    • Introduced FrequencyMapAgent which runs regions at specified frequencies.
    • Added --enable-beta configure flag which installs beta features with make install target.
  • Updated and extended integration tests:
    • Ignore failures for missing python packages.
    • Added feature to save/restore power limit and frequency between each integration test.
  • Updated unit tests:
    • Added more unit tests for Helper.
    • Fixed AgentFactoryTest.
  • Updates to documentation:
    • Added documentation on MPI requirements for geopm_prof_c(3) APIs.
    • Removed references to endpoint in documentation since this is still a beta feature.
    • Added documentation about Agent report/trace extension name conventions.
    • Add man page for geopm_pio_c(3) and geopm_topo_c(3).
    • Add man page for geopm_agent_frequency_map(7).
  • Bug fixes:
    • Fixed EnergyEfficientAgent so it actually functions properly.
    • Fixed issue with using temporary script in launcher to execute lscpu.
    • Fixed missing input parameter checks in PlatformTopo and PlatformIO.
    • Fixed Fortran build and missing dependency that could break parallel builds.

v1.0.0+rc2

  • Fri Feb 22 2019 Christopher M. Cantalupo v1.0.0+rc2
  • Modified implementations and interfaces:
    • Rename GEOPM_PROFILE_TIMEOUT environment variable to GEOPM_TIMEOUT.
    • Modify default behavior when using geopmlaunch: the defaults are now --geopm-ctl=process and --geopm-report=geopm.report.
    • Introduce --geopm-disable-ctl CLI option for geopmlaunch to preserve passthrough behavior.
    • Remove geopm_prof_init() interface from installed header.
    • Fix geopmhash example command line tool.
    • Update plugin loading implementation to use C++.
    • Refactor IOGroup lookup in PlatformIO.
    • Modify analysis power sweep to consider multiple packages.
    • Support lscpu versions that omit 0x from hex values.
    • Do not install Comm.hpp or MPIComm.hpp.
    • Modify time signal to be scoped to the CPU.
    • Rename M_UNITS_HZ to M_UNITS_HERTZ.
    • Add tables module to Python requirements.
    • Change MSR names to match names in Intel (R) Software Developers Manual.
    • Make end bit of MSR bitfield inclusive.
    • Add descriptions for built-in signals and controls.
    • Align launcher names and programmatically generate list of supported launchers.
    • Modified Agent::validate_policy() interface.
    • Add stricter domain checks in TimeIOGroup and CpuinfoIOGroup.
    • Fix configuration and build issues with ompt.
    • Disable python unit testing in RPM check target.
    • Remove uninstalled files from spec file.
  • Updated features:
    • Update tracer to enable user specified column signals to also specify domain.
    • Update reporter to enable user specified signals and domains.
    • Add REGION_HASH and REGION_HINT signals.
    • Remove all references to the region_id from public interfaces.
    • Add domain aggregation for read_signal and write_control.
    • Add TEMPERATURE as default trace column.
    • Add split_string() helper function.
    • Install geopm_hash.h and add man page.
    • Add helper function to replace gethostname().
    • Improve trace column header names for PowerBalancerAgent.
    • Modify how epoch totals are calculated.
  • Updated and extended integration tests:
    • Fix fence-post problem in test_trace_runtimes.
    • Skip EnergyEfficientAgent integration test on non-BDX platforms.
  • Updated unit tests:
    • Fix timing issue with PowerGovernorAgentTest.wait test.
    • Fix geopmagent CLI test.
    • Clean up PlatformIOTest.
    • Update to googletest v1.8.1.
    • Optimize Travis CI build.
  • Updates to documentation:
    • Update man pages to reflect environment extension of report and trace.
    • Update man pages for Agg, CircularBuffer, IOGroup, Exception, Helper, RegionAggregator, SharedMemory, PluginFactory, MSR, MSRIO, and MSRIOGroup classes.
    • Update geopm_region_id_c.3 man page.
    • Update geopm_sched.3.ronn.
    • Clean up geopmlaunch man page.
    • Update man pages for IOGroups.
    • Add tutorial about plugin loading order.
    • Add missing links to geopm(7) man page.
    • Update copyright date to 2019.
    • Use BLURB in geopm.7 man page.
    • Sync spec file for OpenHPC with the one published with OpenHPC.
    • Change die.net links to man7.org.
  • Bug fixes:
    • Fix all timeouts for usages of SharedMemoryUser to reflect geopm_env_profile_timeout().
    • Fix energy status units for DRAM on Haswell and Broadwell.
    • Fix energy reporting on multi-socket systems.
    • Fix issue when application calls MPI_Init_thread() to increase thread level to match GEOPM requirements.
    • Fix broken build when configured with --enable-overhead.
    • Fix issues detected with clang.
    • Fix launcher args for IMPI.
    • Fix throw in Tracer when reading hash and hint which are allowed to be zero.
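
The change in this release that makes the end bit of an MSR bitfield inclusive can be illustrated with a small sketch. The helper name `extract_field` and the sample register value are hypothetical and are not part of any GEOPM interface; the sketch only shows how a field spanning `begin_bit` through `end_bit`, both included, might be decoded from a 64-bit raw value.

```python
def extract_field(raw_value, begin_bit, end_bit):
    """Return the bits of raw_value from begin_bit through end_bit, inclusive.

    Hypothetical helper for illustration; not a GEOPM interface.
    """
    if not 0 <= begin_bit <= end_bit <= 63:
        raise ValueError("expected 0 <= begin_bit <= end_bit <= 63")
    # An inclusive end bit means the field is (end_bit - begin_bit + 1) bits wide.
    width = end_bit - begin_bit + 1
    mask = (1 << width) - 1
    return (raw_value >> begin_bit) & mask


# Made-up 64-bit register value used only for this example.
raw = 0x8000000000001234
low_byte = extract_field(raw, 0, 7)     # bits 0 through 7
top_bit = extract_field(raw, 63, 63)    # a single-bit field
```

Under the inclusive convention a single-bit field has equal begin and end bits, which tends to match how bit ranges are written in register documentation.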

v1.0.0-rc1

5 years ago
  • Release overview:
    • This is the first candidate for the v1.0.0 release of the GEOPM package.
    • Version 1.0 is significant in that semantic versioning (https://semver.org/) is intended for all subsequent releases.
    • The APIs defined by all installed header files and the documented behavior of those interfaces shall remain compatible with linking applications until version 2.0.
    • The documented definition for all built in signals and controls supported by PlatformIO is not intended to change prior to version 2.0.
  • Expected changes prior to v1.0.0 release:
    • The documentation included in this release candidate will be improved upon prior to the actual v1.0.0 release.
    • Man pages which currently link to doxygen will be filled in.
    • The definition of the high order bits in the REGION_ID# signal supported by PlatformIO may be changed in the way documented in the PlatformIO(3) man page to split into two signals (REGION_ID and REGION_HINT).
    • It is possible that interface classes currently prefixed with "I" may be renamed to exclude the "I" (e.g. IPlatformIO -> PlatformIO).
    • In this case the concrete implementation would be appended with "Imp" (e.g. PlatformIO -> PlatformIOImp).
    • The appearance of the epoch signal in the REGION_ID column of the trace will be removed.
    • The EPOCH_COUNT signal will be added to the default set of traced signals to enable tracking of epoch calls.
  • High level summary of changes since v0.6.1:
    • With this release we have removed all references to the Policy, Decider, Platform and PlatformImp objects.
    • These have been replaced by the PlatformIO / IOGroup / Agent class interactions.
    • The Kontroller object which was supporting the new code path has been renamed Controller.
    • The legacy Controller implementation has been removed.
    • GEOPM no longer depends on the hwloc library and instead relies on running lscpu on the compute node.
  • Modified implementations and interfaces:
    • Rename launcher to geopmlaunch.
    • Do not install geopmanalysis and geopmplotter command line utilities.
    • The command line interfaces for these tools will be changing.
    • Once they are committed, we will begin installing them again.
    • Remove unused error codes from geopm_error.h.
    • Remove some deprecated interfaces and files.
    • Remove legacy artifacts from Reporter and Tracer.
    • Remove legacy structures from geopm_message.h.
    • Remove deprecated API headers.
    • Remove CtlConf Python object.
    • Remove region ID memory from derivative for power signals; this is a feature for an agent to implement.
    • Remove unused arguments from geopmctl_main.
    • Remove push_combined_signal() from PlatformIO interface.
    • Remove NAN check for policy in Controller. Agents are responsible for handling NAN.
    • Remove IPlatformTopo::define_cpu_group(). This method is not implemented and not used.
    • Remove MPI bit from region ID in report.
    • Remove install of geopm_message.h and geopm_plugin.h.
    • Remove environment variables for min/max frequency used by EnergyEfficientAgent: this functionality is provided through the policy as documented.
    • Fixes for online mode of EnergyEfficientAgent: ignore 0.0 when sampling runtime, fix min/max frequency range in analysis.py, fix final requested frequency printed in report.
    • EnergyEfficientAgent no longer considers DRAM energy in its optimization.
    • Change default frequency for hints from min to max in EnergyEfficientAgent.
    • Implement EnergyEfficientAgent analysis using hints only.
    • Change meaning of EPOCH_RUNTIME signal: MPI and ignore time are reported explicitly and separately.
    • Install many C++ headers into /usr/include/geopm.
    • Move geopmbench source files from tutorial directory into src.
    • Don't copy any files from src into tutorials.
    • Update tutorials to use Agent code path.
    • Throw if multiple hints given to geopm_prof_region.
    • Allow writing controls for containing domains: the same value will be written to every subdomain.
    • Update EpochRuntimeRegulator accounting: PKG and DRAM energy dissociated from rank.
    • Updated to report pre-epoch MPI and ignore runtime.
    • Make TreeComm fan out configurable with environment variable.
    • Per thread progress is supported by the 'REGION_THREAD_PROGRESS' signal.
    • Align command line options to the launcher and the environment variables used by the controller.
    • Merge tutorial Makefiles into one and remove duplicate scripts.
    • Rename runtime related APIs.
    • Merge ProfileIO into ProfileIOSample.
    • Refactor analysis.py command line parsing to use argparse, etc.
    • Move some header includes from headers into source files when possible.
    • Change "POWER_PACKAGE" control name to "POWER_PACKAGE_LIMIT".
    • Expose MSR PKG_POWER_LIMIT fields as signals.
    • Reorder directory search in plugin load: load plugins from right to left so that the leftmost plugin wins when IOGroups provide the same name for controls and signals.
    • Use accumulator member in EpochRuntimeRegulator for MPI runtime.
    • Changes to the launcher for mpiexec when using hydra.
    • Move set_policy_defaults to Agent interface.
    • Aggregation functions have been moved out of PlatformIO and into their own class: Agg.
    • Implement agg_function for IOGroups, including tutorial.
    • Do not stop integration test in looper if one test fails.
    • Increase shmem table size to 2MB per rank to reduce risk of overflow.
    • Remove hash table structure in ProfileTable; all regions now use the same table entry.
    • Change CpuinfoIOGroup to throw in constructor if cpuinfo could not be parsed.
    • In python analysis do not parse traces if total size is more than half of memory.
    • Remove redundant HDF5 cache from analysis.py.
    • Remove TURBO_RATIO_LIMIT2 control for platforms where it is not in whitelist.
    • Read multiple samples for a short time in geopmread to support POWER signals.
    • Narrow scope of warning message about cpufreq governor: only print warning when an attempt is made to write to a control that begins with POWER or FREQUENCY.
    • Prevent MSRIOGroup from throwing when saving MSRs.
    • Implement and use AgentConf in python code to create agent policies.
  • Updated features:
    • Add timestamp counter to available signals.
    • Add --info option to geopmread and geopmwrite.
    • Add check for invalid GEOPM_CTL values.
    • Add temperature signals.
    • Add Imbalancer interface to libgeopm and libgeopmpolicy: Imbalancer_*() -> geopm_imbalancer_*().
    • Add some placeholder descriptions to MSRIOGroup and TimeIOGroup to support integration tests.
    • Add methods to RegionAggregator to get region IDs and signals.
    • Add methods to PlatformIO to provide signal/control descriptions: this will be used to augment geopmread/write with descriptions.
    • Add description APIs for IOGroup: allows IOGroups to provide a user-friendly description of signals/controls.
    • Add GEOPM_TIME_REF constant for use with geopm_time_*() APIs.
    • Add INSTRUCTIONS_RETIRED alias signal.
    • Add TIMESTAMP_COUNTER alias for MSRIOGroup.
    • Add signal to enable reading of the RAPL lock bit.
    • Add PKG_POWER_LIMIT MSR fields as a signal.
    • Add expect_same aggregation function that returns NAN if any elements of the vector differ.
    • Add average node frequency to EnergyEfficientAgent tree samples.
    • Add support for POWER_* as signals that give meaningful results without runtime.
    • Add module conflict of darshan to theta module file.
    • Add psutils python dependency.
    • Add warnings for system misconfiguration.
    • Add read_file() to Helper.hpp.
    • Add job start in Trace and Report headers.
    • Add outlier detector script.
    • Add handling of NAN for default policy values to all agents.
    • Add parsing for overhead fields to io.py.
    • Add reading of the thread table through PlatformIO.
  • Updated and extended integration tests:
    • Ignore misconfigured system warnings in integration test.
    • Remove ignore of multiple plugin load warnings that stopped occurring after removal of legacy code.
    • Do not test epoch runtime in test_region_runtimes.
    • Add all2all to power_balancer integration test.
    • Adjust power_balancer test logic to compare Governor and Balancer relatively.
    • Fix EnergyEfficientAgent integration test.
    • Test decorators implemented to use launcher. This forces the checks to be run on the compute nodes.
    • Update integration tests to reflect removal of legacy code path.
    • Update test_power_consumption to use PowerGovernor.
    • Fix integration test to exclude MPI and model-init regions from tests using traces.
    • Fix integration test to use assertNear to account for new MPI region markup.
    • Move GEOPM_EXEC_WRAPPER functionality into integration test.
  • Updated unit tests:
    • Add tests of domain aggregation for pushed signals.
    • Add test for geopmread signal aggregation.
    • Stop the unit tests from littering files.
    • Fixed signed / unsigned comparison issue in PlatformIO test.
    • Update unit tests to reflect removal of legacy code path.
    • Add test of IOGroup factory that checks that an IOGroup's list of signal/control names are all valid.
  • Updates to documentation:
    • Update GEOPM main README.
    • Add doxygen target for public interface files.
    • Add man pages for all C++ headers that are now installed to support plugin development.
    • Full man pages have been added for PluginFactory, PlatformIO, PlatformTopo, Agent, and IOGroup.
    • Add documentation about aliasing signals and controls.
    • Update launcher ronn to include references to env vars.
    • Add README for outlier_detection.
    • Update the tutorial README.md to reference geopmbench and point out the agent and iogroup subdirectories.
    • Document how to build GEOPM with Intel Toolchain.
    • Fix example source code in geopm_prof_c.3 man page.
    • Add man pages for geopm_time.h and geopm_imbalancer.h.
    • Update Doxygen to reflect removal of legacy code path.
    • Remove alpha and beta labels from documentation.
  • Bug fixes:
    • Fix how starting energy counters are recorded in EpochRuntimeRegulator.
    • Fix timestamp issue with Tracer.
    • Fix region handling in Reporter hints.
    • Fix OMPT enabled pthread launch with Controller/Agent.
    • Fix for invalid function for some MSR signals.
    • Fix for EnergyEfficientAgent policy: initialize min and max frequency to NAN.
    • Fix EnergyEfficientAgent offline analysis parsing.
    • Fix geopmbench stream benchmark which was using too little memory.
    • Fix python tests to print better warnings and avoid print command.
    • Fix for MPI region entry: MPI regions used in GEOPM startup were given a region ID of 0.
    • Fix initialization of per rank ignore and MPI runtime.
    • Fix default policy generated by geopmagent to properly represent NAN.
    • Fix reporting of MPI and ignore runtime prior to first epoch for report totals.
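
The expect_same aggregation function added in this release returns NAN if any elements of the vector differ. A minimal Python sketch of that behavior follows. GEOPM provides aggregation through its C++ Agg class, so the function below is an illustration of the rule rather than the library's code, and returning NAN for an empty input is an assumption.

```python
import math


def expect_same(values):
    """Return the common value of values, or NAN if any elements differ.

    Illustrative sketch only; returning NAN for an empty sequence is an
    assumption, not documented behavior.
    """
    values = list(values)
    if not values:
        return math.nan
    first = values[0]
    # NAN never compares equal to anything, so any NAN element yields NAN.
    if all(value == first for value in values):
        return first
    return math.nan


same = expect_same([2.1e9, 2.1e9, 2.1e9])   # all elements agree
mixed = expect_same([2.1e9, 1.8e9])         # elements differ
```

An aggregation of this kind suits signals that are expected to be uniform across a domain, since a NAN result flags a mismatch instead of silently averaging it away.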

v0.6.1

5 years ago
  • Hotfix for v0.6.0 release.
    • Fix MPI functions called during startup getting assigned region 0.
    • Fix missing profiling of some MPI functions when called from Fortran.
    • Fix performance regression due to attempt to profile non-blocking MPI calls.
    • Fix to remove unsupported MSR from Skylake platform definition (TURBO_RATIO_LIMIT2).
    • Fix to prevent throw when trying to save/restore MSRs that are not supported on the system.

v0.6.0

5 years ago
  • Stabilized Agent code path.
  • Last release with Decider/Platform/PlatformImp support.
  • Modified implementations and interfaces:
    • Modify PowerGovernor to ignore DRAM power and tune parameters for power balancer.
    • Profile larger set of MPI functions including non-blocking routines.
    • Removed push_region_signal_total() and sample_region_total() from PlatformIO.
    • This functionality is available to Agents by creating an instance of RegionAggregator.
    • Redesigned geopmanalysis command line interface so that the first argument selects the analysis type.
    • Add options to geopmanalysis for min and max frequency for frequency sweep analysis types.
    • Remove geopmanalysis --level option and replace with --summary and --plot.
    • This allows summaries and/or plots to be generated separately.
    • Add option to use agent code path to geopmanalysis (use_agent).
    • Change EnergyEfficientAgent frequency map to use JSON format.
    • Introducing GEOPM_EXEC_WRAPPER environment variable useful for inserting a debugger into the integration tests.
    • Reuse same idx val for repeated pushes of signals/controls.
    • Cat lscpu output to /tmp prior to running job and avoid popen call inside of MPI app.
    • Change PowerGovernorAgent::wait() to use time instead of RAPL updates.
    • Get rid of C-string from ProfileTable implementation.
    • Add max_level() to TreeComm.
    • Introducing the PowerGovernor class.
    • Introducing Agent::aggregate_sample() static helper function for Agents.
    • Add agent field to io.py dataframe index. Note: this will break compatibility with scripts that use the old index.
    • Rename RAPL related MSR names: SOFT_POWER_LIMIT to PL1_POWER_LIMIT and HARD_POWER_LIMIT to PL2_POWER_LIMIT.
    • Add geopm_time_since() method.
    • Update the analysis.py energy references.
    • Add RegionAggregator class for per-region signal totals.
    • Update Reporter to use RegionAggregator.
    • Changed region counts to start at -1 before first entry.
    • Get rid of unused and undocumented environment variable GEOPM_REPORT_VERBOSITY.
    • Modify launcher to set LD_PRELOAD only for application.
    • Change some AppOutput methods to return pandas DataFrames instead of Report/Region objects.
    • Add barrier in MPI_Init prior to GEOPM startup.
    • Have RootRole throw if bad power cap is set.
  • Updated features:
    • Introducing the new PowerBalancer agent with many commits since v0.5.1 that tweak the algorithm.
    • Ignore epoch calls when made inside of a region marked with the ignore hint.
    • Add MSRIOGroup signals that return the raw value of an MSR.
    • Use slurm option to select the performance power governor when using GEOPM.
    • Add a spec file for building GEOPM for ALCF Theta.
    • Add profile name and agent to trace header.
    • Add CYCLES_THREAD and CYCLES_REFERENCE to trace.
    • Add Agent support in python scripts.
    • Add CORAL 2 version of AMG to examples.
    • Update markup for miniFE example to set region ID once per region.
    • Update nekbone patches for scaling studies.
    • Suppress OMP warnings in launcher when using Intel toolchain.
    • Add PowerSweepAnalysis type to geopmanalysis.
    • Add BalancerAnalysis type to geopmanalysis.
    • Add NodeEfficiencyAnalysis type to geopmanalysis.
    • Add NodePowerAnalysis type to geopmanalysis.
    • Introduce a plotter method to generate histograms.
    • Have ManagerIO skip policy file parsing if agent has no policies.
    • Add HDF5 caching for parsed reports and traces to io.py.
    • Add summary features to analysis where summarized data is written to files in ascii tables.
  • Updated and extended integration tests:
    • Updates to integration tests to support the Agent / PlatformIO code path are a major feature of this release.
    • Adding back integration test for power balancer with increased time limit.
    • Automatically infer architecture based on hostname.
    • Add monitor as available agent to run integration tests.
    • Use regular runtime for epoch in test_region_runtimes.
    • Require balancer test to run in an allocation.
    • Check that the average power limit across nodes is under the cap in test_power_balancer.
    • Add integration test that runs GEOPM, but does not generate reports.
  • Updates to documentation:
    • Add documentation to the README about the scaling_governor.
    • Add documentation of constructor attribute for plugins to geopm(7) man page.
    • Add documentation for hint ignore interaction with geopm_prof_epoch().
    • Add documentation for all of the supported region hints.
    • Remove documentation about node barrier enforced by epoch call, this is no longer true.
    • Remove reference to MPIEXEC from spec file.
    • Add missing launcher options to help text.
  • Updated unit tests:
    • Add PowerBalancer unit tests.
    • Add PowerBalancerAgent unit tests.
    • Add analysis.py unit tests.
    • Add more detailed checks of TreeComm calls to KontrollerTest.
    • Add tests of geopmanalysis CLI.
    • Fix tests for ControlMessage.
  • Bug fixes:
    • Fix catch-value warning from GCC 8.
    • Fix possible C string truncation.
    • Fix for null characters sometimes appearing in report header.
    • Fix string sizing for strncpy and snprintf for gnu8.
    • Fix null termination in case of string overflow.
    • Fix in PowerGovernorAgent where fan_in could be accessed out of bounds.
    • Fix Kontroller index into Agent array; the level 0 Agent should not do descend() or ascend().
    • Fix issue where second region runtime is longer than first: move region exit barrier after call to sample.
    • Fix geopmagent so it can create empty json files.
    • Fix launcher to handle --cpu-bind as well as --cpu_bind.
    • Fix failure to restore fixed counter MSRs at end of GEOPM runtime.
    • Fix epoch region ID detection in io.py.
    • Fix for test_trace_runtimes with agent code path.
    • Fix performance issue: if power will be controlled, adjust one CPU per package.
    • Fix EnergyEfficientAgent init().
    • Fix issue where geopm would try to restore MSR MISC_ENABLE which is read only.
    • Fix test_power_consumption to measure socket power only.
    • Fix order of MSR save / agent init() to avoid failure to restore time window setting.
    • Fix --enable-overhead configure option.
    • Fix pthread launch for Agent code path.
    • Fix Fortran comm initialization.
    • Fix handling of bad OMP masks.
    • Fix for klocwork error: missing null check.
    • Fix pthread launch when using MPICH by enabling MPI_THREAD_MULTIPLE in environment.
    • Fix pthread launch issue in Cray Linux by using secure versions of the CPU_SET macros.
    • Fix hang when runtime is active but report has not been requested.
    • Fix python scripts to support old data missing separate dram energy in report.
    • Fix python scripts to handle new agent field in parsed header.
    • Fix race in ControlMessage that could cause hang at GEOPM runtime start up.
    • Fix for ompt region names in Reporter.
    • Fix issue where slack was calculated prior to adding in extra power in PowerBalancerAgent.

v0.5.1

5 years ago
  • Introduce the PowerGovernorAgent. This agent is implemented and fully featured.
  • Restoring the MSR values at the end of a run is now best effort since the system whitelist may prevent the write from being allowed.
  • Allow min/max frequencies to be specified in the EnergyEfficientAgent's policy.
  • Fix geopmread usages for tutorial.
  • Fix MSR overflow logic, performance counter initialization, and MSR encode/decode functions.
  • Fix integration tests for geopmwrite use cases.

v0.5.0

5 years ago
  • Community updates:

  • Modified implementations and interfaces:

    • Major refactor of the controller and plugin architecture is provided as an optional new code path.
    • Most of the changes made to the implementation for this release modify the new code path.
    • The old code path is still available for users as long as the controller is run without the GEOPM_AGENT environment variable set.
    • The new code path will be active if the user selects an agent by name with the GEOPM_AGENT environment variable when launching the controller.
    • The old code path is maintained in the current Controller object along with the Decider / Platform / PlatformImp plugins.
    • The new code path is maintained in a replacement for the Controller which has been temporarily named the Kontroller.
    • The Kontroller will be renamed the Controller after this release, and the old code path will no longer be available.
    • Similar to the Kontroller/Controller replacement, the KprofileIOGroup, KprofileIOSample, and KruntimeRegulator are temporary replacements for their non-K counterparts and will be renamed.
    • The beta release enables a new set of plugin interfaces named the IOGroup, Agent, and Comm.
    • It is through the IOGroup, Agent and Comm plugins that the GEOPM runtime can be extended.
    • The Decider / Platform / PlatformImp plugin extensions are deprecated and will be removed after this release.
    • The IOGroup plugin enables a user to add new signal and control mechanisms for an Agent to read and write.
    • The Agent plugin enables a user to add new monitor and control algorithms to the GEOPM runtime.
    • MPI use by the GEOPM runtime which is not linked by the application has been completely encapsulated in the Comm object.
    • The tutorial has been extended with two new directories: tutorial/agent and tutorial/iogroup.
    • The tutorial/iogroup directory documents how to write an IOGroup plugin.
    • The tutorial/agent directory documents how to write an Agent plugin.
    • The interface to the resource manager has been made much more flexible for supporting the new Agent interfaces.
    • The resource manager interface is documented in the geopm_agent_c(3) and geopm_endpoint_c(3) man pages.
    • Additionally command line tools have been proposed and partially implemented to support the interfaces documented in those man pages.
    • The geopm_agent_c(3) APIs and geopmagent(1) CLI have software support.
    • The endpoint interfaces are a work in progress that has not yet been integrated into the mainline source.
    • The PlatformIO object provides the interface to the IOGroups.
    • The PlatformIO C++ object will soon have an associated C interface documented as geopm_platformio_c(3).
    • The geopmread and geopmwrite tools provide a CLI to the PlatformIO features.
    • Introducing the MSRIOGroup which provides an implementation of the IOGroup for MSRs.
    • Introducing the TimeIOGroup which provides an IOGroup for the time signal.
    • Introducing the CpuinfoIOGroup which provides data from /proc/cpuinfo as signals.
    • Introducing the ProfileIOGroup which provides profile data collected from the main compute application through the geopm_prof_c(3) APIs.
    • The release includes three new installed binaries: geopmread, geopmwrite, and geopmagent.
    • Each of these command line interfaces is documented with a man page and there is a man page for a future command line tool called geopmendpoint.
    • Deprecated the geopm_policy_*() interfaces, which have been replaced with the geopm_agent_*() and geopm_endpoint_*() APIs.
    • Introducing the first three Agent implementations: MonitorAgent, PowerBalancerAgent, and EnergyEfficientAgent.
    • Introducing PlatformTopo, replacement for PlatformTopology.
    • Introducing DefaultProfile singleton which supports geopm_prof_c(3) APIs for profiling.
    • Added documentation for monitor, energy_efficient, and power_balancer Agents, but the implementation is not currently aligned.
    • The monitor agent is implemented and fully featured.
    • The energy_efficient agent will soon be extended to match the man page, and currently use of the network is not enabled.
    • The existing implementation of the energy_efficient agent does currently provide similar functionality to the efficient_freq Decider.
    • The power_balancer agent is a work in progress that is not well aligned with the man page, but will be feature complete soon.
    • Reports and traces generated by Agent code path are designed to be backward compatible with reports and traces generated with the Decider code path.
    • New environment variables documented in geopm(7): GEOPM_ENDPOINT, GEOPM_AGENT, GEOPM_TRACE_SIGNALS, and GEOPM_DISABLE_HYPERTHREADS.
    • Remove GEOPM_ERROR_AFFINITY_IGNORE environment variable, no longer required for testing.
    • New plugin registration mechanism has been put in place and new factory has been implemented.
    • Replace independent factories with single templated class the PluginFactory.
    • No longer register a plugin using a half instantiated object.
    • Removed call to dlsym, and plugins now use __attribute__((constructor)) to specify a callback target used when the plugin is loaded.
    • In this callback the plugin should register with its respective factory.
    • Each plugin type has a make_plugin() static method that creates the plugin object and returns a pointer to the base class.
    • The make_plugin() function pointer is what is registered with the factory.
    • Extend the PluginFactory to require the registration of a dictionary (map<string,string>) to enable queries of plugin capabilities.
    • Use stricter criterion for selecting plugin files to load, name must be of the form libgeopmpi*.so.0.0.0 where 0.0.0 is the GEOPM ABI version.
    • Moved geopm_plugin_description_s definition to geopm.h.
    • Add a configure option to enable use of the msr-safe ioctl interface for writing with PlatformIO.
    • The msr-safe ioctl interface should not be used for writing unless the system has an msr-safe installation that has fixed https://github.com/LLNL/msr-safe/issues/38.
    • Added APIs for manipulating hint bits in region id hash.
    • Many changes were made to modernize the use of C++.
    • Change protected members of all classes to private where possible.
    • Replace all raw pointer usage with C++11 smart pointers if possible.
    • Use default keyword for constructors and destructors where appropriate.
    • Use the delete keyword rather than a throw to disable copy constructors.
    • Add override keyword to derived classes.
    • Use forward declaration of classes rather than include one header inside of another.
    • Add and integrate make_unique implementation for C++11.
    • Confirmed const correctness for all class methods.
    • Add public interface to register IOGroups with PlatformIO which enables IOGroups to be created at runtime.
    • Standardize the IOGroup signal and control names so that they are prefixed by the IOGroup name and two colons.
    • Agents should generally use high level aliases rather than these low level signals and controls.
    • Introduce functions for converting between signals and bit-fields to allow for PlatformIO to provide full 64 bit integer signals like the region ID.
    • Add overflow function type to MSR class.
    • Change frequency APIs to use Hz to enforce uniform use of SI units.
    • Use instruction offset in OMPT derived region name; this resolves a name ambiguity when more than one OpenMP region is discovered within the same function.
    • Use gmock archive uploaded to the geopm organization on github.
    • PlatformTopo is built on top of lscpu and does not require hwloc.
    • Throw on GlobalPolicy misconfiguration earlier in the runtime execution.
    • Rename SimpleFreqDecider to EfficientFreqDecider which will be replaced by EnergyEfficientAgent.
    • Update the environment variables related to the efficient Decider and Agent to match the name changes above.
    • The json-c library is no longer a dependency; all references have been removed.
    • Now using the json11 library which is distributed in the "contrib" sub-directory.
  • Updated features:

    • Enable Agent to augment report and trace.
    • Enable user to augment trace through environment variable GEOPM_TRACE_SIGNALS in new code path.
    • Changes to PlatformIO to support non-CPU domains.
    • Added MSR save/restore functionality to PlatformIO save/reset interfaces.
    • Allow loading PlatformIO when some IOGroups fail to load.
    • Add aggregation functions to PlatformIO to encode how to combine signals.
    • Add PlatformTopo methods for converting domain to string and vice-versa.
    • Add signal_names() and control_names() to PlatformIO and IOGroup.
    • Add Skylake server (SKX) as a supported platform.
    • Add Haswell and SandyBridge MSRs to PlatformIO interface.
    • OMPT report region names include instruction offset, now two OpenMP regions within the same function can be distinguished.
    • Add region runtime as default trace column.
    • Simpler column names in trace; print some columns using old names.
    • Change region ID to hex in report and trace.
    • Order regions in report by runtime.
    • Add application total ignore time to report.
    • Replace tabs with spaces for report formatting.
    • Enable PlatformIO to support Epoch based signals.
    • Add power signals to PlatformIO using derivative calculation previously done in Region object.
    • Add PlatformIO aliases for region ID, progress, frequency and energy.
    • Add CombinedSignal class which is used to combine signals from different IOGroups.
    • Allow the user to specify the number of experiment iterations (loops) to perform for each geopmanalysis type.
    • Enable geopmanalysis to provide more detailed information about the results.
    • Allow turbo to be skipped by geopmanalysis when determining the best per-region frequencies.
    • Updates to the geopmanalysis Python script to bypass trace parsing if requested and, in the debug plot, to ignore the check for multiple profile names.
    • Use hyphen instead of underscore in geopmanalysis options for consistency with other interfaces.
    • Don't require -n and -N with geopmanalysis when skipping launch.
    • Pass output_dir through to plotter when using geopmanalysis.
    • Changes to analysis.py for SC17 data: multiply energy percent by 100, have frequency sweep plots use frequencies from profile name.
    • Add geopmanalysis option to specify controller launch method.
  • Updated and extended integration tests:

    • Integration tests validated with the GEOPM_AGENT set to test new code path.
    • A few problems with the new code path exposed by integration tests have been added to github issues.
    • A few changes to support integration tests with new code path have been integrated.
    • Change io.py and integration tests: Allow hex numbers for region ID in report, skip extra lines in report.
    • Remove Platform plugin registration.
    • Update EfficientFreqDecider to use new runtime metric for performance.
    • Update EfficientFreqDecider to use PlatformIO directly and remove method from Policy object for adjusting frequency.
  • Updated unit tests:

    • Many unit tests have been added to accompany the new code path which has many new classes.
    • The new classes were specifically designed to enable unit testing of the poorly covered code that they refactor.
    • Refactor Profile constructor into testable functions.
    • Add unit tests for Profile class.
    • Simple profile class in test directory for testing and debug: enables profiling of the GEOPM runtime itself.
    • More detailed checks of messages in unit tests when exceptions are thrown.
    • Fix test-license to assert that files in MANIFEST.EXEMPT exist.
    • Remove TestPlugin code that is not used by tests.
    • Add make check target to tutorial build.
  • Bug fixes:

    • Update GEOPM runtime C APIs to print to standard error instead of having the controller suppress error messages.
    • Handle exceptions that occur during app/controller handshake.
    • Enable timeout rather than hang if the Controller or the application fails during execution.
    • Fix for package-scoped MSRs that will write to all CPUs in a package rather than just one.
    • Fix HSX and SKX frequency control MSRs to use the core domain.
    • Fix issue when running on systems with offline CPUs.
    • Do not report a completed send if policy or sample contains a NAN.
    • Fix lscpu parsing for offline CPUs.
    • Exclude regions with 0 count from report, except unmarked region, which is always 0.
    • Add verbose error message when PluginFactory::dictionary() is called with plugin name that has not been registered.
    • Fix get_alloc_nodes for slurm in geopmpy launcher.
    • Fix test_power_consumption to check the current platform cpuid when deciding the power budget.
    • Fix geopmpy.launcher for Intel's mpiexec: does not accept -- as a separator for positional arguments.
    • Fix for when GEOPM_PLUGIN_PATH contains multiple paths.
    • Fix tutorial tarball so that it will build out of place.
    • Fix shared memory issues during start-up when launching the Controller as a separate application.
    • Remove erroneous double split of the Controller's comm; the ppn1 comm is already passed into the constructor.
    • Fix test to use in-memory file system to avoid adding missing msync() calls.
    • Fix resource leak in TreeCommunicator constructor.
    • Fix tracing capability with geopmanalysis.
    • Leave -- separator in list of arguments to avoid parsing command line arguments intended for application as launcher arguments.
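The plugin registration mechanism described in the notes above (a load-time callback marked with __attribute__((constructor)), a static make_plugin() method, and a capability dictionary registered with a single templated factory) might look roughly like the following. This is a minimal sketch, not the GEOPM source: the Base class, ExamplePlugin, base_factory(), the register_plugin() signature, and the "description" dictionary key are all assumptions made for illustration. It also assumes a GCC or Clang compatible compiler for the constructor attribute.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical plugin base class standing in for an interface such as Agent.
class Base
{
    public:
        virtual ~Base() = default;
        virtual std::string name(void) const = 0;
};

// Illustrative templated factory; the real PluginFactory may differ.
template <class T>
class PluginFactory
{
    public:
        using make_func = std::function<std::unique_ptr<T>()>;
        using dict_type = std::map<std::string, std::string>;

        // Store the creation function and the capability dictionary by name.
        void register_plugin(const std::string &name, make_func make,
                             const dict_type &dict)
        {
            m_make[name] = make;
            m_dict[name] = dict;
        }
        // Create a plugin object; no object exists before this call.
        std::unique_ptr<T> make_plugin(const std::string &name) const
        {
            auto it = m_make.find(name);
            if (it == m_make.end()) {
                throw std::invalid_argument("plugin not registered: " + name);
            }
            return it->second();
        }
        // Query plugin capabilities without creating the plugin.
        const dict_type &dictionary(const std::string &name) const
        {
            auto it = m_dict.find(name);
            if (it == m_dict.end()) {
                throw std::invalid_argument("plugin not registered: " + name);
            }
            return it->second;
        }
    private:
        std::map<std::string, make_func> m_make;
        std::map<std::string, dict_type> m_dict;
};

// A function-local static is constructed on first use, so the factory is
// ready even when called from a load-time constructor callback.
PluginFactory<Base> &base_factory(void)
{
    static PluginFactory<Base> instance;
    return instance;
}

// Hypothetical plugin implementation.
class ExamplePlugin : public Base
{
    public:
        std::string name(void) const override
        {
            return "example";
        }
        // Static method that creates the plugin and returns a base pointer.
        static std::unique_ptr<Base> make_plugin(void)
        {
            return std::unique_ptr<Base>(new ExamplePlugin);
        }
};

// Runs when the shared object (or executable) is loaded, before main().
__attribute__((constructor)) static void example_plugin_load(void)
{
    base_factory().register_plugin("example", ExamplePlugin::make_plugin,
                                   {{"description", "illustrative plugin"}});
}
```

Because only the make_plugin() function pointer and the dictionary are registered, the factory never holds a partially constructed plugin object, and capabilities can be queried through dictionary() before any plugin is created.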

v0.4.0

6 years ago
  • Modified implementations and interfaces:
    • Updated algorithm for choosing CPU affinity in the launcher: fill application CPUs from back to front, and never share physical cores between MPI ranks.
    • Created new abstraction for interfacing with MSRs and more broadly for abstracting hardware IO (PlatformIO, MSRIO, and MSR classes).
    • Application region hints are now properly exposed to the decider.
    • Added geopmanalysis executable to the geopmpy package; this executable runs applications and performs analysis of power and performance based on GEOPM report and trace data.
    • Added geopmbench to the installed binaries; this is simply an installed version of the tutorial_6 executable.
    • Added GEOPM_RM environment variable and --geopm-rm command line option to select geopmpy.launcher's back end resource manager.
    • Updated man pages to include geopmanalysis and geopmbench.
    • Removed handling of SIGCHLD signal in GEOPM runtime (commonly raised in non-error conditions when using popen(3)).
    • The launcher will guess the correct number of OpenMP threads if the user has not specified it.
    • Added warning message at start up if report and trace files will not be created due to permissions issues.
    • Added better error handling to tutorial sources.
    • Added support for geopmctl to be run as a different user than application.
    • Added support for user provided shmkeys that do not begin with '/'.
    • Added error checking in the launcher for when the user requests more ranks per node than there are cores per node.
    • Added more robust error checking for command line issues in launcher.
    • Added command line option to launcher to exclude use of hyperthreads: --geopm-disable-hyperthreads.
    • If a plugin fails at registration time, do not bring down the controller; a warning is printed if debug is enabled.
    • Remove -s parameter from geopmctl CLI (was being ignored).
    • Encapsulated use of MPI by GEOPM inside of a class abstraction (IComm), but controller has not been modified to use the new class due to deadlock bug.
    • Encapsulated in a class the handshake interface between the controller and the application across shared memory.
    • General clean up of the geopmpy.plotter implementation.
    • Added more error checking in Controller.
    • Some fixes for issues exposed by static analysis.
  • Updated features:
    • Added new decider called "simple_freq" that adjusts CPU frequency to save energy with a small impact to performance; name will likely change to "efficient_freq" in the future.
    • Added region runtime reporting to traces and Region objects based on the average execution time of a region by all of the ranks on a node.
    • Added a method to the Region object to give access to the telemetry time stamps to the decider.
    • Added online learning approach to energy efficient frequency decider.
    • Added support to geopmpy.launcher for launching with Intel(R) MPI's mpiexec.
    • Added option to plotter to use all samples or just epoch samples.
    • Modified the tutorials to enable use of the geopmpy launcher.
    • Improved tutorial Makefile to allow user override of GNU Make standard variables.
    • Added an RPM spec file for use with the OpenHPC distribution.
  • Updated and extended integration tests:
    • Moved Controller death test from the unit tests to the integration tests.
    • Added integration tests for pthread and application launch of the controller.
    • Added an isolated hardware test for RAPL power limit functionality.
    • Updated documentation: both man pages and doxygen have been reviewed and cleaned up.
  • Updated unit tests:
    • Added unit test for SubsetOptionParser.
    • Reduced dependence of unit tests on MPI runtime.
    • Removed MPIProfileTest unit test which is covered by integration tests, and not really a unit test.
    • Removed unused MPIControllerTest.
    • Removed MVAPICH2 Fortran tests.
  • Bug fixes:
    • Fixed broken build in tutorials (tutorial_region.c).
    • Fixed faulty argument parsing by the geopmpy launcher.
    • Fixed error reporting when using geopmpy with python 3.x.
    • Fixed issues with affinity when launching the controller as a pthread.
    • Fixed issue in passing power budgets down a multi-level tree.
    • Fixed issue in platform choice when head node architecture differs from the compute nodes.
    • Fixed broken build if --disable-doc configuration option is passed.
    • Fixed decider setup code to correctly propagate power bounds down tree.
    • Fixed the way RAPL time window is set.
    • Fixed the use of cached data by geopmpy.plotter.
    • Fixed integration test issues related to systems with multiple cluster node partitions.
    • Fixed process CPU affinity implementation (don't use hwloc) and added unit tests for this.
    • Fixed potential overflow issue with error messages in PlatformImp.cpp.
    • Fixed race in SharedMemory test.
    • Fixed markup patch for MiniFE.
    • Fixed launcher when user explicitly requests OMP_NUM_THREADS=1.
    • Fixed MPIInterfaceTests so it uses only mocked MPI interfaces, and does not explicitly require MPI.
    • Fixed memory leaks in GlobalPolicy.
    • Fixed linking order of libgeopm and libmpi.
    • Fixed non-performance mode integration test launcher.
    • Fixed issue where libgeopmpolicy had false dependence on OMPT.cpp.
    • Fixed rpm Makefile target to avoid the rpmbuild -t option so that it does not try to use the OpenHPC spec file.
    • Fixed issue where platform topology could be determined from nodes other than the ones that run the job.
    • Fixed Intel(R) MPI launcher's use of host files and the --ppn CLI.
    • Fixed incompatibility between MVAPICH2 affinity and srun affinity.
    • Fixed test_progress_exit integration test to account for extrapolation error.
    • Fixed integration test for MPI time accounting.
    • Fixed launcher problem when node is listed in multiple queues by sinfo.
    • Fixed and improved affinity assignment in corner cases.
    • Fixed use of sched_getcpu() for Mac OS X.
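The v0.4.0 launcher change that fills application CPUs from back to front without sharing physical cores between MPI ranks can be illustrated with a small sketch. This is not the geopmpy.launcher implementation: the function name assign_cores(), its parameters, and the simplification that each rank receives whole physical cores identified by a zero-based core index are assumptions made for illustration. The error path mirrors the check, also noted above, for a request that exceeds the cores available on a node.

```cpp
#include <stdexcept>
#include <vector>

// Hypothetical helper: give each rank its own physical cores, starting at
// the highest core index so that the lowest cores are the ones left unused.
std::vector<std::vector<int> > assign_cores(int num_core, int num_rank,
                                            int core_per_rank)
{
    if (num_core <= 0 || num_rank <= 0 || core_per_rank <= 0) {
        throw std::invalid_argument("all arguments must be positive");
    }
    if (num_rank * core_per_rank > num_core) {
        throw std::runtime_error("request exceeds physical cores on the node");
    }
    std::vector<std::vector<int> > result(num_rank);
    int next_core = num_core - 1;  // fill from the back
    for (int rank = 0; rank != num_rank; ++rank) {
        for (int idx = 0; idx != core_per_rank; ++idx) {
            // Each core index is handed out once, so no two ranks share one.
            result[rank].push_back(next_core);
            --next_core;
        }
    }
    return result;
}
```

Under these assumptions, a node with 8 cores running 2 ranks with 3 cores each would place rank 0 on cores 7, 6, and 5 and rank 1 on cores 4, 3, and 2, leaving cores 0 and 1 free for the operating system and the controller.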

v0.3.0

6 years ago