MadMiner Versions

Machine learning–based inference toolkit for particle physics

v0.4.5

4 years ago

New features:

  • Histograms in AsymptoticLimits now by default use one set of weighted events for all parameter points, reweighting them appropriately
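
As a plain NumPy illustration of this default (not MadMiner internals, all arrays made up): one fixed set of events is histogrammed at every parameter point by swapping in per-theta event weights, instead of drawing new toys at each point.

    import numpy as np

    rng = np.random.default_rng(42)
    x = rng.normal(size=10_000)  # observable values, drawn once
    weights = {
        "theta0": rng.exponential(size=10_000),  # made-up per-theta event weights
        "theta1": rng.exponential(size=10_000),
    }

    # one histogram per parameter point, reusing the same events
    histograms = {theta: np.histogram(x, bins=20, weights=w)[0]
                  for theta, w in weights.items()}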

API / breaking changes:

  • The keyword n_toys_per_theta in the AsymptoticLimits functions has been renamed to n_histo_toys

Internal changes:

  • Refactored HDF5 file access

v0.4.4

4 years ago

New features:

  • More options for some plotting functions

Bug fixes:

  • Fixed AsymptoticLimits functions for LikelihoodEstimator instances
  • Fixed bug in histogram_of_fisher_information()
  • Fixed a floating-point precision bug in particle smearing by changing < to <=
  • Improved Python 2 compatibility

v0.4.3

4 years ago

Bug fixes:

  • Fixed wrong results and excessive run times in AsymptoticLimits when more than one parameter batch was used
  • Fixed wrong factor of 1 / test_split in histogram_of_fisher_information()
  • Fixed AsymptoticLimits not accepting LikelihoodEstimator instances to calculate the kinematic likelihood

v0.4.2

4 years ago

New features:

  • AsymptoticLimits functions are more memory efficient. The new keyword histo_theta_batchsize sets the number of parameter points for which histogram data is loaded in parallel.
  • New keyword n_observables for AsymptoticLimits.observed_limits() allows overwriting the number of events (i.e., rescaling the log likelihood). See the sketch below.
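
A minimal sketch of the two new keywords; only histo_theta_batchsize and n_observables are quoted from this release, while the remaining arguments, file names, and return value are assumptions and may differ between versions.

    import numpy as np
    from madminer.limits import AsymptoticLimits

    x_obs = np.load("data/x_observed.npy")  # hypothetical observed sample

    limits = AsymptoticLimits("data/madminer_data.h5")  # hypothetical file
    results = limits.observed_limits(
        x_observed=x_obs,           # assumed keyword
        mode="histo",               # assumed: histogram-based likelihood
        histo_theta_batchsize=100,  # histogram data for 100 theta points loaded in parallel
        n_observables=36,           # rescale the log likelihood to 36 events
    )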

Bug fixes:

  • Fixed bugs that caused the training of DoubleParameterizedRatioEstimator and LikelihoodEstimator instances to crash
  • Added more sanity checks when evaluating observables, both in LHEReader and DelphesReader

v0.4.1

4 years ago

New features:

  • LHEReader.add_observable_from_function() now expects functions with the signature observable(truth_particles, leptons, photons, jets, met), where truth_particles are the original particles in the LHE file and the other arguments have smearing applied to them (see the sketch after this list).
  • New dof keyword in limit functions to overwrite degrees of freedom
  • Trimmed sampling to remove events with largest weights
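
A short sketch of the new function signature; the five arguments are quoted from the release note, while the setup-file name and the required keyword are assumptions.

    from madminer.lhe import LHEReader

    # truth_particles are the unsmeared LHE particles; leptons, photons, jets,
    # and met have smearing applied (as described in the release note)
    def leading_jet_pt(truth_particles, leptons, photons, jets, met):
        return jets[0].pt if len(jets) > 0 else float("nan")

    reader = LHEReader("setup.h5")  # hypothetical setup file
    reader.add_observable_from_function("j1_pt", leading_jet_pt, required=False)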

Breaking / API changes:

  • Changed the construction of the parameter grid in limit functions to NumPy's "ij" indexing mode to make the 2d and >2d cases more consistent
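
For reference, the difference in plain NumPy (not MadMiner code): with indexing="ij", the output grid shape follows the order of the input axes, which generalizes consistently beyond two dimensions.

    import numpy as np

    theta0 = np.linspace(-1.0, 1.0, 3)
    theta1 = np.linspace(0.0, 2.0, 5)

    grid_ij = np.meshgrid(theta0, theta1, indexing="ij")  # each array has shape (3, 5)
    grid_xy = np.meshgrid(theta0, theta1, indexing="xy")  # each array has shape (5, 3)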

Bug fixes:

  • Fixed bugs with efficiencies in LHEReader

v0.4.0

5 years ago

New features:

  • Smarter sampling: MadMiner now keeps track of which events were generated (sampled) from which benchmark point (at the MadGraph stage). The new keyword sample_only_from_closest_benchmark in the SampleAugmenter functions and plot_distributions() allows the user to restrict the unweighting / resampling at a given parameter point to events from the closest benchmark point. This can significantly reduce the weights of individual events and thus the variance.
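
A hedged sketch of the new keyword in a sampling call; sample_only_from_closest_benchmark is quoted from this release, while the surrounding call follows the madminer.sampling API introduced in v0.3.0, and the file name and benchmark name are made up.

    from madminer.sampling import SampleAugmenter, benchmark

    sampler = SampleAugmenter("data/madminer_data.h5")  # hypothetical file
    samples = sampler.sample_train_plain(
        theta=benchmark("sm"),                    # hypothetical benchmark name
        n_samples=100_000,
        sample_only_from_closest_benchmark=True,  # unweight only events generated
                                                  # at the closest benchmark point
    )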

API / breaking changes:

  • k-factors are now automatically applied when there are subsamples generated at different benchmarks. For instance, if we add a sample with 30k events generated at theta0 and a sample with 70k events generated at theta1, and calculate cross sections from the full sample, MadMiner will automatically apply k-factors of 0.3 and 0.7 to the two samples, as illustrated below.
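
The arithmetic behind this example as a standalone illustration (not MadMiner code):

    # each subsample's weights are scaled by its share of the combined sample
    n_events = {"theta0": 30_000, "theta1": 70_000}
    n_total = sum(n_events.values())
    k_factors = {name: n / n_total for name, n in n_events.items()}
    print(k_factors)  # {'theta0': 0.3, 'theta1': 0.7}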

Bug fixes:

  • Various small bug fixes, mostly related to nuisance parameters

Internal changes:

  • More information stored in MadMiner HDF5 file

v0.3.1

5 years ago

New features:

  • Fisher information in histograms can be calculated for a custom binning (see the sketch after this list)
  • MET noise in LHEReader
  • The number of toys for AsymptoticLimits histograms is now a keyword and no longer limited to 1k
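
As background for the custom-binning feature, a standalone toy example of Fisher information from histogram yields (this is not the MadMiner API): for expected per-bin yields nu_b(theta), the information is I_ij = sum_b (d nu_b / d theta_i)(d nu_b / d theta_j) / nu_b.

    import numpy as np

    # made-up model: expected yields in four custom bins, linear in one parameter
    def yields(theta):
        return np.array([120.0, 80.0, 30.0, 5.0]) + np.array([10.0, 5.0, 8.0, 2.0]) * theta[0]

    theta, eps = np.array([0.0]), 1e-4
    nu = yields(theta)
    grad = np.array([
        (yields(theta + eps * e) - yields(theta - eps * e)) / (2 * eps)
        for e in np.eye(theta.size)
    ])  # finite-difference gradients, shape (n_params, n_bins)
    fisher_information = np.einsum("ib,jb->ij", grad, grad / nu)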

Bug fixes:

  • Fixed wrong cross sections in AsymptoticLimits
  • Fixed unzipping of event samples to support older versions of gunzip
  • Various smaller bug fixes

Tutorials and documentation:

  • Restructured tutorials

Internal changes:

  • Improved automatic testing of likelihood ratio estimation

v0.3.0

5 years ago

New features:

  • Nuisance parameters for ratio-based methods! This required a major refactoring of madminer.sampling.
  • New madminer.limits submodule that calculates expected and observed p-values.
  • SALLY can profile the estimated score over nuisance parameters (then its output is n-dimensional rather than (n+k)-dimensional, where n is the number of parameters of interest and k is the number of nuisance parameters).
  • New madminer.analysis submodule with a generic DataAnalyzer class that can be used to access the general setup, weighted events, and cross sections. The classes SampleAugmenter, FisherInformation, and AsymptoticLimits subclass it, leading to a more unified interface (see the sketch after this list).
  • Sampling speed-up with parallelization when using n_processes>1 for any sampling call.
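
A hedged sketch of the unified interface; the DataAnalyzer class and its subclasses are quoted from this release, but the method name and return values below are assumptions and may not match this version exactly.

    from madminer.analysis import DataAnalyzer

    analyzer = DataAnalyzer("data/madminer_data.h5")  # hypothetical file
    x, weights = analyzer.weighted_events()           # assumed method and return values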

Breaking / API changes:

  • New file format for trained neural networks (only the *.json files are different) -- old models won't load
  • The one-size-fits-all class MLForge is replaced by four different classes: ParameterizedRatioEstimator, DoubleParameterizedRatioEstimator, LikelihoodEstimator, and ScoreEstimator (see the sketch after this list).
  • EnsembleForge is now Ensemble, with renamed functions and less clutter (at the cost of some unimportant functionality)
  • Renaming: DelphesProcessor -> DelphesReader, LHEProcessor -> LHEReader
  • madminer.morphing now lives in madminer.utils.morphing, and the Morpher class was renamed PhysicsMorpher (since we also have a NuisanceMorpher class)
  • The madminer.sampling submodule has changed in a number of ways. The high-level functions have been renamed: extract_samples_train_plain() -> sample_train_plain(), extract_samples_train_local() -> sample_train_local(), extract_samples_train_global() -> sample_train_density(), extract_samples_train_ratio() -> sample_train_ratio(), extract_samples_train_more_ratios() -> sample_train_more_ratios(), extract_samples_test() -> sample_test(), extract_cross_sections() -> cross_sections(). In addition to the physical parameters theta, they all now take descriptions of the nuisance parameters nu as input arguments, for which new helper functions exist.
  • The helper functions in madminer.sampling are also different: constant_benchmark_theta() -> benchmark(), multiple_benchmark_thetas() -> benchmarks(), constant_morphing_theta() -> morphing_point(), multiple_morphing_thetas() -> morphing_points(), random_morphing_thetas() -> random_morphing_points()
  • New default evaluation split of 0.2 (so with the defaults, there will be 60% train, 20% validation, 20% test events)
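
A hedged sketch of the new class structure; the class names are quoted from this release, while the constructor and training arguments are assumptions based on later MadMiner documentation.

    from madminer.ml import ParameterizedRatioEstimator

    estimator = ParameterizedRatioEstimator(n_hidden=(100, 100))  # assumed constructor args
    estimator.train(
        method="alices",                   # assumed: one of the ratio-based methods
        x="samples/x_train.npy",           # hypothetical training-sample files
        y="samples/y_train.npy",
        theta="samples/theta0_train.npy",
        r_xz="samples/r_xz_train.npy",
        t_xz="samples/t_xz_train.npy",
    )
    estimator.save("models/alices")        # writes the new *.json model format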

Internal changes:

  • madminer.ml largely rewritten
  • madminer.sampling largely rewritten
  • madminer.utils.analysis was removed; the old functions are now split between madminer.analysis and madminer.sampling

v0.2.8

5 years ago

Breaking / API changes:

  • The keyword trainer in MLForge.train() has been renamed to optimizer
  • The gradient of the network output with respect to x is no longer calculated; the corresponding keywords in MLForge.evaluate() and MLForge.train() are gone
  • MLForge has new methods evaluate_log_likelihood(), evaluate_log_likelihood_ratio(), and evaluate_score(); MLForge.evaluate() just wraps around them now
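
A hedged sketch of the refactored evaluation interface; the method name is quoted from this release, while the loading call, file names, and argument names are assumptions.

    import numpy as np
    from madminer.ml import MLForge

    forge = MLForge()
    forge.load("models/alices")  # hypothetical trained model

    x_test = np.load("samples/x_test.npy")          # hypothetical files
    theta_grid = np.load("samples/theta_grid.npy")
    log_r = forge.evaluate_log_likelihood_ratio(x=x_test, theta=theta_grid)  # assumed args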

Bug fixes:

  • Fixed a number of bugs affecting neural density estimation methods ("NDE" and "SCANDAL")
  • Fixed bugs when trying to use CASCAL

Internal changes:

  • Rewrote training procedure from scratch
  • Some more refactoring of the ML parts

v0.2.7

5 years ago

New features:

  • b and tau tags, which can be used for cuts and observables (see the sketch after this list)
  • New plot_uncertainty() function
  • More options for plot_distributions()
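
A hedged sketch of tags in cuts and observables, using the class name of this release (DelphesProcessor, renamed DelphesReader in v0.3.0); the import path, expression syntax, and the b_tag attribute name are assumptions based on this note.

    from madminer.delphes import DelphesProcessor  # renamed DelphesReader in v0.3.0

    proc = DelphesProcessor("setup.h5")  # hypothetical setup file

    # count b-tagged jets as an observable, and cut on a b-tagged leading jet
    proc.add_observable("n_btags", "sum(j[i].b_tag for i in range(len(j)))", default=0)
    proc.add_cut("j[0].b_tag >= 1")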

Breaking / API changes:

  • In MLForge.evaluate() and EnsembleForge.evaluate(), the keyword x_filename is replaced by x, which supports ndarrays as an alternative to filenames

Documentation and examples:

  • Improved logging output
  • New test of nuisance setup and Fisher information calculation

Bug fixes:

  • Fixed MadMiner crashing with stable W bosons in LHE files
  • Fixed calculation of MET in LHEProcessor when visible particle pTs are smeared
  • Fixed range calculation for plot_distributions()