EQcorrscan Versions

Earthquake detection and analysis in Python.

0.5.0

4 months ago

This release represents a significant increase in efficiency for large-scale matched-filters in EQcorrscan. A lot of work has gone into reducing memory usage in the non-correlation components of the matched-filter workflow, streamlining the code, making better use of shared-memory multi-threaded parallelism, and increasing CPU loads. In our testing we can now achieve and maintain >190% CPU efficiency (e.g. >95% hyperthreaded performance). We can also load GPUs better by making use of concurrent CPU and GPU processing of workflow steps. You should not need to change your code to make use of most of these speed-ups. Hopefully you will notice that you can run larger datasets faster than ever!
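As a rough picture of what overlapping workflow steps can look like, the sketch below has one thread prepare the next chunk of data while the main thread works on the previous one. It is a minimal illustration under stated assumptions, not EQcorrscan's implementation: the functions prepare, correlate and run_pipeline, and the toy chunks, are hypothetical stand-ins.

```python
import queue
import threading


def prepare(chunk):
    # Hypothetical stand-in for pre-processing one chunk of data.
    return [x * 2 for x in chunk]


def correlate(prepared):
    # Hypothetical stand-in for the correlation step.
    return sum(prepared)


def run_pipeline(chunks, maxsize=2):
    """Overlap preparation and correlation using a bounded queue."""
    handoff = queue.Queue(maxsize=maxsize)  # bounds the data held in flight
    results = []

    def producer():
        for chunk in chunks:
            handoff.put(prepare(chunk))
        handoff.put(None)  # sentinel: no more work

    worker = threading.Thread(target=producer)
    worker.start()
    while True:
        prepared = handoff.get()
        if prepared is None:
            break
        results.append(correlate(prepared))
    worker.join()
    return results


pipeline_results = run_pipeline([[1, 2], [3, 4], [5]])
```

The bounded queue is the design choice worth noting: it lets the stages overlap without the producer running arbitrarily far ahead and holding many prepared chunks in memory.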

Changelog

core.match_filter.tribe
- Significant re-write of detect logic to take advantage of parallel steps (see #544)
- Significant re-structure of hidden functions.
core.match_filter.matched_filter
- 5x speed up for MAD threshold calculation with parallel (threaded) MAD calculation (#531).
core.match_filter.detect
- 1000x speedup for retrieving unique detections for all templates.
- 30x speedup in handling detections (50x speedup in selecting detections, 4x speedup in adding prepick time)
core.match_filter.template
- new quick_group_templates function for 50x quicker template grouping.
- Templates with nan channels will be considered equal to other templates with shared nan channels.
- New grouping strategy to minimise nan-channels - templates are grouped by similar seed-ids. This should speed up both correlations and prep_data_for_correlation. See PR #457.
utils.pre_processing
- _prep_data_for_correlation: 3x speedup for filling NaN-traces in templates
- New function quick_trace_select for very efficient selection of traces by seed ID without wildcards (4x speedup).
- process, dayproc and shortproc replaced by multi_process. Deprecation warning added.
- multi_process implements multithreaded GIL-releasing parallelism of slow sections (detrending, resampling and filtering) of the processing workflow. Multiprocessing is no longer supported or needed for processing. See PR #540 for benchmarks. The new approach is slightly faster overall, and significantly more memory efficient (uses c. 6x less memory than the old multiprocessing approach on a 12-core machine).
utils.correlate
- 25 % speedup for _get_array_dicts with quicker access to properties.
utils.catalog_to_dd
- _prepare_stream
  - Now more consistently slices templates to length = extract_len * samp_rate so that the user receives fewer warnings about insufficient data.
- write_correlations
  - New option use_shared_memory to speed up correlation of many events by ca. 20 % by moving trace data into shared memory.
  - Add ability to weight correlations by raw correlation rather than just correlation squared.
utils.cluster.decluster_distance_time
- Bug-fix: fix segmentation fault when declustering more than 46340 detections with hypocentral_separation.
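
The changelog does not state the cause of the segmentation fault fixed in decluster_distance_time, but the limit of 46340 is suggestive: it is the largest integer whose square still fits in a signed 32-bit integer. The check below only demonstrates that arithmetic. Linking it to the bug is an assumption, and the helper square_fits_int32 is hypothetical.

```python
import math

INT32_MAX = 2**31 - 1  # largest value of a signed 32-bit integer


def square_fits_int32(n):
    """Return True if n * n can be represented as a signed 32-bit integer."""
    return n * n <= INT32_MAX


# Largest n for which n * n does not exceed INT32_MAX.
largest_safe = math.isqrt(INT32_MAX)

print(largest_safe, largest_safe**2, square_fits_int32(largest_safe))
print(largest_safe + 1, (largest_safe + 1)**2, square_fits_int32(largest_safe + 1))
```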

0.5.0rc0

5 months ago

Release candidate for version 0.5.0

0.4.4

1 year ago

EQcorrscan 0.4.4:

Changelog

core.match_filter
- Bug-fix: peak_cores could be defined twice in _group_detect through kwargs. Fix: only update peak_cores if it isn't there already.
core.match_filter.tribe
- Detect now allows passing of pre-processed data.
core.match_filter.template
- Remove duplicate detections from overlapping windows using ._uniq().
core.lag_calc._xcorr_interp
- CC-interpolation replaced with resampling (more robust); old method deprecated. Use the new method by passing use_new_resamp_method=True as a keyword argument.
core.lag_calc
- Fixed bug where the minimum CC defined via min_cc_from_mean_cc_factor was not set correctly for negative correlation sums.
utils.correlate
- Fast Matched Filter now supported natively for version >= 1.4.0.
- Only full correlation stacks are returned now (e.g. where fewer than the full number of channels are in the stack at the end of the stack, zeros are returned).
utils.mag_calc.relative_magnitude
- Fixed bug where S-picks / traces were used for relative-magnitude calculation against the user's choice.
- Implemented full magnitude bias-correction for CC and SNR.
utils.mag_calc.relative_amplitude
- Returns dicts for SNR measurements.
utils.catalog_to_dd.write_correlations
- Fixed bug in parallel execution.
- Added parallel options for catalog-dt measurements and for stream preparation before cross-correlation-dt measurements.
- Default parallelization of dt-computation is now across events (loads CPUs more efficiently), and there is a new option max_trace_workers to use the old parallelization strategy across traces.
- Now includes an all_horiz option that will correlate all matching horizontal channels, regardless of which of them the S-pick is linked to.
utils.clustering
- Allows indirect comparison of event waveforms (i.e., events without matching traces can be compared indirectly via a third event).
- Allows setting the clustering method, metric, and sort_order from scipy.cluster.hierarchy.linkage.
tribe, template, template_gen, archive_read, clustering
- Removed the option to read from seishub (deprecated in obspy).
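
The removal of duplicate detections from overlapping windows, listed under core.match_filter.template above, can be pictured as follows. This is a minimal sketch, assuming a detection is identified by its template name and detection time. The Detection tuple and the unique_detections helper are hypothetical and are not the library's ._uniq() implementation.

```python
from collections import namedtuple

# Hypothetical, simplified stand-in for a detection record.
Detection = namedtuple("Detection", ["template_name", "detect_time", "detect_val"])


def unique_detections(detections):
    """Keep the first occurrence of each (template_name, detect_time) pair."""
    seen = set()
    unique = []
    for det in detections:
        key = (det.template_name, det.detect_time)
        if key not in seen:
            seen.add(key)
            unique.append(det)
    return unique


# Two overlapping windows both report the detection at 350.0 s.
window_1 = [Detection("tmpl_a", 100.0, 8.2), Detection("tmpl_a", 350.0, 9.1)]
window_2 = [Detection("tmpl_a", 350.0, 9.1), Detection("tmpl_b", 400.0, 7.5)]
deduplicated = unique_detections(window_1 + window_2)
```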

0.4.4rc0

1 year ago

Release candidate 0 for release 0.4.4.

0.4.3

3 years ago

Changelog

core.match_filter
- match_filter:
  - Provide option of exporting the cross-correlation sums for additional later analysis.
core.match_filter.party.write
- BUG-FIX: When format='tar' is selected, added a check for .tgz-file suffix before checking the filename against an existing file. Previously, when a filename without '.tgz'-suffix was supplied, then the file was overwritten against the function's intention.
- Add option overwrite=True to allow overwriting of existing files.
core.match_filter.party.read
- BUG-FIX: Ensure wildcard reading works as expected: #453
core.match_filter.party.rethreshold:
- added option to rethreshold based on absolute values to keep relevant detections with large negative detect_val.
core.lag_calc:
- Added option to set minimum CC threshold individually for detections based on: min(detect_val / n_chans * min_cc_from_mean_cc_factor, min_cc).
- Added the ability of saving correlation data of the lag_calc.
utils.mag_calc.calc_b_value:
- Added useful information to doc-string regarding method and meaning of residuals
- Changed the number of magnitudes used to an int (from a string!?)
utils.mag_calc.relative_magnitude:
- Refactor so that min_cc is used regardless of whether weight_by_correlation is set. See issue #455.
utils.archive_read
- Add support for wildcard-comparisons in the list of requested stations and channels.
- New option arctype='SDS' to read from a SeisComp Data Structure (SDS). This option is also available in utils.clustering.extract_detections and in utils.archive_read._check_available_data.
utils.catalog_to_dd
- Bug-fixes in #424:
  - only P and S phases are used now (previously spurious amplitude picks were included in correlations);
  - Checks for length are done prior to correlations and more helpful error outputs are provided.
  - Progress is not reported within dt.cc computation
- write_station now supports writing elevations: #424.
utils.clustering
- For cluster, distance_matrix and cross_chan_correlation, implemented full support for shift_len != 0. The latter two functions now return, in addition to the distance-matrix, a shift-matrix (both functions) and a shift-dictionary (for distance_matrix). New option for shifting streams as a whole or letting traces shift individually (allow_individual_trace_shifts=True).
utils.plotting
- Function added (twoD_seismplot) for plotting seismicity (#365).
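
The per-detection threshold listed under core.lag_calc above, min(detect_val / n_chans * min_cc_from_mean_cc_factor, min_cc), works out as in this small sketch. The function name effective_min_cc and the numbers are illustrative only; the expression itself is taken from the changelog entry.

```python
def effective_min_cc(detect_val, n_chans, min_cc_from_mean_cc_factor, min_cc):
    """Evaluate the per-detection minimum CC threshold from the changelog formula."""
    mean_cc = detect_val / n_chans  # average correlation per channel
    return min(mean_cc * min_cc_from_mean_cc_factor, min_cc)


# A weak detection (mean CC of 0.3) gets a lowered threshold ...
weak = effective_min_cc(
    detect_val=3.0, n_chans=10, min_cc_from_mean_cc_factor=0.5, min_cc=0.4)
# ... while a strong detection (mean CC of 0.9) is capped at min_cc.
strong = effective_min_cc(
    detect_val=9.0, n_chans=10, min_cc_from_mean_cc_factor=0.5, min_cc=0.4)
```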

0.4.3rc0

3 years ago

Release candidate for version 0.4.3.

0.4.2

3 years ago

A Python package for the detection and analysis of repeating and near-repeating seismicity.

Changelog

Add seed-ids to the _spike_test's message.
utils.correlate
- Cross-correlation normalisation errors no longer raise an error
- When "out-of-range" correlations occur a warning is given by the C-function with details of what channel, what template and where in the data vector the issue occurred for the user to check their data.
- Out-of-range correlations are set to 0.0
- After extensive testing these errors have always been related to data issues within regions where correlations should not be computed (spikes, step artifacts due to incorrectly padding data gaps).
- USERS SHOULD BE CAREFUL TO CHECK THEIR DATA IF THEY SEE THESE WARNINGS
utils.mag_calc.amp_pick_event
- Added option to output IASPEI standard amplitudes, with static amplification of 1 (rather than 2080 as per Wood Anderson specs).
- Added filter_id and method_id to amplitudes to make these methods more traceable.
core.match_filter
- Bug-fix - cope with data that are too short with ignore_bad_data=True. This flag is generally not advised, but when used, may attempt to trim all data to zero length. The expected behaviour is to remove bad data and run with the remaining data.
- Party:
  - decluster now accepts a hypocentral_separation argument. This allows the inclusion of detections that occur close in time, but not in space. This is underwritten by a new findpeaks.decluster_dist_time function based on a new C-function.
- Tribe:
  - Add monkey-patching for clients that do not have a get_waveforms_bulk method for use in .client_detect. See issue #394.
utils.pre_processing
- Only templates that need to be reshaped are reshaped now - this can be a lot faster.
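
The hypocentral_separation behaviour described for Party.decluster above (detections close in time are kept if they are far apart in space) can be sketched as a greedy pass over detections. This is only an illustration under simplifying assumptions: locations are Cartesian coordinates in km, the strongest detection wins, and the decluster function and its argument layout are hypothetical rather than the library's C-backed implementation.

```python
import math


def decluster(detections, trig_int, hypocentral_separation):
    """Greedy declustering sketch.

    detections: list of (time_s, (x_km, y_km, z_km), detect_val) tuples.
    A detection is dropped only if a stronger, already-kept detection lies
    within trig_int seconds AND within hypocentral_separation km of it.
    """
    kept = []
    # Visit the strongest detections first so that they take priority.
    for det in sorted(detections, key=lambda d: abs(d[2]), reverse=True):
        time_s, location, _ = det
        clash = any(
            abs(time_s - k_time) < trig_int
            and math.dist(location, k_loc) < hypocentral_separation
            for k_time, k_loc, _ in kept
        )
        if not clash:
            kept.append(det)
    return sorted(kept, key=lambda d: d[0])  # return in time order


example = [
    (10.0, (0.0, 0.0, 5.0), 9.0),
    (11.0, (0.5, 0.0, 5.0), 7.0),   # close in time and space: dropped
    (11.5, (40.0, 0.0, 5.0), 6.0),  # close in time, far in space: kept
    (60.0, (0.0, 0.0, 5.0), 5.0),   # far in time: kept
]
declustered = decluster(example, trig_int=5.0, hypocentral_separation=10.0)
```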

0.4.2rc0

3 years ago

Pre-release for 0.4.2 for testing on conda-forge

0.4.1

4 years ago

Changelog

core.match_filter
- BUG-FIX: Empty families are no longer run through lag-calc when using Party.lag_calc(). Previously this resulted in a "No matching data" error, see #341.
core.template_gen
- BUG-FIX: Fix bug where events were incorrectly associated with templates in Tribe().construct() if the given catalog contained events outside of the time-range of the stream. See issue #381 and PR #382.
utils.catalog_to_dd
- Added ability to turn off parallel processing (this is turned off by default now) for write_correlations - parallel processing for moderate to large datasets was copying far too much data and using lots of memory. This is a short-term fix - ideally we will move filtering and resampling to C functions with shared-memory parallelism and GIL releasing. See PR #374.
- Moved parallelism for _compute_dt_correlations to the C functions to reduce memory overhead. Using a generator to construct sub-catalogs rather than making a list of lists in memory. See issue #361.
utils.mag_calc:
- amp_pick_event now works on a copy of the data by default
- amp_pick_event uses the appropriate digital filter gain to correct the applied filter. See issue #376.
- amp_pick_event rewritten for simplicity.
- amp_pick_event now has simple synthetic tests for accuracy.
- _sim_wa uses the full response information to correct to velocity. This includes FIR filters (previously not used), and ensures that the Wood-Anderson poles (with a single zero) are correctly applied to velocity waveforms.
- calc_max_curv is now computed using the non-cumulative distribution.
Fixed a problem in _match_filter_plot: it now shows all new detections.
Added a plotdir argument to the eqcorrscan.core.lag_calc.lag_calc function to save the images.
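
The calc_max_curv change listed under utils.mag_calc above concerns the maximum-curvature estimate of the magnitude of completeness. As a minimal sketch (not the library's code; the helper name max_curvature_mc, the 0.1 bin width and the tie-breaking rule are assumptions), taking the most populated bin of the non-cumulative magnitude distribution looks like this:

```python
from collections import Counter


def max_curvature_mc(magnitudes, bin_width=0.1):
    """Return the centre of the most populated magnitude bin.

    Counts are non-cumulative: each event contributes to exactly one bin.
    Ties are resolved in favour of the lower magnitude (an assumption).
    """
    # Bin by integer index to avoid using floats as dictionary keys.
    counts = Counter(round(m / bin_width) for m in magnitudes)
    best_bin = max(counts, key=lambda b: (counts[b], -b))
    return round(best_bin * bin_width, 10)


magnitudes = [0.8, 1.0, 1.0, 1.1, 1.1, 1.1, 1.2, 1.2, 1.5, 2.0]
mc = max_curvature_mc(magnitudes)
```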

0.4.1rc0

4 years ago

Pre-release for 0.4.1