Naniar Versions

Tidy data structures, summaries, and visualisations for missing data

v1.1.0

2 months ago

New

Implement impute_fixed, impute_zero, and impute_factor. Notably, these do not implement the "scoped variants" that were previously provided (for example, impute_fixed_if). Instead, use the across workflow in dplyr, which is also easier to maintain. #261
Add digit argument to miss_var_summary to help display %missing data correctly when there is a very small fraction of missingness. #284
Implemented impute_mode - resolves #213.
geom_miss_point() works with shape argument #290
Fix bug with all_complete, which was implemented as !anyNA(x) but should be all(complete.cases(x)).
Correctly implement any_na() (and any_miss()) and any_complete(). Rework examples to demonstrate workflow for finding complete variables.
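
As a rough sketch of how the new imputation helpers might be combined with across() in place of the old scoped variants: the toy data frame `dat` and its column names are made up for illustration, and the example assumes naniar (>= 1.1.0) and dplyr are installed.

```r
library(naniar)
library(dplyr)

# Hypothetical toy data, not from the naniar documentation
dat <- tibble(
  x = c(1, NA, 3),
  y = c(NA, 2, 4),
  g = factor(c("a", NA, "b"))
)

# Where impute_zero_if() might once have been used, apply impute_zero()
# to every numeric column via across()
zeroed <- dat %>% mutate(across(where(is.numeric), impute_zero))

# Impute a fixed value into a single column
fixed <- dat %>% mutate(x = impute_fixed(x, -99))

# Impute a new level into a factor
labelled <- dat %>% mutate(g = impute_factor(g, value = "missing"))

# Impute the most common value into a factor
moded <- impute_mode(factor(c("a", "a", "b", NA)))
```

The across() call selects columns by predicate, which covers most of what the `_if`, `_at`, and `_all` variants were used for.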

Bug fixes

Fix bug with shadow_long not working when gathering variables of mixed type. Fix involves specifying a value transform, which defaults to character. #314
Implement Date, POSIXct and POSIXlt methods for impute_below() - #158
Provide replace_na_with, a complement to replace_with_na - #129
Fix bug with gg_miss_fct where it used a deprecated function from forcats - #342
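
To illustrate how replace_na_with complements replace_with_na, here is a minimal sketch. The data frame and the sentinel value -99 are assumptions chosen for the example.

```r
library(naniar)

# Hypothetical data where -99 encodes a missing value
df <- data.frame(x = c(1, -99, 3), y = c("a", "b", "c"))

# replace_with_na(): turn the sentinel value into NA for the named column
df_na <- replace_with_na(df, replace = list(x = -99))

# replace_na_with(): the complement, turning NA in a vector into a chosen value
x_filled <- replace_na_with(df_na$x, 0)
```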

Misc

Use cli::cli_abort and cli::cli_warn instead of stop and warning (#326)
Use expect_snapshot instead of expect_error (#326)
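
The pattern described above might look something like the following sketch. The helper `check_is_data_frame()` is hypothetical and is not a naniar function; it only shows the general shape of a cli-based error with a snapshot test.

```r
# Hypothetical helper, not part of naniar
check_is_data_frame <- function(x) {
  if (!is.data.frame(x)) {
    # cli_abort() formats the message with inline markup and
    # signals a classed error, where stop() would give plain text
    cli::cli_abort(c(
      "{.arg x} must be a data frame.",
      "x" = "You supplied an object of class {.cls {class(x)}}."
    ))
  }
  invisible(x)
}

# In a testthat test file, the full message can be recorded as a snapshot
# rather than matched with expect_error():
# test_that("non data frames error", {
#   expect_snapshot(check_is_data_frame(1:3), error = TRUE)
# })
```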

Changes

Soft deprecated shadow_shift - #193
Soft deprecate miss_case_cumsum() and miss_var_cumsum() - #257

v1.0.0

1 year ago

Version 1.0.0 of naniar marks the publication of the associated JSS paper, doi:10.18637/jss.v105.i07. This release also includes a few small changes, which are described below.

There is still a lot to do in naniar, and this release does not mean that no further changes are coming. Rather, it establishes a stable release, and any upcoming changes will go through a more formal deprecation process.

New

The DOI in the CITATION is for a new JSS publication that will be registered after publication on CRAN.
Replaced tidyr::gather with tidyr::pivot_longer - resolves #301
Added set_n_miss and set_prop_miss functions - resolves #298
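
A small sketch of how set_n_miss and set_prop_miss might be used to add missing values to a vector. The vector `x` is illustrative, and which positions become NA is random.

```r
library(naniar)

x <- 1:10

# Set a given number of values to NA
x_n <- set_n_miss(x, n = 3)
sum(is.na(x_n))

# Set a given proportion of values to NA
x_prop <- set_prop_miss(x, prop = 0.2)
sum(is.na(x_prop))
```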

Bug Fixes

Fix bug in gg_miss_var() where a warning appeared due to a change in how to remove the legend #288.

Misc

Removed gdtools from naniar as no longer needed 302.
Added vctrs and cli to Imports. Both are free dependencies, as they are already used within the tidyverse packages that naniar depends on.

0.6.0

3 years ago

naniar 0.6.0 (2020/08/17) "Spur of the lamp post"

Provide warning for replace_with_na when columns provided that don't exist (see #160). Thank you to michael-dewar for their help with this.

Breaking Changes

Drop the "nabular" and "shadow" classes (#268) used in nabular() and bind_shadow(). This also removes the functions as_shadow(), is_shadow(), is_nabular(), new_nabular(), and new_shadow(). These were mostly used internally, and it is not expected that users would have used these functions. If you did use them, please file an issue and I can implement them again.

0.5.2

3 years ago

naniar 0.5.2 (2020/06/28) "Silver Apple"

Minor Changes

Improvements to code in miss_var_summary(), miss_var_table(), and prop_miss_var(), resulting in a 3-10x speedup.

0.5.1

3 years ago

naniar 0.5.1 (2020/04/10) "Uncle Andrew's Applewood Wardrobe"

Minor Changes

Fixes warnings and errors from tibble and subsequent downstream impacts on simputation.

0.5.0

4 years ago

naniar 0.5.0 (2020/02/20) "The End of this Story and the Beginning of all of the Others"

Breaking Changes

The following functions related to calculating the proportion/percentage of missingness were made Defunct and will no longer work:
- miss_var_prop()
- complete_var_prop()
- miss_var_pct()
- complete_var_pct()
- miss_case_prop()
- complete_case_prop()
- miss_case_pct()
- complete_case_pct()

Instead use: prop_miss_var(), prop_complete_var(), pct_miss_var(), pct_complete_var(), prop_miss_case(), prop_complete_case(), pct_miss_case(), pct_complete_case(). (see 242)

replace_to_na() was made defunct, please use replace_with_na() instead. (see 242)
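
For anyone migrating, the renamed summaries can be sketched as below, using the built-in airquality data. The comments note the old names they replace.

```r
library(naniar)

# Proportion / percentage of variables that contain a missing value
p_var <- prop_miss_var(airquality)   # was miss_var_prop()
pct_miss_var(airquality)             # was miss_var_pct()

# Proportion / percentage of cases (rows) that contain a missing value
p_case <- prop_miss_case(airquality) # was miss_case_prop()
pct_complete_case(airquality)        # was complete_case_pct()
```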

Minor changes

miss_var_cumsum and miss_case_cumsum are now exported
use map_dfc instead of map_df
Fix various extra warnings and improve test coverage

Bug Fixes

Address bug where the number of missings in a row is not calculated properly - see 238 and 232. The solution involved using rowSums(is.na(x)), which was 3 times faster.
Resolve bug in gg_miss_fct() where warning is given for non explicit NA values - see 241.
skip vdiffr tests on github actions
use tibble() not data_frame()
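
The row-wise counting approach mentioned in the bug fixes can be sketched in base R as follows. The data frame is made up for illustration.

```r
# Hypothetical data with missing values scattered across rows
df <- data.frame(
  a = c(1, NA, 3),
  b = c(NA, NA, 6),
  c = c("x", "y", NA)
)

# is.na() on a data frame returns a logical matrix;
# rowSums() then counts the TRUE values in each row
n_miss_per_row <- rowSums(is.na(df))
n_miss_per_row
```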

0.4.2

5 years ago

Improvements

The geom_miss_point() ggplot2 layer can now be converted into an interactive web-based version by the ggplotly() function in the plotly package. In order for this to work, naniar now exports the geom2trace.GeomMissPoint() function (users should never need to call geom2trace.GeomMissPoint() directly -- ggplotly() calls it for you).
adds WORDLIST for spelling thanks to usethis::use_spell_check()
fix documentation @seealso bug (#228) (@sfirke)

Dependency fixes

Thanks to a PR (#223) from @romainfrancois:
- This fixes two problems that were identified as part of reverse dependency checks of dplyr 0.8.0 release candidate. https://github.com/tidyverse/dplyr/blob/revdep_dplyr_0_8_0_RC/revdep/problems.md#naniar
- n() must be imported or prefixed like any other function. In the PR, I've changed 1:n() to dplyr::row_number() as naniar seems to prefix all dplyr functions.
- update_shadow was only restoring the class attributes. It now restores all attributes, because the old behaviour caused problems when data was a grouped_df. This was likely a problem before too, but dplyr 0.8.0 is stricter about what is a grouped data frame.
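
The switch from 1:n() to dplyr::row_number() noted above can be illustrated with a small sketch. The grouped data frame here is hypothetical.

```r
library(dplyr)

df <- tibble(g = c("a", "a", "b"))

# Prefixed row_number() works without importing n() into the
# calling package, and it respects groups
out <- df %>%
  group_by(g) %>%
  mutate(id = dplyr::row_number()) %>%
  ungroup()
```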

0.4.1

5 years ago

Minor Change

Fixes to new_tibble #220 - Thanks to Kirill Müller.
Refactoring the capture of arguments from rlang #218 - thanks to Lionel Henry.

0.4.0

5 years ago

New Feature

Add custom label support for missings and not missings with the functions add_label_missings(), add_label_shadow(), and add_any_miss(). So you can now do `add_label_missings(data, missing = "custom_missing_label", complete = "custom_complete_label")`.
impute_median() and scoped variants
any_shade() returns a logical TRUE or FALSE depending on if there are any shade values
nabular() an alias for bind_shadow() to tie the nabular term into the work.
is_nabular() checks if input is nabular.
geom_miss_point() now gains the arguments from shadow_shift()/impute_below() for altering the amount of jitter and proportion below (prop_below).
Added two new vignettes, "Exploring Imputed Values", and "Special Missing Values"
miss_var_summary and miss_case_summary no longer provide the cumulative sum of missingness in their summaries. This summary can be added back with the option add_cumsum = TRUE. #186
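
As a brief sketch of two of the features listed above, using the built-in airquality data: custom labels with add_label_missings(), and restoring the cumulative sum with add_cumsum = TRUE.

```r
library(naniar)

# Label each row by whether it contains any missing value,
# using custom labels in place of the defaults
labelled <- add_label_missings(
  airquality,
  missing = "custom_missing_label",
  complete = "custom_complete_label"
)
table(labelled$any_missing)

# The cumulative sum of missingness is opt-in
with_cumsum <- miss_var_summary(airquality, add_cumsum = TRUE)
without_cumsum <- miss_var_summary(airquality)
```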

Added gg_miss_upset to replace workflow of:

data %>% 
  as_shadow_upset() %>%
  UpSetR::upset()

Major Change

recode_shadow now works! This function allows you to recode your missing values into special missing values. These special missing values are stored in the shadow part of the dataframe, which ends in _NA.
implemented shade where appropriate throughout naniar, and also added verifiers, is_shade, are_shade, which_are_shade, and removed which_are_shadow.

as_shadow and bind_shadow now return data of class shadow. This will feed into recode_shadow methods for flexibly adding new types of missing data.
Note that in the future shadow might be changed to nabble or something similar.
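
A minimal sketch of recode_shadow, assuming a toy data frame in which a wind value of -99 signals a broken instrument. The label "bananas" is arbitrary.

```r
library(naniar)

# Hypothetical data: -99 in wind flags a special kind of missingness
df <- tibble::tribble(
  ~wind, ~temp,
  -99,    45,
  68,     NA,
  72,     25
)

# Add the shadow columns (wind_NA, temp_NA)
dfs <- bind_shadow(df)

# Recode the shadow of temp wherever wind is -99
recoded <- recode_shadow(dfs, temp = .where(wind == -99 ~ "bananas"))
levels(recoded$temp_NA)
```

The special missing value is recorded only in the shadow column (temp_NA), so the original temp values are left untouched.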

Minor feature

Functions add_label_shadow() and add_label_missings() gain arguments so you can only label according to the missingness / shadowy-ness of given variables.
new function which_are_shadow(), to tell you which values are shadows.
new function long_shadow(), which converts data in shadow/nabular form into a long format suitable for plotting. Related to #165
Added tests for miss_scan_count

Minor Changes

gg_miss_upset gets a better default presentation by ordering by the largest intersections, and also an improved error message when data with only 1 or no variables have missing values.
shadow_shift gains a more informative error message when it doesn't know the class.
Changed common_na_string to include escape characters for "?", "*", "." so that if they are used in replacement or searching functions they don't return the wildcard results from the characters "?", "*", and ".".
miss_case_table and miss_var_table now name their final columns pct_cases and pct_vars, respectively, instead of pct_miss - fixes #178.
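
To see why the escape characters matter, here is a base R sketch (not naniar code) showing how an unescaped "." behaves as a wildcard in a regular expression. The `values` vector is made up for illustration.

```r
values <- c("3.5", ".", "abc")

# Unescaped: "." matches any single character, so everything matches
unescaped <- grepl(".", values)

# Escaped: only strings containing a literal full stop match
escaped <- grepl("\\.", values)

# Anchored and escaped: only the string that is exactly "."
exact <- values[grepl("^\\.$", values)]
```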

Breaking Changes

Deprecated the old names of the scalar missingness summaries in favour of a more consistent syntax #171. The old and new names are:

old_names	new_names
`miss_case_pct`	`pct_miss_case`
`miss_case_prop`	`prop_miss_case`
`miss_var_pct`	`pct_miss_var`
`miss_var_prop`	`prop_miss_var`
`complete_case_pct`	`pct_complete_case`
`complete_case_prop`	`prop_complete_case`
`complete_var_pct`	`pct_complete_var`
`complete_var_prop`	`prop_complete_var`

These old names will be made defunct in 0.5.0, and removed completely in 0.6.0.

impute_below has changed to be an alias of shadow_shift - that is it operates on a single vector. impute_below_all operates on all columns in a dataframe (as specified in #159)
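
The distinction between the vector and data frame versions might be sketched as follows. The vector `x` is illustrative, and the exact imputed values depend on the jitter settings.

```r
library(naniar)

# impute_below() operates on a single vector: missing values are
# shifted to sit below the minimum of the observed values
x <- c(5, NA, 7, 9)
shifted <- impute_below(x)
shifted[2] < min(x, na.rm = TRUE)

# impute_below_all() applies the same idea to every column of a data frame
air_imputed <- impute_below_all(airquality)
anyNA(air_imputed)
```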

Bug fix

Ensured that miss_scan_count actually return'd something.
gg_miss_var(airquality) now prints the ggplot - a typo meant that this did not print the plot

V0.3.1

5 years ago

This release is a patch to remove a package imported but not used.

Minor Change

This is a patch release that removes tidyselect from the package Imports, as it is unnecessary. Fixes #174