Datawizard Versions

Magic potions to clean and transform your data 🧙

v0.8.0

10 months ago

BREAKING CHANGES

  • The following re-exported functions from {insight} have now been removed: object_has_names(), object_has_rownames(), is_empty_object(), compact_list(), compact_character().

  • Argument na.rm was renamed to remove_na throughout {datawizard} functions. na.rm is kept for backward compatibility, but will be deprecated and later removed in future updates.

  • The way expressions are defined in data_filter() was revised. The filter argument was replaced by ..., which allows multiple expressions to be separated by commas (they are then combined with &). Furthermore, expressions can now also be defined as strings, or be provided as character vectors, to allow string-friendly programming.
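As a rough illustration of the revised data_filter() interface described above, the sketch below filters the built-in mtcars data in three ways. It assumes {datawizard} 0.8.0 or later is installed; the data set and column names are chosen for illustration only and do not come from this changelog.

```r
library(datawizard)

# Expressions separated by commas are combined with `&`
by_comma <- data_filter(mtcars, mpg > 25, cyl == 4)

# The equivalent single expression
by_and <- data_filter(mtcars, mpg > 25 & cyl == 4)

# The same filter supplied as a string
by_string <- data_filter(mtcars, "mpg > 25 & cyl == 4")

# All three calls should select the same number of rows
c(nrow(by_comma), nrow(by_and), nrow(by_string))
```

The string form is the one intended for programmatic use, for example when the condition is assembled with paste().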

CHANGES

  • Weighted functions (weighted_sd(), weighted_mean(), ...) gain a remove_na argument, to remove or keep missing and infinite values. The default is remove_na = TRUE, i.e. missing and infinite values are removed.

  • reverse_scale(), normalize() and rescale() gain an append argument (similar to other data frame methods of transformation functions), to append recoded variables to the input data frame instead of overwriting existing variables.

NEW FUNCTIONS

  • rowid_as_column() to complement rownames_as_column() (and to mimic tibble::rowid_to_column()). Note that its behavior is different from tibble::rowid_to_column() for grouped data. See the Details section in the docs.

  • data_unite(), to merge values of multiple variables into one new variable.

  • data_separate(), as counterpart to data_unite(), to separate a single variable into multiple new variables.

  • data_modify(), to create new variables, or modify or remove existing variables in a data frame.
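The new functions listed above could be used roughly as follows. This is a minimal sketch assuming {datawizard} 0.8.0 or later; the toy data frame, the new column names and the conversion factor are made up for illustration.

```r
library(datawizard)

# Hypothetical toy data, used only for this sketch
d <- data.frame(
  first = c("Ada", "Alan"),
  last = c("Lovelace", "Turing"),
  stringsAsFactors = FALSE
)

# Merge two variables into one new variable
united <- data_unite(d, new_column = "name", select = c("first", "last"), separator = "_")

# Split the united variable back into two variables
separated <- data_separate(united, select = "name", new_columns = c("first", "last"), separator = "_")

# Create a new variable from an existing one (approximate mpg to km/l conversion)
modified <- data_modify(mtcars, kpl = mpg * 0.425)

# Add the row index as a column
with_id <- rowid_as_column(mtcars)

names(united)
names(separated)
```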

MINOR CHANGES

  • to_numeric() for variables of type Date, POSIXct and POSIXlt now includes the class name in the warning message.

  • Added a print() method for center(), standardize(), normalize() and rescale().

BUG FIXES

  • standardize_parameters() now works when the package namespace is in the model formula (#401).

  • data_merge() no longer yields a warning for tibbles when join = "bind".

  • center() and standardize() now work for grouped data frames (of class grouped_df) when force = TRUE.

  • The data.frame method of describe_distribution() now returns NULL instead of an error if no valid variables were passed (for example a factor variable with include_factors = FALSE) (#421).

v0.7.1

1 year ago

BREAKING CHANGES

  • add_labs() was renamed to assign_labels(). Since add_labs() existed only for a few days, there will be no alias for backwards compatibility.

NEW FUNCTIONS

  • labels_to_levels(), to use value labels of factors as their levels.

MINOR CHANGES

  • data_read() now checks whether the imported object actually is a data frame (or coercible to a data frame). If it is not, data_read() no longer errors, but gives an informative warning about the type of object that was imported.

BUG FIXES

  • Fixed a test for the CRAN check on macOS arm64.

v0.7.0

1 year ago

BREAKING CHANGES

  • In selection patterns, expressions like -var1:var3 to exclude all variables between var1 and var3 are no longer accepted. The correct expression is -(var1:var3). There are two reasons for this:

    • to be consistent with the behavior for numerics (-1:2 is not accepted but -(1:2) is);
    • to be consistent with dplyr::select(), which throws a warning and only uses the first variable in the first expression.
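The precedence issue behind this change can be seen in base R, and the accepted form can be sketched with data_select(). The example assumes {datawizard} 0.7.0 or later; the iris columns are used purely for illustration.

```r
library(datawizard)

# Base R: unary minus binds tighter than `:`, so the two forms differ
-1:2    # the sequence -1, 0, 1, 2
-(1:2)  # the values -1, -2

# Accepted form: exclude every column from Sepal.Length through Petal.Length
kept <- data_select(iris, -(Sepal.Length:Petal.Length))
names(kept)
```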

NEW FUNCTIONS

  • recode_into(), similar to dplyr::case_when(), to recode values from one or more variables into a new variable.

  • mean_sd() and median_mad() for summarizing vectors to their mean (or median) and a range of one SD (or MAD) above and below.

  • data_write() as counterpart to data_read(), to write data frames into CSV, SPSS, SAS, Stata files and many other file types. One advantage over existing functions to write data in other packages is that labelled (numeric) data can be converted into factors (with value labels used as factor levels) even for text formats like CSV and similar. This allows exporting "labelled" data into those file formats, too.

  • add_labs(), to manually add value and variable labels as attributes to variables. These attributes are stored as "label" and "labels" attributes, similar to the labelled class from the haven package.

MINOR CHANGES

  • data_rename() gets a verbose argument.
  • winsorize() now errors if the threshold is incorrect (previously, it provided a warning and returned the unchanged data). The argument verbose no longer has any effect but is kept for backward compatibility. The documentation now contains details about the valid values for threshold (#357).
  • In all functions that have arguments select and/or exclude, there is now one warning per misspelled variable. The previous behavior was to have only one warning.
  • Fixed inconsistent behaviour in standardize() when only one of the arguments center or scale was provided (#365).
  • unstandardize() and replace_nan_inf() now work with select helpers (#376).
  • Added informative warning and error messages to reverse(). Furthermore, the docs now describe the range argument more clearly (#380).
  • unnormalize() errors with unexpected inputs (#383).

BUG FIXES

  • empty_columns() (and therefore remove_empty_columns()) now correctly detects columns containing only NA_character_ (#349).
  • Select helpers now work in custom functions when argument is called select (#356).
  • Fix unexpected warning in convert_na_to() when select is a list (#352).
  • Fixed issue with correct labelling of numeric variables with more than nine unique values and associated value labels.

v0.6.5

1 year ago

MAJOR CHANGES

  • Etienne Bacher is the new maintainer.

MINOR CHANGES

  • standardize(), center(), normalize() and rescale() can be used in model formulas, similar to base::scale().

  • data_codebook() now includes the proportion for each category/value, in addition to the counts. Furthermore, if data contains tagged NA values, these are included in the frequency table.
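Using a transformation function inside a model formula might look like the sketch below. It assumes {datawizard} 0.6.5 or later; the model and the mtcars variables are illustrative and not taken from this changelog.

```r
library(datawizard)

# Standardize the predictor directly in the formula, as one would with scale()
m_std <- lm(mpg ~ standardize(hp), data = mtcars)

# With a standardized (mean-zero) predictor, the intercept is the mean of the outcome
coef(m_std)
mean(mtcars$mpg)
```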

BUG FIXES

  • center(x) now works correctly when x is a single value and either reference or center is specified (#324).

  • Fixed issue in data_codebook(), which failed for labelled vectors when values of labels were not in sorted order.

0.6.4

1 year ago

NEW FUNCTIONS

  • data_codebook(): to generate codebooks of data frames.

  • New functions to deal with duplicates: data_duplicated() (keeps all duplicates, including the first occurrence) and data_unique() (returns the data, excluding all duplicates except one instance of each, based on the selected method).

MINOR CHANGES

  • .data.frame methods should now preserve custom attributes.

  • The include_bounds argument in normalize() can now also be a numeric value, defining the limit to the upper and lower bound (i.e. the distance to 1 and 0).

  • data_filter() now works with grouped data.

BUG FIXES

  • data_read() no longer prints a message about empty columns when the data actually had no empty columns.

  • data_to_wide() now drops columns that are not in id_cols (if specified), names_from, or values_from. This is the behaviour observed in tidyr::pivot_wider().

0.6.3

1 year ago

MAJOR CHANGES

  • There is a new publication about the {datawizard} package: https://joss.theoj.org/papers/10.21105/joss.04684

  • Fixes failing tests due to changes in R-devel.

  • data_to_long() and data_to_wide() have had significant performance improvements, sometimes as high as a ten-fold speedup.

MINOR CHANGES

  • When column names are misspelled, most functions now suggest existing columns that may have been intended.

  • Miscellaneous performance gains.

  • convert_to_na() now requires argument na to be of class 'Date' to convert specific dates to NA. For example, convert_to_na(x, na = "2022-10-17") must be changed to convert_to_na(x, na = as.Date("2022-10-17")).
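The Date requirement for convert_to_na() can be sketched as follows, assuming {datawizard} 0.6.3 or later. The vector x is a made-up example.

```r
library(datawizard)

# Hypothetical vector of dates
x <- as.Date(c("2022-10-17", "2022-10-18", "2022-10-19"))

# `na` must itself be a Date, not a character string
out <- convert_to_na(x, na = as.Date("2022-10-17"))
out
```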

BUG FIXES

  • data_to_long() and data_to_wide() now correctly keep the date format.

0.6.2

1 year ago

BREAKING CHANGES

  • Methods for grouped data frames (.grouped_df) no longer support dplyr::group_by() for {dplyr} before version 0.8.0.

  • empty_columns() and remove_empty_columns() now also remove columns that contain only empty characters. Likewise, empty_rows() and remove_empty_rows() remove observations that consist entirely of missing or empty character values.

CHANGES

  • data_arrange() now works with data frames that were grouped using data_group() (#274).

  • data_read() gains a convert_factors argument, to turn off automatic conversion from numeric variables into factors.

0.6.1

1 year ago

  • Updates tests for upcoming changes in the {tidyselect} package (#267).

0.6.0

1 year ago

BREAKING CHANGES

  • The minimum needed R version has been bumped to 3.6.

  • The following deprecated functions have been removed:

    data_cut(), data_recode(), data_shift(), data_reverse(), data_rescale(), data_to_factor(), data_to_numeric()

  • A new text_format() alias has been introduced for format_text(); the latter will be removed in the next release.

  • A new recode_values() alias has been introduced for change_code(); the latter will be removed in the next release.

  • data_merge() now errors if columns specified in by are not in both datasets.

  • Using negative values in arguments select and exclude now removes the columns from the selection/exclusion. The previous behavior was to start the selection/exclusion from the end of the dataset, which was inconsistent with the use of "-" with other selecting possibilities.

NEW FUNCTIONS

  • data_peek(): to peek at values and type of variables in a data frame.

  • coef_var(): to compute the coefficient of variation.

CHANGES

  • data_filter() will give more informative messages on malformed syntax of the filter argument.

  • It is now possible to use curly brackets to pass variable names to data_filter(). See the examples section in the documentation of data_filter().

  • The regex argument was added to functions that use select-helpers and did not already have this argument.

  • Select helpers starts_with(), ends_with(), and contains() now accept several patterns, e.g. starts_with("Sep", "Petal").

  • Arguments select and exclude that are present in most functions have been improved to work in loops and in custom functions. For example, the following code now works:

library(datawizard)

# The select helper picks up a variable defined inside a custom function
foo <- function(data) {
  i <- "Sep"
  find_columns(data, select = starts_with(i))
}
foo(iris)

# The select helper picks up the loop variable
for (i in c("Sepal", "Sp")) {
  head(iris) |>
    find_columns(select = starts_with(i)) |>
    print()
}

  • There is now a vignette summarizing the various ways to select or exclude variables in most {datawizard} functions.

0.5.1

1 year ago

  • Fixes tests for {poorman} update