Datawizard

Magic potions to clean and transform your data 🧙

0.5.0

1 year ago

MAJOR CHANGES

  • The following statistical transformation functions have been renamed to drop the data_*() prefix. The old names were misleading, because these functions do not work exclusively with data frames and are typically used with vectors first:

    • data_cut() -> categorize()
    • data_recode() -> change_code()
    • data_shift() -> slide()
    • data_reverse() -> reverse()
    • data_rescale() -> rescale()
    • data_to_factor() -> to_factor()
    • data_to_numeric() -> to_numeric()

    Note that these functions also have .data.frame() methods and still work for data frames as well. Former function names are still available as aliases, but will be deprecated and removed in a future release.

  • The minimum required R version is now 3.5.

  • Removed deprecated function data_findcols(). Please use its replacement, data_find().

  • Removed alias extract() for data_extract() function since it collided with tidyr::extract().

  • Argument training_proportion in data_partition() is deprecated. Please use proportion now.

  • Given his continued and significant contributions to the package, Etienne Bacher (@etiennebacher) is now included as an author.

  • unstandardise() now works for center(x)

  • unnormalize() now works for change_scale(x)

  • reshape_wider() now follows tidyr::pivot_wider() syntax more consistently. Arguments colnames_from, sep, and rows_from are deprecated and should be replaced by names_from, names_sep, and id_cols respectively. reshape_wider() also gains an argument names_glue (#182, #198).

  • Similarly, reshape_longer() now follows tidyr::pivot_longer() syntax more consistently. Argument colnames_to is deprecated and should be replaced by names_to. reshape_longer() also gains new arguments: names_prefix, names_sep, names_pattern, and values_drop_na (#189).
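
To illustrate what the names_from and id_cols arguments mentioned above refer to, here is a minimal, language-neutral sketch of a long-to-wide reshape, written in Python. It is not datawizard's R implementation: the function pivot_wider(), its values_from parameter (borrowed from tidyr's vocabulary), and the example data are assumptions made for illustration only.

```python
def pivot_wider(rows, id_cols, names_from, values_from):
    """Spread long-format rows into one wide row per id combination.

    Hypothetical helper: `rows` is a list of dicts, `id_cols` a list of keys
    identifying a unit, `names_from` the key whose values become new column
    names, and `values_from` the key whose values fill those columns.
    """
    wide = {}
    for row in rows:
        key = tuple(row[c] for c in id_cols)
        # Start a new wide row holding only the id columns.
        out = wide.setdefault(key, {c: row[c] for c in id_cols})
        # The value in `names_from` becomes the new column name.
        out[str(row[names_from])] = row[values_from]
    return list(wide.values())


long_rows = [
    {"id": 1, "time": "t1", "score": 10},
    {"id": 1, "time": "t2", "score": 12},
    {"id": 2, "time": "t1", "score": 7},
]
wide_rows = pivot_wider(long_rows, id_cols=["id"], names_from="time", values_from="score")
```

In this sketch, combinations that never occur in the long data (here id 2 at t2) are simply absent from the wide row; a real implementation would typically fill them with a missing value.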

CHANGES

  • Some of the text formatting helpers (like text_concatenate()) gain an enclose argument, to wrap text elements with surrounding characters.

  • winsorize() now accepts "raw" and "zscore" methods (in addition to "percentile"). Additionally, when robust is set to TRUE together with method = "zscore", the function winsorizes via the median and median absolute deviation (MAD); otherwise it uses the mean and standard deviation. (@rempsyc, #177, #49, #47).

  • data_partition() can now create multiple partitions from the data, returning multiple training sets and one remaining test set.

  • Functions like center(), normalize() or standardize() no longer fail when data contains infinite values (Inf).
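
The "zscore" winsorizing described above can be sketched as follows. This is a language-neutral illustration in Python, not datawizard's R code. The function name winsorize_zscore(), the clamping of out-of-range values to the bounds, and the 1.4826 scaling constant for the MAD (the default of R's mad()) are assumptions.

```python
from statistics import mean, median, stdev


def winsorize_zscore(values, threshold=3.0, robust=False):
    """Clamp values lying more than `threshold` scale units from the center.

    Hypothetical helper: with robust=True the center is the median and the
    scale is the (scaled) MAD; otherwise mean and standard deviation are used.
    """
    if robust:
        center = median(values)
        # Assumption: 1.4826 makes the MAD comparable to the SD for normal data.
        scale = 1.4826 * median(abs(v - center) for v in values)
    else:
        center = mean(values)
        scale = stdev(values)
    lower = center - threshold * scale
    upper = center + threshold * scale
    # Values outside [lower, upper] are replaced by the nearest bound.
    return [min(max(v, lower), upper) for v in values]


data = [1, 2, 3, 4, 5, 100]
robust_result = winsorize_zscore(data, threshold=3.0, robust=True)
classic_result = winsorize_zscore(data, threshold=3.0, robust=False)
```

With this example data, the robust variant pulls the outlier in, while the mean/SD variant leaves it untouched because the outlier itself inflates the standard deviation.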

NEW FUNCTIONS

  • row_to_colnames() and colnames_to_row() to move a row to column names, and column names to row (@etiennebacher, #169).

BUG FIXES

  • Fixed wrong column names in data_to_wide() (#173).

0.4.1

2 years ago

BREAKING CHANGES

  • Added the standardize.default() method (moved from the effectsize package), so that the default method now lives in the same package as the generic. standardize.default() behaves exactly as it did in effectsize and, in particular, works for regression model objects. effectsize now re-exports standardize() from datawizard.

NEW FUNCTIONS

  • data_shift() to shift the value range of numeric variables.

  • data_recode() to recode old into new values.

  • data_to_factor() as counterpart to data_to_numeric().

  • data_tabulate() to create frequency tables of variables.

  • data_read() to read (import) data files (from text, or foreign statistical packages).

  • unnormalize() as counterpart to normalize(). This function only works for variables that have been normalized with normalize().

  • data_group() and data_ungroup() to create grouped data frames, or to remove the grouping information from grouped data frames.
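
The restriction noted for unnormalize() above follows from how min-max scaling works: the original minimum and range are needed to reverse it. A minimal Python sketch of that round trip is shown below. It is not datawizard's R code; the function names mirror the R ones only for readability, and returning the scaling information as a separate dictionary is an assumption made for this example.

```python
def normalize(values):
    """Rescale values to the 0-1 range and return the information needed to undo it."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        raise ValueError("cannot normalize a constant variable")
    scaled = [(v - lo) / span for v in values]
    # Without the original minimum and range the scaling cannot be reversed.
    return scaled, {"min": lo, "range": span}


def unnormalize(scaled, info):
    """Reverse normalize() using the stored minimum and range."""
    return [s * info["range"] + info["min"] for s in scaled]


original = [2.0, 4.0, 10.0]
scaled, info = normalize(original)
restored = unnormalize(scaled, info)
```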

CHANGES

  • data_find() was added as an alias for find_columns(), to have consistent name patterns for the datawizard functions. data_findcols() will be removed in a future update and its usage is discouraged.

  • The select argument (and thus, also the exclude argument) now also accepts functions testing for logical conditions, e.g. is.numeric() (or is.numeric), or any user-defined function that selects the variables for which the function returns TRUE (like: foo <- function(x) mean(x) > 3).

  • Arguments select and exclude now allow the negation of select-helpers, like -ends_with(""), -is.numeric or -Sepal.Width:Petal.Length.

  • Many functions now have a .default method to catch unsupported classes. This method emits a message and returns the original input, so the .data.frame methods no longer stop with an error.

  • The filter argument in data_filter() can also be a numeric vector, to indicate row indices of those rows that should be returned.

  • convert_to_na() gets methods for variables of class logical and Date.

  • convert_to_na() for factors (and data frames) gains a drop_levels argument, to drop unused levels that have been replaced by NA.

  • data_to_numeric() gains two more arguments, preserve_levels and lowest, to give better control of conversion of factors.
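
The idea behind function-based selection and its negation, as described for the select and exclude arguments above, can be sketched in Python as follows. This is a language-neutral illustration, not datawizard's R implementation; select_columns(), is_numeric(), mean_above_3() and the example data are hypothetical.

```python
def is_numeric(column):
    """Predicate: True if every value in the column is a number."""
    return all(isinstance(v, (int, float)) for v in column)


def mean_above_3(column):
    """User-defined predicate, analogous to foo <- function(x) mean(x) > 3."""
    return is_numeric(column) and sum(column) / len(column) > 3


def select_columns(data, predicate, negate=False):
    """Return the names of columns for which the predicate holds (or not, if negated)."""
    return [name for name, col in data.items() if bool(predicate(col)) != negate]


data = {
    "Sepal.Length": [5.1, 4.9],
    "Petal.Width": [0.2, 0.2],
    "Species": ["setosa", "setosa"],
}
numeric_cols = select_columns(data, is_numeric)
non_numeric_cols = select_columns(data, is_numeric, negate=True)
large_mean_cols = select_columns(data, mean_above_3)
```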

BUG FIXES

  • When logicals were passed to center() or standardize() and force = TRUE, these were not properly converted to numeric variables.

0.4.0

2 years ago

MAJOR CHANGES

  • data_match() now returns filtered data by default. The old behavior (returning row indices) can be restored by setting return_indices = TRUE.

  • The following functions are now re-exported from the {insight} package: object_has_names(), object_has_rownames(), is_empty_object(), compact_list(), compact_character().

  • data_findcols() will become deprecated in future updates. Please use the new replacements find_columns() and get_columns().

  • The vignette Analysing Longitudinal or Panel Data has moved to the parameters package.

NEW FUNCTIONS

  • To convert rownames to a column, and vice versa: rownames_as_column() and column_as_rownames() (@etiennebacher, #80).

  • find_columns() and get_columns() to find column names or retrieve subsets of data frames, based on various select-methods (including select-helpers). These functions will supersede data_findcols() in the future.

  • data_filter() as complement for data_match(), which works with logical expressions for filtering rows of data frames.

  • For computing weighted centrality measures and dispersion: weighted_mean(), weighted_median(), weighted_sd() and weighted_mad().

  • To replace NA in vectors and data frames: convert_na_to() (@etiennebacher, #111).
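
As an illustration of the weighted centrality measures listed above, the following Python sketch computes a weighted mean and a weighted median. It is not datawizard's R code. Several definitions of the weighted median exist; the one used here (the smallest value at which the cumulative weight reaches half of the total weight) is an assumption and may differ from what weighted_median() returns in edge cases.

```python
def weighted_mean(values, weights):
    """Sum of value * weight divided by the sum of weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def weighted_median(values, weights):
    """Smallest value at which the cumulative weight reaches half the total weight."""
    half = sum(weights) / 2
    cumulative = 0
    for value, weight in sorted(zip(values, weights)):
        cumulative += weight
        if cumulative >= half:
            return value
    raise ValueError("values must not be empty")


x = [1, 2, 3, 4]
w = [1, 1, 1, 5]
wm = weighted_mean(x, w)
wmed = weighted_median(x, w)
```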

MINOR CHANGES

  • The select argument in several functions (like data_remove(), reshape_longer(), or data_extract()) now allows the use of select-helpers for selecting variables based on specific patterns.

  • data_extract() gains new arguments to allow type-safe return values, i.e. always return a vector or a data frame. Thus, data_extract() can now be used to select multiple variables or pull a single variable from data frames.

  • data_match() gains a match argument, to indicate with which logical operation matching results should be combined.

  • Improved support for labelled data for many functions, i.e. returned data frame will preserve value and variable label attributes, where possible and applicable.

  • describe_distribution() now works with lists (@etiennebacher, #105).

  • data_rename() doesn't use pattern anymore to rename the columns if replacement is not provided (@etiennebacher, #103).

  • data_rename() now adds a suffix to duplicated names in replacement (@etiennebacher, #103).

BUG FIXES

  • data_to_numeric() produced wrong results for factors when dummy_factors = TRUE and factor contained missing values.

  • data_match() produced wrong results when data contained missing values.

  • Fixed CRAN check issues in data_extract() when more than one variable was extracted from a data frame.

0.3.0

2 years ago
  • New functions:

    • To find or remove empty rows and columns in a data frame: empty_rows(), empty_columns(), remove_empty_rows(), remove_empty_columns(), and remove_empty().

    • To check for names: object_has_names() and object_has_rownames().

    • To rotate data frames: data_rotate().

    • To reverse score variables: data_reverse().

    • To merge/join multiple data frames: data_merge() (or its alias data_join()).

    • To cut (recode) data into groups: data_cut().

    • To replace specific values with NAs: convert_to_na().

    • To replace Inf and NaN values with NAs: replace_nan_inf().

  • Arguments cols, before and after in data_relocate() can now also be numeric values, indicating the position of the destination column.

0.2.3

2 years ago
  • New functions:

    • to work with lists: is_empty_object() and compact_list()

    • to work with strings: compact_character()

0.2.2

2 years ago
  • New function data_extract() (or its alias extract()) to pull single variables from a data frame, possibly naming each value by the row names of that data frame.

  • reshape_ci() gains a ci_type argument, to reshape data frames where CI-columns have prefixes other than "CI".

  • standardize() and center() gain arguments center and scale, to define references for centrality and deviation that are used when centering or standardizing variables.

  • center() gains the arguments force and reference, similar to standardize().

  • The functionality of the append argument in center() and standardize() was revised. This made the suffix argument redundant, and thus it was removed.

  • Fixed issue in standardize().

  • Fixed issue in data_findcols().

0.2.1

2 years ago
  • Exports plot method for visualisation_recipe() objects from {see} package.

  • centre(), standardise(), unstandardise() are exported as aliases for center(), standardize(), unstandardize(), respectively.

0.2.0.1

2 years ago
  • This is mainly a maintenance release that addresses some issues with conflicting namespaces.

0.2.0

2 years ago
  • New function: visualisation_recipe().

  • The following function has moved to the performance package: check_multimodal().

  • Minor updates to documentation, including a new vignette about demean().

0.1.0

2 years ago
  • First release.