Tidyverse Haven Versions

Read SPSS, Stata and SAS files from R

v2.5.4

5 months ago
  • Fix for upcoming R-devel change.

v2.5.3

10 months ago
  • Fix for upcoming R-devel change.

v2.5.2

1 year ago
  • Updated to ReadStat 1.1.9.

    • Fix various SAS catalog file reading bugs (#529, #653, #680, #696, #705).
    • Increase maximum SAS page file size to 16MiB (#697).
    • Ignore invalid SAV timestamp strings (#683).
    • Fix compiler warnings (#707).
  • The experimental write_sas() function has been deprecated (#224). The sas7bdat file format is complex and undocumented, and as such writing SAS files is not officially supported by ReadStat. write_xpt() should be used instead - it produces files in the SAS transport format, which has limitations but will be reliably read by SAS.

  • write_*() functions gain a new adjust_tz argument to allow more control over time zone conversion for date-time variables (#702). Thanks to @jmobrien for the detailed issue and feedback.

    Stata, SPSS and SAS do not have a concept of time zone. Since haven 2.4.0, date-time values in non-UTC time zones are implicitly converted when writing so that the time displayed in Stata/SPSS/SAS matches the time displayed to the user in R (see #555). This is the behaviour when adjust_tz = TRUE (the default). Although this is in line with general user expectations, it can cause issues when the time zone is important, e.g. when looking at differences between time points, since the underlying numeric data is changed to preserve the displayed time. Use adjust_tz = FALSE to write the time as the corresponding UTC value, which will appear different to the user but preserves the underlying numeric data.

  • write_*() functions previously returned the data frame with minor alterations made to date-time variables. These functions now invisibly return the original input data frame unchanged (@jmobrien, #702).

  • Fix bug in string variable width calculation that treated NA values as width 2. NA values are now treated as blanks for width calculations (#699).
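The effect of adjust_tz can be seen by writing the same date-time twice and reading both files back. This is a minimal sketch: the data frame, the column name `when`, the time zone and the temporary paths are made up for illustration, and it assumes haven 2.5.2 or later.

```r
library(haven)

# Hypothetical example data: one date-time stored in a non-UTC time zone.
df <- data.frame(
  when = as.POSIXct("2023-06-01 09:00:00", tz = "Australia/Sydney")
)

path_adj <- tempfile(fileext = ".dta")
path_raw <- tempfile(fileext = ".dta")

# Default: preserve the clock time shown in R (09:00) in the written file.
write_dta(df, path_adj, adjust_tz = TRUE)

# Preserve the underlying instant: the file holds the equivalent UTC time.
write_dta(df, path_raw, adjust_tz = FALSE)

# Date-times are read back as UTC, so the two results differ by the
# UTC offset of the original time zone.
adj <- read_dta(path_adj)$when
raw <- read_dta(path_raw)$when
difftime(adj, raw, units = "hours")
```

If intervals between time points matter more than the displayed clock time, adjust_tz = FALSE is probably the safer choice.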

v2.5.1

1 year ago
  • All labelled() vectors now have left-aligned column headers when printing in tibbles for better alignment with labels (#676).

  • read_*() functions now accept functions as well as strings in the .name_repair argument, in line with the documentation. Previously they only supported string values (#684).

  • write_sav() variable name validation no longer treats all non-ASCII characters as invalid (#689).

v2.5.0

2 years ago

New author

  • @gorcha is now a haven author in recognition of his significant and sustained contributions.

File writing improvements

  • All write_*() functions can now write custom variable widths by setting the width attribute (#650).

  • When writing files, the minimum width for character variables is now 1. This fixes issues with statistical software reading blank character variables with width 0 (#650).

  • write_dta() now uses strL when strings are too long to be stored in an str# variable (#437). strL is used when strings are longer than 2045 characters by default, which matches Stata's behaviour, but this can be reduced with the strl_threshold argument.

  • write_xpt() can now write dataset labels with the label argument, which defaults to the label attribute of the input data frame, if present (#562).

  • write_sav() now checks for case-insensitive duplicate variable names (@juansebastianl, #641) and verifies that variable names are valid SPSS variables.

  • The compress argument for write_sav() now supports all 3 SPSS compression modes specified as a character string - "byte", "none" and "zsav" (#614). TRUE and FALSE can be used for backwards compatibility, and correspond to the "zsav" and "none" options respectively.

  • write_sav() successfully writes user missing values and ranges for labelled() integer vectors (#596).

  • POSIXct and POSIXlt values with no time component (e.g. "2010-01-01") were being converted to NA when attempting to convert the output timezone to UTC. These now output successfully (#634).

  • Fix bug in output timezone conversion that was causing variable labels and other variable attributes to disappear (#624).
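As a rough illustration of two of the new writing options above, the sketch below writes a small data frame with each SPSS compression mode and then lowers the strL threshold for Stata. The data frame, column names and paths are hypothetical, and the threshold of 100 is chosen only to make the example small.

```r
library(haven)

# Hypothetical example data with one comparatively long string.
df <- data.frame(
  id = 1:3,
  note = c("short", "a bit longer", strrep("x", 200)),
  stringsAsFactors = FALSE
)

# SPSS: pick one of the three compression modes by name.
sav_path <- tempfile(fileext = ".sav")
zsav_path <- tempfile(fileext = ".zsav")
write_sav(df, sav_path, compress = "byte")
write_sav(df, sav_path, compress = "none")
write_sav(df, zsav_path, compress = "zsav")

# Stata: use strL for strings longer than 100 characters rather than
# the default threshold of 2045.
dta_path <- tempfile(fileext = ".dta")
write_dta(df, dta_path, strl_threshold = 100)

# The strings should survive the round trip unchanged.
back <- read_dta(dta_path)
nchar(back$note)
```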

Other improvements and fixes

  • Updated to ReadStat 1.1.8 RC.

    • Fix bug when writing formats to XPT files (#650).
    • Fix off by one error in indexing for strL variables (#437).
  • labelled() vectors now throw a warning when combining two vectors with conflicting labels (#667).

  • zap_labels() gains a user_na argument to control whether user-defined missing values are converted to NA or left as is (#638).

  • vctrs casting and coercion generics now do less work when working with two identical labelled() vectors. This significantly improves performance when working with labelled() vectors in grouped data frames (#658).

  • Errors and warnings now use cli_abort() and cli_warn() (#661).
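The user_na argument of zap_labels() is easiest to see on a small labelled_spss() vector. In this sketch the values and labels are invented, with 99 standing in for a user-defined missing value.

```r
library(haven)

# Hypothetical survey item where 99 is a user-defined missing value.
x <- labelled_spss(
  c(1, 2, 99),
  labels = c(Yes = 1, No = 2, Refused = 99),
  na_values = 99
)

# Default (user_na = FALSE): user-defined missing values become NA.
dropped <- zap_labels(x)

# user_na = TRUE: 99 is kept as an ordinary value.
kept <- zap_labels(x, user_na = TRUE)

dropped
kept
```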

Dependency changes

  • R 3.4 is now the minimum supported version, in line with tidyverse policy.

  • cli >= 3.0.0 has been added to Imports to support new error messaging.

  • lifecycle has been added to Imports, and is now used to manage deprecations.

v2.4.3

2 years ago
  • Fix build failure on Solaris.

v2.4.2

2 years ago
  • Updated to ReadStat 1.1.7 RC (#620).

  • read_dta() no longer crashes if it sees StrL variables with missing values (@gorcha, #594, #600, #608).

  • write_dta() now correctly handles "labelled"-class numeric (double) variables that don't have value labels (@jmobrien, #606, #609).

  • write_dta() now allows variable names up to 32 characters (@sbae, #605).

  • Can now correctly combine labelled_spss() with identical labels (@gorcha, #599).

v2.4.1

3 years ago
  • Fix buglet when combining labelled() with identical labels.

v2.4.0

3 years ago

New features

  • labelled_spss() gains full vctrs support thanks to the hard work of @gorcha (#527, #534, #538, #557). This means that labelled_spss() vectors should now work seamlessly in dplyr 1.0.0, tidyr 1.0.0 and other packages that use vctrs.

  • labelled() vectors are more permissive when concatenating; output labels will be a combination of the left-hand and the right-hand side, preferring values assigned to the left-hand side (#543).

  • Date-times are no longer forced to UTC, but instead converted to the equivalent UTC (#555). This should ensure that you see the same date-time in R and in Stata/SPSS/SAS.
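The concatenation rule for labelled() vectors can be sketched with two vectors that disagree on the label for one value. The values and labels here are invented for illustration.

```r
library(haven)

# Two hypothetical vectors that label the value 2 differently.
x <- labelled(c(1, 2), labels = c(Good = 1, Bad = 2))
y <- labelled(c(2, 3), labels = c(Poor = 2, Awful = 3))

# Labels are merged, and the label for 2 is taken from the left-hand side.
# Note that haven 2.5.0 and later also warn about the conflicting label.
z <- c(x, y)
attr(z, "labels")
```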

Minor improvements and bug fixes

  • Updated to ReadStat 1.1.5. Most importantly this includes support for SAS binary compression.

  • as_factor(levels = "values") preserves values of unlabelled elements (#570).

  • labelled_spss() is a little stricter: it prevents na_range and na_value from containing missing values, and ensures that na_range is in the correct order (#574).

  • read_spss() now reads NA values and ranges of character variables (#409).

  • write_dta() now correctly writes tagged NAs (including tagged NAs in labels) (#583) and once again validates length of variable names (#485).

  • write_*() now validate file and variable metadata with ReadStat. This should prevent many invalid files from being written (#408). Additionally, validation failures now provide more details about the source of the problem (e.g. the column name of the problem) (#463).

  • write_sav(compress = FALSE) now uses SPSS bytecode compression instead of the rarely-used uncompressed mode. compress = TRUE continues to use the newer (and not universally supported, but more compact) zlib format (@oliverbock, #544).
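The stricter labelled_spss() checks can be sketched as follows. The values are invented, and the exact wording of the error message is not assumed here, only that an out-of-order na_range is rejected.

```r
library(haven)

# A valid range: 98 to 99 are treated as user-defined missing values.
ok <- labelled_spss(c(1, 2, 98, 99), na_range = c(98, 99))
is.na(ok)

# A range given in the wrong order is rejected with an error; capture the
# message rather than letting it stop the script.
bad <- tryCatch(
  labelled_spss(c(1, 2, 98, 99), na_range = c(99, 98)),
  error = function(e) conditionMessage(e)
)
bad
```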

v2.3.1

3 years ago
  • Add missing methods so median(), quantile() and summary() work once more (#520).

  • Add missing cast methods (#522).