Fast reading of delimited files
It is now possible (again?) to read from a list of connections (@bairdj, #514).
Internal change for compatibility with cpp11 >= 0.4.6 (@DavisVaughan, #512).
str() now works in a colorized context in the presence of a column of class integer64, i.e. parsed with col_big_integer() (@bart1, #477).
The embedded implementation of the Grisu algorithm for printing floating point numbers now uses snprintf() instead of sprintf(), and likewise for vroom's own code (@jeroen, #480).
vroom(col_select=) now handles column selection by numeric position when an id column is provided (#455).
vroom(id = "path", col_select = a:c) is treated like vroom(id = "path", col_select = c(path, a:c)). If an id column is provided, it is automatically included in the output (#416).
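A minimal sketch of the id and col_select interaction described above, assuming a vroom version that includes this change. The file names and data are hypothetical.

```r
library(vroom)

# Hypothetical example data: two small CSV files in a temporary directory.
dir <- tempfile("vroom-id-")
dir.create(dir)
files <- file.path(dir, c("one.csv", "two.csv"))
writeLines(c("a,b,c,d", "1,2,3,4"), files[[1]])
writeLines(c("a,b,c,d", "5,6,7,8"), files[[2]])

# Select a:c by name; the id column is requested separately through `id`.
out <- vroom(files, delim = ",", id = "path", col_select = a:c,
             show_col_types = FALSE)

# Per the entry above, the id column should be kept alongside a, b and c,
# while d is dropped.
names(out)
```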
vroom_write(append = TRUE) does not modify an existing file when appending an empty data frame. In particular, it does not overwrite (delete) the existing contents of that file (https://github.com/tidyverse/readr/issues/1408, #451).
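One way to check the append behaviour, sketched under the assumption that the installed vroom includes this fix. The data frame contents are hypothetical.

```r
library(vroom)

f <- tempfile(fileext = ".tsv")

# Write two rows, then append a data frame with zero rows.
vroom_write(data.frame(x = 1:2), f)
before <- readLines(f)

vroom_write(data.frame(x = integer()), f, append = TRUE)
after <- readLines(f)

# The file contents should be unchanged by the empty append.
identical(before, after)
```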
vroom::problems() now defaults to .Last.value for its primary input, similar to how readr::problems() works (#443).
The warning that indicates the existence of parsing problems has been improved, which should make it easier for the user to follow up (https://github.com/tidyverse/readr/issues/1322).
vroom() reads more reliably from filepaths containing non-ASCII characters in a non-UTF-8 locale (#394, #438).
vroom_format() and vroom_write() only quote values that contain a delimiter, quote, or newline. Specifically, values that are equal to the na string (or that start with it) are no longer quoted (#426).
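A small sketch of the quoting rule, assuming a vroom version with this change and the default na = "NA". The column name and values are hypothetical.

```r
library(vroom)

# One value contains the delimiter, one equals the na string, one is plain.
df <- data.frame(x = c("a,b", "NA", "plain"), stringsAsFactors = FALSE)

txt <- vroom_format(df, delim = ",")

# Only the value containing the delimiter should appear in quotes.
cat(txt)
```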
Fixed segfault when reading in multiple files and the first file has only a header row of column names, but subsequent files have at least one row (#430).
Fixed segfault when vroom_format() is given an empty data frame (#425).
Fixed a segfault that could occur when the final field of the final line is missing and the file also does not end in a newline (#429).
Fixed recursive garbage collection error that could occur during vroom_write() when output_column() generates an ALTREP vector (#389).
vroom_progress() uses rlang::is_interactive() instead of base::interactive().
col_factor(levels = NULL) honors the na strings of vroom() and its own include_na argument, as described in the docs, and now reproduces the behaviour of readr's first edition parser (#396).
Jenny Bryan is now the official maintainer.
Fix uninitialized bool detected by CRAN's UBSAN check (https://github.com/r-lib/vroom/pull/386)
Fix buffer overflow when trying to parse an integer field that is over 64 characters long (https://github.com/tidyverse/readr/issues/1326)
Fix subset indexing when indexes span a file boundary multiple times (#383)
vroom(col_select=) now works if col_names = FALSE as intended (#381)
vroom(n_max=) now correctly handles cases when reading from a connection and the file does not end with a newline (https://github.com/tidyverse/readr/issues/1321)
vroom() no longer issues a spurious warning when the parsing needs to be restarted due to the presence of embedded newlines (https://github.com/tidyverse/readr/issues/1313)
Fix performance issue when materializing subsetted vectors (#378)
vroom_format() now uses the same internal multi-threaded code as vroom_write(), improving its performance in most cases (#377)
vroom_fwf() no longer omits the last line if it does not end with a newline (https://github.com/tidyverse/readr/issues/1293)
Empty files or files with only a header line and no data no longer cause a crash if read with multiple files (https://github.com/tidyverse/readr/issues/1297)
Files with a header but no contents, or an empty file if col_names = FALSE, no longer cause a hang when progress = TRUE (https://github.com/tidyverse/readr/issues/1297)
Commented lines with comments at the end of lines no longer hang R (https://github.com/tidyverse/readr/issues/1309)
Comment lines containing unpaired quotes are no longer treated as unterminated quotations (https://github.com/tidyverse/readr/issues/1307)
Values with only an Inf or NaN prefix but additional data afterwards, like Inform, are no longer inappropriately guessed as doubles (https://github.com/tidyverse/readr/issues/1319)
Time types now support the %h format to denote hour durations greater than 24, like readr (https://github.com/tidyverse/readr/issues/1312)
vroom() now supports files with only carriage return newlines (\r) (#360, https://github.com/tidyverse/readr/issues/1236)
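As an illustration, the following sketch writes a file whose lines end in a bare carriage return and reads it back. It assumes a vroom version with this support, and the file contents are hypothetical.

```r
library(vroom)

f <- tempfile(fileext = ".csv")

# writeBin() avoids any newline translation, so the file contains only \r.
writeBin(charToRaw("a,b\r1,2\r3,4\r"), f)

out <- vroom(f, delim = ",", show_col_types = FALSE)

# With \r recognised as the line ending, there should be two data rows.
nrow(out)
```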
vroom() now parses single-digit datetimes more consistently, as readr has done (https://github.com/tidyverse/readr/issues/1276)
vroom() now parses Inf values as doubles (https://github.com/tidyverse/readr/issues/1283)
vroom() now parses NaN values as doubles (https://github.com/tidyverse/readr/issues/1277)
VROOM_CONNECTION_SIZE is now parsed as a double, which supports scientific notation (#364)
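A sketch of setting the connection buffer size in scientific notation before reading from a connection. The value "1e6" and the file contents are arbitrary choices for illustration, not recommendations.

```r
library(vroom)

f <- tempfile(fileext = ".csv")
writeLines(c("a,b", "1,2", "3,4"), f)

# Remember any existing setting so it can be restored afterwards.
old <- Sys.getenv("VROOM_CONNECTION_SIZE", unset = NA)
Sys.setenv(VROOM_CONNECTION_SIZE = "1e6")

# Reading through a connection is where the buffer size applies.
out <- vroom(file(f), delim = ",", show_col_types = FALSE)

if (is.na(old)) {
  Sys.unsetenv("VROOM_CONNECTION_SIZE")
} else {
  Sys.setenv(VROOM_CONNECTION_SIZE = old)
}

nrow(out)
```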
vroom() now works around specifying a \n as the delimiter (#365, https://github.com/tidyverse/dplyr/issues/5977)
vroom() no longer crashes if given col_names and col_types that are both shorter than the number of columns (https://github.com/tidyverse/readr/issues/1271)
vroom() no longer hangs if given an empty value for locale(grouping_mark=) (https://github.com/tidyverse/readr/issues/1241)
Fix performance regression when guessing with large numbers of rows (https://github.com/tidyverse/readr/issues/1267)
vroom(col_types=) now accepts column type names like those accepted by utils::read.table(), e.g. vroom::vroom(col_types = list(a = "integer", b = "double", c = "skip"))
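The call in the entry above has no input file. A runnable variant, using hypothetical inline data passed through I(), might look like this:

```r
library(vroom)

# Hypothetical inline data: three columns, the last of which is skipped.
out <- vroom(
  I("a,b,c\n1,2.5,x\n3,4.5,y\n"),
  delim = ",",
  col_types = list(a = "integer", b = "double", c = "skip")
)

# Column c should be absent; a and b should have the requested types.
str(out)
```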
vroom() now respects the quote parameter properly in the first two lines of the file (https://github.com/tidyverse/readr/issues/1262)
vroom_write() now always correctly writes its output including column names in UTF-8 (https://github.com/tidyverse/readr/issues/1242)
vroom_write() now creates an empty file when given an input without any columns (https://github.com/tidyverse/readr/issues/1234)
vroom(col_types=) now truncates the column types if the user passes too many types (#355)
vroom() now always includes the last row when guessing (#352)
vroom(trim_ws = TRUE) now trims field content within quotes as well as without (#354).
Previously vroom explicitly retained field content inside quotes regardless of the value of trim_ws.
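A short sketch of the trimming behaviour, assuming a vroom version with this change. The inline data is hypothetical: one quoted and one unquoted field, each padded with spaces.

```r
library(vroom)

out <- vroom(
  I('a,b\n" x ", y \n'),
  delim = ",",
  trim_ws = TRUE,
  col_types = "cc"
)

# Both the quoted field (a) and the unquoted field (b) should come back
# without surrounding whitespace.
as.character(out$a)
as.character(out$b)
```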