dplyr: A grammar of data manipulation

v0.8.1

5 years ago

dplyr 0.8.1

Breaking changes

  • group_modify() is the new name of the function previously known as group_map()

New functions

  • group_map() now only calls the function on each group and returns a list.

  • group_by_drop_default(), previously known as dplyr:::group_drops() is exported (#4245).
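
The split between the two functions can be sketched as follows. This is a minimal illustration that assumes dplyr 0.8.1 or later and the built-in mtcars data; the variable names are chosen here for the example only.

```r
library(dplyr)

by_cyl <- group_by(mtcars, cyl)

# group_map(): the function is called once per group and the results
# are returned as a plain list (mtcars has 3 distinct cyl values)
heads_list <- group_map(by_cyl, ~ head(.x, 2L))

# group_modify(): the per-group results are combined back into a
# grouped data frame
heads_df <- group_modify(by_cyl, ~ head(.x, 2L))
```

Code written against 0.8.0 that relied on group_map() returning a data frame would likely need to switch to group_modify().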

Minor changes

  • Lists of formulas passed to colwise verbs are now automatically named.

  • group_by() does a shallow copy even in the no groups case (#4221).

  • Fixed mutate() on rowwise data frames with 0 rows (#4224).

  • Fixed handling of bare formulas in colwise verbs (#4183).

  • Fixed performance of n_distinct() (#4202).

  • group_indices() now ignores empty groups by default for data.frame, which is consistent with the default of group_by() (@yutannihilation, #4208).

  • Fixed integer overflow in hybrid ntile() (#4186).

  • colwise functions summarise_at() ... can rename vars in the case of multiple functions (#4180).

  • select_if() and rename_if() handle logical vector predicate (#4213).

  • hybrid min() and max() cast to integer when possible (#4258).

  • bind_rows() correctly handles the cases where there are multiple consecutive NULL (#4296).

  • Support for R 3.1.* has been dropped. The minimal R version supported is now 3.2.0. https://www.tidyverse.org/articles/2019/04/r-version-support/

  • rename_at() handles empty selection (#4324).

v0.8.0.1

5 years ago

dplyr 0.8.0.1 (2019-02-15)

  • Fixed integer C/C++ division; release forced by CRAN (#4185).

v0.8.0

5 years ago

dplyr 0.8.0

Breaking changes

  • The error could not find function "n" or the warning Calling `n()` without importing or prefixing it is deprecated, use `dplyr::n()` indicates that functions like n(), row_number(), ... are not imported or prefixed.

    The easiest fix is to import dplyr with import(dplyr) in your NAMESPACE or #' @import dplyr in a roxygen comment. Alternatively, such functions can be imported selectively, like any other function, with importFrom(dplyr, n) in the NAMESPACE or #' @importFrom dplyr n in a roxygen comment. The third option is to prefix them, i.e. use dplyr::n().

  • If you see checking S3 generic/method consistency in R CMD check for your package, note that:

    • sample_n() and sample_frac() have gained ...
    • filter() and slice() have gained .preserve
    • group_by() has gained .drop
  • Error: `.data` is a corrupt grouped_df, ... signals code that makes wrong assumptions about the internals of a grouped data frame.

New functions

  • New selection helper group_cols(). It can be called in selection contexts such as select() and matches the grouping variables of grouped tibbles.

  • last_col() is re-exported from tidyselect (#3584).

  • group_trim() drops unused levels of factors that are used as grouping variables.

  • nest_join() creates a list column of the matching rows. nest_join() + tidyr::unnest() is equivalent to inner_join() (#3570).

    band_members %>% 
      nest_join(band_instruments)
    
  • group_nest() is similar to tidyr::nest() but focusing on the variables to nest by instead of the nested columns.

    starwars %>%
      group_by(species, homeworld) %>% 
      group_nest()
    
    starwars %>%
      group_nest(species, homeworld)
    
  • group_split() is similar to base::split() but operates on existing groups when applied to a grouped data frame, or is subject to the data mask on ungrouped data frames.

    starwars %>%
      group_by(species, homeworld) %>%   
      group_split()
    
    starwars %>%
      group_split(species, homeworld)
    
  • group_map() and group_walk() are purrr-like functions to iterate on groups of a grouped data frame, jointly identified by the data subset (exposed as .x) and the data key (a one row tibble, exposed as .y). group_map() returns a grouped data frame that combines the results of the function; group_walk() is only used for side effects and returns its input invisibly.

    mtcars %>%
      group_by(cyl) %>%
      group_map(~ head(.x, 2L))
    
  • distinct_prepare(), previously known as distinct_vars() is exported. This is mostly useful for alternative backends (e.g. dbplyr).
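
The helpers above that come without an example can be sketched like this. It is an illustration only, assuming dplyr 0.8.0 or later and the built-in starwars data; the small tibble and the variable names are made up for the example.

```r
library(dplyr)

by_species <- group_by(starwars, species, homeworld)

# group_cols() selects exactly the grouping variables
grouping_only <- select(by_species, group_cols())

# last_col() selects the last column of the data
last_only <- select(starwars, last_col())

# group_trim() drops factor levels that no group uses:
# level "c" never occurs in f, so it is removed
trimmed <- tibble(
  x = 1:2,
  f = factor(c("a", "b"), levels = c("a", "b", "c"))
) %>%
  group_by(f) %>%
  group_trim()

levels(trimmed$f)
```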

Major changes

  • group_by() gains the .drop argument. When set to FALSE the groups are generated based on factor levels, hence some groups may be empty (#341).

    # 3 groups
    tibble(
      x = 1:2, 
      f = factor(c("a", "b"), levels = c("a", "b", "c"))
    ) %>% 
      group_by(f, .drop = FALSE)
    
    # the order of the grouping variables matters
    df <- tibble(
      x = c(1,2,1,2), 
      f = factor(c("a", "b", "a", "b"), levels = c("a", "b", "c"))
    )
    df %>% group_by(f, x, .drop = FALSE)
    df %>% group_by(x, f, .drop = FALSE)
    

    The default behaviour drops the empty groups as in the previous versions.

    tibble(
      x = 1:2, 
      f = factor(c("a", "b"), levels = c("a", "b", "c"))
    ) %>% 
      group_by(f)
    
  • filter() and slice() gain a .preserve argument to control which groups they should keep. The default filter(.preserve = FALSE) recalculates the grouping structure based on the resulting data; otherwise it is kept as is.

    df <- tibble(
      x = c(1,2,1,2), 
      f = factor(c("a", "b", "a", "b"), levels = c("a", "b", "c"))
    ) %>% 
      group_by(x, f, .drop = FALSE)
    
    df %>% filter(x == 1)
    df %>% filter(x == 1, .preserve = TRUE)
    
  • The notion of lazily grouped data frames has disappeared. All dplyr verbs now immediately recalculate the grouping structure, and respect the levels of factors.

  • Subsets of columns now properly dispatch to the [ or [[ method when the column is an object (a vector with a class) instead of making assumptions on how the column should be handled. The [ method must handle integer indices, including NA_integer_, i.e. x[NA_integer_] should produce a vector of the same class as x with whatever represents a missing value.
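
For authors of classed vectors, the requirement on the [ method can be sketched in base R. The "percent" class and its constructor below are hypothetical and exist only for this illustration; they are not part of dplyr.

```r
# Hypothetical "percent" class, used only for illustration
new_percent <- function(x) structure(x, class = "percent")

# The `[` method subsets the underlying data and restores the class,
# so integer indices (including NA_integer_) keep the class intact
`[.percent` <- function(x, i, ...) {
  new_percent(unclass(x)[i])
}

p <- new_percent(c(0.1, 0.5))

# Indexing with NA_integer_ yields a length-one "percent" holding NA
missing_elt <- p[NA_integer_]
second_elt <- p[2L]
```

A method that returned a bare numeric vector here would lose the class whenever dplyr subsets the column.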

Minor changes

  • tally() works correctly on non-data frame table sources such as tbl_sql (#3075).

  • sample_n() and sample_frac() can use n() (#3527).

  • distinct() respects the order of the variables provided (#3195, @foo-bar-baz-qux) and handles the 0 rows and 0 columns special case (#2954).

  • combine() uses tidy dots (#3407).

  • group_indices() can be used without argument in expressions in verbs (#1185).

  • Using mutate_all(), transmute_all(), mutate_if() and transmute_if() with grouped tibbles now informs you that the grouping variables are ignored. In the case of the _all() verbs, the message invites you to use mutate_at(df, vars(-group_cols())) (or the equivalent transmute_at() call) instead if you'd like to make it explicit in your code that the operation is not applied on the grouping variables.

  • Scoped variants of arrange() respect the .by_group argument (#3504).

  • first() and last() hybrid functions fall back to R evaluation when given no arguments (#3589).

  • mutate() removes a column when the expression evaluates to NULL for all groups (#2945).

  • grouped data frames support [, drop = TRUE] (#3714).

  • New low-level constructor new_grouped_df() and validator validate_grouped_df() (#3837).

  • glimpse() prints group information on grouped tibbles (#3384).

  • sample_n() and sample_frac() gain ... (#2888).

  • Scoped filter variants now support functions and purrr-like lambdas:

    mtcars %>% filter_at(vars(hp, vs), ~ . %% 2 == 0)
    

Lifecycle

  • do(), rowwise() and combine() are questioning (#3494).

  • funs() is soft-deprecated and will start issuing warnings in a future version.

Changes to column wise functions

  • Scoped variants for distinct(): distinct_at(), distinct_if(), distinct_all() (#2948).

  • summarise_at() excludes the grouping variables (#3613).

  • mutate_all(), mutate_at(), summarise_all() and summarise_at() handle utf-8 names (#2967).

Performance

  • R expressions that cannot be handled with native code are now evaluated with unwind-protection when available (on R 3.5 and later). This improves the performance of dplyr on data frames with many groups (and hence many expressions to evaluate). We benchmarked that computing a grouped average is consistently twice as fast with unwind-protection enabled.

    Unwind-protection also makes dplyr more robust in corner cases because it ensures the C++ destructors are correctly called in all circumstances (debugger exit, captured condition, restart invocation).

  • Improved performance for wide tibbles (#3335).

  • Faster hybrid sum(), mean(), var() and sd() for logical vectors (#3189).

  • Hybrid version of sum(na.rm = FALSE) exits early when there are missing values. This considerably improves performance when there are missing values early in the vector (#3288).

  • group_by() does not trigger the additional mutate() on simple uses of the .data pronoun (#3533).

Internal

  • The grouping metadata of grouped data frames has been reorganized into a single tidy tibble that can be accessed with the new group_data() function. The grouping tibble consists of one column per grouping variable, followed by a list column of the (1-based) indices of the groups. The new group_rows() function retrieves that list of indices (#3489).

    # the grouping metadata, as a tibble
    group_by(starwars, homeworld) %>% 
      group_data()
    
    # the indices
    group_by(starwars, homeworld) %>% 
      group_data() %>% 
      pull(.rows)
    
    group_by(starwars, homeworld) %>% 
      group_rows()
    
  • Hybrid evaluation has been completely redesigned for better performance and stability.

Documentation

  • Add documentation example for moving variable to back in ?select (#3051).

  • Column-wise functions are better documented, in particular explaining when grouping variables are included as part of the selection.

v0.7.6

5 years ago
  • exprs() is no longer exported to avoid conflicts with Biobase::exprs() (#3638).

  • The MASS package is explicitly suggested to fix CRAN warnings on R-devel (#3657).

  • Set operations like intersect() and setdiff() reconstruct groups metadata (#3587).

  • Using namespaced calls to base::sort() and base::unique() from C++ code to avoid ambiguities when these functions are overridden (#3644).

  • Fix rchk errors (#3693).

v0.7.5

6 years ago

Breaking changes for package developers

  • The major change in this version is that dplyr now depends on the selecting backend of the tidyselect package. If you have been linking to dplyr::select_helpers documentation topic, you should update the link to point to tidyselect::select_helpers.

  • Another change that causes warnings in packages is that dplyr now exports the exprs() function. This causes a collision with Biobase::exprs(). Either import functions from dplyr selectively rather than in bulk, or do not import Biobase::exprs() and refer to it with a namespace qualifier.

Bug fixes

  • distinct(data, "string") now returns a one-row data frame again. (The previous behavior was to return the data unchanged.)

  • do() operations with more than one named argument can access . (#2998).

  • Reindexing grouped data frames (e.g. after filter() or ..._join()) never updates the "class" attribute. This also avoids unintended updates to the original object (#3438).

  • Fixed rare column name clash in ..._join() with non-join columns of the same name in both tables (#3266).

  • Fix ntile() and row_number() ordering to use the locale-dependent ordering functions in R when dealing with character vectors, rather than always using the C-locale ordering function in C (#2792, @foo-bar-baz-qux).

  • Summaries of summaries (such as summarise(b = sum(a), c = sum(b))) are now computed using standard evaluation for simplicity and correctness, but slightly slower (#3233).

  • Fixed summarise() for empty data frames with zero columns (#3071).

Major changes

  • enexpr(), expr(), exprs(), sym() and syms() are now exported. sym() and syms() construct symbols from strings or character vectors. The expr() variants are equivalent to quo(), quos() and enquo() but return simple expressions rather than quosures. They support quasiquotation.

  • dplyr now depends on the new tidyselect package to power select(), rename(), pull() and their variants (#2896). Consequently select_vars(), select_var() and rename_vars() are soft-deprecated and will start issuing warnings in a future version.

    Following the switch to tidyselect, select() and rename() fully support character vectors. You can now unquote variables like this:

    vars <- c("disp", "cyl")
    select(mtcars, !! vars)
    select(mtcars, -(!! vars))
    

    Note that this only works in selecting functions because in other contexts strings and character vectors are ambiguous. For instance strings are a valid input in mutating operations and mutate(df, "foo") creates a new column by recycling "foo" to the number of rows.
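
Outside selecting functions, strings can be converted to symbols first. The sketch below assumes a dplyr version that exports sym(), syms() and expr() (0.7.5 or later) and uses the built-in mtcars data; the variable names are illustrative.

```r
library(dplyr)

# sym() turns one string into a symbol; unquote it with !!
cyl_sym <- sym("cyl")
by_cyl <- count(mtcars, !!cyl_sym)

# syms() turns a character vector into a list of symbols; splice with !!!
key_syms <- syms(c("cyl", "gear"))
by_cyl_gear <- count(mtcars, !!!key_syms)

# expr() captures an expression without evaluating it;
# it is evaluated against the data once unquoted inside a verb
mean_expr <- expr(mean(mpg))
avg_mpg <- summarise(mtcars, avg = !!mean_expr)
```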

Minor changes

  • Support for raw vector columns in arrange(), group_by(), mutate(), summarise() and ..._join() (minimal raw x raw support initially) (#1803).

  • bind_cols() handles unnamed list (#3402).

  • bind_rows() works around corrupt columns that have the object bit set while having no class attribute (#3349).

  • combine() returns logical() when all inputs are NULL (or when there are no inputs) (#3365, @zeehio).

  • distinct() now supports renaming columns (#3234).

  • Hybrid evaluation simplifies dplyr::foo() to foo() (#3309). Hybrid functions can now be masked by regular R functions to turn off hybrid evaluation (#3255). The hybrid evaluator finds functions from dplyr even if dplyr is not attached (#3456).

  • In mutate() it is now illegal to use data.frame in the rhs (#3298).

  • Support !!! in recode_factor() (#3390).

  • row_number() works on empty subsets (#3454).

  • select() and vars() now treat NULL as empty inputs (#3023).

  • Scoped select and rename functions (select_all(), rename_if() etc.) now work with grouped data frames, adapting the grouping as necessary (#2947, #3410). group_by_at() can group by an existing grouping variable (#3351). arrange_at() can use grouping variables (#3332).

  • slice() no longer enforces tibble classes when input is a simple data.frame, and ignores 0 (#3297, #3313).

  • transmute() no longer prints a message when including a group variable.

Documentation

  • Improved documentation for funs() (#3094) and set operations (e.g. union()) (#3238, @edublancas).

Error messages

  • Better error message if dbplyr is not installed when accessing database backends (#3225).

  • arrange() fails gracefully on data.frame columns (#3153).

  • Corrected error message when calling cbind() with an object of wrong length (#3085).

  • Add warning with explanation to distinct() if any of the selected columns are of type list (#3088, @foo-bar-baz-qux), or when used on unknown columns (#2867, @foo-bar-baz-qux).

  • Show clear error message for bad arguments to funs() (#3368).

  • Better error message in ..._join() when joining data frames with duplicate or NA column names. Joining such data frames with a semi- or anti-join now gives a warning, which may be converted to an error in future versions (#3243, #3417).

  • Dedicated error message when trying to use columns of the Interval or Period classes (#2568).

  • Added an .onDetach() hook that allows for plyr to be loaded and attached without the warning message that says functions in dplyr will be masked, since dplyr is no longer attached (#3359, @jwnorman).

Performance

  • sample_n() and sample_frac() on grouped data frame are now faster especially for those with large number of groups (#3193, @saurfang).

Internal

  • Compute variable names for joins in R (#3430).

  • Bumped Rcpp dependency to 0.12.15 to avoid imperfect detection of NA values in hybrid evaluation fixed in RcppCore/Rcpp#790 (#2919).

  • Avoid cleaning the data mask, a temporary environment used to evaluate expressions. If the environment, in which e.g. a mutate() expression is evaluated, is preserved until after the operation, accessing variables from that environment now gives a warning but still returns NULL (#3318).

v0.7.4

6 years ago
  • Fix recent Fedora and ASAN check errors (#3098).

  • Avoid dependency on Rcpp 0.12.10 (#3106).

v0.7.3

6 years ago

dplyr 0.7.3

  • Fixed protection error that occurred when creating a character column using grouped mutate() (#2971).

  • Fixed a rare problem with accessing variable values in summarise() when all groups have size one (#3050).

  • Fixed rare out-of-bounds memory write in slice() when negative indices beyond the number of rows were involved (#3073).

  • select(), rename() and summarise() no longer change the grouped vars of the original data (#3038).

  • nth(default = var), first(default = var) and last(default = var) fall back to standard evaluation in a grouped operation instead of triggering an error (#3045).

  • case_when() now works if all LHS are atomic (#2909), or when LHS or RHS values are zero-length vectors (#3048).

  • case_when() accepts NA on the LHS (#2927).

  • Semi- and anti-joins now preserve the order of left-hand-side data frame (#3089).

  • Improved error message for invalid list arguments to bind_rows() (#3068).

  • Grouping by character vectors is now faster (#2204).

  • Fixed a crash that occurred when an unexpected input was supplied to the call argument of order_by() (#3065).

v0.7.2

6 years ago
  • Move build-time vs. run-time checks out of .onLoad() and into dr_dplyr().

v0.7.1

6 years ago
  • Use new versions of bindrcpp and glue to avoid protection problems. Avoid wrapping arguments to internal error functions (#2877). Fix two protection mistakes found by rchk (#2868).

  • Fix C++ error that caused compilation to fail on mac cran (#2862).

  • Fix undefined behaviour in between(), where NA_REAL was assigned instead of NA_LOGICAL (#2855, @zeehio).

  • top_n() now executes operations lazily for compatibility with database backends (#2848).

  • Reuse of new variables created in ungrouped mutate() is possible again. This fixes a regression introduced in dplyr 0.7.0 (#2869).

  • Quosured symbols do not prevent hybrid handling anymore. This should fix many performance issues introduced with tidyeval (#2822).

v0.7.0

7 years ago

New data, functions, and features

  • Five new built-in datasets help demonstrate dplyr verbs (#2094):

    • starwars dataset about Star Wars characters; has list columns
    • storms has the trajectories of ~200 tropical storms
    • band_members, band_instruments and band_instruments2 have some simple data to demonstrate joins.
  • New add_count() and add_tally() for adding an n column within groups (#2078, @dgrtwo).

  • arrange() for grouped data frames gains a .by_group argument so you can choose to sort by groups if you want to (defaults to FALSE) (#2318).

  • New pull() generic for extracting a single column either by name or position (either from the left or the right). Thanks to @paulponcet for the idea (#2054).

    This verb is powered with the new select_var() internal helper, which is exported as well. It is like select_vars() but returns a single variable.

  • as_tibble() is re-exported from tibble. This is the recommended way to create tibbles from existing data frames. tbl_df() has been softly deprecated. tribble() is now imported from tibble (#2336, @chrMongeau); this is now preferred to frame_data().
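
A short sketch of the new verbs listed above, assuming dplyr 0.7.0 or later and the built-in mtcars data. The variable names are chosen for this example only.

```r
library(dplyr)

# add_count() keeps every row and adds the group size as column n
with_n <- add_count(mtcars, cyl)

# pull() extracts one column by name or by position;
# negative positions count from the right
cyl_values <- pull(mtcars, cyl)
last_column <- pull(mtcars, -1)

# .by_group = TRUE sorts by the grouping variable first, then by the
# arrange() expressions within each group
sorted <- mtcars %>%
  group_by(cyl) %>%
  arrange(desc(mpg), .by_group = TRUE)
```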

Deprecated and defunct

  • dplyr no longer messages that you need dtplyr to work with data.table (#2489).

  • Long deprecated regroup(), mutate_each_q() and summarise_each_q() functions have been removed.

  • Deprecated failwith(). I'm not even sure why it was here.

  • Soft-deprecated mutate_each() and summarise_each(); these functions print a message which will be changed to a warning in the next release.

  • The .env argument to sample_n() and sample_frac() is defunct. Passing a value to this argument prints a message, which will be changed to a warning in the next release.

Databases

This version of dplyr includes some major changes to how database connections work. By and large, you should be able to continue using your existing dplyr database code without modification, but there are two big changes that you should be aware of:

  • Almost all database related code has been moved out of dplyr and into a new package, dbplyr. This makes dplyr simpler, and will make it easier to release fixes for bugs that only affect databases. src_mysql(), src_postgres(), and src_sqlite() will still live in dplyr so your existing code continues to work.

  • It is no longer necessary to create a remote "src". Instead you can work directly with the database connection returned by DBI. This reflects the maturity of the DBI ecosystem. Thanks largely to the work of Kirill Muller (funded by the R Consortium) DBI backends are now much more consistent, comprehensive, and easier to use. That means that there's no longer a need for a layer in between you and DBI.

You can continue to use src_mysql(), src_postgres(), and src_sqlite(), but I recommend a new style that makes the connection to DBI more clear:

library(dplyr)

con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
DBI::dbWriteTable(con, "mtcars", mtcars)

mtcars2 <- tbl(con, "mtcars")
mtcars2

This is particularly useful if you want to perform non-SELECT queries as you can do whatever you want with DBI::dbGetQuery() and DBI::dbExecute().

If you've implemented a database backend for dplyr, please read the backend news to see what's changed from your perspective (not much). If you want to ensure your package works with both the current and previous version of dplyr, see wrap_dbplyr_obj() for helpers.

UTF-8

  • Internally, column names are always represented as character vectors, and not as language symbols, to avoid encoding problems on Windows (#1950, #2387, #2388).

  • Error messages and explanations of data frame inequality are now encoded in UTF-8, also on Windows (#2441).

  • Joins now always reencode character columns to UTF-8 if necessary. This gives a nice speedup, because now pointer comparison can be used instead of string comparison, but relies on a proper encoding tag for all strings (#2514).

  • Fixed problems when joining factor or character encodings with a mix of native and UTF-8 encoded values (#1885, #2118, #2271, #2451).

  • Fix group_by() for data frames that have UTF-8 encoded names (#2284, #2382).

  • New group_vars() generic that returns the grouping as character vector, to avoid the potentially lossy conversion to language symbols. The list returned by group_by_prepare() now has a new group_names component (#1950, #2384).

Colwise functions

  • rename(), select(), group_by(), filter(), arrange() and transmute() now have scoped variants (verbs suffixed with _if(), _at() and _all()). Like mutate_all(), summarise_if(), etc, these variants apply an operation to a selection of variables.

  • The scoped verbs taking predicates (mutate_if(), summarise_if(), etc) now support S3 objects and lazy tables. S3 objects should implement methods for length(), [[ and tbl_vars(). For lazy tables, the first 100 rows are collected and the predicate is applied on this subset of the data. This is robust for the common case of checking the type of a column (#2129).

  • Summarise and mutate colwise functions pass ... on to the manipulation functions.

  • The performance of colwise verbs like mutate_all() is now back to where it was in mutate_each().

  • funs() has better handling of namespaced functions (#2089).

  • Fix issue with mutate_if() and summarise_if() when a predicate function returns a vector of FALSE (#1989, #2009, #2011).

Tidyeval

dplyr has a new approach to non-standard evaluation (NSE) called tidyeval. It is described in detail in vignette("programming") but, in brief, gives you the ability to interpolate values in contexts where dplyr usually works with expressions:

my_var <- quo(homeworld)

starwars %>%
  group_by(!!my_var) %>%
  summarise_at(vars(height:mass), mean, na.rm = TRUE)

This means that the underscored version of each main verb is no longer needed, and so these functions have been deprecated (but remain around for backward compatibility).

  • order_by(), top_n(), sample_n() and sample_frac() now use tidyeval to capture their arguments by expression. This makes it possible to use unquoting idioms (see vignette("programming")) and fixes scoping issues (#2297).

  • Most verbs taking dots now ignore the last argument if empty. This makes it easier to copy lines of code without having to worry about deleting trailing commas (#1039).

  • [API] The new .data and .env environments can be used inside all verbs that operate on data: .data$column_name accesses the column column_name, whereas .env$var accesses the external variable var. Columns or external variables named .data or .env are shadowed, use .data$... and/or .env$... to access them. (.data implements strict matching also for the $ operator (#2591).)

    The column() and global() functions have been removed. They were never documented officially. Use the new .data and .env environments instead.

  • Expressions in verbs are now interpreted correctly in many cases that failed before (e.g., use of $, case_when(), nonstandard evaluation, ...). These expressions are now evaluated in a specially constructed temporary environment that retrieves column data on demand with the help of the bindrcpp package (#2190). This temporary environment poses restrictions on assignments using <- inside verbs. To prevent leaking of broken bindings, the temporary environment is cleared after the evaluation (#2435).
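
The .data and .env pronouns can be sketched as follows. This assumes a dplyr version that provides both pronouns and uses the built-in mtcars data; the external variable is deliberately given the same name as a column to show the disambiguation.

```r
library(dplyr)

# External variable that shares its name with a column of mtcars
cyl <- 4

# .data$cyl refers to the column, .env$cyl to the variable defined above
four_cyl <- filter(mtcars, .data$cyl == .env$cyl)
```

Without the pronouns, cyl == cyl would compare the column with itself and keep every row.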

Verbs

Joins

  • [API] xxx_join.tbl_df(na_matches = "never") treats all NA values as different from each other (and from any other value), so that they never match. This corresponds to the behavior of joins for database sources, and of database joins in general. To match NA values, pass na_matches = "na" to the join verbs; this is only supported for data frames. The default is na_matches = "na", kept for the sake of compatibility with v0.5.0. It can be tweaked by calling pkgconfig::set_config("dplyr::na_matches", "na") (#2033).

  • common_by() gets a better error message for unexpected inputs (#2091).

  • Fix groups when joining grouped data frames with duplicate columns (#2330, #2334, @davidkretch).

  • One of the two join suffixes can now be an empty string, dplyr no longer hangs (#2228, #2445).

  • Anti- and semi-joins warn if factor levels are inconsistent (#2741).

  • Warnings about join column inconsistencies now contain the column names (#2728).
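
The effect of na_matches can be sketched with two small tibbles. The data and names below are made up for illustration; the sketch assumes a dplyr version whose join verbs accept na_matches for data frames.

```r
library(dplyr)

left  <- tibble(key = c(1, NA), a = c("x", "y"))
right <- tibble(key = c(1, NA), b = c("p", "q"))

# Default (na_matches = "na"): the NA keys match each other
with_na <- inner_join(left, right, by = "key")

# na_matches = "never": NA keys never match, as in a database join
without_na <- inner_join(left, right, by = "key", na_matches = "never")

nrow(with_na)
nrow(without_na)
```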

Select

  • For selecting variables, the first selector decides if it's an inclusive selection (i.e., the initial column list is empty), or an exclusive selection (i.e., the initial column list contains all columns). This means that select(mtcars, contains("am"), contains("FOO"), contains("vs")) now again returns both the am and vs columns, as in dplyr 0.4.3 (#2275, #2289, @r2evans).

  • Select helpers now throw an error if called when no variables have been set (#2452).

  • Helper functions in select() (and related verbs) are now evaluated in a context where column names do not exist (#2184).

  • select() (and the internal function select_vars()) now support column names in addition to column positions. As a result, expressions like select(mtcars, "cyl") are now allowed.
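
The selection rules above can be sketched on the built-in mtcars data. This is an illustration, assuming dplyr 0.7.0 or later.

```r
library(dplyr)

# First selector is positive: start from no columns and add matches.
# contains("FOO") matches nothing and contributes no columns.
inclusive <- select(mtcars, contains("am"), contains("FOO"), contains("vs"))

# First selector is negative: start from all columns and remove matches
exclusive <- select(mtcars, -mpg)

# Column names given as strings are accepted
by_string <- select(mtcars, "cyl")

names(inclusive)
```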

Other

  • recode(), case_when() and coalesce() now support splicing of arguments with rlang's !!! operator.

  • count() now preserves the grouping of its input (#2021).

  • distinct() no longer duplicates variables (#2001).

  • Empty distinct() with a grouped data frame works the same way as an empty distinct() on an ungrouped data frame, namely it uses all variables (#2476).

  • copy_to() now returns its output invisibly (since you're often just calling for the side-effect).

  • filter() and lag() throw an informative error if used with ts objects (#2219).

  • mutate() recycles list columns of length 1 (#2171).

  • mutate() gives better error message when attempting to add a non-vector column (#2319), or attempting to remove a column with NULL (#2187, #2439).

  • summarise() now correctly evaluates newly created factors (#2217), and can create ordered factors (#2200).

  • Ungrouped summarise() uses summary variables correctly (#2404, #2453).

  • Grouped summarise() no longer converts character NA to empty strings (#1839).

Combining and comparing

  • all_equal() now reports multiple problems as a character vector (#1819, #2442).

  • all_equal() checks that factor levels are equal (#2440, #2442).

  • bind_rows() and bind_cols() give an error for database tables (#2373).

  • bind_rows() works correctly with NULL arguments and an .id argument (#2056), and also for zero-column data frames (#2175).

  • Breaking change: bind_rows() and combine() are more strict when coercing. Logical values are no longer coerced to integer and numeric. Date, POSIXct and other integer or double-based classes are no longer coerced to integer or double as there is a chance of attributes or information being lost (#2209, @zeehio).

  • bind_cols() now calls tibble::repair_names() to ensure that all names are unique (#2248).

  • bind_cols() handles empty argument list (#2048).

  • bind_cols() better handles NULL inputs (#2303, #2443).

  • bind_rows() explicitly rejects columns containing data frames (#2015, #2446).

  • bind_rows() and bind_cols() now accept vectors. They are treated as rows by the former and columns by the latter. Rows require inner names like c(col1 = 1, col2 = 2), while columns require outer names: col1 = c(1, 2). Lists are still treated as data frames but can be spliced explicitly with !!!, e.g. bind_rows(!!! x) (#1676).

  • rbind_list() and rbind_all() now call .Deprecated(), they will be removed in the next CRAN release. Please use bind_rows() instead.

  • combine() accepts NA values (#2203, @zeehio).

  • combine() and bind_rows() with character and factor types now always warn about the coercion to character (#2317, @zeehio).

  • combine() and bind_rows() accept difftime objects.

  • mutate() coerces results from grouped data frames, accepting combinable data types (such as integer and numeric) (#1892, @zeehio).
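
The inner versus outer naming rule for vectors can be sketched as follows. The values and names are made up for illustration; the sketch assumes dplyr 0.7.0 or later.

```r
library(dplyr)

# Named vectors are rows for bind_rows(): inner names become column names
rows <- bind_rows(c(col1 = 1, col2 = 2), c(col1 = 3, col2 = 4))

# Named arguments are columns for bind_cols(): outer names become column names
cols <- bind_cols(col1 = c(1, 2), col2 = c(3, 4))

# A list of data frames can be spliced explicitly with !!!
pieces <- list(tibble(x = 1), tibble(x = 2))
stacked <- bind_rows(!!!pieces)
```

Both rows and cols are 2 x 2, but rows holds one input vector per row while cols holds one per column.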

Vector functions

  • %in% gets new hybrid handler (#126).

  • between() returns NA if left or right is NA (fixes #2562).

  • case_when() supports NA values (#2000, @tjmahr).

  • first(), last(), and nth() have better default values for factor, Dates, POSIXct, and data frame inputs (#2029).

  • Fixed segmentation faults in hybrid evaluation of first(), last(), nth(), lead(), and lag(). These functions now always fall back to the R implementation if called with arguments that the hybrid evaluator cannot handle (#948, #1980).

  • n_distinct() gets larger hash tables, giving slightly better performance (#977).

  • nth() and ntile() are more careful about proper data types of their return values (#2306).

  • ntile() ignores NA when computing group membership (#2564).

  • lag() enforces integer n (#2162, @kevinushey).

  • hybrid min() and max() now always return a numeric and work correctly in edge cases (empty input, all NA, ...) (#2305, #2436).

  • min_rank("string") no longer segfaults in hybrid evaluation (#2279, #2444).

  • recode() can now recode a factor to other types (#2268).

  • recode() gains .dots argument to support passing replacements as list (#2110, @jlegewie).
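
Two of the NA-related changes above can be sketched like this. The input values are made up for illustration; the sketch assumes dplyr 0.7.0 or later.

```r
library(dplyr)

# An NA bound makes the result NA rather than TRUE or FALSE
in_range <- between(5, NA, 10)

# ntile() leaves NA as NA and assigns buckets using only the
# non-missing values
buckets <- ntile(c(10, NA, 20, 30, 40), 2)
```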

Other minor changes and bug fixes

  • Many error messages are more helpful by referring to a column name or a position in the argument list (#2448).

  • New is_grouped_df() alias to is.grouped_df().

  • tbl_vars() now has a group_vars argument set to TRUE by default. If FALSE, group variables are not returned.

  • Fixed segmentation fault after calling rename() on an invalid grouped data frame (#2031).

  • rename_vars() gains a strict argument to control if an error is thrown when you try and rename a variable that doesn't exist.

  • Fixed undefined behavior for slice() on a zero-column data frame (#2490).

  • Fixed very rare case of false match during join (#2515).

  • Restricted workaround for match() to R 3.3.0. (#1858).

  • dplyr now warns on load when the version of R or Rcpp during installation is different to the currently installed version (#2514).

  • Fixed improper reuse of attributes when creating a list column in summarise() and perhaps mutate() (#2231).

  • mutate() and summarise() always strip the names attribute from new or updated columns, even for ungrouped operations (#1689).

  • Fixed rare error that could lead to a segmentation fault in all_equal(ignore_col_order = FALSE) (#2502).

  • The "dim" and "dimnames" attributes are always stripped when copying a vector (#1918, #2049).

  • grouped_df and rowwise are registered officially as S3 classes. This makes them easier to use with S4 (#2276, @joranE, #2789).

  • All operations that return tibbles now include the "tbl" class. This is important for correct printing with tibble 1.3.1 (#2789).

  • Makeflags uses PKG_CPPFLAGS for defining preprocessor macros.

  • astyle formatting for C++ code, tested but not changed as part of the tests (#2086, #2103).

  • Update RStudio project settings to install tests (#1952).

  • Using Rcpp::interfaces() to register C callable interfaces, and registering all native exported functions via R_registerRoutines() and useDynLib(.registration = TRUE) (#2146).

  • Formatting of grouped data frames now works by overriding the tbl_sum() generic instead of print(). This means that the output is more consistent with tibble, and that format() is now supported also for SQL sources (#2781).