Taxize Versions

A taxonomic toolbelt for R

v0.9.8

4 years ago

NEW FEATURES

  • all get_* functions gain some new features (associated new fxns are taxon_last and taxon_clear): a) nicer messages printed to the console when iterating through taxa, and a summary at the end of what was done; and b) state is now saved when running get_* functions. That is, in an object external to the get_* function call we keep track of what happened, so that if an error is encountered, you can easily restart where you left off; this is especially useful when dealing with a large number of inputs to a get_* function. To utilize, pass the output of taxon_last() to a get_* function call. Associated with these changes are new package imports: R6, crayon and cli (#736) (#757)
  • gains a new function taxize_options() to set options when using taxize. The initial motivation is to control the state tracking described above for get_* functions: taxon_state_messages toggles taxon state tracking messages in get_* functions, and quiet=TRUE silences output from the taxize_options() function itself
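
The resume-after-error idea behind the state tracking can be illustrated with a minimal base R sketch. It is not taxize's implementation (which, per the notes above, relies on R6), and every name in it (new_state, lookup_all, fake_lookup) is hypothetical. In taxize itself, the documented way to resume is to pass the output of taxon_last() to a get_* function call.

```r
# Hypothetical sketch: none of these names are taxize functions.
# An environment outside the lookup call records progress, so a
# failed run can be resumed from the first unprocessed input.
new_state <- function(inputs) {
  state <- new.env()
  state$remaining <- inputs
  state$completed <- list()
  state
}

lookup_all <- function(state, lookup_one) {
  while (length(state$remaining) > 0) {
    taxon <- state$remaining[[1]]
    result <- lookup_one(taxon)        # may throw; state is untouched if so
    state$completed[[taxon]] <- result
    state$remaining <- state$remaining[-1]
  }
  state$completed
}

# Stand-in lookup that fails once on the second name.
failed_once <- FALSE
fake_lookup <- function(taxon) {
  if (taxon == "Quercus" && !failed_once) {
    failed_once <<- TRUE
    stop("simulated network error")
  }
  nchar(taxon)
}

state <- new_state(c("Poa", "Quercus", "Pinus"))
first_try <- try(lookup_all(state, fake_lookup), silent = TRUE)
# "Poa" is now recorded as completed; "Quercus" and "Pinus" remain.
resumed <- lookup_all(state, fake_lookup)  # picks up again at "Quercus"
```

Because progress lives outside the function call, the second call does not repeat the work already done for "Poa".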

MINOR IMPROVEMENTS

  • in id2name() and worms_downstream() use worrms::wm_record instead of worrms::wm_record_ for newest version of worrms (#760)
  • many get_* functions and col_downstream(): parameter verbose renamed to messages so that it does not conflict with the verbose curl option passed on to crul

BUG FIXES

  • fix to HTTP request processing for COL: the service sometimes errors and reports the problem in the response body, but DOES NOT give the appropriate HTTP error status code, so we now always check COL responses for errors (#755) (#756) thanks @dougwyu
  • fix to gbif_downstream() - GBIF in some cases returns a rank of "unranked", which we hadn't accounted for in internal rank processing code (#758) thanks @ocstringham
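
The COL fix above amounts to checking the response body as well as the status code. The following base R sketch shows that pattern under stated assumptions: check_response() and the error_message field are illustrative names, not taxize internals and not the actual COL response schema.

```r
# Hypothetical sketch: `check_response()` and the `error_message` field
# are illustrative assumptions, not taxize internals or the COL schema.
check_response <- function(status_code, body) {
  # Ordinary HTTP failure, signalled through the status code.
  if (status_code >= 400) {
    stop("HTTP error ", status_code, call. = FALSE)
  }
  # Failure reported only in the body, despite a success status code.
  msg <- body[["error_message"]]
  if (!is.null(msg) && nzchar(msg)) {
    stop("service error: ", msg, call. = FALSE)
  }
  invisible(body)
}

ok  <- list(error_message = "", results = list("Poa annua"))
bad <- list(error_message = "No names found", results = list())

checked <- check_response(200, ok)
outcome <- tryCatch(check_response(200, bad), error = conditionMessage)
```

Here `outcome` holds the error message raised for the second response, even though its status code was 200.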

v0.9.7

5 years ago

MINOR IMPROVEMENTS

  • class2tree() gains node labels when present (#644) (#748) thanks @gpli
  • change documentation to use markdown (#658) (#746) thanks @Rekyt
  • fix encoding in fixtures that Debian clang doesn't like (#754)

v0.9.6

5 years ago

NEW FEATURES

  • gains new functions for Kew's Plants of the World: get_pow(), get_pow_(), as.pow(), classification.pow(), pow_search(), and pow_lookup() (#598) (#739)
  • we now pass a user agent string in all HTTP requests to the various data sources so they know it's coming from taxize. The string will look something like r-curl/3.3 crul/0.7.0 rOpenSci(taxize/0.9.6), including the versions of the curl R pkg, the crul package, and the taxize package (#662)
  • change to get_colid functionality: we weren't paginating for the user when there were more than 50 results for a query; we now paginate for the user using async HTTP requests; this means that some requests will take longer than they did before if they have more than 50 results; this is a good change given that you get all the results for your query now (#743)
  • change across most get_* functions: in some of the get_* functions we tried for a direct match (e.g., "Poa" == "Poa") and if one was found, then we were done and returned that record. however, we didn't deploy the same logic across all get_* functions. Now all get_* functions check for a direct match. Of course if there is a direct match with more than 1 result, you still get the prompt asking you which name you want. (#631) (#734)
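
The direct-match rule described above can be sketched in base R. This illustrates the logic only: pick_match() and the example data are hypothetical, and exact string comparison is an assumption taken from the "Poa" == "Poa" example.

```r
# Hypothetical sketch; `pick_match()` is not a taxize function.
pick_match <- function(query, results) {
  direct <- results[results$name == query, , drop = FALSE]
  if (nrow(direct) == 1) {
    # Exactly one direct match: return it without prompting.
    list(action = "return", rows = direct)
  } else if (nrow(direct) > 1) {
    # Several direct matches: the user still has to choose among them.
    list(action = "prompt", rows = direct)
  } else {
    # No direct match: the user chooses among all results.
    list(action = "prompt", rows = results)
  }
}

results <- data.frame(
  id = c("1", "2", "3"),
  name = c("Poa", "Poa annua", "Poa pratensis"),
  stringsAsFactors = FALSE
)
single <- pick_match("Poa", results)

# Add a second record with the same name to trigger the prompt case.
dupes <- rbind(results, data.frame(id = "4", name = "Poa", stringsAsFactors = FALSE))
several <- pick_match("Poa", dupes)
```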

MINOR IMPROVEMENTS

  • Make separate taxize-authentication manual file covering authentication information across the package (#681)
  • new case study vignette added (#544) (#721) thanks @fozy81
  • add note to gnr_resolve() docs about age of datasets used in the Global Names Resolver, and how to access age of datasets (#737)
  • get_eolid() fixes: gains new attribute pageid; uri's given are updated to EOL's new URL format; rank and datasource parameters were not documented, now are; we no longer use short names for data sources within EOL, but instead use their full names (#702) (#742)
  • col_search() now returns attributes on the output data.frame with the number of results found and returned, and other metadata about the search
  • gnr_datasources() loses the todf parameter; now always returns a data.frame and the data.frame has all the columns, whereas the default call returned a limited set of columns in previous versions

BUG FIXES

  • fix bug in get_wormsid(), was failing when there was a direct match found with more than 1 result (#740)
  • fix across all get_* functions: linting of the input to the rows parameter was failing with a vector of values in some cases (#741)
  • fix to iucn_summary(); we weren't passing on the API key internally correctly (#735) thanks @PrincessPi314 for the report

v0.9.5

5 years ago

Compare to previous release

https://github.com/ropensci/taxize/compare/v0.9.4...v0.9.5

DEFUNCT

  • iucn_summary_id() is defunct, use iucn_summary() instead

NEW FEATURES

  • col_downstream() gains parameter extant_only (logical) to optionally keep extant taxa only (#714) thanks @ArielGreiner for the inquiry
  • downstream() gains another db option: Worms. You can now set db="worms" to use Worms to get taxa downstream from a target taxon. In addition, taxize gains new function worms_downstream(), which is used under the hood in downstream(..., db="worms") (#713) (#715)
  • gains new function id2name() with db options for tol, itis, ncbi, worms, gbif, col, and bold. The function converts taxonomic IDs to names. It's sort of the inverse of the get_*() family of functions. (#712) (#716)
  • tax_rank() gains new parameter rows so that one can pass rows down to get_*() functions

MINOR IMPROVEMENTS

  • synonyms() warning from an internal cbind() call now fixed (#704) (#705) thanks @vijaybarve
  • namespace taxize function calls thrown when notifying users about API keys (e.g., taxize::use_tropicos()) to make it very clear where the functions live (to avoid confusion with usethis) (#724) (#725) thanks @maelle
  • changed iucn_summary() to output the same structure when no match is found as when a match is found so that when output is passed to iucn_status() behavior is the same (#708) thanks @Rekyt
  • skip tax_name() tests on CRAN (#728)
  • httr replaced by crul throughout (#590)
  • most unit tests that make HTTP requests are now cached with vcr, making tests much faster and not prone to errors from remote services being down (#729)
  • EOL: The EOL API underwent major changes, and we've attempted to get things in working order. eol_dataobjects() gains new parameter language. eol_pages() loses iucn, images, videos, sounds, maps, and text parameters, and gains images_per_page, videos_per_page, sounds_per_page, maps_per_page, texts_per_page, and texts_page. Please do let us know if you find any problems with any EOL functions (#717) (#718)
  • As part of EOL changes, the default db value for comm2sci() and sci2comm() is now ncbi instead of eol
  • EUBON base URL now https instead of http
  • A number of get_*() functions changed parameter verbose to messages to not conflict with verbose passed down to crul::HttpClient
  • ping functions: ncbi_ping() reworked to allow use of your API key as a parameter or pulled from your environment; eol_ping() now uses https instead of http, and parses JSON instead of XML.

BUG FIXES

  • get_eolid() was erroring when no results found for a query due to not assigning an internal variable (#701) (#709) thanks for the fix @taddallas
  • get_tolid() was erroring when values were NULL - now replacing all NULL with NA_character_ to make data.table::rbindlist() happy (#710) (#711) thanks @gpli for the fix
  • add additional rows to the rank_ref data.frame of taxonomic ranks: species subgroup, forma, varietas, clade, megacohort, supercohort, cohort, subcohort, infracohort. When a rank has no match in rank_ref, many of the downstream functions can error. The data.frame now has 43 rows. (#720) (#727)
  • fix to downstream() and ncbi_get_taxon_summary(): change in ncbi_get_taxon_summary to break up queries into smaller chunks to avoid HTTP 414 errors ("URI too long") (#727) (#730) thanks for reporting @fischhoff and @benjaminschwetz
  • a number of fixes internally (not user facing) to comply with upcoming R-devel changes for checking length greater than 1 in logical statements (#731)
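
The get_tolid() fix above replaces NULL values with NA_character_ before rows are combined. A minimal base R sketch of that step follows. The records and field names are hypothetical, and it uses do.call(rbind, ...) instead of data.table::rbindlist() only to stay dependency-free.

```r
# Hypothetical records, as a parsed JSON response might deliver them.
records <- list(
  list(name = "Poa annua", rank = "species", authority = NULL),
  list(name = "Poa", rank = NULL, authority = "L.")
)
fields <- c("name", "rank", "authority")

# Replace every NULL (including absent fields) with NA_character_ so
# each record has the same columns with one value apiece.
fill_nulls <- function(record) {
  filled <- lapply(fields, function(f) {
    value <- record[[f]]
    if (is.null(value)) NA_character_ else value
  })
  stats::setNames(filled, fields)
}

rows <- lapply(records, function(r) {
  as.data.frame(fill_nulls(r), stringsAsFactors = FALSE)
})
combined <- do.call(rbind, rows)
```

Without the replacement step, the two records would yield data.frames with different columns and could not be row-bound directly.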

v0.9.4

5 years ago

NEW FEATURES

  • new contributor: Gaopeng Li
  • gains new functions for helping the user get authentication keys/tokens: use_entrez(), use_eol(), use_iucn() (which uses internally rredlist::rl_use_iucn()), and use_tropicos() (#682) (#691) (#693) By @maelle

MINOR IMPROVEMENTS

  • remove commented out code

BUG FIXES

  • fix tropicos_ping()
  • fixed downstream() and gbif_downstream(): some of the results don't have a canonicalName, so now safely try to get that field (#673)
  • fixed as.uid(), was erroring when passing in a taxon ID (#674) (#675) by @zachary-foster
  • fix in get_boldid() (and by extension classification(..., db = "bold")): was failing when no parent taxon found, just fill in with NA now (#680)
  • fix to synonyms(): was failing for some TSNs for db="itis" (#685)
  • fix to tax_name(): rows arg wasn't being passed on internally (#686)
  • fix to gnr_resolve() and gnr_datasources(): problems were caused by http scheme, switched to use https instead of http (#687)
  • fix to class2tree(): organisms with unique rank lower than non-unique ranks will give extra wrong rows (#689) (#690) thanks @gpli
  • fix in ncbi_get_taxon_summary(): changes in the NCBI API most likely led to HTTP 414 (URI Too Long) errors. We now loop internally for the user. By extension this helps with problems upstream in downstream()/ncbi_downstream()/ncbi_children() (#698)
  • fix in class2tree(): was erroring when name strings contained pound signs (e.g., #) (#699) (#700) thanks @gpli
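
The HTTP 414 fix in ncbi_get_taxon_summary() comes down to splitting a long vector of IDs into smaller batches and looping over them. The sketch below shows that pattern in base R. chunk_ids(), fetch_all(), the stand-in fetch function and the batch size of 50 are all illustrative assumptions, not taxize internals.

```r
# Hypothetical sketch; `chunk_ids()` and `fetch_all()` are illustrative.
chunk_ids <- function(ids, size = 50) {
  # Group consecutive IDs into batches of at most `size` elements.
  unname(split(ids, ceiling(seq_along(ids) / size)))
}

fetch_all <- function(ids, fetch_chunk, size = 50) {
  chunks <- chunk_ids(ids, size)
  results <- lapply(chunks, fetch_chunk)  # one request per batch
  do.call(c, results)
}

# Stand-in for an HTTP request that would put the IDs in the URL.
fake_fetch <- function(chunk) paste0("summary-", chunk)

ids <- as.character(seq_len(120))
chunks <- chunk_ids(ids, size = 50)
out <- fetch_all(ids, fake_fetch, size = 50)
```

With 120 IDs and a batch size of 50, three requests are made instead of one long one, and the results come back in the original order.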

v0.9.3

6 years ago

MINOR IMPROVEMENTS

  • package gains three new authors: Bastian Greshake Tzovaras, Philippe Marchand, and Vinh Tran
  • Don't enforce rate limiting via Sys.sleep for NCBI requests if the user has an API key (#667)
  • Fix to all functions that do NCBI requests to work whether or not a user has an NCBI API key (#668)
  • Increased documentation on authentication, see ?taxize-authentication
  • Further conversion of verbose to messages across the package so that suppressing calls to message() does not conflict with curl options passed in
  • Converted genbank2uid() and ncbi_get_taxon_summary() to use crul instead of httr for HTTP requests

BUG FIXES

  • Fix to get_tolid(): it was missing assignment of the att attribute internally, causing failures in some cases (#663) (#672)
  • Fix to ncbi_children() (and thus children() when requesting NCBI data) to not fail when there is an empty result from the internal call to classification() (#664) thanks @arendsee

v0.9.2

6 years ago

Installation

Stalled on CRAN. Install like

install.packages("taxize", repos = c("http://packages.ropensci.org"))

OR

remotes::install_github("ropensci/taxize")
# OR
devtools::install_github("ropensci/taxize")

NEWS

NEW FEATURES

  • class2tree() gets a major overhaul thanks to @gedankenstuecke and @trvinh (!!). The function now takes unnamed ranks into account when clustering, which fixes a problem where trees were unresolved for many splits because the named taxonomy levels were shared between them. Now it makes full use of the NCBI Taxonomy string, including the unnamed ranks, leading to higher resolution trees that have fewer multifurcations (#611) (#634)
  • Added support throughout package for use of NCBI Entrez API keys - NCBI now strongly encourages their use and you get a higher rate limit when you use one. See ?taxize-authentication for help. Importantly, note that API key names (both R options and environment variables) have changed. They are now the same for R options and env vars: TROPICOS_KEY, EOL_KEY, PLANTMINER_KEY, ENTREZ_KEY. You no longer need an API key for Plantminer. (#640) (#646)
  • New author Zebulun Arendsee (@arendsee)
  • New package dependencies: crul and zoo

MINOR IMPROVEMENTS

  • In downstream() we now pass on limit and start parameters to gbif_downstream(); we weren't doing that before; the two parameters control pagination (#638)
  • genbank2uid() now returns the correct ID when there are multiple possibilities and invalid IDs no longer make whole batches fail (#642) thanks @zachary-foster
  • children() outputs made more consistent for certain cases when no results found for searches (#648) (#649) thanks @arendsee
  • Improve downstream() by passing ... (additional parameters) down to ncbi_children(), which is used internally. This allows, e.g., use of the ambiguous parameter in ncbi_children() to remove ambiguously named nodes (#653) (#654) thanks @arendsee
  • swapped out use of httr for crul in EOL and Tropicos functions - note that this won't affect you unless you're passing curl options. See package crul for help on curl options. Along with this change, the parameter verbose has changed to messages (for toggling printing of information messages)

DOCUMENTATION

  • Added additional text to the CONTRIBUTING.md file for how to contribute to the test suite (#635)

BUG FIXES

  • genbank2uid() now returns the correct ID when there are multiple possibilities and invalid IDs no longer make whole batches fail
  • Fix to downstream(): passing numeric taxon ids to the function while using db="ncbi" wasn't working (#641) thanks @arendsee
  • Fix to children(): passing numeric taxon ids to the function while using db="worms" wasn't working (#650) (#651) thanks @arendsee
  • synonyms_df() - which attempts to combine many outputs from the synonyms() function - now removes NA/NULL/empty outputs before attempting the combination (#636)
  • Fix to gnr_resolve(): before if preferred_data_sources was used, you would get the preferred data but only a few columns of the response. We now return all fields; however, we only return the preferred data part when that parameter is used (#656)
  • Fixes to children(). It was returning unexpected results for ambiguous taxonomic names (e.g., some insects are returned when searching within Bacteria). It was also failing when one tried to get the children of a root taxon (e.g., the children of the NCBI id 131567). (#639) (#647) fixed via PR (#659) thanks @arendsee and @zachary-foster

v0.9.0

6 years ago

Changes to get_*() functions

  • Added separate documentation file for all get_*() functions describing attributes and various exception behaviors
  • Some get_*() functions had NaN as the default rows parameter value. Those are all changed to NA
  • Better failure behavior now when non-acceptable rows parameter value given
  • Added type checks for all parameters across get_*() functions
  • Changed behavior across all get_*() functions to behave the same when ask = FALSE, rows = 1 and ask = TRUE, rows = 1 as these should result in the same outcome. (#627) thanks @zachary-foster !
  • Fixed direct match behavior so that when there are multiple results from the data provider but no direct match, the functions no longer give back just NA with no indication that there were multiple matches.
  • Please let me know if any of these changes cause problems for your code or package.

NEW FEATURES

  • Change comm2sci() to S3 setup with methods for character, uid, and tsn (#621)
  • iucn_status() now has S3 setup with a single method that only handles output from the iucn_summary() function.

MINOR IMPROVEMENTS

  • Add required key parameter to fxn iucn_id() (#633)
  • improve docs for sci2comm() to indicate how to get non-simplified output (which includes what language the common name is from) vs. getting simplified output (#623) thanks @glaroc !
  • Fix to sci2comm() to not be case sensitive when looking for matches (#625) thanks @glaroc !
  • Two additional columns now returned with eol_search(): link and content
  • Improve docs in eol_search() to describe returned data.frame
  • Fix bold_ping() to use new base URL for their API
  • Improved description of the dataset rank_ref, see ?rank_ref

BUG FIXES

  • Fix to downstream() via fix to rank_ref dataset to include "infraspecies" and make "unspecified" and "no rank" equivalent. Fix to col_downstream() to properly remove ranks lower than allowed. (#620) thanks @cdeterman !
  • iucn_summary: changed to using rredlist package internally. sciname param changed to x. iucn_summary_id() now is deprecated in favor of iucn_summary(). iucn_summary() now has a S3 setup, with methods for character and iucn (#622)
  • Added "cohort" to rank_ref dataset as that rank sometimes used at NCBI (from bug reported in ncbi_downstream()) (#626)
  • Fix to sci2comm(), add tryCatch() to internals to catch failed requests for specific pageid's (#624) thanks @glaroc !
  • Fix URL for taxa for NBN taxonomic ids retrieved via get_nbnid() (#632)

v0.8.9

6 years ago

BUG FIXES

  • Remove ape::neworder_phylo object, which is not used anymore in taxize
    (#618) (#619) thanks @ashiklom

v0.8.8

6 years ago

NEW FEATURES

  • New function ncbi_downstream() and now NCBI is an option in the function downstream() (#583) thanks for the push @andzandz11
  • New data source: Wiki*, which includes Wikipedia, Wikispecies, and Wikidata - you can choose which you'd like to search. Uses new package wikitaxa, with contributions from @ezwelty (#317)
  • scrapenames() gains a parameter return_content, a boolean, to optionally return the OCR content as a text string with the results. (#614) thanks @fgabriel1891
  • New function get_iucn() - to get IUCN Red List ids for taxa. In addition, new S3 methods synonyms.iucn and sci2comm.iucn - no other methods could be made to work with IUCN Red List ids as they do not share their taxonomic classification data (#578) thanks @diogoprov

MINOR IMPROVEMENTS

  • bold now an option in classification() function (#588)
  • fix to NBN to use new base URL (#582) (#597)
  • genbank2uid() can give back more than 1 taxon matched to a given Genbank accession number. Now the function can return more than one match for each query, e.g., try genbank2uid(id = "AM420293") (#602) thanks @sariya
  • had to modify cbind() usage to include ... for method consistency (#612)
  • tax_rank() used to be able to do only ncbi and itis. Can now do a lot more data sources: ncbi, itis, eol, col, tropicos, gbif, nbn, worms, natserv, bold (#587)
  • Added to classification() docs in a section Lots of results a note about how to deal with results when there are A LOT of them. (#596) thanks @ahhurlbert for raising the issue
  • tnrs() now returns the resulting data.frame in the order of the names passed in by the user (#613) thanks @wpetry
  • Changes to gnr_resolve() to now strip out taxonomic names submitted by user that are NA, or zero length strings, or are not of class character (#606)
  • Added description of the columns of the data.frame output in gnr_resolve() (#610) thanks @kamapu
  • Added note in tnrs() docs that the service doesn't provide any information about homonyms. (#610) thanks @kamapu
  • Added parvorder to the taxize rank_ref dataset - used by NCBI - if tax returned with that rank, some functions in taxize were failing due to that rank missing in our reference dataset rank_ref (#615)

BUG FIXES

  • Fix to get_colid() via problem in parsing within col_search() (#585)
  • Fix to gbif_downstream() (and thus fix in downstream()): there were two rows with form in our rank_ref reference dataset of rank names, causing > 1 result in some cases, which in turn caused vapply to fail as it expects a length 1 result (#599) thanks @andzandz11
  • Fix genbank2uid(): was failing when getting more than 1 result back, works now (#603) and fails better now, giving back warnings/error messages that are more informative (see also #602) thanks @sariya
  • Fix to synonyms.tsn(): in some cases a TSN has > 1 accepted name. We get accepted names first from the TSN, then look for synonyms, and hadn't accounted for > 1 accepted name. Fixed now (#607) thanks @tdjames
  • Fixed bug in sci2comm() - was not dealing internally with passing the simplify parameter (#616)