A taxonomic toolbelt for R
- `get_*` functions gain some new features (the associated new functions are `taxon_last()` and `taxon_clear()`): a) nicer messages are printed to the console when iterating through taxa, with a summary at the end of what was done; and b) state is now saved when running `get_*` functions. That is, an object external to the `get_*` function call keeps track of what happened, so that if an error is encountered you can easily restart where you left off; this is especially useful when passing a large number of inputs to a `get_*` function. To use it, pass the output of `taxon_last()` to a `get_*` function call. Associated with these changes are new package imports: R6, crayon and cli (#736) (#757)
- `taxize_options()`
to set options when using taxize. The first reason for the function is to set two options for the above item for `get_*` functions: `taxon_state_messages`, to allow taxon state tracking messages in `get_*` functions or not, and `quiet=TRUE`, which quiets output from the `taxize_options()` function itself
- `id2name()` and `worms_downstream()` use `worrms::wm_record` instead of `worrms::wm_record_` for the newest version of `worrms` (#760)
- `get_*`
functions and `col_downstream()`: parameter `verbose` changed to `messages` so that it does not conflict with the `verbose` curl option passed in to `crul`
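The rename above can be illustrated with a minimal sketch in base R. `get_demo()` is a hypothetical stand-in, not a taxize function, and the assumption that everything in `...` is a curl option is made only for illustration:

```r
# Hypothetical stand-in for a get_* function; this is not taxize's actual code.
get_demo <- function(x, messages = TRUE, ...) {
  # Assumption: anything passed via ... is a curl option (e.g. verbose = TRUE)
  curl_opts <- list(...)
  if (messages) {
    message("Retrieving data for taxon '", x, "'")
  }
  # A real function would hand curl_opts to its HTTP client; here they are
  # simply returned so the separation of the two arguments is visible.
  curl_opts
}

# Silence informational messages while still requesting verbose curl output
opts <- get_demo("Poa", messages = FALSE, verbose = TRUE)
str(opts)
```

Because informational output is toggled by `messages`, the name `verbose` stays free to travel through `...` as a curl option.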
- `gbif_downstream()`: GBIF in some cases returns a rank of "unranked", which we hadn't accounted for in internal rank processing code (#758) thanks @ocstringham
- `get_pow()`, `get_pow_()`, `as.pow()`, `classification.pow()`, `pow_search()`, and `pow_lookup()`
(#598) (#739)
- User agent string sent by `taxize`: the string will look something like `r-curl/3.3 crul/0.7.0 rOpenSci(taxize/0.9.6)`, including the versions of the `curl` R package, the `crul` package, and the `taxize` package (#662)
- `get_colid` functionality: we weren't paginating for the user when there were more than 50 results for a query; we now paginate using async HTTP requests. Requests with more than 50 results will therefore take longer than they did before, but you now get all the results for your query (#743)
- `get_*`
functions: in some of the `get_*` functions we tried for a direct match (e.g., `"Poa" == "Poa"`) and, if one was found, returned that record. However, we didn't deploy the same logic across all `get_*` functions. Now all `get_*` functions check for a direct match. If there is a direct match with more than 1 result, you still get the prompt asking which name you want. (#631) (#734)
- `taxize-authentication`
manual file covering authentication information across the package (#681)
- `gnr_resolve()` docs about age of datasets used in the Global Names Resolver, and how to access age of datasets (#737)
- `get_eolid()` fixes: gains new attribute `pageid`; `uri`s given are updated to EOL's new URL format; `rank` and `datasource` parameters were not documented, now are; we no longer use short names for data sources within EOL, but instead use their full names (#702) (#742)
- `col_search()` now returns attributes on the output data.frames with number of results found and returned, and other metadata about the search
- `gnr_datasources()`
loses the `todf` parameter; now always returns a data.frame and the data.frame has all the columns, whereas the default call returned a limited set of columns in previous versions
- `get_wormsid()` was failing when there was a direct match found with more than 1 result (#740)
- `get_*` functions: linting of the input to the `rows` parameter was failing with a vector of values in some cases (#741)
- `iucn_summary()`: we weren't passing on the API key internally correctly (#735) thanks @PrincessPi314 for the report

https://github.com/ropensci/taxize/compare/v0.9.4...v0.9.5
- `iucn_summary_id()` is defunct, use `iucn_summary()` instead
- `col_downstream()` gains parameter `extant_only` (logical) to optionally keep extant taxa only (#714) thanks @ArielGreiner for the inquiry
- `downstream()` gains another `db` option: Worms. You can now set `db="worms"` to use Worms to get taxa downstream from a target taxon. In addition, `taxize` gains new function `worms_downstream()`, which is used under the hood in `downstream(..., db="worms")` (#713) (#715)
- `id2name()` with `db` options for tol, itis, ncbi, worms, gbif, col, and bold. The function converts taxonomic IDs to names. It's sort of the inverse of the `get_*()` family of functions. (#712) (#716)
- `tax_rank()`
gains new parameter `rows` so that one can pass `rows` down to `get_*()` functions
- `synonyms()` warning from an internal `cbind()` call now fixed (#704) (#705) thanks @vijaybarve
- `taxize` function calls thrown when notifying users about API keys (e.g., `taxize::use_tropicos()`) to make it very clear where the functions live (to avoid confusion with `usethis`) (#724) (#725) thanks @maelle
- `iucn_summary()` to output the same structure when no match is found as when a match is found so that when output is passed to `iucn_status()` behavior is the same (#708) thanks @Rekyt
- `tax_name()` tests on CRAN (#728)
- `httr` replaced by `crul` throughout (#590)
- `vcr`, making tests much faster and not prone to errors from remote services being down (#729)
- `eol_dataobjects()`
gains new parameter `language`. `eol_pages()` loses the `iucn`, `images`, `videos`, `sounds`, `maps`, and `text` parameters, and gains `images_per_page`, `videos_per_page`, `sounds_per_page`, `maps_per_page`, `texts_per_page`, and `texts_page`. Please do let us know if you find any problems with any EOL functions (#717) (#718)
- `db` value for `comm2sci()` and `sci2comm()` is now `ncbi` instead of `eol`
- `get_*()` functions changed parameter `verbose` to `messages` to not conflict with `verbose` passed down to `crul::HttpClient`
- `ncbi_ping()` reworked to allow use of your API key as a parameter or pulled from your environment; `eol_ping()` now uses https instead of http, and parses JSON instead of XML.
- `get_eolid()` was erroring when no results found for a query due to not assigning an internal variable (#701) (#709) thanks for the fix @taddallas
- `get_tolid()` was erroring when values were `NULL` - now replacing all `NULL` with `NA_character_` to make `data.table::rbindlist()` happy (#710) (#711) thanks @gpli for the fix
- `rank_ref` data.frame of taxonomic ranks: species subgroup, forma, varietas, clade, megacohort, supercohort, cohort, subcohort, infracohort. When there's no matched rank, errors can result in many of the downstream functions. The data.frame now has 43 rows. (#720) (#727)
- `downstream()` and `ncbi_get_taxon_summary()`: change in `ncbi_get_taxon_summary` to break up queries into smaller chunks to avoid HTTP 414 errors ("URI too long") (#727) (#730) thanks for reporting @fischhoff and @benjaminschwetz
- `use_entrez()`, `use_eol()`, `use_iucn()` (which internally uses `rredlist::rl_use_iucn()`), and `use_tropicos()` (#682) (#691) (#693) By @maelle
- `tropicos_ping()`
- `downstream()` and `gbif_downstream()`: some of the results don't have a `canonicalName`, so now safely try to get that field (#673)
- `as.uid()` was erroring when passing in a taxon ID (#674) (#675) by @zachary-foster
- `get_boldid()` (and by extension `classification(..., db = "bold")`): was failing when no parent taxon found, just fill in with NA now (#680)
- `synonyms()`: was failing for some TSNs for `db="itis"` (#685)
- `tax_name()`: `rows` arg wasn't being passed on internally (#686)
- `gnr_resolve()` and `gnr_datasources()`: problems were caused by http scheme, switched to use https instead of http (#687)
- `class2tree()`: organisms with unique rank lower than non-unique ranks will give extra wrong rows (#689) (#690) thanks @gpli
- `ncbi_get_taxon_summary()`: changes in the NCBI API most likely lead to HTTP 414 (URI Too Long) errors. We now loop internally for the user. By extension this helps problems upstream in `downstream()`/`ncbi_downstream()`/`ncbi_children()` (#698)
- `class2tree()`: was erroring when name strings contained pound signs (e.g., `#`) (#699) (#700) thanks @gpli
- `Sys.sleep` for NCBI requests if the user has an API key (#667)
- `?taxize-authentication`
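The `ncbi_get_taxon_summary()` item above describes looping over smaller queries to avoid HTTP 414 errors. A minimal sketch of that general idea in base R follows; the helper name `chunk_ids()` and the chunk size of 100 are assumptions for illustration, not taxize internals:

```r
# Hypothetical helper: split a vector of IDs into chunks of at most `size`
chunk_ids <- function(ids, size = 100) {
  if (length(ids) == 0) return(list())
  unname(split(ids, ceiling(seq_along(ids) / size)))
}

ids <- as.character(1:250)
chunks <- chunk_ids(ids, size = 100)
lengths(chunks)  # 100 100 50

# Instead of one very long URL, build one comma-separated query per chunk.
# A real implementation would send one HTTP request per element of `queries`
# and combine the results afterwards.
queries <- vapply(chunks, paste, character(1), collapse = ",")
length(queries)  # 3
```

Keeping each query string short bounds the URL length, at the cost of making several requests instead of one.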
- `verbose` to `messages` across the package so that suppressing calls to `message()` does not conflict with curl options passed in
- `genbank2uid()` and `ncbi_get_taxon_summary()` to use `crul` instead of `httr` for HTTP requests
- `get_tolid()`: it was missing assignment of the `att` attribute internally, causing failures in some cases (#663) (#672)
- `ncbi_children()` (and thus `children()` when requesting NCBI data) to not fail when there is an empty result from the internal call to `classification()` (#664) thanks @arendsee

Stalled on CRAN. Install like
install.packages("taxize", repos = c("http://packages.ropensci.org"))
# OR
remotes::install_github("ropensci/taxize")
# OR
devtools::install_github("ropensci/taxize")
- `class2tree()` gets a major overhaul thanks to @gedankenstuecke and @trvinh (!!). The function now takes unnamed ranks into account when clustering, which fixes a problem where trees were unresolved for many splits because the named taxonomy levels were shared between them. It now makes full use of the NCBI Taxonomy string, including the unnamed ranks, leading to higher resolution trees that have fewer multifurcations (#611) (#634)
- `?taxize-authentication` for help. Importantly, note that API key names (both R options and environment variables) have changed. They are now the same for R options and env vars: `TROPICOS_KEY`, `EOL_KEY`, `PLANTMINER_KEY`, `ENTREZ_KEY`. You no longer need an API key for Plantminer. (#640) (#646)
- `crul` and `zoo`
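Since the API key names noted above are now shared between R options and environment variables, a lookup can use one name for both. The sketch below is a hypothetical helper in base R, not taxize's own code; in particular, checking options before environment variables is an assumption:

```r
# Hypothetical helper: look for a key under the same name in R options and
# environment variables. The order (options first) is an assumption.
find_key <- function(name) {
  key <- getOption(name, default = "")
  if (!nzchar(key)) key <- Sys.getenv(name, unset = "")
  if (!nzchar(key)) stop("no API key found for ", name, call. = FALSE)
  key
}

# Placeholder values only; these are not real keys
Sys.setenv(ENTREZ_KEY = "placeholder-not-a-real-key")
options(TROPICOS_KEY = "another-placeholder")

find_key("ENTREZ_KEY")
find_key("TROPICOS_KEY")
```

In practice a key would usually be set once in `.Renviron` or `.Rprofile` rather than in a script.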
- `downstream()`: we now pass on `limit` and `start` parameters to `gbif_downstream()`; we weren't doing that before; the two parameters control pagination (#638)
- `genbank2uid()` now returns the correct ID when there are multiple possibilities and invalid IDs no longer make whole batches fail (#642) thanks @zachary-foster
- `children()` outputs made more consistent for certain cases when no results found for searches (#648) (#649) thanks @arendsee
- `downstream()` by passing `...` (additional parameters) down to `ncbi_children()` used internally. This allows, e.g., use of the `ambiguous` parameter in `ncbi_children()`, which lets you remove ambiguously named nodes (#653) (#654) thanks @arendsee
- `httr` for `crul` in EOL and Tropicos functions - note that this won't affect you unless you're passing curl options. See package `crul` for help on curl options. Along with this change, the parameter `verbose` has changed to `messages` (for toggling printing of information messages)
- `CONTRIBUTING.md`
file for how to contribute to the test suite (#635)
- `genbank2uid` now returns the correct ID when there are multiple possibilities and invalid IDs no longer make whole batches fail.
- `downstream()`: passing numeric taxon ids to the function while using `db="ncbi"` wasn't working (#641) thanks @arendsee
- `children()`: passing numeric taxon ids to the function while using `db="worms"` wasn't working (#650) (#651) thanks @arendsee
- `synonyms_df()` - that attempts to combine many outputs from the `synonyms()` function - now removes NA/NULL/empty outputs before attempting the combination (#636)
- `gnr_resolve()`: before, if `preferred_data_sources` was used, you would get the preferred data but only a few columns of the response. We now return all fields; however, we only return the preferred data part when that parameter is used (#656)
- `children()`. It was returning unexpected results for ambiguous taxonomic names (e.g., there's some insects that are returned when searching within Bacteria). It was also failing when one tried to get the children of a root taxon (e.g., the children of the NCBI id 131567). (#639) (#647) fixed via PR (#659) thanks @arendsee and @zachary-foster
- `get_*()`
functions
- `get*()` functions had `NaN` as default `rows` parameter value. Those all changed to `NA`
- `rows` parameter value given
- `get_*()` functions
- `get_*()` functions to behave the same when `ask = FALSE, rows = 1` and `ask = TRUE, rows = 1` as these should result in the same outcome. (#627) thanks @zachary-foster !
- `NA` with no indication that there were multiple matches.
- `comm2sci()`
to S3 setup with methods for `character`, `uid`, and `tsn` (#621)
- `iucn_status()` now has S3 setup with a single method that only handles output from the `iucn_summary()` function.
- `key` parameter to function `iucn_id()` (#633)
- `sci2comm()`: to indicate how to get non-simplified output (which includes what language the common name is from) vs. getting simplified output (#623) thanks @glaroc !
- `sci2comm()` to not be case sensitive when looking for matches (#625) thanks @glaroc !
- `eol_search()`: `link` and `content`
- `eol_search()` to describe returned data.frame
- `bold_ping()` to use new base URL for their API
- `rank_ref`, see `?rank_ref`
- `downstream()` via fix to `rank_ref` dataset to include "infraspecies" and make "unspecified" and "no rank" equivalent. Fix to `col_downstream()` to properly remove ranks lower than allowed. (#620) thanks @cdeterman !
- `iucn_summary`: changed to using `rredlist` package internally. `sciname` param changed to `x`. `iucn_summary_id()` now is deprecated in favor of `iucn_summary()`. `iucn_summary()` now has an S3 setup, with methods for `character` and `iucn` (#622)
- `rank_ref`
dataset as that rank sometimes used at NCBI (from bug reported in `ncbi_downstream()`) (#626)
- `sci2comm()`, add `tryCatch()` to internals to catch failed requests for specific pageid's (#624) thanks @glaroc !
- `get_nbnid()` (#632)
- `ncbi_downstream()` and now NCBI is an option in the function `downstream()` (#583) thanks for the push @andzandz11
- `wikitaxa`, with contributions from @ezwelty (#317)
- `scrapenames()`
gains a parameter `return_content`, a boolean, to optionally return the OCR content as a text string with the results. (#614) thanks @fgabriel1891
- `get_iucn()` - to get IUCN Red List ids for taxa. In addition, new S3 methods `synonyms.iucn` and `sci2comm.iucn` - no other methods could be made to work with IUCN Red List ids as they do not share their taxonomic classification data (#578) thanks @diogoprov
- `bold`
now an option in `classification()` function (#588)
- `genbank2uid()` can give back more than 1 taxon matched to a given Genbank accession number. Now the function can return more than one match for each query, e.g., try `genbank2uid(id = "AM420293")` (#602) thanks @sariya
- `cbind()` usage to include `...` for method consistency (#612)
- `tax_rank()` used to be able to do only ncbi and itis. Can now do a lot more data sources: ncbi, itis, eol, col, tropicos, gbif, nbn, worms, natserv, bold (#587)
- `classification()`
docs in a section `Lots of results` a note about how to deal with results when there are A LOT of them. (#596) thanks @ahhurlbert for raising the issue
- `tnrs()` now returns the resulting data.frame in the order of the names passed in by the user (#613) thanks @wpetry
- `gnr_resolve()` to now strip out taxonomic names submitted by user that are NA, or zero length strings, or are not of class character (#606)
- `gnr_resolve()` (#610) thanks @kamapu
- `tnrs()` docs that the service doesn't provide any information about homonyms. (#610) thanks @kamapu
- `parvorder`
to the `taxize` `rank_ref` dataset - used by NCBI - if a taxon was returned with that rank, some functions in `taxize` were failing due to that rank missing in our reference dataset `rank_ref` (#615)
- `get_colid()` via problem in parsing within `col_search()` (#585)
- `gbif_downstream` (and thus fix in `downstream()`): there were two rows with `form` in our `rank_ref` reference dataset of rank names, causing > 1 result in some cases, then causing `vapply` to fail as it's expecting a length 1 result (#599) thanks @andzandz11
- `genbank2uid()`: was failing when getting more than 1 result back, works now (#603) and fails better now, giving back warnings/error messages that are more informative (see also #602) thanks @sariya
- `synonyms.tsn()`: in some cases a TSN has > 1 accepted name. We get accepted names first from the TSN, then look for synonyms, and hadn't accounted for > 1 accepted name. Fixed now (#607) thanks @tdjames
- `sci2comm()` - was not dealing internally with passing the `simplify` parameter (#616)