Bnosac Udpipe Versions

R package for Tokenization, Parts of Speech Tagging, Lemmatization and Dependency Parsing Based on the UDPipe Natural Language Processing Toolkit

0.8.11

1 year ago

CHANGES IN udpipe VERSION 0.8.11

  • replace move with std::move to fix R CMD check warning on recent versions of clang compilers
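
The kind of change described above can be sketched as follows. This is a minimal illustration, not code from the udpipe sources: the helper append_word and the using-directive are assumptions, added only to show how an unqualified move(...) call compiles and what the qualified form looks like. Recent clang versions warn about unqualified calls to std::move.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

using namespace std;  // assumption: this is what lets an unqualified move(...) compile

// Hypothetical helper, not taken from the udpipe sources.
// Appends `word` to `words` by moving it instead of copying it.
void append_word(vector<string>& words, string word) {
  // Before: words.push_back(move(word));  // unqualified call, flagged by recent clang
  words.push_back(std::move(word));        // after: explicitly qualified
}

// Small usage example for the helper above.
void example_usage() {
  vector<string> words;
  append_word(words, "token");
  assert(words.size() == 1);
  assert(words[0] == "token");
}
```

The behaviour is the same in both forms; only the spelling of the call changes.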

0.8.10

1 year ago

CHANGES IN udpipe VERSION 0.8.10

  • use snprintf instead of sprintf to handle the R CMD check deprecation note on M1mac
  • reduction of timings of the examples of document_term_matrix, document_term_frequencies, document_term_frequencies_statistics, cooccurrence, dtm_bind, keywords_collocation
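
The snprintf item above can be illustrated with a short sketch. The function format_range and its buffer size are hypothetical and not taken from the udpipe sources; the point is only that snprintf takes the buffer size and reports truncation, whereas sprintf writes without a bound.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper, not taken from the udpipe sources.
// Formats a token range such as "3-4" into a fixed-size buffer.
std::string format_range(int first, int last) {
  char buffer[32];
  // Before: sprintf(buffer, "%d-%d", first, last);  // no bound on the write
  int written = std::snprintf(buffer, sizeof(buffer), "%d-%d", first, last);
  // snprintf returns the length the full output would have had:
  // negative means an error, >= sizeof(buffer) means the output was truncated.
  if (written < 0 || static_cast<std::size_t>(written) >= sizeof(buffer)) {
    return std::string();
  }
  return std::string(buffer, static_cast<std::size_t>(written));
}

// Small usage example for the helper above.
void example_usage() {
  assert(format_range(3, 4) == "3-4");
}
```

Returning an empty string on error or truncation is a design choice of this sketch only.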

0.8.9

2 years ago

CHANGES IN udpipe VERSION 0.8.9

  • fix R CMD check message on Fedora clang infrastructure: rcpp_udpipe.cpp:243:8: warning: use of bitwise '&' with boolean operands
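
A warning of this kind typically goes away when the bitwise operator is replaced by the logical one. The sketch below is an assumption about the general pattern, not the actual code at rcpp_udpipe.cpp:243; the predicates has_text and starts_with_hash are hypothetical.

```cpp
#include <cassert>
#include <string>

// Hypothetical predicates, not taken from rcpp_udpipe.cpp.
bool has_text(const std::string& s) { return !s.empty(); }

// Calling front() on an empty string is undefined, so this must only
// be reached for non-empty input.
bool starts_with_hash(const std::string& s) { return s.front() == '#'; }

bool is_comment_line(const std::string& line) {
  // Before: return has_text(line) & starts_with_hash(line);
  //   bitwise '&' evaluates both operands, even when the first is false
  // After: logical '&&' short-circuits, so the second check is skipped
  //   for an empty line.
  return has_text(line) && starts_with_hash(line);
}

// Small usage example for the functions above.
void example_usage() {
  assert(is_comment_line("# sent_id = 1"));
  assert(!is_comment_line(""));
}
```

Besides silencing the warning, the logical operator makes the intended short-circuit evaluation explicit.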

0.8.8

2 years ago

CHANGES IN udpipe VERSION 0.8.8

  • dtm_svd_similarity: fix to make sure that, when given a dtm with features which are all missing/zero, the scoring still works as expected instead of removing features which contain no data whatsoever. This allows dtm_svd_similarity to be used alongside embeddings of R package word2vec, which might contain words which are not in the dtm. See the example in ?dtm_svd_similarity
  • added txt_grepl

CHANGES IN udpipe VERSION 0.8.7

  • txt_count now always returns an integer, even in the border case where a character vector of length 0 is supplied

0.8.6

2 years ago

CHANGES IN udpipe VERSION 0.8.6

  • Downloading models to paths containing non-ASCII characters now works (issue #95)
  • strsplit.data.frame gains ... which are passed on to strsplit (e.g. to use fixed=TRUE for speeding up)
  • read_connlu is now using fixed=TRUE when splitting by newline symbol (for speeding up parsing with function udpipe)
  • Added txt_paste
  • Added txt_context
  • Use html_vignette instead of html_document in the vignettes in order to reduce package size

0.8.5

3 years ago

CHANGES IN udpipe VERSION 0.8.5

  • Added document_term_matrix.default, document_term_matrix.integer and document_term_matrix.numeric
  • Added groups argument to dtm_colsums and dtm_rowsums
  • Added dtm_align
  • Added dtm_sample
  • Added document_term_matrix.matrix
  • dtm_cbind and dtm_rbind now allow passing more than 2 sparse matrices
  • cbind_morphological gains the argument which, to specify which morphological features to extract
  • txt_count now returns NA when NA is provided instead of an error
  • txt_contains now returns NA when NA is provided instead of FALSE, unless value is set to TRUE
  • txt_collapse now also works if provided a list of character vectors
  • paste.data.frame now works as well if a data.table is passed instead of a data.frame
  • txt_recode gains an extra argument na.rm

0.8.4-1

3 years ago

CHANGES IN udpipe VERSION 0.8.4-1

  • Fixing the Solaris compilation issue in ufal::udpipe::multiword_splitter::append_token

0.8.4

3 years ago

CHANGES IN udpipe VERSION 0.8.4

  • Update to UDPipe 1.2.1 (28 Sep 2018)
    • this adds segment_size and learning_rate_final parameters to tokenizer training
    • correctly set SpaceAfter for last token when normalizing spaces.
  • The default of udpipe_download_model has changed: it now downloads models built on Universal Dependencies 2.5 instead of the models built on Universal Dependencies 2.4
  • Added txt_count
  • Added txt_overlap
  • Added dtm_conform
  • Added dtm_chisq
  • Added dtm_svd_similarity
  • Added as_fasttext
  • Added unlist_tokens
  • txt_recode_ngram now also works gracefully in case ngram is set to 1 although the intention is not to use it when ngram is set to 1
  • Experimental changes regarding cbind_dependencies which might change in a subsequent release.
    • cbind_dependencies has now been implemented for type 'child'.
    • cbind_dependencies now allows adding the row numbers of the parent or children to which the token is linked, using the dependency parsing output.
  • Experimental and unfinished work on allowing to easily query dependency relations

0.8.3

4 years ago

CHANGES IN udpipe VERSION 0.8.3

  • The default of udpipe_download_model has changed: it now downloads models built on Universal Dependencies 2.4 instead of the models built on Universal Dependencies 2.3
  • also allow strsplit.data.frame to work if the data argument is a data.table
  • in case the model loaded with udpipe_load_model is a nil pointer (most likely due to users who restarted their R sessions without knowing), try reloading the model file in udpipe_annotate
  • fix issue in udpipe_reconstruct giving wrong values in start/end positions of the token in case a token had both SpacesBefore and SpacesAfter. Users of versions prior to 0.8.3 can easily circumvent this issue by removing leading/trailing white space with trimws on the text before using udpipe::udpipe.
  • document_term_matrix now gains argument weight allowing to select another column to put into the matrix cells
  • add txt_contains

0.8.2

4 years ago

CHANGES IN udpipe VERSION 0.8.2

  • udpipe::udpipe now gains 2 arguments: parallel.cores and parallel.chunksize in order to annotate in parallel over your CPU cores.
  • document_term_matrix.data.frame now preserves order of the documents (issue #44)
  • dtm_remove_lowfreq, dtm_remove_tfidf, dtm_remove_terms gain extra argument remove_emptydocs; explicitly add drop=FALSE to internal dtm_... calls
  • add dtm_remove_sparseterms (issue #44)
  • make sure downloading a model fails gracefully if the GitHub internet resource is not available on CRAN machines
  • udpipe_download_model now also returns download_failed/download_message indicating if the download failed due to internet connectivity issues