SeqArray Versions

Data management of large-scale whole-genome sequence variant calls (Development version only)

v1.38.0

1 year ago

CHANGES IN VERSION 1.38.0

UTILITIES

  • new option 'ext_nbyte' in seqGet2bGeno()
  • seqAlleleCount() and seqGetAF_AC_Missing() return NA instead of zero when all genotypes are missing at a site
  • seqGDS2VCF() does not output the FORMAT column if there is no selected sample (e.g., site-only VCF files)
  • seqGetData(, "$chrom_pos2") is similar to seqGetData(, "$chrom_pos"), except that duplicates are marked with a suffix ("_1", "_2" or >2)
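
The suffixing described for "$chrom_pos2" can be pictured with a small base-R sketch. It is an illustration only, not SeqArray's implementation: the "chrom:pos" strings are made up, and it assumes that the first occurrence is left unchanged while later repeats receive "_1", "_2" and so on.

```r
# Base-R illustration only; not SeqArray's implementation.
# Hypothetical "chrom:pos" strings; "1:2000" occurs three times.
chrom_pos <- c("1:1000", "1:2000", "1:2000", "2:500", "1:2000")

# k = 0 for the first occurrence of a value, 1 for the next repeat, and so on
k <- ave(seq_along(chrom_pos), chrom_pos, FUN = seq_along) - 1L

# assumed convention: first occurrence unchanged, repeats get "_1", "_2", ...
chrom_pos2 <- ifelse(k == 0L, chrom_pos, paste0(chrom_pos, "_", k))

print(chrom_pos2)
```

With real data, the package's own output from seqGetData(, "$chrom_pos2") should be treated as authoritative.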

NEW FEATURES

  • seqGDS2BED() can convert to PLINK BED files with the best-guess genotypes when there are only numeric dosages in the GDS file
  • seqEmptyFile() outputs an empty GDS file

v1.36.0

2 years ago

CHANGES IN VERSION 1.36.0

NEW FEATURES

  • new functions seqUnitCreate(), seqUnitSubset() and seqUnitMerge()
  • new functions seqFilterPush() and seqFilterPop()
  • new functions seqGet2bGeno() and seqGetAF_AC_Missing()
  • new variable name "$dosage_sp" in seqGetData() for a sparse matrix of dosages
  • the first argument 'gdsfile' can be a file name in seqAlleleFreq(), seqAlleleCount() and seqMissing()
  • new function seqMulticoreSetup() for setting a multicore cluster according to a numeric value assigned to the argument 'parallel'
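
A minimal usage sketch of the filter stack and of passing a file name as 'gdsfile', assuming SeqArray (>= 1.36.0) is installed and using the example GDS file returned by seqExampleFileName("gds"). The variant ranges are arbitrary and chosen only for illustration.

```r
library(SeqArray)

# example GDS file shipped with the package
fn <- seqExampleFileName("gds")
gds <- seqOpen(fn)

# outer filter: the first 100 variants
seqSetFilter(gds, variant.sel = 1:100, verbose = FALSE)

# save the current filter on the stack, then narrow it temporarily
seqFilterPush(gds)
seqSetFilter(gds, variant.sel = 1:10, verbose = FALSE)
n_inner <- length(seqGetData(gds, "variant.id"))

# restore the filter that was saved by seqFilterPush()
seqFilterPop(gds)
n_outer <- length(seqGetData(gds, "variant.id"))

seqClose(gds)

# the first argument can also be a file name instead of an open GDS object
af <- seqAlleleFreq(fn)
```

Pairing each seqFilterPush() with a seqFilterPop() keeps a helper function from leaking its temporary filter to the caller.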

UTILITIES

  • allow opening a duplicated GDS file ('allow.duplicate=TRUE') when the input is a file name instead of a GDS object in seqGDS2VCF(), seqGDS2SNP(), seqGDS2BED(), seqVCF2GDS(), seqSummary(), seqCheck() and seqMerge()
  • remove the deprecated '.progress' in seqMissing(), seqAlleleCount() and seqAlleleFreq()
  • add summary.SeqUnitListClass()
  • seqSNP2GDS() creates no genotype or phase data nodes if a SNP dosage GDS file is the input

BUG FIXES

  • seqUnitApply() works correctly with selected samples if 'parallel' is a non-fork cluster
  • seqVCF2GDS() and seqVCF_Header() work correctly if the VCF header has white space
  • fix seqGDS2BED() with selected samples for sex and phenotype information
  • bug fix in seqGDS2VCF() if there is no integer genotype

v1.32.0

2 years ago

CHANGES IN VERSION 1.32.0

NEW FEATURES

  • new option 'ret.idx' in seqSetFilter() for unsorted sample and variant indices
  • new option 'ret.idx' in seqSetFilterAnnotID() for unsorted variant index
  • seqSetFilterPos() is rewritten: new options 'ref' and 'alt'; 'multi.pos=TRUE' by default
  • new option 'packed.idx' in seqAddValue() for packing an indexing variable
  • new option 'warn' in seqSetFilter() to enable or disable the warning
  • new functions seqNewVarData() and seqListVarData() for variable-length data

UTILITIES

  • allow no variant in seqApply() and seqBlockApply()
  • the list object returned from seqGetData() always has names if there is more than one input variable name

BUG FIXES

  • seqGDS2VCF() should output "." instead of NA in the FILTER column
  • seqGetData() should support factor when '.padNA=TRUE' or '.tolist=TRUE'
  • fix seqGDS2VCF() with factor variables
  • seqSummary(gds, "$filter") should return a data frame with zero rows if 'annotation/filter' is not a factor

v1.26.0

4 years ago

CHANGES IN VERSION 1.26.0

NEW FEATURES

  • new function seqAddValue()

UTILITIES

  • RLE chromosome coding in seqBED2GDS()
  • change the file name "vignettes/R_Integration.Rmd" to "vignettes/SeqArray.Rmd", so vignette("SeqArray") can work directly
  • correct Estimated remaining Time to Complete (ETC) for load balancing in seqParallel()

BUG FIXES

  • seqBED2GDS(, verbose=FALSE) should have no display

CHANGES

  • use an SVG file instead of PNG in vignettes

v1.24.0

5 years ago

CHANGES IN VERSION 1.24.0

NEW FEATURES

  • a new function seqResetVariantID()
  • a new option in seqRecompress(, compress="none") to uncompress all data
  • seqGetData() allows a GDS file name in the first argument
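
As a brief sketch of the file-name form of seqGetData(), assuming SeqArray (>= 1.24.0) is installed and using the example GDS file shipped with the package:

```r
library(SeqArray)

# example GDS file shipped with the package
fn <- seqExampleFileName("gds")

# pass the file name directly; no explicit seqOpen()/seqClose() in user code
pos <- seqGetData(fn, "position")
head(pos)
```

This form is convenient for one-off reads. For repeated access or for filters, opening the file once with seqOpen() is likely the better fit.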

v1.22.0

5 years ago

CHANGES IN VERSION 1.22.0

UTILITIES

  • avoid duplicated meta-information lines in seqVCF2GDS() and seqVCF_Header()
  • require >= R_v3.5.0, since reading from connections in text mode is buffered
  • seqDigest() requires the digest package
  • optimization in reading genotypes from a subset of samples (according to gdsfmt_1.17.5)

NEW FEATURES

  • seqSNP2GDS() imports dosage GDS files
  • seqVCF_Header() allows a BCF file as an input
  • a new function seqRecompress()
  • a new function seqCheck() for checking the data integrity of a SeqArray GDS file
  • seqGDS2SNP() exports dosage GDS files
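
The import and integrity-check functions listed above can be combined roughly as follows. This is a sketch under assumptions: it uses the example VCF file returned by seqExampleFileName("vcf"), writes to a temporary file, and passes a file name to seqCheck(), which presumes a reasonably recent SeqArray release.

```r
library(SeqArray)

# example VCF file shipped with the package
vcf_fn <- seqExampleFileName("vcf")

# temporary output file, removed at the end
gds_fn <- tempfile(fileext = ".gds")

# parse the VCF header only
hdr <- seqVCF_Header(vcf_fn)

# convert the VCF file to a SeqArray GDS file
seqVCF2GDS(vcf_fn, gds_fn, verbose = FALSE)

# check the data integrity of the new GDS file
chk <- seqCheck(gds_fn, verbose = FALSE)

unlink(gds_fn)
```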

BUG FIXES

  • seqVCF2GDS() and seqVCF_Header() are able to import site-only VCF files (i.e., VCF with no sample)
  • fix seqVCF2GDS() and seqBCF2GDS() since reading from connections in text mode is buffered for R >= v3.5.0

v1.21.4

5 years ago

Reading from connections in text mode is buffered in R >= 3.5.0. The buff fields in the new version (>=3.5.0) of R_ext/Connections.h are not used:

struct Rconn {
    ...
    unsigned char *buff;
    size_t buff_len, buff_stored_len, buff_pos;
};

Install:

library(devtools)
install_github("zhengxwen/SeqArray", ref="1d5ab05fa8ae8b754feab62f41ab00a182d54793")

v1.16.0

7 years ago

v1.12.8

7 years ago

v1.11.18

8 years ago

  • SeqArray_v1.11.18 is backward compatible with R_v2.15.0
  • later versions will require R (>=v3.3.0) and use the official C API R_GetConnection() to accelerate text import and export

Install:

library("devtools")
install_github("zhengxwen/SeqArray", ref="v1.11.18")