MiXCR is an ultimate software platform for analysis of Next-Generation Sequencing (NGS) data for immune profiling.
findShmTrees: combining heavy and light SHM trees utilizing information added to clonotypes by the groupClones command. Nodes in the resulting tree will contain both light and heavy chains. If there is no connection to a clone from a companion chain, a reconstructed sequence will be added.
--dont-combine-tree-by-cells option to reconstruct separate heavy and light SHM trees
exportShmSingleCellTrees command that exports one node per line. If there are several roots in a tree, data will be exported in different columns.
-subtreeId in tree exports to differentiate parts of trees from different chains
exportShmTreesWithNodes and exportShmTrees commands will export subtrees with different chains in separate rows.
groupClones command
exportClones in groupId column. Such clones can be filtered out from export by --filter-out-group-types contamination
undefined group being split by cell barcodes
-subtreeId for determination of different chains in the same tree
-numberOfClonesInTree [forChain]: Number of unique clones in the SHM tree.
-numberOfNodesWithClones: Number of nodes with clones, i.e. nodes with different clone sequences.
-totalReadsCountInTree [forChain]: Total sum of read counts of clones in the SHM tree.
-totalUniqueTagCountInTree (Molecule|Cell|Sample) [forChain]: Total count of unique tags of the specified type in the SHM tree.
-chains: Chain type of the tree
-treeHeight: Height of the tree
-vGene, -jGene, -vFamily, -jFamily: in previous versions these were exported only for nodes with clones
-vBestIdentityPercent, -jBestIdentityPercent, -isOOF and -isProductive are now exported for reconstructed nodes too
-aaLength and -allAALength are available alongside -nLength and -allNLength
-aaMutationsRate is available alongside -nMutationsRate
germline in -nFeature, -aaFeature, -nLength and -aaLength in exportClones, exportAlignments and exportCloneGroups. It allows exporting the germline sequence instead of the gene sequence.
-mutationsDetailed): added optional filter by mutation type: ... [(substitutions|indels|inserts|deletions)]
-nMutationsCount, -aaMutationsCount, -allNMutationsCount and -allAAMutationsCount for all relevant exports
In exportShmTreesWithNodes the (germline|mrca|parent) option is now optional; mutations are exported relative to germline by default
--export-clone-groups-sort-chains-by mixin
VCDR3Part, DCDR3Part or JCDR3Part
-nLength, -nMutationsCount and -nMutationsRate can be calculated for multiple gene features (e.g. -nMutationsRate VRegionTrimmed,JRegionTrimmed)
--export-clone-groups-sort-chains-by mixin with the type of clone sorting used to determine the primary and the secondary chains. It applies to the exportCloneGroups command. By default, it's Auto (by UMI if available, by Read otherwise; the previous default value was Read)
--filter-out-group-types mixin to filter out clones having a certain clone group assignment kind: found, undefined or contamination. It applies to the exportClones command
exportCloneGroups by default will export groups in separate files for IG, TRAB, TRGD and mixed. This behaviour can be switched off by using --reset-export-clone-table-splitting or a single --export-clone-groups-for-cell-type. In case of several --export-clone-groups-for-cell-type options, every cell type will be exported in a separate file.
With --export-clone-groups-for-cell-type in exportCloneGroups all mixed or unmatched groups will be filtered out.
TRAD meta-chain split into TRA and TRD as it should be. Chain assignment for clonotypes is based on J genes.
IGH, TRB, TRA and TRD chains. Now allelic names correspond to the IUIS nomenclature.
IGK Vend coordinates corrected.
UTR5Begin coordinates added to the following mouse genes: IGKV23-1, IGKV20-101-2, IGKV14-130, IGKV8-28, TRGV2
milab-human-rna-tcr-umi-race preset has been updated: now clones are assembled by default based on the CDR3, in line with the manufacturer's recommended read length.
flairr-seq-bcr preset has been updated: now the preset sets species to human by default according to a built-in tag pattern with primer sequences.
invivoscribe-human-dna-trg-lymphotrack, invivoscribe-human-dna-trb-lymphotrack, invivoscribe-human-dna-igk-lymphotrack, invivoscribe-human-dna-ighv-leader-lymphotrack, invivoscribe-human-dna-igh-fr3-lymphotrack, invivoscribe-human-dna-igh-fr2-lymphotrack, invivoscribe-human-dna-igh-fr1-lymphotrack, invivoscribe-human-dna-igh-fr123-lymphotrack.
thermofisher-mouse-rna-tcb-ampliseq-sr, thermofisher-mouse-dna-tcb-ampliseq-sr, thermofisher-mouse-rna-igh-ampliseq-sr, thermofisher-mouse-dna-igh-ampliseq-sr.
takara-sc-human-rna-tcr-smarter
milab-human-rna-ig-umi-multiplex preset has been updated: the pattern now trims fewer nucleotides, which facilitates CDR1 identification. The splits by V and J genes have been removed as redundant due to the full-length assembling feature.
Combining trees step in findShmTrees command
VJJunction in shmTrees exports now produces an error
-nMutationsRate
-nMutationsRate if region is not covered for the clone
exportAlignmentsPretty broken in the previous version
exportAlignmentsPretty for cases where translation can't be performed
analyze executed with -f and --output-not-used-reads at the same time
-nMutationsRate for CDR3 in exportShmTreesWithNodes
extend with .vdjca input
findShmTrees filter for productive-only clones now checks for stop codons in all features, not only in CDR3
findShmTrees to false (was true before)
--productive-only to findShmTrees
--export-clone-groups-for-cell-type parameter
slice command on clnx files that weren't ordered by id.
slice: the default behaviour is now to keep original ids. The previous behaviour is available with the --reassign-ids option
--assemble-clonotypes-by [VDJRegion,CBegin(0,10)]
exportClonesOverlap
exportAirr in case of a clone with a CDR3 that doesn't have VCDR3Part and JCDR3Part
clone_id column in exportAirr
exportClones in case of splitting file by tag:... if there is a clone that has several tags of the requested level
-nMutationsCount, -nMutationsRate, -aaMutationsCount and -aaMutationsRate. Previously, in some cases they were calculated on a different region from the one requested.
CellBarcodesWithFoundGroups for groupClones QC checks
--no-feature in exportAlignmentsPretty
align: coverage now takes into account alignment-aided overlap
--build-from <path> was removed from the findShmTrees command
-lengthOf is now deprecated, use -nLength instead
-allLengthOf is now deprecated, use -allNLength instead
-mutationRate is now deprecated, use -nMutationsRate instead
Now MiXCR calculates Heavy-Light antibody and Alpha-Beta and Gamma-Delta TCR combined clones for single-cell data. Two new commands were introduced to enable this functionality:
groupClones: calculates multi-chain clones from assembled clonotypes and writes the result in a binary format;
exportCloneGroups: exports information about combined clonotypes.
All single-cell presets now automatically produce combined multi-chain output in both binary and textual formats; see files with names matching the *.clone.groups.tsv pattern in the output folder.
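The Auto default of --export-clone-groups-sort-chains-by (by UMI if available, by Read otherwise) can be illustrated with a short Python sketch. This is not MiXCR code. The Clone record and the function names are hypothetical, and "available" is assumed here to mean that every clone in the group carries a UMI count.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Clone:
    """Hypothetical clone record: chain name plus abundance counters."""
    chain: str
    reads: int
    umis: Optional[int] = None  # None when the data carry no UMI barcodes


def auto_sort_key(clones: List[Clone]) -> Callable[[Clone], int]:
    """Mimic 'Auto': use UMI counts if every clone has them, read counts otherwise."""
    if all(c.umis is not None for c in clones):
        return lambda c: c.umis
    return lambda c: c.reads


def primary_and_secondary(clones: List[Clone]) -> Tuple[Clone, Optional[Clone]]:
    """Rank clones of one chain type; the top one is primary, the runner-up secondary."""
    if not clones:
        raise ValueError("empty group")
    ranked = sorted(clones, key=auto_sort_key(clones), reverse=True)
    return ranked[0], (ranked[1] if len(ranked) > 1 else None)


group = [Clone("IGH", reads=900, umis=12), Clone("IGH", reads=1500, umis=7)]
primary, secondary = primary_and_secondary(group)
print(primary.reads, secondary.reads)
```

When UMI counts are present, the 900-read clone ranks first because it has more UMIs. Without the umis values, the 1500-read clone would become primary, which corresponds to the previous Read default.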
-biochemicalProperty <geneFeature> <property> or -baseBiochemicalProperties <geneFeature> export options. Available in export for alignments, clones and SHM tree nodes. Available properties: Hydrophobicity, Charge, Polarity, Volume, Strength, MjEnergy, Kf1, Kf2, Kf3, Kf4, Kf5, Kf6, Kf7, Kf8, Kf9, Kf10, Rim, Surface, Turn, Alpha, Beta, Core, Disorder, N2Strength, N2Hydrophobicity, N2Volume, N2Surface.
-isotype [<(primary|subclass|auto)>]
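As a rough illustration of the -biochemicalProperty options listed above, the Python sketch below averages the Kyte-Doolittle hydrophobicity scale over a CDR3 amino-acid sequence. These notes do not state whether MiXCR's Hydrophobicity property uses this scale or combines residues by averaging. Both are assumptions made for the sketch, and mean_property is a hypothetical helper.

```python
# Kyte-Doolittle hydrophobicity scale (Kyte & Doolittle, 1982).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_property(aa_seq: str, scale: dict) -> float:
    """Average a per-residue property over an amino-acid sequence such as a CDR3.

    Symbols missing from the scale (e.g. '*' for stop, '_' for frameshift) are skipped.
    """
    values = [scale[aa] for aa in aa_seq.upper() if aa in scale]
    if not values:
        raise ValueError("no residues covered by the scale")
    return sum(values) / len(values)


print(mean_property("CASSLGQETQYF", KYTE_DOOLITTLE))
```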
-mutationRate [<gene_feature>] in exportShmTreesWithNodes, exportClones and exportCloneGroups commands: number of mutations relative to the corresponding germline divided by the target sequence size. For exportClones and exportCloneGroups, CDR3 is not included in the calculation.
cram files as input for analyze and align commands. Optionally, a reference to the genome can be specified by --reference-for-cram
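The -mutationRate definition above (mutations relative to germline divided by target size) can be sketched in Python for the simplest case. The sketch assumes the germline and target sequences are already aligned, have equal length and differ only by substitutions. Real MiXCR alignments can also contain indels, which this hypothetical helper does not handle.

```python
def mutation_rate(germline: str, target: str) -> float:
    """Substitutions relative to germline divided by the target length.

    Simplified: assumes a gap-free, equal-length alignment (no indels).
    """
    if len(germline) != len(target):
        raise ValueError("this sketch handles only equal-length, gap-free alignments")
    if not target:
        raise ValueError("empty target sequence")
    mutations = sum(1 for g, t in zip(germline.upper(), target.upper()) if g != t)
    return mutations / len(target)


print(mutation_rate("ACGTACGTAC", "ACGTTCGTAA"))
```

In this example two of the ten positions differ, which gives a rate of 0.2.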
analyze and align, if file contains both paired and single reads
assemble to collapse UMI/Cell groups into contigs: now has a much better empirical seed selection step for multi-consensus assembly scenarios. This significantly increases sensitivity during assembly of secondary consensuses from the same group of sequences.
10x-sc-xcr-vdj preset.
cellecta-human-rna-xcr-umi-drivermap-air. Now UMI includes a part of the C-gene primer to increase diversity, and R2 is also used for payload.
irepertoire-human-rna-xcr-repseq-plus preset. Now {CDR2Begin:FR4End}.
bd-sc-xcr-rhapsody-full-length-enhanced-bead-v2.
takara-mouse-rna-tcr-umi-smarseq.
cellecta-human-dna-xcr-umi-drivermap-air, cellecta-human-rna-xcr-full-length-umi-drivermap-air, cellecta-mouse-rna-xcr-umi-drivermap-air.
irepertoire-mouse-rna-xcr-repseq-plus-umi-pe, irepertoire-human-rna-xcr-repseq-plus-umi-se, irepertoire-human-rna-xcr-repseq-plus-umi-pe.
isotype field added to exportClones for presets supporting isotype identification.
thermofisher-human-rna-igh-oncomine-lr and cellecta-human-rna-xcr-umi-drivermap-air presets to facilitate isotype separation.
maxNormalizedAlignmentPenalty and altSeedPenaltyTolerance are adjusted to increase sensitivity.
--split-by-sample option is now set to true by default for all align presets, as well as all presets that inherit from them. This new default behaviour applies unless it is directly overridden in the preset or with the --dont-split-by-sample mix-in.
exportAlignments now reports UMI and/or Cell barcodes by default for presets with barcodes.
--dry-run option in analyze
--assemble-contigs-by instead of --assemble-clonotypes-by.
exportClones and exportShmTreesWithNodes now output read count as the sum of reads for the given tag selection; a more complicated formula was used in previous versions
exportAlignments now includes the topChains column by default. exportClones reports topChains for single-cell presets.
geneFamilyName for genes like IGHA*00 (without the number before the * symbol)
listPresets command. Added grouping by vendor, labels and optional filtering
align or analyze by given tag pattern
align step speedup for most of the protocol-specific presets (see the list below)
generic-ont, generic-ont-with-umi, generic-pacbio, generic-pacbio-with-umi
assemble with UMI tags but with consensus assembler turned off
takara-human-rna-bcr-umi-smartseq, takara-human-rna-bcr-umi-smarter, takara-human-rna-tcr-umi-smartseq, takara-human-rna-tcr-umi-smarter-v2, takara-human-rna-tcr-smarter, takara-mouse-rna-bcr-smarter, takara-mouse-rna-tcr-smarter, 10x-sc-xcr-vdj, 10x-sc-5gex, abhelix-human-rna-xcr, bd-human-sc-xcr-rhapsody-cdr3, bd-mouse-sc-xcr-rhapsody-cdr3, bd-sc-xcr-rhapsody-full-length, cellecta-human-rna-xcr-umi-drivermap-air, illumina-human-rna-trb-ampliseq-sr, illumina-human-rna-trb-ampliseq-plus, irepertoire-human-rna-xcr-repseq-sr, irepertoire-human-rna-xcr-repseq-lr, irepertoire-mouse-rna-xcr-repseq-sr, irepertoire-mouse-rna-xcr-repseq-lr, irepertoire-human-rna-xcr-repseq-plus, irepertoire-mouse-rna-xcr-repseq-plus, irepertoire-human-dna-xcr-repseq-sr, irepertoire-human-dna-xcr-repseq-lr, milab-human-rna-ig-umi-multiplex, milab-human-rna-tcr-umi-race, milab-human-rna-tcr-umi-multiplex, milab-human-dna-tcr-multiplex, milab-human-dna-xcr-7genes-multiplex, milab-mouse-rna-tcr-umi-race, neb-human-rna-xcr-umi-nebnext, qiagen-human-rna-tcr-umi-qiaseq
MiXCR features robust support for inferring donor-specific allelic variants of V and J genes from NGS data, using the findAlleles command. With this new release, we introduce a comprehensive built-in database of human alleles. Now, the findAlleles command will utilize known allele names from this integrated library. Feel free to explore our database at https://vdj.online/library.
MiXCR now offers detailed insights into the quality of input data with its new quality control (QC) checks. A comprehensive list of checks provides complete information about the data and facilitates immediate feedback to the wet lab if any issues are detected.
Now one can build a gene segment reference library for de novo libraries or for chimeric model animals with just a single buildLibrary command. Check out our updated guide.
# --d-genes-from-fasta and --c-genes-from-fasta are optional
mixcr buildLibrary \
--v-genes-from-fasta v-genes.IGH.fasta \
--v-gene-feature VRegion \
--j-genes-from-fasta j-genes.IGH.fasta \
--d-genes-from-fasta d-genes.IGH.fasta \
--c-genes-from-fasta c-genes.IGH.fasta \
--chain IGH \
--species phocoena \
phocoena-IGH.json.gz
Now one can pass a sample sheet directly to the MiXCR analyze command as input. This way one can easily run MiXCR for an arbitrary structure of input files, demultiplexed or not, with any type of multiplexing used:
mixcr analyze generic-sc-ht-vdj-amplicon --species hsa \
sample-sheet.csv \
output_prefix
Support for MiLaboratories Human 7 Genes DNA Multiplex: milab-human-dna-xcr-7genes-multiplex
Support for Parse Bio Evercode Whole Transcriptome presets: parsebio-sc-3gex-evercode-wt-mini, parsebio-sc-3gex-evercode-wt and parsebio-sc-3gex-evercode-wt-mega
Support for the FLAIRR-Seq protocol via the flairr-seq preset
New generic single cell presets:
Low throughput (e.g. micro-wells) amplicon-based single cell:
generic-sc-lt-vdj-amplicon
generic-sc-lt-vdj-amplicon-umi
Low throughput (e.g. micro-wells) single cell with fragmentation (RNA-Seq):
generic-sc-lt-vdj-fragmented
generic-sc-lt-vdj-fragmented-umi
High throughput (e.g. droplets) amplicon-based single cell:
generic-sc-ht-vdj-amplicon
generic-sc-ht-vdj-amplicon-umi
High throughput (e.g. droplets) single cell with fragmentation (RNA-Seq):
generic-sc-ht-vdj-fragmented
generic-sc-ht-vdj-fragmented-umi
Reconstructing VDJ from generic gene expression data:
generic-sc-gex
generic-sc-gex-umi
New Biomed2 primer sets: biomed2-human-rna-igkl, biomed2-human-rna-trbdg.
Improved aligner parameters for all protocols. We spent more than 100,000 CPU-hours in total running the optimization. As a result, the alignment rate is better for most of the protocols, especially for data of average quality.
Added a new minSequenceCount parameter for the k-mer filter, allowing construction of more flexible filtering pipelines with better fallback behaviour for under-sequenced libraries.
Now full sample sheet with input file names can be provided as an input to the pipeline.
Sample sheets provided either with the --sample-sheet mixin or as a pipeline input will be fuzzy-matched against the data, allowing one substitution in unambiguous cases. This behaviour can be turned off by using the --sample-sheet-strict mixin instead, or by adding the --strict-sample-sheet-matching option if a full sample sheet is used as pipeline input.
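The one-substitution fuzzy matching of sample sheets described above can be sketched in Python as a Hamming-distance lookup. This illustrates the idea under stated assumptions and is not MiXCR's implementation. A barcode is compared only against sheet entries of the same length, and a near match is accepted only when exactly one entry is one substitution away. The strict flag loosely plays the role of --sample-sheet-strict, and all names are hypothetical.

```python
from typing import Dict, Optional


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(1 for x, y in zip(a, b) if x != y)


def match_sample(barcode: str, sheet: Dict[str, str], strict: bool = False) -> Optional[str]:
    """Return the sample name for a barcode, or None if unmatched or ambiguous.

    Exact matches always win. Unless strict, a barcode exactly one substitution
    away from a single sheet entry is accepted; several candidates are ambiguous.
    """
    if barcode in sheet:
        return sheet[barcode]
    if strict:
        return None
    near = [name for bc, name in sheet.items()
            if len(bc) == len(barcode) and hamming(bc, barcode) == 1]
    return near[0] if len(near) == 1 else None


sheet = {"ACGTAC": "S1", "TTGCAA": "S2"}
print(match_sample("ACGTAG", sheet))               # one substitution from S1's barcode
print(match_sample("ACGTAG", sheet, strict=True))  # strict mode: no fuzzy match
```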
New commands: mixcr qc, mixcr buildLibrary, mixcr mergeLibrary, mixcr debugLibrary
Various major improvements to sequencing and PCR error correction algorithms for tags and clonotypes:
Mechanism to apply different tag transformations at the align step. Transformations include mappings, string and sequence manipulations and various arithmetic operations. This feature accommodates single-cell scenarios where multiple well-known barcodes mark the same cell, allows converting sequence barcodes to a textual representation to adapt to the barcode naming schemas used in some protocols, and can convert multiple barcodes into a single cell id. The feature is currently used in presets for analysis of data from the Parse Biosciences and BD Rhapsody single-cell platforms.
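One of the transformations described above, turning several well-known barcodes into a single textual cell id, can be sketched in Python. The per-round whitelists, well names and underscore separator are hypothetical. The sketch only shows two ideas: several barcode sequences can map to the same name, and the resulting names are joined into one cell id. It does not reproduce MiXCR's transformation syntax.

```python
from typing import Dict, List, Optional, Sequence


def to_cell_id(barcodes: Sequence[str], whitelists: Sequence[Dict[str, str]]) -> Optional[str]:
    """Combine several per-round barcodes into one textual cell id.

    Each round has its own whitelist mapping barcode sequence -> well name;
    several sequences may map to the same well. Returns None if any barcode
    is not in its whitelist.
    """
    if len(barcodes) != len(whitelists):
        raise ValueError("one whitelist per barcode round is required")
    names: List[str] = []
    for seq, whitelist in zip(barcodes, whitelists):
        name = whitelist.get(seq)
        if name is None:
            return None
        names.append(name)
    return "_".join(names)


round1 = {"AACC": "A1", "GGTT": "A1", "CCAA": "A2"}  # two sequences mark well A1
round2 = {"TTTT": "B7"}
print(to_cell_id(["GGTT", "TTTT"], [round1, round2]))
```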
Special mechanism to allow NaN values in metrics in group filters (used in the minSequenceCount parameter of the k-mer filter).
Added fallback behaviour for under-sequenced libraries
analyze if target folder is specified
--tag-parse-unstranded
Reads dropped due to low quality, percent of total report string
--chains is used with exportClonesOverlap
export... - tag quality field added back to export columns
edgeRealignmentMinScoreOverride for more sensitive alignments for short paired-end reads
align now calculates percentages relative to the number of reads in the sample rather than the total number of reads in multi-sample analysis
assemble (--consensus-alignments, --consensus-state-stat, --downsample-consensus-state-stat) and analyze (--output-consensus-alignments, --output-consensus-state-stat, --downsample-consensus-state-stat)
findAlleles now recalculates functionality of de novo found alleles
assemble report
--by-feature and --by-gene to sortClones
-rankByReads and -rankByTag <(Molecule|Cell|Sample)> to exportClones and exportShmTreesWithNodes.txt
readIds in exportAlignments by default
findAlleles
findAlleles will remove unused genes from the library (genes that are not represented in the given donor)
--chains optional in the downsampling command; multiple inputs allowed
exportClones if file doesn't contain any clones
exportClones writes no_d_gene if VDJunction, DCDR3Part or DJJunction is requested in the absence of a D hit
exportReportsTable now covers most of the significant statistics from reports
The custom entry point of the image has been removed; it is now set to /bin/bash. Now one needs to specify the mixcr command at the beginning of the argument list:
Old: docker run ghcr.io/milaboratory/mixcr/mixcr analyze ...
New: docker run ghcr.io/milaboratory/mixcr/mixcr mixcr analyze ...
The new image is based on Amazon Corretto, which in turn is based on Amazon Linux 2. If the image needs customization, one now needs to use the yum package manager instead of apt/apt-get.
With old image:
FROM ghcr.io/milaboratory/mixcr/mixcr:4.3.2
# ...
RUN apt-get install -y wget
# ...
With new image:
FROM ghcr.io/milaboratory/mixcr/mixcr:4.4.0
# ...
RUN yum install -y wget
# ...
See the official docs for more details.
Better compatibility of official docker image with Nextflow
Parameter clusteringFilter.specificMutationProbability removed from the assemble action. Three new parameters are introduced instead:
clusteringFilter.correctionPower - this parameter determines how thoroughly the procedure should eliminate erroneous variants. A smaller value leaves fewer erroneous variants at the cost of accidentally correcting true variants. This value approximates the fraction of erroneous variants the algorithm will miss (type II errors).
clusteringFilter.backgroundSubstitutionRate - expected rate of substitutions happening before sequencing.
clusteringFilter.backgroundIndelRate - expected rate of indels happening before sequencing.
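The trade-off described for clusteringFilter.correctionPower can be illustrated with a toy Poisson error model in Python. This is a deliberately simplified sketch and not the clustering algorithm MiXCR uses. For simplicity, background_rate is treated as the probability that a single read of the parent clonotype becomes this particular variant. The function names are hypothetical. A smaller correction_power raises the count threshold, so more low-count variants are treated as errors, which mirrors the parameter description above.

```python
import math


def poisson_quantile(lam: float, q: float) -> int:
    """Smallest k such that P(X <= k) >= q for X ~ Poisson(lam)."""
    if lam < 0 or not 0 < q < 1:
        raise ValueError("need lam >= 0 and 0 < q < 1")
    if lam > 700:
        raise ValueError("exp(-lam) underflows; use a normal approximation instead")
    k, pmf = 0, math.exp(-lam)
    cdf = pmf
    while cdf < q:
        k += 1
        pmf *= lam / k
        cdf += pmf
        if pmf == 0.0:  # numerical floor reached; stop instead of looping forever
            break
    return k


def flagged_as_error(parent_reads: int, variant_reads: int,
                     background_rate: float, correction_power: float) -> bool:
    """Toy rule: a variant is a likely error if background errors on the parent
    would produce at least this many reads with probability 1 - correction_power."""
    expected_errors = parent_reads * background_rate
    return variant_reads <= poisson_quantile(expected_errors, 1.0 - correction_power)


for power in (0.1, 0.001):
    print(power, poisson_quantile(10000 * 1e-3, 1.0 - power))
```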
The majority of presets underwent name revisions (legacy names remain functional, though accompanied by deprecation warnings). See the full list of renames here.
🐞 This update addresses a significant issue that first appeared in version 4.3.0, which caused incorrect column names for FR4 nucleotide and amino acid sequences in export tables (e.g. nSeqJGeneWithoutCDR3Part instead of nSeqFR4).
findAlleles now works much faster for extremely diverse samples
assemble when badQualityThreshold=0
cumtop fallbacks)
-isOOF <gene_feature> column to export
-hasStops <gene_feature> column to export
-isProductive <gene_feature> column to export
findAlleles command
findAlleles command
findAlleles is now more resilient to the case when most allele variants of the donor differ from *00 alleles in the library
findAlleles command with --output-template argument
inferMinRecordsPerConsensus == true and cell level assembly
minRecordsPerConsensus inference mechanism for new filtering features introduced in the previous version (4.3.0)
$) in tag pattern matching algorithm
discardAmbiguousNucleotideCalls parameters for contig assembly
-cellId in commands exportClones and exportAlignments
cell_id, umi_count and consensus_count to exportAirr command
exportAirr command now splits clones by cells if there are cell barcodes in the data
analyze options --not-aligned-.. and --not-parsed-.. with one option --output-not-used-reads
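The three export columns -isOOF, -hasStops and -isProductive can be approximated for a single nucleotide sequence with the Python sketch below. The sketch assumes the feature is read in frame from its first nucleotide. This holds for a CDR3 that starts at the conserved Cys codon but is a simplification for other gene features. It also uses the usual definition of productive: in frame and free of stop codons. The helper names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_oof(nt_seq: str) -> bool:
    """Out of frame: the length is not a multiple of 3."""
    return len(nt_seq) % 3 != 0


def has_stops(nt_seq: str) -> bool:
    """True if any complete codon, read from the first nucleotide, is a stop codon."""
    s = nt_seq.upper()
    return any(s[i:i + 3] in STOP_CODONS for i in range(0, len(s) - 2, 3))


def is_productive(nt_seq: str) -> bool:
    """In frame and without stop codons."""
    return not is_oof(nt_seq) and not has_stops(nt_seq)


for seq in ("TGTGCCAGCAGC", "TGTGCCAGCT", "TGTTAGAGC"):
    print(seq, is_oof(seq), has_stops(seq), is_productive(seq))
```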
--chains option
tagValueCELL) to two columns: tagValue<tag_name> and tagQuality<tag_name>
The # character can now be used to separate groupName from the group matcher in the file expansion mechanism (in addition to :), allowing multi-sample analysis on Windows
--assemble-contigs-by
assemble if --write-all was used in align
BD Rhapsody full-length protocol
Smart-Seq2 single cell RNA-Seq protocol
Oxford Nanopore long-read technology
Complete support for sample barcodes that may be picked up from all possible sources:
Now one can analyze multiple patient samples at once. Along with a powerful file name expansion functionality, one can process any kind of sequencing protocol with any custom combination of sample, cell and UMI barcoding.
Processing of multiple samples can be done in two principal modes with respect to sample barcodes: (1) data can be split by samples right at the align stage and processed separately, or (2) all samples can be processed as a single set of sequences and separated only at the very last exportClones step. Both approaches have their pros and cons, allowing one to choose the best strategy given the experimental setup and study goals.
For 10x Genomics and other fragmented protocols, a new powerful k-mer based filtering algorithm is now used to eliminate cross-cell contamination coming from plasma cells.
For UMI filtering, a new algorithm from the paper by J. Barron (2020) allows for better automated histogram thresholding in barcoded data filtering.
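Barron's 2020 method generalizes classic histogram-thresholding criteria such as Otsu's method. The Python sketch below implements plain Otsu's method, a simpler stand-in that is neither Barron's algorithm nor MiXCR's code. It shows what "automated histogram thresholding" means in practice: given a histogram (for example, a hypothetical histogram of reads per UMI), it picks the bin that best separates the low-count and high-count populations.

```python
from typing import Sequence


def otsu_threshold(hist: Sequence[int]) -> int:
    """Return the bin index t that best splits the histogram into bins <= t and bins > t.

    Otsu's criterion: maximize the between-class variance w0 * w1 * (mu0 - mu1) ** 2,
    where w are the class weights (counts) and mu the class means of the bin index.
    """
    total = sum(hist)
    if total == 0:
        raise ValueError("empty histogram")
    sum_all = sum(i * h for i, h in enumerate(hist))
    w0 = 0
    sum0 = 0
    best_t, best_score = 0, -1.0
    for t in range(len(hist) - 1):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        score = w0 * w1 * (mu0 - mu1) ** 2
        if score > best_score:
            best_t, best_score = t, score
    return best_t


# Bimodal toy histogram: a low-count "noise" mode and a high-count "signal" mode.
print(otsu_threshold([10, 50, 10, 0, 0, 0, 5, 30, 5]))
```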
fastq files as input (I1 and I2 reads support)
I1 and I2 reads)
||, which allows easy adaptation of MiGEC-style sample files)
--sample-table mixin option allowing for flexible sample table definition in a tab-delimited table form
--infer-sample-table mixin option to infer a sample table for sample tags from file name expansion
generic-tcr-amplicon-separate-samples-umi)
align command now optionally allows splitting output alignments by sample into separate vdjca files
exportClones command now supports splitting the output into multiple files by sample
analyze command supports the new splitting behaviour of the align command, separately running all the analysis steps for all the output files (if splitting is enabled)
assemble: now correctly handles any possible tag combination (sample, cell or molecule)
trimmingQualityThreshold changed from 0 to 10); this setting showed better performance in many real-world use cases
exportReportsTable that prints a file in tabular format with report data from commands that were run
TMPDIR environment variable
local:...) are packed into the output *.vdjca file at the align step; the same applies to all externally linked information, like tag whitelists and sample lists. This behaviour facilitates better analysis reproducibility and more transparent parameter logistics.
analyze: --set-whitelist and --reset-whitelist
refineTagsAndSort options -w and --whitelist; a corresponding deprecation error message is printed if they are used
exportClones, allowing values in the -readFraction and -uniqueTagFraction ... columns to be normalized to totals for certain compartments instead of the whole dataset. This feature allows outputting, e.g., fractions of reads inside the cell.
--add-export-clone-table-splitting, --reset-export-clone-table-splitting, --add-export-clone-grouping and --reset-export-clone-grouping
findAlleles command
exportAlignmentsPretty and exportClonesPretty
--chains filter for exportShmTrees, exportShmTreesWithNodes, exportShmTreesNewick and exportPlots shmTrees commands
align happening for not-parsed sequences with writeFailedAlignments=true
assemblePartial; the parameter name is minimalNOverlapShare; it controls the minimal relative part of the N region that must be covered by the overlap to conclude that two reads are from the same V(D)J rearrangement
assemblePartial procedure
assemblePartial executed for the data without C-gene alignment settings
exportAirr command
-nFeature for a not-covered region or a non-existent tag). The --not-covered-as-empty option preserves the previous behaviour
findAlleles and description of alleles
null is overridden by null using the -O... option
exportClones with some arguments
ALL in exportAlignments was preventing not-aligned records from being exported
assemble arising in the analysis of data with CELL barcodes but without UMIs, with consensus assembly turned off
--export-productive-clones-only
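The exportClones feature above normalizes -readFraction within a compartment instead of over the whole dataset. This can be sketched in Python as follows. The (compartment, clone, reads) record layout and the function name are hypothetical. A compartment could be, for instance, a cell barcode, and each (compartment, clone) pair is assumed to appear only once.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def fractions_within(records: List[Tuple[str, str, int]]) -> Dict[Tuple[str, str], float]:
    """records: (compartment, clone_id, reads). Returns reads / total reads of that compartment."""
    totals: Dict[str, int] = defaultdict(int)
    for compartment, _, reads in records:
        totals[compartment] += reads
    return {
        (compartment, clone): reads / totals[compartment]
        for compartment, clone, reads in records
        if totals[compartment] > 0
    }


records = [("cell1", "c1", 30), ("cell1", "c2", 10), ("cell2", "c1", 5)]
print(fractions_within(records))
```

Normalized over the whole dataset, clone c1 in cell1 would get 30/45. Normalized within cell1, it gets 30/40, which is the "fraction of reads inside the cell" the option is meant to report.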
-Xmx.. JVM option in mixcr script
-Xms..
species flag to 10x, nanopore and smart-seq2 presets
findShmTrees can now build trees from inputs with different tags
--impute-germline-on-export and --dont-impute-germline-on-export to exportAlignments and exportClones commands
allTags:Cell, allTags:Molecule). This also facilitates the creation of more generic base presets implementing common single-cell and UMI filtering strategies.
<tag_name> to <tag_type> semantics in export columns and --split-by-tag options
saveOriginalReads=true on align leading to errors down the pipeline
analyze now correctly terminates on the first error
align with multiple input files provided by the file name expansion mechanism
--only-observed behaviour in exportShmTreesWithNodes
-O...