Technology-agnostic long-read analysis pipeline for transcriptomes
Fixed a 'talon_abundance' bug that would error out on fusion / readthrough genes and transcripts

Readthrough transcription detection and gene assignment improvement
'talon' option --create_novel_spliced_genes, which will create a novel gene if a spliced read is found that does not share splice junctions with any genes, but does overlap an existing locus or loci in the reference

Single-cell support
'talon' option to use cell barcodes in the alignment files as separate datasets (--cb)
'talon_create_adata': particularly useful for single-cell and large datasets where dense matrix representation is prohibitive. Capable of generating both gene- and transcript-level AnnData objects

Requirements changes

Miscellaneous
'talon_filter_transcripts' option to exclude ISM transcripts (--excludeISMs)
'talon_filter_transcripts' option to include all transcripts from the reference annotation, regardless of whether they were observed in the datasets or not (--includeAnnot)
'call_longest_ends'
--verbosity option to 'talon' to tune how much output the user sees (0 = only errors, 1 = logging, 2 = debug)

Added new utility 'talon_get_sjs' to extract the locations, novelty, and transcript assignments of exons/introns in a TALON database or GTF file.
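The three --verbosity levels listed above (0 = only errors, 1 = logging, 2 = debug) map naturally onto Python's standard logging levels. The following is a minimal sketch of one way such a mapping could look; it is not taken from the TALON source, and the names VERBOSITY_TO_LEVEL, level_for_verbosity, and make_logger are hypothetical.

```python
import logging

# Hypothetical mapping for illustration; TALON's own implementation may differ.
VERBOSITY_TO_LEVEL = {
    0: logging.ERROR,  # 0 = only errors
    1: logging.INFO,   # 1 = logging
    2: logging.DEBUG,  # 2 = debug
}


def level_for_verbosity(verbosity: int) -> int:
    """Return the standard logging level for a 0/1/2 verbosity value."""
    if verbosity not in VERBOSITY_TO_LEVEL:
        raise ValueError(f"verbosity must be 0, 1, or 2, got {verbosity!r}")
    return VERBOSITY_TO_LEVEL[verbosity]


def make_logger(verbosity: int, name: str = "verbosity_sketch") -> logging.Logger:
    """Create (or fetch) a logger whose threshold follows the verbosity value."""
    logger = logging.getLogger(name)
    logger.setLevel(level_for_verbosity(verbosity))
    return logger
```

With this sketch, make_logger(0) suppresses informational and debug messages, while make_logger(2) lets everything through.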
The 'process_remaining_mult_cases' function was using the wrong index to access the end position of multi-exonic transcripts in its call to 'search_for_overlap_with_gene', leading to overlaps being missed in some cases for reads that lacked any matches to the reference splice junctions. This has been fixed as of this version, and a test has been added to the testing suite to cover this case.
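To illustrate how a wrong index into a transcript's position list can hide a gene overlap, here is a small sketch. It assumes a flat list of exon boundaries in the form [start1, end1, start2, end2, ...]; that layout, the helper names, and the specific wrong index shown are all hypothetical and are not taken from the TALON code.

```python
def transcript_span(positions):
    """Return (start, end) of a transcript from a flat list of exon
    boundaries [start1, end1, start2, end2, ...] (hypothetical layout)."""
    return positions[0], positions[-1]


def overlaps(start, end, gene_start, gene_end):
    """True if the closed interval [start, end] overlaps [gene_start, gene_end]."""
    lo, hi = min(start, end), max(start, end)
    return lo <= gene_end and gene_start <= hi


# A two-exon read: exon 1 at 100-200, exon 2 at 5000-5100.
positions = [100, 200, 5000, 5100]
gene_start, gene_end = 4900, 6000

# Correct: the transcript end is the last position in the list.
start, end = transcript_span(positions)
found_with_correct_index = overlaps(start, end, gene_start, gene_end)

# Illustrative mistake: using the end of the first exon as the transcript end.
found_with_wrong_index = overlaps(positions[0], positions[1], gene_start, gene_end)
```

In this example the overlap is found only when the true transcript end is used; with the first exon's end, the read appears not to overlap the gene at all, which is the kind of missed overlap described above.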
Users can now omit previously added datasets from the config file when performing an additional TALON run on the same database. Previously, this caused the database validation step to crash at the end of the run.
Small change to the get_read_annotations.py utility: added a patch to cover the case where the gene/transcript status in the original GTF annotation is something other than 'KNOWN'. The novelty status for such cases is now 'Other' (previously they caused a crash). The 'NOVEL' designation is reserved for novel transcripts called by TALON specifically.
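The status handling described above can be sketched as a simple lookup with a fallback. This is an illustration only, not the code in get_read_annotations.py: the function name novelty_from_status and the output labels 'Known' and 'Novel' are assumptions, while 'KNOWN', 'NOVEL', and 'Other' come from the note itself.

```python
def novelty_from_status(status: str) -> str:
    """Map an annotation status to a novelty label (hypothetical helper).

    'KNOWN' and 'NOVEL' get their own labels; any other status falls back
    to 'Other' instead of raising an error.
    """
    if status == "KNOWN":
        return "Known"   # assumed output label
    if status == "NOVEL":
        return "Novel"   # assumed output label; reserved for TALON-called novel transcripts
    return "Other"       # any other status found in the original GTF
```

The fallback branch is the point of the sketch: an unexpected status yields a label rather than a crash.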
The underlying functionality of this TALON version is identical to version 4.2, but it has been converted to a Python package in order to create a better experience for users. Once installed, the program can be run from anywhere on the system.
Changes: