Mortazavilab TALON Versions

Technology agnostic long read analysis pipeline for transcriptomes

v6.0.1

6 months ago

Added a minor bug fix to talon_abundance that would error out on fusion / readthrough genes and transcripts

v6.0

6 months ago

Readthrough transcription detection and gene assignment improvement

Added option to base talon, --create_novel_spliced_genes, which will create a novel gene if a spliced read is found that does not share splice junctions with any genes, but does overlap an existing locus or loci in the reference
Improved gene assignment for novel transcripts:
- Reads that contain splice sites that overlap splice sites in a set of multiple non-overlapping genes (without shared splice sites) are assigned to novel genes with the novelty label fusion
- Reads that contain splice sites that are shared between a set of multiple genes (i.e. the genes overlap) are now tiebroken based on distance of read's 5' / 3' ends to transcripts from this gene set
- This change improves performance on annotating reads and transcripts to the right gene and thus improves gene quantification as well
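
The tiebreak between overlapping genes can be pictured with a small sketch. The release notes do not say exactly how distance is computed, so the scoring below (sum of absolute 5' and 3' end differences to the closest transcript of each gene) is an assumption, as are all function and variable names; strand handling is omitted.

```python
from typing import Dict, List, Tuple

Transcript = Tuple[int, int]  # (start, end) in genomic coordinates


def tiebreak_gene(read_start: int, read_end: int,
                  candidates: Dict[str, List[Transcript]]) -> str:
    """Pick the candidate gene whose closest transcript has the smallest
    combined distance between its ends and the read's ends.

    Hypothetical helper: the distance metric is an assumption, not TALON's.
    """
    def best_distance(transcripts: List[Transcript]) -> int:
        return min(abs(read_start - s) + abs(read_end - e)
                   for s, e in transcripts)

    # Iterate over sorted gene IDs so exact ties resolve deterministically.
    return min(sorted(candidates), key=lambda g: best_distance(candidates[g]))


# Two overlapping candidate genes that share splice sites with the read.
candidates = {
    "geneA": [(100, 900), (150, 950)],
    "geneB": [(400, 1500)],
}
chosen = tiebreak_gene(160, 940, candidates)
```

Here the read ends sit closest to the second transcript of geneA, so geneA is chosen.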

Single-cell support

Added option to talon to use cell barcodes in the alignment files as separate datasets (--cb)
Added utility to output TALON quantifications in AnnData format (talon_create_adata). Particularly useful for single-cell and large datasets where dense matrix representation is prohibitive. Capable of generating both gene- and transcript-level AnnDatas
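
One way to picture the --cb option is grouping reads by their cell barcode tag so that each barcode behaves like its own dataset. The sketch below is not TALON's implementation; it assumes barcodes are stored in the standard CB:Z: SAM tag, works on plain SAM text lines, and uses hypothetical function names.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional


def cell_barcode(sam_line: str) -> Optional[str]:
    """Return the value of the CB:Z: tag on a SAM alignment line, or None."""
    fields = sam_line.rstrip("\n").split("\t")
    for tag in fields[11:]:  # optional fields start at column 12
        if tag.startswith("CB:Z:"):
            return tag[len("CB:Z:"):]
    return None


def group_reads_by_barcode(sam_lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map each cell barcode to the names of the reads that carry it."""
    groups = defaultdict(list)
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        barcode = cell_barcode(line)
        if barcode is not None:  # reads without a barcode are ignored here
            groups[barcode].append(line.split("\t", 1)[0])
    return dict(groups)


sam_lines = [
    "@HD\tVN:1.6",
    "read1\t0\tchr1\t100\t60\t50M\t*\t0\t0\t*\t*\tCB:Z:AAAC",
    "read2\t0\tchr1\t300\t60\t50M\t*\t0\t0\t*\t*\tCB:Z:GGTT",
    "read3\t0\tchr2\t500\t60\t50M\t*\t0\t0\t*\t*\tNM:i:0\tCB:Z:AAAC",
    "read4\t0\tchr2\t700\t60\t50M\t*\t0\t0\t*\t*",
]
groups = group_reads_by_barcode(sam_lines)
```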

Requirements changes

Removed pybedtools as a requirement as it was breaking installs and is not required
Restriction of Python versions to >=3.6 and <3.8, as Python 3.8 changes the behavior of variable sharing across multiprocessing processes (https://stackoverflow.com/questions/70552775/multiprocess-inherently-shared-memory-in-no-longer-working-on-python-3-10-comin)
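
A guard for the version restriction above might look like the following sketch. It assumes the intended supported range is Python 3.6 up to, but not including, 3.8, which matches the stated reason for the restriction; the function name is hypothetical and not part of TALON.

```python
import sys


def python_version_supported(version_info=sys.version_info) -> bool:
    """Return True for Python 3.6.x and 3.7.x (assumed supported range)."""
    major_minor = tuple(version_info[:2])
    return (3, 6) <= major_minor < (3, 8)


# Check the interpreter that is running this script.
running_supported = python_version_supported()
```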

Miscellaneous

Added option to talon_filter_transcripts to exclude ISM transcripts (--excludeISMs)
Added option to talon_filter_transcripts to include all transcripts from the reference annotation, regardless of whether they were observed in the datasets or not (--includeAnnot)
Added script to return the longest observed ends for each transcript instead of the ones reported by TALON (call_longest_ends)
Added --verbosity option to talon to tune how much output the user sees (0 = only errors, 1 = logging, 2 = debug)
Added support for BAM files as input in addition to SAM files
Added multithreading to SAM to BAM compression
Fixed minor bugs with temporary output directory behavior
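
The three --verbosity levels listed above map naturally onto Python's logging levels. The sketch below shows one such mapping with argparse; whether TALON uses the logging module this way, the choice of INFO for level 1, and the default of 1 are all assumptions, and the parser is a stand-in rather than TALON's own.

```python
import argparse
import logging

# Assumed mapping: 0 = only errors, 1 = logging, 2 = debug.
VERBOSITY_TO_LEVEL = {0: logging.ERROR, 1: logging.INFO, 2: logging.DEBUG}


def build_parser() -> argparse.ArgumentParser:
    """Build a stand-in parser exposing only a --verbosity option."""
    parser = argparse.ArgumentParser(description="hypothetical example tool")
    parser.add_argument("--verbosity", type=int,
                        choices=sorted(VERBOSITY_TO_LEVEL), default=1)
    return parser


def level_for(argv) -> int:
    """Translate command-line arguments into a logging level."""
    args = build_parser().parse_args(argv)
    return VERBOSITY_TO_LEVEL[args.verbosity]


level = level_for(["--verbosity", "2"])
logging.basicConfig(level=level)
```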

v5.0

4 years ago

Added talon_label_reads module to look for evidence of internal priming
Overhauled filtering step to add additional options and to take internal priming into account
TALON schema version 5:
- Added fields to observed table to record custom SAM tags
- Removed reference to nonexistent exon table
- Added schema version field to run_info
ISM transcripts with known starts and ends are no longer promoted to NICs.
Accommodate more GTF formatting quirks
Replaced GM12878 example with SIRVs for speed and simplicity
Changed default identity parameter value to 0.8 (talon)
Changed default min length parameter value to 0 (talon_initialize_database)

4.4.2

4 years ago

Added new utility 'talon_get_sjs' to extract the locations, novelty, and transcript assignments of exons/introns in a TALON database or GTF file.
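
For readers unfamiliar with how intron locations relate to exon locations, the following sketch derives introns from the exons of one transcript. It is not the talon_get_sjs code; it assumes 1-based inclusive coordinates as used in GTF files, and the function name is hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end), 1-based inclusive


def introns_from_exons(exons: List[Interval]) -> List[Interval]:
    """Return the intron between each pair of consecutive exons."""
    ordered = sorted(exons)
    return [
        (prev_end + 1, next_start - 1)
        for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])
    ]


introns = introns_from_exons([(100, 200), (300, 400), (500, 600)])
```

A mono-exonic transcript yields an empty list, since it has no splice junctions.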

4.4.1

4 years ago

The 'process_remaining_mult_cases' function was using the wrong index to access the end position of multi-exonic transcripts in its call to 'search_for_overlap_with_gene', leading to overlaps being missed in some cases for reads that lacked any matches to the reference splice junctions. This has been fixed as of this version, and a test has been added to the testing suite to cover this case.
Allow user to omit previously added datasets from config file when attempting an additional TALON run on the same database. In the past, this led the database validation step to crash at run's end.
Small change to get_read_annotations.py utility: Added patch to cover case where the gene/transcript status in the original GTF annotation was something other than 'KNOWN'. The novelty status for such cases will now be 'Other' (in the past they led to a crash). The 'NOVEL' designation is reserved for novel transcripts called by TALON specifically.

4.4

4 years ago

Parallelized TALON using Python's multiprocessing package to greatly speed up talon.py
Uses pysam and pybedtools to partition the reads into non-overlapping intervals
Added a new utility, talon_fetch_reads, which generates an annotation file with an entry for every read in the TALON database
Fixed edge case behavior of single-exon reads overlapping a single-exon reference transcript. In the past, such transcripts were deemed genomic if their start and/or endpoints were beyond the 5'/3' cutoff distances. Now, these cases will be assigned to the reference transcript.
Modified abundance script to accept a whitelist for filtering rather than performing the filtering step itself. The idea is to be more consistent with the GTF file.
Expanded testing suite
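
The partitioning idea can be illustrated without pysam or pybedtools: merge read alignments that overlap or touch into intervals, so that each interval can be handed to a separate worker. This is a standard-library sketch of interval merging under that assumption, not the code TALON runs, and the function name is hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end)


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge intervals on the same chromosome that overlap or touch."""
    merged: List[List] = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Extend the current interval to cover this read as well.
            merged[-1][2] = max(merged[-1][2], end)
        else:
            merged.append([chrom, start, end])
    return [(c, s, e) for c, s, e in merged]


reads = [
    ("chr1", 100, 200),
    ("chr1", 150, 300),
    ("chr1", 400, 500),
    ("chr2", 100, 200),
]
partitions = merge_intervals(reads)
```

Each resulting interval shares no reads with the others, which is what makes it safe to process them in parallel.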

4.3.0-dev0.0.1

4 years ago

Incorporated pysam parsing for input SAM files.
Allow more versions of Python
Simplified .travis.yml.

4.3-dev

4 years ago

The underlying functionality of this TALON version is identical to version 4.2, but it has been converted to a Python package in order to create a better experience for users. Once installed, the program can be run from anywhere on the system.

Changes:

Moved all python files to src/talon and all post processing tools to src/talon/post
Created __init__.py files in src/talon and src/talon/post to signal to Python that these are packages.
Created a setup.py.
Changed all the imports to make use of the package structure.
Removed all sys.path manipulations.
Removed unnecessary imports.
Set up entry points so tools can be run from the command line. All tools start with talon_ for consistency (except for talon which is plain talon).
Changed the tests in the test suite to run from an installed package of talon.
Set up tox to automate installation and testing. You can start the tests by running tox from the project root.
Added a .travis.yml file so automated testing can be enabled on travis-ci.org. (For an example, see https://travis-ci.org/rhpvorderman/TALON/builds/584501052).
Changed the readme and example to reflect the changes.
Moved previous documentation to the wiki

v4.2

4 years ago

Updated documentation and added an example complete with input files.
Fixed GTF formatting bug that caused mono-exonic transcripts to be written with a duplicate exon
Added GTF reformatting script to allow reference GTFs lacking explicit gene and transcript entries to be compatible with TALON

v4.1

4 years ago

Reformatted abundance file to consolidate columns
Added transcript model length to abundance file
New formatting for TALON-issued names (numbers are padded with zeros up to 9 places)
Various bug fixes
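
The zero-padded naming can be reproduced with a format specification. In the sketch below the function name and the prefix in the example are illustrative assumptions; only the padding to 9 places comes from the notes above.

```python
def padded_name(prefix: str, number: int, width: int = 9) -> str:
    """Join a prefix and a number left-padded with zeros to `width` digits."""
    return f"{prefix}{number:0{width}d}"


example = padded_name("TALONG", 123)
```

Numbers that already have more than 9 digits are left unpadded and untruncated by this format.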