Mortazavilab TALON Versions

Technology-agnostic long-read analysis pipeline for transcriptomes


3 years ago
  • Added talon_label_reads module to look for evidence of internal priming
  • Overhauled the filtering step to add new options and to take internal priming into account
  • TALON schema version 5:
    • Added fields to observed table to record custom SAM tags
    • Removed reference to nonexistent exon table
    • Added schema version field to run_info
  • ISM transcripts with known starts and ends are no longer promoted to NICs.
  • Accommodated more GTF formatting quirks
  • Replaced GM12878 example with SIRVs for speed and simplicity
  • Changed default identity parameter value to 0.8 (talon)
  • Changed default min length parameter value to 0 (talon_initialize_database)
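As a rough illustration of what looking for evidence of internal priming involves (this is not the talon_label_reads implementation): a read whose 3' end sits just upstream of an A-rich stretch of genomic sequence may have been primed internally by the oligo(dT) primer rather than at the true poly(A) tail. The sketch below computes the fraction of A's in a window downstream of the read's 3' end. The function names, the 20-base window, and the 0.5 threshold are hypothetical choices made for the example.

```python
def fraction_a_downstream(genome_seq: str, read_3p: int, strand: str, window: int = 20) -> float:
    """Fraction of A's (in transcript orientation) just downstream of a read's 3' end.

    genome_seq: forward-strand chromosome sequence, 0-based indexing.
    read_3p: 0-based position of the read's last aligned base at its 3' end
             (the rightmost base for '+', the leftmost base for '-').
    """
    if strand == "+":
        # Downstream on '+' means higher coordinates.
        seq = genome_seq[read_3p + 1 : read_3p + 1 + window]
        target = "A"
    elif strand == "-":
        # Downstream on '-' means lower coordinates; an A on the minus
        # strand appears as a T on the forward strand.
        seq = genome_seq[max(0, read_3p - window) : read_3p]
        target = "T"
    else:
        raise ValueError(f"unknown strand: {strand!r}")
    if not seq:
        # Read ends at the chromosome boundary: nothing downstream to inspect.
        return 0.0
    return seq.upper().count(target) / len(seq)


def maybe_internally_primed(genome_seq, read_3p, strand, window=20, threshold=0.5):
    """Hypothetical flag: True when the downstream A fraction exceeds the threshold."""
    return fraction_a_downstream(genome_seq, read_3p, strand, window) > threshold
```

For example, fraction_a_downstream("CCCCAAAAAAAAGG", 3, "+", window=8) inspects the eight bases after position 3, all of which are A, and returns 1.0.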


3 years ago

Added new utility 'talon_get_sjs' to extract the locations, novelty, and transcript assignments of exons/introns in a TALON database or GTF file.
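The kind of information talon_get_sjs reports can be illustrated with a small sketch: introns (splice junctions) follow directly from consecutive exon coordinates, and their novelty can be judged by comparison with a reference set. The function names and the 1-based, closed (GTF-style) coordinate convention below are assumptions; the utility's actual output format is not described in these notes.

```python
def introns_from_exons(exons):
    """Return the introns implied by a transcript's exons.

    exons: (start, end) pairs in 1-based, closed (GTF-style) coordinates.
    Each intron runs from one base after an exon's end to one base
    before the next exon's start.
    """
    ordered = sorted(exons)
    return [(left_end + 1, right_start - 1)
            for (_, left_end), (right_start, _) in zip(ordered, ordered[1:])]


def label_junctions(introns, known_introns):
    """Tag each intron as 'known' or 'novel' by exact match to a reference set."""
    known = set(known_introns)
    return [(intron, "known" if intron in known else "novel") for intron in introns]
```

A mono-exonic transcript yields an empty intron list, and exons given out of order are sorted first.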


3 years ago
  • The 'process_remaining_mult_cases' function was using the wrong index to access the end position of multi-exonic transcripts in its call to 'search_for_overlap_with_gene', leading to overlaps being missed in some cases for reads that lacked any matches to the reference splice junctions. This has been fixed as of this version, and a test has been added to the testing suite to cover this case.

  • Allowed users to omit previously added datasets from the config file when running TALON again on the same database. In the past, doing so caused the database validation step to crash at the end of the run.

  • Small change to utility: Added patch to cover case where the gene/transcript status in the original GTF annotation was something other than 'KNOWN'. The novelty status for such cases will now be 'Other' (in the past they led to a crash). The 'NOVEL' designation is reserved for novel transcripts called by TALON specifically.


3 years ago
  • Parallelized TALON using Python's multiprocessing package to greatly speed up runs
  • Uses pysam and pybedtools to partition the reads into non-overlapping intervals
  • Added a new utility, talon_fetch_reads, which generates an annotation file with an entry for every read in the TALON database
  • Fixed edge case behavior of single-exon reads overlapping a single-exon reference transcript. In the past, such transcripts were deemed genomic if their start and/or endpoints were beyond the 5'/3' cutoff distances. Now, these cases will be assigned to the reference transcript.
  • Modified the abundance script to accept a whitelist for filtering rather than performing the filtering step itself. The goal is to keep the abundance output consistent with GTF files generated from the same whitelist.
  • Expanded testing suite
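A minimal pure-Python sketch of the partitioning idea, assuming reads have already been reduced to (read_id, chrom, start, end) tuples. TALON itself uses pysam and pybedtools for this step, and partition_reads is a hypothetical name. Reads are sorted and chained into an interval for as long as they overlap, so no read spans two intervals and each interval can be handled by a separate worker.

```python
def partition_reads(reads):
    """Group reads into non-overlapping genomic intervals.

    reads: iterable of (read_id, chrom, start, end) with start <= end,
           in closed coordinates.
    Returns a list of (chrom, start, end, [read_ids]) tuples.
    """
    intervals = []
    for read_id, chrom, start, end in sorted(reads, key=lambda r: (r[1], r[2], r[3])):
        last = intervals[-1] if intervals else None
        if last is not None and last[0] == chrom and start <= last[2]:
            # Overlaps (or touches) the current interval: extend it.
            last[2] = max(last[2], end)
            last[3].append(read_id)
        else:
            # Gap or new chromosome: start a new interval.
            intervals.append([chrom, start, end, [read_id]])
    return [tuple(iv) for iv in intervals]
```

Because the intervals share no reads, they could be passed to something like multiprocessing.Pool.map together with a per-interval worker function. With closed coordinates, reads that merely touch (one ends where the next starts) land in the same interval.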


3 years ago
  • Incorporated pysam parsing for input SAM files.
  • Added support for more versions of Python
  • Simplified .travis.yml.


3 years ago

The underlying functionality of this TALON version is identical to version 4.2, but it has been converted to a Python package in order to create a better experience for users. Once installed, the program can be run from anywhere on the system.


  • Moved all Python files to src/talon and all post-processing tools to src/talon/post
  • Created files in src/talon and src/talon/post to signal to Python that these are packages.
  • Created a
  • Changed all the imports to make use of the package structure.
  • Removed all sys.path manipulations.
  • Removed unnecessary imports.
  • Set up entry points so tools can be run from the command line. All tools start with talon_ for consistency, except for the main talon command itself.
  • Changed the tests in the test suite to run from an installed package of talon.
  • Set up tox to automate installation and testing. You can start the tests by running tox from the project root.
  • Added a .travis.yml file so automated testing can be enabled on (See for an example
  • Changed the readme and example to reflect the changes.
  • Moved previous documentation to the wiki


3 years ago
  • Updated documentation and added an example complete with input files.
  • Fixed GTF formatting bug that caused mono-exonic transcripts to be written with a duplicate exon
  • Added GTF reformatting script to allow reference GTFs lacking explicit gene and transcript entries to be compatible with TALON
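One way gene and transcript entries can be reconstructed for a GTF that only lists exons is to take, for each ID, the span from the smallest exon start to the largest exon end. This is a sketch under assumptions, not TALON's reformatting script: it presumes the GTF lines have already been parsed into dicts, and derive_spans is a hypothetical name.

```python
def derive_spans(exon_records, key):
    """Collapse exon records into one (chrom, strand, start, end) span per ID.

    exon_records: dicts with 'chrom', 'strand', 'start', 'end' and an ID field.
    key: the ID field to group on, e.g. 'gene_id' or 'transcript_id'.
    """
    spans = {}
    for rec in exon_records:
        ident = rec[key]
        if ident in spans:
            chrom, strand, start, end = spans[ident]
            # Widen the existing span so it covers this exon as well.
            spans[ident] = (chrom, strand, min(start, rec["start"]), max(end, rec["end"]))
        else:
            spans[ident] = (rec["chrom"], rec["strand"], rec["start"], rec["end"])
    return spans
```

Calling it once with key="gene_id" and once with key="transcript_id" gives the coordinates needed to write a 'gene' and a 'transcript' line ahead of the corresponding exons; parsing the raw GTF attribute column is left out here.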


4 years ago
  • Reformatted abundance file to consolidate columns
  • Added transcript model length to abundance file
  • New formatting for TALON-issued names (numbers are zero-padded to 9 digits)
  • Various bug fixes
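Zero-padding a number to a fixed width is a one-line format operation in Python, and a side benefit is that padded names sort lexicographically in numeric order. The "TALONT" prefix used below is only a placeholder, since this release note does not give the actual prefix format, and padded_name is a hypothetical helper.

```python
def padded_name(prefix: str, number: int, width: int = 9) -> str:
    """Return prefix followed by number, zero-padded to `width` digits."""
    return f"{prefix}{number:0{width}d}"
```

For instance, padded_name("TALONT", 42) returns "TALONT000000042", and a plain string sort places "TALONT000000009" before "TALONT000000010".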


4 years ago

This release is written for Python 3.7 rather than Python 2.7. There are some small issues to be worked out in the post-TALON_tools section, but the schema and script are set.

  • Matching is done on the vertex rather than exon level (except for mono-exonic transcripts, where overlap-based matching is attempted first). Known starts and ends for the gene are prioritized when matching transcript start and endpoints.
  • Updated filtering: genomic transcripts are removed regardless of reproducibility
  • Schema changes to TALON database:
    • Added more information to the observed table, including start/end exons
    • In transcript table, the 'jn_path' column now omits the start and end exon. This information is stored in the start_exon and end_exon columns instead.
    • location table no longer includes strand. This has been moved to the edge table
    • gene table now includes strand
  • Reports type of novelty each time a new gene or transcript is identified. For genes, novelty types include antisense and intergenic. For transcripts, novelty types include incomplete splice match (ISM), ISM prefix, ISM suffix, novel in catalog (NIC), novel not in catalog (NNC), antisense, genomic, and intergenic.
  • Updated GTF utility to use a whitelist file rather than database filtering
  • Initialization step now assumes that provided GTF genes, transcripts, and exons are known unless specified otherwise in the GTF attributes. This is necessary because newer versions of the GENCODE annotation lack the 'gene_status' and 'transcript_status' fields.
  • Expanded testing suite
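To make vertex-level matching and the transcript novelty categories more concrete, here is a deliberately simplified sketch, not TALON's algorithm. A multi-exonic read is represented by its chain of splice-site positions (vertices) and compared with annotated chains. The sketch assumes that ISM means the read's chain is a contiguous part of a known chain, NIC means every splice site is known but the combination is new, and NNC means at least one splice site is novel. The ISM prefix/suffix split, start/end matching, and the antisense, genomic, and intergenic categories are omitted, and classify_transcript is a hypothetical name.

```python
def classify_transcript(query, references):
    """Simplified novelty call for a multi-exonic read from splice-site chains.

    query: tuple of internal splice-site positions ordered along the transcript,
           e.g. on the '+' strand: exon1 end, exon2 start, exon2 end, exon3 start, ...
    references: list of such tuples for annotated transcripts of the gene.
    Returns 'Known', 'ISM', 'NIC', or 'NNC'.
    """
    if not query:
        raise ValueError("mono-exonic reads have no splice sites to compare")
    # Exact chain match to an annotated transcript.
    if any(query == ref for ref in references):
        return "Known"
    # Contiguous sub-chain of an annotated transcript; stepping by 2 keeps
    # each donor/acceptor pair (one intron) together.
    n = len(query)
    for ref in references:
        for i in range(0, len(ref) - n + 1, 2):
            if ref[i:i + n] == query:
                return "ISM"
    # New combination of splice sites that all appear in the annotation.
    known_sites = {site for ref in references for site in ref}
    if all(site in known_sites for site in query):
        return "NIC"
    # At least one splice site is absent from the annotation.
    return "NNC"
```

With a single reference chain (100, 200, 300, 400, 500, 600), the read (300, 400) is an ISM, (100, 200, 500, 600) skips the middle exon and is an NIC, and (100, 250) uses an unannotated site and is an NNC.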


4 years ago
  • Updates have been made to the database schema, so this version is not backwards-compatible with previous releases
  • Instead of separate observed 5' and observed 3' end tables, the schema now includes a single table called 'observed'. This table tracks 5' and 3' end differences as before, but also has additional attributes such as the original read name and length.
  • Fixed a bug in the testing suite that caused certain tests to crash when run on a different computer than the one they were originally written on
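One way to compute the 5' and 3' end differences tracked in the observed table, sketched under assumptions not stated in these notes: coordinates are genomic with start <= end, and a positive value means the read's end lies downstream (3'-ward in transcript orientation) of the reference end. TALON's own sign convention and cutoff values may differ, and end_differences and within_cutoff are hypothetical names.

```python
def end_differences(read_start, read_end, ref_start, ref_end, strand):
    """Return (diff_5p, diff_3p) between a read and a reference transcript.

    Positive values mean the read end is downstream of the reference end in
    transcript orientation; negative values mean it is upstream.
    """
    if strand == "+":
        return read_start - ref_start, read_end - ref_end
    if strand == "-":
        # On '-', the 5' end is the higher coordinate and downstream is leftward.
        return ref_end - read_end, ref_start - read_start
    raise ValueError(f"unknown strand: {strand!r}")


def within_cutoff(diff, cutoff):
    """True if an end difference falls within a symmetric distance cutoff."""
    return abs(diff) <= cutoff
```

With this convention a read that is 10 bases shorter than the reference at both ends gives (10, -10) on either strand, so the same cutoff logic applies regardless of orientation.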