Technology-agnostic long-read analysis pipeline for transcriptomes
Added a new utility, 'talon_get_sjs', to extract the locations, novelty, and transcript assignments of exons/introns in a TALON database or GTF file.
The 'process_remaining_mult_cases' function was using the wrong index to access the end position of multi-exonic transcripts in its call to 'search_for_overlap_with_gene'. As a result, some overlaps were missed for reads that lacked any matches to the reference splice junctions. This is fixed as of this version, and a test covering this case has been added to the test suite.
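To illustrate how a wrong end index can hide an overlap, here is a minimal sketch. It is not TALON's actual code: the flat position list layout, the helper names 'transcript_span' and 'overlaps', and the choice of which wrong index was used are all assumptions made for the example.

```python
def transcript_span(positions):
    """Return (start, end) of a transcript.

    Assumes a hypothetical flat layout:
    [exon1_start, exon1_end, exon2_start, exon2_end, ...]
    """
    return positions[0], positions[-1]


def overlaps(span_a, span_b):
    """True if two closed intervals share at least one position."""
    a_start, a_end = sorted(span_a)
    b_start, b_end = sorted(span_b)
    return a_start <= b_end and b_start <= a_end


# A three-exon transcript and a gene that only overlaps its last exon.
positions = [100, 200, 500, 600, 900, 1000]
gene = (950, 1200)

# Assumed form of the mistake: taking the end of the first exon
# instead of the end of the last exon.
wrong_span = (positions[0], positions[1])
right_span = transcript_span(positions)

missed = not overlaps(wrong_span, gene)   # overlap is missed
found = overlaps(right_span, gene)        # overlap is detected
```

With a single-exon transcript both indices point at the same value, which is consistent with the bug affecting only multi-exonic transcripts.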
Users can now omit previously added datasets from the config file when running TALON again on the same database. Previously, omitting them caused the database validation step to crash at the end of the run.
Small change to the get_read_annotations.py utility: added a patch to cover the case where the gene/transcript status in the original GTF annotation is something other than 'KNOWN'. The novelty status for such cases is now 'Other' (previously they caused a crash). The 'NOVEL' designation is reserved for novel transcripts called by TALON itself.
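The rule described above can be sketched as follows. This is an illustration only, not the code in get_read_annotations.py: the function name 'annotation_novelty' and the output label 'Known' are assumptions, while 'Other' is the label named in this note.

```python
def annotation_novelty(status):
    """Map a status value from the reference GTF to a novelty label.

    Hypothetical helper: any status other than 'KNOWN' falls back to
    'Other' instead of raising an error. Novel transcripts called by
    TALON are labeled elsewhere and never pass through this mapping.
    """
    if status == "KNOWN":
        return "Known"  # assumed label for annotated entries
    return "Other"


labels = [annotation_novelty(s) for s in ("KNOWN", "PUTATIVE", None)]
```

Using a catch-all fallback rather than an exhaustive list of status values means that unexpected or missing values in an annotation cannot trigger a crash.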
The underlying functionality of this TALON version is identical to version 4.2, but it has been converted to a Python package in order to create a better experience for users. Once installed, the program can be run from anywhere on the system.
Changes:
This release is written for Python 3.7 rather than Python 2.7. There are some small issues to be worked out in the post-TALON_tools section, but the schema and talon.py script are set.