Biotite Versions

A comprehensive library for computational molecular biology

v0.32.0

2 years ago

Changelog

Additions

Add Python 3.10 build
Add writer_iter() to some File classes
- Allows entry-wise writing into a file to reduce memory consumption, as there is no longer the need to keep the entire file content in memory
- Implemented for sequence.io.FastaFile, sequence.io.FastqFile and all classes inheriting from structure.io.TrajectoryFile
Add symbol_spacing parameter to sequence.graphics.plot_alignment
- If the parameter is given, a small gap is introduced every n symbols in the alignment
- Also added to sequence.graphics.plot_alignment_similarity_based and sequence.graphics.plot_alignment_types_based
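The memory saving behind writer_iter() comes from writing each entry as soon as it is produced, instead of first collecting the whole file content. The following plain-Python sketch illustrates this pattern for FASTA output. write_fasta_entries() and generate_entries() are hypothetical helpers written for illustration; they are not part of Biotite's API.

```python
import io
from typing import Iterable, Iterator, TextIO, Tuple


def write_fasta_entries(
    file: TextIO, entries: Iterable[Tuple[str, str]], line_length: int = 80
) -> int:
    """Write (header, sequence) pairs to 'file' one entry at a time.

    Hypothetical helper: only the current entry is held in memory, so
    'entries' can be a generator. Returns the number of entries written.
    """
    count = 0
    for header, sequence in entries:
        file.write(f">{header}\n")
        # Wrap the sequence into lines of at most 'line_length' symbols
        for start in range(0, len(sequence), line_length):
            file.write(sequence[start:start + line_length] + "\n")
        count += 1
    return count


def generate_entries(n: int) -> Iterator[Tuple[str, str]]:
    # Entries are created on demand instead of being stored in a list
    for i in range(n):
        yield f"seq{i}", "ACGT" * (i + 1)


buffer = io.StringIO()
n_written = write_fasta_entries(buffer, generate_entries(3), line_length=6)
```

In a real use case, the file object would be an open file on disk, so that the full set of entries never has to exist in memory at once.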

Changes

Python 3.7 builds are discontinued

Fixes

The simulation time for each frame is now returned by structure.io.NetCDFFile; previously it was always None

v0.31.0

2 years ago

Changelog

Additions

New functionalities for sequence.SequenceProfile
- Added method probability_matrix() to compute the symbol probabilities from total frequencies
- Added method log_odds_matrix() to calculate a position weight matrix
- Added methods sequence_probability() and sequence_score() to assess the adherence of a given sequence to a profile
New functionalities for structure.BondList
- New bond type structure.BondType.AROMATIC_TRIPLE to support triple bonds in aromatic systems
- Added structure.BondList.without_aromaticity() to convert bonds with structure.BondType.AROMATIC_<order> to structure.BondType.<order>
Added structure.info.bond_type()
- Used to get the structure.BondType of the bond between two atoms in a residue
- Replaces structure.info.bond_order()
- Initial loading of bond dataset is much faster
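The profile additions above build on a few simple formulas. As a rough sketch, assuming a profile stored as per-position symbol counts and a fixed background distribution: probabilities are the normalized counts (optionally with a pseudocount), log-odds values are log2(p / background), a sequence score is the sum of the log-odds values of its symbols, and a sequence probability is the product of its symbol probabilities. The helper names below are hypothetical and only mirror the concepts behind probability_matrix(), log_odds_matrix(), sequence_probability() and sequence_score(); Biotite's actual implementation may differ, e.g. in its pseudocount handling.

```python
import math
from typing import Dict, List

Column = Dict[str, float]


def column_probabilities(
    counts: List[Dict[str, int]], alphabet: str, pseudocount: float = 0.0
) -> List[Column]:
    # Normalize the symbol counts of each position to probabilities
    probabilities = []
    for column in counts:
        total = sum(column.get(s, 0) for s in alphabet) + pseudocount * len(alphabet)
        probabilities.append(
            {s: (column.get(s, 0) + pseudocount) / total for s in alphabet}
        )
    return probabilities


def column_log_odds(probabilities: List[Column], background: Column) -> List[Column]:
    # log2(p / background); symbols that were never observed get -inf
    return [
        {s: math.log2(p / background[s]) if p > 0 else -math.inf for s, p in column.items()}
        for column in probabilities
    ]


def score_sequence(sequence: str, log_odds: List[Column]) -> float:
    # Sum of the position-specific log-odds values of the sequence symbols
    if len(sequence) != len(log_odds):
        raise ValueError("Sequence and profile must have the same length")
    return sum(column[symbol] for symbol, column in zip(sequence, log_odds))


def probability_of_sequence(sequence: str, probabilities: List[Column]) -> float:
    # Product of the position-specific symbol probabilities
    if len(sequence) != len(probabilities):
        raise ValueError("Sequence and profile must have the same length")
    return math.prod(column[symbol] for symbol, column in zip(sequence, probabilities))


alphabet = "ACGT"
# Symbol counts at three positions, e.g. taken from four aligned sequences
counts = [{"A": 4}, {"C": 2, "G": 2}, {"T": 3, "A": 1}]
uniform = {s: 0.25 for s in alphabet}
probs = column_probabilities(counts, alphabet)
log_odds = column_log_odds(probs, uniform)
```

Without a pseudocount, any unobserved symbol drives the score to -inf; a positive pseudocount avoids this at the cost of slightly flattening the distribution.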

Changes

Deprecated structure.info.bond_order()

Fixes

Fixed structure.io.pdb.PDBFile.get_structure() raising an exception if the PDB file contains an invalid CRYST1 record; now a warning is printed instead
Fixed CONECT records written by structure.io.pdb.PDBFile.set_structure()
- Previously, an empty second CONECT record was created if an atom had 4 bond partners
Fixed installation of Biotite source distribution with Python 3.10 (#356)
Fixed running application.muscle.MuscleApp with nucleotide sequences

v0.30.0

2 years ago

Changelog

Changes

sequence.graphics.plot_sequence_logo() now requires a sequence.SequenceProfile; passing a sequence.align.Alignment still works, but is deprecated
database.rcsb.FieldQuery now has an optional molecular_definition parameter to allow searches in molecule-related fields

Fixes

The new RCSB search API is now supported (#347)
More than two database.rcsb.Query objects can be combined with logical operators
application.blast.BlastWebApp now also reports hit sequences containing selenocysteine without errors (#344)
Atom, AtomArray and AtomArrayStack can be unpickled (#349)
sequence.graphics.plot_sequence_logo() now supports all Matplotlib backends (#345)

v0.29.0

2 years ago

Changelog

Additions

Extended functionalities for homology searches
- Added application.tantan.TantanApp for sequence repeat masking with Tantan
- sequence.align.KmerAlphabet and sequence.align.KmerTable support spaced k-mers
- Added sequence.align.local_gapped() and sequence.align.local_ungapped() for aligning sequences locally with X-drop heuristic
- Added EValueEstimator for calculation of expect values (E-values) from alignment scores
- Increased performance of sequence.LetterAlphabet.extends()
Added sequence.SequenceProfile for representing sequence profiles by means of a symbol frequency table
- sequence.SequenceProfile.from_alignment() creates a profile from an alignment
- sequence.SequenceProfile.to_consensus() creates a consensus sequence
Increased performance of sequence.align.get_codes() and sequence.align.get_symbols()
Bonds can be read from and written to CONECT records in structure.io.pdb.PDBFile (#329)
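E-values are conventionally derived from alignment scores with the Karlin-Altschul relation E = K m n exp(-λ S), where m and n are the query and database lengths and λ and K depend on the scoring scheme. The sketch below only evaluates this relation and its inverse; it assumes λ and K are already known and does not show how EValueEstimator obtains them. The parameter values in the example are placeholders, not estimates for any real scoring scheme.

```python
import math


def e_value(score: float, lam: float, k: float,
            query_length: int, database_length: int) -> float:
    """Expected number of chance hits scoring at least 'score' (Karlin-Altschul)."""
    return k * query_length * database_length * math.exp(-lam * score)


def score_for_e_value(e: float, lam: float, k: float,
                      query_length: int, database_length: int) -> float:
    """Invert the relation: score at which the E-value equals 'e'."""
    return math.log(k * query_length * database_length / e) / lam


# Placeholder parameters, not estimates for a real scoring scheme
LAM = math.log(2)
K = 0.1
evalue = e_value(10, LAM, K, query_length=100, database_length=1000)
```

Since the E-value falls exponentially with the score, doubling the search space (m n) only shifts the score needed for a given E-value by ln(2) / λ.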

Changes

Documentation now uses sphinxcontrib-bibtex for citations
- Citations include DOI with link to publication

Fixes

Fixed protein color schemes for sequences containing 'X' or '*' (#322)
structure.graphics.plot_nucleotide_secondary_structure() is now able to set the color of symbols (#333)

v0.28.0

2 years ago

Changelog

Additions

Most classes now support the repr() function (#290)
Add equality comparison for sequence.CodonTable
Added structure.info.all_residues(), which returns all residue names from the Chemical Component Dictionary
Updated resources from the Chemical Component Dictionary in structure.info
Add structure.io.mol.MOLFile to support MOL and SDF files for small molecule structure data
Add database.uniprot for UniProt database support
- database.uniprot.search() searches for UniProt IDs that match a given database.uniprot.Query
- database.uniprot.fetch() downloads the file corresponding to the given UniProt ID
Increase performance of structure.pseudoknots() by using NetworkX to identify conflicting regions (#289)

Changes

Changed structure.BondType enum values for more precise description of aromatic bonds
- BondType.AROMATIC is replaced by BondType.AROMATIC_SINGLE and BondType.AROMATIC_DOUBLE
- New method structure.BondList.remove_aromaticity() converts BondType.AROMATIC_SINGLE to BondType.SINGLE and BondType.AROMATIC_DOUBLE to BondType.DOUBLE in-place

Fixes

Fixed an error when reading a single model from a structure.io.PDBFile if the MODEL line is missing in the file
Fixed the format of formal charges written to structure.io.PDBFile
- Previously it was e.g. '+2' instead of the correct '2+'
Fixed atom_i parameter when reading a trajectory via structure.io.load_structure() (#308)
Fixed performance issue of structure.CellList
- Previously, the number of evaluated cells was too large if the radius parameter of structure.CellList.get_atoms() was equal to the cell_size parameter of the constructor (#311)
Setting an existing annotation array in structure.AtomArray and structure.AtomArrayStack preserves the NumPy dtype

v0.27.0

3 years ago

Changelog

Additions

Added interface to AutoDock Vina
- application.autodock.VinaApp uses the vina executable to dock a ligand to a receptor molecule
- Uses new structure.io.pdbqt.PDBQTFile class for writing input for and reading output from vina
  - An MGLTools installation is not necessary
- By default, the receptor is handled as a rigid structure; however, flexible side chains can be defined
Added modular system for fast k-mer based sequence searches/mappings
- sequence.align.KmerAlphabet encodes a sequence.Sequence into k-mers
- sequence.align.KmerTable is able to find k-mer matches between sequences efficiently
- sequence.align.SimilarityRule allows finding similar instead of only exact k-mer matches via a sequence.align.KmerTable
- sequence.align.align_banded() performs a heuristic local or semi-global sequence alignment within a defined diagonal band
Added sequence.align.remove_terminal_gaps() function
Added application.sra.FastqDumpApp.get_file_paths() method
Increased performance of sequence.Sequence.get_symbol_frequency()
Increased performance of sequence.NucleotideSequence.complement()
sequence.Sequence.reverse() can optionally create an array view instead of a copy
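The k-mer system works by encoding each k-mer as a single integer and indexing those integers. A minimal sketch, assuming contiguous (unspaced) k-mers and a plain string alphabet: each k-mer is read as a number in base len(alphabet), a table maps each code to the positions where it occurs, and query k-mers are looked up in that table. kmer_codes(), build_kmer_table() and find_matches() are hypothetical names; they do not reproduce the sequence.align.KmerAlphabet or sequence.align.KmerTable API.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Position = Tuple[int, int]  # (reference index, position in reference)


def kmer_codes(sequence: str, alphabet: str, k: int) -> List[int]:
    # Interpret each k-mer as a number in base len(alphabet)
    index = {symbol: i for i, symbol in enumerate(alphabet)}
    base = len(alphabet)
    codes = []
    for start in range(len(sequence) - k + 1):
        code = 0
        for symbol in sequence[start:start + k]:
            code = code * base + index[symbol]
        codes.append(code)
    return codes


def build_kmer_table(references: Sequence[str], alphabet: str,
                     k: int) -> Dict[int, List[Position]]:
    # Map each k-mer code to every position where it occurs
    table: Dict[int, List[Position]] = defaultdict(list)
    for ref_index, reference in enumerate(references):
        for pos, code in enumerate(kmer_codes(reference, alphabet, k)):
            table[code].append((ref_index, pos))
    return table


def find_matches(query: str, table: Dict[int, List[Position]], alphabet: str,
                 k: int) -> List[Tuple[int, int, int]]:
    # Look up each query k-mer; yields (query pos, reference index, reference pos)
    matches = []
    for query_pos, code in enumerate(kmer_codes(query, alphabet, k)):
        for ref_index, ref_pos in table.get(code, []):
            matches.append((query_pos, ref_index, ref_pos))
    return matches


table = build_kmer_table(["ACGTAC", "TTAC"], "ACGT", k=3)
matches = find_matches("GTAC", table, "ACGT", k=3)
```

The resulting match positions can then seed a more expensive alignment, e.g. one restricted to a diagonal band around each match.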

Changes

application.sra.FastqDumpApp.get_file_paths() only parses downloaded FASTQ files if required
Running pytest automatically recompiles changed Cython source code

v0.26.0

3 years ago

Changelog

Additions

Added interface to some programs of the ViennaRNA software package
- application.viennarna.RNAfoldApp uses RNAfold to predict the minimum free energy secondary structure of an RNA sequence
- application.viennarna.RNAplotApp uses RNAplot to calculate the 2D coordinates for base symbols in a secondary structure plot
Added structure.graphics.plot_nucleotide_secondary_structure() for visualization of an RNA secondary structure via Matplotlib
- Internally uses RNAplot
- Optional visualization of pseudoknots
Increased performance of structure.find_connected()
Increased performance of structure.partial_charges()
Added structure.BondList.remove_bonds_to()
Added molecule-level atom selections
- structure.get_molecule_indices(), structure.get_molecule_masks() and structure.molecule_iter() select atoms belonging to a single molecule, i.e. atoms that are connected via bonds
Added interface to NetworkX package
- Added as_graph() method to sequence.phylo.Tree and structure.BondList for conversion into a NetworkX Graph
- The find_rotatable_bonds() function uses NetworkX to identify rotatable bonds, i.e. single bonds that are not part of a cycle, in structures with a structure.BondList
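Using the definition given above (single bonds that are not part of a cycle), rotatable bonds can be found by checking whether the two bonded atoms are still connected once the bond is removed. The sketch below does this naively with a breadth-first search in plain Python instead of NetworkX. The bond tuple format (atom_i, atom_j, order) and the SINGLE and DOUBLE constants are assumptions for illustration and do not correspond to structure.BondList or structure.BondType.

```python
from collections import defaultdict, deque
from typing import Dict, List, Set, Tuple

SINGLE = 1  # hypothetical bond order code for single bonds
DOUBLE = 2  # hypothetical bond order code for double bonds
Bond = Tuple[int, int, int]  # (atom_i, atom_j, bond order)


def _still_connected(adjacency: Dict[int, Set[int]], start: int, goal: int) -> bool:
    # Breadth-first search from 'start' that ignores the direct start-goal bond
    seen = {start}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        for neighbor in adjacency[atom]:
            if atom == start and neighbor == goal:
                continue  # skip the bond under test
            if neighbor == goal:
                return True  # reached 'goal' via another path: bond is in a cycle
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return False


def rotatable_bonds(bonds: List[Bond]) -> List[Tuple[int, int]]:
    adjacency: Dict[int, Set[int]] = defaultdict(set)
    for i, j, _ in bonds:
        adjacency[i].add(j)
        adjacency[j].add(i)
    rotatable = []
    for i, j, order in bonds:
        # Keep single bonds whose atoms become disconnected without the bond
        if order == SINGLE and not _still_connected(adjacency, i, j):
            rotatable.append((i, j))
    return rotatable


# Three-membered ring (atoms 0-2) with the chain 2-3=4-5 attached
bonds = [(0, 1, SINGLE), (1, 2, SINGLE), (2, 0, SINGLE),
         (2, 3, SINGLE), (3, 4, DOUBLE), (4, 5, SINGLE)]
```

This runs one search per bond, which is fine for small molecules; a graph library can find all bonds outside cycles in a single pass.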

Changes

Add support for Python 3.9, remove support for Python 3.6
Add the networkx package as a dependency
structure.io.pdbx.set_structure() does not convert the residue ID -1 to "." anymore

Fixes

Fixed missing check for string length of chain ID, residue name and atom name when setting a structure in a structure.io.pdb.PDBFile
structure.io.pdbx.set_structure() now supports atom IDs larger than one million
Fixed application.dssp.DsspApp being unable to work with multicharacter chain identifiers (#264)
Fixed application.muscle.MuscleApp sometimes not finishing for long alignments (#273)
Fixed the creation of an AtomArray from atoms with optional annotations (#279)
Fixed deletion of annotation arrays in structure.AtomArray
Fixed structure.BondList potentially ending in a broken state after indexing it with an unordered index array
structure.partial_charges() uses bond order instead of number of bond partners to calculate correct charges for atoms with positive or negative formal charge

v0.25.0

3 years ago

Changelog

Additions

New analysis capabilities for nucleic acid base pairs
- Added structure.info.nucleotide_names()
- Increased performance of structure.base_pairs()
- Support for exotic nucleotides in structure.base_pairs() and structure.filter_nucleotides()
- Added structure.base_pairs_edge() and structure.base_pairs_glycosidic_bond() for further characterization of base pairs
- Added structure.base_stacking() for identification of pi-stacking of nucleobases
- Added structure.pseudoknots() for identification of pseudoknots in a given list of base pairings
- Added structure.dot_bracket(), structure.dot_bracket_from_structure() and structure.base_pairs_from_dot_bracket() for conversion of base pairs to dot-bracket-letter notation and vice versa
New methods for structure.BondList:
- Added get_all_bonds() for obtaining the bonded atoms for each atom in the structure
- Added adjacency_matrix() and bond_type_matrix()
Added structure.partial_charges() for partial charge calculation using the PEOE method
Added structure.info.standardize_order(), which reorders atoms in residues into the PDB standard atom order for the respective residue
Added structure.graphics.plot_ball_and_stick_model()
Increased performance of residue level utilities
Added structure.get_residue_positions()
Added sequence.io.genbank.get_raw_sequence(), which returns the sequence as a string
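Dot-bracket notation marks each paired base with an opening or closing bracket and each unpaired base with a dot. The sketch below converts between a list of base pairs and this notation for nested pairs only; pseudoknotted pairs would need additional bracket types, which this simplified version does not handle. The function names are hypothetical and are not those of structure.dot_bracket() or structure.base_pairs_from_dot_bracket().

```python
from typing import List, Tuple


def to_dot_bracket(pairs: List[Tuple[int, int]], length: int) -> str:
    # Nested (pseudoknot-free) base pairs only
    notation = ["."] * length
    for i, j in pairs:
        first, second = min(i, j), max(i, j)
        notation[first] = "("
        notation[second] = ")"
    return "".join(notation)


def from_dot_bracket(notation: str) -> List[Tuple[int, int]]:
    # Match brackets with a stack: each ')' closes the most recent '('
    open_positions: List[int] = []
    pairs = []
    for pos, char in enumerate(notation):
        if char == "(":
            open_positions.append(pos)
        elif char == ")":
            if not open_positions:
                raise ValueError(f"Unmatched ')' at position {pos}")
            pairs.append((open_positions.pop(), pos))
        elif char != ".":
            raise ValueError(f"Unsupported character {char!r}")
    if open_positions:
        raise ValueError("Unmatched '(' in notation")
    return sorted(pairs)


hairpin = [(0, 8), (1, 7), (2, 6)]
notation = to_dot_bracket(hairpin, 10)
```

The stack-based parser only works because nested pairs never cross; crossing pairs are exactly what pseudoknot detection has to separate into different bracket levels.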

Changes

structure.hbond() raises a warning if an input structure without hydrogen atoms is given (#241)
get_sequence() and get_sequences() of biotite.sequence.io.fasta and biotite.sequence.io.fastq convert selenocysteine to cysteine (#232)
Changed the order of sequence types that biotite.sequence.io.fasta.get_sequence() and biotite.sequence.io.fasta.get_sequences() try to create (#232):
- First: sequence.NucleotideSequence
- If this fails: sequence.ProteinSequence
Temporary files used by the application subpackage are removed via os.remove() due to issues on Windows (#243)

Fixes

Fixed structure.base_pairs() for structures that contain residues that are not in the PDB standard order (#237)
Fixed slightly incorrect aspect ratio in molecular visualizations created via structure.graphics.plot_atoms()
Fixed the bounds check for input bonds in the structure.BondList constructor (related to #252):
- Previously, the bond type value was not allowed to exceed the number of atoms
Fixed structure.BondList indexing with an unsorted index array (#238)
Fixed the charge annotation of molecules obtained via structure.info.residue() (#254)

v0.24.0

3 years ago

Changelog

Additions

Added sequence.ProteinSequence.get_molecular_weight() method
Added application.sra subpackage as interface to NCBI SRA tools
- FastqDumpApp is used for fetching FASTQ files from the NCBI SRA
Added iter_read() static method to sequence.io.fasta.FastaFile and sequence.io.fastq.FastqFile
- This method is used to parse header-sequence-pairs from FASTA/FASTQ files without the necessity to keep the entire file in memory.
set_sequence() and set_sequences() in sequence.io.fasta and sequence.io.fastq support writing RNA sequences with the new as_rna parameter
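Parsing entries lazily, as iter_read() does, means yielding one header-sequence pair at a time instead of loading the whole file. The following sketch shows the idea for FASTA input with a plain-Python generator; iter_fasta() is a hypothetical helper, not Biotite's implementation, and it handles FASTA only (no FASTQ quality lines).

```python
import io
from typing import Iterator, List, Optional, TextIO, Tuple


def iter_fasta(file: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs one at a time (hypothetical helper)."""
    header: Optional[str] = None
    chunks: List[str] = []
    for line in file:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header completes the previous entry
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:].strip()
            chunks = []
        else:
            if header is None:
                raise ValueError("Sequence data found before the first header")
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


text = ">seq1 example\nACGT\nTT\n\n>seq2\nGG\n"
entries = list(iter_fasta(io.StringIO(text)))
```

Because the generator reads the file line by line, memory use is bounded by the largest single entry rather than by the file size.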

Fixes

Fixed missing whitespace at the end of _loop category labels in PDBx/mmCIF files (#224)
Fixed inconsistent handling of model IDs over different file formats for structures where the first model ID is greater than 1 (#227)
Removed warning in structure.density()
sequence.io.fastq.get_sequence() and sequence.io.fastq.get_sequences() now properly handle RNA and ambiguous sequences
Fixed start parameter in structure.renumber_atom_ids and structure.renumber_res_ids
Updated fetch URL for FASTA files in database.rcsb.fetch()

v0.23.0

3 years ago

Changelog

Additions

Improved example gallery
- Added minigalleries in the API reference to get tangible examples for the respective function/class
- Added support for animated Matplotlib plots
- Using Ammolite for rendering PyMOL images
Added support for new RCSB search API
- New database.rcsb.Query classes, which reflect the entirety of the new search API, including sequence, sequence motif and structure searches
  - Multiple database.rcsb.Query objects can be combined/negated using the operators |, & and ~
- Added the return_type, sort_by and range parameters to database.rcsb.search()
- Added database.rcsb.count() function to count the number of results a database.rcsb.Query would yield in a less costly way than database.rcsb.search()
Increased indexing speed in biotite.structure.BondList
Added the sequence.Sequence.alphabet property, which is equivalent to sequence.Sequence.get_alphabet()
Added convenience functions fastq.get_sequence(), fastq.get_sequences(), fastq.set_sequence() and fastq.set_sequences()
Drastically increased writing speed of sequence.io.fasta.FastaFile
Increased mapping speed of sequence.AlphabetMapper
Added sequence.Alphabet.is_letter_alphabet() method
Added general sequence I/O convenience functions sequence.io.load_sequence(), sequence.io.load_sequences(), sequence.io.save_sequence() and sequence.io.save_sequences() that derive the appropriate File class from the suffix of the file name.
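Combining queries with |, & and ~ is commonly implemented via Python operator overloading. The sketch below shows one way to do this, flattening chains of the same operator so that more than two queries can be joined in one node. The classes and the dictionary layout produced by to_dict() are hypothetical; they do not reproduce database.rcsb.Query or the JSON schema of the RCSB search API.

```python
from dataclasses import dataclass
from typing import Any, Dict, Tuple


class Query:
    """Base class: combine queries with |, & and ~ (hypothetical sketch)."""

    def __or__(self, other: "Query") -> "Query":
        return _combine("or", self, other)

    def __and__(self, other: "Query") -> "Query":
        return _combine("and", self, other)

    def __invert__(self) -> "Query":
        return NegatedQuery(self)

    def to_dict(self) -> Dict[str, Any]:
        raise NotImplementedError


@dataclass(frozen=True)
class FieldQuery(Query):
    field: str
    value: Any

    def to_dict(self) -> Dict[str, Any]:
        return {"field": self.field, "value": self.value}


@dataclass(frozen=True)
class CompositeQuery(Query):
    operator: str
    operands: Tuple[Query, ...]

    def to_dict(self) -> Dict[str, Any]:
        return {"operator": self.operator,
                "nodes": [q.to_dict() for q in self.operands]}


@dataclass(frozen=True)
class NegatedQuery(Query):
    operand: Query

    def to_dict(self) -> Dict[str, Any]:
        return {"operator": "not", "nodes": [self.operand.to_dict()]}


def _combine(operator: str, left: Query, right: Query) -> CompositeQuery:
    # Flatten chains like a | b | c into a single node with three operands
    operands = []
    for query in (left, right):
        if isinstance(query, CompositeQuery) and query.operator == operator:
            operands.extend(query.operands)
        else:
            operands.append(query)
    return CompositeQuery(operator, tuple(operands))


a = FieldQuery("method", "X-ray")
b = FieldQuery("method", "NMR")
c = FieldQuery("method", "EM")
d = FieldQuery("organism", "Homo sapiens")
combined = (a | b | c) & ~d
```

Building an immutable tree and serializing it only at the end keeps query construction independent of the transport format sent to the server.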

Changes

The omit_chain parameter has been removed from database.rcsb.search()
The old database.rcsb.Query classes have been removed
Removed python setup.py test and python setup.py build_sphinx commands, please use pytest and sphinx-build directly instead
Renamed sequence.NucleotideSequence.alphabet to sequence.NucleotideSequence.alphabet_unamb
sequence.io.fastq.FastqFile returns its entries only as str instead of sequence.NucleotideSequence for consistency with sequence.io.fasta.FastaFile
- The method sequence.io.fastq.FastqFile.get_sequence() is deprecated
- The method sequence.io.fastq.FastqFile.get_seq_string() returns the sequence as a str instead of a sequence.NucleotideSequence

Fixes

Fixed expect_looped parameter in structure.io.pdbx.PDBxFile.get_category()
Fixed an error in structure.io.pdbx.PDBxFile that was raised if a PDBx field and its single-line value were on separate lines
Added a check for the boolean mask length when a boolean mask is given as an index to biotite.structure.BondList
Changed chain_id dtype from 'U3' to 'U4' (#215)