Biotite Versions

A comprehensive library for computational molecular biology

v0.32.0

2 years ago

Changelog

Additions

  • Add Python 3.10 build
  • Add write_iter() to some File classes
    • Allows entry-wise writing into a file to reduce memory consumption, as there is no longer the need to keep the entire file content in memory
    • Implemented for sequence.io.FastaFile, sequence.io.FastqFile and all classes inheriting from structure.io.TrajectoryFile
  • Add symbol_spacing parameter to sequence.graphics.plot_alignment
    • If the parameter is given, a small gap is introduced every n symbols in the alignment
    • Also added to sequence.graphics.plot_alignment_similarity_based and sequence.graphics.plot_alignment_types_based
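
The memory benefit of entry-wise writing described above can be illustrated with a small standalone sketch. It does not use Biotite: the helper names write_fasta_entries and generate_entries, as well as the line width, are assumptions made purely for illustration.

```python
import io
from typing import Iterable, Iterator, TextIO, Tuple


def write_fasta_entries(file: TextIO, entries: Iterable[Tuple[str, str]],
                        chars_per_line: int = 80) -> None:
    """Write (header, sequence) pairs one at a time (hypothetical helper)."""
    for header, sequence in entries:
        file.write(f">{header}\n")
        # Wrap the sequence so that no line exceeds chars_per_line
        for start in range(0, len(sequence), chars_per_line):
            file.write(sequence[start:start + chars_per_line] + "\n")


def generate_entries(count: int) -> Iterator[Tuple[str, str]]:
    """Lazily yield entries, so only one is held in memory at a time."""
    for i in range(count):
        yield f"seq{i}", "ACGT" * 3


buffer = io.StringIO()
write_fasta_entries(buffer, generate_entries(2), chars_per_line=8)
fasta_text = buffer.getvalue()
```

Because the entries come from a generator, the writer never sees more than one entry at once, which is the point of the entry-wise approach.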

Changes

  • Python 3.7 builds are discontinued

Fixes

  • structure.io.NetCDFFile now returns the simulation time for each frame; previously it was always None

v0.31.0

2 years ago

Changelog

Additions

  • New functionalities for sequence.SequenceProfile
    • Added method probability_matrix() to compute the symbol probabilities from total frequencies
    • Added method log_odds_matrix() to calculate a position weight matrix
    • Added methods sequence_probability() and sequence_score() to assess the adherence of a given sequence to a profile
  • New functionalities for structure.BondList
    • New bond type structure.BondType.AROMATIC_TRIPLE to support triple bonds in aromatic systems
    • Added structure.BondList.without_aromaticity() to convert bonds with structure.BondType.AROMATIC_<order> to structure.BondType.<order>
  • Added structure.info.bond_type()
    • Used to get the structure.BondType of the bond between two atoms in a residue
    • Replaces structure.info.bond_order()
    • Initial loading of bond dataset is much faster
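
The relationship between the profile quantities listed above (symbol probabilities, a log-odds position weight matrix and a sequence score) can be sketched in plain Python. This is a minimal illustration, not Biotite's implementation: the count values, the uniform background and the pseudocount handling are assumptions.

```python
import math

ALPHABET = "ACGT"

# Symbol counts per alignment position (rows) and symbol (columns);
# the numbers are made up for illustration
counts = [
    [4, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 2, 2, 0],
]


def probability_matrix(counts, pseudocount=0.0):
    """Convert counts to per-position symbol probabilities."""
    matrix = []
    for row in counts:
        total = sum(row) + pseudocount * len(row)
        matrix.append([(c + pseudocount) / total for c in row])
    return matrix


def log_odds_matrix(counts, background, pseudocount=1.0):
    """Position weight matrix: log2 of probability over background.

    A positive pseudocount avoids taking the logarithm of zero.
    """
    probs = probability_matrix(counts, pseudocount)
    return [
        [math.log2(p / b) for p, b in zip(row, background)]
        for row in probs
    ]


def sequence_score(sequence, pwm):
    """Sum the log-odds of the sequence's symbols over all positions."""
    return sum(
        pwm[pos][ALPHABET.index(symbol)]
        for pos, symbol in enumerate(sequence)
    )


background = [0.25] * 4
pwm = log_odds_matrix(counts, background)
score_good = sequence_score("AAC", pwm)  # follows the profile
score_bad = sequence_score("TAA", pwm)   # deviates from the profile
```

A sequence that matches the frequent symbols at each position receives a higher score than one that does not.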

Changes

  • Deprecated structure.info.bond_order()

Fixes

  • Fixed structure.io.pdb.PDBFile.get_structure() raising an exception if the PDB file contains an invalid CRYST1 record; now a warning is printed instead
  • Fixed CONECT records written by structure.io.pdb.PDBFile.set_structure()
    • Previously, an empty second CONECT record was created if an atom had 4 bond partners
  • Fixed installation of Biotite source distribution with Python 3.10 (#356)
  • Fixed running application.muscle.MuscleApp with nucleotide sequences

v0.30.0

2 years ago

Changelog

Changes

  • sequence.graphics.plot_sequence_logo() now requires a sequence.SequenceProfile; passing a sequence.align.Alignment still works, but is deprecated
  • database.rcsb.FieldQuery now has an optional molecular_definition parameter to allow searches in molecule-related fields

Fixes

  • The new RCSB search API is now supported (#347)
  • More than two database.rcsb.Query objects can be combined with logical operators
  • application.blast.BlastWebApp now also reports hit sequences containing selenocysteine without errors (#344)
  • Atom, AtomArray and AtomArrayStack can be unpickled (#349)
  • sequence.graphics.plot_sequence_logo() now supports all Matplotlib backends (#345)
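
How more than two query objects can be combined with logical operators can be sketched with operator overloading. The classes below (Query, Field, Negation, Composite) and the field names are hypothetical stand-ins written for this example; they are not the database.rcsb classes.

```python
from dataclasses import dataclass
from typing import Tuple


class Query:
    """Hypothetical base class supporting the operators |, & and ~."""

    def __and__(self, other):
        return Composite("and", _operands("and", self) + _operands("and", other))

    def __or__(self, other):
        return Composite("or", _operands("or", self) + _operands("or", other))

    def __invert__(self):
        return Negation(self)


@dataclass(frozen=True)
class Field(Query):
    name: str
    value: str


@dataclass(frozen=True)
class Negation(Query):
    query: Query


@dataclass(frozen=True)
class Composite(Query):
    operator: str
    queries: Tuple[Query, ...]


def _operands(operator, query):
    """Flatten nested composites that share the same operator."""
    if isinstance(query, Composite) and query.operator == operator:
        return query.queries
    return (query,)


# Three queries chained with '&'; '~' negates the last one
combined = Field("field_a", "1") & Field("field_b", "2") & ~Field("field_c", "3")
```

Flattening composites with the same operator keeps a chain of any length in a single group instead of nesting pairs.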

v0.29.0

2 years ago

Changelog

Additions

  • Extended functionalities for homology searches
    • Added application.tantan.TantanApp for sequence repeat masking with Tantan
    • sequence.align.KmerAlphabet and sequence.align.KmerTable support spaced k-mers
    • Added sequence.align.local_gapped() and sequence.align.local_ungapped() for aligning sequences locally with X-drop heuristic
    • Added EValueEstimator for calculation of expect values (E-values) from alignment scores
    • Increased performance of sequence.LetterAlphabet.extends()
  • Added sequence.SequenceProfile for representing sequence profiles by means of a symbol frequency table
    • sequence.SequenceProfile.from_alignment() creates a profile from an alignment
    • sequence.SequenceProfile.to_consensus() creates a consensus sequence
  • Increased performance of sequence.align.get_codes() and sequence.align.get_symbols()
  • Bonds can be read from and written to CONECT records in structure.io.pdb.PDBFile (#329)
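
The X-drop heuristic mentioned above for local alignments can be sketched for the ungapped case. This is a simplified illustration under stated assumptions: it extends in one direction only and uses a fixed match/mismatch score instead of a substitution matrix, and the function name extend_ungapped is invented for the example.

```python
def extend_ungapped(seq1, seq2, pos1, pos2, x_drop, match=1, mismatch=-1):
    """Extend a seed hit to the right without gaps, X-drop terminated.

    Returns (best_score, length) of the best-scoring extension.
    """
    score = 0
    best_score = 0
    best_length = 0
    i, j = pos1, pos2
    while i < len(seq1) and j < len(seq2):
        score += match if seq1[i] == seq2[j] else mismatch
        if score > best_score:
            best_score = score
            best_length = i - pos1 + 1
        elif best_score - score > x_drop:
            # Score dropped too far below the maximum: stop extending
            break
        i += 1
        j += 1
    return best_score, best_length


strict = extend_ungapped("AAAATTAAAA", "AAAACCAAAA", 0, 0, x_drop=1)
tolerant = extend_ungapped("AAAATTAAAA", "AAAACCAAAA", 0, 0, x_drop=5)
```

A small threshold stops at the first mismatching stretch, while a larger one lets the extension bridge it and reach the matching region behind.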

Changes

  • Documentation now uses sphinxcontrib-bibtex for citations
    • Citations include DOI with link to publication

Fixes

  • Fixed protein color schemes for sequences containing 'X' or '*' (#322)
  • structure.graphics.plot_nucleotide_secondary_structure() is now able to set the color of symbols (#333)

v0.28.0

2 years ago

Changelog

Additions

  • Most classes now support the repr() function (#290)
  • Add equality comparison for sequence.CodonTable
  • Added structure.info.all_residues(), which returns all residue names from the Chemical Component Dictionary
  • Updated resources from Chemical Component Dictionary in structure.info
  • Add structure.io.mol.MOLFile to support MOL and SDF files for small molecule structure data
  • Add database.uniprot for UniProt database support
    • database.uniprot.search() searches for UniProt IDs that match a given database.uniprot.Query
    • database.uniprot.fetch() downloads the file corresponding to the given UniProt ID
  • Increase performance of structure.pseudoknots() by using NetworkX to identify conflicting regions (#289)

Changes

  • Changed structure.BondType enum values for more precise description of aromatic bonds
    • BondType.AROMATIC is replaced by BondType.AROMATIC_SINGLE and BondType.AROMATIC_DOUBLE
    • New method structure.BondList.remove_aromaticity() converts BondType.AROMATIC_SINGLE to BondType.SINGLE and BondType.AROMATIC_DOUBLE to BondType.DOUBLE in-place
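
The in-place conversion of aromatic bond types described above amounts to a simple mapping. The sketch below uses a stand-in enum and a plain list of bonds rather than the Biotite classes; the member names follow the changelog, but the integer values are assumptions.

```python
from enum import IntEnum


class BondType(IntEnum):
    # Member names follow the changelog; the integer values are assumptions
    SINGLE = 1
    DOUBLE = 2
    AROMATIC_SINGLE = 5
    AROMATIC_DOUBLE = 6


NON_AROMATIC = {
    BondType.AROMATIC_SINGLE: BondType.SINGLE,
    BondType.AROMATIC_DOUBLE: BondType.DOUBLE,
}


def remove_aromaticity(bonds):
    """Replace aromatic bond types in-place in a list of [i, j, type]."""
    for bond in bonds:
        bond[2] = NON_AROMATIC.get(bond[2], bond[2])


# Alternating bonds of a six-membered aromatic ring plus one substituent
bonds = [
    [0, 1, BondType.AROMATIC_DOUBLE],
    [1, 2, BondType.AROMATIC_SINGLE],
    [2, 3, BondType.AROMATIC_DOUBLE],
    [3, 4, BondType.AROMATIC_SINGLE],
    [4, 5, BondType.AROMATIC_DOUBLE],
    [5, 0, BondType.AROMATIC_SINGLE],
    [0, 6, BondType.SINGLE],
]
remove_aromaticity(bonds)
```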

Fixes

  • Fixed an error when reading a single model from a structure.io.PDBFile if the MODEL line is missing in the file
  • Fixed the format of formal charges written to structure.io.PDBFile
    • Previously it was e.g. '+2' instead of the correct '2+'
  • Fixed atom_i parameter when reading a trajectory via structure.io.load_structure() (#308)
  • Fixed performance issue of structure.CellList
    • Previously the number of evaluated cells was too large, if the radius parameter of structure.CellList.get_atoms() was equal to the cell_size parameter of the constructor (#311)
  • Setting an existing annotation array in structure.AtomArray and structure.AtomArrayStack preserves the NumPy dtype

v0.27.0

3 years ago

Changelog

Additions

  • Added interface to AutoDock Vina
    • application.autodock.VinaApp uses vina executable to perform docking of ligand to a receptor molecule
    • Uses new structure.io.pdbqt.PDBQTFile class for writing input for and reading output from vina
      • An MGLTools installation is not necessary
    • By default the receptor is handled as rigid structure, however, flexible side chains can be defined
  • Added modular system for fast k-mer based sequence searches/mappings
    • sequence.align.KmerAlphabet encodes a sequence.Sequence into k-mers
    • sequence.align.KmerTable is able to find k-mer matches between sequences in an efficient manner
    • sequence.align.SimilarityRule allows matching similar instead of exact k-mer matches via a sequence.align.KmerTable
    • sequence.align.align_banded() performs a heuristic local or semi-global sequence alignment within a defined diagonal band
  • Added sequence.align.remove_terminal_gaps() function
  • Added application.sra.FastqDumpApp.get_file_paths() method
  • Increased performance of sequence.Sequence.get_symbol_frequency()
  • Increased performance of sequence.NucleotideSequence.complement()
  • sequence.Sequence.reverse() can optionally create an array view instead of a copy

Changes

  • application.sra.FastqDumpApp only parses downloaded FASTQ files if required
  • Running pytest automatically recompiles changed Cython source code

v0.26.0

3 years ago

Changelog

Additions

  • Added interface to some programs of the ViennaRNA software package
    • application.viennarna.RNAfoldApp uses RNAfold to predict the minimum free energy secondary structure of an RNA sequence
    • application.viennarna.RNAplotApp uses RNAplot to calculate the 2D coordinates for base symbols in a secondary structure plot
  • Added structure.graphics.plot_nucleotide_secondary_structure() for visualization of an RNA secondary structure via Matplotlib
    • Internally uses RNAplot
    • Optional visualization of pseudoknots
  • Increased performance of structure.find_connected()
  • Increased performance of structure.partial_charges()
  • Added structure.BondList.remove_bonds_to()
  • Added molecule-level atom selections
    • structure.get_molecule_indices(), structure.get_molecule_masks() and structure.molecule_iter() select atoms belonging to a single molecule, i.e. atoms that are connected via bonds
  • Added interface to NetworkX package
    • Added as_graph() method to sequence.phylo.Tree and structure.BondList for conversion into a NetworkX Graph
    • The structure.find_rotatable_bonds() function uses NetworkX to identify rotatable bonds, i.e. single bonds that are not part of a cycle, in structures with a structure.BondList

Changes

  • Add support for Python 3.9, remove support for Python 3.6
  • Add networkx package as dependency
  • structure.io.pdbx.set_structure() does not convert the residue ID -1 to "." anymore

Fixes

  • Fixed missing check for string length of chain ID, residue name and atom name when setting a structure in a structure.io.pdb.PDBFile
  • structure.io.pdbx.set_structure() now supports atom IDs larger than one million
  • Fixed application.dssp.DsspApp unable to work with multicharacter chain identifiers (#264)
  • Fixed application.muscle.MuscleApp sometimes not finishing for long alignments (#273)
  • Fixed the creation of an AtomArray from atoms with optional annotations (#279)
  • Fixed deletion of annotation arrays in structure.AtomArray
  • Fixed structure.BondList potentially ending in a broken state after indexing it with an unordered index array
  • structure.partial_charges() uses bond order instead of number of bond partners to calculate correct charges for atoms with positive or negative formal charge

v0.25.0

3 years ago

Changelog

Additions

  • New analysis capabilities for nucleic acid base pairs
    • Added structure.info.nucleotide_names()
    • Increased performance of structure.base_pairs()
    • Support for exotic nucleotides in structure.base_pairs() and structure.filter_nucleotides()
    • Added structure.base_pairs_edge() and structure.base_pairs_glycosidic_bond() for further characterization of base pairs
    • Added structure.base_stacking() for identification of pi-stacking of nucleobases
    • Added structure.pseudoknots() for identification of pseudoknots in a given list of base pairings
    • Added structure.dot_bracket(), structure.dot_bracket_from_structure() and structure.base_pairs_from_dot_bracket() for conversion of base pairs to dot-bracket-letter notation and vice versa
  • New methods for structure.BondList:
    • Added get_all_bonds() for obtaining the bonded atoms for each atom in the structure
    • Added adjacency_matrix() and bond_type_matrix()
  • Added structure.partial_charges() for partial charge calculation using the PEOE method
  • Added structure.info.standardize_order(), which reorders atoms in residues into the PDB standard atom order for the respective residue
  • Added structure.graphics.plot_ball_and_stick_model()
  • Increased performance of residue level utilities
  • Added structure.get_residue_positions()
  • Added sequence.io.genbank.get_raw_sequence() which returns the sequence as string

Changes

  • structure.hbond() raises a warning if an input structure without hydrogen atoms is given (#241)
  • get_sequence() and get_sequences() of biotite.sequence.io.fasta and biotite.sequence.io.fastq convert selenocysteine to cysteine (#232)
  • Changed the order of sequence types that biotite.sequence.io.fasta.get_sequence() and biotite.sequence.io.fasta.get_sequences() try to create (#232):
    • First: sequence.NucleotideSequence
    • If this fails: sequence.ProteinSequence
  • Temporary files used by the application subpackage are removed via os.remove() due to issues on Windows (#243)

Fixes

  • Fixed structure.base_pairs() for structures that contain residues that are not in the PDB standard order (#237)
  • Fixed slightly incorrect aspect ratio in molecular visualizations created via structure.graphics.plot_atoms()
  • Fixed bounds check for input bonds in the structure.BondList constructor (related to #252):
    • Previously, the bond type value was not allowed to exceed the number of atoms
  • Fixed structure.BondList indexing with an unsorted index array (#238)
  • Fixed the charge annotation of molecules obtained via structure.info.residue() (#254)

v0.24.0

3 years ago

Changelog

Additions

  • Added sequence.ProteinSequence.get_molecular_weight() method
  • Added application.sra subpackage as interface to NCBI SRA tools
    • FastqDumpApp is used for fetching FASTQ files from the NCBI SRA
  • Added read_iter() static method to sequence.io.fasta.FastaFile and sequence.io.fastq.FastqFile
    • This method is used to parse header-sequence pairs from FASTA/FASTQ files without the need to keep the entire file in memory
  • set_sequence() and set_sequences() in sequence.io.fasta and sequence.io.fastq support writing RNA sequences with the new as_rna parameter
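
Entry-wise parsing of header-sequence pairs, as described above, can be sketched with a generator. The function read_fasta_entries is a hypothetical standalone helper, not the Biotite method, and it handles only the basic FASTA layout.

```python
import io


def read_fasta_entries(file):
    """Lazily yield (header, sequence) pairs from an open text file."""
    header = None
    chunks = []
    for line in file:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous entry, if there is one
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


fasta = io.StringIO(">seq1\nACGT\nACGT\n>seq2\nTTGA\n")
entries = list(read_fasta_entries(fasta))
```

Only the lines of the current entry are buffered, so memory use depends on the longest entry rather than on the file size.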

Fixes

  • Fixed missing whitespace at the end of _loop category labels in PDBx/mmCIF files (#224)
  • Fixed inconsistent handling of model IDs over different file formats for structures where the first model ID is greater than 1 (#227)
  • Removed warning in structure.density()
  • sequence.io.fastq.get_sequence() and sequence.io.fastq.get_sequences() properly handle RNA and ambiguous sequences now
  • Fixed start parameter in structure.renumber_atom_ids and structure.renumber_res_ids
  • Updated fetch URL for FASTA files in database.rcsb.fetch()

v0.23.0

3 years ago

Changelog

Additions

  • Improved example gallery
    • Added minigalleries in the API reference to get tangible examples for the respective function/class
    • Added support for animated Matplotlib plots
    • Using Ammolite for rendering PyMOL images
  • Added support for new RCSB search API
    • New database.rcsb.Query classes that reflect the entirety of the new search API, including sequence, sequence motif and structure searches
      • Multiple database.rcsb.Query objects can be combined/negated using the operators |, & and ~
    • Added the return_type, sort_by and range parameter to database.rcsb.search()
    • Added database.rcsb.count() function to count the number of results a database.rcsb.Query would yield in a less costly way than database.rcsb.search()
  • Increased indexing speed in biotite.structure.BondList
  • Added sequence.Sequence.alphabet property, which is equivalent to sequence.Sequence.get_alphabet()
  • Added convenience functions fastq.get_sequence(), fastq.get_sequences(), fastq.set_sequence() and fastq.set_sequences()
  • Drastically increased writing speed of sequence.io.fasta.FastaFile
  • Increased mapping speed of sequence.AlphabetMapper
  • Added sequence.Alphabet.is_letter_alphabet() method
  • Added general sequence I/O convenience functions sequence.io.load_sequence(), sequence.io.load_sequences(), sequence.io.save_sequence() and sequence.io.save_sequences() that derive the appropriate File class from the suffix of the file name.
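
Deriving the file format from the file name suffix, as the general I/O convenience functions above are described to do, can be sketched as a lookup table. The suffix list and the function name detect_format are assumptions for illustration; the suffixes actually recognized by Biotite may differ.

```python
from pathlib import Path

# Hypothetical mapping from suffix to format; the entries are assumptions
FORMAT_BY_SUFFIX = {
    ".fasta": "fasta",
    ".fa": "fasta",
    ".fastq": "fastq",
    ".fq": "fastq",
    ".gb": "genbank",
}


def detect_format(file_name):
    """Return the format name for a file, based on its suffix."""
    suffix = Path(file_name).suffix.lower()
    try:
        return FORMAT_BY_SUFFIX[suffix]
    except KeyError:
        raise ValueError(f"Unknown file format '{suffix}'") from None


detected = [detect_format(name) for name in ("reads.FASTQ", "genome.gb")]
```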

Changes

  • The omit_chain parameter has been removed from database.rcsb.search()
  • The old database.rcsb.Query classes have been removed
  • Removed python setup.py test and python setup.py build_sphinx commands, please use pytest and sphinx-build directly instead
  • Renamed sequence.NucleotideSequence.alphabet to sequence.NucleotideSequence.alphabet_unamb
  • sequence.io.fastq.FastqFile returns its entries only as str instead of sequence.NucleotideSequence for consistency with sequence.io.fasta.FastaFile
    • The method sequence.io.fastq.FastqFile.get_sequence() is deprecated
    • The method sequence.io.fastq.FastqFile.get_seq_string() returns the sequence as a str instead of a sequence.NucleotideSequence

Fixes

  • Fixed expect_looped parameter in structure.io.pdbx.PDBxFile.get_category()
  • Fixed an error in structure.io.pdbx.PDBxFile that was raised if a PDBx field and its single-line value are in separate lines
  • Added check for boolean mask length, when a boolean mask is given as index to biotite.structure.BondList
  • Changed chain_id dtype from 'U3' to 'U4' (#215)