Biotite

A comprehensive library for computational molecular biology

v0.40.0

1 month ago

Changelog

Additions

  • Refactored struc.superimpose() (#526)
    • Multiple fixed models are allowed
    • Increased performance for multiple models
  • Support for BinaryCIF file format (#531)
    • Added 'bcif' format to database.rcsb.fetch()
    • Added structure.io.pdbx.BinaryCIFFile to parse BinaryCIF files
    • Added structure.io.pdbx.CIFFile to parse CIF files with analogous API to BinaryCIFFile
    • High-level PDBx API (get_structure(), get_assembly(), etc.) supports these new file classes
    • Added include_bonds parameter to structure.io.pdbx.get_structure() and structure.io.pdbx.get_assembly() to parse bond information from file
  • Refactored structure.info subpackage (#540)
    • Decreased initial loading time when package is imported
    • The component dataset is now stored as compressed BinaryCIF, which decreases the Biotite package size
    • The component dataset is updated to the current version, i.e. the latest chemical components from the wwPDB are included
    • The project now contains the setup_ccd.py script, enabling the user to get an up-to-date version of the component dataset

Changes

  • Removed structure.info.bond_order() and structure.info.bond_dataset (#540)
  • struc.superimpose() now returns an AffineTransformation object instead of a transformation tuple (#526)
    • superimpose_apply() is deprecated in favor of AffineTransformation.apply()
  • structure.io.pdbx.PDBxFile is deprecated and superseded by CIFFile (#531)
  • structure.io.mmtf is deprecated and superseded by BinaryCIFFile (#531)
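
To illustrate why a transformation object with an apply() method is more convenient than a bare tuple, here is a minimal pure-Python sketch. The class SketchAffineTransformation and its three fields are hypothetical stand-ins chosen for illustration; they are not Biotite's AffineTransformation, and the assumed order of operations (center, rotate, translate to target) should be checked against the Biotite documentation.

```python
from dataclasses import dataclass

Vec = tuple[float, float, float]


@dataclass(frozen=True)
class SketchAffineTransformation:
    """Hypothetical stand-in for illustration, not Biotite's class."""

    center_translation: Vec          # moves the mobile centroid to the origin
    rotation: tuple[Vec, Vec, Vec]   # 3x3 rotation matrix, given row by row
    target_translation: Vec          # moves the origin onto the fixed centroid

    def apply(self, coords: list[Vec]) -> list[Vec]:
        """Apply centering, rotation and target translation to each point."""
        result = []
        for point in coords:
            # Step 1: translate the point so the mobile centroid is at the origin
            centered = [p + t for p, t in zip(point, self.center_translation)]
            # Step 2: rotate (matrix-vector product, one row at a time)
            rotated = [
                sum(r * c for r, c in zip(row, centered))
                for row in self.rotation
            ]
            # Step 3: translate onto the position of the fixed structure
            result.append(
                tuple(r + t for r, t in zip(rotated, self.target_translation))
            )
        return result


# A 90 degree rotation about the z-axis, wrapped by two translations
transform = SketchAffineTransformation(
    center_translation=(-1.0, 0.0, 0.0),
    rotation=((0.0, -1.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 1.0)),
    target_translation=(0.0, 0.0, 5.0),
)
moved = transform.apply([(2.0, 0.0, 0.0)])
```

Because the object carries all three components, the same transformation can be reused on any other set of coordinates without unpacking a tuple by position.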

Fixes

  • Handle invalid CRYST1 records in PDB files correctly (#523)
  • Ensure that NumPy 1.x is used (#537)
    • Support for 2.x will be added in the future

v0.39.0

4 months ago

Changelog

Additions

  • Add build for Python 3.12 (#513)
  • Added modern fast k-mer subsetting methods to sequence.align (#510)
    • These include:
      • MinimizerSelector
      • SyncmerSelector
      • CachedSyncmerSelector
      • MincodeSelector
    • The following k-mer ordering methods are available:
      • RandomPermutation
      • FrequencyPermutation
    • Added BucketKmerTable to support indexing of long k-mers with reasonable memory consumption
  • Support conversion of biotite.sequence.align.Alignment from/to CIGAR strings (#516)
    • read_alignment_from_cigar()
    • write_alignment_to_cigar()
  • Added sequence.graphics.plot_alignment_array() (#485)
  • Support new 5-character residue names in structures from PDB (#512)
  • Support NCBI API keys in database.entrez to increase download limits (#514)
  • Increased performance of application.sra (#504)
    • prefetch is called before fasterq_dump, as suggested here
    • FastaDumpApp is added, which decreases computation time by writing a FASTA file instead of a FASTQ file, thereby omitting the scores
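
For readers unfamiliar with the CIGAR conversion listed among the additions above, the following pure-Python sketch shows how a CIGAR string maps onto a gapped pairwise alignment. It is not Biotite's read_alignment_from_cigar(); the function names parse_cigar and gapped_from_cigar are hypothetical, and hard clipping and padding operations are simply skipped here.

```python
import re

# One CIGAR operation: a run length followed by an operation character
CIGAR_PATTERN = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Split a CIGAR string into (operation, length) tuples."""
    ops = [(op, int(length)) for length, op in CIGAR_PATTERN.findall(cigar)]
    # Reject strings containing anything the pattern did not consume
    if "".join(f"{length}{op}" for op, length in ops) != cigar:
        raise ValueError(f"Invalid CIGAR string: {cigar!r}")
    return ops


def gapped_from_cigar(reference, query, cigar):
    """Return the gapped reference and query rows described by the CIGAR."""
    ref_pos = query_pos = 0
    ref_row, query_row = [], []
    for op, length in parse_cigar(cigar):
        if op in "M=X":
            # Match or mismatch: both sequences advance
            ref_row.append(reference[ref_pos:ref_pos + length])
            query_row.append(query[query_pos:query_pos + length])
            ref_pos += length
            query_pos += length
        elif op == "I":
            # Insertion: only the query advances, the reference gets a gap
            ref_row.append("-" * length)
            query_row.append(query[query_pos:query_pos + length])
            query_pos += length
        elif op in "DN":
            # Deletion or skipped region: only the reference advances
            ref_row.append(reference[ref_pos:ref_pos + length])
            query_row.append("-" * length)
            ref_pos += length
        elif op == "S":
            # Soft clip: query symbols are present but not aligned
            query_pos += length
        # "H" and "P" consume neither sequence in this sketch
    return "".join(ref_row), "".join(query_row)


deletion = gapped_from_cigar("ACGTTGCA", "ACGGCA", "3M2D3M")
insertion = gapped_from_cigar("ACGT", "ACTTGT", "2M2I2M")
```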

Changes

  • application.sra.FastaDumpApp.get_sequences() now only returns sequence strings and not scores anymore (#504)
    • Use get_sequences_and_scores() instead

Fixes

  • Fixed memory leak in sequence.align.KmerTable.from_tables() (#510)
  • Fixed problems of plotting functionalities with recent Matplotlib versions (#518)
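
The k-mer subsetting methods added in this release build on ideas such as minimizers. As a rough illustration of the concept only, the sketch below selects the smallest k-mer in every window of consecutive k-mers. The function name minimizer_positions is hypothetical, the lexicographic ordering is an assumption made for simplicity (the release provides RandomPermutation and FrequencyPermutation as alternative orderings), and this is not how MinimizerSelector is implemented.

```python
def minimizer_positions(sequence, k, window):
    """
    Return (position, k-mer) tuples for the minimizers of a sequence.

    'window' is the number of consecutive k-mers per window (an assumption
    of this sketch). Ties are resolved in favor of the leftmost k-mer.
    """
    kmers = [sequence[i:i + k] for i in range(len(sequence) - k + 1)]
    selected = set()
    for start in range(len(kmers) - window + 1):
        candidates = range(start, start + window)
        # min() returns the first minimal element, i.e. the leftmost one
        selected.add(min(candidates, key=lambda i: kmers[i]))
    return [(pos, kmers[pos]) for pos in sorted(selected)]


minimizers = minimizer_positions("ACGTACGT", k=3, window=3)
```

Adjacent windows usually share their minimizer, so far fewer k-mers are kept than the sequence contains, while two sequences sharing a long enough stretch are still guaranteed to share a selected k-mer.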

v0.38.0

8 months ago

Changelog

Additions

  • Faster k-mer decomposition in sequence.align.KmerAlphabet.create_kmers() (#475)
  • Sequence type can be set when reading sequences and alignments using sequence.io.fasta (#478)
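
As background for the k-mer decomposition mentioned above, the following sketch converts a sequence into integer k-mer codes with a rolling update, so that each new code is derived from the previous one instead of being recomputed from scratch. The function kmer_codes is hypothetical, and the encoding (first symbol is the most significant digit) is an assumption of this sketch rather than a statement about KmerAlphabet.create_kmers().

```python
def kmer_codes(sequence, alphabet, k):
    """Encode every overlapping k-mer of a sequence as an integer."""
    symbol_code = {symbol: i for i, symbol in enumerate(alphabet)}
    base = len(alphabet)
    if len(sequence) < k:
        return []

    # Encode the first k-mer directly as a number in the given base
    code = 0
    for symbol in sequence[:k]:
        code = code * base + symbol_code[symbol]
    codes = [code]

    # Rolling update: drop the leading symbol, shift, append the new symbol
    highest_place = base ** (k - 1)
    for i in range(k, len(sequence)):
        leading = symbol_code[sequence[i - k]]
        code = (code - leading * highest_place) * base + symbol_code[sequence[i]]
        codes.append(code)
    return codes


codes = kmer_codes("ACGTA", alphabet="ACGT", k=2)
```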

Fixes

  • Fixed error that appeared when indexing a sequence.AnnotatedSequence with a slice (#479)
  • Fixed reading MOL/SDF files with more than 100 bonds (#480)
  • Fixed compilation of Biotite with Cython 3.x (#493)
  • Fixed usage of box parameter in structure.rdf() (#494)

v0.37.0

1 year ago

Changelog

Additions

  • Added PubChem database interface with database.pubchem (#472)
    • Analogous to the other database subpackages, it supports search() and fetch()
    • fetch_property() can be used to quickly obtain a wide range of properties for a given list of compound IDs
    • Automatic throttle control ensures that the PubChem usage control is obeyed
  • Extended functionality for database.rcsb.search() and database.rcsb.count() (#466):
    • Added support for computational structures (e.g. from AlphaFold DB) via the content_types parameter
    • Added support for grouping via the new group_by and return_groups parameters
      • The type of grouping is selected via Grouping subclasses
    • Added support for ascending sorting with the Sorting class
  • database.entrez.search() now also accepts the common database name in addition to the E-utility database name (#471)
    • This is now consistent with the behavior in database.entrez.fetch()
  • Added structure.io.pdb.PDBFile.get_b_factor() analogous to structure.io.pdb.PDBFile.get_coord() (#469)
  • Added structure.io.pdbx.get_component() and set_component() (#468)
    • Allows getting/setting chemical components from/to PDBx files via their chem_comp group of categories instead of atom_site

Changes

  • Deprecate atom_mask parameter in structure.connect_via_residue_names() and structure.connect_via_distances() (#474)
    • It has no effect anymore
  • In structure.BondList.merge() the BondList given as parameter takes precedence, if both BondLists contain the same bond with different BondType (#473)
    • Previously it was the other way round
  • The BondList returned by structure.io.pdb.PDBFile.get_structure() (if include_bonds is True) gives appropriate BondTypes, if they can be determined using the CCD (#473)
    • Otherwise the BondType is BondType.ANY
    • Previously it was BondType.ANY for all bonds
  • Refactored structure.remove_pbc() (#460)
    • PBC removal is conducted for each molecule separately
    • Not the first atom but the centroid of a molecule is placed within the box
    • The selection can only be a boolean matrix

Fixes

  • Fixed a bug in structure.connect_via_distances() and structure.connect_via_residue_names() that allowed unexpected bonds between polymer and non-polymer residues (#473)

v0.36.1

1 year ago

Changelog

Fixes

  • Fixed parsing of remarks < 100 in structure.io.PDBFile (#457)
  • Bonds can now be read and written using hybrid-36 encoding in structure.io.PDBFile (#456)
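
Hybrid-36 extends fixed-width decimal fields in PDB files: once the decimal range is exhausted, the field continues with upper-case and then lower-case base-36 numbers. The sketch below decodes such a field following the published hybrid-36 scheme; decode_hybrid36 is a hypothetical helper and not the function Biotite uses internally, and it does not validate mixed-case input.

```python
def decode_hybrid36(field):
    """Decode a fixed-width hybrid-36 field (e.g. a 5-character atom ID)."""
    if not field.strip():
        raise ValueError("Empty hybrid-36 field")
    width = len(field)
    first = field[0]
    if first.isdigit() or first in " -":
        # Plain decimal number, possibly padded or negative
        return int(field)
    if first.isupper():
        # Upper-case range starts right after the largest decimal value
        return int(field, 36) - 10 * 36 ** (width - 1) + 10 ** width
    if first.islower():
        # Lower-case range continues after the upper-case range
        return int(field, 36) + 16 * 36 ** (width - 1) + 10 ** width
    raise ValueError(f"Invalid hybrid-36 field: {field!r}")


last_decimal = decode_hybrid36("99999")
first_upper = decode_hybrid36("A0000")
last_upper = decode_hybrid36("ZZZZZ")
first_lower = decode_hybrid36("a0000")
```

For a width of 5 this raises the largest representable atom ID from 99999 to well above 80 million.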

v0.36.0

1 year ago

Changelog

Additions

  • Added Python 3.11 build
  • Better support for macromolecular assemblies and symmetry mates (#450)
    • biotite.structure.io.pdb and biotite.structure.io.mmtf now support parsing of assemblies via list_assemblies() and get_assembly()
    • biotite.structure.io.pdb is able to parse all atoms within a single unit cell via get_symmetry_mates()
  • Added structure.rmspd() to compute the root-mean-square-pairwise-deviation
    • This is a method to determine deviations between two models without the need for prior structure superimposition
  • Refactored structure.annotate_sse() (#448)
    • Higher performance due to more vectorization
    • Multiple chains can be processed at once
  • More granular macromolecule filters in structure subpackage (#436)
    • Added filter_peptide_backbone() and filter_phosphate_backbone() to filter backbone atoms of proteins and nucleotides, respectively
    • Added filter_linear_bond_continuity() that filters atoms that are within distance boundaries to the next atom
    • Added filter_polymer() that filters biomacromolecules of the given type (peptide, nucleotide, carbohydrate) and minimum length
  • More integrity checks in structure subpackage (#436)
    • check_linear_continuity() gives positions in a structure where atoms are not within distance boundaries to the next atom
    • check_backbone_continuity() does the same exclusively for peptide/nucleotide backbone atoms
  • Added sequence.common_alphabet() to determine the Alphabet from a list of alphabets that extends all other alphabets from this list (#446)
  • sequence.phylo.Tree.to_newick() and sequence.phylo.TreeNode.to_newick() allow rounding of distance labels (#439)
  • application.TantanApp is able to process multiple sequences in a single call (#446)
    • This significantly improves the performance especially for short sequences
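
To show why the root-mean-square pairwise deviation listed above needs no superimposition, here is a pure-Python sketch that compares the internal distance matrices of two models. The function rmspd_sketch is hypothetical, and the normalization by the squared number of atoms is an assumption of this sketch; consult the documentation of structure.rmspd() for the exact definition.

```python
import math
from itertools import combinations


def rmspd_sketch(reference, model):
    """
    Root-mean-square deviation of pairwise atom distances.

    Both inputs are lists of (x, y, z) tuples with the same length.
    """
    n = len(reference)
    if len(model) != n:
        raise ValueError("Both models must contain the same number of atoms")
    squared_sum = 0.0
    for i, j in combinations(range(n), 2):
        diff = math.dist(model[i], model[j]) - math.dist(reference[i], reference[j])
        squared_sum += diff * diff
    # Each unordered pair appears twice in the full n x n distance matrix
    return math.sqrt(2 * squared_sum / n**2)


reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
# The same triangle, rotated by 90 degrees about z and translated
rotated = [(5.0, 5.0, 5.0), (5.0, 6.0, 5.0), (4.0, 5.0, 5.0)]
unchanged = rmspd_sketch(reference, rotated)
stretched = rmspd_sketch([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
                         [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)])
```

Since only distances within each model enter the calculation, rotating or translating one model leaves the result unchanged.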

Changes

  • structure.filter_backbone() is deprecated and replaced by filter_peptide_backbone() (#436)
  • structure.check_bond_continuity() is deprecated and replaced by check_backbone_continuity() (#436)
  • Deprecated chain_id parameter in structure.annotate_sse(), multiple chains can now be processed at once (#448)

Fixes

  • structure.CellList accepts empty query coordinates in get_atoms() and get_atoms_in_cells() (#448)
  • Fixed padding of CRYST1 records to 80 instead of 70 characters (#453)
  • Fixed issue, where application.dssp.DSSPApp did not give correct number of secondary structure elements for multi-chain structures (#444)
  • Resolved MemoryError in structure.repeat_box() (#450)

v0.35.0

1 year ago

Changelog

Additions

  • Support stack-wise iteration over trajectory files (#420)
  • Support Path objects in File.read()
  • Improved filters for different types of residues in structure subpackage (#425)
    • filter_amino_acids() now also filters for non-canonical amino acids
    • filter_nucleotides() uses an updated list of nucleotides
    • New filter_carbohydrates() filters for saccharides
    • filter_canonical_amino_acids() and filter_canonical_nucleotides() filter the respective canonical residues
    • New structure.info.carbohydrate_names() and structure.info.amino_acid_names() give a list of residue names considered as carbohydrates and amino acids, respectively
  • application.LocalApp now supports input to STDIN
  • Improved ViennaRNA interfaces (#435)
    • Added application.viennarna.RNAalifoldApp interface to RNAalifold
    • Secondary structure constraints can be given to application.viennarna.RNAfoldApp and application.viennarna.RNAalifoldApp

Changes

  • The residues that are recognized by structure.filter_amino_acids() have changed (see above)
  • Deprecated application.viennarna.RNAfoldApp.get_mfe() and replaced it by application.viennarna.RNAfoldApp.get_free_energy()

Fixes

  • Support PDB format dialect with inverted charge column (X+ instead of +X) in structure.io.PDBFile (#421)
  • Fixed erroneous atom parsing in structure.io.mmtf.MMTFFile, if an MMTF file has multiple different groupType entries for the same residue name and the same number of atoms (#426)
  • Fixed angle condition in structure.base_stacking() (#432)
  • Fixed TypeError in application.muscle.Muscle5App
  • Fixed bond_line_style parameter in structure.graphics.plot_secondary_structure()
  • Fixed error in pseudoknots() and base_pairs_from_dot_bracket() in cases where the secondary structure had no base pairs
  • Update identification of error messages from server in database.entrez.fetch()

v0.34.1

1 year ago

Fixes

  • Support for new UniProt REST API (#409)
  • Preserve lower-case chain IDs when an AtomArray is read from PDB and PDBQT files (#413)
  • application.vina.VinaApp now supports docking of molecules containing certain metal elements

v0.34.0

1 year ago

Changelog

Additions

  • Support for new RCSB search API (#408)
    • Added case_sensitive parameter in database.rcsb.FieldQuery
  • structure.info.mass() supports deuterium
  • structure.connect_via_distances() can connect atoms over periodic boundaries
  • Added more chain-level utilities consistent with residue-level utilities
    • structure.apply_chain_wise()
    • structure.spread_chain_wise()
    • structure.get_chain_masks()
    • structure.get_chain_starts_for()
    • structure.get_chain_positions()
  • structure.superimpose() also supports pure coordinates
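
The chain-level utilities listed above follow the same pattern as their residue-level counterparts: find where each chain starts, then operate on the slices in between. The sketch below shows that pattern with plain lists. Both helper functions are hypothetical, and detecting a new chain solely by a change of the chain ID is an assumption of this sketch.

```python
def chain_starts(chain_ids):
    """Return the index of the first atom of every chain."""
    return [
        i for i, chain_id in enumerate(chain_ids)
        if i == 0 or chain_id != chain_ids[i - 1]
    ]


def apply_chain_wise(chain_ids, values, function):
    """Apply 'function' to the values of each chain, one result per chain."""
    starts = chain_starts(chain_ids)
    # Each chain ends where the next one starts; the last ends at the array end
    stops = starts[1:] + [len(chain_ids)]
    return [function(values[start:stop]) for start, stop in zip(starts, stops)]


chain_ids = ["A", "A", "A", "B", "B", "C"]
masses = [1.0, 2.0, 3.0, 10.0, 20.0, 5.0]
starts = chain_starts(chain_ids)
mass_per_chain = apply_chain_wise(chain_ids, masses, sum)
```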

Changes

  • structure.hbond() uses an associated structure.BondList to find hydrogen atoms bonded to potential hydrogen bond donors
  • Lines depicting bonds in structure.graphics.plot_atoms() and structure.graphics.plot_ball_and_stick_model() use rounded tips

Fixes

  • Fixed structure.io.pdbx.get_assembly() missing chains in some structures (#387)
  • Added a more meaningful error, if Matplotlib is required, but not installed (#302)
  • Added more descriptive error, if a structure.io.pdb.PDBFile has erroneous atom IDs (#379)
  • structure.io.pdb.PDBFile always pads lines to 80 characters
  • Allow empty attribute string in sequence.io.GFFFile
  • Fixed wrong similarity scores, if a sequence.align.SubstitutionMatrix with two different alphabets is read from string or file
  • Fixed application.mafft.MafftApp runs with more than 10 sequences

v0.33.0

2 years ago

Changelog

Additions

  • Added application.muscle.Muscle5App to support the changed CLI of Muscle 5
  • Added structure.orient_principal_components() to orient atom coordinates to the given axes
  • biotite.structure.io.pdbx.get_structure() uses label_xxx or auth_xxx field as fallback, if the respective other one is not available
  • Added default_bond_type parameter to biotite.structure.io.write_structure_to_ctab() and biotite.structure.connect_via_distances() to allow the user to change the BondType in the generated BondList

Fixes

  • sequence.io.gff.GFFFile.read() is now able to read GFF records with trailing tabs
  • Fixed DeprecationWarning in structure.align_vectors() (#295)
  • Fixed alignment in atom name column in structure.io.pdb.PDBFile.write()
  • Fixed error handling in structure.index_xxx() functions, if invalid input shape is given
  • Ensured quoted values in looped categories will not be truncated in structure.io.pdbx.PDBxFile.set_category()