Anvi'o Versions

An analysis and visualization platform for 'omics data

v8

8 months ago

We are happy to announce anvi'o v8 with the code name, "marie"!

After about 4,200 changes that introduced over 36,000 new lines of code, this stable release of anvi'o represents a significant advancement over v7. It introduces many new features for integrated studies of microbial metabolism, genomic inversions, and the phylogeography of proteins, as well as performance improvements and fixes for known bugs.

This page intends to give you a summary of some of the notable changes that come with marie.

The code name recognizes Marie Tharp, an American geologist and oceanographic cartographer who made immense contributions to the earth sciences. Marie was a pioneer in our understanding of the oceans: together with her colleague Bruce Heezen, she created the first map of the Atlantic seafloor [1]. Her work showed that the bottom of our oceans was not just flat sediment, but was also covered with canyons, ridges, and mountain ranges that spanned over 65,000 kilometers around the globe. Marie's revolutionary work emerged from her interpretation of data she was not allowed to collect herself, since women were not allowed on ships during the 1950s; she compiled her physiographic diagrams from the data Bruce Heezen was able to collect [2]. She did not step on a ship until 1968, and the early evidence she had for seafloor features was initially dismissed as 'girl talk' [3].

[1] https://en.wikipedia.org/wiki/Marie_Tharp
[2] https://www.lyellcollection.org/doi/abs/10.1144/GSL.SP.2002.192.01.11
[3] https://www.youtube.com/watch?v=gsQGOJtwdv0

The code name was a suggestion by Zena Cardman, a marine microbiologist and NASA astronaut. The release notes were written by Meren, Iva Veseli, and Matt Schechter, who are among the developers of anvi'o. The notes were proofread by Katy Lambert-Slosarska, an MSc student at the International Max Planck Research School of Marine Microbiology (MarMic).

New anvi'o programs, artifacts, and workflows

The new version of anvi'o comes with a few new programs:

And a few new artifacts:

In addition, this release makes available three new Snakemake workflows that are accessible via the anvi'o program anvi-run-workflow: trnaseq, ecophylo, and sra_download.
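For readers new to anvi'o workflows, here is a minimal Python sketch of the two calls we typically make for any of these workflows: one that writes a default config file and one that runs the workflow with it. It only assembles the command lines and does not run anything; the config file names are arbitrary placeholders, and it is worth confirming the flags with anvi-run-workflow --help on your own installation.

```python
import shlex

def workflow_commands(workflow, config="config.json"):
    """Command lines to (1) write a default config and (2) run the workflow with it."""
    return [
        # writes a default config file that you then edit for your project
        ["anvi-run-workflow", "-w", workflow, "--get-default-config", config],
        # runs the workflow using the edited config file
        ["anvi-run-workflow", "-w", workflow, "-c", config],
    ]

for name in ("trnaseq", "ecophylo", "sra_download"):
    for command in workflow_commands(name, f"{name}-config.json"):
        print(shlex.join(command))
```

Editing the config file between the two calls is where you would point a workflow at your own input files.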

A new subsystem for metabolic modeling

One of the biggest pieces of news in this release is the set of programs anvi'o now includes for metabolic modeling. These programs are emerging as a by-product of collaborative projects in C-CoMP, the Center for Chemical Currencies of a Microbial Planet, under the leadership of Samuel Miller.

Using the integrated anvi'o metabolic modeling subsystem, one can run the new program anvi-reaction-network to generate a biochemical reaction network suitable for metabolic modeling from the annotations in a genome or a pangenome. This works on both individual genomes (using a contigs-db) and pangenomes (using a genomes-storage-db). The resulting network is stored in the corresponding anvi'o database for programmatic access, and can be exported into a JSON file for inspection and downstream usage (e.g., as input into a program for flux-balance analysis) via another new program, anvi-get-metabolic-model-file.

These programs rely on KEGG Orthology (KO) annotations of protein-coding genes and reference data in the ModelSEED Biochemistry database, which can be downloaded and set up on your computer using the programs anvi-setup-kegg-data and anvi-setup-modelseed-database, respectively.
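The steps described above could be chained roughly as follows for a single genome. This sketch only builds the command lines; the file names are placeholders, the use of anvi-run-kegg-kofams to obtain KO annotations and the -o flag of anvi-get-metabolic-model-file are our assumptions about typical usage, and pangenome inputs would need different parameters.

```python
import shlex

def modeling_commands(contigs_db, model_json):
    """Command lines for building a reaction network from one genome (placeholders)."""
    return [
        ["anvi-setup-kegg-data"],                     # KEGG data, including KOfam profiles
        ["anvi-setup-modelseed-database"],            # ModelSEED Biochemistry reference data
        ["anvi-run-kegg-kofams", "-c", contigs_db],   # KO annotations of protein-coding genes
        ["anvi-reaction-network", "-c", contigs_db],  # network stored in the contigs-db
        # assumption: -o sets the path of the exported JSON file
        ["anvi-get-metabolic-model-file", "-c", contigs_db, "-o", model_json],
    ]

for command in modeling_commands("CONTIGS.db", "metabolic-model.json"):
    print(shlex.join(command))
```

The two setup programs only need to run once per installation, not once per genome.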

For additional information, please see PRs #2058, #2072, and #2123.

Substantial improvements to metabolic pathway prediction in anvi'o

Anvi'o metabolism offers a full suite of integrated tools to study metabolism in microbial genomes and metagenomes, and multiple recent papers from our group (i.e., by Watson et al. and Veseli et al.) motivated a series of improvements, thanks to work by Iva Veseli. We hope the improvements summarized below will also help anvi'o users at large.

Improved data download and processing

Since multiple aspects of anvi'o now rely on data from KEGG, we decided to revamp how we download it. The old program anvi-setup-kegg-kofams has been replaced by a new program, anvi-setup-kegg-data. This program has multiple modes for downloading KOfam profiles, KEGG MODULE data, KEGG BRITE hierarchies (PR #1910), and modeling data for anvi-reaction-network. It can be multi-threaded for faster downloads.

However, for most users we recommend the default usage of this program, which downloads a pre-processed snapshot of everything you need for downstream programs working on this data. Please see anvi-setup-kegg-data.

Improvements in pathway prediction

The metabolism framework in anvi'o has undergone a lot of changes in the past year, with the addition of several notable features mainly concerning the use of the program anvi-estimate-metabolism:

"Stepwise" metrics: a new strategy for interpreting metabolic pathway definitions

As of PR #1927, we've added a new way of interpreting metabolic modules which affects how metrics like completeness and copy number are calculated. This strategy is called 'stepwise' interpretation because it considers only the major, non-redundant steps in a metabolic module. In this method, alternative enzymes, or in some cases alternative series of enzymes, are evaluated as one entity. Stepwise metrics may be appropriate for those interested in summarizing generic metabolic capacity with less focus on the specific enzymes that are required.

The former method of interpreting pathways is now referred to as the 'pathwise' strategy because it involves deconstructing the module definition into all possible unique combinations of enzymes required to catalyze the reactions in the metabolic pathway (so it considers all possible 'paths' through the module). Metrics are still calculated using this strategy and are labeled with the term 'pathwise' to distinguish them from the stepwise metrics.
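To make the difference between the two strategies concrete, here is a toy Python sketch. The module structure and enzyme names (K1 to K5) are made up, and the real anvi'o implementation parses full KEGG module definitions and treats protein complexes more carefully; the sketch only captures the core idea that pathwise completeness is the best score across all possible paths, while stepwise completeness is the fraction of steps that can be fulfilled in any way.

```python
from itertools import product

# Toy module (made up): a list of steps; each step lists alternative ways to carry
# it out; each alternative is a list of enzymes that must ALL be present.
MODULE = [
    [["K1"]],                # step 1: a single enzyme
    [["K2"], ["K3", "K4"]],  # step 2: K2 alone, OR the pair K3 + K4
    [["K5"]],                # step 3: a single enzyme
]

def pathwise_completeness(module, present):
    """Best fraction of enzymes present across every unique path through the module."""
    best = 0.0
    for path in product(*module):  # one alternative chosen per step
        enzymes = [e for alternative in path for e in alternative]
        fraction = sum(e in present for e in enzymes) / len(enzymes)
        best = max(best, fraction)
    return best

def stepwise_completeness(module, present):
    """Fraction of steps satisfied by at least one fully present alternative."""
    satisfied = sum(
        any(all(e in present for e in alternative) for alternative in step)
        for step in module
    )
    return satisfied / len(module)

present = {"K1", "K3", "K5"}
print(round(pathwise_completeness(MODULE, present), 2))  # 0.75
print(round(stepwise_completeness(MODULE, present), 2))  # 0.67
```

With K1, K3, and K5 present, the best path is 75% complete, yet only two of the three steps can actually be carried out, so the stepwise view reports about 67%.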

You can find a description of these two strategies, along with examples, here.

Calculation of pathway copy number

This release also introduces a redundancy metric for metagenome-wide analyses: pathway copy number. This metric can be added to your output files using the --add-copy-number flag, and will be calculated using both the pathwise and the stepwise strategies. It may be most appropriate when your input data represents a multitude of organisms (as when you input a metagenome without using the --metagenome-mode flag).

In our documentation, you can find an explanation of the pathwise copy number calculation and the stepwise copy number calculation. This feature was added in PR #1927.
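A similarly simplified sketch of stepwise copy number follows. It assumes that independent alternatives within a step add up, that a multi-enzyme alternative is limited by its least abundant enzyme, and that the module copy number is limited by its least abundant step; the enzyme counts are made up, and the anvi'o documentation mentioned above remains the authoritative description of both calculations.

```python
# Same toy module structure as a made-up example: steps -> alternatives -> enzymes.
MODULE = [
    [["K1"]],
    [["K2"], ["K3", "K4"]],
    [["K5"]],
]

def stepwise_copy_number(module, counts):
    """Toy stepwise copy number from per-enzyme hit counts."""
    step_copy_numbers = []
    for step in module:
        # a multi-enzyme alternative is limited by its scarcest enzyme;
        # alternatives for the same step are assumed to add up
        copies = sum(min(counts.get(e, 0) for e in alternative) for alternative in step)
        step_copy_numbers.append(copies)
    # the pathway as a whole is limited by its least-covered step
    return min(step_copy_numbers)

counts = {"K1": 3, "K2": 1, "K3": 2, "K4": 1, "K5": 2}
print(stepwise_copy_number(MODULE, counts))  # 2
```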

User-defined metabolic modules

anvi-estimate-metabolism now has the ability to work with user-defined metabolic pathways based on arbitrary functional annotation sources as of PR #1867.

Users wishing to define their own metabolic modules can use the new anvi'o artifact, user-modules-data. The files can either be written manually or generated via the script anvi-script-gen-user-module-file (see PR #1872). The program anvi-setup-user-modules can then convert these module files into a database that can be used with anvi-estimate-metabolism via the --user-modules flag, as described here.

To support the use of arbitrary HMMs as an annotation source for user-defined metabolic modules, the program anvi-run-hmms now has a flag called --add-to-functions-table, which causes any HMM hits to be stored as functional annotations. See here for details.
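Put together, a run with user-defined modules that rely on custom HMMs could look like the sequence below. This sketch only assembles command lines; the paths are placeholders, and the -H flag for the HMM directory and the --user-modules parameter of anvi-setup-user-modules reflect our understanding of these programs rather than a verified recipe, so please check each program's --help first.

```python
import shlex

def user_module_commands(contigs_db, hmm_dir, user_modules_dir):
    """Command lines for estimating user-defined modules built on custom HMM hits."""
    return [
        # store hits from a custom HMM collection as functional annotations
        ["anvi-run-hmms", "-c", contigs_db, "-H", hmm_dir, "--add-to-functions-table"],
        # turn the module files (user-modules-data) into a modules database
        # (assumption: the directory is passed via --user-modules)
        ["anvi-setup-user-modules", "--user-modules", user_modules_dir],
        # estimate metabolism, including the user-defined modules
        ["anvi-estimate-metabolism", "-c", contigs_db, "--user-modules", user_modules_dir],
    ]

for command in user_module_commands("CONTIGS.db", "MY_HMMS/", "MY_MODULES/"):
    print(shlex.join(command))
```

The order matters: the HMM hits have to be in the functions table before the modules that reference them can be evaluated.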

Miscellaneous updates

Beyond the major features described above, there are a few miscellaneous changes to the metabolism codebase.

You no longer have to provide contigs databases as input to anvi-estimate-metabolism. Thanks to help from Antonio Fernandez-Guerra, this program can now accept a simple list of enzymes as input. See PR #1890 as well as this help section.
The output options and formats for anvi-estimate-metabolism have changed. See this page for details. One new output feature that may be particularly helpful for interpreting these data is the addition of columns related to enzymes that are unique to a given metabolic module. These are described in PR #1867.
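As an illustration of the enzyme-list input, here is a small Python sketch that writes a tab-delimited table of gene calls and their enzyme annotations. The column names (gene_id, enzyme_accession, source) are our assumption about the expected format, and the gene IDs and accessions are arbitrary; please check the help section mentioned above before relying on them.

```python
import csv
import io

def enzymes_txt(rows):
    """Render (gene_id, enzyme_accession, source) rows as a tab-delimited table.

    The column names are an assumption about the expected input format.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["gene_id", "enzyme_accession", "source"])
    writer.writerows(rows)
    return buffer.getvalue()

table = enzymes_txt([
    (1, "K00001", "KOfam"),  # arbitrary example rows
    (2, "K00845", "KOfam"),
])
print(table)
```

Once written to disk, such a file could then be passed to anvi-estimate-metabolism (we believe via its --enzymes-txt parameter) instead of a contigs database.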

A new anvi'o workflow to study phylogeography of any gene family

Exploring the ecology and evolution of microbes across environments with metagenomic data is a common task for microbiologists. What if we applied this framework to gene families? The availability of large metagenomic datasets and fast computational biology toolsets provides us with a unique opportunity to explore the limits of gene diversity! To leverage this, Matthew Schechter led the development of the ecophylo workflow, which can simultaneously profile the ecological and phylogenetic relationships between gene families and environments.

The final output of the ecophylo workflow is an interactive interface that includes (1) a phylogenetic analysis of all genes detected by the HMM in genomes and/or metagenomes, and (2) the distribution pattern of each of these genes across metagenomes if the user provided metagenomic short reads to survey.

For more details please see the ecophylo documentation.

A new anvi'o framework to identify genomic inversions and quantify their activity

Genetic variants can rapidly proliferate even in populations that grow from a single cell. One class of such variants emerges from 'inversions', a genetic phenomenon through which a microorganism can switch the ON/OFF orientation of a promoter region regulating the expression of a downstream gene. By using paired-end short reads and quantifying their orientation upon mapping to a genomic context, one can identify inversions and quantify their activity.
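The read-orientation signal mentioned above can be illustrated with a toy Python sketch. In a standard paired-end library, the two mates of a pair usually map in opposite orientations; pairs that span the boundary of a segment that is inverted in some cells relative to the reference tend to map with both mates in the same orientation. The sketch bins made-up (position, mate-1-reversed, mate-2-reversed) tuples into windows and reports the fraction of same-orientation pairs in each. With real BAM files such flags could come from pysam's is_reverse and mate_is_reverse attributes; this only conveys the intuition and is not how anvi-report-inversions is implemented.

```python
from collections import defaultdict

def same_orientation_fraction(pairs, window=100):
    """pairs: (position, mate1_is_reverse, mate2_is_reverse) for each mapped read pair."""
    totals = defaultdict(int)
    same = defaultdict(int)
    for position, mate1_reverse, mate2_reverse in pairs:
        start = (position // window) * window
        totals[start] += 1
        if mate1_reverse == mate2_reverse:
            # both mates forward or both reverse: unexpected for a standard library
            same[start] += 1
    return {start: same[start] / totals[start] for start in sorted(totals)}

pairs = [
    (10, False, True), (20, True, False),                        # made-up normal pairs
    (150, False, False), (160, True, True), (170, False, True),  # mostly same-orientation
]
print(same_orientation_fraction(pairs))
```

Windows with an unusually high fraction of same-orientation pairs would be candidate regions of interest for a closer look.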

Thanks to Florian Trigodet's efforts, this version of anvi'o comes with a new program, anvi-report-inversions to study inversions in genomes and metagenomes across environments and to quantify the relative proportion of each inversion orientation in each sample.

The anvi-report-inversions workflow will (1) find genomic regions of interest (based on short-read recruitment data), (2) find palindromic motifs in regions of interest (where the pair of inverted repeats (IR) that surround the inversion site is found), (3) confirm the inversion (by going back to the BAM file and making sure the IR is the true one among multiple potential IRs that may occur in the region of interest), (4) compute the inversion activity (using the raw R1/R2 sequences from FASTQ files to find support for activity), and (5) generate extensive reports (including the genomic context and the genes that surround the inversion site). These reports include a lot of information in text file outputs (see inversion-txt for details), as well as a static HTML output that does not require an anvi'o installation to browse.
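Steps (2) and (4) can be sketched in the same toy fashion: finding pairs of k-mers that are reverse complements of each other (candidate inverted repeats), and estimating the proportion of each orientation by counting reads that carry a marker sequence from the invertible segment either as-is or reverse-complemented. The sequences and the marker below are made up, the search is brute force, and anvi-report-inversions uses its own, more careful logic.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    # assumes an uppercase A/C/G/T sequence
    return seq.translate(COMPLEMENT)[::-1]

def find_inverted_repeats(seq, k, min_spacer=1):
    """Brute-force search for (i, j) where seq[j:j+k] is the reverse complement of seq[i:i+k]."""
    hits = []
    for i in range(len(seq) - k + 1):
        target = reverse_complement(seq[i:i + k])
        for j in range(i + k + min_spacer, len(seq) - k + 1):
            if seq[j:j + k] == target:
                hits.append((i, j))
    return hits

def orientation_proportions(reads, marker):
    """Fractions of marker-carrying reads showing the marker as-is vs. reverse-complemented."""
    forward = sum(marker in read for read in reads)
    reverse = sum(reverse_complement(marker) in read for read in reads)
    total = forward + reverse
    if total == 0:
        return 0.0, 0.0
    return forward / total, reverse / total

region = "AAGGTC" + "TTTTTTTT" + "GACCTT"  # made-up inverted repeat around a spacer
print(find_inverted_repeats(region, k=6))  # [(0, 14)]
reads = ["AATGCCT", "TATGCCA", "CGGCATA"]  # made-up reads covering the invertible segment
print(orientation_proportions(reads, "ATGCC"))
```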

A new suite of programs to analyze Transfer RNA transcripts

Anvi'o now includes a comprehensive (yet very experimental) software framework to support the analysis of tRNA transcript sequencing (as demonstrated here). The 'tRex Tools', as Samuel Miller calls them, include new programs for the identification of tRNA sequences and their modification sites in tRNA-seq results. The primary output of tRex tools in anvi'o is a set of tRNA seeds, each of which represents a mature tRNA sequence (minus the 3’-CCA acceptor) from the input set of samples. These capabilities are implemented in a set of programs that can be run individually or as part of the tRNAseq workflow:

The anvi-trnaseq program predicts tRNA sequences, structures, and modifications from a single tRNA-seq library
The anvi-merge-trnaseq program combines the results across multiple tRNA-seq libraries and computes a final set of tRNA seeds (as well as their coverage across samples)
To analyze the taxonomy associated with tRNA sequences, there are two programs to be run in sequence: anvi-run-trna-taxonomy and anvi-estimate-trna-taxonomy
Finally, the program anvi-tabulate-trnaseq exports the tRNA-associated coverage and modification data as tab-delimited files

You can also generate nice plots of the tRNA seed coverages and modification sites with anvi-plot-trnaseq.

A new variant of the contigs database -- the trnaseq-contigs-db, which stores tRNA seeds instead of contigs -- and a new variant of the profile database -- the trnaseq-profile-db, which stores modification positions and both specific and non-specific coverage of tRNA seeds -- make the integration of these new data types possible.

This is quite an experimental workflow, and if you plan to use it, please get in touch with us.

Please follow the latest installation instructions at https://anvio.org/install/ and come to the anvi'o Discord channel if you have any questions or concerns, or simply to join our community.

v7.1

2 years ago

A minor release with many bug fixes and new anvi'o toys.

Please see our up-to-date installation instructions at https://merenlab.org/install-anvio.

v7

3 years ago

We are happy to announce anvi'o v7 with the code name, "hope"!

After more than 3,000 changes that introduced about 35,000 new lines of code, this stable release of anvi'o represents one of the largest leaps forward in the history of the platform, introducing many new features, performance improvements, and fixes for known bugs.

This page intends to give you a summary of some of the notable changes that come with hope.

The code name recognizes Hope E. Hopps as a tribute to all laboratory technicians whose contributions have often been poorly recognized in science. This is despite the fact that technicians not only ensure accuracy, efficiency, and reproducibility in any laboratory, but also push the boundaries of science as much as any other member of their groups, if not more in many cases. Hopps was a specialist in infectious diseases and in 1966 she developed, together with Harry M. Meyer and Paul J. Parkman, a highly effective vaccine for rubella, a viral infection that caused more than 30,000 stillbirths in the United States alone between 1962 and 1965. Despite her role in the vaccine development, in a historical photograph by the NIH that portrays the rubella vaccine development team, Hopps was only identified as "Female Lab Technician" until recently, even though the caption of the same photograph explicitly named Meyer and Parkman. The unfair treatment of laboratory technicians remains commonplace in today's science. In fact, "not more than a technician's job" can serve as an argument for professors who wish to deny recognition of someone's contributions to science. We can't ignore the significant progress we have made as a community during the past few years. But while we continue working on increasing the diversity, equity, and inclusion in science, we must also recognize and face the implicit and explicit biases against those in science who are not PIs, post-docs, or graduate students.

Disclosures: The code name was a suggestion by Alon Shaiber, a Genomics Data Scientist at Weill Cornell Medicine. The release notes were written by Meren and proofread by Iva Veseli. Alon, Meren, and Iva are among the developers of anvi'o.

New help pages for anvi'o programs and artifacts

As anvi'o developers, we always knew the critical importance of providing our users with extensive tutorials so they can find their way through their data themselves. However, as anvi'o matured, the number of anvi'o programs and artifacts increased dramatically. This created a bottleneck since every anvi'o tutorial assumed that our users knew about the common concepts in anvi'o (such as 'the profile database' or 'a collection') or common anvi'o programs (such as 'anvi-profile' or 'anvi-interactive'). Solving this fundamental problem required us to think of an entirely new technical approach to our documentation that is now in place.

We have now implemented a system (#1425) that makes two things possible:

(1) For anvi'o developers, a means to quickly describe their contributions without leaving the environment where they write code (for instance, here is the description of 'collection' in the codebase),
(2) For anvi'o users, a means to be able to see that information on a web page where all anvi'o programs and concepts are interconnected (for instance, here is the description of 'collection' on the web page).

This way, anvi'o could accumulate information from its developers without burdening them and present it to its users in a way that makes self-learning possible. However, there was one significant problem: retrospectively describing all the things that had already been implemented in the codebase. Enter Jessica Pan (@Jessica-Pan), an undergraduate student at MIT. Jessica took on the responsibility of describing existing anvi'o programs and artifacts a few months ago, and with the guidance of other anvi'o developers, she populated this technical framework with her words and descriptions (#1470), which added more than 200 files and tens of thousands of words of documentation to the codebase.

With this release, we are happy to also release the first outputs of this documentation project here, with the hope that it will make your life with anvi'o a little easier going forward:

https://merenlab.org/software/anvio/help/7/

Long-term anvi'o users will perhaps not be surprised to hear that this documentation system is also connected to our command-line programs, which can now offer more useful help menu output. For instance, if you type anvi-interactive --help in your terminal in v7, you will see a section at the end of the help menu with a link to the online description of the program, where you can browse through examples and artifacts associated with it.

If you visit the help pages you will see there are 'edit' links under every file. It is our way of inviting the rest of the community to contribute to these pages with their own experiences with anvi'o tools. If you have ideas to make it better, come to our Slack channel for a discussion, or file a GitHub issue. We are all ears.

Significant performance improvements

This version will likely be remembered for significant performance improvements by multiple heroes, including Evan Kiefl (@ekiefl), Iva Veseli (@ivagljiva), and Ryan Moore (@mooreryan). Here is a glimpse of what happened in v7 compared to v6:

Profiling BAM files is one of the most critical steps in anvi'o and the program anvi-profile has been a nightmare for memory and processing time. Thanks to @ekiefl's significant improvements that influenced the runtime and memory requirements of the profiling step (#1362, #1339), anvi-profile is now ~17 times faster.
One of the first steps in any anvi'o workflow is turning boring FASTA files into talented contigs databases that can be used by many anvi'o programs. Yet, the program anvi-gen-contigs-database was not multithreaded, which had been a significant performance bottleneck (#1344, #1431). Responding to our plea on Twitter, @mooreryan made remarkable contributions to the anvi'o codebase (while wrapping up his PhD, that is) (#1437, #1468, and #1445). As a result, anvi-gen-contigs-database is now multithreaded, at least two times faster than before, and can take advantage of your fancy clusters to be even more efficient.
After making our users who care about the quality of their MAGs go through so much pain and suffering for years, anvi-refine in this release is ~13 times faster after significant improvements in its memory requirements (#1455, #1458), thanks to help from Xabier Vázquez-Campos (@xvazquezc).
Our dealings with HMMs also benefited from major performance improvements. Thanks to @ivagljiva's efforts (#1413), the two frequently used programs, anvi-run-hmms and anvi-run-pfams that rely on HMMER, will perform HMM annotations ~3x faster.

An integrated subsystem for metabolic reconstruction

We are also thrilled to announce that starting with this release, anvi'o includes a new suite of programs for predicting metabolic capabilities from genomic and metagenomic data, thanks to @ivagljiva's extensive work that started with #1413. The new programs in this release rely on the extensive set of resources in the Kyoto Encyclopedia of Genes and Genomes (KEGG) for gene annotation and metabolism estimation, although in future releases we will expand the resources available for metabolic reconstruction.

As a part of this subsystem, this release introduces a new database, the anvi'o MODULES.db, which is generated from parsed KEGG data files (such as KOfam HMM profiles and text-based descriptions of KEGG MODULE) and used by subsequent programs detailed below for easy, organized access to metabolic data. A version tracking system ensures metabolism estimation is run using the same MODULES.db that was used to annotate a given CONTIGS.db. Here are the key programs for the anvi'o subsystem for metabolism:

The program anvi-setup-kegg-kofams generates a MODULES.db and stores it on your server or the local computer.
The program anvi-run-kegg-kofams annotates genes in a given anvi'o contigs database with KEGG Orthology (KO) numbers (via hits to the KEGG KOfam database). This program is included in the anvi'o snakemake workflows Alon Shaiber (@ShaiberAlon) had introduced, which enables bulk annotation of several contigs databases with a single command.
The program anvi-estimate-metabolism predicts metabolic potential in a given set of sequences by integrating KO annotations with KEGG MODULE information. These estimates are integrated into anvi'o in various ways and can be summarized in flat text files. In addition to contigs databases, and optionally profile databases and collections, the program accepts internal, external, or metagenomes files as input. The program is able to work with a variety of output options.
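The version tracking idea described above can be illustrated with a hypothetical Python sketch: record a short fingerprint of the MODULES.db used for annotation, and refuse to estimate metabolism if the current MODULES.db has a different fingerprint. The function names and the use of a SHA-256 hash of the file contents are illustrative assumptions, not anvi'o's actual code.

```python
import hashlib

def fingerprint(db_bytes):
    """Short, stable fingerprint of a database file's contents (hypothetical)."""
    return hashlib.sha256(db_bytes).hexdigest()[:12]

def check_modules_db(recorded_fingerprint, current_db_bytes):
    """Raise if the current MODULES.db differs from the one used for annotation."""
    current = fingerprint(current_db_bytes)
    if current != recorded_fingerprint:
        raise RuntimeError(
            f"contigs database was annotated with MODULES.db {recorded_fingerprint}, "
            f"but the current MODULES.db is {current}; please re-annotate"
        )
    return current

# hypothetical usage: the fingerprint is stored at annotation time ...
recorded = fingerprint(b"modules db, version 1")
# ... and checked again before estimating metabolism
check_modules_db(recorded, b"modules db, version 1")
```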

We look forward to the input from the community to offer improved and integrated metabolic insights into microbial genomes and metagenomes.

New tools for Transfer RNA biology

One of the most significant advances in this release is the set of new tools developed by Samuel Miller (@semiller10) for the study of Transfer RNA transcripts generated by tRNA-seq, a sequencing protocol developed by members of Tao Pan's group at the University of Chicago. This sequencing strategy aims to offer high-resolution insights into the translational regulation of cells by revealing changes in the abundance and chemical modifications of tRNA transcripts across environmental conditions. While the sequencing strategy makes tRNA transcripts from diverse environments accessible, the extremely complex data it generates requires completely new computational approaches, not only to characterize tRNA sequences and their structural properties but also to resolve chemical modification fractions and tRNA taxonomy. We hope that the more than 10,000 lines of code @semiller10 has written behind anvi-trnaseq and anvi-convert-trnaseq-database (primarily through #1509 and #1615), and their cousin programs anvi-scan-trnas and anvi-estimate-trna-taxonomy, will set the stage for new horizons that bring more RNA biology into the anvi'o ecosystem in an integrated fashion.

Ability to profile INDELs in mapping results

Anvi'o offers a powerful and comprehensive framework to enable in-depth investigations of microbial population genetics. Yet, these insights have so far been limited to single-nucleotide variants, single-codon variants, and single-amino acid variants as reported and/or used by anvi-gen-variability-profile, anvi-display-structure, and anvi-gen-fixation-index-matrix, or as displayed in various interactive interfaces.

This version comes with a completely redesigned anvi-profile, which is now able to characterize INDELs in read recruitment results thanks to @ekiefl's additions to the codebase especially in #1394. The ability to characterize INDELs pushes the boundaries of our ability to make sense of microbial population genetics through metagenomes, and we hope you will find many gems in your data, as well.
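How INDELs can be read from mapping results is easiest to see from the CIGAR strings that aligners write for each read. The Python sketch below walks a CIGAR string, reports insertions and deletions in reference coordinates, and counts how many reads support each one. It is a simplified illustration with made-up alignments, not the anvi-profile implementation.

```python
import re
from collections import Counter

# one CIGAR operation: a length followed by an operation code
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def indels_from_cigar(ref_start, cigar):
    """Return (reference_position, kind, length) for each INDEL in one alignment."""
    ref_pos = ref_start
    indels = []
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op == "I":
            # inserted read bases sit between reference positions; ref_pos does not move
            indels.append((ref_pos, "INS", length))
        elif op == "D":
            # deleted reference bases are skipped by the read, so ref_pos advances
            indels.append((ref_pos, "DEL", length))
            ref_pos += length
        elif op in "M=XN":
            ref_pos += length
        # S, H, and P do not consume reference positions
    return indels

def count_indels(alignments):
    """alignments: iterable of (ref_start, cigar); counts reads supporting each INDEL."""
    counts = Counter()
    for ref_start, cigar in alignments:
        counts.update(indels_from_cigar(ref_start, cigar))
    return counts

reads = [(100, "10M2I5M3D20M"), (102, "8M2I25M"), (95, "40M")]  # made-up alignments
print(count_indels(reads))
```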

Currently, anvi'o reports INDEL tables that are ready to go into R or any other statistical analysis or visualization environment; you can obtain them via the program anvi-export-table with --table indels.

To benefit from this feature in your existing projects, you will need to re-profile your BAM files ... which should be easy-peasy if you have been using anvi'o snakemake workflows ;)

Improvements in the interactive interface

The power of anvi'o in the command line is complemented by its interactive interfaces, which also benefit from numerous new features in this release. But perhaps the most critical improvements were those contributed by Isaac Fink (@isaacfink21), an undergraduate student at the University of Chicago, who revamped the inspection page (in #1466 and others).

Not only are we now able to visualize single-nucleotide variants better,

but the interactive interface can now also visualize INDELs in (meta)genomic/(meta)transcriptomic read recruitment results,

so you can spend EVEN MORE TIME looking at your coverages.

Isaac Fink's revamped inspection page also comes with a 'settings' panel that organizes all the features this page offers.

There are other exciting features that will likely make those who use anvi'o for pangenomics and phylogenomics very happy.

The anvi'o interactive interface could not show support values for phylogenomic trees, a long-standing item (#1450) on our list of feature requests. Matthew Lawrence Klein (@matthewlawrenceklein), who joined our team only a few weeks ago, addressed this shortcoming through #1618, thanks to the guidance we received from Tom Delmont. The result? Anvi'o can now visualize branch support values on trees when applicable.

While this is a preliminary implementation of this feature, we are looking forward to the feedback we will receive from the community to improve it.

Another critical shortcoming was the difficulty of selecting multiple items in the interactive interface when there was no tree or dendrogram to guide the selection. This was especially an issue when working with pangenomes, where ordering gene clusters by their synteny is a common need, yet selecting regions of interest required clicking on each item individually. @matthewlawrenceklein also addressed this through #1614, and it is now possible to select a range of items in a straightforward fashion.

On top of that, after selecting regions of interest in a pangenome, it is now possible to take a quick peek at the functions that gene clusters encode through the interactive interface, without having to run anvi-summarize.

Other fancy tools and functionality

Ability to estimate per-residue binding frequencies for protein structures with anvi-run-interacdome by Evan Kiefl. This enables highly resolved analyses of environmental variants in the structural context of proteins and their ligand-binding sites. If this sounds interesting to you, read Evan's journey implementing this feature, see his gargantuan pull request at #1472, or read this paper by Shilpa Nadimpalli Kobren and Mona Singh to familiarize yourself with the goal here.
Ability to extract gene loci or operons from any genomic context with anvi-export-locus by Matthew Schechter (@mschecht) and others. By enabling high-throughput recovery of loci of interest across any number of genomes by marking genes with their functional annotations or HMM hits, this strategy makes it possible to ask very specific questions regarding the gene content, evolution, and ecology of genomic operons. You can read Matt's tutorial here, or see his pull request at #1386.
Generalization of the functional enrichment analysis in #1500 by Iva Veseli. This statistical approach was initially developed by Amy Willis (@adw96) and was implemented in the anvi'o program anvi-get-enriched-functions-per-pan-group. As the name suggests, this tool was specific to studying functional enrichment in pangenomes. Thanks to Iva's contribution, the new program anvi-compute-functional-enrichment is now able to work with pangenomes, metabolic pathways, and internal or external genomes to study functional enrichment statistics between distinct groups of entities.
Integration of the 2020 release of the NCBI's Clusters of Orthologous Groups database to increase the utility of anvi-run-ncbi-cogs for functional annotations (#1570).
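The locus idea behind anvi-export-locus can be sketched in a few lines of Python: find genes whose annotation matches a search term and report their neighbors in genomic order. The gene list, the annotations, and the upstream/downstream parameters are hypothetical, and the real program handles contig boundaries, HMM hits, and output formats that this sketch ignores.

```python
def extract_loci(genes, search_term, upstream=2, downstream=2):
    """genes: hypothetical (gene_id, annotation) tuples in genomic order on one contig."""
    term = search_term.lower()
    loci = []
    for index, (_, annotation) in enumerate(genes):
        if term in annotation.lower():
            start = max(0, index - upstream)               # do not run off the contig start
            end = min(len(genes), index + downstream + 1)  # or off its end
            loci.append([gene_id for gene_id, _ in genes[start:end]])
    return loci

genes = [
    (1, "transposase"),
    (2, "hypothetical protein"),
    (3, "nitrogenase iron protein NifH"),
    (4, "nitrogenase molybdenum-iron protein alpha chain NifD"),
    (5, "nitrogenase molybdenum-iron protein beta chain NifK"),
    (6, "tRNA ligase"),
]
print(extract_loci(genes, "NifH", upstream=1, downstream=2))  # [[2, 3, 4, 5]]
```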

And many more.

A list of new anvi'o programs

This release comes with the following programs that were not in the previous stable release: anvi-convert-trnaseq-database, anvi-display-metabolism, anvi-estimate-metabolism, anvi-estimate-trna-taxonomy, anvi-run-interacdome, anvi-run-kegg-kofams, anvi-run-trna-taxonomy, anvi-script-augustus-output-to-external-gene-calls, anvi-script-fix-homopolymer-indels, anvi-script-gen-pseudo-paired-reads-from-fastq, anvi-script-get-primer-matches, anvi-script-pfam-accessions-to-hmms-directory, anvi-script-tabulate, anvi-setup-interacdome, anvi-setup-kegg-kofams, anvi-setup-scg-taxonomy, anvi-setup-trna-taxonomy, anvi-trnaseq.

Anvi'o as a community platform: ✅

We have always imagined anvi'o as a community platform, and we are getting there. Even this very release is a product of voluntary contributions of many members of the anvi'o community, who slowly shape this open-source software ecosystem for integrated multi-omics.

A few weeks ago we published a comment that was authored by those who are mentioned in these release notes and many more who have been supporting anvi'o in many ways to make it more accessible to the community of microbiologists.

Our paper ends with this statement:

(...) As an open-source platform that empowers microbiologists by offering them integrated yet uncharted means to steer through complex ‘omics data, anvi’o welcomes its new users and contributors.

We thank you for your interest in anvi'o and for your patience with it, in advance. We hope that anvi'o will continue to empower you in 2021 so you can find the answers you are looking for in the avalanche of data that surrounds you.

See our up-to-date installation instructions here, which include docker and conda solutions and ways to reach out to the anvi'o community for help if you run into a problem.

v6.2

4 years ago

v6.1

4 years ago

v6

4 years ago

We are happy to announce a new version of anvi'o, "esther" (easy install through conda, quick try via docker).

After nearly 9,000 changes that introduced about 16,000 new lines of code, the current version of anvi'o represents many fixes to big and small bugs, as well as new features. This page intends to give you a summary of most notable changes that come with esther.

The codename is a small tribute to Esther Lederberg (1922-2006), an American microbiologist who studied plasmids and bacterial viruses. Lederberg discovered lambda phage, an E. coli virus that is commonly used in bacterial genetics and molecular biology to deliver DNA into a recipient organism. This led to her description of specialized transduction, which occurs when a prophage improperly excises from the host chromosome, carrying host DNA in addition to the viral DNA. In collaboration with her husband, Lederberg developed the technique known as replica plating, which allows repeatable inoculation of bacterial colonies. Lederberg and Luigi L. Cavalli-Sforza discovered the Fertility factor or F-plasmid in E. coli. This is a sequence of DNA that lets the host cell transfer genetic material via a rod-like structure into recipient cells (conjugation). Despite her many incredible scientific accomplishments, she was constantly overshadowed by her husband. She was not appointed to a tenured position while they were both faculty at Stanford, and after their divorce she had a difficult time retaining her appointment. We dedicate anvi'o version 6 to the memory and revolutionary discoveries of Dr. Lederberg.

Real-time estimation of genome taxonomy

Working with genomes often requires insights into their taxonomy. This becomes a critical need especially in genome-resolved metagenomics studies, as we are burning to find out where the genomes we reconstruct from metagenomes fit in the tree of life. Until esther, anvi'o did not offer anything to address this need. This new version, however, comes with a novel solution that covers both the interactive interface during binning:

and the terminal environment to survey existing collections of genomes:

These two examples are from the infant gut dataset by Sharon et al. (2013), which we often use to demonstrate anvi'o features, but we can't wait to hear from you to learn about your experience with this feature.

Please read this article for usage details, our thanks to The Genome Taxonomy Database for making their raw data public, and potential caveats of our approach:

http://merenlab.org/scg-taxonomy

None of this would have been possible without the coding help from Quentin Clayssen and Özcan Esen, and critical suggestions from Alon Shaiber.

A new tool for genome de-replication

De-replication is critical to minimize bias in metagenomic read recruitment analyses. In our previous studies we performed de-replication with a series of Python scripts, but no more. Thanks to Mahmoud Yousef and Evan Kiefl's efforts, we now have two new programs, anvi-compute-genome-similarity and anvi-dereplicate-genomes, which are integrated with the metagenomic and pangenomic workflows in anvi'o and use sourmash and PyANI in the backend.

A tutorial for their usage is on the way!
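To give a rough idea of what de-replication does, here is a minimal Python sketch of greedy clustering over a precomputed pairwise similarity table (for instance, ANI values expressed as fractions). The function names, the 0.95 threshold, and the greedy strategy are assumptions for illustration only. This is not how anvi-dereplicate-genomes is implemented.

```python
from typing import Dict, List, Tuple

# Hypothetical input: pairwise similarities (e.g., ANI as a fraction) keyed by
# genome-name pairs; a pair may be stored in either order.
Similarity = Dict[Tuple[str, str], float]


def lookup(similarity: Similarity, a: str, b: str) -> float:
    """Return the similarity of a and b regardless of key order (0.0 if absent)."""
    return similarity.get((a, b), similarity.get((b, a), 0.0))


def dereplicate(genomes: List[str], similarity: Similarity,
                threshold: float = 0.95) -> Dict[str, List[str]]:
    """Greedy clustering: each genome joins the first representative it matches
    at or above `threshold`; otherwise it becomes a new representative."""
    clusters: Dict[str, List[str]] = {}
    for genome in genomes:
        for representative in clusters:
            if lookup(similarity, genome, representative) >= threshold:
                clusters[representative].append(genome)
                break
        else:  # no existing representative was similar enough
            clusters[genome] = [genome]
    return clusters


if __name__ == "__main__":
    toy = {("A", "B"): 0.99, ("A", "C"): 0.80, ("B", "C"): 0.81}
    print(dereplicate(["A", "B", "C"], toy))  # {'A': ['A', 'B'], 'C': ['C']}
```

Because the clustering is greedy, the first genome seen in each cluster becomes its representative, so input order matters. A dedicated tool would usually choose representatives more deliberately.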

Support for more binning algorithms

In previous versions of anvi'o we had a native module for CONCOCT, one of the popular binning algorithms for automatic clustering of contigs into genome bins. We have changed that behavior in this version. You will still be able to use the program anvi-import-collection to import binning results from ANY binning software as before, but anvi'o will also be able to automatically use binning tools existing on your system through our new program anvi-cluster-contigs. Here is a command line output to give you a sense of it:

This framework is highly modular, so the integration of new binning algorithms is extremely straightforward thanks to Özcan Esen's excellent design. If you are a programmer you can take a look at the module for MaxBin2 or BinSanity to develop one for your algorithm for benchmarking or testing efforts.

Effective ways to inspect and visualize contig coverages

Recognizing the importance of actually 'looking' at data, we have been putting a lot of emphasis on the inspection capabilities of anvi'o. When it comes to metagenomic read recruitment and coverages, inspecting contigs can be critical to gain deeper insights into what is actually going on.

In this version we have two new programs. The first one is anvi-inspect. The inspect page of anvi’o is very useful for careful examination of contig coverages and single nucleotide variants. Sometimes this might even be all you want. This new program enables you to immediately pull up the inspection page of a given contig without going through the whole hassle of opening the interactive interface.

We often feel the need to put coverage patterns of contigs in presentations or publications. Yet this becomes challenging when a dataset has too many samples, because it makes it harder to study or save patterns comfortably using the interface. So we thought it would be very useful if anvi'o could export coverage statistics using ggplot, but we didn't know enough R to be able to do this properly. As a result, we did what anyone who wishes to work with talented people would do, and asked for help on Twitter:

Our call for help was heard by Ryan Moore, who actually developed a new anvi'o program that did exactly what we thought you would need, and much more: anvi-script-visualize-split-coverages (we sent him an anvi'o t-shirt as a token of our deep gratitude for his contribution, but we never got a photo back, so we don't know whether he is wearing it).

This program can export split coverages along with the single-nucleotide variants on them into PDF files, even for very large numbers of samples. It uses the output files anvi'o generates through anvi-get-split-coverages and, optionally, anvi-gen-variability-profile. The output is customizable with respect to plot color, axes, SNV color, and grouping of samples. The tutorial for this feature will soon be on our web page.

Improved genome completion/redundancy estimates

New single-copy core gene collections

Starting with this version, we no longer use the Campbell et al. and Rinke et al. single-copy core gene (SCG) HMM sets to estimate the completion of bacterial and archaeal genomes. Instead, we are using a modified version of the bacterial single-copy core gene collections Mike Lee recently described, and a set of BUSCO HMMs Tom Delmont curated. Now anvi'o can estimate the completion of bacterial, archaeal, and protist genomes (#1150).

New random forest domain of life classifier

In previous versions anvi'o relied on multiple heuristics to predict the domains of selected contigs or genomes, which determined the SCG collection used to estimate and display completion and redundancy. In this version we have a brand new random forest classifier to take care of this challenging task. This robust classifier, with an appropriate addition of noise, solves this issue like magic, and when you have a bunch of genomes, it gives you proper estimates in the interface (the example is also from the infant gut dataset),

or in the terminal,

Undo/Redo for the interactive interface

Yes. This feature is finally here. Now when you make a mistake while curating or refining your genomes using anvi-interactive or anvi-refine, you will be able to use the Ctrl + Z and Ctrl + Shift + Z key combinations to undo and redo your binning decisions. If you can't contain your emotions, consider taking Özcan Esen for a coffee for this excellent feature :)

A new tool to extract target loci from genomes and metagenomes

Some genetic analyses call for the comparison of specific genetic loci between genomes. For example, one may be interested in investigating evidence for adaptive evolution of the lac operon between different E. coli strains by extracting all loci from different genomes. Anvi'o esther comes with a very talented tool, anvi-export-locus, that will help you extract target loci from a larger genomic context, whether those contexts are genomes or metagenomic assemblies.

This tool cuts out loci using one of two approaches: the default mode or what we call flank-mode. In the default mode, the tool locates a designated anchor gene, then cuts upstream and downstream context based on user-defined input. Flank-mode, on the other hand, locates designated genes that surround the target locus, then cuts in between them. Target genes of interest to locate anchors for excision can be defined through their specific ids in anvi'o or through search terms that query functional annotations or HMM hits stored in your contigs database!
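The difference between the two modes can be sketched on a contig represented simply as a list of gene names in coordinate order. This is a hypothetical illustration: the function names and toy contig are made up, gene strand is ignored (so "upstream" just means earlier in the list), and it does not reflect how anvi-export-locus works internally.

```python
from typing import List, Optional


def cut_default(genes: List[str], anchor: str,
                upstream: int, downstream: int) -> List[str]:
    """Default-mode sketch: keep `upstream` genes before and `downstream` genes
    after the anchor gene, clipped at the contig ends."""
    i = genes.index(anchor)  # raises ValueError if the anchor is absent
    return genes[max(0, i - upstream): i + downstream + 1]


def cut_flank(genes: List[str], flank_a: str, flank_b: str) -> Optional[List[str]]:
    """Flank-mode sketch: keep everything between two flanking genes (inclusive),
    in whichever order they occur; None if either flank is missing."""
    if flank_a not in genes or flank_b not in genes:
        return None
    i, j = sorted((genes.index(flank_a), genes.index(flank_b)))
    return genes[i: j + 1]


if __name__ == "__main__":
    # Hypothetical contig, genes listed in coordinate order.
    contig = ["g1", "g2", "lacI", "lacZ", "lacY", "lacA", "g7"]
    print(cut_default(contig, "lacZ", upstream=1, downstream=2))
    print(cut_flank(contig, "lacA", "lacI"))
```

Both calls return the same four-gene locus here. The default mode needs to know how far to extend around a single gene, whereas flank-mode only needs to know which genes bound the region.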

If you find it useful for your research, you can send a postcard to Alon Shaiber, Evan Kiefl, and the newest member of anvi'o developers, Matthew Schechter. A tutorial is on the way :)

Much faster HMMs

You complained, we heard (hehe). In anvi'o esther we finally fixed the sluggish speed of HMM operations, from which you have suffered even when you assigned multiple threads to anvi-run-hmms. Özcan Esen revamped our code, and HMM operations now speed up dramatically as you give anvi'o more threads. Our tests indicate that speed gains roam around as much as four gazillion.

Much better functional enrichment analyses for pangenomes

Anvi'o esther comes with a new version of anvi-get-enriched-functions-per-pan-group thanks to the invaluable statistics input and code we have received from Amy Willis (@AmyDWillis). Please take a look at our tutorial on pangenomics for details.

Anvi'o gets better at helping you

Getting offline help from anvi'o has been difficult. Recognizing this limitation, Evan Kiefl created the program anvi-help, which will help you find your way through anvi'o by simply asking it what it has to do X. Here is an example. You type the following,

anvi-help functions

And you get back this:

As a part of our efforts to make information more accessible to you, Iva Veseli created a new resource: Getting help from the anvi'o community.

Thanks!

During the last few months the list of anvi'o developers grew rapidly, for which we are extremely grateful. The sixth version of the platform, which is now close to 80,000 lines of fully open Python and JavaScript code, would not have been possible without those who took the time to participate in this community effort with their ideas and expertise.

We are also very thankful for our users, whose feature requests, bug reports, and patience continue to give us energy to push things forward (although I can promise that we are not going to be pushing anything anywhere for a week or two after this release as we all just want to take a very long nap).

Finally, we thank all the open-source software developers and data curators everywhere. Without them none of this would have existed.

We hope esther helps you with your research 😇

To read the updated installation instructions for v6, please visit http://merenlab.org/install-anvio

If you are interested in anvi'o but don't know where to start, please read our "getting help" document, catch us in one of our free workshops, or find us on our Slack channel.

v5

5 years ago

We are happy to announce a new version of anvi'o, "margaret".

After nearly 1,500 changes that introduced about 15,000 new lines to the anvi'o codebase and removed about 4,000 from it, the current version includes many fixes to big and small bugs, as well as new features. This page intends to give you a summary of the most notable changes that come with margaret.

The codename is a small tribute to Margaret Oakley Dayhoff, an American physical chemist who is known as the founder of bioinformatics. Dayhoff developed the first programmable computer methods to compare protein sequences, and in 1965 published a book titled "Atlas of Protein Sequence and Structure", which is considered to this day the first textbook of bioinformatics. The codename was suggested by Mick Watson, and won the popular vote on Twitter. Dayhoff sadly died at the early age of 57 in 1983, before bioinformatics emerged as a distinct field. However, her astonishing contributions to life sciences, such as the development of essential approaches for protein sequence comparison and evolutionary tree construction, still constitute some of the most common approaches in our bioinformatics toolkit.

Your new disconcerting toy: GC-content overlaid on reference contexts

Metagenomic read recruitment often results in wavy coverage patterns in the reference context. This phenomenon, which can be attributed to three major sources, can result in up to an order of magnitude difference in coverage for genes within the same contig. While we are kind enough to leave alone, in their blissful world, those who solely work with metagenomic short reads to quantify functions in metagenomes, we wanted to include in this version of anvi'o a way to overlay GC-content change throughout your contigs, so you can see whether the variation you observe in the context of some of your key genes is largely driven by GC-content or not:

For now, this offers only qualitative insight into the extent to which variation in coverage could be explained by deterministic factors that have nothing to do with the biology of the system represented by your metagenome, but it shows that more quantitative insights into this could be useful. We will think about this going forward, and we are open to your suggestions!
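For readers who want to reproduce a similar overlay outside anvi'o, GC-content along a contig is commonly computed in sliding windows. The sketch below is a minimal Python illustration. The window and step sizes are arbitrary assumptions, and the notes do not say how anvi'o itself computes its overlay.

```python
from typing import List, Tuple


def gc_windows(sequence: str, window: int = 100, step: int = 50) -> List[Tuple[int, float]]:
    """Return (window start, GC fraction) for sliding windows along a sequence.
    A sequence shorter than `window` yields a single window covering all of it.
    Ambiguous bases such as N are counted as non-GC."""
    seq = sequence.upper()
    if not seq:
        return []
    results = []
    for start in range(0, max(len(seq) - window, 0) + 1, step):
        chunk = seq[start:start + window]
        gc = sum(1 for base in chunk if base in "GC")
        results.append((start, gc / len(chunk)))
    return results


if __name__ == "__main__":
    print(gc_windows("GGGGAAAA", window=4, step=2))  # [(0, 1.0), (2, 0.5), (4, 0.0)]
```

If the sequence length minus the window size is not a multiple of the step, the last few bases are not covered by a window of their own. Plotting these values under a coverage track is enough to see whether coverage dips line up with GC-content extremes.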

A new anvi'o workflow management system for serious anvians

This new version of anvi'o includes a new program anvi-run-workflow, which provides an interface to our new module that implements snakemake-based anvi'o workflows.

These workflows offer accessible, reproducible, and comprehensible solutions for complex analyses that may include hundreds of samples. We have been using anvi-run-workflow every day in our lab since it first appeared in our master repository, and we are happy to make its power available to you as soon as we could.

There will be an extensive tutorial very soon, but until then you can send your questions to Alon (smiley).

Single-codon variants for a more powerful framework to study microbial population genetics

Anvi'o could already make sense of single-amino acid variants (SAAVs) in environmental metagenomes. But working with SAAVs limited our ability to infer and quantify neutral processes that may not result in changes in the amino acid sequence. We changed our design so that anvi-profile can now characterize single-codon variants (SCVs) if the --profile-SCVs flag is declared. We updated our reference manual for variability analysis to include new sections describing SCVs and SAAVs.

With SNVs, SCVs, and SAAVs, anvi'o v5, deserving of its codename, offers a robust framework to investigate the population genetics of environmental microbes, while SCVs and SAAVs improve our ability to tease apart the evolutionary forces acting upon them. We hope you enjoy these new toys, and feel free to get in touch with us if you have questions or suggestions.
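The reason SCVs capture something SAAVs cannot is that many codon changes leave the encoded amino acid untouched. The minimal Python sketch below classifies a codon change as synonymous or not. It assumes the standard genetic code (NCBI translation table 1) and is an illustration only, not part of anvi'o.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1); '*' marks stop codons.
# Codons are enumerated in TCAG order with the third position varying fastest.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W"
               "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR"
               "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_codon_change(ref: str, alt: str) -> str:
    """Label a codon-level variant as synonymous (same amino acid) or not."""
    ref_aa = CODON_TABLE[ref.upper()]
    alt_aa = CODON_TABLE[alt.upper()]
    if ref_aa == alt_aa:
        return "synonymous"
    return f"non-synonymous ({ref_aa}->{alt_aa})"


if __name__ == "__main__":
    print(classify_codon_change("CTT", "CTC"))  # both encode leucine
    print(classify_codon_change("GAT", "GAA"))  # aspartate -> glutamate
```

A CTT to CTC change would be invisible at the SAAV level because both codons encode leucine. Tracking codons directly keeps such potentially neutral variation in view.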

Visualize environmental variation on protein structures through the new Structure DB

Our efforts to push the boundaries of investigations of environmental variation within microbial populations reach a new level in this release with a brand new ability about which we are very excited: linking variation to predicted protein structures.

With the new structure database and its associated workflows, anvi'o can predict the tertiary structure of genes identified from a contigs database using the Protein Data Bank. It can then directly overlay the variability data from your metagenomes, in the form of SCVs and SAAVs, onto the predicted protein structures. All of this is accomplished in just two new programs, anvi-gen-structure-database and anvi-display-structure.

We believe that this nexus between structural biology and metagenomics will elevate environmental metagenomics into the realm of biophysics, and enable investigations into evolutionary processes driving the diversity of proteins that could not be learned from sequence analyses alone.

With these new advances come two new dependencies to additional open-source software, for which we are very grateful: MODELLER and DSSP.

Here is a teaser from the new interactive anvi-display-structure interface:

We will soon make available an extensive tutorial to describe this workflow in detail. Until then, you can send your questions to Evan and Ozcan.

Computing average nucleotide identity for genomes in pangenomes

This release also includes significant improvements for our comparative genomics and pangenomics workflows.

One of these improvements is the inclusion of a new program, anvi-compute-ani, to calculate the average nucleotide identities across a given set of genomes, which can be automatically added into any anvi'o pangenome.

For instance, this is an anvi'o pangenome of the 31 Prochlorococcus isolates we played with in our recent paper:

And this is what you get when you run anvi-compute-ani:

Mike Lee had suggested this as an option a long time ago. We are happy to finally deliver this functionality, which uses pyANI as a backend, and we are thankful to its developers.

We updated our tutorial on pangenomics to describe intermediate steps.

A new approach to explore functional enrichment in pangenomes

This version of anvi'o also includes a new analytical framework to study functional enrichment in a given pangenome based on any arbitrary organization of genomes. You simply define how you would like to partition your genomes, whether based on a phylogenetic tree or a dendrogram that anvi'o computed from gene cluster distributions, and this new tool finds functions that are enriched in those groups (i.e., functions that are characteristic of a given group of genomes, and predominantly absent from genomes outside this group).

This is done by the new program anvi-get-enriched-functions-per-pan-group, and Alon extended our current tutorial on pangenomics with an extensive description of how it works.
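To make the idea of "characteristic of a group and predominantly absent outside it" concrete, here is a deliberately simplified Python sketch. It compares the fraction of genomes carrying each function inside and outside a user-defined group. This proportion difference is an assumption made for illustration and is not the statistic anvi-get-enriched-functions-per-pan-group computes. The genome and function names in the test data are made up.

```python
from typing import Dict, List, Set, Tuple


def enrichment_table(functions: Dict[str, Set[str]],
                     group: Set[str]) -> List[Tuple[str, float, float]]:
    """For every function, return (function, fraction of in-group genomes that
    have it, fraction of out-group genomes that have it), sorted so that the
    functions most characteristic of the group come first."""
    in_group = [g for g in functions if g in group]
    out_group = [g for g in functions if g not in group]
    if not in_group:
        raise ValueError("the group contains none of the genomes")
    rows = []
    for func in sorted(set().union(*functions.values())):
        p_in = sum(func in functions[g] for g in in_group) / len(in_group)
        p_out = (sum(func in functions[g] for g in out_group) / len(out_group)
                 if out_group else 0.0)
        rows.append((func, p_in, p_out))
    # Largest in-group minus out-group difference first; ties stay alphabetical
    # because Python's sort is stable even with reverse=True.
    rows.sort(key=lambda row: row[1] - row[2], reverse=True)
    return rows
```

A function present in every in-group genome and absent everywhere else rises to the top, while a function shared by all genomes scores zero. A real enrichment test would also account for group sizes and multiple testing.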

Native functional annotation options += PFAMs

If you have your own functional annotations for your genes in an anvi'o contigs database, it is quite straightforward to import them via the anvi-import-functions program. Anvi'o v3 had made available another program to automate the annotation process, anvi-run-ncbi-cogs, if you were fine with NCBI's Clusters of Orthologous Groups. This release contains a new program, anvi-run-pfams, to use the collection of HMMs produced by the European Bioinformatics Institute based on UniProt.

Tree modification through the interactive interface

It has been a challenge to deal with phylogenetic tree operations in the anvi'o interactive interface. This version includes a significant code refactoring effort, which makes it possible to have new toys that we could not have before. These new toys include basic tree editing and storage abilities such as re-rooting trees, and rotating and collapsing branches. You can even see the branch support values in the mouse tab of the anvi'o interactive interface. These functions are now available to you through the menu that appears when you click a branch in the interactive interface while pressing the Command or Control key:

A new HMM collection to estimate completion of eukaryotic bins

Since its conception anvi’o included single-copy core gene collections to assess the completion and redundancy of bacterial and archaeal bins. This release includes a collection to estimate the completion of eukaryotic bins that Tom Delmont, who recently left us physically to join the ranks of Genoscope, curated from the BUSCO collection.

See Tom's blog post for details and preliminary benchmarks (also, if you are finding these release notes too boring to read, you can try reading this one too).

If you are recovering tiny eukaryotic organisms from your metagenomes please help us improve this collection by reporting back your experiences with it.

Importing metagenome-level short-read taxonomy and the enhanced stacked bar data type

While our efforts on shotgun metagenomes largely focus on genome-resolved strategies, we acknowledge that one could learn a lot from taxonomic annotation of short-reads as an additional layer of information. In this release anvi'o comes with a new program, anvi-import-taxonomy-for-layers with a KrakenHLL parser, which can import short-read level taxonomic annotations into anvi'o profiles. Thanks to the improved data groups, different levels of taxonomy would be available in the layers tab,

And could be visualized easily:

The best part is that our improved stacked bar data type in this release then allows you to order your metagenomes based on the relative abundance of any given taxon at any given taxonomic level in those metagenomes according to short reads (the example below orders metagenomes in the infant gut dataset from Sharon et al. based on the increasing relative abundance of Enterococcus):

Here we would like to assume that you're saying to yourself "the example is boring, but the concept has promise". Thanks! We agree.
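The ordering itself is straightforward once short-read classifications are summarized per sample. The minimal Python sketch below sorts samples by increasing relative abundance of one taxon. The input layout (sample to taxon to read count), the function name, and the counts in the test are hypothetical, and the sketch does not reflect how anvi'o stores or orders stacked bar data.

```python
from typing import Dict, List


def order_samples_by_taxon(read_counts: Dict[str, Dict[str, int]], taxon: str) -> List[str]:
    """Sort samples by increasing relative abundance of `taxon`, where
    read_counts maps sample -> {taxon name -> number of classified reads}."""
    def relative_abundance(sample: str) -> float:
        total = sum(read_counts[sample].values())
        return read_counts[sample].get(taxon, 0) / total if total else 0.0
    return sorted(read_counts, key=relative_abundance)
```

Relative rather than absolute abundance is used here so that samples with very different sequencing depths remain comparable.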

Thanks

A year ago I listened to Jeff Gordon's talk at the University of Chicago to which he started with this African proverb:

If you want to go fast, go alone. If you want to go far, go together.

This concept applies to scientific endeavors so well. Speed is transient, and teamwork is essential for major contributions. Fortunately, anvi'o has increasingly become a team effort. But looking at our release notes, I don't know whether we could go any faster from v4 to v5 either. This release was a result of significant intellectual and coding contributions from Alon Shaiber, Evan Kiefl, as well as Özcan Esen, whose guidance and hard work continue to keep this operation together. Altogether, they spent hours and hours on big and small features and issues, with an enthusiasm that can be best justified by curiosity and the desire to contribute to your journey in data-driven microbiology. I, Meren, who gets to write this release note one more time, thank them wholeheartedly.

As a team we also thank Jarrod Scott, Alexandra Campbell, Samantha Atkinson, Carlos Ruiz, Bryan Merrill, Mike Lee, Varun Srinivasan, and many others who asked for features and reported bugs with their endless patience with us.

We hope you find v5 useful for your research, and we certainly hope you will not run into any bugs we probably left in the code 😇

If you are interested in anvi'o but don't know where to start, catch us in one of our free workshops, or find us on our Slack channel.

v4

6 years ago

We are happy to announce a new version of anvi'o, "rosalind".

After nearly 300 changes that introduced about 15,000 new lines, and removed about 7,500 from the anvi'o codebase, the current version includes many bug fixes, as well as some new features. This release note intends to give you a summary of the most important changes.

The codename is a small tribute to Rosalind Franklin, the British biophysicist whose work, among other advances in life sciences, led to the discovery of the DNA double helix. This codename was inspired by Emily Crossette's suggestion, 'esther', "after Esther Lederberg, who co-developed a replica plating method with her husband but was largely unrecognized and discriminated against as a woman scientist". Emily explained that her suggestion was to "celebrate how far we have come as a scientific community and look to the future". Yes. We fortunately did not stay where we were, but we are still far from where we could have been. We remember these women and many others with respect and gratitude, and understand our responsibility to make sure the younger generations of scientists will not suffer from the kinds of discrimination to which their professors were subjected.

An elegant way to upgrade anvi'o databases

Upgrading anvi'o databases is now simpler than ever. With this change, the number of excuses you can use to not switch to the newest version of anvi'o goes from "0" to "-1". Just saying.

We now have a single program, anvi-migrate-db, that upgrades any anvi'o database to the latest version in one step.

As a part of this change we replaced all HDF5 files with SQLite databases, which resulted in a tremendous performance gain (especially in pangenomic operations that required access to the genome storage database) and up to a 10-fold reduction in disk storage needs (for auxiliary data files). As a result: No more CONTIGS.h5 (the content of this file is now a part of the CONTIGS.db; yay for less clutter). No more SAMPLES.db (more on this down below). Genome storage and auxiliary data files now have .db extensions rather than .h5, as they are now SQLite databases instead of HDF5 files.

Improvements in the pangenomic workflow

We made multiple very critical improvements in our pangenomic workflow. Here is a list of them:

These are the gene clusters you are looking for. Now it is possible to "select" gene clusters programmatically both from the command line, and from the anvi'o interactive interface through the combination of filters. We thank Ryan Bartelme for pushing us to improve our pangenomic workflow as he once again did in https://github.com/merenlab/anvio/issues/668. Gene clusters that match these filters are highlighted immediately on the interface, and can be added into any bin/collection for summary:

Search gene clusters by function. We also now have the capacity to search for gene clusters that describe genes with functions of interest through the command line as well as the interactive interface:

Parallel alignment. After identifying all gene clusters in a given pangenome, anvi'o by default uses muscle or famsa to store multiple sequence alignments for the amino acid sequences in each gene cluster. This was one of the most time-consuming steps of the pangenomic workflow. With v4, anvi'o parallelizes amino acid alignments per gene cluster across as many cores as you allow it to use. It makes a big difference.

Forced synteny. Gene clusters in a pangenome are by default organized based on their distribution across genomes (so that is the dendrogram in the center). However, with this version there are additional ways to order them, including ordering them by "synteny". In this forced organization you get to choose one of the genomes in your analysis from the "item orders" combo box, which tells anvi'o that you wish to order all gene clusters in your pangenome based on the order of genes in that genome. We found it to be an efficient way to study missing genomic loci, and other not-so-straightforward-to-spot phenomena.

Everything is better in color. Arguably, one of the most important improvements to the pangenomic workflow was the addition of an amino acid alignment conservancy coloring algorithm. This was done in https://github.com/merenlab/anvio/pull/732 by Mahmoud Yousef, who is currently a second year Computer Science student at the University of Chicago. Mahmoud also very kindly wrote a blog post to explain the details of this algorithm with examples: http://merenlab.org/2018/02/13/color-coding-aa-alignments/.

Gene popups. Now you can click gene caller ids next to the amino acid sequence alignment in inspection pages, and enjoy these functional popups to access any information (https://github.com/merenlab/anvio/issues/680):

Cleaner terminology. After consulting with the community, we replaced all instances of 'protein clusters' in our pangenomic workflow with 'gene clusters'.
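The forced synteny ordering described above boils down to walking through the genes of the chosen genome in order and listing the gene cluster each gene belongs to. The minimal Python sketch below illustrates that idea. The data layout and names are hypothetical, paralogs keep only their first position, and gene clusters absent from the chosen genome are simply appended at the end. That last behavior is an assumption, since the notes do not say how anvi'o places them.

```python
from typing import Dict, List


def order_by_synteny(reference_genes: List[str],
                     gene_to_cluster: Dict[str, str],
                     all_clusters: List[str]) -> List[str]:
    """Order gene clusters by the position of their genes in one reference genome.

    reference_genes: gene ids of the chosen genome, in genomic order.
    gene_to_cluster: gene id -> gene cluster id.
    all_clusters:    every gene cluster in the pangenome (fallback order).
    """
    order: List[str] = []
    seen = set()
    for gene in reference_genes:
        cluster = gene_to_cluster.get(gene)
        if cluster is not None and cluster not in seen:  # first occurrence wins
            order.append(cluster)
            seen.add(cluster)
    # Clusters with no gene in the reference genome go last, in their original order.
    order.extend(c for c in all_clusters if c not in seen)
    return order
```

Gaps in the reference genome then show up as gene clusters pushed to the end of the ordering. This is one way missing genomic loci become easier to spot.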

Metapangenomics: linking pangenomes and metagenomes

Anvi'o comes with powerful analytical tools to study pangenomes and metagenomes. Now you can take things one step further with the same ease-of-use.

We define metapangenomics as the outcome of the analysis of pangenomes in conjunction with the environment where the abundance and prevalence of gene clusters and genomes are recovered through shotgun metagenomes. This version includes a new program, anvi-meta-pan-genome, that brings the power of metapangenomics into a single command line. Please read our paper on the Prochlorococcus metapangenome to see how this concept could apply to your research.

Improvements in the interactive interface

This release also includes multiple notable improvements in the interactive interface.

The 'max coverage' fix we all needed but didn't know. Inspection pages are great to investigate coverage data and single-nucleotide variants in a single-nucleotide resolution, however, it was not quite easy to make visual sense of data when coverage values dramatically differed between samples, or short but non-specific mapping pushed maximum coverage values too high to make sense of the actual population coverage in the context of long contigs. In v4 you will see additional buttons in the inspection pages to mitigate these kinds of visual imperfections. Here is some action for you skeptics:

Descriptions tab gets 1-up. One of the most useful features of the interactive interface is the "Descriptions" tab. Yes, we know you are not using it, but you should. Here is an example to see why you should use it (just wait until the page loads, and see all the information that will show up in the right panel): https://anvi-server.org/merenlab/dwh_o_desum. The description tab is extremely useful to take notes and store them in a profile database to remember later. With this new version, you will be able to point to an item (https://github.com/merenlab/anvio/issues/715), which lets readers see where it is on the display by highlighting it, or inspect it by clicking 'inspect':

Gene mode: a new, highly-resolved interactive mode to study genome bins

This is yet another way for you to examine your data in high-resolution.

We added a flag to anvi-interactive: --gene-mode. When you use this flag along with a collection and a bin name, it allows you to load the interactive interface in the "gene mode". In this mode every item is a gene, instead of a contig, and you can see the coverage, detection, non-outlier coverage, and non-outlier standard deviation of coverage statistics per gene, independently. You can use these data to order the genes, and order the samples. Inspection of nucleotide level coverages, gene sequences, and even gene functions could also be explored in this mode. This allowed us to easily recognize genes that recruit a lot of non-specific mapping, and identify hyper-variable regions in our genomes. One can also search for genes with certain functions, and see their coverages, and the coverage of the genes that are next to them.
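As a rough illustration of the per-gene statistics mentioned above, the Python sketch below computes mean coverage, detection, and a non-outlier mean coverage for one gene from per-nucleotide coverage values. Here, detection is the fraction of the gene's nucleotides covered by at least one read. The outlier rule is an assumption: it is the common modified z-score rule based on the median absolute deviation with a 3.5 cutoff, and anvi'o's own rule may differ. The function and parameter names are hypothetical.

```python
from statistics import mean, median
from typing import Dict, List


def gene_coverage_stats(coverage: List[int], start: int, stop: int,
                        z_cutoff: float = 3.5) -> Dict[str, float]:
    """Summarize per-nucleotide coverage for the gene spanning [start, stop)."""
    values = coverage[start:stop]
    if not values:
        raise ValueError("empty gene range")
    med = median(values)
    mad = median([abs(v - med) for v in values])  # median absolute deviation
    if mad == 0:
        # Most positions share the median; treat everything else as outliers.
        inliers = [v for v in values if v == med]
    else:
        inliers = [v for v in values if 0.6745 * abs(v - med) / mad <= z_cutoff]
    return {
        "mean_coverage": mean(values),
        "detection": sum(v > 0 for v in values) / len(values),
        "non_outlier_mean_coverage": mean(inliers),
    }
```

A short spike of non-specific mapping inflates the plain mean but barely moves the non-outlier mean. This is the kind of contrast that makes such genes stand out in gene mode.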

Please refer to the help menu for the interactive interface (via anvi-interactive -h or here) to find out more about this mode.

We are excited about this new feature, and we plan to expand it in future versions of anvi'o. If you have any suggestions/complaints/compliments please leave a comment in this issue: #754. We will soon put a tutorial for this mode online, so stay tuned!

A new and elegant way to extend anvi'o displays: additional data tables

We made a major change in our design to simplify the way various data for items and layers can be imported into anvi'o profile or pan databases, and managed. This change opens doors for endless possibilities to manipulate additional data streams through interactive, command line, and application programmer interfaces. Please see a detailed description on this new framework here: http://merenlab.org/2017/12/11/additional-data-tables/.

Other improvements

Optional noise cutoffs for HMMs. This has been a long-standing issue (detailed in https://github.com/merenlab/anvio/issues/498). The current version allows user-defined noise-cutoff terms, and makes it easier for anyone who wishes to make anvi'o use their own HMM collection.

Anvi'o vignettes. All anvi'o programs, their categories, parameters, and help: http://merenlab.org/software/anvio/vignette/. This is big, guys.

Variability performance improvement. Anvi'o now relies on pandas (not the animal, sadly, but the library) to take care of variability operations. While this will not impact user experience much, the code is much more elegant now and we wanted you to know it. See the Pull Request at https://github.com/merenlab/anvio/pull/660 for more details.

Anvi'o disco mode [ON]. We heard more than once that people do not realize that they need to click the 'Draw' button in the interface if there is no default state to load and draw everything automatically (#739). So, disco:

New scripts and programs

We have some new programs that come with this version. Click on their links to learn more about them.

Thanks

We know that developing anvi'o would have been much less fun without its enthusiastic and engaged users. We are thankful for those, including Bryan Merrill, Rika Anderson, Mike Lee, Marta Royo-Llonch, Xabier Vázquez-Campos, Emily Crossette, Varun Srinivasan, Julie Reveillaud, Alban Mathieu, and others, who help us improve anvi'o with their science, patience, issue reports, and suggestions.

We are also thankful to users who share their experiences and make anvi'o more accessible to the community, such as Elaina Graham, who recently wrote about importing GhostKOALA/KEGG annotations into anvi'o, and Bryan Merrill, who shared his experience with importing VirSorter annotations into anvi'o to study phages.

v3

6 years ago

We are happy to announce a new version of anvi'o, "Eden".

After more than 300 changes that introduced about 6,000 new lines and removed about 2,250 lines from the anvi'o codebase, the current version includes many bug fixes as well as some new features. This release note intends to give you a summary of the most important changes.

The codename, which was suggested by @watsonar and won the popular vote, marks the arrival of the newest honorary member of our lab. Despite her very young age, which doesn't even round to a positive integer, she managed to tip the scale of gender diversity in our lab for the better. We thank Alon and Rebecca for allowing us to witness essential beauties of life through their happiness.

A new program: anvi-display-contigs-stats

Thanks to @ozcan's recent efforts, anvi'o can now give you basic stats about your contigs databases. Using anvi-display-contigs-stats, you can generate insights from one or more contigs databases. Beyond the obvious uses, we hope it will be useful for interactively comparing different assemblers on the same dataset, or multiple genomes for each of which you have a contigs database.

Since a basic framework for such comparisons is now in place, we look forward to hearing suggestions from anvi'o users to improve it further.

Here is what you see when you run it on the contigs database of our FMT study:

anvi-display-contigs-stats FMT-CONTIGS.db

And this is what you see when you run it on a bunch of single genomes:

anvi-display-contigs-stats c0328-Microgenomates.db \
                             c0319-Microgenomates.db \
                             c0091-Candidate_CPR3.db \
                             c0205-Parcubacteria.db \
                             c0661-Parcubacteria.db \
                             c0792-Candidate_WS6.db

We hope you try it on your current contigs databases, and let us know about your suggestions.
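When comparing assemblers, N50 and L50 are among the most common summary statistics people look at. The sketch below shows how they can be computed from a list of contig lengths; it is a generic illustration, and the `n50_l50` helper and the lengths are hypothetical rather than taken from anvi-display-contigs-stats.

```python
def n50_l50(contig_lengths):
    """Return (N50, L50).

    Contigs are sorted from longest to shortest; N50 is the length of the
    contig at which the running total first reaches half of the total
    assembly length, and L50 is the number of contigs needed to get there.
    """
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    cumulative = 0
    for count, length in enumerate(lengths, start=1):
        cumulative += length
        if cumulative >= half_total:
            return length, count
    raise ValueError("contig_lengths must not be empty")


print(n50_l50([1000, 5000, 3000, 2000, 4000]))  # (4000, 2)
```

A higher N50 with a lower L50 generally indicates a more contiguous assembly, although neither says anything about assembly correctness.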

A new versioning approach

We will no longer follow the standards of semantic versioning with anvi'o releases. The last anvi'o version was v2.4.0, this one is v3, and the next one will likely be v4, unless there is an absolutely minor change that will not require you to update your v3 installation.

Why leave the field's standards? Anvi'o is an essential tool for us to do science, and we know of multiple other groups besides ours that also use this platform quite rigorously to go after their own questions. This is not a media player that would continue to play your favorite shows even if you don't keep it up-to-date, and since the excellence of everything we do with anvi'o depends on its accuracy, we want our users to update their installations every time there is a new version. When this is the case, the conventional versioning of software becomes rather irrelevant, and somewhat confusing. We are aware that installing and updating software can be quite a frustrating task, and we do our best to improve those steps for you. If you have any questions or suggestions, please send them to the anvi'o discussion group.

Making a release is quite a painful process, and I personally hate it passionately as it takes at least a full day from my life. We only do it when there is a need for it. So, please keep your anvi'o up-to-date while @ozcan, myself, and other developers and contributors do our best to keep it bug-free for you :)

Bug fixes, new anvi'o programs, and new flags

This version includes a large number of bug fixes and minor improvements to available programs through additional flags.

New programs that became available with this release include anvi-delete-state, anvi-display-contigs-stats, and anvi-export-samples-db.

We also added new scripts to the anvi'o distribution, such as anvi-script-add-default-collection (to add a quick 'default' collection that gives access to all splits in a profile database) and anvi-script-filter-fasta-by-blast (to remove weak hits from BLAST search outputs).
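To give a feel for what "removing weak hits" means, here is a minimal sketch that parses BLAST tabular output (the default 12-column `-outfmt 6` layout) and keeps only hits above identity and alignment-length thresholds. The `strong_hits` helper, the threshold values, and the example rows are hypothetical; anvi-script-filter-fasta-by-blast may use different criteria and options.

```python
import csv
import io

# Default column order of BLAST tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def strong_hits(tabular_text, min_pident=90.0, min_length=100):
    """Yield (query, subject) pairs for hits passing hypothetical thresholds."""
    reader = csv.reader(io.StringIO(tabular_text), delimiter="\t")
    for row in reader:
        hit = dict(zip(COLUMNS, row))
        if float(hit["pident"]) >= min_pident and int(hit["length"]) >= min_length:
            yield hit["qseqid"], hit["sseqid"]


blast_output = (
    "contig_1\tref_A\t98.5\t450\t6\t0\t1\t450\t10\t459\t1e-200\t800\n"
    "contig_2\tref_B\t72.0\t300\t84\t2\t1\t300\t5\t304\t1e-40\t150\n"
    "contig_3\tref_C\t95.0\t60\t3\t0\t1\t60\t1\t60\t1e-20\t100\n"
)

print(list(strong_hits(blast_output)))  # [('contig_1', 'ref_A')]
```

Here contig_2 is dropped for low identity and contig_3 for a short alignment, even though both have small e-values.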

Some of the new flags we added to our existing programs include --return-codon-frequencies-instead, which makes anvi-get-aa-frequencies report codon frequencies instead of amino acid frequencies from BAM files; --items-order, which explicitly provides an items order to anvi-interactive; --max-num-genes-missing-from-bin and --min-num-bins-gene-occurs, which, thanks to Ryan Bartelme's suggestions, remarkably improve the ability of anvi-get-sequences-for-hmm-hits to filter out weak genomes or genes for better phylogenomic analyses; and --align-with, so we can optionally use FAMSA instead of muscle to align sequences within our protein clusters, since FAMSA works much better than muscle in our experience. We heard about FAMSA thanks to Antonio Fernandez-Guerra.
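The two filtering flags for anvi-get-sequences-for-hmm-hits can be thought of as operations on a bins × genes presence table. The sketch below is a hypothetical illustration of that idea, not anvi'o's code: the `filter_for_phylogenomics` helper, the bin and gene names, and the decision to drop weak bins first and then rare genes are all assumptions, and anvi'o may apply the filters in a different order.

```python
def filter_for_phylogenomics(presence, genes_of_interest,
                             max_missing_per_bin, min_bins_per_gene):
    """presence maps bin name -> set of genes found in that bin.

    Assumed order: first drop bins missing more than `max_missing_per_bin`
    of the genes of interest, then drop genes that occur in fewer than
    `min_bins_per_gene` of the remaining bins.
    """
    wanted = set(genes_of_interest)
    kept_bins = sorted(
        bin_name for bin_name, found in presence.items()
        if len(wanted - found) <= max_missing_per_bin
    )
    kept_genes = [
        gene for gene in genes_of_interest
        if sum(gene in presence[b] for b in kept_bins) >= min_bins_per_gene
    ]
    return kept_bins, kept_genes


presence = {
    "bin_1": {"RecA", "RpoB", "Ribosomal_L2"},
    "bin_2": {"RecA", "RpoB"},
    "bin_3": {"RecA"},
}
genes = ["RecA", "RpoB", "Ribosomal_L2"]

print(filter_for_phylogenomics(presence, genes,
                               max_missing_per_bin=1, min_bins_per_gene=2))
# (['bin_1', 'bin_2'], ['RecA', 'RpoB'])
```

Removing genomes with too many missing genes, and genes present in too few genomes, reduces the fraction of gap-only columns in the final concatenated alignment.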

Contigs database version bump

This release bumps the contigs database version. You will be able to update your existing contigs databases without any pain via the upgrade script anvi-script-upgrade-contigs-db-v8-to-v9.

Genome storage version bump

We extended the functionality of anvi'o genome storages that are used for pangenomic workflows. They now also keep a copy of every gene they describe in the DNA alphabet. Unfortunately, this also requires an upgrade, which can be done via anvi-script-upgrade-genomes-storage-v3-to-v4.

Thank you for your interest in anvi'o!

Please find anvi'o tutorials and installation instructions here:

http://merenlab.org/software/anvio

v2.4.0

6 years ago

We are happy to announce the new version of anvi'o, "Pyrenees".

After 350 changes in the codebase that introduced more than 4,500 lines of code and removed about 9,000, the current version includes many bug fixes as well as some important additions to the repository. This release note will give you a summary of the most important changes.

The codename is to honor our friend and colleague, Tom Delmont, who is going to continue working with us from France. The Pyrenees is a mountain range in southwest Europe between France and Spain, where Tom spent most of his life.

Single-amino acid variants (SAAVs)

Since its very first version, anvi'o has been providing you with one of the most comprehensive frameworks to make sense of single-nucleotide variants (SNVs). Here is a tutorial for skeptics.

Although it has been in anvi'o for more than a year, we are very excited to officially announce the anvi'o workflow to investigate single-amino acid variants. More resources on SAAVs will soon be available, and we will keep you posted. But if you have been using the anvi-gen-variability-profile program to characterize single-nucleotide variants in your population genomes, all you need to do now is to add --engine AA to get single-amino acid profiles.
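The key difference between nucleotide- and amino acid-level variability is that synonymous codon changes disappear once codons are translated. The sketch below illustrates that idea by collapsing per-codon read counts at one codon position into amino acid counts using the standard genetic code. It is only an illustration under our own assumptions (the `amino_acid_counts` helper and the counts are hypothetical), not the --engine AA implementation, which works from reads mapped to gene calls and reports much more.

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI translation table 1), built from the
# conventional TCAG ordering: TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def amino_acid_counts(codon_counts):
    """Collapse per-codon read counts at one codon position into amino acid counts."""
    aa_counts = Counter()
    for codon, count in codon_counts.items():
        aa_counts[CODON_TABLE[codon]] += count
    return aa_counts


# Hypothetical codon counts at one codon position: GAT and GAC both encode
# aspartate (D), so only the GAA (glutamate, E) reads create an amino acid variant.
codons = {"GAT": 40, "GAC": 35, "GAA": 25}
aa = amino_acid_counts(codons)
print(dict(aa))  # {'D': 75, 'E': 25}

is_saav = sum(1 for count in aa.values() if count > 0) > 1
print(is_saav)  # True
```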

Fluency in phylogenomics

Anvi'o can now speak phylogenomics. Here you will find an extensive tutorial with reproducible examples:

http://merenlab.org/2017/06/07/phylogenomics/

You can now do phylogenomic analyses in anvi'o for metagenomic bins or pangenomes thanks to others who pushed us for it. Luke McKay of Montana State University asked for gene concatenation for HMM hits in genomes from metagenomes, and we delivered it (47fd334476b82cc0caf6ad31904972fa49b52447, b9780a522888e42ccefed06a7bb8a575b0f204da, 29a382751d9e019a08125e5f7cec80da674697c7, 3ee670efbf5b5e339f42d201d2629768814cbddd, fa7128a11d4066393fa2eedbfd9296f73d95142f). Then, Ryan Bartelme of the University of Wisconsin-Milwaukee asked in a private e-mail for concatenated genes in protein clusters of pangenomes, and we delivered that, too (1726476d1d84a3e03950fb32c813dd9ed5f7edcd, 3d07e8b671db66496f6d5061029a1b01158e8b8c, 99359bc5bdfbbd7688182e3871992dd28f9488b0). We didn't stop there. For starters, we implemented a simple driver for FastTree (227e892fda1bc23191a07221055d8b83132fee73) so you can immediately start playing with your data. As always, we are thankful for your suggestions.
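Gene concatenation for phylogenomics boils down to joining each genome's aligned copies of several genes end to end, padding with gaps when a genome lacks a gene, so that every genome ends up with a sequence of the same length. The sketch below shows that step on already-aligned sequences; the `concatenate_alignments` helper and the toy alignments are hypothetical, and it does not reproduce the commits above or the alignment itself (which would come from a tool such as muscle).

```python
def concatenate_alignments(alignments, genomes):
    """alignments maps gene name -> {genome name -> aligned sequence}.

    Genes are concatenated in sorted order; a genome lacking a gene gets a
    run of gap characters as long as that gene's alignment.
    """
    concatenated = {genome: [] for genome in genomes}
    for gene in sorted(alignments):
        aligned = alignments[gene]
        width = len(next(iter(aligned.values())))
        if any(len(seq) != width for seq in aligned.values()):
            raise ValueError(f"sequences for {gene} are not aligned to equal length")
        for genome in genomes:
            concatenated[genome].append(aligned.get(genome, "-" * width))
    return {genome: "".join(parts) for genome, parts in concatenated.items()}


alignments = {
    "RecA": {"genome_A": "MKV-L", "genome_B": "MKVIL"},
    "RpoB": {"genome_A": "QRS", "genome_C": "QKS"},
}
genomes = ["genome_A", "genome_B", "genome_C"]

for genome, seq in concatenate_alignments(alignments, genomes).items():
    print(f">{genome}\n{seq}")
# >genome_A
# MKV-LQRS
# >genome_B
# MKVIL---
# >genome_C
# -----QKS
```

The resulting FASTA-formatted alignment is the kind of input a tree builder such as FastTree expects.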

Identifying ribosomal RNAs everywhere

In most cases, getting ribosomal RNA genes out of isolate or metagenome-assembled genomes is not as straightforward as one would like, even when the assembler managed to assemble them.

Gene callers usually don't perform well when it comes to identifying 16S or 23S rRNA genes, and using primer sequences for these regions is not exactly 2017. We added a new feature in anvi'o that reduces the recovery of rRNA gene sequences from isolate, single-cell, or metagenome-assembled genomes to a couple of keystrokes. More information and examples are here:

http://merenlab.org/2017/07/09/recovering-rRNAs/

The small but mighty 'push' button

This is something we are very excited about. You know how sometimes you have something on your interactive interface you would like to show to your colleagues, or share with everyone on the planet? Well, with this version you will be able to click on the new little cloud button on the interface and send your interactive display directly to http://anvi-server.org or any other anvi'server instance online. After that, you can share this interactive display in read-only mode privately with your colleagues, editors, or reviewers, or with everyone by making it public.

Here is an appropriate use of it in a recent paper in Cell Reports that gives another dimension to a static figure:

https://www.youtube.com/watch?v=WtPB9GchuEI

More functional side buttons

Now we have a way to deliver important news to your doorstep through a news panel in the interface.

More information on this new addition, including notes about possible privacy concerns, is here:

http://merenlab.org/2017/05/16/anvio-news-panel/

Please consider sharing your thoughts and opinions on this.

We also made sure anvi'o can convey extensive descriptions of the displays it shows.

The description tab gives access to an editor, in which you can describe the data using Markdown syntax. We hope that this practice catches on, so whenever someone looks at an anvi'o display, they know that there will be some information on the side panel to better understand the details.

Please find anvi'o tutorials and installation instructions here:

http://merenlab.org/software/anvio