MetaPhlAn Versions

MetaPhlAn is a computational tool for profiling the composition of microbial communities from metagenomic shotgun sequencing data.

4.1.0

2 months ago

Version 4.1.0 (Feb 20th, 2024)

Database updates

  • We just released the new vJun23 database
    • Addition of ~45k reference genomes from NCBI
    • Addition of ~50k MAGs from ocean, ~40k MAGs from soil, ~30k MAGs from domestic animals and non-human primates, ~4k MAGs from giant turtles, ~7.5k MAGs from skin microbiome, ~20k MAGs from dental plaque, ~15k MAGs from Asian populations, ~2.7k MAGs from ancient and modern Bolivians and other small datasets from diverse sources
    • Expansion of the markers database with 36,822 SGBs (6,272 more SGBs than in vOct22)
  • Inclusion of the new Viral Sequence Clusters (VSCs) database
    • Containing 3,944 VSCs clustered into 1,345 Viral Sequence Groups (VSGs)
    • Including a total of 45,872 representative VSG sequences
    • Each cluster/group is labeled as known (kVSG) or unknown (uVSG) depending on whether it contains at least one viral RefSeq reference genome
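
The known/unknown labelling rule above can be sketched in a few lines of Python. This is only an illustration of the rule as stated, not MetaPhlAn's implementation: the label_group function, the member identifiers and the refseq_ids set are all hypothetical.

```python
def label_group(member_ids, refseq_ids):
    """Return 'kVSG' if at least one member is a viral RefSeq reference, else 'uVSG'.

    Both arguments are hypothetical inputs used only for illustration.
    """
    return "kVSG" if any(m in refseq_ids for m in member_ids) else "uVSG"


# Made-up identifiers: one RefSeq reference and two assembled contigs.
refseq_ids = {"refseq_A"}
known = label_group(["refseq_A", "contig_17"], refseq_ids)
unknown = label_group(["contig_42"], refseq_ids)
print(known, unknown)
```

A group with no members is labelled unknown under this sketch, since no RefSeq reference can be present in it.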

New features

  • [MetaPhlAn] The new --profile_vsc parameter (together with --vsc_out and --vsc_breadth) enables the profiling of viral sequence clusters.
  • [MetaPhlAn] The --subsampling parameter now subsamples the FASTQ files rather than the mapping results
  • [MetaPhlAn] The new --mapping_subsampling parameter restores the previous behaviour of subsampling the mapping results
  • [MetaPhlAn] The new --subsampling_output parameter allows saving the subsampled FASTQ file
  • [MetaPhlAn] The new create_toy_database.py script enables the custom filtering of the MetaPhlAn databases
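
As a rough sketch of how the new parameters might be combined, the Python helper below assembles a command line without running it. The parameter names --profile_vsc, --vsc_out, --vsc_breadth, --subsampling and --subsampling_output come from the notes above. Everything else is an assumption: the helper name build_metaphlan_command, the file names, the use of --input_type and -o, and the value passed to each parameter (a read count for --subsampling, a fraction for --vsc_breadth). Check metaphlan --help for the accepted values before using it.

```python
import shlex


def build_metaphlan_command(fastq, profile_out, vsc_out, vsc_breadth=None,
                            subsampling=None, subsampling_output=None):
    """Assemble (but do not run) a MetaPhlAn command line as a list of arguments."""
    cmd = ["metaphlan", fastq, "--input_type", "fastq", "-o", profile_out,
           "--profile_vsc", "--vsc_out", vsc_out]
    # Optional parameters are appended only when a value is supplied.
    if vsc_breadth is not None:
        cmd += ["--vsc_breadth", str(vsc_breadth)]
    if subsampling is not None:
        cmd += ["--subsampling", str(subsampling)]
    if subsampling_output is not None:
        cmd += ["--subsampling_output", subsampling_output]
    return cmd


# Hypothetical file names and read count, for illustration only.
cmd = build_metaphlan_command("sample.fastq", "profile.tsv", "vsc.tsv",
                              subsampling=1000000,
                              subsampling_output="sample.sub.fastq")
print(shlex.join(cmd))
```

Building the arguments as a list keeps file names with spaces intact if the command is later passed to subprocess.run.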

Changed features

  • [MetaPhlAn] The average read length is included in the output header with the -t rel_ab_w_read_stats parameter
  • [StrainPhlAn] Quasi-markers behaviour in line with that of MetaPhlAn
  • [StrainPhlAn] sample2markers.py output is now in JSON format
  • [StrainPhlAn] Simplified sample and marker filtering parameters, integrated with primary/secondary samples
  • [StrainPhlAn] Faster inference of small and medium phylogenies
  • [StrainPhlAn] Faster execution of the --print_clades_only parameter

4.0.6

1 year ago

Version 4.0.6 (Mar 1st, 2023)

Changed features

  • [MetaPhlAn] The GTDB taxonomic assignment for the vOct22 database is now available.

4.0.5

1 year ago

Version 4.0.5 (Feb 23rd, 2023)

Database updates

  • We just released the new vOct22 database
  • Addition of ~200k new genomes
  • 3,580 more SGBs than the vJan21
  • 2,548 genomes considered reference genomes in vJan21 were relabelled as MAGs in NCBI -> 1,550 kSGBs in vJan21 are now uSGBs in vOct22
  • Removed redundant reference genomes from the vJan21 genomic database using a MASH distance threshold at 0.1%
  • Local reclustering to improve SGB definitions of oversized or too-close SGBs
  • Improved GGB and FGB definitions by reclustering SGB centroids from scratch
  • Improved phylum assignment of SGBs with no reference genomes at FGB level using MASH distances on amino acids to find the closest kSGB
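
The SGB counts quoted across these release notes can be cross-checked with simple arithmetic. The sketch below assumes that the "more SGBs" figures for vJun23 and vOct22 count the same kind of SGB, and that the 26,970 SGBs reported for version 4.0.0 describe the vJan21 database; neither assumption is stated explicitly in the notes.

```python
# SGB counts stated in these release notes.
sgbs_vjun23 = 36_822              # vJun23 markers database (version 4.1.0)
gain_vjun23_over_voct22 = 6_272   # "6,272 more SGBs than in vOct22"
gain_voct22_over_vjan21 = 3_580   # "3,580 more SGBs than the vJan21"

sgbs_voct22 = sgbs_vjun23 - gain_vjun23_over_voct22
sgbs_vjan21 = sgbs_voct22 - gain_voct22_over_vjan21

print(sgbs_voct22)  # 30550
print(sgbs_vjan21)  # 26970

# Version 4.0.0 reports 21,978 kSGBs and 4,992 uSGBs, i.e. 26,970 SGBs in total.
print(sgbs_vjan21 == 21_978 + 4_992)  # True
```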

Changed features

  • [StrainPhlAn] Improved StrainPhlAn's speed when running with the --print_clades_only option

Missing features

  • [MetaPhlAn] The GTDB taxonomic assignment for the vOct22 database is not available yet (expected release: end of Feb 2023)
  • [MetaPhlAn] The phylogenetic tree of life for the vOct22 database is not available yet (expected release: TBD).

4.0.4

1 year ago

Version 4.0.4 (Jan 17th, 2023)

Changed features

  • [MetaPhlAn] Download of the pre-computed Bowtie2 database is now the default option during installation
  • [StrainPhlAn] Improved performance and speed of StrainPhlAn's sample2markers.py script

Fixes

  • [StrainPhlAn] Fixes error when using --abs_n_samples_threshold in the PhyloPhlAn call

4.0.3

1 year ago

Version 4.0.3 (Oct 24th, 2022)

Changed features

  • [MetaPhlAn] Removal of the NCBI taxID from the merged profiles produced by the merge_metaphlan_profiles.py script
  • [StrainPhlAn] Improved StrainPhlAn's performance in the markers/samples filtering step

Fixes

  • [MetaPhlAn] -t rel_ab_w_read_stats now produces the reads stats also at the SGB level
  • [MetaPhlAn] Fixes overestimation of reads aligned to known clades
  • [MetaPhlAn] Fixes error when not providing the number of reads using SAM files as input
  • [StrainPhlAn] Fixes the "No markers were found for the clade" error raised when executing StrainPhlAn without providing the clade markers FASTA file

4.0.2

1 year ago

Version 4.0.2 (Sep 22nd, 2022)

New features

  • [MetaPhlAn] The new --subsampling parameter allows subsampling of the reads on the fly
  • [MetaPhlAn] The new --subsampling_seed parameter enables a deterministic or randomized subsampling of the reads
  • [MetaPhlAn] The new --gtdb_profiles parameter of the merge_metaphlan_profiles.py script allows merging GTDB-based MetaPhlAn profiles
  • [StrainPhlAn] The new --breadth_thres parameter allows StrainPhlAn to filter the consensus marker sequences after the execution of sample2markers.py
  • [StrainPhlAn] Interactive selection of the available SGBs when the clade is specified at the species level
  • [StrainPhlAn] The new --non_interactive parameter disables user interaction when running StrainPhlAn
  • [StrainPhlAn] The new --abs_n_markers_thres and --abs_n_samples_thres parameters enable the specification of the samples/markers filtering thresholds in absolute numbers
  • [StrainPhlAn] The new --treeshrink parameter enables StrainPhlAn to run TreeShrink for outlier removal in the tree
  • [StrainPhlAn] Addition of the VallesColomerM_2022_Jan21_thresholds.tsv for compatibility with the mpa_vJan21 database
  • [StrainPhlAn] The new --clades parameter enables sample2markers.py to restrict the reconstruction of markers to the specified clades
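
The difference between deterministic and randomized subsampling mentioned for --subsampling_seed can be illustrated with a generic sketch. This is not MetaPhlAn's code: the subsample_reads function, the read names and the seed value are hypothetical, and Python's random module stands in for whatever sampler the tool uses.

```python
import random


def subsample_reads(reads, n, seed=None):
    """Pick n reads without replacement; a fixed seed makes the choice repeatable.

    With seed=None the generator is seeded from system entropy, so repeated
    calls will generally return different subsets.
    """
    rng = random.Random(seed)
    if n >= len(reads):
        # Nothing to subsample: return all reads unchanged.
        return list(reads)
    return rng.sample(reads, n)


reads = [f"read_{i}" for i in range(1000)]
first = subsample_reads(reads, 10, seed=1992)
second = subsample_reads(reads, 10, seed=1992)
print(first == second)  # True
```

Fixing the seed is what makes a subsampled profile reproducible across runs.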

Changed features

  • [StrainPhlAn] The -c parameter of the extract_markers.py script now allows the specification of multiple clades
  • [StrainPhlAn] The --print_clades_only parameter now produces an output print_clades_only.tsv report
  • [StrainPhlAn] Compatibility with clade markers compressed in bz2 format
  • [StrainPhlAn] The strain_transmission.py script now uses the VallesColomerM_2022_Jan21_thresholds.tsv thresholds by default

Fixes

  • [MetaPhlAn] metaphlan2krona.py and hclust2 have been added to the bioconda recipe

4.0.1

1 year ago

Changes in version 4.0.1

  • The new --offline parameter stops MetaPhlAn from automatically checking for updates
  • Fixes "KeyError: 't'" error when running MetaPhlAn with the --CAMI_format_output parameter
  • Improved StrainPhlAn's gaps management with the newest version of PhyloPhlAn (version 3.0.3)
  • Improved set of colors for the plot_tree_graphlan.py script

4.0.0

1 year ago

MetaPhlAn 4 relies on ~5.1M unique clade-specific marker genes identified from ~1M microbial genomes (~236,600 references and 771,500 metagenomic assembled genomes) spanning 26,970 species-level genome bins (SGBs), 4,992 of them taxonomically unidentified at the species level.

What's new in version 4

  • Adoption of the species-level genome bins system (SGBs)
  • New MetaPhlAn marker genes identified from ~1M microbial genomes
  • Ability to profile 21,978 known (kSGBs) and 4,992 unknown (uSGBs) microbial species
  • Better representation of not only the human gut microbiome but also many other animal and ecological environments
  • Estimation of the fraction of the metagenome composed of microbes not included in the database with the --unclassified_estimation parameter
  • Compatibility with MetaPhlAn 3 databases with parameter --mpa3

4.beta.3

1 year ago

MetaPhlAn 4 relies on ~5.1M unique clade-specific marker genes identified from ~1M microbial genomes (~236,600 references and 771,500 metagenomic assembled genomes) spanning 26,970 species-level genome bins (SGBs, http://segatalab.cibio.unitn.it/data/Pasolli_et_al.html), 4,992 of them taxonomically unidentified at the species level.

What was changed in version 4:

  • Adoption of the species-level genome bins system (SGBs)
  • New MetaPhlAn marker genes identified from ~1M microbial genomes
  • Ability to profile 21,978 known (kSGBs) and 4,992 unknown (uSGBs) microbial species
  • Better representation of not only the human gut microbiome but also many other animal and ecological environments
  • Estimation of the fraction of the metagenome composed of microbes not included in the database with the --unclassified_estimation parameter
  • Compatibility with MetaPhlAn 3 databases with parameter --mpa3

3.1.0

1 year ago

MetaPhlAn 3.1 represents a moderate update to the bioBakery 3 software and databases. This update is based on improvements in our ability to align NCBI-sourced microbial genomes and their constituent genes to UniProt resources alongside additional removal of low-quality species (see PMID:33944776 for the definition of “low-quality species”).

What has changed in version 3.1:

  • 433 low-quality species were removed from the MetaPhlAn 3.1 marker database and 2,680 species were added (for a new total of 15,766; a 17% increase).
  • Marker genes for a subset of existing bioBakery 3 species were also revised.
  • Most existing bioBakery 3 species pangenomes were updated with revised or expanded gene content.
  • MetaPhlAn 3.1 software has been updated to work with revised marker database.
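
The 17% figure quoted above can be reproduced from the other numbers in the same bullet. The sketch below assumes the increase is measured as the net change (additions minus removals) relative to the species count before the update; the notes do not spell out how the percentage was computed.

```python
removed = 433        # low-quality species removed
added = 2_680        # species added
new_total = 15_766   # species in the MetaPhlAn 3.1 marker database

previous_total = new_total - added + removed
net_change = added - removed
increase_pct = 100 * net_change / previous_total

print(previous_total)          # 13519
print(net_change)              # 2247
print(round(increase_pct, 1))  # 16.6
```

Under that assumption the increase is about 16.6%, which rounds to the stated 17%.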