DRAM Versions

Distilled and Refined Annotation of Metabolism: A tool for the annotation and curation of function for microbial and viral genomes

v1.5.0

4 months ago

This is the official release of DRAM1.5.0. The 1.5.0 release has significant changes that could impact your research. Please review these changes and help us validate this release!

Install / upgrade:

If DRAM is installed with Bioconda, it can be upgraded like any Conda package. Note that the Conda package for dram may be delayed slightly while it is validated, but it should be available within a day or two of the release.

If you already have a DRAM environment and want to upgrade:

# Activate your old DRAM environment first!
# Save your old config
DRAM-setup.py export_config > my_old_config.txt
# install DRAM
wget https://raw.githubusercontent.com/shafferm/DRAM/master/environment.yaml
conda env update -f environment.yaml -n DRAM --prune
# import your old databases
DRAM-setup.py import_config --config_loc  my_old_config.txt

If you are using an old database, like in the example above, you may need to check out a special version of dram from GitHub.

git clone https://github.com/WrightonLabCSU/DRAM.git
cd DRAM
git checkout dbcan_no_ec
conda env update -f environment.yaml -n DRAM --prune
conda activate DRAM
conda install pip
pip install ./

To install DRAM in a new Conda environment, follow the instructions in the README.

Change log DRAM1.5.0:

  1. DRAM annotate can now use a new, optional database: CAMPER (Curated Annotations for Microbial (Poly)phenol Enzymes and Reactions).

Please visit the CAMPER GitHub for more information: https://github.com/WrightonLabCSU/CAMPER

  2. Temporary mmseqs files that accumulate during annotation are now removed immediately after a given sample has been processed. Before, these files were removed only after all samples were annotated. This reduces the storage space needed for a given DRAM run.

  3. Fixed a "scikit-bio" related error. This error arose when scikit-bio was updated. While always using the latest version of a piece of software can be important for security updates, the stability of DRAM is our main concern. To solve this, we have explicitly stated the version of each dependency within the environment.yaml file.
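
Whether every dependency is pinned can be checked mechanically. The following is a minimal Python sketch, not part of DRAM, that flags conda-style dependency specs lacking an exact version. The regular expression, the function name `unpinned`, and the example list are assumptions made for illustration; they do not reflect the actual contents of environment.yaml.

```python
import re

# Matches conda-style exact pins such as "pkg=1.2.3" or "pkg==1.2.3",
# optionally followed by a build string ("pkg=1.2.3=py310_0").
# Bare names, range specs (>=, <=) and wildcards (*) are treated as unpinned.
PINNED = re.compile(r"^[A-Za-z0-9_.\-]+={1,2}[0-9][A-Za-z0-9_.]*(=[A-Za-z0-9_.]+)?$")


def unpinned(dependencies):
    """Return the dependency specs that do not state an exact version."""
    return [spec for spec in dependencies if not PINNED.match(spec.strip())]


# Hypothetical dependency list, NOT the contents of DRAM's environment.yaml.
example = ["python=3.10.8", "scikit-bio==0.5.7", "pandas", "numpy>=1.20", "scipy=1.8.*"]
print(unpinned(example))  # ['pandas', 'numpy>=1.20', 'scipy=1.8.*']
```

An empty result would mean that every listed dependency carries an exact version, which is the property the change above aims for.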

v1.4.6

1 year ago

This is the official release of DRAM1.4.6. The 1.4.0 release has significant changes that could impact your research. The 1.4.4 point release is less significant, but still important for DRAM-v and DRAM users. DRAM 1.4.5 and 1.4.6 are bug fix releases, so there is no new information. Please review these changes and help us validate this release!

Install / upgrade:

If DRAM is installed with Bioconda, it can be upgraded like any Conda package. Note that the Conda package for dram may be delayed slightly while it is validated, but it should be available within a day or two of the release.

If you already have a DRAM environment and want to upgrade:

# Activate your old DRAM environment first!
# Save your old config
DRAM-setup.py export_config > my_old_config.txt
# install DRAM
wget https://raw.githubusercontent.com/shafferm/DRAM/master/environment.yaml
conda env update -f environment.yaml -n DRAM --prune
# import your old databases
DRAM-setup.py import_config --config_loc  my_old_config.txt

If you are using an old database, like in the example above, you may need to check out a special version of dram from GitHub.

git clone https://github.com/WrightonLabCSU/DRAM.git
cd DRAM
git checkout dbcan_no_ec
conda env update -f environment.yaml -n DRAM --prune
conda activate DRAM
conda install pip
pip install ./

To install DRAM in a new Conda environment, follow the instructions in the README.

Point Release Update:

1.4.6:

  1. In response to changes in RefSeq viral, the number of viral files has been changed from 2 to 1.

1.4.5:

  1. Bug fix related to the default values of the config file. Specifically, the CONFIG file retained information from testing that could interfere with the setup process if the paths were not overwritten by new databases.
  2. Added a unit test to check that the CONFIG file that is committed to GitHub is compatible in the future.

1.4.4:

  1. Bug fixes have been made to the setup script to support the many ways the DRAM databases get built. You will see them in the merge history.
  2. Previously, the DRAM-v AMG summary did not add match data for AMGs that were matched to the AMG Database only. This was confusing, and so now information relevant to the AMG Database is in the AMG summary along with the Metabolic Database. This adds the new columns "metabolism", "reference", and "verified", and the "gene_id_origin" field which tells you where this Gene ID came from. Remember that a sequence can match to more than one sequence and this is more common in the AMG Database, so your AMG Summary will be longer and contain more duplicates.
  3. DRAM1.4.X collects subfamily EC numbers for the raw annotations, but does not use them in the distillation process. We have future plans for these EC numbers, but in the meantime this makes it impossible to use older versions of the DRAM databases with the newer DRAM1.4.X. This is not ideal, as we do strive for backwards compatibility. Sadly, the only solution at this time is a branch that does not look for the EC numbers. Use the instructions above or in the README to install the dbcan_no_ec branch from git.
  4. Most output arguments are now required, with only a few exceptions. Most people will not notice this.

Change log DRAM1.4.0:

  1. DRAM distill now includes a new metabolism for methylation. Although this was planned for DRAM2, you can already include this tool in annotation and distillation, provided you follow the instructions below.

    In order to distill with methyl, you need only download the new FASTA file and point to it with the dram custom database options that were introduced in DRAM1.3. Note that in order to distill correctly, you will need to use the correct name ‘methyl’ and must use DRAM 1.4.

    To Annotate with methyl, do something like:

    wget https://raw.githubusercontent.com/shafferm/DRAM/master/data/methylotrophy/methylotrophy.faa
    DRAM.py annotate -i '/some/path/*.fasta' -o dram_output --threads 30 --custom_db_name methyl --custom_fasta_loc methylotrophy.faa
    

    To Distill with methyl:

    wget https://raw.githubusercontent.com/shafferm/DRAM/master/data/methylotrophy/methylotrophy_distillate.tsv
    DRAM.py distill -i dram_output/annotations.tsv -o dram_output/distillate --custom_distillate methylotrophy_distillate.tsv
    

    Learn more about custom databases in the Wiki.

  2. Glycoside hydrolase subfamily calls. Subfamily calls are now being incorporated into annotations through changes in databases and code. This impacts what gets pulled into the distillate and product, because these look for the family level (e.g. AA1), not the subfamily level (e.g. AA1_1, AA2_2).

    In response, DRAM is changing the output of the dbCAN database in DRAM1.4. Raw cazyme subfamilies will be output into the cazy_id column, and the corresponding description for the cazyme family will be put into the cazy_hit column.

    The Distillation in DRAM1.4 will count cazymes marked at the subfamily level on the family level. This means that for cazyme family AA1 there will be 4 entries in the distillate (AA1, AA1_1, AA1_2, and AA1_3), and the sum of these four will be the total number of AA1 cazymes. In DRAM1.3 and earlier, the distillate entry for this example, AA1 with no underscore, would include cazymes that can be assigned to family AA1 but do not have a subfamily designation.

    The DRAM Product will also count cazymes at the family level. For the AA1 example, AA1_1, AA1_2, and AA1_3 will be counted as AA1 for the current rules in assigning cazymes to compounds.

  3. More changes are also being made that will affect CAZY IDs in DRAM1.4. The cutoff e-value is being changed to 1e-18 to conform to best practices for the database.

    DRAM1.4 also introduces a new column for the best hit per gene from the dbCAN database, named cazy_best_hit. This column will be the match to the gene that has the highest coverage and lowest full-sequence e-value as calculated by mmseqs, with priority on e-value. cazy_best_hit will be the only column considered downstream in the distillate and product. DRAM1.3 pulls and counts all dbCAN hits that pass the e-value cutoff of 1e-15, rather than profiling best hits.

    A new column corresponding to EC number information from subfamilies, named cazy_subfamily_ec, has been added in DRAM1.4. These EC numbers will also be used as part of the distillate along with those from KEGG, as part of pathways and other tools. For now, incomplete EC numbers will be included, but not considered for the distillate. The subfamilies will be excluded from the product in order to facilitate its goal of being a larger overview.

  4. Logging is now fully implemented in DRAM1.4. Log files will be created for almost all DRAM functions. The log file for annotations will appear in the annotations folder by default, and the log file for the DRAM distillation will by default be in the distillation folder. You can also use the --log_file_path argument to set the log path. The log file for database processing is set by the config file, and by default it will be in the databases directory. All content that DRAM prints to the command line will appear in the log file.

  5. The dram config now stores when databases were downloaded, citation information and version information when applicable. This information is printed to the log at the beginning of each run. The old format can still be imported if you want to keep your DRAM1.3 databases.

  6. In 1.4 you can set a config file to use in dram annotation and distillation at run time in 2 ways. (1) use --config_loc with DRAM.py or DRAM-v.py or (2) set the environment variable DRAM_CONFIG_LOCATION. This will not store or import the config, and that config will only be used for that run.

  7. Significant Bug fixes are also included in this release.

  • When the input FASTA files contain duplicate header names, the DRAM annotate step should fail with an error immediately, not at the end of the annotation process. This will save some people a lot of time. It may be that this is only a problem for annotating genomes, but in any case the check must be in place across workflows.
  • Some users have firewalls on their HPC environments that prevent downloads via FTP; in some cases, converting to HTTP can solve download problems. In DRAM1.4, if FTP links fail, a back-up HTTP link will be attempted before an error is thrown. See issue #206.
  • DRAM1.4 will ensure that if no databases are downloaded, DRAM setup will still work. Previously, some databases depended on data being downloaded and could not be set up with a provided data set.
  • Reduced unnecessary warnings in various repetitive tasks in DRAM distillation by refactoring pandas code.
  • BIO-RELATED: This bug fix could affect biology. In the past, the counting of EC numbers was inconsistent. When counting the EC numbers in a row of the annotations file, duplicates were not counted; however, when counting the EC numbers for the full set of data, the count included such duplicates. This is now corrected, but it could have some small unexpected downstream effects.
  • Glycoside hydrolase subfamily calls.
  • In response to issue #122, you can now pass a config file at run time or by setting the environment variable DRAM_CONFIG_LOCATION. Read more in the Wiki.
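
The fail-fast behaviour for duplicate FASTA headers described in the first bug fix above can be illustrated with a short Python sketch. This is not DRAM's code: the function name `check_unique_headers` is hypothetical, and the sketch assumes that a header's ID is the text between ">" and the first whitespace.

```python
def check_unique_headers(fasta_lines):
    """Raise ValueError at the first repeated FASTA header ID.

    Returns the number of distinct headers when no duplicate is found.
    """
    seen = set()
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # sequence line, not a header
        # Assumption: the ID is the header text up to the first whitespace.
        fields = line[1:].split()
        header_id = fields[0] if fields else ""
        if header_id in seen:
            raise ValueError(f"Duplicate FASTA header: {header_id}")
        seen.add(header_id)
    return len(seen)


# Usage with in-memory lines; a real run would iterate over an open file.
records = [">scaffold_1 sample A", "ACGT", ">scaffold_2", "GGCC"]
print(check_unique_headers(records))  # 2
```

Running a check like this before any annotation work starts is what turns a late failure into an immediate one.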

Known issues:

  • Speed and memory remain a big problem for DRAM and the estimates in the wiki and other documentation are woefully out of date. Fixing this is a major priority.
  • The annotation merging tool lacks sufficient checks, and fails when files are missing.
  • Code coverage remains low, especially for the less prominent tools.

v1.4.5

1 year ago

This is the official release of DRAM1.4.5. The 1.4.0 release has significant changes that could impact your research. The 1.4.4 point release is less significant, but still important for dram-v and dram users. DRAM 1.4.5 is a bug fix release so there is no new information. Please review these changes and help us validate this release!

Install / upgrade:

If DRAM is installed with Bioconda, it can be upgraded like any Conda package. Note that the Conda package for dram may be delayed slightly while it is validated, but it should be available within a day or two of the release.

If you already have a DRAM environment and want to upgrade:

# Activate your old DRAM environment first!
# Save your old config
DRAM-setup.py export_config > my_old_config.txt
# install DRAM
wget https://raw.githubusercontent.com/shafferm/DRAM/master/environment.yaml
conda env update -f environment.yaml -n DRAM --prune
# import your old databases
DRAM-setup.py import_config --config_loc  my_old_config.txt

If you are using an old database, like in the example above, you may need to check out a special version of dram from GitHub.

git clone https://github.com/WrightonLabCSU/DRAM.git
cd DRAM
git checkout dbcan_no_ec
conda env update -f environment.yaml -n DRAM --prune
conda activate DRAM
conda install pip
pip install ./

To install DRAM in a new Conda environment, follow the instructions in the README.

Point Release Update:

1.4.6:

  1. In response to changes in RefSeq viral, the number of viral files has been changed from 2 to 1.

1.4.5:

  1. Bug fix related to the default values of the config file. Specifically, the CONFIG file retained information from testing that could interfere with the setup process if the paths were not overwritten by new databases.
  2. Added a unit test to check that the CONFIG file that is committed to GitHub is compatible in the future.

1.4.4:

  1. Bug fixes have been made to the setup script to support the many ways the DRAM databases get built. You will see them in the merge history.
  2. Previously, the DRAM-v AMG summary did not add match data for AMGs that were matched to the AMG Database only. This was confusing, and so now information relevant to the AMG Database is in the AMG summary along with the Metabolic Database. This adds the new columns "metabolism", "reference", and "verified", and the "gene_id_origin" field which tells you where this Gene ID came from. Remember that a sequence can match to more than one sequence and this is more common in the AMG Database, so your AMG Summary will be longer and contain more duplicates.
  3. DRAM1.4.X collects subfamily EC numbers for the raw annotations, but does not use them in the distillation process. We have future plans for these EC numbers, but in the meantime this makes it impossible to use older versions of the DRAM databases with the newer DRAM1.4.X. This is not ideal, as we do strive for backwards compatibility. Sadly, the only solution at this time is a branch that does not look for the EC numbers. Use the instructions above or in the README to install the dbcan_no_ec branch from git.
  4. Most output arguments are now required, with only a few exceptions. Most people will not notice this.

Change log DRAM1.4.0:

  1. DRAM distill now includes a new metabolism for methylation. Although this was planned for DRAM2, you can already include this tool in annotation and distillation, provided you follow the instructions below.

    In order to distill with methyl, you need only download the new FASTA file and point to it with the dram custom database options that were introduced in DRAM1.3. Note that in order to distill correctly, you will need to use the correct name ‘methyl’ and must use DRAM 1.4.

    To Annotate with methyl, do something like:

    wget https://raw.githubusercontent.com/shafferm/DRAM/master/data/methylotrophy/methylotrophy.faa
    DRAM.py annotate -i '/some/path/*.fasta' -o dram_output --threads 30 --custom_db_name methyl --custom_fasta_loc methylotrophy.faa
    

    To Distill with methyl:

    wget https://raw.githubusercontent.com/shafferm/DRAM/master/data/methylotrophy/methylotrophy_distillate.tsv
    DRAM.py distill -i dram_output/annotations.tsv -o dram_output/distillate --custom_distillate methylotrophy_distillate.tsv
    

    Learn more about custom databases in the Wiki.

  2. Glycoside hydrolase subfamily calls. Subfamily calls are now being incorporated into annotations through changes in databases and code. This impacts what gets pulled into the distillate and product, because these look for the family level (e.g. AA1), not the subfamily level (e.g. AA1_1, AA2_2).

    In response, DRAM is changing the output of the dbCAN database in DRAM1.4. Raw cazyme subfamilies will be output into the cazy_id column, and the corresponding description for the cazyme family will be put into the cazy_hit column.

    The Distillation in DRAM1.4 will count cazymes marked at the subfamily level on the family level. This means that for cazyme family AA1 there will be 4 entries in the distillate (AA1, AA1_1, AA1_2, and AA1_3), and the sum of these four will be the total number of AA1 cazymes. In DRAM1.3 and earlier, the distillate entry for this example, AA1 with no underscore, would include cazymes that can be assigned to family AA1 but do not have a subfamily designation.

    The DRAM Product will also count cazymes at the family level. For the AA1 example, AA1_1, AA1_2, and AA1_3 will be counted as AA1 for the current rules in assigning cazymes to compounds.

  3. More changes are also being made that will affect CAZY IDs in DRAM1.4. The cutoff e-value is being changed to 1e-18 to conform to best practices for the database.

    DRAM1.4 also introduces a new column for the best hit per gene from the dbCAN database, named cazy_best_hit. This column will be the match to the gene that has the highest coverage and lowest full-sequence e-value as calculated by mmseqs, with priority on e-value. cazy_best_hit will be the only column considered downstream in the distillate and product. DRAM1.3 pulls and counts all dbCAN hits that pass the e-value cutoff of 1e-15, rather than profiling best hits.

    A new column corresponding to EC number information from subfamilies, named cazy_subfamily_ec, has been added in DRAM1.4. These EC numbers will also be used as part of the distillate along with those from KEGG, as part of pathways and other tools. For now, incomplete EC numbers will be included, but not considered for the distillate. The subfamilies will be excluded from the product in order to facilitate its goal of being a larger overview.

  4. Logging is now fully implemented in DRAM1.4. Log files will be created for almost all DRAM functions. The log file for annotations will appear in the annotations folder by default, and the log file for the DRAM distillation will by default be in the distillation folder. You can also use the --log_file_path argument to set the log path. The log file for database processing is set by the config file, and by default it will be in the databases directory. All content that DRAM prints to the command line will appear in the log file.

  5. The dram config now stores when databases were downloaded, citation information and version information when applicable. This information is printed to the log at the beginning of each run. The old format can still be imported if you want to keep your DRAM1.3 databases.

  6. In 1.4 you can set a config file to use in dram annotation and distillation at run time in 2 ways. (1) use --config_loc with DRAM.py or DRAM-v.py or (2) set the environment variable DRAM_CONFIG_LOCATION. This will not store or import the config, and that config will only be used for that run.

  7. Significant Bug fixes are also included in this release.

  • When the input FASTA files contain duplicate header names, the DRAM annotate step should fail with an error immediately, not at the end of the annotation process. This will save some people a lot of time. It may be that this is only a problem for annotating genomes, but in any case the check must be in place across workflows.
  • Some users have firewalls on their HPC environments that prevent downloads via FTP; in some cases, converting to HTTP can solve download problems. In DRAM1.4, if FTP links fail, a back-up HTTP link will be attempted before an error is thrown. See issue #206.
  • DRAM1.4 will ensure that if no databases are downloaded, DRAM setup will still work. Previously, some databases depended on data being downloaded and could not be set up with a provided data set.
  • Reduced unnecessary warnings in various repetitive tasks in DRAM distillation by refactoring pandas code.
  • BIO-RELATED: This bug fix could affect biology. In the past, the counting of EC numbers was inconsistent. When counting the EC numbers in a row of the annotations file, duplicates were not counted; however, when counting the EC numbers for the full set of data, the count included such duplicates. This is now corrected, but it could have some small unexpected downstream effects.
  • Glycoside hydrolase subfamily calls.
  • In response to issue #122, you can now pass a config file at run time or by setting the environment variable DRAM_CONFIG_LOCATION. Read more in the Wiki.
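
The EC counting inconsistency described in the BIO-RELATED item above can be made concrete with a small Python sketch. It is an illustration only: the release notes do not say which convention DRAM adopted, so the sketch assumes that an EC number repeated within one annotation row is counted once, both per row and in the overall total. The function names and the example rows are hypothetical.

```python
from collections import Counter


def row_ec_count(ecs):
    """Number of distinct EC numbers in one annotation row."""
    return len(set(ecs))


def total_ec_counts(rows):
    """Count EC numbers over all rows, counting each EC once per row."""
    counts = Counter()
    for ecs in rows:
        counts.update(set(ecs))  # de-duplicate within the row first
    return counts


# Hypothetical rows; the first row lists the same EC number twice.
rows = [["EC:1.1.1.1", "EC:1.1.1.1", "EC:2.7.1.1"], ["EC:1.1.1.1"]]

per_row_total = sum(row_ec_count(r) for r in rows)
overall_total = sum(total_ec_counts(rows).values())
print(per_row_total, overall_total)  # 3 3
```

With the same de-duplication rule applied in both places, the per-row counts and the full-set count agree. Counting the raw lists for the full set would instead give 4 here, which is the kind of mismatch the fix removes.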

Known issues:

  • Speed and memory remain a big problem for DRAM and the estimates in the wiki and other documentation are woefully out of date. Fixing this is a major priority.
  • The annotation merging tool lacks sufficient checks, and fails when files are missing.
  • Code coverage remains low, especially for the less prominent tools.

v1.4.4

1 year ago

This is the official release of DRAM1.4.4. The 1.4.0 release has significant changes that could impact your research. The 1.4.4 point release is less significant, but still important for dram-v and dram users. Please review these changes and help us validate this release!

Install / upgrade:

If DRAM is installed with Bioconda, it can be upgraded like any Conda package. Note that the Conda package for dram may be delayed slightly while it is validated, but it should be available within a day or two of the release.

If you already have a DRAM environment and want to upgrade:

# Activate your old DRAM environment first!
# Save your old config
DRAM-setup.py export_config > my_old_config.txt
# install DRAM
wget https://raw.githubusercontent.com/shafferm/DRAM/master/environment.yaml
conda env update -f environment.yaml -n DRAM --prune
# import your old databases
DRAM-setup.py import_config --config_loc  my_old_config.txt

If you are using an old database, like in the example above, you may need to check out a special version of dram from GitHub.

git clone https://github.com/WrightonLabCSU/DRAM.git
cd DRAM
git checkout dbcan_no_ec
conda env update -f environment.yaml -n DRAM --prune
conda activate DRAM
conda install pip
pip install ./

To install DRAM in a new Conda environment, follow the instructions in the README.

Change log DRAM1.4.4 addendum:

  1. Bug fixes have been made to the setup script to support the many ways the DRAM databases get built. You will see them in the merge history.
  2. Previously, the DRAM-v AMG summary did not add match data for AMGs that were matched to the AMG Database only. This was confusing, and so now information relevant to the AMG Database is in the AMG summary along with the Metabolic Database. This adds the new columns "metabolism", "reference", and "verified", and the "gene_id_origin" field which tells you where this Gene ID came from. Remember that a sequence can match to more than one sequence and this is more common in the AMG Database, so your AMG Summary will be longer and contain more duplicates.
  3. DRAM1.4.X collects subfamily EC numbers for the raw annotations, but does not use them in the distillation process. We have future plans for these EC numbers, but in the meantime this makes it impossible to use older versions of the DRAM databases with the newer DRAM1.4.X. This is not ideal, as we do strive for backwards compatibility. Sadly, the only solution at this time is a branch that does not look for the EC numbers. Use the instructions above or in the README to install the dbcan_no_ec branch from git.
  4. Most output arguments are now required, with only a few exceptions. Most people will not notice this.

Change log DRAM1.4.0:

  1. DRAM distill now includes a new metabolism for methylation. Although this was planned for DRAM2, you can already include this tool in annotation and distillation, provided you follow the instructions below.

    In order to distill with methyl, you need only download the new FASTA file and point to it with the dram custom database options that were introduced in DRAM1.3. Note that in order to distill correctly, you will need to use the correct name ‘methyl’ and must use DRAM 1.4.

    To Annotate with methyl, do something like:

    wget https://raw.githubusercontent.com/shafferm/DRAM/master/data/methylotrophy/methylotrophy.faa
    DRAM.py annotate -i '/some/path/*.fasta' -o dram_output --threads 30 --custom_db_name methyl --custom_fasta_loc methylotrophy.faa
    

    To Distill with methyl:

    wget https://raw.githubusercontent.com/shafferm/DRAM/master/data/methylotrophy/methylotrophy_distillate.tsv
    DRAM.py distill -i dram_output/annotations.tsv -o dram_output/distillate --custom_distillate methylotrophy_distillate.tsv
    

    Learn more about custom databases in the Wiki.

  2. Glycoside hydrolase subfamily calls. Subfamily calls are now being incorporated into annotations through changes in databases and code. This impacts what gets pulled into the distillate and product, because these look for the family level (e.g. AA1), not the subfamily level (e.g. AA1_1, AA2_2).

    In response, DRAM is changing the output of the dbCAN database in DRAM1.4. Raw cazyme subfamilies will be output into the cazy_id column, and the corresponding description for the cazyme family will be put into the cazy_hit column.

    The Distillation in DRAM1.4 will count cazymes marked at the subfamily level on the family level. This means that for cazyme family AA1 there will be 4 entries in the distillate (AA1, AA1_1, AA1_2, and AA1_3), and the sum of these four will be the total number of AA1 cazymes. In DRAM1.3 and earlier, the distillate entry for this example, AA1 with no underscore, would include cazymes that can be assigned to family AA1 but do not have a subfamily designation.

    The DRAM Product will also count cazymes at the family level. For the AA1 example, AA1_1, AA1_2, and AA1_3 will be counted as AA1 for the current rules in assigning cazymes to compounds.

  3. More changes are also being made that will affect CAZY IDs in DRAM1.4. The cutoff e-value is being changed to 1e-18 to conform to best practices for the database.

    DRAM1.4 also introduces a new column for the best hit per gene from the dbCAN database, named cazy_best_hit. This column will be the match to the gene that has the highest coverage and lowest full-sequence e-value as calculated by mmseqs, with priority on e-value. cazy_best_hit will be the only column considered downstream in the distillate and product. DRAM1.3 pulls and counts all dbCAN hits that pass the e-value cutoff of 1e-15, rather than profiling best hits.

    A new column corresponding to EC number information from subfamilies, named cazy_subfamily_ec, has been added in DRAM1.4. These EC numbers will also be used as part of the distillate along with those from KEGG, as part of pathways and other tools. For now, incomplete EC numbers will be included, but not considered for the distillate. The subfamilies will be excluded from the product in order to facilitate its goal of being a larger overview.

  4. Logging is now fully implemented in DRAM1.4. Log files will be created for almost all DRAM functions. The log file for annotations will appear in the annotations folder by default, and the log file for the DRAM distillation will by default be in the distillation folder. You can also use the --log_file_path argument to set the log path. The log file for database processing is set by the config file, and by default it will be in the databases directory. All content that DRAM prints to the command line will appear in the log file.

  5. The dram config now stores when databases were downloaded, citation information and version information when applicable. This information is printed to the log at the beginning of each run. The old format can still be imported if you want to keep your DRAM1.3 databases.

  6. In 1.4 you can set a config file to use in dram annotation and distillation at run time in 2 ways. (1) use --config_loc with DRAM.py or DRAM-v.py or (2) set the environment variable DRAM_CONFIG_LOCATION. This will not store or import the config, and that config will only be used for that run.

  7. Significant Bug fixes are also included in this release.

  • When the input FASTA files contain duplicate header names, the DRAM annotate step should fail with an error immediately, not at the end of the annotation process. This will save some people a lot of time. It may be that this is only a problem for annotating genomes, but in any case the check must be in place across workflows.
  • Some users have firewalls on their HPC environments that prevent downloads via FTP; in some cases, converting to HTTP can solve download problems. In DRAM1.4, if FTP links fail, a back-up HTTP link will be attempted before an error is thrown. See issue #206.
  • DRAM1.4 will ensure that if no databases are downloaded, DRAM setup will still work. Previously, some databases depended on data being downloaded and could not be set up with a provided data set.
  • Reduced unnecessary warnings in various repetitive tasks in DRAM distillation by refactoring pandas code.
  • BIO-RELATED: This bug fix could affect biology. In the past, the counting of EC numbers was inconsistent. When counting the EC numbers in a row of the annotations file, duplicates were not counted; however, when counting the EC numbers for the full set of data, the count included such duplicates. This is now corrected, but it could have some small unexpected downstream effects.
  • Glycoside hydrolase subfamily calls.
  • In response to issue #122, you can now pass a config file at run time or by setting the environment variable DRAM_CONFIG_LOCATION. Read more in the Wiki.
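
The family-level counting of glycoside hydrolase subfamily calls described in change 2 above can be sketched in a few lines of Python. This is an illustration, not DRAM's implementation: it assumes a subfamily ID is the family ID followed by an underscore and a suffix (as in AA1_1), and the function names and example hits are hypothetical.

```python
from collections import Counter


def family_of(cazy_id):
    """Map a subfamily ID such as 'AA1_2' to its family 'AA1'."""
    return cazy_id.split("_", 1)[0]


def family_totals(cazy_ids):
    """Count hits per family, folding subfamily calls into their family."""
    return Counter(family_of(cazy_id) for cazy_id in cazy_ids)


# Hypothetical cazy_id values for one genome.
hits = ["AA1", "AA1_1", "AA1_1", "AA1_2", "AA1_3", "GH5_2"]

per_id = Counter(hits)        # one entry per family or subfamily ID
totals = family_totals(hits)  # subfamilies summed into their family

print(per_id["AA1"], per_id["AA1_1"], per_id["AA1_2"], per_id["AA1_3"])  # 1 2 1 1
print(totals["AA1"])  # 5
```

The four AA1 entries (AA1, AA1_1, AA1_2, AA1_3) sum to the family total, matching the description of the DRAM1.4 distillate.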

Known issues:

  • Speed and memory remain a big problem for DRAM and the estimates in the wiki and other documentation are woefully out of date. Fixing this is a major priority.
  • The annotation merging tool lacks sufficient checks, and fails when files are missing.
  • Code coverage remains low, especially for the less prominent tools.

v1.4.0

1 year ago

This is the official release of DRAM1.4.0. The 1.4.0 release has significant changes that could impact your research. Please review these changes and help us validate this release!

Install / upgrade:

If DRAM is installed with Bioconda, it can be upgraded like any Conda package. Note that the Conda package for dram may be delayed slightly while it is validated, but it should be available within a day or two of the release.

If you already have a DRAM environment and want to upgrade:

# Activate your old DRAM environment first!
# Save your old config
DRAM-setup.py export_config > my_old_config.txt
# install DRAM
wget https://raw.githubusercontent.com/shafferm/DRAM/master/environment.yaml
conda env update -f environment.yaml -n DRAM --prune
# import your old databases
DRAM-setup.py import_config --config_loc  my_old_config.txt

To install DRAM in a new Conda environment, follow the instructions in the README.

Change log:

  1. DRAM distill now includes a new metabolism for methylation. Although this was planned for DRAM2, you can already include this tool in annotation and distillation, provided you follow the instructions below.

    In order to distill with methyl, you need only download the new FASTA file and point to it with the dram custom database options that were introduced in DRAM1.3. Note that in order to distill correctly, you will need to use the correct name ‘methyl’ and must use DRAM 1.4.

    To Annotate with methyl, do something like:

    wget https://raw.githubusercontent.com/shafferm/DRAM/master/data/methylotrophy/methylotrophy.faa
    DRAM.py annotate -i '/some/path/*.fasta' -o dram_output --threads 30 --custom_db_name methyl --custom_fasta_loc methylotrophy.faa
    

    To distill with methyl:

    wget https://raw.githubusercontent.com/shafferm/DRAM/master/data/methylotrophy/methylotrophy_distillate.tsv
    DRAM.py distill -i dram_output/annotations.tsv -o dram_output/distillate --custom_distillate methylotrophy_distillate.tsv
    

    Learn more about custom databases in the Wiki.

  2. Glycoside hydrolase subfamily calls. Subfamily calls are now incorporated into annotations through changes in the databases and code. This affects what gets pulled into the distillate and product, because both look for family-level IDs (e.g. AA1), not subfamily-level IDs (e.g. AA1_1, AA2_2).

    In response, DRAM is changing the output of the dbCAN database in DRAM1.4. Raw cazyme subfamilies will be output into the cazy_id column, and the corresponding description for the cazyme family will be put into the cazy_hit column.

    The distillation in DRAM1.4 will count cazymes marked at the subfamily level at the family level. For cazyme family AA1, this means there will be four entries in the distillate (AA1, AA1_1, AA1_2, and AA1_3), and the sum of these four will be the total number of AA1 cazymes. In DRAM1.3 and earlier, the distillate entry AA1 (with no underscore) in this example would include cazymes that can be assigned to family AA1 but do not have a subfamily designation.

    The DRAM product will also count cazymes at the family level. For the AA1 example, AA1_1, AA1_2, and AA1_3 will be counted as AA1 under the current rules for assigning cazymes to compounds.

  3. Further changes in DRAM1.4 also affect CAZy IDs. The e-value cutoff is being changed to 1e-18 to conform to best practices for the database.

    DRAM1.4 also introduces a new column, cazy_best_hit, for the best hit per gene from the dbCAN database. This column holds the match to the gene that has the highest coverage and lowest full-sequence e-value as calculated by mmseqs, with priority given to e-value. cazy_best_hit will be the only column considered downstream in the distillate and product. By contrast, DRAM1.3 pulls and counts all dbCAN hits that pass an e-value cutoff of 1e-15, rather than selecting best hits.

    A new column named cazy_subfamily_ec, holding EC number information from subfamilies, has been added in DRAM1.4. These EC numbers will also be used in the distillate along with those from KEGG, as part of pathways and other tools. For now, incomplete EC numbers will be included but not considered for the distillate. The subfamilies will be excluded from the product, in keeping with its goal of giving a broader overview.

  4. Logging is now fully implemented in DRAM1.4. Log files will be created for almost all DRAM functions. By default, the log file for annotations will appear in the annotations folder, and the log file for DRAM distillation will appear in the distillation folder. You can also use the --log_file_path argument to set the log path. The log file for database processing is set by the config file, and by default it will be in the databases directory. All content that DRAM prints to the command line will also appear in the log file.

  5. The DRAM config now stores when databases were downloaded, along with citation and version information where applicable. This information is printed to the log at the beginning of each run. The old format can still be imported if you want to keep your DRAM1.3 databases.

  6. In DRAM1.4 you can set the config file used for DRAM annotation and distillation at run time in two ways: (1) use --config_loc with DRAM.py or DRAM-v.py, or (2) set the environment variable DRAM_CONFIG_LOCATION. Neither option stores or imports the config; it is used only for that run.

  7. Significant bug fixes are also included in this release.

    • When the input FASTA files contain duplicate header names, the DRAM annotate step should now fail with an error immediately rather than at the end of the annotation process, which will save some users a lot of time. This may only be a problem when annotating genomes, but in any case the check must be in place across workflows.
    • Some users have firewalls on their HPC environments that prevent downloads via FTP; in some cases, switching to HTTP can solve the download problems. In DRAM1.4, if FTP links fail, a backup HTTP link will be attempted before an error is thrown. See issue #206.
    • DRAM1.4 will ensure that DRAM setup still works if no databases are downloaded. Previously, some databases depended on data being downloaded and could not be set up with a provided data set.
    • Reduced unnecessary warnings in various repetitive tasks in DRAM distillation by refactoring pandas code.
    • BIO-RELATED: This bug fix could affect biological interpretation. Previously, EC numbers were counted inconsistently: duplicates were ignored when counting the EC numbers in a single row of the annotations file, but were included when counting EC numbers across the full data set. This is now corrected, but it could have some small unexpected downstream effects.
    • Glycoside hydrolase subfamily calls.
    • In response to issue #122, you can now specify a config file at run time, either by passing it on the command line or by setting the environment variable DRAM_CONFIG_LOCATION. Read more in the Wiki.
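
The family-level counting of cazyme subfamilies described in change 2 can be sketched as follows. This is only an illustration under stated assumptions, not DRAM's implementation: the helper names are hypothetical, and the sketch assumes a subfamily ID is always the family ID followed by an underscore and a suffix, as in the AA1_1 example.

```python
from collections import Counter


def cazy_family(cazy_id):
    """Return the family part of a CAZy ID, e.g. 'AA1_2' -> 'AA1'."""
    return cazy_id.split("_", 1)[0]


def count_by_family(cazy_ids):
    """Count hits per family, folding subfamily IDs into their family."""
    return Counter(cazy_family(cazy_id) for cazy_id in cazy_ids)


# Hypothetical hits for one genome: one family-level AA1 call, four
# AA1 subfamily calls, and one GH5 subfamily call.
hits = ["AA1", "AA1_1", "AA1_2", "AA1_3", "AA1_3", "GH5_2"]
family_counts = count_by_family(hits)
print(family_counts["AA1"])  # 5
print(family_counts["GH5"])  # 1
```

Counting only the exact ID AA1 here would give 1, which corresponds to the behavior described for DRAM1.3 and earlier.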

Known issues:

  • Speed and memory remain a big problem for DRAM and the estimates in the wiki and other documentation are woefully out of date. Fixing this is a major priority.
  • The annotation merging tool lacks sufficient checks, and fails when files are missing.
  • Code coverage remains low, especially for the less prominent tools.

v1.4.0.rc1

1 year ago

This is the first release candidate of DRAM1.4.0. The 1.4.0 release has significant changes that could impact your research. Please review these changes and help us validate this release!

Install / upgrade:

In a few weeks, DRAM will be updated in Bioconda and can then be upgraded like any Conda package. You will still be able to install DRAM1.3.5 with the traditional Conda method outlined in the README, but for early adoption you will need to use the installation method below. This method has also been added to the README under Install Release Candidate.

To install a potentially unstable release candidate of DRAM, use the set of commands below that suits your situation. Note the comments within the code sections, since the commands must be run in the context they describe.

If you already have a DRAM environment and want to upgrade:

# Activate your old DRAM environment first!
# Save your old config
DRAM-setup.py export_config > my_old_config.txt
# If you want to install in a new environment follow the instructions below and import your config with the last command in this block
# Clone the git repository
git clone https://github.com/WrightonLabCSU/DRAM.git
# you may need to install pip
conda install pip
# Make sure the pip path is in your conda environment path
which pip3
# install DRAM
pip install ./DRAM
# import your old databases
DRAM-setup.py import_config --config_loc my_old_config.txt

To install the DRAM release candidate in a new Conda environment:

git clone https://github.com/WrightonLabCSU/DRAM.git
cd DRAM
# Install dependencies, this will also install a stable version of DRAM that will then be replaced.
conda env create --name my_dram_env -f environment.yaml
conda activate my_dram_env
# Install pip
conda install pip
pip3 install ./

Change log:

  1. DRAM distill now includes a new metabolism for methylation. Although this was planned for DRAM2, you can already include it in annotation and distillation by following the instructions below.

    To distill with methyl, you only need to download the new FASTA file and point to it with the DRAM custom database options that were introduced in DRAM1.3. Note that to distill correctly, you must use the exact database name ‘methyl’ and must use DRAM1.4.

    To annotate with methyl, run something like:

    wget https://raw.githubusercontent.com/shafferm/DRAM/master/data/methylotrophy/methylotrophy.faa
    DRAM.py annotate -i '/some/path/*.fasta' -o dram_output --threads 30 --custom_db_name methyl --custom_fasta_loc methylotrophy.faa
    

    To distill with methyl:

    wget https://raw.githubusercontent.com/shafferm/DRAM/master/data/methylotrophy/methylotrophy_distillate.tsv
    DRAM.py distill -i dram_output/annotations.tsv -o dram_output/distillate --custom_distillate methylotrophy_distillate.tsv
    

    Learn more about custom databases in the Wiki.

  2. Glycoside hydrolase subfamily calls. Subfamily calls are now incorporated into annotations through changes in the databases and code. This affects what gets pulled into the distillate and product, because both look for family-level IDs (e.g. AA1), not subfamily-level IDs (e.g. AA1_1, AA2_2).

    In response, DRAM is changing the output of the dbCAN database in DRAM1.4. Raw cazyme subfamilies will be output into the cazy_id column, and the corresponding description for the cazyme family will be put into the cazy_hit column.

    The distillation in DRAM1.4 will count cazymes marked at the subfamily level at the family level. For cazyme family AA1, this means there will be four entries in the distillate (AA1, AA1_1, AA1_2, and AA1_3), and the sum of these four will be the total number of AA1 cazymes. In DRAM1.3 and earlier, the distillate entry AA1 (with no underscore) in this example would include cazymes that can be assigned to family AA1 but do not have a subfamily designation.

    The DRAM product will also count cazymes at the family level. For the AA1 example, AA1_1, AA1_2, and AA1_3 will be counted as AA1 under the current rules for assigning cazymes to compounds.

  3. Further changes in DRAM1.4 also affect CAZy IDs. The e-value cutoff is being changed to 1e-18 to conform to best practices for the database.

    DRAM1.4 also introduces a new column, cazy_best_hit, for the best hit per gene from the dbCAN database. This column holds the match to the gene that has the highest coverage and lowest full-sequence e-value as calculated by mmseqs, with priority given to e-value. cazy_best_hit will be the only column considered downstream in the distillate and product. By contrast, DRAM1.3 pulls and counts all dbCAN hits that pass an e-value cutoff of 1e-15, rather than selecting best hits.

    A new column named cazy_subfamily_ec, holding EC number information from subfamilies, has been added in DRAM1.4. These EC numbers will also be used in the distillate along with those from KEGG, as part of pathways and other tools. For now, incomplete EC numbers will be included but not considered for the distillate. The subfamilies will be excluded from the product, in keeping with its goal of giving a broader overview.

  4. Logging is now fully implemented in DRAM1.4. Log files will be created for almost all DRAM functions. By default, the log file for annotations will appear in the annotations folder, and the log file for DRAM distillation will appear in the distillation folder. You can also use the --log_file_path argument to set the log path. The log file for database processing is set by the config file, and by default it will be in the databases directory. All content that DRAM prints to the command line will also appear in the log file.

  5. The DRAM config now stores when databases were downloaded, along with citation and version information where applicable. This information is printed to the log at the beginning of each run. The old format can still be imported if you want to keep your DRAM1.3 databases.

  6. Significant bug fixes are also included in this release.

    • When the input FASTA files contain duplicate header names, the DRAM annotate step should now fail with an error immediately rather than at the end of the annotation process, which will save some users a lot of time. This may only be a problem when annotating genomes, but in any case the check must be in place across workflows.
    • Some users have firewalls on their HPC environments that prevent downloads via FTP; in some cases, switching to HTTP can solve the download problems. In DRAM1.4, if FTP links fail, a backup HTTP link will be attempted before an error is thrown. See issue #206.
    • DRAM1.4 will ensure that DRAM setup still works if no databases are downloaded. Previously, some databases depended on data being downloaded and could not be set up with a provided data set.
    • Reduced unnecessary warnings in various repetitive tasks in DRAM distillation by refactoring pandas code.
    • BIO-RELATED: This bug fix could affect biological interpretation. Previously, EC numbers were counted inconsistently: duplicates were ignored when counting the EC numbers in a single row of the annotations file, but were included when counting EC numbers across the full data set. This is now corrected, but it could have some small unexpected downstream effects.
    • Glycoside hydrolase subfamily calls.
    • In response to issue #122, you can now specify a config file at run time, either by passing it on the command line or by setting the environment variable DRAM_CONFIG_LOCATION. Read more in the Wiki.
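
The best-hit rule described in change 3 (lowest e-value first, then highest coverage) can be sketched as below. This is a rough illustration, not DRAM's code: the dictionary keys, the example hits, and the reading of the cutoff as "keep hits with an e-value at or below 1e-18" are all assumptions.

```python
# e-value cutoff quoted in the change log; how it is applied is assumed.
E_VALUE_CUTOFF = 1e-18


def best_hit(hits):
    """Pick the hit with the lowest e-value; break ties by highest coverage."""
    return min(hits, key=lambda hit: (hit["evalue"], -hit["coverage"]))


# Hypothetical dbCAN hits for a single gene.
hits = [
    {"id": "GH5_2", "evalue": 1e-40, "coverage": 0.80},
    {"id": "GH5_4", "evalue": 1e-40, "coverage": 0.95},
    {"id": "GH9", "evalue": 1e-20, "coverage": 0.99},
    {"id": "GH12", "evalue": 1e-10, "coverage": 0.99},
]

# Drop hits that fail the cutoff, then choose the single best one.
passing = [hit for hit in hits if hit["evalue"] <= E_VALUE_CUTOFF]
print([hit["id"] for hit in passing])  # ['GH5_2', 'GH5_4', 'GH9']
print(best_hit(passing)["id"])         # GH5_4
```

GH9 has the highest coverage among passing hits but loses because e-value takes priority; coverage only decides between the two hits that tie on e-value.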

v1.3

2 years ago

DRAM v1.3 change log

  • Add --amg_database_loc parameter that was missing in DRAM-setup.py
  • Shift DRAM download of UniRef from FTP to HTTP address to address firewall issues
  • Rename headers in annotations.tsv files to be more uniform across databases
  • DRAM.py annotate no longer annotates with VOGDB by default; a flag was added to use VOGDB
  • By default don't split DRAM-v.py annotate input contigs into separate files because HMMER doesn't care for E-values
  • Users can now pass multiple --input_fasta arguments to DRAM.py annotate and DRAM-v.py annotate
  • Now DRAM makes sure bin names (pulled from file names) are unique and not full paths
  • Update pandas methods to get rid of warnings and increase speed
  • When annotating with KEGG Genes the KEGG Genes IDs are stored in the annotations.tsv in addition to the KO IDs
  • Complete rewrite of how HMM annotation is handled inside DRAM to reduce redundancy and allow...
  • Users can now annotate using custom HMM sets which may include custom bitScore cutoffs
  • Complete rewrite of database handling from setup through annotation; in the future this will allow more flexible configuration
  • Change CI from Travis to CircleCI
  • DRAM strainer and gene neighborhood pulling can now both use custom distillate information
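
The bin-name item above (names pulled from file names, unique, and not full paths) can be illustrated with a short sketch. It is not DRAM's implementation: the function names are hypothetical, and raising an error on a repeated name is an assumption, since the change log does not say how duplicates are handled.

```python
from collections import Counter
from pathlib import PurePosixPath


def bin_name(fasta_path):
    """Derive a bin name from the file name, dropping directory and extension."""
    return PurePosixPath(fasta_path).stem


def check_unique_bin_names(fasta_paths):
    """Return one bin name per input path, or raise if any name repeats."""
    names = [bin_name(path) for path in fasta_paths]
    repeated = sorted(name for name, n in Counter(names).items() if n > 1)
    if repeated:
        raise ValueError(f"Bin names are not unique: {', '.join(repeated)}")
    return names


print(check_unique_bin_names(["/data/a/bin1.fasta", "/data/b/bin2.fa"]))

try:
    # Same file name in two directories gives the same bin name.
    check_unique_bin_names(["/data/a/bin1.fasta", "/data/b/bin1.fasta"])
except ValueError as error:
    print(error)
```

Using only the file stem keeps names short in the output tables, which is why two files with the same name in different directories need an explicit check.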

v1.2.4

2 years ago

Potentially breaking change in this release for those parsing annotations.tsv results: the VOGDB hit columns have been renamed from vogdb_description and vogdb_categories to vogdb_id (previously unreported), vogdb_categories, and vogdb_hit (equivalent to vogdb_description).

Changelog

  • Add anammox to distillate
  • Fix more no hits bugs (thanks to @cerebis)
  • VOGDB output columns renamed to match other databases
  • Big upgrade to merge annotations, now includes all parts of DRAM annotate output
  • Added option to run KOfam with dbCAN thresholds (used in other DRAM hmmscan applications) instead of KOfam recommended thresholds
  • @rmflynn fixed bug where VirSorter 2 headers were not recognized by DRAM-v
  • @rmflynn upgraded warnings and error messages for DRAM-v around parsing affi contigs and fasta files
  • Fixed bug where non-CAZy IDs were pulled out of dbCAN descriptions

v1.2.0

3 years ago

Change Log

  • Handle the case where there are no significant KOfam hits
  • Fix DRAM genome stats to always give scaffold location even if grouping by other unit
  • Fix last genome being skipped when number of genomes was > genomes per liquor
  • Don't kill everything if bins aren't in gtdb or checkM, just give warning and fill
  • Add ability for DRAM and DRAM-v to take custom distillate sheets
  • Add flag to DRAM distill to give gene names and not counts in distillate
  • Fix where making gbk file would fail if '>' included in fasta header
  • Fix DRAM-v distill breaking at making viral stats with non VirSorter 1 viruses
  • Add flag to DRAM distill to set the number of genomes per liquor
  • Update default dbCAN2 version to v9
  • Metabolism updates
    • Remove database identifiers from polyphenolic metabolism that were too loose
    • Add sugar metabolisms to distillate
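
The "genomes per liquor" items above concern splitting genomes into fixed-size groups. A minimal sketch of such a split that keeps the final, smaller group is shown below. The function name is hypothetical and this is not DRAM's code; it only illustrates the general chunking pattern, not the specific bug that was fixed.

```python
def split_into_liquors(genomes, genomes_per_liquor):
    """Split genomes into consecutive groups of at most genomes_per_liquor."""
    if genomes_per_liquor < 1:
        raise ValueError("genomes_per_liquor must be at least 1")
    # Stepping by the group size and slicing keeps the last partial group,
    # so no genome is dropped when the total is not an exact multiple.
    return [
        genomes[start:start + genomes_per_liquor]
        for start in range(0, len(genomes), genomes_per_liquor)
    ]


liquors = split_into_liquors(["g1", "g2", "g3", "g4", "g5"], 2)
print(liquors)  # [['g1', 'g2'], ['g3', 'g4'], ['g5']]
```

A quick sanity check for any such split is that the group sizes sum to the number of input genomes.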

v0.0-beta.2

4 years ago

New release for August 2019. Note that sqlalchemy, barrnap and altair are now dependencies. All can be installed via conda.

After this release a new structure for branches is being used. Master branch will be the release + any bugfixes associated with getting the release to work. This branch should be stable. The dev branch holds all features added since the last release. This branch is semi-stable. This branch will be rolled into master at each new release. Individual features will be developed on their own branches. These are not at all stable. Once they are tested and working they can be rolled into dev.

Changelog:

  • Add gene start, end and strandedness to annotations
  • Add GTDBk taxonomy and checkM contamination and completeness to annotations
  • Add --skip_trnascan flag
  • Work around is to set $TMPDIR variable
  • Add ability to use custom fasta as annotation databases (BLAST style search only (at this point))
  • All fasta headers now stored in sqlite db
  • Output genbank file per bin with annotation information
  • Add rRNA detection using barrnap
  • Creation of genome_stats.tsv
  • Creation of function_heatmap.html
  • Bugfixes surrounding processing of small genomes with few annotations
  • Various bugfixes