## Running `dnaapler`
For all subcommands, `dnaapler` requires an input FASTA file, specified with `-i` or `--input`.
It is also highly recommended to specify an output directory with `-o` or `--output`; otherwise `dnaapler` writes its output to a directory named `output.dnaapler`.
You can change the prefix of the output files with `-p` or `--prefix`.
You can run BLAST with multiple threads using `-t` or `--threads`, and change the BLAST e-value threshold with `-e` or `--evalue`.
By default, `dnaapler` will not overwrite an output directory that already exists. To force an overwrite, use `-f` or `--force`.
Finally, for the BLAST-based subcommands (`chromosome`, `phage`, `plasmid`, `archaea`, `custom` or `all`), `dnaapler` will by default exit with an error if no BLAST hit is found.
However, you can have `dnaapler` fall back to another method with `-a` or `--autocomplete`, set to `mystery` or `nearest` (the help text for `all` and `archaea` also lists `largest`), which then runs that subcommand to reorient your sequence.
You can also set a seed with `--seed_value` so that `dnaapler mystery` (or autocomplete with `-a mystery`) gives reproducible results in workflows.
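As an illustration of how the general options combine, here is a minimal sketch. The `run` helper is hypothetical (it is not part of dnaapler): it only prints the command unless `RUN=1` is set, so the flags can be checked before anything is executed. The file, directory and prefix names are placeholders.

```shell
# Hypothetical dry-run helper (not part of dnaapler): prints the command
# unless RUN=1 is set in the environment, in which case it executes it.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# Placeholder names: input.fasta, sample1_out, sample1.
# -e tightens or loosens the BLAST e-value, -f overwrites an existing output
# directory, and -a mystery with --seed_value gives a reproducible fallback.
run dnaapler chromosome \
    -i input.fasta \
    -o sample1_out \
    -p sample1 \
    -t 8 \
    -e 1e-5 \
    -f \
    -a mystery \
    --seed_value 245
```

Running the script with `RUN=1` set would execute the command for real, which requires dnaapler to be installed.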
### all
dnaapler all is designed to simultaneously orient multiple contigs that can be a mix of chromosomes, plasmids, archaea and phages. It will also work on just 1 contig.
If a contig has BLAST hits for both dnaA and terL or repA, dnaA will be chosen for reorientation.
If a contig has BLAST hits for both archaeal COG1474 and terL or repA, COG1474 will be chosen for reorientation.
If a contig has BLAST hits for both terL and repA (but not dnaA or COG1474), repA will be chosen for reorientation.
If a contig has BLAST hits for both dnaA and archaeal COG1474, dnaA will be chosen for reorientation, though I assume this would be very unlikely!
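Taken together, the rules above amount to a fixed priority order: dnaA, then COG1474, then repA, then terL. The sketch below only illustrates that ordering; `pick_gene` is a hypothetical helper and not how dnaapler is implemented internally.

```shell
# Hypothetical helper (not dnaapler code): given the genes that had BLAST
# hits on one contig, print the one the rules above would prefer.
pick_gene() {
    for gene in dnaA COG1474 repA terL; do   # highest priority first
        for hit in "$@"; do
            if [ "$hit" = "$gene" ]; then
                echo "$gene"
                return 0
            fi
        done
    done
    echo "none"   # no recognised gene among the hits
}

pick_gene terL repA           # prints repA
pick_gene terL COG1474 dnaA   # prints dnaA
```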
You can also specify a text file with `--ignore` that lists all contigs (based on their header) to be ignored during reorientation.
For example, the file (`ignored_contigs.txt`) should list one contig per line:
contig_1
contig_2
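One way to build such a file is to pull the contig names out of the FASTA headers. This is a sketch under assumptions: the demo FASTA and the selection pattern are made up, and it assumes the name to list is the first word of each header line without the leading `>`, which is worth checking against your own headers.

```shell
# Made-up demo input so the sketch is self-contained.
cat > demo_input.fasta <<'EOF'
>contig_1 length=12
ATGCATGCATGC
>contig_2 length=8
ATGCATGC
>contig_3 length=4
ATGC
EOF

# Take the first word of every header line, drop the leading '>',
# then keep only the contigs to ignore (hypothetical pattern).
awk '/^>/ {print substr($1, 2)}' demo_input.fasta \
    | grep -E '^contig_(1|2)$' > ignored_contigs.txt

cat ignored_contigs.txt
```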
Example usage to reorient a number of contigs in `input.fasta`, ignoring all contigs whose headers are listed in `ignored_contigs.txt`:
dnaapler all -i input.fasta -o output_directory_path -t 8 --ignore ignored_contigs.txt
Usage: dnaapler all [OPTIONS]
Reorients contigs to begin with any of dnaA, repA, terL or archaeal COG1474
Orc1/cdc6
Options:
-h, --help Show this message and exit.
-V, --version Show the version and exit.
-i, --input PATH Path to input file in FASTA or GFA format
[required]
-o, --output PATH Output directory [default: output.dnaapler]
-t, --threads INTEGER Number of threads to use with BLAST [default: 1]
-p, --prefix TEXT Prefix for output files [default: dnaapler]
-f, --force Force overwrites the output directory
-e, --evalue TEXT e value for blastx [default: 1e-10]
--ignore PATH Text file listing contigs (one per row) that are to
be ignored
-a, --autocomplete TEXT Choose an option to autocomplete reorientation if
BLAST based approach fails. Must be one of: none,
mystery, largest, or nearest [default: none]
--seed_value INTEGER Rand
### chromosome
Example usage with mystery as the autocomplete command and a random seed of 245 for reproducibility and with 8 threads for BLAST:
dnaapler chromosome -i input.fasta -o output_directory_path -p my_bacteria_name -t 8 -a mystery --seed_value 245
Usage: dnaapler chromosome [OPTIONS]
Reorients your genome to begin with the dnaA chromosomal replication
initiation gene
Options:
-h, --help Show this message and exit.
-V, --version Show the version and exit.
-i, --input PATH Path to input file in FASTA or GFA format
[required]
-o, --output PATH Output directory [default: output.dnaapler]
-t, --threads INTEGER Number of threads to use with BLAST [default: 1]
-p, --prefix TEXT Prefix for output files [default: dnaapler]
-f, --force Force overwrites the output directory
-e, --evalue TEXT e value for blastx [default: 1e-10]
-a, --autocomplete TEXT Choose an option to autocomplete reorientation if
BLAST based approach fails. Must be one of: none,
mystery or nearest [default: none]
--seed_value INTEGER Random seed to ensure reproducibility. [default:
13]
### phage
Example usage with no autocomplete command:
dnaapler phage -i input.fasta -o output_directory_path -p my_phage_name -t 8
Usage: dnaapler phage [OPTIONS]
Reorients your genome to begin with the terL large terminase subunit
Options:
-h, --help Show this message and exit.
-V, --version Show the version and exit.
-i, --input PATH Path to input file in FASTA or GFA format
[required]
-o, --output PATH Output directory [default: output.dnaapler]
-t, --threads INTEGER Number of threads to use with BLAST [default: 1]
-p, --prefix TEXT Prefix for output files [default: dnaapler]
-f, --force Force overwrites the output directory
-e, --evalue TEXT e value for blastx [default: 1e-10]
-a, --autocomplete TEXT Choose an option to autocomplete reorientation if
BLAST based approach fails. Must be one of: none,
mystery or nearest [default: none]
--seed_value INTEGER Random seed to ensure reproducibility. [default:
13]
### plasmid
Example usage with no autocomplete command:
dnaapler plasmid -i input.fasta -o output_directory_path -p my_plasmid_name -t 8
Usage: dnaapler plasmid [OPTIONS]
Reorients your genome to begin with the repA replication initiation gene
Options:
-h, --help Show this message and exit.
-V, --version Show the version and exit.
-i, --input PATH Path to input file in FASTA or GFA format
[required]
-o, --output PATH Output directory [default: output.dnaapler]
-t, --threads INTEGER Number of threads to use with BLAST [default: 1]
-p, --prefix TEXT Prefix for output files [default: dnaapler]
-f, --force Force overwrites the output directory
-e, --evalue TEXT e value for blastx [default: 1e-10]
-a, --autocomplete TEXT Choose an option to autocomplete reorientation if
BLAST based approach fails. Must be one of: none,
mystery or nearest [default: none]
--seed_value INTEGER Random seed to ensure reproducibility. [default:
13]
### archaea
Example usage with no autocomplete command:
dnaapler archaea -i input.fasta -o output_directory_path -p my_archaea_name -t 8
Usage: dnaapler archaea [OPTIONS]
Reorients your genome to begin with the archaeal COG1474 Orc1/cdc6 origin
recognition complex gene
Options:
-h, --help Show this message and exit.
-V, --version Show the version and exit.
-i, --input PATH Path to input file in FASTA or GFA format
[required]
-o, --output PATH Output directory [default: output.dnaapler]
-t, --threads INTEGER Number of threads to use with BLAST [default: 1]
-p, --prefix TEXT Prefix for output files [default: dnaapler]
-f, --force Force overwrites the output directory
-e, --evalue TEXT e value for blastx [default: 1e-10]
-a, --autocomplete TEXT Choose an option to autocomplete reorientation if
BLAST based approach fails. Must be one of: none,
mystery, largest, or nearest [default: none]
--seed_value INTEGER Random
### custom
To run `dnaapler custom`, you need to provide an amino acid FASTA file containing the desired custom database gene(s) using `-c` or `--custom_db`.
Example usage:
dnaapler custom -i input.fasta -o output_directory_path -p my_plasmid_name -t 8 -c custom_db.faa
Usage: dnaapler custom [OPTIONS]
Reorients your genome with a custom database
Options:
-h, --help Show this message and exit.
-V, --version Show the version and exit.
-i, --input PATH Path to input file in FASTA or GFA format [required]
-o, --output PATH Output directory [default: output.dnaapler]
-t, --threads INTEGER Number of threads to use with BLAST [default: 1]
-p, --prefix TEXT Prefix for output files [default: dnaapler]
-f, --force Force overwrites the output directory
-e, --evalue TEXT e value for blastx [default: 1e-10]
-c, --custom_db PATH FASTA file with amino acids that will be used as a custom blast database to reorient your sequence however you want. [required]
-a, --autocomplete TEXT Choose an option to autocomplete reorientation if BLAST based approach fails. Must be one of: none, mystery or nearest [default: none]
--seed_value INTEGER Random seed to ensure reproducibility. [default: 13]
### mystery
`dnaapler mystery` will reorient your genome to begin with a random coding sequence (CDS) (as predicted by [Pyrodigal](https://github.com/althonos/pyrodigal)).
Example usage:
dnaapler mystery -i input.fasta -o output_directory_path -t 8
Usage: dnaapler mystery [OPTIONS]
Reorients your genome with a random CDS
Options:
-h, --help Show this message and exit.
-V, --version Show the version and exit.
-i, --input PATH Path to input file in FASTA or GFA format [required]
-o, --output PATH Output directory [default: output.dnaapler]
-t, --threads INTEGER Number of threads to use with BLAST [default: 1]
-p, --prefix TEXT Prefix for output files [default: dnaapler]
-f, --force Force overwrites the output directory
--seed_value INTEGER Random seed to ensure reproducibility. [default: 13]
### nearest
`dnaapler nearest` will reorient your genome to begin with the first coding sequence (CDS) as predicted by [Pyrodigal](https://github.com/althonos/pyrodigal).
Example usage:
dnaapler nearest -i input.fasta -o output_directory_path -t 8
Usage: dnaapler nearest [OPTIONS]
Reorients your genome to begin with the first CDS as called by pyrodigal
Options:
-h, --help Show this message and exit.
-V, --version Show the version and exit.
-i, --input PATH Path to input file in FASTA or GFA format [required]
-o, --output PATH Output directory [default: output.dnaapler]
-t, --threads INTEGER Number of threads to use with BLAST [default: 1]
-p, --prefix TEXT Prefix for output files [default: dnaapler]
-f, --force Force overwrites the output directory
### largest
`dnaapler largest` will reorient your genome to begin with the largest coding sequence (CDS) as predicted by [Pyrodigal](https://github.com/althonos/pyrodigal).
Example usage:
dnaapler largest -i input.fasta -o output_directory_path -t 8
Usage: dnaapler largest [OPTIONS]
Reorients your genome to begin with the largest CDS as called by pyrodigal
Options:
-h, --help Show this message and exit.
-V, --version Show the version and exit.
-i, --input PATH Path to input file in FASTA or GFA format [required]
-o, --output PATH Output directory [default: output.dnaapler]
-t, --threads INTEGER Number of threads to use with BLAST [default: 1]
-p, --prefix TEXT Prefix for output files [default: dnaapler]
-f, --force Force overwrites the output directory
### bulk
`dnaapler bulk` is designed to simultaneously orient multiple genomes.
You can choose the mode with `-m` or `--mode`, set to one of `chromosome`, `phage`, `plasmid` or `custom`; if omitted, it defaults to `-m chromosome`. Additionally, if you choose `-m custom`, you must also provide a custom database amino acid FASTA file using `-c` or `--custom_db`.
Your input FASTA must also contain at least 2 contigs.
Example usage to reorient a number of bacterial chromosomes in `input.fasta` to begin with the dnaA gene:
dnaapler bulk -i input.fasta -o output_directory_path -t 8 -m chromosome
Usage: dnaapler bulk [OPTIONS]
Reorients multiple genomes to begin with the same gene
Options:
-h, --help Show this message and exit.
-V, --version Show the version and exit.
-i, --input PATH Path to input file in FASTA or GFA format [required]
-o, --output PATH Output directory [default: output.dnaapler]
-t, --threads INTEGER Number of threads to use with BLAST [default: 1]
-p, --prefix TEXT Prefix for output files [default: dnaapler]
-f, --force Force overwrites the output directory
-e, --evalue TEXT e value for blastx [default: 1e-10]
-m, --mode TEXT Choose a mode to reorient in bulk. Must be one of: chromosome, plasmid, phage or custom [default: chromosome]
-c, --custom_db PATH FASTA file with amino acids that will be used as a custom blast database to reorient your sequence however you want. Must be specified if -m custom is specified.
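Because `dnaapler bulk` needs at least 2 contigs in the input, a quick pre-check can save a failed run. The following is only a sketch: the demo FASTA is made up, and the check simply counts header lines beginning with `>`.

```shell
# Made-up demo input with two contigs.
cat > demo_bulk.fasta <<'EOF'
>genome_1
ATGCATGC
>genome_2
GGCCGGCC
EOF

# Count FASTA records by counting header lines.
n_contigs=$(grep -c '^>' demo_bulk.fasta)

if [ "$n_contigs" -ge 2 ]; then
    echo "found $n_contigs contigs; bulk mode can be used, for example:"
    echo "dnaapler bulk -i demo_bulk.fasta -o output_directory_path -t 8 -m chromosome"
else
    echo "bulk needs at least 2 contigs, found $n_contigs" >&2
fi
```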