Using the Anacapa Pipeline

There are some important files that you should take a look at before starting anything else:

File	Description
`Anacapa_db/scripts/anacapa_vars.sh`	File containing default variables used in the Anacapa pipeline. You can modify this file to change default settings.
`Anacapa_db/metabarcode_loci_min_merge_length.txt`	Configuration file for DADA2 that sets the minimum overlap lengths for merging paired reads.
`Anacapa_db/forward_primers.txt`	Default forward primers used for trimming and sorting.
`Anacapa_db/reverse_primers.txt`	Default reverse primers used for trimming and sorting.

You can edit these files using the terminal with the nano command. For example:

nano Anacapa_db/scripts/anacapa_vars.sh

Once you've made your changes, exit with ^X (Control + X), then press Y to save, and Enter to confirm the filename.
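Since these files hold the pipeline defaults, it can help to keep a copy of the original before editing. The following is a minimal sketch, not part of Anacapa itself: the `.bak` suffix and the variable names are assumptions chosen for illustration.

```shell
# Path taken from the table above; run this from the directory that contains Anacapa_db.
vars_file="Anacapa_db/scripts/anacapa_vars.sh"
# Hypothetical backup name; any name works.
backup="$vars_file.bak"

if [ -f "$vars_file" ]; then
  # Copy only if no backup exists yet, so re-running never overwrites the pristine copy.
  [ -f "$backup" ] || cp "$vars_file" "$backup"
  # After editing with nano, list the lines that differ from the original defaults.
  # diff exits non-zero when the files differ, so "|| true" keeps that from looking like a failure.
  diff "$backup" "$vars_file" || true
else
  echo "Not found: $vars_file"
fi
```

To undo your edits, copy the backup over the edited file.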

First Half - QC and DADA2

The first half of the Anacapa pipeline can be run with Anacapa_db/anacapa_QC_dada2.sh.

Example:

Anacapa_db/anacapa_QC_dada2.sh -i Example_data/12S_example_anacapa_QC_dada2_and_BLCA_classifier/12S_test_data -o out -d Anacapa_db -f Example_data/12S_example_anacapa_QC_dada2_and_BLCA_classifier/12S_test_data/forward.txt -r Example_data/12S_example_anacapa_QC_dada2_and_BLCA_classifier/12S_test_data/reverse.txt -e Anacapa_db/metabarcode_loci_min_merge_length.txt -a nextera -t MiSeq -l

Breakdown of the example command:

Command Argument	Description
`-i [filepath]`	Path to folder with .fastq.gz files
`-o out`	Path to output directory. If it doesn't exist, it will be created.
`-d Anacapa_db`	Path to the Anacapa_db directory. This normally stays the same between runs.
`-f [filepath]`	Path to file with forward primers
`-r [filepath]`	Path to file with reverse primers
`-e [filepath]`	File path to a list of minimum length(s) required for paired F and R reads to overlap
`-a nextera`	Illumina adapter type
`-t MiSeq`	Illumina platform
`-l`	Indicates running locally. This is always needed, because the original Anacapa was designed for use on the UCLA HPC.
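The example command is long enough that it is easy to mistype. Here is the same command split across lines, with the repeated data path stored in a shell variable. `DATA` is only shorthand introduced for this sketch; Anacapa does not read it. The existence check is also an addition, so the snippet prints a message instead of failing when run from the wrong directory.

```shell
# Shorthand for the example data folder (a variable name chosen for this sketch).
DATA="Example_data/12S_example_anacapa_QC_dada2_and_BLCA_classifier/12S_test_data"

if [ -x Anacapa_db/anacapa_QC_dada2.sh ]; then
  # Same flags and values as the single-line example, one per line.
  Anacapa_db/anacapa_QC_dada2.sh \
    -i "$DATA" \
    -o out \
    -d Anacapa_db \
    -f "$DATA/forward.txt" \
    -r "$DATA/reverse.txt" \
    -e Anacapa_db/metabarcode_loci_min_merge_length.txt \
    -a nextera \
    -t MiSeq \
    -l
else
  echo "Anacapa_db/anacapa_QC_dada2.sh not found or not executable; run from the directory containing Anacapa_db"
fi
```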

Other arguments not used in the example:

Command Argument	Description
`-g`	Use if the .fastq reads are not compressed
`-c`	To modify the allowed cutadapt error for 3' adapter and 5' primer adapter trimming: 0.0 to 1.0 (default 0.3)
`-p`	To modify the allowed cutadapt error for 3' primer sorting and trimming: 0.0 to 1.0 (default 0.3)
`-q`	To modify the minimum quality score allowed: 0 - 40 (default 35)
`-m`	To modify the minimum length after quality trimming: 0 - 300 (default 100)
`-x`	To modify the additional 5' trimming of forward reads: 0 - 300 (default HiSeq 10, default MiSeq 20)
`-y`	To modify the additional 5' trimming of reverse reads: 0 - 300 (default HiSeq 25, default MiSeq 50)
`-b`	To modify the number of occurrences required to keep an ASV: 0 - any integer (default 0)
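As a sketch of how the optional arguments combine with the required ones, the command below relaxes the quality and length filters. The values 30 and 80 are arbitrary illustrations within the documented ranges, not recommendations, and the variable names are introduced here. It assumes each flag takes its value as the following argument, in the same way as the flags in the example command.

```shell
# Shorthand for the example data folder (a variable name chosen for this sketch).
DATA="Example_data/12S_example_anacapa_QC_dada2_and_BLCA_classifier/12S_test_data"
# Illustrative values only: -q accepts 0 - 40 (default 35), -m accepts 0 - 300 (default 100).
MIN_QUALITY=30
MIN_LENGTH=80

if [ -x Anacapa_db/anacapa_QC_dada2.sh ]; then
  Anacapa_db/anacapa_QC_dada2.sh \
    -i "$DATA" \
    -o out \
    -d Anacapa_db \
    -f "$DATA/forward.txt" \
    -r "$DATA/reverse.txt" \
    -e Anacapa_db/metabarcode_loci_min_merge_length.txt \
    -a nextera \
    -t MiSeq \
    -l \
    -q "$MIN_QUALITY" \
    -m "$MIN_LENGTH"
else
  echo "Anacapa_db/anacapa_QC_dada2.sh not found or not executable; run from the directory containing Anacapa_db"
fi
```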

Second Half - Classification

The second half of the Anacapa pipeline can be run with Anacapa_db/anacapa_classifier.sh.

Example:

Anacapa_db/anacapa_classifier.sh -d Anacapa_db -o out -l

Breakdown of the example command:

Command Argument	Description
`-d Anacapa_db`	Path to the Anacapa_db directory. This normally stays the same between runs.
`-o out`	Path to the output directory generated by the first half (QC and DADA2). The classifier reads its input from this directory and modifies it in place, so the same directory serves as both input and output.
`-l`	Indicates running locally. This is always needed, because the original Anacapa was designed for use on the UCLA HPC.

Other arguments not used in the example:

Command Argument	Description
`-b`	Percent of mismatch allowed between the query and subject for BLCA: 0.0 to 1.0 (default 0.8)
`-p`	Minimum percent of length of the subject relative to the query for BLCA: 0.0 to 1.0 (default 0.8)
`-c`	A list of BCC cut-off values at which to report taxonomy: each value 0 to 100, quotes required (default "40 50 60 70 80 90 95"). The values are supplied in a file that must use the following format: PERCENT="40 50 60 70 80 90 95 100". The values may differ, but the PERCENT="values" form is required. See `Anacapa_db/scripts/BCC_default_cut_off.sh` for an example.
`-n`	BLCA number of times to bootstrap: integer value (default 100)
`-m`	Muscle alignment match score: default 1
`-f`	Muscle alignment mismatch score: default -2.5
`-g`	Muscle alignment gap penalty: default -2
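The cut-off file format described for `-c` can be sketched as follows. The file name `my_BCC_cut_off.sh` and the values 60, 80 and 95 are hypothetical, and the sketch assumes that `-c` takes the path to such a file, as the description suggests. Compare against `Anacapa_db/scripts/BCC_default_cut_off.sh` before relying on it.

```shell
# Hypothetical file name and cut-off values, chosen for illustration.
cutoff_file="my_BCC_cut_off.sh"
printf 'PERCENT="%s"\n' "60 80 95" > "$cutoff_file"

# Sanity check: source the file and confirm that PERCENT holds the intended list.
. "./$cutoff_file"
echo "Cut-offs: $PERCENT"

if [ -x Anacapa_db/anacapa_classifier.sh ]; then
  # Assumption: -c is given the path to the cut-off file.
  Anacapa_db/anacapa_classifier.sh -d Anacapa_db -o out -c "$cutoff_file" -l
else
  echo "Anacapa_db/anacapa_classifier.sh not found or not executable; run from the directory containing Anacapa_db"
fi
```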

Complete

Once both halves have been run, your output files will be in the output directory you specified in both commands. Copy or move it to the /data directory to access it from your host machine, if that's not where it already is.
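A minimal sketch of the copy step, assuming the output directory is `out` as in the examples above and that `/data` is the directory shared with your host machine:

```shell
# Output directory name from the example commands; change it if you passed a different -o value.
out_dir="out"
# Directory visible from the host machine.
dest="/data"

if [ -d "$out_dir" ] && [ -d "$dest" ]; then
  # Copy the whole output directory, keeping its name, then list what arrived.
  cp -r "$out_dir" "$dest/"
  ls "$dest/$out_dir"
else
  echo "Expected both $out_dir and $dest to exist; nothing copied"
fi
```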