Calculating Statistics

After generating callable loci, the next step is to calculate population genetic statistics using those regions and your variant data with clam stat.

Inputs

clam stat requires two main inputs:

A VCF file containing variants: This must be a bgzipped and indexed VCF file.
Callable loci file: The D4 format file generated by clam loci that defines which regions have sufficient sequencing depth.

Options

Windows or Regions

You must specify either a window size or a regions file for which to calculate statistics:
-w, --window-size: Size of windows for statistics in base pairs.
-r, --regions-file: BED file specifying regions to calculate statistics for.
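
For the regions option, a BED file can be produced with a few lines of Python. This is a minimal sketch only: the chromosome names, coordinates, and the file name regions.bed are hypothetical, and it writes just the three standard BED columns (chrom, start, end), with 0-based, half-open coordinates as in the BED convention.

```python
# Hypothetical regions as (chrom, start, end).
# BED coordinates are 0-based and half-open: start is included, end is not.
regions = [
    ("chr1", 0, 10000),
    ("chr1", 10000, 20000),
    ("chr2", 0, 10000),
]


def write_bed(path, intervals):
    """Write one tab-separated 'chrom, start, end' line per interval."""
    with open(path, "w") as handle:
        for chrom, start, end in intervals:
            handle.write(f"{chrom}\t{start}\t{end}\n")


# "regions.bed" is an illustrative file name, not one clam requires.
write_bed("regions.bed", regions)
```

The resulting file can then be passed with -r, --regions-file.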

Populations

To specify populations (-p, --populations), create a tab-separated file that maps each sample to a population label, one sample per line:

sample1    population1
sample2    population1
sample3    population2
sample4    population2
sample5    population3
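
If your sample assignments already live in a script or spreadsheet export, the population file can be generated rather than typed by hand. The sketch below assumes the sample and population names from the example above and writes them to a hypothetical file named populations.tsv; it uses a real tab character between the two columns and writes no header line, matching the layout shown.

```python
import csv

# Sample-to-population assignments taken from the example above;
# replace these with your own sample names and labels.
assignments = {
    "sample1": "population1",
    "sample2": "population1",
    "sample3": "population2",
    "sample4": "population2",
    "sample5": "population3",
}


def write_population_file(path, sample_to_pop):
    """Write one 'sample<TAB>population' line per sample, with no header."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        for sample, population in sample_to_pop.items():
            writer.writerow([sample, population])


# "populations.tsv" is an illustrative file name.
write_population_file("populations.tsv", assignments)
```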

Sample Names

The sample names in your population file must exactly match the sample names in the header of your VCF.
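
A quick way to catch mismatches before running clam stat is to compare the two lists yourself. The following is a rough sketch, not part of clam: the function names are made up for illustration, and it relies only on the facts that a bgzipped VCF can be read with Python's gzip module and that sample names start at the tenth column of the #CHROM header line.

```python
import gzip


def vcf_samples(vcf_path):
    """Return the sample names from the #CHROM header line of a bgzipped VCF."""
    with gzip.open(vcf_path, "rt") as handle:
        for line in handle:
            if line.startswith("#CHROM"):
                # The first nine columns are fixed VCF fields; samples follow.
                return line.rstrip("\n").split("\t")[9:]
    raise ValueError("no #CHROM header line found")


def population_samples(pop_path):
    """Return the first column (sample name) of each non-empty line."""
    with open(pop_path) as handle:
        return [line.rstrip("\n").split("\t")[0] for line in handle if line.strip()]


def check_samples(vcf_path, pop_path):
    """Return population-file samples that are absent from the VCF header."""
    in_vcf = set(vcf_samples(vcf_path))
    return [name for name in population_samples(pop_path) if name not in in_vcf]
```

Calling check_samples("variants.vcf.gz", "populations.tsv") (both file names hypothetical) returns an empty list when every sample in the population file appears in the VCF header. The comparison is exact, so differences in case or stray whitespace are reported as mismatches.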

Runs of Homozygosity

--roh-file: clam stat can accept per-sample runs of homozygosity (ROH) intervals so that spurious heterozygous calls are ignored when calculating statistics. The ROH file must be tab-separated with the following columns: chrom, start, end, sample.
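
Because the ROH file is usually assembled from another tool's output, it can help to sanity-check its layout first. The sketch below is an illustration under stated assumptions: it assumes the file has no header line and that start and end are non-negative integers with end not smaller than start. It does not assume a particular coordinate system, and the function name is hypothetical.

```python
def check_roh_file(path):
    """Return a list of (line_number, problem) tuples for malformed lines."""
    problems = []
    with open(path) as handle:
        for number, line in enumerate(handle, start=1):
            if not line.strip():
                continue  # skip blank lines
            fields = line.rstrip("\n").split("\t")
            if len(fields) != 4:
                problems.append((number, f"expected 4 columns, found {len(fields)}"))
                continue
            chrom, start, end, sample = fields
            if not (start.isdigit() and end.isdigit()):
                problems.append((number, "start and end must be non-negative integers"))
            elif int(end) < int(start):
                problems.append((number, "end must not be less than start"))
    return problems
```

An empty list means every line has the four expected columns in the order chrom, start, end, sample.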

Chromosome Filtering

You can exclude specific chromosomes or restrict your analysis to a chosen set of chromosomes; see the CLI Reference for details.

Outputs

clam stat generates several output files in the specified output directory:

1. clam_pi.tsv: Always generated.
2. clam_dxy.tsv: Only generated if populations were specified.
3. clam_fst.tsv: Only generated if populations were specified.
4. clam_het.tsv: Always generated.

These files are tab-separated and can be easily imported into R, Python, or other analysis tools.
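
As one way to pick the results up in Python, the sketch below loads whichever of the four output files exist in a given directory. It makes no assumption about column names or about whether a header line is present, so each file is returned as a plain list of rows; the function name and any directory you pass in are illustrative.

```python
import csv
from pathlib import Path

# Output file names listed above; the dxy and fst files exist only
# when populations were specified.
OUTPUT_FILES = ["clam_pi.tsv", "clam_dxy.tsv", "clam_fst.tsv", "clam_het.tsv"]


def load_outputs(out_dir):
    """Return {file name: list of rows} for each output file that exists."""
    tables = {}
    for name in OUTPUT_FILES:
        path = Path(out_dir) / name
        if not path.exists():
            continue  # e.g. no populations file was given
        with open(path, newline="") as handle:
            tables[name] = list(csv.reader(handle, delimiter="\t"))
    return tables
```

If you prefer data frames, pandas.read_csv with sep="\t" reads the same files.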