Calculating Statistics
After generating callable loci, the next step is to calculate population genetic statistics using those regions and your variant data with clam stat.
Inputs
clam stat requires two main inputs:
- A VCF file containing variants: This must be a bgzipped and indexed VCF file.
- Callable loci file: The D4 format file generated by clam loci that defines which regions have sufficient sequencing depth.
Options
Windows or Regions
You must specify either a window size or a regions file for which to calculate statistics:
-w, --window-size: Size of windows for statistics in base pairs.
-r, --regions-file: BED file specifying regions to calculate statistics for.
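To make the relationship between the two options concrete, here is a minimal Python sketch that tiles chromosomes into fixed-size windows and writes them as a BED file (0-based, half-open intervals, per the BED convention). The chromosome names and lengths are hypothetical, and non-overlapping tiling is an assumption made for illustration; it is not a statement of how clam stat places windows internally.

```python
import os
import tempfile


def tile_windows(chrom_lengths, window_size):
    """Yield (chrom, start, end) tuples in BED coordinates (0-based, half-open)."""
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    for chrom, length in chrom_lengths.items():
        for start in range(0, length, window_size):
            # The last window on a chromosome is truncated at the chromosome end.
            yield chrom, start, min(start + window_size, length)


def write_bed(path, regions):
    """Write regions as a three-column, tab-separated BED file."""
    with open(path, "w") as handle:
        for chrom, start, end in regions:
            handle.write(f"{chrom}\t{start}\t{end}\n")


# Hypothetical chromosome lengths, for illustration only.
chrom_lengths = {"chr1": 25_000, "chr2": 10_000}
regions = list(tile_windows(chrom_lengths, 10_000))

with tempfile.TemporaryDirectory() as tmp:
    bed_path = os.path.join(tmp, "regions.bed")
    write_bed(bed_path, regions)
    with open(bed_path) as handle:
        bed_lines = handle.read().splitlines()
```

A file written this way could be passed through -r, --regions-file when you want explicit control over region boundaries.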
Populations
To specify populations (-p, --populations), create a tab-separated file that maps samples to population labels.
Sample Names
The sample names in your population file must exactly match the sample names in the header of your VCF.
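Because a mismatch between the population file and the VCF header is easy to introduce, it can help to check the two before running clam stat. The following is a minimal sketch, not part of clam itself. It assumes the population file has two columns in the order sample, population and no header row; the file names and sample names are hypothetical. It relies on the fact that bgzipped files can be read with Python's standard gzip module.

```python
import gzip
import os
import tempfile


def vcf_samples(vcf_path):
    """Return sample names from the #CHROM header line of a gzip/bgzip VCF."""
    with gzip.open(vcf_path, "rt") as handle:
        for line in handle:
            if line.startswith("#CHROM"):
                # The first nine columns are fixed VCF fields; samples follow.
                return line.rstrip("\n").split("\t")[9:]
            if not line.startswith("#"):
                break
    raise ValueError("no #CHROM header line found")


def read_populations(path):
    """Read the population file. ASSUMED layout: sample<TAB>population."""
    mapping = {}
    with open(path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if not line:
                continue
            sample, population = line.split("\t")
            mapping[sample] = population
    return mapping


def missing_samples(pop_samples, header_samples):
    """Return population-file samples that are absent from the VCF header."""
    header = set(header_samples)
    return sorted(s for s in pop_samples if s not in header)


# Demo with small stand-in files (hypothetical names and contents).
with tempfile.TemporaryDirectory() as tmp:
    vcf_path = os.path.join(tmp, "variants.vcf.gz")
    with gzip.open(vcf_path, "wt") as handle:
        handle.write("##fileformat=VCFv4.2\n")
        handle.write(
            "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsampleA\tsampleB\n"
        )
    pop_path = os.path.join(tmp, "populations.tsv")
    with open(pop_path, "w") as handle:
        handle.write("sampleA\tpop1\nsampleB\tpop2\nsampleC\tpop2\n")

    header_samples = vcf_samples(vcf_path)
    populations = read_populations(pop_path)
    missing = missing_samples(populations, header_samples)

print("Samples not found in VCF header:", missing)
```

An empty list means every sample in the population file appears in the VCF header with identical spelling.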
Runs of Homozygosity
--roh-file: clam stat can accept per-sample runs of homozygosity (ROH) intervals, which it uses to ignore spurious heterozygous calls when calculating statistics.
The ROH file must be tab-separated with the following columns:
chrom, start, end, sample
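A quick sanity check of an ROH file against this four-column layout might look like the sketch below. It is an illustration only: it assumes there is no header row, that start and end are integers, and that start is strictly less than end. None of these details are confirmed here, so adjust them to match the CLI Reference. The sample and chromosome names are hypothetical.

```python
def parse_roh_line(line):
    """Parse one ROH record: chrom, start, end, sample (tab-separated)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 4:
        raise ValueError(f"expected 4 columns, found {len(fields)}: {line!r}")
    chrom, start, end, sample = fields
    start, end = int(start), int(end)  # raises ValueError on non-integers
    if start >= end:
        # ASSUMPTION: intervals must have positive length.
        raise ValueError(f"start must be less than end: {line!r}")
    return chrom, start, end, sample


def validate_roh(lines):
    """Return (records, errors), where errors holds (line_number, message)."""
    records, errors = [], []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue
        try:
            records.append(parse_roh_line(line))
        except ValueError as exc:
            errors.append((number, str(exc)))
    return records, errors


# Hypothetical records: two valid lines, one with a missing column.
example_lines = [
    "chr1\t1000\t50000\tsampleA\n",
    "chr2\t200\t900\tsampleB\n",
    "chr2\t5000\tsampleB\n",
]
records, errors = validate_roh(example_lines)
```

Here `records` holds the parsed intervals and `errors` lists the line numbers that did not match the expected layout.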
Chromosome Filtering
You can exclude specific chromosomes or restrict your analysis to a chosen set of chromosomes; see the CLI Reference for details.
Outputs
clam stat generates several output files in the specified output directory:
1. clam_pi.tsv: Always generated.
2. clam_dxy.tsv: Only generated if populations were specified.
3. clam_fst.tsv: Only generated if populations were specified.
4. clam_het.tsv: Always generated.
These files are tab-separated and can be easily imported into R, Python, or other analysis tools.
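As an example of importing an output file in Python, the sketch below reads a tab-separated file with the standard csv module. It assumes the first row is a header, and the column names used in the demo (chrom, start, end, pi) are hypothetical placeholders; inspect the header of your own clam_pi.tsv for the actual names.

```python
import csv
import os
import tempfile


def read_tsv(path):
    """Read a tab-separated file into a list of dicts keyed by the header row."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


# Demo with a stand-in file; the column names are hypothetical placeholders.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "clam_pi.tsv")
    with open(path, "w", newline="") as handle:
        handle.write("chrom\tstart\tend\tpi\n")
        handle.write("chr1\t0\t10000\t0.0012\n")
        handle.write("chr1\t10000\t20000\t0.0008\n")
    rows = read_tsv(path)

# csv yields strings, so convert numeric columns before doing arithmetic.
values = [float(row["pi"]) for row in rows]
mean_pi = sum(values) / len(values)
print(f"{len(rows)} windows, mean = {mean_pi:.4f}")
```

For larger analyses, a data-frame library offers the same import in one call, but the standard library is enough to inspect and summarize the files.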