Calculating Statistics
After generating callable loci, the next step is to run clam stat, which calculates population genetic statistics from those regions and your variant data.
Inputs
clam stat requires two main inputs:
- A VCF file containing variants: This must be a bgzipped and indexed VCF file.
- Callable loci file: The D4 format file generated by clam loci that defines which regions have sufficient sequencing depth.
Options
Windows or Regions
You must specify either a window size or a regions file for which to calculate statistics:
- -w, --window-size: Size of windows for statistics, in base pairs.
- -r, --regions-file: BED file specifying regions to calculate statistics for.
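If you choose a regions file, it can be written by hand or generated with a short script. The following is a minimal Python sketch, not part of clam itself. The region coordinates and the file name regions.bed are hypothetical, and it assumes a plain three-column BED file (chrom, start, end) with the standard 0-based, half-open BED coordinates.

```python
import csv

def write_regions_bed(path, regions):
    """Write (chrom, start, end) tuples as a tab-separated BED file.

    BED coordinates are 0-based and half-open, so start must be < end.
    """
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        for chrom, start, end in regions:
            if not 0 <= start < end:
                raise ValueError(f"invalid interval: {chrom}:{start}-{end}")
            writer.writerow([chrom, start, end])

# Hypothetical regions; replace with the intervals you want statistics for.
example_regions = [
    ("chr1", 0, 10_000),
    ("chr1", 10_000, 20_000),
    ("chr2", 5_000, 15_000),
]
write_regions_bed("regions.bed", example_regions)
```

The resulting file is what you would pass to -r, --regions-file.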
Populations
To specify populations (-p, --populations), create a tab-separated file that maps each sample to a population label.
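As an illustration, a population file could be produced with a few lines of Python. This is a sketch under stated assumptions: the sample names, population labels, and the file name populations.tsv are hypothetical, and the layout (sample name in the first column, population label in the second, no header row) is assumed rather than confirmed here, so check the CLI Reference before relying on it.

```python
import csv

# Hypothetical sample names and population labels.
sample_to_population = {
    "sample1": "popA",
    "sample2": "popA",
    "sample3": "popB",
}

def write_population_file(path, mapping):
    """Write one 'sample<TAB>population' line per sample, with no header.

    The column order and the absence of a header are assumptions.
    """
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        for sample, population in mapping.items():
            writer.writerow([sample, population])

write_population_file("populations.tsv", sample_to_population)
```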
Sample Names
The sample names in your population file must exactly match the sample names in the header of your VCF.
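A mismatch between the two files is easy to catch before running clam stat. The sketch below is one possible check, not a clam feature. It reads the #CHROM header line of the bgzipped VCF with Python's gzip module (bgzip output is gzip-compatible) and reports population-file samples that the VCF lacks. It assumes the sample name is in the first column of the population file; the function names are hypothetical.

```python
import gzip

def vcf_samples(vcf_path):
    """Return sample names from the #CHROM header line of a bgzipped VCF."""
    with gzip.open(vcf_path, "rt") as handle:
        for line in handle:
            if line.startswith("#CHROM"):
                # The first nine columns are fixed VCF fields; samples follow.
                return line.rstrip("\n").split("\t")[9:]
            if not line.startswith("#"):
                break  # reached variant records without seeing a header
    raise ValueError(f"no #CHROM header line found in {vcf_path}")

def population_samples(pop_path):
    """Return sample names from the first column of a tab-separated file."""
    with open(pop_path) as handle:
        return [line.split("\t")[0].strip() for line in handle if line.strip()]

def missing_from_vcf(pop_path, vcf_path):
    """List samples in the population file that the VCF header lacks."""
    in_vcf = set(vcf_samples(vcf_path))
    return [s for s in population_samples(pop_path) if s not in in_vcf]
```

An empty list from missing_from_vcf means every sample in the population file was found in the VCF header. The comparison is case-sensitive, as an exact match requires.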
Runs of Homozygosity
--roh-file: clam stat can accept per-sample runs of homozygosity (ROH) intervals, which it uses to ignore spurious heterozygous calls when calculating statistics.
The ROH file must be tab-separated with the following columns:
chrom, start, end, sample
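For illustration, the sketch below writes ROH intervals in that column order. The intervals, sample names, and the file name roh.tsv are hypothetical. It assumes there is no header row. The coordinate convention (0-based or 1-based) is not stated here, so confirm both points against the CLI Reference.

```python
import csv

# Hypothetical ROH intervals: (chrom, start, end, sample).
roh_intervals = [
    ("chr1", 100_000, 250_000, "sample1"),
    ("chr1", 400_000, 900_000, "sample2"),
]

def write_roh_file(path, intervals):
    """Write intervals as tab-separated chrom, start, end, sample rows."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        for chrom, start, end, sample in intervals:
            if start >= end:
                raise ValueError(
                    f"empty or reversed interval for {sample}: {chrom}:{start}-{end}"
                )
            writer.writerow([chrom, start, end, sample])

write_roh_file("roh.tsv", roh_intervals)
```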
Chromosome Filtering
You can exclude specific chromosomes or restrict your analysis to a chosen set; see the CLI Reference for details.
Outputs
clam stat generates several output files in the specified output directory:
1. clam_pi.tsv: Always generated.
2. clam_dxy.tsv: Only generated if populations were specified.
3. clam_fst.tsv: Only generated if populations were specified.
4. clam_het.tsv: Always generated.
These files are tab-separated and can be easily imported into R, Python, or other analysis tools.
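As one way to load the results in Python, the sketch below reads whichever of the four files are present using only the standard library. It assumes each file begins with a header row of column names; the column names themselves are not listed here, so the code does not depend on any particular one. The function names are hypothetical.

```python
import csv
from pathlib import Path

def load_tsv(path):
    """Read a tab-separated file with a header row into a list of dicts."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))

def load_clam_outputs(output_dir):
    """Load whichever of the four clam stat tables exist in output_dir."""
    tables = {}
    for name in ("clam_pi.tsv", "clam_dxy.tsv", "clam_fst.tsv", "clam_het.tsv"):
        path = Path(output_dir) / name
        if path.exists():  # dxy and fst are absent when no populations were given
            tables[name] = load_tsv(path)
    return tables
```

Each table is a list of dictionaries keyed by column name, with every value read as a string; convert numeric columns as needed before analysis.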