Working with VCF Files Using bcftools

VCF (Variant Call Format) files are the standard way to store variant calls such as SNPs, indels, and structural variants in bioinformatics. One of the most capable tools for manipulating, filtering, and extracting information from them is bcftools, which is developed alongside SAMtools on top of HTSlib. The commands below assume a bgzip-compressed input named variants.vcf.gz.

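bcftools view -h variants.vcf.gz

Prints only the header (the meta-information lines and the #CHROM column line), a quick way to check which INFO and FORMAT tags, contigs, and samples the file defines.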

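bcftools stats variants.vcf.gz > vcf.stats
plot-vcfstats vcf.stats -p plots/

Writes summary statistics (variant counts by type, transition/transversion ratio, quality and indel-length distributions) to vcf.stats, then renders them as plots in the plots/ directory.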

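bcftools filter -i 'QUAL>30' variants.vcf.gz -Oz -o filtered.vcf.gz

Keeps only records whose QUAL is above 30 (-i includes matching records) and writes bgzip-compressed VCF output (-Oz).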
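
One way to sanity-check a filter expression such as QUAL>30 is to try it on a tiny hand-made VCF before running it on real data. The sketch below is only an illustration: the three records are invented, and the awk count is an independent cross-check written for this example, not something bcftools provides. The bcftools step runs only if bcftools is installed.

```shell
#!/bin/sh
# Build a three-record toy VCF (made-up data, for illustration only).
tmp=$(mktemp -d)
{
  printf '##fileformat=VCFv4.2\n'
  printf '##contig=<ID=chr1,length=1000>\n'
  printf '##INFO=<ID=DP,Number=1,Type=Integer,Description="Total depth">\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
  printf 'chr1\t100\t.\tA\tG\t50\tPASS\tDP=20\n'
  printf 'chr1\t200\t.\tC\tT\t10\tPASS\tDP=5\n'
  printf 'chr1\t300\t.\tG\tA\t35\tPASS\tDP=12\n'
} > "$tmp/toy.vcf"

# Independent check: count data lines whose QUAL (column 6) exceeds 30.
expected=$(awk -F'\t' '!/^#/ && $6 > 30 { n++ } END { print n + 0 }' "$tmp/toy.vcf")
echo "records expected to pass: $expected"

# Compare with bcftools when it is available on this machine.
if command -v bcftools >/dev/null 2>&1; then
  kept=$(bcftools filter -i 'QUAL>30' "$tmp/toy.vcf" |
         awk '!/^#/ { n++ } END { print n + 0 }')
  echo "records kept by bcftools: $kept"
fi

rm -rf "$tmp"
```

If the two counts disagree, the expression is probably not selecting what you intended.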

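bcftools view -r chr1:1000000-2000000 variants.vcf.gz -Oz -o chr1_region.vcf.gz

Extracts variants from positions 1,000,000 to 2,000,000 on chr1. The -r option needs an indexed input (see bcftools index), and the contig name must match the naming used in the file (chr1 versus 1).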

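bcftools view -s Sample1,Sample2 variants.vcf.gz -Oz -o subset_samples.vcf.gz

Keeps only the genotype columns for Sample1 and Sample2. Prefixing the list with ^ excludes the named samples instead.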

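bcftools query -f '%CHROM\t%POS\t%REF\t%ALT\t%DP\t%AF\n' variants.vcf.gz > variants.tsv

Writes one tab-separated line per record with the chosen fields. DP and AF are read from the INFO column here; writing them as %INFO/DP and %INFO/AF makes that explicit. Missing values are printed as a dot.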
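
Once the fields are in a TSV, ordinary text tools can summarise them. The sketch below assumes the six-column layout produced by the query above (CHROM, POS, REF, ALT, DP, AF) and uses three invented rows in a temporary file in place of a real variants.tsv.

```shell
#!/bin/sh
# Stand-in for variants.tsv: three made-up rows in the same column order.
tsv=$(mktemp)
printf 'chr1\t100\tA\tG\t20\t0.5\n'  > "$tsv"
printf 'chr1\t300\tG\tA\t12\t0.1\n' >> "$tsv"
printf 'chr2\t150\tT\tC\t30\t0.9\n' >> "$tsv"

# Number of variants per chromosome (column 1), sorted by name.
per_chrom=$(awk -F'\t' '{ n[$1]++ } END { for (c in n) print c "\t" n[c] }' "$tsv" | sort)
printf '%s\n' "$per_chrom"

# Mean depth (column 5), skipping rows where DP is missing (".").
mean_dp=$(awk -F'\t' '$5 != "." { s += $5; k++ } END { if (k) printf "%.1f", s / k }' "$tsv")
echo "mean DP: $mean_dp"

rm -f "$tsv"
```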

bcftools index variants.vcf.gz

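Creates a .csi index (variants.vcf.gz.csi), which region queries such as view -r require. The input must be bgzip-compressed; add -t to write a tabix-style .tbi index instead.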
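
In scripts it can be convenient to index a file only when no up-to-date index exists. The following is a minimal sketch of that idea; needs_index and ensure_index are hypothetical helper names introduced here, not bcftools commands.

```shell
#!/bin/sh
# Succeed (return 0) when FILE has no .csi or .tbi index next to it,
# or when every index that exists is older than the file itself.
needs_index() {
  f=$1
  for idx in "$f.csi" "$f.tbi"; do
    if [ -e "$idx" ] && [ ! "$idx" -ot "$f" ]; then
      return 1   # a current index exists, nothing to do
    fi
  done
  return 0
}

# Build the index only when needed; -f overwrites a stale index.
ensure_index() {
  if needs_index "$1"; then
    bcftools index -f "$1"
  fi
}
```

Calling ensure_index variants.vcf.gz before any view -r command keeps region queries from failing on a missing index.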

bcftools merge sample1.vcf.gz sample2.vcf.gz -Oz -o merged.vcf.gz

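Combines VCFs that contain different samples into a single multi-sample file. The inputs must be bgzip-compressed and indexed, and sample names must be unique across files unless --force-samples is given.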
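
Because merge stops with an error when the same sample name appears in more than one input (unless --force-samples is used), it can help to look for duplicates first. The sketch below assumes each sample list was saved with bcftools query -l; shared_samples is a hypothetical helper name used only for this example.

```shell
#!/bin/sh
# Print sample names that occur in more than one of the given lists.
# Each argument is a text file with one sample name per line, e.g.:
#   bcftools query -l sample1.vcf.gz > s1.txt
#   bcftools query -l sample2.vcf.gz > s2.txt
#   shared_samples s1.txt s2.txt
shared_samples() {
  for list in "$@"; do
    sort -u "$list"      # de-duplicate within each list first
  done | sort | uniq -d  # names left duplicated occur in 2+ lists
}
```

Empty output means the sample names are unique and the merge can proceed without renaming.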

bcftools concat part1.vcf.gz part2.vcf.gz -Oz -o full.vcf.gz

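Concatenates VCFs that cover different genomic regions, such as per-chromosome files. All inputs must contain the same samples in the same order, and the files should be listed in genomic order so the output stays sorted.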

bcftools norm -f reference.fasta -m -any variants.vcf.gz -Oz -o norm.vcf.gz

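Left-aligns and normalizes indels against the reference (-f) and splits multiallelic records into separate biallelic records (-m -any), so the same variant is represented the same way across files.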
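
To see how many records the split will affect, you can count the data lines whose ALT column lists more than one allele. This is a rough sketch using awk on VCF text; count_multiallelic is a hypothetical helper name, not a bcftools subcommand.

```shell
#!/bin/sh
# Read VCF text on standard input and print the number of data lines
# whose ALT field (column 5) holds several comma-separated alleles.
# Example:
#   bcftools view -H variants.vcf.gz | count_multiallelic
count_multiallelic() {
  awk -F'\t' '!/^#/ && $5 ~ /,/ { n++ } END { print n + 0 }'
}
```

Running the same count on norm.vcf.gz afterwards should print 0 if every multiallelic record was split.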

bcftools annotate -a dbsnp.vcf.gz -c ID -h header.txt variants.vcf.gz -Oz -o annotated.vcf.gz

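Copies the ID column (e.g., rsIDs) from a database VCF onto matching records. The annotation file must be bgzip-compressed and indexed. The -h header.txt option supplies extra header lines and is only needed when the transferred columns introduce INFO or FORMAT tags that the header does not yet define; for -c ID alone it can be dropped.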

Together, these commands cover the most common tasks for exploring and manipulating VCF files with bcftools.
