Zeeshan Fazal's Bioinformatics Blog

Parsing VEP-Annotated VCF Files into Tabular Format with R

June 10, 2025

Why Parse VEP-Annotated VCFs?

The Variant Effect Predictor (VEP) from Ensembl annotates VCF files with predicted consequences, gene names, protein changes, and population allele frequencies. However, the annotation is stored as pipe-delimited strings under the CSQ key of the INFO field, which makes it difficult to filter, group, or visualize directly.

To get the most out of VEP annotations, it’s essential to parse these fields into a clean, tabular structure.
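
To make the condensed format concrete, here is a minimal base R sketch of how a CSQ value maps onto named fields. The header line and the CSQ string are toy values I made up for illustration, and the five-field layout is an assumption; a real VEP run usually emits many more fields, in whatever order its own header declares.

```r
# Toy VEP header line and CSQ value; the five-field layout is an illustrative assumption.
header_line <- '##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence annotations from Ensembl VEP. Format: Allele|Consequence|IMPACT|SYMBOL|Gene">'
csq_value <- "T|missense_variant|MODERATE|BRCA2|ENSG00000139618,T|intron_variant|MODIFIER|BRCA2|ENSG00000139618"

# The field names follow "Format: " inside the header description.
format_string <- sub('.*Format: ([^"]+)".*', "\\1", header_line)
csq_fields <- strsplit(format_string, "|", fixed = TRUE)[[1]]

# Entries are comma-separated; fields within an entry are pipe-separated.
# Caveat: strsplit() drops a trailing empty field, so real data whose last
# field is empty would need padding before rbind().
entries <- strsplit(csq_value, ",", fixed = TRUE)[[1]]
csq_table <- as.data.frame(
  do.call(rbind, strsplit(entries, "|", fixed = TRUE)),
  stringsAsFactors = FALSE
)
colnames(csq_table) <- csq_fields
print(csq_table)
```

Each comma-separated entry becomes one row, which is why a single variant can expand into several rows once it is parsed.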

R Script Overview

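Here’s an example script I use to read a VEP-annotated VCF, split the CSQ annotations into columns, and write everything to a single table: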

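options(stringsAsFactors = FALSE)
library(vcfR)               # read.vcfR(), extract.gt()
library(VariantAnnotation)  # readVcf()
library(ensemblVEP)         # parseCSQToGRanges()
library(dplyr)
library(tidyr)


# list.files() takes a regular expression, not a shell glob
files <- list.files(pattern = "\\.vcf\\.gz$")
vcf_db <- read.vcfR(files[1], verbose = FALSE)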

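# Header lines, fixed columns (CHROM to INFO), and genotype columns as data frames
vep_header <- data.frame(vcf_db@meta)
vep_variants <- data.frame(vcf_db@fix)
vep_gt <- data.frame(vcf_db@gt)
# Per-sample AF from the FORMAT field; naming the column "VAF" assumes a single-sample VCF
VAF <- data.frame(extract.gt(vcf_db, element = "AF", as.numeric = TRUE))
colnames(VAF) <- "VAF"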

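# Parse the CSQ annotations; the result has one row per CSQ entry.
# The cbind() further down assumes exactly one entry per variant (for example, VEP run with --pick).
vcf_ens <- readVcf(files[1], "hg38")
csq_vep <- parseCSQToGRanges(vcf_ens)
# Drop the GRanges coordinate columns (1:5) and columns 8 and 9; adjust the indices to your CSQ layout
csq_vep <- data.frame(csq_vep)[ , -c(1:5, 8, 9)]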

# Keep the INFO text that precedes the CSQ key; sub() leaves records without CSQ unchanged
INFO <- data.frame(INFO = sub(";CSQ=.*$", "", vep_variants$INFO))

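# Keep CHROM, POS, REF, ALT, QUAL, FILTER (drop ID and the raw INFO column)
vep_variants_filt <- vep_variants[ , c(1:2,4:7) ]
VEP <- cbind.data.frame(vep_variants_filt, INFO, csq_vep, VAF, vep_gt)
# Tab-separated output to match the .tsv extension
write.table(VEP, "VEP_annotated_table.tsv", sep = "\t", row.names = FALSE, quote = FALSE)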
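
One caveat with the column-binding step: it only lines up when every variant carries exactly one CSQ entry. When VEP reports several transcripts per variant, one option is a long-format table where each variant row is repeated once per entry. The base R sketch below shows the idea on toy data; the variants, gene names, and three-field CSQ layout are invented for illustration, and it assumes every record has a CSQ key.

```r
# Toy per-variant table; all values are invented for illustration.
variants <- data.frame(
  CHROM = c("chr1", "chr2"),
  POS = c(1000L, 2000L),
  INFO = c("DP=30;CSQ=T|missense_variant|GENE1,T|intron_variant|GENE1",
           "DP=12;CSQ=A|synonymous_variant|GENE2"),
  stringsAsFactors = FALSE
)
csq_fields <- c("Allele", "Consequence", "SYMBOL")  # assumed CSQ layout

# Pull out the CSQ value, then split it into one string per annotation entry.
csq_raw <- sub(".*CSQ=([^;]*).*", "\\1", variants$INFO)
entries <- strsplit(csq_raw, ",", fixed = TRUE)

# Repeat each variant row once per entry so both tables line up row for row.
row_id <- rep(seq_len(nrow(variants)), lengths(entries))
csq_long <- as.data.frame(
  do.call(rbind, strsplit(unlist(entries), "|", fixed = TRUE)),
  stringsAsFactors = FALSE
)
colnames(csq_long) <- csq_fields
long_table <- cbind(variants[row_id, c("CHROM", "POS")], csq_long)
rownames(long_table) <- NULL
print(long_table)
```

The first variant has two CSQ entries and so appears twice in long_table, while the second appears once. The same row index could be used to repeat the VAF and genotype columns.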

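Output Columns

The resulting table contains CHROM, POS, REF, ALT, QUAL, and FILTER from the fixed VCF fields, the INFO string without the CSQ entry, one column per retained CSQ field, VAF, and the FORMAT and per-sample genotype columns from the VCF.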
