If you're working with Next-Generation Sequencing (NGS) data, understanding file formats is key to managing your data efficiently. Here’s a quick overview:
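samtools view -b file.sam > file.bam                          # SAM to BAM (-S is only needed by samtools older than 1.0)
samtools view -h file.bam > file.sam                          # BAM to SAM, keeping the header
samtools view -C -T reference.fasta file.bam > file.cram      # BAM to CRAM (needs the reference)
samtools view -b -T reference.fasta file.cram > file.bam      # CRAM back to BAM (same reference)
samtools index file.bam                                       # writes file.bam.bai; the BAM must be coordinate-sorted
samtools index file.cram                                      # writes file.cram.crai
tabix -p vcf file.vcf.gz                                      # index a bgzip-compressed VCF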
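The conversion and indexing commands above can be chained into one pass. This is a minimal sketch rather than a definitive pipeline: the names `run`, `sam_to_archive`, and `DRY_RUN` are hypothetical, and it assumes `samtools` is on your PATH and that the reference FASTA is the one the reads were aligned to. It sorts first because `samtools index` needs coordinate-sorted input.

```shell
#!/usr/bin/env bash
# Sketch: SAM -> sorted BAM -> BAM index -> CRAM -> CRAM index.
# run, sam_to_archive and DRY_RUN are hypothetical names used for illustration.

run() {
  # Print the command; execute it only when DRY_RUN is not set to 1.
  echo "+ $*"
  if [ "${DRY_RUN:-0}" != "1" ]; then
    "$@"
  fi
}

sam_to_archive() {
  local sam="$1" ref="$2"
  local base="${sam%.sam}"

  # Coordinate-sort while converting; indexing needs sorted input.
  run samtools sort -o "${base}.sorted.bam" "$sam" || return 1
  run samtools index "${base}.sorted.bam" || return 1

  # CRAM encodes reads against the reference, so -T is required here.
  run samtools view -C -T "$ref" -o "${base}.cram" "${base}.sorted.bam" || return 1
  run samtools index "${base}.cram" || return 1
}
```

To preview the commands without running them: `DRY_RUN=1 sam_to_archive file.sam reference.fasta`.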
Typical file size: CRAM < BAM < FASTQ.GZ (smaller means more disk space saved).
CRAM is ideal for archiving, but decoding it needs the same reference it was encoded against; BAM balances speed and size; FASTQ.GZ stores raw, unaligned reads and typically takes the most space.
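The size ordering depends on your data, so it is worth checking on your own files. A minimal sketch, assuming the files already exist; `size_report` is a hypothetical helper name, and it only prints byte counts sorted smallest first:

```shell
#!/usr/bin/env bash
# Sketch: list files with their size in bytes, smallest first.
# size_report is a hypothetical helper name.

size_report() {
  local f
  for f in "$@"; do
    # wc -c < file prints the byte count without the file name.
    printf '%s\t%s\n' "$(wc -c < "$f" | tr -d ' ')" "$f"
  done | sort -n
}
```

Example call: `size_report file.fastq.gz file.bam file.cram`.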