The early history of the SAM/BAM format

source link: http://lh3.github.io/2015/01/27/the-early-history-of-the-sambam-format

27 January 2015

While looking for an old email on my old (first) MacBook, I came across the many email exchanges from the early days of the SAM/BAM format. Here is a brief summary. The ideas below were proposed by various people in the 1000 Genomes Project analysis group.

  • 2008-10-21: SAM got its name.
  • 2008-10-22: The first day: fixed columns and optional tags; extended CIGAR and binning index.
  • 2008-10-24: Compression suggested. RAZF started. Streamability emphasized.
  • 2008-11-01: 2-byte tags; mate positions as fixed columns.
  • 2008-11-03: Adopted the text/binary dual format. RAZF implemented.
  • 2008-11-06: Sequence dictionary. The very first draft of SAM/BAM spec circulated.
  • 2008-11-07: BGZF proposed. BAM got its name.
  • 2008-11-10: Linear index.
  • 2008-11-12: Read group.
  • 2008-11-14: BGZF implemented. BAM on top of RAZF working.
  • 2008-11-18: Combining binning and linear indices.
  • 2008-11-20: sort/merge/pileup/faidx implemented.
  • 2008-11-21: tview prototype working.
  • 2008-12-08: Final draft sent to 1000g. Adopted the MIT license.
  • 2008-12-22: First public release of samtools. It still works on most BAMs today.
