4. Input file and data format
4.1. BED format
BED (Browser Extensible Data) format is commonly used to describe genomic intervals. A standard BED file has 12 columns, but cobind requires only the first three (chrom, start, end); all other columns are optional:
# BED3 format (chrom, start, end)
chr1 629149 629391
chr1 629720 630165
chr1 631404 631758
...
# BED4 format (chrom, start, end, name)
chr1 629149 629391 region_1
chr1 629720 630165 region_2
chr1 631404 631758 region_3
...
# BED6 format (chrom, start, end, name, score, strand)
chr1 629149 629391 region_1 0 +
chr1 629720 630165 region_2 0 +
chr1 631404 631758 region_3 0 -
...
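To illustrate these column requirements, here is a minimal Python sketch of a BED reader that needs only chrom, start and end, and picks up name, score and strand when they are present. It is not cobind's own parser: `BedRecord` and `parse_bed` are hypothetical names. It assumes UCSC's 0-based, half-open coordinates, so an interval's length is `end - start`. It also splits on any whitespace, so both tab-separated files and the space-separated examples above are accepted.

```python
from typing import Iterable, Iterator, NamedTuple, Optional


class BedRecord(NamedTuple):
    chrom: str
    start: int                    # 0-based, inclusive
    end: int                      # 0-based, exclusive
    name: Optional[str] = None    # column 4, if present
    score: Optional[str] = None   # column 5, kept as text
    strand: Optional[str] = None  # column 6: "+", "-" or "."


def parse_bed(lines: Iterable[str]) -> Iterator[BedRecord]:
    """Yield one BedRecord per data line; only chrom, start and end are required."""
    for raw in lines:
        line = raw.strip()
        # Skip blank lines, comments and UCSC header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()  # splits on tabs or spaces
        if len(fields) < 3:
            raise ValueError(f"BED line needs at least 3 columns: {line!r}")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        if start > end:
            raise ValueError(f"start > end in line: {line!r}")
        # Columns 4-6 are optional; columns 7-12 are ignored in this sketch.
        yield BedRecord(chrom, start, end, *fields[3:6])
```

For example, the first BED3 line above (`chr1 629149 629391`) parses to a record whose interval is 242 bp long.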
4.2. BED-like format
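The following ENCODE peak formats extend BED and keep its first three columns (chrom, start, end):
ENCODE narrowPeak (BED6+4)
ENCODE broadPeak (BED6+3)
ENCODE gappedPeak (BED12+3)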
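narrowPeak is BED6+4. The four extra columns are signalValue, pValue, qValue and peak, where peak is the summit's 0-based offset from chromStart, or -1 if no summit was called. The sketch below parses one tab-separated narrowPeak line and derives the absolute summit position. `NarrowPeak` and `parse_narrowpeak_line` are hypothetical names, not part of cobind. A broadPeak line has the same layout minus the final peak column.

```python
from typing import NamedTuple, Optional


class NarrowPeak(NamedTuple):
    chrom: str
    start: int           # 0-based, inclusive
    end: int             # 0-based, exclusive
    name: str
    score: int           # 0-1000
    strand: str
    signal_value: float
    p_value: float       # -log10; -1 if not computed
    q_value: float       # -log10; -1 if not computed
    peak: int            # summit offset from start; -1 if not called

    @property
    def summit(self) -> Optional[int]:
        """Absolute 0-based summit coordinate, or None if no summit was called."""
        return None if self.peak == -1 else self.start + self.peak


def parse_narrowpeak_line(line: str) -> NarrowPeak:
    f = line.rstrip("\n").split("\t")
    if len(f) != 10:
        raise ValueError(f"narrowPeak needs 10 tab-separated columns, got {len(f)}")
    return NarrowPeak(
        f[0], int(f[1]), int(f[2]), f[3], int(f[4]), f[5],
        float(f[6]), float(f[7]), float(f[8]), int(f[9]),
    )
```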
4.3. bigBed
bigBed is an indexed binary format of a BED file. UCSC’s bedToBigBed and bigBedToBed commands can be used to convert BED files into bigBed files or vice versa.
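bedToBigBed is invoked as `bedToBigBed in.bed chrom.sizes out.bb`, where chrom.sizes lists chromosome lengths. It expects the BED input sorted by chromosome and then by start; UCSC suggests `sort -k1,1 -k2,2n`. The Python sketch below reproduces that ordering in memory and then calls the tool. It assumes bedToBigBed is installed on PATH. The helper names `sort_bed_lines` and `bed_to_bigbed` are hypothetical.

```python
import shutil
import subprocess
from typing import List, Tuple


def _sort_key(line: str) -> Tuple[str, int]:
    # Chromosome compared as a plain string (byte order, like LC_ALL=C),
    # start compared as an integer (like sort's -k2,2n).
    fields = line.split()
    return (fields[0], int(fields[1]))


def sort_bed_lines(lines: List[str]) -> List[str]:
    """Drop header/comment lines and order data lines by chrom, then numeric start."""
    data = [l for l in lines if l.strip() and not l.startswith(("#", "track", "browser"))]
    return sorted(data, key=_sort_key)


def bed_to_bigbed(bed_path: str, chrom_sizes: str, out_path: str) -> None:
    """Run `bedToBigBed in.bed chrom.sizes out.bb`; bed_path must already be sorted."""
    exe = shutil.which("bedToBigBed")
    if exe is None:
        raise FileNotFoundError("bedToBigBed was not found on PATH")
    subprocess.run([exe, bed_path, chrom_sizes, out_path], check=True)
```

Note that in this ordering `chr10` sorts before `chr2`. That is plain string order, not natural chromosome order.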
4.4. bigWig
The bigWig format is an indexed binary format of a wiggle file and is widely used to represent genomic signals. UCSC’s wigToBigWig and bigWigToWig commands can be used to convert wiggle files into bigWig files or vice versa.
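Wiggle data come in fixedStep and variableStep sections whose positions are 1-based, whereas BED-style intervals are 0-based and half-open. The following minimal pure-Python sketch makes that conversion concrete by expanding both section types into (chrom, start, end, value) intervals. `parse_wiggle` is a hypothetical name, not a cobind or UCSC API. The sketch assumes, as in the UCSC wiggle specification, that fixedStep lines always give `start` and `step` and that `span` defaults to 1. Producing an actual bigWig would still go through a tool such as wigToBigWig (`wigToBigWig in.wig chrom.sizes out.bw`).

```python
from typing import Iterable, Iterator, Optional, Tuple

Interval = Tuple[str, int, int, float]  # (chrom, 0-based start, exclusive end, value)


def parse_wiggle(lines: Iterable[str]) -> Iterator[Interval]:
    mode: Optional[str] = None
    chrom: Optional[str] = None
    pos = step = span = 0
    for raw in lines:
        line = raw.strip()
        # Skip blank lines, comments and track/browser lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        if line.startswith(("fixedStep", "variableStep")):
            words = line.split()
            mode = words[0]
            params = dict(w.split("=", 1) for w in words[1:])
            chrom = params["chrom"]
            span = int(params.get("span", 1))  # span defaults to 1
            if mode == "fixedStep":
                pos = int(params["start"])     # 1-based
                step = int(params["step"])
            continue
        if mode == "fixedStep":
            start1, value = pos, float(line)   # one value per line
            pos += step
        elif mode == "variableStep":
            p, v = line.split()                # "position value"
            start1, value = int(p), float(v)
        else:
            raise ValueError(f"data line before any declaration: {line!r}")
        # Shift the 1-based position to a 0-based, half-open interval.
        yield (chrom, start1 - 1, start1 - 1 + span, value)
```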