10. Spatial Relation Of Genomic (SROG) intervals
10.1. Description
Match up two sets of genomic intervals, and report the code of Spatial Relation Of Genomic (SROG).
SROG codes include disjoint
, touch
, equal
, overlap
, contain
, within
.
10.2. Usage
cobind.py srog -h
usage: cobind.py srog [-h] [--dist MAX_DIST] [-l log_file] [-d]
input_A.bed input_B.bed output.tsv
positional arguments:
input_A.bed Genomic regions in BED, BED-like or bigBed format. If
'name' (the 4th column) is not provided, the default
name is "chrom:start-end". If strand (the 6th column)
is not provided, the default strand is "+".
input_B.bed Genomic regions in BED, BED-like or bigBed format. If
'name' (the 4th column) is not provided, the default
name is "chrom:start-end". If strand (the 6th column)
is not provided, the default strand is "+".
output.tsv Generate spatial relation code (disjoint, touch,
equal, overlap, contain, within) for each genomic
interval in "input_A.bed".
options:
-h, --help show this help message and exit
--dist MAX_DIST When intervals are disjoint, find the closest up- and
down-stream intervals that are no further than
`max_dist` away. default: 250000000)
-l log_file, --log log_file
This file is used to save the log information. By
default, if no file is specified (None), the log
information will be printed to the screen.
-d, --debug Print detailed information for debugging.
10.3. Example
cobind.py srog CTCF_ENCFF660GHM.bed3 RAD21_ENCFF057JFH.bed3 output.tsv
2022-01-20 09:01:17 [INFO] Determine the spacial realtions of genomic (SROG) intervals ...
2022-01-20 09:01:17 [INFO] Build interval tree from file: "RAD21_ENCFF057JFH.bed3"
2022-01-20 09:01:17 [INFO] Reading BED file: "CTCF_ENCFF660GHM.bed3"
disjoint 30419
overlap 4341
contain 1695
within 23214
touch 0
equal 1
other 0
dtype: int64
Match up results were saved to output.tsv
$head -10 output.tsv
chr12 53676079 53676369 within chr12:53676060-53676382
chr12 57905364 57905661 within chr12:57905272-57905699
chr22 20564334 20564661 contain chr22:20564370-20564581
chr16 57649065 57649362 within chr16:57649007-57649370
chr17 45135294 45135610 overlap chr17:45135296-45135642
chr15 40274737 40275016 within chr15:40274714-40275018
chr1 114346538 114346847 within chr1:114346526-114346903
chr7 151172578 151172888 overlap chr7:151172565-151172865
chr1 225474965 225475268 within chr1:225474919-225475330
chr5 179668464 179668730 contain chr5:179668495-179668674
...
chr22 23128466 23128723 disjoint UpInterval=chr22:22651059-22651463,
DownInterval=chr22:23385972-23386169
...
- Column 1-3
Genome intervals from “CTCF_ENCFF660GHM.bed3”.
- Column 4
SROG code. When SORG =
disjoint
, two closest intervals (up- and down-stream) fromRAD21_ENCFF057JFH.bed3
were reported.- column 5
Genomic intervals from
RAD21_ENCFF057JFH.bed3
.