11. Stat
11.1. Description
- Wrapper function. Reports basic statistics of genomic intervals, including:
  - count
  - total size
  - unique size
  - mean size
  - median size
  - min size
  - max size
  - standard deviation
- Calculates overlap measurements, including:
  - collocation coefficient (C)
  - Jaccard similarity coefficient (J)
  - Sørensen–Dice coefficient (SD)
  - Szymkiewicz–Simpson coefficient (SS)
  - pointwise mutual information (PMI)
  - normalized pointwise mutual information (NPMI)
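The basic statistics listed above can be illustrated with a minimal sketch. It is not taken from the cobind source: the function names `merged_size` and `interval_stats` are hypothetical, intervals are assumed to be half-open BED-style `(chrom, start, end)` tuples, "unique size" is assumed to mean the number of bases covered after merging overlapping intervals, and the sample standard deviation is used because the document does not say which variant cobind reports.

```python
import statistics


def merged_size(intervals):
    """Bases covered after merging overlapping intervals (assumed 'unique size')."""
    covered = 0
    cur_chrom = cur_start = cur_end = None
    for chrom, start, end in sorted(intervals):
        if chrom != cur_chrom or start > cur_end:
            # Close the previous merged block, if any, and start a new one.
            if cur_chrom is not None:
                covered += cur_end - cur_start
            cur_chrom, cur_start, cur_end = chrom, start, end
        else:
            # Overlapping or adjacent interval: extend the current block.
            cur_end = max(cur_end, end)
    if cur_chrom is not None:
        covered += cur_end - cur_start
    return covered


def interval_stats(intervals):
    """Summary statistics of interval sizes (hypothetical helper)."""
    sizes = [end - start for _, start, end in intervals]
    return {
        "count": len(sizes),
        "total_size": sum(sizes),
        "unique_size": merged_size(intervals),
        "mean_size": statistics.mean(sizes),
        "median_size": statistics.median(sizes),
        "min_size": min(sizes),
        "max_size": max(sizes),
        # Assumption: sample SD (n - 1 denominator); needs at least 2 intervals.
        "size_SD": statistics.stdev(sizes),
    }


demo = [("chr1", 0, 100), ("chr1", 50, 150), ("chr2", 0, 60)]
stats = interval_stats(demo)
print(stats["total_size"], stats["unique_size"])  # 260 210
```

In this toy input the two chr1 intervals overlap by 50 bases, so the total size (260) exceeds the unique size (210).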
11.2. Usage
cobind.py stat -h
usage: cobind.py stat [-h] [--nameA NAMEA] [--nameB NAMEB] [-b BGSIZE]
                      [-l log_file] [-d]
                      input_A.bed input_B.bed

positional arguments:
  input_A.bed           Genomic regions in BED, BED-like or bigBed format. The
                        BED-like format includes:'bed3', 'bed4', 'bed6',
                        'bed12', 'bedgraph', 'narrowpeak', 'broadpeak',
                        'gappedpeak'. BED and BED-like format can be plain
                        text, compressed (.gz, .z, .bz, .bz2, .bzip2) or
                        remote (http://, https://, ftp://) files. Do not
                        compress BigBed foramt. BigBed file can also be a
                        remote file.
  input_B.bed           Genomic regions in BED, BED-like or bigBed format. The
                        BED-like format includes:'bed3', 'bed4', 'bed6',
                        'bed12', 'bedgraph', 'narrowpeak', 'broadpeak',
                        'gappedpeak'. BED and BED-like format can be plain
                        text, compressed (.gz, .z, .bz, .bz2, .bzip2) or
                        remote (http://, https://, ftp://) files. Do not
                        compress BigBed foramt. BigBed file can also be a
                        remote file.

options:
  -h, --help            show this help message and exit
  --nameA NAMEA         Name to represent 1st set of genomic interval. If not
                        specified (None), the file name ("input_A.bed") will
                        be used.
  --nameB NAMEB         Name to represent the 2nd set of genomic interval. If
                        not specified (None), the file name ("input_B.bed")
                        will be used.
  -b BGSIZE, --background BGSIZE
                        The size of the cis-regulatory genomic regions. This
                        is about 1.4Gb For the human genome. (default:
                        1400000000)
  -l log_file, --log log_file
                        This file is used to save the log information. By
                        default, if no file is specified (None), the log
                        information will be printed to the screen.
  -d, --debug           Print detailed information for debugging.
11.3. Example
cobind.py stat CTCF_ENCFF660GHM.bed RAD21_ENCFF057JFH.bed
2022-07-09 09:44:12 [INFO] Gathering information for "CTCF_ENCFF660GHM.bed" ...
2022-07-09 09:44:12 [INFO] Gathering information for "RAD21_ENCFF057JFH.bed" ...
A.name CTCF_ENCFF660GHM.bed
A.interval_count 58684
A.interval_total_size 12190325
A.interval_mean_size 207.7283
A.interval_median_size 240.0000
A.interval_min_size 60
A.interval_max_size 576
A.interval_size_SD 51.5489
B.name RAD21_ENCFF057JFH.bed
B.interval_count 33373
B.interval_total_size 11381586
B.interval_mean_size 341.0417
B.interval_median_size 404.0000
B.interval_min_size 101
B.interval_max_size 553
B.interval_size_SD 96.8607
G.size 1400000000.0000
A.size 12184840
Not_A.size 1387815160.0000
B.size 11130268
Not_B.size 1388869732.0000
A_not_B.size 7245355
B_not_A.size 6190783
A_and_B.size 4939485
A_and_B.exp_size 96871.8105
A_or_B.size 18375623
Neither_A_nor_B.size 1381624377.0000
coef.Collocation 0.4241
coef.Jaccard 0.2688
coef.Dice 0.4237
coef.SS 0.4438
A_and_B.PMI 3.9316
A_and_B.NPMI 0.6962
dtype: object
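The coefficients in this output can be reproduced from the reported sizes. The sketch below uses the standard textbook definitions of each measure, with the natural logarithm for PMI. These formulas are inferred from the numbers above, not copied from the cobind source, and the function name `overlap_coefficients` is hypothetical. With the `A.size`, `B.size`, `A_and_B.size` and `G.size` values from the example, the results agree with the printed values to four decimal places.

```python
import math


def overlap_coefficients(a_size, b_size, a_and_b, g_size):
    """Overlap measures from unique sizes (hypothetical helper).

    Assumes a_and_b > 0; PMI is undefined when the sets do not overlap.
    """
    a_or_b = a_size + b_size - a_and_b
    # Overlap expected by chance if A and B were placed independently in G.
    expected = a_size * b_size / g_size
    pmi = math.log(a_and_b / expected)
    # Normalize by -log p(A, B), which bounds NPMI to [-1, 1].
    npmi = pmi / -math.log(a_and_b / g_size)
    return {
        "A_or_B.size": a_or_b,
        "A_and_B.exp_size": expected,
        "coef.Collocation": a_and_b / math.sqrt(a_size * b_size),
        "coef.Jaccard": a_and_b / a_or_b,
        "coef.Dice": 2 * a_and_b / (a_size + b_size),
        "coef.SS": a_and_b / min(a_size, b_size),
        "A_and_B.PMI": pmi,
        "A_and_B.NPMI": npmi,
    }


# Sizes taken from the CTCF / RAD21 example output above.
result = overlap_coefficients(
    a_size=12184840, b_size=11130268, a_and_b=4939485, g_size=1_400_000_000
)
for key, value in result.items():
    print(f"{key}\t{value:.4f}")
```

Because the expected overlap scales with `1 / g_size`, the PMI and NPMI values depend on the background size passed with `-b`, while the Collocation, Jaccard, Dice and SS coefficients do not.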