11. Stat

11.1. Description

Wrapper function that reports basic statistics for each of the two sets of genomic intervals, including:
  • count

  • total size

  • unique size

  • mean size

  • median size

  • min size

  • max size

  • standard deviation of sizes

and calculates overlap measurements between the two sets, including:
  • collocation coefficient (C)

  • Jaccard similarity coefficient (J)

  • Sørensen–Dice coefficient (SD)

  • Szymkiewicz–Simpson coefficient (SS)

  • pointwise mutual information (PMI)

  • normalized pointwise mutual information (NPMI)
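The per-set statistics listed above can be illustrated with a small standard-library sketch. It is not cobind's implementation: the `(chrom, start, end)` tuples and the helpers `merge`, `unique_size`, `intersection_size` and `interval_stats` are hypothetical. The sketch assumes that "total size" sums the lengths of all intervals (overlapping bases counted more than once), while "unique size" counts each covered base once after merging overlapping intervals. `intersection_size` returns the bases shared by two sets, the quantity the overlap measurements start from. Whether cobind reports a sample or a population standard deviation is also an assumption here.

```python
import statistics
from typing import List, Tuple

# Hypothetical representation: (chrom, start, end), 0-based half-open as in BED.
Interval = Tuple[str, int, int]


def merge(intervals: List[Interval]) -> List[Interval]:
    """Sort intervals and merge those that overlap or touch on the same chromosome."""
    merged: List[Interval] = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            _, prev_start, prev_end = merged[-1]
            merged[-1] = (chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


def unique_size(intervals: List[Interval]) -> int:
    """Number of bases covered at least once (overlaps counted once)."""
    return sum(end - start for _, start, end in merge(intervals))


def intersection_size(a: List[Interval], b: List[Interval]) -> int:
    """Number of bases covered by both sets, via a two-pointer sweep."""
    a, b = merge(a), merge(b)
    i = j = total = 0
    while i < len(a) and j < len(b):
        ca, sa, ea = a[i]
        cb, sb, eb = b[j]
        if ca < cb or (ca == cb and ea <= sb):
            i += 1  # a[i] lies entirely before b[j]
        elif cb < ca or eb <= sa:
            j += 1  # b[j] lies entirely before a[i]
        else:
            total += min(ea, eb) - max(sa, sb)
            # advance whichever interval ends first
            if ea <= eb:
                i += 1
            else:
                j += 1
    return total


def interval_stats(intervals: List[Interval]) -> dict:
    """Basic size statistics of one set of intervals."""
    sizes = [end - start for _, start, end in intervals]
    return {
        "count": len(sizes),
        "total_size": sum(sizes),
        "unique_size": unique_size(intervals),
        "mean_size": statistics.mean(sizes),
        "median_size": statistics.median(sizes),
        "min_size": min(sizes),
        "max_size": max(sizes),
        # sample SD (n - 1 denominator); cobind's exact choice is an assumption
        "size_SD": statistics.stdev(sizes),
    }
```

For instance, for `[("chr1", 0, 100), ("chr1", 50, 150)]` the total size is 200 bp but the unique size is 150 bp, because the 50 bp overlap is counted only once.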

11.2. Usage

cobind.py stat -h

usage: cobind.py stat [-h] [--nameA NAMEA] [--nameB NAMEB] [-b BGSIZE]
                      [-l log_file] [-d]
                      input_A.bed input_B.bed

positional arguments:
  input_A.bed           Genomic regions in BED, BED-like or bigBed format. The
                        BED-like format includes: 'bed3', 'bed4', 'bed6',
                        'bed12', 'bedgraph', 'narrowpeak', 'broadpeak',
                        'gappedpeak'. BED and BED-like format can be plain
                        text, compressed (.gz, .z, .bz, .bz2, .bzip2) or
                        remote (http://, https://, ftp://) files. Do not
                        compress bigBed files. A bigBed file can also be a
                        remote file.
  input_B.bed           Genomic regions in BED, BED-like or bigBed format. The
                        BED-like format includes: 'bed3', 'bed4', 'bed6',
                        'bed12', 'bedgraph', 'narrowpeak', 'broadpeak',
                        'gappedpeak'. BED and BED-like format can be plain
                        text, compressed (.gz, .z, .bz, .bz2, .bzip2) or
                        remote (http://, https://, ftp://) files. Do not
                        compress bigBed files. A bigBed file can also be a
                        remote file.

options:
  -h, --help            show this help message and exit
  --nameA NAMEA         Name to represent the 1st set of genomic intervals. If
                        not specified (None), the file name ("input_A.bed")
                        will be used.
  --nameB NAMEB         Name to represent the 2nd set of genomic intervals. If
                        not specified (None), the file name ("input_B.bed")
                        will be used.
  -b BGSIZE, --background BGSIZE
                        The size of the cis-regulatory genomic regions. For
                        the human genome, this is about 1.4 Gb. (default:
                        1400000000)
  -l log_file, --log log_file
                        This file is used to save the log information. By
                        default, if no file is specified (None), the log
                        information will be printed to the screen.
  -d, --debug           Print detailed information for debugging.
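The positional arguments accept plain or compressed BED-like files. As an illustration only, a reader for local BED files could look like the sketch below; `open_bed` and `read_bed3` are hypothetical helpers, not part of cobind. It chooses a decompressor from the file suffix using Python's standard `gzip` and `bz2` modules and keeps only the first three columns. Remote URLs, bigBed input and the `.z` and `.bz` suffixes that cobind also lists are deliberately not handled here.

```python
import bz2
import gzip
from typing import IO, Iterator, Tuple


def open_bed(path: str) -> IO[str]:
    """Open a local BED-like file as text, decompressing by file suffix."""
    lower = path.lower()
    if lower.endswith(".gz"):
        return gzip.open(path, "rt")
    if lower.endswith((".bz2", ".bzip2")):
        return bz2.open(path, "rt")
    return open(path, "rt")


def read_bed3(path: str) -> Iterator[Tuple[str, int, int]]:
    """Yield (chrom, start, end) from the first three columns of each data line."""
    with open_bed(path) as fh:
        for line in fh:
            # skip blank lines and UCSC header/comment lines
            if not line.strip() or line.startswith(("#", "track", "browser")):
                continue
            fields = line.split()
            yield fields[0], int(fields[1]), int(fields[2])
```

Extra columns (name, score, strand, peak statistics) are simply ignored, which is enough for size and overlap calculations.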

11.3. Example

cobind.py stat CTCF_ENCFF660GHM.bed RAD21_ENCFF057JFH.bed

2022-07-09 09:44:12 [INFO]  Gathering information for "CTCF_ENCFF660GHM.bed" ...
2022-07-09 09:44:12 [INFO]  Gathering information for "RAD21_ENCFF057JFH.bed" ...
A.name                     CTCF_ENCFF660GHM.bed
A.interval_count                          58684
A.interval_total_size                  12190325
A.interval_mean_size                   207.7283
A.interval_median_size                 240.0000
A.interval_min_size                          60
A.interval_max_size                         576
A.interval_size_SD                      51.5489
B.name                    RAD21_ENCFF057JFH.bed
B.interval_count                          33373
B.interval_total_size                  11381586
B.interval_mean_size                   341.0417
B.interval_median_size                 404.0000
B.interval_min_size                         101
B.interval_max_size                         553
B.interval_size_SD                      96.8607
G.size                          1400000000.0000
A.size                                 12184840
Not_A.size                      1387815160.0000
B.size                                 11130268
Not_B.size                      1388869732.0000
A_not_B.size                            7245355
B_not_A.size                            6190783
A_and_B.size                            4939485
A_and_B.exp_size                     96871.8105
A_or_B.size                            18375623
Neither_A_nor_B.size            1381624377.0000
coef.Collocation                         0.4241
coef.Jaccard                             0.2688
coef.Dice                                0.4237
coef.SS                                  0.4438
A_and_B.PMI                              3.9316
A_and_B.NPMI                             0.6962
dtype: object
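The overlap measurements in this example can be recomputed from the reported sizes. The formulas in the following sketch are inferred from the printed values rather than taken from cobind's source, so treat them as assumptions. The expected overlap is A.size × B.size / G.size (independent placement within the background), PMI is the natural log of observed over expected overlap, and NPMI divides PMI by −ln(A_and_B.size / G.size). The function name `overlap_measures` is hypothetical.

```python
import math

# Sizes copied from the example output above (bp); G is the --background default.
G = 1_400_000_000
A = 12_184_840   # A.size (unique bases in A)
B = 11_130_268   # B.size (unique bases in B)
AB = 4_939_485   # A_and_B.size


def overlap_measures(a, b, ab, g):
    """Overlap coefficients from set sizes (formulas inferred, not official)."""
    union = a + b - ab
    expected = a * b / g              # overlap expected under independence
    pmi = math.log(ab / expected)     # natural log
    return {
        "A_and_B.exp_size": expected,
        "A_or_B.size": union,
        "Neither_A_nor_B.size": g - union,
        "coef.Collocation": ab / math.sqrt(a * b),
        "coef.Jaccard": ab / union,
        "coef.Dice": 2 * ab / (a + b),
        "coef.SS": ab / min(a, b),
        "A_and_B.PMI": pmi,
        "A_and_B.NPMI": pmi / -math.log(ab / g),
    }


for name, value in overlap_measures(A, B, AB, G).items():
    print(f"{name:<22}{value:>18.4f}")
```

Running the script prints values that agree with the corresponding lines of the example output to four decimal places; for example, the collocation coefficient is 4939485 / sqrt(12184840 × 11130268) ≈ 0.4241.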