Compare different metrics

The table below gives the lower and upper bounds of the six metrics and their major drawbacks, if any.

| Metric | Lower bound | Upper bound | Comments |
|---|---|---|---|
| C(A,B) | 0 (no overlap) | 1 (A = B) | |
| J(A,B) | 0 (no overlap) | 1 (A = B) | Bias towards the larger interval |
| SD(A,B) | 0 (no overlap) | 1 (A = B) | Bias towards the larger interval |
| SS(A,B) | 0 (no overlap) | 1 (A = B, A ⊆ B, or B ⊆ A) | Bias towards the smaller interval |
| PMI | -inf (no overlap) | min(-log(p(A)), -log(p(B))) | No fixed bound |
| NPMI | -1 (no overlap) | 1 (A = B) | |
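The PMI and NPMI bounds listed above can be checked numerically. The sketch below assumes the standard definitions PMI = log(p(A,B) / (p(A) p(B))) and NPMI = PMI / (-log p(A,B)), with each probability taken as a size divided by a background size. The function name `pmi_npmi` and the `background` parameter are hypothetical and chosen only for illustration; how the probabilities are actually estimated is not specified here.

```python
import math

def pmi_npmi(size_a, size_b, size_overlap, background):
    """Return (PMI, NPMI) for two interval sets.

    Assumption: p(A) = |A| / background, p(B) = |B| / background,
    p(A,B) = |A ∩ B| / background, and both sets are smaller than the background.
    """
    if not 0 < size_a < background or not 0 < size_b < background:
        raise ValueError("interval sizes must be positive and smaller than the background")
    p_a = size_a / background
    p_b = size_b / background
    p_ab = size_overlap / background
    if p_ab == 0:
        # No overlap: PMI diverges to -inf and NPMI takes its lower bound.
        return -math.inf, -1.0
    pmi = math.log(p_ab / (p_a * p_b))
    npmi = pmi / -math.log(p_ab)
    return pmi, npmi

# Illustrative sizes only; the background of 1,000,000 is an arbitrary choice.
print(pmi_npmi(1000, 1000, 0, 1_000_000))     # no overlap
print(pmi_npmi(1000, 1000, 1000, 1_000_000))  # A = B
print(pmi_npmi(100, 1000, 100, 1_000_000))    # A within B
```

Under these assumptions, PMI reaches its upper bound min(-log(p(A)), -log(p(B))) whenever one set lies within the other, and that bound changes with the sizes of A and B, which is why PMI has no fixed bound while NPMI is confined to [-1, 1].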

The table below compares the four intersection-based metrics: C, J, SD, and SS. All four are bounded by 0 and 1. When the sizes of the two genomic intervals differ significantly, C is less sensitive to the extremes and gives a compromise score that lies between those of J/SD and SS.

C(A,B) vs J(A,B) vs SD(A,B) vs SS(A,B)

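| SROG | \|A\| | \|B\| | \|A ∩ B\| | \|A ∪ B\| | C | J | SD | SS |
|---|---|---|---|---|---|---|---|---|
| A equals B | 1000 | 1000 | 1000 | 1000 | 1 | 1 | 1 | 1 |
| A disjoint B | 1000 | 1000 | 0 | 2000 | 0 | 0 | 0 | 0 |
| A overlaps B | 100 | 1000 | 50 | 1050 | 0.158 | 0.0476 | 0.0909 | 0.5 |
| A within B | 100 | 1000 | 100 | 1000 | 0.316 | 0.1 | 0.182 | 1 |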
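The values in the table can be reproduced from the interval sizes alone. The sketch below assumes these definitions: C = |A ∩ B| / sqrt(|A| · |B|), J = |A ∩ B| / |A ∪ B|, SD = 2|A ∩ B| / (|A| + |B|), and SS = |A ∩ B| / min(|A|, |B|). They are consistent with every row of the table. The function name `overlap_metrics` is hypothetical and not part of any library.

```python
import math

def overlap_metrics(size_a, size_b, size_overlap):
    """Return (C, J, SD, SS) computed from |A|, |B| and |A ∩ B|."""
    size_union = size_a + size_b - size_overlap
    c = size_overlap / math.sqrt(size_a * size_b)   # collocation coefficient
    j = size_overlap / size_union                   # Jaccard
    sd = 2 * size_overlap / (size_a + size_b)       # Sørensen-Dice
    ss = size_overlap / min(size_a, size_b)         # Szymkiewicz-Simpson
    return c, j, sd, ss

# The four spatial relations from the table: (|A|, |B|, |A ∩ B|).
cases = {
    "A equals B": (1000, 1000, 1000),
    "A disjoint B": (1000, 1000, 0),
    "A overlaps B": (100, 1000, 50),
    "A within B": (100, 1000, 100),
}

for name, (a, b, ab) in cases.items():
    c, j, sd, ss = overlap_metrics(a, b, ab)
    print(f"{name:14s} C={c:.3g} J={j:.3g} SD={sd:.3g} SS={ss:.3g}")
```

In the two rows where |A| and |B| differ tenfold, C falls between J/SD on the low side and SS on the high side, which is the compromise behaviour described above.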

CTCF: Demonstration

70-95% of CTCF binding sites are also bound by the cohesin complex (including SMC1, SMC3, RAD21, STAG1, and STAG2) to establish chromatin loops and regulate gene expression [1], [2].

We used CTCF-cohesin as a positive control to evaluate the performance of the six collocation measurements (C, J, SD, SS, PMI, and NPMI). We first calculated the scores of these metrics between the cistrome of CTCF (defined as all of its binding sites) and the cistromes of 1207 TFs curated in the ReMap database. We then calculated the Z-score as an overall measurement of cobindability. Note that TRIM22 is not part of the cohesin complex, but multiple studies have identified it as a critical regulator of chromatin structure, and TRIM22 binding sites are highly enriched at chromatin contact domain boundaries [3], [4].
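The text does not spell out how the Z-score is derived from the six metrics. One plausible reading, shown here purely as an assumption, is to standardize each metric across all TFs and then average the per-metric z-scores for each TF. The function name `combined_zscores` and the toy scores are hypothetical and are not taken from the ReMap analysis.

```python
import statistics

def combined_zscores(scores):
    """Average per-metric z-scores for each TF.

    scores: {tf_name: [metric_1, metric_2, ...]}, same metric order for every TF.
    Assumption: all values are finite (an infinite PMI would need special handling).
    """
    names = list(scores)
    n_metrics = len(scores[names[0]])
    totals = {name: 0.0 for name in names}
    for k in range(n_metrics):
        column = [scores[name][k] for name in names]
        mean = statistics.fmean(column)
        sd = statistics.pstdev(column)
        for name in names:
            # A metric with zero spread carries no ranking information.
            z = 0.0 if sd == 0 else (scores[name][k] - mean) / sd
            totals[name] += z
    return {name: totals[name] / n_metrics for name in names}

# Made-up scores for three hypothetical TFs and two metrics.
toy = {"tf1": [0.9, 0.8], "tf2": [0.1, 0.2], "tf3": [0.5, 0.5]}
ranked = sorted(combined_zscores(toy).items(), key=lambda kv: kv[1], reverse=True)
print(ranked)
```

Averaging standardized scores puts metrics with different ranges, such as PMI and J, on a common scale before they are combined.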


Collocation between CTCF binding sites and the binding sites of 1207 TFs was evaluated using the six measurements as well as the Z-score. Only the top 20 TFs are displayed.