PMI compared with traditional methods

In bioinformatics, we generally use three main types of methods to link genes or proteins together. Here is how PMI compares to them in terms of logic and performance.

1. PMI vs. Pearson/Spearman Correlation

This is the most common approach among biologists (e.g., in the tool WGCNA). It measures how much two genes move up or down together. The problem is that correlation cannot see “curves.” If Gene A activates Gene B in a “bell curve” fashion, correlation will say the relationship is close to $0$. It also cannot distinguish between a direct link and an indirect link.

PMI handles any shape of relationship (non-linear). Most importantly, if Gene A and Gene B are both controlled by Gene C, correlation will say A and B are “linked.” PMI recognizes that they are only linked because of C and gives the A-B link a score close to $0$.
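The “bell curve” blind spot can be reproduced on simulated data. The sketch below is only an illustration: the gene names, noise level, and bin count are assumptions, and it uses a plain histogram estimate of mutual information (the building block that PMI extends), not the PMI tool itself.

```python
import numpy as np

def mutual_information(x, y, bins=16):
    """Histogram (plug-in) estimate of I(X;Y) in nats."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()               # joint probabilities
    px = pxy.sum(axis=1, keepdims=True)     # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)     # marginal of y, shape (1, bins)
    nz = pxy > 0                            # skip empty cells to avoid log(0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

rng = np.random.default_rng(0)
n = 5000

# Hypothetical data: Gene B responds to Gene A along a symmetric bell curve.
gene_a = rng.uniform(-3, 3, size=n)
gene_b = np.exp(-gene_a**2) + rng.normal(0, 0.05, size=n)

pearson_r = np.corrcoef(gene_a, gene_b)[0, 1]
mi = mutual_information(gene_a, gene_b)

# Shuffling Gene B destroys the dependence and gives a baseline for the
# estimator's bias on independent data.
mi_null = mutual_information(gene_a, rng.permutation(gene_b))

print(f"Pearson r:           {pearson_r:.3f}")
print(f"Mutual information:  {mi:.3f} nats")
print(f"Shuffled baseline:   {mi_null:.3f} nats")
```

Pearson's r should come out near zero, while the mutual information stays well above the shuffled baseline. This is the gap described above: the dependence is strong, but it is not linear.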

2. PMI vs. Partial Correlation

This is used to find “direct” links. It calculates the relationship between $X$ and $Y$ while holding $Z$ constant. The problem is that partial correlation only captures linear relationships, and it only works as a test for direct dependence when the data follows a “normal distribution” (the bell curve). Biological data, especially single-cell data, is “noisy” and “sparse” (lots of zeros), which violates these assumptions. PMI is “distribution-free”: it doesn’t care if your data is a bell curve or a mess. It works from the probability density estimated from your data, making it much more robust for real-world biological samples.
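A small simulation shows how this limitation plays out. In the sketch below, Gene C drives both Gene A and Gene B through a non-linear (quadratic) effect, and A and B have no direct link. The data, names, and bin counts are assumptions made for illustration. Conditional mutual information (CMI) is used as a simpler, related stand-in for PMI, so this is not the tool's actual estimator.

```python
import numpy as np

def partial_correlation(x, y, z):
    """Correlation of x and y after linearly regressing each on z."""
    design = np.column_stack([np.ones_like(z), z])
    res_x = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
    res_y = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    return float(np.corrcoef(res_x, res_y)[0, 1])

def mutual_information(x, y, bins=8):
    """Histogram (plug-in) estimate of I(X;Y) in nats."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

def conditional_mutual_information(x, y, z, bins=(8, 8, 20)):
    """Histogram (plug-in) estimate of I(X;Y|Z) in nats."""
    counts, _ = np.histogramdd(np.column_stack([x, y, z]), bins=bins)
    pxyz = counts / counts.sum()
    pxz = pxyz.sum(axis=1, keepdims=True)        # p(x, z)
    pyz = pxyz.sum(axis=0, keepdims=True)        # p(y, z)
    pz = pxyz.sum(axis=(0, 1), keepdims=True)    # p(z)
    nz = pxyz > 0
    # The denominator is set to 1 in empty cells; those cells are never summed.
    ratio = (pxyz * pz) / np.where(nz, pxz * pyz, 1.0)
    return float(np.sum(pxyz[nz] * np.log(ratio[nz])))

rng = np.random.default_rng(1)
n = 20000

# Hypothetical common driver: C controls A and B; A and B are not linked directly.
gene_c = rng.uniform(-2, 2, size=n)
gene_a = gene_c**2 + rng.normal(0, 0.5, size=n)
gene_b = gene_c**2 + rng.normal(0, 0.5, size=n)

partial_r = partial_correlation(gene_a, gene_b, gene_c)
mi_ab = mutual_information(gene_a, gene_b)
cmi_ab_given_c = conditional_mutual_information(gene_a, gene_b, gene_c)

print(f"Partial correlation A-B | C: {partial_r:.3f}")
print(f"Mutual information A-B:      {mi_ab:.3f} nats")
print(f"Conditional MI A-B | C:      {cmi_ab_given_c:.3f} nats")
```

Because the effect of C is not linear, the linear adjustment leaves the A-B partial correlation high. Conditioning on C with an information-based measure brings the A-B score down close to zero. The small remainder comes from the coarse binning and the known positive bias of plug-in estimators.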

3. PMI vs. Transfer Entropy / Granger Causality

These are “advanced” methods used to find the direction of a relationship (who controls whom). They require time-series data (measurements taken at successive time points, minutes or hours apart). In biology, most public data (like TCGA) is a “snapshot” of a tumor at a single moment, so these methods cannot be applied to it. PMI does not need a time axis: it aims for a comparable ability to filter out indirect links while working on static (snapshot) data. This makes it much more versatile for the thousands of datasets already sitting on public servers.
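The need for time-series data follows directly from the definition. For reference, the standard form of transfer entropy from $X$ to $Y$, written here with a history length of one time step (a simplifying assumption), is:

$$
T_{X \to Y} = \sum_{y_{t+1},\, y_t,\, x_t} p(y_{t+1}, y_t, x_t) \, \log \frac{p(y_{t+1} \mid y_t, x_t)}{p(y_{t+1} \mid y_t)}
$$

Every term compares the state of $Y$ at time $t+1$ with the states of $X$ and $Y$ at time $t$. A snapshot dataset has one measurement per sample and no ordered pair $(t, t+1)$, so these probabilities cannot be estimated from it.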

Here is a quick summary

| Method | Handles non-linear? | Removes indirect links? | Works on static data? | Computation speed |
|---|---|---|---|---|
| Correlation | No | No | Yes | Very fast |
| Partial Correlation | No | Yes | Yes | Fast |
| Mutual Information | Yes | No | Yes | Fast |
| Transfer Entropy | Yes | Yes | No | Slow |
| PMI (the winner) | Yes | Yes | Yes | Medium (PC friendly) |

If you look at recent papers in journals like Nature Communications or Bioinformatics, a common complaint from authors runs along these lines:

> “Our network likely contains false positives due to indirect correlations that we could not filter out because existing non-linear methods are too computationally expensive.”

By implementing a PMI-based tool, we provide the community with a “golden middle.” It is more accurate than correlation, but unlike the “Big AI” models, it doesn’t require a 10,000-dollar computer.

We essentially took a high-level mathematical concept and turned it into a “filter” that cleans up biological noise using just a standard CPU. This could, for instance, allow researchers with a normal laptop to find a “true” drug target that others missed because their data was cluttered with indirect “ghost” links.

Dany Mukesha

2026-05-13
