
In bioinformatics, we generally use three main types of methods to link genes or proteins together. Here is how PMI compares to them in terms of logic and performance.

1. PMI vs. Pearson/Spearman Correlation

This is the approach most biologists use (e.g., in the tool WGCNA). It measures how much two genes move up or down together. The problem is that correlation cannot see “curves.” If Gene A activates Gene B in a “bell curve” fashion, correlation will say the relationship is close to 0. It also cannot distinguish between a direct link and an indirect link. PMI handles any shape of relationship (non-linear). Most importantly, if Gene A and Gene B are both controlled by Gene C, correlation will say A and B are “linked.” PMI will realize they are only linked because of C and will correctly give you a score of 0 for the A-B link.
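To make the “bell curve” point concrete, here is a minimal sketch on simulated data. It assumes NumPy is available; the gene names, the noise level, and the helper `binned_mutual_information` (a plain histogram estimate of mutual information, not PMI itself) are illustrative choices, not part of any tool described here.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation coefficient between two 1-D arrays."""
    return float(np.corrcoef(x, y)[0, 1])

def binned_mutual_information(x, y, bins=16):
    """Plug-in mutual information estimate (in nats) from a 2-D histogram."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    # Product of the marginals, i.e. what the joint would be under independence.
    expected = pxy.sum(axis=1, keepdims=True) @ pxy.sum(axis=0, keepdims=True)
    mask = pxy > 0
    return float(np.sum(pxy[mask] * np.log(pxy[mask] / expected[mask])))

rng = np.random.default_rng(0)
gene_a = rng.uniform(-3, 3, size=5000)
# Hypothetical bell-shaped response: Gene B peaks when Gene A is near 0.
gene_b = np.exp(-gene_a**2) + rng.normal(0, 0.05, size=5000)
# Shuffling Gene B destroys the relationship and gives a baseline for the estimator.
shuffled_b = rng.permutation(gene_b)

r_bell = pearson(gene_a, gene_b)
mi_bell = binned_mutual_information(gene_a, gene_b)
mi_null = binned_mutual_information(gene_a, shuffled_b)

print(f"Pearson r:           {r_bell:.3f}")
print(f"Mutual information:  {mi_bell:.3f} nats")
print(f"Shuffled baseline:   {mi_null:.3f} nats")
```

In this simulation the correlation should sit near zero while the mutual information stays clearly above the shuffled baseline, which is the gap that information-based scores are meant to close.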

2. PMI vs. Partial Correlation

This is used to find “direct” links. It tries to calculate the relationship between X and Y while holding Z constant. Partial Correlation assumes the data follows a “Normal Distribution” (the bell curve). Biological data, especially single-cell data, is “noisy” and “sparse” (lots of zeros), which breaks this method. PMI is “distribution-free.” It doesn’t care if your data is a bell curve or a mess. It uses the actual probability density of your data, making it much more robust for real-world biological samples.
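As a rough illustration of “holding Z constant,” the sketch below computes a partial correlation by regressing Z out of both X and Y and correlating the residuals. It assumes NumPy and simulated data; the function name `partial_correlation` and the two toy scenarios are illustrative assumptions. The second scenario shows one way the method can break: when the shared regulator acts non-linearly, the linear adjustment leaves the indirect link in place.

```python
import numpy as np

def partial_correlation(x, y, z):
    """Correlation of x and y after linearly regressing z out of both."""
    design = np.column_stack([np.ones_like(z), z])
    res_x = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
    res_y = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    return float(np.corrcoef(res_x, res_y)[0, 1])

rng = np.random.default_rng(1)
n = 5000
gene_c = rng.normal(size=n)

# Scenario 1: Gene C drives A and B linearly; A and B have no direct link.
lin_a = 0.9 * gene_c + rng.normal(scale=0.4, size=n)
lin_b = 0.9 * gene_c + rng.normal(scale=0.4, size=n)
raw_linear = float(np.corrcoef(lin_a, lin_b)[0, 1])
partial_linear = partial_correlation(lin_a, lin_b, gene_c)

# Scenario 2: Gene C drives A and B through a curve (C squared).
sq_a = gene_c**2 + rng.normal(scale=0.4, size=n)
sq_b = gene_c**2 + rng.normal(scale=0.4, size=n)
partial_nonlinear = partial_correlation(sq_a, sq_b, gene_c)

print(f"Linear case, raw correlation:         {raw_linear:.3f}")
print(f"Linear case, partial correlation:     {partial_linear:.3f}")
print(f"Non-linear case, partial correlation: {partial_nonlinear:.3f}")
```

In the linear case the partial correlation should drop to roughly zero, as intended. In the non-linear case it should stay high even though A and B are only connected through C.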

3. PMI vs. Transfer Entropy / Granger Causality

These are “Advanced” methods used to find the direction of a relationship (who controls whom). These require time-series data (measurements taken at different minutes or hours). In biology, most public data (like TCGA) is a “snapshot” of a tumor at one single moment. You cannot use these methods on snapshot data. PMI provides the accuracy of these advanced methods but works on static (snapshot) data. It is much more versatile for the thousands of datasets already sitting on public servers.


Here is a quick summary:

| Method | Handles non-linear? | Removes indirect links? | Works on static data? | Computation speed |
| --- | --- | --- | --- | --- |
| Correlation | No | No | Yes | Very Fast |
| Partial Correlation | No | Yes | Yes | Fast |
| Mutual Information | Yes | No | Yes | Fast |
| Transfer Entropy | Yes | Yes | No | Slow |
| PMI (The Winner) | Yes | Yes | Yes | Medium (PC Friendly) |
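The difference between the “Mutual Information” row and a method that removes indirect links can be sketched with a simple binned conditional mutual information. This is a stand-in chosen for illustration, not the PMI statistic itself, which is not defined in this text. NumPy, the simulated genes, the bin counts, and both helper functions are assumptions.

```python
import numpy as np

def binned_mi(x, y, bins=8):
    """Plug-in mutual information estimate (in nats) from a 2-D histogram."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    expected = pxy.sum(axis=1, keepdims=True) @ pxy.sum(axis=0, keepdims=True)
    mask = pxy > 0
    return float(np.sum(pxy[mask] * np.log(pxy[mask] / expected[mask])))

def binned_conditional_mi(x, y, z, z_bins=20, bins=8):
    """Average the x-y mutual information within equal-count slices of z."""
    edges = np.quantile(z, np.linspace(0, 1, z_bins + 1))
    labels = np.clip(np.searchsorted(edges, z, side="right") - 1, 0, z_bins - 1)
    total = 0.0
    for k in range(z_bins):
        in_bin = labels == k
        if in_bin.sum() > 1:
            # Weight each slice by the fraction of samples it contains.
            total += in_bin.mean() * binned_mi(x[in_bin], y[in_bin], bins=bins)
    return total

rng = np.random.default_rng(2)
n = 20000
gene_c = rng.uniform(-2, 2, size=n)
# Gene C drives both A and B non-linearly; A and B have no direct link.
gene_a = gene_c**2 + rng.normal(scale=0.4, size=n)
gene_b = gene_c**2 + rng.normal(scale=0.4, size=n)

mi_ab = binned_mi(gene_a, gene_b)
cmi_ab_given_c = binned_conditional_mi(gene_a, gene_b, gene_c)

print(f"MI(A; B):      {mi_ab:.3f} nats")
print(f"MI(A; B | C):  {cmi_ab_given_c:.3f} nats")
```

Plain mutual information should flag A and B as strongly linked, while conditioning on C should shrink the score toward zero. The small remainder comes from the finite bin width and estimator bias, so in practice a threshold or permutation test would be needed rather than an exact zero.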

If you look at recent papers in journals like Nature Communications or Bioinformatics, authors often complain:

> “Our network likely contains false positives due to indirect correlations that we could not filter out because existing non-linear methods are too computationally expensive.”

By implementing a PMI-based tool, we provided the community with a “Golden Middle.” It is more accurate than correlation, but unlike the “Big AI” models, it doesn’t require a $10,000 computer.

We essentially took a high-level mathematical concept and turned it into a “filter” that cleans up biological noise using just a standard CPU. This could, for instance, allow researchers with a normal laptop to find a “True” drug target that others missed because their data was cluttered with indirect “ghost” links.