
PMI compared with traditional methods
Dany Mukesha
2026-05-13
In bioinformatics, we generally use three main types of methods to infer links between genes or proteins. Here is how PMI compares with each of them in terms of logic and performance.
1. PMI vs. Pearson/Spearman Correlation
This is what most biologists use (e.g., in the tool WGCNA). It measures how strongly two genes move up or down together. The problem is that correlation cannot see “curves”: Pearson correlation captures only straight-line trends, and Spearman correlation only monotonic ones (always rising or always falling). If Gene A activates Gene B in a “bell curve” fashion, so that B peaks at intermediate levels of A, correlation will report a relationship close to zero. It also cannot distinguish between a direct link and an indirect link: if Gene A and Gene B are both controlled by Gene C, correlation will say A and B are “linked.” PMI handles relationships of any shape, including non-linear ones. Most importantly, it accounts for C, recognizes that A and B are linked only because of C, and correctly gives the A-B link a score of (close to) zero.
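Both failure modes are easy to reproduce. The following is a minimal Python sketch on small made-up toy data, assuming the values are already discrete (binned). It uses the plug-in conditional mutual information I(X;Y|Z) as a stand-in for PMI; the estimator inside the PMI tool itself may differ (for instance, it may use a density estimator on continuous data). The helper names `pearson`, `entropy`, `mutual_info` and `conditional_mi` are hypothetical.

```python
from collections import Counter
from itertools import product
from math import log2, sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((u - mx) * (v - my) for u, v in zip(x, y))
    sx = sqrt(sum((u - mx) ** 2 for u in x))
    sy = sqrt(sum((v - my) ** 2 for v in y))
    return cov / (sx * sy)


def entropy(*columns):
    """Plug-in (empirical) joint entropy, in bits, of discrete columns."""
    rows = list(zip(*columns))
    n = len(rows)
    return -sum(k / n * log2(k / n) for k in Counter(rows).values())


def mutual_info(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(x) + entropy(y) - entropy(x, y)


def conditional_mi(x, y, z):
    """I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(Z) - H(X,Y,Z)."""
    return entropy(x, z) + entropy(y, z) - entropy(z) - entropy(x, y, z)


# 1) A "bell curve" response: gene B peaks at intermediate levels of gene A.
bell_a = [-2, -1, 0, 1, 2]
bell_b = [0, 1, 2, 1, 0]
bell_r = pearson(bell_a, bell_b)
bell_mi = mutual_info(bell_a, bell_b)

# 2) An indirect link: C drives both A and B, each with its own noise bit.
#    Enumerating every (C, noiseA, noiseB) combination makes A and B
#    exactly independent given C in this toy sample.
samples = list(product([0, 1], repeat=3))
chain_c = [s[0] for s in samples]
chain_a = [s[0] + s[1] for s in samples]
chain_b = [s[0] + s[2] for s in samples]
chain_r = pearson(chain_a, chain_b)
chain_mi = mutual_info(chain_a, chain_b)
chain_cmi = conditional_mi(chain_a, chain_b, chain_c)

print(f"bell curve: r = {bell_r:.3f}, MI = {bell_mi:.3f} bits")
print(f"A-B via C:  r = {chain_r:.3f}, MI = {chain_mi:.3f}, MI|C = {chain_cmi:.3f}")
```

In the bell-curve case the correlation is exactly zero while the mutual information is large. In the A-C-B chain, both correlation (0.5) and plain mutual information (0.25 bits) flag an A-B link; only conditioning on C brings the score down to zero.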
2. PMI vs. Partial Correlation
This is used to find “direct” links. It measures the relationship between Gene A and Gene B while holding Gene C constant. Partial correlation assumes that the data follow a (multivariate) “Normal Distribution” (the bell curve), and it only removes the linear part of C's influence. Biological data, especially single-cell data, are “noisy” and “sparse” (lots of zeros), which violates this assumption and can make partial-correlation results unreliable. PMI is “distribution-free”: it makes no assumption about whether your data look like a bell curve or a mess. Because it works from the estimated probability distribution of the data themselves, it is more robust for real-world biological samples.
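The classic first-order partial correlation removes the effect of a third variable Z with a closed-form formula, r_AB·C = (r_AB − r_AC·r_BC) / sqrt((1 − r_AC²)(1 − r_BC²)). Because it only subtracts the linear part of C's influence, it can miss a non-linear confounder. The minimal Python sketch below uses made-up toy data in which C drives both A and B through a U-shaped (C²) effect, and it uses the plug-in conditional mutual information I(A;B|C) as a stand-in for PMI (an assumption; the PMI tool's own estimator may differ). The helper names are hypothetical.

```python
from collections import Counter
from itertools import product
from math import log2, sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((u - mx) * (v - my) for u, v in zip(x, y))
    sx = sqrt(sum((u - mx) ** 2 for u in x))
    sy = sqrt(sum((v - my) ** 2 for v in y))
    return cov / (sx * sy)


def partial_corr(x, y, z):
    """First-order partial correlation of x and y given z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / sqrt((1 - rxz**2) * (1 - ryz**2))


def entropy(*columns):
    """Plug-in (empirical) joint entropy, in bits, of discrete columns."""
    rows = list(zip(*columns))
    n = len(rows)
    return -sum(k / n * log2(k / n) for k in Counter(rows).values())


def conditional_mi(x, y, z):
    """I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(Z) - H(X,Y,Z)."""
    return entropy(x, z) + entropy(y, z) - entropy(z) - entropy(x, y, z)


# Toy data: every combination of C in {-2, -1, 1, 2} with independent
# noise bits for A and B. C acts on both genes through C**2, so A and B
# are independent given C, yet C has zero linear correlation with either.
samples = list(product([-2, -1, 1, 2], [0, 1], [0, 1]))
gene_c = [s[0] for s in samples]
gene_a = [s[0] ** 2 + s[1] for s in samples]
gene_b = [s[0] ** 2 + s[2] for s in samples]

pc = partial_corr(gene_a, gene_b, gene_c)
cmi = conditional_mi(gene_a, gene_b, gene_c)
print(f"partial correlation r_AB.C = {pc:.3f}")
print(f"conditional MI I(A;B|C)   = {cmi:.3f} bits")
```

Because C is linearly uncorrelated with both genes, partial correlation leaves the spurious A-B link untouched (about 0.9), while the conditional mutual information given C is exactly zero.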
3. PMI vs. Transfer Entropy / Granger Causality
These more “advanced” methods are used to find the direction of a relationship (which gene controls which). They require time-series data: the same system measured at successive time points (e.g., every few minutes or hours). In biology, much of the public data (such as TCGA) consists of “snapshots”, with each tumor sampled at a single moment. These methods cannot be applied to snapshot data. PMI shares their ability to capture non-linear, direct dependencies and works on static (snapshot) data, although, being symmetric in the two genes, it does not tell you the direction of a link. This makes it applicable to the thousands of datasets already sitting on public servers.
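The difference is visible in the definitions. With a history length of one, transfer entropy from X to Y is a conditional mutual information between successive time points. The conditional mutual information I(X;Y|Z), one common definition of partial mutual information (the exact estimator used by the PMI tool may differ), conditions on a third variable measured in the same samples and has no time index:

$$
T_{X \to Y} = I\left(Y_{t+1};\, X_t \mid Y_t\right)
\qquad \text{versus} \qquad
I(X;Y \mid Z) = H(X,Z) + H(Y,Z) - H(Z) - H(X,Y,Z)
$$

The left-hand quantity needs the same system observed at consecutive time points t and t+1. The right-hand one needs only one joint observation of X, Y and Z per sample, and it is symmetric in X and Y, so it says nothing about direction.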
Here is a quick summary:
| Method | Handles non-linear? | Removes indirect links? | Works on static data? | Computation speed |
|---|---|---|---|---|
| Correlation | No (Spearman: monotonic only) | No | Yes | Very Fast |
| Partial Correlation | No | Yes | Yes | Fast |
| Mutual Information | Yes | No | Yes | Fast |
| Transfer Entropy | Yes | Yes | No | Slow |
| PMI (The Winner) | Yes | Yes | Yes | Medium (PC Friendly) |
If you look at recent papers in journals such as Nature Communications or Bioinformatics, authors often acknowledge limitations along these lines:

> “Our network likely contains false positives due to indirect correlations that we could not filter out because existing non-linear methods are too computationally expensive.”
By implementing a PMI-based tool, we provide the community with a “Golden Middle.” It captures non-linear and direct relationships that correlation misses, but unlike the “Big AI” models, it does not require specialized high-performance hardware.
We essentially took a high-level mathematical concept and turned it into a “filter” that cleans up biological noise using just a standard CPU. This could, for instance, allow researchers with an ordinary laptop to find a “true” drug target that others missed because their data were cluttered with indirect “ghost” links.
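As a rough illustration of the “filter” idea, here is a hypothetical pruning pass in Python, not the tool's actual algorithm. It starts from every gene pair, scores it with plug-in mutual information, and drops the pair if conditioning on any single other gene pushes the conditional mutual information below a threshold. The function name `prune_indirect_edges`, the 0.1-bit threshold, the toy data and the first-order design (one conditioning gene at a time) are all assumptions made for this sketch.

```python
from collections import Counter
from itertools import combinations, product
from math import log2


def entropy(*columns):
    """Plug-in (empirical) joint entropy, in bits, of discrete columns."""
    rows = list(zip(*columns))
    n = len(rows)
    return -sum(k / n * log2(k / n) for k in Counter(rows).values())


def mutual_info(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(x) + entropy(y) - entropy(x, y)


def conditional_mi(x, y, z):
    """I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(Z) - H(X,Y,Z)."""
    return entropy(x, z) + entropy(y, z) - entropy(z) - entropy(x, y, z)


def prune_indirect_edges(genes, threshold=0.1):
    """Hypothetical first-order filter.

    `genes` maps a gene name to its list of discretized expression values.
    An edge X-Y is kept only if I(X;Y) and every I(X;Y|Z), for each other
    gene Z, stay at or above `threshold` bits.
    """
    kept = []
    for x, y in combinations(sorted(genes), 2):
        scores = [mutual_info(genes[x], genes[y])]
        scores += [
            conditional_mi(genes[x], genes[y], genes[z])
            for z in genes
            if z not in (x, y)
        ]
        if min(scores) >= threshold:
            kept.append((x, y))
    return kept


# Toy network: C drives A and B (each with its own noise bit);
# A and B share no direct link.
samples = list(product([0, 1], repeat=3))
genes = {
    "C": [s[0] for s in samples],
    "A": [s[0] + s[1] for s in samples],
    "B": [s[0] + s[2] for s in samples],
}
print(prune_indirect_edges(genes))
```

Here the A-B edge is dropped because its score given C is zero, while A-C and B-C survive conditioning on the remaining gene. Real expression data would first need discretization (or a continuous estimator), and conditioning each of the roughly p²/2 pairs on every other gene costs on the order of p³ conditional-MI evaluations for p genes.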