BAM filtering, QC, and mapping quality profiling tool
alimap is a fully stand-alone Bash-based pipeline for read filtering and quality control of BAM alignment files. It improves upon standard tools by integrating, among other features, genome build detection (`hg19` vs `hg38`) for correct blacklist application.

This makes alimap well suited to large-scale genomics QC and reproducibility pipelines.
While existing BAM filtering tools (e.g., `samtools view`, `bedtools intersect`) provide basic functionality, they lack integrated QC, reproducible benchmarking, and direct visualization. alimap fills this gap by producing per-filter read counts and MAPQ histograms, timing and resource-usage reports, and summary plots.

    git clone https://github.com/danymukesha/alimap.git
    cd alimap

You’ll need the following tools in `PATH`:

- `bash` ≥ 4.x
- `samtools` ≥ 1.9
- `bedtools` ≥ 2.29
- `gnuplot` (for plots)
- `awk`, `grep`, `sed`, `sort`, `uniq` (standard GNU command-line utilities)

On Ubuntu/Debian:

    sudo apt update
    sudo apt install samtools bedtools gnuplot

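Before the first run it can help to confirm that everything in the list above is actually on `PATH`. The following is a minimal sketch and not part of alimap itself; the function name `missing_tools` is made up for this example.

```shell
# Hypothetical helper (not part of alimap): print every named tool
# that cannot be found on PATH, one per line.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Tools named in the requirements list above.
missing=$(missing_tools bash samtools bedtools gnuplot awk grep sed sort uniq)

if [ -n "$missing" ]; then
    printf 'Missing tools:\n%s\n' "$missing" >&2
else
    echo "All required tools found."
fi
```

This checks presence only; version requirements such as `samtools` ≥ 1.9 still need to be verified separately.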

    ./alimap.sh --help

    alimap.sh v2025-08-11 (Author: Dany MUKESHA)
    Usage: alimap.sh -i <input.bam> [options]
    Required:
      -i,--input FILE       Input BAM file (sorted). If missing .bai the script will index it.
    Options:
      -b,--blacklist FILE   Optional blacklist BED (if absent and --test used, script downloads hg19/hg38 blacklists)
      -t,--threads N        Number of threads for samtools (default: 1)
      -o,--outdir DIR       Output directory (default: filtered_reads)
      -m,--mode MODE        cumulative|independent (default: cumulative)
      --keep-temp           Keep temporary files
      --test                Download small public example BAM and blacklists and run pipeline using them
      --checksum FILE       Optional: path to a checksum file (tab-delimited: filename<TAB>md5) to validate downloads
      -h,--help             Show this help
    Outputs (written to --outdir):
      - filtered_*.bam (.bai generated)
      - filter_counts.txt (tab: filter<TAB>reads)
      - *_mapq_hist.csv (per-filter mapping quality histogram)
      - *.removed.reads.txt (per-filter list of read names removed by blacklist exclusion)
      - *.samtools.time.txt (timing/resource usage from /usr/bin/time, if available)
      - qc/* (samtools flagstat/idxstats/stats)
      - alimap_filters_plot.pdf (stacked bars: kept vs removed + percent)
      - run_info.txt (provenance)

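The help text asks for a sorted input BAM. One way to check this up front is to look for `SO:coordinate` on the `@HD` header line. The sketch below does that for header text on standard input. `is_coordinate_sorted` is a name invented here, and alimap may perform its own check differently or not at all.

```shell
# Hypothetical helper: succeed if a SAM/BAM header read from stdin
# declares coordinate sort order (SO:coordinate on the @HD line).
is_coordinate_sorted() {
    awk -F'\t' '
        $1 == "@HD" {
            for (i = 2; i <= NF; i++) if ($i == "SO:coordinate") found = 1
        }
        END { exit (found ? 0 : 1) }
    '
}

# Typical use (requires samtools and a real BAM file):
#   samtools view -H input.bam | is_coordinate_sorted ||
#       echo "input.bam is not coordinate-sorted" >&2
```

Note that the header only records what the producing tool declared; a file without an `@HD` line is reported as not sorted by this check.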

    ./alimap.sh --input input.bam --blacklist blacklist.bed --outdir results --threads 4
    ./alimap.sh --test --outdir results_test --threads 2

When you run the test, the script will fetch a small, real BAM from a public source (1000 Genomes Project) plus a minimal blacklist BED.
Here are the expected sizes and MD5s for the test run:

File | Size (approx) | MD5 |
---|---|---|
`test.bam` | ~1.5 MB | `0c5f5793db8cc6f9b7f86b6c17264b6b` |
`test.bam.bai` | ~32 KB | `4d0f1f68b7ff6d868d513e073ed3daba` |
`blacklist_hg19.bed` | ~35 KB | `a65ed0c4b9a773c94aa6f2afbb3d9788` |
`blacklist_hg38.bed` | ~39 KB | `b227a36449e52f9243cf2bb8bc1f77f2` |

They are automatically validated within the script after downloading.
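The same kind of validation can be reproduced by hand. The sketch below checks files against a tab-delimited list in the `filename<TAB>md5` layout that `--checksum` expects. `verify_checksums` is a hypothetical name, the function assumes GNU `md5sum` is installed, and it is not taken from alimap's source.

```shell
# Hypothetical helper: verify files in DIR against a tab-delimited
# checksum list (filename<TAB>md5). Assumes GNU md5sum is available.
# Returns 0 if every listed file matches, 1 otherwise.
verify_checksums() {
    list=$1
    dir=$2
    rc=0
    while IFS="$(printf '\t')" read -r name expected; do
        [ -n "$name" ] || continue    # skip blank lines
        actual=$(md5sum "$dir/$name" 2>/dev/null | awk '{print $1}')
        if [ "$actual" = "$expected" ]; then
            echo "OK       $name"
        else
            echo "MISMATCH $name" >&2
            rc=1
        fi
    done < "$list"
    return "$rc"
}

# Typical use:
#   verify_checksums checksums.tsv results_test
```

A missing file yields an empty digest and is therefore reported as a mismatch rather than silently skipped.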
The script can also detect whether the BAM uses hg19 or hg38 by checking chromosome naming:

- `chr1`, `chr2` → likely hg38
- `1`, `2` → likely hg19

This is only a heuristic: the two conventions overlap, and some custom builds mix them.

For a test run, use the `--test` example shown above.
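The chromosome-naming heuristic described above could be sketched as follows. This is an illustration of the stated rule, not alimap's actual code; `guess_build` is a hypothetical name, and it reads header text from standard input so it can be fed from `samtools view -H`.

```shell
# Hypothetical helper: guess the genome build from @SQ sequence names in a
# SAM/BAM header read from stdin. Heuristic only, following the rule above:
# "chr"-prefixed names -> hg38, bare names -> hg19, mixed or none -> unknown.
guess_build() {
    awk -F'\t' '
        $1 == "@SQ" {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^SN:chr/)   prefixed++
                else if ($i ~ /^SN:/) bare++
            }
        }
        END {
            if (prefixed > 0 && bare == 0)      print "hg38"
            else if (bare > 0 && prefixed == 0) print "hg19"
            else                                print "unknown"
        }
    '
}

# Typical use (requires samtools and a real BAM file):
#   samtools view -H input.bam | guess_build
```

Returning `unknown` for mixed naming leaves the choice of blacklist to the user instead of guessing.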
Option | Description |
---|---|
`--input` | Input BAM file |
`--blacklist` | BED file of regions to remove (auto-switch for hg19/hg38) |
`--outdir` | Output directory |
`--threads` | Number of threads for samtools |
`--test` | Download public example BAM & BED for validation |
`--help` | Show usage information |

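For orientation, the kinds of filters involved map onto ordinary `bedtools` and `samtools` calls. The sketch below is illustrative only and does not reproduce alimap's internals: the function names, the `DRY_RUN` switch, and the MAPQ threshold in the usage comment are all assumptions made for this example.

```shell
# Illustrative only: these are NOT alimap's actual internals.
# Each function writes a filtered BAM to stdout.
# With DRY_RUN=1 the command is printed instead of executed.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Drop reads overlapping any blacklist interval (bedtools intersect -v).
drop_blacklisted() { run bedtools intersect -v -abam "$1" -b "$2"; }

# Keep reads with MAPQ >= threshold (the threshold is the caller's choice).
keep_min_mapq() { run samtools view -b -q "$2" "$1"; }

# Drop unmapped reads (SAM flag 0x4).
drop_unmapped() { run samtools view -b -F 4 "$1"; }

# Typical use (requires samtools, bedtools, and real files):
#   drop_blacklisted input.bam blacklist.bed > no_blacklist.bam
#   keep_min_mapq no_blacklist.bam 30 > mapq30.bam
#   drop_unmapped mapq30.bam > mapped.bam
```

Setting `DRY_RUN=1` prints each command line, which is a cheap way to review what would run before touching large files.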
The pipeline produces:
File | Description |
---|---|
`filtered.bam` | Final filtered BAM |
`removed_reads.log` | Per-read removal log (read IDs & reason) |
`filter_counts.txt` | Counts & % reads removed at each stage |
`mapq_histogram_<filter>.csv` | MAPQ distribution per filter step |
`mapq_histogram.pdf` | MAPQ distribution plots |
`filter_removal_stacked_bar.pdf` | Publication-ready plot of % removed |
`runtime_report.txt` | Timing & resource usage report |
`md5_checksums.txt` | MD5 hashes of key files for reproducibility |

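To make the MAPQ histogram output concrete, here is one way such a table could be derived from SAM records. It is a sketch under assumptions: `mapq_hist` is a hypothetical name and the `mapq,count` column layout is a guess, so the CSV written by alimap may differ.

```shell
# Hypothetical helper: build a MAPQ histogram from SAM records on stdin.
# MAPQ is the fifth tab-separated column of a SAM record; header lines
# (starting with "@") are skipped. The "mapq,count" layout is an assumption.
mapq_hist() {
    echo "mapq,count"
    awk -F'\t' '!/^@/ { n[$5]++ } END { for (q in n) print q "," n[q] }' |
        sort -t, -k1,1n
}

# Typical use (requires samtools and a real BAM file):
#   samtools view filtered.bam | mapq_hist > mapq_histogram.csv
```

Sorting numerically on the first field keeps the rows in MAPQ order, which is convenient for plotting with `gnuplot`.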
The pipeline automatically logs the run time and peak memory (RSS) of each stage.
Example runtime report:

    Stage               Time (s)   Peak RSS (MB)
    Blacklist removal   2.14       110
    MAPQ filter         1.09       102
    Unmapped removal    0.87       95
    Total               4.10       110

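Figures like these can be collected with GNU `time`. The sketch below assumes `/usr/bin/time` is the GNU implementation (its `-v` and `-o` options are not available in the BSD version); `run_timed` and `peak_rss_mb` are hypothetical names, not functions from alimap.

```shell
# Run a command under GNU time when /usr/bin/time exists, writing the
# verbose report to the file named in $1; otherwise just run the command.
run_timed() {
    report=$1
    shift
    if [ -x /usr/bin/time ]; then
        /usr/bin/time -v -o "$report" "$@"
    else
        "$@"
    fi
}

# Extract peak resident set size in whole MB from a GNU `time -v` report.
peak_rss_mb() {
    awk -F': ' '/Maximum resident set size/ { printf "%d\n", $2 / 1024 }' "$1"
}

# Typical use (requires GNU time and samtools):
#   run_timed mapq.time.txt samtools view -b -q 30 -o mapq30.bam input.bam
#   peak_rss_mb mapq.time.txt
```

GNU `time` reports the peak in kilobytes, so the helper divides by 1024 to match the MB column of the report.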
If you use Alimap in your research, please cite:
Mukesha D. Alimap: Integrated read filtering, QC, and benchmarking for BAM alignment files. 2025. Available at: https://github.com/danymukesha/alimap
This project is licensed under the MIT License - see the LICENSE file for details.
Thanks to the developers of `samtools` and `bedtools` for the underlying alignment and interval manipulation tools.