Introduction
This part reports a detailed mathematical analysis of the BioGA R package, which implements a multi-objective genetic algorithm (GA) for genomic data optimization. The analysis covers all major components of the algorithm with formal mathematical notation and proofs where applicable.
GA framework
The package implements a standard generational GA with the following components:
- Population initialization
- Fitness evaluation
- Selection (NSGA-II inspired)
- Crossover (SBX)
- Mutation (Adaptive)
- Replacement (Elitism + Diversity Preservation)
Mathematical representation
Let:
- be the gene network where are genes
- be the genomic data matrix ( genes × samples)
- be the population at generation ( individuals × genes)
The GA can be represented as:
Where:
: Fitness evaluation function
: Selection operator
: Crossover operator
: Mutation operator
: Replacement operator
Population initialization
Fitness evaluation
The package implements a multi-objective fitness function with two components:
Objective 1: Expression difference
This measures how well the individual matches the observed expression patterns.
Properties:
Convex function with minimum at
Gradient:
Selection (NSGA-II inspired)
The selection implements a simplified version of NSGA-II’s non-dominated sorting:
Domination criteria
Individual dominates iff:
Proof of partial order:
Reflexive: No individual dominates itself
Antisymmetric: If dominates , cannot dominate
Transitive: If dominates and dominates , then dominates
Front construction
- Compute domination counts and dominated sets
- First front: Individuals with domination count
- Subsequent fronts: Remove current front, update counts
Theorem: The front construction algorithm terminates in time where is population size and is number of objectives.
Proof:
Domination check between two individuals is
All pairs check is
Front construction is per front
Crossover (Simulated Binary Crossover - SBX)
Given parents , create offspring :
For each gene : With probability : Else:
Properties:
Preserves mean:
Variance controlled by (distribution index)
For : approaches uniform crossover
For : approaches no crossover ( or )
Mutation
Adaptive mutation with network constraints:
For each gene : With probability :
Properties:
Mutation rate increases with generation
Network term reduces mutation magnitude for highly connected genes
Expected change:
Variance: if network provided
Replacement
Elitism + diversity-preserving replacement:
- Keep best individual:
- For remaining replacements:
- Select random individual
- Select offspring
- Replace with if
Where
Theorem: This strategy preserves elitism while maintaining population diversity.
Proof:
Best solution is never lost
Expected diversity is non-decreasing since replacements only occur when diversity increases
Convergence analysis
The algorithm can be shown to converge under certain conditions:
Assumptions:
Finite search space
Strictly positive mutation probability 3. Elitism is maintained
Theorem: The algorithm converges in probability to the Pareto front.
Proof sketch:
The selection and replacement strategies preserve Pareto optimal solutions (elitism)
Mutation provides ergodicity (any state reachable)
By the multi-objective GA convergence theorems (Rudolph 1998), the algorithm converges to the Pareto front
Computational complexity
Let: - = population size - = number of genes - = number of samples - = number of objectives - = number of generations
Component complexities:
Initialization:
Fitness evaluation: (parallelized)
Selection: worst case
Crossover:
Mutation:
Replacement:
Total complexity:
Mathematical optimization interpretation
The algorithm can be viewed as a stochastic optimization method for:
Where: - measures data fidelity - measures sparsity
The GA approach is particularly suitable because:
The problem is multi-objective
The search space is high-dimensional
The fitness landscape may be non-convex
Sparsity objective is non-differentiable
Special Cases and Relationships
-
Single objective case
():
- Reduces to nonlinear least squares optimization
- GA serves as global optimizer avoiding local minima
-
No Network Constraints:
- Mutation becomes standard Gaussian mutation
- Problem decomposes by genes
-
High Crossover Rate:
- Approaches a recombination-based search
- Faster convergence but reduced diversity
Biological interpretation
The mathematical operations correspond to biological concepts:
- Population Initialization: Sampling from observed biological variability
- Fitness: Measuring both functional efficacy (expression matching) and parsimony (sparsity)
- Network Constraints: Incorporating known gene-gene interactions
- Clustering: Respecting co-expressed gene modules
Conclusion
This mathematical foundation shows that the BioGA package implements a theoretically sound multi-objective evolutionary algorithm for genomic data optimization, with proper attention to both computational efficiency and biological relevance.
Session Info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#> setting value
#> version R version 4.5.1 (2025-06-13)
#> os Ubuntu 24.04.2 LTS
#> system x86_64, linux-gnu
#> ui X11
#> language en
#> collate C.UTF-8
#> ctype C.UTF-8
#> tz UTC
#> date 2025-07-09
#> pandoc 3.1.11 @ /opt/hostedtoolcache/pandoc/3.1.11/x64/ (via rmarkdown)
#> quarto NA
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#> package * version date (UTC) lib source
#> BiocManager 1.30.26 2025-06-05 [1] RSPM
#> BiocStyle * 2.36.0 2025-04-15 [1] Bioconduc~
#> bookdown 0.43 2025-04-15 [1] RSPM
#> bslib 0.9.0 2025-01-30 [1] RSPM
#> cachem 1.1.0 2024-05-16 [1] RSPM
#> cli 3.6.5 2025-04-23 [1] RSPM
#> desc 1.4.3 2023-12-10 [1] RSPM
#> digest 0.6.37 2024-08-19 [1] RSPM
#> evaluate 1.0.4 2025-06-18 [1] RSPM
#> fastmap 1.2.0 2024-05-15 [1] RSPM
#> fs 1.6.6 2025-04-12 [1] RSPM
#> htmltools 0.5.8.1 2024-04-04 [1] RSPM
#> htmlwidgets 1.6.4 2023-12-06 [1] RSPM
#> jquerylib 0.1.4 2021-04-26 [1] RSPM
#> jsonlite 2.0.0 2025-03-27 [1] RSPM
#> knitr 1.50 2025-03-16 [1] RSPM
#> lifecycle 1.0.4 2023-11-07 [1] RSPM
#> pkgdown 2.1.3 2025-05-25 [1] any (@2.1.3)
#> R6 2.6.1 2025-02-15 [1] RSPM
#> ragg 1.4.0 2025-04-10 [1] RSPM
#> rlang 1.1.6 2025-04-11 [1] RSPM
#> rmarkdown 2.29 2024-11-04 [1] RSPM
#> sass 0.4.10 2025-04-11 [1] RSPM
#> sessioninfo 1.2.3 2025-02-05 [1] RSPM
#> systemfonts 1.2.3 2025-04-30 [1] RSPM
#> textshaping 1.0.1 2025-05-01 [1] RSPM
#> xfun 0.52 2025-04-02 [1] RSPM
#> yaml 2.3.10 2024-07-26 [1] RSPM
#>
#> [1] /home/runner/work/_temp/Library
#> [2] /opt/R/4.5.1/lib/R/site-library
#> [3] /opt/R/4.5.1/lib/R/library
#> * ── Packages attached to the search path.
#>
#> ──────────────────────────────────────────────────────────────────────────────