Introduction
This report provides a detailed mathematical analysis of the BioGA R package, which implements a multi-objective genetic algorithm (GA) for genomic data optimization. The analysis covers all major components of the algorithm with formal mathematical notation and proofs where applicable.
Genetic Algorithm Framework
The package implements a standard generational GA with the following components:
- Population Initialization
- Fitness Evaluation
- Selection (NSGA-II inspired)
- Crossover (SBX)
- Mutation (Adaptive)
- Replacement (Elitism + Diversity Preservation)
Mathematical Representation
Let:
- be the gene network where are genes
- be the genomic data matrix ( genes × samples)
- be the population at generation ( individuals × genes)
The GA can be represented as:
Where:
: Fitness evaluation function
: Selection operator
: Crossover operator
: Mutation operator
: Replacement operator
Population Initialization
Fitness Evaluation
The package implements a multi-objective fitness function with two components:
Objective 1: Expression Difference
This measures how well the individual matches the observed expression patterns.
Properties:
Convex function with minimum at
Gradient:
Selection (NSGA-II Inspired)
The selection implements a simplified version of NSGA-II’s non-dominated sorting:
Domination Criteria
Individual dominates iff:
Proof of Partial Order:
Reflexive: No individual dominates itself
Antisymmetric: If dominates , cannot dominate
Transitive: If dominates and dominates , then dominates
Front Construction
- Compute domination counts and dominated sets
- First front: Individuals with domination count
- Subsequent fronts: Remove current front, update counts
Theorem: The front construction algorithm terminates in time where is population size and is number of objectives.
Proof:
Domination check between two individuals is
All pairs check is
Front construction is per front
Crossover (Simulated Binary Crossover - SBX)
Given parents , create offspring :
For each gene : With probability : Else:
Properties:
Preserves mean:
Variance controlled by (distribution index)
For : approaches uniform crossover
For : approaches no crossover ( or )
Mutation
Adaptive mutation with network constraints:
For each gene : With probability :
Properties:
Mutation rate increases with generation
Network term reduces mutation magnitude for highly connected genes
Expected change:
Variance: if network provided
Replacement
Elitism + diversity-preserving replacement:
- Keep best individual:
- For remaining replacements:
- Select random individual
- Select offspring
- Replace with if
Where
Theorem: This strategy preserves elitism while maintaining population diversity.
Proof:
Best solution is never lost
Expected diversity is non-decreasing since replacements only occur when diversity increases
Convergence Analysis
The algorithm can be shown to converge under certain conditions:
Assumptions:
Finite search space
Strictly positive mutation probability 3. Elitism is maintained
Theorem: The algorithm converges in probability to the Pareto front.
Proof Sketch:
The selection and replacement strategies preserve Pareto optimal solutions (elitism)
Mutation provides ergodicity (any state reachable)
By the multi-objective GA convergence theorems (Rudolph 1998), the algorithm converges to the Pareto front
Computational Complexity
Let: - = population size - = number of genes - = number of samples - = number of objectives - = number of generations
Component Complexities:
Initialization:
Fitness Evaluation: (parallelized)
Selection: worst case
Crossover:
Mutation:
Replacement:
Total Complexity:
Mathematical Optimization Interpretation
The algorithm can be viewed as a stochastic optimization method for:
Where: - measures data fidelity - measures sparsity
The GA approach is particularly suitable because:
The problem is multi-objective
The search space is high-dimensional
The fitness landscape may be non-convex
Sparsity objective is non-differentiable
Special Cases and Relationships
-
Single Objective Case
():
- Reduces to nonlinear least squares optimization
- GA serves as global optimizer avoiding local minima
-
No Network Constraints:
- Mutation becomes standard Gaussian mutation
- Problem decomposes by genes
-
High Crossover Rate:
- Approaches a recombination-based search
- Faster convergence but reduced diversity
Biological Interpretation
The mathematical operations correspond to biological concepts:
- Population Initialization: Sampling from observed biological variability
- Fitness: Measuring both functional efficacy (expression matching) and parsimony (sparsity)
- Network Constraints: Incorporating known gene-gene interactions
- Clustering: Respecting co-expressed gene modules
Conclusion
This mathematical foundation shows that the BioGA package implements a theoretically sound multi-objective evolutionary algorithm for genomic data optimization, with proper attention to both computational efficiency and biological relevance.
Session Info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#> setting value
#> version R version 4.5.1 (2025-06-13)
#> os Ubuntu 24.04.2 LTS
#> system x86_64, linux-gnu
#> ui X11
#> language en
#> collate C.UTF-8
#> ctype C.UTF-8
#> tz UTC
#> date 2025-07-03
#> pandoc 3.1.11 @ /opt/hostedtoolcache/pandoc/3.1.11/x64/ (via rmarkdown)
#> quarto NA
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#> package * version date (UTC) lib source
#> BiocManager 1.30.26 2025-06-05 [1] RSPM
#> BiocStyle * 2.36.0 2025-04-15 [1] Bioconduc~
#> bookdown 0.43 2025-04-15 [1] RSPM
#> bslib 0.9.0 2025-01-30 [1] RSPM
#> cachem 1.1.0 2024-05-16 [1] RSPM
#> cli 3.6.5 2025-04-23 [1] RSPM
#> desc 1.4.3 2023-12-10 [1] RSPM
#> digest 0.6.37 2024-08-19 [1] RSPM
#> evaluate 1.0.4 2025-06-18 [1] RSPM
#> fastmap 1.2.0 2024-05-15 [1] RSPM
#> fs 1.6.6 2025-04-12 [1] RSPM
#> htmltools 0.5.8.1 2024-04-04 [1] RSPM
#> htmlwidgets 1.6.4 2023-12-06 [1] RSPM
#> jquerylib 0.1.4 2021-04-26 [1] RSPM
#> jsonlite 2.0.0 2025-03-27 [1] RSPM
#> knitr 1.50 2025-03-16 [1] RSPM
#> lifecycle 1.0.4 2023-11-07 [1] RSPM
#> pkgdown 2.1.3 2025-05-25 [1] any (@2.1.3)
#> R6 2.6.1 2025-02-15 [1] RSPM
#> ragg 1.4.0 2025-04-10 [1] RSPM
#> rlang 1.1.6 2025-04-11 [1] RSPM
#> rmarkdown 2.29 2024-11-04 [1] RSPM
#> sass 0.4.10 2025-04-11 [1] RSPM
#> sessioninfo 1.2.3 2025-02-05 [1] RSPM
#> systemfonts 1.2.3 2025-04-30 [1] RSPM
#> textshaping 1.0.1 2025-05-01 [1] RSPM
#> xfun 0.52 2025-04-02 [1] RSPM
#> yaml 2.3.10 2024-07-26 [1] RSPM
#>
#> [1] /home/runner/work/_temp/Library
#> [2] /opt/R/4.5.1/lib/R/site-library
#> [3] /opt/R/4.5.1/lib/R/library
#> * ── Packages attached to the search path.
#>
#> ──────────────────────────────────────────────────────────────────────────────