Computes the Effective Number of Codons (ENC; Wright 1990), a widely used measure of synonymous codon usage bias in protein-coding DNA sequences. ENC quantifies how evenly synonymous codons are used across amino acids.
Value
A single numeric value giving the Effective Number of Codons (ENC), constrained to a maximum of 61.
Details
ENC ranges from:
20: extreme codon usage bias (one codon per amino acid)
61: no codon usage bias (all synonymous codons used equally)
This implementation follows the original formulation by Wright (1990), based on codon-family homozygosity (\(F_k\)) for amino acids with \(k = 2, 3, 4, 6\) synonymous codons:
$$ ENC = 2 + \frac{9}{F_2} + \frac{1}{F_3} + \frac{5}{F_4} + \frac{3}{F_6} $$
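where \(F_k\) is the homozygosity of codon usage averaged over all synonymous codon families of size \(k\). The coefficients 9, 1, 5 and 3 are the numbers of amino acids in each degeneracy class under the standard genetic code, and the constant 2 accounts for the two single-codon amino acids.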
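As an illustration of the homozygosity term, Wright (1990) estimates it for a single amino acid from codon counts as \(F = (n \sum_i p_i^2 - 1) / (n - 1)\), where \(n\) is the number of codons observed for that amino acid and \(p_i\) are the within-family codon proportions. The sketch below is a minimal version of that estimator. `family_homozygosity` is a hypothetical helper, not a function of this package, and whether `calculate_enc()` applies exactly this finite-sample correction is an assumption.

```r
# Hypothetical helper, not part of the package: homozygosity of one
# synonymous codon family from its raw codon counts (Wright 1990).
family_homozygosity <- function(counts) {
  n <- sum(counts)
  if (n < 2) return(NA_real_)   # undefined for fewer than two observed codons
  p <- counts / n               # within-family codon proportions
  (n * sum(p^2) - 1) / (n - 1)
}

# Four-fold family used perfectly evenly: F falls just below 1/4
family_homozygosity(c(10, 10, 10, 10))

# Family in which only one codon is ever used: F equals 1
family_homozygosity(c(12, 0, 0, 0))
```

Averaging these per-family values within each degeneracy class gives the \(F_k\) terms of the formula above.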
Stop codons and amino acids encoded by a single codon (Methionine and Tryptophan) are excluded from the calculation, as they do not contribute to synonymous codon bias.
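The input vector is internally normalized within each synonymous codon family. Amino acids for which no codons are observed are ignored. If insufficient information is available for one or more degeneracy classes, their contribution to ENC is omitted. The lower bound of 20 therefore holds only when all four degeneracy classes are represented; for short sequences the returned value can fall below 20, as in the example below.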
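The effect of omitting a degeneracy class can be sketched as follows. `enc_from_homozygosity` is a hypothetical helper written for illustration only. It assumes the per-family homozygosities are already available as a list keyed by family size, and it is not the package's actual code.

```r
# Hypothetical helper, not the package's code: combine per-family
# homozygosities into ENC, skipping degeneracy classes with no data.
enc_from_homozygosity <- function(F_by_class) {
  # Number of amino acids per degeneracy class in the standard genetic code
  n_aa <- c("2" = 9, "3" = 1, "4" = 5, "6" = 3)
  enc <- 2  # Methionine and Tryptophan, one codon each
  for (k in names(n_aa)) {
    f <- F_by_class[[k]]        # NULL when the class is absent from the list
    f <- f[!is.na(f)]
    if (length(f) > 0 && mean(f) > 0) {
      enc <- enc + n_aa[[k]] / mean(f)
    }
  }
  min(enc, 61)                  # cap at the number of sense codons
}

# Every class present and every family using a single codon: 2 + 9 + 1 + 5 + 3
enc_from_homozygosity(list("2" = rep(1, 9), "3" = 1, "4" = rep(1, 5), "6" = rep(1, 3)))

# Only two six-fold families observed, each using a single codon: 2 + 3 / 1
enc_from_homozygosity(list("6" = c(1, 1)))
```

In the second call only the six-fold term contributes, so the sketch returns 5 rather than a value in the nominal 20 to 61 range.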
References
Wright, F. (1990). The 'effective number of codons' used in a gene. Gene, 87(1), 23–29. doi:10.1016/0378-1119(90)90491-9
Examples
sequence <- "ATGATGATGTTATTATTACGCCGCCGCC"
freqs <- calculate_codon_frequencies(sequence)
calculate_enc(freqs)
#> Warning: argument is not numeric or logical: returning NA
#> Warning: argument is not numeric or logical: returning NA
#> Warning: argument is not numeric or logical: returning NA
#> [1] 5
