UNSUPPORTED SOFTWARE: USE AT OWN RISK

NullHWE - Test for HWE When One Allele is Null

A null allele is one that is detected only when it is homozygous. For example, in the ABO blood-antigen system, O is a null allele: a simple blood reaction cannot distinguish the genotypes AO and AA.

In the presence of a null allele, the observed genotype distribution appears skewed in favour of homozygotes because, for example, AO heterozygotes are counted as AA homozygotes. Therefore, when a standard test for Hardy-Weinberg Equilibrium (HWE) (e.g. Guo & Thompson's randomization program HWE, available at http://gause.biology.ualberta.ca/jbrzusto/hwenj.html) indicates a lack of equilibrium, it is useful to see to what extent this apparent disequilibrium might be explained by the presence of a null allele.

The standard method for estimating the frequency of a null allele is the Expectation-Maximization (EM) Algorithm, which finds those allele frequencies that maximize the probability of the observed results under the assumption of HWE. (This algorithm, for the case of a null allele, is available at http://gause.biology.ualberta.ca/jbrzusto/nullele.html.) What remains is to determine to what extent, even assuming the existence of a null allele, there is still evidence of disequilibrium.
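The EM idea can be illustrated with a minimal Python sketch. It is not the program's source code: the function name em_null_allele, its argument layout, the uniform starting point and the fixed iteration count are all assumptions made for this example. Each pass splits every single-allele phenotype (such as A) into expected homozygotes (AA) and hidden heterozygotes (AO) using the current frequencies, then recounts the alleles.

```python
def em_null_allele(single, pairs, null_homozygotes, iterations=200):
    """Estimate allele frequencies when one allele is null (illustrative sketch).

    single[i]        -- people showing only visible allele i (genotype ii or i0)
    pairs[(i, j)]    -- people showing both visible alleles i and j
    null_homozygotes -- people showing no visible allele (genotype 00)
    Returns (visible allele frequencies, null allele frequency).
    """
    k = len(single)
    n = sum(single) + sum(pairs.values()) + null_homozygotes
    p = [1.0 / (k + 1)] * k      # assumed starting point: uniform frequencies
    p0 = 1.0 / (k + 1)
    for _ in range(iterations):
        new_p = []
        null_copies = 2.0 * null_homozygotes
        for i in range(k):
            # E-step: under HWE, ii and i0 occur in the ratio p_i^2 : 2 p_i p_0,
            # so the hidden-heterozygote share is 2 p_0 / (p_i + 2 p_0).
            denom = p[i] + 2.0 * p0
            hidden_het = single[i] * 2.0 * p0 / denom if denom > 0 else 0.0
            homozygotes = single[i] - hidden_het
            visible_het = sum(c for pair, c in pairs.items() if i in pair)
            # M-step: count copies of allele i among all 2n allele copies.
            new_p.append((2.0 * homozygotes + hidden_het + visible_het) / (2.0 * n))
            null_copies += hidden_het
        p, p0 = new_p, null_copies / (2.0 * n)
    return p, p0


# Phenotype counts A:10 B:15 AB:20 O:10, with A as allele 0 and B as allele 1.
freqs, null_freq = em_null_allele([10, 15], {(0, 1): 20}, 10)
```

The returned frequencies always sum to 1, because every pass redistributes exactly 2n allele copies.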

This web page offers one approach. In what follows, assume there is a null allele.
Two observable genotype distributions are compatible if there is an allele distribution (including null alleles) with which they are both compatible. For example, in the A-B-O blood type system,
A:10 B:15 AB:20 O:10
is compatible with
A:5 B:10 AB:25 O:15
because they are both compatible with the allele counts:
a:35 b:45 o:30
under the arrangements:
AA:5 BB:10 AB:20 OO:10 AO:5 BO:5
and
AA:5 BB:10 AB:25 OO:15 AO:0 BO:0,
respectively.
This is a straightforward extension of the notion of compatible genotype distributions used by Guo & Thompson's HWE randomization test. There, allele counts are fixed, because all alleles are visible. The randomization test presented below randomizes over the collection of all genotype distributions compatible with the observed one.
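The compatibility example above can be checked mechanically. The following Python sketch is only an illustration; the helper names phenotypes and allele_counts, and the use of two-letter strings for genotypes, are assumptions of this example rather than part of the program.

```python
from collections import Counter


def phenotypes(genotypes, null="O"):
    """Collapse genotype counts into the observable phenotype counts."""
    seen = Counter()
    for (x, y), count in genotypes.items():
        visible = sorted({x, y} - {null})      # the null allele leaves no trace
        seen["".join(visible) or null] += count
    return dict(seen)


def allele_counts(genotypes):
    """Count the copies of each allele, including the null allele."""
    counts = Counter()
    for genotype, count in genotypes.items():
        for allele in genotype:
            counts[allele] += count
    return dict(counts)


first = {"AA": 5, "BB": 10, "AB": 20, "OO": 10, "AO": 5, "BO": 5}
second = {"AA": 5, "BB": 10, "AB": 25, "OO": 15, "AO": 0, "BO": 0}

print(phenotypes(first))      # the first observable distribution
print(phenotypes(second))     # the second observable distribution
print(allele_counts(first) == allele_counts(second))
```

Both arrangements give the allele counts a:35 b:45 o:30, so the two observable distributions are compatible in the sense defined above.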

The test proceeds in the usual randomizing way:

At a general level, this algorithm is identical to Guo & Thompson. The differences are:

So the final output answers the question, "How often would a compatible genotype distribution yield as bad or worse a fit to HWE, after using EM to improve the fit with a null allele, as the observed genotype distribution?"
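As a rough illustration of how such a proportion is obtained, the Python sketch below counts the randomized distributions whose fit is as bad as or worse than the observed one. The page does not say which fit statistic the program uses, so the function name randomization_p_value and the convention that a larger statistic means a worse fit are assumptions of this example.

```python
def randomization_p_value(observed_stat, random_stats):
    """Fraction of randomized distributions that fit no better than the observed one.

    observed_stat -- misfit of the observed data after the EM fit (larger = worse)
    random_stats  -- the same misfit measure for each random compatible distribution
    """
    if not random_stats:
        raise ValueError("need at least one randomized distribution")
    as_bad_or_worse = sum(1 for s in random_stats if s >= observed_stat)
    return as_bad_or_worse / len(random_stats)


# Hypothetical misfit values, for illustration only.
print(randomization_p_value(3.0, [1.0, 3.0, 5.0, 2.0]))
```

A small proportion suggests that a null allele alone does not account for the apparent disequilibrium.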


The sample data are blood types of 2060 Croatians, from Mourant et al. 1976. The first number is 2, the number of non-null alleles. Then come the numbers of people with the phenotypes A, AB, and B. Last comes the number of people with phenotype O, which implies genotype OO. If you click on "Calculate!", the program uses EM to produce estimates of all 3 allele frequencies, and the corresponding expected genotype frequencies under Hardy-Weinberg equilibrium. It then generates the given number of random compatible distributions, and uses EM on each to obtain a best HWE fit, keeping track of those that fit no better than the observed data.
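The step from allele frequencies to expected genotype counts under HWE can be sketched in Python as follows. The function name expected_genotype_counts and the frequencies in the usage lines are made up for illustration; they are not the program's estimates for the sample data.

```python
from itertools import combinations_with_replacement


def expected_genotype_counts(freqs, n):
    """Expected genotype counts under HWE for n people.

    freqs -- mapping from allele name to frequency (frequencies sum to 1)
    Homozygotes are expected at n * p^2, heterozygotes at n * 2 * p * q.
    """
    expected = {}
    for x, y in combinations_with_replacement(sorted(freqs), 2):
        share = freqs[x] ** 2 if x == y else 2 * freqs[x] * freqs[y]
        expected[x + y] = n * share
    return expected


# Hypothetical frequencies for three alleles, one of them null (O).
counts = expected_genotype_counts({"A": 0.3, "B": 0.1, "O": 0.6}, 1000)
for genotype, count in counts.items():
    print(genotype, round(count, 2))
```

Adding the expected AA and AO counts gives the expected number of people with phenotype A, which is what can be compared with the observed data.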

Input format:

Numbers must be separated by spaces, tabs, and/or newlines.

Output:

Output is rounded to four decimal places for allele frequencies and two decimal places for genotype counts.

Comments, complaints, questions to John Brzustowski. This is free software, with source available in this .zip archive.