Problem Introduction
Parameter Description
Random Data Generation
Software Requirements

Problem Introduction


Most of the genome is identical between any two humans. The sites at which genomes differ across the human population are called Single Nucleotide Polymorphisms (SNPs). The values of a set of SNPs on a particular chromosome copy define a haplotype. Haplotyping an individual involves determining a pair of haplotypes, one for each copy of a given chromosome, according to some optimality criterion.

In recent years, the haplotyping problem has been studied extensively, and several versions of it exist. We consider the singular haplotype reconstruction problem, which asks to reconstruct two unknown haplotypes from an input matrix of fragments as accurately as possible. We develop a probabilistic approach to overcome some of the difficulties caused by incompleteness and inconsistency in the input fragments.

Parameter Description


Our probabilistic haplotype reconstruction approach is characterized by the following parameters:

H1: unknown haplotype 1.
H2: unknown haplotype 2.
n: number of fragments, i.e., the number of rows in the SNP matrix. Each fragment is a copy of H1 or H2 with inconsistency and incompleteness errors.
m: haplotype length, i.e., the number of columns in the SNP matrix.
β: haplotype dissimilarity. β is measured using the Hamming distance between H1 and H2 divided by the length m of H1 and H2, and is assumed to be small.
n1: number of fragments from haplotype 1. n1 = n * p where p is the proportion of fragments from haplotype 1.
n2: number of fragments from haplotype 2. n2 = n - n1.
α1: inconsistency error rate, i.e., the probability of a reading error at a position.
α2: incompleteness error rate, i.e., the probability of a hole at a position.
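
As an illustration of how β, n1 and n2 follow from the definitions above, here is a minimal Python sketch. The function names, the string encoding of haplotypes, and the rounding of n * p to the nearest integer are assumptions made for this example; they are not taken from the applet.

```python
def dissimilarity(h1: str, h2: str) -> float:
    """Hamming distance between two equal-length haplotypes, divided by their length m."""
    if len(h1) != len(h2):
        raise ValueError("haplotypes must have the same length")
    if not h1:
        raise ValueError("haplotypes must be non-empty")
    hamming = sum(a != b for a, b in zip(h1, h2))
    return hamming / len(h1)


def split_fragments(n: int, p: float) -> tuple:
    """Return (n1, n2) with n1 = n * p and n2 = n - n1.

    Assumption: n * p is rounded to the nearest integer,
    since the text does not say how fractions are handled.
    """
    n1 = round(n * p)
    return n1, n - n1
```

For example, two haplotypes of length 4 that differ at one site have β = 1/4, and n = 100 fragments with p = 0.5 split into n1 = 50 and n2 = 50.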

Random Data Generation


In experimental studies of algorithmic solutions to the singular haplotype reconstruction problem, synthetic data is often needed to evaluate the performance and accuracy of a given algorithm. One common practice is as follows. First, randomly choose two haplotypes H1 and H2 such that the dissimilarity between H1 and H2 is at least β. Second, make ni copies of Hi, i = 1, 2. Third, for each copy H = a1 a2 ... am of Hi and each j = 1, 2, ..., m, flip aj to a'j with probability α1, so that the copy becomes inconsistent with Hi at that position. Independently, aj becomes a hole, written -, with probability α2. A synthetic data set is thus determined by the parameters m, n1, n2, β, α1 and α2.
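
The three-step procedure described above can be sketched in Python as follows. This is only an illustrative sketch, not the applet's implementation. It assumes a binary encoding of SNP values (0 and 1), obtains H2 by flipping ceil(β * m) randomly chosen positions of H1 so that the dissimilarity is at least β, and lets a hole take precedence when a position is both flipped and turned into a hole. All function names and the optional seed argument are hypothetical.

```python
import math
import random
import time

HOLE = "-"
FLIP = {"0": "1", "1": "0"}  # assumption: SNP values are encoded as 0/1


def random_haplotype_pair(m, beta, rng):
    """Step 1: choose H1 at random, then derive H2 with dissimilarity >= beta."""
    h1 = [rng.choice("01") for _ in range(m)]
    k = min(m, math.ceil(beta * m))           # number of differing sites
    differing = set(rng.sample(range(m), k))  # positions where H2 differs
    h2 = [FLIP[a] if j in differing else a for j, a in enumerate(h1)]
    return "".join(h1), "".join(h2)


def make_fragment(h, alpha1, alpha2, rng):
    """Step 3: copy h, flipping each site with prob. alpha1 and,
    independently, replacing it by a hole with prob. alpha2."""
    out = []
    for a in h:
        flipped = rng.random() < alpha1
        hole = rng.random() < alpha2
        if hole:                # assumption: a hole hides any flip
            out.append(HOLE)
        elif flipped:
            out.append(FLIP[a])
        else:
            out.append(a)
    return "".join(out)


def generate_dataset(m, n1, n2, beta, alpha1, alpha2, seed=None):
    """Return (H1, H2, fragments): n1 noisy copies of H1, then n2 of H2."""
    if seed is None:
        seed = time.time_ns()   # clock-based seed: a new data set per run
    rng = random.Random(seed)
    h1, h2 = random_haplotype_pair(m, beta, rng)
    fragments = [make_fragment(h1, alpha1, alpha2, rng) for _ in range(n1)]
    fragments += [make_fragment(h2, alpha1, alpha2, rng) for _ in range(n2)]
    return h1, h2, fragments
```

Calling generate_dataset(m, n1, n2, beta, alpha1, alpha2) without a seed gives a different data set on each run, while passing a fixed seed makes a run reproducible, which can be convenient when comparing algorithms on the same input.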

The seed of the random number generator is chosen from the system clock, so that each execution produces a different data set.

Software Requirements


The Java Runtime Environment is required to run this applet. Please download and install it.

Once you agree to display the applet in your web browser, it is downloaded, and the calculation is then performed on your own system.