DNA resequencing is the task of sequencing a DNA region for an individual given that a reference sequence for this region is already available for the specific species.
Different individuals of the same species differ in their DNA sequences. Single nucleotide polymorphisms (SNPs) are the most common form of variation between individuals. In the human genome, SNPs occur at a frequency of at least about 0.1% and are considered responsible for a large part of individual phenotypic variation.
A DNA microarray that performs resequencing by hybridization - known as a variation detection array (VDA) or resequencing array (RA) - is one of the methods available for high-throughput SNP discovery. We are exploring changes in chip design and data analysis to improve the utility of these devices. Recent work involved the development of an alternative, physical model-based base-calling algorithm that specifically targets improved accuracy at heterozygous sites at high call rates. We are also experimenting with varying the probe length on microarrays to improve hybridization efficiency. Our aim is to develop a model-based probe length selection strategy for VDAs.
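As a rough illustration of how a resequencing array interrogates each base, the sketch below tiles a reference sequence with four probes per position, one for each possible central base. It shows the general tiling idea only, not our chip design: the function name tile_probes, the default probe length of 25, and the choice to write probes in the reference-strand orientation (rather than as complements, and for one strand only) are all assumptions made for the example.

```python
BASES = "ACGT"


def tile_probes(reference, probe_length=25):
    """Return {position: {central_base: probe}} for a reference sequence.

    Hypothetical helper: each interrogated position gets four probes that
    are identical to the reference window except at the central base.
    Positions too close to either end to fit a full probe are skipped.
    """
    if probe_length % 2 == 0:
        raise ValueError("probe_length must be odd so the probe has a central base")
    half = probe_length // 2
    probes = {}
    for pos in range(half, len(reference) - half):
        window = reference[pos - half : pos + half + 1]
        probes[pos] = {
            base: window[:half] + base + window[half + 1 :]
            for base in BASES
        }
    return probes


if __name__ == "__main__":
    # Short toy reference with 5-mer probes so the output is easy to read.
    for pos, quartet in tile_probes("ACGTACGTA", probe_length=5).items():
        print(pos, quartet)
```

Because probe_length is a parameter, the same sketch can be used to compare how many positions remain interrogable, and what the probes look like, as the probe length is varied.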
The figure below shows the profiles for all 10 possible genotypes based on our predictive model; the observed test data for the correct genotype are on the left. The graph below the profiles shows the improvement in accuracy for heterozygous sites as a function of call rate.
(from Zhan Y and Kulp D. (2005). "Model-P: basecalling for resequencing microarrays of diploid samples." Bioinformatics, 21 (Suppl 2), ii182-ii189.)
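The count of 10 genotypes follows from a diploid sample carrying an unordered pair of bases at each site: 4 homozygous plus 6 heterozygous combinations. The sketch below enumerates them and also shows one simple way to compute accuracy at a given call rate, by keeping only the most confident calls. It is not the Model-P algorithm; the function accuracy_at_call_rate and the (confidence, is_correct) input format are hypothetical, chosen only to make the trade-off concrete.

```python
from itertools import combinations_with_replacement

BASES = "ACGT"

# Unordered base pairs: 4 homozygous (AA, CC, GG, TT) + 6 heterozygous = 10.
GENOTYPES = ["".join(pair) for pair in combinations_with_replacement(BASES, 2)]


def accuracy_at_call_rate(calls, call_rate):
    """Accuracy among the most confident fraction `call_rate` of calls.

    `calls` is a list of (confidence, is_correct) tuples; this format is an
    assumption for the example. Sites below the cutoff are treated as
    no-calls and excluded from the accuracy.
    """
    if not 0 < call_rate <= 1:
        raise ValueError("call_rate must be in (0, 1]")
    ranked = sorted(calls, key=lambda call: call[0], reverse=True)
    n_called = max(1, int(call_rate * len(ranked)))
    called = ranked[:n_called]
    return sum(1 for _, is_correct in called if is_correct) / n_called


if __name__ == "__main__":
    print(len(GENOTYPES), GENOTYPES)
    toy_calls = [(0.9, True), (0.8, True), (0.6, False), (0.4, True)]
    for rate in (0.5, 1.0):
        print(rate, accuracy_at_call_rate(toy_calls, rate))
```

Sweeping call_rate over a range of values and plotting the result gives a curve of accuracy as a function of call rate.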
Haploid resequencing
In addition, we are currently applying VDAs to whole genome genotyping of Dengue serotypes (with Irene Bosch) and Toxoplasma (with David Roos and Amit Bahl). In collaboration with Derek Lovley, we are in the early stages of a directed adaptation experiment using Geobacter metallireducens, which will involve a resequencing array strategy to discover SNPs under positive selection. From a technology development perspective, we aim to reduce the number of required probes, minimize false positives, and detect copy number changes, while limiting the number of technical replicates.