Approximate String Alignment and Its Application to ESTs, mRNAs and Genome Mapping
Author | : Cheuk-Hon Terence Yim |
Release | : 2017-01-26 |
ISBN-10 | : 1361205644 |
ISBN-13 | : 9781361205648 |
Book excerpt: This dissertation, "Approximate String Alignment and Its Application to ESTs, mRNAs and Genome Mapping" by Cheuk-hon, Terence, Yim, 嚴卓漢, was obtained from The University of Hong Kong (Pokfulam, Hong Kong) and is being sold pursuant to Creative Commons: Attribution 3.0 Hong Kong License. The content of this dissertation has not been altered in any way. We have altered the formatting in order to facilitate the ease of printing and reading of the dissertation. All rights not granted by the above license are retained by the author.

Abstract: Abstract of thesis entitled "Approximate String Alignment and Its Application to ESTs, mRNAs and Genome Mapping", submitted by Yim Cheuk Hon Terence for the degree of Master of Philosophy at The University of Hong Kong in August 2004.

Locating and annotating genes in the genome are critical steps towards a better understanding of how the genes function. Different techniques have been used for finding the location of genes, including mapping coding sequences such as cDNAs or ESTs to the genome, whole genome alignment between different species, and mapping known gene sequences to the genome. All of these techniques involve sequence comparisons. Hence, practical sequence comparison algorithms are needed. Sequence comparisons on the genome sequence are presently performed by approximate string matching together with sequence alignment algorithms developed some time ago. However, due to the exceptional size of the genome sequence, the high error ratio between sequences, and the complicated internal structure of the genes, new algorithms are needed to overcome these challenges.
This study proposes a new approximate string matching algorithm that can search large genome texts efficiently by employing a new indexing method that combines the strengths of the suffix tree and the suffix array. To maintain performance in a high error ratio situation, we also develop a new filtering scheme by exploring the relationship between the genome text and the query. Experiments show that our new algorithm runs 8 to 10 times faster overall than existing algorithms. Based on our new approximate string matching algorithm, we also develop an mRNA alignment tool that can align mRNA or EST sequences to the genome and efficiently identify the correct internal structure of the sequence. Our alignment algorithm performs better than existing tools, especially in a high error situation. (Word counts: 257)

DOI: 10.5353/th_b3145573
Subjects: Gene mapping - Data processing; Nucleotide sequence - Data processing; Molecular biology - Data processing; Algorithms
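
The abstract does not spell out the thesis's index or its filtering scheme, so the following is only a minimal sketch of the general idea of index-assisted approximate matching, not the author's algorithm. It assumes a plain suffix array (rather than the combined suffix tree and suffix array index) and the textbook pigeonhole filter: a pattern split into k+1 pieces must have at least one piece occur exactly in any match with at most k edits. All function names (`build_suffix_array`, `exact_occurrences`, `match_ends`, `approx_find`) are hypothetical.

```python
def build_suffix_array(text):
    # Toy construction by sorting suffixes; fine for short strings only.
    # Genome-scale indexes need far more efficient builders.
    return sorted(range(len(text)), key=lambda i: text[i:])


def exact_occurrences(text, sa, piece):
    # Binary search the suffix array for suffixes that start with `piece`.
    m = len(piece)
    lo, hi = 0, len(sa)
    while lo < hi:  # first suffix whose m-prefix is >= piece
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < piece:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    hi = len(sa)
    while lo < hi:  # first suffix whose m-prefix is > piece
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= piece:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])


def match_ends(window, pattern, k):
    # Sellers' dynamic programming: edit distance with a free start in
    # `window`. Returns exclusive end offsets where distance <= k.
    m = len(pattern)
    prev = list(range(m + 1))
    ends = []
    for j, c in enumerate(window, start=1):
        cur = [0]  # an empty pattern prefix matches anywhere at no cost
        for i in range(1, m + 1):
            cur.append(min(prev[i - 1] + (pattern[i - 1] != c),  # (mis)match
                           prev[i] + 1,      # extra character in the text
                           cur[i - 1] + 1))  # pattern character skipped
        if cur[m] <= k:
            ends.append(j)
        prev = cur
    return ends


def approx_find(text, pattern, k):
    # Pigeonhole filter: look up k+1 exact pieces in the index, then verify
    # only the text windows around the hits.
    m = len(pattern)
    piece_len = m // (k + 1)
    if piece_len == 0:
        raise ValueError("pattern too short for k errors")
    sa = build_suffix_array(text)
    found = set()
    for p in range(k + 1):
        off = p * piece_len
        end_off = m if p == k else off + piece_len
        for pos in exact_occurrences(text, sa, pattern[off:end_off]):
            lo = max(0, pos - off - k)
            hi = min(len(text), pos - off + m + k)
            for e in match_ends(text[lo:hi], pattern, k):
                found.add(lo + e)
    return sorted(found)
```

For example, `approx_find("ACGTTGCAACGTAGCA", "ACGTAGCA", 1)` reports, among its end positions, 8 (a match with one substitution) and 16 (an exact match). The filter only pays off when the pieces are long enough to be rare in the text, which is one reason high error ratios are hard and why the thesis develops its own filtering scheme.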