Efficient Approximate String Matching Techniques for Sequence Alignment
Author : Santiago Marco-Sola
Total Pages : 213
Release : 2018
OCLC : 1120500733

Book Synopsis: Efficient Approximate String Matching Techniques for Sequence Alignment by Santiago Marco-Sola

Efficient Approximate String Matching Techniques for Sequence Alignment was written by Santiago Marco-Sola and released in 2018; it runs to 213 pages. Book excerpt:

One of the outstanding milestones achieved in recent years in biotechnology research has been the development of high-throughput sequencing (HTS). Because it is currently technically impossible to decode a genome as a whole, HTS technologies read billions of relatively short chunks of a genome at random locations. Such reads then need to be located within a reference for the species being studied (that is, aligned or mapped to the genome): for each read, one identifies the regions of the reference that share a large sequence similarity with it, thereby indicating what the read's point or points of origin may be. HTS technologies are able to re-sequence a human individual (i.e. to establish the differences between his or her individual genome and the reference genome for the human species) in a very short period of time. They have also paved the way for a number of new protocols and methods, leading to novel insights in genomics and in biology in general. However, HTS technologies also pose a challenge to traditional data analysis methods, both because of the sheer amount of data to be processed and because of the need for improved alignment algorithms that can generate accurate results quickly.

This thesis tackles the problem of sequence alignment as a step within the analysis of HTS data. Its contributions focus on both the methodological aspects and the algorithmic challenges of efficient, scalable, and accurate HTS mapping. From a methodological standpoint, the thesis strives to establish a comprehensive framework able to assess the quality of HTS mapping results. To do so, one has to understand the source and nature of mapping conflicts and explore the accuracy limits inherent in how sequence alignment is performed for current HTS technologies. From an algorithmic standpoint, this work introduces state-of-the-art index structures and approximate string matching algorithms. They contribute novel insights that can be used in practical applications for efficient and accurate read mapping.

In more detail, the thesis first presents methods able to reduce the storage space taken by indexes for genome-scale references while still providing fast query access in order to support effective search algorithms. Second, it describes novel filtering techniques that vastly reduce the computational requirements of sequence mapping but are nonetheless capable of giving strict algorithmic guarantees on the completeness of the results. Finally, it presents new incremental algorithmic techniques able to combine several approximate string matching algorithms; this leads to efficient and flexible search algorithms that allow the user to reach arbitrary search depths. All algorithms and methodological contributions of the thesis have been implemented as components of a production aligner, the GEM-mapper, which is publicly available, widely used worldwide, and cited by a sizeable body of literature. It offers flexible and accurate sequence mapping while outperforming other HTS mappers in both running time and the quality of the results it produces.
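The excerpt does not say which filtering techniques the thesis uses. As a rough illustration of how a filter can cut down the work while still guaranteeing complete results, the sketch below uses the classic pigeonhole filter, which is an assumption chosen for illustration and not necessarily the method of the thesis or of the GEM-mapper. The idea: if a read aligns somewhere with at most k edits, then after splitting the read into k+1 pieces at least one piece must occur exactly in the reference, so only regions around exact piece hits need to be verified. The function names `semi_global_distance` and `pigeonhole_search` are hypothetical, and `str.find` stands in for a real genome index.

```python
def semi_global_distance(pattern, text):
    """Minimum edit distance between `pattern` and any substring of `text`."""
    # Row 0 is all zeros: the empty pattern matches anywhere in text at no cost.
    prev = [0] * (len(text) + 1)
    for i, pc in enumerate(pattern, start=1):
        cur = [i] + [0] * len(text)
        for j, tc in enumerate(text, start=1):
            cur[j] = min(
                prev[j - 1] + (pc != tc),  # match or substitution
                prev[j] + 1,               # pattern character left unmatched
                cur[j - 1] + 1,            # extra character in text
            )
        prev = cur
    # The alignment may end at any text position.
    return min(prev)


def pigeonhole_search(read, reference, k):
    """Return sorted (start, end) reference windows containing a match of
    `read` with at most `k` edits. Illustrative sketch only."""
    n = len(read)
    if n <= k:
        raise ValueError("read must be longer than k for the filter to apply")
    pieces = k + 1
    bounds = [n * p // pieces for p in range(pieces + 1)]
    windows = set()
    for p in range(pieces):
        lo, hi = bounds[p], bounds[p + 1]
        seed = read[lo:hi]
        pos = reference.find(seed)  # stand-in for an index lookup
        while pos != -1:
            # Region that could hold the whole read, allowing k indels each side.
            start = max(0, pos - lo - k)
            end = min(len(reference), pos + (n - lo) + k)
            if semi_global_distance(read, reference[start:end]) <= k:
                windows.add((start, end))
            pos = reference.find(seed, pos + 1)
    return sorted(windows)
```

A call such as `pigeonhole_search("CAGTTCGGAT", "TTGACCAGTACGGATCCATGCA", 1)` verifies only the region around the exact hit of the second half of the read instead of running the dynamic program over the entire reference; a production mapper would replace the linear scan with a compressed index.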


Related Books

Approximate String Alignment and Its Application to ESTs, mRNAs and Genome Mapping
Language: en
Pages:
Authors: Cheuk-Hon Terence Yim
Categories:
Type: BOOK - Published: 2017-01-26 - Publisher:

This dissertation, "Approximate String Alignment and Its Application to ESTs, MRNAs and Genome Mapping" by Cheuk-hon, Terence, Yim, 嚴卓漢, was obtained from
Efficient String Algorithms with Applications in Bioinformatics
Language: en
Pages: 73
Authors: Sahar Hooshmand
Categories:
Type: BOOK - Published: 2020 - Publisher:

The work presented in this dissertation deals with establishing efficient methods for solving some algorithmic problems, which have applications to Bioinformati
Approximate String Matching in DNA Sequences
Language: en
Pages:
Authors: Cheng Lok Lam
Categories:
Type: BOOK - Published: 2004 - Publisher:

(Uncorrected OCR) Abstract of thesis entitled "Approximate String Matching in DNA Sequences" Submitted by Cheng Lok Lam for the degree of Master of Philosophy a
Handbook of Exact String Matching Algorithms
Language: en
Pages: 238
Authors: Christian Charras
Categories: Computers
Type: BOOK - Published: 2004 - Publisher: College PressPub Company

String matching is a very important subject in the wider domain of text processing. It consists of finding one, or more generally, all the occurrences of a stri