Machine Learning Aided Feature Selection for Ultrahigh Dimensional Data

Author : Arkaprabha Ganguli
Release : 2023
ISBN-13 : 9798379741006

Book Synopsis Machine Learning Aided Feature Selection for Ultrahigh Dimensional Data by : Arkaprabha Ganguli

Machine Learning Aided Feature Selection for Ultrahigh Dimensional Data, written by Arkaprabha Ganguli, was released in 2023. Book excerpt:

The field of statistical machine learning has seen a surge in popularity for feature selection methods for ultra-high dimensional datasets due to their wide applicability in scientific domains ranging from genetics to astronomy. These applications typically involve a vast number of potential features and a quantitative response or outcome variable. It is also often observed or hypothesized that only a small subset of these features is truly associated with the response. Traditional feature selection algorithms are motivated by the need to uncover the true sparsity pattern buried in the ultra-high dimensional data setting. However, these methods may produce a high number of false discoveries, providing poor scientific insight into the underlying relationship. Error-controlled methods are designed to address this issue by controlling the expected proportion of falsely identified features among the selected ones. In this thesis, we develop and study two novel feature selection methods for ultrahigh dimensional data with False Discovery Rate (FDR) control, with a real-world application in the context of diffusion magnetic resonance imaging (DMRI) tractography data.

In the first chapter, we propose a p-value-free FDR controlling method for feature selection. Most state-of-the-art methods in the literature for controlling FDR rely on p-values, which depend on specific assumptions about the data distribution and may be questionable in some high-dimensional settings. To overcome this problem, we propose a 'screening & cleaning' strategy consisting of assigning importance scores to the predictors, followed by constructing an estimate of the FDR. We study the theoretical properties of the method and demonstrate its superior performance compared to existing methods in an extensive simulation study. Finally, we apply the method to a gene expression dataset and identify important genes associated with drug sensitivity.

In the second chapter, we extend the feature selection method from a linear model to a non-linear and non-parametric setting by utilizing the Deep Learning (DL) framework. DL has been at the center of analytics in recent years due to its impressive empirical success in analyzing complex data objects. Despite this success, most existing tools behave like black-box machines, hence the increasing interest in interpretable, reliable, and robust deep learning models applicable to a broad class of applications. Feature-selected deep learning has emerged as a promising tool in this realm. However, recent developments do not accommodate ultra-high dimensional and highly correlated features or high noise levels. In this chapter, we propose a novel screening and cleaning method with the aid of deep learning for a data-adaptive multi-resolutional discovery of highly correlated predictors with a controlled FDR. Extensive empirical evaluations over a wide range of simulated scenarios and several real datasets demonstrate the effectiveness of the proposed method in achieving high power while keeping the false discovery rate at a minimum.

In the third and final chapter, we apply the proposed feature selection methods to the brain imaging tractography dataset. Our motivation comes from evidence in studies of dementia showing that some older adults maintain their cognitive abilities despite signs of ongoing neuropathological disease. Commonly referred to as cognitive reserve, this phenomenon has unclear neurobiological substrates, and current understanding of the corresponding markers is lacking. This study aims to investigate the immense system of structural connections between brain regions constituting subcortical white matter (WM) as potential markers of cognitive reserve. Diffusion MRI tractography is an established computational neuroimaging method to model WM fiber organization throughout the brain. Standard statistical analyses capable of leveraging the high dimensionality of tractography data face additional methodological complications beyond those encountered in typical feature selection problems. Our proposed methodology is specifically tailored to address these concerns. Extensive simulation studies on synthetic datasets mimicking the real tractography dataset demonstrate a substantial gain in power with minimal false discoveries, compared with state-of-the-art methods for feature selection. Our application to predicting cognitive reserve in a clinical aging neuroimaging tractography dataset produces anatomically meaningful discoveries in brain regions associated with risk and resilience to neurodegeneration.

Overall, this thesis presents novel and effective methods for feature selection in ultrahigh dimensional settings. Our proposed framework would benefit researchers and professionals who face the difficulty of choosing pertinent variables from vast, correlated datasets in diverse fields, ranging from finance and social sciences to biology.
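As a rough illustration of what a p-value-free "screening & cleaning" procedure with an FDR estimate can look like, the sketch below uses a generic sample-splitting recipe. It is not the thesis's algorithm: the marginal-correlation screen, the mirror-style importance score, the knockoff-style threshold, and the names `fdr_threshold` and `screen_and_clean` are all assumptions chosen for illustration, and the FDR estimate relies on null scores being roughly symmetric about zero.

```python
import numpy as np


def fdr_threshold(scores, q):
    """Return the smallest cutoff t whose estimated FDP is at most q."""
    scores = np.asarray(scores, dtype=float)
    for t in np.sort(np.abs(scores[scores != 0])):
        # Assumption: null scores are symmetric about zero, so the number of
        # scores at or below -t estimates the false positives at or above t.
        fdp_hat = (1 + np.sum(scores <= -t)) / max(1, np.sum(scores >= t))
        if fdp_hat <= q:
            return float(t)
    return float("inf")  # no cutoff meets the target level


def screen_and_clean(X, y, k, q, seed=0):
    """Illustrative screening-and-cleaning selection at target FDR level q."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    perm = rng.permutation(n)
    half_a, half_b = perm[: n // 2], perm[n // 2:]

    # Screening: keep the k features with the largest absolute marginal
    # correlation with the response, computed on the first half only.
    Xa = X[half_a] - X[half_a].mean(axis=0)
    ya = y[half_a] - y[half_a].mean()
    denom = np.linalg.norm(Xa, axis=0) * np.linalg.norm(ya) + 1e-12
    kept = np.argsort(np.abs(Xa.T @ ya) / denom)[-k:]

    def ols(rows):
        # Least-squares fit restricted to the screened features.
        Xc = X[rows][:, kept]
        Xc = Xc - Xc.mean(axis=0)
        yc = y[rows] - y[rows].mean()
        coef, *_ = np.linalg.lstsq(Xc, yc, rcond=None)
        return coef

    # Cleaning: a score that is large and positive when both halves agree
    # on a sizeable effect, and sign-symmetric when the feature is null.
    beta_a, beta_b = ols(half_a), ols(half_b)
    scores = np.sign(beta_a * beta_b) * (np.abs(beta_a) + np.abs(beta_b))

    t = fdr_threshold(scores, q)
    return np.sort(kept[scores >= t])
```

Calling `screen_and_clean(X, y, k=20, q=0.2)` on an `(n, p)` design matrix returns the indices of the selected columns. The screening size `k` should stay below half the sample size so that the least-squares fits on each half remain well determined.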


Machine Learning Aided Feature Selection for Ultrahigh Dimensional Data Related Books

Machine Learning Aided Feature Selection for Ultrahigh Dimensional Data
Language: en
Pages: 0
Authors: Arkaprabha Ganguli
Categories: Electronic dissertations
Type: BOOK - Published: 2023 - Publisher:

Feature Selection for High-Dimensional Data
Language: en
Pages: 163
Authors: Verónica Bolón-Canedo
Categories: Computers
Type: BOOK - Published: 2015-10-05 - Publisher: Springer

This book offers a coherent and comprehensive approach to feature subset selection in the scope of classification problems, explaining the foundations, real app
Computational Methods of Feature Selection
Language: en
Pages: 437
Authors: Huan Liu
Categories: Business & Economics
Type: BOOK - Published: 2007-10-29 - Publisher: CRC Press

Due to increasing demands for dimensionality reduction, research on feature selection has deeply and widely expanded into many fields, including computational s
Feature Screening and Variable Selection for Ultrahigh Dimensional Data Analysis
Language: en
Pages: 155
Authors: Wei Zhong
Categories:
Type: BOOK - Published: 2012 - Publisher:

Feature Engineering for Machine Learning and Data Analytics
Language: en
Pages: 366
Authors: Guozhu Dong
Categories: Business & Economics
Type: BOOK - Published: 2018-03-14 - Publisher: CRC Press

Feature engineering plays a vital role in big data analytics. Machine learning and data mining algorithms cannot work without data. Little can be achieved if th