Procedures for Feature Screening and Interaction Identification in High-dimensional Data Modelling

Author : Ling Zhang
Release : 2019
OCLC : 1117331826

Book Synopsis: Procedures for Feature Screening and Interaction Identification in High-dimensional Data Modelling, by Ling Zhang

Procedures for Feature Screening and Interaction Identification in High-dimensional Data Modelling, written by Ling Zhang, was released in 2019. Book excerpt:

Nowadays, rapid developments in computer technologies have greatly reduced the cost of collecting and storing massive amounts of data. As a result, data with ultrahigh dimensionality have become increasingly common. Such data make new levels of scientific discovery possible, but they also bring new challenges for analyzing and understanding them. Variable selection methods, feature screening procedures, and random forest algorithms have been widely used in many scientific fields, such as computational biology, health studies, and financial engineering. The goal is to recover the underlying model structure and make accurate predictions when a large number of predictors are introduced at the initial stage but only a small subset of them are truly associated with the response.

High-dimensional survival data analysis is one such field. In the first part of the dissertation, we propose a two-stage feature screening procedure for the varying-coefficient Cox model with ultrahigh-dimensional covariates. The varying-coefficient model is flexible and powerful for modeling dynamic covariate effects. In the literature, screening methods for the varying-coefficient Cox model are limited to marginal measures. Unlike marginal screening, the proposed procedure is based on the joint partial likelihood of all predictors. As a result, it can effectively identify active predictors that are jointly dependent on, but marginally independent of, the response. To carry out the proposed procedure, we propose an efficient algorithm and establish its ascent property.
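Why marginal screening can miss a jointly active predictor is easiest to see in a toy example. The sketch below is not the dissertation's procedure: it swaps the varying-coefficient Cox model for an ordinary linear model, uses SIS-style absolute correlation as the marginal utility, and uses a plain least-squares fit (possible only because p < n here) as a stand-in for joint screening. The design and the helpers `simulate`, `marginal_screen` and `joint_screen` are hypothetical; X2 is constructed to be active yet marginally uncorrelated with the response.

```python
import numpy as np


def simulate(n=2000, p=10, rho=0.5, seed=0):
    """Hypothetical linear design (not a Cox model). Column 0 is X1, column 1
    is X2; they have correlation rho, the other p - 2 columns are independent
    noise, and y = X1 - rho * X2 + noise, so Cov(X2, y) = 0 although X2 is active."""
    rng = np.random.default_rng(seed)
    cov = np.eye(p)
    cov[0, 1] = cov[1, 0] = rho
    X = rng.multivariate_normal(np.zeros(p), cov, size=n)
    y = X[:, 0] - rho * X[:, 1] + 0.5 * rng.standard_normal(n)
    return X, y


def marginal_screen(X, y, d):
    """SIS-style marginal screening: keep the d predictors with the largest
    absolute sample correlation with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    corr = Xc.T @ yc / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    return np.argsort(-np.abs(corr))[:d], corr


def joint_screen(X, y, d):
    """Toy joint screening: fit all predictors together by least squares
    (only possible because p < n here) and keep the d largest |coefficients|."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    beta, *_ = np.linalg.lstsq(Xc, yc, rcond=None)
    return np.argsort(-np.abs(beta))[:d], beta


X, y = simulate()
kept_marginal, corr = marginal_screen(X, y, d=2)
kept_joint, beta = joint_screen(X, y, d=2)
print("marginal correlations:", np.round(corr, 2))
print("kept by marginal screening:", sorted(kept_marginal.tolist()))
print("kept by joint fit:", sorted(kept_joint.tolist()))
```

In this design the joint fit recovers both active predictors, while X2's marginal correlation sits at the noise level, so whether marginal screening keeps it is left to chance among the noise columns. A full least-squares fit is unavailable once p exceeds n, so this shortcut does not carry over to the ultrahigh-dimensional setting.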
We further prove that the proposed procedure possesses the sure screening property: with probability tending to one, the selected variable set includes all truly active predictors. Monte Carlo simulations evaluate the finite-sample performance of the proposed procedure in comparison with SIS (Fan and Lv, 2008) and SJS (Yang et al., 2016) for the Cox model. The methodology is also illustrated through the analysis of two real data examples.

Although very helpful and computationally efficient, feature screening is not powerful at detecting marginally unimportant variables that participate in high-order interaction effects. This is where random forest algorithms have the advantage, because the tree structure is a natural and powerful way to detect interaction effects. The drawback of random forest algorithms is that they pay too little attention to feature selection and therefore include a great deal of redundancy when constructing the forest. This redundancy severely affects the interpretability and prediction performance of the forest, especially when only a small proportion of a large number of candidate variables are important.

In the second part of the dissertation, we propose combining the advantages of forest algorithms and feature screening to better understand the hidden mechanism. To achieve this, we propose a new two-layer random forest algorithm, "Iteratively Kings' Forests" (iKF), for feature selection and interaction detection in classification and regression problems. In the first layer, we modify the traditional forest-construction process so that we can fully explore the mechanism, both marginal and interaction effects, related to a given important variable (the "King" variable). In the second layer, we iteratively search for the next important variable and repeat the first-layer process for it.
Finally, we obtain not only a screened variable index set but also a short ranked list of highly probable interaction effects. Simulations compare the performance of iKF with the feature screening procedure DC-SIS (Li et al., 2012) and the random forest algorithm iRF (Basu et al., 2018). We also apply the iKF procedure in an empirical analysis to identify important interactions in early Drosophila embryo data and compare its performance with that of iRF.
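The screening limitation described above, and why a tree split helps, can be illustrated with the utility DC-SIS ranks predictors by, the sample distance correlation. The sketch below is a hypothetical toy, not iKF or iRF: X1 and X2 affect the response only through their product and are each marginally independent of it, so distance-correlation screening scores them like noise, whereas splitting on X1 (loosely playing the role of a "King" variable) exposes X2 inside each child node. The helpers `dcor`, `marginal_dcor`, `simulate_interaction` and `split_then_screen` are illustrative names, not from the dissertation.

```python
import numpy as np


def dcor(x, y):
    """Sample distance correlation between two 1-D samples."""
    x = np.asarray(x, dtype=float)[:, None]
    y = np.asarray(y, dtype=float)[:, None]
    a = np.abs(x - x.T)  # pairwise distance matrices
    b = np.abs(y - y.T)
    # Double-centre: subtract row and column means, add back the grand mean.
    A = a - a.mean(axis=0) - a.mean(axis=1)[:, None] + a.mean()
    B = b - b.mean(axis=0) - b.mean(axis=1)[:, None] + b.mean()
    dcov2 = (A * B).mean()
    dvar = np.sqrt((A * A).mean() * (B * B).mean())
    # A constant sample has zero distance variance; report no dependence.
    return float(np.sqrt(max(dcov2, 0.0) / dvar)) if dvar > 0 else 0.0


def marginal_dcor(X, y):
    """DC-SIS-style utility: distance correlation of each column with y."""
    return np.array([dcor(X[:, j], y) for j in range(X.shape[1])])


def simulate_interaction(n=600, p=6, seed=1):
    """Hypothetical design (columns 0, 1, 2 hold X1, X2, X3): X1 and X2 are
    independent random signs acting only through their product, X3 has a
    main effect, and the remaining columns are noise."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))
    X[:, 0] = rng.choice([-1.0, 1.0], size=n)
    X[:, 1] = rng.choice([-1.0, 1.0], size=n)
    y = 2.0 * X[:, 0] * X[:, 1] + 1.5 * X[:, 2] + 0.5 * rng.standard_normal(n)
    return X, y


def split_then_screen(X, y, king):
    """Mimic a depth-1 tree split on the sign of column `king`, screen every
    column inside each child node, and average the two utilities. The split
    column is constant inside each child, so its score is 0."""
    left = X[:, king] <= 0
    scores = [marginal_dcor(X[mask], y[mask]) for mask in (left, ~left)]
    return np.mean(scores, axis=0)


X, y = simulate_interaction()
print("marginal dcor:     ", np.round(marginal_dcor(X, y), 2))
print("after split on X1: ", np.round(split_then_screen(X, y, king=0), 2))
```

X3, which has a main effect, stands out under marginal screening, while X1 and X2 score near the noise level; after the split on X1, the score of X2 rises sharply. Averaging the two child-node utilities is an arbitrary choice made for this sketch.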


Procedures for Feature Screening and Interaction Identification in High-dimensional Data Modelling Related Books

Independence Screening in High-Dimensional Data
Language: en
Authors: John Wauters
Type: BOOK - Published: 2016

High-dimensional data, data in which the number of dimensions exceeds the number of observations, is increasingly common in statistics. The term "ultra-high dim…

Detecting Relevant Interactions in High Dimensional Data Analysis
Language: en
Pages: 22
Authors: Mike K. P. So
Type: BOOK - Published: 2014

In high-dimensional data, relevant interactions can be difficult to identify due to the extremely large number of possible interactions among variables. Convent…

Feature Screening For Ultra-high Dimensional Longitudinal Data
Language: en
Authors: Wanghuan Chu
Type: BOOK - Published: 2016

High and ultrahigh dimensional data analysis is now receiving more and more attention in many scientific fields. Various variable selection methods have been pr…

Feature Screening in Ultra-high Dimensional Survival Data Analysis
Language: en
Authors: Wei Sun
Type: BOOK - Published: 2014

Much research has been devoted to developing variable selection methods for decades since high-dimensional data arise from many scientific and technological fie…