My research focuses on the analysis of high-dimensional data, particularly data arising in genetics and genomics. As the data generated by experiments in these fields continue to grow in size and complexity, there is an ever greater need for sound statistical procedures that extract scientific insight from large amounts of information. Methodologically, my primary area of research is penalized regression. The specific areas I am working in are listed below.
Inference for penalized regression estimators
Penalized regression is an attractive methodology for dealing with high-dimensional data where classical likelihood approaches to modeling break down. However, its widespread adoption has been hindered by a lack of inferential tools. In particular, penalized regression is very useful for variable selection, but how confident should one be about those selections? How many of those selections would likely have occurred by chance alone? The papers below represent my ongoing work to estimate false discovery rates for penalized regression models.
-
Feature-specific inference for penalized regression using local false discovery rates
Miller R and Breheny P
Statistics in Medicine, 42: 1412–1429.
-
Marginal false discovery rate control for likelihood-based penalized regression models
Miller RE and Breheny P
Biometrical Journal, 61: 889–901.
-
Marginal false discovery rates for penalized regression models
Breheny PJ
Biostatistics, 20: 299–314.
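The question of how many selections would have occurred by chance alone can be illustrated with a back-of-the-envelope calculation. The sketch below is a simplified heuristic and not necessarily the exact estimator developed in the papers above. It assumes a linear model with Gaussian noise of standard deviation `sigma`, features standardized so that each column has squared norm `n`, and a lasso objective scaled as (1/(2n)) RSS + lam * ||beta||_1. Under those assumptions, a pure-noise feature enters the model at `lam` with probability of roughly 2 * Phi(-lam * sqrt(n) / sigma). All function names are hypothetical.

```python
import math


def normal_cdf(x):
    """Standard normal CDF, computed from the complementary error function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def expected_false_discoveries(p, n, lam, sigma):
    """Approximate upper bound on the number of features selected by chance.

    Assumption (illustrative only): standardized features, Gaussian errors,
    and all p features treated as noise, which makes the bound conservative.
    """
    return 2.0 * p * normal_cdf(-lam * math.sqrt(n) / sigma)


def marginal_fdr(p, n, lam, sigma, n_selected):
    """Estimated fraction of selections attributable to chance, capped at 1."""
    if n_selected == 0:
        return 0.0
    return min(1.0, expected_false_discoveries(p, n, lam, sigma) / n_selected)


if __name__ == "__main__":
    # Hypothetical example: 1000 features, 100 observations, 10 selected.
    print(marginal_fdr(p=1000, n=100, lam=0.4, sigma=1.0, n_selected=10))
```

A small value suggests that few of the selected features are likely to be chance findings at that value of `lam`; in practice `sigma` is unknown and would itself have to be estimated.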
Deconfounding and mixed models
Penalized linear mixed models provide a way to address batch effects and hidden confounding. The paper below represents my work on studying these problems and providing software to fit these models.
-
Penalized linear mixed models for structured genetic data
Reisetter AC and Breheny P
Genetic Epidemiology, 45: 427–444.
Nonconvex penalties
Although the lasso has many attractive properties, it also introduces significant bias toward 0 for large regression coefficients. The MCP and SCAD penalties have been proposed as alternatives designed to diminish this bias, and have been shown to have attractive theoretical and empirical properties. However, the penalty functions for SCAD and MCP are nonconvex, which introduces numerical challenges in fitting these models, as well as additional practical considerations in tuning parameter selection. The first paper develops algorithms for fitting nonconvex models in high dimensions and proposes local convexity as a diagnostic measure. The second extends these concepts to elastic net-type estimators and further explores the issue of tuning parameter selection. The third paper discusses methods for accelerating convergence in very high-dimensional problems.
-
Coordinate descent algorithms for nonconvex penalized regression, with applications to biological feature selection
Breheny P and Huang J
Annals of Applied Statistics, 5: 232–253.
-
The Mnet method for variable selection
Huang J, Breheny P, Lee S, Ma S and Zhang C
Statistica Sinica, 26: 903–923.
-
Strong Rules for Nonconvex Penalties and Their Implications for Efficient Algorithms in High-Dimensional Regression
Lee S and Breheny P
Journal of Computational and Graphical Statistics, 24: 1074–1091.
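The bias reduction described in this section can be seen in the single-coefficient updates used by coordinate descent. The following is a minimal sketch, assuming a standardized feature and a univariate least-squares solution `z`; the function names and the default values of `gamma` are illustrative choices rather than anything prescribed by the papers above. The lasso update shrinks every nonzero solution by `lam`, whereas the MCP and SCAD updates leave sufficiently large solutions unshrunk.

```python
def soft_threshold(z, lam):
    """Lasso update: shrink z toward zero by lam, setting small values to 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def mcp_threshold(z, lam, gamma=3.0):
    """MCP update (requires gamma > 1): no shrinkage once |z| > gamma * lam."""
    if abs(z) <= gamma * lam:
        return soft_threshold(z, lam) / (1.0 - 1.0 / gamma)
    return z


def scad_threshold(z, lam, gamma=3.7):
    """SCAD update (requires gamma > 2): lasso-like near 0, unbiased for large |z|."""
    if abs(z) <= 2.0 * lam:
        return soft_threshold(z, lam)
    if abs(z) <= gamma * lam:
        inner = soft_threshold(z, gamma * lam / (gamma - 1.0))
        return inner / (1.0 - 1.0 / (gamma - 1.0))
    return z


if __name__ == "__main__":
    # Compare the three updates on a small, a moderate, and a large solution.
    for z in (0.5, 2.0, 5.0):
        print(z, soft_threshold(z, 1.0), mcp_threshold(z, 1.0), scad_threshold(z, 1.0))
```

For a large solution such as `z = 5.0` with `lam = 1.0`, the lasso update still subtracts `lam`, while the two nonconvex updates return `z` unchanged.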
Grouped (hierarchical) variable selection
In regression modeling, explanatory variables can often be thought of as grouped. Taking this grouping information into account in the modeling process should improve both the interpretability and the accuracy of the model. These gains are likely to be particularly important in high-dimensional settings where sparsity and variable selection play important roles in estimation accuracy. Of the papers below, the Statistical Science paper provides a review of this subject, while the Statistics and Computing paper extends the ideas of nonconvex penalization to grouped variable selection and proposes efficient algorithms to fit these models. The Cancer Informatics paper extends the idea of group selection to the problem of overlapping groups.
-
Overlapping group logistic regression with applications to genetic pathway selection
Zeng Y and Breheny P
Cancer Informatics, 15: 179–187.
-
Group descent algorithms for nonconvex penalized linear and logistic regression models with grouped predictors
Breheny P and Huang J
Statistics and Computing, 25: 173–187.
-
A Selective Review of Group Selection in High-Dimensional Models
Huang J, Breheny P and Ma S
Statistical Science, 27: 481–499.
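Group-level sparsity can be illustrated with the group analogue of soft thresholding, which shrinks a whole block of coefficients together and sets the entire block to zero when its norm is small. This is a minimal sketch of the standard group lasso update, assuming the features within the group have been orthonormalized; it is not the nonconvex group descent algorithm from the papers above. The function name is hypothetical, and any group-size scaling of `lam` is left to the caller.

```python
import math


def group_soft_threshold(z, lam):
    """Shrink the vector z toward zero by lam in Euclidean norm.

    z is the list of unpenalized least-squares solutions for one group
    (assumed orthonormalized). Returns a list of the same length.
    """
    norm = math.sqrt(sum(v * v for v in z))
    if norm <= lam:
        # The whole group is dropped from the model.
        return [0.0] * len(z)
    scale = 1.0 - lam / norm
    # Every member of the group is shrunk by the same factor.
    return [scale * v for v in z]


if __name__ == "__main__":
    print(group_soft_threshold([3.0, 4.0], 1.0))  # group retained, shrunk
    print(group_soft_threshold([0.3, 0.4], 1.0))  # group removed entirely
```

Because the update either removes the group or keeps every member nonzero, the resulting estimates are sparse at the group level but not within groups.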
Bi-level variable selection
Most of the methods developed for grouped variable selection produce estimates that are sparse at the group level and not at the level of individual variables. This is not always appropriate for the data. In many applications (e.g., genetic association studies), the goal is to identify important individual markers, but to increase the power of the search by incorporating grouping information. Of the papers below, the Statistics and Its Interface paper introduces this topic; the Biometrics paper identifies some shortcomings of the method proposed there and proposes a new method with many advantages over it.
-
The group exponential lasso for bi-level variable selection
Breheny P
Biometrics, 71: 731–740.
-
Penalized methods for bi-level variable selection
Breheny P and Huang J
Statistics and Its Interface, 2: 369–380.
Visualization of regression models
The importance of visualizing data is widely recognized. Visualizing models, along with the estimates and predictions derived from them, is just as important, yet tools for easily carrying out these visualizations are less well developed. The paper below describes our development of software for visualizing a wide class of regression models fit in R.
-
Visualization of regression models using visreg
Breheny P and Burchett W
R Journal, 9: 56–71.
Survival analysis
Much of my work in survival analysis is applied, but the following papers are primarily methodological, or contain significant methodological innovation:
-
Peptide receptor radionuclide therapy improves survival in patients who progress after resection of gastroenteropancreatic neuroendocrine tumors
Borbon LC, Sherman SK, Breheny PJ, Chandrasekharan C, Menda Y, Bushnell D, Bellizzi AM, Ear PH, O’Dorisio MS, O’Dorisio TM, Dillon JS and Howe JR
Annals of Surgical Oncology, 32: 1136–1148.
-
Cross-validation approaches for penalized Cox regression
Dai B and Breheny P
Statistical Methods in Medical Research, 33: 702–715.
Genetics and genomics
I have been particularly motivated by genetic association studies, copy-number variation, and gene expression studies. Many of my papers in this area are applied, but the following are primarily methodological, or contain significant methodological innovation:
-
Penalized multimarker vs. single-marker regression methods for genome-wide association studies of quantitative traits
Yi H, Breheny P, Imam N, Liu Y and Hoeschele I
Genetics, 199: 205–222.
Featured article.
-
In Vivo identification of eugenol-responsive and muscone-responsive mouse odorant receptors
McClintock TS, Adipietro K, Titlow WB, Breheny P, Walz A, Mombaerts P and Matsunami H
Journal of Neuroscience, 34: 15669–15678.
Featured article.
-
Kernel-based aggregation of marker-level genetic association tests involving copy-number variation
Li Y and Breheny P
Microarrays, 2: 265–283.
-
Statistical challenges and opportunities in copy number variant association studies
Breheny P, Li Y and Charnigo R
Journal of Biometrics and Biostatistics, 3: e118.
-
Genetic Association Studies of Copy-Number Variation: Should Assignment of Copy Number States Precede Testing?
Breheny P, Chalise P, Batzler A, Wang L and Fridley BL
PLoS ONE, 7: e34262.
-
Genomics of mature and immature olfactory sensory neurons
Nickell MD, Breheny P, Stromberg AJ and McClintock TS
Journal of Comparative Neurology, 520: 2608–2629.