My research focuses on the analysis of high-dimensional data, particularly data that arise in genetics and genomics. As the data generated by experiments in these fields continue to grow in size and complexity, there is an ever greater need for sound statistical procedures that yield scientific insight from large amounts of information. Methodologically, my primary area of research is penalized regression. The specific areas I am working in are listed below.

Inference for penalized regression estimators

Penalized regression is an attractive methodology for dealing with high-dimensional data where classical likelihood approaches to modeling break down. However, its widespread adoption has been hindered by a lack of inferential tools. In particular, penalized regression is very useful for variable selection, but how confident should one be about those selections? How many of those selections would likely have occurred by chance alone? The papers below represent my ongoing work to estimate false discovery rates for penalized regression models.
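To make the "by chance alone" question concrete, here is a minimal sketch of how an expected number of false selections can be approximated for a lasso fit in linear regression. It is an illustration under stated assumptions, not necessarily the exact estimator developed in the papers below. It assumes the objective is scaled as (1/2n)·RSS + λ·‖β‖₁, that features are standardized so each column has sum of squares n, and that the residual standard deviation `sigma` is known or has been estimated. The function name `estimated_mfdr` is hypothetical.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def estimated_mfdr(n, p, lam, sigma, n_selected):
    """Approximate expected false selections and a marginal false discovery rate.

    Assumption: for a feature unrelated to the outcome, the scaled inner
    product (1/n) * x_j' r is roughly N(0, sigma^2 / n), and the feature is
    selected when its absolute value exceeds lam. The chance of that is
    2 * Phi(-sqrt(n) * lam / sigma), and treating all p features as null
    gives a conservative count.
    """
    if n_selected == 0:
        return 0.0, 0.0
    expected_false = 2.0 * p * normal_cdf(-math.sqrt(n) * lam / sigma)
    # The rate is capped at 1 because it estimates a proportion.
    return expected_false, min(expected_false / n_selected, 1.0)


# Hypothetical inputs: 100 observations, 1000 features, 10 selected at lam = 0.3.
expected_false, mfdr = estimated_mfdr(n=100, p=1000, lam=0.3, sigma=1.0, n_selected=10)
```

With these hypothetical inputs, roughly 2.7 of the 10 selections would be expected to arise by chance, which is the kind of summary that helps a reader judge how far to trust a selected model.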

Feature-specific inference for penalized regression using local false discovery rates

Miller R and Breheny P

Statistics in Medicine, 42: 1412–1429. Journal PDF R

Marginal false discovery rate control for likelihood-based penalized regression models

Miller RE and Breheny P

Biometrical Journal, 61: 889–901. Journal PDF R Reproduce

Marginal false discovery rates for penalized regression models

Breheny PJ.

Biostatistics, 20: 299–314. Journal PDF R Reproduce

Deconfounding and mixed models

Penalized linear mixed models provide a way to address batch effects and hidden confounding. The paper below represents my work on studying this problem and providing software to fit these models.

Penalized linear mixed models for structured genetic data

Reisetter AC and Breheny P

Genetic Epidemiology, 45: 427–444. Journal

Nonconvex penalties

Although the lasso has many attractive properties, it also introduces significant bias toward 0 for large regression coefficients. The MCP and SCAD penalties have been proposed as alternatives designed to diminish this bias, and shown to have attractive theoretical and empirical properties. However, the penalty functions for SCAD and MCP are nonconvex, which introduces numerical challenges in fitting these models, as well as additional practical considerations in tuning parameter selection. The first paper develops algorithms for fitting nonconvex models in high dimensions and proposes local convexity as a diagnostic measure. The second extends these concepts to elastic net-type estimators and further explores the issue of tuning parameter selection. The third paper discusses methods for accelerating convergence in very high-dimensional problems.
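The bias described above can be seen in the single-coefficient solutions that coordinate descent applies at each update. The sketch below assumes a standardized feature, so that `z` is the unpenalized univariate estimate, and uses illustrative function names and default values of `gamma` that are assumptions rather than anything specified here. The lasso shrinks every nonzero estimate by `lam`, whereas MCP and SCAD return large values of `z` unchanged.

```python
def soft_threshold(z, lam):
    """Lasso solution for one standardized coefficient."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def mcp_threshold(z, lam, gamma=3.0):
    """MCP solution for one standardized coefficient (requires gamma > 1)."""
    if gamma <= 1:
        raise ValueError("gamma must exceed 1 for MCP")
    if abs(z) <= gamma * lam:
        # Rescaled soft-thresholding: less shrinkage than the lasso.
        return soft_threshold(z, lam) / (1.0 - 1.0 / gamma)
    return z  # no shrinkage at all for large coefficients


def scad_threshold(z, lam, gamma=3.7):
    """SCAD solution for one standardized coefficient (requires gamma > 2)."""
    if gamma <= 2:
        raise ValueError("gamma must exceed 2 for SCAD")
    if abs(z) <= 2.0 * lam:
        return soft_threshold(z, lam)  # same as the lasso near zero
    if abs(z) <= gamma * lam:
        # Transition region between lasso-like shrinkage and no shrinkage.
        return soft_threshold(z, gamma * lam / (gamma - 1.0)) / (1.0 - 1.0 / (gamma - 1.0))
    return z


# A large unpenalized estimate: the lasso shrinks it, MCP and SCAD do not.
large = [f(5.0, 1.0) for f in (soft_threshold, mcp_threshold, scad_threshold)]
```

All three rules set small values of `z` to exactly zero, which is what makes them variable selection methods; they differ in how quickly the shrinkage is relaxed as `|z|` grows.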

Coordinate descent algorithms for nonconvex penalized regression, with applications to biological feature selection

Breheny P and Huang J

Annals of Applied Statistics, 5: 232–253. Journal PDF R

The Mnet method for variable selection

Huang J, Breheny P, Lee S, Ma S and Zhang C

Statistica Sinica, 26: 903–923. Journal PDF R

Strong rules for nonconvex penalties and their implications for efficient algorithms in high-dimensional regression

Lee S and Breheny P

Journal of Computational and Graphical Statistics, 24: 1074–1091. Journal PDF R

Grouped (hierarchical) variable selection

In regression modeling, explanatory variables can often be thought of as grouped. Taking this grouping information into account in the modeling process should improve both the interpretability and the accuracy of the model. These gains are likely to be particularly important in high-dimensional settings where sparsity and variable selection play important roles in estimation accuracy. Of the papers below, the Statistical Science paper reviews this subject, the Statistics and Computing paper extends the ideas of nonconvex penalization to grouped variable selection and proposes efficient algorithms to fit these models, and the Cancer Informatics paper extends group selection to the problem of overlapping groups.
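As a small illustration of what selecting variables in groups means, the sketch below shows the group-wise soft-thresholding step used for the group lasso when the features within a group have been orthonormalized. This is a simplified setting assumed for illustration, and the function name `group_soft_threshold` is hypothetical. The whole vector of coefficients for a group is either set to zero or shrunk toward zero together.

```python
import math


def group_soft_threshold(z, lam):
    """Shrink a group's coefficient vector z toward zero by lam in Euclidean norm.

    Assumes the group's features are orthonormalized, so z is the group's
    unpenalized solution. Returns all zeros when the group's norm is at most lam.
    """
    norm = math.sqrt(sum(v * v for v in z))
    if norm <= lam:
        return [0.0] * len(z)  # the entire group is dropped from the model
    scale = 1.0 - lam / norm
    return [scale * v for v in z]  # every member is kept, shrunk proportionally


kept = group_soft_threshold([3.0, 4.0], lam=1.0)     # norm 5, scaled by 0.8
dropped = group_soft_threshold([0.3, 0.4], lam=1.0)  # norm 0.5, below lam
```

Because the decision depends only on the norm of the group, a group enters or leaves the model as a unit, which is the source of the interpretability gain described above.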

Overlapping group logistic regression with applications to genetic pathway selection

Zeng Y and Breheny P

Cancer Informatics, 15: 179–187. Journal PDF R

Group descent algorithms for nonconvex penalized linear and logistic regression models with grouped predictors

Breheny P and Huang J

Statistics and Computing, 25: 173–187. Journal PDF R

A selective review of group selection in high-dimensional models

Huang J, Breheny P and Ma S

Statistical Science, 27: 481–499. Journal PDF

Bi-level variable selection

Most of the methods developed for grouped variable selection produce estimates that are sparse at the group level but not at the level of individual variables. This is not always appropriate for the data. In many applications (e.g., genetic association studies), the goal is to identify important individual markers while increasing the power of the search by incorporating grouping information. Of the papers below, the Statistics and Its Interface paper introduces this topic; the Biometrics paper identifies some shortcomings of the method proposed there and proposes a new method, the group exponential lasso, with many advantages over it.

The group exponential lasso for bi-level variable selection

Breheny P.

Biometrics, 71: 731–740. Journal PDF R

Penalized methods for bi-level variable selection

Breheny P and Huang J

Statistics and Its Interface, 2: 369–380. PDF R

Visualization of regression models

The importance of visualizing data is widely recognized. Visualizing models, and the estimates and predictions derived from them, is just as important, yet tools for doing so easily are less well developed. The paper below describes our development of software for visualizing a wide class of regression models fit in R.

Visualization of regression models using visreg

Breheny P and Burchett W

R Journal, 9: 56–71. Journal PDF R Website

Survival analysis

Much of my work in survival analysis is applied, but the following papers are primarily methodological, or contain significant methodological innovation:

Peptide receptor radionuclide therapy improves survival in patients who progress after resection of gastroenteropancreatic neuroendocrine tumors

Borbon LC, Sherman SK, Breheny PJ, Chandrasekharan C, Menda Y, Bushnell D, Bellizzi AM, Ear PH, O’Dorisio MS, O’Dorisio TM, Dillon JS and Howe JR

Annals of Surgical Oncology, 32: 1136–1148. Journal

Cross-validation approaches for penalized Cox regression

Dai B and Breheny P

Statistical Methods in Medical Research, 33: 702–715. Journal

Genetics and genomics

I have been particularly motivated by genetic association studies, copy-number variation, and gene expression studies. Many of my papers in this area are applied, but the following are primarily methodological, or contain significant methodological innovation:

Penalized multimarker vs. single-marker regression methods for genome-wide association studies of quantitative traits

Yi H, Breheny P, Imam N, Liu Y and Hoeschele I

Genetics, 199: 205–222. Journal PDF

Featured article. Link

In Vivo identification of eugenol-responsive and muscone-responsive mouse odorant receptors

McClintock TS, Adipietro K, Titlow WB, Breheny P, Walz A, Mombaerts P and Matsunami H

Journal of Neuroscience, 34: 15669–15678. Journal PDF

Featured article. Link

Kernel-based aggregation of marker-level genetic association tests involving copy-number variation

Li Y and Breheny P

Microarrays, 2: 265–283. Journal PDF

Statistical challenges and opportunities in copy number variant association studies

Breheny P, Li Y and Charnigo R

Journal of Biometrics and Biostatistics, 3: e118. Journal PDF

Genetic association studies of copy-number variation: should assignment of copy number states precede testing?

Breheny P, Chalise P, Batzler A, Wang L and Fridley BL

PLoS ONE, 7: e34262. Journal PDF

Genomics of mature and immature olfactory sensory neurons

Nickell MD, Breheny P, Stromberg AJ and McClintock TS

Journal of Comparative Neurology, 520: 2608–2629. Journal PDF
