WNAR Invited Sessions

Synopses of WNAR Invited Sessions

Missing data in regression: beyond existing modelling assumptions

Missing data are very common in applied data analysis and biostatistical applications. Most existing methods for regression analysis with missing data require parametric modeling assumptions and hold only under the restrictive assumption of missingness at random. However, recent methodological developments have yielded novel techniques for handling missing data in regression models that are valid under weaker assumptions than those typically required.
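
As a rough illustration of the ideas at play, the sketch below contrasts a naive complete-case estimate with an inverse probability weighted one when outcomes are missing at random. The simulation setup, variable names, and weighting scheme are illustrative assumptions, not any particular method from the session.

```python
# Minimal sketch of inverse probability weighting (IPW) for a mean with
# outcomes missing at random (MAR) given a fully observed covariate X.
# The entire data-generating process here is an illustrative assumption.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

x = rng.normal(size=n)                  # fully observed covariate
y = 1.0 + 2.0 * x + rng.normal(size=n)  # outcome; true mean is 1.0

# MAR: the chance that Y is observed depends only on the observed X.
p_obs = 1.0 / (1.0 + np.exp(-(0.5 + 1.5 * x)))
observed = rng.uniform(size=n) < p_obs

# The complete-case mean is biased because observation depends on X,
# and X is correlated with Y.
cc_mean = y[observed].mean()

# IPW: weight each observed outcome by the inverse of its (here, known)
# observation probability. In practice p_obs would itself be estimated,
# e.g. by logistic regression of `observed` on x.
ipw_mean = np.sum(observed * y / p_obs) / np.sum(observed / p_obs)

print(f"true mean 1.00 | complete-case {cc_mean:.2f} | IPW {ipw_mean:.2f}")
```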

Statistical methods for neuroimaging data

Technological advances, including novel magnetic resonance-based techniques and computerized clinical diagnostics, provide complex functional and imaging data that promise a better understanding of psychiatric and neurological disorders and improved clinical care. Ongoing work by statisticians has centered on modeling these high-dimensional functional data to study the etiology and development of such disorders.

Statistical methods to improve drug and vaccine safety surveillance using big healthcare data

The use of large, distributed healthcare databases to conduct comparative safety and effectiveness research has increased dramatically in recent years. But bigger data does not necessarily yield better evidence. Many unique statistical challenges arise when using electronic health records (EHRs) and health insurance claims that are collected by multiple health plans for purposes other than research. New methods tailored to improve safety signal detection in the context of large, multi-site database networks for drug and vaccine safety surveillance are being developed and utilized by stakeholders in academia, government and health care organizations.

Statistics and human rights

Statistics has a long history in the field of human rights, on the side of good and the side of evil alike. Federal data systems have been used to locate citizens in order to commit genocide. Today, statisticians are developing evidence used in war crimes trials against perpetrators. Statisticians are also working on the problem of wrongful convictions, some of which have been caused by flawed forensic work. Researchers are working to develop a solid foundation for forensic techniques, both to help solve current crimes and to reduce the risk of sending more innocent people to prison.

Novel statistical methods for determination of patient classification in personalized medicine: illustrations in cystic fibrosis

Cystic Fibrosis (CF) is a genetic disease that dramatically decreases life expectancy and quality of life. Despite the single disease-causing gene in CF, there remains large unexplained variability in disease progression and patient outcomes. Novel methods for identifying classes of patients to better understand these differences have been developed and are allowing practitioners to move toward a more personalized approach to treatment.
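
As one hypothetical illustration of patient classification, the sketch below fits a Gaussian mixture model to simulated lung-function summaries to recover two latent patient classes; the two-class structure and all numbers are assumptions for illustration only.

```python
# Hypothetical sketch of identifying latent patient classes with a
# Gaussian mixture model, one simple instance of patient-classification
# methods. The classes and lung-function-style variables are simulated.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(7)

# Simulate two latent classes of patients, rapid vs. slow decliners,
# each described by (baseline lung function, annual rate of decline).
rapid = rng.normal([80.0, -3.0], [8.0, 0.8], size=(150, 2))
slow = rng.normal([95.0, -0.5], [8.0, 0.8], size=(350, 2))
X = np.vstack([rapid, slow])

gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
labels = gmm.predict(X)

# Class-specific summaries could then guide more personalized treatment.
for k in range(2):
    base, decline = gmm.means_[k]
    print(f"class {k}: n={np.sum(labels == k)}, "
          f"baseline {base:.0f}, decline {decline:.1f}/yr")
```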

Recent methods development for cancer screening

Cancer causes significant mortality and morbidity worldwide. How to guide screening for individuals at risk of developing cancer is a critical public health concern. While screening can reduce mortality through early detection of cancer cases, it is costly and resource-intensive, with potential harms including overdiagnosis and radiation exposure. To maximize the benefit of early detection through screening while minimizing its harms, it is important to identify those who are at high risk of developing cancer. Recently, new methods have emerged for evaluating various cancer screening strategies, including a risk-stratification approach using common genetic variants identified by genome-wide association studies (GWAS), decision models for estimating population-level outcomes of screening, and algorithms for simulating population-level risk factor data to aid decision analyses for cancer screening.
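
To make the risk-stratification idea concrete, here is a minimal sketch that builds a polygenic risk score from simulated genotypes and compares disease risk across score quintiles; all effect sizes, allele frequencies, and the risk model are simulated assumptions, not real GWAS results.

```python
# Illustrative sketch of risk stratification with a polygenic risk score
# (PRS) built from GWAS-style variants. Everything here is simulated.
import numpy as np

rng = np.random.default_rng(1)
n_people, n_snps = 50_000, 100

freq = rng.uniform(0.05, 0.5, size=n_snps)   # risk-allele frequencies
beta = rng.normal(0.0, 0.1, size=n_snps)     # per-allele log odds ratios
genotypes = rng.binomial(2, freq, size=(n_people, n_snps))

prs = genotypes @ beta                       # polygenic risk score

# Simulate disease status from a logistic model with the PRS as the
# genetic component (baseline log odds chosen for roughly 5% prevalence).
logit = -3.0 + (prs - prs.mean())
disease = rng.uniform(size=n_people) < 1 / (1 + np.exp(-logit))

# Stratify the population into PRS quintiles and compare observed risk;
# screening could then be targeted at the top strata.
quintile = np.digitize(prs, np.quantile(prs, [0.2, 0.4, 0.6, 0.8]))
for q in range(5):
    risk = disease[quintile == q].mean()
    print(f"PRS quintile {q + 1}: observed risk {risk:.3f}")
```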

Statistical analysis of wearable sensor data

The past five years have seen an explosion in the number of devices that can be used to capture and quantify human movement, and in the number of studies that use these devices to study complex patterns of activity. Wearable devices from companies like Fitbit and Jawbone, along with most current smartphones, contain accelerometers, gyroscopes, and sometimes GPS receivers that generate high-frequency time series data, with possibly hundreds of observations per second. Researchers are just beginning to explore how these data can be used to improve human health, with applications ranging from using subtle changes in fine motor movements to predict disease, to understanding how large-scale patterns of human behavior are related to health outcomes. The statistical techniques used to analyze wearable sensor data draw inspiration from a number of different fields, including engineering and finance (time series), applied mathematics (functional data analysis), and computer science (machine learning).
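
As a small illustration of what such data look like in practice, the sketch below collapses simulated raw tri-axial accelerometer readings into epoch-level activity summaries; the sampling rate, epoch length, and signal model are assumptions chosen for illustration.

```python
# Minimal sketch of summarizing raw tri-axial accelerometer data into an
# activity measure. The sampling rate, epoch length, and fake signal are
# illustrative assumptions; real devices and study protocols vary.
import numpy as np

rng = np.random.default_rng(2)
fs = 100                      # samples per second (100 Hz is common)
minutes = 5
n = fs * 60 * minutes

t = np.arange(n) / fs
# Fake signal: a 2 Hz walking oscillation on x, gravity on z, sensor noise.
acc = np.column_stack([
    0.3 * np.sin(2 * np.pi * 2.0 * t),   # x
    np.zeros(n),                         # y
    np.ones(n),                          # z (gravity, in g units)
]) + rng.normal(0, 0.02, size=(n, 3))

# Euclidean norm minus one (ENMO): vector magnitude with gravity removed,
# a widely used open summary of movement intensity.
enmo = np.maximum(np.linalg.norm(acc, axis=1) - 1.0, 0.0)

# Collapse the high-frequency series into per-epoch (here, 5-second) means,
# the kind of lower-resolution series functional data methods operate on.
epoch = fs * 5
epoch_means = enmo[: n // epoch * epoch].reshape(-1, epoch).mean(axis=1)
print(f"{n} raw samples -> {epoch_means.size} five-second epoch summaries")
```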

A continuous approach to interpreting forensic DNA profiles

DNA profiling has now been used in forensic science for 25 years, and it has been hugely beneficial. Early doubts about appropriate statistical methods for attaching numerical weight to matching profiles have largely been resolved, but a substantially new approach seems likely to be adopted by forensic agencies worldwide. At present, profile interpretation begins with a determination of which alleles at a series of genetic markers are present, and these alleles are subsequently compared with those of known or suspected contributors to the profile. Allelic determination is made in the first place by software provided by the typing equipment manufacturer and then confirmed or modified by the forensic analyst. Many difficulties can arise, especially for profiles with multiple contributors and/or for profiles from limited or degraded DNA. The typing signal for an allele may not reach a calling threshold, in which case the allele is said to have “dropped out”; the signal for an allele may be accompanied by a “stutter” that can be confounded with a second allele; or extraneous alleles may have “dropped in,” and so forth. There is also a danger that declaring drop-out or drop-in may depend on what the analyst expects to see. There is the risk that failing to call an allele would be prejudicial to a suspect who would be excluded if that allele were called, and the converse risk to the prosecution case of failing to include alleles that would not exclude a suspect.

There have been recent efforts to change the binary present/absent determination for alleles to a continuous model that uses all the peaks in the typing electropherogram to indicate the amount of DNA present, regardless of thresholds or stutter positions. Algorithms for obtaining probabilities for the presence of alternative sets of alleles in a profile have been developed. The calculations are moving beyond those that can be done manually by an analyst, but they remove the possibility of favoring either prosecution or defense. At least in the short term, there will be a new demand on statisticians to interpret these continuous-approach results for judges and juries.
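
The flavor of these calculations can be conveyed with a deliberately simplified, single-marker likelihood ratio under a semi-continuous drop-out model. Real continuous software additionally models peak heights, mixtures, stutter, and drop-in; all numbers below are assumptions for illustration only.

```python
# Highly simplified semi-continuous sketch of a likelihood ratio (LR) at a
# single marker with allelic drop-out. The drop-out probability, allele
# frequency, and scenario are illustrative assumptions; drop-in, stutter,
# and peak heights are deliberately ignored here.

# Evidence: only allele "a" is detected at the marker.
# Suspect genotype: (a, b).
d = 0.2          # assumed probability that one allele copy drops out
freq_a = 0.10    # assumed population frequency of allele a

# Hp: the suspect (a, b) is the contributor. Seeing exactly {a} means
# allele a was detected and allele b dropped out.
p_evidence_hp = (1 - d) * d

# Hd: an unknown contributor. Sum over unknown genotypes consistent with
# the evidence:
#   (a, a): homozygote; allele a missing only if both copies drop (d**2)
#   (a, x) for x != a: a detected, x dropped
p_aa = freq_a ** 2
p_ax = 2 * freq_a * (1 - freq_a)
p_evidence_hd = p_aa * (1 - d ** 2) + p_ax * (1 - d) * d

lr = p_evidence_hp / p_evidence_hd
print(f"LR in favor of the prosecution hypothesis: {lr:.1f}")
```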

Statistical innovation for network analysis

Network analysis has become an increasingly useful tool for fields ranging from biology to the social sciences. It is closely related to other statistical approaches, including covariance matrix estimation and graphical models. Recently, the ever-growing dimensionality and complexity of data arising in practice have posed unique challenges for traditional network methods and require statistical innovation.
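
As one concrete instance of the connection to covariance estimation, the sketch below recovers a sparse network from simulated data with the graphical lasso; the chain-graph truth and the penalty level are illustrative assumptions, and scikit-learn's GraphicalLasso is just one readily available implementation.

```python
# Sketch of estimating a sparse network (Gaussian graphical model) via the
# graphical lasso. The true chain graph and penalty level are assumptions.
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(3)
p, n = 10, 500

# Build a sparse precision matrix: a chain graph 0-1-2-...-9.
prec = np.eye(p)
for i in range(p - 1):
    prec[i, i + 1] = prec[i + 1, i] = 0.4
prec += np.eye(p) * 0.5          # keep it positive definite

cov = np.linalg.inv(prec)
X = rng.multivariate_normal(np.zeros(p), cov, size=n)

# The l1 penalty (alpha) drives small partial correlations to exactly
# zero, so nonzero off-diagonal entries of the estimated precision matrix
# define the edges of the recovered network.
model = GraphicalLasso(alpha=0.05).fit(X)
edges = np.argwhere(np.triu(model.precision_, k=1) != 0)
print(f"recovered {len(edges)} edges; true chain graph has {p - 1}")
```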

Advances in methodology for causal inference

For many scientific questions, it is of interest to establish a causal link between potential causative agents and an outcome of interest rather than a mere association. Doing so, however, can be difficult, especially in the context of observational data. Causal inference is the branch of statistics that studies conditions under which collected data may be used to assess causal links, and statistical methods for doing so. Problems in causal inference can be especially challenging because of the need to carefully account for potential confounders. Since there may be many confounding variables, novel techniques are needed to tackle this problem in a flexible and efficient manner.
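
A minimal sketch of confounding adjustment, assuming a single measured confounder and inverse probability of treatment weighting (one standard technique, not necessarily those featured in the session):

```python
# Minimal sketch of adjusting for measured confounding with inverse
# probability of treatment weighting (IPW). The data-generating process
# and variable names are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(4)
n = 50_000

confounder = rng.normal(size=n)
# Treatment assignment depends on the confounder (observational data).
p_treat = 1 / (1 + np.exp(-1.5 * confounder))
treated = rng.uniform(size=n) < p_treat
# Outcome: true treatment effect is 1.0; the confounder also raises it.
outcome = 1.0 * treated + 2.0 * confounder + rng.normal(size=n)

# The naive comparison is biased because treated subjects have higher
# confounder values, hence higher outcomes regardless of treatment.
naive = outcome[treated].mean() - outcome[~treated].mean()

# Estimate propensity scores, then weight to mimic a randomized trial.
ps = LogisticRegression().fit(confounder.reshape(-1, 1), treated)
e = ps.predict_proba(confounder.reshape(-1, 1))[:, 1]
w = np.where(treated, 1 / e, 1 / (1 - e))
ate = (np.average(outcome[treated], weights=w[treated])
       - np.average(outcome[~treated], weights=w[~treated]))

print(f"true effect 1.00 | naive {naive:.2f} | IPW {ate:.2f}")
```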

Advances in nonparametric and semiparametric inference

Despite their appealing simplicity, parametric models generally do not accurately capture the probabilistic mechanism of interest. This is problematic because model misspecification can have very serious effects on the validity and interpretability of the resulting statistical inference. As an alternative, semiparametric and nonparametric models are often preferable because they are much more flexible and thus reduce the risk of model misspecification. Nevertheless, inference in these models can be complicated. Important research to develop novel, flexible techniques for efficient estimation in very large statistical models is currently ongoing. Just as importantly, significant energy is being invested in rendering these techniques accessible to practitioners.
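
The consequences of misspecification are easy to see in a toy example: below, a misspecified linear model is contrasted with a simple kernel smoother on data whose true regression function is quadratic. The setup is an assumption chosen purely for illustration.

```python
# Small illustration of parametric misspecification versus a simple
# nonparametric alternative. The quadratic truth, the linear working
# model, and the kernel smoother are all illustrative assumptions.
import numpy as np

rng = np.random.default_rng(5)
n = 2_000
x = rng.uniform(-2, 2, size=n)
y = x ** 2 + rng.normal(0, 0.3, size=n)   # truth is quadratic in x

# Misspecified parametric model: E[Y|X] = a + b*x (a straight line).
b, a = np.polyfit(x, y, 1)

# Nonparametric alternative: Nadaraya-Watson kernel regression.
def kernel_fit(x0, h=0.2):
    w = np.exp(-0.5 * ((x - x0) / h) ** 2)
    return np.sum(w * y) / np.sum(w)

# The line predicts roughly the same value everywhere; the smoother
# tracks the true curve at each point.
for x0 in (-1.5, 0.0, 1.5):
    print(f"x={x0:+.1f}  truth {x0 ** 2:.2f}  "
          f"linear {a + b * x0:+.2f}  kernel {kernel_fit(x0):.2f}")
```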

Statistical inference in complex sampling designs

In designing scientific studies, scientific considerations are often at odds with logistic or financial ones. For the sake of pragmatism, investigators often conduct studies whose design ensures that sufficient valuable information can be gathered even within tight resource limitations. However, performing unbiased and efficient inference using data from these complex sampling designs can be challenging. In this session, the analysis of data from complex study designs will be discussed. The emphasis will be on two-phase sampling designs and their variants; in these studies, information about a marker that is difficult or expensive to measure is obtained only in a subsample of study participants, and the selection of this subsample can depend on outcomes observed over the course of the study. These designs are extremely common in biomedical research, and appropriate techniques for statistical inference remain an important and very active area of research.
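
A small sketch of the design-weighting idea, assuming outcome-dependent phase-two selection with known inclusion probabilities (all numbers illustrative):

```python
# Sketch of design-weighted (Horvitz-Thompson style) estimation for a
# two-phase design: an expensive marker is measured only in a phase-two
# subsample whose selection depends on the observed outcome. The
# probabilities and data-generating process are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(6)
n = 100_000

marker = rng.normal(size=n)   # expensive to measure on everyone
case = rng.uniform(size=n) < 1 / (1 + np.exp(-(-2 + marker)))  # outcome

# Phase two: measure the marker on all cases but only 10% of controls,
# a common outcome-dependent (case-control-like) subsampling scheme.
p_select = np.where(case, 1.0, 0.10)
sampled = rng.uniform(size=n) < p_select

# The unweighted subsample mean overstates the marker because cases,
# who tend to have higher marker values, are over-represented.
naive = marker[sampled].mean()

# Weighting each sampled subject by 1 / (inclusion probability) recovers
# an approximately unbiased estimate for the full cohort.
ht = np.average(marker[sampled], weights=1 / p_select[sampled])
print(f"cohort {marker.mean():+.3f} | naive {naive:+.3f} | weighted {ht:+.3f}")
```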