# Thinking About and Approaching Statistics

## Waylon Howard, PhD

### Slides available at <http://tinyurl.com/pharmerit-slides>

### PDF slides at <http://tinyurl.com/pharmerit-pdf>

???

Welcome to our session! My name is Waylon Howard. Before we begin, please note that links to both the HTML and PDF versions of my slides are available here. I'd like to start with an introduction.

---

.pull-left[
 <img class="circle" src="images/avatar-icon.png" width="200px"/>
 
 Global Director of Biostatistics 
 and Data Analytics 
 
 <img src="images/ConcertAI-logo.png" width="250px"/>

[<svg style="height:0.8em;top:.04em;position:relative;" viewBox="0 0 448 512"><path d="M416 32H31.9C14.3 32 0 46.5 0 64.3v383.4C0 465.5 14.3 480 31.9 480H416c17.6 0 32-14.5 32-32.3V64.3c0-17.8-14.4-32.3-32-32.3zM135.4 416H69V202.2h66.5V416zm-33.2-243c-21.3 0-38.5-17.3-38.5-38.5S80.9 96 102.2 96c21.2 0 38.5 17.3 38.5 38.5 0 21.3-17.2 38.5-38.5 38.5zm282.1 243h-66.4V312c0-24.8-.5-56.7-34.5-56.7-34.6 0-39.9 27-39.9 54.9V416h-66.4V202.2h63.7v29.2h.9c8.9-16.8 30.6-34.5 62.9-34.5 67.2 0 79.7 44.3 79.7 101.9V416z"/></svg> @waylon-howard](https://www.linkedin.com/in/waylon-howard/) 
 [<svg style="height:0.8em;top:.04em;position:relative;" viewBox="0 0 496 512"><path d="M165.9 397.4c0 2-2.3 3.6-5.2 3.6-3.3.3-5.6-1.3-5.6-3.6 0-2 2.3-3.6 5.2-3.6 3-.3 5.6 1.3 5.6 3.6zm-31.1-4.5c-.7 2 1.3 4.3 4.3 4.9 2.6 1 5.6 0 6.2-2s-1.3-4.3-4.3-5.2c-2.6-.7-5.5.3-6.2 2.3zm44.2-1.7c-2.9.7-4.9 2.6-4.6 4.9.3 2 2.9 3.3 5.9 2.6 2.9-.7 4.9-2.6 4.6-4.6-.3-1.9-3-3.2-5.9-2.9zM244.8 8C106.1 8 0 113.3 0 252c0 110.9 69.8 205.8 169.5 239.2 12.8 2.3 17.3-5.6 17.3-12.1 0-6.2-.3-40.4-.3-61.4 0 0-70 15-84.7-29.8 0 0-11.4-29.1-27.8-36.6 0 0-22.9-15.7 1.6-15.4 0 0 24.9 2 38.6 25.8 21.9 38.6 58.6 27.5 72.9 20.9 2.3-16 8.8-27.1 16-33.7-55.9-6.2-112.3-14.3-112.3-110.5 0-27.5 7.6-41.3 23.6-58.9-2.6-6.5-11.1-33.3 2.6-67.9 20.9-6.5 69 27 69 27 20-5.6 41.5-8.5 62.8-8.5s42.8 2.9 62.8 8.5c0 0 48.1-33.6 69-27 13.7 34.7 5.2 61.4 2.6 67.9 16 17.7 25.8 31.5 25.8 58.9 0 96.5-58.9 104.2-114.8 110.5 9.2 7.9 17 22.9 17 46.4 0 33.7-.3 75.4-.3 83.6 0 6.5 4.6 14.4 17.3 12.1C428.2 457.8 496 362.9 496 252 496 113.3 383.5 8 244.8 8zM97.2 352.9c-1.3 1-1 3.3.7 5.2 1.6 1.6 3.9 2.3 5.2 1 1.3-1 1-3.3-.7-5.2-1.6-1.6-3.9-2.3-5.2-1zm-10.8-8.1c-.7 1.3.3 2.9 2.3 3.9 1.6 1 3.6.7 4.3-.7.7-1.3-.3-2.9-2.3-3.9-2-.6-3.6-.3-4.3.7zm32.4 35.6c-1.6 1.3-1 4.3 1.3 6.2 2.3 2.3 5.2 2.6 6.5 1 1.3-1.3.7-4.3-1.3-6.2-2.2-2.3-5.2-2.6-6.5-1zm-11.4-14.7c-1.6 1-1.6 3.6 0 5.9 1.6 2.3 4.3 3.3 5.6 2.3 1.6-1.3 1.6-3.9 0-6.2-1.4-2.3-4-3.3-5.6-2z"/></svg> @wwwaylon](https://github.com/wwwaylon) 
 [<svg style="height:0.8em;top:.04em;position:relative;" viewBox="0 0 512 512"><path d="M326.612 185.391c59.747 59.809 58.927 155.698.36 214.59-.11.12-.24.25-.36.37l-67.2 67.2c-59.27 59.27-155.699 59.262-214.96 0-59.27-59.26-59.27-155.7 0-214.96l37.106-37.106c9.84-9.84 26.786-3.3 27.294 10.606.648 17.722 3.826 35.527 9.69 52.721 1.986 5.822.567 12.262-3.783 16.612l-13.087 13.087c-28.026 28.026-28.905 73.66-1.155 101.96 28.024 28.579 74.086 28.749 102.325.51l67.2-67.19c28.191-28.191 28.073-73.757 0-101.83-3.701-3.694-7.429-6.564-10.341-8.569a16.037 16.037 0 0 1-6.947-12.606c-.396-10.567 3.348-21.456 11.698-29.806l21.054-21.055c5.521-5.521 14.182-6.199 20.584-1.731a152.482 152.482 0 0 1 20.522 17.197zM467.547 44.449c-59.261-59.262-155.69-59.27-214.96 0l-67.2 67.2c-.12.12-.25.25-.36.37-58.566 58.892-59.387 154.781.36 214.59a152.454 152.454 0 0 0 20.521 17.196c6.402 4.468 15.064 3.789 20.584-1.731l21.054-21.055c8.35-8.35 12.094-19.239 11.698-29.806a16.037 16.037 0 0 0-6.947-12.606c-2.912-2.005-6.64-4.875-10.341-8.569-28.073-28.073-28.191-73.639 0-101.83l67.2-67.19c28.239-28.239 74.3-28.069 102.325.51 27.75 28.3 26.872 73.934-1.155 101.96l-13.087 13.087c-4.35 4.35-5.769 10.79-3.783 16.612 5.864 17.194 9.042 34.999 9.69 52.721.509 13.906 17.454 20.446 27.294 10.606l37.106-37.106c59.271-59.259 59.271-155.699.001-214.959z"/></svg> wwwaylon.github.io](https://wwwaylon.github.io/) 

]

.pull-right[
<medium>Education</medium>
 
 Ph.D., Quantitative Psychology
 University of Kansas (2012); Advisor: [Todd D. Little](https://scholar.google.com/citations?user=T-dKKGkAAAAJ&hl=en)

<medium>Currently</medium>
 
Lead a [10-person](https://www.concertai.com/) team 
Oncology focused RWE analytics 
- [Clinical: EMR curation](https://www.concertai.com/) 
- [Humanistic: PRO data]() 
- [Economic: claims]()

[SymphonyAI](https://www.prweb.com/releases/2018/06/prweb15527516.htm) acquisition 
- [More than 60%](https://www.concertai.com/) growth 
- Developed [15+ SOP, WI, GD]()

]

???
Originally trained as a Quantitative Psychologist, I have studied applied statistics and research methodology for over 10 years and have gone on to develop advanced expertise in the application and advancement of quantitative measurement and analysis practices within the social, behavioral, and health sciences. My research interests include modeling individual, group, and developmental differences, general structural equation modeling techniques, construct validation, measurement, and missing data analysis.

Currently, I lead biostatistics for ConcertAI. I'm based in Seattle, WA, and have a staff of 10 MS-level analysts across the US and in India.

Our mission is to provide ConcertAI with responsive, efficient, and high quality analytical support. We assume a leadership role as we collaborate with investigators and clients across all phases of their research.


---

- [Research and Leadership](#my-work) 
 
- [Motivating example](#mot-exa) 
 
- [Modern modeling highlights](#high) 
 
- [Summary](#value)

???
The first part of the presentation gives a general overview of my experience and interests in quantitative statistical analysis, including a recent project involving challenging or innovative data. The second part covers how I can add value as part of PCO and RWE teams, based on the experience highlighted in the first part.
---

<div class="my-footer">Slides at http://tinyurl.com/pharmerit-slides &emsp; &emsp; &emsp; &emsp; &emsp; &emsp; &emsp; &emsp; &emsp; Return to <a style="color:White;" href="slide_deck.html#toc">Table of Contents</a></div>

---

name:my-work

## Interdisciplinary-oriented collaborator

<img src="images/papers.png" style="width: 650px;"/>
 
--

***
 
My work has appeared in [**22**](https://scholar.google.com/citations?user=wUACzXkAAAAJ&hl=en) different peer-reviewed journals, garnering [**518**](https://scholar.google.com/citations?user=wUACzXkAAAAJ&hl=en) citations, with an h-index of [**13**](https://scholar.google.com/citations?user=wUACzXkAAAAJ&hl=en) and an i10-index of [**14**](https://scholar.google.com/citations?user=wUACzXkAAAAJ&hl=en).

???
A researcher's vision is often constrained by how they think about and use data.
It is tough to watch researchers develop intricate theories about how the world works, which represent a lot of deep thinking about a topic, only to cut them up into smaller chunks that are then crammed into canned statistical procedures that were never designed to address the original question to begin with.
Methodological advances allow us to ask more sophisticated questions. But how do researchers stay current with advances in methodology and data analysis?

I love to dig into all phases of research projects - from planning to publication. I work to identify and outline investment to build and maintain research capacity by providing responsive, efficient, and high-quality analytical support.
My work includes multivariate approaches to measurement and analysis of substantive problems, where I have led simulation-based research and contributed to recommendations for applied researchers. In addition, I have conducted substantive research using structural equation modeling techniques as a general data analytic approach to studying individual, developmental, and socio-contextual differences within the social, behavioral, and health sciences. I also have extensive grant experience in study design, statistical analysis plans, and power analyses.

I have enjoyed many leadership positions throughout my career and believe that personal and professional relationships form the backbone of all sustainability and growth, and that a reflective focus on continuous improvement data informs good decision making.
I look forward to discussing

---

name:my-work

## Grants and contracts

I have directly collaborated with researchers to attain [**more than $5 million**](https://orcid.org/0000-0002-0355-2244) in new research funding and developed considerable experience applying diverse best-practice methods to complex problems in new areas.

***

## Training

Organized, managed, and taught an 8-course advanced methodology workshop for faculty and graduate students (N = 60) and a 6-course basic methodology training series for residents (N = 150).

<!--
These slides were built using

```r
R.version.string
```

```
[1] "R version 4.0.2 (2020-06-22)"
```

```r
rstudioapi::versionInfo()$version
```

```
[1] '1.3.1056'
```
-->

???
I have created and taught at training conferences.

---

## Mentoring

Helped coalesce a new vision for team leadership within the organization, resulting in new funding, partnerships, and additional value offered by the team.

***
<img src="images/scholarly.png" style="width: 725px;"/>

???
I have enjoyed many leadership positions throughout my career and believe that personal and professional relationships form the backbone of all sustainability and growth, and that a reflective focus on continuous improvement data informs good decision making. My mission as a leader is to support staff and systems to achieve sustainable growth in services, funding, staff/faculty satisfaction, and strategic partnerships. To accomplish this, I focus on a number of critical leadership effectiveness efforts including: strategic planning, productivity and quality, funding and partnerships, reflection, and mentoring.

---
class: middle, center

## Is this change *significant*?

## Different question:

## Is this change *sustainable*?

---

### Remaining sustainable

- Services billed increased by [38.6%]()    
- Team averaged [78.3%]() billable time    
- Compound annual growth rate [+9.2%]()

???

---

name:mot-exa

# Motivating example

???

---

<img src="images/napkin.jpg" style="width: 720px"/>
 
???
 Several years ago I worked as a quantitative methodologist for a large research center focused on how children develop and learn. One of the main ideas of this group was to translate social and developmental psychology theory into effective interventions to enhance social and academic outcomes.
 
 My primary area of research was the development and application of novel statistical methods to better translate the kind of benefits that we can get from a conceptual simulation study into real-world settings, where the application is often not so good. We had to solve all kinds of methodological problems and technical limitations (e.g., missing data; see Howard, Rhemtulla & Little, 2015) in a research space where over-simplified data analytic practices have persisted for decades. I found the application of advanced statistical methods, particularly within the structural equation modeling framework, really interesting in this context and very challenging.
 
 One of our projects focused on progress monitoring of a new composite communication score to assess early language performance, quantify rates of development, and determine how individuals respond to intervention. What struck me was the enormous gap between the proposed statistical methods and the research questions.
 
 As a statistical consultant, I worked closely with the research team to focus on the theory. This is a path diagram drawn by the primary investigator from one of those meetings that demonstrates a deep theoretical vision for language development.
 
 I often get diagrams like this and I love to see them. What I want you to notice is that there is a lot going on here: we have multiple processes interacting in some really interesting ways. In this diagram you see the forest rather than the trees. That is, we are not focusing on one regression path or mean comparison here; rather, we are looking into a complex system in which every effect is estimated in the context of all the other pieces of the model.
 
---

## First motivating example for today

![](slide_deck_files/figure-html/eci-1.gif)

???

This is a plot of some data collected for this project.

Notice that each line represents a different form of communication: the flat line is gesturing, the line above that is vocalizations, and we also have single words and then multiple words. Look at how vocalizations seem to peak around 18 months and then decline. Around that same peak, notice how the use of single words is accelerating. The idea here is that children transition from one communication strategy to another, and this tool seems to capture it.

The question is how to get from data collection with this tool to evaluating the theory of change illustrated above. Traditional approaches might include creating multiple-item scale scores (e.g., summing all the communication scales into a total score) that are then tested using ANCOVA or multilevel modeling. But where is this indicated in the theoretical diagram above? Consider how focusing on one communication measure at a time (i.e., gestures, vocalizations, single- and multiple-word utterances) or on an aggregate of all communication scores misses the point.

We wanted to identify inter-individual differences in intra-individual change in language development. Unlike traditional approaches, latent growth curve modeling allowed for a more accurate and flexible analysis of repeated measures data by simultaneously modeling change in the means (variable-centered) and in the variance and covariance of level and change (person-centered) across all forms of communication shown in the plot above, within the same model. This model allowed for testing precursors and consequences of change, as well as multiple-group differences in these trajectories and predictive relationships.

Total communication is the weighted combination of the child's gestures (1 × each event), vocalizations (1 × each event), single-word utterances (2 × each event), and multiple-word utterances (3 × each event).
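
The weighting can be made concrete with a small sketch. The function name and the example counts are hypothetical; only the weights (1, 1, 2, 3) come from the scoring rule described here.

```python
# Weight applied to each observed event, taken from the scoring rule in the notes.
WEIGHTS = {
    "gestures": 1,
    "vocalizations": 1,
    "single_words": 2,
    "multiple_words": 3,
}


def total_communication(counts):
    """Return the weighted total communication score for one assessment.

    `counts` maps each communication form to the number of events observed.
    Forms missing from `counts` are treated as zero events (an assumption).
    """
    return sum(weight * counts.get(form, 0) for form, weight in WEIGHTS.items())


# Hypothetical event counts for one child at one assessment.
example = {"gestures": 4, "vocalizations": 6, "single_words": 3, "multiple_words": 2}
print(total_communication(example))  # 4*1 + 6*1 + 3*2 + 2*3 = 22
```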

---

# Exemplary LGM model

[Path diagram](https://www.google.com)

[Parameter illustration](https://www.statscamp.org/)

???
To address this challenge I used the Latent Growth Curve Modeling approach within the Structural Equation Modeling Framework.

But why did I go with the LGM over other possible approaches, such as a repeated measures ANOVA or even a multilevel model?

LGM allowed for enormous flexibility in the specification of change over time which allowed for a better correspondence between the statistical model and the theory.

RM-ANOVA and other variations of it model a group mean and treat variation as error - this was inconsistent with our goals because we wanted to actually model the individual differences. Both MLM and LGM allow us to examine intra-individual (within person) change over time AND inter-individual (between person) variability in intra-individual change because these are random effects models.

Notice the diagram on the bottom: each black line represents an individual person who can have their own intercept and slope. As you can see, some start higher, some lower, some increase, some decrease, and some are flat. We can determine the average starting point, denoted with a green dot here, and also capture the variability around that average intercept. Similarly, we can estimate an average slope and the variability around that slope.
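
In equation form, and assuming a linear trajectory for a single communication measure (the final model may well have used a different functional form), the individual lines and their averages correspond to:

$$
\begin{aligned}
y_{ti} &= \eta_{0i} + \eta_{1i}\,\lambda_t + \varepsilon_{ti} \\
\eta_{0i} &= \alpha_0 + \zeta_{0i} \\
\eta_{1i} &= \alpha_1 + \zeta_{1i}
\end{aligned}
$$

Here $y_{ti}$ is the score of person $i$ at occasion $t$, $\lambda_t$ is a fixed time code (for example 0, 1, 2, ...), $\alpha_0$ and $\alpha_1$ are the average intercept and slope, and $\zeta_{0i}$ and $\zeta_{1i}$ are each person's deviations from those averages. The variances and covariance of the $\zeta$ terms are what capture inter-individual differences in intra-individual change. The symbols are standard conventions rather than notation taken from the slides.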

In the MLM approach, the model is limited to a single growth process, and the intercept and slope cannot serve as predictors of other variables. Using it would again require changing our theory to suit the statistical method, so it was not appropriate here.

With LGM we can fit a parallel process model and simultaneously estimate all these relationships.

---

???
This diagram illustrates our final model. The interesting applied statistics problem here was the application of advanced statistical techniques to ask more sophisticated questions and tell more compelling stories.

Our vision is constrained by how we think about and use data. Too often we develop intricate theories about how the world works, which represent a lot of deep thinking about a topic, only to cut them up into smaller chunks that are then crammed into canned statistical procedures that were never designed to address the original question to begin with. I am committed to identifying such practices, providing modern demonstrations of their disadvantages, and explaining available alternatives, to discourage their further use. This requires strong communication with stakeholders who often want to know how (mediation) and when (moderation) predictive relations hold or are strong versus weak or want more flexibility in examining change processes over time.

---

# Highlights: Measurement invariance
 
## [Why?]()

### [To ensure we are measuring the same constructs across groups and time - allows for structural invariance testing](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7920600/)

???
For example, measurement invariance can be used to study whether a given measure is interpreted in a conceptually similar manner by respondents representing different genders or cultural backgrounds.
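
In notation, assuming the usual multiple-group confirmatory factor model (the symbols are standard conventions, not taken from the slides), the indicators $x$ for person $i$ in group or occasion $g$ are modeled as:

$$
x_{ig} = \tau_g + \Lambda_g\,\xi_{ig} + \delta_{ig}
$$

The levels of invariance are then increasingly strict equality constraints across $g$:

$$
\begin{aligned}
\text{Configural:} &\quad \text{same pattern of fixed and free loadings in every } g \\
\text{Weak (metric):} &\quad \Lambda_g = \Lambda \\
\text{Strong (scalar):} &\quad \Lambda_g = \Lambda,\ \tau_g = \tau
\end{aligned}
$$

Weak invariance supports comparing latent variances and covariances across groups or time; strong invariance additionally supports comparing latent means.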
---

### 1. Configural invariance

???

---


# A note about efficiency tools

.pull-left[
<img src="images/manyfiles.png" width="85%" height="55%">
Numerous [Mplus](https://www.statscamp.org/) output files
]

.pull-right[
Example model fit table [automation](https://www.statscamp.org/)
]

???
This figure illustrate

---

# A note about organizational tools

.pull-left[
<img src="images/gmm.png" width="100%" height="100%">
 
a complex, iterative process...
]

.pull-right[
 
<img src="images/excel.png" width="100%" height="100%">

 
a transparent, traceable tool
]

???
---

# Highlights: moderated mediation
 
## [Why?]()

### [We often want to know how (mediation) and when (moderation) predictive relations hold or are strong versus weak](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7920600/)

???
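One common way to formalize "how" and "when" together is a first-stage moderated mediation model. This is a sketch under that assumption; the model in the paper linked on the next slide may place the moderator on a different path. With predictor $X$, mediator $M$, outcome $Y$, and moderator $W$:

$$
\begin{aligned}
M &= a_0 + a_1 X + a_2 W + a_3 XW + e_M \\
Y &= b_0 + c' X + b_1 M + e_Y
\end{aligned}
$$

The indirect effect of $X$ on $Y$ through $M$ then depends on the level of the moderator:

$$
\text{indirect effect}(W) = (a_1 + a_3 W)\, b_1
$$

The product with $b_1$ carries the mediation ("how"), and a nonzero $a_3$ is what makes that indirect effect stronger or weaker at different values of $W$ ("when").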

---

.pull-left[
### Conceptual diagram
<img src="images/modmed1.png" width="100%" height="100%"> 
 
See [Full-text available](https://www.researchgate.net/publication/326480880_Social_Support_Dysfunctional_Coping_and_Community_Reintegration_as_Predictors_of_PTSD_Among_Human_Trafficking_Survivors)

]


???

---

# Highlights: Missing data and power

???

So, missing data is really an incredible area of research. It occurs in most areas of the applied sciences, and knowledge of missing data can function as a sort of repair kit to help you get back lost information in your datasets. We can actually use missing data theory when designing studies to decrease participant burden and research expense.

But that's not what I want to talk with you about because today I want to focus on why this matters and what we should consider when handling missing data.

My primary area of research is the development of novel statistical methods to better translate the kind of benefits that you can understand from an experimental simulation setting into real-world settings where the application is often not so good. We have to solve all kinds of limitations. One of my main areas of interest has been missing data where I have led simulation-based research and contributed to best practice recommendations for applied researchers.

---

.pull-left[
### Missing Data Analysis
<img src="images/mechanisms.png" width="100%" height="100%">
Simulation in MBR [Full-text](https://www.tandfonline.com/doi/abs/10.1080/00273171.2014.999267) 
Reporting practices [Full-text](https://journals.sagepub.com/doi/abs/10.1177/0165025415618275)

]

.pull-right[
### Monte Carlo power simulations
<img src="images/medpowercuves_square.png" width="100%" height="100%">
Software wrangling 
Statistical library and templates

]

???
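A minimal sketch of the Monte Carlo idea, not the simulations behind the figure: generate many datasets under a known effect, analyze each one, and report the proportion of significant results as the power estimate. To stay self-contained it uses a simplified two-group z-test with known unit variance, and it deletes observations completely at random (MCAR) to show how missing data erodes power. The function name, parameters, and defaults are all illustrative assumptions.

```python
import random
from statistics import NormalDist, mean


def simulate_power(n_per_group, effect_size, missing_rate=0.0,
                   n_reps=2000, alpha=0.05, seed=2021):
    """Estimate power of a two-sided z-test for a difference in means.

    Outcomes are drawn from unit-variance normal distributions, so
    `effect_size` is a standardized mean difference. Each observation is
    deleted completely at random with probability `missing_rate`, and the
    test is run on the remaining complete cases.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(n_reps):
        control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)
                   if rng.random() >= missing_rate]
        treated = [rng.gauss(effect_size, 1.0) for _ in range(n_per_group)
                   if rng.random() >= missing_rate]
        if not control or not treated:
            continue  # no usable data in a group: counted as a non-rejection
        # Standard error of the mean difference with known variance of 1.
        se = (1 / len(control) + 1 / len(treated)) ** 0.5
        z = (mean(treated) - mean(control)) / se
        if abs(z) > z_crit:
            rejections += 1
    return rejections / n_reps


if __name__ == "__main__":
    for rate in (0.0, 0.2, 0.4):
        power = simulate_power(64, 0.5, missing_rate=rate)
        print(f"missing rate {rate:.0%}: estimated power {power:.2f}")
```

Repeating the call over a grid of sample sizes or effect sizes gives the points for a power curve. A realistic version would swap in the actual analysis model and missing data handling of interest.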

---

# Summary

**1.** Leading technical experts for the past 10 years 
 
**2.** Extensive real-world experience applying diverse best-practice methods to complex problems in new areas 
 
**3.** Readily pivot from the strategic to hands on 
 
**4.** Thrive in fast-paced, changing work environment

???
As a Quantitative Psychologist, I've been leading technical experts in applied statistics and data management for the past 10 years across academic, industry, and non-profit spaces. I've provided responsive, efficient, and high-quality analytical support to organizations while building technical teams, improving profitability and workflow, and advancing the quality of research. I've attained more than 500 citations for peer-reviewed scientific publications and directly collaborated with researchers to garner more than $5 million in new research funding. I have extensive experience applying diverse best-practice methods to complex problems in new areas. I'm currently a global director in a startup oncology research space, but my real passion is research methods, statistics, and measurement within the social and behavioral sciences.
---

# Thanks!
 
## Any further questions?

- Slides created via the R package [xaringan](https://github.com/yihui/xaringan) by Yihui Xie
- Slides' source code at <https://wwwaylon.github.io/assets/ph-2021/>
- R code from throughout the slides as an R script as [slide_code.R](https://raw.githubusercontent.com/wwwaylon/wwwaylon.github.io/master/assets/ph-2021/slide_code.R)

???

---
class: inverse, center, middle

# Appendix

---

## Previous research appointments

<img class="circle" src="images/hutch.png" width="35px"/> Biostatistics Manager 
 <img class="circle" src="images/atrium.jpg" width="35px"/> Dir. of Biostatistics 
 <img class="circle" src="images/seachildrens.jpg" width="35px"/> Dir. of Biostatistics Core 
 <img class="circle" src="images/umass.png" width="35px"/> Faculty Res. Methodologist 
 <img class="square" src="images/kki.png" width="35px"/> Dir. of Res. & Evaluation 
 <img class="circle" src="images/hopkins.png" width="35px"/> Senior Res. Data Analyst 
 <img class="square" src="images/ku.png" width="35px"/> Quantitative Analyst 
 
- Position details [here](https://wwwaylon.github.io/appointments/)

???

---

## Full-text research content

- [Peer-reviewed research articles](https://wwwaylon.github.io/publications/) 
 
- [Talks, workshops, posters](https://wwwaylon.github.io/presentations/)

???

---

## Exemplar research tools

* Consultant-based [effort estimator](https://whowar.shinyapps.io/Proj_est2021/) 
 
* Timeline visualizations for [project management and communication](https://whowar.shinyapps.io/Projects/) 
 
* Rmarkdown automation of CV, NIH biosketch, NSF biosketch, etc.

???

---

## Example standard process development

| Training | Document ID | Document Title |
|:-----------|:---------|:------------------------------------------|
| [Course-015]() | [GD-OSS015-R01](#gd-oss015-r01) | Statistical Programming Best Practices |
| [Course-016]() | [GD-OSS016-R01](#gd-oss016-r01) | Analysis Datasets Results Verification |
| [Course-017]() | [GD-OSS017-R01](#gd-oss017-r01) | Reusable Code Validation |
| [Course-020]() | [GD-OSS020-R01](#gd-oss020-r01) | Procedures for Annotation and QC of TLFs |
| [Course-001]() | [CHKLST-OSS001](#chklst-oss001) | Source Code Version Control |
| [Course-002]() | [CHKLST-OSS002](#chklst-oss002) | Programming Quality Control Checklist |
| [Course-003]() | [CHKLST-OSS003](#chklst-oss003) | Disk Space Management Checklist |
| [Course-004]() | [CHKLST-OSS004](#chklst-oss004) | AWS Import/export Checklist |
| [Course-005]() | [CHKLST-OSS005](#chklst-oss005) | Data Delivery Checklist |
| [Course-006]() | [CHKLST-OSS006](#chklst-oss006) | Chemo Master List |
| [Course-006]() | [WI-OSS001](#wi-oss001) | De-identification and Data Transfer |

---

## Software summary (not all-inclusive)

- [SAS 9.4](https://www.sas.com/en_us/home.html) (STAT, IML) 
 
- [Mplus 8.4](https://www.statmodel.com/) (base, mixture, multilevel) 
 
- [IBM SPSS statistics 25](https://www.ibm.com/products/spss-statistics) (base, missing values, AMOS) 
 
- [R](https://www.r-project.org/) (data manipulation: dplyr, tidyr, stringr, RMySQL, RSQLite; data visualization: ggplot2, htmlwidgets; reporting: shiny, rmarkdown; analysis: lavaan, psych, lme4/nlme, survival)

---

# Methodology interest summary

- Design and measurement issues in longitudinal research, panel designs, latent growth curve analysis, latent class and finite mixture modeling, multi-level SEM with longitudinal data, moderation and mediation, missing data analysis and power, and measurement invariance testing.