So far we’ve mostly looked at one-way experimental designs. However, in many research contexts we’re interested in evaluating the effect of more than one factor on our response. We also talked in the last several lectures about how one should be cautious when adding arbitrary interaction terms to a regression, since doing so can hurt interpretability. These two topics actually end up dovetailing nicely…
A factorial experiment is simply one where our treatments reflect all possible combinations of the levels of our experimental factors. A very common example is the \(2\times2\) factorial experiment, which is an experiment that has two factors, each of which has two levels. (Too much beyond this can quickly get unwieldy).
Consider the example of a simple acceptability judgment study in Dillon and Hornstein (2013). The authors measured acceptability ratings on a 1-7 rating scale, and they manipulated two features of their stimuli: the presence or absence of a wh-dependency (the WH factor), and the syntactic category of the verb’s argument (NP or small clause complement, the CAT factor). Crossing these two factors led to four unique conditions, our treatments: NP/-WH, NP/+WH, SC/-WH, and SC/+WH.
Factorial designs are popular, and with good reason: they are very efficient ways of collecting data about the effects of different factors of interest. The reason for this is that in the context of a factorial design, every observation gives us information about the effect of both of our factors. That is, any judgment offered in this study gives us information about the effect of wh-movement on judgments, as well as the effect of complement category. This makes these designs very efficient for comparing the effects of two orthogonal factors, and will result in a more powerful statistical test.
Let’s first look at a sample version of the data, which contains subject-level means for the conditions of interest. If there were 48 total participants and the design was fully between-subjects, how many subject-means per crossed condition will we have? Try to think through this in your head first and then calculate it quickly in R (a one-liner call should do it). When you do so, also calculate the mean rating for each of the four crossed conditions.
islands.data <- read.table('https://spencercaplan.org/teaching/F25/Statistics/data/sample-islands-data.txt',
header=T)
head(islands.data, 8)
## subj gap_site movement value
## 1 1 NP -WH 5.955382
## 2 2 NP -WH 6.153561
## 3 3 NP -WH 6.333972
## 4 4 NP -WH 6.369240
## 5 5 NP -WH 6.568217
## 6 6 NP -WH 5.532630
## 7 7 NP -WH 5.879555
## 8 8 NP -WH 6.024270
# Don't forget to set up the factors correctly at the start!
islands.data$gap_site <- as.factor(islands.data$gap_site)
islands.data$movement <- as.factor(islands.data$movement)
(Try not to peek at the answer in the source below until you’ve attempted this yourself!)
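Once you’ve had a go, here is one way to check yourself (a quick dplyr sketch; it assumes dplyr is loaded, as the later chunks do):
library(dplyr)  # if not already loaded

# 48 participants, fully between-subjects, 2 x 2 = 4 cells -> 48 / 4 = 12 subject-means per cell
islands.data %>%
  group_by(gap_site, movement) %>%
  dplyr::summarise(n = n(), rating = mean(value))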
In factorial experiments of this sort, there are a number of
questions we might ask. In this instance, we might ask: does the
CAT factor have a reliable effect on the responses? How
about the WH factor? And are these factors independent, or
do they interact?
Let’s consider what it would mean for there to be a general effect of
CAT. If we average across the rows of our table, we have
the marginal means for each level of the
CAT factor:
islands.data %>% group_by(gap_site) %>% dplyr::summarise(rating = mean(value))
## # A tibble: 2 × 2
## gap_site rating
## <fct> <dbl>
## 1 NP 4.55
## 2 SC 5.94
These marginal means suggest that there is probably
an effect of CAT: if we collapse across levels of the
WH factor, we see a difference of more than a point on our
rating scale. We’ll call this the main effect of
CAT: from this point of view, it seems that people prefer
SC complements to NP complements in our stimuli….
Likewise, averaging across the columns of the table will give us the
marginal means for each level of the WH factor:
islands.data %>% group_by(movement) %>% dplyr::summarise(rating = mean(value))
## # A tibble: 2 × 2
## movement rating
## <fct> <dbl>
## 1 -WH 6.36
## 2 +WH 4.13
Woah! An even larger descriptive difference. This looks like an
overall 2 point drop due to WH-movement: the main
effect of WH. People tend to rate sentences with
WH-movement as worse than sentences without WH-movement.
Each of these main effects may be of theoretical interest. For
instance, we may be interested in the main effect of WH as
an index of the processing difficulty associated with constructing a
filler-gap dependency. We may also be interested in the main effect of
complement as an index of the preferred syntactic expression of
event-denoting complements of perception verbs.
However, look again at the full set of condition means that you derived initially (like, actually give it a think). We see that the marginal main effect of WH doesn’t really give us the full picture. If we look at our data, we see that that marginal main effect arises because there is roughly a 1-point drop due to WH-movement in the SC conditions, and roughly a 3-point drop in the NP conditions. In other words, WH and CAT seem to interact: the size of the effect of one factor depends on the level of the other. Any effect of this sort we will call the interaction of WH and CAT. And this looks especially damning for the main effect of CAT: there’s almost no difference due to CAT for non-WH-movement structures.
How to read an interaction
We thought of the main effects as differences in the marginal means. We can think of the interaction as a difference in differences.
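Concretely, with the condition means from the sample data (roughly 6.18 and 2.92 for -WH and +WH in the NP conditions, and 6.54 and 5.35 in the SC conditions), the interaction is the difference between the two WH effects:
\[ (\bar{x}_{NP,-WH} - \bar{x}_{NP,+WH}) - (\bar{x}_{SC,-WH} - \bar{x}_{SC,+WH}) \approx (6.18 - 2.92) - (6.54 - 5.35) \approx 2.07 \]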
This is easiest to see with a quick plot:
# If you're just doing exploratory analysis, sometimes the base R plot is easier
# than a full-fledged ggplot2 solution. There's a built-in "interaction plot"
# for exactly this scenario
with(islands.data,
interaction.plot(movement, gap_site, value)
)
For a little bit of practice though, quickly try the following: use ggplot to visualize this, with
- geom_line to connect matching conditions (as with the simple R plot above)
- se estimates to add error bars via geom_errorbar
- the gap_site condition values colored differently (use any colors you prefer or leave the defaults, but I like “red” and “dodgerblue” for a simple contrast like this)
(One possible sketch appears after the next paragraph.)
Leaving aside the ggplot business, above I created an interaction.plot(). This plots the mean for each
combination of the factors, with one factor on the x-axis, and the other
factor represented as two different lines. This allows us to visualize
the effects our two factors might have on our response. Here we see that
the +WH conditions are overall lower than the -WH conditions: people
rate sentences a little less acceptable when they contain a
wh-dependency. We see that this effect is also dependent on the
position of the gap site: the negative effect of +WH movement is greater
when the sentence contains an NP direct object than when it contains a
small clause complement.
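And here is one possible sketch for the ggplot exercise above (one of many reasonable versions; it recomputes the per-cell means and standard errors with dplyr, and the color choices are just the suggested ones):
interaction.summary <- islands.data %>%
  group_by(gap_site, movement) %>%
  dplyr::summarise(rating = mean(value), se = sd(value) / sqrt(n()))

ggplot(interaction.summary,
       aes(x = movement, y = rating, colour = gap_site, group = gap_site)) +
  geom_point(size = 3) +
  geom_line(linewidth = 1) +                        # connect matching conditions
  geom_errorbar(aes(ymin = rating - se, ymax = rating + se),
                width = 0.1) +                      # +/- 1 SE bars
  scale_colour_manual(values = c("red", "dodgerblue")) +
  theme_bw(base_size = 16)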
Again, this pattern of means suggests that our factors interact; they do not represent independent dimensions of our stimuli. This is, of course, exactly what is predicted by theories that posit grammatical island constraints: there should be a drop in acceptability associated with extraction from an island domain that is significantly larger than any drop in acceptability that occurs under extraction from a non-island domain.
But what would the interaction plot have looked like if they
were independent factors? Suppose we observed only an
effect of WH-movement, but no effect of category (CAT):
hypothetical.data <- read.table('https://spencercaplan.org/teaching/F25/Statistics/data/hypothetical-interactions.txt',
header=T)
with(hypothetical.data,
interaction.plot(movement, gap_site, value1, col='red', lwd=3)
)
In this plot, we can see that there is a constant effect of
WH-movement that is the same across the levels of CAT, but
there is not any real effect of the CAT factor itself. We
can check this by looking at the marginal means for each factor:
hypothetical.data %>% group_by(gap_site) %>% dplyr::summarise(rating = mean(value1))
## # A tibble: 2 × 2
## gap_site rating
## <chr> <dbl>
## 1 NP 5.1
## 2 SC 5
hypothetical.data %>% group_by(movement) %>% dplyr::summarise(rating = mean(value1))
## # A tibble: 2 × 2
## movement rating
## <chr> <dbl>
## 1 +WH 6.05
## 2 -WH 4.05
This shows a negligible difference in the marginal means for the
CAT factor, but a large difference in the marginal means
for the WH factor.
If only WH movement has an effect on our response, something like the following would be a reasonable statistical model of our data:
\[ x_{ijk} = \mu + \alpha_{wh_i} + \epsilon_{ijk} \]
Let \(i\) refer to the current level
of the WH factor, \(j\)
refer to the current level of the CAT factor, and \(k\) to the index of the observation within
a treatment.
So \(\alpha_{wh_i}\) is the effect
of being in the ith level of WH, and \(\epsilon_{ijk}\) is the residual associated
with an individual observation \(x_{ijk}\). This model asserts that only
constant factors and the WH factor contribute predictable
variation to our observations. Everything else (the residuals) is just
normally distributed noise that our model does not explain.
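In R terms, this one-factor model corresponds to something like the following (a sketch using the hypothetical data loaded above, where value1 is the WH-only pattern):
# Only the WH (movement) factor as a predictor; CAT (gap_site) is left out entirely
lm(value1 ~ movement, data = hypothetical.data)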
We might have also seen the opposite pattern: a constant effect of
CAT on the ratings, but no effect of WH-movement:
with(hypothetical.data,
interaction.plot(movement, gap_site, value2, col='red', lwd=3))
And again, in this situation we would only see an
effect in the marginal means associated with the CAT
factor:
hypothetical.data %>% group_by(gap_site) %>% dplyr::summarise(rating = mean(value2))
## # A tibble: 2 × 2
## gap_site rating
## <chr> <dbl>
## 1 NP 4
## 2 SC 6
hypothetical.data %>% group_by(movement) %>% dplyr::summarise(rating = mean(value2))
## # A tibble: 2 × 2
## movement rating
## <chr> <dbl>
## 1 +WH 5
## 2 -WH 5
If only CAT has an effect on our response, then an
appropriate statistical model of our data would be:
\[ x_{ijk} = \mu + \alpha_{cat_j} + \epsilon_{ijk} \]
Where \(\alpha_{cat_j}\) is the effect of being in the j-th level of CAT.
A third possible outcome would have been that the two effects each exerted a constant effect on the response variable, but each did so independently of the other. That would look like:
with(hypothetical.data,
interaction.plot(movement, gap_site, value3, col='red', lwd=3)
)
Independent effects of this sort can be diagnosed in the interaction plot by observing parallel lines. In this case, the marginal means would both suggest a difference, and that difference would be the same within each level of the other factor. Inspection of the marginal means for each factor, and the condition means, supports this view:
hypothetical.data %>% group_by(gap_site) %>% dplyr::summarise(rating = mean(value3))
## # A tibble: 2 × 2
## gap_site rating
## <chr> <dbl>
## 1 NP 3
## 2 SC 5
hypothetical.data %>% group_by(movement) %>% dplyr::summarise(rating = mean(value3))
## # A tibble: 2 × 2
## movement rating
## <chr> <dbl>
## 1 +WH 3
## 2 -WH 5
hypothetical.data %>% group_by(gap_site, movement) %>% dplyr::summarise(rating = mean(value3))
## `summarise()` has grouped output by 'gap_site'. You can override using the
## `.groups` argument.
## # A tibble: 4 × 3
## # Groups: gap_site [2]
## gap_site movement rating
## <chr> <chr> <dbl>
## 1 NP +WH 2
## 2 NP -WH 4
## 3 SC +WH 4
## 4 SC -WH 6
Here, the marginal means suggest that the main effect associated with each factor is 2 points. Looking at the condition means suggests that that 2 point difference holds within each level of the other factor.
In this last set of hypothetical data (and in fact, in the two
prior), the effects of WH and CAT are both
independent of each other. The visual clue to this is the parallel lines
in the interaction plot. The lines in an interaction plot will be
parallel if the effects of our two factors are strictly
additive. As this term suggests, if WH and
CAT were independent of each other, the following would be
a reasonable statistical model of the data:
\[ x_{ijk} = \mu + \alpha_{wh_i} + \alpha_{cat_j} + \epsilon_{ijk} \]
That is, we could estimate the effect of WH-movement, we could
estimate the effect of CAT, and we could add them together
to get a predicted value for any one treatment/condition. Each exerts a
predictable / stable influence on the response, and
each effect is independent of the other.
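In R, the additive (no-interaction) model for this third hypothetical pattern would be fit with a + rather than a * in the formula; a quick sketch:
# Main effects of both factors, but no interaction term
additive.model <- lm(value3 ~ movement + gap_site, data = hypothetical.data)
coef(additive.model)  # predictions are just the intercept plus the two effects added up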
It’s worth stopping here to reflect on this point. Consider our judgment of an island-violating sentence from this design (WH-extraction out of a complex NP) in isolation.
We know we don’t like it (I hope!). Maybe it’s a 2 on the rating scale, as in the last set of hypothetical data. The traditional explanation of this observation is that there is a specific grammatical constraint that bans extraction from certain syntactic environments. But as the last plot suggests, it is in principle possible to achieve a low rating (our intuitive observation) by simply adding up penalties accrued by independent factors or constraints. That is, the low level of acceptability we perceive for the island violation could simply be a penalty for WH-movement plus a penalty for constructing a complex NP.
This additive pattern is not what is predicted if there is a grammatical island constraint: if there is such a constraint, then the effect of WH-extraction should be larger when extracting from NP environments than from SC environments.
Our actual interaction plot didn’t look like these.
Instead, that plot suggested that the size of the WH effect
depends on the level of the CAT factor. That is, it seems
that our two factors interact. Interactions can be diagnosed in the
interaction plot by non-parallel lines. Here’s a hypothetical
example:
with(hypothetical.data,
interaction.plot(movement, gap_site, value4, col='red', lwd=3)
)
In this graph we have a classic “crossing” interaction: the
effect of WH-movement is completely opposite in one level of the
CAT factor than in the other level. Our actual data shows a
non-crossing interactive pattern: the effect of
WH-movement is in the same direction for both levels of the
CAT factor, it just happens to be much larger for the NP
level of the CAT factor:
with(islands.data,
interaction.plot(movement, gap_site, value,col='purple', lwd=3))
The statistical model we would want for this would include another term to account for the interaction of WH movement and CAT:
\[ x_{ijk} = \mu + \alpha_{wh_i} + \alpha_{cat_j} + \alpha_{wh \times cat_{ij}} + \epsilon_{ijk} \]
The two-way ANOVA (remember that’s just a multiple linear regression using categorical predictors) will allow us to evaluate statistical models like this one, consisting of the main effects of each of our experimental factors as well as their interaction.
Why we need the interaction term here
In this kind of scenario it is critical that we do include the interaction term in our regression model. It’s theoretically motivated, is part of the original experimental design, and has a clear interpretation. What’s more, because the factors in question (as per the experimental design) are categorical, if we diagnose an interaction in the full model we can choose to split the data and analyze each part separately should we wish to consider the main effects in isolation.
This enterprise is very different from adding arbitrary factors or interactions to a model simply to try and boost \(R^2\).
There are three distinct null hypotheses of interest here:
\(H_{0:WH}\) and \(H_{0:CAT}\) are the null hypotheses that
there is no main effect of WH or CAT,
respectively. \(H_{0:WH \times CAT}\) is the
null hypothesis that these two factors do not interact.
An important observation about the main effects of the factors, and their interaction, is that they all capture orthogonal, non-overlapping portions of the variance (if you have an equal number of observations in each cell). Thus, we can actually partition the total sum of squares in a two-way experiment in the following manner:
\[ SS_{total} = SS_{A}+SS_{B}+SS_{AB}+SS_{error} \]
Where A and B are the two factors. If we have an equal number of observations in each group, the sum of squares for each of these terms is calculated in the following manner. Suppose we have a levels of A, b levels of B, and n observations in each treatment (people generally also use N = abn to denote the total number of observations across all cells; N shows up in the degrees of freedom further below):
\[ SS_{TOTAL} = \sum^n_{k=1} \sum^a_{i=1} \sum^b_{j=1} (x_{ijk}-\bar{x})^2 \]
\[ SS_{A} = nb \sum^a_{i=1} (\bar{x}_{i}-\bar{x})^2 \]
\[ SS_{B} = na \sum^b_{j=1} (\bar{x}_{j}-\bar{x})^2 \]
\[ SS_{AB} = n \sum^a_{i=1} \sum^b_{j=1} (\bar{x}_{ij}-\bar{x}_{i}-\bar{x}_{j}+\bar{x})^2 \]
\[ SS_{ERROR} = \sum^n_{k=1} \sum^a_{i=1} \sum^b_{j=1} (x_{ijk}-\bar{x}_{ij})^2 \]
\(\bar{x}_i\) and \(\bar{x}_j\) represent the marginal means in our experiment: the mean for one level of a factor, collapsing across all other levels of the factor. \(\bar{x}_{ij}\) is the condition or treatment mean.
Of all these SS terms, the interaction term \(SS_{AB}\) is the one that probably looks the most unusual. This term is computing the difference between the observed cell mean and the value that would be predicted if there were additive effects of A and B. This difference will be 0 when the factors are strictly additive, and so there will be little in the way of variance to be accounted for in the interaction.
As with the one-way ANOVA, we can get the mean squares for each of these terms by dividing by the appropriate degrees of freedom. For the two-way ANOVA, these are:
\(MS_{TOTAL} = SS_{TOTAL}/(N-1)\)
\(MS_{A} = SS_{A}/(a-1)\)
\(MS_{B} = SS_{B}/(b-1)\)
\(MS_{AB} = SS_{AB}/((a-1)(b-1))\)
\(MS_{ERROR} = SS_{ERROR}/(N-ab)\)
From these mean squares, we can devise three \(F\)-tests, one for each of our null hypotheses:
\(F_{A} = MS_{A}/MS_{ERROR}\)
\(F_{B} = MS_{B}/MS_{ERROR}\)
\(F_{AB} = MS_{AB}/MS_{ERROR}\)
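To make these formulas concrete, here is a by-hand sketch for our islands data (it assumes the balanced a = b = 2, n = 12 design from above; the resulting F values should line up with the anova() table below):
grand.mean <- mean(islands.data$value)
mov.means  <- with(islands.data, tapply(value, movement, mean))   # marginal means for one factor
gap.means  <- with(islands.data, tapply(value, gap_site, mean))   # marginal means for the other
cell.means <- with(islands.data, tapply(value, list(movement, gap_site), mean))
a <- 2; b <- 2; n <- 12; N <- nrow(islands.data)

SS.A  <- n * b * sum((mov.means - grand.mean)^2)
SS.B  <- n * a * sum((gap.means - grand.mean)^2)
SS.AB <- n * sum((cell.means - outer(mov.means, gap.means, "+") + grand.mean)^2)
SS.ERROR <- sum((islands.data$value -
                 cell.means[cbind(islands.data$movement, islands.data$gap_site)])^2)

MS.ERROR <- SS.ERROR / (N - a * b)
c(F.A  = (SS.A  / (a - 1)) / MS.ERROR,
  F.B  = (SS.B  / (b - 1)) / MS.ERROR,
  F.AB = (SS.AB / ((a - 1) * (b - 1))) / MS.ERROR)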
Let’s look at this in action with our data:
model <- lm(value ~ movement * gap_site, data=islands.data)
anova(model)
## Analysis of Variance Table
##
## Response: value
## Df Sum Sq Mean Sq F value Pr(>F)
## movement 1 59.395 59.395 211.662 < 2.2e-16 ***
## gap_site 1 23.236 23.236 82.806 1.128e-11 ***
## movement:gap_site 1 12.854 12.854 45.809 2.502e-08 ***
## Residuals 44 12.347 0.281
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
A few questions to discuss here:
Really put your understanding to the test
Check out what happens if instead of anova(model) you
instead generate summary(model).
You will see that the outcome (particularly the effect of
gap_site) looks quite a bit different. Why is this?
hint: think about how the coding of categorical
variables works by default in R. And check whether that’s exactly what
we want or not. To really confirm things, try changing the
coding scheme using
e.g. contrasts(islands.data$gap_site) <- contr.sum(2)
and see how that affects the results.
Finally, try to impose a different coding scheme (how about helmert
coding) and see if you can get the output of summary(model)
to align with our intuitions about the data.
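If you want a starting point, here is a minimal sketch of how you might try those coding schemes (it works on a copy of the data so the default treatment coding on islands.data is left alone for the chunks that follow):
islands.recode <- islands.data

# Sum ("deviation") coding for both factors, then refit
contrasts(islands.recode$movement) <- contr.sum(2)
contrasts(islands.recode$gap_site) <- contr.sum(2)
summary(lm(value ~ movement * gap_site, data = islands.recode))

# Helmert coding (for a two-level factor this differs from sum coding only in sign)
contrasts(islands.recode$movement) <- contr.helmert(2)
contrasts(islands.recode$gap_site) <- contr.helmert(2)
summary(lm(value ~ movement * gap_site, data = islands.recode))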
It seems that all three tests were (highly) significant. There is a
main effect of both CAT and WH, and an interaction. Before jumping right
to the inference, however, we need to exercise a bit of caution. It is
difficult to interpret our main effects in the presence of an
interaction, especially in the presence of a non-crossing interaction
like the one we have. Since the interaction tells us that the effect of
WH depends on the level of the CAT factor, we would want to
know whether there is a significant effect of WH
within each level of the CAT factor before we make
anything of our main effect. Again, Tukey’s HSD comes in
handy here:
TukeyHSD(aov(model))
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = model)
##
## $movement
## diff lwr upr p adj
## +WH--WH -2.224757 -2.532945 -1.916569 0
##
## $gap_site
## diff lwr upr p adj
## SC-NP 1.391525 1.083337 1.699713 0
##
## $`movement:gap_site`
## diff lwr upr p adj
## +WH:NP--WH:NP -3.2597470 -3.8371621 -2.6823319 0.0000000
## -WH:SC--WH:NP 0.3565349 -0.2208802 0.9339500 0.3627474
## +WH:SC--WH:NP -0.8332322 -1.4106473 -0.2558171 0.0020564
## -WH:SC-+WH:NP 3.6162819 3.0388668 4.1936970 0.0000000
## +WH:SC-+WH:NP 2.4265148 1.8490998 3.0039299 0.0000000
## +WH:SC--WH:SC -1.1897671 -1.7671822 -0.6123520 0.0000106
Question: what do these post-hoc
comparisons suggest we might conclude about the main effect of
WH? What about the main effect of CAT?
Let’s wrap this section up by doing a little more plotting. You should have already organized the data in a plot-friendly format when we started but here’s that again for reference:
summary.plot.data <- islands.data %>%
group_by(gap_site, movement) %>%
dplyr::summarise(rating = mean(value),
n = n(),
se = sd(value)/sqrt(n)
)
## `summarise()` has grouped output by 'gap_site'. You can override using the
## `.groups` argument.
summary.plot.data
## # A tibble: 4 × 5
## # Groups: gap_site [2]
## gap_site movement rating n se
## <fct> <fct> <dbl> <int> <dbl>
## 1 NP -WH 6.18 12 0.0916
## 2 NP +WH 2.92 12 0.223
## 3 SC -WH 6.54 12 0.0917
## 4 SC +WH 5.35 12 0.164
ggplot(summary.plot.data,
aes(x=gap_site, y=rating, fill=movement)
) +
ylim(0,7) +
geom_bar(position = 'dodge', stat = "identity") +
geom_errorbar(position = position_dodge(width = 0.9),
aes(ymax = rating+se, ymin=rating-se), width=0.25) +
scale_fill_manual(values=c("maroon4", "darkgrey")) +
xlab("Gap position") +
ylab("Average rating") +
ggtitle("Acceptability ratings") +
theme_bw(base_size = 16)
A more robust and informative way to visualize factorial results these days might be with a box-whisker plot split up by the different factors and with dots showing the individual-level results (here subject-means). See if you can construct that yourself using:
- geom_boxplot
- geom_jitter (so the individual points don’t overlap too much to be readable)
(I’ve got a solution commented out below… NO PEEKING until you try it yourself!)
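If you want to compare notes afterwards, here is one possible version (just a sketch, not necessarily identical to the commented-out solution in the source; it reuses the fill colors from the bar plot above, and the jitter/dodge widths are arbitrary choices):
ggplot(islands.data, aes(x = gap_site, y = value, fill = movement)) +
  geom_boxplot(outlier.shape = NA, alpha = 0.6,
               position = position_dodge(width = 0.8)) +
  geom_jitter(position = position_jitterdodge(jitter.width = 0.15, dodge.width = 0.8),
              alpha = 0.7, size = 2) +
  scale_fill_manual(values = c("maroon4", "darkgrey")) +
  xlab("Gap position") +
  ylab("Subject-mean rating") +
  ggtitle("Acceptability ratings") +
  theme_bw(base_size = 16)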
Before we call it a day let’s walk through how to construct a
logistic regression model in R. It’s very simple, and extremely similar
to specifying and fitting a linear regression. The function we want is glm(), which stands for generalized linear model, rather than lm().
The other major thing to note (at least for now) is that you should
specify the “family” of distribution that you are modeling under the
glm() call (this is required since the glm
framework is more powerful than just fitting logistic regression). Thus
a simple call would look like this:
my.logit.model <- glm(binaryOutcome ~ factor1 * factor2 + factor3, data = df, family = binomial)
With linear regression, we are attempting to build a mathematical model to predict a continuous outcome. But often in psycholinguistics we’re dealing with categorical outcomes (2AFC responses, grammaticality judgements, etc.). And so we’d like to be able to model that as well.
Deep down, this is really just a linear regression, but we pass the weighted sum (the \(\beta\)s times the predictor values, plus the intercept) through a non-linear “activation” function – here that’s the sigmoid:
\[ \sigma(x) = \frac{1}{1 + e^{-x}} \]
BTW: the name “logistic” in logistic regression refers to the logit function, which is just the inverse of the sigmoid (the natural log of the odds):
\[ \operatorname{logit}(p) = \ln\!\left(\frac{p}{1-p}\right) \]
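In R these two functions are built in as plogis() (the sigmoid) and qlogis() (the logit), so a quick sanity check that they undo each other:
qlogis(plogis(1.5))   # sigmoid then logit: back to 1.5
plogis(qlogis(0.8))   # logit then sigmoid: back to 0.8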
library(knitr)
df <- data.frame(
Function = c("Sigmoid", "Logit"),
Input = c(
"$-\\infty$ to $\\infty$ (log-odds)",
"$(0,1)$ (probability)"
),
Output = c(
"$(0,1)$ (probability)",
"$-\\infty$ to $\\infty$ (log-odds)"
),
Meaning = c(
"Converts linear scores to probability",
"Inverse of sigmoid"
)
)
kable(df, escape = FALSE, align = "lccc")
| Function | Input | Output | Meaning |
|---|---|---|---|
| Sigmoid | \(-\infty\) to \(\infty\) (log-odds) | \((0,1)\) (probability) | Converts linear scores to probability |
| Logit | \((0,1)\) (probability) | \(-\infty\) to \(\infty\) (log-odds) | Inverse of sigmoid |
Here’s some sample data from the mosaicData package. Women were interviewed, at the age recorded in the age column, about whether they were a smoker.¹ The outcome column encodes whether they were alive or dead 20 years later; we’ll encode “Alive” as 0 and “Dead” as 1.
# install.packages("mosaicData")
library(mosaicData)
glimpse(Whickham)
## Rows: 1,314
## Columns: 3
## $ outcome <fct> Alive, Alive, Dead, Alive, Alive, Alive, Alive, Dead, Alive, A…
## $ smoker <fct> Yes, Yes, Yes, No, No, Yes, Yes, No, No, No, No, Yes, No, Yes,…
## $ age <int> 23, 18, 71, 67, 64, 38, 45, 76, 28, 27, 28, 34, 20, 72, 48, 45…
Can’t we just use linear regression here (you might be wondering)?
Whickham$outcome_num <- as.numeric(Whickham$outcome == "Dead")
smoking.plot.linear <- ggplot(Whickham, aes(x = age, y = outcome_num, color = smoker)) +
geom_jitter(width = 0.8, height = 0.3, alpha = 0.7, size = 2) +
scale_color_manual(values = c("dodgerblue", "red")) +
geom_smooth(
method = "lm",
se = FALSE,
linetype = "dashed",
color = "black",
linewidth = 1
) +
labs(
x = "Age",
y = "Outcome (Alive / Dead)",
color = "Smoker"
) +
theme_minimal(base_size = 12)
smoking.plot.linear
## `geom_smooth()` using formula = 'y ~ x'
smoking.plot.logistic <- ggplot(Whickham, aes(x = age, y = outcome_num, color = smoker)) +
geom_jitter(width = 0.8, height = 0.3, alpha = 0.7, size = 2) +
scale_color_manual(values = c("dodgerblue", "red")) +
geom_smooth(
method = "glm",
method.args = list(family = binomial),
se = FALSE,
linewidth = 1.2
) +
labs(
x = "Age",
y = "Outcome (Alive / Dead)",
color = "Smoker"
) +
theme_minimal(base_size = 12)
smoking.plot.logistic
## `geom_smooth()` using formula = 'y ~ x'
We could in a mathematical sense, but is that justified here?
logistic.model <- glm(outcome ~ age + smoker, data = Whickham, family = "binomial")
logistic.model
##
## Call: glm(formula = outcome ~ age + smoker, family = "binomial", data = Whickham)
##
## Coefficients:
## (Intercept) age smokerYes
## -7.5992 0.1237 0.2047
##
## Degrees of Freedom: 1313 Total (i.e. Null); 1311 Residual
## Null Deviance: 1560
## Residual Deviance: 945 AIC: 951
How do we interpret these?
coef_df <- data.frame(
term = names(coef(logistic.model)),
estimate = coef(logistic.model)
)
coef_df
## term estimate
## (Intercept) (Intercept) -7.5992213
## age age 0.1236832
## smokerYes smokerYes 0.2046991
These don’t give us a classification directly, nor do they map onto our logistic regression curve in an obvious way. So what’s going on?
When you specify this formula within the glm() call, what it’s really fitting is this:
\[ y = w_0 + w_1x_1 + w_2x_2 \]
Where \(x_1\) is age and \(x_2\) is smoker.
Then, applying the non-linear transformation (which you can do by hand in R via plogis() if you want), we get:
\[
p = \dfrac{1}{1 + e^{-( w_0 + w_1x_1 + w_2x_2)}}
\]
And the call to glm() is giving us those weights.
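To see this concretely, here is a sketch that turns the fitted weights into a predicted probability by hand (the age of 60 and the smoker status are just example values):
w <- coef(logistic.model)
eta <- w["(Intercept)"] + w["age"] * 60 + w["smokerYes"] * 1   # the linear predictor (log-odds)
plogis(eta)                                                    # pass it through the sigmoid

# This should agree with the built-in prediction:
predict(logistic.model, newdata = data.frame(age = 60, smoker = "Yes"), type = "response")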
How to interpret those weights, then? Well, since they are in log-odds space, it’ll be easier if we exponentiate them back into (non-log) odds space:
exp(coef(logistic.model))
## (Intercept) age smokerYes
## 0.0005008413 1.1316572762 1.2271557224
This means that each additional year multiplies the odds of a person
dying by about 1.132. And similarly, being a smoker in the
study means that one had 1.23 times the odds of dying
(during the study period) compared to non-smokers.
Calling summary on logistic regressions gives us the
same kind of information as it does for linear models:
summary(logistic.model)
##
## Call:
## glm(formula = outcome ~ age + smoker, family = "binomial", data = Whickham)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -7.599221 0.441231 -17.223 <2e-16 ***
## age 0.123683 0.007177 17.233 <2e-16 ***
## smokerYes 0.204699 0.168422 1.215 0.224
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1560.32 on 1313 degrees of freedom
## Residual deviance: 945.02 on 1311 degrees of freedom
## AIC: 951.02
##
## Number of Fisher Scoring iterations: 6
The z-values come from the idea that our estimates of the true \(\beta\)s here are approximately normally distributed, and thus the test statistic can be derived as we’ve done before:
\[ z_j = \dfrac{\hat{\beta}_j}{SE(\hat{\beta}_j)} \]
And the \(p\)-value is testing the null hypothesis that \(\beta_j = 0\). So, as with the rest of our modeling enterprise, be careful when interpreting the (very tempting looking) p-values you see in a regression table. They do not directly reflect the hypothesis of interest!
Instead we’ll soon set up a likelihood-ratio test between nested models: one with the predictor of interest and one without. (Spoiler alert though: if you set up a nested model comparison using the full model we defined above, you’ll find that including smoker does not significantly improve model fit…)
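For the curious, a minimal sketch of that nested comparison (anticipating where we’re headed next):
# Likelihood-ratio test: does adding smoker improve on a model with age alone?
reduced.model <- glm(outcome ~ age, data = Whickham, family = "binomial")
anova(reduced.model, logistic.model, test = "Chisq")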
Thus, here’s an open-ended challenge for you: try to figure out why smoking doesn’t appear to be a significant predictor of mortality here! (Since, unlike Fisher, we know from a variety of evidence that it is a predictor of mortality.)
Hint: see if there is any correlation between age and smoking in the dataset. Also consider whether there’s an interaction between age and smoking, and think about the relative scale of the two predictors.
Although, ironically and bizarrely, Fisher (yes, that Fisher) was an anti-anti-smoking advocate.↩︎