Introduction to Testing Statistical Hypotheses
Quality control: when to blow the whistle
(Hypothesis: process in control; Alternative: out of control)
Application: industrial production
Planned experiment: are there any treatment effects?
(Hypothesis: zero effect; Alternative: nonzero effects)
Application: pharma, industry, agriculture (Green Revolution, Fisher's test)
Independence testing: is there any relation between the observations of different processes?
(Hypothesis: unrelated, independent; Alternative: there is some relation)
Application: regression analysis, econometrics, ...
Suppose our observations are obtained from a data-generating process characterized by certain parameters.
Our generic example: detect a change from one observation:
"drop money from helicopters" on the economy
Treatment effect: controlled experiment
Application:
pour money into a "poor" (?) school district, observe the average SAT scores afterwards
compare with what it was before: did it get better? worse? no change?
"ceteris paribus"
NEED: a model for how your data are "generated"
Continue with
NAIVE hypothesis testing
FORMAL hypothesis testing
Conclude with
MORE observations, MORE models
Example: SAT scores
Expert opinion on past SAT scores (a random variable)
Model: scores stay within a certain range, "except once in a lifetime" ("once in a lifetime, maybe")
Why? 95% of observations fall within two standard deviations of the mean.
DATA: ("way above average" — how much is too much?)
Compute the probability of observing what you observe (or something even more extreme).
Question: is there enough evidence for the case?
Answer: compute the probability of observing what we observed, or something even more extreme ("p-value"):
Conclusion:
the recipe "better SAT by pouring money" is not convincing (so far)
at best "borderline"
Under this model, we would expect to observe what we observe (or something even more extreme) 12% of the time (the p-value): not untypical.
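The p-value computation above can be sketched numerically. The numbers below (past mean 500, standard deviation 100, observed average 617) are hypothetical stand-ins, chosen only so the one-sided p-value lands near the 12% quoted above; the lecture's actual figures are not shown here.

```python
from math import erf, sqrt

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# Hypothetical model parameters (stand-ins, not the lecture's data):
mu0 = 500.0     # past mean SAT score under the model
sigma = 100.0   # past standard deviation
x = 617.0       # observed average after the treatment

# Standardize, then compute P(observe x, or something even more extreme)
z = (x - mu0) / sigma
p_value = 1.0 - normal_cdf(z)
print(f"z = {z:.2f}, one-sided p-value = {p_value:.3f}")
```

With these stand-ins the p-value is roughly 0.12: an observation this large happens about 12% of the time under the model, so it is not untypical.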
Example: an alternative motivation of the MODEL
From complete records on past SAT scores, compute the model parameters.
DATA:
Compute the probability of observing what you observe (or something even more extreme).
Compute the "p-value" (by standardization).
Formal Hypothesis Testing
Null Hypothesis, Alternative, Level of significance
Setting up the hypothesis (H₀) and the alternative (H₁)
What is of interest? A positive treatment effect.
What do we expect or hope to show?
H₀ supposes the effect is not there:
before treatment: mean μ₀
after treatment: mean μ
H₀: μ = μ₀
H₁: μ > μ₀
Conclusion: there is scant evidence against H₀ (or: there is not enough reason to reject H₀).
Setting the red line: how much is too much?
Introduce a rejection rule
α: level of significance
customary: 1% (engineering), 5% (common), 10% (social sciences)
Example (continued)
Model: suppose that under H₀ the observation X is normal with mean μ₀ and standard deviation σ.
Rejection rule: reject H₀ if X exceeds a cutoff c, where c is chosen so that P(X > c | H₀) = α.
Set α = 5%; from the normal table, the standardized cutoff is 1.645.
Formal rejection rule, final form: reject H₀ if (X − μ₀)/σ > 1.645.
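The formal rejection rule can be sketched as follows, assuming a normal model with known σ; the values of μ₀, σ, and the observation are hypothetical placeholders.

```python
from statistics import NormalDist

alpha = 0.05
# One-sided critical value from the standard normal table, approx. 1.645:
z_alpha = NormalDist().inv_cdf(1.0 - alpha)

# Hypothetical placeholders for the model:
mu0, sigma = 500.0, 100.0   # mean and standard deviation under H0
x = 700.0                   # observed value

# Formal rejection rule, final form: reject H0 if the standardized
# observation exceeds the critical value.
z = (x - mu0) / sigma
reject = z > z_alpha
print(f"z_alpha = {z_alpha:.3f}, z = {z:.2f}, reject H0: {reject}")
```

Here `NormalDist().inv_cdf` plays the role of the printed normal table: it turns the chosen level α into the cutoff on the standardized scale.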
Type 1 error: rejecting H₀ when H₀ is true.
Question: why ever make a Type 1 error?
Answer: never making a Type 1 error means never rejecting; and not rejecting when H₀ is wrong is a Type 2 error.
Question: why ever make a Type 2 error?
Answer: think about it.
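The tradeoff between the two errors can be made concrete for the one-observation normal test: shrinking the Type 1 rate α raises the Type 2 rate β. The effect size below (one standard deviation) is an arbitrary illustration, not a value from the lecture.

```python
from statistics import NormalDist

nd = NormalDist()
delta = 1.0  # hypothetical effect size (mu1 - mu0) / sigma, arbitrary

# The test rejects H0 when Z > z_alpha.  Under the alternative, Z is
# N(delta, 1), so the Type 2 probability is
#   beta = P(Z <= z_alpha) = Phi(z_alpha - delta).
betas = []
for alpha in (0.10, 0.05, 0.01):
    z_alpha = nd.inv_cdf(1.0 - alpha)
    beta = nd.cdf(z_alpha - delta)
    betas.append(beta)
    print(f"alpha = {alpha:.2f}  ->  beta = {beta:.3f}")
```

As α drops from 10% to 1%, β climbs: demanding near-certainty before rejecting means failing to reject even when H₀ is wrong.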
FURTHER REMARKS
canned programs
SAS, STATA, SPSS, etc. do not test hypotheses; they report p-values.
Before you use these programs, know your data!
two-sided alternatives
before treatment: mean μ₀
after treatment: mean μ
H₀: μ = μ₀
H₁: μ ≠ μ₀ (two-sided alternative)
Split α into left and right tails.
See homework problems.
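A sketch of the two-sided rule, splitting α into equal left and right tails; as before, the model values and the observation are hypothetical placeholders.

```python
from statistics import NormalDist

alpha = 0.05
# Put alpha/2 in each tail; the two-sided cutoff is approx. 1.96:
z_half = NormalDist().inv_cdf(1.0 - alpha / 2.0)

mu0, sigma = 500.0, 100.0   # hypothetical model under H0
x = 280.0                   # hypothetical observation, far BELOW the mean

# Reject H0 for extremes in either direction:
z = (x - mu0) / sigma
reject = abs(z) > z_half
print(f"cutoff = {z_half:.2f}, z = {z:.2f}, reject H0: {reject}")
```

Note the contrast with the one-sided rule: this observation is far below μ₀, so a one-sided test for a positive effect would never reject, while the two-sided test does.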
More elaborate Tests
two-sample test (normal test, t-test)
Data:
sample 1: X₁, ..., Xₙ
sample 2: Y₁, ..., Yₘ
H₀: μ_X = μ_Y vs H₁: μ_X ≠ μ_Y
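A minimal sketch of the two-sample comparison on made-up data. For simplicity it uses the large-sample normal approximation for the p-value rather than the exact t distribution; the small-sample t-test would look up a t-table instead.

```python
from math import erf, sqrt
from statistics import mean, variance

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# Made-up samples for illustration:
sample1 = [512, 498, 530, 475, 520, 505, 490, 515]
sample2 = [540, 555, 528, 560, 532, 548, 551, 536]

n, m = len(sample1), len(sample2)
# Two-sample statistic: difference of means over its estimated standard error.
se = sqrt(variance(sample1) / n + variance(sample2) / m)
z = (mean(sample2) - mean(sample1)) / se

# Two-sided p-value under the normal approximation:
p_value = 2.0 * (1.0 - normal_cdf(abs(z)))
print(f"z = {z:.2f}, two-sided p-value = {p_value:.6f}")
```

With these made-up samples the difference of means is many standard errors wide, so H₀: μ_X = μ_Y would be rejected at any customary level.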
k-sample tests (ANOVA)
H₀: μ₁ = μ₂ = ... = μ_k vs H₁: not all equal
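A minimal sketch of the one-way ANOVA F statistic on made-up groups; in practice the resulting F would be compared against a critical value from an F-table with (k−1, N−k) degrees of freedom.

```python
from statistics import mean

# Made-up groups for illustration:
groups = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
k = len(groups)                     # number of groups
N = sum(len(g) for g in groups)     # total number of observations
grand_mean = mean(x for g in groups for x in g)

# Between-group and within-group sums of squares:
ssb = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
ssw = sum((x - mean(g)) ** 2 for g in groups for x in g)

# F = (SSB / (k-1)) / (SSW / (N-k)); a large F argues against equal means.
F = (ssb / (k - 1)) / (ssw / (N - k))
print(f"F = {F:.2f} on ({k - 1}, {N - k}) degrees of freedom")
```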
NOTE: a null hypothesis may always be wrong and may have to be rejected, but it is never accepted.
Never accept a null hypothesis.