Introduction to Testing Statistical Hypotheses





Quality Control: when to blow the whistle

(Hypothesis: process in control, Alternative: out of control)

Application: industrial production


Planned experiment: are there any treatment effects?

(Hypothesis: zero effect, Alternative: non-zero effects)

Application: Pharma, Industry, Agriculture (Green Revolution, Fisher's $\chi^{2}$-test)




Independence testing: is there any relation between the observations of different processes?

(Hypothesis: unrelated, independent, Alternative: there is some relation)

Regression Analysis, Econometrics, ...


Suppose our observations are obtained from a data-generating process characterized by certain parameters.




Our generic example: detect a change from a single observation $x_{1}$.


"drop money from helicopters" on the economy

$\implies$ treatment effect $\implies$ controlled experiment


Application:


  1. pour money into a "poor" (?) school district, observe SAT scores (average) afterwards

  2. compare with what it was before: did it get better? worse? no change?

  3. "ceteris paribus"


NEED: a model for how your data are "generated"


continue with

NAIVE hypothesis testing




FORMAL hypothesis testing




conclude with

MORE observations, MORE models





Example: SAT-scores $X$

  1. Expert opinion on past SAT-scores $X$ (random variable)

    1. model: $X\sim N(\mu,\sigma^{2})$

    2. $\mu\leq1550$ "except once in a lifetime"

    3. $\mu\leq1450$ "once in a lifetime, maybe"

    4. $\sigma\approx30$

    5. why? about 95% of observations fall within $\mu\pm2\sigma$

  2. DATA: $x_{new}=1535$ ("way above average", how much is too much?)

  3. compute probability of observing what you observe (or even more extreme than that)

$P(X\geq1535)$ under the no-change model $X\sim N(1500,30^{2})$

Question: is there enough evidence for the case $\mu>1500$?

Answer: compute the probability of observing what we observed or something even more extreme than that (the "p-value"): $P(X\geq1535)=P\!\left(Z\geq\frac{1535-1500}{30}\right)=P(Z\geq1.17)\approx0.12$

conclusion:

  1. the recipe "better SAT by pouring money" is not convincing (so far)

  2. at best "borderline"

  3. under this model ($\mu=1500$) we would expect to observe what we observed ($x=1535$, or something even more extreme, $x>1535$) about 12% of the time (the p-value), which is not untypical (checked numerically below)
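
As a quick numerical check, the 12% figure can be reproduced directly from the assumed model $N(1500,30^{2})$; a minimal sketch in Python (scipy), using only the values stated in the example:

```python
# Minimal sketch: reproduce the p-value of the SAT example.
# Assumed model under "no change": X ~ N(mu = 1500, sigma = 30).
from scipy.stats import norm

mu, sigma = 1500, 30   # model parameters from the example
x_new = 1535           # observed (average) SAT score

# p-value: probability of observing x_new or something even more extreme
p_value = norm.sf(x_new, loc=mu, scale=sigma)   # P(X >= 1535)
print(round(p_value, 3))                        # about 0.12, i.e. roughly 12%
```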



Example: Alternative motivation of the MODEL: from complete past SAT records, compute $\mu=1500,\sigma=30$

$\mu=\frac{1}{N}\sum_{i=1}^{N}x_{i},\qquad\sigma=\sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_{i}-\mu)^{2}}$

Example: SAT-scores $X$

  1. From records on past SAT-scores $X$

    1. model: $X\sim N(\mu,\sigma^{2})$

    2. $\mu=1500$

    3. $\sigma=30$

  2. DATA: $x_{new}=1535$

  3. compute probability of observing what you observe (or even more extreme than that)

$P(X\geq1535)$ where $X\sim N(1500,30^{2})$

compute "p-value": (standardization) MATH





Formal Hypothesis Testing


Null Hypothesis, Alternative, Level of significance


Setting up Hypothesis (H$_{0}$) and Alternative (H$_{1}$)


What

  1. is of interest? A positive treatment effect.

  2. do we expect or hope to show? $\mu>1500$


H$_{0}:$ suppose not


before treatment: $\mu_{0}\leq1500$

after treatment: $\mu>1500$


H$_{0}:$ $\mu\leq1500$

H$_{1}:$ $\mu>1500$


conclusion: there is scant evidence against H$_{0}$ (or there is not enough reason to reject H$_{0}$).


Setting the red line. How much is too much?

Introduce Rejection rule


reject H$_{0}$ if, under H$_{0}$, you would observe what you observed (or something more extreme than that) at most $100\alpha\%$ of all times.



$100\alpha\%$: level of significance

customary: $\alpha=0.01$ (engineering), $0.05$ (common), $0.10$ (social sciences)





Example (continued)

Model


suppose H$_{0}$ holds with $\mu=1500$ (the boundary case): $X\sim N(1500,30^{2})$


Rejection rule


Reject H$_{0}$ if $X>x_{0}$, where $x_{0}$ is chosen so that $P(X>x_{0})=\alpha$ under H$_{0}$


set $\alpha=0.05$


$P(X>x_{0})=P\!\left(Z>\frac{x_{0}-1500}{30}\right)=0.05$


from table:

$P(Z>1.645)=0.05$ $\implies$

$\frac{x_{0}-1500}{30}=1.645$

$\implies$

$x_{0}=1500+1.645\cdot30=1549.35$

Formal Rejection rule, Final Form:


reject H$_{0}:\mu\leq1500$ at 5% if $x_{new}>1549.35$ (here $x_{new}=1535$, so H$_{0}$ is not rejected)
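
Instead of the table, the cutoff $x_{0}$ can be obtained numerically; a minimal sketch under the assumed model $N(1500,30^{2})$:

```python
# Sketch: find the cutoff x0 with P(X > x0) = alpha under H0 (mu = 1500, sigma = 30),
# then apply the rejection rule to the observed value.
from scipy.stats import norm

mu, sigma, alpha = 1500, 30, 0.05
x0 = norm.isf(alpha, loc=mu, scale=sigma)   # inverse survival function
print(round(x0, 2))                         # about 1549.35

x_new = 1535
print("reject H0" if x_new > x0 else "do not reject H0")   # do not reject
```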



Type 1 error: reject H$_{0}$ when H$_{0}$ is true.

$P(\text{Type 1 error})=P(\text{reject H}_{0}\mid\text{H}_{0}\text{ true})=\alpha$


Question: why ever make a Type 1 error?

Answer: never making a Type 1 error means never rejecting, and never rejecting means not rejecting when H$_{0}$ is wrong, which is a Type 2 error.

Question: why ever make a Type 2 error?

Answer: think about it (see the sketch below).
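
The trade-off can be made concrete by computing the Type 2 error probability of the 5% rule above; the sketch assumes a hypothetical true mean of 1560 (an illustrative value, not taken from these notes):

```python
# Sketch: Type 2 error probability of the 5%-level rule "reject if X > x0",
# assuming a HYPOTHETICAL true mean of 1560 (illustrative value only).
from scipy.stats import norm

sigma = 30
x0 = norm.isf(0.05, loc=1500, scale=sigma)   # critical value under H0
mu_true = 1560                               # assumed true mean under the alternative

# Type 2 error: fail to reject (X <= x0) although the alternative is true
beta = norm.cdf(x0, loc=mu_true, scale=sigma)
print(round(beta, 2))   # about 0.36
```

Lowering $\alpha$ pushes $x_{0}$ up and makes this Type 2 error probability larger; that trade-off is the answer hinted at above.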





FURTHER REMARKS


canned programs




SAS, STATA, SPSS, etc. do not test hypotheses; they report p-values.

before you use these programs, know your data!




two-sided alternatives


before treatment: $\mu_{0}=1500$

after treatment: $\mu\neq1500$

H$_{0}:$ $\mu=1500$

H$_{1}:$ $\mu\neq1500$

two-sided alternative.

split $\alpha$ into $\alpha/2$ left and $\alpha/2$ right

see homework problems
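
For the two-sided alternative the p-value counts deviations in both directions; a sketch, reusing the assumed model and observation from the one-sided example:

```python
# Sketch: two-sided p-value for H0: mu = 1500 vs H1: mu != 1500,
# reusing the assumed model N(1500, 30^2) and the observation x_new = 1535.
from scipy.stats import norm

mu0, sigma, x_new = 1500, 30, 1535
z = (x_new - mu0) / sigma
p_two_sided = 2 * norm.sf(abs(z))   # both tails: alpha/2 left and alpha/2 right
print(round(p_two_sided, 3))        # about 0.24, twice the one-sided 12%
```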





More elaborate Tests


two-sample test (normal test, t-test)

Data:
sample 1: $x_{1},\ldots,x_{m}$ i.i.d. $N(\mu_{1},\sigma_{1}^{2})$
sample 2: $y_{1},\ldots,y_{n}$ i.i.d. $N(\mu_{2},\sigma_{2}^{2})$

H$_{0}:$ $\mu_{1}=\mu_{2}$ vs H$_{a}:$ $\mu_{1}\neq\mu_{2}$
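
A sketch of how such a two-sample test could be run; the group means, standard deviation and sample sizes below are made-up illustration values, not data from these notes:

```python
# Sketch: two-sample t-test of H0: mu1 = mu2 vs Ha: mu1 != mu2 on simulated data.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
sample1 = rng.normal(loc=1500, scale=30, size=25)   # x_1, ..., x_m
sample2 = rng.normal(loc=1520, scale=30, size=30)   # y_1, ..., y_n

t_stat, p_value = ttest_ind(sample1, sample2)   # equal-variance t-test by default
print(round(t_stat, 2), round(p_value, 3))
```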


k-sample tests (ANOVA)

H$_{0}:$ $\mu_{1}=\mu_{2}=\cdots=\mu_{k}$ vs H$_{a}:$ not all equal
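
The k-sample version can be sketched with a one-way ANOVA; again, all group means and sample sizes are illustrative assumptions:

```python
# Sketch: one-way ANOVA (k-sample test) of H0: mu1 = ... = muk vs Ha: not all equal.
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(1)
groups = [rng.normal(loc=m, scale=30, size=20) for m in (1500, 1510, 1530)]

f_stat, p_value = f_oneway(*groups)
print(round(f_stat, 2), round(p_value, 3))
```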


NOTE: a null hypothesis may always turn out to be WRONG and may have to be rejected, but it is never accepted.


never accept a null hypothesis.


say "we do not reject H$_{0}$" instead of "we accept H$_{0}$".