Kirils Makarovs, PhD Candidate
Why bother with p-values at all?
The general population is the 'universe' of objects (people) that we are interested in, e.g. the entire population of the UK
Usually not available to a researcher
A sample is a subset of the objects drawn from the general population
How can we be sure that anything that we find in sample data actually holds in the general population?
That’s the whole point of inferential statistics aka ‘strategies for guessing’!
Sample statistics \(\longrightarrow\) Population parameters
The sample should be representative - a topic for another day
However, the statistics (e.g. mean) that you get in your sample are subject to variability
Draw a sample one more time and the \(\bar{\mathrm{x}}\) will be somewhat different
Thought experiment: Say we research how many hours, on average, Brits watch TV per day, and we draw a myriad of samples of the same size
Every time we calculate the mean hours, it turns out to be a bit different:
\(\bar{\mathrm{x}}_1 = 2.5\), \(\bar{\mathrm{x}}_2 = 2.8\), \(\bar{\mathrm{x}}_3 = 3.6\), \(\dots\), \(\bar{\mathrm{x}}_\infty = 2.4\) - the distribution of all these sample means is called the sampling distribution!
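The thought experiment can be mimicked on a computer. The sketch below is only an illustration: the population of TV hours is made up (the mean of 2.8 and spread of 1.5 are assumptions, not real data), and `simulate_sample_means` is a hypothetical helper name.

```python
import random
import statistics

def simulate_sample_means(population, sample_size, n_samples, seed=0):
    """Draw n_samples random samples and return the mean of each one."""
    rng = random.Random(seed)
    return [
        statistics.mean(rng.sample(population, sample_size))
        for _ in range(n_samples)
    ]

# Hypothetical population: daily TV hours for 100,000 people (made-up numbers).
rng = random.Random(42)
population = [max(0.0, rng.gauss(2.8, 1.5)) for _ in range(100_000)]

# Many samples of the same size, one mean per sample.
sample_means = simulate_sample_means(population, sample_size=100, n_samples=2_000)

print("first three sample means:", [round(m, 2) for m in sample_means[:3]])
print("population mean:", round(statistics.mean(population), 2))
print("mean of the sample means:", round(statistics.mean(sample_means), 2))
print("spread (SD) of the sample means:", round(statistics.stdev(sample_means), 3))
```

The list `sample_means` is a finite approximation of the sampling distribution: each sample gives a slightly different mean, but the means cluster around the population mean.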
Two helpful facts about sampling distribution:
Without going deep into the properties of the sampling distribution...
What we need to know is:
The sampling distribution can be wider or narrower (depending on its standard deviation, which is called the standard error in this case)
You can decrease the standard error by:
increasing the sample size! (larger sample \(\longrightarrow\) more precise sample estimate)
decreasing the standard deviation in the sample (not in your control, really)
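For the sample mean, the standard error is commonly estimated as \(s / \sqrt{n}\), where \(s\) is the sample standard deviation and \(n\) the sample size. A minimal sketch of how it shrinks as the sample grows, again using a made-up population of TV hours (all numbers are assumptions for illustration):

```python
import math
import random
import statistics

def standard_error(sample):
    """Estimated standard error of the mean: s / sqrt(n)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

rng = random.Random(1)
# Hypothetical population of daily TV hours (made-up numbers).
population = [max(0.0, rng.gauss(2.8, 1.5)) for _ in range(100_000)]

standard_errors = {}
for n in (25, 100, 400, 1600):
    sample = rng.sample(population, n)
    standard_errors[n] = standard_error(sample)
    print(f"n = {n:>4}: standard error = {standard_errors[n]:.3f}")
```

Because of the square root, quadrupling the sample size only roughly halves the standard error.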
However, how does one get from sample statistics to population parameters?
Via hypothesis testing!
What is a statistical hypothesis?
It's a statement about population parameters which we are able to test with our sample data
We generally distinguish between two statistical hypotheses:
The null hypothesis \((H_0)\): no difference between groups / some value (mean, correlation coefficient, etc.) is not different from \(0\) in the general population
The alternative hypothesis \((H_a)\): there is a difference between groups / some value (mean, correlation coefficient, etc.) is different from \(0\) in the general population
Note: when running statistical tests, a researcher considers \(H_0\) to be the baseline condition of the world and thus attempts to reject \(H_0\) in favour of \(H_a\)!
Say in your sample data you’ve found out that:
On average, women watch TV 2.3 hours more than men
There is a relationship between stress level and smoking (e.g. \(r = 0.4\))
Is this all? No!
What do the results obtained on your sample data say about the general population?
More precisely: what is the probability of observing a difference between men and women at least as large as the one in my sample data, if there actually was no difference in the population from which this sample is drawn?
Put another way: how likely is it that chance alone (i.e. sampling error) would produce results like the ones I obtained in my sample?
This is where the p-value kicks in!
p-value quantifies the evidence against the null hypothesis
That is: it shows the probability of obtaining results at least as extreme as those we got in our sample data, assuming that \(H_0\) holds true in the general population
Like any probability, the p-value lies within \([0, 1]\)
The lower the p-value, the less compatible our sample data are with \(H_0\), i.e. the stronger the evidence against it!
Conventionally, a p-value lower than \(0.05\) \((5\%)\) is taken as enough evidence to say that the difference/relationship found in your sample analysis also holds in the general population
p-value \(< 0.05\) - reject \(H_0\) in favour of \(H_a\)
p-value \(\geq 0.05\) - do not reject (keep) \(H_0\)
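One way to see where a p-value comes from is a permutation test. This is not necessarily the test used elsewhere in these slides; it is chosen here only because it makes the logic visible: if \(H_0\) is true, group labels are interchangeable, so we shuffle them and count how often a difference at least as large as the observed one appears. The data and the helper name `permutation_p_value` are made up for illustration.

```python
import random
import statistics

def permutation_p_value(group_a, group_b, n_permutations=5_000, seed=0):
    """Two-sided p-value for a difference in means via a permutation test."""
    rng = random.Random(seed)
    observed = abs(statistics.mean(group_a) - statistics.mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # under H0 the group labels are interchangeable
        diff = abs(statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:]))
        if diff >= observed:
            at_least_as_extreme += 1
    # The +1 counts the observed split itself, so the estimate is never exactly 0.
    return (at_least_as_extreme + 1) / (n_permutations + 1)

# Made-up daily TV hours for two small groups (illustrative only).
women = [4.1, 5.0, 3.2, 6.3, 4.8, 5.5, 3.9, 4.4]
men = [2.0, 3.1, 1.5, 2.8, 2.2, 3.5, 1.9, 2.6]

p = permutation_p_value(women, men)
print(f"observed difference: {statistics.mean(women) - statistics.mean(men):.2f} hours")
print(f"permutation p-value: {p:.4f}")
print("reject H0 at the 5% level" if p < 0.05 else "do not reject H0")
```

With two groups this clearly separated, shuffled labels rarely reproduce the observed gap, so the p-value comes out small.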
Significance testing, like any other statistical instrument, should be used with care
p-value itself does not tell you:
anything about the population per se
the probability of \(H_a\) being true in the general population
the probability of \(H_0\) being true in the general population
anything about the effect size
p-value itself does tell you:
getting a p-value of e.g. \(0.03\) \((3\%)\) and rejecting \(H_0\) in favour of \(H_a\) still means that you would’ve gotten results at least as extreme in \(3\%\) of samples drawn from a population in which \(H_0\) (not \(H_a\)) holds true!
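This last point can be checked by simulation. The sketch below assumes a world where \(H_0\) is true by construction and runs many hypothetical studies, using a simple two-sided z-test with a known population standard deviation (an assumption made to keep the code short). The values of `mu0`, `sigma` and `n` are made up.

```python
import math
import random
from statistics import NormalDist

def z_test_p_value(sample, mu0, sigma):
    """Two-sided p-value for H0: population mean == mu0, with known sigma."""
    sample_mean = sum(sample) / len(sample)
    z = (sample_mean - mu0) / (sigma / math.sqrt(len(sample)))
    return 2 * (1 - NormalDist().cdf(abs(z)))

rng = random.Random(7)
mu0, sigma, n = 2.8, 1.5, 50  # hypothetical values; H0 is true by construction
n_studies = 10_000

# Each "study" draws a fresh sample from a population in which H0 holds.
p_values = [
    z_test_p_value([rng.gauss(mu0, sigma) for _ in range(n)], mu0, sigma)
    for _ in range(n_studies)
]

share_below_003 = sum(p <= 0.03 for p in p_values) / n_studies
share_below_005 = sum(p < 0.05 for p in p_values) / n_studies
print(f"share of studies with p <= 0.03: {share_below_003:.3f}")
print(f"share of studies with p <  0.05: {share_below_005:.3f}")
```

Even though there is no effect at all in this simulated population, a small share of studies still ends up below each threshold purely through sampling error.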