P-value in social science research

Kirils Makarovs, PhD Candidate

University of Essex, Department of Sociology

03/09/2020

Outline

Sample vs. general population

Why bother with p-values at all?

General population is the 'universe' of objects (people) that we are interested in, e.g. the entire population of the UK

Usually not available to a researcher

Sample is a subset of the objects drawn from the general population

How can we be sure that anything that we find in sample data actually holds in the general population?

That’s the whole point of inferential statistics aka ‘strategies for guessing’!

Sample statistics \(\longrightarrow\) Population parameters

The sample should be representative - a topic for another day

Sampling distribution - 1

However, the statistics (e.g. the mean) that you get in your sample are subject to variability

Draw another sample and \(\bar{\mathrm{x}}\) will be somewhat different

Thought experiment: Say we research how many hours, on average, Brits watch TV per day, and we draw a myriad of samples of the same size

Every time we calculate the mean hours, the result is a bit different:

\(\bar{\mathrm{x}}_1 = 2.5\), \(\bar{\mathrm{x}}_2 = 2.8\), \(\bar{\mathrm{x}}_3 = 3.6\), \(\dots\), \(\bar{\mathrm{x}}_\infty = 2.4\) - the distribution of all these sample means is called the sampling distribution!
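The thought experiment can be imitated on a computer. The sketch below is only an illustration: the 'population' is simulated, and the gamma distribution, its parameters, the sample size of 200 and the helper name `draw_sample_mean` are all assumptions made for the example, not real data on TV watching.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical "general population": daily TV hours for 100,000 people.
# The gamma shape/scale values are illustrative assumptions, not real data.
population = [random.gammavariate(4.0, 0.7) for _ in range(100_000)]
population_mean = statistics.fmean(population)


def draw_sample_mean(sample_size):
    """Draw one random sample (without replacement) and return its mean."""
    sample = random.sample(population, sample_size)
    return statistics.fmean(sample)


# Repeat the same survey many times with the same sample size.
sample_means = [draw_sample_mean(200) for _ in range(2_000)]

print("population mean:", round(population_mean, 2))
print("first three sample means:", [round(m, 2) for m in sample_means[:3]])
print("mean of the sample means:", round(statistics.fmean(sample_means), 2))
print("spread (SD) of the sample means:", round(statistics.stdev(sample_means), 3))
```

Each run of `draw_sample_mean` plays the role of one survey. The list `sample_means` is a finite approximation of the sampling distribution: the individual means differ from sample to sample, yet they cluster around the mean of the simulated population.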

Two helpful facts about the sampling distribution:

Sampling distribution - 2

We won't go deep into the properties of the sampling distribution.

What we need to know is:

However, how does one get from sample statistics to population parameters?

Via hypothesis testing!

Statistical hypotheses - 1

What is a statistical hypothesis?

It's a statement about population parameters which we are able to test with our sample data

We generally distinguish between two statistical hypotheses: the null hypothesis \(H_0\) and the alternative hypothesis \(H_a\)

Note: when running statistical tests, a researcher treats \(H_0\) as the baseline state of the world and asks whether the sample data give enough evidence to reject it in favour of \(H_a\)!

Statistical hypotheses - 2

Say in your sample data you’ve found out that:

Is this all? No!

What do the results obtained on your sample data say about the general population?

More precisely: what is the probability of observing a difference between men and women at least as large as the one in my sample data, if there were actually no difference in the population from which this sample is drawn?

Put another way: how likely is it that chance alone (i.e. sampling error) would produce results like mine if there were no real difference in the population?

This is where the p-value kicks in!

P-value - 1

The p-value quantifies the evidence against the null hypothesis

That is: it is the probability of obtaining results at least as extreme as the ones we got in our sample data, assuming that \(H_0\) holds true in the general population

Like any probability, the p-value lies within \([0; 1]\)

The lower the p-value, the less compatible the sample data are with \(H_0\), and the stronger the evidence against it!

Conventionally, a p-value lower than \(0.05\) \((5\%)\) is taken as enough evidence to reject \(H_0\), i.e. to say that the difference/effect found in your sample analysis also holds in the general population

p-value \(< 0.05\) - reject \(H_0\) in favour of \(H_a\)

p-value \(\geq 0.05\) - do not reject (keep) \(H_0\)
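To see where such a number can come from, here is a minimal sketch of one way to obtain a p-value: a permutation test for the difference in mean TV hours between men and women. The data are made up for illustration, the function name `permutation_p_value` is hypothetical, and the permutation test is just one convenient choice for a demonstration, not necessarily the test you would use in practice.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical sample data: daily TV hours for men and women (made-up numbers).
men = [3.1, 2.4, 3.8, 2.9, 3.5, 4.0, 2.7, 3.3, 3.6, 2.8]
women = [2.6, 2.2, 3.0, 2.5, 2.9, 3.1, 2.1, 2.7, 2.4, 3.2]

observed_diff = statistics.fmean(men) - statistics.fmean(women)


def permutation_p_value(group_a, group_b, n_permutations=10_000):
    """Two-sided p-value: the share of random label shuffles that give a
    mean difference at least as large (in absolute value) as the observed one."""
    observed = abs(statistics.fmean(group_a) - statistics.fmean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        random.shuffle(pooled)  # under H0 the group labels are interchangeable
        diff = statistics.fmean(pooled[:n_a]) - statistics.fmean(pooled[n_a:])
        if abs(diff) >= observed:
            at_least_as_extreme += 1
    return at_least_as_extreme / n_permutations


p_value = permutation_p_value(men, women)
decision = "reject H0" if p_value < 0.05 else "do not reject H0"
print(f"observed difference: {observed_diff:.2f} hours")
print(f"p-value: {p_value:.4f} -> {decision}")
```

Shuffling the group labels imitates a world in which \(H_0\) is true (gender makes no difference), so the share of shuffles that produce a difference as large as the observed one approximates the p-value defined above.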

P-value - 2

Significance testing, like any other statistical instrument, should be used thoughtfully

The p-value itself does not tell you:

The p-value itself does tell you:

Getting a p-value of e.g. \(0.03\) \((3\%)\) and rejecting \(H_0\) still means that you would have gotten results at least as extreme in \(3\%\) of samples drawn from a population in which \(H_0\) (not \(H_a\)) holds true!
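This last point can be checked with a small simulation. The sketch below rests on stated assumptions: a simulated normal population in which \(H_0\) is true by construction (mean of 3.0 hours, known standard deviation of 1.0), a simple two-sided z-test chosen only because it is easy to write with the standard library, and hypothetical names such as `z_test_p_value`.

```python
import math
import random
from statistics import NormalDist, fmean

random.seed(7)  # fixed seed so the sketch is reproducible

# Assumed population in which H0 is true: mean 3.0 hours, known SD 1.0.
H0_MEAN, KNOWN_SD, SAMPLE_SIZE = 3.0, 1.0, 50
standard_normal = NormalDist()


def z_test_p_value(sample):
    """Two-sided p-value of a z-test of H0: population mean == H0_MEAN."""
    standard_error = KNOWN_SD / math.sqrt(len(sample))
    z = (fmean(sample) - H0_MEAN) / standard_error
    return 2 * (1 - standard_normal.cdf(abs(z)))


# Run the same "study" many times on fresh samples from the H0 population.
n_studies = 10_000
p_values = []
for _ in range(n_studies):
    sample = [random.gauss(H0_MEAN, KNOWN_SD) for _ in range(SAMPLE_SIZE)]
    p_values.append(z_test_p_value(sample))

share_below_003 = sum(p <= 0.03 for p in p_values) / n_studies
print(f"share of studies with p <= 0.03 although H0 is true: {share_below_003:.3f}")
```

Even though there is nothing to find in this simulated population, roughly 3 in 100 studies still end up with a p-value of 0.03 or lower, which is why a single 'significant' result should be read with care.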

Wrap up