Stage 5- Analyse and interpret the data

Analysing the data

Qualitative

No use of statistics
There may be no need for statistics when the pattern is obvious, e.g. on scatter graphs or when comparing graphs

Quantitative

Use quantitative analysis to be objective, to know the exact level or direction of a relationship/trend, to know the exact pattern and to identify anomalies
Allows the analysis to go further

1 of 26

Types of quantitative data analysis

Descriptive

Descriptive techniques- central tendency, e.g. mean, mode (most common), median (middle value), scatter graphs
Deviation from the central tendency- range, interquartile range, standard deviation
Frequencies (graph)- Kurtosis, skew

Normal distribution

Mean, mode and median coincide- 95% of values within 2 standard deviations of mean
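
The 95% figure can be checked with a short Python sketch. It uses the standard library's NormalDist; the helper name share_within is an assumption made for illustration and is not part of the notes.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0.0, sigma=1.0)

def share_within(k: float) -> float:
    """Proportion of values lying within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

within_two_sd = share_within(2.0)    # a little over 95%
within_196_sd = share_within(1.96)   # almost exactly 95%
print(round(within_two_sd, 3), round(within_196_sd, 3))
```

Two standard deviations is a rounded rule of thumb; the exact 95% boundary sits at about 1.96 standard deviations.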

Types of data pattern

Horizontal e.g. nearest neighbour- clustered, regular or scattered
Vertical e.g. Lorenz curve- cumulative frequency
Networks- beta index, alpha index, centrality
Interactions e.g. gravity model

2 of 26

Types of quantitative data analysis

Tests

Statistical test for difference e.g. Mann Whitney, Chi-squared
Statistical test for correlation or association e.g. Spearman's rank, Chi-squared
In tests remember that the null hypothesis and the degrees of freedom influence the result, and that 95% is the minimum accepted confidence level (a 0.05 significance level). Critical values are found from tables of significance

Interpretation

Look at results in context of original question
Try to offer explanations for patterns/links/trends and any anomalies

3 of 26

Central tendency- mean

Measures of central tendency represent data sets by a middle value around which other values cluster

Mean

Meaning- add the quantities together (sum of values) and divide by the number of quantities
Most widely used, features in other statistical calculations
Limitations- Distorted by extreme values and involves calculation, mean weights each value according to its magnitude, different distributions give similar mean values
The mean provides an accurate summary where data have a normal distribution and a narrow range of values but is often unrepresentative when distributions are skewed

4 of 26

Central tendency- median and mode

Median

Meaning- the central value when values are put in order
The median gives equal weight to each value, meaning it is a more representative measure than the mean for data sets that are skewed
Limitations- gives no idea of other values
Wildly different data sets can give similar median values
Cannot be used in any further statistical calculation because it has no true mathematical properties

Mode

Meaning- The value that occurs most frequently
Limitations- Gives no idea of other values
For grouped data, depends entirely on the arbitrary choice of class interval
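
A minimal Python sketch of the three measures, assuming a small made-up data set with one extreme value (the numbers are hypothetical and chosen only to show how skew affects each measure):

```python
from statistics import mean, median, mode

# Hypothetical, positively skewed sample (e.g. pebble lengths in mm).
values = [2, 3, 3, 4, 5, 6, 40]

sample_mean = mean(values)      # pulled upwards by the extreme value 40
sample_median = median(values)  # middle value of the ordered data
sample_mode = mode(values)      # most frequent value
print(sample_mean, sample_median, sample_mode)
```

Here the mean (9) is larger than six of the seven values, while the median (4) and mode (3) stay close to where most values cluster.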

5 of 26

Dispersion

Range

Meaning- difference between highest and lowest values
Limitations- Distorted by extreme values, only uses two values in the data set

Interquartile range

Meaning- The spread of the middle half of the data around the median, i.e. the difference between the upper and lower quartiles
Used alongside the median as a statement of dispersion
The ordered data set is split into four equal parts by the quartiles: the upper quartile is the boundary for the highest 25% of values, the lower quartile is the boundary for the lowest 25% of values, and the difference between the upper and lower quartiles is the interquartile range
Limitations- Ignores values above and below this
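
The range and interquartile range can be sketched in Python as follows. This assumes one common textbook convention for quartiles (the medians of the lower and upper halves, leaving out the overall median when the number of values is odd); other conventions give slightly different answers. The function name and data are hypothetical.

```python
from statistics import median

def quartiles(values):
    """Lower and upper quartiles as the medians of the lower and upper halves.

    Assumption: the overall median is left out of both halves when the
    number of values is odd. Needs at least two values.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    lower_half = ordered[:half]
    upper_half = ordered[-half:]
    return median(lower_half), median(upper_half)

values = [4, 7, 9, 11, 12, 15, 18, 21, 60]  # hypothetical data with one extreme value
data_range = max(values) - min(values)
lower_q, upper_q = quartiles(values)
interquartile_range = upper_q - lower_q
print(data_range, lower_q, upper_q, interquartile_range)
```

The extreme value 60 inflates the range to 56, whereas the interquartile range (11.5) is unaffected by it.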

6 of 26

Dispersion- Standard deviation

Standard deviation

Meaning- shows spread of all values around the mean
Incorporates all the values in a data set
The standard deviation has a precise relationship with data sets which follow a normal frequency distribution
To convert values into a unit of standard deviation we subtract the mean and then divide by the standard deviation
- standard deviate= (value-mean) / standard deviation
Limitations- Uses a formula and calculation; distorted by extreme values

Coefficient of variation

The value of standard deviation is strongly influenced by the magnitude of the mean
This is a problem when comparing dispersion in two different data sets with very different means
Using the coefficient of variation overcomes this problem as it expresses the standard deviation as a percentage of the mean
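
A short Python sketch of the standard deviate and the coefficient of variation. It assumes the population form of the standard deviation (dividing by n); the function names and data are hypothetical.

```python
from statistics import mean, pstdev

def coefficient_of_variation(values):
    """Standard deviation expressed as a percentage of the mean."""
    return pstdev(values) / mean(values) * 100

def standard_deviate(value, values):
    """How many standard deviations a value lies from the mean."""
    return (value - mean(values)) / pstdev(values)

# Hypothetical data sets with the same relative spread but very different means.
small = [2, 4, 4, 4, 5, 5, 7, 9]
large = [v * 100 for v in small]

print(pstdev(small), pstdev(large))   # standard deviations differ by a factor of 100
print(coefficient_of_variation(small), coefficient_of_variation(large))
print(standard_deviate(9, small))
```

The two standard deviations are very different, but the coefficient of variation is 40% for both sets, which is what makes it useful for comparing dispersion between data sets with different means.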

7 of 26

Tests for differences

Student's t-test

Compares the arithmetic means of two samples to determine the likelihood that any difference could have occurred by chance
It is a parametric test meaning it should only be applied where samples are derived from populations that have a normal frequency distribution

Mann-Whitney U statistic

Meaning- Compares the medians and ranks of two data sets to see if they differ
Makes no assumptions about the normality of the population from which sample data are drawn
It can be applied to small data sets, data measured on an ordinal scale and to data sets containing unequal numbers of values
Limitations- Uses a formula, calculation and significance table, can only be applied to two data sets

8 of 26

Tests for difference

Chi-squared

Meaning- Compares observed and expected frequencies, used to determine whether an observed frequency distribution differs significantly from the frequencies that might be expected if the distribution were random
Limitations- Uses a formula, calculation and significance table; it is not always clear how the expected frequencies should be determined
There are two versions of the chi-squared test: one sample version and a test for two or more sample distributions. Conditions apply:
- data are in frequencies- test is invalid for percentages or proportions
- there should not be many categories for which expected frequencies are small

9 of 26

Calculating the U statistic

1. Arrange the values in the two data sets in rank order of size for both samples together
2. Where the values are tied the mean ranking is used
3. Sum the rank values for each of the data sets separately and then calculate the U statistic using the equations. The smaller of the two values for U is used in statistical tables to estimate significance
U is significant if it is less than or equal to the critical value listed in the tables
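
The steps above can be sketched in Python. The notes do not reproduce the equations, so this assumes the standard textbook form U = n1 x n2 + n(n + 1) / 2 - (sum of ranks), calculated once for each sample. The function names and sample data are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1 (smallest) upwards; tied values share the mean rank."""
    ordered = sorted(values)
    ranks = {}
    for value in set(ordered):
        first = ordered.index(value) + 1        # first rank position taken by this value
        count = ordered.count(value)            # number of tied values
        ranks[value] = first + (count - 1) / 2  # mean of the tied rank positions
    return ranks

def mann_whitney_u(sample_a, sample_b):
    """Return the smaller of the two U values for two samples."""
    ranks = average_ranks(list(sample_a) + list(sample_b))
    n_a, n_b = len(sample_a), len(sample_b)
    rank_sum_a = sum(ranks[v] for v in sample_a)
    rank_sum_b = sum(ranks[v] for v in sample_b)
    u_a = n_a * n_b + n_a * (n_a + 1) / 2 - rank_sum_a
    u_b = n_a * n_b + n_b * (n_b + 1) / 2 - rank_sum_b
    return min(u_a, u_b)

# Hypothetical samples, e.g. pebble sizes at two sites.
site_a = [3, 4, 2, 6, 2, 5]
site_b = [9, 7, 5, 10, 6, 8]
u = mann_whitney_u(site_a, site_b)
print(u)
```

For these samples the rank sums are 23 and 55, giving U values of 34 and 2; the smaller value, 2, is the one taken to the significance tables.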

10 of 26

Calculating the one-sample chi-squared

1. If we assume the distribution is random we can generate an expected distribution based on frequency
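2. The formula is then used to calculate chi-squared: the sum of (observed - expected)² / expected across all categories
3. The significance of the chi-squared is found in statistical tables. For the one-sample test the degrees of freedom are the number of categories minus one. For the two-sample version they are obtained by multiplying the number of columns (k) minus one, by the number of rows (r) minus one.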
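
A minimal Python sketch of the one-sample calculation, assuming the usual formula (sum of (O - E)² / E) and assuming a random distribution means equal expected frequencies in every category. For a single row of categories the degrees of freedom are taken as the number of categories minus one. The counts are hypothetical.

```python
def chi_squared(observed, expected):
    """Sum of (O - E)^2 / E across all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts of settlements on four rock types.
observed = [30, 14, 10, 6]

# Under a random (even) distribution each category expects the same frequency.
expected = [sum(observed) / len(observed)] * len(observed)

statistic = chi_squared(observed, expected)
degrees_of_freedom = len(observed) - 1   # categories minus one for a single row
print(round(statistic, 2), degrees_of_freedom)
```

The statistic and degrees of freedom are then compared with the critical value in a chi-squared significance table.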

11 of 26

Calculating the two-sample chi-squared

1. Sum the values in each row and each column, and find the total number of values in the data set
2. Calculate the expected frequencies for each cell by multiplying its row total by its column total and dividing by the total number of values
3. Substitute the observed and expected values for each cell in the chi-squared formula and sum the results
The significance of the chi-squared value is checked in statistical tables
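
The three steps can be sketched in Python, assuming the usual formula (sum of (O - E)² / E over every cell). The function names and the table of counts are hypothetical.

```python
def expected_frequencies(table):
    """Expected cell counts: row total x column total / grand total."""
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in column_totals] for r in row_totals]

def chi_squared_table(table):
    """Sum (O - E)^2 / E over every cell of a contingency table."""
    expected = expected_frequencies(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )

# Hypothetical observed counts: rows are two samples, columns are three categories.
observed = [[20, 30, 50],
            [30, 30, 40]]

expected = expected_frequencies(observed)
statistic = chi_squared_table(observed)
degrees_of_freedom = (len(observed) - 1) * (len(observed[0]) - 1)
print(expected, round(statistic, 2), degrees_of_freedom)
```

With two rows and three columns there are (2 - 1) x (3 - 1) = 2 degrees of freedom.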

12 of 26

Tests for association

Spearman's rank

Meaning- measures strength of relationship between two sets of ranked data
Limitation- only uses ranks of data and uses a significance table

Chi-squared

Meaning- compares observed frequencies with the frequencies expected under a given hypothesis
Limitations- Uses a formula and calculation and significance table, how is expected frequency determined?

13 of 26

Trends and relationships

Often shown on graphs, especially scatter graphs, which tell you:

The direction of trend/relationship- positive, negative or neutral
The strength of trend/relationship- strong, weak or non-existent
The shape of the trend/relationship- linear, parabolic, exponential, unclear
If there are values that are anomalies

14 of 26

Spearman's rank correlation

Non-parametric test so has the advantage of being distribution free
1. Rank the values of x and of y separately from 1 (largest) to n (smallest), where n is the number of pairs, e.g. 12. If two values are equal, allocate them the same average rank
2. For each pair of values find the difference in rank between them (d) and square each difference (d²)
3. Sum the square of the differences
4. Complete the calculation of the Spearman correlation using the formula
5. The significance of the correlation coefficient is obtained from tables
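
Steps 1 to 4 can be sketched in Python. The notes do not reproduce the formula, so this assumes the standard form rs = 1 - 6 x (sum of d²) / (n³ - n). The function names and paired data are hypothetical.

```python
def rank_descending(values):
    """Rank from 1 (largest) downwards; tied values share the mean rank."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 + (ordered.count(v) - 1) / 2 for v in values]

def spearman_rank(x, y):
    """Spearman's rank correlation from the sum of squared rank differences."""
    n = len(x)
    rank_x, rank_y = rank_descending(x), rank_descending(y)
    sum_d_squared = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - (6 * sum_d_squared) / (n ** 3 - n)

# Hypothetical paired data, e.g. distance from the centre (x) and land value (y).
x = [1, 2, 3, 4, 5, 6]
y = [90, 70, 80, 40, 50, 10]
rs = spearman_rank(x, y)
print(round(rs, 3))
```

Here the sum of d² is 66, giving a coefficient of about -0.886, a strong negative correlation whose significance would still need to be checked in tables for n = 6.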

15 of 26

Pearson's product moment correlation

Its outcome is a coefficient of correlation that has exactly the same properties as the Spearman's rank correlation coefficient
It is a parametric test so should only be used when sample data are drawn from a statistical population that has a normal distribution

16 of 26

Testing relationship between data sets

Correlation

Correlation measures the statistical association between two variables, x and y.
Variable x is known as the independent variable and is assumed to be responsible for changes in the dependent variable y
Identifying independent and dependent variables is not always straightforward

Correlation coefficients

Correlation coefficients measure the strength of a relationship or association between two variables
They vary on a scale of +1 to -1, where +1 is a perfect positive correlation and -1 perfect negative or inverse correlation. A correlation coefficient close to 0 suggests little or no relationship

17 of 26

Patterns

These are often shown on maps. This also covers morphology (shape). Patterns can be:

Nucleated or clustered together
Linear
Cuneiform (wedge-shaped) or cruciform (cross-shaped)
Regular
Concentric
Random or scattered/dispersed or amorphous

18 of 26

Networks

In most forms of network analysis there are some key limitations:

They treat all routes equally, regardless of their quality
They focus on linkages rather than time or distance of journeys
They look at planar (flat) networks- no flyovers etc
They ignore who uses that route

19 of 26

Network analysis

Any network can be broken down into its main elements. Different terms can be used for the same thing which can cause confusion

Centres in the network are called nodes or vertices (V)
The routes are called routes or edges (E)
Independent/unconnected parts are called sub-graphs (G)
A completely linked set of nodes and routes is called a circuit

Simplest measure of networks is the Beta index = E / V. A network with one complete circuit will give a score of 1. The maximum result possible is 3

More complex measure is the Alpha index, which compares the actual number of circuits with the maximum possible within the network: Alpha index = (E - V + G) / (2V - 5)

Another measure is centrality which tells us how central or accessible a place is in the network.
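
A short Python sketch of the two indices, assuming the alpha index is read as (E - V + G) / (2V - 5) for a planar network. The function names and the example network are hypothetical.

```python
def beta_index(edges: int, vertices: int) -> float:
    """Beta index: number of edges divided by number of vertices."""
    return edges / vertices

def alpha_index(edges: int, vertices: int, subgraphs: int = 1) -> float:
    """Alpha index: actual circuits (E - V + G) over maximum circuits (2V - 5)."""
    return (edges - vertices + subgraphs) / (2 * vertices - 5)

# Hypothetical planar network: 6 nodes, 8 routes, all in one connected graph.
beta = beta_index(edges=8, vertices=6)
alpha = alpha_index(edges=8, vertices=6, subgraphs=1)
print(round(beta, 2), round(alpha, 2))
```

A simple ring of 6 nodes and 6 routes gives a beta index of exactly 1, matching the single-circuit case described above.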

20 of 26

Caution

When using statistical tests there are some key things you must remember:

To state null hypothesis
To state your alternative hypothesis (accepted if the null hypothesis is rejected)
To calculate the degrees of freedom
To know the level of significance you are prepared to accept

21 of 26

Inferential statistics

Are used to infer population values from sample values. This leads us to the concept of statistical significance and the probability that the outcomes of investigation based on sample data are due to chance

Standard error of the mean

Used to assess the value of the population mean from sample data sets
The logic is that if you took a large number of samples from a population, calculated the mean from each sample and then plotted them as a frequency curve, they would follow a normal distribution
Enables us to estimate the limits of the population mean because its relationship to the sampling distribution is the same as standard deviation to the normal frequency distribution
Standard error is inversely related to the square root of sample size: standard error = standard deviation / √n
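
A minimal Python sketch of the standard error of the mean, assuming the usual form (sample standard deviation divided by the square root of the sample size) and using roughly 2 standard errors either side of the sample mean as approximate 95% limits. The function name and sample are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def standard_error(sample_sd: float, n: int) -> float:
    """Standard error of the mean: standard deviation / square root of n."""
    return sample_sd / sqrt(n)

sample = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical sample values
se = standard_error(stdev(sample), len(sample))

# Approximate 95% limits for the population mean (about 2 standard errors).
lower = mean(sample) - 2 * se
upper = mean(sample) + 2 * se
print(round(se, 3), round(lower, 2), round(upper, 2))

# Quadrupling the sample size halves the standard error.
print(standard_error(10, 25), standard_error(10, 100))
```

For very small samples the limits are only a rough guide, since the 2 standard error rule relies on the normal distribution.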

Standard error of the percentage

Often used when estimating the proportions of land-use types in an area from a sample
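
The notes do not give the formula; a sketch in Python, assuming the standard form: square root of (p x q / n), where p is the sample percentage, q = 100 - p and n is the number of sample points. The function name and figures are hypothetical.

```python
from math import sqrt

def standard_error_of_percentage(p: float, n: int) -> float:
    """sqrt(p * q / n), with p the sample percentage and q = 100 - p."""
    q = 100 - p
    return sqrt(p * q / n)

# Hypothetical survey: 40% of 100 sample points fall on woodland.
se_percent = standard_error_of_percentage(40, 100)
lower = 40 - 2 * se_percent   # approximate 95% limits
upper = 40 + 2 * se_percent
print(round(se_percent, 2), round(lower, 1), round(upper, 1))
```

On these figures the true woodland share would be estimated at roughly 30% to 50%, so a larger sample would be needed for a tighter estimate.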

22 of 26

Coefficient of determination

The coefficient of determination is the product moment correlation coefficient squared, expressed as a percentage
It measures the statistical variation in y 'explained' by x
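
A short Python sketch, assuming the usual definition of the product moment correlation coefficient (covariation of x and y divided by the square root of the product of their squared deviations). The function name and data are hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson's product moment correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x)
    spread_y = sum((b - mean_y) ** 2 for b in y)
    return covariation / sqrt(spread_x * spread_y)

# Hypothetical paired data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)
determination_percent = r ** 2 * 100   # coefficient of determination as a percentage
print(round(r, 3), round(determination_percent, 1))
```

Here r is about 0.775, so roughly 60% of the variation in y is statistically 'explained' by x, leaving about 40% unexplained.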

23 of 26

Simple linear regression

Simple linear regression, involving two variables, x and y, is a technique for fitting a straight line to points on a scatter chart
The regression line is known as 'least squares' because it minimises the sum of the squares of the deviations from the line, and is statistically the 'best fit'
Regression allows us to predict a value of y from a known value of x
A regression equation provides us with a precise model of the relationship between two variables and allows us to make comparisons with the same variables in other geographical locations
Linear regression models are inappropriate where data trends are curvilinear
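
A minimal Python sketch of a least-squares line y = a + bx, assuming the standard formulas (slope = covariation of x and y divided by the squared deviations of x; intercept = mean of y minus slope x mean of x). The function name and data are hypothetical.

```python
def least_squares_line(x, y):
    """Return (slope, intercept) of the least-squares regression line."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (
        sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
        / sum((a - mean_x) ** 2 for a in x)
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical paired data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

slope, intercept = least_squares_line(x, y)
predicted_y = intercept + slope * 6   # predict y for a known x of 6
print(round(slope, 2), round(intercept, 2), round(predicted_y, 2))
```

Predicting y for x = 6 lies just outside the observed range of x, so the prediction assumes the linear trend continues.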

24 of 26

Spatial statistics

Index of dissimilarity

The index of dissimilarity is usually applied to the study of segregation among ethnic groups
It measures the unevenness with which two groups are distributed within small spatial units such as wards or census tracts
The index ranges from 0 to 1. The higher the score the more segregated the groups are
An index of zero means that the proportion of group B's population in each census tract is exactly the same as the proportion of group W's population
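
The notes do not give the formula; a Python sketch assuming the standard form: half the sum, over all spatial units, of the absolute difference between each unit's share of group B and its share of group W. The function name and counts are hypothetical.

```python
def index_of_dissimilarity(group_b, group_w):
    """0.5 * sum of |b_i / B - w_i / W| over all spatial units."""
    total_b, total_w = sum(group_b), sum(group_w)
    return 0.5 * sum(
        abs(b / total_b - w / total_w) for b, w in zip(group_b, group_w)
    )

# Hypothetical population counts per census tract.
even = index_of_dissimilarity([10, 20, 30], [100, 200, 300])   # same proportions everywhere
separate = index_of_dissimilarity([100, 0], [0, 100])          # groups never share a tract
mixed = index_of_dissimilarity([80, 20], [20, 80])
print(even, separate, round(mixed, 2))
```

The three cases give 0 (no segregation), 1 (complete segregation) and 0.6 (partial segregation).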

25 of 26

Nearest neighbour analysis and location quotients

Technique measuring point patterns in space
Gives precise descriptions to rural settlement patterns
The nearest neighbour index ranges from 0, where all the points form a single cluster, to 2.15, which is a perfectly uniform pattern
The technique is based on finding the average distance between points and their nearest neighbour. Taking each point in turn, the distance to the nearest neighbouring point is measured using the formula
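
The formula is not reproduced in the notes; a Python sketch assuming the standard form Rn = 2 x mean nearest-neighbour distance x square root of (n / A), where n is the number of points and A is the study area. The function name and coordinates are hypothetical.

```python
from math import dist, sqrt

def nearest_neighbour_index(points, area):
    """Rn = 2 * mean nearest-neighbour distance * sqrt(n / area)."""
    n = len(points)
    nearest = [
        min(dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ]
    mean_distance = sum(nearest) / n
    return 2 * mean_distance * sqrt(n / area)

# Hypothetical settlements on a regular square grid, 1 km apart, in a 3 km x 3 km area.
grid = [(x + 0.5, y + 0.5) for x in range(3) for y in range(3)]
regular_rn = nearest_neighbour_index(grid, area=9)

# Hypothetical tight cluster of four settlements in a 10 km x 10 km area.
cluster = [(0, 0), (0, 0.1), (0.1, 0), (0.1, 0.1)]
clustered_rn = nearest_neighbour_index(cluster, area=100)
print(round(regular_rn, 2), round(clustered_rn, 2))
```

The result is sensitive to how the study area is drawn, so the boundary should be chosen and stated before calculating.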

Location quotients

Most often used to measure the concentration of an economic activity in an area or region compared to the national average
A location quotient of 1 shows that the activity is represented in exactly the same proportion as nationally
Less than 1 suggests that the activity is less important locally than nationally
More than 1 indicates that the activity is more important locally compared to the national average
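
A minimal Python sketch, assuming the location quotient is the local share of employment in an activity divided by the national share. The function name and employment figures are hypothetical.

```python
def location_quotient(local_activity, local_total, national_activity, national_total):
    """Local share of an activity divided by its national share."""
    local_share = local_activity / local_total
    national_share = national_activity / national_total
    return local_share / national_share

# Hypothetical figures: 200 of 1,000 local jobs and 1,000 of 10,000 national jobs
# are in the activity being studied.
lq = location_quotient(200, 1_000, 1_000, 10_000)
print(round(lq, 2))
```

A value of 2 means the activity accounts for twice the share of local employment that it does nationally, i.e. it is concentrated in the area.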

26 of 26
