Levels of Measurement
This section will discuss the analysis of quantitative data. The first important consideration is to decide at what level the data are to be measured. The level of data measurement refers to the precision with which the data have been measured.
Nominal data are collected in 'categories'. This involves dividing participants into categories and counting how many are in each category. This is the most basic level of measurement and is really no more than a tally or a head count. An example of nominal data would be to ask all the people in a room to assign themselves to one of two categories, either 'tall' or 'short'. We would then know how many people were in the 'tall' group and how many were in the 'short' group, but we would know little else. We would have a very crude measurement of height.
Ordinal data represent the next level of measurement. Continuing the example, we could ask all the people in the two groups to form a line with the tallest person at the start and the shortest person at the end. In this way we would now know something about the position of the people in the room with regard to height: we would know that the second person was taller than the third, and that the last person was the shortest in the room. This is known as ordinal data, since we now know the rank position of all the participants.
Interval data give the most precise measurement of height: we could measure each person to find their exact height in metres. Now we have the most precise measurement of all, and a lot more information about each person. We could establish that person 2 in the line was exactly 5cm shorter than person 1, and that the tallest person was 50cm taller than the shortest. We could also be confident that the intervals on the ruler were exactly the same distance apart, so that 2cm is exactly double the distance of 1cm.
Measures of Central Tendency: The Mean
Measures of central tendency are used to reduce a set of numerical data down to a single value which represents the whole set.
The mean is the value often referred to as the average. It is what we get if we add up all the scores in a sample and divide the answer by the number of scores. It should only be used on interval data: data obtained by measuring something which has a scale, such as temperature, time or height.
Note that the mean is often not the same value as any of the individual values in the group. It acts like the fulcrum of a balanced see-saw, sitting exactly at the centre of all the deviations from itself. Because every score contributes to it, this makes it the most sensitive measure of central tendency.
This sensitivity can be a disadvantage in certain circumstances. Suppose we add a sixth person's score to our set of anagram-solving times. This person is not very good at anagram solving and had a bad night's sleep. This person stares at the anagram for exactly 600 seconds.
Now the mean value is not representative of the group in general. A single extreme score in one direction has distorted the mean.
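The effect of a single extreme score can be seen directly with a short calculation. This is a minimal sketch in Python using the anagram-solving times given in this section; the standard library's `statistics.mean` does the arithmetic.

```python
from statistics import mean

# Anagram-solving times (seconds) for the original five participants.
times = [95, 109, 121, 135, 140]
print(mean(times))  # the mean is 120

# Adding the sixth, extreme score drags the mean upwards,
# so it no longer represents the group well.
times_with_outlier = times + [600]
print(mean(times_with_outlier))  # the mean is now 200
```

One score more than four times the original mean is enough to shift the mean by 80 seconds, even though five of the six participants are unchanged.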
The median is the central value of a set. If we have an odd number of values, it's easy to find. We simply put all the values in numerical order and find the central value. The scores from the anagram solving exercise were 95, 109, 121, 135, 140. The median is 121.
If there is an even number of values, as with the sixth person's time added, we take the mean of the two central values. For 95, 109, 121, 135, 140, 600, the median is (121 + 135) / 2 = 128.
Notice that this value is still reasonably representative of the group.
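Both cases can be checked with the standard library. This is a small sketch using `statistics.median`, which handles the odd and even cases automatically, with the anagram-solving times from this section.

```python
from statistics import median

# Odd number of values: the median is simply the middle value.
odd_set = [95, 109, 121, 135, 140]
print(median(odd_set))   # 121

# Even number of values: the median is the mean of the two central values.
even_set = [95, 109, 121, 135, 140, 600]
print(median(even_set))  # (121 + 135) / 2 = 128

# Unlike the mean, the median barely moves when the extreme score is added.
```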
Its advantages are that it can be used on interval or ordinal level data (ordinal data are measured on a scale designed for the purposes of your experiment, such as how aggressive a person feels on a scale of 1-10), it is easier to calculate than the mean, and it is unaffected by extreme values in one direction.
Its disadvantages are that it does not take into account the exact value of each item, and that if values are few it can be unrepresentative: with 2, 3, 5, 98, 112 the median would be 5.
The mode is the most frequent, or most common, value. We use this if we have nominal data measured as frequencies, such as the number of men who do the washing up.
If we assign a category number (1 to 8) to the types of play observed in children, we cannot sensibly calculate a mean or median. What we can do is say which type of play was most frequently engaged in. The mode for the set 1, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 5, 5, 5, 6, 6, 7, 7, 7, 8 is 5, as this value occurs most often. The set 7, 7, 7, 8, 8, 9, 9, 9, 10, 10 has two modes, 7 and 9, and is said to be bi-modal.
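Finding the mode, including the bi-modal case, can be sketched with `statistics.multimode` (available from Python 3.8), which returns every value that occurs with the highest frequency. The data are the two sets of category numbers above.

```python
from statistics import multimode

# Category numbers for the types of play observed in children.
play_types = [1, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 5, 5, 5, 6, 6, 7, 7, 7, 8]
print(multimode(play_types))  # [5] -- a single mode

# A set where two values tie for the highest frequency: bi-modal.
bimodal_set = [7, 7, 7, 8, 8, 9, 9, 9, 10, 10]
print(multimode(bimodal_set))  # [7, 9]
```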
Its advantages are that it can be used on interval, ordinal or nominal frequency data, it shows the most frequently occurring value of a set, and it is unaffected by extreme values in one direction.
Its disadvantages are that it does not take into account the exact value of each item, and that it is not useful for relatively small sets of data where several values occur equally frequently.
Measures of Dispersion
A measure of dispersion is another way of summarising data. This measure tells us how 'spread out' the scores are.
The simplest way to express the spread of a set of scores is the range. This tells us over how many numbers a set of scores is spread and is calculated simply by subtracting the smallest value from the largest. The problem with this is that extreme values affect the result.
Set A: 10, 11, 11, 12, 12, 13, 13, 13, 14
Set B: 10, 11, 11, 12, 12, 13, 13, 13, 20
One single figure changes the range from 4 to 10. To overcome this problem, the interquartile range can be calculated. This is calculated using the 25% of scores immediately below the median and the 25% of scores immediately above it. The interquartile range, therefore, measures only the spread of the middle 50% of values when they are placed in numerical order.
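Both measures can be computed directly. This is a sketch using the two sets of scores above; note that `statistics.quantiles` offers more than one method for estimating quartiles, so a hand calculation from a textbook may differ slightly at the boundaries.

```python
from statistics import quantiles

set_a = [10, 11, 11, 12, 12, 13, 13, 13, 14]
set_b = [10, 11, 11, 12, 12, 13, 13, 13, 20]

# Range: largest minus smallest -- one extreme value changes it dramatically.
print(max(set_a) - min(set_a))  # 4
print(max(set_b) - min(set_b))  # 10

# Interquartile range: the spread of the middle 50% of ordered values.
# quantiles(..., n=4) returns the three quartile cut points.
q1, _, q3 = quantiles(set_b, n=4, method='inclusive')
print(q3 - q1)  # 2.0 -- the extreme score of 20 has no effect
```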
The standard deviation is a more sophisticated measure of dispersion, since it is based on the average distance of all the scores from the mean. It is only used when a precise level of measurement, such as interval level, has been used to gather the data.
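As a sketch of the idea, the population standard deviation of the anagram-solving times can be computed with `statistics.pstdev` (the related `statistics.stdev` divides by n - 1 instead, for a sample estimate; which one a textbook uses varies).

```python
from statistics import pstdev, pvariance

times = [95, 109, 121, 135, 140]  # mean is 120

# Deviations from the mean: -25, -11, 1, 15, 20
# Squared deviations:       625, 121, 1, 225, 400 -> sum 1372
print(pvariance(times))  # 1372 / 5 = 274.4
print(pstdev(times))     # square root of the variance, roughly 16.57
```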
Graphs
When you report the results from a piece of research, you will want to present them in a way which shows the overall pattern of your data. Statistics which do this are called descriptive statistics. Graphs show the reader at a glance any patterns found in the data. We will look at the three most common graphs used in psychological research: the bar chart, the histogram and the scattergram.
In the bar chart, the horizontal axis does not always have a discrete scale but may have nominal labels for each column; the columns should be separated and of equal width. The columns can represent single statistics such as the mean or a percentage.
Combined bar charts are used to display two values together.
The histogram addresses the problem that raw data are difficult to interpret and take up too much space: they can be collated into a table known as a frequency distribution. As each class interval is scored, the frequency of the class before it is added to give a cumulative frequency.
All categories are represented.
Columns are equal width because they represent equal class intervals.
Empty intervals are plotted, and columns can be added together to find the total frequency they represent.
The scattergram plots pairs of scores against one another. The more the points cluster around a straight line, the stronger the correlation.
A line running from upper left to lower right shows a negative correlation.
A line running from lower left to upper right shows a positive correlation.
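The direction and strength a scattergram shows by eye can also be expressed as a single number, the correlation coefficient, which runs from -1 (perfect negative) through 0 (no correlation) to +1 (perfect positive). This is a minimal sketch of the Pearson coefficient; the variable names and data are hypothetical examples, not taken from the text.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient: +1 perfect positive, -1 perfect negative."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of paired deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sy = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical data: marks rise with hours revised -> strong positive r.
hours_revised = [1, 2, 3, 4, 5]
exam_marks = [52, 58, 61, 70, 74]
print(pearson_r(hours_revised, exam_marks))  # close to +1

# Scores falling as the other variable rises -> negative r.
print(pearson_r([1, 2, 3], [6, 4, 2]))  # close to -1
```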
Analysis of Qualitative Data
This involves the analysis of non-numerical data generated from observational research and from open questions in interviews and questionnaires. When analysing these types of data, researchers must look for the underlying meaning in what people say or do. They can do this using pure qualitative analysis, or by using rating scales or other coding mechanisms to arrange the data into categories and thereby make them quantitative.
This is very time consuming, but it does avoid the problem of a particular narrative or behaviour not fitting into a predetermined category. The process of pure qualitative analysis involves transcribing the data in the exact form in which it was said, then reading through it repeatedly in order to identify emergent themes.
This technique can be used to analyse transcripts of interviews, TV programmes, newspapers, magazines and websites. The researcher creates a coding system of predetermined categories at the start of the study, which is then used to categorise the underlying themes in the material in a consistent manner.
The researcher then counts how many times a theme or code word appears. This process translates qualitative data into quantitative frequency data, which can be analysed in the normal way.
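The counting step can be sketched with `collections.Counter`. The theme labels below are hypothetical examples of codes a researcher might have assigned to transcript segments; they are not taken from the text.

```python
from collections import Counter

# Hypothetical coded transcript segments: each label is a code word from the
# researcher's predetermined category system.
coded_segments = [
    "isolation", "support", "isolation", "stigma",
    "support", "isolation", "coping", "support",
]

# Tally how many times each theme appears -- qualitative data
# becomes quantitative frequency data.
theme_frequencies = Counter(coded_segments)
print(theme_frequencies["isolation"])  # 3
print(theme_frequencies["support"])    # 3
print(theme_frequencies["stigma"])     # 1
```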