IT Services

How to...

Making and testing two-way tables with SPSS


Introduction

SPSS provides very convenient and powerful facilities for producing tables from survey data. This guide shows some simple ways to produce tables which are clear and effective, and it also shows how to perform appropriate tests of statistical significance.

Types of variables

The examples which follow use a fictitious database of 115 subjects, recording for each their age (in years), sex, nationality, income (in pounds sterling equivalent), and opinion about a policy. Age has been grouped into a new variable as 21-30, 31-50, and 50+, while income has been grouped into another variable in bands of approximately 10,000 units. Views on policy are recorded on a five-point scale: Strongly agree, Agree, Neutral, Disagree, and Strongly disagree.

The data values were entered as numbers, in coded form, as follows:

By selecting View > Value Labels it is possible to see what the codes mean.
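
The idea of coded values with value labels can be sketched outside SPSS as a simple lookup. This is only an illustration in Python: the codes below (1 and 2 for sex, 1 to 5 for policy) are hypothetical, since the guide's actual coding scheme is not reproduced in the text.

```python
# Hypothetical coding scheme, for illustration only.
SEX_LABELS = {1: "Male", 2: "Female"}
POLICY_LABELS = {
    1: "Strongly agree",
    2: "Agree",
    3: "Neutral",
    4: "Disagree",
    5: "Strongly disagree",
}

def label(value, labels):
    """Return the value label for a code, or the code itself if it has no label."""
    return labels.get(value, str(value))

# One coded record, as it might be typed into the data editor.
coded_row = {"sex": 2, "policy": 4}
labelled_row = {
    "sex": label(coded_row["sex"], SEX_LABELS),
    "policy": label(coded_row["policy"], POLICY_LABELS),
}
print(labelled_row)
```

Keeping the numbers in the data and the wording in a separate lookup is what lets you change a label later without retyping any data.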

We have three kinds of variable.

Interval data
Ages and incomes are recorded as actual values. It is meaningful to say that one person is twice as old as another, or earns three times as much. (Strictly these are ratio scales, but in this document we don't distinguish between interval and ratio scales.)
Ordinal data
Once ages and incomes have been grouped, we can no longer compare people within a group as directly as with the original values, although we can still say that people in one income group earn more or less than people in another. The policy variable shows a progression from Strongly agree to Strongly disagree, but we don't know whether the gap between Agree and Neutral is the same as the gap between Agree and Strongly agree.
Categorical data
Data is categorical when there is no natural order for the items listed, for example men and women, or English, French, Italian.

A useful exception

When a variable has only two values, as for example gender, or a question with a yes/no answer, this may be treated as categorical, ordinal, or interval.

[back to top]

Producing simple tables

If you want to compare two or more categories, it is best to make them the columns of your table. For our first table we will look at the distribution of nationality by gender. From the menu at the top of the screen, follow

Analyze > Descriptive Statistics > Crosstabs

This will show us a menu. We enter our choices into the menu boxes to produce a table. Each time that we return to this menu (provided that this is in the same computer session) it will reappear as we last left it. This makes it easy to repeat a table with a slightly different choice of output. (SPSS before version 9 will show a slightly different box without the "Exact" option.)

When this menu first appears on the screen, all of the variables are listed in the box at the left. Click on Nationality to highlight it, then click on the little black triangle to move it into the Rows box. Click on Sex to highlight it and move it into the Columns box. Click on OK and the following table will appear:

This is a very simple table. To compare the distribution of nationalities by sex it is useful to see these figures in percentage terms. It is much easier to compare percentages working down the columns of a table than working across the rows. At the bottom of the menu shown above is a button marked Cells. Click on this to see:

Remove the tick from the Observed box and then put a tick into Percentages (Column).

Then click Continue and OK to produce the table.

If you ticked both the Observed and the Column percentages boxes on the Cells menu you would get two tables for the price of one. However, such a combined table with alternate lines of numbers and percentages is not as easy to read as two separate tables. For a little extra effort and printed output there can be a considerable improvement in clarity.
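
To see what the Column percentages option is doing, here is a minimal sketch in Python of the same arithmetic: count each combination of row and column value, then divide by the column total. The six records are made up for illustration and are not the guide's data set.

```python
from collections import Counter

def crosstab(rows, cols):
    """Count how often each (row value, column value) pair occurs."""
    return Counter(zip(rows, cols))

def column_percentages(counts):
    """Convert cell counts to percentages of each column total."""
    col_totals = Counter()
    for (_, col), n in counts.items():
        col_totals[col] += n
    return {(row, col): 100.0 * n / col_totals[col]
            for (row, col), n in counts.items()}

# Made-up records, one entry per subject.
nationality = ["English", "French", "English", "Italian", "French", "English"]
sex = ["Male", "Male", "Female", "Female", "Female", "Male"]

counts = crosstab(nationality, sex)
percents = column_percentages(counts)
for (nat, s), pct in sorted(percents.items()):
    print(f"{nat:8s} {s:6s} {pct:5.1f}%")
```

Each column adds up to 100%, which is why comparisons are made by reading down the columns.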

Copying the table into Word

Make sure that you have typed at least one line of text into your document before you insert the table. Click on the table in SPSS and then use Edit > Copy Objects. Move into Word and either Paste Special > Picture or Paste Special > Formatted Text (RTF) (SPSS doesn't always offer you a choice). The table appears in your Word document. You can drag the table about, and if you click on it you can resize it. Resist the temptation to double-click on the table. You can't edit it in Word, and it may fall to pieces if you try. If you need to alter the table, delete it from your Word document, go back to SPSS to change it, and then copy it back into Word.

Improving the appearance of the table

If you double-click on a table in SPSS it will appear in a striped frame. You can now click on various parts of the table and alter them in various ways. If you want to explore in detail, you can either experiment or look at the User's Guide. For the moment we will just improve the table of percentages. Initially these are shown to one decimal place (which implies an accuracy of 1 part in 1000, which is rather excessive for 115 subjects). Click on the 15.9% figure at the top left, and then shift-click 100.0% on the bottom right. This will highlight all of the percentage figures. Now go to the Format > Cell Properties menu.

The Cell Properties menu box will open, showing you a number of ways to change the appearance of the numbers, and the number of decimal places to be printed.

The format ##.#% is highlighted. Click the #.# at the top of the box so that it is highlighted, and then click on the lower right-hand button of the Decimals box to change it from 1 to 0. Then click OK. The table now appears in the clearer format.

When a table is selected (stripy frame) you can use the Format > Autofit menu which will try to reduce the column widths. This can make all the difference between a table which is too wide to fit on a printed page and one which fits comfortably. You can help the computer to do this effectively by making sure that every word in each label is fairly short - a column can't generally be narrower than the shortest word in a label.

Here's a table before autofitting

and here it is afterwards.

As you gain experience with SPSS you will realise that a careful choice of the names and value labels of variables is very important. Tables look untidy if the margins are filled with excessively long and fussy labels. If SPSS doesn't have enough room to insert all of a label it will stop when it runs out of space. So, "Very long label 1" and "Very long label 2" may both appear as "Very long la", while "Label 1" and "Label 2" would have looked better.

Here's the previous table with a different choice of labels; decide for yourself which you prefer.

Testing for significance

This table shows two categorical variables (i.e. variables with no natural order). The chi-squared test is appropriate for this data. Note that "chi" is the name of the Greek letter χ and it is pronounced "ky" as in Kylie Minogue, not like "Chi" in "Chinese".

From the Crosstabs menu you can open the Statistics menu. Tick the Chi-square box and then click Continue and OK. The computer will display:

First check the note at the bottom of the table. If more than 25% of the cells have expected frequencies much less than 5, or the minimum expected count is less than 1, the chi-squared test may give misleading results. This table is acceptable, and the results should be shown in a report as "Chi-squared = 4.81 with 2 df, p = 0.090". There is no need to look up significance in tables; the Asymp. Sig. (2-sided) figure, which we have reported as "p", is the significance value. By the usual convention, if p is less than 0.05 the result counts as statistically significant.

If the p value shown in the table is ".000", then report it as "p < 0.001".
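
For readers who want to see where these figures come from, the following Python sketch computes the Pearson chi-squared statistic, its degrees of freedom, and the p value using only the standard library. It is an illustration under stated assumptions, not SPSS's own routine: it assumes every row and column total is greater than zero, and the function names are our own. Fed the statistic reported above (4.81 with 2 df), it returns the same p value.

```python
import math

def expected_counts(table):
    """Expected cell counts if rows and columns were independent."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def chi_squared(table):
    """Return (Pearson chi-squared statistic, degrees of freedom)."""
    expected = expected_counts(table)
    stat = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

def chi2_p_value(stat, df):
    """Upper-tail probability of the chi-squared distribution (closed forms)."""
    half = stat / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) times a finite sum.
        return math.exp(-half) * sum(half ** k / math.factorial(k)
                                     for k in range(df // 2))
    # Odd df: a normal tail term plus a finite sum.
    extra = sum(half ** (k - 0.5) / math.gamma(k + 0.5)
                for k in range(1, (df - 1) // 2 + 1))
    return math.erfc(math.sqrt(half)) + math.exp(-half) * extra

def format_p(p):
    """Report p to three decimals, or 'p < 0.001' when it would print as .000."""
    return "p < 0.001" if p < 0.0005 else f"p = {p:.3f}"

print(format_p(chi2_p_value(4.81, 2)))  # p = 0.090
```

To test a table of your own, pass the observed counts as a list of rows, for example `chi_squared([[10, 20], [20, 10]])`, and hand the two results to `chi2_p_value`.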

Tables with small expected values

If 25% or more of the cells in the table have expected frequencies less than 5, or if any expected frequency is less than 1, then the Chi-squared test isn't always very accurate. Most people just want to know whether p is less than 0.05 or not, so the inaccuracy matters only when p is close to 0.05. A calculated value of 0.40 is much too large to be anywhere near significance, and 0.000 is so small that the true figure is almost certain to be significant.
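
The rule of thumb above is easy to check for any table of counts. The sketch below, in Python, applies the thresholds exactly as stated in this section (25% or more of cells below 5, or any cell below 1); the function name and the two small tables are made up for illustration.

```python
def small_expected_check(table):
    """Apply the small-expected-count rule of thumb to a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count for each cell under independence of rows and columns.
    cells = [r * c / n for r in row_totals for c in col_totals]
    share_below_5 = sum(e < 5 for e in cells) / len(cells)
    minimum = min(cells)
    return {
        "share_below_5": share_below_5,
        "minimum_expected": minimum,
        "suspect": share_below_5 >= 0.25 or minimum < 1,
    }

print(small_expected_check([[10, 20], [20, 10]]))  # all expected counts are 15
print(small_expected_check([[1, 2], [3, 40]]))     # three of four cells below 5
```

Note that the rule looks at expected counts, not observed ones: a cell may contain zero subjects and still be fine if its row and column totals are large.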

When a table has only two rows and two columns, the Chi-squared test may not be very accurate, and SPSS will provide Fisher's Exact Test when it is more appropriate. The result that you should report from the Fisher's Exact Test is the Exact Sig. (2-sided) p value.
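
For the curious, Fisher's Exact Test on a 2 x 2 table can be sketched in a few lines of Python. This version takes the two-sided p value to be the total probability of every table (with the same row and column totals) that is no more likely than the one observed. That is a common definition, but treating it as identical to the figure SPSS prints is an assumption on our part.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p value for the 2 x 2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # The small tolerance guards against floating-point ties.
    return sum(prob(x) for x in range(lowest, highest + 1)
               if prob(x) <= p_observed * (1 + 1e-9))

print(round(fisher_exact_two_sided(3, 1, 1, 3), 4))  # 0.4857
```

Because every possible table is enumerated, the result does not depend on the large-sample approximation that makes the Chi-squared test unreliable for small counts.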

Exact tests (available in SPSS version 9 onwards)

If a calculated value of p is close to 0.05 but the computer has warned you of small expected frequencies, then you can ask for the exact value to be calculated.

Click on the Exact button, and also click on the Exact button in the next menu box.

The calculations are very complicated, so you can specify how long you are prepared to wait. If the computer doesn't finish in the available time it will provide an approximation which is probably better than the value it produced without the "Exact" option. If the computer reports that there isn't enough memory, return to the menu shown above, and choose Monte Carlo instead of Exact. The results will look like this:

You can see that the Asymp. Sig. (2-sided) approximation of 0.090 is too large; the significance shown in the next column, 0.087, is a better estimate, and you can have 99% confidence that the true value lies between 0.079 and 0.094.
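
The width of that confidence interval depends on how many random tables the Monte Carlo method generated. The Python sketch below uses the usual normal approximation for a proportion. It assumes 10,000 random tables, a figure not stated in this guide, and works from the rounded estimate 0.087, so it reproduces the interval above only approximately.

```python
import math

def monte_carlo_interval(p_estimate, n_samples, z=2.576):
    """Normal-approximation confidence interval for a Monte Carlo p value.

    z = 2.576 corresponds to 99% confidence.
    """
    standard_error = math.sqrt(p_estimate * (1 - p_estimate) / n_samples)
    low = max(0.0, p_estimate - z * standard_error)
    high = min(1.0, p_estimate + z * standard_error)
    return low, high

# Assumed number of random tables: 10,000.
low, high = monte_carlo_interval(0.087, 10_000)
print(f"99% interval: {low:.4f} to {high:.4f}")
```

Asking for more random tables narrows the interval, at the cost of a longer wait.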

Testing tables when both variables are ordinal or interval

Here we will look at the relationship between Age and Income. In the Statistics box tick Correlations (removing the tick from Chi-square if it has one).

The results are as follows:

The computer doesn't know whether your variables are interval or ordinal, so it produces statistics for both options. Beware: one of these options is probably not suitable for your data, so you must choose the right one to report. In the case of Age and Income both are interval variables, so the Pearson's R (which is what people usually mean when they talk about a correlation) is appropriate. If you double-click on the table you can then click on ".397" and shift-click on ".000" to highlight the "Spearman Correlation" row. Press Delete to remove that row of the table.

The table of counts is ridiculously large with a separate row for each value of Income and a separate column for each value of Age. To show the values it would be better to use the Income Group and Age Group variables instead. A careful worker might show the grouped data in a printed table, but report the Pearson's R. Sometimes exact data values are not available; if the survey only asked people for their age group and income group, then the Pearson's R would not be suitable for the grouped data, and the Spearman's correlation should be used instead.

Some data is recorded only on an ordinal scale, such as the view on Policy. In this case, even if the other variable in the table is an interval variable, the Spearman correlation is the one to use.

Don't forget that here you may treat a variable with only two values as ordinal, so, for example, a table of Gender by Policy is ordinal by ordinal.
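
The difference between the two correlations is easiest to see in code. In this Python sketch the Spearman correlation is simply the Pearson correlation applied to ranks, with tied values sharing their average rank. The five data points are made up; they rise steadily but not in a straight line, so the two measures disagree.

```python
import math

def pearson_r(x, y):
    """Pearson correlation of two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """Rank values from 1 upwards, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Made-up data: y always increases with x, but not linearly.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
print(f"Pearson  {pearson_r(x, y):.3f}")
print(f"Spearman {spearman_rho(x, y):.3f}")
```

Because Spearman uses only the order of the values, it is the safer choice whenever the gaps between values cannot be trusted, which is exactly the situation with ordinal data.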

Why you don't need statistical tables with SPSS

In your statistics classes you probably learned how to calculate some simple statistics by hand. You collected your data, chose a test, did a lot of calculations, and eventually ended up with some numbers (the first of them probably "t", "F", or "chi") and then the "degrees of freedom". You then looked up these numbers in a statistical table and found out whether your test was "significant" or not. With SPSS there is no need to look up any results in tables, because SPSS has done it all for you.

In order to understand this, let's consider an example. Suppose that our data set comes from a multinational company that employs 5 million people. It declares that its equal income policy has achieved equal pay for men and women. If you had information about pay and gender for all 5 million people you could calculate the average income for men and the average income for women to see if they really were the same. In practice you would have to make do with a sample of a few hundred people. Now even if the average incomes are exactly the same in the population of 5 million, it is unlikely that they will be exactly the same in a sample. In some samples the men will earn a little more, and in some the women will earn a little more. On the other hand, if the difference was very large we would conclude that it was unlikely to arise from sampling variation, and would probably be a significant difference. Note carefully that "significant" doesn't mean "important"; it means too big to occur often by chance.

Now, on the null hypothesis that men and women have equal incomes, we can say perhaps that 60% of samples will differ by a small amount, but only 1% would differ by a large amount. If a difference as large as the one in the data we have actually collected would arise by chance in fewer than 5% of the samples we might have collected, then our results are significant at the 5% level.
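
You can watch sampling variation at work with a small simulation. The Python sketch below draws many samples from a population in which men and women really do have the same average income, and counts how often the sample averages differ by more than a cutoff chosen so that about 5% of samples should exceed it. All the figures (mean income 30,000, standard deviation 8,000, samples of 100 men and 100 women) are invented for illustration.

```python
import random
import statistics

def share_of_large_differences(n_samples=2000, sample_size=100,
                               mean=30000.0, sd=8000.0, seed=1):
    """Fraction of samples whose mean difference exceeds the 5% cutoff.

    Both groups are drawn from the same distribution, so the null
    hypothesis of equal incomes is true by construction.
    """
    rng = random.Random(seed)
    standard_error = sd * (2 / sample_size) ** 0.5
    cutoff = 1.96 * standard_error
    large = 0
    for _ in range(n_samples):
        men = [rng.gauss(mean, sd) for _ in range(sample_size)]
        women = [rng.gauss(mean, sd) for _ in range(sample_size)]
        difference = statistics.fmean(men) - statistics.fmean(women)
        if abs(difference) > cutoff:
            large += 1
    return large / n_samples

share = share_of_large_differences()
print(f"{share:.1%} of samples show a 'large' difference purely by chance")
```

The printed share should be close to 5%, even though there is no real difference in the population. That is what testing at the 5% level means.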

If you do the calculations by hand you eventually get a figure which you can look up in statistical tables. Those tables will tell you whether the difference is likely to occur by chance in 5% of possible samples, 1%, or 0.1%. They can't tell you that the figure is 43% or 2.7% or any figure other than those listed in the table. SPSS calculates the probability in a much more informative way. In the Chi-squared test we did earlier the p value is 0.090, so under the null hypothesis 9.0% of samples would have a chi-squared value at least as large as this by chance. That clearly isn't significant at the 5% level.

Most significance testing (unless you have a very clear reason for making another decision) is performed at the 5% level, so any Significance value (sometimes labelled "p" or "probability") less than 0.050 is "significant" at the 5% level.

Choosing a statistical test - Summary

When one or more of your variables is categorical with more than two categories, use the Chi-squared test.

If both variables are ordinal or interval (or one of them is, and the other has only two categories), select correlations. Report the Pearson R value when both variables are interval, or report the Spearman value when one or both are ordinal.
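
The summary can be written down as a small decision function. This Python sketch is one reading of the two rules above, not an official procedure. It treats a two-valued categorical variable as ordinal (the guide also allows it to be treated as interval), and it does not special-case tables with two rows and two columns, which the summary does not address.

```python
def choose_test(kind_a, levels_a, kind_b, levels_b):
    """Suggest a test for a two-way table.

    kind_*   : "categorical", "ordinal" or "interval"
    levels_* : number of distinct values the variable takes
    """
    def effective(kind, levels):
        # Assumption: a two-valued categorical variable is treated as ordinal.
        if kind == "categorical" and levels == 2:
            return "ordinal"
        return kind

    a = effective(kind_a, levels_a)
    b = effective(kind_b, levels_b)
    if "categorical" in (a, b):
        return "Chi-squared"
    if a == "interval" and b == "interval":
        return "Pearson correlation"
    return "Spearman correlation"

print(choose_test("categorical", 3, "categorical", 2))  # Nationality by Sex
print(choose_test("interval", 40, "interval", 60))      # Age by Income
print(choose_test("categorical", 2, "ordinal", 5))      # Gender by Policy
```

The level counts in the Age by Income call (40 and 60) are placeholders; only the kinds of the variables matter there.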

created on 2005-01-01 by David Hitchin
last updated on 2013-04-04 by David Guest