SPSS Handbook

Compiled by Mikaila Arthur for Research Methods in Sociology
New York University, Fall 2003 and Fall 2004


  1. Basic Use of SPSS
  2. Frequencies
  3. Utilities
  4. Split File
  5. Recoding
  6. Indexes
  7. Correlation Matrixes
  8. Graphs
  9. Making SPSS Datafiles
Basic Use of SPSS:

To Open Data:
Start SPSS. Go to the FILE menu and click on OPEN, then DATA. Select the file you want and click OPEN.

Saving Files:
Go to FILE, then click SAVE. If you want to save the file with a new name, click SAVE AS.

Printing:
Go to FILE. To see what your printout will look like, click PRINT PREVIEW. Then, at the top of the screen, click the button that says PRINT. You may want to click PROPERTIES and set it to print in LANDSCAPE. When you are ready, click OK, and it will print.

Using Frequencies:

For general statistical measures of particular variables.

[Screenshot: SPSS Frequencies dialogue box]

Choose the variables you want, and use the arrow button to move them into the VARIABLE(S) box. Then select the output you want:
  • DISPLAY FREQUENCY TABLES produces tables showing how often each possible value of each variable occurred.
  • Under STATISTICS: PERCENTILE VALUES show the value below which a given percentage of responses falls. You can choose QUARTILES (25%, 50%, 75%), CUT POINTS (you choose how many equal groups you want), or PERCENTILES (you specify which percentiles you want). CENTRAL TENDENCY refers to measures of where the center of the values lies. MEAN is the mathematical average: add up all the numbers and divide by how many numbers there are. MEDIAN is the middle value if you line up all the numbers in order. MODE is the value that occurs most frequently. SUM is the total of all the values. DISPERSION has to do with how far the values are scattered from the central tendency. The RANGE is the difference between the MINIMUM and the MAXIMUM. VARIANCE measures how spread out the values are, based on the squared distances of the values from the mean. STANDARD DEVIATION is the square root of the variance, which expresses that spread in the same units as the original variable. Ignore S.E. MEAN and the DISTRIBUTION section, because they are primarily for more advanced users.

  • Under CHARTS, you can choose BAR CHARTS or PIE CHARTS. Both show the values of the variable graphically, and each can display either FREQUENCIES (the number of times each value appears) or PERCENTAGES (the percent of times each value appears). Your third choice is a HISTOGRAM, which is similar to a bar chart but groups nearby values into ranges. If you select the NORMAL CURVE, the histogram will appear with a bell-shaped normal curve superimposed on it, which lets you see how closely your data follow a normal distribution. An example histogram of age appears below.
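To make the FREQUENCIES output concrete, here is a minimal Python sketch (an illustration, not SPSS itself) that builds a frequency table with percentages and computes the central tendency and dispersion measures listed above with Python's standard `statistics` module. The `ages` data and the function names are hypothetical. Like the sample statistics SPSS reports, the variance divides the squared deviations by n - 1.

```python
import statistics
from collections import Counter


def frequency_table(values):
    """Return (value, count, percent) rows, similar to a FREQUENCIES table."""
    counts = Counter(values)
    n = len(values)
    return [(value, count, 100 * count / n) for value, count in sorted(counts.items())]


def summary_statistics(values):
    """Central tendency, dispersion, and quartiles for a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        # If several values tie for most frequent, report the smallest one.
        "mode": min(statistics.multimode(values)),
        "sum": sum(values),
        "minimum": min(values),
        "maximum": max(values),
        "range": max(values) - min(values),
        # Sample variance: squared deviations from the mean divided by n - 1.
        "variance": statistics.variance(values),
        # Standard deviation: square root of the variance, in the variable's own units.
        "std_dev": statistics.stdev(values),
        # 25th, 50th, and 75th percentiles (statistics' default "exclusive" method).
        "quartiles": statistics.quantiles(values, n=4),
    }


ages = [23, 35, 35, 41, 52, 35, 60, 23]  # hypothetical respondents' ages

for value, count, percent in frequency_table(ages):
    print(f"{value:>4} {count:>4} {percent:6.1f}%")
print(summary_statistics(ages))
```

Percentile formulas vary between programs, so quartiles from SPSS may not match this sketch exactly for every dataset.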

[Screenshot: SPSS histogram of age]

Utilities:

For getting basic information about the variables in your dataset.

[Screenshot: SPSS UTILITIES > VARIABLES window for DEGREE]
Go to the UTILITIES menu and click on VARIABLES. You will get a window where you can choose any variable. If you select a variable, it will tell you the name of the variable, the variable label, any missing values, and the labels for all of the variable values. The same information can also be obtained by clicking the VARIABLE VIEW tab at the bottom of the screen.
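The information this window reports can be pictured as a small record for each variable. The Python sketch below only illustrates that idea; SPSS does not expose such a class, and the `VariableInfo` name and the codes and labels shown for `degree` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class VariableInfo:
    """The kind of information UTILITIES > VARIABLES reports for one variable."""
    name: str
    label: str
    missing_values: set = field(default_factory=set)
    value_labels: dict = field(default_factory=dict)

    def describe(self, code):
        """Translate a numeric code into its value label, or flag it as missing."""
        if code in self.missing_values:
            return "missing"
        return self.value_labels.get(code, str(code))


# Hypothetical codes for illustration only; always check the real codebook.
degree = VariableInfo(
    name="degree",
    label="Respondent's highest degree",
    missing_values={8, 9},
    value_labels={0: "Less than high school", 1: "High school", 2: "Junior college",
                  3: "Bachelor", 4: "Graduate"},
)
print(degree.describe(3))  # Bachelor
print(degree.describe(9))  # missing
```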

This is a very useful feature, but it is not a substitute for consulting the codebook.

Split File:

For running the same analysis separately for each group defined by a variable.

Go to DATA, then SPLIT FILE. Select ORGANIZE OUTPUT BY GROUPS. Then select the variable you want to split by, use the arrow button to move it into the GROUPS BASED ON box, and click OK.
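Conceptually, SPLIT FILE groups the cases by the value of one variable and then repeats each analysis within every group. This Python sketch (an illustration, not SPSS) does the same for a hypothetical list of respondents, computing mean age separately by sex; `split_by` and the data are assumptions of the example.

```python
import statistics
from collections import defaultdict


def split_by(cases, group_var):
    """Group case records by the value of group_var, in sorted order of that value."""
    groups = defaultdict(list)
    for case in cases:
        groups[case[group_var]].append(case)
    return dict(sorted(groups.items()))


# Hypothetical cases: one dictionary per respondent.
cases = [
    {"sex": "female", "age": 34},
    {"sex": "male", "age": 51},
    {"sex": "female", "age": 28},
    {"sex": "male", "age": 45},
]

# Run the same analysis (here, the mean age) separately for each group.
for sex, group in split_by(cases, "sex").items():
    ages = [case["age"] for case in group]
    print(sex, len(ages), statistics.mean(ages))
```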


[Screenshot: SPSS Split File dialogue box]

Recoding:

Recoding is used to change how a variable is displayed and/or measured. For instance, you can take an interval/scale variable and transform it into an ordinal, grouped variable.

Go to the TRANSFORM menu, click on RECODE, then INTO DIFFERENT VARIABLES. It is very important that you choose INTO DIFFERENT so that you do not permanently change the original variables in the dataset.
[Screenshot: Recode windows in SPSS]
Select the variable you want to recode and move it into the box using the arrow button. Specify the NAME and the LABEL for the new variable (make sure these are different from the original variable's and easy to identify) and press CHANGE.

Then press OLD AND NEW VALUES. Under OLD VALUE, specify the value from the original variable that you want to change. You can specify:
  • VALUE, a single numerical value;
  • SYSTEM-MISSING or SYSTEM-OR-USER MISSING, both for missing values (the difference only matters for more advanced users);
  • RANGE, either minimum through a number you specify, a number you specify through maximum, or between two numbers you specify; or
  • ALL OTHER VALUES, for any values you have not already specified.
Under NEW VALUE, either:
  • Enter the VALUE you want to use;
  • Select SYSTEM-MISSING for missing values; or
  • COPY THE OLD VALUE to keep values the same.
After each value is recoded (including the last one), click ADD. Use CHANGE to modify values you have already entered, and REMOVE to delete values you have already entered. The other two options have to do with variables that are strings (meaning not numbers); you probably will not have to worry about these. Click CONTINUE after you have added all of your values.
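The list of OLD AND NEW VALUES amounts to a set of rules applied to each case. This Python sketch (an illustration, not SPSS syntax) applies range rules to build a new variable while leaving the original list untouched, which mirrors the point of recoding INTO DIFFERENT VARIABLES. The `recode` function, the age cut points, checking rules in order, and the use of `None` for system-missing are all assumptions of the example.

```python
LOWEST = float("-inf")   # stands in for the open "lowest" end of a range
HIGHEST = float("inf")   # stands in for the open "highest" end of a range


def recode(value, rules, else_value=None):
    """Return the new value for one case.

    rules is a list of ((low, high), new_value) pairs, checked in order; a value
    matches a pair when low <= value <= high. Missing input (None) stays missing,
    and values that match no rule get else_value (None plays system-missing here).
    """
    if value is None:
        return None
    for (low, high), new_value in rules:
        if low <= value <= high:
            return new_value
    return else_value


# Hypothetical scheme: age (interval/scale) -> age group (ordinal)
age_rules = [((LOWEST, 29), 1), ((30, 49), 2), ((50, HIGHEST), 3)]

ages = [23, 35, None, 67, 49]                         # original variable
age_group = [recode(age, age_rules) for age in ages]  # new variable
print(age_group)  # [1, 2, None, 3, 2]
```

The cut points assume whole-year ages; a value such as 29.5 would match no rule and become missing, which is one reason to check your ranges carefully.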

The IF button lets you apply the recode only to cases that meet a condition you specify. It is used primarily by advanced users. When you are finished recoding, press OK.

Indexes:

For making single variables that take into account the values of a number of different variables. You can use this, for instance, to create a measure of political attitude by adding together respondents' opinions on various individual political issues.

Go to the TRANSFORM menu and click on COMPUTE. Enter the NAME for the new variable you are creating. Click on TYPE&LABEL to specify the LABEL for the variable, and make sure for TYPE, NUMERIC is selected. Then click CONTINUE.

Using the arrow button and the on-screen mathematical keys, select the variables and computations you want to use. Follow the standard order of operations; in particular, use parentheses to enclose any expression you want calculated first.
Example: Let's say you had the highest level of education for your respondent (EDUC), your respondent's mother (MAEDUC), your respondent's father (PAEDUC), and your respondent's spouse (SPEDUC), and you wanted a variable representing average educational attainment in the respondent's family. You could enter the following into the box:
(EDUC + MAEDUC + PAEDUC + SPEDUC) / 4

When you are finished, click OK. You can now use FREQUENCIES to get information about your new variable.
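The family-average example, and why the parentheses matter, can be checked with a few lines of Python. This is only an illustration; SPSS evaluates its own COMPUTE expression, and the function names and the respondent's values are hypothetical.

```python
def family_education(educ, maeduc, paeduc, speduc):
    """Average years of education across respondent, mother, father, and spouse."""
    return (educ + maeduc + paeduc + speduc) / 4


def family_education_no_parentheses(educ, maeduc, paeduc, speduc):
    """The same expression without parentheses: only SPEDUC is divided by 4."""
    return educ + maeduc + paeduc + speduc / 4


# Hypothetical respondent: 16, 12, 12, and 14 years of schooling.
print(family_education(16, 12, 12, 14))                 # 13.5
print(family_education_no_parentheses(16, 12, 12, 14))  # 43.5
```

Because division happens before addition, leaving out the parentheses silently produces a very different (and wrong) index.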

N.B.: In many cases, you will need to use the IF button to keep missing-value codes from being included in the calculation. If you do a COMPUTE and the results seem rather strange, this is probably why. Unless you have a decent knowledge of Boolean logic, you'll need to consult individually with an advanced user to deal with this. If you do know Boolean logic, click the IF button, choose INCLUDE IF CASE SATISFIES CONDITION, and use a Boolean condition to exclude missing values. For class, if you run into this problem, ask in lab for help.
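The idea behind that condition, computing the index only for cases where none of the inputs is a missing-value code, can be sketched in Python. This illustrates the logic only, not SPSS's IF syntax; the codes 98 and 99 and the function name are assumptions, and the real codes come from your codebook.

```python
MISSING_CODES = {98, 99}  # hypothetical missing-value codes; check your codebook


def average_if_valid(*values):
    """Average the values, or return None (missing) if any value is a missing code."""
    if any(value is None or value in MISSING_CODES for value in values):
        return None
    return sum(values) / len(values)


respondent = (16, 12, 98, 14)  # father's education coded 98 (hypothetical "don't know")
print(sum(respondent) / len(respondent))  # 35.0, a "strange" result
print(average_if_valid(*respondent))      # None
print(average_if_valid(16, 12, 12, 14))   # 13.5
```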

Correlation Matrixes:

Crosstabulations (crosstabs), which this handbook calls correlation matrixes, are the first level of actual data analysis performed in SPSS. There are other methods of data analysis, particularly regression, that you can learn in statistics courses. A crosstab compares 2 (or 3, but at this level usually 2) variables in a table.

[Screenshot: running crosstabs in SPSS]

Go to the ANALYZE menu, click on DESCRIPTIVE STATISTICS, then click on CROSSTABS. Using the arrow button, select the ROW and COLUMN variables for your table. Be sure to put the independent variable in the column and the dependent variable in the row. If you want additional variables, you can put them in the third box, at the bottom of the dialogue, and you can add more by clicking NEXT. Be sure you have a rationale for using more than 2 variables, however, and that you will understand what your output means.

DISPLAY CLUSTERED BAR CHARTS creates bar graphs with separate, color-coded bars for different values of the variables. SUPPRESS TABLES prevents the crosstab tables from being generated, so don't pick that option.

Under STATISTICS, select CHI-SQUARE. This will tell you if your results are statistically significant. The other options are primarily for advanced users.
Under CELLS, select OBSERVED, which shows the number of cases in each cell of your table. If you wish, you can also select EXPECTED, which shows how many cases would be in each cell if there were no relationship between the two variables. You may also want to select COLUMN PERCENTAGES, which shows what percent of each column's total falls in each cell. Don't worry about the other options.

When you are done, click OK.
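To show what the OBSERVED, EXPECTED, and COLUMN PERCENTAGES cells contain, here is a small Python sketch (an illustration, not SPSS) that builds them from raw cases. Each expected count is (row total times column total) / total number of cases. The variable names and data are hypothetical.

```python
from collections import Counter


def crosstab(row_values, col_values):
    """Observed counts: rows are the dependent variable, columns the independent one."""
    rows = sorted(set(row_values))
    cols = sorted(set(col_values))
    counts = Counter(zip(row_values, col_values))
    observed = [[counts[(r, c)] for c in cols] for r in rows]
    return rows, cols, observed


def expected_counts(observed):
    """Counts expected in each cell if the two variables were unrelated."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[rt * ct / n for ct in col_totals] for rt in row_totals]


def column_percentages(observed):
    """Percent of each column's total that falls in each cell."""
    col_totals = [sum(col) for col in zip(*observed)]
    return [[100 * cell / ct for cell, ct in zip(row, col_totals)] for row in observed]


# Hypothetical cases: opinion (dependent, rows) by sex (independent, columns).
opinion = ["favor", "favor", "oppose", "favor", "oppose", "oppose"]
sex = ["female", "female", "female", "male", "male", "male"]

rows, cols, observed = crosstab(opinion, sex)
print(rows, cols)                  # ['favor', 'oppose'] ['female', 'male']
print(observed)                    # [[2, 1], [1, 2]]
print(expected_counts(observed))   # [[1.5, 1.5], [1.5, 1.5]]
print(column_percentages(observed))
```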

Understanding your output:

After you run a crosstabs, the OUTPUT window will open with the results.

The CASE PROCESSING SUMMARY tells you how many cases were used in the computation, how many had missing values, and how many cases total exist.

The CROSSTABULATION shows your primary results in a table with the independent variable on top and the dependent variable on the side. If you have selected the options suggested above, each cell will have three numbers in it: the number of cases in the cell, the number of cases that would be expected in the cell if no relationship existed, and the percent of that column that is in that cell. There are also TOTALS for the rows, the columns, and the entire table.

The CHI-SQUARE TESTS table shows whether the relationship in your table is statistically significant. You should focus on the first row, PEARSON CHI-SQUARE, and the third column, ASYMP. SIG. (2-SIDED). The number you will find there is a decimal between .000 and 1.000 (SPSS displays very small values as .000). Depending on the context, you will need a number under .05, or under .01 for a stricter standard, for the results to be considered statistically significant. Ignore the other numbers in the table; they are primarily for more advanced users.
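As an illustration of where PEARSON CHI-SQUARE and its significance value come from, here is a self-contained Python sketch. It computes the statistic as the sum over all cells of (observed - expected) squared / expected, uses (rows - 1) times (columns - 1) degrees of freedom, and then finds the upper-tail probability with closed-form chi-square formulas that hold for whole-number degrees of freedom. The table is hypothetical, and SPSS's own numerical method may differ in the last decimal places.

```python
import math


def pearson_chi_square(observed):
    """Pearson chi-square statistic and degrees of freedom for a table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (count - expected) ** 2 / expected
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return statistic, df


def chi_square_p_value(x, df):
    """Upper-tail probability of a chi-square value (closed form for whole-number df)."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum of (x/2)**i / i! for i = 0 .. df/2 - 1
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    # Odd df: a normal-tail term plus a finite series
    chi = math.sqrt(x)
    p = math.erfc(chi / math.sqrt(2))
    term = math.sqrt(2 / math.pi) * chi * math.exp(-x / 2)
    for r in range(1, (df - 1) // 2 + 1):
        p += term
        term *= x / (2 * r + 1)
    return p


observed = [[30, 10], [20, 40]]  # hypothetical 2 x 2 table of counts
statistic, df = pearson_chi_square(observed)
p = chi_square_p_value(statistic, df)
print(round(statistic, 3), df, f"{p:.3f}")  # 16.667 1 0.000
```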

You can save your output by going to FILE, then SAVE. The output will only be readable by SPSS. Or, you can right-click on the table and select COPY OBJECT to paste it into a word-processor document.
[Screenshot: results of a crosstabulation]

Graphs:

You can make many different kinds of graphs in SPSS, including bar, line, and pie charts. In general, when you want to make a graph, you go to the GRAPH menu and select the specific type of graph you want to make. Just like in other SPSS functions, you need to specify which variable or variables you want to graph.

There is one specific form of graph, however, that is likely to be especially useful to you: the scatterplot. Scatterplots are the simplest way of displaying correlations between continuous variables. To make a scatterplot, go to the GRAPH menu and choose SCATTERPLOT. Of the four options presented, choose SIMPLE.

You should choose two variables. The independent variable goes in the X-AXIS box and the dependent variable goes in the Y-AXIS box.

Under TITLES, you should specify a TITLE for your graph, and you can also choose to specify a SUBTITLE and/or a FOOTNOTE.

To interpret a scatterplot, look at the visual relationship between the two variables. In general, you are looking for a direct/positive relationship (as one variable increases, so does the other), an inverse/negative relationship (as one increases, the other decreases), or no relationship, though at times other substantively interesting patterns may appear.
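A scatterplot is read by eye, but the direction it shows can be summarized by the sign of Pearson's correlation coefficient r: positive for a direct relationship, negative for an inverse one, and near zero when there is no straight-line relationship. The Python sketch below computes r for hypothetical data; it is an illustration that goes slightly beyond the SPSS steps described here. Because r only captures straight-line patterns, looking at the plot itself still matters.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariation / math.sqrt(spread_x * spread_y)


# Hypothetical data: years of education (X axis) and income in $1,000s (Y axis).
education = [8, 12, 12, 16, 20]
income = [20, 28, 30, 45, 60]
r = pearson_r(education, income)
print(round(r, 2))  # 0.99: a strong direct/positive relationship
```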
[Screenshot: SPSS scatterplot dialogue]

Making SPSS Datafiles:

Instructions on how to create SPSS files using your own data.

Before you start, be sure you know what all of your variables are and how you will code them.

[Screenshot: creating SPSS datafiles]
Click on the VARIABLE VIEW tab at the bottom of the screen. You will see the window that is used to enter information about variables into SPSS. Across the top of the screen is a series of column headers: NAME, TYPE, WIDTH, DECIMALS, LABEL, VALUES, MISSING, COLUMNS, ALIGN, and MEASURE. Follow these steps for each variable:
  • For NAME, enter a name for your variable. It cannot be more than 8 characters long, can contain only letters, numbers, and underscores ( _ ), and may not start with a number. Try to make it meaningful.
  • For TYPE, choose the type that most closely matches your data. Click the button with three dots (...) to open the dialogue box. In most cases, you will select NUMERIC. The other primary types are DATE (for calendar dates), DOLLAR (for dollar amounts like salaries or prices), and STRING (for words). You can also specify the WIDTH (the number of characters your data will take up) and the DECIMAL PLACES here.
Continue filling in all the columns for each of your variables until you are done. Then, click on DATA VIEW and enter the data. Each numbered line refers to one respondent or case. You should go across the screen, filling in the data for each variable. Make sure you save often.
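If you plan your variable names ahead of time (for example, while writing your codebook), you can check them against the naming rules above with a short script. This Python sketch is a hypothetical helper, not part of SPSS; besides the stated rules, it conservatively requires the first character to be a letter.

```python
import re

# The naming rules as this handbook states them: at most 8 characters, only letters,
# digits, and underscores, and no leading digit. This sketch also requires the first
# character to be a letter, a conservative reading of that rule.
NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_]{0,7}")


def is_valid_name(name):
    """Return True if name follows the variable-naming rules above."""
    return NAME_PATTERN.fullmatch(name) is not None


for candidate in ["educ", "age_grp", "1stjob", "family_income", "inc$"]:
    print(candidate, is_valid_name(candidate))
```

The 8-character limit reflects the SPSS versions this handbook was written for; later versions accept longer names.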


Copyright 2003, 2004 Mikaila Mariel Lemonik Arthur. All rights reserved. Contact Mikaila.Arthur AT nyu.edu.