by Mikaila Arthur for Research Methods
New York University, Fall 2003 and Fall 2004
- Basic Use of SPSS
- Split File
- Correlation Matrixes
- Making SPSS Datafiles
To Open Data:
Start SPSS. Go to the FILE menu and
click on OPEN, then DATA. Select the file you want and click OPEN.
To Save Data:
Go to FILE, then click SAVE. If you want to save the file with a new
name, click SAVE AS.
To Print:
Go to FILE. To see what your printout will look like, click PRINT
PREVIEW. Then, at the top of the screen, click the button that says
PRINT. You may want to consider clicking PROPERTIES and telling it to
print in LANDSCAPE. When you are ready, click OK, and it will print.
For general statistical measures of particular variables.
Go to ANALYZE, DESCRIPTIVE STATISTICS, FREQUENCIES.
Choose the variables you want, and use the arrow button to move
them into the VARIABLE(S) box. Select the output you want:
- DISPLAY FREQUENCY TABLES produces tables showing how often
each possible result for each variable occurred.
- Under STATISTICS: PERCENTILE VALUES shows you the values below which
a given percent of responses fall. You can choose QUARTILES (25%, 50%,
75%), CUT POINTS (you choose how many groups you want), or PERCENTILES
(you specify which percentiles you want). CENTRAL TENDENCY refers to
measures telling you where the average is. MEAN is the mathematical
average, where you add up all the numbers and divide by how many
numbers there are. MEDIAN is the middle value if you line up all the
numbers in a row in order. MODE is the value that occurs most
frequently. SUM is the total if you add up all the values. DISPERSION
has to do with how far the values are scattered from the central
tendency. The RANGE is the difference between the MINIMUM and the
MAXIMUM. VARIANCE is a measure of the variation in values, based on the
squared distances of the values from the mean. STANDARD DEVIATION is
the square root of the variance, so it is expressed in the same units
as the original values. Ignore the S.E. MEAN and the DISTRIBUTION
section, because they are primarily for more advanced users.
- Under CHARTS, you can choose BAR CHARTS or PIE CHARTS.
These both express the values of the variable in a graphical format.
Each one can be displayed either in FREQUENCIES (the number of times
each value appears) or PERCENTAGES (the percent of times each value
appears). Your third choice is a HISTOGRAM, which is similar to a bar
chart but groups values into intervals. If you select the NORMAL CURVE,
the histogram will appear with a normal curve superimposed on it, which
lets you see how closely your data follow a normal distribution.
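The statistics named in the FREQUENCIES dialogue can be checked by hand on a small set of numbers. The following is a minimal sketch in Python rather than SPSS, using made-up values (the list `values` is hypothetical, not from any dataset used in class). It is meant only to show what each measure is, and its quartile method may not match SPSS output exactly on every dataset.

```python
from collections import Counter
import statistics

# Hypothetical responses to one survey question (illustration only).
values = [2, 4, 4, 5, 7, 9, 11]

# Frequency table: how often each value occurs, as counts and as percents.
frequencies = Counter(values)
percentages = {v: 100 * n / len(values) for v, n in frequencies.items()}

# Central tendency.
mean = statistics.mean(values)        # add up the values, divide by how many
median = statistics.median(values)    # middle value once the values are sorted
mode = statistics.mode(values)        # value that occurs most frequently
total = sum(values)                   # SUM

# Dispersion.
value_range = max(values) - min(values)   # RANGE = MAXIMUM - MINIMUM
variance = statistics.variance(values)    # sample variance (n - 1 denominator)
std_dev = statistics.stdev(values)        # square root of the variance

# Percentile values: the three quartile cut points (25%, 50%, 75%).
quartiles = statistics.quantiles(values, n=4)

print("mean:", mean, "median:", median, "mode:", mode, "sum:", total)
print("range:", value_range, "variance:", variance, "std dev:", std_dev)
print("quartiles:", quartiles)
```

Note that the middle quartile is the same number as the median, which is why SPSS lists the 50% point under both headings.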
For getting basic information about the
variables in your dataset.
Go to the UTILITIES menu and click on
VARIABLES. You will get a window where you can choose any variable. If
you select a variable, it will tell you the name of the variable, the
variable label, any missing values, and the labels for all of the
variable values. The same information can also be obtained by clicking
the VARIABLE VIEW tab at the bottom of the screen.
This is a very useful feature, but does not substitute for consulting
For analyzing the values of one variable separately for each group of another variable.
Go to DATA, SPLIT FILE. Select ORGANIZE OUTPUT BY GROUPS. Then select
the variable you want to split by and use the arrow button to put the
variable in the GROUPS BASED ON box, and then click OK.
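Conceptually, splitting the file means sorting the cases into groups and then running the same analysis once per group. Here is a rough sketch of that idea in Python, not SPSS. The variable names SEX and EDUC and all of the values are hypothetical, chosen only for illustration.

```python
from collections import defaultdict
import statistics

# Hypothetical cases: (SEX, EDUC) pairs, where SEX is the split variable
# and EDUC (years of education) is the variable being analyzed.
cases = [
    ("male", 12),
    ("female", 16),
    ("male", 14),
    ("female", 12),
    ("female", 14),
]

# Put each case's EDUC value into the group named by its SEX value.
groups = defaultdict(list)
for sex, educ in cases:
    groups[sex].append(educ)

# Run the same analysis (here, the mean) separately for every group.
group_means = {sex: statistics.mean(vals) for sex, vals in groups.items()}

for sex, mean_educ in group_means.items():
    print(sex, "mean years of education:", mean_educ)
```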
Recoding is used to change how a variable is displayed and/or measured.
For instance, you can take an interval/scale variable and transform it
into an ordinal, grouped variable.
Go to the TRANSFORM menu, click on RECODE, then INTO DIFFERENT
VARIABLES. It is very important that you choose INTO DIFFERENT
VARIABLES so that you do not permanently change the original variables
in the dataset.
Select the variable you want to recode and put it into
the box using the arrow button. Specify the NAME and the LABEL for the
new variable (make sure these are different from the original variable
and are identifiable) and press CHANGE.
Then press OLD AND NEW VALUES. Under OLD VALUE, specify the value from
the original variable that you want to change. You can specify:
- VALUE, a single numerical value;
- SYSTEM-MISSING or SYSTEM-OR-USER
MISSING, both for missing values (the difference only matters for more
advanced users);
- RANGE, either minimum through a number
you specify, a number you specify through maximum, or between two
numbers you specify; or
- ALL OTHER VALUES.
Under NEW VALUE, either:
- Enter the VALUE you want to use;
- Select SYSTEM-MISSING for missing
values; or
- COPY THE OLD VALUE to keep values the
same.
After each value is recoded (including the last one), click ADD. Use
CHANGE to modify values you have already entered, and REMOVE to delete
values you have already entered. The other two options have to do with
variables that are strings (meaning not numbers); you probably will not
have to worry about these. Click CONTINUE after you have added all of
the values.
The IF button allows you to recode only those cases that satisfy a
condition you specify. It is used primarily by advanced users.
When you are finished recoding, press OK.
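To see what a recode does to the data, here is a minimal sketch in Python (not SPSS) that turns a scale variable into an ordinal, grouped one. The variable (age), the cut points (29 and 49), and the group codes (1, 2, 3) are all assumptions made up for this illustration. The results go into a new list, which mirrors choosing INTO DIFFERENT VARIABLES.

```python
def recode_age(age):
    """Recode age in years into a hypothetical three-group ordinal code."""
    if age is None:      # a missing old value stays missing in the new variable
        return None
    if age <= 29:        # RANGE: lowest through 29  -> new value 1
        return 1
    if age <= 49:        # RANGE: 30 through 49      -> new value 2
        return 2
    return 3             # ALL OTHER VALUES (50 up)  -> new value 3


# Hypothetical original variable; None stands in for a missing value.
ages = [18, 35, None, 62, 49]

# The recoded values go into a NEW variable, so `ages` is left unchanged.
age_group = [recode_age(a) for a in ages]

print(ages)
print(age_group)
```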
For making single variables that take into account the values of a
number of different variables. You can use this, for instance, to create a
measure of political attitude by adding together respondents' opinions
on various individual political issues.
Go to the TRANSFORM menu and click on COMPUTE. Enter the NAME for the
new variable you are creating. Click on TYPE&LABEL to specify the
LABEL for the variable, and make sure for TYPE, NUMERIC is selected.
Then click CONTINUE.
Using the arrow button and the
mathematical keys, select the variables and computations you want to
use. Make sure to follow the proper order of operations; in particular,
use parentheses to enclose expressions that you want to have calculated
first.
For example, say you had the highest level of education for your respondent (EDUC),
your respondent's mother (MAEDUC), your respondent's father (PAEDUC),
and your respondent's spouse (SPEDUC) and you wanted a variable
representing average educational attainment in the respondent's family.
You could enter the following into the box:
(EDUC + MAEDUC + PAEDUC + SPEDUC) / 4
When you are finished, click OK. You can now use FREQUENCIES to get
information about your new variable.
N.B.: In many cases, you will need to use the IF button to deal with
problems related to the addition of missing values information. If you
do a COMPUTE and the results seem rather strange, this is probably why.
You'll need to consult individually with an advanced user to deal with
this unless you have a decent knowledge of boolean logic, in which case
you can click on the IF button, INCLUDE IF CASE SATISFIES CONDITION,
and use boolean computation to exclude missing values. For class, if
you run into this problem, ask in lab for help.
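The missing values problem is easier to see with numbers. Below is a rough sketch in Python (not SPSS) of the family education example. It assumes, purely for illustration, that the codes 98 and 99 stand for missing answers; check your own codebook for the real codes. If a code like 99 is added in as if it were a real number of years, the average comes out strangely high, which is the kind of odd result described above.

```python
# Hypothetical missing-value codes (an assumption; your dataset may differ).
MISSING_CODES = {98, 99}


def family_education(educ, maeduc, paeduc, speduc):
    """Average of the four education variables, or None if any is missing."""
    values = [educ, maeduc, paeduc, speduc]
    # The "include if case satisfies condition" step: skip cases where any
    # of the four values is missing instead of averaging in the code itself.
    if any(v is None or v in MISSING_CODES for v in values):
        return None
    return sum(values) / 4


complete_case = family_education(12, 12, 16, 14)
case_with_missing = family_education(12, 99, 16, 14)

# What happens if the missing code is treated as a real value.
naive_result = (12 + 99 + 16 + 14) / 4

print(complete_case, case_with_missing, naive_result)
```

Returning no value for incomplete cases is only one possible choice; you could instead average whichever values are present, but then the result no longer means the same thing for every respondent.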
Correlation matrixes are the first level of
actual data analysis performed in SPSS. There are other methods of data
analysis, particularly regression, that you can learn to do in
statistics courses. Correlation matrixes compare 2 (or 3, but at this
level usually 2) variables in a table.
Go to the ANALYZE menu, click on DESCRIPTIVE STATISTICS, then click on
CROSSTABS. Using the arrow button, select the ROW and COLUMN variables
for your table. Be sure to put the independent variable in the column
and the dependent variable in the row. If you want additional
variables, you can put them in the third box, on the bottom of the
dialogue, and you can add more by clicking NEXT. Be sure you have a
rationale for using more than 2 variables, however, and be sure you
will understand what your output means.
Under CELLS, select OBSERVED, which will tell you the number of cases in
each of the cells of your table. If you wish, you can also select
EXPECTED, which will tell you how many cases would be in
each cell if there were no relationship between the two variables. You
may also want to select COLUMN PERCENTAGES, which will tell you the
percent of each column's cases that is in each cell. Don't worry about
the other options.
Under STATISTICS, select CHI-SQUARE. This will tell you if your results
are statistically significant. The other options are primarily for
advanced users.
DISPLAY CLUSTERED BAR CHARTS creates bar graphs with separate,
color-coded bars for different values of the variables. SUPPRESS TABLES
prevents the crosstabs tables from being generated, so don't pick that
option.
When you are done, click OK.
Understanding your output:
After you run a crosstabs, the OUTPUT window will open with the results.
The CASE PROCESSING SUMMARY tells you how many cases were used in the
computation, how many had missing values, and how many cases there were
in total.
CROSSTABULATION shows your primary results in a table with the
independent variable on top and the dependent variable on the side. If
you have selected options as suggested above, each cell will have three
numbers in it: the number of cases in the cell, the number of cases
which would be expected in the cell if no relationship existed, and the
percent of that column which is in that cell. There are also TOTALS for
the rows, the columns, and the entire table.
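The three numbers in each cell can be worked out by hand from the observed counts and the totals. Here is a minimal sketch in Python (not SPSS) using a made-up 2 by 2 table. The counts are hypothetical, with rows standing for the dependent variable and columns for the independent variable.

```python
# Hypothetical observed counts (illustration only).
# Rows = values of the dependent variable, columns = independent variable.
observed = [
    [30, 10],
    [20, 40],
]

# TOTALS for the rows, the columns, and the entire table.
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# EXPECTED count in a cell if the two variables were unrelated:
# (row total * column total) / grand total.
expected = [
    [r * c / grand_total for c in col_totals]
    for r in row_totals
]

# COLUMN PERCENTAGES: the share of each column's cases that is in each cell.
column_pct = [
    [100 * count / col_totals[j] for j, count in enumerate(row)]
    for row in observed
]

print("expected:", expected)
print("column percentages:", column_pct)
```

Each column of percentages adds up to 100, which is what lets you compare the groups of the independent variable even when they contain different numbers of cases.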
The CHI-SQUARE tests demonstrate whether your output is statistically
significant. You should focus on the first row, PEARSON CHI-SQUARE, and
the third column, ASYMP. SIG. (2-SIDED). The number you will find
there will be a decimal ranging from .000 to 1.000. Depending on the
context, you will need a number under .05 or, for a stricter standard,
under .01 for the results to be considered statistically significant.
Ignore the other numbers in the table. They
are primarily for more advanced users.
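If you are curious where the PEARSON CHI-SQUARE number comes from, the sketch below computes it in Python (not SPSS) for a made-up 2 by 2 table. The function name `pearson_chi_square` and the counts are hypothetical. The significance value is calculated here only for the special case of a 2 by 2 table, where a simple formula exists; for larger tables you would rely on SPSS or a statistics library.

```python
import math


def pearson_chi_square(observed):
    """Return (chi-square statistic, degrees of freedom) for a table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            # Count expected in this cell if the variables were unrelated.
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (count - expected) ** 2 / expected

    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df


# Hypothetical observed counts (illustration only).
table = [[30, 10], [20, 40]]
chi2, df = pearson_chi_square(table)

# With 1 degree of freedom (a 2 x 2 table), the two-sided significance
# can be computed from the complementary error function.
p_value = math.erfc(math.sqrt(chi2 / 2)) if df == 1 else None

print("chi-square:", round(chi2, 3), "df:", df, "significance:", p_value)
```

The bigger the gaps between the observed and expected counts, the larger the chi-square statistic and the smaller the significance value.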
You can save your output by going to FILE, then SAVE. The output will
only be readable by SPSS. Or, you can right-click on the table and
select COPY OBJECT to paste it into a word-processor document.
You can make many different kinds of graphs
in SPSS, including bar, line, and pie charts. In general, when you want
to make a graph, you go to the GRAPH menu and select the specific type
of graph you want to make. Just like in other SPSS functions, you need
to specify which variable or variables you want to graph.
There is one specific form of graph, however, that is likely to be especially
useful to you. That is the scatterplot. Scatterplots are the simplest
way of displaying correlations between continuous variables. In order
to make a scatterplot, go to the GRAPH menu and choose SCATTERPLOT. Of
the four options you are presented, you should choose SIMPLE.
You should choose two variables. The independent variable goes
in the X-AXIS box and the dependent variable goes in the Y-AXIS box.
Under TITLES, you should specify a TITLE for your graph, and you can
also choose to specify a SUBTITLE and/or a FOOTNOTE.
To interpret scatterplots, you look at the visual relationship that you
see between the two variables. In general, you are looking to find
either a direct/positive, an inverse/negative, or no
relationship, though at times other substantively interesting
relationships may appear.
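The direction you see in a scatterplot can also be summarized as a single number, the correlation coefficient. The sketch below computes it in Python (not SPSS) for two made-up sets of points. The function name `pearson_r` and the data are hypothetical. A result near +1 corresponds to points rising from left to right, a result near -1 to points falling, and a result near 0 to no straight-line pattern.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How the two variables vary together, and how each varies on its own.
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variation_x = sum((x - mean_x) ** 2 for x in xs)
    variation_y = sum((y - mean_y) ** 2 for y in ys)
    return co_variation / math.sqrt(variation_x * variation_y)


# Hypothetical data: X-AXIS (independent) and Y-AXIS (dependent) values.
education = [8, 10, 12, 14, 16]
income = [20, 24, 30, 33, 41]          # rises as education rises
tv_hours = [41, 33, 30, 24, 20]        # falls as education rises

r_positive = pearson_r(education, income)
r_negative = pearson_r(education, tv_hours)

print("education vs income:", round(r_positive, 3))
print("education vs tv hours:", round(r_negative, 3))
```

A coefficient like this only captures straight-line patterns, so it does not replace looking at the scatterplot itself, where curved or clustered relationships may show up.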
Instructions on how to create SPSS files using your own data.
Before you start, be sure you know what all of your variables are and
how you will code them.
Click on the VARIABLE VIEW tab at the bottom of the
screen. You will
see the window that is used to enter information about variables into
SPSS. Across the top of the screen are a series of column headers:
NAME, TYPE, WIDTH, DECIMALS, LABEL, VALUES, MISSING, COLUMNS, ALIGN,
and MEASURE. Follow these steps for each variable:
- For NAME, enter a name for your
variable. It cannot be more than
8 characters long, can only contain letters, numbers, and underscores (
_ ), and may not start with a number. Try to have it be somewhat
meaningful.
- For TYPE, choose the type that most
closely resembles the data you are using. Click on the
button to get the dialogue box.
In most cases, you will select NUMERIC. The other primary types are
DATE (for calendar dates), DOLLAR (for dollar amounts like salaries or
prices), and STRING (for words). You can also specify the WIDTH (the
number of characters long your data will be) and the DECIMAL PLACES now.
- If you have not already done so, specify the
WIDTH and the DECIMAL PLACES.
- For LABEL, write in meaningful form what your
variable is. The length of this statement is pretty much unlimited, but
10 words is a good maximum.
- The VALUES column is only useful to you if
your data is neither in word form nor a scale or interval level
variable. You use VALUES to change multiple-choice type answers into
numbers. For instance, you could ask respondents for their sexes, and
represent male as 1 and female as 2. Click on the button
to get the dialogue box. Add the numerical VALUE and the VALUE LABEL
(meaningful text explaining what the value refers to; you have 30
characters, maximum). You must click ADD after entering each value,
including the last one. Use CHANGE to modify a value you have already
entered and REMOVE to delete a value you have already entered. When you
are finished, click OK.
- MISSING VALUES is where you enter values to
represent missing data. For instance, you could have a value that
represents "not applicable" and another that represents "respondent
refused to answer." For the purposes of data analysis, however, these
should both be treated the same, as missing data. Click on the button to get
the dialogue box. If there is no missing data, click the radio
button next to NO MISSING VALUES. If you have 1, 2, or 3 kinds of
missing values, click the radio button next to DISCRETE MISSING VALUES
and enter the values in the boxes. For more than 3 kinds of missing
values, make sure they are values next to each other plus, if you wish,
one other value (for example, the missing values could be 1, 25, 26,
27, 28, and 29). Then you can click the radio button next to RANGE PLUS
ONE DISCRETE MISSING VALUE. Enter the minimum value in the range in the
LOW box, the maximum in the HIGH box, and the other value in the
DISCRETE VALUE box.
- COLUMNS specifies how many columns of display
will be provided for each variable. This number should usually be the
same as WIDTH, but you can make it larger or smaller if it will make it
easier for you to see the data.
- ALIGN again refers to how you want the data to
be displayed. Choose what you find most helpful.
- Finally, MEASURE is where you specify what
type of variable you are using. It is important that you choose
NOMINAL, ORDINAL, or SCALE correctly. Remember:
- NOMINAL refers to variables where
there is no numerical order implied by the data. Examples are gender,
race, or eye color.
- ORDINAL refers to variables where
there is a numerical order, but arithmetic on the values is not
meaningful. This means that a score with value 1 is not half the value
of a score with value 2; they are just arranged in a particular order.
The most common use of this variable type is for opinion or attitude
measures, but another example is highest degree earned in school.
- SCALE variables are those which can
be added and subtracted meaningfully, for example age and income.
Continue filling in all the columns for each of your
variables until you are done. Then, click on DATA VIEW and enter the
data. Each numbered line refers to one respondent or case. You should
go across the screen, filling in the data for each variable. Make sure
you save often.
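Two of the rules above are easy to get wrong, so it can help to see them written out exactly: the rule for variable names, and the RANGE PLUS ONE DISCRETE MISSING VALUE option. The following is a rough sketch in Python (not SPSS). The function names are hypothetical, the name check encodes only the rule as stated in this handout (SPSS itself may apply further restrictions), and the missing values used are the example values from the text (25 through 29, plus 1).

```python
import re

# At most 8 characters; only letters, numbers, and underscores;
# may not start with a number.
NAME_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_]{0,7}")


def is_valid_name(name):
    """Check a variable name against the naming rule described above."""
    return NAME_PATTERN.fullmatch(name) is not None


def is_missing(value, low=25, high=29, discrete=1):
    """RANGE PLUS ONE DISCRETE MISSING VALUE: LOW through HIGH, or DISCRETE."""
    return low <= value <= high or value == discrete


for name in ["EDUC", "fam_educ", "2educ", "EDUCATION", "my var"]:
    print(name, "->", "ok" if is_valid_name(name) else "not allowed")

# Keep only the real answers from a hypothetical column of coded data.
raw_values = [12, 27, 16, 1, 14, 29]
real_values = [v for v in raw_values if not is_missing(v)]
print(real_values)
```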
2003, 2004 Mikaila Mariel Lemonik Arthur. All rights reserved. Contact
Mikaila.Arthur AT nyu.edu.