Objectives

What is “coding”

Computer programming, or coding is the use of a specific language to execute commands on a computer. The commands are usually called functions, though there’s more to it than that. A function is simply a mapping between some input and some output. It specifies a procedure for turning input into output.

For example, the sample mean is a function. The details are what the equation \(\frac{1}{N}\Sigma X\) says to do. This function has “arguments”, details to be passed to it and used. In this example, the arguments are N (the sample size) and the sum of the scores.

At it’s essence, R is a procedural language, meaning that programming in R is very literal, and involves writing out calls to various functions in order to obtain the output that you’d like to have. It also has “object-oriented” properties in the background. We don’t need to know about those, but they’re great since they make R very flexible as an open source tool. Open source means that anyone in the world can add functions or procedures to R by writing new code in R.

R Console versus the RStudio Script

In R, we have a choice of typing in commands at the console, which is just the command prompt or writing out many commands and statements in a file and running or executing some or all of them when we need them. The latter is basically the same as writing syntax in SPSS. Like syntax, R scripts can be written, saved, re-used, etc. In both cases, the idea is to be able to reconstruct your analysis exactly each time you re-do it. This saves time and energy, but also makes the analysis “reproducible” such that other scientists can check your work if they desire.

We will use the RStudio Script window to write and save our code. But let’s start with the console.

Type the following statements one by one in the console and see the output. You can actually copy each line one at a time and paste it in your console.

Press ENTER to have the console return results.

Note that any line that starts with # is not read by R. This is called a comment, or “commenting out” a line. These are handy in script files as they explain what is there to a person who doesn’t know what you were hoping to do (which is sometimes you!).

# addition
100 + 100

# subtraction
100 - 100

# multiplication
100 * 100

# division
100 / 100

# Squaring
100^2

# Raising to any power
100^(1.23)

# Using "e" (Euler's Number)
exp(-1.2)

Doing stuff

So at this point we can see that R is a handy-dandy calculator. More elaborate computations can be performed using parentheses and some other good stuff. Let’s try it, but now I want you to open up a new Script file. You can do that by clicking File > New File > R Script (or by various other short-cut methods).

The basic unit of data in R is a vector. Think of it like a row or column of data in Excel. Data objects aren’t very useful if we don’t save them to R’s Global Environment (this means the active datasets in R’s memory at any time).

We “assign” data to an object in R using the assignment operator <- . To type it you type the less than sign < then the short dash -. We put data in a vector by using the c() syntax. c is short for concatenate, a fancy word meaning string a bunch of stuff together.

When you assign data to an object (x below). It won’t print to the console. To see what’s in that object you can simply type it’s name at the console.

x <- c(1, 2, 3, 4) # assignment
x # print
## [1] 1 2 3 4

Using a script to generate data, and get the mean and standard deviation

Let’s say you’ve got 10 scores from different patients rating the quality of their care on a 5-item scale. The data are averages across the 5 items. A score of 5 means the patient was very satisfied.

Type in the following lines, you can omit the comments. When you are finished, click the button in your script window that says “Source.” This button sends all of your script to the console and runs it all. If instead you want to run one line at a time, highlight it with your cursor, or simply put your cursor at the beginning of the line and click the “Run” button. You can do this for the rest of the tutorial (that is, run each new line by itself rather than hitting source over and over again).

# make a small dataset
mydata <- c(3.0, 2.0, 5.0, 2.5, 3.5, 4.5, 2.0, 3.80, 1.0, 4.0)

# compute the mean using the sum of the scores divided by how many there are (the length of the vector holding them)
mymean <- sum(mydata) / length(mydata)

# compute sum of squares using similar commands note the parentheses!
mySS <- sum(mydata^2) - ((sum(mydata)^2) / length(mydata))

# compute the sample standard deviation by taking the square root of the sample variance
mysd <- sqrt(mySS / (length(mydata)- 1))

Ok, you may be wondering where the output is. Look at the Environment tab in your RStudio pane. You’ll see a list of Values and the names you gave them along with the results. If you did it correctly you should have:

##  [1] 3.0 2.0 5.0 2.5 3.5 4.5 2.0 3.8 1.0 4.0
## [1] 3.13
## [1] 14.221
## [1] 1.257025

You’ve done several things here, let’s unpack each one. Let’s compare our results with R’s built in functions

mean(mydata)
## [1] 3.13
sd(mydata)
## [1] 1.257025

Aha! We did it. As you can see, many things that you might be tempted to hard code are already functions / commands in R. We’ll see that this is true for many descriptive statistics in the next section. But first we need to start working with real data, and importing it into R.


Importing Data

The two types of files that are likely to contain data you already have are .sav files (SPSS) and either .csv or .xlsx files. I’m grouping the latter 2 together although they’re actually quite different. To make things easier, I’m going to show you how to import .csv files only and also SPSS files.

The best way to make sure the data file you’re importing is accessible is to make a folder on your computer, store the file in that folder, and then change the working directory to that folder

Set as working directory. Click the Gear icon to open this menu.

Set as working directory. Click the Gear icon to open this menu.

Formatting your data

Data can be cleaned in R quite efficiently, but we won’t worry about that. However, you want your data to adhere to the same format as SPSS files have. That is, you want each row to be a person (unit of observation) and each column to be a different measured or assigned variable. Missing data is fine, and R knows how to handle a variety of missing data.

Let’s try it. First, you’ll need to download the nurses.sav file. Save it to your Downloads folder if it doesn’t automatically go there.

Here’s where it get’s a little tricky. To import SPSS data, we will use the “read.spss” command. That command asks you to give the full path to the file. On windows and PC it’s a bit different. I have a Mac so the following works for me. Before we run it, we have to load the add-on package that contains the command. It is called foreign and you’ll likely need to install this package first. Following code loads the package, then runs the command by specifying the path to the file in quotes. This assumes that you have already set the working directory to the folder that contains nurses.sav.

library(foreign)
nurses <- read.spss("nurses.sav", to.data.frame = TRUE)

Anyway, you can verify that your data imported by looking at the Global Environment pane and clicking on the name of the dataset. It should pop up as a tab in RStudio and if you click on the tab you’ll see something that is very much like a spreadsheet. Also, you should see a blue circle with a little arrow back over by your dataset. It’s a quick way to view the columns in your data. Notice we have a “NA” at various places. This is R’s native missing data symbol and can be handled by various commands in various ways.

One more thing, let’s get some more details about our data before we analyze it. Try this

# get the "structure" of our data
str(nurses)
## 'data.frame':    1000 obs. of  20 variables:
##  $ hospital : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ ward     : num  1 1 1 1 1 1 1 1 1 2 ...
##  $ wardid   : num  11 11 11 11 11 11 11 11 11 12 ...
##  $ nurse    : num  1 2 3 4 5 6 7 8 9 10 ...
##  $ age      : num  36 45 32 57 46 60 23 32 60 45 ...
##  $ gender   : Factor w/ 2 levels "male","female": 1 1 1 2 2 2 2 2 1 1 ...
##  $ experien : num  11 20 7 25 22 22 13 13 17 21 ...
##  $ stress   : num  7 7 7 6 6 6 6 7 7 6 ...
##  $ wardtype : Factor w/ 2 levels "general care",..: 1 1 1 1 1 1 1 1 1 2 ...
##  $ hospsize : Factor w/ 3 levels "small","medium",..: 3 3 3 3 3 3 3 3 3 3 ...
##  $ expcon   : Factor w/ 2 levels "control","experiment": 2 2 2 2 2 2 2 2 2 2 ...
##  $ Zage     : num  -0.582 0.166 -0.914 1.162 0.249 ...
##  $ Zgender  : num  -1.66 -1.66 -1.66 0.6 0.6 ...
##  $ Zexperien: num  -1.002 0.487 -1.664 1.315 0.818 ...
##  $ Zstress  : num  2.07 2.07 2.07 1.04 1.04 ...
##  $ Zwardtype: num  -1 -1 -1 -1 -1 ...
##  $ Zhospsize: num  1.78 1.78 1.78 1.78 1.78 ...
##  $ Zexpcon  : num  0.992 0.992 0.992 0.992 0.992 ...
##  $ Cexpcon  : num  0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
##  $ Chospsize: num  1 1 1 1 1 1 1 1 1 1 ...
##  - attr(*, "variable.labels")= Named chr [1:20] "id" "id" "unique ward id" "id" ...
##   ..- attr(*, "names")= chr [1:20] "hospital" "ward" "wardid" "nurse" ...
##  - attr(*, "codepage")= int 65001

What you’ve done here is basically print to the screen what you get with the toggle circle button. This is very helpful if you have factors labels (as with gender) which you will see are included with SPSS imports.

The nurses data set has a special type called data frame. This is basically R’s version of a spreadsheet. It has different columns for variables, and different rows for cases. Each column is of a specific type: num for numeric, char for character (text), Factor for coded variables, int for integer, and logi for logical (True / False).

Descriptive statistics

We won’t have much use in this class for writing our own little commands and functions, we’ll use the built-in functions. Here we’ll check one out from the psych package. For more on this type ?psych in the console. Actually do it! If successful, you’ll see the help page for psych come up in one of your panes. In that window, click on describe it will be underlined in the 3rd line of the text. We’re going to use this function now. Note that in the “Usage” section of the help document, there’s lots of options for the describe function. They’re all default, so the only thing we need to do is simply specify what x is.

As with foreign the psych package needs to be loaded with the library() command. In practice, it’s best to put all of your library() commands at the top of your script and load them simultaneously. All we need to do is specify the data frame containing the data and (optionally) the columns.

library(psych)
# use the describe function from psych on nurses
# use only columns 2 through 21
describe(nurses[ , c("age","experien","stress") ])
##          vars    n  mean    sd median trimmed   mad min max range  skew
## age         1 1000 43.01 12.04     43   42.98 14.83  23  64    41  0.03
## experien    2 1000 17.06  6.04     17   17.06  5.93   1  38    37  0.03
## stress      3 1000  4.98  0.98      5    5.03  1.48   1   7     6 -0.45
##          kurtosis   se
## age         -1.18 0.38
## experien    -0.39 0.19
## stress       0.25 0.03

The column labels are pretty self explanatory, but your view in R studio may be different. For the most part, running something in the Script window will send output to the Console.

A quick note on indexing

The syntax I used above to select columns to analyze in the nurses data frame is only one option out of many. Let’s take a quick look at several ways to get the values in the “age” column

# using the $ selector
nurses$age
##    [1] 36 45 32 57 46 60 23 32 60 45 57 47 32 42 42 53 60 33 64 37 23 61 58 52
##   [25] 28 52 43 64 47 62 39 46 58 34 41 56 39 60 57 50 29 57 51 25 27 53 42 43
##   [49] 24 48 60 27 61 33 47 50 38 28 33 52 53 31 27 57 47 41 64 45 42 25 34 36
##   [73] 55 33 51 60 24 31 58 60 29 24 24 46 43 24 45 64 25 55 42 58 64 47 25 50
##   [97] 42 60 33 51 27 43 45 53 53 37 25 32 46 57 27 55 51 24 56 31 46 57 42 53
##  [121] 48 25 37 47 29 41 31 52 36 39 62 42 51 29 25 50 23 60 43 43 27 28 36 60
##  [145] 64 64 61 27 32 28 64 55 46 45 23 55 60 39 24 61 28 34 34 25 39 64 60 58
##  [169] 58 32 41 61 47 60 56 47 31 47 51 61 50 37 23 56 45 23 48 46 64 47 56 51
##  [193] 28 61 41 37 25 25 50 64 38 39 43 28 33 39 46 25 52 45 52 33 25 36 50 57
##  [217] 43 53 62 60 62 33 33 27 61 23 38 64 64 57 25 56 43 45 23 46 37 31 29 37
##  [241] 61 43 39 29 52 58 34 57 37 56 32 29 29 32 25 36 45 58 52 42 31 37 47 31
##  [265] 38 58 31 41 62 62 24 28 58 56 50 51 24 28 43 45 61 45 41 31 32 24 57 41
##  [289] 58 56 51 45 32 55 53 41 47 41 28 62 34 41 31 57 60 29 46 27 50 45 34 45
##  [313] 39 43 28 55 32 38 36 29 48 31 37 58 31 46 60 46 57 50 64 24 37 36 58 64
##  [337] 34 47 43 39 43 27 27 55 61 37 36 32 47 27 42 55 45 23 58 60 58 33 50 47
##  [361] 23 53 27 53 37 53 52 39 48 64 42 31 34 41 47 48 29 51 48 61 23 61 50 39
##  [385] 37 47 24 47 23 36 41 55 56 37 39 24 56 60 42 29 39 61 37 24 31 36 47 64
##  [409] 47 31 55 34 29 42 42 56 57 29 33 39 62 60 56 47 34 39 50 38 50 25 34 39
##  [433] 42 42 39 56 56 31 58 43 58 37 25 57 52 36 24 42 60 56 55 31 32 42 32 32
##  [457] 57 61 41 46 51 31 56 42 45 39 46 36 42 41 37 52 42 57 62 32 56 48 33 34
##  [481] 61 24 31 46 27 42 62 57 31 34 57 53 47 34 47 31 25 57 47 61 27 53 34 64
##  [505] 34 55 43 29 45 27 47 58 45 64 55 32 29 29 48 45 43 23 60 39 45 36 45 36
##  [529] 42 37 41 64 58 62 32 33 25 60 38 48 45 56 46 45 37 57 62 24 61 51 33 60
##  [553] 36 52 56 46 50 32 45 28 51 38 55 45 32 43 24 47 47 45 52 45 62 48 50 46
##  [577] 61 58 51 33 24 60 43 57 33 62 61 47 60 27 24 31 55 53 23 25 27 45 52 25
##  [601] 37 50 41 28 50 52 23 28 24 37 56 58 36 58 41 51 34 58 37 25 45 58 28 43
##  [625] 38 38 37 61 64 47 39 45 34 34 52 33 50 45 31 53 28 50 58 39 42 28 43 64
##  [649] 47 29 56 23 28 43 60 62 34 36 51 53 58 57 23 23 36 39 33 45 50 34 64 31
##  [673] 61 48 34 58 42 51 28 64 58 37 43 33 62 39 55 43 61 52 45 57 32 32 34 32
##  [697] 56 57 61 23 24 42 24 57 27 51 58 29 42 43 32 51 47 62 61 47 37 23 27 55
##  [721] 25 43 52 25 50 50 58 34 43 46 42 51 46 25 29 48 53 29 51 62 29 46 48 37
##  [745] 25 38 24 27 45 41 38 57 45 57 25 43 45 38 46 46 41 45 37 53 39 51 32 25
##  [769] 23 41 55 24 55 50 62 46 48 53 56 46 56 29 36 31 53 52 47 56 25 56 61 64
##  [793] 41 56 37 36 41 25 62 27 29 36 58 46 37 32 64 52 31 52 47 39 45 53 51 50
##  [817] 34 48 32 38 34 62 31 50 47 27 61 48 64 28 36 51 55 31 28 33 38 45 28 38
##  [841] 46 57 42 31 37 57 42 42 60 33 48 24 37 37 28 34 33 50 31 29 61 58 48 25
##  [865] 34 27 23 58 48 50 45 51 53 25 50 29 64 36 56 34 61 34 58 60 39 24 34 48
##  [889] 28 34 42 46 38 31 24 56 23 50 27 60 23 25 24 37 56 42 39 29 64 50 36 46
##  [913] 56 31 56 64 47 32 41 41 34 37 48 62 38 24 25 36 23 24 60 41 28 58 24 34
##  [937] 33 55 43 57 31 23 37 55 58 55 47 47 28 33 32 52 34 56 42 29 28 55 37 47
##  [961] 48 31 25 47 51 34 62 39 46 38 36 25 42 36 32 42 36 51 52 31 25 41 29 23
##  [985] 23 38 28 62 45 60 34 41 27 39 33 56 29 32 34 58
# using matrix indexing
nurses[ , 5]
##    [1] 36 45 32 57 46 60 23 32 60 45 57 47 32 42 42 53 60 33 64 37 23 61 58 52
##   [25] 28 52 43 64 47 62 39 46 58 34 41 56 39 60 57 50 29 57 51 25 27 53 42 43
##   [49] 24 48 60 27 61 33 47 50 38 28 33 52 53 31 27 57 47 41 64 45 42 25 34 36
##   [73] 55 33 51 60 24 31 58 60 29 24 24 46 43 24 45 64 25 55 42 58 64 47 25 50
##   [97] 42 60 33 51 27 43 45 53 53 37 25 32 46 57 27 55 51 24 56 31 46 57 42 53
##  [121] 48 25 37 47 29 41 31 52 36 39 62 42 51 29 25 50 23 60 43 43 27 28 36 60
##  [145] 64 64 61 27 32 28 64 55 46 45 23 55 60 39 24 61 28 34 34 25 39 64 60 58
##  [169] 58 32 41 61 47 60 56 47 31 47 51 61 50 37 23 56 45 23 48 46 64 47 56 51
##  [193] 28 61 41 37 25 25 50 64 38 39 43 28 33 39 46 25 52 45 52 33 25 36 50 57
##  [217] 43 53 62 60 62 33 33 27 61 23 38 64 64 57 25 56 43 45 23 46 37 31 29 37
##  [241] 61 43 39 29 52 58 34 57 37 56 32 29 29 32 25 36 45 58 52 42 31 37 47 31
##  [265] 38 58 31 41 62 62 24 28 58 56 50 51 24 28 43 45 61 45 41 31 32 24 57 41
##  [289] 58 56 51 45 32 55 53 41 47 41 28 62 34 41 31 57 60 29 46 27 50 45 34 45
##  [313] 39 43 28 55 32 38 36 29 48 31 37 58 31 46 60 46 57 50 64 24 37 36 58 64
##  [337] 34 47 43 39 43 27 27 55 61 37 36 32 47 27 42 55 45 23 58 60 58 33 50 47
##  [361] 23 53 27 53 37 53 52 39 48 64 42 31 34 41 47 48 29 51 48 61 23 61 50 39
##  [385] 37 47 24 47 23 36 41 55 56 37 39 24 56 60 42 29 39 61 37 24 31 36 47 64
##  [409] 47 31 55 34 29 42 42 56 57 29 33 39 62 60 56 47 34 39 50 38 50 25 34 39
##  [433] 42 42 39 56 56 31 58 43 58 37 25 57 52 36 24 42 60 56 55 31 32 42 32 32
##  [457] 57 61 41 46 51 31 56 42 45 39 46 36 42 41 37 52 42 57 62 32 56 48 33 34
##  [481] 61 24 31 46 27 42 62 57 31 34 57 53 47 34 47 31 25 57 47 61 27 53 34 64
##  [505] 34 55 43 29 45 27 47 58 45 64 55 32 29 29 48 45 43 23 60 39 45 36 45 36
##  [529] 42 37 41 64 58 62 32 33 25 60 38 48 45 56 46 45 37 57 62 24 61 51 33 60
##  [553] 36 52 56 46 50 32 45 28 51 38 55 45 32 43 24 47 47 45 52 45 62 48 50 46
##  [577] 61 58 51 33 24 60 43 57 33 62 61 47 60 27 24 31 55 53 23 25 27 45 52 25
##  [601] 37 50 41 28 50 52 23 28 24 37 56 58 36 58 41 51 34 58 37 25 45 58 28 43
##  [625] 38 38 37 61 64 47 39 45 34 34 52 33 50 45 31 53 28 50 58 39 42 28 43 64
##  [649] 47 29 56 23 28 43 60 62 34 36 51 53 58 57 23 23 36 39 33 45 50 34 64 31
##  [673] 61 48 34 58 42 51 28 64 58 37 43 33 62 39 55 43 61 52 45 57 32 32 34 32
##  [697] 56 57 61 23 24 42 24 57 27 51 58 29 42 43 32 51 47 62 61 47 37 23 27 55
##  [721] 25 43 52 25 50 50 58 34 43 46 42 51 46 25 29 48 53 29 51 62 29 46 48 37
##  [745] 25 38 24 27 45 41 38 57 45 57 25 43 45 38 46 46 41 45 37 53 39 51 32 25
##  [769] 23 41 55 24 55 50 62 46 48 53 56 46 56 29 36 31 53 52 47 56 25 56 61 64
##  [793] 41 56 37 36 41 25 62 27 29 36 58 46 37 32 64 52 31 52 47 39 45 53 51 50
##  [817] 34 48 32 38 34 62 31 50 47 27 61 48 64 28 36 51 55 31 28 33 38 45 28 38
##  [841] 46 57 42 31 37 57 42 42 60 33 48 24 37 37 28 34 33 50 31 29 61 58 48 25
##  [865] 34 27 23 58 48 50 45 51 53 25 50 29 64 36 56 34 61 34 58 60 39 24 34 48
##  [889] 28 34 42 46 38 31 24 56 23 50 27 60 23 25 24 37 56 42 39 29 64 50 36 46
##  [913] 56 31 56 64 47 32 41 41 34 37 48 62 38 24 25 36 23 24 60 41 28 58 24 34
##  [937] 33 55 43 57 31 23 37 55 58 55 47 47 28 33 32 52 34 56 42 29 28 55 37 47
##  [961] 48 31 25 47 51 34 62 39 46 38 36 25 42 36 32 42 36 51 52 31 25 41 29 23
##  [985] 23 38 28 62 45 60 34 41 27 39 33 56 29 32 34 58
# using the column name and matrix indexing
nurses[, "age"]
##    [1] 36 45 32 57 46 60 23 32 60 45 57 47 32 42 42 53 60 33 64 37 23 61 58 52
##   [25] 28 52 43 64 47 62 39 46 58 34 41 56 39 60 57 50 29 57 51 25 27 53 42 43
##   [49] 24 48 60 27 61 33 47 50 38 28 33 52 53 31 27 57 47 41 64 45 42 25 34 36
##   [73] 55 33 51 60 24 31 58 60 29 24 24 46 43 24 45 64 25 55 42 58 64 47 25 50
##   [97] 42 60 33 51 27 43 45 53 53 37 25 32 46 57 27 55 51 24 56 31 46 57 42 53
##  [121] 48 25 37 47 29 41 31 52 36 39 62 42 51 29 25 50 23 60 43 43 27 28 36 60
##  [145] 64 64 61 27 32 28 64 55 46 45 23 55 60 39 24 61 28 34 34 25 39 64 60 58
##  [169] 58 32 41 61 47 60 56 47 31 47 51 61 50 37 23 56 45 23 48 46 64 47 56 51
##  [193] 28 61 41 37 25 25 50 64 38 39 43 28 33 39 46 25 52 45 52 33 25 36 50 57
##  [217] 43 53 62 60 62 33 33 27 61 23 38 64 64 57 25 56 43 45 23 46 37 31 29 37
##  [241] 61 43 39 29 52 58 34 57 37 56 32 29 29 32 25 36 45 58 52 42 31 37 47 31
##  [265] 38 58 31 41 62 62 24 28 58 56 50 51 24 28 43 45 61 45 41 31 32 24 57 41
##  [289] 58 56 51 45 32 55 53 41 47 41 28 62 34 41 31 57 60 29 46 27 50 45 34 45
##  [313] 39 43 28 55 32 38 36 29 48 31 37 58 31 46 60 46 57 50 64 24 37 36 58 64
##  [337] 34 47 43 39 43 27 27 55 61 37 36 32 47 27 42 55 45 23 58 60 58 33 50 47
##  [361] 23 53 27 53 37 53 52 39 48 64 42 31 34 41 47 48 29 51 48 61 23 61 50 39
##  [385] 37 47 24 47 23 36 41 55 56 37 39 24 56 60 42 29 39 61 37 24 31 36 47 64
##  [409] 47 31 55 34 29 42 42 56 57 29 33 39 62 60 56 47 34 39 50 38 50 25 34 39
##  [433] 42 42 39 56 56 31 58 43 58 37 25 57 52 36 24 42 60 56 55 31 32 42 32 32
##  [457] 57 61 41 46 51 31 56 42 45 39 46 36 42 41 37 52 42 57 62 32 56 48 33 34
##  [481] 61 24 31 46 27 42 62 57 31 34 57 53 47 34 47 31 25 57 47 61 27 53 34 64
##  [505] 34 55 43 29 45 27 47 58 45 64 55 32 29 29 48 45 43 23 60 39 45 36 45 36
##  [529] 42 37 41 64 58 62 32 33 25 60 38 48 45 56 46 45 37 57 62 24 61 51 33 60
##  [553] 36 52 56 46 50 32 45 28 51 38 55 45 32 43 24 47 47 45 52 45 62 48 50 46
##  [577] 61 58 51 33 24 60 43 57 33 62 61 47 60 27 24 31 55 53 23 25 27 45 52 25
##  [601] 37 50 41 28 50 52 23 28 24 37 56 58 36 58 41 51 34 58 37 25 45 58 28 43
##  [625] 38 38 37 61 64 47 39 45 34 34 52 33 50 45 31 53 28 50 58 39 42 28 43 64
##  [649] 47 29 56 23 28 43 60 62 34 36 51 53 58 57 23 23 36 39 33 45 50 34 64 31
##  [673] 61 48 34 58 42 51 28 64 58 37 43 33 62 39 55 43 61 52 45 57 32 32 34 32
##  [697] 56 57 61 23 24 42 24 57 27 51 58 29 42 43 32 51 47 62 61 47 37 23 27 55
##  [721] 25 43 52 25 50 50 58 34 43 46 42 51 46 25 29 48 53 29 51 62 29 46 48 37
##  [745] 25 38 24 27 45 41 38 57 45 57 25 43 45 38 46 46 41 45 37 53 39 51 32 25
##  [769] 23 41 55 24 55 50 62 46 48 53 56 46 56 29 36 31 53 52 47 56 25 56 61 64
##  [793] 41 56 37 36 41 25 62 27 29 36 58 46 37 32 64 52 31 52 47 39 45 53 51 50
##  [817] 34 48 32 38 34 62 31 50 47 27 61 48 64 28 36 51 55 31 28 33 38 45 28 38
##  [841] 46 57 42 31 37 57 42 42 60 33 48 24 37 37 28 34 33 50 31 29 61 58 48 25
##  [865] 34 27 23 58 48 50 45 51 53 25 50 29 64 36 56 34 61 34 58 60 39 24 34 48
##  [889] 28 34 42 46 38 31 24 56 23 50 27 60 23 25 24 37 56 42 39 29 64 50 36 46
##  [913] 56 31 56 64 47 32 41 41 34 37 48 62 38 24 25 36 23 24 60 41 28 58 24 34
##  [937] 33 55 43 57 31 23 37 55 58 55 47 47 28 33 32 52 34 56 42 29 28 55 37 47
##  [961] 48 31 25 47 51 34 62 39 46 38 36 25 42 36 32 42 36 51 52 31 25 41 29 23
##  [985] 23 38 28 62 45 60 34 41 27 39 33 56 29 32 34 58

The $ “selects” the age variable from nurses, and the result is a vector (the column of ages)

Matrix indexing is really handy, but also tricky. It is common to all computer code. A single cell of a matrix or rectangular data array is accessed using it’s row number and column number in square brackets sepearated by a comma. For example, this will give me the age of the nurse in the 6th row. Six specifies the row, while 5 is there singe age is the 5th column in the dataset

# the age of the nurse in the 6th row
nurses[6,5]
## [1] 60

Usually however, we want a whole row or column. We do that simply by leaving either the row or column entry blank like so (note: I’m using the head() command to print only 6 rows, the dataset is 1000 rows…)

# 5th column of the data
nurses[ , 5]
##    [1] 36 45 32 57 46 60 23 32 60 45 57 47 32 42 42 53 60 33 64 37 23 61 58 52
##   [25] 28 52 43 64 47 62 39 46 58 34 41 56 39 60 57 50 29 57 51 25 27 53 42 43
##   [49] 24 48 60 27 61 33 47 50 38 28 33 52 53 31 27 57 47 41 64 45 42 25 34 36
##   [73] 55 33 51 60 24 31 58 60 29 24 24 46 43 24 45 64 25 55 42 58 64 47 25 50
##   [97] 42 60 33 51 27 43 45 53 53 37 25 32 46 57 27 55 51 24 56 31 46 57 42 53
##  [121] 48 25 37 47 29 41 31 52 36 39 62 42 51 29 25 50 23 60 43 43 27 28 36 60
##  [145] 64 64 61 27 32 28 64 55 46 45 23 55 60 39 24 61 28 34 34 25 39 64 60 58
##  [169] 58 32 41 61 47 60 56 47 31 47 51 61 50 37 23 56 45 23 48 46 64 47 56 51
##  [193] 28 61 41 37 25 25 50 64 38 39 43 28 33 39 46 25 52 45 52 33 25 36 50 57
##  [217] 43 53 62 60 62 33 33 27 61 23 38 64 64 57 25 56 43 45 23 46 37 31 29 37
##  [241] 61 43 39 29 52 58 34 57 37 56 32 29 29 32 25 36 45 58 52 42 31 37 47 31
##  [265] 38 58 31 41 62 62 24 28 58 56 50 51 24 28 43 45 61 45 41 31 32 24 57 41
##  [289] 58 56 51 45 32 55 53 41 47 41 28 62 34 41 31 57 60 29 46 27 50 45 34 45
##  [313] 39 43 28 55 32 38 36 29 48 31 37 58 31 46 60 46 57 50 64 24 37 36 58 64
##  [337] 34 47 43 39 43 27 27 55 61 37 36 32 47 27 42 55 45 23 58 60 58 33 50 47
##  [361] 23 53 27 53 37 53 52 39 48 64 42 31 34 41 47 48 29 51 48 61 23 61 50 39
##  [385] 37 47 24 47 23 36 41 55 56 37 39 24 56 60 42 29 39 61 37 24 31 36 47 64
##  [409] 47 31 55 34 29 42 42 56 57 29 33 39 62 60 56 47 34 39 50 38 50 25 34 39
##  [433] 42 42 39 56 56 31 58 43 58 37 25 57 52 36 24 42 60 56 55 31 32 42 32 32
##  [457] 57 61 41 46 51 31 56 42 45 39 46 36 42 41 37 52 42 57 62 32 56 48 33 34
##  [481] 61 24 31 46 27 42 62 57 31 34 57 53 47 34 47 31 25 57 47 61 27 53 34 64
##  [505] 34 55 43 29 45 27 47 58 45 64 55 32 29 29 48 45 43 23 60 39 45 36 45 36
##  [529] 42 37 41 64 58 62 32 33 25 60 38 48 45 56 46 45 37 57 62 24 61 51 33 60
##  [553] 36 52 56 46 50 32 45 28 51 38 55 45 32 43 24 47 47 45 52 45 62 48 50 46
##  [577] 61 58 51 33 24 60 43 57 33 62 61 47 60 27 24 31 55 53 23 25 27 45 52 25
##  [601] 37 50 41 28 50 52 23 28 24 37 56 58 36 58 41 51 34 58 37 25 45 58 28 43
##  [625] 38 38 37 61 64 47 39 45 34 34 52 33 50 45 31 53 28 50 58 39 42 28 43 64
##  [649] 47 29 56 23 28 43 60 62 34 36 51 53 58 57 23 23 36 39 33 45 50 34 64 31
##  [673] 61 48 34 58 42 51 28 64 58 37 43 33 62 39 55 43 61 52 45 57 32 32 34 32
##  [697] 56 57 61 23 24 42 24 57 27 51 58 29 42 43 32 51 47 62 61 47 37 23 27 55
##  [721] 25 43 52 25 50 50 58 34 43 46 42 51 46 25 29 48 53 29 51 62 29 46 48 37
##  [745] 25 38 24 27 45 41 38 57 45 57 25 43 45 38 46 46 41 45 37 53 39 51 32 25
##  [769] 23 41 55 24 55 50 62 46 48 53 56 46 56 29 36 31 53 52 47 56 25 56 61 64
##  [793] 41 56 37 36 41 25 62 27 29 36 58 46 37 32 64 52 31 52 47 39 45 53 51 50
##  [817] 34 48 32 38 34 62 31 50 47 27 61 48 64 28 36 51 55 31 28 33 38 45 28 38
##  [841] 46 57 42 31 37 57 42 42 60 33 48 24 37 37 28 34 33 50 31 29 61 58 48 25
##  [865] 34 27 23 58 48 50 45 51 53 25 50 29 64 36 56 34 61 34 58 60 39 24 34 48
##  [889] 28 34 42 46 38 31 24 56 23 50 27 60 23 25 24 37 56 42 39 29 64 50 36 46
##  [913] 56 31 56 64 47 32 41 41 34 37 48 62 38 24 25 36 23 24 60 41 28 58 24 34
##  [937] 33 55 43 57 31 23 37 55 58 55 47 47 28 33 32 52 34 56 42 29 28 55 37 47
##  [961] 48 31 25 47 51 34 62 39 46 38 36 25 42 36 32 42 36 51 52 31 25 41 29 23
##  [985] 23 38 28 62 45 60 34 41 27 39 33 56 29 32 34 58
# 6th row of the data
nurses[6, ]
##   hospital ward wardid nurse age gender experien stress     wardtype hospsize
## 6        1    1     11     6  60 female       22      6 general care    large
##       expcon     Zage  Zgender Zexperien  Zstress Zwardtype Zhospsize   Zexpcon
## 6 experiment 1.411358 0.600153 0.8180786 1.044405 -1.001501  1.777279 0.9915356
##   Cexpcon Chospsize
## 6     0.5         1

If we want multiple rows or columns, we give a range or use a vector to name the columns:

head(nurses[, 5:7])
##   age gender experien
## 1  36   male       11
## 2  45   male       20
## 3  32   male        7
## 4  57 female       25
## 5  46 female       22
## 6  60 female       22
head(nurses[, c(5,8)])
##   age stress
## 1  36      7
## 2  45      7
## 3  32      7
## 4  57      6
## 5  46      6
## 6  60      6
head(nurses[, c("age","hospsize")])
##   age hospsize
## 1  36    large
## 2  45    large
## 3  32    large
## 4  57    large
## 5  46    large
## 6  60    large

Computing a new variable and adding it to the dataframe

Finally, let’s see how to make a new variable in a data frame. We do this by using the $ and naming a new variable on the fly. Here I’m going make a new composite variable which is just age times experience

nurses$composite <- nurses$age * nurses$experien

This data set has Z scores for a number of variables. Let’s try to duplicate that with R code and check our results:

nurses$Zage2 <- (nurses$age - mean(nurses$age)) / sd(nurses$age)

head(nurses[, c("Zage","Zage2")])
##         Zage      Zage2
## 1 -0.5817336 -0.5817336
## 2  0.1656757  0.1656757
## 3 -0.9139156 -0.9139156
## 4  1.1622216  1.1622216
## 5  0.2487212  0.2487212
## 6  1.4113581  1.4113581

Notice the liberal use of parenthesis. R respects order of operations very literally, so when doing computations like this, it is important to check your work.

Here’s one more, a log transformation

nurses$logAge <- log(nurses$age)

This is just a taste of the R universe, and we’ll try to stick to the basics throughout the course. I have lots of R resources and suggestions so please reach out to me!


Review

Wow that was a lot. Our objectives were:

  • Overcome any fear about learning to “code”
  • Learn about data structures in R
    • Type data into R (not fun!)
    • Import data to R from SPSS
    • Import data from Excel
  • Using the psych package
  • Transforming / computing variables

Part of your “homework” is to import some Excel data (as a .csv) into R and do some basic statistcs. To do so, you can start with this syntax below and simply insert the file name in quotes, with the .csv extension. If it doesn’t work, it means that either you spelled the file name wrong, or it’s not in your working directory.

# read in the data
mydata2 <- read.csv("FileName")