Chapter 7 Manipulating data frames

Most of the time, data does not come to us ready to analyze, so we must manipulate the data frame first. Base R has many useful functions that let us do this.

7.1 Slicing a data frame

We may be interested in looking at just one column or one row. We can do this by specifying the row and column indices inside square brackets, in the form [rows, columns].

  • Full data frame (printed directly, then again with both indices left empty)
##   PlayerID Hits AtBats
## 1 Player01    3      4
## 2 Player02    1      4
## 3 Player03    0      3
## 4 Player04    2      5
## 5 Player05    4      4
##   PlayerID Hits AtBats
## 1 Player01    3      4
## 2 Player02    1      4
## 3 Player03    0      3
## 4 Player04    2      5
## 5 Player05    4      4

Note: If we don't specify which rows or columns we want, we get the whole data frame back.
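The player data printed above can be rebuilt with data.frame(). This is a sketch: the object name players is our assumption, since the original object name is not shown. It prints the data frame and then indexes it with both positions left empty, which returns the same data frame, as the note says.

```r
# Rebuild the example data frame from the printed output.
# The name `players` is an assumption; the original object name is not shown.
players <- data.frame(
  PlayerID = c("Player01", "Player02", "Player03", "Player04", "Player05"),
  Hits = c(3, 1, 0, 2, 4),
  AtBats = c(4, 4, 3, 5, 4),
  stringsAsFactors = FALSE  # keep PlayerID as character (the default since R 4.0.0)
)

# Printing the object shows the full data frame
print(players)

# Leaving both the row and the column index empty also returns everything
print(players[, ])
```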

  • Second row
##   PlayerID Hits AtBats
## 2 Player02    1      4
  • Second column
## [1] 3 1 0 2 4
  • Second row of the second column
## [1] 1
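The three slices above can be reproduced with [row, column] indexing. This minimal sketch rebuilds the data under the assumed name players so it runs on its own. Note why the outputs differ in shape: a single row stays a data frame, but a single column is simplified to a plain vector because [ uses drop = TRUE by default. Adding drop = FALSE keeps the data frame structure.

```r
# Example data (the name `players` is an assumption)
players <- data.frame(
  PlayerID = c("Player01", "Player02", "Player03", "Player04", "Player05"),
  Hits = c(3, 1, 0, 2, 4),
  AtBats = c(4, 4, 3, 5, 4),
  stringsAsFactors = FALSE
)

# [row, column]: a blank index means "all"
players[2, ]                 # second row, returned as a one-row data frame
players[, 2]                 # second column, simplified to a vector: 3 1 0 2 4
players[2, 2]                # single value in row 2, column 2: 1
players[, 2, drop = FALSE]   # second column kept as a one-column data frame
```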

We can also use column names to slice data:

First, let's get the names of the columns in the data frame.

## [1] "PlayerID" "Hits"     "AtBats"
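One way to get the column names is the base R function names(); colnames() gives the same result for a data frame. The sketch below assumes the data is stored in an object called players.

```r
# Example data (the name `players` is an assumption)
players <- data.frame(
  PlayerID = c("Player01", "Player02", "Player03", "Player04", "Player05"),
  Hits = c(3, 1, 0, 2, 4),
  AtBats = c(4, 4, 3, 5, 4),
  stringsAsFactors = FALSE
)

names(players)     # "PlayerID" "Hits" "AtBats"
colnames(players)  # same result for a data frame
```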

Now we can use these column names to select the Hits column.

## [1] 3 1 0 2 4
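A column name can go in the column position of the brackets. A sketch, again assuming the data is called players; the double-bracket form [[ ]] is shown as an equivalent base R alternative that also returns the column as a vector.

```r
# Example data (the name `players` is an assumption)
players <- data.frame(
  PlayerID = c("Player01", "Player02", "Player03", "Player04", "Player05"),
  Hits = c(3, 1, 0, 2, 4),
  AtBats = c(4, 4, 3, 5, 4),
  stringsAsFactors = FALSE
)

players[, "Hits"]   # select by name in the column position: 3 1 0 2 4
players[["Hits"]]   # double brackets also extract the column as a vector
```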

We can pass a vector of column names if we want more than one column. The columns come back in the order we list them.

##   Hits PlayerID
## 1    3 Player01
## 2    1 Player02
## 3    0 Player03
## 4    2 Player04
## 5    4 Player05
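Passing a character vector built with c() selects several columns at once. In this sketch (data assumed to be named players), listing Hits before PlayerID reorders the columns in the result, matching the output above.

```r
# Example data (the name `players` is an assumption)
players <- data.frame(
  PlayerID = c("Player01", "Player02", "Player03", "Player04", "Player05"),
  Hits = c(3, 1, 0, 2, 4),
  AtBats = c(4, 4, 3, 5, 4),
  stringsAsFactors = FALSE
)

# Two columns, returned in the order given
two_cols <- players[, c("Hits", "PlayerID")]
print(two_cols)
```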

You can also use the $ operator to select a column by name.

## [1] "Player01" "Player02" "Player03" "Player04" "Player05"
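The $ operator takes the column name without quotes. A minimal sketch, with the data assumed to be stored as players:

```r
# Example data (the name `players` is an assumption)
players <- data.frame(
  PlayerID = c("Player01", "Player02", "Player03", "Player04", "Player05"),
  Hits = c(3, 1, 0, 2, 4),
  AtBats = c(4, 4, 3, 5, 4),
  stringsAsFactors = FALSE
)

players$PlayerID   # the PlayerID column as a character vector
```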

In this case the column is returned as a vector, which can be indexed and manipulated like any other vector. For example, we can pull out the fourth element:

## [1] "Player04"
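Because players$PlayerID is an ordinary vector, a single index in brackets picks out one element. A sketch assuming the data frame is named players:

```r
# Example data (the name `players` is an assumption)
players <- data.frame(
  PlayerID = c("Player01", "Player02", "Player03", "Player04", "Player05"),
  Hits = c(3, 1, 0, 2, 4),
  AtBats = c(4, 4, 3, 5, 4),
  stringsAsFactors = FALSE
)

players$PlayerID[4]   # fourth element of the column: "Player04"
```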

We can also assign a new value this way, here replacing the fourth player ID with "PlayerNew".

## [1] "PlayerNew"
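Assignment uses the same indexing on the left-hand side of <-. This sketch assumes the data is called players and that PlayerID is a character column. Before R 4.0.0, data.frame() converted strings to factors by default, and assigning a value that is not an existing factor level produces NA with a warning, which is why stringsAsFactors = FALSE is set explicitly.

```r
# Example data (the name `players` is an assumption)
players <- data.frame(
  PlayerID = c("Player01", "Player02", "Player03", "Player04", "Player05"),
  Hits = c(3, 1, 0, 2, 4),
  AtBats = c(4, 4, 3, 5, 4),
  stringsAsFactors = FALSE  # character column, so any new string can be assigned
)

# Replace the fourth player ID in place
players$PlayerID[4] <- "PlayerNew"
players$PlayerID[4]   # "PlayerNew"
```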

Operations can also be performed inside the brackets to help filter data, such as testing which PlayerID equals "PlayerNew".

## [1] 4
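The original command is not shown, so this is only one way to get that result, assuming the data is named players. A comparison such as players$PlayerID == "PlayerNew" yields a logical vector; using it inside brackets keeps only the positions that are TRUE, and base R's which() is a shortcut for the same idea.

```r
# Example data with the updated ID (the name `players` is an assumption)
players <- data.frame(
  PlayerID = c("Player01", "Player02", "Player03", "Player04", "Player05"),
  Hits = c(3, 1, 0, 2, 4),
  AtBats = c(4, 4, 3, 5, 4),
  stringsAsFactors = FALSE
)
players$PlayerID[4] <- "PlayerNew"

is_new <- players$PlayerID == "PlayerNew"   # FALSE FALSE FALSE TRUE FALSE
seq_len(nrow(players))[is_new]              # comparison used inside brackets: 4
which(is_new)                               # base R shortcut: 4
```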

This tells us that PlayerNew is the 4th observation. We can use the same condition to filter the whole data frame:

##    PlayerID Hits AtBats
## 4 PlayerNew    2      5
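Putting the logical test in the row position keeps only the matching rows and all columns; the original row name (4) is preserved. A sketch assuming the data is named players; base R's subset() is shown as an equivalent alternative.

```r
# Example data with the updated ID (the name `players` is an assumption)
players <- data.frame(
  PlayerID = c("Player01", "Player02", "Player03", "Player04", "Player05"),
  Hits = c(3, 1, 0, 2, 4),
  AtBats = c(4, 4, 3, 5, 4),
  stringsAsFactors = FALSE
)
players$PlayerID[4] <- "PlayerNew"

# Logical condition in the row position, blank column position = all columns
new_row <- players[players$PlayerID == "PlayerNew", ]
print(new_row)

# subset() gives the same rows without repeating `players$`
print(subset(players, PlayerID == "PlayerNew"))
```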

7.2 Creating new columns

You can create new columns with arithmetic operators such as +, -, / and *. Below, batting_average is Hits divided by AtBats, and outs is AtBats minus Hits.

##    PlayerID Hits AtBats batting_average outs
## 1  Player01    3      4            0.75    1
## 2  Player02    1      4            0.25    3
## 3  Player03    0      3            0.00    3
## 4 PlayerNew    2      5            0.40    3
## 5  Player05    4      4            1.00    0
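Assigning to a column name that does not exist yet adds it to the data frame. Arithmetic on two columns works element by element, row by row. A sketch assuming the data is named players; the formulas follow from the printed values (for example 3 / 4 = 0.75 and 4 - 3 = 1 for Player01).

```r
# Example data with the updated ID (the name `players` is an assumption)
players <- data.frame(
  PlayerID = c("Player01", "Player02", "Player03", "Player04", "Player05"),
  Hits = c(3, 1, 0, 2, 4),
  AtBats = c(4, 4, 3, 5, 4),
  stringsAsFactors = FALSE
)
players$PlayerID[4] <- "PlayerNew"

# New columns computed element-wise from existing ones
players$batting_average <- players$Hits / players$AtBats
players$outs <- players$AtBats - players$Hits
print(players)
```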