Chapter 7 Manipulating data frames
Most of the time, data does not come to us ready to analyze, and we must manipulate the data frame first. Base R has many useful functions that allow us to do this.
7.1 Slicing a data frame
We may be interested in looking at just one column or one row. We can do this by specifying the indices for the rows and columns inside square brackets, with the row index first and the column index second.
- Full data frame
## PlayerID Hits AtBats
## 1 Player01 3 4
## 2 Player02 1 4
## 3 Player03 0 3
## 4 Player04 2 5
## 5 Player05 4 4
## PlayerID Hits AtBats
## 1 Player01 3 4
## 2 Player02 1 4
## 3 Player03 0 3
## 4 Player04 2 5
## 5 Player05 4 4
Note: If we don't specify which rows or columns we want, we get the whole data frame back.
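The commands behind the two printouts above are not shown. A minimal sketch, assuming the data frame was built with data.frame() from the printed values and is named hits_data_frame (the name used later in this chapter); the construction itself is an assumption:

```r
# Assumed construction: values copied from the printed output above.
hits_data_frame <- data.frame(
  PlayerID = c("Player01", "Player02", "Player03", "Player04", "Player05"),
  Hits     = c(3, 1, 0, 2, 4),
  AtBats   = c(4, 4, 3, 5, 4),
  stringsAsFactors = FALSE  # keep PlayerID as character rather than factor
)

# Print the whole data frame.
hits_data_frame

# Leaving both the row and the column index empty also returns everything.
hits_data_frame[, ]
```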
- Second row
## PlayerID Hits AtBats
## 2 Player02 1 4
- Second column
## [1] 3 1 0 2 4
- Second row of the second column
## [1] 1
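The three results above can be reproduced with bracket indexing. This is a sketch under the assumption that the data frame is called hits_data_frame and holds the values printed earlier:

```r
# Assumed construction: values copied from the printed output earlier in this section.
hits_data_frame <- data.frame(
  PlayerID = c("Player01", "Player02", "Player03", "Player04", "Player05"),
  Hits     = c(3, 1, 0, 2, 4),
  AtBats   = c(4, 4, 3, 5, 4),
  stringsAsFactors = FALSE
)

hits_data_frame[2, ]   # second row, all columns (a one-row data frame)
hits_data_frame[, 2]   # second column, all rows (a numeric vector)
hits_data_frame[2, 2]  # second row of the second column (a single value)
```

Selecting a single column this way returns a plain vector. Adding drop = FALSE, as in hits_data_frame[, 2, drop = FALSE], keeps the result as a data frame.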
We can also use column names to slice data:
First, let's get the names of the columns in the data frame.
## [1] "PlayerID" "Hits" "AtBats"
Now we can use these column names to select the Hits column.
## [1] 3 1 0 2 4
We can pass a vector of column names if we want more than one column.
## Hits PlayerID
## 1 3 Player01
## 2 1 Player02
## 3 0 Player03
## 4 2 Player04
## 5 4 Player05
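A sketch of name-based slicing that matches the outputs above, again assuming the data frame is named hits_data_frame and was built from the printed values:

```r
# Assumed construction: values copied from the printed output earlier in this section.
hits_data_frame <- data.frame(
  PlayerID = c("Player01", "Player02", "Player03", "Player04", "Player05"),
  Hits     = c(3, 1, 0, 2, 4),
  AtBats   = c(4, 4, 3, 5, 4),
  stringsAsFactors = FALSE
)

# List the column names.
names(hits_data_frame)

# Select one column by name (returns a vector).
hits_data_frame[, "Hits"]

# Select several columns by passing a character vector of names.
hits_data_frame[, c("Hits", "PlayerID")]
```

The columns come back in the order they are requested, which is why Hits appears before PlayerID in the last result.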
You can also use the $ operator to specify a column:
## [1] "Player01" "Player02" "Player03" "Player04" "Player05"
In this case, the column is treated as a vector, which can be indexed and manipulated.
## [1] "Player04"
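The two results above are consistent with the following sketch, which assumes the same hits_data_frame built from the values printed earlier:

```r
# Assumed construction: values copied from the printed output earlier in this section.
hits_data_frame <- data.frame(
  PlayerID = c("Player01", "Player02", "Player03", "Player04", "Player05"),
  Hits     = c(3, 1, 0, 2, 4),
  AtBats   = c(4, 4, 3, 5, 4),
  stringsAsFactors = FALSE
)

# Extract the PlayerID column as a character vector.
hits_data_frame$PlayerID

# Index into that vector to get the fourth player's ID.
hits_data_frame$PlayerID[4]
```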
We can also assign a new value here.
## Changing value of a sliced data frame.
hits_data_frame$PlayerID[4] <- "PlayerNew"
hits_data_frame$PlayerID[4]
## [1] "PlayerNew"
Operations can also be performed inside the brackets to help filter data:
## [1] 4
This means that PlayerNew is the fourth observation. We can use the same kind of comparison to filter the whole data frame:
## PlayerID Hits AtBats
## 4 PlayerNew 2 5
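The commands for this filtering step are not shown. One way to obtain the two results above is with which() and a logical comparison; this is an assumption about the approach, sketched on the same assumed hits_data_frame:

```r
# Assumed construction: values copied from the printed output, with the
# fourth PlayerID already changed to "PlayerNew" as in the assignment above.
hits_data_frame <- data.frame(
  PlayerID = c("Player01", "Player02", "Player03", "PlayerNew", "Player05"),
  Hits     = c(3, 1, 0, 2, 4),
  AtBats   = c(4, 4, 3, 5, 4),
  stringsAsFactors = FALSE
)

# Position of the observation whose PlayerID equals "PlayerNew".
which(hits_data_frame$PlayerID == "PlayerNew")

# Use the logical comparison as the row index to keep only matching rows.
hits_data_frame[hits_data_frame$PlayerID == "PlayerNew", ]
```

The comparison produces one TRUE or FALSE per row, and the brackets keep only the rows where it is TRUE.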
7.2 Creating new columns
You can create new columns with mathematical operations such as \(+\), \(-\), \(/\), and \(*\).
### Create new columns with / dividing
hits_data_frame$batting_average <- hits_data_frame$Hits / hits_data_frame$AtBats
### Create new columns with - subtracting
hits_data_frame$outs <- hits_data_frame$AtBats - hits_data_frame$Hits
hits_data_frame
## PlayerID Hits AtBats batting_average outs
## 1 Player01 3 4 0.75 1
## 2 Player02 1 4 0.25 3
## 3 Player03 0 3 0.00 3
## 4 PlayerNew 2 5 0.40 3
## 5 Player05 4 4 1.00 0
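The examples above use division and subtraction. A short sketch of the other two operators follows, assuming the same hits_data_frame; the column names hit_percent and hits_plus_outs are hypothetical and chosen only for illustration:

```r
# Assumed construction: values copied from the printed output above.
hits_data_frame <- data.frame(
  PlayerID = c("Player01", "Player02", "Player03", "PlayerNew", "Player05"),
  Hits     = c(3, 1, 0, 2, 4),
  AtBats   = c(4, 4, 3, 5, 4),
  stringsAsFactors = FALSE
)
hits_data_frame$batting_average <- hits_data_frame$Hits / hits_data_frame$AtBats
hits_data_frame$outs <- hits_data_frame$AtBats - hits_data_frame$Hits

# Multiplying: express the batting average as a percentage (hypothetical column).
hits_data_frame$hit_percent <- hits_data_frame$batting_average * 100

# Adding: hits plus outs should give back the at-bats (hypothetical column).
hits_data_frame$hits_plus_outs <- hits_data_frame$Hits + hits_data_frame$outs

hits_data_frame
```

Each operation works element by element, so every row gets its own value in the new column.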