# Chapter 3 Data types and objects

All objects in R have a given *type*. You already know most of them, as these types are also used
in mathematics: integers, floating point numbers (or floats), matrices, etc. are all objects you
are already familiar with. But R has other, maybe lesser known data types (that you can find in a
lot of other programming languages) that you need to become familiar with. But first, we need to
learn how to assign a value to a variable. This can be done in two ways: with the `<-` operator,
or with `=`.

There is almost no difference between these two approaches. You would need to pay attention to
this, and use `<-`, only in very specific situations that you will very likely never encounter.
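As a minimal sketch of the two assignment forms (the names `a`, `b` and `v` are chosen here only for illustration):

```r
# Assign the value 3 with each operator; the resulting objects are identical.
a <- 3
b = 3
identical(a, b)

# One situation where the two differ: inside a function call, `=` only names
# an argument, while `<-` also creates a variable in the calling environment.
median(x = c(1, 2, 3))   # passes the argument x; does not create a variable
median(v <- c(1, 2, 3))  # creates `v` and then passes it to median()
v
```

This is the kind of corner case where the choice of operator matters; for everyday top-level assignments either form works.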

Another thing you must know before going further is that you can convert from one type to another
using functions whose names start with `as.`, such as `as.character()`, `as.numeric()`,
`as.logical()`, etc. For example, `as.character(1)` converts the number `1` to the character
(or string) “1”. There are also `is.character()`, `is.numeric()` and so on that test if the
object is of the required class. These functions exist for most object types, and are very useful.
Make sure you remember them!

## 3.1 The `numeric` class

To define single numbers, you can do the following:

The `class()` function allows you to check the class of an object:

`## [1] "numeric"`

Decimals are defined with the character `.`:
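A small sketch of the steps in this section; the variable names and values are assumptions, picked only to illustrate the point:

```r
# A single number; by default R stores it as a "numeric" (double precision).
a <- 3
class(a)

# The decimal separator is a period.
b <- 3.14
class(b)

# Whole numbers are only stored as integers when written with an L suffix.
n <- 3L
class(n)
is.numeric(n)
```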

## 3.2 The `character` class

Use `" "` to define characters (called strings in other programming languages):

`## [1] "character"`

A very nice package to work with characters is `{stringr}`, which is also part of the `{tidyverse}`.
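For instance, a string can be defined and inspected with base R alone. The variable name and its content are assumptions made for this sketch:

```r
# Define a string with double quotes and check its class.
a <- "this is a string"
class(a)

# A couple of base R functions that operate on strings.
nchar(a)    # number of characters
toupper(a)  # upper-case copy of the string
```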

## 3.3 The `factor` class

Factors look like characters, but are very different. They are the representation of categorical
variables. A `{tidyverse}` package to work with factors is `{forcats}`. You would rarely use
factor variables outside of datasets, so for now, it is enough to know that this class exists.
We are going to manipulate factor variables in Chapter 5.
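To get a feeling for why factors differ from characters, here is a minimal sketch using only base R (the vector `sizes` is a made-up example):

```r
# A character vector with repeated categories.
sizes <- c("low", "high", "low", "medium")

# Converting it to a factor stores the distinct categories (the levels)
# once, and represents each element by an integer code.
sizes_fct <- factor(sizes)
class(sizes_fct)
levels(sizes_fct)      # distinct categories, sorted alphabetically by default
as.integer(sizes_fct)  # integer codes pointing into the levels
```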

## 3.4 The `Date` class

Dates also look like characters, but are very different too:

`## [1] "2019-03-19"`

`## [1] "Date"`

Manipulating dates and time can be tricky, but thankfully there’s a `{tidyverse}` package for that,
called `{lubridate}`. We are going to go over this package in Chapter 5.
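A possible way to obtain the outputs shown in this section with base R is sketched here; the name `my_date` and the use of `as.Date()` on an ISO-formatted string are assumptions:

```r
# Convert a string to a Date object.
my_date <- as.Date("2019-03-19")
my_date
class(my_date)

# Unlike strings, dates support arithmetic.
my_date + 1                       # the following day
as.Date("2019-12-25") - my_date   # difference between two dates, in days
```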

## 3.5 The `logical` class

This class is the result of logical comparisons, for example, if you type:

`## [1] TRUE`

R returns `TRUE`, which is an object of class `logical`:

`## [1] "logical"`

In other programming languages, `logical`s are often called `bool`s.

A `logical` variable can only have two values, either `TRUE` or `FALSE` (or the missing value `NA`).
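As an illustration, any comparison produces a logical value. The comparison `4 > 3` and the name `k` are assumptions made for this sketch:

```r
# A comparison returns a logical value.
4 > 3

# The result can be stored and inspected like any other object.
k <- 4 > 3
class(k)

# Logical values can be combined, and TRUE counts as 1 when summed.
k & (2 > 5)
sum(c(TRUE, FALSE, TRUE))
```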

## 3.6 Vectors and matrices

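You can create a vector in different ways. But first of all, it is important to understand that a vector in most programming languages is nothing more than a list of things. These things can be numbers (either integers or floats), strings, or even other vectors. In R, all the elements of a vector share the same type; to mix types or to nest objects, you need the `list` class discussed later in this chapter.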

### 3.6.1 The `c()` function

A very important function that allows you to build a vector is `c()`:

This creates a vector with elements 1, 2, 3, 4, 5. If you check its class:

`## [1] "numeric"`

This can be confusing: you were probably expecting `a` to be of class *vector* or
something similar. This is not the case if you use `c()` to create the vector, because `c()`
doesn’t build a vector in the mathematical sense (a row or a column with dimensions), but rather a
plain sequence of numbers, which R calls an atomic vector. Checking its dimension:

`## NULL`

returns `NULL` because an atomic vector doesn’t have a `dim` attribute. If you want to create a
vector with dimensions, that is, a matrix with a single row or a single column, you need to use
`cbind()` or `rbind()`.
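The behaviour described in this subsection can be checked with a few lines (the vector is named `a`, as in the text):

```r
# Build a vector with c() and inspect it.
a <- c(1, 2, 3, 4, 5)
class(a)

# There is no dim attribute, only a length.
dim(a)
length(a)

# It is an atomic vector, not an object of class list.
is.vector(a)
is.list(a)
```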

### 3.6.2 `cbind()` and `rbind()`

You can create a *true* vector with `cbind()`:

Check its class now:

`## [1] "matrix"`

This is exactly what we expected (in R 4.0.0 and later, the output also includes `"array"`).
Let’s check its dimension:

`## [1] 1 5`

This returns the dimension of `a` using the LICO notation (number of LInes first, then the number of COlumns).

It is also possible to bind vectors together to create a matrix.

Now let’s put vectors `a` and `b` into a matrix called `matrix_c` using `rbind()`.
`rbind()` functions the same way as `cbind()` but glues the vectors together by rows and not by columns.

```
## [,1] [,2] [,3] [,4] [,5]
## [1,] 1 2 3 4 5
## [2,] 6 7 8 9 10
```
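The steps of this subsection could look like the following sketch. The values of `a` and `b` are inferred from the two rows of the printed matrix, so treat them as assumptions:

```r
# cbind() glues its arguments together as columns: five scalars give a
# matrix with 1 row and 5 columns.
a <- cbind(1, 2, 3, 4, 5)
is.matrix(a)
dim(a)

# A second row vector, built the same way.
b <- cbind(6, 7, 8, 9, 10)

# rbind() stacks the two row vectors on top of each other.
matrix_c <- rbind(a, b)
matrix_c
dim(matrix_c)
```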

### 3.6.3 The `matrix` class

R also has support for matrices. For example, you can create a matrix of dimension (5,5) filled
with 0’s with the `matrix()` function:

If you want to create the following matrix:

\[ B = \left( \begin{array}{ccc} 2 & 4 & 3 \\ 1 & 5 & 7 \end{array} \right) \]

you would do it like this:

The option `byrow = TRUE` means that the matrix is filled row by row (by default, `matrix()` fills
it column by column).
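A minimal sketch of both calls to `matrix()`; the name `matrix_b` is an assumption, while `matrix_a` is the name used in the text:

```r
# A 5 x 5 matrix filled with zeros.
matrix_a <- matrix(0, nrow = 5, ncol = 5)
dim(matrix_a)

# The matrix B from the text: the values are listed row by row, so
# byrow = TRUE is needed.
matrix_b <- matrix(c(2, 4, 3, 1, 5, 7), nrow = 2, byrow = TRUE)
matrix_b

# Without byrow = TRUE, the same values are placed column by column,
# which gives a different matrix.
matrix(c(2, 4, 3, 1, 5, 7), nrow = 2)
```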

You can access individual elements of `matrix_a` like so:

`## [1] 0`

and R returns its value, 0. We can assign a new value to this element if we want. Try:

and now take a look at `matrix_a` again.

```
## [,1] [,2] [,3] [,4] [,5]
## [1,] 0 0 0 0 0
## [2,] 0 0 7 0 0
## [3,] 0 0 0 0 0
## [4,] 0 0 0 0 0
## [5,] 0 0 0 0 0
```
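The printed matrix is consistent with the following sketch, in which the position (row 2, column 3) and the value 7 are read off the output above:

```r
# Start again from a 5 x 5 matrix of zeros.
matrix_a <- matrix(0, nrow = 5, ncol = 5)

# Read the element in row 2, column 3 (row index first, then column).
matrix_a[2, 3]

# Overwrite that element and print the matrix again.
matrix_a[2, 3] <- 7
matrix_a
```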

Recall our vector `b`:

To access its third element, you can simply write:

`## [1] 8`
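Assuming `b` is the row vector with elements 6 to 10 used earlier, a single index is enough, even though `b` is technically a matrix with one row:

```r
# b is a matrix with 1 row and 5 columns.
b <- cbind(6, 7, 8, 9, 10)

# A single index counts through the elements of the matrix.
b[3]

# The same element, addressed by row and column.
b[1, 3]
```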

I have heard many people praising R for being a matrix based language. Matrices are indeed useful,
and statisticians are very used to working with them. However, I very rarely use matrices in my
day to day work, and prefer an approach based on data frames (which will be discussed below). This
is because working with data frames makes it easier to use R’s advanced functional programming
language capabilities, and this is where R really shines in my opinion. Working with matrices
almost automatically implies using loops and all the iterative programming techniques, *à la Fortran*,
which I personally believe are ill-suited for interactive statistical programming (as discussed in
the introduction).

## 3.7 The `list` class

The `list` class is a very flexible class, and thus, very useful. You can put anything inside a list,
such as numbers:

or vectors constructed with `c()`:

you can also put objects of different classes in the same list:

and of course create a list of lists:

To check the contents of a list, you can use the structure function `str()`:

```
## List of 3
## $ :List of 2
## ..$ : num 3
## ..$ : num 2
## $ :List of 2
## ..$ : num [1:2] 1 2
## ..$ : num [1:2] 3 4
## $ :List of 3
## ..$ : num 3
## ..$ : num [1:2] 1 2
## ..$ : chr "lists are amazing!"
```
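The structure printed above can be reproduced with the following sketch. The names `list1`, `list2`, `list3` and `my_lists` are assumptions; the contents are taken from the `str()` output:

```r
# A list of numbers.
list1 <- list(3, 2)

# A list of vectors built with c().
list2 <- list(c(1, 2), c(3, 4))

# A list mixing objects of different classes.
list3 <- list(3, c(1, 2), "lists are amazing!")

# A list of lists.
my_lists <- list(list1, list2, list3)

# Inspect the nested structure.
str(my_lists)
```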

or you can use RStudio’s *Environment* pane:

You can also create named lists:

and you can access the elements in two ways:

`## [1] 2`

or, for named lists:

`## [1] "this is a named list"`
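A sketch of a named list that gives the two outputs shown above; the list name `list4`, the element names and the value of `b` are assumptions:

```r
# Each element gets a name.
list4 <- list(a = 2, b = 8, c = "this is a named list")

# Access by position with double square brackets...
list4[[1]]

# ...or by name with the $ sign.
list4$c
```

Note that single square brackets, as in `list4[1]`, return a list of length one rather than the element itself.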

Lists are used extensively because they are so flexible. You can build lists of datasets and apply functions to all the datasets at once, build lists of models, lists of plots, etc… In the later chapters we are going to learn all about them. Lists are central objects in a functional programming workflow for interactive statistical analysis.

## 3.8 The `data.frame` and `tibble` classes

In the next chapter we are going to learn how to import datasets into R. Once you import data, the
resulting object is either a `data.frame` or a `tibble` depending on which package you used to
import the data. `tibble`s extend `data.frame`s so if you know about `data.frame` objects already,
working with `tibble`s will be very easy. `tibble`s have a better `print()` method, and some other
niceties. If you want to know more, I go into more detail in my other book, but for our purposes,
there’s not much you need to know about `data.frame` and `tibble` objects, apart from the fact that
they are the representation of a dataset when loaded into R.

However, I want to stress that these objects are central to R and are thus very important; they are
actually special cases of lists, discussed above. There are different ways to print a `data.frame`
or a `tibble` if you wish to inspect it. You can use `View(my_data)` to show the `my_data`
`data.frame` in the *View* pane of RStudio:

You can also use the `str()` function:

And if you need to access an individual column, you can use the `$` sign, same as for a list:
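Here is a small sketch with a hand-made data frame; the column names and values of `my_data` are invented for the example (`View()` is left out because it needs an interactive session):

```r
# A tiny data frame with two columns.
my_data <- data.frame(
  country = c("A", "B", "C"),
  value = c(1.5, 2, 3.5)
)

# Compact overview of the columns and their types.
str(my_data)

# Extract a single column with the $ sign.
my_data$value

# A data frame really is a list under the hood.
is.list(my_data)
```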

## 3.9 Formulas

We will learn more about formulas later, but because they are important objects, it is useful to know about them early on. A formula is defined in the following way:

`## [1] "formula"`

Formula objects are defined using the `~` symbol. Formulas are useful to define statistical models,
for example for a linear regression:

or also to define anonymous functions, but more on this later.
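A minimal sketch of a formula object; the name `my_formula` is an assumption, and the variables `mpg` and `hp` come from the built-in `mtcars` data used in the next section:

```r
# A formula relating mpg to hp; nothing is computed at this point.
my_formula <- mpg ~ hp
class(my_formula)

# The variables mentioned in the formula.
all.vars(my_formula)

# The formula only gets a meaning once a function uses it with some data,
# here a linear regression.
lm(my_formula, data = mtcars)
```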

## 3.10 Models

A statistical model is an object like any other in R:

`## [1] "lm"`

`my_model` is an object of class `lm`. You can apply different functions to a model object:

```
##
## Call:
## lm(formula = mpg ~ hp, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.7121 -2.1122 -0.8854 1.5819 8.2360
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 30.09886 1.63392 18.421 < 2e-16 ***
## hp -0.06823 0.01012 -6.742 1.79e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.863 on 30 degrees of freedom
## Multiple R-squared: 0.6024, Adjusted R-squared: 0.5892
## F-statistic: 45.46 on 1 and 30 DF, p-value: 1.788e-07
```
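The output above corresponds to the call shown in its `Call:` line. A sketch of the whole sequence, with two further base R accessors added as examples:

```r
# Fit a linear regression of mpg on hp using the built-in mtcars data.
my_model <- lm(mpg ~ hp, data = mtcars)
class(my_model)

# Detailed summary of the fit.
summary(my_model)

# Other functions that accept a model object.
coef(my_model)             # estimated coefficients
head(residuals(my_model))  # first few residuals
```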

This class will be explored in later chapters.

## 3.11 The `is.*()` and `as.*()` functions

`is.*()` and `as.*()` are very powerful, and this is the right moment to introduce them. `is.*()`
functions test the class of an object:

`## [1] FALSE`

`## [1] FALSE`

`as.*()` functions convert from one type to another:

`## [1] "7"`

`## [1] 23.12`

but only if it makes sense:

`## Warning: NAs introduced by coercion`

`## [1] NA`

Keep these in mind, because they are going to be very useful. The `{purrr}` package introduces
similar functions, `is_*()` and `as_*()`. We will explore them in Chapter 9.
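The following calls are one way to obtain outputs like the ones in this section; the exact inputs are assumptions, chosen to be consistent with the results shown:

```r
# Tests: is the object of the given type?
is.numeric("7")   # a string is not numeric
is.character(7)   # a number is not a character

# Conversions that make sense.
as.character(7)
as.numeric("23.12")

# A conversion that does not make sense returns NA, with a warning.
as.numeric("this will not work")
```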

## 3.12 Exercises

### Exercise 1

Try to create the following vector:

\[a = (6,3,8,9)\]

and add it to this other vector:

\[b = (9,1,3,5)\]

and save the result to a new variable called `result`.

### Exercise 2

Using `a` and `b` from before, try to get their dot product.

Try with `a * b` in the R console. What happened?
Try to find the right function to get the dot product. Don’t hesitate to google the answer!

### Exercise 3

How can you create a matrix of dimension (30,30) filled with 2’s by only using the function `matrix()`?

### Exercise 4

Save your first name in a variable `a` and your surname in a variable `b`. What does the function
`paste(a, b)` do? Look at the help for `paste()` with `?paste` or using the *Help* pane in RStudio.
What does the optional argument `sep` do?

### Exercise 5

Define the following variables: `a <- 8`, `b <- 3`, `c <- 19`. What do the following lines check?
What do they return?

### Exercise 6

Define the following matrix:

\[ \text{matrix_a} = \left( \begin{array}{ccc} 9 & 4 & 12 \\ 5 & 0 & 7 \\ 2 & 6 & 8 \\ 9 & 2 & 9 \end{array} \right) \]

- What does `matrix_a >= 5` do?
- What does `matrix_a[ , 2]` do?
- Can you find which function gives you the transpose of this matrix?

### Exercise 7

Solve the following system of equations using the `solve()` function:

\[ \left( \begin{array}{cccc} 9 & 4 & 12 & 2 \\ 5 & 0 & 7 & 9 \\ 2 & 6 & 8 & 0 \\ 9 & 2 & 9 & 11 \end{array} \right) \times \left( \begin{array}{c} x \\ y \\ z \\ t \end{array} \right) = \left( \begin{array}{c} 7 \\ 18 \\ 1 \\ 0 \end{array} \right) \]

### Exercise 8

Load the `mtcars` data (`mtcars` is included in R, so you only need to use the `data()` function to
load the data):

If you run `class(mtcars)`, you get “data.frame”. Try now with `typeof(mtcars)`. The answer is now
“list”! This is because the class of an object is an attribute of that object, which can even
be assigned by the user:

`## [1] "don't do this"`
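A sketch of this experiment. It works on a copy, here called `my_cars` (a name chosen for this example), so that `mtcars` itself stays usable:

```r
# Load the built-in data and compare class and type.
data(mtcars)
class(mtcars)
typeof(mtcars)

# The class attribute can be overwritten with any string...
my_cars <- mtcars
class(my_cars) <- "don't do this"
class(my_cars)

# ...but the internal type is unaffected.
typeof(my_cars)
```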

The type of an object is R’s internal type of that object, which cannot be manipulated by the user.
It is always useful to know the type of an object (not just its class). For example, in the particular
case of data frames, because the type of a data frame is a list, you can use all that you learned
about lists to manipulate data frames! Recall that `$` allowed you to select the element of a list
for instance:

`## [1] 1`

Because data frames are nothing but fancy lists, you can access columns the same way:

```
## [1] 21.0 21.0 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 17.8 16.4 17.3 15.2 10.4 10.4 14.7 32.4 30.4 33.9 21.5 15.5
## [23] 15.2 13.3 19.2 27.3 26.0 30.4 15.8 19.7 15.0 21.4
```
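Both outputs can be obtained along these lines; the list `my_list` and its element names are assumptions:

```r
# Select an element of a named list with $.
my_list <- list(one = 1, two = 2)
my_list$one

# Select a column of a data frame in exactly the same way.
mtcars$mpg

# Double square brackets, another list tool, work on data frames too.
identical(mtcars$mpg, mtcars[["mpg"]])
```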