2 Vectors
Vectors are fundamental "objects" in R. They can store data of different types but the data within any single vector must be of the same type. For example, vectors can store numerical, character (text) or logical (TRUE
/FALSE
) data, and be used to easily execute mathematical operations or extract certain elements of the data. For our purposes in S2S, vectors will often contain the values of a particular variable in a dataset.
Let's create some trivial vectors in R to explore their properties. Copy and run the following code in your own R script.
You should notice that two 'Values
' appear in the Environment tab, x
and y
. This is because we have assigned values to each of these names using the <-
operator in the code above.
x
is the single value 3, which can be thought of as a vector of length 1.y
is a vector of length 3 and of the form \(\begin{bmatrix}2&4&1 \end{bmatrix}^\intercal\). The functionc()
has been used to 'combine' these values into one vector.
Note that you can find further details and examples of creating your own vectors in Section 1.3 Vectors of Probability and Statistics with R.
2.1 Addition
We can also use R to add vectors together. Let's try it out by running the following code.
[1] 5 7 4
The output shown above should appear in the console at the bottom left of the RStudio window.
Here, R has 'recycled' the element of the vector x
until it is the same length as the vector y
, and elementwise addition has then been completed. This essentially means R has completed \(3+2\) to calculate the first element, \(3+4\) to find the second and \(3+1\) to find the third.
What happens if we try to add together two vectors of unequal length, but they both have more than one element? We can try this out by creating a new, longer vector \(z=\begin{bmatrix}3&3&5&8&2\end{bmatrix}^\intercal\) and then adding this to y
.
Warning in y + z: longer object length is not a multiple of shorter object
length
[1] 5 7 6 10 6
We now see a warning message telling us that the length of vector z
is not a multiple of the length of vector y
- in other words, they have a different number of elements. We still see a result however (5, 7, 6, 10, 6) so the vectors have been added together.
This time, R recycles each element of y
in turn until it is the same length as z
and then completes the elementwise addition. y
has been extended by adding its first two elements on to the end so that it has five entries like z
, meaning R has completed the calculation \(\begin{bmatrix}2&4&1&2&4\end{bmatrix}^\intercal+\begin{bmatrix}3&3&5&8&2\end{bmatrix}^\intercal\).
Create and save the vector \(a=\begin{bmatrix}2&2&-1\end{bmatrix}^\intercal\) and scalar \(b=2\), then add these together and save the result as a new vector called c
and print the contents of c
in the console.
Section 1.3 Vectors of Probability and Statistics with R explains in detail how R adds two vectors together.
2.2 Logical operators
Vectors can contain logical values (TRUE
/FALSE
) rather than numerical values. We can create logical vectors by comparing the values within a numerical vector.
[1] FALSE TRUE FALSE
Here, R has again recycled the element from x
to make it the same length as y
, then it returns a vector telling us for which elements the statement x < y
is true. For example, the first element of x
is 3 which is greater than 2, so x < y
is FALSE
.
Other logical operators that we can use with vectors include:
Operator | Function |
---|---|
> |
greater than |
< |
less than |
<= |
less than or equal to |
>= |
greater than or equal to |
== |
exact equality |
!= |
exact inequality |
Let's explore how these operators behave with two new numerical vectors.
Write some code to show:
- which elements of
s
andt
are equal to each other - which elements of
s
are less than the corresponding element int
- which elements of
s
andt
are unequal
When using logical operators with vectors, R will compare the vectors elementwise meaning the first elements of each vector will be compared to each other, then the second elements and so on.
Equality:
[1] TRUE FALSE FALSE
This tells us that the first element of s
is equal to the first element of t
(they are both the value 2), whereas the second elements of s
and t
are not equal and neither are the third elements.
Less than:
[1] FALSE FALSE TRUE
The first element of s
is not less than the first elements of t
(that is 2 < 2 is not a true statement) so this returns FALSE
. The statement s < t
is TRUE
for the third elements because the third element of s
is 1 and for t
it is 4 which equates to the statement 1 < 4.
Inequality:
[1] FALSE TRUE TRUE
This return the opposite of s == t
. The first elements of s
and t
are both 2 so they are equal meaning !=
will return FALSE
. Since the second elements are 4 and 3 respectively, these are unequal so !=
returns TRUE
.
Try using some of the other operators listed above to see if they return the vectors you expect.
What if we wanted to compare specific elements of the vectors? We can achieve this by 'indexing' the vector. In other words we choose a single element from the vector that we want to use. This is done by first telling R which vector we want to look at by typing its name and then specifying the element we want by including its position inside square brackets ([ ]
).
For example, s[3]
would extract the third element from s
, that is the number 1.
The following code can be used to check whether the second element of s
is the same as the third element of t
.
[1] TRUE
Create the vectors \(u=\begin{bmatrix} 4 & 4 & 1 \end{bmatrix}^\intercal\) and \(v=\begin{bmatrix} 1 & 0 & 5 \end{bmatrix}^\intercal\). Write code to check:
- whether the first elements of
u+v
andx+y
(defined in the previous section) are the same. - if the third element of
u+v
is the same as the second element ofx+y
.
Equality of the first elements:
[1] TRUE
Here we have wrapped both u+v
and x+y
in brackets first so that we can then index the resulting vector. Alternatively, you could save u+v
and x+y
as new vectors, called something different, using the <-
operator.
u+v
is equal to \(\begin{bmatrix} 5 & 4 & 6 \end{bmatrix}^\intercal\) so its first element is 5.
x+y
is equal to \(\begin{bmatrix} 5 & 7 & 4 \end{bmatrix}^\intercal\) so its first element is also 5. It makes sense then that a TRUE
statement is returned when checking the equality of the first elements of these two vectors.
Equality of the third and second elements:
[1] FALSE
Similarly, we can see that the third entry of u+v
is 6 and the second entry of x+y
is 7. Since these are clearly not equal we would expect R to return a FALSE
statement when checking their equality, which we can see above.
Look at Section 1.3 Vectors of Probability and Statistics with R for more information on logical operators and vectors.
2.3 Other types of vectors
So far we have looked at numeric vectors and logical vectors. R can also store vectors with other types of elements, for example a character vector can be used to store strings of text.
The code below shows how to create a character vector;
Vectors containing complex numbers can also be created as follows;
The function typof()
is going to be useful for checking how R has stored the elements of a particular vector. To check the type of vector animals
is, we can run the code;
[1] "character"
This tells us that R has stored animals
as a character vector, meaning each element is stored as a string of text.
What type of vector is w
?
What happens if we try to combine two vectors which are of different types in R? Let's run some code and see.
[1] 0 1 0 2 4 1
Here, w
is a logical vector and y
is a numeric vector. R has returned the vector \(\begin{bmatrix} 0&1&0&2&4&1 \end{bmatrix}^\intercal\), rather than \(\begin{bmatrix} \mbox{FALSE} & \mbox{TRUE} & \mbox{FALSE} & 2 & 4 & 1 \end{bmatrix}^\intercal\) as might be expected. So what's happened?
When R combines vectors of different types it will ensure all the elements are of the same type in the resulting vector. This is because vectors need to be stored with all elements being the same type by R. This is done so that all elements are of the most 'complex' type. R considers numeric vectors to be more complex than logical vectors, so when we combine a logical and a numeric vector we will be left with a numeric vector.
The entries TRUE
and FALSE
within the vector w
therefore need to be presented numerically. A FALSE
statement will take the value 0 and a TRUE
statement will take the value 1.
Let's look at combining a numeric vector with a complex vector. Remember that y
is a numeric vector and complex
is a vector containing complex numbers.
[1] 2+0i 4+0i 1+0i 2+1i 0+0i 3+2i
Now all elements are presented as a complex number. That is because R considers complex vectors to be more complex than numeric vectors.
Create a new vector, called combined
, which is a combination of the logical vector w
and the character vector animals
. What type of vector is this?
[1] "FALSE" "TRUE" "FALSE" "dog" "sheep" "cow" "horse"
[1] "character"
The resulting vector is \(\begin{bmatrix} ``\mbox{FALSE"} & ``\mbox{TRUE"} & ``\mbox{FALSE"} & ``\mbox{dog"} & ``\mbox{sheep"} & ``\mbox{cow"} & ``\mbox{horse"} \end{bmatrix}^\intercal\) which we can see is stored by R as a character vector. Each element is now considered to be a string of text (note the quotation marks around each element).
It is possible to coerce vectors of one type to be another type. This can be done using one of the functions
as.logical()
as.numeric()
as.complex()
as.character()
For example, if we wanted to change the logical vector w
to be a numeric vector we could use the following code.
[1] 0 1 0
We again see that TRUE
statements have been given the value 1 and FALSE
statements have been given the value 0.
It is important to ensure that this coercion makes sense to do! It doesn't make much sense to represent the vector animals
as a numerical vector, and in fact if we tried to R would show us a warning message.
Warning: NAs introduced by coercion
[1] NA NA NA NA
R has still changed the vector to be numeric, but since "dog", for example, doesn't have a numerical value it has been represented with NA
(without any quoation marks "") which is R's symbol for "Not Available", which we think of as a missing value. This tells us that each element of the now numeric vector doesn't have a value. R also has the symbol NaN
("Not a Number") to represent an impossible value (e.g. result of dividing by zero),
To read more about the types of vectors R can handle, see Section 1.3 Vectors of Probability and Statistics with R.
2.4 Indexing vectors
We have seen briefly how to index vectors using square brackets [ ]
but we can now look at indexing in more detail. Specific elements can be extracted using positive indices or can be omitted from a vector using negative indices.
Let's again look at the vector z
(remember \(z=\begin{bmatrix}3&3&5&8&2 \end{bmatrix}^\intercal\)) and extract the first two elements.
[1] 3 3
In order to extract these two elements, we need to provide a vector (using c()
) giving the positions of the elements we want within the square brackets, [ ]
.
If we wanted to remove the third entry from z
we can use the following code.
[1] 3 3 8 2
Note that it is not possible to use both positive and negative indices at the same time, so we couldn't remove the first entry of a vector at the same time as extracting the second and third entries for example.
Complete the code that would remove the third and fourth elements from the vector animals
.
animals[
]
There are several ways this code could be written. We can remove the third and fourth entries through either of the following lines of code.
[1] "dog" "sheep"
[1] "dog" "sheep"
Alternatively, because animals has four elements, removing the third and fourth elements is equivalent to extracting the first and second elements. Therefore we can achieve the same results using the following code.
[1] "dog" "sheep"
This task has three steps, so you should write three lines of code.
- Create the vector \(\begin{bmatrix} 1&1&0&1 \end{bmatrix}^\intercal\) and call it
binary
- Change
binary
to be a logical vector and save this as a new vector calledlogical
- Extract the first and third elements of this vector
logical
.
Section 1.3.2 Vector Indexing of Proability and Statistics with R contains further details of indexing vectors.
2.5 Sequences
So far we have manually entered the elements of all the vectors we have created. If a vector is simply a sequence of numbers (following some pattern), R has some nice functions which will speed up how we create these vectors.
The operator :
can be used, e.g 2:8
, to generate a sequence of integers from the number on the left hand side (2
), up to the number on the right hand side (8
).
The following code generates a sequence of integers from 2 up to 8.
[1] 2 3 4 5 6 7 8
This can be useful if we want to extract (or omit) a sequence of elements from a vector which are all next to each other. Let's start by creating quite a long vector (the object letters
in R is a vector of all 26 lowercase letters of the English alphabet). If we then want to extract the first five letters of the alphabet, we can use the code 1:5
within our indexing. The following code shows how this is done.
[1] "a" "b" "c" "d" "e"
What if we want to generate a non-consecutive sequence of numbers? What about a sequence of non-integer values?
We can do this using the function seq()
in R. A function is a piece of code that tells R to complete some predetermined steps which will depend on the values provided to some 'arguments' within the brackets ( )
. The arguments that the function seq()
can be provided with are:
from =
: this is the starting value of our sequence (it can be an integer or a non-integer).to =
: this is the final value of our sequence (it can be an integer or a non-integer).by =
: this is the size of the jump between consecutive numbers in our sequence.length.out =
: this is the total number of values we want to be included in our sequence.
Hopefully you can see that if we specify each of to =
, by =
and length.out =
then they might not necessarily all match up. Therefore we actually only need to specify two of to =
, by =
or length.out =
within the seq()
function.
Let's look at some examples of using seq()
.
[1] 2.0 2.2 2.4 2.6 2.8 3.0 3.2 3.4 3.6 3.8 4.0
[1] 3.00 3.25 3.50 3.75 4.00 4.25 4.50 4.75 5.00
[1] 4.0 4.5 5.0 5.5 6.0 6.5 7.0
Write some code to create the sequence 4.10, 4.15, 4.20, 4.25, 4.30, 4.35, 4.40 using the function seq()
There are three different ways we could generate this sequence using the seq()
function.
[1] 4.10 4.15 4.20 4.25 4.30 4.35 4.40
[1] 4.10 4.15 4.20 4.25 4.30 4.35 4.40
[1] 4.10 4.15 4.20 4.25 4.30 4.35 4.40
You can see additional examples of using the seq()
function in Section 1.3.3 Generating Vector Sequences and Repeating Vector Constants of Probability and Statistics with R.
2.6 Repeating constants
Another function in R that's going to be useful is the rep()
function. This allows us to create a vector which consists of repeated elements, or repeated sequences of elements. The arguments that rep()
can be provided with are:
x =
: this is the value (or sequence) that we want to be repeated.times =
: this is the number of times the value should be repeated - ifx
is a sequence then the full sequence will be repeated this many times.each =
: ifx
is a sequence then this is the number of times the first element of the sequence should be repeated before moving on to repeating the second element and so on.length.out =
: this is the total number of elements we want to be included in the vector of repeated values.
If we specify all of times =
, each =
and length.out =
, they might not necessarily be in agreement with each other. Therefore, only one of times =
, each =
or length.out =
needs to be specified.
Let's look at some examples of using rep()
.
[1] 2 2 2 2 2 2
[1] 1 1 2 2 3 3
[1] 1 2 2 3 3 3 4 4 4 4
[1] TRUE FALSE TRUE FALSE TRUE FALSE TRUE FALSE
Write some code to create each of the following repeating sequences using rep()
:
- 3, 4, 5, 3, 4, 5, 3, 4, 5
- 2, 2, 4, 4, 6, 6
- blue, blue, blue, red, red
Write some code to generate the sequence 2, 3, 3, 4, 5, 5, 6, 7, 7 using both the seq()
and rep()
functions.
First of all, notice that the sequence of numbers you want to repeat is 2, 3, 4, 5, 6, 7. Therefore this sequence should be generated using seq()
, or the from:to
operator, and then fed into the rep()
function as the x =
argument.
Then have a think about how many times each of these elements should be repeated. 2 should be repeated once, 3 should be repeated twice, 4 should be repeated once and so on. To do this, the times =
argument within rep()
should be given the repeating sequence 1, 2, 1, 2, 1, 2.
You can see additional examples of using the rep()
function in Section 1.3.3 Generating Vector Sequences and Repeating Vector Constants of Probability and Statistics with R.
2.7 Filtering vectors
We can combine what we have already learned in terms of logical operators (Section 2.2) and indexing vectors (Section 2.4) in order to extract elements which satisfy certain conditions.
For example, if we create the vector \(\begin{bmatrix}2 & 2 & 3 & 6 \end{bmatrix}^\intercal\), then we can extract the elements of this vector which are greater than or equal to 3 in one easy step.
[1] 3 6
This works because R will initially run the logical operator vect >= 3
which will return the following logical vector.
[1] FALSE FALSE TRUE TRUE
Using this logical vector to index inside the square brackets returns exactly the same result as above.
[1] 3 6
A function within R which can be used to give the same results as logical operators within the square brackets is the subset()
function. This returns the elements of a specified vector which satisfy a given logical expression. The arguments that subset()
can be provided with are:
x =
: this is the vector we want to extract elements from.subset =
: this is the logical expression used to select the elements to keep.
We can use subset()
to extract the elements of the vector vect
which are greater than or equal to 3, exactly as we did before.
[1] 3 6
Extract the elements of the vector z
which are greater than 5 using the subset()
function.
Section 1.3.4 Filtering Vectors of Probability and Statistics with R gives further examples on how to filter vectors.
2.8 Getting help
If you are ever unsure of the arguments required in a particular function in R, or even what that function does, there are ways to find out. R contains extensive help files on any function that can be used. To access these, simply put a ?
in front of the function name, or use the function help()
.
For example, if we were unsure what the function rep()
is used for, then we can run either of the following lines of code.
This tells us in the Help tab that the first argument for the rep()
function is called x =
and a short description of what should be provided to this argument is given. It also describes the other arguments times =
, length.out =
and each =
.
The order that arguments appear in in the help file is important because you don't actually need to use the names of the arguments when using a function. Instead, you can just provide the values for each argument, and as long as they are in the same order as they appear in in the help file, R will know which value corresponds to which argument. This is known as using positional arguments.
Sometimes you might see a value associated with a specific argument in the help file, for example x = 5
. This is the "default" value for that argument and it means that if you don't specify a different value for the argument, then R will use the default value when executing the function. We'll see more about default values as we encounter more complex functions in future labs.
For more information on accessing the online and inbuilt help in RStudio, see Section 1.5 Getting Help in Probability and Statistics with R.