x = 10.1:1.1
x
[1] 10.1 9.1 8.1 7.1 6.1 5.1 4.1 3.1 2.1 1.1
typeof(x)
[1] "double"
After reading these notes you should be able to:
R has three subsetting operators: [ (single bracket), [[ (double bracket), and $ (dollar sign). Depending on the type of object you apply them to, they may behave differently.
Often, but very much not always, they will be used as follows:
[ : Create a subset that is the same type as the object being subset.
[[ and $ : Extract a single element, which could be a different type than the object being subset.
Additionally, these operators can often be mixed with one of the six types of subsetting allowed in R: positive integers, negative integers, logical vectors, nothing, zero, and character vectors (names).
We’ll demonstrate these with each of the three key objects that we have discussed so far: atomic vectors, lists, and data frames. Recall, each of these is a vector.
Let’s start with possibly the most important, using the single bracket with atomic vectors. We’ll demonstrate each of the six types.
To demonstrate, we’ll start with a simple atomic vector x.
First, we’ll demonstrate using a vector of integers for subsetting. Note that any numeric vector used to subset is coerced to be integer.
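A minimal sketch of integer subsetting, using the example vector x from these notes. The particular index vectors are assumptions, chosen so that the calls line up with the printed results that follow.

```r
# Example vector from these notes: 10.1, 9.1, ..., 1.1
x = 10.1:1.1

first_three_reversed = x[c(3, 2, 1)]  # elements 3, 2, 1, in that order
picked = x[c(1, 2, 4)]                # elements 1, 2, and 4
reversed = x[10:1]                    # the whole vector, reversed

first_three_reversed
picked
reversed
```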
[1] 8.1 9.1 10.1
[1] 10.1 9.1 7.1
[1] 1.1 2.1 3.1 4.1 5.1 6.1 7.1 8.1 9.1 10.1
In each of the above, an atomic vector with the same type as x and the same length as the vector used to subset is returned. The elements of the vector returned correspond to the elements of the original vector at the indexes of the integers supplied.
Note that you can repeat integers.
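As a small illustration of repeated indexes, and of the coercion to integer mentioned earlier, something like the following could be tried. The specific indexes are only illustrative choices.

```r
x = 10.1:1.1

repeated = x[c(1, 1, 2)]  # index 1 appears twice, so its element does too
truncated = x[2.9]        # 2.9 is truncated to 2, not rounded to 3

repeated
truncated
```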
Negative integers can be used to remove indexes from the original vector.
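A sketch of negative integer subsetting with the example vector x. The exact calls are assumptions chosen to match the printed results that follow.

```r
x = 10.1:1.1

x[-1]          # drop the first element
x[-10]         # drop the last element
x[c(-1, -10)]  # drop the first and last elements
x[-c(1, 10)]   # same result, negating the whole index vector
```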
[1] 9.1 8.1 7.1 6.1 5.1 4.1 3.1 2.1 1.1
[1] 10.1 9.1 8.1 7.1 6.1 5.1 4.1 3.1 2.1
[1] 9.1 8.1 7.1 6.1 5.1 4.1 3.1 2.1
[1] 9.1 8.1 7.1 6.1 5.1 4.1 3.1 2.1
Note that you cannot mix positive and negative integers.
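To see this without halting a script, one option is to wrap the call in tryCatch(). The wrapper is an addition for illustration only; calling x[c(1, -10)] directly stops with an error.

```r
x = 10.1:1.1

# tryCatch() is used only so this sketch runs to completion
result = tryCatch(x[c(1, -10)], error = function(e) "error")
result
```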
Error in x[c(1, -10)] : only 0's may be mixed with negative subscripts
Perhaps the most useful, logical subsetting allows us to use a logical vector of the same length as the vector being subset. It returns the elements at the same indexes as the TRUE values.
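A sketch of logical subsetting with the example vector x. The name keep_odd and the comparisons used are assumptions, chosen so that the calls line up with the printed results that follow.

```r
x = 10.1:1.1

keep_odd = rep(c(TRUE, FALSE), times = 5)  # one logical value per element
x[keep_odd]   # elements at positions 1, 3, 5, 7, 9
x[x > 6]      # a comparison creates the logical vector for us
x[x == 10.1]  # only the first element matches
```

In practice, the logical vector is usually produced by a comparison like the last two, rather than typed by hand.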
[1] 10.1 8.1 6.1 4.1 2.1
[1] 10.1 9.1 8.1 7.1 6.1
[1] 10.1
If you do not supply a logical vector of the same length, expect recycling.
But, beware, in this case, R will not warn you if the logical vector does not cleanly divide the vector being subset.
A missing value in the logical vector will create a missing value in the result.
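The three points above, recycling, silent uneven recycling, and missing values, might be checked as follows. The specific logical vectors are illustrative assumptions.

```r
x = 10.1:1.1

x[c(TRUE, FALSE)]              # length 2 recycles evenly: positions 1, 3, 5, 7, 9
x[c(TRUE, FALSE, FALSE)]       # length 3 does not divide 10, yet no warning:
                               # positions 1, 4, 7, 10
x[c(TRUE, NA, rep(FALSE, 8))]  # the NA produces an NA in the result
```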
If you use the brackets with nothing, R will return the entire vector. This might seem useless, but we will demonstrate its power later.
Subsetting using 0 returns a vector of length zero with the same type as the vector being subset.
This is the same as subsetting with NULL.
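A short sketch of subsetting with nothing, with 0, and with NULL, again using the example vector x.

```r
x = 10.1:1.1

identical(x[], x)         # TRUE: empty brackets return everything
x[0]                      # numeric(0): length zero, still a double vector
typeof(x[0])              # "double"
identical(x[NULL], x[0])  # TRUE
```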
Using a character vector to subset will only work if the vector being subset has names.
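A sketch of character subsetting. The named vector v is hypothetical and introduced only for this illustration.

```r
v = c(a = 1, b = 2, c = 3)  # hypothetical named vector
v[c("a", "c")]              # elements named "a" and "c", names kept

x = 10.1:1.1                # the example vector has no names
x["a"]                      # NA: no element is named "a"
```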
In general, single brackets return an object of the same type with some number of elements, while double brackets are said to extract a single element.
This can sometimes be hard to notice with atomic vectors.
Recall our example vector. Now let’s subset using an integer with both single and double brackets.
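A sketch of that comparison; the choice of index 2 is an assumption.

```r
x = 10.1:1.1

x[2]
x[[2]]
identical(x[2], x[[2]])  # TRUE for this unnamed vector
```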
What’s the difference between the code examples above? In this case, nothing.
Let’s try with a named vector.
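A sketch with a hypothetical named vector v; both the name and the values are assumptions made for illustration.

```r
v = c(a = 1, b = 2, c = 3)  # hypothetical named vector

v[2]           # single bracket: the value along with its name "b"
v[[2]]         # double bracket: just the value
names(v[2])    # "b"
names(v[[2]])  # NULL
```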
Here, there is a subtle difference. The former preserves the names, while the latter does not. This is because the double bracket is only extracting the element. It retains none of the information about the original vector, in this case, the names.¹
Double brackets can only be used with a positive integer (an index) or a character vector (a name) of length one.²
The dollar sign operator, $, cannot be used with atomic vectors.
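These restrictions can be checked with a sketch like the one below. The tryCatch() wrappers are an addition so the code runs to completion; each wrapped call raises an error on its own.

```r
x = 10.1:1.1

# Double brackets with more than one index: an error for an atomic vector
two_elements = tryCatch(x[[1:2]], error = function(e) "error")

# The dollar sign with an atomic vector: also an error
dollar = tryCatch(x$a, error = function(e) "error")

c(two_elements, dollar)

x[[TRUE]]  # appears to work, but TRUE is first coerced to the integer 1
```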
Much of subsetting a list is done in a very similar fashion to atomic vectors. However, because with single brackets the object returned is a list, sometimes this creates confusion.
Each of the six types of subsetting using a single bracket also works with lists.
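A sketch of single bracket subsetting on a list. The list y here is reconstructed from the printed results in these notes, so its definition, and the exact calls, should be read as assumptions.

```r
# Reconstructed example list (an assumption based on the printed results)
y = list(a = 1:10, b = "Hello, World!", c = log, d = list(a = 1, b = "z"))

y[1:2]             # positive integers
y[-1]              # negative integers
y[c(TRUE, FALSE)]  # logical, recycled: elements 1 and 3
y[]                # nothing: the whole list
y[0]               # zero: an empty named list
y[c("a", "c")]     # names
```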
$a
[1] 1 2 3 4 5 6 7 8 9 10
$b
[1] "Hello, World!"
$b
[1] "Hello, World!"
$c
function (x, base = exp(1)) .Primitive("log")
$d
$d$a
[1] 1
$d$b
[1] "z"
$a
[1] 1 2 3 4 5 6 7 8 9 10
$c
function (x, base = exp(1)) .Primitive("log")
$a
[1] 1 2 3 4 5 6 7 8 9 10
$b
[1] "Hello, World!"
$c
function (x, base = exp(1)) .Primitive("log")
$d
$d$a
[1] 1
$d$b
[1] "z"
named list()
$a
[1] 1 2 3 4 5 6 7 8 9 10
$c
function (x, base = exp(1)) .Primitive("log")
Notice, for each, a list is returned. Here a single bracket preserves the list type.³
Where this might cause confusion is a subset using a single bracket that returns a list of length one.
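For instance, assuming the same reconstructed list y as a stand-in for the example list in these notes:

```r
# Reconstructed example list (an assumption)
y = list(a = 1:10, b = "Hello, World!", c = log, d = list(a = 1, b = "z"))

one = y[1]
one
typeof(one)  # "list"
length(one)  # 1
```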
The important thing to note here: This is a length one list. It is not simply the atomic vector contained in the first element. It is a list containing that atomic vector.
If you want to extract a particular element of a list, this is done with double brackets.
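A sketch of extraction with double brackets, again assuming the reconstructed list y:

```r
# Reconstructed example list (an assumption)
y = list(a = 1:10, b = "Hello, World!", c = log, d = list(a = 1, b = "z"))

first = y[[1]]
first
typeof(first)   # "integer"
is.list(first)  # FALSE
```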
The result here is not a list, but instead the first element of the list, which in this case is an atomic vector.
What happened here? This is equivalent to the following:
Extract the first element of the list, then extract the second element of the extracted element.
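One way to see this two-step extraction, under the assumption that a vector of length two is supplied to double brackets on the reconstructed list y, is the following sketch.

```r
# Reconstructed example list (an assumption)
y = list(a = 1:10, b = "Hello, World!", c = log, d = list(a = 1, b = "z"))

y[[c(1, 2)]]  # second element of the first element
y[[1]][[2]]   # the same extraction written as two steps
identical(y[[c(1, 2)]], y[[1]][[2]])  # TRUE
```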
The dollar sign operator is essentially a shortcut to using double brackets for a named list.
As such, it also extracts the element. It does not return a list, unless of course the element you’re extracting is itself a list.
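A sketch of the dollar sign with the reconstructed list y (its definition is an assumption based on the printed results in these notes):

```r
# Reconstructed example list (an assumption)
y = list(a = 1:10, b = "Hello, World!", c = log, d = list(a = 1, b = "z"))

y$a
identical(y$a, y[["a"]])  # TRUE
y$d                       # this element is itself a list, so a list comes back
y$d$b                     # "z"
```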
Recall that data frames are also vectors, and in particular lists.
As such, everything that applies to a list applies to a data frame. Just think of it as a list with named elements.
It looks like this is something different, subsetting columns, but remember, the elements of the data frame are the elements of a list. It just so happens that we interpret them as columns.
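A sketch of single bracket subsetting on a data frame. The data frame z is reconstructed from the printed results in these notes, so its definition and the exact calls are assumptions.

```r
# Reconstructed example data frame (an assumption based on the printed results)
z = data.frame(a = 5:1, b = "a", c = c(TRUE, FALSE, TRUE, FALSE, TRUE), d = 1)

z[-1]              # drop the first column
z[c(TRUE, FALSE)]  # logical, recycled: columns 1 and 3
z[]                # nothing: the whole data frame
z[0]               # zero: no columns
z[c("a", "c")]     # columns by name
```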
b c d
1 a TRUE 1
2 a FALSE 1
3 a TRUE 1
4 a FALSE 1
5 a TRUE 1
a c
1 5 TRUE
2 4 FALSE
3 3 TRUE
4 2 FALSE
5 1 TRUE
a b c d
1 5 a TRUE 1
2 4 a FALSE 1
3 3 a TRUE 1
4 2 a FALSE 1
5 1 a TRUE 1
data frame with 0 columns and 5 rows
a c
1 5 TRUE
2 4 FALSE
3 3 TRUE
4 2 FALSE
5 1 TRUE
The one oddity here is the use of 0 to subset.
Note that the printed result suggests this data frame still has five rows. This is due to the preserving nature of single brackets. But importantly, this object still has length zero.
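This can be verified directly, assuming the reconstructed data frame z:

```r
# Reconstructed example data frame (an assumption)
z = data.frame(a = 5:1, b = "a", c = c(TRUE, FALSE, TRUE, FALSE, TRUE), d = 1)

empty = z[0]
length(empty)  # 0: there are no columns, so no elements
nrow(empty)    # 5: the row names attribute is kept
```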
Double brackets also remain the same.
And again, the dollar sign operates the same as well.
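A sketch of double brackets and the dollar sign on the reconstructed data frame z (an assumption based on the printed results in these notes):

```r
# Reconstructed example data frame (an assumption)
z = data.frame(a = 5:1, b = "a", c = c(TRUE, FALSE, TRUE, FALSE, TRUE), d = 1)

z[[1]]    # first column, as an atomic vector
z[["a"]]  # the same column, by name
z$a       # and again, with the dollar sign
identical(z[[1]], z$a)  # TRUE
is.data.frame(z[[1]])   # FALSE
```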
The interesting addition to subsetting methods for data frames involves an extension of the single bracket syntax. Like other single bracket operations, it will mostly return a data frame. In general, the brackets take two subsets separated by a comma: the first selects rows and the second selects columns.
Let’s look at some examples.
Recall the data frame we had assigned the name z.
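A sketch of the two basic forms, rows only and columns only, assuming the reconstructed data frame z:

```r
# Reconstructed example data frame (an assumption)
z = data.frame(a = 5:1, b = "a", c = c(TRUE, FALSE, TRUE, FALSE, TRUE), d = 1)

z[1:2, ]  # first two rows, all columns
z[, 3:4]  # all rows, third and fourth columns
```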
Here, we are subsetting the original data frame to only the first two rows. Leaving a blank after the comma gets us all of the columns.
Here, we leave a blank before the comma, so we get all rows, but only the third and fourth columns.
We can also put these together:
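For example, assuming the reconstructed data frame z:

```r
# Reconstructed example data frame (an assumption)
z = data.frame(a = 5:1, b = "a", c = c(TRUE, FALSE, TRUE, FALSE, TRUE), d = 1)

z[1:2, 3:4]  # first two rows of the third and fourth columns
```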
Since we’re using single brackets, we can also use negative integers and more.
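A sketch of several such combinations on the reconstructed data frame z. The exact calls are assumptions chosen to match the printed results that follow.

```r
# Reconstructed example data frame (an assumption)
z = data.frame(a = 5:1, b = "a", c = c(TRUE, FALSE, TRUE, FALSE, TRUE), d = 1)

z[-1, ]               # all but the first row
z[, -4]               # all but the fourth column
z[-1, -4]             # both at once
z[1:3, ]              # first three rows
z[0, 0]               # no rows and no columns
z[1:3, c("a", "d")]   # first three rows of columns a and d
```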
a b c d
2 4 a FALSE 1
3 3 a TRUE 1
4 2 a FALSE 1
5 1 a TRUE 1
a b c
1 5 a TRUE
2 4 a FALSE
3 3 a TRUE
4 2 a FALSE
5 1 a TRUE
a b c
2 4 a FALSE
3 3 a TRUE
4 2 a FALSE
5 1 a TRUE
a b c d
1 5 a TRUE 1
2 4 a FALSE 1
3 3 a TRUE 1
data frame with 0 columns and 0 rows
a d
1 5 1
2 4 1
3 3 1
Beware! The following breaks a rule we’ve seen so far:
You may have hoped this would return a data frame; however, it has simplified the result to a vector. To avoid this behavior:
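Both forms appear in the summary table in these notes as x[, 1] and x[, 1, drop = FALSE]. A sketch with the reconstructed data frame z (its definition is an assumption):

```r
# Reconstructed example data frame (an assumption)
z = data.frame(a = 5:1, b = "a", c = c(TRUE, FALSE, TRUE, FALSE, TRUE), d = 1)

z[, 1]                # simplified to an atomic vector
z[, 1, drop = FALSE]  # stays a data frame with one column
```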
This behavior can cause trouble since you can’t always predict it. Later, we’ll introduce tibbles which are a more-or-less drop-in replacement for data frames that avoid this behavior.
A theme has emerged. Until this recent exception, single brackets were a preserving operation. That is, they return an object of the same type and keep attributes.⁴
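A sketch of the preserving behavior, assuming the reconstructed data frame z. The calls are chosen to line up with the printed results that follow.

```r
# Reconstructed example data frame (an assumption)
z = data.frame(a = 5:1, b = "a", c = c(TRUE, FALSE, TRUE, FALSE, TRUE), d = 1)

z[]               # still a data frame
z[0]              # still a data frame, even with no columns
attributes(z[0])  # names, row.names, and class are all kept
```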
a b c d
1 5 a TRUE 1
2 4 a FALSE 1
3 3 a TRUE 1
4 2 a FALSE 1
5 1 a TRUE 1
data frame with 0 columns and 5 rows
$names
character(0)
$row.names
[1] 1 2 3 4 5
$class
[1] "data.frame"
In contrast, double brackets and dollar signs are simplifying operations. They extract an individual element and do not keep attributes.
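A sketch of the simplifying behavior, assuming the reconstructed data frame z. The calls are chosen to line up with the printed results that follow.

```r
# Reconstructed example data frame (an assumption)
z = data.frame(a = 5:1, b = "a", c = c(TRUE, FALSE, TRUE, FALSE, TRUE), d = 1)

typeof(z)           # "list"
attributes(z)       # names, class, and row.names
z[[1]]              # the extracted column
typeof(z[[1]])      # "integer"
attributes(z[[1]])  # NULL: none of the data frame's attributes remain
```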
[1] "list"
$names
[1] "a" "b" "c" "d"
$class
[1] "data.frame"
$row.names
[1] 1 2 3 4 5
[1] 5 4 3 2 1
[1] "integer"
NULL
The following table summarizes what we have seen.
Type | Simplifying | Preserving |
---|---|---|
Atomic Vector | x[[1]] | x[1] |
List | x[[1]] | x[1] |
Data Frame | x[[1]] | x[1] |
Data Frame | x[, 1] | x[, 1, drop = FALSE] |
If we mix subsetting and assignment, we can replace elements.
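A sketch of replacement with the example vector x. The indexes used are an assumption chosen to match the printed results that follow.

```r
x = 10.1:1.1
x

x[c(1, 3, 5)] = 42  # replace the first, third, and fifth elements
x
```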
[1] 10.1 9.1 8.1 7.1 6.1 5.1 4.1 3.1 2.1 1.1
[1] 42.0 9.1 42.0 7.1 42.0 5.1 4.1 3.1 2.1 1.1
We could do something like the above, but also utilize recycling.
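The printed results that follow suggest an element of a list and a column of a data frame being replaced. A sketch under that assumption, with y and z reconstructed from the printed results:

```r
# Reconstructed example objects (assumptions)
y = list(a = 1:10, b = "Hello, World!", c = log, d = list(a = 1, b = "z"))
z = data.frame(a = 5:1, b = "a", c = c(TRUE, FALSE, TRUE, FALSE, TRUE), d = 1)

y$d = 5:1  # replace a whole element of the list
y

z$a = 42   # a single value is recycled down the column
z
```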
$a
[1] 1 2 3 4 5 6 7 8 9 10
$b
[1] "Hello, World!"
$c
function (x, base = exp(1)) .Primitive("log")
$d
$d$a
[1] 1
$d$b
[1] "z"
$a
[1] 1 2 3 4 5 6 7 8 9 10
$b
[1] "Hello, World!"
$c
function (x, base = exp(1)) .Primitive("log")
$d
[1] 5 4 3 2 1
a b c d
1 5 a TRUE 1
2 4 a FALSE 1
3 3 a TRUE 1
4 2 a FALSE 1
5 1 a TRUE 1
a b c d
1 42 a TRUE 1
2 42 a FALSE 1
3 42 a TRUE 1
4 42 a FALSE 1
5 42 a TRUE 1
This is where empty subsetting can become useful.
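A sketch contrasting the two forms. The second vector w is hypothetical and used only for the comparison.

```r
x = 10.1:1.1
x[] = 42   # replace every element; x is still a length-10 vector
x

w = 10.1:1.1  # hypothetical second copy for comparison
w = 42        # no subsetting: w now simply names the single value 42
length(w)     # 1
```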
Here, we’ve replaced all elements with the value 42. The empty subsetting allows us to do this, as x = 42 would simply assign the name x to the object 42.
We can also use more interesting subsets, for example with data frames.
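A sketch of replacing a row, assuming a data frame z in the state shown by the earlier printed results (column a already set to 42). The replacement values are chosen to match the printed result that follows.

```r
# Reconstructed data frame with column a already replaced by 42 (an assumption)
z = data.frame(a = 42, b = "a", c = c(TRUE, FALSE, TRUE, FALSE, TRUE), d = 1)

# Replace the second row with a one-row data frame of matching column types
z[2, ] = data.frame(a = 0, b = "z", c = FALSE, d = 42)
z
```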
a b c d
1 42 a TRUE 1
2 0 z FALSE 42
3 42 a TRUE 1
4 42 a FALSE 1
5 42 a TRUE 1
Notice we have to be careful here. We’re attempting to replace rows, but because rows span multiple columns, hence multiple types, we need to make sure those types are present in the replacement object. In other words, a row of a data frame is a data frame, so we need to replace it with a data frame.
Or, we could deal with a lot of coercion.
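A sketch of how that coercion can arise, assuming the same reconstructed data frame z and a replacement built with c() rather than data.frame():

```r
# Reconstructed data frame (an assumption)
z = data.frame(a = 42, b = "a", c = c(TRUE, FALSE, TRUE, FALSE, TRUE), d = 1)

# The replacement is an atomic vector, not a data frame
z[2, ] = c(0, "z", FALSE, 42)
z
str(z)  # every column is now character
```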
'data.frame': 5 obs. of 4 variables:
$ a: num 42 0 42 42 42
$ b: chr "a" "z" "a" "a" ...
$ c: logi TRUE FALSE TRUE FALSE TRUE
$ d: num 1 42 1 1 1
a b c d
1 42 a TRUE 1
2 0 z FALSE 42
3 42 a TRUE 1
4 42 a FALSE 1
5 42 a TRUE 1
'data.frame': 5 obs. of 4 variables:
$ a: chr "42" "0" "42" "42" ...
$ b: chr "a" "z" "a" "a" ...
$ c: chr "TRUE" "FALSE" "TRUE" "FALSE" ...
$ d: chr "1" "42" "1" "1" ...
Why did coercion happen here? Hint: Remember how atomic vectors work.⁵
Lastly, if you replace an element with NULL, it will be removed.
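A sketch of removal with NULL, using the reconstructed list y and data frame z (both assumptions based on the printed results in these notes):

```r
# Reconstructed example objects (assumptions)
y = list(a = 1:10, b = "Hello, World!", c = log, d = list(a = 1, b = "z"))
z = data.frame(a = 5:1, b = "a", c = c(TRUE, FALSE, TRUE, FALSE, TRUE), d = 1)

y$b = NULL       # removes element b from the list
names(y)

z[["d"]] = NULL  # removes column d from the data frame
names(z)
```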
1. More generally, attributes.
2. A single logical value will appear to work, but it is really first being coerced to integer.
3. Other options might simplify.
4. This is why there are still five rows in the odd example we saw.
5. Rows of data frames are not atomic vectors. Columns of data frames are (most often) atomic vectors.