# Iteration

In programming, **iteration** is the act of repeating a set of instructions. This can be done in several different ways:

- Repeat until some condition is met.
- Repeat a certain number of times.
- Repeat through elements of a vector.

In R, the last example here, repeating through elements of a vector, is by far the most common. Because of this, R has built-in functions that make this type of iteration extremely easy. While R does provide the usual iteration abilities through the use of `for` and `while` loops, these should *not* be your go-to methods for performing iteration.

After reading these notes you should be able to:

- TODO: learning objectives

## Apply Functions

One of the most common operations that you will encounter while programming with R is running a function with each element of some vector as input and collecting the results in a vector.

### `lapply`

The function in R that performs the operation described above is `lapply`. The general syntax is:

`lapply(X = some_list, FUN = f)`

That is, `some_list` is a vector (atomic vector or list), and the function `f` will be “applied” to each of its elements. Note that it is customary to not name the arguments to `lapply`.

`lapply(some_list, f)`

Let’s start with a very simple example.

`lapply(1:3, log)`

```
[[1]]
[1] 0
[[2]]
[1] 0.6931472
[[3]]
[1] 1.098612
```

Here we see the `log` function applied to each of the elements of the vector `1:3`. This would be the same as running the following:

```
list(
  log(1),
  log(2),
  log(3)
)
```

```
[[1]]
[1] 0
[[2]]
[1] 0.6931472
[[3]]
[1] 1.098612
```

Clearly, this isn’t a particularly useful example, as we could simply do the following:

`log(1:3)`

`[1] 0.0000000 0.6931472 1.0986123`

Note, however, that `lapply` returns a list, while the call above returns an atomic vector. More on that in a moment.

For now, know that `lapply` will return a **list** that has the **same length** as the input vector.^{1}
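A quick way to confirm both properties is to inspect the result directly. The sketch below uses only base R; the name `res` is introduced here purely for illustration.

```r
# `res` is a hypothetical name used only for this illustration
res = lapply(1:3, log)

is.list(res)  # the result is a list, not an atomic vector
length(res)   # same length as the input, 1:3
```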

Let’s look at an example of iterating over a list.

```
set.seed(42)
ex_list = list(a = runif(5),
               b = runif(5),
               c = runif(5))
```

` ex_list`

```
$a
[1] 0.9148060 0.9370754 0.2861395 0.8304476 0.6417455
$b
[1] 0.5190959 0.7365883 0.1346666 0.6569923 0.7050648
$c
[1] 0.4577418 0.7191123 0.9346722 0.2554288 0.4622928
```

`lapply(ex_list, max)`

```
$a
[1] 0.9370754
$b
[1] 0.7365883
$c
[1] 0.9346722
```

Again, here the input was a list of length three, so the output is as well. You might wish the output was an atomic vector. Again, more on that soon.

`lapply(ex_list, range)`

```
$a
[1] 0.2861395 0.9370754
$b
[1] 0.1346666 0.7365883
$c
[1] 0.2554288 0.9346722
```

Finally, a slightly more useful example. This returns the same values as the following, although the list below does not carry the element names:

```
list(
  range(ex_list[[1]]),
  range(ex_list[[2]]),
  range(ex_list[[3]])
)
```

```
[[1]]
[1] 0.2861395 0.9370754
[[2]]
[1] 0.1346666 0.7365883
[[3]]
[1] 0.2554288 0.9346722
```

Hopefully, it is becoming clear that `lapply` can be used to write concise, useful, and readable code.

What if we want to use a function with more than one argument? For example:

```
multiply_and_power = function(x, c, p) {
  c * x ^ p
}
```

`multiply_and_power(x = 2, c = 3, p = 0.5)`

`[1] 4.242641`

`multiply_and_power(x = 2, c = 1:3, p = 0.5)`

`[1] 1.414214 2.828427 4.242641`

Be aware that, depending on how we specify the values we pass to the arguments, there is likely going to be some length coercion (recycling of shorter vectors) taking place.
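R calls this behavior recycling: shorter vectors are repeated until they match the length of the longest one. A minimal sketch, repeating the definition of `multiply_and_power` from above so that the block runs on its own; the particular input values are chosen only for illustration.

```r
multiply_and_power = function(x, c, p) {
  c * x ^ p
}

# x has length 2 and c has length 4, so x is recycled to 1, 2, 1, 2
# p has length 1, so the same power is used for every element
multiply_and_power(x = 1:2, c = 1:4, p = 2)
```

When the longer length is not a multiple of the shorter one, R still recycles but issues a warning, which is usually a sign that the lengths were not what you intended.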

To use this function together with `lapply`, we simply add the values of the additional parameters as arguments to `lapply`.^{2}

`lapply(1:3, multiply_and_power, c = 1:5, p = 2)`

```
[[1]]
[1] 1 2 3 4 5
[[2]]
[1] 4 8 12 16 20
[[3]]
[1] 9 18 27 36 45
```

What did this code do?

```
list(
  multiply_and_power(x = 1, c = 1:5, p = 2),
  multiply_and_power(x = 2, c = 1:5, p = 2),
  multiply_and_power(x = 3, c = 1:5, p = 2)
)
```

```
[[1]]
[1] 1 2 3 4 5
[[2]]
[1] 4 8 12 16 20
[[3]]
[1] 9 18 27 36 45
```

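What if we wanted to iterate over a different argument, say `c` instead of `x`? Specify `x` and `p` in the call to `lapply`. Since `x` and `p` are then matched by name, each element of the input is passed to the remaining argument, so `lapply` will iterate over `c`.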

`lapply(1:3, multiply_and_power, x = 1:5, p = 2)`

```
[[1]]
[1] 1 4 9 16 25
[[2]]
[1] 2 8 18 32 50
[[3]]
[1] 3 12 27 48 75
```

So, this time, we did the following:

```
list(
  multiply_and_power(x = 1:5, c = 1, p = 2),
  multiply_and_power(x = 1:5, c = 2, p = 2),
  multiply_and_power(x = 1:5, c = 3, p = 2)
)
```

```
[[1]]
[1] 1 4 9 16 25
[[2]]
[1] 2 8 18 32 50
[[3]]
[1] 3 12 27 48 75
```

Sure, you could simply write out each call like this instead, but imagine needing to iterate over `1:100000`.
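To see that the `lapply` call and the hand-written list really agree, the two can be compared directly. This is a small sketch that redefines `multiply_and_power` so it runs on its own; `by_lapply` and `by_hand` are names made up for the comparison.

```r
multiply_and_power = function(x, c, p) {
  c * x ^ p
}

# by_lapply and by_hand are hypothetical names used only for this comparison
by_lapply = lapply(1:3, multiply_and_power, x = 1:5, p = 2)
by_hand = list(
  multiply_and_power(x = 1:5, c = 1, p = 2),
  multiply_and_power(x = 1:5, c = 2, p = 2),
  multiply_and_power(x = 1:5, c = 3, p = 2)
)

all.equal(by_lapply, by_hand)

# the same one-line call scales to many more values without extra typing
length(lapply(1:100000, multiply_and_power, x = 1:5, p = 2))
```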

### `sapply`

Let’s return to the example that found the maximum of each element of a list.

```
set.seed(42)
ex_list = list(a = runif(5),
               b = runif(5),
               c = runif(5))
```

`lapply(ex_list, max)`

```
$a
[1] 0.9370754
$b
[1] 0.7365883
$c
[1] 0.9346722
```

As expected, the result is a list. However, notice that each element of said list is an atomic vector of length one, of the same type. We could actually check that using `lapply`.

`lapply(lapply(ex_list, max), typeof)`

```
$a
[1] "double"
$b
[1] "double"
$c
[1] "double"
```

`lapply(lapply(ex_list, max), length)`

```
$a
[1] 1
$b
[1] 1
$c
[1] 1
```

It probably seems like what we really want as output here is an atomic vector that is the same length as the input vector. We can obtain this result by switching from `lapply` to `sapply`.

`sapply(ex_list, max)`

```
a b c
0.9370754 0.7365883 0.9346722
```

The “S” in `sapply` refers to the simplifying action taken by the function. Many of the details of how the simplification works follow the usual rules of the coercion hierarchy. It is probably best not to worry too much about these rules, and not to rely on simplification too much. Generally, it is probably best to use `sapply` in the case we’ve just seen here: you are certain the result of the function applied to each element is an atomic vector of length one, each with the same type.
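To illustrate why relying on simplification can be surprising, the sketch below shows two cases where `sapply` does not return a plain atomic vector. It recreates `ex_list` so that it runs on its own; `range_res` and `seq_res` are names introduced only for this illustration.

```r
set.seed(42)
ex_list = list(a = runif(5),
               b = runif(5),
               c = runif(5))

# every result has length two, so sapply() simplifies to a 2 x 3 matrix
range_res = sapply(ex_list, range)
dim(range_res)

# the results have different lengths, so nothing is simplified
# and a list is returned, just as lapply() would return
seq_res = sapply(1:3, seq_len)
is.list(seq_res)
```

The shape of the output depends on what the function happens to return, which is why `sapply` is safest when every result is known to be a single value of one type.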

Another example:

`sapply(1:3, log)`

`[1] 0.0000000 0.6931472 1.0986123`

But again, this example isn’t truly necessary, as the following is even better:

`log(1:3)`

`[1] 0.0000000 0.6931472 1.0986123`

We show this to demonstrate that many operations in R are already vectorized, so there is no need to iterate.

### Other Apply Functions

Other apply functions exist. Many are rarely used. One that might be of interest is `vapply`, which will do simplification like `sapply`, but the user will need to specify the expected outcome of each iteration, which will make the simplification more predictable.

`vapply(1:3, log, double(1))`

`[1] 0.0000000 0.6931472 1.0986123`

`vapply(1:3, log, integer(1))`

```
Error in vapply(1:3, log, integer(1)) : values must be type 'integer',
but FUN(X[[1]]) result is type 'double'
```
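As a further sketch, `vapply` can be used on the named list from earlier. The block recreates `ex_list` so it is self-contained; `max_res` and `range_res` are hypothetical names. The third argument is a template for the value each call is expected to return.

```r
set.seed(42)
ex_list = list(a = runif(5),
               b = runif(5),
               c = runif(5))

# each max() result is a single double, so the template is double(1)
max_res = vapply(ex_list, max, double(1))
max_res

# each range() result is a double vector of length two,
# so the template is double(2) and the result is a 2 x 3 matrix
range_res = vapply(ex_list, range, double(2))
range_res
```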

Another that you will likely see is the `apply` function. We would advise avoiding this unless you truly understand what it does. Also, beware, it should probably not be used with data frames.^{3}

Later we will look at an alternative to the apply functions through the use of the `purrr` package.

## Loops

Loops are another form of control flow. Essentially, they allow you to explicitly specify the repetition of some code, in contrast to the apply functions above, which did so implicitly.

Welcome to R Club.

- The first rule of R Club is: Do not use `for` loops!
- The second rule of R Club is: Do not use `for` loops!
- And the third and final rule: If you have to use a `for` loop, do not grow vectors!

Source: Unknown

Loops are very common in programming; however, in R, it is best to avoid them unless you truly *need* them. The general heuristic you should use to determine if you need a loop or an apply function is:

- Use a **loop** when the result of the next iteration *depends* on the result of the previous iteration.
- Use an **apply** function when the results of each iteration are *independent*.^{4}

### `for`

The most common looping structure is a `for` loop. The generic syntax is:

```
for (element in vector) {
  code_to_run
}
```

We’ll refer to `element` as the loop variable.

Let’s look at a specific example.

```
x = double(length = 5)
for (i in 1:5) {
  x[i] = i ^ 2
}
x
```

`[1] 1 4 9 16 25`

First, note that `for` is *not* a function, which is why you should consider placing a space between it and the parenthesis that follows. Next, `(i in 1:5)` is considered the header of the loop, which defines how the iteration will take place. Here the name of the loop variable is `i`. The code inside the braces, `{}`, is called the body of the loop, much like the body of a function.

- Each time through the loop, `i` will take one of the values from `1:5`. Or generally, the loop variable will take the value of each element of a vector.
- For each value of `i`, the code `x[i] = i ^ 2` will run. In general, for each value of the loop variable, the code in the body will run. And often, that code will depend on the loop variable, like we see here.

So, the above for loop ran each of the following:

```
x[1] = 1 ^ 2
x[2] = 2 ^ 2
x[3] = 3 ^ 2
x[4] = 4 ^ 2
x[5] = 5 ^ 2
```

This should make it clear that the purpose of a loop is to repeat code, without actually having to repeatedly type it.

As has become a theme, this for loop is truly useless in R. We could have simply done:

`(1:5) ^ 2`

`[1] 1 4 9 16 25`

In the loop above, `i` functions much like the name of a function argument, except now we pass a new value, an element of `1:5`, each time through the loop.

You can use any name you want for the loop variable, but `i`, `j`, and `k` are most common.

```
for (some_long_var_name in 1:5) {
print(some_long_var_name)
}
```

```
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
```

A `for` loop is a very powerful structure, so it will not be possible for us to illustrate all possible usage examples.

Let’s look at a correct loop written poorly, then the same loop written better, and try to draw some conclusions about best practices with `for` loops.

Before proceeding, let’s introduce the `seq_along` function.

`seq_along(5:1)`

`[1] 1 2 3 4 5`

Essentially, `seq_along` returns the indexes of a vector. Or, you could think of it as returning the result of the following:

`1:length(5:1)`

`[1] 1 2 3 4 5`
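The two forms differ in one case worth knowing about: a vector of length zero. The sketch below uses a made-up empty vector, `empty`, and a made-up counter, `n_runs`, to show the difference.

```r
empty = double(0)  # hypothetical zero-length vector

1:length(empty)    # this is 1:0, which counts down and gives two indexes
seq_along(empty)   # gives no indexes at all

n_runs = 0         # hypothetical counter for how often the body runs
for (i in seq_along(empty)) {
  n_runs = n_runs + 1
}
n_runs             # the loop body never ran
```

A loop written with `1:length(empty)` would instead run its body twice, with indexes that do not exist in the vector, which is one reason to prefer `seq_along`.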

Let’s use a `for` loop to create a sequence of numbers. The first two numbers will be `10` and `5`. Elements after that will be calculated as:

\[ x_i = 3 \cdot \frac{x_{i - 1}}{x_{i - 2}} \]

We’ll use a loop to create a sequence of length ten that follows this specification.

First, a bad example of how to write a loop to accomplish this:

```
for (i in 1:10) {
  if (i == 1) {
    x = 10
  } else if (i == 2) {
    x = c(x, 5)
  } else {
    x = c(x, 3 * x[i - 1] / x[i - 2])
  }
}
x
```

` [1] 10.0 5.0 1.5 0.9 1.8 6.0 10.0 5.0 1.5 0.9`

We see the correct resulting vector, `x`, but we have used a sub-optimal technique. In particular, we “grew” the `x` vector. The use of `x = c(x, some_new_element)` takes what was `x`, then creates a new `x` by combining the previous `x` with some new element. **Do not do this.** This is one of the reasons people incorrectly think R is slow. This operation *is* slow, but there is no need for it.

Instead, let’s pre-allocate the vector `x` in which we will store our results.

```
x = double(10) # pre-allocate x to be a double vector of the correct length
for (i in seq_along(x)) {
  if (i == 1) {
    x[i] = 10
  } else if (i == 2) {
    x[i] = 5
  } else {
    x[i] = 3 * x[i - 1] / x[i - 2]
  }
}
x
```

` [1] 10.0 5.0 1.5 0.9 1.8 6.0 10.0 5.0 1.5 0.9`

This time, since `x` already existed, we are simply *replacing* individual elements of an already existing vector. This is faster. Any time you grow a vector or add new elements (that is, you increase the length of a vector) there is a copy operation taking place under the hood that you could have avoided.

Also, by pre-allocating `x`, we can now use `seq_along(x)`. In some applications we might be creating `x` with a program, and we wouldn’t know its length ahead of time!
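One way to see the cost of growing a vector is to time both approaches on a larger problem. The following is only a rough sketch: the functions `grow` and `preallocate` and the size `n` are invented for this comparison, they fill a vector with squares rather than the sequence above, and the timings reported by `system.time` will vary from machine to machine.

```r
n = 1e4  # hypothetical size, chosen only so the sketch runs quickly

grow = function(n) {
  x = double(0)
  for (i in 1:n) {
    x = c(x, i ^ 2)  # builds a new, longer vector on every iteration
  }
  x
}

preallocate = function(n) {
  x = double(n)      # allocate the full vector once
  for (i in seq_along(x)) {
    x[i] = i ^ 2     # replace one element of the existing vector
  }
  x
}

system.time(grow(n))
system.time(preallocate(n))

# both approaches produce the same values
all.equal(grow(n), preallocate(n))
```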

Some general ideas to keep in mind:

- Do not attempt to iterate over and store results in the same vector.
- Pre-allocate a “results” vector and update individual elements as you progress through the loop. Do not grow vectors.
- Use `seq_along` and iterate over indexes rather than elements of a vector.

We’ve already discussed why the second item is a problem. Let’s now create an example that demonstrates items one and three.

The following function will check if a number is even.

```
is_even = function(x) {
  x %% 2 == 0
}
```

We also create a vector `y` that stores some numbers.

```
set.seed(42)
y = sample(1:10, size = 20, replace = TRUE)
y
```

` [1] 1 5 1 9 10 4 2 10 1 8 7 4 9 5 4 10 2 3 9 9`

Our goal is to create a logical vector, the same length as `y`, containing `TRUE` at any index where `y` is even.

This will not work:

```
for (i in y) {
  y[i] = is_even(i)
}
```

To better see the issue, temporarily place a `print()` statement inside the loop.

```
set.seed(42)
y = sample(1:10, size = 20, replace = TRUE)
for (i in y) {
  print(i)
  y[i] = is_even(i)
}
```

```
[1] 1
[1] 5
[1] 1
[1] 9
[1] 10
[1] 4
[1] 2
[1] 10
[1] 1
[1] 8
[1] 7
[1] 4
[1] 9
[1] 5
[1] 4
[1] 10
[1] 2
[1] 3
[1] 9
[1] 9
```

` y`

` [1] 0 1 0 1 0 4 0 1 0 1 7 4 9 5 4 10 2 3 9 9`

So `i` takes values from `y`, but by doing so, we don’t have access to the indexes at which we need to replace with the result of `is_even`. Let’s use `seq_along`.

```
set.seed(42)
y = sample(1:10, size = 20, replace = TRUE)
for (i in seq_along(y)) {
  y[i] = is_even(y[i])
}
y
```

` [1] 0 0 0 0 1 1 1 1 0 1 0 1 0 0 1 1 1 0 0 0`

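Note that inside the call to `is_even`, we now need to change `i` to `y[i]` to get the value rather than the index each time through the loop.

But there’s still an issue! We have `0` and `1` instead of `FALSE` and `TRUE`. Coercion! Because `y` is a numeric vector, each logical result is coerced to a number when it is stored in `y`. Storing the results in a separate logical vector avoids this.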

```
set.seed(42)
y = sample(1:10, size = 20, replace = TRUE)
res = logical(length(y))
for (i in seq_along(y)) {
  res[i] = is_even(y[i])
}
res
```

```
[1] FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE FALSE TRUE FALSE TRUE
[13] FALSE FALSE TRUE TRUE TRUE FALSE FALSE FALSE
```

Much better. But again, remember, many things in R are vectorized:

`is_even(y)`

```
[1] FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE FALSE TRUE FALSE TRUE
[13] FALSE FALSE TRUE TRUE TRUE FALSE FALSE FALSE
```

This example did not need a loop, because results from one iteration to the next were independent. In the previous example, this was not the case, and was an example of when you truly need a loop.
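Since the iterations here are independent, an apply function would also work, even though the vectorized call remains the simplest option. The sketch below repeats the definitions of `is_even` and `y` so it runs on its own; `res_apply` is a name made up for the comparison.

```r
is_even = function(x) {
  x %% 2 == 0
}

set.seed(42)
y = sample(1:10, size = 20, replace = TRUE)

# apply is_even to each element, expecting one logical value per element
res_apply = vapply(y, is_even, logical(1))

# the apply version and the vectorized version agree
identical(res_apply, is_even(y))
```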

Note that these examples have used atomic vectors, but there is no reason we couldn’t use a list!
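As a minimal sketch of looping over a list, the block below recreates `ex_list` from the apply section and stores the maximum of each element in a pre-allocated vector. The name `max_vals` is introduced only for this illustration.

```r
set.seed(42)
ex_list = list(a = runif(5),
               b = runif(5),
               c = runif(5))

max_vals = double(length(ex_list))  # pre-allocate the results vector
for (i in seq_along(ex_list)) {
  max_vals[i] = max(ex_list[[i]])   # [[ ]] extracts the i-th element itself
}
max_vals
```

Because each iteration is independent, this particular loop could be replaced by `sapply(ex_list, max)`; it is shown only to illustrate the indexing.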

### `while`

A `while` loop will repeat code until some condition is no longer met. The general syntax is:

```
while (condition) {
  code_to_run
}
```

Let’s see an example.

```
x = 10
y = double(length = 10)
while (x > 0) {
  print(x)
  y[x] = x ^ 2
  x = x - 1
}
```

```
[1] 10
[1] 9
[1] 8
[1] 7
[1] 6
[1] 5
[1] 4
[1] 3
[1] 2
[1] 1
```

` x`

`[1] 0`

` y`

` [1] 1 4 9 16 25 36 49 64 81 100`

Here, the loop runs until `x` is no longer greater than `0`.

Notice that if we don’t modify `x` inside the loop, it could run forever! An infinite loop!^{5}

You’ll see `for` loops more often, but `while` loops are useful when you don’t know how many iterations you’ll need ahead of time, but instead have a stopping condition.

```
x = 1
while (x > 1e-10) {
  print(x)
  x = x / 2
}
```
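If we are curious how many iterations such a loop needed, one option is to count them as the loop runs. This is a small variation on the loop above, with the `print()` call removed and a counter, `n_iter`, added; the counter is not part of the original example.

```r
x = 1
n_iter = 0  # hypothetical counter, not part of the original loop
while (x > 1e-10) {
  x = x / 2
  n_iter = n_iter + 1
}
n_iter  # number of times the body ran
x       # first value that failed the condition
```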

### `repeat`

- TODO: not very important, ignore for now.
- TODO: next, break

## Summary

- TODO: You’ve learned to…

## What’s Next?

- TODO: ?

## Footnotes

1. Think **l** for list. Although, it is unclear if that is the etymology of the name of the `lapply` function.
2. If you check the documentation for `lapply`, you’ll notice an argument called `...`. More on this later, but this is what allows R to pass these additional arguments to the function.
3. The `apply` function is useful when working with matrix objects, which we have been avoiding.
4. Also check that you can’t just use a vectorized operation.
5. If you experience an infinite loop, use Ctrl + C in the console to escape it. Or press the stop button.