Background

Your assignment was to create a reprex for use in a request for help. It turns out that the code was improperly indexing the values whose mean we had hoped to compute.

Setup

Let’s walk through the code to find out where our problem lies. Here’s the script 01_summary_statistics.R, followed by the output it produces:

## this script loads the data and calculates some summary statistics

## load libraries
library("here")

## set location of the data directory 
data_dir <- here("data")

## load data file
pisaster_data <- readRDS(file.path(data_dir, "pisaster_data.Rds"))

## peek at the data
head(pisaster_data)

## calculate mean counts across all years, sites, and plots
mean_count <- mean(pisaster_data$count)

## here() starts at /Users/scheuerl/Documents/GitHub/FISH497/website

##      year site plot count
## [1,] 2019 "a"  1    6    
## [2,] 2019 "a"  2    10   
## [3,] 2019 "a"  3    13   
## [4,] 2019 "a"  4    9    
## [5,] 2019 "a"  5    9    
## [6,] 2019 "b"  1    11

## Warning in mean.default(pisaster_data$count): argument is not numeric or
## logical: returning NA

The warning message suggests that the object we’re passing to mean() is not numeric, which seems odd at first glance because count does appear to be numeric in that its values are not surrounded by quotes as with site.

Create a reprex

Here’s an example of what my reprex would look like. The key elements here are:

the question/problem is stated clearly and succinctly
a short code snippet with only the info necessary to address the error
the data provided as a structure
all of the information is contained here (question/problem, code, data)

I'm trying to compute the mean of a column in a data frame, but I get the following warning and the result is NA:

Warning in mean.default(ex_data$count): argument is not numeric or
logical: returning NA

Here is a shortened version of the data structure:

## example data
ex_data <- structure(list(2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 
    2019, 2019, "a", "a", "a", "a", "a", "b", "b", "b", "b", 
    "b", 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 
    6L, 10L, 13L, 9L, 9L, 11L, 7L, 10L, 12L, 
    8L), .Dim = c(10L, 4L), .Dimnames = list(
    NULL, c("year", "site", "plot", "count")))

Here is the problematic line of code:

## calculate the mean count across all times/sites/plots
mean_count <- mean(ex_data$count)

Here is my `sessionInfo()`:

R version 4.0.3 (2020-10-10)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Mojave 10.14.6

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base
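
The structure(...) text in the reprex above is the kind of output that dput() produces. As a minimal sketch, assuming a small made-up data frame (toy_data and its values are hypothetical and not part of the assignment), this is one way to turn an object into text that someone else can paste and run:

```r
## a small, made-up data frame standing in for real data (hypothetical values)
toy_data <- data.frame(
  site  = c("a", "a", "b"),
  count = c(6L, 10L, 11L)
)

## print a text representation that can be pasted into a question
dput(toy_data)

## round trip: evaluating the printed text rebuilds an equivalent object
toy_text <- paste(capture.output(dput(toy_data)), collapse = "\n")
toy_copy <- eval(parse(text = toy_text))
all.equal(toy_data, toy_copy)
```

For a large data set, passing only a few rows (e.g., dput(head(toy_data))) keeps the reprex short.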

Solution to the problem

The warning itself suggests that there is something wrong with the object we’re passing to mean() (i.e., argument is not numeric or logical), so the first thing to do would be to check on the class of the object.

## get the example data
ex_data <- structure(list(2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 
    2019, 2019, "a", "a", "a", "a", "a", "b", "b", "b", "b", 
    "b", 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 
    6L, 10L, 13L, 9L, 9L, 11L, 7L, 10L, 12L, 
    8L), .Dim = c(10L, 4L), .Dimnames = list(
    NULL, c("year", "site", "plot", "count")))

## peek at the data
head(ex_data)

##      year site plot count
## [1,] 2019 "a"  1    6    
## [2,] 2019 "a"  2    10   
## [3,] 2019 "a"  3    13   
## [4,] 2019 "a"  4    9    
## [5,] 2019 "a"  5    9    
## [6,] 2019 "b"  1    11

## what's the class of the object?
class(ex_data$count)

## [1] "NULL"

That’s odd. We can clearly see there are integers in the count column (denoted by the L in the structure(...) command). What’s the class of the larger object ex_data that count is a part of?

## what's the class of the larger object?
class(ex_data)

## [1] "matrix" "array"

OK, ex_data is not actually a data.frame, but rather a matrix (or equivalently a 2D array). That means we can’t index the columns with a dollar sign ($) and instead must use numeric or character row/column indexing (e.g., ex_data[, 4] or ex_data[, "count"]).

Let’s try to calculate the mean that way.

## Trying a different indexing
mean(ex_data[, "count"])

## Warning in mean.default(ex_data[, "count"]): argument is not numeric or logical:
## returning NA

## [1] NA

Hmm, still not working. Apparently the object is still not numeric. Let’s examine the class of the object with the proper indexing.

## inspect class
class(ex_data[, "count"])

## [1] "list"

This is getting weird. The column of numbers in count is apparently a list. Let’s take a look.

## inspect the object
ex_data[, "count"]

## [[1]]
## [1] 6
## 
## [[2]]
## [1] 10
## 
## [[3]]
## [1] 13
## 
## [[4]]
## [1] 9
## 
## [[5]]
## [1] 9
## 
## [[6]]
## [1] 11
## 
## [[7]]
## [1] 7
## 
## [[8]]
## [1] 10
## 
## [[9]]
## [1] 12
## 
## [[10]]
## [1] 8

Sure enough. Each of the numbers in the count column is an element of a list, but we can’t access the list with a standard $ operator. How does that work?

The trick here (and often elsewhere) is to examine the structure of the example data with str().

str(ex_data)

## List of 40
##  $ : num 2019
##  $ : num 2019
##  $ : num 2019
##  $ : num 2019
##  $ : num 2019
##  $ : num 2019
##  $ : num 2019
##  $ : num 2019
##  $ : num 2019
##  $ : num 2019
##  $ : chr "a"
##  $ : chr "a"
##  $ : chr "a"
##  $ : chr "a"
##  $ : chr "a"
##  $ : chr "b"
##  $ : chr "b"
##  $ : chr "b"
##  $ : chr "b"
##  $ : chr "b"
##  $ : int 1
##  $ : int 2
##  $ : int 3
##  $ : int 4
##  $ : int 5
##  $ : int 1
##  $ : int 2
##  $ : int 3
##  $ : int 4
##  $ : int 5
##  $ : int 6
##  $ : int 10
##  $ : int 13
##  $ : int 9
##  $ : int 9
##  $ : int 11
##  $ : int 7
##  $ : int 10
##  $ : int 12
##  $ : int 8
##  - attr(*, "dim")= int [1:2] 10 4
##  - attr(*, "dimnames")=List of 2
##   ..$ : NULL
##   ..$ : chr [1:4] "year" "site" "plot" "count"
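
The dim attribute at the bottom of that output is the important part. As a minimal sketch with made-up values (toy is a hypothetical object, not the assignment data), attaching dimensions to an ordinary list is enough to make R treat it as a matrix, and it reproduces the indexing behavior seen so far:

```r
## start with an ordinary list of mixed types (hypothetical toy values)
toy <- list(2019, 2020, "a", "b")
dim(toy)        # NULL
class(toy)      # "list"

## add dimensions and column names; the elements themselves are unchanged
dim(toy) <- c(2L, 2L)
dimnames(toy) <- list(NULL, c("year", "site"))
is.list(toy)    # TRUE
is.matrix(toy)  # TRUE

## `$` looks for list names, not column names, so it finds nothing
toy$year        # NULL

## bracket indexing works, but returns a list rather than a numeric vector
year_col <- toy[, "year"]
class(year_col) # "list"
```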

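The output of str() shows that ex_data is a list containing 40 objects, but unlike a standard list, whose dim() is NULL, this object has dimensions of 10 rows by 4 columns (i.e., attr(*, "dim")= int [1:2] 10 4). That’s why class(ex_data) returned c("matrix", "array") in the code above. It also explains why ex_data$count returned NULL rather than an error: on a list, $ looks up elements by name, and ex_data has column names (dimnames) but no element names. The solution here is to use unlist() on the count column before passing it to mean().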

## load `magrittr` to get the pipe operator
library(magrittr)

## SOLUTION: calculate mean counts across all years, sites, and plots
mean_count <- ex_data[,"count"] %>%
  unlist() %>%
  mean()
mean_count

## [1] 9.5
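
The pipe is a matter of taste here. As a sketch of two alternatives (ex_df is a hypothetical name, and the compact matrix() call below is just a shorter way of rebuilding the same example values), the same mean can be computed with nested base R calls, or every column can be flattened once into an ordinary data frame so that $ works afterwards:

```r
## rebuild the example data in a compact form (same values as ex_data above)
ex_data <- matrix(
  c(as.list(rep(2019, 10)),
    as.list(rep(c("a", "b"), each = 5)),
    as.list(rep(1:5, times = 2)),
    as.list(c(6L, 10L, 13L, 9L, 9L, 11L, 7L, 10L, 12L, 8L))),
  nrow = 10,
  dimnames = list(NULL, c("year", "site", "plot", "count"))
)

## same calculation without the pipe
mean_count <- mean(unlist(ex_data[, "count"]))
mean_count

## flatten every column so later code can use `$` as usual
ex_df <- data.frame(
  year  = unlist(ex_data[, "year"]),
  site  = unlist(ex_data[, "site"]),
  plot  = unlist(ex_data[, "plot"]),
  count = unlist(ex_data[, "count"])
)
mean(ex_df$count)
```

Both calls return 9.5, matching the piped version.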

Endnote

There are 3 more hints that ex_data is not a data.frame. First, the character values in site have quotes around them instead of simply being displayed as normal text. For example, here is a comparison of ex_data before and after converting it to a data.frame:

## original
ex_data

##       year site plot count
##  [1,] 2019 "a"  1    6    
##  [2,] 2019 "a"  2    10   
##  [3,] 2019 "a"  3    13   
##  [4,] 2019 "a"  4    9    
##  [5,] 2019 "a"  5    9    
##  [6,] 2019 "b"  1    11   
##  [7,] 2019 "b"  2    7    
##  [8,] 2019 "b"  3    10   
##  [9,] 2019 "b"  4    12   
## [10,] 2019 "b"  5    8

## as a data frame
as.data.frame(ex_data)

##    year site plot count
## 1  2019    a    1     6
## 2  2019    a    2    10
## 3  2019    a    3    13
## 4  2019    a    4     9
## 5  2019    a    5     9
## 6  2019    b    1    11
## 7  2019    b    2     7
## 8  2019    b    3    10
## 9  2019    b    4    12
## 10 2019    b    5     8

Second, the row names in a matrix appear as standard row notation ([#,]) rather than a simple integer (#) as in a data frame. Third, all of the values in the columns are left-justified in the original matrix form of ex_data, but right-justified in the data frame.
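
These visual hints can also be checked in code rather than by eye. As a minimal sketch, the helper below (check_data_frame() is a hypothetical name, not part of the assignment script) stops early with a message when an object is not a data frame:

```r
## hypothetical helper: fail fast if `x` is not a data frame
check_data_frame <- function(x) {
  if (!is.data.frame(x)) {
    stop("expected a data.frame but got: ", paste(class(x), collapse = ", "))
  }
  invisible(x)
}

## toy objects with made-up values
toy_df  <- data.frame(site = c("a", "b"), count = c(6L, 11L))
toy_mat <- matrix(list("a", "b", 6L, 11L), nrow = 2,
                  dimnames = list(NULL, c("site", "count")))

## passes silently for the data frame
check_data_frame(toy_df)

## fails for the list-matrix; tryCatch() captures the message for display
msg <- tryCatch(check_data_frame(toy_mat),
                error = function(e) conditionMessage(e))
msg
```

A check like this near the top of a script would have flagged the problem before mean() was ever called.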

The data structure I used here is rather uncommon and something that very few people would ever use. That is, I created a matrix wherein each element was itself a list with only one value in it. In so doing, I was able to combine integer/numeric and character values in the same matrix, which otherwise would not work because an ordinary matrix can hold only one type of value. For example, the following line of code will convert the numeric 1 and 2 to the characters "1" and "2".

## create matrix
(mm <- matrix(c(1, 2, "a", "b"), 2, 2))

##      [,1] [,2]
## [1,] "1"  "a" 
## [2,] "2"  "b"

## check class of rows
mm %>% apply(1, class)

## [1] "character" "character"

## check class of cols
mm %>% apply(2, class)

## [1] "character" "character"
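
For contrast, here is a sketch of the list-based version of the same matrix (ll is a hypothetical name). Wrapping the values in list() before calling matrix() is one way to build the kind of structure used for ex_data, and each element then keeps its own class:

```r
## same four values, but wrapped in a list before building the matrix
ll <- matrix(list(1, 2, "a", "b"), 2, 2)

## the first column stays numeric
sapply(ll[, 1], class)  # "numeric" "numeric"

## the second column stays character
sapply(ll[, 2], class)  # "character" "character"
```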

Getting help with errors in R

Answer Key for Assignment #3

FISH 497 - Intro to Environmental Data Science

29 January 2021
