Many of us use some sort of script for our analyses in R, but the real backbone of R is functions. R packages are collections of functions that are designed to serve a specific purpose. In many ways, packages are similar to scientific articles: both are a standard way of communicating scientific work, and readers expect both to follow a certain format. An article, for example, generally combines text, formulas, and figures in a logical order. Some articles are intended for a broad audience while others are written for a very small one. Just as an article conveys scientific ideas to others, a package allows us to distribute a set of methods to others.
For most of us, our first encounter with R packages is via those that come bundled with the base installation of R, such as the base package, which provides familiar functions like mean() and print(). Packages allow people to expand the functionality of R while still enforcing some standards. Packages are also a convenient way to maintain personal functions and share them with your colleagues. They also offer advantages from a system administration point of view: packages can be dynamically loaded and unloaded at runtime, so they only occupy memory when actually used.
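Loading happens on demand. The sketch below shows this with splines, a base package that ships with R but is not attached at startup in a default session. It assumes a fresh session, so check this on your own setup. The sketch attaches the package with library(), then detaches and unloads it with detach(), and checks the search path at each step.

```r
# splines is a base package: installed with R, but not attached at startup
before <- "package:splines" %in% search()

# Attach it on demand...
library(splines)
during <- "package:splines" %in% search()

# ...then detach it and unload its namespace to free the memory again
detach("package:splines", unload = TRUE)
after <- "package:splines" %in% search()

c(before = before, during = during, after = after)
```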
Here is some common terminology with respect to R packages:
Package: An extension of the base R system with code, data, and documentation combined together in a standard format
Library: A directory/folder containing installed packages
Repository: A website providing packages for installation
Source: The original version of a package with human-readable text and code
Binary: A compiled version of a package with computer-readable text and code
Base packages: Part of the R source tree, maintained by R Core
Recommended packages: Part of every R installation, but not necessarily maintained by R Core
Contributed packages: All of the remaining packages contributed by the user community
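You can see several of these terms on your own machine. This sketch lists your library folder(s) with .libPaths(). It then uses the priority argument of installed.packages() to separate base, recommended, and contributed packages. The exact lists will depend on your installation.

```r
# Library: the folder(s) where installed packages live on this machine
.libPaths()

# Base packages: part of the R source tree, maintained by R Core
base_pkgs <- unname(installed.packages(priority = "base")[, "Package"])
sort(base_pkgs)

# Recommended packages: shipped with standard R installations
rec_pkgs <- unname(installed.packages(priority = "recommended")[, "Package"])
sort(rec_pkgs)

# Contributed packages: everything without a priority ("NA" selects these)
contrib_pkgs <- unname(installed.packages(priority = "NA")[, "Package"])
length(contrib_pkgs)
```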
The array of packages available in R is pretty mind-boggling, which means there are innumerable functions underlying these packages. In general, an R function has the following structure:
function_name <- function(arguments) {
  # error checks
  # commands to execute
  return(something)
}
So, for example, here is a simple function that adds 2 numbers together and returns the result:
add <- function(x, y) {
  ## verify x & y are numbers (|| is the scalar "or" used in if() conditions)
  if (!is.numeric(x) || !is.numeric(y)) {
    stop("`x` and `y` must be numbers")
  }
  ## add the 2 numbers
  z <- x + y
  ## return the result
  return(z)
}
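Here is how add() behaves when we call it. The block repeats the definition so it can run on its own. It uses tryCatch() only to capture the error message for illustration, so R does not stop.

```r
add <- function(x, y) {
  ## verify x & y are numbers
  if (!is.numeric(x) || !is.numeric(y)) {
    stop("`x` and `y` must be numbers")
  }
  ## add the 2 numbers and return the result
  z <- x + y
  return(z)
}

# Two numbers pass the error check and are summed
add(2, 3)

# A character argument trips the error check; tryCatch() captures the message
err_msg <- tryCatch(add(2, "three"), error = function(e) conditionMessage(e))
err_msg
```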
When building packages, we will also include some additional information that describes what the function does and how it works. This information will become part of the documentation that is returned when someone types ?function_name.
Fortunately for us, there are a variety of tools available to assist us in developing and producing packages. We’ll begin by creating a new, public repository on GitHub called pets. Populate the repo with
a brief README.md
a .gitignore file
You can skip the license for now. Once you are finished, create a new project in RStudio based upon this repo.
At the command prompt in RStudio, type
library(devtools)
which will allow us to access the package development tools in devtools.
We’re now ready to create a framework for building our package. Before doing so, however, note the folder/directory where you created the pets project. For example, mine is located in ~/Documents/GitHub/FISH497/pets. If you are unsure, go ahead and type getwd() at the command prompt. You can copy/paste the result into the next step. Now execute the following command at the prompt, substituting your own path:
create_package("~/Documents/GitHub/FISH497/pets")
You should see R respond with a bunch of information about your new package framework, followed by a prompt asking if you’d like to overwrite the pre-existing file pets.Rproj. Go ahead and select any of the options that look like Nope, No way, Not now, etc. These options vary each time you do this, so don’t worry if yours don’t match the ones below. After you make your selection, RStudio will open a new instance of your pets project.
✓ Setting active project to '/Users/scheuerl/Documents/GitHub/FISH497/pets'
✓ Creating 'R/'
✓ Writing 'DESCRIPTION'
Package: pets
Title: What the Package Does (One Line, Title Case)
Version: 0.0.0.9000
Authors@R (parsed):
* First Last <first.last@example.com> [aut, cre] (YOUR-ORCID-ID)
Description: What the package does (one paragraph).
License: `use_mit_license()`, `use_gpl3_license()` or friends to
pick a license
Encoding: UTF-8
LazyData: true
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.1.1
✓ Writing 'NAMESPACE'
Overwrite pre-existing file 'pets.Rproj'?
1: Definitely
2: Negative
3: Nope
Selection: 3
✓ Leaving 'pets.Rproj' unchanged
✓ Adding '^pets\\.Rproj$' to '.Rbuildignore'
✓ Adding '.Rproj.user' to '.gitignore'
✓ Adding '^\\.Rproj\\.user$' to '.Rbuildignore'
✓ Opening '/Users/scheuerl/Documents/GitHub/FISH497/pets/' in new RStudio session
✓ Setting active project to '<no active project>'
Wow, that’s a lot of information. What just happened here? First, you should notice that several new files and a folder have been added to your directory. If you cannot see the hidden files below that begin with a ., go to the Files pane in RStudio and select More > Show Hidden Files to display them. These include:
.Rbuildignore
DESCRIPTION
NAMESPACE
/R/
.Rbuildignore: a list of files that we’ll need to have around, but that should be excluded when building the package from source. Each line is a regular expression, and at present the file contains ^pets\.Rproj$ and ^\.Rproj\.user$
DESCRIPTION: provides metadata about your package, the contents of which were shown above when you ran create_package()
NAMESPACE: declares the functions your package exports for external use and the external functions your package imports from other packages
/R/: an empty folder where we’ll put our .R files that contain our function definitions
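.Rbuildignore is easy to misread, so here is a rough sketch of how its patterns are applied. The sketch builds a throwaway copy of the project layout in a temporary directory. The file contents are placeholders, not what create_package() actually writes. It then matches each path, relative to the package root, against each pattern. This assumes the rule in Writing R Extensions: patterns are Perl-like regular expressions matched case-insensitively. The sketch ignores the files that R CMD build excludes on its own.

```r
# Build a throwaway copy of the project layout in a temporary directory
pkg_dir <- file.path(tempdir(), "pets")
dir.create(file.path(pkg_dir, "R"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(pkg_dir, ".Rproj.user"), showWarnings = FALSE)

# Placeholder files standing in for what create_package() and RStudio wrote
writeLines("Package: pets", file.path(pkg_dir, "DESCRIPTION"))
writeLines("# Generated by roxygen2: do not edit by hand", file.path(pkg_dir, "NAMESPACE"))
writeLines("Version: 1.0", file.path(pkg_dir, "pets.Rproj"))
invisible(file.create(file.path(pkg_dir, "R", "cats.R")))

# The two patterns from .Rbuildignore (backslashes are doubled inside R strings)
writeLines(c("^pets\\.Rproj$", "^\\.Rproj\\.user$"), file.path(pkg_dir, ".Rbuildignore"))

# Match every path, relative to the package root, against every pattern
patterns <- readLines(file.path(pkg_dir, ".Rbuildignore"))
paths <- list.files(pkg_dir, recursive = TRUE, all.files = TRUE, include.dirs = TRUE)
is_excluded <- function(path) {
  any(vapply(patterns,
             function(pat) grepl(pat, path, ignore.case = TRUE, perl = TRUE),
             logical(1)))
}
excluded <- vapply(paths, is_excluded, logical(1))
excluded
```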
At this point, go ahead and quit both instances of the RStudio pets project. Navigate to the pets.Rproj file in the folder/directory where you set up the project initially, and double-click it to restart your project. After doing so, you will need to reload devtools.
library(devtools)
As mentioned above, the basis of R packages is functions. Let’s go ahead and create a new function called cats(). To do so, we’ll use the use_r() function to open up a new blank .R file within the /R/ directory where we can define our new function.
use_r("cats")
Now we can write our function definition. Go ahead and copy/paste the following code into the cats.R
file. When you are finished, save the file.
cats <- function(love = TRUE) {
  if (love == TRUE) {
    msg <- "I love cats!"
  } else {
    msg <- "I am not a cat person."
  }
  return(print(msg))
}
Now that we’ve defined a new function, it’s a good idea to try it out. Although we could just highlight the code and execute it in the normal R environment, the load_all() function better simulates the process of building, installing, and attaching our new pets package. As a package accumulates more functions, load_all() gives you a much more accurate sense of how the package is developing than simply testing functions defined in the global workspace. load_all() also allows much faster iteration than actually going through the process of building, installing, and attaching the package.
load_all()
The function should respond with a message that the package pets has been loaded.
ℹ Loading pets
Let’s try out our new function.
cats(TRUE)
## [1] "I love cats!"
cats(FALSE)
## [1] "I am not a cat person."
cats(1)
## [1] "I love cats!"
cats("a")
## [1] "I am not a cat person."
The last 2 examples might not make sense to you because they don’t involve a logical argument (TRUE or FALSE). In the first case, 1 == TRUE evaluates to TRUE in R, because TRUE is coerced to the number 1 before the comparison. As a result, cats(1) returns the same result as cats(TRUE). In the second case, our function definition only checks whether the argument is equal to TRUE. Because "a" is not, the function returns "I am not a cat person."
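To see the coercion at work, try the comparisons below. The second half sketches a hypothetical stricter variant, cats_strict(), that rejects anything other than a single TRUE or FALSE. It is one possible design, not part of the pets package.

```r
# `==` coerces its operands to a common type before comparing
1 == TRUE       # TRUE is coerced to the number 1
2 == TRUE       # so any number other than 1 compares as FALSE
"a" == TRUE     # TRUE is coerced to the string "TRUE"
"TRUE" == TRUE

# Hypothetical stricter variant: accept only a single, non-missing logical value
cats_strict <- function(love = TRUE) {
  if (!is.logical(love) || length(love) != 1 || is.na(love)) {
    stop("`love` must be a single TRUE or FALSE")
  }
  msg <- if (love) "I love cats!" else "I am not a cat person."
  print(msg)
}

cats_strict(FALSE)
res <- tryCatch(cats_strict(1), error = function(e) conditionMessage(e))
res
```

A side effect of the original definition is that cats(2) returns "I am not a cat person.", because 2 == TRUE is FALSE.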
Note that load_all() has made the cats() function available for us to use, but the function does not exist in the global workspace. You can confirm this with the following:
exists("cats", where = globalenv(), inherits = FALSE)
## [1] FALSE
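Roughly speaking, load_all() evaluates your package code in its own environment and makes it available through the search path. The sketch below imitates that idea with base R. It evaluates the cats() definition in a new environment and attaches it under the hypothetical name sketch:pets. pkgload, which provides load_all(), does considerably more than this.

```r
# Stand-in for the contents of R/cats.R
cats_src <- '
cats <- function(love = TRUE) {
  if (love == TRUE) {
    msg <- "I love cats!"
  } else {
    msg <- "I am not a cat person."
  }
  return(print(msg))
}
'

# Evaluate the source in its own environment rather than the global workspace
pkg_env <- new.env()
eval(parse(text = cats_src), envir = pkg_env)

# Put that environment on the search path so cats() can be found by name
attach(pkg_env, name = "sketch:pets")
in_global <- exists("cats", where = globalenv(), inherits = FALSE)
on_path <- exists("cats")
cats(TRUE)

# Clean up: remove the environment from the search path again
detach("sketch:pets", character.only = TRUE)
```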
This is a good time to go ahead and commit the files we created with create_package() and our definition of the cats() function. Make sure to give your commit(s) short but descriptive message(s).
At this point, we have every reason to believe that cats() works as expected, but we should really verify that all of the elements of the pets package work. This may seem silly after such a small addition, but it’s good to establish the habit of checking often.
The standard method for checking a package’s functionality is to execute R CMD check in a terminal (shell). However, we can make use of the check() function to do so without leaving RStudio. Note that check() produces a lot of output. Here is a peek at the last of it (your results may vary slightly).
check()
── R CMD check results ─────────────────────────────────────── pets 0.0.0.9000 ────
Duration: 11.2s
> checking DESCRIPTION meta-information ... WARNING
Non-standard license specification:
`use_mit_license()`, `use_gpl3_license()` or friends to pick a
license
Standardizable: FALSE
0 errors ✓ | 1 warning x | 0 notes ✓
It is very important to read over all of this information, as it will report the things that passed the check as well as any warnings or errors encountered in the process. Here we see that we have 0 errors, 1 warning, and 0 notes. The warning says,
Non-standard license specification
which is OK for now, as we’ll address that later.
Note that you can also run a package check via the Build pane in RStudio.
DESCRIPTION
The DESCRIPTION file provides some metadata about your package, including the package name, its title, author, etc. Here are the current contents of our DESCRIPTION file:
Package: pets
Title: What the Package Does (One Line, Title Case)
Version: 0.0.0.9000
Authors@R:
person(given = "First",
family = "Last",
role = c("aut", "cre"),
email = "first.last@example.com",
comment = c(ORCID = "YOUR-ORCID-ID"))
Description: What the package does (one paragraph).
License: `use_mit_license()`, `use_gpl3_license()` or friends to
pick a license
Encoding: UTF-8
LazyData: true
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.1.1
Let’s go ahead and make the following changes:
Give the package a title:
Evaluates Your Feelings About Pets
Enter your first and last names, and your email address. You can leave the role as is and delete the comment field asking about your ORCID.
Write some descriptive text in the Description field. For example,
Pets are awesome companions. Some people are fans of cats, others not so much.
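The Authors@R field is R code: a call to person() from the utils package. You can build and inspect the same object at the console to make sure the fields are what you intend. The names and emails below are placeholders, and the second (contributor) entry is hypothetical.

```r
# person() lives in the utils package
me <- utils::person(given = "First", family = "Last",
                    role = c("aut", "cre"),
                    email = "first.last@example.com")
me

# format() gives a text form similar to the one create_package() printed
format(me)

# A hypothetical contributor; several people are combined with c()
authors <- c(me, utils::person("Pat", "Helper", role = "ctb"))
authors
```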
When finished, your DESCRIPTION file should look something like this:
Package: pets
Title: Evaluates Your Feelings About Pets
Version: 0.0.0.9000
Authors@R:
person(given = "Mark",
family = "Scheuerell",
role = c("aut", "cre"),
email = "scheuerl@uw.edu")
Description: Pets are awesome companions. Some people are fans of cats, others not so much.
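License: `use_mit_license()`, `use_gpl3_license()` or friends to
pick a license
Encoding: UTF-8
LazyData: true
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.1.1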
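DESCRIPTION uses the Debian Control File (DCF) format: each entry is a Field: value pair, and continuation lines are indented. Because of this, base R can read the file with read.dcf(). This sketch writes a trimmed copy to a temporary file and pulls out a few fields. It also shows that the 0.0.0.9000 development version sorts below a first release such as 0.0.1.

```r
# A trimmed copy of the pets DESCRIPTION file (continuation lines are indented)
desc_lines <- c(
  "Package: pets",
  "Title: Evaluates Your Feelings About Pets",
  "Version: 0.0.0.9000",
  "Description: Pets are awesome companions. Some people are fans of cats,",
  "    others not so much.",
  "Encoding: UTF-8"
)
desc_file <- tempfile(fileext = ".dcf")
writeLines(desc_lines, desc_file)

# read.dcf() returns a character matrix with one column per field
desc <- read.dcf(desc_file)
colnames(desc)
desc[, "Title"]

# Development versions ending in .9000 sort before the first release
package_version(desc[, "Version"]) < package_version("0.0.1")
```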
We should add a license to our package, as it’s considered a best practice (here we’ll use the MIT license). We’ll use the helper function use_mit_license() to do so. Make sure to use your own name in the argument.
use_mit_license("Mark Scheuerell")
✔ Setting License field in DESCRIPTION to 'MIT + file LICENSE'
✔ Writing 'LICENSE'
✔ Writing 'LICENSE.md'
✔ Adding '^LICENSE\\.md$' to '.Rbuildignore'
Go ahead and open the newly created LICENSE file and confirm it has the current year and your name.
YEAR: 2021
COPYRIGHT HOLDER: Mark Scheuerell
use_mit_license() also puts a copy of the full license in LICENSE.md and adds this file to .Rbuildignore.
If you’re like me, you often rely on a function’s documentation to help understand the arguments to a function and the value(s) that the function returns. This is typically done via a question mark preceding the function name (e.g., ?print). Writing this documentation used to be a bit onerous, but now we can easily add documentation to our package using the roxygen2 package.
To do so, we’ll add a special form of comment above our function definition in cats.R. Each of these comment lines begins with #'. Go ahead and open cats.R and then, from your RStudio menu, select Code > Insert Roxygen Skeleton. Your cats.R file should now look like this:
#' Title
#'
#' @param love
#'
#' @return
#' @export
#'
#' @examples
cats <- function(love = TRUE) {
  if (love == TRUE) {
    msg <- "I love cats!"
  } else {
    msg <- "I am not a cat person."
  }
  return(print(msg))
}
Let’s go over the different elements of the function’s documentation.
Title: a short descriptive phrase of what the function does
@param: one or more arguments to the function (here there is only one: love) and its/their description(s)
@return: a description of what the function returns
@export: tells roxygen2 to add this function to the NAMESPACE file so it’s accessible to users
@examples: one or more examples of how the function is used
Go ahead and add a title to the cats() function. Here’s one possibility:
#' Expresses your opinion about cats
Our roxygen skeleton already contains the one parameter in our function, but we should add a description of the argument. It’s often a good idea to also state the default value, if there is one.
#' @param love A logical argument indicating whether or not you love cats (default = `TRUE`)
It’s good practice to use @return to tell the user what to expect from a function. This is admittedly a rather simple function, which is easy to decipher, but other functions will not be nearly as transparent.
#' @return One of two possible character strings (`"I love cats!"` or `"I am not a cat person."`).
The @examples element is optional. If you include @examples without any actual code, however, you will get a warning that @examples requires a value. In this case, we can add a really simple example:
#' @examples cats(TRUE)
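Putting the pieces together, your finished cats.R should look something like the following. Here the @examples code is placed on its own line, which roxygen2 also accepts. Because the roxygen lines are ordinary R comments, the file still runs as plain R.

```r
#' Expresses your opinion about cats
#'
#' @param love A logical argument indicating whether or not you love cats (default = `TRUE`)
#'
#' @return One of two possible character strings (`"I love cats!"` or `"I am not a cat person."`).
#' @export
#'
#' @examples
#' cats(TRUE)
cats <- function(love = TRUE) {
  if (love == TRUE) {
    msg <- "I love cats!"
  } else {
    msg <- "I am not a cat person."
  }
  return(print(msg))
}

cats(TRUE)
```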
Now that we’ve written the function’s documentation, we need to pass it along to the cats.Rd file in the help manual. To do so, we can use the document() function.
document()
Updating pets documentation
ℹ Loading pets
Writing NAMESPACE
Writing NAMESPACE
Writing cats.Rd
We can check that everything worked as planned by previewing our help file.
?cats
You should also notice that there is a new folder in our project directory called /man/, which stands for “manual”. This is where the help files live. Go ahead and open cats.Rd, which should look like this:
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/cats.R
\name{cats}
\alias{cats}
\title{Expresses your opinion about cats}
\usage{
cats(love = TRUE)
}
\arguments{
\item{love}{A logical argument indicating whether or not you love cats (default = \code{TRUE})}
}
\value{
One of two possible character strings (\code{"I love cats!"} or \code{"I am not a cat person."}).
}
\description{
Expresses your opinion about cats
}
\examples{
cats(TRUE)
}
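If you’re curious how R turns this file into the help page, the tools package can parse and render Rd files directly. The sketch below writes the same Rd content to a temporary file, minus the two header comments. It then lists the file's top-level sections with tools::parse_Rd() and renders a plain-text version with tools::Rd2txt(). This is only a preview; ?cats remains the normal way to view the help page.

```r
rd_lines <- c(
  "\\name{cats}",
  "\\alias{cats}",
  "\\title{Expresses your opinion about cats}",
  "\\usage{",
  "cats(love = TRUE)",
  "}",
  "\\arguments{",
  "\\item{love}{A logical argument indicating whether or not you love cats (default = \\code{TRUE})}",
  "}",
  "\\value{",
  "One of two possible character strings (\\code{\"I love cats!\"} or \\code{\"I am not a cat person.\"}).",
  "}",
  "\\description{",
  "Expresses your opinion about cats",
  "}",
  "\\examples{",
  "cats(TRUE)",
  "}"
)
rd_file <- tempfile(fileext = ".Rd")
writeLines(rd_lines, rd_file)

# Each top-level element of the parsed Rd object carries an "Rd_tag" attribute
rd <- tools::parse_Rd(rd_file)
sections <- vapply(rd, function(el) attr(el, "Rd_tag"), character(1))
sections[sections != "TEXT"]

# Render the help page as plain text in the console
tools::Rd2txt(rd)
```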
NAMESPACE
In addition to converting the special comments for cats() into man/cats.Rd, the call to document() also updates the NAMESPACE file, based on the @export line found in the roxygen comments. Go ahead and open up your NAMESPACE file and verify that its contents look like this:
# Generated by roxygen2: do not edit by hand
export(cats)
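Base R can read a NAMESPACE file as well. The sketch below assumes a throwaway library folder in a temporary directory that contains only pets/NAMESPACE. It passes that folder to parseNamespaceFile(), whose exports element lists what the package makes available to users.

```r
# A throwaway "library" containing just pets/NAMESPACE
lib_dir <- file.path(tempdir(), "sketch_lib")
dir.create(file.path(lib_dir, "pets"), recursive = TRUE, showWarnings = FALSE)
writeLines(c("# Generated by roxygen2: do not edit by hand", "", "export(cats)"),
           file.path(lib_dir, "pets", "NAMESPACE"))

# Parse the directives; export(cats) ends up in the exports element
ns <- parseNamespaceFile("pets", package.lib = lib_dir)
ns$exports
```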
Now is a good time to run our diagnostic checks. Go ahead and execute check(), which should return no errors or warnings.
check()
── R CMD check results ──────────────────────────────────────────────────── pets 0.0.0.9000 ────
Duration: 6.8s
0 errors ✓ | 0 warnings ✓ | 0 notes ✓
Now that we’ve verified that our package builds correctly, we can install our package and try it out with install().
install()
✓ checking for file ‘/Users/scheuerl/Documents/GitHub/FISH497/pets/DESCRIPTION’ ...
─ preparing ‘pets’:
✓ checking DESCRIPTION meta-information ...
─ checking for LF line-endings in source and make files and shell scripts
─ checking for empty or unneeded directories
─ building ‘pets_0.0.0.9000.tar.gz’
Running /Library/Frameworks/R.framework/Resources/bin/R CMD INSTALL \
/var/folders/t4/nzmg35y56kx8jlvpd38xmvt00000gn/T//RtmpHWEk8a/pets_0.0.0.9000.tar.gz --install-tests
* installing to library ‘/Users/scheuerl/Rlibs’
* installing *source* package ‘pets’ ...
** using staged installation
** R
** byte-compile and prepare package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded from temporary location
** testing if installed package can be loaded from final location
** testing if installed package keeps a record of temporary installation path
* DONE (pets)
At this point, you can run library(pets) to load the package and use the cats() function.
library(pets)
Note that you can also install your package via the Build pane in RStudio.