gutenbergr

Search, download, and process public domain texts from the Project Gutenberg collection.

Installation

Install the released version from CRAN:

install.packages("gutenbergr")

Install the development version from GitHub:

# install.packages("pak")
pak::pak("ropensci/gutenbergr")

Quick Start

Load the package and any other required libraries:

library(gutenbergr)
library(dplyr)

We’ll get and set our Project Gutenberg mirror:

gutenberg_get_mirror()

#> [1] "https://gutenberg.pglaf.org"

Search through the metadata to find Jane Austen’s Persuasion:

gutenberg_works(title == "Persuasion")

#> # A tibble: 1 × 8
#>   gutenberg_id title      author       gutenberg_author_id language
#>          <int> <chr>      <chr>                      <int> <fct>   
#> 1          105 Persuasion Austen, Jane                  68 en      
#>   gutenberg_bookshelf                           rights                    has_text
#>   <chr>                                         <fct>                     <lgl>   
#> 1 Category: Novels/Category: British Literature Public domain in the USA. TRUE
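
The same kind of filter works on other metadata columns. As a minimal sketch, assuming the author is stored in the "Last, First" form shown in the output above, this looks up Jane Austen’s works and keeps only the columns needed to pick an ID:

```r
library(gutenbergr)
library(dplyr)

# Filter the bundled metadata by author instead of title.
# The "Austen, Jane" spelling is taken from the author column shown above.
austen <- gutenberg_works(author == "Austen, Jane")

# Keep only the columns needed to choose an ID for gutenberg_download().
austen |> select(gutenberg_id, title)
```

This search uses the metadata shipped with the package, so it should not need a network connection.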

Persuasion’s gutenberg_id is 105. We’ll use this ID to download the book, first setting the cache option to "persistent" so that we don’t have to re-download it later.

options(gutenbergr_cache_type = "persistent")
persuasion <- gutenberg_download(105)

persuasion

#> # A tibble: 8,357 × 2
#>    gutenberg_id text            
#>           <int> <chr>           
#>  1          105 "Persuasion"    
#>  2          105 ""              
#>  3          105 ""              
#>  4          105 "by Jane Austen"
#>  5          105 ""              
#>  6          105 "(1818)"        
#>  7          105 ""              
#>  8          105 ""              
#>  9          105 ""              
#> 10          105 ""              
#> # ℹ 8,347 more rows

Multiple works can be downloaded at once. We’ll also download Edna St. Vincent Millay’s Renascence and Other Poems (gutenberg_id 161) and throw in title data from the metadata.

books <- gutenberg_download(c(105, 161), meta_fields = "title")

books |> count(title)

#> # A tibble: 2 × 2
#>   title                           n
#>   <chr>                       <int>
#> 1 Persuasion                   8357
#> 2 Renascence, and Other Poems  1222
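
Downloaded text arrives one line per row, blank lines included, as the Persuasion output above shows. A common first step is to drop the blank rows and number what remains. The sketch below uses a small hand-built tibble (sample_text, a hypothetical stand-in that copies the first rows of that output) so that it runs without a download; the same pipeline should apply to persuasion or books.

```r
library(dplyr)

# Toy stand-in for the first rows returned by gutenberg_download(105).
# sample_text is a hypothetical name used only for this illustration.
sample_text <- tibble(
  gutenberg_id = 105L,
  text = c("Persuasion", "", "", "by Jane Austen", "", "(1818)")
)

# Drop blank lines, then number the remaining lines within each work.
non_blank <- sample_text |>
  filter(text != "") |>
  group_by(gutenberg_id) |>
  mutate(line = row_number()) |>
  ungroup()

non_blank
```

Grouping by gutenberg_id keeps the line numbers separate for each work when several books are downloaded together.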

Vignettes

See the following vignettes for more advanced usage of gutenbergr.

Getting Started with gutenbergr - explore metadata and download books
Text Mining with gutenbergr and tidytext - complete analysis workflow with tidytext

FAQ

How were the metadata files generated?

See the data-raw directory for scripts. Metadata was generated from the Project Gutenberg catalog on 25 June 2026.

Do you respect robot access rules?

Yes! The package follows Project Gutenberg’s rules:

Retrieves books directly from mirrors using the authorized link format
Prioritizes .zip files to minimize bandwidth
Supports session and persistent caching

This package is designed for downloading individual works or small collections, not the entire corpus. For bulk downloads, set up a mirror.

See their Terms of Use for details.

Contributing

See CONTRIBUTING.md.

Note that this package is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.