This project is in testing and has not yet been formally released. Feedback welcome!
R does the statistics. AI does the communication.
narrateR generates plain-language narratives from statistical objects,
model output, ggplot figures, and datasets, via any LLM that the
ellmer package supports.
Statistical analysis requires human judgment. narrateR is not here to
do your analysis for you. It is here to translate results you have already
produced and validated into language appropriate for your audience.
Speculation is off by default. When speculate = FALSE (the default),
the LLM is constrained to interpret only what is present in the output and
to flag ambiguity rather than resolve it by guessing.
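As a sketch of what this looks like in practice (assuming the `narrate()` and `speculate` arguments described later in this README), the same model can be narrated strictly or with cautious interpretation enabled:

```r
# Sketch only: exact narrative text will vary by model and provider.
fit <- lm(mpg ~ wt, data = mtcars)

# Default: interpret only what the summary output explicitly shows
narrate(fit, speculate = FALSE,
        context = "Motor Trend 1974 car data, n = 32 vehicles")

# Opt in: allow cautious interpretation beyond the printed output
narrate(fit, speculate = TRUE,
        context = "Motor Trend 1974 car data, n = 32 vehicles")
```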
narrateR uses ellmer to communicate with LLMs. You need an ellmer-
compatible LLM connection — the most common options are:
| Provider | Cost | Notes |
|---|---|---|
| Anthropic Claude | Pay per token | Developer account at console.anthropic.com — separate from Claude.ai. Set ANTHROPIC_API_KEY. |
| OpenAI | Pay per token | Account at platform.openai.com. Set OPENAI_API_KEY. |
| Groq | Free tier available | Account at console.groq.com. Runs open-source models fast. Set GROQ_API_KEY. Good starting point. |
| Local Ollama | Free | Install Ollama, pull a model. No account, no key, no data leaves your machine. |
| Google Gemini | Free tier (Flash models only) | Free tier restricted to Flash models; prompts may be used to train Google models. Paid tier required for privacy. Set GOOGLE_API_KEY. |
| Azure OpenAI | Institutional billing | Many universities have existing agreements — check with IT before setting up a personal account. |
For any paid provider, API access likely requires a developer account separate from any consumer subscription you may already have.
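Hosted providers read their key from an environment variable (names shown in the table above). A minimal sketch for setting one in the current session; for persistent setup, put the line in your `.Renviron` file instead:

```r
# Set the key for this R session only. Example uses Groq; substitute
# the variable name for your provider from the table above.
# "your-key-here" is a placeholder, not a real key.
Sys.setenv(GROQ_API_KEY = "your-key-here")

# Confirm the session can see it
Sys.getenv("GROQ_API_KEY")
```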
install.packages("ellmer") # provider-agnostic LLM interface
# install.packages("devtools")
devtools::install_github("sth1402/narrateR")

Create an ellmer chat object for your provider, then pass it to
initiate_narrateR(). This is the only setup step — call it once at the
top of your script or Rmd setup chunk.
library(ellmer)
library(narrateR)
### Choose your provider
# Anthropic Claude (developer account required)
initiate_narrateR(chat_anthropic())
# OpenAI (developer account required)
initiate_narrateR(chat_openai())
# Local Ollama — completely free, no account needed
initiate_narrateR(chat_ollama(model = "llama3"))
# Google Gemini
initiate_narrateR(chat_google_gemini())
# Groq
initiate_narrateR(chat_groq())

Model selection, token limits, API keys, and all provider configuration are handled by ellmer, not by narrateR. See the ellmer documentation for the full list of providers and options.
# Pin to a specific model for reproducibility
initiate_narrateR(chat_anthropic(model = "claude-sonnet-4-20250514"))
# Increase token limit for large chronicle() output
initiate_narrateR(chat_anthropic(params = params(max_tokens = 2048)))

All functions share a common argument structure:
| Argument | Default | Description |
|---|---|---|
| `audience` | `"researcher"` | See audience levels below |
| `speculate` | `FALSE` | If `FALSE`, interpret only what is explicitly present |
| `context` | `NULL` | Background about the study or data; provide this whenever you can |
| `hint` | `NULL` | A supplementary nudge to the narrator; use this for your most critical behavioral constraints |
| `quiet` | `FALSE` | If `TRUE`, suppress printing and return the string visibly |
| `preview` | `FALSE` | If `TRUE`, the prompt is returned with an estimate of token usage; the agent is not called |
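The `preview` and `hint` arguments combine naturally when tuning a prompt before spending tokens. A sketch, assuming a fitted model `fit`; the hint text is purely illustrative:

```r
# Inspect the prompt and estimated token usage; the agent is not called
narrate(fit, audience = "executive", preview = TRUE)

# Once satisfied, add a behavioral constraint via hint and run for real
narrate(fit, audience = "executive",
        hint = "do not report raw coefficients; describe direction and size only")
```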
| Level | Description |
|---|---|
| `"eli5"` | Explain everything; show work on every conclusion; use analogies; for someone who has never seen model output. Best used interactively, as token usage can be high |
| `"lay"` | Plain language, no jargon |
| `"student"` | Upper-division undergraduate level |
| `"researcher"` | Peer-reviewed methods section (default) |
| `"executive"` | One paragraph, practical implications only |
| `"domain_expert"` | Deep domain knowledge, basic statistical literacy |
fit <- lm(mpg ~ wt + hp + cyl, data = mtcars)
# Researcher narrative
narrate(fit, audience = "researcher",
context = "Motor Trend 1974 car data, n = 32 vehicles")
# eli5 — explain everything, show work on each conclusion
narrate(fit, audience = "eli5",
context = "Predicting fuel efficiency from car characteristics")
# Capture as string for document writing
para <- narrate(fit, audience = "researcher", quiet = TRUE)
writeLines(para, "results_section.txt")

Works with any model object whose `summary()` method produces readable
output: lm, glm, lme4::lmer, survival::coxph, mgcv::gam,
rpart::rpart, stats::aov, stats::prcomp, and more.
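For instance, a logistic regression is narrated the same way as a linear model. A sketch; the `am` outcome is chosen purely for illustration:

```r
# glm works because its summary() output is readable text
fit_glm <- glm(am ~ wt + hp, data = mtcars, family = binomial)
narrate(fit_glm, audience = "student",
        context = "Probability of manual transmission, Motor Trend 1974 data")
```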
For model types where the class name alone is ambiguous:
narrate(fit_b, generator = "brms::brm", audience = "researcher")

Extracts structural information from a ggplot object via `as.list()` and
generates a figure caption. No image rendering. No vision model.
library(ggplot2)
p <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
geom_point() +
geom_smooth(method = "lm") +
labs(x = "Weight (1000 lbs)", y = "Miles per gallon", colour = "Cylinders")
caption(p, audience = "researcher",
        context = "1974 Motor Trend road test data, n = 32 vehicles")

On color mappings: explicitly specified colors (via
scale_colour_manual()) are visible in the ggplot object and will be
referenced in the caption. Default palette assignments are computed at
render time and are not available; the caption will describe the mapping
variable but not the specific colors.
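To make specific colors visible to `caption()`, set them explicitly. A sketch extending the plot `p` defined above; the color choices are arbitrary:

```r
# Explicit colors are stored in the ggplot object, so caption() can name them
p2 <- p +
  scale_colour_manual(values = c("4" = "steelblue",
                                 "6" = "darkorange",
                                 "8" = "firebrick"))
caption(p2, audience = "researcher",
        context = "1974 Motor Trend road test data, n = 32 vehicles")
```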
Use quiet = TRUE to pass directly to a knitr chunk option:
```{r my-plot, fig.cap=caption(p, context = "...", quiet = TRUE)}
p
```

Note: to streamline the prompt, only specific components of the ggplot object are passed. Objects that extend ggplot and add information in non-standard fields will be under-represented to the agent. This is a work in progress.
# Prose narrative
chronicle(mydata,
context = "Patient records from a cardiology clinic.",
audience = "researcher")
# Markdown table — one row per variable, consistent column structure
library(skimr)
chronicle(skim(mydata),
context = "Patient records from a cardiology clinic.",
audience = "researcher",
table = TRUE)
# Flag potential data quality issues
chronicle(mydata,
speculate = TRUE,
hint = "flag variables with more than 10 percent missingness")
# Capture and write as a data dictionary file
dict <- chronicle(skim(mydata), context = "...", table = TRUE, quiet = TRUE)
writeLines(dict, "data_dictionary.md")

Pass `skimr::skim(df)` rather than `df` directly for richer narrative;
skim() provides quantiles, histograms, and detailed missingness that
give the LLM more signal to work with.
speculate in chronicle() is narrower than in narrate(). When
TRUE, the LLM may flag directly observable data quality considerations
(unexpected missingness, distributional anomalies, implausible values) as
observations worth investigating, not analytical directives.
Default (quiet = FALSE) prints via cat(), returns invisibly.
Use in Rmd chunks with results='asis':
```{r results='asis'}
narrate(fit, audience = "executive")
```

`quiet = TRUE` suppresses printing and returns the string visibly:
# Document writing
para <- narrate(fit, audience = "researcher", quiet = TRUE)
writeLines(para, "grant_aim1_results.txt")
# Combine multiple narratives
full_results <- paste(
narrate(fit1, quiet = TRUE),
narrate(fit2, quiet = TRUE),
sep = "\n\n"
)
# Inline Rmd — no results='asis' needed
# The model found that `r narrate(fit, audience = "lay", quiet = TRUE)`.

library(ellmer)
library(narrateR)
# Initialise
initiate_narrateR(chat_anthropic())
# Switch provider mid-session
narrateR_reset()
initiate_narrateR(chat_ollama(model = "llama3"))

The risk of hallucination is real. narrateR tries to mitigate it by:
- Passing actual output text to the LLM, not asking it to recall facts
- Defaulting to `speculate = FALSE`, constraining interpretation to what is visible in the output
- Using structured, bounded input (summaries, ggplot lists, skim output) that leaves little room for invention
- Making `context` explicit: you provide domain meaning, not the LLM
Review the output before publishing. The analyst is always responsible
for the numbers. narrateR is responsible for the words.
LLM output is non-deterministic by default. For more consistent output
across runs (useful when knitting documents repeatedly), set
`temperature = 0` when creating your chat object:
initiate_narrateR(chat_groq(params = params(temperature = 0)))
initiate_narrateR(chat_anthropic(params = params(temperature = 0)))

MIT