narrateR

This project is under testing and not yet formally released. Feedback welcome!

R does the statistics. AI does the communication.

narrateR generates plain-language narratives from statistical objects, model output, ggplot figures, and datasets, via any LLM that the ellmer package supports.


Philosophy

Statistical analysis requires human judgment. narrateR is not here to do your analysis for you. It is here to translate results you have already produced and validated into language appropriate for your audience.

Speculation is off by default. When speculate = FALSE (the default), the LLM is constrained to interpret only what is present in the output and to flag ambiguity rather than resolve it by guessing.


Prerequisites

narrateR uses ellmer to communicate with LLMs. You need an ellmer-compatible LLM connection; the most common options are:

| Provider | Cost | Notes |
| --- | --- | --- |
| Anthropic Claude | Pay per token | Developer account at console.anthropic.com (separate from Claude.ai). Set `ANTHROPIC_API_KEY`. |
| OpenAI | Pay per token | Account at platform.openai.com. Set `OPENAI_API_KEY`. |
| Groq | Free tier available | Account at console.groq.com. Runs open-source models fast. Set `GROQ_API_KEY`. Good starting point. |
| Local Ollama | Free | Install Ollama, pull a model. No account, no key, no data leaves your machine. |
| Google Gemini | Free tier (Flash models only) | Free tier is restricted to Flash models, and prompts may be used to train Google models; a paid tier is required for privacy. Set `GOOGLE_API_KEY`. |
| Azure OpenAI | Institutional billing | Many universities have existing agreements; check with IT before setting up a personal account. |

For any paid provider, API access typically requires a developer account separate from any consumer subscription you may already have.
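ellmer reads API keys from environment variables. One minimal way to set them (a sketch; the key values are placeholders) is either persistently in your `.Renviron` file or, for a quick session-only test, with `Sys.setenv()`:

```r
# Persistent: add a line like this to ~/.Renviron, then restart R.
# usethis::edit_r_environ() opens the file for you.
#   GROQ_API_KEY=your-key-here

# Session-only: set the variable for the current R session
Sys.setenv(GROQ_API_KEY = "your-key-here")
```

Prefer `.Renviron` for real work so keys never appear in scripts you might share.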

Note that output quality is directly tied to the capability of the model you choose.

Installation

install.packages("ellmer")   # provider-agnostic LLM interface

# install.packages("devtools")
devtools::install_github("sth1402/narrateR")

Getting started

Create an ellmer chat object for your provider, then pass it to initiate_narrateR(). This is the only setup step — call it once at the top of your script or Rmd setup chunk.

library(ellmer)
library(narrateR)

### Choose your provider

# Anthropic Claude (developer account required)
initiate_narrateR(chat_anthropic())

# OpenAI (developer account required)
initiate_narrateR(chat_openai())

# Local Ollama — completely free, no account needed
initiate_narrateR(chat_ollama(model = "llama3"))

# Google Gemini
initiate_narrateR(chat_google_gemini())

# Groq
initiate_narrateR(chat_groq())

Model selection, token limits, API keys, and all provider configuration are handled by ellmer, not by narrateR. See the ellmer documentation for the full list of providers and options.

# Pin to a specific model for reproducibility
initiate_narrateR(chat_anthropic(model = "claude-sonnet-4-20250514"))

# Increase token limit for large chronicle() output
initiate_narrateR(chat_anthropic(params = params(max_tokens = 2048)))

Functions

All functions share a common argument structure:

| Argument | Default | Description |
| --- | --- | --- |
| `audience` | `"researcher"` | See audience levels below |
| `speculate` | `FALSE` | If `FALSE`, interpret only what is explicitly present |
| `context` | `NULL` | Background about the study or data; provide this whenever you can |
| `hint` | `NULL` | A supplementary nudge to the narrator; use this for your most critical behavioral constraints |
| `quiet` | `FALSE` | If `TRUE`, suppress printing and return the string visibly |
| `preview` | `FALSE` | If `TRUE`, return the prompt with an estimate of token usage; the agent is not called |
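For example, `preview = TRUE` lets you inspect the prompt and estimated token usage before spending any tokens, and `hint` carries a behavioral constraint into the real call. A sketch using only the documented arguments (the exact preview format may differ):

```r
fit <- lm(mpg ~ wt, data = mtcars)

# Inspect the prompt and token estimate; the agent is not called
narrate(fit, audience = "lay", preview = TRUE)

# Then run for real, with a behavioral constraint via hint
narrate(fit, audience = "lay",
        hint = "do not mention p-values")
```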

Audience levels

| Level | Description |
| --- | --- |
| `"eli5"` | Explain everything; show work on every conclusion; use analogies. For someone who has never seen model output. Best used interactively, as token usage can be high. |
| `"lay"` | Plain language, no jargon |
| `"student"` | Upper-division undergraduate level |
| `"researcher"` | Peer-reviewed methods section (default) |
| `"executive"` | One paragraph, practical implications only |
| `"domain_expert"` | Deep domain knowledge, basic statistical literacy |

narrate() — statistical model output

fit <- lm(mpg ~ wt + hp + cyl, data = mtcars)

# Researcher narrative
narrate(fit, audience = "researcher",
        context = "Motor Trend 1974 car data, n = 32 vehicles")

# eli5 — explain everything, show work on each conclusion
narrate(fit, audience = "eli5",
        context = "Predicting fuel efficiency from car characteristics")

# Capture as string for document writing
para <- narrate(fit, audience = "researcher", quiet = TRUE)
writeLines(para, "results_section.txt")

Works with any model object whose summary() method produces readable output: lm, glm, lme4::lmer, survival::coxph, mgcv::gam, rpart::rpart, stats::aov, stats::prcomp, and more.

For model types where the class name alone is ambiguous:

narrate(fit_b, generator = "brms::brm", audience = "researcher")

caption() — ggplot figure captions

Extracts structural information from a ggplot object via as.list() and generates a figure caption. No image rendering. No vision model.

library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point() +
  geom_smooth(method = "lm") +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon", colour = "Cylinders")

caption(p, audience = "researcher",
        context = "1974 Motor Trend road test data, n = 32 vehicles")

On color mappings: explicitly specified colors (via scale_colour_manual()) are visible in the ggplot object and will be referenced in the caption. Default palette assignments are computed at render time and are not available; the caption will describe the mapping variable but not the specific colors.

Use quiet = TRUE to pass directly to a knitr chunk option:

```{r my-plot, fig.cap=caption(p, context = "...", quiet = TRUE)}
p
```

Note: to streamline the prompt, only specific components of the ggplot object are passed. Objects that extend ggplot and store information in non-standard fields will be under-represented in what the agent sees. This is a work in progress.


chronicle() — data dictionary narrative

# Prose narrative
chronicle(mydata,
          context  = "Patient records from a cardiology clinic.",
          audience = "researcher")

# Markdown table — one row per variable, consistent column structure
library(skimr)
chronicle(skim(mydata),
          context  = "Patient records from a cardiology clinic.",
          audience = "researcher",
          table    = TRUE)

# Flag potential data quality issues
chronicle(mydata,
          speculate = TRUE,
          hint      = "flag variables with more than 10 percent missingness")

# Capture and write as a data dictionary file
dict <- chronicle(skim(mydata), context = "...", table = TRUE, quiet = TRUE)
writeLines(dict, "data_dictionary.md")

Pass skimr::skim(df) rather than df directly for richer narrative; skim() provides quantiles, histograms, and detailed missingness that give the LLM more signal to work with.

speculate in chronicle() is narrower than in narrate(). When TRUE, the LLM may flag directly observable data quality considerations (unexpected missingness, distributional anomalies, implausible values) as observations worth investigating, not analytical directives.


Output modes

Default (quiet = FALSE) prints via cat(), returns invisibly. Use in Rmd chunks with results='asis':

```{r results='asis'}
narrate(fit, audience = "executive")
```

quiet = TRUE — suppresses printing, returns the string visibly:

# Document writing
para <- narrate(fit, audience = "researcher", quiet = TRUE)
writeLines(para, "grant_aim1_results.txt")

# Combine multiple narratives
full_results <- paste(
  narrate(fit1, quiet = TRUE),
  narrate(fit2, quiet = TRUE),
  sep = "\n\n"
)

# Inline Rmd — no results='asis' needed
# The model found that `r narrate(fit, audience = "lay", quiet = TRUE)`.
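Because `quiet = TRUE` returns a plain character string, batching over several models is straightforward, for instance with `lapply()`. A sketch (the model names and output file are illustrative):

```r
fits <- list(
  weight_only = lm(mpg ~ wt, data = mtcars),
  full        = lm(mpg ~ wt + hp + cyl, data = mtcars)
)

# One researcher-level paragraph per model, collected silently
paragraphs <- lapply(fits, narrate, audience = "researcher", quiet = TRUE)

# Join with blank lines and write out
writeLines(paste(unlist(paragraphs), collapse = "\n\n"), "all_results.txt")
```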

Session management

library(ellmer)
library(narrateR)

# Initialise
initiate_narrateR(chat_anthropic())

# Switch provider mid-session
narrateR_reset()
initiate_narrateR(chat_ollama(model = "llama3"))

A note on hallucinations

The risk is real. narrateR tries to mitigate it by:

  1. Passing actual output text to the LLM, not asking it to recall facts
  2. Defaulting to speculate = FALSE, constraining interpretation to what is visible in the output
  3. Using structured, bounded input (summaries, ggplot lists, skim output) that leaves little room for invention
  4. Making context explicit — you provide domain meaning, not the LLM

Review the output before publishing. The analyst is always responsible for the numbers. narrateR is responsible for the words.

Controlling output consistency

LLM output is non-deterministic by default. For more consistent output across runs (useful when knitting documents repeatedly), set temperature = 0 when creating your chat object:

initiate_narrateR(chat_groq(params = params(temperature = 0)))
initiate_narrateR(chat_anthropic(params = params(temperature = 0)))

License

MIT
