OpenEditors

Scraping editorial board data from academic publisher websites.

Overview

OpenEditors collects structured data about journal editors (names, roles, affiliations, countries; sometimes also ORCID IDs) from the websites of academic publishers. It matches raw affiliation data against the ROR API to obtain harmonized information about affiliations and country names. It also tracks changes over time, i.e., recording when editors are added or removed.

(Note that to avoid false positives caused by scraping failures or infinite-scroll timeouts, a confirmation window is used: an editor is only recorded as added or removed after the same outcome is observed across three consecutive scrape runs.)

The resulting dataset is publicly accessible via a dashboard at openeditors.ooir.org, and a snapshot from March 2026 can be downloaded as a CSV file at Zenodo: https://doi.org/10.5281/zenodo.19108866. Former versions are archived on Zenodo: https://doi.org/10.5281/zenodo.4619374 (versions from 2021 and 2022).

Citation

Andreas Nishikawa-Pacher, Tamara Heck, Kerstin Schoch, Open Editors: A dataset of scholarly journals' editorial board positions, Research Evaluation, Volume 32, Issue 2, April 2023, Pages 228–243. https://doi.org/10.1093/reseval/rvac037

Repository Structure

.
├── 00_create_tables.py                   # Creates all required MySQL tables
├── 01_scrape_journals.py                 # Scrapes journal catalogues per publisher
├── 02_scrape_editors.py                  # Scrapes editorial boards & tracks changes
├── 03_postprocessing_relocatecountry.py  # Extracts country from affiliation string
├── 04_postprocessing_inferror.py         # Infers ROR IDs via the ROR affiliation API
├── 05_postprocessing_countrynames.py     # Harmonises country name variants
├── fetch_browser.py                      # Playwright-based browser fetcher (human-like)
├── parsers/
│   ├── __init__.py                       # Re-exports all strategies & run_* functions
│   ├── sage.py
│   ├── frontiers.py
│   ├── elife.py
│   └── ...                               # One file per publisher
└── publisher_configs/
    ├── sage.yml
    ├── frontiers.yml
    └── ...                               # One YAML file per publisher

Pipeline Steps

`00_create_tables.py`

Creates the MySQL schema, including:

publishers — publisher names
journals — journal catalogue (title, URL, publisher key)
editors — editor records with timestamps (first_seen_at, last_seen_at, removed_at)
scrape_runs — audit log of every scrape run
pending_changes — staging table for the "confirmation window"

`01_scrape_journals.py`

Scrapes the journal catalogue for a given publisher and populates the journals table. Pagination, field selectors, and URL patterns are all integrated in the publisher's YAML config.

python 01_scrape_journals.py --publisher sage

`02_scrape_editors.py`

The main scraping script. For each journal, it fetches the editorial board page, parses the HTML, and reconciles the result against the current database state.

python 02_scrape_editors.py --publisher frontiers
python 02_scrape_editors.py --publisher sage --limit 50 --startingjournal 10
python 02_scrape_editors.py --publisher imedpub --debug

Options:

Flag	Description
`--publisher`	Publisher key (required); must match a file in `publisher_configs/`
`--limit`	Maximum number of journals to process
`--startingjournal`	1-based index to start from (useful for resuming)
`--debug`	Enables debug logging and saves screenshots when no editors are found

Change tracking & confirmation window

Editors are not immediately marked as added or removed on a single scrape. Instead, candidate changes are written to a pending_changes staging table and only committed to editors after appearing consistently in three consecutive runs for the same journal. This prevents transient network failures or incomplete infinite-scroll loads from generating spurious churn in the data.

The threshold is configurable per publisher via the YAML:

editors:
  confirmation_streak: 3   # default; override per publisher if needed

`03_postprocessing_relocatecountry.py`

For editors where country IS NULL, extracts the substring after the last comma in the affiliation field and writes it to country. (Targets the following publishers: APA, CUP, iMedPub, Longdom, SAGE, SCIRP, SciTechnol.)

python 03_postprocessing_relocatecountry.py

`04_postprocessing_inferror.py`

Calls the ROR affiliation API on raw affiliation strings to infer ROR IDs and canonical institution names.

python 04_postprocessing_inferror.py

`05_postprocessing_countrynames.py`

Harmonises country name variants (e.g. "USA" → "United States", "UK" → "United Kingdom") to a consistent controlled vocabulary.

python 05_postprocessing_countrynames.py

Configuration

Each publisher has a YAML file in publisher_configs/ and a Python parser in parsers/. The YAML controls journal catalogue scraping, editorial board URL patterns, HTML selectors, fetch mode, rate limiting, and pagination strategy. The parser contains the publisher-specific HTML parsing logic.

Example — publisher_configs/sage.yml (excerpt):

publisher:
  key: sage
  name: SAGE Journals

editors:
  url_template: "https://journals.sagepub.com/editorial-board/{journal_slug}"
  parsing:
    strategy: sage

fetch:
  mode: browser
  rate_limit:
    min_seconds_per_request: 5

Example — parsers/sage.py (excerpt):

def parse_sage(container_html: str) -> list[dict]:
    # returns a list of dicts with keys:
    # full_name, role, affiliation, country, raw_text
    ...

To add a new publisher, create both files and register the parser in parsers/__init__.py.

Setup

Requirements

Python 3.11+
MySQL 8+
Playwright (for browser-mode fetching)

Environment variables

Create a .env file in the project root:

DB_NAME=openeditors
DB_HOST=localhost
DB_PORT=3306
DB_USER=your_db_user
DB_PASSWORD=your_db_password
ROR_CLIENT_ID=your_ror_api_client_id

Database initialisation

python 00_create_tables.py

Supported Publishers

The following publishers are currently supported. Each has a corresponding YAML config and parser:

Allied Academies (probably predatory)
APA
ASCE
CUP (Cambridge University Press)
eLife
Emerald
Frontiers
iMedPub (probably predatory)
Inderscience
Karger
Longdom (probably predatory)
OMICS (probably predatory)
PeerJ
Pleiades
PLOS
RSC
SAGE
SCIRP (probably predatory)
SciTechnol (probably predatory)
Springer Nature

License

MIT

Disclosure

Claude (Sonnet 4.6) helped a lot in writing the code and in drafting this README file.

Contact

Andreas Nishikawa-Pacher · andreas.pacher@da-vienna.at

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

OpenEditors

Overview

Citation

Repository Structure

Pipeline Steps

`00_create_tables.py`

`01_scrape_journals.py`

`02_scrape_editors.py`

Change tracking & confirmation window

`03_postprocessing_relocatecountry.py`

`04_postprocessing_inferror.py`

`05_postprocessing_countrynames.py`

Configuration

Setup

Requirements

Environment variables

Database initialisation

Supported Publishers

License

Disclosure

Contact

About

Uh oh!

Releases 3

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 33 Commits
parsers		parsers
publisher_configs		publisher_configs
.gitignore		.gitignore
00_create_tables.py		00_create_tables.py
01_scrape_journals.py		01_scrape_journals.py
02_scrape_editors.py		02_scrape_editors.py
03_postprocessing_relocatecountry.py		03_postprocessing_relocatecountry.py
04_postprocessing_inferror.py		04_postprocessing_inferror.py
05_postprocessing_countrynames.py		05_postprocessing_countrynames.py
06_sync_mysqltables.py		06_sync_mysqltables.py
LICENSE		LICENSE
README.md		README.md
fetch_browser.py		fetch_browser.py

Folders and files

Latest commit

History

Repository files navigation

OpenEditors

Overview

Citation

Repository Structure

Pipeline Steps

00_create_tables.py

01_scrape_journals.py

02_scrape_editors.py

Change tracking & confirmation window

03_postprocessing_relocatecountry.py

04_postprocessing_inferror.py

05_postprocessing_countrynames.py

Configuration

Setup

Requirements

Environment variables

Database initialisation

Supported Publishers

License

Disclosure

Contact

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 3

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

`00_create_tables.py`

`01_scrape_journals.py`

`02_scrape_editors.py`

`03_postprocessing_relocatecountry.py`

`04_postprocessing_inferror.py`

`05_postprocessing_countrynames.py`

Packages