Data reporter | Computational linguist | Open source enthusiast
I work at the intersection of social science and journalism to tell stories with the help of data.
- Data reporter at The Examination
- Computational linguistics with a focus on NLP for Spanish
- Building tools and resources for Latin American journalism
- Interested in collaborating on data-driven projects
- Reach me at: feraguirre@riseup.net
Tools
| Repository | Description |
|---|---|
| tidy-data-transformer | Transforms wide-format data into long (tidy) format by unpivoting columns into rows |
| aircraft-data | Search aircraft data by hex code or registration number |
| forgts | A package that extracts formatting from Excel files and applies it to great_tables objects |
| numerical-expressions | Python CLI tool that describes the change between two numerical values |
| oportunidades-periodistas-latam | Website for opportunities for journalists in Latin America |
| miscellaneous-scripts | Personal scripts for automating tasks |
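
The wide-to-long transformation that tidy-data-transformer describes can be sketched in plain Python. This is a minimal illustration and not the repository's code: the function name `unpivot`, its parameters, and the sample figures are all assumptions made up for the example.

```python
def unpivot(rows, id_columns, var_name="variable", value_name="value"):
    """Turn wide rows (a list of dicts) into long rows.

    Hypothetical helper: each non-identifier column becomes its own row,
    carrying the identifier columns along with it.
    """
    long_rows = []
    for row in rows:
        # Identifier columns are repeated on every output row.
        ids = {col: row[col] for col in id_columns}
        for col, value in row.items():
            if col in id_columns:
                continue
            long_rows.append({**ids, var_name: col, value_name: value})
    return long_rows


# Invented sample data: one row per country, one column per year.
wide = [
    {"country": "Argentina", "2022": 10, "2023": 12},
    {"country": "Mexico", "2022": 7, "2023": 9},
]

long_rows = unpivot(wide, id_columns=["country"], var_name="year", value_name="count")
for record in long_rows:
    print(record)
```

Each input row with two year columns yields two output rows, so the long table has one observation per country and year.
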
NLP & Machine Learning
| Repository | Description |
|---|---|
| pmdm | Fine-tuned model for detecting hate speech against women in Spanish/Portuguese |
| attackdetector | Research on hate speech against journalists and activists (Mexico/Brazil) |
| hackathon-somos-nlp-2023 | Fine-tuning LLMs for detecting hate speech categories in Spanish |
| discursos-milei | Scraper and analysis of Javier Milei's speeches |
| ai4foia | Proof-of-concept to recommend recipients for FOIA requests |
| ner-spanish | Named Entity Recognition (NER) extraction for Spanish data |
| topicos-discursos-amlo | Topic modeling analysis of AMLO's speeches |
| bad-bunny | Lyrical analysis of Bad Bunny's songs |
| customized-headlines | Proof-of-concept to create customized headlines from news content based on demographic data |
| explained-recommendations | API for a recommendation system that explains its recommendations using generative AI |
Data Analysis
| Repository | Description |
|---|---|
| travesticidios-argentina | Analysis of court decisions on travesticides in Argentina (2018-2023) |
| elecciones-argentina-2023 | Attacks against journalists on Twitter during 2023 Argentina elections |
| capir-transfronteriza2-2023 | Topic modeling of anti-rights groups (Brazil, Ecuador, Colombia) |
| migrantes-desaparecidos-eeuu | Missing migrants en route to the U.S. |
| violencia-obstetrica-cuba | Obstetric violence in Cuba |
| recomendaciones-escritoras | Recommendation system for Latin American women writers |
| cancilleria-colombia | Data analysis of public servants at Colombia's Ministry of Foreign Affairs |
| gptzero-ai-articles | Analysis of articles about ChatGPT that were written with generative AI models |
| covid19-venezuela | Analysis of COVID-19 deaths in Venezuela |
Data Visualization
| Repository | Description |
|---|---|
| escritoras-latinas | Wikipedia scraping + network visualization of Latin American women writers |
| comision-revision-bolivia | Femicide rate map in Bolivia (2013-2020) |
| ping-pong-caba | Map of public ping pong tables in Buenos Aires |
| wifi-gratuito-cdmx | Map of free public Wi-Fi locations in Mexico City [ARCHIVED] |
| mapa-huertos | Map of urban gardens in Mexico City [ARCHIVED] |
| maps-examples | Map examples using the folium and prettymaps Python modules [ARCHIVED] |
Web Scraping
| Repository | Description |
|---|---|
| pubmed-scraper | Python CLI tool for scraping PubMed based on keyword searches |
| cij-argentina | PDF scraper for Argentina's CIJ website |
| pdf-2-ner | Web app converting scanned PDFs to text + Spanish NER |
| opportunities-db | Scraper for opportunity-related websites (funds, scholarships) |
Project Templates
| Repository | Description |
|---|---|
| cookiecutter-data-journalism | Template for data journalism projects |
| cookiecutter-data-analysis-extensive | Full-featured data analysis template |
| cookiecutter-data-analysis-lite | Beginner-friendly data analysis template |
Learning Resources
| Repository | Description |
|---|---|
| csvconf-nlp | Intro to NLP session at csv,conf,v8 (Puebla, 2024) |
| taller-cookiecutter | Workshop on creating project templates |
| taller-python | Jupyter notebooks for Python basics |
| learn-python | Python scripts organized by topics |
| learn-react-d3 | Data visualization with React + D3.js |
| learn-scrollama | Scrollytelling examples |
| twitter-python | Examples for Twitter data collection with Tweepy in Python [ARCHIVED] |

