This repository was archived by the owner on Mar 17, 2025. It is now read-only.

aramshiva/babies


Warning

As of March 16th, 2025, this repo is no longer maintained. It has been merged into the sql folder of the names repo.

Note

This does not include any Social Security numbers. The only data stored is the name, frequency, sex, and year of birth. This is public data provided by the Social Security Administration.

Babies

A parser for every name listed on a Social Security card between 1880 and 2023.

(Tabulated based on Social Security records as of March 3, 2024)

Your first question is probably "why?" To that I ask: why not?

This data is pulled from the US Social Security Administration's Baby Names from Social Security Card Applications - National Dataset. This script will insert the data into a MySQL database with the following schema:

name VARCHAR(255),
sex CHAR(1),
amount INT,
year INT
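
As a rough illustration of how this schema might be used, here is a minimal sketch. It assumes a table named `names` (a hypothetical name; only the four columns are specified above) and uses Python's built-in sqlite3 module as a stand-in for MySQL so it runs without a server. The rows are made up for illustration and are not real SSA counts.

```python
import sqlite3

# Hypothetical table name; only the four columns come from the schema above.
CREATE_TABLE = """
CREATE TABLE IF NOT EXISTS names (
    name VARCHAR(255),
    sex CHAR(1),
    amount INT,
    year INT
)
"""


def make_database(rows):
    """Create an in-memory database with the schema and insert the given rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(CREATE_TABLE)
    conn.executemany(
        "INSERT INTO names (name, sex, amount, year) VALUES (?, ?, ?, ?)",
        rows,
    )
    conn.commit()
    return conn


# Illustrative rows only, not real SSA data.
conn = make_database([("Mary", "F", 100, 1880), ("John", "M", 90, 1880)])
total = conn.execute(
    "SELECT SUM(amount) FROM names WHERE year = 1880"
).fetchone()[0]
```

With a MySQL driver such as PyMySQL or mysql-connector-python, the same statements should work with `%s` placeholders in place of `?`.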

Some things to keep in mind:

  • As of 2024 there are around 2,117,219 rows in the database.
  • The data is stored in a folder called "names" in the same directory as this script.
  • Names with fewer than 5 occurrences for a given sex and year are left out of the data by the SSA to protect privacy.
  • The sex is a single character, either "M" or "F" for Male or Female.
  • The year is the year the person was born, NOT registered.
  • The raw data is a folder. For each year of birth YYYY after 1879, the SSA provides a comma-delimited file called yobYYYY.txt. Each record in the annual files has the format "name,sex,number", where name is 2 to 15 characters, sex is M (male) or F (female), and number is the number of occurrences of the name. Each file is sorted first on sex and then on number of occurrences in descending order. When there is a tie on the number of occurrences, names are listed in alphabetical order. This sorting makes it easy to determine a name's rank: the first record for each sex has rank 1, the second record for each sex has rank 2, and so forth.
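
The file format described above can be sketched as a small parser. This is an illustration under stated assumptions, not the code in main.py: the `Record` type and the `year_from_filename` and `parse_year_file` helpers are hypothetical names, and the sample lines are made up. The rank is derived purely from each record's position within its sex, as the sorting rule describes.

```python
import re
from typing import Iterable, List, NamedTuple


class Record(NamedTuple):
    """One parsed row; hypothetical type, not taken from main.py."""
    name: str
    sex: str
    amount: int
    year: int
    rank: int


def year_from_filename(filename: str) -> int:
    """Extract YYYY from a file name of the form yobYYYY.txt."""
    match = re.fullmatch(r"yob(\d{4})\.txt", filename)
    if match is None:
        raise ValueError(f"unexpected file name: {filename!r}")
    return int(match.group(1))


def parse_year_file(filename: str, lines: Iterable[str]) -> List[Record]:
    """Parse "name,sex,number" lines, ranking each record within its sex."""
    year = year_from_filename(filename)
    next_rank = {}  # sex -> rank of the most recent record seen for that sex
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        name, sex, amount = line.split(",")
        # Files are pre-sorted, so position within each sex is the rank.
        next_rank[sex] = next_rank.get(sex, 0) + 1
        records.append(Record(name, sex, int(amount), year, next_rank[sex]))
    return records


# Made-up sample lines in the documented order (sex, then count descending).
sample = ["Mary,F,30", "Anna,F,20", "John,M,25"]
records = parse_year_file("yob1880.txt", sample)
```

In practice the lines would come from opening each file in the "names" folder, with the year taken from the file name as shown.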

Want to run it yourself?

  • Fill in the .env (use .env.example as a guide)
  • Run python3 main.py
  • Boom! Your MySQL database now has a table full of data, with 4 columns: name, sex, amount, year
