Single-Cell Protein-mRNA Modeling

Statistical modeling of the relationship between mRNA and protein abundances at single-cell resolution, using negative binomial (NB) distributions for mRNA counts and a log-normal noise model for protein levels.

Overview

This project models protein abundance from mRNA counts in single cells by:

Fitting negative binomial (NB) distributions to mRNA count data
Simulating protein levels using a log-normal multiplicative noise model
Jointly optimizing mRNA and protein parameters to match observed distributions

The core model assumes:

P = (c * m + ε) * LogNormal(0, σ)

where m is mRNA count (NB distributed), c is a scaling factor, ε prevents log(0), and σ captures biological/technical noise.
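To make the model concrete, here is a minimal simulation sketch. It assumes the NB is parameterized by a mean `mu` and dispersion `r`, and that σ is the standard deviation on the log scale. The function name `simulate_protein` and all parameter values are illustrative, not taken from the notebook.

```python
import numpy as np

def simulate_protein(mu, r, c, sigma, eps=1e-3, n_cells=1000, seed=0):
    """Draw mRNA counts from NB(mean=mu, dispersion=r), then apply the noise model.

    Hypothetical helper for illustration; not the notebook's implementation.
    """
    rng = np.random.default_rng(seed)
    # NumPy parameterizes the NB by (n, p); n = r and p = r / (r + mu) gives mean mu.
    m = rng.negative_binomial(r, r / (r + mu), size=n_cells)
    # Multiplicative log-normal noise: exp(Normal(0, sigma)), sigma assumed to be a log-scale SD.
    noise = rng.lognormal(mean=0.0, sigma=sigma, size=n_cells)
    # P = (c * m + eps) * LogNormal(0, sigma); eps keeps P > 0 when m == 0.
    p = (c * m + eps) * noise
    return m, p

m, p = simulate_protein(mu=5.0, r=2.0, c=10.0, sigma=0.5)
print("mean mRNA:", m.mean(), "median protein:", np.median(p))
```

Because ε is strictly positive, every simulated protein value is positive, so taking logs downstream is safe even for cells with zero mRNA counts.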

Repository Structure

├── models/
│   ├── APPROACHES.md           # Modeling approaches and equations
│   └── Model_V1.ipynb          # Main analysis notebook (Google Colab)
├── preprocessing/
│   └── preproc_and_filter.R    # R preprocessing script for protein and mRNA data
├── dat/                        # Local data directory (optional)
└── requirements.txt            # Python dependencies

Data Access

Raw sequencing data is available on GEO: GSE244215

Processed data for modeling is stored on Google Drive: Data Folder

Google Drive Structure

All data are from a single cell type, basal cells (see preprocessing/preproc_and_filter.R).

Drive/
└── Protein_RNA_Modeling/
    ├── raw/           # mRNA counts, raw protein data, and metadata for both
    ├── processed/     # Cleaned/filtered protein data for our modeling purposes
    ├── results/       # Model outputs and figures
    └── notebooks/     # Working copies of Colab notebooks

Notebook Links

GitHub source links (used by the Colab badge) will break if files are moved/renamed in the repo — update the badge URL if you reorganize.
Google Drive links are based on file ID, not path — notebooks can be moved freely within Drive without breaking share links.

Data Loading (Built into Notebook)

The notebook includes a data loading cell that mounts Google Drive and loads the data. Just update the DATA_PATH variable to point to your data folder.
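A data loading cell of this kind might look like the sketch below. `google.colab.drive.mount` is the standard Colab call. The helper name `resolve_data_path`, the `Protein_RNA_Modeling/processed` subfolder under `MyDrive`, and the local `dat` fallback are assumptions based on the directory layouts described in this README, so adjust them to wherever your copy of the data lives.

```python
import os

def resolve_data_path(drive_subdir="Protein_RNA_Modeling/processed", local_dir="dat"):
    """Return the Drive data folder when running in Colab, else a local fallback.

    Hypothetical helper; the notebook's own loading cell may differ.
    """
    try:
        from google.colab import drive  # only importable inside Colab
    except ImportError:
        # Not running in Colab: fall back to the repo's local data directory.
        return local_dir
    drive.mount("/content/drive")
    return os.path.join("/content/drive/MyDrive", drive_subdir)

DATA_PATH = resolve_data_path()
print("Loading data from:", DATA_PATH)
```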

Quick Start

Open Model_V1.ipynb in Google Colab
Run the first cell to mount Google Drive
Update DATA_PATH to point to your data folder
Run cells sequentially

Core Functions

| Function | Description |
| --- | --- |
| `fit_nb_mle(x)` | Fit negative binomial to mRNA counts via MLE |
| `simulate_protein_log2fc_from_mrna_nb()` | Simulate protein log2FC from mRNA model |
| `fit_sigma_log_to_protein()` | Grid search for optimal noise parameter σ |
| `joint_fit_nb_and_sigma()` | Joint optimization of NB and noise parameters |
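For readers unfamiliar with NB fitting, the sketch below shows one way a maximum-likelihood fit can be done with SciPy. It is not the notebook's `fit_nb_mle`: the name `nb_mle_sketch`, the mean/dispersion parameterization, and the three-value return are assumptions chosen to mirror the usage shown in this README. It relies on the fact that, for a mean-parameterized NB, the MLE of the mean is the sample mean, so only the dispersion needs numerical optimization.

```python
import numpy as np
from scipy import optimize, stats

def nb_mle_sketch(x):
    """Fit NB(mean=mu, dispersion=r) to counts x by maximum likelihood.

    Returns (mu_hat, r_hat, negative_log_likelihood). Illustrative only.
    """
    x = np.asarray(x)
    mu_hat = x.mean()  # MLE of the NB mean is the sample mean

    def nll(log_r):
        r = np.exp(log_r)  # optimize on the log scale so r stays positive
        # SciPy's nbinom uses (n, p); n = r, p = r / (r + mu) gives mean mu.
        return -stats.nbinom.logpmf(x, r, r / (r + mu_hat)).sum()

    res = optimize.minimize_scalar(nll, bounds=(-5.0, 10.0), method="bounded")
    return float(mu_hat), float(np.exp(res.x)), float(res.fun)

# Synthetic check: counts drawn with mean 5 and dispersion 2.
rng = np.random.default_rng(1)
x = rng.negative_binomial(2.0, 2.0 / (2.0 + 5.0), size=5000)
mu_hat, r_hat, _ = nb_mle_sketch(x)
print("mu_hat:", mu_hat, "r_hat:", r_hat)
```

If the data are not overdispersed relative to Poisson, the dispersion estimate will run to the upper search bound, which is a useful signal that an NB model may be unnecessary for that gene.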

Dependencies

Python (Colab):

numpy
scipy
matplotlib

R (preprocessing):

Seurat
stringr
seqinr
dplyr

Usage Example

# Fit NB to mRNA counts
mu_hat, r_hat, _ = fit_nb_mle(x_mrna)

# Find optimal sigma to match observed protein distribution
best_sigma = fit_sigma_log_to_protein(y_obs, mu_hat, r_hat)

# Or jointly optimize all parameters
result = joint_fit_nb_and_sigma(
    x_mrna,
    y_obs,
    t_half_m_hours=2.0,   # mRNA half-life
    t_half_p_hours=24.0   # protein half-life
)
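The σ search step can be illustrated with a small grid search on synthetic data. This is a sketch under stated assumptions rather than the notebook's `fit_sigma_log_to_protein`: the function names, the fixed scaling factor `c`, and the use of the two-sample Kolmogorov-Smirnov statistic as the distance between simulated and observed log2 protein distributions are all choices made here for illustration. The notebook's objective may differ.

```python
import numpy as np
from scipy import stats

def simulate_log2_protein(mu, r, c, sigma, eps, n_cells, rng):
    """Simulate log2 protein levels from NB mRNA counts and log-normal noise."""
    m = rng.negative_binomial(r, r / (r + mu), size=n_cells)
    p = (c * m + eps) * rng.lognormal(0.0, sigma, size=n_cells)
    return np.log2(p)

def grid_search_sigma(y_obs_log2, mu, r, c, sigmas, eps=1e-3, n_cells=20000, seed=0):
    """Pick the sigma whose simulated distribution is closest to the observed one."""
    best_sigma, best_d = None, np.inf
    for s in sigmas:
        # Re-seed at every grid point so all sigmas share the same random draws.
        rng = np.random.default_rng(seed)
        y_sim = simulate_log2_protein(mu, r, c, s, eps, n_cells, rng)
        d = stats.ks_2samp(y_obs_log2, y_sim).statistic  # KS distance (assumed objective)
        if d < best_d:
            best_sigma, best_d = float(s), float(d)
    return best_sigma, best_d

# Synthetic "observed" data generated with sigma = 0.6.
rng = np.random.default_rng(42)
y_obs = simulate_log2_protein(5.0, 2.0, 10.0, 0.6, 1e-3, 20000, rng)
sigmas = np.linspace(0.1, 1.5, 15)
best_sigma, best_d = grid_search_sigma(y_obs, 5.0, 2.0, 10.0, sigmas)
print("best sigma:", best_sigma, "KS distance:", best_d)
```

Re-seeding inside the loop makes the objective vary smoothly with σ, since only the noise scale changes between grid points and not the underlying random draws.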

License

MIT
