We present scPT, a prompt-learning framework that integrates gene expression data into large language models (LLMs) to generate low-dimensional cell embeddings. By generating prompts from gene expression profiles and injecting them into the transformer layers, scPT enhances the LLM embeddings, effectively fusing expression information with textual identity information.
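The paragraph above does not spell out how the expression prompts are built or injected. The NumPy sketch below illustrates the general idea of layer-wise (deep) prompt injection under stated assumptions: a hypothetical linear-plus-tanh prompt generator for each layer, a toy single-head attention layer standing in for an LLM transformer layer, and mean pooling of the text-token states into one cell embedding. None of the names (`make_expression_prompts`, `toy_attention_layer`, `embed_cell`) or dimensions come from scPT; they are illustrative only.

```python
import numpy as np

def make_expression_prompts(expr, weight, bias, n_prompts, d_model):
    """Map a cell's expression vector (n_genes,) to n_prompts prompt vectors of width d_model.

    Hypothetical generator: one linear projection followed by tanh.
    """
    flat = np.tanh(expr @ weight + bias)  # shape (n_prompts * d_model,)
    return flat.reshape(n_prompts, d_model)

def toy_attention_layer(x, wq, wk, wv):
    """Single-head self-attention with a residual connection (stand-in for an LLM layer)."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(x.shape[1])
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability for the softmax
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)
    return x + attn @ v

def embed_cell(tokens, expr, layer_weights, prompt_params, n_prompts):
    """Prepend fresh expression prompts at every layer, then mean-pool the text tokens."""
    hidden = tokens  # (seq_len, d_model) text-token states
    for (wq, wk, wv), (w, b) in zip(layer_weights, prompt_params):
        prompts = make_expression_prompts(expr, w, b, n_prompts, hidden.shape[1])
        out = toy_attention_layer(np.concatenate([prompts, hidden], axis=0), wq, wk, wv)
        hidden = out[n_prompts:]  # drop prompt positions; text tokens now carry expression info
    return hidden.mean(axis=0)  # (d_model,) cell embedding

# Toy sizes and random weights (hypothetical, for illustration only)
rng = np.random.default_rng(0)
n_genes, d_model, seq_len, n_prompts, n_layers = 50, 16, 8, 4, 2
tokens = rng.standard_normal((seq_len, d_model))
layer_weights = [tuple(rng.standard_normal((d_model, d_model)) * 0.1 for _ in range(3))
                 for _ in range(n_layers)]
prompt_params = [(rng.standard_normal((n_genes, n_prompts * d_model)) * 0.1,
                  np.zeros(n_prompts * d_model)) for _ in range(n_layers)]

emb_a = embed_cell(tokens, rng.random(n_genes), layer_weights, prompt_params, n_prompts)
emb_b = embed_cell(tokens, rng.random(n_genes), layer_weights, prompt_params, n_prompts)
print(emb_a.shape)  # (16,)
```

Because the text tokens attend to the prompt positions at every layer, two cells with identical text but different expression profiles end up with different embeddings. Discarding the prompt outputs after each layer and inserting fresh ones mirrors common deep prompt-tuning designs; scPT may handle this differently.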

- Python==3.10
- CUDA 12.2
To install scPT with NVIDIA GPU (CUDA) support on Linux:
conda create -n scPT python=3.10
conda activate scPT
pip install -r requirements.txt

- All datasets are available in the supplementary materials of the article.
- The model expects input files in `.h5ad` format.
- `asap.py`: example script for preprocessing the ASAP dataset.
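The exact steps in `asap.py` are not described here. As a hedged illustration only, a common single-cell recipe (library-size normalization to a fixed total, `log1p`, and selection of high-variance genes) looks like this on a dense cells × genes matrix. In practice the matrix would come from an `.h5ad` file, e.g. `anndata.read_h5ad(path).X`, which may be a SciPy sparse matrix that needs `.toarray()` first. The function names and the `target_sum`/`n_top` values are assumptions, not scPT's settings.

```python
import numpy as np

def normalize_log1p(counts, target_sum=1e4):
    """Scale each cell (row) to target_sum total counts, then apply log1p (hypothetical default)."""
    totals = counts.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1.0  # avoid division by zero for empty cells
    return np.log1p(counts / totals * target_sum)

def top_variable_genes(x, n_top):
    """Return column indices of the n_top genes with the highest variance across cells."""
    variances = x.var(axis=0)
    return np.argsort(variances)[::-1][:n_top]

# Toy count matrix: 3 cells x 4 genes (the last cell has no counts)
counts = np.array([[0, 2, 8, 0],
                   [1, 1, 0, 3],
                   [0, 0, 0, 0]], dtype=float)
x = normalize_log1p(counts)
hvg = top_variable_genes(x, n_top=2)
x_hvg = x[:, hvg]  # cells x selected genes, ready to feed a model
print(x_hvg.shape)  # (3, 2)
```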
python train.py

Hyperparameters and datasets can be adjusted by editing the relevant files as needed.
- You can download the nomic-ai text embedding model (nomic-embed-text-v2-moe) from https://huggingface.co/nomic-ai/nomic-embed-text-v2-moe/tree/main.
- Use `train.py` to train the model; training produces the data embeddings and the model parameters.
- We use `result.py` to perform the final result analysis for all methods; the results for the spatial data can be found in the `spatial` folder.
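The metrics reported by `result.py` are not listed here. One simple, hedged way to sanity-check learned cell embeddings is leave-one-out k-nearest-neighbour label accuracy: if cells of the same type lie close together, each cell's nearest neighbours should share its label. The function `knn_accuracy` and the toy data below are hypothetical and are not taken from `result.py`.

```python
import numpy as np

def knn_accuracy(embeddings, labels, k=1):
    """Leave-one-out k-NN accuracy: predict each cell's label by majority vote of its k nearest other cells."""
    emb = np.asarray(embeddings, dtype=float)
    labels = np.asarray(labels)
    sq = (emb ** 2).sum(axis=1)
    dists = sq[:, None] + sq[None, :] - 2 * emb @ emb.T  # pairwise squared Euclidean distances
    np.fill_diagonal(dists, np.inf)  # exclude each cell from its own neighbours
    neighbours = np.argsort(dists, axis=1)[:, :k]
    correct = 0
    for i, idx in enumerate(neighbours):
        values, counts = np.unique(labels[idx], return_counts=True)
        correct += int(values[np.argmax(counts)] == labels[i])
    return correct / len(labels)

# Hypothetical 2-D embeddings for four cells forming two well-separated groups
emb = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
labels = np.array(["T cell", "T cell", "B cell", "B cell"])
print(knn_accuracy(emb, labels))  # 1.0
```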