Based on the publication:
Orlov, A. A., Akhmetshin, T. N., Horvath, D., Marcou, G., & Varnek, A. "From High Dimensions to Human Insight: Exploring Dimensionality Reduction for Chemical Space Visualization." Molecular Informatics, 2024, 44(1). DOI: 10.1002/minf.202400265
Requires Python 3.11 and uv.
git clone https://github.com/AxelRolov/cdr_bench.git
cd cdr_bench
uv sync

# 1. Generate molecular descriptors from SMILES
python scripts/generate_descriptors.py bench_configs/features.toml
# 2. Run benchmarking (grid search optimization)
python scripts/run_benchmarking.py --config bench_configs/run_benchmarking.toml
# 3. Analyze and aggregate results
python scripts/analyze_results.py --input_dir results/ --output_dir results/ --k_hit 20

Full documentation is available at axelrolov.github.io/cdr_bench.
cdr_bench/
├── src/cdr_bench/ # Core library
│ ├── dr_methods/ # DimReducer wrapper (PCA, UMAP, t-SNE, GTM)
│ ├── optimization/ # Grid search optimizer and parameter definitions
│ ├── scoring/ # Quality metrics (NN overlap, co-ranking, trustworthiness)
│ ├── io_utils/ # HDF5 I/O, config loading, data preprocessing
│ ├── features/ # Descriptor generation (Morgan FP, MACCS, ChemDist)
│ └── visualization/ # Plotting utilities
├── scripts/ # Pipeline scripts
│ ├── run_benchmarking.py
│ ├── generate_descriptors.py
│ ├── analyze_results.py
│ ├── prepare_lolo.py
│ └── analyze_lib_distance_preservation.py
├── bench_configs/ # TOML configuration files
│ ├── run_benchmarking.toml
│ ├── features.toml
│ └── method_configs/ # Per-method hyperparameter grids
├── datasets/ # Sample ChEMBL datasets (HDF5)
├── results/ # Benchmark results and metrics
├── notebooks/ # Jupyter notebooks for analysis
└── tests/ # Test suite
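
To give a feel for the kind of quality metric listed under scoring/, here is a minimal, self-contained sketch of a nearest-neighbour overlap score computed between a high-dimensional descriptor matrix and its 2D embedding. It is an illustration only and does not use the cdr_bench API: the function names (`knn_indices`, `nn_overlap`, `pca_2d`), the random stand-in data, and the neighbourhood size `k=20` are all assumptions made for this example, and the exact metric definitions used by the library may differ.

```python
import numpy as np


def knn_indices(X: np.ndarray, k: int) -> np.ndarray:
    """Indices of the k nearest neighbours of each row (Euclidean), excluding the row itself."""
    sq = np.sum(X**2, axis=1)
    # Squared pairwise distances via ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.fill_diagonal(d2, np.inf)  # a point is never its own neighbour
    return np.argsort(d2, axis=1)[:, :k]


def nn_overlap(X_high: np.ndarray, X_low: np.ndarray, k: int = 20) -> float:
    """Mean fraction of each point's k neighbours shared between the two spaces (0 to 1)."""
    high = knn_indices(X_high, k)
    low = knn_indices(X_low, k)
    shared = [len(set(h) & set(l)) for h, l in zip(high, low)]
    return float(np.mean(shared)) / k


def pca_2d(X: np.ndarray) -> np.ndarray:
    """Project onto the first two principal components using an SVD of the centred data."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:2].T


# Hypothetical stand-in for a descriptor matrix (200 molecules x 50 features).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))

emb = pca_2d(X)
score = nn_overlap(X, emb, k=20)
print(f"Neighbour overlap (k=20): {score:.3f}")
```

A score of 1 means every point keeps exactly the same k neighbours after projection; values near 0 mean the neighbourhoods are almost entirely different. Replacing `pca_2d` with another embedding function allows the same comparison for other methods.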
The datasets/ directory contains ChEMBL subset datasets used in the study. Full datasets and all embeddings are available on Zenodo.
@article{orlov2024high,
title={From High Dimensions to Human Insight: Exploring Dimensionality Reduction for Chemical Space Visualization},
author={Orlov, Alexey A. and Akhmetshin, Tagir N. and Horvath, Dragos and Marcou, Gilles and Varnek, Alexandre},
journal={Molecular Informatics},
volume={44},
number={1},
pages={e202400265},
year={2024},
doi={10.1002/minf.202400265}
}

The GTM results in the original publication were obtained using an in-house implementation. This repository uses the open-source ChemographyKit for GTM. If you use it, please cite the ChemographyKit publication as well.
