Use a conda environment for a clean installation
$ conda create --name molseg python=3.8.0
$ conda activate molseg
$ conda install pip
$ python3 -m pip install -U pip
$ pip install -r requirements.txt
Data preparation
Ground truth images used for training are in RGB format; image masks should be in black-and-white format. Images and their masks must have identical filenames, placed under the imgs and masks folders respectively.
All images should be square; place them on a square canvas if necessary. The model works well for images under 600×600 pixels.
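The two requirements above (identically named masks, square images) can be checked and fixed with a small helper. A minimal sketch using Pillow: the `imgs`/`masks` folder names come from this README, while `find_unmatched` and `pad_to_square` are illustrative helpers, not part of this repo.

```python
from pathlib import Path
from PIL import Image

def find_unmatched(imgs_dir="imgs", masks_dir="masks"):
    """Return image filename stems that have no identically named mask."""
    img_stems = {p.stem for p in Path(imgs_dir).iterdir() if p.is_file()}
    mask_stems = {p.stem for p in Path(masks_dir).iterdir() if p.is_file()}
    return sorted(img_stems - mask_stems)

def pad_to_square(img, fill=(255, 255, 255)):
    """Center an RGB image on a square white canvas so width == height."""
    w, h = img.size
    side = max(w, h)
    canvas = Image.new("RGB", (side, side), fill)
    canvas.paste(img, ((side - w) // 2, (side - h) // 2))
    return canvas
```

Run `find_unmatched()` before training to catch missing masks, and pass any non-square image through `pad_to_square` before adding it to `imgs/`.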
Mechanistic molecular ground truth data and image segmentation masks can be found on Zenodo.
Model Training
Run the training script (scripts/train.sh) or train.py directly.
$ sbatch scripts/train.sh
The best checkpoint is saved to MODEL.pth.
A pretrained checkpoint, checkpoint.pth, is available on Hugging Face. To use it, download it to your root directory and pass -m checkpoint.pth.
Prediction
After training your model and saving it to MODEL.pth, you can easily test the output masks on your images via the CLI.
To predict a single image and save it:
$ python predict.py -i image.jpg -o output.jpg
To predict multiple images and show them without saving:
$ python predict.py -i image1.jpg image2.jpg --viz --no-save
For batch predictions, use scripts/predict.sh after setting up the bash environment. Feel free to change the input and output directories to suit your task.
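The exact contents of scripts/predict.sh may vary by setup; an illustrative Python equivalent builds one predict.py invocation per image using the documented `-m`/`-i`/`-o` flags. The helper `build_predict_commands` and the `_OUT.png` naming convention (which matches the expected output described in this README) are assumptions for illustration.

```python
from pathlib import Path

IMAGE_EXTS = {".jpg", ".jpeg", ".png"}

def build_predict_commands(input_dir, output_dir, model="MODEL.pth"):
    """Build one predict.py command per image file in input_dir."""
    cmds = []
    for img in sorted(Path(input_dir).iterdir()):
        if img.suffix.lower() in IMAGE_EXTS:
            out = Path(output_dir) / f"{img.stem}_OUT.png"
            cmds.append(["python", "predict.py", "-m", model,
                         "-i", str(img), "-o", str(out)])
    return cmds
```

Each command list can then be executed with `subprocess.run(cmd, check=True)` after creating the output directory.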
To use the model directly for arrow removal, we provide predict_and_postprocess.py, which uses the default functions and checkpoints:
- Clone or download this repo and make sure predict_and_postprocess.py is in your working directory.
- Install the dependencies from requirements.txt.
- Prepare your input: create a folder (e.g. test/) and drop in your images (.jpg, .png, etc.).
- Optionally, create an output folder; by default, masks are written to output/<input>/.
- Run the script:
$ python predict_and_postprocess.py \
    --model MODEL.pth \
    --input test/
| Flag | Description | Default |
|---|---|---|
| `-m, --model` | Path to your trained `.pth` model file | `MODEL.pth` |
| `-i, --input` | **Required.** Input image file or folder | — |
| `-o, --output` | Output directory for masks (`output/<input_name>/` if omitted) | `output/<input>/` |
| `-n, --no-save` | Don’t save mask images | `false` |
| `-v, --viz` | Pop up each image+mask for visual inspection | `false` |
| `-t, --mask-threshold` | Binarization cutoff (0–1) | `0.5` |
| `-s, --scale` | Scale factor for resizing input | `0.5` |
| `--bilinear` | Use bilinear upsampling | `true` |
| `-c, --classes` | Number of classes | `1` |
Expected output
- Mask images go to:

      output/<input_folder_name>/
      ├─ img1_OUT.png
      ├─ img2_OUT.png
      └─ ...

- Post-processed images (white-painted, single-channel) go to:

      processed/<input_folder_name>/
      ├─ img1.png
      ├─ img2.png
      └─ ...
We collected 296 reaction mechanism images from the textbook Named Reactions, 4th edition (Li, 2009).
Each image is named after its reaction. The images were processed with this model and parsed by RxnScribe (Qian, 2023).
The parsed dataset contains information such as predicted molecular identities, positions, and reaction conditions.
Find the images and parsed dataset.
| Dess-Martin periodinane oxidation | Corresponding object masks |
|---|---|
| ![]() | ![]() |
This architecture is mainly used for noise removal in chemical reaction mechanism images. To remove the noise segmented out of the original image, use process.py to overlay the image mask onto the original image.
imgs_path = "ver_mech/"
masks_path = "mechrxn_arrowmask/"
processed_path = "mechrxn_processed/"
imgs_path is the original image folder; masks_path holds the masks obtained with U-Net; processed_path can be renamed to suit your own interest.
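The overlay step that process.py performs can be sketched as follows: pixels covered by the predicted mask are painted white in the original image, yielding the single-channel, white-painted output described above. This is an illustrative sketch, not the repo's actual process.py; the helper name `white_paint` and the binarization threshold are assumptions.

```python
import numpy as np
from PIL import Image

def white_paint(img_path, mask_path, threshold=128):
    """Paint the masked region (e.g. arrows) white in the original image.

    Returns a single-channel (grayscale) PIL image.
    """
    img = np.array(Image.open(img_path).convert("L"))
    mask = np.array(Image.open(mask_path).convert("L"))
    img[mask >= threshold] = 255  # masked pixels become white
    return Image.fromarray(img)
```

Applied over `imgs_path` and `masks_path`, the results would be saved under `processed_path`.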

Note that the dataset still contains errors, even though it performs better with arrow-removal preprocessing. This dataset does not aim to serve as a benchmark, but rather as a centralized, unified collection of reactions to benefit future research in both chemistry and computer vision.
- The original U-Net paper: U-Net: Convolutional Networks for Biomedical Image Segmentation
- The model is based on Milesial/Pytorch-UNet
- Molecular and reaction information extraction uses models from thomas0809/Molscribe and thomas0809/RxnScribe

