Questions about the candidate set and the function "check_molecule_dict()"

According to the paper, the candidate set should consist of all reactants in the entire USPTO database.

However, in the function "check_molecule_dict()" in https://github.com/hankook/RetCL/blob/main/datasets/__init__.py, I found something different.

```python
def check_molecule_dict(mol_dict, datasets):
    for split in ['train', 'val', 'test']:
        for rxn in datasets[split]:
            assert rxn.product in mol_dict
            for reactant in rxn.reactants:
                assert reactant in mol_dict
```
This function seems to be quite important: the training and evaluation scripts cannot run unless this check passes.
Since the code asserts that every product, as well as every reactant, is in "mol_dict", should the products also be included in the candidate set?
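
To make the question concrete, here is a minimal sketch of what I think a dictionary that passes this check would have to look like. It is only my guess, not the authors' code: "Reaction", "build_molecule_dict()" and the toy SMILES strings are hypothetical, and I am assuming each reaction only needs the "product" and "reactants" attributes used in the snippet above.

```python
from collections import namedtuple

# Hypothetical stand-in for the repository's reaction objects; only the
# `product` and `reactants` attributes used by check_molecule_dict() are assumed.
Reaction = namedtuple("Reaction", ["product", "reactants"])


def build_molecule_dict(datasets):
    """Map every product and reactant in all splits to an integer index."""
    mol_dict = {}
    for split in ["train", "val", "test"]:
        for rxn in datasets[split]:
            for smiles in (rxn.product, *rxn.reactants):
                # Assign the next free index only on first occurrence.
                mol_dict.setdefault(smiles, len(mol_dict))
    return mol_dict


def check_molecule_dict(mol_dict, datasets):
    # Same check as quoted from datasets/__init__.py above.
    for split in ["train", "val", "test"]:
        for rxn in datasets[split]:
            assert rxn.product in mol_dict
            for reactant in rxn.reactants:
                assert reactant in mol_dict


# Toy data, made up for illustration only.
datasets = {
    "train": [Reaction("CC(=O)OCC", ("CC(=O)O", "CCO"))],
    "val": [Reaction("CC(=O)NC", ("CC(=O)O", "CN"))],
    "test": [],
}

mol_dict = build_molecule_dict(datasets)
check_molecule_dict(mol_dict, datasets)  # passes: products are included

# A reactant-only set, as I understand the paper's description.
reactants_only = {
    r for split in datasets.values() for rxn in split for r in rxn.reactants
}
# The product of the first reaction is missing here, so the check above
# would raise AssertionError if it were given a reactant-only dictionary.
product_missing = "CC(=O)OCC" not in reactants_only
```

In this toy example a reactant-only dictionary fails the check, which is why I am unsure whether "mol_dict" is the candidate set itself or a larger lookup table that also holds the products.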

It would be great if you could publish the code used to build the candidate set.
