Questions about the candidate set and the function "check_molecule_dict()" #1

@otori-bird

According to the paper, the candidate set should consist of all reactants in the entire USPTO database.

However, in the function "check_molecule_dict()" in https://github.com/hankook/RetCL/blob/main/datasets/__init__.py, I found something different.

def check_molecule_dict(mol_dict, datasets):
    for split in ['train', 'val', 'test']:
        for rxn in datasets[split]:
            assert rxn.product in mol_dict
            for reactant in rxn.reactants:
                assert reactant in mol_dict

This function seems to be quite important: the training and evaluation scripts cannot run unless this check passes.
The check asserts that every product, as well as every reactant, is in mol_dict. According to the code, should the products also be included in the candidate set?
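
To make the question concrete, here is a minimal sketch of how I would expect a molecule dictionary to be built so that it passes the check. It is only my guess, not the repository's actual preprocessing: `Reaction`, `build_mol_dict`, `include_products`, and `passes_check` are hypothetical names, molecules are assumed to be plain SMILES strings, and `mol_dict` is assumed to map each molecule to an integer index (the check itself only needs `in` to work).

```python
from collections import namedtuple

# Hypothetical stand-in for the repository's reaction objects. Only the
# `product` and `reactants` attributes used by check_molecule_dict() are assumed.
Reaction = namedtuple("Reaction", ["product", "reactants"])


def check_molecule_dict(mol_dict, datasets):
    # Copied from the issue text above.
    for split in ['train', 'val', 'test']:
        for rxn in datasets[split]:
            assert rxn.product in mol_dict
            for reactant in rxn.reactants:
                assert reactant in mol_dict


def build_mol_dict(datasets, include_products=True):
    """Collect molecules from all splits and map each one to an index.

    Assumption: molecules are SMILES strings; the real code may use
    another representation.
    """
    molecules = set()
    for split in ("train", "val", "test"):
        for rxn in datasets[split]:
            molecules.update(rxn.reactants)
            if include_products:
                molecules.add(rxn.product)
    # Sort so that the index assignment is deterministic.
    return {smiles: idx for idx, smiles in enumerate(sorted(molecules))}


def passes_check(mol_dict, datasets):
    """Return True if check_molecule_dict() raises no AssertionError."""
    try:
        check_molecule_dict(mol_dict, datasets)
    except AssertionError:
        return False
    return True


# Toy data: one reaction whose product never appears as a reactant.
toy = {
    "train": [Reaction(product="CC(=O)OC", reactants=("CC(=O)O", "CO"))],
    "val": [],
    "test": [],
}
print(passes_check(build_mol_dict(toy, include_products=True), toy))
print(passes_check(build_mol_dict(toy, include_products=False), toy))
```

With a reactants-only dictionary the check fails whenever a product does not also occur as a reactant somewhere in the data, which is why I suspect the products are included.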

It would be great if you could publish the code used to build the candidate set.
