Link Invent dataset inconsistent with the code base and prior model. #39

Hello, and thank you for the open-source repository.

I was going through LinkInvent and wanted to try training the model in a transfer learning (TL) fashion, using the dataset provided in `ReinventCommunity/notebooks/data/linkinvent_prior_training_data` together with the prior model. However, I think there was an error in how the dataset was created. This was mainly for testing the code, and I am aware there is no particular use in doing this TL.

The code expects the data to have warheads/inputs in the first column and linkers/targets in the second column. This can be seen in the code as well as in the vocabulary of `ReinventCommunity/notebooks/models/linkinvent.prior`, which has `*` and `|` as input tokens and `[*]` as the target token.
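The token difference described above suggests a quick sanity check on the column order. This is only a rough heuristic sketch based on the tokens mentioned (`|` and bare `*` for inputs, `[*]` for targets); the function names are hypothetical and are not part of the REINVENT code base.

```python
def looks_like_warheads(field: str) -> bool:
    """Heuristic: warheads/inputs are two fragments joined by '|' with bare '*' attachment points."""
    return "|" in field and "[*]" not in field


def looks_like_linker(field: str) -> bool:
    """Heuristic: a linker/target is a single fragment with '[*]' attachment points."""
    return "|" not in field and "[*]" in field


def row_matches_expected_order(first: str, second: str) -> bool:
    """True if the first field looks like warheads and the second like a linker."""
    return looks_like_warheads(first) and looks_like_linker(second)
```

Applied to the first two columns of each row, a check like this would flag rows where the linker comes before the warheads.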

The provided dataset, however, uses the following column order:

Linker/target ---- Warheads/inputs ---- Full SMILES
`[*]C#CC(O)CCCCCCC[*]` ---- `*C#CCO|*CCC#CCCCCCCC(C)C` ---- `CC(C)CCCCCCC#CCCCCCCCCCC(O)C#CC#CCO`

The columns should instead be ordered as:

Warheads/inputs ---- Linker/target ---- Full SMILES
`*C#CCO|*CCC#CCCCCCCC(C)C` ---- `[*]C#CC(O)CCCCCCC[*]` ---- `CC(C)CCCCCCC#CCCCCCCCCCC(O)C#CC#CCO`
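The reordering amounts to swapping the first two columns of every row. Below is a minimal sketch of that swap; the tab delimiter is an assumption (adjust it to whatever separator the data file actually uses), and the function names and file paths passed to them are hypothetical.

```python
def swap_first_two_columns(line: str, delimiter: str = "\t") -> str:
    """Return the row with its first two fields exchanged; remaining fields are untouched."""
    fields = line.rstrip("\n").split(delimiter)
    if len(fields) < 2:
        raise ValueError(f"expected at least two columns, got: {line!r}")
    fields[0], fields[1] = fields[1], fields[0]
    return delimiter.join(fields)


def rewrite_dataset(src_path: str, dst_path: str, delimiter: str = "\t") -> None:
    """Write a copy of the dataset with the first two columns swapped (delimiter is an assumption)."""
    with open(src_path, encoding="utf-8") as src, open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            if line.strip():  # skip blank lines
                dst.write(swap_first_two_columns(line, delimiter) + "\n")
```

Writing to a separate output file keeps the original data intact in case the delimiter assumption turns out to be wrong.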

I tried this myself, and after the change it worked fine.
This might not be a big issue, since TL is less important in the case of LinkInvent, and when a new model is trained the vocabulary is recreated anyway. I still wanted to share this feedback, since the dataset does not match the code logic.

