This repository was archived by the owner on Jul 17, 2025. It is now read-only.

train_vqa (and others) hanging when using multiprocessing #12

@george-larionov

I'm trying to run this model as per the instructions, and it keeps hanging, usually during or after optimizer.step() but sometimes in other places as well. I've found that removing the multiprocessing entirely and just running train() on its own gets rid of the problem (I'm using a p3.2xlarge AWS instance, so memory and compute are not an issue).
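A minimal sketch of the workaround described above, using only the standard library. The names train() and launch(), and the use_multiprocessing flag, are hypothetical stand-ins and not the repository's actual code. The multiprocessing branch uses the "spawn" start method, which is often suggested for hangs like this because child processes do not inherit a forked copy of the parent's possibly locked state.

```python
import multiprocessing as mp


def train(rank: int = 0) -> str:
    # Hypothetical stand-in for the repository's train() entry point;
    # the real function would build the model, data loader, and optimizer.
    return f"rank {rank} finished"


def launch(use_multiprocessing: bool, world_size: int = 1) -> list:
    if not use_multiprocessing:
        # Workaround: call train() directly in the current process.
        return [train(0)]
    # Assumed original behaviour: one worker process per rank.
    # "spawn" starts fresh interpreters instead of forking the parent.
    ctx = mp.get_context("spawn")
    with ctx.Pool(processes=world_size) as pool:
        return pool.map(train, range(world_size))


if __name__ == "__main__":
    print(launch(use_multiprocessing=False))
```

If the project uses PyTorch, a related knob is the DataLoader's num_workers argument: setting it to 0 loads batches in the main process, which can help isolate whether the data-loading workers are involved in the hang.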

I also found this page, which appears to address a very similar issue with the data loader that your code also uses, so I wonder whether this could be the root of the problem. I have downloaded, deleted, and reinstalled all the repositories, data, and everything else numerous times, so I am fairly confident the issue is not on my end. Thanks!
