Reading in files and genetic maps is done sequentially, and could be parallelized across threads #79

@pettyalex

I had a malformed genetic map file, and while diagnosing and fixing it I noticed that shapeit4 currently reads the input files first and only then reads the genetic map. Reading the input files took a substantial amount of time, so it took a long time for me to hit the error, notice it, and diagnose it. From a look at the code, these two operations are fully independent and could run simultaneously on separate threads, which would make the pre-phasing initialization much faster.

The only major complication I see would be keeping the logging output ordered. The file reads populate fully independent structures, so as far as I can tell there is no read or write contention to worry about.
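
To make the suggestion concrete, here is a minimal sketch of what I have in mind, assuming C++14 and the standard `<future>` header. `load_concurrently` and the two reader callables are hypothetical names for illustration, not shapeit4's actual functions. Each reader writes its log lines to its own buffer, and the buffers are printed in a fixed order once both reads finish, so the output stays ordered.

```cpp
#include <cassert>
#include <future>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical helper: runs two independent readers on separate threads.
// Each reader takes a std::ostream& to log to and returns its parsed data.
template <typename ReadInputs, typename ReadMap>
auto load_concurrently(ReadInputs read_inputs, ReadMap read_map, std::ostream& out) {
    // One private log buffer per task, so the threads never share a stream.
    std::ostringstream log_inputs;
    std::ostringstream log_map;

    // std::launch::async forces each reader onto its own thread.
    auto fut_inputs = std::async(std::launch::async,
                                 [&] { return read_inputs(log_inputs); });
    auto fut_map = std::async(std::launch::async,
                              [&] { return read_map(log_map); });

    // get() blocks until the task is done and rethrows any exception
    // the reader threw (for example, on a malformed genetic map).
    auto inputs = fut_inputs.get();
    auto map = fut_map.get();

    // Emit the buffered logs in the same order as the sequential version.
    out << log_inputs.str() << log_map.str();
    return std::make_pair(std::move(inputs), std::move(map));
}
```

A call would look something like `auto loaded = load_concurrently(read_inputs_fn, read_map_fn, std::cout);`, where both callables are placeholders for the existing readers. One caveat with this sketch: log lines only appear after both reads complete, so progress output from a long read would be delayed unless one of the readers is allowed to log directly.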
