Because SNPs that lie close together on a chromosome tend to be inherited together (linkage disequilibrium), and because allele frequencies differ between populations, genetic variation can separate people from different countries and geographical locations to a certain extent (see links). It would be useful to know whether the genetic variability in a given genomic region differs between ethnicities or geographical locations. That could help suggest whether the region has different health effects in different populations. This could be done in a supervised or an unsupervised way.
Supervised: Get a large amount of genetic data together with personal data (ethnicity or geographical origin). For each ethnicity, correlate the genotype at each SNP with membership of that ethnicity versus all others. Use those correlation values to score each region by how much it differs between that ethnicity and the rest, and show the scores as a bar plot or heatmap.
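The supervised scoring could be sketched roughly as follows. This is only a minimal illustration under several assumptions that are not part of the idea itself: genotypes are coded as 0/1/2 minor-allele counts in a samples x SNPs matrix, every SNP has already been assigned to a region (for example a gene or a fixed window), and the region score is taken to be the mean squared Pearson correlation between each SNP and a one-vs-rest ethnicity indicator. The function and argument names are hypothetical.

```python
import numpy as np

def region_scores_supervised(genotypes, labels, snp_region):
    """Mean squared correlation between each region's SNPs and a
    one-vs-rest population indicator.

    genotypes  : (n_samples, n_snps) array of 0/1/2 allele counts (assumed coding)
    labels     : (n_samples,) population label per sample
    snp_region : (n_snps,) region name per SNP
    Returns (populations, regions, scores) with scores[i, j] for
    population i and region j.
    """
    g = np.asarray(genotypes, dtype=float)
    labels = np.asarray(labels)
    snp_region = np.asarray(snp_region)
    populations = np.unique(labels)
    regions = np.unique(snp_region)

    g = g - g.mean(axis=0)          # centre each SNP
    g_sd = g.std(axis=0)            # per-SNP standard deviation
    n = g.shape[0]

    scores = np.zeros((len(populations), len(regions)))
    for i, pop in enumerate(populations):
        y = (labels == pop).astype(float)   # 1 = this population, 0 = rest
        y = y - y.mean()
        denom = n * g_sd * y.std()
        r = np.zeros(g.shape[1])
        ok = denom > 0                      # skip monomorphic SNPs
        r[ok] = (g.T @ y)[ok] / denom[ok]   # Pearson r per SNP
        for j, region in enumerate(regions):
            scores[i, j] = np.mean(r[snp_region == region] ** 2)
    return populations, regions, scores

# Toy example: SNP 0 tracks the population label exactly, SNP 1 does not.
toy_genotypes = [[2, 0], [2, 2], [0, 0], [0, 2]]
toy_labels = ["X", "X", "Y", "Y"]
toy_regions = ["A", "B"]
pops, regs, toy_scores = region_scores_supervised(toy_genotypes, toy_labels, toy_regions)
```

The resulting populations x regions matrix is the kind of table that could be drawn as the heatmap, or one row at a time as a bar plot. With real data, the number of SNPs per region and the sample size per ethnicity would both affect these scores, so some normalisation would probably be needed.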
Unsupervised: Run PCA on a large amount of genetic data, as has been done before (refs). PC loadings are defined per SNP, so aggregating them over the SNPs in a region would give that region a score representing its inter-ethnic/geographic variability along the largest axes of genetic variation.
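A rough sketch of the unsupervised version, again under assumptions of my own: genotypes are a samples x SNPs matrix of 0/1/2 counts, each SNP is standardised before PCA, and a region's score is the mean over its SNPs of the squared loadings on the top components, weighted by each component's share of the total variance. The names and the aggregation rule are hypothetical, and other ways of combining loadings would be equally defensible.

```python
import numpy as np

def region_scores_pca(genotypes, snp_region, n_components=2):
    """Aggregate squared PC loadings per region.

    genotypes    : (n_samples, n_snps) array of 0/1/2 allele counts (assumed coding)
    snp_region   : (n_snps,) region name per SNP
    n_components : number of top PCs to use
    Returns (regions, scores), one score per region.
    """
    g = np.asarray(genotypes, dtype=float)
    snp_region = np.asarray(snp_region)
    regions = np.unique(snp_region)

    # Standardise each SNP; monomorphic SNPs are left at zero.
    g = g - g.mean(axis=0)
    sd = g.std(axis=0)
    keep = sd > 0
    g[:, keep] /= sd[keep]

    # PCA via SVD: rows of vt are the per-SNP loadings of each component.
    _, s, vt = np.linalg.svd(g, full_matrices=False)
    total = np.sum(s ** 2)
    if total == 0:
        return regions, np.zeros(len(regions))

    k = min(n_components, len(s))
    var_ratio = s[:k] ** 2 / total            # variance share of each PC
    snp_score = (vt[:k].T ** 2) @ var_ratio   # weighted squared loadings per SNP

    scores = np.array([snp_score[snp_region == r].mean() for r in regions])
    return regions, scores

# Toy example: the two SNPs in region A carry all the variation, region B none.
toy_genotypes = [[2, 2, 1], [2, 2, 1], [0, 0, 1], [0, 0, 1]]
toy_regions = ["A", "A", "B"]
regs, toy_scores = region_scores_pca(toy_genotypes, toy_regions, n_components=1)
```

Note that the top PCs are not guaranteed to correspond to ethnicity or geography, so it would be worth checking the sample projections against known labels before interpreting the region scores that way.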
Note: people in our department might have easy access to/familiarity with this type of genetic data.
Links:
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2735096/
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5644186/