Add example notebook demonstrating genomic data exploration using mal… #1058
priyarai121 wants to merge 2 commits into malariagen:master
|
Hello! 👋 I’m a student interested in contributing to the MalariaGEN ecosystem and exploring the genomic datasets through the
This PR adds a simple example notebook demonstrating:
The goal is to provide an easy starting point for new users who want to explore the dataset interactively. Please let me know if any changes or improvements are needed. I’d be happy to update the notebook. Thank you for maintaining this project! |
|
Hi! I noticed that contributions are usually discussed through issues first. I have opened an issue describing this example notebook contribution. Please let me know if any changes are needed, and I would be happy to update the PR accordingly. Issue link: #1085 |
|
Hi @priyarai121. Can you explain why you think that your example notebook is better than the already existing example notebooks present on the repo? Why did you choose to create a new folder when one with notebooks (helpfully called |
|
Thank you for the feedback @jonbrenas. When I first started exploring the repository and using the malariagen_data API, I personally felt the need for a very simple example that focuses specifically on exploring the geographic distribution of mosquito samples. As a new user, one of the first things I wanted to understand was where the samples were collected and how they are distributed across countries. That was the motivation behind creating this notebook.
My intention was to provide a beginner-friendly example that demonstrates a simple exploratory workflow: loading sample metadata, summarizing the number of samples per country, and visualizing the distribution.
Regarding the folder structure, thank you for pointing that out. I agree that placing the notebook inside the existing
For the plotting part, I initially used basic pandas and matplotlib plotting because I wanted to clearly show the data processing steps. However, I understand that the API already provides dedicated plotting utilities for this purpose. I can revise the notebook to use those functions so that the example better demonstrates the intended use of the API.
Thank you again for the suggestions. I’m happy to update the PR accordingly. |
|
Thanks @priyarai121. Don't you think that |
|
Thanks for pointing that out, @jonbrenas. My initial motivation came from my experience as a new user. When I first started exploring the dataset, I wanted to quickly understand the geographic distribution of mosquito samples as a first step in exploring the data.
However, I agree that creating a separate notebook may not be necessary if similar functionality already exists. Instead, it might make more sense to improve or extend the existing
If that would be a better direction, I would be happy to revise the contribution accordingly. |
Add example notebook for genomic data exploration
This pull request adds a Jupyter notebook demonstrating how to explore malaria mosquito genomic data using the malariagen_data API.
The notebook includes:
This example helps new users understand how to access and analyze malaria genomic datasets using the Python API.
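For readers following the discussion, the exploratory workflow mentioned in this PR (load sample metadata, count samples per country, plot the distribution) might look roughly like the sketch below. It is not the notebook's actual code. The DataFrame is a hand-made stand-in for the metadata that the real notebook would load through the malariagen_data API, and the `sample_id` and `country` column names, the sample rows, and both helper functions are assumptions made for illustration. As noted in the review comments, the API already ships its own plotting utilities, so the matplotlib helper here only illustrates the data-processing steps.

```python
import pandas as pd

# Hypothetical stand-in for the sample metadata table. In the real notebook this
# DataFrame would be loaded through the malariagen_data API; the rows and the
# `sample_id` / `country` column names below are invented for illustration.
sample_metadata = pd.DataFrame(
    {
        "sample_id": ["S1", "S2", "S3", "S4", "S5", "S6"],
        "country": ["Mali", "Ghana", "Mali", "Kenya", "Ghana", "Mali"],
    }
)


def count_samples_per_country(df: pd.DataFrame) -> pd.Series:
    """Return the number of samples per country, most frequent first."""
    return df["country"].value_counts()


def plot_samples_per_country(counts: pd.Series):
    """Draw a bar chart of the per-country counts (requires matplotlib)."""
    # Imported lazily so the summary step works without matplotlib installed.
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    counts.plot.bar(ax=ax)
    ax.set_xlabel("Country")
    ax.set_ylabel("Number of samples")
    ax.set_title("Samples per country")
    fig.tight_layout()
    return fig


counts = count_samples_per_country(sample_metadata)
print(counts)
```

Calling `plot_samples_per_country(counts)` in a notebook cell would then render the bar chart. Keeping the counting step separate from the plotting step makes it straightforward to swap the plot for one of the library's own plotting functions later.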