
Add example notebook demonstrating genomic data exploration using mal…#1058

Open
priyarai121 wants to merge 2 commits into malariagen:master from priyarai121:add-example-notebook


@priyarai121

Add example notebook for genomic data exploration

This pull request adds a Jupyter notebook demonstrating how to explore malaria mosquito genomic data using the malariagen_data API.

The notebook includes:

  • Loading mosquito sample metadata
  • Exploring geographic distribution of samples
  • Visualizing mosquito species counts
  • Example exploratory analysis workflow

This example helps new users understand how to access and analyze malaria genomic datasets using the Python API.

@priyarai121
Author

Hello! 👋

I’m a student interested in contributing to the MalariaGEN ecosystem and exploring the genomic datasets through the malariagen_data API.

This PR adds a simple example notebook demonstrating:

  • Loading mosquito sample metadata
  • Exploring geographic distribution of samples
  • Visualizing mosquito species counts

The goal is to provide an easy starting point for new users who want to explore the dataset interactively.

Please let me know if any changes or improvements are needed. I’d be happy to update the notebook.

Thank you for maintaining this project!

@priyarai121
Author

Hi! I noticed that contributions are usually discussed through issues first.

I have opened an issue describing this example notebook contribution. Please let me know if any changes are needed, and I would be happy to update the PR accordingly.

Issue link: #1085

@jonbrenas
Collaborator

Hi @priyarai121. Can you explain why you think that your example notebook is better than the already existing example notebooks present on the repo? Why did you choose to create a new folder when one with notebooks (helpfully called notebooks) already exists? Why didn't you use the existing functions of the API that are designed to plot the kind of data that you are plotting?

@priyarai121
Author

Thank you for the feedback, @jonbrenas.

When I first started exploring the repository and using the malariagen_data API, I personally felt the need for a very simple example that focuses specifically on exploring the geographic distribution of mosquito samples. As a new user, one of the first things I wanted to understand was where the samples were collected and how they are distributed across countries. That was the motivation behind creating this notebook.

My intention was to provide a beginner-friendly example that demonstrates a simple exploratory workflow: loading sample metadata, summarizing the number of samples per country, and visualizing the distribution.
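As a rough illustration of that workflow (not the notebook's actual code), here is a minimal sketch using plain pandas. The small table below is made-up stand-in data, the column names `country` and `taxon` are assumptions about what the metadata looks like, and `count_samples_by` is a hypothetical helper, not a function from malariagen_data.

```python
import pandas as pd

# Made-up stand-in for a sample metadata table. In the real notebook this
# would come from the malariagen_data API; the column names are assumptions.
sample_metadata = pd.DataFrame(
    {
        "sample_id": ["S1", "S2", "S3", "S4", "S5"],
        "country": ["Ghana", "Ghana", "Kenya", "Mali", "Kenya"],
        "taxon": ["gambiae", "coluzzii", "arabiensis", "coluzzii", "gambiae"],
    }
)


def count_samples_by(df: pd.DataFrame, column: str) -> pd.Series:
    """Count rows per distinct value of `column`, largest group first.

    Hypothetical helper for illustration only.
    """
    return df.groupby(column).size().sort_values(ascending=False)


# Summarize the number of samples per country and per taxon.
samples_per_country = count_samples_by(sample_metadata, "country")
samples_per_taxon = count_samples_by(sample_metadata, "taxon")

print(samples_per_country)
print(samples_per_taxon)
```

With matplotlib installed, calling `samples_per_country.plot.bar()` on the result would draw the per-country distribution as a bar chart.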

Regarding the folder structure, thank you for pointing that out. I agree that placing the notebook inside the existing notebooks/ directory would be more consistent with the repository structure, and I can update the PR to move it there.

For the plotting part, I initially used basic pandas and matplotlib plotting because I wanted to clearly show the data processing steps. However, I understand that the API already provides dedicated plotting utilities for this purpose. I can revise the notebook to use those functions so that the example better demonstrates the intended use of the API.

Thank you again for the suggestions. I’m happy to update the PR accordingly.

@jonbrenas
Collaborator

Thanks @priyarai121. Don't you think that plot_samples.ipynb already does all that you describe and more?

@priyarai121
Author

Thanks for pointing that out, @jonbrenas.

My initial motivation came from my experience as a new user. When I first started exploring the dataset, I wanted to quickly understand the geographic distribution of mosquito samples as a first step in exploring the data.

However, I agree that creating a separate notebook may not be necessary if similar functionality already exists. Instead, it might make more sense to improve or extend the existing plot_samples.ipynb notebook, for example by adding clearer explanations or a few additional beginner-focused steps.

If that would be a better direction, I would be happy to revise the contribution accordingly.

2 participants