[DataCap Application] <AIND> - <Smart Selective Plane Illumination Microscopy-2023> #172

@MMMMercy

Description

Data Owner Name

Allen Institute for Neural Dynamics (AIND)

Data Owner Country/Region

United States

Data Owner Industry

Life Science / Healthcare

Website

https://allenneuraldynamics.github.io/data.html#aind-open-data

Social Media Handle

aindopendata@alleninstitute.org

Social Media Type

Other

What is your role related to the dataset

Data Preparer

Total amount of DataCap being requested

20PiB

Expected size of single dataset (one copy)

2.5PiB

Number of replicas to store

8

Weekly allocation of DataCap requested

2PiB
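The figures above hang together arithmetically; a minimal sanity check (plain Python, values taken from the answers in this application) relating single-copy size, replica count, and weekly allocation:

```python
# Sanity-check the DataCap arithmetic stated in this application.
dataset_pib = 2.5   # expected size of a single copy, PiB
replicas = 8        # number of replicas to store
weekly_pib = 2.0    # weekly allocation of DataCap requested, PiB

total_pib = dataset_pib * replicas   # total DataCap needed
print(total_pib)                     # -> 20.0, matching the 20 PiB requested

weeks = total_pib / weekly_pib       # time to consume the request at full rate
print(weeks)                         # -> 10.0 weeks
```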

On-chain address for first allocation

f1r7oh66iquzdjqnhzsngcxc4dqskdzhzdpwjzvmq

Data Type of Application

Public, Open Dataset (Research/Non-Profit)

Custom multisig

  • Use Custom Multisig

Identifier

No response

Share a brief history of your project and organization

Data Access
The Allen Institute for Neural Dynamics (AIND) is committed to FAIR, Open, and Reproducible science. We therefore share all of the data we collect publicly, with rich metadata, as near to the time of collection as possible. We share data at all stages of the data lifecycle, including preliminary data collected during methods development, processed data that we are actively improving, and highly curated data used in publications.

aind-open-data
In addition to sharing curated datasets with modality-specific NIH data archives like DANDI and BIL, we are also excited to share all of our data in one public S3 bucket generously hosted by the Registry of Open Data on AWS.

All data is stored here: s3://aind-open-data

Is this project associated with other projects/ecosystem stakeholders?

No

If answered yes, what are the other projects/ecosystem stakeholders


Describe the data being stored onto Filecoin

SmartSPIM is a high-throughput, three-dimensional, whole-brain imaging technology commonly used in neuroscience and biomedical research.

SPIM stands for Selective Plane Illumination Microscopy, more commonly known as light sheet microscopy.

SmartSPIM is a commercial light sheet microscopy imaging platform developed by LifeCanvas Technologies. It enables rapid, non-destructive, three-dimensional imaging of large tissues, such as mouse brains and brain slices.

This imaging technology is commonly used in cutting-edge biomedical fields such as neuron tracing, brain mapping, and tissue structure analysis.

The directory s3://aind-open-data/SmartSPIM-*** contains data collected using SmartSPIM technology, such as high-resolution 3D brain tissue images, raw imaging data, processed image data, and associated metadata.

Directories may be further subdivided by experiment, sample, date, animal ID, and other categories.

Where was the data currently stored in this dataset sourced from

AWS Cloud

If you answered "Other" in the previous question, enter the details here


If you are a data preparer. What is your location (Country/Region)

Singapore

If you are a data preparer, how will the data be prepared? Please include tooling used and technical details?

We use the AWS CLI to download the data from AWS, split it into appropriately sized chunks, and package them into CAR files. We then deliver the CAR files to the SPs via HTTP or email. We also use our own DataCap distributor to allocate DataCap to the SPs, which lets us control the number of CAR files distributed and the proportion of DataCap allocated in each round, ensuring fairness.
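The "split it into appropriate sizes" step can be sketched as a greedy first-fit grouping of files into batches that each fit one CAR file. This is an illustrative Python sketch only, not the applicant's exact tooling; the 30 GiB payload budget (headroom under a 32 GiB sector) and the first-fit-decreasing strategy are assumptions:

```python
# First-fit-decreasing grouping of files into CAR-sized batches.
# TARGET_BYTES is an assumed payload budget per CAR file, leaving
# headroom below a 32 GiB sector; real pipelines may differ.
TARGET_BYTES = 30 * 1024**3

def group_files(files):
    """files: list of (path, size_bytes) tuples.
    Returns a list of batches, each a list of paths whose
    sizes sum to at most TARGET_BYTES."""
    batches = []  # each entry: [remaining_capacity, [paths]]
    for path, size in sorted(files, key=lambda f: f[1], reverse=True):
        if size > TARGET_BYTES:
            raise ValueError(f"{path} exceeds one CAR budget; split it first")
        for batch in batches:
            if batch[0] >= size:          # first batch with room wins
                batch[0] -= size
                batch[1].append(path)
                break
        else:                             # no batch had room: open a new one
            batches.append([TARGET_BYTES - size, [path]])
    return [paths for _, paths in batches]

# Example with synthetic file sizes (GiB expressed in bytes):
GiB = 1024**3
sample = [("a.tif", 20 * GiB), ("b.tif", 12 * GiB),
          ("c.tif", 9 * GiB), ("d.tif", 18 * GiB)]
for i, batch in enumerate(group_files(sample)):
    print(i, batch)
# -> 0 ['a.tif', 'c.tif']
#    1 ['d.tif', 'b.tif']
```

Each resulting batch would then be handed to a CAR-packing tool; the grouping keeps every CAR under the sector budget without wasting much capacity.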

If you are not preparing the data, who will prepare the data? (Provide name and business)


Has this dataset been stored on the Filecoin network before? If so, please explain and make the case why you would like to store this dataset again to the network. Provide details on preparation and/or SP distribution.

I'm not sure whether this dataset has already been fully stored on the Filecoin network. It contains around 10 PiB of data.
https://github.com/MikeH1999/RFfil/issues/124
https://github.com/hash889900/HashTeam/issues/80
The two issues above are applications I've submitted to other allocators; they cover:
s3://aind-open-data/behavior-XXXXX 264.7632 TiB
s3://aind-open-data/fmost-XXXXX 102.6549 TiB
s3://aind-open-data/ecephys-XXXXX 634.6383 TiB
s3://aind-open-data/HCR-XXXXX 1205.8201 TiB
This time, I'm applying for:
s3://aind-open-data/SmartSPIM-XXXXX 7814.3519 TiB

The dataset to be stored in this application is: s3://aind-open-data/SmartSPIM-2023: 2074.3036 TiB

I also saw two other applications:
[DataCap Application] <AWS open data> - <Aind_mouse> TOPPOOL-LEE/Allocator-Pathway-TOP-POOL#34
[DataCap Application] <AWS open data> - <Aind_mouse> zcfil/ZCFIL#3
However, the DataCap they requested was under 1 PiB in each case, far less than the roughly 10 PiB of the entire dataset. Because this dataset is large and constantly updated, I spent some time calculating its size.
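Sizes like the TiB figures quoted above can be obtained with `aws s3 ls <prefix> --recursive --summarize` and totalled. A hedged Python sketch of the totalling step, parsing listing lines of the form the AWS CLI prints (the two listing lines here are synthetic examples, not real bucket contents):

```python
import re

# Sum object sizes from `aws s3 ls s3://bucket/prefix --recursive` output.
# Each line looks like: "<date> <time> <size_bytes> <key>".
# The listing below is synthetic: 1 TiB + 2 TiB of fake objects.
LISTING = """\
2023-02-02 22:28:35 1099511627776 SmartSPIM_000393/chunk0.tif
2023-02-20 19:47:09 2199023255552 SmartSPIM_000394/chunk1.tif
"""

def total_tib(listing: str) -> float:
    """Total the size column (bytes) and convert to TiB."""
    total = 0
    for line in listing.splitlines():
        m = re.match(r"\S+ \S+\s+(\d+)\s", line)
        if m:
            total += int(m.group(1))
    return total / 1024**4

print(f"{total_tib(LISTING):.2f} TiB")  # -> 3.00 TiB
```

In practice `--summarize` already prints a "Total Size" line, so only the bytes-to-TiB conversion is strictly needed; the parser is useful when totalling several prefixes at once.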

Please share a sample of the data

s3://aind-open-data/SmartSPIM_000393_2023-01-06_13-35-10_stitched_2023-02-02_22-28-35
s3://aind-open-data/SmartSPIM_000393_2023-01-06_13-35-10
s3://aind-open-data/SmartSPIM_000394_2023-01-05_14-56-34_stitched_2023-02-20_19-47-09
s3://aind-open-data/SmartSPIM_000394_2023-01-05_14-56-34
s3://aind-open-data/SmartSPIM_000397_2024-02-07_10-04-23_stitched_2024-03-06_04-02-49
s3://aind-open-data/SmartSPIM_000397_2024-02-07_10-04-23


s3://aind-open-data/SmartSPIM_XXXXXXX  7814.3519 TiB

Confirm that this is a public dataset that can be retrieved by anyone on the Network

  • I confirm

If you chose not to confirm, what was the reason


What is the expected retrieval frequency for this data

Yearly

For how long do you plan to keep this dataset stored on Filecoin

2 to 3 years

In which geographies do you plan on making storage deals

Greater China, Asia other than Greater China, North America, South America, Europe

How will you be distributing your data to storage providers

HTTP or FTP server, Shipping hard drives

How did you find your storage providers

Slack, Partners

If you answered "Others" in the previous question, what is the tool or platform you used


Please list the provider IDs and location of the storage providers you will be working with.

f03289271 Germany (deprecated)
f03315260 Germany (deprecated)
f03367915 United States (deprecated)
f03542777 Vietnam (deprecated)
f03607310 Brazil (deprecated)
f03603117 United States (deprecated)
f03559187 United States (deprecated)
f03100006 Hong Kong (deprecated)
f03100007 Hong Kong (deprecated)
f03649204 Hong Kong (deprecated)
f03649212 Hong Kong (deprecated)
f03649217 Singapore (deprecated)
f03649227 Singapore (deprecated)
f03673681 Brazil (deprecated)
f03099888 Germany (deprecated)
f03100088 United States (deprecated)
f03099287 Brazil (deprecated)
f03098965 Singapore (deprecated)
f03099101 Singapore (deprecated)
f03669112 Singapore (deprecated)

f03100014 Hong Kong (NEW)
f03099777 Germany (NEW)

How do you plan to make deals to your storage providers

Boost client

If you answered "Others/custom tool" in the previous question, enter the details here


Can you confirm that you will follow the Fil+ guideline

Yes
