Data Owner Name
Allen Institute for Neural Dynamics (AIND)
Data Owner Country/Region
United States
Data Owner Industry
Life Science / Healthcare
Website
https://allenneuraldynamics.github.io/data.html#aind-open-data
Social Media Handle
aindopendata@alleninstitute.org
Social Media Type
Other
What is your role related to the dataset
Data Preparer
Total amount of DataCap being requested
20PiB
Expected size of single dataset (one copy)
2.5PiB
Number of replicas to store
8
Weekly allocation of DataCap requested
2PiB
On-chain address for first allocation
f1r7oh66iquzdjqnhzsngcxc4dqskdzhzdpwjzvmq
Data Type of Application
Public, Open Dataset (Research/Non-Profit)
Custom multisig
Identifier
No response
Share a brief history of your project and organization
Data Access
The Allen Institute for Neural Dynamics (AIND) is committed to FAIR, Open, and Reproducible science. We therefore share all of the data we collect publicly with rich metadata as near to the time of collection as possible. We share data at all stages of the data lifecycle, including preliminary data collected during methods development, processed data that we are actively improving, or highly curated data used in a publication.
aind-open-data
In addition to sharing curated datasets with modality-specific NIH data archives like DANDI and BIL, we are also excited to share all of our data in one public S3 bucket generously hosted by the Registry of Open Data on AWS.
All data is stored here: s3://aind-open-data
Is this project associated with other projects/ecosystem stakeholders?
No
If answered yes, what are the other projects/ecosystem stakeholders
Describe the data being stored onto Filecoin
SmartSPIM is a high-throughput, three-dimensional, whole-brain imaging technology commonly used in neuroscience and biomedical research.
SPIM stands for Selective Plane Illumination Microscopy, more commonly known as light sheet microscopy.
SmartSPIM is a commercial light sheet microscopy imaging platform developed by LifeCanvas Technologies. It enables rapid, non-destructive, three-dimensional imaging of large tissues, such as mouse brains and brain slices.
This imaging technology is commonly used in cutting-edge biomedical fields such as neuron tracing, brain mapping, and tissue structure analysis.
The directory s3://aind-open-data/SmartSPIM-*** contains data collected using SmartSPIM technology, such as high-resolution 3D brain tissue images, raw imaging data, processed image data, and associated metadata.
Directories may be further subdivided by experiment, sample, date, animal ID, and other categories.
Where was the data currently stored in this dataset sourced from
AWS Cloud
If you answered "Other" in the previous question, enter the details here
If you are a data preparer. What is your location (Country/Region)
Singapore
If you are a data preparer, how will the data be prepared? Please include tooling used and technical details?
We use the AWS CLI to download the data from AWS, split it into appropriately sized pieces, and package the pieces into CAR files. We then deliver the CAR files to the storage providers (SPs) via HTTP or email. We also use our own DataCap (DC) distributor to allocate DataCap to the SPs. This lets us control both the number of CAR files distributed and the share of DataCap allocated in each round, which keeps the distribution fair.
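As a rough illustration of the splitting step described above, the sketch below groups a list of objects into CAR-sized batches. It is a minimal sketch, not our actual tooling: the 30 GiB target, the `CarPlan` class, and the `plan_cars` function are hypothetical names and values chosen for this example, and the real pipeline may size and pack files differently.

```python
from dataclasses import dataclass, field

GIB = 1024 ** 3
# Assumed target payload per CAR file (hypothetical value for this sketch).
TARGET_CAR_BYTES = 30 * GIB


@dataclass
class CarPlan:
    """One planned CAR file: a list of (key, offset, length) byte ranges."""
    pieces: list = field(default_factory=list)
    size: int = 0


def plan_cars(objects, target=TARGET_CAR_BYTES):
    """Group (key, size) pairs into batches of at most `target` bytes.

    Objects larger than the remaining room are split across batches,
    so every batch except possibly the last is filled exactly.
    """
    plans = [CarPlan()]
    for key, size in objects:
        offset = 0
        while offset < size:
            current = plans[-1]
            room = target - current.size
            if room == 0:
                # Current batch is full; start a new one.
                current = CarPlan()
                plans.append(current)
                room = target
            length = min(room, size - offset)
            current.pieces.append((key, offset, length))
            current.size += length
            offset += length
    # Drop the initial batch if nothing was ever placed in it.
    return [p for p in plans if p.pieces]
```

Each entry in a plan is a byte range of one source object, so a plan can be handed to whatever download and CAR-packing tools are in use.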
If you are not preparing the data, who will prepare the data? (Provide name and business)
Has this dataset been stored on the Filecoin network before? If so, please explain and make the case why you would like to store this dataset again to the network. Provide details on preparation and/or SP distribution.
I'm not sure whether this dataset has been fully stored on the Filecoin network. It contains around 10 PiB of data.
https://github.com/MikeH1999/RFfil/issues/124
https://github.com/hash889900/HashTeam/issues/80
These are two applications I've submitted to other allocators. They cover:
s3://aind-open-data/behavior-XXXXX 264.7632 TiB
s3://aind-open-data/fmost-XXXXX 102.6549 TiB
s3://aind-open-data/ecephys-XXXXX 634.6383 TiB
s3://aind-open-data/HCR-XXXXX 1205.8201 TiB
This time, I'm applying for:
s3://aind-open-data/SmartSPIM-XXXXX 7814.3519 TiB
The dataset to be stored in this application is: s3://aind-open-data/SmartSPIM-2023: 2074.3036 TiB
I also saw two other applications:
[DataCap Application] <AWS open data> - <Aind_mouse> TOPPOOL-LEE/Allocator-Pathway-TOP-POOL#34
[DataCap Application] <AWS open data> - <Aind_mouse> zcfil/ZCFIL#3
However, each of them requested less than 1 PiB of DataCap, far below the roughly 10 PiB of the full dataset. The dataset is large and constantly updated, so I spent some time calculating its size.
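The size figures quoted above can be cross-checked with a short tally. This is only a sketch of the arithmetic: the per-prefix sizes, the single-copy size, and the replica count are copied from this application, while the variable names are made up for the example.

```python
TIB_PER_PIB = 1024

# Per-prefix sizes in TiB, as listed in this application.
prefix_sizes_tib = {
    "s3://aind-open-data/behavior-XXXXX": 264.7632,
    "s3://aind-open-data/fmost-XXXXX": 102.6549,
    "s3://aind-open-data/ecephys-XXXXX": 634.6383,
    "s3://aind-open-data/HCR-XXXXX": 1205.8201,
    "s3://aind-open-data/SmartSPIM-XXXXX": 7814.3519,
}

# Whole-bucket total, which is where the "around 10 PiB" estimate comes from.
total_tib = sum(prefix_sizes_tib.values())
total_pib = total_tib / TIB_PER_PIB

# Figures requested in this application.
single_copy_pib = 2.5
replicas = 8
requested_pib = single_copy_pib * replicas

print(f"Whole bucket: {total_tib:.4f} TiB ({total_pib:.2f} PiB)")
print(f"Requested DataCap: {requested_pib:g} PiB")
```

The requested total is the single-copy size multiplied by the replica count, which matches the 20 PiB stated at the top of the form.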
Please share a sample of the data
s3://aind-open-data/SmartSPIM_000393_2023-01-06_13-35-10_stitched_2023-02-02_22-28-35
s3://aind-open-data/SmartSPIM_000393_2023-01-06_13-35-10
s3://aind-open-data/SmartSPIM_000394_2023-01-05_14-56-34_stitched_2023-02-20_19-47-09
s3://aind-open-data/SmartSPIM_000394_2023-01-05_14-56-34
s3://aind-open-data/SmartSPIM_000397_2024-02-07_10-04-23_stitched_2024-03-06_04-02-49
s3://aind-open-data/SmartSPIM_000397_2024-02-07_10-04-23
s3://aind-open-data/SmartSPIM_XXXXXXX 7814.3519 TiB
Confirm that this is a public dataset that can be retrieved by anyone on the Network
If you chose not to confirm, what was the reason
What is the expected retrieval frequency for this data
Yearly
For how long do you plan to keep this dataset stored on Filecoin
2 to 3 years
In which geographies do you plan on making storage deals
Greater China, Asia other than Greater China, North America, South America, Europe
How will you be distributing your data to storage providers
HTTP or FTP server, Shipping hard drives
How did you find your storage providers
Slack, Partners
If you answered "Others" in the previous question, what is the tool or platform you used
Please list the provider IDs and location of the storage providers you will be working with.
f03289271 Germany (deprecated)
f03315260 Germany (deprecated)
f03367915 United States (deprecated)
f03542777 Vietnam (deprecated)
f03607310 Brazil (deprecated)
f03603117 United States (deprecated)
f03559187 United States (deprecated)
f03100006 Hong Kong (deprecated)
f03100007 Hong Kong (deprecated)
f03649204 Hong Kong (deprecated)
f03649212 Hong Kong (deprecated)
f03649217 Singapore (deprecated)
f03649227 Singapore (deprecated)
f03673681 Brazil (deprecated)
f03099888 Germany (deprecated)
f03100088 United States (deprecated)
f03099287 Brazil (deprecated)
f03098965 Singapore (deprecated)
f03099101 Singapore (deprecated)
f03669112 Singapore (deprecated)
f03100014 Hong Kong (NEW)
f03099777 Germany (NEW)
How do you plan to make deals to your storage providers
Boost client
If you answered "Others/custom tool" in the previous question, enter the details here
Can you confirm that you will follow the Fil+ guideline
Yes