[DataCap Application] <AIND> - <Smart Selective Plane Illumination Microscopy-2023>

### Data Owner Name

Allen Institute for Neural Dynamics (AIND)

### Data Owner Country/Region

United States

### Data Owner Industry

Life Science / Healthcare

### Website

https://allenneuraldynamics.github.io/data.html#aind-open-data

### Social Media Handle

aindopendata@alleninstitute.org

### Social Media Type

Other

### What is your role related to the dataset

Data Preparer

### Total amount of DataCap being requested

20PiB

### Expected size of single dataset (one copy)

2.5PiB

### Number of replicas to store

8

### Weekly allocation of DataCap requested

2PiB
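
The three figures above are related by simple arithmetic. The sketch below is only a consistency check on the numbers given in this form; the variable names are illustrative, and it assumes the weekly allocation stays constant.

```python
import math

# Values copied from the form fields above.
single_copy_pib = 2.5   # expected size of one copy
replicas = 8            # number of replicas to store
weekly_pib = 2.0        # weekly allocation requested

# Total DataCap is one copy multiplied by the replica count.
total_pib = single_copy_pib * replicas

# Weeks needed to consume the total at a constant weekly rate (assumption).
weeks_needed = math.ceil(total_pib / weekly_pib)

print(f"total DataCap: {total_pib} PiB over about {weeks_needed} weeks")
```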

### On-chain address for first allocation

f1r7oh66iquzdjqnhzsngcxc4dqskdzhzdpwjzvmq

### Data Type of Application

Public, Open Dataset (Research/Non-Profit)

### Custom multisig

- [ ] Use Custom Multisig

### Identifier

_No response_

### Share a brief history of your project and organization

```text
Data Access
The Allen Institute for Neural Dynamics (AIND) is committed to FAIR, Open, and Reproducible science. We therefore share all of the data we collect publicly with rich metadata as near to the time of collection as possible. We share data at all stages of the data lifecycle, including preliminary data collected during methods development, processed data that we are actively improving, or highly curated data used in a publication.

aind-open-data
In addition to sharing curated datasets with modality-specific NIH data archives like DANDI and BIL, we are also excited to share all of our data in one public S3 bucket generously hosted by the Registry of Open Data on AWS.

All data is stored here: s3://aind-open-data
```

### Is this project associated with other projects/ecosystem stakeholders?

No

### If answered yes, what are the other projects/ecosystem stakeholders

```text

```

### Describe the data being stored onto Filecoin

```text
SmartSPIM is a high-throughput, three-dimensional, whole-brain imaging technology commonly used in neuroscience and biomedical research.

SPIM stands for Selective Plane Illumination Microscopy, also known as light sheet microscopy.

SmartSPIM is a commercial light sheet microscopy imaging platform developed by LifeCanvas Technologies. It enables rapid, non-destructive, three-dimensional imaging of large tissues, such as mouse brains and brain slices.

This imaging technology is commonly used in cutting-edge biomedical fields such as neuron tracing, brain mapping, and tissue structure analysis.

The directory s3://aind-open-data/SmartSPIM-*** contains data collected using SmartSPIM technology, such as high-resolution 3D brain tissue images, raw imaging data, processed image data, and associated metadata.

Directories may be further subdivided by experiment, sample, date, animal ID, and other categories.
```

### Where was the data currently stored in this dataset sourced from

AWS Cloud

### If you answered "Other" in the previous question, enter the details here

```text

```

### If you are a data preparer. What is your location (Country/Region)

Singapore

### If you are a data preparer, how will the data be prepared? Please include tooling used and technical details?

```text
We use the AWS CLI to download the data from AWS, split it into appropriately sized pieces, and package the pieces into CAR files. We then send the CAR files to the SPs via HTTP or email. We also use our own DataCap (DC) distribution tool to allocate DataCap to the SPs. This lets us control how many CAR files are distributed and what proportion of DataCap is allocated in each round, which keeps the distribution fair.
```
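
The splitting and distribution steps described above could look roughly like the following. This is a minimal sketch, not the preparer's actual tooling: the 30 GiB target size, the greedy grouping rule, the round-robin assignment, and the function names `plan_car_batches` and `assign_round_robin` are all assumptions made for illustration. It only plans which objects go into which CAR file; downloading and CAR packaging are out of scope.

```python
from typing import Dict, List, Sequence, Tuple

GIB = 1024 ** 3
# Assumption: target payload per CAR file. The application does not state one.
TARGET_CAR_BYTES = 30 * GIB

FileEntry = Tuple[str, int]  # (object key, size in bytes)


def plan_car_batches(
    files: Sequence[FileEntry], target_bytes: int = TARGET_CAR_BYTES
) -> List[List[FileEntry]]:
    """Group files, in listing order, into batches of at most target_bytes.

    A file larger than target_bytes ends up alone in its own batch and
    would need to be split separately before packaging.
    """
    batches: List[List[FileEntry]] = []
    current: List[FileEntry] = []
    current_size = 0
    for key, size in files:
        # Close the current batch if adding this file would overflow it.
        if current and current_size + size > target_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.append((key, size))
        current_size += size
    if current:
        batches.append(current)
    return batches


def assign_round_robin(batch_count: int, providers: Sequence[str]) -> Dict[int, str]:
    """Map each batch index to a provider in turn (one simple notion of fairness)."""
    return {i: providers[i % len(providers)] for i in range(batch_count)}


# Hypothetical object sizes, for illustration only.
example_files = [
    ("a", 10 * GIB), ("b", 15 * GIB), ("c", 10 * GIB), ("d", 40 * GIB), ("e", 1 * GIB),
]
example_batches = plan_car_batches(example_files)
example_assignment = assign_round_robin(len(example_batches), ["f03100014", "f03099777"])
```

Grouping in listing order keeps objects from the same directory together, which tends to make later retrieval of a whole experiment simpler than a size-optimal packing would.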

### If you are not preparing the data, who will prepare the data?  (Provide name and business)

```text

```

### Has this dataset been stored on the Filecoin network before? If so, please explain and make the case why you would like to store this dataset again to the network. Provide details on preparation and/or SP distribution.

```text
I'm not sure whether this dataset has already been fully stored on the Filecoin network. The full dataset is around 10 PiB.
https://github.com/MikeH1999/RFfil/issues/124
https://github.com/hash889900/HashTeam/issues/80
These two issues are applications I've submitted to other allocators. They cover:
s3://aind-open-data/behavior-XXXXX 264.7632 TiB
s3://aind-open-data/fmost-XXXXX 102.6549 TiB
s3://aind-open-data/ecephys-XXXXX 634.6383 TiB
s3://aind-open-data/HCR-XXXXX 1205.8201 TiB
This time, I'm applying for:
s3://aind-open-data/SmartSPIM-XXXXX 7814.3519 TiB

The dataset to be stored in this application is: s3://aind-open-data/SmartSPIM-2023: 2074.3036 TiB

I also saw two other applications:
[DataCap Application] <AWS open data> - <Aind_mouse> TOPPOOL-LEE/Allocator-Pathway-TOP-POOL#34
[DataCap Application] <AWS open data> - <Aind_mouse> zcfil/ZCFIL#3
However, the DataCap they requested was under 1 PiB in each case, far less than the roughly 10 PiB of the entire dataset. The dataset is large and continuously updated, so I spent some time calculating its size.
```
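
The "around 10 PiB" figure can be cross-checked against the per-prefix sizes listed above. The snippet below is a small worked calculation using only those numbers, assuming binary units (1 PiB = 1024 TiB); the dictionary keys are shortened labels for the listed prefixes.

```python
TIB_PER_PIB = 1024  # binary units, as the TiB/PiB suffixes imply

# Sizes in TiB, copied from the list above.
sizes_tib = {
    "behavior": 264.7632,
    "fmost": 102.6549,
    "ecephys": 634.6383,
    "HCR": 1205.8201,
    "SmartSPIM": 7814.3519,
}

total_tib = sum(sizes_tib.values())
total_pib = total_tib / TIB_PER_PIB

# The SmartSPIM-2023 subset named in this application.
smartspim_2023_pib = 2074.3036 / TIB_PER_PIB

print(f"all listed prefixes: {total_tib:.4f} TiB = {total_pib:.2f} PiB")
print(f"SmartSPIM-2023:      {smartspim_2023_pib:.2f} PiB")
```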

### Please share a sample of the data

```text
s3://aind-open-data/SmartSPIM_000393_2023-01-06_13-35-10_stitched_2023-02-02_22-28-35
s3://aind-open-data/SmartSPIM_000393_2023-01-06_13-35-10
s3://aind-open-data/SmartSPIM_000394_2023-01-05_14-56-34_stitched_2023-02-20_19-47-09
s3://aind-open-data/SmartSPIM_000394_2023-01-05_14-56-34
s3://aind-open-data/SmartSPIM_000397_2024-02-07_10-04-23_stitched_2024-03-06_04-02-49
s3://aind-open-data/SmartSPIM_000397_2024-02-07_10-04-23


s3://aind-open-data/SmartSPIM_XXXXXXX  7814.3519 TiB
```
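
The sample paths above appear to follow a naming pattern: `SmartSPIM_<subject>_<acquisition timestamp>`, with an optional `_stitched_<processing timestamp>` suffix. That pattern is inferred from these samples only and is not a documented specification. Under that assumption, a sketch like the following could parse a prefix and, for example, select assets by acquisition year. The names `SmartSpimAsset` and `parse_asset_name` are hypothetical.

```python
import re
from datetime import datetime
from typing import NamedTuple, Optional

_TS = r"\d{4}-\d{2}-\d{2}_\d{2}-\d{2}-\d{2}"
_TS_FORMAT = "%Y-%m-%d_%H-%M-%S"
# Assumed pattern, inferred from the sample paths in this application.
_PATTERN = re.compile(
    rf"^SmartSPIM_(?P<subject>\d+)_(?P<acquired>{_TS})"
    rf"(?:_stitched_(?P<stitched>{_TS}))?$"
)


class SmartSpimAsset(NamedTuple):
    subject_id: str
    acquired: datetime
    stitched: Optional[datetime]  # None for raw (unstitched) assets


def parse_asset_name(uri: str) -> Optional[SmartSpimAsset]:
    """Parse the last path component of an S3 URI; return None if it does not match."""
    name = uri.rstrip("/").rsplit("/", 1)[-1]
    match = _PATTERN.match(name)
    if match is None:
        return None
    stitched = match.group("stitched")
    return SmartSpimAsset(
        subject_id=match.group("subject"),
        acquired=datetime.strptime(match.group("acquired"), _TS_FORMAT),
        stitched=datetime.strptime(stitched, _TS_FORMAT) if stitched else None,
    )


samples = [
    "s3://aind-open-data/SmartSPIM_000393_2023-01-06_13-35-10_stitched_2023-02-02_22-28-35",
    "s3://aind-open-data/SmartSPIM_000394_2023-01-05_14-56-34",
    "s3://aind-open-data/SmartSPIM_000397_2024-02-07_10-04-23",
]
parsed = [parse_asset_name(uri) for uri in samples]
# Keep only assets acquired in 2023 (assuming "2023" refers to acquisition year).
acquired_2023 = [a for a in parsed if a is not None and a.acquired.year == 2023]
```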

### Confirm that this is a public dataset that can be retrieved by anyone on the Network

- [x] I confirm

### If you chose not to confirm, what was the reason

```text

```

### What is the expected retrieval frequency for this data

Yearly

### For how long do you plan to keep this dataset stored on Filecoin

2 to 3 years

### In which geographies do you plan on making storage deals

Greater China, Asia other than Greater China, North America, South America, Europe

### How will you be distributing your data to storage providers

HTTP or FTP server, Shipping hard drives

### How did you find your storage providers

Slack, Partners

### If you answered "Others" in the previous question, what is the tool or platform you used

```text

```

### Please list the provider IDs and location of the storage providers you will be working with.

```text
f03289271 Germany (deprecated)
f03315260 Germany (deprecated)
f03367915 United States (deprecated)
f03542777 Vietnam (deprecated)
f03607310 Brazil (deprecated)
f03603117 United States (deprecated)
f03559187 United States (deprecated)
f03100006 Hong Kong (deprecated)
f03100007 Hong Kong (deprecated)
f03649204 Hong Kong (deprecated)
f03649212 Hong Kong (deprecated)
f03649217 Singapore (deprecated)
f03649227 Singapore (deprecated)
f03673681 Brazil (deprecated)
f03099888 Germany (deprecated)
f03100088 United States (deprecated)
f03099287 Brazil (deprecated)
f03098965 Singapore (deprecated)
f03099101 Singapore (deprecated)
f03669112 Singapore (deprecated)

f03100014 Hong Kong (NEW)
f03099777 Germany (NEW)
```

### How do you plan to make deals to your storage providers

Boost client

### If you answered "Others/custom tool" in the previous question, enter the details here

```text

```

### Can you confirm that you will follow the Fil+ guideline

Yes

[DataCap Application] <AIND> - <Smart Selective Plane Illumination Microscopy-2023> #172
