Various datasets for machine learning, artificial intelligence, and data science. Maintained by the community: https://www.Neuromancer.kr/
- Pix2Pix
1995-05-02 to 2019-04-30 (24 years), 10 million records (CSV) https://github.com/FinanceData/marcap.git
Hyogo Prefecture, Prefecture-Wide Digital Topographic Maps, Portal (FY2010 to FY2018) https://www.geospatial.jp/ckan/dataset/2010-2018-hyogo-geo-potal
Referenced from https://github.com/rudvlf0413/Dataset.git
- Dog Breed Identification dataset
- The dataset is designed for a multiclass classification problem, as it covers 120 breeds of dogs.
- https://www.kaggle.com/c/dog-breed-identification/data
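For a multiclass problem such as this one, breed names are typically mapped to integer class indices before training. The following is a minimal sketch of that step, assuming the labels arrive as (image id, breed) pairs. The sample rows and the function names `build_label_index` and `encode_labels` are hypothetical and are not taken from the Kaggle files.

```python
from typing import Dict, List, Tuple


def build_label_index(breeds: List[str]) -> Dict[str, int]:
    """Map each distinct breed name to a stable integer class index."""
    # Sorting makes the mapping reproducible across runs.
    return {breed: idx for idx, breed in enumerate(sorted(set(breeds)))}


def encode_labels(
    rows: List[Tuple[str, str]], label_index: Dict[str, int]
) -> List[Tuple[str, int]]:
    """Replace each breed name with its class index."""
    return [(image_id, label_index[breed]) for image_id, breed in rows]


# Hypothetical sample rows, for illustration only.
sample_rows = [("img_001", "beagle"), ("img_002", "pug"), ("img_003", "beagle")]

label_index = build_label_index([breed for _, breed in sample_rows])
encoded = encode_labels(sample_rows, label_index)
```

With the full dataset, the same mapping would yield 120 class indices, one per breed.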
Dataset: http://www.openslr.org/60/
- https://research.google.com/youtube8m/index.html?fbclid=IwAR3JtSscHE1npIsYNwLpJtnSN_Oym_zO6TJTMSoVPv6u6FogzjunKVisyHI
- A dataset from Google AI that extends part of the previously released YouTube-8M with segment-level annotations
- The original YouTube-8M provided machine-generated labels at the video/frame level, whereas this release provides segment-level labels manually verified by humans
- For 1,000 classes:
- 237K labels (manually verified by humans)
- An average of 5 segments per video
- Each segment is a 5-second span randomly sampled from the video
- The annotation format is similar to the original YouTube-8M (segment start and end, plus label information for each segment).
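The segment-level annotations described above (a start, an end, and a label per 5-second segment) can be pictured with a small sketch. This is an illustration only: the `SegmentAnnotation` class, its field names, and the uniform sampling in `sample_segments` are assumptions, not the official YouTube-8M file format or Google's sampling procedure.

```python
import random
from dataclasses import dataclass
from typing import List

SEGMENT_SECONDS = 5.0  # segment length stated in the description above


@dataclass(frozen=True)
class SegmentAnnotation:
    """Hypothetical record for one human-verified segment label."""
    video_id: str
    start: float  # segment start, in seconds
    end: float    # segment end, in seconds
    label: int    # class index, assumed to lie in [0, 1000)


def sample_segments(
    video_id: str, duration: float, labels: List[int], rng: random.Random
) -> List[SegmentAnnotation]:
    """Draw one random 5-second window per label (assumed uniform sampling)."""
    if duration < SEGMENT_SECONDS:
        raise ValueError("video is shorter than one segment")
    segments = []
    for label in labels:
        start = rng.uniform(0.0, duration - SEGMENT_SECONDS)
        segments.append(
            SegmentAnnotation(video_id, start, start + SEGMENT_SECONDS, label)
        )
    return segments


# Five segments for one hypothetical 120-second video.
rng = random.Random(0)
segments = sample_segments("video_a", 120.0, [3, 17, 3, 250, 999], rng)
```

Seeding the generator keeps the sampled windows reproducible, which helps when comparing runs.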
- tencent-ml-images
- https://github.com/NVlabs/ffhq-dataset
- Coil-20
- MS COCO
- NVIDIA food Image classification
- CIFAR-10, CIFAR-100
- Large-scale CelebFaces Attributes (CelebA) Dataset
- Street View House Numbers (SVHN)
- MNIST
- Facial Database
- Simple Vector Drawing Datasets
- Places2 (Space)
- Yelp dataset (restaurants)
- DeepFashion
- Image to LaTeX (a dataset for converting images of formulas into LaTeX code)
- NIST Dataset(Fingerprint, Mugshot, OCR)
- Biometrics Ideal Test dataset (iris, fingerprint, face, palmprint, handwriting, etc.; login required)
- PASCAL 2012 Dataset (Classification & Detection)
- Lung cancer dataset
- Brain tumor dataset
- Breast cancer dataset (kaggle)
- The cancer image archive
- Mammography dataset
- Bio Image Dataset @ IIIT Delhi
- CAMELYON16 - metastasis detection in lymph nodes
- CAMELYON17 Dataset
- YouTube-BoundingBoxes Dataset
- Youtube-8M Dataset
- The Kinetics Human Action Video Dataset
- StatMT(Machine Translation, summarization)
- UN Parallel Corpus
- IWSLT Dataset (including TED Translation)
- The Stacks Project
- (A paired set of an algebraic geometry book's original text and its LaTeX code?)
- http://stacks.math.columbia.edu/
- Google sentence compression (sentence data standardized by Google)
- Annals of the Joseon Dynasty (Korean / Classical Chinese translation)
- 20 Newsgroups
- Reuters dataset
- Tweet data, a subset of TREC 2011 microblog track
- Title data, including news titles with class labels from some news websites
- bAbI dataset (Facebook Question Answering)
- Question answering (fill-in-the-blank inference problems) pairs using CNN/Daily Mail articles
- Stanford Question Answering Dataset
- CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning
- WikiReading dataset
- Datasets used for Word2Vec (Wikipedia, WMT11, etc.) https://code.google.com/archive/p/word2vec/
- fastText pre-trained vector set
- Stanford Sentiment Treebank(SST)
- Nottingham music dataset
- A large-scale dataset of manually annotated audio events (Google research)
- Freebase
- WordNet
- Microsoft Concept Graph
- DBpedia Dataset
- The DBpedia data set uses a large multi-domain ontology which has been derived from Wikipedia as well as localized versions of DBpedia in more than 100 l
- http://wiki.dbpedia.org/services-resources/datasets/dbpedia-datasets
- YAGO
- YAGO3 is a huge semantic knowledge base, derived from Wikipedia, WordNet, and GeoNames.
- https://datahub.io/ko_KR/dataset/yago
- Google Knowledge Graph API
- AMiner - Datasets for social network Analysis
- Netflix Prize Data Set
- Paper bibliography dataset, Author Citation Networks
- Politics subreddit
- Amazon dataset
- Twitter Spammer network
- Twitter tweets
- Online reviews
- Word2Vec
- GloVe
- FastText
- SKT Bigdata hub
- Titanic survivors dataset
- Obama's political speeches
- Yahoo Finance dataset
- Linux code
- NYC Taxi dataset
- US Census dataset
- Diamond.csv
- countries.csv
- exprs_GSE5859.csv
- movies.dat
- movie_lines.txt
- movie conversation
- mtcars.csv
- pollster_cleaned_2002_2008.csv
- pollster_cleaned_2010.csv
- pollster_cleaned_2012.csv
- kospi_kospi.csv