Public Datasets

FloydHub hosts the popular public datasets listed below. Using them saves you from re-uploading large datasets to our servers. To use one, pass its Floyd Data ID with the --data option in the run command.

| Dataset Name | Floyd Data ID | Description |
|---|---|---|
| MS COCO 2014 Training Images | jq4ZXUCSVer4t65rWeyieG | Contains around 80k images |
| Imagenet VGG Very Deep 19 | jq4ZXUCSVer4t65rWeyieG | 19 weight layers pre-trained Convnet model |
| MNIST | Gbya2j64ApqjSHt3vDpdSh | Database for handwritten digits |
| CALTECH 101/256 | Z48LF4K75SeyGbLnfpXbCP | Pictures of objects belonging to 101/256 categories |
| Quora Duplicate Questions | XeyQLG4nb2psqRjmzCTsbN | Contains over 400K lines of potential question duplicate pairs |
| CIFAR 10/100 | diSgciLH4WA7HpcHNasP9j | Subset of 80 million tiny images dataset |
| Cats vs Dogs Redux: Kernels Edition | SyccinddLDdS7p3vzcwGQ2 | Dataset for Kaggle's famous Dogs vs Cats competition |
| CycleGAN | f9RVzpea4vb9uCLaDggUgX | Dataset for CycleGAN |
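
As a rough illustration of passing a Data ID as --data, the sketch below looks up an ID from the list above and assembles a run command. It assumes the CLI executable is named floyd and that the job command is given as a single trailing argument; the helper build_run_command and the script name train.py are hypothetical and only serve the example.

```python
import shlex

# Data IDs copied from the dataset list above (a subset, for brevity).
PUBLIC_DATASETS = {
    "MNIST": "Gbya2j64ApqjSHt3vDpdSh",
    "CIFAR 10/100": "diSgciLH4WA7HpcHNasP9j",
    "CycleGAN": "f9RVzpea4vb9uCLaDggUgX",
}


def build_run_command(dataset_name, job_command):
    """Return an argument list for a run that uses a public dataset.

    Hypothetical helper: assumes the CLI is invoked as `floyd run`
    and accepts the job command as one trailing argument.
    """
    data_id = PUBLIC_DATASETS[dataset_name]  # raises KeyError for unlisted names
    return ["floyd", "run", "--data", data_id, job_command]


# train.py is a placeholder for your own training script.
args = build_run_command("MNIST", "python train.py")

# Print the command in a form that can be pasted into a shell.
print(shlex.join(args))
```

The argument list can also be handed to subprocess.run(args) directly, which avoids shell quoting issues with the job command.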

If you have requests or suggestions for public datasets to add to our servers, let us know at contact@floydhub.com.


Help make this document better

This guide, as well as the rest of our docs, is open source and available on GitHub. We welcome your contributions.