10 Best Image Classification Datasets for ML Projects

To help you build object recognition models, scene recognition models, and more, we’ve compiled a list of the best image classification datasets. These datasets vary in scope and size and can suit a variety of use cases. They are grouped into three categories: medical imaging, agriculture and scene recognition, and other.

Medical Image Classification Datasets

1. Recursion Cellular Image Classification – This data comes from the Recursion 2019 challenge. The goal of the competition was to use biological microscopy data to develop a model that identifies replicates. Full details are available on the competition page.

2. TensorFlow patch_camelyon Medical Images – This medical image classification dataset comes from the TensorFlow website. It contains just over 327,000 color images, each 96 x 96 pixels. The images are patches taken from histopathological scans of lymph node sections, and each one is labeled according to whether it contains metastatic tissue.
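As a rough illustration of what the patch_camelyon numbers mean in practice, the sketch below estimates the uncompressed size of the image data and shows one way the dataset might be loaded. The figure of 327,000 images is the approximate count quoted above, not an exact one. The helper names are hypothetical, and the loader assumes the optional tensorflow_datasets package is installed and that the dataset is registered there under the name "patch_camelyon".

```python
def estimate_uint8_bytes(num_images, height, width, channels=3):
    """Raw size in bytes of num_images uint8 images, ignoring compression."""
    return num_images * height * width * channels


def load_patch_camelyon(split="train"):
    """Load one split via TensorFlow Datasets (assumed installed).

    The first call downloads the data, which can take a long time.
    With as_supervised=True the dataset yields (image, label) pairs.
    """
    import tensorflow_datasets as tfds  # imported lazily: optional dependency

    return tfds.load("patch_camelyon", split=split, as_supervised=True)


# Approximate count from the description above; 96 x 96 RGB images.
approx_bytes = estimate_uint8_bytes(327_000, 96, 96)
print(f"~{approx_bytes / 2**30:.1f} GiB uncompressed")
```

An estimate like this is useful for deciding whether the images can be held in memory or should be streamed in batches.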

Agriculture and Scene Datasets

3. CoastSat Image Classification Dataset – Used for an open-source shoreline mapping tool, this dataset includes aerial images taken from satellites. The dataset also includes metadata pertaining to the labels.

4. Images for Weather Recognition – Used for multi-class weather recognition, this dataset is a collection of 1,125 images divided into four categories: sunrise, shine, rain, and cloudy.

5. Indoor Scenes Images – From MIT, this dataset contains over 15,000 images of indoor locations. The dataset was originally built to tackle the problem of indoor scene recognition. All images are in JPEG format and have been divided into 67 categories. The number of images per category varies, but there are at least 100 images in each category.

6. Intel Image Classification – Created by Intel for an image classification contest, this expansive image dataset contains approximately 25,000 images. Furthermore, the images are divided into the following categories: buildings, forest, glacier, mountain, sea, and street. The dataset has been divided into folders for training, testing, and prediction. The training folder includes around 14,000 images and the testing folder has around 3,000 images. Finally, the prediction folder includes around 7,000 images.

7. TensorFlow Sun397 Image Classification Dataset – Another dataset from TensorFlow, this dataset contains over 108,000 images used in the Scene Understanding (SUN) benchmark. The images have been divided into 397 categories. The exact number of images in each category varies, but there are at least 100 images in each of the various scene and object categories.
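Several of the datasets in this section are organized into folders, and some only guarantee a minimum number of images per category. A quick sanity check after downloading is to count the images in each class folder. The sketch below is a minimal example that assumes a layout of one subfolder per category inside a split directory. The function name, the folder layout, and the list of file extensions are assumptions for illustration, not a description of any particular download.

```python
from collections import Counter
from pathlib import Path

# File extensions treated as images (an assumption; adjust as needed).
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def count_images_per_class(split_dir):
    """Count image files in each immediate subfolder of split_dir.

    Each subfolder is treated as one category, named after the folder.
    """
    counts = Counter()
    for class_dir in sorted(Path(split_dir).iterdir()):
        if not class_dir.is_dir():
            continue  # skip stray files next to the class folders
        counts[class_dir.name] = sum(
            1
            for f in class_dir.rglob("*")
            if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
        )
    return counts


def classes_below(counts, minimum=100):
    """Return the categories that have fewer than `minimum` images."""
    return sorted(name for name, n in counts.items() if n < minimum)
```

For example, calling count_images_per_class on a training folder and then classes_below(counts, 100) would flag any category that falls short of the 100-image minimum mentioned above.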

Other Image Classification Datasets

8. Architectural Heritage Elements – This dataset was created to train models that could classify architectural images, based on cultural heritage. It contains over 10,000 images divided into 10 categories. The categories are: altar, apse, bell tower, column, dome (inner), dome (outer), flying buttress, gargoyle, stained glass, and vault.

9. Image Classification: People and Food – This dataset comes in CSV format and consists of images of people eating food. Human annotators classified the images by gender and age. The CSV file includes 587 rows of data with URLs linking to each image.

10. Images of Cracks in Concrete for Classification – From Mendeley, this dataset includes 40,000 images of concrete. Each image is 227 x 227 pixels, with half of the images including concrete with cracks and half without.
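For a dataset that ships as a CSV of image URLs plus annotations, such as the People and Food dataset above, the first step is usually to read the rows and keep the ones with a usable link. The sketch below is one possible approach using only the standard library. The column names (image_url, gender, age) and the sample rows are hypothetical, since the actual header of the file is not described here.

```python
import csv
import io


def read_image_records(csv_file, url_column="image_url"):
    """Return the rows whose URL column holds an http(s) link.

    csv_file is any open text file object. The url_column default is
    a hypothetical name; pass the real header name for your file.
    """
    records = []
    for row in csv.DictReader(csv_file):
        url = (row.get(url_column) or "").strip()
        if url.startswith(("http://", "https://")):
            records.append(row)
    return records


# Tiny in-memory stand-in for the real file (made-up rows).
sample = io.StringIO(
    "image_url,gender,age\n"
    "https://example.com/a.jpg,female,adult\n"
    ",male,child\n"
)
records = read_image_records(sample)
print(len(records), "usable rows")
```

With a real file, the same function could be called inside a with open(path, newline="") block, and the returned URLs passed to whatever download step the project uses.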

Also published on: https://lionbridge.ai/datasets/top-10-image-classification-datasets-for-machine-learning/