Lemons quality control dataset
Lemon dataset has been prepared to investigate the possibilities to tackle the issue of fruit quality control. It contains 2690 annotated images (1056 x 1056 pixels). Raw lemon images have been captured using the procedure described in the following blogpost and manually annotated using CVAT.
Here's an example of raw unannotated data:
and some annotated samples:
Name | Attribute | Type | Default | Description | Example |
---|---|---|---|---|---|
condition | healthy | boolean | true | Determine whether the fruit is healthy. If not regions with identified issues are annotated | |
condition | greening | boolean | false | Determine whether the fruit contains areas that are not uniformly yellow and have green areas. | |
image_quality | blurry | boolean | false | Fruit image is blurry | |
image_quality | cropped | boolean | false | Not all fruit parts are on the image | |
image_quality | unnatural_color | boolean | false | There are issues with color representation. | |
image_quality | no_data | boolean | false | There are black spots on the fruit image that do not contain data. | |
illness | - | region | - | - | |
gangrene | - | region | - | - | |
mould | - | region | - | - | |
blemish | artificial | region + boolean | - | - | |
dark_style_remains | - | region | - | After pollination the remains of style are preserved in the fruit. A dark area around the remain of style indicates an unhealthy fruit. This place is the region from which the fruit starts rotting or catches mould. | |
pedicel | - | region | - | Pedicel refers to a structure connecting a single flower to its inflorescence. | |
artifact | - | region | - | Image contains artifacts i.e. regions that are not related to a fruit and are a result of wrong image processing. Those regions should be identified and described. |
You will notice that file names are composed to form a specific identifier e.g.:
0037_G_I_120_A
: 0037
(individual fruit instance), 120
(relative photo angle), A
(photo position). Some of them are restricted to the original project and cannot be published.
File | Format | Version |
---|---|---|
Lemon Dataset | COCO | v1 |
COCO API can be utilized to read the dataset.
from pycocotools.coco import COCO
coco = COCO('../lemon-dataset/annotations/instances_default.json')
If you use the lemons data set in a scientific publication, we would appreciate references to the following paper:
Biblatex entry:
@misc{softwaremill_2020,
author = {Maciej Adamiak},
title = {Lemons quality control dataset},
institution = {SoftwareMill},
month = jul,
year = 2020,
doi = {10.5281/zenodo.3965568},
url = {https://github.com/softwaremill/lemon-dataset}
}
MIT License
Copyright (c) 2020 SoftwareMill
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.