publication . Article . Data Paper . 2019

A benchmark dataset of herbarium specimen images with label data

Mathias Dillen; Quentin Groom; Simon Chagnoux; Anton Güntsch; Alex Hardisty; Elspeth Haston; Laurence Livermore; Veljo Runnel; Leif Schulman; Luc Willemse; ...
Open Access English
  • Published: 08 Feb 2019
Abstract
More and more herbaria are digitising their collections. Images of specimens are made available online to facilitate access to them and allow extraction of information from them. Transcription of the data written on specimens is critical for general discoverability and enables incorporation into large aggregated research datasets. Different methods, such as crowdsourcing and artificial intelligence, are being developed to optimise transcription, but herbarium specimens pose difficulties in data extraction for many reasons.

To provide developers of transcription methods with a means of optimisation, we have compiled a benchmark dataset of...
Subjects
free text keywords: 119 Other natural sciences, GE, QA76, QH, Data Paper (Biosciences), Plantae, Fungi, Chromista, Biodiversity & Conservation, Bioinformatics, Africa, World, Europe, Australasia, Asia, Americas, lcsh:Biology (General), lcsh:QH301-705.5, Creative commons, Handwriting recognition, Digitization, Formatted text, computer.file_format, computer, Herbarium, Discoverability, Crowdsourcing, business.industry, business, Information retrieval, Data extraction, Computer science
Funded by
EC| ICEDIG
Project
ICEDIG
Innovation and consolidation for large scale digitisation of natural heritage
  • Funder: European Commission (EC)
  • Project Code: 777483
  • Funding stream: H2020 | RIA
Communities
Rural Digital Europe