RRC ID 78131
Authors Takano A, Cole TCH, Konagai H.
Title A novel automated label data extraction and data base generation system from herbarium specimen images using OCR and NER.
Journal Sci Rep
Abstract Digital extraction of label data from natural history specimens along with more efficient procedures of data entry and processing is essential for improving documentation and global information availability. Herbaria have made great advances in this direction lately. In this study, using optical character recognition (OCR) and named entity recognition (NER) techniques, we have been able to make further advancements towards fully automatic extraction of label data from herbarium specimen images. This system can be developed and run on a consumer grade desktop computer with standard specifications, and can also be applied to extracting label data from diverse kinds of natural history specimens, such as those in entomological collections. This system can facilitate the digitization and publication of natural history museum specimens around the world.
Volume(Issue) 14(1)
Pages 112
Publication date 2024-1-2
DOI 10.1038/s41598-023-50179-0
PII 10.1038/s41598-023-50179-0
PMID 38167449
PMC PMC10761843
MeSH Databases, Factual; Documentation*; Entomology; Museums*
Resource information
GBIF Herbarium Specimens of Museum of Nature and Human Activities, Hyogo Pref., Japan