The goal of this project is to extract scientific names from Ruhoff (1980).

Clean up the data file
-
Run OCR
-
Fix spaces in species names
-
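
A minimal Python sketch of this step, assuming the spacing problems are runs of extra whitespace, stray spaces before punctuation, and epithets that OCR split in two. The function `fix_spaces` and the `known_words` set (for example, epithets already validated elsewhere) are hypothetical and not part of this project. For instance, `fix_spaces("Conus abbre viatus", {"abbreviatus"})` returns `"Conus abbreviatus"`.

```python
from __future__ import annotations

import re


def fix_spaces(name: str, known_words: set[str] | None = None) -> str:
    """Normalize spacing in an OCR'd name (hypothetical helper)."""
    # Collapse any run of whitespace (including tabs and non-breaking spaces) to one space.
    name = re.sub(r"\s+", " ", name).strip()
    # Drop stray spaces before punctuation, e.g. "Reeve , 1843" -> "Reeve, 1843".
    name = re.sub(r" +([,.;:)])", r"\1", name)
    if not known_words:
        return name
    # Re-join two adjacent tokens when their concatenation is a known word,
    # e.g. "abbre viatus" -> "abbreviatus" if "abbreviatus" is in known_words.
    tokens = name.split(" ")
    fixed: list[str] = []
    i = 0
    while i < len(tokens):
        if i + 1 < len(tokens) and tokens[i] + tokens[i + 1] in known_words:
            fixed.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            fixed.append(tokens[i])
            i += 1
    return " ".join(fixed)
```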
Fix commas that OCR misread as periods
-
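
One way to automate part of this step is a regular expression. This sketch assumes, hypothetically, that the typical misreading is a period between an author name and a four-digit year (`Reeve. 1843` instead of `Reeve, 1843`); the pattern and function name are illustrative only. It would also rewrite abbreviated authors such as `Lam. 1822` into `Lam, 1822`, so its output still needs review.

```python
import re

# Hypothetical heuristic: a period that directly follows a letter ([^\W\d_] is any
# Unicode letter) and precedes a four-digit year is treated as a misread comma.
AUTHOR_YEAR_PERIOD = re.compile(r"(?<=[^\W\d_])\.(?=\s*\d{4}\b)")


def fix_author_year_commas(entry: str) -> str:
    """Replace a misread period before a year with a comma (sketch)."""
    return AUTHOR_YEAR_PERIOD.sub(",", entry)
```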
Extract the name part
-

The first column of `06-names.csv` is the place to fix errors in the extracted names.
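
A Python sketch of splitting an entry into its name part and the remaining citation, then writing the name part to the first CSV column. It assumes a hypothetical entry layout `<Genus> <epithet> <Author>, <year>, <citation>` and treats everything up to the first `, <year>` as the name part; the real layout of the source entries may differ, and `split_entry` and `names_to_csv` are illustrative names. Entries without an author and year are kept whole so they can be fixed by hand.

```python
from __future__ import annotations

import csv
import io
import re

# Hypothetical layout: name part = everything up to and including the first ", <year>".
NAME_PART = re.compile(r"^(?P<name>.+?,\s*\d{4})\b(?P<rest>.*)$")


def split_entry(entry: str) -> tuple[str, str]:
    """Return (name part, rest of the entry)."""
    entry = entry.strip()
    m = NAME_PART.match(entry)
    if m is None:
        # No author-year found: keep the whole line as the name for manual review.
        return entry, ""
    return m.group("name"), m.group("rest").lstrip(", ").strip()


def names_to_csv(entries: list[str]) -> str:
    """Write one row per entry with the name part in the first column."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for entry in entries:
        writer.writerow(split_entry(entry))
    return buf.getvalue()
```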
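
Results of matching the extracted names:

| Match category | Names | Percentage |
|---|---:|---:|
| Total | 35487 | 100% |
| All Matches | 26799 | 75.5% |
| No Match | 8688 | 24.5% |
| Canonical + Auth. Match | 22311 | 62.9% |
| Canonical Match | 3448 | 9.7% |
| Fuzzy Canonical Match | 1040 | 2.9% |

The last three rows break down All Matches (22311 + 3448 + 1040 = 26799).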
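
The percentages in the table can be recomputed from the counts. This Python sketch copies the counts from the table, rounds each share of the total to one decimal place, and checks that matched plus unmatched names equal the total and that the three match types add up to All Matches. The dictionary and function names are illustrative only.

```python
from __future__ import annotations

# Counts copied from the table; keys follow the table's row labels.
COUNTS = {
    "Total": 35487,
    "All Matches": 26799,
    "No Match": 8688,
    "Canonical + Auth. Match": 22311,
    "Canonical Match": 3448,
    "Fuzzy Canonical Match": 1040,
}


def percentages(counts: dict[str, int], total_key: str = "Total") -> dict[str, float]:
    """Share of the total for each row, rounded to one decimal place."""
    total = counts[total_key]
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


def check_consistency(counts: dict[str, int]) -> None:
    """Raise ValueError if the counts do not add up."""
    if counts["All Matches"] + counts["No Match"] != counts["Total"]:
        raise ValueError("matched + unmatched names != total")
    parts = (counts["Canonical + Auth. Match"] + counts["Canonical Match"]
             + counts["Fuzzy Canonical Match"])
    if parts != counts["All Matches"]:
        raise ValueError("match types do not sum to All Matches")


if __name__ == "__main__":
    check_consistency(COUNTS)
    # Print rows in the same markdown layout as the table.
    for label, pct in percentages(COUNTS).items():
        print(f"| {label} | {COUNTS[label]} | {pct}% |")
```

Deriving the percentages from the raw counts, rather than typing them by hand, keeps them consistent when the counts change after errors are fixed in `06-names.csv`.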