degree character detection °

chardet does not seem to detect the encoding of the degree character (°) correctly. See also:
https://stackoverflow.com/questions/1406707/weird-character-%C3%82-before-degrees-celcius-symbol-c

I noticed this while trying to detect the character encoding of a CSV file that contained `TOB1 °C;`

```
import chardet

b_bytes=b"\xc2\xb0"
b_bytes.hex()
Out[1]: 'c2b0'

b_bytes.decode("utf-8")
Out[2]: '°'

b_bytes.decode("ISO-8859-1")
Out[3]: 'Â°'

b_bytes.decode("latin-1")
Out[4]: 'Â°'

chardet.detect(b_bytes)
Out[5]: {'encoding': 'ISO-8859-1', 'confidence': 0.73, 'language': ''}
```
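For context, the same two bytes show up inside the CSV fragment mentioned above. The sketch below uses only the standard library (no chardet) and illustrates that the UTF-8 bytes for `TOB1 °C;` also decode without error as ISO-8859-1, so the byte sequence is ambiguous on its own. The variable names are purely illustrative.

```python
# Stdlib-only sketch: the UTF-8 bytes of the CSV fragment are also valid ISO-8859-1.
text = "TOB1 °C;"
utf8_bytes = text.encode("utf-8")

# Decoding with the matching codec round-trips cleanly.
as_utf8 = utf8_bytes.decode("utf-8")

# ISO-8859-1 maps every byte to a character, so this never raises;
# it silently produces the stray "Â" in front of the degree sign.
as_latin1 = utf8_bytes.decode("ISO-8859-1")

print(utf8_bytes)
print(as_utf8)
print(as_latin1)
```

Because neither decode raises, a detector has to rely on statistics to choose between the two, and two bytes give it very little to work with.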

Shouldn't this be detected as `UTF-8` instead of `ISO-8859-1`? (latin-1 is an alias for ISO-8859-1, and windows-1252 decodes these two bytes the same way.)
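In the meantime, one possible workaround is to attempt a strict UTF-8 decode before trusting a detector's guess. This is only a minimal sketch: `guess_encoding` and its `fallback` parameter are hypothetical names, not part of chardet, and the fallback could just as well be whatever `chardet.detect` reports.

```python
def guess_encoding(data: bytes, fallback: str = "ISO-8859-1") -> str:
    """Return "utf-8" if data decodes strictly as UTF-8, else the fallback name.

    Hypothetical helper for illustration; not a chardet API.
    """
    try:
        data.decode("utf-8")  # strict by default: raises on invalid sequences
    except UnicodeDecodeError:
        return fallback
    return "utf-8"


print(guess_encoding(b"\xc2\xb0"))  # the two bytes from this issue
print(guess_encoding(b"\xb0"))      # a lone 0xB0 is not valid UTF-8
```

The trade-off is that text genuinely written in ISO-8859-1 which happens to form valid UTF-8 sequences (such as `Â°`) would be reported as UTF-8, so this check suits inputs where UTF-8 is the more likely encoding.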

degree character detection ° #305
