I noticed this while trying to detect the character encoding of a CSV file that contained "TOB1 °C;":
import chardet
b_bytes=b"\xc2\xb0"
b_bytes.hex()
Out[1]: 'c2b0'
b_bytes.decode("utf-8")
Out[2]: '°'
b_bytes.decode("ISO-8859-1")
Out[3]: 'Â°'
b_bytes.decode("latin-1")
Out[4]: 'Â°'
chardet.detect(b_bytes)
Out[5]: {'encoding': 'ISO-8859-1', 'confidence': 0.73, 'language': ''}
I think chardet fails to deduce the encoding of the degree character correctly. The mis-decoded result is the same stray "Â" described here:
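To make the ambiguity concrete, here is a small standard-library check (no chardet involved). It is only a sketch of why both decodings succeed: the same two bytes are valid in UTF-8 and in ISO-8859-1, but only UTF-8 gives back a single degree sign.

```python
data = b"\xc2\xb0"

as_utf8 = data.decode("utf-8")         # one character: U+00B0 DEGREE SIGN
as_latin1 = data.decode("ISO-8859-1")  # two characters: U+00C2 and U+00B0

print(len(as_utf8), [hex(ord(c)) for c in as_utf8])
print(len(as_latin1), [hex(ord(c)) for c in as_latin1])

# Round trip: the degree sign encodes to different bytes in each encoding.
utf8_hex = "\u00b0".encode("utf-8").hex()         # 'c2b0'
latin1_hex = "\u00b0".encode("ISO-8859-1").hex()  # 'b0'
print(utf8_hex, latin1_hex)
```

Since ISO-8859-1 maps every byte value to a character, decoding with it never raises an error, so a successful decode alone says nothing about whether the guess was right.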
https://stackoverflow.com/questions/1406707/weird-character-%C3%82-before-degrees-celcius-symbol-c
Shouldn't this be detected as UTF-8 instead of ISO-8859-1? (apparently the same as latin-1 or windows-1252 in my case)
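A possible workaround, as a minimal sketch: try a strict UTF-8 decode first and only fall back to a single-byte encoding if that fails. The helper name decode_utf8_first and its fallback parameter are my own and hypothetical, not part of chardet, and this assumes the file is either UTF-8 or the fallback encoding.

```python
from typing import Tuple


def decode_utf8_first(data: bytes, fallback: str = "ISO-8859-1") -> Tuple[str, str]:
    """Return (text, encoding_used). Hypothetical helper, not a chardet API."""
    try:
        # Strict decoding raises UnicodeDecodeError on invalid UTF-8 sequences.
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # ISO-8859-1 accepts every byte value, so this fallback cannot fail.
        return data.decode(fallback), fallback


# The degree sign stored as UTF-8 (c2 b0) is recognised as UTF-8.
print(decode_utf8_first(b"TOB1 \xc2\xb0C;"))

# The degree sign stored as a lone b0 byte is invalid UTF-8, so the fallback is used.
print(decode_utf8_first(b"TOB1 \xb0C;"))
```

Both calls return the text "TOB1 °C;", with "utf-8" reported for the first input and "ISO-8859-1" for the second.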