Is a file binary?
June 17, 2014
Problem
I want to process all text files in a folder recursively (actually, I want to extract all URLs from them). However, their extensions are not necessarily .txt. How can I separate text files from binary files?
Solution
In this Stack Overflow thread I found a solution. Here is my slightly modified version:
def is_binary(fname):
    """
    Return True if the given file is binary.
    found at http://stackoverflow.com/questions/898669
    """
    CHUNKSIZE = 1024
    # binary mode: read() returns bytes, so compare against b'\0'
    with open(fname, 'rb') as f:
        while True:
            chunk = f.read(CHUNKSIZE)
            if b'\0' in chunk:    # found null byte
                return True
            if len(chunk) < CHUNKSIZE:
                break    # done
    return False
If it finds a '\0' byte, the file is considered binary. Note that it will also classify UTF-16-encoded text files as “binary”, because UTF-16 stores every ASCII character as two bytes, one of which is zero.
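To tie this back to the original problem, here is a rough sketch of how the check could be used to walk a folder recursively and keep only the text files. The names `looks_binary` and `iter_text_files` are made up for this example, and the UTF-16 exception is my own assumption rather than part of the Stack Overflow answer: a file that starts with a UTF-16 byte order mark is simply treated as text.

```python
import codecs
import os

CHUNKSIZE = 1024
# Byte order marks that UTF-16 text files often (but not always) start with.
UTF16_BOMS = (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)


def looks_binary(fname):
    """Null-byte heuristic from above, plus a UTF-16 BOM exception."""
    with open(fname, 'rb') as f:
        chunk = f.read(CHUNKSIZE)
        if chunk.startswith(UTF16_BOMS):
            return False  # assumption: a leading UTF-16 BOM means text
        while chunk:
            if b'\0' in chunk:  # found null byte
                return True
            chunk = f.read(CHUNKSIZE)
    return False


def iter_text_files(root):
    """Yield paths under root (recursively) that do not look binary."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not looks_binary(path):
                yield path
```

With this in place, something like `for path in iter_text_files('.'):` would visit each candidate text file regardless of its extension. UTF-16 files saved without a BOM would still be reported as binary, so this only softens the caveat above.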
