Archive
Posts Tagged ‘text’
automatic text summarization
November 2, 2015
See https://github.com/miso-belica/sumy. Its README also lists alternative projects.
Is a file binary?
June 17, 2014
Problem
I want to process all text files in a folder recursively (actually, I want to extract all URLs from them). However, their extensions are not necessarily .txt. How can I tell text files apart from binary files?
Solution
I found a solution in this Stack Overflow thread: http://stackoverflow.com/questions/898669. Here is my slightly modified version:
def is_binary(fname):
    """
    Return True if the given file looks binary.
    found at http://stackoverflow.com/questions/898669
    """
    CHUNKSIZE = 1024
    with open(fname, 'rb') as f:
        while True:
            chunk = f.read(CHUNKSIZE)
            # the file is opened in binary mode, so compare against a bytes literal
            if b'\0' in chunk:  # found null byte
                return True
            if len(chunk) < CHUNKSIZE:
                break  # done
    return False
If it finds a null byte ('\0'), the file is considered binary. Note that it will also classify UTF-16-encoded text files as “binary”, because UTF-16 encodes every ASCII character as two bytes, one of which is zero.
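To connect this back to the original goal (extracting URLs from every text file in a folder), here is a minimal Python 3 sketch. The names iter_text_files, extract_urls and URL_RE are hypothetical helpers, not part of the Stack Overflow answer, and the URL pattern is deliberately simplistic. The sketch also assumes the text files can be read as UTF-8; undecodable bytes are replaced rather than raising an error.

```python
import os
import re

# Simplistic pattern (an assumption): an http(s) URL runs until the next
# whitespace, angle bracket or double quote.
URL_RE = re.compile(r'https?://[^\s<>"]+')


def is_binary(fname, chunksize=1024):
    """Return True if the file contains a null byte."""
    with open(fname, 'rb') as f:
        while True:
            chunk = f.read(chunksize)
            if b'\0' in chunk:
                return True
            if len(chunk) < chunksize:
                return False  # reached the end without finding a null byte


def iter_text_files(root):
    """Yield the path of every non-binary file below root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            if not is_binary(path):
                yield path


def extract_urls(root):
    """Return a sorted list of the distinct URLs found in text files below root."""
    urls = set()
    for path in iter_text_files(root):
        # errors='replace' keeps going on bytes that are not valid UTF-8
        with open(path, 'r', encoding='utf-8', errors='replace') as f:
            for line in f:
                urls.update(URL_RE.findall(line))
    return sorted(urls)
```

Calling extract_urls('.') would then scan the current directory tree. Files are filtered by content rather than by extension, so a .log or extension-less file is still processed as long as it contains no null byte.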
Reading and writing a file
December 17, 2010
Here is a mini cheat sheet for reading and writing a text file.
Read a text file line by line and write each line to another file (copy):
# both files are closed automatically when the with block ends
with open('./in.txt', 'r') as f1, open('./out.txt', 'w') as to:
    for line in f1:
        to.write(line)
Variations:
text = f.read()           # read the entire file
line = f.readline()       # read one line at a time
lineList = f.readlines()  # read the entire file as a list of lines
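The three variations can be tried side by side with the short sketch below. The sample file and its contents are made up for illustration. All three calls share the same file position, so readlines() after readline() returns only the remaining lines.

```python
import os
import tempfile

# Create a small throwaway file to read from (hypothetical sample content).
path = os.path.join(tempfile.mkdtemp(), 'in.txt')
with open(path, 'w') as f:
    f.write('first\nsecond\nthird\n')

with open(path) as f:
    text = f.read()        # the whole file as a single string

with open(path) as f:
    line = f.readline()    # the first line only, trailing newline included
    rest = f.readlines()   # the remaining lines, as a list of strings

print(repr(text))
print(repr(line))
print(rest)
```

Each returned line keeps its trailing newline; use line.rstrip('\n') if it gets in the way.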
