Text and Binary File Lesson
A file in itself is a bunch of bytes stored on some storage device like hard disk, thumb drive etc.
TYPES OF FILE
TEXT FILE
BINARY FILES
1) A binary file is just a file that contains information in the same format in which the information
is held in memory, i.e. the file content that is returned to you is raw.
2) There is no delimiter for a line
3) No translation occurs in binary file
4) Binary files are faster and easier for a program to read and write than are text files.
5) Binary files are the best way to store program information.
CSV FILES
1) CSV is a simple file format used to store tabular data, such as a spreadsheet or database.
2) CSV stands for "comma-separated values".
3) A comma-separated values file is a delimited text file that uses a comma to separate values.
4) Each line of the file is a data record. Each record consists of one or more fields, separated by
commas. The use of the comma as a field separator is the source of the name for this file format
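The points above can be sketched with Python's built-in csv module. This is a minimal illustration, not part of the original lesson: the file name students.csv and the sample records are assumptions, and the file is created in a temporary folder so that nothing real is overwritten.

```python
import csv
import os
import tempfile

# Hypothetical sample data: one header row and two records.
header = ["Rno", "Name", "Percent"]
records = [[1, "Asha", 82.5], [2, "Ravi", 67.0]]

# Assumed file name, placed in a temporary folder.
csv_path = os.path.join(tempfile.mkdtemp(), "students.csv")

# newline="" lets the csv module control the line endings itself.
with open(csv_path, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(header)      # writes one record
    writer.writerows(records)    # writes several records at once

with open(csv_path, "r", newline="") as f:
    rows = list(csv.reader(f))   # each record comes back as a list of strings

print(rows)
```

Note that every field is read back as a string, so numeric fields have to be converted with int() or float() before they are used in calculations.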
OPENING FILES
F = open('D:\\Computer\\abc.txt')  If the file is in another location, mention the complete
file path with the file name. Do not forget to use a double backslash (\\)
in the file path.
F = open(r'D:\Computer\abc.txt', 'w')  If you do not want to use \\, put r in front of the path string (raw string).
Note: if the file mode is not mentioned in the open() function, the default file mode, i.e. 'r', is used.
2) Use of with clause
Syntax:
with open("file name", access mode) as file object:
Example: with open("book.txt", 'r') as f:
NOTE: there is no need to close the file explicitly if we are using the with clause.
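A small sketch of the with clause follows. The file book.txt is created in a temporary folder here, which is an assumption made only so that the example runs on any computer.

```python
import os
import tempfile

# Assumed location: a temporary folder, so no real file is touched.
book_path = os.path.join(tempfile.mkdtemp(), "book.txt")

# The file is closed automatically when the with block ends.
with open(book_path, "w") as f:
    f.write("Chapter 1\n")

with open(book_path, "r") as f:
    text = f.read()

print(text)
print(f.closed)   # the file object is already closed here
```

The closed attribute of the file object shows that no explicit close() call was needed.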
CLOSING FILES
close(): the close() method of a file object flushes any unwritten information and closes the file object,
after which no more reading or writing can be done.
SYNTAX: fileobject.close()
FILE MODE
It defines how the file will be accessed
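As an illustration of how the file mode changes the result, the sketch below compares 'w' (write) with 'a' (append). The file name notes.txt and its contents are assumptions used only for this example.

```python
import os
import tempfile

# Assumed file name inside a temporary folder.
path = os.path.join(tempfile.mkdtemp(), "notes.txt")

with open(path, "w") as f:      # 'w' creates the file (or empties an existing one)
    f.write("GOOD\n")

with open(path, "a") as f:      # 'a' keeps the old data and adds at the end
    f.write("MORNING\n")

with open(path) as f:           # no mode given, so 'r' is used
    after_append = f.read()

with open(path, "w") as f:      # 'w' again: the earlier content is lost
    f.write("bye\n")

with open(path) as f:
    after_write = f.read()

print(repr(after_append))
print(repr(after_write))
```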
Returns the read bytes in the form of a string
In [11]: file1 = open("E:\\mydata\\info.txt")
In [12]: readInfo = file1.read(15)
In [13]: print(readInfo)  # prints first 15 characters of file
In [14]:type(readInfo)
Out[14]:str
2 readline() <filehandle>.readline([n]) Reads a line of input; if n is specified, reads at most
n bytes.
Returns the read bytes in the form of a string ending
with a newline ('\n') character, or returns a blank string if
no more bytes are left for reading in the file.
In [20]: file1 = open("E:\\mydata\\info.txt")
In [21]: readInfo = file1.readline()
In [22]:print (readInfo)
3 readlines() <filehandle>.readlines() Read all lines and returns them in a list
In [23]: file1 = open("E:\\mydata\\info.txt")
In [24]:readInfo =file1.readlines()
In [25]:print (readInfo)
In [26]:type (readInfo)
Out[26]:list
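The three reading methods can be compared on one small file. The sketch below is only an illustration: the file info.txt and its three lines are assumptions, and the file is first created in a temporary folder.

```python
import os
import tempfile

# Assumed sample file with three lines.
path = os.path.join(tempfile.mkdtemp(), "info.txt")
with open(path, "w") as f:
    f.write("Python is fun.\nFiles store data.\nBye\n")

with open(path) as file1:
    first6 = file1.read(6)          # next 6 characters, returned as a str
    rest_of_line = file1.readline() # continues from the current position
    remaining = file1.readlines()   # all lines that are left, as a list

print(repr(first6))
print(repr(rest_of_line))
print(remaining)
```

Each call continues from where the previous one stopped, because all three methods move the file pointer forward.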
Writing data into files
1) The os module provides functions for working with files and directories ('os' stands for operating
system). os.getcwd() returns the name of the current directory.
import os
cwd=os.getcwd()
2) A string like cwd that identifies a file is called path. A relative path starts from the current
directory whereas an absolute path starts from the topmost directory in file system.
examples
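A minimal sketch of relative and absolute paths, using functions from os and os.path. The folder name data and the file name memo.txt are hypothetical; the file does not need to exist for these calls to work.

```python
import os

cwd = os.getcwd()                              # current working directory

# Hypothetical relative path: it is interpreted starting from cwd.
relative = os.path.join("data", "memo.txt")

# abspath() turns it into an absolute path by prefixing the current directory.
absolute = os.path.abspath(relative)

print(cwd)
print(relative, os.path.isabs(relative))
print(absolute, os.path.isabs(absolute))
```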
To access data in a random fashion, we use the seek() and tell() methods.
tell(): It returns an integer that specifies the current position of the file object in the file.
fileobject.tell()
seek(): It is used to position the file object at a particular position in a file.
fileobject.seek(offset [, reference point]) where offset is the number of bytes by which the file object
is to be moved and reference point indicates the starting position for the move. Values of reference
point: 0 - beginning of the file (default), 1 - current position of the file, 2 - end of file. In Python 3,
reference points 1 and 2 with a non-zero offset work only on files opened in binary mode.
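The behaviour of seek() and tell() can be traced on a ten-byte file. This is a sketch under stated assumptions: the file name letters.bin and its content are made up, and the file is opened in binary mode because Python 3 restricts seeking relative to the current position or the end in text mode.

```python
import os
import tempfile

# Assumed sample file containing the bytes A to J.
path = os.path.join(tempfile.mkdtemp(), "letters.bin")
with open(path, "wb") as f:
    f.write(b"ABCDEFGHIJ")

with open(path, "rb") as f:
    start = f.tell()            # position right after opening
    f.seek(3)                   # 3 bytes from the beginning (reference point 0)
    chunk = f.read(2)           # reads the bytes at positions 3 and 4
    after_read = f.tell()       # reading moved the position forward
    f.seek(2, 1)                # 2 bytes ahead of the current position
    next_byte = f.read(1)
    f.seek(-3, 2)               # 3 bytes before the end of the file
    tail = f.read()

print(start, chunk, after_read, next_byte, tail)
```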
QUESTIONS
(1 mark questions)
Q1 what is the difference between 'w' and 'a' modes?
Q2 A binary file is not human-readable and can be opened and closed only through a function, so what are the
advantages of using a binary file?
Q3 Which of the following functions changes the position of file pointer and returns its new
position?
A. flush()
B. tell()
C. seek()
D. offset()
Q4 which of the following function returns a list datatype
A. d=f.read()
B. d=f.read(10)
C. d=f.readline()
D. d=f.readlines()
Q5 how many file objects would you need to manage the following situations :
(a) to process four files sequentially
(b) To process two sorted files into third file
Q6 The correct syntax of seek() is:
A. file_object.seek(offset [, reference_point])
B. seek(offset [, reference_point])
C. seek(offset, file_object)
D. seek.file_object(offset)
Q7 What will be the output of the following statement in python? (fh is a file handle)
fh.seek(-30,2)
A. open("file.txt", "w")
B. open("file.txt", "r")
C. read("file.txt")
D. write("file.txt")
Q 9 A text file student.txt is stored in the storage device. Identify the correct option out of the
following options to open the file in read mode.
i. myfile = open('student.txt','rb')
ii. myfile = open('student.txt','w')
iii. myfile = open('student.txt','r')
iv. myfile = open('student.txt')
A. only i
B. both i and iv
C. both iii and iv
D. both i and iii
Q 10 Raman wants to open the file abc.txt, stored in a folder named sample on the D drive of his
computer, for writing. Help Raman by identifying the correct code for opening the file.
A. myfile=open("d:\\sample.txt\\abc",'w')
B. myfile=open("d:\\sample\\abc.txt",'w')
C. myfile=open("d:\\sample\abc.txt",'w')
D. all of the above
(2 mark questions)
Q1 write a single loop to display all the contents of a text file file1.txt after removing leading and
trailing WHITESPACES
Q2 what is the output of the following code fragment? explain
out=open('output.txt','w')
out.write('hello,world!\n')
out.write('how are you')
out.close()
open('output.txt').read()
Q3 read the code given below and answer the questions
f1=open('main.txt','w')
f1.write('bye')
f1.close()
if the file contains 'GOOD' before execution, what will be the content of the file after execution of
the code
Q4 observe the following code and answer the following
f1=open("mydata","a")
______#blank1
f1.close()
(i)what type of file is mydata
(ii) Fill in the blank1 with statement to write "abc" in the file "mydata"
Q5 A given text file data.txt contains :
Line1\n
\n
line3
Line 4
\n
line6
What would be the output of the following code?
f1=open('data.txt')
L=f1.readlines()
print(L[0])
print(L[2])
print(L[5])
print(L[1])
print(L[4])
print(L[3])
Q6 In which of the following file modes the existing data of the file will not be lost?
i) rb
ii) w
iii) a+b
iv) wb+
v)r+
vi)ab
vii) w+b
viii)wb
ix)w+
Q7 what would be the data types of variables data in following statements?
i) Data=f.read( )
ii) Data=f.read(10)
iii) Data=f.readline()
iv)Data=f.readlines()
Q8 Suppose a file name test1.txt store alphabets in it then what is the output of the following code
f1=open("test1.txt")
size=len(f1.read())
print(f1.read(5))
(3 marks questions)
Q 1 Write a user defined function in python that displays the number of lines starting with 'H' in
the file para.txt
Q2 write a function countmy() in python to read the text file "DATA.TXT" and count the number of
times "my" occurs in the file. For example if the file DATA.TXT contains-"This is my website. I
have displayed my preference in the CHOICE section ".-the countmy() function should display
the output as:"my occurs 2 times".
Q3 write a method in python to read lines from a text file DIARY.TXT and display those lines which
start with the letter 'P'.
Q4 write a method in python to read lines from a text file MYNOTES.TXT and display those lines
which start with the letter 'K'.
Q5 write a program to display all the records in a file along with line/record number.
Q6 write a program that copies a text file "source.txt" onto "target.txt" barring the lines starting
with @ sign.
Answers
(1 mark questions)
Ans 1 'w' mode opens a file for writing only; it overwrites the file if it already exists. 'a' mode appends
to the end of the existing file and does not overwrite it.
Ans 2 Binary files are easier and faster for a program to read and write than text files. Binary files are
also used to store binary data such as images, video files and audio files.
Ans3 c) seek()
Ans4 d) f.readlines()
Ans 5 a)4 b)3
Ans 6 a)
Ans 7 b)
Ans 8 b)
Ans 9 C)
Ans 10 b)
(2 marks questions)
Ans 1
for line in open("file1.txt"):
    print(line.strip())
Ans 2 The output will be
'hello,world!\nhow are you'
The first line of code opens the file in write mode, and the next two lines write text to the file. The last
line opens the file again in the default read mode and reads the entire file content as one string. The
interactive interpreter displays this returned string; in a script it must be passed to print() to be
shown.
Ans 3 The file would now contain only 'bye', because when an existing file is opened in write
mode, it truncates the existing data in the file.
Ans 4 i) Text file
ii) f1.write("abc")
Ans 5
Line1
line3
line6
Line 4
Ans 8 No Output
Explanation: the f1.read() of line 2 will read entire content of file and place the file pointer at the
end of file. for f1.read(5) it will return nothing as there are no bytes to be read from EOF
and,thus,print statement prints nothing.
3 marks question
Ans.1
def countH():
    f = open("para.txt", "r")
    lines = 0
    L = f.readlines()
    for i in L:
        if i[0] == 'H':      # first character of the line
            lines += 1
    f.close()
    print("No. of lines are:", lines)
Ans.2
def countmy():
    f = open("DATA.TXT", "r")
    count = 0
    x = f.read()
    word = x.split()         # split the text into words
    for i in word:
        if i == "my":
            count = count + 1
    f.close()
    print("my occurs", count, "times")
Ans.3
def display():
    file = open('DIARY.TXT', 'r')
    line = file.readline()
    while line:              # an empty string means end of file
        if line[0] == 'P':
            print(line)
        line = file.readline()
    file.close()
Ans.4
def display():
    file = open('MYNOTES.TXT', 'r')
    line = file.readline()
    while line:              # an empty string means end of file
        if line[0] == 'K':
            print(line)
        line = file.readline()
    file.close()
Ans.5
f = open("result.dat", "r")
count = 0
rec = ""
while True:
    rec = f.readline()
    if rec == "":            # empty string: no more records
        break
    count = count + 1
    print(count, rec)
f.close()
Ans.6
def filter(oldfile, newfile):
    fin = open(oldfile, "r")
    fout = open(newfile, "w")
    while True:
        text = fin.readline()
        if len(text) == 0:
            break
        if text[0] == "@":
            continue
        fout.write(text)
    fin.close()
    fout.close()
filter("source.txt", "target.txt")
BINARY FILES IN PYTHON:
A Binary file stores the information in the form of a stream of bytes. A binary file stores the data in the
same way as stored in the memory. In Binary file there is no delimiter for a line. The file contents
returned by a binary file is raw i.e. with no translation, thus Binary files are faster than text files. To
work with binary files, you need to open them using specific file modes:
• rb: Read a binary file. The file pointer is placed at the beginning of the file. It is the binary
counterpart of the default mode 'r'.
• rb+: Read and write a binary file. The file pointer is placed at the beginning of the file.
• wb: Write to a binary file. This mode will overwrite the file if it exists, or create a new file if it
doesn't.
• wb+: Write and read a binary file. This mode will overwrite the file if it exists, or create a new
file if it doesn't.
• ab: Append to a binary file. The file pointer is at the end of the file if it exists. If the file does not
exist, it creates a new file for writing.
• ab+: Append and read a binary file. The file pointer is at the end of the file if it exists. If the file
does not exist, it creates a new file for reading and writing.
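The modes listed above can be tried on a small file of raw bytes. This is a sketch only: the file name data.bin and the byte values are assumptions chosen for illustration.

```python
import os
import tempfile

# Assumed file name inside a temporary folder.
path = os.path.join(tempfile.mkdtemp(), "data.bin")

with open(path, "wb") as f:       # wb: create or overwrite
    f.write(bytes([1, 2, 3]))

with open(path, "ab") as f:       # ab: add at the end
    f.write(bytes([4, 5]))

with open(path, "rb") as f:       # rb: read; the result is a bytes object
    content = f.read()

with open(path, "rb+") as f:      # rb+: read and write without truncating
    f.write(b"\xff")              # overwrites the first byte in place
    f.seek(0)
    updated = f.read()

print(content)
print(updated)
```

Binary modes always work with bytes objects, so writing a str to such a file raises a TypeError.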
1. Nature of Data
Binary Files: Store data in binary format (0s and 1s). The data is not human-readable and can represent various
types of data, including images, audio, and executable code.
Text Files: Store data in plain text format. The data is human-readable and consists of characters encoded in
formats such as ASCII or Unicode.
2. Usage
Binary Files: Used for data that is not meant to be read directly by humans, such as media files, compiled
programs, and data serialization.
Text Files: Used for data that needs to be read and edited by humans, such as source code, configuration files,
and documents.
3. Encoding
Binary Files: No specific encoding scheme; the interpretation of bytes depends on the file format.
Text Files: Use character encoding schemes like ASCII, UTF-8, or UTF-16 to represent text.
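The difference between text encoding and raw binary storage can be seen in a few lines. The sample string and the use of the struct module are assumptions made for this illustration; the lesson itself does not use struct.

```python
import struct

text = "Café 5"                           # 6 characters

# The same text becomes different bytes under different encodings.
utf8_bytes = text.encode("utf-8")         # 'é' needs 2 bytes in UTF-8
utf16_bytes = text.encode("utf-16-le")    # every character here needs 2 bytes

# The number 5 stored as text is one character, but stored as a
# 4-byte little-endian integer it is raw binary data.
as_text = "5".encode("ascii")
as_binary = struct.pack("<i", 5)

print(len(text), len(utf8_bytes), len(utf16_bytes))
print(as_text, as_binary)
```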
Python objects (list, dictionary etc) have a specific structure which must be maintained while storing or
accessing them. Python provides a special module called pickle module for this.
PICKLING refers to the process of converting a structure (list/dictionary) to a byte stream before
writing it to a file. It converts Python objects (list, dict etc.) into byte streams (0s and 1s).
UNPICKLING is used to convert the byte stream back to the original structure while reading the
contents of the file.
pickle Module: -
pickle.dump() – This method is used to write the object in the file which is opened in 'wb' or 'ab' i.e. write
binary or append binary access mode respectively.
Syntax :
pickle.dump(<structure>,<FileObject>)
import pickle
fo = open("binary_file1.dat","wb")
Laptop = ["Dell","HP","ACER"]
pickle.dump(Laptop,fo)
fo.close()
pickle.load() – This method is used to read data from a file and return back into the structure (list/dictionary).
Syntax :
<structure> = pickle.load(<FileObject>)
Structure can be any Python object such as a list or dictionary. FileObject is the file handle of the file from
which we have to read.
import pickle
f1=open("my_bin1.bin","rb")
D2=pickle.load(f1)
print(D2)
f1.close()
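When several objects are dumped into one file, each pickle.load() call returns the next object, and EOFError is raised when none are left. The sketch below shows one common way to read them all back; the file name students.dat and the records are assumptions for illustration.

```python
import os
import pickle
import tempfile

# Assumed file name and sample records.
path = os.path.join(tempfile.mkdtemp(), "students.dat")
records = [[1, "Asha", 82.5], [2, "Ravi", 67.0]]

# Pickling: each list is written as a separate object.
with open(path, "wb") as f:
    for rec in records:
        pickle.dump(rec, f)

# Unpickling: keep loading until the end of the file is reached.
loaded = []
with open(path, "rb") as f:
    while True:
        try:
            loaded.append(pickle.load(f))
        except EOFError:
            break

print(loaded)
```

Only files from a trusted source should be unpickled, because loading a pickle can execute code.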
2 The process of converting the structure to a byte stream before writing to the file is known as _________. 1
3 The process of converting byte stream back to the original structure is known as _______. 1
4 Raman opens a file in read mode, but the file doesn't exist in the folder. Python raised an error for the code. What type of error will be shown? 1
5 The prefix ______ in front of a string makes it a raw string, that is, no special meaning is attached to any character. 1
8 Which of the following statement is incorrect in the context of binary files? 1
a. Information is stored in the same format in which the information is held in memory.
b. No character translation takes place
c. Every line ends with a new line character
d. pickle module is used for reading and writing
10 How are text files and binary files stored inside computer memory? 2
11 Name any two exceptions that occur while working with pickle module. 2
13 Binary files are the best way to store program information. Discuss 3
Answers :
Sample Answers
1. Binary files
2. Pickling
3. Unpickling
4. FileNotFoundError
5. r
6. Serialization.
7. Newline
8. Every line ends with a new line character
9. EOFError is raised when the built-in function input() (raw_input() in Python 2) hits an end-of-file
condition (EOF) without reading any data. pickle.load() also raises it when no more data is left in the file.
We can handle it by using the try and except keywords in Python, called Exception Handling.
10. A text file stores information as a stream of ASCII or Unicode characters, depending on the
encoding in use. A binary file stores information as a stream of bytes.
11. pickle.PicklingError and pickle.UnpicklingError
12. writer.writerow(row): Writes the row parameter to the writer's file object, formatted according to the
delimiter defined in the writer function. writer.writerows(rows): Writes multiple rows (sequence) to the writer's
file object.
13. Binary files store the information in the form of a stream of bytes, similar to the format in which computer
memory holds data. Also, there is no delimiter for a line and no translation occurs in binary files. Thus
binary files are faster and easier for a program to read and write. So binary files are the best way to
store data or program information.
QUESTION ANSWERS: SET 2
1 Write Python statements to open a binary file "student.dat" in both read & write 1
mode.
16 Amritya Seth is a programmer, who has recently been given a task to write a python 5
code to perform the following binary file operations with the help of two user defined
functions/modules:
b. GetStudents() to display the name and percentage of those students who have a
percentage greater than 75. In case there is no student having percentage > 75 the
function displays an appropriate message. The function should also display the average
percent. He has succeeded in writing partial code and has missed out certain
statements, so he has left certain queries in comment lines. You as an expert of Python
have to provide the missing statements and other related queries based on the
following code of Amritya. Answer any four questions (out of five) from the below
mentioned questions.
import pickle
def AddStudents():
    ____________ #1 statement to open the binary file to write data
    while True:
        Rno = int(input("Rno :"))
        Name = input("Name : ")
        Percent = float(input("Percent :"))
        L = [Rno, Name, Percent]
        ____________ #2 statement to write the list L into the file
        Choice = input("enter more (y/n): ")
        if Choice in "nN":
            break
    F.close()
def GetStudents():
    Total=0
    Countrec=0
    Countabove75=0
    with open("STUDENT.DAT","rb") as F:
        while True:
            try:
                ____________ #3 statement to read from the file
                Countrec+=1
                Total+=R[2]
                if R[2] > 75:
                    print(R[1], " has percent =",R[2])
                    Countabove75+=1
            except:
                break
    if Countabove75==0:
        print("There is no student who has percentage more than 75")
    average=Total/Countrec
    print("average percent of class = ",average)
AddStudents()
GetStudents()
1. Which of the following commands is used to open the file "STUDENT.DAT" for
writing only in binary format? (marked as #1 in the Python code)
a. F= open("STUDENT.DAT",'wb')
b. F= open("STUDENT.DAT",'w')
c. F= open("STUDENT.DAT",'wb+')
d. F= open("STUDENT.DAT",'w+')
2. Which of the following commands is used to write the list L into the binary file,
STUDENT.DAT? (marked as #2 in the Python code)
a. pickle.write(L,f)
b. pickle.write(f, L)
c. pickle.dump(L,F)
d. f=pickle.dump(L)
3. Which of the following commands is used to read each record from the binary file
STUDENT.DAT? (marked as #3 in the Python code)
a. R = pickle.load(F)
b. pickle.read(r,f)
c. r= pickle.read(f)
d. pickle.load(r,f)
4. Which of the following statement(s) are correct regarding the file access modes?
a. 'r+' opens a file for both reading and writing. File object points to its beginning.
b. 'w+' opens a file for both writing and reading. Adds at the end of the existing file if it
exists and creates a new one if it does not exist.
c. 'wb' opens a file for reading and writing in binary format. Overwrites the file if it
exists and creates a new one if it does not exist.
d. 'a' opens a file for appending. The file pointer is at the start of the file if the file
exists
5. Which of the following statements correctly explain the function of seek()
method?
a. tells the current position within the file.
b. determines if you can move the file position or not.
c. indicates that the next read or write occurs from that position in a file.
d. moves the current file position to a given specified position
ANSWERS
1.file = open("student.dat", "rb+")
2.pickling is used for object serialization
3.a) ab b)pickle.dump(employee,outfile)
4. dump(t, myfile)
5. b
6.a
7.c
8.c
9.b
10 b
11. c
12. There are two types of files: Text Files- A file whose contents can be viewed using a text editor is
called a text file. A text file is simply a sequence of ASCII or Unicode characters. Python programs and
contents written in text editors are some examples of text files. Binary Files- A binary file stores
the data in the same way as it is stored in memory. The .exe files, mp3 files, image files and Word
documents are some examples of binary files. We can't read binary files using a text editor.
13. Pickling is the process of transforming data or an object in memory (RAM) to a stream of bytes
called byte streams. These byte streams in a binary file can then be stored in a disk or in a database or
sent through a network.
Unpickling is the inverse of pickling process where a byte stream is converted back to Python object
14. A binary file is a file whose content is in a binary format consisting of a series of sequential bytes,
each of which is eight bits in length.Binary Files contain raw data so are not in human readable format.
It can be read by using some special tool or program.
Document files: .pdf, .doc, .xls etc.
Image files: .png, .jpg, .gif, .bmp etc.
Video files: .mp4, .3gp, .mkv, .avi etc.
Audio files: .mp3, .wav, .mka, .aac etc.
Database files: .mdb, .accde, .frm, .sqlite etc.
Archive files: .zip, .rar, .iso, .7z etc.
Executable files: .exe, .dll, .class etc
15. To delete a file, import the os module and run its os.remove() function.
import os
os.remove("demofile.txt")
16. I. a) II. c) III. a) IV. a) V.d)
17. Do Yourself