Computer Science
Chapter 1
Data Representation
Revision
• Data is information coded in a format ready for processing.
• Data is raw facts and figures and can be in the form of numbers,
symbols or alphanumeric characters.
• Analogue data are continuous signals that represent physical
measurements, such as sound or light waves and impulses on our skin.
• They form a smooth stream of data and are denoted by sine waves.
• Computers cannot process analogue data; they are only capable of
processing digital data.
• Digital data are discrete time signals generated by digital modulation
and represented in the values 1 and 0 that a computer can process.
• They use discrete (discontinuous) values to represent information and are
denoted by square waves.
• Computers contain millions and millions of tiny ‘switches’, each of which
must be in either the ON or OFF position.
• Their states can therefore be represented by the binary system:
A switch in the ON position is represented by 1;
A switch in the OFF position is represented by 0.
• Switches used in a computer make use of logic gates and are used to
store and process data.
Number Systems
Binary Number System
• The denary number system counts in powers of 10.
• This gives us the well-known headings of units, 10s, 100s, 1000s, and so
on.
• Denary uses ten separate digits, 0-9, to represent all values.
• Denary is known as a base 10 number system.
• The binary number system is a base 2 number system. It is based on the
number 2.
• Thus, only the two ‘values’ 0 and 1 can be used in this system to
represent all values.
• Using the same method as denary, this gives the headings 2^0, 2^1, 2^2,
2^3, and so on.
• The typical headings for a binary number with eight digits would be:
128  64  32  16  8  4  2  1
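The place-value headings can be turned into a short conversion routine. The sketch below is only an illustration: the function names `binary_to_denary` and `denary_to_binary` are made up for this example, and an eight-digit width is assumed.

```python
def binary_to_denary(bits: str) -> int:
    """Add up the place value (a power of 2) of every 1 in the bit string."""
    total = 0
    for position, bit in enumerate(reversed(bits)):
        if bit == "1":
            total += 2 ** position
    return total


def denary_to_binary(value: int, width: int = 8) -> str:
    """Repeatedly divide by 2, collecting remainders, then pad to `width` digits."""
    digits = ""
    while value > 0:
        digits = str(value % 2) + digits
        value //= 2
    return digits.rjust(width, "0")


print(binary_to_denary("01101101"))  # 64 + 32 + 8 + 4 + 1 = 109
print(denary_to_binary(109))         # 01101101
```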
The hexadecimal system
• The hexadecimal number system is very closely related to the binary
system.
• Hexadecimal is a base 16 system and needs to use 16 different ‘digits’ to
represent each value.
• The numbers 0 to 9 and the letters A to F are used to represent each
hexadecimal (hex) digit.
• A in hex = 10 in denary, B = 11, C = 12, D = 13, E = 14 and F = 15.
• Using the same method as for denary and binary, this gives the
headings 16^0, 16^1, 16^2, 16^3, and so on.
• The typical headings for a hexadecimal number with five digits would
be:
65 536  4096  256  16  1
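• The following table summarises the link between binary, hexadecimal
and denary:
Binary  Hexadecimal  Denary
0000    0            0
0001    1            1
0010    2            2
0011    3            3
0100    4            4
0101    5            5
0110    6            6
0111    7            7
1000    8            8
1001    9            9
1010    A            10
1011    B            11
1100    C            12
1101    D            13
1110    E            14
1111    F            15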
Use of the hexadecimal system
• Since one hex digit represents four binary digits, computer scientists
work with hexadecimal.
• The hex number is far easier for humans to remember, copy and work
with.
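Because one hex digit stands for exactly four binary digits, a binary number can be converted by splitting it into groups of four from the right. The following is a minimal sketch of that idea; the function name `binary_to_hex` is hypothetical.

```python
def binary_to_hex(bits: str) -> str:
    """Convert a binary string to hex by translating one 4-bit group at a time."""
    # Pad on the left so that the length is a multiple of 4.
    padded = bits.zfill((len(bits) + 3) // 4 * 4)
    hex_digits = "0123456789ABCDEF"
    result = ""
    for start in range(0, len(padded), 4):
        group = padded[start:start + 4]
        result += hex_digits[int(group, 2)]
    return result


print(binary_to_hex("101111100001"))  # 1011 1110 0001 -> BE1
```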
• The uses of the hexadecimal system:
Error debugging
Standard Windows error message codes
Machine code instructions
MAC addresses
IPv6 addresses
HTML colour codes
Error debugging
• Hexadecimal numbers refer to the memory location of the error and are
usually automatically generated by the computer.
• The programmer needs to know how to interpret the hexadecimal error
codes.
• Programs that are written in hexadecimal are easier to debug than
those written in binary.
Standard Windows error message codes
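• Standard Windows error message codes are given in hexadecimal
notation.
• E.g. error code 0x80070005 (meaning ‘Access is denied’) is written in
hexadecimal notation. The familiar code 404 (meaning ‘File not found’) is an
HTTP status code written in denary.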
Machine code instructions
• Machine code consists of simple instructions that are directly executed
by the CPU.
• Hexadecimal is used for machine code as each byte can be coded as two
hexadecimal symbols.
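As a small illustration of one byte being written as two hex symbols, the sketch below prints two arbitrary byte values (chosen only for the example, not taken from any particular program) in hexadecimal.

```python
# Two arbitrary 8-bit values, written in binary for comparison.
machine_code = bytes([0b10110000, 0b01100001])

# Each byte becomes exactly two hexadecimal symbols.
print(machine_code.hex(" ").upper())  # B0 61
```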
Media Access Control (MAC) address
• A MAC address is a number which uniquely identifies a device on
a network.
• The MAC address refers to the network interface card (NIC) which is
part of the device.
• The MAC address is rarely changed so that a particular device can
always be identified no matter where it is.
• A MAC address is usually made up of 48 bits which are shown as six
groups of two hexadecimal digits (although 64-bit addresses are also
known):
• NN:NN:NN:DD:DD:DD
• where the first half (NN:NN:NN) is the identity number of the
manufacturer of the device and the second half (DD:DD:DD) is the
serial number of the device.
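The NN:NN:NN:DD:DD:DD layout can be sketched in code as follows. The function name `split_mac` and the address used are made up for illustration, and only the 48-bit form is handled.

```python
def split_mac(mac: str) -> tuple[str, str]:
    """Return (manufacturer part, serial part) of a 48-bit MAC address."""
    groups = mac.replace("-", ":").upper().split(":")
    if len(groups) != 6 or any(len(group) != 2 for group in groups):
        raise ValueError("expected six groups of two hex digits")
    if any(digit not in "0123456789ABCDEF" for group in groups for digit in group):
        raise ValueError("found a character that is not a hex digit")
    return ":".join(groups[:3]), ":".join(groups[3:])


manufacturer, serial = split_mac("00:1a:2b:3c:4d:5e")  # made-up address
print(manufacturer)  # 00:1A:2B
print(serial)        # 3C:4D:5E
```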
Internet Protocol (IP) address
• IPv4 has been improved upon by the adoption of IPv6.
• An IPv6 address is a 128-bit number broken down into eight 16-bit chunks,
each represented by a four-digit hexadecimal number.
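Python's standard `ipaddress` module can display the 16-bit chunks of an IPv6 address. The address below comes from the range reserved for documentation and is used purely as an example.

```python
import ipaddress

address = ipaddress.IPv6Address("2001:db8::1")  # documentation-only example
groups = address.exploded.split(":")

print(address.exploded)     # 2001:0db8:0000:0000:0000:0000:0000:0001
print(len(groups))          # 8 chunks of 16 bits = 128 bits
print(int(groups[1], 16))   # 3512, the denary value of the chunk 0db8
```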
HyperText Mark-up Language (HTML) colour codes
• HyperText Mark-up Language (HTML) is used when writing and
developing web pages.
• HTML isn’t a programming language but is simply a mark-up language.
• A mark-up language is used in the processing, definition and
presentation of text.
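The notes above do not show a colour code itself. As commonly written, an HTML colour is a `#` followed by six hex digits, two each for red, green and blue. The sketch below assumes that form; the function name `parse_html_colour` is hypothetical.

```python
def parse_html_colour(code: str) -> tuple[int, int, int]:
    """Split a colour such as '#FF8000' into denary (red, green, blue) values."""
    digits = code.lstrip("#")
    if len(digits) != 6:
        raise ValueError("expected six hexadecimal digits")
    # Each pair of hex digits is one byte, so each value is between 0 and 255.
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))


print(parse_html_colour("#FF8000"))  # (255, 128, 0)
```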
Register
• A register is a small piece of memory built into the central processing
unit (CPU) of a computer system.
• Registers temporarily hold the values and instructions of a program.
• They are not part of primary memory or secondary storage.
• Registers have a much quicker data read/write rate than primary memory or
secondary storage.
• Computer systems use registers to hold values and instructions for
processing, to increase the speed at which they can be processed.
• There are different types of register, such as processor registers and
hardware registers.
• For example, the program counter (PC), the accumulator and the
memory address register (MAR) are processor registers that are part of the CPU.
• They are used to process data and instructions.
• They can be written to and read from extremely quickly.
• The fast speed of access makes registers very suitable for situations
where small amounts of data need to be accessed quickly, such as
performing calculations.
• Hardware registers are specific to different types of hardware.
• They are used to convey a signal.
• Consider a robot arm that has various motors to perform different
operations, for example, raise the arm, open the grip and close the grip.
• Each motor works via a signal, 1 for on, 0 for off. A register is used for
each motor to convey the signal.
Data storage
• As computers can only process the 1s and 0s that we know to be binary,
all data stored on a computer is in binary form.
• If we had to carry out our daily tasks on a computer using only binary it
would be extremely time consuming and challenging.
• Imagine having to write an email to your friends using only binary.
• Typing the 1s and 0s that represent each character would take you
much longer than typing out the characters the binary numbers
represent.
• Also, imagine having to type in each binary digit needed to create your
favourite images - it would be extremely difficult.
• Depending on its resolution, a character might require 100 bits of data.
• An image might require millions of bits.
• A number of systems and software were developed to do this for users
and they help a computer store different data such as text, images,
video and audio.
Character sets – ASCII code and Unicode
ASCII (American Standard Code for Information Interchange)
• The ASCII code system was set up in 1963 for use in communication
systems and computer systems.
• A newer version of the code was published in 1986.
• The standard ASCII code character set consists of 7-bit codes
(0 to 127 in denary or 00 to 7F in hexadecimal) that represent the
letters, numbers and characters found on a standard keyboard,
together with 32 control codes (that use codes 0 to 31 (denary) or 00 to
1F (hexadecimal)).
Part of the standard ASCII code table (only the control codes have been removed)
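Python's built-in `ord` and `chr` functions give a quick way to look up character codes. The sketch below prints a few codes in denary, hexadecimal and 7-bit binary; the characters were picked arbitrarily.

```python
for character in ("A", "a", "0", " "):
    code = ord(character)
    # Show the code in denary, two-digit hexadecimal and 7-bit binary.
    print(repr(character), code, format(code, "02X"), format(code, "07b"))

print(chr(65))              # A
print(ord("a") - ord("A"))  # 32
```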
• Extended ASCII uses 8-bit codes (0 to 255 in denary or 00 to FF in
hexadecimal).
• This gives another 128 codes to allow for characters in non-English
alphabets and for some graphical characters to be included.
Unicode
• ASCII code does not represent characters in non-Western languages,
for example Chinese characters.
• DOS and Windows use different characters for some ASCII codes.
• Different methods of coding have been developed over the years.
• One coding system is called Unicode.
• Unicode can represent all languages of the world, thus supporting
many operating systems, search engines and internet browsers used
globally.
• There is overlap with standard ASCII code, since the first 128 (English)
characters are the same, but Unicode can support over a million
different code points in total.
• ASCII uses one byte to represent a character, whereas Unicode will
support up to four bytes per character.
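The "one byte versus up to four bytes" point can be seen with UTF-8, one widely used Unicode encoding (the notes do not name a specific encoding, so this choice is an assumption for illustration).

```python
for character in ("A", "é", "中", "😀"):
    encoded = character.encode("utf-8")
    # Print the character, how many bytes it needs, and those bytes in hex.
    print(character, len(encoded), encoded.hex())

# The four characters need 1, 2, 3 and 4 bytes respectively.
```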
• The Unicode consortium was set up in 1991. Version 1.0 was published
with five goals. These were to:
» create a universal standard that covered all languages and all writing
systems
» produce a more efficient coding system than ASCII
» adopt uniform encoding where each character is encoded as 16-bit
or 32-bit code
» create unambiguous encoding where each 16-bit and 32-bit value
always represents the same character
» reserve part of the code for private use to enable a user to assign
codes for their own characters and symbols (useful for Chinese and
Japanese character sets, for example).
Representation of sound
• Sound waves are vibrations in the air.
• The human ear senses these vibrations and interprets them as sound.
• Each sound wave has a frequency, wavelength and amplitude.
• The amplitude specifies the loudness of the sound.
• Sound waves vary continuously.
• This means that sound is analogue.
• Computers cannot work with analogue data, so sound waves need to be
sampled in order to be stored in a computer.
• Sampling means measuring the amplitude of the sound wave.
• This is done using an analogue to digital converter.
• To convert the analogue data to digital, the sound waves are sampled at
regular time intervals.
• The amplitude of the sound cannot be measured precisely, so
approximate values are stored.
• The number of bits per sample is known as the sampling resolution (also
known as the bit depth).
• Sampling rate is the number of sound samples taken per second.
• How is sampling used to record a sound clip?
» the amplitude of the sound wave is first determined at set time intervals (the
sampling rate)
» this gives an approximate representation of the sound wave
» each sample of the sound wave is then encoded as a series of binary digits.
• The higher the sampling rate and/or sampling resolution, the greater
the file size.
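The sampling steps can be sketched as below. This is a simplified model only: a pure sine wave stands in for the sound, and the function name `sample_wave` and its parameters are invented for the example.

```python
import math


def sample_wave(frequency_hz, sampling_rate_hz, resolution_bits, duration_s):
    """Sample a sine wave and round each amplitude to the nearest stored level."""
    levels = 2 ** resolution_bits  # number of distinct values a sample can take
    samples = []
    for n in range(int(sampling_rate_hz * duration_s)):
        t = n / sampling_rate_hz  # time of this sample in seconds
        amplitude = math.sin(2 * math.pi * frequency_hz * t)  # between -1 and 1
        # Map -1..1 onto the whole numbers 0..levels-1 (an approximation).
        samples.append(round((amplitude + 1) / 2 * (levels - 1)))
    return samples


samples = sample_wave(frequency_hz=1, sampling_rate_hz=8, resolution_bits=3, duration_s=1)
print(samples)                              # 8 samples, each between 0 and 7
print([format(s, "03b") for s in samples])  # each sample stored as 3 bits
```

Raising `sampling_rate_hz` produces more samples per second and raising `resolution_bits` produces more bits per sample, which is why both increase the file size.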
Representation of (bitmap) images
• Bitmap images are made up of pixels (picture elements);
• an image is made up of a two-dimensional matrix of pixels.
• Each pixel can be represented as a binary number, and so a bitmap
image is stored in a computer as a series of binary numbers.
» A black and white image only requires 1 bit per pixel. This means that each
pixel can be one of two colours, corresponding to either 1 or 0.
» If each pixel is represented by 2 bits, then each pixel can be one of four
colours (2^2 = 4), corresponding to 00, 01, 10, or 11.
» If each pixel is represented by 3 bits, then each pixel can be one of eight
colours (2^3 = 8), corresponding to 000, 001, 010, 011, 100, 101, 110, 111.
• The number of bits used to represent each colour is called the colour
depth.
• An 8 bit colour depth means that each pixel can be one of 256 colours
(because 2^8 = 256).
• Modern computers have a 24 bit colour depth, which means over 16
million different colours can be represented. As a generalisation, with a
colour depth of x bits, 2^x colours can be represented.
• Increasing colour depth also increases the size of the file when storing
an image.
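The relationship between colour depth and the number of available colours can be checked with a one-line calculation. The function name `number_of_colours` is made up for this sketch.

```python
def number_of_colours(colour_depth_bits: int) -> int:
    """Each extra bit doubles the number of colours a pixel can take."""
    return 2 ** colour_depth_bits


for depth in (1, 2, 3, 8, 24):
    print(depth, "bits:", number_of_colours(depth), "colours")
# The 24-bit line shows 16777216 colours (over 16 million).
```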
• Image resolution refers to the number of pixels that make up an image;
• for example, an image could contain 4096 × 3072 pixels (12 582 912
pixels in total).
• The resolution can be varied on many cameras before taking, for
example, a digital photograph.
• Photographs with a lower resolution have less detail than those with a
higher resolution.
Measurement of data storage
• A bit is the basic unit of all computing memory storage terms and is
either 1 or 0.
• The word comes from binary digit.
• The byte is the smallest unit of memory in a computer.
• 1 byte is 8 bits.
• A 4-bit number is called a nibble – half a byte.
• 1 byte of memory wouldn’t allow you to store very much information, so
larger units are used.
• Since memory size is actually measured in terms of powers of 2, another
system has been adopted by the IEC (International Electrotechnical
Commission) that is based on the binary system, in which each unit is
1024 times the previous one (KiB, MiB, GiB and so on).
• This system is more accurate. Internal memories (such as RAM and
ROM) should be measured using the IEC system.
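In the IEC system each unit is 1024 times the previous one (1 KiB = 1024 bytes, 1 MiB = 1024 KiB, and so on). The sketch below converts a byte count into the largest suitable IEC unit; the function name `to_iec` is hypothetical.

```python
IEC_UNITS = ["bytes", "KiB", "MiB", "GiB", "TiB"]


def to_iec(size_in_bytes: int) -> str:
    """Divide by 1024 until the value is below 1024 or the units run out."""
    size = float(size_in_bytes)
    unit = 0
    while size >= 1024 and unit < len(IEC_UNITS) - 1:
        size /= 1024
        unit += 1
    return f"{size:.2f} {IEC_UNITS[unit]}"


print(to_iec(1024))            # 1.00 KiB
print(to_iec(68_719_476_736))  # 64.00 GiB
```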
The file size of an image is calculated as:
image resolution (in pixels) × colour depth (in bits)
• A photograph is 1024 × 1080 pixels and uses a colour depth of 32 bits.
• How many photographs of this size would fit onto a memory stick of
64 GiB?
1. Multiply the number of pixels in the vertical and horizontal directions to
find the total number of pixels = 1024 × 1080 = 1 105 920 pixels
2. Multiply the number of pixels by the colour depth to give the number of
bits = 1 105 920 × 32 = 35 389 440 bits, then divide by 8 to give the number
of bytes = 35 389 440 / 8 = 4 423 680 bytes
3. 64 GiB = 64 × 1024 × 1024 × 1024 = 68 719 476 736 bytes
4. Finally, divide the memory stick size by the file size =
68 719 476 736 / 4 423 680 = 15 534 photos (rounded down to whole photos).
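The photograph calculation can be reproduced in a few lines. This is a sketch that ignores any file header or compression; the function name `image_size_bytes` is invented for the example.

```python
def image_size_bytes(width: int, height: int, colour_depth_bits: int) -> int:
    """File size = image resolution (in pixels) x colour depth (in bits), in bytes."""
    return width * height * colour_depth_bits // 8


photo = image_size_bytes(1024, 1080, 32)
memory_stick = 64 * 1024 ** 3  # 64 GiB in bytes

print(photo)                  # 4423680
print(memory_stick // photo)  # 15534 whole photographs
```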
The size of a mono sound file is calculated as:
sample rate (in Hz) × sample resolution (in bits) × length of sample
(in seconds)
• An audio CD has a sample rate of 44 100 Hz and a sample resolution of
16 bits. The music being sampled uses two channels to allow for stereo
recording. Calculate the file size for a 60-minute recording.
1. Size of file = sample rate (in Hz) × sample resolution (in bits) × length of
sample (in seconds)
2. Size of sample = 44 100 × 16 × (60 × 60) = 2 540 160 000 bits
3. Multiply by 2 since there are two channels being used = 5 080 320 000 bits
4. Divide by 8 to find the number of bytes = 5 080 320 000 / 8 =
635 040 000 bytes
5. Divide by 1024 × 1024 to convert to MiB = 635 040 000 / 1 048 576 ≈
605.6 MiB.
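The same steps can be checked in code. The sketch below extends the mono formula with a `channels` parameter, which is an addition made for this example.

```python
def sound_size_bytes(sample_rate_hz, resolution_bits, seconds, channels=1):
    """Sample rate x sample resolution x length x channels, converted to bytes."""
    return sample_rate_hz * resolution_bits * seconds * channels // 8


size = sound_size_bytes(44_100, 16, 60 * 60, channels=2)

print(size)                        # 635040000
print(round(size / 1024 ** 2, 1))  # 605.6 (MiB)
```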
File compression
Reasons to reduce (or compress) the size of a file
1. to save storage space on devices such as the hard disk drive/solid state
drive
2. to reduce the time taken to stream a music or video file
3. to reduce the time taken to upload, download or transfer a file across
a network
4. the download/upload process uses up network bandwidth – this is the
maximum rate of transfer of data across a network, measured in bits
per second.
This occurs whenever a file is downloaded, for example, from a server.
Compressed files contain fewer bits of data than uncompressed files and therefore
use less bandwidth, which results in a faster data transfer rate.
5. reduced file size also reduces costs. For example, when using cloud
storage, the cost is based on the size of the files stored. Also an
internet service provider (ISP) may charge a user based on the amount
of data downloaded.
Lossy file compression
• With this technique, the file compression algorithm eliminates
unnecessary data from the file.
• This means the original file cannot be reconstructed once it has been
compressed.
• Lossy file compression results in some loss of detail when compared to
the original file.
• The algorithms used in the lossy technique have to decide which parts
of the file need to be retained and which parts can be discarded.
• For example, when applying a lossy file compression algorithm to:
» an image, it may reduce the resolution and/or the bit/colour depth
» a sound file, it may reduce the sampling rate and/or the resolution.
• Lossy files are smaller than lossless files which is of great benefit when
considering storage and data transfer rate requirements.
• Common lossy file compression algorithms are:
» MP3 (MPEG Audio Layer III) and MP4 (MPEG-4)
» JPEG.
MP3 (MPEG Audio Layer III) and MP4 (MPEG-4)
• MP3 files are used for playing music on computers or mobile phones.
• This compression technology will reduce the size of a normal music file
by about 90%.
• While MP3 music files can never match the sound quality found on a
DVD or CD, the quality is satisfactory for most general purposes.
• But how can the original music file be reduced by 90% while still
retaining most of the music quality?
• Essentially the algorithm removes sounds that the human ear can’t hear
properly.
• For example:
» removal of sounds outside the human ear range
» if two sounds are played at the same time, only the louder one can be heard by
the ear, so the softer sound is eliminated.
This is called perceptual music shaping.
• MP4 files are slightly different to MP3 files.
• This format allows the storage of multimedia files rather than just sound
– music, videos, photos and animation can all be stored in
the MP4 format.
• As with MP3, this is a lossy file compression format, but it still retains an
acceptable quality of sound and video.
• Movies, for example, could be streamed over the internet using the
MP4 format without losing any real discernible quality.
JPEG
• When a camera takes a photograph, it produces a raw bitmap file which
can be very large in size.
• These files are temporary in nature.
• JPEG is a lossy file compression algorithm used for bitmap images.
• As with MP3, once the image is subjected to the JPEG compression
algorithm, a new file is formed and the original file can no longer be
reconstructed.
• The JPEG file reduction process is based on two key concepts:
» human eyes don’t detect differences in colour shades quite as well as they detect
differences in image brightness
(the eye is less sensitive to colour variations than it is to variations in brightness)
» by separating pixel colour from brightness, images can be split into 8 × 8 pixel blocks,
for example, which then allows certain ‘information’ to be discarded from the image
without causing any real noticeable deterioration in quality.
Lossless Compression
• Lossless refers to a method of compression that loses no data in the
process.
• In lossless compression, the compressed data can be reversed to
reconstruct the data file exactly as it was.
• Lossless compression is used when it is essential that no data is lost or
discarded during the compression process.
• There are many different lossless compression algorithms; most work
using a shorthand to store the data that can be then reconstructed
when the file is opened.
• If a lossless compression method is used on a music file it will not lose
any of the data from the file.
• A possible way to compress the data would be to look for repeating
patterns in the music.
• It would store this pattern once along with how many times it is
repeated.
• This way repeating data is reduced.
• When the music track is played, the full track, exactly as it was recorded,
can be reconstructed and listened to.
• People may use lossless compression when downloading a music track if
they want the highest quality possible and to hear the track exactly as it
was recorded.
• Lossless compression can also be used when storing text files.
• Consider the following message:
WHEN IT IS SNOWING HEAVILY LOOK OUTSIDE.
LOOK OUTSIDE IT IS SNOWING HEAVILY.
• Excluding the spaces between the words and the full stops, the message
has a total of 62 characters.
• 1 character requires 1 byte of storage, so we would need 62 bytes of
memory to store this message.
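• When we examine the message we can see that it consists of words that
are mostly repeated.
• Instead of storing every word each time it appears, we can store each
different word once, in a numbered list, and then store the message as a
sequence of word numbers (positions).
• The seven different words (WHEN, IT, IS, SNOWING, HEAVILY, LOOK, OUTSIDE)
contain 33 characters, and the message is 13 words long.
• Therefore we would need 33 bytes to store the words and 13 bytes to
store the positions, giving a total of 46 bytes.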
• This is much less than the 62 bytes we required with our original
method.
• No data has been lost and we have reduced our storage requirements
by 26%, quite a saving!
• To recreate the message, the computer simply retrieves the words and
places them in the positions allocated.
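The word-and-position scheme for this message can be sketched as follows. As in the count above, spaces and full stops are left out; the function names `compress` and `decompress` are hypothetical.

```python
message = ("WHEN IT IS SNOWING HEAVILY LOOK OUTSIDE "
           "LOOK OUTSIDE IT IS SNOWING HEAVILY")


def compress(text):
    """Store each different word once, plus one position number per word."""
    words = []
    positions = []
    for word in text.split():
        if word not in words:
            words.append(word)
        positions.append(words.index(word) + 1)
    return words, positions


def decompress(words, positions):
    """Rebuild the message by looking up each position in the word list."""
    return " ".join(words[p - 1] for p in positions)


words, positions = compress(message)
print(words)      # the 7 different words
print(positions)  # [1, 2, 3, 4, 5, 6, 7, 6, 7, 2, 3, 4, 5]
print(sum(len(w) for w in words) + len(positions))  # 33 + 13 = 46
print(decompress(words, positions) == message)      # True
```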
• We should note that the amount of compression we can achieve varies
depending on the data we wish to store.
• Both the lossy and lossless methods of compression reduce the size of
an image by looking for repeating colour patterns within the image.
• For example, for an image that has a main background colour that is the
same throughout the image, a compression method will recognise that
there are a lot of pixels that all have the same value and collate them.
• This means they will be stored as a single data value with further data
that records the pattern.
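One simple lossless way to collate pixels with the same value is run-length encoding. The notes do not name a specific method, so treat this as one possible illustration; the row of pixels is made up, with "W" standing for a white background pixel and "B" for a black one.

```python
def rle_encode(pixels):
    """Replace each run of identical values with a [value, count] pair."""
    runs = []
    for value in pixels:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1
        else:
            runs.append([value, 1])
    return runs


def rle_decode(runs):
    """Expand every [value, count] pair back into the original pixels."""
    pixels = []
    for value, count in runs:
        pixels.extend([value] * count)
    return pixels


row = ["W"] * 6 + ["B"] * 2 + ["W"] * 4
encoded = rle_encode(row)
print(encoded)                     # [['W', 6], ['B', 2], ['W', 4]]
print(rle_decode(encoded) == row)  # True
```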
• Lossy compression will reduce the file size further by removing detail
from the image that should go unnoticed and will not affect the quality
too much.
• One issue with using some lossy compression methods on images is
that the method will remove a little bit of detail each time the image is
saved in that format, e.g. JPEG.
• This means that there will be a small loss in quality each time it is saved.
File formats
• A file format is the method that we choose to store different data on a
computer.
• Different file formats encode data in different ways.
• This means that they organise the data for storage in different ways.
• It is important for software to recognise the file format used to save the
data in order to access it.
• There are many different types of file format.
• Some are specific to software and some are more generic or standard.
• Certain file formats are designed for a particular type of data, for
example text, images or multimedia.
• The file format will mostly depend on the type of data it will be storing.
• The file format is normally indicated by three or four characters,
separated from the file name with a dot, known as the file extension.
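Software can read the extension to decide how to open a file. The sketch below uses Python's standard `pathlib` module; the file names are invented for the example.

```python
from pathlib import Path

for name in ("holiday.jpeg", "song.mp3", "report.docx"):
    path = Path(name)
    # `stem` is the file name and `suffix` is the extension, including the dot.
    print(path.stem, path.suffix)

print(Path("holiday.jpeg").suffix)  # .jpeg
```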
• These are the most common file extensions:
• Users often need to import and export data in and out of different
software.
• In order to do this users need the different files to be compatible with
each other so that data can be effectively imported and exported.
• This prompted the development of standard file formats that different
software applications can understand.
• There are four multimedia standard file formats that you should be
aware of:
• Musical Instrument Digital Interface (MIDI) uses a series of protocols
and interfaces that allow lots of different types of musical instrument to
connect and communicate.
• MIDI also allows one computer, or instrument, to control other
instruments.
• MIDI files are not a musical recording, but a series of instructions for an
instrument to carry out.
• Joint Photographic Experts Group (JPEG) is a standard format for lossy
compression of images.
• It can reduce files down to 5% of their original size.
• MP3 is a standard format for lossy compression of audio files.
• MP4 is a standard format for lossy compression of video files.
• It can also be used on other data such as audio and images.
• MP3 and MP4 have developed from the original standards of the Moving
Picture Experts Group (MPEG).
• This is a lossy compression method for video files dating back to 1991.
• JPEGs, MP3s and MP4s are used in a wide variety of devices, such as
computers, digital cameras, DVD/Blu-Ray players and smartphones to
store content.