PART A:
CHAPTER 2: SOURCES OF DATA
Introduction
The management accountant needs data in order to be able to process it into information.
This chapter describes various sources of data, the characteristics of big data, and a range of sampling techniques.
Definitions
Data – Raw facts and figures.
Information – Data that has been processed to have meaning.
Classify the statements as data or information:
Statement | Data or information
Exam results for a class of mathematics students for the years 20X1 – 20X5 | Data
A graph showing the exam results of a class of mathematics students for the years 20X1 – 20X5 | Information
A list of purchase invoices for Furniture Co for September 20X1 | Data
Management accounts for Furniture Co for the month of June 20X1 | Information
A list showing the age of members of staff in the Human Resources (HR) department of Furniture Co during 20X5 | Data
A chart showing how the ages of staff in the HR department of Furniture Co in 20X5 range from 28 to 49 years old | Information
Primary and secondary sources of data
Primary data are data that have been collected for the specific purpose in hand (the data did not previously exist).
Secondary data are data that have been collected for some other purpose but which we then use for our own purposes (the data already exists).
Illustration:
Decide which of the following are primary data and which are secondary data.
a) Information from clock cards when used for making up wages.
b) Data from a government publication on the toy industry used by a new toy shop to
determine which items to stock.
c) Expense claim forms submitted by sales representatives used to estimate the car mileage
they have travelled.
d) Results of an election opinion poll published in a newspaper.
Solution
a) This is primary data, since the data is collected to make up the wages.
b) This is secondary data; government statisticians collate data from various sources and the
data is used in a variety of ways.
c) This is secondary data, since the expense claim data was initially collected for a different purpose (reimbursing the representatives' expenses).
d) This is primary data, since the data was collected specifically for the purpose. If you said secondary data, you were probably thinking that the results were being used to predict the outcome of the election; this is different from the reason why the data was collected.
Internal and external sources of data
Internal data are data collected from our own records. These are the main source of primary
data.
External data are data collected from elsewhere – e.g. the internet, government statistics,
financial newspapers. These will be secondary data.
Sources of Data: Machine, Transactional, and Human
Machine/sensor data
Description: Data gathered from a device containing a sensor that detects data from the physical environment.
Example: Sensors can record data such as temperature, motion, light and proximity. For example, supermarket chains often record the temperature of their fridges in stores using sensors and then use this data to predict when particular sets of fridges will need to be serviced. In the agricultural industry, sensors can measure soil temperature and relative humidity, leading to better decisions about pesticides and watering levels.

Transactional data
Description: Data relating to the transactions of a business, such as data recorded when products or services are bought and sold.
Example: Transactional data details include the date, cost or price, quantities and payment methods. Compared with data such as the name and address of a customer (known as master data), transactional data is created frequently and changes frequently.

Human/social data
Description: Data related to humans from social media posts, handwritten letters, phone calls, questionnaires and surveys.
Example: It can include the user's location (for example, the place of the person posting on social media), language (for example, English or Chinese), emotions and preferences. It often needs human interpretation, for example, to establish and understand what people think and feel.
Exercise: Determine the source of the data:
Data | Source
A review by a customer on social media about the quality of material used in the product | Human/social
The date the materials were last purchased from the supplier | Transactional
The current location of the delivery van containing the new order of the materials | Machine/sensor
A customer telephone query about whether the materials in the product have changed | Human/social
Direct and Indirect Data Capture Costs
Direct data – Data captured directly from the source.
Indirect data – Any data which is not obtained directly from the source. It is typically obtained through secondary sources and aggregators.
Direct Costs of Information
Day-to-day running costs
Costs associated with capturing, processing, analysing, and using data.
For example, the costs of maintaining a system that captures customer data as customers make purchases on an e-commerce platform such as Jumia, Kilimall or Jiji.
Another example is the cost of a reporting platform that queries data from storage and organises or translates it into usable reports, charts and dashboards.
Storage costs
Costs associated with storing data, for example, the costs incurred to maintain an on-site storage server or a subscription to cloud storage.
It also includes costs associated with database management systems that manage stored data
and information queries.
Significant direct costs include data capture and processing.
Direct data capture costs are the costs of getting the data into the system in the first place.
Direct data capture methods include:
Use and processing of data forms, whether manually entered into a system, digitised forms with pre-set fields, or forms read automatically using optical character recognition (OCR).
Use of machine-readable bar codes, QR codes or RFID tags.
Artificial intelligence (AI) or algorithmic processing and codifying (adding associated tags) of non-standardised data captures, such as faces or physical items.
Other machines, workforce and systems for large-scale data capture, such as measuring devices and associated systems, laboratory work, sensors on the manufacturing line and survey drones.
Indirect Data Capture Costs
Indirect data might have a transactional cost, meaning it was purchased directly from another
entity, such as a research firm or university.
There might also be costs in verifying the reliability of the data, such as establishing sources,
cross-referencing with data of known veracity, etc.
As with direct data, indirect data may need to be processed into an appropriate format and structure before it can be used, and this processing may also incur costs.
Overview of Big Data
Big data – Collections of data that increase exponentially over time, with too much volume,
variety, and velocity for traditional data-processing methods to analyse effectively.
Aspects of Big Data:
Volume – The volume of data being captured from transactions, social media, customer relationship management systems and sensors has exploded in recent years and continues to do so.
Velocity – The speed (velocity) at which data is streamed into systems and organisations is also increasing rapidly. To be useful, this data needs to be captured and analysed in an efficient and timely manner.
Variety – Variety relates to the many types and forms of data collected and generated. Structured data refers to data held in defined file structures, for example, a transaction file. Unstructured data includes images, audio and video files, and ‘free text’ in social media posts and emails.
Veracity – Data quality, in terms of accuracy and truthfulness, is essential for effective decision-making. Analysis can only provide valuable findings if the data collected is true; this includes considering the reliability of the data source.
Value – The benefit of having the data must be higher than the cost of obtaining it.
Match the data description to the correct aspect of Big Data:
Description | Aspect
Data is too large to be analysed using a spreadsheet | Volume
Data can be processed in real time | Velocity
Data includes text, photographs, emojis, videos | Variety
Big data selected from a population of items may be inaccurate | Veracity
Big data is only meaningful when analysed to form insights | Value
Structured and Unstructured Data
Structured data: This data is stored within defined fields (numerical, text, date etc.),
often with defined lengths, within a defined record, in a file of similar records.
An example of structured data is found in banking systems, which record the receipts
and payments from your current account: date, amount, receipt/payment, and short
explanations such as payee or source of the money.
Structured data is easily accessed using well-established database query languages such as SQL (Structured Query Language).
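To illustrate why defined fields make structured data easy to query, here is a minimal sketch using Python's built-in sqlite3 module and an in-memory database. The table name account_txn, its columns and the transactions are hypothetical, chosen to mirror the banking example above.

```python
import sqlite3

# Hypothetical current-account records: every record has the same defined
# fields (date, amount, receipt/payment, short explanation), so it is structured.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account_txn ("
    " txn_date TEXT, amount REAL, txn_type TEXT, description TEXT)"
)
conn.executemany(
    "INSERT INTO account_txn VALUES (?, ?, ?, ?)",
    [
        ("20X1-09-01", 1500.00, "receipt", "Salary"),
        ("20X1-09-03", 45.25, "payment", "Supermarket"),
        ("20X1-09-05", 600.00, "payment", "Rent"),
        ("20X1-09-12", 30.50, "payment", "Supermarket"),
    ],
)

# Because the fields are defined, one SQL query can total payments by payee.
rows = conn.execute(
    "SELECT description, SUM(amount) FROM account_txn"
    " WHERE txn_type = 'payment'"
    " GROUP BY description ORDER BY description"
).fetchall()
print(rows)  # [('Rent', 600.0), ('Supermarket', 75.75)]
```

The same question asked of unstructured data (for example, free-text emails about spending) would first require the relevant facts to be extracted and given a structure.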
Unstructured data refers to information that does not have a pre-defined data model.
It comes in all shapes and sizes, and this variety and irregularity make it challenging to
store in a way that will allow it to be analysed, searched or otherwise used.
An often-quoted statistic is that 80% of business data is unstructured, residing in word
processor documents, spreadsheets, PowerPoint files, audio, video, social media
interactions and map data.
Uses of Big Data by Organisations:
Data analytics – The process of deriving meaning from data.
Data analytics is the collection and analysis of data to find patterns and draw conclusions. Generally, the more data that is available, the better the resulting analysis and findings.
Data analytics can be significantly enhanced using artificial intelligence approaches such as machine learning. Machine learning is an approach whereby developers build a program that creates a model based on a set of initial data; the model can then be used, for example, to make predictions such as customer demand for Product X.
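As a minimal sketch of "a model created from initial data and then used to predict", the code below fits an ordinary least-squares straight line to a hypothetical demand history for Product X and uses it to forecast the next month. Linear regression is only the simplest example of a model learned from data; practical machine learning normally uses dedicated libraries and many more input variables.

```python
# Hypothetical monthly demand history for Product X (units).
months = [1, 2, 3, 4, 5, 6]
demand = [102, 108, 121, 129, 141, 149]


def fit_linear_model(xs, ys):
    """Learn a slope and intercept from the data by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# "Training": the model parameters come from the initial data set.
slope, intercept = fit_linear_model(months, demand)

# "Prediction": apply the learned model to a month not yet observed.
forecast_month_7 = intercept + slope * 7
print(round(forecast_month_7, 1))  # about 159.2 units
```

The forecast is only as good as the historical data behind it, which links back to the veracity aspect of big data.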
Organisational Strategy - Big data can be used to simulate many different scenarios rapidly and
relatively cheaply (compared to the time and effort involved in human-developed scenarios).
This method can determine the likely best markets and approaches to meet growth and
profitability ambitions for profit-focused businesses. It could predict the impact of climate-
related events and enable governments and charities to provide relief where and when needed.
It can also be used at an individual level, for example, to provide medical diagnoses.
Customer Satisfaction and Forecasting Demand - Big data analysis allows organisations to
understand their stakeholders’ preferences and needs. For example, customers can be sent
shopping offers tailored to their habits and needs. Big data from sources such as website visitor
numbers, customer surveys and social media posts can be combined with data relating to the
economic climate and competition to spot trends, predict customer demand, and prevent
issues from arising.
Problems with Big Data
Big data veracity: Data needs to be relevant and trustworthy, but big data can be imprecise, inconsistent or biased, particularly when it comes from sources such as social media and the internet.
Big data technology: The technology is constantly evolving, and it can sometimes be
challenging to implement and costly.
Big data analysts: Businesses need suitably skilled data analysts, who are currently in short supply.
Security and privacy: Security and privacy laws vary across the world. Businesses must
ensure they are not violating regulations when they source and store data.
Sampling and Expected Values
Sampling
It is common to collect data from a sample rather than from the whole population.
Data from the sample are used as representative of the whole population.
A sample is a portion of the entire population of interest.
Population – The set of items from which a sample is drawn to form conclusions.
Sample – A subset of items selected from a population which is analysed to form conclusions
about the population.
Discrete and Continuous data
Discrete data is non-continuous data. Discrete data can only take certain values, for example, the number of students taking a course (there cannot be half a student). Discrete data is counted.
Continuous data is unbroken data that has no gaps. Continuous data can take any value (within a range), for example, time or distance. Continuous data is measured.
Sampling methods
1. Random sampling
Every item in the population has an equal chance of being selected.
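A minimal sketch of random sampling in Python, assuming the population has been listed as a numbered sampling frame (here, hypothetical invoices 1 to 500). random.Random.sample selects without replacement, giving every item the same chance of selection; the seed parameter is only there to make the illustration repeatable.

```python
import random

# Hypothetical sampling frame: invoices numbered 1 to 500.
population = list(range(1, 501))


def simple_random_sample(frame, n, seed=None):
    """Select n items so that every item has an equal chance (no replacement)."""
    rng = random.Random(seed)
    return rng.sample(frame, n)


sample = simple_random_sample(population, 20, seed=42)
print(sorted(sample))
```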
2. Systematic sampling (quasi-random)
Systematic sampling is a technique that involves selecting every nth item after a first item that is selected at random.
For example, select every 10th item in the population.
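The systematic method can be sketched in a few lines, assuming the population is held as an ordered list (hypothetical items 1 to 100) and the sampling interval is 10. Only the starting point is random; every later item follows from it.

```python
import random


def systematic_sample(frame, interval, seed=None):
    """Choose a random start within the first interval, then every nth item after it."""
    rng = random.Random(seed)
    start = rng.randrange(interval)  # random position 0 .. interval - 1
    return frame[start::interval]


population = list(range(1, 101))  # hypothetical items numbered 1 to 100
sample = systematic_sample(population, 10, seed=1)
print(sample)
```

This is why the method is called quasi-random: once the first item is chosen, the rest of the sample is fixed.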
3. Stratified sampling (quasi-random)
Split the population into groups (strata) and then select at random from each group. For example, if 60% of the
population are women and 40% are men, then 60% of the sample should be women and
40% men.
Stratified sampling is used when the population is divided into different strata or groups. A
random sample is then taken from each group. It is a good technique when there are
apparent groups within a population.
Using a stratified sample reduces the chance of accidentally putting together a sample that
is not representative of the population as a whole.
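A minimal sketch of stratified sampling with proportional allocation, using the hypothetical 60% women / 40% men population from the example above. Each stratum is sampled at random, and its share of the sample matches its share of the population.

```python
import random


def stratified_sample(strata, n, seed=None):
    """Randomly sample each stratum in proportion to its share of the population."""
    rng = random.Random(seed)
    total = sum(len(items) for items in strata.values())
    sample = {}
    for name, items in strata.items():
        k = round(n * len(items) / total)  # proportional allocation
        sample[name] = rng.sample(items, k)
    return sample


# Hypothetical population: 600 women and 400 men (60% / 40%).
population = {
    "women": [f"W{i}" for i in range(600)],
    "men": [f"M{i}" for i in range(400)],
}
sample = stratified_sample(population, 50, seed=7)
print({group: len(members) for group, members in sample.items()})
```

With less convenient proportions, rounding each stratum separately may leave the total one or two items away from n, so the allocation would need a small adjustment.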
4. Multistage sampling (quasi-random)
Sometimes a population may be too large to be sampled using the methods discussed so far,
because it will not be possible to draw up a sampling frame of all the items in the
population.
Multistage sampling is a sampling technique that involves dividing a large population (such
as a country) into different areas. First, a set of regions is selected at random, and each
chosen region is divided into further sub-areas. A random sample of these sub-areas is then
selected.
For example, suppose a company has several thousand purchase invoices filed, filling 20
files. Take a random sample of (say) 5 files, and then a random sample of (say) 20 invoices
from each of these files.
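The invoice example can be sketched as two random stages, assuming (hypothetically) 20 files of 150 invoices each: first a random sample of 5 files, then a random sample of 20 invoices within each chosen file.

```python
import random

# Hypothetical filing system: 20 files, each holding 150 purchase invoices.
files = {
    f"File {i:02d}": [f"INV-{i:02d}-{j:03d}" for j in range(150)]
    for i in range(1, 21)
}


def multistage_sample(groups, n_groups, n_per_group, seed=None):
    rng = random.Random(seed)
    chosen = rng.sample(sorted(groups), n_groups)  # stage 1: pick files at random
    # Stage 2: pick invoices at random within each chosen file.
    return {g: rng.sample(groups[g], n_per_group) for g in chosen}


sample = multistage_sample(files, 5, 20, seed=3)
print(sorted(sample))
```

No complete list of all several thousand invoices is needed; only a list of files and, later, lists of the invoices in the 5 chosen files.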
5. Cluster sampling (not random)
Cluster sampling is a technique that involves dividing the total population into small groups
(or clusters), randomly selecting one or more clusters and then examining (or interviewing)
every member of each chosen cluster.
For example, suppose a company has 100 offices throughout the country, each issuing sales
invoices. Take a random sample of (say) 5 offices and check every invoice at each of these
offices.
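A minimal sketch of the office example, with hypothetical offices and invoice counts. The difference from multistage sampling is visible in the last step: once a cluster is chosen, every item in it is taken, with no further sampling inside the cluster.

```python
import random

# Hypothetical clusters: 100 offices, each with a different number of sales invoices.
offices = {
    f"Office {i:03d}": [f"SI-{i:03d}-{j}" for j in range(10 + i % 5)]
    for i in range(1, 101)
}


def cluster_sample(clusters, n_clusters, seed=None):
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    # Every item in each chosen cluster is included.
    return {c: list(clusters[c]) for c in chosen}


sample = cluster_sample(offices, 5, seed=11)
print({office: len(invoices) for office, invoices in sample.items()})
```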
6. Quota sampling (not random)
Quota sampling is a technique that involves dividing the population into different groups
and then instructing interviewers to question a particular number (quota) of people in each
group. It is common for the interviewers to decide how to select the sample from each
group.
Suppose the population is 60% women and 40% men, and that we want to question a
sample of 200 people in total. Decide on a quota of 120 women (60%) and 80 men (40%) and then
stop people as they appear until we have the required number of each.
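The quota procedure above can be sketched as filling quotas from a stream of passers-by. The stream is simulated here and is purely hypothetical; the interviewer's choice is modelled simply as "accept whoever appears next if their group's quota is not yet full", which shows why the result is not a random sample.

```python
import random


def fill_quotas(passers_by, quotas):
    """Accept people in the order they appear until every group's quota is met."""
    remaining = dict(quotas)
    sample = []
    for person_id, group in passers_by:
        if remaining.get(group, 0) > 0:
            sample.append((person_id, group))
            remaining[group] -= 1
        if not any(remaining.values()):
            break  # all quotas filled, so stop interviewing
    return sample


# Hypothetical stream of 1,000 people passing the interviewer.
rng = random.Random(3)
passers_by = [(i, rng.choice(["woman", "man"])) for i in range(1000)]
sample = fill_quotas(passers_by, {"woman": 120, "man": 80})
print(len(sample))
```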
Determine the sampling technique described:
Statement | Sampling method
Every 15th child in a school is selected for an interview after the first child is chosen randomly. | Systematic sampling
1,000 people are each given a number 1 – 1,000, and then raffle tickets are drawn from a box to select prize winners. | Random sampling
A hospital is selected at random, and every patient in it is questioned about the treatment they received. | Cluster sampling
An interviewer is asked to question ‘5 men in Supermarket A’, ‘10 women in Supermarket B’, ‘5 women in Supermarket C’ and ‘10 men in Supermarket D’. | Quota sampling
Probability
Probability – The likelihood of an event, quantified between 0 (certainty that the event will not happen) and 1 (certainty that the event will happen).
A probability of 0 means there is no chance of the event occurring, i.e. it is impossible.
A probability of 1 means the event will occur with certainty.
A probability of 0.5 means a 50% chance of an event occurring. For example, there are two possible outcomes when a coin is tossed: the coin may land heads up, or it may land tails up.
The probability of heads up is 1 out of 2, or ½ or 50%.
Assigning Probabilities to Events
Probabilities are often estimated by analysing experience or by doing research. In simple situations, where all outcomes are equally likely, the probability (P) of an event occurring is:
P(event) = Number of outcomes in which the event occurs / Total number of possible outcomes
Example 1: A box of coloured pens contains 3 red pens, 2 blue pens and 1 yellow pen.
If a pen is picked randomly, what is the probability that it will be a blue pen?
Answer:
P(blue pen) = Number of blue pens / Number of pens
= 2/6
= 1/3 or 0.33 (to two decimal places)
The probability of picking a red pen = 3/6 = 1/2
The probability of picking a yellow pen is 1/6.
Note that the probabilities sum to 1 (1/3 + ½ + 1/6 = 1).
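The pen calculation can be checked with Python's fractions module, which keeps the results as exact fractions rather than rounded decimals. This is a sketch of the equally-likely-outcomes formula only; the dictionary layout is an illustrative choice.

```python
from fractions import Fraction

pens = {"red": 3, "blue": 2, "yellow": 1}
total = sum(pens.values())  # 6 possible outcomes, all equally likely

# P(colour) = number of pens of that colour / total number of pens
probabilities = {colour: Fraction(count, total) for colour, count in pens.items()}

print(probabilities["blue"])        # 1/3
print(probabilities["red"])         # 1/2
print(sum(probabilities.values()))  # 1
```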
Expected Values
Modern businesses operate in situations of uncertainty, with much data being estimated based
on various conditions. Data such as future sales demand and costs cannot be accurately
predicted. Sometimes it is possible to assign probabilities to the likelihood of events. Once this
has been done, an expected value can be calculated.
Expected value – Weighted arithmetic mean of possible outcomes.
The expected value is calculated by multiplying each outcome by the probability of that
outcome and adding up the total.
Generally, the decision rule would be to choose the course of action with the highest expected value (EV).
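The calculation described above can be written as a formula. The symbols are introduced here for illustration: x_i is each possible outcome and p_i is its probability, with n possible outcomes in total.

```latex
EV = \sum_{i=1}^{n} p_i x_i , \qquad \text{where } \sum_{i=1}^{n} p_i = 1
```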
Example 2: Research has estimated the following weekly sales volumes and their associated
probabilities:
Sales volume (units) | Probability
1,000 | 0.2
5,000 | 0.5
7,000 | 0.3
What is the expected value of sales volume for the week?
Answer:
Sales volume (units) | Probability | Sales volume x probability
1,000 | 0.2 | 200
5,000 | 0.5 | 2,500
7,000 | 0.3 | 2,100
Expected value | | 4,800
The expected sales volume is 4,800 units. Management can use this information to make
decisions. For example, if the cost card showed that this product would only make a profit if a
minimum of 5,000 units were sold, producing and selling this product would not be financially
worthwhile.
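The same weighted-average calculation can be sketched as a small Python function. The name expected_value and the (outcome, probability) pair layout are illustrative choices; the check that the probabilities sum to 1 uses a small tolerance because decimal probabilities such as 0.2 and 0.3 are not stored exactly as floating-point numbers.

```python
def expected_value(outcomes):
    """Weighted arithmetic mean: sum of each outcome multiplied by its probability."""
    total_probability = sum(p for _, p in outcomes)
    if abs(total_probability - 1) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(x * p for x, p in outcomes)


weekly_sales = [(1_000, 0.2), (5_000, 0.5), (7_000, 0.3)]
ev = expected_value(weekly_sales)
print(round(ev))  # 4800
```

Passing the additional-profit figures from Example 3, expected_value([(2_000, 0.35), (5_000, 0.55), (6_000, 0.10)]), gives approximately 4,050.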
Example 3:
A business is deciding whether or not to buy a new machine. The additional profits provided by
the machine per year and their associated probabilities are as follows:
Additional profit | Probability
$2,000 | 0.35
$5,000 | 0.55
$6,000 | 0.10
What is the expected additional profit per year?
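Solution
Additional profit | Probability | Additional profit x probability
$2,000 | 0.35 | $700
$5,000 | 0.55 | $2,750
$6,000 | 0.10 | $600
Expected value | | $4,050
The expected additional profit is $4,050 per year.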
CHAPTER 2 QUESTIONS
1. Which of the following is/are primary sources of data?
i. Historical records of transport costs to be used to prepare forecasts for budgetary
planning
ii. The Annual Abstract of Statistics, published by the Office for National Statistics in the
United Kingdom
iii. Data collected by a bank in a telephone survey to monitor the effectiveness of the
bank's customer services
A. (i) and (ii)
B. (i) and (iii)
C. (i) only
D. (iii) only (2 marks)
2. The following statements relate to different types of data:
i. Secondary data are data collected especially for a specific purpose
ii. Discrete data can take on any value
iii. Qualitative data are data that cannot be measured
iv. Population data are data arising as a result of investigating a group of people or objects
Which of the statements are true?
A. (i) and (ii) only
B. (ii) and (iii) only
C. (ii) and (iv) only
D. (iii) and (iv) only (2 marks)
3. Which of the following statements are false?
i. If a sample is selected using random sampling, it will be free from bias.
ii. A sampling frame is a numbered list of all items in a sample.
iii. In cluster sampling there is very little potential for bias.
iv. In quota sampling, investigators are told to interview all the people they meet up to a
certain quota.
A. (i), (ii), (iii) and (iv)
B. (i), (ii) and (iii)
C. (ii) and (iii)
D. (ii) only (2 marks)
4. Government statistics can be a useful source of data and information. Which one of the
following types of data is most likely to be obtained from government statistics?
A. Foreign exchange rates
B. Population data
C. Details of industry costs
D. Interest rates (2 marks)
5. Which of the following explains the essence of quota sampling?
A. Each element of the population has an equal chance of being chosen
B. Every nth member of the population is selected
C. Every element of one definable sub-section of the population is selected
D. None of the above (2 marks)