Fraud Detection Techniques Overview


FRAUD DETECTION: A REVIEW

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING,


KALASALINGAM ACADEMY OF RESEARCH AND EDUCATION,
KRISHNAN KOVIL, TAMIL NADU-626126

TEAM MEMBERS:

99210041475 – M. ASRITH
9921004028 – A. HEMANTH KUMAR REDDY
9921004549 – P. VIJAY KUMAR
9921004544 – P. VINAY KUMAR
99210041262 – P. NAVADEEP KUMAR

ABSTRACT: With the development of contemporary technology and the global superhighways of communication, fraud is rising sharply, costing the world billions of dollars annually. Although preventative technologies are the best means to decrease fraud, fraudsters are usually able to get around them over time since they are intelligent and adaptable. If we want to apprehend fraudsters after fraud prevention has failed, we need methodologies for fraud detection. Statistics and machine learning have been used effectively to detect activities such as money laundering, credit card fraud in e-commerce, telecommunications fraud, and computer intrusion. We outline the techniques for statistical fraud detection as well as the areas in which fraud detection technologies are used the most.

KEYWORDS: Fraud detection, money laundering, computer intrusion, credit cards, telecommunications, statistics, machine learning.

INTRODUCTION: Fraud is described as "criminal deception; the use of false statements to gain an unfair advantage" in the Concise Oxford Dictionary. Fraud is as old as mankind. It can take an unlimited variety of forms. However, the advent of new technology in recent years (which has improved our ability to communicate and increased our purchasing power) has also opened up new opportunities for criminals to commit fraud. New types of fraud, such as mobile telecommunications fraud and computer intrusion, have appeared, while more established forms of fraud like money laundering have become easier to commit.

We begin by distinguishing between fraud prevention and fraud detection. Fraud prevention describes measures to stop fraud from occurring in the first place. These include elaborate designs, fluorescent fibres, multitone drawings, watermarks, laminated metal strips and holograms on banknotes; personal identification numbers for bankcards; Internet security systems for credit card transactions; Subscriber Identity Module (SIM) cards for mobile phones; and passwords on computer systems and telephone bank accounts. Of course, none of these methods is perfect and, in general, a compromise has to be struck between expense and inconvenience (e.g., to a customer) on the one hand, and effectiveness on the other.

On the other hand, fraud detection entails spotting fraud as quickly as possible once it has been committed. Fraud detection comes into play after fraud prevention fails. In reality, fraud detection must be employed constantly because one is typically unaware that fraud prevention efforts have failed. We can strive to prevent credit card fraud by vigilantly guarding our cards, but if the card's information is nonetheless compromised, we must be able to identify the fraud as soon as possible.

Fraud detection is a field that is always changing. When one method of detection is known to be in use, criminals will modify their tactics and attempt others. Of course, there are always new criminals entering the scene. Many of them will be unaware of the fraud detection techniques that have been effective in the past and will employ tactics that result in recognisable frauds. This means that both the most recent advancements and the older detection tools must be used.

The extremely constrained flow of ideas in fraud detection makes it more challenging to develop new fraud detection techniques. In-depth explanations of fraud detection methods are generally not made public, since doing so would provide criminals with the knowledge they need to avoid detection. Data sets are not made public and results are frequently censored, making it difficult to evaluate new methods.

Many fraud detection problems involve sizable data sets that are continually changing. For example, the credit card company Barclaycard processes about 350 million transactions annually in the United Kingdom alone (Hand, Blunt, Kelly, and Adams, 2000); The Royal Bank of Scotland, which has the largest credit card merchant acquiring business in Europe, processes over a billion transactions annually; and AT&T processes about 275 million calls every weekday (Cortes and Pregibon, 1998). Data mining techniques are relevant for processing large data sets in the hunt for fraudulent transactions or calls, since this requires more than just a novel statistical model; fast and efficient algorithms are needed as well. These figures also highlight the potential worth of fraud detection: if 0.1% of a business's 100 million transactions are fraudulent and each costs it £10, the company will lose £1 million in total (100,000 fraudulent transactions at £10 each).

Because data from different applications vary in quantity and type, statistical approaches for fraud detection are numerous and diverse, but there are recurring themes. These tools generally compare observed data with expected values, although expected values can be calculated in a variety of ways depending on the situation. They may be single numerical summaries of a particular aspect of behaviour, often presented as simple graphical summaries in which an abnormality is obvious, but they also frequently involve more intricate behaviour profiles. Such behaviour profiles may be based on prior behaviour of the system under study or derived from other comparable systems. Matters are further complicated by the fact that, in some contexts (such as stock market trading), an actor may behave fraudulently at some times and not at others.

Methods for statistically detecting fraud may be supervised or unsupervised. Using examples of both fraudulent and nonfraudulent records, supervised approaches build models that allow new observations to be assigned to one of the two classes. Naturally, this requires confidence in the true classes of the original data used to build the models, and it requires examples of both classes. Moreover, it can only be used to identify frauds of a kind that has already occurred.

Unsupervised approaches, in contrast, simply look for the accounts, customers, and so forth that deviate most from the norm. These can then be examined in further detail. Outliers are an elementary type of nonstandard observation. Techniques for assessing data quality can be utilised, but
spotting unintentional mistakes is a very different issue from spotting intentionally falsified data or data that accurately describe a fraudulent pattern.

This brings up the fundamental point that, based solely on statistical analysis, we can rarely be certain that a fraud has been committed. Instead, the analysis should be seen as a warning that an observation is anomalous, or more likely to be fraudulent than others, so that it can be investigated further. The goal of the statistical analysis can be thought of as returning a suspicion score (where we regard a higher score as more suspicious than a lower one). The higher the score, the more unusual the observation is or the more similar it is to previously fraudulent values. Because there are numerous ways that fraud can be committed and numerous situations in which it can take place, there are numerous ways to calculate suspicion scores.

Suspicion scores can be calculated for each record in the database (each customer with a bank account or credit card, each owner of a mobile phone, each user of a desktop computer, etc.) and updated over time. Once these scores have been ranked, investigative attention can be directed towards the records with the highest scores or those that show a sudden increase. At this point the issue of cost comes into play: because it is too expensive to investigate every record in detail, one focuses investigation on those thought most likely to be fraudulent.

The fact that there are generally many legitimate records for every fraudulent one makes fraud detection challenging. A system that correctly identifies 99% of the legitimate records as legitimate and 99% of the fraudulent records as fraudulent might be regarded as highly effective. Yet, if only one out of every thousand records is fraudulent, only around 9 of every 100 records that the system flags as fraudulent will actually be so. (Among 100,000 records, about 99 of the 100 fraudulent ones and about 999 of the 99,900 legitimate ones are flagged, and 99/1,098 is roughly 9%.) In particular, finding those 9 requires a thorough review of all 100, possibly at a significant expense. This leads to a more general conclusion: fraud can be reduced to as low a level as one likes, but only with a corresponding level of expense and effort. In practice, a tradeoff between the expense of fraud detection and the potential savings must be struck; this decision is frequently a commercial one. Sometimes the problems are made more difficult by factors such as the negative publicity that comes along with fraud detection. Even if substantial fraud has been discovered, revealing that a bank is a major target does not instill trust in the public, and acting in a way that suggests to an innocent customer that they may be under suspicion is clearly bad for business.

The body of this paper is organised into sections that correspond to various areas of fraud detection. We clearly cannot hope to cover all areas in which statistical approaches are applied. Instead, we have chosen a small number of areas where such techniques are employed and where there is a body of literature that describes them. Before delving into the specifics of several application areas, the next section offers a quick overview of some tools for fraud detection.

TOOLS USED FOR FRAUD DETECTION:

Fraud detection may be supervised or unsupervised, as we just mentioned. With supervised approaches, a database of known fraudulent and legitimate cases is used to build a model that generates a suspicion score for new cases. Traditional statistical classification methods, such as linear discriminant analysis and logistic discrimination, have proven to be effective tools for many applications (Hand, 1981; McLachlan, 1992). However, more powerful tools, particularly neural networks, have also been widely used (Ripley, 1996; Hand, 1997; Webb, 1999). Rule-based approaches are supervised learning algorithms that create classifiers using rules of the form If [certain conditions], Then [a consequent]. Examples of such algorithms
include BAYES (Clark and Niblett, 1989), FOIL, and RIPPER (Cohen, 1995). Similar classifiers are produced by tree-based algorithms such as C4.5 (Quinlan, 1993) and CART (Breiman, Friedman, Olshen, and Stone, 1984). To improve prediction in fraud detection, meta-learning algorithms can be used to combine some or all of these techniques (e.g., Chan, Fan, Prodromidis and Stolfo, 1999).

Uneven class sizes and the differing costs associated with different types of misclassification are important factors to take into account when developing a supervised tool for fraud detection. The cost of investigating observations and the benefits of finding fraud must also be taken into account. Moreover, class membership is frequently uncertain. Credit transactions, for instance, may be wrongly labelled: a fraudulent transaction may go undetected and be labelled as legitimate (and the extent of this may be unknown), or a legitimate transaction may be reported as fraudulent. Misclassification of training samples has been addressed in several studies (e.g., Lachenbruch, 1966, 1974; Chhikara and McKeon, 1984), but not, as far as we are aware, in the context of fraud detection. These and other issues were covered by Chan and Stolfo (1998) as well as Provost and Fawcett (2001).

Link analysis uses record linkage and social network techniques to connect known fraudsters to other people (Wasserman and Faust, 1994). For instance, security experts have discovered that fraudsters in telecommunications networks rarely operate independently from one another. Moreover, after a fraudulent account has been disconnected, the fraudster frequently calls the same numbers from another account (Cortes, Pregibon, and Volinsky, 2001). Hence, calls from an account to numbers that are associated with fraudulent accounts can be used as an indicator of intrusion. Money laundering has been approached similarly in the past (Senator et al., 1995; Goldberg and Senator, 1995, 1998).

Unsupervised approaches are used when there are no prior sets of legitimate and fraudulent observations. Here, techniques are typically a mix of profiling and outlier detection. We first model a baseline distribution that represents expected behaviour and then look for the observations that deviate the most from it. There are similarities to author identification in text analysis. One example of such a technique is digit analysis using Benford's law. According to Benford's law (Hill, 1995), numbers drawn from a wide range of random distributions will (asymptotically) have a particular distribution of their first significant digits (the first digit d occurs with probability log10(1 + 1/d), so about 30% of numbers begin with 1). This law was formerly regarded as a mathematical curiosity with no apparent practical applications. Benford's law can, however, be used to spot fraud in accounting data, as demonstrated by Nigrini and Mittermaier (1997) and Nigrini (1999). Fraud detection with tools such as Benford's law rests on the idea that it is difficult to fabricate data that conform to the law.

Because fraudsters adapt to new prevention and detection measures, fraud detection must also adapt and evolve over time. Yet, since legitimate account users may gradually change their behaviour over time, it is important to avoid false alarms. Models can be updated at fixed intervals or continuously over time; for examples, see Burge and Shawe-Taylor (1997), Fawcett and Provost (1997a), Cortes, Pregibon and Volinsky (2001), and Senator (2000).

Although the fundamental statistical models for fraud detection can be divided into supervised and unsupervised categories, it is more difficult to categorise the applications of fraud detection. Their unique operational characteristics and the variety and volume of data available, both of which influence the
selection of an appropriate fraud detection method, reflect their diversity.

CREDIT CARD FRAUD:

It is difficult to determine the extent of credit card fraud, in part because businesses are frequently reluctant to release fraud statistics for fear of alarming consumers, and in part because the numbers change (and probably increase) over time. Various estimates have been made. For instance, Leonard (1993) estimated that the cost of Visa/Mastercard fraud in Canada was $19, $29, and $46 million (Canadian) in 1989, 1990, and 1991, respectively. Aleskerov, Freisleben, and Rao (1997) quoted figures of $700 million per year in the United States for Visa/Mastercard and $10 billion globally in 1996. Ghosh and Reilly (1994) suggested a figure of $850 million (U.S.) per year for all types of credit card fraud in the United States. Expedia, a Microsoft company, set aside $6 million in 1999 for credit card fraud (Patient, 2000). Over the past four years, there has been a sharp increase in the total losses due to credit card fraud in the UK [£122 million in 1997; £135 million in 1998; £188 million in 1999; and £293 million in 2000; source: Association for Payment Clearing Services (APACS), London], and most recently APACS recorded losses of £373.7 million for the fiscal year that ended in August 2001. According to Jenkins (2000), 13p is lost to fraudsters for every £100 spent on a card in the UK. The picture is complicated by the question of precisely what to include in the fraud estimates. Categories of loss that may be left out can, however, be significant: Ghosh and Reilly (1994) cited a $2.65 billion estimate for bankruptcy fraud in 1992.

It is in the best interest of both a business and the card issuer to prevent fraud or, if that is not possible, to identify it as quickly as possible. Otherwise, in addition to the immediate losses brought on by fraudulent sales, consumer confidence in the card and the company declines, and revenue is lost. Even though the vendor has received authorisation from the card issuer, merchants typically take responsibility for fraud losses because of the possibility of lost sales as a result of diminished confidence.

A summary of the credit card industry's operations is provided in Blunt and Hand (2000). Examples of credit card fraud include simple theft, application fraud, and the use of counterfeit cards. All of these involve the use of a physical card, but a physical card is not necessary in order to commit credit card fraud; one of the main types is "cardholder-not-present" fraud, in which only the card's details are provided (e.g., over the phone).

The simplest kind of credit card fraud may be the use of a stolen card. In this case, the fraudster typically spends as much as possible in as short a time as possible, before the theft is discovered and the card is cancelled, so detecting the theft early can prevent significant losses.

Application fraud occurs when people use false personal information to apply for new credit cards from issuing organisations. Traditional credit scorecards (Hand and Henley, 1997) are used to identify customers who are likely to default, and one of the reasons for default could be fraud. These scorecards are based on the information provided on the application forms, and possibly on other information such as bureau information. To identify cards obtained through a fraudulent application, statistical models that track behaviour over time can be used (for instance, a first-time cardholder who quickly makes numerous purchases and exhausts the available credit might raise suspicions). With application fraud, however, urgency is less important to the fraudster, and fraud may not be suspected until account statements are sent out or repayment dates begin to pass.

When a transaction is completed remotely, only the card's details are required; a manual signature and card imprint are not necessary at the time of purchase. This is
known as cardholder-not-present fraud. Online and telephone sales fall within this category, and a significant share of losses is attributable to this kind of fraud. To commit such fraud, it is necessary to obtain card details without the cardholder's knowledge. This is done in a number of ways, including "skimming," in which employees stealthily copy a credit card's magnetic strip using a small handheld card reader; "shoulder surfers," who enter card details into a mobile phone while standing behind a customer in a queue; and fraudsters who pose as employees of credit card companies and obtain details of credit card transactions from businesses over the phone. These details can also be used to produce counterfeit cards, which are currently the largest source of credit card fraud in the United Kingdom (source: APACS). Transactions made by fraudsters using counterfeit cards and making cardholder-not-present purchases can be detected by methods that look for changes in transaction patterns, as well as by checking for particular patterns that are known to be indicative of counterfeiting.

Credit card databases contain information about each transaction. This information includes items such as the merchant code, account number, type of credit card, type of purchase, customer name, transaction amount, and transaction date. The data can be of several types, including numerical (such as transaction size) and symbolic (such as merchant code, which might have hundreds of thousands of categories). Because of these mixed data types, numerous statistical, machine learning, and data mining tools have been used.

Suspicion scores, which are used to determine whether an account has been compromised, can be based on models of the past usage patterns of individual customers, on standard expected usage patterns, on particular patterns that are known to be frequently associated with fraud, and on supervised models. Figure 16 of Hand and Blunt (2001) provides a simple illustration of the patterns that some customers exhibit. It shows that plots of cumulative credit card spending over time are remarkably linear. Abrupt jumps in these curves or changes in slope (such as a transaction or expenditure rate abruptly exceeding a threshold) warrant investigation. Similarly, some customers practise "jam jarring," restricting particular cards to particular types of purchases (e.g., using one card only for gas purchases and a different one for grocery purchases), so that using a card to make an unusual type of purchase can trigger an alarm for such customers. At a more general level, suspicion scores may also be based on expected overall usage profiles. For instance, first-time credit card users are frequently somewhat hesitant in their usage at first, in contrast to people who are transferring loans from another card. Finally, examples of overall transaction patterns that are known to be intrinsically suspicious include the sudden purchase of many small electronic goods or jewellery (goods that permit easy black market resale) and the immediate use of a new card in a wide range of different locations.

We noted above that, for obvious reasons, there is a shortage of published material on fraud detection. Much of what has been published appears in the methodological data analytic literature, where the aim is to illustrate new data analytic tools by applying them to fraud detection rather than to describe specific fraud detection techniques. Furthermore, since anomaly detection approaches are highly context dependent, much of the published literature in the field concentrates on supervised classification techniques. In particular, rule-based systems and neural networks have attracted interest. Researchers who have used neural networks for credit card fraud detection include Ghosh and Reilly (1994), Aleskerov et al. (1997), Dorronsoro, Ginel, Sanchez and Cruz (1997) and Brause, Langsdorf and Hepp (1999), especially in the
context of supervised classification. HNC Software has developed Falcon, a software package that relies heavily on neural network technology to detect credit card fraud.

Uneven class sizes are a concern with supervised approaches, since legitimate transactions typically far outnumber fraudulent ones. These methods use samples from the fraudulent and nonfraudulent classes as the basis for constructing classification rules to detect future cases of fraud. Brause, Langsdorf, and Hepp (1999) reported that, in their database of credit card transactions, "the chance of fraud is quite low (0.2%) and has been lowered in a pre-processing step by a standard fraud detecting system down to 0.1%." According to Hassibi (2000), of the roughly 12 billion transactions completed each year, about 10 million (one out of every 1200 transactions) turn out to be fraudulent. Also, 0.04% of all monthly active accounts (4 out of every 10,000) are fraudulent. With a fraud rate of 0.1%, classifying every transaction as legitimate yields an error rate of only 0.001, which means that the simple misclassification rate cannot be used as a performance measure. Instead, one must either minimise a suitable cost-weighted loss or fix a parameter (such as the number of cases one can afford to investigate in depth) and then try to maximise the number of fraudulent cases detected within that limit.

A meta-classifier system for detecting credit card fraud was described by Stolfo et al. (1997a, b). It is based on the idea of using several local fraud detection tools within different corporate environments and combining the results to produce a more accurate global tool. This work was extended in Chan and Stolfo (1998), Chan, Fan, Prodromidis, and Stolfo (1999), and Stolfo et al. (1999), who provided a more accurate cost model to accompany the different classification outcomes. Moreover, Wheeler and Aitken (2000) investigated the combination of multiple classification rules.

MONEY LAUNDERING:

Money laundering is the act of hiding the source, ownership, or use of money (typically cash) that is obtained through illegal activity. The scale of the problem is described in a 1995 study by the US Office of Technology Assessment (OTA) (US Congress, 1995): "Federal agencies predict that up to $300 billion in illicit funds are laundered annually, globally. Drug profits made in the United States could account for between $40 billion and $80 billion of this." Prevention is attempted through legal restrictions and obligations, the burden of which is progressively growing, and the use of encryption has recently been the subject of intense discussion. Nevertheless, there is no foolproof preventative method, so detection is crucial.

The OTA study states that each day in 1995, more than $2 trillion (U.S.) worth of wire transfers were made using the Fedwire and CHIPS systems, along with about half a million transfers using the SWIFT network. This makes wire transfers a natural arena for money laundering. Around 0.05-0.1% of these transactions were thought to involve laundering. Sophisticated statistical and other online data analytic methods are required to find such laundering activity. Since it is now a legal requirement to demonstrate that every reasonable means has been used to detect fraud, we can expect to see even greater use of such tools.

As in other areas of fraud, money laundering detection works hand in hand with prevention. For instance, the Bank Secrecy Act of 1970 in the United States required banks to report to the authorities any currency transactions worth more than $10,000. However, just as in other areas of fraud, the perpetrators adapt their methods to the changing tactics of the authorities. As a result, the obvious strategy of splitting larger sums into multiple amounts of less than $10,000 and
depositing them in different banks was developed. This technique is known as smurfing or structuring. Although this is now illegal in the United States, the way that money launderers adapt to the prevailing detection techniques can lead to the pessimistic view that only incompetent money launderers are caught. This, obviously, also reduces the value of supervised detection methods, because the patterns that are detected will be those that have historically been indicative of fraud but may no longer be. Other tactics employed by money launderers that reduce the value of supervised methods include switching between wire and physical cash movements, setting up front companies, and fabricating invoices; in addition, a single transfer is unlikely to appear to be a money laundering transaction in and of itself. Due to the significant sums at stake, money launderers are also highly skilled and frequently have contacts in the banking industry who can provide information about the detection techniques being used.

After the mid-1980s, the requirement to report currency transactions over $10,000 in value led to a huge increase in the number of reports filed (over 10 million in 1994, with a total value of roughly $500 billion), which can be problematic in and of itself. To deal with this, the U.S. Department of the Treasury's Financial Crimes Enforcement Network (FinCEN) processes all reports using the FinCEN artificial intelligence system (FAIS) described below. More generally, banks are also required to report any suspicious transactions, and such reports are made for about 0.5% of all currency transaction reports.

Money laundering involves three steps:

Placement: the entry of the money into the banking system or a legitimate business (for example, converting the cash from drug sales into a cashier's check). One way to achieve this is to pay wildly inflated prices for goods imported across international borders. Using statistical analysis of trade databases, Pak and Zdanowicz (1994) described how to find anomalies in government trade statistics, such as the drug erythromycin costing $1694 per gramme for imports but only $0.08 per gramme for exports.

Layering: executing numerous transactions through numerous accounts with different owners at different financial institutions while remaining within the legitimate financial system.

Integration: combining the funds with money earned through legal means.

Detection strategies can target different levels. It is generally very difficult or impossible to label a single transaction as fraudulent (as is also the case in some other areas of fraud). Instead, it is necessary to identify transaction patterns that are fraudulent or suspicious. A single deposit of just under $10,000 is not unusual, but several such deposits are; a large deposit is not unusual, but a large deposit followed by an immediate withdrawal is. In fact, several levels of (potential) analysis can be distinguished: the level of a single transaction, the level of an account, the level of a business (and, indeed, a person may have multiple accounts), and the level of a "ring" of businesses. While analyses can focus on specific levels, more advanced methods can examine several levels simultaneously. (There is an analogy here with speech recognition systems: simple systems that focus on the individual phoneme and word levels are not as effective as those that try to recognise these elements in the higher-level context of how words are put together in use.) Most money laundering detection strategies rely heavily on link analysis, which identifies groups of participants in transactions. According to Senator et al. (1995), "Money laundering typically involves a multitude of transactions into multiple accounts with different owners at different banks and other financial institutions, possibly by distinct individuals.
The ability to reconstruct these patterns of transactions by connecting potentially related transactions and then separating the legitimate sets of transactions from the illegitimate ones is necessary for the detection of large-scale money laundering schemes. The main analytical method used in law enforcement intelligence is a technique known as link analysis that identifies relationships between different pieces of information (Andrews and Peterson, 1990)." An obvious and simple illustration is that a transaction with a known criminal may raise suspicion. More subtle methods are based on recognising the kinds of businesses with which money laundering operations transact. These are all supervised methods, so they are all vulnerable to the possibility that the perpetrators will change their methods in the future. As mentioned, similar tools are used to detect telecommunications fraud. Rule-based systems have been developed, frequently with rules derived from experience ("flag transactions from countries X and Y"; "flag accounts showing a large deposit immediately followed by a similar sized withdrawal"). Structuring can be detected by computing the cumulative sum of the amounts paid into an account over a short period, such as a day. Other techniques based on simple descriptive statistics have also been developed, such as the rate of transactions and the proportion of suspicious transactions. The use of the Benford distribution extends this idea. Although one may not normally be interested in detecting changes in an account's behaviour, methods such as peer group analysis (Bolton and Hand, 2001) and break detection (Goldberg and Senator, 1997) can be used to detect money laundering.

The U.S. Financial Crimes Enforcement Network AI system (FAIS) [...] using this system. It is designed around a "blackboard" architecture in which program modules can read from and write to a central database that holds information about transactions, subjects, and accounts. A crucial component of the system is its suspicion score. This rule-based system was created by the U.S. Customs Service in the mid-1980s. The system calculates suspicion scores for various types of transaction and activity. Simple Bayesian updating is used to combine evidence that suggests that a transaction or activity is illicit. Senator et al. (1995) included a brief but interesting discussion of a study to determine whether case-based reasoning (cf. nearest neighbour methods) and techniques like classification trees could be useful additions to the system. The U.S. National Association of Securities Dealers, Inc. uses an advanced detection system (ADS; Kirkland et al., 1998; Senator, 2000) to identify "patterns or practices of regulatory concern." ADS uses rule pattern matchers and time-sequence pattern matchers and, like FAIS, places great emphasis on visualisation tools. Also as with FAIS, data mining techniques are used to discover new patterns that may be of interest.

TELECOMMUNICATIONS FRAUD:

With the development of low-cost mobile phone technology in recent years, the telecommunications sector has grown significantly. As more people use mobile phones, mobile phone fraud is expected to increase globally. The cost of this fraud has been estimated in a number of ways. Cox, Eick, Wills, and Brachman (1997), for instance, gave a figure of $1 billion annually. According to Telecom and Network Security Review [4(5) April 1997], fraud cost the U.S. telecom industry between 4 and 6% of its
system (FAIS), which is described in Senator et
revenue. International statistics, according to
al., is one of the most complex money
Cahill, Lambert, Pinheiro, and Sun (2002), are
laundering detection systems (1995) and
worse, with "several new service providers
Senators Goldberg and (1998). Users can
reporting losses over 20%." According to
follow links between related transactions
Moreau et al. (1996), there are "several
million ECUs per year." Presumably, this figure refers to losses within the European Union and, considering the size of the other estimates, we wonder if it should be billions. Recent research indicates that "the industry already reports a loss of £13 billion each year due to fraud" (Neural Technologies, 2000). Mobile Europe (2000) reported a figure of US$13 billion. According to the latter article, fraudsters are thought to be able to steal up to 5% of some operators' revenues, and some predict that within three years telecom fraud will total $28 billion annually.

Even though these numbers vary, it is obvious that they are all very large. There are other causes for the differences besides the fact that they are estimates, and so are subject to expected variability and inaccuracies arising from the data used to generate them. One is the distinction between hard currency and soft currency. Hard currency is actual money paid by a third party, someone other than the offender, for the service that was stolen. Hynninen (2000) gave the example of the amount that one mobile phone operator will pay another for the use of their network. Soft currency is the value of the service that was stolen. At least some of this is only a loss if one assumes that the thief would otherwise have paid for the service. Another factor contributing to the differences is that such estimates are used for different purposes. Hynninen (2000) provided examples of operators who gave estimates on the high side in the hope of more stringent antifraud legislation, and operators who gave estimates on the low side to boost client confidence.

We need to distinguish between two types of fraud: fraud directed at the service provider and fraud enabled by the service provider. Selling stolen call time is an example of the former, and interfering with telephone banking instructions is an example of the latter. (The public is hesitant to use credit cards online because of the possibility of the latter type of fraud.) Additionally, we can distinguish between fraud with and without a financial gain. The goal of the former is to make money for the perpetrator, whereas the goal of the latter is simply to obtain a service for free (or, as with computer hackers, for example, to meet the challenge represented by the system).

Telecom fraud can take many different forms and manifest itself at various levels (see, for instance, Shawe-Taylor et al., 2000). The two most common types are subscription fraud and superimposed, or "surfing," fraud. Subscription fraud happens when a fraudster signs up for a service with the intention of not paying, frequently using false information about their identity. This fraud is thus at the level of a phone number: all transactions made using this number will be fraudulent. Superimposed fraud is the use of a service without the required authorisation. It can be committed in a number of ways, such as by cloning mobile phones or obtaining calling card authorisation information. Superimposed fraud will typically occur at the level of individual calls; the fraudulent calls will be mixed in with those that are genuine. The billing process will typically reveal subscription fraud at some point, but since significant costs can be quickly accrued, it is best to catch it much earlier. Superimposed fraud can go undetected for a long time. The distinction between these two types of fraud follows a similar distinction in credit card fraud.

Other types of telecom fraud include "ghosting" (technology that deceives the network to obtain free calls) and insider fraud, in which employees of telecom companies sell information to criminals that can be used for fraudulent gain. Insider fraud is a common source of fraud, regardless of the industry. In "tumbling," a form of superimposed fraud, cloned handsets are used with rolling fake serial numbers so that successive calls are attributed to different legitimate phones. The odds of detection by noticing odd patterns are slim, and the illegal phone will continue to function until all of the assumed identities have been discovered.
The term "spoofing" is sometimes used to describe users pretending to be someone else.

Data mining techniques are especially important in the telecommunications industry because these networks generate enormous amounts of data, sometimes on the order of several gigabytes per day. In 1998, for instance, AT&T was processing 275 million call records daily and held 350 million profiles (Cortes and Pregibon, 1998).

As in other fraud domains, detection methods rely on outlier detection and supervised classification, either through rule-based approaches or through the comparison of statistically derived suspicion scores with some threshold. At a basic level, simple rule-based detection systems use rules like the apparent use of the same phone in two very distant geographic locations in quick succession, calls that appear to overlap in time, and very expensive and lengthy calls. At a more advanced level, statistical summaries of call distributions (often referred to as profiles or signatures at the user level) are compared with thresholds established by experts or by applying supervised learning techniques to cases of known fraud or nonfraud. Murad and Pinkas (1999) and Rosset et al. (1999) distinguished between profiling at the levels of individual calls, daily call patterns, and overall call patterns, and defined what are essentially outlier detection methods for detecting anomalous behaviour. Cortes and Pregibon (1998) provided a particularly intriguing description of profiling techniques. Cortes, Fisher, Pregibon, and Rogers (2000) described the Hancock language for writing programmes for processing profiles, with signatures based on data such as average call duration, longest call duration, number of calls to particular regions in the last day, and so on. Fawcett and Provost (1997a, b, 1999) and Moreau, Verrelst, and Vandewalle (1997) also described profiling and classification techniques. Detecting behavioural changes has been the focus of some research (see, for instance, Fawcett and Provost, 1997). A general complication is that signatures and thresholds may need to vary depending on the time of day, the type of account, and other factors, and that they will likely need to be updated over time. Cahill et al. (2002) suggested excluding the extremely suspicious scores from this updating process, although more research is required in this area.

Neural networks have once more been extensively employed. The main fraud detection software of the Fraud Solutions Unit of Nortel Networks combines neural networks and profiling (Nortel, 2000). The ASPeCT project, a collaboration between the European Commission, Vodafone, other European telecom companies, and academics, developed a combined rule-based profiling and neural network approach (Moreau et al., 1996; Shawe-Taylor et al., 2000). Neural and probabilistic methods for telecom fraud detection were described by Taniguchi, Haft, Hollmén, and Tresp (1998).

According to Cortes, Pregibon, and Volinsky (2001), link analysis creates "communities of interest" that can identify fraudster networks. These techniques are founded on the observation that fraudsters rarely alter their calling patterns but are frequently linked to other fraudsters. Using similar transactional patterns to infer the presence of a specific fraudster is in the spirit of phenomenal data mining (McCarthy, 2000).

Visualisation techniques (Cox et al., 1997) were also developed for mining very large data sets in telecom fraud detection. Here, human pattern recognition abilities work in conjunction with a graphic computer display of the volume of calls made between various subscribers in various locations. A potential future development is to programme software to recognise the patterns that humans notice.

With increased complexity comes increased opportunity for fraud in the telecom market. At the moment, factors like call lengths and
tariffs are taken into account when determining the extent of fraud. The third generation of mobile phone technology will also need to consider factors like the call's priority and content (due to the use of packet switching technology, equally long data transmissions may contain very different numbers of packets of data).

COMPUTER INTRUSION

On September 21, 2000, a 16-year-old boy was imprisoned for breaking into the NASA and Pentagon computer networks. Between October 14 and October 25, 2000, Microsoft security monitored a hacker's illicit activity on the Microsoft Corporate Network. These instances show how even extremely secure domains can have their computer security compromised. Computer intrusion fraud is big business, and computer intrusion detection is a subject of intense study. Hackers are able to read emails, change source code, read files, find passwords, and more. Denning (1997) listed eight different types of computer intrusion. Such crime can be all but eliminated if the hackers are stopped from accessing the computer system or are caught in time. The attacks are adaptive, though: as with all fraud when the stakes are high, once one kind of intrusion has been noticed, the hacker will try a different approach. Because of the significance of this issue, numerous commercial products that use intrusion detection techniques have been developed, such as Cisco Secure Intrusion Detection System.

Analysts of computer intrusion data primarily employ sequence analysis techniques because the only trace of a hacker's actions is the order of commands used to compromise the system. As in other fraud situations, both supervised and unsupervised techniques are used. The unsupervised methods used in intrusion detection are typically methods of anomaly detection, based on profiles of usage patterns for each legitimate user, while the supervised methods are sometimes referred to as misuse detection. The issue with supervised methods is that they are, of course, limited to working on intrusion patterns that have already happened (or partial matches to these). Lee and Stolfo (1998) applied classification techniques to data from a user or programme that had been classified as either normal or abnormal. Another study (2000) came to the conclusion that the emphasis should be on developing techniques for spotting new patterns of intrusion rather than old patterns. However, Kumar and Spafford (1994) noted that "a majority of break-ins... are the result of a small number of known attacks, as evidenced by reports from response teams" (e.g., CERT). Therefore, automating the detection of these attacks should enable the detection of a sizable number of intrusion attempts. Shieh and Gligor (1991, 1997) described a pattern-matching method and argued that while it is superior to statistical methods at detecting known types of intrusion, it is ineffective at detecting novel types of intrusion patterns that statistical methods could detect.

Because intrusion is a form of behaviour, and the goal is to distinguish intrusion behaviour from normal behaviour in sequences of events, Markov models have naturally been used (e.g., Ju and Vardi, 2001). Qu et al. (1998) also used probabilities of events to define the profile. Forrest, Hofmeyr, Somayaji, and Longstaff (1996) described a technique based on how natural immune systems tell self-patterns from foreign ones. Since both individual user patterns and overall network behaviour evolve over time, much as telecom data do, a detection system must be able to adapt to changes. However, it must do so slowly enough to prevent accepting intrusions as legitimate changes. Lane and Brodley (1998) and Kosoresow and Hofmeyr (1997) also used similarity of sequences that can be interpreted in a probabilistic framework.
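As a rough illustration of the Markov-model idea described above, the sketch below fits a first-order Markov chain to one user's past command sequences and scores a new sequence by its average log-probability per transition; a low score suggests behaviour unlike the profile. The command names, the add-one smoothing, and the helper names `fit_markov_profile` and `score_sequence` are assumptions made for illustration and are not taken from any of the cited systems.

```python
import math
from collections import defaultdict


def fit_markov_profile(sequences):
    """Count first-order transitions in a user's past command sequences."""
    counts = defaultdict(lambda: defaultdict(int))
    vocab = set()
    for seq in sequences:
        vocab.update(seq)
        for prev, curr in zip(seq, seq[1:]):
            counts[prev][curr] += 1
    return counts, vocab


def score_sequence(seq, counts, vocab):
    """Average log-probability per transition, with add-one smoothing.

    Smoothing (an assumption of this sketch) keeps unseen transitions
    from producing log(0); one extra slot stands for unknown commands.
    """
    if len(seq) < 2:
        raise ValueError("need at least two commands to score transitions")
    v = len(vocab) + 1  # +1 slot for commands never seen in training
    total = 0.0
    for prev, curr in zip(seq, seq[1:]):
        row = counts.get(prev, {})
        row_total = sum(row.values())
        prob = (row.get(curr, 0) + 1) / (row_total + v)
        total += math.log(prob)
    return total / (len(seq) - 1)


# Hypothetical training history for one legitimate user.
history = [
    ["ls", "cd", "ls", "vi", "make"],
    ["cd", "ls", "vi", "make", "ls"],
    ["ls", "vi", "make", "ls", "cd"],
]
counts, vocab = fit_markov_profile(history)

typical = ["ls", "cd", "ls", "vi", "make"]
unusual = ["wget", "chmod", "su", "cat", "rm"]

typical_score = score_sequence(typical, counts, vocab)
unusual_score = score_sequence(unusual, counts, vocab)
```

A threshold on the score would still have to be chosen, and the profile would need to be refreshed slowly enough that intrusions are not absorbed as normal behaviour.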
Neural networks have inevitably been employed: Ryan, Lin, and Miikkulainen (1997) built profiles by training a neural network on process data, and also cited other neural approaches. In one of the more thorough studies in the field, Schonlau et al. (2001) described a comparative study of six statistical approaches for detecting impersonation of other users (masquerading). They used real usage data from 50 users and planted contaminating data from other users to serve as the masquerade targets to be detected. The October 2000 issue of Computer Networks [34(4)] is a special issue on (relatively) recent advancements in intrusion detection systems, and it includes several examples of new approaches to computer intrusion detection. Marchette (2001) provided a nice overview of statistical issues in computer intrusion detection.

MEDICAL AND SCIENTIFIC FRAUD

There are different levels of medical fraud. It may occur in clinical trials (see, e.g., Buyse et al., 1999). It can also take place in a more business-related setting: examples include prescription fraud, submitting claims for patients who are deceased or who do not exist, and upcoding, in which a doctor performs a medical procedure but bills the insurer for a more expensive one, or perhaps does not even perform one at all. Allen (2000) provided an example of bills submitted for more than 24 hours in a workday. He, Wang, Graco, and Hawkins (1997) and He, Graco, and Yao (1999) described the use of neural networks, genetic algorithms, and nearest neighbour methods to categorise the practice profiles of general practitioners.

Insurance fraud and medical fraud frequently go hand in hand. Terry Allen, a statistician with the Utah Bureau of Medicaid Fraud, estimated that up to 10% of the $800 million in annual claims could be fraudulent (Allen, 2000). Major and Riedinger (1992) developed a knowledge/statistical-based system to identify healthcare fraud by comparing observations with those with which they should be most similar (e.g., having similar geodemographics). Using neural networks, Brockett, Xia, and Derrig (1998) classified fraudulent and nonfraudulent claims for auto bodily injury in healthcare insurance claims. Glasgow (1997) provided a brief discussion of risk and fraud in the insurance industry. For a glossary of several of the various types of medical fraud, visit http://www.motherjones.com/mother_jones/MA95/davis2.html.

Of course, medicine is not the only scientific field where data have occasionally been made up, falsified, or carefully chosen to support a favourite theory. Science fraud issues are receiving more attention, but they have always existed: rogue scientists have been known to manipulate experiment results to speed up the development of a product or to reach a desired significance level for a publication. Such a case was discussed by Dmitry Yuryev on his webpages at http://www.orc.ru/yur77/statfr.htm. Additionally, there are numerous classical cases (such as the work of Galileo, Newton, Babbage, Kepler, Mendel, Millikan, and Burt) where the data have been suspected of being manipulated. In a fascinating discussion of the part subjectivity plays in the scientific method, Press and Tanur (2001) provided numerous examples.

CONCLUSIONS

The areas we have listed are possibly those where statistical and other data analysis tools have had the greatest effect on detecting fraud. This is typically because there are vast amounts of data that are numerical or easily convertible into
numbers in the form of counts and proportions. However, statistical tools have also been used for fraud detection in other areas that were not mentioned above. Irregularities in financial statements can be used to identify accounting and management fraud in contexts other than money laundering. Tools for digit analysis have become popular in accounting (see Nigrini and Mittermaier, 1997, and Nigrini, 1999, for examples). Financial audits require statistical sampling techniques, and screening tools are used to determine which tax returns require in-depth analysis. Insurance fraud was brought up in the context of medicine, but it clearly occurs more widely. Artís, Ayuso, and Guillén (1999) modelled fraud behaviour in automobile insurance, and Fanning, Cogger, and Srivastava (1995) and Green and Choi (1997) investigated the use of neural network classification techniques for the detection of management fraud. Statistical fraud detection tools have also been applied to sporting events. For instance, Robinson and Tawn (1995), Smith (1997), and Barao and Tawn (1999) examined the results of running competitions to determine whether any exceptional times deviated from what might be anticipated.

Fraud also includes plagiarism. We briefly discussed the use of statistical tools for author verification; the same techniques can be used in this situation. The use of statistical tools can be expanded, though. For instance, with the development of the Internet, it has never been simpler for students to steal articles and pass them off as their own in coursework for high school or university. The website http://www.plagiarism.org describes a system that can evaluate a manuscript against their "substantial database" of online articles. The returned statistic represents the manuscript's originality.

Fraud detection, as we noted in the Introduction, is a post hoc strategy that is used when fraud prevention has failed. Some methods of preventing fraud also use statistical tools. For instance, so-called biometric fraud detection techniques are gradually growing in popularity. These include computerised fingerprint and retinal identification as well as face recognition (although the latter has gained the most attention in relation to identifying football hooligans). The ability to process information quickly is crucial in many of the applications we have discussed. This is especially true in transaction processing, particularly with telecom and intrusion data, where enormous volumes of records are processed daily, although it also applies to the credit card, banking, and retail sectors.

The effectiveness of statistical tools in identifying fraud is a key concern in all of this work, and a fundamental issue is that most of the time one does not know how many fraudulent cases slip through the net. In applications such as banking fraud and telecom fraud, where speed of detection matters, measures such as average time to detection after fraud starts (in minutes, numbers of transactions, etc.) should also be reported. Measures of this factor interact with measures of the final detection rate because, frequently, it takes multiple fraudulent transactions on a phone, account, or other device before it is identified as fraudulent, so a number of false negative classifications are unavoidable.

Using a graded system of investigation is an appropriate overall strategy. Accounts with very high suspicion scores require immediate and intensive (and expensive) investigation, while those with large but less dramatic suspicion scores merit closer (but less expensive) observation. Once more, finding a workable compromise is required.

In regard to statistical tools for computer intrusion detection, Schonlau et al. (2001) came to the following conclusions: "Many challenges and opportunities for statistics and statisticians remain," though statistical methods can detect intrusions even in challenging situations. This optimistic
conclusion, in our opinion, holds more generally. Fraud detection is an important field, and one in which statisticians can apply statistical and data analytic tools in a variety of ways.

REFERENCES

• ALESKEROV, E., FREISLEBEN, B. and RAO, B. (1997). CARDWATCH: A neural network-based database mining system for credit card fraud detection. In Computational Intelligence for Financial Engineering. Proceedings of the IEEE/IAFE 220–226. IEEE, Piscataway, NJ.
• ALLEN, T. (2000). A day in the life of a Medicaid fraud statistician. Stats 29 20–22.
• ANDERSON, D., FRIVOLD, T. and VALDES, A. (1995). Next-generation intrusion detection expert system (NIDES): A summary. Technical Report SRI-CSL-95-07, Computer Science Laboratory, SRI International, Menlo Park, CA.
• ANDREWS, P. P. and PETERSON, M. B., eds. (1990). Criminal Intelligence Analysis. Palmer Enterprises, Loomis, CA.
• ARTÍS, M., AYUSO, M. and GUILLÉN, M. (1999). Modelling different types of automobile insurance fraud behaviour in the Spanish market. Insurance Mathematics and Economics 24 67–81.
• BARAO, M. I. and TAWN, J. A. (1999). Extremal analysis of short series with outliers: Sea-levels and athletics records. Appl. Statist. 48 469–487.
• BLUNT, G. and HAND, D. J. (2000). The UK credit card market. Technical report, Dept. Mathematics, Imperial College, London.
• BOLTON, R. J. and HAND, D. J. (2001). Unsupervised profiling methods for fraud detection. In Conference on Credit Scoring and Credit Control 7, Edinburgh, UK, 5–7 Sept.
• BRAUSE, R., LANGSDORF, T. and HEPP, M. (1999). Neural data mining for credit card fraud detection. In Proceedings of the 11th IEEE International Conference on Tools with Artificial Intelligence 103–106. IEEE Computer Society Press, Silver Spring, MD.
• BREIMAN, L., FRIEDMAN, J. H., OLSHEN, R. A. and STONE, C. J. (1984). Classification and Regression Trees. Wadsworth, Belmont, CA.
• BROCKETT, P. L., XIA, X. and DERRIG, R. A. (1998). Using Kohonen’s self-organising feature map to uncover automobile bodily injury claims fraud. The Journal of Risk and Insurance 65 245–274.
• BURGE, P. and SHAWE-TAYLOR, J. (1997). Detecting cellular fraud using adaptive prototypes. In AAAI Workshop on AI Approaches to Fraud Detection and Risk Management 9–13. AAAI Press, Menlo Park, CA.
• WASSERMAN, S. and FAUST, K. (1994). Social Network Analysis: Methods and Applications. Cambridge Univ. Press.
• WEBB, A. R. (1999). Statistical Pattern Recognition. Arnold, London.
• WHEELER, R. and AITKEN, S. (2000). Multiple algorithms for fraud detection. Knowledge-Based Systems 13(2/3) 93–99.