ITE 1200 Lab/Tutorial Activity #3
Tasks
1. Read the following case study
2. Answer the review questions.
Case Study: The Paradise Papers and Big Data Journalism – Extracted from
Essentials of MIS by Laudon and Laudon (14th Edition)
In the last months of 2017, the news cycle around the world was dominated by reports of tax
avoidance by the rich and powerful. These reports were the result of a thorough analysis by news
outlets of 13.4 million leaked documents that detailed the tax-avoidance strategies of wealthy individuals
and companies. The leaked documents, in large part originating from tax consultancy firm
Appleby, were dubbed the “Paradise Papers” after the idyllic islands that had served as tax
havens, such as Bermuda, where Appleby’s headquarters are based.
These documents came into the possession of the German newspaper Süddeutsche Zeitung,
whose managers quickly realized that they would be unable to analyze all the data by themselves.
They reached out to other news organizations, including The Guardian and the BBC in the United
Kingdom. No one knows for certain how many journalists and data analysts combed through the
emails, reports, and accounts weighing in at over 1,400 gigabytes—there were likely hundreds of
them. But all their efforts proved worthwhile, yielding a long list of scoops.
The star of the Irish band U2, Bono, turned out to be part owner of a shopping center in Lithuania
that was being investigated for dodging taxes. A member of the British House of Lords, Baron
Ashcroft, proved to have retained residency status outside the United Kingdom to evade taxes
despite his protestations that he was a UK resident. The list of revelations went on. Most of those
named in the Paradise Papers denied the accusations or stated that the tax-avoidance
methods they used were perfectly legal. Bono claimed ignorance of the shopping mall in Lithuania
and said that he welcomed the insights provided by the analysis of the Paradise Papers.
The Paradise Papers reports were a triumph of what has come to be known as big data journalism.
Few readers recognized what a major feat it had been, not least because the methods used were
closely guarded. The International Consortium of Investigative Journalists, which played a major
role in coordinating the analysis of the data, recognized that computer scientists would have to
play a vital role in this enterprise. A chief technology officer was appointed to supervise the efforts
along with six software developers. Access to the files was restricted; for one thing, the identity
of the whistleblower had to be protected. At the instigation of the Consortium, all systems used
by the journalists while studying the files were encrypted and a two-factor authentication system
was applied. Many journalists were not well versed in security issues and had to be taught before
they could work on the documents.
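The case does not say which two-factor system the Consortium used, but a common second factor is a time-based one-time password (TOTP) as standardized in RFC 6238: the server and the user's device share a secret, and both derive a short code from it that changes every 30 seconds. A minimal sketch of how such codes are generated, purely for illustration and not a description of the Consortium's actual setup:

```python
import hashlib
import hmac
import struct
import time


def totp(secret: bytes, for_time: float, step: int = 30, digits: int = 6) -> str:
    """Generate a time-based one-time password (RFC 6238, HMAC-SHA1)."""
    # The moving factor is the number of 30-second intervals since the epoch.
    counter = int(for_time) // step
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation: the low 4 bits of the last byte pick a 4-byte slice.
    offset = digest[-1] & 0x0F
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % (10 ** digits)).zfill(digits)


# RFC 6238 test vector: ASCII secret "12345678901234567890" at T=59 seconds.
print(totp(b"12345678901234567890", 59, digits=8))  # → 94287082
```

Because the code depends on the current time as well as the shared secret, a stolen password alone is not enough to reach the files, which is the point of the second factor.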
Not only did the leaked documents number in the millions, but they came in various formats
too—some of these files were even in PSP (Paint Shop Pro). One reason for this was that Appleby
was not the only source of the documents; in total, there were at least 19 data sources. A big
challenge, thus, was devising a system that allowed easy access to all the files. Worse, the
documents included emails, handwritten notes, photos, and other files that machines could not
read, even though machine readability is a necessary precondition for establishing a searchable database.
A software company named Nuix stepped in and assisted in transforming all of these documents
into a readable format through advanced optical character recognition software that could
recognize text based on combinations of words that often occur together. Ingenious solutions like
this allowed the journalists to finally put all the files in one database. Data analysts then devised
algorithms that could cut across the many coding systems used in the 13.4 million documents and
create links between companies and individuals and the data relevant to them.
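The case does not reveal how those links were actually computed. One standard ingredient of such entity linking is name normalization: collapsing the many spellings and abbreviations of the same company or person onto a single canonical key, then indexing documents by that key. The sketch below is purely illustrative; the records, the alias table, and the entity names are all invented, not taken from the Paradise Papers.

```python
from collections import defaultdict

# Hypothetical sample records; the real Paradise Papers schema is not public.
records = [
    {"doc": "email_001", "entity": "Appleby Global Ltd."},
    {"doc": "ledger_17", "entity": "APPLEBY GLOBAL LIMITED"},
    {"doc": "memo_42", "entity": "appleby global ltd"},
]

# Map spelled-out legal forms to their abbreviations (illustrative only).
ALIASES = {"limited": "ltd", "incorporated": "inc"}


def canonical(name: str) -> str:
    """Normalize an entity name so spelling variants collapse to one key."""
    tokens = name.lower().replace(".", "").replace(",", "").split()
    return " ".join(ALIASES.get(t, t) for t in tokens)


# Build an index from canonical entity key -> documents mentioning it.
index = defaultdict(list)
for record in records:
    index[canonical(record["entity"])].append(record["doc"])

print(dict(index))  # all three documents link to "appleby global ltd"
```

A real system would go much further, handling numerical codes, misspellings, and transliterations, but the basic shape is the same: journalists supply the vocabulary of names and codes, and analysts turn it into normalization rules that stitch the documents together.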
This was perhaps the most difficult part of the enterprise, as it required close cooperation
between the computer scientists and the journalists. To create a successful and efficient
algorithm, the journalists needed to provide the data specialists with lists of the terms that were
used in the Paradise Papers to refer to individuals and companies. As there are many of these—
and they sometimes appear only as numerical codes—there was a strong need for cooperation
between the journalists on the one hand and the data analysts on the other. Generally speaking,
the success of big data journalism is contingent on close cooperation between data analysts and
journalists; the challenge for journalists is to provide the data experts with clear information,
while the challenge for the data experts is to create a knowledge center that journalists, many of
whom are decidedly not computer wizards, can easily use.
The reporting on the Paradise Papers has been widely acclaimed as an outstanding example
of how new technology and techniques can be used to journalism’s advantage. The news outlets
won many awards for their work, including a prestigious Investigative Reporters and Editors award
recognizing the innovative use of big data.
Main Sources:
• “Paradise Papers,” BBC News, https://www.bbc.com/news/paradisepapers;
• “Paradise Papers: Secrets of the Global Elite,” ICIJ,
https://www.icij.org/investigations/paradise-papers/;
• “Paradise Papers: Das ist das Leak,” Süddeutsche Zeitung,
https://projekte.sueddeutsche.de/paradisepapers/politik/das-ist-das-leak-e229478/;
[all accessed December 1, 2019].
Review Questions
1. Why was it a challenge to place all the documents from the Paradise Papers in one
database?
2. Protecting the identity of a whistleblower or whistleblowers is of vital importance to
journalists. Give at least one reason why this is so important.
3. Explain why cooperation between data experts and journalists was vital to the efficient
analysis of data.
4. News outlets have been experiencing a severe crisis of profitability. What do you think are
the causes of this crisis? What role can big data analytics play in countering it?
5. What role did optical character recognition (OCR) play in making the Paradise Papers
readable?
6. How did data analysts structure and retrieve information from 13.4 million documents?
7. What security measures were used to protect sensitive data during storage and
processing?
8. What were the biggest computational challenges in analyzing such a vast dataset?