Blockchain Solution for Web Scraping
A Blockchain-Based Solution to
High-Volume Web Scraping With
Smart Contracts on Ethereum
Abstract
Since it is difficult to protect servers against large-scale HTTP requests,
new solutions are needed. Methods such as rate limiting or blocking IP
addresses are not sufficient. This report proposes a new solution for
combating web scraping with the help of blockchain technology. We create a
cryptographic algorithm and use it on a mobile device to communicate with an
Ethereum network in order to control server access. Our studies indicate that
there is potential to restrict access to information by using blockchain
technology on mobile devices. In addition, blockchains have the potential to
serve as an additional security layer rather than solely a network solution.
Further studies are needed to determine how effective the solution is.
Acknowledgements
Stockholm, 2018-07-20
Contents
1 Introduction
2 Web Scraping
2.1 Limitations
2.2 Requirements
2.3 Current methods
2.3.1 ShieldSquare
2.3.2 Distil Networks
2.3.3 Alibaba Cloud Web Application Firewall
2.4 Blockchains
2.5 Comparison
2.5.1 ShieldSquare and Distil
2.5.2 Alibaba Web
2.5.3 Ethereum
5 Test Results and Analysis
5.1 Token generation simulation
5.2 Outcome
5.3 Approach to development
5.4 Server and network
5.5 PoE protocol and tests
6 Conclusion
6.1 Future work
Appendix
References
Abbreviations
1 Introduction
For The Mobile Life (TML) and its customers, scraping is an issue which
is difficult to prevent entirely. Not only is scraping easy to perform,
but it can in some regards be seen as an abuse of current internet
protocols. For TML, the problem arises when servers (holding e.g. flight
data) receive high-volume requests in a short period of time. The
consequences are persistent and pervasive: unwanted traffic and additional
costs. The costs are both direct, since another party has to provide a
booking service for a fee, and indirect, when malicious parties extract
flight prices on a continuous basis. The flight data can then be used by
competitors to undercut prices, if only by a small margin. The data is
scraped by using the Hypertext Transfer Protocol (HTTP) to send GET requests,
which retrieve information from a server. In normal scenarios, this is
simply a way to fetch data from a server. In this case, however, these are
not normal interactions but large-scale operations with the goal of
harvesting as much data as possible with as little effort as possible. Since
the scrapers perform their attacks from computers and the flight services are
mainly accessed from smart devices, the solution will have to be mobile-based
while mitigating attacks from scrapers.
should be done without affecting regular user interactions, i.e. non-scraper
clients. We do the following: set up a blockchain, create a verifying smart
contract, and develop a cryptographic algorithm to be used by smart devices.
In theory, it is supposed to work such that when a client requests data, the
request goes through the blockchain before it is passed on to the server.
Data is then passed unidirectionally from the server to the client, i.e. the
client is never supposed to communicate directly with the server. The
blockchain can then act as a filter against scraper attacks, since it will
drop requests that do not contain a solution to a cryptographic puzzle. Due
to the inherent properties of hashing, there is a lower bound on how quickly
such solutions can be found. This is the basis of how high-volume requests
are to be mitigated.
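The puzzle idea described above can be illustrated with a minimal sketch. It assumes SHA-256 as a stand-in hash, a difficulty expressed as a number of leading hexadecimal zeros, and a made-up challenge value; none of these are taken from the project's actual algorithm.

```python
import hashlib
import itertools

def digest(challenge: bytes, nonce: int) -> str:
    """Hex digest of challenge || nonce (SHA-256 is an assumed stand-in hash)."""
    return hashlib.sha256(challenge + str(nonce).encode()).hexdigest()

def solve(challenge: bytes, zeros: int) -> int:
    """Brute-force the smallest nonce whose digest starts with `zeros` hex zeros."""
    prefix = "0" * zeros
    for nonce in itertools.count():
        if digest(challenge, nonce).startswith(prefix):
            return nonce

def verify(challenge: bytes, nonce: int, zeros: int) -> bool:
    """Checking a proposed solution costs a single hash."""
    return digest(challenge, nonce).startswith("0" * zeros)

challenge = b"example-request"   # hypothetical challenge value
zeros = 3                        # about 16**3 = 4096 attempts expected on average
nonce = solve(challenge, zeros)
print(nonce, verify(challenge, nonce, zeros))
```

Finding a solution takes on the order of 16^k hash evaluations for k leading hexadecimal zeros, whereas verifying it takes one. A client that can compute r hashes per second can therefore produce roughly r / 16^k valid requests per second, which is the kind of bound the approach relies on.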
2 Web Scraping
Table 2: Sorted requests by IP of a suspected scraper

IP address   Request type              Size (bytes)   Agent
[Link]       GetAvailableFlightPrice            837   (null)/(null)
[Link]       GetAvailableFlightPrice          11218   (null)/(null)
[Link]       GetAvailableFlightPrice           5437   (null)/(null)

Note that this request involves many different settings, including both
dates and flight routes, which explains the varying size of the request.
Another indicator, which cannot be seen in these tables, is the timestamp.
Scrapers usually persist over a longer period of time, although this is not
always the case since they are instructed by people. It is important to
emphasize that a naïve solution to counter the high-volume requests of
Table 2 is to restrict certain Agent fields. In practice, this is very easy
for scrapers to circumvent through spoofing and is consequently not a
sustainable solution. To get a better perspective on requests in general, we
could sort by the total number of requests from each IP address, as in
Table 3.
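As a rough illustration of this kind of aggregation, the sketch below counts requests and computes the mean request size per IP address. The log records are hypothetical: the addresses are documentation-range placeholders, the request type GetBooking is invented for contrast, and only the three sizes of the first address are borrowed from Table 2.

```python
from collections import Counter

# Hypothetical log records: (ip_address, request_type, size_in_bytes).
records = [
    ("192.0.2.10", "GetAvailableFlightPrice", 837),
    ("192.0.2.10", "GetAvailableFlightPrice", 11218),
    ("192.0.2.10", "GetAvailableFlightPrice", 5437),
    ("198.51.100.7", "GetBooking", 420),   # invented request type
]

def summarize_by_ip(rows):
    """Return (ip, request_count, mean_size) tuples, busiest IP first."""
    counts = Counter()
    total_bytes = Counter()
    for ip, _request_type, size in rows:
        counts[ip] += 1
        total_bytes[ip] += size
    return [(ip, n, total_bytes[ip] / n) for ip, n in counts.most_common()]

summary = summarize_by_ip(records)
for ip, n, mean_size in summary:
    print(f"{ip:15} {n:5d} {mean_size:10.1f}")
```

Such a summary only ranks addresses by volume; as discussed in the text, it does not by itself prove that a high-volume address is a scraper.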
It is not certain that all agents with a higher request count are scrapers,
but one could question the legitimacy of an IP address if the one with the
largest number of requests and the highest average request size exclusively
fetches data composed of price information. Another naïve solution could be
to block that IP address, but this is once more easily circumvented by the
scraper network by changing the IP address. Ultimately, we want to counter
high-volume scraping in real time and not let it continue over a longer
period of time. A previous and temporary countermeasure can be seen in
Fig. 1. This countermeasure, based on a commercial service, suffered from a
great many requests over time.
Even though the result of the protection mechanism in the short term was
significant, it took a few hours before it had an impact. Furthermore, a
proven mechanism sometimes ceases to be sufficient, as can be seen in Fig. 2,
which again shows a commercial service used to counter scrapers.
It should also be noted that Fig. 1 and Fig. 2 concern two distinct servers
and companies, which means that they are targeted by different kinds of
scrapers. Due to many uncertain parameters, including a lack of information,
it is not fair to compare them with each other on a one-to-one basis. Since
evaluating existing scraper protection mechanisms is not the main scope of
this report, we will just conclude that current systems have varying degrees
of success and speed.
2.1 Limitations
There are factors which need to be taken into consideration when deciding
on a practical and viable approach. The limitations are derived from
commercial demands from TML and from previous observations of request
behaviour in old logs. They are as follows:
2.2 Requirements
Figure 3: The requirements for the scraper protection module.
where the less a user knows about the logic, the more difficult it will be
to circumvent a solution. Requirement 4 is an important factor for TML,
since new back-end solutions should preferably be deployable and scalable
for new customers as time goes on, and especially be replicable for new
business. The last requirement, requirement 5, is the basis of the company,
since a significant share of its business caters to large airlines in need
of mobile solutions and server architecture. These requirements are not
based on any previous solution or research, only on potential components of
a POC. Furthermore, we do not know if the requirements are complete or
sufficient.
2.3 Current methods
1. Rate limit – controls the rate of the traffic, which helps to prevent DoS
(Denial of Service) attacks and/or limits the rate at which information
can be fetched
Problem: affects every user and may cause service delay if used
excessively
2. Block IP address – drop clients that abuse the service
Problem: does not work when large computer networks can change
and spoof IP addresses; it is defeated by proxies
3. CAPTCHA or email signup – forces verification of all users to confirm
their identity, at least temporarily
Problem: makes for a less smooth user experience, may decrease the
conversion rate of some user-centric consumer applications and
ultimately cannot stop scraping
4. Client puzzle protocol – deters abuse by forcing clients to solve a
computational problem and return a solution
Problem: mainly designed against connection denial attacks and makes
retrieving information more difficult for every user, not specifically
for users with unwanted behaviour
5. Detention based on cookies – a longer delay on new cookies and a cap
on the number of requests each cookie may make could control the data
access
Problem: this and any other client-centric solution suffers from the
inherent weakness that it can easily be accessed and modified by the
client: specifically, a cookie can be altered or even deleted such that
the restriction is circumvented completely, which means that a solution
cannot allow clients to control both the logic and the ticket used to
access the resource
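To make the first method in the list concrete, the following is a minimal token-bucket sketch, one common way to implement a rate limit. It is not taken from TML's systems; the class name, the parameters, and the explicit `now` argument (used here instead of a real clock so the behaviour is reproducible) are illustrative assumptions.

```python
class TokenBucket:
    """Allow about `rate` requests per second, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float, now: float = 0.0):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity      # start with a full bucket
        self.last = now

    def allow(self, now: float) -> bool:
        # Refill in proportion to the elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

bucket = TokenBucket(rate=1.0, capacity=2.0)
decisions = [bucket.allow(t) for t in (0.0, 0.1, 0.2, 1.5)]
print(decisions)   # [True, True, False, True]
```

A single shared bucket like this one throttles all clients alike, which is exactly the problem noted for method 1; keeping one bucket per IP address or cookie inherits the weaknesses of methods 2 and 5 instead.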
2.3.1 ShieldSquare
Their approach to bot protection is to build signatures for each unique
visitor [2]. This should not have an impact on genuine users and
performance. In our case, this violates project limitation 1, since
differentiating between users is not a priority. Their solution is
integrated with the web page being protected; each request is then evaluated
on a cloud engine, and if it is classified as friendly (a user or a search
engine crawler), the API response code allows the request. For our proposed
solution, this violates requirement 3, since we are not interested in a
protection mechanism that resides entirely on the server side. The user
analysis is done in several layers:
• IP tracking: network forensics based on data such as geolocation, ISP
information, and connection type, but also whether the request comes from a
proxy or not
• Behaviour analysis: bot behaviour is significantly different from that of
a genuine user; typical users have certain behavioural characteristics in
terms of page views per session, time spent on each page, and frequency
of repeat visits
• Collective intelligence: data gathered across sites is shared with other
websites to be fully utilized to identify bots; data from third-party fraud
intelligence can also be used to keep track of flagged IPs and devices to
counter attacks
This solution is a lower-layer approach (relative to ours) with a clear goal
of authenticating users to let certain requests bypass their bot filter, as
can be seen in Fig. 4. The proposed solution in this project is also partly
based on behaviour analysis, albeit not as extensive in the number of
parameters as theirs. The main issue, which we are trying to explore, is the
opportunity to build a solution which does not depend on protecting the
server. It is very difficult to judge the success or efficiency of
ShieldSquare merely by looking at their approach. However, the proposed
solution should not depend on classifying users, nor should it have a
server-side dependency. Furthermore, we have no interest in challenging a
potential bot with CAPTCHA.
2.3.2 Distil Networks
Validating a browser and its JavaScript engine is once again the way to
detect scrapers [3], and this is in direct violation of requirement 3.
Similarly, while a device is browsing the website, the service collects and
analyses data to identify malicious behaviour. It is difficult to tell what
this kind of analysis actually means, but an educated guess is that it
partly looks for certain behaviours, like other solutions do. Furthermore,
their main approach is different kinds of CAPTCHA responses which they have
designed. Even if it is assumed that these have zero false positives, it is
not our intention to base a protection mechanism on challenges, which
violates requirement 1. We are looking to build a reliable and robust
filter, which is open to anyone but still puts a dynamic limit on the number
of requests within a certain time frame.
2.3.3 Alibaba Cloud Web Application Firewall
This web application security firewall is not entirely focused on scrapers
and bots but on web attacks in general [4], such as injections and exploits
of other vulnerabilities. For this reason, they have similar strategies for
detecting illegal requests, and their solution is partly based on modifying
the website’s DNS record. In comparison to ShieldSquare, this solution seems
more focused on the application layer than on the network layer. When it
comes to efficiency of protection against scrapers, it is difficult to say
whether a lower-layer solution is more suitable or not. What can be seen
from this approach is that it tries to defend against flooding, pinging
attacks, and other types of intrusion. A main concern is that it relies
heavily on firewall techniques such as DoS protection and does not
specifically target scrapers.
The proposed solution in this study has neither requirements nor any
intentions to protect against DoS attacks. As with the previous
alternatives, it is almost impossible to evaluate their solution in practice
without using it and understanding the approach in depth. We can only state
that it may work, but it is not apparent that it is tailored to our end
specifically. Observing Fig. 6, we can notice that it has very similar
characteristics to the current methods already discussed in Section 2.3.
This may not come as a surprise considering that the service being offered
has the word firewall in its name, something which is already deemed
insufficient.
2.4 Blockchains
tablished. Furthermore, the mobile client is not a prioritized issue within the
biggest smart-contract network Ethereum. In this report the mobile client is
sometimes referred to as a light client. The opposite is a full client which is
run from a computer.
2.5 Comparison
In Section 2.3 we treated some of the more practical and direct
countermeasures to scraping, methods that can be seen as rough and not
precise enough. Here, we will compare a few commercial services and how they
satisfy our requirements. Their approach is, at least in theory,
significantly more refined than e.g. simply blocking IP addresses. Each
alternative and how it complies with the requirements can be seen in
Table 4.
2.5.1 ShieldSquare and Distil
Their properties are deemed to be similar to each other with respect to the
requirements. Firstly, the complexity (requirement 1) is similar in the
sense that applications can be challenged through actions such as CAPTCHA.
Even if these are very well tuned, without any false positives, this is not
the approach we are looking to take, since it requires application input and
hence is not aligned with the requirements. Furthermore, both solutions are
completely on the server side, which means that it is difficult for a
malicious user to access the logic or algorithms being used. On the other
hand, we are not looking for a solution that is present on the server side,
as both of these are, since they intercept the communication between client
and server.
For dynamicity (requirement 4), it is not entirely clear whether they would
have the intended effect of responding at short notice. To mention an
example, ShieldSquare has an active mode protection [10], which means that
it is possible to choose one of two methods. The first one is called
Real-Time Protection and lets one make synchronous calls to their API to
take action against bots of different categories. The other is called
Feed-Based Protection, which allows for asynchronous calls to be made.
Similarly, Distil has different protection settings that let one configure
which actions to take in response to bot requests. What these two solutions
have in common is that neither limits the absolute number of requests
permitted; each merely responds in different ways depending on the
classification of the bot. In the proposed solution, the sole goal is to
eliminate the possibility of making excessive queries, and to this end
neither is a particularly good fit.
2.5.2 Alibaba Web
The first requirement this solution satisfies is that none of the logic
happens in the front-end and that it will not challenge the user in any
visual way, in the very same way a router would drop a packet. Also, it
should not be an issue with mobile applications, which the other established
solutions most definitely support. What it does not satisfy, however, are
the following requirements:
1. Obscurity (requirement 2): the client may very well be exposed to
the constraints of this solution, as well as how it works with respect
to scraping; the product website [4] is very clear about what kind of
measures are taken, and for that reason we will not reach sufficient
obscurity when protecting the data
2. Robustness (requirement 3): following the previous requirement, the
logic is mainly based on interaction with the server. Furthermore, common
network rules may very well be circumvented through trial and error or
traditional brute force attacks, which violates the requirement
3. Dynamicity (requirement 5): even if firewall rules are changed in real
time, it will not be precise enough or sufficiently effective for dealing
with high-volume scrapers; the approach results in the weaknesses discussed
in Sections 2.3 and 2.5
The combination of the inherent properties of a firewall and points 1 to
3 above makes this service the least suitable for this project against
high-volume scraping.
2.5.3 Ethereum
manner compared to today. The significance of this research is that, if it
works, it is not limited to protecting server APIs and other semi-public
information but extends to any industry that depends on data being
accessible by consumers from interactive on-demand services.
3 Theory of Blockchains and Smart Contracts
In the past years we have seen a huge increase in the global use of mobile
devices compared with desktop computers. Likewise, for many developing
countries, opportunities to access the internet have increased due to lower
costs and better access. For many people, the first interaction with the
World Wide Web is through a handheld device. In the case of the internet,
availability and openness have a downside when it comes to vulnerability and
new attack vectors. Before digitization, companies relied on paper and
physical records containing information. An example is ledgers with credit
and debit cards. Since the internet is decentralised, it is also difficult
for information to be stopped. This is where blockchain can improve both
openness and data confidentiality through mathematical protocols. A
blockchain protocol is, ideally, defined in such a way that agents are
inclined to take the most beneficial decisions, not only for themselves but
for the network as a whole. Moreover, a blockchain protocol can be used for
more than its network properties. An investigation of its potential as an
additional security layer between users and servers is both interesting and
promising. This is something which could help web services to defend against
unwanted scrapers. Because of this, the company The Mobile Life (TML) in
Sweden has initiated an investigation of blockchain technology to counter
scraper attacks. Specifically, the solution is intended for applications
accessed through smart devices.
interacted with through the blockchain and differ from the traditional
requests on the internet which make scraping possible. Since the blockchain
in this report is private, a local network only for the company, it is also
controlled by a single authority. This may contradict the philosophy of
blockchains, but its importance as a potential security layer against
scrapers must be stressed.
Ethereum is a blockchain network which is different from Bitcoin. There
is some overlap between the networks, such as cryptographic and
communication principles, while other things, such as gas, are unique.
Although this project is not based on Bitcoin, the way its network acts and
its agents operate is sometimes applicable to Ethereum. In such cases,
Bitcoin will be used as a reference for practical examples.
• Cryptocurrency wallet: each user has to possess a wallet, an Ethereum
address, to receive and send a cryptocurrency, and this is based on a
key pair consisting of a public (64 bytes) and a private (32 bytes) key.
The last 20 bytes of the hashed public key are referred to as an Ethereum
address.
• Public key encryption: the principle of asymmetric cryptography with
a pair of keys can be seen in Fig. 8. In the context of cryptocurrencies, a
public key can be seen as the address of a wallet. In traditional banking,
this would be the bank account number. If funds are transferred from a
wallet address, the transaction is valid only if it is correctly signed
using the corresponding private key.
• Ether: on the Ethereum network, the unit of value used to access
computational resources is called ether. Besides being a currency, ether is
often considered the ’digital oil’ of cryptocurrencies due to its ability to
invoke computations and change the stored information on the blockchain.
• Smart contract: a piece of code written as instructions to the Ethereum
Virtual Machine. A contract also works as an account and can both receive
and send ether, according to its instructions. The smart-contract-oriented
language used for development is called Solidity.
• Gas: a smart contract which performs operations and storage on the
blockchain comes with a cost at both deployment and execution of the
contract. Instead of estimating such costs in ether, whose value can
fluctuate significantly against fiat currencies (USD, EUR, JPY etc.), they
are denominated in gas. For example, a transaction to another wallet costs
21,000 gas just for performing the operation.
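The relation between gas and ether can be illustrated with a short calculation. The fee paid for a transaction is the gas used multiplied by the gas price; the 21,000 gas figure is from the text above, the unit conversions (1 gwei = 10^9 wei, 1 ether = 10^18 wei) are standard, and the gas price of 20 gwei is an assumed value chosen only for the example.

```python
GAS_PER_TRANSFER = 21_000     # gas for a plain transfer, as stated in the text
WEI_PER_GWEI = 10**9
WEI_PER_ETHER = 10**18

def transfer_fee_wei(gas_price_gwei: int, gas_used: int = GAS_PER_TRANSFER) -> int:
    """Fee in wei = gas used * gas price (price given in gwei)."""
    return gas_used * gas_price_gwei * WEI_PER_GWEI

fee = transfer_fee_wei(gas_price_gwei=20)   # 20 gwei is an assumed price
print(fee, "wei =", fee / WEI_PER_ETHER, "ether")
```

With these numbers the fee is 420,000,000,000,000 wei, i.e. 0.00042 ether. The amount of gas is fixed by the operation, while the ether cost follows the gas price, which is why costs are quoted in gas.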
Figure 8: The principle of public key encryption [17].
chaining-hash be the strings KTH and ElectricalEngineering, respectively.
In practice, the chaining-hash will itself be another hash while block-data
could contain several transactions (hashes). In Fig. 9 we see the Keccak-256
hash result of our example. Since hash functions are one-way functions, it is
easy to verify that the chaining-hash is made up of the two strings appended.
It must be noted that in this example we only hash individual strings, while
in other cases it can be beneficial to hash other simple or composite hash
values for verification, privacy or security reasons.
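The string example can be reproduced approximately as follows. This is only a sketch: Python's standard library does not ship the original Keccak-256 used by Ethereum, so the standardized SHA3-256 is used as a stand-in, and its digests will therefore differ from those in Fig. 9. The assignment of the two strings to block-data and chaining-hash, and the order in which they are appended, are also assumptions.

```python
import hashlib

def h(text: str) -> str:
    """Hex digest of a UTF-8 string (SHA3-256 as a stand-in for Keccak-256)."""
    return hashlib.sha3_256(text.encode("utf-8")).hexdigest()

block_data = "KTH"                        # assumed role of the first string
chaining_hash = "ElectricalEngineering"   # assumed role of the second string

combined = h(block_data + chaining_hash)
print(h(block_data))
print(h(chaining_hash))
print(combined)

# Anyone holding both strings can recompute and confirm the combined digest.
assert combined == h("KTH" + "ElectricalEngineering")
# Appending the strings in the other order gives an unrelated digest.
assert combined != h(chaining_hash + block_data)
```

Verification works because the hash is deterministic: the same input always gives the same digest, even though the digest cannot feasibly be inverted.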
Figure 10: Basic flow of a blockchain transaction [20].
the project result in a token which can be used to access a resource.
• Decentralised: one major advantage of blockchains compared to other
distributed databases is the integration of consistency and security
through algorithmically enforced rules, which removes the human
factor from the equation [23]
• Centralised: a blockchain that is centralised has one or a few single
points of failure. In this case, the only trusted party is the company
computer itself, running the full node.
Figure 12: A malicious user Tony sends a spoofed transaction message [24].
Both transactions could be propagated across the network at the same time
and both would show a valid origin, i.e. transaction legitimacy. Depending
on where each node is in the network, some would receive transaction A first
while others would receive transaction B. Since only one of these
transactions will be added to the blockchain, the one included in the first
confirmed block prevails. After such confirmation, the other transaction
will be discarded and not processed.
Proof of work and hash rate. Both Bitcoin and Ethereum use the consensus
protocol PoW, a principle which may originally have been proposed as a
method to prevent spam [25]. Spam is enabled by how easy and cheap it is to
send, but with proof of work (PoW) the sender has to perform certain
resource-intensive computations or memory-intensive operations, or in some
way post a bond for each message sent. Computations of this kind are
quantified by the hash rate, which measures the number of times a hash
function can be calculated per second. It is easy for an agent to calculate
its own hash rate, but it can most likely only estimate the hash power of
other actors on the network [26]. Furthermore, the hash rate is dynamic, due
to its dependence on hardware and software but also since operators can
quickly change which network they mine. It could be argued that a strong
incentive to mine a specific network is the current profitability, i.e. the
rewards for finding a new block. Since networks rely on honest miners to
process transactions and find new blocks, it is necessary that the majority
of the network's hash power belongs to this group of honest nodes.
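How an agent might measure its own hash rate can be sketched as follows. The function name and the choice of SHA-256 are assumptions for illustration; the result describes a single Python thread on the local machine and says nothing about mining hardware.

```python
import hashlib
import time

def measure_hash_rate(duration_s: float = 0.2) -> float:
    """Estimate SHA-256 hashes per second on this machine (single thread)."""
    count = 0
    start = time.perf_counter()
    while time.perf_counter() - start < duration_s:
        # Hash a changing 8-byte input so no two evaluations are identical.
        hashlib.sha256(count.to_bytes(8, "big")).digest()
        count += 1
    elapsed = time.perf_counter() - start
    return count / elapsed

rate = measure_hash_rate()
print(f"about {rate:,.0f} hashes per second")
```

The hash rate of other participants cannot be measured this way; it can only be estimated from observable quantities such as how often they find blocks.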
dle and trade cryptocurrencies. Nonetheless, there is no other established
alternative for the vast majority of retail cryptocurrency investors seeking
liquidity in these digital currencies.
A recurring and perhaps the most feared attack today is the so-called
51%-attack, which means that a majority of the network’s hash power belongs
to one or several malicious users who may in some way want to deviate from
the blockchain protocol, for instance with respect to legitimacy and
consensus. A network which suffers from such an attack can accept
double-spend transactions as valid and even refuse to process certain
transactions. In practice, this means that it is possible to impose
censorship on certain entities [28]. There are ways to combat 51%-attacks:
by introducing a second cryptographic challenge, such as two-phase proof of
work [29], or through other consensus mechanisms based on proof of stake,
which does not depend on hash power [30]. A version of the latter, Casper,
is being considered for implementation on the Ethereum network because of
its additional security compared to PoW [31].
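Why the majority threshold matters can be made concrete with the gambler's-ruin result used in the Bitcoin white paper: if honest nodes hold a share p of the hash power and the attacker holds q = 1 - p, the probability that the attacker ever catches up from z blocks behind is 1 when q ≥ p and (q/p)^z otherwise. The sketch below evaluates this formula; the function name and the example shares are illustrative.

```python
def catch_up_probability(q: float, z: int) -> float:
    """Probability that an attacker with hash-power share q ever catches up
    with an honest chain that is z blocks ahead (gambler's-ruin result)."""
    p = 1.0 - q                      # honest share of the hash power
    return 1.0 if q >= p else (q / p) ** z

for share in (0.10, 0.30, 0.51):
    print(share, catch_up_probability(share, z=6))
```

Below one half the probability decays exponentially with the number of confirmations z, whereas at or above one half the attacker succeeds with certainty given enough time.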
operability and cooperation between private cloud systems [35], albeit with
a possibly better approach than previously mentioned. The project aims to
achieve high data integrity among databases, and for that reason blockchains
are a suitable option for the distribution of databases. As can be seen in
Fig. 14, it is suggested to use a combination of a permissioned blockchain
in the first layer and a traditional PoW-based blockchain protocol in the
second layer. A block in the second blockchain will be linked to a certain
part of the first blockchain, which means that the immutable transaction
hash will act as forensic evidence that proves and validates the integrity
of the data stored in the first blockchain.
which can be treated as autonomous processes in the sense that they react
the same way to each trigger, independently of which entity interacted with
them in the first place. When triggered with certain parameters, i.e. a
solution to a puzzle, the contract will reward the sender of the trigger
with a token.
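The trigger-and-reward behaviour can be mimicked off-chain with a small Python class. This is a toy stand-in, not the project's Solidity contract: the class and method names, the SHA-256 puzzle, the challenge value, and the rule that each solution pays out only once are all assumptions made for the sketch.

```python
import hashlib
import itertools
import secrets
from typing import Optional

class PuzzleGate:
    """Toy stand-in for a contract that exchanges a puzzle solution for a token."""

    def __init__(self, challenge: bytes, zeros: int):
        self.challenge = challenge      # hypothetical public challenge
        self.prefix = "0" * zeros       # required leading hex zeros
        self.redeemed = set()           # nonces that have already paid out

    def _solves(self, nonce: int) -> bool:
        data = self.challenge + str(nonce).encode()
        return hashlib.sha256(data).hexdigest().startswith(self.prefix)

    def trigger(self, nonce: int) -> Optional[str]:
        """Same rule for every caller: a fresh, valid nonce yields a token."""
        if nonce in self.redeemed or not self._solves(nonce):
            return None
        self.redeemed.add(nonce)
        return secrets.token_hex(16)

gate = PuzzleGate(challenge=b"demo", zeros=2)
nonce = next(n for n in itertools.count() if gate._solves(n))
token = gate.trigger(nonce)
print(token is not None, gate.trigger(nonce))   # True None
```

The gate never inspects who is calling; it reacts only to the parameters of the trigger, which mirrors the autonomy described above.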
parency for the participants. It would not be feasible at all to run this
module on the main network, and since it benefits strictly the company, a
private blockchain is a sufficient enabler. Further, the company would also
have to distribute real ether to consumers and deal with actual
cryptocurrencies.
The penultimate question is whether writers are trusted, and they are not.
Too much trust in users has possibly contributed to scraper behaviour.
Finally, do we need public verifiability? Not really; as stated previously,
we are not interested in using the blockchain data collected from the users
in any way apart from identifying users that send excessive requests.
Figure 15: A flowchart from [21] which argues for the necessity of a blockchain
for organizations.
scrapers.
Figure 16: A table from [21], comparing the openness of blockchains in rela-
tion to a central database.
Figure 17: A system with smart contracts [36]
For the project, we are not concerned about security, but it should be noted
that there is a significant number of risks associated with smart contracts.
Some of these risks, together with a proposed set of countermeasures, can be
seen in Fig. 18.
Figure 18: Security issues of smart contracts and potential countermeasures
[36]
Figure 19: Visualization of a blockchain fork event and network re-
convergence [38]
The motivation for smart contracts in this project could be questioned. One
could argue that the corresponding logic on a server or similar machine
could work just as well, and then a blockchain would not be needed at all.
The objection is valid but not entirely justified, since a
transaction-oriented blockchain with smart contracts is more than simply the
logic and rules. Firstly, smart contracts open up a layer that can operate
outside of HTTP and consequently outside of traditional requests. This study
may help push the boundaries further towards making Web 3.0 mainstream and
an established network standard for solving this issue. Within the scope of
scraping, provided that a smart-contract solution works, it could be
distributed across a vast number of systems and industries with a steady
stream of data available to consumers. Logic that is simply stored on a
server and distributed to thousands of others is not as distributed as a
decentralised and autonomous service. Such a service is audited, verified,
and used around the world, around the clock. It could be argued that there
are few, if any, alternatives which can match the transparency and integrity
of a decentralised application, provided that these properties are necessary
for the organization, which is not always the case.
A hash function and its stablemate cryptography are two of the fundamental
aspects of blockchain technology when it comes to trust and integrity. The
output of a hash function is a message digest, or simply digest: a digital
fingerprint of characters and numbers. Cryptography provides a mechanism
to encode the rules of a cryptocurrency system in the system itself [39]. It
depends on deep academic research and advanced mathematical techniques
which can be difficult to understand. To understand a hash function better,
three basic properties are presented in Definition 1.
Definition 2 (Cryptographically secure properties).
• Collision-resistance. A hash function H is collision resistant
if it is infeasible to find two values, x and y, such that x ≠ y and
H(x) = H(y).
• Hiding. A hash function H is hiding if: when a secret value
r is chosen from a probability distribution function with high
min-entropy, then given H(r||x) it is infeasible to find x.
Concatenation is denoted by the operator ||.
• Puzzle friendliness. A hash function H is called puzzle-friendly
if for every possible n-bit output value y it is infeasible
to find x such that H(k+x) = y in time significantly less than
2^n, where k is chosen from a distribution with high min-entropy.
The hiding property asserts that, given the output of the hash function
y = H(x), there is no feasible way to figure out the input x. Consider the
following example: an experiment with a coin flip is performed. If the
outcome of the flip is heads, we announce the hash of the string heads.
Similarly, if the outcome is tails, we announce the hash of the string
tails. Were we to ask an adversary, who did not see the flip but only saw
the hash output, to figure out the outcome of the flip, the task would be
easy: just hash the two strings and compare. The adversary can find the
input because the set of possible inputs is limited: {heads, tails}. To
achieve the hiding property, no value of x may be particularly likely, so x
has to be chosen from a very large, spread-out set. If the set is
sufficiently large, the method of trying a limited number of likely values
will not work. It is nevertheless possible to achieve the hiding property
even when the input is limited if it is concatenated with something else,
such as a secret value r. This is known in cryptography as salting the
string to be hashed, and it takes advantage of the desirable cryptographic
avalanche property [40] of hashes: a small change in the input string of a
hash, such as flipping a single bit, will change the output of a hash
function significantly.
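The coin-flip example can be reproduced in a few lines of Java. The sketch below is only an illustration: it uses SHA-256 from the Java standard library as a stand-in hash function, and the names HidingSketch, commit and guess are hypothetical and not part of the project code.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.security.SecureRandom;
import java.util.Arrays;

final class HidingSketch {
    // Assumption: SHA-256 stands in for the hash function H of the text.
    static byte[] hash(byte[] input) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(input);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Commitment H(r || x): the secret salt r is prepended to the message x.
    static byte[] commit(byte[] salt, String x) {
        byte[] msg = x.getBytes(StandardCharsets.UTF_8);
        byte[] both = Arrays.copyOf(salt, salt.length + msg.length);
        System.arraycopy(msg, 0, both, salt.length, msg.length);
        return hash(both);
    }

    // The adversary's strategy: hash every candidate input and compare.
    static String guess(byte[] digest, String... candidates) {
        for (String c : candidates) {
            if (Arrays.equals(hash(c.getBytes(StandardCharsets.UTF_8)), digest)) {
                return c;
            }
        }
        return null; // no candidate produced the observed digest
    }

    public static void main(String[] args) {
        // Unsalted: the two-element input set makes the flip easy to recover.
        byte[] plain = hash("heads".getBytes(StandardCharsets.UTF_8));
        System.out.println(guess(plain, "heads", "tails"));

        // Salted: without r, hashing the candidates no longer reveals the flip.
        byte[] salt = new byte[32];
        new SecureRandom().nextBytes(salt);
        System.out.println(guess(commit(salt, "heads"), "heads", "tails"));
    }
}
```

The first call recovers the outcome, whereas the second one fails because the adversary would also have to guess the 32-byte salt.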
4 A New Blockchain-Based Approach to Web
Scraping
Since we have decided to use the Ethereum blockchain and smart contracts,
it is necessary to run such a blockchain with appropriate permission control.
The blockchain client is based on Geth according to Section 2.6 and since it
is required to run a private and permissioned blockchain, setting it up is
the first step in the project disposition. The whole mechanism of protecting
against scrapers must be decided before smart contract and prototype
development are initiated. It is necessary to develop a smart contract whose
primary feature is to validate a message digest and confirm that it begins
with a certain number of leading zeros, in the same way the PoW protocol
works. This has been done in the contract-oriented programming language
Solidity paired with the development and testing environment Truffle.
Figure 20: The approach and the sequential steps of the project.
Figure 21: Technology used in the project, mapped to each implementation
step.
4.1 Blockchain disposition
• gasLimit: ”800000000000”
• difficulty: ”200000”
The values of these parameters were not entirely arbitrary. Firstly, it is
beneficial to have a very high limit on operational costs (gas) since ether
is not an issue in a private environment with only one miner. Secondly, with
a relatively low difficulty we aim at a faster mining rate for processing
transactions compared to the main Ethereum network. Since we use a private
blockchain on a local area network, our equipment setup is the following:
• MacBook Air (Early 2014), macOS High Sierra 10.13.4, Geth 1.8.10-stable
• Samsung Galaxy S6, Android 7.0
After initiation of the blockchain through the genesis file, which creates
the first block, we run the node with the following flags:
datadir. The directory where we store blockchain information and history,
including the genesis file.
rpc. Enables remote procedure calls (RPC), i.e. mobile devices are allowed
to send commands to the blockchain for execution.
rpcport. The port for devices to connect to, which makes it possible to
issue RPC calls.
The next steps are to create an account, unlock the account, and make it
the default account to be able to mine, i.e. to process transactions and get
rewarded in ether. We have used the quick way to create and unlock an
account without any time restriction (0), and set the password to a simple
string. This is done by the following commands:
[Link]("daniel")
[Link]([Link][0], "daniel", 0)
[Link] = [Link]
In practice, creating and unlocking accounts in this way is not recommended
since the password can be eavesdropped on the network. Such vulnerabilities
are not considered in this project, but they are important to emphasize.
After mining at least one block, with a reward of 5 ETH, it is possible to
deploy one or several smart contracts on the blockchain network. Deploying a
contract comes with a gas cost, which can be translated to a corresponding
value in ether. On the main network, ether has a somewhat volatile price,
but the gas cost of executing a given instruction is always the same.
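The translation from gas to ether is a multiplication: the fee in wei is the gas used times the gas price, and one ether equals 10^18 wei. The following Java sketch shows the arithmetic; the class name GasCostSketch and the example figures (500,000 gas at 20 gwei) are hypothetical and not measurements from this project.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

final class GasCostSketch {
    static final BigInteger WEI_PER_GWEI = BigInteger.TEN.pow(9);
    static final BigDecimal WEI_PER_ETHER = BigDecimal.TEN.pow(18);

    // fee in wei = gas used * gas price, with the price given in gwei per unit of gas
    static BigInteger feeInWei(long gasUsed, long gasPriceGwei) {
        return BigInteger.valueOf(gasUsed)
                .multiply(BigInteger.valueOf(gasPriceGwei))
                .multiply(WEI_PER_GWEI);
    }

    // Dividing by a power of ten always terminates, so no rounding mode is needed.
    static BigDecimal weiToEther(BigInteger wei) {
        return new BigDecimal(wei).divide(WEI_PER_ETHER);
    }

    public static void main(String[] args) {
        // Hypothetical deployment: 500,000 gas at a gas price of 20 gwei.
        BigInteger fee = feeInWei(500_000L, 20L);
        System.out.println(fee + " wei = " + weiToEther(fee).toPlainString() + " ether");
    }
}
```

With these example figures the fee is 10^16 wei, i.e. 0.01 ether. The gas amount is fixed by the executed instructions, while the value of the fee follows the gas price and the price of ether.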
To deploy the contract on the blockchain, known as migration, we use the
development framework Truffle and connect it to our blockchain through the
open rpcport flag used earlier. With a finished contract, the next step is to
migrate it through
truffle(development)> migrate
The contract address can be seen directly on the chain, after the full hash
of the block, or through Truffle. Despite a contract existing on the network,
it is not possible to directly communicate with it. The application binary
interface (ABI) of the contract is needed to interact with it on the blockchain.
truffle(development)> [Link]([Link])
[..]
After saving the output, we have both necessary components to fully inter-
act with a smart contract: (1) its contract address, and (2) its contract ABI.
The last steps can be seen in Fig. 23 where we specify a contract address and
ABI before we use its function. The very same steps shown are performed in
the following sections when we interact with a contract through the terminal.
Figure 23: Calling the function Hello from the contract Greeter.
The basis of the proposed solution is that access to a resource requires
invested CPU power and computational effort. Furthermore, the effort should
be dynamic and not static. More importantly, the resource should not simply
be accessible through traditional HTTP requests, which become an easy target
for scrapers. Using the requirements from Section 2.2 we can conclude the
following:
1. The client must perform certain calculations that cannot easily be
spoofed or in some way circumvented without investing time and
computational power. The current solution is unique to each user and
changes after each reward.
2. The contract must have a way to validate current solutions and
invalidate solutions already verified. In addition, it needs a mechanism
which prevents a user from sending multiple identical solutions. This is a
transaction consensus property of blockchains, discussed in Section 3.2.
3. The server must be able to check whether a user has sufficient tokens
to exchange for a resource and, after providing the resource, be able to
subtract a fee from the user.
The proposed protocol can be seen in Fig. 24. At initiation of the mobile
application, the client starts to generate a proof and then sends it to the
blockchain for validation. After reception of a token, the client sends a
request to the server through HTTP. The server, which is also connected to
the blockchain, can directly read the current state of the blockchain and
subsequently send the client a resource. Finally, the server must in some
way register the consumption of a token, which is done with a token burn, a
technique which subtracts tokens from a user on the network. This means it
is not really necessary to transfer tokens from a user to the contract since
the tokens are fungible and without any other practical use or value. The
sequence is repeated continuously and in practice the client could stop
mining after collecting a certain amount of tokens. This amount should
preferably let genuine users use the service freely. Scrapers on the other
hand would struggle with the dynamic and restrictive resource-granting
service if they wish to continue as in Table 2. Note that this scheme does
not stop scrapers entirely but aims to limit high-volume scrapers, since it
is necessary to put in computational effort to generate solutions.
Figure 24: Concept of how the scraper protection mechanism should operate.
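The token flow of the protocol can be modelled without a blockchain at all. The Java sketch below is a minimal in-memory model, under the assumption that one token is credited per accepted proof and that a fixed fee is burned per granted resource; the class TokenGateSketch and its methods are hypothetical names and do not correspond to the contract or server code of this project.

```java
import java.util.HashMap;
import java.util.Map;

final class TokenGateSketch {
    // In-memory stand-in for the token balances that the contract keeps on-chain.
    private final Map<String, Long> balances = new HashMap<>();
    private final long fee; // tokens burned per granted resource (assumed parameter)

    TokenGateSketch(long fee) {
        this.fee = fee;
    }

    // Blockchain side: an accepted proof credits one token to the sender.
    void reward(String address) {
        balances.merge(address, 1L, Long::sum);
    }

    long balanceOf(String address) {
        return balances.getOrDefault(address, 0L);
    }

    // Server side: grant the resource only if the balance covers the fee,
    // then burn the fee instead of transferring it anywhere.
    boolean requestResource(String address) {
        long balance = balanceOf(address);
        if (balance < fee) {
            return false; // the client has to produce more proofs first
        }
        balances.put(address, balance - fee);
        return true;
    }

    public static void main(String[] args) {
        TokenGateSketch gate = new TokenGateSketch(1);
        String client = "0xClient"; // placeholder identifier, not a real address
        System.out.println(gate.requestResource(client)); // false: no tokens yet
        gate.reward(client);
        System.out.println(gate.requestResource(client)); // true: one token is burned
        System.out.println(gate.balanceOf(client));       // 0
    }
}
```

The rate at which reward can be called is what the difficulty regulates, so a high-volume scraper is limited by its computational effort rather than by its request rate.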
We continue with the scraper protection mechanism in detail, from the
perspectives of the blockchain and the user, respectively. The proposed
mechanism is called proof of effort (PoE), due to its similarity with PoW
and its emphasis on computational power. Fig. 24 displays the highest-level
perspective and the steps between the three systems. The isolated
interaction between the client and the blockchain can be seen in more detail
in Fig. 25. Here we only take into consideration what the blockchain knows
and expects of the user, and nothing about any off-chain (device)
calculation.
Figure 25: PoE protocol principles and expectations of the blockchain. Note
that the proof contains three components, where the || operator denotes
concatenation.
As can be seen, the blockchain only receives the nonce N and nothing else
from the client. This is because the public address is implicit in the message
and the token information is already stored and managed by the blockchain.
Furthermore, the incrementation will only occur after a token has been sent.
The off-chain algorithm and steps taken by the client can be seen in Fig.
26.
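The blockchain side of one PoE cycle can be sketched as follows. This is a simplified model and not the Solidity contract: SHA-256 replaces Keccak-256, the three components are concatenated as plain decimal text without padding, and the names PoeValidatorSketch, isValid and submit are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

final class PoeValidatorSketch {
    // Total tokens received per address (T in the text), stored by the chain.
    private final Map<String, Long> received = new HashMap<>();
    private final int difficulty; // required number of leading zero hex digits

    PoeValidatorSketch(int difficulty) {
        this.difficulty = difficulty;
    }

    // Assumption: SHA-256 stands in for the on-chain Keccak-256.
    static String sha256Hex(String input) {
        try {
            byte[] raw = MessageDigest.getInstance("SHA-256")
                    .digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : raw) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // proof = H(address || T || N); the unpadded encoding is a simplification.
    static boolean isValid(String address, long t, long nonce, int difficulty) {
        String digest = sha256Hex(address + t + nonce);
        for (int i = 0; i < difficulty; i++) {
            if (digest.charAt(i) != '0') {
                return false;
            }
        }
        return true;
    }

    long totalReceived(String address) {
        return received.getOrDefault(address, 0L);
    }

    // Only the nonce arrives from the client; address and T are known to the chain.
    boolean submit(String sender, long nonce) {
        long t = totalReceived(sender);
        if (!isValid(sender, t, nonce, difficulty)) {
            return false;
        }
        received.put(sender, t + 1); // T changes, so the accepted nonce becomes stale
        return true;
    }

    public static void main(String[] args) {
        PoeValidatorSketch chain = new PoeValidatorSketch(2);
        String sender = "0xClient"; // placeholder, not a real address
        long nonce = 0;
        while (!isValid(sender, 0, nonce, 2)) {
            nonce++; // off-chain search performed by the client
        }
        System.out.println(chain.submit(sender, nonce)); // true: first submission is rewarded
        System.out.println(chain.totalReceived(sender)); // 1
    }
}
```

Because T is part of the hashed string, a nonce that was valid for T is checked against T + 1 when it is sent again, which is what invalidates repeated solutions.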
Figure 26: PoE protocol principles and expectations of the client in each
cycle. The computation is done in the hash step.
Since tokens must be acquired continuously, both Fig. 25 and Fig. 26 show
one cycle (iteration) of the protocol. For the client, the same components
as in Fig. 25 have to be taken into consideration. Firstly, an Ethereum
address is uniquely generated by cryptographic methods when participating in
a blockchain network. The address is derived from a key pair created with
the Elliptic Curve Digital Signature Algorithm. It is necessary for
identification and for the ability to transfer funds. The total amount of
tokens received, T, is incremented whenever a user gets rewarded for a valid
proof. Since T increases after each successful proof, we will have a unique
solution at every new iteration. In fact, the hash output is significantly
affected by a small change in T due to the avalanche effect of cryptographic
hash functions (see Section 3.5).
4.3 Smart contract development
With Solidity, we will now implement the proposed protocol in Section 4.3
for the blockchain in Fig. 25. The final function, named checkPow, can be
seen in Fig. 27.
Figure 27: The main smart contract function which checks and validates
proofs.
$$\mathrm{hash}_9 = \underbrace{\texttt{000000000}}_{\text{9 zero-lead}}\,\texttt{f80ceb128711d5c1e0cd34bc6d588eb9165c1812d396909}$$
Figure 28: The helper function in the smart contract which checks a hash
output for a certain amount of leading zeros. (Credits to Alexander for his
contribution.)
The way it operates is to iterate over a hash, from left to right, in a
number of steps equal to the required amount of leading zeros. If any byte
in this prefix is not equal to zero, the hash does not satisfy the
difficulty. Furthermore, since both checkPow and setDifficulty are functions
which store information on the blockchain, they require gas to execute. In
the contract they are declared with the payable identifier in the function
declaration, although strictly speaking payable only determines whether a
function may receive ether; any state-changing call consumes gas regardless.
The last function used is setDifficulty, which can be seen in Fig. 29.
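The left-to-right check described above can be written off-chain in a few lines. The Java sketch below assumes that leading zeros are counted per hexadecimal character of the digest, as in the hash9 example; the class LeadingZerosSketch and its methods are hypothetical and are not the Solidity helper of Fig. 28.

```java
import java.math.BigInteger;

final class LeadingZerosSketch {
    // Walk the digest from left to right and fail on the first non-zero
    // character within the required prefix length.
    static boolean hasLeadingZeros(String hexDigest, int required) {
        if (required > hexDigest.length()) {
            return false;
        }
        for (int i = 0; i < required; i++) {
            if (hexDigest.charAt(i) != '0') {
                return false;
            }
        }
        return true;
    }

    // For a uniformly distributed digest each hex digit is zero with
    // probability 1/16, so on average 16^required attempts are needed.
    static BigInteger expectedAttempts(int required) {
        return BigInteger.valueOf(16).pow(required);
    }

    public static void main(String[] args) {
        // The digest from the hash9 example, copied as printed in the text.
        String hash9 = "000000000f80ceb128711d5c1e0cd34bc6d588eb9165c1812d396909";
        System.out.println(hasLeadingZeros(hash9, 9));  // true
        System.out.println(hasLeadingZeros(hash9, 10)); // false
        System.out.println(expectedAttempts(4));        // 65536
    }
}
```

Under the stated assumption, each additional required zero multiplies the expected work by 16, which makes the difficulty a coarse but simple regulator of the token rate.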
Figure 29: The function used by the authority to regulate the difficulty of
the proofs.
This function changes the number of leading zeros necessary for a valid
proof. Since the function should only be used by the contract owner, e.g.
TML, this is checked before the difficulty is adjusted. The variable owner is
set when the contract is deployed. We let the variable difficulty be global
and publicly accessible to all participants through the function in Fig. 30.
Figure 30: The function used to read the current difficulty of the network.
The final two functions display the current token amount and the total
amount of tokens received, as in Fig. 31 and Fig. 32, respectively.
Figure 31: The function used to read the amount of tokens of an address.
Figure 32: The function used to read the total received tokens of an address.
The subsequent step is to implement the protocol of the client in Fig. 26.
We aim to generate a solution which can be verified in the same way as on
Ethereum. To make this possible, it is vital to take advantage of the light
client library for mobile communication with the blockchain. This is the
most critical step for a functional proof of concept (POC) and part of the
compatibility requirement of Ethereum discussed in Section 2.5.3.
Assuming we run a blockchain according to the disposition in Section 4.2,
the first step is to connect to it with a mobile device, as in Fig. 33.
This node configuration should match the blockchain. The configuration in-
cludes information about the network such as its network ID number, genesis
block, and the enode address ID. An example of an enode address could look
like the following:
4.4.2 Wallet setup
Figure 34: Creation of a public and private key pair, i.e. a blockchain wallet
to receive and send transactions.
There is a custom object called KeyStore which manages storage and
encryption of the information needed. There is also an Account object that
represents a stored key, while the Address object represents the 20-byte
address of an Ethereum account. The only way to fully interact with the
blockchain through the blockchain library is to use these objects. Note that
it is not recommended to write a password explicitly, as in the string
variable passphrase, but this is once again done for simplicity in this POC.
4.4.3 Read states with contract calls
It is easy to call a contract and read states with the terminal interface.
For mobile devices, it has not been as easy as simply sending the function
call as done in Fig. 23. The code to call a contract which reads the
difficulty of the blockchain is in Fig. 35.
Figure 35: Read the current difficulty of the blockchain through the mobile
client.
Here there are two main issues to discuss. Firstly, we have encountered an
error where it is not possible to call a contract function without a
parameter. To this end, it was necessary to adjust the function getDifficulty
in Fig. 30 such that it receives an unused, arbitrary parameter, e.g. a
string, which does not affect the function itself. Secondly, retrieving an
integer stored on the blockchain causes a marshalling error. This means that
we have only successfully managed to fetch strings and booleans. The new
getDifficultyStr function which tackles these two shortcomings is found in
Fig. 36, where we had to introduce an additional function which converts
integers to strings.
Figure 36: The new version which makes it possible to read the difficulty of
the blockchain as a string.
function can be found in Fig. 47 in Appendix A. The same approach is used
to retrieve the current token amount and total tokens received. In addition
to the same shortcomings as previously, another error occurs when a string
simply contains a zero (”0”). To mitigate this problem, we had to introduce
a function to replace requestAmount in Fig. 32. This function, seen in Fig.
37, returns a minimum value of 1 to circumvent this issue.
Figure 37: The new version which makes it possible to read the total received
tokens.
Since this shows one more token than the actual amount, it needs to be taken
into consideration when performing off-chain calculations which results in a
corresponding subtraction as in Fig. 38.
Figure 38: Read the total received tokens through the blockchain.
4.4.4 Write states with contract calls
The last step is to generate and send a solution in accordance with the PoE
protocol of the client developed earlier, in Fig. 26. We first present the
last step of the mobile communication with the blockchain and in the next
subsection finish with how a solution is generated off-chain, in a (Java)
function we have named generateSolution.
Figure 40: Sign and send a transaction call to verify a dynamic PoW.
The same goes for strings, but in contrast to integers they are padded on
the right up to 32 bytes, while the content is still UTF-8 encoded (ASCII)
in hexadecimal notation. Again, we can illustrate how the string KTH is
represented when right padded to 32 bytes:
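The two padding rules can be illustrated with a short Java sketch. It covers only the padding to one 32-byte word described here, not the complete ABI encoding of dynamic types, and it assumes non-negative integers and inputs that fit in one word; the class AbiPaddingSketch and its methods are hypothetical names.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;

final class AbiPaddingSketch {
    static final int WORD_HEX_CHARS = 64; // 32 bytes written in hexadecimal

    static String zeros(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append('0');
        }
        return sb.toString();
    }

    // Integers: hexadecimal value, padded with zeros on the left.
    // Assumes a non-negative value that fits in 32 bytes.
    static String padInteger(BigInteger value) {
        String hex = value.toString(16);
        return zeros(WORD_HEX_CHARS - hex.length()) + hex;
    }

    // Strings: UTF-8 bytes in hexadecimal, padded with zeros on the right.
    // Assumes the text is at most 32 bytes long.
    static String padString(String text) {
        StringBuilder hex = new StringBuilder();
        for (byte b : text.getBytes(StandardCharsets.UTF_8)) {
            hex.append(String.format("%02x", b));
        }
        return hex + zeros(WORD_HEX_CHARS - hex.length());
    }

    public static void main(String[] args) {
        System.out.println(padString("KTH"));               // 4b5448 followed by 58 zeros
        System.out.println(padInteger(BigInteger.valueOf(4))); // 63 zeros followed by 4
    }
}
```

The characters K, T and H have the ASCII codes 0x4b, 0x54 and 0x48, so the padded word starts with 4b5448 and the remaining 29 bytes are zero.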
Figure 41: The algorithm to generate dynamic solutions based on a public
address, total received tokens, and a nonce. Note that toLeadingZeros is
used for zero left padding of integers too.
In line with the PoE protocol, the nonce N is incremented until the hash
output of the total string (public address, amount of received tokens, and
the nonce) satisfies the difficulty, i.e. has the required number of leading
zeros. To reproduce the Solidity (on-chain) Keccak-256 hash algorithm
off-chain we use the web3j library and its function [Link] [43]. The steps
taken by the algorithm are the following:
6. Else, increment the nonce, i = i + 1, and go back to step 2
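The search loop of the client can be sketched in plain Java as follows. This is not the generateSolution function of Fig. 41: SHA-256 from the standard library is used instead of Keccak-256 through web3j, the three components are concatenated without padding, and the names GenerateSolutionSketch, digestHex and satisfies are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class GenerateSolutionSketch {
    // Assumption: SHA-256 from the JDK stands in for the on-chain Keccak-256.
    static String digestHex(String input) {
        try {
            byte[] raw = MessageDigest.getInstance("SHA-256")
                    .digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : raw) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // True if the first `difficulty` hex characters of the digest are zeros.
    static boolean satisfies(String digest, int difficulty) {
        for (int i = 0; i < difficulty; i++) {
            if (digest.charAt(i) != '0') {
                return false;
            }
        }
        return true;
    }

    // Increment the nonce until H(address || T || nonce) satisfies the difficulty.
    static long generateSolution(String address, long totalReceived, int difficulty) {
        long nonce = 0;
        while (!satisfies(digestHex(address + totalReceived + nonce), difficulty)) {
            nonce++; // the final step of the algorithm: try the next nonce
        }
        return nonce;
    }

    public static void main(String[] args) {
        String address = "0xClient"; // placeholder, not a real address
        long nonce = generateSolution(address, 0, 3);
        System.out.println("nonce = " + nonce);
        System.out.println(digestHex(address + 0 + nonce));
    }
}
```

The loop has no shortcut: the only way to find a valid nonce is to hash candidate after candidate, and the work has to be redone once the total of received tokens changes.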
Figure 43: Convert a text string to its corresponding ASCII code in hexadec-
imal notation.
Figure 44: Parse a string of ASCII code in hexadecimal notation for conver-
sion to a text string.
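The two conversions of Fig. 43 and Fig. 44 amount to writing each ASCII code as two hexadecimal digits and reading them back pairwise. The Java sketch below shows one way to do this; it is not the project code, and the class HexTextSketch with its methods toHex and fromHex is hypothetical.

```java
import java.nio.charset.StandardCharsets;

final class HexTextSketch {
    // Text to hexadecimal: every ASCII code becomes two hex digits.
    static String toHex(String text) {
        StringBuilder hex = new StringBuilder();
        for (byte b : text.getBytes(StandardCharsets.US_ASCII)) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    // Hexadecimal to text: move a two-character window (start, end) over the
    // string and parse each pair as one character code.
    static String fromHex(String hex) {
        StringBuilder text = new StringBuilder();
        for (int start = 0; start + 2 <= hex.length(); start += 2) {
            int end = start + 2;
            text.append((char) Integer.parseInt(hex.substring(start, end), 16));
        }
        return text.toString();
    }

    public static void main(String[] args) {
        System.out.println(toHex("dave"));       // 64617665
        System.out.println(fromHex("64617665")); // dave
    }
}
```

The example string 64617665 is the digit sequence shown in Appendix B, where the start and end indices mark the pair that is currently being parsed.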
This concludes the communication between the mobile device and the smart
contract (blockchain). In practice, the prototype of the communication
between the blockchain and a smart device worked as expected. There is still
more to develop for the blockchain and server communication. Despite that,
the last part of how the server communicates with the contract is presented,
in line with the PoE protocol.
We use the Geth library to make a simple mock server which receives HTTP
headers to verify and then queries the blockchain for information about the
sender, specifically the token balance. Due to a somewhat flawed mobile
client, the server does not implement the full concept of token burn, but
this is a matter of implementation. It is possible to make a simple function
in the contract which subtracts an amount from the current tokenBalance of a
user. Since the server is not central to the solution and will not be
included in any tests, the full server functionality can be found in
Appendix C. We summarize the logic of the server as follows:
1. Start a local server and connect it to the blockchain, as in Fig. 48.
2. Deploy the smart contract which includes the functions in Section 4.4.
3. Open a listener port and wait for the custom header X-Wallet-Address.
4. Check the value to see if the header is formatted as a valid wallet
address.
5. If that is the case, check the current token balance of that address and
write a simple string.
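Steps 4 and 5 of the server logic can be sketched in Java as below. The sketch checks only the textual format of an address (the prefix 0x followed by 40 hexadecimal digits, i.e. 20 bytes) and takes the token balance as a plain argument instead of querying a blockchain; the class WalletHeaderSketch and the response strings are hypothetical and not taken from the server in Appendix C.

```java
import java.util.regex.Pattern;

final class WalletHeaderSketch {
    // "0x" followed by exactly 40 hexadecimal digits, i.e. a 20-byte address.
    private static final Pattern ADDRESS = Pattern.compile("^0x[0-9a-fA-F]{40}$");

    // Step 4: is the X-Wallet-Address header value formatted as an address?
    static boolean isWalletAddress(String headerValue) {
        return headerValue != null && ADDRESS.matcher(headerValue).matches();
    }

    // Step 5: decide on the reply. The balance would be read from the
    // contract; here it is passed in to keep the sketch self-contained.
    static String respond(String headerValue, long tokenBalance) {
        if (!isWalletAddress(headerValue)) {
            return "invalid address";
        }
        return tokenBalance > 0 ? "resource granted" : "insufficient tokens";
    }

    public static void main(String[] args) {
        String address = "0x52908400098527886E0F7030069857D2E4169EE7"; // example value
        System.out.println(respond(address, 3));  // resource granted
        System.out.println(respond(address, 0));  // insufficient tokens
        System.out.println(respond("0x1234", 3)); // invalid address
    }
}
```

A format check of this kind only rejects malformed headers; whether the sender actually controls the address is a separate question that the sketch does not address.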
5 Test Results and Analysis
It is now time to test the PoE protocol in a feasible and local environment.
Even though the tests will not include any delay to the server, it is still
possible to evaluate the communication between the mobile device and the
blockchain. Moreover, the majority of the delay will most likely take place
during the computations of the PoE protocol, as it should: when fine-tuned,
the protocol provides the user with continuous but moderate access to token
rewards. It needs to be emphasized that tokens are a mere commodity that can
be exchanged for something of value, and the philosophy lies in the
adjustable rate at which these tokens can be received.
The test setup is simply that at the end of each loop, information about
the round is logged in the Android Studio debug console as in Fig. 45.
Figure 45: At every round: save the timestamp, amount of tokens, current
difficulty, and balance.
An extract of a log can be seen in Fig. 46, where the first two entries are
month/day and timestamp, respectively.
Figure 46: Log output in Android Studio: each line represents one round
and a posted solution to the blockchain.
It should be noted that the client sometimes does not receive a token, such
as at the times [Link] and [Link]. Based on the three tests, this is not
an outlier but a repeated phenomenon which occurs throughout the whole
session. It is not certain why this occurs, or whether the issue is on the
client or the blockchain side. Due to the short interval in the first case,
less than 4 seconds, it could be suspected that the client found a solution
in a short time but that the solution was outdated due to new states on the
blockchain, i.e. a higher total of received tokens. On the other hand, in
the second case it is noted that between [Link] and [Link] the client
has received two tokens within one round.
Since the client software has certain flaws, it is not possible to change
the difficulty while a client is running. For the program to work by itself,
any adjustment of the difficulty must be done before the client is connected
to the blockchain.
1. Clear the whole blockchain of any block data and cache and initialize
a new blockchain from a genesis block
2. Run the mobile client which contains the developed client side software
and save its public and enode address
3. Create a coinbase account, unlock it, and then add the mobile device
to the blockchain network
4. Mine exactly one block for a reward of 5 ether to be able to deploy a
contract
5. Deploy a contract through Truffle and mine exactly four blocks such that
the contract is processed and exists on the blockchain
6. Fetch the contract address and ABI of the exchange value contract,
EvNew, through Truffle and apply it to the blockchain
7. Change the contract difficulty to 4 with setDifficulty which results in
a token reward frequency of approximately 20 to 120 seconds
8. Mine five additional blocks such that the blockchain with high certainty
has changed state of the difficulty
9. Transfer 10 ether to the light peer and log each round according to
Fig. 45.
To analyze these logs, we used the programming language F# to parse and
graphically represent the number of requests in relation to the time between
received tokens. This was achieved by taking the difference between
consecutive timestamps and pairing it with each round number (see Appendix
F). For there to be no scalability issues in a local network, the request
time would be expected to be stochastic and without an increasing trend. In
Appendix E the token generation graphs are shown and, judging from the
sample size, the blockchain size does not appear to affect the time to
receive a token. On the contrary, it looks almost periodic, although the
main result observed is that the output (the time between tokens received)
is a stable and bounded signal which does not grow indefinitely.
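The timestamp differencing can be sketched in Java as well, although the analysis itself was done in F#. The sketch assumes that each round contributes one timestamp of the form HH:mm:ss and that all rounds fall on the same day; the class TokenIntervalSketch and the example timestamps are hypothetical and are not taken from the logs.

```java
import java.time.Duration;
import java.time.LocalTime;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class TokenIntervalSketch {
    // Seconds between consecutive rounds. Element i of the result is the
    // time between round i and round i + 1 (rounds counted from zero).
    static List<Long> intervals(List<String> timestamps) {
        List<Long> result = new ArrayList<>();
        for (int i = 1; i < timestamps.size(); i++) {
            LocalTime previous = LocalTime.parse(timestamps.get(i - 1));
            LocalTime current = LocalTime.parse(timestamps.get(i));
            result.add(Duration.between(previous, current).getSeconds());
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up timestamps for illustration only.
        List<String> rounds = Arrays.asList("12:00:00", "12:00:45", "12:02:05");
        List<Long> seconds = intervals(rounds);
        for (int round = 0; round < seconds.size(); round++) {
            System.out.println((round + 1) + "\t" + seconds.get(round));
        }
    }
}
```

Plotting the resulting pairs of round number and interval gives the kind of graph shown in Appendix E, where an increasing trend would indicate a scalability problem.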
5.2 Outcome
We refer back to Table 4 in Section 2.5 and the comparison between potential
solutions. The observed outcome can be seen in Table 5.
Requirement 2: the solution should not grant the client access to more
information than necessary. The use of a blockchain satisfies this
criterion. A client only knows a cryptographic scheme and how to solve a
certain puzzle. The details of how to circumvent it are not apparent and
interaction with the server does not rely on direct HTTP interactions.
and fast on a blockchain as long as there are miners. However, this project
has not been able to evaluate this criterion in depth.
Since the project tackled the specific issue of web scraping, the focus was
to investigate the Ethereum blockchain and its feasibility as an additional
security layer with the help of smart contracts. The outcome has not en-
tirely confirmed whether or not it would work in a commercial environment
due to project progress limitations and maturity of software.
The first thing done was to deploy a simple contract on the blockchain and
try to call it from another computer, all through the console (Ethereum
interface). Subsequently, time was spent on the smart contract and finally
the light client. A better and more time-efficient approach would most
definitely have been to start by calling a simple contract on the mobile
device and learning about its limitations and flaws before proceeding to
smart contract development. In retrospect, this would have been faster since
a disproportionate amount of time was spent on smart contracts and
superfluous helper functions in combination with unresolved issues, many of
which could not only have been omitted but would in the end need to be
rewritten. The figures we have shown after the function improvements in
Section 4.5 are simple getters and not as major as our original checkPow,
which used other ways to append strings and data types, as well as other
ways to check for leading zeros.
In like fashion, time was spent on issues that are out of our control and
not vital to the project, such as automatic peer discovery and the way a new
port is opened at every new session. Similarly, a lot of time was spent on
attempts to resolve other light client bugs with ad hoc solutions, only for
the bugs to be patched a few weeks later in a subsequent update. At the same
time, we tried to fix bugs that are still unresolved as of today and which
the community has shown no interest in tackling. It goes without saying that
one cannot demand much from the open source community, but for the goal and
ambition of the project it was a major headache. Solidity, for instance,
does not natively support string concatenation, conversion of data types
such as string to integer, or string comparison, to mention a few.
5.4 Server and network
The project was only conducted on a local area network and in a specific
office environment. Since we never tried it on a public and decentralised
test network, we implicitly assumed that the network is and will continue to
be ideal. Specifically, there are several network fallacies [44] which have
not been taken into consideration at all. These are the following:
1. The network is reliable
2. Latency is zero
3. Bandwidth is infinite
4. The network is secure
5. The topology does not change
6. There is one administrator
7. Transport costs are zero
8. The network is homogeneous
In retrospect, our project touches the majority of these points and likely
all but 3). First and foremost, we assume a reliable network. This is not a
significant factor, however, since a traditional client-server approach
makes the same assumption and it is fundamentally implicit for any web
service. Our latency is not only very low but the lowest possible due to
operating a network service on localhost/[Link], which is the IP address
of the local machine. Hence, our tests are outcomes of an ideal and
unrealistically low-latency scenario for network services. The bandwidth, on
the other hand, is most likely a less important factor since packets include
comparatively small message payloads with only characters and no
multimedia.
The network is as secure as the local machine and area network; security for
the solution is not more important than network precautions and preventive
measures in general for web services. The topology and administrator
factors are not to be dismissed easily since they are fundamental
assumptions in our project, and would remain so if the proposed solution
were developed further, because in the end it would require an entity to
control the blockchain difficulty. This is not necessarily a person but
could very well be an algorithm which uses statistical methods and data
based on the users and requests to change token prices and even control
individual difficulties, similarly to the way control systems work with
feedback loops in industrial environments. Going back to topology, this
would in practice mean that we still have one, or a limited number of,
administrators or central nodes. It could be argued that more middle layer
oriented protocols have the opportunity to make certain mass-consumer data
communication more opaque with the help of blockchain technology, in a
positive way for service providers and genuine users. This could in practice
make unwanted scraper behaviour very difficult and strengthen commercial
web applications with exposure to clients over the internet, with respect to
medium access and infrastructural privacy for any type of agent.
The protocol was designed to make it as difficult as possible for scrapers
but it is not without flaws. The most severe one is that it assumes all
mobile devices on the network to be equal. In practice, different models
have different CPU power and consequently hash power. This is an additional
factor that needs to be considered. Likewise, it could also be questioned
whether HTTP requests between the device and server will work as intended,
especially without obstructive delay and congestion. Finally, we have not
investigated the power consumption of this type of hashing. The proposed
computations need to be kept at a level that is not intense for the device.
Even so, using such applications will be more energy consuming and the
question is only to what degree.
Regarding the main test, there are several parameters which could have
affected its outcome. Since each component has not been tested in isolation,
it is difficult to establish cause and effect for issues such as
non-credited tokens and double tokens in a round. Likewise, the experiment
may suffer from a possible lack of internal validity and from several
uncertainties such as
• wireless connection delay
• internet speed
• block congestion on the blockchain
• propagation speed and delay to the blockchain
• device and hardware delays and race conditions
• software specific delays and race conditions
• other stochastic phenomena
6 Conclusion
a malicious user decompiles the program and reverse engineers the
algorithm, they cannot circumvent the fact that the cryptographic component
forces the user to solve puzzles. Because of the nature of hashing and
its avalanche effect on the output, the puzzles are solvable only in one
way: brute force.
• In contrast to the traditional client-server architecture, we have
explored a model consisting of client-blockchain and server-blockchain
communication. By decreasing communication with a server and mainly
communicating with an additional security layer, the blockchain, it is
possible to better protect and conceal data. Nonetheless, it is not certain
whether a public or a private blockchain is the right choice.
Finally, another type of project could be to question whether or not the cur-
rent blockchains are usable and technologically sustainable for mobile clients.
Perhaps a blockchain specifically constructed and customized for mobile de-
vices is more suitable.
Appendix
Appendix A: Solidity
Appendix B: Decode ASCII characters
[Figure: the hexadecimal string 64617665 laid out over the character
indices 0 to 7, with start and end markers indicating the part of the string
that is read at each decoding step.]
Appendix C: Server
Figure 48: Setting up a Geth server locally through RPC and deploying the
smart contract.
Figure 49: Starting a Geth server locally to listen for HTTP headers to parse;
if correct format and enough tokens, reward with a resource.
Appendix D: Android
Figure 50: Necessary information about the blockchain for the mobile client.
Appendix E: Token generation graphs
Figure 51: 4 hours of token generation: a graph of the time between tokens
received, in seconds, and amount of requests.
Figure 52: 4.5 hours of token generation: a graph of the time between tokens
received, in seconds, and amount of requests.
Figure 53: 5.5 hours of token generation: a graph of the time between tokens
received, in seconds, and amount of requests.
Appendix F: Token generation
Figure 54: Parse raw data in F# of the token generation logs from Android
Studio and structure it. (Credits to Gabriel for his contribution.)
Figure 55: Plot the structured data in F#. (Credits to Gabriel for his con-
tribution.)
References
[13] S. Nakamoto, “Bitcoin: A peer-to-peer electronic cash system,”
[Link], October 2008.
[14] [Link]
Creative Commons Attribution 4.0 International (CC BY 4.0), Accessed
2018-05-23.
[15] G. Wood. Ethereum: A secure decentralised generalised trans-
action ledger. [Link] Ac-
cessed 2018-02-16.
[16] S. Tikhomirov, “Ethereum: state of knowledge and research perspec-
tives,” FPS 2017, October 2017.
[17] D. Göthberg. [Link] cryptography#
/media/File:Public key [Link]. Accessed 2018-05-24.
[18] D. C. de Leon, A. Q. Stalick, A. A. Jillepalli, M. A. Haney, and F. T.
Sheldon, “Blockchain: properties and misconceptions,” Asia Pacific
Journal of Innovation and Entrepreneurship, vol. 11, pp. 286–300, De-
cember 2017.
[19] J. Teutsch, S. Jain, and P. Saxena, “When cryptocurrencies mine their
own business,” Financial Cryptography and Data Security: 20th Inter-
national Conference, pp. 499–514, February 2016.
[20] J. Wild, M. Arnold, and P. Stafford. Technology: Banks seek the key
to blockchain. [Link]
567b37f80b64#axzz3qe4rV5dH. Accessed 2018-06-24.
[21] K. Wüst and A. Gervais, “Do you need a blockchain?” IACR Cryptology
ePrint Archive 2017, April 2017.
[22] User:Umapathy. [Link] network#/media/
File:Star [Link]. Own Work CC BY-SA 3.0, Accessed 2018-06-26.
[23] J. Garzik. Public versus private blockchains part 1: Permissioned
blockchains. [Link]
[Link]. Accessed 2018-06-25.
[24] A. Berentsen and F. Schär, “A short introduction to the world of
cryptocurrencies,” Federal Reserve Bank of St. Louis Review, First Quarter
2018, vol. 100(1), pp. 1–16, 2018.
[25] D. Liu and L. J. Camp, “Proof of work can work,” The Workshop on
the Economics of Information Security, 2006.
[26] A. P. Ozisik, G. Bissias, and B. N. Levine, “Estimation of miner hash
rates and consensus on blockchains,” arXiv, 2017.
[27] T. Moore and N. Christin, “Beware the middleman: Empirical analysis
of bitcoin-exchange risk,” Financial Cryptography and Data Security -
17th International Conference, pp. 25–33, April 2013.
[28] N. Alexopoulos, J. Daubert, M. Mühlhäuser, and S. M. Habib, “Beyond
the hype: On using blockchains in trust management for authentication,”
2017 IEEE Trustcom/BigDataSE/ICESS, Aug 2017.
[29] M. Bastiaan, “Preventing the 51%-attack: a stochastic analysis of two
phase proof of work in bitcoin,” 22nd Twente Student Conference on IT,
January 23rd, 2015.
[30] Y. Gao and H. Nobuhara, “A proof of stake sharding protocol for
scalable blockchains,” Proceedings of the APAN – Research Workshop 2017,
2017.
[31] V. Buterin and V. Griffith, “Casper the friendly finality gadget,” arXiv,
October 2017.
[32] J. Bonneau, A. Narayanan, A. Miller, J. Clark, J. A. Kroll, and E. W.
Felten, “Anonymity for bitcoin with accountable mixes,” Financial
Cryptography and Data Security: 18th International Conference, March
2014.
[33] T. Ruffing and G. Malavolta, “Switch commitments: A safety switch for
confidential transactions,” Financial Cryptography and Data Security:
18th International Conference, March 2014.
[34] S. Meiklejohn and R. Mercer, “Möbius: Trustless tumbling for
transaction privacy,” PETS 2018, January 2018.
[35] E. Gaetani, L. Aniello, R. Baldoni, F. Lombardi, A. Margheri, and
V. Sassone, “Blockchain-based database to ensure data integrity in cloud
computing environments,” In Proceedings of the First Italian Conference
on Cybersecurity (ITASEC17), January 2017.
[36] M. Alharby and A. van Moorsel, “Blockchain-based smart contracts: A
systematic mapping study,” Computer Science & Information Technology
7 (10): 1-16, August 2017.
[37] L. Luu, D.-H. Chu, H. Olickel, P. Saxena, and A. Hobor, “Making smart
contracts smarter,” 2016 ACM SIGSAC Conference, October 2016.
[38] A. M. Antonopoulos, Mastering Bitcoin. O’Reilly, 2014.
[39] A. Narayanan, J. Bonneau, E. Felten, A. Miller, and S. Goldfeder,
Bitcoin and Cryptocurrency Technologies. Princeton University Press,
2015.
[40] P. Witoolkollachit, “The avalanche effect of various hash functions
between encrypted raw images versus non-encrypted images: A comparison
study,” Journal of the Thai Medical Informatics Association, 1,
69-82, 2016.
[41] Ethereum Foundation. [Link]
[Link]#abi-packed-mode. Accessed 2018-05-28.
[42] [Link]
ABI#examples. Accessed 2018-05-28.
[43] Conor Svensson. [Link]
src/main/java/org/web3j/crypto/[Link]. Accessed 2018-05-25.
[44] J. Newmarch, Network Programming with Go. Apress, 2017.
[45] Stack Exchange. [Link]
10811/solidity-concatenate-uint-into-a-string. Accessed 2018-04-23.
TRITA-EECS-EX-2018:478