Static Analysis for OpenID Connect Vulnerabilities
Abstract
OpenID Connect has become a de facto standard for managing authentication and autho-
rization in Web applications. It is however challenging for developers to understand the pro-
tocol and securely implement a client application. Even using an SDK that helps them along
the way, developers are responsible for doing data validation in a precise manner. The cor-
rectness of this validation can be ensured using security analysis and vulnerability detection
tools.
Previous work on security analysis and tools for vulnerability detection of OpenID Connect is mostly based on complex, formal models and comprehensive penetration testing frameworks that cover the whole protocol. These often require much work to understand, develop and use.
The objective of this thesis is to introduce a more developer-oriented way to ensure fewer
vulnerabilities in such client applications. This thesis proposes (1) a pragmatic model of the
authorization code flow, as a straightforward checklist targeted specifically at the concerns
of the developer, and (2) a demonstration that relatively simple static analysis techniques,
based on this model, can be used to find vulnerabilities related to the needed security checks.
The effectiveness of the analysis techniques is demonstrated experimentally on six open-
source clients, of which four were found to have vulnerabilities. 20 vulnerabilities regarding
incomplete or missing token validation were detected. The analyzer for token validation
had a precision of 61%, recall of 100% and a true negative rate of 90%. Its precision may be
improved further with a few weeks of engineering effort. More reliable metrics of its perfor-
mance can be found by doing a large-scale empirical study.
Sammendrag
OpenID Connect has become an industry standard for managing authentication and authorization in Web applications. Still, it is difficult for developers to understand the protocol and implement a client application securely. Even when they use an SDK that helps them with the details, the developers are responsible for handling data validation precisely. Security analysis and automated vulnerability detection tools can be used to ensure that this data validation is done properly.
Previous work on security analysis and automated detection tools for OpenID is mostly built on complex, formal models and comprehensive penetration testing frameworks that cover the whole protocol. These solutions are often demanding to understand, develop and use.
The goal of this master's thesis is to introduce a more developer-oriented way of limiting the number of security holes in client applications. The thesis presents (1) a pragmatic model of the protocol flow, shaped as a straightforward checklist targeted at the concerns of the developer, and (2) a demonstration that simple static code analyses based on this model can be used to find vulnerabilities related to these security checks.
The effectiveness of the analysis techniques is demonstrated experimentally on six open-source clients, of which four have vulnerabilities. 20 vulnerabilities related to insecure validation of ID tokens were uncovered. The analyzer for ID token validation achieved a precision of 61%, a recall of 100% and a true negative rate of 90%. Its precision can be improved further with a few weeks of engineering effort. More reliable metrics can be obtained from a large-scale empirical study.
Preface
Acknowledgment
This research would not have been possible without Kantega and NTNU, and the people
helping me. Kantega has been an arena where I have been welcome to come and work, and
the consultants there have been helpful and willingly participated in the study I did in the work preceding this thesis. I am thankful that I have had the opportunity to study computer science at NTNU, taking many interesting and educational courses that gave me the needed theoretical foundation.
Thanks to the people who have given me good advice and proof-reading along the way.
Thanks to Bjarte Østvold for your advice; our meetings have been really helpful.
I would like to give special thanks to my co-supervisor, sparring partner and colleague from Kantega, Edvard Karlsen, and to my supervisor Jingyue Li, for their invaluable contributions to my work through encouraging and engaging discussions throughout the project.
Finally I want to thank my family and closest friends for your support and encourage-
ment. I was able to complete this thesis thanks to you all!
Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . i
Sammendrag . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ii
Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii
List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vi
List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vi
1 Introduction 1
1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.3 Research Questions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.4 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.5 Structure of the Thesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2 Background 6
2.1 Access Control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.2 Access Control Vulnerabilities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.3 OAuth 2.0 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.4 OpenID Connect . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.5 Program analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.6 Validation of program analysis tools . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.7 Architecture of the FindSecBugs plugin . . . . . . . . . . . . . . . . . . . . . . . . 26
3 Related Work 33
3.1 Security analysis of OAuth and OpenID Connect Specification . . . . . . . . . . 33
3.2 Automated tools for detecting OpenID Connect vulnerabilities . . . . . . . . . . 37
3.3 Precursory thesis work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
4 Research Design 41
4.1 Research motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
4.2 Research questions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
4.3 Research strategy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
4.4 Data generation and analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
4.5 Design to answer RQ1: Analysis of OpenID Connect . . . . . . . . . . . . . . . . 43
7 Evaluation 81
7.1 Experimental setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
7.2 Experimental results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
7.3 Qualitative analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
8 Discussion 100
8.1 RQ1: The developer-oriented model . . . . . . . . . . . . . . . . . . . . . . . . . . 100
8.2 RQ2: Implementation of static analysis . . . . . . . . . . . . . . . . . . . . . . . . 102
8.3 Threats to validity and reliability . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
References 113
List of Tables
Glossary
Access token A credential understood by Identity Providers, that is used to grant access to
clients.
Client An application that connects with an Identity Provider for identity management.
Detector A class in FindBugs, which is an analyzer that scans Java classes to detect a certain kind of bug. FindBugs is composed of several detectors.
Identity Provider The server in OpenID Connect that manages the authentication of the
end-user.
ID token A data item in OpenID Connect, given in the JSON Web Token format, that contains information about the user’s identity as well as integrity-ensuring data.
JSON Web Token An open, industry standard method for representing claims securely be-
tween two parties like the Client and the Identity Provider.
OpenID Connect An identity layer on top of the OAuth 2.0 authorization protocol.
Penetration testing Testing a system for vulnerabilities by launching attacks and analyzing
the outcomes.
Precision The fraction of warnings from an analyzer that were true vulnerabilities.
Static analysis Automatic inspection of a program by reasoning about its code without running it.
True Negative Rate The fraction of non-vulnerable code that was classified as non-vulnerable
by an analyzer.
Acronyms
API Application Programming Interface
RP Relying Party
Chapter 1
Introduction
1.1 Motivation
OpenID Connect (OIDC) is becoming increasingly common in modern Web applications
as a de facto standard for authentication and authorization with Single sign-on federation
services. Developers may use well-known Software Development Kits (SDKs) for building a
Relying Party (RP) in OIDC to connect their app with an Identity Provider (IdP). Examples of
such SDKs are the Nimbus OAuth SDK [18] and Google OAuth Client Library [38, 39].
Even if these SDKs help the developer by encapsulating several difficult implementation
details, the developer and the SDK still share a common responsibility in securing the RP
application. The SDKs give tools for managing Web-specific features, and can provide and
parse strong data types for the data delivered between the RP and the IdP. Still, the devel-
oper is responsible for establishing a trust relationship with the IdP, and correctly managing
secrets and data that are needed to ensure integrity, confidentiality, and non-repudiation in
the communication.
Listing 1.1 shows a code sample from an open-source Android-app project [102], where
the developer has written code 1 verifying the ID Token using the Google library.
1  boolean isValidIdToken(String clientId, String tokenString) {
2      if (clientId == null || tokenString == null) {
3          return false;
4      }
5      List<String> audiences = Collections.singletonList(clientId);
6      IdTokenVerifier verifier = new IdTokenVerifier
7          .Builder()
8          .setAudience(audiences).build();
9      IdToken idToken = IdToken.parse(new GsonFactory(), tokenString);
10     return verifier.verify(idToken);
11  }
Listing 1.1: Incomplete ID Token verification in an open-source android app project [102].
1 [Link] in the Zop-App project: [Link]fc7f9a9b6f9e0f18b89612ced49d67001aa61deb/app/src/main/java/fi/aalto/legroup/zop/authentication/[Link]
While the example may look fine at first sight, there are several risks associated with this code. It lacks several of the checks required by the security requirements of the protocol specification [60, 62]. The audience parameter is validated here, and the IdTokenVerifier on line 6 performs a default freshness validation (which ensures that old or expired tokens are not used), but the remaining checks are missing. The app is therefore exposed to known threats such as man-in-the-middle and replay attacks.
To understand which threats exist in the protocol and in implementations of it, existing research has performed security analyses. Security analyses of OpenID Connect can be divided along two axes: one looks at formal security analysis and modeling of the protocol, or formal security testing [2, 33, 34, 49, 62, 80, 86, 89], while the other looks at implementations of the protocol with automated vulnerability analysis or testing tools [11, 51, 54, 76, 95, 96, 97, 99]. These
solutions generally seek to be comprehensive and tend to look at the threat models from the
perspective of a hacker (or an attacker). Several of the automated vulnerability analysis tools
require extensive work and configuration to use for discovering vulnerabilities in an applica-
tion implementing the protocol, and detect vulnerabilities late in the software development
life cycle.
Vulnerabilities in OpenID Connect can be considered a subset of Access Control Vulnera-
bilities, the fifth highest ranking risk according to the OWASP top 10 list of Web Application
Security Risks [92]. There have been several known cases of data breaches due to insecure Single Sign-On implementations in recent years, like the Facebook breach in 2018 [55], where millions of access tokens were hijacked. Due to insecurely implemented Relying Parties lacking proper session management, adversaries could gain access to hundreds of websites outside of Facebook itself. Even though existing solutions can automatically detect vulnerabilities, more must be done earlier in the development stage, since many clients on the Web still have vulnerabilities in their production code.
1.2 Objectives
I hypothesize that easy-to-use incomplete static analyses can be used for mitigating vulner-
abilities in Relying Party applications, early in the development stage. Developers like simple-to-use static analysis tools that do not require any configuration [88]. These
simple analyses may mitigate vulnerabilities in a large portion of the more common security-
critical steps in OpenID Connect, as the steps share similar (uncomplicated) structural and
syntactic properties. The code structure for such critical steps is likely (or at least encour-
aged) to be relatively linear and simple [3], and vulnerabilities may consequently be quite
easy to find.
1.3 Research Questions
RQ1 What must a developer do to avoid introducing known security vulnerabilities, while
implementing a Relying Party with an OpenID Connect SDK?
RQ2 How can simple, explicit and intraprocedural static analysis checks be used to identify
vulnerabilities in OpenID Connect Relying Parties?
1.4 Contributions
This thesis proposes the following contributions related to RQ1:
This thesis explores the possibility that relatively simple and explicit static analysis tech-
niques can be used to find vulnerabilities in OpenID Connect, such as the ones in Listing 1.1.
The simple process of the analyses is demonstrated with an example. For detecting the
incomplete verification in Listing 1.1, the analyses could use something like the following
process:
1. This is a token verification method in OIDC: another method containing a token request called this method, and the method name and signature indicate token verification.
2. Here we expect that at least these n verification steps in the checklist are performed.
3. If some of the expected verification steps are missing, the analysis raises a warning.
4. The warning informs the developer of the risks associated with not performing these checks.
The analyses are added to the Find Security Bugs plugin, which is a popular easy-to-use
static analysis tool for detecting security bugs in Java. The tool comes as an IDE plugin, which
makes the analyses easily accessible to developers implementing the protocol in real-life
web applications. It may also be used in the graphical user interface provided by SpotBugs,
giving results like those shown in Figure 1.1, where a warning about missing ID token validation is raised for a method.
2 This claim is supported by comparison to related work in Section 8.2.2.
Vulnerabilities are detected by tailoring the checks to the protocol flow and to a checklist of the steps needed to ensure a secure RP.
This way the developer is informed directly of the risks associated with the code flaws in their security checks, while they are still in the context where the check is relevant. The
focus here is vulnerabilities in code calling OpenID Connect SDKs, meaning vulnerabilities
that developers introduce when they write code that interfaces with these SDKs. To be clear,
the analyses are not concerned with looking for vulnerabilities in the SDKs themselves.
Chapter 2
Background
This chapter goes through the theoretical background, with explanations and definitions of the topics used in this thesis. OpenID Connect and OAuth 2.0 are used to ensure authentication and authorization. Therefore Section 2.1 goes through Access Control (authorization), and Section 2.2 gives an overview of common access control vulnerabilities. Section 2.3 shows how the authorization protocol OAuth 2.0 works, and Section 2.4 explains the workings of the authentication protocol OpenID Connect. Then follows an insight into program analysis techniques in Section 2.5, and the way program analysis tools are evaluated in Section 2.6. The abilities and architecture of the static analysis tool Find Security Bugs are explained in Section 2.7.
Sections 2.1, 2.2, and 2.5 contain theoretical background that was mainly outlined during
the specialization project preceding this thesis [87].
Mandatory Access Control, which is based on information sensitivity within resources, with
a formal authorization. Subjects are restrained from setting security attributes on a re-
source, and cannot pass on their access, hence the model is mandatory.
Discretionary Access Control, which is based on the identity of subjects, and what informa-
tion they need to know, in addition to group affiliation. A subject with a set of access
permissions may pass their access on to other subjects, hence the model is discre-
tionary.
Role-Based Access Control, which is based on roles within an organization that are projected onto users and groups. Roles include collections of subjects within the organization that have a common need for access in order to perform their tasks. Access levels or sets of permissions are formally defined for a role or group, and member subjects inherit these permissions.
in dated literature. The classifications that were made back then, no longer have a clear
definition in the newest updated lists. Inconsistent usage of terms makes it challenging to
properly classify the vulnerabilities that fall within the scope of an analyzer. Different re-
searchers tend to use their own definitions and understandings of the same terms, or use
the existing references to classifications. Access control is here viewed as any mechanism
explicitly or implicitly involved in controlling access to data in a given system. Here, the
following definitions are proposed to reason about the term “access control vulnerability”:
Definition 2.2.1. Data Leakage (DL) If some observer O can learn a piece of information I
from a software system S, and O is not supposed to be able to learn I, S has a Data Leakage.
Definition 2.2.2. Explicit Access Control Vulnerability (EACV) Explicit access control vulnerabilities are cases where the program source code explicitly fails to enforce concrete, program-specific access control rules, causing a data leakage.
Definition 2.2.3. Implicit Access Control Vulnerability (IACV) Implicit access control vulnerabilities are any, potentially highly subtle, property of the program or software system that can cause data leakage.
Definition 2.2.4. Access Control Vulnerability (ACV = IACV ∪ EACV) Access control vulnerabilities are the union of all explicit and implicit access control vulnerabilities, causing a subset of all possible data leakages.
Here the relation is that DL ⊇ EACV ∪ IACV, meaning that a DL may entail subtle weaknesses that fall far beyond the scope of typical web-based access checks. The definition of ACV therefore needs some fine-tuning, and for the purpose of this project, mentions of ACV will be limited to ACV ≡ EACV ∪ IACV.
2.2.2 Examples
Examples of common access control vulnerabilities in Web applications include:
• Metadata manipulation, where attacks are done through tampering with or replaying an access token or cookie, or manipulating a hidden field to elevate privileges.
• Privilege escalation, for example when someone acts as a user without being logged in,
or manages to perform admin level actions as a normal user.
• Modification of application state to bypass access control checks. This includes modification of internal app state, the URL or the HTML page, or the use of a custom API attack tool.
Other vulnerability classes in the OWASP Top 10, as well as in the CWE lists, can be considered relevant to access control; for example, cross-site scripting (XSS) may lead to privilege escalation.
Modern Single Sign-On (SSO) protocols seek to solve several of the issues with more classical Web-based vulnerabilities by leaving access management to a designated server, thus separating identity management from delivery of resources. Even if several traditionally rooted
attacks become irrelevant with modern SSO, attacks with similar characteristics can still be
applied to applications implementing such protocols. Data must still be sanitized, and the
communications are still based on web requests.
Additionally, traditional protection mechanisms that work with a client-server model
may be circumvented if the application uses single sign-on, thus introducing novel attack
surfaces. Consequently, if authorization is broken due to a flaw in the implemented authorization protocol, it can be considered an access control vulnerability, as data is leaked.
To better understand which vulnerabilities will apply when using single sign-on, Sec-
tions 2.3 and 2.4 go through the specification of the authorization protocol OAuth 2.0 and its
extension for identity management, OpenID Connect.
• The Client, an application that uses the resource owner’s authorization and makes protected resource requests on their behalf.
• The Authorization Server, responsible for issuing access tokens to the client after the resource owner has been successfully authenticated and authorization has been obtained.
In addition, several data items form the security properties of the protocol. The protocol relies on the following important credentials used in the requests:
• Authorization Grant: A credential representing that the resource owner has given con-
sent allowing the client to obtain an access token.
• Access Token: A credential used to access protected resources, which is a string rep-
resenting an authorization issued to the client. This token is an abstraction layer that
replaces different authorization constructs with a single token understood by the re-
source server.
• Refresh Token: A credential used to obtain new access tokens when they expire or are
invalidated. This is an optional item to include together with the access token when
first prompted for tokens.
There are four authorization grant types that are defined in the protocol:
• Implicit grant The implicit grant is a simplified version of the flow used in the autho-
rization code grant. Instead of a flow with round trips, the client gets an access token
directly from the authorization server, effectively skipping the step that gives a code
grant. This grant is optimized for clients that run directly in the browser (therefore using a language like JavaScript). This flow introduces some security risks that must be weighed against efficiency.
• Resource Owner password grant The password credentials of the resource owner are
used directly as an authorization grant to obtain an access token, skipping the round-
trip where an authorization code is issued. The client does not need to store the re-
source owner credentials, as these are used only once and can be replaced with a long-
lived access or refresh token.
• Client credentials grant The client credentials are used directly as an authorization
grant, effectively removing the resource owner from the picture. This grant can be
used when the authorization scope is limited to protected resources that belong to the
client.
One of the advantages of this protocol is in the way the resource server only has to un-
derstand and validate access tokens when issuing protected resources to various subjects,
instead of having to handle various other authorization constructs. Otherwise a server would
have to understand an authorization construct, where the access is defined by the resource
owner directly authenticating with her username and password. OAuth is an abstraction
layer that allows for more flexible authorization rules, where the token can get a specific du-
ration of access and possibly a more restricted access than the authorization grant that was
used to obtain the token.
Figure 2.1: The authorization code flow in OAuth 2.0 has 10 steps, from the client asking for access via the authorization server to the protected resource being obtained with an access token. Steps C and D are broken into two parts, illustrating the interaction between
the browser and the resource owner.
The following steps are included in the illustrated authorization flow in Figure 2.1, with
the client seeking to access a protected resource at the resource server:
(A) The flow is initiated by the client, redirecting the resource owner’s browser to the authorization endpoint with a set of parameters.
(B) The included parameters from the client through the browser are client identifier, re-
quested scope, local state and the redirection URI, which is the location the user agent
is redirected after access is granted. The browser presents the authorization endpoint
to the resource owner.
(C) Access is requested of the resource owner via the browser.
(D) The authorization server authenticates the resource owner, who either grants or denies
access.
(E) With granted access, the redirection URI from step B is used to send the browser back
to the client. Authorization code and the local state provided in the URI in step B are
included as parameters in the redirection URI.
(F) Client receives the authorization code as the browser is directed back.
(G) The client requests to get an access token by contacting the token endpoint provid-
ing the authorization code, hence it authenticates with the authorization server. The
redirection URI returned in step C is included for verification.
(H) The authorization server validates the authorization code, authenticates the client and
verifies that the parameters in the redirection URI matches the URI used to redirect
the client during step C. It returns an access token, and may optionally return a refresh
token.
(I) Having obtained an access token from the authorization server, the client can finally
request the protected resource from the resource server.
(J) The resource server validates the access token, providing the protected resource if the
token is valid.
Note that the last two steps, I and J, are optional parts of the flow, and not encapsulated by
the standard.
In authorization requests, the client adds a specified set of parameters to the query compo-
nent of the URI:
• response_type: Denotes what to expect in the response. This must be set as “code”.
• client_id: The unique identifier of the client, which is known by the authorization
server.
• redirect_uri: An encoded URI with the location to which the resource owner will be
redirected after authenticating with the authorization server.
• scope: The scope of the access. It represents a limitation to what kind of data the access
token can be used to obtain.
• state: An opaque string that is used to maintain a session state between the request and
the callback response. This value protects against Cross-site request forgery (CSRF)
attacks.
The basic data transfers in the authorization code flow can then be illustrated by looking
at example HTTP requests. An authorization code request built by the client may look like
shown in Listing 2.1 (Step A). The client redirects the resource owner to the location in the
URL (Step B):
GET https://authorizationserver.domain.com/authorize
    ?response_type=code
    &client_id=abc
    &redirect_uri=https://org.client.com/callback
    &state=xyz
Listing 2.1: Step A-B: URL format for an authorization code grant request.
The authorization server then responds with a callback request after performing step C,
authenticating with the resource owner. It then redirects the resource owner back to the
redirect_uri. A callback response may be structured like in Listing 2.2
HTTP/1.1 302 Found
Location: https://org.client.com/callback?
    code=SplxlOBeZQQYbYS6WxSbIA
    &state=xyz
Then the client validates the state parameter in step F. Upon success, it proceeds to step G and builds a token request using the authorization code it received. A token request typically looks like shown in Listing 2.3. This time, instead of redirecting the user, the client directly contacts the authorization server on a back channel, leaving the browser out of the picture.
POST https://authorizationserver.domain.com/token?
    grant_type=authorization_code
    &code=SplxlOBeZQQYbYS6WxSbIA
    &client_id=abcde
    &client_secret=Xpbxlklk12WRlkoP
    &scope=api.read api.write
    &redirect_uri=https://org.client.com/callback
After this, in step H, the authorization server gives a token response after receiving the code. Token responses may come in the format shown in Listing 2.4:
HTTP/1.1 200 OK
Content-Type: application/json
{
  "access_token": "2YotnFZFEjr1zCsicMWpAA",
  "token_type": "bearer",
  "expires_in": 3600,
  "refresh_token": "tGzv3JOkF0XG5Qx2TlKWIA",
  "scope": "api.read api.write"
}
2.4 OpenID Connect
While OAuth 2.0 works well for delegating client access to protected resources, its design does not properly handle the integrity of the data if it is used for authentication, where identity information must be obtained.
Table 2.1: An overview of how OAuth roles are extended or referred to with different terms in
OpenID Connect.
There are also more data artifacts and flows that are defined in the identity layer abstraction of the protocol. OpenID Connect introduces a set of request parameters in addition to
the ones described in OAuth (See Section 2.3.1). Among these is the nonce value, which is
a randomly generated string value. This value and the state value defined in OAuth serve
similar purposes. The state value comes in the callback request with the authorization code
in step F in the flow (See Figure 2.3), and must be verified by the relying party before the
token request is initiated. It binds the authentication request to the callback authentication
response. The nonce value comes with the token response, and ensures replay attack pro-
tection by binding the authentication request with the token response.
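As a concrete illustration of how these two values are typically handled, the sketch below generates the state and nonce before the authentication request and checks them later. It is a minimal sketch under assumptions: the HttpSession storage and the attribute names are hypothetical choices, not mandated by the specification or by any particular SDK.

import java.security.SecureRandom;
import java.util.Base64;
import javax.servlet.http.HttpSession;

public final class StateAndNonce {

    private static final SecureRandom RANDOM = new SecureRandom();

    // A fresh, unguessable value for each authentication request.
    static String newRandomValue() {
        byte[] bytes = new byte[32];
        RANDOM.nextBytes(bytes);
        return Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
    }

    // Before step A: store both values in the user's session and include them in the request.
    static void prepareRequest(HttpSession session) {
        session.setAttribute("oidc_state", newRandomValue());
        session.setAttribute("oidc_nonce", newRandomValue());
    }

    // Step F: the state in the callback must match the one stored for this session (CSRF protection).
    static boolean stateMatches(HttpSession session, String callbackState) {
        Object expected = session.getAttribute("oidc_state");
        return expected != null && expected.equals(callbackState);
    }

    // After the token response: the nonce claim in the ID token must match the stored value (replay protection).
    static boolean nonceMatches(HttpSession session, String idTokenNonce) {
        Object expected = session.getAttribute("oidc_nonce");
        return expected != null && expected.equals(idTokenNonce);
    }
}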
In addition, the following concepts are introduced in OpenID Connect:
• ID Token: A token that contains identifiers of the end-user as well as identifiers of the
IdP and integrity timestamp. It also contains the nonce value, which binds the token
response to the initial authentication request. This token is sent in the token response
together with the access token. ID tokens come in the JSON Web Token (JWT) format,
which is defined in RFC7519 [45].
• Standard Scopes: A standard set of scopes are defined to specify what identity infor-
mation is available in a request. This information typically includes profile and email.
OIDC requests must always include the openid scope value.
• Claims: Claims are specific sets of information about an entity, typically the identity
information of a user.
• Discovery: The Discovery process is used to establish a trust relationship between the relying party and the Identity Provider. The relying party sends a request to the /.well-known/openid-configuration endpoint at the IdP, and receives a JSON document called the Discovery document. This document forms a contract, and contains values that are used to ensure the integrity of the communication. The Discovery process is described in its own document, which was published alongside the OIDC specification [61]. A minimal sketch of such a discovery request follows the list below.
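As referenced above, the following is a minimal sketch of such a discovery request, assuming Java 11’s built-in HTTP client; the issuer URL is supplied by the caller, and a production RP would normally let its SDK perform discovery and cache the result.

import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public final class DiscoveryExample {

    // Fetches the Discovery document from the IdP's well-known endpoint.
    static String fetchDiscoveryDocument(String issuer) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(issuer + "/.well-known/openid-configuration"))
                .GET()
                .build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Discovery request failed: HTTP " + response.statusCode());
        }
        // The JSON body contains values such as issuer, authorization_endpoint,
        // token_endpoint and jwks_uri, which the RP uses for the rest of the flow.
        return response.body();
    }
}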
A chart very similar to the model of OAuth (Figure 2.1) is shown below in Figure 2.2.
Figure 2.2: The authorization code flow in OpenID Connect is quite similar to the autho-
rization flow in OAuth (Figure 2.1) as it is built on top of the authorization flow. The main
difference lies in some different abstractions, otherwise we see the same 10 steps with added
sub-steps. This model is based on the flow described by Navas and Beltrán [62].
Like Figure 2.2 shows, the main steps in the authorization flow are essentially the same
for OpenID Connect as for OAuth 2.0 (Shown in Figure 2.1), with the round-trips redirecting
the end-user. The details that differ most from OAuth have been expanded into sub-steps, which illustrate more of the critical validation events that are included in OIDC. Additionally, the responses contain identity-specific data instead of the generalized terms that are present in OAuth.
Step A consequently consists of three sub-steps:
• (A0) The end-user is the entity that naturally prompts the client to initiate the flow.
• (A1) The client prepares the request. In this step it is critical to include the proper parameters, and mistakes here may compromise the client.
• (A2) When the request is ready, the client redirects the user to the authorization endpoint.
Steps H and I are also expanded to highlight the validation events that are important to
ensure the integrity of the data sent between the entities.
Another perspective of this flow is presented in Figure 2.3, where the order of events flows
downwards in the chart.
Figure 2.3: The authorization code flow in OpenID Connect in a sequence chart. This chart
is based on the flow described by Navas and Beltrán [62].
Token validation
Validation of tokens is one of the critical features in the protocol. During validation there
are several key steps that must be implemented correctly by the client developer. Table 2.2
shows the mandatory parameters in ID tokens. The ID token may contain a number of other
claims that have more identity information.
Parameter   Description
iss         Issuer Identifier for the issuer of the token (the IdP), in the form of a case-sensitive URL using HTTPS.
sub         Subject Identifier, a unique and never reassigned identifier for the end user within the IdP.
aud         Audience the token is issued for. An array of case-sensitive strings that must at least include the client_id of the RP that sent the Authentication Request.
exp         Expiration time for the ID token in Unix Epoch time. The ID token must not be validated after this time.
iat         Issued At Time, when the ID token was issued, in Unix Epoch time. This limits the amount of time nonces need to be stored.
• Token parsing: Received tokens must be parsed into data objects so that they can be processed further. All the required parameters must be present in the response. If any of them are missing, an appropriate error message must be produced. Following the specification, any parameters that are not understood must be ignored.
• Origin verification: The RP receiving a token must validate the iss parameter, which is
the unique identity of the IdP. It must also check that the corresponding shared secrets,
keys, certificates, and other parameters are available and updated. These are needed
to perform further cryptographic verification.
• Audience verification: A token is intended for a single RP. Hence the aud parameter should be checked for the correct value (the client_id value issued by the IdP during registration).
• Freshness validation: Validation of the token’s age to detect expired tokens. Parameters such as exp and iat enable this validation. This is essential to avoid replay attacks.
• Session validation: The RP receiving a token must validate that the received nonce
parameter matches the one that was issued initially.
• Cryptographic validation: This task involves the verification of signatures, and is usually the most time- and resource-consuming task. The cryptographic material (key, cipher, etc.) belonging to the legitimate IdP must be used.
In addition to this, timing attacks may leak potentially useful information to an attacker.
If the code paths taken by successful or unsuccessful validation processes differ greatly, the
attacker may learn much about how the validation is structured. It is suggested to terminate
the processes and send an error message as soon as an error is found. All responses should
take similar amounts of time, whether they are successful or not [62].
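A related, commonly recommended precaution (an addition here, not part of the specification text above) is to compare secret values such as the state and nonce without short-circuiting on the first mismatching character, so that the comparison itself does not leak where the values first differ. In Java this can be sketched with MessageDigest.isEqual, which in current JDKs compares all bytes regardless of where a mismatch occurs:

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

public final class ConstantTimeCompare {

    // Compares two secret strings without revealing, via timing, where they first differ.
    static boolean secretsEqual(String expected, String received) {
        if (expected == null || received == null) {
            return false;
        }
        return MessageDigest.isEqual(
                expected.getBytes(StandardCharsets.UTF_8),
                received.getBytes(StandardCharsets.UTF_8));
    }
}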
Access Tokens may also be validated, but this is considered an optional step in the Authorization Code Flow. However, when using the Implicit Flow, the client must validate the access token [60], since the data has passed untrusted actors over a less secure connection and its cryptographic integrity must be maintained.
Flow-sensitive analysis (Section 2.5.1) reasons about the program with the control-flow graph
(CFG). It is usually accurate, at the expense of also being time consuming.
Path-sensitive analysis considers paths through the program that are valid. Variable values and booleans in conditionals or loops are reasoned about, so that execution branches that are not possible can be pruned. Like flow-sensitivity, path-sensitivity implies accuracy at a computational cost.
Context-sensitive analysis takes into account things like global variables and parameters of a function call, which form what can be considered the context. Context-sensitive analysis is also known as inter-procedural analysis, in contrast to intra-procedural analysis, which uses no context when analyzing a function. Context-sensitivity implies a larger computational cost, but with a significant gain in accuracy compared to intra-procedural analysis.
Pattern matching analysis uses simple linear code scans in a file to power a state machine,
looking for certain patterns of instructions in the code (like the invocation of a certain
type). Heuristics can be used to approximate control-flow of the program. It is very
fast and requires little memory, at the expense of accuracy [36, 44].
form the nodes in a page graph. The edges describe the way control may pass between basic
blocks [64, p. 5].
Figure 2.6 shows how control flow and data flow may pass through an arbitrary CFG. Data may flow independently of the control flow due to global data structures [42]. Both direct data flows, e.g. (C → D) and (D → E), and indirect data flows, like (B → A) and (E → D), may occur.
Figure 2.6 illustrates a more concrete example of the data flow in a CFG of a simple if-then
statement [21, p. 488]:
Figure 2.6: Simple if-then statement with corresponding data flow in CFG
Finite state automata (FSA), or state machines, are seen as flexible tools. An FSA can either be viewed as something that defines a language (i.e. a regular language), or as something defining a class of
graphs. The construction of an FSA contains a finite set which is the alphabet Σ, a finite set of
states Q, the initial state i ∈ Q, the set of final states F ⊆ Q, and the set of edges E . This forms
a 5-tuple 〈Σ,Q, i , F, E 〉. Figure 2.7 below shows two examples of simple finite state automata
representing all the multiples of two and three as binary numbers [78, pp. 3–5].
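As a small illustration of the 5-tuple, the sketch below implements one of the automata described for Figure 2.7 (binary multiples of three) directly in Java. The alphabet is Σ = {0, 1}, the states Q = {0, 1, 2} are the possible remainders modulo three, the initial state i = 0, and F = {0} is the set of final states.

public final class MultipleOfThreeDfa {

    // Returns true if the binary string is accepted, i.e. represents a multiple of three.
    static boolean accepts(String binary) {
        int state = 0;                          // initial state i
        for (char symbol : binary.toCharArray()) {
            int bit = symbol - '0';             // input symbol from the alphabet {0, 1}
            state = (state * 2 + bit) % 3;      // edge relation: appending a bit doubles the value and adds the bit
        }
        return state == 0;                      // accept if the run ends in a final state
    }

    public static void main(String[] args) {
        System.out.println(accepts("110"));     // 6 is a multiple of three -> true
        System.out.println(accepts("101"));     // 5 is not -> false
    }
}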
kind of questions are interesting, and what kind of results can answer these questions, and
what kind of evidence demonstrates the validity of the results?
The first part is choice of questions. We can choose from pragmatic questions like method
of development on the form “How can we (better) create X” to analysis methods or design of
particular instances. In the other end of the scale we have generalization, characterization or
feasibility questions, more on the form “What is a good model for X” or “Does X even exist?”.
The more common kind of questions in international conferences tend to be of an improved
method or means of developing software, and analysis methods or testing and verification
questions.
The second component in this is the research result. The various kinds of results range
from procedure or technique, to qualitative, empirical or analytical models, tools (formal
language to support a model), specific solutions, judgments or reports of interesting obser-
vations. On one side with the models, we often look at formal results like taxonomies or
data-driven models. The work is highly constrained and often based on long and rigorous
data collection. At the other end, with specific solutions, judgments or reports, we are looking at pragmatic software engineering solutions applied to problems, or careful analyses of
a system. The nature of the results in combination with the design of their evaluation tells a
lot about the validity of the research results.
A typical tendency has been that too many computer science papers contained no ex-
perimental or only informal validation of their contributions. The various choices for vali-
dation strategies differ in the value they contribute, ranging from a blatant assertion, which
is no serious evaluation of the results, to analysis, which is a thorough and time-demanding task. The choice of this strategy will impact the strength of evidence that the results of the research are in fact sound. The two most commonly accepted methods are experience in actual use and systematic analysis. However, well-chosen slice-of-life examples rooted in reality are more convincing than idealized dummy examples, and are reported as a common method to use. Oates [65, pp. 115–118] also argues for the importance of real-life validation to get convincing results.
• True positives (TP) are cases where the analysis reports a vulnerability, and this vulnerability exists.
• False positives (FP) are cases where the analysis reports a vulnerability, but no such vulnerability exists.
• True negatives (TN) are cases where there is no vulnerability, and the analysis does not report any vulnerability.
• False negatives (FN) are cases where existing vulnerabilities do not get reported.
Table 2.3: Confusion matrix illustrating cases of true and false positives, and true and false
negatives.
Soundness and completeness are properties that are normally used to characterize analysis tools. Metrics for the properties come from counting false positives and false negatives. Soundness and completeness have various definitions, but a commonly used definition is as follows [26]:
The soundness of a program analyzer denotes whether it reports all the issues in the code. A sound analyzer may have false positives, but reports all existing issues (meaning no false negatives).
The completeness of a program analyzer denotes whether it only reports true issues. A complete analyzer may have false negatives, but all its reports are true (meaning no false positives).
It is however pointed out by Meyer [56] that soundness and completeness are boolean properties, in the sense that either a tool is sound or it is not. In the assessment of tools it is more interesting to look at the degree to which a tool achieves one of these properties. To get a more granular evaluation, the properties precision and recall are often used instead to define the degree of completeness and soundness.
• The recall of a program analyzer is the percentage of the existing vulnerabilities that are detected; in other words, how sound the analysis is.

  Recall = TP / (TP + FN)
• The precision of a program analyzer is the percentage of its reports that are true cases (in other words, the fraction of reported cases that are true positives). We can say that the precision denotes the degree of how complete the analysis is.

  Precision = TP / (TP + FP)
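As a worked example with purely hypothetical numbers (not the counts from this thesis’s experiment): if an analyzer produces 12 warnings of which 9 are real vulnerabilities (TP = 9, FP = 3) and misses 1 existing vulnerability (FN = 1), then Recall = 9 / (9 + 1) = 0.90 and Precision = 9 / (9 + 3) = 0.75.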
A factor for both the precision and recall is that they use data from both columns of the
confusion matrix. This makes them sensitive to changes in data distributions, and may give
a skewed perspective on imbalanced data [90]. If two data sets have different numbers of
positives, the data distribution quickly changes. There exist metrics that can account for imbalance in the data.
To deal with this, the true negative rate (or specificity) is an interesting metric, which de-
notes a classification model’s ability to correctly predict true negatives. This gives another
perspective on how it resists false positives, without having to have a large volume of posi-
tives (which is required to get a confident precision) 1 . Both the identifiers in the metric lie in
the same column of the confusion matrix, so it is not sensitive to imbalanced data that easily
occur in small data sets [90]. This is because the changed values cancel each other out.
True negative rate = True negatives / (False positives + True negatives)

False positive rate = False positives / (False positives + True negatives)
There is a clear trade-off when designing a program analysis tool for soundness or com-
pleteness, as aiming for one of them will limit the capability in the other in a real-world do-
main, and no program analysis tool fulfills both criteria. Designers of program analysis tools
must reason about whether they want to sacrifice completeness (precision) or soundness
(recall) [30].
tomation level. Such metrics are more convincingly computed in a deliberate test-bed, be-
cause it is hard to know about all the vulnerabilities in a real-life code base. On the other
side, artificial test-beds risk being too far from real code bases in terms of complexity and the degree to which they cover realistic cases.
Three ideal test case characteristics are realism, statistical significance and ground truth.
Natural code bases offer realism, and may also provide statistical significance if in large
enough volume. However they lack ground-truth (we don’t have knowledge of all their vul-
nerabilities).
Delaitre et al. [26] found the following metric applicability for real-life natural test beds
versus artificial test-beds:
• For natural test-beds, precision is applicable, while it is hard to get convincing recall
rates because of the ground truth problem.
• For artificial test-beds, both precision and recall are applicable metrics.
Figure 2.8: FindSecBugs is integrated into the SpotBugs framework, utilizing its core detec-
tors [72].
• Bug: The definition of a sensitive point or a vulnerability in the application. The definition of a bug exists through its presence in the project configuration files [Link] (which maps detectors to bug patterns) and [Link] (which contains descriptions of the bugs and suggestions for fixes). When a bug is defined in these files, it can be reported by detectors.
• Detector: A class containing the logic to find a bug type or a set of bug types. In other
static analysis tools it is also common to refer to these as “rules”.
The building blocks for FindSecBugs lie in the core framework, SpotBugs, which in turn is built using Java Virtual Machine (JVM) bytecode abstractions from the Apache Commons Bytecode Engineering Library (BCEL) 3. Compiled Java code, which is interpreted by the Java Virtual Machine, is located in .class files. FindBugs is designed to analyze these files.
Table 2.4: Overview of the layers of the various detector types in the SpotBugs framework as
shown in the FindBugs paper by Hovemeyer and Pugh [44].
For relevance, however, these layers are mainly divided into two rough categories: visitor-based detectors (layers 1 and 2) and CFG-based detectors (layers 3 and 4), as elaborated in an architecture document 4 written at the time of version 0.94 of FindBugs by the project founder, David Hovemeyer [23]. The visitor-based detectors are usually based on peephole techniques, and are very computationally lightweight. The CFG-based detectors doing control-flow and dataflow analysis are often heavier to run, as they require more memory for graph-based operations. Therefore, if a peephole check is sufficient to quite confidently classify a bug, usage of CFG-based analyses should be considered carefully. Despite this distinction, the SpotBugs framework lays no real constraint on the way a detector is implemented, and “any” analysis technique may be incorporated into a detector. At the end of the day, however, a bug detector has a very straightforward task: look at a compiled Java class file, find potential bugs, and report them by creating a BugInstance object and reporting it via the BugReporter.
Visitor-based detectors
These visitor-based detectors often extend the class OpcodeStackDetector, which ultimately
is a subclass of DismantleBytecode. The basic behavior of these detector types is a top-down traversal of the class file’s features, decoding the symbolic information. When the detector encounters a feature like a field, an instruction or a method, a callback method is invoked by the superclass.

4 Because the document is dated on certain points, small modifications and corrections have been made, based on code investigation of the current GitHub repository of SpotBugs [84].
Visitor-based detectors can inspect the class file for suspicious features by overriding these callback methods. An important idiom in visitor-based detectors is a state machine (see Section 2.5.3) recognizer that works over the sequence of instructions. The method sawOpcode() is the callback method handling individual instructions. Every invocation of this method is a single input symbol to the state machine. The state machine is practically a finite state automaton accepting a regular language. This language is a pattern that indicates trouble if it appears in the bytecode for a method. This approach is quite simple in its handling of control flow, but turns out to be significantly faster than the CFG-based analyses.
The role of the OpcodeStack class, which is present in the subclass OpcodeStackDetector, is to maintain information about the operand stack as the instructions in a method are visited, although still in a rather unsophisticated manner. While SpotBugs does not use context-sensitive, inter-procedural analysis, some detectors reason about global information like field accesses throughout the application or sub-type relationships [8].
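To make the visitor idiom concrete, the following is a minimal sketch of such a detector. It is not one of the detectors developed in this thesis: the bug type HYPOTHETICAL_SENSITIVE_CALL and the target class and method are made up for illustration, and exact class and constant names vary slightly between SpotBugs versions. A real detector would also have to be registered in the plugin configuration files, as in the registration step described later in this section.

import edu.umd.cs.findbugs.BugInstance;
import edu.umd.cs.findbugs.BugReporter;
import edu.umd.cs.findbugs.Priorities;
import edu.umd.cs.findbugs.bcel.OpcodeStackDetector;
import org.apache.bcel.Const;

public class SensitiveCallDetector extends OpcodeStackDetector {

    private final BugReporter bugReporter;

    public SensitiveCallDetector(BugReporter bugReporter) {
        this.bugReporter = bugReporter;
    }

    // Each call to sawOpcode() is one input symbol to the (here trivial) state machine:
    // it "accepts" whenever it sees an invocation of the hypothetical SomeVerifier.verify().
    @Override
    public void sawOpcode(int seen) {
        if (seen == Const.INVOKEVIRTUAL
                && "com/example/SomeVerifier".equals(getClassConstantOperand())
                && "verify".equals(getNameConstantOperand())) {
            bugReporter.reportBug(new BugInstance(this, "HYPOTHETICAL_SENSITIVE_CALL",
                    Priorities.NORMAL_PRIORITY)
                    .addClass(this)
                    .addMethod(this)
                    .addSourceLine(this));
        }
    }
}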
CFG-based detectors
Detectors employing the second layer with linear code scans are widely used both in the
SpotBugs and the extended FindSecBugs framework. Some of the more advanced analyses
developed in the SpotBugs core that do control- and data-flow analysis can be utilized by
FindSecBugs detectors. In this analysis, a CFG representation is built from Java methods, and
the detectors usually implement the Detector interface directly instead of inheriting from a
visitor superclass.
The fundamental behavior of the analysis is to sequentially visit each method of an ana-
lyzed class, requesting a set of analysis objects, which are end products of a certain analysis.
An analysis object records certain and probable facts about the method based on for instance
a dataflow analysis. After collecting these analyses, the detector iterates through each loca-
tion in the control flow graph. Here a location is the point in execution immediately before a certain instruction is executed.
The dataflow facts are checked at every location for suspicious heuristics. For instance, the ResourceTrackingDetector class 5 is an abstract analysis class designed to find methods in which a resource of some kind is not properly cleaned up or closed. In this analysis, the instructions creating an object are expected not to have any path through the CFG that does not lead to a close. Such a path will be considered an “open” path, and the method will be reported as a bug. One case for which this class can be extended is detecting database connections that have not been properly closed in a finally block, where an exceptional control flow may lead to an unclosed connection.
5 ResourceTrackingDetector in the SpotBugs project: [Link]master/spotbugs/src/main/java/edu/umd/cs/findbugs/[Link]
1. A vulnerable test code sample is added to illustrate the bug. As it is only intended to
trigger the rule defined in the detector, it does not have to be a working application.
2. Then a test case is written, asserting that the given bug pattern was reported in the
expected code location by the detector.
3. The new detector is configured by adding a detector and bug pattern to [Link].
There are several detector classes that can be extended depending on the characteristics
of the bug pattern that is searched for. The main types are:
Detector which is the basic detector that analyzes the complete class context of a Java class.
The simplest detectors use methods that are relatively easy to understand. Listing 2.5
shows an example of an XML related vulnerability, and Listing 2.6 has parts of its compiled
bytecode. The bug is reported by the detector shown in Listing 2.7, which on line 10 specif-
ically looks for the invokevirtual of the constructor of the XMLDecoder (line 5 in List-
ing 2.6).
public class XmlDecodeUtil {
    public static Object handleXml(InputStream in) {
        XMLDecoder d = new XMLDecoder(in);
        try {
            return d.readObject(); // Deserialization happens here
        } finally {
            d.close();
        }
    }
}
Listing 2.5: Vulnerable test code sample for usage of XML deserialization.
Listing 2.6: Bytecode of the compiled vulnerable test code sample for usage of XML deserialization.
Find Security Bugs has also introduced a taint analysis component, which may be used
by several detectors to track data between tainted sources and sinks. Several detectors in
Find Security Bugs use resource files to list their vulnerable sources and sinks, which form the inputs defining the identifiers they want to check. Detectors read and use these lists when scanning the code. This enables the community to easily update the detectors without changing their logic, only their inputs. This may become useful when new vulnerabilities are discovered by the security
community. All that is needed for detecting the new vulnerability may be to add a single line
with a new identifier to the resource file. Such identifiers may also be used by other detectors
than the ones directly applying taint analysis.
FindSecBugs has inherited the usability of FindBugs, and offers integration in IDEs and other development phases, like continuous integration steps. Users of the plugin may suppress false positives and target their analysis towards certain packages or classes. When going through warnings, fixes are likely to be quick and easy, requiring inspection of only a few lines of code [8, 74].
Chapter 3
Related Work
This chapter goes through the work related to the two research questions of this thesis. Section 3.1 goes through formal security analyses and models of the protocol, which relate to RQ1. Section 3.2 summarizes automated vulnerability detection and protection tools for the protocol, which are relevant to RQ2.
Related research was obtained with informal searches for relevant keywords like OpenID
Connect, vulnerabilities, detect, static/program analysis, in Oria (The digital library at NTNU),
Google Scholar, the ACM digital library, the IEEE digital library and the relevant paper index
in the Mendeley reference manager. The references and forward citing indexes of some of
the papers were briefly scanned for more inclusions.
The previous research on OAuth and OpenID Connect is often focused on formal security analysis and threat modeling. The research works can be divided into two factions: one uses formal security analysis based on threat models of the specification, or manual analysis of implementations, to reason about security vulnerabilities in the specification. The other faction uses automated penetration testing or program analysis tools to find vulnerabilities in implemented OpenID Connect systems.
3.1 Security analysis of OAuth and OpenID Connect Specification
This section highlights related work which relates to RQ1: What must a developer do to avoid
introducing known security vulnerabilities, while implementing a Relying Party with an OpenID
Connect SDK?
The related formal security analyses generally use two different approaches. Some of them look at the specification itself, inferring vulnerabilities inherent to how the protocol standard is defined [33, 34, 62, 89]. Others formally analyze implementations of the protocol, and use results from experiments to find possible vulnerabilities [2, 49, 50, 86].
Sun and Beznosov (2012) [86] examined implementations of much-used OAuth IdPs, covering security issues related to both the protocol specification and its implementation on the Web.
Web. They focused on how the security parameters like state values and tokens play roles in
the security of the protocol, and how these values can be exposed. One of the threats they
highlighted is usage of the bearer token, since this does not offer data origin validation in
itself. They suggest that signature-based access tokens should be used instead. This problem
is however solved by OpenID Connect, using ID tokens. Additionally such signature-based
tokens are described in OpenID Connect as an optional part of the standard [60], for when
using the implicit flow.
Table 3.1: Overview of threats on OIDC Relying Parties [33, 34, 50, 62].
3.2 Automated Detection of Vulnerabilities
This section goes through research which is relevant for RQ2: How can simple, explicit and
intraprocedural static analysis checks be used to identify vulnerabilities in OpenID Connect
Relying Parties?
Previous research has produced both dynamic and static analysis solutions for security verification of OpenID Connect and OAuth, with various focus areas and abstractions.
Wang et al. [95] performed an analysis of three authorization and authentication SDKs in 2013, which at the time were used by 52% of the most popular Windows App Store apps. They used a semantic modeling-based approach with knowledge bases to explicate the SDKs. The approach generates formalized assertions that are checked against semantic model properties, and detects security violations by testing proofs with a satisfiability modulo theories (SMT) solver. Using a symbolic execution framework for validation of the models, analyses took between 11 and 25 hours to check the three SDKs for vulnerabilities.
In 2014, Zhou and Evans [99] introduced SSOScan, a black-box penetration testing tool for Relying Parties, i.e. applications using SSO. They conducted a large-scale study, which is limited to RPs using Facebook's implementation of OAuth. They detect four vulnerabilities: access token misuse, app secret leak, user OAuth credentials leak and signed request misuse. The former two are related to confusion regarding authorization mechanisms, while the latter two are based on failures to keep secrets confidential. SSOScan simulates a series of attacks and observes the responses that come over the network. The tool has a regex-based automated button finder for the forms on the sites that it analyzes. The tool is limited to faking user interactions and, as a black-box tool, to vulnerabilities that can be detected by analyzing web traffic patterns.
Yang et al. (2016) [97] designed and implemented a model-based tool called OAuthTester. They examined four major identity providers and 500 websites implementing OAuth 2.0. In their design they use a finite state machine to model the protocol flow, and use fuzzing (see Chapter 2.5.4) techniques to query the RP and the IdP. They mainly found vulnerabilities related to improper management of the state parameter.
Mainka and Wich [54] proposed in 2017 an Evaluation-as-a-Service tool they call PrOfESSOS, which allows a tester to perform black-box penetration testing at run-time, simulating honest and dishonest IdPs. They categorize two main classes of threats: Single-Phase Attacks (exploiting a single security check) and Cross-Phase Attacks (complex attack setups manipulating several messages in the data flow). These classes encapsulate most of the various threats summarized in Table 3.1. HTTP requests are manipulated by the tool's IdP, and RP reactions to different malicious requests are analyzed. The detection criterion for a vulnerability is successful, maliciously obtained access to credentials. The analysis requires manual configuration to increase soundness.
Yang et al. (2018) [96] designed an automated testing tool, S3KVetter, for verifying logical correctness and identifying vulnerabilities in SDKs implementing OpenID Connect or OAuth. Their focus is on SDKs that are used for implementing a client application, as a more specific continuation of their previous work [97]. Their approach is based on theorem provers, after the program's code is translated to appropriate logic predicates. The code is, by dynamic symbolic execution, extracted to a symbolic predicate tree, in which all the program's execution paths form branches in the tree, with the leaf nodes containing the end result of a given path. The paths are explored with a scheduling algorithm that simulates various program executions with data inputs. As their approach is focused on attacker-oriented steps of the protocol flow, their analysis cannot reach different paths than an attacker might, and their knowledge of the program internals is limited. Their notion of an attacker is a malicious "user" doing man-in-the-middle attacks. Hence their approach also assumes that the IdP is trustworthy. The tool requires some manual setup of a sample app, and the user must mark which functions may be reached by an attacker (functions handling user input).
Calzavara et al. (2018)[11] made a browser-side security monitor called WPSE, and a thor-
ough security analysis of Web protocols, including OAuth 2.0. Their tool is designed to ensure
compliance with the intended protocol flow, and integrity and confidentiality of messages.
In an experiment on 90 websites, they uncovered that over 61% had security flaws. This
browser extension must presumably be installed by the website’s users.
Li, Mitchell and Chen (2019) [51] proposed at roughly the same time a security scanner and protector, OAuthGuard, which similarly to WPSE provides protection for OIDC and OAuth 2.0 as a browser extension. They performed an experiment on the top 1000 RPs using the Google single sign-on service as IdP. Like other dynamic analyses, this tool acts as a proxy, and detects vulnerabilities by scanning HTTP messages. It may block HTTP requests if the request indicates unsafe token transfer (checking TLS usage), privacy leaks, impersonation attacks or CSRF attacks.
Also in 2019, Rahat et al. [76] introduced OAuthLint, a tool using query-based static analysis to find vulnerabilities in Android apps that implement the protocol using OAuth APIs. They base their analysis on a model with anti-protocols, which denote vulnerabilities in the protocol. They analyze relying parties for vulnerabilities such as:
• failed validation of API calls from client Android devices, which should not be trusted.
Their analysis computes a control-flow graph which they query with formal logic predicates.
They evaluated their analysis on around 600 popular Android apps, and found that 32% of
the analyzed apps had at least one of the five vulnerabilities they looked for. Their analysis
achieved a high precision of 90%.
Table 3.2: Overview of related works doing automated vulnerability detection of OpenID
Connect and OAuth 2.0.
• Consultants are split nearly 50/50 between preferring that the static analysis tool is fully automated (and therefore less precise but easier to use) and preferring that it requires some annotations but is more powerful.
• If the tool seamlessly integrates into their workflow, software consultants are more likely to use it.
• The respondents answered that they generally do not think precision should be lower than 90%.
• However, lower precision is acceptable if security code is analyzed. We found in our study that the consultants are much more inclined towards higher recall (or soundness) than high precision if the tool looks for security-critical vulnerabilities like access control vulnerabilities.
Chapter 4
Research Design
RQ1 What must a developer do to avoid introducing known security vulnerabilities, while
implementing a Relying Party with an OpenID Connect SDK?
RQ2 How can simple, explicit and intraprocedural static analysis checks be used to identify
vulnerabilities in OpenID Connect Relying Parties?
4.3.1 RQ1
In this thesis, RQ1 falls into the Characterization category, meaning that the desired results are a form of document, list or model. The results from RQ1 are a qualitative model in the form of a well-grounded checklist and informal generalizations. The results from RQ1 are considered input to the work on RQ2, and the validation of these results is therefore done indirectly through the validation of the results emerging from RQ2.
4.3.2 RQ2
RQ2 is in the Design of a particular instance category of research questions. It is not seeking a formal model or any general framework, but rather a pragmatic and concrete solution to a concrete problem. The validation strategy chosen in this thesis can be characterized under the Example-based category, which implies some threats to the validity of the results (discussed further in Chapter 8.3). While this is not considered as strong as the more ideal choices of Analysis-based or Experience-based validation strategies, well-chosen slice-of-life examples can be considered somewhat successful and are fairly common in Software Engineering research [82].
It was considered too time-consuming for the constraints of this work to design a statistically significant empirical validation of the results in this thesis, due to the work load required to obtain a sufficient volume of code bases or a corpus, as well as a statistically rigorous design for the experiment. Therefore a simpler alternative with slices of life was considered an acceptable plan B. As such, this research is mainly answered through what Oates [65, pp.133–134] defines as field experiments on real-life code examples. The example-based strategy can, according to Oates, also be considered a proof-by-demonstration.
Based on the OpenID Connect Specification as well as security analyses of the protocol from research (see Chapter 3), code examples were constructed with the intention of having as much realism as possible, without adding boilerplate code that can be considered out of scope for this thesis. These code examples form the fundamental building blocks of the pragmatic model for development, intended as an informational sweet spot between the content-rich protocol and the quite limited developer guides.
Chapter 5
RQ1 Results: Developer-Oriented Model of Secure OIDC Practice
This chapter answers RQ1: What must a developer do to avoid introducing known security vulnerabilities, while implementing a Relying Party with an OpenID Connect SDK?
The foundation of the developer-oriented model worked out in this chapter is restricted
to the Authorization Code Flow of the protocol, with basis in two SDK implementation guides.
The steps covered are metadata discovery, authorization code request, token request and to-
ken validation, which are shown as steps 0-3 in Figure 5.1.
These three steps (plus the preparatory step) are inferred from a sensible division of the code into designated methods, one per step. After step 3, the protocol would naturally continue with a query to the UserInfo endpoint; this is however left out of scope. Code examples with non-compiling Java pseudocode for these steps are shown in the listings below.
An example of the code in step 0, which is the Discovery process, is shown in Listing 5.1.
This is a preparatory step in the model. Listing 5.2 has a code example of step 1, building
the authentication request. The results from the pre-step 0 in the discovery protocol are
obtained on line 4, where the RP has sent a request to the discovery URI of the IdP, and re-
ceived the Provider Metadata Document in return to establish trust. The important things to
remember here are adding the state and nonce parameters. The rest of the parameters are
essentially required to even send a request.
Step 2, receiving the callback response from the IdP, is shown in Listing 5.3. Here, handling an eventual error response and validating the state parameter are the security-critical steps.
Step 3 shows the most significant difference between the two SDKs analyzed in this thesis, and is shown in two different ways, using the Google library (Listing 5.4) and the Nimbus SDK (Listing 5.5). The main difference is that the Google library in Listing 5.4 does not encapsulate a complete validation, and the developer must therefore handle the details of the conditional checks.
Nimbus in Listing 5.5, on the other hand, requires the developer to set up an IDTokenValidator object with some required parameters, and it will handle the individual checks and throw appropriate exceptions if something is unexpected. The developer still has to pass the correct values, however, in particular the correct nonce value that they have stored.
In Listing 5.1, the Discovery process is implemented with the Google library. In lines 1-7, the RP builds the URL for the openid-configuration endpoint. Lines 10-16 contain checks to ensure that the connection is using TLS, and that the response has a valid HTTP response code. Then the JSON document is retrieved in lines 17-21, and finally parsed. If the parsing fails, an appropriate exception is thrown.
1 // Step 0
2 private Map<String, Object> discovery() {
3     try {
4         URI issuerURI = new URI("https://provider.example.com/");
5         URL idpConfURL = issuerURI
6             .resolve("/.well-known/openid-configuration?")
7             .toURL();
8         HttpsURLConnection conn = idpConfURL.openConnection();
9         conn.setRequestMethod("GET");
10         if (!conn.getURL().getProtocol().equals("https")) {
11             throw Exception ... "Discovery url not using https"
12         }
13         if (conn.getResponseCode() != HttpsURLConnection.HTTP_OK) {
14             throw Exception ... "Failed to respond with HTTP OK."
15         }
16         InputStream stream = conn.getInputStream();
17         String providerInfo = "";
18         try (java.util.Scanner s = new java.util.Scanner(stream)) {
19             providerInfo = s.useDelimiter("\\A").next();
20         }
21         return parseJson(providerInfo);
Listing 5.1: Step 0 - The Discovery process using the Google library
Listing 5.2 shows a simplified code example of step 1, the authentication request. First, in line 4, the provider metadata document is obtained from the discovery process in step 0. The state and nonce values are generated as opaque randomized strings in lines 5-6 and stored in an object called OidcConfig. Then an AuthorizationCodeFlow object is built in lines 8-11, storing values obtained from the IdP. The client id and client secret have previously been obtained when registering the client at the IdP. In lines 12-19, the authentication request URL is built, and the state and nonce parameters are added to the request. Finally, in lines 20-22, the requesting user agent is redirected to the authorization endpoint at the IdP.
1 // Step 1
2 public Response authenticationRequest(HttpServletRequest request) {
3     try {
4         providerMetadata = discovery(); // Step 0
5         String state = state(); // random string
6         String nonce = nonce(); // random string
7         // ... Store state and nonce in OidcConfig
8         codeFlow = new AuthorizationCodeFlow.Builder(...,
9                 config.getProperty("clientSecret"),
10                 config.getProperty("clientId"),
11                 providerMetadata.get("authorization_endpoint")).build();
12         requestUrl = codeFlow
13             .newAuthorizationUrl()
14             .setResponseTypes(Collections.singleton("code"))
15             .setScopes(scopes)
16             .setRedirectUri(callbackURI)
17             .setState(state)
18             .set("nonce", nonce)
19             .set(..., ...);
20         return Response
21             .seeOther(requestUrl.toURI())
22             .build();
23     } catch (... Exception e) {
24         return Response ... UNAUTHORIZED ...;
25     }
26 }
Listing 5.2: Step 1 - Building the authentication request using the Google library.
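As a minimal sketch of how the opaque random state and nonce values above could be generated, the standard library's SecureRandom can be used. The helper class and method name below are illustrative and not part of either SDK:

import java.security.SecureRandom;
import java.util.Base64;

// Hypothetical helper for generating opaque, unguessable state and nonce values.
public final class OidcRandomValues {

    private static final SecureRandom RANDOM = new SecureRandom();

    // Returns a URL-safe random string with 256 bits of entropy.
    public static String randomToken() {
        byte[] bytes = new byte[32];
        RANDOM.nextBytes(bytes);
        return Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
    }
}

A call like OidcRandomValues.randomToken() could then back both the state and the nonce generation used in the listing above.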
Listing 5.3 contains a simplified code example of step 2, where the response from the IdP is received as a callback request. The OidcConfig for the given flow is retrieved with a unique UUID in line 5. Then the callback response URL is parsed. In lines 7-10, the callback response is checked for an error, and the flow is broken if it does have an error. Then comes an important check in line 11, where the state parameter in the callback request is compared to the stored value; if they do not match, an appropriate HTTP error code is returned. After this validation, a token request is built in lines 14-19, adding the authorization code received in the callback request as well as other required parameters. Then the request is executed on a back-channel connection with the IdP in line 20. It returns a token response, and will throw an exception if this is not successful. Finally, the token response and the OidcConfig containing the nonce parameter are passed on to step 3.
1 // Step 2
2 public Response callback(HttpServletRequest req) {
3     try {
4         UUID uuid = UUID.fromString(... req.get(uuid));
5         OidcConfig oidcConfig = (OidcConfig) cache.get(uuid);
6         ...ResponseUrl responseUrl = new ...Url(req.getRequestURI());
7         String error = responseUrl.getError();
8         if (error != null) {
9             return Response ... UNAUTHORIZED ...;
10         }
11         if (!oidcConfig.state.equals(responseUrl.getState())) {
12             return Response ... UNAUTHORIZED ...;
13         }
14         String authorizationCode = responseUrl.getCode();
15         TokenRequest tokenRequest = codeFlow
16             .newTokenRequest(authorizationCode)
17             .setTokenServerUrl(codeFlow.getTokenServerEncodedUrl())
18             .setClientAuth...(codeFlow.getClientAuthentication())
19             .setRedirectUri(redirectUri);
20         idTokenResponse = IdTokenResponse.execute(tokenRequest);
21         return validateTokens(idTokenResponse, oidcConfig);
22     } catch (... Exception e) {
23         return Response ... BAD REQUEST ...;
24     }
25 }
Listing 5.3: Step 2 - Receiving the callback response and building the token request using the Google library.
Step 3 using the Google library is shown in Listing 5.4. Here all the required ID token checks (described in Chapter 2.4.1) are implemented, using the various verify methods of the IdToken wrapper class in the Google library. The ID token is parsed in line 4. The checks in lines 5-24 are similar, each retrieving the appropriate stored value and returning an error response with the HTTP code 401 UNAUTHORIZED if the check fails. If none of the checks fail, the token response is stored in line 25, and a success response is returned with the token as payload in line 26.
1 // Step 3
2 public Response googleValidateTokens(... tokenResponse, ... oidcConfig) {
3     try {
4         IdToken idToken = tokenResponse.parseIdToken();
5         if (!oidcConfig.nonce.equals(idToken...getNonce())) {
6             return Response ... UNAUTHORIZED ...
7                 "Provided nonce did not match";
8         }
9         if (!idToken.verifySignature(publicKeyFromJwkSet())) {
10             return Response.status(Response.Status.UNAUTHORIZED)
11                 "Jwt signature is not valid";
12         }
13         if (!idToken.verifyAudience(clientId)) {
Listing 5.4: Step 3 - Correct token validation using the Google library.
In Listing 5.5, the ID token verification is written using the Nimbus SDK. This code example is very different from the one for the Google library. In lines 5-9, the IDTokenValidator is instantiated with the required values, including the expected JWS algorithm and other data from the Discovery document. Then, in lines 15-16, the stored nonce parameter is retrieved and the ID token is obtained from the token response. By calling idTokenValidator.validate(idToken, expectedNonce) in line 18, the required checks are done by the validator, which throws a BadJOSEException if any of the checks fail, and a JOSEException if an error happens during the validation. If the checks did not fail, a success response with the token response as payload is returned in line 24. It is appropriate here to use the IDTokenValidator object provided by the SDK, since it does all the required checks if it receives the correct values from the developer.
1 // Step 3
2 public Response nimbusValidateTokens(... tokenResponse, ... oidcConfig) {
3     JWSAlgorithm metadataAlg = JWSAlgorithm.RS256;
4     try {
5         idTokenValidator = new IDTokenValidator(
6             providerMetadata.getIssuer(),
7             clientID,
8             metadataAlg,
9             providerMetadata.getJWKSetURI().toURL());
10         // The JWK set URI gives the keys from the IdP
11     } catch (MalformedURLException e) {
12         return Response ... INTERNAL_SERVER_ERROR ...
13             "The provider metadata jwkSetUri is invalid";
14     }
15     Nonce expectedNonce = oidcConfig.nonce;
16     JWT idToken = tokenResponse.getOIDCTokens().getIDToken();
17     try {
18         idTokenValidator.validate(idToken, expectedNonce);
19     } catch (BadJOSEException e) {
20         return Response ... UNAUTHORIZED ... "Invalid ID token";
21     } catch (JOSEException e) {
22         return Response ... BAD_REQUEST ... "Error validating ID token.";
23     }
24     return Response ... 200 OK ...
25         tokenResponse.toJSONObject();
26
27 }
Listing 5.5: Step 3 - Correct token validation using the Nimbus SDK.
Rules for Secure Development of an OIDC RP
Rules for correct development can be worked out based on the OpenID Connect specification [60], the developer SDK guides by Nimbus and Google API Client [37], and analysis of their Javadocs and open-source code bases. These rules form a step-wise checklist rooted in the model in Section 5.1.
• Generate the state and nonce parameters as opaque random strings. The nonce value is passed through from the authorization request to the ID token, and must be unmodified.
• Store state and nonce safely, for example in a cookie or the HTTP session, or by retrieving them from a cache using a GUID associated with the request agent (a minimal sketch of session-based storage is shown after Table 5.1). Details about how to do this step are not in the scope of this work.
• Make an authentication request URI
– Passing at least the parameters:
* client_id
* response_type
* scope
* redirect_uri
* state
* nonce
– Using the Nimbus SDK
* Build an instance of AuthenticationRequest with the required parameters.
Use the Builder to add parameters like the login_hint (optional but recom-
mended)
– Using the Google SDK
* Make an AuthorizationCodeFlow instance, adding client identifiers and urls
for endpoints. Use [Link]()
* The authentication request is made with a builder pattern, in contrast to the strict type parameters in Nimbus. It is a lot easier to forget state and nonce, as you have to add them manually. In Nimbus you explicitly have to pass null as the state and nonce parameters to even run a request.
• Redirect the user agent with the authentication request URI to the authorization end-
point. Now the end user will log in on the IdP’s side.
• Build the token request for the token endpoint, passing the authorization code along with at least:
– client_secret,
– grant_type: “authorization_code”,
– and redirect_uri
• Do proper error handling of the token response: if you get an error response, the control flow must be broken; return HTTP code 401 UNAUTHORIZED.
• With successful token response, parse the token response. Pass the ID Token on to
step 3 for validation. The SDKs have implemented the parse function, use for example
[Link]().
* Therefore the developer does not have to perform any additional checks in order to follow the specification. However, the above checks must be performed if you choose to implement the validation manually.
• Using the Google SDK:
– Cryptographic signature validation and nonce validation must be done manually
by the developer.
– The IdTokenVerifier can be set up with parameters issuer and client_id.
– The internal IdTokenVerifier performs the following checks [40]:
* Checks if the iss and aud parameters match the expected IdP and client_id.
* Checks if the time is within the acceptable validity window (exp and iat parameters) of the ID token, with time-skew leeway.
– Even when using the IdTokenVerifier, the developer must additionally verify:
* That the JWS algorithm matches the expected one retrieved from the discovery document.
* That the ID token signature is valid, using the key from the discovery document.
* That the nonce value matches the saved (expected) one.
– Therefore, for code clarity, the developer should probably just validate everything that is recommended until the SDK implements all checks in a future release. The IdToken class 3 has designated verify methods for most of these:
* That the ID token JWS algorithm matches the expected algorithm.
* The ID token signature or HMAC, using the provided key material from the client secret or the JWK set URL in the discovery document.
* That the ID token iss and audience aud parameters match the expected IdP and client_id.
* That the ID token is within the specified validity window (between iat and exp time, given a 1 minute leeway to accommodate clock skew).
* That the nonce value matches the saved (expected) one.
• Optional: validate the access tokens. After the ID token is validated, the access token may be used to obtain user info from the UserInfo endpoint.
• Usage of the Resource Owner Password Grant. In a later update of the OAuth stan-
dard [41], this grant is no longer considered acceptable.
3 Google SDK: [Link] [Link]master/google-oauth-client/src/main/java/com/google/api/client/auth/openidconnect/[Link]
• Usage of a known limited ID Token “validator” provided by an SDK, like the IdToken-
Verifier [40] of the Google library. This may trick the developer into thinking that all
needed checks are done.
Table 5.1: Potential vulnerabilities as various errors that can occur by breaking the rules in
the model.
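As referenced in the checklist above, a minimal sketch of storing the generated state and nonce values in the servlet HTTP session is shown below. The class and attribute names are illustrative; a cache keyed by a GUID works in a similar way:

import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpSession;

// Hypothetical helper for keeping the state and nonce between step 1 and step 2.
public class OidcStateStore {

    public void store(HttpServletRequest request, String state, String nonce) {
        HttpSession session = request.getSession(true);
        session.setAttribute("oidc_state", state);
        session.setAttribute("oidc_nonce", nonce);
    }

    public String storedState(HttpServletRequest request) {
        return (String) request.getSession(true).getAttribute("oidc_state");
    }

    public String storedNonce(HttpServletRequest request) {
        return (String) request.getSession(true).getAttribute("oidc_nonce");
    }
}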
Chapter 6
RQ2 Results: Design and Implementation
The following chapter relates to RQ2: How can simple, explicit and intraprocedural static analysis checks be used to identify vulnerabilities in OpenID Connect Relying Parties? This chapter goes through the design and implementation of simple static analysis techniques for enforcing the security principles in the model for OpenID Connect (see Chapter 5.1). The idea is that analyses on three layers can cover a lot of the security-critical protocol steps the developer has to implement. The analyses are implemented as FindSecBugs detectors.
The first layer uses the OpcodeStackDetector analysis from FindBugs (see Chapter 2.7), here named the Immediate Code Smell Detection analysis, which only looks at a single instruction in the JVM bytecode. The second layer is the Co-existing Invocation Enforcement analysis, which reasons about each method in a class, with an inter-procedural approximation. Lastly, the third layer is the Static Control Flow Check analysis, which analyzes the control-flow graph.
The focus here is vulnerabilities in code calling OpenID Connect SDKs, meaning bugs
that developers may introduce when they write code that interfaces with these SDKs. The
analyses are not concerned with looking for vulnerabilities in the SDKs themselves.
Definition 6.1.1. Peephole: In this thesis, a peephole is an instruction in the program's bytecode, denoted as <x>, a CFG edge type, or a combination of instructions in a Java method. A peephole can be considered a simple property that is used to infer a property in a more complex flow.
For example, simply the invocation of the class TokenResponse would imply that we are in a method implementing step 3 of the code flow in Listing 5.4.
Table 6.1: Definitions for identifiers that are used in the checks of the peephole analyses.
Peephole — Description
<inv> — Invocation <inv> is defined as a bytecode invoke instruction in which a certain class or type is instantiated.
<cmp> — Comparison <cmp> is defined as an if-instruction like the ifne bytecode instruction.
<ret> — Return <ret> is defined as the act of returning a certain HTTP response code or throwing an exception.
<ver> b — Verification <ver> of a value b either happens with an <inv> with [Link](), or as b passed to another method in which an <inv> with [Link] is called.
<comb> — Combination <comb> is the coexistence of a set of peepholes in the bytecode.
<pat> — Peephole Pattern <pat> is in this context the appearance of one of, or a combination of, the attributes <cmp>, <inv>, <ver> or <ret>.
<pair> — A strict pair of patterns where we expect pattern b to be found if we have found pattern a.
The peephole pattern to detect can for instance just be the usage of a data type which is associated with a disallowed pattern in the protocol.
Both its strength and its weakness lie in this simplicity. The way it is used in other parts of FindSecBugs, there is a blacklist of known functions that must never be used. For instance, just using the [Link]() function is something that typically must not be seen in production code. It is therefore enough to flag this usage as a code smell and make sure that the developer is informed. The basic algorithm for the detector is quite simple. Define a set of peephole patterns <pat>, which usually would be <pat> = {<pat1>: <inv> type A, <pat2>: <inv> type B}. The typical detector is then implemented as shown in Algorithm 6.1:
1 input: Code File
2 output: Vulnerability Reports
3 begin
4     scan opcodeStack in Code File
5     foreach opcode in opcodeStack
6         if opcode in <pat>
7             report vulnerability <pat>
8         end
9     end
10 end
This analysis is limited to very simple facts about a single instruction, and cannot infer more
complex relations between data items. It can however flag a data type that is associated with
a code smell.
Some simple examples can illustrate the abilities and boundaries of the Co-existing Invocation Enforcement analysis. For example, the analysis could expect to see that if b() has been called, somewhere c() must follow. The code below would then be passed as safe, thereby a true negative:
void a() {
b();
... other code
c()
}
It will also let code like below pass as safe, where the call to c() is delegated to another
method d():
boolean d() {
c()
}
void a(var) {
b();
... other code
d()
}
After b() has been called, somewhere c() must follow. However something else is there, but
c() is missing. The snippet below would then raise a warning:
void a() {
    b();
    ... other code, but no call to c()
}
The call to c() may be delegated to another method d(). However d() does not have any
call to c() either, even if its name and context would suggest so. This is therefore vulnerable.
boolean d() {
e(); // c() is missing!
return f;
}
void a() {
b();
... other code
return d();
}
Here the inter-procedural component comes in. In this case the subsequent enforcement follows this strategy to detect that we have a broken rule:
1. Scan linearly through each method in the class.
2. Note that a() has a call to b(), which means that somewhere in this area c() must be found.
3. Scan linearly through the method. c() was not found, but d() may have a call to c(),
indicated by its parameters and name. Save a pair of a() and d() for later inspection.
4. Scan further through the rest of the methods and finish the list. If any method contains
a call to c(), save it in a list of approved methods.
5. Scan through the methods which have a suspected call to another, which may contain
a call to c(). Then finish the list.
6. Look through the methods saved for later inspection. d() is expected to have a call to
c(). Check if d() is in the list of approved methods. If not, raise warning on a().
This component is introduced because of a much-seen code pattern where a check is not
done in-line, but delegated to a pure verify method.
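A minimal sketch of the bookkeeping this strategy implies is shown below. The class and method names are illustrative only and are not taken from the thesis implementation; b() stands for the triggering call and c() for the required verification call:

import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative bookkeeping for the Co-existing Invocation Enforcement: methods that
// trigger the rule, methods that contain the required call, and trigger methods that
// delegate to a callee which may contain the call.
public class CoexistenceBookkeeping {

    private final Set<String> triggerMethods = new HashSet<>();      // methods that call b()
    private final Set<String> approvedMethods = new HashSet<>();     // methods that call c()
    private final Map<String, String> delegations = new HashMap<>(); // trigger -> suspected callee

    public void onCallToB(String method) { triggerMethods.add(method); }
    public void onCallToC(String method) { approvedMethods.add(method); }
    public void onSuspectedDelegation(String method, String callee) { delegations.put(method, callee); }

    // After all methods are scanned: report triggers whose required call is
    // neither in-line nor found in the method they delegate to.
    public List<String> methodsToReport() {
        List<String> report = new ArrayList<>();
        for (String method : triggerMethods) {
            boolean inline = approvedMethods.contains(method);
            String callee = delegations.get(method);
            boolean delegated = callee != null && approvedMethods.contains(callee);
            if (!inline && !delegated) {
                report.add(method);
            }
        }
        return report;
    }
}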
The only thing the Co-existing Invocation Enforcement really does is look for the existence of certain method calls in the code. It makes no assumption about whether they are done right in terms of control flow. The following code would be vulnerable, but come out as a false negative using the Co-existing Invocation Enforcement:
boolean isValid(data){
if(data.c() != safe) {
return true;
}
return false;
}
Here, what the Co-existing Invocation Enforcement would look for is the comparison of data.c() and another value. However, the developer of this code has made a blatant mistake and reversed the if conditional, so that in any case where data.c() is not safe, it says that it is safe. Such cases are of too subtle a nature for this analysis. To deal with easy control-flow mistakes, the Static Control Flow Check is appropriate to use (see Section 6.2.3).
This detection strategy is simple, but is limited to detecting the appearance or absence of usages of a certain data attribute, i.e. the state parameter or the ID token. It has a simple inter-procedural component that covers a simple and easy-to-expect case of delegating a check downwards to another method. However, it does not reason completely about inter-procedural artifacts, and if the methods are structured in a particular way in the code, it may miss vulnerabilities or give false positives.
Another limitation is that it is unable to tell whether the check is carried out with a proper control flow (the check may be useless if it does not enforce what happens after the check). However, this strategy is cheaper. The absence of the checks this analysis looks for makes a heavier control flow analysis unnecessary, since there would then be no control flow to check. Only when the checks are present can control flow analysis be employed to further verify the solidity of the code.
The Static Control Flow Check analysis is used in cases of the model where developers carefully have to ensure a proper control flow, and it is highly specialized and constrained towards what is a valid pattern. This is made in the spirit of FindBugs [44], with the notion that developers make “dumb” mistakes like reversing an if conditional or continuing to run the code after a catch block where an error case should be managed.
The initial assumption in a simple conceptual CFG is that you have a series of checks that divide control flow. A typical pattern discovered in the validation steps of the OIDC protocol for this model is quite linearly placed if-conditionals that either end the program right there with an error code, or continue the flow if the check passed. This technique is applicable to relatively linear control-flow graphs which follow one consistent green path and otherwise break off early.
Such a graph is illustrated in Figure 6.1. Here the basic control flow of the code example for token validation (Listing 6.11) is shown in a simplified manner. The main point of the code in Listing 5.4 is that you have a series of if-else checks of values in the ID token. These if-else checks can be modeled as a simple binary control-flow graph which either ends in a leaf node or goes further down the tree.
In Figure 6.1, the green boxes represent a successful check. If one of the if-checks cor-
rectly verified the value, it will go to the next if-check. However if the checked value is invalid,
the control flow is broken. Then we end up in a red box in the modeled graph, thus ending
the control flow path and stopping the flow in the program.
During the analysis the intuition is to look for a blue block, which should be a negative comparison. Such a block is followed by the expected outgoing edges, and the fall-through edge leads into the if-block braces. If we are inside the if-block, it means one of the verification steps has failed and the ID Token is invalid. We therefore expect a return statement which takes us out of the method. This return statement is also expected to give an appropriate HTTP error code.
Given the above, the intuitive peephole check is a simple look at a basic block and the neighboring block following one of the outgoing edges. However, in a code base with several invocations, the analysis is not quite as simple as looking at a single edge between the if-statement and the code in its following curly braces. Rather, if we return the object [Link](CONSTANT...), a chain of invocations happens as we perform several compact method calls. The reality is therefore a more complex graph, as shown in Figure 6.2, where we actually have to traverse a series of “leaf” nodes to get to the actual leaf node.
Figure 6.2: A closer look at the control flow graph for token verification.
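For illustration, the kind of return statement the paragraph above refers to can be written with the JAX-RS Response builder; the wrapping class and method here are illustrative. A single source line then compiles to a chain of invocations, each ending its own basic block with a fall-through edge and an exception edge:

import javax.ws.rs.core.Response;

// One source line, but three consecutive invocations (status, entity, build) in the
// bytecode, which is why the "leaf" of the if-branch is only reached after traversing
// several basic blocks.
public class TokenValidationResponses {

    public Response nonceMismatch() {
        return Response.status(Response.Status.UNAUTHORIZED)
                .entity("Provided nonce did not match")
                .build();
    }
}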
This means that even a minimized peephole analysis needs to traverse the CFG to some extent. Luckily, the control flow graph of such return statements as described above is still relatively simple. Each of the basic blocks usually has two outgoing edges: either an exception edge if the invocation failed, or simply a fall-through to the next invocation. This series of basic blocks could therefore almost be considered part of the edge between our two main trigger points, namely the blue conditional block and the final red return block. However, simple hints may also come from the instructions inside these “fall-through” blocks. These hints are tracked in a simple data object that is checked at the end of the analysis. Even while traversing the CFG and performing some operations that approximate data flow, the triggers of the analysis are simply a look-and-match: for certain expected instructions inside each basic block, and for a combination pattern of these basic blocks in a single method.
The detection strategy, shown in simplified form in Algorithm 6.3, is as follows: iterate over the basic blocks in the CFG. If a block contains one of our expected verification method calls, and it is an if/else conditional block, the analysis is triggered. Traverse linearly through the fall-through edges following the conditional check, picking up additional instructions in the traversed basic blocks. The sum of the peepholes in the series of basic blocks determines the peephole. If we do not find a following set of blocks that satisfies the expected return patterns, but end up in a new check or a different return state, we have a control flow bug.
1 input: Code File
2 output: Vulnerability Reports
3 begin
4     foreach basicBlock in CFG
5         if basicBlock instructions match <pat a>
6             traverse neighboring blocks looking for <pat b>
7             if instructions with <pat b> not found in neighboring blocks
8                 report vulnerability
9             end
10         end
11     end
12 end
Algorithm 6.3: Detection strategy for Static Control Flow Check in token validation bugs.
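A minimal sketch of this traversal over a simplified CFG model is shown below. BasicBlock and the two match flags are stand-ins for the concrete bytecode representation, not the FindBugs API, so this is an illustration of the strategy under those assumptions rather than the thesis implementation:

import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simplified CFG node: <pat a> marks a conditional verification check,
// <pat b> marks a block that returns an appropriate error response.
class BasicBlock {
    List<BasicBlock> fallThroughSuccessors = new ArrayList<>();
    boolean matchesConditionalCheck; // <pat a>
    boolean matchesErrorReturn;      // <pat b>
}

class StaticControlFlowCheck {

    // Returns the blocks where a verification check is not followed, along its
    // fall-through path, by an error-return block.
    List<BasicBlock> findControlFlowBugs(List<BasicBlock> cfg) {
        List<BasicBlock> bugs = new ArrayList<>();
        for (BasicBlock block : cfg) {
            if (block.matchesConditionalCheck && !errorReturnFollows(block)) {
                bugs.add(block);
            }
        }
        return bugs;
    }

    // Follow fall-through edges from the check until an error return is found,
    // or until a new check or the end of the path is reached (a control flow bug).
    private boolean errorReturnFollows(BasicBlock start) {
        Deque<BasicBlock> work = new ArrayDeque<>(start.fallThroughSuccessors);
        Set<BasicBlock> seen = new HashSet<>();
        while (!work.isEmpty()) {
            BasicBlock current = work.pop();
            if (!seen.add(current)) {
                continue;
            }
            if (current.matchesErrorReturn) {
                return true;
            }
            if (current.matchesConditionalCheck) {
                return false; // reached the next check before any error return
            }
            work.addAll(current.fallThroughSuccessors);
        }
        return false;
    }
}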
The Static Control Flow Check analysis has some of the same limitations as the other analyses. To avoid false positives, a large number of patterns are needed. It also lacks the ability to track data, and is therefore limited in how subtle the errors it can find are.
A potential fourth detector type would use data-flow analysis for another layer of sophistication. Such detectors exist in FindBugs and FindSecBugs for other vulnerabilities, and could potentially be designed for appropriate cases in OpenID Connect. A detector such as the ResourceTrackingDetector (see Chapter 2.7) may have uses for certain cases in OpenID Connect where it is crucial to have a comprehensive view of the data flow. That is however beyond the scope of this study and is an interesting avenue for further work.
The model in Chapter 5.1 proposes a developer-oriented way to think about how OpenID
Connect is to be implemented securely. Based on the model, a set of potential vulnerabilities
due to implementation bugs were inferred in Table 5.1. The three detector types are appro-
priate for different kinds of bugs in OIDC code, and similar algorithms may work for similar
issues related to different data items.
Table 6.2 shows a suggestion of which vulnerabilities can be detected by which kind of
analysis. Many of these vulnerabilities will occur for similar reasons, making similar detec-
tors appropriate for covering the whole authorization code flow. FindSecBugs detectors were
implemented for four of these inferred bugs. These implemented detectors are explained in
Section 6.2.
Table 6.2: Suggested analyses to be implemented as detectors for the various errors that can occur by breaking the rules in the model. The detectors that are implemented in this study are highlighted in bold.
• Immediate Code Smell Detection: Insecure authorization grant detector (Auth. Gr.),
which is explained in Section 6.4.1.
• Co-existing Invocation Enforcement: Improper state verification detector (State ver.),
which is explained in Section 6.4.2.
• Co-existing Invocation Enforcement: Improper ID token verification detector (Token ver.), which is elaborated in Section 6.4.3.
• Static Control Flow Check: Token CFG (Token CFG), which is explained in Section 6.4.4.
In FindBugs, each detector reports a set of bug patterns, in this context also referred to as vulnerability patterns. These patterns are created based on the protocol model and the potential vulnerabilities in Table 5.1. A bug pattern may relate directly to these potential vulnerabilities, but in some cases offers a different level of detail. This is a mapping between the theoretical model and practical ways to report and inform developers of their vulnerabilities. Table 6.3 gives an overview of the main vulnerability patterns for the detectors. Specific vulnerability patterns are presented for each implemented detector respectively, in Tables 6.4, 6.5, 6.6 and 6.7.
Table 6.3: Overview of the main vulnerability patterns developed for the implemented de-
tectors. These are explained in more detail under each detector in Sections 6.4.1 to 6.4.4
The Insecure authorization grant detector warns the developer in case they unknowingly use types associated with anti-patterns. The Resource Owner Password Grant is disallowed in the newer version of OAuth [41].
Listing 6.4 shows a secure code example of the authorization grant. The authorization code grant is considered secure when implementing OpenID Connect.
1 AuthorizationGrant codeGrant = new AuthorizationCodeGrant(
2     authorizationCode,
3     callbackURI);
4 TokenRequest tokenReq = new TokenRequest(
5     providerMetadata.getTokenEndpointURI(),
6     clientSecretBasic,
7     codeGrant,
8     scopes
9     ...)
Listing 6.4: Correct usage of authorization grant, like using the authorization code grant.
An example of a vulnerable grant flow is shown in Listing 6.5, in which the Resource Owner Password Grant is used in line 2. The detector will raise a warning when it encounters code where objects like the ResourceOwnerPasswordCredentialsGrant are used, because these denote usage of the unsafe grant type.
1 AuthorizationGrant
2     passwordGrant = new ResourceOwnerPasswordCredentialsGrant(
3         username,
4         password);
5 TokenRequest tokenReq = new TokenRequest(
6     tokenEndpoint,
7     clientSecretBasic,
8     passwordGrant,
9     scopes,
10     ...)
Listing 6.5: Vulnerable usage of the authorization grant, using the Resource Owner Password Credentials Grant.
Detection strategy
This detector simply uses the behavior of the OpcodeStackDetector (see Chapter 2.7). It looks for bytecode instructions matching its blacklist, which consists of types that denote usage of the bad grant. When a match occurs it raises a warning; otherwise it ignores everything, so it is not very prone to false positives.
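A minimal sketch of what such a detector can look like on top of the FindBugs/SpotBugs OpcodeStackDetector API is shown below. The bug-pattern name is illustrative, and the sketch assumes BCEL's Const opcode constants; it is not the detector implemented in this thesis:

import edu.umd.cs.findbugs.BugInstance;
import edu.umd.cs.findbugs.BugReporter;
import edu.umd.cs.findbugs.Priorities;
import edu.umd.cs.findbugs.bcel.OpcodeStackDetector;
import org.apache.bcel.Const;

// Flags any construction of the Nimbus ResourceOwnerPasswordCredentialsGrant,
// which denotes usage of the disallowed grant type.
public class InsecureGrantDetectorSketch extends OpcodeStackDetector {

    private static final String BAD_GRANT =
            "com/nimbusds/oauth2/sdk/ResourceOwnerPasswordCredentialsGrant";

    private final BugReporter bugReporter;

    public InsecureGrantDetectorSketch(BugReporter bugReporter) {
        this.bugReporter = bugReporter;
    }

    @Override
    public void sawOpcode(int seen) {
        // Trigger on the constructor invocation of the blacklisted grant type.
        if (seen == Const.INVOKESPECIAL
                && BAD_GRANT.equals(getClassConstantOperand())
                && "<init>".equals(getNameConstantOperand())) {
            bugReporter.reportBug(new BugInstance(this,
                    "INSECURE_OAUTH_GRANT", Priorities.NORMAL_PRIORITY)
                    .addClass(this)
                    .addMethod(this)
                    .addSourceLine(this));
        }
    }
}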
Vulnerability Patterns
The Insecure authorization grant detector currently looks at one vulnerability class. This
class is reported if it notices the usage of a bad authorization grant:
In step 1, the authentication request added the state parameter to the request (see Listing 5.2). This value must be checked in step 2, which receives the callback request. A secure code example is shown in Listing 5.3.
The typical error in state checking is quite simple. In the code in Listing 6.6, the state is not checked even though we are in the callback context. The code proceeds directly to use the authorization code without validating the state.
1 public Response callback(HttpServletRequest req) {
2     try {
3         UUID uuid = UUID.fromString(... req.get(uuid));
4         OidcConfig oidcConfig = (OidcConfig) cache.get(uuid);
5         ...ResponseUrl responseUrl = new ...Url(req.getRequestURI());
6         String error = responseUrl.getError();
7         if (error != null) {
8             return Response ... UNAUTHORIZED ...;
9         }
10         // Missing state check!
11         String authorizationCode = responseUrl.getCode();
12         .... token response
13 }
Listing 6.6: Step 2 - Missing validation of the state parameter in the callback method.
Vulnerability Patterns
There are two vulnerability patterns defined in this detector, shown in Table 6.5.
Table 6.5: Vulnerability patterns in the Improper state verification detector. In addition to the
blatant missing verification of the state parameter, passing the value somewhere unvalidated
is flagged as a lower-level warning.
Detection strategy
To detect bugs related to improper validation of the state parameter, we use the following strategy (which is further explained in Section 6.2): Scan through each method in the code file and identify a method call that handles an authentication response. For each such method call, identify an action that compares the stored state string to the state parameter retrieved from the authentication response. The absence of such a comparison means we have a potential vulnerability, reported as MISSING_VERIFY_OIDC_STATE. Additionally, identify whether the state object is passed to another method. If we cannot find verification in the called method, report MISSING_VERIFY_OIDC_STATE. If the called method is not in this Java class, and no checks were found elsewhere, report EXTERNAL_CALL_POSSIBLY_MISSING_VERIFY_OIDC_STATE.
Severity
This bug may lead to Replay Attacks (See Table 3.1). An attacker may impersonate a proto-
col entity by obtaining a credential value. However validation of the state parameter helps
mitigate this kind of attack, as several separate data artifacts contribute to the integrity of a
request.
There are generally two correct ways of implementing the ID token verification in the development model; this step differs from SDK to SDK. With the Google library you have two options: either use the token validator and check the last two items (signature and nonce) yourself, or call the verify methods implemented on the IdToken wrapper class around the JWT.
Listing 6.7 shows how the token request is constructed in the callback method, after receiving and verifying the callback request from the IdP. After receiving the token response from [Link](tokenRequest) in line 10, the token response is passed to the verification method in line 11, and the process moves to step 3.
1 public Response callback(HttpServletRequest callbackRequest) {
2     try {
3         ...
4         // .. state validation and error check
5         String authorizationCode = responseUrl.getCode();
6         TokenRequest tokenRequest = authorizationCodeFlow.newTokenRequest(authorizationCode)
7             .setTokenServerUrl(new GenericUrl(authorizationCodeFlow.getTokenServerEncodedUrl()))
8             .setClientAuthentication(authorizationCodeFlow.getClientAuthentication())
9             .setRedirectUri(redirectUri);
10         IdTokenResponse idTokenResponse = IdTokenResponse.execute(tokenRequest);
11         return validateTokens(idTokenResponse, oidcConfig);
12     } catch (Exception e) {
13         return Response.status(Response.Status.UNAUTHORIZED).build();
14     }
15
16 }
Listing 6.7: Step 2 - Constructing the token request and passing the token response on to validation.
In Listing 6.8, a correct example using the IdTokenVerifier is shown. If the developer performs the other required checks in addition to the initially incomplete validator, this is classified as secure code.
1 // Step 3
2 public Response validateTokens(... tokenResponse, ... oidcConfig) {
3     try {
4         IdToken idToken = tokenResponse.parseIdToken(); // Parse
5         IdTokenVerifier verifier = new IdTokenVerifier.Builder()
6             .setAudience(clientId)
7             .setIssuer(providerMetadata.get("iss"))
8             .setAcceptableTimeSkewSeconds(TIME_SKEW_SECONDS)
9             .build();
10         // (alternative parse: IdToken.parse(new GsonFactory(), tokenString))
11         if (!oidcConfig.nonce.equals(idToken...getNonce())) {
12             return Response ... UNAUTHORIZED ...
13                 "Provided nonce did not match";
14         }
15         if (!idToken.verifySignature(publicKeyFromJwkSet())) {
16             return Response ... UNAUTHORIZED ...
Listing 6.8: Step 3 - Correct token validation using the Google library. Full example in Listing
5.4. A corresponding example using the Nimbus SDK is shown in Listing 5.5.
The simplest example which will yield a warning, a received and parsed ID token response without any following verification (all five checks missing), is shown in Listing 6.9. Verification of the ID token is expected between line 6 and line 9. This example is a vulnerability of type MISSING_VERIFY_ID_TOKEN.
1 public Response callback(HttpServletRequest callbackRequest) {
2     try {
3         // After verified state and parsed auth code ..
4         TokenRequest tokenRequest = ...
5         IdTokenResponse idTokenResponse = IdTokenResponse.execute(tokenRequest);
6         IdToken idToken = idTokenResponse.parseIdToken();
7         // BUG: missing verification
8         // userinfo request with ID token ...
9         return Response.ok()
10             .entity(idTokenResponse)
11             .build();
12     } catch (Exception e) {
13         // Error handling
14     }
15     return Response ... UNAUTHORIZED ...
16 }
Listing 6.9: Step 3 - Missing token validation using the Google library.
Listing 6.10: Step 3 - Incomplete token validation using the Google library ID token verifier.
This is another variation where this method is called from "callback".
Vulnerability Patterns
There are seven vulnerability patterns in this detector, shown in Table 6.6.
Table 6.6: Vulnerability Patterns in the Improper ID token verification detector. Two of them
are collector classes for the token verification parameters. Seven individual vulnerability
patterns are checked for in the detector.
Detection strategy
This detector also uses the Co-existing Invocation Enforcement analysis to detect vulner-
abilities. The detection strategy uses the following process visiting each method in a Java
class:
1. As we scan through the methods, we collect relevant methods in a set of data structures.
2. Look for invocation patterns that indicate that an ID token is retrieved, for example [Link]().
3. Look for other patterns that indicate that validation is happening.
4. If the ID token was retrieved, but the method was not added to a list for later inspection, raise a warning of MISSING_VERIFY_ID_TOKEN: no sign of validation was found.
5. Look through all the methods in list 2, searching for the five required checks to be present. If any one of the required checks is absent, raise a warning of INCOMPLETE_ID_TOKEN_VERIFICATION.
6. Look through the hash map of pairs. If the called method is not in list 1, raise a warning.
Severity
This bug may lead to Replay Attacks (see Table 3.1). An attacker may impersonate a protocol entity by obtaining a credential value. Validation of the ID token helps mitigate this kind of attack, as several separate data artifacts contribute to the integrity of a request.
This detection strategy is simple, but is limited to detecting the appearance or absence of usages of a certain data attribute, here the ID token. It is not able to tell whether the check is carried out in a proper control flow. However, this strategy is cheaper, and the existence of this vulnerability makes a heavier control flow analysis unnecessary. Only when this bug does not exist can control flow, and potentially data flow, analysis be employed to further verify the solidity of the code.
In code doing token validation, each check must be followed by a correct response. If a value does not match, an HTTP response code 401 UNAUTHORIZED must be returned, as shown in Listing 6.11. Here the checks are implemented as a series of similar if-checks. All the checks use a negative comparison, meaning that the condition being true means that the values do not match. In line 4, for example, the program flow is broken and a response code is returned if the saved nonce parameter does not match the one in the ID token.
1 public Response validateTokens(... tokenResponse, ... oidcConfig) {
2     try {
3         IdToken idToken = tokenResponse.parseIdToken();
4         if (!oidcConfig.nonce.equals(idToken...getNonce())) {
5             return Response ... UNAUTHORIZED ...
6                 "Provided nonce did not match";
7         }
8         if (!idToken.verifySignature(publicKeyFromJwkSet())) {
9             return Response.status(Response.Status.UNAUTHORIZED)
10                 "JWT signature is not valid";
11         }
12         ... other checks
13         .... createAndStoreCredential(tokenResponse, oidcConfig.appuuid);
14         return Response.ok()
15             .entity(tokenResponse)
16     } catch (... | ... | ... Exception e) {
17         return Response.status(Response.Status.BAD_REQUEST).build();
18     }
19 }
Listing 6.11: Correct suggestion for token validation. The full example is found in Listing 5.4
The basic error that the CFG detector attempts to cover is cases where the developer makes a silly mistake and, for instance, forgets to enforce a check properly, or accidentally reverses a conditional. In the case in Listing 6.12, an if-check of the token signature is implemented in line 5, but the program flow is not broken. As a result, the program will continue down to the end of the method, where a success response is returned. In such a case, the program accidentally falls back on the green path even though the signatures did not match.
Another mistake developers may make is to write reversed logic. In line 1, the if-check is
not checking for a negative value. As a result, the code will accept an incorrect issuer value,
and an adversary impersonating an IdP might exploit this mistake.
1 if (idToken.verifyIssuer(...)) {
2     // BUG: Reversed if conditional
3     return Response ... UNAUTHORIZED ...
4 }
5 if (!idToken.verifySignature(publicKey)) {
6     // do something
7     // BUG: no return. Falls through to response OK.
8 }
9 ...
10 return Response.ok()
11     .entity(tokenResponse)
12     .build();
Listing 6.12: Incorrect control flow in token validation: a reversed if conditional and a missing return.
Listing 6.13 shows other hypothesized ways the developer may write code that responds to checks with an incorrect control flow. In line 5, a return or throw statement that breaks the control flow is absent. In line 9, the developer accidentally returns the wrong response code instead of an error code. Lastly, in line 14, the developer returns null. This is not necessarily a vulnerability, but it is a code smell.
1 public Response validateToken(IdTokenResponse tokenResponse, OidcConfig oidcConfig) {
2     try {
3         IdToken idToken = tokenResponse.parseIdToken(); // Parse
4         if (!oidcConfig.nonce.equals(idToken.getPayload().getNonce())) {
5             // BUG: no return
6         }
7         ...
8         if (!idToken.verifyAudience(Collections.singleton(clientId))) {
9             return Response.ok().build();
10             // BUG: returns OK in wrong place
11         }
12         if (!idToken.verifyTime(Instant.now().toEpochMilli(),
13                 DEFAULT_TIME_SKEW_SECONDS)) {
14             return null;
15             // BUG: Smelly code returning null.
16             ...
17         ... createAndStoreCredential(tokenResponse, oidcConfig.appuuid);
18         return Response.ok()
19             .entity(tokenResponse)
20             .build();
21     } catch (... | ... | ... Exception e) {
22         return Response ... BAD_REQUEST);
23     } catch (Exception e) {
24         return Response ... INTERNAL_SERVER_ERROR);
25     }
26 }
Listing 6.13: Hypothesized incorrect control flow responses to failed ID token checks.
Vulnerability Patterns
Table 6.7 shows the two vulnerability patterns used by the Control flow ID token verification
detector.
Table 6.7: Vulnerability patterns in the Control flow ID token verification detector
Detection strategy
This detector uses the Static Control Flow Check analysis, which is described in Section 6.2.3.
It looks for patterns of ID token validation like the Improper ID token verification detector.
It has the following outcomes when analyzing methods that do ID token verification:
Severity
If one of the checks does not have a correct response, or a conditional is reversed, the check
is essentially useless. The relying party will then be at risk of a token forgery attack.
Chapter 7
Evaluation
This chapter contains a practical demonstration and validation of the analyses. The validation of the research results uses the “slice of life” examples taken from real code bases (see Chapter 4.3).
Section 7.1 goes through the experimental setup. Quantitative results and metrics are presented in tables in Section 7.2. Section 7.3 contains a qualitative interpretation of the results, going into some key examples with insights into why the analysis yielded false positives or negatives, or succeeded.
Table 7.1: Applications using OpenID Connect SDKs that were analyzed. Three use the Google library, and three use the Nimbus SDK. Files denotes the number of files included from the given application.
As mentioned in Chapter 6.3, detectors have not been implemented for all the steps in the flow, and they do not currently cover all the points in the checklist. For this demonstration it suffices to show a case for each type, though two detectors are implemented for Co-existing Invocation Enforcement. The detectors tested in the evaluation are shown in Table 7.2. The flow steps referred to in the table are explained in Chapter 5.1.
Table 7.2: Overview of the implemented analyses as Find Security Bugs detectors used in the
evaluation.
The code bases used in the validation were obtained from open-source Github repositories on the 18th of May 2020. Github repositories were searched for projects containing code that uses either of the two SDKs the analyses in this thesis are based on, the Nimbus SDK or the Google library. The goal of the experiment was to obtain three code bases using the Nimbus SDK and three code bases using the Google library, for a total of six open-source projects. The search protocol on Github was as follows:
• Define a search string for each SDK:
  – For the Nimbus SDK: import [Link]
  – For the Google SDK: import [Link].
• Perform a Github search with the given strings above, in code mode: [Link]com/search?q=<query-string>&type=Code. This mode searches the contents of code files.
• Scan through the first 20 pages for relevant code projects, or stop if the goal is reached.
• The included code bases must satisfy the following inclusion criteria:
1. The code base seems to be a real code base used in a production setting, and not just an example or a personal dummy project.
2. The file that contains the code matching the search hit indeed seems to implement part of, or the whole of, the authorization code flow in a Relying Party.
The analyses were performed on a Lenovo P1 gen 2 running Ubuntu 19.04, with an Intel Core i7-9850H processor, an NVIDIA Quadro T1000 (Laptop) graphics adapter with 4096 MB of memory, and 32 GB of RAM.
Detectors were run in the FindBugs Command-Line Interface (CLI)¹, based on instructions in the CLI guide at FindSecBugs [73]. The tool was built from the root folder in the forked FindSecBugs project [28], in the evaluation-opensource branch².
The analyzed applications were all cloned, and the files considered relevant were built into .jar files, because the CLI takes compiled Java in the form of .jar files as input. The inclusion criterion for the files was that they had imports of SDK classes used in OpenID Connect or OAuth, or that they were linked to classes satisfying this constraint. As few files as possible were included to compile the given project without too many changes. Files other than these would not trigger detectors. The analyzed files were added to, and built in, their own sub-module in the Eval project [29].
¹ Findbugs guide for running in command-line: [Link]html
² evaluation-opensource branch: [Link]feature/evaluation-opensource
Then, in the root folder of the Eval project, $ mvn clean install was run to get a .jar file. The file was generated to <submodule-subject>/target.
Then, in the find-sec-bugs/cli directory of the FindSecBugs project, each of the subjects was analyzed by entering the following command:
$ ./[Link]
-output <subject>[Link]
-visitors ImproperTokenValidationDetector,
TokenValidationCFGAnalysis,
InsecureAuthorizationGrantDetector,
MissingCheckStateOidcDetector
<path from root>/Eval/<subject-module>/target/<subject>-[Link]
The resulting analyses came as output in the named XML file in the find-sec-bugs/cli directory. This action runs the whole FindSecBugs plugin (including the FindBugs detectors) on the targeted code, together with the analyses implemented in this thesis. Only the detectors that were triggered reported specific analysis times in milliseconds. The total time that is computed is the time for the whole plugin to run its analyses on the given code. The files were reviewed manually to verify the analysis results.
The analyzed files are publicly available in the Oidc-FindSecbugs-Eval project on Github [29], and the detectors that were run are available in a forked FindSecBugs repository [28], in the branch evaluation-opensource. Most of the project files of the analyzed applications had to be altered because they had dependency errors, would not build from their master branch, or were hard to configure to run. These alterations mainly consisted of commenting out internal library references, and did not touch any code artifacts likely to affect the analysis. The projects varied in size, and therefore required different inclusion volumes to compile without altering too much of the code. Most changes are documented with comments in the code in the Eval project.
The FindSecBugs plugin can normally be used in the IDE, but compatibility issues and an outdated version of FindSecBugs made it impossible to run in the IDE at this time. Users running the plugin under normal circumstances would simply install the plugin and run it targeted at their selected files, or on their whole project. That would make the process of running the analysis significantly easier than it was during this evaluation.
Among these metrics, properties like precision and the false positive rate were selected to illustrate performance as fractions. Recall is also included, but this metric has lower confidence because the code analyzed comes from real applications, which means it is hard to obtain ground truths about how many vulnerabilities exist in the applications. Still, some limited measure of this characteristic can be included as a reference. The most important role of these characteristics is to illustrate the performance of the analyses on a set of real-world cases.
To calculate the metrics, constraints have to be set for how positives and negatives are counted for the detectors. The following constraints are set for defining the evaluation metrics.
These methods provide a benchmark, and metrics are computed only on these potential warning-raising methods, while other boilerplate methods, which would otherwise obscure the data, are excluded. The timing of detectors still includes the other files, as the plugin scans the whole project.
Definition 7.1.2. Total potential warnings for a method. The total amount of potential warnings W_M for a method, analyzed with d detectors each reporting P_i patterns: W_M = Σ_{i=1}^{d} P_i
Definition 7.1.3. Potential warnings for a class. The total amount of potential warnings W_C for a class with m methods: W_C = m · W_M
Definition 7.1.4. Potential warnings for an application per detector. The amount of potential warnings W_AD for an application with c classes, for a detector with P patterns: W_AD = c · m_wr · P
Definition 7.1.5. Total potential warnings for an application. The total amount of potential warnings W_AT for an application with c classes: W_AT = c · W_C = c · m_wr · W_M = c · m_wr · Σ_{i=1}^{d} P_i
True positives
The detector has a true positive if it raises a warning that one of its rules is broken in a method, and the rule is in fact broken. Some warnings may link two methods since they relate to the same vulnerability; this is then counted as one positive. The set of true positives (TP) for a method is the number of patterns P that are true rule violations.
False positives
The detector has a false positive (FP) if it raises a warning that one of its rules is broken in a method, while the rule is in fact followed in the code. A special case exists here. If the ImproperTokenValidationDetector notices that the code is missing four checks, it will raise a general warning plus one warning for each of the four instances; this is counted as four reports. If all five checks it looks for are missing, it will condense them into one warning saying that five things are missing; this is counted as five positives, since the total absence of five properties raised this special warning. The set of false positives (FP) for a method is the number of patterns P raised as warnings which are not rule violations.
True negatives
The detector has a true negative (TN) if it does not report any vulnerabilities, and the method in a class does not have a vulnerability breaking the rule. Each method satisfying this property is considered a true negative. The true negatives illustrate the difference between potentially raised warnings and true warnings. To further clarify: for any method where a detector can potentially report six different vulnerabilities, and none of these vulnerabilities exist in the method, the method has six true negatives. Based on Definition 7.1.5, the following definitions are set for counting true negatives in the analysis results of an application:
Definition 7.1.6. Count of true negatives for a method. The total amount of true negatives TN_m for a method with a total of potential warnings W_m is defined: TN_m = W_m − TP_m − FP_m − FN_m
Definition 7.1.7. Count of true negatives for an application. The total amount of true negatives TN_A for an application with a total of potential warnings W_AT and m methods is defined: TN_A = W_AT − TP_A − FP_A − FN_A = Σ_{n=1}^{m} TN_n
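As a hedged illustration (invented numbers): a method with W_m = 6 potential warnings, one true positive, one false positive and no false negatives has TN_m = 6 − 1 − 1 − 0 = 4.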
False negatives
The detector has a false negative if it does not report a vulnerability in the code, but there is in fact a vulnerability in the code. The set of false negatives (FN) is counted through manual review of the code, imitating the potential warnings that would have been raised by the detectors.
7.2.1 Overview
Table 7.3: Overview of the analysis results for the four detectors on each of the six applications. Subject: the application analyzed, Files: number of source files included in the analysis, G: Google, N: Nimbus, t_t: total plugin clock run time in seconds, Detector: the implemented analysis, W_A: the potential number of warnings for a detector analyzing an application, TP: true positives (vulnerabilities found), FP: false positives, FN: false negatives, TN: true negatives, t: run time in clock milliseconds.
Table 7.4: Totals of the analysis results for the detectors. W_AT: the total potential number of warnings a detector may raise, TP: true positives (vulnerabilities found), FP: false positives, FN: false negatives, TN: true negatives.
Detector W_AT TP FP TN FN
[Link] 9 0 0 9 0
State ver. 14 0 2 10 1
Token ver. 154 20 13 121 0
Token CFG 10 0 4 6 0
SUM 187 20 19 147 1
As Table 7.5 shows, it is the Co-existing Invocation Enforcement analysis, implemented through the ImproperTokenValidationDetector, that detected the highest volume of vulnerabilities and had the most relevant analysis points in W_A.
Table 7.5: Analysis results for the performance of the detectors, grouped per SDK. W_AT: the total potential number of warnings a detector may raise in a given application, TP: true positives (vulnerabilities found), FP: false positives, FN: false negatives, TN: true negatives.
SDK Detector W_AT TP FP TN FN
Google [Link] 4 0 0 9 0
State ver. 4 0 0 3 1
Token ver. 84 15 3 66 0
Token CFG 10 0 4 6 0
SUM G 102 15 7 79 1
Nimbus [Link] 5 0 0 5 0
State ver. 10 0 2 8 0
Token ver. 70 5 10 55 0
Token CFG 0 0 0 0 0
SUM N 85 5 12 68 0
SUM 187 20 19 147 1
7.2.2 Metrics
From the numbers of the analyses, the metrics Precision, Recall and True Negative Rate
(TNR) were computed. These are explained and defined in Chapter 2.6.
Precision = True positives / (True positives + False positives)
Recall = True positives / (True positives + False negatives)
True negative rate = True negatives / (False positives + True negatives)
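For reference, inserting the totals from Table 7.4 (TP = 20, FP = 19, FN = 1, TN = 147) into these formulas gives the overall figures discussed below:
Precision = 20 / (20 + 19) ≈ 0.51
Recall = 20 / (20 + 1) ≈ 0.95
True negative rate = 147 / (19 + 147) ≈ 0.89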
When looking at all the detectors on all the code files, the evaluation yielded an unsatisfactory precision of 51%. Further investigation suggests this may be caused by certain outliers in a subset of the analyzed applications. Two of the applications implementing the Nimbus SDK had a code structure that led to false positives. These outliers likely skewed the data. Insights into these false positives are given in Section 7.3. When looking at the data of the Token ver. detector for the applications using the Google library, it has a promising precision of 83%.
Table 7.6 shows the metrics for the detectors. The metrics for the results from analyzing the two different groups of applications are shown. One of the groups contains the applications using the Google library, while the other contains the ones using the Nimbus SDK. There were some internal differences between these two groups. The total metrics for the analysis of all the analyzed subjects are shown at the bottom of the table.
Table 7.6: Metrics for analysis results per detector and total: Precision, Recall and True Neg-
ative Rate (TNR). The totals for Google (Total G) and Nimbus (Total N) are calculated using
the complete numbers in SUM G and SUM N in Table 7.5, including numbers from all the
detectors.
Emphasis is put on the Improper ID token verification detector, because it was the detector with the largest volume of data, and the only detector with true positives.
7.3.1 ZopSpace
ZopSpace is one of the applications using the Google library, and is the only one that is an Android app [102]. This heavily limited the inclusion of files, since Android SDKs will not easily compile in an ordinary Java run-time environment. However, the one relevant file was mostly written in “ordinary” Java, and required few changes to make it compile.
When analyzing this code base, the ID token validation detector, and partly the State validation detector, yielded the most interesting results, which illustrate both strengths and weaknesses of the analyses.
MissingCheckStateOidcDetector
Listing 7.1: Step 1 in ZopSpace, building the Authorization Code request URL. The URL clearly misses the addition of the state parameter.
This absence was not picked up by the MissingCheckStateOidcDetector, which uses the callback context as its entry-point trigger for analysis. However, another detector could be added that focuses on how the request URL is built in step 1. Such a detector is suggested in Table 6.2. It could use the following process to detect the bug above:
1. Look for the building of the authorization code request URL in step 1 (the entry-point trigger for this detector).
2. Look for the setState() method call on the request URL object (which comes in some form that allows adding the state parameter).
If the first point is satisfied, the second one must also be satisfied; otherwise a warning about a missing state parameter is raised.
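As a reference point for such a detector, the following is a minimal sketch (not taken from the ZopSpace code; the endpoint, client ID and redirect URI values are assumed to come from configuration) of how step 1 could add the state parameter with the Google library:

import com.google.api.client.auth.oauth2.AuthorizationCodeRequestUrl;

import java.math.BigInteger;
import java.security.SecureRandom;
import java.util.Collections;

class AuthorizationRequestSketch {
    static String buildAuthorizationUrl(String authorizationEndpoint, String clientId,
                                        String redirectUri) {
        // Fresh, unguessable state value; it must also be stored (e.g. in the session)
        // so the callback in step 2 can compare it against the returned value.
        String state = new BigInteger(130, new SecureRandom()).toString(32);
        return new AuthorizationCodeRequestUrl(authorizationEndpoint, clientId)
                .setScopes(Collections.singletonList("openid"))
                .setRedirectUri(redirectUri)
                .setState(state) // the setState() call the suggested detector would look for
                .build();
    }
}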
ImproperTokenValidationDetector
The Improper ID token verification detector was able to pick up 10 vulnerabilities of incomplete validation of the ID token in the Zop-app project. It had four true positives that were just as expected, in the isValidIdToken method in Listing 7.2. In line 6, the IdTokenVerifier is instantiated, and the audience parameter is added. When verifier.verify() is called in line 10, only the freshness and audience parameters are verified. The analysis correctly reported that this file misses validation of the nonce, signature and iss parameters. Additionally it reported USING_INCOMPLETE_ID_TOKEN_VALIDATOR, which is a vulnerability pattern reported when the verifier is implemented incompletely.
1 boolean isValidIdToken(String clientId, String tokenString) {
2     if (clientId == null || tokenString == null) {
3         return false;
4     }
5     List<String> audiences = Collections.singletonList(clientId);
6     IdTokenVerifier verifier = new IdTokenVerifier
7             .Builder()
8             .setAudience(audiences).build();
9     IdToken idToken = IdToken.parse(new GsonFactory(), tokenString);
10    return verifier.verify(idToken);
11 }
Listing 7.2: Incomplete ID Token verification in an open-source android app project [102].
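For contrast, the following is a hedged sketch of a more complete check using the same Google library types; the helper is hypothetical, and the expected issuer, nonce and IdP public key are assumed to be available to it:

import com.google.api.client.auth.openidconnect.IdToken;
import com.google.api.client.auth.openidconnect.IdTokenVerifier;
import com.google.api.client.json.gson.GsonFactory;

import java.security.PublicKey;
import java.util.Collections;

class CompleteIdTokenValidationSketch {
    // Hypothetical helper; parameter names and error handling are illustrative only.
    static boolean isValidIdToken(String clientId, String expectedIssuer,
                                  String expectedNonce, PublicKey idpPublicKey,
                                  String tokenString) throws Exception {
        IdToken idToken = IdToken.parse(new GsonFactory(), tokenString);
        IdTokenVerifier verifier = new IdTokenVerifier.Builder()
                .setAudience(Collections.singletonList(clientId))
                .setIssuer(expectedIssuer) // iss check, missing in Listing 7.2
                .build();
        // IdTokenVerifier covers audience, issuer and freshness; signature and nonce
        // still have to be checked separately, as the detector points out.
        return verifier.verify(idToken)
                && idToken.verifySignature(idpPublicKey)
                && expectedNonce.equals(idToken.getPayload().getNonce());
    }
}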
Interestingly, the analysis also unexpectedly picked up five true positives in refreshTokens, even though this case is currently not covered by the model. It was picked up because the detector looks for token requests as pattern A, and then expects validation as pattern B. This is, however, also a true vulnerability according to the OpenID Connect specification [60], since a complete ID token verification is required if an ID token is included in the response to a refresh request. The example shows that the analysis is applicable even beyond the more restricted mental model its initial design was based on.
Listing 7.3: Refresh token request. Currently not covered by the model in this thesis, but the
analysis still picked it up.
7.3.2 Atricore
Atricore-idbus, the Atricore Identity Bus Platform [7], has one file which was considered relevant upon inspection: the GoogleAuthzTokenConsumerProducer, which uses the Google library. Due to the platform's complex architecture, several other files touch upon OpenID Connect messages. These are not included, both because their relevance is limited and because it would require a lot of work to make these files compile.
Token validation is clearly missing in the method doProcessAuthzTokenResponse in Listing 7.4. After retrieving the ID token, the code proceeds directly to retrieving user info.
1 protected void doProcessAuthzTokenResponse(CamelMediationExchange exchange) {
2     ... correct authorization response parsing
3     ... state validation
4     request.setRedirectUri(accessTokenConsumerLocation.getLocation());
5     IdTokenResponse idTokenResponse = (IdTokenResponse) mediator.sendMessage(request, accessTokenConsumerLocation);
6     IdToken idToken = idTokenResponse.parseIdToken();
7     // NO ID token validation !!
8     ... userinfo request
9 }
Listing 7.4: Steps 2-3 in Atricore, the token request. Token validation is missing.
This turned out to be a case which worked exactly as expected for the ImproperTokenValidationDetector. The trigger was the call to idTokenResponse.parseIdToken(). The detector then expected to find validation. Failing to find this in the method, it raised warnings that all five required checks are missing.
7.3.3 Firebase
Firebase is an open-source app development platform delivered by Google. The code analyzed is their admin Java SDK [35]. Unlike the other code bases, this application turns out not to implement the entire OpenID Connect protocol. Rather, they have their own way of doing authorization and authentication, but use the types of the Google OIDC library to wrap around their JSON Web Tokens, which are used to manage identities. Strictly speaking, this means that the discovery here is not a true bug, given that they do not follow the protocol specification in the first place and are therefore not bound by its rules. However, given the rule set of the detectors, the broken rule of a missing nonce was correctly identified in the code.
Listing 7.5 contains a method which provoked three false positives in Firebase. The method returns an instance of the IdTokenVerifier class from the Google library, which does not perform all the required validation checks. The detector assumes that this method is used to validate the ID token, and correctly identifies that the method in itself misses the signature validation and nonce validation checks. However, this method is called elsewhere, and its result is used in a place where the needed checks are in fact implemented.
The method in Listing 7.5 is implemented in an entirely different code file than the other relevant code, and effectively dodges the inherent single-file analysis design of FindBugs, which the analyses in this thesis are limited to. These false positives could be avoided by modifying the analysis to set stricter rules for the conditions that form an entry pattern. However, imposing such stricter rules would also yield some false negatives in certain cases.
1 private static IdTokenVerifier newIdTokenVerifier(Clock clock,
2         String issuerPrefix,
3         String projectId) {
4     return new IdTokenVerifier.Builder()
5             .setClock(clock)
6             .setAudience(ImmutableList.of(projectId))
7             .setIssuer(issuerPrefix + projectId)
8             .build();
9 }
Listing 7.5: The method in Firebase [35] that yielded three false positives.
Firebase was the only application with code where the Control flow ID token verification detector was relevant. It yielded four false positives.
One of the false positives was of the REVERSED_IF_EQUALS_ID_TOKEN_VERIFY pattern, shown in the method isSignatureValid in Listing 7.6. The check on line 3 is reversed from what is expected in the model that drives the Control flow ID token verification detector, which wants to see checks like !isValidABC(). Such a negated check does indeed appear in the method that calls this method, shown in Listing 7.7 in line 3. Cases like this can probably be accounted for if the analysis is re-engineered slightly to reason about inter-procedural heuristics, like the Co-existing Invocation Enforcement analysis does.
Listing 7.7: Method in Firebase that raised reversed if conditional check warning.
Listing 7.8: Method in Firebase that raised improper control flow warning.
Listing 7.9: Method in Firebase that raised improper control flow warning.
7.3.4 SonarQube
In the admin API of SonarQube [94], the Improper state verification detector yielded a false positive. This false positive is completely impossible for a tool like FindSecBugs to avoid. Listing 7.10 contains the step 2 callback method, getAuthorizationCode. The parse call in line 5 is the trigger of the analysis. However, the state parameter is not verified here. Instead, it is verified in another class, in the method shown in Listing 7.11 in line 2. That method verifies the value passively, and the type of the State parameter cannot be reasoned about at all. The way this code is implemented, it is impossible for the analysis to avoid the false positive. FindBugs detectors inherently cannot reason about facts that cross Java classes, since they visit one class at a time.
1 public AuthorizationCode getAuthorizationCode(HttpServletRequest callbackRequest) {
2     AuthenticationResponse authResponse = null;
3     try {
4         HTTPRequest request = ServletUtils.createHTTPRequest(callbackRequest);
5         authResponse = AuthenticationResponseParser.parse(request.getURL().toURI(), request.getQueryParameters());
6     } catch (ParseException | URISyntaxException | IOException e) {
7         throw new IllegalStateException("Error while parsing callback request", e);
8     }
9     if (authResponse instanceof AuthenticationErrorResponse) {
10        ErrorObject error = ((AuthenticationErrorResponse) authResponse).getErrorObject();
11        throw new IllegalStateException("Authentication request failed: " + error.toJSONObject());
12    }
13    AuthorizationCode authorizationCode = ((AuthenticationSuccessResponse) authResponse).getAuthorizationCode();
14    return authorizationCode;
15 }
16 }
Listing 7.10: Callback method in SonarQube yielding a false positive for the Improper state
verification detector.
Listing 7.12 shows the method that yielded five false positives in SonarQube. The analysis incorrectly thinks this method needs five checks, because it sends the token request and simply passes on its result with OIDCTokenResponseParser.parse(response) in line 7. It is, however, not this method that is required to actually have the checks, but rather the method that calls it. In this case, the calling method also missed the checks, and had five true positives.
It is still incorrect to flag getTokenResponse as a vulnerable method. However, as explained in Section 7.3.6, a case like this yields false positives even if the code correctly implements the checks elsewhere.
1 protected TokenResponse getTokenResponse(AuthorizationCode authorizationCode, String callbackUrl) {
2     try {
3         URI tokenEndpointURI = getProviderMetadata().getTokenEndpointURI();
4         TokenRequest request = new TokenRequest(tokenEndpointURI, new ClientSecretBasic(getClientId(), getClientSecret()),
5                 new AuthorizationCodeGrant(authorizationCode, new URI(callbackUrl)));
6         HTTPResponse response = request.toHTTPRequest().send();
7         return OIDCTokenResponseParser.parse(response);
8     } catch (URISyntaxException | ParseException e) {
9         throw new IllegalStateException("Retrieving access token failed", e);
10    } catch (IOException e) {
11        throw new IllegalStateException("Retrieving access token failed: "
12                + "Identity provider not reachable - check network proxy setting 'http.nonProxyHosts' in 'sonar.properties'");
13    }
14 }
Listing 7.12: The method in SonarQube that yielded five false positives.
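For reference, the following is a hedged sketch (not taken from SonarQube) of the kind of caller-side validation the detector expects, using the Nimbus SDK's IDTokenValidator; the issuer, client ID, signing algorithm and JWK set URL are assumed to be available from the provider configuration:

import com.nimbusds.jose.JWSAlgorithm;
import com.nimbusds.jwt.JWT;
import com.nimbusds.oauth2.sdk.id.ClientID;
import com.nimbusds.oauth2.sdk.id.Issuer;
import com.nimbusds.openid.connect.sdk.Nonce;
import com.nimbusds.openid.connect.sdk.claims.IDTokenClaimsSet;
import com.nimbusds.openid.connect.sdk.validators.IDTokenValidator;

import java.net.URL;

class CallerSideIdTokenValidationSketch {
    // Sketch only: validates signature, issuer, audience, expiry and nonce in one call.
    static IDTokenClaimsSet validate(JWT idToken, Nonce expectedNonce,
                                     Issuer issuer, ClientID clientId,
                                     URL jwkSetUrl) throws Exception {
        IDTokenValidator validator =
                new IDTokenValidator(issuer, clientId, JWSAlgorithm.RS256, jwkSetUrl);
        return validator.validate(idToken, expectedNonce);
    }
}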
7.3.5 Liferay
In the Liferay portal [52], the Improper state verification detector gave a false positive. This false positive arose for the same reason that the token validation gets a false positive in Codice (Section 7.3.6).
The validation of the state parameter happens in a different method, which calls the method triggering the analysis, and it is not possible to eliminate this false positive without the same effort described in Section 7.3.6. This is an inherent limitation of the simple detector-based analyses in FindBugs, which visit each class individually.
7.3.6 Codice
Listing 7.13 shows the method that yielded five false positives in Codice [14]. The analysis incorrectly thinks this method needs five checks, because it sends the token request and simply passes on its result in the final line, where it returns tokenSuccessResponse.getOIDCTokens(). It is, however, not this location that is required to actually have the checks, but the method that calls this one.
1 public static OIDCTokens getOidcTokens(
2         AuthorizationGrant grant,
3         OIDCProviderMetadata metadata,
4         ClientAuthentication clientAuthentication,
5         int connectTimeout,
6         int readTimeout)
7         throws IOException, ParseException {
8     final TokenRequest request =
9             new TokenRequest(metadata.getTokenEndpointURI(), clientAuthentication, grant);
10    HTTPRequest tokenHttpRequest = request.toHTTPRequest();
11    tokenHttpRequest.setConnectTimeout(connectTimeout);
12    tokenHttpRequest.setReadTimeout(readTimeout);
13    final HTTPResponse httpResponse = tokenHttpRequest.send();
14    LOGGER.debug(
15            "Token response: status={}, content={}",
16            httpResponse.getStatusCode(),
17            httpResponse.getContent());
18    final TokenResponse response = OIDCTokenResponseParser.parse(httpResponse);
19    if (response instanceof TokenErrorResponse) {
20        throw new TechnicalException(
21                "Bad token response, error=" + ((TokenErrorResponse) response).getErrorObject());
22    }
23    LOGGER.debug("Token response successful");
24    final OIDCTokenResponse tokenSuccessResponse = (OIDCTokenResponse) response;
25    return tokenSuccessResponse.getOIDCTokens();
26 }
Listing 7.13: A method that yielded five false positives in Codice/ddf [14].
The method that called this one was not flagged with warnings, as it passed the rules of the analysis. Instead, this method, which is called by the other one, gets the positives. The reason for this is a limitation in how the Co-existing Invocation Enforcement analysis handles inter-procedural heuristics. The Co-existing Invocation Enforcement analysis uses simple heuristics to approximate one piece of inter-procedural analysis, by going down in the call graph. This was modeled for the case where a certain check is delegated down to another method. Code structured like the pseudo code below will not give false positives, because the method verifyB() contains the needed checks and is called by the method getA(). Some instruction trigger() will alert the detector that a set of checks is required:
boolean verifyB(data) {
if check data.a;
if check data.b;
if check data....
}
Response getA() {
trigger() // a trigger that says checks are needed
if verifyB(data)
...
return response;
}
This case is solved, and the initial false positive is avoided, because verifyB() and getA() are "put aside" and double-checked after the initial screening analysis is done. getA() is then cleared from being a potentially vulnerable method when it is verified that it is associated with verifyB(), which contains the required checks. This is explained further in Chapter 6.2.2.
However, in the case of this set of false positives, the code has the opposite structure, essentially requiring the analysis to be able to go upwards in the call graph. At the “level” above the current method, verification might actually happen. Code like the snippet below will give false positives:
Trigger getTrigger(data) {
...
return trigger();
// Here the detector expected some checks!
}
Response getA() {
Trigger t = getTrigger();
if check a;
if check b;
if check...;
return response;
}
The detector notices that the method getA() contains checks, and will correctly say that it is safe. Still, the instruction trigger() alerts the detector that some checks are needed to follow the rules. However, this is a reversed association between the methods compared to the heuristic used in the other example above. The principle for detecting that these are in fact negatives could be similar: another set of associated methods could possibly be put aside and checked.
To do this, however, the whole analysis would need a redesign, since it is inherently intra-procedural with only a simple inter-procedural check. This requires a significant amount of time, and it might take a full month to properly extend the analysis to account for it. The “downward” checks are just based on another linear verification of the code, and still keep the detector rather simple. The same might be done for the code snippet above, but this may also affect the recall of the analysis, and introduces complexity to the detector.
The detector would then have to approximate what happens both upwards and downwards in the call graph with only simple heuristics. At that point one might as well compute a call graph, which introduces a different level of complexity and goes beyond the simple nature of the Co-existing Invocation Enforcement analysis.
Chapter 8
Discussion
The specificity of the model is both an advantage and a weakness. On one side, the specificity gives a clearer set of rules to enforce when performing automated analyses like the ones designed and implemented in this thesis. The rules are rooted in explicit code examples, and the specific SDKs are likely to be used by many applications. Hence, by adding rules for more SDKs, a significant part of the client applications on the web can be secured, as long as they use a popular and well-known SDK to support their implementation. However, the specificity also introduces some weakness, as it can give too narrow a view for some applications, and may miss some relevant vulnerabilities. Examples of this are shown in the evaluation of the implemented static analyses (Chapter 7.3). The restrictions of the model make the analyses prone to both false positives and false negatives. However, the demonstration shows that analyses based on the model found vulnerabilities in real-world code.
name several roles. In the development stage the entities of Identity Provider and Relying
Party must be developed.
The IdP is often delivered as a service or product by a large organization, and is likely to be put under a careful testing regime, which enhances its security. The IdP organizations often also provide their own type sets or development kits that ordinary developers can use to integrate their app with the IdP. Nevertheless, developers implementing their app are not unlikely to fail in writing some critical implementation details correctly. It is apparent from the analyses of the two SDKs and their developer guides [15, 19, 37] that the SDKs themselves do not necessarily give all the answers regarding security practices either. Instead, they often have very simple “get started” guides, and have limited security guidelines compared to the protocol specification.
The steps needed to securely implement a Relying Party or any other protocol entity are thoroughly explained in the official specification [60]. The SDKs provide developer guides [19, 37] showing simple code samples for what is needed to get started implementing the protocol flow. Even together, the specification and guides may be confusing sources, as the official specification contains a great deal of information, while the developer guides are rather short and simple.
Analysis of these SDKs and the specification can be distilled into a qualitative model of the flow containing a few concrete and simple rules for secure development of an RP, formed as a straightforward checklist. Even with such rules at hand, it can be bothersome or hard for a developer without security competence to realize the gravity of failing to implement a certain security feature. Developers are also prone to forgetting a certain check or performing a check in the wrong way. Therefore, in addition to serving as a guideline for developers, this model forms a knowledge framework for the simple static analyses, which in turn help enforce the rules of the protocol.
The recall of the tool in total was 95%, meaning most of the known vulnerable code was discovered. The tool also had a precision of 51% in total, and 61% for the Improper ID token verification detector, the only detector that found vulnerabilities. The recall looks promising, but the precision might look unsatisfactorily low at first sight, since developers generally prefer precise tools [13]. While the analysis has some weaknesses in today’s implementation, there are several arguments for why this metric alone should not be taken as a definitive measure of the strength and potential of the tool.
Firstly, the population of applications analyzed is not statistically significant, meaning the total result may be caused by coincidence. If the precision had turned out to be, for instance, 85% for these applications, this would still not be a metric that on its own could confidently describe the performance of the tool. The data from the validation are arguably more nuanced than that, and the qualitative insights are important for determining the potential strengths and weaknesses on a small data set. The only test characteristic that is fully satisfied by this case study is realism [26].
Three out of four detectors had no positives, and this in itself might skew the results. Additionally, a significant portion (50%) of the false positives came from one factor in the code structure of two applications. This is considered an outlier that greatly impacted the small data set, skewing the results. It can be argued that the outlier is due to an overly complex code structure², which is an anti-pattern in security code [3].
The analysis was, on the contrary, rather effective on the code bases that implemented OIDC in a way more similar to the developer guides provided by the SDKs [19, 37]. This makes sense, since the analyses are based on the developer-oriented model, which is greatly influenced by these guides.
The Improper ID token verification detector has a precision of 61% in total. On closer inspection, disregarding the outlier, a promising 83% precision would be found for the Improper ID token verification detector. Removing the outliers would leave only three false positives that cannot be dealt with³. This would be in the range of what is considered acceptable for most developers, who are generally not likely to accept a precision lower than 80% [13, 88].
By intuitive reasoning from the qualitative analysis in Chapter 7.3, probably 15 of the 20 false positives in this trial can be avoided by improving the details of the analysis. Doing this requires a substantial engineering effort. It may take a few weeks' work to fix without sacrificing recall, which was not possible within the time constraints of this thesis.
Instead of concluding directly based on precision and recall, however, it is more valuable to also emphasize the qualitative insights from the results⁴.
The metrics should also be interpreted in light of the limitations of the data. Precision and recall are sensitive to imbalanced data sets [90]. In a small data set it is more useful to look at the true negative rate (TNR) together with the precision and recall. While precision and recall can change significantly if exposed to outliers in small data sets, the true negative rate is not affected as much by coincidences. In contrast to the other two metrics, the true negative rate is robust when facing imbalanced data [90].
² Outliers are described in detail in Chapters 7.3.4 and 7.3.6.
³ The false positives related to ID token verification that cannot be fixed are explained in Chapter 7.3.3.
⁴ Qualitative insights into the details of the analyses are presented in Chapter 7.3.
The true negative rate was 90%, meaning that 9 out of 10 non-vulnerable cases were correctly predicted as negatives. This gives a more nuanced picture of the resistance to false positives, and shows that in most cases the analysis avoids giving false positives. The recall⁵ is only an indicative metric, which is more confidently calculated in controlled test suites. More empirical testing is, however, needed to learn more about how strongly the analyses can truly perform, as all the metrics are affected by the size of the data set.
⁵ See Chapter 2.6.3 about test suites.
The Improper ID token verification detector performed significantly better on the applications using the Google library than on the ones using the Nimbus SDK, and 15 of the 20 discovered vulnerabilities were found in applications using the Google library. Additionally, the results for the metrics differed significantly between these two groups of applications. Table 7.6 shows that the precision when analyzing the applications that use the Google library was 68% for the current analysis.
Finally, if the total precision should in fact turn out to be 51% in the end, even this might be acceptable because the code analyzed is security-critical. Sørensen et al. [88] found that developers are more interested in finding all the vulnerabilities than in having a high precision when looking at security-critical code.
As it stands, however, the precision of the tool as implemented today is not likely to be satisfactory for a general practitioner. The precision can nevertheless be increased significantly through a few weeks of careful engineering effort. This is an interesting avenue for further work.
FindSecBugs has been found to be quite precise in detecting several of the OWASP top 10 vulnerabilities [48]. However, most of the existing detectors (especially the ones which use techniques similar to the Improper ID token verification detector) can effectively be fooled into yielding false positives if the logic of the methods is structured in a certain way, similar to the outliers found in this study. This must be considered, since static analysis is dependent on the code structure.
The detectors are implemented using FindSecBugs, which has been found to have superior usability compared to other known tools [48]. FindSecBugs integrates easily into the workflow of the developer, which increases the chance that developers will use these analyses to eliminate vulnerabilities [88]. Before this work, no detectors for OpenID Connect vulnerabilities had been implemented in FindSecBugs, highlighting the novelty of this work.
The three other detectors gave no true positives, and thereby did not generate much data to analyze. This might be because they are narrow, or because it is less common for developers to make mistakes that introduce these vulnerabilities in their code.
The Insecure authorization grant detector has a very limited scope, and correctly predicted 9 true negatives. It is not the most important detector, but it contributes to avoiding special code smells. The Control flow ID token verification detector also has a very specialized scope, and only one of the six applications had code relevant to it. It had four false positives, three of which could be avoided if the detector is extended to look for a few more patterns. To eliminate the final false positive, it may have to get some inter-procedural attributes similar to the ones in the Co-existing Invocation Enforcement analysis.
Meanwhile, the Improper state verification detector had a total of two false positives, yielding a true negative rate of 84%. The two false positives were due to the code structure in the analyzed projects. One of these cases cannot be avoided, because the analysis is bound to one class, while the other can be fixed with some effort. The false negative, i.e. the vulnerability the Improper state verification detector could not detect⁶, will be detected if another corresponding detector is implemented to cover step 1 in the developer-oriented model⁷.
In comparison, the static analysis in this thesis (integrated into FindSecBugs) used 2.8 seconds to analyze the largest included project, Firebase, with 7000 lines of Java code. However, the analysis of SonarQube took 23 seconds, since it included an unknown volume of files from the dependencies that were packaged into the .jar file that was analyzed.
The solution proposed by Rahat et al. [76], OAuthLint, is the only one of the related works that uses static analysis to detect vulnerabilities, targeting the OAuth protocol in Android applications. They use a formal predicate language to query a control-flow graph of the program. The vulnerabilities they target are based on the limitations of OAuth, and their scope covers different vulnerabilities related to transfer protocols and local storage of data. In comparison, the analyses in this thesis seek to cover all the authorization code flow steps in OpenID Connect, with an emphasis on validation of data like the ID token and the state parameter. Such vulnerabilities cannot be detected by OAuthLint.
PrOfESSOS [54] by Mainka and Wich is the only related tool that scans applications for token forgery attacks, i.e. ID token validation vulnerabilities. They tested their penetration testing tool on 8 open-source Relying Party libraries. They discovered 22 ID token-related vulnerabilities which intersect with those the Improper ID token verification detector looks for. Additionally, they found 8 vulnerabilities from a novel attack they have proposed, called IdP Confusion. They do not give any information about false positives, and it is therefore not easy to reason about the precision of their analysis.
As mentioned above, their penetration testing tool only detects such vulnerabilities in already running applications, and the tool requires manual configuration. In comparison, the static analyses proposed in this thesis can mitigate these vulnerabilities during development. The Improper ID token verification detector found 19 equivalent vulnerabilities in six open-source code bases implementing client logic with ID tokens, plus one vulnerability they do not cover⁸. The Improper ID token verification detector is implemented in FindSecBugs, and can be run automatically without manual configuration.
As such, none of the other known works fills the space that this thesis addresses, as the first static analysis tool that detects ID token validation vulnerabilities. While the other tools are useful for detecting vulnerabilities, protecting already implemented applications, and assisting penetration testers in their security analysis of an application, all of these tools detect vulnerabilities very late in the development stage.
This thesis is therefore, to the best of my knowledge, the first work that uses simple static analysis techniques to detect vulnerabilities in OpenID Connect client applications. It also contains the first static analysis to detect ID token verification related vulnerabilities.
While the scope of the currently implemented analyses in this thesis is not as comprehensive as that of several of the penetration testing tools, the main driver for this effort is to provide assistance to developers. By implementing the analyses in a tool that is offered as an IDE plugin, vulnerabilities introduced in the code by developers can be picked up very early in the development phase. Any vulnerabilities that are impossible to find with static analysis may be picked up later by penetration testing tools.
⁸ The unique vulnerability detected in this thesis is the warning that an incomplete SDK-implemented validator is used. This is described in Chapter 7.3.1.
For other vulnerability classes, like injection, static analysis tools tend to complement penetration testing tools in what kind of vulnerabilities they find [6]. This may also be the case for certain vulnerabilities in OpenID Connect, and this work can therefore contribute well alongside the proposed penetration testing tools. While the detectors implemented using the techniques presented in this thesis detect a certain set of vulnerabilities, they are limited in what information they can get about the program, and some cases are better suited for a tool used at run time. The same factor goes the other way, as penetration testing tools cannot reason about details in implementation errors.
Given the simplicity of the mistakes that exist in a code base, it makes logical sense that simple security testing can be used to verify that these simple things are done right. Simple static analyses have several advantages. They are often easy to use, they are easy to produce, and they can obtain wide coverage. By integrating the analysis with the popular FindSecBugs tool, which has the FindBugs framework at its core, a wide range of usage scenarios is possible. One of the main arguments for using static analysis is that bugs can be detected early in the development phase. The developer could use the tool in their IDE, or set up the analyses in their continuous integration. In both cases, vulnerabilities can be mitigated early in the development life cycle.
One inherent challenge with simple static analysis techniques is that they are prone to false positives and false negatives. They are based on a simple model with a certain set of assumptions, and to some degree these assumptions must be satisfied. The analyses are restricted by what information the detectors can retrieve from the program across the code files. If the code base is very complex and introduces many abstractions, it will no longer be covered by a model which is originally rooted in rather simple examples.
However, there is an argument that security code should be as simple as possible, and that unnecessary complexity generally makes the code more prone to vulnerabilities [3]. If the code is too “dodgy” to pass through the analysis, one might argue that this is a sign of a code smell; hence, the static analysis technique may also enforce a certain coding style for the security code. Security-critical code should arguably be of a higher quality, and as simple and readable as possible. Code which is readable for a human is also likely to be more readable for a static analyzer.
Therefore, some of the false positives that come with the analysis can be considered style warnings. Even if the true bug does not exist, the complexity of a flagged code base might raise some red flags.
the results associated with these, as well as a clear explanation of the methods used.
The external validity of the thesis is mainly threatened by the example-based research strategy explained in Chapter 4.3. This strategy is based on illustrating results with some real-world examples. Results from these cannot be generalized, due to the lack of statistical significance that comes from a small data set. These code examples cannot confidently represent the average code base.
Additionally, two SDKs were selected for analysis of implementation details. These SDKs do not confidently represent all OpenID Connect SDKs, and analyses related to them cannot be generalized. While it is likely that most such SDKs have many similarities, since they implement the same protocol standard, the differences can be many. Even for these two SDKs, special cases had to be introduced in the analyses to account for a major difference in how the same functionality is implemented.
The test-retest reliability of the validation results is threatened by the size of the data set. If another experiment were conducted, the results risk being very different from the ones in this study. A proper large-scale empirical study is required to set a benchmark for the ability of this tool.
The internal consistency of the results is also threatened. Between the application groups for the two SDKs, the internal correlation of the subsets of the data in Table 7.6 differs greatly.
Chapter 9
Conclusions and Further Work
This thesis has proposed a developer-oriented model of OpenID Connect. It has been implemented and employed as a foundation for static analyses designed to help developers secure the critical protocol steps. These analyses, which only cover part of the protocol, uncovered 20 vulnerabilities in six open-source applications. The research done here resulted in the first static analysis that detects ID token validation vulnerabilities. This demonstrates that simple static analyses can be used to find security bugs in OpenID Connect clients.
The work in this thesis could provide both industrial and scientific value. Firstly, the analyses are implemented as extensions of the prevalent static analysis tool, Find Security Bugs. This tool is easy to use and widely downloaded, and it may enable a significant number of developers to improve the security of their OpenID Connect code. Secondly, the knowledge work done in this thesis may give further research an incentive to keep putting effort into simple, pragmatic techniques, as an avenue parallel to the complex models, for securing modern web applications.
The precision of the tool, as implemented today, was 51% in total, and 61% for the Improper ID token verification detector. Such a precision might not be satisfactory for a practitioner. However, practitioners are inclined to accept a lower precision if the code analyzed is security-critical. The precision of the analyses can be increased significantly through a few weeks of careful engineering effort. The tool also had a true negative rate of 89%, meaning that roughly 9 out of 10 negative cases were correctly predicted as negative, showing resistance to false positives. The recall of 95% indicates that the tool is effective at picking up vulnerabilities if they exist.
9.1 Conclusion
This thesis has attempted to answer two research questions through generating a qualitative
model of OpenID Connect, and design and implementation of a software component:
RQ1 What must a developer do to avoid introducing known security vulnerabilities, while
implementing a Relying Party with an OpenID Connect SDK?
RQ2 How can simple, explicit and intraprocedural static analysis checks be used to identify
vulnerabilities in OpenID Connect Relying Parties?
analyses done in the validation trial in this thesis have the potential to reach a precision of 83% if more work is put in to cover rare cases of complex code structure.
For the Improper ID token verification detector, 10 of its 13 false positives can be removed. These 10 false positives were due to the way the methods in the code are structured. The detector already has a check that infers inter-procedural checks by forward analysis “down” the call stack. If an equivalent check is implemented for backward calls in the call stack, these false positives can be avoided.
The Control flow ID token verification detector had three false positives which can be avoided if it is extended to look for a few more patterns. To eliminate the final false positive, its algorithm would have to be altered to get some inter-procedural attributes similar to the ones in the Co-existing Invocation Enforcement analysis.
Lastly, for the Improper state verification detector, the false negative that occurred can be eliminated by introducing another detector. Generally, more detectors will likely lead to fewer false negatives. One of its false positives can be avoided by applying the same solution as for the Improper ID token verification detector. Its last false positive is not possible to eliminate because of the limitations of FindBugs detectors.
[2] Alaca, F., and van Oorschot, P. C. Comparative Analysis and Framework Evaluating Web Single Sign-On Systems.
[3] Alenezi, M., and Zarour, M. On the relationship between software complexity and security. arXiv preprint arXiv:2002.07135 (2020).
[4] Allen, F. E. Control flow analysis. In ACM Sigplan Notices (1970), vol. 5, ACM, pp. 1–19.
[5] Allen, F. E. Interprocedural analysis and the information derived by it. In Programming Methodology (Berlin, Heidelberg, 1975), C. E. Hackl, Ed., Springer Berlin Heidelberg, pp. 291–321.
[6] Antunes, N., and Vieira, M. Comparing the effectiveness of penetration testing and static code analysis on the detection of SQL injection vulnerabilities in web services. In 2009 15th IEEE Pacific Rim International Symposium on Dependable Computing (2009), pp. 301–306.
[8] Ayewah, N., Pugh, W., Hovemeyer, D., Morgenthaler, J. D., and Penix, J. Using static analysis to find bugs. IEEE Software 25, 5 (2008), 22–29.
[9] Benantar, M. Access Control Systems: Security, Identity Management and Trust Models. Springer, 2006.
[10] Bocić, I., and Bultan, T. Finding access control bugs in web applications with CanCheck. In ASE 2016 - Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering (8 2016), Association for Computing Machinery, Inc, pp. 155–166.
[11] Calzavara, S., Focardi, R., Maffei, M., Schneidewind, C., Squarcina, M., and Tempesta, M. WPSE: Fortifying web protocols via browser-side security monitoring. In 27th USENIX Security Symposium (USENIX Security 18) (2018), pp. 1493–1510.
[12] Cho, J., Garcia-Molina, H., and Page, L. Efficient crawling through URL ordering. In Seventh International World-Wide Web Conference (WWW 1998) (1998).
[13] Christakis, M., and Bird, C. What developers want and need from program analysis: An empirical study. In Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering (New York, NY, USA, 2016), ASE 2016, ACM, pp. 332–343.
[14] DDF Distributed Data Framework - an open source, modular integration framework. [Link] Accessed in May 2020. Version: ddf-2.24.0.
[15] [Link]examples/openid-connect. Accessed in May 2020.
[17] Nimbus OAuth 2.0 SDK with OpenID Connect 1.0 extensions v8.2. [Link] [Link]/doc/[Link]/oauth2-oidc-sdk/8.2/[Link]. Accessed in May 2020, version 8.2 of the SDK.
[18] Nimbus OAuth 2.0 SDK with OpenID Connect extensions. [Link]products/nimbus-oauth-openid-connect-sdk. Accessed in May 2020.
[19] [Link]guides/java-cookbook-for-openid-connect-public-clients, 2018. Accessed in May 2020.
[20] [Link], 2015. Accessed in May 2020.
[22] Dalton, M., Kozyrakis, C., and Zeldovich, N. Nemesis: Preventing Authentication & Access Control Vulnerabilities in Web Applications. Tech. rep., 2009.
[24] Deepa, G., and Thilagam, P. S. Securing web applications from injection and logic vulnerabilities: Approaches and challenges. Information and Software Technology 74 (6 2016), 160–180.
[25] Deepa, G., Thilagam, P. S., Praseed, A., and Pais, A. R. DetLogic: A black-box approach for detecting logic vulnerabilities in web applications. Journal of Network and Computer Applications 109 (2018), 89–109.
[26] Delaitre, A. M., Stivalet, B. C., Black, P. E., Okun, V., Cohen, T. S., and Ribeiro, A. SATE V report: Ten years of static analysis tool expositions. Tech. rep., 2018.
[27] El Kateb, D., El Rakaiby, Y., Mouelhi, T., and Le Traon, Y. Access control enforcement testing. In Proceedings of the 8th International Workshop on Automation of Software Test (2013), IEEE Press, pp. 64–70.
[28] Eliassoren/find-sec-bugs. [Link]tree/feature/evaluation-opensource. Accessed in June 2020. Repository for the implemented detectors, in the branch used for evaluation.
[30] Emanuelsson, P., and Nilsson, U. A comparative study of industrial static analysis tools. Electron. Notes Theor. Comput. Sci. 217 (July 2008), 5–21.
[31] Espinoza, A. M., Knockel, J., Comesaña-Alfaro, P., and Crandall, J. R. V-DIFT: Vector-based dynamic information flow tracking with application to locating cryptographic keys for reverse engineering. In 2016 11th International Conference on Availability, Reliability and Security (ARES) (Aug 2016), pp. 266–271.
[32] Fang, Z., Zhang, Y., Kong, Y., and Liu, Q. Static detection of logic vulnerabilities in Java web applications. Security and Communication Networks 7, 3 (2014), 519–531.
[33] Fett, D., Küsters, R., and Schmitz, G. A comprehensive formal security analysis of OAuth 2.0. In Proceedings of the ACM Conference on Computer and Communications Security (2016).
[34] Fett, D., Küsters, R., and Schmitz, G. The Web SSO Standard OpenID Connect: In-depth Formal Security Analysis and Security Guidelines. In Proceedings - IEEE Computer Security Foundations Symposium (2017).
[37] [Link]openid-connect, April 2020. Accessed in April 2020.
[41] Hardt, D., Ed. The OAuth 2.0 authorization framework. RFC 6749, Internet Engineering Task Force (IETF), October 2012.
[42] Henry, S., and Kafura, D. Software structure metrics based on information flow. IEEE Transactions on Software Engineering SE-7, 5 (Sep. 1981), 510–518.
[44] Hovemeyer, D., and Pugh, W. Finding bugs is easy. SIGPLAN Not. 39, 12 (Dec. 2004), 92–106.
[45] Jones, M., Bradley, J., and Sakimura, N. JSON Web Token (JWT). RFC 7519, Internet Engineering Task Force (IETF), May 2015.
[46] Khalid, M. N., Farooq, H., Iqbal, M., Alam, M. T., and Rasheed, K. Predicting web vulnerabilities in web applications based on machine learning. In Intelligent Technologies and Applications (Singapore, 2019), I. S. Bajwa, F. Kamareddine, and A. Costa, Eds., Springer Singapore, pp. 473–484.
[47] Kronjee, J., Hommersom, A., and Vranken, H. Discovering software vulnerabilities using data-flow analysis and machine learning. In ACM International Conference Proceeding Series (8 2018), Association for Computing Machinery.
[48] Li, J., Beba, S., and Karlsen, M. M. Evaluation of open-source IDE plugins for detecting security vulnerabilities. In Proceedings of the Evaluation and Assessment on Software Engineering (New York, NY, USA, 2019), EASE '19, ACM, pp. 200–209.
[49] Li, W., and Mitchell, C. J. Security issues in OAuth 2.0 SSO implementations. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2014).
[50] Li, W., Mitchell, C. J., and Chen, T. Mitigating CSRF attacks on OAuth 2.0 and OpenID Connect.
[51] Li, W., Mitchell, C. J., and Chen, T. OAuthGuard: Protecting user security and privacy with OAuth 2.0 and OpenID Connect. In Proceedings of the ACM Conference on Computer and Communications Security (2019).
[53] Luotonen, A., and Altis, K. World-Wide Web proxies. Computer Networks and ISDN Systems 27, 2 (1994), 147–154.
[54] Mainka, C., Mladenov, V., Schwenk, J., and Wich, T. SoK: Single Sign-On Security - An Evaluation of OpenID Connect. In Proceedings - 2nd IEEE European Symposium on Security and Privacy, EuroS and P 2017 (2017).
[59] Muthukumaran, D., O'Keeffe, D., Priebe, C., Eyers, D., Shand, B., and Pietzuch, P. FlowWatcher: Defending against data disclosure vulnerabilities in web applications. In Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security (2015), ACM, pp. 603–615.
[62] Navas, J., and Beltrán, M. Understanding and mitigating OpenID Connect threats. Computers and Security (2019).
[63] Near, J. P., and Jackson, D. Finding security bugs in web applications using a catalog of access control patterns. In Proceedings - International Conference on Software Engineering (5 2016), vol. 14-22-May-2016, IEEE Computer Society, pp. 947–958.
[68] Pellegrino, G., Catakoglu, O., Balzarotti, D., and Rossow, C. Uses and abuses of server-side requests. In International Symposium on Research in Attacks, Intrusions, and Defenses (2016), Springer, pp. 393–414.
[72] OWASP Find Security Bugs - the community static code analyzer. https://[Link]/presentations/2019-09-12-appsecglobaldc/OWASP_Find-Security_Bugs.pdf, 2019. Accessed in April 2020.
[74] Find Security Bugs - the SpotBugs plugin for security audits of Java web applications. [Link], 2019. Accessed in April 2020.
[75] Potter, B., and McGraw, G. Software security testing. IEEE Security Privacy 2, 5 (Sep. 2004), 81–85.
[76] Rahat, T. A., Feng, Y., and Tian, Y. OAuthLint: An empirical study on OAuth bugs in Android applications. In Proceedings of the 34th IEEE/ACM International Conference on Automated Software Engineering (2019), ASE '19, IEEE Press, pp. 293–304.
[77] Razzaq, A., Latif, K., Farooq Ahmad, H., Hur, A., Anwar, Z., and Bloodsworth, P. C. Semantic security against web application attacks. Information Sciences 254 (1 2014), 19–38.
[78] Roche, E., and Schabes, Y. Finite-State Language Processing. MIT Press, 1997.
[79] Russell, R., Kim, L., Hamilton, L., Lazovich, T., Harer, J., Ozdemir, O., Ellingwood, P., and McConley, M. Automated vulnerability detection in source code using deep representation learning. In 2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA) (2018), IEEE, pp. 757–762.
[80] Sadqi, Y., Belfaik, Y., and Safi, S. Web OAuth-based SSO systems security. In Proceedings of the 3rd International Conference on Networking, Information Systems Security (New York, NY, USA, 2020), NISS2020, Association for Computing Machinery.
[81] Seng, L. K., Ithnin, N., and Said, S. Z. M. The approaches to quantify web application security scanners quality: a review. International Journal of Advanced Computer Research 8, 38 (2018), 285–312.
[82] Shaw, M. What makes good research in software engineering? International Journal on Software Tools for Technology Transfer 4, 1 (2002), 1–7.
[83] Son, S., and Shmatikov, V. SaferPHP: Finding semantic vulnerabilities in PHP applications. In Proceedings of the ACM SIGPLAN 6th Workshop on Programming Languages and Analysis for Security (2011), ACM, p. 8.
[85] Sun, F., Xu, L., and Su, Z. Static detection of access control vulnerabilities in web applications. In USENIX Security Symposium (2011), vol. 64.
[86] Sun, S. T., and Beznosov, K. The devil is in the (implementation) details: An empirical analysis of OAuth SSO systems. In Proceedings of the ACM Conference on Computer and Communications Security (2012).
[88] Sørensen, E. B., Karlsen, E. K., and Li, J. What Norwegian developers want and need from security-directed program analysis tools: A survey. In Proceedings of the Evaluation and Assessment in Software Engineering (New York, NY, USA, 2020), EASE '20, Association for Computing Machinery, pp. 505–511.
[89] Lodderstedt, T., Ed., McGloin, M., and Hunt, P. OAuth 2.0 Threat Model and Security Considerations. Tech. rep., 2013.
[92] OWASP Top 10 - 2017: The ten most critical web application security risks. https://[Link]/images/b/b0/OWASP_Top_10_2017_RC2_Final.pdf, 2017. Added in September 2019.
[95] Wang, R., Zhou, Y., Chen, S., Qadeer, S., Evans, D., and Gurevich, Y. Explicating SDKs: Uncovering assumptions underlying secure authentication and authorization. In Proceedings of the 22nd USENIX Security Symposium (2013).
[96] Yang, R., Lau, W. C., Chen, J., and Zhang, K. Vetting single sign-on SDK implementations via symbolic reasoning. In 27th USENIX Security Symposium (USENIX Security 18) (2018), pp. 1459–1474.
[97] Yang, R., Li, G., Lau, W. C., Zhang, K., and Hu, P. Model-based security testing: An empirical study on OAuth 2.0 implementations. In ASIA CCS 2016 - Proceedings of the 11th ACM Asia Conference on Computer and Communications Security (May 2016), Association for Computing Machinery, Inc, pp. 651–662.
[98] Zheng, Y., and Zhang, X. Path sensitive static analysis of web applications for remote code execution vulnerability detection. In 2013 35th International Conference on Software Engineering (ICSE) (2013), IEEE, pp. 652–661.
[99] Zhou, Y., and Evans, D. SSOScan: Automated testing of web applications for single sign-on vulnerabilities. In Proceedings of the 23rd USENIX Security Symposium (2014).
[100] Zhu, J., Chu, B., and Lipford, H. Detecting privilege escalation attacks through instrumenting web application source code. In Proceedings of ACM Symposium on Access Control Models and Technologies, SACMAT (6 2016), vol. 06-08-June-2016, Association for Computing Machinery, pp. 73–80.
[101] Zhu, J., Chu, B., Lipford, H., and Thomas, T. Mitigating access control vulnerabilities through interactive static analysis. In Proceedings of ACM Symposium on Access Control Models and Technologies, SACMAT (6 2015), vol. 2015-June, Association for Computing Machinery, pp. 199–209.
APPENDIX A. RESEARCH PAPER OF PRECURSORY WORK
2 RELATED WORK
Christakis and Bird conducted a survey targeting developers in Microsoft [4]. They investigated developers' requirements to static analyzers and compiled a ranked list of barriers against their use. Among their primary findings were:
• Developers want the opportunity to customize analyzers.
• Programmatic annotations are most preferable, before rules given in global configuration files and annotations coded in comments.
• 90% of developers accept 5% false positives, 50% of developers accept 15% false positives, and only 24% of developers accept more than 20% false positives.
• There is a 50/50 distribution in preferences for soundness versus completeness.
Thomas et al. [27] conducted a study investigating the implications of interactive code annotations within the IDE. Their main findings were that it is easy to write annotations for access control logic, but hard to find causes of vulnerabilities. Even non-security people were able to describe access control policies with reasonable effort.
Sadowski et al. [22] worked with a Google project called Tricorder. Their main findings include:
• Low false alarm rate is important.
• It is important to allow customization at project level, and not only at user level.
• Analysis tools should not only find, but also fix bugs. Tools that automatically apply fixes reduce the need for context switches.
• Program analysis tools should be shardable to ensure that analyses can run at large scale.
Tripp et al. [28] present a tool called ALETHEIA. Their main idea is to apply statistical learning to user-tailor warning output. The tool learns from feedback on a smaller set of warnings. They confirm the well-known finding that developers are very bothered by an excess of false positives.
Tymchuk et al. [29] interviewed experienced developers to understand how they were influenced by an IDE tool providing just-in-time feedback for good coding practices. Usefulness of analyzers in different situations was assessed, and they gathered feedback about the behavior of the tool. They found that the main negative issues of static analyzers are false positives, unclear explanations, annoying user experience, and annoying rules.
Li et al. [14] performed an experimental validation of various open-source IDE plugins that detect security vulnerabilities. They investigated vulnerability class scopes, quality of detection, and user-friendliness of tool warnings. They found a mismatch between the claimed and actual coverage of the tools, as well as unexpectedly high false positive rates. Several tools had limited information in their output, with drawbacks such as imprecise or lacking explanations of vulnerabilities. Another issue was missing opportunities to direct a tool; some tools are only able to scan full code bases, and not smaller units.

Research guidelines suggested by Kitchenham and Pfleeger [12] and by Oates [19] were also considered in the design process. Christakis and Bird's survey of Microsoft developers [4] was a particularly important influence.
The main purpose of the questionnaire was to explore developers' relative preferences between opposing tool characteristics, and their thoughts on various challenges with and requirements of tools. The questionnaire was distributed to approximately 750 consultants, from seven consultancy firms. Each of the invitees received a reminder a few days after the initial invitation, and the questionnaire was open for a week.
After reviewing relevant literature and similar surveys, six statistical hypotheses, listed in Table 1, were selected. While hypotheses 1–5 lay wholly within the scope of RQ2, hypothesis 6, which does not concern a background-specific relation, does not.
Two details concerning the formulation of the six hypotheses should be clarified: First, many hypotheses, and survey questions, concern the relative preference between soundness and completeness, on the underlying assumption that increased performance with regard to one attribute necessitates a decrease in performance with the other attribute. Thus, when the phrase "prefers soundness over completeness" is used, it means only that one is willing to sacrifice some degree of completeness for increased soundness, not that one would not ideally want both. Second, for brevity, the qualifier "software consultants" is generally omitted from the hypotheses, and the shorthand "tools" is used to mean specifically "program analysis tools for detecting access control vulnerabilities".
Inferential statistical analysis involving ordinal data is a contested, methodologically challenging issue [1, 23, 30]. It is especially difficult to assess when classical parametric statistical tests are applicable, and how to safely prepare ordinal data for use with such tests. To err on the side of caution, we opted to use only non-parametric tests in the analysis. In particular, Kendall's Tau rank correlation coefficient was used for the majority of hypotheses, as it is a natural choice for investigating relationships involving ordinal variables representing preferences. The downside of using non-parametric tests is that they generally have lower statistical power than parametric ones; non-parametric tests are more likely to result in type II errors, where one fails to reject a false null hypothesis. For the purposes of our study, we regard it preferable to err on the side of rejecting a hypothesized relation, rather than erroneously concluding one exists when that is not the case.

3.1 Semi-structured interviews
Three respondents were invited to semi-structured, follow-up interviews after participating in the questionnaire. The focus of these interviews was to get insight into how these respondents interpreted the questions and gave their responses, to discover potential weaknesses in the survey design and enhance understanding of the survey data and results. The qualitative responses were analyzed semantically, though not with any formal coding framework.

Figure 1: Distribution of relative preference for soundness versus completeness
Figure 2: Joint distribution of relative preference for soundness vs. completeness and accepted rate of false positives

work as system administrators, and the remaining 4% work with management.

4.1 RQ1: Non-functional characteristics
The survey explored where in the development cycle consultants would prefer to use a program analysis tool, i.e. their preferred workflow integration point. "Direct integration in an IDE" was the most popular option, before "integration in a Continuous Integration/Continuous Delivery (CI/CD) pipeline" and "integration in the (local) build process". Further, the consultants were asked to indicate their relative preference between the conflicting attributes soundness and completeness. The ordinal data illustrated in Figure 1 show a total of 51% preferring to find as many critical errors as possible (soundness), while only 37.5% of the consultants viewed having fewer false positives as the more important attribute when detecting data leaks.
Figure 2 displays the relation between the answers for relative preference for soundness vs. completeness and accepted rate of false positives.
The consultants were also asked to weigh the characteristics "more automatic, but less precise" and "more precise, but more work with annotations" against each other. The bar plot in Figure 3 shows that the majority of the respondents lean towards the neutral ground, which highlights the importance of balance in the tools.

4.2 RQ2: Background-related effects
Hypothesis 1: Consultants working with security-critical systems tend to have higher relative preference for annotation-based versus automated tools than consultants who do not work with such systems.
Among the respondents, 49 persons work with security-critical systems, while 19 persons do not work with such systems. To test
Figure 3: Distribution of relative preference between the characteristics a) "more automatic, but less precise" and b) "more precise, but more work with annotations"
Figure 4: Joint distribution of relative preferences for soundness vs. completeness and for automatic vs. annotation-based tools

they would like vulnerabilities reported, which other vulnerabilities they find important, and what challenges they see with using program analysis tools for security. The follow-up interviews also gave valuable insights.

5.1 Workflow integration point
After giving their opinion on their preferred workflow integration point, each participant was asked to elaborate further. Several interesting responses provided potentially valuable insights. One consultant discussed possibilities of integrating program analysis in the implementation step in the development cycle: "In the day to day basis I would find it natural that program analysis is executed in the build process, for example in Jenkins".
An experienced consultant provided other perspectives in their written response: "I think this is difficult to answer. For me it is natural that such access control is something that is tested in integration test and is defined in code. It should be part of an active development of an API, and one should construct it 100% restrictively, to then open the security following needs. For me this is a part of the craft of doing software development. If we should have some architecture that 'automagically' understands business rules, then it is nice to have it running in the production systems, either as firewall or other rule-based systems. But the rules must still be written?"
The last response indicates a natural skepticism, and points to the complexity of most software systems, which makes it hard to trust that a tool can handle such complexity. The response also suggests a lower need for pedagogical tooling for more senior developers. Several consultants liked the idea of using program analysis tools during code review, or as part of a CI/CD pipeline.

5.2 Vulnerability report formats
After rating various vulnerability report methods and formats, respondents were asked for their own suggestions. One developer would like the opportunity to have access control vulnerabilities trigger compilation errors, so the code can not execute until the issue is fixed. Several respondents pointed out that they would like the output of program analysis in logs. That way they may configure dashboard-based, mail-based or other types of reporting on their own.
An important aspect of having the tool output during build or in CI/CD was that the build or pipeline must break if there is a vulnerability. Otherwise the vulnerability report could easily drown among other log warnings. One respondent wanted the output to result in warnings in the local build process, but to result in errors when code is processed in a CI pipeline. It was pointed out how program analysis could, and should, be used in harmony with other protection methods: "Whatever can be detected automatically should be detected as early as possible, then via either IDE, build or CI. Meanwhile, I would believe that some things are detectable only via a larger penetration test that is carried out by experts, who then typically would write a report from their test."
Others worried about the time load tools could carry with them: "One would not like things to take a long time, so the IDE is preferable. However, not if it heavily burdens the performance of the IDE, then it is better to put analysis later. So the answer to the questions depends on how high a load the tool puts on each step." This respondent also highlighted the importance of analyzing during run-time in addition to static analysis. This motivation for several analysis modes may come from the fact that attack vectors change over time.

5.3 Challenges with program analysis
At the end of the questionnaire the respondents were asked to describe any challenges they could see with using program analysis to detect access control vulnerabilities. The following are some of the challenges with program analysis tools that the developers in consulting mentioned:
• Properly detecting complex patterns and contexts is challenging.
• Developers may turn off an analysis if it takes too long.
• Results of analysis may appear in a hidden place, somewhere the developer must actively seek.
• The tool may be too generic for the domain.
• Result credibility is weakened with too many false alarms.
• There is lacking trust that a tool will be able to detect errors, due to the complexity in software development.
• Poor performance in soundness or completeness is challenging.
• There are probably situations in which non-standard program behavior is misunderstood by the tool.
• When the tool usability is too bad, developers will not use it.

5.4 Follow-up interviews
Three of the consultants were taken into follow-up interviews, in which they got to review their questionnaire responses and provide thoughts about the topics in question. The following overall insights and opinions came from these three interviews:
• High soundness and completeness is more important than where in the workflow a tool is used.
• A tool that learns from code practices and version control history may be valuable.
• The idea that the tool has a faster in-editor mode, and a slower mode that runs later, is acceptable.
• The false positive rate may be higher if vulnerabilities are presented in an orderly, ranked manner.
• License fees are a possible barrier from usage of program analysis tools.
• A configurable tool sounds intriguing. However, the configuration must be easily understandable, and the defaults must be sensible.
• One of the respondents thinks a tool should focus on finding and ranking more intricate vulnerabilities.

6 DISCUSSION

6.1 Comparison with related work
The data illustrated in Figure 1 shows that nearly 1.4 times as many chose soundness, suggesting significant difference. 31% of the respondents found soundness to be "most important" (a score of 5). Including the scores of "most important" and "slightly more important" as weights would give a preference ratio of nearly 1.5 in the favor of soundness, which further solidifies the overall preference. This study uses a narrower range for false positive acceptance rates than what was used by Christakis and Bird [4], illustrated
in Figure 2. There is a misalignment between what is considered few false positives by participants, and what is considered few in research [4]. This apparent cognitive dissonance may explain the contrast that comes from the majority also viewing soundness as the most important factor. Interestingly, as indicated by the follow-up interviews, developers may prefer completeness in the early stage of development, and allow soundness in the later stages together with other thorough testing. The most preferred workflow integration point, embedding inside an IDE, aligns with the findings of Christakis and Bird. However, as indicated by interviewees, the preferred workflow integration as well as other responses depend on where in the development process the project is. The preference for using annotations had a near 50/50 distribution, as shown in Figure 3. An IDE-integrated tool should be fast, while a CI-based tool could possibly be allowed to scan code all night. Worry about license fees is one barrier against adopting tools in consulting, as well as lack of trust in the performance of open-source tools. These participants' worries are also confirmed by recent research [14]. Neither degree of general experience, experience with security-critical systems, nor amount of security-oriented education significantly influence developers' relative preferences for the opposing tool characteristics soundness versus completeness and automatic versus annotation-based.
The results of the hypothesis tests do not suggest any obvious, new guidelines for tailoring an analysis tool to the preferences of background-specific subsets of developers. However, hypothesis test 6 suggests that there exists a subset of developers who are positively inclined towards tools and more willing to make sacrifices to utilize their strengths in the development process. Still, most developers will avoid using a tool that has bad usability or is lacking in non-functional characteristics. Therefore, designers of program analysis tools should adapt to the process of software developers in order to provide proper value to the development, a point that confirms ideas from related works [4, 22, 28, 29].
Several solutions in the state of the art of program analysis for access control analysis do not have a clear usability perspective. Even the ones claiming to use "interactive communication" as a usability factor in their solution [33] have been found to come with major drawbacks regarding usability and other non-functional characteristics [14]. The worry about false positives is ever apparent, and it is unclear what a realistic false positive rate should be, though the preference towards soundness when mitigating data leaks suggests some acceptance. The developers do not want a strictly automatic or strictly annotation-based tool, but prefer to use something adaptable.

6.2 Threats to validity
The internal validity of the practitioner survey is threatened by biased and imprecise questions that still persisted after trials of testing. Another internal limitation is different understanding of terms. A term properly defined before the question may be missed or skipped by the respondent. Qualitative responses were translated from Norwegian, which carries the risk of semantics getting lost in translation.
The external validity is mainly threatened by sampling bias. A few consulting firms that were accessible through contacts of the researchers were selected, and among them most consultants were invited. No probabilistic sampling was done, and the survey relies on self-selection. The mass of the respondents may still be large enough so that results can generalize to other consulting firms in Norway. The questionnaire was sent out to around 750 consultants, among which 400 were invited by e-mail, while the remaining 350 were invited by channel posts in work place chat services. Use of a chat service rather than email imposes a greater risk of several potential participants never being properly exposed to the invitation, as the message quickly drowns. The response rate of 10.5% may threaten the generalizability of the study, but given that the 80 respondents come from seven different firms with various business areas, the sample may be an acceptable representation of the Norwegian IT consulting industry. The validity of comparison to related work is also threatened by differences in development culture, so it is hard to draw conclusions reaching outside of Norway for this sample. Additionally, each industry may have different software development life cycles, standards and environments, which means that the preferences of these consultants may not apply to developers with slight differences regarding these factors.

7 CONCLUSION AND FURTHER WORK
This paper surveys and analyses the preferences of Norwegian software consultants in program analysis tools for detecting access control vulnerabilities. 80 IT consultants from seven Norwegian consulting firms were surveyed for their opinions, with embedded long text answers and follow-up interviews.
We find that high soundness is considered more important than high completeness when uncovering ACVs, and observe a near 50/50 preference distribution between fully automated and annotation-based tools. Of the developers surveyed, 51% prefer soundness over completeness when detecting ACVs, and only 37.5% consider completeness the more important characteristic.
The quantitative analysis shows that neither degree of general experience, experience with security-critical systems, nor amount of security-oriented education significantly influence developers' relative preferences for the opposing tool characteristics soundness versus completeness and automatic versus annotation-based.
However, the survey data suggests there exists a group of developers who are more positively inclined towards program analysis tools and more willing to make sacrifices to utilize their strengths in the development process.
The preferences regarding opposing characteristics explored in this paper may be determined by additional context-dependent factors, like project life cycle stage and kind of vulnerability. Hence, an interesting avenue for future work is to delve deeper into the various contexts to explore subtle influences over preference of tool usage. There may also exist other relations like the one explored by hypothesis 6, which could be explored. Finally, it would be interesting to look deeper into the statistical nature of the opinions in a larger-scale study with probabilistic sampling, potentially expanding to a wider population.
For five of the applications, the files analyzed were retrieved from the original sources [7, 14,
35, 52, 102] and had to be slightly altered so that they could be analyzed. The altered code
can be found in the Eval project [29]. The sixth application, SonarQube [94], was built and
analyzed with its original code.
APPENDIX B. RAW DATA FROM EVALUATION
FindBugs Report
Project Information
Project:
Code analyzed:
/home/elias/git/masterthesis/new-findsecbugs/Oidc-FindSecbugs-Eval/Eval/zopspace/target/zopspace-
[Link]
Metrics
0 lines of code analyzed, in 0 classes, in 1 packages.
Contents
Security Warnings
Details
Summary
Warning Type Number
Security Warnings 6
Total 6
Warnings
Security Warnings
Code Warning
SECIIDTV    According to the OpenID Connect specification, ID Tokens must be validated by the Relying Party (Client). There are five values in the ID token response that must be verified todo1.
SECMVIDT    According to the OpenID Connect specification, ID Tokens must be validated by the Relying Party (Client). There are five values in the ID token response that must be verified todo2.
SECMVTISSU  ID Tokens must be validated by the Relying Party (Client). iss parameter validation
SECMVTSIGN  ID Tokens must be validated by the Relying Party (Client). Cryptographic validation
SECUIDTV    According to the OpenID Connect specification, ID Tokens must be validated by the Relying Party (Client). There are five values in the ID token response that must be verified.
Details
// todo
You seem to be missing such validation in the code locations where you implement the token request flow.
There are five values in the ID token response that must be verified.
You may use an SDK-implemented validation if this implements all these checks.
Otherwise it is recommended to do these comparisons yourself.
// todo
// todo
USING_INCOMPLETE_ID_TOKEN_VALIDATOR: Using incomplete SDK-implemented ID Token validation.
According to the OpenID Connect specification, ID Tokens must be validated by the Relying Party (Client).
You seem to be using an ID token validation method implemented by an SDK which is known to be
incomplete, as it does not implement all five checks. There are five values in the ID token response that must
be verified. In Google's APIs, for example, the com/google/api/client/auth/openidconnect/IdTokenVerifier
fails to implement the checks for validating the key signatures and nonce. Meanwhile, the
[Link] implements crypto signature validation, but it still misses the nonce check ([Link]
client/src/main/java/com/google/api/client/googleapis/auth/oauth2/[Link]). Otherwise it
is recommended to do these comparisons yourself.
// todo
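For illustration only (this sketch is not part of the generated report), the checks that the incomplete SDK validator omits could be supplemented roughly as follows when relying on Google's IdTokenVerifier. The issuer, client ID and expected nonce shown here are assumed placeholder values, and the signature check still has to be added separately:

import com.google.api.client.auth.openidconnect.IdToken;
import com.google.api.client.auth.openidconnect.IdTokenVerifier;
import com.google.api.client.json.jackson2.JacksonFactory;
import java.util.Collections;

public class SupplementedIdTokenCheck {

    // Illustrative values; a real client uses its registered metadata.
    private static final String ISSUER = "https://accounts.example.com";
    private static final String CLIENT_ID = "my-client-id";

    public static boolean verify(String idTokenString, String expectedNonce) throws Exception {
        IdToken idToken = IdToken.parse(JacksonFactory.getDefaultInstance(), idTokenString);

        // Covers issuer, audience and expiry, but NOT signature and nonce.
        IdTokenVerifier verifier = new IdTokenVerifier.Builder()
                .setIssuer(ISSUER)
                .setAudience(Collections.singleton(CLIENT_ID))
                .build();
        if (!verifier.verify(idToken)) {
            return false;
        }

        // Supplement the nonce comparison that the SDK validator omits.
        Object nonce = idToken.getPayload().get("nonce");
        return expectedNonce.equals(nonce);
        // A complete client must also verify the JWS signature against the
        // provider's published keys before trusting any of the claims.
    }
}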
FindBugs Report
Project Information
Project:
Code analyzed:
/home/elias/git/masterthesis/new-findsecbugs/Oidc-FindSecbugs-Eval/Eval/atricore/target/[Link]
Metrics
0 lines of code analyzed, in 0 classes, in 1 packages.
Contents
Security Warnings
Details
Summary
Warning Type Number
Security Warnings 1
Total 1
Warnings
Security Warnings
Code Warning
SECMVIDT    According to the OpenID Connect specification, ID Tokens must be validated by the Relying Party (Client). There are five values in the ID token response that must be verified todo2.
Details
MISSING_VERIFY_ID_TOKEN: Missing validation of ID Token.
According to the OpenID Connect specification, ID Tokens must be validated by the Relying Party (Client).
You seem to be missing such validation in the code locations where you implement the token request flow.
There are five values in the ID token response that must be verified.
You may use an SDK-implemented validation if this implements all these checks.
Otherwise it is recommended to do these comparisons yourself.
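For illustration only (not part of the generated report), performing the five checks with an SDK-implemented validator could look roughly as follows with the Nimbus OAuth 2.0 / OpenID Connect SDK. The issuer, client ID, JWK set URL and signing algorithm are assumed placeholder values:

import com.nimbusds.jose.JWSAlgorithm;
import com.nimbusds.jwt.JWT;
import com.nimbusds.jwt.JWTParser;
import com.nimbusds.oauth2.sdk.id.ClientID;
import com.nimbusds.oauth2.sdk.id.Issuer;
import com.nimbusds.openid.connect.sdk.Nonce;
import com.nimbusds.openid.connect.sdk.claims.IDTokenClaimsSet;
import com.nimbusds.openid.connect.sdk.validators.IDTokenValidator;
import java.net.URL;

public class IdTokenCheck {

    // Checks signature, issuer, audience, expiry and the expected nonce in one call.
    public static IDTokenClaimsSet validate(String rawIdToken, Nonce expectedNonce) throws Exception {
        // Placeholder provider metadata; a real client reads these from its configuration.
        Issuer issuer = new Issuer("https://op.example.com");
        ClientID clientID = new ClientID("my-client-id");
        URL jwkSetURL = new URL("https://op.example.com/jwks.json");

        IDTokenValidator validator = new IDTokenValidator(issuer, clientID, JWSAlgorithm.RS256, jwkSetURL);
        JWT idToken = JWTParser.parse(rawIdToken);
        return validator.validate(idToken, expectedNonce); // throws if any of the checks fail
    }
}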
FindBugs Report
Project Information
Project:
Code analyzed:
/home/elias/git/masterthesis/new-findsecbugs/Oidc-FindSecbugs-Eval/Eval/firebase/target/[Link]
Metrics
0 lines of code analyzed, in 0 classes, in 1 packages.
Contents
Security Warnings
Details
Summary
Warning Type Number
Security Warnings 10
Total 10
Warnings
Security Warnings
Code Warning
SECIIDTV    According to the OpenID Connect specification, ID Tokens must be validated by the Relying Party (Client). There are five values in the ID token response that must be verified todo1.
SECIIDTV    According to the OpenID Connect specification, ID Tokens must be validated by the Relying Party (Client). There are five values in the ID token response that must be verified todo1.
SECITVCF    When performing an ID token check the specification requires that you return HTTP 401.
SECITVCF    When performing an ID token check the specification requires that you return HTTP 401.
SECITVCF    When performing an ID token check the specification requires that you return HTTP 401.
SECMVNONCE  ID Tokens must be validated by the Relying Party (Client). The nonce is a cryptographically opaque value, like the state value, which binds an authentication request to the ID Token.
SECMVNONCE  ID Tokens must be validated by the Relying Party (Client). The nonce is a cryptographically opaque value, like the state value, which binds an authentication request to the ID Token.
SECMVTEXP   ID Tokens must be validated by the Relying Party (Client). Freshness validations
SECMVTSIGN  ID Tokens must be validated by the Relying Party (Client). Cryptographic validation
SECREQTVER  Token validation requires proper control flow. You seem to have reversed the boolean of one of your checks.
Details
// todo
return [Link]()
.entity(tokenResponse)
.build();
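For illustration only (not part of the generated report), the response the SECITVCF warning asks for could be produced in a JAX-RS resource roughly as follows:

import javax.ws.rs.core.Response;

public class TokenValidationResponses {

    // Illustrative only: when ID token validation fails, return HTTP 401
    // instead of building a 200 OK response around the token response object.
    public static Response unauthorized() {
        return Response.status(Response.Status.UNAUTHORIZED).build();
    }
}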
FindBugs Report
Project Information
Project:
Code analyzed:
/home/elias/git/masterthesis/new-findsecbugs/evaluation-files/sonar-auth-oidc/target/sonar-auth-oidc-plugin-2.0.1-
[Link]
Metrics
0 lines of code analyzed, in 0 classes, in 3 packages.
Contents
Security Warnings
Details
Summary
Warning Type Number
Security Warnings 11
Total 11
Warnings
Security Warnings
Code Warning
SECISAUTH   The new update of the OAuth 2.0 standard disallows usage of this method entirely: [Link]
In class [Link]
In method [Link](Map)
At [Link]:[line 192]
SECMVIDT    According to the OpenID Connect specification, ID Tokens must be validated by the Relying Party (Client). There are five values in the ID token response that must be verified todo2.
SECMVIDT    According to the OpenID Connect specification, ID Tokens must be validated by the Relying Party (Client). There are five values in the ID token response that must be verified todo2.
SECVMOS     The state parameter in the Authentication Response must be verified for checking integrity of the IdP.
SECVMOS     The state parameter in the Authentication Response must be verified for checking integrity of the IdP.
SECVMOS     The state parameter in the Authentication Response must be verified for checking integrity of the IdP.
SECVMOS     The state parameter in the Authentication Response must be verified for checking integrity of the IdP.
SECVMOS     The state parameter in the Authentication Response must be verified for checking integrity of the IdP.
SECVMOS     The state parameter in the Authentication Response must be verified for checking integrity of the IdP.
SECVMOS     The state parameter in the Authentication Response must be verified for checking integrity of the IdP.
SECVMOS     The state parameter in the Authentication Response must be verified for checking integrity of the IdP.
Details
USING_PASSWORD_GRANT_OAUTH: Usage of insecure authorization grant. Use redirection flow instead.
Instead of the password grant, use proper redirect methods with for example authorization code grant.
AuthorizationCode code = new AuthorizationCode("xyz...");
URI callback = new URI("[Link]");
AuthorizationGrant codeGrant = new AuthorizationCodeGrant(code, callback);
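As an illustrative continuation (not part of the generated report), the grant above could be exchanged for tokens with the Nimbus SDK roughly as follows. The token endpoint URL and client credentials are assumed placeholder values:

import com.nimbusds.oauth2.sdk.AuthorizationCode;
import com.nimbusds.oauth2.sdk.AuthorizationCodeGrant;
import com.nimbusds.oauth2.sdk.AuthorizationGrant;
import com.nimbusds.oauth2.sdk.TokenRequest;
import com.nimbusds.oauth2.sdk.TokenResponse;
import com.nimbusds.oauth2.sdk.auth.ClientAuthentication;
import com.nimbusds.oauth2.sdk.auth.ClientSecretBasic;
import com.nimbusds.oauth2.sdk.auth.Secret;
import com.nimbusds.oauth2.sdk.id.ClientID;
import com.nimbusds.openid.connect.sdk.OIDCTokenResponseParser;
import java.net.URI;

public class CodeExchange {

    public static TokenResponse exchange(AuthorizationCode code, URI callback) throws Exception {
        // Placeholder endpoint and credentials; a real client reads these from configuration.
        URI tokenEndpoint = new URI("https://op.example.com/token");
        ClientAuthentication clientAuth =
                new ClientSecretBasic(new ClientID("my-client-id"), new Secret("my-client-secret"));

        // Redeem the authorization code at the token endpoint (authorization code grant).
        AuthorizationGrant grant = new AuthorizationCodeGrant(code, callback);
        TokenRequest request = new TokenRequest(tokenEndpoint, clientAuth, grant);
        return OIDCTokenResponseParser.parse(request.toHTTPRequest().send());
    }
}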
You seem to be missing such validation in the code locations where you implement the token request flow.
There are five values in the ID token response that must be verified.
You may use an SDK-implemented validation if this implements all these checks.
Otherwise it is recommended to do these comparisons yourself.
if(![Link]().equals(state)) {
// Unauthorized
}
FindBugs Report
Project Information
Project:
Code analyzed:
/home/elias/git/masterthesis/new-findsecbugs/Oidc-FindSecbugs-Eval/Eval/liferay/target/liferay-1.0-
[Link]
Metrics
0 lines of code analyzed, in 0 classes, in 1 packages.
Contents
Security Warnings
Details
Summary
Warning Type Number
Security Warnings 1
Total 1
Warnings
Security Warnings
Code Warning
SECVMOS The state parameter in the Authentication Response must be verified for checking integrity of the IdP.
Details
MISSING_VERIFY_OIDC_STATE: State verification check is missing in your handling of authorization code flow response from IdP.
Remember to check that the state matches to avoid CSRF attacks.
if(![Link]().equals(state)) {
// Unauthorized
}
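For illustration only (not part of the generated report), a minimal servlet-based sketch of issuing and verifying the state parameter could look as follows. The session attribute name oidc_state and the error handling are assumptions, not part of any evaluated client:

import java.security.SecureRandom;
import java.util.Base64;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class StateCheck {

    private static final SecureRandom RANDOM = new SecureRandom();

    // Before redirecting to the IdP: create an unguessable state value and remember it in the session.
    public static String issueState(HttpServletRequest request) {
        byte[] bytes = new byte[32];
        RANDOM.nextBytes(bytes);
        String state = Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
        request.getSession().setAttribute("oidc_state", state); // "oidc_state" is an assumed attribute name
        return state;
    }

    // In the redirect URI handler: compare the stored state with the returned one.
    public static boolean verifyState(HttpServletRequest request, HttpServletResponse response)
            throws java.io.IOException {
        String expected = (String) request.getSession().getAttribute("oidc_state");
        String actual = request.getParameter("state");
        if (expected == null || !expected.equals(actual)) {
            response.sendError(HttpServletResponse.SC_UNAUTHORIZED); // reject a possible CSRF attempt
            return false;
        }
        return true;
    }
}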
FindBugs Report
Project Information
Project:
Code analyzed:
/home/elias/git/masterthesis/new-findsecbugs/Oidc-FindSecbugs-Eval/Eval/ddf/target/ddf-1.0-
[Link]
Metrics
0 lines of code analyzed, in 0 classes, in 1 packages.
Contents
Security Warnings
Details
Summary
Warning Type Number
Security Warnings 1
Total 1
Warnings
Security Warnings
Code Warning
SECMVIDT    According to the OpenID Connect specification, ID Tokens must be validated by the Relying Party (Client). There are five values in the ID token response that must be verified todo2.
Details
MISSING_VERIFY_ID_TOKEN: Missing validation of ID Token.
According to the OpenID Connect specification, ID Tokens must be validated by the Relying Party (Client).
You seem to be missing such validation in the code locations where you implement the token request flow.
There are five values in the ID token response that must be verified.
You may use an SDK-implemented validation if this implements all these checks.
Otherwise it is recommended to do these comparisons yourself.
Elias Brattli Sørensen Using Static Analysis to Detect Vulnerabilities in OpenID Connect Clients