Web Application Mapping
INFR 4662U – Winter 2020
Garrett Hayes
Excerpts and concepts taken from the Web Application Hacker’s Handbook, 2nd Edition
Stuttard & Pinto, Wiley Press
License: Creative Commons
2
Enumerating Content
3
Enumeration Basics
§ Enumeration refers to identifying the set of
resources and functionality that make up a
web application
§ This includes pages, JS files, application
logs, external resources, etc.
§ Basic enumeration can be done by simply
visiting the web application and exploring
how it works
§ Other automated and systematic approaches
exist to suss out functionality occurring behind
the scenes
4
What is Spidering?
§ Spidering refers to the use of automated
tools that identify and recursively follow links
in a web application to collect information
about its structure
§ Content without direct links can be
found using brute-force techniques that
look for common/predictable content
and page names
§ Effective spidering utilities will also parse
JS and forms to identify backend
functionality like APIs, WebSockets, etc.
5
Automated Spidering
§ Automated spidering tools can miss whole
areas of an application due to:
§ JS being used to render links and drop-
down menus not visible to the utility
§ Form submission endpoints not being
seen due to failed automatic form filling
§ AJAX-rendered pages may not show until
an action is completed by a user
(e.g. logging in)
6
Automated Spidering
§ Automated spidering tools can miss whole
areas of an application due to:
§ Random values in the URL (e.g. expiry
times) may cause the tool to
spider forever
§ Some content may only be accessible to
authenticated users
§ Embedded objects like Java applets are
difficult to spider and may contain links
or consume other backend assets
7
Enumeration: robots.txt
§ In some cases, a webmaster may not want
automated spidering tools (like Googlebot)
to crawl or index specific pages
§ To avoid this, administrators create a
robots.txt file in the web root that
lists the paths crawlers shouldn’t
visit
§ This file often contains sensitive
endpoints and directories not intended
to show up on Google, which are very
interesting to an attacker
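As a rough sketch of how the Disallow entries might be pulled out of a robots.txt file for review, the snippet below parses the file body as plain text. The function name `disallowed_paths` and the sample file content are hypothetical and used only for illustration.

```python
def disallowed_paths(robots_txt: str) -> list[str]:
    """Return the unique Disallow paths in a robots.txt body, in file order."""
    paths = []
    for raw_line in robots_txt.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop trailing comments
        if ":" not in line:
            continue
        field, value = line.split(":", 1)
        value = value.strip()
        # An empty Disallow value means "nothing is disallowed", so skip it.
        if field.strip().lower() == "disallow" and value and value not in paths:
            paths.append(value)
    return paths


# Hypothetical robots.txt content, for illustration only.
SAMPLE = """\
User-agent: *
Disallow: /admin/
Disallow: /backup/   # nightly archives
Disallow:
Allow: /public/
"""

print(disallowed_paths(SAMPLE))
```

Each returned path is a candidate starting point for manual browsing or brute-forcing.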
8
Manual/Directed Spidering
§ Since a variety of situations cause automated
spidering tools to fail, some pentesters will
manually explore a web application while
using an intercepting proxy to automatically
build a map of the site
§ For example, one might use BurpSuite to
automatically index all pages and
resources found while browsing
§ Two common intercepting proxies are
BurpSuite and WebScarab
9
Manual/Directed Spidering
§ Manual spidering is often superior to
automated spidering for many reasons,
including:
§ More effective identification and
following of navigation controls
§ Avoiding application actions that can
break a site (for example, calling a
backup script or backend functionality)
§ Identifying pages & resources only
available to logged-in users
10
Spidering Tool: BurpSuite
11
Enumerating Hidden Content
§ Some web application functionality is not made
visible to users through links or buttons
§ Examples:
§ A form submission triggers a backend call to
another PHP file
§ A script called backup.php zips up the
contents of a web application
§ An automation script called test.php adds a
demo user to a web app
§ Some web app functionality may not be
visible to all users
12
Enumerating Hidden Content
§ Common hidden content I’ve seen in many
pentests includes:
§ Backup files or code files with extensions
like index.php.bak
§ Old versions of files/code that can still be
called (e.g. home2.php may imply
home1.php exists)
§ Exposed configuration files
§ Hidden directories used for testing/backups
that have directory indexing enabled
§ Exposed log files
13
Brute-Force Enumeration
§ To identify backend content not directly
visible to users, automated brute-forcing
utilities are essential
§ I recommend gobuster, but there is also
a GUI alternative called DirBuster that ships
with Kali
§ Brute-force utilities require three inputs:
1. A good wordlist containing common
directory and file names
2. One or more known file extensions likely to
be used by the web app (e.g. .php)
3. A starting point ( / , for example )
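The three inputs above can be combined as in the following minimal sketch, which only builds the candidate URLs and sends nothing. The function name `candidate_urls` and the example words are assumptions for illustration; real tools such as gobuster also handle threading, retries, and status filtering.

```python
from itertools import product


def candidate_urls(base, words, extensions):
    """Yield one directory URL per word, then one file URL per word/extension pair."""
    root = base if base.endswith("/") else base + "/"
    for word in words:
        yield root + word + "/"   # directory guess, with trailing slash
    for word, ext in product(words, extensions):
        yield root + word + ext   # file guess, e.g. admin.php


# Hypothetical inputs: a tiny wordlist, one extension, and a starting point.
for url in candidate_urls("https://example.com", ["admin", "backup"], [".php"]):
    print(url)
```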
14
GoBuster Brute-Force Attempt
ubuntu@security:~$ ./go/bin/gobuster dir -w ~/Wordlists/common.txt -s 200 -u http://xxxxxxxx.com
===============================================================
Gobuster v3.0.1
by OJ Reeves (@TheColonial) & Christian Mehlmauer (@_FireFart_)
===============================================================
[+] Url: http://xxxxxxxx.com
[+] Threads: 10
[+] Wordlist: /home/ubuntu/Wordlists/common.txt
[+] Status codes: 200
[+] User Agent: gobuster/3.0.1
[+] Add Slash: true
[+] Timeout: 10s
===============================================================
2020/01/02 19:44:08 Starting gobuster
===============================================================
/backup/ (Status: 200)
/css/ (Status: 200)
/fonts/ (Status: 200)
/highslide/ (Status: 200)
/icons/ (Status: 200)
/images/ (Status: 200)
/js/ (Status: 200)
15
Brute-Force Results
16
Brute-Force Enumeration
§ When brute-forcing an application, each request will
return a status code
§ Some common “gotchas” for status codes include:
§ 302 often means a resource exists but you must be
logged in to access it
§ 401 & 403 usually mean the resource exists but you are
not authorized to access it (401: authentication is
required; 403: access is forbidden)
§ A 200 code for a page that would never exist (e.g.
/dassdsdads.php) indicates a catch-all page or rewrite
is in place, so a 200 alone doesn’t prove a resource exists
§ A 400 code may indicate you’re using an incorrect
extension or incorrectly formatted RESTful URL
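One way to apply these rules of thumb is to compare each response against a "canary" request for a path that should never exist. The sketch below is illustrative only: the function name `interpret_status` and its return strings are hypothetical, and the readings are heuristics, not guarantees.

```python
def interpret_status(code: int, canary_code: int) -> str:
    """Give a rough reading of a brute-force response code.

    canary_code is the status returned for a path that should never exist
    (e.g. a random string); it is the baseline for spotting catch-all pages.
    """
    if code == canary_code:
        return "same as nonexistent path: not evidence of real content"
    if code == 200:
        return "likely exists"
    if code in (301, 302, 303, 307, 308):
        return "redirect: may exist behind a login"
    if code in (401, 403):
        return "likely exists but access is restricted"
    if code == 400:
        return "bad request: check the extension or URL format"
    return "probably not present"


print(interpret_status(200, canary_code=404))
print(interpret_status(200, canary_code=200))
print(interpret_status(403, canary_code=404))
```

When the canary itself returns 200, comparing response lengths or bodies is usually needed to tell real pages from the catch-all page.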
17
Brute-Force Wordlists
§ Most web applications use common page
names and endpoint URLs, allowing us to
generate effective wordlists by crawling the
web
§ SecLists on GitHub has a lot of great
wordlists, including
RobotsDisallowed-Top1000.txt and common.txt
§ Don’t forget that a lot can vary in a web app.
You may need to:
§ Use a trailing slash when brute-forcing
directories
§ Add a specific file extension to requests
§ Filter out non-2xx/3xx status codes
18
Inferring Web Content
§ Considering the structured nature of web apps, it’s
common to see predictable page names or RESTful
resource URLs when exploring or spidering
§ For example:
https://example.com/users/user/1
may imply that the following pages exist:
https://example.com/users/user/2
https://example.com/users/
https://example.com/admins/
https://example.com/admins/user/1
https://example.com/admins/admin/1
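A minimal sketch of this kind of inference, assuming the URL ends in a numeric ID: it guesses neighbouring IDs and parent collections. The function name `neighbouring_urls` and the `window` parameter are hypothetical. Guessing renamed collections such as /admins/ still needs a wordlist or human judgement.

```python
from urllib.parse import urlsplit, urlunsplit


def neighbouring_urls(url: str, window: int = 1) -> list[str]:
    """Guess sibling-ID and parent URLs for a RESTful URL ending in a numeric ID."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]

    def build(path_segments, trailing_slash):
        path = "/" + "/".join(path_segments) + ("/" if trailing_slash else "")
        return urlunsplit((parts.scheme, parts.netloc, path, "", ""))

    guesses = []
    if segments and segments[-1].isdigit():
        current = int(segments[-1])
        # Try the IDs just below and above the one we already know about.
        for candidate in range(max(current - window, 0), current + window + 1):
            if candidate != current:
                guesses.append(build(segments[:-1] + [str(candidate)], False))
    # Walk up the path: each parent collection is worth requesting too.
    for depth in range(len(segments) - 1, 0, -1):
        guesses.append(build(segments[:depth], True))
    return guesses


for guess in neighbouring_urls("https://example.com/users/user/1"):
    print(guess)
```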
19
Inferring File Extensions
§ Although a web app may consistently use a single
file extension, like .php for example, it’s possible
that other file extensions exist and are used for
backups, alternative versions of files, or older
versions of files
§ It makes sense to use a good wordlist and append
the following extensions when brute-forcing files:
§ .old, .bak, .backup, .tmp, .temp, ~1
§ .tar, .tar.gz, .zip
§ .sql, .src, .txt, .php5
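To illustrate how the extensions listed above might be applied to a file that is already known to exist, the sketch below generates candidate backup names for one bare file name. The function name `backup_variants` is hypothetical, and trying both the "append" and "replace the extension" forms is an assumption about common naming habits rather than a rule.

```python
from pathlib import PurePosixPath

# Suffixes taken from the list above.
BACKUP_SUFFIXES = [".old", ".bak", ".backup", ".sql", ".txt", ".tar",
                   ".tar.gz", ".zip", ".src", ".php5", "~1", ".tmp", ".temp"]


def backup_variants(filename: str) -> list[str]:
    """Return candidate backup names for one bare file name, without duplicates."""
    stem = PurePosixPath(filename).stem  # "index.php" -> "index"
    variants = []
    for suffix in BACKUP_SUFFIXES:
        # Try both "index.php.bak" and "index.bak" style names.
        for candidate in (filename + suffix, stem + suffix):
            if candidate not in variants and candidate != filename:
                variants.append(candidate)
    return variants


print(backup_variants("index.php"))
```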
20
Server Misconfigurations
§ Even if a web application is built securely, it is
possible that the underlying webserver is
misconfigured and leaking sensitive
information
§ Webservers can leak resources like:
§ Whole directory contents if directory
indexing is enabled
§ Users on a system, especially if user
directories are enabled
21
Directory Indexing Misconfiguration
22
User Directories Misconfiguration
Google Dork: inurl:"/~john" intext:"index of"
Note: when user directories are enabled
in Apache, users on the system that have
a public_html directory in their home
path will automatically have that
directory made public at the location
/~username
What might our next steps be to
identify additional users on the system?
23
Hidden Parameters
§ Webmasters may use custom or hidden parameters
in GET or POST requests to toggle the visibility or
functionality of a web app
§ For example, the following URLs may result in a
response with different content and lengths:
https://example.com/index.php
https://example.com/index.php?debug=1
§ A brute-force tool can be used to find hidden
parameters using:
§ Common parameter names like test, debug,
bypass, source, etc.
§ Common parameter values like 0, 1, true, false,
null
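A small sketch of how such probes could be generated from the names and values listed above. It only builds URLs; sending them and comparing response codes and lengths against the unmodified page is left to whatever HTTP client is in use. The function name `parameter_probes` is hypothetical, and the sketch assumes the base URL has no existing query string.

```python
from itertools import product
from urllib.parse import urlencode

PARAM_NAMES = ["test", "debug", "bypass", "source"]
PARAM_VALUES = ["0", "1", "true", "false", "null"]


def parameter_probes(url, names, values):
    """Return one probe URL per name/value pair (url must have no query string)."""
    return [f"{url}?{urlencode({name: value})}" for name, value in product(names, values)]


probes = parameter_probes("https://example.com/index.php", PARAM_NAMES, PARAM_VALUES)
print(len(probes))
print(probes[0])
```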
24
Discovering User Input
25
Analyzing User Input
§ In preparation for future exploitation attempts, it’s
crucial to identify all user input fields and actions
that can be submitted to the web application
§ User input may be present in:
§ URLs using standard GET request parameters
§ RESTful URLs between slashes
§ Cookies
§ HTTP headers
§ Out-of-band channels
26
User Input: URLs
§ Standard URLs that include GET parameters
take user input or input that directs the
functionality of the web application
§ Typical URL parameters look like:
/search.php?searchTerm=data&results=10
§ Some abnormal URL parameter styles do
exist, such as:
/process/search;searchTerm=data
/process/search?searchTerm=data$results=10
/process/searchTerm=data/search
/process/search?searchTerm=data:data2
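As a hedged illustration, Python's standard `urllib.parse` module can separate the standard query string and the semicolon-style path parameters shown above. The function name `extract_inputs` is hypothetical, and the other non-standard styles (a `$` separator, values embedded in the path, `:`-joined values) would still need custom handling.

```python
from urllib.parse import urlparse, parse_qsl


def extract_inputs(url: str) -> dict:
    """Split a URL into its path, ';' path parameters, and '?' query parameters."""
    parsed = urlparse(url)
    return {
        "path": parsed.path,
        "path_params": parse_qsl(parsed.params),  # the part after ';'
        "query": parse_qsl(parsed.query),         # the part after '?'
    }


print(extract_inputs("/search.php?searchTerm=data&results=10"))
print(extract_inputs("/process/search;searchTerm=data"))
```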
27
User Input: RESTful URLs
§ RESTful URLs do not use standard GET parameters;
rather, data is provided inline in the URL between
slashes
§ Typical RESTful URL parameters look like:
/search/data
§ Other alternative forms exist, such as:
/search/searchTerm/data
/search/searchTerm/data/
/search/data/10
/search/data/data2/10.json
§ In the last case, output data is requested in JSON
format – it may also be possible to ask for .txt or .xml
28
User Input: Cookies
§ Cookies set by the web application may be used to
identify a user or store data temporarily for a
session
§ Cookie values may be looked up in a
database or may be used to load specific
resources
§ For example, a cookie can be used to rebuild a
shopping cart:
Cookie: cart=item676&cart=item888&discount=10
§ Or can be used to identify a user:
Cookie: username=joe.blow&authenticated=1
29
User Input: HTTP Headers
§ HTTP headers are automatically generated by client
browsers, but may be used by a web application
when directing functionality or enforcing access
control mechanisms
§ The host header, for example, indicates to the
webserver which site the request is destined for
§ The user agent header indicates the kind of client
accessing the site (e.g. Chrome vs. GoogleBot)
§ Access control headers may provide session strings or
other client-identifying data that is passed to a backend
database or system
§ The X-Forwarded-For header used by load balancers
can be manipulated to make requests appear to originate
from a different address, such as the webserver itself
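Because headers are just client-supplied input, they can be set to arbitrary values. The sketch below builds, but does not send, a request with chosen X-Forwarded-For and User-Agent values using the standard library. The function name `build_probe`, the URL, and the header values are all hypothetical.

```python
import urllib.request


def build_probe(url: str, forwarded_for: str, user_agent: str) -> urllib.request.Request:
    """Build (but do not send) a GET request with client-chosen header values."""
    return urllib.request.Request(
        url,
        headers={"X-Forwarded-For": forwarded_for, "User-Agent": user_agent},
    )


probe = build_probe("https://example.com/admin/", "127.0.0.1", "Googlebot/2.1")
# urllib stores header names as "X-forwarded-for" and "User-agent";
# HTTP header names are case-insensitive, so this does not matter on the wire.
print(probe.header_items())
```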
30
User Input: OOB
§ Out-of-band (OOB) functionality refers to any
code, scripts, automation tools, or external
services used to facilitate the operations of a
web application
§ These include external resources such as web
forms (Google Forms), SMTP services like
Mailgun, fileservers, etc.
§ OOB resources can potentially be manipulated to
modify input to a web application – especially if it’s
an API
§ For example, web services may use a provider
like Mailgun to automatically receive password
reset requests via email
31
Server-Side Analysis
32
Technique: Banner Grabbing
§ Used to glean information about computer
systems on a network and the services
running on their open ports
§ Banner grabbing helps identify the version of
software running on a remote host
§ Usually performed on: HTTP, FTP, and SMTP
§ Tools commonly used:
§ Curl, telnet, Nmap, and Netcat
33
Banner Grabbing Example
Request:
curl -I https://ontariotechu.ca
Result:
HTTP/1.1 200 OK
Date: Mon, 13 Jan 2020 20:18:25 GMT
Server: Apache/2.4.18 (Ubuntu)
Strict-Transport-Security: max-age=2600000;
Vary: Host
Content-Type: text/html; charset=UTF-8
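Once a banner has been captured, extracting the product and version is simple string handling. The sketch below works on the response text shown above; the function names `server_banner` and `split_banner` are hypothetical, and real banners vary in format or may be removed or falsified by the administrator.

```python
def server_banner(raw_headers: str):
    """Return the value of the Server header from raw response headers, or None."""
    for line in raw_headers.splitlines()[1:]:  # skip the status line
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "server":
            return value.strip()
    return None


def split_banner(banner: str):
    """Split 'Product/Version (extra)' into (product, version)."""
    product, _, rest = banner.partition("/")
    version = rest.split()[0] if rest.strip() else ""
    return product, version


# Response headers copied from the curl example above.
RESPONSE = """\
HTTP/1.1 200 OK
Date: Mon, 13 Jan 2020 20:18:25 GMT
Server: Apache/2.4.18 (Ubuntu)
Content-Type: text/html; charset=UTF-8
"""

banner = server_banner(RESPONSE)
print(banner)
print(split_banner(banner))
```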
34
Analyzing File Extensions
§ File extensions are the simplest way to identify
the underlying technology being used to render
pages
§ Keep in mind that file extensions are
arbitrary and may be modified or removed
to evade dissection
§ Common extensions include:
§ .php & .php5 for PHP applications
§ .jsp for Java server pages
§ .pl for Perl CGIs or pages
§ .py for Python CGIs or pages
§ .dll for compiled CGIs or pages (C, C++, etc.)
§ .d2w for WebSphere
35
Analyzing Error Messages
§ The simplest way to determine the underlying
framework or webserver being used is to trigger a
fault in the system that causes an error page to show
§ For example, browsing to /sadklhadlkas will
likely cause a 404, which may show the
webserver version
§ Manipulating GET parameters may cause SQL or
other application errors, ultimately leaking additional
information
§ Examples:
https://example.com?search='
https://example.com/users?id=-1000
36
Analyzing Directory Names
§ Predictable and standard directory naming
conventions may indicate specific technologies
are being used
§ For example, Java servlets are often served
at web paths like /servlet/name
§ A few other modern and common cases
include:
§ /rails/ for Ruby on Rails applications
§ /pls/ for Oracle applications and SQL
gateways
37
Analyzing Session Tokens
§ Certain session token names (present in cookies)
may indicate specific web technologies are being
used by the application:
§ Java uses JSESSIONID
§ PHP uses PHPSESSID
§ The IIS webserver uses ASPSESSIONID
§ Whereas ASP.Net uses
ASP.NET_SessionId
§ Django uses the more generic sessionid
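The mapping above can be expressed as a small lookup, sketched below. The names `TOKEN_HINTS` and `guess_technology` are hypothetical. Prefix matching is used on the assumption that some servers append characters to the base cookie name. Cookie names can be changed by developers, so a match is a hint rather than proof.

```python
# Lower-cased cookie name prefixes and the technology they hint at.
TOKEN_HINTS = {
    "jsessionid": "Java",
    "phpsessid": "PHP",
    "aspsessionid": "IIS",
    "asp.net_sessionid": "ASP.NET",
    "sessionid": "Django (generic name, weak signal)",
}


def guess_technology(cookie_names):
    """Return (cookie name, technology hint) pairs for recognised session cookies."""
    hints = []
    for name in cookie_names:
        lowered = name.lower()
        for prefix, technology in TOKEN_HINTS.items():
            if lowered.startswith(prefix):
                hints.append((name, technology))
                break
    return hints


print(guess_technology(["PHPSESSID", "theme"]))
print(guess_technology(["ASP.NET_SessionId", "JSESSIONID"]))
```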
38
Analysis Example #1
https://wahh-app.com/calendar.jsp?name=new%20applicants
&isExpired=0&startDate=22%2F09%2F2010
&endDate=22%2F03%2F2011&OrderBy=name
Example taken from the Web Application Hacker’s Handbook 2nd Edition
Stuttard & Pinto, Wiley Press
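A useful first step when analyzing a URL like this is to decode its parameters so each input can be considered on its own. The sketch below does that with the standard library; the function name `decode_parameters` is hypothetical, and deciding what each parameter implies about the backend is still up to the tester.

```python
from urllib.parse import urlsplit, parse_qsl

# The URL from the example above, joined back into a single string.
URL = ("https://wahh-app.com/calendar.jsp?name=new%20applicants"
       "&isExpired=0&startDate=22%2F09%2F2010"
       "&endDate=22%2F03%2F2011&OrderBy=name")


def decode_parameters(url: str):
    """Return the URL-decoded (name, value) pairs from a URL's query string."""
    return parse_qsl(urlsplit(url).query, keep_blank_values=True)


for name, value in decode_parameters(URL):
    print(f"{name} = {value}")
```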
39
Analysis Example #2
https://wahh-app.com/workbench.aspx?template=NewBranch.tpl
&loc=/default&ver=2.31&edit=false
Example taken from the Web Application Hacker’s Handbook 2nd Edition
Stuttard & Pinto, Wiley Press
40
Analysis Example #3
POST /feedback.php HTTP/1.1
Host: wahh-app.com
Content-Length: 389
[email protected]&[email protected]&subject=
Problem+logging+in&message=Please+help...
Example taken from the Web Application Hacker’s Handbook 2nd Edition
Stuttard & Pinto, Wiley Press
41
Let’s break!
See You Next Time