XPath Cheatsheet - Essential Guide for Web Scraping & Automation
A comprehensive reference guide for web developers, QA engineers, and data analysts
🔍 Basic Selectors
Descendant Navigation
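
| Purpose | CSS | XPath | Notes |
|---|---|---|---|
| Select all h1 elements | `h1` | `//h1` | |
| Select p inside div | `div p` | `//div//p` | Any depth |
| Direct child | `ul > li` | `//ul/li` | First level only |
| Nested direct | `ul > li > a` | `//ul/li/a` | |
| All children | `div > *` | `//div/*` | |
| Root element | `:root` | `/*` | `/` alone selects the document node, the parent of the root element |
| Body from root | `:root > body` | `/html/body` | |
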
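
The difference between `//` (any depth) and `/` (direct child) is easy to check in code. The following is a minimal sketch using Python's standard-library `xml.etree.ElementTree`, which supports only a subset of XPath and expects paths relative to an element (hence the leading `.`). The sample markup is invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, made up for this illustration.
doc = ET.fromstring(
    "<html><body>"
    "<div><p>direct</p><section><p>nested</p></section></div>"
    "</body></html>"
)

# //div//p : every <p> below a <div>, at any depth
any_depth = [p.text for p in doc.findall(".//div//p")]

# //div/p : only <p> elements that are direct children of a <div>
direct_only = [p.text for p in doc.findall(".//div/p")]

print(any_depth)    # ['direct', 'nested']
print(direct_only)  # ['direct']
```

A full XPath engine would accept the absolute forms `//div//p` and `//div/p` directly; the `.` prefix is an ElementTree requirement, not part of the cheatsheet syntax.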
Attribute Selection
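
| Purpose | CSS | XPath | Notes |
|---|---|---|---|
| ID selector | `#id` | `//*[@id="id"]` | |
| Class selector | `.class` | `//*[@class="class"]` | Matches only when `class` is the entire attribute value; see Special Class Selection for multi-class elements |
| Type attribute | `input[type="submit"]` | `//input[@type="submit"]` | |
| Multiple attributes | `a#abc[for="xyz"]` | `//a[@id="abc"][@for="xyz"]` | |
| Has attribute | `a[rel]` | `//a[@rel]` | |
| Starts with | `a[href^='/']` | `//a[starts-with(@href, '/')]` | |
| Ends with | `a[href$='.pdf']` | `//a[ends-with(@href, '.pdf')]` | Requires XPath 2.0 or later |
| Contains | `a[href*='://']` | `//a[contains(@href, '://')]` | |
| Word match | `a[rel~='help']` | `//a[contains(@rel, 'help')]` | Approximate: a substring test, so it also matches `helper` |
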
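
The attribute predicates above can be tried out with a short script. This is a sketch only, assuming Python's standard-library `xml.etree.ElementTree`, which understands `[@attr]` and `[@attr='value']` but not XPath functions such as `starts-with()`; for those, the example falls back to plain Python string methods. The markup and URLs are made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample markup for illustration.
doc = ET.fromstring(
    "<body>"
    "<a id='abc' for='xyz' href='/home'>home</a>"
    "<a rel='help' href='https://example.com/doc.pdf'>doc</a>"
    "<input type='submit'/>"
    "<input type='text'/>"
    "</body>"
)

# //a[@rel] : <a> elements that have a rel attribute
has_rel = [a.text for a in doc.findall(".//a[@rel]")]

# //input[@type="submit"] : exact attribute value
submit_inputs = doc.findall(".//input[@type='submit']")

# //a[@id="abc"][@for="xyz"] : chained predicates, all must hold
both = [a.text for a in doc.findall(".//a[@id='abc'][@for='xyz']")]

# starts-with() and ends-with() are not available in ElementTree,
# so the same tests are expressed with Python string methods here.
rooted = [a.text for a in doc.iter("a") if a.get("href", "").startswith("/")]
pdfs = [a.text for a in doc.iter("a") if a.get("href", "").endswith(".pdf")]

print(has_rel, len(submit_inputs), both, rooted, pdfs)
```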
🔢 Order & Position

| Purpose | CSS | XPath |
|---|---|---|
| First of type | `ul > li:first-of-type` | `//ul/li[1]` |
| Second of type | `ul > li:nth-of-type(2)` | `//ul/li[2]` |
| Last of type | `ul > li:last-of-type` | `//ul/li[last()]` |
| First of type with ID | `li#id:first-of-type` | `//li[1][@id="id"]` |
| First child | `a:first-child` | `//*[1][name()="a"]` |
| Last child | `a:last-child` | `//*[last()][name()="a"]` |

👫 Sibling Selection

| Purpose | CSS | XPath |
|---|---|---|
| Following siblings | `h1 ~ ul` | `//h1/following-sibling::ul` |
| Adjacent sibling | `h1 + ul` | `//h1/following-sibling::ul[1]` |
| Following with ID | `h1 ~ #id` | `//h1/following-sibling::*[@id="id"]` |

🔄 Traversal (jQuery Equivalent)
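
| jQuery | XPath | Purpose |
|---|---|---|
| `$('ul > li').parent()` | `//ul/li/..` | Select parent |
| `$('li').closest('section')` | `//li/ancestor-or-self::section[1]` | Closest ancestor (without `[1]`, every enclosing `section` is returned) |
| `$('a').attr('href')` | `//a/@href` | Get attribute |
| `$('span').text()` | `//span/text()` | Get text (direct text nodes only) |
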
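
To make the "closest ancestor" behaviour concrete, here is a rough emulation in Python's standard-library `xml.etree.ElementTree`. ElementTree has no `ancestor` axis, so the sketch builds its own parent map; the `closest` helper and the sample markup are hypothetical and exist only for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical nested markup: the <li> sits inside two <section> elements.
doc = ET.fromstring(
    "<section id='outer'><section id='inner'>"
    "<ul><li><a href='/x'>go</a></li></ul>"
    "</section></section>"
)

# ElementTree elements do not know their parent, so record it up front.
parent_of = {child: parent for parent in doc.iter() for child in parent}

def closest(elem, tag):
    """Walk upward from elem (inclusive) and return the nearest element named tag."""
    while elem is not None:
        if elem.tag == tag:
            return elem
        elem = parent_of.get(elem)
    return None

li = doc.find(".//li")
nearest = closest(li, "section")                 # like ancestor-or-self::section[1]
hrefs = [a.get("href") for a in doc.iter("a")]   # like //a/@href
parents = doc.findall(".//ul/li/..")             # like //ul/li/..

print(nearest.get("id"))        # inner
print(hrefs)                    # ['/x']
print([p.tag for p in parents]) # ['ul']
```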
💡 Advanced Techniques
Special Selectors

| Purpose | XPath | Notes |
|---|---|---|
| Negation | `//h1[not(@id)]` | Elements without attribute |
| Exact text match | `//button[text()="Submit"]` | |
| Text contains | `//button[contains(text(),"Go")]` | |
| Arithmetic comparison | `//product[@price > 2.50]` | |
| Has any children | `//ul[*]` | |
| Has specific child | `//ul[li]` | |
| OR logic | `//a[@name or @href]` | |
| Union (either/or) | `//a \| //div` | Nodes matching either path |

Special Class Selection
```xpath
//div[contains(concat(' ',normalize-space(@class),' '),' foobar ')]
```
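
Why the padding with spaces matters is easiest to see by replaying the expression's logic on plain strings. The sketch below mirrors it in Python; `normalize_space` and `has_class` are hypothetical helpers written for this example, and `normalize_space` only approximates XPath's `normalize-space()` for space, tab, CR and LF.

```python
import re

def normalize_space(value: str) -> str:
    """Approximate XPath normalize-space(): collapse whitespace runs, then trim."""
    return re.sub(r"[ \t\r\n]+", " ", value).strip(" ")

def has_class(class_attr: str, name: str) -> bool:
    """Mirror contains(concat(' ', normalize-space(@class), ' '), ' name ')."""
    return f" {name} " in f" {normalize_space(class_attr)} "

print(has_class("btn foobar", "foobar"))      # True
print(has_class("btn\n  foobar ", "foobar"))  # True
print(has_class("foobarbaz", "foobar"))       # False
print("foobar" in "foobarbaz")                # True: a bare contains() over-matches
```

Surrounding both the attribute value and the class name with spaces turns a substring test into a whole-token test, which is what CSS `.foobar` does.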
🧭 Axes & Paths
Path Types

| Prefix | Example | Description |
|---|---|---|
| `//` | `//hr[@class='edge']` | Search anywhere in document |
| `./` | `./a` | Relative to current node |
| `/` | `/html/body/div` | From document root |

Common Axes

| Axis | Short Form | Notes |
|---|---|---|
| `attribute` | `@` | `@href` == `attribute::href` |
| `child` | (default) | `div` == `child::div` |
| `descendant-or-self` | `//` | `//` == `/descendant-or-self::node()/` |
| `self` | `.` | `.` == `self::node()` |
| `parent` | `..` | `..` == `parent::node()` |
| `following-sibling` | | Siblings after current node |
| `preceding-sibling` | | Siblings before current node |
| `ancestor` | | All ancestor elements |
| `ancestor-or-self` | | Ancestors, including current node |

🔍 Predicates & Filters

| Example | Description |
|---|---|
| `//div[true()]` | Boolean filter |
| `//div[@class="head"]` | Match attribute |
| `//div[@class="head"][@id="top"]` | Chain conditions |
| `//ul[count(li) > 2]` | Count children |
| `//ul[count(li[@class='hide']) > 0]` | Nested filtering |

Operators
```xpath
//a[@id = "xyz"]
//a[@id != "xyz"]
//a[@price > 25]
//div[@id="head" and position()=2]
//div[(x and y) or not(z)]
```
Element Position
```xpath
//a[1]                 # every <a> that is the first <a> child of its parent
//a[last()]            # every <a> that is the last <a> child of its parent
//ol/li[2]             # second <li>
//ol/li[position()=2]  # same
//ol/li[position()>1]  # not first element
```
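
A positional predicate is evaluated per parent, not over the whole document, which often surprises people. The sketch below shows this with Python's standard-library `xml.etree.ElementTree` (a partial XPath implementation that needs the leading `.`); the two-list document is invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical document with two lists.
doc = ET.fromstring(
    "<root>"
    "<ul><li>a1</li><li>a2</li></ul>"
    "<ul><li>b1</li><li>b2</li><li>b3</li></ul>"
    "</root>"
)

# //li[1] : the first <li> inside EACH parent, so one hit per list
first_per_parent = [li.text for li in doc.findall(".//li[1]")]

# //li[last()] : the last <li> inside each parent
last_per_parent = [li.text for li in doc.findall(".//li[last()]")]

# //ul/li[2] : the second <li> of each <ul>
second_per_list = [li.text for li in doc.findall(".//ul/li[2]")]

# (//li)[1] is not supported by ElementTree; find() returns the first
# match in document order, which gives the same single element.
first_overall = doc.find(".//li").text

print(first_per_parent)  # ['a1', 'b1']
print(last_per_parent)   # ['a2', 'b3']
print(second_per_list)   # ['a2', 'b2']
print(first_overall)     # a1
```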
🧮 Useful Functions
Node Functions
```xpath
name()            # Current element name
text()            # Text node children (a node test rather than a function)
count()           # Count nodes
position()        # Position index (1-based)
```
String Functions
```xpath
contains(str, substr)
starts-with(str, substr)
ends-with(str, substr)             # XPath 2.0+
concat(x, y)
substring(str, start, len)         # start is 1-based
substring-before("01/02", "/")     # Returns "01"
substring-after("01/02", "/")      # Returns "02"
normalize-space()                  # Trim and collapse whitespace
string-length()                    # String length
```
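
For readers more used to 0-based slicing, the substring functions can be approximated in Python as follows. These `xp_*` helpers are hypothetical and cover only the simple cases (integer arguments with `start >= 1`, non-empty separator); XPath's rounding and edge-case rules are deliberately left out.

```python
def xp_substring(s: str, start: int, length: int) -> str:
    """substring(str, start, len) for integer start >= 1; positions are 1-based."""
    return s[start - 1 : start - 1 + length]

def xp_substring_before(s: str, sep: str) -> str:
    """substring-before(): text before the first sep, or '' if sep is absent."""
    head, found, _ = s.partition(sep)
    return head if found else ""

def xp_substring_after(s: str, sep: str) -> str:
    """substring-after(): text after the first sep, or '' if sep is absent."""
    _, found, tail = s.partition(sep)
    return tail if found else ""

print(xp_substring("12345", 2, 3))        # 234
print(xp_substring_before("01/02", "/"))  # 01
print(xp_substring_after("01/02", "/"))   # 02
print(repr(xp_substring_before("0102", "/")))  # ''
```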
🔄 Complex Examples
```xpath
//*                                # All elements
count(//*)                         # Count all elements
(//h1)[1]/text()                   # Text of first h1
//li[span]                         # <li> with <span> inside
//ul/li/..                         # Select parent
//section[h1[@id='section-name']]  # Section containing specific h1
//item[@price > 2*@discount]       # Price comparison
```