Metamorphic Testing:
Addressing the oracle problem
5 December 2023
§ Faults are subtle
§ Discovered when testing with many inputs (e.g., after deployment in the field)
§ Manually specifying expected results for all of them is impractical
§ Automatically deriving expected results is infeasible
Metamorphic Testing alleviates the Oracle Problem
• Invented by T.Y. Chen in 1998
• Metamorphic Testing (MT) assumes that it is simpler to reason about relations between outputs of multiple test executions than to specify the output of the system for a given input
• MT is a property-based testing approach
• In MT, system properties are captured as metamorphic relations (MRs) that:
• specify how to automatically transform an initial set of test inputs (source inputs) into follow-up test inputs
• specify the relation between the outputs obtained from source and follow-up inputs
• A failure is observed when such relations are violated
Example MRs:
• x2 = (π - x1) ⇒ sin(x1) = sin(x2)
• Function under test shortPath, on a graph G with nodes a to f: x1 = (G, a, f) ∧ x2 = (G, f, a) ⇒ len(shortPath(x1)) = len(shortPath(x2))
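The two example relations can be checked mechanically without knowing the expected output of either function. The following is a minimal sketch, not taken from the slides: the BFS implementation of `short_path`, the edges of the graph `G`, and the tolerance are assumptions made for illustration.

```python
import math
from collections import deque

def mr_sine_holds(x1, tol=1e-9):
    """MR: x2 = pi - x1  =>  sin(x1) = sin(x2)."""
    x2 = math.pi - x1  # follow-up input derived from the source input
    return math.isclose(math.sin(x1), math.sin(x2), abs_tol=tol)

def short_path(graph, start, goal):
    """Toy function under test: BFS shortest path in an unweighted graph."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for neighbour in graph[path[-1]]:
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(path + [neighbour])
    return []

def mr_symmetric_path_holds(graph, a, f):
    """MR: x1 = (G, a, f) and x2 = (G, f, a)  =>  equal path lengths."""
    return len(short_path(graph, a, f)) == len(short_path(graph, f, a))

# Hypothetical undirected graph; the edges are invented for illustration.
G = {
    "a": ["b", "c"],
    "b": ["a", "d", "e"],
    "c": ["a", "d"],
    "d": ["b", "c", "f"],
    "e": ["b", "f"],
    "f": ["d", "e"],
}

if __name__ == "__main__":
    print(all(mr_sine_holds(x) for x in (0.0, 0.5, 1.0, 2.0, -3.0)))
    print(mr_symmetric_path_holds(G, "a", "f"))
```

Neither check states what sin(x1) or the shortest path should be; only the relation between the two executions is verified.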
Application Domains
Picture from:
§ S. Segura, D. Towey, Z. Q. Zhou and T. Y. Chen, "Metamorphic
Testing: Testing the Untestable," in IEEE Software, vol. 37, no. 3,
pp. 46-53, May-June 2020, doi: 10.1109/MS.2018.2875968.
Other relevant surveys:
§ S. Segura, G. Fraser, A. B. Sanchez and A. Ruiz-Cortés, "A
Survey on Metamorphic Testing," in IEEE Transactions on
Software Engineering, vol. 42, no. 9, pp. 805-824, 1 Sept. 2016,
doi: 10.1109/TSE.2016.2532875.
§ Tsong Yueh Chen, Fei-Ching Kuo, Huai Liu, Pak-Lok Poon, Dave
Towey, T. H. Tse, and Zhi Quan Zhou. 2018. Metamorphic
Testing: A Review of Challenges and Opportunities. ACM
Comput. Surv. 51, 1, Article 4 (January 2019), 27 pages.
https://doi.org/10.1145/3143561
Examples: Testing Web engines
MR: with an additional filtering criterion, the number of returned results should be lower than or equal to the number returned without it
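The filtering relation can be sketched against a stand-in search function. Everything below is hypothetical: `CATALOGUE`, `search`, and the filter fields are invented for illustration and do not correspond to any real search engine API.

```python
# Hypothetical in-memory catalogue standing in for a real search engine.
CATALOGUE = [
    {"title": "red running shoes", "brand": "acme", "price": 60},
    {"title": "blue running shoes", "brand": "acme", "price": 80},
    {"title": "red hiking boots", "brand": "summit", "price": 120},
]

def search(keyword, **filters):
    """Toy system under test: keyword match plus optional exact-match filters."""
    hits = [item for item in CATALOGUE if keyword in item["title"]]
    for field, value in filters.items():
        hits = [item for item in hits if item[field] == value]
    return hits

def mr_filter_holds(keyword, extra_filter):
    """MR: adding a filtering criterion must not increase the result count."""
    source = search(keyword)                     # source input
    follow_up = search(keyword, **extra_filter)  # follow-up: same query, one more filter
    return len(follow_up) <= len(source)

if __name__ == "__main__":
    print(mr_filter_holds("shoes", {"brand": "acme"}))
    print(mr_filter_holds("red", {"brand": "summit"}))
```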
Examples: Testing Web engines (2)
Z. Q. Zhou, L. Sun, T. Y. Chen and D. Towey, "Metamorphic Relations for Enhancing System Understanding and Use," in IEEE Transactions on Software Engineering,
vol. 46, no. 10, pp. 1120-1154, 1 Oct. 2020, doi: 10.1109/TSE.2018.2876433.
Examples: Testing Web APIs
S. Segura, J. A. Parejo, J. Troya and A. Ruiz-Cortés, "Metamorphic Testing of RESTful Web APIs," in IEEE Transactions
on Software Engineering, vol. 44, no. 11, pp. 1083-1099, 1 Nov. 2018, doi: 10.1109/TSE.2017.2764464.
Examples: Testing Deep Neural Networks
MR: The steering angle predicted by the DNN should remain
the same even in the presence of fog
https://towardsdatascience.com/metamorphic-testing-of-machine-learning-based-systems-e1fe13baf048
Yuchi Tian, Kexin Pei, Suman Jana, and Baishakhi Ray. 2018. DeepTest: automated testing of deep-neural-network-driven autonomous cars. In Proceedings of the 40th International Conference on Software Engineering (ICSE '18). Association for Computing Machinery, New York, NY, USA, 303–314. https://doi.org/10.1145/3180155.3180220
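A minimal sketch of the fog relation follows. It is not the DeepTest implementation: `predict_steering` is a toy stand-in for the DNN, `add_fog` is a simple intensity blend, and the tolerance of 0.05 is an assumption (in practice "the same" is usually checked up to some tolerance).

```python
from statistics import mean

def add_fog(image, density=0.5, fog_level=0.8):
    """Follow-up input: blend every grayscale pixel toward a uniform fog intensity."""
    return [[(1 - density) * p + density * fog_level for p in row] for row in image]

def predict_steering(image, threshold=None):
    """Toy stand-in for the DNN: steer toward the centroid of the bright pixels.

    With threshold=None the brightness cut-off adapts to the image mean;
    a fixed threshold models a predictor that is not robust to fog.
    """
    if threshold is None:
        threshold = mean(p for row in image for p in row)
    columns = [x for row in image for x, p in enumerate(row) if p > threshold]
    if not columns:
        return 0.0
    centre = (len(image[0]) - 1) / 2
    return (mean(columns) - centre) / centre  # normalised to [-1, 1]

def mr_fog_holds(model, image, tolerance=0.05):
    """MR: the predicted steering angle should not change when fog is added."""
    return abs(model(image) - model(add_fog(image))) <= tolerance

# Hypothetical 4x5 frame with a bright lane marking in column 3.
FRAME = [[0.9 if x == 3 else 0.1 for x in range(5)] for _ in range(4)]

def brittle_model(image):
    return predict_steering(image, threshold=0.4)

if __name__ == "__main__":
    print(mr_fog_holds(predict_steering, FRAME))  # relation satisfied
    print(mr_fog_holds(brittle_model, FRAME))     # relation violated
```

The fixed-threshold variant is included only to show what a violation looks like: once fog brightens every pixel above the threshold, the toy predictor steers straight ahead.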
Examples: Elevators
MR: If we increase the number of elevators (simulated in the test environment) the average wait time should decrease
J. Ayerdi, S. Segura, A. Arrieta, G. Sagardui and M. Arratibel, "QoS-aware Metamorphic Testing: An Elevation Case Study," 2020 IEEE 31st International Symposium on Software Reliability Engineering (ISSRE), Coimbra, Portugal, 2020, pp. 104-114, doi: 10.1109/ISSRE5003.2020.00019.
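The elevator relation can be illustrated with a deliberately simple model. This is a sketch under strong assumptions, not the simulation used in the cited study: waiting time is approximated by the floor distance between a call and the elevator assigned to it, both dispatch policies are invented, and "decrease" is checked as "does not increase".

```python
def nearest_dispatch(elevators, calls):
    """Hypothetical policy: each call is served by the closest elevator."""
    return [min(abs(e - c) for e in elevators) for c in calls]

def round_robin_dispatch(elevators, calls):
    """Hypothetical faulty policy: calls are assigned in turn, ignoring distance."""
    return [abs(elevators[i % len(elevators)] - c) for i, c in enumerate(calls)]

def average_wait(dispatch, elevators, calls):
    """Average wait, measured in floors travelled to reach each call."""
    distances = dispatch(elevators, calls)
    return sum(distances) / len(distances)

def mr_more_elevators_holds(dispatch, elevators, extra_elevator, calls):
    """MR: adding an elevator should not increase the average wait."""
    source = average_wait(dispatch, elevators, calls)
    follow_up = average_wait(dispatch, elevators + [extra_elevator], calls)
    return follow_up <= source

if __name__ == "__main__":
    print(mr_more_elevators_holds(nearest_dispatch, [0], 10, [0, 0, 9]))
    print(mr_more_elevators_holds(round_robin_dispatch, [0], 10, [0, 0]))
```

With the round-robin policy, the second call is sent to the new elevator on floor 10 even though an elevator is already waiting at floor 0, so the relation is violated and the fault is exposed.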
Examples: Compilers
Chen, T. Y., Kuo, F. C., Ma, W., Susilo, W., Towey, D., Voas, J., & Zhou, Z. Q. (2016). Metamorphic Testing for
Cybersecurity. Computer, 49(6), 48–55. https://doi.org/10.1109/MC.2016.176
Deriving Metamorphic Relations
§ Input-driven approach
§ Think of changes to the program's inputs that should produce expected changes in the outputs.
§ The possible changes in the input parameters depend on their data type.
§ Example for lists: adding an element to the list; removing an element from the list; splitting the list; reordering the list; and so on.
§ numerical and graph-theory programs
§ Output-driven approach
§ Start from possible relations among outputs typically found in the target domain, then think about what kind of changes in the program's inputs would lead to satisfaction of the expected relation among outputs.
§ Example output relations: having a result set which is a subset of another result set; having two result sets containing the same items; or having two disjoint result sets (sets with no common elements).
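Both derivation styles can be sketched on toy functions. The functions under test (`total`, `cheaper_than`) and all data are invented for illustration; the first three relations start from list changes (input-driven), the last starts from the subset relation between result sets (output-driven).

```python
import random

def total(values):
    """Toy function under test: sum of a list of integers."""
    return sum(values)

# Input-driven MRs: start from the changes a list input allows.
def mr_reorder(values, rng):
    shuffled = values[:]
    rng.shuffle(shuffled)  # follow-up input: reordered list
    return total(shuffled) == total(values)

def mr_add_element(values, extra):
    return total(values + [extra]) == total(values) + extra

def mr_split(values, index):
    return total(values[:index]) + total(values[index:]) == total(values)

def cheaper_than(prices, limit):
    """Second toy function under test: names of items priced below a limit."""
    return {name for name, price in prices.items() if price < limit}

# Output-driven MR: start from the "subset" relation between result sets,
# then pick the input change expected to produce it (a lower limit).
def mr_subset(prices, limit, lower_limit):
    return cheaper_than(prices, lower_limit) <= cheaper_than(prices, limit)

if __name__ == "__main__":
    rng = random.Random(0)
    print(mr_reorder([3, 1, 2], rng), mr_add_element([3, 1, 2], 5), mr_split([3, 1, 2], 1))
    print(mr_subset({"pen": 5, "book": 15}, limit=20, lower_limit=10))
```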
Inference of Metamorphic Relations
§ State-of-the-art: search-based approach (AutoMR)
§ Zhang, B., Zhang, H., Chen, J., Hao, D., & Moscato, P. (2019). Automatic Discovery and Cleansing of Numerical Metamorphic Relations. 2019 IEEE International
Conference on Software Maintenance and Evolution (ICSME), 235–245. https://doi.org/10.1109/ICSME.2019.00035
§ Other approaches:
§ Genetic programming (industrial application)
§ Zhang, B., Zhang, H., Chen, J., Hao, D., & Moscato, P. (2019). Automatic Discovery and Cleansing of Numerical Metamorphic Relations. 2019 IEEE International Conference on
Software Maintenance and Evolution (ICSME), 235–245. https://doi.org/10.1109/ICSME.2019.00035
§ Symbolic regression
§ Hong, J., Zhang, J., Qiu, Q., Ma, A., Li, M., Yan, S., & Gong, H. (2022). A Dynamic Recognition Method of Metamorphic Relation Identification. 13th International Conference on
Reliability, Maintainability, and Safety: Reliability and Safety of Intelligent Systems, ICRMS 2022, 81–86. https://doi.org/10.1109/ICRMS55680.2022.9944595
SnT contribution:
Metamorphic Testing for Web System Security
Framework for Metamorphic Testing automation
Work partially done with University of Ottawa
Nazanin Bayati (University of Ottawa)
Fabrizio Pastore (University of Luxembourg)
Arda Goknil (SINTEF Digital, Norway)
Lionel Briand (University of Ottawa, University of Luxembourg)
13 September 2023 Nazanin Bayati - Metamorphic Testing for Web System Security
Metamorphic Security Testing
• Source input: a sequence of valid interactions with the system
{login(Admin), RequestURL(settings_page)}
• Follow-up input: generated by altering valid interactions as an attacker would do
{login(User1), RequestURL(settings_page)}
• Relations: capture properties that hold when the system is not vulnerable
if the user in the follow-up input cannot access the URL from her GUI then the output of the
source and follow-up inputs should be different
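The Admin/User1 example above can be sketched as an executable check. This is an illustration only, not MST-wi code: `GUI_LINKS`, `PAGES`, and both server functions are hypothetical stand-ins for a real Web system.

```python
# Hypothetical Web system: pages reachable from each user's GUI, and page contents.
GUI_LINKS = {"Admin": {"home", "settings_page"}, "User1": {"home"}}
PAGES = {"home": "Welcome", "settings_page": "Settings"}

def request_url(user, url):
    """Toy server under test that enforces access control."""
    if url not in GUI_LINKS[user]:
        return "403 Forbidden"
    return PAGES[url]

def vulnerable_request_url(user, url):
    """Toy server that ignores who is asking (authorization bypass)."""
    return PAGES[url]

def mr_authorization_holds(server, source_user, follow_up_user, url):
    """If the follow-up user cannot reach the URL from the GUI,
    the source and follow-up outputs must differ."""
    if url in GUI_LINKS[follow_up_user]:
        return True  # precondition not met: the relation imposes no constraint
    source_output = server(source_user, url)        # {login(Admin), RequestURL(url)}
    follow_up_output = server(follow_up_user, url)  # {login(User1), RequestURL(url)}
    return source_output != follow_up_output

if __name__ == "__main__":
    print(mr_authorization_holds(request_url, "Admin", "User1", "settings_page"))
    print(mr_authorization_holds(vulnerable_request_url, "Admin", "User1", "settings_page"))
```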
MST-wi: Metamorphic Security Testing for Web Interfaces
[Figure: MST-wi workflow]
1. Select or specify the metamorphic relations (input: catalog of 76 metamorphic relations; output: list of metamorphic relations)
2. Translate metamorphic relations to Java (output: executable metamorphic relations in Java)
3. Execute the data collection framework on the Web system, e.g., log in, submit form, logout (output: source inputs)
4. Execute the metamorphic testing framework (output: test results)
MST-wi – MR Example
• Security issue: Bypass Authorization Schema
Our metamorphic testing algorithm executes each MR multiple times to ensure that every possible combination of source and follow-up inputs is exercised
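The exhaustive pairing of source and follow-up inputs can be sketched as a Cartesian product. This is a simplified illustration, not the MST-wi algorithm itself: `run_mr`, the `ALLOWED` table, the injected `LEAKS` fault, and the relation are all hypothetical.

```python
from itertools import product

def run_mr(relation, source_inputs, follow_up_choices):
    """Evaluate the relation once per (source input, follow-up choice) pair
    and return the pairs that violate it."""
    failures = []
    for source, choice in product(source_inputs, follow_up_choices):
        if not relation(source, choice):
            failures.append((source, choice))
    return failures

# Hypothetical system: URLs each user may access, plus one injected fault.
ALLOWED = {"admin": {"/home", "/settings"}, "user1": {"/home"}, "user2": {"/home"}}
LEAKS = {("user2", "/settings")}  # the server forgets this check

def fetch(user, url):
    if url in ALLOWED[user] or (user, url) in LEAKS:
        return f"content of {url}"
    return "403"

def relation(source, other_user):
    """A user without access to the URL must get a different output."""
    user, url = source
    if url in ALLOWED[other_user]:
        return True
    return fetch(user, url) != fetch(other_user, url)

if __name__ == "__main__":
    sources = [("admin", "/home"), ("admin", "/settings")]
    print(run_mr(relation, sources, ["user1", "user2"]))
```

Only the pair combining the `/settings` request with `user2` violates the relation, which is why exercising a single combination could miss the fault.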
D4.4 Prototype of a toolset for specification-based functional security testing of CPS
M18 EC review
Demo: Web systems
Project meeting
2023, June 13
MST Demo Objective: detect a real vulnerability
• Only admins should be able to launch/relaunch agent slaves
• However, non-admin users can also do it
Demo: ROS
ROS vulnerability https://github.com/aliasrobotics/RVD/issues/88
When running a test scenario with two interacting actors, executing a third unauthorized actor interacting with one of the two should not alter the output.
• Source input: a test scenario with interacting components
• Follow-up input: a test scenario with additional remote calls to one component from one unauthorized component
• Relations: the output should be the same
[Figure: sequence diagram with participants Publisher1, Master, Subscriber1, Attacker1. Messages in order: advertise(“position”); subscribe(“position”); publisherUpdate([“Publisher1-URI”]); tcpConnect(“position”); data(“position1”); print(“position1”); data(“position2”); print(“position2”); publisherUpdate([]); X (cross mark in the figure); data(“position3”); print(“position3”)]
MR for ROS vulnerability pattern
[Figure: MR for the ROS vulnerability pattern. Figure labels: Source input; Follow-up input; 1st Follow-up input; 2nd Follow-up input; User-provided catalog file; Delay]
Empirical Results
MST-wi – Research Questions
• RQ1. What testing activities can be automated thanks to oracle automation provided by MST-wi?
• RQ2. What vulnerability types can MST-wi detect?
• RQ3. What testability guidelines can we define to enable effective test automation with MST-wi?
• RQ4. How does MST-wi compare to state-of-the-art SAST and DAST tools?
• RQ5. Can we identify patterns for writing MST-wi relations?
• RQ6. Is MST-wi effective?
• RQ7. Is MST-wi efficient?
MST-wi – What vulnerability types can MST-wi detect?
• We investigated the feasibility of implementing MRs that discover the vulnerability types described in the
MITRE Common Weakness Enumeration (CWE) database
• Considered three subsets:
• CWE view for common security architectural tactics
• CWE Top 25 most dangerous software errors
• OWASP Top 10 Web security risks
• To implement an MR for a weakness, we first inspected its description, its demonstrative examples, and the descriptions of the concrete vulnerabilities (CVE) and common attack patterns (CAPEC) associated with it.
• This process led to a catalog of 76 MRs.
MST-wi – What vulnerability types can MST-wi detect?
Summary of the CWE architectural security design principles and weaknesses addressed by MST-wi.

Security Design Principle   Vulnerability types   Addressed by MST-wi   Rank
Audit                       6                     1 (16%)               10th
Authenticate Actors         28                    12 (43%)              4th
Authorize Actors            60                    34 (57%)              3rd
Cross Cutting               9                     3 (33%)               6th
Encrypt Data                38                    8 (21%)               8th
Identify Actors             12                    3 (25%)               7th
Limit Access                8                     3 (38%)               5th
Limit Exposure              6                     0 (0%)                11th
Lock Computer               1                     0 (0%)                11th
Manage User Session         6                     4 (67%)               2nd
Validate Inputs             39                    31 (79%)              1st
Verify Message Integrity    10                    2 (20%)               9th
Total                       223                   101 (45%)
MST-wi – How does MST-wi compare to state-of-the-art SAST and
DAST tools?
• We compared the vulnerability types detected by MST-wi with the vulnerability types detected by state-of-the-art SAST and DAST tools, as reported in a recent empirical study
MST-wi – How does MST-wi compare to state-of-the-art SAST and
DAST tools?
The set of weaknesses targeted by MST-wi is larger than what can be targeted by applying all four competing approaches together.

Column groups: (A) weaknesses addressed by each approach; (B) weaknesses addressed by MST but not by the named approach; (C) weaknesses not addressed by MST but addressed by the named approach.

Security Design            (A)                          (B)                     (C)
Principle                  MST  Zap  DA2  Sonar  SA2    Zap  DA2  Sonar  SA2    Zap  DA2  Sonar  SA2
Audit                        1    0    0      0    3      1    1      1    0      0    0      0    2
Authenticate Actors         12    0    2      1    9     12   11     11    7      0    1      0    4
Authorize Actors            34    2    0      1   13     32   34     34   25      0    0      1    4
Cross Cutting                3    0    0      2    0      3    3      2    3      0    0      1    0
Encrypt Data                 8    2    5      8   10      8    8      7    4      2    5      7    6
Identify Actors              3    1    1      1    7      3    3      3    1      1    1      1    5
Limit Access                 3    0    1      1    5      3    3      2    0      0    1      0    2
Limit Exposure               0    1    0      0    1      0    0      0    0      1    0      0    1
Lock Computer                0    0    0      0    0      0    0      0    0      0    0      0    0
Manage User Session          4    0    0      0    2      4    4      4    2      0    0      0    0
Validate Inputs             31   10    7      2   14     24   25     30   19      3    1      1    2
Verify Message Integrity     2    1    0      0    3      2    2      2    1      1    0      0    2
Total                      101   17   16     16   67     92   94     96   62      8    9     11   28
84
MST-wi – Is MST-wi effective?
Applied MST-wi to test well-known Web systems:
• Jenkins v2.121
• Joomla v3.8.7
Assessed MST-wi capability to detect known vulnerabilities:
• 11 for Jenkins, 3 for Joomla.
• One of them discovered by MST-wi (CVE-2018-17857)
Considered two setups:
• Derive source inputs with crawler only
• Consider additional manually implemented functional test cases
Metrics:
• Sensitivity: proportion of vulnerabilities identified
• Specificity: proportion of inputs not leading to false alarms
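The two metrics, as defined above, reduce to simple ratios. The sketch below follows the slide's definitions; the function names and the numbers in the usage example are invented and are not results from the study.

```python
def sensitivity(detected_vulnerabilities, known_vulnerabilities):
    """Proportion of known vulnerabilities that were identified."""
    return detected_vulnerabilities / known_vulnerabilities

def specificity(false_alarm_inputs, total_inputs):
    """Proportion of inputs that did not lead to a false alarm."""
    return (total_inputs - false_alarm_inputs) / total_inputs

if __name__ == "__main__":
    # Invented numbers, for illustration only.
    print(sensitivity(detected_vulnerabilities=7, known_vulnerabilities=10))
    print(specificity(false_alarm_inputs=5, total_inputs=1000))
```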
MST-wi – Is MST-wi effective?
• The high specificity indicates that only a negligible fraction of follow-up inputs leads to false alarms
• Since sensitivity reflects the fault detection rate (i.e., the proportion of vulnerabilities discovered),
we conclude that our approach is highly effective
• We can discover more than 60% of vulnerabilities in a completely automated manner, using only the crawler
• And up to 85% using both crawler and manual inputs
https://github.com/MetamorphicSecurityTesting/MST