How to Convert XML to YAML using Python
In this tutorial, you’ll learn how to convert XML to YAML using Python.
You’ll explore several ways to perform the conversion, handle different XML structures, and customize the output.
Using xmltodict and PyYAML
You can use the xmltodict library to parse XML and PyYAML to write YAML files.
import xmltodict
import yaml
xml_data = '''
<employees>
    <employee id="1">
        <name>Amina</name>
        <role>Developer</role>
    </employee>
    <employee id="2">
        <name>Omar</name>
        <role>Designer</role>
    </employee>
</employees>
'''
# Convert XML to dictionary
data_dict = xmltodict.parse(xml_data)
# Convert dictionary to YAML
yaml_data = yaml.dump(data_dict, sort_keys=False)
print(yaml_data)
Output:
employees:
  employee:
  - '@id': '1'
    name: Amina
    role: Developer
  - '@id': '2'
    name: Omar
    role: Designer
The XML structure is parsed into a Python dictionary and then serialized into YAML format.
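The example above prints the YAML string, while the text mentions writing YAML files. A minimal sketch of saving the result to disk might look like the following. The file name employees.yaml and the temporary directory are assumptions chosen for illustration, and dict_constructor=dict is passed so that yaml.safe_dump receives plain dictionaries regardless of the installed xmltodict version.

```python
import os
import tempfile

import xmltodict
import yaml

xml_data = (
    '<employees>'
    '<employee id="1"><name>Amina</name><role>Developer</role></employee>'
    '<employee id="2"><name>Omar</name><role>Designer</role></employee>'
    '</employees>'
)

# Parse into plain dicts so the safe dumper can serialize them
data_dict = xmltodict.parse(xml_data, dict_constructor=dict)

# Hypothetical output location; a temporary directory avoids touching real files
output_path = os.path.join(tempfile.mkdtemp(), 'employees.yaml')

# Write the YAML document to disk, keeping the original key order
with open(output_path, 'w', encoding='utf-8') as f:
    yaml.safe_dump(data_dict, f, sort_keys=False, allow_unicode=True)

# Read it back to confirm the file contains the same structure
with open(output_path, encoding='utf-8') as f:
    loaded = yaml.safe_load(f)
```

Reading the file back with yaml.safe_load is a quick way to check that the saved document round-trips to the same structure.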
Convert XML with Attributes
xmltodict preserves XML attributes when parsing, so they appear in the YAML output alongside the child elements.
import xmltodict
import yaml
xml_data = '''
<library>
    <book id="101">
        <title>Python Programming</title>
        <author>Hassan</author>
    </book>
    <book id="102">
        <title>Data Science Essentials</title>
        <author>Laila</author>
    </book>
</library>
'''
data_dict = xmltodict.parse(xml_data)
yaml_data = yaml.dump(data_dict, sort_keys=False)
print(yaml_data)
Output:
library:
  book:
  - '@id': '101'
    title: Python Programming
    author: Hassan
  - '@id': '102'
    title: Data Science Essentials
    author: Laila
Attributes like id are prefixed with @ in the YAML output to differentiate them from child elements.
Custom Parsing
Using ElementTree
You can use the built-in xml.etree.ElementTree module to extract each element manually, which gives you full control over the resulting structure:
import xml.etree.ElementTree as ET
import yaml
xml_data = '''
<products>
    <product>
        <name>Smartphone</name>
        <price>699</price>
    </product>
    <product>
        <name>Laptop</name>
        <price>999</price>
    </product>
</products>
'''
root = ET.fromstring(xml_data)
products = []
for product in root.findall('product'):
    prod = {
        'name': product.find('name').text,
        'price': int(product.find('price').text)
    }
    products.append(prod)
yaml_data = yaml.dump({'products': products}, sort_keys=False)
print(yaml_data)
Output:
products:
- name: Smartphone
  price: 699
- name: Laptop
  price: 999
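The manual approach above hard-codes the element names. When the structure is not known in advance, a small recursive helper can walk the tree instead. The function element_to_dict below is a hypothetical helper written for this sketch, not part of ElementTree; it ignores attributes and mixed content, and it leaves every value as a string.

```python
import xml.etree.ElementTree as ET

import yaml


def element_to_dict(element):
    """Recursively convert an Element into dicts, lists, and strings."""
    children = list(element)
    if not children:
        # Leaf element: return its text, or an empty string if there is none
        return (element.text or '').strip()
    result = {}
    for child in children:
        value = element_to_dict(child)
        if child.tag in result:
            # Repeated tag: collect the values into a list
            if not isinstance(result[child.tag], list):
                result[child.tag] = [result[child.tag]]
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    return result


xml_data = (
    '<products>'
    '<product><name>Smartphone</name><price>699</price></product>'
    '<product><name>Laptop</name><price>999</price></product>'
    '</products>'
)

root = ET.fromstring(xml_data)
data = {root.tag: element_to_dict(root)}
yaml_data = yaml.safe_dump(data, sort_keys=False)
print(yaml_data)
```

Unlike the hand-written loop, this version keeps the intermediate product key and does not convert price to an integer, so any type conversion would need to happen in a separate step.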
Using lxml
The third-party lxml library offers an ElementTree-compatible API along with extra features such as XPath support.
from lxml import etree
import yaml
xml_data = '''
<company>
    <employee>
        <name>Yasmine</name>
        <department>HR</department>
    </employee>
    <employee>
        <name>Karim</name>
        <department>Engineering</department>
    </employee>
</company>
'''
root = etree.fromstring(xml_data)
employees = []
for emp in root.findall('employee'):
    employee = {
        'name': emp.findtext('name'),
        'department': emp.findtext('department')
    }
    employees.append(employee)
yaml_data = yaml.dump({'employees': employees}, sort_keys=False)
print(yaml_data)
Output:
employees:
- name: Yasmine
  department: HR
- name: Karim
  department: Engineering
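The example above uses only calls that ElementTree also provides. One feature specific to lxml is its xpath() method, which can select elements by condition before conversion. The sketch below assumes you want only employees in the Engineering department; the filter value is chosen purely for illustration.

```python
from lxml import etree
import yaml

xml_data = (
    '<company>'
    '<employee><name>Yasmine</name><department>HR</department></employee>'
    '<employee><name>Karim</name><department>Engineering</department></employee>'
    '</company>'
)

root = etree.fromstring(xml_data)

# XPath predicate: keep employee elements whose department child equals 'Engineering'
engineers = root.xpath("employee[department='Engineering']")

result = [
    {'name': emp.findtext('name'), 'department': emp.findtext('department')}
    for emp in engineers
]

yaml_data = yaml.dump({'employees': result}, sort_keys=False)
print(yaml_data)
```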
Custom Key Naming
You can rename keys by iterating over the parsed items and building new dictionaries with the names you want:
import xmltodict
import yaml
xml_data = '''
<inventory>
    <item id="201">
        <productName>Tablet</productName>
        <quantity>50</quantity>
    </item>
    <item id="202">
        <productName>Headphones</productName>
        <quantity>150</quantity>
    </item>
</inventory>
'''
data_dict = xmltodict.parse(xml_data)
# Rename keys
items = []
for item in data_dict['inventory']['item']:
    items.append({
        'product_id': item['@id'],
        'name': item['productName'],
        'stock': int(item['quantity'])
    })
yaml_data = yaml.dump({'inventory': items}, sort_keys=False)
print(yaml_data)
Output:
inventory:
- product_id: '201'
  name: Tablet
  stock: 50
- product_id: '202'
  name: Headphones
  stock: 150
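When many keys need renaming, listing them in a mapping can be easier to maintain than writing each assignment by hand. The following is a minimal sketch of that idea; KEY_MAP and rename_keys are hypothetical names introduced here, and keys missing from the mapping are kept unchanged. Unlike the example above, it does not convert quantity to an integer.

```python
import xmltodict
import yaml

# Hypothetical mapping from XML names to the desired YAML keys
KEY_MAP = {'@id': 'product_id', 'productName': 'name', 'quantity': 'stock'}


def rename_keys(record, key_map):
    """Return a copy of record with its keys renamed according to key_map."""
    return {key_map.get(key, key): value for key, value in record.items()}


xml_data = (
    '<inventory>'
    '<item id="201"><productName>Tablet</productName><quantity>50</quantity></item>'
    '<item id="202"><productName>Headphones</productName><quantity>150</quantity></item>'
    '</inventory>'
)

data_dict = xmltodict.parse(xml_data)

# Apply the mapping to every item; values stay as the strings xmltodict produced
items = [rename_keys(item, KEY_MAP) for item in data_dict['inventory']['item']]

yaml_data = yaml.dump({'inventory': items}, sort_keys=False)
print(yaml_data)
```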
Convert Specific Elements
To convert only specific elements, pick the keys you need while iterating over the parsed data:
import xmltodict
import yaml
xml_data = '''
<university>
    <student>
        <name>Salma</name>
        <major>Biology</major>
        <gpa>3.8</gpa>
    </student>
    <student>
        <name>Tarek</name>
        <major>Mathematics</major>
        <gpa>3.9</gpa>
    </student>
</university>
'''
data_dict = xmltodict.parse(xml_data)
# Extract only names and majors
students = []
for student in data_dict['university']['student']:
    students.append({
        'name': student['name'],
        'major': student['major']
    })
yaml_data = yaml.dump({'students': students}, sort_keys=False)
print(yaml_data)
Output:
students:
- name: Salma
  major: Biology
- name: Tarek
  major: Mathematics
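One caveat with this kind of loop: when the XML contains only a single student element, xmltodict returns a dictionary rather than a list, so iterating over it yields key names and the lookup student['name'] fails. A possible safeguard, sketched below, is xmltodict's force_list parameter, which keeps the named elements as lists even when only one is present.

```python
import xmltodict
import yaml

# Only one student here, which would normally be parsed as a single dict
xml_data = (
    '<university>'
    '<student><name>Salma</name><major>Biology</major><gpa>3.8</gpa></student>'
    '</university>'
)

# force_list makes 'student' a list regardless of how many elements exist
data_dict = xmltodict.parse(xml_data, force_list=('student',))

# Extract only names and majors, exactly as in the multi-student case
students = [
    {'name': student['name'], 'major': student['major']}
    for student in data_dict['university']['student']
]

yaml_data = yaml.dump({'students': students}, sort_keys=False)
print(yaml_data)
```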
Mokhtar is the founder of LikeGeeks.com.