How to Convert XML to YAML using Python

In this tutorial, you’ll learn how to convert XML to YAML using Python.

You’ll explore several methods for transforming XML to YAML, handling different XML structures, and customizing the output.

Table of Contents

1 Using xmltodict and PyYAML
2 Convert XML with Attributes
3 Custom Parsing
- 3.1 Using ElementTree
- 3.2 Using lxml
4 Custom Key Naming
5 Convert Specific Elements

Using xmltodict and PyYAML

You can use the xmltodict library to parse XML and PyYAML to write YAML files.

import xmltodict
import yaml
xml_data = '''
<employees>
    <employee id="1">
        <name>Amina</name>
        <role>Developer</role>
    </employee>
    <employee id="2">
        <name>Omar</name>
        <role>Designer</role>
    </employee>
</employees>
'''

# Convert XML to dictionary
data_dict = xmltodict.parse(xml_data)

# Convert dictionary to YAML
yaml_data = yaml.dump(data_dict, sort_keys=False)
print(yaml_data)

Output:

employees:
  employee:
  - '@id': '1'
    name: Amina
    role: Developer
  - '@id': '2'
    name: Omar
    role: Designer

The XML structure is parsed into a Python dictionary and then serialized into YAML format.
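The example above prints the YAML rather than saving it. A minimal sketch of writing the result to a file follows; the output path is a hypothetical temporary location, and passing dict_constructor=dict is a precaution for older xmltodict releases that return OrderedDict, which yaml.safe_dump cannot represent.

```python
import tempfile
from pathlib import Path

import xmltodict
import yaml

xml_data = "<employees><employee id='1'><name>Amina</name></employee></employees>"

# dict_constructor=dict forces plain dicts at every nesting level
data_dict = xmltodict.parse(xml_data, dict_constructor=dict)

# Hypothetical output location; a temporary directory keeps the sketch self-contained
out_path = Path(tempfile.mkdtemp()) / "employees.yaml"

with out_path.open("w", encoding="utf-8") as fh:
    # safe_dump emits only standard YAML tags; allow_unicode keeps non-ASCII text readable
    yaml.safe_dump(data_dict, fh, sort_keys=False, allow_unicode=True)

# Read the file back to confirm the round trip
with out_path.open(encoding="utf-8") as fh:
    loaded = yaml.safe_load(fh)

print(loaded == data_dict)
```

Reading the file back with yaml.safe_load is a quick way to check that nothing was lost between the dictionary and the file.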

Convert XML with Attributes

xmltodict keeps XML attributes by default and stores each one under a key prefixed with @, so they carry over to the YAML output alongside the child elements.

import xmltodict
import yaml
xml_data = '''
<library>
    <book id="101">
        <title>Python Programming</title>
        <author>Hassan</author>
    </book>
    <book id="102">
        <title>Data Science Essentials</title>
        <author>Laila</author>
    </book>
</library>
'''
data_dict = xmltodict.parse(xml_data)
yaml_data = yaml.dump(data_dict, sort_keys=False)
print(yaml_data)

Output:

library:
  book:
  - '@id': '101'
    title: Python Programming
    author: Hassan
  - '@id': '102'
    title: Data Science Essentials
    author: Laila

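Attributes such as id are prefixed with @ in the YAML output to distinguish them from child elements. The key is quoted because @ is a reserved character in YAML, and the value is quoted because xmltodict returns every value as a string.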
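If the @ prefix is not wanted, xmltodict.parse accepts keyword arguments that change how attributes are handled. The sketch below uses attr_prefix and xml_attribs on a shortened version of the library document; note that with an empty prefix, an attribute and a child element sharing the same name would collide.

```python
import xmltodict
import yaml

xml_data = '''
<library>
    <book id="101">
        <title>Python Programming</title>
    </book>
</library>
'''

# attr_prefix='' stores attributes under their bare names (id instead of @id)
plain = xmltodict.parse(xml_data, attr_prefix='')

# xml_attribs=False discards attributes entirely
no_attrs = xmltodict.parse(xml_data, xml_attribs=False)

print(yaml.dump(plain, sort_keys=False))
print(yaml.dump(no_attrs, sort_keys=False))
```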

Custom Parsing

Using ElementTree

For full control over the result, you can parse the XML with the standard library's ElementTree module and build the output dictionary yourself, converting types as needed:

import xml.etree.ElementTree as ET
import yaml
xml_data = '''
<products>
    <product>
        <name>Smartphone</name>
        <price>699</price>
    </product>
    <product>
        <name>Laptop</name>
        <price>999</price>
    </product>
</products>
'''
root = ET.fromstring(xml_data)
products = []
for product in root.findall('product'):
    prod = {
        'name': product.find('name').text,
        'price': int(product.find('price').text)
    }
    products.append(prod)
yaml_data = yaml.dump({'products': products}, sort_keys=False)
print(yaml_data)

Output:

products:
- name: Smartphone
  price: 699
- name: Laptop
  price: 999
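The loop above is tied to the product structure. As a rough sketch of how the same manual approach could be generalized, the hypothetical helper element_to_dict below walks any element recursively. It is a simplification: it ignores attributes and mixed content, and it leaves all values as strings.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

import yaml


def element_to_dict(elem):
    """Recursively convert an Element into plain dicts, lists, and strings."""
    children = list(elem)
    if not children:
        # Leaf element: return its stripped text, or None if it is empty
        text = (elem.text or '').strip()
        return text or None
    grouped = defaultdict(list)
    for child in children:
        grouped[child.tag].append(element_to_dict(child))
    # Repeated tags stay as lists; tags that occur once collapse to a single value
    return {tag: values if len(values) > 1 else values[0]
            for tag, values in grouped.items()}


xml_data = '''
<products>
    <product>
        <name>Smartphone</name>
        <price>699</price>
    </product>
    <product>
        <name>Laptop</name>
        <price>999</price>
    </product>
</products>
'''

root = ET.fromstring(xml_data)
data = {root.tag: element_to_dict(root)}
print(yaml.dump(data, sort_keys=False))
```

Unlike the hand-written loop, this keeps price as a string, so any type conversion would have to be added separately.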

Using lxml

The lxml library offers an ElementTree-compatible API along with extras such as XPath support, so the same manual approach works with it:

from lxml import etree
import yaml
xml_data = '''
<company>
    <employee>
        <name>Yasmine</name>
        <department>HR</department>
    </employee>
    <employee>
        <name>Karim</name>
        <department>Engineering</department>
    </employee>
</company>
'''
root = etree.fromstring(xml_data)
employees = []
for emp in root.findall('employee'):
    employee = {
        'name': emp.findtext('name'),
        'department': emp.findtext('department')
    }
    employees.append(employee)
yaml_data = yaml.dump({'employees': employees}, sort_keys=False)
print(yaml_data)

Output:

employees:
- name: Yasmine
  department: HR
- name: Karim
  department: Engineering
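One capability lxml adds over the standard library is XPath support through the xpath() method. The sketch below selects only the names of employees in one department; the output key engineering is an illustrative choice, not part of the original example.

```python
from lxml import etree
import yaml

xml_data = '''
<company>
    <employee>
        <name>Yasmine</name>
        <department>HR</department>
    </employee>
    <employee>
        <name>Karim</name>
        <department>Engineering</department>
    </employee>
</company>
'''

root = etree.fromstring(xml_data)

# The predicate in square brackets keeps only employees whose department matches
matches = root.xpath('employee[department="Engineering"]/name/text()')

# xpath() returns lxml string subclasses; convert to plain str before dumping
names = [str(name) for name in matches]

print(yaml.dump({'engineering': names}, sort_keys=False))
```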

Custom Key Naming

To rename keys, iterate over the parsed items and build a new dictionary for each one using the key names you want:

import xmltodict
import yaml
xml_data = '''
<inventory>
    <item id="201">
        <productName>Tablet</productName>
        <quantity>50</quantity>
    </item>
    <item id="202">
        <productName>Headphones</productName>
        <quantity>150</quantity>
    </item>
</inventory>
'''
data_dict = xmltodict.parse(xml_data)

# Rename keys
items = []
for item in data_dict['inventory']['item']:
    items.append({
        'product_id': item['@id'],
        'name': item['productName'],
        'stock': int(item['quantity'])
    })
yaml_data = yaml.dump({'inventory': items}, sort_keys=False)
print(yaml_data)

Output:

inventory:
- product_id: '201'
  name: Tablet
  stock: 50
- product_id: '202'
  name: Headphones
  stock: 150
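One caveat with this loop: when the XML contains a single item element, xmltodict returns a dictionary for it instead of a list, and the loop would then iterate over that dictionary's keys. A minimal sketch of guarding against this with the force_list argument of xmltodict.parse:

```python
import xmltodict
import yaml

xml_data = '''
<inventory>
    <item id="201">
        <productName>Tablet</productName>
        <quantity>50</quantity>
    </item>
</inventory>
'''

# force_list makes 'item' a list even when the element occurs only once
data_dict = xmltodict.parse(xml_data, force_list=('item',))

items = [
    {
        'product_id': item['@id'],
        'name': item['productName'],
        'stock': int(item['quantity']),
    }
    for item in data_dict['inventory']['item']
]

print(yaml.dump({'inventory': items}, sort_keys=False))
```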

Convert Specific Elements

To convert only some of the elements, pick out the fields you need while iterating and leave the rest behind:

import xmltodict
import yaml
xml_data = '''
<university>
    <student>
        <name>Salma</name>
        <major>Biology</major>
        <gpa>3.8</gpa>
    </student>
    <student>
        <name>Tarek</name>
        <major>Mathematics</major>
        <gpa>3.9</gpa>
    </student>
</university>
'''
data_dict = xmltodict.parse(xml_data)

# Extract only names and majors
students = []
for student in data_dict['university']['student']:
    students.append({
        'name': student['name'],
        'major': student['major']
    })
yaml_data = yaml.dump({'students': students}, sort_keys=False)
print(yaml_data)

Output:

students:
- name: Salma
  major: Biology
- name: Tarek
  major: Mathematics

Mokhtar Ebrahim

Mokhtar is the founder of LikeGeeks.com. He is a seasoned technologist and accomplished author, with expertise in Linux system administration and Python development. Since 2010, Mokhtar has built an impressive career, transitioning from system administration to Python development in 2015. His work spans large corporations to freelance clients around the globe. Alongside his technical work, Mokhtar has authored some insightful books in his field. Known for his innovative solutions, meticulous attention to detail, and high-quality work, Mokhtar continually seeks new challenges within the dynamic field of technology.
