5 Methods to Split a JSON Array in Python
In this tutorial, you’ll learn various methods to split JSON arrays in Python.
You’ll learn about list slicing, condition-based splitting, using libraries like NumPy and Pandas, and more.
Using List Slicing
First, assume you have a JSON array like this:
import json
json_data = '''
[
{"id": 1, "name": "Customer A", "plan": "Premium"},
{"id": 2, "name": "Customer B", "plan": "Basic"},
{"id": 3, "name": "Customer C", "plan": "Standard"},
{"id": 4, "name": "Customer D", "plan": "Premium"},
{"id": 5, "name": "Customer E", "plan": "Basic"},
{"id": 6, "name": "Customer F", "plan": "Standard"},
{"id": 7, "name": "Customer G", "plan": "Premium"},
{"id": 8, "name": "Customer H", "plan": "Basic"},
{"id": 9, "name": "Customer I", "plan": "Standard"},
{"id": 10, "name": "Customer J", "plan": "Premium"}
]
'''
customers = json.loads(json_data)
Now, let’s say you want to split this array into two parts. For simplicity, let’s split it in the middle.
# Splitting the array
mid_index = len(customers) // 2
first_half = customers[:mid_index]
second_half = customers[mid_index:]
print("First half:", first_half)
print("Second half:", second_half)
Output:
First half: [{'id': 1, 'name': 'Customer A', 'plan': 'Premium'}, {'id': 2, 'name': 'Customer B', 'plan': 'Basic'}, {'id': 3, 'name': 'Customer C', 'plan': 'Standard'}, {'id': 4, 'name': 'Customer D', 'plan': 'Premium'}, {'id': 5, 'name': 'Customer E', 'plan': 'Basic'}]
Second half: [{'id': 6, 'name': 'Customer F', 'plan': 'Standard'}, {'id': 7, 'name': 'Customer G', 'plan': 'Premium'}, {'id': 8, 'name': 'Customer H', 'plan': 'Basic'}, {'id': 9, 'name': 'Customer I', 'plan': 'Standard'}, {'id': 10, 'name': 'Customer J', 'plan': 'Premium'}]
In this output, the original array has been divided into two halves.
The mid_index marks the splitting point. With an even number of items, first_half and second_half are the same size; with an odd number, the floor division puts the extra item in second_half.
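As a small illustration of the odd-length case, and of turning the halves back into JSON text, here is a minimal sketch. The five-item records list is hypothetical sample data, not the customer list above.

```python
import json

# Hypothetical sample data with an odd number of items
json_data = '[{"id": 1}, {"id": 2}, {"id": 3}, {"id": 4}, {"id": 5}]'
records = json.loads(json_data)

mid_index = len(records) // 2   # 5 // 2 == 2
first_half = records[:mid_index]
second_half = records[mid_index:]

# Serialize each half back to a JSON string
first_json = json.dumps(first_half)
second_json = json.dumps(second_half)

print(first_json)   # [{"id": 1}, {"id": 2}]
print(second_json)  # [{"id": 3}, {"id": 4}, {"id": 5}]
```

Slicing never drops or duplicates an item, so joining the two halves always gives back the original list.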
Using List Comprehensions
You can split a JSON array based on a condition using list comprehensions.
Suppose you want to separate customers into two groups based on their plan: ‘Premium’ and others.
premium_customers = [customer for customer in customers if customer['plan'] == 'Premium']
other_customers = [customer for customer in customers if customer['plan'] != 'Premium']
print("Premium Customers:", premium_customers)
print("Other Customers:", other_customers)
Output:
Premium Customers: [{'id': 1, 'name': 'Customer A', 'plan': 'Premium'}, {'id': 4, 'name': 'Customer D', 'plan': 'Premium'}, {'id': 7, 'name': 'Customer G', 'plan': 'Premium'}, {'id': 10, 'name': 'Customer J', 'plan': 'Premium'}]
Other Customers: [{'id': 2, 'name': 'Customer B', 'plan': 'Basic'}, {'id': 3, 'name': 'Customer C', 'plan': 'Standard'}, {'id': 5, 'name': 'Customer E', 'plan': 'Basic'}, {'id': 6, 'name': 'Customer F', 'plan': 'Standard'}, {'id': 8, 'name': 'Customer H', 'plan': 'Basic'}, {'id': 9, 'name': 'Customer I', 'plan': 'Standard'}]
This creates two new lists, premium_customers and other_customers. Each comprehension walks the original list once and keeps the items in their original order.
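If walking the list twice is a concern, for example with a very large array, the same split can be done in a single pass. The sketch below uses a hypothetical helper named partition and a shortened sample list; it is one possible approach, not part of any library.

```python
def partition(items, predicate):
    """Split items into (matching, non_matching) in one pass, preserving order."""
    matching, non_matching = [], []
    for item in items:
        # Append to whichever list the predicate selects
        (matching if predicate(item) else non_matching).append(item)
    return matching, non_matching


# Hypothetical sample data in the same shape as the customer records
sample = [
    {"id": 1, "plan": "Premium"},
    {"id": 2, "plan": "Basic"},
    {"id": 3, "plan": "Standard"},
    {"id": 4, "plan": "Premium"},
]

premium, others = partition(sample, lambda c: c["plan"] == "Premium")
print([c["id"] for c in premium])  # [1, 4]
print([c["id"] for c in others])   # [2, 3]
```

For short lists, the two comprehensions are usually easier to read; the helper mainly pays off when the condition is expensive to evaluate.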
Using numpy.array_split
NumPy’s array_split function is useful when you need to divide data into nearly equal parts, even when the number of items can’t be divided evenly.
First, ensure you have numpy installed:
pip install numpy
Now, let’s apply numpy.array_split to our customer data. Assume the same JSON data as before, converted into a Python list.
import numpy as np
split_arrays = np.array_split(customers, 3)
for i, array in enumerate(split_arrays):
    print(f"Part {i+1}:", array.tolist())
Output:
Part 1: [{'id': 1, 'name': 'Customer A', 'plan': 'Premium'}, {'id': 2, 'name': 'Customer B', 'plan': 'Basic'}, {'id': 3, 'name': 'Customer C', 'plan': 'Standard'}, {'id': 4, 'name': 'Customer D', 'plan': 'Premium'}]
Part 2: [{'id': 5, 'name': 'Customer E', 'plan': 'Basic'}, {'id': 6, 'name': 'Customer F', 'plan': 'Standard'}, {'id': 7, 'name': 'Customer G', 'plan': 'Premium'}]
Part 3: [{'id': 8, 'name': 'Customer H', 'plan': 'Basic'}, {'id': 9, 'name': 'Customer I', 'plan': 'Standard'}, {'id': 10, 'name': 'Customer J', 'plan': 'Premium'}]
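In this example, numpy.array_split divides the list of customers into three parts. Ten items can’t be divided evenly by three, so the first part gets four items and the other two get three each. You could produce the same split with list slicing, but array_split works out the boundaries for you.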
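To make the sizing rule concrete, here is a pure-Python sketch that mirrors it: for n items and k parts, the first n % k parts get n // k + 1 items and the rest get n // k. The function name split_nearly_equal is hypothetical, and the sketch assumes parts is a positive integer. It illustrates the rule and is not NumPy's implementation.

```python
def split_nearly_equal(items, parts):
    """Split a list into `parts` chunks whose sizes differ by at most one."""
    base, extra = divmod(len(items), parts)
    chunks, start = [], 0
    for i in range(parts):
        # The first `extra` chunks each take one additional item
        size = base + (1 if i < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


ids = list(range(1, 11))  # stands in for the ten customer ids
print(split_nearly_equal(ids, 3))  # [[1, 2, 3, 4], [5, 6, 7], [8, 9, 10]]
```

The chunk sizes 4, 3, 3 match the Part 1, Part 2, and Part 3 output shown above.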
Using Iterative Splitting
Iterative splitting is useful when the division logic has to be decided while you walk through the data.
Suppose you want to split the customer data into groups of a fixed size. Because the sample data cycles through the plan types in order, each full group of three also ends up with a mix of plans. Here’s how you can do it:
def iterative_split(data, group_size):
    groups = []
    temp_group = []
    for item in data:
        temp_group.append(item)
        if len(temp_group) == group_size:
            groups.append(temp_group)
            temp_group = []
    if temp_group:
        groups.append(temp_group)
    return groups
grouped_customers = iterative_split(customers, 3)
for i, group in enumerate(grouped_customers):
    print(f"Group {i+1}:", group)
Output:
Group 1: [{'id': 1, 'name': 'Customer A', 'plan': 'Premium'}, {'id': 2, 'name': 'Customer B', 'plan': 'Basic'}, {'id': 3, 'name': 'Customer C', 'plan': 'Standard'}]
Group 2: [{'id': 4, 'name': 'Customer D', 'plan': 'Premium'}, {'id': 5, 'name': 'Customer E', 'plan': 'Basic'}, {'id': 6, 'name': 'Customer F', 'plan': 'Standard'}]
Group 3: [{'id': 7, 'name': 'Customer G', 'plan': 'Premium'}, {'id': 8, 'name': 'Customer H', 'plan': 'Basic'}, {'id': 9, 'name': 'Customer I', 'plan': 'Standard'}]
Group 4: [{'id': 10, 'name': 'Customer J', 'plan': 'Premium'}]
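In this example, iterative_split is a function that takes a list of data and a group size. It fills temp_group until it reaches group_size, appends it to groups, and starts a new one. Any leftover items form a final, smaller group, which is why Group 4 holds a single customer.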
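The same fixed-size grouping can also be written as a generator, which yields one group at a time instead of building the full list of groups up front. This is a minimal sketch using itertools.islice from the standard library; the name chunked is hypothetical.

```python
from itertools import islice


def chunked(iterable, group_size):
    """Yield successive lists of up to group_size items from any iterable."""
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    iterator = iter(iterable)
    while True:
        # Pull the next group_size items; an empty list means we are done
        chunk = list(islice(iterator, group_size))
        if not chunk:
            return
        yield chunk


print(list(chunked(range(1, 11), 3)))  # [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10]]
```

On Python 3.12 and later, itertools.batched offers similar behavior, though it yields tuples rather than lists.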
Using Pandas GroupBy
The groupby method in Pandas segments a DataFrame into groups based on some criteria, so you can iterate over the groups or apply a function to each one independently.
First, make sure Pandas is installed:
pip install pandas
Now, let’s use groupby to split our customer data based on their subscription plan.
import pandas as pd
# Converting list to Pandas DataFrame
customers_df = pd.DataFrame(customers)
grouped_customers = customers_df.groupby('plan')
for plan, group in grouped_customers:
    print(f"Plan: {plan}")
    print(group)
Output:
Plan: Basic
   id        name   plan
1   2  Customer B  Basic
4   5  Customer E  Basic
7   8  Customer H  Basic
Plan: Premium
   id        name     plan
0   1  Customer A  Premium
3   4  Customer D  Premium
6   7  Customer G  Premium
9  10  Customer J  Premium
// ... additional plans ...
In this case, groupby('plan') groups the data based on the ‘plan’ column.
Each group then contains only the rows from the DataFrame that share the same plan value.
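Since the goal is to split a JSON array, you may want each group back as JSON rather than as a DataFrame. The sketch below is one way to do that with DataFrame.to_json(orient="records"), assuming Pandas is installed. The shortened sample list and the name json_by_plan are hypothetical.

```python
import json

import pandas as pd

# Hypothetical sample data in the same shape as the customer records
sample = [
    {"id": 1, "name": "Customer A", "plan": "Premium"},
    {"id": 2, "name": "Customer B", "plan": "Basic"},
    {"id": 3, "name": "Customer C", "plan": "Standard"},
    {"id": 4, "name": "Customer D", "plan": "Premium"},
]
df = pd.DataFrame(sample)

# Build one JSON array string per plan value
json_by_plan = {
    plan: group.to_json(orient="records")
    for plan, group in df.groupby("plan")
}

# Parse one group back to confirm it is a valid JSON array
premium = json.loads(json_by_plan["Premium"])
print([c["id"] for c in premium])  # [1, 4]
```

Each value in json_by_plan is a standalone JSON array string that can be written to a file or sent to another service.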
Mokhtar is the founder of LikeGeeks.com. He is a seasoned technologist and accomplished author, with expertise in Linux system administration and Python development. Since 2010, Mokhtar has built an impressive career, transitioning from system administration to Python development in 2015. His work spans large corporations to freelance clients around the globe. Alongside his technical work, Mokhtar has authored some insightful books in his field. Known for his innovative solutions, meticulous attention to detail, and high-quality work, Mokhtar continually seeks new challenges within the dynamic field of technology.