How to Merge Multiple Excel Files Using Python

Written by luca1iu | Published 2023/12/15
Tech Story Tags: data-analysis | merge-excel-files | python-for-excel | excel-tutorial | excel-hacks | excel-for-data-analysis | merge-excel-files-with-python

TLDRTired of wrestling with multiple Excel files? Python comes to the rescue! Explore the simplicity of merging Excel files using Python's pandas and os modules. The provided code effortlessly navigates through files in a specified folder, combining them into a structured 'merged_excel.xlsx' file. Whether you're dealing with straightforward data merging or nuanced scenarios like date-driven filenames, Python offers a powerful solution. Elevate your data management game and transform chaos into order with this comprehensive guide.via the TL;DR App

In today's data-driven landscape, consolidating information scattered across multiple Excel files is a common challenge. Fortunately, Python provides powerful tools to streamline this process, offering a more efficient way to merge and organize data. In this guide, we'll explore how to leverage Python's capabilities to effortlessly combine multiple Excel files into a cohesive and structured dataset.

In the world of data, things get messy when you have multiple Excel files with the same columns. If you're looking to tidy up and merge them all into one file, especially when you need to go through them one by one, you're in the right place. This guide will show you an easy way to make sense of it all and keep your data organized.

Python Solution

Merge multiple Excel files effortlessly with this Python code! 🚀 Using pandas and os, the script navigates through files in a specified folder, combining them into a neat merged_excel.xlsx file.

Just plug in your folder path, run the code, and voila – streamlined data!

# import packages
import pandas as pd
import os

# Define a function 'append' to merge Excel files in a specified path
def append(path):
	# Create an empty list to store individual DataFrames    
	frames = []    
	for root, dirs, files in os.walk(path):        
		for file in files:            
				file_with_path = os.path.join(root, file)          
				df = pd.read_excel(file_with_path)      
				frames.append(df)    
	df = pd.concat(frames, axis=0)    
	return df

# path:The folder path where storage all the excel files 
df = append(path)  
df.to_excel("merged_excel.xlsx")

In this code snippet, we're using two powerful tools: pandas and os (a module for working with the operating system).

The append function is the star here. It digs through all the Excel files in a specified folder ('path') and collects them into a DataFrame, which is like a neat table for our data.

Now, for the magic moment: the last two lines! They use our append function to merge all the Excel data in the specified folder into one consolidated file called merged_excel.xlsx.


Bonus: Handling Unique Data Situations

While we've explored solutions for straightforward data merging, let's navigate a slight detour to address more nuanced scenarios.

Picture this: your Excel files carry distinct dates in their names, demanding not just consolidation but a thoughtful integration into a final summary so that you can know the date of data in the final version.

Situation 1: Data-Driven Excel Filenames: Crafting a Comprehensive Summary Excel

For example, if the file name follows a pattern like "filename_YYYY-MM-DD.xlsx", and you want to extract both the date and the filename, you can adapt the code as follows:

import pandas as pd
import os

def append(path):
    frames = []  # Create an empty list to store individual DataFrames
    for root, dirs, files in os.walk(path):
        for file in files:
            file_with_path = os.path.join(root, file)
            
            # Extract filename and date information from the file name
            file_info = os.path.splitext(file)[0].split('_')
            
            # Read each Excel file using pandas
            df = pd.read_excel(file_with_path)
            
            # Add filename and date columns to the DataFrame
            df["filename"] = file_info[0]  # Assuming filename is the first part
            df["date"] = pd.to_datetime(file_info[1])  # Assuming date is the second part
            
            # Append the DataFrame to the list
            frames.append(df)
    
    # Concatenate all DataFrames in the list along the rows
    df = pd.concat(frames, axis=0)
    
    return df

Situation 2: Streamlining Data: Merging Multiple Excel Files into One with Distinct Sheets Named After Each File

Now, let's pivot to another scenario. Suppose your goal is a consolidated master file, but with a twist – each file contributes to a separate sheet, named after its original file.

import pandas as pd
import os

def combine(path):
    with pd.ExcelWriter("merged_excel.xlsx") as writer:
        for root, dirs, files in os.walk(path):
            for file in files:
                filename = os.path.join(root, file)
                df = pd.read_excel(filename)
                # Delete the file name suffix, sometimes it could be xlsv/.xlsx
                df.to_excel(writer, sheet_name=file.strip(
                    '.csv'))  
        return df


Written by luca1iu | a Business Intelligence Developer with a passion for all things data
Published by HackerNoon on 2023/12/15