Tuesday 9 March 2021

Concatenate large csv files using python

 If your csv files are very large say more than 100MB, then it would be very difficult to concatenate the csv files using conventional methods.

In Python we can use the library shutil to concatenate multiple large csv files into a single csv file.

Sample code is

import shutil
csv_files = ['source1.csv', 'source2.csv', 'source3.csv', 'source4.csv', 'source5.csv']
target_file_name = 'dest.csv';
shutil.copy(csv_files[0], target_file_name)
with open(target_file_name, 'a') as out_file:
for source_file in csv_files[1:]:
with open(source_file, 'r') as in_file:
# if your csv doesn't contains header, then remove the following line.
in_file.readline()
shutil.copyfileobj(in_file, out_file)
in_file.close()
out_file.close()
Image for post

To Test the code, download some sample large csv file (eg : https://www.stats.govt.nz/large-datasets/csv-files-for-download/)

Then make some copies of same files and run the above program.


from : https://medium.com/@princekfrancis/concatenate-large-csv-files-using-python-7e155e70f643

No comments:

Post a Comment