How to split a large CSV file based on the number of rows 🔪
source link: https://dev.to/ranb2002/how-to-split-a-large-csv-file-based-on-the-number-of-rows-312o
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
How to split a large CSV file based on the number of rows 🔪
Jul 22 Originally published at benjaminrancourt.ca on Jun 24
・2 min read
Recently I had to import CSV files into an external application at work, but the application was not able to process all the rows because it was triggering timeouts after a few minutes. The files weren't exactly huge, they were only 8MB and maxed out at 200,000 lines, but that was too much for this application... 😢
I had no control over the external software, but I needed smaller files. I could have adapted the application that creates them, but it would have required fetching the data from several APIs again and it was taking a long time. ⌛
Instead, I decided to try splitting the files directly. After a quick search on a search engine, I came accros a Gist in Python that does exactly what I wanted. My CSV file had semicolons (;
) instead of commas (,
), so I had to adapt its script a bit to accommodate this (delimiter=';'
).
Here is my adapted version, also saved in a forked Gist, in case I need it later:
import csv
import sys
import os
# example usage: python split.py example.csv 200
# above command would split the `example.csv` into smaller CSV files of 200 rows each (with header included)
# if example.csv has 401 rows for instance, this creates 3 files in same directory:
# - `example_1.csv` (row 1 - 200)
# - `example_2.csv` (row 201 - 400)
# - `example_3.csv` (row 401)
CURRENT_DIR = os.path.dirname(os.path.realpath( __file__ ))
filename = sys.argv[1]
full_file_path = os.path.join(CURRENT_DIR, filename)
file_name = os.path.splitext(full_file_path)[0]
rows_per_csv = int(sys.argv[2]) if len(sys.argv) > 2 else 5000
with open(filename) as infile:
reader = csv.DictReader(infile, delimiter=';')
header = reader.fieldnames
rows = [row for row in reader]
pages = []
row_count = len(rows)
start_index = 0
# here, we slice the total rows into pages, each page having [row_per_csv] rows
while start_index < row_count:
pages.append(rows[start_index: start_index+rows_per_csv])
start_index += rows_per_csv
for i, page in enumerate(pages):
with open('{}_{}.csv'.format(file_name, i+1), 'w+') as outfile:
writer = csv.DictWriter(outfile, fieldnames=header, delimiter=';')
writer.writeheader()
for row in page:
writer.writerow(row)
print('DONE splitting {} into {} files'.format(filename, len(pages)))
Thanks to Kelvin Tay for his script, it saved me a lot of time! 🥳
Recommend
-
8
SQL Authority with Pinal DaveSQL SERVER – Number of Rows Read – Execution PlanRecently one of the clients sent me the following two images from the execution plan and his question was about the Number of Rows Read...
-
4
OneDrive for Business “deleted a large number of files” email notification Hey folks, look at what landed in my Inbox yesterday: This is not a spam or a clever phishing message, it’s a legitimate email...
-
0
I can not enter a large decimal number in Excel advertisements I am using Python and I use this method to enter numbers in an Excel sheet:...
-
4
Bash - Delete rows that do not end with a number advertisements I need to remove all lines (in text file) not ending with a number. Bef...
-
3
Get the number of rows for a parquet file We were using Pandas to get the number of rows for a parquet file: import pandas as pd df = pd.read_parquet("my.parqu...
-
8
jQuery datatables - display the number of rows on the array advertisements I have a jQuery datatable on my view...
-
1
How to Insert Thousands of Rows from an Excel / CSV File into a database with a single database call ...
-
9
How to Count Number of Rows in a Table Using jQuery 1142 views 2 years ago jQuery Use the length
-
8
Adobe Livecycle Designer (number of rows in the table) Skip to Content...
-
1
How to split off an older copy of a file while preserving git line history
About Joyk
Aggregate valuable and interesting links.
Joyk means Joy of geeK