Fixing Unexpected Truncation When Importing CSV into MySQL

source link: https://www.taterli.com/6240/

As the screenshot below shows, an imported row can be unexpectedly truncated: if a field itself contains a comma, the importer treats that comma as a field separator and cuts the row short.

[Screenshot: a row truncated mid-field after the CSV import]
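To see why this happens: if the import declares FIELDS TERMINATED BY ',' without an ENCLOSED BY clause, MySQL splits on every comma, including those inside a quoted field. A minimal sketch of the failure mode, using a made-up row:

# A made-up row whose first field contains a comma.  A comma-delimited
# importer that ignores quoting splits that field in two, shifting every
# later column and truncating the row.
line = '"Python, Cookbook",59.00'
print(line.split(','))   # ['"Python', ' Cookbook"', '59.00']

Python's own csv module does honor the quoting, which is why the cleaning script further down parses the file correctly even before the commas are replaced.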

Several fixes come to mind:

  • Save the scraped data as JSON instead, but then importing it into MySQL requires custom handling.
  • Replace commas with full-width Chinese commas (，) while scraping, but that slows the crawler down.
  • Write to the database while crawling, but that ties up database resources and is costly.
  • Post-process the existing CSV files and insert the rows via a Python MySQL client; row-by-row insertion is relatively slow.
  • Post-process the existing CSV files, replace the offending characters, and then bulk-import the cleaned file. This is the most efficient option, it keeps the two tasks separated, and it is the approach I chose.

The code is as follows:

import csv

# Rewrite the CSV so that half-width commas inside the first field become
# full-width Chinese commas (U+FF0C); they then no longer collide with the
# field delimiter when the file is imported into MySQL.
with open('./ScrapyBooks-all.csv', newline='', encoding='utf-8') as f, \
        open('./ScrapyBooks-new.csv', 'w', newline='', encoding='utf-8') as wf:
    reader = csv.reader(f)
    writer = csv.writer(wf)
    for row in reader:
        row[0] = row[0].replace(',', '，')  # U+FF0C, full-width comma
        writer.writerow(row)

Even though this is just a simple string replacement, it still maxes out the CPU; but once the pass finishes, importing the new file makes the truncation problem go away... (other issues can wait until they actually show up.)
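For completeness, here is a minimal sketch of the import step using pymysql. The table name books, the database scrapy, and the credentials are all placeholders, and LOAD DATA LOCAL INFILE must be enabled on both the server (local_infile=1) and the client; adapt them to your own setup.

import pymysql

# Bulk-import the cleaned CSV in a single statement.  All connection
# parameters below are placeholders for illustration only.
conn = pymysql.connect(host='localhost', user='root', password='secret',
                       database='scrapy', charset='utf8mb4',
                       local_infile=True)
try:
    with conn.cursor() as cur:
        cur.execute(
            "LOAD DATA LOCAL INFILE './ScrapyBooks-new.csv' "
            "INTO TABLE books "
            "CHARACTER SET utf8mb4 "
            "FIELDS TERMINATED BY ',' "
            "LINES TERMINATED BY '\\n'"
        )
    conn.commit()
finally:
    conn.close()

FIELDS TERMINATED BY ',' is safe here because the cleaned first column no longer contains half-width commas; the full-width commas pass through the import untouched.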
