Abnormal memory usage in a Python data-processing program

 2 years ago
source link: https://www.v2ex.com/t/814087

  CaptainD · 8 hours 18 minutes ago · 136 views
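  • A question for fellow V2EX users: I need to incrementally process some large XML files. I took the code from python-cookbook and adapted it to my use case, but it does not seem to work as intended. Memory usage climbs quickly: after processing somewhere over a hundred thousand rows, the process is using several GB. I don't understand why and would appreciate any pointers. The main logic is below.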

  • macOS Big Sur

  • Python 3.8.12


from xml.etree.ElementTree import iterparse
def parse_and_remove(filename, path):
    path_parts = path.split('/')
    doc = iterparse(filename, ('start', 'end'))
    # Skip the root element
    next(doc)
    tag_stack = []
    elem_stack = []
    for event, elem in doc:
        if event == 'start':
            tag_stack.append(elem.tag)
            elem_stack.append(elem)
        elif event == 'end':
            if tag_stack == path_parts:
                yield elem
                elem_stack[-2].remove(elem)
            try:
                tag_stack.pop()
                elem_stack.pop()
            except IndexError:
                pass
data = parse_and_remove('my.xml','path')
client, table = getMongo()

for pothole in data:
    resDict = {
        # extract the fields I need
        } 

    table.insert(resDict)
client.close()
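
One way to narrow this down is to run the parsing side on its own with a stripped-down generator and see whether memory still grows. The sketch below is only an illustration, not the cookbook's code and not a confirmed fix: `iter_and_clear`, the `<root><row>...</row></root>` layout, and the sample data are all hypothetical, and it matches on tag name alone instead of a full path. It calls `elem.clear()` on each yielded element and then detaches it from its parent, so neither the element's contents nor an empty shell stays in the tree.

```python
import io
from xml.etree.ElementTree import iterparse


def iter_and_clear(source, tag):
    """Yield every element named `tag`, then discard it from the tree."""
    events = iterparse(source, events=("start", "end"))
    parents = []  # currently open elements, outermost first
    for event, elem in events:
        if event == "start":
            parents.append(elem)
            continue
        # event == "end": elem is the element on top of the stack
        parents.pop()
        if elem.tag == tag:
            yield elem
            # The consumer is done with elem once it asks for the next item.
            elem.clear()                  # drop text, attributes and children
            if parents:
                parents[-1].remove(elem)  # detach the empty shell from its parent


# Hypothetical sample data standing in for the real file.
sample = b"<root>" + b"".join(
    b"<row><id>%d</id></row>" % i for i in range(1000)
) + b"</root>"

seen = 0
for row in iter_and_clear(io.BytesIO(sample), "row"):
    # Read what is needed here; the element is emptied after this iteration.
    seen += 1
```

If memory stays flat with a loop like this over the real file but still grows in the full program, the growth is more likely coming from what happens to each record afterwards (building `resDict`, the insert call) than from the parser. Note that elements with other tags are never removed in this sketch, so a file with many non-matching siblings would still accumulate them.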
