Replacing Strings with Python Regular Expressions

source link: https://www.the5fire.com/python-re-str-replace.html

Author: the5fire | Tags: HTML formatting, Python regex, string replacement | Published: 2012-02-26 11:29 p.m. | Views: 42248, 40804

A worked example: I had a piece of HTML that was not well-formed XML, so I used Python to convert it. The changes needed were:

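1. Convert [&] to [&amp;] (but leave [&nbsp;] untouched);
2. Remove every ["=""] from the markup;
3. Rename the [svg] and [path] tags to [svg:svg] and [svg:path];
4. Close the [img] tags;
5. Inside url(), change ["] to [']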
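
Rule 1 hinges on a negative lookahead: `&` should be matched only when it is not already the start of an entity. As a minimal sketch (the helper name `escape_bare_ampersands` and the broader entity pattern are assumptions for illustration, not the author's code), the lookahead can be widened so that entities of any length, such as `&amp;` or `&#160;`, are also left alone:

```python
import re

# Hypothetical helper, not from the original post.
# Matches '&' unless it starts a named (&nbsp;), decimal (&#160;)
# or hexadecimal (&#xA0;) character reference.
BARE_AMP = re.compile(
    r'&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)'
)

def escape_bare_ampersands(text):
    """Replace every '&' that does not start an entity with '&amp;'."""
    return BARE_AMP.sub('&amp;', text)

print(escape_bare_ampersands('a & b&nbsp;c &amp; d &#160; e&,f'))
# prints: a &amp; b&nbsp;c &amp; d &#160; e&amp;,f
```

Because existing entities are skipped, running the helper a second time on its own output should change nothing.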

I used regular expressions to process the HTML. Here is the code:

.. code:: python

import re
str_url = 'test, url("http://www.baidu.com")&,dddddd "="" <svg></svg><path></path><img src="http://www.baidu.com">ininnnin<img src="http://www.dd.com">'
# 2. Remove every ["=""] from the markup;
# 3. Rename the [svg] and [path] tags to [svg:svg] and [svg:path];
str_url = str_url.replace('"=""','')
str_url = str_url.replace('svg','svg:svg')
str_url = str_url.replace('path', 'svg:path')

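# 1. Convert [&] to [&amp;] (but leave [&nbsp;] untouched);
# The lookahead only skips entities with a four-character name, such as &nbsp;
url_re = re.compile(r'&(?!\w{4};)')
str_result = url_re.sub('&amp;', str_url)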

# 4. Close the [img] tags;
img_list = re.findall(r'<img.*?>', str_result)

# set() avoids appending </img> twice when the same tag occurs more than once
for img_r in set(img_list):
    str_result = str_result.replace(img_r, img_r + '</img>')

# 5. Inside url(), change ["] to [']
url_list = re.findall(r'url\(".*?"\)', str_result)
print(url_list)
for url_r in url_list:
    url_new = url_r.replace('"','\'')
    str_result = str_result.replace(url_r,url_new)
print(str_result)
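
The find-then-replace steps for rules 3 to 5 can also be written as single `re.sub` calls. This restricts the `svg`/`path` rename to tag names instead of every occurrence of those words, and it closes each `<img>` exactly once even when the same tag appears several times. The following is only a sketch: the function name `tidy_markup` and the exact patterns are illustrative assumptions, not the author's code, and a real HTML parser would be more robust than regular expressions (for instance, a `>` inside an attribute value would confuse the `<img>` pattern).

```python
import re

def tidy_markup(html):
    """Hypothetical re.sub-based variant of rules 3-5."""
    # Rule 3: prefix svg/path only where they are tag names, i.e. directly
    # after '<' or '</' and followed by whitespace, '/' or '>'.
    html = re.sub(r'<(/?)(svg|path)(?=[\s/>])', r'<\1svg:\2', html)
    # Rule 4: append </img> to each <img ...> that is neither self-closed
    # ('/>') nor already followed by </img>.
    html = re.sub(r'(<img\b[^>]*(?<!/)>)(?!\s*</img>)', r'\1</img>', html)
    # Rule 5: swap the double quotes inside url("...") for single quotes.
    html = re.sub(r'url\("(.*?)"\)', r"url('\1')", html)
    return html

sample = ('<svg></svg><path></path>'
          '<img src="a.png">x<img src="a.png"> url("http://www.baidu.com")')
print(tidy_markup(sample))
```

Since each pattern skips text that has already been converted, calling `tidy_markup` again on its own output should leave it unchanged.
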
- from the5fire.com
