85

Python 语法速览与实战清单

 6 years ago
source link: https://zhuanlan.zhihu.com/p/31331277?
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

Python 语法速览与实战清单

公众号:某熊的技术之路

题注:阿里巴巴南京研发中心成立一年来业务与团队迅速扩展,诚求前端/后端等各路英才。欢迎关注 [阿里南京技术专刊] (https://zhuanlan.zhihu.com/ali-nanjing),也欢迎投递简历,发送简历到 [email protected], 诚邀各路大佬前来指教。

本文是对于 现代 Python 开发:语法基础与工程实践的总结,更多 Python 相关资料参考 Python 学习与实践资料索引;本文参考了 Python Crash Course - Cheat Sheetspysheeet 等。本文仅包含笔者在日常工作中经常使用的,并且认为较为关键的知识点与语法,如果想要进一步学习 Python 相关内容或者对于机器学习与数据挖掘方向感兴趣,可以参考程序猿的数据科学与机器学习实战手册

Python 是一门高阶、动态类型的多范式编程语言;定义 Python 文件的时候我们往往会先声明文件编码方式:

# 指定脚本调用方式
#!/usr/bin/env python
# 配置 utf-8 编码
# -*- coding: utf-8 -*-

# 配置其他编码
# -*- coding: <encoding-name> -*-

# Vim 中还可以使用如下方式
# vim:fileencoding=<encoding-name>

人生苦短,请用 Python,大量功能强大的语法糖的同时让很多时候 Python 代码看上去有点像伪代码。譬如我们用 Python 实现的简易的快排相较于 Java 会显得很短小精悍:

def quicksort(arr):
    if len(arr) <= 1:
        return arr
    pivot = arr[len(arr) / 2]
    left = [x for x in arr if x < pivot]
    middle = [x for x in arr if x == pivot]
    right = [x for x in arr if x > pivot]
    return quicksort(left) + middle + quicksort(right)
    
print quicksort([3,6,8,10,1,2,1])
# Prints "[1, 1, 2, 3, 6, 8, 10]"

控制台交互

可以根据 __name__ 关键字来判断是否是直接使用 python 命令执行某个脚本,还是外部引用;Google 开源的 fire 也是不错的快速将某个类封装为命令行工具的框架:

import fire

class Calculator(object):
  """A simple calculator class."""

  def double(self, number):
    return 2 * number

if __name__ == '__main__':
  fire.Fire(Calculator)

# python calculator.py double 10  # 20
# python calculator.py double --number=15  # 30

Python 2 中 print 是表达式,而 Python 3 中 print 是函数;如果希望在 Python 2 中将 print 以函数方式使用,则需要自定义引入:

from __future__ import print_function

我们也可以使用 pprint 来美化控制台输出内容:

import pprint

stuff = ['spam', 'eggs', 'lumberjack', 'knights', 'ni']
pprint.pprint(stuff)

# 自定义参数
pp = pprint.PrettyPrinter(depth=6)
tup = ('spam', ('eggs', ('lumberjack', ('knights', ('ni', ('dead',('parrot', ('fresh fruit',))))))))
pp.pprint(tup)

Python 中的模块(Module)即是 Python 源码文件,其可以导出类、函数与全局变量;当我们从某个模块导入变量时,函数名往往就是命名空间(Namespace)。而 Python 中的包(Package)则是模块的文件夹,往往由 __init__.py 指明某个文件夹为包:

# 文件目录
someDir/
    main.py
    siblingModule.py

# siblingModule.py

def siblingModuleFun():
    print('Hello from siblingModuleFun')
    
def siblingModuleFunTwo():
    print('Hello from siblingModuleFunTwo')

import siblingModule
import siblingModule as sibMod

sibMod.siblingModuleFun()

from siblingModule import siblingModuleFun
siblingModuleFun()

try:
    # Import 'someModuleA' that is only available in Windows
    import someModuleA
except ImportError:
    try:
        # Import 'someModuleB' that is only available in Linux
        import someModuleB
    except ImportError:

Package 可以为某个目录下所有的文件设置统一入口:

someDir/
    main.py
    subModules/
        __init__.py
        subA.py
        subSubModules/
            __init__.py
            subSubA.py

# subA.py

def subAFun():
    print('Hello from subAFun')
    
def subAFunTwo():
    print('Hello from subAFunTwo')

# subSubA.py

def subSubAFun():
    print('Hello from subSubAFun')
    
def subSubAFunTwo():
    print('Hello from subSubAFunTwo')

# __init__.py from subDir

# Adds 'subAFun()' and 'subAFunTwo()' to the 'subDir' namespace 
from .subA import *

# The following two import statement do the same thing, they add 'subSubAFun()' and 'subSubAFunTwo()' to the 'subDir' namespace. The first one assumes '__init__.py' is empty in 'subSubDir', and the second one, assumes '__init__.py' in 'subSubDir' contains 'from .subSubA import *'.

# Assumes '__init__.py' is empty in 'subSubDir'
# Adds 'subSubAFun()' and 'subSubAFunTwo()' to the 'subDir' namespace
from .subSubDir.subSubA import *

# Assumes '__init__.py' in 'subSubDir' has 'from .subSubA import *'
# Adds 'subSubAFun()' and 'subSubAFunTwo()' to the 'subDir' namespace
from .subSubDir import *
# __init__.py from subSubDir

# Adds 'subSubAFun()' and 'subSubAFunTwo()' to the 'subSubDir' namespace
from .subSubA import *

# main.py

import subDir

subDir.subAFun() # Hello from subAFun
subDir.subAFunTwo() # Hello from subAFunTwo
subDir.subSubAFun() # Hello from subSubAFun
subDir.subSubAFunTwo() # Hello from subSubAFunTwo

表达式与控制流

Python 中使用 if、elif、else 来进行基础的条件选择操作:

if x < 0:
     x = 0
     print('Negative changed to zero')
 elif x == 0:
     print('Zero')
 else:
     print('More')

Python 同样支持 ternary conditional operator:

a if condition else b

也可以使用 Tuple 来实现类似的效果:

# test 需要返回 True 或者 False
(falseValue, trueValue)[test]

# 更安全的做法是进行强制判断
(falseValue, trueValue)[test == True]

# 或者使用 bool 类型转换函数
(falseValue, trueValue)[bool(<expression>)]

for-in 可以用来遍历数组与字典:

words = ['cat', 'window', 'defenestrate']

for w in words:
    print(w, len(w))

# 使用数组访问操作符,能够迅速地生成数组的副本
for w in words[:]:
    if len(w) > 6:
        words.insert(0, w)

# words -> ['defenestrate', 'cat', 'window', 'defenestrate']

如果我们希望使用数字序列进行遍历,可以使用 Python 内置的 range 函数:

a = ['Mary', 'had', 'a', 'little', 'lamb']

for i in range(len(a)):
    print(i, a[i])

基本数据类型

可以使用内建函数进行强制类型转换(Casting):

int(str)
float(str)
str(int)
str(float)

Number: 数值类型

x = 3
print type(x) # Prints "<type 'int'>"
print x       # Prints "3"
print x + 1   # Addition; prints "4"
print x - 1   # Subtraction; prints "2"
print x * 2   # Multiplication; prints "6"
print x ** 2  # Exponentiation; prints "9"
x += 1
print x  # Prints "4"
x *= 2
print x  # Prints "8"
y = 2.5
print type(y) # Prints "<type 'float'>"
print y, y + 1, y * 2, y ** 2 # Prints "2.5 3.5 5.0 6.25"

Python 提供了常见的逻辑操作符,不过需要注意的是 Python 中并没有使用 &&、|| 等,而是直接使用了英文单词。

t = True
f = False
print type(t) # Prints "<type 'bool'>"
print t and f # Logical AND; prints "False"
print t or f  # Logical OR; prints "True"
print not t   # Logical NOT; prints "False"
print t != f  # Logical XOR; prints "True" 

String: 字符串

Python 2 中支持 Ascii 码的 str() 类型,独立的 unicode() 类型,没有 byte 类型;而 Python 3 中默认的字符串为 utf-8 类型,并且包含了 byte 与 bytearray 两个字节类型:

type("Guido") # string type is str in python2
# <type 'str'>

# 使用 __future__ 中提供的模块来降级使用 Unicode
from __future__ import unicode_literals
type("Guido") # string type become unicode
# <type 'unicode'>

Python 字符串支持分片、模板字符串等常见操作:

var1 = 'Hello World!'
var2 = "Python Programming"

print "var1[0]: ", var1[0]
print "var2[1:5]: ", var2[1:5]
# var1[0]:  H
# var2[1:5]:  ytho

print "My name is %s and weight is %d kg!" % ('Zara', 21)
# My name is Zara and weight is 21 kg!
str[0:4]
len(str)

string.replace("-", " ")
",".join(list)
"hi {0}".format('j')
str.find(",")
str.index(",")   # same, but raises IndexError
str.count(",")
str.split(",")

str.lower()
str.upper()
str.title()

str.lstrip()
str.rstrip()
str.strip()

str.islower()
# 移除所有的特殊字符
re.sub('[^A-Za-z0-9]+', '', mystring) 

如果需要判断是否包含某个子字符串,或者搜索某个字符串的下标:

# in 操作符可以判断字符串
if "blah" not in somestring: 
    continue

# find 可以搜索下标
s = "This be a string"
if s.find("is") == -1:
    print "No 'is' here!"
else:
    print "Found 'is' in the string."

Regex: 正则表达式

import re

# 判断是否匹配
re.match(r'^[aeiou]', str)

# 以第二个参数指定的字符替换原字符串中内容
re.sub(r'^[aeiou]', '?', str)
re.sub(r'(xyz)', r'\1', str)

# 编译生成独立的正则表达式对象
expr = re.compile(r'^...$')
expr.match(...)
expr.sub(...)

下面列举了常见的表达式使用场景:

# 检测是否为 HTML 标签
re.search('<[^/>][^>]*>', '<a href="#label">')

# 常见的用户名密码
re.match('^[a-zA-Z0-9-_]{3,16}$', 'Foo') is not None
re.match('^\w|[-_]{3,16}$', 'Foo') is not None

# Email
re.match('^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$', '[email protected]')

# Url
exp = re.compile(r'''^(https?:\/\/)? # match http or https
                ([\da-z\.-]+)            # match domain
                \.([a-z\.]{2,6})         # match domain
                ([\/\w \.-]*)\/?$        # match api or file
                ''', re.X)
exp.match('www.google.com')

# IP 地址
exp = re.compile(r'''^(?:(?:25[0-5]
                     |2[0-4][0-9]
                     |[1]?[0-9][0-9]?)\.){3}
                     (?:25[0-5]
                     |2[0-4][0-9]
                     |[1]?[0-9][0-9]?)$''', re.X)
exp.match('192.168.1.1')

List: 列表

Operation: 创建增删

list 是基础的序列类型:

l = []
l = list()

# 使用字符串的 split 方法,可以将字符串转化为列表
str.split(".")

# 如果需要将数组拼装为字符串,则可以使用 join 
list1 = ['1', '2', '3']
str1 = ''.join(list1)

# 如果是数值数组,则需要先进行转换
list1 = [1, 2, 3]
str1 = ''.join(str(e) for e in list1)

可以使用 append 与 extend 向数组中插入元素或者进行数组连接

x = [1, 2, 3]

x.append([4, 5]) # [1, 2, 3, [4, 5]]

x.extend([4, 5]) # [1, 2, 3, 4, 5],注意 extend 返回值为 None

可以使用 pop、slices、del、remove 等移除列表中元素:

myList = [10,20,30,40,50]

# 弹出第二个元素
myList.pop(1) # 20
# myList: myList.pop(1)

# 如果不加任何参数,则默认弹出最后一个元素
myList.pop()

# 使用 slices 来删除某个元素
a = [  1, 2, 3, 4, 5, 6 ]
index = 3 # Only Positive index
a = a[:index] + a[index+1 :]

# 根据下标删除元素
myList = [10,20,30,40,50]
rmovIndxNo = 3
del myList[rmovIndxNo] # myList: [10, 20, 30, 50]

# 使用 remove 方法,直接根据元素删除
letters = ["a", "b", "c", "d", "e"]
numbers.remove(numbers[1])
print(*letters) # used a * to make it unpack you don't have to

Iteration: 索引遍历

你可以使用基本的 for 循环来遍历数组中的元素,就像下面介个样纸:

animals = ['cat', 'dog', 'monkey']
for animal in animals:
    print animal
# Prints "cat", "dog", "monkey", each on its own line.

如果你在循环的同时也希望能够获取到当前元素下标,可以使用 enumerate 函数:

animals = ['cat', 'dog', 'monkey']
for idx, animal in enumerate(animals):
    print '#%d: %s' % (idx + 1, animal)
# Prints "#1: cat", "#2: dog", "#3: monkey", each on its own line

Python 也支持切片(Slices):

nums = range(5)    # range is a built-in function that creates a list of integers
print nums         # Prints "[0, 1, 2, 3, 4]"
print nums[2:4]    # Get a slice from index 2 to 4 (exclusive); prints "[2, 3]"
print nums[2:]     # Get a slice from index 2 to the end; prints "[2, 3, 4]"
print nums[:2]     # Get a slice from the start to index 2 (exclusive); prints "[0, 1]"
print nums[:]      # Get a slice of the whole list; prints ["0, 1, 2, 3, 4]"
print nums[:-1]    # Slice indices can be negative; prints ["0, 1, 2, 3]"
nums[2:4] = [8, 9] # Assign a new sublist to a slice
print nums         # Prints "[0, 1, 8, 9, 4]"

Comprehensions: 变换

Python 中同样可以使用 map、reduce、filter,map 用于变换数组:

# 使用 map 对数组中的每个元素计算平方
items = [1, 2, 3, 4, 5]
squared = list(map(lambda x: x**2, items))

# map 支持函数以数组方式连接使用
def multiply(x):
    return (x*x)
def add(x):
    return (x+x)

funcs = [multiply, add]
for i in range(5):
    value = list(map(lambda x: x(i), funcs))
    print(value)

reduce 用于进行归纳计算:

# reduce 将数组中的值进行归纳

from functools import reduce
product = reduce((lambda x, y: x * y), [1, 2, 3, 4])

# Output: 24

filter 则可以对数组进行过滤:

number_list = range(-5, 5)
less_than_zero = list(filter(lambda x: x < 0, number_list))
print(less_than_zero)

# Output: [-5, -4, -3, -2, -1]
d = {'cat': 'cute', 'dog': 'furry'}  # 创建新的字典
print d['cat']       # 字典不支持点(Dot)运算符取值

如果需要合并两个或者多个字典类型:

# python 3.5
z = {**x, **y}

# python 2.7
def merge_dicts(*dict_args):
    """
    Given any number of dicts, shallow copy and merge into a new dict,
    precedence goes to key value pairs in latter dicts.
    """
    result = {}
    for dictionary in dict_args:
        result.update(dictionary)
    return result

可以根据键来直接进行元素访问:

# Python 中对于访问不存在的键会抛出 KeyError 异常,需要先行判断或者使用 get
print 'cat' in d     # Check if a dictionary has a given key; prints "True"

# 如果直接使用 [] 来取值,需要先确定键的存在,否则会抛出异常
print d['monkey']  # KeyError: 'monkey' not a key of d

# 使用 get 函数则可以设置默认值
print d.get('monkey', 'N/A')  # Get an element with a default; prints "N/A"
print d.get('fish', 'N/A')    # Get an element with a default; prints "wet"


d.keys() # 使用 keys 方法可以获取所有的键

可以使用 for-in 来遍历数组:

# 遍历键
for key in d:

# 比前一种方式慢
for k in dict.keys(): ...

# 直接遍历值
for value in dict.itervalues(): ...

# Python 2.x 中遍历键值
for key, value in d.iteritems():

# Python 3.x 中遍历键值
for key, value in d.items():

其他序列类型

# Same as {"a", "b","c"}
normal_set = set(["a", "b","c"])
 
# Adding an element to normal set is fine
normal_set.add("d")
 
print("Normal Set")
print(normal_set)
 
# A frozen set
frozen_set = frozenset(["e", "f", "g"])
 
print("Frozen Set")
print(frozen_set)
 
# Uncommenting below line would cause error as
# we are trying to add element to a frozen set
# frozen_set.add("h")

Python 中的函数使用 def 关键字进行定义,譬如:

def sign(x):
    if x > 0:
        return 'positive'
    elif x < 0:
        return 'negative'
    else:
        return 'zero'


for x in [-1, 0, 1]:
    print sign(x)
# Prints "negative", "zero", "positive"

Python 支持运行时创建动态函数,也即是所谓的 lambda 函数:

def f(x): return x**2

# 等价于
g = lambda x: x**2

Option Arguments: 不定参数

def example(a, b=None, *args, **kwargs):
  print a, b
  print args
  print kwargs

example(1, "var", 2, 3, word="hello")
# 1 var
# (2, 3)
# {'word': 'hello'}

a_tuple = (1, 2, 3, 4, 5)
a_dict = {"1":1, "2":2, "3":3}
example(1, "var", *a_tuple, **a_dict)
# 1 var
# (1, 2, 3, 4, 5)
# {'1': 1, '2': 2, '3': 3}
def simple_generator_function():
    yield 1
    yield 2
    yield 3

for value in simple_generator_function():
    print(value)

# 输出结果
# 1
# 2
# 3
our_generator = simple_generator_function()
next(our_generator)
# 1
next(our_generator)
# 2
next(our_generator)
#3

# 生成器典型的使用场景譬如无限数组的迭代
def get_primes(number):
    while True:
        if is_prime(number):
            yield number
        number += 1

装饰器是非常有用的设计模式:

# 简单装饰器

from functools import wraps
def decorator(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        print('wrap function')
        return func(*args, **kwargs)
    return wrapper

@decorator
def example(*a, **kw):
    pass

example.__name__  # attr of function preserve
# 'example'
# Decorator 

# 带输入值的装饰器

from functools import wraps
def decorator_with_argument(val):
  def decorator(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
      print "Val is {0}".format(val)
      return func(*args, **kwargs)
    return wrapper
  return decorator

@decorator_with_argument(10)
def example():
  print "This is example function."

example()
# Val is 10
# This is example function.

# 等价于

def example():
  print "This is example function."

example = decorator_with_argument(10)(example)
example()
# Val is 10
# This is example function.

Python 中对于类的定义也很直接:

class Greeter(object):
    
    # Constructor
    def __init__(self, name):
        self.name = name  # Create an instance variable
        
    # Instance method
    def greet(self, loud=False):
        if loud:
            print 'HELLO, %s!' % self.name.upper()
        else:
            print 'Hello, %s' % self.name
        
g = Greeter('Fred')  # Construct an instance of the Greeter class
g.greet()            # Call an instance method; prints "Hello, Fred"
g.greet(loud=True)   # Call an instance method; prints "HELLO, FRED!"
# isinstance 方法用于判断某个对象是否源自某个类
ex = 10
isinstance(ex,int)

Managed Attributes: 受控属性

# property、setter、deleter 可以用于复写点方法

class Example(object):
    def __init__(self, value):
       self._val = value
    @property
    def val(self):
        return self._val
    @val.setter
    def val(self, value):
        if not isintance(value, int):
            raise TypeError("Expected int")
        self._val = value
    @val.deleter
    def val(self):
        del self._val
    @property
    def square3(self):
        return 2**3

ex = Example(123)
ex.val = "str"
# Traceback (most recent call last):
#   File "", line 1, in
#   File "test.py", line 12, in val
#     raise TypeError("Expected int")
# TypeError: Expected int

类方法与静态方法

class example(object):
  @classmethod
  def clsmethod(cls):
    print "I am classmethod"
  @staticmethod
  def stmethod():
    print "I am staticmethod"
  def instmethod(self):
    print "I am instancemethod"

ex = example()
ex.clsmethod()
# I am classmethod
ex.stmethod()
# I am staticmethod
ex.instmethod()
# I am instancemethod
example.clsmethod()
# I am classmethod
example.stmethod()
# I am staticmethod
example.instmethod()
# Traceback (most recent call last):
#   File "", line 1, in
# TypeError: unbound method instmethod() ...

Python 中对象的属性不同于字典键,可以使用点运算符取值,直接使用 in 判断会存在问题:

class A(object):
    @property
    def prop(self):
        return 3

a = A()
print "'prop' in a.__dict__ =", 'prop' in a.__dict__
print "hasattr(a, 'prop') =", hasattr(a, 'prop')
print "a.prop =", a.prop

# 'prop' in a.__dict__ = False
# hasattr(a, 'prop') = True
# a.prop = 3

建议使用 hasattr、getattr、setattr 这种方式对于对象属性进行操作:

class Example(object):
  def __init__(self):
    self.name = "ex"
  def printex(self):
    print "This is an example"


# Check object has attributes
# hasattr(obj, 'attr')
ex = Example()
hasattr(ex,"name")
# True
hasattr(ex,"printex")
# True
hasattr(ex,"print")
# False

# Get object attribute
# getattr(obj, 'attr')
getattr(ex,'name')
# 'ex'

# Set object attribute
# setattr(obj, 'attr', value)
setattr(ex,'name','example')
ex.name
# 'example'

异常与测试

Context Manager - with

with 常用于打开或者关闭某些资源:

host = 'localhost'
port = 5566
with Socket(host, port) as s:
    while True:
        conn, addr = s.accept()
        msg = conn.recv(1024)
        print msg
        conn.send(msg)
        conn.close()
from __future__ import print_function

import unittest

def fib(n):
    return 1 if n<=2 else fib(n-1)+fib(n-2)

def setUpModule():
        print("setup module")
def tearDownModule():
        print("teardown module")

class TestFib(unittest.TestCase):

    def setUp(self):
        print("setUp")
        self.n = 10
    def tearDown(self):
        print("tearDown")
        del self.n
    @classmethod
    def setUpClass(cls):
        print("setUpClass")
    @classmethod
    def tearDownClass(cls):
        print("tearDownClass")
    def test_fib_assert_equal(self):
        self.assertEqual(fib(self.n), 55)
    def test_fib_assert_true(self):
        self.assertTrue(fib(self.n) == 55)

if __name__ == "__main__":
    unittest.main()

Python 内置的 __file__ 关键字会指向当前文件的相对路径,可以根据它来构造绝对路径,或者索引其他文件:

# 获取当前文件的相对目录
dir = os.path.dirname(__file__) # src\app

## once you're at the directory level you want, with the desired directory as the final path node:
dirname1 = os.path.basename(dir) 
dirname2 = os.path.split(dir)[1] ## if you look at the documentation, this is exactly what os.path.basename does.

# 获取当前代码文件的绝对路径,abspath 会自动根据相对路径与当前工作空间进行路径补全
os.path.abspath(os.path.dirname(__file__)) # D:\WorkSpace\OWS\tool\ui-tool-svn\python\src\app

# 获取当前文件的真实路径
os.path.dirname(os.path.realpath(__file__)) # D:\WorkSpace\OWS\tool\ui-tool-svn\python\src\app

# 获取当前执行路径
os.getcwd()

可以使用 listdir、walk、glob 模块来进行文件枚举与检索:

# 仅列举所有的文件
from os import listdir
from os.path import isfile, join
onlyfiles = [f for f in listdir(mypath) if isfile(join(mypath, f))]

# 使用 walk 递归搜索
from os import walk

f = []
for (dirpath, dirnames, filenames) in walk(mypath):
    f.extend(filenames)
    break

# 使用 glob 进行复杂模式匹配
import glob
print(glob.glob("/home/adam/*.txt"))
# ['/home/adam/file1.txt', '/home/adam/file2.txt', .... ]

简单文件读写

# 可以根据文件是否存在选择写入模式
mode = 'a' if os.path.exists(writepath) else 'w'

# 使用 with 方法能够自动处理异常
with open("file.dat",mode) as f:
    f.write(...)
    ...
    # 操作完毕之后记得关闭文件
    f.close()

# 读取文件内容
message = f.read()

复杂格式文件

import json

# Writing JSON data
with open('data.json', 'w') as f:
     json.dump(data, f)

# Reading data back
with open('data.json', 'r') as f:
     data = json.load(f)

我们可以使用 lxml 来解析与处理 XML 文件,本部分即对其常用操作进行介绍。lxml 支持从字符串或者文件中创建 Element 对象:

from lxml import etree

# 可以从字符串开始构造
xml = '<a xmlns="test"><b xmlns="test"/></a>'
root = etree.fromstring(xml)
etree.tostring(root)
# b'<a xmlns="test"><b xmlns="test"/></a>'

# 也可以从某个文件开始构造
tree = etree.parse("doc/test.xml")

# 或者指定某个 baseURL
root = etree.fromstring(xml, base_url="http://where.it/is/from.xml")

其提供了迭代器以对所有元素进行遍历:

# 遍历所有的节点
for tag in tree.iter():
    if not len(tag):
        print tag.keys() # 获取所有自定义属性
        print (tag.tag, tag.text) # text 即文本子元素值

# 获取 XPath
for e in root.iter():
    print tree.getpath(e)

lxml 支持以 XPath 查找元素,不过需要注意的是,XPath 查找的结果是数组,并且在包含命名空间的情况下,需要指定命名空间:

root.xpath('//page/text/text()',ns={prefix:url})

# 可以使用 getparent 递归查找父元素
el.getparent()

lxml 提供了 insert、append 等方法进行元素操作:

# append 方法默认追加到尾部
st = etree.Element("state", name="New Mexico")
co = etree.Element("county", name="Socorro")
st.append(co)

# insert 方法可以指定位置
node.insert(0, newKid)

Excel

可以使用 [xlrd]() 来读取 Excel 文件,使用 xlsxwriter 来写入与操作 Excel 文件。

# 读取某个 Cell 的原始值
sh.cell(rx, col).value
# 创建新的文件
workbook = xlsxwriter.Workbook(outputFile)
worksheet = workbook.add_worksheet()

# 设置从第 0 行开始写入
row = 0

# 遍历二维数组,并且将其写入到 Excel 中
for rowData in array:
    for col, data in enumerate(rowData):
        worksheet.write(row, col, data)
    row = row + 1

workbook.close()

对于高级的文件操作,我们可以使用 Python 内置的 shutil

# 递归删除 appName 下面的所有的文件夹
shutil.rmtree(appName)

Requests

Requests 是优雅而易用的 Python 网络请求库:

import requests

r = requests.get('https://api.github.com/events')
r = requests.get('https://api.github.com/user', auth=('user', 'pass'))

r.status_code
# 200
r.headers['content-type']
# 'application/json; charset=utf8'
r.encoding
# 'utf-8'
r.text
# u'{"type":"User"...'
r.json()
# {u'private_gists': 419, u'total_private_repos': 77, ...}

r = requests.put('http://httpbin.org/put', data = {'key':'value'})
r = requests.delete('http://httpbin.org/delete')
r = requests.head('http://httpbin.org/get')
r = requests.options('http://httpbin.org/get')

MySQL

import pymysql.cursors

# Connect to the database
connection = pymysql.connect(host='localhost',
                             user='user',
                             password='passwd',
                             db='db',
                             charset='utf8mb4',
                             cursorclass=pymysql.cursors.DictCursor)

try:
    with connection.cursor() as cursor:
        # Create a new record
        sql = "INSERT INTO `users` (`email`, `password`) VALUES (%s, %s)"
        cursor.execute(sql, ('[email protected]', 'very-secret'))

    # connection is not autocommit by default. So you must commit to save
    # your changes.
    connection.commit()

    with connection.cursor() as cursor:
        # Read a single record
        sql = "SELECT `id`, `password` FROM `users` WHERE `email`=%s"
        cursor.execute(sql, ('[email protected]',))
        result = cursor.fetchone()
        print(result)
finally:
    connection.close() 




About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK