10

聚合函数处理excel数据------(一)

 3 years ago
source link: https://blog.csdn.net/weixin_50590724/article/details/112208821
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
import numpy as np
import pandas as pd

一.transform 聚合函数

在pandas 或者numpy 中没有现成的函数可以使用,可以通过transform

data=pd.read_excel(r'地址')

# 写法一:
data2=data['counts'].transform(lambda x:x*2)
# 写法二:
def transforml(value):
	values=value*2
	return values
data2=data['counts'].transform(transforml)
groupby(by='columns')

三.透视表

'''
# pd.pivot_table(数据表,index索引,一系列参数)
参数:
index: 行分组键,分完组以后,分组键的取值在行索引的位置
aggfunc: 聚合函数==>和agg方法一致
values:指定想要进行分组的列
columns:指定想要进行分组的行
fill_value:将数据为nan的值填充为对应的值
margins:表示汇总开关,默认是false,当margins为True时,添加汇总的行与列
margins_name:'All'  汇总的列或者行columns,index的索引

data3=pd.pivot_table(data,index=['order_id','amounts'],aggfunc=[np.mean,np.sum],values=['counts','add_inprice']).head()
data4=pd.pivot_table(data,columns=['order_id'],aggfunc=[np.mean],values=['counts','add_inprice']).head()
# print(data3)

# 透视表查找数据
pd.pivot_table(data,
               columns='order_id',
               index='dishes_name',
               values='counts',
               aggfunc=np.sum,
               fill_value='ss'
)
'''

四.交叉表

data5=pd.crosstab(
    index=data['order_id'],
    columns=data['dishes_name'],
    values=data['counts'],
    aggfunc=np.sum
)
# print(data5)

五.表格合并方法

(1)concat数据合并

# pd.concat((表一,表二,表三),axis=0)  -----纵向拼接,

left = pd.DataFrame({'key1': ['K0', 'K0', 'K1', 'K2'],
                     'key2': ['K0', 'K1', 'K0', 'K1'],
                         'A': ['A0', 'A1', 'A2', 'A3'],
                        'B': ['B0', 'B1', 'B2', 'B3']})

right = pd.DataFrame({'key1': ['K0', 'K1', 'K1', 'K2'],
                      'key2': ['K0', 'K0', 'K0', 'K0'],
                         'C': ['C0', 'C1', 'C2', 'C3'],
                         'D': ['D0', 'D1', 'D2', 'D3']})
# print(left)
# print(right)
data6=pd.concat((left,right),
                axis=0,
                join='outer'    # outer外连接-求并集, inner内连接-求交集
                # verify_integrity= False   # 检查索引是否有重复项,没有不报错,有重复项会报错

                )
# print(data6)

(2)表合并,解决行索引没有意义的情况下,数据不匹配问题(解决concat中横向拼接问题)

# merge 主键合并方法,
# on=主键--必须是两张表共有的参数才能作为主键, 主键可以传两个参数
# how=inner取交集,outer取并集,没有用Nan补齐
# 左连接-左表作为主表,最终表包含左表的所有数据,只取右表中含左表的部分
# pd.merge(left,right,on=['name','telephone'],how='right')
'''当两个表中的主键不一样时,left_on=,right_on=,一一对应     '''

# 补充:   更改表格名称的方法
left2=left.rename(columns={'key1':'key222','key2':'dsjjdj'},
            inplace=False     # 是否对原表进行修改,默认False
            )

# print(left2)

六.重叠合并

df1作为主表,用df2填充df1,当df2与df1不一致时默认主表数据

dict1 = {'ID':[1,2,3,4,5,6,7,8,9],
         'System':['W10','w10',np.nan,'w10',np.nan,np.nan,'w7','w7','w8']}

dict2 = {'ID':[1,2,3,4,5,6,7,8,9],
         'System':[np.nan,np.nan,'w7','w7','w7','w7','w8',np.nan,np.nan]}
df1 = pd.DataFrame(dict1)
df2 = pd.DataFrame(dict2)

df3=df1.combine_first(df2)

print(df1,df2,df3)

About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK