Pandas Basics (Part 1)

Published: 2025-12-09 17:53:46

Table of Contents

  • I. Reading Files with Pandas
    • 1. Reading data with pandas
    • 1. Reading plain-text files
      • 1.1 Read a csv with the default header row and comma separator
      • 1.2 Read a txt file, specifying the separator and column names yourself
    • 2. Reading excel files
    • 3. Reading from sql
  • II. pandas Data Structures: DataFrame and Series
      • DataFrame: two-dimensional data, the whole table, many rows and columns
    • 1. Series
      • 1.1 A plain list of data is enough to create the simplest Series
      • 1.2 Create a Series with a label index
      • 1.3 Create a Series from a python dict
      • 1.4 Query data by label index
    • 2. DataFrame
      • 2.1 Create a DataFrame from a dict of sequences
    • Selecting a Series out of a DataFrame
      • 3.1 Selecting one column returns a pd.Series
      • 3.2 Selecting multiple columns returns a pd.DataFrame
      • 3.3 Selecting one row returns a pd.Series
      • 3.4 Selecting multiple rows returns a pd.DataFrame
  • III. Five Ways to Query Data in Pandas
    • Ways to query data in Pandas
    • Querying data with df.loc
    • Note
    • 0. Read the data
    • 1. Query with a single label value
    • 2. Batch query with a list of values
    • 3. Range query with an interval
    • 4. Query with a boolean condition
      • Complex conditions: find the days with perfect weather
    • 5. Query with a callable
  • IV. How to Add New Columns in Pandas
    • 0. Read csv data into a DataFrame
    • 1. Direct assignment
    • 2. The df.apply method
    • 3. The df.assign method
    • 4. Assign groups by condition
  • V. Pandas Statistics Functions
    • 0. Read the csv data
    • 1. Summary statistics
    • 2. Unique values and value counts
      • 2.1 Unique values
      • 2.2 Value counts
    • 3. Correlation and covariance
  • VI. Handling Missing Values in Pandas
    • Example: reading, cleaning and processing an irregular excel file
    • Step 1: skip the leading empty rows when reading the excel file
    • Step 2: detect missing values
    • Step 3: drop columns that are entirely empty
    • Step 4: drop rows that are entirely empty
    • Step 5: fill empty scores with 0
    • Step 6: fill the missing names
    • Step 7: save the cleaned excel file
  • VII. The Pandas SettingWithCopyWarning
    • 0. Read the data
    • 1. Reproducing the warning
    • 2. The cause
    • 3. Fix 1
    • 4. Fix 2
      • Pandas does not allow filtering a sub-DataFrame first and then writing to it
  • VIII. Sorting Data in Pandas
    • 0. Read the data
    • 1. Sorting a Series
    • 2. Sorting a DataFrame
      • 2.1 Sorting by a single column
      • 2.2 Sorting by multiple columns
  • IX. String Handling in Pandas
    • 0. Read the Beijing 2018 weather data
    • 1. Get the Series str accessor and use its string-processing functions
    • 4. Using regular expressions
      • Series.str enables regular-expression mode by default
  • X. How to Understand the axis Parameter in Pandas
      • The axis you pass is the one that moves (as if iterated by a for loop); the other axis stays fixed
    • 1. drop on a single column, i.e. deleting one column
    • 3. mean aggregation along axis=0/index
    • 3. mean aggregation along axis=1/columns
    • 5. One more example to deepen understanding
  • XI. Uses of the Pandas index
    • 1. Query data with the index
    • 2. An index improves query performance
    • Experiment 1: fully random query order
    • Experiment 2: querying after sorting the index
    • 3. An index aligns data automatically
      • s1 and s2 both have the b and c labels, while a and d each belong to only one of them and cannot be aligned, so those sums are empty
    • 4. An index supports more and stronger data structures
  • XII. How Pandas Implements DataFrame Merge
      • merge syntax:
    • 1. A join example on a movie dataset
        • the movie-ratings dataset
    • 2. Understanding row-count alignment in merge
      • 2.1 one-to-one merge
      • 2.2 one-to-many merge
      • 2.3 many-to-many merge
    • 3. The difference between left join, right join, inner join and outer join
      • 3.1 inner join, the default
      • 3.2 left join
      • 3.3 right join
      • 3.4 outer join
    • 4. What to do when non-key columns share a name
  • XIII. Concatenating Data with concat in Pandas
        • Use cases:
        • concat in one sentence:
        • concat syntax: pandas.concat(objs, axis=0, join='outer', ignore_index=False)
        • append syntax: DataFrame.append(other, ignore_index=False)
        • Reference docs:
    • 1. Merging data with pandas.concat
    • 1. The default concat, with axis=0, join=outer, ignore_index=False
    • 2. ignore_index=True discards the original index
    • 3. join=inner filters out non-matching columns
    • 4. axis=1 effectively adds new columns
      • A: adding one Series as a column
      • B: adding several Series as columns
    • 2. Appending rows with DataFrame.append
      • 1. Appending one DataFrame to another
      • 2. Discarding the original index with ignore_index=True
      • 3. Rows can be appended to a DataFrame one at a time
      • A: low-performance version
      • B: high-performance version
  • XIV. Splitting and Merging Excel Files in Batches with Pandas
    • 0. Read the source Excel into Pandas
    • 1. Split one large excel into several equal parts
      • 1.1 Compute the row count of each output excel
      • 1.2 Split into multiple DataFrames
      • 1.3 Write each DataFrame to an excel file
    • 2. Merge several small Excels into one large Excel
      • 2.1 Walk the folder to list the Excel files to merge
      • 2.2 Read each file into a DataFrame
      • 2.3 Merge with pd.concat
      • 2.4 Write the merged DataFrame to Excel
  • XV. groupby Statistics in Pandas
    • 1. Aggregate statistics on groups
      • 1.1 groupby a single column, statistics over all data columns
      • 1.2 groupby several columns, statistics over all data columns
      • 1.3 Several statistics at once
      • 1.4 Statistics for a single column
      • 1.5 Different aggregation functions for different columns
    • 2. Iterating over groupby results to understand the execution flow
      • 2.1 Iterating the groups of a single-column groupby
      • 2.2 Iterating the groups of a multi-column groupby
    • 3. Example: exploring weather data by group
      • 3.1 The highest temperature of each month
      • 3.2 The highest temperature, lowest temperature and average AQI of each month
  • XVI. The Pandas Hierarchical Index MultiIndex
    • 1. MultiIndex on a Series
    • 2. How to filter a Series that has a MultiIndex
    • 3. MultiIndex on a DataFrame
    • 4. How to filter a DataFrame that has a MultiIndex
  • XVII. The Pandas Transformation Functions map, apply, applymap
    • 1. map converts the values of a Series
      • Option 1: Series.map(dict)
      • Option 2: Series.map(function)
    • 2. apply transforms a Series or a DataFrame
      • Series.apply(function)
      • DataFrame.apply(function)
    • 3. applymap transforms every value of a DataFrame
  • XVIII. How to Apply a Function to Each Group in Pandas
      • Background: Pandas GroupBy follows the split, apply, combine pattern
      • GroupBy.apply(function)
      • This example demonstrates:
    • Example 1: normalizing numeric columns within each group
      • Demo: normalizing users' movie ratings
    • Example 2: taking the TOP N rows of each group
I. Reading Files with Pandas

1. Reading data with pandas

pandas first reads tabular data, then analyzes it.

Data type      Description                               pandas reader
csv, tsv, txt  plain text separated by commas or tabs    pd.read_csv
excel          Microsoft xls or xlsx files               pd.read_excel
mysql          relational database tables                pd.read_sql

In [1]:

import pandas as pd

1. Reading plain-text files

1.1 Read a csv with the default header row and comma separator

In [2]:

fpath = "./pandas-learn-code/datas/ml-latest-small/ratings.csv"

In [3]:

# read the data with pd.read_csv
ratings = pd.read_csv(fpath)

In [4]:

# look at the first few rows
ratings.head()

Out[4]:

   userId  movieId  rating  timestamp
0       1        1     4.0  964982703
1       1        3     4.0  964981247
2       1        6     4.0  964982224
3       1       47     5.0  964983815
4       1       50     5.0  964982931

In [5]:

# shape of the data, returned as (row count, column count)
ratings.shape

Out[5]:

(100836, 4)

In [6]:

# list of column names
ratings.columns

Out[6]:

Index(['userId', 'movieId', 'rating', 'timestamp'], dtype='object')

In [7]:

# the index
ratings.index

Out[7]:

RangeIndex(start=0, stop=100836, step=1)

In [9]:

# dtype of each column
ratings.dtypes

Out[9]:

userId         int64
movieId        int64
rating       float64
timestamp      int64
dtype: object

1.2 Read a txt file, specifying the separator and column names yourself

In [10]:

fpath = "./pandas-learn-code/datas/crazyant/access_pvuv.txt"

In [11]:

pvuv = pd.read_csv(fpath, sep="\t", header=None, names=["pdate","pv","uv"])
  • sep specifies the separator
  • header=None means the file has no header row
  • names specifies the column names
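These parameters can be sketched without the original file by feeding io.StringIO a couple of hypothetical tab-separated lines (the sample values here are made up):

```python
# A minimal sketch of pd.read_csv with sep/header/names,
# using io.StringIO in place of a real file path (hypothetical data).
import io
import pandas as pd

text = "2019-09-10\t139\t92\n2019-09-09\t185\t153\n"
pvuv = pd.read_csv(io.StringIO(text), sep="\t", header=None,
                   names=["pdate", "pv", "uv"])
print(pvuv.shape)            # (2, 3)
print(list(pvuv.columns))    # ['pdate', 'pv', 'uv']
```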

In [13]:

pvuv.head()

Out[13]:

        pdate   pv   uv
0  2019-09-10  139   92
1  2019-09-09  185  153
2  2019-09-08  123   59
3  2019-09-07   65   40
4  2019-09-06  157   98

2. Reading excel files

In [18]:

fpath = "./pandas-learn-code/datas/crazyant/access_pvuv.xlsx"
pvuv = pd.read_excel(fpath)

In [19]:

pvuv

Out[19]:

          日期   PV   UV
0  2019-09-10  139   92
1  2019-09-09  185  153
2  2019-09-08  123   59
3  2019-09-07   65   40
4  2019-09-06  157   98
5  2019-09-05  205  151
6  2019-09-04  196  167
7  2019-09-03  216  176
8  2019-09-02  227  148
9  2019-09-01  105   61

3. Reading from sql

In [36]:

import pymysql
conn = pymysql.connect(host="127.0.0.1", user="root", password="123456",
                       database="test", charset="utf8")

In [41]:

fpath = "./pandas-learn-code/datas/crazyant/test_crazyant_pvuv.sql"
mysql_page = pd.read_sql("select * from crazyant_pvuv", con=conn)

In [42]:

mysql_page

Out[42]:

          日期   PV   UV
0  2019-09-10  139   92
1  2019-09-09  185  153
2  2019-09-08  123   59
3  2019-09-07   65   40
4  2019-09-06  157   98
5  2019-09-05  205  151
6  2019-09-04  196  167
7  2019-09-03  216  176
8  2019-09-02  227  148
9  2019-09-01  105   61

II. pandas Data Structures: DataFrame and Series

DataFrame: two-dimensional data, the whole table, with many rows and columns


In [1]:

import pandas as pd
import numpy as np

1. Series

A Series is a one-dimensional, array-like object made up of a sequence of data (of any dtype) plus an associated array of data labels, its index.

1.1 A plain list of data is enough to create the simplest Series

In [3]:

s1 = pd.Series([1,'a',5.2,7])

In [5]:

# the index is on the left, the values on the right
s1.head()

Out[5]:

0      1
1      a
2    5.2
3      7
dtype: object

In [6]:

# get the index
s1.index

Out[6]:

RangeIndex(start=0, stop=4, step=1)

In [7]:

# get the values
s1.values

Out[7]:

array([1, 'a', 5.2, 7], dtype=object)

1.2 Create a Series with a label index

In [8]:

s2 = pd.Series([1,'a',5.2,7], index=['a','b','c','d'])

In [9]:

s2

Out[9]:

a      1
b      a
c    5.2
d      7
dtype: object

In [10]:

s2.index

Out[10]:

Index(['a', 'b', 'c', 'd'], dtype='object')

1.3 Create a Series from a python dict

In [11]:

sdata = {'Ohio':35000, 'Texas':72000, 'Oregon':16000, 'Utah':5000}

In [13]:

s3 = pd.Series(sdata)

In [14]:

# the dict keys become the index of the Series
s3

Out[14]:

Ohio      35000
Texas     72000
Oregon    16000
Utah       5000
dtype: int64

1.4 Query data by label index

This works much like a python dict.

In [15]:

s2

Out[15]:

a      1
b      a
c    5.2
d      7
dtype: object

In [20]:

s2['a']

Out[20]:

1

In [21]:

# querying a single value returns that value's own type
type(s2['a'])

Out[21]:

int

In [18]:

# query several values at once
s2[['a','b','c']]

Out[18]:

a      1
b      a
c    5.2
dtype: object

In [22]:

# querying several values still returns a Series
type(s2[['a','b','c']])

Out[22]:

pandas.core.series.Series

2. DataFrame

A DataFrame is a tabular data structure:

  • each column can have a different value type (numeric, string, boolean, ...)
  • it has both a row index (index) and a column index (columns)
  • it can be seen as a dict of Series
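The "dict of Series" view can be sketched directly: build a DataFrame from two Series with different dtypes (the data here is hypothetical):

```python
# A sketch of a DataFrame as a dict of Series: each column is a Series,
# and columns may hold different dtypes (toy data).
import pandas as pd

df = pd.DataFrame({
    "state": pd.Series(["Ohio", "Nevada"]),
    "pop": pd.Series([1.5, 2.9]),
})
print(df.dtypes)            # state is object, pop is float64
print(type(df["state"]))    # each column comes back as a Series
```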

2.1 Create a DataFrame from a dict of sequences

In [24]:

data = {
    'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
    'year': [2000, 2001, 2002, 2003, 2004],
    'pop': [1.5, 1.7, 3.6, 2.4, 2.9]
}
df = pd.DataFrame(data)

In [25]:

df

Out[25]:

    state  year  pop
0    Ohio  2000  1.5
1    Ohio  2001  1.7
2    Ohio  2002  3.6
3  Nevada  2003  2.4
4  Nevada  2004  2.9

In [26]:

df.dtypes

Out[26]:

state     object
year       int64
pop      float64
dtype: object

In [27]:

df.columns

Out[27]:

Index(['state', 'year', 'pop'], dtype='object')

In [28]:

df.index

Out[28]:

RangeIndex(start=0, stop=5, step=1)

Selecting a Series out of a DataFrame

  • selecting a single row or a single column returns a pd.Series
  • selecting multiple rows or multiple columns returns a pd.DataFrame
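These two rules can be checked with type() on a toy DataFrame (hypothetical data):

```python
# Sketch of the selection rules: single label -> Series,
# list of labels -> DataFrame, single row -> Series (toy data).
import pandas as pd

df = pd.DataFrame({"year": [2000, 2001], "pop": [1.5, 1.7]})
print(type(df["year"]))      # pandas.core.series.Series
print(type(df[["year"]]))    # DataFrame, even with only one column
print(type(df.loc[0]))       # a single row is also a Series
```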

In [29]:

df

Out[29]:

    state  year  pop
0    Ohio  2000  1.5
1    Ohio  2001  1.7
2    Ohio  2002  3.6
3  Nevada  2003  2.4
4  Nevada  2004  2.9

3.1 Selecting one column returns a pd.Series

In [30]:

df['year']

Out[30]:

0    2000
1    2001
2    2002
3    2003
4    2004
Name: year, dtype: int64

In [35]:

# the result is a Series
type(df['year'])

Out[35]:

pandas.core.series.Series

3.2 Selecting multiple columns returns a pd.DataFrame

In [33]:

df[['year', 'pop']]

Out[33]:

   year  pop
0  2000  1.5
1  2001  1.7
2  2002  3.6
3  2003  2.4
4  2004  2.9

In [34]:

# the result is a DataFrame
type(df[['year','pop']])

Out[34]:

pandas.core.frame.DataFrame

3.3 Selecting one row returns a pd.Series

In [39]:

df.loc[0]

Out[39]:

state    Ohio
year     2000
pop       1.5
Name: 0, dtype: object

In [40]:

type(df.loc[0])

Out[40]:

pandas.core.series.Series

3.4 Selecting multiple rows returns a pd.DataFrame

In [41]:

# unlike python slicing, df.loc slices include the end label
df.loc[0:3]

Out[41]:

    state  year  pop
0    Ohio  2000  1.5
1    Ohio  2001  1.7
2    Ohio  2002  3.6
3  Nevada  2003  2.4

In [42]:

type(df.loc[0:3])

Out[42]:

pandas.core.frame.DataFrame

III. Five Ways to Query Data in Pandas

Ways to query data in Pandas

  • df.loc: query by row and column label values
  • df.iloc: query by row and column integer positions
  • df.where
  • df.query
  • .loc can both query and write values, so it is the recommended method

    Querying data with df.loc

  • query with a single label value
  • batch query with a list of values
  • range query with an interval
  • query with a boolean condition
  • query with a callable
  • Note

    • all of these query styles work on rows as well as columns
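The five df.loc query styles can be sketched side by side on a tiny hypothetical frame (labels d1..d4 and column t are made up):

```python
# A compact sketch of the five df.loc query styles (toy data).
import pandas as pd

df = pd.DataFrame({"t": [3, 2, 0, 5]}, index=["d1", "d2", "d3", "d4"])
a = df.loc["d2", "t"]                  # 1. single label
b = df.loc[["d1", "d3"], "t"]          # 2. list of labels
c = df.loc["d1":"d3", "t"]             # 3. label slice, end inclusive
d = df.loc[df["t"] > 2, :]             # 4. boolean condition
e = df.loc[lambda d: d["t"] == 0, :]   # 5. callable
print(a, len(b), len(c), len(d), len(e))   # 2 2 3 2 1
```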

    In [3]:

    import pandas as pd

    0. Read the data

    The data is Beijing's full-year 2018 weather report.

    In [4]:

    df = pd.read_csv("./pandas-learn-code/datas/beijing_tianqi/beijing_tianqi_2018.csv")

    In [5]:

    df.head()

    Out[5]:

    ymd  bWendu  yWendu  tianqi  fengxiang  fengli  aqi  aqiInfo  aqiLevel
    0  2018-01-01  3℃  -6℃  晴~多云  东北风  1-2级  59    2
    1  2018-01-02  2℃  -5℃  阴~多云  东北风  1-2级  49    1
    2  2018-01-03  2℃  -5℃  多云  北风  1-2级  28    1
    3  2018-01-04  0℃  -8℃    东北风  1-2级  28    1
    4  2018-01-05  3℃  -6℃  多云~晴  西北风  1-2级  50    1

    In [6]:

    # set the index to the date, convenient for filtering by date
    df.set_index('ymd', inplace=True)

    In [7]:

    df.head()

    Out[7]:

    bWendu  yWendu  tianqi  fengxiang  fengli  aqi  aqiInfo  aqiLevel
    ymd
    2018-01-01  3℃  -6℃  晴~多云  东北风  1-2级  59    2
    2018-01-02  2℃  -5℃  阴~多云  东北风  1-2级  49    1
    2018-01-03  2℃  -5℃  多云  北风  1-2级  28    1
    2018-01-04  0℃  -8℃    东北风  1-2级  28    1
    2018-01-05  3℃  -6℃  多云~晴  西北风  1-2级  50    1

    In [8]:

    # time series are covered in a later lesson; treated as strings here
    df.index

    Out[8]:

    Index(['2018-01-01', '2018-01-02', '2018-01-03', '2018-01-04', '2018-01-05','2018-01-06', '2018-01-07', '2018-01-08', '2018-01-09', '2018-01-10',...'2018-12-22', '2018-12-23', '2018-12-24', '2018-12-25', '2018-12-26','2018-12-27', '2018-12-28', '2018-12-29', '2018-12-30', '2018-12-31'],dtype='object', name='ymd', length=365)

    In [9]:

    # strip the ℃ suffix from the temperature columns
    # df.loc[:] selects all rows
    df.loc[:, "bWendu"] = df["bWendu"].str.replace("℃", "").astype('int32')
    df.loc[:, "yWendu"] = df["yWendu"].str.replace("℃", "").astype('int32')

    In [10]:

    # bWendu and yWendu are now int columns
    df.dtypes

    Out[10]:

    bWendu        int32
    yWendu        int32
    tianqi       object
    fengxiang    object
    fengli       object
    aqi           int64
    aqiInfo      object
    aqiLevel      int64
    dtype: object

    1. Query with a single label value

    For both rows and columns, you can pass a single value to get an exact match.

    In [11]:

    # returns a single value
    df.loc['2018-01-03', 'bWendu']

    Out[11]:

    2

    In [12]:

    # returns a Series
    df.loc['2018-01-03', ['bWendu', 'yWendu']]

    Out[12]:

    bWendu     2
    yWendu    -5
    Name: 2018-01-03, dtype: object

    2. Batch query with a list of values

    In [13]:

    # returns a Series
    df.loc[['2018-01-03','2018-01-04','2018-01-05'], 'bWendu']

    Out[13]:

    ymd
    2018-01-03    2
    2018-01-04    0
    2018-01-05    3
    Name: bWendu, dtype: int32

    In [14]:

    # returns a DataFrame
    df.loc[['2018-01-03','2018-01-04','2018-01-05'], ['bWendu','yWendu']]

    Out[14]:

    bWendu  yWendu
    ymd
    2018-01-03   2  -5
    2018-01-04   0  -8
    2018-01-05   3  -6

    3. Range query with an interval

    Note: the interval includes both the start and the end.

    In [15]:

    # row index by range
    df.loc['2018-01-03':'2018-01-05', 'bWendu']

    Out[15]:

    ymd
    2018-01-03    2
    2018-01-04    0
    2018-01-05    3
    Name: bWendu, dtype: int32

    In [16]:

    # column index by range
    df.loc['2018-01-03', 'bWendu':'fengxiang']

    Out[16]:

    bWendu        2
    yWendu       -5
    tianqi       多云
    fengxiang    北风
    Name: 2018-01-03, dtype: object

    In [17]:

    # range query on both rows and columns
    df.loc['2018-01-03':'2018-01-05', 'bWendu':'fengxiang']

    Out[17]:

    bWendu  yWendu  tianqi  fengxiang
    ymd
    2018-01-03  2  -5  多云  北风
    2018-01-04  0  -8    东北风
    2018-01-05  3  -6  多云~晴  西北风

    4. Query with a boolean condition

    The boolean list must be as long as the number of rows (or columns).

    In [23]:

    df.loc[df["yWendu"]<-10,:]

    Out[23]:

    bWendu  yWendu  tianqi  fengxiang  fengli  aqi  aqiInfo  aqiLevel
    ymd
    2018-01-23  -4  -12    西北风  3-4级  31    1
    2018-01-24  -4  -11    西南风  1-2级  34    1
    2018-01-25  -3  -11  多云  东北风  1-2级  27    1
    2018-12-26  -2  -11  晴~多云  东北风  2级  26    1
    2018-12-27  -5  -12  多云~晴  西北风  3级  48    1
    2018-12-28  -3  -11    西北风  3级  40    1
    2018-12-29  -3  -12    西北风  2级  29    1
    2018-12-30  -2  -11  晴~多云  东北风  1级  31    1

    In [24]:

    df["yWendu"]<-10

    Out[24]:

    ymd2018-01-01 False2018-01-02 False2018-01-03 False2018-01-04 False2018-01-05 False2018-01-06 False2018-01-07 False2018-01-08 False2018-01-09 False2018-01-10 False2018-01-11 False2018-01-12 False2018-01-13 False2018-01-14 False2018-01-15 False2018-01-16 False2018-01-17 False2018-01-18 False2018-01-19 False2018-01-20 False2018-01-21 False2018-01-22 False2018-01-23 True2018-01-24 True2018-01-25 True2018-01-26 False2018-01-27 False2018-01-28 False2018-01-29 False2018-01-30 False... 2018-12-02 False2018-12-03 False2018-12-04 False2018-12-05 False2018-12-06 False2018-12-07 False2018-12-08 False2018-12-09 False2018-12-10 False2018-12-11 False2018-12-12 False2018-12-13 False2018-12-14 False2018-12-15 False2018-12-16 False2018-12-17 False2018-12-18 False2018-12-19 False2018-12-20 False2018-12-21 False2018-12-22 False2018-12-23 False2018-12-24 False2018-12-25 False2018-12-26 True2018-12-27 True2018-12-28 True2018-12-29 True2018-12-30 True2018-12-31 FalseName: yWendu, Length: 365, dtype: bool

    Complex conditions: find the days with perfect weather

    • note: combine conditions with &, and each comparison must be wrapped in parentheses
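The parentheses matter because & binds tighter than comparison operators; a minimal sketch on hypothetical high/low temperatures:

```python
# Sketch: combining boolean conditions with & requires parentheses
# around each comparison (toy data).
import pandas as pd

df = pd.DataFrame({"hi": [30, 35, 28], "lo": [20, 15, 8]})
mask = (df["hi"] <= 30) & (df["lo"] >= 15)
print(mask.tolist())    # [True, False, False]
```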

    In [29]:

    df.loc[(df["bWendu"]<=30) & (df["yWendu"]>=15) & (df["tianqi"]=="晴") & (df["aqiLevel"]==1),:]

    Out[29]:

    bWendu  yWendu  tianqi  fengxiang  fengli  aqi  aqiInfo  aqiLevel
    ymd
    2018-08-24  30  20    北风  1-2级  40    1
    2018-09-07  27  16    西北风  3-4级  22    1

    In [30]:

    (df["bWendu"]<=30) & (df["yWendu"]>=15) & (df["tianqi"]=="晴") & (df["aqiLevel"]==1)

    Out[30]:

    ymd2018-01-01 False2018-01-02 False2018-01-03 False2018-01-04 False2018-01-05 False2018-01-06 False2018-01-07 False2018-01-08 False2018-01-09 False2018-01-10 False2018-01-11 False2018-01-12 False2018-01-13 False2018-01-14 False2018-01-15 False2018-01-16 False2018-01-17 False2018-01-18 False2018-01-19 False2018-01-20 False2018-01-21 False2018-01-22 False2018-01-23 False2018-01-24 False2018-01-25 False2018-01-26 False2018-01-27 False2018-01-28 False2018-01-29 False2018-01-30 False... 2018-12-02 False2018-12-03 False2018-12-04 False2018-12-05 False2018-12-06 False2018-12-07 False2018-12-08 False2018-12-09 False2018-12-10 False2018-12-11 False2018-12-12 False2018-12-13 False2018-12-14 False2018-12-15 False2018-12-16 False2018-12-17 False2018-12-18 False2018-12-19 False2018-12-20 False2018-12-21 False2018-12-22 False2018-12-23 False2018-12-24 False2018-12-25 False2018-12-26 False2018-12-27 False2018-12-28 False2018-12-29 False2018-12-30 False2018-12-31 FalseLength: 365, dtype: bool

    5. Query with a callable

    In [31]:

    # pass a lambda directly
    df.loc[lambda df: (df["bWendu"]<=30) & (df["yWendu"]>=15), :]

    Out[31]:

    bWenduyWendutianqifengxiangfengliaqiaqiInfoaqiLevel
    ymd
    2018-04-282717西南风3-4级125轻度污染3
    2018-04-293016多云南风3-4级193中度污染4
    2018-05-042716晴~多云西南风1-2级862
    2018-05-092917晴~多云西南风3-4级792
    2018-05-102618多云南风3-4级118轻度污染3
    2018-05-112415阴~多云东风1-2级106轻度污染3
    2018-05-122816小雨东南风3-4级186中度污染4
    2018-05-133017南风1-2级682
    2018-05-162921多云~小雨东风1-2级142轻度污染3
    2018-05-172519小雨~多云北风1-2级702
    2018-05-182816多云~晴南风1-2级491
    2018-05-192716多云~小雨南风1-2级692
    2018-05-202116阴~小雨东风1-2级542
    2018-05-232915西南风3-4级153中度污染4
    2018-05-263017小雨~多云西南风3-4级143轻度污染3
    2018-05-283016西北风4-5级178中度污染4
    2018-06-092317小雨北风1-2级451
    2018-06-102717多云东南风1-2级512
    2018-06-112919多云西南风3-4级852
    2018-06-132819雷阵雨~多云东北风1-2级732
    2018-06-183021雷阵雨西南风1-2级112轻度污染3
    2018-06-223021雷阵雨~多云东南风1-2级832
    2018-07-083023雷阵雨南风1-2级732
    2018-07-093022雷阵雨~多云东南风1-2级106轻度污染3
    2018-07-103022多云~雷阵雨南风1-2级481
    2018-07-112522雷阵雨~大雨东北风1-2级441
    2018-07-122722多云南风1-2级461
    2018-07-132823雷阵雨东风1-2级602
    2018-07-172723中雨~雷阵雨西风1-2级281
    2018-07-242826暴雨~雷阵雨东北风3-4级291
    2018-08-113023雷阵雨~中雨东风1-2级602
    2018-08-123024雷阵雨东南风1-2级742
    2018-08-142924中雨~小雨东北风1-2级421
    2018-08-163021晴~多云东北风1-2级401
    2018-08-173022多云~雷阵雨东南风1-2级692
    2018-08-182823小雨~中雨北风3-4级401
    2018-08-192623中雨~小雨东北风1-2级371
    2018-08-222821雷阵雨~多云西南风1-2级481
    2018-08-243020北风1-2级401
    2018-08-273022多云~雷阵雨东南风1-2级892
    2018-08-282922小雨~多云南风1-2级582
    2018-08-302920多云南风1-2级471
    2018-08-312920多云~阴东南风1-2级481
    2018-09-012719阴~小雨南风1-2级501
    2018-09-022719小雨~多云南风1-2级552
    2018-09-033019北风3-4级702
    2018-09-062718多云~晴西北风4-5级371
    2018-09-072716西北风3-4级221
    2018-09-082715多云~晴北风1-2级281
    2018-09-092816西南风1-2级512
    2018-09-102819多云南风1-2级652
    2018-09-112619多云南风1-2级682
    2018-09-122919多云南风1-2级592
    2018-09-132920多云~阴南风1-2级107轻度污染3
    2018-09-142819小雨~多云南风1-2级128轻度污染3
    2018-09-152615多云北风3-4级421
    2018-09-172717多云~阴北风1-2级371
    2018-09-182517阴~多云西南风1-2级501
    2018-09-192617多云南风1-2级522
    2018-09-202716多云西南风1-2级632

    64 rows × 8 columns

    In [33]:

    # write your own function: September days with good air quality
    def query_my_data(df):
        return df.index.str.startswith("2018-09") & (df["aqiLevel"]==1)

    df.loc[query_my_data, :]

    Out[33]:

    bWendu  yWendu  tianqi  fengxiang  fengli  aqi  aqiInfo  aqiLevel
    ymd
    2018-09-01  27  19  阴~小雨  南风  1-2级  50    1
    2018-09-04  31  18    西南风  3-4级  24    1
    2018-09-05  31  19  晴~多云  西南风  3-4级  34    1
    2018-09-06  27  18  多云~晴  西北风  4-5级  37    1
    2018-09-07  27  16    西北风  3-4级  22    1
    2018-09-08  27  15  多云~晴  北风  1-2级  28    1
    2018-09-15  26  15  多云  北风  3-4级  42    1
    2018-09-16  25  14  多云~晴  北风  1-2级  29    1
    2018-09-17  27  17  多云~阴  北风  1-2级  37    1
    2018-09-18  25  17  阴~多云  西南风  1-2级  50    1
    2018-09-21  25  14    西北风  3-4级  50    1
    2018-09-22  24  13    西北风  3-4级  28    1
    2018-09-23  23  12    西北风  4-5级  28    1
    2018-09-24  23  11    北风  1-2级  28    1
    2018-09-25  24  12  晴~多云  南风  1-2级  44    1
    2018-09-29  22  11    北风  3-4级  21    1
    2018-09-30  19  13  多云  西北风  4-5级  22    1

    IV. How to Add New Columns in Pandas

    In [1]:

    import pandas as pd

    0. Read csv data into a DataFrame

    In [15]:

    df = pd.read_csv("./pandas-learn-code/datas/beijing_tianqi/beijing_tianqi_2018.csv")

    In [16]:

    df.head()

    Out[16]:

    ymd  bWendu  yWendu  tianqi  fengxiang  fengli  aqi  aqiInfo  aqiLevel
    0  2018-01-01  3℃  -6℃  晴~多云  东北风  1-2级  59    2
    1  2018-01-02  2℃  -5℃  阴~多云  东北风  1-2级  49    1
    2  2018-01-03  2℃  -5℃  多云  北风  1-2级  28    1
    3  2018-01-04  0℃  -8℃    东北风  1-2级  28    1
    4  2018-01-05  3℃  -6℃  多云~晴  西北风  1-2级  50    1

    1. Direct assignment

    Example: clean the temperature columns into numeric types

    In [31]:

    df.loc[:, "bWendu"] = df["bWendu"].str.replace("℃", "").astype('int32')
    df.loc[:, "yWendu"] = df["yWendu"].str.replace("℃", "").astype('int32')

    Example: compute the temperature difference

    In [52]:

    # note: df["bWendu"] is a Series, so the subtraction also returns a Series
    df.loc[:, "wencha"] = df["bWendu"] - df["yWendu"]

    In [53]:

    df.head()

    Out[53]:

    ymd  bWendu  yWendu  tianqi  fengxiang  fengli  aqi  aqiInfo  aqiLevel  wencha
    0  2018-01-01  3  -6  晴~多云  东北风  1-2级  59    2  9
    1  2018-01-02  2  -5  阴~多云  东北风  1-2级  49    1  7
    2  2018-01-03  2  -5  多云  北风  1-2级  28    1  7
    3  2018-01-04  0  -8    东北风  1-2级  28    1  8
    4  2018-01-05  3  -6  多云~晴  西北风  1-2级  50    1  9

    2. The df.apply method

    Apply a function along an axis of the DataFrame. Objects passed to the function are Series objects whose index is either the DataFrame's index (axis=0) or the DataFrame's columns (axis=1).

    Example: add a temperature-type column:

  • above 33 degrees is high temperature (高温)
  • below -10 degrees is low temperature (低温)
  • anything else is normal (常温)

    In [60]:

    def get_wendu_type(x):
        if x["bWendu"] > 33:
            return "高温"
        if x["yWendu"] < -10:
            return "低温"
        return "常温"

    # note: axis=1 is required here
    df.loc[:, "wendu_type"] = df.apply(get_wendu_type, axis=1)

    In [61]:

    # count the temperature types
    df["wendu_type"].value_counts()

    Out[61]:

    常温    328
    高温     29
    低温      8
    Name: wendu_type, dtype: int64

    3. The df.assign method

    Assign new columns to a DataFrame.

    Returns a new object with all original columns in addition to new ones.

    Example: convert the temperatures from Celsius to Fahrenheit

    In [63]:

    # several new columns can be added at once
    df.assign(yWendu_huashi = lambda x: x["yWendu"]*9/5 + 32,
              bWendu_huashi = lambda x: x["bWendu"]*9/5 + 32)

    . . .

    4. Assign groups by condition

    Select rows by condition, then assign the new column for those rows.

    Example: if the high-low difference exceeds 10 degrees, the daily range is large

    In [65]:

    df.loc[:, "wencha_type"] = ""
    df.loc[df["bWendu"]-df["yWendu"]>10, "wencha_type"] = "温差大"
    df.loc[df["bWendu"]-df["yWendu"]<=10, "wencha_type"] = "温度正常"

    In [67]:

    df["wencha_type"].value_counts()

    Out[67]:

    温度正常    187
    温差大     178
    Name: wencha_type, dtype: int64

    V. Pandas Statistics Functions

  • summary statistics
  • unique values and value counts
  • correlation and covariance

    In [2]:

    import pandas as pd

    0. Read the csv data

    In [5]:

    fpath = "./pandas-learn-code/datas/beijing_tianqi/beijing_tianqi_2018.csv"
    df = pd.read_csv(fpath)

    In [6]:

    df.head(3)

    Out[6]:

    ymd  bWendu  yWendu  tianqi  fengxiang  fengli  aqi  aqiInfo  aqiLevel
    0  2018-01-01  3℃  -6℃  晴~多云  东北风  1-2级  59    2
    1  2018-01-02  2℃  -5℃  阴~多云  东北风  1-2级  49    1
    2  2018-01-03  2℃  -5℃  多云  北风  1-2级  28    1

    In [12]:

    df.loc[:, "bWendu"] = df["bWendu"].str.replace("℃", "").astype("int32")
    df.loc[:, "yWendu"] = df["yWendu"].str.replace("℃", "").astype("int32")

    In [14]:

    df.head(3)

    Out[14]:

    ymd  bWendu  yWendu  tianqi  fengxiang  fengli  aqi  aqiInfo  aqiLevel
    0  2018-01-01  3  -6  晴~多云  东北风  1-2级  59    2
    1  2018-01-02  2  -5  阴~多云  东北风  1-2级  49    1
    2  2018-01-03  2  -5  多云  北风  1-2级  28    1

    1. Summary statistics

    In [15]:

    # statistics for all numeric columns at once
    df.describe()

    Out[15]:

    bWendu  yWendu  aqi  aqiLevel
    count  365.000000  365.000000  365.000000  365.000000
    mean   18.665753  8.358904  82.183562  2.090411
    std    11.858046  11.755053  51.936159  1.029798
    min    -5.000000  -12.000000  21.000000  1.000000
    25%    8.000000  -3.000000  46.000000  1.000000
    50%    21.000000  8.000000  69.000000  2.000000
    75%    29.000000  19.000000  104.000000  3.000000
    max    38.000000  27.000000  387.000000  6.000000

    In [16]:

    # look at a single Series
    df["bWendu"].mean()

    Out[16]:

    18.665753424657535

    In [17]:

    # highest temperature
    df["bWendu"].max()

    Out[17]:

    38

    In [18]:

    # lowest temperature
    df["bWendu"].min()

    Out[18]:

    -5

    2. Unique values and value counts

    2.1 Unique values

    Generally used for enumerated/categorical columns rather than numeric ones
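The difference between unique() and value_counts() can be sketched on a hypothetical categorical Series:

```python
# Sketch: unique() lists the distinct values, value_counts() counts
# each value, sorted by frequency (toy categorical data).
import pandas as pd

s = pd.Series(["N", "S", "N", "E", "N", "S"])
print(sorted(s.unique()))       # ['E', 'N', 'S']
print(int(s.value_counts()["N"]))   # 3
```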

    In [19]:

    df["fengxiang"].unique()

    Out[19]:

    array(['东北风', '北风', '西北风', '西南风', '南风', '东南风', '东风', '西风'], dtype=object)

    In [20]:

    df["tianqi"].unique()

    Out[20]:

    array(['晴~多云', '阴~多云', '多云', '阴', '多云~晴', '多云~阴', '晴', '阴~小雪', '小雪~多云','小雨~阴', '小雨~雨夹雪', '多云~小雨', '小雨~多云', '大雨~小雨', '小雨', '阴~小雨','多云~雷阵雨', '雷阵雨~多云', '阴~雷阵雨', '雷阵雨', '雷阵雨~大雨', '中雨~雷阵雨', '小雨~大雨','暴雨~雷阵雨', '雷阵雨~中雨', '小雨~雷阵雨', '雷阵雨~阴', '中雨~小雨', '小雨~中雨', '雾~多云','霾'], dtype=object)

    In [22]:

    df["fengli"].unique()

    Out[22]:

    array(['1-2级', '4-5级', '3-4级', '2级', '1级', '3级'], dtype=object)

    2.2 Value counts

    In [24]:

    df["fengxiang"].value_counts()

    Out[24]:

    南风     92
    西南风    64
    北风     54
    西北风    51
    东南风    46
    东北风    38
    东风     14
    西风      6
    Name: fengxiang, dtype: int64

    In [25]:

    df["tianqi"].unique()

    Out[25]:

    array(['晴~多云', '阴~多云', '多云', '阴', '多云~晴', '多云~阴', '晴', '阴~小雪', '小雪~多云','小雨~阴', '小雨~雨夹雪', '多云~小雨', '小雨~多云', '大雨~小雨', '小雨', '阴~小雨','多云~雷阵雨', '雷阵雨~多云', '阴~雷阵雨', '雷阵雨', '雷阵雨~大雨', '中雨~雷阵雨', '小雨~大雨','暴雨~雷阵雨', '雷阵雨~中雨', '小雨~雷阵雨', '雷阵雨~阴', '中雨~小雨', '小雨~中雨', '雾~多云','霾'], dtype=object)

    In [26]:

    df["fengli"].value_counts()

    Out[26]:

    1-2级    236
    3-4级     68
    1级       21
    4-5级     20
    2级       13
    3级        7
    Name: fengli, dtype: int64

    3. Correlation and covariance

    Uses:

  • do two stocks rise and fall together? how strongly? positively or negatively correlated?
  • which factors are sales fluctuations positively or negatively correlated with, and how strongly?

    For two variables x and y:

  • covariance measures co-movement: a positive covariance means x and y move in the same direction, and the larger it is, the stronger the co-movement; a negative covariance means they move in opposite directions, and the more negative it is, the stronger the opposition.
  • correlation measures similarity of movement: a correlation of 1 means the two variables move together as similarly as possible, and a correlation of -1 means they move in opposite directions as similarly as possible.
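These properties can be sketched with a hypothetical Series: x against 2*x is perfectly positively correlated, x against -x perfectly negatively:

```python
# Sketch: Series.corr is scale-free (always in [-1, 1]),
# while Series.cov scales with the data (toy data).
import pandas as pd

x = pd.Series([1.0, 2.0, 3.0, 4.0])
print(x.corr(2 * x))    # 1.0
print(x.corr(-x))       # -1.0
print(x.cov(2 * x))     # equals 2 * x.var()
```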
    In [27]:

    # covariance matrix
    df.cov()

    Out[27]:

    bWendu  yWendu  aqi  aqiLevel
    bWendu    140.613247  135.529633  47.462622  0.879204
    yWendu    135.529633  138.181274  16.186685  0.264165
    aqi       47.462622  16.186685  2697.364564  50.749842
    aqiLevel  0.879204  0.264165  50.749842  1.060485

    In [28]:

    # correlation matrix
    df.corr()

    Out[28]:

    bWendu  yWendu  aqi  aqiLevel
    bWendu    1.000000  0.972292  0.077067  0.071999
    yWendu    0.972292  1.000000  0.026513  0.021822
    aqi       0.077067  0.026513  1.000000  0.948883
    aqiLevel  0.071999  0.021822  0.948883  1.000000

    In [29]:

    # correlation between air quality and the daily high alone
    df["aqi"].corr(df["bWendu"])

    Out[29]:

    0.07706705916811067

    In [30]:

    df["aqi"].corr(df["yWendu"])

    Out[30]:

    0.026513282672968895

    In [31]:

    # correlation between air quality and the temperature difference
    df["aqi"].corr(df["bWendu"]-df["yWendu"])

    Out[31]:

    0.2165225757638205

    • Although the high and the low temperature individually correlate only weakly with air quality, the temperature difference clearly correlates much more strongly; this is why exploring and engineering the input data matters so much for the final result.

    VI. Handling Missing Values in Pandas

    Pandas handles missing values with these functions:

    • isnull and notnull: detect missing values; work on both df and Series
    • dropna: drop missing values
      • axis: drop rows or columns, {0 or "index", 1 or "columns"}, default 0
      • how: "any" drops if any value is missing, "all" drops only if every value is missing
      • inplace: True modifies the current df, otherwise a new df is returned
    • fillna: fill missing values
      • value: the fill value; a single value, or a dict (keys are column names, values are fill values)
      • method: "ffill" fills with the previous non-null value (forward fill), "bfill" fills with the next non-null value (backward fill)
      • axis: fill along rows or columns, {0 or "index", 1 or "columns"}
      • inplace: True modifies the current df, otherwise a new df is returned
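The three helpers can be sketched on a tiny hypothetical frame with one all-empty row:

```python
# Sketch of isnull/dropna/fillna: detect nulls, drop all-empty rows,
# fill a column's missing values (toy data).
import pandas as pd
import numpy as np

df = pd.DataFrame({"name": ["A", None, "B"],
                   "score": [85.0, np.nan, np.nan]})
print(df.isnull().sum().tolist())         # [1, 2] nulls per column
df = df.dropna(axis="index", how="all")   # the middle row is all-NaN
df["score"] = df["score"].fillna(0)
print(df["score"].tolist())               # [85.0, 0.0]
```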

    In [1]:

    import pandas as pd

    Example: reading, cleaning and processing an irregular excel file

    Step 1: skip the leading empty rows when reading the excel file

    In [5]:

    # skiprows=2: skip the first two rows
    studf = pd.read_excel("./pandas-learn-code/datas/student_excel/student_excel.xlsx", skiprows=2)

    In [6]:

    studf

    Out[6]:

    Unnamed: 0  姓名  科目  分数
    0   NaN  小明  语文  85.0
    1   NaN  NaN  数学  80.0
    2   NaN  NaN  英语  90.0
    3   NaN  NaN  NaN  NaN
    4   NaN  小王  语文  85.0
    5   NaN  NaN  数学  NaN
    6   NaN  NaN  英语  90.0
    7   NaN  NaN  NaN  NaN
    8   NaN  小刚  语文  85.0
    9   NaN  NaN  数学  80.0
    10  NaN  NaN  英语  90.0

    Step 2: detect missing values

    In [7]:

    studf.isnull()

    Out[7]:

    Unnamed: 0  姓名  科目  分数
    0   True  False  False  False
    1   True  True   False  False
    2   True  True   False  False
    3   True  True   True   True
    4   True  False  False  False
    5   True  True   False  True
    6   True  True   False  False
    7   True  True   True   True
    8   True  False  False  False
    9   True  True   False  False
    10  True  True   False  False

    In [9]:

    studf["分数"].isnull()

    Out[9]:

    0     False
    1     False
    2     False
    3      True
    4     False
    5      True
    6     False
    7      True
    8     False
    9     False
    10    False
    Name: 分数, dtype: bool

    In [10]:

    studf["分数"].notnull()

    Out[10]:

    0      True
    1      True
    2      True
    3     False
    4      True
    5     False
    6      True
    7     False
    8      True
    9      True
    10     True
    Name: 分数, dtype: bool

    In [12]:

    # keep only the rows whose score is not null
    studf.loc[studf["分数"].notnull(), :]

    Out[12]:

    Unnamed: 0  姓名  科目  分数
    0   NaN  小明  语文  85.0
    1   NaN  NaN  数学  80.0
    2   NaN  NaN  英语  90.0
    4   NaN  小王  语文  85.0
    6   NaN  NaN  英语  90.0
    8   NaN  小刚  语文  85.0
    9   NaN  NaN  数学  80.0
    10  NaN  NaN  英语  90.0

    Step 3: drop columns that are entirely empty

    In [15]:

    studf.dropna(axis="columns", how="all", inplace=True)

    In [16]:

    studf

    Out[16]:

    姓名  科目  分数
    0   小明  语文  85.0
    1   NaN  数学  80.0
    2   NaN  英语  90.0
    4   小王  语文  85.0
    5   NaN  数学  NaN
    6   NaN  英语  90.0
    8   小刚  语文  85.0
    9   NaN  数学  80.0
    10  NaN  英语  90.0

    Step 4: drop rows that are entirely empty

    In [13]:

    studf.dropna(axis="index", how="all", inplace=True)

    In [17]:

    studf

    Out[17]:

    姓名  科目  分数
    0   小明  语文  85.0
    1   NaN  数学  80.0
    2   NaN  英语  90.0
    4   小王  语文  85.0
    5   NaN  数学  NaN
    6   NaN  英语  90.0
    8   小刚  语文  85.0
    9   NaN  数学  80.0
    10  NaN  英语  90.0

    Step 5: fill empty scores with 0

    In [19]:

    studf.fillna({"分数": 0})

    . . .

    In [20]:

    # equivalent to
    studf.loc[:, "分数"] = studf["分数"].fillna(0)

    In [21]:

    studf

    Out[21]:

    姓名  科目  分数
    0   小明  语文  85.0
    1   NaN  数学  80.0
    2   NaN  英语  90.0
    4   小王  语文  85.0
    5   NaN  数学  0.0
    6   NaN  英语  90.0
    8   小刚  语文  85.0
    9   NaN  数学  80.0
    10  NaN  英语  90.0

    Step 6: fill the missing names

    Fill with the previous valid value, using ffill (forward fill)

    In [22]:

    studf.loc[:, "姓名"] = studf['姓名'].fillna(method="ffill")

    In [23]:

    studf

    Out[23]:

    姓名  科目  分数
    0   小明  语文  85.0
    1   小明  数学  80.0
    2   小明  英语  90.0
    4   小王  语文  85.0
    5   小王  数学  0.0
    6   小王  英语  90.0
    8   小刚  语文  85.0
    9   小刚  数学  80.0
    10  小刚  英语  90.0

    Step 7: save the cleaned excel file

    In [25]:

    studf.to_excel(r"D:\WinterIsComing\python\New_Wave\pandas_basic\student_excel.xlsx", index=False)

    VII. The Pandas SettingWithCopyWarning

    0. Read the data

    In [1]:

    import pandas as pd

    In [2]:

    fpath = "./pandas-learn-code/datas/beijing_tianqi/beijing_tianqi_2018.csv"
    df = pd.read_csv(fpath)

    In [3]:

    df.head()

    Out[3]:

    ymd  bWendu  yWendu  tianqi  fengxiang  fengli  aqi  aqiInfo  aqiLevel
    0  2018-01-01  3℃  -6℃  晴~多云  东北风  1-2级  59    2
    1  2018-01-02  2℃  -5℃  阴~多云  东北风  1-2级  49    1
    2  2018-01-03  2℃  -5℃  多云  北风  1-2级  28    1
    3  2018-01-04  0℃  -8℃    东北风  1-2级  28    1
    4  2018-01-05  3℃  -6℃  多云~晴  西北风  1-2级  50    1

    In [5]:

    df.loc[:, "bWendu"] = df["bWendu"].str.replace("℃", "").astype("int32")
    df.loc[:, "yWendu"] = df["yWendu"].str.replace("℃", "").astype("int32")

    In [7]:

    df.head()

    Out[7]:

    ymd  bWendu  yWendu  tianqi  fengxiang  fengli  aqi  aqiInfo  aqiLevel
    0  2018-01-01  3  -6  晴~多云  东北风  1-2级  59    2
    1  2018-01-02  2  -5  阴~多云  东北风  1-2级  49    1
    2  2018-01-03  2  -5  多云  北风  1-2级  28    1
    3  2018-01-04  0  -8    东北风  1-2级  28    1
    4  2018-01-05  3  -6  多云~晴  西北风  1-2级  50    1

    1. Reproducing the warning

    In [10]:

    # filter out the March data for analysis
    condition = df.loc[:, "ymd"].str.startswith("2018-03")

    In [11]:

    # set the March temperature difference
    # the wrong way to write it
    df[condition]["wen_cha"] = df["bWendu"] - df["yWendu"]

    D:\Tools\Anaconda3\lib\site-packages\ipykernel_launcher.py:3: SettingWithCopyWarning:
    A value is trying to be set on a copy of a slice from a DataFrame.
    Try using .loc[row_indexer,col_indexer] = value instead
    See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
    This is separate from the ipykernel package so we can avoid doing imports until

    In [12]:

    # check whether the modification took effect
    df[condition].head()
    # only the March rows were filtered; no wen_cha column was added

    Out[12]:

    ymd  bWendu  yWendu  tianqi  fengxiang  fengli  aqi  aqiInfo  aqiLevel
    59  2018-03-01  8  -3  多云  西南风  1-2级  46    1
    60  2018-03-02  9  -1  晴~多云  北风  1-2级  95    2
    61  2018-03-03  13  3  多云~阴  北风  1-2级  214  重度污染  5
    62  2018-03-04  7  -2  阴~多云  东南风  1-2级  144  轻度污染  3
    63  2018-03-05  8  -3    南风  1-2级  94    2

    2. The cause

    The code that triggers the warning: df[condition]["wen_cha"] = df["bWendu"] - df["yWendu"]

    is equivalent to df.get(condition).set(wen_cha); the get in the first step raises the warning.

    A chained operation is really two steps, a get followed by a set. The DataFrame returned by the get may be a view (a sub-view of the original DataFrame, so modifying it would directly affect the original) or a copy, so pandas raises the warning.

    Official docs: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy

    The core rule: a write to a pandas dataframe must be performed on the source dataframe, in a single step.

    3. Fix 1

    Turn the two-step get+set into a single set

    In [15]:

    df.loc[condition, "wen_cha"] = df["bWendu"] - df["yWendu"]

    In [18]:

    df.head(2)

    Out[18]:

    ymd  bWendu  yWendu  tianqi  fengxiang  fengli  aqi  aqiInfo  aqiLevel  wen_cha
    0  2018-01-01  3  -6  晴~多云  东北风  1-2级  59    2  NaN
    1  2018-01-02  2  -5  阴~多云  东北风  1-2级  49    1  NaN

    In [19]:

    df[condition].head(2)

    Out[19]:

    ymd  bWendu  yWendu  tianqi  fengxiang  fengli  aqi  aqiInfo  aqiLevel  wen_cha
    59  2018-03-01  8  -3  多云  西南风  1-2级  46    1  11.0
    60  2018-03-02  9  -1  晴~多云  北风  1-2级  95    2  10.0

    4. Fix 2

    If you need to pre-filter the data for later processing and analysis, copy the DataFrame first with copy() and then operate on the copy

    In [20]:

    # df_month3 is a new DataFrame: the March rows, copied
    df_month3 = df[condition].copy()

    In [22]:

    df_month3.head()

    Out[22]:

    ymd  bWendu  yWendu  tianqi  fengxiang  fengli  aqi  aqiInfo  aqiLevel  wen_cha
    59  2018-03-01  8  -3  多云  西南风  1-2级  46    1  11.0
    60  2018-03-02  9  -1  晴~多云  北风  1-2级  95    2  10.0
    61  2018-03-03  13  3  多云~阴  北风  1-2级  214  重度污染  5  10.0
    62  2018-03-04  7  -2  阴~多云  东南风  1-2级  144  轻度污染  3  9.0
    63  2018-03-05  8  -3    南风  1-2级  94    2  11.0

    In [24]:

    df_month3["wencha"] = df_month3["bWendu"] - df_month3["yWendu"]

    In [25]:

    df_month3.head()

    Out[25]:

    ymd  bWendu  yWendu  tianqi  fengxiang  fengli  aqi  aqiInfo  aqiLevel  wen_cha  wencha
    59  2018-03-01  8  -3  多云  西南风  1-2级  46    1  11.0  11
    60  2018-03-02  9  -1  晴~多云  北风  1-2级  95    2  10.0  10
    61  2018-03-03  13  3  多云~阴  北风  1-2级  214  重度污染  5  10.0  10
    62  2018-03-04  7  -2  阴~多云  东南风  1-2级  144  轻度污染  3  9.0  9
    63  2018-03-05  8  -3    南风  1-2级  94    2  11.0  11

    Pandas does not allow filtering a sub-DataFrame first and then writing to it

    • either use .loc to modify the source dataframe in a single step
    • or copy a sub-DataFrame first and then do the modification on the copy in one step
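Both safe patterns can be sketched side by side on hypothetical data:

```python
# Sketch of the two safe patterns: a one-step .loc write on the source,
# or an explicit copy() of the filtered frame (toy data).
import pandas as pd

df = pd.DataFrame({"hi": [8, 9], "lo": [-3, -1]})
cond = df["hi"] > 8

# pattern 1: one-step write on the source DataFrame
df.loc[cond, "diff"] = df["hi"] - df["lo"]

# pattern 2: explicit copy, then modify freely without a warning
sub = df[cond].copy()
sub["diff2"] = sub["hi"] - sub["lo"]
print(df["diff"].tolist(), sub["diff2"].tolist())
```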

    VIII. Sorting Data in Pandas

    Sorting a Series:
    Series.sort_values(ascending=True, inplace=False)
    Parameters:

    • ascending: default True for ascending order, False for descending
    • inplace: whether to modify the original Series

    Sorting a DataFrame:
    DataFrame.sort_values(by, ascending=True, inplace=False)
    Parameters:

    • by: a string or a list of strings; sort by one column or by several
    • ascending: a bool or a list of bools (ascending or descending); a list matches the columns in by
    • inplace: whether to modify the original DataFrame
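The parameters above can be sketched on hypothetical data; note how a list for ascending pairs up with the columns in by:

```python
# Sketch of sort_values on a Series and on a DataFrame (toy data).
import pandas as pd

s = pd.Series([3, 1, 2])
print(s.sort_values().tolist())     # [1, 2, 3]

df = pd.DataFrame({"a": [1, 1, 2], "b": [9, 7, 8]})
# a ascending, then b descending within equal values of a
out = df.sort_values(by=["a", "b"], ascending=[True, False])
print(out["b"].tolist())            # [9, 7, 8]
```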

    In [1]:

    import pandas as pd

0. Read the data

    In [2]:

fpath = "./pandas-learn-code/datas/beijing_tianqi/beijing_tianqi_2018.csv"
    df = pd.read_csv(fpath)

    In [4]:

# Strip the ℃ suffix from the temperature columns
    df.loc[:, "bWendu"] = df["bWendu"].str.replace("℃", "").astype("int32")
    df.loc[:, "yWendu"] = df["yWendu"].str.replace("℃", "").astype("int32")

    In [5]:

    df.head()

    Out[5]:

    ymd         bWendu  yWendu  tianqi   fengxiang  fengli  aqi  aqiInfo  aqiLevel
    0   2018-01-01  3       -6      晴~多云  东北风     1-2级   59            2
    1   2018-01-02  2       -5      阴~多云  东北风     1-2级   49            1
    2   2018-01-03  2       -5      多云     北风       1-2级   28            1
    3   2018-01-04  0       -8               东北风     1-2级   28            1
    4   2018-01-05  3       -6      多云~晴  西北风     1-2级   50            1

1. Sorting a Series

    In [7]:

# Ascending by default
    df["aqi"].sort_values()

    Out[7]:

    271     21
    281     21
    249     22
    272     22
    301     22
           ...
    71     287
    91     287
    72     293
    86     387
    Name: aqi, Length: 365, dtype: int64

    In [10]:

# Sort descending
    df["aqi"].sort_values(ascending=False)

    Out[10]:

    86     387
    72     293
    91     287
    71     287
    317    266
          ...
    272     22
    249     22
    281     21
    271     21
    Name: aqi, Length: 365, dtype: int64

    In [12]:

# Chinese strings can be sorted too
    df["tianqi"].sort_values()

    Out[12]:

    225     中雨~小雨
    230     中雨~小雨
    197    中雨~雷阵雨
    196    中雨~雷阵雨
    112        多云
            ...
    219     雷阵雨~阴
    335      雾~多云
    353         霾
    348         霾
    Name: tianqi, Length: 365, dtype: object

2. Sorting a DataFrame

2.1 Single-column sort

    In [13]:

# Sort by the air quality index
    df.sort_values(by="aqi")

    Out[13]:

    ymdbWenduyWendutianqifengxiangfengliaqiaqiInfoaqiLevel
    2712018-09-292211北风3-4级211
    2812018-10-09154多云~晴西北风4-5级211
    2492018-09-072716西北风3-4级221
    2722018-09-301913多云西北风4-5级221
    3012018-10-29153北风3-4级221
    2462018-09-043118西南风3-4级241
    352018-02-050-10北风3-4级241
    332018-02-030-9多云北风1-2级241
    102018-01-11-1-10北风1-2级241
    2732018-10-012412北风4-5级251
    2822018-10-10174多云~晴西北风1-2级251
    3592018-12-26-2-11晴~多云东北风2级261
    92018-01-10-2-10西北风1-2级261
    1112018-04-221612阴~多云东北风3-4级271
    242018-01-25-3-11多云东北风1-2级271
    22018-01-032-5多云北风1-2级281
    2642018-09-222413西北风3-4级281
    3192018-11-168-1晴~多云北风1-2级281
    2502018-09-082715多云~晴北风1-2级281
    2662018-09-242311北风1-2级281
    32018-01-040-8东北风1-2级281
    2652018-09-232312西北风4-5级281
    2052018-07-253225多云北风1-2级281
    1972018-07-172723中雨~雷阵雨西风1-2级281
    2042018-07-242826暴雨~雷阵雨东北风3-4级291
    2582018-09-162514多云~晴北风1-2级291
    3622018-12-29-3-12西北风2级291
    2832018-10-11185晴~多云北风1-2级301
    3082018-11-05102多云西南风1-2级301
    222018-01-23-4-12西北风3-4级311
    3342018-12-0170多云东南风1级163中度污染4
    1092018-04-202814多云~小雨南风4-5级164中度污染4
    1082018-04-192613多云东南风4-5级170中度污染4
    682018-03-1014-2东南风1-2级171中度污染4
    1762018-06-263625西南风3-4级174中度污染4
    702018-03-12153多云~晴南风1-2级174中度污染4
    2942018-10-22195多云~晴西北风1-2级176中度污染4
    1242018-05-052513多云北风3-4级177中度污染4
    2862018-10-142110多云南风1-2级177中度污染4
    1472018-05-283016西北风4-5级178中度污染4
    492018-02-196-3多云南风1-2级183中度污染4
    1312018-05-122816小雨东南风3-4级186中度污染4
    132018-01-146-5晴~多云西北风1-2级187中度污染4
    1182018-04-293016多云南风3-4级193中度污染4
    3362018-12-038-3多云~晴东北风3级198中度污染4
    2872018-10-151711小雨北风1-2级198中度污染4
    3302018-11-279-3晴~多云西北风2级198中度污染4
    3062018-11-03166多云南风1-2级206重度污染5
    612018-03-03133多云~阴北风1-2级214重度污染5
    902018-04-012511晴~多云南风1-2级218重度污染5
    3162018-11-13135多云东南风1-2级219重度污染5
    572018-02-2770东风1-2级220重度污染5
    3352018-12-0292雾~多云东北风1级234重度污染5
    852018-03-272711南风1-2级243重度污染5
    3292018-11-26100多云东南风1级245重度污染5
    3172018-11-14135多云南风1-2级266重度污染5
    712018-03-13175晴~多云南风1-2级287重度污染5
    912018-04-022611多云北风1-2级287重度污染5
    722018-03-14156多云~阴东北风1-2级293重度污染5
    862018-03-28259多云~晴东风1-2级387严重污染6

    365 rows × 9 columns

    In [14]:

# Descending
    df.sort_values(by="aqi", ascending=False)

    Out[14]:

    ymdbWenduyWendutianqifengxiangfengliaqiaqiInfoaqiLevel
    862018-03-28259多云~晴东风1-2级387严重污染6
    722018-03-14156多云~阴东北风1-2级293重度污染5
    712018-03-13175晴~多云南风1-2级287重度污染5
    912018-04-022611多云北风1-2级287重度污染5
    3172018-11-14135多云南风1-2级266重度污染5
    3292018-11-26100多云东南风1级245重度污染5
    852018-03-272711南风1-2级243重度污染5
    3352018-12-0292雾~多云东北风1级234重度污染5
    572018-02-2770东风1-2级220重度污染5
    3162018-11-13135多云东南风1-2级219重度污染5
    902018-04-012511晴~多云南风1-2级218重度污染5
    612018-03-03133多云~阴北风1-2级214重度污染5
    3062018-11-03166多云南风1-2级206重度污染5
    2872018-10-151711小雨北风1-2级198中度污染4
    3362018-12-038-3多云~晴东北风3级198中度污染4
    3302018-11-279-3晴~多云西北风2级198中度污染4
    1182018-04-293016多云南风3-4级193中度污染4
    132018-01-146-5晴~多云西北风1-2级187中度污染4
    1312018-05-122816小雨东南风3-4级186中度污染4
    492018-02-196-3多云南风1-2级183中度污染4
    1472018-05-283016西北风4-5级178中度污染4
    2862018-10-142110多云南风1-2级177中度污染4
    1242018-05-052513多云北风3-4级177中度污染4
    2942018-10-22195多云~晴西北风1-2级176中度污染4
    702018-03-12153多云~晴南风1-2级174中度污染4
    1762018-06-263625西南风3-4级174中度污染4
    682018-03-1014-2东南风1-2级171中度污染4
    1082018-04-192613多云东南风4-5级170中度污染4
    1092018-04-202814多云~小雨南风4-5级164中度污染4
    3342018-12-0170多云东南风1级163中度污染4
    2742018-10-022411西北风1-2级311
    3082018-11-05102多云西南风1-2级301
    2832018-10-11185晴~多云北风1-2级301
    3622018-12-29-3-12西北风2级291
    2582018-09-162514多云~晴北风1-2级291
    2042018-07-242826暴雨~雷阵雨东北风3-4级291
    22018-01-032-5多云北风1-2级281
    32018-01-040-8东北风1-2级281
    2502018-09-082715多云~晴北风1-2级281
    2052018-07-253225多云北风1-2级281
    1972018-07-172723中雨~雷阵雨西风1-2级281
    2642018-09-222413西北风3-4级281
    2662018-09-242311北风1-2级281
    2652018-09-232312西北风4-5级281
    3192018-11-168-1晴~多云北风1-2级281
    1112018-04-221612阴~多云东北风3-4级271
    242018-01-25-3-11多云东北风1-2级271
    92018-01-10-2-10西北风1-2级261
    3592018-12-26-2-11晴~多云东北风2级261
    2732018-10-012412北风4-5级251
    2822018-10-10174多云~晴西北风1-2级251
    332018-02-030-9多云北风1-2级241
    2462018-09-043118西南风3-4级241
    102018-01-11-1-10北风1-2级241
    352018-02-050-10北风3-4级241
    2492018-09-072716西北风3-4级221
    3012018-10-29153北风3-4级221
    2722018-09-301913多云西北风4-5级221
    2712018-09-292211北风3-4级211
    2812018-10-09154多云~晴西北风4-5级211

    365 rows × 9 columns

2.2 Multi-column sort

    In [15]:

# Sort by aqiLevel, then by bWendu; ascending by default
    df.sort_values(by=["aqiLevel", "bWendu"])

    Out[15]:

    ymdbWenduyWendutianqifengxiangfengliaqiaqiInfoaqiLevel
    3602018-12-27-5-12多云~晴西北风3级481
    222018-01-23-4-12西北风3-4级311
    232018-01-24-4-11西南风1-2级341
    3402018-12-07-4-10西北风3级331
    212018-01-22-3-10小雪~多云东风1-2级471
    242018-01-25-3-11多云东北风1-2级271
    252018-01-26-3-10晴~多云南风1-2级391
    3612018-12-28-3-11西北风3级401
    3622018-12-29-3-12西北风2级291
    92018-01-10-2-10西北风1-2级261
    3392018-12-06-2-9西北风3级401
    3412018-12-08-2-10晴~多云西北风2级371
    3592018-12-26-2-11晴~多云东北风2级261
    3632018-12-30-2-11晴~多云东北风1级311
    102018-01-11-1-10北风1-2级241
    322018-02-02-1-9北风3-4级321
    32018-01-040-8东北风1-2级281
    332018-02-030-9多云北风1-2级241
    352018-02-050-10北风3-4级241
    82018-01-091-8西北风3-4级341
    342018-02-041-8西南风1-2级361
    402018-02-101-9西北风3-4级391
    3452018-12-121-8西南风1级501
    12018-01-022-5阴~多云东北风1-2级491
    22018-01-032-5多云北风1-2级281
    52018-01-062-5多云~阴西南风1-2级321
    72018-01-082-6西北风4-5级501
    142018-01-152-5东南风1-2级471
    42018-01-053-6多云~晴西北风1-2级501
    3462018-12-133-7西北风2级421
    3302018-11-279-3晴~多云西北风2级198中度污染4
    562018-02-2612-1晴~多云西南风1-2级157中度污染4
    682018-03-1014-2东南风1-2级171中度污染4
    702018-03-12153多云~晴南风1-2级174中度污染4
    2872018-10-151711小雨北风1-2级198中度污染4
    2942018-10-22195多云~晴西北风1-2级176中度污染4
    2862018-10-142110多云南风1-2级177中度污染4
    842018-03-26257多云西南风1-2级151中度污染4
    1242018-05-052513多云北风3-4级177中度污染4
    1082018-04-192613多云东南风4-5级170中度污染4
    1092018-04-202814多云~小雨南风4-5级164中度污染4
    1312018-05-122816小雨东南风3-4级186中度污染4
    1422018-05-232915西南风3-4级153中度污染4
    1182018-04-293016多云南风3-4级193中度污染4
    1472018-05-283016西北风4-5级178中度污染4
    1332018-05-143422晴~多云南风3-4级158中度污染4
    1762018-06-263625西南风3-4级174中度污染4
    572018-02-2770东风1-2级220重度污染5
    3352018-12-0292雾~多云东北风1级234重度污染5
    3292018-11-26100多云东南风1级245重度污染5
    612018-03-03133多云~阴北风1-2级214重度污染5
    3162018-11-13135多云东南风1-2级219重度污染5
    3172018-11-14135多云南风1-2级266重度污染5
    722018-03-14156多云~阴东北风1-2级293重度污染5
    3062018-11-03166多云南风1-2级206重度污染5
    712018-03-13175晴~多云南风1-2级287重度污染5
    902018-04-012511晴~多云南风1-2级218重度污染5
    912018-04-022611多云北风1-2级287重度污染5
    852018-03-272711南风1-2级243重度污染5
    862018-03-28259多云~晴东风1-2级387严重污染6

    365 rows × 9 columns

    In [17]:

# Both columns descending
    df.sort_values(by=["aqiLevel", "bWendu"], ascending=False)

    Out[17]:

    ymdbWenduyWendutianqifengxiangfengliaqiaqiInfoaqiLevel
    862018-03-28259多云~晴东风1-2级387严重污染6
    852018-03-272711南风1-2级243重度污染5
    912018-04-022611多云北风1-2级287重度污染5
    902018-04-012511晴~多云南风1-2级218重度污染5
    712018-03-13175晴~多云南风1-2级287重度污染5
    3062018-11-03166多云南风1-2级206重度污染5
    722018-03-14156多云~阴东北风1-2级293重度污染5
    612018-03-03133多云~阴北风1-2级214重度污染5
    3162018-11-13135多云东南风1-2级219重度污染5
    3172018-11-14135多云南风1-2级266重度污染5
    3292018-11-26100多云东南风1级245重度污染5
    3352018-12-0292雾~多云东北风1级234重度污染5
    572018-02-2770东风1-2级220重度污染5
    1762018-06-263625西南风3-4级174中度污染4
    1332018-05-143422晴~多云南风3-4级158中度污染4
    1182018-04-293016多云南风3-4级193中度污染4
    1472018-05-283016西北风4-5级178中度污染4
    1422018-05-232915西南风3-4级153中度污染4
    1092018-04-202814多云~小雨南风4-5级164中度污染4
    1312018-05-122816小雨东南风3-4级186中度污染4
    1082018-04-192613多云东南风4-5级170中度污染4
    842018-03-26257多云西南风1-2级151中度污染4
    1242018-05-052513多云北风3-4级177中度污染4
    2862018-10-142110多云南风1-2级177中度污染4
    2942018-10-22195多云~晴西北风1-2级176中度污染4
    2872018-10-151711小雨北风1-2级198中度污染4
    702018-03-12153多云~晴南风1-2级174中度污染4
    682018-03-1014-2东南风1-2级171中度污染4
    562018-02-2612-1晴~多云西南风1-2级157中度污染4
    3302018-11-279-3晴~多云西北风2级198中度污染4
    42018-01-053-6多云~晴西北风1-2级501
    3462018-12-133-7西北风2级421
    12018-01-022-5阴~多云东北风1-2级491
    22018-01-032-5多云北风1-2级281
    52018-01-062-5多云~阴西南风1-2级321
    72018-01-082-6西北风4-5级501
    142018-01-152-5东南风1-2级471
    82018-01-091-8西北风3-4级341
    342018-02-041-8西南风1-2级361
    402018-02-101-9西北风3-4级391
    3452018-12-121-8西南风1级501
    32018-01-040-8东北风1-2级281
    332018-02-030-9多云北风1-2级241
    352018-02-050-10北风3-4级241
    102018-01-11-1-10北风1-2级241
    322018-02-02-1-9北风3-4级321
    92018-01-10-2-10西北风1-2级261
    3392018-12-06-2-9西北风3级401
    3412018-12-08-2-10晴~多云西北风2级371
    3592018-12-26-2-11晴~多云东北风2级261
    3632018-12-30-2-11晴~多云东北风1级311
    212018-01-22-3-10小雪~多云东风1-2级471
    242018-01-25-3-11多云东北风1-2级271
    252018-01-26-3-10晴~多云南风1-2级391
    3612018-12-28-3-11西北风3级401
    3622018-12-29-3-12西北风2级291
    222018-01-23-4-12西北风3-4级311
    232018-01-24-4-11西南风1-2级341
    3402018-12-07-4-10西北风3级331
    3602018-12-27-5-12多云~晴西北风3级481

    365 rows × 9 columns

    In [18]:

# Mix ascending and descending per column
    df.sort_values(by=["aqiLevel", "bWendu"], ascending=[True, False])

    Out[18]:

    ymdbWenduyWendutianqifengxiangfengliaqiaqiInfoaqiLevel
    1782018-06-283524多云~晴北风1-2级331
    1492018-05-303318西风1-2级461
    2062018-07-263325多云~雷阵雨东北风1-2级401
    1582018-06-083219多云~雷阵雨西南风1-2级431
    2052018-07-253225多云北风1-2级281
    2262018-08-153224多云东北风3-4级331
    2312018-08-203223多云~晴北风1-2级411
    2322018-08-213222多云北风1-2级381
    1482018-05-293116多云西北风1-2级411
    1962018-07-163124中雨~雷阵雨南风1-2级431
    2342018-08-233121多云北风1-2级431
    2402018-08-293120晴~多云北风3-4级441
    2462018-09-043118西南风3-4级241
    2472018-09-053119晴~多云西南风3-4级341
    1902018-07-103022多云~雷阵雨南风1-2级481
    2202018-08-093024多云南风1-2级491
    2272018-08-163021晴~多云东北风1-2级401
    2352018-08-243020北风1-2级401
    2192018-08-082924雷阵雨~阴东北风1-2级451
    2252018-08-142924中雨~小雨东北风1-2级421
    2412018-08-302920多云南风1-2级471
    2422018-08-312920多云~阴东南风1-2级481
    1372018-05-182816多云~晴南风1-2级491
    2042018-07-242826暴雨~雷阵雨东北风3-4级291
    2292018-08-182823小雨~中雨北风3-4级401
    2332018-08-222821雷阵雨~多云西南风1-2级481
    1922018-07-122722多云南风1-2级461
    1972018-07-172723中雨~雷阵雨西风1-2级281
    2432018-09-012719阴~小雨南风1-2级501
    2482018-09-062718多云~晴西北风4-5级371
    1422018-05-232915西南风3-4级153中度污染4
    1092018-04-202814多云~小雨南风4-5级164中度污染4
    1312018-05-122816小雨东南风3-4级186中度污染4
    1082018-04-192613多云东南风4-5级170中度污染4
    842018-03-26257多云西南风1-2级151中度污染4
    1242018-05-052513多云北风3-4级177中度污染4
    2862018-10-142110多云南风1-2级177中度污染4
    2942018-10-22195多云~晴西北风1-2级176中度污染4
    2872018-10-151711小雨北风1-2级198中度污染4
    702018-03-12153多云~晴南风1-2级174中度污染4
    682018-03-1014-2东南风1-2级171中度污染4
    562018-02-2612-1晴~多云西南风1-2级157中度污染4
    3302018-11-279-3晴~多云西北风2级198中度污染4
    3362018-12-038-3多云~晴东北风3级198中度污染4
    3342018-12-0170多云东南风1级163中度污染4
    132018-01-146-5晴~多云西北风1-2级187中度污染4
    492018-02-196-3多云南风1-2级183中度污染4
    852018-03-272711南风1-2级243重度污染5
    912018-04-022611多云北风1-2级287重度污染5
    902018-04-012511晴~多云南风1-2级218重度污染5
    712018-03-13175晴~多云南风1-2级287重度污染5
    3062018-11-03166多云南风1-2级206重度污染5
    722018-03-14156多云~阴东北风1-2级293重度污染5
    612018-03-03133多云~阴北风1-2级214重度污染5
    3162018-11-13135多云东南风1-2级219重度污染5
    3172018-11-14135多云南风1-2级266重度污染5
    3292018-11-26100多云东南风1级245重度污染5
    3352018-12-0292雾~多云东北风1级234重度污染5
    572018-02-2770东风1-2级220重度污染5
    862018-03-28259多云~晴东风1-2级387严重污染6

    365 rows × 9 columns

9. String handling in Pandas

    We already used a string-handling function earlier:
    df["bWendu"].str.replace("℃", "").astype('int32')

    *String handling in Pandas:*

  • Usage: first get the Series' str attribute, then call methods on that attribute;
  • It only works on string columns, not on numeric columns;
  • DataFrame has no str attribute or string-handling methods;
  • Series.str is not Python's native str: it has its own set of methods, most of which closely resemble the native ones;
  • *Series.str method reference:*
    https://pandas.pydata.org/pandas-docs/stable/reference/series.html#string-handling
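A minimal sketch of the .str accessor (the values below are made up, mimicking the bWendu column):

    ```python
    import pandas as pd

    s = pd.Series(["3℃", "-6℃"])

    # .str exposes vectorized string methods on an object-dtype Series
    cleaned = s.str.replace("℃", "", regex=False).astype("int32")
    ```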

*Covered in this section:*

  • Get the Series' str attribute and use the various string functions
  • Use the boolean Series returned by str.startswith, str.contains, etc. for filtering
  • Chained operations that need multiple str calls
  • Regular-expression processing

    0. Read the Beijing 2018 weather data

    In [5]:

    import pandas as pd

    In [6]:

fpath = "./pandas-learn-code/datas/beijing_tianqi/beijing_tianqi_2018.csv"
    df = pd.read_csv(fpath)

    In [8]:

    df.head()

    Out[8]:

    ymd         bWendu  yWendu  tianqi   fengxiang  fengli  aqi  aqiInfo  aqiLevel
    0   2018-01-01  3℃      -6℃     晴~多云  东北风     1-2级   59            2
    1   2018-01-02  2℃      -5℃     阴~多云  东北风     1-2级   49            1
    2   2018-01-03  2℃      -5℃     多云     北风       1-2级   28            1
    3   2018-01-04  0℃      -8℃              东北风     1-2级   28            1
    4   2018-01-05  3℃      -6℃     多云~晴  西北风     1-2级   50            1

    In [13]:

    df.dtypes

    Out[13]:

    ymd          object
    bWendu       object
    yWendu       object
    tianqi       object
    fengxiang    object
    fengli       object
    aqi           int64
    aqiInfo      object
    aqiLevel      int64
    dtype: object

1. Get the Series' str attribute and use the various string functions

    In [14]:

    df["bWendu"].str

    Out[14]:

    <pandas.core.strings.StringMethods at 0x25ffbcce898>

    In [15]:

# String replacement
    df.loc[:, "bWendu"] = df["bWendu"].str.replace("℃", "").astype("int32")

    In [16]:

    df.head()

    Out[16]:

    ymd         bWendu  yWendu  tianqi   fengxiang  fengli  aqi  aqiInfo  aqiLevel
    0   2018-01-01  3       -6℃     晴~多云  东北风     1-2级   59            2
    1   2018-01-02  2       -5℃     阴~多云  东北风     1-2级   49            1
    2   2018-01-03  2       -5℃     多云     北风       1-2级   28            1
    3   2018-01-04  0       -8℃              东北风     1-2级   28            1
    4   2018-01-05  3       -6℃     多云~晴  西北风     1-2级   50            1

    In [19]:

# Check whether each value is numeric
    df["yWendu"].str.isnumeric()

    Out[19]:

    0      False
    1      False
    2      False
    3      False
    4      False
           ...
    360    False
    361    False
    362    False
    363    False
    364    False
    Name: yWendu, Length: 365, dtype: bool

    In [21]:

# Calling .str on a numeric column raises an error
    df["aqi"].str.len()

    . . .

## 2. The boolean Series returned by str.startswith, str.contains, etc. can be used for filtering

    In [23]:

# Select the March rows
    condition = df["ymd"].str.startswith("2018-03")

    In [25]:

    condition

    . . .

    In [27]:

    df[condition].head()

    Out[27]:

    ymd         bWendu  yWendu  tianqi   fengxiang  fengli  aqi  aqiInfo   aqiLevel
    59  2018-03-01  8       -3℃     多云     西南风     1-2级   46             1
    60  2018-03-02  9       -1℃     晴~多云  北风       1-2级   95             2
    61  2018-03-03  13      3℃      多云~阴  北风       1-2级   214  重度污染  5
    62  2018-03-04  7       -2℃     阴~多云  东南风     1-2级   144  轻度污染  3
    63  2018-03-05  8       -3℃              南风       1-2级   94             2

    ## 3. Chained operations that need multiple str calls

    How to extract a numeric month like 201803:

    1. First replace the date 2018-03-31 with the form 20180331
    2. Then slice out the month string 201803

    In [28]:

    df["ymd"].str.replace("-","")

    . . .

    In [29]:

# Each call returns a new Series
    # str methods cannot be called on the Series directly
    df["ymd"].str.replace("-", "").slice(0, 6)

    ---------------------------------------------------------------------------
    AttributeError: 'Series' object has no attribute 'slice'

    In [31]:

# replace returns a Series; access .str again before slicing
    df["ymd"].str.replace("-", "").str.slice(0, 6)

    Out[31]:

    0      201801
    1      201801
    2      201801
    3      201801
    4      201801
            ...
    360    201812
    361    201812
    362    201812
    363    201812
    364    201812
    Name: ymd, Length: 365, dtype: object

    In [32]:

# slice is just slicing syntax, so the [] operator also works
    df["ymd"].str.replace("-", "").str[0:6]

    Out[32]:

    0      201801
    1      201801
    2      201801
    3      201801
    4      201801
            ...
    360    201812
    361    201812
    362    201812
    363    201812
    364    201812
    Name: ymd, Length: 365, dtype: object

    In [37]:

    df.head()

    Out[37]:

    ymd         bWendu  yWendu  tianqi   fengxiang  fengli  aqi  aqiInfo  aqiLevel
    0   2018-01-01  3       -6℃     晴~多云  东北风     1-2级   59            2
    1   2018-01-02  2       -5℃     阴~多云  东北风     1-2级   49            1
    2   2018-01-03  2       -5℃     多云     北风       1-2级   28            1
    3   2018-01-04  0       -8℃              东北风     1-2级   28            1
    4   2018-01-05  3       -6℃     多云~晴  西北风     1-2级   50            1

4. Regular-expression processing

    In [42]:

# Add a new column: the date in Chinese format
    def get_nianyueri(x):
        year, month, day = x["ymd"].split("-")
        return f"{year}年{month}月{day}日"

    df["中文日期"] = df.apply(get_nianyueri, axis=1)

    In [40]:

    df["中文日期"]

    . . .

Question: how to remove the three Chinese characters 年, 月, 日 from "2018年12月31日"

    In [44]:

# Method 1: chained replace
    df["中文日期"].str.replace("年", "").str.replace("月", "").str.replace("日", "")

    . . .

Series.str.replace used to treat the pattern as a regular expression by default (recent pandas versions require an explicit regex=True)

    In [43]:

# Method 2: a single regex replace
    df["中文日期"].str.replace("[年月日]", "", regex=True)

    . . .

10. How to understand Pandas' axis parameter

    • axis=0 or "index":
      • for a single-row operation, it refers to one particular row
      • for an aggregation, it means across the rows (cross rows)
    • axis=1 or "columns":
      • for a single-column operation, it refers to one particular column
      • for an aggregation, it means across the columns (cross columns)

    *Whichever axis you pick is the one that moves (as if iterated by a for loop); the other axis stays fixed*
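The mnemonic can be sketched directly (same 3×4 frame that the cells below build):

    ```python
    import numpy as np
    import pandas as pd

    df = pd.DataFrame(np.arange(12).reshape(3, 4), columns=list("ABCD"))

    col_means = df.mean(axis=0)  # the index axis moves: one mean per column
    row_means = df.mean(axis=1)  # the columns axis moves: one mean per row
    ```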


    In [2]:

import pandas as pd
    import numpy as np

    In [7]:

df = pd.DataFrame(np.arange(12).reshape(3, 4), columns=["A", "B", "C", "D"])

    In [8]:

    df

    Out[8]:

       A  B  C   D
    0  0  1  2   3
    1  4  5  6   7
    2  8  9  10  11

1. Single-column drop: deleting one column

    In [9]:

# Delete a column
    df.drop("A", axis=1)

    Out[9]:

       B  C   D
    0  1  2   3
    1  5  6   7
    2  9  10  11

    In [10]:

# Delete a row
    df.drop(1, axis=0)

    Out[10]:

       A  B  C   D
    0  0  1  2   3
    2  8  9  10  11

2. mean aggregation along axis=0/index

    • Counter-intuitive: the output is one value per column, not per row

    In [11]:

    df

    Out[11]:

       A  B  C   D
    0  0  1  2   3
    1  4  5  6   7
    2  8  9  10  11

    In [16]:

# axis=0 or axis="index"
    df.mean(axis=0)

    Out[16]:

    A    4.0
    B    5.0
    C    6.0
    D    7.0
    dtype: float64

    (Figure: pandas-axis-index.png — mean along axis=0)

*Whichever axis you pick is the one that moves (as if iterated by a for loop); the other axis stays fixed*

3. mean aggregation along axis=1/columns

    • Counter-intuitive: the output is one value per row, not per column

    In [21]:

    df

    Out[21]:

       A  B  C   D
    0  0  1  2   3
    1  4  5  6   7
    2  8  9  10  11

    In [22]:

# axis=1 or axis="columns"
    df.mean(axis=1)

    Out[22]:

    0    1.5
    1    5.5
    2    9.5
    dtype: float64

    (Figure: pandas-axis-columns.png — mean along axis=1)

*Whichever axis you pick is the one that moves (as if iterated by a for loop); the other axis stays fixed*

4. One more example to drive the point home

    In [23]:

def get_sum_value(x):
        return x["A"] + x["B"] + x["C"] + x["D"]

    df["sum_value"] = df.apply(get_sum_value, axis=1)

    In [24]:

    df

    Out[24]:

       A  B  C   D   sum_value
    0  0  1  2   3   6
    1  4  5  6   7   22
    2  8  9  10  11  38

*Whichever axis you pick is the one that moves (as if iterated by a for loop); the other axis stays fixed*

    In [27]:

    df["A"]

    Out[27]:

    0    0
    1    4
    2    8
    Name: A, dtype: int32

11. What is Pandas' index for?

    Data stored in ordinary columns can be queried too, so what do we gain from using an index?

    Summary of what the index is for:

  • more convenient data queries;
  • better query performance;
  • automatic data alignment;
  • more and stronger data structures;

    In [1]:

    import pandas as pd

    In [2]:

    df = pd.read_csv("./pandas-learn-code/datas/ml-latest-small/ratings.csv")

    In [3]:

    df.head()

    Out[3]:

       userId  movieId  rating  timestamp
    0  1       1        4.0     964982703
    1  1       3        4.0     964981247
    2  1       6        4.0     964982224
    3  1       47       5.0     964983815
    4  1       50       5.0     964982931

    In [4]:

    df.count()

    Out[4]:

    userId       100836
    movieId      100836
    rating       100836
    timestamp    100836
    dtype: int64

1. Querying data via the index

    In [5]:

# drop=False keeps the index column as a regular column too
    # This sets userId as the index while also keeping the userId column
    df.set_index("userId", inplace=True, drop=False)

    In [6]:

    df.head()

    Out[6]:

            userId  movieId  rating  timestamp
    userId
    1       1       1        4.0     964982703
    1       1       3        4.0     964981247
    1       1       6        4.0     964982224
    1       1       47       5.0     964983815
    1       1       50       5.0     964982931

    In [7]:

    df.index

    Out[7]:

    Int64Index([  1,   1,   1,   1,   1,   1,   1,   1,   1,   1,
                ...
                610, 610, 610, 610, 610, 610, 610, 610, 610, 610],
               dtype='int64', name='userId', length=100836)

    In [8]:

# Query via the index: pass the label directly to loc[]
    # Look up all rows for userId 500
    df.loc[500].head(5)

    Out[8]:

            userId  movieId  rating  timestamp
    userId
    500     500     1        4.0     1005527755
    500     500     11       1.0     1005528017
    500     500     39       1.0     1005527926
    500     500     101      1.0     1005527980
    500     500     104      4.0     1005528065

    In [9]:

# The equivalent query using a column condition
    df.loc[df["userId"] == 500].head()

    Out[9]:

            userId  movieId  rating  timestamp
    userId
    500     500     1        4.0     1005527755
    500     500     11       1.0     1005528017
    500     500     39       1.0     1005527926
    500     500     101      1.0     1005527980
    500     500     104      4.0     1005528065

2. Using an index improves query performance

    • If the index is unique, Pandas uses a hash table: O(1) lookups;
    • If the index is not unique but is sorted, Pandas uses binary search: O(logN) lookups;
    • If the index is completely random, every query must scan the whole table: O(N) lookups;
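A deterministic mini-sketch of the sorted vs. random cases (the values and labels are made up; the real experiment below uses the ratings data):

    ```python
    import pandas as pd

    # A tiny DataFrame with a random, non-unique index
    df = pd.DataFrame({"v": [10, 20, 30, 40]}, index=[3, 1, 3, 2])

    df_sorted = df.sort_index()  # sorted index enables binary search, O(logN)

    print(df.index.is_monotonic_increasing)         # random order: O(N) scans
    print(df_sorted.index.is_monotonic_increasing)  # sorted now
    print(df_sorted.index.is_unique)                # label 3 appears twice
    ```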

    (Figure: pandas-index-performance.png — index lookup performance)

Experiment 1: queries against a completely random index order

    In [11]:

# Shuffle the rows randomly
    from sklearn.utils import shuffle
    df_shuffle = shuffle(df)

    In [12]:

    df_shuffle.head()

    Out[12]:

            userId  movieId  rating  timestamp
    userId
    244     244     1377     4.0     975093513
    413     413     3753     5.0     1484439911
    280     280     6539     3.5     1348435219
    18      18      86332    3.5     1455051197
    274     274     3160     2.5     1197275106

    In [13]:

# Is the index monotonically increasing?
    df_shuffle.index.is_monotonic_increasing

    Out[13]:

    False

    In [14]:

# Is the index unique?
    df_shuffle.index.is_unique

    Out[14]:

    False

    In [15]:

# Time the lookup of id == 500
    # %timeit runs the statement many times and reports the timing
    %timeit df_shuffle.loc[500]

    366 µs ± 7.09 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Experiment 2: queries after sorting the index

    In [17]:

# Sort df_shuffle by its index
    df_sorted = df_shuffle.sort_index()

    In [18]:

    df_sorted.head()

    Out[18]:

            userId  movieId  rating  timestamp
    userId
    1       1       3578     5.0     964980668
    1       1       2406     4.0     964982310
    1       1       110      4.0     964982176
    1       1       2090     5.0     964982838
    1       1       2096     4.0     964982838

    In [19]:

# Is the index monotonically increasing?
    df_sorted.index.is_monotonic_increasing

    Out[19]:

    True

    In [20]:

    df_sorted.index.is_unique

    Out[20]:

    False

    In [21]:

%timeit df_sorted.loc[500]

    178 µs ± 4.55 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

3. The index aligns data automatically

    This applies to both Series and DataFrame

    In [22]:

    s1 = pd.Series([1,2,3], index=list("abc"))

    In [23]:

    s1

    Out[23]:

    a    1
    b    2
    c    3
    dtype: int64

    In [24]:

    s2 = pd.Series([2,3,4], index=list("bcd"))

    In [25]:

    s2

    Out[25]:

    b    2
    c    3
    d    4
    dtype: int64

    In [26]:

    s1 + s2

    Out[26]:

    a    NaN
    b    4.0
    c    6.0
    d    NaN
    dtype: float64

Both s1 and s2 have the labels b and c, while a and d each exist on one side only and cannot be aligned, so those positions come out as NaN
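If the NaN for the unmatched labels is unwanted, the add method's fill_value parameter substitutes a default for the missing side; a sketch using the same two Series:

    ```python
    import pandas as pd

    s1 = pd.Series([1, 2, 3], index=list("abc"))
    s2 = pd.Series([2, 3, 4], index=list("bcd"))

    # fill_value treats the missing labels as 0, so a and d survive
    total = s1.add(s2, fill_value=0)
    ```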

4. The index enables more and stronger data structures

    *Many powerful index types*

    • CategoricalIndex: an index over categorical data, for better performance;
    • MultiIndex: a multi-level index, e.g. for the result of a multi-column groupby;
    • DatetimeIndex: a datetime index with rich date and time methods;
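A small sketch of the DatetimeIndex in action (the dates and values are made up):

    ```python
    import pandas as pd

    dates = pd.to_datetime(["2018-01-01", "2018-01-02", "2018-02-01"])
    s = pd.Series([1, 2, 3], index=dates)

    jan = s.loc["2018-01"]            # partial-string indexing: all January rows
    monthly = s.resample("MS").sum()  # calendar-aware aggregation per month
    ```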

12. How to merge DataFrames in Pandas

    Pandas' merge is the equivalent of SQL's join: it relates different tables on a key into one table

    merge syntax:

    pd.merge(left, right, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=True, suffixes=('_x', '_y'), copy=True, indicator=False, validate=None)

    • left, right: the DataFrames (or named Series) to merge
    • how: the join type: 'left', 'right', 'outer', 'inner'
    • on: the join key, which must exist in both left and right
    • left_on: the key column of the left df or series
    • right_on: the key column of the right df or series
    • left_index, right_index: join on the index instead of an ordinary column
    • suffixes: a pair of suffixes added automatically when non-key columns share a name; default is ('_x', '_y')
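The indicator parameter (listed in the signature but not demonstrated later) tags each output row with its origin; a small sketch on made-up keys:

    ```python
    import pandas as pd

    left = pd.DataFrame({"key": ["K0", "K1"], "A": [1, 2]})
    right = pd.DataFrame({"key": ["K1", "K2"], "B": [3, 4]})

    # indicator=True adds a _merge column: left_only / right_only / both
    merged = pd.merge(left, right, on="key", how="outer", indicator=True)
    ```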

Documentation: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html

Outline:

  • A join example on a movie dataset
  • Understanding one-to-one, one-to-many and many-to-many alignment in merge
  • Understanding the difference between left join, right join, inner join and outer join
  • What to do when non-key columns share a name

    1. A join example on a movie dataset

    The movie ratings dataset

    A classic dataset for recommender-system research
    Located in this repo at: ./datas/movielens-1m

    It contains three files:

  • users' ratings of movies: ratings.dat
  • user profile data: users.dat
  • movie metadata: movies.dat

    The three tables can be joined into one complete wide table

    Official dataset page: https://grouplens.org/datasets/movielens/

    In [8]:

    import pandas as pd

    In [12]:

df_ratings = pd.read_csv(
        "./pandas-learn-code/datas/movielens-1m/ratings.dat",
        sep="::",
        engine="python",
        names="UserID::MovieID::Rating::Timestamp".split("::")
    )

    In [13]:

    df_ratings.head()

    Out[13]:

       UserID  MovieID  Rating  Timestamp
    0  1       1193     5       978300760
    1  1       661      3       978302109
    2  1       914      3       978301968
    3  1       3408     4       978300275
    4  1       2355     5       978824291

    In [14]:

df_users = pd.read_csv(
        "./pandas-learn-code/datas/movielens-1m/users.dat",
        sep="::",
        engine="python",
        names="UserID::Gender::Age::Occupation::Zip-code".split("::")
    )

    In [15]:

    df_users.head()

    Out[15]:

       UserID  Gender  Age  Occupation  Zip-code
    0  1       F       1    10          48067
    1  2       M       56   16          70072
    2  3       M       25   15          55117
    3  4       M       45   7           02460
    4  5       M       25   20          55455

    In [17]:

df_movies = pd.read_csv(
        "./pandas-learn-code/datas/movielens-1m/movies.dat",
        sep="::",
        engine="python",
        names="MovieID::Title::Genres".split("::")
    )

    In [18]:

    df_movies.head()

    Out[18]:

       MovieID  Title                               Genres
    0  1        Toy Story (1995)                    Animation|Children's|Comedy
    1  2        Jumanji (1995)                      Adventure|Children's|Fantasy
    2  3        Grumpier Old Men (1995)             Comedy|Romance
    3  4        Waiting to Exhale (1995)            Comedy|Drama
    4  5        Father of the Bride Part II (1995)  Comedy


    In [21]:

# inner: keep a row only when the key exists on both sides
    df_ratings_users = pd.merge(df_ratings, df_users, left_on="UserID", right_on="UserID", how="inner")

    In [22]:

    df_ratings_users.head()

    Out[22]:

       UserID  MovieID  Rating  Timestamp  Gender  Age  Occupation  Zip-code
    0  1       1193     5       978300760  F       1    10          48067
    1  1       661      3       978302109  F       1    10          48067
    2  1       914      3       978301968  F       1    10          48067
    3  1       3408     4       978300275  F       1    10          48067
    4  1       2355     5       978824291  F       1    10          48067

    In [25]:

    df_ratings_users_movies = pd.merge(df_ratings_users, df_movies, left_on="MovieID", right_on="MovieID", how="inner")

    In [26]:

    df_ratings_users_movies.head()

    Out[26]:

       UserID  MovieID  Rating  Timestamp  Gender  Age  Occupation  Zip-code  Title                                   Genres
    0  1       1193     5       978300760  F       1    10          48067     One Flew Over the Cuckoo's Nest (1975)  Drama
    1  2       1193     5       978298413  M       56   16          70072     One Flew Over the Cuckoo's Nest (1975)  Drama
    2  12      1193     4       978220179  M       25   12          32793     One Flew Over the Cuckoo's Nest (1975)  Drama
    3  15      1193     4       978199279  M       25   7           22903     One Flew Over the Cuckoo's Nest (1975)  Drama
    4  17      1193     5       978158471  M       50   1           95350     One Flew Over the Cuckoo's Nest (1975)  Drama

2. Understanding how merge aligns row counts

    Make sure you understand these relationships:

    • one-to-one: the join key is unique on both sides
      • e.g. (sno, name) merge (sno, age)
      • result row count: 1*1
    • one-to-many: the key is unique on the left, but not on the right
      • e.g. (sno, name) merge (sno, [Chinese score, math score, English score])
      • result row count: 1*N
    • many-to-many: the key is not unique on either side
      • e.g. (sno, [Chinese score, math score, English score]) merge (sno, [basketball, football, table tennis])
      • result row count: M*N
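These cardinalities can also be enforced with merge's validate parameter, which raises MergeError when the assumed relationship is violated; a sketch on made-up student data:

    ```python
    import pandas as pd

    students = pd.DataFrame({"sno": [11, 12], "name": ["name_a", "name_b"]})
    grades = pd.DataFrame({"sno": [11, 11, 12], "grade": [88, 90, 66]})

    # one_to_many passes: sno is unique on the left, duplicated on the right
    ok = pd.merge(students, grades, on="sno", validate="one_to_many")

    # one_to_one fails because the right-hand key is not unique
    try:
        pd.merge(students, grades, on="sno", validate="one_to_one")
    except pd.errors.MergeError:
        print("not a one-to-one merge")
    ```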

2.1 A one-to-one merge

    (Figure: pandas-merge-one-to-one.png)

    In [31]:

left = pd.DataFrame({"sno": [11, 12, 13, 14],
                         "name": ["name_a", "name_b", "name_c", "name_d"]})
    left

    Out[31]:

       sno  name
    0  11   name_a
    1  12   name_b
    2  13   name_c
    3  14   name_d

    In [28]:

right = pd.DataFrame({"sno": [11, 12, 13, 14],
                          "age": ["21", "22", "23", "24"]})
    right

    Out[28]:

       sno  age
    0  11   21
    1  12   22
    2  13   23
    3  14   24

    In [30]:

# A one-to-one merge: the result has 4 rows
    pd.merge(left, right, on="sno")

    Out[30]:

       sno  name    age
    0  11   name_a  21
    1  12   name_b  22
    2  13   name_c  23
    3  14   name_d  24

2.2 A one-to-many merge

    Note: the data on the "one" side gets duplicated

    (Figure: pandas-merge-one-to-many.png)

    In [32]:

left = pd.DataFrame({"sno": [11, 12, 13, 14],
                         "name": ["name_a", "name_b", "name_c", "name_d"]})
    left

    Out[32]:

       sno  name
    0  11   name_a
    1  12   name_b
    2  13   name_c
    3  14   name_d

    In [33]:

right = pd.DataFrame({"sno": [11, 11, 11, 12, 12, 13],
                          "grade": ["语文88", "数学90", "英语75", "语文66", "数学55", "英语29"]})
    right

    Out[33]:

       sno  grade
    0  11   语文88
    1  11   数学90
    2  11   英语75
    3  12   语文66
    4  12   数学55
    5  13   英语29

    In [35]:

# The row count follows the "many" side
    pd.merge(left, right, on="sno")

    Out[35]:

       sno  name    grade
    0  11   name_a  语文88
    1  11   name_a  数学90
    2  11   name_a  英语75
    3  12   name_b  语文66
    4  12   name_b  数学55
    5  13   name_c  英语29

2.3 A many-to-many merge

    Note: the result row count is multiplicative

    (Figure: pandas-merge-many-to-many.png)

    In [36]:

left = pd.DataFrame({"sno": [11, 11, 12, 12, 12],
                         "爱好": ["篮球", "羽毛球", "乒乓球", "篮球", "足球"]})
    left

    Out[36]:

       sno  爱好
    0  11   篮球
    1  11   羽毛球
    2  12   乒乓球
    3  12   篮球
    4  12   足球

    In [37]:

right = pd.DataFrame({"sno": [11, 11, 11, 12, 12, 13],
                          "grade": ["语文88", "数学90", "英语75", "语文66", "数学55", "英语29"]})
    right

    Out[37]:

       sno  grade
    0  11   语文88
    1  11   数学90
    2  11   英语75
    3  12   语文66
    4  12   数学55
    5  13   英语29

    In [38]:

    pd.merge(left, right, on="sno")

    Out[38]:

        sno  爱好    grade
    0   11   篮球    语文88
    1   11   篮球    数学90
    2   11   篮球    英语75
    3   11   羽毛球  语文88
    4   11   羽毛球  数学90
    5   11   羽毛球  英语75
    6   12   乒乓球  语文66
    7   12   乒乓球  数学55
    8   12   篮球    语文66
    9   12   篮球    数学55
    10  12   足球    语文66
    11  12   足球    数学55

3. Understanding the difference between left join, right join, inner join and outer join

    (Figure: pandas-leftjoin-rightjoin-outerjoin.png)

    In [52]:

left = pd.DataFrame({"key": ["K0", "K1", "K2", "K3"],
                         "A": ["A0", "A1", "A2", "A3"],
                         "B": ["B0", "B1", "B2", "B3"]})
    right = pd.DataFrame({"key": ["K0", "K1", "K4", "K5"],
                          "C": ["C0", "C1", "C2", "C3"],
                          "D": ["D0", "D1", "D2", "D3"]})

    In [53]:

    left

    Out[53]:

       key  A   B
    0  K0   A0  B0
    1  K1   A1  B1
    2  K2   A2  B2
    3  K3   A3  B3

    In [54]:

    right

    Out[54]:

       key  C   D
    0  K0   C0  D0
    1  K1   C1  D1
    2  K4   C2  D2
    3  K5   C3  D3

    3.1 inner join (the default)

    A key appears in the result only when it exists on both sides.

    In [55]:

    pd.merge(left, right, how="inner")

    Out[55]:

      key   A   B   C   D
    0  K0  A0  B0  C0  D0
    1  K1  A1  B1  C1  D1

    3.2 left join

    Every left-side row appears in the result; unmatched right-side values become NaN.

    In [56]:

    pd.merge(left, right, how="left")

    Out[56]:

      key   A   B    C    D
    0  K0  A0  B0   C0   D0
    1  K1  A1  B1   C1   D1
    2  K2  A2  B2  NaN  NaN
    3  K3  A3  B3  NaN  NaN

    3.3 right join

    Every right-side row appears in the result; unmatched left-side values become NaN.

    In [57]:

    pd.merge(left, right, how="right")

    Out[57]:

      key    A    B   C   D
    0  K0   A0   B0  C0  D0
    1  K1   A1   B1  C1  D1
    2  K4  NaN  NaN  C2  D2
    3  K5  NaN  NaN  C3  D3

    3.4 outer join

    Rows from both sides appear in the result; anything unmatched becomes NaN.

    In [58]:

    pd.merge(left, right, how="outer")

    Out[58]:

      key    A    B    C    D
    0  K0   A0   B0   C0   D0
    1  K1   A1   B1   C1   D1
    2  K2   A2   B2  NaN  NaN
    3  K3   A3   B3  NaN  NaN
    4  K4  NaN  NaN   C2   D2
    5  K5  NaN  NaN   C3   D3
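    When comparing the four join types it helps to see where each row came from. `pd.merge` supports `indicator=True`, which adds a `_merge` column marking each row as "both", "left_only" or "right_only". A sketch with the same left/right frames:

    ```python
    import pandas as pd

    left = pd.DataFrame({"key": ["K0", "K1", "K2", "K3"],
                         "A": ["A0", "A1", "A2", "A3"],
                         "B": ["B0", "B1", "B2", "B3"]})
    right = pd.DataFrame({"key": ["K0", "K1", "K4", "K5"],
                          "C": ["C0", "C1", "C2", "C3"],
                          "D": ["D0", "D1", "D2", "D3"]})

    # _merge tells you, per row, which side(s) the key was found on
    out = pd.merge(left, right, how="outer", indicator=True)
    print(out[["key", "_merge"]])
    print(out["_merge"].value_counts())
    ```

    Filtering on `_merge` is a convenient way to find keys that exist on only one side.
    
    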

    4. What if a non-key column name appears on both sides?

    In [61]:

    left = pd.DataFrame({"key": ["K0", "K1", "K2", "K3"],
                         "A": ["A0", "A1", "A2", "A3"],
                         "B": ["B0", "B1", "B2", "B3"]})
    right = pd.DataFrame({"key": ["K0", "K1", "K4", "K5"],
                          "A": ["A10", "A11", "A12", "A13"],
                          "D": ["D0", "D1", "D4", "D5"]})

    In [60]:

    left

    Out[60]:

      key   A   B
    0  K0  A0  B0
    1  K1  A1  B1
    2  K2  A2  B2
    3  K3  A3  B3

    In [62]:

    right

    Out[62]:

      key    A   D
    0  K0  A10  D0
    1  K1  A11  D1
    2  K4  A12  D4
    3  K5  A13  D5

    In [64]:

    pd.merge(left, right, on="key")

    Out[64]:

      key A_x   B  A_y   D
    0  K0  A0  B0  A10  D0
    1  K1  A1  B1  A11  D1

    In [65]:

    # when non-key columns collide, pandas appends a suffix to each side;
    # the default pair is ('_x', '_y'), and it can be overridden with suffixes=
    pd.merge(left, right, on="key", suffixes=('_left', '_right'))

    Out[65]:

      key A_left   B A_right   D
    0  K0     A0  B0     A10  D0
    1  K1     A1  B1     A11  D1

    13. Combining data with pandas concat

    Typical use cases:

    batch-merging Excel files that share a format, appending rows to a DataFrame, and appending columns to a DataFrame.

    concat in one sentence:

    • using a given join mode (inner/outer)
    • along a given axis (axis=0/1)
    • combine multiple pandas objects (DataFrame/Series) into one.

    concat signature: pandas.concat(objs, axis=0, join='outer', ignore_index=False)

    • objs: a list of DataFrames and/or Series (they can be mixed)
    • axis: 0 (the default) concatenates rows; 1 concatenates columns
    • join: how indexes on the other axis are aligned; outer join by default, inner join also available
    • ignore_index: whether to discard the original indexes

    append signature: DataFrame.append(other, ignore_index=False)

    append only concatenates rows, never columns; it is shorthand for a row-wise concat.

    • other: a single DataFrame, Series, dict, or a list of them
    • ignore_index: whether to discard the original indexes

    Reference documentation:

    • pandas.concat API docs: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.concat.html
    • merging user guide: https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html
    • pandas.append API docs: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.append.html
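    Note that `DataFrame.append` was deprecated in pandas 1.4 and removed in pandas 2.0; on current versions the same row-wise combination is written with `pd.concat`. A minimal sketch:

    ```python
    import pandas as pd

    df1 = pd.DataFrame([[1, 2], [3, 4]], columns=list("AB"))
    df2 = pd.DataFrame([[5, 6], [7, 8]], columns=list("AB"))

    # old (pandas < 2.0): df1.append(df2, ignore_index=True)
    # current equivalent:
    result = pd.concat([df1, df2], ignore_index=True)
    print(result)
    ```

    The `append` examples later in this section still run on older pandas versions, but each has a direct `pd.concat` translation like the one above.
    
    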

    In [1]:

    import pandas as pd
    import warnings

    warnings.filterwarnings("ignore")

    1. Using pandas.concat to combine data

    In [2]:

    df1 = pd.DataFrame({"A": ["A0", "A1", "A2", "A3"],
                        "B": ["B0", "B1", "B2", "B3"],
                        "C": ["C0", "C1", "C2", "C3"],
                        "D": ["D0", "D1", "D2", "D3"],
                        "E": ["E0", "E1", "E2", "E3"]})
    df1

    Out[2]:

        A   B   C   D   E
    0  A0  B0  C0  D0  E0
    1  A1  B1  C1  D1  E1
    2  A2  B2  C2  D2  E2
    3  A3  B3  C3  D3  E3

    In [3]:

    df2 = pd.DataFrame({"A": ["A4", "A5", "A6", "A7"],
                        "B": ["B4", "B5", "B6", "B7"],
                        "C": ["C4", "C5", "C6", "C7"],
                        "D": ["D4", "D5", "D6", "D7"],
                        "F": ["F4", "F5", "F6", "F7"]})
    df2

    Out[3]:

        A   B   C   D   F
    0  A4  B4  C4  D4  F4
    1  A5  B5  C5  D5  F5
    2  A6  B6  C6  D6  F6
    3  A7  B7  C7  D7  F7

    1. The default concat: axis=0, join=outer, ignore_index=False

    In [4]:

    pd.concat([df1, df2])

    Out[4]:

        A   B   C   D    E    F
    0  A0  B0  C0  D0   E0  NaN
    1  A1  B1  C1  D1   E1  NaN
    2  A2  B2  C2  D2   E2  NaN
    3  A3  B3  C3  D3   E3  NaN
    0  A4  B4  C4  D4  NaN   F4
    1  A5  B5  C5  D5  NaN   F5
    2  A6  B6  C6  D6  NaN   F6
    3  A7  B7  C7  D7  NaN   F7

    2. ignore_index=True discards the original indexes

    In [5]:

    pd.concat([df1, df2], ignore_index=True)

    Out[5]:

        A   B   C   D    E    F
    0  A0  B0  C0  D0   E0  NaN
    1  A1  B1  C1  D1   E1  NaN
    2  A2  B2  C2  D2   E2  NaN
    3  A3  B3  C3  D3   E3  NaN
    4  A4  B4  C4  D4  NaN   F4
    5  A5  B5  C5  D5  NaN   F5
    6  A6  B6  C6  D6  NaN   F6
    7  A7  B7  C7  D7  NaN   F7

    3. join="inner" drops columns that do not appear on both sides

    In [6]:

    pd.concat([df1, df2], ignore_index=True, join="inner")

    Out[6]:

        A   B   C   D
    0  A0  B0  C0  D0
    1  A1  B1  C1  D1
    2  A2  B2  C2  D2
    3  A3  B3  C3  D3
    4  A4  B4  C4  D4
    5  A5  B5  C5  D5
    6  A6  B6  C6  D6
    7  A7  B7  C7  D7

    4. axis=1 effectively adds new columns

    In [7]:

    df1

    Out[7]:

        A   B   C   D   E
    0  A0  B0  C0  D0  E0
    1  A1  B1  C1  D1  E1
    2  A2  B2  C2  D2  E2
    3  A3  B3  C3  D3  E3

    A: add one Series as a new column

    In [9]:

    s1 = pd.Series(list(range(4)), name="F")
    pd.concat([df1, s1], axis=1)

    Out[9]:

        A   B   C   D   E  F
    0  A0  B0  C0  D0  E0  0
    1  A1  B1  C1  D1  E1  1
    2  A2  B2  C2  D2  E2  2
    3  A3  B3  C3  D3  E3  3

    B: add several Series at once

    In [10]:

    s2 = df1.apply(lambda x:x["A"] + "_GG", axis=1)

    In [11]:

    s2

    Out[11]:

    0    A0_GG
    1    A1_GG
    2    A2_GG
    3    A3_GG
    dtype: object

    In [12]:

    s2.name="G"

    In [13]:

    # the list may contain only Series
    pd.concat([s1, s2], axis=1)

    Out[13]:

       F      G
    0  0  A0_GG
    1  1  A1_GG
    2  2  A2_GG
    3  3  A3_GG

    In [14]:

    # DataFrames and Series may be mixed in any order
    pd.concat([s1, df1, s2], axis=1)

    Out[14]:

       F   A   B   C   D   E      G
    0  0  A0  B0  C0  D0  E0  A0_GG
    1  1  A1  B1  C1  D1  E1  A1_GG
    2  2  A2  B2  C2  D2  E2  A2_GG
    3  3  A3  B3  C3  D3  E3  A3_GG

    2. Appending rows with DataFrame.append

    In [15]:

    df1 = pd.DataFrame([[1, 2], [3, 4]], columns=list("AB"))
    df1

    Out[15]:

       A  B
    0  1  2
    1  3  4

    In [16]:

    df2 = pd.DataFrame([[5, 6], [7, 8]], columns=list("AB"))
    df2

    Out[16]:

       A  B
    0  5  6
    1  7  8

    1. Append one DataFrame to another

    In [18]:

    df1.append(df2)

    Out[18]:

       A  B
    0  1  2
    1  3  4
    0  5  6
    1  7  8

    2. Discard the original index by passing ignore_index=True

    In [19]:

    df1.append(df2, ignore_index=True)

    Out[19]:

       A  B
    0  1  2
    1  3  4
    2  5  6
    3  7  8

    3. Rows can be appended to a DataFrame one at a time

    In [21]:

    # create an empty df
    df = pd.DataFrame(columns=["A"])
    df

    Out[21]:

    A

    A: the slow way

    In [22]:

    for i in range(5):
        # note: append copies the whole DataFrame on every iteration
        # (and the key "a" does not match column "A", so a second column appears)
        df = df.append({"a": i}, ignore_index=True)
    df

    Out[22]:

         A    a
    0  NaN  0.0
    1  NaN  1.0
    2  NaN  2.0
    3  NaN  3.0
    4  NaN  4.0

    B: the faster way

    In [23]:

    # option 1: build one small DataFrame per row, then concat once
    pd.concat(
        [pd.DataFrame([i], columns=["A"]) for i in range(5)],
        ignore_index=True
    )

    Out[23]:

       A
    0  0
    1  1
    2  2
    3  3
    4  4

    In [27]:

    ss = pd.DataFrame(i for i in range(5))
    ss

    Out[27]:

       0
    0  0
    1  1
    2  2
    3  3
    4  4

    14. Splitting and merging Excel files in bulk with Pandas

    This example shows how to:

  • split one large Excel file into equal parts
  • merge many small Excel files into one, tagging each row with its source

    In [51]:

    work_dir = "D:/WinterIsComing/python/New_Wave/pandas_basic/15.excel_split_merge"
    # directory for the small files produced by the split
    splits_dir = f"{work_dir}/splits"

    In [52]:

    import os

    if not os.path.exists(splits_dir):
        os.mkdir(splits_dir)

    0. Read the source Excel into Pandas

    In [54]:

    import pandas as pd

    In [53]:

    df_source = pd.read_excel(r"D:/WinterIsComing/python/New_Wave/pandas_basic/15.excel_split_merge/crazyant_blog_articles_source.xlsx")

    In [55]:

    df_source.head()

    Out[55]:

         id  title                          tags
    0  2585  Tensorflow怎样接收变长列表特征   python,tensorflow,特征工程
    1  2583  Pandas实现数据的合并concat        pandas,python,数据分析
    2  2574  Pandas的Index索引有什么用途?     pandas,python,数据分析
    3  2564  机器学习常用数据集大全           python,机器学习
    4  2561  一个数据科学家的修炼路径          数据分析

    In [56]:

    df_source.index

    Out[56]:

    RangeIndex(start=0, stop=258, step=1)

    In [57]:

    # 258 rows, 3 columns
    df_source.shape

    Out[57]:

    (258, 3)

    In [58]:

    # df_source.shape is the tuple (258, 3),
    # so df_source.shape[0] is the row count
    total_row_count = df_source.shape[0]
    total_row_count

    Out[58]:

    258

    1. Splitting one large Excel file into equal parts

  • use df.iloc to slice the large DataFrame into several small ones
  • save each part with DataFrame.to_excel

    1.1 Compute the row count of each output file

    In [59]:

    # the people the large Excel will be split among
    user_names = ["A", "B", "C", "D", "E", "F"]

    In [60]:

    # rows assigned to each person
    split_size = total_row_count // len(user_names)
    # if there is a remainder, give everyone one extra row
    # so that every row ends up assigned
    if total_row_count % len(user_names) != 0:
        split_size += 1
    split_size

    Out[60]:

    43
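    The remainder trick above is just ceiling division, and the same value can be computed in one step. A sketch using this example's numbers (258 rows, 6 people):

    ```python
    import math

    total_row_count = 258
    n_users = 6

    # floor division plus a +1 correction when there is a remainder ...
    split_size = total_row_count // n_users
    if total_row_count % n_users != 0:
        split_size += 1

    # ... is equivalent to ceiling division
    assert split_size == math.ceil(total_row_count / n_users)
    assert split_size == -(-total_row_count // n_users)  # pure-integer form
    print(split_size)
    ```
    
    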

    1.2 Split into several DataFrames

    In [64]:

    df_subs = []
    for idx, user_name in enumerate(user_names):
        # start index for iloc
        begin = idx * split_size
        # end index for iloc
        end = begin + split_size
        # slice the df with iloc
        df_sub = df_source.iloc[begin:end]
        # collect each sub-df in a list
        df_subs.append((idx, user_name, df_sub))

    df_subs[0][2].head(5)

    Out[64]:

         id  title                          tags
    0  2585  Tensorflow怎样接收变长列表特征   python,tensorflow,特征工程
    1  2583  Pandas实现数据的合并concat        pandas,python,数据分析
    2  2574  Pandas的Index索引有什么用途?     pandas,python,数据分析
    3  2564  机器学习常用数据集大全           python,机器学习
    4  2561  一个数据科学家的修炼路径          数据分析

    In [65]:

    df_subs[1][2].head(5)

    Out[65]:

          id  title                            tags
    43  2120  Zookeeper并不保证读取的是最新数据   zookeeper
    44  2089  Mybatis源码解读-初始化过程详解      mybatis
    45  2076  怎样借助Python爬虫给宝宝起个好名字  python,爬虫
    46  2022  Mybatis源码解读-设计模式总结        mybatis,设计模式
    47  2012  打工者心态、主人公意识、个人公司品牌  程序人生

    1.3 Save each DataFrame to its own Excel file

    In [63]:

    for idx, user_name, df_sub in df_subs:
        file_name = f"{splits_dir}/spike_pandas_{idx}_{user_name}.xlsx"
        df_sub.to_excel(file_name, index=False)

    2. Merging the small Excel files into one large Excel

  • walk the directory to list the Excel files to merge
  • read each into a DataFrame and add a column marking its source
  • batch-merge the DataFrames with pd.concat
  • write the merged DataFrame out to Excel

    2.1 Walk the directory to list the Excel file names

    In [66]:

    import os

    excel_names = []
    # listdir returns every file and directory name under the given path
    for excel_name in os.listdir(splits_dir):
        excel_names.append(excel_name)
    excel_names

    Out[66]:

    ['spike_pandas_0_A.xlsx',
     'spike_pandas_1_B.xlsx',
     'spike_pandas_2_C.xlsx',
     'spike_pandas_3_D.xlsx',
     'spike_pandas_4_E.xlsx',
     'spike_pandas_5_F.xlsx']
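    `os.listdir` returns every entry in the directory, including subfolders and Excel's temporary `~$` lock files if a workbook happens to be open. When the folder may contain anything besides the split files, it is safer to filter by extension. A hedged sketch (the helper name is ours, not from the original notebook):

    ```python
    import os

    def list_excel_files(dir_path):
        """Return only real .xlsx files, sorted, skipping Excel ~$ lock files."""
        return sorted(
            name for name in os.listdir(dir_path)
            if name.endswith(".xlsx") and not name.startswith("~$")
        )
    ```

    Sorting also makes the merge order deterministic, since `os.listdir` gives no ordering guarantee.
    
    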

    2.2 Read each file into a DataFrame

    In [70]:

    df_list = []
    for excel_name in excel_names:
        # read each excel into a df
        excel_path = f"{splits_dir}/{excel_name}"
        df_split = pd.read_excel(excel_path)
        # recover the username by string replacement and slicing
        username = excel_name.replace("spike_pandas_", "").replace(".xlsx", "")[2:]
        # print(username)
        # add a username column to df_split
        df_split["username"] = username
        df_list.append(df_split)

    2.3 Merge with pd.concat

    In [71]:

    df_merged = pd.concat(df_list)

    In [72]:

    df_merged.shape

    Out[72]:

    (258, 4)

    In [74]:

    df_merged.head()

    Out[74]:

         id  title                          tags                       username
    0  2585  Tensorflow怎样接收变长列表特征   python,tensorflow,特征工程  A
    1  2583  Pandas实现数据的合并concat        pandas,python,数据分析      A
    2  2574  Pandas的Index索引有什么用途?     pandas,python,数据分析      A
    3  2564  机器学习常用数据集大全           python,机器学习            A
    4  2561  一个数据科学家的修炼路径          数据分析                  A

    In [76]:

    df_merged["username"].value_counts()

    Out[76]:

    B    43
    F    43
    D    43
    E    43
    A    43
    C    43
    Name: username, dtype: int64

    2.4 Write the merged DataFrame to Excel

    In [77]:

    df_merged.to_excel(f"{work_dir}/spike_pandas_merged.xlsx", index=False)

    15. Group-by statistics with Pandas groupby

    The SQL analogue:
    select city, max(temperature) from city_weather group by city;

    groupby: split the data into groups first, then apply aggregation or transformation functions to each group.

    This walkthrough covers:
    1. aggregation statistics per group
    2. iterating over groupby results to understand the flow
    3. a worked example: exploring weather data by group

    In [1]:

    import pandas as pd
    import numpy as np

    # show matplotlib charts inline in jupyter notebook
    %matplotlib inline

    In [4]:

    df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
                       'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
                       'C': np.random.randn(8),
                       'D': np.random.randn(8)})
    df

    Out[4]:

         A      B         C         D
    0  foo    one -0.102369  0.042233
    1  bar    one  1.552845 -0.623522
    2  foo    two  0.770077  0.205682
    3  bar  three -1.989910 -0.617111
    4  foo    two  1.230455 -0.422428
    5  bar    two -0.697516 -0.964579
    6  foo    one -0.939646 -0.414017
    7  foo  three  0.763570  0.451086

    1. Aggregation statistics per group

    1.1 groupby on one column, statistics over all data columns

    In [5]:

    df.groupby("A").sum()

    Out[5]:

                C         D
    A
    bar -1.134582 -2.205211
    foo  1.722086 -0.137444

    Note that:

  • the groupby column 'A' became the result's index
  • column B is not numeric, so the sum silently skips it

    1.2 groupby on several columns, statistics over all data columns

    In [6]:

    # index by A and B, average C and D
    df.groupby(["A", "B"]).mean()

    Out[6]:

                      C         D
    A   B
    bar one    1.552845 -0.623522
        three -1.989910 -0.617111
        two   -0.697516 -0.964579
    foo one   -0.521008 -0.185892
        three  0.763570  0.451086
        two    1.000266 -0.108373

    In [7]:

    # keep A and B as ordinary columns instead of the index
    df.groupby(["A", "B"], as_index=False).mean()

    Out[7]:

         A      B         C         D
    0  bar    one  1.552845 -0.623522
    1  bar  three -1.989910 -0.617111
    2  bar    two -0.697516 -0.964579
    3  foo    one -0.521008 -0.185892
    4  foo  three  0.763570  0.451086
    5  foo    two  1.000266 -0.108373

    1.3 Several statistics at once

    In [8]:

    df.groupby("A").agg([np.sum, np.mean, np.std])

    Out[8]:

                C                             D
              sum      mean       std       sum      mean       std
    A
    bar -1.134582 -0.378194  1.792834 -2.205211 -0.735070  0.198786
    foo  1.722086  0.344417  0.864635 -0.137444 -0.027489  0.385242

    Note: the result columns become a MultiIndex.

    1.4 Statistics for a single column

    In [10]:

    # pre-filtering the column first performs better
    df.groupby("A")["C"].agg([np.sum, np.mean, np.std])

    Out[10]:

              sum      mean       std
    A
    bar -1.134582 -0.378194  1.792834
    foo  1.722086  0.344417  0.864635

    In [9]:

    # option 2: aggregate everything, then select column C
    df.groupby("A").agg([np.sum, np.mean, np.std])["C"]

    Out[9]:

              sum      mean       std
    A
    bar -1.134582 -0.378194  1.792834
    foo  1.722086  0.344417  0.864635

    1.5 A different aggregation function per column

    In [12]:

    # pass a dict to apply a different aggregation to each column
    df.groupby("A").agg({"C": np.sum, "D": np.mean})

    Out[12]:

                C         D
    A
    bar -1.134582 -0.735070
    foo  1.722086 -0.027489
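    Since pandas 0.25 the same per-column aggregation can also be written as "named aggregation", which additionally lets you name the output columns. A self-contained sketch (the small df here is illustrative, not the random one above):

    ```python
    import pandas as pd

    df = pd.DataFrame({"A": ["foo", "bar", "foo", "bar"],
                       "C": [1.0, 2.0, 3.0, 4.0],
                       "D": [10.0, 20.0, 30.0, 40.0]})

    # output column name = (input column, aggregation function)
    stats = df.groupby("A").agg(C_sum=("C", "sum"), D_mean=("D", "mean"))
    print(stats)
    ```

    This avoids the MultiIndex columns that the list-of-functions form produces.
    
    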

    2. Iterating over groupby results to understand the flow

    A for loop can iterate over each group directly.

    2.1 Iterating a single-column grouping

    In [13]:

    g = df.groupby("A")
    g

    Out[13]:

    <pandas.core.groupby.generic.DataFrameGroupBy object at 0x0000024E95FCC320>

    In [16]:

    df

    Out[16]:

         A      B         C         D
    0  foo    one -0.102369  0.042233
    1  bar    one  1.552845 -0.623522
    2  foo    two  0.770077  0.205682
    3  bar  three -1.989910 -0.617111
    4  foo    two  1.230455 -0.422428
    5  bar    two -0.697516 -0.964579
    6  foo    one -0.939646 -0.414017
    7  foo  three  0.763570  0.451086

    In [15]:

    for name, group in g:
        print(name)
        print(group)
        print()
    # name: "bar" then "foo"
    # group: the two sub-DataFrames

    bar
         A      B         C         D
    1  bar    one  1.552845 -0.623522
    3  bar  three -1.989910 -0.617111
    5  bar    two -0.697516 -0.964579

    foo
         A      B         C         D
    0  foo    one -0.102369  0.042233
    2  foo    two  0.770077  0.205682
    4  foo    two  1.230455 -0.422428
    6  foo    one -0.939646 -0.414017
    7  foo  three  0.763570  0.451086

    *A single group's data can be fetched directly*

    In [17]:

    g.get_group("bar")

    Out[17]:

         A      B         C         D
    1  bar    one  1.552845 -0.623522
    3  bar  three -1.989910 -0.617111
    5  bar    two -0.697516 -0.964579

    2.2 Iterating a multi-column grouping

    In [20]:

    g = df.groupby(["A", "B"])
    g

    Out[20]:

    <pandas.core.groupby.generic.DataFrameGroupBy object at 0x0000024E95ED3E80>

    In [21]:

    for name, group in g:
        print(name)
        print(group)
        print()
    # the group name is now a tuple

    ('bar', 'one')
         A    B         C         D
    1  bar  one  1.552845 -0.623522

    ('bar', 'three')
         A      B        C         D
    3  bar  three -1.98991 -0.617111

    ('bar', 'two')
         A    B         C         D
    5  bar  two -0.697516 -0.964579

    ('foo', 'one')
         A    B         C         D
    0  foo  one -0.102369  0.042233
    6  foo  one -0.939646 -0.414017

    ('foo', 'three')
         A      B        C         D
    7  foo  three  0.76357  0.451086

    ('foo', 'two')
         A    B         C         D
    2  foo  two  0.770077  0.205682
    4  foo  two  1.230455 -0.422428

    As shown, name is now a tuple with one element per grouping column.

    In [22]:

    g.get_group(("foo", "one"))

    Out[22]:

         A    B         C         D
    0  foo  one -0.102369  0.042233
    6  foo  one -0.939646 -0.414017

    *You can also select columns from the grouped object, yielding a SeriesGroupBy or a sub-DataFrame*

    In [24]:

    # selecting one column gives a SeriesGroupBy
    g["C"]

    Out[24]:

    <pandas.core.groupby.generic.SeriesGroupBy object at 0x0000024E95C42828>

    In [25]:

    for name, group in g["C"]:
        print(name)
        print(group)
        print(type(group))
        print()

    ('bar', 'one')
    1    1.552845
    Name: C, dtype: float64
    <class 'pandas.core.series.Series'>

    ('bar', 'three')
    3   -1.98991
    Name: C, dtype: float64
    <class 'pandas.core.series.Series'>

    ('bar', 'two')
    5   -0.697516
    Name: C, dtype: float64
    <class 'pandas.core.series.Series'>

    ('foo', 'one')
    0   -0.102369
    6   -0.939646
    Name: C, dtype: float64
    <class 'pandas.core.series.Series'>

    ('foo', 'three')
    7    0.76357
    Name: C, dtype: float64
    <class 'pandas.core.series.Series'>

    ('foo', 'two')
    2    0.770077
    4    1.230455
    Name: C, dtype: float64
    <class 'pandas.core.series.Series'>

    In fact, every aggregation is ultimately computed on these per-group DataFrames and Series.
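    That point can be made concrete: a groupby aggregation gives the same numbers as iterating over the groups and aggregating each sub-object by hand. A sketch with a small fixed frame:

    ```python
    import pandas as pd

    df = pd.DataFrame({"A": ["foo", "bar", "foo", "bar"],
                       "C": [1.0, 2.0, 3.0, 4.0]})
    g = df.groupby("A")

    # the one-liner ...
    auto = g["C"].sum()

    # ... equals aggregating each group's sub-DataFrame manually
    manual = {name: group["C"].sum() for name, group in g}
    assert manual == {"bar": 6.0, "foo": 4.0}
    assert auto.to_dict() == manual
    ```
    
    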

    3. Example: exploring weather data by group

    In [27]:

    fpath = "./pandas-learn-code/datas/beijing_tianqi/beijing_tianqi_2018.csv"
    df = pd.read_csv(fpath)
    df.head()

    Out[27]:

              ymd bWendu yWendu   tianqi fengxiang fengli  aqi aqiInfo  aqiLevel
    0  2018-01-01     3℃    -6℃  晴~多云    东北风  1-2级   59                2
    1  2018-01-02     2℃    -5℃  阴~多云    东北风  1-2级   49                1
    2  2018-01-03     2℃    -5℃     多云      北风  1-2级   28                1
    3  2018-01-04     0℃    -8℃       阴    东北风  1-2级   28                1
    4  2018-01-05     3℃    -6℃  多云~晴    西北风  1-2级   50                1

    In [28]:

    # strip the ℃ suffix from the temperature columns
    df.loc[:, "bWendu"] = df["bWendu"].str.replace("℃", "").astype("int32")
    df.loc[:, "yWendu"] = df["yWendu"].str.replace("℃", "").astype("int32")
    df.head()

    Out[28]:

              ymd  bWendu  yWendu   tianqi fengxiang fengli  aqi aqiInfo  aqiLevel
    0  2018-01-01       3      -6  晴~多云    东北风  1-2级   59                2
    1  2018-01-02       2      -5  阴~多云    东北风  1-2级   49                1
    2  2018-01-03       2      -5     多云      北风  1-2级   28                1
    3  2018-01-04       0      -8       阴    东北风  1-2级   28                1
    4  2018-01-05       3      -6  多云~晴    西北风  1-2级   50                1

    In [29]:

    # add a month column
    df["month"] = df["ymd"].str[:7]
    df.head()

    Out[29]:

              ymd  bWendu  yWendu   tianqi fengxiang fengli  aqi aqiInfo  aqiLevel    month
    0  2018-01-01       3      -6  晴~多云    东北风  1-2级   59                2  2018-01
    1  2018-01-02       2      -5  阴~多云    东北风  1-2级   49                1  2018-01
    2  2018-01-03       2      -5     多云      北风  1-2级   28                1  2018-01
    3  2018-01-04       0      -8       阴    东北风  1-2级   28                1  2018-01
    4  2018-01-05       3      -6  多云~晴    西北风  1-2级   50                1  2018-01

    3.1 The maximum temperature of each month

    In [31]:

    data = df.groupby("month")["bWendu"].max()
    data

    Out[31]:

    month
    2018-01     7
    2018-02    12
    2018-03    27
    2018-04    30
    2018-05    35
    2018-06    38
    2018-07    37
    2018-08    36
    2018-09    31
    2018-10    25
    2018-11    18
    2018-12    10
    Name: bWendu, dtype: int32

    In [32]:

    type(data)

    Out[32]:

    pandas.core.series.Series

    In [34]:

    data.plot()

    Out[34]:

    <matplotlib.axes._subplots.AxesSubplot at 0x24e95bbe240>

    (Figure: line plot of the monthly maximum temperature)

    3.2 View each month's highest temperature, lowest temperature, and average air quality index

    In [35]:

    df.head()

    Out[35]:

           ymd         bWendu  yWendu  tianqi   fengxiang  fengli  aqi  aqiInfo  aqiLevel  month
        0  2018-01-01       3      -6  晴~多云   东北风     1-2级    59  良              2  2018-01
        1  2018-01-02       2      -5  阴~多云   东北风     1-2级    49  优              1  2018-01
        2  2018-01-03       2      -5  多云     北风       1-2级    28  优              1  2018-01
        3  2018-01-04       0      -8  阴       东北风     1-2级    28  优              1  2018-01
        4  2018-01-05       3      -6  多云~晴   西北风     1-2级    50  优              1  2018-01

    In [38]:

    group_data = df.groupby("month").agg({"bWendu": np.max, "yWendu": np.min, "aqi": np.mean})
    group_data

    Out[38]:

             bWendu  yWendu         aqi
    month
    2018-01       7     -12   60.677419
    2018-02      12     -10   78.857143
    2018-03      27      -4  130.322581
    2018-04      30       1  102.866667
    2018-05      35      10   99.064516
    2018-06      38      17   82.300000
    2018-07      37      22   72.677419
    2018-08      36      20   59.516129
    2018-09      31      11   50.433333
    2018-10      25       1   67.096774
    2018-11      18      -4  105.100000
    2018-12      10     -12   77.354839
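The dict-of-aggregations pattern can be checked on a tiny hand-made frame. A minimal sketch (the numbers below are made up, not the real weather csv); note that the string names "max"/"min"/"mean" behave the same as np.max/np.min/np.mean and avoid the extra numpy import:

```python
import pandas as pd

# Tiny stand-in for the weather data (hypothetical values, not the real csv)
df = pd.DataFrame({
    "month":  ["2018-01", "2018-01", "2018-02", "2018-02"],
    "bWendu": [3, 7, 12, 5],
    "yWendu": [-6, -12, -10, -3],
    "aqi":    [59, 49, 80, 60],
})

# One aggregation per column: max of highs, min of lows, mean of aqi
group_data = df.groupby("month").agg({"bWendu": "max", "yWendu": "min", "aqi": "mean"})
print(group_data.loc["2018-01", "bWendu"])  # 7
```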

    In [39]:

    group_data.plot()

    Out[39]:

    <matplotlib.axes._subplots.AxesSubplot at 0x24e963082b0>

    [figure: line chart of the monthly bWendu (max), yWendu (min) and aqi (mean) columns of group_data]

    16. Pandas hierarchical indexing: MultiIndex

    Why learn the hierarchical index MultiIndex?

    • Hierarchical index: multiple index levels on one axis, a way to represent higher-dimensional data;
    • It makes filtering data more convenient, and performance is better when the index is sorted;
    • The result of operations such as groupby with multiple keys has a hierarchical index, so you need to know how to use it;
    • You usually don't need to build a MultiIndex yourself (it has constructors, but they are rarely needed)
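As a quick check of the third bullet, a minimal sketch (toy values borrowed from the stock table used below) showing that a two-key groupby returns a Series whose index is a two-level MultiIndex:

```python
import pandas as pd

# Toy subset of the stock data used in this section
df = pd.DataFrame({
    "公司": ["BIDU", "BIDU", "JD", "JD"],
    "日期": ["2019-10-01", "2019-10-02", "2019-10-01", "2019-10-02"],
    "收盘": [102.00, 102.62, 28.19, 28.06],
})

# Grouping by two keys produces a two-level MultiIndex automatically
ser = df.groupby(["公司", "日期"])["收盘"].mean()
print(ser.index.nlevels)  # 2
```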

    Demo data: a few days of stock prices for four companies: Baidu, Alibaba, iQIYI and JD
    Data source: Investing.com
    https://cn.investing.com/

    Outline for this demo:
    1. MultiIndex on a Series
    2. How to filter a Series that has a multi-level index
    3. MultiIndex on a DataFrame
    4. How to filter a DataFrame that has a multi-level index

    In [7]:

    import pandas as pd
    %matplotlib inline

    In [8]:

    fpath = "./pandas-learn-code/datas/stocks/互联网公司股票.xlsx"
    stocks = pd.read_excel(fpath)

    In [10]:

    stocks.shape

    Out[10]:

    (12, 8)

    In [5]:

    stocks.head()

    Out[5]:

       日期         公司    收盘    开盘    高      低      交易量  涨跌幅
    0  2019-10-03  BIDU  104.32  102.35  104.73  101.15   2.24   0.02
    1  2019-10-02  BIDU  102.62  100.85  103.24   99.50   2.69   0.01
    2  2019-10-01  BIDU  102.00  102.80  103.26  101.00   1.78  -0.01
    3  2019-10-03  BABA  169.48  166.65  170.18  165.00  10.39   0.02
    4  2019-10-02  BABA  165.77  162.82  166.88  161.90  11.60   0.00

    In [12]:

    stocks["公司"].unique()

    Out[12]:

    array(['BIDU', 'BABA', 'IQ', 'JD'], dtype=object)

    In [14]:

    # Average closing price, grouped by company
    stocks.groupby("公司")["收盘"].mean()

    Out[14]:

    公司
    BABA    166.80
    BIDU    102.98
    IQ       15.90
    JD       28.35
    Name: 收盘, dtype: float64

    1. The hierarchical index MultiIndex on a Series

    In [16]:

    # ser is a Series with a two-level index
    ser = stocks.groupby(["公司", "日期"])["收盘"].mean()
    ser

    Out[16]:

    公司    日期
    BABA  2019-10-01    165.15
          2019-10-02    165.77
          2019-10-03    169.48
    BIDU  2019-10-01    102.00
          2019-10-02    102.62
          2019-10-03    104.32
    IQ    2019-10-01     15.92
          2019-10-02     15.72
          2019-10-03     16.06
    JD    2019-10-01     28.19
          2019-10-02     28.06
          2019-10-03     28.80
    Name: 收盘, dtype: float64

    In a multi-level index display, a blank cell means: repeat the value above it

    In [20]:

    ser.index

    Out[20]:

    MultiIndex(levels=[['BABA', 'BIDU', 'IQ', 'JD'], ['2019-10-01', '2019-10-02', '2019-10-03']],
               codes=[[0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3], [0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2]],
               names=['公司', '日期'])

    In [21]:

    # unstack turns the second index level into columns:
    # 公司 stays as the index, while 日期 becomes the columns
    ser.unstack()

    Out[21]:

    日期    2019-10-01  2019-10-02  2019-10-03
    公司
    BABA       165.15      165.77      169.48
    BIDU       102.00      102.62      104.32
    IQ          15.92       15.72       16.06
    JD          28.19       28.06       28.80
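unstack() has an inverse, stack(), which moves the columns back into the inner index level; a round trip recovers the original values. A minimal sketch with a hand-built MultiIndex (toy subset of the data above):

```python
import pandas as pd

ser = pd.Series(
    [165.15, 165.77, 102.00, 102.62],
    index=pd.MultiIndex.from_product(
        [["BABA", "BIDU"], ["2019-10-01", "2019-10-02"]],
        names=["公司", "日期"],
    ),
    name="收盘",
)

wide = ser.unstack()       # 日期 becomes the columns
round_trip = wide.stack()  # back to a two-level MultiIndex Series
print(wide.loc["BABA", "2019-10-02"])  # 165.77
```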

    In [22]:

    ser

    Out[22]:

    公司    日期
    BABA  2019-10-01    165.15
          2019-10-02    165.77
          2019-10-03    169.48
    BIDU  2019-10-01    102.00
          2019-10-02    102.62
          2019-10-03    104.32
    IQ    2019-10-01     15.92
          2019-10-02     15.72
          2019-10-03     16.06
    JD    2019-10-01     28.19
          2019-10-02     28.06
          2019-10-03     28.80
    Name: 收盘, dtype: float64

    In [24]:

    # Turn both index levels (公司, 日期) into ordinary columns
    ser.reset_index()

    Out[24]:

        公司    日期         收盘
    0   BABA  2019-10-01  165.15
    1   BABA  2019-10-02  165.77
    2   BABA  2019-10-03  169.48
    3   BIDU  2019-10-01  102.00
    4   BIDU  2019-10-02  102.62
    5   BIDU  2019-10-03  104.32
    6   IQ    2019-10-01   15.92
    7   IQ    2019-10-02   15.72
    8   IQ    2019-10-03   16.06
    9   JD    2019-10-01   28.19
    10  JD    2019-10-02   28.06
    11  JD    2019-10-03   28.80
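Unlike unstack(), which keeps 公司 as the index, reset_index() flattens everything: both index levels become ordinary columns and the row index becomes 0..N-1. A minimal sketch with hand-built data:

```python
import pandas as pd

ser = pd.Series(
    [165.15, 102.00],
    index=pd.MultiIndex.from_tuples(
        [("BABA", "2019-10-01"), ("BIDU", "2019-10-01")],
        names=["公司", "日期"],
    ),
    name="收盘",
)

flat = ser.reset_index()   # 公司 and 日期 become regular columns
print(list(flat.columns))  # ['公司', '日期', '收盘']
```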

    2. How do you filter data from a Series with a MultiIndex?

    In [25]:

    ser

    Out[25]:

    公司    日期
    BABA  2019-10-01    165.15
          2019-10-02    165.77
          2019-10-03    169.48
    BIDU  2019-10-01    102.00
          2019-10-02    102.62
          2019-10-03    104.32
    IQ    2019-10-01     15.92
          2019-10-02     15.72
          2019-10-03     16.06
    JD    2019-10-01     28.19
          2019-10-02     28.06
          2019-10-03     28.80
    Name: 收盘, dtype: float64

    In [27]:

    ser.loc["BIDU"]

    Out[27]:

    日期
    2019-10-01    102.00
    2019-10-02    102.62
    2019-10-03    104.32
    Name: 收盘, dtype: float64

    In [ ]:

    # With a multi-level index, you can filter using a tuple

    In [28]:

    ser.loc[("BIDU","2019-10-02")]

    Out[28]:

    102.62

    In [29]:

    ser.loc[:, "2019-10-02"]

    Out[29]:

    公司
    BABA    165.77
    BIDU    102.62
    IQ       15.72
    JD       28.06
    Name: 收盘, dtype: float64
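An equivalent to the ser.loc[:, "2019-10-02"] cross-section above is Series.xs, which selects by level name instead of position. A minimal sketch with hand-built data (toy subset of the stock Series):

```python
import pandas as pd

ser = pd.Series(
    [165.15, 165.77, 102.00, 102.62],
    index=pd.MultiIndex.from_product(
        [["BABA", "BIDU"], ["2019-10-01", "2019-10-02"]],
        names=["公司", "日期"],
    ),
    name="收盘",
)

# Cross-section on the 日期 level, equivalent to ser.loc[:, "2019-10-02"]
day = ser.xs("2019-10-02", level="日期")
print(day.loc["BIDU"])  # 102.62
```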

    3. The MultiIndex on a DataFrame

    In [30]:

    stocks.head()

    Out[30]:

       日期         公司    收盘    开盘    高      低      交易量  涨跌幅
    0  2019-10-03  BIDU  104.32  102.35  104.73  101.15   2.24   0.02
    1  2019-10-02  BIDU  102.62  100.85  103.24   99.50   2.69   0.01
    2  2019-10-01  BIDU  102.00  102.80  103.26  101.00   1.78  -0.01
    3  2019-10-03  BABA  169.48  166.65  170.18  165.00  10.39   0.02
    4  2019-10-02  BABA  165.77  162.82  166.88  161.90  11.60   0.00

    In [40]:

    stocks.set_index(["公司", "日期"], inplace=True)


    In [41]:

    stocks.head()

    Out[41]:

                      收盘    开盘    高      低      交易量  涨跌幅
    公司    日期
    BIDU  2019-10-03  104.32  102.35  104.73  101.15   2.24   0.02
          2019-10-02  102.62  100.85  103.24   99.50   2.69   0.01
          2019-10-01  102.00  102.80  103.26  101.00   1.78  -0.01
    BABA  2019-10-03  169.48  166.65  170.18  165.00  10.39   0.02
          2019-10-02  165.77  162.82  166.88  161.90  11.60   0.00

    In [42]:

    stocks.index

    Out[42]:

    MultiIndex(levels=[['BABA', 'BIDU', 'IQ', 'JD'], ['2019-10-01', '2019-10-02', '2019-10-03']],
               codes=[[1, 1, 1, 0, 0, 0, 2, 2, 2, 3, 3, 3], [2, 1, 0, 2, 1, 0, 2, 1, 0, 2, 1, 0]],
               names=['公司', '日期'])

    In [43]:

    stocks.sort_index(inplace=True)

    In [44]:

    stocks

    Out[44]:

                      收盘    开盘    高      低      交易量  涨跌幅
    公司    日期
    BABA  2019-10-01  165.15  168.01  168.23  163.64  14.19  -0.01
          2019-10-02  165.77  162.82  166.88  161.90  11.60   0.00
          2019-10-03  169.48  166.65  170.18  165.00  10.39   0.02
    BIDU  2019-10-01  102.00  102.80  103.26  101.00   1.78  -0.01
          2019-10-02  102.62  100.85  103.24   99.50   2.69   0.01
          2019-10-03  104.32  102.35  104.73  101.15   2.24   0.02
    IQ    2019-10-01   15.92   16.14   16.22   15.50  11.65  -0.01
          2019-10-02   15.72   15.85   15.87   15.12   8.10  -0.01
          2019-10-03   16.06   15.71   16.38   15.32  10.08   0.02
    JD    2019-10-01   28.19   28.22   28.57   27.97  10.64   0.00
          2019-10-02   28.06   28.00   28.22   27.53   9.53   0.00
          2019-10-03   28.80   28.11   28.97   27.82   8.77   0.03

    4. How do you filter a DataFrame with a MultiIndex?

    [*Key point*] When selecting data:

    • A tuple (key1, key2) selects across index levels: key1 is the first level and key2 the second, e.g. key1=JD, key2=2019-10-02
    • A list [key1, key2] gives multiple keys on the same level: key1 and key2 are sibling keys, e.g. key1=JD, key2=BIDU
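The tuple-vs-list distinction can be seen on a tiny frame (toy values): a tuple drills down through the levels to a single row, while a list keeps several sibling keys on the first level:

```python
import pandas as pd

stocks = pd.DataFrame(
    {"收盘": [165.15, 102.00, 28.19]},
    index=pd.MultiIndex.from_tuples(
        [("BABA", "2019-10-01"), ("BIDU", "2019-10-01"), ("JD", "2019-10-01")],
        names=["公司", "日期"],
    ),
)

# Tuple: one key per level -> a single row (returned as a Series)
row = stocks.loc[("JD", "2019-10-01"), :]

# List: several keys on the SAME (first) level -> both companies kept
subset = stocks.loc[["BIDU", "JD"], :]
print(len(subset))  # 2
```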

    In [45]:

    stocks.loc["BIDU"]

    Out[45]:

                收盘    开盘    高      低      交易量  涨跌幅
    日期
    2019-10-01  102.00  102.80  103.26  101.00   1.78  -0.01
    2019-10-02  102.62  100.85  103.24   99.50   2.69   0.01
    2019-10-03  104.32  102.35  104.73  101.15   2.24   0.02

    In [46]:

    # All data for BIDU on 2019-10-02
    stocks.loc[("BIDU", "2019-10-02"), :]

    Out[46]:

    收盘     102.62
    开盘     100.85
    高       103.24
    低        99.50
    交易量      2.69
    涨跌幅      0.01
    Name: (BIDU, 2019-10-02), dtype: float64

    In [48]:

    # The 开盘 (open) value for BIDU on 2019-10-02
    stocks.loc[("BIDU", "2019-10-02"), "开盘"]

    Out[48]:

    100.85

    In [50]:

    # Parallel filter: BIDU and JD are sibling keys on the same level
    stocks.loc[["BIDU", "JD"], :]

    Out[50]:

                      收盘    开盘    高      低      交易量  涨跌幅
    公司    日期
    BIDU  2019-10-01  102.00  102.80  103.26  101.00   1.78  -0.01
          2019-10-02  102.62  100.85  103.24   99.50   2.69   0.01
          2019-10-03  104.32  102.35  104.73  101.15   2.24   0.02
    JD    2019-10-01   28.19   28.22   28.57   27.97  10.64   0.00
          2019-10-02   28.06   28.00   28.22   27.53   9.53   0.00
          2019-10-03   28.80   28.11   28.97   27.82   8.77   0.03

    In [51]:

    stocks.loc[(["BIDU", "JD"], "2019-10-03"), :]

    Out[51]:

                           收盘    开盘      高      低  交易量  涨跌幅
    公司   日期
    BIDU  2019-10-03  104.32  102.35  104.73  101.15  2.24  0.02
    JD    2019-10-03   28.80   28.11   28.97   27.82  8.77  0.03

    In [52]:

    stocks.loc[(["BIDU", "JD"], "2019-10-03"), "收盘"]

    Out[52]:

    公司    日期
    BIDU  2019-10-03    104.32
    JD    2019-10-03     28.80
    Name: 收盘, dtype: float64

    In [54]:

    stocks.loc[("BIDU",["2019-10-02", "2019-10-03"]), "收盘"]

    Out[54]:

    公司    日期
    BIDU  2019-10-02    102.62
          2019-10-03    104.32
    Name: 收盘, dtype: float64

    In [55]:

    # slice(None) selects all values at that index level
    stocks.loc[(slice(None), ["2019-10-02", "2019-10-03"]), :]

    Out[55]:

                           收盘    开盘      高      低  交易量  涨跌幅
    公司   日期
    BABA  2019-10-02  165.77  162.82  166.88  161.90  11.60   0.00
          2019-10-03  169.48  166.65  170.18  165.00  10.39   0.02
    BIDU  2019-10-02  102.62  100.85  103.24   99.50   2.69   0.01
          2019-10-03  104.32  102.35  104.73  101.15   2.24   0.02
    IQ    2019-10-02   15.72   15.85   15.87   15.12   8.10  -0.01
          2019-10-03   16.06   15.71   16.38   15.32  10.08   0.02
    JD    2019-10-02   28.06   28.00   28.22   27.53   9.53   0.00
          2019-10-03   28.80   28.11   28.97   27.82   8.77   0.03
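    As an alternative to slice(None), pandas offers pd.IndexSlice, which expresses the same cross-level selection with slicing syntax. A minimal sketch on a toy frame (names and values made up):

    ```python
    import pandas as pd

    idx = pd.IndexSlice
    df = pd.DataFrame(
        {"close": range(6)},
        index=pd.MultiIndex.from_product(
            [["A", "B"], ["2019-10-01", "2019-10-02", "2019-10-03"]],
            names=["公司", "日期"],
        ),
    )

    # all companies, only the last two dates; equivalent to
    # df.loc[(slice(None), ["2019-10-02", "2019-10-03"]), :]
    sub = df.loc[idx[:, ["2019-10-02", "2019-10-03"]], :]
    ```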

    In [56]:

    # turn the MultiIndex levels back into ordinary columns
    stocks.reset_index()

    Out[56]:

        公司          日期      收盘    开盘      高      低  交易量  涨跌幅
    0   BABA  2019-10-01  165.15  168.01  168.23  163.64  14.19  -0.01
    1   BABA  2019-10-02  165.77  162.82  166.88  161.90  11.60   0.00
    2   BABA  2019-10-03  169.48  166.65  170.18  165.00  10.39   0.02
    3   BIDU  2019-10-01  102.00  102.80  103.26  101.00   1.78  -0.01
    4   BIDU  2019-10-02  102.62  100.85  103.24   99.50   2.69   0.01
    5   BIDU  2019-10-03  104.32  102.35  104.73  101.15   2.24   0.02
    6   IQ    2019-10-01   15.92   16.14   16.22   15.50  11.65  -0.01
    7   IQ    2019-10-02   15.72   15.85   15.87   15.12   8.10  -0.01
    8   IQ    2019-10-03   16.06   15.71   16.38   15.32  10.08   0.02
    9   JD    2019-10-01   28.19   28.22   28.57   27.97  10.64   0.00
    10  JD    2019-10-02   28.06   28.00   28.22   27.53   9.53   0.00
    11  JD    2019-10-03   28.80   28.11   28.97   27.82   8.77   0.03

    17. Pandas data transformation functions: map, apply, applymap

    Comparison of the data transformation functions map, apply, applymap:

  • map: only for a Series; maps each value to a new value;
  • apply: on a Series, processes each value; on a DataFrame, processes the Series along one of its axes;
  • applymap: only for a DataFrame; processes every element of the DataFrame;
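    The three can be contrasted on a tiny frame (a minimal sketch; the column names here are made up):

    ```python
    import pandas as pd

    df = pd.DataFrame({"a": [1, 2], "b": [10, 20]})

    # map: Series only, value -> value
    s1 = df["a"].map(lambda x: x * 2)

    # apply on a Series: also value -> value
    s2 = df["a"].apply(lambda x: x * 2)

    # apply on a DataFrame: the function receives one Series per column (axis=0 default)
    col_sums = df.apply(lambda col: col.sum())

    # applymap: every element of the DataFrame
    # (renamed to DataFrame.map in pandas 2.1+, where applymap is deprecated)
    try:
        doubled = df.map(lambda x: x * 2)
    except AttributeError:
        doubled = df.applymap(lambda x: x * 2)
    ```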
    1. Using map to transform the values of a Series

    Example: converting stock tickers to Chinese company names

    Both Series.map(dict) and Series.map(function) work

    In [2]:

    import pandas as pd

    stocks = pd.read_excel(r"D:\WinterIsComing\python\New_Wave\pandas_basic\pandas-learn-code\datas\stocks\互联网公司股票.xlsx")

    In [3]:

    stocks.head()

    Out[3]:

           日期    公司      收盘    开盘      高      低  交易量  涨跌幅
    0  2019-10-03  BIDU  104.32  102.35  104.73  101.15   2.24   0.02
    1  2019-10-02  BIDU  102.62  100.85  103.24   99.50   2.69   0.01
    2  2019-10-01  BIDU  102.00  102.80  103.26  101.00   1.78  -0.01
    3  2019-10-03  BABA  169.48  166.65  170.18  165.00  10.39   0.02
    4  2019-10-02  BABA  165.77  162.82  166.88  161.90  11.60   0.00

    In [7]:

    stocks["公司"].unique()

    Out[7]:

    array(['BIDU', 'BABA', 'IQ', 'JD'], dtype=object)

    In [8]:

    # mapping from stock ticker to Chinese name; note the lowercase keys
    dict_company_names = {"bidu": "百度", "baba": "阿里巴巴", "iq": "爱奇艺", "jd": "京东"}

    Method 1: Series.map(dict)

    In [9]:

    stocks["中文公司1"]=stocks["公司"].str.lower().map(dict_company_names)

    In [10]:

    stocks

    Out[10]:

            日期    公司      收盘    开盘      高      低  交易量  涨跌幅  中文公司1
    0   2019-10-03  BIDU  104.32  102.35  104.73  101.15   2.24   0.02    百度
    1   2019-10-02  BIDU  102.62  100.85  103.24   99.50   2.69   0.01    百度
    2   2019-10-01  BIDU  102.00  102.80  103.26  101.00   1.78  -0.01    百度
    3   2019-10-03  BABA  169.48  166.65  170.18  165.00  10.39   0.02  阿里巴巴
    4   2019-10-02  BABA  165.77  162.82  166.88  161.90  11.60   0.00  阿里巴巴
    5   2019-10-01  BABA  165.15  168.01  168.23  163.64  14.19  -0.01  阿里巴巴
    6   2019-10-03  IQ     16.06   15.71   16.38   15.32  10.08   0.02   爱奇艺
    7   2019-10-02  IQ     15.72   15.85   15.87   15.12   8.10  -0.01   爱奇艺
    8   2019-10-01  IQ     15.92   16.14   16.22   15.50  11.65  -0.01   爱奇艺
    9   2019-10-03  JD     28.80   28.11   28.97   27.82   8.77   0.03    京东
    10  2019-10-02  JD     28.06   28.00   28.22   27.53   9.53   0.00    京东
    11  2019-10-01  JD     28.19   28.22   28.57   27.97  10.64   0.00    京东

    Method 2: Series.map(function)

    The function receives each value of the Series

    In [13]:

    # x in the lambda is each value of the Series (i.e. each value of stocks["公司"])
    stocks["公司中文2"] = stocks["公司"].map(lambda x: dict_company_names[x.lower()])

    In [12]:

    stocks.head()

    Out[12]:

            日期    公司      收盘    开盘      高      低  交易量  涨跌幅  中文公司1  公司中文2
    0  2019-10-03  BIDU  104.32  102.35  104.73  101.15   2.24   0.02    百度    百度
    1  2019-10-02  BIDU  102.62  100.85  103.24   99.50   2.69   0.01    百度    百度
    2  2019-10-01  BIDU  102.00  102.80  103.26  101.00   1.78  -0.01    百度    百度
    3  2019-10-03  BABA  169.48  166.65  170.18  165.00  10.39   0.02  阿里巴巴  阿里巴巴
    4  2019-10-02  BABA  165.77  162.82  166.88  161.90  11.60   0.00  阿里巴巴  阿里巴巴

    2. Using apply on a Series or a DataFrame

    • Series.apply(function): the function receives each value
    • DataFrame.apply(function): the function receives a Series

    Series.apply(function)

    The function receives each value of the Series

    In [14]:

    stocks["中文公司3"]=stocks["公司"].apply(lambda x : dict_company_names[x.lower()])

    In [16]:

    stocks.head()

    Out[16]:

            日期    公司      收盘    开盘      高      低  交易量  涨跌幅  中文公司1  公司中文2  中文公司3
    0  2019-10-03  BIDU  104.32  102.35  104.73  101.15   2.24   0.02    百度    百度    百度
    1  2019-10-02  BIDU  102.62  100.85  103.24   99.50   2.69   0.01    百度    百度    百度
    2  2019-10-01  BIDU  102.00  102.80  103.26  101.00   1.78  -0.01    百度    百度    百度
    3  2019-10-03  BABA  169.48  166.65  170.18  165.00  10.39   0.02  阿里巴巴  阿里巴巴  阿里巴巴
    4  2019-10-02  BABA  165.77  162.82  166.88  161.90  11.60   0.00  阿里巴巴  阿里巴巴  阿里巴巴

    DataFrame.apply(function)

    The function receives the Series along the chosen axis

    In [18]:

    stocks["中文公司4"]=stocks.apply(lambda x: dict_company_names[x["公司"].lower()], axis=1)

    In [19]:

    stocks["公司"]

    Out[19]:

    0     BIDU
    1     BIDU
    2     BIDU
    3     BABA
    4     BABA
    5     BABA
    6       IQ
    7       IQ
    8       IQ
    9       JD
    10      JD
    11      JD
    Name: 公司, dtype: object

    Note about this code:
    1. apply is called on the stocks DataFrame;
    2. The x in the lambda is a Series; because axis=1 was given, the Series' keys are the column names, so x["公司"] retrieves that row's company.

    In [20]:

    stocks.head()

    Out[20]:

            日期    公司      收盘    开盘      高      低  交易量  涨跌幅  中文公司1  公司中文2  中文公司3  中文公司4
    0  2019-10-03  BIDU  104.32  102.35  104.73  101.15   2.24   0.02    百度    百度    百度    百度
    1  2019-10-02  BIDU  102.62  100.85  103.24   99.50   2.69   0.01    百度    百度    百度    百度
    2  2019-10-01  BIDU  102.00  102.80  103.26  101.00   1.78  -0.01    百度    百度    百度    百度
    3  2019-10-03  BABA  169.48  166.65  170.18  165.00  10.39   0.02  阿里巴巴  阿里巴巴  阿里巴巴  阿里巴巴
    4  2019-10-02  BABA  165.77  162.82  166.88  161.90  11.60   0.00  阿里巴巴  阿里巴巴  阿里巴巴  阿里巴巴

    3. Using applymap to transform every value of a DataFrame

    In [21]:

    sub_df = stocks[["收盘","开盘","高","低","交易量"]]

    In [22]:

    sub_df

    Out[22]:

            收盘    开盘      高      低  交易量
    0   104.32  102.35  104.73  101.15   2.24
    1   102.62  100.85  103.24   99.50   2.69
    2   102.00  102.80  103.26  101.00   1.78
    3   169.48  166.65  170.18  165.00  10.39
    4   165.77  162.82  166.88  161.90  11.60
    5   165.15  168.01  168.23  163.64  14.19
    6    16.06   15.71   16.38   15.32  10.08
    7    15.72   15.85   15.87   15.12   8.10
    8    15.92   16.14   16.22   15.50  11.65
    9    28.80   28.11   28.97   27.82   8.77
    10   28.06   28.00   28.22   27.53   9.53
    11   28.19   28.22   28.57   27.97  10.64

    In [23]:

    # truncate the numbers to integers, applied to every element of the table
    sub_df.applymap(lambda x: int(x))

    Out[23]:

        收盘  开盘   高   低  交易量
    0   104  102  104  101    2
    1   102  100  103   99    2
    2   102  102  103  101    1
    3   169  166  170  165   10
    4   165  162  166  161   11
    5   165  168  168  163   14
    6    16   15   16   15   10
    7    15   15   15   15    8
    8    15   16   16   15   11
    9    28   28   28   27    8
    10   28   28   28   27    9
    11   28   28   28   27   10
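    For this particular conversion, a vectorized astype is the more idiomatic route; applymap is for per-element logic that has no built-in equivalent. A sketch (the two-column frame below is a made-up stand-in for sub_df):

    ```python
    import pandas as pd

    # hypothetical slice standing in for sub_df above
    sub = pd.DataFrame({"收盘": [104.32, 102.62], "交易量": [2.24, 2.69]})

    # vectorized truncation to integers, same result as applymap(lambda x: int(x))
    ints = sub.astype("int64")
    ```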

    In [25]:

    # write the integer values back into these columns of the original df
    stocks.loc[:, ["收盘", "开盘", "高", "低", "交易量"]] = sub_df.applymap(lambda x: int(x))

    In [26]:

    stocks.head()

    Out[26]:

            日期    公司  收盘  开盘   高   低  交易量  涨跌幅  中文公司1  公司中文2  中文公司3  中文公司4
    0  2019-10-03  BIDU  104  102  104  101    2   0.02    百度    百度    百度    百度
    1  2019-10-02  BIDU  102  100  103   99    2   0.01    百度    百度    百度    百度
    2  2019-10-01  BIDU  102  102  103  101    1  -0.01    百度    百度    百度    百度
    3  2019-10-03  BABA  169  166  170  165   10   0.02  阿里巴巴  阿里巴巴  阿里巴巴  阿里巴巴
    4  2019-10-02  BABA  165  162  166  161   11   0.00  阿里巴巴  阿里巴巴  阿里巴巴  阿里巴巴

    18. How do you apply a function to each group in Pandas?

    Key idea: Pandas GroupBy follows the split-apply-combine pattern

    (figure: the pandas split-apply-combine diagram)

    Here split is pandas' groupby; we implement the apply function ourselves, and pandas then combines the results returned by apply into the final output

    GroupBy.apply(function)

    • The first argument of function is a DataFrame (one group's data)
    • function may return a DataFrame, a Series, a single value, or even something entirely unrelated to the input DataFrame
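    These return shapes can be sketched on a toy frame (group and column names below are made up):

    ```python
    import pandas as pd

    df = pd.DataFrame({"g": ["a", "a", "b"], "v": [1, 2, 30]})

    # returning a scalar per group: pandas combines the results into a Series
    maxes = df.groupby("g")["v"].apply(lambda s: s.max())

    # returning a DataFrame per group: pandas combines the results into a DataFrame
    tops = df.groupby("g")[["v"]].apply(lambda d: d[d["v"] == d["v"].max()])
    ```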

    This demo covers:

  • How to normalize a numeric column within each group?
  • How to take the top-N rows of each group?
    Example 1: How to normalize a numeric column within each group?

    Normalization maps numeric columns with different ranges into the [0, 1] interval:

    • It makes columns easier to compare, e.g. a price field spanning hundreds to thousands vs. a growth field spanning 0 to 100
    • Machine learning models train faster and perform better on normalized data

    The (min-max) normalization formula:

    x_norm = (x - x_min) / (x_max - x_min)

    Demo: normalizing each user's movie ratings

    Users rate on different scales (optimists rate high, pessimists rate low), so we normalize per user

    In [1]:

    import pandas as pd

    In [7]:

    ratings = pd.read_csv(
        "./pandas-learn-code/datas/movielens-1m/ratings.dat",
        sep="::",
        engine="python",
        names="UserID::MovieID::Rating::Timestamp".split("::")
    )

    In [8]:

    ratings.head()

    Out[8]:

       UserID  MovieID  Rating  Timestamp
    0       1     1193       5  978300760
    1       1      661       3  978302109
    2       1      914       3  978301968
    3       1     3408       4  978300275
    4       1     2355       5  978824291

    In [10]:

    # group by UserID, then normalize Rating within each group
    def ratings_norm(df):
        # the argument is each user's sub-DataFrame (grouped by UserID)
        max_value = df["Rating"].max()
        min_value = df["Rating"].min()
        df["Rating_norm"] = df["Rating"].apply(
            lambda x: (x - min_value) / (max_value - min_value))
        return df

    # group by user and apply the function: this adds a normalized Rating_norm column
    ratings = ratings.groupby("UserID").apply(ratings_norm)

    In [12]:

    ratings["Rating"]

    . . .

    In [16]:

    type(ratings)

    Out[16]:

    pandas.core.frame.DataFrame

    In [17]:

    ratings[ratings["UserID"]==1].head()

    Out[17]:

       UserID  MovieID  Rating  Timestamp  Rating_norm
    0       1     1193       5  978300760          1.0
    1       1      661       3  978302109          0.0
    2       1      914       3  978301968          0.0
    3       1     3408       4  978300275          0.5
    4       1     2355       5  978824291          1.0

    Example 2: How to take the top-N rows of each group

    Get the two hottest days of each month in 2018

    In [18]:

    fpath = "./pandas-learn-code/datas/beijing_tianqi/beijing_tianqi_2018.csv"
    df = pd.read_csv(fpath)

    In [19]:

    df.head()

    Out[19]:

              ymd bWendu yWendu   tianqi fengxiang fengli  aqi aqiInfo  aqiLevel
    0  2018-01-01     3℃    -6℃  晴~多云    东北风   1-2级   59      良         2
    1  2018-01-02     2℃    -5℃  阴~多云    东北风   1-2级   49      优         1
    2  2018-01-03     2℃    -5℃    多云      北风   1-2级   28      优         1
    3  2018-01-04     0℃    -8℃      阴    东北风   1-2级   28      优         1
    4  2018-01-05     3℃    -6℃  多云~晴    西北风   1-2级   50      优         1

    In [21]:

    # strip the trailing ℃ from the temperature columns
    df.loc[:, "bWendu"] = df["bWendu"].str.replace("℃", "").astype("int32")
    df.loc[:, "yWendu"] = df["yWendu"].str.replace("℃", "").astype("int32")

    In [22]:

    # add a month column
    df["month"] = df["ymd"].str[0:7]
    df.head()

    Out[22]:

              ymd  bWendu  yWendu   tianqi fengxiang fengli  aqi aqiInfo  aqiLevel    month
    0  2018-01-01       3      -6  晴~多云    东北风   1-2级   59      良         2  2018-01
    1  2018-01-02       2      -5  阴~多云    东北风   1-2级   49      优         1  2018-01
    2  2018-01-03       2      -5    多云      北风   1-2级   28      优         1  2018-01
    3  2018-01-04       0      -8      阴    东北风   1-2级   28      优         1  2018-01
    4  2018-01-05       3      -6  多云~晴    西北风   1-2级   50      优         1  2018-01

    In [24]:

    def getWenduTopN(df, topn):
        # this df is one month's group
        return df.sort_values(by="bWendu")[["ymd", "bWendu"]][-topn:]

    df.groupby("month").apply(getWenduTopN, topn=2).head()

    Out[24]:

                         ymd  bWendu
    month
    2018-01 13    2018-01-14       6
            18    2018-01-19       7
    2018-02 53    2018-02-23      10
            56    2018-02-26      12
    2018-03 86    2018-03-28      25
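    The same top-N pattern can be written more compactly with nlargest inside apply. A sketch on made-up data whose column names mirror the weather example:

    ```python
    import pandas as pd

    df = pd.DataFrame({
        "month": ["2018-01", "2018-01", "2018-01", "2018-02", "2018-02"],
        "bWendu": [3, 7, 6, 10, 12],
    })

    # the two hottest rows per month, ordered hottest first within each group
    top2 = (
        df.groupby("month")[["bWendu"]]
          .apply(lambda d: d.nlargest(2, "bWendu"))
    )
    ```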

    In [25]:

    df[["ymd","bWendu"]]

    . . .

    As we can see, the DataFrame returned by groupby's apply can be completely different from the original DataFrame
