Published: 2025-12-09 17:53:46
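pandas first needs to read tabular data before that data can be analyzed.
| Data type | Description | pandas reader |
|---|---|---|
| csv, tsv, txt | Plain-text files delimited by commas or tabs | pd.read_csv |
| excel | Microsoft xls or xlsx files | pd.read_excel |
| mysql | Relational database tables | pd.read_sql |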
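The readers listed in the table can be tried without any files on disk. The following is a minimal sketch, assuming two small in-memory strings (`csv_text` and `tsv_text`, both made up for illustration) stand in for a comma-separated and a tab-separated file.

```python
import io
import pandas as pd

# Hypothetical in-memory stand-ins for a comma-separated and a tab-separated file.
csv_text = "userId,movieId,rating\n1,1,4.0\n1,3,4.0\n"
tsv_text = "2019-09-10\t139\t92\n2019-09-09\t185\t153\n"

# The CSV text has a header row, so the defaults are enough.
ratings_demo = pd.read_csv(io.StringIO(csv_text))

# The TSV text has no header row: pass header=None and name the columns explicitly.
pvuv_demo = pd.read_csv(
    io.StringIO(tsv_text), sep="\t", header=None, names=["pdate", "pv", "uv"]
)

print(ratings_demo.shape)
print(list(pvuv_demo.columns))
```

The same `sep`, `header` and `names` arguments apply when a file path is passed instead of a `StringIO` object.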
In [1]:
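import pandas as pd
In [2]:
fpath = "./pandas-learn-code/datas/ml-latest-small/ratings.csv"
In [3]:
# Read the data with pd.read_csv
ratings = pd.read_csv(fpath)
In [4]:
# View the first few rows
ratings.head()
Out[4]:
| | userId | movieId | rating | timestamp |
|---|---|---|---|---|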
| 0 | 1 | 1 | 4.0 | 964982703 |
| 1 | 1 | 3 | 4.0 | 964981247 |
| 2 | 1 | 6 | 4.0 | 964982224 |
| 3 | 1 | 47 | 5.0 | 964983815 |
| 4 | 1 | 50 | 5.0 | 964982931 |
In [5]:
# View the shape of the data; returns (number of rows, number of columns)
ratings.shape
Out[5]:
(100836, 4)
In [6]:
# View the list of column names
ratings.columns
Out[6]:
Index(['userId', 'movieId', 'rating', 'timestamp'], dtype='object')
In [7]:
# View the index
ratings.index
Out[7]:
RangeIndex(start=0, stop=100836, step=1)
In [9]:
# View the data type of each column
ratings.dtypes
Out[9]:
userId         int64
movieId        int64
rating       float64
timestamp      int64
dtype: object
In [10]:
fpath = "./pandas-learn-code/datas/crazyant/access_pvuv.txt"
In [11]:
pvuv = pd.read_csv(fpath, sep="\t", header=None, names=["pdate","pv","uv"])
In [13]:
pvuv.head()
Out[13]:
| | pdate | pv | uv |
|---|---|---|---|
| 0 | 2019-09-10 | 139 | 92 |
| 1 | 2019-09-09 | 185 | 153 |
| 2 | 2019-09-08 | 123 | 59 |
| 3 | 2019-09-07 | 65 | 40 |
| 4 | 2019-09-06 | 157 | 98 |
In [18]:
fpath = "./pandas-learn-code/datas/crazyant/access_pvuv.xlsx"
pvuv = pd.read_excel(fpath)
In [19]:
pvuv
Out[19]:
| | pdate | pv | uv |
|---|---|---|---|
| 0 | 2019-09-10 | 139 | 92 |
| 1 | 2019-09-09 | 185 | 153 |
| 2 | 2019-09-08 | 123 | 59 |
| 3 | 2019-09-07 | 65 | 40 |
| 4 | 2019-09-06 | 157 | 98 |
| 5 | 2019-09-05 | 205 | 151 |
| 6 | 2019-09-04 | 196 | 167 |
| 7 | 2019-09-03 | 216 | 176 |
| 8 | 2019-09-02 | 227 | 148 |
| 9 | 2019-09-01 | 105 | 61 |
In [36]:
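import pymysql
conn = pymysql.connect(
    host="127.0.0.1",
    user="root",
    password="123456",
    database="test",
    charset="utf8"
)
In [41]:
fpath = "./pandas-learn-code/datas/crazyant/test_crazyant_pvuv.sql"
mysql_page = pd.read_sql("select * from crazyant_pvuv", con=conn)
In [42]:
# Note: this cell displays pvuv, the DataFrame loaded from Excel above;
# the result of the SQL query is stored in mysql_page
pvuv
Out[42]:
| | pdate | pv | uv |
|---|---|---|---|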
| 0 | 2019-09-10 | 139 | 92 |
| 1 | 2019-09-09 | 185 | 153 |
| 2 | 2019-09-08 | 123 | 59 |
| 3 | 2019-09-07 | 65 | 40 |
| 4 | 2019-09-06 | 157 | 98 |
| 5 | 2019-09-05 | 205 | 151 |
| 6 | 2019-09-04 | 196 | 167 |
| 7 | 2019-09-03 | 216 | 176 |
| 8 | 2019-09-02 | 227 | 148 |
| 9 | 2019-09-01 | 105 | 61 |
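The `pd.read_sql` call above needs a running MySQL server. As a self-contained sketch of the same idea, the example below swaps in an in-memory SQLite database from Python's standard library. The substitution is an assumption made purely for illustration, as are the names `conn_demo` and `sql_demo`.

```python
import sqlite3
import pandas as pd

# In-memory SQLite database used as a stand-in for the MySQL server (assumption).
conn_demo = sqlite3.connect(":memory:")
conn_demo.execute("CREATE TABLE crazyant_pvuv (pdate TEXT, pv INTEGER, uv INTEGER)")
conn_demo.executemany(
    "INSERT INTO crazyant_pvuv VALUES (?, ?, ?)",
    [("2019-09-10", 139, 92), ("2019-09-09", 185, 153)],
)
conn_demo.commit()

# The query result comes back as a DataFrame, one column per selected field.
sql_demo = pd.read_sql("select * from crazyant_pvuv", con=conn_demo)
conn_demo.close()

print(sql_demo)
```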
In [1]:
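import pandas as pd
import numpy as np
A Series is a one-dimensional, array-like object. It consists of a sequence of values (which may be of different data types) and an associated set of data labels, i.e. its index.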
In [3]:
s1 = pd.Series([1,'a',5.2,7])
In [5]:
# The index is on the left, the data on the right
s1.head()
Out[5]:
0      1
1      a
2    5.2
3      7
dtype: object
In [6]:
# Get the index
s1.index
Out[6]:
RangeIndex(start=0, stop=4, step=1)
In [7]:
# Get the data
s1.values
Out[7]:
array([1, 'a', 5.2, 7], dtype=object)
In [8]:
s2 = pd.Series([1,'a',5.2,7], index=['a','b','c','d'])
In [9]:
s2
Out[9]:
a      1
b      a
c    5.2
d      7
dtype: object
In [10]:
s2.index
Out[10]:
Index(['a', 'b', 'c', 'd'], dtype='object')
In [11]:
sdata = {'Ohio':35000, 'Texas':72000, 'Oregon':16000, 'Utah':5000}
In [13]:
s3 = pd.Series(sdata)
In [14]:
# The dict keys become the index of the Series
s3
Out[14]:
Ohio      35000
Texas     72000
Oregon    16000
Utah       5000
dtype: int64
Querying a Series works much like looking up a Python dict.
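The dict-like behaviour of a Series can be sketched as follows. The variable names (`s_demo`, `single`, `several`, `has_texas`, `missing`) are illustrative only.

```python
import pandas as pd

sdata = {'Ohio': 35000, 'Texas': 72000}
s_demo = pd.Series(sdata)

single = s_demo['Ohio']               # one label returns a scalar
several = s_demo[['Ohio', 'Texas']]   # a list of labels returns a Series
has_texas = 'Texas' in s_demo         # membership tests the index labels, like dict keys
missing = s_demo.get('Utah', 0)       # .get returns a default instead of raising KeyError

print(single, has_texas, missing)
```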
In [15]:
s2
Out[15]:
a      1
b      a
c    5.2
d      7
dtype: object
In [20]:
s2['a']
Out[20]:
1
In [21]:
# Querying a single value returns that value in its own data type
type(s2['a'])
Out[21]:
int
In [18]:
# Query several values at once
s2[['a','b','c']]
Out[18]:
a      1
b      a
c    5.2
dtype: object
In [22]:
# Querying several values still returns a Series
type(s2[['a','b','c']])
Out[22]:
pandas.core.series.Series
A DataFrame is a tabular data structure.
In [24]:
data = {
    'state':['Ohio','Ohio','Ohio','Nevada','Nevada'],
    'year':[2000,2001,2002,2003,2004],
    'pop':[1.5,1.7,3.6,2.4,2.9]
}
df = pd.DataFrame(data)
In [25]:
df
Out[25]:
| | state | year | pop |
|---|---|---|---|
| 0 | Ohio | 2000 | 1.5 |
| 1 | Ohio | 2001 | 1.7 |
| 2 | Ohio | 2002 | 3.6 |
| 3 | Nevada | 2003 | 2.4 |
| 4 | Nevada | 2004 | 2.9 |
In [26]:
df.dtypes
Out[26]:
state     object
year       int64
pop      float64
dtype: object
In [27]:
df.columns
Out[27]:
Index(['state', 'year', 'pop'], dtype='object')
In [28]:
df.index
Out[28]:
RangeIndex(start=0, stop=5, step=1)
In [29]:
df
Out[29]:
| | state | year | pop |
|---|---|---|---|
| 0 | Ohio | 2000 | 1.5 |
| 1 | Ohio | 2001 | 1.7 |
| 2 | Ohio | 2002 | 3.6 |
| 3 | Nevada | 2003 | 2.4 |
| 4 | Nevada | 2004 | 2.9 |
In [30]:
df['year']
Out[30]:
0    2000
1    2001
2    2002
3    2003
4    2004
Name: year, dtype: int64
In [35]:
# The result is a Series
type(df['year'])
Out[35]:
pandas.core.series.Series
In [33]:
df[['year', 'pop']]
Out[33]:
| | year | pop |
|---|---|---|
| 0 | 2000 | 1.5 |
| 1 | 2001 | 1.7 |
| 2 | 2002 | 3.6 |
| 3 | 2003 | 2.4 |
| 4 | 2004 | 2.9 |
In [34]:
# The result is a DataFrame
type(df[['year','pop']])
Out[34]:
pandas.core.frame.DataFrame
In [39]:
df.loc[0]
Out[39]:
state    Ohio
year     2000
pop       1.5
Name: 0, dtype: object
In [40]:
type(df.loc[0])
Out[40]:
pandas.core.series.Series
In [41]:
# A .loc slice on a DataFrame includes the end label
df.loc[0:3]
Out[41]:
| | state | year | pop |
|---|---|---|---|
| 0 | Ohio | 2000 | 1.5 |
| 1 | Ohio | 2001 | 1.7 |
| 2 | Ohio | 2002 | 3.6 |
| 3 | Nevada | 2003 | 2.4 |
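The inclusive end of a `.loc` slice differs from ordinary Python slicing. A minimal sketch of the contrast with position-based `.iloc`, using a toy frame (`df_demo` is an illustrative name):

```python
import pandas as pd

df_demo = pd.DataFrame({
    'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
    'year': [2000, 2001, 2002, 2003, 2004],
})

by_label = df_demo.loc[0:3]      # label slice: the end label 3 is included
by_position = df_demo.iloc[0:3]  # position slice: the end position 3 is excluded

print(len(by_label), len(by_position))  # prints "4 3"
```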
In [42]:
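type(df.loc[0:3])
Out[42]:
pandas.core.frame.DataFrame
The .loc method can both query and overwrite data, so it is the recommended approach.
In [3]:
import pandas as pd
The data is the weather forecast for Beijing covering all of 2018.
In [4]:
df = pd.read_csv("./pandas-learn-code/datas/beijing_tianqi/beijing_tianqi_2018.csv")
In [5]:
df.head()
Out[5]:
| | ymd | bWendu | yWendu | tianqi | fengxiang | fengli | aqi | aqiInfo | aqiLevel |
|---|---|---|---|---|---|---|---|---|---|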
| 0 | 2018-01-01 | 3℃ | -6℃ | 晴~多云 | 东北风 | 1-2级 | 59 | 良 | 2 |
| 1 | 2018-01-02 | 2℃ | -5℃ | 阴~多云 | 东北风 | 1-2级 | 49 | 优 | 1 |
| 2 | 2018-01-03 | 2℃ | -5℃ | 多云 | 北风 | 1-2级 | 28 | 优 | 1 |
| 3 | 2018-01-04 | 0℃ | -8℃ | 阴 | 东北风 | 1-2级 | 28 | 优 | 1 |
| 4 | 2018-01-05 | 3℃ | -6℃ | 多云~晴 | 西北风 | 1-2级 | 50 | 优 | 1 |
In [6]:
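# Set the date as the index so rows can be filtered by date
df.set_index('ymd', inplace=True)
In [7]:
df.head()
Out[7]:
| ymd | bWendu | yWendu | tianqi | fengxiang | fengli | aqi | aqiInfo | aqiLevel |
|---|---|---|---|---|---|---|---|---|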
| 2018-01-01 | 3℃ | -6℃ | 晴~多云 | 东北风 | 1-2级 | 59 | 良 | 2 |
| 2018-01-02 | 2℃ | -5℃ | 阴~多云 | 东北风 | 1-2级 | 49 | 优 | 1 |
| 2018-01-03 | 2℃ | -5℃ | 多云 | 北风 | 1-2级 | 28 | 优 | 1 |
| 2018-01-04 | 0℃ | -8℃ | 阴 | 东北风 | 1-2级 | 28 | 优 | 1 |
| 2018-01-05 | 3℃ | -6℃ | 多云~晴 | 西北风 | 1-2级 | 50 | 优 | 1 |
In [8]:
# Time series are covered in a later lesson; here the dates are handled as strings
df.index
Out[8]:
Index(['2018-01-01', '2018-01-02', '2018-01-03', '2018-01-04', '2018-01-05',
       '2018-01-06', '2018-01-07', '2018-01-08', '2018-01-09', '2018-01-10',
       ...
       '2018-12-22', '2018-12-23', '2018-12-24', '2018-12-25', '2018-12-26',
       '2018-12-27', '2018-12-28', '2018-12-29', '2018-12-30', '2018-12-31'],
      dtype='object', name='ymd', length=365)
In [9]:
# Strip the ℃ suffix from the temperatures
# df.loc[:] selects all rows
df.loc[:, "bWendu"] = df["bWendu"].str.replace("℃","").astype('int32')
df.loc[:, "yWendu"] = df["yWendu"].str.replace("℃","").astype('int32')
In [10]:
# bWendu and yWendu are now of int type
df.dtypes
Out[10]:
bWendu        int32
yWendu        int32
tianqi       object
fengxiang    object
fengli       object
aqi           int64
aqiInfo      object
aqiLevel      int64
dtype: object
Both the row selector and the column selector can be a single value, which gives an exact match.
In [11]:
# Returns a single value
df.loc['2018-01-03','bWendu']
Out[11]:
2
In [12]:
# Returns a Series
df.loc['2018-01-03',['bWendu', 'yWendu']]
Out[12]:
bWendu     2
yWendu    -5
Name: 2018-01-03, dtype: object
In [13]:
# Returns a Series
df.loc[['2018-01-03','2018-01-04','2018-01-05'], 'bWendu']
Out[13]:
ymd
2018-01-03    2
2018-01-04    0
2018-01-05    3
Name: bWendu, dtype: int32
In [14]:
# Returns a DataFrame
df.loc[['2018-01-03','2018-01-04','2018-01-05'], ['bWendu','yWendu']]
Out[14]:
| ymd | bWendu | yWendu |
|---|---|---|
| 2018-01-03 | 2 | -5 |
| 2018-01-04 | 0 | -8 |
| 2018-01-05 | 3 | -6 |
Note: a range includes both its start and its end.
In [15]:
# Row index by range
df.loc['2018-01-03':'2018-01-05', 'bWendu']
Out[15]:
ymd
2018-01-03    2
2018-01-04    0
2018-01-05    3
Name: bWendu, dtype: int32
In [16]:
# Column index by range
df.loc['2018-01-03','bWendu':'fengxiang']
Out[16]:
bWendu         2
yWendu        -5
tianqi        多云
fengxiang     北风
Name: 2018-01-03, dtype: object
In [17]:
# Query both rows and columns by range
df.loc['2018-01-03':'2018-01-05','bWendu':'fengxiang']
Out[17]:
| ymd | bWendu | yWendu | tianqi | fengxiang |
|---|---|---|---|---|
| 2018-01-03 | 2 | -5 | 多云 | 北风 |
| 2018-01-04 | 0 | -8 | 阴 | 东北风 |
| 2018-01-05 | 3 | -6 | 多云~晴 | 西北风 |
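The length of the boolean list must equal the number of rows or the number of columns.
In [23]:
df.loc[df["yWendu"]<-10,:]
Out[23]:
| ymd | bWendu | yWendu | tianqi | fengxiang | fengli | aqi | aqiInfo | aqiLevel |
|---|---|---|---|---|---|---|---|---|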
| 2018-01-23 | -4 | -12 | 晴 | 西北风 | 3-4级 | 31 | 优 | 1 |
| 2018-01-24 | -4 | -11 | 晴 | 西南风 | 1-2级 | 34 | 优 | 1 |
| 2018-01-25 | -3 | -11 | 多云 | 东北风 | 1-2级 | 27 | 优 | 1 |
| 2018-12-26 | -2 | -11 | 晴~多云 | 东北风 | 2级 | 26 | 优 | 1 |
| 2018-12-27 | -5 | -12 | 多云~晴 | 西北风 | 3级 | 48 | 优 | 1 |
| 2018-12-28 | -3 | -11 | 晴 | 西北风 | 3级 | 40 | 优 | 1 |
| 2018-12-29 | -3 | -12 | 晴 | 西北风 | 2级 | 29 | 优 | 1 |
| 2018-12-30 | -2 | -11 | 晴~多云 | 东北风 | 1级 | 31 | 优 | 1 |
In [24]:
df["yWendu"]<-10
Out[24]:
ymd
2018-01-01    False
2018-01-02    False
2018-01-03    False
2018-01-04    False
2018-01-05    False
2018-01-06    False
2018-01-07    False
2018-01-08    False
2018-01-09    False
2018-01-10    False
2018-01-11    False
2018-01-12    False
2018-01-13    False
2018-01-14    False
2018-01-15    False
2018-01-16    False
2018-01-17    False
2018-01-18    False
2018-01-19    False
2018-01-20    False
2018-01-21    False
2018-01-22    False
2018-01-23     True
2018-01-24     True
2018-01-25     True
2018-01-26    False
2018-01-27    False
2018-01-28    False
2018-01-29    False
2018-01-30    False
              ...
2018-12-02    False
2018-12-03    False
2018-12-04    False
2018-12-05    False
2018-12-06    False
2018-12-07    False
2018-12-08    False
2018-12-09    False
2018-12-10    False
2018-12-11    False
2018-12-12    False
2018-12-13    False
2018-12-14    False
2018-12-15    False
2018-12-16    False
2018-12-17    False
2018-12-18    False
2018-12-19    False
2018-12-20    False
2018-12-21    False
2018-12-22    False
2018-12-23    False
2018-12-24    False
2018-12-25    False
2018-12-26     True
2018-12-27     True
2018-12-28     True
2018-12-29     True
2018-12-30     True
2018-12-31    False
Name: yWendu, Length: 365, dtype: bool
In [29]:
df.loc[(df["bWendu"]<=30) & (df["yWendu"]>=15) & (df["tianqi"]=="晴") & (df["aqiLevel"]==1),:]
Out[29]:
| ymd | bWendu | yWendu | tianqi | fengxiang | fengli | aqi | aqiInfo | aqiLevel |
|---|---|---|---|---|---|---|---|---|
| 2018-08-24 | 30 | 20 | 晴 | 北风 | 1-2级 | 40 | 优 | 1 |
| 2018-09-07 | 27 | 16 | 晴 | 西北风 | 3-4级 | 22 | 优 | 1 |
In [30]:
(df["bWendu"]<=30) & (df["yWendu"]>=15) & (df["tianqi"]=="晴") & (df["aqiLevel"]==1)
Out[30]:
ymd
2018-01-01    False
2018-01-02    False
2018-01-03    False
2018-01-04    False
2018-01-05    False
2018-01-06    False
2018-01-07    False
2018-01-08    False
2018-01-09    False
2018-01-10    False
2018-01-11    False
2018-01-12    False
2018-01-13    False
2018-01-14    False
2018-01-15    False
2018-01-16    False
2018-01-17    False
2018-01-18    False
2018-01-19    False
2018-01-20    False
2018-01-21    False
2018-01-22    False
2018-01-23    False
2018-01-24    False
2018-01-25    False
2018-01-26    False
2018-01-27    False
2018-01-28    False
2018-01-29    False
2018-01-30    False
              ...
2018-12-02    False
2018-12-03    False
2018-12-04    False
2018-12-05    False
2018-12-06    False
2018-12-07    False
2018-12-08    False
2018-12-09    False
2018-12-10    False
2018-12-11    False
2018-12-12    False
2018-12-13    False
2018-12-14    False
2018-12-15    False
2018-12-16    False
2018-12-17    False
2018-12-18    False
2018-12-19    False
2018-12-20    False
2018-12-21    False
2018-12-22    False
2018-12-23    False
2018-12-24    False
2018-12-25    False
2018-12-26    False
2018-12-27    False
2018-12-28    False
2018-12-29    False
2018-12-30    False
2018-12-31    False
Length: 365, dtype: bool
In [31]:
# Write a lambda expression directly
df.loc[lambda df: (df["bWendu"]<=30) & (df["yWendu"]>=15),:]
Out[31]:
| ymd | bWendu | yWendu | tianqi | fengxiang | fengli | aqi | aqiInfo | aqiLevel |
|---|---|---|---|---|---|---|---|---|
| 2018-04-28 | 27 | 17 | 晴 | 西南风 | 3-4级 | 125 | 轻度污染 | 3 |
| 2018-04-29 | 30 | 16 | 多云 | 南风 | 3-4级 | 193 | 中度污染 | 4 |
| 2018-05-04 | 27 | 16 | 晴~多云 | 西南风 | 1-2级 | 86 | 良 | 2 |
| 2018-05-09 | 29 | 17 | 晴~多云 | 西南风 | 3-4级 | 79 | 良 | 2 |
| 2018-05-10 | 26 | 18 | 多云 | 南风 | 3-4级 | 118 | 轻度污染 | 3 |
| 2018-05-11 | 24 | 15 | 阴~多云 | 东风 | 1-2级 | 106 | 轻度污染 | 3 |
| 2018-05-12 | 28 | 16 | 小雨 | 东南风 | 3-4级 | 186 | 中度污染 | 4 |
| 2018-05-13 | 30 | 17 | 晴 | 南风 | 1-2级 | 68 | 良 | 2 |
| 2018-05-16 | 29 | 21 | 多云~小雨 | 东风 | 1-2级 | 142 | 轻度污染 | 3 |
| 2018-05-17 | 25 | 19 | 小雨~多云 | 北风 | 1-2级 | 70 | 良 | 2 |
| 2018-05-18 | 28 | 16 | 多云~晴 | 南风 | 1-2级 | 49 | 优 | 1 |
| 2018-05-19 | 27 | 16 | 多云~小雨 | 南风 | 1-2级 | 69 | 良 | 2 |
| 2018-05-20 | 21 | 16 | 阴~小雨 | 东风 | 1-2级 | 54 | 良 | 2 |
| 2018-05-23 | 29 | 15 | 晴 | 西南风 | 3-4级 | 153 | 中度污染 | 4 |
| 2018-05-26 | 30 | 17 | 小雨~多云 | 西南风 | 3-4级 | 143 | 轻度污染 | 3 |
| 2018-05-28 | 30 | 16 | 晴 | 西北风 | 4-5级 | 178 | 中度污染 | 4 |
| 2018-06-09 | 23 | 17 | 小雨 | 北风 | 1-2级 | 45 | 优 | 1 |
| 2018-06-10 | 27 | 17 | 多云 | 东南风 | 1-2级 | 51 | 良 | 2 |
| 2018-06-11 | 29 | 19 | 多云 | 西南风 | 3-4级 | 85 | 良 | 2 |
| 2018-06-13 | 28 | 19 | 雷阵雨~多云 | 东北风 | 1-2级 | 73 | 良 | 2 |
| 2018-06-18 | 30 | 21 | 雷阵雨 | 西南风 | 1-2级 | 112 | 轻度污染 | 3 |
| 2018-06-22 | 30 | 21 | 雷阵雨~多云 | 东南风 | 1-2级 | 83 | 良 | 2 |
| 2018-07-08 | 30 | 23 | 雷阵雨 | 南风 | 1-2级 | 73 | 良 | 2 |
| 2018-07-09 | 30 | 22 | 雷阵雨~多云 | 东南风 | 1-2级 | 106 | 轻度污染 | 3 |
| 2018-07-10 | 30 | 22 | 多云~雷阵雨 | 南风 | 1-2级 | 48 | 优 | 1 |
| 2018-07-11 | 25 | 22 | 雷阵雨~大雨 | 东北风 | 1-2级 | 44 | 优 | 1 |
| 2018-07-12 | 27 | 22 | 多云 | 南风 | 1-2级 | 46 | 优 | 1 |
| 2018-07-13 | 28 | 23 | 雷阵雨 | 东风 | 1-2级 | 60 | 良 | 2 |
| 2018-07-17 | 27 | 23 | 中雨~雷阵雨 | 西风 | 1-2级 | 28 | 优 | 1 |
| 2018-07-24 | 28 | 26 | 暴雨~雷阵雨 | 东北风 | 3-4级 | 29 | 优 | 1 |
| … | … | … | … | … | … | … | … | … |
| 2018-08-11 | 30 | 23 | 雷阵雨~中雨 | 东风 | 1-2级 | 60 | 良 | 2 |
| 2018-08-12 | 30 | 24 | 雷阵雨 | 东南风 | 1-2级 | 74 | 良 | 2 |
| 2018-08-14 | 29 | 24 | 中雨~小雨 | 东北风 | 1-2级 | 42 | 优 | 1 |
| 2018-08-16 | 30 | 21 | 晴~多云 | 东北风 | 1-2级 | 40 | 优 | 1 |
| 2018-08-17 | 30 | 22 | 多云~雷阵雨 | 东南风 | 1-2级 | 69 | 良 | 2 |
| 2018-08-18 | 28 | 23 | 小雨~中雨 | 北风 | 3-4级 | 40 | 优 | 1 |
| 2018-08-19 | 26 | 23 | 中雨~小雨 | 东北风 | 1-2级 | 37 | 优 | 1 |
| 2018-08-22 | 28 | 21 | 雷阵雨~多云 | 西南风 | 1-2级 | 48 | 优 | 1 |
| 2018-08-24 | 30 | 20 | 晴 | 北风 | 1-2级 | 40 | 优 | 1 |
| 2018-08-27 | 30 | 22 | 多云~雷阵雨 | 东南风 | 1-2级 | 89 | 良 | 2 |
| 2018-08-28 | 29 | 22 | 小雨~多云 | 南风 | 1-2级 | 58 | 良 | 2 |
| 2018-08-30 | 29 | 20 | 多云 | 南风 | 1-2级 | 47 | 优 | 1 |
| 2018-08-31 | 29 | 20 | 多云~阴 | 东南风 | 1-2级 | 48 | 优 | 1 |
| 2018-09-01 | 27 | 19 | 阴~小雨 | 南风 | 1-2级 | 50 | 优 | 1 |
| 2018-09-02 | 27 | 19 | 小雨~多云 | 南风 | 1-2级 | 55 | 良 | 2 |
| 2018-09-03 | 30 | 19 | 晴 | 北风 | 3-4级 | 70 | 良 | 2 |
| 2018-09-06 | 27 | 18 | 多云~晴 | 西北风 | 4-5级 | 37 | 优 | 1 |
| 2018-09-07 | 27 | 16 | 晴 | 西北风 | 3-4级 | 22 | 优 | 1 |
| 2018-09-08 | 27 | 15 | 多云~晴 | 北风 | 1-2级 | 28 | 优 | 1 |
| 2018-09-09 | 28 | 16 | 晴 | 西南风 | 1-2级 | 51 | 良 | 2 |
| 2018-09-10 | 28 | 19 | 多云 | 南风 | 1-2级 | 65 | 良 | 2 |
| 2018-09-11 | 26 | 19 | 多云 | 南风 | 1-2级 | 68 | 良 | 2 |
| 2018-09-12 | 29 | 19 | 多云 | 南风 | 1-2级 | 59 | 良 | 2 |
| 2018-09-13 | 29 | 20 | 多云~阴 | 南风 | 1-2级 | 107 | 轻度污染 | 3 |
| 2018-09-14 | 28 | 19 | 小雨~多云 | 南风 | 1-2级 | 128 | 轻度污染 | 3 |
| 2018-09-15 | 26 | 15 | 多云 | 北风 | 3-4级 | 42 | 优 | 1 |
| 2018-09-17 | 27 | 17 | 多云~阴 | 北风 | 1-2级 | 37 | 优 | 1 |
| 2018-09-18 | 25 | 17 | 阴~多云 | 西南风 | 1-2级 | 50 | 优 | 1 |
| 2018-09-19 | 26 | 17 | 多云 | 南风 | 1-2级 | 52 | 良 | 2 |
| 2018-09-20 | 27 | 16 | 多云 | 西南风 | 1-2级 | 63 | 良 | 2 |
64 rows × 8 columns
In [33]:
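# Write your own function: query the September days with good air quality
def query_my_data(df):
    return df.index.str.startswith("2018-09") & (df["aqiLevel"]==1)
df.loc[query_my_data,:]
Out[33]:
| ymd | bWendu | yWendu | tianqi | fengxiang | fengli | aqi | aqiInfo | aqiLevel |
|---|---|---|---|---|---|---|---|---|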
| 2018-09-01 | 27 | 19 | 阴~小雨 | 南风 | 1-2级 | 50 | 优 | 1 |
| 2018-09-04 | 31 | 18 | 晴 | 西南风 | 3-4级 | 24 | 优 | 1 |
| 2018-09-05 | 31 | 19 | 晴~多云 | 西南风 | 3-4级 | 34 | 优 | 1 |
| 2018-09-06 | 27 | 18 | 多云~晴 | 西北风 | 4-5级 | 37 | 优 | 1 |
| 2018-09-07 | 27 | 16 | 晴 | 西北风 | 3-4级 | 22 | 优 | 1 |
| 2018-09-08 | 27 | 15 | 多云~晴 | 北风 | 1-2级 | 28 | 优 | 1 |
| 2018-09-15 | 26 | 15 | 多云 | 北风 | 3-4级 | 42 | 优 | 1 |
| 2018-09-16 | 25 | 14 | 多云~晴 | 北风 | 1-2级 | 29 | 优 | 1 |
| 2018-09-17 | 27 | 17 | 多云~阴 | 北风 | 1-2级 | 37 | 优 | 1 |
| 2018-09-18 | 25 | 17 | 阴~多云 | 西南风 | 1-2级 | 50 | 优 | 1 |
| 2018-09-21 | 25 | 14 | 晴 | 西北风 | 3-4级 | 50 | 优 | 1 |
| 2018-09-22 | 24 | 13 | 晴 | 西北风 | 3-4级 | 28 | 优 | 1 |
| 2018-09-23 | 23 | 12 | 晴 | 西北风 | 4-5级 | 28 | 优 | 1 |
| 2018-09-24 | 23 | 11 | 晴 | 北风 | 1-2级 | 28 | 优 | 1 |
| 2018-09-25 | 24 | 12 | 晴~多云 | 南风 | 1-2级 | 44 | 优 | 1 |
| 2018-09-29 | 22 | 11 | 晴 | 北风 | 3-4级 | 21 | 优 | 1 |
| 2018-09-30 | 19 | 13 | 多云 | 西北风 | 4-5级 | 22 | 优 | 1 |
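The three condition styles used above (a boolean Series, a lambda, and a named function) can be compared side by side on a small frame. This is a sketch with made-up rows; `toy`, `mask` and `september_clean_air` are illustrative names.

```python
import pandas as pd

toy = pd.DataFrame(
    {"bWendu": [31, 27, 25], "yWendu": [18, 16, 9], "aqiLevel": [1, 1, 2]},
    index=["2018-09-04", "2018-09-07", "2018-10-01"],
)

# 1. Boolean Series: wrap each comparison in parentheses, because & binds
#    more tightly than <= and >=.
mask = (toy["bWendu"] <= 30) & (toy["yWendu"] >= 15)
by_mask = toy.loc[mask, :]

# 2. Lambda: .loc calls it with the DataFrame and uses the returned mask.
by_lambda = toy.loc[lambda d: (d["bWendu"] <= 30) & (d["yWendu"] >= 15), :]

# 3. Named function: same mechanism, easier to reuse and test.
def september_clean_air(d):
    return d.index.str.startswith("2018-09") & (d["aqiLevel"] == 1)

by_func = toy.loc[september_clean_air, :]

print(by_mask.index.tolist())
print(by_func.index.tolist())
```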
In [1]:
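import pandas as pd
In [15]:
df = pd.read_csv("./pandas-learn-code/datas/beijing_tianqi/beijing_tianqi_2018.csv")
In [16]:
df.head()
Out[16]:
| | ymd | bWendu | yWendu | tianqi | fengxiang | fengli | aqi | aqiInfo | aqiLevel |
|---|---|---|---|---|---|---|---|---|---|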
| 0 | 2018-01-01 | 3℃ | -6℃ | 晴~多云 | 东北风 | 1-2级 | 59 | 良 | 2 |
| 1 | 2018-01-02 | 2℃ | -5℃ | 阴~多云 | 东北风 | 1-2级 | 49 | 优 | 1 |
| 2 | 2018-01-03 | 2℃ | -5℃ | 多云 | 北风 | 1-2级 | 28 | 优 | 1 |
| 3 | 2018-01-04 | 0℃ | -8℃ | 阴 | 东北风 | 1-2级 | 28 | 优 | 1 |
| 4 | 2018-01-05 | 3℃ | -6℃ | 多云~晴 | 西北风 | 1-2级 | 50 | 优 | 1 |
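Example: clean the temperature columns and convert them to a numeric type
In [31]:
df.loc[:,"bWendu"] = df["bWendu"].str.replace("℃","").astype('int32')
df.loc[:,"yWendu"] = df["yWendu"].str.replace("℃","").astype('int32')
Example: compute the temperature difference
In [52]:
# Note: df["bWendu"] is actually a Series, so the subtraction also returns a Series
df.loc[:,"wencha"] = df["bWendu"] - df["yWendu"]
In [53]:
df.head()
Out[53]:
| | ymd | bWendu | yWendu | tianqi | fengxiang | fengli | aqi | aqiInfo | aqiLevel | wencha |
|---|---|---|---|---|---|---|---|---|---|---|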
| 0 | 2018-01-01 | 3 | -6 | 晴~多云 | 东北风 | 1-2级 | 59 | 良 | 2 | 9 |
| 1 | 2018-01-02 | 2 | -5 | 阴~多云 | 东北风 | 1-2级 | 49 | 优 | 1 | 7 |
| 2 | 2018-01-03 | 2 | -5 | 多云 | 北风 | 1-2级 | 28 | 优 | 1 | 7 |
| 3 | 2018-01-04 | 0 | -8 | 阴 | 东北风 | 1-2级 | 28 | 优 | 1 | 8 |
| 4 | 2018-01-05 | 3 | -6 | 多云~晴 | 西北风 | 1-2级 | 50 | 优 | 1 | 9 |
Apply a function along an axis of the DataFrame. Objects passed to the function are Series objects whose index is either the DataFrame's index (axis=0) or the DataFrame's columns (axis=1).
Example: add a temperature-type column:
In [60]:
def get_wendu_type(x):
    if x["bWendu"] > 33:
        return "高温"  # high temperature
    if x["yWendu"] < -10:
        return "低温"  # low temperature
    return "常温"  # normal temperature
# Note: axis=1 must be set
df.loc[:,"wendu_type"] = df.apply(get_wendu_type, axis=1)
In [61]:
# View the count of each temperature type
df["wendu_type"].value_counts()
Out[61]:
常温    328
高温     29
低温      8
Name: wendu_type, dtype: int64
Assign new columns to a DataFrame.
Returns a new object with all original columns in addition to new ones.
Example: convert the temperatures from Celsius to Fahrenheit
In [63]:
# Several new columns can be added at once
df.assign(
    yWendu_huashi = lambda x: x["yWendu"]*9/5 + 32,
    bWendu_huashi = lambda x: x["bWendu"]*9/5 + 32
)
. . .
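Since the output of the `assign` call is elided, here is a minimal sketch on a two-row toy frame (`temps`, an illustrative name) showing that `assign` returns a new DataFrame and leaves the original untouched.

```python
import pandas as pd

temps = pd.DataFrame({"bWendu": [3, 0], "yWendu": [-6, -8]})

# Each keyword becomes a new column; the lambda receives the DataFrame.
with_fahrenheit = temps.assign(
    yWendu_huashi=lambda x: x["yWendu"] * 9 / 5 + 32,
    bWendu_huashi=lambda x: x["bWendu"] * 9 / 5 + 32,
)

print(list(temps.columns))            # the original columns only
print(list(with_fahrenheit.columns))  # the original columns plus the two new ones
```

To keep the new columns, bind the result to a name, for example `df = df.assign(...)`.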
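Select rows by condition, then assign the new column's value for those rows.
Example: if the difference between the high and low temperature is greater than 10 degrees, treat the temperature difference as large
In [65]:
df.loc[:,"wencha_type"] = ""
df.loc[df["bWendu"]-df["yWendu"]>10, "wencha_type"] = "温差大"  # large temperature difference
df.loc[df["bWendu"]-df["yWendu"]<=10, "wencha_type"]= "温度正常"  # normal temperature difference
In [67]:
df["wencha_type"].value_counts()
Out[67]:
温度正常    187
温差大     178
Name: wencha_type, dtype: int64
In [2]:
import pandas as pd
In [5]:
fpath = "./pandas-learn-code/datas/beijing_tianqi/beijing_tianqi_2018.csv"
df = pd.read_csv(fpath)
In [6]:
df.head(3)
Out[6]:
| | ymd | bWendu | yWendu | tianqi | fengxiang | fengli | aqi | aqiInfo | aqiLevel |
|---|---|---|---|---|---|---|---|---|---|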
| 0 | 2018-01-01 | 3℃ | -6℃ | 晴~多云 | 东北风 | 1-2级 | 59 | 良 | 2 |
| 1 | 2018-01-02 | 2℃ | -5℃ | 阴~多云 | 东北风 | 1-2级 | 49 | 优 | 1 |
| 2 | 2018-01-03 | 2℃ | -5℃ | 多云 | 北风 | 1-2级 | 28 | 优 | 1 |
In [12]:
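# The output below shows both temperature columns already converted,
# so bWendu is converted here as well as yWendu
df.loc[:, "bWendu"] = df["bWendu"].str.replace("℃","").astype("int32")
df.loc[:, "yWendu"] = df["yWendu"].str.replace("℃","").astype("int32")
In [14]:
df.head(3)
Out[14]:
| | ymd | bWendu | yWendu | tianqi | fengxiang | fengli | aqi | aqiInfo | aqiLevel |
|---|---|---|---|---|---|---|---|---|---|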
| 0 | 2018-01-01 | 3 | -6 | 晴~多云 | 东北风 | 1-2级 | 59 | 良 | 2 |
| 1 | 2018-01-02 | 2 | -5 | 阴~多云 | 东北风 | 1-2级 | 49 | 优 | 1 |
| 2 | 2018-01-03 | 2 | -5 | 多云 | 北风 | 1-2级 | 28 | 优 | 1 |
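A version note on the temperature clean-up: the outputs in this article show `int32` columns after assigning through `df.loc[:, col]`. To my understanding, recent pandas releases (2.0 and later) try to write such an assignment into the existing column in place, so the column may stay as `object` dtype. The sketch below assigns to `df[col]` instead, which replaces the column and should keep the new dtype on old and new versions alike. The frame `raw` is made up for illustration.

```python
import pandas as pd

raw = pd.DataFrame({"bWendu": ["3℃", "2℃"], "yWendu": ["-6℃", "-5℃"]})

for col in ["bWendu", "yWendu"]:
    # Assigning to raw[col] replaces the whole column, so the int32 dtype is kept.
    # regex=False treats "℃" as a literal string rather than a pattern.
    raw[col] = raw[col].str.replace("℃", "", regex=False).astype("int32")

print(raw.dtypes)
```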
In [15]:
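# Get summary statistics for all numeric columns at once
df.describe()
Out[15]:
| | bWendu | yWendu | aqi | aqiLevel |
|---|---|---|---|---|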
| count | 365.000000 | 365.000000 | 365.000000 | 365.000000 |
| mean | 18.665753 | 8.358904 | 82.183562 | 2.090411 |
| std | 11.858046 | 11.755053 | 51.936159 | 1.029798 |
| min | -5.000000 | -12.000000 | 21.000000 | 1.000000 |
| 25% | 8.000000 | -3.000000 | 46.000000 | 1.000000 |
| 50% | 21.000000 | 8.000000 | 69.000000 | 2.000000 |
| 75% | 29.000000 | 19.000000 | 104.000000 | 3.000000 |
| max | 38.000000 | 27.000000 | 387.000000 | 6.000000 |
In [16]:
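# View the statistics of a single Series
df["bWendu"].mean()
Out[16]:
18.665753424657535
In [17]:
# Highest temperature
df["bWendu"].max()
Out[17]:
38
In [18]:
# Lowest value of the daily high temperature
df["bWendu"].min()
Out[18]:
-5
The following methods are generally used on enumerated or categorical columns rather than on numeric columns.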
In [19]:
df["fengxiang"].unique()
Out[19]:
array(['东北风', '北风', '西北风', '西南风', '南风', '东南风', '东风', '西风'], dtype=object)
In [20]:
df["tianqi"].unique()
Out[20]:
array(['晴~多云', '阴~多云', '多云', '阴', '多云~晴', '多云~阴', '晴', '阴~小雪', '小雪~多云',
       '小雨~阴', '小雨~雨夹雪', '多云~小雨', '小雨~多云', '大雨~小雨', '小雨', '阴~小雨',
       '多云~雷阵雨', '雷阵雨~多云', '阴~雷阵雨', '雷阵雨', '雷阵雨~大雨', '中雨~雷阵雨', '小雨~大雨',
       '暴雨~雷阵雨', '雷阵雨~中雨', '小雨~雷阵雨', '雷阵雨~阴', '中雨~小雨', '小雨~中雨', '雾~多云',
       '霾'], dtype=object)
In [22]:
df["fengli"].unique()
Out[22]:
array(['1-2级', '4-5级', '3-4级', '2级', '1级', '3级'], dtype=object)
In [24]:
df["fengxiang"].value_counts()
Out[24]:
南风     92
西南风    64
北风     54
西北风    51
东南风    46
东北风    38
东风     14
西风      6
Name: fengxiang, dtype: int64
In [25]:
df["tianqi"].unique()
Out[25]:
array(['晴~多云', '阴~多云', '多云', '阴', '多云~晴', '多云~阴', '晴', '阴~小雪', '小雪~多云',
       '小雨~阴', '小雨~雨夹雪', '多云~小雨', '小雨~多云', '大雨~小雨', '小雨', '阴~小雨',
       '多云~雷阵雨', '雷阵雨~多云', '阴~雷阵雨', '雷阵雨', '雷阵雨~大雨', '中雨~雷阵雨', '小雨~大雨',
       '暴雨~雷阵雨', '雷阵雨~中雨', '小雨~雷阵雨', '雷阵雨~阴', '中雨~小雨', '小雨~中雨', '雾~多云',
       '霾'], dtype=object)
In [26]:
df["fengli"].value_counts()
Out[26]:
1-2级    236
3-4级     68
1级      21
4-5级     20
2级      13
3级       7
Name: fengli, dtype: int64
Uses:
For two variables x, y:
In [27]:
# Covariance matrix
df.cov()
Out[27]:
| | bWendu | yWendu | aqi | aqiLevel |
|---|---|---|---|---|
| bWendu | 140.613247 | 135.529633 | 47.462622 | 0.879204 |
| yWendu | 135.529633 | 138.181274 | 16.186685 | 0.264165 |
| aqi | 47.462622 | 16.186685 | 2697.364564 | 50.749842 |
| aqiLevel | 0.879204 | 0.264165 | 50.749842 | 1.060485 |
In [28]:
# Correlation coefficient matrix
df.corr()
Out[28]:
| | bWendu | yWendu | aqi | aqiLevel |
|---|---|---|---|---|
| bWendu | 1.000000 | 0.972292 | 0.077067 | 0.071999 |
| yWendu | 0.972292 | 1.000000 | 0.026513 | 0.021822 |
| aqi | 0.077067 | 0.026513 | 1.000000 | 0.948883 |
| aqiLevel | 0.071999 | 0.021822 | 0.948883 | 1.000000 |
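The matrices above contain only the four numeric columns. As far as I know, newer pandas releases no longer drop non-numeric columns silently in `cov()` and `corr()`, so calling them on a frame that still holds string columns may raise an error. A version-independent sketch is to select the numeric columns first. The frame `mixed` and its values are made up for illustration.

```python
import pandas as pd

mixed = pd.DataFrame({
    "bWendu": [3, 2, 2, 0],
    "yWendu": [-6, -5, -5, -8],
    "tianqi": ["晴", "阴", "多云", "阴"],
})

# Keep only the numeric columns before computing the matrices.
numeric = mixed.select_dtypes(include="number")
cov_matrix = numeric.cov()
corr_matrix = numeric.corr()

# The pairwise Series.corr call matches the corresponding matrix entry.
pair = mixed["bWendu"].corr(mixed["yWendu"])

print(corr_matrix)
```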
In [29]:
# Correlation coefficient between air quality and the high temperature alone
df["aqi"].corr(df["bWendu"])
Out[29]:
0.07706705916811067
In [30]:
df["aqi"].corr(df["yWendu"])
Out[30]:
0.026513282672968895
In [31]:
# Correlation coefficient between air quality and the temperature difference
df["aqi"].corr(df["bWendu"]-df["yWendu"])
Out[31]:
0.2165225757638205
pandas uses these functions to handle missing values:
In [1]:
import pandas as pd
In [5]:
# skiprows=2 skips the first two rows
studf = pd.read_excel("./pandas-learn-code/datas/student_excel/student_excel.xlsx", skiprows=2)
In [6]:
studf
Out[6]:
| 0 | NaN | 小明 | 语文 | 85.0 |
| 1 | NaN | NaN | 数学 | 80.0 |
| 2 | NaN | NaN | 英语 | 90.0 |
| 3 | NaN | NaN | NaN | NaN |
| 4 | NaN | 小王 | 语文 | 85.0 |
| 5 | NaN | NaN | 数学 | NaN |
| 6 | NaN | NaN | 英语 | 90.0 |
| 7 | NaN | NaN | NaN | NaN |
| 8 | NaN | 小刚 | 语文 | 85.0 |
| 9 | NaN | NaN | 数学 | 80.0 |
| 10 | NaN | NaN | 英语 | 90.0 |
In [7]:
studf.isnull()
Out[7]:
| 0 | True | False | False | False |
| 1 | True | True | False | False |
| 2 | True | True | False | False |
| 3 | True | True | True | True |
| 4 | True | False | False | False |
| 5 | True | True | False | True |
| 6 | True | True | False | False |
| 7 | True | True | True | True |
| 8 | True | False | False | False |
| 9 | True | True | False | False |
| 10 | True | True | False | False |
In [9]:
studf["分数"].isnull()
Out[9]:
0     False
1     False
2     False
3      True
4     False
5      True
6     False
7      True
8     False
9     False
10    False
Name: 分数, dtype: bool
In [10]:
studf["分数"].notnull()
Out[10]:
0      True
1      True
2      True
3     False
4      True
5     False
6      True
7     False
8      True
9      True
10     True
Name: 分数, dtype: bool
In [12]:
# Select all rows whose score (分数) is not null
studf.loc[studf["分数"].notnull(), :]
Out[12]:
| 0 | NaN | 小明 | 语文 | 85.0 |
| 1 | NaN | NaN | 数学 | 80.0 |
| 2 | NaN | NaN | 英语 | 90.0 |
| 4 | NaN | 小王 | 语文 | 85.0 |
| 6 | NaN | NaN | 英语 | 90.0 |
| 8 | NaN | 小刚 | 语文 | 85.0 |
| 9 | NaN | NaN | 数学 | 80.0 |
| 10 | NaN | NaN | 英语 | 90.0 |
In [15]:
studf.dropna(axis="columns", how="all", inplace=True)
In [16]:
studf
Out[16]:
| 0 | 小明 | 语文 | 85.0 |
| 1 | NaN | 数学 | 80.0 |
| 2 | NaN | 英语 | 90.0 |
| 4 | 小王 | 语文 | 85.0 |
| 5 | NaN | 数学 | NaN |
| 6 | NaN | 英语 | 90.0 |
| 8 | 小刚 | 语文 | 85.0 |
| 9 | NaN | 数学 | 80.0 |
| 10 | NaN | 英语 | 90.0 |
In [13]:
studf.dropna(axis="index", how="all", inplace=True)
In [17]:
studf
Out[17]:
| 0 | 小明 | 语文 | 85.0 |
| 1 | NaN | 数学 | 80.0 |
| 2 | NaN | 英语 | 90.0 |
| 4 | 小王 | 语文 | 85.0 |
| 5 | NaN | 数学 | NaN |
| 6 | NaN | 英语 | 90.0 |
| 8 | 小刚 | 语文 | 85.0 |
| 9 | NaN | 数学 | 80.0 |
| 10 | NaN | 英语 | 90.0 |
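The two `dropna` calls can be sketched on a small frame without the Excel file. The frame `sheet` and its `empty` column are hypothetical stand-ins for the spreadsheet's blank column and blank rows.

```python
import numpy as np
import pandas as pd

sheet = pd.DataFrame({
    "empty": [np.nan, np.nan, np.nan],   # a column with no values at all
    "姓名": ["小明", np.nan, np.nan],
    "分数": [85.0, np.nan, 90.0],
})

# how="all" removes a column only if every value in it is missing.
no_empty_cols = sheet.dropna(axis="columns", how="all")

# The same rule for rows: row 1 is dropped because both remaining cells are NaN.
no_empty_rows = no_empty_cols.dropna(axis="index", how="all")

print(no_empty_rows)
```

Without `inplace=True`, each call returns a new DataFrame and leaves its input unchanged.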
In [19]:
studf.fillna({"分数": 0})
. . .
In [20]:
# Equivalent to the call above, but writes the result back into studf
studf.loc[:,"分数"] = studf["分数"].fillna(0)
In [21]:
studf
Out[21]:
| 0 | 小明 | 语文 | 85.0 |
| 1 | NaN | 数学 | 80.0 |
| 2 | NaN | 英语 | 90.0 |
| 4 | 小王 | 语文 | 85.0 |
| 5 | NaN | 数学 | 0.0 |
| 6 | NaN | 英语 | 90.0 |
| 8 | 小刚 | 语文 | 85.0 |
| 9 | NaN | 数学 | 80.0 |
| 10 | NaN | 英语 | 90.0 |
Fill missing values with the preceding valid value using ffill (forward fill).
In [22]:
studf.loc[:, "姓名"] = studf['姓名'].fillna(method="ffill")
In [23]:
studf
Out[23]:
| 0 | 小明 | 语文 | 85.0 |
| 1 | 小明 | 数学 | 80.0 |
| 2 | 小明 | 英语 | 90.0 |
| 4 | 小王 | 语文 | 85.0 |
| 5 | 小王 | 数学 | 0.0 |
| 6 | 小王 | 英语 | 90.0 |
| 8 | 小刚 | 语文 | 85.0 |
| 9 | 小刚 | 数学 | 80.0 |
| 10 | 小刚 | 英语 | 90.0 |
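The `method="ffill"` argument of `fillna` is, to my knowledge, deprecated in newer pandas releases, which offer the dedicated `Series.ffill()` method instead. A minimal sketch of both fill steps on a made-up frame (`scores`):

```python
import numpy as np
import pandas as pd

scores = pd.DataFrame({
    "姓名": ["小明", np.nan, "小王", np.nan],
    "分数": [85.0, 80.0, np.nan, 90.0],
})

# Replace missing scores with a constant.
scores["分数"] = scores["分数"].fillna(0)

# Carry the last valid name forward into the gaps below it.
scores["姓名"] = scores["姓名"].ffill()

print(scores)
```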
In [25]:
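studf.to_excel(r"D:\WinterIsComing\python\New_Wave\pandas_basic\student_excel.xlsx", index=False)
In [1]:
import pandas as pd
In [2]:
fpath = "./pandas-learn-code/datas/beijing_tianqi/beijing_tianqi_2018.csv"
df = pd.read_csv(fpath)
In [3]:
df.head()
Out[3]:
| | ymd | bWendu | yWendu | tianqi | fengxiang | fengli | aqi | aqiInfo | aqiLevel |
|---|---|---|---|---|---|---|---|---|---|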
| 0 | 2018-01-01 | 3℃ | -6℃ | 晴~多云 | 东北风 | 1-2级 | 59 | 良 | 2 |
| 1 | 2018-01-02 | 2℃ | -5℃ | 阴~多云 | 东北风 | 1-2级 | 49 | 优 | 1 |
| 2 | 2018-01-03 | 2℃ | -5℃ | 多云 | 北风 | 1-2级 | 28 | 优 | 1 |
| 3 | 2018-01-04 | 0℃ | -8℃ | 阴 | 东北风 | 1-2级 | 28 | 优 | 1 |
| 4 | 2018-01-05 | 3℃ | -6℃ | 多云~晴 | 西北风 | 1-2级 | 50 | 优 | 1 |
In [5]:
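df.loc[:, "bWendu"] = df["bWendu"].str.replace("℃","").astype("int32")
df.loc[:, "yWendu"] = df["yWendu"].str.replace("℃","").astype("int32")
In [7]:
df.head()
Out[7]:
| | ymd | bWendu | yWendu | tianqi | fengxiang | fengli | aqi | aqiInfo | aqiLevel |
|---|---|---|---|---|---|---|---|---|---|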
| 0 | 2018-01-01 | 3 | -6 | 晴~多云 | 东北风 | 1-2级 | 59 | 良 | 2 |
| 1 | 2018-01-02 | 2 | -5 | 阴~多云 | 东北风 | 1-2级 | 49 | 优 | 1 |
| 2 | 2018-01-03 | 2 | -5 | 多云 | 北风 | 1-2级 | 28 | 优 | 1 |
| 3 | 2018-01-04 | 0 | -8 | 阴 | 东北风 | 1-2级 | 28 | 优 | 1 |
| 4 | 2018-01-05 | 3 | -6 | 多云~晴 | 西北风 | 1-2级 | 50 | 优 | 1 |
In [10]:
# Select the March data for analysis
condition = df.loc[:, "ymd"].str.startswith("2018-03")
In [11]:
# Set the temperature difference for March
# Incorrect approach
df[condition]["wen_cha"] = df["bWendu"] - df["yWendu"]
D:\Tools\Anaconda3\lib\site-packages\ipykernel_launcher.py:3: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
This is separate from the ipykernel package so we can avoid doing imports until
In [12]:
# Check whether the modification succeeded
df[condition].head()
# Only the March rows were selected; the temperature-difference column was not added
Out[12]:
| | ymd | bWendu | yWendu | tianqi | fengxiang | fengli | aqi | aqiInfo | aqiLevel |
|---|---|---|---|---|---|---|---|---|---|
| 59 | 2018-03-01 | 8 | -3 | 多云 | 西南风 | 1-2级 | 46 | 优 | 1 |
| 60 | 2018-03-02 | 9 | -1 | 晴~多云 | 北风 | 1-2级 | 95 | 良 | 2 |
| 61 | 2018-03-03 | 13 | 3 | 多云~阴 | 北风 | 1-2级 | 214 | 重度污染 | 5 |
| 62 | 2018-03-04 | 7 | -2 | 阴~多云 | 东南风 | 1-2级 | 144 | 轻度污染 | 3 |
| 63 | 2018-03-05 | 8 | -3 | 晴 | 南风 | 1-2级 | 94 | 良 | 2 |
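The code that triggers the warning: df[condition]["wen_cha"] = df["bWendu"]-df["yWendu"]
This is equivalent to df.get(condition).set(wen_cha); the get in the first step is what triggers the warning.
*A chained operation is really two steps, a get followed by a set. The DataFrame returned by the get may be a view (a sub-view of the original DataFrame, so modifying it changes the original directly) or a copy, which is why pandas issues the warning.*
Official documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
Key rule: write operations that modify a pandas DataFrame should be performed only on the source DataFrame, in a single step.
## 3. Solution 1
Turn the two-step get+set operation into a single set operation.
In [15]:
df.loc[condition, "wen_cha"] = df["bWendu"] - df["yWendu"]
In [18]:
df.head(2)
Out[18]:
| | ymd | bWendu | yWendu | tianqi | fengxiang | fengli | aqi | aqiInfo | aqiLevel | wen_cha |
|---|---|---|---|---|---|---|---|---|---|---|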
| 0 | 2018-01-01 | 3 | -6 | 晴~多云 | 东北风 | 1-2级 | 59 | 良 | 2 | NaN |
| 1 | 2018-01-02 | 2 | -5 | 阴~多云 | 东北风 | 1-2级 | 49 | 优 | 1 | NaN |
In [19]:
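df[condition].head(2)
Out[19]:
| | ymd | bWendu | yWendu | tianqi | fengxiang | fengli | aqi | aqiInfo | aqiLevel | wen_cha |
|---|---|---|---|---|---|---|---|---|---|---|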
| 59 | 2018-03-01 | 8 | -3 | 多云 | 西南风 | 1-2级 | 46 | 优 | 1 | 11.0 |
| 60 | 2018-03-02 | 9 | -1 | 晴~多云 | 北风 | 1-2级 | 95 | 良 | 2 | 10.0 |
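If you need to pre-filter the data for later processing and analysis, first make a copy of the filtered DataFrame with copy and then work on that copy.
In [20]:
# Create a new DataFrame, df_month3: select the March data and copy it
df_month3 = df[condition].copy()
In [22]:
df_month3.head()
Out[22]:
| | ymd | bWendu | yWendu | tianqi | fengxiang | fengli | aqi | aqiInfo | aqiLevel | wen_cha |
|---|---|---|---|---|---|---|---|---|---|---|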
| 59 | 2018-03-01 | 8 | -3 | 多云 | 西南风 | 1-2级 | 46 | 优 | 1 | 11.0 |
| 60 | 2018-03-02 | 9 | -1 | 晴~多云 | 北风 | 1-2级 | 95 | 良 | 2 | 10.0 |
| 61 | 2018-03-03 | 13 | 3 | 多云~阴 | 北风 | 1-2级 | 214 | 重度污染 | 5 | 10.0 |
| 62 | 2018-03-04 | 7 | -2 | 阴~多云 | 东南风 | 1-2级 | 144 | 轻度污染 | 3 | 9.0 |
| 63 | 2018-03-05 | 8 | -3 | 晴 | 南风 | 1-2级 | 94 | 良 | 2 | 11.0 |
In [24]:
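df_month3["wencha"] = df_month3["bWendu"] - df_month3["yWendu"]
In [25]:
df_month3.head()
Out[25]:
| | ymd | bWendu | yWendu | tianqi | fengxiang | fengli | aqi | aqiInfo | aqiLevel | wen_cha | wencha |
|---|---|---|---|---|---|---|---|---|---|---|---|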
| 59 | 2018-03-01 | 8 | -3 | 多云 | 西南风 | 1-2级 | 46 | 优 | 1 | 11.0 | 11 |
| 60 | 2018-03-02 | 9 | -1 | 晴~多云 | 北风 | 1-2级 | 95 | 良 | 2 | 10.0 | 10 |
| 61 | 2018-03-03 | 13 | 3 | 多云~阴 | 北风 | 1-2级 | 214 | 重度污染 | 5 | 10.0 | 10 |
| 62 | 2018-03-04 | 7 | -2 | 阴~多云 | 东南风 | 1-2级 | 144 | 轻度污染 | 3 | 9.0 | 9 |
| 63 | 2018-03-05 | 8 | -3 | 晴 | 南风 | 1-2级 | 94 | 良 | 2 | 11.0 | 11 |
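Both fixes can be reproduced on a three-row toy frame. This is a minimal sketch; `w`, `cond` and `march` are illustrative names and the rows are made up.

```python
import pandas as pd

w = pd.DataFrame({
    "ymd": ["2018-02-28", "2018-03-01", "2018-03-02"],
    "bWendu": [7, 8, 9],
    "yWendu": [0, -3, -1],
})
cond = w["ymd"].str.startswith("2018-03")

# Fix 1: one .loc call selects the rows and sets the column in a single step.
# Rows outside the condition get NaN in the new column.
w.loc[cond, "wen_cha"] = w["bWendu"] - w["yWendu"]

# Fix 2: take an explicit copy of the filtered rows, then modify the copy freely.
march = w[cond].copy()
march["wencha"] = march["bWendu"] - march["yWendu"]

print(w)
print(march)
```

The copy is independent of `w`, so the `wencha` column appears only in `march`.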
Sorting a Series:
*Series.sort_values(ascending=True, inplace=False)*
Parameters:
Sorting a DataFrame:
*DataFrame.sort_values(by, ascending=True, inplace=False)*
Parameters:
In [1]:
import pandas as pd
In [2]:
fpath = "./pandas-learn-code/datas/beijing_tianqi/beijing_tianqi_2018.csv"
df = pd.read_csv(fpath)
In [4]:
# Strip the ℃ suffix from the temperatures
df.loc[:, "bWendu"] = df["bWendu"].str.replace("℃","").astype("int32")
df.loc[:, "yWendu"] = df["yWendu"].str.replace("℃","").astype("int32")
In [5]:
df.head()
Out[5]:
| | ymd | bWendu | yWendu | tianqi | fengxiang | fengli | aqi | aqiInfo | aqiLevel |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 2018-01-01 | 3 | -6 | 晴~多云 | 东北风 | 1-2级 | 59 | 良 | 2 |
| 1 | 2018-01-02 | 2 | -5 | 阴~多云 | 东北风 | 1-2级 | 49 | 优 | 1 |
| 2 | 2018-01-03 | 2 | -5 | 多云 | 北风 | 1-2级 | 28 | 优 | 1 |
| 3 | 2018-01-04 | 0 | -8 | 阴 | 东北风 | 1-2级 | 28 | 优 | 1 |
| 4 | 2018-01-05 | 3 | -6 | 多云~晴 | 西北风 | 1-2级 | 50 | 优 | 1 |
In [7]:
# Ascending order is the default
df["aqi"].sort_values()
Out[7]:
271     21
281     21
249     22
272     22
301     22
246     24
35      24
33      24
10      24
273     25
282     25
359     26
9       26
111     27
24      27
2       28
264     28
319     28
250     28
266     28
3       28
265     28
205     28
197     28
204     29
258     29
362     29
283     30
308     30
22      31
      ...
334    163
109    164
108    170
68     171
176    174
70     174
294    176
124    177
286    177
147    178
49     183
131    186
13     187
118    193
336    198
287    198
330    198
306    206
61     214
90     218
316    219
57     220
335    234
85     243
329    245
317    266
71     287
91     287
72     293
86     387
Name: aqi, Length: 365, dtype: int64
In [10]:
# Switch the sort order to descending
df["aqi"].sort_values(ascending=False)
Out[10]:
86     387
72     293
91     287
71     287
317    266
329    245
85     243
335    234
57     220
316    219
90     218
61     214
306    206
330    198
287    198
336    198
118    193
13     187
131    186
49     183
147    178
286    177
124    177
294    176
70     174
176    174
68     171
108    170
109    164
334    163
      ...
22      31
308     30
283     30
362     29
258     29
204     29
197     28
205     28
265     28
3       28
266     28
250     28
319     28
264     28
2       28
24      27
111     27
9       26
359     26
282     25
273     25
10      24
33      24
35      24
246     24
301     22
272     22
249     22
281     21
271     21
Name: aqi, Length: 365, dtype: int64
In [12]:
# Chinese text can be sorted as well
df["tianqi"].sort_values()
Out[12]:
225     中雨~小雨
230     中雨~小雨
197    中雨~雷阵雨
196    中雨~雷阵雨
112        多云
108        多云
232        多云
234        多云
241        多云
94         多云
91         多云
88         多云
252        多云
84         多云
364        多云
165        多云
81         多云
79         多云
78         多云
77         多云
257        多云
74         多云
69         多云
67         多云
261        多云
262        多云
268        多云
270        多云
226        多云
253        多云
        ...
338     阴~多云
111     阴~多云
243     阴~小雨
139     阴~小雨
20      阴~小雪
167    阴~雷阵雨
237      雷阵雨
195      雷阵雨
223      雷阵雨
187      雷阵雨
168      雷阵雨
188      雷阵雨
193      雷阵雨
175      雷阵雨
218   雷阵雨~中雨
216   雷阵雨~中雨
224   雷阵雨~中雨
222   雷阵雨~中雨
189   雷阵雨~多云
163   雷阵雨~多云
180   雷阵雨~多云
183   雷阵雨~多云
194   雷阵雨~多云
172   雷阵雨~多云
233   雷阵雨~多云
191   雷阵雨~大雨
219    雷阵雨~阴
335     雾~多云
353        霾
348        霾
Name: tianqi, Length: 365, dtype: object
In [13]:
# Sort the DataFrame by air quality
df.sort_values(by="aqi")
Out[13]:
| | ymd | bWendu | yWendu | tianqi | fengxiang | fengli | aqi | aqiInfo | aqiLevel |
|---|---|---|---|---|---|---|---|---|---|
| 271 | 2018-09-29 | 22 | 11 | 晴 | 北风 | 3-4级 | 21 | 优 | 1 |
| 281 | 2018-10-09 | 15 | 4 | 多云~晴 | 西北风 | 4-5级 | 21 | 优 | 1 |
| 249 | 2018-09-07 | 27 | 16 | 晴 | 西北风 | 3-4级 | 22 | 优 | 1 |
| 272 | 2018-09-30 | 19 | 13 | 多云 | 西北风 | 4-5级 | 22 | 优 | 1 |
| 301 | 2018-10-29 | 15 | 3 | 晴 | 北风 | 3-4级 | 22 | 优 | 1 |
| 246 | 2018-09-04 | 31 | 18 | 晴 | 西南风 | 3-4级 | 24 | 优 | 1 |
| 35 | 2018-02-05 | 0 | -10 | 晴 | 北风 | 3-4级 | 24 | 优 | 1 |
| 33 | 2018-02-03 | 0 | -9 | 多云 | 北风 | 1-2级 | 24 | 优 | 1 |
| 10 | 2018-01-11 | -1 | -10 | 晴 | 北风 | 1-2级 | 24 | 优 | 1 |
| 273 | 2018-10-01 | 24 | 12 | 晴 | 北风 | 4-5级 | 25 | 优 | 1 |
| 282 | 2018-10-10 | 17 | 4 | 多云~晴 | 西北风 | 1-2级 | 25 | 优 | 1 |
| 359 | 2018-12-26 | -2 | -11 | 晴~多云 | 东北风 | 2级 | 26 | 优 | 1 |
| 9 | 2018-01-10 | -2 | -10 | 晴 | 西北风 | 1-2级 | 26 | 优 | 1 |
| 111 | 2018-04-22 | 16 | 12 | 阴~多云 | 东北风 | 3-4级 | 27 | 优 | 1 |
| 24 | 2018-01-25 | -3 | -11 | 多云 | 东北风 | 1-2级 | 27 | 优 | 1 |
| 2 | 2018-01-03 | 2 | -5 | 多云 | 北风 | 1-2级 | 28 | 优 | 1 |
| 264 | 2018-09-22 | 24 | 13 | 晴 | 西北风 | 3-4级 | 28 | 优 | 1 |
| 319 | 2018-11-16 | 8 | -1 | 晴~多云 | 北风 | 1-2级 | 28 | 优 | 1 |
| 250 | 2018-09-08 | 27 | 15 | 多云~晴 | 北风 | 1-2级 | 28 | 优 | 1 |
| 266 | 2018-09-24 | 23 | 11 | 晴 | 北风 | 1-2级 | 28 | 优 | 1 |
| 3 | 2018-01-04 | 0 | -8 | 阴 | 东北风 | 1-2级 | 28 | 优 | 1 |
| 265 | 2018-09-23 | 23 | 12 | 晴 | 西北风 | 4-5级 | 28 | 优 | 1 |
| 205 | 2018-07-25 | 32 | 25 | 多云 | 北风 | 1-2级 | 28 | 优 | 1 |
| 197 | 2018-07-17 | 27 | 23 | 中雨~雷阵雨 | 西风 | 1-2级 | 28 | 优 | 1 |
| 204 | 2018-07-24 | 28 | 26 | 暴雨~雷阵雨 | 东北风 | 3-4级 | 29 | 优 | 1 |
| 258 | 2018-09-16 | 25 | 14 | 多云~晴 | 北风 | 1-2级 | 29 | 优 | 1 |
| 362 | 2018-12-29 | -3 | -12 | 晴 | 西北风 | 2级 | 29 | 优 | 1 |
| 283 | 2018-10-11 | 18 | 5 | 晴~多云 | 北风 | 1-2级 | 30 | 优 | 1 |
| 308 | 2018-11-05 | 10 | 2 | 多云 | 西南风 | 1-2级 | 30 | 优 | 1 |
| 22 | 2018-01-23 | -4 | -12 | 晴 | 西北风 | 3-4级 | 31 | 优 | 1 |
| … | … | … | … | … | … | … | … | … | … |
| 334 | 2018-12-01 | 7 | 0 | 多云 | 东南风 | 1级 | 163 | 中度污染 | 4 |
| 109 | 2018-04-20 | 28 | 14 | 多云~小雨 | 南风 | 4-5级 | 164 | 中度污染 | 4 |
| 108 | 2018-04-19 | 26 | 13 | 多云 | 东南风 | 4-5级 | 170 | 中度污染 | 4 |
| 68 | 2018-03-10 | 14 | -2 | 晴 | 东南风 | 1-2级 | 171 | 中度污染 | 4 |
| 176 | 2018-06-26 | 36 | 25 | 晴 | 西南风 | 3-4级 | 174 | 中度污染 | 4 |
| 70 | 2018-03-12 | 15 | 3 | 多云~晴 | 南风 | 1-2级 | 174 | 中度污染 | 4 |
| 294 | 2018-10-22 | 19 | 5 | 多云~晴 | 西北风 | 1-2级 | 176 | 中度污染 | 4 |
| 124 | 2018-05-05 | 25 | 13 | 多云 | 北风 | 3-4级 | 177 | 中度污染 | 4 |
| 286 | 2018-10-14 | 21 | 10 | 多云 | 南风 | 1-2级 | 177 | 中度污染 | 4 |
| 147 | 2018-05-28 | 30 | 16 | 晴 | 西北风 | 4-5级 | 178 | 中度污染 | 4 |
| 49 | 2018-02-19 | 6 | -3 | 多云 | 南风 | 1-2级 | 183 | 中度污染 | 4 |
| 131 | 2018-05-12 | 28 | 16 | 小雨 | 东南风 | 3-4级 | 186 | 中度污染 | 4 |
| 13 | 2018-01-14 | 6 | -5 | 晴~多云 | 西北风 | 1-2级 | 187 | 中度污染 | 4 |
| 118 | 2018-04-29 | 30 | 16 | 多云 | 南风 | 3-4级 | 193 | 中度污染 | 4 |
| 336 | 2018-12-03 | 8 | -3 | 多云~晴 | 东北风 | 3级 | 198 | 中度污染 | 4 |
| 287 | 2018-10-15 | 17 | 11 | 小雨 | 北风 | 1-2级 | 198 | 中度污染 | 4 |
| 330 | 2018-11-27 | 9 | -3 | 晴~多云 | 西北风 | 2级 | 198 | 中度污染 | 4 |
| 306 | 2018-11-03 | 16 | 6 | 多云 | 南风 | 1-2级 | 206 | 重度污染 | 5 |
| 61 | 2018-03-03 | 13 | 3 | 多云~阴 | 北风 | 1-2级 | 214 | 重度污染 | 5 |
| 90 | 2018-04-01 | 25 | 11 | 晴~多云 | 南风 | 1-2级 | 218 | 重度污染 | 5 |
| 316 | 2018-11-13 | 13 | 5 | 多云 | 东南风 | 1-2级 | 219 | 重度污染 | 5 |
| 57 | 2018-02-27 | 7 | 0 | 阴 | 东风 | 1-2级 | 220 | 重度污染 | 5 |
| 335 | 2018-12-02 | 9 | 2 | 雾~多云 | 东北风 | 1级 | 234 | 重度污染 | 5 |
| 85 | 2018-03-27 | 27 | 11 | 晴 | 南风 | 1-2级 | 243 | 重度污染 | 5 |
| 329 | 2018-11-26 | 10 | 0 | 多云 | 东南风 | 1级 | 245 | 重度污染 | 5 |
| 317 | 2018-11-14 | 13 | 5 | 多云 | 南风 | 1-2级 | 266 | 重度污染 | 5 |
| 71 | 2018-03-13 | 17 | 5 | 晴~多云 | 南风 | 1-2级 | 287 | 重度污染 | 5 |
| 91 | 2018-04-02 | 26 | 11 | 多云 | 北风 | 1-2级 | 287 | 重度污染 | 5 |
| 72 | 2018-03-14 | 15 | 6 | 多云~阴 | 东北风 | 1-2级 | 293 | 重度污染 | 5 |
| 86 | 2018-03-28 | 25 | 9 | 多云~晴 | 东风 | 1-2级 | 387 | 严重污染 | 6 |
365 rows × 9 columns
In [14]:
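# Sort in descending order
df.sort_values(by="aqi", ascending=False)
Out[14]:
| | ymd | bWendu | yWendu | tianqi | fengxiang | fengli | aqi | aqiInfo | aqiLevel |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |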
| 86 | 2018-03-28 | 25 | 9 | 多云~晴 | 东风 | 1-2级 | 387 | 严重污染 | 6 |
| 72 | 2018-03-14 | 15 | 6 | 多云~阴 | 东北风 | 1-2级 | 293 | 重度污染 | 5 |
| 71 | 2018-03-13 | 17 | 5 | 晴~多云 | 南风 | 1-2级 | 287 | 重度污染 | 5 |
| 91 | 2018-04-02 | 26 | 11 | 多云 | 北风 | 1-2级 | 287 | 重度污染 | 5 |
| 317 | 2018-11-14 | 13 | 5 | 多云 | 南风 | 1-2级 | 266 | 重度污染 | 5 |
| 329 | 2018-11-26 | 10 | 0 | 多云 | 东南风 | 1级 | 245 | 重度污染 | 5 |
| 85 | 2018-03-27 | 27 | 11 | 晴 | 南风 | 1-2级 | 243 | 重度污染 | 5 |
| 335 | 2018-12-02 | 9 | 2 | 雾~多云 | 东北风 | 1级 | 234 | 重度污染 | 5 |
| 57 | 2018-02-27 | 7 | 0 | 阴 | 东风 | 1-2级 | 220 | 重度污染 | 5 |
| 316 | 2018-11-13 | 13 | 5 | 多云 | 东南风 | 1-2级 | 219 | 重度污染 | 5 |
| 90 | 2018-04-01 | 25 | 11 | 晴~多云 | 南风 | 1-2级 | 218 | 重度污染 | 5 |
| 61 | 2018-03-03 | 13 | 3 | 多云~阴 | 北风 | 1-2级 | 214 | 重度污染 | 5 |
| 306 | 2018-11-03 | 16 | 6 | 多云 | 南风 | 1-2级 | 206 | 重度污染 | 5 |
| 287 | 2018-10-15 | 17 | 11 | 小雨 | 北风 | 1-2级 | 198 | 中度污染 | 4 |
| 336 | 2018-12-03 | 8 | -3 | 多云~晴 | 东北风 | 3级 | 198 | 中度污染 | 4 |
| 330 | 2018-11-27 | 9 | -3 | 晴~多云 | 西北风 | 2级 | 198 | 中度污染 | 4 |
| 118 | 2018-04-29 | 30 | 16 | 多云 | 南风 | 3-4级 | 193 | 中度污染 | 4 |
| 13 | 2018-01-14 | 6 | -5 | 晴~多云 | 西北风 | 1-2级 | 187 | 中度污染 | 4 |
| 131 | 2018-05-12 | 28 | 16 | 小雨 | 东南风 | 3-4级 | 186 | 中度污染 | 4 |
| 49 | 2018-02-19 | 6 | -3 | 多云 | 南风 | 1-2级 | 183 | 中度污染 | 4 |
| 147 | 2018-05-28 | 30 | 16 | 晴 | 西北风 | 4-5级 | 178 | 中度污染 | 4 |
| 286 | 2018-10-14 | 21 | 10 | 多云 | 南风 | 1-2级 | 177 | 中度污染 | 4 |
| 124 | 2018-05-05 | 25 | 13 | 多云 | 北风 | 3-4级 | 177 | 中度污染 | 4 |
| 294 | 2018-10-22 | 19 | 5 | 多云~晴 | 西北风 | 1-2级 | 176 | 中度污染 | 4 |
| 70 | 2018-03-12 | 15 | 3 | 多云~晴 | 南风 | 1-2级 | 174 | 中度污染 | 4 |
| 176 | 2018-06-26 | 36 | 25 | 晴 | 西南风 | 3-4级 | 174 | 中度污染 | 4 |
| 68 | 2018-03-10 | 14 | -2 | 晴 | 东南风 | 1-2级 | 171 | 中度污染 | 4 |
| 108 | 2018-04-19 | 26 | 13 | 多云 | 东南风 | 4-5级 | 170 | 中度污染 | 4 |
| 109 | 2018-04-20 | 28 | 14 | 多云~小雨 | 南风 | 4-5级 | 164 | 中度污染 | 4 |
| 334 | 2018-12-01 | 7 | 0 | 多云 | 东南风 | 1级 | 163 | 中度污染 | 4 |
| … | … | … | … | … | … | … | … | … | … |
| 274 | 2018-10-02 | 24 | 11 | 晴 | 西北风 | 1-2级 | 31 | 优 | 1 |
| 308 | 2018-11-05 | 10 | 2 | 多云 | 西南风 | 1-2级 | 30 | 优 | 1 |
| 283 | 2018-10-11 | 18 | 5 | 晴~多云 | 北风 | 1-2级 | 30 | 优 | 1 |
| 362 | 2018-12-29 | -3 | -12 | 晴 | 西北风 | 2级 | 29 | 优 | 1 |
| 258 | 2018-09-16 | 25 | 14 | 多云~晴 | 北风 | 1-2级 | 29 | 优 | 1 |
| 204 | 2018-07-24 | 28 | 26 | 暴雨~雷阵雨 | 东北风 | 3-4级 | 29 | 优 | 1 |
| 2 | 2018-01-03 | 2 | -5 | 多云 | 北风 | 1-2级 | 28 | 优 | 1 |
| 3 | 2018-01-04 | 0 | -8 | 阴 | 东北风 | 1-2级 | 28 | 优 | 1 |
| 250 | 2018-09-08 | 27 | 15 | 多云~晴 | 北风 | 1-2级 | 28 | 优 | 1 |
| 205 | 2018-07-25 | 32 | 25 | 多云 | 北风 | 1-2级 | 28 | 优 | 1 |
| 197 | 2018-07-17 | 27 | 23 | 中雨~雷阵雨 | 西风 | 1-2级 | 28 | 优 | 1 |
| 264 | 2018-09-22 | 24 | 13 | 晴 | 西北风 | 3-4级 | 28 | 优 | 1 |
| 266 | 2018-09-24 | 23 | 11 | 晴 | 北风 | 1-2级 | 28 | 优 | 1 |
| 265 | 2018-09-23 | 23 | 12 | 晴 | 西北风 | 4-5级 | 28 | 优 | 1 |
| 319 | 2018-11-16 | 8 | -1 | 晴~多云 | 北风 | 1-2级 | 28 | 优 | 1 |
| 111 | 2018-04-22 | 16 | 12 | 阴~多云 | 东北风 | 3-4级 | 27 | 优 | 1 |
| 24 | 2018-01-25 | -3 | -11 | 多云 | 东北风 | 1-2级 | 27 | 优 | 1 |
| 9 | 2018-01-10 | -2 | -10 | 晴 | 西北风 | 1-2级 | 26 | 优 | 1 |
| 359 | 2018-12-26 | -2 | -11 | 晴~多云 | 东北风 | 2级 | 26 | 优 | 1 |
| 273 | 2018-10-01 | 24 | 12 | 晴 | 北风 | 4-5级 | 25 | 优 | 1 |
| 282 | 2018-10-10 | 17 | 4 | 多云~晴 | 西北风 | 1-2级 | 25 | 优 | 1 |
| 33 | 2018-02-03 | 0 | -9 | 多云 | 北风 | 1-2级 | 24 | 优 | 1 |
| 246 | 2018-09-04 | 31 | 18 | 晴 | 西南风 | 3-4级 | 24 | 优 | 1 |
| 10 | 2018-01-11 | -1 | -10 | 晴 | 北风 | 1-2级 | 24 | 优 | 1 |
| 35 | 2018-02-05 | 0 | -10 | 晴 | 北风 | 3-4级 | 24 | 优 | 1 |
| 249 | 2018-09-07 | 27 | 16 | 晴 | 西北风 | 3-4级 | 22 | 优 | 1 |
| 301 | 2018-10-29 | 15 | 3 | 晴 | 北风 | 3-4级 | 22 | 优 | 1 |
| 272 | 2018-09-30 | 19 | 13 | 多云 | 西北风 | 4-5级 | 22 | 优 | 1 |
| 271 | 2018-09-29 | 22 | 11 | 晴 | 北风 | 3-4级 | 21 | 优 | 1 |
| 281 | 2018-10-09 | 15 | 4 | 多云~晴 | 西北风 | 4-5级 | 21 | 优 | 1 |
365 rows × 9 columns
In [15]:
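# Sort by air quality level, then by daily high temperature; ascending by default
df.sort_values(by=["aqiLevel", "bWendu"])
Out[15]:
| | ymd | bWendu | yWendu | tianqi | fengxiang | fengli | aqi | aqiInfo | aqiLevel |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |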
| 360 | 2018-12-27 | -5 | -12 | 多云~晴 | 西北风 | 3级 | 48 | 优 | 1 |
| 22 | 2018-01-23 | -4 | -12 | 晴 | 西北风 | 3-4级 | 31 | 优 | 1 |
| 23 | 2018-01-24 | -4 | -11 | 晴 | 西南风 | 1-2级 | 34 | 优 | 1 |
| 340 | 2018-12-07 | -4 | -10 | 晴 | 西北风 | 3级 | 33 | 优 | 1 |
| 21 | 2018-01-22 | -3 | -10 | 小雪~多云 | 东风 | 1-2级 | 47 | 优 | 1 |
| 24 | 2018-01-25 | -3 | -11 | 多云 | 东北风 | 1-2级 | 27 | 优 | 1 |
| 25 | 2018-01-26 | -3 | -10 | 晴~多云 | 南风 | 1-2级 | 39 | 优 | 1 |
| 361 | 2018-12-28 | -3 | -11 | 晴 | 西北风 | 3级 | 40 | 优 | 1 |
| 362 | 2018-12-29 | -3 | -12 | 晴 | 西北风 | 2级 | 29 | 优 | 1 |
| 9 | 2018-01-10 | -2 | -10 | 晴 | 西北风 | 1-2级 | 26 | 优 | 1 |
| 339 | 2018-12-06 | -2 | -9 | 晴 | 西北风 | 3级 | 40 | 优 | 1 |
| 341 | 2018-12-08 | -2 | -10 | 晴~多云 | 西北风 | 2级 | 37 | 优 | 1 |
| 359 | 2018-12-26 | -2 | -11 | 晴~多云 | 东北风 | 2级 | 26 | 优 | 1 |
| 363 | 2018-12-30 | -2 | -11 | 晴~多云 | 东北风 | 1级 | 31 | 优 | 1 |
| 10 | 2018-01-11 | -1 | -10 | 晴 | 北风 | 1-2级 | 24 | 优 | 1 |
| 32 | 2018-02-02 | -1 | -9 | 晴 | 北风 | 3-4级 | 32 | 优 | 1 |
| 3 | 2018-01-04 | 0 | -8 | 阴 | 东北风 | 1-2级 | 28 | 优 | 1 |
| 33 | 2018-02-03 | 0 | -9 | 多云 | 北风 | 1-2级 | 24 | 优 | 1 |
| 35 | 2018-02-05 | 0 | -10 | 晴 | 北风 | 3-4级 | 24 | 优 | 1 |
| 8 | 2018-01-09 | 1 | -8 | 晴 | 西北风 | 3-4级 | 34 | 优 | 1 |
| 34 | 2018-02-04 | 1 | -8 | 晴 | 西南风 | 1-2级 | 36 | 优 | 1 |
| 40 | 2018-02-10 | 1 | -9 | 晴 | 西北风 | 3-4级 | 39 | 优 | 1 |
| 345 | 2018-12-12 | 1 | -8 | 晴 | 西南风 | 1级 | 50 | 优 | 1 |
| 1 | 2018-01-02 | 2 | -5 | 阴~多云 | 东北风 | 1-2级 | 49 | 优 | 1 |
| 2 | 2018-01-03 | 2 | -5 | 多云 | 北风 | 1-2级 | 28 | 优 | 1 |
| 5 | 2018-01-06 | 2 | -5 | 多云~阴 | 西南风 | 1-2级 | 32 | 优 | 1 |
| 7 | 2018-01-08 | 2 | -6 | 晴 | 西北风 | 4-5级 | 50 | 优 | 1 |
| 14 | 2018-01-15 | 2 | -5 | 阴 | 东南风 | 1-2级 | 47 | 优 | 1 |
| 4 | 2018-01-05 | 3 | -6 | 多云~晴 | 西北风 | 1-2级 | 50 | 优 | 1 |
| 346 | 2018-12-13 | 3 | -7 | 晴 | 西北风 | 2级 | 42 | 优 | 1 |
| … | … | … | … | … | … | … | … | … | … |
| 330 | 2018-11-27 | 9 | -3 | 晴~多云 | 西北风 | 2级 | 198 | 中度污染 | 4 |
| 56 | 2018-02-26 | 12 | -1 | 晴~多云 | 西南风 | 1-2级 | 157 | 中度污染 | 4 |
| 68 | 2018-03-10 | 14 | -2 | 晴 | 东南风 | 1-2级 | 171 | 中度污染 | 4 |
| 70 | 2018-03-12 | 15 | 3 | 多云~晴 | 南风 | 1-2级 | 174 | 中度污染 | 4 |
| 287 | 2018-10-15 | 17 | 11 | 小雨 | 北风 | 1-2级 | 198 | 中度污染 | 4 |
| 294 | 2018-10-22 | 19 | 5 | 多云~晴 | 西北风 | 1-2级 | 176 | 中度污染 | 4 |
| 286 | 2018-10-14 | 21 | 10 | 多云 | 南风 | 1-2级 | 177 | 中度污染 | 4 |
| 84 | 2018-03-26 | 25 | 7 | 多云 | 西南风 | 1-2级 | 151 | 中度污染 | 4 |
| 124 | 2018-05-05 | 25 | 13 | 多云 | 北风 | 3-4级 | 177 | 中度污染 | 4 |
| 108 | 2018-04-19 | 26 | 13 | 多云 | 东南风 | 4-5级 | 170 | 中度污染 | 4 |
| 109 | 2018-04-20 | 28 | 14 | 多云~小雨 | 南风 | 4-5级 | 164 | 中度污染 | 4 |
| 131 | 2018-05-12 | 28 | 16 | 小雨 | 东南风 | 3-4级 | 186 | 中度污染 | 4 |
| 142 | 2018-05-23 | 29 | 15 | 晴 | 西南风 | 3-4级 | 153 | 中度污染 | 4 |
| 118 | 2018-04-29 | 30 | 16 | 多云 | 南风 | 3-4级 | 193 | 中度污染 | 4 |
| 147 | 2018-05-28 | 30 | 16 | 晴 | 西北风 | 4-5级 | 178 | 中度污染 | 4 |
| 133 | 2018-05-14 | 34 | 22 | 晴~多云 | 南风 | 3-4级 | 158 | 中度污染 | 4 |
| 176 | 2018-06-26 | 36 | 25 | 晴 | 西南风 | 3-4级 | 174 | 中度污染 | 4 |
| 57 | 2018-02-27 | 7 | 0 | 阴 | 东风 | 1-2级 | 220 | 重度污染 | 5 |
| 335 | 2018-12-02 | 9 | 2 | 雾~多云 | 东北风 | 1级 | 234 | 重度污染 | 5 |
| 329 | 2018-11-26 | 10 | 0 | 多云 | 东南风 | 1级 | 245 | 重度污染 | 5 |
| 61 | 2018-03-03 | 13 | 3 | 多云~阴 | 北风 | 1-2级 | 214 | 重度污染 | 5 |
| 316 | 2018-11-13 | 13 | 5 | 多云 | 东南风 | 1-2级 | 219 | 重度污染 | 5 |
| 317 | 2018-11-14 | 13 | 5 | 多云 | 南风 | 1-2级 | 266 | 重度污染 | 5 |
| 72 | 2018-03-14 | 15 | 6 | 多云~阴 | 东北风 | 1-2级 | 293 | 重度污染 | 5 |
| 306 | 2018-11-03 | 16 | 6 | 多云 | 南风 | 1-2级 | 206 | 重度污染 | 5 |
| 71 | 2018-03-13 | 17 | 5 | 晴~多云 | 南风 | 1-2级 | 287 | 重度污染 | 5 |
| 90 | 2018-04-01 | 25 | 11 | 晴~多云 | 南风 | 1-2级 | 218 | 重度污染 | 5 |
| 91 | 2018-04-02 | 26 | 11 | 多云 | 北风 | 1-2级 | 287 | 重度污染 | 5 |
| 85 | 2018-03-27 | 27 | 11 | 晴 | 南风 | 1-2级 | 243 | 重度污染 | 5 |
| 86 | 2018-03-28 | 25 | 9 | 多云~晴 | 东风 | 1-2级 | 387 | 严重污染 | 6 |
365 rows × 9 columns
In [17]:
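# Both fields in descending order
df.sort_values(by=["aqiLevel", "bWendu"], ascending=False)
Out[17]:
| | ymd | bWendu | yWendu | tianqi | fengxiang | fengli | aqi | aqiInfo | aqiLevel |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |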
| 86 | 2018-03-28 | 25 | 9 | 多云~晴 | 东风 | 1-2级 | 387 | 严重污染 | 6 |
| 85 | 2018-03-27 | 27 | 11 | 晴 | 南风 | 1-2级 | 243 | 重度污染 | 5 |
| 91 | 2018-04-02 | 26 | 11 | 多云 | 北风 | 1-2级 | 287 | 重度污染 | 5 |
| 90 | 2018-04-01 | 25 | 11 | 晴~多云 | 南风 | 1-2级 | 218 | 重度污染 | 5 |
| 71 | 2018-03-13 | 17 | 5 | 晴~多云 | 南风 | 1-2级 | 287 | 重度污染 | 5 |
| 306 | 2018-11-03 | 16 | 6 | 多云 | 南风 | 1-2级 | 206 | 重度污染 | 5 |
| 72 | 2018-03-14 | 15 | 6 | 多云~阴 | 东北风 | 1-2级 | 293 | 重度污染 | 5 |
| 61 | 2018-03-03 | 13 | 3 | 多云~阴 | 北风 | 1-2级 | 214 | 重度污染 | 5 |
| 316 | 2018-11-13 | 13 | 5 | 多云 | 东南风 | 1-2级 | 219 | 重度污染 | 5 |
| 317 | 2018-11-14 | 13 | 5 | 多云 | 南风 | 1-2级 | 266 | 重度污染 | 5 |
| 329 | 2018-11-26 | 10 | 0 | 多云 | 东南风 | 1级 | 245 | 重度污染 | 5 |
| 335 | 2018-12-02 | 9 | 2 | 雾~多云 | 东北风 | 1级 | 234 | 重度污染 | 5 |
| 57 | 2018-02-27 | 7 | 0 | 阴 | 东风 | 1-2级 | 220 | 重度污染 | 5 |
| 176 | 2018-06-26 | 36 | 25 | 晴 | 西南风 | 3-4级 | 174 | 中度污染 | 4 |
| 133 | 2018-05-14 | 34 | 22 | 晴~多云 | 南风 | 3-4级 | 158 | 中度污染 | 4 |
| 118 | 2018-04-29 | 30 | 16 | 多云 | 南风 | 3-4级 | 193 | 中度污染 | 4 |
| 147 | 2018-05-28 | 30 | 16 | 晴 | 西北风 | 4-5级 | 178 | 中度污染 | 4 |
| 142 | 2018-05-23 | 29 | 15 | 晴 | 西南风 | 3-4级 | 153 | 中度污染 | 4 |
| 109 | 2018-04-20 | 28 | 14 | 多云~小雨 | 南风 | 4-5级 | 164 | 中度污染 | 4 |
| 131 | 2018-05-12 | 28 | 16 | 小雨 | 东南风 | 3-4级 | 186 | 中度污染 | 4 |
| 108 | 2018-04-19 | 26 | 13 | 多云 | 东南风 | 4-5级 | 170 | 中度污染 | 4 |
| 84 | 2018-03-26 | 25 | 7 | 多云 | 西南风 | 1-2级 | 151 | 中度污染 | 4 |
| 124 | 2018-05-05 | 25 | 13 | 多云 | 北风 | 3-4级 | 177 | 中度污染 | 4 |
| 286 | 2018-10-14 | 21 | 10 | 多云 | 南风 | 1-2级 | 177 | 中度污染 | 4 |
| 294 | 2018-10-22 | 19 | 5 | 多云~晴 | 西北风 | 1-2级 | 176 | 中度污染 | 4 |
| 287 | 2018-10-15 | 17 | 11 | 小雨 | 北风 | 1-2级 | 198 | 中度污染 | 4 |
| 70 | 2018-03-12 | 15 | 3 | 多云~晴 | 南风 | 1-2级 | 174 | 中度污染 | 4 |
| 68 | 2018-03-10 | 14 | -2 | 晴 | 东南风 | 1-2级 | 171 | 中度污染 | 4 |
| 56 | 2018-02-26 | 12 | -1 | 晴~多云 | 西南风 | 1-2级 | 157 | 中度污染 | 4 |
| 330 | 2018-11-27 | 9 | -3 | 晴~多云 | 西北风 | 2级 | 198 | 中度污染 | 4 |
| … | … | … | … | … | … | … | … | … | … |
| 4 | 2018-01-05 | 3 | -6 | 多云~晴 | 西北风 | 1-2级 | 50 | 优 | 1 |
| 346 | 2018-12-13 | 3 | -7 | 晴 | 西北风 | 2级 | 42 | 优 | 1 |
| 1 | 2018-01-02 | 2 | -5 | 阴~多云 | 东北风 | 1-2级 | 49 | 优 | 1 |
| 2 | 2018-01-03 | 2 | -5 | 多云 | 北风 | 1-2级 | 28 | 优 | 1 |
| 5 | 2018-01-06 | 2 | -5 | 多云~阴 | 西南风 | 1-2级 | 32 | 优 | 1 |
| 7 | 2018-01-08 | 2 | -6 | 晴 | 西北风 | 4-5级 | 50 | 优 | 1 |
| 14 | 2018-01-15 | 2 | -5 | 阴 | 东南风 | 1-2级 | 47 | 优 | 1 |
| 8 | 2018-01-09 | 1 | -8 | 晴 | 西北风 | 3-4级 | 34 | 优 | 1 |
| 34 | 2018-02-04 | 1 | -8 | 晴 | 西南风 | 1-2级 | 36 | 优 | 1 |
| 40 | 2018-02-10 | 1 | -9 | 晴 | 西北风 | 3-4级 | 39 | 优 | 1 |
| 345 | 2018-12-12 | 1 | -8 | 晴 | 西南风 | 1级 | 50 | 优 | 1 |
| 3 | 2018-01-04 | 0 | -8 | 阴 | 东北风 | 1-2级 | 28 | 优 | 1 |
| 33 | 2018-02-03 | 0 | -9 | 多云 | 北风 | 1-2级 | 24 | 优 | 1 |
| 35 | 2018-02-05 | 0 | -10 | 晴 | 北风 | 3-4级 | 24 | 优 | 1 |
| 10 | 2018-01-11 | -1 | -10 | 晴 | 北风 | 1-2级 | 24 | 优 | 1 |
| 32 | 2018-02-02 | -1 | -9 | 晴 | 北风 | 3-4级 | 32 | 优 | 1 |
| 9 | 2018-01-10 | -2 | -10 | 晴 | 西北风 | 1-2级 | 26 | 优 | 1 |
| 339 | 2018-12-06 | -2 | -9 | 晴 | 西北风 | 3级 | 40 | 优 | 1 |
| 341 | 2018-12-08 | -2 | -10 | 晴~多云 | 西北风 | 2级 | 37 | 优 | 1 |
| 359 | 2018-12-26 | -2 | -11 | 晴~多云 | 东北风 | 2级 | 26 | 优 | 1 |
| 363 | 2018-12-30 | -2 | -11 | 晴~多云 | 东北风 | 1级 | 31 | 优 | 1 |
| 21 | 2018-01-22 | -3 | -10 | 小雪~多云 | 东风 | 1-2级 | 47 | 优 | 1 |
| 24 | 2018-01-25 | -3 | -11 | 多云 | 东北风 | 1-2级 | 27 | 优 | 1 |
| 25 | 2018-01-26 | -3 | -10 | 晴~多云 | 南风 | 1-2级 | 39 | 优 | 1 |
| 361 | 2018-12-28 | -3 | -11 | 晴 | 西北风 | 3级 | 40 | 优 | 1 |
| 362 | 2018-12-29 | -3 | -12 | 晴 | 西北风 | 2级 | 29 | 优 | 1 |
| 22 | 2018-01-23 | -4 | -12 | 晴 | 西北风 | 3-4级 | 31 | 优 | 1 |
| 23 | 2018-01-24 | -4 | -11 | 晴 | 西南风 | 1-2级 | 34 | 优 | 1 |
| 340 | 2018-12-07 | -4 | -10 | 晴 | 西北风 | 3级 | 33 | 优 | 1 |
| 360 | 2018-12-27 | -5 | -12 | 多云~晴 | 西北风 | 3级 | 48 | 优 | 1 |
365 rows × 9 columns
In [18]:
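# Specify ascending or descending separately for each field
df.sort_values(by=["aqiLevel", "bWendu"], ascending=[True, False])
Out[18]:
| | ymd | bWendu | yWendu | tianqi | fengxiang | fengli | aqi | aqiInfo | aqiLevel |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |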
| 178 | 2018-06-28 | 35 | 24 | 多云~晴 | 北风 | 1-2级 | 33 | 优 | 1 |
| 149 | 2018-05-30 | 33 | 18 | 晴 | 西风 | 1-2级 | 46 | 优 | 1 |
| 206 | 2018-07-26 | 33 | 25 | 多云~雷阵雨 | 东北风 | 1-2级 | 40 | 优 | 1 |
| 158 | 2018-06-08 | 32 | 19 | 多云~雷阵雨 | 西南风 | 1-2级 | 43 | 优 | 1 |
| 205 | 2018-07-25 | 32 | 25 | 多云 | 北风 | 1-2级 | 28 | 优 | 1 |
| 226 | 2018-08-15 | 32 | 24 | 多云 | 东北风 | 3-4级 | 33 | 优 | 1 |
| 231 | 2018-08-20 | 32 | 23 | 多云~晴 | 北风 | 1-2级 | 41 | 优 | 1 |
| 232 | 2018-08-21 | 32 | 22 | 多云 | 北风 | 1-2级 | 38 | 优 | 1 |
| 148 | 2018-05-29 | 31 | 16 | 多云 | 西北风 | 1-2级 | 41 | 优 | 1 |
| 196 | 2018-07-16 | 31 | 24 | 中雨~雷阵雨 | 南风 | 1-2级 | 43 | 优 | 1 |
| 234 | 2018-08-23 | 31 | 21 | 多云 | 北风 | 1-2级 | 43 | 优 | 1 |
| 240 | 2018-08-29 | 31 | 20 | 晴~多云 | 北风 | 3-4级 | 44 | 优 | 1 |
| 246 | 2018-09-04 | 31 | 18 | 晴 | 西南风 | 3-4级 | 24 | 优 | 1 |
| 247 | 2018-09-05 | 31 | 19 | 晴~多云 | 西南风 | 3-4级 | 34 | 优 | 1 |
| 190 | 2018-07-10 | 30 | 22 | 多云~雷阵雨 | 南风 | 1-2级 | 48 | 优 | 1 |
| 220 | 2018-08-09 | 30 | 24 | 多云 | 南风 | 1-2级 | 49 | 优 | 1 |
| 227 | 2018-08-16 | 30 | 21 | 晴~多云 | 东北风 | 1-2级 | 40 | 优 | 1 |
| 235 | 2018-08-24 | 30 | 20 | 晴 | 北风 | 1-2级 | 40 | 优 | 1 |
| 219 | 2018-08-08 | 29 | 24 | 雷阵雨~阴 | 东北风 | 1-2级 | 45 | 优 | 1 |
| 225 | 2018-08-14 | 29 | 24 | 中雨~小雨 | 东北风 | 1-2级 | 42 | 优 | 1 |
| 241 | 2018-08-30 | 29 | 20 | 多云 | 南风 | 1-2级 | 47 | 优 | 1 |
| 242 | 2018-08-31 | 29 | 20 | 多云~阴 | 东南风 | 1-2级 | 48 | 优 | 1 |
| 137 | 2018-05-18 | 28 | 16 | 多云~晴 | 南风 | 1-2级 | 49 | 优 | 1 |
| 204 | 2018-07-24 | 28 | 26 | 暴雨~雷阵雨 | 东北风 | 3-4级 | 29 | 优 | 1 |
| 229 | 2018-08-18 | 28 | 23 | 小雨~中雨 | 北风 | 3-4级 | 40 | 优 | 1 |
| 233 | 2018-08-22 | 28 | 21 | 雷阵雨~多云 | 西南风 | 1-2级 | 48 | 优 | 1 |
| 192 | 2018-07-12 | 27 | 22 | 多云 | 南风 | 1-2级 | 46 | 优 | 1 |
| 197 | 2018-07-17 | 27 | 23 | 中雨~雷阵雨 | 西风 | 1-2级 | 28 | 优 | 1 |
| 243 | 2018-09-01 | 27 | 19 | 阴~小雨 | 南风 | 1-2级 | 50 | 优 | 1 |
| 248 | 2018-09-06 | 27 | 18 | 多云~晴 | 西北风 | 4-5级 | 37 | 优 | 1 |
| … | … | … | … | … | … | … | … | … | … |
| 142 | 2018-05-23 | 29 | 15 | 晴 | 西南风 | 3-4级 | 153 | 中度污染 | 4 |
| 109 | 2018-04-20 | 28 | 14 | 多云~小雨 | 南风 | 4-5级 | 164 | 中度污染 | 4 |
| 131 | 2018-05-12 | 28 | 16 | 小雨 | 东南风 | 3-4级 | 186 | 中度污染 | 4 |
| 108 | 2018-04-19 | 26 | 13 | 多云 | 东南风 | 4-5级 | 170 | 中度污染 | 4 |
| 84 | 2018-03-26 | 25 | 7 | 多云 | 西南风 | 1-2级 | 151 | 中度污染 | 4 |
| 124 | 2018-05-05 | 25 | 13 | 多云 | 北风 | 3-4级 | 177 | 中度污染 | 4 |
| 286 | 2018-10-14 | 21 | 10 | 多云 | 南风 | 1-2级 | 177 | 中度污染 | 4 |
| 294 | 2018-10-22 | 19 | 5 | 多云~晴 | 西北风 | 1-2级 | 176 | 中度污染 | 4 |
| 287 | 2018-10-15 | 17 | 11 | 小雨 | 北风 | 1-2级 | 198 | 中度污染 | 4 |
| 70 | 2018-03-12 | 15 | 3 | 多云~晴 | 南风 | 1-2级 | 174 | 中度污染 | 4 |
| 68 | 2018-03-10 | 14 | -2 | 晴 | 东南风 | 1-2级 | 171 | 中度污染 | 4 |
| 56 | 2018-02-26 | 12 | -1 | 晴~多云 | 西南风 | 1-2级 | 157 | 中度污染 | 4 |
| 330 | 2018-11-27 | 9 | -3 | 晴~多云 | 西北风 | 2级 | 198 | 中度污染 | 4 |
| 336 | 2018-12-03 | 8 | -3 | 多云~晴 | 东北风 | 3级 | 198 | 中度污染 | 4 |
| 334 | 2018-12-01 | 7 | 0 | 多云 | 东南风 | 1级 | 163 | 中度污染 | 4 |
| 13 | 2018-01-14 | 6 | -5 | 晴~多云 | 西北风 | 1-2级 | 187 | 中度污染 | 4 |
| 49 | 2018-02-19 | 6 | -3 | 多云 | 南风 | 1-2级 | 183 | 中度污染 | 4 |
| 85 | 2018-03-27 | 27 | 11 | 晴 | 南风 | 1-2级 | 243 | 重度污染 | 5 |
| 91 | 2018-04-02 | 26 | 11 | 多云 | 北风 | 1-2级 | 287 | 重度污染 | 5 |
| 90 | 2018-04-01 | 25 | 11 | 晴~多云 | 南风 | 1-2级 | 218 | 重度污染 | 5 |
| 71 | 2018-03-13 | 17 | 5 | 晴~多云 | 南风 | 1-2级 | 287 | 重度污染 | 5 |
| 306 | 2018-11-03 | 16 | 6 | 多云 | 南风 | 1-2级 | 206 | 重度污染 | 5 |
| 72 | 2018-03-14 | 15 | 6 | 多云~阴 | 东北风 | 1-2级 | 293 | 重度污染 | 5 |
| 61 | 2018-03-03 | 13 | 3 | 多云~阴 | 北风 | 1-2级 | 214 | 重度污染 | 5 |
| 316 | 2018-11-13 | 13 | 5 | 多云 | 东南风 | 1-2级 | 219 | 重度污染 | 5 |
| 317 | 2018-11-14 | 13 | 5 | 多云 | 南风 | 1-2级 | 266 | 重度污染 | 5 |
| 329 | 2018-11-26 | 10 | 0 | 多云 | 东南风 | 1级 | 245 | 重度污染 | 5 |
| 335 | 2018-12-02 | 9 | 2 | 雾~多云 | 东北风 | 1级 | 234 | 重度污染 | 5 |
| 57 | 2018-02-27 | 7 | 0 | 阴 | 东风 | 1-2级 | 220 | 重度污染 | 5 |
| 86 | 2018-03-28 | 25 | 9 | 多云~晴 | 东风 | 1-2级 | 387 | 严重污染 | 6 |
365 rows × 9 columns
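The multi-column sort above can be hard to follow on 365 rows, so here is a minimal sketch on a four-row frame. The `toy` DataFrame and its values are hypothetical and are not taken from the Beijing weather file; only the column names mirror the ones used above.

```python
import pandas as pd

# Hypothetical toy data; only the column names mirror the weather table
toy = pd.DataFrame({
    "aqiLevel": [2, 1, 2, 1],
    "bWendu": [10, 30, 25, 5],
})

# aqiLevel ascending first, then bWendu descending inside each level
result = toy.sort_values(by=["aqiLevel", "bWendu"], ascending=[True, False])
print(result)
```

The list passed to `ascending` is matched position by position with the list passed to `by`, so the two lists need the same length.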
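We have already used a string-handling function earlier:
df["bWendu"].str.replace("℃", "").astype("int32")
*String handling in Pandas:*
*Reference documentation for the list of Series.str string methods:*
https://pandas.pydata.org/pandas-docs/stable/reference/series.html#string-handling
*What this section demonstrates:*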
In [5]:
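import pandas as pd
In [6]:
fpath = "./pandas-learn-code/datas/beijing_tianqi/beijing_tianqi_2018.csv"
df = pd.read_csv(fpath)
In [8]:
df.head()
Out[8]:
| | ymd | bWendu | yWendu | tianqi | fengxiang | fengli | aqi | aqiInfo | aqiLevel |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |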
| 0 | 2018-01-01 | 3℃ | -6℃ | 晴~多云 | 东北风 | 1-2级 | 59 | 良 | 2 |
| 1 | 2018-01-02 | 2℃ | -5℃ | 阴~多云 | 东北风 | 1-2级 | 49 | 优 | 1 |
| 2 | 2018-01-03 | 2℃ | -5℃ | 多云 | 北风 | 1-2级 | 28 | 优 | 1 |
| 3 | 2018-01-04 | 0℃ | -8℃ | 阴 | 东北风 | 1-2级 | 28 | 优 | 1 |
| 4 | 2018-01-05 | 3℃ | -6℃ | 多云~晴 | 西北风 | 1-2级 | 50 | 优 | 1 |
In [13]:
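df.dtypes
Out[13]:
ymd          object
bWendu       object
yWendu       object
tianqi       object
fengxiang    object
fengli       object
aqi           int64
aqiInfo      object
aqiLevel      int64
dtype: object
In [14]:
df["bWendu"].str
Out[14]:
<pandas.core.strings.StringMethods at 0x25ffbcce898>
In [15]:
# String replacement
df.loc[:, "bWendu"] = df["bWendu"].str.replace("℃","").astype("int32")
In [16]:
df.head()
Out[16]:
| | ymd | bWendu | yWendu | tianqi | fengxiang | fengli | aqi | aqiInfo | aqiLevel |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |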
| 0 | 2018-01-01 | 3 | -6℃ | 晴~多云 | 东北风 | 1-2级 | 59 | 良 | 2 |
| 1 | 2018-01-02 | 2 | -5℃ | 阴~多云 | 东北风 | 1-2级 | 49 | 优 | 1 |
| 2 | 2018-01-03 | 2 | -5℃ | 多云 | 北风 | 1-2级 | 28 | 优 | 1 |
| 3 | 2018-01-04 | 0 | -8℃ | 阴 | 东北风 | 1-2级 | 28 | 优 | 1 |
| 4 | 2018-01-05 | 3 | -6℃ | 多云~晴 | 西北风 | 1-2级 | 50 | 优 | 1 |
In [19]:
# Check whether each value is numeric
df["yWendu"].str.isnumeric()
Out[19]:
0      False
1      False
2      False
3      False
4      False
5      False
6      False
7      False
8      False
9      False
10     False
11     False
12     False
13     False
14     False
15     False
16     False
17     False
18     False
19     False
20     False
21     False
22     False
23     False
24     False
25     False
26     False
27     False
28     False
29     False
       ...
335    False
336    False
337    False
338    False
339    False
340    False
341    False
342    False
343    False
344    False
345    False
346    False
347    False
348    False
349    False
350    False
351    False
352    False
353    False
354    False
355    False
356    False
357    False
358    False
359    False
360    False
361    False
362    False
363    False
364    False
Name: yWendu, Length: 365, dtype: bool
In [21]:
# Calling .str on a numeric column raises an error
df["aqi"].str.len()
. . .
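The error mentioned above is collapsed in the notebook output, so the sketch below reproduces it on a small stand-in Series. The values are hypothetical, and converting with `astype(str)` first is one possible workaround rather than something the original notebook does.

```python
import pandas as pd

# Hypothetical stand-in for the integer "aqi" column
aqi = pd.Series([59, 49, 28], name="aqi")

try:
    aqi.str.len()          # .str is only available on string data
    raised = False
except AttributeError:
    raised = True

# Convert to strings first if the string methods are really needed
lengths = aqi.astype(str).str.len()
print(raised, lengths.tolist())
```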
## 2. Boolean Series returned by str methods such as startswith and contains can be used for conditional queries
In [23]:
# Query the data for March
condition = df["ymd"].str.startswith("2018-03")
In [25]:
condition
. . .
In [27]:
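df[condition].head()
Out[27]:
| | ymd | bWendu | yWendu | tianqi | fengxiang | fengli | aqi | aqiInfo | aqiLevel |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |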
| 59 | 2018-03-01 | 8 | -3℃ | 多云 | 西南风 | 1-2级 | 46 | 优 | 1 |
| 60 | 2018-03-02 | 9 | -1℃ | 晴~多云 | 北风 | 1-2级 | 95 | 良 | 2 |
| 61 | 2018-03-03 | 13 | 3℃ | 多云~阴 | 北风 | 1-2级 | 214 | 重度污染 | 5 |
| 62 | 2018-03-04 | 7 | -2℃ | 阴~多云 | 东南风 | 1-2级 | 144 | 轻度污染 | 3 |
| 63 | 2018-03-05 | 8 | -3℃ | 晴 | 南风 | 1-2级 | 94 | 良 | 2 |
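Boolean Series built this way can also be combined with `&` and `|`. The following sketch assumes a small hypothetical frame (English weather labels instead of the real `tianqi` values) and filters on two string conditions at once.

```python
import pandas as pd

# Hypothetical rows; the real file uses Chinese weather descriptions
toy = pd.DataFrame({
    "ymd": ["2018-02-28", "2018-03-01", "2018-03-02", "2018-04-01"],
    "tianqi": ["cloudy", "cloudy", "sunny", "light rain"],
})

is_march = toy["ymd"].str.startswith("2018-03")
is_cloudy = toy["tianqi"].str.contains("cloudy")

# Each condition is a bool Series aligned on the same index
march_cloudy = toy[is_march & is_cloudy]
print(march_cloudy)
```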
In [28]:
df["ymd"].str.replace("-","")
. . .
In [29]:
# Each call returns a new Series
# str methods cannot be called directly on a Series
df["ymd"].str.replace("-","").slice(0, 6)
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-29-a6e2e7006edf> in <module>
      1 # Each call returns a new Series
----> 2 df["ymd"].str.replace("-","").slice(0, 6)

D:\Tools\Anaconda3\lib\site-packages\pandas\core\generic.py in __getattr__(self, name)
   5065             if self._info_axis._can_hold_identifiers_and_holds_name(name):
   5066                 return self[name]
-> 5067             return object.__getattribute__(self, name)
   5068 
   5069     def __setattr__(self, name, value):

AttributeError: 'Series' object has no attribute 'slice'
In [31]:
# replace returns a Series, so .str has to be applied again before slicing
df["ymd"].str.replace("-","").str.slice(0, 6)
Out[31]:
0      201801
1      201801
2      201801
3      201801
4      201801
5      201801
6      201801
7      201801
8      201801
9      201801
10     201801
11     201801
12     201801
13     201801
14     201801
15     201801
16     201801
17     201801
18     201801
19     201801
20     201801
21     201801
22     201801
23     201801
24     201801
25     201801
26     201801
27     201801
28     201801
29     201801
        ...
335    201812
336    201812
337    201812
338    201812
339    201812
340    201812
341    201812
342    201812
343    201812
344    201812
345    201812
346    201812
347    201812
348    201812
349    201812
350    201812
351    201812
352    201812
353    201812
354    201812
355    201812
356    201812
357    201812
358    201812
359    201812
360    201812
361    201812
362    201812
363    201812
364    201812
Name: ymd, Length: 365, dtype: object
In [32]:
# slice is just the slicing syntax, so it can be written directly
df["ymd"].str.replace("-","").str[0:6]
Out[32]:
0      201801
1      201801
2      201801
3      201801
4      201801
5      201801
6      201801
7      201801
8      201801
9      201801
10     201801
11     201801
12     201801
13     201801
14     201801
15     201801
16     201801
17     201801
18     201801
19     201801
20     201801
21     201801
22     201801
23     201801
24     201801
25     201801
26     201801
27     201801
28     201801
29     201801
        ...
335    201812
336    201812
337    201812
338    201812
339    201812
340    201812
341    201812
342    201812
343    201812
344    201812
345    201812
346    201812
347    201812
348    201812
349    201812
350    201812
351    201812
352    201812
353    201812
354    201812
355    201812
356    201812
357    201812
358    201812
359    201812
360    201812
361    201812
362    201812
363    201812
364    201812
Name: ymd, Length: 365, dtype: object
In [37]:
df.head()
Out[37]:
| | ymd | bWendu | yWendu | tianqi | fengxiang | fengli | aqi | aqiInfo | aqiLevel |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 2018-01-01 | 3 | -6℃ | 晴~多云 | 东北风 | 1-2级 | 59 | 良 | 2 |
| 1 | 2018-01-02 | 2 | -5℃ | 阴~多云 | 东北风 | 1-2级 | 49 | 优 | 1 |
| 2 | 2018-01-03 | 2 | -5℃ | 多云 | 北风 | 1-2级 | 28 | 优 | 1 |
| 3 | 2018-01-04 | 0 | -8℃ | 阴 | 东北风 | 1-2级 | 28 | 优 | 1 |
| 4 | 2018-01-05 | 3 | -6℃ | 多云~晴 | 西北风 | 1-2级 | 50 | 优 | 1 |
In [42]:
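# Add a new column
def get_nianyueri(x):
    year, month, day = x["ymd"].split("-")
    return f"{year}年{month}月{day}日"
df["中文日期"] = df.apply(get_nianyueri, axis=1)
In [40]:
df["中文日期"]
. . .
Question: how do we remove the three Chinese characters 年 (year), 月 (month), and 日 (day) from a string such as "2018年12月31日"?
In [44]:
# Method 1: chained replace
df["中文日期"].str.replace("年","").str.replace("月","").str.replace("日","")
. . .
In [43]:
# Method 2: regular-expression replace
# (pandas 2.0+ treats the pattern literally unless regex=True is passed)
df["中文日期"].str.replace("[年月日]","", regex=True)
. . .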
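Because both outputs above are collapsed, here is a small sketch comparing the two methods on a hypothetical two-element Series. It also shows what happens when the character class is treated as a literal string, which is the default behaviour of `Series.str.replace` in pandas 2.0 and later.

```python
import pandas as pd

# Hypothetical sample values in the same format as the "中文日期" column
dates = pd.Series(["2018年12月31日", "2018年01月01日"])

# Method 1: chain three literal replacements
chained = dates.str.replace("年", "").str.replace("月", "").str.replace("日", "")

# Method 2: one regular-expression replacement with a character class
by_regex = dates.str.replace("[年月日]", "", regex=True)

# With regex=False the pattern "[年月日]" is searched literally and never matches
literal = dates.str.replace("[年月日]", "", regex=False)

print(chained.tolist(), by_regex.tolist(), literal.tolist())
```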
In [2]:
import pandas as pd
import numpy as np
In [7]:
df = pd.DataFrame(np.arange(12).reshape(3,4), columns=["A", "B", "C", "D"])
In [8]:
df
Out[8]:
| | A | B | C | D |
| --- | --- | --- | --- | --- |
| 0 | 0 | 1 | 2 | 3 |
| 1 | 4 | 5 | 6 | 7 |
| 2 | 8 | 9 | 10 | 11 |
In [9]:
# axis=1 here means dropping a column
df.drop("A", axis=1)
Out[9]:
| | B | C | D |
| --- | --- | --- | --- |
| 0 | 1 | 2 | 3 |
| 1 | 5 | 6 | 7 |
| 2 | 9 | 10 | 11 |
In [10]:
# axis=0 here means dropping a row
df.drop(1, axis=0)
Out[10]:
| | A | B | C | D |
| --- | --- | --- | --- | --- |
| 0 | 0 | 1 | 2 | 3 |
| 2 | 8 | 9 | 10 | 11 |
In [11]:
df
Out[11]:
| | A | B | C | D |
| --- | --- | --- | --- | --- |
| 0 | 0 | 1 | 2 | 3 |
| 1 | 4 | 5 | 6 | 7 |
| 2 | 8 | 9 | 10 | 11 |
In [16]:
# axis=0 or axis="index"
df.mean(axis=0)
Out[16]:
A    4.0
B    5.0
C    6.0
D    7.0
dtype: float64
[Figure: pandas-axis-index.png (image unavailable)]
In [21]:
df
Out[21]:
| | A | B | C | D |
| --- | --- | --- | --- | --- |
| 0 | 0 | 1 | 2 | 3 |
| 1 | 4 | 5 | 6 | 7 |
| 2 | 8 | 9 | 10 | 11 |
In [22]:
# axis=1 or axis="columns"
df.mean(axis=1)
Out[22]:
0    1.5
1    5.5
2    9.5
dtype: float64
[Figure: pandas-axis-columns.png (image unavailable)]
In [23]:
def get_sum_value(x):
    return x["A"] + x["B"] + x["C"] + x["D"]
df["sum_value"] = df.apply(get_sum_value, axis=1)
In [24]:
df
Out[24]:
| | A | B | C | D | sum_value |
| --- | --- | --- | --- | --- | --- |
| 0 | 0 | 1 | 2 | 3 | 6 |
| 1 | 4 | 5 | 6 | 7 | 22 |
| 2 | 8 | 9 | 10 | 11 | 38 |
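The axis examples above can be summarised in one runnable sketch. It rebuilds the same 3×4 frame under the hypothetical name `df_axis` and checks that the string aliases behave like the integer axis values; using `sum(axis=1)` in place of the row-wise `apply` is an alternative shown for comparison, not the notebook's own method.

```python
import numpy as np
import pandas as pd

# Same shape and values as the frame above, under a hypothetical name
df_axis = pd.DataFrame(np.arange(12).reshape(3, 4), columns=["A", "B", "C", "D"])

# axis=0 walks down the rows, giving one result per column
col_means = df_axis.mean(axis=0)

# axis=1 walks across the columns, giving one result per row
row_means = df_axis.mean(axis=1)

# The string aliases are interchangeable with 0 and 1
same_as_index = df_axis.mean(axis="index").equals(col_means)
same_as_columns = df_axis.mean(axis="columns").equals(row_means)

# A vectorised equivalent of the row-wise apply used for sum_value
row_sums = df_axis.sum(axis=1)
print(col_means.tolist(), row_means.tolist(), row_sums.tolist())
```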
In [27]:
df["A"]
Out[27]:
0    0
1    4
2    8
Name: A, dtype: int32
Data stored in an ordinary column can also be used for queries, so what is the advantage of using an index?
Summary of what an index is used for:
In [1]:
import pandas as pd
In [2]:
df = pd.read_csv("./pandas-learn-code/datas/ml-latest-small/ratings.csv")
In [3]:
df.head()
Out[3]:
| | userId | movieId | rating | timestamp |
| --- | --- | --- | --- | --- |
| 0 | 1 | 1 | 4.0 | 964982703 |
| 1 | 1 | 3 | 4.0 | 964981247 |
| 2 | 1 | 6 | 4.0 | 964982224 |
| 3 | 1 | 47 | 5.0 | 964983815 |
| 4 | 1 | 50 | 5.0 | 964982931 |
In [4]:
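df.count()
Out[4]:
userId       100836
movieId      100836
rating       100836
timestamp    100836
dtype: int64
In [5]:
# drop=False keeps the index column among the regular columns
# The code below sets userId as the index while also keeping the userId column
df.set_index("userId", inplace=True, drop=False)
In [6]:
df.head()
Out[6]:
| userId | userId | movieId | rating | timestamp |
| --- | --- | --- | --- | --- |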
| 1 | 1 | 1 | 4.0 | 964982703 |
| 1 | 1 | 3 | 4.0 | 964981247 |
| 1 | 1 | 6 | 4.0 | 964982224 |
| 1 | 1 | 47 | 5.0 | 964983815 |
| 1 | 1 | 50 | 5.0 | 964982931 |
In [7]:
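df.index
Out[7]:
Int64Index([  1,   1,   1,   1,   1,   1,   1,   1,   1,   1,
            ...
            610, 610, 610, 610, 610, 610, 610, 610, 610, 610],
           dtype='int64', name='userId', length=100836)
In [8]:
# Querying through the index: pass the value to look up directly to loc[]
# Look up the records of the user whose userId is 500
df.loc[500].head(5)
Out[8]:
| userId | userId | movieId | rating | timestamp |
| --- | --- | --- | --- | --- |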
| 500 | 500 | 1 | 4.0 | 1005527755 |
| 500 | 500 | 11 | 1.0 | 1005528017 |
| 500 | 500 | 39 | 1.0 | 1005527926 |
| 500 | 500 | 101 | 1.0 | 1005527980 |
| 500 | 500 | 104 | 4.0 | 1005528065 |
In [9]:
# Querying through a condition on a column
df.loc[df["userId"]==500].head()
Out[9]:
| userId | userId | movieId | rating | timestamp |
| --- | --- | --- | --- | --- |
| 500 | 500 | 1 | 4.0 | 1005527755 |
| 500 | 500 | 11 | 1.0 | 1005528017 |
| 500 | 500 | 39 | 1.0 | 1005527926 |
| 500 | 500 | 101 | 1.0 | 1005527980 |
| 500 | 500 | 104 | 4.0 | 1005528065 |
[Figure: pandas-index-performance.png (image unavailable)]
In [11]:
# Shuffle the rows randomly
from sklearn.utils import shuffle
df_shuffle = shuffle(df)
In [12]:
df_shuffle.head()
Out[12]:
| userId | userId | movieId | rating | timestamp |
| --- | --- | --- | --- | --- |
| 244 | 244 | 1377 | 4.0 | 975093513 |
| 413 | 413 | 3753 | 5.0 | 1484439911 |
| 280 | 280 | 6539 | 3.5 | 1348435219 |
| 18 | 18 | 86332 | 3.5 | 1455051197 |
| 274 | 274 | 3160 | 2.5 | 1197275106 |
In [13]:
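# Is the index monotonically increasing?
df_shuffle.index.is_monotonic_increasing
Out[13]:
False
In [14]:
# Is the index unique?
df_shuffle.index.is_unique
Out[14]:
False
In [15]:
# Time the lookup of the rows with id == 500
# %timeit runs the statement many times to measure performance
%timeit df_shuffle.loc[500]
366 µs ± 7.09 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [17]:
# Sort df_shuffle by its index
df_sorted = df_shuffle.sort_index()
In [18]:
df_sorted.head()
Out[18]:
| userId | userId | movieId | rating | timestamp |
| --- | --- | --- | --- | --- |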
| 1 | 1 | 3578 | 5.0 | 964980668 |
| 1 | 1 | 2406 | 4.0 | 964982310 |
| 1 | 1 | 110 | 4.0 | 964982176 |
| 1 | 1 | 2090 | 5.0 | 964982838 |
| 1 | 1 | 2096 | 4.0 | 964982838 |
In [19]:
# Is the index monotonically increasing?
df_sorted.index.is_monotonic_increasing
Out[19]:
True
In [20]:
df_sorted.index.is_unique
Out[20]:
False
In [21]:
%timeit df_sorted.loc[500]
178 µs ± 4.55 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Including Series and DataFrame
In [22]:
s1 = pd.Series([1,2,3], index=list("abc"))
In [23]:
s1
Out[23]:
a    1
b    2
c    3
dtype: int64
In [24]:
s2 = pd.Series([2,3,4], index=list("bcd"))
In [25]:
s2
Out[25]:
b    2
c    3
d    4
dtype: int64
In [26]:
s1 + s2
Out[26]:
a    NaN
b    4.0
c    6.0
d    NaN
dtype: float64
*Many powerful index data structures*
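In the `s1 + s2` result above, labels present on only one side come out as NaN. As a hedged aside that is not part of the original notebook, `Series.add` with `fill_value` is one way to treat the missing side as 0 instead:

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], index=list("abc"))
s2 = pd.Series([2, 3, 4], index=list("bcd"))

# Plain addition aligns on the index; unmatched labels become NaN
aligned = s1 + s2

# fill_value substitutes 0 for the side that lacks a label before adding
filled = s1.add(s2, fill_value=0)
print(aligned.tolist(), filled.tolist())
```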
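Merge in Pandas is the equivalent of Join in SQL: it combines different tables into one table by key.
pd.merge(left, right, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False, suffixes=('_x', '_y'), copy=True, indicator=False, validate=None)
Documentation: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html
Outline of this walkthrough:
The MovieLens dataset is a very good dataset for recommender-system research.
It is located in this code directory: ./datas/movielens-1m
It contains three files: ratings.dat, users.dat, and movies.dat.
The three tables can be joined to produce one complete, wide table.
Official dataset page: https://grouplens.org/datasets/movielens/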
In [8]:
import pandas as pd
In [12]:
df_ratings = pd.read_csv(
    "./pandas-learn-code/datas/movielens-1m/ratings.dat",
    sep="::",
    engine="python",
    names="UserID::MovieID::Rating::Timestamp".split("::")
)
In [13]:
df_ratings.head()
Out[13]:
| | UserID | MovieID | Rating | Timestamp |
| --- | --- | --- | --- | --- |
| 0 | 1 | 1193 | 5 | 978300760 |
| 1 | 1 | 661 | 3 | 978302109 |
| 2 | 1 | 914 | 3 | 978301968 |
| 3 | 1 | 3408 | 4 | 978300275 |
| 4 | 1 | 2355 | 5 | 978824291 |
In [14]:
df_users = pd.read_csv(
    "./pandas-learn-code/datas/movielens-1m/users.dat",
    sep="::",
    engine="python",
    names="UserID::Gender::Age::Occupation::Zip-code".split("::")
)
In [15]:
df_users.head()
Out[15]:
| | UserID | Gender | Age | Occupation | Zip-code |
| --- | --- | --- | --- | --- | --- |
| 0 | 1 | F | 1 | 10 | 48067 |
| 1 | 2 | M | 56 | 16 | 70072 |
| 2 | 3 | M | 25 | 15 | 55117 |
| 3 | 4 | M | 45 | 7 | 02460 |
| 4 | 5 | M | 25 | 20 | 55455 |
In [17]:
df_movies = pd.read_csv(
    "./pandas-learn-code/datas/movielens-1m/movies.dat",
    sep="::",
    engine="python",
    names="MovieID::Title::Genres".split("::")
)
In [18]:
df_movies.head()
Out[18]:
| | MovieID | Title | Genres |
| --- | --- | --- | --- |
| 0 | 1 | Toy Story (1995) | Animation\|Children's\|Comedy |
| 1 | 2 | Jumanji (1995) | Adventure\|Children's\|Fantasy |
| 2 | 3 | Grumpier Old Men (1995) | Comedy\|Romance |
| 3 | 4 | Waiting to Exhale (1995) | Comedy\|Drama |
| 4 | 5 | Father of the Bride Part II (1995) | Comedy |
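In [21]:
# inner: a row is kept only when the key exists on both sides
df_ratings_users = pd.merge(
    df_ratings, df_users, left_on="UserID", right_on="UserID", how="inner"
)
In [22]:
df_ratings_users.head()
Out[22]:
| | UserID | MovieID | Rating | Timestamp | Gender | Age | Occupation | Zip-code |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |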
| 0 | 1 | 1193 | 5 | 978300760 | F | 1 | 10 | 48067 |
| 1 | 1 | 661 | 3 | 978302109 | F | 1 | 10 | 48067 |
| 2 | 1 | 914 | 3 | 978301968 | F | 1 | 10 | 48067 |
| 3 | 1 | 3408 | 4 | 978300275 | F | 1 | 10 | 48067 |
| 4 | 1 | 2355 | 5 | 978824291 | F | 1 | 10 | 48067 |
In [25]:
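df_ratings_users_movies = pd.merge(
    df_ratings_users, df_movies, left_on="MovieID", right_on="MovieID", how="inner"
)
In [26]:
df_ratings_users_movies.head()
Out[26]:
| | UserID | MovieID | Rating | Timestamp | Gender | Age | Occupation | Zip-code | Title | Genres |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |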
| 0 | 1 | 1193 | 5 | 978300760 | F | 1 | 10 | 48067 | One Flew Over the Cuckoo’s Nest (1975) | Drama |
| 1 | 2 | 1193 | 5 | 978298413 | M | 56 | 16 | 70072 | One Flew Over the Cuckoo’s Nest (1975) | Drama |
| 2 | 12 | 1193 | 4 | 978220179 | M | 25 | 12 | 32793 | One Flew Over the Cuckoo’s Nest (1975) | Drama |
| 3 | 15 | 1193 | 4 | 978199279 | M | 25 | 7 | 22903 | One Flew Over the Cuckoo’s Nest (1975) | Drama |
| 4 | 17 | 1193 | 5 | 978158471 | M | 50 | 1 | 95350 | One Flew Over the Cuckoo’s Nest (1975) | Drama |
The following relationships need to be understood correctly:
[Figure: pandas-merge-one-to-one.png (image unavailable)]
In [31]:
left = pd.DataFrame({"sno": [11, 12, 13, 14],
                     "name": ["name_a", "name_b", "name_c", "name_d"]})
left
Out[31]:
| | sno | name |
| --- | --- | --- |
| 0 | 11 | name_a |
| 1 | 12 | name_b |
| 2 | 13 | name_c |
| 3 | 14 | name_d |
In [28]:
right = pd.DataFrame({"sno": [11, 12, 13, 14],
                      "age": ["21", "22", "23", "24"]})
right
Out[28]:
| | sno | age |
| --- | --- | --- |
| 0 | 11 | 21 |
| 1 | 12 | 22 |
| 2 | 13 | 23 |
| 3 | 14 | 24 |
In [30]:
# One-to-one relationship: the result has 4 rows
pd.merge(left, right, on="sno")
Out[30]:
| | sno | name | age |
| --- | --- | --- | --- |
| 0 | 11 | name_a | 21 |
| 1 | 12 | name_b | 22 |
| 2 | 13 | name_c | 23 |
| 3 | 14 | name_d | 24 |
Note: data will be duplicated.
[Figure: pandas-merge-one-to-many.png (image unavailable)]
In [32]:
left = pd.DataFrame({"sno": [11, 12, 13, 14],
                     "name": ["name_a", "name_b", "name_c", "name_d"]})
left
Out[32]:
| | sno | name |
| --- | --- | --- |
| 0 | 11 | name_a |
| 1 | 12 | name_b |
| 2 | 13 | name_c |
| 3 | 14 | name_d |
In [33]:
right = pd.DataFrame({"sno": [11, 11, 11, 12, 12, 13],
                      "grade": ["语文88", "数学90", "英语75", "语文66", "数学55", "英语29"]})
right
Out[33]:
| | sno | grade |
| --- | --- | --- |
| 0 | 11 | 语文88 |
| 1 | 11 | 数学90 |
| 2 | 11 | 英语75 |
| 3 | 12 | 语文66 |
| 4 | 12 | 数学55 |
| 5 | 13 | 英语29 |
In [35]:
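# The row count follows the side that has more rows per key
pd.merge(left, right, on="sno")
Out[35]:
| | sno | name | grade |
| --- | --- | --- | --- |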
| 0 | 11 | name_a | 语文88 |
| 1 | 11 | name_a | 数学90 |
| 2 | 11 | name_a | 英语75 |
| 3 | 12 | name_b | 语文66 |
| 4 | 12 | name_b | 数学55 |
| 5 | 13 | name_c | 英语29 |
Note: the number of result rows is multiplied.
[Figure: pandas-merge-many-to-many.png (image unavailable)]
In [36]:
left = pd.DataFrame({"sno": [11, 11, 12, 12, 12],
                     "爱好": ["篮球", "羽毛球", "乒乓球", "篮球", "足球"]})
left
Out[36]:
| | sno | 爱好 |
| --- | --- | --- |
| 0 | 11 | 篮球 |
| 1 | 11 | 羽毛球 |
| 2 | 12 | 乒乓球 |
| 3 | 12 | 篮球 |
| 4 | 12 | 足球 |
In [37]:
right = pd.DataFrame({"sno": [11, 11, 11, 12, 12, 13],
                      "grade": ["语文88", "数学90", "英语75", "语文66", "数学55", "英语29"]})
right
Out[37]:
| | sno | grade |
| --- | --- | --- |
| 0 | 11 | 语文88 |
| 1 | 11 | 数学90 |
| 2 | 11 | 英语75 |
| 3 | 12 | 语文66 |
| 4 | 12 | 数学55 |
| 5 | 13 | 英语29 |
In [38]:
pd.merge(left, right, on="sno")
Out[38]:
| | sno | 爱好 | grade |
| --- | --- | --- | --- |
| 0 | 11 | 篮球 | 语文88 |
| 1 | 11 | 篮球 | 数学90 |
| 2 | 11 | 篮球 | 英语75 |
| 3 | 11 | 羽毛球 | 语文88 |
| 4 | 11 | 羽毛球 | 数学90 |
| 5 | 11 | 羽毛球 | 英语75 |
| 6 | 12 | 乒乓球 | 语文66 |
| 7 | 12 | 乒乓球 | 数学55 |
| 8 | 12 | 篮球 | 语文66 |
| 9 | 12 | 篮球 | 数学55 |
| 10 | 12 | 足球 | 语文66 |
| 11 | 12 | 足球 | 数学55 |
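The multiplication in a many-to-many merge is easy to trigger by accident. The sketch below uses hypothetical English data to count the rows, then uses the `validate` parameter from the `pd.merge` signature to make pandas raise an error when the left keys are not unique. Guarding a merge this way is a suggestion on top of the original notebook, not something it does.

```python
import pandas as pd

# Hypothetical data: sno 11 appears twice on each side, sno 12 once on each side
left = pd.DataFrame({"sno": [11, 11, 12],
                     "hobby": ["basketball", "badminton", "football"]})
right = pd.DataFrame({"sno": [11, 11, 12],
                      "grade": ["Chinese 88", "Math 90", "Chinese 66"]})

# Many-to-many: 2 x 2 rows for sno 11 plus 1 x 1 row for sno 12
merged = pd.merge(left, right, on="sno")

# validate checks the key relationship before merging
try:
    pd.merge(left, right, on="sno", validate="one_to_many")
    raised = False
except pd.errors.MergeError:
    raised = True

print(len(merged), raised)
```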
[Figure: pandas-leftjoin-rightjoin-outerjoin.png (image unavailable)]
In [52]:
left = pd.DataFrame({"key": ["K0", "K1", "K2", "K3"],
                     "A": ["A0", "A1", "A2", "A3"],
                     "B": ["B0", "B1", "B2", "B3"]})
right = pd.DataFrame({"key": ["K0", "K1", "K4", "K5"],
                      "C": ["C0", "C1", "C2", "C3"],
                      "D": ["D0", "D1", "D2", "D3"]})
In [53]:
left
Out[53]:
| | key | A | B |
| --- | --- | --- | --- |
| 0 | K0 | A0 | B0 |
| 1 | K1 | A1 | B1 |
| 2 | K2 | A2 | B2 |
| 3 | K3 | A3 | B3 |
In [54]:
right
Out[54]:
|  | key | C | D |
| 0 | K0 | C0 | D0 |
| 1 | K1 | C1 | D1 |
| 2 | K4 | C2 | D2 |
| 3 | K5 | C3 | D3 |
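Inner join: a key appears in the result only if it exists on both the left and the right side.
In [55]:
# With no on= argument, merge joins on the columns the two frames share, here "key"
pd.merge(left, right, how="inner")
Out[55]:
|  | key | A | B | C | D |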
| 0 | K0 | A0 | B0 | C0 | D0 |
| 1 | K1 | A1 | B1 | C1 | D1 |
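Left join: every left row appears in the result; the right-side columns are NaN where no match exists.
In [56]:
pd.merge(left, right, how="left")
Out[56]:
|  | key | A | B | C | D |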
| 0 | K0 | A0 | B0 | C0 | D0 |
| 1 | K1 | A1 | B1 | C1 | D1 |
| 2 | K2 | A2 | B2 | NaN | NaN |
| 3 | K3 | A3 | B3 | NaN | NaN |
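Right join: every right row appears in the result; the left-side columns are NaN where no match exists.
In [57]:
pd.merge(left, right, how="right")
Out[57]:
|  | key | A | B | C | D |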
| 0 | K0 | A0 | B0 | C0 | D0 |
| 1 | K1 | A1 | B1 | C1 | D1 |
| 2 | K4 | NaN | NaN | C2 | D2 |
| 3 | K5 | NaN | NaN | C3 | D3 |
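Outer join: rows from both sides all appear in the result; columns are NaN wherever no match exists.
In [58]:
pd.merge(left, right, how="outer")
Out[58]:
|  | key | A | B | C | D |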
| 0 | K0 | A0 | B0 | C0 | D0 |
| 1 | K1 | A1 | B1 | C1 | D1 |
| 2 | K2 | A2 | B2 | NaN | NaN |
| 3 | K3 | A3 | B3 | NaN | NaN |
| 4 | K4 | NaN | NaN | C2 | D2 |
| 5 | K5 | NaN | NaN | C3 | D3 |
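To see which side each row of an outer join came from, `pd.merge` accepts `indicator=True`, which adds a `_merge` column. The sketch below is a trimmed-down version of the frames above (only one value column per side, an assumption made to keep it short) and also counts the rows each `how` option returns.

```python
import pandas as pd

left = pd.DataFrame({"key": ["K0", "K1", "K2", "K3"], "A": ["A0", "A1", "A2", "A3"]})
right = pd.DataFrame({"key": ["K0", "K1", "K4", "K5"], "C": ["C0", "C1", "C2", "C3"]})

# indicator=True adds a "_merge" column: "both", "left_only" or "right_only".
outer = pd.merge(left, right, on="key", how="outer", indicator=True)
origin = dict(zip(outer["key"], outer["_merge"].astype(str)))
print(origin)

# Row counts for the four join types on the same inputs.
row_counts = {how: len(pd.merge(left, right, on="key", how=how))
              for how in ["inner", "left", "right", "outer"]}
print(row_counts)
```

Filtering the outer result on `_merge == "left_only"` is a common way to find keys that failed to match.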
In [61]:
left = pd.DataFrame({"key": ["K0", "K1", "K2", "K3"], "A": ["A0", "A1", "A2", "A3"], "B": ["B0", "B1", "B2", "B3"]})
right = pd.DataFrame({"key": ["K0", "K1", "K4", "K5"], "A": ["A10", "A11", "A12", "A13"], "D": ["D0", "D1", "D4", "D5"]})
In [60]:
left
Out[60]:
|  | key | A | B |
| 0 | K0 | A0 | B0 |
| 1 | K1 | A1 | B1 |
| 2 | K2 | A2 | B2 |
| 3 | K3 | A3 | B3 |
In [62]:
right
Out[62]:
|  | key | A | D |
| 0 | K0 | A10 | D0 |
| 1 | K1 | A11 | D1 |
| 2 | K4 | A12 | D4 |
| 3 | K5 | A13 | D5 |
In [64]:
pd.merge(left, right, on="key")
Out[64]:
|  | key | A_x | B | A_y | D |
| 0 | K0 | A0 | B0 | A10 | D0 |
| 1 | K1 | A1 | B1 | A11 | D1 |
In [65]:
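# suffixes is a 2-tuple appended to column names that exist on both sides; the default is ('_x', '_y')
pd.merge(left, right, on="key", suffixes=('_left', '_right'))
Out[65]:
|  | key | A_left | B | A_right | D |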
| 0 | K0 | A0 | B0 | A10 | D0 |
| 1 | K1 | A1 | B1 | A11 | D1 |
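Use cases: batch-merging Excel files that share the same format, adding rows to a DataFrame, and adding columns to a DataFrame.
append only concatenates by rows, never by columns; it is shorthand for a row-wise concat. Note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so the append calls in this section need an older pandas; on current versions use pd.concat instead.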
In [1]:
import pandas as pd
import warnings
warnings.filterwarnings("ignore")
In [2]:
df1 = pd.DataFrame({"A": ["A0", "A1", "A2", "A3"], "B": ["B0", "B1", "B2", "B3"], "C": ["C0", "C1", "C2", "C3"], "D": ["D0", "D1", "D2", "D3"], "E": ["E0", "E1", "E2", "E3"]})
df1
Out[2]:
|  | A | B | C | D | E |
| 0 | A0 | B0 | C0 | D0 | E0 |
| 1 | A1 | B1 | C1 | D1 | E1 |
| 2 | A2 | B2 | C2 | D2 | E2 |
| 3 | A3 | B3 | C3 | D3 | E3 |
In [3]:
df2 = pd.DataFrame({"A": ["A4", "A5", "A6", "A7"], "B": ["B4", "B5", "B6", "B7"], "C": ["C4", "C5", "C6", "C7"], "D": ["D4", "D5", "D6", "D7"], "F": ["F4", "F5", "F6", "F7"]})
df2
Out[3]:
|  | A | B | C | D | F |
| 0 | A4 | B4 | C4 | D4 | F4 |
| 1 | A5 | B5 | C5 | D5 | F5 |
| 2 | A6 | B6 | C6 | D6 | F6 |
| 3 | A7 | B7 | C7 | D7 | F7 |
In [4]:
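# Defaults: axis=0 (stack rows), join="outer" (union of columns), original index labels kept
pd.concat([df1, df2])
Out[4]:
|  | A | B | C | D | E | F |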
| 0 | A0 | B0 | C0 | D0 | E0 | NaN |
| 1 | A1 | B1 | C1 | D1 | E1 | NaN |
| 2 | A2 | B2 | C2 | D2 | E2 | NaN |
| 3 | A3 | B3 | C3 | D3 | E3 | NaN |
| 0 | A4 | B4 | C4 | D4 | NaN | F4 |
| 1 | A5 | B5 | C5 | D5 | NaN | F5 |
| 2 | A6 | B6 | C6 | D6 | NaN | F6 |
| 3 | A7 | B7 | C7 | D7 | NaN | F7 |
In [5]:
pd.concat([df1, df2], ignore_index=True)
Out[5]:
|  | A | B | C | D | E | F |
| 0 | A0 | B0 | C0 | D0 | E0 | NaN |
| 1 | A1 | B1 | C1 | D1 | E1 | NaN |
| 2 | A2 | B2 | C2 | D2 | E2 | NaN |
| 3 | A3 | B3 | C3 | D3 | E3 | NaN |
| 4 | A4 | B4 | C4 | D4 | NaN | F4 |
| 5 | A5 | B5 | C5 | D5 | NaN | F5 |
| 6 | A6 | B6 | C6 | D6 | NaN | F6 |
| 7 | A7 | B7 | C7 | D7 | NaN | F7 |
In [6]:
pd.concat([df1, df2], ignore_index=True, join="inner")
Out[6]:
|  | A | B | C | D |
| 0 | A0 | B0 | C0 | D0 |
| 1 | A1 | B1 | C1 | D1 |
| 2 | A2 | B2 | C2 | D2 |
| 3 | A3 | B3 | C3 | D3 |
| 4 | A4 | B4 | C4 | D4 |
| 5 | A5 | B5 | C5 | D5 |
| 6 | A6 | B6 | C6 | D6 |
| 7 | A7 | B7 | C7 | D7 |
In [7]:
df1
Out[7]:
|  | A | B | C | D | E |
| 0 | A0 | B0 | C0 | D0 | E0 |
| 1 | A1 | B1 | C1 | D1 | E1 |
| 2 | A2 | B2 | C2 | D2 | E2 |
| 3 | A3 | B3 | C3 | D3 | E3 |
In [9]:
s1 = pd.Series(list(range(4)), name="F")
pd.concat([df1, s1], axis=1)
Out[9]:
|  | A | B | C | D | E | F |
| 0 | A0 | B0 | C0 | D0 | E0 | 0 |
| 1 | A1 | B1 | C1 | D1 | E1 | 1 |
| 2 | A2 | B2 | C2 | D2 | E2 | 2 |
| 3 | A3 | B3 | C3 | D3 | E3 | 3 |
In [10]:
s2 = df1.apply(lambda x: x["A"] + "_GG", axis=1)
In [11]:
s2
Out[11]:
0    A0_GG
1    A1_GG
2    A2_GG
3    A3_GG
dtype: object
In [12]:
s2.name = "G"
In [13]:
# The list may contain only Series
pd.concat([s1, s2], axis=1)
Out[13]:
|  | F | G |
| 0 | 0 | A0_GG |
| 1 | 1 | A1_GG |
| 2 | 2 | A2_GG |
| 3 | 3 | A3_GG |
In [14]:
# Series and DataFrames can be mixed in the list, in any order
pd.concat([s1, df1, s2], axis=1)
Out[14]:
|  | F | A | B | C | D | E | G |
| 0 | 0 | A0 | B0 | C0 | D0 | E0 | A0_GG |
| 1 | 1 | A1 | B1 | C1 | D1 | E1 | A1_GG |
| 2 | 2 | A2 | B2 | C2 | D2 | E2 | A2_GG |
| 3 | 3 | A3 | B3 | C3 | D3 | E3 | A3_GG |
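One detail worth keeping in mind with column-wise concat: rows are paired by index label, not by position. The examples above line up only because every object shares the default 0..3 index. The sketch below uses a deliberately shifted, made-up index to show the difference, and resets the index when positional pairing is what is wanted.

```python
import pandas as pd

df = pd.DataFrame({"A": ["A0", "A1", "A2"]}, index=[0, 1, 2])
shifted = pd.Series(["x", "y", "z"], index=[1, 2, 3], name="F")

# axis=1 aligns on index labels (outer join by default), so unmatched labels become NaN.
aligned = pd.concat([df, shifted], axis=1)
print(aligned)

# Reset both indexes first when rows should be paired by position instead.
by_position = pd.concat(
    [df.reset_index(drop=True), shifted.reset_index(drop=True)], axis=1
)
print(by_position)
```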
In [15]:
df1 = pd.DataFrame([[1, 2], [3, 4]], columns=list("AB"))
df1
Out[15]:
|  | A | B |
| 0 | 1 | 2 |
| 1 | 3 | 4 |
In [16]:
df2 = pd.DataFrame([[5, 6], [7, 8]], columns=list("AB"))
df2
Out[16]:
|  | A | B |
| 0 | 5 | 6 |
| 1 | 7 | 8 |
In [18]:
# Requires pandas < 2.0; equivalent to pd.concat([df1, df2])
df1.append(df2)
Out[18]:
|  | A | B |
| 0 | 1 | 2 |
| 1 | 3 | 4 |
| 0 | 5 | 6 |
| 1 | 7 | 8 |
In [19]:
# Requires pandas < 2.0; equivalent to pd.concat([df1, df2], ignore_index=True)
df1.append(df2, ignore_index=True)
Out[19]:
|  | A | B |
| 0 | 1 | 2 |
| 1 | 3 | 4 |
| 2 | 5 | 6 |
| 3 | 7 | 8 |
In [21]:
# Create an empty DataFrame
df = pd.DataFrame(columns=["A"])
df
Out[21]:
In [22]:
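for i in range(5):
    # Note: every append copies the whole DataFrame.
    # The dict key must match the column name "A"; a lowercase "a" would create a second column and leave A as NaN.
    df = df.append({"A": i}, ignore_index=True)
df
Out[22]:
|  | A |
| 0 | 0 |
| 1 | 1 |
| 2 | 2 |
| 3 | 3 |
| 4 | 4 |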
In [23]:
# Approach 1: build the pieces first, then concatenate them in a single call
pd.concat([pd.DataFrame([i], columns=["A"]) for i in range(5)], ignore_index=True)
Out[23]:
|  | A |
| 0 | 0 |
| 1 | 1 |
| 2 | 2 |
| 3 | 3 |
| 4 | 4 |
In [27]:
ss = pd.DataFrame(i for i in range(5))
ss
Out[27]:
|  | 0 |
| 0 | 0 |
| 1 | 1 |
| 2 | 2 |
| 3 | 3 |
| 4 | 4 |
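Because each append (or concat) inside a loop copies everything accumulated so far, a common alternative is to collect plain Python records and build the DataFrame once at the end. A minimal sketch, with a hypothetical second column `A_squared` added only to show that records can carry several fields:

```python
import pandas as pd

# Collect plain Python records first; appending to a list is cheap.
rows = []
for i in range(5):
    rows.append({"A": i, "A_squared": i * i})

# Build the DataFrame once instead of growing it inside the loop.
df = pd.DataFrame(rows)
print(df)
```

This also works on pandas 2.x, where DataFrame.append no longer exists.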
Worked example: splitting one Excel file and merging the pieces back
In [51]:
work_dir = "D:/WinterIsComing/python/New_Wave/pandas_basic/15.excel_split_merge"
# Directory that holds the small files produced by the split
splits_dir = f"{work_dir}/splits"
In [52]:
import os
if not os.path.exists(splits_dir):
    os.mkdir(splits_dir)
In [54]:
import pandas as pd
In [53]:
df_source = pd.read_excel(r"D:/WinterIsComing/python/New_Wave/pandas_basic/15.excel_split_merge/crazyant_blog_articles_source.xlsx")
In [55]:
df_source.head()
Out[55]:
| 0 | 2585 | Tensorflow怎样接收变长列表特征 | python,tensorflow,特征工程 |
| 1 | 2583 | Pandas实现数据的合并concat | pandas,python,数据分析 |
| 2 | 2574 | Pandas的Index索引有什么用途? | pandas,python,数据分析 |
| 3 | 2564 | 机器学习常用数据集大全 | python,机器学习 |
| 4 | 2561 | 一个数据科学家的修炼路径 | 数据分析 |
In [56]:
df_source.index
Out[56]:
RangeIndex(start=0, stop=258, step=1)
In [57]:
# 258 rows, 3 columns
df_source.shape
Out[57]:
(258, 3)
In [58]:
# df_source.shape is the tuple (258, 3)
# df_source.shape[0] is the number of rows
total_row_count = df_source.shape[0]
total_row_count
Out[58]:
258
In [59]:
# Split one large Excel file among these people
user_names = ["A", "B", "C", "D", "E", "F"]
In [60]:
# Number of rows per person
split_size = total_row_count // len(user_names)
# If there is a remainder, round the chunk size up so the leftover rows go to the earlier people and every row is assigned
if total_row_count % len(user_names) != 0:
    split_size += 1
split_size
Out[60]:
43
In [64]:
df_subs = []
for idx, user_name in enumerate(user_names):
    # start position for iloc
    begin = idx * split_size
    # end position for iloc (exclusive)
    end = begin + split_size
    # slice the DataFrame by position with iloc
    df_sub = df_source.iloc[begin:end]
    # store each sub-DataFrame in the list
    df_subs.append((idx, user_name, df_sub))
df_subs[0][2].head(5)
Out[64]:
| 0 | 2585 | Tensorflow怎样接收变长列表特征 | python,tensorflow,特征工程 |
| 1 | 2583 | Pandas实现数据的合并concat | pandas,python,数据分析 |
| 2 | 2574 | Pandas的Index索引有什么用途? | pandas,python,数据分析 |
| 3 | 2564 | 机器学习常用数据集大全 | python,机器学习 |
| 4 | 2561 | 一个数据科学家的修炼路径 | 数据分析 |
In [65]:
df_subs[1][2].head(5)
Out[65]:
| 43 | 2120 | Zookeeper并不保证读取的是最新数据 | zookeeper |
| 44 | 2089 | Mybatis源码解读-初始化过程详解 | mybatis |
| 45 | 2076 | 怎样借助Python爬虫给宝宝起个好名字 | python,爬虫 |
| 46 | 2022 | Mybatis源码解读-设计模式总结 | mybatis,设计模式 |
| 47 | 2012 | 打工者心态、主人公意识、个人公司品牌 | 程序人生 |
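In the data above, 258 rows divide evenly among 6 people, so the remainder branch of the split-size calculation never runs. The sketch below replays the same arithmetic on a made-up 10-row frame and 3 hypothetical users, where the remainder matters, to show that rounding up plus `iloc` slicing still assigns every row exactly once.

```python
import pandas as pd

df_source = pd.DataFrame({"id": range(10)})  # stand-in for the Excel data
user_names = ["A", "B", "C"]

# Ceiling division: the smallest chunk size that covers every row.
split_size = -(-len(df_source) // len(user_names))

chunks = {}
for idx, user_name in enumerate(user_names):
    begin = idx * split_size
    # iloc slicing stops quietly at the last row, so the final chunk may be shorter.
    chunks[user_name] = df_source.iloc[begin:begin + split_size]

sizes = {name: len(chunk) for name, chunk in chunks.items()}
print(split_size, sizes)
```

The last person simply receives fewer rows; no row is dropped or duplicated.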
In [63]:
for idx, user_name, df_sub in df_subs:
    file_name = f"{splits_dir}/spike_pandas_{idx}_{user_name}.xlsx"
    df_sub.to_excel(file_name, index=False)
In [66]:
import os
excel_names = []
# os.listdir returns the names of all files and folders in the given directory
for excel_name in os.listdir(splits_dir):
    excel_names.append(excel_name)
excel_names
Out[66]:
['spike_pandas_0_A.xlsx', 'spike_pandas_1_B.xlsx', 'spike_pandas_2_C.xlsx', 'spike_pandas_3_D.xlsx', 'spike_pandas_4_E.xlsx', 'spike_pandas_5_F.xlsx']
In [70]:
df_list = []
for excel_name in excel_names:
    # read each Excel file into a DataFrame
    excel_path = f"{splits_dir}/{excel_name}"
    df_split = pd.read_excel(excel_path)
    # extract the username by slicing the file name
    username = excel_name.replace("spike_pandas_", "").replace(".xlsx", "")[2:]
    # print(username)
    # add a username column to df_split
    df_split["username"] = username
    df_list.append(df_split)
In [71]:
df_merged = pd.concat(df_list)
In [72]:
df_merged.shape
Out[72]:
(258, 4)
In [74]:
df_merged.head()
Out[74]:
| 0 | 2585 | Tensorflow怎样接收变长列表特征 | python,tensorflow,特征工程 | A |
| 1 | 2583 | Pandas实现数据的合并concat | pandas,python,数据分析 | A |
| 2 | 2574 | Pandas的Index索引有什么用途? | pandas,python,数据分析 | A |
| 3 | 2564 | 机器学习常用数据集大全 | python,机器学习 | A |
| 4 | 2561 | 一个数据科学家的修炼路径 | 数据分析 | A |
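The merge loop above recovers the username with a fixed-width slice (`[2:]`), which assumes the index in the file name is a single digit. The sketch below shows one possible alternative; `parse_username` is a hypothetical helper and `spike_pandas_12_M.xlsx` is a made-up file name used only to exercise a two-digit index.

```python
from pathlib import Path


def parse_username(excel_name: str) -> str:
    """Return the user part of a name like 'spike_pandas_3_D.xlsx'."""
    stem = Path(excel_name).stem  # e.g. 'spike_pandas_3_D'
    # Split once from the right so the index may have any number of digits.
    return stem.rsplit("_", 1)[-1]


names = ["spike_pandas_0_A.xlsx", "spike_pandas_5_F.xlsx", "spike_pandas_12_M.xlsx"]
parsed = [parse_username(name) for name in names]
print(parsed)

# The fixed-width slice only works while the index is a single digit.
sliced = [n.replace("spike_pandas_", "").replace(".xlsx", "")[2:] for n in names]
print(sliced)
```

Splitting from the right assumes usernames contain no underscore; if they might, a different separator in the file name would be safer.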
In [76]:
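df_merged["username"].value_counts()
Out[76]:
B    43
F    43
D    43
E    43
A    43
C    43
Name: username, dtype: int64
In [77]:
df_merged.to_excel(f"{work_dir}/spike_pandas_merged.xlsx", index=False)
groupby is similar to this SQL:
select city,max(temperature) from city_weather group by city;
groupby: first split the data into groups, then apply an aggregation or transformation function to each group
This demonstration covers:
1. Grouping with aggregation functions to compute statistics
2. Iterating over the groupby result to understand the execution flow
3. A worked example: exploring weather data by group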
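As a small illustration of the two kinds of function mentioned above, the sketch below mirrors the SQL statement on a made-up `city_weather` frame (the cities and temperatures are invented). An aggregation returns one row per group, while a transformation returns one value per original row.

```python
import pandas as pd

city_weather = pd.DataFrame({
    "city": ["Beijing", "Beijing", "Shanghai", "Shanghai", "Shanghai"],
    "temperature": [30, 35, 28, 33, 31],
})

# Aggregation: one row per group, like
# "select city, max(temperature) from city_weather group by city".
max_per_city = city_weather.groupby("city")["temperature"].max()
print(max_per_city)

# Transformation: one value per original row, aligned with the input index.
city_weather["city_max"] = city_weather.groupby("city")["temperature"].transform("max")
print(city_weather)
```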
In [1]:
import pandas as pd
import numpy as np
# This line lets Jupyter Notebook display matplotlib charts inline
%matplotlib inline
In [4]:
df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
                   'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
                   'C': np.random.randn(8),
                   'D': np.random.randn(8)})
df
Out[4]:
|  | A | B | C | D |
| 0 | foo | one | -0.102369 | 0.042233 |
| 1 | bar | one | 1.552845 | -0.623522 |
| 2 | foo | two | 0.770077 | 0.205682 |
| 3 | bar | three | -1.989910 | -0.617111 |
| 4 | foo | two | 1.230455 | -0.422428 |
| 5 | bar | two | -0.697516 | -0.964579 |
| 6 | foo | one | -0.939646 | -0.414017 |
| 7 | foo | three | 0.763570 | 0.451086 |
In [5]:
df.groupby("A").sum()
Out[5]:
| A | C | D |
| bar | -1.134582 | -2.205211 |
| foo | 1.722086 | -0.137444 |
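What we see: the grouping column A becomes the index of the result, and the non-numeric column B is left out of the sum (on pandas 2.0 and later, pass numeric_only=True to get the same result).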
In [6]:
# Group by A and B, which become the index, and compute the mean of C and D
df.groupby(["A", "B"]).mean()
Out[6]:
| A | B | C | D |
| bar | one | 1.552845 | -0.623522 |
|  | three | -1.989910 | -0.617111 |
|  | two | -0.697516 | -0.964579 |
| foo | one | -0.521008 | -0.185892 |
|  | three | 0.763570 | 0.451086 |
|  | two | 1.000266 | -0.108373 |
In [7]:
# Keep A and B as regular columns instead of making them the index
df.groupby(["A", "B"], as_index=False).mean()
Out[7]:
|  | A | B | C | D |
| 0 | bar | one | 1.552845 | -0.623522 |
| 1 | bar | three | -1.989910 | -0.617111 |
| 2 | bar | two | -0.697516 | -0.964579 |
| 3 | foo | one | -0.521008 | -0.185892 |
| 4 | foo | three | 0.763570 | 0.451086 |
| 5 | foo | two | 1.000266 | -0.108373 |
In [8]:
df.groupby("A").agg([np.sum, np.mean, np.std])
Out[8]:
|  | C | C | C | D | D | D |
|  | sum | mean | std | sum | mean | std |
| A |  |  |  |  |  |  |
| bar | -1.134582 | -0.378194 | 1.792834 | -2.205211 | -0.735070 | 0.198786 |
| foo | 1.722086 | 0.344417 | 0.864635 | -0.137444 | -0.027489 | 0.385242 |
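What we see: the columns have become a multi-level index (MultiIndex).
In [10]:
# Method 1: select the column before aggregating, which performs better
df.groupby("A")["C"].agg([np.sum, np.mean, np.std])
Out[10]:
| A | sum | mean | std |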
| bar | -1.134582 | -0.378194 | 1.792834 |
| foo | 1.722086 | 0.344417 | 0.864635 |
In [9]:
# Method 2: aggregate every column, then select C from the result
df.groupby("A").agg([np.sum, np.mean, np.std])["C"]
Out[9]:
| A | sum | mean | std |
| bar | -1.134582 | -0.378194 | 1.792834 |
| foo | 1.722086 | 0.344417 | 0.864635 |
In [12]:
# Pass a dict to apply a different aggregation function to each column
df.groupby("A").agg({"C": np.sum, "D": np.mean})
Out[12]:
| A | C | D |
| bar | -1.134582 | -0.735070 |
| foo | 1.722086 | -0.027489 |
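The dict form above can also be written with string function names, and pandas additionally offers named aggregation, where each keyword names an output column. The sketch below uses a small made-up frame; the output names `c_sum` and `d_mean` are arbitrary choices. As far as I know, newer pandas releases may emit a warning when NumPy callables such as np.sum are passed to agg, which is one reason to prefer the strings.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["foo", "bar", "foo", "bar"],
    "C": [1.0, 2.0, 3.0, 4.0],
    "D": [10.0, 20.0, 30.0, 40.0],
})

# Dict form with string names instead of NumPy callables.
by_dict = df.groupby("A").agg({"C": "sum", "D": "mean"})
print(by_dict)

# Named aggregation: output_name=(input_column, function).
by_name = df.groupby("A").agg(c_sum=("C", "sum"), d_mean=("D", "mean"))
print(by_name)
```

Named aggregation keeps the result columns flat, which avoids the MultiIndex columns produced by passing a list of functions.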
A for loop can iterate over the groups directly
In [13]:
g = df.groupby("A")
g
Out[13]:
<pandas.core.groupby.generic.DataFrameGroupBy object at 0x0000024E95FCC320>
In [16]:
df
Out[16]:
|  | A | B | C | D |
| 0 | foo | one | -0.102369 | 0.042233 |
| 1 | bar | one | 1.552845 | -0.623522 |
| 2 | foo | two | 0.770077 | 0.205682 |
| 3 | bar | three | -1.989910 | -0.617111 |
| 4 | foo | two | 1.230455 | -0.422428 |
| 5 | bar | two | -0.697516 | -0.964579 |
| 6 | foo | one | -0.939646 | -0.414017 |
| 7 | foo | three | 0.763570 | 0.451086 |
In [15]:
for name, group in g:
    print(name)
    print(group)
    print()
# name: "bar" and "foo"
# group: one DataFrame per name
bar
     A      B         C         D
1  bar    one  1.552845 -0.623522
3  bar  three -1.989910 -0.617111
5  bar    two -0.697516 -0.964579

foo
     A      B         C         D
0  foo    one -0.102369  0.042233
2  foo    two  0.770077  0.205682
4  foo    two  1.230455 -0.422428
6  foo    one -0.939646 -0.414017
7  foo  three  0.763570  0.451086

*The data of a single group can be retrieved*
In [17]:
g.get_group("bar")
Out[17]:
|  | A | B | C | D |
| 1 | bar | one | 1.552845 | -0.623522 |
| 3 | bar | three | -1.989910 | -0.617111 |
| 5 | bar | two | -0.697516 | -0.964579 |
In [20]:
g = df.groupby(["A", "B"])
g
Out[20]:
<pandas.core.groupby.generic.DataFrameGroupBy object at 0x0000024E95ED3E80>
In [21]:
for name, group in g:
    print(name)
    print(group)
    print()
# the group name is now a tuple
('bar', 'one')
     A    B         C         D
1  bar  one  1.552845 -0.623522

('bar', 'three')
     A      B        C         D
3  bar  three -1.98991 -0.617111

('bar', 'two')
     A    B         C         D
5  bar  two -0.697516 -0.964579

('foo', 'one')
     A    B         C         D
0  foo  one -0.102369  0.042233
6  foo  one -0.939646 -0.414017

('foo', 'three')
     A      B        C         D
7  foo  three  0.76357  0.451086

('foo', 'two')
     A    B         C         D
2  foo  two  0.770077  0.205682
4  foo  two  1.230455 -0.422428

As we can see, name is a 2-element tuple, with one element per grouping column
In [22]:
g.get_group(("foo", "one"))
Out[22]:
|  | A | B | C | D |
| 0 | foo | one | -0.102369 | 0.042233 |
| 6 | foo | one | -0.939646 | -0.414017 |
*You can select one or more columns from the grouped object; each group then yields a Series or a sub-DataFrame*
In [24]:
# Get a SeriesGroupBy
g["C"]
Out[24]:
<pandas.core.groupby.generic.SeriesGroupBy object at 0x0000024E95C42828>
In [25]:
for name, group in g["C"]:
    print(name)
    print(group)
    print(type(group))
    print()
('bar', 'one')
1    1.552845
Name: C, dtype: float64
<class 'pandas.core.series.Series'>

('bar', 'three')
3   -1.98991
Name: C, dtype: float64
<class 'pandas.core.series.Series'>

('bar', 'two')
5   -0.697516
Name: C, dtype: float64
<class 'pandas.core.series.Series'>

('foo', 'one')
0   -0.102369
6   -0.939646
Name: C, dtype: float64
<class 'pandas.core.series.Series'>

('foo', 'three')
7    0.76357
Name: C, dtype: float64
<class 'pandas.core.series.Series'>

('foo', 'two')
2    0.770077
4    1.230455
Name: C, dtype: float64
<class 'pandas.core.series.Series'>

In fact, every aggregation is ultimately computed on a DataFrame or a Series.
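The closing remark above can be made concrete: since each group is an ordinary Series, aggregating by hand inside the loop gives the same numbers as the built-in groupby aggregation. A minimal sketch on made-up values:

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["foo", "bar", "foo", "bar", "foo"],
    "C": [1.0, 2.0, 3.0, 4.0, 8.0],
})

# Aggregate by hand: each group is a plain Series, so Series.mean() applies.
manual = {name: group.mean() for name, group in df.groupby("A")["C"]}
print(manual)

# The built-in aggregation produces the same values.
builtin = df.groupby("A")["C"].mean()
print(builtin.to_dict())
```

The loop form is mainly useful for understanding or debugging; the built-in aggregation is the one to use on real data.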
In [27]:
fpath = "./pandas-learn-code/datas/beijing_tianqi/beijing_tianqi_2018.csv"
df = pd.read_csv(fpath)
df.head()
Out[27]:
| 0 | 2018-01-01 | 3℃ | -6℃ | 晴~多云 | 东北风 | 1-2级 | 59 | 良 | 2 |
| 1 | 2018-01-02 | 2℃ | -5℃ | 阴~多云 | 东北风 | 1-2级 | 49 | 优 | 1 |
| 2 | 2018-01-03 | 2℃ | -5℃ | 多云 | 北风 | 1-2级 | 28 | 优 | 1 |
| 3 | 2018-01-04 | 0℃ | -8℃ | 阴 | 东北风 | 1-2级 | 28 | 优 | 1 |
| 4 | 2018-01-05 | 3℃ | -6℃ | 多云~晴 | 西北风 | 1-2级 | 50 | 优 | 1 |
In [28]:
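# Strip the ℃ suffix from the temperature columns
# (assign with df[...] rather than df.loc[:, ...] so the int32 dtype is kept on newer pandas versions)
df["bWendu"] = df["bWendu"].str.replace("℃", "").astype("int32")
df["yWendu"] = df["yWendu"].str.replace("℃", "").astype("int32")
df.head()
Out[28]: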
| 0 | 2018-01-01 | 3 | -6 | 晴~多云 | 东北风 | 1-2级 | 59 | 良 | 2 |
| 1 | 2018-01-02 | 2 | -5 | 阴~多云 | 东北风 | 1-2级 | 49 | 优 | 1 |
| 2 | 2018-01-03 | 2 | -5 | 多云 | 北风 | 1-2级 | 28 | 优 | 1 |
| 3 | 2018-01-04 | 0 | -8 | 阴 | 东北风 | 1-2级 | 28 | 优 | 1 |
| 4 | 2018-01-05 | 3 | -6 | 多云~晴 | 西北风 | 1-2级 | 50 | 优 | 1 |
In [29]:
# Add a month column
df["month"] = df["ymd"].str[:7]
df.head()
Out[29]:
| 0 | 2018-01-01 | 3 | -6 | 晴~多云 | 东北风 | 1-2级 | 59 | 良 | 2 | 2018-01 |
| 1 | 2018-01-02 | 2 | -5 | 阴~多云 | 东北风 | 1-2级 | 49 | 优 | 1 | 2018-01 |
| 2 | 2018-01-03 | 2 | -5 | 多云 | 北风 | 1-2级 | 28 | 优 | 1 | 2018-01 |
| 3 | 2018-01-04 | 0 | -8 | 阴 | 东北风 | 1-2级 | 28 | 优 | 1 | 2018-01 |
| 4 | 2018-01-05 | 3 | -6 | 多云~晴 | 西北风 | 1-2级 | 50 | 优 | 1 | 2018-01 |
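The two preparation steps above (stripping the unit and deriving a month label) can be sketched on a tiny made-up frame. Only the column names `ymd`, `bWendu` and `yWendu` come from the data above; the three rows are invented. The month is derived here by parsing the dates rather than slicing the string, which is an alternative to the approach used above, not a replacement for it.

```python
import pandas as pd

# Three invented rows that reuse the column names from the weather data.
df = pd.DataFrame({
    "ymd": ["2018-01-01", "2018-01-02", "2018-02-01"],
    "bWendu": ["3℃", "2℃", "12℃"],
    "yWendu": ["-6℃", "-5℃", "-3℃"],
})

# Assign whole columns so the integer dtype replaces the old string dtype.
for col in ["bWendu", "yWendu"]:
    df[col] = df[col].str.replace("℃", "", regex=False).astype("int32")

# Parse the dates, then format them as year-month labels.
df["month"] = pd.to_datetime(df["ymd"]).dt.strftime("%Y-%m")

monthly_max = df.groupby("month")["bWendu"].max()
print(df.dtypes)
print(monthly_max)
```

Parsing with `pd.to_datetime` also fails loudly on malformed dates, which string slicing would pass through silently.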
In [31]:
data = df.groupby("month")["bWendu"].max()
data
Out[31]:
month
2018-01     7
2018-02    12
2018-03    27
2018-04    30
2018-05    35
2018-06    38
2018-07    37
2018-08    36
2018-09    31
2018-10    25
2018-11    18
2018-12    10
Name: bWendu, dtype: int32
In [32]:
type(data)
Out[32]:
pandas.core.series.Series
In [34]:
data.plot()
Out[34]:
<matplotlib.axes._subplots.AxesSubplot at 0x24e95bbe240>
[Figure: line plot of the monthly maximum of bWendu, 2018-01 through 2018-12]
In [35]:
df.head()
Out[35]:
| 0 | 2018-01-01 | 3 | -6 | 晴~多云 | 东北风 | 1-2级 | 59 | 良 | 2 | 2018-01 |
| 1 | 2018-01-02 | 2 | -5 | 阴~多云 | 东北风 | 1-2级 | 49 | 优 | 1 | 2018-01 |
| 2 | 2018-01-03 | 2 | -5 | 多云 | 北风 | 1-2级 | 28 | 优 | 1 | 2018-01 |
| 3 | 2018-01-04 | 0 | -8 | 阴 | 东北风 | 1-2级 | 28 | 优 | 1 | 2018-01 |
| 4 | 2018-01-05 | 3 | -6 | 多云~晴 | 西北风 | 1-2级 | 50 | 优 | 1 | 2018-01 |
In [38]:
group_data = df.groupby("month").agg({"bWendu": np.max, "yWendu": np.min, "aqi": np.mean})
group_data
Out[38]:
| month | bWendu | yWendu | aqi |
| 2018-01 | 7 | -12 | 60.677419 |
| 2018-02 | 12 | -10 | 78.857143 |
| 2018-03 | 27 | -4 | 130.322581 |
| 2018-04 | 30 | 1 | 102.866667 |
| 2018-05 | 35 | 10 | 99.064516 |
| 2018-06 | 38 | 17 | 82.300000 |
| 2018-07 | 37 | 22 | 72.677419 |
| 2018-08 | 36 | 20 | 59.516129 |
| 2018-09 | 31 | 11 | 50.433333 |
| 2018-10 | 25 | 1 | 67.096774 |
| 2018-11 | 18 | -4 | 105.100000 |
| 2018-12 | 10 | -12 | 77.354839 |
In [39]:
group_data.plot()
Out[39]:
<matplotlib.axes._subplots.AxesSubplot at 0x24e963082b0>
[Figure: plot output; the embedded image failed to transfer and is not available]
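Why learn the hierarchical index, MultiIndex?
Demo data: stock data for four companies, Baidu, Alibaba, iQIYI, and JD.com (the file loaded below holds 12 rows, three trading days per company)
Data source: Investing.com (英为财经)
https://cn.investing.com/
Outline of this demo:
1. MultiIndex on a Series
2. How to select data from a Series with a multi-level index
3. MultiIndex on a DataFrame
4. How to select data from a DataFrame with a multi-level index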
In [7]:
import pandas as pd
%matplotlib inline
In [8]:
fpath = "./pandas-learn-code/datas/stocks/互联网公司股票.xlsx"
stocks = pd.read_excel(fpath)
In [10]:
stocks.shape
Out[10]:
(12, 8)
In [5]:
stocks.head()
Out[5]:
| | 日期 | 公司 | 收盘 | 开盘 | 高 | 低 | 交易量 | 涨跌幅 |
| 0 | 2019-10-03 | BIDU | 104.32 | 102.35 | 104.73 | 101.15 | 2.24 | 0.02 |
| 1 | 2019-10-02 | BIDU | 102.62 | 100.85 | 103.24 | 99.50 | 2.69 | 0.01 |
| 2 | 2019-10-01 | BIDU | 102.00 | 102.80 | 103.26 | 101.00 | 1.78 | -0.01 |
| 3 | 2019-10-03 | BABA | 169.48 | 166.65 | 170.18 | 165.00 | 10.39 | 0.02 |
| 4 | 2019-10-02 | BABA | 165.77 | 162.82 | 166.88 | 161.90 | 11.60 | 0.00 |
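If the Excel file is not at hand, the twelve rows printed in this section can be typed in directly. The sketch below rebuilds an equivalent frame from those values. `make_stocks` is a helper name introduced here for illustration and is not part of the original notebook; the dates are kept as plain strings. The columns read, in order: 日期 (date), 公司 (company), 收盘 (close), 开盘 (open), 高 (high), 低 (low), 交易量 (volume), 涨跌幅 (change).

```python
import pandas as pd


def make_stocks() -> pd.DataFrame:
    """Rebuild the 12-row demo table from the values printed in this section."""
    columns = ["日期", "公司", "收盘", "开盘", "高", "低", "交易量", "涨跌幅"]
    rows = [
        ("2019-10-03", "BIDU", 104.32, 102.35, 104.73, 101.15, 2.24, 0.02),
        ("2019-10-02", "BIDU", 102.62, 100.85, 103.24, 99.50, 2.69, 0.01),
        ("2019-10-01", "BIDU", 102.00, 102.80, 103.26, 101.00, 1.78, -0.01),
        ("2019-10-03", "BABA", 169.48, 166.65, 170.18, 165.00, 10.39, 0.02),
        ("2019-10-02", "BABA", 165.77, 162.82, 166.88, 161.90, 11.60, 0.00),
        ("2019-10-01", "BABA", 165.15, 168.01, 168.23, 163.64, 14.19, -0.01),
        ("2019-10-03", "IQ", 16.06, 15.71, 16.38, 15.32, 10.08, 0.02),
        ("2019-10-02", "IQ", 15.72, 15.85, 15.87, 15.12, 8.10, -0.01),
        ("2019-10-01", "IQ", 15.92, 16.14, 16.22, 15.50, 11.65, -0.01),
        ("2019-10-03", "JD", 28.80, 28.11, 28.97, 27.82, 8.77, 0.03),
        ("2019-10-02", "JD", 28.06, 28.00, 28.22, 27.53, 9.53, 0.00),
        ("2019-10-01", "JD", 28.19, 28.22, 28.57, 27.97, 10.64, 0.00),
    ]
    return pd.DataFrame(rows, columns=columns)


stocks = make_stocks()
print(stocks.shape)
print(stocks["公司"].unique())
```

With this stand-in, the remaining cells can be run without touching the file system.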
In [12]:
stocks["公司"].unique()
Out[12]:
array(['BIDU', 'BABA', 'IQ', 'JD'], dtype=object)
In [14]:
# Mean closing price (收盘) per company (公司)
stocks.groupby("公司")["收盘"].mean()
Out[14]:
公司
BABA    166.80
BIDU    102.98
IQ       15.90
JD       28.35
Name: 收盘, dtype: float64
In [16]:
# ser is a Series whose index has two levels
ser = stocks.groupby(["公司", "日期"])["收盘"].mean()
ser
Out[16]:
公司    日期
BABA  2019-10-01    165.15
      2019-10-02    165.77
      2019-10-03    169.48
BIDU  2019-10-01    102.00
      2019-10-02    102.62
      2019-10-03    104.32
IQ    2019-10-01     15.92
      2019-10-02     15.72
      2019-10-03     16.06
JD    2019-10-01     28.19
      2019-10-02     28.06
      2019-10-03     28.80
Name: 收盘, dtype: float64
In the display of a multi-level index, a blank cell means "same value as the row above".
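The blanks are only a display convention: every row label of the grouped Series is still a complete (公司, 日期) tuple. A minimal sketch of this, using a hypothetical four-row miniature of the stocks table rather than the real file:

```python
import pandas as pd

# Hypothetical miniature of the stocks table: two companies, two days.
df = pd.DataFrame({
    "公司": ["BIDU", "BIDU", "BABA", "BABA"],
    "日期": ["2019-10-02", "2019-10-01", "2019-10-02", "2019-10-01"],
    "收盘": [102.62, 102.00, 165.77, 165.15],
})

# Grouping by two keys yields a Series indexed by (公司, 日期) pairs.
ser = df.groupby(["公司", "日期"])["收盘"].mean()

print(ser.index.nlevels)       # number of index levels
print(list(ser.index.names))   # level names, outermost first
print(ser.index.tolist())      # every row label is a full tuple
```

Note that `groupby` sorts the group keys by default, which is why the result comes out ordered by company and then by date even though the input rows were not.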
In [20]:
ser.index
Out[20]:
MultiIndex(levels=[['BABA', 'BIDU', 'IQ', 'JD'], ['2019-10-01', '2019-10-02', '2019-10-03']],
           codes=[[0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3], [0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2]],
           names=['公司', '日期'])
In [21]:
# unstack turns the second index level into columns:
# 公司 stays as the index, while 日期 becomes the columns
ser.unstack()
Out[21]:
| 日期 | 2019-10-01 | 2019-10-02 | 2019-10-03 |
| 公司 | | | |
| BABA | 165.15 | 165.77 | 169.48 |
| BIDU | 102.00 | 102.62 | 104.32 |
| IQ | 15.92 | 15.72 | 16.06 |
| JD | 28.19 | 28.06 | 28.80 |
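By default `unstack()` pivots the innermost level, but the `level` argument can name any level. The sketch below shows both directions on a hypothetical four-row Series built by hand (the values are copied from the output above; the variable names are illustrative):

```python
import pandas as pd

index = pd.MultiIndex.from_product(
    [["BABA", "BIDU"], ["2019-10-01", "2019-10-02"]],
    names=["公司", "日期"],
)
ser = pd.Series([165.15, 165.77, 102.00, 102.62], index=index, name="收盘")

by_date = ser.unstack()                 # default: innermost level (日期) -> columns
by_company = ser.unstack(level="公司")   # move the outer level to the columns instead

print(by_date)
print(by_company)
```

Neither call modifies `ser`; each returns a new DataFrame.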
In [22]:
ser
Out[22]:
公司    日期
BABA  2019-10-01    165.15
      2019-10-02    165.77
      2019-10-03    169.48
BIDU  2019-10-01    102.00
      2019-10-02    102.62
      2019-10-03    104.32
IQ    2019-10-01     15.92
      2019-10-02     15.72
      2019-10-03     16.06
JD    2019-10-01     28.19
      2019-10-02     28.06
      2019-10-03     28.80
Name: 收盘, dtype: float64
In [24]:
# Turn both index levels (公司, 日期) into columns
ser.reset_index()
Out[24]:
| | 公司 | 日期 | 收盘 |
| 0 | BABA | 2019-10-01 | 165.15 |
| 1 | BABA | 2019-10-02 | 165.77 |
| 2 | BABA | 2019-10-03 | 169.48 |
| 3 | BIDU | 2019-10-01 | 102.00 |
| 4 | BIDU | 2019-10-02 | 102.62 |
| 5 | BIDU | 2019-10-03 | 104.32 |
| 6 | IQ | 2019-10-01 | 15.92 |
| 7 | IQ | 2019-10-02 | 15.72 |
| 8 | IQ | 2019-10-03 | 16.06 |
| 9 | JD | 2019-10-01 | 28.19 |
| 10 | JD | 2019-10-02 | 28.06 |
| 11 | JD | 2019-10-03 | 28.80 |
In [25]:
ser
Out[25]:
公司    日期
BABA  2019-10-01    165.15
      2019-10-02    165.77
      2019-10-03    169.48
BIDU  2019-10-01    102.00
      2019-10-02    102.62
      2019-10-03    104.32
IQ    2019-10-01     15.92
      2019-10-02     15.72
      2019-10-03     16.06
JD    2019-10-01     28.19
      2019-10-02     28.06
      2019-10-03     28.80
Name: 收盘, dtype: float64
In [27]:
ser.loc["BIDU"]
Out[27]:
日期
2019-10-01    102.00
2019-10-02    102.62
2019-10-03    104.32
Name: 收盘, dtype: float64
In [ ]:
# With a multi-level index, a tuple can be used to select data
In [28]:
ser.loc[("BIDU","2019-10-02")]
Out[28]:
102.62
In [29]:
ser.loc[:, "2019-10-02"]
Out[29]:
公司
BABA    165.77
BIDU    102.62
IQ       15.72
JD       28.06
Name: 收盘, dtype: float64
In [30]:
stocks.head()
Out[30]:
| | 日期 | 公司 | 收盘 | 开盘 | 高 | 低 | 交易量 | 涨跌幅 |
| 0 | 2019-10-03 | BIDU | 104.32 | 102.35 | 104.73 | 101.15 | 2.24 | 0.02 |
| 1 | 2019-10-02 | BIDU | 102.62 | 100.85 | 103.24 | 99.50 | 2.69 | 0.01 |
| 2 | 2019-10-01 | BIDU | 102.00 | 102.80 | 103.26 | 101.00 | 1.78 | -0.01 |
| 3 | 2019-10-03 | BABA | 169.48 | 166.65 | 170.18 | 165.00 | 10.39 | 0.02 |
| 4 | 2019-10-02 | BABA | 165.77 | 162.82 | 166.88 | 161.90 | 11.60 | 0.00 |
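Before moving on to the DataFrame case, the three Series lookups used earlier (`ser.loc["BIDU"]`, `ser.loc[("BIDU", "2019-10-02")]`, `ser.loc[:, "2019-10-02"]`) can be recapped in one runnable sketch. The Series here is a hypothetical four-row stand-in, and `Series.xs` is added as an alternative that names the level explicitly; it is not used in the original notebook.

```python
import pandas as pd

index = pd.MultiIndex.from_product(
    [["BABA", "BIDU"], ["2019-10-01", "2019-10-02"]],
    names=["公司", "日期"],
)
ser = pd.Series([165.15, 165.77, 102.00, 102.62], index=index, name="收盘")

one_company = ser.loc["BIDU"]                 # outer key -> Series indexed by 日期
one_value = ser.loc[("BIDU", "2019-10-02")]   # full tuple -> a single number
one_day = ser.loc[:, "2019-10-02"]            # inner key -> Series indexed by 公司
same_day = ser.xs("2019-10-02", level="日期")  # xs names the level explicitly

print(one_company)
print(one_value)
print(one_day)
```

Each lookup that fixes one level drops that level from the result, so the remaining level becomes an ordinary index.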
In [40]:
stocks.set_index(["公司", "日期"], inplace=True)
In [41]:
stocks.head()
Out[41]:
| | | 收盘 | 开盘 | 高 | 低 | 交易量 | 涨跌幅 |
| 公司 | 日期 | | | | | | |
| BIDU | 2019-10-03 | 104.32 | 102.35 | 104.73 | 101.15 | 2.24 | 0.02 |
| | 2019-10-02 | 102.62 | 100.85 | 103.24 | 99.50 | 2.69 | 0.01 |
| | 2019-10-01 | 102.00 | 102.80 | 103.26 | 101.00 | 1.78 | -0.01 |
| BABA | 2019-10-03 | 169.48 | 166.65 | 170.18 | 165.00 | 10.39 | 0.02 |
| | 2019-10-02 | 165.77 | 162.82 | 166.88 | 161.90 | 11.60 | 0.00 |
In [42]:
stocks.index
Out[42]:
MultiIndex(levels=[['BABA', 'BIDU', 'IQ', 'JD'], ['2019-10-01', '2019-10-02', '2019-10-03']],
           codes=[[1, 1, 1, 0, 0, 0, 2, 2, 2, 3, 3, 3], [2, 1, 0, 2, 1, 0, 2, 1, 0, 2, 1, 0]],
           names=['公司', '日期'])
In [43]:
stocks.sort_index(inplace=True)
In [44]:
stocks
Out[44]:
| | | 收盘 | 开盘 | 高 | 低 | 交易量 | 涨跌幅 |
| 公司 | 日期 | | | | | | |
| BABA | 2019-10-01 | 165.15 | 168.01 | 168.23 | 163.64 | 14.19 | -0.01 |
| | 2019-10-02 | 165.77 | 162.82 | 166.88 | 161.90 | 11.60 | 0.00 |
| | 2019-10-03 | 169.48 | 166.65 | 170.18 | 165.00 | 10.39 | 0.02 |
| BIDU | 2019-10-01 | 102.00 | 102.80 | 103.26 | 101.00 | 1.78 | -0.01 |
| | 2019-10-02 | 102.62 | 100.85 | 103.24 | 99.50 | 2.69 | 0.01 |
| | 2019-10-03 | 104.32 | 102.35 | 104.73 | 101.15 | 2.24 | 0.02 |
| IQ | 2019-10-01 | 15.92 | 16.14 | 16.22 | 15.50 | 11.65 | -0.01 |
| | 2019-10-02 | 15.72 | 15.85 | 15.87 | 15.12 | 8.10 | -0.01 |
| | 2019-10-03 | 16.06 | 15.71 | 16.38 | 15.32 | 10.08 | 0.02 |
| JD | 2019-10-01 | 28.19 | 28.22 | 28.57 | 27.97 | 10.64 | 0.00 |
| | 2019-10-02 | 28.06 | 28.00 | 28.22 | 27.53 | 9.53 | 0.00 |
| | 2019-10-03 | 28.80 | 28.11 | 28.97 | 27.82 | 8.77 | 0.03 |
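Why sort after `set_index`? The raw rows arrive in file order, so the new index is not ordered, and the pandas documentation notes that slicing a MultiIndex expects a sorted index. The sketch below checks the ordering before and after on a hypothetical four-row frame. It uses the non-`inplace` forms, which return new frames, purely so that both states can be inspected side by side.

```python
import pandas as pd

# Hypothetical miniature frame whose rows arrive unsorted, as in the raw file.
df = pd.DataFrame({
    "公司": ["BIDU", "BIDU", "BABA", "BABA"],
    "日期": ["2019-10-02", "2019-10-01", "2019-10-02", "2019-10-01"],
    "收盘": [102.62, 102.00, 165.77, 165.15],
})

indexed = df.set_index(["公司", "日期"])   # returns a new frame; df is unchanged
print(indexed.index.is_monotonic_increasing)

ordered = indexed.sort_index()            # sorts by 公司 first, then 日期
print(ordered.index.is_monotonic_increasing)

# A label slice over the outer level works on the sorted frame.
print(ordered.loc["BABA":"BIDU"])
```

Whether to use `inplace=True`, as the notebook does, or to reassign the result is a matter of style; the resulting frame is the same.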
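[*Important*] When selecting data:
A tuple (key1, key2) selects across index levels: key1 is matched on the first level and key2 on the second, for example ("BIDU", "2019-10-02").
A list [key1, key2] selects several keys on the same level, for example ["BIDU", "JD"].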
In [45]:
stocks.loc["BIDU"]
Out[45]:
| | 收盘 | 开盘 | 高 | 低 | 交易量 | 涨跌幅 |
| 日期 | | | | | | |
| 2019-10-01 | 102.00 | 102.80 | 103.26 | 101.00 | 1.78 | -0.01 |
| 2019-10-02 | 102.62 | 100.85 | 103.24 | 99.50 | 2.69 | 0.01 |
| 2019-10-03 | 104.32 | 102.35 | 104.73 | 101.15 | 2.24 | 0.02 |
In [46]:
# All data for BIDU on 2019-10-02
stocks.loc[("BIDU", "2019-10-02"), :]
Out[46]:
收盘     102.62
开盘     100.85
高      103.24
低       99.50
交易量      2.69
涨跌幅      0.01
Name: (BIDU, 2019-10-02), dtype: float64
In [48]:
# The opening price (开盘) of BIDU on 2019-10-02
stocks.loc[("BIDU", "2019-10-02"), "开盘"]
Out[48]:
100.85
In [50]:
# Parallel selection: BIDU and JD are keys on the same level
stocks.loc[["BIDU", "JD"], :]
Out[50]:
| | | 收盘 | 开盘 | 高 | 低 | 交易量 | 涨跌幅 |
| 公司 | 日期 | | | | | | |
| BIDU | 2019-10-01 | 102.00 | 102.80 | 103.26 | 101.00 | 1.78 | -0.01 |
| | 2019-10-02 | 102.62 | 100.85 | 103.24 | 99.50 | 2.69 | 0.01 |
| | 2019-10-03 | 104.32 | 102.35 | 104.73 | 101.15 | 2.24 | 0.02 |
| JD | 2019-10-01 | 28.19 | 28.22 | 28.57 | 27.97 | 10.64 | 0.00 |
| | 2019-10-02 | 28.06 | 28.00 | 28.22 | 27.53 | 9.53 | 0.00 |
| | 2019-10-03 | 28.80 | 28.11 | 28.97 | 27.82 | 8.77 | 0.03 |
In [51]:
stocks.loc[(["BIDU", "JD"], "2019-10-03"), :]
Out[51]:
| | | 收盘 | 开盘 | 高 | 低 | 交易量 | 涨跌幅 |
| 公司 | 日期 | | | | | | |
| BIDU | 2019-10-03 | 104.32 | 102.35 | 104.73 | 101.15 | 2.24 | 0.02 |
| JD | 2019-10-03 | 28.80 | 28.11 | 28.97 | 27.82 | 8.77 | 0.03 |
In [52]:
stocks.loc[(["BIDU", "JD"], "2019-10-03"), "收盘"]
Out[52]:
公司    日期
BIDU  2019-10-03    104.32
JD    2019-10-03     28.80
Name: 收盘, dtype: float64
In [54]:
stocks.loc[("BIDU",["2019-10-02", "2019-10-03"]), "收盘"]
Out[54]:
公司    日期
BIDU  2019-10-02    102.62
      2019-10-03    104.32
Name: 收盘, dtype: float64
In [55]:
# slice(None) selects everything on that index level
stocks.loc[(slice(None), ["2019-10-02", "2019-10-03"]),:]
Out[55]:
| | | 收盘 | 开盘 | 高 | 低 | 交易量 | 涨跌幅 |
| 公司 | 日期 | | | | | | |
| BABA | 2019-10-02 | 165.77 | 162.82 | 166.88 | 161.90 | 11.60 | 0.00 |
| | 2019-10-03 | 169.48 | 166.65 | 170.18 | 165.00 | 10.39 | 0.02 |
| BIDU | 2019-10-02 | 102.62 | 100.85 | 103.24 | 99.50 | 2.69 | 0.01 |
| | 2019-10-03 | 104.32 | 102.35 | 104.73 | 101.15 | 2.24 | 0.02 |
| IQ | 2019-10-02 | 15.72 | 15.85 | 15.87 | 15.12 | 8.10 | -0.01 |
| | 2019-10-03 | 16.06 | 15.71 | 16.38 | 15.32 | 10.08 | 0.02 |
| JD | 2019-10-02 | 28.06 | 28.00 | 28.22 | 27.53 | 9.53 | 0.00 |
| | 2019-10-03 | 28.80 | 28.11 | 28.97 | 27.82 | 8.77 | 0.03 |
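The selection patterns above can be collected into one self-contained sketch. The frame is a hypothetical nine-row stand-in holding only the 收盘 and 开盘 columns, with values copied from the tables above. `pd.IndexSlice` is an addition that the original notebook does not use; it is shown as a more readable spelling of `slice(None)`.

```python
import pandas as pd

index = pd.MultiIndex.from_product(
    [["BABA", "BIDU", "JD"], ["2019-10-01", "2019-10-02", "2019-10-03"]],
    names=["公司", "日期"],
)
# Closing and opening prices copied from the tables above.
df = pd.DataFrame(
    {
        "收盘": [165.15, 165.77, 169.48, 102.00, 102.62, 104.32, 28.19, 28.06, 28.80],
        "开盘": [168.01, 162.82, 166.65, 102.80, 100.85, 102.35, 28.22, 28.00, 28.11],
    },
    index=index,
)

cell = df.loc[("BIDU", "2019-10-02"), "开盘"]           # tuple: one key per level
two_firms = df.loc[["BIDU", "JD"], :]                    # list: several keys, one level
mixed = df.loc[(["BIDU", "JD"], "2019-10-03"), "收盘"]   # list nested inside a tuple
all_firms = df.loc[(slice(None), ["2019-10-02", "2019-10-03"]), :]

idx = pd.IndexSlice                                      # idx[:, ...] builds the same tuple
same = df.loc[idx[:, ["2019-10-02", "2019-10-03"]], :]

print(cell)
print(mixed)
print(same.equals(all_firms))
```

The index here is already sorted because `from_product` was given sorted inputs; on a frame built with `set_index`, calling `sort_index()` first keeps these lookups well defined.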
In [56]:
# Restore the multi-level index to ordinary columns
stocks.reset_index()
Out[56]:
| | 公司 | 日期 | 收盘 | 开盘 | 高 | 低 | 交易量 | 涨跌幅 |
| 0 | BABA | 2019-10-01 | 165.15 | 168.01 | 168.23 | 163.64 | 14.19 | -0.01 |
| 1 | BABA | 2019-10-02 | 165.77 | 162.82 | 166.88 | 161.90 | 11.60 | 0.00 |
| 2 | BABA | 2019-10-03 | 169.48 | 166.65 | 170.18 | 165.00 | 10.39 | 0.02 |
| 3 | BIDU | 2019-10-01 | 102.00 | 102.80 | 103.26 | 101.00 | 1.78 | -0.01 |
| 4 | BIDU | 2019-10-02 | 102.62 | 100.85 | 103.24 | 99.50 | 2.69 | 0.01 |
| 5 | BIDU | 2019-10-03 | 104.32 | 102.35 | 104.73 | 101.15 | 2.24 | 0.02 |
| 6 | IQ | 2019-10-01 | 15.92 | 16.14 | 16.22 | 15.50 | 11.65 | -0.01 |
| 7 | IQ | 2019-10-02 | 15.72 | 15.85 | 15.87 | 15.12 | 8.10 | -0.01 |
| 8 | IQ | 2019-10-03 | 16.06 | 15.71 | 16.38 | 15.32 | 10.08 | 0.02 |
| 9 | JD | 2019-10-01 | 28.19 | 28.22 | 28.57 | 27.97 | 10.64 | 0.00 |
| 10 | JD | 2019-10-02 | 28.06 | 28.00 | 28.22 | 27.53 | 9.53 | 0.00 |
| 11 | JD | 2019-10-03 | 28.80 | 28.11 | 28.97 | 27.82 | 8.77 | 0.03 |
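`set_index` and `reset_index` are inverses of each other, apart from column order: the restored index columns come first, which is why 公司 now precedes 日期 in the table above. A minimal sketch on a hypothetical two-row frame, also showing that a single level can be moved back on its own:

```python
import pandas as pd

flat = pd.DataFrame({
    "日期": ["2019-10-02", "2019-10-01"],
    "公司": ["BIDU", "BIDU"],
    "收盘": [102.62, 102.00],
})

indexed = flat.set_index(["公司", "日期"])     # columns -> index levels
restored = indexed.reset_index()               # index levels -> columns again
only_date = indexed.reset_index(level="日期")   # move back just one level

print(list(restored.columns))
print(list(only_date.columns), only_date.index.name)
```

Like the other calls in this sketch, `reset_index()` returns a new frame unless `inplace=True` is passed.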
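Comparing the data transformation functions: map, apply, applymap
Example: convert the English stock tickers into Chinese company names
Either Series.map(dict) or Series.map(function) works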
In [2]:
import pandas as pd
stocks = pd.read_excel(r"D:\WinterIsComing\python\New_Wave\pandas_basic\pandas-learn-code\datas\stocks\互联网公司股票.xlsx")
In [3]:
stocks.head()
Out[3]:
| | 日期 | 公司 | 收盘 | 开盘 | 高 | 低 | 交易量 | 涨跌幅 |
| 0 | 2019-10-03 | BIDU | 104.32 | 102.35 | 104.73 | 101.15 | 2.24 | 0.02 |
| 1 | 2019-10-02 | BIDU | 102.62 | 100.85 | 103.24 | 99.50 | 2.69 | 0.01 |
| 2 | 2019-10-01 | BIDU | 102.00 | 102.80 | 103.26 | 101.00 | 1.78 | -0.01 |
| 3 | 2019-10-03 | BABA | 169.48 | 166.65 | 170.18 | 165.00 | 10.39 | 0.02 |
| 4 | 2019-10-02 | BABA | 165.77 | 162.82 | 166.88 | 161.90 | 11.60 | 0.00 |
In [7]:
stocks["公司"].unique()
Out[7]:
array(['BIDU', 'BABA', 'IQ', 'JD'], dtype=object)
In [8]:
# Mapping from stock ticker to Chinese company name; note that the keys are lowercase
dict_company_names = {"bidu": "百度", "baba": "阿里巴巴", "iq": "爱奇艺", "jd": "京东"}
In [9]:
stocks["中文公司1"] = stocks["公司"].str.lower().map(dict_company_names)
In [10]:
stocks
Out[10]:
| | 日期 | 公司 | 收盘 | 开盘 | 高 | 低 | 交易量 | 涨跌幅 | 中文公司1 |
| 0 | 2019-10-03 | BIDU | 104.32 | 102.35 | 104.73 | 101.15 | 2.24 | 0.02 | 百度 |
| 1 | 2019-10-02 | BIDU | 102.62 | 100.85 | 103.24 | 99.50 | 2.69 | 0.01 | 百度 |
| 2 | 2019-10-01 | BIDU | 102.00 | 102.80 | 103.26 | 101.00 | 1.78 | -0.01 | 百度 |
| 3 | 2019-10-03 | BABA | 169.48 | 166.65 | 170.18 | 165.00 | 10.39 | 0.02 | 阿里巴巴 |
| 4 | 2019-10-02 | BABA | 165.77 | 162.82 | 166.88 | 161.90 | 11.60 | 0.00 | 阿里巴巴 |
| 5 | 2019-10-01 | BABA | 165.15 | 168.01 | 168.23 | 163.64 | 14.19 | -0.01 | 阿里巴巴 |
| 6 | 2019-10-03 | IQ | 16.06 | 15.71 | 16.38 | 15.32 | 10.08 | 0.02 | 爱奇艺 |
| 7 | 2019-10-02 | IQ | 15.72 | 15.85 | 15.87 | 15.12 | 8.10 | -0.01 | 爱奇艺 |
| 8 | 2019-10-01 | IQ | 15.92 | 16.14 | 16.22 | 15.50 | 11.65 | -0.01 | 爱奇艺 |
| 9 | 2019-10-03 | JD | 28.80 | 28.11 | 28.97 | 27.82 | 8.77 | 0.03 | 京东 |
| 10 | 2019-10-02 | JD | 28.06 | 28.00 | 28.22 | 27.53 | 9.53 | 0.00 | 京东 |
| 11 | 2019-10-01 | JD | 28.19 | 28.22 | 28.57 | 27.97 | 10.64 | 0.00 | 京东 |
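One property of `Series.map(dict)` worth knowing: a value with no matching key does not raise an error, it becomes `NaN`. The sketch below reuses the mapping from above and adds a hypothetical code, `NTES`, that the dictionary does not contain.

```python
import pandas as pd

# "NTES" is a hypothetical extra code that has no entry in the mapping.
codes = pd.Series(["BIDU", "BABA", "IQ", "JD", "NTES"])
names = {"bidu": "百度", "baba": "阿里巴巴", "iq": "爱奇艺", "jd": "京东"}

# Lowercase first so the values match the lowercase dictionary keys.
mapped = codes.str.lower().map(names)

# Rows whose code is missing from the dictionary end up as NaN.
unmapped = codes[mapped.isna()]
print(mapped)
print(unmapped.tolist())
```

Checking `mapped.isna()` right after the mapping is a cheap way to spot codes the dictionary does not cover.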
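Series.map(function): the function receives each element of the Series as its argument.
In [13]:
# In lambda x, x is each value of the Series (that is, each value in stocks["公司"])
stocks["公司中文2"] = stocks["公司"].map(lambda x: dict_company_names[x.lower()])
In [12]:
stocks.head()
Out[12]: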
| 0 | 2019-10-03 | BIDU | 104.32 | 102.35 | 104.73 | 101.15 | 2.24 | 0.02 | 百度 | 百度 |
| 1 | 2019-10-02 | BIDU | 102.62 | 100.85 | 103.24 | 99.50 | 2.69 | 0.01 | 百度 | 百度 |
| 2 | 2019-10-01 | BIDU | 102.00 | 102.80 | 103.26 | 101.00 | 1.78 | -0.01 | 百度 | 百度 |
| 3 | 2019-10-03 | BABA | 169.48 | 166.65 | 170.18 | 165.00 | 10.39 | 0.02 | 阿里巴巴 | 阿里巴巴 |
| 4 | 2019-10-02 | BABA | 165.77 | 162.82 | 166.88 | 161.90 | 11.60 | 0.00 | 阿里巴巴 | 阿里巴巴 |
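With a function, a missing key behaves differently from the dict form: indexing the dictionary inside the lambda raises `KeyError`. One possible way to tolerate unknown codes, assuming a fallback label such as `"unknown"` is acceptable, is `dict.get`. The codes and the fallback below are illustrative only.

```python
import pandas as pd

names = {"bidu": "百度", "jd": "京东"}
codes = pd.Series(["BIDU", "JD", "NTES"])  # "NTES" is a hypothetical unmapped code

# dict.get returns the fallback instead of raising KeyError for unmapped codes.
safe = codes.map(lambda code: names.get(code.lower(), "unknown"))

# Direct indexing raises KeyError as soon as an unmapped code is reached.
try:
    codes.map(lambda code: names[code.lower()])
    raised = False
except KeyError:
    raised = True

print(safe.tolist())
print(raised)
```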
Series.apply(function): the function's argument is each value of the Series.
In [14]:
stocks["中文公司3"] = stocks["公司"].apply(lambda x: dict_company_names[x.lower()])
In [16]:
stocks.head()
Out[16]:
| 0 | 2019-10-03 | BIDU | 104.32 | 102.35 | 104.73 | 101.15 | 2.24 | 0.02 | 百度 | 百度 | 百度 |
| 1 | 2019-10-02 | BIDU | 102.62 | 100.85 | 103.24 | 99.50 | 2.69 | 0.01 | 百度 | 百度 | 百度 |
| 2 | 2019-10-01 | BIDU | 102.00 | 102.80 | 103.26 | 101.00 | 1.78 | -0.01 | 百度 | 百度 | 百度 |
| 3 | 2019-10-03 | BABA | 169.48 | 166.65 | 170.18 | 165.00 | 10.39 | 0.02 | 阿里巴巴 | 阿里巴巴 | 阿里巴巴 |
| 4 | 2019-10-02 | BABA | 165.77 | 162.82 | 166.88 | 161.90 | 11.60 | 0.00 | 阿里巴巴 | 阿里巴巴 | 阿里巴巴 |
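DataFrame.apply(function): the function's argument is the Series along the given axis.
In [18]:
stocks["中文公司4"] = stocks.apply(lambda x: dict_company_names[x["公司"].lower()], axis=1)
In [19]:
stocks["公司"]
Out[19]:
0     BIDU
1     BIDU
2     BIDU
3     BABA
4     BABA
5     BABA
6       IQ
7       IQ
8       IQ
9       JD
10      JD
11      JD
Name: 公司, dtype: object
Notes on the apply call in In [18]:
1. apply is called on the DataFrame stocks;
2. x in lambda x is a Series; because axis=1 is specified, the Series is keyed by column name, so x["公司"] retrieves the company code.
In [20]:
stocks.head()
Out[20]: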
| 0 | 2019-10-03 | BIDU | 104.32 | 102.35 | 104.73 | 101.15 | 2.24 | 0.02 | 百度 | 百度 | 百度 | 百度 |
| 1 | 2019-10-02 | BIDU | 102.62 | 100.85 | 103.24 | 99.50 | 2.69 | 0.01 | 百度 | 百度 | 百度 | 百度 |
| 2 | 2019-10-01 | BIDU | 102.00 | 102.80 | 103.26 | 101.00 | 1.78 | -0.01 | 百度 | 百度 | 百度 | 百度 |
| 3 | 2019-10-03 | BABA | 169.48 | 166.65 | 170.18 | 165.00 | 10.39 | 0.02 | 阿里巴巴 | 阿里巴巴 | 阿里巴巴 | 阿里巴巴 |
| 4 | 2019-10-02 | BABA | 165.77 | 162.82 | 166.88 | 161.90 | 11.60 | 0.00 | 阿里巴巴 | 阿里巴巴 | 阿里巴巴 | 阿里巴巴 |
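To make the role of `axis` concrete, here is a minimal sketch on made-up numbers (the column names `open` and `close` are assumptions for the example). With `axis=1` the function sees one row at a time; with the default `axis=0` it sees one column at a time.

```python
import pandas as pd

demo = pd.DataFrame({"open": [10.0, 20.0], "close": [11.0, 18.0]})  # hypothetical values

# axis=1: the function receives one row as a Series keyed by column name.
row_change = demo.apply(lambda row: row["close"] - row["open"], axis=1)

# axis=0 (the default): the function receives one column as a Series keyed by row index.
col_max = demo.apply(lambda col: col.max(), axis=0)

print(row_change.tolist())
print(col_max.to_dict())
```

For simple arithmetic like this, the vectorized form `demo["close"] - demo["open"]` gives the same result and is usually faster than a row-wise `apply`.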
In [21]:
sub_df = stocks[["收盘", "开盘", "高", "低", "交易量"]]
In [22]:
sub_df
Out[22]:
| | 收盘 | 开盘 | 高 | 低 | 交易量 |
| --- | --- | --- | --- | --- | --- |
| 0 | 104.32 | 102.35 | 104.73 | 101.15 | 2.24 |
| 1 | 102.62 | 100.85 | 103.24 | 99.50 | 2.69 |
| 2 | 102.00 | 102.80 | 103.26 | 101.00 | 1.78 |
| 3 | 169.48 | 166.65 | 170.18 | 165.00 | 10.39 |
| 4 | 165.77 | 162.82 | 166.88 | 161.90 | 11.60 |
| 5 | 165.15 | 168.01 | 168.23 | 163.64 | 14.19 |
| 6 | 16.06 | 15.71 | 16.38 | 15.32 | 10.08 |
| 7 | 15.72 | 15.85 | 15.87 | 15.12 | 8.10 |
| 8 | 15.92 | 16.14 | 16.22 | 15.50 | 11.65 |
| 9 | 28.80 | 28.11 | 28.97 | 27.82 | 8.77 |
| 10 | 28.06 | 28.00 | 28.22 | 27.53 | 9.53 |
| 11 | 28.19 | 28.22 | 28.57 | 27.97 | 10.64 |
In [23]:
# Truncate these numbers to integers; the function is applied to every element (every value in the table)
sub_df.applymap(lambda x: int(x))
Out[23]:
| | 收盘 | 开盘 | 高 | 低 | 交易量 |
| --- | --- | --- | --- | --- | --- |
| 0 | 104 | 102 | 104 | 101 | 2 |
| 1 | 102 | 100 | 103 | 99 | 2 |
| 2 | 102 | 102 | 103 | 101 | 1 |
| 3 | 169 | 166 | 170 | 165 | 10 |
| 4 | 165 | 162 | 166 | 161 | 11 |
| 5 | 165 | 168 | 168 | 163 | 14 |
| 6 | 16 | 15 | 16 | 15 | 10 |
| 7 | 15 | 15 | 15 | 15 | 8 |
| 8 | 15 | 16 | 16 | 15 | 11 |
| 9 | 28 | 28 | 28 | 27 | 8 |
| 10 | 28 | 28 | 28 | 27 | 9 |
| 11 | 28 | 28 | 28 | 27 | 10 |
In [25]:
# Modify these columns of the original df directly
stocks.loc[:, ["收盘", "开盘", "高", "低", "交易量"]] = sub_df.applymap(lambda x: int(x))
In [26]:
stocks.head()
Out[26]:
| 0 | 2019-10-03 | BIDU | 104 | 102 | 104 | 101 | 2 | 0.02 | 百度 | 百度 | 百度 | 百度 |
| 1 | 2019-10-02 | BIDU | 102 | 100 | 103 | 99 | 2 | 0.01 | 百度 | 百度 | 百度 | 百度 |
| 2 | 2019-10-01 | BIDU | 102 | 102 | 103 | 101 | 1 | -0.01 | 百度 | 百度 | 百度 | 百度 |
| 3 | 2019-10-03 | BABA | 169 | 166 | 170 | 165 | 10 | 0.02 | 阿里巴巴 | 阿里巴巴 | 阿里巴巴 | 阿里巴巴 |
| 4 | 2019-10-02 | BABA | 165 | 162 | 166 | 161 | 11 | 0.00 | 阿里巴巴 | 阿里巴巴 | 阿里巴巴 | 阿里巴巴 |
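Two version-related caveats, given here to the best of our knowledge rather than as guarantees: since pandas 2.1, `DataFrame.applymap` is deprecated in favor of the equivalent `DataFrame.map`, and since pandas 2.0 an assignment through `df.loc[:, cols] = ...` tries to write into the existing columns, so the float dtype may be kept instead of switching to integers. The sketch below, on made-up numbers, picks whichever element-wise method is available and shows that `astype(int)` yields the same truncated values.

```python
import pandas as pd

# Hypothetical subset of price data; column names are illustrative.
prices = pd.DataFrame({"close": [104.32, 15.72], "volume": [2.24, 8.10]})

# Use DataFrame.map where it exists (pandas 2.1+), otherwise fall back to applymap.
elementwise = prices.map if hasattr(prices, "map") else prices.applymap
truncated = elementwise(int)

# For a plain type conversion, astype truncates the same way without a Python-level loop.
same = prices.astype(int)

print(truncated)
print(same)
```

Assigning with `prices[["close", "volume"]] = truncated` replaces the columns outright, which should give integer columns regardless of the pandas version.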
[Figure: the pandas split-apply-combine pattern]
Here, split corresponds to pandas groupby; we write the apply function ourselves, and pandas combines the results returned by apply into the final result.
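The three stages can be spelled out by hand. The following is only a sketch of the idea on made-up data, not how pandas implements `groupby(...).apply(...)` internally; the function `demean` and the column names are assumptions for the example.

```python
import pandas as pd

df = pd.DataFrame({"key": ["a", "b", "a", "b"], "value": [1, 2, 3, 4]})  # hypothetical data

def demean(group):
    # apply step: runs once per group and returns a new DataFrame
    out = group.copy()
    out["value_demeaned"] = out["value"] - out["value"].mean()
    return out

# split: iterate over one sub-DataFrame per key
# apply: call the function on each piece
pieces = [demean(group) for _, group in df.groupby("key")]

# combine: concatenate the pieces and restore the original row order
combined = pd.concat(pieces).sort_index()
print(combined)
```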
Normalize numeric columns with different ranges by mapping them onto the interval [0, 1].
The normalization formula:
$$x_{\text{norm}} = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$
Users rate differently: optimists tend to give high ratings and pessimists low ones, so we normalize per user.
In [1]:
import pandas as pd
In [7]:
ratings = pd.read_csv(
    "./pandas-learn-code/datas/movielens-1m/ratings.dat",
    sep="::",
    engine="python",
    names="UserID::MovieID::Rating::Timestamp".split("::"),
)
In [8]:
ratings.head()
Out[8]:
| | UserID | MovieID | Rating | Timestamp |
| --- | --- | --- | --- | --- |
| 0 | 1 | 1193 | 5 | 978300760 |
| 1 | 1 | 661 | 3 | 978302109 |
| 2 | 1 | 914 | 3 | 978301968 |
| 3 | 1 | 3408 | 4 | 978300275 |
| 4 | 1 | 2355 | 5 | 978824291 |
In [10]:
# Group by user ID, then normalize the ratings within each group
def ratings_norm(df):
    # The actual argument is each user's group (a DataFrame grouped by UserID)
    max_value = df["Rating"].max()
    min_value = df["Rating"].min()
    df["Rating_norm"] = df["Rating"].apply(lambda x: (x - min_value) / (max_value - min_value))
    return df

# Group by user and apply the function, which adds a column holding the normalized Rating
ratings = ratings.groupby("UserID").apply(ratings_norm)
In [12]:
ratings["Rating"]
. . .
In [16]:
type(ratings)
Out[16]:
pandas.core.frame.DataFrame
In [17]:
ratings[ratings["UserID"]==1].head()
Out[17]:
| | UserID | MovieID | Rating | Timestamp | Rating_norm |
| --- | --- | --- | --- | --- | --- |
| 0 | 1 | 1193 | 5 | 978300760 | 1.0 |
| 1 | 1 | 661 | 3 | 978302109 | 0.0 |
| 2 | 1 | 914 | 3 | 978301968 | 0.0 |
| 3 | 1 | 3408 | 4 | 978300275 | 0.5 |
| 4 | 1 | 2355 | 5 | 978824291 | 1.0 |
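The same per-user normalization can also be sketched with `transform`, which returns values aligned to the original rows and so sidesteps differences in how `groupby(...).apply(...)` shapes its result across pandas versions (newer versions may add the group key to the index or treat the grouping column differently). The rows below are made up; user 2 gives the same rating every time, which shows what happens when the maximum equals the minimum.

```python
import pandas as pd

# Hypothetical ratings; user 2 always gives the same score.
ratings_demo = pd.DataFrame({
    "UserID": [1, 1, 1, 2, 2],
    "Rating": [5, 3, 4, 2, 2],
})

grouped = ratings_demo.groupby("UserID")["Rating"]
min_value = grouped.transform("min")  # per-user minimum, aligned to every row
max_value = grouped.transform("max")  # per-user maximum, aligned to every row

# When max == min the expression is 0/0, which yields NaN rather than an exception.
ratings_demo["Rating_norm"] = (ratings_demo["Rating"] - min_value) / (max_value - min_value)
print(ratings_demo)
```

Whether a constant-rating user should end up as `NaN`, 0, or some other value is a modeling decision; `fillna` can set it explicitly.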
Get the two hottest days of each month in 2018.
In [18]:
fpath = "./pandas-learn-code/datas/beijing_tianqi/beijing_tianqi_2018.csv"
df = pd.read_csv(fpath)
In [19]:
df.head()
Out[19]:
| 0 | 2018-01-01 | 3℃ | -6℃ | 晴~多云 | 东北风 | 1-2级 | 59 | 良 | 2 |
| 1 | 2018-01-02 | 2℃ | -5℃ | 阴~多云 | 东北风 | 1-2级 | 49 | 优 | 1 |
| 2 | 2018-01-03 | 2℃ | -5℃ | 多云 | 北风 | 1-2级 | 28 | 优 | 1 |
| 3 | 2018-01-04 | 0℃ | -8℃ | 阴 | 东北风 | 1-2级 | 28 | 优 | 1 |
| 4 | 2018-01-05 | 3℃ | -6℃ | 多云~晴 | 西北风 | 1-2级 | 50 | 优 | 1 |
In [21]:
# Strip the trailing ℃ from the temperature columns
df.loc[:, "bWendu"] = df["bWendu"].str.replace("℃", "").astype("int32")
df.loc[:, "yWendu"] = df["yWendu"].str.replace("℃", "").astype("int32")
In [22]:
# Add a month column
df["month"] = df["ymd"].str[0:7]
df.head()
Out[22]:
| 0 | 2018-01-01 | 3 | -6 | 晴~多云 | 东北风 | 1-2级 | 59 | 良 | 2 | 2018-01 |
| 1 | 2018-01-02 | 2 | -5 | 阴~多云 | 东北风 | 1-2级 | 49 | 优 | 1 | 2018-01 |
| 2 | 2018-01-03 | 2 | -5 | 多云 | 北风 | 1-2级 | 28 | 优 | 1 | 2018-01 |
| 3 | 2018-01-04 | 0 | -8 | 阴 | 东北风 | 1-2级 | 28 | 优 | 1 | 2018-01 |
| 4 | 2018-01-05 | 3 | -6 | 多云~晴 | 西北风 | 1-2级 | 50 | 优 | 1 | 2018-01 |
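A hedged variant of the cleaning step: as far as we know, in pandas 2.0 and later an assignment through `df.loc[:, col] = ...` tries to write into the existing column, so the column may keep its original object dtype even though the values are integers. Plain column assignment replaces the column and keeps the `int32` dtype. The two rows below are illustrative, in the same shape as the weather table.

```python
import pandas as pd

# Hypothetical rows shaped like the weather data.
raw = pd.DataFrame({
    "ymd": ["2018-01-01", "2018-01-04"],
    "bWendu": ["3℃", "0℃"],
    "yWendu": ["-6℃", "-8℃"],
})

for col in ["bWendu", "yWendu"]:
    # regex=False treats "℃" as a literal string; assigning to raw[col] replaces the column.
    raw[col] = raw[col].str.replace("℃", "", regex=False).astype("int32")

# The first seven characters of "YYYY-MM-DD" are the year and month.
raw["month"] = raw["ymd"].str[0:7]
print(raw.dtypes)
```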
In [24]:
def getWenduTopN(df, topn):
    # Here df is the sub-DataFrame of one month group
    return df.sort_values(by="bWendu")[["ymd", "bWendu"]][-topn:]

df.groupby("month").apply(getWenduTopN, topn=2).head()
Out[24]:
| month | | ymd | bWendu |
| --- | --- | --- | --- |
| 2018-01 | 13 | 2018-01-14 | 6 |
| | 18 | 2018-01-19 | 7 |
| 2018-02 | 53 | 2018-02-23 | 10 |
| | 56 | 2018-02-26 | 12 |
| 2018-03 | 86 | 2018-03-28 | 25 |
In [25]:
df[["ymd","bWendu"]]
. . .
As we can see, the DataFrame returned by groupby's apply can be completely different from the original DataFrame.
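For comparison, a top-N selection per group can also be written without a custom function: sort once, then keep the last N rows of each group with `groupby(...).tail(N)`. This is an alternative sketch on made-up temperatures, not the approach used above, and it keeps the original flat index instead of producing a two-level one.

```python
import pandas as pd

# Hypothetical temperatures shaped like the weather table.
weather = pd.DataFrame({
    "ymd": ["2018-01-01", "2018-01-02", "2018-01-03",
            "2018-02-01", "2018-02-02", "2018-02-03"],
    "bWendu": [3, 2, 6, 10, 12, 8],
})
weather["month"] = weather["ymd"].str[0:7]

# Sort ascending by temperature, then keep the last two rows within each month.
top2 = (
    weather.sort_values("bWendu")
    .groupby("month")
    .tail(2)
    .sort_values(["month", "bWendu"])
)
print(top2)
```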