Regression Plots with Pandas and Numpy

Data Visualization

I like the plotting facilities that come with Pandas. Yes, there are many other plotting libraries such as Seaborn, Bokeh and Plotly, but for most purposes, I am very happy with the simplicity of Pandas plotting.

But there is one thing missing that I would like: the ability to plot a regression line over a complex line or scatter plot.

But, as I have discovered, this is very easily solved. With the Numpy library you can generate regression data in a couple of lines of code and plot it in the same figure as your original line or scatter plot.

So that is what we are going to do in this article.

First, let’s get some data. If you’ve read any of my previous articles on data visualization, you know what’s coming next. I’m going to use a set of weather data that you can download from my GitHub account. It records the temperatures, sunshine levels and rainfall over several decades for London in the UK and is stored as a CSV file. This file has been created from public domain data recorded by the UK Met Office.

Are London summers getting hotter?

We are going to check whether the temperatures in London are rising over time. It’s not obvious from the raw data but by plotting a regression line over that data we will be better able to see the trend.

So to begin we need to import the libraries that we will need.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

Nothing very unusual there: we are importing Pandas to help with data analysis and visualization, Numpy will give us the routines we need to create the regression data, and Matplotlib is used by Pandas to create the plots.

Next, we download the data.

weather = pd.read_csv('https://raw.githubusercontent.com/alanjones2/dataviz/master/londonweather.csv')

(As you probably guessed, that’s all supposed to be on one line.)

We have read the CSV file into a Pandas DataFrame, and this is what it looks like: a table of monthly data recording the maximum and minimum temperatures, the rainfall and the number of hours of sunshine, starting in 1957 and ending partway through 2019.

[Figure: the weather DataFrame]
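If you’d like to follow along without downloading the file, here is a minimal sketch of an offline stand-in for weather. It rests on assumptions throughout: the column names Year, Tmin, Rain and Sun are a guess at the CSV’s layout (only Month and Tmax are confirmed by the code in this article), and every value is random synthetic data, not Met Office measurements, so any trend you fit to it is meaningless.

```python
import numpy as np
import pandas as pd

# Hypothetical offline stand-in for londonweather.csv.
# Assumed columns: Year, Month, Tmax, Tmin, Rain, Sun.
# All values are synthetic random numbers, NOT real Met Office data.
rng = np.random.default_rng(seed=0)
n_years = 2020 - 1957                          # 1957..2019 inclusive
years = np.repeat(np.arange(1957, 2020), 12)   # each year repeated for 12 months
months = np.tile(np.arange(1, 13), n_years)    # 1..12 for every year
n = len(years)

weather = pd.DataFrame({
    "Year": years,
    "Month": months,
    "Tmax": rng.normal(15.0, 6.0, n).round(1),
    "Tmin": rng.normal(7.0, 5.0, n).round(1),
    "Rain": rng.gamma(2.0, 25.0, n).round(1),
    "Sun": np.maximum(rng.normal(130.0, 50.0, n), 0.0).round(1),  # no negative hours
})

print(weather.head())
```

Unlike the real file, this covers whole years up to the end of 2019. It has six columns so that the later july.insert(6, ...) call still has a valid position to insert at.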

I posed the question of whether summers were getting hotter, so I’m going to filter the data to keep only the rows for July, the month in which the hottest temperatures are normally recorded. And, for convenience, I’m going to add a column that numbers the years starting at year 0 (you’ll see how this is used later).

# Keep only the July rows; .copy() makes july independent of weather
july = weather.query('Month == 7').copy()
july.insert(0, 'Yr', range(0, len(july)))

The code above applies a query to the weather DataFrame, which returns only the rows where Month is equal to 7 (i.e. July), and creates a new DataFrame called july from the result.

Next, we insert a new column called Yr at position 0 (the first column), which numbers the rows from 0 up to one less than the length of the table.
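To see exactly what the query and insert steps do, here is a small sketch on a toy DataFrame. The rows and values are made up for illustration, and the Year column is an assumption about the real file.

```python
import pandas as pd

# Toy data with made-up values: June and July for three years.
weather = pd.DataFrame({
    "Year": [1957, 1957, 1958, 1958, 1959, 1959],
    "Month": [6, 7, 6, 7, 6, 7],
    "Tmax": [20.1, 22.5, 19.8, 21.9, 21.0, 23.4],
})

# Keep only the July rows; .copy() makes july independent of weather.
july = weather.query("Month == 7").copy()

# Add a 0-based year counter as the first column (position 0).
july.insert(0, "Yr", range(0, len(july)))

print(july)
```

The index keeps the original row labels (1, 3, 5), while Yr gives a clean 0, 1, 2 counter. One benefit of regressing on Yr rather than on the calendar year is that the fitted intercept then means the trend value in the first year, instead of a value extrapolated back to year 0.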

july looks like this:

[Figure: the july DataFrame]

Now we can plot the maximum temperatures for July since 1957.

july.plot(y='Tmax', x='Yr')
[Figure: line plot of July Tmax against Yr]

There is a lot of variation there, and high temperatures are not limited to recent years. But there does seem to be a trend: temperatures do seem to be rising a little over time.

We can try to make this a bit more obvious by doing a linear regression, where we attempt to find a straight line that represents the trend in the rise in temperature. To do this we use the polyfit function from Numpy. Polyfit does a least-squares polynomial fit over the data that it is given and returns the polynomial’s coefficients, highest power first; for a straight line, that means the slope followed by the intercept. We want a linear regression of Tmax on Yr, so we pass these two columns as the x and y arguments. The final parameter is the degree of the polynomial, and for linear regression the degree is 1.
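As a sanity check on what polyfit is doing, here is a sketch with made-up numbers showing that a degree-1 fit gives the same line as the textbook least-squares formulas, slope = sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))**2) and intercept = mean(y) - slope * mean(x). It also shows that the coefficients come back highest power first.

```python
import numpy as np

# Hypothetical data: a noisy upward trend over 10 "years".
x = np.arange(10, dtype=float)
y = np.array([21.0, 22.4, 21.1, 23.0, 22.2, 23.9, 22.8, 24.1, 23.5, 25.0])

# Degree-1 least-squares fit; coefficients are returned highest power first.
slope, intercept = np.polyfit(x, y, 1)

# The same line from the closed-form least-squares formulas.
slope_cf = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
intercept_cf = y.mean() - slope_cf * x.mean()

print(slope, intercept)
print(np.isclose(slope, slope_cf), np.isclose(intercept, intercept_cf))
```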

We then pass those coefficients to the convenience class poly1d, which gives us a polynomial object that can be called like a function to evaluate the fitted line at any x value.

d = np.polyfit(july['Yr'], july['Tmax'], 1)
f = np.poly1d(d)
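Here is a quick sketch of what f actually is: a poly1d object built from the coefficients, which you can call like a function. The data is a hypothetical exact line y = 2x + 1, so the fit should recover a slope of 2 and an intercept of 1.

```python
import numpy as np

# Fit an exact straight line y = 2x + 1; d holds [slope, intercept].
d = np.polyfit([0, 1, 2, 3], [1.0, 3.0, 5.0, 7.0], 1)
f = np.poly1d(d)

x = np.array([0.0, 10.0, 62.0])
print(f)                                    # human-readable form of the polynomial
print(f(x))                                 # evaluate the fitted line at x
print(np.allclose(f(x), d[0] * x + d[1]))   # same as slope * x + intercept
print(np.allclose(f(x), np.polyval(d, x)))  # and the same as np.polyval
```

Calling f on a whole column, as the article does with july['Yr'], evaluates the line at every year in one go.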

We now use f to compute the regression value for every year and insert the result into a new column called Treg.

july.insert(6, 'Treg', f(july['Yr']))

Next, we create a line plot of Tmax against Yr (the wiggly plot we saw above) and another of Treg against Yr, which will be our straight-line regression plot. We combine the two plots by assigning the Axes returned by the first plot to the variable ax and then passing it to the second plot through the ax parameter, so that both lines are drawn on the same axes.

ax = july.plot(x='Yr', y='Tmax')
july.plot(x='Yr', y='Treg', color='Red', ax=ax)
[Figure: July Tmax against Yr, with the Treg regression line in red]
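If you are running this as a script or on a server without a display, the following sketch reproduces the whole line-plot recipe end to end. It uses a synthetic July series (a made-up upward trend plus noise, not real data) and Matplotlib’s non-interactive Agg backend, and it reports how many lines ended up on the shared Axes.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render without a display
import numpy as np
import pandas as pd

# Synthetic July data (made up): a small upward trend plus random noise.
rng = np.random.default_rng(seed=1)
july = pd.DataFrame({"Yr": np.arange(63)})
july["Tmax"] = 21.0 + 0.03 * july["Yr"] + rng.normal(0.0, 1.5, len(july))

# Fit a straight line and store its value for every year.
f = np.poly1d(np.polyfit(july["Yr"], july["Tmax"], 1))
july["Treg"] = f(july["Yr"])

# Passing ax=ax draws the second line on the Axes created by the first plot.
ax = july.plot(x="Yr", y="Tmax")
july.plot(x="Yr", y="Treg", color="red", ax=ax)
ax.set_ylabel("Tmax")

print(len(ax.get_lines()))  # one line for Tmax, one for Treg
```

With the Agg backend nothing is shown on screen; calling ax.figure.savefig() with a filename of your choice writes the chart to disk instead.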

That’s it, done!

We can now see much more clearly the upward trend of temperature over the years.
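To put a number on that trend, you can read it straight off the fitted slope. Here is a minimal sketch of a hypothetical helper, trend_per_decade, which refits the line and scales the slope from units per year to units per decade (degrees Celsius per decade for this data, assuming Tmax is recorded in °C).

```python
import numpy as np

def trend_per_decade(years, values):
    """Least-squares slope of values against years, scaled to units per decade.

    Assumes `years` advances by exactly 1 per calendar year (as Yr does
    when no July is missing), so the raw slope is in units per year.
    """
    slope, _intercept = np.polyfit(years, values, 1)
    return 10.0 * slope

# Hypothetical check: a series rising 0.1 per year rises 1.0 per decade.
print(trend_per_decade([0, 1, 2, 3], [10.0, 10.1, 10.2, 10.3]))
```

With the article’s data the call would be trend_per_decade(july['Yr'], july['Tmax']), which is the same as 10 * d[0] from the polyfit step.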

And here is the same thing done with a scatter chart.

ax = july.plot.scatter(x='Yr', y='Tmax')
july.plot(x='Yr', y='Treg', color='Red', legend=False, ax=ax)
[Figure: scatter plot of July Tmax against Yr, with the Treg regression line in red]

That was fairly straightforward, I think, and I hope you found it useful.

For an introduction to plotting with Pandas see this:

Source: https://towardsdatascience.com/regression-plots-with-pandas-and-numpy-faf2edbfad4f
