Crime Rate Prediction Using Facebook Prophet

Time series prediction is one of the must-know techniques for any data scientist. Predicting the weather, product sales, customer visits to a shopping center, or the amount of inventory to maintain are all time series forecasting problems, which makes forecasting a valuable addition to a data scientist's skill set.

In this article, I will introduce how to use Facebook Prophet to predict the crime rate in Chicago. The article is split into five parts:

1. Prophet Introduction

2. EDA

3. Data processing

4. Model prediction

5. Takeaways

Let's begin the journey.

1. Prophet Introduction

In 2017, Facebook's Core Data Science team open-sourced Prophet. As stated on its GitHub page, Prophet is:

  • a procedure for forecasting time series data;

  • based on an additive model;

  • able to fit non-linear trends with yearly, weekly, and daily seasonality, plus holiday effects.

Prophet uses a decomposable model with three main components (trend, seasonality, and holidays), combined as follows:

$$y(t) = g(t) + s(t) + h(t) + \varepsilon_t$$

where:

  • g(t) is the trend function, which models non-periodic changes;

  • s(t) represents periodic changes (e.g., weekly and yearly seasonality);

  • h(t) represents the effects of holidays, which occur on potentially irregular schedules;

  • the error term ε_t represents any idiosyncratic changes that the model does not accommodate.

Using time as a regressor, Prophet fits several linear and non-linear functions of time as components. In effect, Prophet frames the forecasting problem as a curve-fitting exercise rather than modeling the time-based dependency between observations. This brings flexibility, fast fitting, and interpretable parameters.

Prophet works best with time series that have strong seasonal effects and several seasons of historical data.
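The additive structure described above (trend plus seasonality plus holiday effect plus error) can be illustrated with a small synthetic series. This is only a sketch of the idea, not Prophet's own code: the component shapes, the 12-step cycle, and all numeric values are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(48)  # 48 time steps of a hypothetical monthly series

g = 100.0 - 0.5 * t                        # trend: slow linear decline
s = 10.0 * np.sin(2 * np.pi * t / 12)      # seasonality: repeats every 12 steps
h = np.where(t % 12 == 11, 15.0, 0.0)      # holiday effect: bump at every 12th step
eps = rng.normal(scale=2.0, size=t.size)   # idiosyncratic error

y = g + s + h + eps                        # the components simply add up

# Because the model is additive, subtracting the three components
# from the series leaves only the error term.
residual = y - (g + s + h)
print(np.allclose(residual, eps))
```

Each component can be inspected or plotted on its own, which is what makes this kind of model easy to interpret.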

2. EDA

The data used here is the Chicago Crime dataset from Kaggle. It summarizes the reported crimes that occurred in the City of Chicago from 2001 to 2017.

A quick look at the data shows that the dataset has 23 columns and 7,941,282 records, with fields such as ID, Case Number, Block, Primary Type, and Description.

A brief view of the raw Chicago Crime dataset

First, let's drop the columns we won't use:

# Drop the columns that are not needed for the time series analysis
df.drop(['Unnamed: 0', 'ID', 'Case Number', 'IUCR', 'X Coordinate', 'Y Coordinate', 'Updated On', 'Year', 'FBI Code', 'Beat', 'Ward', 'Community Area', 'Location', 'District', 'Latitude', 'Longitude'],
        axis=1, inplace=True)

Fig.1 Data view after column dropping

As shown in Fig.1, the 'Date' column stores timestamps as text (month/day/year with a 12-hour clock). Let's convert it to a datetime type that pandas can work with and set it as the index:

# Parse the text timestamps (12-hour clock with AM/PM) into datetimes
df.Date = pd.to_datetime(df.Date, format='%m/%d/%Y %I:%M:%S %p')
# Use the timestamps as the index so the data can be resampled by time
df.index = pd.DatetimeIndex(df.Date)
df.drop('Date', inplace=True, axis=1)
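To see what this format string does, here is a minimal sketch on a two-row frame. The rows and the variable name `raw` are made up for illustration and are not taken from the dataset.

```python
import pandas as pd

# Two hypothetical records in the same text layout as the 'Date' column
raw = pd.DataFrame({
    "Date": ["01/05/2016 11:30:00 PM", "01/06/2016 12:15:00 AM"],
    "Primary Type": ["THEFT", "BATTERY"],
})

# %I is the 12-hour clock and %p the AM/PM marker
raw["Date"] = pd.to_datetime(raw["Date"], format="%m/%d/%Y %I:%M:%S %p")
raw = raw.set_index("Date")

print(raw.index)
```

Passing an explicit format avoids ambiguity between month-first and day-first dates, and it is usually faster than letting pandas infer the layout on millions of rows.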

Now the data is ready for visualization. First, let's look at the yearly crime distribution:

import matplotlib.pyplot as plt
# Count the records in each calendar year and plot the totals
plt.plot(df.resample('Y').size())
plt.xlabel('Year')
plt.ylabel('Num of crimes')

Note that df.resample('Y').size() produces the yearly crime count.
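The counting step can be checked on a handful of rows. The sketch below uses made-up timestamps and a daily frequency ('D') instead of yearly so that the result is easy to verify by eye; the mechanics are the same for any frequency.

```python
import pandas as pd

# Six hypothetical events spread over four days (none on 2016-01-03)
stamps = pd.to_datetime([
    "2016-01-01 08:00", "2016-01-01 21:30",
    "2016-01-02 10:15",
    "2016-01-04 02:45", "2016-01-04 13:00", "2016-01-04 23:59",
])
events = pd.DataFrame({"Primary Type": list("ABCDEF")}, index=stamps)

# resample() groups rows into time bins; size() counts the rows per bin
daily = events.resample("D").size()
print(daily)
```

Bins with no events still appear in the result with a count of zero, so the resampled series has no gaps in its index.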

As indicated in Fig.2, the crime rate drops from 2002 to 2005, rises from 2006 to a peak in 2009, and then declines through the end of the series. This curve may reflect the effect of the economy on crime: the crime rate falls year over year before and after the financial crisis, while the weak economy that followed the crisis may have driven an increase in crimes.

Fig.2 Yearly distribution of the crime rate

Second, let's look at the monthly crime rate distribution. As shown in Fig.3, the crime rate shows a descending trend with periodic ups and downs.

Fig.3 Monthly distribution of the crime rate

Similarly, as shown in Fig.4, the quarterly crime rate shows the same pattern as the monthly analysis.

Fig.4 Quarterly distribution of the crime rate

3. Data processing

The input to Prophet is always a dataframe with two columns: 'ds' and 'y'. The 'ds' (datestamp) column should be in a format expected by pandas, ideally YYYY-MM-DD for a date or YYYY-MM-DD HH:MM:SS for a timestamp. The 'y' column must be numeric and represents the measurement we wish to forecast.

Here we use the monthly crime count:

# Count crimes per month and turn the date index back into a column
df_m = df.resample('M').size().reset_index()
df_m.columns = ['Date', 'Monthly Crime Count']
# Rename the columns to the names Prophet expects
df_m_final = df_m.rename(columns={'Date': 'ds', 'Monthly Crime Count': 'y'})
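Before fitting, it can help to confirm that a frame has the expected shape. The helper below, `check_prophet_input`, is a hypothetical function written for this article and is not part of Prophet. It is also stricter than Prophet itself, which accepts date strings in 'ds', because it insists on an already-parsed datetime column.

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype, is_numeric_dtype


def check_prophet_input(frame):
    """Return a list of problems found in a candidate input frame."""
    problems = []
    for col in ("ds", "y"):
        if col not in frame.columns:
            problems.append(f"missing column '{col}'")
    if "ds" in frame.columns and not is_datetime64_any_dtype(frame["ds"]):
        problems.append("'ds' is not a datetime column")
    if "y" in frame.columns and not is_numeric_dtype(frame["y"]):
        problems.append("'y' is not numeric")
    return problems


# Hypothetical frames: one well-formed, one with the wrong names and types
good = pd.DataFrame({"ds": pd.to_datetime(["2016-01-31", "2016-02-29"]), "y": [120, 95]})
bad = pd.DataFrame({"Date": ["2016-01-31"], "y": ["120"]})

print(check_prophet_input(good))
print(check_prophet_input(bad))
```

Calling check_prophet_input(df_m_final) right before m.fit would catch a forgotten rename or a count column that was read in as text.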

4. Model prediction

From the EDA, we found monthly and quarterly seasonality but no yearly seasonality. By default, Prophet fits weekly and yearly seasonality if the time series is more than two cycles long. Other seasonalities, such as hourly, monthly, or quarterly, can be added with the add_seasonality method.

To make a prediction, instantiate a new Prophet object and call its fit method to train on the data:

from prophet import Prophet  # the package was named fbprophet before version 1.0
# Use a 95% uncertainty interval and turn off the yearly seasonality
m = Prophet(interval_width=0.95, yearly_seasonality=False)
# Add custom seasonalities; periods are given in days
m.add_seasonality(name='monthly', period=30.5, fourier_order=10)
m.add_seasonality(name='quarterly', period=91.5, fourier_order=10)
m.fit(df_m_final)

Note that interval_width=0.95 sets the width of the uncertainty interval around the forecast to 95% (Prophet's default is 80%). Prophet uses a partial Fourier sum to approximate periodic signals. The Fourier order determines how quickly the seasonality can change: a higher order can fit faster-changing cycles but increases the risk of overfitting.
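A partial Fourier sum of order N with period P models seasonality as a weighted sum of cos(2πnt/P) and sin(2πnt/P) for n = 1..N. The sketch below builds those 2N columns with NumPy to show what period and fourier_order control. It is an illustration under that formula, not Prophet's internal implementation, and `fourier_features` is a name chosen here.

```python
import numpy as np


def fourier_features(t_days, period, order):
    """Build cos/sin columns for n = 1..order with the given period (in days)."""
    t = np.asarray(t_days, dtype=float)
    columns = []
    for n in range(1, order + 1):
        angle = 2.0 * np.pi * n * t / period
        columns.append(np.cos(angle))
        columns.append(np.sin(angle))
    return np.column_stack(columns)


t = np.arange(0, 365)  # one year of daily time steps
X = fourier_features(t, period=91.5, order=10)
print(X.shape)  # one row per time step, two columns per Fourier term
```

The seasonal component is then a linear combination of these columns, so fitting it amounts to estimating 2N coefficients. With order 10 that is 20 coefficients per seasonality, which is a lot for a monthly series of this length and worth keeping in mind when choosing the order.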

Predictions are made on a dataframe with a 'ds' column containing the dates for which a prediction is to be made. For instance, to predict the following 24 months:

# Extend the dates by 24 monthly steps, then predict for every date in the frame
future = m.make_future_dataframe(periods=24, freq='M')
pred = m.predict(future)
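To get a feel for the frame that make_future_dataframe returns, here is a pandas-only sketch that builds the same kind of frame by hand: the historical dates followed by 24 month-end dates. The three historical dates are made up, and the construction is an approximation of the behaviour, not Prophet's code.

```python
import pandas as pd

# Hypothetical history ending on 2017-01-31
history = pd.DataFrame({"ds": pd.to_datetime(["2016-11-30", "2016-12-31", "2017-01-31"])})

last = history["ds"].max()
step = pd.offsets.MonthEnd(1)

# 24 month-end dates starting with the first one after the history
future_only = pd.date_range(start=last + step, periods=24, freq=step)

# By default the returned frame also keeps the historical dates
future = pd.concat([history, pd.DataFrame({"ds": future_only})], ignore_index=True)
print(len(future), future["ds"].iloc[3], future["ds"].iloc[-1])
```

Because the frame includes the historical dates, predict returns fitted values for the past as well as forecasts for the future, which is why the later plots can compare the model against the observed data.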

As shown in Fig.5, each date gets a predicted value 'yhat' together with a lower and an upper bound ('yhat_lower' and 'yhat_upper').

Fig.5 Prediction results

As shown in Fig.6, the black dots are the historical data and the deep blue line is the model's prediction. The light blue band is the 95% uncertainty interval around the predictions. The blue line closely follows the pattern in Fig.3, indicating a good fit to the historical data. Great!

Fig.6 Prediction plot
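A visual match can be backed up with numbers. The sketch below computes the mean absolute percentage error (MAPE) and the share of observations that fall inside the uncertainty interval. The four rows are made up; in practice the frame would come from joining the observed 'y' values with the 'yhat', 'yhat_lower', and 'yhat_upper' columns of the prediction on 'ds'.

```python
import pandas as pd

# Hypothetical observed values next to predictions and their bounds
scored = pd.DataFrame({
    "y":          [100.0, 120.0,  90.0, 110.0],
    "yhat":       [ 95.0, 126.0,  99.0, 110.0],
    "yhat_lower": [ 85.0, 110.0,  92.0, 100.0],
    "yhat_upper": [105.0, 140.0, 108.0, 120.0],
})

# Mean absolute percentage error of the point forecast
ape = (scored["y"] - scored["yhat"]).abs() / scored["y"].abs()
mape = ape.mean()

# Fraction of observations that lie within the uncertainty interval
inside = scored["y"].between(scored["yhat_lower"], scored["yhat_upper"])
coverage = inside.mean()

print(f"MAPE: {mape:.3f}, coverage: {coverage:.2f}")
```

These are in-sample numbers when computed on the data the model was fitted to, so they tend to be optimistic. Holding out the last months of the series gives a more honest estimate of forecast accuracy.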

Finally, Fig.7 shows the non-periodic trend and the monthly and quarterly seasonality components of the crime rate pattern.

Fig.7 Prediction pattern component plot

5. Takeaways

We walked through how to put Facebook Prophet to use. Specifically:

  • use EDA to explore the patterns in the historical data, which helps in choosing a suitable model;

  • use data processing to prepare the data for modeling;

  • use Prophet to fit the historical data and forecast the future crime rate.

Great! Huge congratulations on making it to the end. If you need the source code, feel free to visit my GitHub page.

1. Facebook Prophet official documentation

2. Prophet paper: Sean J. Taylor, Benjamin Letham (2018) Forecasting at scale. The American Statistician 72(1):37–45 (https://peerj.com/preprints/3190.pdf).

Original article: https://towardsdatascience.com/crime-rate-prediction-using-facebook-prophet-5348e21273d
