Python 数学应用(三)(1)https://developer.aliyun.com/article/1506400
ARIMA 模型结果
==================================================================
因变量:D.y 观察次数:365
模型:ARIMA(1, 1, 1) 对数似然-512.905
方法:css-mle 创新的标准差 0.986
日期:星期六,2020 年 5 月 2 日 AIC 1033.810
时间:14:47:25 BIC 1049.409
样本:2020 年 01 月 02 日 HQIC 1040.009
- 12-31-2020
==================================================================
coef std err z P>|z| [0.025 0.975]
const 0.9548 0.148 6.464 0.000 0.665 1.244
ar.L1.D.y 0.8342 0.056 14.992 0.000 0.725 0.943
ma.L1.D.y -0.5204 0.088 -5.903 0.000 -0.693 -0.348
根
==================================================================
实际 虚数 模数 频率
AR.1 1.1987 +0.0000j 1.1987 0.0000
MA.1 1.9216 +0.0000j 1.9216 0.0000
Here, we can see that all three of our estimated coefficients are significantly different from 0 due to the fact that all three have 0 to 3 decimal places in the `P>|z|` column. 6. Now, we can use the `forecast` method to generate predictions of future values. This also returns the standard error and confidence intervals for predictions:
forecast, std_err, fc_ci = fitted.forecast(steps=50)
forecast_dates = pd.date_range(“2021-01-01”, periods=50)
forecast = pd.Series(forecast, index=forecast_dates)
7. Next, we plot the forecast values and their confidence intervals on the figure containing the time series data:
forecast.plot(ax=ts_ax, c=“g”, 标签=“预测”)
ts_ax.fill_between(forecast_dates, fc_ci[:, 0], fc_ci[:, 1],
颜色=“r”, alpha=0.4)
8. Finally, we add the actual future values to generate, along with the sample in *step 1*, to the plot (it might be easier if you repeat the plot commands from *step 1* to regenerate the whole plot here):
test_ts.plot(ax=ts_ax, c=“k”, 标签=“实际”)
ts_ax.legend()
The final plot containing the time series with the forecast and the actual future values can be seen in the following figure: ![](https://gitee.com/OpenDocCN/freelearn-python-zh/raw/master/docs/app-math-py/img/76b414a5-1847-4944-8811-28677d9e3853.png) Figure 7.10: Plot of the sample time series with forecast values and actual future values for comparison Here, we can see that the actual future values are within the confidence interval for the forecast values. How it works... The ARIMA model – with orders *p*, *d*, and *q –* is simply an ARMA (*p*, *q*) model that's applied to a time series. This is obtained by applying differencing of order *d* to the original time series data. It is a fairly simple way to generate a model for time series data. The statsmodels `ARIMA` class handles the creation of a model, while the `fit` method fits this model to the data. We passed the `trend="c"` keyword argument because we know, from *Figure 7.9*, that the time series has a constant trend. The model is fit to the data using a maximum likelihood method and the final estimates for the parameters – in this case, one parameter for the autoregressive component, one for the moving average component, the constant trend parameter, and the variance of the noise. These parameters are reported in the summary. From this output, we can see that the estimates for the AR coefficient (0.8342) and the MA constant (-0.5204) are very good approximations of the true estimates that were used to generate the data, which were 0.8 for the AR coefficient and -0.5 for the MA coefficient. These parameters are set in the `generate_sample_data` routine from the `tsdata.py` file in the code repository for this chapter. This generates the sample data in *step 1*. You might have noticed that the constant parameter (0.9548) is not 0.2, as specified in the `generate_sample_data` call in *step 1*. In fact, it is not so far from the actual drift of the time series. The `forecast` method on the fitted model (the output of the `fit` method) uses the model to make predictions about the value after a given number of steps. In this recipe, we forecast for up to 50 time steps beyond the range of the sample time series. The output of the `forecast` method is a tuple containing the forecast values, the standard error for the forecasts, and the confidence interval (by default, 95% confidence) for the forecasts. Since we provided the time series as a Pandas series, these are returned as `Series` objects (the confidence interval is a `DataFrame`). When you construct an ARIMA model for time series data, you need to make sure you use the smallest order differencing that removes the underlying trend. Applying more differencing than is necessary is called *overdifferencing* and can lead to problems with the model. Forecasting seasonal data using ARIMA Time series often display periodic behavior so that peaks or dips in the value appear at regular intervals. This behavior is called *seasonality* in the analysis of time series. The methods we have used to far in this chapter to model time series data obviously do not account for seasonality. Fortunately, it is relatively easy to adapt the standard ARIMA model to incorporate seasonality, resulting in what is sometimes called a SARIMA model. In this recipe, we will learn how to model time series data that includes seasonal behavior and use this model to produce forecasts. Getting ready For this recipe, we will need the NumPy package imported as `np`, the Pandas package imported as `pd`, the Matplotlib `pyplot`module as `plt`, and the statsmodels `api`module imported as `sm`. We will also need the utility for creating sample time series data from the `tsdata`module, which is included in this book's repository:
从 tsdata 导入生成样本数据
How to do it... Follow these steps to produce a seasonal ARIMA model for sample time series data and use this model to produce forecasts: 1. First, we use the `generate_sample_data` routine to generate a sample time series to analyze:
sample_ts,test_ts = generate_sample_data(undiff=True,
季节性=True)
2. As usual, our first step is to visually inspect the data by producing a plot of the sample time series:
ts_fig, ts_ax = plt.subplots(tight_layout=True)
sample_ts.plot(ax=ts_ax, 标题=“时间序列”, 标签=“观察到的”)
ts_ax.set_xlabel(“日期”)
ts_ax.set_ylabel(“值”)
The plot of the sample time series data can be seen in the following figure. Here, we can see that there seem to be periodic peaks in the data: ![](https://gitee.com/OpenDocCN/freelearn-python-zh/raw/master/docs/app-math-py/img/f4ae55a8-7e3a-489b-b5bc-987923b794ab.png) Figure 7.11: Plot of the sample time series data 3. Next, we plot the ACF and PACF for the sample time series:
ap_fig,(acf_ax,pacf_ax) = plt.subplots(2, 1,
sharex=True, tight_layout=True)
sm.graphics.tsa.plot_acf(sample_ts, ax=acf_ax)
sm.graphics.tsa.plot_pacf(sample_ts, ax=pacf_ax)
pacf_ax.set_xlabel(“滞后”)
acf_ax.set_ylabel(“值”)
pacf_ax.set_ylabel(“值”)
The ACF and PACF for the sample time series can be seen in the following figure: ![](https://gitee.com/OpenDocCN/freelearn-python-zh/raw/master/docs/app-math-py/img/0d6132a1-0b39-4f7b-bf1a-0da288ad8c3c.png) Figure 7.12: ACF and PACF for the sample time series These plots possibly indicate the existence of autoregressive components, but also a significant spike on the PACF with lag 7. 4. Next, we difference the time series and produce plots of the ACF and PACF for the differenced series. This should make the order of the model clearer:
差分=sample_ts.diff().dropna()
dap_fig, (dacf_ax, dpacf_ax) = plt.subplots(2, 1, sharex=True,
tight_layout=True)
sm.graphics.tsa.plot_acf(diffs, ax=dacf_ax,
标题=“差分 ACF”)
sm.graphics.tsa.plot_pacf(diffs, ax=dpacf_ax,
标题=“差分 PACF”)
dpacf_ax.set_xlabel(“滞后”)
dacf_ax.set_ylabel(“值”)
dpacf_ax.set_ylabel(“值”)
The ACF and PACF for the differenced time series can be seen in the following figure. We can see that there is definitely a seasonal component with lag 7: ![](https://gitee.com/OpenDocCN/freelearn-python-zh/raw/master/docs/app-math-py/img/7e2f3382-2b4e-4423-9d38-0e0d4332f9b8.png) Figure 7.13: Plot of the ACF and PACF for the differenced time series 5. Now, we need to create a `SARIMAX` object that holds the model, with ARIMA order `(1, 1, 1)` and seasonal ARIMA order `(1, 0, 0, 7)`. We fit this model to the sample time series and print summary statistics. We plot the predicted values on top of the time series data:
model = sm.tsa.SARIMAX(sample_ts, order=(1, 1, 1),
季节性顺序=(1, 0, 0, 7))
fitted_seasonal = model.fit()
print(fitted_seasonal.summary())
fitted_seasonal.fittedvalues.plot(ax=ts_ax, c=“r”,
标签=“预测”)
The summary statistics that are printed to the Terminal look as follows:
SARIMAX 结果
===================================================================
因变量:y 观察次数:366
模型:SARIMAX(1, 1, 1)x(1, 0, [], 7) 对数似然-509.941
日期:星期一,2020 年 5 月 4 日 AIC 1027.881
时间:18:03:27 BIC 1043.481
样本:2020 年 01 月 01 日 HQIC 1034.081
- 12-31-2020
协方差类型: opg
===================================================================
coef std err z P>|z| [0.025 0.975]
ar.L1 0.7939 0.065 12.136 0.000 0.666 0.922
ma.L1 -0.4544 0.095 -4.793 0.000 -0.640 -0.269
ar.S.L7 0.7764 0.034 22.951 0.000 0.710 0.843
sigma2 0.9388 0.073 12.783 0.000 0.795 1.083
===================================================================
Ljung-Box (Q): 31.89 Jarque-Bera (JB): 0.47
Prob(Q): 0.82 Prob(JB): 0.79
异方差性(H): 1.15 偏度: -0.03
Prob(H) (双侧): 0.43 峰度: 2.84
===================================================================
警告:
[1] 使用外积计算的协方差矩阵
梯度的数量(复杂步骤)。
6. This model appears to be a reasonable fit, so we move ahead and forecast `50` time steps into the future:
forecast_result = fitted_seasonal.get_forecast(steps=50)
forecast_index = pd.date_range(“2021-01-01”, periods=50)
预测 = 预测结果.预测均值
7. Finally, we add the forecast values to the plot of the sample time series, along with the confidence interval for these forecasts:
forecast.plot(ax=ts_ax, c=“g”, label=“预测”)
conf = forecast_result.conf_int()
ts_ax.fill_between(forecast_index, conf[“lower y”],
conf[“upper y”], color=“r”, alpha=0.4)
test_ts.plot(ax=ts_ax, color=“k”, label=“实际未来”)
ts_ax.legend()
The final plot of the time series, along with the predictions and the confidence interval for the forecasts, can be seen in the following figure: ![](https://gitee.com/OpenDocCN/freelearn-python-zh/raw/master/docs/app-math-py/img/54ae8641-4c49-464e-9183-9c56e95c1145.png) Figure 7.14: Plot of the sample time series, along with the forecasts and confidence interval How it works... Adjusting an ARIMA model to incorporate seasonality is a relatively simple task. A seasonal component is similar to an autoregressive component, where the lag starts at some number larger than 1\. In this recipe, the time series exhibits seasonality with period 7 (weekly), which means that the model is approximately given by the following equation: ![](https://gitee.com/OpenDocCN/freelearn-python-zh/raw/master/docs/app-math-py/img/0a72f02a-4314-4245-b1cf-92c84d07882e.png) Here *φ[1]* and *Φ**[1]**are the parameters and *ε[t]* is the noise at time step *t*. The standard ARIMA model is easily adapted to include this additional lag term.* *The SARIMA model incorporates this additional seasonality into the ARIMA model. It has four additional order terms on top of the three for the underlying ARIMA model. These four additional parameters are the seasonal AR, differencing, and MA components, along with the period of the seasonality. In this recipe, we took the seasonal AR to be order 1, with no seasonal differencing or MA components (order 0), and a seasonal period of 7\. This gives us the additional parameters (1, 0, 0, 7) that we used in *step 5* of this recipe. Seasonality is clearly important in modeling time series data that is measured over a period of time covering days, months, or years. It usually incorporates some kind of seasonal component based on the time frame that they occupy. For example, a time series of national power consumption measured hourly over several days would probably have a 24-hour seasonal component since power consumption will likely fall during the night hours. Long-term seasonal patterns might be hidden if the time series data that you are analyzing does not cover a sufficiently large time period for the pattern to emerge. The same is true for trends in the data. This can lead to some interesting problems when trying to produce long-term forecasts from a relatively short period represented by observed data. The `SARIMAX` class from the statsmodels package provides the means of modeling time series data using a seasonal ARIMA model. In fact, it can also model external factors that have an additional effect on the model, sometimes called *exogenous regressors*. (We will not cover these here.) This class works much like the `ARMA` and `ARIMA` classes that we used in the previous recipes. First, we create the model object by providing the data and orders for both the ARIMA process and the seasonal process, and then use the `fit` method on this object to create a fitted model object. We use the `get_forecasts` method to generate an object holding the forecasts and confidence interval data that we can then plot, thus producing the *Figure 7.14*. ## There's more... There is a small difference in the interface between the `SARIMAX` class used in this recipe and the `ARIMA` class used in the previous recipe. At the time of writing, the statsmodels package (v0.11) includes a second `ARIMA` class that builds on top of the `SARIMAX` class, thus providing the same interface. However, at the time of writing, this new `ARIMA` class does not offer the same functionality as that used in this recipe. # Using Prophet to model time series data The tools we have seen so far for modeling time series data are very general and flexible methods, but they require some knowledge of time series analysis in order to be set up. The analysis needed to construct a good model that can be used to make reasonable predictions into the future can be intensive and time-consuming, and may not be viable for your application. The Prophet library is designed to automatically model time series data quickly, without the need for input from the user, and make predictions into the future. In this recipe, we will learn how to use Prophet to produce forecasts from a sample time series. ## Getting ready For this recipe, we will need the Pandas package imported as `pd`, the Matplotlib `pyplot` package imported as `plt`, and the `Prophet` object from the Prophet library, which can be imported using the following command:
from fbprophet import Prophet
We also need to import the `generate_sample_data` routine from the `tsdata` module, which is included in the code repository for this book:
from tsdata import generate_sample_data
## How to do it... The following steps show you how to use the Prophet package to generate forecasts for a sample time series: 1. First, we use `generate_sample_data` to generate the sample time series data:
sample_ts, test_ts = generate_sample_data(undiff=True, trend=0.2)
2. We need to convert the sample data into a `DataFrame` that Prophet expects:
df_for_prophet = pd.DataFrame({
“ds”: sample_ts.index, # dates
“y”: sample_ts.values # values
})
3. Next, we make a model using the `Prophet` class and fit it to the sample time series:
model = Prophet()
model.fit(df_for_prophet)
4. Now, we create a new `DataFrame` that contains the time intervals for the original time series, plus the additional periods for the forecasts:
forecast_df = model.make_future_dataframe(periods=50)
5. Then, we use the `predict` method to produce the forecasts along the time periods we just created:
forecast = model.predict(forecast_df)
6. Finally, we plot the predictions on top of the sample time series data, along with the confidence interval and the true future values:
fig, ax = plt.subplots(tight_layout=True)
sample_ts.plot(ax=ax, label=“观察到的”, title=“预测”)
forecast.plot(x=“ds”, y=“yhat”, ax=ax, c=“r”,
label=“预测”)
ax.fill_between(forecast[“ds”].values, forecast[“yhat_lower”].values,
forecast[“yhat_upper”].values, color=“r”, alpha=0.4)
test_ts.plot(ax=ax, c=“k”, label=“未来”)
ax.legend()
ax.set_xlabel(“日期”)
ax.set_ylabel(“值”)
The plot of the time series, along with forecasts, can be seen in the following figure: ![](https://gitee.com/OpenDocCN/freelearn-python-zh/raw/master/docs/app-math-py/img/9a9b3e93-8bbb-491f-8f99-2357a722754a.png) Figure 7.15: Plot of sample time series data, along with forecasts and a confidence interval ## How it works... Prophet is a package that's used to automatically produce models for time series data based on sample data, with little extra input needed from the user. In practice, it is very easy to use; we just need to create an instance of the `Prophet` class, call the `fit` method, and then we are ready to produce forecasts and understand our data using the model. The `Prophet` class expects the data in a specific format: a `DataFrame` with columns named `ds` for the date/time index, and `y` for the response data (the time series values). This `DataFrame` should have integer indices. Once the model has been fit, we use `make_future_dataframe` to create a `DataFrame` in the correct format, with appropriate date intervals, and with additional rows for future time intervals. The `predict` method then takes this `DataFrame` and produces values using the model to populate these time intervals with predicted values. We also get other information, such as the confidence intervals, in this forecast's `DataFrame`. ## There's more... Prophet does a fairly good job of modeling time series data without any input from the user. However, the model can be customized using various methods from the `Prophet` class. For example, we could provide information about the seasonality of the data using the `add_seasonality` method of the `Prophet` class, prior to fitting the model. There are alternative packages for automatically generating models for time series data. For example, popular machine learning libraries such as TensorFlow can be used to model time series data. # Further reading A good textbook on regression in statistics is the book *Probability and Statistics* by Mendenhall, Beaver, and Beaver, as mentioned in Chapter 6, *Working with Data and Statistics*. The following books provide a good introduction to classification and regression in modern data science: * *James, G. and Witten, D., 2013\. An Introduction To Statistical Learning: With Applications In R. New York: Springer.* * *Müller, A. and Guido, S., 2016\. Introduction To Machine Learning With Python. Sebastopol: O'Reilly Media.* A good introduction to time series analysis can be found in the following book: * *Cryer, J. and Chan, K., 2008\. Time Series Analysis. New York: Springer.** ```** # 第九章:几何问题 本章描述了关于二维几何的几个问题的解决方案。几何是数学的一个分支,涉及点、线和其他图形(形状)的特征,这些图形之间的相互作用以及这些图形的变换。在本章中,我们将重点关注二维图形的特征以及这些对象之间的相互作用。 在 Python 中处理几何对象时,我们必须克服几个问题。最大的障碍是表示问题。大多数几何对象占据二维平面中的一个区域,因此不可能存储区域内的每个点。相反,我们必须找到一种更紧凑的方式来表示可以存储为相对较少的点的区域。例如,我们可以存储沿对象边界的一些点,从而可以重建边界和对象本身。此外,我们将几何问题重新表述为可以使用代表性数据回答的问题。 第二个最大的问题是将纯几何问题转化为可以使用软件理解和解决的形式。这可能相对简单-例如,找到两条直线相交的点是解决矩阵方程的问题-或者可能非常复杂,这取决于所提出的问题类型。解决这些问题的常见技术是使用更简单的对象表示所讨论的图形,并使用每个简单对象解决(希望)更容易的问题。然后,这应该给我们一个关于原始问题的解决方案的想法。 我们将首先向您展示如何可视化二维形状,然后学习如何确定一个点是否包含在另一个图形中。然后,我们将继续查看边缘检测、三角剖分和寻找凸包。最后,我们将通过构造贝塞尔曲线来结束本章。 本章包括以下教程: + 可视化二维几何形状 + 寻找内部点 + 在图像中查找边缘 + 对平面图形进行三角剖分 + 计算凸包 + 构造贝塞尔曲线 让我们开始吧! # 技术要求 对于本章,我们将像往常一样需要`numpy`包和`matplotlib`包。我们还需要 Shapely 包和`scikit-image`包,可以使用您喜欢的软件包管理器(如`pip`)进行安装: ```py python3.8 -m pip install numpy matplotlib shapely scikit-image
该章节的代码可以在 GitHub 存储库的Chapter 08
文件夹中找到:github.com/PacktPublishing/Applying-Math-with-Python/tree/master/Chapter%2008
。
查看以下视频以查看代码实际操作:bit.ly/3hpeKEF
。
可视化二维几何形状
本章的重点是二维几何,因此我们的第一个任务是学习如何可视化二维几何图形。这里提到的一些技术和工具可能适用于三维几何图形,但通常需要更专门的软件包和工具。
几何图形,至少在本书的上下文中,是指边界是一组线和曲线的任何点、线、曲线或封闭区域(包括边界)的集合。简单的例子包括点和线(显然)、矩形、多边形和圆。
在本教程中,我们将学习如何使用 Matplotlib 可视化几何图形。
准备工作
对于本教程,我们需要导入 NumPy 包作为np
,导入 Matplotlib pyplot
模块作为plt
。我们还需要从 Matplotlib patches
模块导入Circle
类和从 Matplotlib collections
模块导入PatchCollection
类。可以使用以下命令完成这些操作:
import numpy as np import matplotlib.pyplot as plt from matplotlib.patches import Circle from matplotlib.collections import PatchCollection
我们还需要本章的代码库中的swisscheese-grid-10411.csv
数据文件。
如何做…
以下步骤向您展示了如何可视化一个二维几何图形:
- 首先,我们从本书的代码库中的
swisscheese-grid-10411.csv
文件加载数据:
data = np.loadtxt("swisscheese-grid-10411.csv")
- 我们创建一个代表绘图区域的新补丁对象。这将是一个圆(圆盘),其中心在原点,半径为
1
。我们创建一个新的轴集,并将这个补丁添加到其中:
fig, ax = plt.subplots() outer = Circle((0.0, 0.0), 1.0, zorder=0, fc="k") ax.add_patch(outer)
- 接下来,我们从步骤 1中加载的数据创建一个
PatchCollection
对象,其中包含了一些其他圆的中心和半径。然后我们将这个PatchCollection
添加到步骤 2中创建的轴上:
col = PatchCollection( (Circle((x, y), r) for x, y, r in data), facecolor="white", zorder=1, linewidth=0.2, ls="-", ec="k" ) ax.add_collection(col)
- 最后,我们设置*x-和y-*轴的范围,以便整个图像都能显示出来,然后关闭轴线:
ax.set_xlim((-1.1, 1.1)) ax.set_ylim((-1.1, 1.1)) ax.set_axis_off()
结果图像是一个瑞士奶酪,如下所示:
图 8.1:瑞士奶酪的绘图
它是如何工作的…
这个食谱的关键是Circle
和PatchCollection
对象,它们代表了 Matplotlib Axes
上的绘图区域的区域。在这种情况下,我们创建了一个大的圆形补丁,它位于原点,半径为1
,具有黑色的面颜色,并使用zorder=0
将其放在其他补丁的后面。这个补丁被添加到Axes
对象中使用add_patch
方法。
下一步是创建一个对象,它将呈现从 CSV 文件中加载的数据所代表的圆。这些数据包括中心(x,y)和半径r的值,用于表示个别圆的中心和半径(总共 10,411 个)。PatchCollection
对象将一系列补丁组合成一个单一的对象,可以添加到Axes
对象中。在这里,我们为我们的数据中的每一行添加了一个Circle
,然后使用add_collection
方法将其添加到Axes
对象中。请注意,我们已经将面颜色应用到整个集合,而不是每个单独的Circle
成员。我们将面颜色设置为白色(使用facecolor="w"
参数),边缘颜色设置为黑色(使用ec="k"
),边缘线宽设置为 0.2(使用linewidth=0.2
),边缘样式设置为连续线。所有这些放在一起,就得到了我们的图像。
我们在这里创建的图像被称为“瑞士奶酪”。这些最初是由爱丽丝·罗斯在 1938 年在有理逼近理论中使用的;随后它们被重新发现,并且类似的构造自那时以来已经被多次使用。我们使用这个例子是因为它由一个大的个体部分和一个大量的较小的个体部分组成。罗斯的瑞士奶酪是平面上具有正面积但没有拓扑内部的一个集合的例子。(这样一个集合甚至能存在是相当惊人的!)更重要的是,有一些连续函数在这个瑞士奶酪上是不能被有理函数逼近的。这个特性使得类似的构造在均匀 代数理论中非常有用。
Circle
类是更一般的Patch
类的子类。还有许多其他Patch
类,代表不同的平面图形,比如Polygon
和PathPatch
,它们代表了由路径(曲线或曲线集合)所界定的区域。这些可以用来生成可以在 Matplotlib 图中呈现的复杂补丁。集合可以用来同时应用设置到多个补丁对象上,这在本例中特别有用,因为你有大量的对象都将以相同的样式呈现。
还有更多…
Matplotlib 中有许多不同的补丁类型。在本教程中,我们使用了Circle
补丁类,它表示坐标轴上的圆形区域。还有Polygon
补丁类,它表示多边形(规则或其他)。还有PatchPath
对象,它们是由不一定由直线段组成的曲线包围的区域。这类似于许多矢量图形软件包中可以构造阴影区域的方式。
除了 Matplotlib 中的单个补丁类型外,还有许多集合类型,它们将许多补丁聚集在一起,以便作为单个对象使用。在本教程中,我们使用了PatchCollection
类来收集大量的Circle
补丁。还有更多专门的补丁集合,可以用来自动生成这些内部补丁,而不是我们自己生成它们。
另请参阅
在数学中,可以在以下传记文章中找到关于瑞士奶酪的更详细的历史:Daepp,U., Gauthier, P., Gorkin, P. and Schmieder, G., 2005. Alice in Switzerland: The life and mathematics of Alice Roth. The Mathematical Intelligencer, 27(1), pp.41-54。
查找内部点
在编程环境中处理二维图形的一个问题是,您不可能存储所有位于图形内的点。相反,我们通常存储表示图形的远少于的点。在大多数情况下,这将是一些点(通过线连接)来描述图形的边界。这在内存方面是有效的,并且可以使用 MatplotlibPatches
轻松在屏幕上可视化它们。但是,这种方法使确定点或其他图形是否位于给定图形内变得更加困难。这是许多几何问题中的一个关键问题。
在本教程中,我们将学习如何表示几何图形并确定点是否位于图形内。
做好准备
对于本教程,我们需要将matplotlib
包(整体)导入为mpl
,将pyplot
模块导入为plt
:
import matplotlib as mpl import matplotlib.pyplot as plt
我们还需要从 Shapely 包的geometry
模块中导入Point
和Polygon
对象。Shapely 包包含许多用于表示、操作和分析二维几何图形的例程和对象:
from shapely.geometry import Polygon, Point
如何操作…
以下步骤向您展示如何创建多边形的 Shapely 表示,然后测试点是否位于此多边形内:
- 创建一个样本多边形进行测试:
polygon = Polygon( [(0, 2), (-1, 1), (-0.5, -1), (0.5, -1), (1, 1)], )
- 接下来,我们在新图上绘制多边形。首先,我们需要将多边形转换为可以添加到图中的 Matplotlib
Polygon
补丁:
fig, ax = plt.subplots() poly_patch = mpl.patches.Polygon(polygon.exterior, ec="k", lw="1", alpha=0.5) ax.add_patch(poly_patch) ax.set(xlim=(-1.05, 1.05), ylim=(-1.05, 2.05)) ax.set_axis_off()
- 现在,我们需要创建两个测试点,其中一个将位于多边形内,另一个将位于多边形外:
p1 = Point(0.0, 0.0) p2 = Point(-1.0, -0.75)
- 我们在多边形上方绘制并注释这两个点,以显示它们的位置:
ax.plot(0.0, 0.0, "k*") ax.annotate("p1", (0.0, 0.0), (0.05, 0.0)) ax.plot(-0.8, -0.75, "k*") ax.annotate("p2", (-0.8, -0.75), (-0.8 + 0.05, -0.75))
- 最后,我们使用
contains
方法测试每个点是否位于多边形内,然后将结果打印到终端:
print("p1 inside polygon?", polygon.contains(p1)) print("p2 inside polygon?", polygon.contains(p2))
结果显示,第一个点p1
包含在多边形内,而第二个点p2
不包含。这也可以在下图中看到,清楚地显示了一个点包含在阴影多边形内,而另一个点不包含:
图 8.2:多边形区域内外的点
工作原理…
ShapelyPolygon
类是多边形的表示,它将其顶点存储为点。外边界包围的区域-存储顶点之间的五条直线-对我们来说是明显的,并且很容易被眼睛识别,但是“在”边界内的概念很难以一种计算机容易理解的方式定义。甚至很难给出关于“在”给定曲线内的含义的正式数学定义。
确定点是否位于简单闭合曲线内有两种主要方法 - 即从同一位置开始并结束且不包含任何自交点的曲线。第一种方法使用数学概念称为绕数,它计算曲线“绕”点的次数,以及射线交叉计数方法,其中我们计算从点到无穷远处的点的射线穿过曲线的次数。幸运的是,我们不需要自己计算这些数字,因为我们可以使用 Shapely 包中的工具来为我们执行这些计算。这就是多边形的contains
方法所做的。(在底层,Shapely 使用 GEOS 库执行此计算。)
Shapely 的Polygon
类可用于计算与这些平面图形相关的许多数量,包括周长和面积。contains
方法用于确定点或一组点是否位于对象表示的多边形内(该类有关多边形的表示存在一些限制)。实际上,您可以使用相同的方法来确定一个多边形是否包含在另一个多边形内,因为正如我们在这个示例中看到的,多边形由一组简单的点表示。
在图像中找到边缘
在图像中找到边缘是减少包含大量噪音和干扰的复杂图像为包含最突出轮廓的非常简单图像的好方法。这可以作为我们分析过程的第一步,例如在图像分类中,或者作为将线轮廓导入计算机图形软件包的过程。
在这个示例中,我们将学习如何使用scikit-image
包和 Canny 算法来找到复杂图像中的边缘。
准备工作
对于这个示例,我们需要将 Matplotlib 的pyplot
模块导入为plt
,从skimage.io
模块导入imread
例程,以及从skimage.feature
模块导入canny
例程:
import matplotlib.pyplot as plt from skimage.io import imread from skimage.feature import canny
如何做到…
按照以下步骤学习如何使用scikit-image
包在图像中找到边缘:
- 从源文件加载图像数据。这可以在本章的 GitHub 存储库中找到。关键是,我们传入
as_gray=True
以以灰度加载图像:
image = imread("mandelbrot.png", as_gray=True)
以下是原始图像,供参考。集合本身由白色区域显示,如您所见,边界由较暗的阴影表示,非常复杂:
图 8.3:使用 Python 生成的 Mandelbrot 集合的绘图
- 接下来,我们使用
canny
例程,需要从scikit-image
包的features
模块导入。对于这个图像,sigma
值设置为 0.5:
edges = canny(image, sigma=0.5)
- 最后,我们将
edges
图像添加到一个新的图中,使用灰度(反转)色图:
fig, ax = plt.subplots() ax.imshow(edges, cmap="gray_r") ax.set_axis_off()
已检测到的边缘可以在以下图像中看到。边缘查找算法已经识别出 Mandelbrot 集合边界的大部分可见细节,尽管并不完美(毕竟这只是一个估计):
图 8.4:使用 scikit-image 包的 Canny 边缘检测算法找到的 Mandelbrot 集合的边缘
它是如何工作的…
scikit-image
包提供了各种用于操作和分析从图像中导出的数据的实用程序和类型。正如其名称所示,canny
例程使用 Canny 边缘检测算法来找到图像中的边缘。该算法使用图像中的强度梯度来检测边缘,其中梯度较大。它还执行一些过滤以减少它找到的边缘中的噪音。
我们提供的sigma
关键字值是应用于图像的高斯平滑的标准偏差,用于计算边缘检测。这有助于我们去除图像中的一些噪音。我们设置的值(0.5)小于默认值(1),但在这种情况下可以提供更好的分辨率。较大的值会遮盖 Mandelbrot 集边界的一些细节。
Python 数学应用(三)(3)https://developer.aliyun.com/article/1506405