7. regplot function: scatter plot with a linear regression fit and confidence interval
seaborn.regplot(*, x=None, y=None, data=None, x_estimator=None, x_bins=None, x_ci='ci', scatter=True, fit_reg=True, ci=95, n_boot=1000, units=None, seed=None, order=1, logistic=False, lowess=False, robust=False, logx=False, x_partial=None, y_partial=None, truncate=True, dropna=True, x_jitter=None, y_jitter=None, label=None, color=None, marker='o', scatter_kws=None, line_kws=None, ax=None)
Official documentation: http://seaborn.pydata.org/generated/seaborn.regplot.html?highlight=regplot#seaborn.regplot
Plot data and a linear regression model fit.
There are a number of mutually exclusive options for estimating the regression model. See the tutorial for more information.
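(1) Default settings.
Used on its own: x_estimator=np.mean,  # if x is discrete, plot the mean of y at each unique x value instead of every raw point
(2) Used on its own: logx=True, truncate=True,  # fit the regression model on log(x) and truncate the model prediction to the range of the data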
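The two options above can be sketched as follows. This is a minimal illustration, not the original author's code: the DataFrame, its column names (`party_size`, `bill`), and the synthetic data are assumptions made up for the example. Only keyword arguments listed in the signature above are used.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical synthetic data: a discrete, strictly positive x and a noisy y.
rng = np.random.default_rng(0)
party_size = rng.integers(1, 7, size=200)  # discrete values 1..6
bill = 5 + 8 * np.log(party_size) + rng.normal(0, 2, size=200)
df = pd.DataFrame({"party_size": party_size, "bill": bill})

fig, (ax_mean, ax_logx) = plt.subplots(1, 2, figsize=(10, 4))

# (1) Collapse each unique x value to the mean of y before plotting.
sns.regplot(x="party_size", y="bill", data=df,
            x_estimator=np.mean, ax=ax_mean)

# (2) Fit y ~ log(x); truncate=True bounds the fitted line by the data limits.
sns.regplot(x="party_size", y="bill", data=df,
            logx=True, truncate=True, ax=ax_logx)
```

Call `plt.show()` or `fig.savefig(...)` afterwards to view the result. Note that `logx=True` requires x to be strictly positive.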
8. kdeplot function: kernel density and density contour plot visualization
seaborn.kdeplot(x=None, *, y=None, shade=None, vertical=False, kernel=None, bw=None, gridsize=200, cut=3, clip=None, legend=True, cumulative=False, shade_lowest=None, cbar=False, cbar_ax=None, cbar_kws=None, ax=None, weights=None, hue=None, palette=None, hue_order=None, hue_norm=None, multiple='layer', common_norm=True, common_grid=False, levels=10, thresh=0.05, bw_method='scott', bw_adjust=1, log_scale=None, color=None, fill=None, data=None, data2=None, **kwargs)
Official documentation: http://seaborn.pydata.org/generated/seaborn.kdeplot.html?highlight=kdeplot#seaborn.kdeplot
Plot univariate or bivariate distributions using kernel density estimation.
A kernel density estimate (KDE) plot is a method for visualizing the distribution of observations in a dataset, analogous to a histogram. KDE represents the data using a continuous probability density curve in one or more dimensions.
The approach is explained further in the user guide.
9. boxplot function: box plot visualization
seaborn.boxplot(*, x=None, y=None, hue=None, data=None, order=None, hue_order=None, orient=None, color=None, palette=None, saturation=0.75, width=0.8, dodge=True, fliersize=5, linewidth=None, whis=1.5, ax=None, **kwargs)
Official documentation: http://seaborn.pydata.org/generated/seaborn.boxplot.html?highlight=boxplot#seaborn.boxplot
Draw a box plot to show distributions with respect to categories.
A box plot (or box-and-whisker plot) shows the distribution of quantitative data in a way that facilitates comparisons between variables or across levels of a categorical variable. The box shows the quartiles of the dataset while the whiskers extend to show the rest of the distribution, except for points that are determined to be “outliers” using a method that is a function of the inter-quartile range.
Input data can be passed in a variety of formats, including:
Vectors of data represented as lists, numpy arrays, or pandas Series objects passed directly to the x, y, and/or hue parameters.
A “long-form” DataFrame, in which case the x, y, and hue variables will determine how the data are plotted.
A “wide-form” DataFrame, such that each numeric column will be plotted.
An array or list of vectors.
In most cases, it is possible to use numpy or Python objects, but pandas objects are preferable because the associated names will be used to annotate the axes. Additionally, you can use Categorical types for the grouping variables to control the order of plot elements.
This function always treats one of the variables as categorical and draws data at ordinal positions (0, 1, … n) on the relevant axis, even when the data has a numeric or date type.
10. violinplot function: violin plot visualization
seaborn.violinplot(*, x=None, y=None, hue=None, data=None, order=None, hue_order=None, bw='scott', cut=2, scale='area', scale_hue=True, gridsize=100, width=0.8, inner='box', split=False, dodge=True, orient=None, linewidth=None, color=None, palette=None, saturation=0.75, ax=None, **kwargs)
Official documentation: http://seaborn.pydata.org/generated/seaborn.violinplot.html?highlight=violinplot#seaborn.violinplot
Draw a combination of boxplot and kernel density estimate.
A violin plot plays a similar role as a box and whisker plot. It shows the distribution of quantitative data across several levels of one (or more) categorical variables such that those distributions can be compared. Unlike a box plot, in which all of the plot components correspond to actual datapoints, the violin plot features a kernel density estimation of the underlying distribution.
This can be an effective and attractive way to show multiple distributions of data at once, but keep in mind that the estimation procedure is influenced by the sample size, and violins for relatively small samples might look misleadingly smooth.
Input data can be passed in a variety of formats, including:
Vectors of data represented as lists, numpy arrays, or pandas Series objects passed directly to the x, y, and/or hue parameters.
A “long-form” DataFrame, in which case the x, y, and hue variables will determine how the data are plotted.
A “wide-form” DataFrame, such that each numeric column will be plotted.
An array or list of vectors.
In most cases, it is possible to use numpy or Python objects, but pandas objects are preferable because the associated names will be used to annotate the axes. Additionally, you can use Categorical types for the grouping variables to control the order of plot elements.
This function always treats one of the variables as categorical and draws data at ordinal positions (0, 1, … n) on the relevant axis, even when the data has a numeric or date type.
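split=True,  # whether to split each violin into two halves, one per hue level; this requires the third feature (hue) to have exactly two categories. Worth trying out.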
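A minimal sketch of `split=True` with a two-level hue variable follows. The data and the column names (`day`, `smoker`, `value`) are invented for the example.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(3)

# Hypothetical data: two days, and a hue variable with exactly two levels.
frames = []
for day, shift in (("Sat", 0.0), ("Sun", 1.0)):
    for smoker, spread in (("yes", 1.0), ("no", 1.5)):
        frames.append(pd.DataFrame({
            "day": day,
            "smoker": smoker,
            "value": rng.normal(shift, spread, 80),
        }))
df = pd.concat(frames, ignore_index=True)

fig, ax = plt.subplots(figsize=(6, 4))

# Each violin shows "yes" on one side and "no" on the other.
sns.violinplot(x="day", y="value", hue="smoker", data=df,
               split=True, ax=ax)
```

Splitting keeps the two hue groups on a shared center line, which makes their shapes easier to compare than two side-by-side violins.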
11. boxenplot function: letter-value (LV) plot visualization
seaborn.boxenplot(*, x=None, y=None, hue=None, data=None, order=None, hue_order=None, orient=None, color=None, palette=None, saturation=0.75, width=0.8, dodge=True, k_depth='tukey', linewidth=None, scale='exponential', outlier_prop=0.007, trust_alpha=0.05, showfliers=True, ax=None, **kwargs)
Official documentation: http://seaborn.pydata.org/generated/seaborn.boxenplot.html?highlight=boxenplot#seaborn.boxenplot
Draw an enhanced box plot for larger datasets.
This style of plot was originally named a “letter value” plot because it shows a large number of quantiles that are defined as “letter values”. It is similar to a box plot in plotting a nonparametric representation of a distribution in which all features correspond to actual observations. By plotting more quantiles, it provides more information about the shape of the distribution, particularly in the tails. For a more extensive explanation, you can read the paper that introduced the plot:
https://vita.had.co.nz/papers/letter-value-plot.html
Input data can be passed in a variety of formats, including:
Vectors of data represented as lists, numpy arrays, or pandas Series objects passed directly to the x, y, and/or hue parameters.
A “long-form” DataFrame, in which case the x, y, and hue variables will determine how the data are plotted.
A “wide-form” DataFrame, such that each numeric column will be plotted.
An array or list of vectors.
In most cases, it is possible to use numpy or Python objects, but pandas objects are preferable because the associated names will be used to annotate the axes. Additionally, you can use Categorical types for the grouping variables to control the order of plot elements.
This function always treats one of the variables as categorical and draws data at ordinal positions (0, 1, … n) on the relevant axis, even when the data has a numeric or date type.