3. barplot function: bar chart visualization
seaborn.barplot(*, x=None, y=None, hue=None, data=None, order=None, hue_order=None, estimator=np.mean, ci=95, n_boot=1000, units=None, seed=None, orient=None, color=None, palette=None, saturation=0.75, errcolor='.26', errwidth=None, capsize=None, dodge=True, ax=None, **kwargs)
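Only the second variable (y, in the default vertical orientation) needs to be numeric; the first is treated as categorical.
A bar plot uses the height of each rectangle to show an estimate of the central tendency of a numeric variable (the mean by default). Error bars indicate the uncertainty around that estimate. A longer error bar means a wider confidence interval, so the estimate is less certain. This usually comes from more dispersed data, from fewer observations, or from both.
Official documentation: http://seaborn.pydata.org/generated/seaborn.barplot.html?highlight=barplot#seaborn.barplot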
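The signature's defaults are estimator=np.mean, ci=95 and n_boot=1000. With those defaults, seaborn draws the error bar as a percentile bootstrap confidence interval. Newer seaborn releases (0.12+) express this through an errorbar parameter instead of ci. The sketch below reproduces the idea with NumPy only. The helper name bootstrap_ci and the sample data are hypothetical, and this is not seaborn's internal code.

```python
import numpy as np


def bootstrap_ci(values, estimator=np.mean, ci=95, n_boot=1000, seed=None):
    """Return (estimate, lower, upper) via a percentile bootstrap.

    Hypothetical helper illustrating barplot's default error bar:
    resample the data with replacement n_boot times, apply the
    estimator to each resample, and keep the central `ci` percent.
    """
    values = np.asarray(values, dtype=float)
    rng = np.random.default_rng(seed)
    # Each row of idx indexes one resample, the same size as the data.
    idx = rng.integers(0, len(values), size=(n_boot, len(values)))
    boot_stats = np.array([estimator(values[row]) for row in idx])
    half = (100 - ci) / 2
    lower, upper = np.percentile(boot_stats, [half, 100 - half])
    return estimator(values), lower, upper


# Hypothetical sample: the bar height would be `est`,
# and the error bar would span [lower, upper].
data = np.arange(1, 11)  # 1, 2, ..., 10
est, lower, upper = bootstrap_ci(data, seed=0)
print(f"mean={est:.2f}, 95% CI=({lower:.2f}, {upper:.2f})")
```

The interval is built from resampled estimates, so it narrows as the number of observations grows. Running the same helper on np.tile(data, 10) gives a visibly shorter interval. seaborn's own resampling can also respect units, which this sketch does not model.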
Show point estimates and confidence intervals as rectangular bars.
A bar plot represents an estimate of central tendency for a numeric variable with the height of each rectangle and provides some indication of the uncertainty around that estimate using error bars. Bar plots include 0 in the quantitative axis range, and they are a good choice when 0 is a meaningful value for the quantitative variable, and you want to make comparisons against it.
For datasets where 0 is not a meaningful value, a point plot will allow you to focus on differences between levels of one or more categorical variables.
It is also important to keep in mind that a bar plot shows only the mean (or other estimator) value, but in many cases it may be more informative to show the distribution of values at each level of the categorical variables. In that case, other approaches such as a box or violin plot may be more appropriate.
Input data can be passed in a variety of formats, including:
Vectors of data represented as lists, numpy arrays, or pandas Series objects passed directly to the x, y, and/or hue parameters.
A “long-form” DataFrame, in which case the x, y, and hue variables will determine how the data are plotted.
A “wide-form” DataFrame, such that each numeric column will be plotted.
An array or list of vectors.
In most cases, it is possible to use numpy or Python objects, but pandas objects are preferable because the associated names will be used to annotate the axes. Additionally, you can use Categorical types for the grouping variables to control the order of plot elements.
This function always treats one of the variables as categorical and draws data at ordinal positions (0, 1, … n) on the relevant axis, even when the data has a numeric or date type.
(1) BarPlot
(2) BarPlotByV
(3) BarPlotBy2V
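The sketch below illustrates the three cases: a bar plot over one categorical variable and a bar plot split by a second variable through hue. It also uses the order and estimator parameters from the signature above. The sales data is hypothetical, and np.median is only an example of swapping out the default mean.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical data: sales per day, recorded in two shifts.
df = pd.DataFrame({
    "day":   ["Mon", "Mon", "Tue", "Tue", "Mon", "Mon", "Tue", "Tue"],
    "shift": ["am", "am", "am", "am", "pm", "pm", "pm", "pm"],
    "sales": [10.0, 14.0, 8.0, 12.0, 20.0, 22.0, 5.0, 7.0],
})

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))

# One categorical variable: bar height = mean sales per day,
# with the bar order fixed explicitly by `order`.
sns.barplot(data=df, x="day", y="sales", order=["Tue", "Mon"], ax=ax1)

# Two categorical variables: `hue` splits each day by shift,
# and `estimator=np.median` replaces the default mean.
sns.barplot(data=df, x="day", y="sales", hue="shift",
            estimator=np.median, ax=ax2)
```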
4. pointplot function: visualizing point estimates and confidence intervals (error bars)
seaborn.pointplot(*, x=None, y=None, hue=None, data=None, order=None, hue_order=None, estimator=np.mean, ci=95, n_boot=1000, units=None, seed=None, markers='o', linestyles='-', dodge=False, join=True, scale=1, orient=None, color=None, palette=None, errwidth=None, capsize=None, ax=None, **kwargs)
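Only the second variable (y, in the default vertical orientation) needs to be numeric; the first is treated as categorical.
Confidence interval estimation: each point in the plot is the mean of its group, and the vertical line through it is the error bar. By default the mean points are connected by lines (join=True).
Official documentation: http://seaborn.pydata.org/generated/seaborn.pointplot.html?highlight=pointplot#seaborn.pointplot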
Show point estimates and confidence intervals using scatter plot glyphs.
A point plot represents an estimate of central tendency for a numeric variable by the position of scatter plot points and provides some indication of the uncertainty around that estimate using error bars.
Point plots can be more useful than bar plots for focusing comparisons between different levels of one or more categorical variables. They are particularly adept at showing interactions: how the relationship between levels of one categorical variable changes across levels of a second categorical variable. The lines that join each point from the same hue level allow interactions to be judged by differences in slope, which is easier for the eyes than comparing the heights of several groups of points or bars.
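The sketch below shows how an interaction appears as a difference in slope. It uses a hypothetical two-dose, two-group experiment where the treated group responds much more strongly to the higher dose. The pandas groupby computes the same means the plot draws, so the slopes can be read off numerically.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical experiment: response at two doses for two groups.
df = pd.DataFrame({
    "dose":     ["low"] * 4 + ["high"] * 4,
    "group":    ["control", "control", "treated", "treated"] * 2,
    "response": [5.0, 7.0, 5.0, 7.0, 6.0, 8.0, 11.0, 13.0],
})

fig, ax = plt.subplots()
# Each point is a group mean; lines join points of the same hue level,
# so unequal slopes between the two lines suggest an interaction.
sns.pointplot(data=df, x="dose", y="response", hue="group",
              order=["low", "high"], ax=ax)

# The same means computed directly, for comparison with the plot.
means = df.groupby(["group", "dose"])["response"].mean().unstack()
slopes = means["high"] - means["low"]
print(slopes)  # control rises by 1.0, treated by 6.0
```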
It is important to keep in mind that a point plot shows only the mean (or other estimator) value, but in many cases it may be more informative to show the distribution of values at each level of the categorical variables. In that case, other approaches such as a box or violin plot may be more appropriate.
Input data can be passed in a variety of formats, including:
Vectors of data represented as lists, numpy arrays, or pandas Series objects passed directly to the x, y, and/or hue parameters.
A “long-form” DataFrame, in which case the x, y, and hue variables will determine how the data are plotted.
A “wide-form” DataFrame, such that each numeric column will be plotted.
An array or list of vectors.
In most cases, it is possible to use numpy or Python objects, but pandas objects are preferable because the associated names will be used to annotate the axes. Additionally, you can use Categorical types for the grouping variables to control the order of plot elements.
This function always treats one of the variables as categorical and draws data at ordinal positions (0, 1, … n) on the relevant axis, even when the data has a numeric or date type.
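The sketch below uses the first input format listed above: vectors passed directly to x and y. A pandas Categorical fixes the level order, and named Series supply the axis labels. The size/value data is hypothetical.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical data passed as vectors rather than a DataFrame.
sizes = ["small", "large", "medium", "small", "large", "medium"]
values = [1.0, 9.0, 4.0, 3.0, 11.0, 6.0]

# A Categorical fixes the x-axis level order without `order=`;
# the Series names are used to annotate the axes.
size_cat = pd.Series(
    pd.Categorical(sizes, categories=["small", "medium", "large"], ordered=True),
    name="size",
)
value_s = pd.Series(values, name="value")

fig, ax = plt.subplots()
sns.pointplot(x=size_cat, y=value_s, ax=ax)
```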
5. stripplot function: scatter plot visualization
seaborn.stripplot(*, x=None, y=None, hue=None, data=None, order=None, hue_order=None, jitter=True, dodge=False, orient=None, color=None, palette=None, size=5, edgecolor='gray', linewidth=0, ax=None, **kwargs)
Official documentation: http://seaborn.pydata.org/generated/seaborn.stripplot.html?highlight=stripplot#seaborn.stripplot
Draw a scatterplot where one variable is categorical.
A strip plot can be drawn on its own, but it is also a good complement to a box or violin plot in cases where you want to show all observations along with some representation of the underlying distribution.
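The sketch below draws a strip plot on top of a box plot, the combination described above. The box summarizes each distribution and the strip shows every observation. The three-category data is randomly generated and hypothetical. The jitter, size and color values are illustrative choices.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(0)
# Hypothetical measurements: 20 observations in each of three categories.
df = pd.DataFrame({
    "category": np.repeat(["a", "b", "c"], 20),
    "value": np.concatenate([
        rng.normal(0, 1, 20), rng.normal(2, 1, 20), rng.normal(1, 2, 20),
    ]),
})

fig, ax = plt.subplots()
# The box plot summarizes each distribution...
sns.boxplot(data=df, x="category", y="value", color="white", ax=ax)
# ...and the strip plot adds every observation on top; `jitter` spreads
# points sideways so that overlapping values stay visible.
sns.stripplot(data=df, x="category", y="value", jitter=0.2,
              size=4, color="black", ax=ax)
```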
Input data can be passed in a variety of formats, including:
Vectors of data represented as lists, numpy arrays, or pandas Series objects passed directly to the x, y, and/or hue parameters.
A “long-form” DataFrame, in which case the x, y, and hue variables will determine how the data are plotted.
A “wide-form” DataFrame, such that each numeric column will be plotted.
An array or list of vectors.
In most cases, it is possible to use numpy or Python objects, but pandas objects are preferable because the associated names will be used to annotate the axes. Additionally, you can use Categorical types for the grouping variables to control the order of plot elements.
This function always treats one of the variables as categorical and draws data at ordinal positions (0, 1, … n) on the relevant axis, even when the data has a numeric or date type.
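To see the "ordinal positions" behaviour described above, the sketch below plots a numeric, unevenly spaced x variable (10, 20, 40). It then reads back the x coordinates that matplotlib actually received. The data is hypothetical, and jitter=False is set so that each point sits exactly on its category position.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical data: x is numeric and unevenly spaced.
df = pd.DataFrame({"x": [10, 10, 20, 20, 40, 40],
                   "y": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]})

fig, ax = plt.subplots()
# jitter=False keeps each point exactly at its category position.
sns.stripplot(data=df, x="x", y="y", jitter=False, ax=ax)

# The drawn x coordinates are the ordinal positions 0, 1, 2,
# not the original values 10, 20, 40.
xs = np.concatenate([np.asarray(c.get_offsets())[:, 0]
                     for c in ax.collections if len(c.get_offsets())])
print(sorted(set(xs.round(6).tolist())))
```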
6. relplot function: scatter plot / line plot visualization
seaborn.relplot(*, x=None, y=None, hue=None, size=None, style=None, data=None, row=None, col=None, col_wrap=None, row_order=None, col_order=None, palette=None, hue_order=None, hue_norm=None, sizes=None, size_order=None, size_norm=None, markers=None, dashes=None, style_order=None, legend='auto', kind='scatter', height=5, aspect=1, facet_kws=None, units=None, **kwargs)
Official documentation: http://seaborn.pydata.org/generated/seaborn.relplot.html?highlight=relplot#seaborn.relplot
Figure-level interface for drawing relational plots onto a FacetGrid.
This function provides access to several different axes-level functions that show the relationship between two variables with semantic mappings of subsets. The kind parameter selects the underlying axes-level function to use:
scatterplot() (with kind="scatter"; the default)
lineplot() (with kind="line")
Extra keyword arguments are passed to the underlying function, so you should refer to the documentation for each to see kind-specific options.
The relationship between x and y can be shown for different subsets of the data using the hue, size, and style parameters. These parameters control what visual semantics are used to identify the different subsets. It is possible to show up to three dimensions independently by using all three semantic types, but this style of plot can be hard to interpret and is often ineffective. Using redundant semantics (i.e. both hue and style for the same variable) can be helpful for making graphics more accessible.
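The sketch below applies the redundant-semantics advice above. The same variable drives both hue (color) and style (marker shape), so the two groups stay distinguishable even without color. The species data is hypothetical.

```python
import matplotlib
matplotlib.use("Agg")
import pandas as pd
import seaborn as sns

# Hypothetical measurements from two species.
df = pd.DataFrame({
    "length":  [1.0, 2.0, 3.0, 4.0, 1.5, 2.5, 3.5, 4.5],
    "weight":  [2.0, 3.5, 5.0, 7.0, 1.0, 1.8, 2.2, 3.0],
    "species": ["A"] * 4 + ["B"] * 4,
})

# Redundant semantics: "species" controls both color and marker shape.
g = sns.relplot(data=df, x="length", y="weight",
                hue="species", style="species", kind="scatter")
```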
See the tutorial for more information.
The default treatment of the hue (and to a lesser extent, size) semantic, if present, depends on whether the variable is inferred to represent “numeric” or “categorical” data. In particular, numeric variables are represented with a sequential colormap by default, and the legend entries show regular “ticks” with values that may or may not exist in the data. This behavior can be controlled through various parameters, as described and illustrated below.
After plotting, the FacetGrid with the plot is returned and can be used directly to tweak supporting plot details or add other layers.
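The sketch below covers both points above. A numeric hue variable is mapped to a sequential colormap ("viridis" is an arbitrary choice here). The returned FacetGrid is then used to relabel the axes and add an extra reference-line layer. The data and column names are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")
import pandas as pd
import seaborn as sns

# Hypothetical data with a numeric hue variable.
df = pd.DataFrame({
    "x": [1, 2, 3, 4, 5, 6],
    "y": [2.0, 4.1, 5.9, 8.2, 9.8, 12.1],
    "temperature": [10.0, 12.5, 15.0, 17.5, 20.0, 22.5],
})

# Numeric hue: colors come from a sequential colormap, and the legend
# shows evenly spaced "ticks" rather than every data value.
g = sns.relplot(data=df, x="x", y="y", hue="temperature", palette="viridis")

# relplot returns the FacetGrid, which can be adjusted after plotting.
g.set_axis_labels("Input", "Output")
g.ax.axhline(0, color="gray", linestyle="--")  # an extra layer
```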