Probability Distributions

Refer to R Tutorial and Exercise Solution

A probability distribution describes how the values of a random variable are distributed.

 

Binomial Distribution

The binomial distribution is a discrete probability distribution. It describes the outcome of n independent trials in an experiment. Each trial is assumed to have only two outcomes, labeled as success or failure. If the probability of a successful trial is p, then the probability of having x successful trials in an experiment is as follows.

$$f(x) = \binom{n}{x} p^x (1-p)^{n-x}, \quad x = 0, 1, 2, \ldots, n$$

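The binomial distribution is a commonly used probability distribution for describing random phenomena, and it takes its name from the binomial expansion, whose terms have the same form. It corresponds to n repeated Bernoulli trials. If each trial has only two possible, mutually exclusive outcomes, each trial is independent of the results of all other trials, and the probability of the event stays the same throughout the series, then the series is called a sequence of Bernoulli trials.

A simple example: roll a die ten times. The number of times a 4 comes up follows a binomial distribution with n = 10 and p = 1/6.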

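To find the probability of at most four successes in twelve trials, each with success probability 0.2, we apply the cumulative distribution function pbinom with x = 4, n = 12, p = 0.2.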

> pbinom(4, size=12, prob=0.2)  
[1] 0.92744
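
As a cross-check, the cumulative probability from pbinom should equal the sum of the point probabilities from dbinom for x = 0 through 4. The following is a minimal sketch; the variable names are illustrative and not part of the original tutorial.

```r
# P(X <= 4) for X ~ Binomial(n = 12, p = 0.2), computed two ways
point_probs <- dbinom(0:4, size = 12, prob = 0.2)  # P(X = 0), ..., P(X = 4)
cdf_by_sum  <- sum(point_probs)                    # add up the point probabilities
cdf_direct  <- pbinom(4, size = 12, prob = 0.2)    # cumulative probability directly

# The two results should agree up to floating-point error
print(all.equal(cdf_by_sum, cdf_direct))
```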

Poisson Distribution

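The Poisson distribution is the probability distribution of the number of independent event occurrences in an interval. If λ is the mean number of occurrences per interval, then the probability of having x occurrences within a given interval is:

$$f(x) = \frac{\lambda^x e^{-\lambda}}{x!}, \quad x = 0, 1, 2, \ldots$$

The Poisson distribution is well suited to describing the number of random events that occur per unit of time. Examples include the number of people arriving at a service facility in a given period, the number of calls received by a telephone exchange, the number of passengers waiting at a bus stop, the number of machine failures, and the number of natural disasters.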

If there are twelve cars crossing a bridge per minute on average, find the probability of having seventeen or more cars crossing the bridge in a particular minute.

The probability of having sixteen or fewer cars is the lower tail ppois(16, lambda=12). The probability of seventeen or more is therefore the upper tail, which we compute with the function ppois and lower.tail=FALSE.

> ppois(16, lambda=12, lower.tail=FALSE)   # find upper tail  
[1] 0.10129

If there are twelve cars crossing a bridge per minute on average, the probability of having seventeen or more cars crossing the bridge in a particular minute is 10.1%.
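
For a discrete distribution, the upper tail returned by ppois(q, ..., lower.tail=FALSE) is P(X > q), which excludes q itself. The sketch below illustrates this boundary. The variable names are illustrative assumptions, and the last quantity is shown only for contrast.

```r
lambda <- 12

# P(X > 16), i.e. seventeen or more cars
upper_direct     <- ppois(16, lambda = lambda, lower.tail = FALSE)
upper_complement <- 1 - ppois(16, lambda = lambda)  # same value via the lower tail

# P(X >= 16), i.e. sixteen or more cars, needs q = 15 instead
at_least_16 <- ppois(15, lambda = lambda, lower.tail = FALSE)

# The two upper tails differ by exactly P(X = 16)
print(all.equal(at_least_16 - upper_direct, dpois(16, lambda = lambda)))
```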

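Relationship Between the Poisson and Binomial Distributions

When n is large and p is small, the Poisson distribution can be used as an approximation to the binomial distribution, with λ = np. A commonly quoted rule of thumb is that the Poisson formula can be used when n ≥ 10 and p ≤ 0.1; some references use stricter thresholds such as n ≥ 20 and p ≤ 0.05.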
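
The approximation can be checked numerically by comparing dbinom and dpois side by side. This is a minimal sketch; the values n = 100 and p = 0.03 are arbitrary assumptions chosen to satisfy the rule of thumb, not figures from the original article.

```r
n <- 100
p <- 0.03
k <- 0:10                                  # counts to compare

binom_probs <- dbinom(k, size = n, prob = p)
pois_probs  <- dpois(k, lambda = n * p)    # Poisson with lambda = np

# Largest pointwise gap between the two probability functions
max_abs_diff <- max(abs(binom_probs - pois_probs))

print(round(cbind(k, binom_probs, pois_probs), 4))
print(max_abs_diff)
```

Shrinking p while holding np fixed should make the gap smaller.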

 

Continuous Uniform Distribution

The continuous uniform distribution is the probability distribution of a random number selected from the continuous interval between a and b. Its density function is defined by the following.

$$f(x) = \begin{cases} \dfrac{1}{b-a} & a \le x \le b \\ 0 & \text{otherwise} \end{cases}$$

[Figure: graph of the continuous uniform distribution with a = 1, b = 3.]
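
The density, cumulative probability, and random sampling for this distribution are available in R as dunif, punif, and runif. The following sketch uses a = 1 and b = 3; the evaluation points and the seed are arbitrary assumptions made for illustration.

```r
a <- 1
b <- 3

density_inside  <- dunif(2, min = a, max = b)  # 1 / (b - a) inside [a, b]
density_outside <- dunif(4, min = a, max = b)  # 0 outside [a, b]
prob_below_2    <- punif(2, min = a, max = b)  # (2 - a) / (b - a)

set.seed(42)                                   # arbitrary seed, for reproducibility only
samples <- runif(10, min = a, max = b)         # ten random numbers between a and b

print(c(density_inside, density_outside, prob_below_2))
print(samples)
```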

Exponential Distribution

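The exponential distribution describes the waiting time between events in a sequence of randomly recurring independent events. If μ is the mean waiting time for the next event recurrence, its probability density function is:

$$f(x) = \begin{cases} \dfrac{1}{\mu} e^{-x/\mu} & x \ge 0 \\ 0 & x < 0 \end{cases}$$

[Figure: graph of the exponential distribution with μ = 1.]

The exponential distribution is a continuous probability distribution. It can be used to model the time intervals between independent random events, such as the intervals between travelers entering an airport or between new entries appearing on Chinese Wikipedia.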

Suppose the mean checkout time of a supermarket cashier is three minutes. Find the probability of a customer checkout being completed by the cashier in less than two minutes.

The checkout processing rate equals one divided by the mean checkout completion time. Hence the processing rate is 1/3 checkouts per minute. We then apply the function pexp of the exponential distribution with rate=1/3.

> pexp(2, rate=1/3)  
[1] 0.48658
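
The same probability follows from the closed-form cumulative distribution function 1 − e^(−x/μ), obtained by integrating the exponential density from 0 to x. A short sketch, with illustrative variable names:

```r
mean_time <- 3              # mean checkout time in minutes
rate      <- 1 / mean_time  # checkouts per minute

p_direct <- pexp(2, rate = rate)       # P(T < 2) from R's built-in CDF
p_closed <- 1 - exp(-2 / mean_time)    # closed form: 1 - exp(-x / mu)

print(all.equal(p_direct, p_closed))
```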

 

Normal Distribution

The normal distribution is defined by the following probability density function, where μ is the population mean and σ² is the variance.

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-(x-\mu)^2 / (2\sigma^2)}$$

In particular, the normal distribution with μ = 0 and σ = 1 is called the standard normal distribution, and is denoted as N(0,1).

[Figure: graph of the standard normal distribution N(0,1).]

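The normal distribution, also known as the Gaussian distribution, is one of the most important distributions because of the central limit theorem.

Central Limit Theorem

The normal distribution has a very important property: under certain conditions, the distribution of the mean of a large number of statistically independent random variables tends to a normal distribution. This is the central limit theorem. Its significance is that, by this result, other probability distributions can be approximated by the normal distribution.

  • The binomial distribution with parameters n and p is approximately normal when n is fairly large and p is not close to 0 or 1 (some references recommend using this approximation only when np and n(1 − p) are both at least 5). The approximating normal distribution has mean μ = np and variance σ² = np(1 − p).
  • The Poisson distribution with parameter λ is approximately normal when λ is large. The approximating normal distribution has mean μ = λ and variance σ² = λ.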
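
Both approximations can be checked numerically against the exact cumulative probabilities. The sketch below is illustrative only: the parameter values are arbitrary assumptions, and the continuity correction (adding 0.5 to the cutoff) is a standard refinement that the text above does not mention.

```r
# Binomial(n = 100, p = 0.5): np and n(1 - p) are both well above 5
n <- 100
p <- 0.5
binom_exact  <- pbinom(55, size = n, prob = p)
binom_approx <- pnorm(55.5, mean = n * p, sd = sqrt(n * p * (1 - p)))

# Poisson(lambda = 100): mean and variance are both lambda
lambda <- 100
pois_exact  <- ppois(110, lambda = lambda)
pois_approx <- pnorm(110.5, mean = lambda, sd = sqrt(lambda))

print(c(binom_exact, binom_approx))
print(c(pois_exact, pois_approx))
```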

Assume that the test scores of a college entrance exam fit a normal distribution. Furthermore, the mean test score is 72, and the standard deviation is 15.2. What is the percentage of students scoring 84 or more in the exam?

We apply the function pnorm of the normal distribution with mean 72 and standard deviation 15.2. Since we are looking for the percentage of students scoring higher than 84, we are interested in the upper tail of the normal distribution.

> pnorm(84, mean=72, sd=15.2, lower.tail=FALSE)  
[1] 0.21492
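
Equivalently, the score can first be standardized to a z-score and looked up in the standard normal distribution N(0,1). A minimal sketch, with illustrative variable names:

```r
score      <- 84
mean_score <- 72
sd_score   <- 15.2

z <- (score - mean_score) / sd_score   # number of standard deviations above the mean

p_standard <- pnorm(z, lower.tail = FALSE)   # upper tail of N(0,1)
p_direct   <- pnorm(score, mean = mean_score, sd = sd_score, lower.tail = FALSE)

print(all.equal(p_standard, p_direct))
```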

 

Chi-squared Distribution

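If X1, X2, …, Xm are m independent random variables having the standard normal distribution, then the following quantity follows a Chi-Squared distribution with m degrees of freedom. Its mean is m, and its variance is 2m.

$$V = X_1^2 + X_2^2 + \cdots + X_m^2 \sim \chi^2_{(m)}$$

[Figure: graph of the Chi-Squared distribution with 7 degrees of freedom.]

The chi-squared distribution (χ² distribution) is a probability distribution commonly used in probability theory and statistics. The sum of the squares of k independent standard normal variables follows a chi-squared distribution with k degrees of freedom. It is frequently used in hypothesis testing and in computing confidence intervals.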

Find the 95th percentile of the Chi-Squared distribution with 7 degrees of freedom.

We apply the quantile function qchisq of the Chi-Squared distribution to the decimal value 0.95.

> qchisq(.95, df=7)        # 7 degrees of freedom  
[1] 14.067
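
The definition as a sum of squared standard normal variables can be checked by simulation. This is a rough sketch: the seed and the number of simulated draws are arbitrary assumptions, and the simulated values only approximate the theoretical mean m, variance 2m, and 95th percentile.

```r
set.seed(1)      # arbitrary seed, for reproducibility only
m     <- 7       # degrees of freedom
n_sim <- 100000  # number of simulated sums (assumption)

# Each row holds m independent standard normal draws
x <- matrix(rnorm(n_sim * m), ncol = m)
v <- rowSums(x^2)                # sum of squares per row

sim_mean <- mean(v)                      # should be close to m
sim_var  <- var(v)                       # should be close to 2m
sim_q95  <- unname(quantile(v, 0.95))    # should be close to qchisq(.95, df=m)

print(c(sim_mean, sim_var, sim_q95, qchisq(0.95, df = m)))
```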

 

Student t Distribution

Assume that a random variable Z has the standard normal distribution, and another random variable V has the Chi-Squared distribution with m degrees of freedom. Assume further that Z and V are independent. Then the following quantity follows a Student t distribution with m degrees of freedom.

$$t = \frac{Z}{\sqrt{V/m}} \sim t_{(m)}$$

[Figure: graph of the Student t distribution with 5 degrees of freedom.]

Find the 2.5th and 97.5th percentiles of the Student t distribution with 5 degrees of freedom.

> qt(c(.025, .975), df=5)   # 5 degrees of freedom  
[1] -2.5706  2.5706
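
The two percentiles are mirror images because the t distribution is symmetric about zero. As the degrees of freedom grow, the t quantiles shrink toward the corresponding standard normal quantile. A short sketch; the choice of degrees of freedom beyond 5 is an arbitrary assumption for illustration.

```r
dfs <- c(5, 30, 1000)                # degrees of freedom to compare

t_upper <- qt(0.975, df = dfs)       # 97.5th percentiles of the t distribution
z_upper <- qnorm(0.975)              # 97.5th percentile of N(0,1)

# Symmetry: the 2.5th percentile is the negative of the 97.5th
symmetric <- all.equal(qt(0.025, df = 5), -qt(0.975, df = 5))

print(rbind(dfs, t_upper))
print(z_upper)
print(symmetric)
```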

 

F Distribution

If V1 and V2 are two independent random variables having the Chi-Squared distribution with m1 and m2 degrees of freedom respectively, then the following quantity follows an F distribution with m1 numerator degrees of freedom and m2 denominator degrees of freedom, i.e., (m1, m2) degrees of freedom.

$$F = \frac{V_1 / m_1}{V_2 / m_2} \sim F_{(m_1, m_2)}$$

[Figure: graph of the F distribution with (5, 2) degrees of freedom.]

Find the 95th percentile of the F distribution with (5, 2) degrees of freedom.

> qf(.95, df1=5, df2=2)  
[1] 19.296
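
One consequence of these definitions is that the square of a Student t variable with m degrees of freedom follows an F distribution with (1, m) degrees of freedom, so the corresponding quantiles are linked. The sketch below checks this for m = 5; the variable names are illustrative.

```r
m <- 5

# If T ~ t(m), then T^2 ~ F(1, m); P(T^2 <= c) = 0.95 means |T| <= sqrt(c),
# so sqrt(c) is the 97.5th percentile of t(m)
t_squared  <- qt(0.975, df = m)^2
f_quantile <- qf(0.95, df1 = 1, df2 = m)

print(all.equal(t_squared, f_quantile))
```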

 

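The chi-squared (χ²), t, and F distributions are together known as the three major sampling distributions, because all of them are built on the normal distribution.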


This article is excerpted from cnblogs (博客园); original publication date: 2012-02-16
