Probability Distributions

Refer to R Tutorial and Exercise Solution.

A probability distribution describes how the values of a random variable are distributed.

 

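Binomial Distribution

The binomial distribution is a discrete probability distribution. It describes the outcome of n independent trials in an experiment. Each trial is assumed to have only two outcomes, labeled as success or failure. If the probability of a successful trial is p, then the probability of having x successful trials in an experiment is as follows.

$$f(x) = \binom{n}{x} p^x (1-p)^{n-x}, \quad x = 0, 1, 2, \ldots, n$$

It is one of the most common probability distributions for describing random phenomena, and it takes its name from the binomial expansion, whose terms have the same form. It corresponds to n repeated Bernoulli trials. A series of trials is called Bernoulli trials when each trial has only two possible, mutually exclusive outcomes, each trial is independent of the results of all the others, and the probability of the event stays the same throughout the series.

A simple example: roll a die ten times. The number of times a 4 comes up follows a binomial distribution with n = 10 and p = 1/6.

To find the cumulative probability of at most four successes, P(X ≤ 4), in n = 12 trials with p = 0.2, we apply the function pbinom with x = 4, n = 12, p = 0.2.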

> pbinom(4, size=12, prob=0.2)  
[1] 0.92744
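
As a quick cross-check, the cumulative probability returned by pbinom should equal the sum of the individual probabilities from dbinom. The sketch below also evaluates the die example for one arbitrary count; the choice of "exactly two 4s" is a hypothetical illustration, not a question posed in the text.

```r
# P(X <= 4) for X ~ Binomial(n = 12, p = 0.2), computed two ways
cdf_direct <- pbinom(4, size = 12, prob = 0.2)
cdf_summed <- sum(dbinom(0:4, size = 12, prob = 0.2))
print(c(direct = cdf_direct, summed = cdf_summed))

# Die example (n = 10, p = 1/6): probability of exactly two 4s.
# The count 2 is an assumed value chosen only for illustration.
p_two_fours <- dbinom(2, size = 10, prob = 1/6)
p_manual <- choose(10, 2) * (1/6)^2 * (5/6)^8   # binomial formula by hand
print(c(dbinom = p_two_fours, manual = p_manual))
```

Both pairs should agree up to floating-point rounding, which shows that dbinom evaluates the binomial formula and pbinom accumulates it.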

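Poisson Distribution

The Poisson distribution is the probability distribution of the number of independent event occurrences in an interval. If λ is the mean number of occurrences per interval, then the probability of having x occurrences within a given interval is:

$$f(x) = \frac{\lambda^x e^{-\lambda}}{x!}, \quad x = 0, 1, 2, \ldots$$

The Poisson distribution is suited to describing the number of random events that occur per unit of time. Examples include the number of people arriving at a service facility in a given period, the number of calls received by a telephone exchange, the number of passengers waiting at a bus stop, the number of machine failures, and the number of natural disasters.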

If there are twelve cars crossing a bridge per minute on average, find the probability of having seventeen or more cars crossing the bridge in a particular minute.

We compute the upper tail probability of the Poisson distribution with the function ppois. Because the distribution is discrete, the upper tail beyond 16 is the probability of seventeen or more cars.

> ppois(16, lambda=12, lower.tail=FALSE)   # find upper tail
[1] 0.10129

If there are twelve cars crossing a bridge per minute on average, the probability of having seventeen or more cars crossing the bridge in a particular minute is 10.1%.
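
The tail convention is easy to get wrong for a discrete distribution: ppois(16, ..., lower.tail = FALSE) returns P(X > 16), which is the probability of 17 or more events. A minimal sketch that checks this against the complement of the lower tail and against a direct sum of dpois terms:

```r
lambda <- 12

# Upper tail strictly above 16, i.e. P(X >= 17)
upper <- ppois(16, lambda = lambda, lower.tail = FALSE)

# Lower tail P(X <= 16); its complement must match the upper tail
lower <- ppois(16, lambda = lambda)

# Same quantity from the probability mass function
summed <- 1 - sum(dpois(0:16, lambda = lambda))

print(c(upper = upper, complement = 1 - lower, summed = summed))
```

All three values should coincide at about 0.10129.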

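Difference between the Poisson and binomial distributions

When n is large and p is small in a binomial distribution, the Poisson distribution can serve as an approximation to it, with λ = np. As a rule of thumb, the Poisson formula can be used for approximate calculation when n ≥ 10 and p ≤ 0.1.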
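
The approximation can be seen numerically by tabulating both probability mass functions side by side. This is only a sketch: the values n = 100 and p = 0.02 are hypothetical and were picked to satisfy the rule of thumb comfortably.

```r
# Hypothetical parameters: large n, small p
n <- 100
p <- 0.02
lambda <- n * p   # Poisson mean matched to the binomial mean

x <- 0:6
comparison <- data.frame(
  x = x,
  binomial = dbinom(x, size = n, prob = p),
  poisson = dpois(x, lambda = lambda)
)
print(round(comparison, 4))

# Largest pointwise difference between the two mass functions
max_gap <- max(abs(comparison$binomial - comparison$poisson))
print(max_gap)
```

For these values the two columns differ only in the third decimal place. Raising p while holding n fixed makes the gap grow.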

 

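Continuous Uniform Distribution

The continuous uniform distribution is the probability distribution of random number selection from the continuous interval between a and b. Its density function is defined by the following.

$$f(x) = \begin{cases} \dfrac{1}{b-a} & \text{when } a \le x \le b \\ 0 & \text{when } x < a \text{ or } x > b \end{cases}$$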

Here is a graph of the continuous uniform distribution with a = 1, b = 3.
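
The graph can be reproduced with the built-in density function dunif. The following is a sketch only: the plotting range, the evaluation points, and the random seed are arbitrary assumptions.

```r
a <- 1
b <- 3

# Density is constant at 1 / (b - a) inside [a, b] and 0 outside
density_inside <- dunif(2, min = a, max = b)
density_outside <- dunif(4, min = a, max = b)

# Cumulative probability P(X <= 1.5) = (1.5 - a) / (b - a)
prob_below <- punif(1.5, min = a, max = b)
print(c(inside = density_inside, outside = density_outside, cdf = prob_below))

# Ten random draws from the interval (seed chosen arbitrarily)
set.seed(42)
draws <- runif(10, min = a, max = b)
print(draws)

# Plot the density over an assumed range of 0 to 4
curve(dunif(x, min = a, max = b), from = 0, to = 4, n = 401,
      xlab = "x", ylab = "f(x)")
```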

Exponential Distribution

The exponential distribution describes the arrival time of a randomly recurring independent event sequence. If μ is the mean waiting time for the next event recurrence, its probability density function is:

$$f(x) = \begin{cases} \dfrac{1}{\mu} e^{-x/\mu} & \text{when } x \ge 0 \\ 0 & \text{when } x < 0 \end{cases}$$

Here is a graph of the exponential distribution with μ = 1.

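The exponential distribution is a continuous probability distribution. It can be used to model the time between independent random events, such as the intervals between travelers entering an airport or between new entries appearing on Chinese Wikipedia.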

Suppose the mean checkout time of a supermarket cashier is three minutes. Find the probability of a customer checkout being completed by the cashier in less than two minutes.

The checkout processing rate equals one divided by the mean checkout completion time. Hence the processing rate is 1/3 checkouts per minute. We then apply the function pexp of the exponential distribution with rate=1/3.

> pexp(2, rate=1/3)  
[1] 0.48658
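
The same number follows from the closed-form cumulative distribution function of the exponential distribution, P(X ≤ x) = 1 − e^(−x/μ). A short sketch comparing the two (the variable names are illustrative):

```r
mean_time <- 3            # mean checkout time in minutes
rate <- 1 / mean_time     # checkouts per minute

p_pexp <- pexp(2, rate = rate)          # built-in CDF
p_closed <- 1 - exp(-2 / mean_time)     # closed-form CDF

print(c(pexp = p_pexp, closed_form = p_closed))
```

Note that pexp is parameterized by the rate, not the mean, so passing 3 instead of 1/3 would give a different and incorrect answer for this problem.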

 

Normal Distribution

The normal distribution is defined by the following probability density function, where μ is the population mean and σ² is the variance.

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-(x-\mu)^2 / (2\sigma^2)}$$

In particular, the normal distribution with μ = 0 and σ = 1 is called the standard normal distribution, and is denoted as N(0,1). It can be graphed as follows.

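The normal distribution, also known as the Gaussian distribution, is one of the most important distributions because of the central limit theorem.

Central Limit Theorem

The normal distribution has a very important property: under certain conditions, the distribution of the mean of a large number of statistically independent random variables tends toward a normal distribution. This is the central limit theorem. Its significance is that, as a consequence, other probability distributions can be approximated by the normal distribution.

  • A binomial distribution with parameters n and p is approximately normal when n is fairly large and p is not close to 1 or 0 (some references recommend using this approximation only when np and n(1 − p) are both at least 5). The approximating normal distribution has mean μ = np and variance σ² = np(1 − p).
  • A Poisson distribution with parameter λ is approximately normal when the number of sampled events is large, that is, when λ is large. The approximating normal distribution has mean μ = λ and variance σ² = λ.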
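
Both approximations can be checked numerically. The sketch below uses hypothetical parameters (n = 50, p = 0.4 for the binomial case; λ = 100 for the Poisson case) and adds a continuity correction of 0.5, a standard refinement that the text above does not mention.

```r
# Binomial case: hypothetical n and p with np = 20 and n(1 - p) = 30
n <- 50
p <- 0.4
mu_b <- n * p
sd_b <- sqrt(n * p * (1 - p))
exact_b <- pbinom(22, size = n, prob = p)
approx_b <- pnorm(22.5, mean = mu_b, sd = sd_b)   # 0.5 = continuity correction

# Poisson case: hypothetical large lambda
lambda <- 100
exact_p <- ppois(110, lambda = lambda)
approx_p <- pnorm(110.5, mean = lambda, sd = sqrt(lambda))

print(c(binom_exact = exact_b, binom_normal = approx_b))
print(c(pois_exact = exact_p, pois_normal = approx_p))
```

With parameters of this size the exact and approximate cumulative probabilities should be close. Shrinking n or λ makes the discrepancy visible.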

Assume that the test scores of a college entrance exam fit a normal distribution. Furthermore, the mean test score is 72, and the standard deviation is 15.2. What is the percentage of students scoring 84 or more in the exam?

We apply the function pnorm of the normal distribution with mean 72 and standard deviation 15.2. Since we are looking for the percentage of students scoring higher than 84, we are interested in the upper tail of the normal distribution.

> pnorm(84, mean=72, sd=15.2, lower.tail=FALSE)  
[1] 0.21492
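
The same probability can be obtained by standardizing the score to a z-value and using the standard normal distribution N(0,1). A minimal sketch (variable names are illustrative):

```r
score <- 84
mu <- 72
sigma <- 15.2

# Standardize: number of standard deviations above the mean
z <- (score - mu) / sigma

p_direct <- pnorm(score, mean = mu, sd = sigma, lower.tail = FALSE)
p_standard <- pnorm(z, lower.tail = FALSE)          # N(0,1) upper tail
p_complement <- 1 - pnorm(score, mean = mu, sd = sigma)

print(c(z = z, direct = p_direct, standard = p_standard,
        complement = p_complement))
```

All three probabilities should agree, so about 21.5% of students score 84 or more.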

 

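Chi-squared Distribution

If X1,X2,…,Xm are m independent random variables having the standard normal distribution, then the following quantity follows a Chi-Squared distribution with m degrees of freedom. Its mean is m, and its variance is 2m.

$$V = X_1^2 + X_2^2 + \cdots + X_m^2 \sim \chi^2_{(m)}$$

Here is a graph of the Chi-Squared distribution with 7 degrees of freedom.

The Chi-Squared distribution (χ² distribution) is a probability distribution commonly used in probability theory and statistics. The sum of the squares of k independent standard normal variables follows a Chi-Squared distribution with k degrees of freedom. It is often used in hypothesis testing and in computing confidence intervals.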

Find the 95th percentile of the Chi-Squared distribution with 7 degrees of freedom.

We apply the quantile function qchisq of the Chi-Squared distribution against the decimal value 0.95.

> qchisq(.95, df=7)        # 7 degrees of freedom  
[1] 14.067
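
The definition as a sum of squared standard normal variables can be checked by simulation. This is a sketch under assumptions: the seed and the number of simulated rows are arbitrary, and the results vary slightly with both.

```r
set.seed(1)          # arbitrary seed for reproducibility
m <- 7               # degrees of freedom
n_sim <- 100000      # number of simulated sums (assumed size)

# Each row holds m independent N(0,1) draws; sum their squares
x <- matrix(rnorm(n_sim * m), ncol = m)
v <- rowSums(x^2)

q95 <- qchisq(0.95, df = m)
coverage <- mean(v <= q95)   # share of simulated values below the 95th percentile

print(c(mean = mean(v), variance = var(v), coverage = coverage))
```

The simulated mean and variance should land near m = 7 and 2m = 14, and roughly 95% of the simulated sums should fall below the value returned by qchisq.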

 

Student t Distribution

Assume that a random variable Z has the standard normal distribution, and another random variable V has the Chi-Squared distribution with m degrees of freedom. If Z and V are also independent, then the following quantity follows a Student t distribution with m degrees of freedom.

$$t = \frac{Z}{\sqrt{V/m}} \sim t_{(m)}$$

Here is a graph of the Student t distribution with 5 degrees of freedom.

Find the 2.5th and 97.5th percentiles of the Student t distribution with 5 degrees of freedom.

> qt(c(.025, .975), df=5)   # 5 degrees of freedom  
[1] -2.5706  2.5706
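
The construction from a standard normal variable and an independent Chi-Squared variable can be verified by simulation. The sketch below assumes an arbitrary seed and sample size; about 95% of the simulated values should fall between the two percentiles.

```r
set.seed(2)          # arbitrary seed
m <- 5               # degrees of freedom
n_sim <- 100000      # assumed simulation size

z <- rnorm(n_sim)                # standard normal
v <- rchisq(n_sim, df = m)       # independent Chi-Squared with m df
t_sim <- z / sqrt(v / m)         # ratio defining the Student t variable

q <- qt(c(0.025, 0.975), df = m)
inside <- mean(t_sim >= q[1] & t_sim <= q[2])

print(q)
print(inside)
```

Because the distribution is symmetric about zero, the two percentiles differ only in sign.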

 

F Distribution

If V1 and V2 are two independent random variables having the Chi-Squared distribution with m1 and m2 degrees of freedom respectively, then the following quantity follows an F distribution with m1 numerator degrees of freedom and m2 denominator degrees of freedom, i.e., (m1,m2) degrees of freedom.

$$F = \frac{V_1 / m_1}{V_2 / m_2} \sim F_{(m_1, m_2)}$$

Here is a graph of the F distribution with (5, 2) degrees of freedom.

Find the 95th percentile of the F distribution with (5, 2) degrees of freedom.

> qf(.95, df1=5, df2=2)  
[1] 19.296
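
The ratio definition can be checked the same way, by simulating two independent Chi-Squared variables. This is a sketch with an arbitrary seed and sample size. It compares coverage rather than the simulated percentile itself, because the F distribution with 2 denominator degrees of freedom has a heavy right tail and its sample percentiles are noisy.

```r
set.seed(3)          # arbitrary seed
m1 <- 5              # numerator degrees of freedom
m2 <- 2              # denominator degrees of freedom
n_sim <- 100000      # assumed simulation size

v1 <- rchisq(n_sim, df = m1)
v2 <- rchisq(n_sim, df = m2)
f_sim <- (v1 / m1) / (v2 / m2)   # ratio defining the F variable

q95 <- qf(0.95, df1 = m1, df2 = m2)
coverage <- mean(f_sim <= q95)   # should be close to 0.95

print(c(q95 = q95, coverage = coverage))
```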

 

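The Chi-Squared distribution (χ² distribution), the t distribution, and the F distribution are together known as the three major sampling distributions, because all of them are derived from the normal distribution.


This article is excerpted from cnblogs (博客园). Original publication date: 2012-02-16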
