Interval Estimation

Introduction:
Refer to R Tutorial and Exercise Solution

It is a common requirement to efficiently estimate population parameters based on simple random sample data.

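We estimate population parameters from a simple random sample. Because the result is only an estimate, we usually report a range rather than a single value, hence the name interval estimation.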

Point Estimate of Population Mean

For any particular random sample, we can always compute its sample mean.

> library(MASS)                  # load the MASS package  
> height.survey = survey$Height

> mean(height.survey, na.rm=TRUE)  # skip missing values  
[1] 172.38

This simply takes the sample mean as the estimate of the population mean, which is clearly a rather crude method.

 

Interval Estimate of Population Mean with Known Variance

Here, we discuss the case where the population variance σ² is assumed known.

Let us denote the 100(1 − α/2) percentile of the standard normal distribution as z_{α/2}. For a random sample of sufficiently large size, the end points of the interval estimate at the (1 − α) confidence level are given as follows:

$$\bar{x} \pm z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$$

This is a step up from using the sample mean alone, although I do not know why the population variance is used in exactly this way.
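
One way to see why the population standard deviation enters as σ/√n is sketched below. It assumes the sample is large enough for the central limit theorem to apply (an assumption added here, in line with the "sufficiently large size" condition), so that the sample mean is approximately normal with mean μ and variance σ²/n:

```latex
\bar{x} \;\approx\; N\!\left(\mu,\ \frac{\sigma^{2}}{n}\right)
\quad\Longrightarrow\quad
P\!\left(-z_{\alpha/2} \le \frac{\bar{x}-\mu}{\sigma/\sqrt{n}} \le z_{\alpha/2}\right) \approx 1-\alpha
\quad\Longrightarrow\quad
P\!\left(\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) \approx 1-\alpha
```

Rearranging the middle inequality for μ gives the interval end points, with z_{α/2}·σ/√n as the margin of error.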

Assume the population standard deviation σ of the student height in survey is 9.48. Find the margin of error and interval estimate at 95% confidence level(1 − α).

> library(MASS)                  # load the MASS package  
> height.response = na.omit(survey$Height)

 

> n = length(height.response)  
> sigma = 9.48                   # population standard deviation  
> sem = sigma/sqrt(n); sem       # standard error of the mean  
[1] 0.65575

 

> E = qnorm(.975)*sem; E         # margin of error  
[1] 1.2852

 

> xbar = mean(height.response)   # sample mean  
> xbar + c(-E, E)  
[1] 171.10 173.67

 

Interval Estimate of Population Mean with Unknown Variance

Here, we discuss the case where the population variance is not assumed known.

Let us denote the 100(1 − α/2) percentile of the Student t distribution with n − 1 degrees of freedom as t_{α/2}. For a random sample of sufficiently large size with sample standard deviation s, the end points of the interval estimate at the (1 − α) confidence level are given as follows:

$$\bar{x} \pm t_{\alpha/2}\,\frac{s}{\sqrt{n}}$$

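When the population variance is unknown, we substitute the sample standard deviation for the population standard deviation in the estimate, which is a bit more impressive.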

Without assuming the population standard deviation of the student height in survey, find the margin of error and interval estimate at 95% confidence level.

> n = length(height.response)  
> s = sd(height.response)        # sample standard deviation  
> SE = s/sqrt(n); SE             # standard error estimate  
[1] 0.68117

> E = qt(.975, df=n-1)*SE; E     # margin of error  
[1] 1.3429
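
The margin of error can be turned into the interval itself in the same way as in the known-variance case. The sketch below recomputes every step so that it runs on its own, and cross-checks the result against `t.test` from base R, which reports a t-based 95% interval by default. The variable name `ci` is introduced here only for illustration.

```r
library(MASS)                                # provides the survey data set
height.response <- na.omit(survey$Height)    # drop missing heights

n    <- length(height.response)
xbar <- mean(height.response)                # sample mean
SE   <- sd(height.response) / sqrt(n)        # standard error estimate
E    <- qt(0.975, df = n - 1) * SE           # margin of error at 95%

ci <- xbar + c(-E, E)                        # interval estimate
print(ci)

# Cross-check: t.test() computes a 95% t interval for the mean by default
print(t.test(height.response)$conf.int)
```

With the values printed earlier (sample mean 172.38 and margin of error 1.3429), the interval works out to roughly 171.04 to 173.72 centimeters.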

 

Sampling Size of Population Mean

The quality of a sample survey can be improved by increasing the sample size. The formula below provides the sample size needed for a population mean interval estimate at the (1 − α) confidence level, with margin of error E and population variance σ². Here, z_{α/2} is the 100(1 − α/2) percentile of the standard normal distribution.

$$n = \frac{z_{\alpha/2}^{2}\,\sigma^{2}}{E^{2}}$$

A larger sample naturally gives a more accurate estimate; this formula computes a suitable sample size.

Assume the population standard deviation σ of the student height in survey is 9.48. Find the sample size needed to achieve a 1.2 centimeters margin of error at 95% confidence level.

 

> zstar = qnorm(.975)  
> sigma = 9.48  
> E = 1.2  
> zstar^2 * sigma^2 / E^2  
[1] 239.75
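
Since a sample size has to be a whole number, the result is normally rounded up so that the target margin of error is not exceeded. A minimal sketch of that step follows; the function name `sample.size.mean` is hypothetical and not part of any package.

```r
# Hypothetical helper: smallest n that achieves margin of error E at
# confidence level conf, assuming the population sd sigma is known.
sample.size.mean <- function(sigma, E, conf = 0.95) {
  zstar <- qnorm(1 - (1 - conf) / 2)   # e.g. qnorm(.975) for 95%
  ceiling(zstar^2 * sigma^2 / E^2)     # round up to a whole respondent
}

n.needed <- sample.size.mean(sigma = 9.48, E = 1.2)
print(n.needed)                        # 239.75 rounds up to 240
```

Rounding down to 239 would leave the margin of error slightly above 1.2 centimeters, which is why `ceiling` is used rather than `round`.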

 

Point Estimate of Population Proportion

Multiple choice questionnaires in a survey are often used to determine the proportion of a population with a certain characteristic. For example, we can estimate the proportion of female students in the university based on the result in the sample data set survey.

Find a point estimate of the female student proportion from survey.

> library(MASS)                  # load the MASS package  
> gender.response = na.omit(survey$Sex)  
> n = length(gender.response)    # valid responses count

> k = sum(gender.response == "Female")  
> pbar = k/n; pbar  
[1] 0.5

 

Interval Estimate of Population Proportion

After finding a point estimate of the population proportion from the sample, we need to estimate its confidence interval.

Let us denote the 100(1 − α/2) percentile of the standard normal distribution as z_{α/2}. If the sample size n and population proportion p satisfy the conditions np ≥ 5 and n(1 − p) ≥ 5, then the end points of the interval estimate at the (1 − α) confidence level are defined in terms of the sample proportion p̄ as follows:

$$\bar{p} \pm z_{\alpha/2}\sqrt{\frac{\bar{p}\,(1-\bar{p})}{n}}$$
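
The interval described above can be computed for the female student proportion in survey along the same lines as the mean examples. This is a sketch rather than the original tutorial's code: the variable name `ci.prop` is introduced here, and the sample proportion stands in for the unknown p when checking the size conditions.

```r
library(MASS)                                 # provides the survey data set
gender.response <- na.omit(survey$Sex)        # drop missing answers
n    <- length(gender.response)               # valid responses count
k    <- sum(gender.response == "Female")
pbar <- k / n                                 # sample proportion

# Rule-of-thumb size check from the text, with pbar standing in for p
stopifnot(n * pbar >= 5, n * (1 - pbar) >= 5)

SE <- sqrt(pbar * (1 - pbar) / n)             # standard error estimate
E  <- qnorm(0.975) * SE                       # margin of error at 95%
ci.prop <- pbar + c(-E, E)                    # interval estimate
print(ci.prop)
```

Base R's `prop.test` also reports a confidence interval for a proportion, but as far as I know it uses a score-based method with a continuity correction by default, so its end points can differ slightly from this textbook formula.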

Sampling Size of Population Proportion

The quality of a sample survey can be improved by increasing the sample size. The formula below provides the sample size needed for a population proportion interval estimate at the (1 − α) confidence level, with margin of error E and planned proportion estimate p. Here, z_{α/2} is the 100(1 − α/2) percentile of the standard normal distribution.

$$n = \frac{z_{\alpha/2}^{2}\,p\,(1-p)}{E^{2}}$$
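
As an illustration of this sample size calculation, the sketch below uses a planned proportion of 50% and a 5% margin of error at the 95% confidence level. Both numbers are assumptions chosen for the example, not values from the text above, and the result is rounded up to a whole respondent.

```r
zstar <- qnorm(0.975)      # 95% confidence level
p     <- 0.5               # planned proportion estimate (assumed here)
E     <- 0.05              # target margin of error (assumed here)

n.raw    <- zstar^2 * p * (1 - p) / E^2
n.needed <- ceiling(n.raw) # round up to a whole respondent
print(c(n.raw, n.needed))
```

Choosing p = 0.5 is the cautious option when nothing is known in advance, because p(1 − p) is largest there and so yields the largest required sample size.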


This article is excerpted from cnblogs (博客园); original publication date: 2012-02-17
