Analyzing nonlinear relationships between covariates in R - Alibaba Cloud Developer Community


Published 2024-04-16 (Jilin)


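Recently I was asked whether my smcfcs packages for R and Stata can accommodate nonlinear relationships between covariates. The answer is yes, and in this post I will illustrate how to do this.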

To illustrate, we will simulate a very large dataset with two covariates, x1 and x2, and a continuous outcome y.

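library(smcfcs)
library(mitools)  # provides imputationList() and MIcombine()
set.seed(123)
n <- 10000
x1 <- rnorm(n)
x2 <- x1^2 + rnorm(n)
y <- x1 + x2 + rnorm(n)
# make x2 missing with probability expit(y); plogis() is base R's expit
x2[runif(n) < plogis(y)] <- NA
mydata <- data.frame(x1, x2, y)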
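x2 is deleted with probability expit(y), so its missingness depends only on the fully observed outcome. In other words, x2 is missing at random (MAR) given y. The minimal base-R sketch below repeats the simulation, using plogis() (base R's inverse logit) in place of expit, and checks two consequences of this mechanism. The first is the overall fraction of x2 that ends up missing. The second is that records with x2 missing have a larger mean y than records with x2 observed. It does not call smcfcs.

```r
set.seed(123)
n <- 10000
x1 <- rnorm(n)
x2 <- x1^2 + rnorm(n)
y <- x1 + x2 + rnorm(n)
miss <- runif(n) < plogis(y)   # TRUE where x2 is set to missing
x2[miss] <- NA

# Overall fraction of x2 that is missing
prop_missing <- mean(is.na(x2))
# Missingness is more likely for larger y, so the mean of y should be
# higher among records with x2 missing than among those with x2 observed
mean_y_missing  <- mean(y[is.na(x2)])
mean_y_observed <- mean(y[!is.na(x2)])
print(c(prop_missing = prop_missing,
        mean_y_missing = mean_y_missing,
        mean_y_observed = mean_y_observed))
```

Because missingness depends on the outcome, a complete-case regression of y on x1 and x2 is not guaranteed to be unbiased here. This is one reason to impute.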

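From the way y is generated, the true coefficients of the substantive model are 0 (intercept), 1 (x1) and 1 (x2). Note that there is no nonlinearity in the substantive model, but the dependence of x2 on x1 is nonlinear.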
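Both statements can be checked with base R on the complete simulated data, before any values are deleted. This is a sketch that repeats the simulation so it runs on its own. The substantive model should recover coefficients near 0, 1 and 1. A regression of x2 on x1 alone, by contrast, should find a slope near zero, because Cov(x1, x1^2) = E(x1^3) = 0 for a standard normal x1. A model for x2 that is linear in x1 therefore misses the dependence entirely, while adding x1^2 recovers it.

```r
set.seed(123)
n <- 10000
x1 <- rnorm(n)
x2 <- x1^2 + rnorm(n)
y <- x1 + x2 + rnorm(n)

# Substantive model on the full data: coefficients close to 0, 1, 1
full_fit <- lm(y ~ x1 + x2)
round(coef(full_fit), 3)

# Two candidate models for x2 given x1
lin_fit  <- lm(x2 ~ x1)            # linear in x1: slope close to 0
quad_fit <- lm(x2 ~ x1 + I(x1^2))  # includes x1^2: its coefficient is close to 1
round(coef(lin_fit), 3)
round(coef(quad_fit), 3)

# Residual SD: roughly sqrt(3) for the linear model, roughly 1 for the quadratic
c(linear = sigma(lin_fit), quadratic = sigma(quad_fit))
```

The residual variance of the linear model is about Var(x1^2) + 1 = 3. The variation in x2 that it leaves unexplained is exactly the part driven by x1^2.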

imps1 <- smcfcs(mydata, smtype = "lm", smformula = "y~x1+x2",
                numit = 50, method = c("", "norm", ""))
impobj <- imputationList(imps1$impDatasets)
models <- with(impobj, lm(y ~ x1 + x2))
summary(MIcombine(models))

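We see clear bias in the estimates of the intercept and the coefficient of x1. smcfcs is imputing the missing values in x2 assuming that x2 follows a linear regression model given x1, that is, with a conditional mean that is linear in x1. To fix this, we can create x1sq = x1^2 and add it to the substantive model formula. x1sq is then automatically adjusted for in the imputation model for x2: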

mydata$x1sq <- mydata$x1^2
imps2 <- smcfcs(mydata, smtype = "lm", smformula = "y~x1+x2+x1sq",
                numit = 50, method = c("", "norm", "", ""))
impobj <- imputationList(imps2$impDatasets)
models <- with(impobj, lm(y ~ x1 + x2))
summary(MIcombine(models))

Our estimates are now very close to the true values used in the data generating mechanism.

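One thing to note is that although we have modified the model assumed for x2 | x1, we have also changed the substantive model (at least, the one used as part of the imputation process) to include x1sq. We may want to keep the substantive model as y~x1+x2 and add x1sq only to the imputation model for x2. In that case we can use the predictorMatrix argument instead. Rows and columns follow the column order of mydata (x1, x2, y, x1sq), so row 2 (x2) gets a 1 in columns 1 (x1) and 4 (x1sq):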

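# rows/columns follow the column order of mydata: x1, x2, y, x1sq
predictorMatrix <- array(0, dim = c(4, 4))
# impute x2 (row 2) using x1 (column 1) and x1sq (column 4)
predictorMatrix[2, c(1, 4)] <- 1
imps3 <- smcfcs(mydata, smtype = "lm", smformula = "y~x1+x2", numit = 50,
                method = c("", "norm", "", ""),
                predictorMatrix = predictorMatrix)
impobj <- imputationList(imps3$impDatasets)
models <- with(impobj, lm(y ~ x1 + x2))
summary(MIcombine(models))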

Output:

[1] "Outcome variable(s): y"
[1] "Passive variables: "
[1] "Partially obs. variables: x2"
[1] "Fully obs. substantive model variables: x1"
[1] "Imputation  1"
[1] "Imputing:  x2  using  x1,x1sq  plus outcome"
[1] "Imputation  2"
[1] "Imputation  3"
[1] "Imputation  4"
[1] "Imputation  5"
Warning message:
In smcfcs.core(originaldata, smtype, smformula, method, predictorMatrix,  :
  Rejection sampling failed 503 times (across all variables, iterations, and imputations). You may want to increase the rejection sampling limit.

Multiple imputation results:
      with(impobj, lm(y ~ x1 + x2))
      MIcombine.default(models)
               results          se      (lower      upper) missInfo
(Intercept) -0.0274234 0.015746687 -0.06054163 0.005694823     53 %
x1           1.0075646 0.018740270  0.96407720 1.051052088     77 %
x2           1.0026004 0.008043873  0.98549090 1.019709850     56 %
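Indexing predictorMatrix by position is easy to get wrong, because its rows and columns must follow the column order of mydata. One option is to build the matrix with row and column names first and then compare it with the positional version. The sketch below does this in base R. The smcfcs documentation describes the matrix positionally (a 1 in column j of row i means variable j is used to impute variable i). As far as we know, the names are therefore only a readability aid. The hard-coded variable order here is an assumption about mydata.

```r
# Assumed column order of mydata
vars <- c("x1", "x2", "y", "x1sq")

# Named version: each entry reads as "impute <row> using <column>"
pm_named <- matrix(0, nrow = 4, ncol = 4, dimnames = list(vars, vars))
pm_named["x2", c("x1", "x1sq")] <- 1   # impute x2 using x1 and x1sq

# Positional version, as in the text
predictorMatrix <- array(0, dim = c(4, 4))
predictorMatrix[2, c(1, 4)] <- 1

# Same entries once the names are dropped
all.equal(unname(pm_named), predictorMatrix)
```

If the two disagree, the positional indices do not match the intended variables.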

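Here x1 is fully observed. What if x1 also had some missing values? Then we would need to tell smcfcs how to impute x1, and then passively impute the x1sq variable from it. Given what we know about the true data generating model, how should we impute x1? The conditional distribution of x1 given x2 implied by that model is not a standard one. Nevertheless, we will proceed and ask smcfcs to impute x1 using the normal linear regression method ("norm"):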

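# make roughly 25% of x1 values missing, completely at random
mydata$x1[runif(n) < 0.25] <- NA
# recompute x1sq so that it is missing wherever x1 is
mydata$x1sq <- mydata$x1^2
# impute x1 (row 1) using x2 (column 2)
predictorMatrix[1, 2] <- 1
imps4 <- smcfcs(mydata, smtype = "lm", smformula = "y~x1+x2", numit = 50,
                predictorMatrix = predictorMatrix,
                method = c("norm", "norm", "", "x1^2"))  # x1sq passively imputed as x1^2
impobj <- imputationList(imps4$impDatasets)
models <- with(impobj, lm(y ~ x1 + x2))
summary(MIcombine(models))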

Output:

[1] "Outcome variable(s): y"
[1] "Passive variables: x1sq"
[1] "Partially obs. variables: x1,x2"
[1] "Fully obs. substantive model variables: "
[1] "Imputation  1"
[1] "Imputing:  x1  using  x2  plus outcome"
[1] "Imputing:  x2  using  x1,x1sq  plus outcome"
[1] "Imputation  2"
[1] "Imputation  3"
[1] "Imputation  4"
[1] "Imputation  5"
Warning message:
In smcfcs.core(originaldata, smtype, smformula, method, predictorMatrix,  :
  Rejection sampling failed 17260 times (across all variables, iterations, and imputations). You may want to increase the rejection sampling limit.

Multiple imputation results:
      with(impobj, lm(y ~ x1 + x2))
      MIcombine.default(models)
              results         se    (lower    upper) missInfo
(Intercept) 0.2687343 0.04002737 0.1694782 0.3679903     88 %
x1          1.0276229 0.03432337 0.9436348 1.1116109     86 %
x2          1.0742299 0.01635284 1.0385746 1.1098852     64 %
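"norm" is a questionable choice for x1 here, and Bayes' rule shows why. Under the data generating model of the simulation, x1 ~ N(0, 1) and x2 | x1 ~ N(x1^2, 1), so the conditional density of x1 given x2 is (a short derivation, ignoring y for the moment):

```latex
\begin{aligned}
f(x_1 \mid x_2) &\propto f(x_1)\, f(x_2 \mid x_1)
  \propto \exp\!\left\{-\tfrac{1}{2}x_1^2 - \tfrac{1}{2}\left(x_2 - x_1^2\right)^2\right\} \\
&\propto \exp\!\left\{-\tfrac{1}{2}x_1^4 + \left(x_2 - \tfrac{1}{2}\right)x_1^2\right\}.
\end{aligned}
```

The exponent is quartic in x1, so this is not a normal density. It is symmetric about 0, so E(x1 | x2) = 0 for every x2. For x2 > 1/2 it has two modes, at x1 = ±sqrt(x2 - 1/2). When smcfcs imputes x1 it also conditions on y through the substantive model. However, the covariate model it assumes for x1 given x2 is the normal linear one, which cannot represent this shape.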

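This example also illustrates a theoretical issue with smcfcs. Each covariate is imputed from a model that is compatible with the specified substantive (outcome) model. This does not mean, however, that these imputation models are compatible with one another. Specifically, the models used to impute the different covariates (here x1 and x2) may be mutually incompatible.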
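The incompatibility can be made concrete for the two covariate models used in the last analysis. Suppose x2 given x1 is normal with a mean quadratic in x1, and x1 given x2 is normal with a mean linear in x2. The coefficient symbols below are generic, introduced only for this sketch. If a joint density f(x1, x2) had both as its conditionals, we could expand log f = log f(x1) + log f(x2 | x1) and also log f = log f(x2) + log f(x1 | x2). Every term that depends on only one variable is collected into a, b, c or d. The only mixed terms left are the fractions in the second line. Differentiating once in each variable removes a, b, c and d:

```latex
\begin{aligned}
&x_2 \mid x_1 \sim N\!\left(\alpha_0 + \alpha_1 x_1 + \alpha_2 x_1^2,\ \sigma^2\right),
\qquad x_1 \mid x_2 \sim N\!\left(\beta_0 + \beta_1 x_2,\ \tau^2\right), \\[4pt]
&\log f(x_1, x_2)
  = \frac{x_2\left(\alpha_1 x_1 + \alpha_2 x_1^2\right)}{\sigma^2} + a(x_1) + b(x_2)
  = \frac{\beta_1 x_1 x_2}{\tau^2} + c(x_1) + d(x_2), \\[4pt]
&\frac{\partial^2 \log f}{\partial x_1\,\partial x_2}:\quad
  \frac{\alpha_1 + 2\alpha_2 x_1}{\sigma^2} = \frac{\beta_1}{\tau^2}
  \ \text{ for all } x_1
  \;\Longrightarrow\; \alpha_2 = 0 .
\end{aligned}
```

Including x1sq in the model for x2 is precisely what makes alpha_2 nonzero. Once that happens, no joint distribution of (x1, x2) has both of these conditionals.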

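A better approach would be to specify a single joint model for the data and impute from the conditional distributions it implies. This could be done, for example, using JAGS.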
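The sketch below is a rough illustration of imputing from the conditional distribution implied by one joint model. It takes the joint model that generated the data, f(x1) f(x2 | x1) f(y | x1, x2), and draws a missing x1 from f(x1 | x2, y), which is proportional to that product. To keep it short, it substitutes a simple grid-based sampler for JAGS and treats the model parameters as known. In a real analysis the parameters would be unknown and sampled as well, for example in JAGS. The function names log_cond_x1 and impute_x1_grid are hypothetical.

```r
# Joint model, with parameters fixed at the values used to simulate the data:
#   x1 ~ N(0, 1);  x2 | x1 ~ N(x1^2, 1);  y | x1, x2 ~ N(x1 + x2, 1)
# Unnormalised log density of x1 given x2 and y under this joint model
log_cond_x1 <- function(x1, x2, y) {
  dnorm(x1, mean = 0, sd = 1, log = TRUE) +
    dnorm(x2, mean = x1^2, sd = 1, log = TRUE) +
    dnorm(y, mean = x1 + x2, sd = 1, log = TRUE)
}

# Draw one value of x1: evaluate the density on a fine grid and sample
# a grid point with probability proportional to the density there
impute_x1_grid <- function(x2, y, grid = seq(-6, 6, length.out = 2001)) {
  lp <- log_cond_x1(grid, x2, y)
  w <- exp(lp - max(lp))   # rescale before exponentiating to avoid underflow
  sample(grid, size = 1, prob = w)
}

set.seed(1)
# A record with x2 = 4 and y = 6: x2 alone points to x1 near +2 or -2,
# and y - x2 = 2 favours the positive value, so draws cluster just below 2
draws <- replicate(1000, impute_x1_grid(x2 = 4, y = 6))
summary(draws)
```

A grid like this is only practical for a single scalar with known parameters. With unknown parameters and several incomplete variables, an MCMC sampler such as JAGS is the natural tool.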
