R-Description Data(step 3)

Introduction: R is a data analysis and visualization platform.

[I]Description

1.overall description

summary()

> summary(data)        #min,lower quartile,median,mean,upper quartile,max

sapply()

> sapply(x,FUN,options)        #applies FUN to each column of x;options are passed on to FUN
#FUN:mean(),sd(),var(),min(),max(),median(),length(),range(),quantile(),fivenum(),
#or a user-written function (for example one returning skewness and kurtosis)
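
As a minimal sketch of the two base functions above, assuming the built-in mtcars data set and a hypothetical helper named mystats (neither is part of the original notes):

```r
# Illustrative data: three numeric columns of the built-in mtcars data set
vars <- c("mpg", "hp", "wt")
data <- mtcars[vars]

# Min, quartiles, median, mean and max for every column
print(summary(data))

# Hypothetical helper: statistics that summary() does not report
mystats <- function(x, na.omit = FALSE) {
  if (na.omit) x <- x[!is.na(x)]
  m <- mean(x)
  n <- length(x)
  s <- sd(x)
  skew <- sum((x - m)^3 / s^3) / n
  kurt <- sum((x - m)^4 / s^4) / n - 3
  c(n = n, mean = m, stdev = s, skew = skew, kurtosis = kurt)
}

# sapply() applies mystats to each column; na.omit is passed through as an option
stats <- sapply(data, mystats, na.omit = TRUE)
print(stats)
```

The result of sapply() here is a matrix with one row per statistic and one column per variable.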

describe() of Hmisc

> describe(data)        #number of variables and observations,number of missing and unique values,mean,
                        #quantiles,five lowest and five highest values

stat.desc() of pastecs

> stat.desc(data)
#- basic=TRUE(default)
#number of values,null values,missing values,min,max,range,sum
#- desc=TRUE(default)
#median,mean,standard error of the mean,confidence interval of the mean(default 95%),
#variance,standard deviation,coefficient of variation
#- norm=TRUE(not the default)
#normal distribution statistics,including skewness and kurtosis(with their significance)
#and the Shapiro-Wilk test of normality

describe() of psych

> describe(data)        #number of non-missing observations,mean,standard deviation,median,trimmed mean,
                        #median absolute deviation,min,max,range,skewness,kurtosis,standard error of the mean

2.description by group

aggregate()

> aggregate(data,by=list(INDICES),FUN)        #FUN must return a single statistic per group,e.g. mean

by()

> by(data,INDICES,FUN)        #FUN may return multiple statistics per group
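
The difference between the two calls can be sketched as follows, assuming the built-in mtcars data set grouped by transmission type and a hypothetical helper named dstats (both chosen only for illustration):

```r
# Illustrative data: mtcars, grouped by transmission type (am: 0 = automatic, 1 = manual)
vars <- c("mpg", "hp", "wt")

# aggregate(): one statistic (here the mean) per variable and group
group_means <- aggregate(mtcars[vars], by = list(am = mtcars$am), FUN = mean)
print(group_means)

# by(): FUN receives each group's data frame and may return several statistics
dstats <- function(x) sapply(x, function(col) c(mean = mean(col), sd = sd(col)))
group_stats <- by(mtcars[vars], mtcars$am, dstats)
print(group_stats)
```

Naming the list element in by=list(am=...) gives the grouping column a readable name instead of Group.1.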

summaryBy() of doBy

> summaryBy(formula,data=dataframe,FUN=function)          #summary statistics by one or more grouping variables
#formula = var1 + var2 + var3 + ... + varN ~ groupvar1 + groupvar2 + ... + groupvarN
#(var1..varN are numeric variables to summarize;groupvar1..groupvarN are grouping variables)

describeBy() of psych

> describeBy(data,list(INDICES))        #several grouping variables are crossed;statistics are reported per cell

3.contingency table

traditional

Function                        Description
table(var1,var2, ... ,varN)     N-way contingency table from N categorical variables
xtabs(formula,data)             N-way contingency table from a formula and a matrix or data frame
prop.table(table,margins)       express entries as fractions of the marginal table defined by margins
margin.table(table,margins)     sum entries over the marginal table defined by margins
addmargins(table,margins)       add summary margins (sums by default) to a table
ftable(table)                   compact "flat" contingency table
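
A short sketch of how these functions fit together, assuming the built-in mtcars data set with cyl and am as the two categorical variables (an illustrative choice, not from the original notes):

```r
# Illustrative data: cylinders by transmission type in mtcars
mytable <- xtabs(~ cyl + am, data = mtcars)   # same counts as table(mtcars$cyl, mtcars$am)
print(mytable)

row_totals <- margin.table(mytable, 1)        # counts summed over am, one per cyl level
row_props  <- prop.table(mytable, 1)          # each row rescaled to sum to 1
with_sums  <- addmargins(mytable)             # adds a Sum row and a Sum column
print(ftable(with_sums))
```

In prop.table() and margin.table(), index 1 refers to the first variable in the table (rows) and 2 to the second (columns).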

CrossTable() of gmodels

> CrossTable(data1,data2)

[II]Test

1.known sample

- independence

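Chi-square
> chisq.test(mytable)        #small p (e.g. p<0.01):reject independence;p>0.05:no evidence of association
Fisher's exact test
> fisher.test(mytable)        #mytable is a two-way table;it need not be 2×2
Cochran-Mantel-Haenszel
> mantelhaen.test(mytable)        #mytable is a three-way table;assumes no three-way interaction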
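
The three tests can be tried on one data set. This is a sketch only, assuming the built-in UCBAdmissions table (admission by gender by department), which the original notes do not mention:

```r
# Illustrative data: built-in UCBAdmissions (Admit x Gender x Dept counts)
two_way <- margin.table(UCBAdmissions, c(1, 2))   # collapse over Dept

chi <- chisq.test(two_way)             # Pearson chi-square test of independence
fis <- fisher.test(two_way)            # exact test on the same two-way table
cmh <- mantelhaen.test(UCBAdmissions)  # Admit vs Gender, stratified by Dept

print(c(chisq = chi$p.value, fisher = fis$p.value, cmh = cmh$p.value))
```

The first two tests use the pooled two-way table, while mantelhaen.test() takes the full three-way table and treats the third dimension as strata, so the conclusions can differ.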

- correlation

categorical variables

(1)Phi/Contingency/Cramer's V

> assocstats(mytable)        #in package vcd

(2)Pearson/Spearman/Kendall

> cor(x,use= ,method= )        #default:use="everything",method="pearson"
> cov(x)        #covariance
> cor.test(x,y,alternative= ,method= )        #tests one relationship at a time
> corr.test(x,use= ,method= )        #in package psych;tests multiple relationships at a time

use:

  • all.obs:assumes no missing data;missing data produce an error
  • everything:any correlation involving missing data is set to NA
  • complete.obs:listwise deletion
  • pairwise.complete.obs:pairwise deletion

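method:

  • pearson:product-moment correlation;degree of linear relationship between two quantitative variables
  • spearman:rank-order correlation;degree of relationship between two rank-ordered variables
  • kendall:Kendall's tau,a nonparametric measure of rank correlation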
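
The use and method arguments can be sketched as follows, assuming the built-in state.x77 matrix and an illustrative choice of columns (not part of the original notes):

```r
# Illustrative data: the built-in state.x77 matrix (US state statistics)
states <- state.x77[, c("Population", "Income", "Illiteracy", "Life Exp", "Murder")]

# Defaults: use = "everything", method = "pearson"
pearson <- cor(states)

# Rank-based correlation with pairwise deletion of missing values
spearman <- cor(states, use = "pairwise.complete.obs", method = "spearman")

# Covariance matrix of the same variables
covmat <- cov(states)

# cor.test() tests one pair at a time
ct <- cor.test(states[, "Illiteracy"], states[, "Murder"],
               alternative = "two.sided", method = "pearson")
print(ct$estimate)
print(ct$p.value)
```

state.x77 has no missing values, so the use argument does not change the result here; it matters only when NA values are present.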
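(3)partial correlation

> library(ggm)
> pcor(u,S)        #u:numeric vector of variable indices (the first two are correlated,the rest are conditioned on);S:covariance matrix
> pcor.test(r,q,n)        #r:partial correlation from pcor();q:number of variables controlled for;n:sample size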

continuous variables

(1)parametric
1)independent samples

> t.test(y~x,data)        #or t.test(y1,y2)

2)dependent samples

> t.test(y1,y2,paired=TRUE)
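
Both forms can be sketched with built-in data. The data sets mtcars and sleep are assumptions made for illustration only:

```r
# Independent samples: mpg for automatic vs manual cars (formula interface)
ind <- t.test(mpg ~ am, data = mtcars)
print(ind)

# Dependent samples: the same 10 subjects measured under two drugs (built-in sleep data)
y1 <- sleep$extra[sleep$group == 1]
y2 <- sleep$extra[sleep$group == 2]
dep <- t.test(y1, y2, paired = TRUE)
print(dep)
```

By default t.test() does not assume equal variances for independent samples (Welch correction); var.equal=TRUE requests the pooled-variance version.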

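3)more than two groups:ANOVA

  • one-way ANOVA (y~A)
> fit<-aov(formula,data=dataframe)
> TukeyHSD(fit)        #pairwise comparisons between group means
  • one-way ANCOVA with one covariate (y~x + A)
  • two-way factorial ANOVA (y~A * B)
  • repeated measures ANOVA with one within-groups factor W and one between-groups factor B (y~ B*W + Error(Subject/W))
  • multivariate ANOVA (MANOVA)
> fit<-manova(y~A)        #y is a matrix of response variables
> summary.aov(fit)        #univariate ANOVA for each response
> Wilks.test(y,shelf,method="mcd")        #robust one-way MANOVA (package rrcov);shelf is the grouping factor
  • regression
> fit.lm<-lm(y~A,data)
> summary(fit.lm)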
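
A minimal sketch of the one-way case, assuming the built-in PlantGrowth data set (weight measured in three treatment groups), which is not part of the original notes:

```r
# Illustrative data: built-in PlantGrowth (weight by treatment group, 3 groups)
fit <- aov(weight ~ group, data = PlantGrowth)   # one-way ANOVA, y ~ A
print(summary(fit))

# All pairwise comparisons between group means
pairs <- TukeyHSD(fit)
print(pairs)

# The same model fitted as a regression on the factor
fit.lm <- lm(weight ~ group, data = PlantGrowth)
print(summary(fit.lm))
```

aov() and lm() fit the same linear model here; they differ in how the results are summarized.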

(2)nonparametric

  • two groups
> wilcox.test(y~x,data)        #or wilcox.test(y1,y2)
  • more than two groups
#independent groups
> kruskal.test(y~A,data)
#dependent groups (B is the blocking variable)
> friedman.test(y~A|B,data)
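
The three calls can be sketched with built-in data. The data sets mtcars, PlantGrowth and warpbreaks are illustrative assumptions, and the aggregation step is needed only because friedman.test() expects one value per treatment and block:

```r
# Two independent groups: mpg by transmission type
# (exact = FALSE avoids the warning about ties in the data)
wt <- wilcox.test(mpg ~ am, data = mtcars, exact = FALSE)

# More than two independent groups
kw <- kruskal.test(weight ~ group, data = PlantGrowth)

# Dependent groups: mean breaks per wool type (treatment) within tension level (block)
wb <- aggregate(warpbreaks$breaks,
                by = list(w = warpbreaks$wool, t = warpbreaks$tension),
                FUN = mean)
fr <- friedman.test(x ~ w | t, data = wb)

print(c(wilcoxon = wt$p.value, kruskal = kw$p.value, friedman = fr$p.value))
```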

2.random sample (permutation tests)

Function (package coin)         Description
oneway_test(y~A)                two-sample and K-sample permutation test
oneway_test(y~A | C)            two-sample and K-sample permutation test with a stratification (blocking) factor
wilcox_test(y~A)                Wilcoxon-Mann-Whitney rank-sum test
kruskal_test(y~A)               Kruskal-Wallis test
chisq_test(A~B | C)             Pearson chi-square test
cmh_test(A~B | C)               Cochran-Mantel-Haenszel test
lbl_test(D~E)                   linear-by-linear association test
spearman_test(y~x)              Spearman test
friedman_test(y~A | C)          Friedman test
wilcoxsign_test(y1~y2)          Wilcoxon signed-rank test
  • function_name(formula,data,distribution=)
  • formula=relationship between the variables to test
  • data=dataframe
  • distribution="exact"/"asymptotic"/"approximate"
Function (package lmPerm)         Description
lmp(A~B,data=,perm=)              simple regression
lmp(A~B+I(B^2),data=,perm=)       polynomial regression
lmp(A~B+C+D+E,data=,perm=)        multiple regression
aovp(A~B,data=,perm=)             one-way ANOVA
aovp(A~B+C,data=,perm=)           one-way ANCOVA
aovp(A~B*C,data=,perm=)           two-way factorial ANOVA
  • perm="Exact"/"Prob"/"SPR"
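
To show the idea behind these functions without extra packages, here is a hand-rolled sketch of an approximate two-sample permutation test in base R. It is not how coin or lmPerm are implemented; the data set (mtcars), the seed and the number of permutations are all assumptions chosen for illustration:

```r
# Hand-rolled sketch of an approximate two-sample permutation test
set.seed(1234)                                   # arbitrary seed for reproducibility
y <- mtcars$mpg
g <- mtcars$am
observed <- mean(y[g == 1]) - mean(y[g == 0])    # observed difference in means

# Re-randomize the group labels many times and recompute the statistic
perm_diffs <- replicate(9999, {
  shuffled <- sample(g)
  mean(y[shuffled == 1]) - mean(y[shuffled == 0])
})

# Two-sided p-value: share of permutations at least as extreme as observed
p_value <- (sum(abs(perm_diffs) >= abs(observed)) + 1) / (length(perm_diffs) + 1)
print(p_value)
```

Sampling a subset of all possible relabelings roughly corresponds to distribution="approximate" in the list above, while "exact" would enumerate every relabeling.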

[III]Power analysis

Function (package pwr)                                      Description
pwr.2p.test(h=,n=,sig.level=,power=)                        two proportions (equal n)
pwr.2p2n.test(h=,n1=,n2=,sig.level=,power=)                 two proportions (unequal n)
pwr.anova.test(k=,n=,f=,sig.level=,power=)                  balanced one-way ANOVA
pwr.chisq.test(w=,N=,df=,sig.level=,power=)                 chi-square test
pwr.f2.test(u=,v=,f2=,sig.level=,power=)                    general linear model
pwr.p.test(h=,n=,sig.level=,power=)                         proportion (one sample)
pwr.r.test(n=,r=,sig.level=,power=,alternative=)            correlation coefficient
pwr.t.test(n=,d=,sig.level=,power=,type=,alternative=)      t test (one sample/two samples/paired)
pwr.t2n.test(n1=,n2=,d=,sig.level=,power=,alternative=)     t test (two samples with unequal n)
  • h=ES.h(p1,p2),effect size for two proportions p1 and p2
  • n=sample size (per group in the two-sample and ANOVA functions)
  • $\mu$=grand mean;$\mu_i$=mean of group i
  • $\sigma^2$=error variance within groups
  • sig.level=significance level(default=0.05)
  • power=power level
  • k=number of groups
  • f=$\sqrt{\frac{\sum_{i=1}^{k}{p_i * {(\mu_i -\mu)}^2}}{\sigma^2}}$,$p_i=\frac{n_i}{N}$
  • w=$\sqrt{\sum_{i=1}^{m}{\frac{{(p0_i-p1_i)}^2}{p0_i}}}$,$p0_i$=cell probability under $H_0$,$p1_i$=cell probability under $H_1$
  • N=total sample size
  • df=degrees of freedom
  • u=numerator degrees of freedom
  • v=denominator degrees of freedom;in multiple regression v=N-k-1 (here k=number of predictors)
  • f2=$\frac{R^2}{1-R^2}$($R^2$=population squared multiple correlation);
    f2=$\frac{{R_{AB}}^2-{R_A}^2}{1-{R_{AB}}^2}$(${R_{A}}^2$=variance accounted for by variable set A,${R_{AB}}^2$=variance accounted for by variable sets A and B together)
  • r=linear correlation coefficient
  • alternative="two.sided"(default)/"less"/"greater"
  • d=$\frac{\mu_1-\mu_2}{\sigma}$,standardized difference between two means
  • type="two.sample"(default)/"one.sample"/"paired"
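
The effect sizes h and d can be computed by hand, and base R offers power.t.test() in the stats package as a stand-in for pwr.t.test(). The sketch below uses that base function rather than the pwr package, and all numeric values are made up for illustration:

```r
# Effect sizes written out by hand (same definitions as in the list above)
p1 <- 0.65; p2 <- 0.60                            # illustrative proportions
h <- 2 * asin(sqrt(p1)) - 2 * asin(sqrt(p2))      # the quantity ES.h(p1, p2) returns
mu1 <- 10; mu2 <- 8; sigma <- 2.5                 # illustrative means and common SD
d <- (mu1 - mu2) / sigma

# stats::power.t.test() solves for whichever argument is left out (here n per group)
res <- power.t.test(delta = mu1 - mu2, sd = sigma, sig.level = 0.05,
                    power = 0.9, type = "two.sample", alternative = "two.sided")
print(ceiling(res$n))
```

The pwr functions follow the same convention: supply all but one of the sample size, effect size, significance level and power, and the missing one is calculated.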

    END!
