R-Organize Data(step 2)


Summary: R is a data analysis and visualization platform.

[I]Missing Value

1.identify missing values

| x | is.na(x) | is.nan(x) | is.infinite(x) |
|---|---|---|---|
| x<-NA | TRUE | FALSE | FALSE |
| x<-0/0 | TRUE | TRUE | FALSE |
| x<-1/0 | FALSE | FALSE | TRUE |

  • is.na(x):missing value (NA or NaN)
  • is.nan(x):not-a-number value (NaN),e.g. 0/0
  • is.infinite(x):infinite value (Inf or -Inf)

complete.cases() treats NA and NaN as missing values;Inf and -Inf count as valid values.

> data(mydata,package="")        #load the dataset mydata from the named package
> mydata[complete.cases(mydata),]        #rows with no missing values
> mydata[!complete.cases(mydata),]        #rows with one or more missing values
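The checks above can be tried on made-up values. This is a minimal sketch; the vector `x` and the data frame `df` are illustrative assumptions, not data from the article.

```r
# Illustrative vector covering the three special cases from the table
x <- c(NA, 0/0, 1/0, 5)

na_flags  <- is.na(x)        # TRUE for NA and NaN
nan_flags <- is.nan(x)       # TRUE only for NaN
inf_flags <- is.infinite(x)  # TRUE only for Inf / -Inf

# Hypothetical data frame: rows 2 and 3 each contain a missing value;
# the Inf in row 4 does not count as missing
df <- data.frame(a = c(1, NA, 3, Inf), b = c("u", "v", NA, "w"))

complete_rows   <- df[complete.cases(df), ]   # rows with no missing value
incomplete_rows <- df[!complete.cases(df), ]  # rows with one or more missing values

print(complete_rows)
print(incomplete_rows)
```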

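2.missing value patterns

| Pattern | Package | Function | Description |
|---|---|---|---|
| list | mice | md.pattern(x) | 0:missing value;1:no missing value |
| graphic | VIM | aggr(x,prop=FALSE,numbers=TRUE) | numbers=FALSE(default):omit the numerical labels |
| graphic | VIM | matrixplot(x,pch=,col=) | light color:small value;dark color:large value;red:missing value(default) |
| related | none | none | correlate the missingness indicators with the observed variables(code below) |

> x<-as.data.frame(abs(is.na(mydata)))        #1:missing;0:observed
> head(x,5)
> y<-x[which(apply(x,2,sum)>0)]        #keep the variables with at least one missing value
> cor(mydata,y,use="pairwise.complete.obs")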
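The indicator-correlation idea needs only base R. A minimal sketch, assuming a small invented data frame `dat` in which `y2` is missing whenever `y1` is large:

```r
# Hypothetical data: y2 is missing for the three largest y1 values
dat <- data.frame(
  y1 = c(1, 2, 3, 4, 5, 6),
  y2 = c(10, 20, 30, NA, NA, NA),
  y3 = c(6, 5, 4, 3, 2, 1)
)

# 1 = missing, 0 = observed
ind <- as.data.frame(abs(is.na(dat)))

# Keep only the indicator columns with at least one missing value
miss <- ind[which(apply(ind, 2, sum) > 0)]

# Correlate each variable with the missingness indicator of y2.
# The y2 row is NA: where y2 is observed its indicator is constant,
# so cor() warns about zero variance (suppressed here).
r <- suppressWarnings(cor(dat, miss, use = "pairwise.complete.obs"))
print(r)
```

A positive entry in the `y1` row suggests that `y2` tends to be missing when `y1` is large, which is how this toy data was built.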

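3.handling missing values

| Method | Description |
|---|---|
| listwise deletion | newdata<-na.omit(mydata) |
| multiple imputation(MI) | mice package(code below) |

> library(mice)
> imp<-mice(mydata,m)        #m:number of imputed datasets
> fit<-with(imp,analysis)        #analysis:the model fitted to each imputed dataset
> pooled<-pool(fit)
> summary(pooled)

Other packages for handling missing data:

| Package | Description |
|---|---|
| mvnmle | maximum likelihood estimation for multivariate normal data with missing values |
| cat | multiple imputation of multivariate categorical data under log-linear models |
| arrayImpute,arrayMissPattern,SeqKnn | functions for handling missing microarray data |
| longitudinalData | utility functions for longitudinal data,including imputation of missing values |
| kmi | Kaplan-Meier multiple imputation for survival data |
| mix | multiple imputation for mixed categorical and continuous data |
| pan | multiple imputation for multivariate panel or clustered data |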
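Listwise deletion is the only method above that needs no extra package. A minimal sketch with an invented data frame (the columns `id` and `score` are assumptions):

```r
# Hypothetical data frame with two incomplete rows
mydata <- data.frame(id = 1:5, score = c(90, NA, 75, 60, NA))

newdata <- na.omit(mydata)                 # listwise deletion
dropped <- attr(newdata, "na.action")      # indices of the rows that were removed

print(newdata)
print(as.integer(dropped))

# Many summary functions can instead skip NA values directly
m <- mean(mydata$score, na.rm = TRUE)
print(m)
```

Listwise deletion shrinks the sample, so it is usually reasonable only when few rows are incomplete.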

[II]Date Value

| Function | Description |
|---|---|
| date() | return the current date and time as a character string |
| Sys.Date() | return the current date |
| as.Date(x,"input_format") | convert character to date |
| as.character(dates) | convert date to character |
| difftime(date1,date2,units=) | time interval;units="weeks"/"days"/"hours"/"mins"/"secs" |
| format(x,format="output_format") | output a date in the specified format |

input/output format

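| Symbol | Description | Example |
|---|---|---|
| %d | day of the month as a number(01~31) | 01 |
| %a | abbreviated weekday name | Mon |
| %A | unabbreviated weekday name | Monday |
| %m | month as a number(01~12) | 01 |
| %b | abbreviated month name | Jan |
| %B | unabbreviated month name | January |
| %y | two-digit year | 19 |
| %Y | four-digit year | 2019 |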
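A short sketch of the date functions and format symbols above; the two dates are arbitrary examples chosen for illustration.

```r
# Parse a character string using an explicit input format
d1 <- as.Date("01/05/2019", "%m/%d/%Y")   # 5 January 2019
d2 <- as.Date("2019-03-01")               # the default input format is %Y-%m-%d

# Interval between the two dates in different units
gap_days  <- difftime(d2, d1, units = "days")
gap_weeks <- difftime(d2, d1, units = "weeks")

# Format a date for output (numeric codes do not depend on the locale)
out <- format(d1, format = "%d/%m/%y")

print(gap_days)
print(out)
```

Names produced by %a, %A, %b and %B depend on the session locale, so the numeric codes are safer when output must be reproducible.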

[III]Type Conversion

| Test | Conversion |
|---|---|
| is.numeric() | as.numeric() |
| is.character() | as.character() |
| is.vector() | as.vector() |
| is.matrix() | as.matrix() |
| is.data.frame() | as.data.frame() |
| is.factor() | as.factor() |
| is.logical() | as.logical() |
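The is.*/as.* pairs can be sketched as follows; all values are invented. The factor case is included because converting a factor directly to numeric returns its internal codes, not its labels.

```r
x <- c("1", "2", "3")              # character vector

is_num_before <- is.numeric(x)     # FALSE
y <- as.numeric(x)                 # convert to numeric
is_num_after  <- is.numeric(y)     # TRUE

# Factor to number: go through character to recover the labels
f <- as.factor(c("10", "20", "10"))
codes  <- as.numeric(f)                  # internal level codes
values <- as.numeric(as.character(f))    # the labels as numbers

# Data frame to matrix and back
m  <- as.matrix(data.frame(a = 1:2, b = 3:4))
df <- as.data.frame(m)

print(codes)
print(values)
```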

[IV]Data Sorting

> newdata<-dataframe[order(dataframe$x1,dataframe$x2),]        #ascending by default;use -x for a numeric variable x to sort it in descending order
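A minimal sorting sketch on an invented data frame: `x1` ascending, then `x2` descending within equal `x1`.

```r
# Hypothetical data frame
dataframe <- data.frame(x1 = c("b", "a", "b", "a"), x2 = c(3, 1, 4, 2))

# order() returns the row permutation; the minus sign reverses a numeric key
newdata <- dataframe[order(dataframe$x1, -dataframe$x2), ]
print(newdata)
```

The minus sign works only for numeric keys; for a character key, `order(..., decreasing = TRUE)` is one alternative.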

[V]Data merging

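> total<-merge(dataframeA,dataframeB,by="x1")        #add columns,matching rows on the key variable x1
> total<-cbind(dataframeA,dataframeB)        #add columns side by side;no key matching,row counts must be equal
> total<-rbind(dataframeA,dataframeB)        #add rows;both data frames must have the same variables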
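The three ways of combining data frames behave differently, as this sketch on two invented data frames shows. The `all = TRUE` variant is an addition for illustration; it keeps unmatched keys.

```r
dataframeA <- data.frame(x1 = c(1, 2, 3), a = c("p", "q", "r"))
dataframeB <- data.frame(x1 = c(2, 3, 4), b = c(TRUE, FALSE, TRUE))

inner <- merge(dataframeA, dataframeB, by = "x1")              # keys present in both: 2, 3
outer <- merge(dataframeA, dataframeB, by = "x1", all = TRUE)  # all keys, NA where unmatched

side_by_side <- cbind(dataframeA, dataframeB)  # pastes columns; rows are NOT matched on x1
stacked      <- rbind(dataframeA, dataframeA)  # appends rows; variables must agree

print(inner)
print(outer)
```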

[VI]Subset of Dataset

> newdata<-dataframe[row_indices,column_indices]        #keep the selected rows and variables
> dataframe$x1<-dataframe$x2<-NULL        #delete variables x1,x2
> newdata<-subset(dataframe,condition)        #keep the rows that meet the condition
> mysample<-dataframe[sample(1:nrow(dataframe),size,replace=FALSE),]        #size:number of rows to draw;replace=FALSE:without replacement,TRUE:with replacement
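A sketch of the subsetting idioms above on an invented data frame. The seed is fixed only to make the random sample repeatable.

```r
dataframe <- data.frame(id = 1:6, x1 = c(5, 8, 2, 9, 4, 7), x2 = letters[1:6])

kept <- dataframe[1:3, c("id", "x1")]                  # rows 1-3, two variables
big  <- subset(dataframe, x1 > 5, select = c(id, x1))  # rows meeting a condition

trimmed <- dataframe
trimmed$x2 <- NULL                                     # delete variable x2

set.seed(1)                                            # reproducible random sample
mysample <- dataframe[sample(1:nrow(dataframe), 3, replace = FALSE), ]

print(big)
print(mysample)
```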

[VII]Processing Functions

1.math functions

| Function | Description |
|---|---|
| abs(x) | absolute value |
| sqrt(x) | square root |
| ceiling(x) | smallest integer not less than x |
| floor(x) | largest integer not greater than x |
| trunc(x) | integer formed by truncating x toward 0 |
| round(x,digits=n) | round x to n decimal places |
| signif(x,digits=n) | round x to n significant digits |
| cos(x),sin(x),tan(x) | cosine,sine,tangent |
| acos(x),asin(x),atan(x) | arccosine,arcsine,arctangent |
| cosh(x),sinh(x),tanh(x) | hyperbolic cosine,hyperbolic sine,hyperbolic tangent |
| acosh(x),asinh(x),atanh(x) | inverse hyperbolic cosine,inverse hyperbolic sine,inverse hyperbolic tangent |
| log(x,base=n) | logarithm of x to base n;log(x):natural logarithm(base e);log10(x):base 10 |
| exp(x) | exponential function |
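The rounding functions differ mainly on negative numbers. A short sketch with arbitrary inputs:

```r
x <- c(-2.567, 2.567)

up      <- ceiling(x)            # -2  3
down    <- floor(x)              # -3  2
toward0 <- trunc(x)              # -2  2
dec2    <- round(x, digits = 2)  # -2.57  2.57
sig2    <- signif(x, digits = 2) # -2.6   2.6

lg2  <- log(8, base = 2)         # 3
lg10 <- log10(1000)              # 3
back <- exp(log(5))              # 5, up to floating-point error

print(rbind(up, down, toward0, dec2, sig2))
```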

2.statistical function

| Function | Description |
|---|---|
| mean(x) | mean |
| median(x) | median |
| sd(x) | standard deviation |
| var(x) | variance |
| mad(x) | median absolute deviation |
| quantile(x,probs) | quantiles of x at the probabilities probs |
| range(x) | range(minimum and maximum) |
| sum(x) | sum |
| diff(x,lag=n) | lagged differences |
| min(x) | minimum |
| max(x) | maximum |
| scale(x,center=TRUE,scale=TRUE) | center only:center=TRUE,scale=FALSE;standardize:center=TRUE,scale=TRUE |
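A sketch of the statistical functions on a small made-up vector, chosen so the results are round numbers:

```r
x <- c(2, 4, 4, 5, 10)

m  <- mean(x)                              # 5
md <- median(x)                            # 4
v  <- var(x)                               # sample variance: 36 / 4 = 9
s  <- sd(x)                                # 3
rg <- range(x)                             # 2 10
d  <- diff(x, lag = 1)                     # 2 0 1 5
q  <- quantile(x, probs = c(0.25, 0.75))   # 4 and 5 with the default method

# Standardize: subtract the mean, divide by the standard deviation
z <- as.numeric(scale(x, center = TRUE, scale = TRUE))

print(c(mean = m, median = md, var = v, sd = s))
print(z)
```

`var()` and `sd()` use the n-1 denominator, which is why the variance here is 9 rather than 7.2.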

3.probability function

> [d/p/q/r]distribution_abbreviation()        
  • d=density function
  • p=distribution function
  • q=quantile function
  • r=random number generation

| Distribution | Abbreviation |
|---|---|
| Beta | beta |
| Binomial | binom |
| Cauchy | cauchy |
| Chi-square | chisq |
| Exponential | exp |
| F | f |
| Gamma | gamma |
| Geometric | geom |
| Hypergeometric | hyper |
| Lognormal | lnorm |
| Logistic | logis |
| Multinomial | multinom |
| Negative Binomial | nbinom |
| Normal | norm |
| Poisson | pois |
| Wilcoxon signed rank | signrank |
| T | t |
| Uniform | unif |
| Weibull | weibull |
| Wilcoxon rank sum | wilcox |
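The prefix-plus-abbreviation naming can be sketched with the normal and binomial distributions. The parameter values and the seed are arbitrary choices.

```r
d <- dnorm(0)                # density of N(0,1) at 0, i.e. 1/sqrt(2*pi)
p <- pnorm(1.96)             # P(X <= 1.96), about 0.975
q <- qnorm(0.975)            # quantile, about 1.96 (inverse of pnorm)

set.seed(42)                 # make the random draws repeatable
r <- rnorm(5, mean = 50, sd = 10)

b <- dbinom(2, size = 5, prob = 0.5)   # choose(5, 2) / 2^5 = 0.3125

print(c(d = d, p = p, q = q, b = b))
print(r)
```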

4.character processing function

| Function | Description |
|---|---|
| nchar(x) | number of characters in x |
| substr(x,start,stop) | extract or replace a substring in a character vector |
| grep(pattern,x,ignore.case=FALSE,fixed=FALSE) | search for pattern in x;fixed=FALSE:pattern is a regular expression;fixed=TRUE:pattern is a literal text string |
| sub(pattern,replacement,x,ignore.case=FALSE,fixed=FALSE) | search for pattern in x and replace the first match with replacement |
| strsplit(x,split,fixed=FALSE) | split the elements of x at split |
| paste(...,sep="") | concatenate strings with the separator sep |
| toupper(x) | convert to uppercase |
| tolower(x) | convert to lowercase |
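A sketch of the character functions on an invented vector:

```r
x <- c("apple pie", "Banana split", "cherry tart")

n     <- nchar(x)                           # 9 12 11
first <- substr(x, 1, 3)                    # "app" "Ban" "che"
hit   <- grep("an", x, ignore.case = TRUE)  # index of matching elements: 2
under <- sub(" ", "_", x)                   # replaces the first space in each element
parts <- strsplit(x[1], " ", fixed = TRUE)[[1]]   # "apple" "pie"
ids   <- paste("x", 1:3, sep = "")          # "x1" "x2" "x3"
upper <- toupper(x[1])                      # "APPLE PIE"

print(under)
print(parts)
```

`grep()` returns positions by default; `grepl()` gives a logical vector instead, which is often handier for filtering.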

5.others

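| Function | Description |
|---|---|
| length(x) | the length of x |
| seq(from,to,by) | generate a sequence |
| rep(x,n) | repeat x n times |
| cut(x,n) | divide the continuous variable x into a factor with n levels |
| pretty(x,n) | create about n+1 evenly spaced "pretty" breakpoints covering the range of x |
| cat(...,file="myfile",append=FALSE) | concatenate the objects in ... and output them to the screen or to a file |
| apply(x,MARGIN,FUN,...) | x:data;MARGIN:dimension index(1:rows,2:columns);FUN:function to apply |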
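A sketch of these utility functions with arbitrary inputs. Here `cat()` writes to the console rather than to a file.

```r
s <- seq(from = 1, to = 10, by = 3)   # 1 4 7 10
r <- rep(c("a", "b"), 2)              # "a" "b" "a" "b"
g <- cut(c(1, 5, 9), 3)               # factor with 3 equal-width levels
br <- pretty(c(0, 100), 5)            # rounded breakpoints covering 0..100

m <- matrix(1:6, nrow = 2)            # 2 rows, 3 columns, filled by column
row_sums  <- apply(m, 1, sum)         # MARGIN = 1: over rows    -> 9 12
col_means <- apply(m, 2, mean)        # MARGIN = 2: over columns -> 1.5 3.5 5.5

cat("row sums:", row_sums, "\n")
cat("breakpoints:", br, "\n")
```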

User-defined function

> myfunction<-function(arg1,arg2,...){
       statements
      return(object)
  }
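The template can be filled in like this. It is a sketch only: the function `mystats` and its `robust` argument are hypothetical names, not something the article defines.

```r
# Hypothetical helper: summarise a numeric vector,
# optionally with robust statistics; ... is passed on (e.g. na.rm = TRUE)
mystats <- function(x, robust = FALSE, ...) {
  if (robust) {
    center <- median(x, ...)
    spread <- mad(x, ...)
  } else {
    center <- mean(x, ...)
    spread <- sd(x, ...)
  }
  return(list(center = center, spread = spread))
}

res     <- mystats(c(2, 4, 4, 5, 10, NA), na.rm = TRUE)
res_rob <- mystats(c(1, 2, 100), robust = TRUE)

print(res)
print(res_rob$center)
```

Returning a named list lets the caller pick out results by name, e.g. `res$center`.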

[VIII]Control Flow

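| Description | Function |
|---|---|
| Repeat and Loop | for (var in seq) statement |
| Repeat and Loop | while (cond) statement |
| Conditional Execution | if (cond) statement |
| Conditional Execution | if (cond) statement1 else statement2 |
| Conditional Execution | ifelse(cond,statement1,statement2) |
| Conditional Execution | switch(expr,...) |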
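Each control-flow construct above is shown once in this sketch; all variable names and values are invented.

```r
# for: sum the integers 1..5
total <- 0
for (i in 1:5) total <- total + i

# while: count up until the condition fails
n <- 0
while (n < 3) n <- n + 1

# if / else works on a single condition
if (total > 10) size <- "large" else size <- "small"

# ifelse is vectorised: one result per element
score <- c(45, 80, 62)
grade <- ifelse(score >= 60, "pass", "fail")

# switch picks a branch by name; the unnamed last value is the fallback
feeling <- "sad"
msg <- switch(feeling, happy = "glad you are happy", sad = "cheer up", "unknown")

print(c(total = total, n = n))
print(grade)
print(msg)
```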

[IX]Aggregate and Reshape

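| Function | Description |
|---|---|
| t(x) | transpose |
| aggregate(x,by,FUN) | x:data;by:a list of grouping variables;FUN:function that computes the summary statistic |
| melt(x,id.vars) | reshape2 package;melt the data into long format;id.vars:identifier variables |
| dcast(md,formula,fun.aggregate) | reshape2 package;md:melted data;formula:rowvar1+rowvar2+...~colvar1+colvar2+...;fun.aggregate:aggregate function |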
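A base R sketch of `aggregate()` and `t()` on the built-in `mtcars` data. The choice of dataset and variables is an assumption for illustration, and `melt()`/`dcast()` are left out because they need the reshape2 package.

```r
# Mean mpg and hp for each number of cylinders
agg <- aggregate(mtcars[c("mpg", "hp")],
                 by = list(cyl = mtcars$cyl),   # the list name becomes the column name
                 FUN = mean)

# Transpose the numeric part: variables become rows, groups become columns
tm <- t(agg[, -1])
colnames(tm) <- paste("cyl", agg$cyl, sep = "")

print(agg)
print(tm)
```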

[X]Component Analysis

(Figure: analysis steps for principal component analysis and exploratory factor analysis)

principal component analysis

1.determine the number of principal components

> library(psych)
> fa.parallel(Harman23.cor$cov,n.obs=302,fa="pc",n.iter=100,
                  show.legend=FALSE,main="Scree plot with parallel analysis")

2.extract the principal components

> pc<-principal(r=USJudgeRatings[,-1],nfactors=1)        #r:correlation matrix or raw data;rotate:rotation method(default "varimax");scores:whether to compute component scores

3.principal component rotation

> rc<-principal(Harman23.cor$cov,nfactors=2,rotate="varimax")

4.get the principal component scoring coefficients

> round(unclass(rc$weights),2)
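The idea behind the psych calls can be sketched in base R through an eigen-decomposition of the correlation matrix. This is a substitute technique for illustration, not what `principal()` does internally line by line, and the eigenvalue-greater-than-1 rule is a simpler stand-in for parallel analysis.

```r
# Correlation matrix of the ratings (first column CONT dropped, as in the text)
r <- cor(USJudgeRatings[, -1])

e <- eigen(r)                        # eigenvalues come back in decreasing order
prop <- e$values / sum(e$values)     # proportion of variance per component

n_keep <- sum(e$values > 1)          # Kaiser rule: keep eigenvalues above 1

# Unrotated loadings of the first component (the sign of an eigenvector is arbitrary)
loadings1 <- e$vectors[, 1] * sqrt(e$values[1])
names(loadings1) <- colnames(r)

print(round(prop[1:3], 2))
print(n_keep)
print(round(loadings1, 2))
```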

exploratory factor analysis

1.determine the number of common factors

> library(psych)
> covariances<-ability.cov$cov
> correlations<-cov2cor(covariances)
> fa.parallel(correlations,n.obs=112,fa="both",n.iter=100,
                   main="Scree plots with parallel analysis")

2.extract the common factors

> fa<-fa(correlations,nfactors=2,rotate="none",fm="pa")

3.factor rotation

> fa.varimax<-fa(correlations,nfactors=2,rotate="varimax",fm="pa")        #orthogonal
> fa.promax<-fa(correlations,nfactors=2,rotate="promax",fm="pa")        #oblique
> factor.plot(fa.promax,labels=rownames(fa.promax$loadings))
> fa.diagram(fa.promax,simple=FALSE)

4.factor scoring coefficients

> fa.promax$weights        #coefficients used to compute the factor scores
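Without the psych package, a comparable two-factor model can be fitted with base R's `factanal()`. This is a sketch under a different estimation method: `factanal()` uses maximum likelihood, not the principal axis method (`fm="pa"`) used above, so the loadings will not match exactly.

```r
# ability.cov is a built-in list holding the covariance matrix and n.obs
fit <- factanal(factors = 2, covmat = ability.cov, rotation = "promax")

# Loadings matrix: one row per test, one column per factor
L <- fit$loadings
print(L, cutoff = 0.3)   # hide small loadings to make the pattern easier to read

# Uniqueness: the share of each variable's variance not explained by the factors
print(round(fit$uniquenesses, 2))
```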
