R-Organize Data(step 2)

2019-04-04

Summary: R is a data analysis and visualization platform.

[I]Missing Value

1.identify missing value

x	is.na(x)	is.nan(x)	is.infinite(x)
x<-NA	TRUE	FALSE	FALSE
x<-0/0	TRUE	TRUE	FALSE
x<-1/0	FALSE	FALSE	TRUE

is.na(x):missing value
is.nan(x):impossible value
is.infinite(x):infinite value

complete.cases():treats NA and NaN as missing values;Inf and -Inf count as valid values

> data(mydata,package="")        #load dataset mydata from a package (package name goes in the quotes)
> mydata[complete.cases(mydata),]        #rows with no missing values
> mydata[!complete.cases(mydata),]        #rows with one or more missing values
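As a minimal sketch of the checks above, the following uses a small made-up data frame (`df` and its columns are hypothetical, not from the original) with one NA, one NaN and one Inf:

```r
# Hypothetical data frame: column a holds an NA, b a NaN, c an Inf
df <- data.frame(a = c(1, NA, 3, 4),
                 b = c(2, 5, NaN, 8),
                 c = c(1, 2, 3, Inf))

na_flags  <- is.na(df$a)        # TRUE where a is NA
nan_flags <- is.nan(df$b)       # TRUE where b is NaN
inf_flags <- is.infinite(df$c)  # TRUE where c is Inf or -Inf

# complete.cases() marks rows without NA/NaN; the Inf in row 4 counts as valid
ok <- complete.cases(df)
complete_rows   <- df[ok, ]     # rows 1 and 4
incomplete_rows <- df[!ok, ]    # rows 2 and 3
```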

2.missing pattern

Pattern	Package	Function	Description
table	mice	md.pattern(x)	0:missing value;1:no missing value
graphic	VIM	aggr(x,prop=FALSE,numbers=TRUE)	prop=FALSE:show counts instead of proportions;numbers=TRUE:show numeric labels;numbers=FALSE(default):omit numeric labels
graphic	VIM	matrixplot(x,pch=,col=)	light color:small value;dark color:large value;red(default):missing value
correlation	none	none	x<-as.data.frame(abs(is.na(data))); head(x,5); y<-x[which(apply(x,2,sum)>0)]; cor(data,y,use="pairwise.complete.obs")
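The last row of the table builds a 0/1 missingness indicator in base R. A small sketch of that idea, assuming a made-up data frame `df` (all names here are illustrative):

```r
# Hypothetical data: x1 and x2 contain missing values, x3 is complete
df <- data.frame(x1 = c(1, NA, 3, NA, 5),
                 x2 = c(2, 4, NA, 8, 10),
                 x3 = c(5, 3, 6, 2, 7))

# Indicator data frame: 1 = missing, 0 = observed
miss <- as.data.frame(abs(is.na(df)))
n_missing <- colSums(miss)            # number of missing values per variable

# Keep only indicators of variables that actually have missing values
shadow <- miss[which(apply(miss, 2, sum) > 0)]

# Correlate an observed variable with the missingness indicators
r <- cor(df$x3, shadow)
```

A correlation far from zero in `r` would hint that missingness in a variable is related to the values of `x3`; with five rows this is only a demonstration of the mechanics.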

3.processing missing value

Method/Package	Description
listwise deletion	newdata<-na.omit(mydata)
multiple imputation (MI)	library(mice); imp<-mice(data,m); fit<-with(imp,analysis); pooled<-pool(fit); summary(pooled)
mvnmle	maximum likelihood estimation for multivariate normal data with missing values
cat	multiple imputation of multivariate categorical data under log-linear models
arrayImpute, arrayMissPattern, SeqKnn	missing data in microarrays
longitudinalData	related utility functions for longitudinal data
kmi	Kaplan-Meier multiple imputation
mix	multiple imputation for mixed categorical and continuous data
pan	multiple imputation for panel or clustered data
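Listwise deletion from the first row can be tried without any extra package. The data frame below is made up for illustration, and the `na.rm=TRUE` line is an addition that shows an alternative for a single summary statistic:

```r
# Hypothetical data with two missing scores
mydata <- data.frame(id = 1:4, score = c(90, NA, 75, NA))

# Listwise deletion: drop every row that contains a missing value
newdata <- na.omit(mydata)

# Alternative for one statistic: skip missing values inside the function
avg <- mean(mydata$score, na.rm = TRUE)   # mean of 90 and 75
```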

[II]Date Value

Function	Description
date()	output current date and time
Sys.Date()	output current date
as.Date(x,"input_format")	convert character to date
as.character(dates)	convert date to character
difftime(date1,date2,units=)	time interval;units="weeks"/"days"/"hours"/"mins"/"secs"
format(x,format="output_format")	output date in the specified format

input/output format

Symbol	Description	Example
%d	day of the month (01~31)	01
%a	abbreviated weekday name	Mon
%A	full weekday name	Monday
%m	month (01~12)	01
%b	abbreviated month name	Jan
%B	full month name	January
%y	two-digit year	19
%Y	four-digit year	2019
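A short sketch combining the date functions and format symbols above. The dates are arbitrary examples; only numeric symbols are used so the result does not depend on the locale:

```r
# Parse a character string using an explicit input format
d1 <- as.Date("04/04/2019", format = "%m/%d/%Y")
d2 <- as.Date("2019-01-01")             # default format is %Y-%m-%d

# Format a date for output
iso   <- format(d1, format = "%Y-%m-%d")  # "2019-04-04"
short <- format(d1, format = "%d/%m/%y")  # "04/04/19"

# Interval between two dates in different units
gap_days  <- difftime(d1, d2, units = "days")   # 93 days
gap_weeks <- as.numeric(difftime(d1, d2, units = "weeks"))

# Convert back to character
txt <- as.character(d1)
```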

[III]Type Conversion

Test	Conversion
is.numeric()	as.numeric()
is.character()	as.character()
is.vector()	as.vector()
is.matrix()	as.matrix()
is.data.frame()	as.data.frame()
is.factor()	as.factor()
is.logical()	as.logical()
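The is.*/as.* pairs can be exercised as follows. The values are made up; the factor example is an added caution, since calling as.numeric() directly on a factor returns its internal level codes:

```r
x <- c("1", "2", "3")
was_char <- is.character(x)     # TRUE
n <- as.numeric(x)              # numeric vector 1, 2, 3
now_num <- is.numeric(n)        # TRUE

# Factor holding numbers as labels: convert via character first
f <- as.factor(c("10", "20", "10"))
vals <- as.numeric(as.character(f))   # 10, 20, 10

# Data frame to matrix and back
m  <- as.matrix(data.frame(a = 1:2, b = 3:4))
df <- as.data.frame(m)
```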

[IV]Data Sorting

> newdata<-dataframe[order(dataframe$x1,dataframe$x2),]        #ascending by default;use -x (numeric x) for descending
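A small worked example of sorting with order(), using a made-up data frame (`df`, `name`, `x1`, `x2` are illustrative):

```r
df <- data.frame(name = c("a", "b", "c", "d"),
                 x1 = c(2, 1, 2, 1),
                 x2 = c(10, 20, 30, 40))

# Ascending by x1, ties broken by ascending x2: b, d, a, c
asc <- df[order(df$x1, df$x2), ]

# Ascending by x1, ties broken by descending x2: d, b, c, a
desc <- df[order(df$x1, -df$x2), ]
```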

[V]Data merging

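> total<-merge(dataframeA,dataframeB,by="x1")        #add columns by matching rows on the key x1
> total<-cbind(dataframeA,dataframeB)        #direct column merge;both must have the same number of rows in the same order
> total<-rbind(dataframeA,dataframeB)        #direct row merge;both must have the same variables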
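To see how the three merges differ, here is a sketch with two tiny hypothetical data frames `A` and `B`:

```r
A <- data.frame(x1 = c(1, 2, 3), a = c("p", "q", "r"))
B <- data.frame(x1 = c(2, 3, 4), b = c(TRUE, FALSE, TRUE))

# merge() keeps only keys present in both inputs by default (x1 = 2 and 3)
total <- merge(A, B, by = "x1")

# cbind() pastes columns side by side without matching keys
side <- cbind(A, B)        # 3 rows, 4 columns (x1 appears twice)

# rbind() stacks rows; the inputs need the same variables
stacked <- rbind(A, A)     # 6 rows
```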

[VI]Subset of Dataset

> newdata<-dataframe[row indices,column indices]        #keep the selected rows and variables
> dataframe$x1<-dataframe$x2<-NULL        #delete variables x1,x2
> newdata<-subset(dataframe,condition)        #keep the rows that satisfy condition
> mysample<-dataframe[sample(1:nrow(dataframe),size,replace=FALSE),]        #size:number of rows to draw;replace=FALSE/TRUE(without/with replacement)
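The subsetting templates above can be filled in like this. The data frame and the age threshold are made up for illustration:

```r
df <- data.frame(id = 1:6,
                 age = c(23, 35, 41, 29, 52, 38),
                 sex = c("M", "F", "F", "M", "F", "M"))

# Keep rows 1 to 3 and two named variables
kept <- df[1:3, c("id", "age")]

# Keep rows matching a condition; select= picks the variables
older <- subset(df, age > 30 & sex == "F", select = c(id, age))  # ids 2, 3, 5

# Delete a variable
df$sex <- NULL

# Random sample of 3 rows without replacement
set.seed(1)
mysample <- df[sample(1:nrow(df), 3, replace = FALSE), ]
```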

[VII]Processing Functions

1.math functions

Function	Description
abs(x)	absolute value
sqrt(x)	square root
ceiling(x)	smallest integer not less than x
floor(x)	largest integer not greater than x
trunc(x)	integer formed by truncating x toward 0
round(x,digits=n)	round x to n decimal places
signif(x,digits=n)	round x to n significant digits
cos(x),sin(x),tan(x)	cosine,sine,tangent
acos(x),asin(x),atan(x)	arccosine,arcsine,arctangent
cosh(x),sinh(x),tanh(x)	hyperbolic cosine,hyperbolic sine,hyperbolic tangent
acosh(x),asinh(x),atanh(x)	inverse hyperbolic cosine,inverse hyperbolic sine,inverse hyperbolic tangent
log(x,base=n)	logarithm of x to base n;log(x):natural logarithm(base e);log10(x):base 10
exp(x)	exponential function
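The rounding functions are easiest to tell apart on a negative number. A quick sketch with an arbitrary value:

```r
x <- -3.567

a  <- abs(x)                # 3.567
up <- ceiling(x)            # -3 (smallest integer not less than x)
dn <- floor(x)              # -4 (largest integer not greater than x)
tr <- trunc(x)              # -3 (truncated toward 0)
r2 <- round(x, digits = 2)  # -3.57
s2 <- signif(x, digits = 2) # -3.6

l2  <- log(8, base = 2)     # 3
l10 <- log10(1000)          # 3
ln  <- log(exp(1))          # 1, natural logarithm
rt  <- sqrt(16)             # 4
```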

2.statistical function

Function	Description
mean(x)	mean
median(x)	median
sd(x)	standard deviation
var(x)	variance
mad(x)	median absolute deviation
quantile(x,probs)	quantile
range(x)	range
sum(x)	sum
diff(x,lag=n)	lagged differences
min(x)	minimum
max(x)	maximum
scale(x,center=TRUE,scale=TRUE)	centering only:center=TRUE,scale=FALSE;standardization:center=TRUE,scale=TRUE
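A brief run through the statistical functions on a made-up vector:

```r
x <- c(2, 4, 4, 6, 9)

m   <- mean(x)      # 5
med <- median(x)    # 4
v   <- var(x)       # 7
s   <- sd(x)        # square root of the variance
rg  <- range(x)     # 2 9
tot <- sum(x)       # 25
d1  <- diff(x, lag = 1)                   # 2 0 2 3
q   <- quantile(x, probs = c(0.25, 0.75)) # lower and upper quartiles

# Standardize: subtract the mean, divide by the standard deviation
z <- scale(x, center = TRUE, scale = TRUE)
```

After standardization the values in `z` have mean 0 and standard deviation 1.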

3.probability function

> [d/p/q/r]distribution_abbreviation()

d=density
p=cumulative distribution function
q=quantile function
r=random number generation

Distribution	Abbreviation
Beta	beta
Binomial	binom
Cauchy	cauchy
Chi-square	chisq
Exponential	exp
F	f
Gamma	gamma
Geometric	geom
Hypergeometric	hyper
Lognormal	lnorm
Logistic	logis
Multinomial	multinom
Negative Binomial	nbinom
Normal	norm
Poisson	pois
Wilcoxon signed rank	signrank
T	t
Uniform	unif
Weibull	weibull
Wilcoxon rank sum	wilcox
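Combining a prefix with an abbreviation gives the function name, for example `d` + `norm` = `dnorm`. A sketch with the normal and binomial distributions (the parameter values are arbitrary):

```r
dens <- dnorm(0)        # density of the standard normal at 0, about 0.399
cum  <- pnorm(1.96)     # P(Z <= 1.96), about 0.975
qu   <- qnorm(0.975)    # 97.5th percentile, about 1.96

# Random generation; set.seed() makes the draw reproducible
set.seed(42)
draws <- rnorm(5, mean = 50, sd = 10)

# Probability of exactly 2 successes in 4 fair coin flips
pb <- dbinom(2, size = 4, prob = 0.5)   # 0.375
```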

4.character processing function

Function	Description
nchar(x)	number of characters in x
substr(x,start,stop)	extract or replace a substring in a character vector
grep(pattern,x,ignore.case=FALSE,fixed=FALSE)	search for a pattern in x;fixed=FALSE:pattern is a regular expression;fixed=TRUE:pattern is a text string
sub(pattern,replacement,x,ignore.case=FALSE,fixed=FALSE)	search for a pattern in x and replace the first match with the text replacement
strsplit(x,split,fixed=FALSE)	split the elements of x at split
paste(...,sep="")	concatenate strings with separator sep
toupper(x)	convert to uppercase
tolower(x)	convert to lowercase
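The character functions in action, on a made-up vector:

```r
x <- c("apple pie", "Banana", "cherry")

len   <- nchar(x)                          # 9 6 6
head3 <- substr(x, 1, 3)                   # "app" "Ban" "che"
hit1  <- grep("an", x, fixed = TRUE)       # 2: literal text match
hit2  <- grep("^b", x, ignore.case = TRUE) # 2: regular expression, case ignored
swap  <- sub("pie", "tart", x[1])          # "apple tart"
parts <- strsplit("a,b,c", ",", fixed = TRUE)[[1]]  # "a" "b" "c"
ids   <- paste("x", 1:3, sep = "")         # "x1" "x2" "x3"
up    <- toupper("abc")                    # "ABC"
low   <- tolower("ABC")                    # "abc"
```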

5.others

Function	Description
length(x)	the length of x
seq(from,to,by)	generate a sequence
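rep(x,n)	repeat x n times
cut(x,n)	divide the continuous variable x into a factor with n levels
pretty(x,n)	create about n "pretty" breakpoints that divide x into intervals
cat(...,file="myfile",append=FALSE)	concatenate the objects in ... and output them to the screen or to a file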
apply(x,MARGIN,FUN,...)	x:data,MARGIN:subscript of dimension,FUN:specified function
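A few of these helpers on small made-up inputs. For apply(), MARGIN=1 works over rows and MARGIN=2 over columns:

```r
s <- seq(1, 10, by = 3)        # 1 4 7 10
r <- rep(c("a", "b"), 2)       # "a" "b" "a" "b"
n <- length(s)                 # 4

g <- cut(c(1, 5, 10), 3)       # factor with 3 interval levels
b <- pretty(c(0, 97), 5)       # rounded breakpoints covering 0 to 97

m <- matrix(1:6, nrow = 2)     # rows: (1, 3, 5) and (2, 4, 6)
row_sums  <- apply(m, 1, sum)  # 9 12
col_means <- apply(m, 2, mean) # 1.5 3.5 5.5

cat("Hello", "R", "\n")        # prints to the screen when no file is given
```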

User-defined function

> myfunction<-function(arg1,arg2,...){
       statements
       return(object)
  }
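A concrete instance of the template, as a sketch only: the function name `center_spread` and its `robust` argument are made up for illustration.

```r
# Return a measure of center and spread; robust = TRUE switches
# from mean/sd to median/mad
center_spread <- function(x, robust = FALSE) {
  if (robust) {
    center <- median(x)
    spread <- mad(x)
  } else {
    center <- mean(x)
    spread <- sd(x)
  }
  return(list(center = center, spread = spread))
}

res  <- center_spread(c(2, 4, 4, 6, 9))                 # center = 5
res2 <- center_spread(c(2, 4, 4, 6, 9), robust = TRUE)  # center = 4
```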

[VIII]Control Flow

Description	Function
Repeat and Loop	for(var in seq) statement
Repeat and Loop	while (cond) statement
Conditional Execution	if (cond) statement
Conditional Execution	if (cond) statement1 else statement2
Conditional Execution	ifelse(cond,statement1,statement2)
Conditional Execution	switch(expr,...)
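Each construct in the table, shown on made-up values. Note that ifelse() is vectorized, while if/else tests a single condition:

```r
# for: sum the integers 1 to 5
total <- 0
for (i in 1:5) total <- total + i    # 15

# while: count up until the condition fails
n <- 0
while (n < 3) n <- n + 1             # 3

# if / else: one condition, one branch
if (total > 10) msg <- "large" else msg <- "small"

# ifelse: element-wise over a vector
score <- c(55, 80)
grade <- ifelse(score >= 60, "pass", "fail")   # "fail" "pass"

# switch: pick a branch by name; the unnamed value is the fallback
kind  <- "b"
label <- switch(kind, a = "first", b = "second", "other")
```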

[IX]Aggregate and Reshape

Function	Description
t(x)	Transpose
aggregate(x,by,FUN)	x:data;by:a list of grouping variables;FUN:function applied to each group
melt(x,id.vars)	reshape2 package;melt data into long format;id.vars:identifier variables
dcast(md,formula,fun.aggregate)	reshape2 package;md:melted data;formula:row variables~column variables;fun.aggregate:aggregate function
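A sketch of these functions using the built-in mtcars data. The melt/dcast part runs only if reshape2 is installed, and the small data frame `df` is made up for illustration:

```r
# Transpose a matrix: 2 x 3 becomes 3 x 2
m  <- matrix(1:6, nrow = 2)
tm <- t(m)

# Mean mpg and hp for each number of cylinders
agg <- aggregate(mtcars[c("mpg", "hp")],
                 by = list(cyl = mtcars$cyl), FUN = mean)

# Melt to long format and cast back (requires the reshape2 package)
if (requireNamespace("reshape2", quietly = TRUE)) {
  df <- data.frame(id = c(1, 1, 2, 2), x1 = c(5, 3, 6, 2), x2 = c(6, 5, 1, 4))
  md <- reshape2::melt(df, id.vars = "id")        # one row per id/variable/value
  wide <- reshape2::dcast(md, id ~ variable, fun.aggregate = mean)
}
```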

[X]component analysis

Diagram: analysis steps for principal component/exploratory factor analysis

principal component analysis

1.determine the number of principal components

> library(psych)
> fa.parallel(Harman23.cor$cov,n.obs=302,fa="pc",n.iter=100,
                  show.legend=FALSE,main="Scree plot with parallel analysis")
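The scree plot is built from the eigenvalues of the correlation matrix. As a rough base-R cross-check that needs no psych package, they can be inspected directly. The "eigenvalue greater than 1" rule used below is a common heuristic added here for illustration, not the parallel analysis itself:

```r
# Harman23.cor ships with base R; $cov holds the 8 x 8 correlation matrix
ev <- eigen(Harman23.cor$cov)$values

# For a correlation matrix the eigenvalues sum to the number of variables
total <- sum(ev)

# Heuristic: count components whose eigenvalue exceeds 1
n_kaiser <- sum(ev > 1)
```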

2.extract the principal components

> pc<-principal(r=USJudgeRatings[,-1],nfactors=1,rotate=,scores=)        #r:correlation matrix or raw data;rotate:rotation method(default "varimax");scores:whether to compute component scores

3.principal component rotation

> rc<-principal(Harman23.cor$cov,nfactors=2,rotate="varimax")

4.get the principal component scoring coefficients

> round(unclass(rc$weights),2)

exploratory factor analysis

1.determine the number of common factors

> library(psych)
> covariances<-ability.cov$cov
> correlations<-cov2cor(covariances)        #convert the covariance matrix to a correlation matrix
> fa.parallel(correlations,n.obs=112,fa="both",n.iter=100,
                   main="Scree plots with parallel analysis")

2.extract common factors

> fa<-fa(correlations,nfactors=2,rotate="none",fm="pa")        #fm="pa":principal axis factoring

3.factor rotation

> fa.varimax<-fa(correlations,nfactors=2,rotate="varimax",fm="pa")        #orthogonal
> fa.promax<-fa(correlations,nfactors=2,rotate="promax",fm="pa")        #oblique
> factor.plot(fa.promax,labels=rownames(fa.promax$loadings))
> fa.diagram(fa.promax,simple=FALSE)

4.factor scores

> fa.promax$weights        #scoring coefficients(weights) used to compute factor scores
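For readers without the psych package, base R offers factanal() as a stand-in. This is a swapped-in technique: it fits the factors by maximum likelihood rather than the principal axis method (fm="pa") used above, so the loadings will differ somewhat. A minimal sketch on the same ability.cov data:

```r
# ability.cov ships with base R (covariance matrix, centers, n.obs = 112)
fit <- factanal(factors = 2, covmat = ability.cov, rotation = "promax")

# Loadings: one row per test, one column per factor
ld <- loadings(fit)
dim(ld)

# Uniqueness: share of each variable's variance not explained by the factors
u <- fit$uniquenesses
```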
