A Few R Tips

require(),library()

require() returns TRUE or FALSE. When the package cannot be loaded it only issues a warning, and the script keeps running:

require(abc)
print("Hello")
$ Rscript require_library.R
Loading required package: abc
Warning message:
In library(package, lib.loc = lib.loc, character.only = TRUE, logical.return = TRUE, :
there is no package called ‘abc’
[1] "Hello"

library(), by contrast, signals an error and stops the script immediately:

#require(abc)
library(abc)
print("Hello")
$ Rscript require_library.R
Error in library(abc) : there is no package called ‘abc’
Execution halted
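
Because require() reports failure through its return value, a script can branch on it, while a library() failure has to be caught as an error. The following is a minimal sketch of both patterns; the package name `notARealPackage123` is hypothetical and was chosen only because it should not be installed.

```r
# Hypothetical package name, assumed not to be installed anywhere.
pkg <- "notARealPackage123"

# require() returns FALSE (with a warning) instead of stopping the script.
loaded <- suppressWarnings(require(pkg, character.only = TRUE, quietly = TRUE))

# requireNamespace() performs a similar check without attaching the package.
available <- requireNamespace(pkg, quietly = TRUE)

if (!loaded) {
  message("Package '", pkg, "' is not available; skipping the optional step.")
}

# library() signals an error, which tryCatch() can intercept.
result <- tryCatch({
  library(pkg, character.only = TRUE)
  "loaded"
}, error = function(e) "failed")
```

Branching on the return value suits optional dependencies; for packages the script cannot run without, letting library() stop early is usually the clearer choice.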

sink(),unlink()

sink() diverts R output to a file:

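| Syntax | Description |
| --- | --- |
| sink.number() | Returns the number of output diversions currently active |
| sink.number(type = "message") | Returns the number of the connection currently receiving messages (2 when messages are not diverted) |
| unlink() | Deletes files or directories |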

Usage:

sink(file = NULL, append = FALSE, type = c("output", "message"),split = FALSE)
sink.number(type = c("output", "message"))
sink("/path/to/sample.log")
unlink(x, recursive = FALSE, force = FALSE)
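
The signatures above can be tied together in a short round trip. This is only a sketch: it writes to a temporary file created with tempfile() rather than a real log path.

```r
log_file <- tempfile(fileext = ".log")  # temporary path, used only for this sketch

sink(log_file)            # start diverting standard output to the file
print("Hello")            # goes to the file, not the console
depth <- sink.number()    # number of active output diversions
sink()                    # stop diverting; output returns to the console

captured <- readLines(log_file)  # the text that print() produced
unlink(log_file)                 # delete the file from disk
```

Every sink(file) call should be matched by a sink() with no arguments, otherwise later output keeps going to the file.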

Sys

> Sys.time()
[1] "2016-01-19 09:52:58 CST"
> Sys.Date()
[1] "2016-01-19"
> Sys.timezone()
[1] "Asia/Shanghai"
> Sys.getenv("HOME")
[1] "/Users/LeslieZhu"
> Sys.info()[['nodename']] # get the hostname
[1] "cl.local"
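
These functions are often combined when naming log files or reading configuration. A small sketch, in which the environment variable `MY_APP_MODE` is a hypothetical name used only for illustration:

```r
# Format the current time as a compact timestamp, e.g. for log file names.
stamp <- format(Sys.time(), "%Y%m%d-%H%M%S")

# Sys.getenv() returns "" for an unset variable unless `unset` is given.
# MY_APP_MODE is a hypothetical variable name.
mode <- Sys.getenv("MY_APP_MODE", unset = "development")

# Sys.setenv() sets a variable for the current R process and its children.
Sys.setenv(MY_APP_MODE = "production")
mode_after <- Sys.getenv("MY_APP_MODE")
```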

commandArgs

Usage:

commandArgs(trailingOnly = FALSE)

With the default trailingOnly = FALSE, every argument is returned, including those of the R executable itself:

$ Rscript args.R --help
[1] "/Library/Frameworks/R.framework/Resources/bin/exec/R"
[2] "--slave"
[3] "--no-restore"
[4] "--file=args.R"
[5] "--args"
[6] "--help"

To return only the arguments that follow --args, set trailingOnly = TRUE:

$ Rscript args.R --help
[1] "--help"

Note: R indexes from 1, not 0, so the first element of the returned vector is at index 1.
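
A minimal sketch of how a script might consume these arguments. The positional layout (a file name followed by a row count) and the fallback defaults are assumptions made for illustration.

```r
# Save as args.R (the file name used in the example above) and run:
#   Rscript args.R input.csv 10
args <- commandArgs(trailingOnly = TRUE)

if (length(args) < 2) {
  # Fall back to illustrative defaults when fewer than two arguments are given.
  args <- c("input.csv", "10")
}

input_file <- args[1]            # the first argument is at index 1
n_rows <- as.integer(args[2])    # arguments always arrive as character strings

cat("file:", input_file, "rows:", n_rows, "\n")
```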

Converting between numbers and strings

> as.character(201601)
[1] "201601"
> as.double("201601")
[1] 201601
> as.integer("201601.1")
[1] 201601
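
Two details of these conversions are easy to miss: as.integer() truncates rather than rounds, and a string that cannot be parsed becomes NA with a warning instead of raising an error. A short sketch:

```r
# as.integer() drops the fractional part; it does not round.
truncated <- as.integer("201601.9")

# Round explicitly when rounding is what you want.
rounded <- round(as.numeric("201601.9"))

# An unparseable string yields NA plus a coercion warning, not an error.
bad <- suppressWarnings(as.integer("abc"))
if (is.na(bad)) {
  message("conversion failed, got NA")
}
```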

cat(),paste(),print()

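| Function | Description |
| --- | --- |
| cat() | Concatenates its arguments and writes them out; escape sequences such as \n are rendered |
| print() | Prints a single object (further arguments are treated as print options, not values) |
| paste() | Concatenates its arguments and returns a string; when that string is printed, escape sequences are shown unrendered |
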
> cat(1,2,3,'abcd',5,6,sep='*')
1*2*3*abcd*5*6
> print(1,2,3,'abcd',5,6)
[1] 1
> print(paste(1,2,3,'abcd',5,6))
[1] "1 2 3 abcd 5 6"
> print(paste(1,2,3,'abcd',5,6,sep='*'))
[1] "1*2*3*abcd*5*6"
> cat(1,2,3,'\n',5,6,sep='*')
1*2*3*
*5*6
> paste(1,2,3,'\n',5,6,sep='*')
[1] "1*2*3*\n*5*6"
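
The difference is in how the result is displayed, not in what paste() stores: the returned string contains a real newline character, and cat() is what renders it. A short sketch, which also shows paste0() and the collapse argument:

```r
# paste() builds a string; cat() renders the escape sequence it contains.
s <- paste("line1", "line2", sep = "\n")
len <- nchar(s)   # "\n" is stored as a single newline character
cat(s, "\n")      # prints line1 and line2 on separate lines

# paste0() is paste() with sep = ""; collapse joins a vector into one string.
joined <- paste0("x", 1:3, collapse = ",")
```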

getwd(),setwd()

Get or set the working directory:

> getwd()
[1] "/UUU"
> setwd("/KKK")
> getwd()
[1] "/KKK"
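
Since setwd() changes state for the whole session, it can help to restore the previous directory afterwards. The sketch below relies on setwd() invisibly returning the old directory; `with_dir` is a hypothetical helper name, not a base R function.

```r
# Hypothetical helper: run `f` with `dir` as the working directory,
# then restore the previous directory even if `f` fails.
with_dir <- function(dir, f) {
  old <- setwd(dir)      # setwd() invisibly returns the previous directory
  on.exit(setwd(old))    # restore on exit, including on error
  f()
}

before <- getwd()
inside <- with_dir(tempdir(), getwd)
after <- getwd()
```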

data.table

# Define a table
> DT = data.table(x=rep(c("a","b","c"),each=3), y=c(1,3,6), v=1:9)
> DT
x y v
1: a 1 1
2: a 3 2
3: a 6 3
4: b 1 4
5: b 3 5
6: b 6 6
7: c 1 7
8: c 3 8
9: c 6 9
> tables()
NAME NROW NCOL MB COLS KEY
[1,] DT 9 3 1 x,y,v
[2,] DT1 5 2 1 x,a
[3,] DT2 3 2 1 x,mul
[4,] X 2 2 1 V1,foo
Total: 4MB
# Select the second row
> DT[2]
x y v
1: a 3 2
# Select column y (as a vector, then as a data.table)
> DT[,y]
[1] 1 3 6 1 3 6 1 3 6
> DT[,list(y)]
y
1: 1
2: 3
3: 6
4: 1
5: 3
6: 6
7: 1
8: 3
9: 6
# Sum column y over rows 2 to 5
> DT[2:5,sum(y)]
[1] 13
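
The examples above use the row (i) and column (j) positions of DT[i, j]. As a hedged sketch of how the same table could also be filtered, grouped with by, and modified in place, assuming the data.table package is installed (the column names `total` and `double_v` are illustrative):

```r
library(data.table)  # assumes the data.table package is installed

DT <- data.table(x = rep(c("a", "b", "c"), each = 3), y = c(1, 3, 6), v = 1:9)

# Filter rows in i: keep rows where y is greater than 1.
big_y <- DT[y > 1]

# Aggregate in j by group: sum of v for each value of x.
totals <- DT[, .(total = sum(v)), by = x]

# Add a column by reference with := (DT itself is modified, no copy is made).
DT[, double_v := v * 2]
```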

read.table

> read.table(header = TRUE, text = "
+ a b
+ 1 2
+ 3 4
+ ")
a b
1 1 2
2 3 4
> read.table("foo.csv", header = TRUE, sep = ",", row.names = 1)
> test1 <- c(1:5, "6,7", "8,9,10")
> tf <- tempfile()
> writeLines(test1, tf)
> read.csv(tf, fill = TRUE) # 1 column
X1
1 2
2 3
3 4
4 5
5 6
6 7
7 8
8 9
9 10
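
In the last example read.csv() settles on a single column because it infers the column count from the first few lines, so the later, longer rows wrap onto extra rows. One way to get all three columns, sketched here along the lines of the read.table help page, is to count the fields first and pass explicit column names:

```r
test1 <- c(1:5, "6,7", "8,9,10")
tf <- tempfile()
writeLines(test1, tf)

# count.fields() reports the number of fields on each line.
n_fields <- count.fields(tf, sep = ",")

# Explicit col.names force three columns; short rows are padded with NA.
wide <- read.csv(tf, header = FALSE, fill = TRUE,
                 col.names = paste0("V", seq_len(max(n_fields))))
unlink(tf)
```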

write.table

> write.csv(x, file = "foo.csv", row.names = FALSE)
> write.table(x, file = "foo.csv", sep = ",", col.names = NA,
qmethod = "double")
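
A quick way to check a write call is to read the file back. The sketch below uses an illustrative data frame and a temporary file instead of foo.csv:

```r
# Illustrative data; any data frame works the same way.
x <- data.frame(id = 1:3, name = c("a", "b", "c"), stringsAsFactors = FALSE)
csv_file <- tempfile(fileext = ".csv")

# row.names = FALSE avoids writing an extra unnamed index column.
write.csv(x, file = csv_file, row.names = FALSE)

# Read the file back to confirm the columns survive the round trip.
y <- read.csv(csv_file, stringsAsFactors = FALSE)
unlink(csv_file)
```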

rbindlist

> DT1 = data.table(A=1:3,B=letters[1:3])
> DT2 = data.table(A=4:5,B=letters[4:5])
> l = list(DT1,DT2)
> rbindlist(l)
A B
1: 1 a
2: 2 b
3: 3 c
4: 4 d
5: 5 e
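
When the tables do not share exactly the same columns, rbindlist() can match columns by name and pad the gaps. A sketch assuming the data.table package is installed; `DT3` and the list names `first` and `second` are hypothetical:

```r
library(data.table)  # assumes the data.table package is installed

DT1 <- data.table(A = 1:3, B = letters[1:3])
DT3 <- data.table(B = "z", C = TRUE)   # hypothetical table with different columns

# use.names matches columns by name; fill = TRUE pads missing columns with NA;
# idcol adds a column recording which list element each row came from.
combined <- rbindlist(list(first = DT1, second = DT3),
                      use.names = TRUE, fill = TRUE, idcol = "source")
```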

snow, snowfall, and parallel computing

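| Function | Description |
| --- | --- |
| sfInit | Initializes the parallel computing environment |
| sfStop | Stops the parallel computation |
| sfParallel, sfCpus | Report whether execution is parallel, and the number of compute nodes |
| sfLapply, sfSapply, sfApply | Apply an operation across the compute nodes |
| sfLibrary, sfSource | Load a package or source file on every compute node |
| sfExport, sfExportAll, sfRemoveAll | Export variables to, or remove variables from, the compute nodes |
| sfClusterCall, sfCluster | Execute on the compute nodes |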

Example 1:

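library(snowfall)

# Initialize
sfInit( parallel=TRUE, cpus=2 )
# Braces keep the if/else valid at the top level of a script
if( sfParallel() ) {
  cat( "Running in parallel mode on", sfCpus(), "nodes.\n" )
} else {
  cat( "Running in sequential mode.\n" )
}
# Define global objects
globalVar1 <- c( "a", "b", "c" )
globalVar2 <- c( "d", "e" )
globalVar3 <- c( 1:10 )
globalNoExport <- "dummy"
# Define a function to run on the compute nodes
calculate <- function( x ) {
  cat( x )
  return( 2 ^ x )
}
# Export all global variables to the compute nodes, except globalNoExport
sfExportAll( except=c( "globalNoExport" ) )
# Evaluate a command on every compute node
sfClusterEvalQ( ls() )
# Run the function on the compute nodes, passing each element as its argument
cat( unlist( sfLapply( globalVar3, calculate ) ) )
# Remove variables on the compute nodes
sfRemoveAll( except=c( "calculate" ) )
# Shut down the cluster
sfStop()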
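
For debugging, the same sf* calls can be run without starting a cluster. The sketch below assumes the snowfall package is installed and relies on its sequential mode, in which sfSapply() is expected to behave like a plain sapply():

```r
library(snowfall)  # assumes the snowfall package is installed

# Sequential mode: no worker processes are started.
sfInit(parallel = FALSE)

# Same call as in parallel mode, evaluated in the current R session.
squares <- sfSapply(1:5, function(x) x^2)

sfStop()
```

Switching the single parallel flag in sfInit() is then enough to move between local debugging and parallel runs.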