R的几个小知识

require(),library()

require()返回TRUE/FALSE,报错后程序会继续执行:

1
2
require(abc)
print("Hello")
1
2
3
4
5
6
$ Rscript require_library.R
载入需要的程辑包:abc
Warning message:
In library(package, lib.loc = lib.loc, character.only = TRUE, logical.return = TRUE, :
不存在叫‘abc’这个名字的程辑包
[1] "Hello"

而library()在遇到报错立即停止程序执行:

1
2
3
#require(abc)
library(abc)
print("Hello")
1
2
3
$ Rscript require_library.R
Error in library(abc) : 不存在叫‘abc’这个名字的程辑包
停止执行

sink(),unlink()

将R的输出内容保存到文件中:

语法 说明
sink.number() 获取正在保存输出的文件个数
sink.number(type = “message”) 获取正在保存错误输出的文件个数
unlink() 删除文件句柄

用法:

1
2
3
4
sink(file = NULL, append = FALSE, type = c("output", "message"),split = FALSE)
sink.number(type = c("output", "message"))
sink("/path/to/sample.log")
unlink(x, recursive = FALSE, force = FALSE)

Sys

1
2
3
4
5
6
7
8
9
10
11
12
13
14
> Sys.time()
[1] "2016-01-19 09:52:58 CST"

> Sys.Date()
[1] "2016-01-19"

> Sys.timezone()
[1] "Asia/Shanghai"

> Sys.getenv("HOME")
[1] "/Users/LeslieZhu"

> Sys.info()[['nodename']] # 获取主机名
[1] "cl.local"

commandArgs

用法:

1
commandArgs(trailingOnly = FALSE)

如果是返回所有参数,则:

1
2
3
4
5
6
7
$ Rscript args.R --help
[1] "/Library/Frameworks/R.framework/Resources/bin/exec/R"
[2] "--slave"
[3] "--no-restore"
[4] "--file=args.R"
[5] "--args"
[6] "--help"

如果只返回 =–args= 后面的参数,则设置 =trailingOnly = TRUE=:

1
2
$ Rscript args.R --help
[1] "--help"

=注意=: 索引从1开始,而不是从0开始

数值、字符串转换

1
2
3
4
5
6
> as.character(201601)
[1] "201601"
> as.double("201601")
[1] 201601
> as.integer("201601.1")
[1] 201601

cat(),paste(),print()

函数 说明
cat() 将所有参数连接为一个字符串输出,转义字符生效
print() 打印内容
paste() 连接参数返回字符串,转义字符不生效
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
> cat(1,2,3,'abcd',5,6,sep='*')
1*2*3*abcd*5*6

> print(1,2,3,'abcd',5,6)
[1] 1

> print(paste(1,2,3,'abcd',5,6))
[1] "1 2 3 abcd 5 6"

> print(paste(1,2,3,'abcd',5,6,sep='*'))
[1] "1*2*3*abcd*5*6"

> cat(1,2,3,'\n',5,6,sep='*')
1*2*3*
*5*6

> paste(1,2,3,'\n',5,6,sep='*')
[1] "1*2*3*\n*5*6"
```r

# getwd(),setwd()

获取/设置工作路径:

```r
> getwd()
[1] "/UUU"
> setwd("/KKK")
> getwd()
[1] "/KKK"

data.table

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
# 定义一个表
> DT = data.table(x=rep(c("a","b","c"),each=3), y=c(1,3,6), v=1:9)

> DT
x y v
1: a 1 1
2: a 3 2
3: a 6 3
4: b 1 4
5: b 3 5
6: b 6 6
7: c 1 7
8: c 3 8
9: c 6 9

> tables()
NAME NROW NCOL MB COLS KEY
[1,] DT 9 3 1 x,y,v
[2,] DT1 5 2 1 x,a
[3,] DT2 3 2 1 x,mul
[4,] X 2 2 1 V1,foo
Total: 4MB

# 查看第二行
> DT[2]
x y v
1: a 3 2

# 查看第y列
> DT[,y]
[1] 1 3 6 1 3 6 1 3 6

> DT[,list(y)]
y
1: 1
2: 3
3: 6
4: 1
5: 3
6: 6
7: 1
8: 3
9: 6

# 在2到5行之间累加y列的值
> DT[2:5,sum(y)]
[1] 13

read.table

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
> read.table(header = TRUE, text = "
+ a b
+ 1 2
+ 3 4
+ ")
a b
1 1 2
2 3 4

> read.table("foo.csv", header = TRUE, sep = ",", row.names = 1)

> test1 <- c(1:5, "6,7", "8,9,10")

> tf <- tempfile()

> writeLines(test1, tf)

> read.csv(tf, fill = TRUE) # 1 column
X1
1 2
2 3
3 4
4 5
5 6
6 7
7 8
8 9
9 10

write.table

1
2
3
> write.csv(x, file = "foo.csv", row.names = FALSE)
> write.table(x, file = "foo.csv", sep = ",", col.names = NA,
qmethod = "double")

rbindlist

1
2
3
4
5
6
7
8
9
10
11
12
13
> DT1 = data.table(A=1:3,B=letters[1:3])

> DT2 = data.table(A=4:5,B=letters[4:5])

> l = list(DT1,DT2)

> rbindlist(l)
A B
1: 1 a
2: 2 b
3: 3 c
4: 4 d
5: 5 e

snow, snowfall与并行计算

函数 说明
sfInit 初始化并行计算环境
sfStop 停止并行计算
sfParallel,sfCpus 查看是否并行执行,计算单元
sfLapply, sfSapply, sfApply 应用操作到各个计算单元
sfLibrary,sfSource 各个计算单元加载软件包、源码文件
sfExport, sfExportAll, sfRemoveAll 同步/删除变量
sfClusterCall,sfCluster 在计算单元执行

例1:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
# 初始化
sfInit( parallel=TRUE, cpus=2 )

if( sfParallel() )
cat( "Running in parallel mode on", sfCpus(), "nodes.\n" )
else
cat( "Running in sequential mode.\n" )

# 设置全局对象
globalVar1 <- c( "a", "b", "c" )
globalVar2 <- c( "d", "e" )
globalVar3 <- c( 1:10 )
globalNoExport <- "dummy"

# 设置函数用于在计算单元执行
calculate <- function( x ) {
cat( x )
return( 2 ^ x )
}

# 将变量传送到各个计算单元环境
sfExportAll( except=c( "globalNoExport" ) )

# 在计算单元上执行命令
sfClusterEvalQ( ls() )

# 在计算单元执行函数,并传递参数
cat( unlist( sfLapply( globalVar3, calculate ) ) )

# 在计算单元清除变量
sfRemoveAll( except=c( "calculate" ) )