Removing duplicates on subset of columns in R
I have a table:
      [,1] [,2] [,3]       [,4]       [,5]
 [1,]    1    5   10 0.00040803 0.00255277
 [2,]    1   11    3 0.01765470 0.01584580
 [3,]    1    6    2 0.15514850 0.15509000
 [4,]    1    8   14 0.02100531 0.02572320
 [5,]    1    9    4 0.04748648 0.00843252
 [6,]    2    5   10 0.00040760 0.06782680
 [7,]    2   11    3 0.01765480 0.01584580
 [8,]    2    6    2 0.15514810 0.15509000
 [9,]    2    8   14 0.02100491 0.02572320
[10,]    2    9    4 0.04748608 0.00843252
[11,]    3    5   10 0.00040760 0.06782680
[12,]    3   11    3 0.01765480 0.01584580
[13,]    3    8   14 0.02100391 0.02572320
[14,]    3    9    4 0.04748508 0.00843252
[15,]    4    5   10 0.00040760 0.06782680
[16,]    4   11    3 0.01765480 0.01584580
[17,]    4    8   14 0.02100391 0.02572320
[18,]    4    9    4 0.04748508 0.00843252
[19,]    5    8   14 0.02100391 0.02572320
[20,]    5    9    4 0.04748508 0.00843252
I want to remove duplicates from this table. However, only columns 2, 3 and 4 matter. For example, rows 1, 6, 11 and 15 are identical if only columns 2, 3 and 4 are considered. A note on column 4: is it possible to treat two values as the same as long as they are within 10e-5 of each other? That is, can rows 1 and 6 be considered identical although their values in column 4 differ (within the tolerance mentioned)?
It would then be great to get output like this:
column 2 value | column 3 value | column 1 value at which the pair was first observed, with tolerance (1 in the example) | column 1 value at which the pair was last observed, with tolerance (4 in the example) | value of column 4 at first appearance (0.00040803 in the example)
This is one way of thinking about it. I'm not sure it's exactly what you're looking for, but the logic should be enough to get you started.
# dat is your data set, as a data frame with columns v1 to v5
dat
  v1 v2 v3         v4         v5
1  1  5 10 0.00040803 0.00255277
2  1 11  3 0.01765470 0.01584580
3  1  6  2 0.15514850 0.15509000
4  1  8 14 0.02100531 0.02572320
5  1  9  4 0.04748648 0.00843252
# truncated

# keep only the columns that matter, round v4 to 5 decimals, then drop duplicates
dat <- dat[, c(2, 3, 4)]
dat$v4 <- round(dat$v4, 5)
unique(dat)
  v2 v3      v4
1  5 10 0.00041
2 11  3 0.01765
3  6  2 0.15515
4  8 14 0.02101
5  9  4 0.04749
9  8 14 0.02100
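Note that rounding can split values that sit on either side of a rounding boundary: in the output above, rows 4 and 9 stay separate (0.02101 vs 0.02100) even though the original values differ by far less than the tolerance. One possible alternative, sketched below, compares column 4 against the first value seen for each (column 2, column 3) pair and also builds the summary asked for in the question. The column names `v1` to `v4`, the helper `summarise_pair`, the output column names, and reading "10e-5" as `1e-4` are all assumptions made for illustration, and the sketch relies on the rows already being ordered by column 1.

```r
# Example data from the question; column names v1..v4 are assumed, v5 is omitted.
dat <- data.frame(
  v1 = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5),
  v2 = c(5, 11, 6, 8, 9, 5, 11, 6, 8, 9, 5, 11, 8, 9, 5, 11, 8, 9, 8, 9),
  v3 = c(10, 3, 2, 14, 4, 10, 3, 2, 14, 4, 10, 3, 14, 4, 10, 3, 14, 4, 14, 4),
  v4 = c(0.00040803, 0.01765470, 0.15514850, 0.02100531, 0.04748648,
         0.00040760, 0.01765480, 0.15514810, 0.02100491, 0.04748608,
         0.00040760, 0.01765480, 0.02100391, 0.04748508,
         0.00040760, 0.01765480, 0.02100391, 0.04748508,
         0.02100391, 0.04748508)
)

tol <- 1e-4  # assumed reading of "10e-5" in the question

# Hypothetical helper: g holds all rows sharing one (v2, v3) pair, in original order.
# Rows whose v4 lies within tol of the first remaining row are collapsed into one
# summary row; any rows outside the tolerance start a new summary row.
summarise_pair <- function(g) {
  out <- list()
  while (nrow(g) > 0) {
    same <- abs(g$v4 - g$v4[1]) <= tol
    m <- g[same, ]
    out[[length(out) + 1]] <- data.frame(
      v2 = m$v2[1],
      v3 = m$v3[1],
      first_v1 = m$v1[1],        # column 1 value at first appearance
      last_v1  = m$v1[nrow(m)],  # column 1 value at last appearance
      v4_first = m$v4[1]         # column 4 value at first appearance
    )
    g <- g[!same, ]
  }
  do.call(rbind, out)
}

# Split by the (v2, v3) pair, summarise each group, and stack the results.
groups <- split(dat, list(dat$v2, dat$v3), drop = TRUE)
res <- do.call(rbind, lapply(groups, summarise_pair))
rownames(res) <- NULL
res
```

With the example data, the pair (5, 10) should come out with `first_v1` = 1, `last_v1` = 4 and `v4_first` = 0.00040803, matching the example in the question. Because each value is compared with the first one seen rather than with its neighbours, a slow drift in column 4 would eventually start a new group. Whether that is the desired behaviour depends on the data.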