Removing duplicates on subset of columns in R

August 15, 2014

I have a table:

      [,1] [,2] [,3]       [,4]       [,5]
 [1,]    1    5   10 0.00040803 0.00255277
 [2,]    1   11    3 0.01765470 0.01584580
 [3,]    1    6    2 0.15514850 0.15509000
 [4,]    1    8   14 0.02100531 0.02572320
 [5,]    1    9    4 0.04748648 0.00843252
 [6,]    2    5   10 0.00040760 0.06782680
 [7,]    2   11    3 0.01765480 0.01584580
 [8,]    2    6    2 0.15514810 0.15509000
 [9,]    2    8   14 0.02100491 0.02572320
[10,]    2    9    4 0.04748608 0.00843252
[11,]    3    5   10 0.00040760 0.06782680
[12,]    3   11    3 0.01765480 0.01584580
[13,]    3    8   14 0.02100391 0.02572320
[14,]    3    9    4 0.04748508 0.00843252
[15,]    4    5   10 0.00040760 0.06782680
[16,]    4   11    3 0.01765480 0.01584580
[17,]    4    8   14 0.02100391 0.02572320
[18,]    4    9    4 0.04748508 0.00843252
[19,]    5    8   14 0.02100391 0.02572320
[20,]    5    9    4 0.04748508 0.00843252

I want to remove duplicates from this table. However, only columns 2, 3, and 4 matter. For example, rows 1, 6, 11, and 15 are identical if only columns 2, 3, and 4 are considered. A note on column 4: is it possible to treat values as the same as long as they are within 10e-5 of each other? Rows 1 and 6 would then be considered identical although the value in column 4 differs (within the tolerance mentioned).

It would then be great to have output like:

column 2 value | column 3 value | column 1 value at which the pair was first observed (with tolerance; 1 in the example) | column 1 value at which the pair was last observed (with tolerance; 4 in the example) | value of column 4 at first appearance (0.00040803 in the example)

This is one way of thinking about it; I'm not sure it's what you're looking for. The logic should be enough to get you started, though.

# dat holds the data set from the question, as a data frame
dat
   v1 v2 v3         v4         v5
1   1  5 10 0.00040803 0.00255277
2   1 11  3 0.01765470 0.01584580
3   1  6  2 0.15514850 0.15509000
4   1  8 14 0.02100531 0.02572320
5   1  9  4 0.04748648 0.00843252
# truncated

# keep only the columns that matter, and round column 4 to 5 decimals
dat <- dat[, c(2, 3, 4)]
dat$v4 <- round(dat$v4, 5)

unique(dat)
  v2 v3      v4
1  5 10 0.00041
2 11  3 0.01765
3  6  2 0.15515
4  8 14 0.02101
5  9  4 0.04749
9  8 14 0.02100
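Note that in the rounded output above, row 9 survives as distinct from row 4, even though their column 4 values (0.02100491 and 0.02100531) differ by only 4e-7: rounding can split two close values that fall on either side of a rounding boundary. As a possible alternative, here is a minimal sketch that compares against an explicit tolerance and builds the summary asked for in the question. The function name `summarise_pairs`, the output column names, and the choice to compare every row against the first appearance of its pair are my assumptions, not part of the original answer; the column names `v1` to `v4` follow the answer above.

```r
# Data from the question (column 5 is not needed here, so it is left out).
dat <- data.frame(
  v1 = rep(1:5, times = c(5, 5, 4, 4, 2)),
  v2 = c(5, 11, 6, 8, 9,  5, 11, 6, 8, 9,  5, 11, 8, 9,  5, 11, 8, 9,  8, 9),
  v3 = c(10, 3, 2, 14, 4, 10, 3, 2, 14, 4, 10, 3, 14, 4, 10, 3, 14, 4, 14, 4),
  v4 = c(0.00040803, 0.01765470, 0.15514850, 0.02100531, 0.04748648,
         0.00040760, 0.01765480, 0.15514810, 0.02100491, 0.04748608,
         0.00040760, 0.01765480, 0.02100391, 0.04748508,
         0.00040760, 0.01765480, 0.02100391, 0.04748508,
         0.02100391, 0.04748508)
)

# Hypothetical helper: collapse rows that share v2 and v3 and whose v4 lies
# within `tol` of the v4 seen at the first appearance of that pair.
summarise_pairs <- function(d, tol = 10e-5) {
  out <- list()
  remaining <- d
  while (nrow(remaining) > 0) {
    ref <- remaining[1, ]                       # first appearance of a pair
    same <- remaining$v2 == ref$v2 &
            remaining$v3 == ref$v3 &
            abs(remaining$v4 - ref$v4) <= tol   # tolerance on column 4
    grp <- remaining[same, ]
    out[[length(out) + 1]] <- data.frame(
      v2 = ref$v2,
      v3 = ref$v3,
      first_v1 = grp$v1[1],                     # where the pair first appears
      last_v1  = grp$v1[nrow(grp)],             # where it appears last
      v4_first = ref$v4                         # v4 at first appearance
    )
    remaining <- remaining[!same, ]             # drop the rows just handled
  }
  do.call(rbind, out)
}

res <- summarise_pairs(dat)
print(res)
```

On the question's data this collapses the 20 rows to five pairs. "First" and "last" here mean row order, so the sketch assumes the table is already sorted by column 1. Because every row is compared with the first appearance of its pair, a slow drift in column 4 could eventually start a new group; whether that is desirable depends on what the tolerance is meant to express.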
