Merge many tsv files by common key using Python pandas -

January 15, 2013

i have several hundred tsv files 2 fields, common key , unique sample id:

==> test1.vmat <== chrom:pos:ref:alt  144-93-02 1:14653:c:t  1 1:14677:g:a  1 1:14907:a:g  1  ==> test2.vmat <== chrom:pos:ref:alt  144-93-01 1:14653:c:t  1 1:14522:g:a  1 1:14907:a:g  1

i perform outer join on of files using field "chr:pos:ref:alt" form 1 giant matrix. example 2 files:

chrom:pos:ref:alt  144-93-02    144-93-01 1:14653:c:t  1.0 1.0 1:14522:g:a  na 1.0 1:14677:g:a  1.0 na 1:14907:a:g  1.0 1.0

i got output above using following code, having trouble looping on hundreds of *tsv files in directory (path/to/testn.vmat). how can modify merge individual *tsv files directory single tsv file?

variant_field = "chrom:pos:ref:alt" outfile = "everyone.vmat"  df1 = pandas.read_csv("path/to/test1.vmat", sep='\t', parse_dates=false) df2 = pandas.read_csv("path/to/test2.vmat", sep='\t', parse_dates=false)  df3 = pandas.merge(df1,df2,on=variant_field, how='outer') df3.to_csv(str(outfile), sep="\t", header=true, index=false, na_rep="na", engine='python')

if make 'chrom:pos:ref:alt' index join multiple frames:

df1 = pandas.read_csv("path/to/test1.vmat", sep='\t', parse_dates=false,                                             index_col='chrom:pos:ref:alt')   in [11]: df1.join([df2], how='outer') out[11]:               144-93-02  144-93-01 1:14522:g:a        nan          1 1:14653:c:t          1          1 1:14677:g:a          1        nan 1:14907:a:g          1          1

in someways it's more honest think of concat rather join:

in [12]: pd.concat([df1, df2], axis=1) out[12]:               144-93-02  144-93-01 1:14522:g:a        nan          1 1:14653:c:t          1          1 1:14677:g:a          1        nan 1:14907:a:g          1          1

you can iterate through files using glob:

from glob import iglob pd.concat((pd.read_csv(f, ...) f in glob.iglob(*.vmat)), axis=1)

Search This Blog

Silver

Merge many tsv files by common key using Python pandas -

Comments

Post a Comment

Popular posts from this blog

user interface - How to replace the Python logo in a Tkinter-based Python GUI app? -

objective c - Greedy NSProgressIndicator Allocation -

how to set an OCR language in Google Drive -