python - Slicing MultiIndex data with Pandas

July 15, 2012

I have imported a CSV file as a multi-indexed DataFrame. Here's a mockup of the data:

import pandas as pd

df = pd.read_csv("coursedata2.csv", index_col=[0,2])
print(df)

                                         course
id    course list
12345 interior environments           desn10000
      rendering & present skills      desn20065
      lighting                        desn20025
22345 drawing techniques              desn10016
      colour theory                   desn14049
      finishes & sustainable issues   desn12758
      lighting                        desn20025
32345 window treatments&soft furnish  desn27370
42345 introduction cadd               info16859
      principles of drafting          desn10065
      drawing techniques              desn10016
      fundamentals of design          desn15436
      colour theory                   desn14049
      interior environments           desn10000
      drafting                        desn10123
      textiles , applications         desn10199
      finishes & sustainable issues   desn12758

[17 rows x 1 columns]

I can slice by label using .xs, e.g.:

selected = df.xs(12345, level='id')
print(selected)

                              course
course list
interior environments      desn10000
rendering & present skills desn20065
lighting                   desn20025

[3 rows x 1 columns]
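For anyone without the CSV, the selection above can be reproduced with a small stand-in frame. This is only a sketch: the rows are copied from the mockup, and building the index with `set_index` instead of `read_csv` is an assumption made to keep the example self-contained.

```python
import pandas as pd

# Hypothetical stand-in for coursedata2.csv: a few rows from the mockup.
rows = [
    (12345, "interior environments", "desn10000"),
    (12345, "rendering & present skills", "desn20065"),
    (12345, "lighting", "desn20025"),
    (22345, "drawing techniques", "desn10016"),
    (32345, "window treatments&soft furnish", "desn27370"),
]
flat = pd.DataFrame(rows, columns=["id", "course list", "course"])

# Two-level index: 'id' (outer) and 'course list' (inner).
df = flat.set_index(["id", "course list"])

# Cross-section on the outer level. By default xs drops the 'id' level,
# so the result is indexed by 'course list' only.
selected = df.xs(12345, level="id")
print(selected)
```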

But I want to step through the DataFrame and perform an operation on each block of courses, by id. The id values in the real data are random integers, sorted in ascending order.

df.index shows:

MultiIndex(levels=[[12345, 22345, 32345, 42345], [u'colour theory', u'colour theory ', u'drafting', u'drawing techniques', u'finishes & sustainable issues', u'interior environments', u'introduction cadd', u'lighting', u'principles of drafting', u'rendering & present skills', u'textiles , applications', u'the fundamentals of design', u'window treatments&soft furnish']],
           labels=[[0, 0, 0, 1, 1, 1, 1, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3], [5, 9, 7, 3, 1, 4, 7, 12, 6, 8, 3, 11, 0, 5, 2, 10, 4]],
           names=[u'id', u'course list'])

It seems to me that I should be able to use the first index's labels to increment through the DataFrame, i.e. get the courses for label 0, then 1, 2, 3, and so on. But it looks like .xs does not slice by those integer labels.

Am I missing something?

There may be more efficient ways to do this, depending on what you're trying to do with the data. However, two approaches come to mind:

for id_label in df.index.levels[0]:
    some_func(df.xs(id_label, level='id'))

and

for id_label in df.index.levels[0]:
    df.xs(id_label, level='id').apply(some_func, axis=1)

depending on whether you want to operate on the group as a whole or on each row in it.
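Here is a self-contained sketch of both loops on a small stand-in frame. The data and the body of `some_func` (it just counts rows) are made up for illustration. The loop uses `get_level_values('id').unique()` rather than `levels[0]`, since `levels` can still list ids that no longer have rows after filtering. The last part shows `groupby(level='id')`, which is not one of the two approaches above but is a common alternative; note that each group keeps both index levels, whereas `.xs` drops `id`.

```python
import pandas as pd

# Hypothetical stand-in data with the same index layout as the question.
rows = [
    (12345, "interior environments", "desn10000"),
    (12345, "rendering & present skills", "desn20065"),
    (12345, "lighting", "desn20025"),
    (22345, "drawing techniques", "desn10016"),
    (32345, "window treatments&soft furnish", "desn27370"),
]
df = pd.DataFrame(rows, columns=["id", "course list", "course"])
df = df.set_index(["id", "course list"])


def some_func(block):
    # Placeholder per-group operation: count the courses in the block.
    return len(block)


# Operate on each group as a whole: one cross-section per id.
counts_xs = {}
for id_label in df.index.get_level_values("id").unique():
    counts_xs[int(id_label)] = some_func(df.xs(id_label, level="id"))

# Operate on each row of a group: apply with axis=1 passes one row at a time.
codes_upper = df.xs(12345, level="id").apply(
    lambda row: row["course"].upper(), axis=1
)

# Alternative (assumption, not from the answer above): let groupby do the looping.
counts_gb = {
    int(id_label): some_func(block)
    for id_label, block in df.groupby(level="id")
}

print(counts_xs)
print(codes_upper)
```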
