python - Slicing MultiIndex data with Pandas
I have imported a CSV file as a multi-indexed DataFrame. Here's a mockup of the data:
import pandas as pd
df = pd.read_csv("coursedata2.csv", index_col=[0, 2])
print(df)
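Because coursedata2.csv itself is not shown, here is a minimal sketch that rebuilds a few of its rows from an inline string. It assumes the file has a header row with the columns id, course and course list in that order, which is consistent with index_col=[0, 2] and the printed output; CSV_TEXT is a hypothetical stand-in for the real file.

```python
import io

import pandas as pd

# Hypothetical stand-in for coursedata2.csv. The real file is not shown,
# so this column layout (id, course, course list) is an assumption that
# matches the printed output in the question.
CSV_TEXT = """id,course,course list
12345,desn10000,interior environments
12345,desn20065,rendering & present skills
12345,desn20025,lighting
22345,desn10016,drawing techniques
22345,desn14049,colour theory
32345,desn27370,window treatments&soft furnish
42345,info16859,introduction cadd
42345,desn10065,principles of drafting
"""

# index_col=[0, 2] turns the 1st and 3rd columns into a two-level
# MultiIndex, leaving "course" as the only data column.
df = pd.read_csv(io.StringIO(CSV_TEXT), index_col=[0, 2])

print(list(df.index.names))  # ['id', 'course list']
print(df.columns.tolist())   # ['course']
print(df.shape)              # (8, 1)
```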
                                     course
id    course list
12345 interior environments          desn10000
      rendering & present skills     desn20065
      lighting                       desn20025
22345 drawing techniques             desn10016
      colour theory                  desn14049
      finishes & sustainable issues  desn12758
      lighting                       desn20025
32345 window treatments&soft furnish desn27370
42345 introduction cadd              info16859
      principles of drafting         desn10065
      drawing techniques             desn10016
      the fundamentals of design     desn15436
      colour theory                  desn14049
      interior environments          desn10000
      drafting                       desn10123
      textiles , applications        desn10199
      finishes & sustainable issues  desn12758

[17 rows x 1 columns]
I can slice by label using .xs, e.g.:
selected = df.xs(12345, level='id')
print(selected)
                               course
course list
interior environments       desn10000
rendering & present skills  desn20065
lighting                    desn20025

[3 rows x 1 columns]
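A small self-contained sketch of the same selection, built from a few of the rows shown above rather than from the CSV. It also shows that, for the outermost level, df.loc with the id value returns the same sub-frame as .xs; both take an index value such as 12345, not a position.

```python
import pandas as pd

# Toy frame with the question's layout (a few rows copied from the output).
index = pd.MultiIndex.from_tuples(
    [
        (12345, "interior environments"),
        (12345, "rendering & present skills"),
        (12345, "lighting"),
        (22345, "drawing techniques"),
        (22345, "colour theory"),
    ],
    names=["id", "course list"],
)
df = pd.DataFrame(
    {"course": ["desn10000", "desn20065", "desn20025", "desn10016", "desn14049"]},
    index=index,
)

# Cross-section by *value* on the "id" level; the id level is dropped.
selected = df.xs(12345, level="id")

# For the outermost level, .loc with the same value gives the same result.
same = df.loc[12345]

print(selected)
print(selected.equals(same))  # True
```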
But I want to step through the DataFrame and perform an operation on each block of courses, by id. The id values in the real data are random integers, sorted in ascending order.
df.index shows:
df.index
MultiIndex(levels=[[12345, 22345, 32345, 42345],
                   [u'colour theory', u'colour theory ', u'drafting', u'drawing techniques',
                    u'finishes & sustainable issues', u'interior environments', u'introduction cadd',
                    u'lighting', u'principles of drafting', u'rendering & present skills',
                    u'textiles , applications', u'the fundamentals of design',
                    u'window treatments&soft furnish']],
           labels=[[0, 0, 0, 1, 1, 1, 1, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3],
                   [5, 9, 7, 3, 1, 4, 7, 12, 6, 8, 3, 11, 0, 5, 2, 10, 4]],
           names=[u'id', u'course list'])
It seems to me that I should be able to use the first index level's labels to increment through the DataFrame, i.e. take the courses with label 0, then 1, 2, 3, and so on. But it looks like .xs will not slice by these labels.
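The integers under labels are positions into levels, not index values, which is why passing 0, 1, 2, 3 to .xs does not work: .xs expects a value such as 12345. A minimal sketch on a few toy rows, assuming pandas 0.24 or later, where the labels attribute was renamed codes. Note that levels can also keep values that no remaining row uses (for example after filtering), so get_level_values(...).unique() is the safer list of ids to step through.

```python
import pandas as pd

# Toy rows in the question's layout (values chosen for illustration).
index = pd.MultiIndex.from_tuples(
    [(12345, "lighting"), (12345, "drafting"),
     (22345, "colour theory"), (42345, "drafting")],
    names=["id", "course list"],
)
df = pd.DataFrame(
    {"course": ["desn20025", "desn10123", "desn14049", "desn10123"]},
    index=index,
)

# In pandas >= 0.24 the old `labels` attribute is called `codes`.
id_codes = df.index.codes[0]     # per-row positions into levels[0]
id_levels = df.index.levels[0]   # the distinct id values themselves

# Mapping each code through levels recovers the id value of every row.
per_row_ids = id_levels[id_codes]

# A code is only a position; translate it to a value before calling .xs.
first_block = df.xs(id_levels[0], level="id")

# Ids that actually occur in the rows, in order of appearance.
ids_in_use = df.index.get_level_values("id").unique()
print(ids_in_use.tolist())  # [12345, 22345, 42345]
```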
Am I missing something?
There may be more efficient ways to do this, depending on what you're trying to do with the data. However, there are two approaches that come to mind:
for id_label in df.index.levels[0]:
    some_func(df.xs(id_label, level='id'))
and
for id_label in df.index.levels[0]:
    df.xs(id_label, level='id').apply(some_func, axis=1)
Which one to use depends on whether you want to operate on the group as a whole or on each row within it.
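A runnable sketch of both approaches on a few toy rows. count_courses and course_prefix are hypothetical stand-ins for some_func, and the results are collected into dicts so they can be inspected. The last part swaps in df.groupby(level="id"), a common alternative not mentioned in the answer, which visits each id block once instead of doing a separate .xs lookup per id.

```python
import pandas as pd

# Toy rows in the question's layout (values copied from the output above).
index = pd.MultiIndex.from_tuples(
    [(12345, "interior environments"), (12345, "lighting"),
     (22345, "colour theory"), (32345, "window treatments&soft furnish")],
    names=["id", "course list"],
)
df = pd.DataFrame(
    {"course": ["desn10000", "desn20025", "desn14049", "desn27370"]},
    index=index,
)


def count_courses(block):
    """Hypothetical per-group operation: number of courses for one id."""
    return len(block)


def course_prefix(row):
    """Hypothetical per-row operation: letters before the course digits."""
    return row["course"].rstrip("0123456789")


# Approach 1: call the function once per id block.
per_id = {id_label: count_courses(df.xs(id_label, level="id"))
          for id_label in df.index.levels[0]}

# Approach 2: apply the function to every row inside each id block.
per_row = {id_label: df.xs(id_label, level="id").apply(course_prefix, axis=1)
           for id_label in df.index.levels[0]}

# Alternative: iterate over groupby on the "id" level. Unlike .xs, each
# block keeps both index levels (id and course list).
grouped = {id_label: count_courses(block)
           for id_label, block in df.groupby(level="id")}
```

The groupby form also avoids relying on levels[0], which can list ids that no longer appear in the rows after filtering.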