R - How to convert a portion of an XML file into a data frame? (properly)


I'm trying to extract information from an XML file from ClinicalTrials.gov. The file is organized in the following way:

```
<clinical_study>
  ...
  <brief_title>
  ...
  <location>
    <facility>
      <name>
      <address>
        <city>
        <state>
        <zip>
        <country>
    </facility>
    <status>
    <contact>
      <last_name>
      <phone>
      <email>
    </contact>
  </location>
  <location>
    ...
  </location>
  ...
</clinical_study>
```

I can use the R XML package from CRAN in the following code to extract the location nodes from the XML file:

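```r
library(XML)

clinicalTrialUrl <- "http://clinicaltrials.gov/ct2/show/nct01480479?resultsxml=true"
# Parse the study record into an internal document that XPath queries can run on
xmldoc <- xmlParse(clinicalTrialUrl, useInternalNodes = TRUE)
# One row per <location> node, one column per direct child of <location>
locations <- xmlToDataFrame(getNodeSet(xmldoc, "//location"))
```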
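To see what `xmlToDataFrame()` does with this layout without fetching the live record, here is a minimal sketch that parses a small inline document instead. The `sample_xml` string and all of its values are made up for illustration; only the element names follow the layout above.

```r
library(XML)

# Hypothetical, made-up record following the <location> layout above,
# parsed from a string so that no network access is needed.
sample_xml <- '<clinical_study>
  <location>
    <facility>
      <name>Hospital A</name>
      <address>
        <city>Birmingham</city>
        <state>Alabama</state>
        <zip>35294</zip>
        <country>United States</country>
      </address>
    </facility>
    <status>Recruiting</status>
    <contact>
      <last_name>Contact A</last_name>
      <phone>555-0100</phone>
      <email>a@example.org</email>
    </contact>
  </location>
</clinical_study>'

doc <- xmlParse(sample_xml, asText = TRUE)
locations <- xmlToDataFrame(getNodeSet(doc, "//location"))

# One column per direct child of <location> (facility, status, contact).
print(names(locations))
# The facility value is the text of everything below <facility> run together,
# so the name and the address parts end up in a single string.
print(locations$facility)
```

There is no separate `city` or `zip` column: each direct child of `<location>` is reduced to one string.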

This works, kind of. However, if you look at the data frame, you'll notice that the `xmlToDataFrame` function lumped everything under `<facility>` into a single concatenated string. One solution would be to write code that generates the data frame column by column, for example, generate
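The column-by-column idea can be sketched by giving each output column its own XPath, evaluated relative to each `<location>` node, and returning `NA` when the element is missing. The `fields` mapping and the `first_value()` helper are hypothetical names chosen for this sketch, and the inline `sample_xml` is made up; with the real record you would pass `xmldoc` from the question to `getNodeSet()` instead of `doc`.

```r
library(XML)

# Hypothetical sample: the second location has no <contact>, as for a
# withdrawn site, to show how missing fields become NA.
sample_xml <- '<clinical_study>
  <location>
    <facility><name>Hospital A</name>
      <address><city>Birmingham</city><state>Alabama</state>
        <zip>35294</zip><country>United States</country></address></facility>
    <status>Recruiting</status>
    <contact><last_name>Contact A</last_name><phone>555-0100</phone>
      <email>a@example.org</email></contact>
  </location>
  <location>
    <facility><name>Hospital B</name>
      <address><city>Berkeley</city><state>California</state>
        <zip>94704</zip><country>United States</country></address></facility>
    <status>Withdrawn</status>
  </location>
</clinical_study>'

doc <- xmlParse(sample_xml, asText = TRUE)
loc_nodes <- getNodeSet(doc, "//location")

# Hypothetical mapping: output column name -> XPath relative to one <location>.
fields <- c(
  name      = "./facility/name",
  city      = "./facility/address/city",
  state     = "./facility/address/state",
  zip       = "./facility/address/zip",
  country   = "./facility/address/country",
  status    = "./status",
  last_name = "./contact/last_name",
  phone     = "./contact/phone",
  email     = "./contact/email"
)

# Text of the first node matching `path` below `node`, or NA if there is none,
# so every location contributes exactly one value to every column.
first_value <- function(node, path) {
  hits <- getNodeSet(node, path)
  if (length(hits) == 0) NA_character_ else xmlValue(hits[[1]])
}

# Build the data frame one column at a time.
locations <- as.data.frame(
  lapply(fields, function(path) vapply(loc_nodes, first_value, character(1), path = path)),
  stringsAsFactors = FALSE
)
print(locations)
```

The trade-off is that every wanted field has to be listed by hand, and only the first match per location is kept if an element repeats.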

You can flatten the XML first.

```r
library(XML)

# Recursively flatten a node into a named list. A node with no children
# (normally a text node) is named after its enclosing element, e.g. "city";
# any other node is flattened child by child and the pieces appended together.
flatten_xml <- function(x) {
  if (length(xmlChildren(x)) == 0) structure(list(xmlValue(x)), .Names = xmlName(xmlParent(x)))
  else Reduce(append, lapply(xmlChildren(x), flatten_xml))
}

# One single-row data frame per <location> (xmldoc comes from the question)
dfs <- lapply(getNodeSet(xmldoc, "//location"), function(x) data.frame(flatten_xml(x)))
# Union of the column names of all locations
allnames <- unique(c(lapply(dfs, colnames), recursive = TRUE))
# Add the missing columns as NA so that rbind() can stack the rows
df <- do.call(rbind, lapply(dfs, function(df) { df[, setdiff(allnames, colnames(df))] <- NA; df }))
head(df)

#          city      state   zip       country     status          last_name        phone                    email               last_name.1
# 1  Birmingham    Alabama 35294 United States Recruiting Louis B Nabors, MD 205-934-1813          bnabors@uab.edu        Louis B Nabors, MD
# 2      Mobile    Alabama 36604 United States Recruiting Melanie Alford, RN 251-445-9649     malford@usouthal.edu    Pamela Francisco, CCRP
# 3     Phoenix    Arizona 85013 United States Recruiting     Lynn Ashby, MD 602-406-6262           lashby@chw.edu            Lynn Ashby, MD
# 4      Tucson    Arizona 85724 United States Recruiting         Jamie Holt 520-626-6800 jholt1@email.arizona.edu Baldassarre Stea, MD, PhD
# 5 Little Rock   Arkansas 72205 United States Recruiting   Wilma Brooks, RN 501-686-8530       aleubanks@uams.edu       Amanda Eubanks, APN
# 6    Berkeley California 94704 United States  Withdrawn               <NA>         <NA>                     <NA>                      <NA>
```
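`flatten_xml()` names each value after its immediate parent element, so when a location has two contact elements both produce a `last_name` entry, and `data.frame()` renames the second one to `last_name.1`. A variation is to name each value by its full element path below `<location>` (e.g. `facility.address.city`) and pad missing paths with `NA`. This is a sketch under stated assumptions: `flatten_paths()` is a hypothetical helper (not part of the XML package), the inline `sample_xml` is made up, and `<contact_backup>` is assumed as the element holding the second contact.

```r
library(XML)

# Hypothetical sample; <contact_backup> is an assumed element name for a
# second contact. The second location has no contacts at all.
sample_xml <- '<clinical_study>
  <location>
    <facility><name>Hospital A</name>
      <address><city>Birmingham</city><state>Alabama</state>
        <zip>35294</zip><country>United States</country></address></facility>
    <status>Recruiting</status>
    <contact><last_name>Contact A</last_name><phone>555-0100</phone></contact>
    <contact_backup><last_name>Contact B</last_name></contact_backup>
  </location>
  <location>
    <facility><name>Hospital B</name>
      <address><city>Berkeley</city><state>California</state>
        <zip>94704</zip><country>United States</country></address></facility>
    <status>Withdrawn</status>
  </location>
</clinical_study>'

# Flatten one element into a named list: elements without child elements keep
# their text, keyed by the dot-joined element path below the starting node.
flatten_paths <- function(node, path = character(0)) {
  kids <- Filter(function(k) inherits(k, "XMLInternalElementNode"), xmlChildren(node))
  if (length(kids) == 0) {
    return(setNames(list(xmlValue(node)), paste(path, collapse = ".")))
  }
  # unname() stops c() from prefixing the child names a second time.
  do.call(c, unname(lapply(kids, function(k) flatten_paths(k, c(path, xmlName(k))))))
}

doc <- xmlParse(sample_xml, asText = TRUE)
rows <- lapply(getNodeSet(doc, "//location"), function(loc) {
  r <- flatten_paths(loc)
  names(r) <- make.unique(names(r))  # repeated paths become "x", "x.1", ...
  r
})

# Union of all paths; each location gets NA for the paths it lacks.
all_paths <- unique(unlist(lapply(rows, names)))
df <- as.data.frame(
  do.call(rbind, lapply(rows, function(r) {
    vapply(all_paths, function(p) if (is.null(r[[p]])) NA_character_ else r[[p]], character(1))
  })),
  stringsAsFactors = FALSE
)
print(df)
```

The path-based names make it explicit which element each column came from (`contact.last_name` versus `contact_backup.last_name`), at the cost of longer column names.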
