r - How to convert a portion of an XML into a data frame? (properly)
I am trying to extract information from an XML file from ClinicalTrials.gov. The file is organized in the following way:

    <clinical_study>
      ...
      <brief_title>
      ...
      <location>
        <facility>
          <name>
          <address>
            <city>
            <state>
            <zip>
            <country>
        </facility>
        <status>
        <contact>
          <last_name>
          <phone>
          <email>
        </contact>
      </location>
      <location>
        ...
      </location>
      ...
    </clinical_study>

I can use the R XML package from CRAN in the following code to extract the location nodes from the XML file:

    library(XML)
    clinicaltrialurl <- "http://clinicaltrials.gov/ct2/show/nct01480479?resultsxml=true"
    xmldoc <- xmlParse(clinicaltrialurl, useInternalNode = TRUE)
    locations <- xmlToDataFrame(getNodeSet(xmldoc, "//location"))

This works kind of OK. However, if you look at the data frame, you will notice that the xmlToDataFrame function lumped everything under <facility> into a single concatenated string. One solution is to write code that generates the data frame column by column, for example, to generate ...
You could flatten the XML first.

    # Recursively collect leaf values into one named list. A node with no
    # children is a text node, so its value is named after its parent element.
    flatten_xml <- function(x) {
      if (length(xmlChildren(x)) == 0)
        structure(list(xmlValue(x)), .Names = xmlName(xmlParent(x)))
      else
        Reduce(append, lapply(xmlChildren(x), flatten_xml))
    }

    # One single-row data frame per <location> node
    dfs <- lapply(getNodeSet(xmldoc, "//location"), function(x) data.frame(flatten_xml(x)))

    # Union of all column names, then pad each data frame with NA for the columns it lacks
    allnames <- unique(c(lapply(dfs, colnames), recursive = TRUE))
    df <- do.call(rbind, lapply(dfs, function(df) {
      df[, setdiff(allnames, colnames(df))] <- NA
      df
    }))

    head(df)
    #   city        state      zip   country       status     last_name          phone        email                    last_name.1
    # 1 Birmingham  Alabama    35294 United States Recruiting Louis B Nabors, MD 205-934-1813 bnabors@uab.edu          Louis B Nabors, MD
    # 2 Mobile      Alabama    36604 United States Recruiting Melanie Alford, RN 251-445-9649 malford@usouthal.edu     Pamela Francisco, CCRP
    # 3 Phoenix     Arizona    85013 United States Recruiting Lynn Ashby, MD     602-406-6262 lashby@chw.edu           Lynn Ashby, MD
    # 4 Tucson      Arizona    85724 United States Recruiting Jamie Holt         520-626-6800 jholt1@email.arizona.edu Baldassarre Stea, MD, PhD
    # 5 Little Rock Arkansas   72205 United States Recruiting Wilma Brooks, RN   501-686-8530 aleubanks@uams.edu       Amanda Eubanks, APN
    # 6 Berkeley    California 94704 United States Withdrawn  <NA>               <NA>         <NA>                     <NA>
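
To try the flattening idea without fetching anything from the network, here is a minimal self-contained sketch. The inline document, the site names, and the names `sample_xml`, `sample_doc`, `flatten_node`, `rows`, `all_names`, and `result` are all made up for illustration; only the tag layout follows the structure described in the question. It assumes the XML package from CRAN is installed.

```r
library(XML)  # assumption: the XML package from CRAN is installed

# Hypothetical two-location document mimicking the structure in the question.
# The tags are concatenated without whitespace, so no blank text nodes can appear.
sample_xml <- paste0(
  "<clinical_study>",
  "<location>",
  "<facility><name>Site A</name>",
  "<address><city>Birmingham</city><state>Alabama</state></address></facility>",
  "<status>Recruiting</status>",
  "<contact><last_name>Doe</last_name><phone>555-0100</phone></contact>",
  "</location>",
  "<location>",
  "<facility><name>Site B</name>",
  "<address><city>Berkeley</city><state>California</state></address></facility>",
  "<status>Withdrawn</status>",
  "</location>",
  "</clinical_study>"
)
sample_doc <- xmlParse(sample_xml, asText = TRUE)

# A node without children is a text node: store its value under the name of
# the enclosing element. Otherwise, concatenate the flattened children.
flatten_node <- function(x) {
  kids <- xmlChildren(x)
  if (length(kids) == 0) {
    setNames(list(xmlValue(x)), xmlName(xmlParent(x)))
  } else {
    Reduce(append, lapply(kids, flatten_node))
  }
}

# One single-row data frame per <location>
rows <- lapply(getNodeSet(sample_doc, "//location"), function(x) {
  data.frame(flatten_node(x), stringsAsFactors = FALSE)
})

# Pad every row with NA for the columns it lacks, then stack the rows
all_names <- unique(unlist(lapply(rows, colnames)))
result <- do.call(rbind, lapply(rows, function(d) {
  missing_cols <- setdiff(all_names, colnames(d))
  if (length(missing_cols) > 0) d[missing_cols] <- NA
  d
}))

print(result)
```

The second location has no `<contact>`, so its `last_name` and `phone` come out as NA. Note that `data.frame()` makes repeated leaf names unique, which is where a column such as `last_name.1` comes from when a location has more than one contact.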