regex - HTML tag replacement/removal


I'm trying to find a way to clean up sloppy, machine-generated HTML.

My assumption is that this calls for a regex solution, but I'm not sure where to start.

I want to go from HTML like this...

the <div>government’s</div> “risk management” efforts. as&nbsp;<br /> <span style="line-height:1.6em">critical infrastructure provides</span><br> 

to text like this...

the government's "risk management" efforts. critical infrastructure provides 

This means replacing or removing several different tags and characters...

&nbsp;   = ' '
<br />   = ' '
<br>     = ' '
“        = "
”        = "
’        = '
<span>   = remove
<div>    = remove
style    = remove

I have several different text editors (Sublime Text, TextMate, etc.), and I'm open to using apps, AppleScript, or anything else that saves me from having to manually search for each of these.

Thanks for any help.

Wrap the markup in <span> tags, take the inner HTML, and use string.replace:

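<span id="test">
<div>government’s</div>“risk management” efforts. as&nbsp; <br /> <span style="line-height:1.6em">critical infrastructure provides</span> <br>
</span>

// A plain string pattern only replaces the first match,
// so use a global regex to strip every opening and closing <div>.
var test = document.getElementById("test");
var cleanText = test.innerHTML.replace(/<\/?div>/g, "");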

Or take innerText, which will get rid of the tags (the curly quotes and non-breaking spaces would still need replacing).
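
To cover the whole list of replacements from the question, the string.replace calls can be chained. This is only a rough sketch: the function name cleanHtml is made up for illustration, and the final whitespace collapse is an assumption of mine (the question doesn't ask for it, but the <br> replacements otherwise leave runs of spaces). Regexes like these are fine for predictable machine-generated markup, but they are not a general HTML parser.

```javascript
// Hypothetical helper: applies the replacements listed in the question.
function cleanHtml(html) {
  return html
    .replace(/&nbsp;/g, " ")                   // non-breaking space entity -> space
    .replace(/<br\s*\/?>/gi, " ")              // <br>, <br/> and <br /> -> space
    .replace(/<\/?(?:span|div)\b[^>]*>/gi, "") // drop span/div tags, including style attributes
    .replace(/[\u201C\u201D]/g, '"')           // curly double quotes -> straight
    .replace(/[\u2018\u2019]/g, "'")           // curly single quotes -> straight
    .replace(/\s+/g, " ")                      // assumption: collapse leftover whitespace
    .trim();
}

// Usage with the sample from the question.
const sample =
  'the <div>government\u2019s</div> \u201Crisk management\u201D efforts. as&nbsp;<br /> ' +
  '<span style="line-height:1.6em">critical infrastructure provides</span><br> ';
console.log(cleanHtml(sample));
```

The same patterns should also work as find-and-replace expressions in Sublime Text or TextMate if you'd rather not run a script.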

