Sunday, April 05, 2020

Use VIM to Clean Up HTML

I have set up a blog at Blogger with more than one author so that we can write our stories and observations about COVID-19. Some authors have no knowledge of HTML or of how to write clean posts there. As a result, the posts they create contain messy HTML tags, which either make the text frustrating to edit or spread inconsistent styles through the post.

I am the tech support person working hard behind the scenes. One of my jobs is to clean up their posts by removing unnecessary HTML tags. Here is one example of how complex the HTML can be:

VIM as an Effective Tool

It is really tedious and time-consuming to remove paired HTML tags manually. For example:

<span ...>...</span>

I realized that VIM is a great tool for this!

Here is the command to find the span tags in HTML:


It searches for the start pattern <span, followed by any characters (across multiple lines if necessary), up to the next >.

Here is the result:

That's very exciting! With this test confirming the pattern, a single VIM replace command is enough to clean up the span tags:
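
The command itself is not reproduced here. Going by the div commands listed later in this post, it was presumably of the following form; this is a reconstruction by analogy, not the exact command that was run.

```vim
" Assumed reconstruction, patterned on the div commands later in this post.
" Type each line in Normal mode.

" Find an opening <span ...> tag, even if it wraps across lines:
/<span\(\_.\{-}\)\@<=\_.\{-}>

" Delete every opening <span ...> tag in the buffer:
:%s:<span\(\_.\{-}\)\@=\_.\{-}>::g
```

Here \_. matches any character including a newline, and \{-} makes the match as short as possible, so each match stops at the first > after <span.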


More VIM Commands for Finding and Replacing

Find </span>:
/<\/span\(\_.\{-}\)\@<=\_.\{-}>
Remove </span>:
:%s:<\/span\(\_.\{-}\)\@=\_.\{-}>::g

Find <div...>:
/<div\(\_.\{-}\)\@<=\_.\{-}>
Remove <div...>:
:%s:<div\(\_.\{-}\)\@=\_.\{-}>::g

Find </div>:
/<\/div\(\_.\{-}\)\@<=\_.\{-}>
Remove </div>:
:%s:<\/div\(\_.\{-}\)\@=\_.\{-}>::g

With the few commands above, I cleaned up the HTML with great results!
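
The lookaround groups in the commands above can match the empty string, so a shorter pattern should match the same text. As a rough sketch (not the commands used for the cleanup described here), the span and div removals could be combined into one substitution. It assumes that no attribute value contains a literal > character, since the match stops at the first > it finds.

```vim
" Sketch: strip opening and closing span/div tags in one pass.
"   </\=           '<' followed by an optional '/'
"   \(span\|div\)  either tag name
"   \>             end of word, so a longer tag name is not matched
"   \_.\{-}        as few characters as possible, newlines included
"   >              the closing '>'
:%s:</\=\(span\|div\)\>\_.\{-}>::g
```

Because : is used as the delimiter, the / in the pattern needs no escaping. Running the search form first (/</\=\(span\|div\)\>\_.\{-}>) shows what would be removed before anything is changed.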