Saturday, January 09, 2010

VIM Tips: Edit HTML Table from Web

I like to explore and learn new things; it is a lifelong learning experience. For example, I use VIM to edit my text files. VIM has so many great and powerful features, and what I know is only the tip of the iceberg. There is so much to learn, and I really enjoy learning VIM.

Here is my story. I occasionally need to copy HTML table data from the web and paste it into my blog. One case was the technical specifications of an Epson printer I purchased from Futureshop. Futureshop's web page lists the printer's technical features in a table. The printer is a current model today, but it will be discontinued soon and that web page will be gone. As a record for myself, I wrote a blog post listing all of those features. First, I copied the HTML table from Futureshop's web site. The problem is that the copied block contains unwanted markup, such as links that call JavaScript functions, plus lots of extra spaces and line breaks. Blogger's web-based editor is not a true HTML editor: for example, it converts line breaks in the HTML source into line break tags. I wanted to remove all of that.

So I turned to VIM for help. Here is a summary of the VIM search-and-replace commands I used (the text after each # is my note, not part of the command):

:%s/\s\{2,}/ /g          # 2 or more spaces/tabs replaced by one space
:%s/\t\+//g              # 1 or more tabs removed
:%s/\s\n/\r/g            # space/tab before a line break removed
:%s/\n\s/\r/g            # space/tab after a line break removed
:%s/\n\{2,}/\r/g         # 2 or more line breaks replaced by one (removes blank lines)
:%s/<a\_.\{-}<\/a>//g    # every <a ...>...</a> link removed, even across lines
:%s/\n//g                # all line breaks removed (joins everything into one line)
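If you would rather script this than type it into VIM, here is a rough Python translation of the same sequence of substitutions using the standard re module. It is a sketch under a few assumptions: Vim's \s (space or tab only) is written as [ \t], \_.\{-} becomes .*? with the re.DOTALL flag, and the function name clean_table_html is made up for illustration. Edge cases may not behave exactly as they do in VIM.

```python
import re


def clean_table_html(html: str) -> str:
    """Hypothetical helper: Python versions of the VIM commands above, in order."""
    # :%s/\s\{2,}/ /g  (Vim's \s is space or tab, hence [ \t])
    html = re.sub(r"[ \t]{2,}", " ", html)
    # :%s/\t\+//g  -> drop any remaining tabs
    html = re.sub(r"\t+", "", html)
    # :%s/\s\n/\r/g  -> remove a space/tab just before a line break
    html = re.sub(r"[ \t]\n", "\n", html)
    # :%s/\n\s/\r/g  -> remove a space/tab just after a line break
    html = re.sub(r"\n[ \t]", "\n", html)
    # :%s/\n\{2,}/\r/g  -> collapse runs of line breaks (blank lines)
    html = re.sub(r"\n{2,}", "\n", html)
    # :%s/<a\_.\{-}<\/a>//g  -> remove links; DOTALL lets . cross line breaks
    html = re.sub(r"<a.*?</a>", "", html, flags=re.DOTALL)
    # :%s/\n//g  -> join everything into one line
    return html.replace("\n", "")


sample = '<tr>\n    <td>Speed</td>   \n\n\t<td><a href="#"\n>more</a></td>\n</tr>\n'
print(clean_table_html(sample))  # -> <tr><td>Speed</td><td></td></tr>
```

Like the VIM version, removing the whole <a ...>...</a> element also removes the link text, which is what I wanted for those JavaScript links.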

Notice that a line break is written \n in the search pattern but \r in the replacement (a \n in the replacement inserts a null character, shown as ^@, instead of breaking the line). \{n,m} matches from n to m repetitions of the preceding item; m is optional, so \{2,} means "2 or more". Another very useful pattern is \_.\{-}. A plain . matches any character except a line break, while \_. matches any character including a line break. The \{-} makes the match non-greedy: it matches as few characters as possible, so the match stops at the first occurrence of whatever follows it. In the link-removal command, I search for "<a" followed by any characters, across lines if needed, up to the first "</a>".
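The same ideas exist in Python's re module, which makes them easy to try out. In this small sketch (the sample text is made up), Vim's \{2,} is written {2,}, \_. corresponds to . with the re.DOTALL flag, and \{-} corresponds to the non-greedy *?. Python has no \n/\r asymmetry: "\n" means a line break in both the pattern and the replacement.

```python
import re

text = '<a href="#1">one</a> keep <a href="#2">\ntwo</a>'

# Like Vim's \{2,}: two or more spaces/tabs collapse to one space
collapsed = re.sub(r"[ \t]{2,}", " ", "a  \t b")

# Like Vim's <a\_.*<\/a>: greedy, runs from the first <a to the LAST </a>
greedy = re.findall(r"<a.*</a>", text, flags=re.DOTALL)

# Like Vim's <a\_.\{-}<\/a>: non-greedy, stops at the first </a>, crosses lines
lazy = re.findall(r"<a.*?</a>", text, flags=re.DOTALL)

# Like Vim's <a.\{-}<\/a>: plain . cannot cross the line break, so link 2 is missed
single_line = re.findall(r"<a.*?</a>", text)

# Removing both links leaves only the text between them
removed = re.sub(r"<a.*?</a>", "", text, flags=re.DOTALL)

print(collapsed)      # a b
print(greedy)         # one match: the whole string, " keep " included
print(lazy)           # two matches, one per link
print(single_line)    # only the first link
print(repr(removed))  # ' keep '
```

The greedy version shows why \{-} matters here: it would also delete the ordinary text sitting between two links.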

Before removing all the line breaks (which joins everything into one line), I add column groups to set the width of each column in the table. Here is an example for two columns:

<colgroup width="10%"></colgroup>
<colgroup width="90%"></colgroup>

To display rows in alternating colours, I added three CSS classes to my Blogger HTML settings: header, odd and even. I have to add those classes to each row manually, as in <th class="header">, <tr class="odd"> or <tr class="even">.
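Adding these classes by hand gets tedious for a long table. As a possible sketch (in Python rather than VIM, and not what I actually did), the step could be automated as below. It assumes each row starts with a bare <tr> tag and that header rows are the ones containing <th> cells; the name stripe_rows is hypothetical.

```python
import re
from itertools import count


def stripe_rows(table_html: str) -> str:
    """Hypothetical helper: add the header/odd/even classes described above."""
    data_rows = count(1)  # numbers only the rows that hold data cells

    def tag_row(match: re.Match) -> str:
        body = match.group(1)
        if "<th" in body:
            # Header row: put class="header" on each bare <th>, leave <tr> alone
            return "<tr>" + body.replace("<th>", '<th class="header">') + "</tr>"
        parity = "odd" if next(data_rows) % 2 else "even"
        return f'<tr class="{parity}">{body}</tr>'

    # Only bare <tr> tags match, so rows that already have attributes are skipped
    return re.sub(r"<tr>(.*?)</tr>", tag_row, table_html, flags=re.DOTALL)


table = ('<table><tr><th>Feature</th><th>Value</th></tr>'
         '<tr><td>A</td><td>1</td></tr>'
         '<tr><td>B</td><td>2</td></tr></table>')
print(stripe_rows(table))
```

Header rows are not counted, so the first data row is always "odd". Because the pattern uses re.DOTALL, it works whether or not the line breaks have already been removed.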

Finally, I remove all the line breaks in the table so that this block of HTML is ready for my blog.

With these powerful VIM features, writing my blog is much easier, especially when I need to copy and paste HTML from other web pages.