Friday, April 06, 2012

VIM Tip: Not Containing Pattern (2)

In my previous VIM tip post, I described searching for an expected word followed by a word that is not expected, for example, 'tablespace' followed by a word not starting with 't'. When I tried the pattern the other way around, I could not figure out how to write the match: for example, a word not starting with 't' followed by the word 'tablespace'.

I think I have figured it out now, but it took me a while to google and digest the related information. I think it is worthwhile to study, so I am writing this post to summarize my findings.

Lookahead and Lookbehind Zero-width Assertions


At first I thought that matching a pattern not containing another pattern would be as simple as using a negation or ! operator to mark the not-containing pattern. There may be a not operator in VIM search, but I could not find one. What I found instead is the lookaround zero-width assertion.

In VIM, the way to search for a pattern not containing another pattern is smart and elegant. A basic search finds every piece of text that matches the pattern and returns those matches as results. In VIM, the following is a search command:

/PATTERN

The pattern can be a regular expression.

In VIM, a zero-width pattern is a pattern that must match but is not included in the search results. Think of a zero-width pattern as an additional match condition. It can be attached in either of the following ways:

PATTERN + ZERO_WIDTH_ATOM or
ZERO_WIDTH_ATOM + PATTERN

The first is called a lookahead zero-width assertion, and the second a lookbehind zero-width assertion. Assertion here means a test that either matches or does not match. The two above are positive assertions. If we also take negative (not matched) assertions into consideration, there are four types of lookaround zero-width assertions:

/PATTERN[ZERO_WIDTH_ATOM\@=]
/PATTERN[ZERO_WIDTH_ATOM\@!]
/[ZERO_WIDTH_ATOM\@<=]PATTERN
/[ZERO_WIDTH_ATOM\@<!]PATTERN

Note: [...] marks the optional part and separates it from the pattern; the [ and ] characters are not part of the search.
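To make the four forms concrete, here is a small sketch using Python's re module, which has the same four assertions under a different spelling: (?=...), (?!...), (?<=...) and (?<!...). The sample text and the variable names are my own, made up for illustration. The VIM pattern that I would expect to be equivalent is given in the comment above each line.

```python
import re

# Hypothetical sample text: "foo" occurs at offsets 0, 7, 17 and 24.
text = "foobar foobaz barfoo bazfoo"

# VIM: /foo\(bar\)\@=     positive lookahead: "foo" followed by "bar"
pos_ahead = [m.start() for m in re.finditer(r"foo(?=bar)", text)]

# VIM: /foo\(bar\)\@!     negative lookahead: "foo" not followed by "bar"
neg_ahead = [m.start() for m in re.finditer(r"foo(?!bar)", text)]

# VIM: /\(bar\)\@<=foo    positive lookbehind: "foo" preceded by "bar"
pos_behind = [m.start() for m in re.finditer(r"(?<=bar)foo", text)]

# VIM: /\(bar\)\@<!foo    negative lookbehind: "foo" not preceded by "bar"
neg_behind = [m.start() for m in re.finditer(r"(?<!bar)foo", text)]

print(pos_ahead)   # [0]
print(neg_ahead)   # [7, 17, 24]
print(pos_behind)  # [17]
print(neg_behind)  # [0, 7, 24]
```

In every case only "foo" is matched; "bar" is just a condition on what comes after or before it.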

As I understand it, VIM uses symbol-like characters for lookaround. As with other special characters in VIM, \ introduces the lookaround zero-width assertion. The following table summarizes the characters used for this type of search:

\@    lookahead
\@<   lookbehind
=     positive match
!     negative match (not matched)

In the searches above, PATTERN is what gets matched. If ZERO_WIDTH_ATOM is supplied, it is used as an additional assertion. Text that matches PATTERN and satisfies the assertion is returned as a result, but the text matched by ZERO_WIDTH_ATOM is not part of the result. That is why it is called zero width.
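The zero-width idea is easy to see by looking at what a match actually covers. The sketch below again uses Python's re module rather than VIM, and the sample line is invented for illustration. It looks for a word that does not start with 't' and is followed by the word 'tablespace'; the lookahead for 'tablespace' has to succeed, but it adds nothing to the matched text.

```python
import re

# Hypothetical sample line, made up for this example.
line = "alter tablespace users; create temporary tablespace temp"

# \b(?!t)\w+            a word that does not start with 't'
# (?=\s+tablespace\b)   ...followed by the word 'tablespace' (zero width)
pattern = re.compile(r"\b(?!t)\w+(?=\s+tablespace\b)")

m = pattern.search(line)
matched_text = m.group(0)
matched_span = m.span()

print(matched_text)           # alter
print(matched_span)           # (0, 5)
print(pattern.findall(line))  # ['alter']
```

The span ends at offset 5, right after "alter", even though " tablespace" had to be there for the match to succeed. The word "temporary" is also followed by "tablespace", but it starts with 't', so it is rejected.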

According to the VIM documentation, an atom is a character, a character class, or a group (delimited by \( and \)).

Think of the Search as a Program


Let's think of these types of searches as a program. Here is some C-style pseudocode:

Results getMatchedResults(
   string context,                 // text to search
   string pattern,                 // main pattern (appears in the results)
   string zero_width_pattern,      // assertion pattern (never in the results)
   bool match_zero_width_pattern,  // true: positive (=), false: negative (!)
   bool lookahead)                 // true: lookahead, false: lookbehind
{
  results = EMPTY_LIST;
  result = getMatchedResult(context, pattern);
  while (result != EMPTY)
  {
    accepted = true;
    if (zero_width_pattern != EMPTY)
    {
      if ( lookahead ) {
        context_tmp = getNextContextByLookahead(context, result);
        result_tmp = getMatchedResult(context_tmp, zero_width_pattern);
      } else {
        context_tmp = getNextContextByLookbehind(context, result);
        result_tmp = getMatchedResultByLookbehind(context_tmp,
                       zero_width_pattern);
      }
      // accept only if the assertion outcome is the one asked for
      found = (result_tmp != EMPTY);
      accepted = (found == match_zero_width_pattern);
    }
    if ( accepted ) {
      results.add(result);
    }
    // always move past the current candidate, accepted or not,
    // so that a rejected candidate does not end the search early
    context = getNextContextByLookahead(context, result);
    result = getMatchedResult(context, pattern);
  }
  return results;
}

The pseudocode is straightforward. Actually, VIM search uses a regex engine, and the search expressions above are basic regex patterns.
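For readers who want something they can run, here is a rough Python translation of the same idea. It is only a sketch under my own assumptions: the function name and parameters are hypothetical, it checks the assertion after each candidate match has been found, and it keeps scanning after a rejected candidate. A real regex engine evaluates the assertion while matching, so results can differ for more complicated patterns.

```python
import re

def get_matched_results(context, pattern, zero_width_pattern=None,
                        match_zero_width_pattern=True, lookahead=True):
    """Return (start, end) spans of pattern that satisfy the assertion.

    Simplified model: every candidate match of pattern is found first,
    then the zero-width pattern is tested right after it (lookahead)
    or right before it (lookbehind).
    """
    results = []
    for m in re.finditer(pattern, context):
        if zero_width_pattern is not None:
            if lookahead:
                # The assertion must match starting where the candidate ends.
                after = context[m.end():]
                found = re.match(zero_width_pattern, after) is not None
            else:
                # The assertion must match text ending where the candidate starts.
                before = context[:m.start()]
                anchored = "(?:" + zero_width_pattern + r")\Z"
                found = re.search(anchored, before) is not None
            if found != match_zero_width_pattern:
                continue  # assertion outcome is not the one asked for
        results.append(m.span())
    return results

text = "foobar foobaz barfoo bazfoo"
print(get_matched_results(text, "foo", "bar", True, True))    # [(0, 3)]
print(get_matched_results(text, "foo", "bar", False, False))  # [(0, 3), (7, 10), (24, 27)]
```

Note that the returned spans always have the width of "foo" only; the text tested by the assertion never becomes part of a result.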

