Friday, January 07, 2011

Parse and Transform Text File by Using PowerShell (2)

With the basic concepts and project goals from the previous post under your belt, it is time to look at the code from 6,000 feet.

Define Input Variables
I could define input parameters for my script so that it is easy to call from the command line. However, the script is only for my own use and I change some of the values frequently, so I decided to simply declare a few variables with initial values as a start:

$inputFile = 'C:\Tmp\test\fst_PO20100505.txt';
$outputFile = '{0}.txt' -f $inputFile;
[decimal]$durationLimit = -0.01;
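For comparison, the parameter-based alternative mentioned above could look roughly like the sketch below. The function name Get-DurationReportSettings and its parameter names are hypothetical and are not part of my script; the defaults simply repeat the values declared above.

```powershell
# Hypothetical wrapper: the function and parameter names are illustrative only.
function Get-DurationReportSettings {
    param(
        [string]$InputFile = 'C:\Tmp\test\fst_PO20100505.txt',
        [decimal]$DurationLimit = -0.01
    )
    # Same derivation as in the script: append ".txt" to the input path.
    $outputFile = '{0}.txt' -f $InputFile
    New-Object PSObject -Property @{
        InputFile     = $InputFile
        OutputFile    = $outputFile
        DurationLimit = $DurationLimit
    }
}

# Override only the value that changes; the rest falls back to the defaults.
$settings = Get-DurationReportSettings -DurationLimit 0.5
$settings.OutputFile   # C:\Tmp\test\fst_PO20100505.txt.txt
```

With parameters, a changed value is passed on the command line instead of being edited in the script file.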

Then some counter variables and a hash table variable $ht are initialized for later use:
$i = 0
$j = 1
$identityExPattern = "*duration: *" # expression pattern as a filter
$exPatternForCount = "*DAO::*" # expression pattern for count
$identityPropertyName = "Duration"
$propertyNameForCount = "DAO count"
$ht = New-Object Collections.Hashtable;

Next I write a header line to my output file:
# Output duration limit to the result file
'==== Result of "Duration > {0}" ====' -f $durationLimit >> $outputFile;

This code is straightforward. The duration limit acts as a filter: only calls with a duration greater than this value are listed in the report.

Generate the First Report

The first report is generated by a single statement made of a long chain of code segments joined as a pipeline. The result is saved to the variable $result.

The first segment in the chain reads the lines from the input file:

$result = Get-Content $inputFile `

The second segment is a code block %{...}. Such a block processes each input object and produces nothing, one object, or a collection of objects, which can be piped to the next segment. The code in this block is very simple: it increments the line number in the variable $i and then passes the input object through unchanged. $_ is the special variable that refers to the current input object:

| %{ $i = $i + 1;
$_;
} `
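To see the counter and the pass-through working together, here is a minimal sketch on a made-up three-element input (the sample strings are not from my log file). It relies on the fact that a ForEach-Object block runs in the caller's scope, so the assignment to $i is visible after the pipeline finishes.

```powershell
$i = 0
$lines = 'first', 'second', 'third'   # made-up sample input

$passed = $lines | ForEach-Object {
    $i = $i + 1   # count every line that comes through
    $_            # emit the line unchanged for the next segment
}

$i              # 3
$passed.Count   # 3
```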

Each line is then piped to the next segment, a where clause:
| where {$_.trim().length -gt 0 -and ($_.trim().SubString(0,1) -eq "[") -and ($_.trim() -like $identityExPattern)} `

This where clause acts as a filter that removes unwanted lines. A line passes the filter only if it is not empty, its first non-blank character is "[", and it contains the substring "duration: ". The lines of interest are then piped into the next code block %{...}.
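The effect of the three conditions can be checked on a few sample lines. In the sketch below only the third line has the format described in my code comments; the other lines are invented to exercise each condition. Note that -and short-circuits, so Substring is never called on an empty string.

```powershell
$identityExPattern = "*duration: *"
$sample = @(
    '',
    '   ',
    '[ 5/5/2010 9:55:03 AM duration: 0.19 ] AppUserDAO::loginUser',
    'some other log text with duration: 1.00',
    '[ 5/5/2010 9:55:04 AM ] no timing here'
)

# Same three conditions as the where clause above.
$kept = @($sample | Where-Object { $_.Trim().Length -gt 0 -and ($_.Trim().Substring(0,1) -eq "[") -and ($_.Trim() -like $identityExPattern) })

$kept.Count   # 1
```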

This block contains a lot of code, which can be divided into two parts. The first part splits a line into an array variable $row and accumulates a count for each method:

$row = (-split $_ ); # split a line into array by space
$bCount = $false;
$method = $row[7]; # example: [ 5/5/2010 9:55:03 AM duration: 0.19 ] AppUserDAO::loginUser
if ( $method -like $exPatternForCount ) {
    $c = 1;
    # update method in hash table with count value
    if ( $ht.ContainsKey($method) ) {
        $c = $ht.Get_Item($method) + 1;
        $ht.Set_Item($method, $c);
    }
    else {
        $ht.Add($method, $c);
    }
    $bCount = $true;
}
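Here is a small stand-alone sketch of the same split-and-count idea, run over three sample lines with a plain foreach loop instead of the pipeline. The second and third lines are made up for illustration, and the index syntax $ht[$method] is used in place of Get_Item and Set_Item.

```powershell
$exPatternForCount = "*DAO::*"
$ht = New-Object Collections.Hashtable
$sample = @(
    '[ 5/5/2010 9:55:03 AM duration: 0.19 ] AppUserDAO::loginUser',
    '[ 5/5/2010 9:55:04 AM duration: 0.02 ] AppUserDAO::loginUser',
    '[ 5/5/2010 9:55:05 AM duration: 0.30 ] ReportService::run'
)

foreach ($line in $sample) {
    $row = -split $line    # unary -split breaks the line on whitespace
    $method = $row[7]      # the eighth token is the method name
    if ($method -like $exPatternForCount) {
        if ($ht.ContainsKey($method)) { $ht[$method] = $ht[$method] + 1 } else { $ht.Add($method, 1) }
    }
}

$ht['AppUserDAO::loginUser']            # 2
$ht.ContainsKey('ReportService::run')   # False
```

Only methods matching the count pattern end up as keys in the hash table, so the table stays limited to the DAO calls.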

The second part creates an object from the result of the first part, $row. The object is created with a "Select-Object -InputObject ... -Property ..." statement. -InputObject takes the $row array as input, and -Property defines a list of calculated properties:

# create an object with properties: sequence, datetime, duration, DAO count, and methodName
$obj = select-object -input $row -prop `
@{Name='No.'; expression={$i;}}, `
@{Name='DateTime'; expression={[DateTime]($row[1] + ' ' + $row[2] + ' ' + $row[3]);};} , `
@{Name=$identityPropertyName; expression={([decimal]$row[5]);} }, `
@{Name=$propertyNameForCount; expression={ `
if ($bCount) { `
$ht.Get_Item($method); `
} `
else { `
0; `
} `
} }, `
@{Name='MethodName'; expression={($method);} };
$obj; #output object

The last line, $obj, writes the object to the pipeline so that it is passed to the next segment.
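The calculated-property technique can be tried in isolation. The sketch below builds one object from the sample line in my code comment; it collects the property definitions in a $props array (an illustrative variable that is not in my script) instead of using backtick line continuations, and it leaves out the DAO count column for brevity.

```powershell
$i = 1   # pretend this is the first line of the file
$row = -split '[ 5/5/2010 9:55:03 AM duration: 0.19 ] AppUserDAO::loginUser'

# Each hashtable defines one calculated property: a name and an expression.
$props = @(
    @{Name='No.'; Expression={$i}},
    @{Name='DateTime'; Expression={[DateTime]($row[1] + ' ' + $row[2] + ' ' + $row[3])}},
    @{Name='Duration'; Expression={[decimal]$row[5]}},
    @{Name='MethodName'; Expression={$row[7]}}
)

# -InputObject treats the whole array as one object, so one object comes out.
$obj = Select-Object -InputObject $row -Property $props

$obj.MethodName      # AppUserDAO::loginUser
$obj.DateTime.Hour   # 9
```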

The next segment is a simple where clause that keeps only the objects whose duration is greater than the limit:

| where { $_.Duration -gt $durationLimit } `

The final pipeline segment creates another object based on the input object. This new object is for the first report, with one property for each column in the report:

$obj1 = Select-Object -Input $_ -Property `
@{Name='No.'; expression={$j;}}, `
@{Name='Org No.'; expression={$_.'No.';}}, `
@{Name='DateTime'; expression={$_.'DateTime';};} , `
@{Name=$identityPropertyName; expression={$_.$identityPropertyName;} }, `
@{Name=$propertyNameForCount; expression={$_.$propertyNameForCount;} }, `
@{Name='MethodName'; expression={$_.'MethodName';} };
$obj1; # output obj1

The result of the chained segments above is a collection of objects for the first report. It is assigned to the variable $result and then written to the output file:

# write the results to the output file as a formatted table
$result | ft -AutoSize >> $outputFile;
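Since the segments are shown piece by piece, it may help to see them joined. The following is a condensed sketch of the whole chain, not my exact script: it reads made-up in-memory lines instead of a file, uses a duration limit of 0.1 for illustration, omits the DateTime column, and assumes that the report row counter $j is advanced inside the last block.

```powershell
# Made-up sample lines standing in for Get-Content $inputFile.
$lines = @(
    '[ 5/5/2010 9:55:03 AM duration: 0.19 ] AppUserDAO::loginUser',
    'noise line that does not start with a bracket',
    '[ 5/5/2010 9:55:04 AM duration: 0.02 ] AppUserDAO::loginUser',
    '[ 5/5/2010 9:55:05 AM duration: 0.30 ] ReportService::run'
)
[decimal]$durationLimit = 0.1   # illustrative limit, not the -0.01 used above
$i = 0                          # line number in the input
$j = 0                          # row number in the report
$ht = New-Object Collections.Hashtable

$result = @($lines |
    ForEach-Object { $i = $i + 1; $_ } |
    Where-Object { $_.Trim().Length -gt 0 -and $_.Trim().Substring(0,1) -eq '[' -and $_.Trim() -like '*duration: *' } |
    ForEach-Object {
        $row = -split $_
        $method = $row[7]
        $daoCount = 0
        if ($method -like '*DAO::*') {
            if ($ht.ContainsKey($method)) { $ht[$method] = $ht[$method] + 1 } else { $ht.Add($method, 1) }
            $daoCount = $ht[$method]
        }
        $props = @(
            @{Name='No.'; Expression={$i}},
            @{Name='Duration'; Expression={[decimal]$row[5]}},
            @{Name='DAO count'; Expression={$daoCount}},
            @{Name='MethodName'; Expression={$method}}
        )
        Select-Object -InputObject $row -Property $props
    } |
    Where-Object { $_.Duration -gt $durationLimit } |
    ForEach-Object {
        $j = $j + 1   # assumption: the report row counter is advanced here
        $props = @(
            @{Name='No.'; Expression={$j}},
            @{Name='Org No.'; Expression={$_.'No.'}},
            @{Name='Duration'; Expression={$_.Duration}},
            @{Name='DAO count'; Expression={$_.'DAO count'}},
            @{Name='MethodName'; Expression={$_.MethodName}}
        )
        Select-Object -InputObject $_ -Property $props
    })

$result.Count            # 2
$result[1].'Org No.'     # 4
$result[1].MethodName    # ReportService::run
```

Because the pipeline streams one line at a time through all segments, $i still holds the current input line number when the object is built, which is why the second report row points back to line 4.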