Friday, January 21, 2011

Parse and Transform Text File by Using PowerShell (3)

In my previous post, I described how I parsed the input text file from my debug results to generate the first report table. Now let's continue with the second report.

Generate the Second Report

The second report is another summary report, based on the first report. It displays a list of methods and the number of times each was called. In other words, it lists the distinct method names from the first report and takes the maximum counter for each method as its call count.
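To make the "distinct names with the maximum counter" idea concrete, here is a minimal sketch of one alternative way to compute it with Group-Object and Measure-Object. It is not the approach used in the rest of this post. The sample rows, the 'Counter' property name, and the 'CallCount' output name are made up for illustration.

```powershell
# Hypothetical sample rows standing in for the first report.
# 'Counter' is an assumed property name, not the one used by the real script.
$sampleRows = @(
    New-Object PSObject -Property @{ MethodName = 'Load'; Counter = 1 }
    New-Object PSObject -Property @{ MethodName = 'Load'; Counter = 3 }
    New-Object PSObject -Property @{ MethodName = 'Save'; Counter = 2 }
    New-Object PSObject -Property @{ MethodName = 'Load'; Counter = 2 }
)

# One output object per distinct method name, carrying the largest counter seen.
$summary = $sampleRows | Group-Object -Property MethodName | ForEach-Object {
    $max = ($_.Group | Measure-Object -Property Counter -Maximum).Maximum
    New-Object PSObject -Property @{ MethodName = $_.Name; CallCount = $max }
}

$summary | Sort-Object -Property MethodName | Format-Table -AutoSize
```

This variant trades the explicit hash table for two built-in cmdlets, at the cost of building a new object per method instead of reusing the original rows.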

I found that the second report is not easy to produce with a single pipeline made of a long chain of segments. Instead, I use several pipelines, each producing an intermediate result, and dynamically add properties to objects along the way.

First, initialize some variables:

# second report
# initialize variables
$totalCount = 0
$i = 0
$ht=New-Object Collections.Hashtable

The first pipeline takes the collection of objects produced above ($result) as its input. The pipeline itself emits no output. Instead, I use it to update a hash table variable, $ht, keyed by method name.

The input objects are piped into the second segment, a where clause that keeps only the objects whose count property is greater than zero.

# Filter out rows with count less than or equal to 0
$result | where { $_.$propertyNameForCount -gt 0 } `

The filtered objects are then passed to a script block, which updates the hash table $ht so that each method name maps to the object with the largest count:

| Select-Object -Property 'MethodName', $propertyNameForCount `
| %{
    if ( $_ -ne $null ) {
        $key1 = $_.'MethodName';
        if ( $ht.ContainsKey($key1) -eq $false ) {
            # first time this method is seen
            $ht.Add($key1, $_)
        }
        elseif ($_.$propertyNameForCount -gt $ht[$key1].$propertyNameForCount) {
            # larger count found: replace the stored object
            $ht.Set_Item($key1, $_)
        }
    }
}
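For readers who want to try the keep-the-maximum pattern in isolation, here is a small self-contained sketch. The sample rows, the 'Counter' property name, and the $maxByMethod variable are assumptions for illustration. The real script works on the objects from the first report.

```powershell
# Hypothetical sample rows; 'Counter' is an assumed property name.
$rows = @(
    New-Object PSObject -Property @{ MethodName = 'Load'; Counter = 1 }
    New-Object PSObject -Property @{ MethodName = 'Save'; Counter = 2 }
    New-Object PSObject -Property @{ MethodName = 'Load'; Counter = 3 }
    New-Object PSObject -Property @{ MethodName = 'Idle'; Counter = 0 }
)
$propertyNameForCount = 'Counter'
$maxByMethod = @{}

$rows | Where-Object { $_.$propertyNameForCount -gt 0 } | ForEach-Object {
    $key = $_.MethodName
    $isNew = -not $maxByMethod.ContainsKey($key)
    # Store the row the first time a method is seen, or when it beats the stored one.
    if ($isNew -or $_.$propertyNameForCount -gt $maxByMethod[$key].$propertyNameForCount) {
        $maxByMethod[$key] = $_
    }
}

$maxByMethod.Values | Format-Table -AutoSize
```

The script block writes to the hash table as a side effect, so the pipeline itself produces no output until the final line lists the stored values.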

The values of the hash table $ht are the objects we need for the report. In addition to those objects, I need the total of all the method call counts. A pipeline adds each count to the variable $totalCount:

$ht.Values `
| %{
    if ( $_ -ne $null ) {
        $totalCount += $_.$propertyNameForCount
    }
}
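As a side note, the same kind of total can be obtained with the built-in Measure-Object cmdlet. The sketch below uses made-up objects and an assumed 'Counter' property name, and it is only an alternative, not what the script in this post does.

```powershell
# Hypothetical objects standing in for $ht.Values; 'Counter' is an assumed name.
$values = @(
    New-Object PSObject -Property @{ MethodName = 'Load'; Counter = 3 }
    New-Object PSObject -Property @{ MethodName = 'Save'; Counter = 2 }
)

# Measure-Object sums the named property across all input objects.
$total = ($values | Measure-Object -Property Counter -Sum).Sum
```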

Once we have the total count, a new object is created with the same properties as the objects in the hash table, and it is then added to the hash table $ht:

# add total count to the $ht table
$objValue = New-Object PSCustomObject
$objValue | Add-Member -type NoteProperty -Name 'MethodName' -Value '[Total count]';
$objValue | Add-Member -type NoteProperty -Name $propertyNameForCount -Value $totalCount;
# dummy key, unlikely to clash with a real method name;
# note that the report below sorts by 'MethodName', not by this key
$ht.Add("XXXX dummy key", $objValue);

Finally, the objects in the hash table are ready for the second report. The report is generated by the last pipeline: sorting by property 'MethodName', adding a sequence number as object property, and appending the table layout report to the output file:

# generate report
$ht.Values `
| Sort-Object -Property 'MethodName' `
| %{ # Add sequence column
    $obj = $_;
    $i++;   # advance the sequence number (numbering assumed to start at 1)
    $obj | Add-Member -type NoteProperty -Name 'No.' -Value $i;
    $obj    # pass the object on to the next segment
} `
| ft -AutoSize -Property 'No.', 'MethodName', $propertyNameForCount >> $outputFile; # Format the result and output to file
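One detail worth knowing about the numbering step: Add-Member emits nothing to the pipeline unless -PassThru is given, so the object has to be passed along explicitly. Here is a minimal sketch of the sequence-number idea using -PassThru. The sample objects and the $seq and $numbered variables are hypothetical.

```powershell
# Hypothetical objects to be numbered after sorting.
$items = @(
    New-Object PSObject -Property @{ MethodName = 'Save' }
    New-Object PSObject -Property @{ MethodName = 'Load' }
)

$seq = 0
$numbered = $items | Sort-Object -Property MethodName | ForEach-Object {
    $seq++
    # -PassThru makes Add-Member emit the updated object to the next segment.
    $_ | Add-Member -MemberType NoteProperty -Name 'No.' -Value $seq -PassThru
}

$numbered | Format-Table -AutoSize -Property 'No.', 'MethodName'
```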

In summary, the pipeline in PowerShell is a nice-to-have feature: you could achieve the same result without it, but I like its simple, fluent flow and its power. As you can see, PowerShell is also a dynamically typed scripting language. Objects can be created on the fly, and properties can be added or removed at run time. My parsing script takes advantage of both features.
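As a tiny illustration of that dynamic side, the sketch below adds two properties to an empty object and then removes one of them again. The $entry object and the 'Scratch' property are hypothetical names used only for this example.

```powershell
# Start from an empty object and attach properties at run time.
$entry = New-Object PSObject
$entry | Add-Member -MemberType NoteProperty -Name 'MethodName' -Value 'Load'
$entry | Add-Member -MemberType NoteProperty -Name 'Scratch' -Value 42

# Remove a property that is no longer needed.
$entry.PSObject.Properties.Remove('Scratch')

# Collect the names of the properties that remain.
$names = @($entry.PSObject.Properties | ForEach-Object { $_.Name })
```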

Here is the complete package of the script code.