Saturday, January 01, 2011

Parse and Transform Text File by Using PowerShell (1)

I have used my DebugLog class to investigate issues in Visual Studio projects. The debug messages are pushed to Visual Studio's output console. The generated messages can be very extensive. For example, I had a case of a Windows application with performance issues in its DAO calls. I got about 8233 lines of debug messages just from startup to the point where the main window was displayed. I copied the messages to a text file, which is 732 KB in size. I like to keep the raw messages; however, it was hard to investigate issues with such extensive raw output.

What I would like is to generate concise summary reports, for example, a list of method calls with their durations in order and a list of DAO calls with their counts. This is similar to using XPath to parse XML content and transform it into another format, such as an HTML table.

PowerShell (PS) came to my mind first. PS is a scripting language; therefore, it is easy to give it a try. I am not an expert in PS; I just use it and learn it as I need it. I spent some time writing code and finally completed a script module that produces the result I expected. Here is my review of the code.

Basic Concepts

Before I jump deep into my PS code, I would like to briefly explain some basic concepts.

Variables are declared dynamically in PS with the prefix $. A variable can also be given a static data type in the format [type]$var.
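As a quick illustration of the two kinds of declaration, here is a minimal sketch. The variable names are made up for this example only.

```powershell
# Dynamically typed: the variable takes the type of whatever is assigned.
$count = 5
$count = "five"          # allowed, $count now holds a string

# Statically typed: the type constraint stays with the variable.
[int]$lineNo = 42
$lineNo = "43"           # the string is converted to the integer 43
# $lineNo = "abc"        # would fail, "abc" cannot be converted to int

$count.GetType().Name    # String
$lineNo.GetType().Name   # Int32
```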

A hashtable is a dictionary data type in which each key has an associated value. The literal syntax is @{key1 = value1; key2 = value2; ...}, with entries separated by semicolons or line breaks, or @{} for an empty hashtable.
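A small sketch of both forms follows. The keys are method names in the style of my debug messages and the values are only illustrative; none of this is taken from the final script.

```powershell
# Literal with initial entries, separated by ';'
$durations = @{ "PODAO::GetPOs" = 0.38; "VendorDAO::GetVendors" = 1.56 }

# Start empty and add entries by key
$daoCounts = @{}
$daoCounts["PODAO::GetPOs"] = 1
$daoCounts["VendorDAO::GetVendors"] = 2

$durations["PODAO::GetPOs"]                    # look up a value by key
$daoCounts.ContainsKey("PODAO::ReadPOData")    # False, no such key yet
$daoCounts.Count                               # number of entries
```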

# is used for comments.

Statements can be either separated by a line break or terminated by the ';' character. The backtick (`) character at the end of a line is used as a line-continuation indicator.
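For example (a minimal sketch with throwaway variable names):

```powershell
# Two statements on one line, separated by ';'
$a = 1; $b = 2

# One statement continued on the next line.
# The backtick must be the very last character on the line.
$sum = $a + `
       $b

$sum    # 3
```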

Piping, or the pipeline, is a very powerful feature in PS. By using |, two segments of code can be chained together, with the output of the first segment piped into the next segment as its input. Not only strings but also objects can be passed through the pipeline. You can write similar code without pipelines, but used appropriately, they can make your scripts much simpler and easier to read, and you may come to like this powerful feature of PS.
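Here is a minimal sketch of both cases, strings and objects going through a pipeline. The two input lines are in the format of my debug messages; the variable and property names are chosen just for this example.

```powershell
$lines = @(
    "[ 5/5/2010 3:12:59 PM ] PODAO::ReadPOData",
    "[ 5/5/2010 3:12:59 PM duration: 0.00 ] PODAO::ReadPOData"
)

# Strings through the pipeline: keep only the lines that report a duration.
$withDuration = $lines | Where-Object { $_ -match "duration:" }

# Objects through the pipeline: wrap each line in an object,
# sort by a property, and take the first one.
$longest = $lines |
    ForEach-Object { New-Object PSObject -Property @{ Text = $_; Length = $_.Length } } |
    Sort-Object Length -Descending |
    Select-Object -First 1

$withDuration
$longest.Text
```

Both results are the second line here, since it is the only one with a duration and also the longer of the two.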

There are many great resources on the web. For example, the first part of this blog tutorial on PS variables, arrays, and hashtables provides nice hands-on examples of PS basics.

The Goal of my Project

I call it a project because I want to write script code to reach my goal. Basically, I copy my debug messages from the VS output console and save them to a text file. The goal of the project is to read the text file as input, parse each line, and generate a set of reports (two reports in this project).
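The reading step can be sketched with Get-Content, which returns the file as an array of lines. This is only an illustration: the temporary file and its three lines stand in for my real 732 KB log file, and the variable names are made up.

```powershell
# Stand-in for the real log file: write a few sample lines to a temp file.
$path = [System.IO.Path]::GetTempFileName()
Set-Content -Path $path -Value @(
    "[ 5/5/2010 3:12:58 PM ] PODAO::GetPOs",
    "[ 5/5/2010 3:12:59 PM ] PODAO::ReadPOData",
    "[ 5/5/2010 3:12:59 PM duration: 0.00 ] PODAO::ReadPOData"
)

# Read the whole file; each element of $rawLines is one line of text.
$rawLines = Get-Content -Path $path

# Walk the lines, keeping track of the line number in the raw file.
$lineNo = 0
foreach ($line in $rawLines) {
    $lineNo++
    "{0}: {1}" -f $lineNo, $line
}

Remove-Item -Path $path
```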

The first report is a table view of methods and their corresponding durations. The second report is a table view of the methods of interest and their call counts.

Here is a partial section of the raw data:

[ 5/5/2010 3:12:58 PM ] MainSchedulingTool::posMenuItem_Click
[ 5/5/2010 3:12:58 PM ] POSelectionViewForm::POSelectionViewForm
[ 5/5/2010 3:12:58 PM ] POSelectionViewForm::poStartDateTimeFilterPicker_ValueChanged
[ 5/5/2010 3:12:58 PM duration: 0.00 ] POSelectionViewForm::poStartDateTimeFilterPicker_ValueChanged
[ 5/5/2010 3:12:58 PM ] POSelectionViewForm::poEndDateTimeFilterPicker_ValueChanged
[ 5/5/2010 3:12:58 PM duration: 0.00 ] POSelectionViewForm::poEndDateTimeFilterPicker_ValueChanged
[ 5/5/2010 3:12:58 PM ] PODAO::GetAllPOsByStatusAndOrderByEndDate
[ 5/5/2010 3:12:58 PM ] PODAO::GetPOs
[ 5/5/2010 3:12:59 PM ] PODAO::ReadPOData
[ 5/5/2010 3:12:59 PM duration: 0.00 ] PODAO::ReadPOData
[ 5/5/2010 3:12:59 PM ] PODAO::ReadPOData
[ 5/5/2010 3:12:59 PM duration: 0.00 ] PODAO::ReadPOData
[ 5/5/2010 3:12:59 PM ] PODAO::ReadPOData
[ 5/5/2010 3:12:59 PM duration: 0.00 ] PODAO::ReadPOData
[ 5/5/2010 3:12:59 PM ] PODAO::ReadPOData
[ 5/5/2010 3:12:59 PM duration: 0.00 ] PODAO::ReadPOData
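To make the line format concrete, here is a minimal sketch of how one such line could be split into its parts with a regular expression. It is not the script reviewed in this series: the pattern, the function name ConvertFrom-DebugLine, and the property names are assumptions made for this example, and I assume that a line with "duration:" marks the end of a method call.

```powershell
function ConvertFrom-DebugLine([string]$line) {
    # [ <date time> (duration: <seconds>)? ] <Class::Method>
    $pattern = '^\[\s*(?<stamp>\d+/\d+/\d+ \d+:\d+:\d+ [AP]M)' +
               '(?:\s+duration:\s*(?<duration>[\d.]+))?\s*\]\s*(?<method>.+)$'

    $m = [regex]::Match($line, $pattern)
    if (-not $m.Success) { return $null }      # not a debug message line

    # The duration group only participates on "end of call" lines.
    $isEnd = $m.Groups['duration'].Success
    $duration = $null
    if ($isEnd) { $duration = [double]$m.Groups['duration'].Value }

    New-Object PSObject -Property @{
        DateTime = $m.Groups['stamp'].Value
        IsEnd    = $isEnd
        Duration = $duration
        Method   = $m.Groups['method'].Value
    }
}

$start = ConvertFrom-DebugLine "[ 5/5/2010 3:12:59 PM ] PODAO::ReadPOData"
$end   = ConvertFrom-DebugLine "[ 5/5/2010 3:12:59 PM duration: 0.00 ] PODAO::ReadPOData"

$start.IsEnd     # False
$end.IsEnd       # True
$end.Method      # PODAO::ReadPOData
```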

Here is an example of the first report:
No. Org No. DateTime            Duration DAO count MethodName
--- ------- --------            -------- --------- ----------
  1     161 5/5/2010 3:12:59 PM     0.38         1 PODAO::GetPOs
  2     162 5/5/2010 3:12:59 PM     0.38         1 PODAO::GetAllPOsByStatusAndOrderByEndDate
  3     164 5/5/2010 3:12:59 PM     0.41         0 POSelectionViewForm::POSelection...
  4     807 5/5/2010 3:13:00 PM     1.56         1 VendorDAO::GetVendorData
  5     808 5/5/2010 3:13:00 PM     1.56         1 VendorDAO::GetVendors
  6    1450 5/5/2010 3:13:02 PM     1.83         2 VendorDAO::GetVendorData
  7    1451 5/5/2010 3:13:02 PM     1.84         2 VendorDAO::GetVendors
  8    1452 5/5/2010 3:13:02 PM     1.84         0 POSelectionViewForm::setupSortedPOList
  9    2575 5/5/2010 3:13:13 PM    14.13         0 POSelectionViewForm::PopulatePOData...
 10    2576 5/5/2010 3:13:13 PM    14.13         0 POSelectionViewForm::POSelection...
 11    2729 5/5/2010 3:13:14 PM     0.16         0 MainSchedulingTool::setupC......

The first column is a sequence number, that is, the line number within the report. The second is similar to the line number in the raw text file. The remaining columns hold the information about each method, such as date and time, duration, count of DAO method calls, and method name.

The following is an example of the second report:
No. MethodName                                DAO count
--- ----------                                ---------
  0 [Total count]                                     6
  1 PODAO::GetAllPOsByStatusAndOrderByEndDate         1
  2 PODAO::GetPOs                                     1
  3 VendorDAO::GetVendorData                          2
  4 VendorDAO::GetVendors                             2
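The counting behind a report like this can be sketched with a hashtable keyed by method name. This is only a sketch under my own assumptions, not the final script: I assume that a line without "duration:" marks the start of a call, and the list of methods of interest ($interested) is hypothetical. The input is just four lines from the raw data above, so the total here is 2 rather than 6.

```powershell
$lines = @(
    "[ 5/5/2010 3:12:58 PM ] PODAO::GetAllPOsByStatusAndOrderByEndDate",
    "[ 5/5/2010 3:12:58 PM ] PODAO::GetPOs",
    "[ 5/5/2010 3:12:59 PM ] PODAO::ReadPOData",
    "[ 5/5/2010 3:12:59 PM duration: 0.00 ] PODAO::ReadPOData"
)

# Hypothetical list of the methods to count.
$interested = @("PODAO::GetAllPOsByStatusAndOrderByEndDate", "PODAO::GetPOs")

$counts = @{}
foreach ($line in $lines) {
    # Assumption: only start-of-call lines (no duration) are counted.
    if ($line -match "duration:") { continue }

    # Drop the leading "[ ... ] " part to get the method name.
    $method = $line -replace '^\[.*?\]\s*', ''

    if ($interested -contains $method) {
        if ($counts.ContainsKey($method)) { $counts[$method] += 1 }
        else { $counts[$method] = 1 }
    }
}

$total = ($counts.Values | Measure-Object -Sum).Sum

# List the counts ordered by method name.
$counts.GetEnumerator() | Sort-Object Key | Format-Table Key, Value -AutoSize
```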