I recently aggregated logs from a group of servers covering the whole month of February. It took a while, but I ended up with a nice CSV file that I was ready to load into Excel to build some Pivot Tables. For more on Pivot Tables, see: Using PowerShell and Excel PivotTables to understand the files on your disk.
However, when I tried to load the CSV file into Excel, I got one of the messages I hate the most: “File not loaded completely”. That means the file I was loading had more than a million rows (1,048,576, to be exact), so it could not fit in a single spreadsheet. Bummer… Looking at the partially loaded file in Excel, I figured I had about 80% of everything in the rows that did load.
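Before splitting, it is worth confirming the row count from PowerShell rather than from Excel. A minimal sketch (it builds a tiny stand-in file here; in practice you would point Get-Content at the real CSV):

```powershell
# Build a small stand-in for the real log (hypothetical contents).
Set-Content .\sample.csv -Value @(
    "Date,Server,Message",
    "02/01/15,web01,OK",
    "02/20/15,web02,OK"
)

# Get-Content streams the file line by line, so this also works on
# files far too large for Excel.
$rows = (Get-Content .\sample.csv | Measure-Object).Count
$rows   # 3, including the header line
```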
In the end, since it was all February data and the date was in the mm/dd/yyyy format, I could just split each line by “/” and take the second item. Strings in PowerShell have a Split method for that. I also needed to convert that item to an integer, since a string comparison would not work (as strings, “22” is less than “3”). Finally, I had to add an encoding option to the Out-File cmdlet, since Out-File writes UTF-16 by default. This preserved the log’s original format, avoided doubling the size of the resulting files, and kept Excel happy.
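To see why the integer cast matters, here is a quick sketch comparing the two behaviors on a hypothetical log line:

```powershell
# As strings, comparison is character by character: '2' sorts before '3',
# so "22" is considered less than "3".
$asString = "22" -lt "3"        # True
$asInt    = [int]"22" -lt 3     # False

# Splitting an mm/dd/yy line on "/" puts the day in the second item.
$line = "02/14/15,server01,logon event"
$day  = [int] $line.Split("/")[1]

"$asString $asInt $day"   # True False 14
```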
Here is what I used to split the log into two files (one with data up to 02/14/15 and the other with the rest of the month):
type .\server.csv | ? { ([int] $_.Split("/")[1]) -lt 15 } | Out-File .\server1.csv -Encoding utf8
type .\server.csv | ? { ([int] $_.Split("/")[1]) -ge 15 } | Out-File .\server2.csv -Encoding utf8
That worked well, but I lost the first line of the log with the column headers. It would be simple enough to edit the files with Notepad (which is surprisingly capable of handling very large log files), but at this point I was trying to find a way to do the whole thing using just PowerShell. The solution was to introduce a line counter variable to add to the filter:
$l=0; type .\server.csv | ? { ($l++ -eq 0) -or ( ([int] $_.Split("/")[1]) -lt 15 ) } | Out-File .\server1.csv -Encoding utf8
$l=0; type .\server.csv | ? { ($l++ -eq 0) -or ( ([int] $_.Split("/")[1]) -ge 15 ) } | Out-File .\server2.csv -Encoding utf8
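To check that the header row actually made it into each output file, you can read just the first line back with Get-Content -TotalCount. A small end-to-end sketch with hypothetical data:

```powershell
# Hypothetical mini log: a header plus one row in each half of the month.
Set-Content .\server.csv -Value @(
    "Date,Server,Message",
    "02/03/15,web01,start",
    "02/20/15,web01,stop"
)

# Same counter trick as above: the first line always passes the filter.
$l = 0
Get-Content .\server.csv |
    ? { ($l++ -eq 0) -or (([int] $_.Split("/")[1]) -lt 15) } |
    Out-File .\server1.csv -Encoding utf8

# -TotalCount 1 reads only the first line, even from a huge file.
Get-Content .\server1.csv -TotalCount 1   # Date,Server,Message
```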
PowerShell was actually quick to process the large CSV file, and the resulting files worked fine with Excel. In case you’re wondering, you could easily adapt the filter to use full dates: split on the comma separator (instead of “/”) and use the datetime type instead of int. I imagine the more complex data type would take a little longer, but I did not measure it. The filter would look like this:
$l=0; type .\server.csv | ? { ($l++ -eq 0) -or ([datetime] $_.Split(",")[0] -gt [datetime] "02/15/2016") } | Out-File .\server1.csv -Encoding utf8
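If you ever do want numbers, Measure-Command times any script block, so you could run the [int] and [datetime] variants of the filter against the same file and compare. A sketch using a small hypothetical sample (point Get-Content at the real server.csv to get meaningful timings):

```powershell
# Hypothetical sample standing in for the large log.
Set-Content .\server.csv -Value @("Date,Server", "02/03/15,web01", "02/20/15,web02")

# Time the integer-based filter.
$intTime = Measure-Command {
    Get-Content .\server.csv | ? { ([int] $_.Split("/")[1]) -lt 15 } | Out-Null
}

# Time the datetime-based filter; -as returns $null for unparsable
# lines (such as the header) instead of throwing.
$dateTime = Measure-Command {
    Get-Content .\server.csv |
        ? { ($_.Split(",")[0] -as [datetime]) -lt [datetime] "02/15/2015" } |
        Out-Null
}

"{0:N2} ms vs {1:N2} ms" -f $intTime.TotalMilliseconds, $dateTime.TotalMilliseconds
```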
Now let me get back to my Pivot Tables…