I have a CSV with several hundred million rows: no headers, tab delimited, about 40 fields. I need to split this CSV into 6 smaller CSVs.
The fields for each smaller CSV will be cherry-picked, as in this example:
Output csv#1 uses fields 1, 4, 23, 16
Output csv#2 uses fields 2, 3, 5, 24, 34
Output csv#3 uses fields 6, 25, 20, 16
Output csv#4 uses fields 10, 29, 31, 32, 34, 35
Etc.
Several of the original fields won't get used.
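That field mapping can be written down as a lookup table, which keeps the "which fields go to which file" decision in one place. This is just a sketch using the example numbers above; the output file names are hypothetical:

```powershell
# Map each output file to the 1-based field numbers it should receive
# (field lists from the example above; file names are placeholders).
$FieldMap = @{
    'output1.csv' = 1, 4, 23, 16
    'output2.csv' = 2, 3, 5, 24, 34
    'output3.csv' = 6, 25, 20, 16
    'output4.csv' = 10, 29, 31, 32, 34, 35
}
```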
I have had the pleasure of reading many other users' posts on splitting a single CSV into multiple CSVs, but haven't stumbled across one that hints at how to specify which fields go into which CSVs.
I am assuming I should use StreamReader/StreamWriter, and "`t" to detect the delimiter. I don't care if I have to run over the source file once per output rather than do it all in a single pass. I expect this to run overnight (or over a weekend!)
I just haven't figured out how to pick each field for the correct output CSV. Thank you for any and all ideas!
-= Bruce
Here is my sample CSV. This sample is comma delimited, NOT tab delimited; I can adapt any code to the tab-delimited version myself.
12/22/2018 16:00,Something,,4503333883,137.11.21.251,113,,Tackov,SO,1968974515,192.168.4312.179,37468,QWE,,172166316,99.0.0.0-99.255.255.255 (STACK),3011,overflow_internet,stack-overflow002-stack.over.flow,stack_over_flow,1,,CISCO,ASA,,US/Eastern,STACK2 Type,,STACK1 Flow,,Forty4Fortnights,,QWE FourceScore,RST,ABC,,AndForty,,Fries,,With That,,Shake,,14012,Lunch ,With Stackoverflow all things are possible,0,7,/dot,/Access,/Dot/Com/Service,/Win,/Fail/Win
12/22/2018 16:00,Something2,stackover,2018138499,120.43.212.240,34749,,Tackov,SO,1968982930,192.168.4345.146,4545,QWE,,172166316,99.0.0.0-99.255.255.255 (STACK),3011,overflow_internet,stack-overflow002-stack.over.flow,stack_over_flow,1,,CISCO,ASA,inside,US/Eastern,STACK2 Type,,STACK1 Flow,,Forty4Fortnights,,QWE FourceScore,,ABC,,AndForty,,Fries,,With That,,Shake,,14012,Lunch ,with Google all things are possible ABC,0,5,/dot,/Access,/Dot/Com/Service,/Win,/Fail/Win
12/22/2018 16:00,Something3,Overflows,3203409083,12.45.36.187,60170,,Tackov,SO,1968976181,192.168.4319.53,4545,QWE,,172166316,99.0.0.0-99.255.255.255 (STACK),3011,overflow_internet,stack-overflow002-stack.over.flow,stack_over_flow,1,,CISCO,ASA,inside,US/Eastern,STACK2 Type,,STACK1 Flow,,Forty4Fortnights,,QWE FourceScore,,ABC,,AndForty,,Fries,,With That,,Shake,,14012,Lunch ,with Google all things are possible ABC,0,5,/dot,/Access,/Dot/Com/Service,/Win,/Fail/Win
12/22/2018 16:00,Something4,,1390242054,82.221.105.6,59922,,Tackov,SO,1968986785,192.168.4360.161,88,QWE,,172166316,99.0.0.0-99.255.255.255 (STACK),3011,overflow_internet,stack-overflow002-stack.over.flow,stack_over_flow,1,,CISCO,ASA,inside,US/Eastern,STACK2 Type,,STACK1 Flow,,Forty4Fortnights,,QWE FourceScore,,ABC,,AndForty,,Fries,,With That,,Shake,,14012,Lunch ,with Google all things are possible ABC,0,5,/dot,/Access,/Dot/Com/Service,/Win,/Fail/Win
12/22/2018 16:00,Something7,Ackover,1245693210,11.63.197.26,60300,,Tackov,SO,1968982959,192.168.4345.175,99,QWE,,172166316,99.0.0.0-99.255.255.255 (STACK),3011,overflow_internet,stack-overflow002-stack.over.flow,stack_over_flow,1,,CISCO,ASA,inside,US/Eastern,STACK2 Type,,STACK1 Flow,,Forty4Fortnights,,QWE FourceScore,,ABC,,AndForty,,Fries,,With That,,Shake,,14012,Lunch ,with Google all things are possible ABC,0,5,/dot,/Access,/Dot/Com/Service,/Win,/Fail/Win
12/22/2018 16:00,Something2,Koverflo,3026283045,199.97.106.37,47861,,Tackov,SO,1968982790,192.168.4345.6,443,QWE,,172166316,99.0.0.0-99.255.255.255 (STACK),3011,overflow_internet,stack-overflow002-stack.over.flow,stack_over_flow,1,,CISCO,ASA,inside,US/Eastern,STACK2 Type,,STACK1 Flow,,Forty4Fortnights,,QWE FourceScore,,ABC,,AndForty,,Fries,,With That,,Shake,,14012,Lunch ,with Google all things are possible ABC,0,5,/dot,/Access,/Dot/Com/Service,/Win,/Fail/Win
In case the data doesn't copy paste well, each row ends with the chars "/Fail/Win"
=-=-=-=-=-=-=-=-==- 7 hours later.....
I think I have everything going except for one tiny detail: I haven't been able to split out my columns while reading and writing with this method. My test file now reads and writes in 29.1 seconds. YAY! But all it does is read the file and write it back out, which is not terribly effective. The variable $Header, per the 'Tessalating Solution', contains the headers I want to split off in my test code. I haven't found how to specify a delimiter, and then where to apply the $Header variable. Thank you for any help.
Clear-Host
$sw = New-Object System.Diagnostics.Stopwatch
$sw.Start()
# Column names I want to split off (per the 'Tessalating Solution'),
# intended for use like: Select-Object -Property $Header
$Header = 'SourceGeoCountryName','SourceGeoLocationInfo','SourceAddress','SourceHostName','SourcePort','SourceDnsDomain'
$MyPath = "C:\Users\myusername\SplitOuts\"
$InputFile = "mysourcefile.csv"
$OutputFileName = "table1.csv"
$FileName = $MyPath + $InputFile
$OutputFile = $MyPath + $OutputFileName
Set-Location $MyPath
$reader = $null
If (Test-Path $OutputFile) { Clear-Content $OutputFile }
$stream = [System.IO.StreamWriter] $OutputFile
# For now this just copies every line verbatim; the column split still needs to go here.
ForEach ($line in [System.IO.File]::ReadLines($FileName)) {
    $stream.WriteLine($line)
}
$stream.Close()
$sw.Stop()
Write-Host "Completed in" $sw.Elapsed.TotalSeconds.ToString("00.0") "seconds" -ForegroundColor Yellow
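For what it's worth, here is a minimal sketch of how the column split could be wired into that loop: split each line on the delimiter, index into the resulting array (note the indexes are 0-based), and re-join the chosen columns. The paths and column numbers are placeholders, and a naive Split will mis-handle quoted delimiters, which shouldn't matter for a tab-delimited file without embedded tabs:

```powershell
# Sketch: stream a delimited file and write selected columns to one output.
$FileName   = 'C:\Users\myusername\SplitOuts\mysourcefile.csv'   # placeholder
$OutputFile = 'C:\Users\myusername\SplitOuts\table1.csv'         # placeholder
$Delimiter  = ','                  # use "`t" for the tab-delimited source
$Columns    = 0, 3, 22, 15         # 0-based equivalents of fields 1, 4, 23, 16

$stream = [System.IO.StreamWriter] $OutputFile
foreach ($line in [System.IO.File]::ReadLines($FileName)) {
    $fields = $line.Split($Delimiter)
    # Pick the wanted columns and re-join them with the same delimiter
    $stream.WriteLine(($fields[$Columns] -join $Delimiter))
}
$stream.Close()
```

Indexing a PowerShell array with another array ($fields[$Columns]) returns the elements at those positions in the given order, so the output column order follows $Columns, not the source order.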
-= Bruce
Comments:
Import-Csv, data munging and Export-Csv. However, with files that size loading everything into memory becomes prohibitive. But dropping down to manual broken parsing feels weird, too ... :-| - Joey
How long does Import-Csv take to import the file once? Measure-Command { Import-Csv "file/path" -Delimiter "yourdelimiter" } - sodawillow
(... Get-Content | Select -Skip $x -First $n | ConvertFrom-Csv | % { <# split data and Export-Csv -Append #> }) - Mathias R. Jessen
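The batched pipeline from that last comment could be fleshed out along these lines. The header names, batch size, and paths are illustrative assumptions (the real file has no header row, so ConvertFrom-Csv needs -Header to name the ~40 columns):

```powershell
# Sketch: process the headerless file in batches of 1000 lines,
# naming the columns generically, then project and append.
$FileName   = 'C:\Users\myusername\SplitOuts\mysourcefile.csv'   # placeholder
$OutputFile = 'C:\Users\myusername\SplitOuts\table1.csv'         # placeholder
$AllHeaders = 1..40 | ForEach-Object { "Field$_" }               # placeholder names
$Wanted     = 'Field1', 'Field4', 'Field23', 'Field16'           # fields 1, 4, 23, 16

Get-Content $FileName -ReadCount 1000 | ForEach-Object {
    $_ | ConvertFrom-Csv -Delimiter "`t" -Header $AllHeaders |
        Select-Object -Property $Wanted |
        Export-Csv $OutputFile -NoTypeInformation -Append
}
```

-ReadCount 1000 makes Get-Content emit arrays of 1000 lines at a time, so memory stays bounded; the trade-off is that ConvertFrom-Csv/Export-Csv object churn will be slower per row than the raw Split approach above.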