I try to scrape emailaddresses with Powershell from a directory, with subdirectories and within them .txt files. So i have this code:
$input_path = ‘C:\Users\Me\Documents\toscrape’
$output_file = ‘C:\Users\Me\Documents\toscrape\output.txt’
$regex = ‘\b[A-Za-z0-9._%-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}\b’
select-string -Path $input_path -Pattern $regex -AllMatches | % { $_.Matches } | % { $_.Value } > $output_file
But when I execute it, it gives me an error
select-string : The file C:\Users\Me\Documents\toscrape\ can not be read: Could not
path 'C:\Users\Me\Documents\toscrape\'.
At line:1 char:1
+ select-string -Path $input_path -Pattern $regex -AllMatches | % { $_.Matches } | ...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : InvalidArgument: (:) [Select-String], ArgumentException
+ FullyQualifiedErrorId : ProcessingFile,Microsoft.PowerShell.Commands.SelectStringCommand
I've tried variations to the $input_path, with Get-Item, Get-ChildItem, -Recurse, but nothing seems to work. Can anyone figure out how I need to scrape my location and all its subdirectories and files for the regex pattern?
Get-ChildItem -Path $input_path -Include "*.txt" -Recurse
– boxdog