1
votes

I've had a really hard time tracking down a solution for this--though I'm sure it's out there. Just not sure of the exact wording to get what I'm looking for.

I have a huge data set where some of the data is missing information so it is not uniform. I want to extract just the name into one column and the e-mail in to the next column.

The best way I can narrow this down is there is a space between each unique entry with the name always being in the first box.

Example:

John Doe
John Doe's Company
(555) 555-5555
[email protected]

John Doe
(555) 555-5555

John Doe
Jane Doe's Company
[email protected]

With the results wanted being (in two excel columns):

John Doe | [email protected]
John Doe |
John Doe | [email protected]

Any suggestions on the best way to do this would be appreciated it. To make it complicated if there was no e-mail I would want to ignore that set completely, but I could just manually check.

2
Suggest you export to CSV and then write a fairly simple python program to read the CSV and produce your required output. I'm sure tis is also possible in an Excel macro using VBA,DisappointedByUnaccountableMod

2 Answers

0
votes

VBA coding:
1. Indicate in Row1 the initial row where the data begins.
2. Place a flag in this case the word "end" to indicate the end of the information.
3. Create a second sheet

Sub ToList()
Row1 = 1 'Row initial from data
Row2 = 1 'Row initial to put list
Do
    Name = False
    Do
        field = Trim(Sheets(1).Cells(Row1, 1))
        If field <> "" And LCase(field) <> "end" And Not Name Then
            Sheets(2).Cells(Row2, 1) = field
            Name = True
        End If
        Row1 = Row1 + 1
    Loop Until (IIf(field = "" Or LCase(field) = "end", True, False))
    fieldprev = Sheets(1).Cells(Row1 - 2, 1)
    If InStr(fieldprev, "@") > 0 Then
        Sheets(2).Cells(Row2, 2) = fieldprev
    End If
    Row2 = Row2 + 1
Loop Until (IIf(LCase(field) = "end", True, False))
End Sub
0
votes

Extracting the e-mail address shouldn't be too difficult as you just need to is search for a string containing the @ character. A series of search() and mid() functions can be used to separate out the individual words. Search for each instance of a space and use that value in a mid() function. Then search for @ in the results and you should find the e-mail address. Extracting the name will be more difficult if the original data is very messy.

However I second the comment above about using an external script, especially for a large dataset. Excel isn't really designed for the sort of thing you are describing here.