0 votes

I have a text file with multi-line rows, delimited by a blank line. What would be the best way to read it row by row in Go?

I think I may have to use a Scanner with my own split function, but I am wondering whether there is a better or easier way that I am missing.

I have tried using my own SplitFunc based on bufio.ScanLines:

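import "bytes"

func MyScanLines(data []byte, atEOF bool) (advance int, token []byte, err error) {
    if atEOF && len(data) == 0 {
        return 0, nil, nil
    }
    if i := bytes.IndexAny(data, "\n\n"); i >= 0 {
        return i + 1, dropCR(data[0:i]), nil
    }
    if atEOF {
        return len(data), dropCR(data), nil
    }
    return 0, nil, nil
}

// dropCR drops a terminal \r from the data
// (a copy of the unexported helper in the bufio package).
func dropCR(data []byte) []byte {
    if len(data) > 0 && data[len(data)-1] == '\r' {
        return data[0 : len(data)-1]
    }
    return data
}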

At first I got an error on the IndexAny call: "syntax error: unexpected semicolon or newline, expecting )". (Fixed now.)

Update: I fixed the syntax error above as suggested, but only the first line is returned. I am reading the file as follows:

scanner.Split(MyScanLines)
scanner.Scan()
fmt.Println(scanner.Text())

Any suggestions?

Example of the test file I am trying to read:

Name = "John"
Surname = "Smith"
Val1 = 700
Val2 = 800

Name = "Pete"
Surname = "Jones"
Val1 = 555
Val2 = 666
Val3 = 444

...
Please provide a sample of the file that you are trying to read. - Prashant Thakkar
@PrashantThakkar The example is now in the original post. Some value pairs may be present in one record and not in others, and their order is not fixed either. - Kosie
Thanks. The error you are getting clearly says that a ")" is missing. Corrected: if i := bytes.IndexAny(data, "\n\n"); i >= 0 { - Prashant Thakkar
@PrashantThakkar Ah no!! I have been staring at that code and did not pick it up. Ugh. Thanks for that. Is the way I am doing it the recommended way? - Kosie

4 Answers

2 votes

Here is an alternative approach that uses bufio.Reader. The logic is similar to Elwiner's answer.

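The myReadLine function below uses a bufio.Reader to read the next multi-line entry from the file.

func myReadLine(reader *bufio.Reader) (lines []string, err error) {
  for {
    var line []byte
    // Assign to the named result err (no :=), so that io.EOF and read
    // errors reach the caller instead of being shadowed.
    // isPrefix is ignored: lines longer than the buffer come back in pieces.
    line, _, err = reader.ReadLine()
    if err != nil || len(line) == 0 {
      break // end of input, read error, or the blank line ending the entry
    }
    lines = append(lines, string(line))
  }
  return lines, err
}

The following code shows how to use it:

reader := bufio.NewReader(file)
for {
    lines, err := myReadLine(reader)
    // Print the entry even when err is io.EOF: the last entry
    // may not be followed by a blank line.
    if len(lines) > 0 {
        fmt.Println(lines)
    }
    if err != nil {
        break // io.EOF at the end of the file, or a read error
    }
}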
2 votes

Your way works, but I would advise using a bufio.Scanner with its default split function (bufio.ScanLines), which scans line by line. Then you just read your file line by line and populate your struct. When you encounter a blank line, append the struct to a slice and start a new one.

Here is an example, taken from one of my open-source projects, that demonstrates it (it collects each block as a []string rather than a struct):

// data holds the whole input as a string.
buffer := [][]string{}
block := []string{}
scanner := bufio.NewScanner(strings.NewReader(data))
for scanner.Scan() {
    l := scanner.Text()

    if len(strings.TrimSpace(l)) != 0 {
        block = append(block, l)
        continue
    }

    // At this point, the script has reached an empty line,
    // which means the block is ready to be processed.
    // If the block is not empty, append it to the buffer and empty it.
    if len(block) != 0 {
        buffer = append(buffer, block)
        block = []string{}
    }
}

// The data may not end with an empty line, so flush the last block.
if len(block) != 0 {
    buffer = append(buffer, block)
}
1 vote

Broken down into steps. First, understand scanning and make sure that it works:

package main

import (
    "bufio"
    "fmt"
    "strings"
)

func main() {
    scanner := bufio.NewScanner(strings.NewReader(data))
    for scanner.Scan() {
        l := scanner.Text()
        fmt.Println(l)
    }
}

var data = `
Name = "John"
Surname = "Smith"
Val1 = 700
Val2 = 800

Name = "Pete"
Surname = "Jones"
Val1 = 555
Val2 = 666
Val3 = 444
`

Here is the code on the Go playground.

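Next, gather the data you need into a slice. scanner.Scan() returns false once it reaches the end of the file (EOF) or hits an error, so a block that is not followed by a blank line is still pending when the loop ends and has to be appended after it. This is what I came up with, and it works: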

package main

import (
    "bufio"
    "fmt"
    "strings"
)

func main() {
    buffer := [][]string{}
    block := []string{}
    scanner := bufio.NewScanner(strings.NewReader(data))
    for scanner.Scan() {
        l := scanner.Text()

        // Non-empty line: it belongs to the current block.
        if len(l) != 0 {
            block = append(block, l)
            continue
        }

        // Empty line: the current block (if any) is complete.
        if len(block) != 0 {
            buffer = append(buffer, block)
            block = []string{}
        }
    }

    // Scan has returned false (EOF or error); store the last block
    // if the data did not end with an empty line.
    if len(block) != 0 {
        buffer = append(buffer, block)
    }

    fmt.Println("PRINTING BUFFER - END OF PROGRAM - ALL DATA PROCESSED:", buffer)
}

var data = `
Name = "John"
Surname = "Smith"
Val1 = 700
Val2 = 800

Name = "Pete"
Surname = "Jones"
Val1 = 555
Val2 = 666
Val3 = 444
`

Here is the code on the playground.

1 vote

bufio.Scanner's Scan() returns false at EOF. We return a second 'ok' value, so the caller can tell whether we have hit the end of the input.

It is best to accumulate the record in a slice of strings and join it at the end. The obvious approach, appending each line in turn to the result string, works but is O(n^2) in the number of lines, because every concatenation copies the whole string built so far.

Putting it all together:

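// ReadBlock returns the next run of non-empty lines from scanner,
// joined with spaces. ok is false once the input is exhausted.
func ReadBlock(scanner *bufio.Scanner) (string, bool) {
    var o []string
    if !scanner.Scan() {
        return "", false
    }

    // Collect lines until a blank line or the end of the input.
    for len(scanner.Text()) > 0 {
        o = append(o, scanner.Text())
        if !scanner.Scan() {
            break
        }
    }
    return strings.Join(o, " "), true
}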

https://play.golang.org/p/C_fB8iaYJo

P.S. Looking at your input, I suspect you would want to return the result as a map rather than a concatenated string.