I'm writing a small web app in Go, and part of it involves parsing a file uploaded by the user. I'd like to auto-detect whether the file is gzipped and create readers/scanners appropriately. One twist: I can't read the whole file into memory; I can only operate on the stream. Here's what I've got:
func scannerFromFile(reader io.Reader) (*bufio.Scanner, error) {
	var scanner *bufio.Scanner
	// Create a bufio.Reader so we can 'peek' at the first few bytes
	bReader := bufio.NewReader(reader)
	testBytes, err := bReader.Peek(64) // read a few bytes without consuming
	if err != nil {
		return nil, err
	}
	// Detect if the content is gzipped
	contentType := http.DetectContentType(testBytes)
	// If we detect gzip, make a gzip reader, then wrap it in a scanner
	if strings.Contains(contentType, "x-gzip") {
		gzipReader, err := gzip.NewReader(bReader)
		if err != nil {
			return nil, err
		}
		scanner = bufio.NewScanner(gzipReader)
	} else {
		// Not gzipped, just make a scanner based on the reader
		scanner = bufio.NewScanner(bReader)
	}
	return scanner, nil
}
This works fine for plain text, but for gzipped data it inflates incorrectly: after a few KB the output is inevitably garbled. Is there a simpler method out there? Any ideas why it decompresses incorrectly after a few thousand lines?
Use contentType == "application/x-gzip" instead of strings.Contains. – Cerise Limón