This is my favorite way of going through a file: a simple, native solution for a progressive read (as in not a "slurp" or all-in-memory read) with modern async/await. It's a solution I find "natural" when processing large text files, without having to resort to the readline package or any non-core dependency.
const fs = require('fs');

let buf = '';
for await (const chunk of fs.createReadStream('myfile', 'utf8')) { // 'utf8' keeps multi-byte characters intact across chunk boundaries
  const lines = buf.concat(chunk).split(/\r?\n/);
  buf = lines.pop(); // keep the trailing partial line for the next chunk
  for (const line of lines) {
    console.log(line);
  }
}
if (buf.length) console.log(buf); // last line, if the file does not end with a newline
You can adjust the encoding in fs.createReadStream or use chunk.toString(<arg>). This also lets you fine-tune the line splitting to your taste, e.g. use .split(/\n+/) to skip empty lines, and you can control the chunk size with { highWaterMark: <chunkSize> }.
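For example, a quick sketch pulling those options together (the 64 KiB chunk size is an arbitrary pick, and the async wrapper is only there so the snippet runs as-is under CommonJS):
const fs = require('fs');

(async () => {
  let buf = '';
  const stream = fs.createReadStream('myfile', {
    encoding: 'utf8',          // decode chunks to strings instead of calling chunk.toString()
    highWaterMark: 64 * 1024   // read in 64 KiB chunks
  });
  for await (const chunk of stream) {
    const lines = buf.concat(chunk).split(/\n+/); // /\n+/ also drops empty lines
    buf = lines.pop();
    for (const line of lines) console.log(line);
  }
  if (buf.length) console.log(buf);
})();
Note that /\n+/ leaves any \r from Windows line endings attached to the lines, so stick with /\r?\n/ if that matters to you.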
Don't forget to create a function like processLine(line) to avoid repeating the line-processing code twice because of the final buf leftover. Unfortunately, the ReadStream instance does not update its end-of-file flags in this setup, so there's no way, afaik, to detect from within the loop that we're on the last iteration without some more verbose tricks, like comparing the file size from fs.stat() with the stream's .bytesRead. Hence the final buf processing, unless you're absolutely sure your file ends with a newline \n, in which case the for await loop alone should suffice.
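Something along these lines, where processLine is just an illustrative name for whatever per-line work you actually do:
const fs = require('fs');

function processLine(line) {
  console.log(line); // placeholder for the real per-line work
}

(async () => {
  let buf = '';
  for await (const chunk of fs.createReadStream('myfile', 'utf8')) {
    const lines = buf.concat(chunk).split(/\r?\n/);
    buf = lines.pop();
    for (const line of lines) processLine(line);
  }
  if (buf.length) processLine(buf); // same handling for a missing trailing newline
})();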
★ If you prefer the evented asynchronous version, this would be it:
let buf = '';
fs.createReadStream('myfile', 'utf8') // same encoding note as above
  .on('data', chunk => {
    const lines = buf.concat(chunk).split(/\r?\n/);
    buf = lines.pop();
    for (const line of lines) {
      console.log(line);
    }
  })
  .on('end', () => buf.length && console.log(buf));
★ Now if you don't mind importing the stream
core package, then this is the equivalent piped stream version, which allows for chaining transforms like gzip decompression:
const fs = require('fs');
const { Writable } = require('stream');

let buf = '';
fs.createReadStream('myfile', 'utf8')
  .pipe(new Writable({
    decodeStrings: false, // chunks already arrive as utf8 strings; skip the Buffer round-trip
    write: (chunk, enc, next) => {
      const lines = buf.concat(chunk).split(/\r?\n/);
      buf = lines.pop();
      for (const line of lines) {
        console.log(line);
      }
      next();
    }
  }))
  .on('finish', () => buf.length && console.log(buf));
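For example, assuming a gzip-compressed input named myfile.gz (the filename is only for illustration), the decompression transform slots in between the read stream and the line-splitting sink:
const fs = require('fs');
const zlib = require('zlib');
const { Writable } = require('stream');

let buf = '';
const gunzip = zlib.createGunzip();
gunzip.setEncoding('utf8'); // decode the inflated bytes to strings, multi-byte safe

fs.createReadStream('myfile.gz')
  .pipe(gunzip)
  .pipe(new Writable({
    decodeStrings: false,
    write: (chunk, enc, next) => {
      const lines = buf.concat(chunk).split(/\r?\n/);
      buf = lines.pop();
      for (const line of lines) console.log(line);
      next();
    }
  }))
  .on('finish', () => buf.length && console.log(buf));
Setting the encoding on the gunzip stream (rather than on the file stream) matters here, because it's the decompressed bytes that have to be decoded into text.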
fs.readSync(). You can read binary octets into a buffer, but there's no easy way to deal with partial UTF-8 or UTF-16 characters without inspecting the buffer before translating it to JavaScript strings and scanning for EOLs. The Buffer() type doesn't have as rich a set of functions to operate on its instances as native strings, but native strings cannot contain binary data. It seems to me that lacking a built-in way to read text lines from arbitrary filehandles is a real gap in node.js. – hippietrail
if (line.length == 1 && line[0] == 48) special(line); – Thabo
node's API docs: github.com/nodejs/node/pull/4609 – eljefedelrodeodeljefe