Problem:
We have tried to use Spanner runStream to migrate part of our data (millions of rows) with a specific query (the select query joins more than one table):
spannerDatabase
  .runStream({
    sql,
    params,
    json: true
  })
But after running this in a Cloud Function (256 MB memory) I got Error: memory limit exceeded.
Investigation
After investigating I found that @google-cloud/spanner uses partialResultStream, where maxQueued is 10, and there seem to be no parameters to control the chunk size or the number of chunks (each chunk response contains close to 100 000 field values).
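The only throttling available to us through the public API seems to be ordinary Node.js stream backpressure, roughly like this (a minimal sketch; processRow is a placeholder for our own async handler):

const rowStream = spannerDatabase.runStream({ sql, params, json: true });

rowStream.on('data', (row) => {
  // Stop pulling new chunks while this row is being processed
  // (processRow is a placeholder, not part of the library).
  rowStream.pause();
  processRow(row)
    .then(() => rowStream.resume())
    .catch((err) => rowStream.destroy(err));
});
rowStream.on('error', (err) => console.error(err));
rowStream.on('end', () => console.log('done'));

Even then, up to maxQueued (10) chunks can already be buffered internally, and with roughly 100 000 field values per chunk a lot of data sits in memory no matter how slowly we consume the rows.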
At the moment we are trying to use the client.executeStreamingSql method instead, but that requires a lot of our own implementation (preparing the params Struct, parsing chunks the way PartialResultStream does, and so on):
client.executeStreamingSql({
  session: session.id,
  sql,
  paramTypes,
  params,
}, {
  pageSize: 1000,
});
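To illustrate what that implementation involves, here is a rough sketch (the @lastId parameter and handleValues are made up for illustration, and the merging of chunked values across messages is omitted):

// Query parameters have to be encoded as a google.protobuf.Struct,
// with the types declared separately in paramTypes.
const request = {
  session: session.id,
  sql: 'select * from user where id > @lastId',
  params: { fields: { lastId: { stringValue: '1000' } } },
  paramTypes: { lastId: { code: 'STRING' } },
};

const rawStream = client.executeStreamingSql(request);

rawStream.on('data', (partialResultSet) => {
  // `values` is a flat, row-major list of column values; when `chunkedValue`
  // is true, the last value continues in the next message, so the consumer
  // has to merge them itself (this is what PartialResultStream normally does).
  handleValues(partialResultSet.values, partialResultSet.chunkedValue);
});
rawStream.on('error', (err) => console.error(err));
rawStream.on('end', () => console.log('stream finished'));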
Expectation
We need to decrease the memory consumption when reading from Spanner.
UPDATED:
Here is an example of using the Spanner stream (we are trying to write the result into a file):
const { Spanner } = require("@google-cloud/spanner");
const fs = require("fs");
const util = require('util');
const stream = require('stream');
const through = require('through');

const pipeline = util.promisify(stream.pipeline);

const spanner = {
  "instanceId": "..",
  "databaseId": "..",
  "projectId": ".."
};

const filePath = "./users.data.txt";
const keyFilename = "service-account.json";

const spannerDatabase = new Spanner({
  projectId: spanner.projectId,
  keyFilename,
})
  .instance(spanner.instanceId)
  .database(spanner.databaseId);

// Row stream over the whole table; gaxOptions.pageSize does not appear to
// limit the size of the streamed chunks.
const spannerStream = spannerDatabase.runStream({
  sql: 'select * from user',
  json: true,
  gaxOptions: {
    pageSize: 1000,
  },
});

const run = async () => {
  // Serialize each row as one JSON line and write it straight to the file.
  await pipeline(
    spannerStream,
    through(function (data) {
      return this.queue(`${JSON.stringify(data)}\n`);
    }),
    fs.createWriteStream(filePath)
  );
  process.exit(0);
};

run();
Memory usage grows dramatically while the data is being consumed. The Spanner user table contains close to 100 000 rows.
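This is how we watch the growth, by logging process.memoryUsage() inside the transform (a small diagnostic variant of run() above; logging every 1 000 rows is an arbitrary choice):

const run = async () => {
  let rows = 0;
  await pipeline(
    spannerStream,
    through(function (data) {
      rows += 1;
      if (rows % 1000 === 0) {
        // In our runs heapUsed keeps climbing instead of levelling off.
        const { heapUsed, rss } = process.memoryUsage();
        console.log(`rows=${rows} heapUsed=${Math.round(heapUsed / 1048576)}MB rss=${Math.round(rss / 1048576)}MB`);
      }
      return this.queue(`${JSON.stringify(data)}\n`);
    }),
    fs.createWriteStream(filePath)
  );
};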