Since there seems to be no import functionality in the Datastore emulator, you can build your own.
It is as simple as creating two clients within your script: one for the remote (Cloud) Datastore, and one for the local Datastore emulator. Since the Cloud Client Libraries support the emulator, you can dig into their code to see how to establish the connection properly.
I did exactly that with the Go Cloud Client Libraries, and came up with this script:
package main

import (
	"context"
	"fmt"
	"os"
	"time"

	"cloud.google.com/go/datastore"
	"google.golang.org/api/iterator"
	"google.golang.org/api/option"
	"google.golang.org/grpc"
)

const (
	projectId    = "<PROJECT_ID>"
	namespace    = "<NAMESPACE>"
	kind         = "<KIND>"
	emulatorHost = "<EMULATOR_HOST>:<EMULATOR_PORT>"
)

func main() {
	ctx := context.Background()

	// Create the Cloud Datastore client
	remoteClient, err := datastore.NewClient(ctx, projectId, option.WithGRPCConnectionPool(50))
	if err != nil {
		fmt.Fprintf(os.Stderr, "Could not create remote datastore client: %v\n", err)
		os.Exit(1)
	}

	// Create the local Datastore emulator client
	o := []option.ClientOption{
		option.WithEndpoint(emulatorHost),
		option.WithoutAuthentication(),
		option.WithGRPCDialOption(grpc.WithInsecure()),
		option.WithGRPCConnectionPool(50),
	}
	localClient, err := datastore.NewClient(ctx, projectId, o...)
	if err != nil {
		fmt.Fprintf(os.Stderr, "Could not create local datastore client: %v\n", err)
		os.Exit(1)
	}

	// Create the query
	q := datastore.NewQuery(kind).Namespace(namespace)

	// Run the query and handle the received entities
	start := time.Now() // Only used to calculate the rate
	for it, i := remoteClient.Run(ctx, q), 1; ; i++ {
		x := &arbitraryEntity{}

		// Get the next entity
		key, err := it.Next(x)
		if err == iterator.Done {
			break
		}
		if err != nil {
			fmt.Fprintf(os.Stderr, "Error retrieving entity: %v\n", err)
			os.Exit(1)
		}

		// Insert the entity into the emulator
		if _, err := localClient.Put(ctx, key, x); err != nil {
			fmt.Fprintf(os.Stderr, "Error saving entity: %v\n", err)
		}

		// Print stats (guard against dividing by a zero-second duration)
		if elapsed := time.Since(start).Seconds(); elapsed > 0 {
			fmt.Fprintf(os.Stdout, "\rCopied %d entities. Rate: %.0f/s", i, float64(i)/elapsed)
		}
	}
	fmt.Fprintln(os.Stdout)
}

// arbitraryEntity can load and save any kind of entity:
// it implements the datastore.PropertyLoadSaver interface.
type arbitraryEntity struct {
	properties []datastore.Property
}

func (e *arbitraryEntity) Load(ps []datastore.Property) error {
	e.properties = ps
	return nil
}

func (e *arbitraryEntity) Save() ([]datastore.Property, error) {
	return e.properties, nil
}
With this, I'm getting a rate of ~700 entities/s, but it can vary a lot depending on the entities you have.
Do not set the DATASTORE_EMULATOR_HOST environment variable: the script creates the connection to the local emulator manually, and you want the library to connect automatically to the Cloud Datastore.
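As a sketch of the intended workflow (the port and the script filename are illustrative, not from the answer): start the emulator in one terminal, then run the script in another without the variable set, so the remote client dials the real Cloud Datastore:

```shell
# Terminal 1: start the local emulator (port is illustrative)
gcloud beta emulators datastore start --host-port=localhost:8081

# Terminal 2: make sure the variable is NOT set before running the
# script, then run it ("copy.go" is a placeholder for the script above)
unset DATASTORE_EMULATOR_HOST
go run copy.go
```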
The script could be greatly improved: both the remote and the local clients use gRPC, so you could use some proto-magic to avoid encoding and decoding the messages. Batching the uploads would also help, as would taking advantage of Go's concurrency. You could even retrieve the namespaces and kinds programmatically, so you don't need to run the script once per namespace and kind.
However, I think this simple proof of concept can help you understand how to develop your own import tool.
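For the batching idea, a minimal sketch: buffer the fetched keys and entities and flush them with the client's PutMulti, which accepts at most 500 entities per call. The helper below only computes the batch boundaries (the `batchBounds` name is mine; the PutMulti calls are shown as illustrative output, not executed):

```go
package main

import "fmt"

// batchBounds returns the [start, end) index pairs that split n buffered
// entities into batches of at most size entries each. Datastore commits
// are limited to 500 mutations, so size would typically be 500.
func batchBounds(n, size int) [][2]int {
	var bounds [][2]int
	for start := 0; start < n; start += size {
		end := start + size
		if end > n {
			end = n
		}
		bounds = append(bounds, [2]int{start, end})
	}
	return bounds
}

func main() {
	// 1200 buffered entities split into PutMulti-sized batches:
	// this would replace the per-entity localClient.Put in the loop above.
	for _, b := range batchBounds(1200, 500) {
		fmt.Printf("localClient.PutMulti(ctx, keys[%d:%d], entities[%d:%d])\n",
			b[0], b[1], b[0], b[1])
	}
}
```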
There are gcloud datastore commands to work with the local datastore emulator (see this answer), so it might be possible to import the entities into the emulator that way. Give it a try (I didn't actually try it; I'm unsure whether in that mode the gcloud command is able to access the exported file in GCS). – Dan Cornilescu
Regarding /v1/projects/<project_id>:[import|export]: apparently, the emulator only implements a subset of the Cloud Datastore Data API methods, and the import/export features are from the Admin API. – Jofre
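Following up on that endpoint: newer emulator versions do expose import/export over their local REST API. A hedged sketch (host, port, and paths are illustrative; the export metadata file must be on the local filesystem, since the emulator cannot read it from GCS):

```shell
# Illustrative: import a locally stored export into the running emulator.
curl -X POST "localhost:8081/v1/projects/<PROJECT_ID>:import" \
  -H 'Content-Type: application/json' \
  -d '{"input_url": "/path/to/export/<EXPORT_FILE>.overall_export_metadata"}'
```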