4
votes

I have an issue trying to create a table from a datastore backup file (in Cloud Storage).

The issue occurs with one specific entity kind. I have about 20 entity kinds in my App Engine datastore. If I create a datastore backup for all the entity kinds, I can import almost all of the corresponding {EntityName}.backup_info files into BigQuery without any problem, either using the BigQuery UI (the create-table functionality) or using the API via this nice Python package: https://github.com/tylertreat/BigQuery-Python
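
For reference, this is essentially the kind of load job I'm running (a minimal sketch via the raw BigQuery REST API rather than that package; the project, dataset, table and bucket names are placeholders):

    from googleapiclient.discovery import build
    from oauth2client.client import GoogleCredentials

    credentials = GoogleCredentials.get_application_default()
    bigquery = build('bigquery', 'v2', credentials=credentials)

    # Load job that imports a datastore backup; BigQuery derives the
    # table schema automatically from the .backup_info file.
    job_body = {
        'configuration': {
            'load': {
                'sourceFormat': 'DATASTORE_BACKUP',
                'sourceUris': [
                    'gs://my-backup-bucket/path/to/EntityKind.backup_info',
                ],
                'destinationTable': {
                    'projectId': 'my-project',
                    'datasetId': 'my_dataset',
                    'tableId': 'EntityKind',
                },
            }
        }
    }

    bigquery.jobs().insert(projectId='my-project', body=job_body).execute()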

The issue arises with the backup for one specific entity kind. When I try to import it via either the UI or the API, I get the following job load error:

Field {field_name} already exists in schema

I'm at a loss as to how to solve this. I'd be happy to provide the URL of the .backup_info file in Cloud Storage and grant permission to someone at Google to troubleshoot the problem.

Further investigation:

In the datastore statistics (Breakdown by Property and Composite Indexes), I see that the property Credit, which is the one producing the error, appears twice in the schema:

    Credit    Key     35.23 KB     173.94 KB    0
    Credit    NULL    501.34 KB    6.77 MB      2

The model, once upon a time, had a key property, but it was removed, so presumably there are still some stored entities carrying that property. I don't know what the other entry, Credit NULL, is about.

It seems the root of the problem is that I need to fix this duplication, but it isn't clear how. I could re-save all the entities of this kind (about 50K) with the map phase of a MapReduce job, as sketched below. Is that a viable solution? I don't see a way to change the "schema" directly; it seems to be generated automatically.
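
Something like this map function is what I have in mind (a sketch assuming the App Engine mapreduce library with a DatastoreInputReader over this kind):

    from mapreduce import operation as op

    def resave(entity):
        # Re-putting the entity rewrites it under the current model
        # definition, which should drop properties that are no longer
        # declared on the model.
        yield op.db.Put(entity)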


2 Answers

3
votes

This is a known BigQuery issue in the datastore schema translation. We're working on a fix. Unfortunately, I don't know of a workaround other than moving or renaming the entities that cause the conflict.

0
votes

Problem solved by running a map job that re-saved all the entities of this kind, explicitly setting a value for the problematic property: my_property_name = None

After that, the import job into BigQuery worked!
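
The map function looked essentially like this (a sketch, assuming the App Engine mapreduce library; my_property_name stands in for the real property):

    from mapreduce import operation as op

    def fix_entity(entity):
        # Explicitly setting the property forces a single, consistent
        # value/type for it across all entities of the kind, so the
        # backup no longer reports two conflicting schema entries.
        entity.my_property_name = None
        yield op.db.Put(entity)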