Extract/Convert Google Datastore Backups to JSON
Posted on 2017/01/23
When you create a backup in Datastore using the Datastore Admin tools, the result is a number of folders, each containing files like these:
output-0
output-1
output-2
output-3
output-4
output-5
output-6
output-7
output-8
AWESOME! Except not so AWESOME: the data is stored as binary Protobuf. Great for saving space, not so great for reading raw data.
Luckily, the SDK includes an upload_data command, and after some digging we can use the same functions it relies on to parse the data.
The following code reads the data from the backup files, adds it to an array and saves the array as JSON in data.json:
import json
import sys

# Adjust this path to wherever the App Engine SDK lives on your machine.
sys.path.append('/Users/johanndutoit/google-cloud-sdk/platform/google_appengine')

from google.appengine.api.files import records
from google.appengine.datastore import entity_pb
from google.appengine.api import datastore


def default(obj):
    """Default JSON serializer."""
    import calendar, datetime

    if isinstance(obj, datetime.datetime):
        # Normalize to UTC, then convert to milliseconds since the epoch.
        if obj.utcoffset() is not None:
            obj = obj - obj.utcoffset()
        millis = int(
            calendar.timegm(obj.timetuple()) * 1000 +
            obj.microsecond / 1000
        )
        return millis
    raise TypeError('Not sure how to serialize %s' % (obj,))


items = []

# The listing above has nine files, output-0 through output-8.
for fileIndex in range(0, 9):
    # The backup files are binary, so open them in binary mode.
    with open('output-' + str(fileIndex), 'rb') as raw:
        reader = records.RecordsReader(raw)
        for record in reader:
            entity_proto = entity_pb.EntityProto(contents=record)
            entity = datastore.Entity.FromPb(entity_proto)
            items.append(entity)

print("Writing " + str(len(items)) + " items to file")

with open('data.json', 'w') as f:
    f.write(json.dumps(items, default=default))
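The code above hard-codes how many output files there are, which will differ from backup to backup. As a rough sketch, the file names could be discovered instead. The helper name find_output_files is made up for this illustration, and it assumes the files follow the output-<number> naming shown in the listing:

```python
import glob
import os
import re


def find_output_files(directory):
    """Return paths of files named output-<number>, sorted by that number.

    Hypothetical helper: assumes the backup files follow the
    output-0, output-1, ... naming shown in the listing.
    """
    pattern = re.compile(r'^output-(\d+)$')
    found = []
    for path in glob.glob(os.path.join(directory, 'output-*')):
        match = pattern.match(os.path.basename(path))
        if match:
            found.append((int(match.group(1)), path))
    # Sort numerically so output-10 comes after output-9, not after output-1.
    return [path for _, path in sorted(found)]
```

The loop over file indices could then iterate over find_output_files('.') and open each returned path directly.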
The conversion script above works for smaller backups, up to around 40 MB. For anything bigger, the code will need to be updated to write each entity to a database (or some other storage) as it is read, rather than collecting everything in memory first.
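One possible way to handle bigger backups, sketched here under my own assumptions rather than as a tested solution, is to write each entity out as soon as it is read. This version swaps the database for a JSON Lines file (one JSON document per line), which is my choice for the illustration. The names write_json_lines and data.jsonl are hypothetical, and it assumes each entity can be serialized by json.dumps the same way the list is in the original script:

```python
import json


def write_json_lines(entities, path, default=None):
    """Write one JSON document per line and return how many were written.

    `entities` can be any iterable, including a generator, so only one
    entity needs to be held in memory at a time.
    """
    count = 0
    with open(path, 'w') as out:
        for entity in entities:
            # `default` is the same kind of serializer hook json.dumps accepts.
            out.write(json.dumps(entity, default=default))
            out.write('\n')
            count += 1
    return count
```

To use it, the record-reading loop would be turned into a generator that yields each entity instead of appending it to a list, and that generator would be passed in together with the datetime serializer, for example write_json_lines(entities, 'data.jsonl', default=default).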
Enjoy :)