When you create a backup in Datastore using the Datastore Admin tools, the result is a set of folders, each containing output files like these:

output-0  
output-1  
output-2  
output-3  
output-4  
output-5  
output-6  
output-7  
output-8  

AWESOME! Except not so AWESOME: the data is stored as binary Protobuf records. Great for saving space, not so great for reading the raw data.

Well, luckily the App Engine SDK ships with an upload_data command, and after some digging it turns out we can use the same underlying functions to parse the backup data ourselves.

The following script reads the data from the backup files, adds each entity to an array and saves the array as JSON in data.json:

import json  
import sys

# Adjust this path to point at your local Cloud SDK install
sys.path.append('/Users/johanndutoit/google-cloud-sdk/platform/google_appengine')  
from google.appengine.api.files import records  
from google.appengine.datastore import entity_pb  
from google.appengine.api import datastore

def default(obj):  
  """Default JSON serializer."""
  import calendar, datetime

  if isinstance(obj, datetime.datetime):
    if obj.utcoffset() is not None:
      obj = obj - obj.utcoffset()
    millis = int(
      calendar.timegm(obj.timetuple()) * 1000 +
      obj.microsecond / 1000
    )
    return millis
  raise TypeError('Not sure how to serialize %s' % (obj,)) 


items = []  
for fileIndex in range(0, 9):  # output-0 through output-8; adjust to match the number of files in your backup
  raw = open('output-' + str(fileIndex), 'rb')  # the records are binary, so open in binary mode
  reader = records.RecordsReader(raw)
  for record in reader:
    # Each record is a serialized EntityProto; decode it back into a Datastore entity
    entity_proto = entity_pb.EntityProto(contents=record)
    entity = datastore.Entity.FromPb(entity_proto)
    # print entity
    items.append(entity)
  raw.close()


print "Writing " + str(len(items)) + " items to file"


f = open('data.json', 'w')  
f.write(json.dumps(items, default=default))  
f.close()
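
To sanity-check the result, something like this will load data.json back in and print a quick summary:

import json

with open('data.json') as f:
  data = json.load(f)

print "Loaded " + str(len(data)) + " entities"
print data[0]  # peek at the first entity's properties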

This example works for smaller backups, up to roughly 40 MB. For anything bigger, the code needs to be updated to write each entity to a database (or some other storage) as it is read, instead of holding everything in memory; a rough sketch of that follows.
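
Here is a minimal sketch of that streaming approach. It reuses the same SDK path and the same default() serializer as above, and simply writes each entity to a newline-delimited JSON file (data.ndjson is just a name picked for this sketch) as soon as it is decoded; the write could just as easily be an insert into a database.

import json
import sys

sys.path.append('/Users/johanndutoit/google-cloud-sdk/platform/google_appengine')
from google.appengine.api.files import records
from google.appengine.datastore import entity_pb
from google.appengine.api import datastore

def default(obj):
  """Serialize datetimes as millisecond timestamps (same as above)."""
  import calendar, datetime
  if isinstance(obj, datetime.datetime):
    if obj.utcoffset() is not None:
      obj = obj - obj.utcoffset()
    return int(calendar.timegm(obj.timetuple()) * 1000 + obj.microsecond / 1000)
  raise TypeError('Not sure how to serialize %s' % (obj,))

count = 0
out = open('data.ndjson', 'w')  # hypothetical output file for this sketch
for fileIndex in range(0, 9):  # output-0 through output-8
  raw = open('output-' + str(fileIndex), 'rb')
  for record in records.RecordsReader(raw):
    entity_proto = entity_pb.EntityProto(contents=record)
    entity = datastore.Entity.FromPb(entity_proto)
    # Write one JSON object per line and move on, so memory use stays flat
    out.write(json.dumps(entity, default=default) + '\n')
    count += 1
  raw.close()
out.close()

print "Streamed " + str(count) + " entities to data.ndjson"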

Enjoy :)