Been writing a small service over at prescribe.co.za that crawls and processes data.

The problem is that the data is generated, and my local internet connection can't come close to regenerating it. So I had to somehow import the database from the live Datastore into the local instance.

These are the steps I followed to get a local copy of the data imported.

### Considerations

Just a few considerations before heading down this road. These commands are meant to be very generic and will try to download your entire dataset.

The local Datastore was never meant to handle large amounts of data. Please look at importing only what you need, and also look at configuring MySQL as the backend for the local Datastore if the data grows beyond a few hundred entities, as performance will already start dropping by then...
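If you only need a subset, the same `download_data` command used in Step 2 below accepts a `--kind` flag to limit the export to a single entity kind. A minimal sketch, with `Prescription` as a hypothetical kind name:

```sh
# Export only one entity kind instead of the entire dataset.
# 'Prescription' is a placeholder; substitute one of your own kinds.
~/google-cloud-sdk/platform/google_appengine/appcfg.py download_data \
  -A s~{APP_ID} \
  --url=http://{APP_ID}.appspot.com/_ah/remote_api/ \
  --kind=Prescription \
  --filename=prescriptions.csv
```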

### Step 1 - Enable remote_api

To use the appcfg.py tool to download and import your data, the remote_api builtin needs to be enabled on your deployed app.

To do this, add the following to your app.yaml file:

```yaml
builtins:
- remote_api: on
```

This enables the remote API endpoint at `/_ah/remote_api`.

This is already enabled by default on the local App Engine dev server, in case anyone was wondering...
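One thing to remember: the builtin only takes effect on appspot.com once you redeploy. Assuming you deploy with appcfg as well, that's something like:

```sh
# Redeploy from your app directory so the remote_api builtin goes live.
~/google-cloud-sdk/platform/google_appengine/appcfg.py update .
```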

### Step 2 - The export

Make sure you have appcfg.py ready for use, and run the following command:

```sh
~/google-cloud-sdk/platform/google_appengine/appcfg.py download_data -A s~{APP_ID} --url=http://{APP_ID}.appspot.com/_ah/remote_api/ --filename=data.csv
```

Replacing {APP_ID} with your own app ID, of course :)

appcfg.py is only available globally if installed via the standalone Google App Engine SDK. If you use gcloud (like I do) you'll need to reference it directly. In my case: `~/google-cloud-sdk/platform/google_appengine/appcfg.py`.
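If typing the full path gets tedious, a shell alias does the trick. A small sketch, assuming the default gcloud install location:

```sh
# Make appcfg.py callable directly; adjust the path to your own install.
alias appcfg.py="$HOME/google-cloud-sdk/platform/google_appengine/appcfg.py"
```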

This will start downloading your data to the local .csv file, which could take quite a while depending on the size of your dataset.

You should see something along the lines of:

```
07:40 AM Downloading data records.
[INFO    ] Logging to bulkloader-log-20161231.074046
[INFO    ] Throttling transfers:
[INFO    ] Bandwidth: 250000 bytes/second
[INFO    ] HTTP connections: 8/second
[INFO    ] Entities inserted/fetched/modified: 20/second
[INFO    ] Batch Size: 10
[INFO    ] Opening database: bulkloader-progress-20161231.074046.sql3
[INFO    ] Opening database: bulkloader-results-20161231.074046.sql3
2016-12-31 07:40:46,743 INFO client.py:546 Attempting refresh to obtain initial access_token
```

### Step 3 - The import

Right, after all that you should have a `data.csv` file in your directory.
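The import mirrors the export: run `upload_data` against the local dev server's remote_api endpoint. A sketch following the same shape as the Step 2 command:

```sh
~/google-cloud-sdk/platform/google_appengine/appcfg.py upload_data -A {APP_ID} --url=http://localhost:{API_PORT}/_ah/remote_api/ --filename=data.csv
```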

Replace both {APP_ID} and {API_PORT} with your local values. The {API_PORT} value shows up in the output when the local dev server starts. It should look something like:
```
INFO Starting API server at: http://localhost:60888
INFO Starting module "default" running at: http://localhost:8080
INFO admin_server.py:116] Starting admin server at: http://localhost:8000
...
```

Look for the line with `Starting API server at` and copy that URL. In my case (from the output above) it's `http://localhost:60888`.
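By default the API server picks a new port on every restart. If that gets annoying, `dev_appserver.py` accepts an `--api_port` flag to pin it. A sketch, assuming your app lives in the current directory:

```sh
# Pin the API server port so {API_PORT} stays stable across restarts.
dev_appserver.py --api_port=60888 .
```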

> If you receive an error about authentication, adding the following snippet to `appengine_config.py` has solved the issue for a few people:
>
> ```python
> import os
>
> # Only relax remote_api authentication on the local dev server, never in production.
> if os.environ.get('SERVER_SOFTWARE', '').lower().startswith('development'):
>     remoteapi_CUSTOM_ENVIRONMENT_AUTHENTICATION = ('REMOTE_ADDR', ['127.0.0.1'])
> ```

After which you should see the following output, and your data will start appearing in your local Datastore:

```
08:10 AM Application: dev~{APPID} (was: {APPID})
08:10 AM Uploading data records.
[INFO    ] Logging to bulkloader-log-20161231.081004
[INFO    ] Throttling transfers:
[INFO    ] Bandwidth: 250000 bytes/second
[INFO    ] HTTP connections: 8/second
[INFO    ] Entities inserted/fetched/modified: 20/second
[INFO    ] Batch Size: 10
[INFO    ] Opening database: bulkloader-progress-20161231.081004.sql3
[INFO    ] Connecting to localhost:59762/_ah/remote_api/
[INFO    ] Starting import; maximum 10 entities per post
```

### Fin

After running the import, your data should be available locally for use in development.

Let me know in the comments if I missed anything.