96 image chips
How to export thousands of image chips from Earth Engine in a few minutes¶
This source code of this notebook was adopted from the Medium post - Fast(er) Downloads by Noel Gorelick. Credits to Noel.
Due to the limitation of the multiprocessing package, the functionality of this notebook can only be run in the top-level. Therefore, it could not be implemented as a function under geemap.
Install packages¶
Uncomment the following line to install the required packages.
# !pip install geemap retry
Import libraries¶
import ee
import geemap
import logging
import multiprocessing
import os
import requests
import shutil
from retry import retry
--------------------------------------------------------------------------- ModuleNotFoundError Traceback (most recent call last) Input In [2], in <cell line: 8>() 6 import requests 7 import shutil ----> 8 from retry import retry ModuleNotFoundError: No module named 'retry'
Initialize GEE to use the high-volume endpoint¶
ee.Initialize(opt_url='https://earthengine-highvolume.googleapis.com')
Create an interactive map¶
Map = geemap.Map()
Map
Define the Region of Interest (ROI)¶
You can use the drawing tools on the map to draw an ROI, then you can use Map.user_roi
to retrieve the geometry. Alternatively, you can define the ROI as an ee.Geometry as shown below.
# region = Map.user_roi
region = ee.Geometry.Polygon(
[
[
[-122.513695, 37.707998],
[-122.513695, 37.804359],
[-122.371902, 37.804359],
[-122.371902, 37.707998],
[-122.513695, 37.707998],
]
],
None,
False,
)
Define the image source¶
Using the 1-m NAIP imagery.
image = (
ee.ImageCollection('USDA/NAIP/DOQQ')
.filterBounds(region)
.filterDate('2020', '2021')
.mosaic()
.clip(region)
.select('N', 'R', 'G')
)
Using the 10-m Sentinel-2 imagery.
# image = ee.ImageCollection('COPERNICUS/S2_SR') \
# .filterBounds(region) \
# .filterDate('2021', '2022') \
# .select('B8', 'B4', 'B3') \
# .median() \
# .visualize(min=0, max=4000) \
# .clip(region)
Set parameters¶
If you want the exported images to have coordinate system, change format
to GEO_TIFF
. Otherwise, you can use png
or jpg
formats.
params = {
'count': 100, # How many image chips to export
'buffer': 127, # The buffer distance (m) around each point
'scale': 100, # The scale to do stratified sampling
'seed': 1, # A randomization seed to use for subsampling.
'dimensions': '256x256', # The dimension of each image chip
'format': "png", # The output image format, can be png, jpg, ZIPPED_GEO_TIFF, GEO_TIFF, NPY
'prefix': 'tile_', # The filename prefix
'processes': 25, # How many processes to used for parallel processing
'out_dir': '.', # The output directory. Default to the current working directly
}
Add layers to map¶
Map.addLayer(image, {}, "Image")
Map.addLayer(region, {}, "ROI", False)
Map.setCenter(-122.4415, 37.7555, 12)
Map
Generate a list of work items¶
In the example, we are going to generate 1000 points using the stratified random sampling, which requires a classBand
. It is the name of the band containing the classes to use for stratification. If unspecified, the first band of the input image is used. Therefore, we have toADD a new band with a constant value (e.g., 1) to the image. The result of the getRequests()
function returns a list of dictionaries containing points.
def getRequests():
img = ee.Image(1).rename("Class").addBands(image)
points = img.stratifiedSample(
numPoints=params['count'],
region=region,
scale=params['scale'],
seed=params['seed'],
geometries=True,
)
Map.data = points
return points.aggregate_array('.geo').getInfo()
Create a function for downloading image¶
The getResult()
function then takes one of those points and generates an image centered on that location, which is then downloaded as a PNG and saved to a file. This function uses image.getThumbURL()
to select the pixels, however you could also use image.getDownloadURL()
if you wanted the output to be in GeoTIFF or NumPy format (source).
@retry(tries=10, delay=1, backoff=2)
def getResult(index, point):
point = ee.Geometry.Point(point['coordinates'])
region = point.buffer(params['buffer']).bounds()
if params['format'] in ['png', 'jpg']:
url = image.getThumbURL(
{
'region': region,
'dimensions': params['dimensions'],
'format': params['format'],
}
)
else:
url = image.getDownloadURL(
{
'region': region,
'dimensions': params['dimensions'],
'format': params['format'],
}
)
if params['format'] == "GEO_TIFF":
ext = 'tif'
else:
ext = params['format']
r = requests.get(url, stream=True)
if r.status_code != 200:
r.raise_for_status()
out_dir = os.path.abspath(params['out_dir'])
basename = str(index).zfill(len(str(params['count'])))
filename = f"{out_dir}/{params['prefix']}{basename}.{ext}"
with open(filename, 'wb') as out_file:
shutil.copyfileobj(r.raw, out_file)
print("Done: ", basename)
--------------------------------------------------------------------------- NameError Traceback (most recent call last) Input In [11], in <cell line: 1>() ----> 1 @retry(tries=10, delay=1, backoff=2) 2 def getResult(index, point): 3 point = ee.Geometry.Point(point['coordinates']) 4 region = point.buffer(params['buffer']).bounds() NameError: name 'retry' is not defined
Download images¶
%%time
logging.basicConfig()
items = getRequests()
pool = multiprocessing.Pool(params['processes'])
pool.starmap(getResult, enumerate(items))
pool.close()
--------------------------------------------------------------------------- NameError Traceback (most recent call last) File <timed exec>:5, in <module> NameError: name 'getResult' is not defined
Retrieve sample points¶
Map.addLayer(Map.data, {}, "Sample points")
Map