extract

The extraction engine: extract_categorical and extract_continuous for pulling stats from any raster.

The Core Primitive

The heart of gee-polygons is one powerful primitive:

site.extract_categorical(layer, years=[2018, 2019, 2020])

Given:
- A polygon (Site)
- A categorical raster descriptor (CategoricalLayer)
- A list of years

It returns a tidy DataFrame with pixel counts and areas per class, per year.

This function knows nothing about MapBiomas, deforestation, or Brazil. It’s pure geometry + categorical values + time.
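Here is a minimal pure-Python sketch of the reshaping step that makes the output "tidy". It is not the gee-polygons implementation. It assumes each year's reduction yields a dictionary from class value to weighted pixel count, with string keys as Earth Engine's ee.Reducer.frequencyHistogram reports them. It also assumes area is count × scale² / 10,000, which the 30 m MapBiomas output below is consistent with. The helper name histograms_to_rows is hypothetical.

```python
from typing import Dict, List

def histograms_to_rows(
    site_id: int,
    histograms: Dict[int, Dict[str, float]],
    scale: float,
) -> List[dict]:
    """Flatten {year: {class_value: weighted_count}} into tidy per-class rows.

    Hypothetical helper illustrating the output shape only.
    """
    pixel_area_ha = scale * scale / 10_000  # area of one scale x scale pixel in ha
    rows = []
    for year in sorted(histograms):
        for class_value, count in histograms[year].items():
            rows.append({
                "site_id": site_id,
                "year": year,
                "class_value": int(class_value),  # histogram keys arrive as strings
                "count": count,
                "area_ha": count * pixel_area_ha,
                "class_name": None,  # filled only when the layer has a class_map
            })
    return rows
```

Consider the 2012 counts from the MapBiomas example: histograms_to_rows(9368, {2012: {'21': 147.211765, '3': 9.196078}}, scale=30) gives areas of about 13.249059 and 0.827647 ha, which match the area_ha column.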


Site.extract_categorical


def extract_categorical(
    layer:CategoricalLayer, years:list, max_pixels:int=1000000000
)->DataFrame:

Extract per-year pixel counts and areas from a categorical raster.

Supports two temporal modes:
- 'band': Each year is a band in a single Image (e.g., MapBiomas)
- 'collection': An ImageCollection filtered by date (e.g., Dynamic World)

Args:
- layer: A CategoricalLayer describing the data source
- years: List of years to extract
- max_pixels: Maximum pixels for reduction (default 1e9)

Returns: A tidy DataFrame with columns:
- site_id: Site identifier
- year: Year of observation
- class_value: Integer class value
- count: Pixel count
- area_ha: Area in hectares
- class_name: Human-readable name (if available in layer)
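The two temporal modes differ mainly in how a year becomes something Earth Engine can select. The sketch below uses two hypothetical helpers. In 'band' mode, the year is substituted into the layer's band_pattern (the MapBiomas example uses 'classification_{}'). In 'collection' mode, the year becomes a date window, written end-exclusive because ee.ImageCollection.filterDate treats its end date as exclusive. How gee-polygons builds these internally is not shown here.

```python
from typing import List, Tuple

def band_names(band_pattern: str, years: List[int]) -> List[str]:
    """'band' mode: pick one band per year out of a single multi-band Image."""
    return [band_pattern.format(year) for year in years]

def year_date_range(year: int) -> Tuple[str, str]:
    """'collection' mode: start and (exclusive) end date covering one calendar year."""
    return f"{year}-01-01", f"{year + 1}-01-01"
```

In Earth Engine these values would feed image.select(name) and collection.filterDate(start, end) respectively.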

Example Usage

Let’s test with a simple example. First, we need to initialize GEE and load a site.

# Initialize Earth Engine
import ee

ee.Authenticate()
ee.Initialize(project='hs-brazilreforestation')

from gee_polygons.site import load_sites
sites = load_sites('../data/restoration_sites_subset.geojson')
site = sites[8]
print(site)

Site(id=9368, start_year=2012)

# Define a layer (or use a preset from gee_polygons.datasets)
layer = CategoricalLayer(
    asset_id='projects/mapbiomas-public/assets/brazil/lulc/collection10/mapbiomas_brazil_collection10_coverage_v2',
    band_pattern='classification_{}',
    scale=30
)

# Extract stats
df = site.extract_categorical(layer, years=[2012, 2013, 2014, 2015])
df.head(10)

	site_id	year	class_value	count	area_ha	class_name
0	9368	2012	21	147.211765	13.249059	None
1	9368	2012	3	9.196078	0.827647	None
2	9368	2013	21	99.458824	8.951294	None
3	9368	2013	3	56.949020	5.125412	None
4	9368	2014	21	86.427451	7.778471	None
5	9368	2014	3	69.980392	6.298235	None
6	9368	2015	3	156.407843	14.076706	None
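Two things are worth checking in this output. First, the counts are fractional. The most likely reason is that Earth Engine weights pixels on the polygon boundary by the fraction that falls inside it. Second, the polygon does not change, so the per-year totals should agree: about 156.41 pixels, or 14.08 ha, here. The sketch below verifies this and turns counts into per-class shares, using values copied from the table above.

```python
from collections import defaultdict

# (year, class_value, count) copied from the extract_categorical output above
rows = [
    (2012, 21, 147.211765), (2012, 3, 9.196078),
    (2013, 21, 99.458824), (2013, 3, 56.949020),
    (2014, 21, 86.427451), (2014, 3, 69.980392),
    (2015, 3, 156.407843),
]

# Total weighted pixel count per year (should be constant for a fixed polygon)
totals = defaultdict(float)
for year, _, count in rows:
    totals[year] += count

# Fraction of the site covered by each class in each year
shares = {(year, cls): count / totals[year] for year, cls, count in rows}
```

For example, class 21 covers roughly 94% of the site in 2012, and class 3 covers all of it in 2015.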

The output is always a tidy DataFrame:

site_id	year	class_value	count	area_ha	class_name
3107	2020	3	1520	136.8	None
3107	2020	5	203	18.3	None
3107	2021	3	1701	153.1	None

If the CategoricalLayer has a class_map, the class_name column will be populated.
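Populating class_name amounts to a dictionary lookup. The sketch below assumes class_map is a plain dictionary from integer class value to name; the attribute is named above, but its exact type is an assumption. The names are the ones the MAPBIOMAS_DEFREG example reports. add_class_names is a hypothetical helper.

```python
from typing import Dict, List, Optional

# Names as reported by the MAPBIOMAS_DEFREG example; the dict form of class_map is assumed
DEFREG_CLASS_MAP: Dict[int, str] = {
    1: "Anthropic",
    2: "Primary Vegetation",
    3: "Secondary Vegetation",
    5: "Secondary Veg Regrowth",
}

def add_class_names(rows: List[dict], class_map: Optional[Dict[int, str]]) -> List[dict]:
    """Return copies of rows with class_name filled from class_map (None if absent)."""
    lookup = class_map or {}
    return [{**row, "class_name": lookup.get(row["class_value"])} for row in rows]
```

Classes missing from the map, and every class when no map is given, keep class_name as None. This matches the None values in the first example.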

We can also use a preset dataset. MAPBIOMAS_DEFREG defines a class_map, so class_name is populated:

from gee_polygons.datasets.mapbiomas import MAPBIOMAS_DEFREG
df = site.extract_categorical(MAPBIOMAS_DEFREG, years=range(2012, 2015))
df.head(10)

	site_id	year	class_value	count	area_ha	class_name
0	9368	2012	1	147.211765	13.249059	Anthropic
1	9368	2012	2	0.462745	0.041647	Primary Vegetation
2	9368	2012	3	8.733333	0.786000	Secondary Vegetation
3	9368	2013	1	99.458824	8.951294	Anthropic
4	9368	2013	2	0.462745	0.041647	Primary Vegetation
5	9368	2013	3	8.733333	0.786000	Secondary Vegetation
6	9368	2013	5	47.752941	4.297765	Secondary Veg Regrowth
7	9368	2014	1	86.427451	7.778471	Anthropic
8	9368	2014	2	0.462745	0.041647	Primary Vegetation
9	9368	2014	3	56.486275	5.083765	Secondary Vegetation

Visualizing Layers

Before extracting stats, it’s useful to visually verify the data. The show_layer method displays the categorical raster for specified years with the site polygon overlaid.


Site.show_layer


def show_layer(
    layer:CategoricalLayer, years:list, zoom:int=14, basemap:str='SATELLITE', site_color:str='blue',
    buffer_m:Optional=None
)->Map:

Display a categorical layer for multiple years with the site overlaid.

Supports both temporal modes:
- 'band': Shows each year's band from a single Image
- 'collection': Shows a mode-reduced composite per year

Args:
- layer: A CategoricalLayer to visualize
- years: List of years to add as layers
- zoom: Initial zoom level (default 14)
- basemap: Basemap type (default 'SATELLITE')
- site_color: Color for site boundary (default 'blue')
- buffer_m: Optional buffer around site for clipping display

Returns: A geemap.Map with yearly classification layers
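In 'collection' mode, each displayed year collapses many images into one class per pixel. Below is a pure-Python sketch of that per-pixel mode for a single pixel. mode_class is hypothetical, and the smallest-value tie-break is an assumption. In Earth Engine the counterpart would be reducing the year's ImageCollection with ee.Reducer.mode(). Whether show_layer does exactly that is not stated beyond "mode-reduced composite".

```python
from collections import Counter
from typing import List, Optional

def mode_class(values: List[Optional[int]]) -> Optional[int]:
    """Most frequent class among one pixel's observations in a year.

    Masked observations are passed as None and ignored. Ties go to the
    smallest class value (an assumption; Earth Engine's tie-break may differ).
    """
    counts = Counter(v for v in values if v is not None)
    if not counts:
        return None  # pixel masked in every image that year
    best = max(counts.values())
    return min(v for v, c in counts.items() if c == best)
```

A mode is the natural choice for categorical data: averaging class codes would produce values that belong to no class.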

# Example: Visualize the layer with a buffer around the site
site.show_layer(MAPBIOMAS_DEFREG, years=range(2010, 2018), buffer_m=500)

Continuous Extraction

For continuous data (NDVI, EVI, temperature, etc.), use extract_continuous. It supports:
- Multi-band extraction: Get multiple bands in one call
- Preprocessing hooks: Dataset-specific logic (cloud masking, index computation) via layer.preprocess
- Temporal aggregation: 'all' (per-image), 'monthly', or 'yearly'

The function itself is dataset-agnostic: all dataset-specific logic lives in the preprocess function defined on the layer.
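Since the hook carries all the dataset-specific work, it helps to see what that work looks like. The sketch below is a per-pixel, pure-Python stand-in for a Sentinel-2 preprocess step: it scales digital numbers to reflectance and derives NDVI and EVI. The real hook would operate on ee.Image objects. Several details here are assumptions, not the contents of SENTINEL2_NDVI_EVI: the function names, the B2/B4/B8 (blue/red/NIR) band choice, and the 1/10,000 scale factor, which follows the convention of Earth Engine's Sentinel-2 surface-reflectance collections.

```python
def ndvi(nir: float, red: float) -> float:
    """Normalized Difference Vegetation Index."""
    return (nir - red) / (nir + red)

def evi(nir: float, red: float, blue: float) -> float:
    """Enhanced Vegetation Index with the standard coefficients (G=2.5, C1=6, C2=7.5, L=1)."""
    return 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0)

def preprocess_pixel(b2: int, b4: int, b8: int, scale: float = 1e-4) -> dict:
    """Hypothetical per-pixel analogue of a Sentinel-2 preprocess hook."""
    blue, red, nir = b2 * scale, b4 * scale, b8 * scale  # digital numbers -> reflectance
    return {"NDVI": ndvi(nir, red), "EVI": evi(nir, red, blue)}
```

Scaling matters for EVI in particular. The constant 1 in its denominator assumes reflectance in [0, 1], so raw digital numbers would give different values. NDVI is unaffected by a common scale factor.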


Site.extract_continuous


def extract_continuous(
    layer:ContinuousLayer, start_date:str, end_date:str, reducer:Literal='mean', frequency:Literal='all',
    max_pixels:int=1000000000
)->DataFrame:

Extract continuous raster statistics over time for a site.

Completely dataset-agnostic. All preprocessing (cloud masking, index computation, scaling) is handled by the layer’s preprocess function.

Args:
- layer: ContinuousLayer with bands to extract and optional preprocess
- start_date: Start date (YYYY-MM-DD)
- end_date: End date (YYYY-MM-DD)
- reducer: Spatial aggregation ('mean', 'median', 'min', 'max')
- frequency: Temporal output ('all', 'monthly', 'yearly')
- max_pixels: Maximum pixels for reduction (default 1e9)

Returns: DataFrame with columns: site_id, date/year/month, and one column per band
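The docstring does not say whether 'monthly' and 'yearly' aggregate inside Earth Engine or which statistic they apply across images. The local sketch below collapses per-image rows, as returned with frequency='all', into one median per month or year. It only illustrates the difference between the three options. The function name and the choice of median are assumptions.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, List, Tuple

# 'YYYY-MM' vs 'YYYY' prefix of an ISO date string
PERIOD_KEY_LENGTH = {"monthly": 7, "yearly": 4}

def aggregate(rows: List[Tuple[str, float]], frequency: str) -> Dict[str, float]:
    """Hypothetical helper: collapse per-image (date, value) rows into one median per period."""
    key_length = PERIOD_KEY_LENGTH[frequency]
    groups: Dict[str, List[float]] = defaultdict(list)
    for date, value in rows:
        groups[date[:key_length]].append(value)
    return {period: median(values) for period, values in sorted(groups.items())}

# NDVI rows for 2019-01-04 .. 2019-02-03, copied from the per-image table further down
ndvi_rows = [
    ("2019-01-04", 0.846921), ("2019-01-09", 0.892792), ("2019-01-14", 0.893780),
    ("2019-01-19", 0.338589), ("2019-01-24", 0.890900), ("2019-01-29", 0.895227),
    ("2019-02-03", 0.852337),
]
monthly = aggregate(ndvi_rows, "monthly")
yearly = aggregate(ndvi_rows, "yearly")
```

A median keeps a single low value, such as the 2019-01-19 NDVI of 0.338589, from dragging down the January figure. A mean would not.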

Example: Continuous Extraction

Using Sentinel-2 NDVI/EVI with preprocessing defined in the dataset module:

# Example with Sentinel-2 NDVI/EVI (requires updated datasets/sentinel2.py)
from gee_polygons.datasets.sentinel2 import SENTINEL2_NDVI_EVI

# Per-image NDVI/EVI (frequency='all'); use frequency='yearly' for a yearly summary
df = site.extract_continuous(
    SENTINEL2_NDVI_EVI,
    start_date='2018-01-01',
    end_date='2020-12-31',
    reducer='median',
    frequency='all'
)
df.head()

	site_id	date	NDVI	EVI
0	9368	2018-01-09	0.806691	0.591811
1	9368	2018-02-08	0.884718	0.589342
2	9368	2018-12-15	0.885487	0.596724
3	9368	2018-12-20	0.870644	0.592598
4	9368	2018-12-25	0.491157	0.659755

df.head(20)

	site_id	date	NDVI	EVI
0	9368	2018-01-09	0.806691	0.591811
1	9368	2018-02-08	0.884718	0.589342
2	9368	2018-12-15	0.885487	0.596724
3	9368	2018-12-20	0.870644	0.592598
4	9368	2018-12-25	0.491157	0.659755
5	9368	2018-12-30	0.873436	0.623111
6	9368	2019-01-04	0.846921	0.589451
7	9368	2019-01-09	0.892792	0.600742
8	9368	2019-01-14	0.893780	0.612306
9	9368	2019-01-19	0.338589	0.800424
10	9368	2019-01-24	0.890900	0.611940
11	9368	2019-01-29	0.895227	0.604959
12	9368	2019-02-03	0.852337	0.603106
13	9368	2019-02-08	0.454743	0.589876
14	9368	2019-02-13	-0.013164	-0.501578
15	9368	2019-02-18	0.814663	0.732076
16	9368	2019-02-23	0.919677	0.455631
17	9368	2019-02-28	0.720851	0.546486
18	9368	2019-03-05	0.801259	0.609721
19	9368	2019-03-10	0.876676	0.613873
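A few per-image values sit far from the rest of the series. For example, 2019-02-13 has NDVI -0.013164, while most dates are above 0.8; the cause is not established here. One common way to screen such rows is the modified z-score based on the median absolute deviation (Iglewicz and Hoaglin). This technique is swapped in here and is not part of gee-polygons. The function name and the 3.5 cut-off are choices made for this sketch.

```python
from statistics import median
from typing import List

def flag_outliers(values: List[float], threshold: float = 3.5) -> List[bool]:
    """Flag values whose modified z-score exceeds threshold.

    Modified z-score = 0.6745 * |x - median| / MAD (Iglewicz and Hoaglin);
    3.5 is their commonly cited cut-off.
    """
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return [False] * len(values)  # no spread, so no value can be scored
    return [0.6745 * abs(v - med) / mad > threshold for v in values]

# NDVI column of the 20 rows above, in order
ndvi = [
    0.806691, 0.884718, 0.885487, 0.870644, 0.491157,
    0.873436, 0.846921, 0.892792, 0.893780, 0.338589,
    0.890900, 0.895227, 0.852337, 0.454743, -0.013164,
    0.814663, 0.919677, 0.720851, 0.801259, 0.876676,
]
flags = flag_outliers(ndvi)
```

On this series the filter flags the rows for 2018-12-25, 2019-01-19, 2019-02-08 and 2019-02-13. You can drop them, or address them upstream in the layer's preprocess function; which to do is a judgment call.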