# Initialize Earth Engine
import ee
ee.Authenticate()
ee.Initialize(project='hs-brazilreforestation')
extract
extract_categorical and extract_continuous for pulling stats from any raster.
The Core Primitive
The heart of gee-polygons is one powerful primitive:
site.extract_categorical(layer, years=[2018, 2019, 2020])
Given:
- A polygon (Site)
- A categorical raster descriptor (CategoricalLayer)
- A list of years
It returns a tidy DataFrame with pixel counts and areas per class, per year.
This function knows nothing about MapBiomas, deforestation, or Brazil. It’s pure geometry + categorical values + time.
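To make "geometry + categorical values + time" concrete, here is a hypothetical in-memory sketch that uses numpy arrays in place of Earth Engine rasters. `categorical_stats` is an illustrative name, not part of gee-polygons. It converts counts to hectares as count × scale² / 10 000, which agrees with the real output below (147.211765 × 900 / 10 000 ≈ 13.249059 ha at 30 m). Because it counts whole pixels only, it will not reproduce the fractional counts seen in the real outputs.

```python
import numpy as np
import pandas as pd


def categorical_stats(site_id, rasters_by_year, mask, scale_m):
    """Hypothetical in-memory analogue of extract_categorical.

    rasters_by_year: dict of year -> 2-D integer array of class values
    mask: 2-D boolean array, True where a pixel lies inside the polygon
    scale_m: pixel size in metres (30 for the MapBiomas example below)
    """
    pixel_area_ha = scale_m * scale_m / 10_000  # m² per pixel -> hectares
    rows = []
    for year, raster in rasters_by_year.items():
        # Keep only pixels inside the polygon, then count each class value
        values, counts = np.unique(raster[mask], return_counts=True)
        for value, count in zip(values, counts):
            rows.append({
                "site_id": site_id,
                "year": year,
                "class_value": int(value),
                "count": int(count),
                "area_ha": int(count) * pixel_area_ha,
            })
    return pd.DataFrame(rows)


# Toy 2x2 "site": three pixels inside the polygon, one outside
mask = np.array([[True, True], [True, False]])
rasters = {
    2012: np.array([[21, 21], [3, 3]]),
    2013: np.array([[3, 21], [3, 21]]),
}
df = categorical_stats(9368, rasters, mask, scale_m=30)
print(df)
```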
Site.extract_categorical
def extract_categorical(
layer:CategoricalLayer, years:list, max_pixels:int=1000000000
)->DataFrame:
Extract per-year pixel counts and areas from a categorical raster.
Supports two temporal modes:
- 'band': Each year is a band in a single Image (e.g., MapBiomas)
- 'collection': An ImageCollection filtered by date (e.g., Dynamic World)
Args:
- layer: A CategoricalLayer describing the data source
- years: List of years to extract
- max_pixels: Maximum pixels for reduction (default 1e9)
Returns: A tidy DataFrame with columns:
- site_id: Site identifier
- year: Year of observation
- class_value: Integer class value
- count: Pixel count
- area_ha: Area in hectares
- class_name: Human-readable name (if available in layer)
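The two temporal modes differ mainly in how a requested year is turned into an image. The sketch below is hypothetical: `year_selector` is not a gee-polygons function, and it assumes a `band_pattern` format string like the `'classification_{}'` used in the example below, plus a calendar-year window for collections. In Earth Engine such a window would typically be passed to `ImageCollection.filterDate`, whose end date is exclusive.

```python
from datetime import date


def year_selector(temporal_mode, year, band_pattern=None):
    """Hypothetical helper: how a single year maps to a slice of the data."""
    if temporal_mode == "band":
        # One multi-band Image: the year picks a band by name
        return band_pattern.format(year)
    if temporal_mode == "collection":
        # ImageCollection: the year becomes a date window [Jan 1, next Jan 1)
        return (date(year, 1, 1).isoformat(), date(year + 1, 1, 1).isoformat())
    raise ValueError(f"unknown temporal mode: {temporal_mode!r}")


print(year_selector("band", 2012, band_pattern="classification_{}"))
print(year_selector("collection", 2020))
```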
Example Usage
Let’s test with a simple example. Earth Engine is already initialized above, so we only need to load a site.
from gee_polygons.site import load_sites
sites = load_sites('../data/restoration_sites_subset.geojson')
site = sites[8]
print(site)
Site(id=9368, start_year=2012)
# Define a layer (or use a preset from gee_polygons.datasets)
layer = CategoricalLayer(
asset_id='projects/mapbiomas-public/assets/brazil/lulc/collection10/mapbiomas_brazil_collection10_coverage_v2',
band_pattern='classification_{}',
scale=30
)
# Extract stats
df = site.extract_categorical(layer, years=[2012, 2013, 2014, 2015])
df.head(10)
| | site_id | year | class_value | count | area_ha | class_name |
|---|---|---|---|---|---|---|
| 0 | 9368 | 2012 | 21 | 147.211765 | 13.249059 | None |
| 1 | 9368 | 2012 | 3 | 9.196078 | 0.827647 | None |
| 2 | 9368 | 2013 | 21 | 99.458824 | 8.951294 | None |
| 3 | 9368 | 2013 | 3 | 56.949020 | 5.125412 | None |
| 4 | 9368 | 2014 | 21 | 86.427451 | 7.778471 | None |
| 5 | 9368 | 2014 | 3 | 69.980392 | 6.298235 | None |
| 6 | 9368 | 2015 | 3 | 156.407843 | 14.076706 | None |
The output is always a tidy DataFrame with one row per site, year, and class (illustrative values):
| site_id | year | class_value | count | area_ha | class_name |
|---|---|---|---|---|---|
| 3107 | 2020 | 3 | 1520 | 136.8 | None |
| 3107 | 2020 | 5 | 203 | 18.3 | None |
| 3107 | 2021 | 3 | 1701 | 153.1 | None |
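Because each row is a single (site, year, class) observation, the tidy output reshapes easily with pandas. A short sketch using the illustrative values from the table above, pivoting to one row per year and one column per class, then converting areas to per-year shares:

```python
import pandas as pd

# The illustrative tidy output from the table above
df = pd.DataFrame({
    "site_id": [3107, 3107, 3107],
    "year": [2020, 2020, 2021],
    "class_value": [3, 5, 3],
    "count": [1520, 203, 1701],
    "area_ha": [136.8, 18.3, 153.1],
})

# One row per year, one column per class; absent combinations become 0
wide = df.pivot_table(index="year", columns="class_value",
                      values="area_ha", aggfunc="sum", fill_value=0)
# Share of the site's mapped area in each class, per year
share = wide.div(wide.sum(axis=1), axis=0)
print(wide)
print(share)
```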
If the CategoricalLayer has a class_map, the class_name column will be populated.
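A minimal sketch of how names attach to class values, assuming `class_map` behaves like a plain `{class_value: class_name}` dict (its actual type is not shown here). The names are the ones the MAPBIOMAS_DEFREG preset reports below; class 7 is a made-up value included to show an unmapped class. In this sketch unmapped values become NaN; how gee-polygons handles them is not documented here.

```python
import pandas as pd

# Hypothetical class_map, using names reported by MAPBIOMAS_DEFREG below
class_map = {
    1: "Anthropic",
    2: "Primary Vegetation",
    3: "Secondary Vegetation",
    5: "Secondary Veg Regrowth",
}

df = pd.DataFrame({"year": [2013, 2013, 2013], "class_value": [1, 5, 7]})
# Series.map looks each value up in the dict; missing keys give NaN
df["class_name"] = df["class_value"].map(class_map)
print(df)
```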
We can also use a preset dataset:
from gee_polygons.datasets.mapbiomas import MAPBIOMAS_DEFREG
df = site.extract_categorical(MAPBIOMAS_DEFREG, years=range(2012, 2015))
df.head(10)
| | site_id | year | class_value | count | area_ha | class_name |
|---|---|---|---|---|---|---|
| 0 | 9368 | 2012 | 1 | 147.211765 | 13.249059 | Anthropic |
| 1 | 9368 | 2012 | 2 | 0.462745 | 0.041647 | Primary Vegetation |
| 2 | 9368 | 2012 | 3 | 8.733333 | 0.786000 | Secondary Vegetation |
| 3 | 9368 | 2013 | 1 | 99.458824 | 8.951294 | Anthropic |
| 4 | 9368 | 2013 | 2 | 0.462745 | 0.041647 | Primary Vegetation |
| 5 | 9368 | 2013 | 3 | 8.733333 | 0.786000 | Secondary Vegetation |
| 6 | 9368 | 2013 | 5 | 47.752941 | 4.297765 | Secondary Veg Regrowth |
| 7 | 9368 | 2014 | 1 | 86.427451 | 7.778471 | Anthropic |
| 8 | 9368 | 2014 | 2 | 0.462745 | 0.041647 | Primary Vegetation |
| 9 | 9368 | 2014 | 3 | 56.486275 | 5.083765 | Secondary Vegetation |
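With named classes, the tidy output can be turned into year-over-year changes. The sketch below copies the 2012 and 2013 rows from the output above (both years are complete in the head(10) view; 2014 is cut off) and differences the per-class areas with pandas.

```python
import pandas as pd

# 2012-2013 rows copied from the MAPBIOMAS_DEFREG output above
df = pd.DataFrame({
    "year": [2012, 2012, 2012, 2013, 2013, 2013, 2013],
    "class_name": ["Anthropic", "Primary Vegetation", "Secondary Vegetation",
                   "Anthropic", "Primary Vegetation", "Secondary Vegetation",
                   "Secondary Veg Regrowth"],
    "area_ha": [13.249059, 0.041647, 0.786000,
                8.951294, 0.041647, 0.786000, 4.297765],
})

# Classes absent in a year (Regrowth in 2012) are filled with 0 ha
wide = df.pivot_table(index="year", columns="class_name",
                      values="area_ha", fill_value=0)
# Year-over-year change in hectares for each class
change = wide.diff().loc[2013]
print(change.round(6))
```

For this site, the 4.297765 ha that leave Anthropic between 2012 and 2013 match the 4.297765 ha that appear as Secondary Veg Regrowth.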
Visualizing Layers
Before extracting stats, it’s useful to visually verify the data. The show_layer method displays the categorical raster for specified years with the site polygon overlaid.
Site.show_layer
def show_layer(
layer:CategoricalLayer, years:list, zoom:int=14, basemap:str='SATELLITE', site_color:str='blue',
buffer_m:Optional=None
)->Map:
Display a categorical layer for multiple years with the site overlaid.
Supports both temporal modes:
- 'band': Shows each year's band from a single Image
- 'collection': Shows a mode-reduced composite per year
Args:
- layer: A CategoricalLayer to visualize
- years: List of years to add as layers
- zoom: Initial zoom level (default 14)
- basemap: Basemap type (default 'SATELLITE')
- site_color: Color for site boundary (default 'blue')
- buffer_m: Optional buffer around site for clipping display
Returns: A geemap.Map with yearly classification layers
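In 'collection' mode several images fall within one year, so each pixel of the displayed composite takes the class it was assigned most often that year. The numpy sketch below only illustrates that idea: `mode_composite` is a hypothetical name, the library presumably computes the composite server-side in Earth Engine, and its tie-breaking and masking behaviour may differ from this sketch.

```python
import numpy as np


def mode_composite(stack):
    """Most frequent class per pixel across a (n_images, rows, cols) stack.

    Assumes non-negative integer class values and no masked pixels.
    Ties go to the smallest class value, because argmax returns the first max.
    """
    stack = np.asarray(stack)
    n_classes = int(stack.max()) + 1
    # counts[k, r, c] = number of images in which pixel (r, c) has class k
    counts = np.stack([(stack == k).sum(axis=0) for k in range(n_classes)])
    return counts.argmax(axis=0)


# Three "images" of a 2x2 area from the same year
stack = np.array([
    [[1, 2], [3, 3]],
    [[1, 2], [5, 3]],
    [[2, 1], [5, 1]],
])
print(mode_composite(stack))
```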
# Example: Visualize the layer with a buffer around the site
site.show_layer(MAPBIOMAS_DEFREG, years=range(2010, 2018), buffer_m=500)
Continuous Extraction
For continuous data (NDVI, EVI, temperature, etc.), use extract_continuous. It supports:
- Multi-band extraction: get multiple bands in one call
- Preprocessing hooks: dataset-specific logic (cloud masking, index computation) via layer.preprocess
- Temporal aggregation: 'all' (per-image), 'monthly', or 'yearly'
The function is completely dataset-agnostic: all dataset-specific logic lives in the preprocess function defined on the layer.
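The exact signature of a preprocess hook is not shown here, and a real hook would operate on Earth Engine images. As a rough, hypothetical illustration of the per-pixel logic a Sentinel-2 NDVI/EVI preprocess might apply, here is a numpy stand-in. It assumes reflectance stored as integers scaled by 10 000, uses the standard NDVI and EVI formulas, and masks pixels whose Scene Classification (SCL) code marks cloud shadow (3), medium or high probability cloud (8, 9) or thin cirrus (10). It is not the SENTINEL2_NDVI_EVI implementation.

```python
import numpy as np

# SCL codes treated as unusable in this sketch (an assumption, not the library's)
CLOUDY_SCL = [3, 8, 9, 10]


def preprocess_pixels(b2, b4, b8, scl):
    """Hypothetical numpy stand-in for a Sentinel-2 preprocess step.

    b2, b4, b8: blue, red, NIR reflectance as stored (scaled by 10 000)
    scl: Scene Classification codes used for cloud masking
    Returns (NDVI, EVI) with cloudy pixels set to NaN.
    """
    blue, red, nir = (np.asarray(b, dtype=float) / 10_000 for b in (b2, b4, b8))
    ndvi = (nir - red) / (nir + red)
    evi = 2.5 * (nir - red) / (nir + 6 * red - 7.5 * blue + 1)
    cloudy = np.isin(scl, CLOUDY_SCL)
    return np.where(cloudy, np.nan, ndvi), np.where(cloudy, np.nan, evi)


# Two pixels with identical reflectance; the second is flagged as cloud (SCL 9)
ndvi, evi = preprocess_pixels([500, 500], [1000, 1000], [4000, 4000], [4, 9])
print(ndvi, evi)
```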
Site.extract_continuous
def extract_continuous(
layer:ContinuousLayer, start_date:str, end_date:str, reducer:Literal='mean', frequency:Literal='all',
max_pixels:int=1000000000
)->DataFrame:
Extract continuous raster statistics over time for a site.
Completely dataset-agnostic. All preprocessing (cloud masking, index computation, scaling) is handled by the layer’s preprocess function.
Args:
- layer: ContinuousLayer with bands to extract and optional preprocess
- start_date: Start date (YYYY-MM-DD)
- end_date: End date (YYYY-MM-DD)
- reducer: Spatial aggregation ('mean', 'median', 'min', 'max')
- frequency: Temporal output ('all', 'monthly', 'yearly')
- max_pixels: Maximum pixels for reduction
Returns: DataFrame with columns: site_id, date/year/month, and one column per band
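To picture what monthly and yearly summaries of per-image rows look like, here is a client-side pandas sketch. It is an analogue only, not what frequency='monthly' or 'yearly' computes internally: a median of per-image spatial medians need not equal a median taken over a server-side temporal composite. The six sample rows come from the Sentinel-2 output below and do not cover a full year.

```python
import pandas as pd

# Per-image NDVI rows, shaped like frequency='all' output
# (the first six rows of the Sentinel-2 output shown below)
df = pd.DataFrame({
    "site_id": 9368,
    "date": pd.to_datetime(["2018-01-09", "2018-02-08", "2018-12-15",
                            "2018-12-20", "2018-12-25", "2018-12-30"]),
    "NDVI": [0.806691, 0.884718, 0.885487, 0.870644, 0.491157, 0.873436],
})

# Client-side monthly and yearly medians of the per-image values
monthly = df.groupby(df["date"].dt.to_period("M"))["NDVI"].median()
yearly = df.groupby(df["date"].dt.year)["NDVI"].median()
print(monthly)
print(yearly)
```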
Example: Continuous Extraction
Using Sentinel-2 NDVI/EVI with preprocessing defined in the dataset module:
# Example with Sentinel-2 NDVI/EVI (requires updated datasets/sentinel2.py)
from gee_polygons.datasets.sentinel2 import SENTINEL2_NDVI_EVI
# Per-image NDVI/EVI (frequency='all'); use frequency='yearly' for an annual summary
df = site.extract_continuous(
SENTINEL2_NDVI_EVI,
start_date='2018-01-01',
end_date='2020-12-31',
reducer='median',
frequency='all'
)
df.head()
| | site_id | date | NDVI | EVI |
|---|---|---|---|---|
| 0 | 9368 | 2018-01-09 | 0.806691 | 0.591811 |
| 1 | 9368 | 2018-02-08 | 0.884718 | 0.589342 |
| 2 | 9368 | 2018-12-15 | 0.885487 | 0.596724 |
| 3 | 9368 | 2018-12-20 | 0.870644 | 0.592598 |
| 4 | 9368 | 2018-12-25 | 0.491157 | 0.659755 |
df.head(20)
| | site_id | date | NDVI | EVI |
|---|---|---|---|---|
| 0 | 9368 | 2018-01-09 | 0.806691 | 0.591811 |
| 1 | 9368 | 2018-02-08 | 0.884718 | 0.589342 |
| 2 | 9368 | 2018-12-15 | 0.885487 | 0.596724 |
| 3 | 9368 | 2018-12-20 | 0.870644 | 0.592598 |
| 4 | 9368 | 2018-12-25 | 0.491157 | 0.659755 |
| 5 | 9368 | 2018-12-30 | 0.873436 | 0.623111 |
| 6 | 9368 | 2019-01-04 | 0.846921 | 0.589451 |
| 7 | 9368 | 2019-01-09 | 0.892792 | 0.600742 |
| 8 | 9368 | 2019-01-14 | 0.893780 | 0.612306 |
| 9 | 9368 | 2019-01-19 | 0.338589 | 0.800424 |
| 10 | 9368 | 2019-01-24 | 0.890900 | 0.611940 |
| 11 | 9368 | 2019-01-29 | 0.895227 | 0.604959 |
| 12 | 9368 | 2019-02-03 | 0.852337 | 0.603106 |
| 13 | 9368 | 2019-02-08 | 0.454743 | 0.589876 |
| 14 | 9368 | 2019-02-13 | -0.013164 | -0.501578 |
| 15 | 9368 | 2019-02-18 | 0.814663 | 0.732076 |
| 16 | 9368 | 2019-02-23 | 0.919677 | 0.455631 |
| 17 | 9368 | 2019-02-28 | 0.720851 | 0.546486 |
| 18 | 9368 | 2019-03-05 | 0.801259 | 0.609721 |
| 19 | 9368 | 2019-03-10 | 0.876676 | 0.613873 |
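Some per-image values look implausible next to their neighbours, for example NDVI of -0.013 and EVI of -0.50 on 2019-02-13, which may reflect residual cloud or shadow that the preprocessing did not mask. One simple client-side screen, not part of gee-polygons, compares each value with a centred rolling median. The 5-observation window and 0.2 threshold are arbitrary choices, and the window counts rows rather than days, so irregular revisit gaps are ignored.

```python
import pandas as pd

# Rows 10-19 of the per-image output above (NDVI only)
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2019-01-24", "2019-01-29", "2019-02-03", "2019-02-08", "2019-02-13",
        "2019-02-18", "2019-02-23", "2019-02-28", "2019-03-05", "2019-03-10",
    ]),
    "NDVI": [0.890900, 0.895227, 0.852337, 0.454743, -0.013164,
             0.814663, 0.919677, 0.720851, 0.801259, 0.876676],
})

# Centred 5-observation rolling median as a local baseline
baseline = df["NDVI"].rolling(window=5, center=True, min_periods=1).median()
# Flag observations far from the baseline (0.2 is an arbitrary threshold)
df["suspect"] = (df["NDVI"] - baseline).abs() > 0.2
print(df.loc[df["suspect"], "date"].dt.strftime("%Y-%m-%d").tolist())
```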