AlphaEarth: A Peek into the Potential of Geospatial Satellite Embeddings

Article

Author: Alice Heiman

Published: August 9, 2025

AlphaEarth Embeddings of a Point in Brazil, State of Mato Grosso

Why AlphaEarth?

For millennia, humans have gazed at the night sky, curious about space, but only recently have we gained the tools to look back at Earth.

However, the sheer volume of data has become a challenge. In the 1970s, the on-board storage capacity of the Landsat 1-3 satellites was 3.75 GB per orbit. Today, Landsat 7 and 8 collect 1,200 scenes, equivalent to 1 TB of data, every 24 hours [1], adding 3 TB of Landsat products [2].

AlphaEarth Foundations is Google DeepMind’s latest geospatial AI foundation model, trained to assimilate thousands of image bands.

The model is trained on 3 billion individual image frames sampled from over 5 million locations worldwide [3]. The data bands include:

  • Optical and thermal imagery
  • Radar data
  • 3D surface measurements
  • Climate properties
  • Gravity fields
  • Geo-located descriptive text
  • …and more

Through training, the model learns to compress all of this data into 64 numbers for each 10 m × 10 m location, ready for downstream analysis.

The Google DeepMind team has openly released the Satellite Embedding V1 dataset, which contains annual 64-dimensional embeddings for 2017 to 2024 at 10 m resolution [4].
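To get a rough sense of scale, the arithmetic below counts how many numbers the annual layers hold per square kilometer. It is a back-of-the-envelope sketch that assumes exactly 64 values per 10 m × 10 m pixel and one layer per year from 2017 through 2024; it deliberately ignores storage format and compression, which this post does not cover.

```python
# Back-of-the-envelope count of embedding values per square kilometer.
# Assumptions: square 10 m x 10 m pixels, 64 values per pixel,
# one annual layer for each year from 2017 to 2024 inclusive.
PIXEL_SIDE_M = 10
DIMS = 64
YEARS = list(range(2017, 2025))  # 2017..2024 -> 8 annual layers

pixels_per_km2 = (1000 // PIXEL_SIDE_M) ** 2               # 100 * 100 pixels
values_per_km2_per_year = pixels_per_km2 * DIMS            # 64 numbers per pixel
values_per_km2_all_years = values_per_km2_per_year * len(YEARS)

print(f"{pixels_per_km2:,} pixels per km^2")                          # 10,000
print(f"{values_per_km2_per_year:,} values per km^2 per year")        # 640,000
print(f"{values_per_km2_all_years:,} values per km^2 over {len(YEARS)} years")  # 5,120,000
```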

What are Satellite Embeddings?

In essence, the thousands of input data bands are compressed into 64 bands, which together form a satellite embedding.

The 64 bands of the satellite embedding visualized for a point in Brazil.

You can think of an embedding as a new set of coordinates that places each point on Earth on the surface of a sphere in 64-dimensional space. The embeddings are constructed so that similar locations (such as two sites with solar panels) point to nearby spots on the sphere, while dissimilar locations (such as solar panels versus forest) point to distant ones.

Similar locations have satellite embeddings close together, while different locations have embeddings that point in different directions.
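A minimal sketch of what "lying on a sphere" means: any non-zero vector can be rescaled to length 1, which keeps only its direction. The vectors below are hypothetical 64-dimensional stand-ins, not real AlphaEarth embeddings, and `to_unit_sphere` is an illustrative helper, not part of any dataset API.

```python
import math

def to_unit_sphere(v):
    """Rescale a vector to length 1 so it lies on the unit sphere;
    only its direction is kept."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("the zero vector has no direction")
    return [x / norm for x in v]

# Hypothetical 64-D vectors, not real AlphaEarth embeddings.
a = to_unit_sphere([1.0] * 64)
b = to_unit_sphere([2.0] * 64)  # same direction, twice the length

print(a[0])                                      # 0.125, i.e. 1 / sqrt(64)
print(math.isclose(sum(x * x for x in a), 1.0))  # True: unit length
print(a == b)                                    # True: same point on the sphere
```

Because only direction survives the rescaling, closeness on the sphere is a statement about angles, which is what the similarity measure below relies on.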

What’s special is that the embeddings capture similarities and differences across both space and time.

This means that it is possible to track changes in the same patch of land just by looking at the evolution of the embedding.

The same geographical point can get different satellite embeddings over time, making it possible to track changes over time.
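As a minimal sketch of change tracking, the code below compares one pixel’s embeddings from consecutive years and flags years where the similarity drops sharply. The 4-dimensional vectors are hypothetical toy stand-ins for 64-dimensional embeddings, and the 0.5 threshold is an arbitrary illustrative choice, not a value from AlphaEarth or its documentation.

```python
import math

def normalize(v):
    """Rescale to unit length, mirroring the dataset's unit-length vectors."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def flag_changes(series, threshold=0.5):
    """Return (year_before, year_after, similarity) for each pair of
    consecutive years whose unit-length embeddings have a dot product
    below `threshold`."""
    years = sorted(series)
    changes = []
    for y0, y1 in zip(years, years[1:]):
        sim = dot(series[y0], series[y1])
        if sim < threshold:
            changes.append((y0, y1, round(sim, 3)))
    return changes

# Hypothetical annual embeddings for one pixel: stable until 2020,
# then an abrupt shift (for example, after land clearing).
pixel = {
    2018: normalize([0.9, 0.1, 0.0, 0.1]),
    2019: normalize([0.9, 0.1, 0.1, 0.1]),
    2020: normalize([0.8, 0.2, 0.1, 0.1]),
    2021: normalize([0.1, 0.1, 0.9, 0.3]),
    2022: normalize([0.1, 0.2, 0.9, 0.3]),
}
print(flag_changes(pixel))  # [(2020, 2021, 0.274)]
```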

The similarity between two spatio-temporal locations is then given by the angle between their embedding vectors: the smaller the angle, the more similar the locations.

A popular similarity metric is the cosine similarity, which is the cosine of the angle θ between two vectors a and b:

\[ \cos(\theta) = \frac{\mathbf{a} \cdot \mathbf{b}}{\|\mathbf{a}\|\,\|\mathbf{b}\|} \]

The embedding vectors in the AlphaEarth Foundations Satellite Embedding dataset are already normalized to unit length (a length of one). The denominator is therefore 1, and the cosine similarity reduces to the dot product of the embedding vectors, which is very efficient to compute:

\[ \cos(\theta) = \mathbf{a} \cdot \mathbf{b} \]
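The two formulas above can be checked numerically. This sketch uses hypothetical 2-dimensional toy vectors for readability (the dataset’s are 64-dimensional); the helper names are illustrative. It computes the full cosine similarity, then shows that after normalizing to unit length a plain dot product gives the same value.

```python
import math

def cosine_similarity(a, b):
    """cos(theta) = (a . b) / (||a|| ||b||), valid for any non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def unit(v):
    """Rescale a vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

a = [3.0, 4.0]  # toy vectors, both of length 5
b = [4.0, 3.0]

print(cosine_similarity(a, b))  # 0.96, i.e. 24 / 25

# With unit-length vectors, the norms are 1 and only the dot product remains.
a_hat, b_hat = unit(a), unit(b)
dot_only = sum(x * y for x, y in zip(a_hat, b_hat))
print(math.isclose(dot_only, cosine_similarity(a, b)))  # True
```

Skipping the two square roots and the division for every pair is what makes the pre-normalized dataset cheap to compare at scale.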

Visualizing the cosine similarity makes it possible to highlight areas that are either very similar or very different.

The dot product between two satellite embedding vector coordinates measures the similarity between them.
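A minimal sketch of such a similarity map: take one reference embedding, compute its dot product with every pixel in a small grid, and threshold the result. The grid, the 3-dimensional toy vectors, the `forest`/`solar` labels, and the 0.9 cut-off are all hypothetical; real embeddings are 64-dimensional, and at scale one would vectorize this (for example with NumPy) rather than loop in pure Python.

```python
import math

def unit(v):
    """Scale a vector to unit length (the dataset's vectors already are)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def similarity_map(grid, reference):
    """Dot product of every pixel embedding in a 2-D grid with one reference
    embedding; for unit-length vectors this equals the cosine similarity."""
    return [[sum(p * r for p, r in zip(pixel, reference)) for pixel in row]
            for row in grid]

# Hypothetical 2 x 3 grid of toy 3-D embeddings (real embeddings are 64-D).
forest = unit([1.0, 0.1, 0.0])
solar = unit([0.0, 0.2, 1.0])
grid = [
    [forest, forest, solar],
    [forest, solar, solar],
]

sim = similarity_map(grid, reference=solar)
mask = [[s > 0.9 for s in row] for row in sim]  # 0.9 is an arbitrary cut-off
print(mask)  # [[False, False, True], [False, True, True]]
```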

What can we use Satellite Embeddings for?

Since the satellite embeddings combine thousands of data layers, they carry rich semantic information about the relationships between locations on Earth.

In fact, AlphaEarth achieves state-of-the-art results on many benchmarks, with the authors reporting on average 24% lower error rates than the other models tested [5].

Some downstream tasks highlighted by the authors include similarity search, unsupervised clustering, and supervised classification [3].

AlphaEarth Foundations and the Satellite Embedding Dataset enable several downstream tasks, such as similarity search, unsupervised clustering, and supervised classification
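To make two of these tasks concrete, here is a minimal pure-Python sketch of similarity search (rank candidates by dot product with a query) and unsupervised clustering (textbook k-means, i.e. Lloyd’s algorithm, with a naive "first k points" initialization). The pixel names and 3-dimensional toy vectors are hypothetical, and neither routine is claimed to be what the AlphaEarth authors use; a real workflow would rely on an optimized library.

```python
import math

def unit(v):
    """Scale a vector to unit length (the dataset's vectors already are)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def top_k_similar(query, candidates, k=2):
    """Similarity search: rank candidates by dot product with the query."""
    ranked = sorted(candidates.items(), key=lambda item: dot(query, item[1]), reverse=True)
    return [name for name, _ in ranked[:k]]

def kmeans(points, k, iters=10):
    """Plain k-means (Lloyd's algorithm) with a naive init: the first k points."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid (squared Euclidean distance).
        labels = [min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical toy 3-D embeddings standing in for 64-D pixel embeddings.
pixels = {
    "field_a": unit([1.0, 0.1, 0.0]),
    "water_a": unit([0.0, 1.0, 0.1]),
    "field_b": unit([0.9, 0.2, 0.0]),
    "water_b": unit([0.1, 0.9, 0.0]),
    "field_c": unit([1.0, 0.0, 0.1]),
}

query = pixels["field_a"]
others = {name: vec for name, vec in pixels.items() if name != "field_a"}
print(top_k_similar(query, others, k=2))  # ['field_b', 'field_c']

labels = kmeans(list(pixels.values()), k=2)
print(dict(zip(pixels, labels)))
# {'field_a': 0, 'water_a': 1, 'field_b': 0, 'water_b': 1, 'field_c': 0}
```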

Moreover, what makes these embeddings especially powerful is that the model supports continuous time [6].

Traditional data products are discrete in time, because a satellite only passes over a given patch of land at discrete points in time. AlphaEarth, however, is trained to interpolate between these observations.

These tasks can support applications such as tracking deforestation and urban expansion, categorizing agricultural land to aid food security, and modeling water resources.
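For supervised classification, a very small labelled set can already go a long way. The sketch below uses a nearest-centroid classifier, chosen here only for brevity (it is not necessarily what the AlphaEarth authors or the official tutorials use): each class is summarized by its mean direction, and a new pixel gets the class with the highest dot product. The `cropland`/`forest` labels and 3-dimensional toy vectors are hypothetical.

```python
import math

def unit(v):
    """Scale a vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def train_centroids(samples):
    """Nearest-centroid 'training': sum each class's embeddings and rescale the
    sum to unit length, giving the class's mean direction on the sphere."""
    sums = {}
    for vec, label in samples:
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, x in enumerate(vec):
            acc[i] += x
    return {label: unit(acc) for label, acc in sums.items()}

def predict(centroids, vec):
    """Assign the class whose centroid has the largest dot product with vec."""
    return max(centroids, key=lambda label: sum(a * b for a, b in zip(vec, centroids[label])))

# Hypothetical labelled pixels (toy 3-D embeddings; real ones are 64-D).
train = [
    (unit([1.0, 0.2, 0.0]), "cropland"),
    (unit([0.9, 0.3, 0.1]), "cropland"),
    (unit([0.1, 0.2, 1.0]), "forest"),
    (unit([0.0, 0.3, 0.9]), "forest"),
]
centroids = train_centroids(train)

print(predict(centroids, unit([0.8, 0.25, 0.05])))  # cropland
print(predict(centroids, unit([0.05, 0.1, 0.95])))  # forest
```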

What are the limitations?

The Satellite Embedding dataset still has its fair share of limitations, including:

  • Less interpretable features: The 64 embedding bands don’t have a direct physical meaning, which can make predictions and feature importances hard to interpret.
  • Only 2017-2024: The V1 dataset only goes back to 2017, limiting studies of earlier periods.
  • Mostly land-focused: Many of the input data sources have limited coverage over the oceans, which also affects the embeddings.
  • Limited coverage at the poles: Data quality is also limited around the poles.

Resources

If you are curious to learn more, the Google DeepMind team has curated a collection of excellent resources and tutorials.

Moreover, the Google DeepMind team is currently offering a series of small grants to researchers using the Satellite Embedding dataset.

Conclusion

The AlphaEarth Foundations geospatial foundation model marks a pivotal moment for geospatial analysis.

I am very excited to explore the applications of this model!

Take care! 🥳

References

[1] Rocchio LEP, Campbell J. Imaging the Past. Landsat Science. 2016.
[2] U.S. Geological Survey. Landsat Project Statistics. 2018.
[3] Google Earth. AI-powered pixels: Introducing Google’s Satellite Embedding dataset. Google Earth and Earth Engine. 2025.
[4] Google Earth Engine, Google DeepMind. Satellite Embedding V1. Earth Engine Data Catalog, Google for Developers. 2025.
[5] Google DeepMind. AlphaEarth Foundations helps map our planet in unprecedented detail. 2025.
[6] Brown CF, Kazmierski MR, Pasquarella VJ, Rucklidge WJ, Samsikova M, Zhang C et al. AlphaEarth Foundations: An embedding field model for accurate and efficient global mapping from sparse label data. 2025. doi:10.48550/arXiv.2507.22291.

Citation

BibTeX citation:
@online{heiman2025,
  author = {Heiman, Alice},
  title = {AlphaEarth: {A} {Peek} into the {Potential} of {Geospatial}
    {Satellite} {Embeddings}},
  date = {2025-08-09},
  url = {https://aliceheiman.github.io/posts/alphaearth-intro},
  langid = {en}
}
For attribution, please cite this work as:
Heiman A. AlphaEarth: A Peek into the Potential of Geospatial Satellite Embeddings. 2025. https://aliceheiman.github.io/posts/alphaearth-intro.