WOODFINE CAPITAL PROJECTS

Regional Name Resolution Architecture


From the Woodfine Projects

Each co-location cluster is labelled with a human-readable regional name: a North American Metropolitan Area, a European NUTS-3 region, a Mexican municipio, or a Canadian Census Subdivision. That name is the output of a layered offline reverse-geocoding pipeline that draws on five regional open boundary datasets, plus a global fallback layer, without requiring external API calls.

Updated 2026-05-25
customer-woodfine

The regional names shown on the co-location map are not a single field in the source data; each is the output of a layered offline reverse-geocoding pipeline. This article documents the data sources, the lookup order, and the post-processing that together produce the names visible on the platform. The clusters themselves are produced upstream by the deterministic ranking system after deduplication.

The Five Boundary Layers

Each cluster anchor's coordinates are tested against five regional boundary datasets, plus a global Natural Earth fallback, in a country-specific order:

| Layer | Source | Coverage | Granularity |
|---|---|---|---|
| us_cbsa.geojson | US Census Bureau TIGER GENZ2023 | United States | Core-Based Statistical Areas (Metro + Micropolitan) |
| ca_cma.geojson | Statistics Canada 2021 Census | Canada | Census Metropolitan Areas |
| ca_csd.geojson | GADM 4.1 admin-3 (UC Davis Open Data) | Canada | Census Subdivision proxies (municipalities) |
| mx_municipio.geojson | GADM 4.1 admin-2 (UC Davis Open Data) | Mexico | Municipios |
| eu_nuts3.geojson | Eurostat GISCO 2021 | EU + UK + EFTA + Western Balkans | NUTS-3 regions |
| fallback_ne_admin1.geojson | Natural Earth 10m | Global | Admin-1 (states / provinces) |

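All files load once at engine initialisation. A spatial index over each layer narrows every point-in-polygon query to a few candidate polygons in roughly O(log N) time, where N is the number of polygons in the layer; only those candidates receive the exact containment test.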
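The article does not say which index structure the engine uses. The sketch below assumes single-ring polygons without holes and uses hypothetical names (`Region`, `GridIndex`, `point_in_ring`). It swaps in a uniform grid over bounding boxes as a simpler stand-in for a tree index such as an R-tree, then applies an even-odd ray-casting test to each candidate.

```python
import math
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Region:
    """Hypothetical record: one named polygon with a single outer ring (no holes)."""
    name: str
    ring: list  # [(lon, lat), ...]; repeating the first vertex at the end is optional
    bbox: tuple = field(init=False)

    def __post_init__(self):
        xs = [x for x, _ in self.ring]
        ys = [y for _, y in self.ring]
        self.bbox = (min(xs), min(ys), max(xs), max(ys))


def point_in_ring(lon, lat, ring):
    """Even-odd ray casting: toggle on each edge crossed by a ray heading east."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > lat) != (y2 > lat):  # edge straddles the ray's latitude
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


class GridIndex:
    """Uniform grid over bounding boxes; a stand-in for a tree-based spatial index."""

    def __init__(self, regions, cell_deg=1.0):
        self.regions = regions
        self.cell_deg = cell_deg
        self.cells = defaultdict(list)
        for idx, region in enumerate(regions):
            minx, miny, maxx, maxy = region.bbox
            for cx in range(self._cell(minx), self._cell(maxx) + 1):
                for cy in range(self._cell(miny), self._cell(maxy) + 1):
                    self.cells[(cx, cy)].append(idx)

    def _cell(self, value):
        return math.floor(value / self.cell_deg)

    def lookup(self, lon, lat):
        """Return the name of the first region containing the point, or None."""
        for idx in self.cells.get((self._cell(lon), self._cell(lat)), []):
            region = self.regions[idx]
            minx, miny, maxx, maxy = region.bbox
            if minx <= lon <= maxx and miny <= lat <= maxy and point_in_ring(lon, lat, region.ring):
                return region.name
        return None


regions = [
    Region("West", [(0, 0), (2, 0), (2, 2), (0, 2)]),
    Region("East", [(2, 0), (4, 0), (4, 2), (2, 2)]),
]
index = GridIndex(regions)
print(index.lookup(1.0, 1.0))    # West
print(index.lookup(3.0, 1.0))    # East
print(index.lookup(10.0, 10.0))  # None
```

A grid gives near-constant candidate lookup when polygons are of similar size. A tree index copes better with the mixed scales in these layers (small municipios alongside large admin-1 states), which is where the logarithmic behaviour comes from.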

Country-Specific Routing

The engine routes each cluster's anchor coordinates by ISO country code:

  • United States: CBSA lookup. If a match is found, the CBSA name is formatted (state suffix stripped, "Metro Area" appended if absent).
  • Canada: Census Subdivision lookup first (admin-3). When both a Census Subdivision and the surrounding Census Metropolitan Area match and differ, the result is composed: "Strathcona County, Edmonton". When only one matches, that name is returned alone.
  • Mexico: Municipio lookup (admin-2). On a match, the municipio name is returned with Spanish-text post-processing applied. On a miss, the engine falls through to the Natural Earth state-level fallback.
  • European Union, United Kingdom, EFTA, Western Balkans: NUTS-3 lookup.
  • Fallback: Natural Earth admin-1 for any country not covered by the layered files. Returns state or province names.
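The routing rules above can be sketched as a single dispatch function. This is an illustration, not the engine's code. The lookup keys, `resolve_region_name`, and the abbreviated `NUTS_COUNTRIES` set are hypothetical, and each lookup is assumed to return a raw name or `None`. Falling through to the admin-1 fallback on a US, Canadian, or European miss is also an assumption, since the text states it only for Mexico. Spanish-text post-processing is omitted.

```python
from typing import Callable, Dict, Optional

Lookup = Callable[[float, float], Optional[str]]  # (lon, lat) -> raw name or None

# Hypothetical and abbreviated: the real NUTS-3 coverage spans the EU, UK,
# EFTA, and Western Balkans.
NUTS_COUNTRIES = {"DE", "FR", "ES", "IT", "NL", "BE", "PL", "FI", "GR", "GB", "NO", "CH", "RS"}


def format_cbsa(raw: str) -> str:
    """Strip the trailing state suffix and append "Metro Area" if absent."""
    name = raw.rsplit(",", 1)[0].strip()  # "Akron, OH" -> "Akron"
    if not name.endswith("Metro Area"):
        name += " Metro Area"
    return name


def compose_canada(csd: Optional[str], cma: Optional[str]) -> Optional[str]:
    """Join subdivision and metro area when both match and differ."""
    if csd and cma and csd != cma:
        return f"{csd}, {cma}"
    return csd or cma


def resolve_region_name(iso: str, lon: float, lat: float,
                        lookups: Dict[str, Lookup]) -> Optional[str]:
    if iso == "US":
        raw = lookups["us_cbsa"](lon, lat)
        name = format_cbsa(raw) if raw else None
    elif iso == "CA":
        name = compose_canada(lookups["ca_csd"](lon, lat), lookups["ca_cma"](lon, lat))
    elif iso == "MX":
        name = lookups["mx_municipio"](lon, lat)  # Spanish clean-up would run here
    elif iso in NUTS_COUNTRIES:
        name = lookups["eu_nuts3"](lon, lat)
    else:
        name = None
    # Natural Earth admin-1 fallback: uncovered countries and (by assumption) any miss.
    return name or lookups["ne_admin1"](lon, lat)


# Stub lookups standing in for the real boundary layers.
stub = {
    "us_cbsa": lambda lon, lat: "Akron, OH",
    "ca_csd": lambda lon, lat: "Strathcona County",
    "ca_cma": lambda lon, lat: "Edmonton",
    "mx_municipio": lambda lon, lat: None,  # simulate a miss
    "eu_nuts3": lambda lon, lat: None,
    "ne_admin1": lambda lon, lat: "Admin-1 fallback",
}
print(resolve_region_name("US", -81.5, 41.1, stub))   # Akron Metro Area
print(resolve_region_name("CA", -113.3, 53.5, stub))  # Strathcona County, Edmonton
print(resolve_region_name("MX", -96.1, 19.1, stub))   # Admin-1 fallback
```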

Each layer has a tolerance built into its spatial query: when a point falls just outside any polygon — for instance, a coastal store on a fjord edge — the engine accepts the nearest polygon within approximately 15 km. This prevents legitimate stores in coastal configurations from falling through to the fallback layer.
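The tolerance can be implemented as a nearest-boundary search that runs only after the exact point-in-polygon test has missed. The sketch below is one possible approach. The helper names and the local equirectangular projection are illustrative choices, not the engine's documented method. The projection is accurate enough at a 15 km scale but ignores antimeridian wrap-around.

```python
import math

EARTH_RADIUS_KM = 6371.0088
TOLERANCE_KM = 15.0  # "approximately 15 km" per the text


def _to_local_km(lon, lat, lon0, lat0):
    """Equirectangular projection centred on (lon0, lat0), in kilometres."""
    x = math.radians(lon - lon0) * math.cos(math.radians(lat0)) * EARTH_RADIUS_KM
    y = math.radians(lat - lat0) * EARTH_RADIUS_KM
    return x, y


def _point_segment_km(p, a, b):
    """Planar distance from p to segment ab, clamping the projection onto ab."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    t = 0.0 if seg_len2 == 0 else max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len2))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def ring_distance_km(lon, lat, ring):
    """Distance from a point (assumed outside the ring) to the ring's boundary."""
    pts = [_to_local_km(x, y, lon, lat) for x, y in ring]  # the query point becomes (0, 0)
    return min(_point_segment_km((0.0, 0.0), pts[i], pts[(i + 1) % len(pts)])
               for i in range(len(pts)))


def nearest_within_tolerance(lon, lat, regions, tolerance_km=TOLERANCE_KM):
    """Return the name of the closest region within tolerance_km, else None."""
    best_name, best_d = None, tolerance_km
    for name, ring in regions:
        d = ring_distance_km(lon, lat, ring)
        if d <= best_d:
            best_name, best_d = name, d
    return best_name


coast = [("Coastal Region", [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)])]
print(nearest_within_tolerance(1.1, 0.5, coast))  # Coastal Region (about 11 km away)
print(nearest_within_tolerance(1.5, 0.5, coast))  # None (about 56 km away)
```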

Post-Processing the Raw Names

Boundary files carry names in the source language, and some store them with words run together or with missing spaces, so they are not human-readable as delivered. Three transformations clean them.

CamelCase splitter. GADM 4.1 admin-2 and admin-3 names are stored without word separators. "StrathconaCounty" becomes "Strathcona County".
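Here is a minimal sketch of the splitter. It assumes the rule is simply "insert a space wherever a lowercase letter is followed by an uppercase letter". Using `str.islower` and `str.isupper` keeps accented letters in scope without listing them in a character class. The function name is hypothetical.

```python
def split_camel_case(name: str) -> str:
    """Insert a space at each lowercase-to-uppercase boundary."""
    out = []
    for i, ch in enumerate(name):
        if i > 0 and ch.isupper() and name[i - 1].islower():
            out.append(" ")
        out.append(ch)
    return "".join(out)


print(split_camel_case("StrathconaCounty"))  # Strathcona County
print(split_camel_case("BocadelRío"))        # Bocadel Río
```

The second example shows that this rule alone leaves a lowercase preposition glued in place. The same rule would also turn a name like "McMurray" into "Mc Murray", the kind of case an explicit override can catch.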

Spanish preposition splitter. Some Mexican municipio names have a preposition glued to the preceding word: "Bocadel Río", "Apetatitlánde Antonio Carvajal". A regular expression detects the prepositions de, del, la, las, el, and los attached to a preceding lowercase character and inserts a space before the preposition, giving "Boca del Río" and "Apetatitlán de Antonio Carvajal".
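Here is a sketch of the splitter under stated assumptions. The pattern only fires when the glued preposition is followed by a space and a capitalised word. A small hypothetical `KEEP_WHOLE` set protects ordinary words that happen to end in a preposition, because a literal reading of the rule would turn "Miguel Hidalgo" into "Migu el Hidalgo". The engine's actual pattern and exception handling are not documented here, and its override dictionary may cover such cases instead.

```python
import re

PREPOSITIONS = {"del", "de", "las", "la", "los", "el"}

# Hypothetical guard list: real words that end in a preposition.
KEEP_WHOLE = {"Miguel", "Ángel", "Manuel", "Rafael", "Gabriel", "Isabel", "Carlos", "Villa"}

# A word made of a prefix ending in a lowercase letter plus a glued preposition,
# followed by " " and a capital. The lazy prefix finds the earliest-starting
# preposition, so "Bocadel" splits as "Boca" + "del" rather than "Bocad" + "el".
_GLUED = re.compile(r"\b(\w*?[a-záéíóúüñ])(del|de|las|la|los|el)(?= [A-ZÁÉÍÓÚÜÑ])")


def _split(match):
    word = match.group(0)
    if word in KEEP_WHOLE or word.lower() in PREPOSITIONS:
        return word  # leave real words and already-separated prepositions alone
    return f"{match.group(1)} {match.group(2)}"


def split_spanish_prepositions(name: str) -> str:
    return _GLUED.sub(_split, name)


print(split_spanish_prepositions("Bocadel Río"))                     # Boca del Río
print(split_spanish_prepositions("Apetatitlánde Antonio Carvajal"))  # Apetatitlán de Antonio Carvajal
print(split_spanish_prepositions("Miguel Hidalgo"))                  # Miguel Hidalgo
```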

Period normaliser. "Gustavo A.Madero" is normalised to "Gustavo A. Madero".
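The period rule reduces to one substitution. This sketch assumes the trigger is a period directly followed by an uppercase letter. The function name is hypothetical.

```python
import re

# A period immediately followed by an uppercase letter (accented capitals included).
_GLUED_PERIOD = re.compile(r"\.(?=[A-ZÁÉÍÓÚÜÑ])")


def normalise_periods(name: str) -> str:
    """Add a space after a period that runs straight into a capitalised word."""
    return _GLUED_PERIOD.sub(". ", name)


print(normalise_periods("Gustavo A.Madero"))   # Gustavo A. Madero
print(normalise_periods("Gustavo A. Madero"))  # Gustavo A. Madero
```

Because the pattern only touches periods with no following space, running it twice is harmless.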

A separate explicit-override dictionary handles cases that fall outside the regular-expression scope: Greek names transliterated to English, Finnish suffix simplifications, Polish prefix stripping, Belgian bilingual name normalisation. This dictionary held approximately 200 entries as of mid-2026.

Mexican Display Overrides

Some Mexican municipio names are technically correct but not the form a Spanish-speaking reader expects on a map. A small display-override dictionary maps INEGI Zona Metropolitana names to their common short forms — "Zona Metropolitana del Valle de México" becomes "Ciudad de México", "Zona Metropolitana de Guadalajara" becomes "Guadalajara".

Scale

After the layered routing and post-processing, the engine produces approximately 1,200 unique region names across the operational footprint as of May 2026. By country: 671 distinct US Core-Based Statistical Areas (metropolitan and micropolitan, all displayed with the "Metro Area" suffix), 245 Canadian regions (Census Subdivisions and Census Metropolitan Areas combined), 104 Mexican municipios, and the remaining European NUTS-3 regions. Each region name appears on the map in cluster pop-ups and in the inspector panel.
