urbanity.utils
General helper functions covering OSM data discovery, country/city lookups, graph edge construction, data splitting, and other utilities used internally across the package.
OSM Extract Discovery & Download
fetch_geofabrik_index
Downloads and caches the Geofabrik extract index (a GeoJSON-like index of all available .osm.pbf extracts with their bounding geometries).
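A minimal sketch of the download-once, read-from-cache pattern, assuming the index follows the layout of Geofabrik's public `index-v1.json` (a GeoJSON `FeatureCollection` whose features carry `properties.name` and `properties.urls.pbf`). The function name `load_extract_index`, the cache filename, and the tuple return shape are illustrative assumptions, not urbanity's API.

```python
import json
import os
import urllib.request

GEOFABRIK_INDEX_URL = "https://download.geofabrik.de/index-v1.json"


def load_extract_index(data_dir, url=GEOFABRIK_INDEX_URL, cache_name="geofabrik_index.json"):
    """Return (name, pbf_url, geometry) tuples, downloading the index only on the first call."""
    os.makedirs(data_dir, exist_ok=True)
    cache_path = os.path.join(data_dir, cache_name)
    if not os.path.exists(cache_path):
        # Cache miss: fetch the raw index once and store it under data_dir.
        with urllib.request.urlopen(url) as response:
            payload = response.read()
        with open(cache_path, "wb") as f:
            f.write(payload)
    # Every call (including the first) parses the local copy.
    with open(cache_path, encoding="utf-8") as f:
        index = json.load(f)
    extracts = []
    for feature in index["features"]:
        props = feature["properties"]
        pbf_url = props.get("urls", {}).get("pbf")
        if pbf_url:  # skip entries that offer no .osm.pbf download
            extracts.append((props["name"], pbf_url, feature["geometry"]))
    return extracts
```

Because `urlopen` also accepts `file://` URLs, the cache path can be exercised offline by pointing `url` at a local JSON file.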
fetch_bbbike_index
Builds a BBBike index of cities with their boundary geometries. Downloads the city list once, then lazily fetches .poly files per city.
Returns: List of dicts with keys: name, geometry, pbf_url.
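The lazy-fetch behaviour can be sketched as an entry that downloads and parses its Osmosis `.poly` boundary only the first time its geometry is read. `parse_poly`, `LazyCity`, and the injected `fetch_text` callable are hypothetical names; the sketch returns rings as coordinate lists rather than a Shapely geometry, and an object rather than the documented dict.

```python
from functools import cached_property


def parse_poly(text):
    """Parse Osmosis .poly text into a list of (is_hole, [(lon, lat), ...]) rings."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    rings, current, is_hole = [], None, False
    for ln in lines[1:]:  # the first line is the polygon name
        if current is None:
            if ln == "END":  # final END closes the whole file
                break
            is_hole = ln.startswith("!")  # a leading '!' marks an inner ring (hole)
            current = []
        elif ln == "END":  # END closes the current ring
            rings.append((is_hole, current))
            current = None
        else:
            lon, lat = map(float, ln.split()[:2])
            current.append((lon, lat))
    return rings


class LazyCity:
    """City entry whose boundary is fetched only when .geometry is first accessed."""

    def __init__(self, name, poly_url, pbf_url, fetch_text):
        self.name, self.poly_url, self.pbf_url = name, poly_url, pbf_url
        self._fetch_text = fetch_text  # callable: url -> .poly file contents

    @cached_property
    def geometry(self):
        # Runs once; later accesses reuse the cached result.
        return parse_poly(self._fetch_text(self.poly_url))
```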
find_smallest_extract
Searches both Geofabrik and BBBike to find the smallest .osm.pbf extract that fully contains the area of interest. Automatically used by get_street_network() to avoid downloading unnecessarily large files.
| Parameter | Type | Description |
|---|---|---|
| aoi_geometry | Shapely geometry | The area of interest polygon. |
| data_dir | str | Directory to cache downloaded index files. |
Returns: Dict with keys url, name, size.
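A simplified sketch of the selection rule, assuming candidates from both indexes have been merged into dicts carrying a bounding box and a `size` value (taken here to be the download size). It tests bounding-box containment, which is coarser than the full geometric test the description implies (with Shapely that would be `extract_geom.contains(aoi_geometry)`); `smallest_containing_extract` is a hypothetical name.

```python
def bbox_contains(outer, inner):
    """True if bbox `outer` fully contains bbox `inner`; boxes are (minx, miny, maxx, maxy)."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])


def smallest_containing_extract(aoi_bbox, candidates):
    """Pick the smallest candidate whose bbox contains the AOI bbox.

    candidates: list of dicts with keys url, name, size and bbox.
    Returns a dict with keys url, name, size, or None if no extract covers the AOI.
    """
    containing = [c for c in candidates if bbox_contains(c["bbox"], aoi_bbox)]
    if not containing:
        return None
    best = min(containing, key=lambda c: c["size"])
    return {"url": best["url"], "name": best["name"], "size": best["size"]}
```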
is_extract_cached
Returns True if a valid .osm.pbf file already exists at dest_path (and optionally matches expected_size or is consistent with url).
download_extract
Downloads a .osm.pbf file from url to dest_path, skipping the download if the file is already cached.
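The cache check and the skip-if-cached download can be sketched together with the standard library. Here "valid" is assumed to mean an existing, non-empty file whose byte size equals `expected_size` when one is given; the URL-consistency check is not modelled. `is_cached` and `download_if_missing` are hypothetical names.

```python
import os
import shutil
import urllib.request


def is_cached(dest_path, expected_size=None):
    """True if dest_path is a non-empty file and (optionally) has the expected byte size."""
    if not os.path.isfile(dest_path):
        return False
    size = os.path.getsize(dest_path)
    if size == 0:
        return False
    return expected_size is None or size == expected_size


def download_if_missing(url, dest_path, expected_size=None):
    """Download url to dest_path unless a valid copy is already cached; return dest_path."""
    if is_cached(dest_path, expected_size):
        return dest_path  # cache hit: no network access at all
    parent = os.path.dirname(dest_path)
    if parent:
        os.makedirs(parent, exist_ok=True)
    tmp_path = dest_path + ".part"
    # Stream into a temporary file so an interrupted download never looks cached.
    with urllib.request.urlopen(url) as response, open(tmp_path, "wb") as out:
        shutil.copyfileobj(response, out)
    os.replace(tmp_path, dest_path)  # atomic rename on the same filesystem
    return dest_path
```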
Country & City Discovery
get_country_centroids
Returns a dictionary mapping country names to (latitude, longitude) centroid coordinates. Used by Map(country=...) to centre the initial view.
get_available_countries
Prints the list of countries for which centroid data is available.
get_available_pop_countries
Prints the list of countries where Meta population data can be fetched.
get_available_precomputed_network_data
Prints the list of cities available from the Global Urban Network Dataset.
get_country_from_polygon
Determines which country a polygon is in via Nominatim reverse geocoding of the centroid, then matches the result to the Geofabrik extract name list.
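The matching step can be sketched with fuzzy string matching from the standard library. The Nominatim request itself is omitted (it needs network access and, under Nominatim's usage policy, an identifying User-Agent); the normalisation rules, the `difflib` similarity cutoff, and the names below are assumptions rather than urbanity's actual matching logic.

```python
import difflib
import unicodedata


def normalise(name):
    """Strip accents, lowercase, and turn '-'/'_' into spaces so names compare cleanly."""
    ascii_name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    return ascii_name.lower().replace("-", " ").replace("_", " ").strip()


def match_extract_name(country, extract_names, cutoff=0.8):
    """Return the extract name most similar to `country`, or None if none clears `cutoff`."""
    lookup = {normalise(n): n for n in extract_names}
    hits = difflib.get_close_matches(normalise(country), list(lookup), n=1, cutoff=cutoff)
    return lookup[hits[0]] if hits else None
```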
get_gadm
```python
get_gadm(country: str, city: str, version: str = '4.1', max_level: int = 4, level_drop: int = 0) -> GeoDataFrame
```
Fetches GADM administrative boundary data for a city and returns its subzone GeoDataFrame.
| Parameter | Type | Default | Description |
|---|---|---|---|
| country | str | required | Country name. |
| city | str | required | City name. |
| version | str | '4.1' | GADM database version. |
| max_level | int | 4 | Maximum administrative level to retrieve. |
| level_drop | int | 0 | Number of top levels to drop. |
Population Links
get_population_data_links
Returns a dictionary mapping country names to their Meta HDX population data download URLs.
POI Utilities
finetune_poi
Relabels and trims a POI list to eight standard categories: Civic, Commercial, Entertainment, Food, Healthcare, Institutional, Recreational, Social.
| Parameter | Type | Description |
|---|---|---|
| df | DataFrame | POI DataFrame with full OSM/Overture amenity tags. |
| target | str | Column containing the raw POI category labels. |
| relabel_dict | dict | Mapping from raw labels to standard categories. |
| n | int | Minimum POI count threshold; categories below this are grouped into Other. |
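A stdlib sketch of the relabel-then-threshold logic, operating on a plain list of labels instead of a DataFrame column. It assumes labels missing from `relabel_dict` also fall into `Other`; the real function may instead drop them. `relabel_and_trim` is a hypothetical name.

```python
from collections import Counter


def relabel_and_trim(labels, relabel_dict, n):
    """Map raw POI labels to standard categories, folding rare or unmapped ones into 'Other'.

    labels: raw category labels, one per POI.
    relabel_dict: raw label -> standard category (e.g. 'restaurant' -> 'Food').
    n: minimum number of POIs a standard category needs in order to be kept.
    """
    mapped = [relabel_dict.get(label, "Other") for label in labels]
    counts = Counter(mapped)
    return [cat if counts[cat] >= n else "Other" for cat in mapped]
```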
Graph Edge Construction
get_building_to_building_edges
```python
get_building_to_building_edges(
    buildings,
    return_neighbours: str = 'knn',
    knn: int = 3,
    distance_threshold: int = 100,
    knn_threshold: int = 100,
    add_reverse: bool = True,
) -> np.ndarray
```
Generates building–building adjacency edges using either K-nearest neighbours (K-NN) or a distance threshold.
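A brute-force sketch of both neighbour modes on building centroids in a projected (metric) CRS. How urbanity combines `knn_threshold` and `distance_threshold` is not documented here, so this version applies a single `distance_threshold`, as a cap on K-NN edge length and as the radius in distance mode; `building_edges` is a hypothetical name, and it returns a list of pairs rather than an array.

```python
import math


def building_edges(points, mode="knn", k=3, distance_threshold=100.0, add_reverse=True):
    """Edges between building centroids by K-NN or by fixed radius.

    points: list of (x, y) centroids in metres. Returns sorted (source, target) pairs.
    """
    if mode not in ("knn", "distance"):
        raise ValueError("mode must be 'knn' or 'distance'")
    edges = set()
    for i, (xi, yi) in enumerate(points):
        # Distances from building i to every other building, nearest first.
        dists = sorted(
            (math.hypot(xi - xj, yi - yj), j)
            for j, (xj, yj) in enumerate(points)
            if j != i
        )
        candidates = dists[:k] if mode == "knn" else dists
        for d, j in candidates:
            if d <= distance_threshold:
                edges.add((i, j))
                if add_reverse:
                    edges.add((j, i))
    return sorted(edges)
```

For large inputs a spatial index such as `scipy.spatial.cKDTree` would replace the O(n²) loop, and `numpy.array(edges).T` turns the pairs into a 2 × E edge index.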
get_building_to_street_edges
Generates edges between buildings and their nearest adjacent street segment (within 50 m).
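A sketch of the nearest-street rule with plain geometry: project each building centroid onto every street segment and keep the closest street within 50 m. Measuring from the centroid rather than the footprint outline, and the names `point_segment_distance` and `building_to_street_edges`, are assumptions.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to segment ab (all (x, y) in metres)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0:
        return math.hypot(px - ax, py - ay)  # degenerate segment
    # Project p onto the segment's line and clamp to the endpoints.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len2))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def building_to_street_edges(buildings, streets, max_distance=50.0):
    """Link each building centroid to its nearest street polyline within max_distance."""
    edges = []
    for b_idx, centroid in enumerate(buildings):
        best = None
        for s_idx, line in enumerate(streets):
            d = min(point_segment_distance(centroid, line[k], line[k + 1])
                    for k in range(len(line) - 1))
            if d <= max_distance and (best is None or d < best[0]):
                best = (d, s_idx)
        if best is not None:  # buildings with no street in range get no edge
            edges.append((b_idx, best[1]))
    return edges
```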
get_intersection_to_street_edges
Connects street intersection nodes to the street segments they belong to.
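Assuming street segments are stored as `(u, v)` endpoint node ids, in the style of an OSMnx edge list, the linkage reduces to emitting one edge per endpoint. `intersection_to_street_edges` is a hypothetical name.

```python
def intersection_to_street_edges(street_endpoints):
    """Link every intersection node to each street segment that starts or ends at it.

    street_endpoints: list of (u, v) node ids, one per street segment.
    Returns (node_id, segment_index) pairs.
    """
    edges = []
    for seg_idx, (u, v) in enumerate(street_endpoints):
        edges.append((u, seg_idx))
        if v != u:  # a self-loop links its single node only once
            edges.append((v, seg_idx))
    return edges
```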
get_buildings_in_plot_edges
Connects each building to the urban plot polygon it falls within.
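A sketch using an even-odd ray-casting point-in-polygon test on building centroids. Testing the centroid rather than the full footprint, and assigning each building to the first containing plot, are simplifying assumptions; `buildings_in_plot_edges` is a hypothetical name.

```python
def point_in_polygon(pt, ring):
    """Even-odd ray casting: is pt inside the polygon ring [(x, y), ...]?"""
    x, y = pt
    inside = False
    n = len(ring)
    for i in range(n):
        (x1, y1), (x2, y2) = ring[i], ring[(i + 1) % n]
        # Does a ray from pt towards +x cross the edge (x1, y1)-(x2, y2)?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def buildings_in_plot_edges(building_centroids, plots):
    """(building_index, plot_index) pairs for the first plot containing each centroid."""
    edges = []
    for b_idx, c in enumerate(building_centroids):
        for p_idx, ring in enumerate(plots):
            if point_in_polygon(c, ring):
                edges.append((b_idx, p_idx))
                break  # plots are assumed not to overlap
    return edges
```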
get_plot_to_plot_edges
Generates plot–plot adjacency edges for urban plots that share a street boundary.
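Assuming each plot records the ids of the street segments that bound it, shared-boundary adjacency can be found by grouping plots per street, as in this sketch with a hypothetical `plot_to_plot_edges`.

```python
from collections import defaultdict
from itertools import combinations


def plot_to_plot_edges(plot_streets):
    """Pairs of plots that share at least one bounding street.

    plot_streets: list where entry i is the set of street ids bounding plot i.
    Returns sorted (i, j) pairs with i < j (add (j, i) for a directed graph).
    """
    plots_by_street = defaultdict(set)
    for plot_idx, streets in enumerate(plot_streets):
        for s in streets:
            plots_by_street[s].add(plot_idx)
    edges = set()
    for plots in plots_by_street.values():
        # Every pair of plots fronting the same street becomes an edge.
        edges.update(combinations(sorted(plots), 2))
    return sorted(edges)
```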
get_edge_nodes
Converts street segment LineStrings to midpoint nodes for use in dual graph or multi-nodal representations.
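One way to place a node at the midpoint of a street segment is to walk half of the polyline's length, as sketched here. Whether urbanity uses the length-based midpoint or another representative point is not stated; with Shapely the length-based version is `line.interpolate(0.5, normalized=True)`. `line_midpoint` is a hypothetical name.

```python
import math


def line_midpoint(coords):
    """Return the point halfway along a polyline [(x, y), ...], measured by length."""
    segments = list(zip(coords, coords[1:]))
    lengths = [math.dist(a, b) for a, b in segments]
    remaining = sum(lengths) / 2
    for (a, b), length in zip(segments, lengths):
        if length > 0 and remaining <= length:
            t = remaining / length  # fraction of this segment still to walk
            return (a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1]))
        remaining -= length
    return tuple(coords[-1])  # zero-length line: every vertex coincides
```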
boundary_to_plot
Creates edges from a single boundary super-node, representing the study area boundary, to every urban plot. Useful as a global context node in GNN architectures.
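The super-node construction amounts to a star of edges. This sketch targets a heterogeneous graph in which boundary and plot nodes are indexed separately, with edge lists keyed by (source type, relation, target type); the index layout, the relation names, and `boundary_to_plot_edges` are assumptions.

```python
def boundary_to_plot_edges(num_plots, add_reverse=False):
    """Star edges between one boundary super-node and every plot, keyed by edge type.

    The boundary node is index 0 of its own node type; plots are 0..num_plots-1.
    """
    edges = {("boundary", "contains", "plot"): [(0, p) for p in range(num_plots)]}
    if add_reverse:
        # Reverse direction lets plot features flow back to the global node.
        edges[("plot", "within", "boundary")] = [(p, 0) for p in range(num_plots)]
    return edges
```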
Data Preparation
one_hot_encode_categorical
Converts a categorical column to one-hot encoded binary columns with a given prefix.
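A stdlib sketch of one-hot encoding with a column-name prefix; pandas provides the same operation as `pandas.get_dummies(series, prefix=...)`. The `prefix_category` column naming and the name `one_hot` are assumptions.

```python
def one_hot(values, prefix):
    """Return {column_name: [0/1, ...]} with one binary column per distinct category."""
    categories = sorted(set(values))  # deterministic column order
    return {f"{prefix}_{cat}": [int(v == cat) for v in values] for cat in categories}
```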
standardise_and_scale
Standardises and min-max scales all numeric columns in each GeoDataFrame within the objects dict. Returns a new dict with scaled DataFrames.
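A sketch of the two-step scaling on nested plain dicts (`{table: {column: values}}`) standing in for the dict of GeoDataFrames, assuming every column is numeric: each column is z-scored with its population standard deviation, then min-max scaled to [0, 1]. The use of the population standard deviation, mapping constant columns to 0.0, and the function names are assumptions.

```python
import statistics


def _zscore_then_minmax(values):
    """Z-score a column (population std), then min-max scale the result to [0, 1]."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    z = [(v - mean) / std if std else 0.0 for v in values]
    lo, hi = min(z), max(z)
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in z]


def standardise_then_minmax(objects):
    """objects: {table_name: {column_name: [number, ...]}}; returns a new scaled dict."""
    return {
        table: {col: _zscore_then_minmax(vals) for col, vals in columns.items()}
        for table, columns in objects.items()
    }
```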
most_frequent
Returns the most common element in a list. Used to determine the dominant category in a set of POI or land-use labels.
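With the standard library this is a single `Counter.most_common` call; `Counter` orders equal counts by first appearance, so ties go to the earliest label. This is an illustrative equivalent named `most_common_label`, not urbanity's code, and it raises `IndexError` on an empty list.

```python
from collections import Counter


def most_common_label(items):
    """Most common element of items; ties resolve to the element seen first."""
    return Counter(items).most_common(1)[0][0]
```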
gdf_to_poly
Writes a GeoDataFrame of Polygon or MultiPolygon geometries to Osmosis .poly format for use with osmium or pyrosm.
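A sketch of the `.poly` writer on plain coordinate lists: the file starts with a name line, each ring is a numbered section of `lon lat` lines closed by `END` (holes are marked with a leading `!`), and a final `END` closes the file. Taking `(exterior, holes)` tuples instead of a GeoDataFrame, the 7-decimal formatting, and the name `format_poly` are assumptions.

```python
def format_poly(name, polygons):
    """Render polygons as Osmosis .poly text.

    polygons: list of (exterior, holes), where exterior is [(lon, lat), ...]
    and holes is a list of such rings. Coordinates must be WGS84 lon/lat.
    """
    lines = [name]
    section = 1
    for exterior, holes in polygons:
        for ring, is_hole in [(exterior, False)] + [(h, True) for h in holes]:
            # A leading '!' on the section header marks the ring as a hole.
            lines.append(("!" if is_hole else "") + str(section))
            lines.extend(f"   {lon:.7f}   {lat:.7f}" for lon, lat in ring)
            lines.append("END")  # closes this ring
            section += 1
    lines.append("END")  # closes the whole file
    return "\n".join(lines) + "\n"
```

A MultiPolygon maps naturally onto several `(exterior, holes)` entries, each contributing its own numbered sections.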