Wherobots open data
Wherobots collects and maintains open datasets from various data sources for use by Wherobots Cloud users. Those datasets are cleaned and transformed into Havasu format for fast and efficient analytics with WherobotsDB in Wherobots Cloud.
These datasets are available for free within Wherobots Cloud, with a subset reserved for Professional Edition users. If you are interested in upgrading your plan, please contact us.
Open data catalogs
Wherobots open data is available through two catalogs: wherobots_open_data for Community Edition datasets (available to all users, including Professional Edition users), and wherobots_pro_data for Professional Edition datasets.
Dataset name | Availability in Wherobots | Type | Count | Description |
---|---|---|---|---|
Overture Maps buildings/building | Community edition | Polygon | 785 million | Any human-made structures with roofs or interior spaces |
Overture Maps places/place | Community edition | Point | 59 million | Any business or point of interest in the world |
Overture Maps admins/administrativeBoundary | Community edition | LineString | 96 thousand | Any officially defined border between two Administrative Localities |
Overture Maps admins/locality | Community edition | Point | 2948 | Countries and hierarchical subdivisions of countries |
Overture Maps transportation/connector | Community edition | Point | 330 million | Points of physical connection between two or more segments |
Overture Maps transportation/segment | Community edition | LineString | 294 million | Center-line of a path which may be traveled |
Google & Microsoft open buildings | Professional edition | Polygon | 2.5 billion | Google & Microsoft Open Buildings, combined by VIDA |
Landsat surface temperature | Professional edition | Raster (GeoTIFF) | 166K images, 10 TB size | The temperature of the Earth's surface in Kelvin, from Aug 2023 to Oct 2023 |
US Census ZCTA codes | Professional edition | Polygon | 33144 | ZIP Code Tabulation Areas defined in 2018 |
NYC TLC taxi trip records | Professional edition | Point | 200 million | NYC TLC taxi trip pickup and dropoff records per trip |
OpenStreetMap all nodes | Professional edition | Point | 8 billion | All the nodes of the OpenStreetMap Planet dataset |
OpenStreetMap postal codes | Professional edition | Polygon | 154 thousand | Boundaries of postal code areas as defined in OpenStreetMap |
Weather events | Professional edition | Point | 8.6 million | Events such as rain, snow, and storms, from 2016 to 2022 |
Wild fires | Professional edition | Point | 1.8 million | Wildfires that occurred in the United States from 1992 to 2015 |
Accessing open data
Catalogs for the open data your account can access are automatically configured in your environment's SedonaContext and can be referenced directly using the format CATALOG_NAME.DATABASE_NAME.TABLE_NAME.
To read a table, pass its fully qualified name as a string: sedona.table("CATALOG_NAME.DATABASE_NAME.TABLE_NAME").show().
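As a minimal sketch of the naming format above, the helper below assembles a fully qualified table name and passes it to `sedona.table` as a string. The function names `qualified_name` and `preview` are hypothetical and not part of any Wherobots API; `sedona` is assumed to be the SedonaContext already configured in your environment.

```python
def qualified_name(catalog: str, database: str, table: str) -> str:
    """Join the three parts into CATALOG_NAME.DATABASE_NAME.TABLE_NAME."""
    parts = [catalog, database, table]
    for part in parts:
        # Reject empty parts and parts that already contain a dot,
        # since either would produce an ambiguous qualified name.
        if not part or "." in part:
            raise ValueError(f"invalid name part: {part!r}")
    return ".".join(parts)


def preview(sedona, catalog: str, database: str, table: str, rows: int = 5) -> None:
    """Print the first rows of a table (hypothetical convenience wrapper)."""
    sedona.table(qualified_name(catalog, database, table)).show(rows)
```

For example, `preview(sedona, "wherobots_pro_data", "weather", "weather_events")` would show the first five rows of the weather events table, provided your account has access to that catalog.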
Inspecting open data catalogs
You can inspect the existing databases and tables in a catalog as follows:
Show database names
sedona.sql("SHOW SCHEMAS IN wherobots_pro_data").show()
+----------------+
|       namespace|
+----------------+
|google_microsoft|
|         landsat|
|        nyc_taxi|
|             osm|
|       us_census|
|         weather|
+----------------+
Show table names
Using the weather database as an example:
sedona.sql("SHOW TABLES IN wherobots_pro_data.weather").show()
+---------+--------------+-----------+
|namespace|     tableName|isTemporary|
+---------+--------------+-----------+
|  weather|weather_events|      false|
|  weather|    wild_fires|      false|
+---------+--------------+-----------+
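The two statements above can be combined to enumerate every table in a catalog. The following is a sketch only: `list_tables` is a hypothetical helper, `sedona` is assumed to be the preconfigured SedonaContext, and the column names `namespace` and `tableName` are taken from the outputs shown above.

```python
from typing import List


def list_tables(sedona, catalog: str) -> List[str]:
    """Return fully qualified names of all tables in a catalog."""
    names = []
    # One row per database, with the database name in the "namespace" column.
    for schema in sedona.sql(f"SHOW SCHEMAS IN {catalog}").collect():
        database = schema["namespace"]
        # One row per table, with the table name in the "tableName" column.
        for row in sedona.sql(f"SHOW TABLES IN {catalog}.{database}").collect():
            names.append(f"{catalog}.{database}.{row['tableName']}")
    return names
```

Calling `list_tables(sedona, "wherobots_pro_data")` should return names in the CATALOG_NAME.DATABASE_NAME.TABLE_NAME format, which can be passed directly to `sedona.table`.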
Show table schema and content
Using weather.weather_events as an example:
sedona.table("wherobots_pro_data.weather.weather_events").printSchema()
root
|-- EventId: string (nullable = true)
|-- Type: string (nullable = true)
|-- Severity: string (nullable = true)
|-- StartTime(UTC): string (nullable = true)
|-- EndTime(UTC): string (nullable = true)
|-- Precipitation(in): string (nullable = true)
|-- TimeZone: string (nullable = true)
|-- AirportCode: string (nullable = true)
|-- LocationLat: string (nullable = true)
|-- LocationLng: string (nullable = true)
|-- City: string (nullable = true)
|-- County: string (nullable = true)
|-- State: string (nullable = true)
|-- ZipCode: string (nullable = true)
|-- geometry: geometry (nullable = true)
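In the schema above, every column except geometry is a string, and some column names contain parentheses. The sketch below builds a Spark SQL query that quotes column names in backticks and casts a few columns to DOUBLE. It assumes that those columns hold numeric strings, which the schema alone does not confirm; `typed_events_query` and the output aliases are hypothetical names chosen for illustration.

```python
# Source column -> alias for columns assumed to contain numeric strings.
NUMERIC_COLUMNS = {
    "Precipitation(in)": "precipitation_in",
    "LocationLat": "location_lat",
    "LocationLng": "location_lng",
}

# Columns passed through unchanged.
PASSTHROUGH_COLUMNS = ["EventId", "Type", "Severity", "geometry"]


def typed_events_query(table: str = "wherobots_pro_data.weather.weather_events") -> str:
    """Build a SELECT that casts the assumed-numeric columns to DOUBLE."""
    # Backticks are needed for names such as Precipitation(in).
    plain = [f"`{name}`" for name in PASSTHROUGH_COLUMNS]
    casts = [
        f"CAST(`{source}` AS DOUBLE) AS {alias}"
        for source, alias in NUMERIC_COLUMNS.items()
    ]
    return f"SELECT {', '.join(plain + casts)} FROM {table}"
```

The resulting string can be run with `sedona.sql(typed_events_query())`. Values that cannot be parsed as numbers would typically become null under Spark's default (non-ANSI) cast behavior, so it is worth checking the result before relying on it.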
Use case notebooks
We provide use case notebooks that demonstrate how to link your data to the physical world and derive insights. Professional Edition users can run these notebooks on Wherobots Cloud.
An overview of each notebook follows.