Define Sedona context
Load OpenStreetMap road data
We will call load_osm to load the road network we will match against. Wherobots Map Matcher reads OSM’s XML file format to load detailed, open source road network data. We’ve got a sample dataset for the Ann Arbor, Michigan area that we will use. The [car] parameter tells matcher to filter out anything in the network that is not big enough for motor vehicle traffic.
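To make the idea of filtering a road network for cars concrete, here is a minimal plain-Python sketch that keeps only drivable ways from an OSM XML snippet. It is not the load_osm implementation: the `DRIVABLE` tag set, the `drivable_way_ids` helper, and the sample XML are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Assumption: an illustrative set of OSM "highway" values treated as drivable.
# The filter that the [car] option actually applies may differ.
DRIVABLE = {
    "motorway", "trunk", "primary", "secondary", "tertiary",
    "unclassified", "residential", "service",
}

def drivable_way_ids(osm_xml):
    """Return the ids of <way> elements whose highway tag is in DRIVABLE."""
    root = ET.fromstring(osm_xml)
    ids = []
    for way in root.iter("way"):
        # Collect the way's key/value tags into a dict for easy lookup.
        tags = {t.get("k"): t.get("v") for t in way.findall("tag")}
        if tags.get("highway") in DRIVABLE:
            ids.append(way.get("id"))
    return ids

# Hypothetical three-way extract: a street, a footpath, and a building outline.
SAMPLE = """<osm version="0.6">
  <way id="1"><nd ref="10"/><nd ref="11"/><tag k="highway" v="residential"/></way>
  <way id="2"><nd ref="11"/><nd ref="12"/><tag k="highway" v="footway"/></way>
  <way id="3"><nd ref="12"/><nd ref="13"/><tag k="building" v="yes"/></way>
</osm>"""

print(drivable_way_ids(SAMPLE))
```

Only the residential street survives; the footpath and the building outline are dropped because they carry no drivable highway tag.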
Load sample GPS tracking data from VED
For this analysis, we’re leveraging the Vehicle Energy Dataset (VED). VED is a comprehensive dataset capturing one year of GPS trajectories for 383 vehicles (including gasoline vehicles, HEVs, and PHEV/EVs) in the Ann Arbor area. The data spans about 374,000 miles/600,000 km and includes details on fuel, energy, speed, and auxiliary power usage. Driving scenarios cover diverse conditions, from highways to traffic-dense downtown areas and across four seasons.
Source: “Vehicle Energy Dataset (VED), A Large-scale Dataset for Vehicle Energy Consumption Research” by Geunseob (GS) Oh, David J. LeBlanc, Huei Peng. Published in IEEE Transactions on Intelligent Transportation Systems (T-ITS), 2020.
Each row in the dataset represents a spatial-temporal point of one vehicle’s journey. We are going to use these five columns:
- VehId — Vehicle ID
- Trip — Trip ID; unique per vehicle
- Timestamp(ms)
- Latitude[deg]
- Longitude[deg]
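As a rough illustration of working with just these five columns, the sketch below parses CSV text with Python’s standard csv module and drops everything else. This is not the Spark loading code used in this walkthrough; the `read_points` helper, the extra speed column, and the sample values are made up for the example.

```python
import csv
import io

# The five columns used in this walkthrough.
COLUMNS = ["VehId", "Trip", "Timestamp(ms)", "Latitude[deg]", "Longitude[deg]"]

def read_points(csv_text):
    """Parse CSV text and keep only the five columns of interest."""
    reader = csv.DictReader(io.StringIO(csv_text))
    points = []
    for row in reader:
        points.append({
            "VehId": row["VehId"],            # kept as text; used only as a key
            "Trip": row["Trip"],
            "Timestamp(ms)": float(row["Timestamp(ms)"]),
            "Latitude[deg]": float(row["Latitude[deg]"]),
            "Longitude[deg]": float(row["Longitude[deg]"]),
        })
    return points

# Made-up sample values, not real VED records.
SAMPLE_CSV = """VehId,Trip,Timestamp(ms),Latitude[deg],Longitude[deg],Vehicle Speed[km/h]
8,706,0,42.28,-83.74,30.0
8,706,1000,42.29,-83.73,32.5
"""

points = read_points(SAMPLE_CSV)
print(points[0])
```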
Aggregate GPS points into LineString geometries
The combination of VehId and Trip forms a unique key for our dataset: every unique pair identifies one specific trajectory of one vehicle. Raw GPS points, while valuable, can be scattered, redundant, and lacking in context when viewed independently. By organizing these individual points into coherent trajectories represented by LineString geometries, we enhance our ability to interpret, analyze, and apply the data in meaningful ways. A groupBy operation on VehId and Trip isolates each trip, and each group becomes a LineString representing the vehicle’s course. We sort the rows by timestamp so the LineString follows the correct order of the GPS data points.
We’ll write a rows_to_linestring function for Spark to process Sedona DataFrame rows into LineString geometries, then collect them in a new DataFrame, trips_df.
Finally, we’ll give each trip a unique ID using row_number.
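The grouping, sorting, and numbering steps can be sketched in plain Python as follows. This is an illustration of the logic only, not the Spark code: `build_trips` is a hypothetical helper, this version of `rows_to_linestring` emits WKT text rather than a Sedona geometry, the sequential id stands in for row_number, and the sample rows are made up.

```python
from collections import defaultdict

def rows_to_linestring(rows):
    """Sort one trip's rows by timestamp and emit a WKT LineString (lon lat order)."""
    ordered = sorted(rows, key=lambda r: r["Timestamp(ms)"])
    coords = ", ".join(
        f"{r['Longitude[deg]']} {r['Latitude[deg]']}" for r in ordered
    )
    return f"LINESTRING ({coords})"

def build_trips(rows):
    """Group rows by (VehId, Trip) and assign each trip a sequential id."""
    groups = defaultdict(list)
    for r in rows:
        groups[(r["VehId"], r["Trip"])].append(r)

    trips = []
    next_id = 1
    for key in sorted(groups):
        if len(groups[key]) < 2:
            continue  # a LineString needs at least two points
        trips.append({
            "ids": next_id,  # plays the role of row_number in the Spark version
            "VehId": key[0],
            "Trip": key[1],
            "geometry": rows_to_linestring(groups[key]),
        })
        next_id += 1
    return trips

# Made-up rows, deliberately out of timestamp order.
rows = [
    {"VehId": 8, "Trip": 706, "Timestamp(ms)": 1000, "Latitude[deg]": 42.29, "Longitude[deg]": -83.73},
    {"VehId": 8, "Trip": 706, "Timestamp(ms)": 0, "Latitude[deg]": 42.28, "Longitude[deg]": -83.74},
    {"VehId": 9, "Trip": 12, "Timestamp(ms)": 0, "Latitude[deg]": 42.30, "Longitude[deg]": -83.70},
]

trips = build_trips(rows)
for trip in trips:
    print(trip)
```

Sorting inside `rows_to_linestring` matters because grouped rows arrive in no guaranteed order; without it the line could zigzag between points. The single-point trip is skipped here as a design choice of this sketch.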
Perform Map Matching
With the trips prepared, we pass the road network and the aggregated trips into matcher and tell it the names of the relevant columns (geometry in both tables). The result contains the following columns:
- ids: A unique identifier for each trajectory, representing a distinct vehicle journey.
- observed_points: Represents the original GPS trajectories. These are the linestrings formed from the raw GPS points collected during each vehicle journey.
- matched_points: The processed trajectories post map-matching. These linestrings are aligned onto the actual road network, correcting for any GPS inaccuracies.
- matched_nodes: A list of node identifiers from the road network that the matched trajectory passes through. These nodes correspond to intersections, turns, or other significant points in the road network.
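To give a feel for what aligning a trajectory onto the road network means, the sketch below snaps a single point to the nearest of several road segments by orthogonal projection. This is a deliberately simplified stand-in, not the algorithm used by matcher: production map matchers usually also account for the order of points and the connectivity of the network. The function names are hypothetical, and coordinates are treated as planar, which only approximates reality for small areas.

```python
import math

def project_onto_segment(p, a, b):
    """Return the closest point to p on the segment from a to b (planar coordinates)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        return (ax, ay)  # degenerate segment: both endpoints coincide
    # Fraction along the segment of the orthogonal projection, clamped to [0, 1].
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return (ax + t * dx, ay + t * dy)

def snap_to_network(p, segments):
    """Snap p to the nearest point on any segment in the list."""
    best_point, best_dist = None, math.inf
    for a, b in segments:
        q = project_onto_segment(p, a, b)
        dist = math.hypot(q[0] - p[0], q[1] - p[1])
        if dist < best_dist:
            best_point, best_dist = q, dist
    return best_point

# Two parallel hypothetical "roads" and one noisy observation near the first.
roads = [((0.0, 0.0), (8.0, 0.0)), ((0.0, 5.0), (8.0, 5.0))]
snapped = snap_to_network((4.0, 1.0), roads)
print(snapped)
```

In the terms used above, the input point corresponds to one vertex of observed_points and the snapped point to one vertex of matched_points.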
Visualize the result using SedonaKepler
The map_config.json file specifies the bounding box and how to draw the road network, the source routes, and the matched routes.

