
Map Matching

Map matching is the process of mapping noisy GPS points to correct road segments.

Class: sedonamaps.core.MapMatching

This class provides methods to perform map matching on a road network.

loadOSM method

Method signature

Python definition:

def sedonamaps.core.MapMatching.loadOSM(osmPath: str, tagsFilter: str = "")

Scala definition:

def loadOSM(osmPath: String, tagsFilter: String = ""): DataFrame

Java definition:

public static DataFrame loadOSM(String osmPath, String tagsFilter);
public static DataFrame loadOSM(String osmPath);


  • osmPath: Path to the OSM XML file.
  • tagsFilter: Values of the highway tag used to filter the OSM data. Multiple values can be specified, delimited by commas. Specify an empty string to preserve all edges. Default value is an empty string.

    There is a special value [car] for filtering the OSM edges for cars. This value expands to the following tags:


Returns a Sedona DataFrame.


Python example:

from sedonamaps.core import MapMatching as mm
dfEdge = mm.loadOSM(PATH_PREFIX + "data/osm2.xml", "[car]")

Scala example:

import com.wherobots.sedonamaps.MapMatching
val dfEdge = MapMatching.loadOSM(resourceFolder + "osm2.xml", "[car]")

Java example:

import com.wherobots.sedonamaps.MapMatching;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
Dataset<Row> dfEdge = MapMatching.loadOSM(resourceFolder + "data/osm2.xml", "[car]");
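
To illustrate the documented tagsFilter semantics (comma-delimited highway values, with an empty string keeping every edge), here is a small plain-Python sketch. The helpers parse_tags_filter and keep_edge are hypothetical and not part of SedonaMaps. The sketch also trims whitespace around each value, which is an assumption that may differ from SedonaMaps, and it does not model the expansion of the special [car] value.

```python
from typing import Optional, Set


def parse_tags_filter(tags_filter: str) -> Optional[Set[str]]:
    """Parse a comma-delimited tagsFilter string (hypothetical helper).

    Returns None for an empty filter, meaning "keep every edge".
    """
    if tags_filter.strip() == "":
        return None
    if tags_filter.strip() == "[car]":
        # The [car] expansion is defined by SedonaMaps itself; not modeled here.
        raise NotImplementedError("[car] expansion is not modeled in this sketch")
    # Split on commas; trimming whitespace is an assumption of this sketch.
    return {tag.strip() for tag in tags_filter.split(",") if tag.strip()}


def keep_edge(highway_value: str, allowed: Optional[Set[str]]) -> bool:
    """Return True if an edge with this highway tag value passes the filter."""
    return allowed is None or highway_value in allowed


edges = [("e1", "motorway"), ("e2", "footway"), ("e3", "primary")]
allowed = parse_tags_filter("motorway,primary")
print([eid for eid, hw in edges if keep_edge(hw, allowed)])  # ['e1', 'e3']
```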

perform method

Method signature

Python definition:

def sedonamaps.core.MapMatching.perform(edgesDf: DataFrame, pathsDf: DataFrame, colEdgesGeom: str, colPathsGeom: str, idFieldName: str = None)

Scala definition:

def perform(edgesDf: DataFrame, pathsDf: DataFrame, colEdgesGeom: String, colPathsGeom: String, idFieldName: String = None): DataFrame

Java definition:

public static DataFrame perform(DataFrame edgesDf, DataFrame pathsDf, String colEdgesGeom, String colPathsGeom);
// Or
public static DataFrame perform(DataFrame edgesDf, DataFrame pathsDf, String colEdgesGeom, String colPathsGeom, String idFieldName);

Parameters:

  • edgesDf (DataFrame): Sedona DataFrame containing the road edges loaded from the OSM file.
  • pathsDf (DataFrame): Sedona DataFrame containing the GPS trips (LineStrings) to be map-matched.
  • colEdgesGeom (String): Name of the geometry column in edgesDf.
  • colPathsGeom (String): Name of the geometry column in pathsDf.
  • idFieldName (String, optional): Column in pathsDf that holds the unique identifier of each GPS trip. If not provided, the first non-geometry column is used.

Returns a DataFrame containing the map-matching results, with fields such as ids, observed_points, matched_points, and matched_nodes.


Python example:

dfMmResult = mm.perform(dfEdge, dfPaths, "geometry", "geometry")

Scala example:

val dfMmResult = MapMatching.perform(dfEdge, dfPaths, "geometry", "geometry")

Java example:

Dataset<Row> dfMmResult = MapMatching.perform(dfEdge, dfPaths, "geometry", "geometry");
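
The idFieldName fallback rule (use the first non-geometry column when no ID column is given) can be sketched in plain Python. A real call would inspect the schema of pathsDf; here the schema is a list of (name, type) pairs, and the type label "geometry" is a stand-in for Sedona's geometry type. resolve_id_field is a hypothetical helper, not a SedonaMaps function.

```python
from typing import List, Optional, Tuple

# Each column is (name, type_label); "geometry" stands in for a geometry-typed column.
Column = Tuple[str, str]


def resolve_id_field(columns: List[Column], id_field_name: Optional[str] = None) -> str:
    """Pick the trip-ID column following the documented rule (hypothetical helper)."""
    if id_field_name is not None:
        if id_field_name not in {name for name, _ in columns}:
            raise ValueError(f"column {id_field_name!r} not found")
        return id_field_name
    # Fallback: the first column whose type is not a geometry type.
    for name, type_label in columns:
        if type_label != "geometry":
            return name
    raise ValueError("no non-geometry column available to use as trip ID")


paths_schema = [("geometry", "geometry"), ("trip_id", "string"), ("vehicle", "string")]
print(resolve_id_field(paths_schema))             # trip_id
print(resolve_id_field(paths_schema, "vehicle"))  # vehicle
```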

Advanced Configuration

SedonaMaps has several advanced configurations that can be set when building the Sedona context:

Python:

config = SedonaContext.builder() .\
    config("", "50.0"). \
    getOrCreate()
sedona = SedonaContext.create(config)

Scala:

val config = SedonaContext.builder().
  config("", "50.0").
  getOrCreate()
val sedona = SedonaContext.create(config)

Java:

SparkSession config = SedonaContext.builder()
    .config("", "50.0")
    .getOrCreate();
SparkSession sedona = SedonaContext.create(config);

These configurations can also be changed after the Sedona context has been created:

Python:

sedona.conf.set("", "50.0")

Scala:

sedona.conf.set("", "50.0")

Java:

sedona.conf().set("", "50.0");


How Distributed Map-Matching Works

SedonaMaps runs batch map-matching on a large collection of trajectories in a distributed manner. The process has two phases: distributing workloads and local map-matching. The distributing phase repartitions the data so that each trajectory and the road segments near it land in the same partition. The local map-matching phase then matches the trajectories in each partition against the co-located road network.
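
The sketch below is a single-machine simplification of the distributing phase. It uses a uniform grid as a stand-in for the spatial join partitioning SedonaMaps performs, and reduces trips and edges to bounding boxes in a planar, meter-based coordinate system. All names are hypothetical. Each edge is indexed under every cell touched by its bounding box expanded by the search radius, so any edge within that radius of a trip point is guaranteed to reach the trip's partition (edges may be replicated across partitions). The local phase would then run on each partition independently.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

# Bounding box in a planar, meter-based coordinate system: (min_x, min_y, max_x, max_y).
BBox = Tuple[float, float, float, float]
Cell = Tuple[int, int]


def cells_for_bbox(bbox: BBox, cell_size: float) -> List[Cell]:
    """Return every grid cell a bounding box touches, lower-left cell first."""
    min_x, min_y, max_x, max_y = bbox
    return [
        (cx, cy)
        for cx in range(math.floor(min_x / cell_size), math.floor(max_x / cell_size) + 1)
        for cy in range(math.floor(min_y / cell_size), math.floor(max_y / cell_size) + 1)
    ]


def co_locate(trips: Dict[str, BBox], edges: Dict[str, BBox],
              cell_size: float, radius: float) -> Dict[Cell, dict]:
    """Group each trip with every road edge that could lie within `radius` of it."""
    # Index each edge under every cell touched by its bbox expanded by the search radius.
    edges_by_cell: Dict[Cell, set] = defaultdict(set)
    for edge_id, (x0, y0, x1, y1) in edges.items():
        expanded = (x0 - radius, y0 - radius, x1 + radius, y1 + radius)
        for cell in cells_for_bbox(expanded, cell_size):
            edges_by_cell[cell].add(edge_id)

    # Place each trip in one partition (its lower-left cell) together with the
    # edges indexed under any cell its own bbox touches.
    partitions: Dict[Cell, dict] = defaultdict(lambda: {"trips": [], "edges": set()})
    for trip_id, bbox in trips.items():
        cells = cells_for_bbox(bbox, cell_size)
        part = partitions[cells[0]]
        part["trips"].append(trip_id)
        for cell in cells:
            part["edges"] |= edges_by_cell.get(cell, set())
    return dict(partitions)


trips = {"t1": (0.0, 0.0, 50.0, 10.0), "t2": (500.0, 500.0, 520.0, 510.0)}
edges = {"e1": (0.0, 15.0, 60.0, 15.0), "e2": (900.0, 900.0, 950.0, 900.0)}
parts = co_locate(trips, edges, cell_size=100.0, radius=20.0)
print(parts[(0, 0)]["edges"])  # {'e1'}
print(parts[(5, 5)]["edges"])  # set()
```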

Parameters for Distributing Map-Matching Workloads

    • Number of spatial partitions generated in the spatial join phase. This controls the parallelism of the spatial join between trajectories and the road network. A recommended value is 10 times the total number of executor cores.
    • Default value: None
    • Possible values: any positive integer value
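
The rule of thumb above is simple arithmetic, spelled out below. The configuration key for this setting is not given above, so the sketch only computes the value; recommended_partitions is a hypothetical helper. In PySpark, spark.sparkContext.defaultParallelism often reflects the total number of executor cores and could serve as an input.

```python
def recommended_partitions(num_executors: int, cores_per_executor: int, factor: int = 10) -> int:
    """Apply the rule of thumb: factor * total executor cores (hypothetical helper)."""
    if num_executors <= 0 or cores_per_executor <= 0:
        raise ValueError("executor and core counts must be positive")
    return factor * num_executors * cores_per_executor


# A cluster with 8 executors of 4 cores each has 32 cores in total.
print(recommended_partitions(8, 4))  # 320
```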

The Local Map-Matching Algorithm

The local map-matching algorithm is based on a Hidden Markov Model (HMM), an approach popularized by the paper Hidden Markov Map Matching Through Noise and Sparseness (Newson and Krumm, 2009). SedonaMaps implements a variation of this algorithm.
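
To make the HMM idea concrete, here is a minimal Viterbi sketch of the formulation from the Newson and Krumm paper: a Gaussian emission probability on the distance between a GPS observation and a candidate road position, and an exponential transition probability on the difference between the straight-line distance of consecutive observations and the route distance between their candidates. It is not SedonaMaps' variant. Candidate generation and route distances are assumed to be precomputed and passed in, every step is assumed to have at least one candidate, and viterbi_match, sigma, and beta are hypothetical names with illustrative values.

```python
import math
from typing import Callable, List, Sequence, Tuple


def log_emission(dist_to_road: float, sigma: float) -> float:
    """Log of a zero-mean Gaussian density on the GPS-to-road distance."""
    return -0.5 * (dist_to_road / sigma) ** 2 - math.log(math.sqrt(2 * math.pi) * sigma)


def log_transition(obs_gap: float, route_gap: float, beta: float) -> float:
    """Log of an exponential density on |straight-line gap - route gap|."""
    return -abs(obs_gap - route_gap) / beta - math.log(beta)


def viterbi_match(
    candidates: Sequence[Sequence[Tuple[str, float]]],  # per step: (candidate_id, distance to observation)
    obs_gaps: Sequence[float],                           # obs_gaps[t - 1]: straight-line distance from step t-1 to t
    route_gap: Callable[[str, str], float],              # network distance between two candidates
    sigma: float = 20.0,                                 # illustrative value only
    beta: float = 5.0,                                   # illustrative value only
) -> List[str]:
    """Return the most likely candidate sequence under a Newson-Krumm style HMM."""
    # scores[i]: best log-probability of any path ending at candidate i of the current step.
    scores = [log_emission(d, sigma) for _, d in candidates[0]]
    back: List[List[int]] = []
    for t in range(1, len(candidates)):
        new_scores, pointers = [], []
        for cand_id, dist in candidates[t]:
            best_prev, best_score = max(
                (
                    (j, scores[j] + log_transition(obs_gaps[t - 1], route_gap(prev_id, cand_id), beta))
                    for j, (prev_id, _) in enumerate(candidates[t - 1])
                ),
                key=lambda pair: pair[1],
            )
            new_scores.append(best_score + log_emission(dist, sigma))
            pointers.append(best_prev)
        scores = new_scores
        back.append(pointers)
    # Trace back from the best final candidate.
    idx = max(range(len(scores)), key=scores.__getitem__)
    path = [candidates[-1][idx][0]]
    for t in range(len(candidates) - 1, 0, -1):
        idx = back[t - 1][idx]
        path.append(candidates[t - 1][idx][0])
    return path[::-1]


cands = [[("A", 5.0), ("B", 8.0)], [("A2", 6.0), ("B2", 4.0)]]
routes = {("A", "A2"): 30.0, ("A", "B2"): 200.0, ("B", "A2"): 210.0, ("B", "B2"): 35.0}
print(viterbi_match(cands, [30.0], lambda a, b: routes[(a, b)]))  # ['A', 'A2']
```

Working in log space avoids underflow when multiplying many small probabilities along a long trip.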

Parameters for the Local Map-Matching Algorithm

    • The algorithm for the local map matcher. The legacy mode works better for dense trajectories (high sampling rate), while the advanced mode works better for sparse trajectories (low sampling rate).
    • Default value: legacy
    • Possible values: legacy, advanced
    • The GPS accuracy of the input data, in meters. This controls the search radius around each observation. For sparse data, a higher value (e.g., 40 meters) improves accuracy at the cost of speed.
    • Default value: 20
    • Possible values: any positive integer value
    • The local map-matching algorithm terminates early if it cannot find a match for every observation of a trip; the result is then a partial match of the original trip. This parameter controls whether partial matches are included in the output. If false, partial matches become LineString EMPTY in the final output DataFrame.
    • Default value: false
    • Possible values: true, false
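
The interplay between the GPS accuracy (search radius) and the partial-match setting can be sketched as a candidate search that stops at the first observation with no road segment inside the radius. This is a simplification: it treats "no candidate within the radius" as the point where matching stops, scans edges by brute force instead of using a spatial index, and assumes planar coordinates in meters. The function names are hypothetical, and None stands in for a LineString EMPTY result.

```python
import math
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]


def point_segment_distance(p: Point, seg: Segment) -> float:
    """Planar distance from point p to the closest point on a segment."""
    (ax, ay), (bx, by) = seg
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        return math.hypot(p[0] - ax, p[1] - ay)
    # Project p onto the segment and clamp the projection parameter to [0, 1].
    t = max(0.0, min(1.0, ((p[0] - ax) * dx + (p[1] - ay) * dy) / length_sq))
    return math.hypot(p[0] - (ax + t * dx), p[1] - (ay + t * dy))


def candidates_per_observation(
    observations: List[Point],
    edges: Dict[str, Segment],
    gps_accuracy: float,
    include_partial: bool,
) -> Optional[List[List[str]]]:
    """Find candidate edges within gps_accuracy of each observation.

    Stops at the first observation without any candidate and returns the
    candidates found so far if include_partial is True, otherwise None
    (standing in for a LineString EMPTY result).
    """
    result: List[List[str]] = []
    for obs in observations:
        nearby = [eid for eid, seg in edges.items() if point_segment_distance(obs, seg) <= gps_accuracy]
        if not nearby:
            return result if include_partial else None
        result.append(nearby)
    return result


edges = {"e1": ((0.0, 0.0), (100.0, 0.0)), "e2": ((0.0, 50.0), (100.0, 50.0))}
obs = [(10.0, 5.0), (50.0, 30.0), (90.0, 200.0)]
print(candidates_per_observation(obs, edges, 20.0, True))   # [['e1'], ['e2']]
print(candidates_per_observation(obs, edges, 20.0, False))  # None
print(candidates_per_observation(obs, edges, 40.0, True))   # [['e1'], ['e1', 'e2']]
```

A larger radius yields more candidates per observation, which makes matching more robust for sparse or noisy data but gives the HMM more states to evaluate.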

Last update: March 21, 2024 05:49:18