Map matching

Introduction to Map Matching with Matcher

In GPS-based navigation systems, devices often provide GPS coordinates that might be imprecise and don’t necessarily align with actual road networks. This discrepancy can be addressed by a process known as map matching, which aligns these potentially inaccurate GPS points to their correct road segments, ensuring the accuracy of navigation routes and providing a seamless navigation experience for users.

Import Matcher Functions

To leverage the map matching functionalities offered by Wherobots Matcher, you need to import the relevant class. The MapMatching class from the macher module provides these capabilities:

Python
Scala
Java

from wherobots import matcher

import com.wherobots.Matcher

import com.wherobots.Matcher;

Loading OSM Data

Once you’ve accessed the MapMatching class, the next step is to load data from the OpenStreetMap (OSM). Wherobots Matcher supports loading an OSM file into a Sedona DataFrame. This DataFrame is then enriched with the necessary attributes for map matching. If your OSM data is saved at a local path like data/osm.xml, or on a remote storage such as an AWS S3 path, you can load this data using the following code:

Python
Scala
Java

dfEdge = matcher.load_osm("data/osm.xml", "[car]")
dfEdge.show(5)

val dfEdge = Matcher.loadOSM(resourceFolder + "osm2.xml", "[car]")
dfEdge.show(5)

Dataset dfEdge = Matcher.loadOSM(resourceFoler + "data/osm2.xml", "[car]");
dfEdge.show(5);

The [car] parameter is the road type filter, which can be used to filter out road segments that are not suitable for cars. Upon executing the above code, you can expect an output that aligns with the example below:

+--------------------+----------+--------+----------+-----------+----------+-----------+
|            geometry|       src|     dst|   src_lat|    src_lon|   dst_lat|    dst_lon|
+--------------------+----------+--------+----------+-----------+----------+-----------+
|LINESTRING (-83.0...|  62545092|62545090| 42.350883| -83.071323|42.3502907|-83.0711018|
|LINESTRING (-83.0...|7592807480|62545090|42.3495189|-83.0708043|42.3502907|-83.0711018|
|LINESTRING (-83.0...|8129132226|62545316| 42.363371|-83.0546103|42.3634571|-83.0546522|
|LINESTRING (-83.0...|  62859327|62545316| 42.363539| -83.054686|42.3634571|-83.0546522|
|LINESTRING (-83.0...|8129132225|62545316|42.3635311|-83.0547655|42.3634571|-83.0546522|
+--------------------+----------+--------+----------+-----------+----------+-----------+
only showing top 5 rows

This table represents the top five rows of the loaded OSM data, showcasing the geometry and attributes of the road segments. In the context of the table:

src (source): Represents the starting point or origin of a path. Each src has associated coordinates given by src_lat and src_lon.
dst (destination): Refers to the endpoint of a path. Each dst has coordinates represented by dst_lat and dst_lon.

Simply put, the table illustrates various paths in a network, detailing their starting (src) and ending (dst) points with corresponding geographic coordinates.

Create Sedona DataFrame of GPS Trips

With the Sedona library, which provides APIs and data structures for spatial operations, you can transform a list of GPS tracks into a Sedona DataFrame where each track is represented as a LineString. Follow these steps to create a Sedona DataFrame of GPS trips:

Format the given GPS coordinates into strings that can be converted into LineString objects.
Next, Use Sedona’s API to create a DataFrame of these LineString objects.

Here are the GPS tracks,

Python
Scala
Java

gps_tracks = [
    (42.355618, -83.054237), (42.35562, -83.054238), (42.355615, -83.054253), (42.355684, -83.054297), (42.355719, -83.054198),
    (42.355781, -83.054022), (42.355749, -83.054001), (42.355781, -83.054022), (42.355719, -83.054198), (42.35565, -83.054155),
    (42.355587, -83.054116), (42.355656, -83.053914), (42.35573, -83.053715), (42.355749, -83.053728), (42.35573, -83.053715),
    (42.355656, -83.053914), (42.355722, -83.05396), (42.355839, -83.053635), (42.355918, -83.053637), (42.355993, -83.053428),
    (42.355967, -83.053411), (42.355993, -83.053428), (42.355918, -83.053637), (42.355839, -83.053635), (42.355886, -83.053504),
    (42.355999, -83.053194), (42.356038, -83.053086), (42.356027, -83.053079), (42.356038, -83.053086), (42.356105, -83.052904),
    (42.356126, -83.052918), (42.356105, -83.052904), (42.355999, -83.053194), (42.355933, -83.053149), (42.35602, -83.052916),
    (42.35609, -83.052727), (42.356102, -83.052735), (42.35609, -83.052727), (42.35602, -83.052916), (42.355933, -83.053149),
    (42.355999, -83.053194), (42.356243, -83.052523), (42.356265, -83.052537), (42.356243, -83.052523), (42.3563, -83.052367),
    (42.356273, -83.052349), (42.3563, -83.052367), (42.356362, -83.052196), (42.356388, -83.052125), (42.356409, -83.052139),
    (42.356407, -83.052145), (42.356409, -83.052139), (42.35646, -83.052173), (42.356499, -83.052073), (42.356424, -83.052024),
    (42.356348, -83.051973), (42.35638, -83.051886), (42.356395, -83.051847), (42.356423, -83.051866), (42.356395, -83.051847),
    (42.35638, -83.051886), (42.356348, -83.051973), (42.356424, -83.052024), (42.356457, -83.051932), (42.356562, -83.051641),
    (42.356578, -83.051652), (42.356562, -83.051641), (42.35672, -83.051204), (42.356732, -83.051212), (42.35672, -83.051204),
    (42.356722, -83.051198), (42.356795, -83.050995), (42.356827, -83.051015), (42.356816, -83.051046), (42.356827, -83.051015),
    (42.356795, -83.050995), (42.356724, -83.050952), (42.356761, -83.050841), (42.356838, -83.050876), (42.356865, -83.050894),
    (42.356859, -83.050911), (42.356865, -83.050894), (42.356838, -83.050876), (42.356923, -83.050643), (42.356891, -83.050622),
    (42.356923, -83.050643), (42.356999, -83.050433), (42.356927, -83.050386), (42.356953, -83.050315), (42.356978, -83.050332),
    (42.356953, -83.050315), (42.356927, -83.050386), (42.35684, -83.050328), (42.356827, -83.050364), (42.35684, -83.050328),
    (42.356722, -83.050249), (42.35674, -83.050198), (42.356722, -83.050249), (42.356506, -83.050105), (42.356494, -83.050139),
    (42.356506, -83.050105), (42.356443, -83.050063), (42.356431, -83.050096), (42.356443, -83.050063), (42.356388, -83.050025),
    (42.356399, -83.049994), (42.356388, -83.050025), (42.356357, -83.050005), (42.356345, -83.049897), (42.356349, -83.049884),
    (42.35638, -83.049903), (42.356349, -83.049884), (42.356386, -83.049777), (42.356407, -83.04979), (42.356386, -83.049777),
    (42.356464, -83.049549), (42.356459, -83.049533)
]

val gps_tracks: List[(Double, Double)] = List(
  (42.355618, -83.054237), (42.35562, -83.054238), (42.355615, -83.054253), (42.355684, -83.054297), (42.355719, -83.054198),
  (42.355781, -83.054022), (42.355749, -83.054001), (42.355781, -83.054022), (42.355719, -83.054198), (42.35565, -83.054155),
  (42.355587, -83.054116), (42.355656, -83.053914), (42.35573, -83.053715), (42.355749, -83.053728), (42.35573, -83.053715),
  (42.355656, -83.053914), (42.355722, -83.05396), (42.355839, -83.053635), (42.355918, -83.053637), (42.355993, -83.053428),
  (42.355967, -83.053411), (42.355993, -83.053428), (42.355918, -83.053637), (42.355839, -83.053635), (42.355886, -83.053504),
  (42.355999, -83.053194), (42.356038, -83.053086), (42.356027, -83.053079), (42.356038, -83.053086), (42.356105, -83.052904),
  (42.356126, -83.052918), (42.356105, -83.052904), (42.355999, -83.053194), (42.355933, -83.053149), (42.35602, -83.052916),
  (42.35609, -83.052727), (42.356102, -83.052735), (42.35609, -83.052727), (42.35602, -83.052916), (42.355933, -83.053149),
  (42.355999, -83.053194), (42.356243, -83.052523), (42.356265, -83.052537), (42.356243, -83.052523), (42.3563, -83.052367),
  (42.356273, -83.052349), (42.3563, -83.052367), (42.356362, -83.052196), (42.356388, -83.052125), (42.356409, -83.052139),
  (42.356407, -83.052145), (42.356409, -83.052139), (42.35646, -83.052173), (42.356499, -83.052073), (42.356424, -83.052024),
  (42.356348, -83.051973), (42.35638, -83.051886), (42.356395, -83.051847), (42.356423, -83.051866), (42.356395, -83.051847),
  (42.35638, -83.051886), (42.356348, -83.051973), (42.356424, -83.052024), (42.356457, -83.051932), (42.356562, -83.051641),
  (42.356578, -83.051652), (42.356562, -83.051641), (42.35672, -83.051204), (42.356732, -83.051212), (42.35672, -83.051204),
  (42.356722, -83.051198), (42.356795, -83.050995), (42.356827, -83.051015), (42.356816, -83.051046), (42.356827, -83.051015),
  (42.356795, -83.050995), (42.356724, -83.050952), (42.356761, -83.050841), (42.356838, -83.050876), (42.356865, -83.050894),
  (42.356859, -83.050911), (42.356865, -83.050894), (42.356838, -83.050876), (42.356923, -83.050643), (42.356891, -83.050622),
  (42.356923, -83.050643), (42.356999, -83.050433), (42.356927, -83.050386), (42.356953, -83.050315), (42.356978, -83.050332),
  (42.356953, -83.050315), (42.356927, -83.050386), (42.35684, -83.050328), (42.356827, -83.050364), (42.35684, -83.050328),
  (42.356722, -83.050249), (42.35674, -83.050198), (42.356722, -83.050249), (42.356506, -83.050105), (42.356494, -83.050139),
  (42.356506, -83.050105), (42.356443, -83.050063), (42.356431, -83.050096), (42.356443, -83.050063), (42.356388, -83.050025),
  (42.356399, -83.049994), (42.356388, -83.050025), (42.356357, -83.050005), (42.356345, -83.049897), (42.356349, -83.049884),
  (42.35638, -83.049903), (42.356349, -83.049884), (42.356386, -83.049777), (42.356407, -83.04979), (42.356386, -83.049777),
  (42.356464, -83.049549), (42.356459, -83.049533)
)

double[][] gps_tracks = {
    {42.355618, -83.054237}, {42.35562, -83.054238}, {42.355615, -83.054253}, {42.355684, -83.054297}, {42.355719, -83.054198},
    {42.355781, -83.054022}, {42.355749, -83.054001}, {42.355781, -83.054022}, {42.355719, -83.054198}, {42.35565, -83.054155},
    {42.355587, -83.054116}, {42.355656, -83.053914}, {42.35573, -83.053715}, {42.355749, -83.053728}, {42.35573, -83.053715},
    {42.355656, -83.053914}, {42.355722, -83.05396}, {42.355839, -83.053635}, {42.355918, -83.053637}, {42.355993, -83.053428},
    {42.355967, -83.053411}, {42.355993, -83.053428}, {42.355918, -83.053637}, {42.355839, -83.053635}, {42.355886, -83.053504},
    {42.355999, -83.053194}, {42.356038, -83.053086}, {42.356027, -83.053079}, {42.356038, -83.053086}, {42.356105, -83.052904},
    {42.356126, -83.052918}, {42.356105, -83.052904}, {42.355999, -83.053194}, {42.355933, -83.053149}, {42.35602, -83.052916},
    {42.35609, -83.052727}, {42.356102, -83.052735}, {42.35609, -83.052727}, {42.35602, -83.052916}, {42.355933, -83.053149},
    {42.355999, -83.053194}, {42.356243, -83.052523}, {42.356265, -83.052537}, {42.356243, -83.052523}, {42.3563, -83.052367},
    {42.356273, -83.052349}, {42.3563, -83.052367}, {42.356362, -83.052196}, {42.356388, -83.052125}, {42.356409, -83.052139},
    {42.356407, -83.052145}, {42.356409, -83.052139}, {42.35646, -83.052173}, {42.356499, -83.052073}, {42.356424, -83.052024},
    {42.356348, -83.051973}, {42.35638, -83.051886}, {42.356395, -83.051847}, {42.356423, -83.051866}, {42.356395, -83.051847},
    {42.35638, -83.051886}, {42.356348, -83.051973}, {42.356424, -83.052024}, {42.356457, -83.051932}, {42.356562, -83.051641},
    {42.356578, -83.051652}, {42.356562, -83.051641}, {42.35672, -83.051204}, {42.356732, -83.051212}, {42.35672, -83.051204},
    {42.356722, -83.051198}, {42.356795, -83.050995}, {42.356827, -83.051015}, {42.356816, -83.051046}, {42.356827, -83.051015},
    {42.356795, -83.050995}, {42.356724, -83.050952}, {42.356761, -83.050841}, {42.356838, -83.050876}, {42.356865, -83.050894},
    {42.356859, -83.050911}, {42.356865, -83.050894}, {42.356838, -83.050876}, {42.356923, -83.050643}, {42.356891, -83.050622},
    {42.356923, -83.050643}, {42.356999, -83.050433}, {42.356927, -83.050386}, {42.356953, -83.050315}, {42.356978, -83.050332},
    {42.356953, -83.050315}, {42.356927, -83.050386}, {42.35684, -83.050328}, {42.356827, -83.050364}, {42.35684, -83.050328},
    {42.356722, -83.050249}, {42.35674, -83.050198}, {42.356722, -83.050249}, {42.356506, -83.050105}, {42.356494, -83.050139},
    {42.356506, -83.050105}, {42.356443, -83.050063}, {42.356431, -83.050096}, {42.356443, -83.050063}, {42.356388, -83.050025},
    {42.356399, -83.049994}, {42.356388, -83.050025}, {42.356357, -83.050005}, {42.356345, -83.049897}, {42.356349, -83.049884},
    {42.35638, -83.049903}, {42.356349, -83.049884}, {42.356386, -83.049777}, {42.356407, -83.04979}, {42.356386, -83.049777},
    {42.356464, -83.049549}, {42.356459, -83.049533}
};

Use the following code to create the Sedona DataFrame of LineStrings/GPS paths:

Python
Scala
Java

ids = []
lines = []
i = j = k = 0
lineStr = ""

while k < len(gps_tracks):
  if j == 15 or k == len(gps_tracks)-1:
    ids.append(i)
    lines.append(lineStr)
    i += 1
    j = 0
    lineStr = ""
  if j == 0:
    lineStr += str(gps_tracks[j][1]) + "," + str(gps_tracks[j][0])
  else:
    lineStr += "," + str(gps_tracks[j][1]) + "," + str(gps_tracks[j][0])
  j += 1
  k += 1

df_paths = sedona.createDataFrame(zip(ids, lines), ["ids", "lon_lat_seq"])
df_paths = df_paths.withColumn("geometry", expr("ST_LineStringFromText(lon_lat_seq, ',')")).drop("lon_lat_seq");
df_paths.show(5);

var ids = List[Int]()
var lines = List[String]()
var i = 0
var j = 0
var k = 0
var lineStr = ""

while (k < gps_tracks.length) {
  if (j == 15 || k == gps_tracks.length - 1) {
    ids = ids :+ i
    lines = lines :+ lineStr
    i += 1
    j = 0
    lineStr = ""
  }
  if (j == 0) {
    lineStr += gps_tracks(j)._2 + "," + gps_tracks(j)._1
  } else {
    lineStr += "," + gps_tracks(j)._2 + "," + gps_tracks(j)._1
  }
  j += 1
  k += 1
}

val dfPaths = ids.zip(lines).toDF("ids", "lon_lat_seq")
.withColumn("geometry", expr("ST_LineStringFromText(lon_lat_seq, ',')"))
.drop("lon_lat_seq")

dfPaths.show(5)

List<Integer> ids = new ArrayList<>();
List<String> lines = new ArrayList<>();
int i = 0;
int j = 0;
int k = 0;
String lineStr = "";

while (k < gps_tracks.length) {
  if (j == 15 || k == gps_tracks.length - 1) {
    ids.add(i);
    lines.add(lineStr);
    i += 1;
    j = 0;
    lineStr = "";
  }
  if (j == 0) {
    lineStr += gps_tracks[j][1] + "," + gps_tracks[j][0];
  } else {
    lineStr += "," + gps_tracks[j][1] + "," + gps_tracks[j][0];
  }
  j += 1;
  k += 1;
}

Dataset<Row> dfPaths = org.apache.spark.sql.functions.zip(ids, lines).toDF("ids", "lon_lat_seq")
    .withColumn("geometry", functions.expr("ST_LineStringFromText(lon_lat_seq, ',')"))
    .drop("lon_lat_seq");

dfPaths.show(5);

Output will be similar to the following:

+---+--------------------+
|ids|            geometry|
+---+--------------------+
|  0|LINESTRING (-83.0...|
|  1|LINESTRING (-83.0...|
|  2|LINESTRING (-83.0...|
|  3|LINESTRING (-83.0...|
|  4|LINESTRING (-83.0...|
+---+--------------------+
only showing top 5 rows

Map Matching on Batch Data

Now that you have both the OSM edges DataFrame and the GPS trips DataFrame, you’re ready to perform map matching. The below code is used to execute map matching and display the top 5 rows of the resulting DataFrame.

Python
Scala
Java

result_df = matcher.match(df_edge, df_paths, "geometry", "geometry")
result_df.show(5)

val resultDf = Matcher.match(dfEdge, dfPaths, "geometry", "geometry")
resultDf.show(5)

Dataset matchingResultDf = Matcher.match(edgesDf, pathsSpatialDf, "geometry", "geometry");
matchingResultDf.show();

The parameters in the method match are as follows:

dfEdge: A DataFrame of road segments from OpenStreetMap.
dfPaths: A DataFrame of GPS paths to be aligned to roads.
colEdgesGeom: The column in dfEdge holding road geometry.
colPathsGeom: The column in dfPaths holding GPS path geometry.
idFieldName: Optional - The column in dfPaths DataFrame that contains the unique identifier for each GPS trip. if not provided, the first non-geometry column is used.

The output should be similar to the following table:

+---+--------------------+--------------------+--------------------+
|ids|     observed_points|      matched_points|       matched_nodes|
+---+--------------------+--------------------+--------------------+
|  4|[[-83.054237, 42....|[[-83.05423815457...|[4071942207, 6260...|
|  7|[[-83.054237, 42....|[[-83.05423815457...|[4071942207, 6260...|
|  1|[[-83.054237, 42....|[[-83.05423815457...|[4071942207, 6260...|
|  3|[[-83.054237, 42....|[[-83.05423815457...|[4071942207, 6260...|
|  5|[[-83.054237, 42....|[[-83.05423815457...|[4071942207, 6260...|
+---+--------------------+--------------------+--------------------+
only showing top 5 rows

Visualizing the result using SedonaKepler

To visualize the map matching result using SedonaKepler, you can use the following code:

my_map = SedonaKepler.create_map()
SedonaKepler.add_df(my_ap, dfEdge, name="Road Network")
SedonaKepler.add_df(my_ap, result_df.select("observed_points"), name="Observed Points")
SedonaKepler.add_df(my_ap, result_df.select("matched_points"), name="Matched Points")

my_map

Below is a visual representation of an example map matching result:

Red Line: Illustrates the raw GPS trajectories, which capture the vehicle’s original movement as recorded by the GPS device.
Green Line: Highlights the trajectories after map matching, reflecting a refined path that aligns the GPS observations with the actual road network.

A comparison of the red and green lines reveals the adjustments made by the map-matching algorithm, ensuring that raw GPS data corresponds accurately with established roadways and provides a precise depiction of the vehicle’s journey through the network.

Advanced Configuration

Wherobots map matching has several advanced configs that can be set through Config:

Python
Scala
Java

config = SedonaContext.builder() .\
    config("wherobots.tools.mm.maxdist","50.0"). \
    getOrCreate()
sedona = SedonaContext.create(config)

val config = SedonaContext.builder().
    config("wherobots.tools.mm.maxdist","50.0").
    .getOrCreate()
val sedona = SedonaContext.create(config)

SparkSession config = SedonaContext.builder()
            .config("wherobots.tools.mm.maxdist","50.0")
            .appName("SparkSedonaExample")
            .getOrCreate();
SparkSession sedona = SedonaContext.create(config);

These configurations can also be tuned when the SedonaContext was already created:

Python
Scala
Java

sedona.conf.set("wherobots.tools.mm.maxdist", "50.0")

sedona.conf.set("wherobots.tools.mm.maxdist", "50.0")

sedona.conf().set("wherobots.tools.mm.maxdist", "50.0");

Explanation

How Distributed Map-Matching Works

Wherobots runs batch map-matching on a large collection of trajectories in a distributed manner. The map-matching process is divided into two phases: distributing workloads and local map-matching. The distributing workloads phase rearranges the trajectories and the road segments near those trajectories to the same partition, and the local map-matching phase performs map-matching on each partition, where we already have trajectories and their surrounding road network co-located.

Parameters for Distributing Map-Matching Workloads

wherobots.tools.mm.numspatialpartitions
- Number of spatial partitions generated in the spatial join phase. This controls the parallelism of performing spatial join between trajectories and road networks. A recommended value is 10 * number of executor cores.
  - Default value: None
  - Possible values: any positive integer value

The Local Map-Matching Algorithm

The local map-matching algorithm is based on a Hidden Markov Model (HMM), which is popularized by the paper Hidden Markov Map Matching Through Noise and Sparseness. Wherobots implements a variation of this algorithm.

Parameters for the Local Map-Matching Algorithm

wherobots.tools.mm.matcher
- The algorithm for the local map matcher. The legacy mode works better for dense trajectories (high sampling rate) while the advanced mode works better for sparse trajectories (low sampling rate).
- Default value: legacy
- Possible values: legacy, advanced
wherobots.tools.mm.adv.gpsaccuracy
- The GPS accuracy of the input data, in the unit of meters. This controls the search radius of each observation. For sparse data, a higher value (e.g., 40 meters) will improve the accuracy but decrease the speed.
- Default value: 20
- Possible values: any positive integer value
wherobots.tools.mm.adv.partialmatch
- The local map matching algorithm will terminate early if it cannot find matches for every observation of a trip and the result will be a partial match of the original trip. This parameter controls if partial matches should be included in the output. If false, partial matches will become LineString EMPTY in the final output DataFrame.
- Defalue value: false
- Possible values: true, false

Spatial Catalog

WherobotsDB

WherobotsAI

Introduction to Map Matching with Matcher

Import Matcher Functions

Loading OSM Data

Create Sedona DataFrame of GPS Trips

Map Matching on Batch Data

Visualizing the result using SedonaKepler

Advanced Configuration

Explanation

How Distributed Map-Matching Works

Parameters for Distributing Map-Matching Workloads

The Local Map-Matching Algorithm

Parameters for the Local Map-Matching Algorithm

Spatial Catalog

WherobotsDB

WherobotsAI

​Introduction to Map Matching with Matcher

​Import Matcher Functions

​Loading OSM Data

​Create Sedona DataFrame of GPS Trips

​Map Matching on Batch Data

​Visualizing the result using SedonaKepler

​Advanced Configuration

​Explanation

​How Distributed Map-Matching Works

​Parameters for Distributing Map-Matching Workloads

​The Local Map-Matching Algorithm

​Parameters for the Local Map-Matching Algorithm

Introduction to Map Matching with Matcher

Import Matcher Functions

Loading OSM Data

Create Sedona DataFrame of GPS Trips

Map Matching on Batch Data

Visualizing the result using SedonaKepler

Advanced Configuration

Explanation

How Distributed Map-Matching Works

Parameters for Distributing Map-Matching Workloads

The Local Map-Matching Algorithm

Parameters for the Local Map-Matching Algorithm