USAGE OF VARIANCE IN DETERMINATION OF SINUOSITY INTERVALS FOR ROAD MATCHING

Geo-object matching is a process that identifies, classifies and matches object pairs according to their maximum similarity across whole datasets. The matching process is used for updating, aligning, optimizing, integrating and/or measuring the quality of road networks. Matching algorithms use several metrics, such as Hausdorff distance, orientation, valence and sinuosity. Sinuosity is the ratio of the actual length of a road to the straight length between the start and end points of the same road; it describes how curved a road is. In a matching process, the sinuosity thresholds or intervals must be determined first. Sinuosity intervals can be determined by several data classification methods, such as equal interval, quantile, natural breaks and geometrical interval. The intervals determined by the Ireland Transportation Agency can also be used for this purpose. This study investigates whether variance can be used to determine sinuosity intervals as well. An experiment was conducted to compare all of the methods mentioned above. According to the road matching results, the efficiency of the sinuosity intervals determined by the methods ranges from 37.4% to 49.4%, and the intervals determined by the variance appear to be the most efficient.


INTRODUCTION
Spatial data has been produced and used rapidly in the information age. This production-consumption cycle causes economic inefficiencies because duplicate versions of the same data are created. Geometric data integration combines multi-source datasets to obtain an up-to-date dataset without producing new data. This kind of integration is the subject of map conflation. Lynch and Saalfeld (1985) defined the purpose of map conflation as combining the objects in different datasets that represent the same entities in order to get a better map. Most conflation studies have been conducted on road networks because of their extensive use in navigation, transportation, etc.
The main problem in conflation is matching the road objects in different sources that represent the same road. Geo-object matching is challenging because there are several geometric, attribute and topological differences among source datasets. These differences arise because source datasets can be produced in very different ways, for example with respect to coordinate system, date and data collection method (on stereo images or by surveying in the field). Geo-object matching is a process that identifies, classifies and matches the object pairs representing the same entity according to their maximum similarity across whole datasets. The matching process is used for updating, aligning, optimizing, integrating, conflating and/or measuring the quality of road networks. A matching algorithm is generally based on similarity equations (Zhang and Meng, 2007; Li and Goodchild, 2011). The higher the similarity value, the more likely the matching candidates are to be certain matched pairs. In similarity equations, several metrics (network alignment, distance threshold, orientation, direction, road length, valence, sinuosity, etc.) make the matching algorithm more efficient (Hacar and Gökgöz, 2016). While the distance metric limits the number of matching candidates, orientation and valence (degree of connectivity) can be used to find the certain matches (Olteanu-Raimond et al., 2015; Mustière and Devogele, 2008). Sinuosity is also used to eliminate incorrect candidates. It is the ratio of the actual length of a road to the straight length between the start and end points of the same road, and it describes how curved the road is (Mueller, 1968; Haynes et al., 2007) (Figure 1).
In this study, sinuosity intervals determined by commonly used classification methods and by a proposed classification method called 'sinuosity variance' were compared with the standard sinuosity intervals from the Ireland Transportation Agency (ITA) within the framework of a matching process. The study area and road datasets are described in Section 2, where the classification methods and the proposed sinuosity variance method are also briefly introduced. In Section 3, the sinuosity intervals are determined and the results of the matching process are presented for each classification method. Finally, some inferences from these results are given in Section 4.

STUDY AREA AND DATASETS
This study was conducted using datasets representing roads in the Beykoz district of Istanbul, Turkey. The study area covers 1.6 km × 1.7 km. The two road networks, which represent the same entities, come from the Istanbul Metropolitan Municipality (IMM) road dataset and the Basarsoft navigation road dataset. Their pattern is tree-based. Figure 2 shows the study area, the road networks and the differences between the networks.

Classification Methods
Roads are generally classified into predefined sinuosity intervals to analyze traffic components such as travel demand, road safety, etc. Several measures of sinuosity have been used in the literature (Table 1).

Method                   Definition
Bend density             The number of bends per kilometer
Sinuosity/detour ratio   The ratio of the actual length of a road to the straight length between the start and end points of the same road
Straightness index       The proportion of road segments that are straight
Mean angle               The mean angle turned per bend

In this study, the sinuosity/detour ratio is used as the sinuosity equation:
sinuosity = actual length of the road / straight length between its start and end points    (1)
Sinuosity is commonly divided into three classes:
Low → for straight and/or slightly curved roads
Middle → for relatively curved roads
High → for highly curved roads
Sinuosity intervals (classes) can be determined by several commonly used data classification methods, such as equal interval, quantile, natural breaks and geometrical interval. The intervals determined by ITA can also be used for this purpose. ITA conducted an evaluation and defined three standardized sinuosity intervals for Ireland (Transport Infrastructure, 2016) (Table 2) (Figure 3).
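The detour ratio and two of the classification methods named above (equal interval and quantile) can be illustrated with a minimal Python sketch. The function names, the coordinate format (a list of planar `(x, y)` tuples in metres) and the handling of closed loops are assumptions made for illustration; they are not taken from the study, and natural breaks and geometrical interval are not covered here.

```python
import math
import statistics


def polyline_length(coords):
    """Sum of the segment lengths of a polyline given as [(x, y), ...]."""
    return sum(math.dist(p, q) for p, q in zip(coords, coords[1:]))


def sinuosity(coords):
    """Detour ratio: actual length over straight start-to-end length."""
    straight = math.dist(coords[0], coords[-1])
    if straight == 0:
        # Closed loop: the ratio is undefined. Returning infinity is an
        # assumption of this sketch, not something the text specifies.
        return math.inf
    return polyline_length(coords) / straight


def equal_interval_breaks(values, n_classes=3):
    """Class breaks that split [min, max] into equally wide intervals."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / n_classes
    return [lo + step * i for i in range(1, n_classes)]


def quantile_breaks(values, n_classes=3):
    """Class breaks that put roughly the same number of values in each class."""
    return statistics.quantiles(values, n=n_classes)


# Hypothetical coordinates, for illustration only.
straight_road = [(0, 0), (3, 4)]
bent_road = [(0, 0), (3, 0), (3, 4)]
print(sinuosity(straight_road))  # 1.0
print(sinuosity(bent_road))      # 1.4
```

A perfectly straight road has a sinuosity of exactly 1, which is why the lowest class boundary in any of these schemes sits only slightly above 1.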

Sinuosity Index   Intervals
Low               < 1.0001
Mid               ≥ 1.0001 and <
High              ≥

In a matching process, an object is assumed to have the same sinuosity index as its matched object. For example, if Line A in dataset 1 has a Low sinuosity index, then only lines with a Low sinuosity index are searched for in dataset 2 during matching.
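Assigning a sinuosity index from a set of interval boundaries can be sketched as follows. The function name is hypothetical, and the Mid/High boundary used in the example is a placeholder chosen only so that the code runs; it is not the ITA value.

```python
def classify_sinuosity(value, low_upper, mid_upper):
    """Return 'Low', 'Mid' or 'High' for a sinuosity value.

    Intervals are half-open, following the '<' / '≥' pattern of Table 2:
    Low < low_upper <= Mid < mid_upper <= High.
    """
    if value < low_upper:
        return "Low"
    if value < mid_upper:
        return "Mid"
    return "High"


LOW_UPPER = 1.0001  # Low/Mid boundary as given in Table 2
MID_UPPER = 1.5     # HYPOTHETICAL placeholder, not the ITA value

for s in (1.0, 1.2, 2.0):
    print(s, classify_sinuosity(s, LOW_UPPER, MID_UPPER))
```

Passing the boundaries as parameters lets the same function serve every classification method under comparison; only the two break values change.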
The proposed sinuosity variance method was also used to determine the intervals. In this method, the sinuosity intervals are determined according to the variation of the sinuosity values of the roads in the datasets. First, the variance of the sinuosity values is calculated for each road dataset. Then, the dataset with the maximum variance is taken as the reference for calculating the sinuosity intervals (Table 3).
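The first two steps of the method, computing the variance per dataset and choosing the reference, can be sketched as below. The use of the population variance is an assumption (the text does not say whether the population or sample form is used), the function name is hypothetical, and the sinuosity values are invented for illustration. How the intervals are then derived from the reference dataset is defined in Table 3 and is not reproduced here.

```python
import statistics


def pick_reference(sinuosity_by_dataset):
    """Return the name of the dataset with the largest sinuosity variance.

    sinuosity_by_dataset maps a dataset name to a list of sinuosity values.
    Population variance is an assumption of this sketch.
    """
    variances = {
        name: statistics.pvariance(values)
        for name, values in sinuosity_by_dataset.items()
    }
    reference = max(variances, key=variances.get)
    return reference, variances


# Hypothetical sinuosity values, for illustration only.
example = {
    "dataset_1": [1.0, 1.0, 1.0],
    "dataset_2": [1.0, 1.5, 2.0],
}
reference, variances = pick_reference(example)
print(reference)  # dataset_2
```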

RESULTS AND DISCUSSION
In this study, the sinuosity intervals were determined by using the proposed sinuosity variance approach and the equal interval, quantile, natural breaks and geometrical interval methods. They were compared with the standard intervals from ITA (Tables 4 and 5). A pre-matching process was conducted by using the Hausdorff distance with a threshold of 85 m. The threshold should be set high enough to catch all possible candidate roads. Roads less than 85 m from one another were assigned as matching candidates.
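The pre-matching step can be sketched in Python as follows. This sketch computes the discrete Hausdorff distance over polyline vertices only, which approximates the distance between the full line geometries; whether the study used vertices or continuous lines is not stated, so this is an assumption. The function names, the dictionary layout and the coordinates are hypothetical.

```python
import math


def directed_hausdorff(a, b):
    """Largest distance from a vertex of a to its nearest vertex of b."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(a, b):
    """Symmetric discrete Hausdorff distance between two vertex lists."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


def find_candidates(roads_1, roads_2, threshold=85.0):
    """Map each road id in roads_1 to the road ids in roads_2 closer than threshold."""
    return {
        k: [l for l, coords_l in roads_2.items()
            if hausdorff(coords_k, coords_l) < threshold]
        for k, coords_k in roads_1.items()
    }


# Hypothetical roads with coordinates in metres, for illustration only.
roads_1 = {"k1": [(0, 0), (100, 0)]}
roads_2 = {
    "l1": [(0, 10), (100, 10)],    # 10 m away from k1
    "l2": [(0, 500), (100, 500)],  # 500 m away from k1
}
print(find_candidates(roads_1, roads_2))  # {'k1': ['l1']}
```

The nested loops are quadratic in the number of roads and vertices, which is acceptable for a small study area; a spatial index would be the usual choice for larger networks.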
Line k and Line l are matched according to the following rules:
• If Line k has a 'Low' sinuosity index, then a Line l with a 'Low' sinuosity index among the candidates of Line k is matched.
• If Line k has a 'Mid' sinuosity index, then a Line l with a 'Mid' sinuosity index among the candidates of Line k is matched.
• If Line k has a 'High' sinuosity index, then a Line l with a 'High' sinuosity index among the candidates of Line k is matched.
The matching process was conducted after each classification. For the evaluation, the matching results were compared with the manual matching results (Table 6).
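The three rules reduce to keeping the candidates whose sinuosity index equals that of Line k, which can be sketched as below. The sketch keeps every same-class candidate, which is one reading of the rules; the function names and sample data are hypothetical. The comparison helper returns raw counts only and does not attempt to reproduce the statistics reported in Table 6.

```python
def match_by_class(candidates, class_1, class_2):
    """Keep, for each line k, the candidates l that share its sinuosity index.

    candidates: {k: [l, ...]} from the pre-matching step
    class_1, class_2: {line id: 'Low' | 'Mid' | 'High'} for each dataset
    """
    return {
        k: [l for l in ls if class_2[l] == class_1[k]]
        for k, ls in candidates.items()
    }


def compare_with_manual(matches, manual_pairs):
    """Return (pairs in both, automatic pair count, manual pair count)."""
    auto_pairs = {(k, l) for k, ls in matches.items() for l in ls}
    return len(auto_pairs & manual_pairs), len(auto_pairs), len(manual_pairs)


# Hypothetical candidates, classes and manual matches, for illustration only.
candidates = {"k1": ["l1", "l2"], "k2": ["l3"]}
class_1 = {"k1": "Low", "k2": "High"}
class_2 = {"l1": "Low", "l2": "Mid", "l3": "Low"}
manual_pairs = {("k1", "l1"), ("k2", "l3")}

matches = match_by_class(candidates, class_1, class_2)
print(matches)                                     # {'k1': ['l1'], 'k2': []}
print(compare_with_manual(matches, manual_pairs))  # (1, 1, 2)
```

In this example k2 loses its only candidate because the two datasets place the same road in different classes, which shows how the choice of interval boundaries directly affects the matching result.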

CONCLUSIONS
In this study, a new method for determining sinuosity intervals and classifying the sinuosity index for the road matching process was proposed. The sinuosity intervals were determined according to the variation of the sinuosity values of the roads in the datasets. The method was compared with the sinuosity intervals from ITA and with commonly used classification methods. The equal interval and natural breaks methods are insufficient for the matching process, since hardly any roads were classified into the 'Mid' or 'High' sinuosity indices. The quantile method gave the second-best result. In this method, the intervals are determined so that each sinuosity class contains the same number of objects. Since the two datasets in this study have different numbers of objects, the quantile method should be tested further with datasets that have the same number of objects. Sinuosity variance, a promising classification method for the matching process, gave the best matching result among all classification methods.

Figure 1. Actual (orange) and straight (dashed blue) lengths of a road.

Figure 3. Examples of road lines for each ITA sinuosity index.

Table 4. The sinuosity interval values retrieved from each classification method.

Table 5. Number of objects in each sinuosity index with regard to the classification methods and sources.

Table 6. Matching statistics with regard to the classification methods.