Label Placement Challenges in City Wayfinding Map Production—Identification and Possible Solutions

Harrie, Lars; Oucheikh, Rachid; Nilsson, Åsa; Oxenstierna, Andreas; Cederholm, Pontus; Wei, Lai; Richter, Kai-Florian; Olsson, Perola

doi:10.1007/s41651-022-00115-z

Open access
Published: 25 May 2022

Volume 6, article number 16, (2022)

Journal of Geovisualization and Spatial Analysis

Abstract

Map label placement is an important task in map production, which needs to be automated since it is tedious and requires a significant amount of manual work. In this paper, we identify five cartographic labeling situations that present challenges by causing intensive manual work in map production of city wayfinding maps, e.g., label placement in high density areas, utilizing true label geometries in automated methods, and creating a good relationship between text labels and icons. We evaluate these challenges in an open source map labeling tool (QGIS), provide results from a preliminary study, and discuss if there are other techniques that could be applicable to solving these challenges. These techniques are based on quantified cartographic rules or on machine learning. We focus on deep learning for which we provide several examples of techniques from other application domains that might have a potential in map label placement. The aim of the paper is to explore those techniques and to recommend future practical studies for each of the identified five challenges in map production. We believe that targeting the revealed challenges using the proposed solutions will significantly raise the automation level for producing city wayfinding maps, thus, having a real, measurable impact on production time and costs.

Introduction

Label placement is an important task in map production that requires a substantial amount of manual work and time. To reduce this time and enhance visual, informative, and esthetic quality of the maps, numerous studies have been carried out on automatic map labeling (see Wolff and Strijk (2009) for an overview of early studies). Even though several labeling problems have satisfying solutions (such as how to find optimal solutions for point placement on small scale maps), the automation level of map labeling in production is still low. This low automation rate is likely due to several reasons. Firstly, adequate methods may not have been developed to solve the labeling challenges that occur in a production environment. Secondly, current label placement tools might not implement the best methods available. Thirdly, map producers might not entirely utilize the capability of the map labeling tools. Fourthly, the data structures used for the cartographic data are not sufficient to support the best methods/tools available. Most likely, the current low degree of automation in map labeling in production is caused by a combination of these reasons.

In map production, text labels and icons are often placed simultaneously since there are dependencies between how they are placed. Therefore, in this study, we include placement of both text labels and icons. In the paper, the terms labels and map labeling include both (placement of) text labels and icons.

Most research in map labeling is based on quantifying rules found in the cartographic literature, often based on seminal work such as Imhof (1975) and Wood (2000). This approach has been successful in the sense that rule-based systems and optimization techniques have been developed and implemented in tools, but it has not solved several challenges in map production. In an era of increasing use of machine learning in many application domains, an obvious question is if and how machine learning, and more precisely deep learning, could be applied to map labeling. This question boils down to whether we can utilize cartographic knowledge of map labeling implicitly present in map examples to train, e.g., a neural network to perform map labeling of good quality, or at least to evaluate whether a map labeling is appropriately conducted. This is, in our view, still an open question, which we address and give insight into in this study but do not fully answer.

The paper has two main aims. The first is to identify cartographic labeling challenges that occur in a map production environment and cannot be solved by current label placement tools. The second is to discuss whether there are published methods that might be useful to solve these challenges and/or if deep learning methods could be applicable. Based on this, we formulate recommendations for further studies. The paper starts by describing rules of map labeling in a production environment, with a focus on city wayfinding maps. An introduction to deep learning and its potential use in map labeling then follows. The subsequent sections describe some challenging cartographic labeling situations that occur in the production of city wayfinding maps. This part also includes descriptions of several methods that potentially could be useful for solving the labeling challenges, including rule-based and deep learning methods. The paper ends with some concluding remarks.

Map Labeling Rules for City Wayfinding Maps

General Label Placement Rules

Map labeling rules concern the whole labeling process which includes the following: (1) the choice of labels to show and their classification, (2) determination of font characteristics, and (3) label placement (Yoeli, 1972). In this study, we are mainly interested in the placement of labels, for which several general rules must be obeyed (for more details, see Imhof 1975; Wood 2000; van Dijk 2002; Rylov and Reimer 2015):

Legibility: A label is not allowed to overlap another label.
Association: It should be easy to interpret which map object a label refers to; hence, avoid placing labels too close to other objects.
Map readability: If the labels must be placed on top of map objects, they should not cover important features of those objects (and ideally only overlap homogeneous areas and less important objects). Furthermore, the map objects should not disturb the interpretation of the labels.
Esthetics: The labeling should contribute to an overall esthetic map.
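
The legibility rule is the easiest of the four to quantify. The following is a minimal sketch, assuming each placed label is approximated by an axis-aligned bounding box in map units; the `LabelBox` type, the function names, and the coordinates are hypothetical and only serve as illustration. Real label geometries (curved or rotated text) would need a more precise intersection test.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import List, Tuple


@dataclass(frozen=True)
class LabelBox:
    """Axis-aligned bounding box of a placed label (hypothetical map units)."""
    name: str
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def boxes_overlap(a: LabelBox, b: LabelBox) -> bool:
    """True if the interiors of the two boxes intersect (touching edges are allowed)."""
    return a.xmin < b.xmax and b.xmin < a.xmax and a.ymin < b.ymax and b.ymin < a.ymax


def legibility_conflicts(labels: List[LabelBox]) -> List[Tuple[str, str]]:
    """Return every pair of labels that violates the no-overlap rule."""
    return [(a.name, b.name) for a, b in combinations(labels, 2) if boxes_overlap(a, b)]


# Illustrative boxes only; the coordinates are made up.
labels = [
    LabelBox("West St", 0, 0, 10, 2),
    LabelBox("Tower Ct", 8, 1, 15, 3),
    LabelBox("Ticket shop", 20, 0, 26, 2),
]
conflicts = legibility_conflicts(labels)
```

The pairwise test is quadratic in the number of labels, which is acceptable for a sketch; a production tool would more likely use a spatial index.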

These rules are applicable for map labeling of all types of maps. To fulfill these rules, as well as other cartographic aspects, there are more specific rules defined for a specific cartographic product. In this study, we focus on city wayfinding maps.

Production Rules for Label Placements in City Wayfinding Maps

In this study, we focus on city wayfinding maps^{Footnote 1} in London. City wayfinding maps provide directional information in complex urban environments in such a way that they can be easily interpreted by pedestrians and cyclists. The rules considered for label placement are based on design standards produced by Transport for London^{Footnote 2} as well as internal labeling rules from the mapping company T-Kartor.^{Footnote 3} Even though the cartographic rules are for a specific cartographic product (city wayfinding map for London), the main content is largely generally applicable (and generally follows recommendations found in, e.g., Imhof 1975). The intention here is not to provide a complete list of label placement rules, rather to provide an outline of the rules as a base for discussion about limitations in the available algorithms/tools (see Appendix 1 for detailed rules). In short, the following rules (and their exceptions) apply.

Point feature labeling: generally, point feature labels should be horizontal and ideally above and to the right of the point (see, e.g., Slocum et al., 2005). In city wayfinding maps, most point objects are in fact represented by icons and some types of these icons are not allowed to be moved. If there is not enough space, callouts are used (Fig. 1).
Line feature labeling: line feature labels, e.g., for roads, are to be placed within the road area. Straight parts of a road are preferable for labels due to readability; if not possible, the label shape needs to adapt to the shape of the feature. Labels can also be wrapped into two (or more) lines, or shortened, to make them fit. For long line features, labels are repeated.
Area feature labeling: preferably, area labels should be completely placed within the polygon feature they represent, wrapping text into several lines if necessary. But if unavoidable, area labels may cross the polygon boundary. Labels should be horizontal and aligned according to their relation to the polygon feature (e.g., left alignment if placed more to the right of the feature). City wayfinding maps also contain area labels for administrative areas (e.g., neighborhoods), using large fonts, opacity, and large space between characters. Ideally, these labels do not overlap other labels, but in practice, this is hardly avoidable and, thus, overlap is allowed as long as it does not harm map readability.
Icons: most icons relate to a specific location on a street (e.g., a bus stop). Icons come with an arrow that needs to point to the true location on the street. Ideally, icons are placed at a 90-degree angle to the corresponding street, but other angles are possible if necessary for avoiding overlaps with map features or other labels. Icons should not overlap roads, but may overlap buildings if necessary. Icons representing (parts of) area features should align with the parts they represent and with the text labels of these features, respectively. Exceptions are possible if there is no other solution.
Label overlap and removal: In short, the first rule is that no text labels and icons may overlap, and the second rule is that it is not allowed to remove a text label or icon. Clearly, these rules often result in conflicts that require exceptions, e.g., for text labels to overlap icons or buildings they do not represent as long as it is still clear which building each label corresponds to.
Hyphenation and other text manipulations: For a label text, the following priorities should be used: (1) complete text in one unit, (2) shortened text in one unit, (3) text divided into two units, and (4) text divided into two or several rows. For city wayfinding maps, there is a list of allowed abbreviations that can be used. Also, under several restrictions, font size may be changed to obtain optimally looking labels.
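
The text manipulation priorities above can be read as an ordered fallback. The sketch below is one possible interpretation, not the production rule set: the abbreviation table is hypothetical, label width is approximated by character count instead of font metrics, and priority 3 (two separately placed units) is skipped because it depends on map geometry.

```python
import textwrap

# Hypothetical abbreviation table; the real list of allowed abbreviations is not given here.
ABBREVIATIONS = {"Street": "St", "Court": "Ct", "Road": "Rd"}


def shorten(text):
    """Replace whole words by their allowed abbreviation, if any."""
    return " ".join(ABBREVIATIONS.get(word, word) for word in text.split())


def choose_variant(text, max_chars, max_rows=3):
    """Return (priority, rows) for the first text variant that fits, or None.

    Width is approximated by character count, which ignores real font metrics.
    Priority 3 (two separately placed units) is not modeled in this sketch.
    """
    if len(text) <= max_chars:
        return 1, [text]                      # (1) complete text in one unit
    short = shorten(text)
    if len(short) <= max_chars:
        return 2, [short]                     # (2) shortened text in one unit
    rows = textwrap.wrap(short, width=max_chars, break_long_words=False)
    if len(rows) <= max_rows and all(len(row) <= max_chars for row in rows):
        return 4, rows                        # (4) text divided into rows
    return None                               # nothing fits: manual handling needed
```

For example, `choose_variant("Cockspur Street", 12)` falls back to the abbreviated form, while a long name such as "The Original London Visitor Centre" is wrapped into rows when the available width allows it.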

Map Labeling Based on Deep Learning

In this section, we discuss the potential of deep learning methods for map labeling. After a brief introduction to deep learning and its applications, we provide a more general outlook on how deep learning may contribute to achieving the key elements in good label placement. Deep learning may possibly also be utilized for improving the evaluation step in label placement, especially for evaluating cartographic rules that are difficult to quantify, e.g., map readability.

Introduction to Deep Learning

Machine learning techniques have experienced a prosperous development in recent years in several application fields such as image recognition (Ohri et al. 2021), image classification (Zhao and Du 2016), and robot technology (Levine et al. 2018). Classical machine learning techniques may achieve acceptable performance but require tedious feature engineering. Deep learning techniques avoid this, particularly convolutional neural networks (CNN) and learning mechanisms such as attention, adversarial training, and spatial transformation. A CNN consists of convolutional layers stacked on top of each other, where each layer generates feature maps and is capable of recognizing more sophisticated features than the previous one. Fully connected networks are prone to overfitting if not regularized, as each neuron in one layer is connected to all neurons in the next layer. With CNN, regularization is achieved by exploiting the hierarchical patterns in the input data, employing increasingly complex filters or kernels with increasing network depth. Much research has gone into optimizing the network design to increase the performance of learning specific tasks and to solve technical issues such as overfitting, the vanishing gradient problem, and under-specification. This has led to efficient model architectures such as Faster-R-CNN, U-Net, YOLO, SSD, FPN, or Inception (Dhilon and Verma 2019).

One type of deep learning models increasingly applied in many learning applications and of interest to map labeling is generative adversarial networks (GAN). A GAN includes two networks trained in contest: the generative network generates new samples and learns to map from a latent space to a given data distribution, while the discriminative network evaluates the generated samples and distinguishes them from the true data distribution (Goodfellow et al. 2014). These two networks play a minimax game which, if its equilibrium is reached, results in very good performance, e.g., generating highly realistic looking images.
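
For reference, the minimax game can be written as the value function given by Goodfellow et al. (2014). Here G denotes the generative network, D the discriminative network, p_data the true data distribution, and p_z the prior over the latent space; in a map labeling setting, x would correspond to a manually labeled map, which is our reading and not part of the original formulation.

```latex
\min_{G} \max_{D} V(D, G) =
  \mathbb{E}_{x \sim p_{\mathrm{data}}(x)} \left[ \log D(x) \right]
  + \mathbb{E}_{z \sim p_{z}(z)} \left[ \log \left( 1 - D(G(z)) \right) \right]
```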

GAN are relevant in this context because the problem of placing labels on maps lies in the intersection of vision and language and can be formulated as an image synthesis problem. The two most interesting approaches for image synthesis are image composition and image translation. Image composition aims to synthesize new images by placing foreground objects into an existing background image (Lee et al. 2018; Fig. 2). The foreground objects in our case are the labels that should be placed in the background image, i.e., the map, at semantically sensible regions. To achieve synthesis realism and to generate labeled maps similar to the manually labeled dataset, some techniques and networks mentioned below can be used to learn and control certain parameters such as text locations within the background image, geometric transformation of the foreground texts, and blending between the foreground text and background image. On the other hand, image-to-image translation aims to find a mapping from one visual domain to another and to learn the required transformations to perform on images from one domain so they have the features of images from another domain.

There are, however, some inherent problems with using many deep learning techniques in map labeling since they rely on image-to-image translations. These translations only focus on the synthesis of appearance features (here the label) by learning the style of images of the target domain. Generally speaking, a solution to the label placement problem should include synthesis realism in both the geometry domain (alignment, etc.) and the appearance domain (the text itself as well as the relation to the background map). A geometry synthesizer needs to learn the local geometry of the background images (maps), consisting of roads, buildings, etc., on which the foreground objects (labels) can be transformed and placed. To what extent this is possible is further elaborated on below.

Earlier Studies on Machine Learning in Label Placement

Pokonieczny and Borkowska (2019) utilized machine learning to determine feature labeling in topographic maps. They trained a network with input terrain coverage data and labels from several maps to determine in which rectangle a label should be placed around a feature. They achieved up to 80% correctly placed labels which made it possible to reduce manual editing by 50%.

Li et al. (2020) developed a deep learning methodology for placing area feature labels. A common strategy, implemented in several GIS programs, is to place the label on top of the centroid of the polygon that defines the area. However, for many polygonal shapes, this strategy is not cartographically satisfying, and in map production, cartographers manually select other positions. Furthermore, it is difficult to formalize what is a good position for an arbitrary polygon shape. Li et al. (2020) utilized data to train a stacked hourglass network to produce a heatmap that indicates a good position of the area label. The methodology was applied to map labeling of property units in a cadaster map and yielded relatively good results.

It should be noted here that neither of these two studies is concerned with conflicting labels and overlap of other map features only plays a very minor role here; in other words, they treat quite simple labeling tasks. Therefore, their methodologies are likely not extendable to more general and/or more complex labeling situations.

Potential Deep Learning Techniques for Automated Label Placement

In this section, we describe some deep learning techniques that are related to the key elements of good label placement. The approach is to formulate the problem of text placement as a learning task, and then to explore deep learning techniques from other domains that share similar issues and, this way, to identify the appropriate approaches that may be pursued further.

Legibility

Legibility in map labeling mainly concerns avoiding label overlap, which can be facilitated by a saliency-based method (Vilaplana 2015). The saliency model computes a saliency map for a given image such that homogeneous image regions usually have lower saliency. Then, a predefined threshold on the resulting saliency map will determine the appropriate locations for text placement. This saliency guidance helps to find the right locations for texts within the semantically sensible regions or at least to improve the identified candidate locations while avoiding collisions with other objects.
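
The thresholding step can be sketched as follows. This is a simplified illustration, assuming the saliency map is already available as a small grid of values in [0, 1] and that a label occupies a fixed rectangular window of cells; the function name, the grid, and the threshold are hypothetical, and the saliency model itself is not part of the sketch.

```python
def candidate_cells(saliency, threshold, label_w, label_h):
    """Return top-left (row, col) positions where a label_h x label_w window
    covers only cells whose saliency is below the threshold."""
    n_rows, n_cols = len(saliency), len(saliency[0])
    positions = []
    for r in range(n_rows - label_h + 1):
        for c in range(n_cols - label_w + 1):
            window = (
                saliency[r + dr][c + dc]
                for dr in range(label_h)
                for dc in range(label_w)
            )
            if all(value < threshold for value in window):
                positions.append((r, c))
    return positions


# Made-up saliency grid: low values stand for homogeneous map regions.
saliency_map = [
    [0.9, 0.8, 0.1, 0.1, 0.1],
    [0.9, 0.2, 0.1, 0.1, 0.7],
    [0.1, 0.1, 0.1, 0.6, 0.9],
]
free = candidate_cells(saliency_map, threshold=0.3, label_w=3, label_h=1)
```

The resulting positions are only candidates; a placement method would still have to choose among them, for instance by association with the labeled feature.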

Another deep learning method that may be interesting for label placement is the image text quality assessment (ITQA for short) which aims to evaluate the image quality with a focus on text as it computes the quality score of an image through predicting the degree of degradation at textual regions. Furthermore, Li et al. (2018) proposed a method based on ResNet to perform image text quality assessment, which is composed of three stages: text detection, text quality prediction, and weighted pooling of the quality of all detected text lines. Other methods can learn from ranked datasets such as user rankings (Liu et al. 2017). Siamese networks are trained on ranked sets and transfer this learning to a CNN that performs the absolute legibility assessment. Another related application is image stitching in which the overlapped objects should be detected so that they can be stitched and generate a wide field of view image. Lyu et al. (2019) claimed in their survey that feature-based methods have dominated image stitching and that learned CNN features are more flexible, and more potential matched candidates could be extracted from images with wide baseline or low-texture regions.

Association

Modeling associations is a challenge in deep learning applied to raster maps. The raster map features alone might be insufficient to capture the relation between objects and their labels. There is also a practical difficulty, since learning object-centric representations from pixels is not efficient for complex tasks that require encoding fine-grained locations, orientations, and complex compositions of objects. However, several semantic and context-based methods have been developed in the deep learning domain that could be applicable for label placement.

Lee et al. (2018) developed a model for context-aware synthesis and placement of object instances that can simultaneously determine locations to place an object in a scene, and its appearance, i.e., scale and shape, or pose, given a semantic mask. They used an architecture that consists of two GAN modules and spatial transformation networks (STN). An STN is a special type of CNN capable of making geometric transformations on images and generating realistic looking ones by limiting the space of possible outputs to a low-dimensional geometric transformation of real images. Using only GAN can produce images of remarkable complexity and realism but may potentially ignore the explicit spatial interaction between multiple entities present in the image. That is the reason for introducing ST-GAN and using it for image composition tasks in both paired and unpaired settings (Lin et al. 2018). Volokitin et al. (2020) developed a method for the automatic determination of plausible locations for object placement into images using masked convolutions which compute feature maps for left, right, top, and bottom contexts just once per image and thus learn the spatial context of different image regions.

Readability

Map readability can be evaluated by detecting occlusion (overlap) in the final maps. By using de-occlusion techniques, which aim to recover and complete the invisible parts of occluded objects, we can ensure that no important features are hidden. In addition, saliency models could be useful to identify the attention points or regions that people would focus on and important objects that should not be occluded. Saliency feature learning was used to increase the readability of posters, which are very informative but are usually viewed for only a few seconds (Fang et al. 2020). The data used are collected from eye-tracking experiments and the evaluation is done using specific metrics such as time to first fixation and observation length. The same techniques are used for natural scene data in order to identify the most noticeable objects, which attract human attention. Fang et al. (2020) evaluated the capabilities of six state-of-the-art models on natural scene content (i.e., text or characters) to find salient regions and generate saliency maps. The use of custom loss functions can enhance the readability of the obtained maps. For example, a repulsion loss can help to keep the labels away from each other by penalizing generated samples with small spaces between the labels.
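
One possible form of such a repulsion loss is a squared hinge on the distance between label centres. The sketch below is an assumption made for illustration, not a formulation taken from the cited work: the function name, the `margin` parameter, and the use of label centres instead of full label geometries are all hypothetical. In a training pipeline, the same expression would be written with the tensor operations of the chosen deep learning framework so that it is differentiable.

```python
import math
from itertools import combinations


def repulsion_loss(centres, margin):
    """Sum of squared hinge penalties over all label pairs.

    A pair contributes (margin - d)^2 when the distance d between the two
    label centres is smaller than the margin, and nothing otherwise.
    """
    return sum(
        max(0.0, margin - math.dist(p, q)) ** 2
        for p, q in combinations(centres, 2)
    )


# Made-up label centres: only the first and third are closer than the margin.
loss = repulsion_loss([(0.0, 0.0), (3.0, 4.0), (0.0, 1.0)], margin=2.0)
```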

Esthetics

There are some examples of deep learning studies in cartography addressing esthetics, mainly focusing on cartographic generalization (Zhou and Li 2017; Touya et al. 2019; Feng et al. 2019; Courtial et al. 2020). For example, Courtial et al. (2020) explored deep learning techniques for mountain road generalization, where a U-Net network was trained on raster images of road objects in the Alps. The authors conclude that the network achieves smoothing, enlargement, and caricature operations on the mountain road objects in most of the cases, but they mentioned that the result is not as good as the reference data (i.e., the production data at IGN, the French mapping authority).

In art, Cetinic et al. (2019) investigated scoring artistic images according to three subjective aspects of human perception: esthetic valuation, received sentiment, and memorability. Their experiments were performed using different decision trees and CNN models on image features related to the content, composition, and color of digitized fine art collections. For each concept, they evaluated several different CNN models trained on various natural image datasets and selected the best performing model based on the qualitative results and the comparison with existing subjective ratings of artworks. They concluded that CNN models pre-trained on natural images can learn and extract meaningful esthetic, memorability, and sentiment features in art images.

Some Challenges in Label Placement

In the production of city wayfinding maps (at T-Kartor), map labeling is a substantial part of the manual handling. Some tools have been evaluated to increase the automation level, but so far, no satisfying solution has been found. One reason might be that the requirements of the city wayfinding maps are somewhat unique and therefore hard to automate using standard tools. This situation is also worsened by the fact that the best cartographic solution is sometimes a violation of one or several of the requirements (simply because it is impossible to place all labels adhering to the complete list of requirements). We do, however, believe that the challenges for the labeling of city wayfinding maps are to a large degree shared with the labeling of other types of high-quality maps with dense information content.

In the following sections, we describe some labeling challenges that cause much interactive work in the production of city wayfinding maps. These challenges have been identified together with cartographers at T-Kartor. We also look into and discuss if there are map labeling methods and/or deep learning techniques that potentially could be useful in these situations, as well as perform some tests. To illustrate the label placement challenges, we use two types of city wayfinding maps. The first type are production maps created by the company T-Kartor. These maps are produced in an ESRI ArcGIS environment using the Maplex label engine and substantial manual label (annotation) editing both in the ArcGIS environment and in the publishing tool Adobe Illustrator. The second type are maps created by us in the open source program QGIS^{Footnote 4} or in the Maplex label engine with the same input data as for the production maps. Details of the QGIS map labeling tool are given in Appendix 2 (see also Ertz et al. 2009). The Maplex label engine is a rule-based system that is integrated into the ESRI environment.^{Footnote 5} Maplex is extensively used and has been shown to produce good results for several map types (see, e.g., the evaluation in Kern and Brewer 2008).

Challenge 1: Label Placement in High-Density Areas

Problem Identification

High-density areas are characterized by a scarcity of space for both map features and labels (Figure 3). To cope with this, cartographers often manually find solutions that are a compromise between desired properties of the map. One particular challenge in high-density areas is to define priorities between the labels, especially since it is not possible to state that one label type should always have priority over another label type. Referring to Figure 3a, we can identify that the area label (representing the landmark building) The Original London Visitor Centre has been prioritized (by the cartographer) over the line label Cockspur Street (which was divided into two lines, which is not an optimal solution according to the labeling rules). On the other hand, the area label The Ambassadors Theatre (in Figure 3b) is moved from its ideal placement, so that the main part of the label is in fact placed on the other side of a street (which is not recommended from an association perspective), to allow space for the line labels West St and Tower Ct.

Figure 4a shows text labels placed automatically in QGIS, while the icons are placed manually (identical to Fig. 3a). The label placement is affected by how the parameters are set in QGIS (e.g., which type of area labels are used, priorities of labels, if labels are allowed to overlap other objects). For us, it turned out to be difficult to find a set of parameters that utilizes the available space as well as was done manually in Fig. 3a. Some shortcomings of the map in Fig. 4a are the overlap between text labels and icons and that some text labels (e.g., The Original London Visitor Centre) had a fixed form that made it impossible to find a better location that would also allow other labels to be shown (e.g., Embassy of Brazil). Also, the parameter setting used was not optimal for showing all the road labels. Figure 4b shows the same area where the map labeling is conducted by the Maplex tool. Both QGIS and Maplex create satisfying labeling in terms of readability. The main problem is that the tools are not capable of placing all the labels. This omission could be acceptable in many map services, but not in the city wayfinding map, which has a requirement that all labels are present. The question then boils down to whether QGIS and Maplex are useful tools for placing a majority of the labels, with the rest then being placed manually. In the production environment for city wayfinding maps, the conclusion has been that, at least in dense areas, the labeling tools do not provide good enough solutions. In other words, the proposed solutions (in Fig. 4a,b) do not provide any time savings in map production. The little support from the automatic tools can be illustrated by comparing Fig. 4a and Fig. 4b with the manually made labeling in Fig. 3a; there are quite few labels that are not moved and/or changed (more lines) between these maps. Instead, T-Kartor produces a labeling solution where all labels are present and then starts the manual work from there. In Fig. 4c, such a map is generated in Maplex (where overlap between labels has been allowed). From this map, some placements of road labels are saved but almost all other labels have to be moved (and in some cases also divided into several lines).

Rule-Based Techniques

Labeling dense areas is a well-known challenge in automated label placement. Early studies by Doerschler and Freeman (1992) aimed at improving rule-based systems for automatic label placement to cope with high-density maps, but it turned out to be difficult to utilize the available space for the labels. One improvement was the introduction of the slider model, which allowed a more flexible label placement, not restricted to a fixed number of solutions (van Kreveld et al. 1999; Strijk and van Kreveld 2002). Optimization techniques (e.g., developed by Christensen et al. 1995 and Zoraster 1997) also have shortcomings in dealing with high-density areas. Much of the development of optimization techniques has concentrated on finding a solution with the most added (point) labels and has not addressed the cartographic challenges in high-density maps concerning, e.g., the association property; see, e.g., Rylov and Reimer (2014), who address this issue utilizing a multicriteria optimization technique. Haunert and Wolff (2017) argue that the association criteria must be strengthened in their development of a new integer linear programming approach. However, it should be noted that both Rylov and Reimer (2014) and Haunert and Wolff (2017) only deal with point labels, which is the usual case for label placement optimization research. This is not adequate for labeling a high-density map as in Fig. 3a. What we can see in this map, especially in the upper part, is that the point and line labels have to fight for the same space, and therefore, it is almost impossible to find a good cartographic solution if the label types are treated independently. To improve this situation, Lu et al. (2019) developed a unified framework for placing all types of labels; this framework is based on a hybrid algorithm combining discrete differential evolution and genetic algorithms. However, as far as we know, there is no available tool, commercial or open source, that has a common framework for all label types.

To circumvent adding labels to high-density areas, a leader approach could be utilized. In this approach, the label is placed outside the area and a leader connects the label to the feature, as done for the Ticket shop icon in Fig. 3b. The labels could then be placed either in the map or just outside the border of the map. For the latter case, Kindermann et al. (2015) developed an efficient algorithm that creates a planar solution (guaranteeing no overlaps of the leaders) where the labels are allowed to be placed along two borders.

Another approach in high-density areas would be to perform a selection of data that should be labeled. We have not found any research on this for city wayfinding maps, only for other types of maps. For example, Brewer et al. (2013) provided an automated method for adaptive thinning of road features and road labels, suitable for multiscale design, which removes features based on a feature hierarchy and network connectivity while preserving many urban/rural local density patterns. Also, Raposo et al. (2017) performed selection of labels in a multiscale context targeting summits (point data) in hydrological datasets. The latter study uses a tessellation approach (where restrictions are set for the labels in each cell in the tessellation), which could be of interest for a city wayfinding map.

Furthermore, label placement in, e.g., high-density areas could include an automated evaluation step. This could be implemented by computing several candidate solutions in a first step and then, in the evaluation step, selecting the best one according to certain criteria. But even the best identified solution could include some labels that are not placed satisfactorily, and in these cases, the evaluation step could identify which of the labels need interactive improvement. From a practical perspective, this identification would save much labor time, since cartographers would not be required to manually inspect all labels from the automated solution (see, e.g., Klute et al. (2019) for a practical implementation of semi-automated map labeling). Analytical evaluation of map labeling was studied by van Dijk (2002), who quantified several map labeling rules to form a label quality function used for evaluation (for a practical use of a similar framework, see Kern and Brewer (2008)).
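The two-step procedure described above can be sketched as a small selection routine. The criteria names, weights, and threshold below are assumptions chosen for illustration; they are not the quality function of van Dijk (2002), and each per-label score is assumed to lie in [0, 1].

```python
# Hypothetical criteria and weights for a per-label quality score.
WEIGHTS = {"legibility": 0.4, "disturbance": 0.3, "association": 0.3}


def label_quality(scores):
    """Weighted sum of the criterion scores for one label."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def solution_quality(solution):
    """Mean label quality over all labels in one candidate solution."""
    return sum(label_quality(s) for s in solution.values()) / len(solution)


def evaluate(candidates, threshold=0.6):
    """Pick the best candidate and list its labels that fall below the threshold."""
    best = max(candidates, key=solution_quality)
    flagged = sorted(
        name for name, s in best.items() if label_quality(s) < threshold
    )
    return best, flagged
```

The list of flagged labels is what a cartographer would inspect interactively, instead of reviewing every label in the automated solution.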

Deep Learning Techniques

In the deep learning domain, there are some interesting techniques that could be applied to high-density areas. As mentioned above, Lee et al. (2018) developed a model for context-aware synthesis and placement of objects. Similarly, Volokitin et al. (2020) developed a method to automatically determine plausible locations for placing objects into images, considering the surrounding context. Such approaches can be useful for simultaneously determining where to place the labels on the map and their appearance, i.e., font and shape, so as to avoid occlusion and overlaps.

Associations are important in map labeling in high-density areas. Association is linked to the concept of semantic coherence, since both concern placing the text at semantically sensible regions within the background image. To learn this pairing, Zhan et al. (2021) used semantic image segmentation datasets to classify image regions into two lists, where one list includes only image regions that are semantically sensible for text embedding and the other includes those that are not. However, most current image composition systems deal with only one foreground object, while map labeling in dense areas deals with multiple foreground objects (labels). To include multiple foreground objects, hierarchical composition techniques have been developed (see, e.g., Zhan et al. (2021)).

If a GAN is used for map labeling in high-density areas, the formulation of the adversarial loss function (which models the difference between the original target image and the generated one) is important. To measure the error of the automatic label placement relative to the original target image (manually labeled), the objective criteria used to evaluate automatic segmentation methods can be applied. Applicable loss functions in this case are overlap-based losses, such as the Dice similarity coefficient or the Jaccard index, or spatial distance-based ones, such as the mean boundary distance or the Hausdorff distance (Wang et al. 2020a). In addition, the core network for the GAN discriminator should be well chosen. The basic discriminator is trained as a binary classification model to predict the probability that a given image is real. In a WGAN, however, the output is a score of “realness” for a given image. Therefore, instead of playing the role of a classifier and using loss functions such as binary cross-entropy, the WGAN model uses a new loss function that pushes the discriminator to predict a precise score.
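The overlap-based and distance-based measures named above can be written down compactly. The sketch below assumes that the label areas of the target and the generated image have been rasterized into sets of (row, column) pixel coordinates; this representation and the function names are assumptions for illustration.

```python
import math


def dice(a, b):
    """Dice similarity coefficient: 2|A ∩ B| / (|A| + |B|)."""
    if not a and not b:
        return 1.0  # two empty masks are treated as identical
    return 2 * len(a & b) / (len(a) + len(b))


def jaccard(a, b):
    """Jaccard index: |A ∩ B| / |A ∪ B|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two non-empty pixel sets."""
    def directed(p, q):
        # Largest distance from a pixel in p to its nearest pixel in q.
        return max(min(math.dist(u, v) for v in q) for u in p)
    return max(directed(a, b), directed(b, a))
```

When used as a loss, the overlap measures are typically turned into a quantity to minimize, e.g., 1 - dice(a, b), whereas the Hausdorff distance is already zero for a perfect match.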

Preliminary Study: Using Deep Learning for Label Placement Evaluation

We created and assessed an evaluation framework for map labeling of high-density city wayfinding maps using deep learning (Fig. 5) (see Wei (2020) for details). The deep learning part was implemented using GoogLeNet (Szegedy et al. 2015) and trained on manually created map labeling examples. The map examples were 256 × 256 pixels in size (as required by GoogLeNet), at scale 1:2250, and tailored for the learning task (Fig. 6). All the training map samples were manually classified into three quality classes (good, moderate, and bad) based on the categories legibility, disturbance, and association. In total, 2400 map samples were used, 1500 for training and 900 for validation (with an equal number for all three quality classes). The trained network was then used to evaluate map samples where the map labeling had been automatically generated by QGIS (for details, see Cederholm 2020).
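The sample counts above imply 800 samples per quality class. A class-balanced split can be sketched as below, assuming that the balance also holds within each split (500 training and 300 validation samples per class); the sample identifiers are synthetic stand-ins, not the study's data.

```python
import random

CLASSES = ("good", "moderate", "bad")


def stratified_split(samples, n_train_per_class, seed=0):
    """Split (sample_id, quality_class) pairs so every class is equally represented."""
    rng = random.Random(seed)
    train, val = [], []
    for cls in CLASSES:
        members = [s for s in samples if s[1] == cls]
        rng.shuffle(members)
        train.extend(members[:n_train_per_class])
        val.extend(members[n_train_per_class:])
    return train, val


# Synthetic stand-ins for the 2400 map samples (800 per class).
samples = [(f"{cls}_{i}", cls) for cls in CLASSES for i in range(800)]
train, val = stratified_split(samples, n_train_per_class=500)
```

Fixing the random seed keeps the split reproducible, which matters when the same validation set is used to compare several training runs.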

The idea was that the trained network should be able to evaluate map samples with automated map labeling produced in QGIS. However, the framework was not able to perform an acceptable evaluation of the test map samples; instead, it classified all input images as poor quality. This initial test of map labeling evaluation using deep learning has at least four shortcomings:

1) The evaluation schema is too complex for the neural network to learn. The map samples contained several labels, and each label was manually classified according to the three categories (legibility, disturbance, and association). If only one of these labels was defined to be bad in one single category, the whole map sample was classified as “bad map labeling.”
2) The map samples were based on a single raster file. This implies that the neural network learned no information about which types of features were hidden by the labels. A solution would be to use several raster maps for a single map sample (as done in some other deep learning image applications), e.g., with one specific raster map that only contains the labels.
3) Due to hardware restrictions, only 800 iterations were performed in the training of the GoogLeNet network, which is likely too little training.
4) The sample size, i.e., the number of map examples, is likely too small for both training and validation.
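Shortcomings 1 and 2 can be made concrete with a short sketch. The first function applies a worst-case rule per map sample; the text only states the rule for the “bad” class, so extending it to “moderate” is an assumption. The second function pairs a base-map raster with a label-only raster per pixel, as one possible reading of the multi-raster suggestion; both function names are hypothetical.

```python
# Order of the quality classes from best to worst.
RANK = {"good": 0, "moderate": 1, "bad": 2}


def sample_class(label_ratings):
    """Class of a map sample = worst rating of any label in any category.

    label_ratings: one dict per label, mapping category -> rating.
    """
    worst = max(RANK[r] for label in label_ratings for r in label.values())
    return next(name for name, rank in RANK.items() if rank == worst)


def stack_layers(base, label_mask):
    """Pair two equally sized 2D rasters into per-pixel (base, label) channels."""
    return [
        [(b, m) for b, m in zip(base_row, mask_row)]
        for base_row, mask_row in zip(base, label_mask)
    ]
```

Under the worst-case rule, a sample with many well-placed labels and a single poorly placed one receives the same class as a sample where every label is poor, which may be part of why the schema is hard to learn.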