Thomas van Dijk

Junior-Professor, Ruhr-Universität Bochum tvdmaps@gmail.com

I am an applied algorithms researcher working on geographic information systems (GIS), eco-informatics, information visualisation, historical information, and algorithmically-guided user interaction.

Download full CV as PDF here . Fulltext author's copies of research papers available here .

Research Grants

Algorithmic Quality Assurance: Theory & Practice

DFG eigene Stelle, Principal Investigor (three years)

independently developed and wrote proposal (Di 2161/2-2) in Priority Programme 1894 “Volunteered Geographic Information.” This funds my current position in Würzburg.

The use of crowdsourcing provides a big opportunity for many sciences, such as geographic information science and the digital humanities. A particular application that has seen a big boom of crowd projects is the extraction of information from historical documents, since this involves many problems that continue to elude fully-automatic solutions. Numerous projects have sprung up, some with notable success (such as GB1900, Transkribus and Maps-in-the-Crowd). However, the use of volunteered information comes with major concerns about data quality. In this project we investigate novel methods for quality assessment in crowdsourcing projects: not just checking the quality at the end, but viewing the entire crowdsourcing process as a computation where we need to be careful about data quality throughout. In doing so, we open up a range of mathematical and algorithmic techniques, including Bayesian modelling, optimal task selection based on active learning, and sensitivity analysis.
Our prior experience indicates that this is not only of theoretical interest, but will indeed result in practically useful systems that outperform ad-hoc approaches. We take historical material as the main source of our experimental material, as well as considering more modern data. This serves two purposes. Firstly, this brings us closer to the other projects in the priority programme. Secondly, the methodology we have developed so far (and will continue to develop) is generally applicable: we will validate this by applying it more broadly. Fruitful applications include image recognition tasks (such as those encountered in the ENAP and COVMAP projects) and information extraction from microblogs (EVA-VGI). Through this project, we will arrive at a more refined understanding of how the power of crowdsourcing can be harnessed - by paying specific attention to the quality of data as it moves through a project, and designing algorithmically-supported processes to guide it.
Jan. 2020 – Jan. 2023

Algorithmically-guided User Interaction: Smart Crowdsourcing and the Extraction of Metadata from Old Maps

DFG eigene Stelle, Principal Investigor (3 years)

Developed proposal; co-wrote with Prof. Alexander Wolff. (Di 2161/2-1) in Priority Programme 1894 “Volunteered Geographic Information.” This funded my own position in Würzburg.

There are many practical problems that currently cannot be solved by algorithmic means, not because our algorithms are too slow but because we have no satisfactory algorithm at all. An example for such a practical problem is the extraction of meaningful metadata from historical maps. The problem has so far defied automation, which is very unfortunate because old maps contain lots of useful information, e.g., for historians, economists, and city planners. Libraries are producing massive amounts of digitised maps by scanning their historical collections, but struggle to “understand” the contents of these collections: computer-vision algorithms are not up to the task of extracting the desired metadata and there are too many maps to do it by hand. We will attack these problems by developing smart crowdsourcing solutions. With our collaborators (such as three libraries and the Open Historical Data Map project) we will attack real problems on real data.
Most crowdsourcing applications rely on the crowd’s most obvious power: its multitude. If we throw enough users at a problem, we may solve it by brute force. But just like brute-force algorithms are often grossly inefficient, so is the indiscriminate application of human effort. In the proposed project we will develop algorithmically-guided, adaptive user interaction where the algorithm and (sets of) users cooperate to efficiently and reliably solve a problem instance. This is achieved through proper algorithmic modeling of the underlying computational problems, followed by techniques such as active learning (efficient task selection) and sensitivity analysis (quality assurance).
Through this project, we will arrive at a more refined understanding of how the power of crowdsourcing can be harnessed – for generating general geographic data, as well as specifically for metadata extraction from historical maps. What are the correct notions of efficiency and what are the algorithmic techniques to optimise it?
Fall 2016 – Jan. 2020

Extracting Spatial Information from Historical Maps: Algorithms & Interaction

Promotionsstipendium der Studienstiftung des deutschen Volkes (3 years)

Developed and co-wrote Benedikt Budig’s proposal for his PhD student grapht; this funded him for three years as my PhD student.

Historical maps are fascinating documents and a valuable source of information for scientists of various disciplines. Many of these maps are available as scanned bitmap images, but in order to make them searchable in useful ways, a structured representation of the contained information is desirable.
This book deals with the extraction of spatial information from historical maps. This cannot be expected to be solved fully automatically (since it involves difficult semantics), but is also too tedious to be done manually at scale.
The methodology used in this book combines the strengths of both computers and humans: it describes efficient algorithms to largely automate information extraction tasks and pairs these algorithms with smart user interactions to handle what is not understood by the algorithm. The effectiveness of this approach is shown for various kinds of spatial documents from the 16th to the early 20th century.
Nov. 2014 – Nov. 2017

Employment

Ruhr-Universität Bochum

Junior-Professor Eco-Informatics

Department of Civil and Environmental Engineering.

Aug. 2021 - present

Universität Würzburg

Independent Postdoc

Independent postdoc at Universität Würzburg at Prof. Alexander Wolff’s Lehrstuhl für Informatik I: Algorithmen und Komplexität, working on algorithms for geographic information systems (GIS), information visualisation, historical information, and algorithmically-guided user interaction.

Funded by my own grants since Fall 2016.

Jan. 2016 - Jul. 2021

Universität Würzburg

Postdoc "Algorithms for Interactive Variable-Scale Maps"

DFG project, principal investigator: Jan-Henrik Haunert.

Jan. 2012 - Jan. 2016

Universiteit Utrecht

PhD project "Wireless Sensor Networks: Structure & Algorithms"

Chair for Algorithmic Systems. Advisor: Jan van Leeuwen.

Full introduction: PDF

(...)
In dealing with these topics, we will have touched on many aspects of wireless sensor networks, their structure and the related algorithms. Before the thesis proper begins, let us briefly draw attention to two divisions that it contains. The first is readily apparent from the table of contents: a Part I about physical models and a Part II about graphical models. This is a technical divide and merely a result of organising the material.
The second divide is methodological in nature and a divide that, where possible, we have tried to bridge. On the one hand, we have theoretical, worst-case results about the runtime of algorithms and the structural properties of wireless networks. On the other hand, we have experimental results based on implementations of our algorithms and on simulations. These two approaches are sometimes seen as being at odds with each other—indeed, as the saying goes: “In theory, theory and practice are the same. In practice, they are not.” In recognising this difference, we find that in fact theoretical and experimental research complement each other to give a more comprehensive view of the subject.
As an example in this thesis, we can point to Chapters 4 and 5 about the Link independent set problem. Even though Chapter 4 is mainly concerned with the formal correctness and worst-case runtime of an algorithm, the design of the algorithm was very much informed by experiments in the early stages of the research. Then, in Chapter 5 we complement the worst-case result with runtime measurements on an actual implementation of the algorithm. With this implementation we observe some structural properties of wireless networks and, moving to theory again, continue to prove some of the observed behaviour. As another example, the concept of a regular schedule (Chapter 3) and the corresponding theorems were inspired by the experimental results in Table 3.2. In these ways, there is a back and forth between theoretical and experimental results.
It is important to not be satisfied too easily with experimental appearances: models and theories are what computer science is built on, for good reason. Yet a theorem might not tell the whole story, and pure theory might not be how we arrive at a result.
Sept. 2007 - Sept. 2011

Education

Universiteit Utrecht

MSc Applied Computing Science (with honours)

Thesis: Fixed Parameter Tractability of Feedback Set Problems

One of the classic NP-complete problems is Feedback Vertex Set: in an undirected graph, remove the least possible number of vertices such that the resulting graph is acyclic. In the theory of fixed parameter complexity, a 'kernelization' is a preprocessing algorithm with a proven quality guarantee; this thesis extendeds work by Bodlaender, who has shown that the Feedback Vertex Set problem, parameterized on the size of the solution, has a kernelization to cubic size.
We have extended this kernelization algorithm to work on Loop Cutset (a problem related to Feedback Vertex Set, arising in inference in probabilistic networks) and provide additional analysis of the algorithm.
Furthermore, we have implemented these kernelization algorithms and experimentally show that they perform well in practice, both regarding their runtime and their effect on the size of the instance. We also show that the kernelization significantly speeds up the calculation of an exact feedback vertex set/loop cutset, i.e.: it is an effective preprocessing.
2005 – 2007

Universiteit Utrecht

BSc Computer Science (with honours)

Minor: Software Engineering

2002 – 2005

Cals College, Nieuwegein

Gymnasium

Profile: Nature & Technology, with Latin and Computer Science.

1996 – 2002

Software

Information Visualisation
fast-linear-carto

Realtime spatially-informative linear cartograms using least squares adjustment. Also useful for drawing metro maps in realtime.

C++

armstrong

Topologically safe rounding of geographic networks to a grid using two-stage simulated annealing.

C++

story-bc

Optimal block crossings in Storyline Visualisation: efficient fixed parameter tractable algorithm and a SAT-solver based solution.

C++, Python

Historical and crowdsourced data analysis
glyph-miner

Fully production-ready system for rapidly extracting glyph occurrences from early type-set prints. Interactive user interface based on active learning.

Python, Javascript, C++

jigglr

Interactive local search to improve crowdsourced building footprints from historical maps.

C#

lineman

Tool for aligning GeoJSON LineStrings to bitmap images such as old maps; optimal alignment using an hidden Markov models.

C++

building-sleuth

Smart crowdsourcing workflow system for extracting building footprints: smart user interface for data entry and algorithmic data integration.

Python, Javascript

Graph Algorithms
wupstream

Very fast calculation of upstream features in utility networks. Developed for the ACM SIGSPATIAL GIS Cup 2018. Second place of 15 at SIGSPATIAL Cup 2018.

C++

fvsk

Kernelisation, approximation and exact algorithms for Feedback Vertex Set and Loop Cutset.

C++

Teaching & Utility
frechet-demo

Interactive visualisation of Fréchet-distance parameter space.

Javascript

Link
mini-ipe

An easy, no-dependencies package for writing IPE files from Python. Ipe is an “extensible drawing editor” that is excellent for making diagrams for scientific papers and presentations. This makes its file format ideal as output from computational experiments.

Get it using python -m pip install miniipe .

haelight

Syntax highlighting for C++ code in Adobe After Effects text layers. Useful for making lecture videos involving code.

Javascript

azimut-tool

Website for determining the direction (azimuth) of rooftops from open digital orthophoto imagery. This is relevant for estimating the yield of solar panels.

Javascript

Link

Awards

  • 2nd place out of 15 submissions, ACM SIGSPATIAL Cup 2018 . For "Wüpstream: efficient enumeration of upstream features (GIS cup)," with T. Greiner, B. den Heijer, N. Henning, F. Klesen, and A. Löffler.
  • Best Fast-forward Presentation at ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems 2018. For "Realtime linear cartograms and metro maps," with D. Lutz.
  • Shortlisted for membership of the Young Academy of the Bavarian Academy of Sciences, class of 2017.
  • Included on the ACM Computing Review’s "Best of Computing" list of notable publications in 2016. For "Matching Labels and Markers in Historical Maps: An Algorithm with Interactive Postprocessing," with B. Budig and A. Wolff.
  • Best Paper award (Theory track) , at the International Conference on Graph Drawing and Network Visualization (GD) 2016. For "Block Crossings in Storyline Visualizations," with M. Fink, N. Fischer, F. Lipp, P. Markfelder, A. Ravsky, S. Suri, A. Wolff.
  • Best Applied Paper award , at the International Conference on Discovery Science (DS) 2015. For "Active Learning for Classifying Template Matches in Historical Maps," with B. Budig.
  • Runner-up Best Poster award at ACM SIGSPATIAL 2015. For "There and Back Again: Using Fréchet-Distance Diagrams to Find Trajectory Turning Points," with L. Beckmann, B. Budig and J. Schamel.
  • Best student contribution award at Schematic Mapping 2014, sponsored by the British Cartographic Society. For "An Automated Method for Circular-Arc Metro Maps," with A. van Goethem and W. Meulemans.
  • Best Fast-forward Presentation & Runner-up Best Poster awards at ACM SIGSPATIAL 2013. For "Accentuating Focus Maps via Partial Schematization," with A. van Goethem, J.-H. Haunert, W. Meulemans and B. Speckmann.
  • Best Short Presentation at Web and Wireless GIS 2013 (W2GIS). For "A Probabilistic Model for Road Selection in Mobile Maps," with J.-H. Haunert.
  • Honourable mentions at the International Collegiate Programming Contest (ICPC) , North-West European regional finals, 2005 & 2006, Stockholm.