
Multi-processing MapPoint with Python


Over the past year or so, there have been a number of queries concerning the use of Microsoft® MapPoint® in a multi-threaded environment. Some ask about sharing a common MapPoint instance across multiple threads; others attempt to improve processing speeds by using multiple threads and MapPoint instances. There are many problems with using MapPoint in a multi-threaded environment. This article describes the limitations and gives a simple demonstration of successful multi-processing using a Python script. General multi-threading techniques are beyond the scope of this article; a good guide to your language of choice will cover the basics for that language. Specialist books (e.g. Herlihy & Shavit’s The Art of Multiprocessor Programming) provide much more detailed coverage of this complex subject.

Multi-Threading and Microsoft MapPoint

Later versions of Microsoft MapPoint use internal multi-threading and can take advantage of modern multi-core computers to speed up CPU-intensive tasks such as route finding. This capability is not perfect: experience has shown that MapPoint 2010 uses approximately one and a half cores when route finding. These threads are completely internal to MapPoint and are hidden from both the user and the MapPoint API.

The MapPoint API itself is not thread-safe. It must always be called from the same thread that created the MapPoint application instance. This is the biggest restriction when attempting to use MapPoint from a multi-threaded application: ALL MapPoint communication must go through the same ‘controller’ thread. Calls to a single shared MapPoint instance or MapPoint control from multiple threads will fail. Instead, a single controller thread must be created, and it must perform all communication with the MapPoint instance (control or application).

Typically, when I create an application that uses multi-threading to provide a responsive user interface, I use the main application thread for all MapPoint processing and start a new thread to handle the user interface.
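The single ‘controller’ thread can be sketched with Python’s standard threading and queue modules. This is a minimal sketch, not the author’s code: the ComController class and its app_factory hook are hypothetical names, and the test-friendly design lets it run with any Python object. With pywin32, the factory would call pythoncom.CoInitialize() on the controller thread and then return win32com.client.Dispatch("MapPoint.Application"). Other threads (such as a user-interface thread) never touch the object directly; they submit small functions and wait for the results.

```python
import queue
import threading


class ComController(threading.Thread):
    """A single 'controller' thread that owns one COM-style object.

    Every call on the object runs on this thread. app_factory is a
    hypothetical hook; with pywin32 it would call pythoncom.CoInitialize()
    and return win32com.client.Dispatch("MapPoint.Application").
    """

    def __init__(self, app_factory):
        super().__init__(daemon=True)
        self._app_factory = app_factory
        self._requests = queue.Queue()

    def run(self):
        # Create the object on this thread, so only this thread ever uses it
        try:
            app, startup_error = self._app_factory(), None
        except Exception as exc:
            app, startup_error = None, exc
        while True:
            item = self._requests.get()
            if item is None:  # shutdown sentinel sent by stop()
                break
            func, reply = item
            if startup_error is not None:
                reply.put((False, startup_error))
                continue
            try:
                reply.put((True, func(app)))
            except Exception as exc:  # hand errors back to the calling thread
                reply.put((False, exc))
        # A real MapPoint controller would call app.Quit() and
        # pythoncom.CoUninitialize() here, still on this thread

    def call(self, func):
        """Run func(app) on the controller thread; safe to call from any thread."""
        reply = queue.Queue(maxsize=1)
        self._requests.put((func, reply))
        ok, value = reply.get()
        if not ok:
            raise value
        return value

    def stop(self):
        """Ask the controller thread to finish, then wait for it."""
        self._requests.put(None)
        self.join()
```

Any number of threads may call `call()` concurrently; the request queue serialises their work onto the one thread that created the object, which is exactly the restriction the MapPoint API imposes.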

Using multiple MapPoint instances for faster processing

Although MapPoint processing cannot be made faster by making calls from different threads, it is possible to use multi-processing to improve the speed of certain processing applications. This is done by creating multiple processing threads, each with its own independent instance of MapPoint. This technique can only be used where the processing is “embarrassingly parallel”, i.e. where it can easily be broken into many independent chunks, for example batch geocoding or batch route calculations.
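Breaking an embarrassingly parallel job into independent chunks is the first step. A minimal sketch, using a hypothetical split_into_chunks() helper, that divides a list of work items (addresses to geocode, routes to calculate) into near-equal, order-preserving chunks, one per worker process:

```python
def split_into_chunks(items, n_chunks):
    """Split items into n_chunks lists of near-equal size, preserving order."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    size, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks each take one additional item
        end = start + size + (1 if i < extra else 0)
        chunks.append(list(items[start:end]))
        start = end
    return chunks
```

With ten routes and two workers this yields two lists of five routes each. Because no chunk depends on another, each can be handed to a separate process with its own MapPoint instance.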

I shall demonstrate how to do this using Python 3; the code should also be portable to Python 2.x. The sample calculates the driving distance for ten routes using two processing threads. It will appear only marginally faster on a dual-core processor (a single MapPoint instance can already use about one and a half cores), but should run almost twice as fast as a single-threaded program on a quad-core processor. It is assumed that you are familiar with Python and with using it through MapPoint’s COM interface. If not, the article Using Python to Control MapPoint is recommended.

There are two approaches to multi-processing in Python. The conventional approach, using the threading module, does not produce true multi-processing: the standard (CPython) interpreter uses an internal global interpreter lock (GIL), which means it executes Python code in only one thread at a time. This is usually satisfactory when using multi-threading to handle I/O requests, but it will not work if the intention is to execute a CPU-intensive task quickly. Therefore the second approach, the multiprocessing module, should be used instead: it runs each task in a separate process with its own interpreter, so the tasks can run on different cores.
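The effect of the global interpreter lock can be seen with a small CPU-bound benchmark built on the standard concurrent.futures module. This sketch (busy_sum() and timed() are illustrative names) runs the same pure-Python loop on two threads and then on two processes. On a multi-core machine running a standard CPython build, the thread version typically takes about as long as doing the work sequentially, while the process version should finish noticeably sooner; exact timings depend on the machine.

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def busy_sum(n):
    """Pure-Python, CPU-bound loop: it holds the GIL while it runs."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def timed(executor_cls, jobs, n):
    """Run `jobs` copies of busy_sum(n) on an executor; return (seconds, results)."""
    start = time.perf_counter()
    with executor_cls(max_workers=jobs) as pool:
        results = list(pool.map(busy_sum, [n] * jobs))
    return time.perf_counter() - start, results


if __name__ == "__main__":
    # The guard is required on Windows, where worker processes re-import this file
    for executor_cls in (ThreadPoolExecutor, ProcessPoolExecutor):
        seconds, _ = timed(executor_cls, jobs=2, n=3_000_000)
        print(f"{executor_cls.__name__}: {seconds:.2f} s")
```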

Due to process restrictions in Windows, you will probably need to run a multiprocessing script from the command line (by passing the script file to the Python interpreter) rather than from an environment such as PythonWin.
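The restriction comes from how multiprocessing starts child processes on Windows: it uses the "spawn" start method, so each child launches a fresh interpreter and re-imports the main script. The multiprocessing documentation notes that the `__main__` module must be importable by the children, so the script has to live in a file, and the code that starts processes must sit behind an `if __name__ == "__main__":` guard. The minimal skeleton below (worker() and main() are illustrative names) sketches that general structure; it is not the article’s script.

```python
import multiprocessing
import os


def worker(label):
    # A module-level function: a spawned child finds it by re-importing this file
    print(f"worker {label} running in process {os.getpid()}")


def main():
    procs = [multiprocessing.Process(target=worker, args=(i,)) for i in range(2)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return [p.exitcode for p in procs]


if __name__ == "__main__":
    # Without this guard, each spawned child would try to start new processes
    # of its own while re-importing the script
    print(main())
```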

So let’s start coding the script! The script starts very much like any other MapPoint Python script, but we also import the multiprocessing and time modules:
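A minimal sketch of the opening imports, assuming the pywin32 package (which provides win32com.client) is installed. The try/except around the COM import is an addition so that this header can also be loaded on a machine without pywin32; a Windows-only script can simply use `import win32com.client`.

```python
import multiprocessing  # worker processes, each with its own MapPoint instance
import time             # lets the parent process sleep while it waits

try:
    import win32com.client  # pywin32: COM automation of MapPoint.Application
except ImportError:
    win32com = None  # pywin32 not installed (e.g. not running on Windows)
```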

Next we shall define the work to perform. The work is split into two lists of routes, one per thread. Each route has a name and two coordinates (its start and end points). A production script would instead load the data from a file or database and split it into two chunks.
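A sketch of the two route lists. The routes and coordinates are hypothetical sample data (approximate city-centre latitudes and longitudes in decimal degrees), and the (name, start, end) tuple layout is an assumption about how each route might be stored.

```python
# Hypothetical sample work: ten routes split into two lists, one per worker.
# Each route is (name, (start_lat, start_lon), (end_lat, end_lon)).
ROUTES_A = [
    ("Seattle - Portland", (47.6062, -122.3321), (45.5152, -122.6784)),
    ("San Francisco - Los Angeles", (37.7749, -122.4194), (34.0522, -118.2437)),
    ("Las Vegas - Phoenix", (36.1699, -115.1398), (33.4484, -112.0740)),
    ("Denver - Salt Lake City", (39.7392, -104.9903), (40.7608, -111.8910)),
    ("Dallas - Houston", (32.7767, -96.7970), (29.7604, -95.3698)),
]

ROUTES_B = [
    ("Chicago - Detroit", (41.8781, -87.6298), (42.3314, -83.0458)),
    ("New York - Boston", (40.7128, -74.0060), (42.3601, -71.0589)),
    ("Philadelphia - Washington", (39.9526, -75.1652), (38.9072, -77.0369)),
    ("Atlanta - Miami", (33.7490, -84.3880), (25.7617, -80.1918)),
    ("Minneapolis - Milwaukee", (44.9778, -93.2650), (43.0389, -87.9065)),
]
```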

Next we define the code for our processing thread. This class inherits from multiprocessing.Process and overrides its __init__() and run() methods. The initializer is passed a list of routes and stores a reference to it; because it overrides the constructor, it must also call the base class initializer, Process.__init__(), first.

The run() method contains the code that the processing thread actually executes, and it is very simple: it creates a hidden MapPoint application instance, loops over the routes calculating each one, and then closes the MapPoint object. The distance results are written to the terminal using print().

Here is the thread definition:
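A minimal sketch of such a process class, assuming pywin32 for COM access and routes stored as (name, (lat, lon), (lat, lon)) tuples; the RouteProcess name is hypothetical. It uses MapPoint’s Map.GetLocation(), Route.Waypoints.Add(), Route.Calculate() and Route.Distance, and the distance is reported in whatever units the MapPoint application is set to use. The COM import happens inside run(), so each child process creates and talks to its own, private MapPoint instance.

```python
import multiprocessing


class RouteProcess(multiprocessing.Process):
    """Worker process that owns a private, hidden MapPoint instance."""

    def __init__(self, routes):
        # Process subclasses must call the base initializer first
        super().__init__()
        # This worker's routes: a list of (name, (lat, lon), (lat, lon)) tuples
        self.routes = routes

    def run(self):
        # Import COM support in the child: this process only ever talks to
        # the MapPoint instance it creates below
        import win32com.client  # pywin32, Windows only

        app = win32com.client.Dispatch("MapPoint.Application")
        app.Visible = False
        app.UserControl = False
        try:
            mp_map = app.ActiveMap
            route = mp_map.ActiveRoute
            for name, (lat1, lon1), (lat2, lon2) in self.routes:
                route.Clear()
                route.Waypoints.Add(mp_map.GetLocation(lat1, lon1))
                route.Waypoints.Add(mp_map.GetLocation(lat2, lon2))
                try:
                    route.Calculate()
                except Exception as exc:  # e.g. no road route between the points
                    print(f"{name}: could not calculate route ({exc})")
                    continue
                # Distance is in the application's current units (miles or km)
                print(f"{name}: {route.Distance:.1f}")
        finally:
            # Mark the map as saved so quitting does not prompt to save it
            app.ActiveMap.Saved = True
            app.Quit()
```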

Now we just need the ‘main’ function which starts the processes. Here it is:
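A minimal sketch of that main function. To keep it self-contained, the worker class is passed in as a parameter (worker_cls is a hypothetical parameter name); in the real script it would be the MapPoint process class, and main() would be called with the two route lists from under an `if __name__ == "__main__":` guard.

```python
import time


def main(worker_cls, routes_a, routes_b, poll_interval=0.5):
    """Start one worker per route list and wait until both have finished."""
    # One independent process (and MapPoint instance) per route list
    p = worker_cls(routes_a)
    pp = worker_cls(routes_b)
    p.start()
    pp.start()
    # Poll until both children have exited; the sleep keeps this parent
    # process from burning CPU cycles while it waits
    while p.is_alive() or pp.is_alive():
        time.sleep(poll_interval)
    return p.exitcode, pp.exitcode
```

Polling is_alive() with a short sleep mirrors the approach described here; calling p.join() followed by pp.join() would wait for the same condition without any polling.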

This code starts two process threads (p and pp), one for each route list. Once they have started, a loop runs until both have ended. Note the time.sleep() call, which stops this parent process from using excessive CPU cycles while it waits for the processing threads to finish.

And that is it! Note that both processing threads are completely independent and do not share MapPoint objects with each other.

Occasionally, ‘independent’ MapPoint instances can interfere with each other. For example, two MapPoint instances cannot open the same map (.ptm) file at the same time, which could happen if you tried to initialize two MapPoint instances with the same data. To prevent this, create a lock to control the open process, and make each processing thread acquire the lock before opening the map file. Once the map file has been loaded, the lock can be released so that the next thread can load the file. It is perfectly fine for multiple MapPoint instances to display the same map file at the same time.
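A minimal sketch of that locking scheme, assuming pywin32 and MapPoint’s Application.OpenMap() method. The MapFileProcess class, its process_map() hook and the start_workers() helper are hypothetical names. A single multiprocessing.Lock is created in the parent and passed to every worker, and each worker holds it only while opening the file.

```python
import multiprocessing


class MapFileProcess(multiprocessing.Process):
    """Worker that opens a shared map file, serialising the open with a lock."""

    def __init__(self, map_path, open_lock):
        super().__init__()
        self.map_path = map_path    # hypothetical path to a shared .ptm file
        self.open_lock = open_lock  # one multiprocessing.Lock shared by all workers

    def run(self):
        import win32com.client  # pywin32, Windows only

        app = win32com.client.Dispatch("MapPoint.Application")
        app.UserControl = False
        try:
            # Only one process at a time may be inside this block, so two
            # MapPoint instances never open the same file simultaneously
            with self.open_lock:
                mp_map = app.OpenMap(self.map_path)
            # Lock released: the next worker can now open the file while
            # this one carries on with its own loaded copy of the map
            self.process_map(mp_map)
            mp_map.Saved = True
        finally:
            app.Quit()

    def process_map(self, mp_map):
        """Hypothetical hook for the per-worker work (geocoding, routing)."""
        pass


def start_workers(map_path, count=2):
    # Create the lock in the parent and hand the same lock to every worker
    lock = multiprocessing.Lock()
    workers = [MapFileProcess(map_path, lock) for _ in range(count)]
    for w in workers:
        w.start()
    return workers
```

start_workers() should be called from under the `if __name__ == "__main__":` guard, and the returned workers joined once their work is done. Holding the lock only around OpenMap() keeps the serialised section short, so the workers still run in parallel afterwards.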

So, as you can see, multi-processor MapPoint programming is actually quite simple, as long as all communication with a particular MapPoint instance goes through a single thread.
