Plotting Place Names from Natural Text in Python 3

This example uses Maptitude’s new Python 3 interface to draw annotation on a map. The annotation consists of place names mentioned in H.G. Wells’ War of the Worlds. The example also serves as a basic demonstration of using NLTK (Natural Language Toolkit) to identify named entities (proper nouns) in the book text. Annotation is used here purely for demonstration. It could be argued that the data should instead be created as a point layer, which would allow further processing.

The following code was written using the Spyder environment and Python 3.5. Maptitude’s Python 3 interface is currently in beta testing and will become a standard part of Maptitude 2017.

NLTK 3.0 was also used. NLTK is a powerful toolkit of Python natural language tools intended for education. Personally, I have also found it useful for prototyping natural language processing flows. Further information can be found on the official website at www.nltk.org.

Here is the code:

The script requires an existing Maptitude UK map (wow.map) and the War of the Worlds text (war_of_the_worlds.txt). The text can be downloaded from the Gutenberg project.

Most of the code should be self-explanatory from the comments. Because the interface is still in beta testing, there is currently an issue with the casting of compound objects returned by Maptitude. This is worked around by using Dispatch() to cast the Maptitude scope object into a usable form (line 112).

The text is processed by splitting it into sentences and then into words. The words are tagged with ‘part of speech’ tags (e.g. noun, verb, etc.) and then chunked into named entities (i.e. proper noun phrases). These are extracted and passed individually to Maptitude for geocoding. The geocoding first attempts to find the proper noun as a landmark within London. If it is not found there, the geocoding then tries to find it as a town or city in the UK.

Some named entities are filtered out. These include people’s names such as Ogilvy (“the chances of anything coming from Mars are a million to one”), international locations (Moscow, France), and other false positives (e.g. chapter numbers). For simplicity, all of these natural language steps use existing NLTK models. For a more robust solution, many of these false positives could be removed with better, larger training data.
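The text-processing and filtering steps described above can be sketched roughly as follows. This is a minimal illustration, not the original script: the function names, the stop list, and the rule of dropping `PERSON` entities are assumptions made for this example. It uses NLTK's standard `sent_tokenize`, `word_tokenize`, `pos_tag` and `ne_chunk` functions, which need their model data downloaded first with `nltk.download()` (the exact resource names vary between NLTK versions).

```python
# Hypothetical stop list (an assumption; the original script's filter is not shown).
STOP_LIST = {"moscow", "france", "chapter"}


def extract_named_entities(chunked_sentence):
    """Collect (phrase, label) pairs from one chunked sentence.

    In the tree returned by nltk.ne_chunk(), named entities are subtrees
    with a label(), while ordinary tokens are plain (word, tag) tuples.
    """
    entities = []
    for node in chunked_sentence:
        if hasattr(node, "label"):
            phrase = " ".join(word for word, _tag in node.leaves())
            entities.append((phrase, node.label()))
    return entities


def keep_entity(phrase, label, stop_list=STOP_LIST):
    """Assumed filter: drop people's names and stop-listed phrases."""
    if label == "PERSON":
        return False
    return phrase.lower() not in stop_list


def candidate_places(text):
    """Return the distinct candidate place names found in the text."""
    import nltk  # imported here so the helpers above work without NLTK

    places = []
    for sentence in nltk.sent_tokenize(text):
        tagged = nltk.pos_tag(nltk.word_tokenize(sentence))
        chunked = nltk.ne_chunk(tagged)
        for phrase, label in extract_named_entities(chunked):
            if keep_entity(phrase, label) and phrase not in places:
                places.append(phrase)
    return places
```

With the NLTK data installed, `candidate_places(open("war_of_the_worlds.txt").read())` would return the list of names to hand to the geocoder. A stop list is the simplest filter to maintain by hand, but it only catches the false positives someone has already noticed.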

Here is the resulting UK map:

The results on a UK map. Most locations are accurate, but a few are misplaced. For example, the Strand and Hyde Park are geocoded to locations with these names, but the book actually refers to locations in London.

The script does a pretty good job overall, although some locations are obviously wrong. For example, the Strand (a street) and Hyde Park (a well-known park) were not found as landmarks within London, so the town/city geocoding step placed them elsewhere in the UK.
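These misplacements follow directly from the two-stage lookup. The sketch below shows the fallback logic in isolation. It is only an illustration: `geocode_place` and the two lookup callables are hypothetical stand-ins, not Maptitude's actual geocoding calls, and the coordinates are made-up placeholders.

```python
def geocode_place(name, find_london_landmark, find_uk_town):
    """Try the London landmark lookup first, then the UK town/city lookup.

    Both lookups are hypothetical callables that take a place name and
    return coordinates, or None when nothing matches. The second element
    of the result records which stage produced the match.
    """
    coords = find_london_landmark(name)
    if coords is not None:
        return coords, "landmark"
    coords = find_uk_town(name)
    if coords is not None:
        return coords, "town"
    return None


if __name__ == "__main__":
    # Made-up lookup tables with placeholder coordinates, for illustration only.
    landmarks = {"Example Landmark": (0.0, 0.0)}
    towns = {"Woking": (1.0, 1.0), "Strand": (2.0, 2.0)}

    # "Strand" is missing from the landmark table, so it falls through to
    # the town table and is placed wherever a town of that name lies.
    print(geocode_place("Strand", landmarks.get, towns.get))
```

Returning the matching stage alongside the coordinates makes it easy to flag town-level matches for manual review, since these are the ones most likely to be London streets or parks that the landmark search missed.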

Here is the map zoomed into the London area:

Zoomed into the London area. Note that most of the locations are in central London and to the south-west, as expected from the plot.

Zooming in, we see even better locations. Notice that Horsell is located, but not Horsell Common (where the first cylinder lands). The plot starts in the Horsell area, and generally moves towards central London – as seen in this map.

This example demonstrates the use of Maptitude with Python 3. It also demonstrates the powerful applications that can be created using third-party libraries such as NLTK.
