Geospatial Diagrams

With the advent of smart devices like phones, wearables and IoT devices, collection of geospatial data seems to be the norm rather than an exception.

Geospatial data is typically collected as the longitude (lon) and latitude (lat) of a point on Earth. If the coordinates come from GPS, the lat/lon accuracy is usually high. If they are inferred from the IP address of a cable or cellular network provider, the accuracy is much lower. Keep the quality and source of the geospatial data in mind when drawing analytic inferences from it.

There are a plethora of libraries which help you analyze and display geospatial information. In this lesson you will learn a few techniques:

Seaborn LM plot

An LM plot shows the lat/lon values on a two-dimensional x/y plane, which is not a true geographic representation. For small areas, however, the distortion barely matters and an LM plot does the job just fine. Here is an example that plots all the demolition sites of the 'Able Demolition' group; although it is drawn on a flat plane, it still looks like a map of Detroit City. Check it out:


import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

detroit_demolitions = pd.read_csv('https://storage.googleapis.com/mbcc/datasets/Detroit_Demolitions_withColumns.csv', parse_dates=[4])


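# Keep only the sites handled by 'Able Demolition'
able_sites = detroit_demolitions[detroit_demolitions['Contractor Name'] == 'Able Demolition']

# fit_reg=False turns off the regression line so that only the points are drawn
sns.lmplot(x='Longitude', y='Latitude', data=able_sites, fit_reg=False)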

plt.title("Demolition sites of Able Demolition Group in Detroit City")
plt.show()

  • Detroit city map from Google Maps

As the plot above shows, the 'Able Demolition' company has held demolition contracts at so many locations in Detroit City that the plot of its demolition sites almost traces the city map.
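The remark that a flat lat/lon plot is not a true geographic representation can be made concrete. The sketch below is not part of the original lesson: it assumes a spherical Earth with a mean radius of 6371 km, and the helper name km_per_degree is made up for illustration. It estimates how many kilometers one degree of latitude and one degree of longitude cover near Detroit.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: mean Earth radius, spherical approximation


def km_per_degree(lat_deg):
    """Return (km per degree of latitude, km per degree of longitude) at a latitude."""
    km_per_deg_lat = math.pi * EARTH_RADIUS_KM / 180.0
    # Meridians converge toward the poles, so a degree of longitude
    # shrinks by the cosine of the latitude.
    km_per_deg_lon = km_per_deg_lat * math.cos(math.radians(lat_deg))
    return km_per_deg_lat, km_per_deg_lon


lat_km, lon_km = km_per_degree(42.33)  # approximate latitude of Detroit
print(f"1 degree of latitude  ~ {lat_km:.1f} km")
print(f"1 degree of longitude ~ {lon_km:.1f} km")
print(f"latitude/longitude stretch factor: {lat_km / lon_km:.2f}")
```

Because the two axes cover different distances per degree, a plain x/y plot stretches the map horizontally. Over an area the size of a city the stretch is nearly constant, which is why the shape is still recognizable.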

Using Basemap

You can also create cool-looking geomaps using Basemap. The one drawback is that Basemap is not part of Anaconda, so you have to install the necessary packages before using it.

Here are the instructions to install Basemap if you are writing your notebook on https://colab.research.google.com

Note: The instructions below are only for Colab and do not work on your computer.


!pip install basemap

Once it is installed, you are ready to start using Basemap to plot geomaps. Here is an example of plotting earthquake locations using Basemap:


import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap

df = pd.read_csv('http://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_week.csv')
print(df.head(1))


plt.figure(figsize=(16, 14))

# projection value 'robin' will get you the map in Robinson projection. 
# Note: resolution value is lower case L and not 1
m = Basemap(projection='robin', lon_0=-90, resolution = 'l', area_thresh = 1000.0)
m.drawcoastlines()
m.drawcountries()
m.fillcontinents(color='green')
m.drawmapboundary(fill_color='grey')

# draw latitude and longitude lines
# The arguments for np.arange() specifies where the 
# latitude and longitude lines should begin and end, 
# and the spacing between them.
m.drawparallels(np.arange(-90.,99.,30.))
junk = m.drawmeridians(np.arange(-180.,180.,60.))

# Mark points by sending in lat/lon values
x,y = m(df['longitude'].values, df['latitude'].values)

# Plot the points by setting the marker color to yellow
m.plot(x, y, 'yo', markersize=4)
plt.title('Earthquake Locations')
plt.show()


Most of the Basemap settings are self-explanatory. The documentation at https://matplotlib.org/basemap/api/basemap_api.html details all of them. For example, 'lon_0' sets the center of the map domain in degrees of longitude, and 'lat_0' does the same for latitude.

Zooming in

You can zoom into the California region to take a closer look. Either set the width and height of the map, in meters, around the center lat/lon, or set the lat/lon of the lower left corner (llc) and the upper right corner (urc) of the area of interest. The example below sets the width and height: the width is 1 x 1,000,000 = 1E6 meters (1,000 km) and the height is 1.2 x 1,000,000 = 1.2E6 meters (1,200 km).


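fig = plt.figure(figsize=(8, 8))

# 'lcc' is the Lambert Conformal projection, centered on lat_0/lon_0.
# Note: resolution='h' (high) may need extra boundary data to be installed
# (assumption: the basemap-data-hires package when Basemap comes from pip).
m = Basemap(projection='lcc', resolution='h',
            lat_0=37.5, lon_0=-119,
            width=1E6, height=1.2E6)
m.shadedrelief()
m.drawcoastlines(color='gray')
m.drawcountries(color='gray')
m.drawstates(color='gray')

# Mark points by sending in lat/lon values
x, y = m(df['longitude'].values, df['latitude'].values)

# Plot the points by setting the marker color to red
m.plot(x, y, 'ro', markersize=4)
plt.show()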
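The other option mentioned in this section, giving the lower left and upper right corners, needs four lat/lon values. The sketch below is not from the original lesson. It approximates those corners from the same center, width and height using a spherical Earth, and the helper name corners_from_center is hypothetical. The keys match Basemap's llcrnrlat, llcrnrlon, urcrnrlat and urcrnrlon keyword arguments. The resulting box will not match the extent of the 'lcc' map exactly.

```python
import math

EARTH_RADIUS_M = 6371000.0  # assumption: mean Earth radius, spherical approximation


def corners_from_center(lat_0, lon_0, width_m, height_m):
    """Approximate the lower-left and upper-right lat/lon corners of a box
    of the given size (in meters) centered on (lat_0, lon_0)."""
    m_per_deg_lat = math.pi * EARTH_RADIUS_M / 180.0
    m_per_deg_lon = m_per_deg_lat * math.cos(math.radians(lat_0))
    half_dlat = (height_m / 2.0) / m_per_deg_lat
    half_dlon = (width_m / 2.0) / m_per_deg_lon
    return {
        'llcrnrlat': lat_0 - half_dlat,
        'llcrnrlon': lon_0 - half_dlon,
        'urcrnrlat': lat_0 + half_dlat,
        'urcrnrlon': lon_0 + half_dlon,
    }


corners = corners_from_center(lat_0=37.5, lon_0=-119, width_m=1E6, height_m=1.2E6)
for name, value in corners.items():
    print(f"{name} = {value:.2f}")

# The corners could then replace width/height, for example:
# m = Basemap(projection='merc', resolution='l', **corners)
```

Corner-based bounds are handy when the area of interest is already known as a bounding box, for instance from another dataset.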


While the above examples get you started, you can use many other projections, add legends, change the marker size and so on. Please refer to the Basemap documentation linked above to experiment with them.

Plotly Choropleth

Choropleth maps are visually appealing maps in which geographic areas are shaded according to the value of a statistic, showing how it varies across a region. Data tied to geographic units such as countries, states, provinces and counties is very common, and a choropleth map is an ideal way to display such statistic or parameter values.

Plotly provides a convenient choropleth function. The example below shows the unemployment rate across US counties, each identified by its FIPS county code. Note that the GeoJSON file uses the same FIPS county code as the ID of each county:


from urllib.request import urlopen
import json
import pandas as pd
import plotly.express as px

# Load the county boundaries (GeoJSON); each feature's id is a FIPS county code
with urlopen('https://raw.githubusercontent.com/plotly/datasets/master/geojson-counties-fips.json') as response:
    counties = json.load(response)

# Read the FIPS codes as strings so that leading zeros are preserved
df = pd.read_csv("https://raw.githubusercontent.com/plotly/datasets/master/fips-unemp-16.csv",
                 dtype={"fips": str})

fig = px.choropleth(df, geojson=counties, locations='fips', color='unemp',
                    color_continuous_scale="Viridis",
                    range_color=(0, 12),
                    scope="usa",
                    labels={'unemp': 'unemployment rate'})

# Set the right, top, left and bottom margins of the figure to 0
fig.update_layout(margin={"r": 0, "t": 0, "l": 0, "b": 0})
fig.show()

Note: The choropleth function shown above needs a recent version of plotly, and Colab may have an older version installed by default. To update to the latest version, run the command below:

!pip install -U plotly

In this example, the GeoJSON file containing the county boundaries is first read into the counties dictionary. This dictionary is passed to the choropleth function along with the data containing the county-wise unemployment information.
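The match between the data and the GeoJSON depends on the FIPS codes being identical strings, which is why the CSV above is read with dtype={"fips": str}. The sketch below is not part of the original lesson. It uses a tiny hand-made stand-in for the county file, and the helper names normalize_fips and unmatched_fips are hypothetical. It shows one way to restore leading zeros and to spot codes that have no matching feature.

```python
# Hypothetical stand-in for the real county GeoJSON; only the "id" field matters here
counties_sample = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "id": "06019", "properties": {"NAME": "Fresno"}, "geometry": None},
        {"type": "Feature", "id": "26163", "properties": {"NAME": "Wayne"}, "geometry": None},
    ],
}


def normalize_fips(code):
    """Return a 5-character, zero-padded FIPS county code as a string."""
    return str(code).strip().zfill(5)


def unmatched_fips(codes, geojson):
    """Return the codes that have no feature with the same id in the GeoJSON."""
    feature_ids = {feature["id"] for feature in geojson["features"]}
    return [code for code in codes if code not in feature_ids]


raw_codes = [6019, 26163, 99999]  # 6019 lost its leading zero when read as an integer
codes = [normalize_fips(code) for code in raw_codes]
missing = unmatched_fips(codes, counties_sample)

print(codes)
print(missing)
```

Counties whose code does not match any feature id are simply left unshaded on the map, so a check like this can help explain unexpected blank areas.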

Often, however, you have the county name but the original data does not include the county's FIPS code, while the GeoJSON fed to the choropleth function is keyed on FIPS codes. You can add FIPS codes to your dataset using the addfips module.

First, install the addfips module with the command below:

pip install addfips

Then apply a lambda function to each row, passing the county and state values to the get_county_fips function, and save the returned value in a new column. Here is an example:

import addfips
import pandas as pd

df = pd.DataFrame({
    'COUNTY': ['wayne', 'brown', 'fresno'],
    'rate': [3.2, 5.5, 3.5],
    'state': ['MI', 'IL', 'CA']
})
af = addfips.AddFIPS()

# axis=1 passes each row to the lambda, which looks up the county's FIPS code
df['fips'] = df.apply(lambda row: af.get_county_fips(row['COUNTY'], row['state']), axis=1)

df.head()

Output:

     COUNTY  rate  state   fips
0     wayne   3.2     MI  26163
1     brown   5.5     IL  17009
2    fresno   3.5     CA  06019
 

Info

  • axis=0 applies the lambda function to each COLUMN, so it is called once per column
  • axis=1 applies the lambda function to each ROW, so it is called once per row
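The difference between the two axis values is easy to see on a small DataFrame. This is a minimal sketch with made-up column names and values:

```python
import pandas as pd

df_demo = pd.DataFrame({'a': [1, 2, 3], 'b': [10, 20, 30]})

# axis=0: the function receives each COLUMN as a Series (called once per column)
col_sums = df_demo.apply(lambda col: col.sum(), axis=0)

# axis=1: the function receives each ROW as a Series (called once per row)
row_sums = df_demo.apply(lambda row: row['a'] + row['b'], axis=1)

print(col_sums.to_dict())   # {'a': 6, 'b': 60}
print(row_sums.tolist())    # [11, 22, 33]
```

With axis=1 the lambda can read several columns of the same row by name, which is what the get_county_fips lookup above relies on.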

Geo plot with US State codes only

If you want to show boundaries by state rather than by county, you can drop the geojson keyword argument and instead add the locationmode='USA-states' keyword argument, with locations set to the column that holds the two-letter state codes ('state' in the example below):


df = pd.DataFrame({'state': ['MI', 'IL', 'WA', 'NJ', 'TX', 'FL','CA'],
 'unemp': [5.3, 5.4, 6.6, 5.5, 7.2, 7.1, 11]})

fig = px.choropleth(df, locations='state', locationmode="USA-states", color='unemp',
                             scope="usa",
                             labels={'unemp':'unemployment rate'})
fig.show()

Note that the example above uses the default color settings. These examples are meant to get you started; please refer to the official documentation to learn about all the arguments you can set.

While the core Plotly library is an open source project, there are extensions to Plotly, such as Dash Enterprise and Chart Studio Enterprise, which are proprietary.

Leaflet

Leaflet is another open source, JavaScript-based library for map functionality. The library itself is very lightweight and loads really fast. To use it in a notebook, you have to install not only the library but also its extension for Jupyter. More details here: https://github.com/jupyter-widgets/ipyleaflet

GeoPandas

Another popular geo library, built on Pandas, is GeoPandas (http://geopandas.org/). GeoPandas is built around GeoSeries and GeoDataFrame, which are subclasses of the Pandas Series and DataFrame respectively, and it works with three basic geometric objects: points, lines and polygons.
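As a minimal sketch of how these pieces fit together, the example below builds a GeoDataFrame of points from plain lon/lat columns. It is not from the original lesson: the site names and coordinates are made up, and it assumes geopandas is installed.

```python
import pandas as pd
import geopandas as gpd

# Hypothetical sample points (the lon/lat values are illustrative only)
sites = pd.DataFrame({
    'name': ['site_a', 'site_b'],
    'Longitude': [-83.05, -83.10],
    'Latitude': [42.33, 42.40],
})

# Build Point geometries from the lon/lat columns;
# EPSG:4326 is the usual WGS 84 lat/lon coordinate reference system
gdf = gpd.GeoDataFrame(
    sites,
    geometry=gpd.points_from_xy(sites['Longitude'], sites['Latitude']),
    crs='EPSG:4326',
)

print(type(gdf.geometry))                # the geometry column is a GeoSeries
print(gdf.geometry.geom_type.tolist())   # geometry type of each row
```

Since a GeoDataFrame is still a DataFrame, the usual Pandas filtering and grouping operations continue to work on it.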

Other libraries

There are many other libraries, among which Google Maps stands out from the crowd for its feature-rich, interactive map capabilities.
