From pandas to geopandas¶
import pandas as pd
import geopandas as gpd
import random
import numpy as np
import matplotlib.pyplot as plt
import cartopy.crs as ccrs
What is Pandas?¶
Pandas is a popular open-source library for data analysis and manipulation in Python. It provides easy-to-use data structures and data analysis tools for handling and analyzing tabular data in various formats. With Pandas, users can filter, slice, group, and transform data, as well as handle missing data gracefully. Pandas is highly extensible and integrates well with other data analysis libraries.
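As a small illustration of these operations, the sketch below filters, groups, and fills missing values on a made-up table. The column names and values are hypothetical and are not part of this tutorial's dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: two stations, one missing reading
toy = pd.DataFrame({
    "station": ["A", "A", "B", "B"],
    "rate": [1.0, 3.0, np.nan, 4.0],
})

# Filter: keep rows where the rate is greater than 2
high = toy[toy["rate"] > 2]

# Handle missing data: replace NaN in the 'rate' column with 0
filled = toy.fillna({"rate": 0.0})

# Group: mean rate per station (NaN values are skipped by default)
mean_rate = toy.groupby("station")["rate"].mean()
```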
Pandas and geopandas¶
Geopandas is a Python library for working with geospatial data. It is built on top of Pandas and adds support for geographic data types and operations. Geopandas provides a convenient way to read, write, and manipulate geospatial data in various formats, such as Shapefiles, GeoJSON, and PostGIS databases. It also offers powerful visualization capabilities, making it easy to create maps and visualize geospatial data. Geopandas is built on top of other popular geospatial libraries, such as Shapely, Fiona, and PyProj, and is a useful tool for a wide range of applications, including GIS, remote sensing, and environmental monitoring.
Let's start by creating some synthetic data¶
We will use the Mogi model to generate some geospatial data. The Mogi model is a mathematical model that describes the deformation caused by a point source of volume change (e.g., a magma chamber) at depth beneath the Earth’s surface. The subsidence rates predicted by the Mogi model depend on the parameters of the model, which include the change in volume of the magma chamber, the depth of the chamber, the Poisson’s ratio of the surrounding rock, and the location of the chamber relative to the surface.
The simplified form of the subsidence rate (dU/dt) used in this tutorial is:
$\frac{dU}{dt} = \frac{\Delta V (3 - 4\nu)}{4\pi} \cdot \frac{d^3}{r^3}$
where $\Delta V$ is the change in volume of the magma chamber, $\nu$ is the Poisson’s ratio of the surrounding rock, $d$ is the depth of the chamber, and $r$ is the distance from the observation point on the surface to the center of the chamber.
This equation is for the vertical component of subsidence (i.e., the rate of change of the height of the surface). The horizontal components of deformation can be calculated using similar equations that depend on the distance from the magma chamber and the angle of measurement relative to the direction of maximum horizontal compression.
def subsidence_rate(x, y):
    # Define the parameters for the Mogi model
    delta_v = 0.1  # change in volume of magma
    d = 5  # depth of magma chamber
    poisson_ratio = 0.25  # Poisson's ratio for the crust
    # Define the location of the magma chamber
    lat = 0  # latitude of the magma chamber
    lon = 60  # longitude of the magma chamber
    # Distance from each surface point (x = longitude, y = latitude) to the chamber center
    r = np.sqrt((x - lon)**2 + (y - lat)**2 + d**2)
    # Calculate the subsidence rates using the Mogi model
    sub_data = delta_v * (3 - 4 * poisson_ratio) / (4 * np.pi) * d**3 / r**3
    return sub_data
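A quick sanity check on the function above: directly over the chamber the distance r equals the depth d, so the d**3 / r**3 factor is 1 and the rate reduces to delta_v * (3 - 4 * poisson_ratio) / (4 * np.pi). The sketch below restates the same expression so that it runs on its own. The name `mogi_rate` and its keyword arguments are introduced here only for illustration.

```python
import numpy as np

def mogi_rate(x, y, x0=60.0, y0=0.0, d=5.0, delta_v=0.1, nu=0.25):
    """Same expression as subsidence_rate, with the parameters exposed."""
    r = np.sqrt((x - x0) ** 2 + (y - y0) ** 2 + d ** 2)
    return delta_v * (3 - 4 * nu) / (4 * np.pi) * d ** 3 / r ** 3

# Directly above the chamber r == d, so the rate is at its maximum
peak = mogi_rate(60.0, 0.0)

# The function is vectorized: NumPy arrays go in, an array comes out
xs = np.array([60.0, 65.0, 80.0])
rates = mogi_rate(xs, np.zeros_like(xs))
```

The rate falls off with distance from the chamber, which is why the points far from longitude 60 and latitude 0 have values close to zero in the table further down.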
Let us create some random latitude and longitude points¶
# Set the longitude and latitude ranges
lon_range = (40, 110)
lat_range = (-20, 30)
# Create empty lists to store the random points
lat = []
lon = []
# Generate 500 random points within the specified range
for _ in range(500):
    # Generate a random longitude and latitude
    lon.append(random.uniform(lon_range[0], lon_range[1]))
    lat.append(random.uniform(lat_range[0], lat_range[1]))
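The same points can be drawn without an explicit loop. The sketch below uses NumPy's random generator with a fixed seed so that the points are reproducible between runs. The seed value 42 is an arbitrary choice made here, not something from the original notebook.

```python
import numpy as np

lon_range = (40, 110)
lat_range = (-20, 30)
n_points = 500

# A seeded generator returns the same points on every run
rng = np.random.default_rng(42)
lon = rng.uniform(lon_range[0], lon_range[1], size=n_points)
lat = rng.uniform(lat_range[0], lat_range[1], size=n_points)
```

The results are NumPy arrays rather than lists, and they can be assigned to DataFrame columns in the same way.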
Now we can store our list data in a DataFrame¶
In pandas, a DataFrame is a two-dimensional, table-like data structure made of rows and columns, similar to a spreadsheet or a SQL table. A DataFrame can hold various data types, such as integers, floats, strings, and even other objects such as lists or dictionaries.
To create a DataFrame in pandas, you need a name for each column and a list of values for it. Here’s an example:
df=pd.DataFrame()
df['Latitude']=lat
df['Longitude']=lon
df['Subsidence']=subsidence_rate(df['Longitude'],df['Latitude'])
df.head()
| | Latitude | Longitude | Subsidence |
|---|---|---|---|
| 0 | -9.995486 | 64.556974 | 0.001131 |
| 1 | -16.447512 | 77.177932 | 0.000139 |
| 2 | 8.360813 | 59.225169 | 0.002132 |
| 3 | -6.795347 | 106.702294 | 0.000019 |
| 4 | 12.859304 | 40.289230 | 0.000143 |
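A DataFrame can also be built in one step by passing a dictionary that maps column names to lists. The sketch below uses three made-up points rather than the random data above, and the variable name `points` is chosen here only for illustration.

```python
import pandas as pd

# Hypothetical coordinates, used only to show the constructor
lat = [-10.0, 8.4, 12.9]
lon = [64.6, 59.2, 40.3]

# Keys become column names; each list becomes a column
points = pd.DataFrame({"Latitude": lat, "Longitude": lon})

# shape is (number of rows, number of columns)
n_rows, n_cols = points.shape
```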
Writing a DataFrame as csv¶
DataFrame.to_csv() is a pandas method that writes the contents of a DataFrame to a CSV (Comma Separated Values) file. Its first argument is the path and name of the output file, and several optional arguments let you customize the output. By default the row index is written as the first column.
df.to_csv("Syntheic_Subsidence_Data.csv")
Reading a csv as DataFrame¶
pd.read_csv() is a pandas function that reads a CSV (Comma Separated Values) file into a DataFrame. It takes one mandatory argument, the path and name of the file to read, and several optional arguments that you can use to customize how the file is parsed.
new_df=pd.read_csv("Syntheic_Subsidence_Data.csv")
new_df.head()
| | Unnamed: 0 | Latitude | Longitude | Subsidence |
|---|---|---|---|---|
| 0 | 0 | -9.995486 | 64.556974 | 0.001131 |
| 1 | 1 | -16.447512 | 77.177932 | 0.000139 |
| 2 | 2 | 8.360813 | 59.225169 | 0.002132 |
| 3 | 3 | -6.795347 | 106.702294 | 0.000019 |
| 4 | 4 | 12.859304 | 40.289230 | 0.000143 |
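The extra `Unnamed: 0` column appears because `to_csv()` writes the row index by default and `read_csv()` then reads it back as an ordinary column. Here is a minimal sketch of two ways around this. It uses an in-memory buffer instead of a file so that it runs on its own, and the sample values are made up.

```python
import io
import pandas as pd

df = pd.DataFrame({"Latitude": [-10.0, 8.4], "Longitude": [64.6, 59.2]})

# Option 1: do not write the index at all
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)
no_index = pd.read_csv(buffer)

# Option 2: write the index, then tell read_csv which column holds it
buffer = io.StringIO()
df.to_csv(buffer)
buffer.seek(0)
with_index = pd.read_csv(buffer, index_col=0)
```

Either option gives back a DataFrame with only the `Latitude` and `Longitude` columns. With a real file, the buffer is replaced by the file path.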
Changing pandas to geopandas¶
# Convert the DataFrame to a GeoDataFrame with Point objects in the 'geometry' column
gdf = gpd.GeoDataFrame(new_df, geometry=gpd.points_from_xy(new_df['Longitude'], new_df['Latitude']))
# Set the CRS of the GeoDataFrame to WGS 84 (EPSG:4326).
# The older {'init': 'epsg:4326'} dict form is deprecated and raises a FutureWarning in pyproj.
gdf.crs = "EPSG:4326"
# Display the first rows of the GeoDataFrame
gdf.head()
| | Unnamed: 0 | Latitude | Longitude | Subsidence | geometry |
|---|---|---|---|---|---|
| 0 | 0 | -9.995486 | 64.556974 | 0.001131 | POINT (64.55697 -9.99549) |
| 1 | 1 | -16.447512 | 77.177932 | 0.000139 | POINT (77.17793 -16.44751) |
| 2 | 2 | 8.360813 | 59.225169 | 0.002132 | POINT (59.22517 8.36081) |
| 3 | 3 | -6.795347 | 106.702294 | 0.000019 | POINT (106.70229 -6.79535) |
| 4 | 4 | 12.859304 | 40.289230 | 0.000143 | POINT (40.28923 12.85930) |
This code converts the pandas DataFrame new_df, which holds longitude and latitude columns, into a geopandas GeoDataFrame named gdf with Point objects in the geometry column. The gpd.points_from_xy() function creates a Point object for each pair of longitude and latitude values in new_df.
After creating gdf, the coordinate reference system (CRS) is set to WGS 84 (EPSG:4326) by assigning to gdf.crs. WGS 84 is the coordinate system most commonly used for GPS coordinates. To learn more about map projections, you can read my tutorial on Changing Map Projection in Python.
Finally, the head() method displays the first few rows of gdf, showing the geometry column with the Point objects alongside the columns that were already present in new_df.
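The CRS can also be supplied directly when the GeoDataFrame is created, using the plain `"EPSG:4326"` string form. Here is a minimal sketch with two made-up points. The `sample` DataFrame is a hypothetical stand-in for new_df.

```python
import pandas as pd
import geopandas as gpd

# Hypothetical sample rows standing in for new_df
sample = pd.DataFrame({"Latitude": [-10.0, 8.4], "Longitude": [64.6, 59.2]})

# points_from_xy takes x (longitude) first, then y (latitude)
gdf = gpd.GeoDataFrame(
    sample,
    geometry=gpd.points_from_xy(sample["Longitude"], sample["Latitude"]),
    crs="EPSG:4326",
)

# The CRS is stored as a pyproj CRS object
epsg_code = gdf.crs.to_epsg()
first_point = gdf.geometry.iloc[0]
```

Passing longitude as x and latitude as y matters: swapping them silently places the points in the wrong location.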
Plotting a GeoDataFrame¶
You can plot a GeoDataFrame easily, with or without cartopy. To learn more about plotting with geopandas, read these tutorials:
Plotting raster and vector data together in python
Plotting Shapefiles in python using GEOPANDAS
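fig = plt.figure(figsize=[12, 8])
# Map axes in the Plate Carree (longitude/latitude) projection
ax = fig.add_axes([0, 0, 1, 1], projection=ccrs.PlateCarree())
# Color each point by its value in the 'Subsidence' column
gdf.plot(ax=ax, column='Subsidence', edgecolor="black", linewidth=0.4)
ax.coastlines(resolution='10m')
plt.show()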
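For a quick look without cartopy, the GeoDataFrame can be drawn on ordinary matplotlib axes. The sketch below colors the points by a column and adds a color bar. The sample values are made up, and the `Agg` backend is selected only so that the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd
import geopandas as gpd

# Hypothetical sample rows standing in for the synthetic dataset
sample = pd.DataFrame({
    "Latitude": [-10.0, 8.4, 12.9],
    "Longitude": [64.6, 59.2, 40.3],
    "Subsidence": [0.0011, 0.0021, 0.0001],
})
gdf = gpd.GeoDataFrame(
    sample,
    geometry=gpd.points_from_xy(sample["Longitude"], sample["Latitude"]),
    crs="EPSG:4326",
)

fig, ax = plt.subplots(figsize=(8, 5))
# column= picks the values used for coloring; legend=True adds a color bar
gdf.plot(ax=ax, column="Subsidence", legend=True, edgecolor="black", linewidth=0.4)
ax.set_xlabel("Longitude")
ax.set_ylabel("Latitude")
```

Without cartopy there are no coastlines or map projection, so this form is best suited to a quick check of the data.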