In my previous role as a data scientist on the research team at a SaaS start-up in the logistics industry, I helped develop AI-driven solutions to route hundreds of thousands of global packages intelligently and efficiently using reinforcement learning.
In practice, this meant building several proofs of concept (PoCs) for internal stakeholders and potential clients. These PoCs were often interactive dashboards or simplified versions of our software, showcasing the latest algorithm updates with either client-provided or sample data.
Proof of concept, also known as proof of principle, is a realization of a certain idea, method or principle in order to demonstrate its feasibility, or viability, or a demonstration in principle with the aim of verifying that some concept or theory has practical potential.
— Wikipedia
Because we were a start-up, we had close to no enterprise software, and the team had to rely extensively on open-source resources.
I was blown away every day by how powerful open-source packages can be. In today’s newsletter, I am excited to share the top 5 Python packages I came across along the way. I hope you find them insightful as well!
1. Folium
Folium is an incredibly useful tool for mapping vehicle routes and shipments, especially when working with interactive data visualizations. It integrates well with Python, allowing for the creation of maps with custom layers, controls, and popups, which is perfect for logistics projects where different elements of data need to be visualized simultaneously.
For example, using Folium, I could create maps with popups that displayed shipment details or status updates at specific locations, which made tracking routes much more dynamic. You can also add layers and toggle between different types of routes, traffic patterns, or delivery zones using Folium’s layer control (folium.LayerControl). This was especially useful when showing different stakeholders various angles of the same data.
Strengths: Interactive mapping with popups, layers, and controls; intuitive for visualizing data.
Limitations: Not optimized for large datasets or real-time high-performance rendering.
Example: Plot a map of vehicle routes with popup markers for start and end points.
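import folium
# Start is the original sample point; the end point is an illustrative coordinate
start, end = [43.65107, -79.347015], [43.6629, -79.3957]
m = folium.Map(location=start, zoom_start=12)
folium.Marker(start, popup='Start').add_to(m)
folium.Marker(end, popup='End').add_to(m)
folium.PolyLine([start, end]).add_to(m)  # straight line standing in for the route
m  # displays the map when run in a notebook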
2. GeoPandas
GeoPandas is my go-to choice for handling geospatial data in Python. If you are new to geospatial data science, I highly recommend doing a few GeoPandas projects to get your foot in the door.
Whether you’re dealing with formats like GeoJSON, Shapefiles, or other vector-based geospatial formats, GeoPandas simplifies the process by extending the capabilities of Pandas to support geospatial operations.
One of my favorite features of GeoPandas, and one of the most critical, is its ability to create and manipulate geometry columns directly in a DataFrame, based on location attributes such as latitude and longitude coordinates.
This is a godsend for performing spatial analysis natively within Python, for two reasons. It
1) enables seamless integration with packages like Folium for visualizing your data, whether it consists of points, lines, or polygons, making it an essential tool when preparing geospatial data for routing applications, and
2) allows you to perform tasks like calculating distances between points, creating buffers around features, or performing spatial joins across multiple geometric shapes, all of which are critical for working with transportation and logistics data.
Strengths: Supports a wide variety of geospatial formats (GeoJSON, Shapefiles), adds geometry columns based on lat/long coordinates, and offers powerful native spatial analysis tools.
Limitations: Performance may suffer with very large datasets or highly complex geometries. For large geospatial data, PostGIS with cloud-hosted data sources is your best friend. Ask me about my PostgreSQL LinkedIn Learning course!
Example: Here’s a quick demonstration of how GeoPandas can convert latitude and longitude into geometry points for spatial analysis:
import geopandas as gpd
import pandas as pd
from shapely.geometry import Point
# Create a sample DataFrame with lat/long data
data = {'Latitude': [40.7128, 34.0522],
'Longitude': [-74.0060, -118.2437]}
df = pd.DataFrame(data)
# Convert to GeoDataFrame and add a geometry column (WGS 84 lat/long)
geometry = [Point(xy) for xy in zip(df['Longitude'], df['Latitude'])]
gdf = gpd.GeoDataFrame(df, geometry=geometry, crs='EPSG:4326')
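The spatial operations mentioned earlier (distances, buffers, spatial joins) might look something like the sketch below. To keep the numbers easy to follow, it assumes the coordinates are already in a projected system measured in meters. With real latitude/longitude data you would typically reproject first with `to_crs` to a suitable projected CRS. The depot, the stops, and the 1 km service radius are all hypothetical.

```python
import geopandas as gpd
from shapely.geometry import Point

# Hypothetical depot and delivery stops, in projected coordinates (meters)
depots = gpd.GeoDataFrame({"depot": ["D1"]}, geometry=[Point(0, 0)])
stops = gpd.GeoDataFrame(
    {"stop": ["A", "B", "C"]},
    geometry=[Point(300, 400), Point(3000, 4000), Point(-600, 0)],
)

# Distance from every stop to the depot (same units as the coordinates)
stops["dist_to_depot_m"] = stops.distance(depots.geometry.iloc[0])

# Buffer: a 1 km service zone around each depot
zones = gpd.GeoDataFrame({"depot": depots["depot"]}, geometry=depots.buffer(1000))

# Spatial join: keep only the stops that fall inside a service zone
served = gpd.sjoin(stops, zones, predicate="within")
served_stops = sorted(served["stop"])

print(stops[["stop", "dist_to_depot_m"]])
print(served_stops)
```

Here stops A and C fall inside the zone, while B, at 5 km from the depot, does not.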
3. NetworkX
NetworkX is a package designed to work with graphs (networks), which consist of nodes and edges. In the context of transportation and logistics projects, nodes can represent key points such as delivery locations, intersections, or depots, while edges represent the roads or routes that connect these points.
Each edge in a NetworkX graph can have various attributes attached to it, such as distance, travel time, cost, or congestion levels. This flexibility allows for highly customizable network models, where you can define your own weighting scheme depending on the specific optimization criteria you’re working with.
For example, in a delivery network, edges might represent the roads, with weights assigned based on the travel distance or time between two points.
If you’re optimizing for time, each edge’s weight could be the estimated travel time, factoring in traffic or road conditions.
If you’re minimizing distance, the weights could be the physical distance between nodes.
NetworkX lets you easily calculate shortest paths, find bottlenecks, or even run more advanced algorithms like Dijkstra’s or A* to efficiently route vehicles.
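To make the weighting idea concrete, here is a small sketch in which every edge carries both a distance and a travel time, and the "best" route changes depending on which attribute is used as the weight. The network and all numbers are invented for illustration.

```python
import networkx as nx

G = nx.Graph()

# Hypothetical delivery network: each road has a distance (km) and a travel time (min)
G.add_edge("depot", "A", distance=5, time=15)          # short but congested
G.add_edge("A", "customer", distance=3, time=10)
G.add_edge("depot", "customer", distance=10, time=12)  # longer but faster

# Same graph, two optimization criteria
by_distance = nx.shortest_path(G, "depot", "customer", weight="distance")
by_time = nx.shortest_path(G, "depot", "customer", weight="time")
time_cost = nx.shortest_path_length(G, "depot", "customer", weight="time")

print(by_distance)  # ['depot', 'A', 'customer']  (8 km vs 10 km direct)
print(by_time)      # ['depot', 'customer']       (12 min vs 25 min via A)
print(time_cost)    # 12
```

Because the weight is just an edge attribute name, switching the optimization criterion does not require rebuilding the graph.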
Strengths: Versatile for modeling complex networks; includes a vast library of algorithms like shortest path, Dijkstra’s, and minimum spanning tree.
Limitations: Performance may lag with extremely large networks or datasets.
Example: Find the shortest path between two nodes (points) in a transportation network:
import networkx as nx
# Create a graph with nodes and edges
G = nx.Graph()
G.add_edges_from([(1, 2, {'weight': 5}), (2, 3, {'weight': 3}), (1, 3, {'weight': 10})])
# Calculate shortest path from node 1 to 3
shortest_path = nx.shortest_path(G, source=1, target=3, weight='weight')
print(shortest_path)  # [1, 2, 3], since 5 + 3 = 8 beats the direct edge of 10
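For A*, NetworkX offers `nx.astar_path`, which takes a heuristic function estimating the remaining cost between two nodes. The sketch below assumes each node has planar coordinates and uses straight-line distance as both the edge weight and the heuristic. The node names and positions are hypothetical.

```python
import math

import networkx as nx

# Hypothetical node positions on a flat plane (arbitrary units)
pos = {
    "depot": (0, 0),
    "x": (2, 0),
    "y": (0, 2),
    "hub": (1, 1),
    "customer": (2, 2),
}

G = nx.Graph()
roads = [
    ("depot", "x"), ("x", "customer"),
    ("depot", "y"), ("y", "customer"),
    ("depot", "hub"), ("hub", "customer"),
]
for u, v in roads:
    # Edge weight is the straight-line length of the road segment
    G.add_edge(u, v, weight=math.dist(pos[u], pos[v]))


def straight_line(u, v):
    """Heuristic: straight-line distance, which never overestimates the road distance."""
    return math.dist(pos[u], pos[v])


path = nx.astar_path(G, "depot", "customer", heuristic=straight_line, weight="weight")
length = nx.astar_path_length(
    G, "depot", "customer", heuristic=straight_line, weight="weight"
)

print(path)    # ['depot', 'hub', 'customer']
print(length)  # about 2.83, versus 4.0 via x or y
```

On a graph this small the heuristic makes no practical difference, but on large road networks it can reduce how many nodes the search has to explore.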
4. OSRM (Open Source Routing Machine)
OSRM (Open Source Routing Machine) is a high-performance routing engine designed to calculate routes and provide driving directions using OpenStreetMap (OSM) data. It’s ideal for building routing applications for logistics, transit, and other use cases requiring fast, efficient pathfinding.
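One of the standout uses of OSRM is isochrone analysis. As far as I know, OSRM does not expose a dedicated isochrone endpoint, so this is usually built on top of its travel-time (table) service or with additional tooling.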
Logistics operations can use isochrone analysis to map out travel zones from a warehouse or depot. It helps determine how far delivery vehicles can reach within a set amount of time, allowing for efficient planning of routes and service areas.
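A rough approximation of this can be sketched with OSRM's HTTP table service, which returns travel durations in seconds between coordinates. The sketch below only builds the request URL and filters a response. The base URL assumes a self-hosted OSRM server on localhost, and the candidate points, the sample response, and the 15-minute budget are all hypothetical. The two helper functions are my own illustration, not part of OSRM.

```python
from urllib.parse import urlencode


def build_table_url(base_url, depot, candidates):
    """Build an OSRM table request from the depot (index 0) to all points.

    Coordinates are (longitude, latitude) pairs, the order OSRM expects.
    """
    coords = ";".join(f"{lon},{lat}" for lon, lat in [depot, *candidates])
    query = urlencode({"sources": "0", "annotations": "duration"})
    return f"{base_url}/table/v1/driving/{coords}?{query}"


def reachable_within(response, candidates, budget_s):
    """Return the candidate points whose travel time from the depot fits the budget."""
    durations = response["durations"][0][1:]  # skip the depot-to-depot entry
    return [
        point
        for point, seconds in zip(candidates, durations)
        if seconds is not None and seconds <= budget_s
    ]


depot = (-79.347015, 43.65107)
candidates = [(-79.3957, 43.6629), (-79.2, 43.8)]  # hypothetical delivery points

# Assumed address of a self-hosted OSRM instance
url = build_table_url("http://localhost:5000", depot, candidates)
print(url)

# Hypothetical response in the shape the table service returns (durations in seconds)
sample_response = {"code": "Ok", "durations": [[0, 540.2, 1910.7]]}

# Which points can be reached within 15 minutes?
print(reachable_within(sample_response, candidates, budget_s=15 * 60))
```

Running the same check over a dense grid of candidate points around the depot, then drawing the boundary of the reachable ones, gives an approximate isochrone.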
Routing and Pathfinding