Exploring and Splitting TopoJSONs
🥟 Appetizer
A client wanted to visualize national geographic datasets, but their current software package could not display all of that information at once without slowing down or crashing.
🍲 Main Dish
Exploring TopoJSON for Scalable Visualization: To improve scalability, I explored the feasibility of using TopoJSON, a more compact format than GeoJSON. The goal was to reduce file size and support incremental loading of geographic data.
I successfully converted a GeoJSON file to TopoJSON using:
# Creating a TopoJSON file from a GeoJSON
geo2topo 01001_bgct.geojson > 01001_bgct.topojson
This conversion reduced the file size by about 76% for a dataset in Virginia:
# Size comparison (original GeoJSON size vs. TopoJSON size)
size_reduction = (128.7 - 30.9) / 128.7
print(f"{size_reduction:.2%}")  # 75.99%
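Rather than hard-coding the two sizes, the same ratio can be computed from the files on disk. This is a minimal sketch using only the standard library; `file_size_reduction` is a hypothetical helper name, and the usage line assumes the two files from the geo2topo command are in the working directory.

```python
import os


def file_size_reduction(original_path: str, converted_path: str) -> float:
    """Return the fractional size reduction from original_path to converted_path."""
    before = os.path.getsize(original_path)  # size in bytes
    after = os.path.getsize(converted_path)
    if before == 0:
        raise ValueError("original file is empty")
    return (before - after) / before


# Usage (assumes both files exist):
# print(f"{file_size_reduction('01001_bgct.geojson', '01001_bgct.topojson'):.2%}")
```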
Understanding TopoJSON Structure
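TopoJSON encodes geometry using shared arcs. Every arc is stored once, in a single top-level arcs array, and each shape (such as a polygon) references the arcs that form its boundary by their index in that array. A negative index means the arc is traversed in reverse: -1 refers to arc 0, -2 to arc 1, and in general a negative index i refers to arc ~i (that is, -i - 1). Because arcs are stored globally, different shapes can reuse the same arc definitions.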
Because arc indices are positions in one file's arcs array, geometries from one TopoJSON file cannot simply be copied into another dataset unless arc indices are coordinated globally.
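To make the index convention concrete, here is a small sketch that stitches one ring of a non-quantized topology back into coordinates, plus a helper that re-points an index after one topology's arcs have been appended behind another's (one way of coordinating indices across files). A negative index i refers to arc ~i traversed in reverse, and consecutive arcs in a ring share their junction point, so the duplicate is skipped when joining. The function names are hypothetical, and quantized (delta-encoded) arcs would need to be decoded first.

```python
from typing import List, Sequence

Position = List[float]


def arc_coords(arcs: Sequence[Sequence[Position]], index: int) -> List[Position]:
    """Return the coordinates for one arc reference, reversing negative indices."""
    if index >= 0:
        return [list(p) for p in arcs[index]]
    # ~index == -index - 1, so -1 -> arc 0 reversed, -2 -> arc 1 reversed, ...
    return [list(p) for p in reversed(arcs[~index])]


def decode_ring(arcs: Sequence[Sequence[Position]], ring: Sequence[int]) -> List[Position]:
    """Stitch a ring's arc references into a single coordinate list."""
    coords: List[Position] = []
    for index in ring:
        points = arc_coords(arcs, index)
        # Adjacent arcs share an endpoint; skip the duplicated first point.
        coords.extend(points[1:] if coords else points)
    return coords


def shift_index(index: int, offset: int) -> int:
    """Re-point an arc reference after its arcs were appended starting at `offset`."""
    return index + offset if index >= 0 else ~(~index + offset)
```

For example, with `arcs = [[[0, 0], [1, 0], [1, 1]], [[0, 0], [0, 1], [1, 1]]]`, the ring `[0, -2]` decodes to the closed square `[[0, 0], [1, 0], [1, 1], [0, 1], [0, 0]]`.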
Incremental Loading: Is It Possible?
Yes, with caveats. Incremental loading works as long as:
- You define a shared global arc file up front.
- You keep empty arc placeholders for regions that are not visible, so every arc index stays valid.
- You lazily fill in arc coordinates as needed, based on resolution or viewport.
Example of how arc references look:
# What do arc references look like?
{
    'type': 'Topology',
    'objects': {
        '01001_bgct': {
            'type': 'GeometryCollection',
            'geometries': [{
                'type': 'Polygon',
                'arcs': [[-2424, 149, -2423, 151, 145, 3280, -2511, 2694, 2487, -626, 2488,
                          -1396, 3196, 3197, -2344, -1301, 3064, 3073, 3506, 3510, 3512, 3515, ...]]
            }]
        }
    },
    'bbox': [-86.921237, 32.307574, -86.411172, 32.708213]
}
We can use the bounding box (bbox) field to write a function that loads only the data within a visible viewport.
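The top-level bbox in the example covers the whole topology, so filtering individual shapes needs a bounding box per geometry. A minimal sketch, assuming non-quantized arcs (no transform) and that every geometry has an arcs field; `geometry_bbox` and `visible_geometries` are hypothetical names, and the viewport uses the same [min_x, min_y, max_x, max_y] order as bbox.

```python
from typing import Any, Dict, Iterable, List, Sequence

BBox = List[float]  # [min_x, min_y, max_x, max_y]


def _arc_ids(arcs_field: Any) -> Iterable[int]:
    """Yield the non-negative arc ids in a nested arcs field (~i for negatives)."""
    if isinstance(arcs_field, list):
        for item in arcs_field:
            yield from _arc_ids(item)
    else:
        yield arcs_field if arcs_field >= 0 else ~arcs_field


def geometry_bbox(arcs: Sequence[Sequence[Sequence[float]]], geometry: Dict[str, Any]) -> BBox:
    """Compute one geometry's bounding box from the arcs it references."""
    xs: List[float] = []
    ys: List[float] = []
    for arc_id in _arc_ids(geometry["arcs"]):
        for x, y, *_ in arcs[arc_id]:
            xs.append(x)
            ys.append(y)
    return [min(xs), min(ys), max(xs), max(ys)]


def intersects(a: BBox, b: BBox) -> bool:
    """True if two [min_x, min_y, max_x, max_y] boxes overlap."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def visible_geometries(topology: Dict[str, Any], object_name: str, viewport: BBox) -> List[Dict[str, Any]]:
    """Return the geometries of one object whose bbox intersects the viewport."""
    arcs = topology["arcs"]
    geometries = topology["objects"][object_name]["geometries"]
    return [g for g in geometries if intersects(geometry_bbox(arcs, g), viewport)]
```

In a progressive-loading setup these per-geometry boxes would presumably be computed once from the full file and shipped with the skeleton, since emptied arcs no longer carry coordinates.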
Safe Removal of Unused Arcs
Even if some arcs are not defined (i.e., are empty), the TopoJSON file will still render:
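# Unused arcs still render
import json
import os
# j3 (the loaded topology) and test_remainder (the geometries to keep) are defined earlier.
# A negative index i refers to arc ~i reversed (-1 -> arc 0), so abs(i) would point at the wrong arc.
req_arcs = {i if i >= 0 else ~i for i in test_remainder[0]['arcs'][0]}
for i in range(len(j3['arcs'])):
    if i not in req_arcs:
        j3['arcs'][i] = []  # Empty values for unused arcs, fill in later
out_path = '../data/testing/01001_bgct_mod.topojson'
with open(out_path, 'w') as f:
    json.dump(j3, f)
os.path.isfile(out_path)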
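✅ The file still renders when unused arc arrays are empty, which makes progressive loading feasible. Strictly speaking, the TopoJSON specification requires every arc to contain at least two positions, so a strict validator may reject these empty placeholders even though rendering works.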
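A slightly more general version of the stripping step above, together with its "fill in later" half, might look like the sketch below. It collects arc ids from arcs fields of any nesting depth (LineString, Polygon, and MultiPolygon nest differently), resolves negative indices with ~, and keeps the full arcs list on the side as a stand-in for whatever server or tile store would supply arcs on demand. `referenced_arcs` and `LazyTopology` are hypothetical names, not part of any TopoJSON library.

```python
import copy
from typing import Any, Dict, Iterable, List, Set


def referenced_arcs(arcs_field: Any) -> Set[int]:
    """Collect arc ids from an arcs field of any nesting depth (~i for negatives)."""
    if isinstance(arcs_field, list):
        ids: Set[int] = set()
        for item in arcs_field:
            ids |= referenced_arcs(item)
        return ids
    return {arcs_field if arcs_field >= 0 else ~arcs_field}


class LazyTopology:
    """Keep a skeleton topology whose arcs are filled in only when needed."""

    def __init__(self, full_topology: Dict[str, Any]) -> None:
        # Stand-in for a remote arc source: the complete arcs array.
        self._source: List[Any] = full_topology["arcs"]
        self.skeleton = {k: copy.deepcopy(v) for k, v in full_topology.items() if k != "arcs"}
        # Same length as the source, so every arc index stays valid.
        self.skeleton["arcs"] = [[] for _ in self._source]
        self.loaded: Set[int] = set()

    def load(self, geometries: Iterable[Dict[str, Any]]) -> int:
        """Fill in arcs for the given geometries; return how many were newly loaded."""
        needed: Set[int] = set()
        for geometry in geometries:
            needed |= referenced_arcs(geometry.get("arcs", []))
        new = needed - self.loaded
        for arc_id in new:
            self.skeleton["arcs"][arc_id] = self._source[arc_id]
        self.loaded |= new
        return len(new)
```

Calling `load` with only the geometries in the current viewport fills exactly the arcs they need, and arcs already loaded are not fetched twice.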
🍵 Aftertaste
Arc Reuse: Why TopoJSON is Compact
TopoJSON achieves much of its size reduction by reusing arcs (quantization and delta encoding, when enabled, shrink files further). In real datasets, a single arc may be referenced up to 4 times across different geometries:
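# Counting arc use across geometries
from collections import Counter
from tqdm import tqdm
def flatten(l):
    """Flatten nested lists of arc indices into a single flat list."""
    return [x for item in l for x in flatten(item)] if isinstance(l, list) else [l]
# j4 is the loaded combined topology (defined earlier). Keys are signed indices,
# so a reversed use of arc k (index ~k) is counted under its own key.
c = Counter()
for geo in tqdm(j4['objects']['va_combined']['geometries']):
    c.update(flatten(geo['arcs']))
'''
Counter({
    0: 4, 1: 4, 2: 4, ..., 924: 3, 925: 3, ...
})
'''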
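Because a count keyed by signed index puts reversed uses of arc k (index ~k, i.e. -k - 1) under a separate key, it can understate how often an arc is actually shared. A sketch that folds both directions together; `arc_usage` is a hypothetical helper, and the commented usage assumes the j4 topology from above.

```python
from collections import Counter
from typing import Any, Dict, Iterable


def _arc_ids(arcs_field: Any) -> Iterable[int]:
    """Yield arc ids from a nested arcs field, mapping reversed ~k back to k."""
    if isinstance(arcs_field, list):
        for item in arcs_field:
            yield from _arc_ids(item)
    else:
        yield arcs_field if arcs_field >= 0 else ~arcs_field


def arc_usage(geometries: Iterable[Dict[str, Any]]) -> Counter:
    """Count references to each arc id, treating ~k (reversed) as arc k."""
    counts: Counter = Counter()
    for geometry in geometries:
        counts.update(_arc_ids(geometry.get("arcs", [])))
    return counts


# Usage with the topology above:
# usage = arc_usage(j4['objects']['va_combined']['geometries'])
# print(usage.most_common(5))
```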
This means that even complex maps can remain small, provided that arcs are intelligently indexed and shared.
🍽️ Final Bite
TopoJSON provides a powerful and efficient way to compress geographic data. With some care in managing arc indices and structure, it also supports incremental rendering and lazy loading, which is crucial for large-scale interactive maps.