Exploring and Splitting TopoJSONs

🥟 Appetizer

A client was interested in visualizing national geographic datasets, but the current software package struggled to display all that information at once without crashing or slowing down.

🍲 Main Dish

Exploring TopoJSON for Scalable Visualization: To improve scalability, I explored the feasibility of using TopoJSON, a more compact format than GeoJSON. The goal was to reduce file size and support incremental loading of geographic data.

I successfully converted a GeoJSON file to TopoJSON using:

# Creating a TopoJSON file from a GeoJSON
geo2topo 01001_bgct.geojson > 01001_bgct.topojson

This conversion achieved approximately a 75% file size reduction for a dataset in Virginia:

# Size comparison
size_reduction = (128.7 - 30.9) / 128.7
# ~75.99%

Understanding TopoJSON Structure

TopoJSON encodes geometry using shared arc indexes. Each shape (like a polygon) references one or more arcs by index. These arcs are stored globally, which allows different shapes to reuse the same arc definitions.

However, this means a single TopoJSON file is not always portable across datasets—unless arc indices are coordinated globally.

Incremental Loading: Is It Possible?

Yes—with caveats. Incremental loading works as long as:

You define a shared global arc file up front. You retain empty arc containers for non-visible regions. You lazily fill in arc values as needed by resolution or viewport. Example of how arc references look:

# What do arc references look like?
{
    'type': 'Topology',
    'objects': {
    '01001_bgct': {
        'type': 'GeometryCollection',
        'geometries': [{
        'type': 'Polygon',
        'arcs': [[-2424, 149, -2423, 151, 145, 3280, -2511, 2694, 2487, -626, 2488,
                    -1396, 3196, 3197, -2344, -1301, 3064, 3073, 3506, 3510, 3512, 3515, ...]]
        }]
    }
    },
    'bbox': [-86.921237, 32.307574, -86.411172, 32.708213]
}

We can use the bounding box (bbox) field to write a function that loads only the data within a visible viewport.

Safe Removal of Unused Arcs

Even if some arcs are not defined (i.e., are empty), the TopoJSON file will still render:

# Unused arcs still render
req_arcs = [abs(i) for i in test_remainder[0]['arcs'][0]]

for i in range(len(j3['arcs'])):
    if i not in req_arcs:
        j3['arcs'][i] = []  # Empty values for unused arcs, fill in later

with open('../data/testing/01001_bgct_mod.topojson', 'w') as f:
    json.dump(j3, f)
os.path.isfile('../data/testing/01001_bgct_mod.topojson')

✅ A TopoJSON file remains valid even if some arc arrays are empty. This makes progressive loading feasible.

🍵 Aftertaste

Arc Reuse: Why TopoJSON is Compact

TopoJSON achieves its size reduction by reusing arcs. In real datasets, a single arc may be referenced up to 4 times across different geometries:

# Counting arc use across geometries
from collections import Counter
from tqdm import tqdm

c = Counter()
for geo in tqdm(j4['objects']['va_combined']['geometries']):
    li = geo['arcs']
    flatten = lambda l: sum(map(flatten, l), []) if isinstance(l, list) else [l]
    c += Counter(flatten(li))

'''
Counter({
    0: 4, 1: 4, 2: 4, ..., 924: 3, 925: 3, ...
})
'''

This means that even complex maps can remain small, provided that arcs are intelligently indexed and shared.

🍽️ Final Bite

TopoJSON provides a powerful and efficient way to compress geographic data. With some care in managing arc indices and structure, it also supports incremental rendering and lazy loading, which is crucial for large-scale interactive maps.