City Contributor
Crowdsourced City Data
City Contributor offers a simple, proof-of-concept framework for crowdsourced data archiving aimed at municipalities. The solution consists of a Python FastAPI backend that stores datasets locally and a React frontend through which city administrators upload and manage data and contributors download and mirror datasets. Once five verified mirrors exist, the original city-owned copy is automatically removed to save storage costs, relying on a distributed network of community volunteers for data preservation.
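The five-mirror rule could be sketched roughly as follows. This is a minimal illustration rather than the project's actual code: the `Dataset` class, the `register_verified_mirror` function, and the `MIRROR_THRESHOLD` constant are hypothetical names, and the sketch only clears the stored path where a real server would also delete the file from disk.

```python
from dataclasses import dataclass, field
from typing import Optional, Set

MIRROR_THRESHOLD = 5  # number of verified mirrors required before the city copy is dropped


@dataclass
class Dataset:
    """Hypothetical record the city server keeps for each dataset."""
    name: str
    original_hash: str
    local_path: Optional[str]  # None once the city-owned copy has been removed
    verified_mirrors: Set[str] = field(default_factory=set)


def register_verified_mirror(dataset: Dataset, mirror_url: str) -> bool:
    """Record a mirror that has already passed hash verification.

    Returns True if this registration caused the city copy to be released.
    """
    dataset.verified_mirrors.add(mirror_url)  # a set, so duplicate URLs count once
    if dataset.local_path is not None and len(dataset.verified_mirrors) >= MIRROR_THRESHOLD:
        # Assumption: a real server would delete the file from disk at this point.
        dataset.local_path = None
        return True
    return False
```

Because the original hash stays on the record after the local copy is released, later downloads and re-checks can still be validated against it.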
Why It Benefits the City
Many municipalities periodically delete older datasets to manage storage costs or adhere to specific retention policies. City Contributor provides an alternative: local data is removed only after enough volunteer contributors (seeders) have verified that they host the files themselves. This approach not only reduces costs but also creates distributed resilience: there is no central point of failure, and data can be quickly restored if a contributor’s link goes down or if more mirrors are needed.
Verifiability
To ensure city data is not tampered with, the city retains the original hash of every file it is still hosting or used to host. Contributors can verify the integrity of their copy by comparing the hash of their local file with the hash stored on the city server. If the hashes match, the file is considered valid. This approach prevents contributors from offering modified or corrupted data. Before a seeder link is added to the list of verified mirrors, the server automatically checks the hash of the file hosted by the contributor against the original hash.
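A contributor-side integrity check might look like the sketch below. The text does not name a hash algorithm, so SHA-256 is an assumption here, and the helper names `sha256_of_file` and `matches_original` are illustrative only.

```python
import hashlib
from pathlib import Path

CHUNK_SIZE = 1024 * 1024  # read 1 MiB at a time so large datasets are not loaded into memory at once


def sha256_of_file(path: Path) -> str:
    """Return the hex SHA-256 digest of the file at `path` (algorithm is an assumption)."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # iter() with a sentinel keeps calling fh.read until it returns b"" at end of file
        for chunk in iter(lambda: fh.read(CHUNK_SIZE), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_original(path: Path, original_hash: str) -> bool:
    """Compare a local copy against the hex hash published by the city server."""
    return sha256_of_file(path) == original_hash.strip().lower()
```

A contributor would call `matches_original(Path("dataset.csv"), published_hash)` before registering a mirror link, where `published_hash` is the value shown by the city server.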
Since the city portal also serves as a proxy for the distributed links to crowdsourced files, we also see an opportunity for the file to be downloaded (proxied) through a city server, with its hash validated, and then sent to the user only if the hash is confirmed. This way, users receive only data that matches the original, even if a mirror is operated by a malicious contributor.
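The proxy idea can be illustrated with a small sketch that tries each mirror in turn and returns the first payload whose hash matches. It assumes SHA-256 and files small enough to buffer in memory; `fetch_verified` and `fetch_over_http` are hypothetical names, and the `fetch` parameter exists so the download step can be swapped out, for example in tests.

```python
import hashlib
from typing import Callable, Iterable
from urllib.request import urlopen


def fetch_over_http(url: str) -> bytes:
    """Download the whole file into memory (fine for a demo, not for very large datasets)."""
    with urlopen(url, timeout=30) as response:
        return response.read()


def fetch_verified(
    mirror_urls: Iterable[str],
    original_hash: str,
    fetch: Callable[[str], bytes] = fetch_over_http,
) -> bytes:
    """Return the first mirror payload whose SHA-256 matches the city's stored hash."""
    for url in mirror_urls:
        try:
            payload = fetch(url)
        except OSError:
            # urllib's URLError and socket timeouts are OSError subclasses:
            # treat the mirror as unreachable and move on to the next one.
            continue
        if hashlib.sha256(payload).hexdigest() == original_hash:
            return payload
        # Hash mismatch: the copy is modified or corrupted, so it is never served.
    raise LookupError("no mirror returned a file matching the original hash")
```

In the FastAPI backend, a download endpoint could call such a function and return the bytes to the user, responding with an error when `LookupError` is raised.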
Future Extensions
1. Re-check Active Hosts: Over time, the city server can periodically confirm that contributor links remain valid. If the network of active hosts falls below five, the city can restore a local copy from the existing mirrors (verified by hashing or checksums).
2. Production Setup: While this demo uses a Python server storing data on disk, a more robust approach could employ serverless functions (AWS Lambda, Google Cloud Functions, etc.) tied to a fully managed database. We invite municipalities or developers to explore production deployments. Feel free to get in touch!
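The periodic re-check described in item 1 could look roughly like this. It is a sketch under assumptions: SHA-256 as the hash, a caller-supplied `fetch` function that downloads a mirror's file and raises `OSError` when the link is down, and the hypothetical names `recheck_mirrors` and `MIRROR_THRESHOLD`.

```python
import hashlib
from typing import Callable, Iterable, List, Tuple

MIRROR_THRESHOLD = 5  # below this many active mirrors, the city should restore a local copy


def recheck_mirrors(
    mirror_urls: Iterable[str],
    original_hash: str,
    fetch: Callable[[str], bytes],
) -> Tuple[List[str], bool]:
    """Return (mirrors that still serve a matching file, whether a restore is needed)."""
    active: List[str] = []
    for url in mirror_urls:
        try:
            payload = fetch(url)
        except OSError:
            continue  # link is down; do not count this mirror
        if hashlib.sha256(payload).hexdigest() == original_hash:
            active.append(url)
    return active, len(active) < MIRROR_THRESHOLD
```

When the second value is true, the server could download the file from any entry in the returned list, which has already been verified against the original hash, and store it locally again.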