Planning to Measure the Costs of Storing 1TB of Git Repos
I don't want to introduce labels, but I think it's a good idea to start a series on this blog about data hoarding and data collection.
Together with the principle of owning my own data, external data is also relevant: it's a resource for learning and reference.
And although the modern world depends on an abundance of data and ways to transfer it, it doesn't hurt to mirror a little of the interesting parts. "Nothing is so bad that it can't get worse."
That said, let's talk about the project.
As far as I know, the most reliable way to store data is as plain text, versioned in a repository, and luckily there are a lot of public repositories on the internet.
So the idea is to measure all the costs involved in cloning 1TB of public Git repositories in the shortest amount of time, covering the main resources spent on it (a rough measurement sketch follows the list):
- Human and computer time
- Network bandwidth
- Any extra problems (e.g., getting blocked)
- Anything else?
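To make those numbers concrete, here is a minimal sketch of how a single clone could be instrumented: it records wall-clock time and on-disk size of a bare mirror, from which a rough effective bandwidth (bytes / seconds) can be derived. The repository URL is just an example, and this is not the final tooling; the mirror size only approximates the bytes actually transferred, since Git may repack objects locally.

```python
import subprocess
import time
from pathlib import Path

def clone_and_measure(url: str, dest: str) -> dict:
    """Clone a repository and record wall-clock time and on-disk size."""
    start = time.monotonic()
    # --mirror keeps only the Git objects (no working tree), which is
    # closer to what a long-term archive actually needs to store.
    subprocess.run(["git", "clone", "--mirror", url, dest], check=True)
    elapsed = time.monotonic() - start
    size_bytes = sum(
        p.stat().st_size for p in Path(dest).rglob("*") if p.is_file()
    )
    return {"url": url, "seconds": elapsed, "bytes": size_bytes}

if __name__ == "__main__":
    # Example public repository; any clone URL would work here.
    print(clone_and_measure("https://github.com/git/git.git", "git.git"))
```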
The reason? To motivate myself to build a better offline "mirror of the internet" for myself, and to feed the motivating utopian projection of a future in ruins where things went wrong, but I still have some cards up my sleeve.
Figure 1: Cataclysm: Dark Days Ahead showing "Evac shelter computer". The game log shows "You mark the refugee center and the road that leads to it…"