We are changing our system. We have settled on git (but are open to alternatives), as long as we can self-host it on our own machines.
Specs
Must have
- hosted on premises
- reliable
- unlikely to be discontinued within the next 5+ years
- for a group of at least 20 people
Plus
- GUI / Windows integration
I’m aware this is the selfhost community, but for a company of 20 engineers, it is probably best to use something commercial in the cloud.
The biggest pain point was for our ops guy, who constantly had to stay late to perform upgrades and maintenance, as they couldn’t do it during business hours when the engineers are working. With a team of at least 20, scheduling downtime could get increasingly difficult.
It also adds an entire system to be audited by the auditors.
The self-host vs. buy-commercial question kind of bounces back and forth. For smaller teams, fewer than 5 to 10 engineers, it might be a fun endeavour; but from that point on, until you get to mega-corp scale with a dedicated ops department maintaining your entire infrastructure, it is probably more effective to just pay for a solution from a major vendor in the cloud instead.
Git should be able to go down during the day. Worst case you just can’t push to origin for a little while. You can still work and commit locally.
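To make that concrete, here is a rough sketch of what an outage looks like from a developer's machine. It is only an illustration: the bare repository in a temp directory stands in for the self-hosted server, the outage is faked by renaming it, and all paths, names, and messages are made up.

```shell
#!/bin/sh
# Sketch: local commits keep working while the shared remote is unreachable.
# Every path, name, and message here is hypothetical.
set -eu

work=$(mktemp -d)

# A bare repository plays the role of the self-hosted server ("origin").
git init --quiet --bare "$work/origin.git" 2>/dev/null
git clone --quiet "$work/origin.git" "$work/clone" 2>/dev/null
cd "$work/clone"
git config user.name "Example Dev"
git config user.email "dev@example.invalid"

# Fake the outage by making the remote path disappear.
mv "$work/origin.git" "$work/origin.down"

# Committing only touches the local repository, so it still succeeds.
echo "fix" > notes.txt
git add notes.txt
git commit --quiet -m "Work done during the outage"

# Pushing needs the remote, so this attempt is expected to fail.
if git push --quiet origin HEAD 2>/dev/null; then
    push_during_outage=ok
else
    push_during_outage=failed
fi

# The server comes back; the commit made in the meantime is pushed as usual.
mv "$work/origin.down" "$work/origin.git"
git push --quiet origin HEAD
pushed=$(git --git-dir="$work/origin.git" rev-list --count --all)

echo "push during outage: $push_during_outage"
echo "commits on origin afterwards: $pushed"
```

The only step that depends on the server is the push; everything else (staging, committing, branching) happens in the local clone and can simply be pushed once origin is reachable again.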
No PRs means no automated tests/CI/CD, which means you’d slow down the release train. It might typically be just a quick 2-minute cycle, but the one time it goes on for longer due to a botched update from upstream means you’re never going to do that again during business hours.
Eh, we’ve had our self-hosted GitHub go down for a couple of hours in the daytime, and it wasn’t a big deal. We have something like 60 engineers spread out across the globe, about 15-20 of whom were directly impacted by the outage (the rest were in different timezones). Yeah, it was annoying, but each engineer only creates like 1 or 2 PRs in a given day, so they posted their PRs after the outage was resolved while working on something else. Yeah, PRs were delayed by a couple of hours, but the actual flow of work didn’t change; we just had more stuff get posted all at once after the problem was resolved.
In fact, GitHub would have to be out for 2 days straight before it started actually impacting delivery. An hour or two here and there really isn’t an issue, especially if the team has advance notice (most of the hit to productivity is everyone trying to troubleshoot at the same time: is it my VPN? Did the wifi die? Etc.).