Lessons learned for large MongoDB databases

We are currently developing a system that aims to analyze all domains on the internet. This is a really challenging task and not something you finish in a few months. Besides lots of other problems, like discovering that many domains and parsing them in a reasonable amount of time, we also run a MongoDB cluster to store the analyzed information. Our database currently holds 200 GB split across two shards, but we expect it to grow to 1-2 TB of data.
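
As a small illustration of the setup described above, here is a hedged sketch of how a collection like ours could be spread across two shards from Python with pymongo. The host, database, collection, and shard key names are assumptions for the example, not our actual configuration.

```python
# Sketch: enable sharding for a domain-analysis database via a mongos router.
# Host, database, collection and key names are illustrative assumptions only.
from pymongo import MongoClient

# Connect to the mongos query router, not to an individual shard.
client = MongoClient("mongodb://mongos.example.com:27017")

# Allow the database to distribute its collections across the shards.
client.admin.command("enableSharding", "domain_analysis")

# Shard the collection on a hashed domain key so writes spread evenly
# across both shards instead of piling up on a single range.
client.admin.command(
    "shardCollection",
    "domain_analysis.domains",
    key={"domain": "hashed"},
)
```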

There are a lot of posts like this on the web, so I’m probably not telling you anything new (especially if you are a senior dev or DBA who just goes: “oh my god… I knew that 10 years ago, it’s the same with every database” :)), but I really wanted to share a few things that bugged me for quite a while.

How to do a deployment pipeline in Jenkins

On our current project we are aiming for the goal of continuous delivery. There are a lot of things you need to get right to achieve that, but one of the more important ones is a working deployment build pipeline, where a defined version of the source code is built and pushed through various stages. The later the stage, the more confidence you should have in your software, and the safer you should feel about actually deploying it to production when the business requests it. The whole process should be automatic, except for some stages that you might want to trigger manually.
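
To make the stage idea more concrete, here is a rough sketch (not our actual setup) that promotes one defined version through a chain of Jenkins jobs via Jenkins’ remote build API. The Jenkins URL, job names, parameter name, and credentials are hypothetical placeholders.

```python
# Sketch: push one defined version of the code through a chain of Jenkins jobs.
# URL, job names, parameter name and credentials are hypothetical placeholders.
# (A real pipeline would wait for each stage to go green before triggering the
# next, and may need CSRF crumb handling; both are omitted here for brevity.)
import requests

JENKINS_URL = "https://jenkins.example.com"
AUTH = ("deploy-user", "api-token")  # Jenkins user + API token

# Later stages give more confidence; the last one is only triggered on request.
AUTOMATIC_STAGES = ["commit-build", "acceptance-tests", "deploy-staging"]
MANUAL_STAGES = ["deploy-production"]


def trigger(job, version):
    """Trigger a parameterized Jenkins job for a specific version."""
    response = requests.post(
        f"{JENKINS_URL}/job/{job}/buildWithParameters",
        params={"VERSION": version},
        auth=AUTH,
    )
    response.raise_for_status()


def run_pipeline(version):
    # The same version flows through every stage instead of being rebuilt.
    for job in AUTOMATIC_STAGES:
        trigger(job, version)
    for job in MANUAL_STAGES:
        if input(f"Trigger {job} for {version}? [y/N] ").lower() == "y":
            trigger(job, version)


if __name__ == "__main__":
    run_pipeline("1.0.0-rc1")
```

Inside Jenkins itself this chaining is usually done with downstream job triggers or a pipeline plugin rather than an external script; the sketch only illustrates the promotion concept of pushing one version through successive stages.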

A really nice build server that supports build pipelines out of the box is the commercial Go build server from ThoughtWorks. It was built with all the ideas of CruiseControl (an open source build server partially developed by ThoughtWorks as well) in mind, but goes a step further and applies the latest real-life project experience on how to build high-quality software.