Everyone knows how to fork that interesting open source project, it’s simple and handy to do. What’s not so easy to do is to merge back a fork that has over the years taken on a life of its own and for many reasons has diverged drastically from the original repo.
This is a case study of an ongoing project we are doing with SWT XYGraph, a visualisation project that is now part of Eclipse Nebula. It is the story of a fork of SWT XYGraph maintained by Diamond Light Source, the UK’s national synchrotron. But mostly it is a story about the efforts to merge the fork, reduce technical debt, and work towards the goal of sharing software components for Science, a key goal of the Eclipse Science Working Group.
Know Your History
One of the first things in this project was to understand the history – spanning 8 years – of the fork. We knew the Diamond fork was done before SWT XYGraph became part of Nebula and under the Eclipse Foundation umbrella. The fork was made in order to quickly add in a number of new features that required some fundamental architectural changes to the code base.
However on looking through the history, we found there were more than just 2 forks involved. The original project had been developed as part of Control System Studio (CSS) from Oakridge National Labs. CSS had in turn been forked by Diamond and customised for the local facility. Even though SWT XYGraph had been contributed to the Eclipse Nebula project, the original repo and many, many forks were still out there: more than enough forks for a dinner party. I can’t explain it any further in words so will dump our illegible working diagram of it all here:
Patches were pulled across and merged across forks when it was straightforward to do so. But with so many forks, this was a case where git history really mattered. Anywhere the history was preserved it was straightforward to track the origins of a specific feature – much harder in the cases where the history was lost. Git history is important, always worth some effort to preserve.
Choose Your Approach Carefully
Deciding if it worthwhile to merge a big fork takes some consideration. The biggest question to ask is: Are the architectural changes fundamentally resolvable? (Not like Chromium’s fork of Webkit – Blink). If that is a yes, then it’s a case of trading off the long-term benefits for the short term pain. In this case, Diamond knew it was something they wanted to do, more a matter of timing and picking the correct approach.
Together there seemed to be 2 main ways to tackle removing the fork that was part of a mature product in constant use at the scientific facility.
Option 1: Create a branch and work in parallel to get the branch working with upstream version, then merge the branch.
Option 2: Avoid a branch, but work to incrementally make the fork and upstream SWT XYGraph plug-ins identical, then make the switch over to the upstream version.
Option 1 had been tried before without success; there were too many moving parts and it created too much overhead, and ironically another fork to maintain. So it was clear this time Option 2 would be the way forward.
Tools are Your Friend
The incremental merging of the two needed to be done in a deliberate, reproducible manner to make it easier to trace back any issues coming up. Here are the tools that were useful in doing this.
1. Git Diff
The first step was to get an idea of the scale of the divergence, both quantitatively and qualitatively.
For quantity, a rough and ready measure was obtained by using git diff:
$ git diff --shorstat 399 files changed, 15648 insertions(+), 15368 deletions(-) $ git diff | wc -l 37874
2. Eclipse IDE’s JDT formatter
Next, we needed to remove diffs that were just down to formatting. For this using Eclipse IDE and the quick & easy formatting. Select “src” folder, choose Source menu -> Format. All code formatted to Eclipse standard in one go.
3. Merge Tools
Then it was time to dive into the differences and group them into features, separating quick fixes from changes that broke APIs. For this we used the free and open meld on Linux.
3. EGit Goodness
Let’s say we found a line of code different in the fork. To work out where the feature had come from, we could use ‘git blame‘ but much nicer is the eGit support in Eclipse IDE. Show annotations was regularly used to try to work out where that feature had come from, which fork it had been originally created on and then see if we could find any extra information such as bugzilla or JIRA tickets describing the feature. We were always grateful for code with good and helpful commit messages.
3. Bug Tracking Tools
In this case we were using two different bug trackers: Bugzilla on the Eclipse Nebula side of things and JIRA on the Diamond side of things. As part of the merge, we were contributing lots and lots of distinct features to Nebula, we had a parent issue: Bug 513865 to which we linked all the underlying fixes and features, aiming to keep each one distinct and standalone. At the time of writing that meant 21 dependent bugs.
4. Gerrits & Pull Requests
Gerrits were created for each bug for Eclipse Nebula. Pull requests were created for each change going to Diamond’s DAWN (over 50 to date). Each was reviewed before being committed back. In many cases we took the opportunity to tidy code up or enhance it with things like standalone examples that could be used to demonstrate the feature.
5. Github Built-in Graphs
It was also good to use the built in Github built in Graphs (on any repository click on ‘Graphs’ tab), first to see other forks out in the wild (Members tab):
Then the ‘Network’ tab to keep track of the relationship with those forks compared to the main Diamond fork:
Much nicer than our hand-drawn effort from earlier, though in this case not all the code being dealt with was in Github.
The work is ongoing and we are getting to the tricky parts – the key reasons the forks were created in the first place – to make fundamental changes to the architecture. This will require some conversations to understand the best way forward. Already with the work that has been done, there has been mutual benefits: Diamond get new features and bug fixes developed in the open source and Eclipse Nebula get new features and bug fixes developed at Diamond Light Source. The New & Noteworthy for Eclipse Nebula shows off screenshots of all the new features as a result of this merge.
Going forward this paves the way for Diamond to not only get rid of duplicate maintenance of >30,000 lines of Java code (according to cloc), but to contribute some significant features they have developed that integrate with SWT XYGraph. In doing so with the Eclipse Science Working Group it make a great environment to collaborate in open source and make advancements that benefit all involved.