What Black Duck Can Tell Us About GitHub, Language Fragmentation and More

Survival of the Forges

Two things have been self-evident to us at RedMonk for some time. First, that programming language and runtime adoption is becoming more diverse rather than less over time [coverage]. Second, that an increasing proportion of that deployed code is being hosted at GitHub [coverage]. Obvious as these conclusions may be to us, however, it is important that we test them at every opportunity, both to assure ourselves of their continued validity and to help build the case for parties less interested in developer trends and behaviors. Black Duck’s databases offer us one opportunity to do this.

In cooperation with Black Duck, then, we examined a subset of their commit history for a webinar this morning. Specifically, we evaluated the metrics from commits at four mainstream forges – CodePlex, GitHub, Google Code, and Sourceforge. From January through May, 2.1M commits were recorded across these four forges. This dataset offers the opportunity for indirect insight on developer behaviors, as it constitutes a high volume record of developer activity over a multi-month period across multiple properties.

To examine the question of runtime fragmentation, for example, we’d look at the proportion of commits per language across the dataset. If our hypothesis is correct, we’d expect to see limited variance between the different programming languages and programming language types, with no clearly dominant platform.

Which is, in fact, what we observe.

The total year-to-date commits recorded by Black Duck are not evenly distributed across languages but, with the exception of C# and Perl, are roughly comparable. When we examine the same data with the additional variable of target forge, we see similar diversity on display.

There are wider skews within the forge data and definite patterns, but the fragmentation of runtimes is nevertheless apparent for both language and repository.
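
For readers who want to try this kind of check themselves, here is a minimal sketch of the per-language proportion calculation described above. The records, the `(forge, language)` layout and the function names are all assumptions made for illustration; they are not Black Duck's data or schema.

```python
from collections import Counter

# Hypothetical commit records as (forge, language) pairs. These are made-up
# illustrative rows, not figures from the Black Duck dataset.
commits = [
    ("GitHub", "JavaScript"), ("GitHub", "Ruby"), ("GitHub", "Python"),
    ("GitHub", "Java"), ("GitHub", "C"),
    ("Sourceforge", "Java"), ("Sourceforge", "C++"), ("Sourceforge", "C"),
    ("Google Code", "Python"), ("CodePlex", "C#"),
]


def language_shares(records):
    """Return each language's fraction of all commits in `records`."""
    counts = Counter(language for _, language in records)
    total = sum(counts.values())
    return {language: n / total for language, n in counts.items()}


def largest_share(shares):
    """Return the (language, share) pair with the biggest share."""
    return max(shares.items(), key=lambda item: item[1])


# Shares across the whole dataset.
shares = language_shares(commits)
top_language, top_share = largest_share(shares)

# The same calculation restricted to one forge.
github_shares = language_shares([r for r in commits if r[0] == "GitHub"])
```

If no single language holds a large share, `top_share` stays small, which is the "no clearly dominant platform" pattern the hypothesis predicts.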

As important as the languages developers are choosing is how they leverage them, which in turn is a function of where they host their code. Our belief, as articulated above, is that GitHub is emerging as a massive center of gravity. This is, in our view, primarily attributable to the social coding approach advocated for and supported by GitHub. Based on the decentralized version control system Git, which makes branching and thereby forking sufficiently low overhead to incent the behavior [coverage], GitHub has changed the way that software is built in public, and attracted substantial attention as a result.

What we would expect to see from Black Duck’s data, then, would be a majority share of commits deployed to GitHub. Which, again, is what the data suggests, with GitHub’s share at 54.5%.

More interesting, however, is a consideration of the relative volume of commits against the backdrop of forge age.

GitHub’s substantial lead in commit volume is impressive in light of both its age and the nature of the competition. GitHub is by two years the youngest market player, and has overtaken competitive platforms built by both Google and Microsoft, as well as the namesake of the forge term itself.

What’s interesting is that the data indicates that traction behind GitHub appears to have come largely at the expense of Google Code. Sourceforge, while now a distant second in commit volume, remains solidly ahead of both CodePlex and Google Code. One explanation for this might be that GitHub is more likely to siphon off Google Code-type developers, as opposed to those that might turn to Sourceforge.

Digging deeper, we observe that GitHub is dominant with dynamic language commits.

Sourceforge, however, maintains a narrow edge with statically typed language assets.

In spite of this, both the first and second place repositories by commits exhibit substantial diversity amongst their overall commit volume.

Here is Sourceforge’s commit volume, placed by language on a stacked bar chart.

Its relative strengths in statically typed code – primarily C, C++ and Java – are evident. But so too is the relative spectrum of assets housed at Sourceforge.

GitHub, for its part, is observably strong in dynamic language adoption, including JavaScript, Python and Ruby. It also, however, indicates substantial volumes of commits in C, C++ and Java.
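
The dynamic versus static comparison above amounts to grouping commit counts by language class and by forge. A rough sketch of that grouping follows. The language-to-class mapping is our own assumption built from the languages named in this post, and the counts are invented for illustration; neither comes from the Black Duck data.

```python
from collections import defaultdict

# Assumed grouping, for illustration only. The post names these languages
# but does not publish the full classification used for the charts.
LANGUAGE_CLASS = {
    "JavaScript": "dynamic", "Python": "dynamic", "Ruby": "dynamic",
    "C": "static", "C++": "static", "Java": "static",
}

# Hypothetical (forge, language, commit_count) rows, not Black Duck's figures.
rows = [
    ("GitHub", "JavaScript", 30), ("GitHub", "Ruby", 25), ("GitHub", "Java", 15),
    ("Sourceforge", "Java", 20), ("Sourceforge", "C++", 18),
    ("Sourceforge", "Python", 5),
]


def commits_by_class(rows):
    """Sum commit counts per language class, then per forge."""
    totals = defaultdict(lambda: defaultdict(int))
    for forge, language, count in rows:
        language_class = LANGUAGE_CLASS.get(language)
        if language_class is None:
            continue  # skip languages the mapping does not cover
        totals[language_class][forge] += count
    return totals


def leading_forge(totals, language_class):
    """Return the forge with the most commits in the given class."""
    forges = totals[language_class]
    return max(forges, key=forges.get)


totals = commits_by_class(rows)
dynamic_leader = leading_forge(totals, "dynamic")
static_leader = leading_forge(totals, "static")
```

Changing the mapping, for example to compiled versus interpreted, only requires editing `LANGUAGE_CLASS`; the grouping logic stays the same.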

For the curious, here are the most popular languages by commit volume for each forge surveyed.

With the evidence suggesting that our assertions regarding runtime fragmentation and the importance of GitHub are correct, the logical question is what, in practical terms, this means.

First, heterogeneity is the new norm. Enterprises typically favor simple environments with fewer approved languages and runtimes to manage; the data indicates instead that developer preference for multiple languages is accelerating, with nearly ten languages showing substantial commit volumes. Enterprises can fight this tide, or embrace it. The outcome is not likely to differ substantially in either case, as the boundaries to technology procurement continue to erode.

Likewise, it is clear that GitHub should be core to your developer relations strategy. This is clear enough that large vendors such as VMware have begun leveraging GitHub as their primary external repository [coverage], both for the visibility it affords and the attendant developmental benefits. But that kind of usage is obvious; less so is the growing trend of using GitHub as a de facto development resume [example]. Rather than attempting to indirectly evaluate a candidate’s coding ability via artificial on-site problem sets, employers are increasingly evaluating the candidate’s work product itself, publicly available on GitHub. Be creative: there are many ways to leverage GitHub. Algorithmic recruitment is but one example.

At a high level, all of the above is further confirmation of our belief that developers are the new kingmakers [coverage]. There’s a reason that Linux, Apache, MySQL, dynamic languages and now GitHub have all become volume success stories. Those that understand that reason will enjoy a competitive advantage over those that do not.

Credit: All source data for the above graphics is courtesy of Black Duck.

Disclosure: Black Duck, CodePlex and GitHub are RedMonk clients. Google and Geeknet (Sourceforge) are not.

10 Comments

Categories: Programming Languages, Version Control.

Tags: blackduck, codeplex, forges, fragmentation, github, googlecode, language, sourceforge

By sogrady
June 2, 2011 at 5:06 pm

10 Responses

  1. Where’s Launchpad? It has to be bigger than Codeplex…

    RJ Ryan — June 3, 2011 @ 10:27 am
  2. @RJ Ryan: great question. we were operating off of the dataset that Black Duck had available, but i’ll check with them and see if they have Launchpad data as well.

    sogrady — June 3, 2011 @ 12:44 pm
  3. This post conflates interpreted/compiled languages and statically and dynamically typed languages. It looks like you really want the former.

    e.g. Perl is statically typed but interpreted.

    Kenny Root — June 3, 2011 @ 3:12 pm
  4. @Kenny Root: that’s absolutely fair. the distinction is probably better argued as compiled vs interpreted, given the nature of the exceptions such as PERL. thanks.

    sogrady — June 3, 2011 @ 5:37 pm

Continuing the Discussion

  1. […] is now the most-used software hosting service, RedMonk calculated. Between January and May of this year, 1.2 million contributions were posted on GitHub, versus […]

    ‘GitHub is populairder dan SourceForge’ » Clippy.be — June 3, 2011 @ 9:59 am
  2. […] the Redmonk analysis is quite interesting, I haven’t seen the actual data behind the BlackDuck study, nor have I […]

    GitHub, Collaboration, and Haters | Systemy biometryczne — June 3, 2011 @ 10:36 pm
  3. […] is now the most-used software hosting service, RedMonk calculated. Between January and May of this year, 1.2 million contributions were posted on GitHub, versus […]

    ‘GitHub is populairder dan SourceForge’ | Webtechnologie — June 5, 2011 @ 4:16 am
  4. […] What Black Duck Can Tell Us About GitHub, Language Fragmentation and More […]

    Github is Gaining Based on Proprietary Data | Techrights — June 6, 2011 @ 12:27 pm
  5. […] “What Black Duck Can Tell Us About GitHub, Language Fragmentation and More” by Stephen O’Grady (2011) […]

    Links collection about software forges: status, criticism and new ideas « Libre Software People's Front — August 24, 2011 @ 2:44 pm
  6. […] try to have some fun with statistics. From a recent presentation by Stephen O’Grady from Redmonk, Github’s growth is almost […]

    Chris Aniszczyk's (zx) diatribe » Apache and Politics Over Code — November 23, 2011 @ 3:23 am