Part I: The era of infinite code content đź“ş
Nearly a year has passed since the introduction of GPT-4. One of the public discussions we followed at the time was the one concerned with the impact LLMs would have on software development and, in particular, on jobs in this field. ChatGPT felt like magic and it was easy to extrapolate the impact of these technologies, even to the extent of rendering the software engineering profession redundant. One year later, it’s clear that LLMs are not replacing developers, but helping them focus less on the grunt work and more on the big picture.
This transition has already had a measurable impact on unit economics. For example, last year, Microsoft reported that developers using GitHub Copilot are 55% more productive than those not using it. Interestingly, they also noted that 40% of the code checked in by these developers was AI-generated and remained unmodified. Statistics like these suggest a new inflection point in the software economy and an impending shift in our path towards the future of work. Specifically, they prompt us to question, in an era where code is automatically generated, how do we measure the quality of software talent?
Computational Talent Validation
Talent validation inherently depends on social consensus. For developers, this consensus can be reached through various means like earning a diploma, interviewing for a job, or achieving a high score in LeetCode quizzes. However, with the accelerated production of software talent that LLMs have catalysed, it’s uncertain whether these traditional consensus-building methods can truly scale or remain relevant.
At Quira, we believe that in the era of Generative AI, scalable consensus can only be achieved through experience quantification. Under this approach, a developer’s skills and abilities are implicitly inferred through data and computation. Specifically, by semantically understanding the code and contribution metadata that developers generate when creating software. The evolving dynamics of open source are accelerating this potential and accentuating the need for this technology. Indeed, we are observing a shift in attitude towards open source, where a new generation of eager and ambitious developers are embracing it not only to uphold its ethos of community-driven development, but also as a means to validate their work socially.
However, even in this new dynamic, the same problems persist. When code can be automatically generated from natural language, analysis of code content alone is insufficient to achieve robust consensus. To automatically quantify experience, one must incorporate additional context regarding the relevance and merit of the software created. In other words, we need to understand the social proof of code and how it has been “validated” by other members in the network.
Measuring the social proof of OSS contributions
Today we’re excited to release a new version of DevRank, a metric that measures the social proof of open source contributions by quantifying reputation flows in the GitHub network. In a nutshell, DevRank models the ecosystem as a network of contributors and repositories linked together by a directed edge representing a reputation-inducing event on GitHub. The current version considers three types of edges.
- Developer → Repo, to represent events where a developer stars a repo.
- Repo → Developer, for events where a developer commits to a repository.
- Repo → Repo, to incorporate events where a repository imports a package or a dependency.
Considering all such edges within GitHub, we can construct a large directed network where reputation flows from one node into another through stargazer, commit, and import events.
By applying a customised version of the PageRank algorithm on this resulting graph we can compute the stationary state probabilities of a random walker in the network. These raw probabilities indicate the importance of a developer within the open source ecosystem in the same way that PageRank calculates the importance of a webpage on the internet. The higher the DevRank of a developer, the more their contributions have been socially validated by other developers through star, commit, and import events on GitHub. If you’re interested in the technical details of DevRank, you can read more about it in our documentation.
Rethinking open source status
One of the coolest applications of DevRank is that it gives us a new framework to think about status & reputation in open source, not only for individual developers, but also for communities. Developer reputation is already a valuable intangible that can have a positive financial impact on the lives of developers, say, by giving them access to work opportunities or, in some cases, making them eligible for grants or venture capital funding. For these reasons, quantifying reputation in a principled and transparent way is critical to foster a meritocratic ecosystem. Think, for example, of stargazer traction, which is widely used as a measure of the success of a project in the ecosystem and that, according to Wired, is now being artificially inflated and is no longer a reliable indication of traction.
Let’s look at how DevRank can change our perspective of success and status in open source.
Stargazers
Let’s start with stargazers, which are by far the most popular way to assess traction in open source. To do this, let’s first ask ourselves,
How would the list of most popular repos on GitHub change if instead of ranking them by number of stars, we ranked them by the DevRank of their stargazers?
It turns out that this adjustment would significantly alter the rankings. Certain repositories, such as FreeCodeCamp/freecodecamp, public-apis/public-apis, or jwasham/coding-interview-university would not only lose their place in the top 5 of the list… they wouldn’t even feature in the top 30! On the other hand, some repositories like facebook/react, twbs/bootstrap, or electron/electron would rise from the 10th, the 20th, and 30th, to the 1st, 3rd, and 4th place in the list. In other words, the list of top 10 repos would see a demotion of educational repos in favour of some popular development frameworks.
But we don’t need to stop there… We can go one level further and look at the repos that have been starred the most by developers in the top and bottom 10% (see table below). The results are qualitatively more pronounced and show that the top 10% tends to star repos focused on developer tooling, while the bottom 10% tends to star beginner resources and educational projects.
Contributor Growth
Another popular way to assess traction in open-source communities is contributor growth. Contributor growth is already a purer metric than GitHub stars because an increase in the contributor growth counter requires the submission and acceptance of a PR. However, similarly to stargazers, this metric is not resistant to noise. For example, some “first contribution repos” like firstcontributions/first-contributions trivialise merging requirements for PRs, making contribution events almost as frictionless as stargazing.
DevRank can help us filter out the noise and add context to these metrics. To illustrate this, consider the top repositories in the ecosystem ranked by the number of new contributors since January 2023. We can see that in the top 20 of the list, there are projects like firstcontributions/first-contributions, mouredev/retos-programacion-2023, Syknapse/Contribute-To-This-Project, which have very low merging requirements.
We can derive an alternative ranking if we restrict the counts to contributions by developers in the Top 1% of developers based on DevRank (table below). When doing this, we immediately see that 30% of places in the top 20 are replaced by top AI projects and developer tools like ggerganov/llama.cpp, run-llama/llama_index, langchain-ai/langchainjs.
Grudge matches
DevRank can also aid in determining a winner in classic rivalries between developer communities. For example, a frequently debated topic among AI researchers is whether PyTorch is better than Tensorflow. There are many non-violent ways to settle this matter, with our preferred option being to appeal to open-source data. Unfortunately, in this particular case, vanilla statistics provide limited insight and fail to give a definite conclusion: TensorFlow leads in the stargazer count with 180.8k stars, but PyTorch is imported by 265.9k repositories. To gain true insight into this question we need to increase the resolution of the data.
For example, we can use DevRank to understand which repo is preferred by developers with higher reputation. To do this, we can compute the median DevRank of the contributors of each dependent repository. The results are fascinating. The median DevRank of PyTorch users turns out to be 140.3, whereas for TensorFlow users, it stands at 56.9. Moreover, the total DevRank sum of repos that use PyTorch is 2.65 bn, while the DevRank sum of repos that import Tensorflow is 1.75 bn.
This experiment shows that even though PyTorch has significantly fewer stargazers than Tensorflow, it is generally used and preferred by repositories and developers with higher DevRank. Go PyTorch! 🔥
Outro
The future of developer work and education is evolving quickly and LLMs playing a significant role in accelerating this transformation. As the software engineering profession becomes democratised, the need to assess the quality of software talent in a scalable and systematic way becomes paramount. At Quira, we are leveraging DevRank along with other signals to algorithmically gauge the expertise of our users. This allows us to rank developers in our community across languages, frameworks, topics, and industries to ultimately match them to paid contribution opportunities in open source. Our technology is already helping a number of Commercial Open Source Organisations (COSS) receive PRs from top contributors outside of their orbits. If you’d like to use Quira to build a dynamic and sustainable community of high-quality contributors, drop us a line, we’d love to chat.
Alternatively, if you’re a developer looking to build or monetise your reputation in open-source, then log in to Quira and start your journey with us.