Will your PR ever be Merged?

Have you ever made an open source contribution? Whether your PR is rejected / ignored or successfully merged can depend on factors other than just the quality of your work. Some projects are just much more responsive, some are very picky about what is accepted and reject anything that does not match their vision.

I extracted Pull Request stats for 40 popular open source GitHub projects to see how likely it is for a PR to ever be merged. In this post you will find contributions to which projects are the best use of your time. Spoiler: some big mature projects do better in this ranking than you would think!

How I chose the projects to analyse

I went with Python, Julia, R and JavaScript, each having 10 repos in the ranking. The aim was to collect a somewhat representative sample of big popular projects among these 4 languages, hence you will see here mostly big names such as React, TensorFlow and Shiny. I am personally interested in broadly understood Data Science so my ranking skews more to this side, since these are the projects I would be more likely to contribute myself. However an open source ranking would not be complete without popular web frameworks in Python (Django, Flask, FastAPI) and some of the most popular js frontend tech (React, Vue), since web development projects are the most active ones on GitHub.

What did I look at?

To answer the question: how likely is it that an average PR gets merged into given repo I looked at proportion of Merged vs closed without merge PRs. I also collected data about stale PRs (open for longer than 90 days) and currently active ones (open but not older than 90 days).

How did I gather the data?

This is an interesting one! At first I used GitHub’s REST API and while it was good enough for extracting data from smaller repositories I quickly hit rate-limiting (5k requests per hour) and also got bored of waiting. Fortunately GitHub offers also a GraphQL API which for this specific purpose is orders of magnitude more efficient! Why is that? For my analysis I need just a few fields from all PRs in a given repo. I used PyGithub lib to fetch the data, fetching PRs seems to incur one HTTP request per paginated result (max 25 entries) which is what I expected but then for fetching the merged status field it had to perform one more HTTP request per PR. As you can imagine that slowed down the execution to a crawl so I had to find another solution, GraphQL performed just one HTTP request per 100 entires and dropped execution time from above an hour for a big repo like react to just around a minute. Have a look at my extraction scripts (both GraphQL and REST) on my GitHub.

Results

OK enough about the data collection. You are probably just curious which projects are most likely to ghost you and your hard work. Let get down to it.

JavaScript Projects Are Selective

First have a look at the list of JS projects I analysed:

For each of them I report the number of successful (merged) PRs, rejected (closed but not merged), stale (open for longer than 90 days) and active (open and less than 90 days old).

Express and Node are most likely to reject a PR

Express

Node

Material-ui and webpack are most likely to merge your PR among big JS projects

Material-ui

Webpack

React, while quite selective is more likely to merge than I thought such a big project would be

React

Python Projects Merge PRs more often than JS

Lets have a look at the list of Python projects I analysed:

For each of them I report the number of successful (merged) PRs, rejected (closed but not merged), stale (open for longer than 90 days) and active (open and less than 90 days old).

Big old web projects like Django and Flask are significantly more selective, the chance for merging is still higher than in JS equivalents

Django

Flask

Data related projects merge most of their PRs

Pandas

Scikit-learn

DS Notebooks

Tensorflow is more welcoming to contributions than I assumed ..

3blue1brown and his popular manim visualisation lib stand out

Manim was open sourced by popular math Youtuber 3blue1brown not so long ago. It seems that there aren’t enough maintainers to process deluge of PRs coming from the community. Some help here maybe?

Julia’s projects are under active development and welcome contributions

Lets have a look at the list of Julia projects I analysed:

For each of them I report the number of successful (merged) PRs, rejected (closed but not merged), stale (open for longer than 90 days) and active (open and less than 90 days old).

Roughly all Julia’s repos follow similar pattern — your time is probably well spent there!

JuMP

Pluto

DifferentialEquations

PyCall

Plots.jl

JuliaLang

IJulia

DataFrames

JuliaTutorials

Gadfly

R Also Has some of the most hospitable repos

R projects I analysed:

For each of them I report the number of successful (merged) PRs, rejected (closed but not merged), stale (open for longer than 90 days) and active (open and less than 90 days old).

Knitr

ggplot2

dplyr

shiny

rmarkdown

plotly

DiagrammeR

devtools

MLR

r4ds

Conclusions

Based on this limited analysis (it is in the end just a ratio of merged vs other PRs) it seems that big JavaScript projects are less good of an investment of your time when it comes to making a PR. Julia with its rapidly developing ecosystem is quite an attractive target for contributions however!

Would you like to get a PR stats plot like above for some other repo? Use my script.

Follow me on twitter for more content like this!

Originally published at https://pzakrzewski.com.

Solution Architect at Plotwise. Solving Last-Mile Delivery with Tech. The Netherlands.