Analyzing Open Source development (part 3)

In my last post about analyzing open source development, I mentioned that this one would be about massaging contributor data to obtain unique identities for all project contributors.

But before that, I would like to explore something different: how can I get data from multiple repositories? For example, what if I want data from all the repositories of a GitHub organization or user?

The obvious answer would be:
1. Let’s get the list of repositories:


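import requests

def github_git_repositories(orgName):
    """Return the clone URLs of all repositories in a GitHub organization."""
    query = "org:{}".format(orgName)
    per_page = 100  # maximum page size allowed by the Search API
    page = 1
    repos = []

    while True:
        r = requests.get('https://api.github.com/search/repositories',
                         params={'q': query, 'page': page, 'per_page': per_page})
        r.raise_for_status()  # e.g. rate limiting, instead of a KeyError below
        items = r.json()['items']
        for item in items:
            repos.append(item['clone_url'])
        # A short (or empty) page means there are no more results.
        # The Search API only returns the first 1000 results of a query.
        if len(items) < per_page or page * per_page >= 1000:
            break
        page += 1

    return repos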

2. Now, for each repository in the list, run the code from the previous post to get a dataframe, and concatenate them all with:


import pandas as pd

# dataframes: list with one dataframe per repository
df = pd.concat(dataframes)
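Putting both steps together, a sequential version could look like the sketch below. `organization_dataframe` and `repository_dataframe` are hypothetical names: the latter stands in for the per-repository code from the previous post, assumed here to be any function that takes a clone URL and returns a pandas DataFrame.

```python
import pandas as pd

def organization_dataframe(repo_urls, repository_dataframe):
    """Build one dataframe for all repositories, processing them one at a time.

    repository_dataframe is a hypothetical stand-in for the per-repository
    code from the previous post: clone URL in, pandas DataFrame out.
    """
    dataframes = []
    for url in repo_urls:
        # assign() returns a copy with an extra column recording the origin
        dataframes.append(repository_dataframe(url).assign(repository=url))
    if not dataframes:
        # pd.concat raises ValueError when given an empty list
        return pd.DataFrame()
    # ignore_index=True renumbers rows instead of repeating each frame's index
    return pd.concat(dataframes, ignore_index=True)
```

Each repository is only processed once the previous one has finished, which is what makes this approach slow for large organizations.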

For organizations or users with only a few repositories, this would work. But for those with hundreds of repositories, how long would it take to fetch and extract the information one repository at a time?
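At that scale, even step 1 may fall short: the GitHub Search API returns at most 1,000 results per query. A minimal alternative sketch, assuming the REST endpoints `GET /orgs/{org}/repos` and `GET /users/{username}/repos` instead of search, follows the pagination links GitHub sends in the `Link` header (parsed by requests into `Response.links`). `list_clone_urls` is a hypothetical name, and its `get` parameter only exists so the HTTP call can be replaced in tests.

```python
def list_clone_urls(owner, kind='orgs', get=None):
    """Return clone URLs for every repository of a GitHub org or user.

    kind is 'orgs' or 'users', matching /orgs/{org}/repos and
    /users/{username}/repos. get defaults to requests.get.
    """
    if get is None:
        import requests
        get = requests.get
    url = 'https://api.github.com/{}/{}/repos'.format(kind, owner)
    params = {'per_page': 100}  # maximum page size for these endpoints
    urls = []
    while url:
        r = get(url, params=params)
        r.raise_for_status()
        urls.extend(repo['clone_url'] for repo in r.json())
        # The 'next' link is absent on the last page, which ends the loop
        url = r.links.get('next', {}).get('url')
        params = None  # the 'next' URL already carries the query string
    return urls
```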

Is there a faster approach? Let’s play with threads and queues…
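As a minimal sketch of the general idea (not necessarily the implementation developed in the full post), a `queue.Queue` of clone URLs can be shared by a pool of worker threads. `analyze_all` and `analyze` are hypothetical names; `analyze` stands for whatever per-repository work is done, and threads should help mainly because that work spends much of its time waiting on network and git I/O.

```python
import queue
import threading

def analyze_all(repo_urls, analyze, num_workers=8):
    """Run analyze(url) for every URL using a pool of worker threads.

    Returns a dict mapping each URL to its result, or to the exception
    raised while analyzing it.
    """
    tasks = queue.Queue()
    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            url = tasks.get()
            if url is None:  # sentinel: no more work for this thread
                tasks.task_done()
                break
            try:
                result = analyze(url)
            except Exception as exc:  # keep the worker alive, record the failure
                result = exc
            with lock:
                results[url] = result
            tasks.task_done()

    threads = [threading.Thread(target=worker, daemon=True) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for url in repo_urls:
        tasks.put(url)
    for _ in threads:
        tasks.put(None)  # one sentinel per worker
    tasks.join()  # blocks until every queued item has been marked done
    for t in threads:
        t.join()
    return results
```

Catching exceptions inside the worker matters: if a thread died on an error, its remaining queue items would never be marked done and `tasks.join()` could block forever.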

Last month conferences: PyConES, LibreCon, Open Source Summit Europe and Liferay Spain Symposium 2017

Jono Bacon at Open Source Summit showing a photo of me!

This post was initially intended to share my thoughts about PyConES 2017, but it’s been a stressful month of events. So it also covers my experiences at the other conferences I’ve recently attended, participated in, or spoken at: LibreCon, Open Source Summit Europe and Liferay Spain Symposium.

Let’s start the reviews…

PyConES 2017


Analyzing Open Source development (part 1)

A simple analysis of open source development in public administrations can be done quite easily. This post describes the initial steps needed to obtain the results shown in the previous post.

We’ll learn how to use Perceval. It’s the tool responsible for data retrieval in GrimoireLab, the free, open source software framework for software development analytics.
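To give a taste of what’s coming, here is a minimal sketch based on Perceval’s documented Python usage (`Git(uri=..., gitpath=...)` and `fetch()`). The helper names `commits_to_rows` and `fetch_commit_rows` are hypothetical, and the item fields used (`origin`, plus `commit`, `Author` and `AuthorDate` under `data`) are assumptions about the shape of Perceval’s git items.

```python
def commits_to_rows(items):
    """Flatten Perceval git items into simple dicts (hypothetical helper).

    Assumes each item has an 'origin' (the repository URI) and a 'data'
    dict with 'commit', 'Author' and 'AuthorDate' keys.
    """
    rows = []
    for item in items:
        data = item['data']
        rows.append({
            'repository': item['origin'],
            'hash': data['commit'],
            'author': data['Author'],
            'date': data['AuthorDate'],
        })
    return rows


def fetch_commit_rows(repo_url, repo_dir):
    """Clone repo_url into repo_dir with Perceval and return one row per commit."""
    # Imported here so commits_to_rows can be used without Perceval installed
    from perceval.backends.core.git import Git
    repo = Git(uri=repo_url, gitpath=repo_dir)
    return commits_to_rows(repo.fetch())
```

For example, `fetch_commit_rows('https://github.com/chaoss/grimoirelab-perceval.git', '/tmp/perceval.git')` would clone Perceval’s own repository into `/tmp` (an arbitrary choice of path) and return its commits.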

Grab some coffee or tea, and let’s start!

Laptop, notebook and coffee
