How many JavaScript framework developers are involved in web standards? Let’s use cauldron.io

During one of the episodes of “The Web is The Platform” podcast (in Spanish), a question came up: how involved are JavaScript framework and library developers in the development of Web-related standards? I immediately thought it could be answered with cauldron.io. Let’s try …

First step: let’s frame the question

The original question is vague and too general: “how involved are JavaScript framework and library developers in Web-related standards development?” From a quick read, you could start wondering: who are the JavaScript developers? Which JavaScript frameworks and libraries? Which Web-related standards?

My recommendation for this type of question is to start with something simple and easy to measure, to see whether we can get a meaningful answer. For example:

“How many developers who have written code for top JavaScript frameworks and libraries, like Angular, React, and Vue, have also written code for the HTML standard or the ECMAScript specification?”

Second step: let’s go for the data

From the question, it is clear that we must gather data from the Angular, React, Vue, HTML standard, and ECMAScript specification repositories.

In cauldron.io, add a new project, give it a name, and start adding GitHub repositories as data sources:

Cauldron project definition steps
Cauldron.io project definition
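
For reference, the canonical GitHub repositories for each of these frameworks and specs would be the natural data sources to add (the exact set used for this analysis may differ):

https://github.com/angular/angular
https://github.com/facebook/react
https://github.com/vuejs/vue
https://github.com/whatwg/html
https://github.com/tc39/ecma262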

Third step: Play with the data

Once the data is gathered, cauldron.io provides some basic charts for the aggregated data source repositories. Of course, you can share them publicly and use them as initial basic information. The Cauldron team is working on ideas to provide more meaningful metrics in these tabs; if you have any idea or request, feel free to post it in the Cauldron Community feedback section.

Cauldron.io default metrics overview

But these metrics and charts don’t answer the question we have. How can I get the answer I am looking for?

Easy! At the top of the page there is a Data Workspace button that takes you to a dedicated Kibana instance based on the Open Distro open source project.

With it, you have access to the data processed by GrimoireLab, the core analytics tool in cauldron.io, to produce the metrics you’ve seen in the general overview. You can read the Kibana documentation created by Elastic to learn more about the available visualizations and how to use them. Let’s create a metric visualization for the authors who have written code in these repositories:

  1. Select “Visualize” in the side bar menu, and click on the “Create a new visualization” option
  2. Select “Metric”, and choose “git” as the source. (It is one of the indexes created by GrimoireLab for cauldron.io.)
  3. By default the metric shows the count of items in the index (commits), so we need to change the metric to the “unique_count” of “author_uuid” (a rough equivalent query is sketched below). What is this “author_uuid”? Cauldron doesn’t store any personal data (like the author’s name and email). Of course, that data is “public”, but the team is concerned about how people could use it, and they take people’s privacy seriously. So this information is anonymized, and a unique count (“unique_count”) of these generated unique ids (“author_uuid”) gives you the number of unique name and email combinations used by commit authors.
  4. If you want, adjust the time frame to a significant period. 1 year is the default.
Creating a Kibana metric visualization in cauldron.io
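
For those curious about what Kibana does under the hood, “unique_count” is a cardinality aggregation. As a rough sketch, assuming you can reach the Elasticsearch endpoint behind the workspace directly (here called $ES_URL, a placeholder), the equivalent query against the git index would be something like:

$ curl -s -XPOST "$ES_URL/git/_search" -H 'Content-Type: application/json' -d '
{
  "size": 0,
  "aggs": {
    "unique_authors": { "cardinality": { "field": "author_uuid" } }
  }
}'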

OK, we already have an approximation of the number of people who have written code for the repositories we are analyzing. Now, to answer our question, we need to filter the data. For example:

  1. Create a filter to count the authors whose commits have the “repo_name” associated with the HTML standard
  2. Create another filter to count the authors whose commits have one of the “repo_name” values associated with Angular, Vue, or React (see the example queries below)
Creating filters for a metric
Metric filtering
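
In the Kibana search bar, those two filters can also be written as queries. Assuming the “repo_name” values store the repository URLs (the usual GrimoireLab convention; the exact values may differ), they would look something like this in Lucene query syntax:

repo_name:"https://github.com/whatwg/html"
repo_name:("https://github.com/angular/angular" OR "https://github.com/facebook/react" OR "https://github.com/vuejs/vue")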

That’s it, you’ve got your answer. Right? Well, no: that’s not the right way to get the answer!

Understanding the data you are playing with

We are using the git index, which contains information about all the commits in the selected repositories. You can see what it looks like in the Discover option in the side bar menu, selecting the git index.

Basically, it contains a set of items (one per commit) with several fields, "repo_name" being one of them. In the previous visualization, what we have done is:

  1. Count the number of unique "author_uuid" values that appear in that set of commits
  2. Filter those commits by a certain "repo_name"
  3. Filtering again with a different "repo_name" will always give zero, because one commit cannot be in several repositories at the same time (if there are no forks)

This is a common misunderstanding of how Kibana works. For me, it seemed logical to think that I was listing or counting the unique "author_uuid" values, so filtering by two different "repo_name" values should give me only the "author_uuid" values present in both repositories. The catch is that filters are applied to the items in the index (the commits), not to the things I am visualizing.
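
To see why, think of the two filters being combined at the commit level. A sketch of the resulting query (again, assuming repository URLs as "repo_name" values and direct access to the index) is a boolean filter that no single commit document can ever satisfy:

$ curl -s -XPOST "$ES_URL/git/_search?size=0" -H 'Content-Type: application/json' -d '
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "repo_name": "https://github.com/whatwg/html" } },
        { "term": { "repo_name": "https://github.com/facebook/react" } }
      ]
    }
  }
}'

Each commit has exactly one "repo_name", so the query matches zero commits, and any metric computed on top of it is zero too.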

How could I get the answer?

There might be better ways, but I’ve followed these steps:

First, I’ve created a table visualization that lists each "author_uuid" and the number of repositories they have committed code to.

List of authors generated unique ids visualization

I have saved it as "authors_list". Then, I have created a similar visualization to get the list of repositories.

List of git repositories visualization

I have saved it as "repos_list".
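
Both tables map to simple terms aggregations on the git index. For example, the "authors_list" table is roughly a terms aggregation on "author_uuid" with a cardinality sub-aggregation counting the distinct "repo_name" values per author (a sketch, with the same assumptions as before):

$ curl -s -XPOST "$ES_URL/git/_search?size=0" -H 'Content-Type: application/json' -d '
{
  "aggs": {
    "authors": {
      "terms": { "field": "author_uuid", "size": 500 },
      "aggs": {
        "repos": { "cardinality": { "field": "repo_name" } }
      }
    }
  }
}'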

After that, I have gone to the “Dashboard” option in the side bar menu and created a new dashboard, adding both visualizations to it:

Dashboard with “authors” and repositories

And now, we can start to play!

Getting answers

It seems there is a single “author” who has contributed to 4 different repositories during the last 10 years. If we click on it, the list of repositories gets filtered, and voilà! She has committed to react (3 commits), and to angular, ecma262, and html (1 commit to each one):

Filtering author that has participated in several repositories

We could check the rest of the people who have participated in more than 1 repository, to see if there is anyone else who has contributed to one of the specs and to one of the JavaScript frameworks. Here are some things I’ve discovered playing with the filters:

  • There is one person who has contributed to react (34 commits), vue (12 commits), and angular (6 commits)
  • There is one person who has contributed to ecma262 (114 commits), react (2 commits), and the html spec (1 commit)
  • There are several people who have contributed to 2 of the 3 JavaScript frameworks
  • There are people who have contributed to both the ecma262 and html specs (probably around 400)

Disclaimer and next actions

If you find any glitch in the data, or in the way it has been produced, feel free to comment and help me improve it.

If you find this interesting and would like to perform your own analysis, give cauldron.io a try to answer your own questions. If you have any feedback or comments, please let us know in the Cauldron community.

And of course, cauldron.io is 100% free, open source software, so feel free to submit any issue or request to the project! It’s built on Django, GrimoireLab, Open Distro for Elasticsearch, and Bokeh as its main dependencies.

Using your OBS output as input for your webcam in Debian

During the last few weeks I’ve been playing with Open Broadcaster Software (OBS) Studio, or simply OBS. It’s an awesome free, open source software project that, among many other things, allows you to build custom scenes to be used as input for video conferences. How can it be done in Linux, and more precisely in Debian?

About OBS

I discovered OBS a couple of months ago, while learning how to produce video content for the Juntos desde casa initiative. It allows the user to compose scenes with several sources (images, video camera input, mics, etc.) and to manage those scenes as if you were a video recording manager in a studio.

OBS Studio and scenes transition

It’s available for many platforms and operating systems, and in Debian installing it has been as easy as:

$ sudo apt install obs-studio

One of its cool features is that it allows direct streaming of the generated video to several services like YouTube or Twitch. It also allows you to record the output as a video file to share or stream later.

But would it be possible to use these scenes in a video conference call? I’ve discovered there are plugins to do it on certain operating systems, but in Linux (and in Debian in my case), it requires some work.

How to make it work?

I would like to thank Henning Jacobs for his detailed post about this topic. It has been very useful for me.

First you need to install v4l2loopback-dkms:

$ sudo apt install v4l2loopback-dkms

To activate it, I have followed Henning’s recommendations:

$ sudo modprobe v4l2loopback devices=1 video_nr=10 card_label="OBS Cam" exclusive_caps=1
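
Since video_nr=10, the module should create a /dev/video10 device. You can check that it is there (v4l2-ctl comes from the v4l-utils package, in case you want to install it):

$ ls -l /dev/video10
$ v4l2-ctl --list-devices

The “OBS Cam” label should show up in the device list.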

Next, you need to compile and install the obs-v4l2sink plugin, but first you might need cmake and libobs-dev. So, as usual in Debian:

$ sudo apt install cmake libobs-dev

Once installed, you need to follow these steps:

$ sudo apt install qtbase5-dev
$ git clone --recursive https://github.com/obsproject/obs-studio.git
$ git clone https://github.com/CatxFish/obs-v4l2sink.git
$ cd obs-v4l2sink
$ mkdir build && cd build
$ cmake -DLIBOBS_INCLUDE_DIR="../../obs-studio/libobs" -DCMAKE_INSTALL_PREFIX=/usr ..
$ make -j4
$ sudo make install

Sadly, it seems that the plugin file ends up in the wrong folder (/usr/lib/obs-plugins/), and you need to copy it to the right one:

$ sudo cp /usr/lib/obs-plugins/v4l2sink.so /usr/lib/x86_64-linux-gnu/obs-plugins/

Now, if you run OBS, under Tools you will have a new option: V4L2 Video Output. You need to set the path to the V4L2 device (remember the video_nr parameter used when activating the v4l2loopback module; in this case it is /dev/video10).

V4L2Sink plugin settings

And now, you should be able to choose your “OBS Cam” in any video conference tool. For example in Jitsi:

OBS output in Jitsi as webcam

If you notice that the output is mirrored or upside down, don’t worry. As far as I have tested, the rest of the participants see it correctly on their screens.

One of the cool features about using OBS this way is that you can give an internal talk or training, with nice scenes and transitions between cameras, and record it for later consumption:

If you find this article interesting, please let me know through the comments to this post. Thank you!

Transcribe a talk into a blog post

A couple of years ago, Diane Mueller shared with me her experience converting her talks into blog posts or other written content to reuse later. I’ve tried this approach with one of my latest talks, to easily turn it into a draft for a blog post. How was the experience?

Getting the audio from the talk

There are several ways to get the audio from your own talks. The one I’ve used the most is the audio recording functionality in my phone. No extra apps or hardware needed.

In other cases, if the talk is published on YouTube, you can download it with youtube-dl:

$ youtube-dl <YOUTUBE-VIDEO-URL>

In that case, to get the audio from the mp4 file you can use ffmpeg:

$ ffmpeg -i <VIDEO-FILE-NAME>.mp4 -f mp3 -ab 192000 -vn <AUDIO-FILE-NAME>.mp3

Getting the text from the talk

There are several speech-to-text services. Some are fully automatic, and some include partial or full human curation of the content. I’ve decided to try AWS Transcribe to test the outcome of AI transcription.

The process is not very straightforward, but it’s simple enough for me. First, you need to upload the file to an AWS S3 bucket. Once uploaded, copy the file address, because you’ll need it later. It’ll be something like:

s3://<S3-BUCKET-NAME>/<AUDIO-FILE-NAME>.mp3
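
You can upload it from the web console or, if you have the AWS CLI configured, with something like:

$ aws s3 cp <AUDIO-FILE-NAME>.mp3 s3://<S3-BUCKET-NAME>/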

In AWS Transcribe, create a new job by giving it a name and the S3 address of the file to transcribe:

AWS Transcribe job set up
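
The same job can also be created from the AWS CLI, if you prefer not to use the console (a sketch; the job name and language code are just examples):

$ aws transcribe start-transcription-job \
    --transcription-job-name <JOB-NAME> \
    --language-code en-US \
    --media-format mp3 \
    --media MediaFileUri=s3://<S3-BUCKET-NAME>/<AUDIO-FILE-NAME>.mp3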

There are some limitations. For example, it’s not possible to transcribe audio files longer than 4 hours. Well, I can speak a lot during a talk, but not that much before people fall asleep.

As an outcome, it produces a JSON file where one of the fields (transcript, inside the results.transcripts array) contains the transcription produced. You can see a preview on the job outcome page:

AWS Transcribe job outcome preview
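
If you download that JSON file, the transcription text can be pulled out with jq, following the results.transcripts structure described above:

$ jq -r '.results.transcripts[0].transcript' <TRANSCRIPTION-OUTPUT>.json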

Testing with a real talk

I’ve tried it with my talk about Open Source Program Offices (OSPO) from the last Southern California Linux Expo (SCaLE):

Of course, English is not my mother tongue, which could explain some of the big discrepancies between my talk and the generated transcription.

What do you think about these AI solutions? Have you already tried one? What is your experience? Please feel free to share your answers as comments to this post. Thank you!