Useful ffmpeg commands for editing video

For DebConf20, we are recommending that speakers pre-record the presentation part of their talks, and will have live Q&A. We had a smaller online MiniDebConf a couple of months ago, where for instance I had connectivity issues during my talk, so even though it feels too artificial, I guess pre-recording can decrease by a lot the likelihood of a given talk going bad.

Paul Gevers and I submitted a short 20 min talk giving an update on autopkgtest, ci.debian.net and friends. We will provide the latest updates on autopkgtest, autodep8, debci, ci.debian.net, and its integration with the Debian testing migration software, britney.

We agreed on a split of the content, each one recorded their part, and I offered to join them together. The logical chaining of the topics is such that we can't just concatenate the recordings, so we need to interlace our parts.

So I set out to do a full video editing work. I have done this before, although in a simpler way, for one of the MiniDebconfs we held in Curitiba. In that case, it was just cutting the noise at the beginning and the end of the recording, and adding beginning and finish screens with sponsors logos etc.

The first issue I noticed was that both our recordings had a decent amount of audio noise. To extract the audio track from the videos, I resorted to How can I extract audio from video with ffmpeg? on Stack Overflow:

ffmpeg -i input-video.avi -vn -acodec copy output-audio.aac

I then edited the audio with Audacity. I passed a noise reduction filter a couple of times, then a compressor filter to amplify my recording on mine, as Paul's already had a good volume. And those are my more advanced audio editing skills, which I acquired doing my own podcast.

I now realized I could have just muted the audio tracks from the original clip and align the noise-free audio with it, but I ended up creating new video files with the clean audio. Another member of the Stack Overflow family came to the rescue, in How to merge audio and video file in ffmpeg. To replace the audio stream, we can do something like this:

ffmpeg -i video.mp4 -i audio.wav -c:v copy -c:a aac -map 0:v:0 -map 1:a:0 output.mp4

Paul's recording had a 4:3 aspect ratio, while the requested format is 16:9. This late in the game, there was zero chance I would request him to redo the recording. So I decided to add those black bars on the side to make it the right aspect when showing full screen. And yet again the quickest answer I could find came from the Stack Overflow empire: ffmpeg: pillarbox 4:3 to 16:9:

ffmpeg -i "input43.mkv" -vf "scale=640x480,setsar=1,pad=854:480:107:0" [etc..]

The final editing was done with pitivi, which is what I have used before. I'm a very basic user, but I could do what I needed. It was basically splitting the clips at the right places, inserting the slides as images and aligning them with the video, and making most our video appear small in the corner when presenting the slides.

P.S.: all the command lines presented here are examples, basically copied from the linked Q&As, and have to be adapted to your actual input and output formats.

When community service providers fail

I'm starting a new blog, and instead of going into the technical details on how it's made, or on how I migrated my content from the previous ones, I want to focus on why I did it.

It's been a while since I have written a blog post. I wanted to get back into it, but also wanted to finally self-host my blog/homepage because I have been let down before. And sadly, I was not let down by a for-profit, privacy invading corporation, but by a free software organization.

The sad story of wiki.softwarelivre.org

My first blog that was hosted in a blog engine written by me, which was hosted in a TWiki, and later Foswiki instance previously available at wiki.softwarelivre.org, hosted by ASL.org.

I was the one who introduced the tool to the organization in the first place. I had come from a previous, very fruitful experience on the use of wikis for creation of educational material while in university, which ultimately led me to become a core TWiki, and then Foswiki developer.

In 2004, I had just moved to Porto Alegre, got involved in ASL.org, and there was a demand for a tool like that. 2 years later, I left Porto Alegre, and some time after that the daily operations of ASL.org when it became clear that it was not really prepared for remote participation. I was still maintaing the wiki software and the OS for quite some years after, until I wasn't anymore.

In 2016, the server that hosted it went haywire, and there were no backups. A lot of people and free software groups lost their content forever. My blog was the least important content in there. To mention just a few examples, here are some groups that lost their content in there:

  • The Brazilian Foswiki user group hosted a bunch of getting started documentation in Portuguese, organized meetings, and coordinated through there.
  • GNOME Brazil hosted its homepage there until the moment the server broke.
  • The Inkscape Brazil user group had an amazing space there where they shared tutorials, a gallery of user-contributed drawings, and a lot more.

Some of this can still be reached via the Internet Archive Wayback Machine, but that is only useful for recovering content, not for it to be used by the public.

The announced tragedy of softwarelivre.org

My next blog after that was hosted at softwarelivre.org, a Noosfero instance also hosted by ASL.org. When it was introduced in 2010, this Noosfero instance became responsible for the main domain softwarelivre.org name. This was a bold move by ASL.org, and was a demonstration of trust in a local free software project, led by a local free software cooperative (Colivre).

I was a lead developer in the Noosfero project for a long time, and I was also involved in maintaining that server as well.

However, for several years there is little to no investment in maintaining that service. I already expect that it will probably blow up at some point as the wiki did, or that it may be just shut down on purpose.

On the responsibility of organizations

Today, a large part of wast mot people consider "the internet" is controlled by a handful of corporations. Most popular services on the internet might look like they are gratis (free as in beer), but running those services is definitely not without costs. So when you use services provided by for-profit companies and are not paying for them with money, you are paying with your privacy and attention.

Society needs independent organizations to provide alternatives.

The market can solve a part of the problem by providing ethical services and charging for them. This is legitimate, and as long as there is transparency about how peoples' data and communications are handled, there is nothing wrong with it.

But that only solves part of the problem, as there will always be people who can't afford to pay, and people and communities who can afford to pay, but would rather rely on a nonprofit. That's where community-based services, provided by nonprofits, are also important. We should have more of them, not less.

So it makes me worry to realize ASL.org left the community in the dark. Losing the wiki wasn't even the first event of its kind, as the listas.softwarelivre.org mailing list server, with years and years of community communications archived in it, broke with no backups in 2012.

I do not intend to blame the ASL.org leadership personally, they are all well meaning and good people. But as an organization, it failed to recognize the importance of this role of service provider. I can even include myself in it: I was member of the ASL.org board some 15 years ago; I was involved in the deployment of both the wiki and Noosfero, the former as a volunteer and the later professionally. Yet, I did nothing to plan the maintenance of the infrastructure going forward.

When well meaning organizations fail, people who are not willing to have their data and communications be exploited for profit are left to their own devices. I can afford a virtual private server, and have the technical knowledge to import my old content into a static website generator, so I did it. But what about all the people who can't, or don't?

Of course, these organizations have to solve the challenge of being sustainable, and being able to pay professionals to maintain the services that the community relies on. We should be thankful to these organizations, and their leadership needs to recognize the importance of those services, and actively plan for them to be kept alive.

pristine-tar updates

Introduction

pristine-tar is a tool that is present in the workflow of a lot of Debian people. I adopted it last year after it has been orphaned by its creator Joey Hess. A little after that Tomasz Buchert joined me and we are now a functional two-person team.

pristine-tar goals are to import the content of a pristine upstream tarball into a VCS repository, and being able to later reconstruct that exact same tarball, bit by bit, based on the contents in the VCS, so we don’t have to store a full copy of that tarball. This is done by storing a binary delta files which can be used to reconstruct the original tarball from a tarball produced with the contents of the VCS. Ultimately, we want to make sure that the tarball that is uploaded to Debian is exactly the same as the one that has been downloaded from upstream, without having to keep a full copy of it around if all of its contents is already extracted in the VCS anyway.

The current state of the art, and perspectives for the future

pristine-tar solves a wicked problem, because our ability to reconstruct the original tarball is affected by changes in the behavior of tar and of all of the compression tools (gzip, bzip2, xz) and by what exact options were used when creating the original tarballs. Because of this, pristine-tar currently has a few embedded copies of old versions of compressors to be able to reconstruct tarballs produced by them, and also rely on a ever-evolving patch to tar that is been carried in Debian for a while.

So basically keeping pristine-tar working is a game of Whac-A-Mole. Joey provided a good summary of the situation when he orphaned pristine-tar.

Going forward, we may need to rely on other ways of ensuring integrity of upstream source code. That could take the form of signed git tags, signed uncompressed tarballs (so that the compression doesn’t matter), or maybe even a different system for storing actual tarballs. Debian bug #871806 contains an interesting discussion on this topic.

Recent improvements

Even if keeping pristine-tar useful in the long term will be hard, too much of Debian work currently relies on it, so we can’t just abandon it. Instead, we keep figuring out ways to improve. And I have good news: pristine-tar has recently received updates that improve the situation quite a bit.

In order to be able to understand how better we are getting at it, I created a "visualization of the regression test suite results. With the help of data from there, let’s look at the improvements made since pristine-tar 1.38, which was the version included in stretch.

pristine-tar 1.39: xdelta3 by default.

This was the first release made after the stretch release, and made xdelta3 the default delta generator for newly-imported tarballs. Existing tarballs with deltas produced by xdelta are still supported, this only affects new imports.

The support for having multiple delta generator was written by Tomasz, and was already there since 1.35, but we decided to only flip the switch after using xdelta3 was supported in a stable release.

pristine-tar 1.40: improved compression heuristics

pristine-tar uses a few heuristics to produce the smaller delta possible, and this includes trying different compression options. In the release Tomasz included a contribution by Lennart Sorensen to also try --gnu, which greatly improved the support for rsyncable gzip compressed files. We can see an example of the type of improvement we got in the regression test suite data for delta sizes for faad2_2.6.1.orig.tar.gz:

In 1.40, the delta produced from the test tarball faad2_2.6.1.orig.tar.gz went down from 800KB, almost the same size of tarball itself, to 6.8KB

pristine-tar 1.41: support for signatures

This release saw the addition of support for storage and retrieval of upstream signatures, contributed by Chris Lamb.

pristine-tar 1.42: optionally recompressing tarballs

I had this idea and wanted to try it out: most of our problems reproducing tarballs come from tarballs produced with old compressors, or from changes in compressor behavior, or from uncommon compression options being used. What if we could just recompress the tarballs before importing then? Yes, this kind of breaks the “pristine” bit of the whole business, but on the other hand, 1) the contents of the tarball are not affected, and 2) even if the initial tarball is not bit by bit the same that upstream release, at least future uploads of that same upstream version with Debian revisions can be regenerated just fine.

In some cases, as the case for the test tarball util-linux_2.30.1.orig.tar.xz, recompressing is what makes it possible to reproduce the tarball (and thus import it with pristine-tar) possible at all:

util-linux_2.30.1.orig.tar.xz can only be imported after being recompressed

In other cases, if the current heuristics can’t produce a reasonably small delta, recompressing makes a huge difference. It’s the case for mumble_1.1.8.orig.tar.gz:

with recompression, the delta produced from mumble_1.1.8.orig.tar.gz goes from 1.2MB, or 99% of the size to the original tarball, to 14.6KB, 1% of the size of original tarball

Recompressing is not enabled by default, and can be enabled by passing the --recompress option. If you are using pristine-tar via a wrapper tool like gbp-buildpackage, you can use the $PRISTINE_TAR environment variable to set options that will affect any pristine-tar invocations.

Also, even if you enable recompression, pristine-tar will only try it if the delta generations fails completely, of if the delta produced from the original tarball is too large. You can control what “too large” means by using the --recompress-threshold-bytes and --recompress-threshold-percent options. See the pristine-tar(1) manual page for details.

Debconf17

I’m back from Debconf17.

I gave a talk entitled “Patterns for Testing Debian Packages”, in which I presented a collection of 7 patterns I documented while pushing the Debian Continuous Integration project, and were published in a 2016 paper. Video recording and a copy of the slides are available.

I also hosted the ci/autopkgtest BoF session, in which we discussed issues around the usage of autopkgtest within Debian, the CI system, etc. Video recording is available.

Kudos for the Debconf video team for making the recordings available so quickly!

Papo Livre #1 - meios de comunicação

Acabamos de lançar mais um episódio do Papo Livre: #1 – meios de comunicação.

Neste episódio eu, Paulo Santana e Thiago Mendonça discutimos os diversos meios de comunicação em comunidades de software livre. A discussão começa pelos meios mais “antigos”, como IRC e listas de discussão e chega aos mais “modernos”, passo pelo meio livre e meio proprietário Telegram, e chega à mais nova promessa nessa área, Matrix (e o seu cliente mais famoso/viável, Riot).


For older posts, see the blog archive.