Getting to know GStreamer, Part 1
Background
My client generates hundreds of thousands of videos on behalf of its own clients, for delivery to those clients' end customers. Every video in a campaign shares the same audio/video track, but each recipient gets a custom subtitle track, which may be assigned a different font, x/y position, and a custom fade-in/fade-out effect.
For several years my client has used FFmpeg to composite the subtitles over the video with the required effects and then store the results on a content-delivery network (CDN). At roll-out time, my client sends a mass e-mailing to each of its client's end customers (respecting those who have opted out), containing a custom link that fetches and plays their video from the CDN.
Because of the enormous number of videos that must be generated for a client's end customers in each ad campaign, the process requires a significant lead time between quality assurance and release. If the client decides to change something at the last minute, we have to start the process all over again.
Between projects, I decided to see what I could do to streamline this process and cut down on the prep time and overhead. I had no prior experience in multimedia processing or delivery, but I wanted something with an API library that I could integrate into a custom web server to customize and deliver each video on demand. As far as I can tell, GStreamer and FFmpeg are the only open-source options for automated video processing with significant followings and community support.
For automation, FFmpeg provides the libav* libraries (libavcodec, libavformat, and so on) for integration into custom applications, but most of the examples I could find use the ffmpeg command-line tool, with processing configured through command-line options.
GStreamer
GStreamer, on the other hand, is highly modular. It consists of discrete elements with inputs and outputs called "pads" that serve as sources or sinks for each element, plus "bins": opaque packages composed of multiple elements that expose their own pad endpoints for simplicity. Developers compose these elements into pipelines using either a command-line tool or the GStreamer core API. GStreamer takes care of negotiating the actual connections by settling on compatible content formats, or it complains and quits if none can be found. Once assembled, pipelines are multi-threaded in a way that can be completely transparent to the application. Developers can insert queue elements at certain positions in the pipeline to buffer content and allow even more multithreading.
The command-line tool has a nifty syntax for describing pipelines and is convenient for testing them, but it is not recommended for a production environment. On the other hand, building a pipeline from code can be cumbersome and hard to follow later. Fortunately, the core API has a function, gst_parse_launch("..."), that parses this same syntax and returns a pipeline ready to operate, so you can enjoy the best of both worlds. There is also an API call that generates .dot files containing a graphical representation of the pipeline in its different states; it can be triggered in the command-line tool by setting an environment variable, but I have not gotten that to work yet.
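As a rough sketch of what that looks like in C (the pipeline description here is just a trivial test pipeline, not one of my client's real ones), gst_parse_launch() turns the textual syntax into a ready-made pipeline:

#include <gst/gst.h>

int main (int argc, char *argv[])
{
  GError *error = NULL;

  gst_init (&argc, &argv);

  /* Same syntax the command-line tool accepts: elements joined by "!" */
  GstElement *pipeline =
      gst_parse_launch ("videotestsrc num-buffers=100 ! autovideosink", &error);
  if (pipeline == NULL) {
    g_printerr ("Failed to build pipeline: %s\n", error->message);
    g_clear_error (&error);
    return 1;
  }

  gst_element_set_state (pipeline, GST_STATE_PLAYING);

  /* Wait for end-of-stream or an error before tearing down. */
  GstBus *bus = gst_element_get_bus (pipeline);
  GstMessage *msg = gst_bus_timed_pop_filtered (bus, GST_CLOCK_TIME_NONE,
      GST_MESSAGE_EOS | GST_MESSAGE_ERROR);
  if (msg != NULL)
    gst_message_unref (msg);

  gst_element_set_state (pipeline, GST_STATE_NULL);
  gst_object_unref (bus);
  gst_object_unref (pipeline);
  return 0;
}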
To me, GStreamer looked better suited to processing and delivering content on demand in a web-server environment. There is even an element, hlssink, that packages a pipeline's output as HLS segments and playlists for web delivery. Based on these considerations, I chose to find out what I could accomplish with GStreamer.
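For example, a pipeline description along these lines could be handed to gst_parse_launch() to produce an HLS playlist and segments under a web root. This is a hypothetical sketch; the element and property names are from my reading of the hlssink docs and worth double-checking with gst-inspect-1.0.

/* Hypothetical example: encode a test source to H.264, mux into MPEG-TS,
 * and let hlssink write the segment files plus the .m3u8 playlist. */
const gchar *hls_description =
    "videotestsrc is-live=true ! x264enc ! h264parse ! mpegtsmux ! "
    "hlssink location=/var/www/hls/segment%05d.ts "
    "playlist-location=/var/www/hls/playlist.m3u8 target-duration=5";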
Challenges
While GStreamer is highly developed and comprehensive, the documentation for individual elements can be rather terse, and it took me a while to get used to the GStreamer ecosystem. Each element's documentation contains some boilerplate, then describes the available pads, their capabilities in terms of content formats, the conditions under which they become available, and the configuration properties the element accepts. Building a successful pipeline requires, at a minimum, being able to select appropriate elements and connect their source pads to other elements' compatible sink pads.
Any documentation beyond that seems to be left entirely to each element's developers. Some elements have informative descriptions with short examples, but others are limited to curt one-liners. When I had questions the docs couldn't answer, I had to download and read the source code, and in some cases modify it to print additional debug and log messages to tell me what was going on.
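In C, that selecting-and-connecting step looks roughly like this (a toy source-to-sink pipeline, purely for illustration):

GstElement *pipeline = gst_pipeline_new ("example");
GstElement *src  = gst_element_factory_make ("videotestsrc", "src");
GstElement *sink = gst_element_factory_make ("autovideosink", "sink");

gst_bin_add_many (GST_BIN (pipeline), src, sink, NULL);

/* Requests a link from src's source pad to sink's sink pad;
 * fails if no compatible pads can be matched up. */
if (!gst_element_link (src, sink))
  g_printerr ("Elements could not be linked: no compatible pads\n");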
GStreamer's source code is written exclusively in C, in a highly consistent but sometimes elliptical style required for compatibility with the GNOME project's GObject layer, an object system whose introspection support provides bindings for several languages, including Python. GObject requires each compatible API to expose features that let tooling automatically discover API endpoints and generate documentation. Very cool indeed, but to make it work, the C code is chock-full of preprocessor macros that call other macros, and it can take a while to navigate without the help of a source-code navigation tool with auto-discovery features. Significant amounts of code are devoted to the boilerplate GObject needs and to painstaking resource management to avoid memory leaks.
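To give a flavor of that boilerplate, a skeleton element might contain something like the following. The type and function names are made up for illustration; real elements usually derive from more specific base classes such as GstBaseTransform.

#include <gst/gst.h>

typedef struct _GstMyFilter      { GstElement parent;            } GstMyFilter;
typedef struct _GstMyFilterClass { GstElementClass parent_class; } GstMyFilterClass;

/* Expands to gst_my_filter_get_type(), parent-class plumbing, and more,
 * so GObject can register and introspect the type at runtime. */
G_DEFINE_TYPE (GstMyFilter, gst_my_filter, GST_TYPE_ELEMENT);

static void
gst_my_filter_class_init (GstMyFilterClass * klass)
{
  /* install properties, pad templates, element metadata ... */
}

static void
gst_my_filter_init (GstMyFilter * self)
{
  /* per-instance initialization ... */
}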
Compatibility with GObject is also the reason for function signatures that look like this:
static void func(int arg1, int arg2, int *result);
instead of the more obvious
static int func(int arg1, int arg2);
Once you start using the GStreamer API from another language like Python, you'll forgive and thank the authors of the C code that makes that possible without someone having to write yet another compatibility library for each language.
Conclusion
Next, I will write about how I used GStreamer to composite custom subtitles with fade effects. While the implementation looks simple and straightforward in hindsight, the lack of documentation and useful examples made it anything but simple to work out, so I will share my experience for others in a subsequent post.