Getting to Know GStreamer, Part 2

Intro

In a previous post, I introduced the GStreamer multimedia processing framework and how I came to work with it. In this post, I'll go into more detail.

Running a simple pipeline from the command-line

This is all that's needed to demonstrate a simple media-processing pipeline:

gst-launch-1.0 videotestsrc ! video/x-raw,width=200,height=200 ! autovideosink

gst-launch is the command-line tool that parses and executes pipelines. The '!' symbol tells gst-launch to attempt to link the source pad of the element on its left to the sink pad of the element on its right.

videotestsrc generates a test video; by default it shows SMPTE color bars.

Its pattern property selects other test patterns.
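As a minimal sketch of that option: videotestsrc has a pattern property, and values such as snow and ball select other test images. The helper name pattern_pipeline is my own invention for illustration; it only builds the pipeline description so it can be inspected before being handed to gst-launch-1.0.

```shell
# Hypothetical helper: build a test-pattern pipeline description.
# "pattern" is a videotestsrc property; smpte (color bars) is its default.
pattern_pipeline() {
  printf 'videotestsrc pattern=%s ! video/x-raw,width=200,height=200 ! autovideosink' "${1:-smpte}"
}

# Print the description for the bouncing-ball pattern.
pattern_pipeline ball; echo

# To run it (needs GStreamer and a display):
#   gst-launch-1.0 $(pattern_pipeline ball)
```

Running gst-inspect-1.0 videotestsrc should list the pattern names your installation supports.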

video/x-raw,width=200,height=200

is an abbreviation for

capsfilter caps=video/x-raw,width=200,height=200

The capsfilter element does no processing itself. It restricts the formats (capabilities, or "caps") that may pass through it, so when GStreamer matches neighboring pads, it will only accept formats that comply with the "caps" property. The caps could include other fields such as framerate, or require a more specific video format.
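To make the extra fields concrete, here is a small sketch that adds a framerate to the caps. The helper name video_caps and its argument order are assumptions of mine, not anything GStreamer provides; the framerate is written as a fraction such as 30/1.

```shell
# Hypothetical helper: assemble a raw-video caps string.
# $1 = width, $2 = height, $3 = optional framerate as a fraction (e.g. 30/1)
video_caps() {
  caps="video/x-raw,width=$1,height=$2"
  if [ -n "${3:-}" ]; then
    caps="$caps,framerate=$3"
  fi
  printf '%s' "$caps"
}

# Print the caps string with a 30 frames-per-second constraint.
video_caps 200 200 30/1; echo

# To run it (needs GStreamer and a display):
#   gst-launch-1.0 videotestsrc ! "$(video_caps 200 200 30/1)" ! autovideosink
```

If an upstream element cannot produce a format that satisfies every field, the pipeline fails to negotiate rather than silently ignoring the filter.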

autovideosink is an element that will play the video stream on any compatible output device it can find. I ran the above command line using ssh -X on a remote Linux server (Ubuntu) from a Mac Powerbook. On my local Mac, I had to install XQuartz to provide an X server. autovideosink created a temporary video device in /dev and sent its output to my local X server. autovideosink is really a convenience element for pipeline development.

A test pipeline for audio might look like this:

gst-launch-1.0 audiotestsrc ! audioconvert ! audioresample ! faac ! autoaudiosink

That was surprisingly complicated given how simple it was to display the video stream. I am just following an example. As far as I can tell, audiotestsrc already produces raw (uncompressed) audio, so nothing needs decoding: audioconvert converts between raw sample formats, audioresample changes the sample rate, and faac encodes the result as AAC. An encoder like faac makes sense in front of a muxer; a playback sink such as autoaudiosink would normally be fed raw audio instead.

Unfortunately, in my case gst-launch reported:

WARNING: from element /GstPipeline:pipeline0/GstAutoAudioSink:autoaudiosink0: Could not open audio device for playback.

I'm going to let that one pass and move on to other stuff, since this same pipeline works fine in my app when autoaudiosink is replaced with a muxer (multiplexer) element.
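For comparison, here is a sketch of a playback-only variant without the encoder, on the assumption that faac's AAC output is meant for a muxer or file rather than a sound card. I have not verified it on the remote server above, and it would not by itself cure a missing audio device. The variable name AUDIO_PLAYBACK is just for illustration.

```shell
# Assumption: for playback the encoder can be dropped, since faac produces
# AAC for a muxer or file rather than for a sound card.
# num-buffers makes the test source stop by itself instead of running forever.
AUDIO_PLAYBACK='audiotestsrc num-buffers=200 ! audioconvert ! audioresample ! autoaudiosink'

# Show the full command line without executing it.
echo "gst-launch-1.0 $AUDIO_PLAYBACK"

# To run it on a machine with a working audio device:
#   gst-launch-1.0 $AUDIO_PLAYBACK
```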

Processing and streaming an MPEG-4 file

Playing a video from an MPEG-4 file is only slightly more complicated:

gst-launch-1.0 filesrc location=sample_videos/my-video.mp4 ! decodebin ! videoconvert ! autovideosink

Here I swapped videotestsrc for filesrc and replaced the caps filter with "decodebin ! videoconvert". decodebin is a bin that packages several GStreamer elements and mercifully hides the details from us. It outputs video/x-raw. videoconvert converts that raw video into a format the autovideosink will accept.

There is something even cooler going on under the hood in this example. Our MPEG-4 file contains both video and audio and might even contain a subtitle track. When decodebin processes that stream, it will de-multiplex ("demux") the content into separate streams for audio, video, and possibly subtitles, and assign each one to its own source pad for connecting to a downstream neighbor. GStreamer can find out that videoconvert's sink pad must connect to a stream of type video/*, so it will connect it to the appropriate source pad on decodebin. decodebin's source pads are also called "sometimes pads", because their presence depends on whatever content decodebin sees when it receives its first buffer.

GStreamer also takes care of negotiating a compatible format that videoconvert and autovideosink can agree on so that we don't have to sweat the details.
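Since decodebin exposes one source pad per stream, the same file can feed an audio branch as well. The sketch below names the decodebin instance (name=dec) and refers back to it with "dec." for each branch. The helper name av_pipeline is hypothetical, and I am assuming the file has both an audio and a video track and that the machine has a working audio device.

```shell
# Hypothetical helper: describe a pipeline that plays both streams of a file.
# "name=dec" labels decodebin so that "dec." can refer to it again; each
# "dec. ! ..." branch is linked once a matching source pad appears.
av_pipeline() {
  printf 'filesrc location=%s ! decodebin name=dec ' "$1"
  printf 'dec. ! queue ! videoconvert ! autovideosink '
  printf 'dec. ! queue ! audioconvert ! audioresample ! autoaudiosink'
}

# Print the description for the sample file used above.
av_pipeline sample_videos/my-video.mp4; echo

# To run it (the unquoted expansion assumes a path without spaces):
#   gst-launch-1.0 $(av_pipeline sample_videos/my-video.mp4)
```

The queue elements give each branch its own thread, which is the usual way to keep one branch from stalling the other.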

If we want to see the details, we can set the following environment variable:

export GST_DEBUG_DUMP_DOT_DIR=/tmp/

then run the above command-line again with gst-launch. Later, when we look in /tmp, we can see it generated some .dot files (with some details truncated for brevity):

$ ls -ltr /tmp/*.dot
/tmp/0.00.00.387798709-gst-launch.NULL_READY.dot
/tmp/0.00.00.390094317-gst-launch.warning.dot
/tmp/0.00.00.932217667-gst-launch.READY_PAUSED.dot
/tmp/0.00.00.939722926-gst-launch.PAUSED_PLAYING.dot
/tmp/0.00.00.392568534-gst-launch.NULL_READY.dot
/tmp/0.00.00.394929923-gst-launch.warning.dot
/tmp/0.00.01.016924243-gst-launch.READY_PAUSED.dot
/tmp/0.00.01.025362357-gst-launch.PAUSED_PLAYING.dot
/tmp/0.00.00.380756430-gst-launch.NULL_READY.dot
/tmp/0.00.00.955589017-gst-launch.READY_PAUSED.dot
/tmp/0.00.00.959730007-gst-launch.PAUSED_PLAYING.dot

gst-launch created a .dot file each time the pipeline changed state; I listed them in the order they were created (ls -ltr). We can render any one of them with Graphviz's dot command:

dot -Tpng /tmp/0.00.01.025362357-gst-launch.PAUSED_PLAYING.dot > paused-playing.png

Here is the resulting image:

It is a schematic description of the pipeline as it transitioned from the PAUSED state to the PLAYING state. You will have to download this image in SVG format and open it with your browser, enlarge it, and use your horizontal scroll-bar in order to see the details. It shows us exactly what sort of content is being transferred between each pair of linked elements. Incredibly, this is the first time I've actually gotten it to work, while writing this blog post. I wish I had spent more time trying to get it to work during development because it might have saved me hours of trial and error when things did not work.
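Because every run adds more .dot files to the same directory, picking the right one by hand gets tedious. Here is a small sketch that finds the most recently written dump for a given state transition. The helper name latest_dot and its arguments are my own; the -nt file test is not strictly POSIX, but bash and dash both support it.

```shell
# Hypothetical helper: print the most recently written .dot file in a
# directory ($1, default /tmp) whose name ends in a state transition
# ($2, default PAUSED_PLAYING). Prints nothing if no file matches.
latest_dot() {
  best=""
  for f in "${1:-/tmp}"/*"${2:-PAUSED_PLAYING}".dot; do
    [ -e "$f" ] || continue            # the glob matched nothing
    if [ -z "$best" ] || [ "$f" -nt "$best" ]; then
      best="$f"                        # keep the newer of the two
    fi
  done
  printf '%s' "$best"
}

# Render the newest PAUSED_PLAYING dump as SVG (needs Graphviz):
#   dot -Tsvg "$(latest_dot /tmp)" > paused-playing.svg
```

Rendering with -Tsvg instead of -Tpng keeps the element and caps labels readable when zooming in on a wide pipeline.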

Conclusion

It's getting late. In subsequent posts, I'll cover the following topics:

Muxing and Demuxing
Compositing subtitles with video
Adding fade effects to subtitles
Customizing subtitles with fonts and positioning
Creating and parsing a custom syntax for subtitles that includes font, positioning and fade durations
Developing and Debugging GStreamer
Developing and Debugging GStreamer plugins