Python Multimedia: Video Format Conversion, Manipulations and Effects



Python Multimedia

Learn how to develop Multimedia applications using Python with this practical step-by-step guide

  • Use Python Imaging Library for digital image processing.
  • Create exciting 2D cartoon characters using Pyglet multimedia framework
  • Create GUI-based audio and video players using QT Phonon framework.
  • Get to grips with the primer on GStreamer multimedia framework and use this API for audio and video processing.



Installation prerequisites

We will use the Python bindings of the GStreamer multimedia framework to process video data. See Python Multimedia: Working with Audios for instructions on installing GStreamer and other dependencies.

For video processing, we will be using several GStreamer plugins not introduced earlier. Make sure that these plugins are available in your GStreamer installation by running the gst-inspect-0.10 command from the console (gst-inspect-0.10.exe for Windows XP users). If a plugin is missing, you will need to install it or use an alternative, if one is available.

Following is a list of additional plugins we will use in this article:

  • autoconvert: Determines an appropriate converter based on the capabilities. It is used extensively throughout this article.
  • autovideosink: Automatically selects a video sink to display a streaming video.
  • ffmpegcolorspace: Transforms the color space into a color space format that can be displayed by the video sink.
  • capsfilter: It’s the capabilities filter—used to restrict the type of media data passing down stream, discussed extensively in this article.
  • textoverlay: Overlays a text string on the streaming video.
  • timeoverlay: Adds a timestamp on top of the video buffer.
  • clockoverlay: Puts current clock time on the streaming video.
  • videobalance: Used to adjust brightness, contrast, and saturation of the images. It is used in the Video manipulations and effects section.
  • videobox: Crops the video frames by a specified number of pixels; used in the Cropping section.
  • ffmux_mp4: Provides a muxer element for MP4 video muxing.
  • ffenc_mpeg4: Encodes data into MPEG4 format.
  • ffenc_png: Encodes data in PNG format.
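
The availability check described above can be scripted rather than run by hand for each plugin. The following is a minimal sketch in plain Python 3 (unlike the Python 2 listings later in this article). The helper name find_missing_elements is hypothetical, and the sketch assumes that gst-inspect-0.10 exits with a non-zero status when it does not recognize an element name.

```python
import shutil
import subprocess

# Element names taken from the plugin list above.
REQUIRED_ELEMENTS = [
    "autoconvert", "autovideosink", "ffmpegcolorspace", "capsfilter",
    "textoverlay", "timeoverlay", "clockoverlay", "videobalance",
    "videobox", "ffmux_mp4", "ffenc_mpeg4", "ffenc_png",
]


def find_missing_elements(names, inspect_cmd="gst-inspect-0.10"):
    """Return the element names that the inspect tool does not recognize.

    Returns None when the inspect tool itself is not on PATH, so the
    caller can tell "tool not found" apart from "nothing is missing".
    """
    if shutil.which(inspect_cmd) is None:
        return None
    missing = []
    for name in names:
        # Assumption: a non-zero exit status means "no such element or plugin".
        result = subprocess.run(
            [inspect_cmd, name],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        if result.returncode != 0:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = find_missing_elements(REQUIRED_ELEMENTS)
    if missing is None:
        print("gst-inspect-0.10 was not found on PATH")
    elif missing:
        print("Missing elements: " + ", ".join(missing))
    else:
        print("All required elements are available")
```

On Windows, pass inspect_cmd="gst-inspect-0.10.exe" if the tool is not found under its plain name.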

Playing a video

Earlier, we saw how to play audio. As with audio, there are different ways in which a video can be streamed. The simplest of these is to use the playbin plugin. Another is to go back to basics: create a conventional pipeline, then create and link the required pipeline elements. If we only want to play the ‘video’ track of a video file, then the latter technique is very similar to the one illustrated for audio playback. However, one almost always wants to hear the audio track of the video being streamed, and accomplishing this involves additional work. The following diagram is a representative GStreamer pipeline that shows how the data flows during video playback.

[Figure: representative GStreamer pipeline for video playback]

In this illustration, the decodebin uses an appropriate decoder to decode the media data from the source element. Depending on the type of data (audio or video), it is then streamed to the audio or video processing elements through the queue elements. The two queue elements, queue1 and queue2, act as media data buffers for audio and video data respectively. When the queue elements are added and linked in the pipeline, GStreamer handles the thread creation within the pipeline internally.
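
The buffering role of the two queues can be pictured with a small analogy in plain Python 3. This is not GStreamer code: the run_demo function and the (kind, payload) packet format are hypothetical. It only illustrates the idea of one producer routing data into two buffers, each drained by its own thread.

```python
import queue
import threading


def run_demo(packets):
    """Route (kind, payload) packets into per-stream buffers.

    Each buffer is drained by its own worker thread, loosely mirroring
    how queue1 and queue2 decouple the audio and video branches.
    """
    buffers = {"audio": queue.Queue(), "video": queue.Queue()}
    handled = {"audio": [], "video": []}

    def worker(kind):
        while True:
            item = buffers[kind].get()
            if item is None:  # sentinel marking the end of the stream
                break
            handled[kind].append(item)

    threads = [threading.Thread(target=worker, args=(kind,)) for kind in buffers]
    for t in threads:
        t.start()

    # This loop plays the role of the decodebin: it hands each packet
    # to the buffer that matches its media type.
    for kind, payload in packets:
        buffers[kind].put(payload)

    # Tell both workers that no more data will arrive, then wait for them.
    for q in buffers.values():
        q.put(None)
    for t in threads:
        t.join()
    return handled


if __name__ == "__main__":
    print(run_demo([("video", "v0"), ("audio", "a0"), ("video", "v1")]))
```

Each buffer preserves the order in which its packets arrived, while the two branches proceed independently of each other.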

Time for action – video player!

Let’s write a simple video player utility. Here we will not use the playbin plugin; its use is illustrated in a later sub-section. We will develop this utility by constructing a GStreamer pipeline. The key here is to use the queue as a data buffer. The audio and video data need to be directed so that each ‘flows’ through the audio or video processing section of the pipeline, respectively.

  1. Download the file from the Packt website. The file has the source code for this video player utility.
  2. The following code gives an overview of the Video player class and its methods.

    import time
    import thread
    import gobject
    import pygst
    import gst
    import os

    class VideoPlayer:
        def __init__(self):
        def constructPipeline(self):
        def connectSignals(self):
        def decodebin_pad_added(self, decodebin, pad):
        def play(self):
        def message_handler(self, bus, message):

    # Run the program
    player = VideoPlayer()
    thread.start_new_thread(, ())
    evt_loop = gobject.MainLoop()

    As you can see, the overall structure of the code and the main program execution code remains the same as in the audio processing examples. The thread module is used to create a new thread for playing the video. The method is sent on this thread. The gobject.threads_init() is an initialization function for facilitating the use of Python threading within the gobject modules. The main event loop for executing this program is created using gobject and this loop is started by the call

    Instead of using thread module you can make use of threading module as well. The code to use it will be something like:

    1. import threading
    2. threading.Thread(


    You will need to replace the line thread.start_new_thread(, ()) in earlier code snippet with line 2 illustrated in the code snippet within this note. Try it yourself!

  3. Now let’s discuss a few of the important methods, starting with self.constructPipeline:

    1  def constructPipeline(self):
    2      # Create the pipeline instance
    3      self.player = gst.Pipeline()
    5      # Define pipeline elements
    6      self.filesrc = gst.element_factory_make("filesrc")
    7      self.filesrc.set_property("location",
    8                                self.inFileLocation)
    9      self.decodebin = gst.element_factory_make("decodebin")
    11     # audioconvert for audio processing pipeline
    12     self.audioconvert = gst.element_factory_make(
    13         "audioconvert")
    14     # Autoconvert element for video processing
    15     self.autoconvert = gst.element_factory_make(
    16         "autoconvert")
    17     self.audiosink = gst.element_factory_make(
    18         "autoaudiosink")
    20     self.videosink = gst.element_factory_make(
    21         "autovideosink")
    23     # As a precaution add video capability filter
    24     # in the video processing pipeline.
    25     videocap = gst.Caps("video/x-raw-yuv")
    26     self.filter = gst.element_factory_make("capsfilter")
    27     self.filter.set_property("caps", videocap)
    28     # Converts the video from one colorspace to another
    29     self.colorSpace = gst.element_factory_make(
    30         "ffmpegcolorspace")
    32     self.videoQueue = gst.element_factory_make("queue")
    33     self.audioQueue = gst.element_factory_make("queue")
    35     # Add elements to the pipeline
    36     self.player.add(self.filesrc,
    37                     self.decodebin,
    38                     self.autoconvert,
    39                     self.audioconvert,
    40                     self.videoQueue,
    41                     self.audioQueue,
    42                     self.filter,
    43                     self.colorSpace,
    44                     self.audiosink,
    45                     self.videosink)
    47     # Link elements in the pipeline.
    48     gst.element_link_many(self.filesrc, self.decodebin)
    50     gst.element_link_many(self.videoQueue, self.autoconvert,
    51                           self.filter, self.colorSpace,
    52                           self.videosink)
    54     gst.element_link_many(self.audioQueue, self.audioconvert,
    55                           self.audiosink)

  4. In various audio processing applications, we have used several of the elements defined in this method. First, the pipeline object, self.player, is created. The self.filesrc element specifies the input video file. This element is connected to a decodebin.
  5. On line 15, the autoconvert element is created. It is a GStreamer bin that automatically selects a converter based on the capabilities (caps). It translates the decoded data coming out of the decodebin into a format playable by the video device. Note that before reaching the video sink, this data travels through a capsfilter and an ffmpegcolorspace converter. The capsfilter element is defined on line 26. It is a filter that restricts the allowed capabilities, that is, the type of media data that will pass through it. In this case, the videocap object defined on line 25 instructs the filter to allow only video/x-raw-yuv capabilities.
  6. The ffmpegcolorspace is a plugin that can convert video frames to a different color space format. At this point, it is necessary to explain what a color space is. A variety of colors can be created by combining a few basic colors. The set of colors that can be produced this way is called a color space. A common example is the RGB color space, where a range of colors is created by combining red, green, and blue. Color space conversion re-expresses a video frame or an image from one color space in another. The conversion is done in such a way that the converted video frame or image is a close representation of the original one.

    The video can be streamed even without using the combination of capsfilter and the ffmpegcolorspace. However, the video may appear distorted. So it is recommended to use capsfilter and ffmpegcolorspace converter. Try linking the autoconvert element directly to the autovideosink to see if it makes any difference.

  7. Notice that we have created two sinks, one for audio output and the other for the video. The two queue elements are created on lines 32 and 33. As mentioned earlier, these act as media data buffers and are used to send the data to audio and video processing portions of the GStreamer pipeline. The code block 35-45 adds all the required elements to the pipeline.
  8. Next, the various elements in the pipeline are linked. As we already know, the decodebin is a plugin that determines the right type of decoder to use. This element uses dynamic pads. While developing audio processing utilities, we connected the pad-added signal from decodebin to a method decodebin_pad_added. We will do the same thing here; however, the contents of this method will be different. We will discuss that later.
  9. On lines 50-52, the video processing portion of the pipeline is linked. The self.videoQueue receives the video data from the decodebin. It is linked to the autoconvert element discussed earlier. The capsfilter allows only video/x-raw-yuv data to stream further. The capsfilter is linked to an ffmpegcolorspace element, which converts the data into a different color space. Finally, the data is streamed to the videosink, which, in this case, is an autovideosink element. This enables the ‘viewing’ of the input video.
  10. Now we will review the decodebin_pad_added method.

    1  def decodebin_pad_added(self, decodebin, pad):
    2      compatible_pad = None
    3      caps = pad.get_caps()
    4      name = caps[0].get_name()
    5      print "\n cap name is =%s"%name
    6      if name[:5] == 'video':
    7          compatible_pad = (
    8              self.videoQueue.get_compatible_pad(pad, caps) )
    9      elif name[:5] == 'audio':
    10         compatible_pad = (
    11             self.audioQueue.get_compatible_pad(pad, caps) )
    13     if compatible_pad:
    14         pad.link(compatible_pad)

  11. This method handles the pad-added signal, emitted when the decodebin creates a dynamic pad. Here the media data can represent either audio or video. Thus, when a dynamic pad is created on the decodebin, we must check what caps this pad has. The get_name method of the first structure in the caps object returns the type of media data handled. For example, the name can be of the form video/x-raw-rgb for video data or audio/x-raw-int for audio data. We just check the first five characters to see whether it is a video or an audio media type. This is done by the code block 4-11 in the code snippet. The decodebin pad with the video media type is linked with the compatible pad on the self.videoQueue element. Similarly, the pad with audio caps is linked with the one on self.audioQueue.
  12. Review the rest of the code in the downloaded source file. Make sure you specify an appropriate video file path for the variable self.inFileLocation and then run this program from the command prompt.


    This should open a GUI window where the video will be streamed. The audio output will be synchronized with the playing video.
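
    The routing decision made in decodebin_pad_added can be tried out in isolation, without GStreamer installed. The sketch below is plain Python 3, and the function name branch_for_caps_name is hypothetical. It splits the caps name at the first slash rather than slicing the first five characters, which gives the same result for the names mentioned in step 11.

```python
def branch_for_caps_name(name):
    """Return 'video', 'audio', or None for a caps name.

    The caps name has the form '<media type>/<format>', for example
    'video/x-raw-yuv', so the part before the first slash decides
    which branch of the pipeline should receive the data.
    """
    media_type = name.split("/", 1)[0]
    if media_type in ("video", "audio"):
        return media_type
    return None  # any other media type is left unlinked


if __name__ == "__main__":
    for caps_name in ("video/x-raw-rgb", "audio/x-raw-int", "text/plain"):
        print(caps_name, "->", branch_for_caps_name(caps_name))
```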

What just happened?

We created a command-line video player utility. We learned how to create a GStreamer pipeline that can play synchronized audio and video streams. We saw how the queue element can be used to process the audio and video data in a pipeline. This example also illustrated the use of GStreamer plugins such as capsfilter and ffmpegcolorspace. The knowledge gained in this section will be applied in the upcoming sections of this article.
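
To make the idea of color space conversion more concrete, here is a worked example in plain Python 3 that converts a single RGB pixel to a luma/chroma (YCbCr) representation. The coefficients are the full-range BT.601 values used by JFIF. That choice is an assumption made for illustration and is not necessarily what ffmpegcolorspace does internally.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to 8-bit YCbCr (full-range BT.601)."""

    def clamp(value):
        # Round to the nearest integer and keep the result within 0..255.
        return max(0, min(255, int(round(value))))

    y = 0.299 * r + 0.587 * g + 0.114 * b                # brightness (luma)
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b     # blue-difference chroma
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b     # red-difference chroma
    return clamp(y), clamp(cb), clamp(cr)


if __name__ == "__main__":
    for pixel in ((255, 255, 255), (0, 0, 0), (255, 0, 0)):
        print(pixel, "->", rgb_to_ycbcr(*pixel))
```

White, black, and every gray in between end up with both chroma values at the midpoint of 128, because they carry brightness but no color information.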

Playing video using ‘playbin’

The goal of the previous section was to introduce you to the fundamental method of processing input video streams. We will use that method in one way or another in future discussions. If video playback is all that you want, then the simplest way to accomplish it is by means of the playbin plugin. The video can be played just by replacing the VideoPlayer.constructPipeline method in the video player source file with the following code. Here, self.player is a playbin element. The uri property of playbin is set to the input video file path.

def constructPipeline(self):
    self.player = gst.element_factory_make("playbin")
    self.player.set_property("uri",
                             "file:///" + self.inFileLocation)

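Concatenating "file:///" with a path gives four slashes when the path already starts with "/", and it leaves spaces and other special characters unescaped. As a hedged alternative, the sketch below builds the URI with the standard library. It is written for Python 3 (pathlib is not part of the Python 2 standard library used by the listings in this article), and the helper name to_file_uri is hypothetical.

```python
import os
from pathlib import Path


def to_file_uri(path):
    """Return a file:// URI for a local path.

    The path is made absolute first because Path.as_uri() rejects
    relative paths. Special characters such as spaces are
    percent-encoded by as_uri().
    """
    return Path(os.path.abspath(path)).as_uri()


if __name__ == "__main__":
    print(to_file_uri("my video.avi"))
```

The returned string could then be passed as the value of the uri property in place of the concatenated one.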
