Making a video clip visualizing sound with libvisual on Ubuntu 14.04
Intro
The purpose of this mini-project was to create a video clip with visualized audio, instead of just a dull still frame. Libvisual is the commonly used graphics engine for Linux’ media players, but I wanted the result in a file, not on the screen.
Libvisual’s sources come with lv-tool, a command-line utility that is apparently intended for testing the library and its plugins. It can send raw video to standard output, but as of March 2015 there is no plugin for getting the sound from standard input. So I hacked one together, compiled it, and used it with lv-tool (more about this below, of course).
Note to self: To resume, look for libvisual.git in L/linux/.
Installing required packages
The following installations were required on my machine (this may vary, depending on what you already have):
# apt-get install cmake g++
# apt-get install libpng-dev zlib1g-dev
# apt-get install autoconf
# apt-get install liborc-0.4-dev
Downloading & compiling libvisual
$ git clone https://github.com/Libvisual/libvisual.git libvisual
$ cd libvisual
$ git checkout -b myown 4149d9bc1b8277567876ddba1c5415f4d308339d
$ cd libvisual
$ cmake .
$ make
$ sudo make install
$ cd ../libvisual-plugins
$ cmake .
$ make
$ sudo make install
There is no particular reason why I checked out that specific commit ID. It was a rather random attempt to solve a dependency issue (which turned out to be irrelevant), and I then forgot to switch back.
A trial run (not from stdin yet)
For help:
$ lv-tool -h
Listing plugins
$ lv-tool -p
A test run on a song: In one console, run
$ mplayer -af export=~/.mplayer/mplayer-af_export:512 song.mp3
This plays the song on the computer, and also allows libvisual access to the raw sound samples.
And on another console
$ lv-tool -i mplayer -a blursk -F 300 -f 5 -D 640x480 -d stdout > song.bin
Then play the clip with
$ ffplay -f rawvideo -video_size 640x480 -pix_fmt gray -framerate 5 song.bin
The frame rate was chosen as 5, and it can be increased, of course.
(Use “ffplay -pix_fmts” to get a list of supported pixel formats, such as rgb8)
This isn’t all that good, because lv-tool generates video frames from the sound that is currently being played. Even though it’s possible to sync the video with the audio later on, there is no guarantee that this sync will remain: if the computer gets busy somewhere in the middle of rendering, lv-tool may stall for a short moment, and then continue with the sound that is played when it’s back. mplayer won’t wait, and lv-tool will make no effort to compensate for the lost frames. If anything, it can be expected to skip frames after stalling.
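A rough way to see whether frames went missing is to count the frames in the raw file and divide by the frame rate, then compare the result with the song's length. The sketch below assumes one byte per pixel (the gray format used in the trial run); frame_count is a hypothetical helper name, not something that comes with lv-tool.

```shell
# Count the whole frames stored in a raw video file.
# Usage: frame_count FILE WIDTH HEIGHT BYTES_PER_PIXEL
# (frame_count is a made-up name for this illustration)
frame_count() {
    size=$(wc -c < "$1")                 # file size in bytes
    echo $(( size / ($2 * $3 * $4) ))    # integer division drops a partial frame
}

# Example (not run here): frame_count song.bin 640 480 1
```

Dividing the printed number by the frame rate given with -f yields the duration covered by the video, in seconds.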
The stdin plugin
The idea behind the stdin plugin is so simple that I’m quite sure libvisual’s developers have actually written one, but didn’t add it to the distribution to avoid confusion. All it does is read samples from stdin and supply part of them as sound samples for rendering. As the “upload” method is called for every frame, it’s enough to make it consume the amount of sound samples that corresponds to one frame at the frame rate chosen when the raw video stream is converted into a clip.
The plugin can be added to libvisual’s project tree with this git patch. It’s made against the commit ID mentioned above, but it’s probably fine with later revisions. I suppose it doesn’t conform to libvisual’s coding style, but it’s a hack, after all.
Note that the patch is hardcoded for signed 16-bit stereo at 44100 Hz, and is set for 30 fps. This is easily modified in the #define statements at the top of the source. The audio samples are supplied to libvisual’s machinery in buffers of 4096 bytes each, even though 44100 x 2 x 2 / 30 = 5880 bytes per frame at 30 fps: it’s common not to supply all the audio samples that are played. The mplayer plugin supplies only 2048 bytes, for example. This has a minor effect on the graphics.
After patching, re-run cmake, compilation and installation. Alternatively, instead of reinstalling everything, copy the plugin manually into the required directory:
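The per-frame arithmetic can be redone with a small shell function, which is handy if the sample rate or frame rate in the #define statements is changed. This is only a sketch: bytes_per_frame is a made-up name and is not part of the patch.

```shell
# Audio bytes consumed per video frame.
# Usage: bytes_per_frame SAMPLE_RATE CHANNELS BYTES_PER_SAMPLE FPS
# (bytes_per_frame is a hypothetical helper, not part of the plugin)
bytes_per_frame() {
    echo $(( $1 * $2 * $3 / $4 ))
}

# The hardcoded case: 44100 Hz, stereo, 16 bit (2 bytes), 30 fps
bytes_per_frame 44100 2 2 30    # prints 5880
```

The same function with 25 as the last argument gives 7056 bytes per frame, for example.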
# cp libinput_stdin.so /usr/local/lib/x86_64-linux-gnu/libvisual-0.5/input/
The plugin should appear on “lv-tool -p” after this procedure. And hopefully work too. ;)
Producing video
The blursk actor plugin is assumed here, but any can be used.
First, convert song to WAV:
$ ffmpeg -i song.mp3 song.wav
Note that this is somewhat dirty: I should have requested a raw audio stream with the desired attributes as output, and ffmpeg is capable of doing it. But the common WAV file is more or less that, except for the header, which is skipped quickly enough.
Just make sure the output is stereo, signed 16 bit, 44100 Hz or set ffmpeg’s flags accordingly.
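For the cleaner approach, ffmpeg can be asked for a headerless raw stream with exactly the attributes the plugin is hardcoded for. The following is a sketch only: to_raw_pcm and the FFMPEG variable are hypothetical names used for illustration, and it assumes that the plugin is just as happy with headerless input as it is with a WAV file.

```shell
# Convert any audio file to raw signed 16-bit little-endian stereo at 44100 Hz.
# Usage: to_raw_pcm INPUT OUTPUT
# FFMPEG is a hypothetical override for the binary to run (defaults to ffmpeg).
to_raw_pcm() {
    "${FFMPEG:-ffmpeg}" -i "$1" -f s16le -acodec pcm_s16le -ar 44100 -ac 2 "$2"
}

# Example (not run here): to_raw_pcm song.mp3 song.raw
```

The raw file is only good as input to lv-tool. For mixing the audio into the final clip, a file that ffmpeg can identify by itself (the WAV or the original MP3) is still the easier choice.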
Create graphics (monochrome):
$ lv-tool -i stdin -a blursk -D 640x480 -d stdout > with-stdin.bin < song.wav
Mixing video with audio and creating a DIVX clip:
$ ffmpeg -f rawvideo -s:v 640x480 -pix_fmt gray -r 30 -i with-stdin.bin -i song.wav -b:a 128k -b:v 5000k -vcodec mpeg4 -vtag DIVX try.avi
Same, but with colors (note the -c 32 and -pix_fmt):
$ time lv-tool -i stdin -c 32 -a blursk -D 640x480 -d stdout > color.bin < song.wav
$ ffmpeg -f rawvideo -s:v 640x480 -pix_fmt rgb32 -r 30 -i color.bin -i song.wav -b:a 128k -b:v 5000k -vcodec mpeg4 -vtag DIVX color.avi
It’s also possible to use “24” instead of “32” above, but some actors will produce a black screen with this setting. They will fail the same way with 8 bits (grayscale).
And to avoid large intermediate .bin files, pipe from lv-tool to ffmpeg directly:
$ lv-tool -i stdin -c 32 -a blursk -D 640x480 -d stdout < song.wav | ffmpeg -f rawvideo -s:v 640x480 -pix_fmt rgb32 -r 30 -i - -i song.wav -b:a 128k -b:v 5000k -vcodec mpeg4 -vtag DIVX clip.avi
This is handy in particular for high-resolution frames (HD and such).
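To get a feeling for how large the intermediate files become, the raw video data rate is simply width x height x bytes per pixel x frame rate. A quick sketch (raw_bytes_per_second is a made-up helper name):

```shell
# Raw video data rate in bytes per second.
# Usage: raw_bytes_per_second WIDTH HEIGHT BYTES_PER_PIXEL FPS
# (raw_bytes_per_second is a hypothetical name for this illustration)
raw_bytes_per_second() {
    echo $(( $1 * $2 * $3 * $4 ))
}

raw_bytes_per_second 640 480 1 30     # grayscale 640x480: prints 9216000
raw_bytes_per_second 1280 720 4 30    # 32-bit color 720p: prints 110592000
```

That is about 110 MB for each second of 720p color video, so a song of a few minutes easily reaches tens of gigabytes.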
The try-all script
To scan through all actors in libvisual-0.5, run the following script (produces 720p video, or set “resolution”):
#!/bin/bash

song=$1
resolution=1280x720

for actor in blursk bumpscope corona gforce infinite jakdaw jess \
    lv_analyzer lv_scope oinksie plazma ; do
  # 24-bit color
  lv-tool -i stdin -c 24 -a "$actor" -D "$resolution" -d stdout < "$song" | \
    ffmpeg -f rawvideo -s:v "$resolution" -pix_fmt rgb24 -r 30 -i - \
      -i "$song" -b:a 128k -b:v 5000k -vcodec mpeg4 -vtag DIVX "${actor}_color.avi"
  # 8-bit grayscale
  lv-tool -i stdin -c 8 -a "$actor" -D "$resolution" -d stdout < "$song" | \
    ffmpeg -f rawvideo -s:v "$resolution" -pix_fmt gray -r 30 -i - \
      -i "$song" -b:a 128k -b:v 5000k -vcodec mpeg4 -vtag DIVX "${actor}_gray.avi"
done
It attempts both color and grayscale.