As I implemented a Displayport source (Single Stream Transport, SST) on a Virtex-7 from scratch (based upon GTH outputs), I wrote down some impressions. Here they are, in no particular order.
As for MST (Multi-Stream Transport), which I didn’t touch — I really wonder whether anyone is going to use it, or whether that part just fills the standard with dead weight.
It’s actually DVI/HDMI over GTX
My personal opinion is that the standard is not particularly well engineered, and poorly written: some of the mechanisms don’t make much sense (for example, the clock regeneration method), and some are defined so vaguely that an experienced engineer with good common sense will probably get them right, but with plenty of room for misinterpretation.
Even so, I don’t expect this to prevent Displayport from becoming a long-lived method for connecting TVs and monitors to computers and other devices. Odds are that monitors will ignore much of the possibly ambiguous features in the stream, and work properly no matter what is thrown at them. The real damage is mainly that it’s harder than necessary to implement the standard on both sides.
The saddest thing about this standard is that it didn’t lift off from the classic screen scanning methodology, even though the task is merely transmitting pixel data from one side to the other. Moving to a GTX-based standard was a great opportunity to leave behind the old paradigms that date back to the invention of television, and adopt digital-age methods. But that didn’t happen.
So instead of just transmitting the image line by line, each line as a packet (as MIPI does), the source is required to maintain the timing of an imaginary scan of the image, with an imaginary pixel clock, producing a standard VESA graphics mode in terms of active areas and blanking. This is the natural way for a DVI-to-Displayport converter, but when the pixel source produces Displayport directly, it needs to emulate behavior that goes back to CRT screens.
There is a certain rationale behind this: There might be a Displayport-to-DVI converter on the monitor side, which needs to regenerate the DVI stream, with a pixel clock and horizontal/vertical timings. But this is a transient need. Not something to drag along for years to come.
This isn’t just about difficulty. The need to conform to the timing behavior of some VESA standard forces the source to create blanking periods that it could otherwise skip. The main link is hence forced to be idle a significant part of the time so that an imaginary electron beam of an imaginary CRT screen will have the time to return to the left side.
And there is more: The source is required to throttle the data rate, to make it look as if there’s a pixel source, running on a VESA standard pixel clock, that pushes the pixels. This is done by transmitting stuffing symbols (that is, zero data symbols before scrambling, enclosed by FS and FE control symbols).
To make things even less pleasant, the standard requires the source to transmit the data in frames of constant length, called Transfer Units (TU). Each Transfer Unit begins with pixel data, and then switches to stuffing symbols until its end, so that the average pixel rate matches the imaginary pixel clock.
The source can decide on 32 to 64 symbol clocks for each TU frame. The sink may deduce this length from the positions of the FE control symbols, as this detail isn’t conveyed in any other way. One may wonder why the sink would care about the TU’s frame length. And then wonder why it has to be constant.
To make things even worse, the standard doesn’t define how to throttle the data. It doesn’t give any definitive rule on how far the data on the symbol stream may run ahead of or behind the imaginary pixel clock, which runs continuously. For example, is it OK to bombard the sink with 128 pixels with no throttling in the first TUs, and then slow down? Is it fine to send almost nothing in the first TUs, and then catch up?
It just says things like
…the number of valid data symbols per TU per lane (except for the last TU of a line which may be cut because of the end of active pixel) will be approximated with the following equation:
# of valid data symbols per lane = packed data rate/link symbol rate * TU size
“Approximated”? How much approximated?
and then there’s
The number of valid data symbols per TU will naturally alternate, and over time, the average number will come to the appropriate non-integer value calculated from the above equation
Naturally alternate? How many symbols is a natural alternation?
So even though the pixel throttling was made for a converter to DVI, that converter has no guarantee on how much the pixel stream will fluctuate with respect to the pixel clock. It seems like they meant this fluctuation to be ±1, but it’s not stated. Not a real problem today, given the low price of FIFO memory, but the whole point was to avoid the need to store an entire line.
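For what it’s worth, here’s a minimal Verilog sketch of the ±1 interpretation just mentioned: a fractional accumulator decides how many data symbols each TU carries, so the payload naturally alternates between two adjacent values. The module name, port names and parameter values are arbitrary placeholders; in a real design the parameters would be derived from the pixel clock, bits per pixel, lane count and link symbol rate.

module tu_fill_sketch
  #(parameter [6:0] VALID_INT = 48,         // Integer part of the ideal data symbols per TU
    parameter [15:0] VALID_FRAC = 16'h6000) // Fractional part, in 1/65536 units (0.375 here)
   (input clk,                              // Link symbol clock
    input tu_start,                         // Asserted on the first symbol of each TU
    output reg [6:0] valid_symbols);        // Data symbols to transmit in this TU

   reg [15:0] frac = 0;                     // Accumulated fractional residue
   wire [16:0] sum = frac + VALID_FRAC;

   always @(posedge clk)
     if (tu_start)
       begin
          // The carry out of the fractional sum makes the payload alternate
          // between VALID_INT and VALID_INT + 1, so the long-term average
          // matches the exact (non-integer) ratio of pixel rate to link rate.
          valid_symbols <= VALID_INT + sum[16];
          frac <= sum[15:0];
       end
endmodule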
Main link implementation shopping list
I found it a bit difficult at first to define what needs to be implemented on the main link, so here’s the shopping list. These are the things that a Displayport source needs to be capable of regarding transmission on the GTX lanes (at a bare minimum, with no audio):
- Three training sequences: TPS1, TPS2 and TPS3 (the last can be omitted if Displayport 1.2 isn’t supported, but it’s not a good idea)
- Pixel data, organized in Transfer Units (TU) of 32-64 symbols each. Each TU must contain stuffing symbols (zeros) enclosed by an FS-FE pair at its end, so that the overall transmission data rate emulates the display mode’s pixel clock. The pixel data itself must be enclosed in Blank End and Blank Start markers, at times replaced by scrambler reset markers.
- Note that as of DisplayPort 1.2, there are six formats for the Blank Start sequence: There’s the enhanced framing sequence which is mandatory for Displayport 1.2, and the non-enhanced version, and the sequences differ for each of the lane configurations (1x, 2x or 4x).
- A scrambler must be implemented to scramble all non-control symbols (a minimal sketch is shown right after this list).
- Once in each Vertical Blank period, the Main Stream Attribute (MSA) Packet must be transmitted once where pixel data would occur otherwise. This is a simple data structure of at most 39 symbols, including the double SS marker in the beginning and the SE at the end, containing the image attributes. It contains, among others, the color coding in the MISC0 field (which is set to 0x20 or 0x21 for plain 24 bpp RGB for asynchronous or synchronous clock, respectively), so it’s not imaginable that a monitor will display something without this information. Someone must have thought that 39 symbols is a huge waste of bandwidth once a frame, so there are (mandatory) shorter versions of this packet for 2 and 4 lane configurations, to make better use of the link. Hurray.
- Once in each vertical blanking period, the Main Stream Attribute (MSA) packet must be transmitted where pixel data would otherwise occur. This is a simple data structure of at most 39 symbols, including the double SS marker at the beginning and the SE at the end, containing the image attributes. Among others, it contains the color coding in the MISC0 field (which is set to 0x20 or 0x21 for plain 24 bpp RGB, for asynchronous or synchronous clock respectively), so it’s unimaginable that a monitor would display something without this information. Someone must have thought that 39 symbols once per frame is a huge waste of bandwidth, so there are (mandatory) shorter versions of this packet for the 2 and 4 lane configurations, to make better use of the link. Hurray.
- Per section 5 of the standard, a Displayport source must be able to support both RGB and YCbCr colorimetry formats, with a few bit-per-pixel options at a bare minimum, just to support the fallback modes. That may not sound like a big deal, but each bit-per-pixel format is packed differently into the symbol streams. On top of that, the source must adapt itself to one of the colorimetry formats that appear in the sink’s EDID information. That doesn’t make life simple, but for practical purposes, I’d like to see a Displayport monitor that doesn’t support 24 bpp RGB. That’s not good enough if you’re a graphics card vendor, though.
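Regarding the scrambler item above, here is a minimal Verilog sketch. The polynomial (x^16 + x^5 + x^4 + x^3 + 1) and the reset of the LFSR to 0xFFFF on an SR symbol are taken from the standard; the bit ordering of the XOR against the data byte, and the choice to advance the LFSR on control symbols as well, reflect my reading of the text, so verify this against the standard rather than taking it at face value. All names are arbitrary.

module dp_scrambler_sketch
  (input clk,
   input [7:0] d_in,        // Symbol to be transmitted, before scrambling
   input is_control,        // The symbol's charisk bit: control symbols pass through
   input is_sr,             // Current symbol is SR: reset the LFSR to all ones
   output reg [7:0] d_out); // Symbol actually handed to the GTX

   reg [15:0] lfsr = 16'hffff;
   reg [15:0] next;
   reg [7:0] scram;
   integer i;

   always @(posedge clk)
     begin
        // Advance the LFSR by 8 bits (one symbol) and collect the scrambling byte.
        // Taps at x^5, x^4, x^3 and 1 correspond to the mask 16'h0039.
        next = lfsr;
        for (i = 0; i < 8; i = i + 1)
          begin
             scram[i] = next[15];
             next = { next[14:0], 1'b0 } ^ (next[15] ? 16'h0039 : 16'h0000);
          end

        d_out <= is_control ? d_in : (d_in ^ scram);
        lfsr <= is_sr ? 16'hffff : next;
     end
endmodule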
The training sequence in short
The training procedure may look quite complicated at first glance, but it merely consists of two stages:
- Clock recovery: The source transmits all symbols as 0x4a data (D10.2) without scrambling, which “happens to be” 8b/10b encoded as a 0101010101 bit sequence. The receiver performs bit clock recovery on this sequence.
- Equalization, word boundary detection and inter-lane deskew: The source transmits a special sequence of symbols without scrambling (TPS2 or TPS3, preferably the latter). The receiver, which knows what to expect, applies its own equalizer (if present) and recovers where each 10-bit symbol starts. When several lanes are used, it also deskews the lanes as required.
One could argue that today’s GTXs don’t need this kind of training, as commonly used equalizers can make up for rather lousy cabling. And it’s not clear if the GTXs actually used are designed to equalize based upon a training sequence.
Anyhow, on each of these two stages, the source runs the following loop: It applies the training sequence on the link, writes to dedicated AUX CH registers to inform the sink what sequence is sent, on which lanes and at what rate, and what gain, pre- and post-emphasis is applied on each lane. The source waits for a known period of time (announced by the sink as TRAINING_AUX_RD_INTERVAL) and then checks the sink’s registers for the status. If the registers indicate success (certain bits set, depending on the phase) the stage of the training is done. If they don’t, other registers will request a new setting of gain, pre- and post-emphasis. The source applies those, and loops again.
The source gives up after four attempts with the same set of gain, pre- and post-emphasis. In other words, if the sink hasn’t changed its signal conditioning request over four attempts, it’s a sign it has nothing more to suggest. The source should then try to reduce the lane count or rate, and retrain.
When the clock recovery stage is done, the source should go on to equalization without changing the signal conditioning. When the equalization is done, it should exit training and start sending pixels. Needless to say, without changing the signal conditioning.
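To make the loop above a bit more concrete, here is a minimal Verilog sketch of the state it maintains. The AUX channel transactions (writes to DPCD 0x102-0x106, reads of 0x202-0x207 and the TRAINING_AUX_RD_INTERVAL timer) are assumed to be handled elsewhere and abstracted into the ports below; the module name, port names and the exact encoding of the lane settings are arbitrary. Treat this as an outline of the control flow, not an implementation.

module dp_training_sketch
  (input clk,
   input start,                 // Kick off link training
   input interval_elapsed,      // One-cycle pulse: TRAINING_AUX_RD_INTERVAL has passed
   input cr_done,               // Clock recovery OK on all active lanes (DPCD 0x202-0x203)
   input eq_done,               // Equalization, symbol lock and alignment OK (DPCD 0x202-0x204)
   input [7:0] adjust_request,  // Requested swing / pre-emphasis (DPCD 0x206-0x207)
   output reg [1:0] tps,        // 0 = normal data, 1/2/3 = TPS1/2/3 (also written to 0x102)
   output reg [7:0] lane_set,   // Applied swing / pre-emphasis (GTX ports + DPCD 0x103-0x106)
   output reg trained,
   output reg failed);          // Caller should reduce the rate or lane count and retrain

   reg eq_stage;                // 0 = clock recovery stage, 1 = equalization stage
   reg [1:0] same_count;        // Attempts already made with the current signal conditioning
   reg [7:0] prev_req;
   wire stage_done = eq_stage ? eq_done : cr_done;

   always @(posedge clk)
     if (start)
       begin
          tps <= 1;             // TPS1 for clock recovery
          lane_set <= 0;        // Start with minimal swing, no pre-emphasis
          eq_stage <= 0;
          same_count <= 0;
          prev_req <= 0;
          trained <= 0;
          failed <= 0;
       end
     else if (interval_elapsed && !trained && !failed)
       begin
          if (stage_done)
            begin
               if (eq_stage)
                 begin
                    tps <= 0; trained <= 1;     // Done; keep the signal conditioning
                 end
               else
                 begin
                    tps <= 3; eq_stage <= 1;    // TPS3 (or TPS2), same signal conditioning
                    same_count <= 0;
                 end
            end
          else if ((adjust_request == prev_req) && (same_count == 3))
            failed <= 1;        // Four attempts with the same setting: give up
          else
            begin
               // Apply the sink's new request and count repeats of the same one
               same_count <= (adjust_request == prev_req) ? same_count + 1 : 2'd0;
               prev_req <= adjust_request;
               lane_set <= adjust_request;
            end
       end
endmodule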
The control symbols
This is the assignment of control symbols in Verilog. To make these into control symbols, the respective charisk bit must be set in parallel with the codes given below:
assign BS = 8'hbc; // K28.5
assign BE = 8'hfb; // K27.7
assign BF = 8'h7c; // K28.3
assign SR = 8'h1c; // K28.0
assign FS = 8'hfe; // K30.7
assign FE = 8'hf7; // K23.7
assign SS = 8'h5c; // K28.2
assign SE = 8'hfd; // K29.7
Some experiments with a real monitor
Testing my own logic with a Dell P2415Q monitor, I was curious to know what it really cares about in the main link data stream and what it ignores. So here are some insights, which are relevant to this monitor only.
- When lanes are trained properly, but the monitor is unhappy with the data stream, it enters Power Save mode immediately. The display goes out of this mode when the stream is OK.
- Writing 0x01 to DPAUX address 0x600 (attempting to wake it up) should be done after training is done, and video data is present on the link. Not before the training, as this will cause the monitor to remain sleeping.
Things that can be messed with and still get an image:
- Mvid[7:0] on each BS can be set to constant 0. The monitor didn’t care, even though MISC0 was set to asynchronous clocking.
- It had no problem with an arbitrarily large Transfer Unit (including shooting a 1024-pixel wide line in one go)
- Non-enhanced framing mode and enhanced framing mode were both accepted in any mode, even on 5.4 Gbps (which requires enhanced framing).
- There was no problem showing an image on the monitor with scrambling turned off (given that the monitor was informed about this by setting DP AUX register 0x102 to 0x20 instead of 0x00).
The monitor will not display an image without any MSA transmitted. Also, sending an MSA but messing with the following fields prevents displaying the image:
- M (three fields of Mvid)
- The total number of pixels per line, and the total number of lines
- The image’s active area (number of active pixels and rows)
Things that were fine messing in the MSA (image still shown):
- Setting N=0 in async mode is OK, probably because it has to be 32768 anyhow
- All information about the syncs: Position and length
Introduction
The Displayport standard requires the transmission of fields named Mvid in different places of the main link stream; however, the relevant parts of the standard are somewhat unclear. This is an attempt to understand the rationale behind the standard’s requirements, in the hope of clarifying them.
This post is intended for someone who has spent some time reading the standard. The requirements aren’t listed here, but rather discussed.
Background
One of Displayport’s design goals is that the main link can carry a DVI stream transparently. That is, a DVI-to-Displayport transmitter, which has no particular knowledge of the DVI source, creates the Displayport stream, and a Displayport-to-DVI receiver reconstructs the DVI stream at the other end. This may seem a far-fetched scenario, but if a graphics card vendor wants to support Displayport on a card that only has DVI, a quick solution may be to add such a converter, possibly as a single chip. A monitor vendor may pick a similar solution to support Displayport as well.
Regardless, Displayport is specified to behave as if there was a pixel stream arriving with a pixel clock that is slower than the maximal throughput the Displayport link allows. The transmitter is required to pack the data into Transfer Units (TU) of 32-64 symbols in length, fill them partially with data, and stuff the rest of each Transfer Unit with zeros (between FS and FE symbols). The message is clear: Even if a Displayport-aware source is capable of fetching pixels fast enough to send an entire line continuously on the Displayport link, that’s not the way to go. Instead, imagine that the pixels arrive at a slower pace, and fill each Transfer Unit with a constant number of pixels (plus or minus one), so the sink isn’t required to handle pixels faster than necessary. Average the payload, rather than bombard the receiver.
Stream clock recovery
A trickier issue is the stream clock recovery (“stream clock” is the term used in the standard for the pixel clock of the imaginary or actual DVI pixel stream). As the Displayport-to-DVI converter must present this stream clock on its output as the pixel clock, it needs to maintain some kind of PLL. The technically simpler case is when the symbol clock and stream clock are derived from the same reference clock, so the ratio between their frequencies is a known rational constant, which can be conveyed to the receiver. This is referred to as Synchronous Clock Mode in section 2.2.3 of the standard, which defines the PLL dividers M and N for achieving this as
f_stream_clock = f_link_symbol_clock * M / N
But Displayport needs to work when the stream clock is generated by an external source as well. In this Asynchronous Clock Mode the transmitter is required (per section 2.2.3) to measure the frequency approximately by counting the number of clock cycles of the stream clock during a period of 32768 symbol clocks. It should then announce N=32768 and M as the number of clocks counted in the MSA (Main Stream Attribute) packet, transmitted once on each vertical blank period. That makes sense: If the receiver locks a PLL to reconstruct the stream clock based upon M and N and the equation above, it will obtain the stream clock’s frequency within an error of 1/32768 ~ 30.5 ppm, more or less. This is of course unacceptable in the long run, but it’s still a rather small error: At the highest symbol clock of 540 MHz (for 5.4 Gb/s lanes), the 30.5 ppm inaccuracy of the measured M leads to a 16.5 kHz offset. So if the image’s line frequency is 16 kHz (lower than any VESA mode), it’s one pixel clock offset per line.
The Displayport standard allows the required fine-tuning of the receiver’s stream clock by requiring that the 8 LSBs of a stream clock timestamp (Mvid[7:0]) are transmitted at the end of each image line. In other words, the transmitter is required to maintain a free-running counter on its stream clock input, and send the lower bits of its value at the same moment a BS (Blanking Start) control symbol is transmitted on the link (BS marks the end of active pixels in a row, or as a keep-alive in the absence of video data).
The receiver may run a counter on its own reconstructed stream clock, and compare the 8 LSBs. As shown above, the difference should be one stream clock at most per line, so the receiver can fine-tune its PLL to obtain an accurate replica of the transmitter’s stream clock. As real-life clocks aren’t all that stable, and there’s also a chance that spread-spectrum modulation has been applied to the source’s stream clock, the difference can get bigger than one stream clock. So 8 bits of timestamp seem to be a good choice.
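For illustration, here’s a minimal Verilog sketch of the transmitter’s side of all this, under the assumption that one free-running counter on the stream clock can serve both purposes: its lower 8 bits are sampled when a BS goes out (the Mvid[7:0] timestamp), and the difference between samples taken 32768 symbol clocks apart is the M announced in the MSA (with N = 32768). The clock domain crossing is deliberately naive here (a plain two-stage synchronizer on a multi-bit value); a real design needs proper CDC, e.g. Gray coding or a handshake. All names are arbitrary.

module mvid_sketch
  (input stream_clk,                // The (possibly external) pixel clock
   input ls_clk,                    // Link symbol clock
   input bs_sent,                   // One ls_clk cycle: a BS symbol goes out now
   output reg [7:0] mvid_timestamp, // Mvid[7:0] to send after BS and VB-ID
   output reg [23:0] mvid_msa);     // M for the MSA (announced along with N = 32768)

   reg [23:0] count = 0;            // Free-running counter on the stream clock
   reg [23:0] count_meta, count_ls; // Naive synchronizers into ls_clk (see note above)
   reg [23:0] prev_sample = 0;
   reg [14:0] interval = 0;         // Wraps every 32768 ls_clk cycles

   always @(posedge stream_clk)
     count <= count + 1;

   always @(posedge ls_clk)
     begin
        { count_ls, count_meta } <= { count_meta, count };

        if (bs_sent)
          mvid_timestamp <= count_ls[7:0];

        interval <= interval + 1;
        if (interval == 0)           // Once every 32768 symbol clocks
          begin
             mvid_msa <= count_ls - prev_sample;
             prev_sample <= count_ls;
          end
     end
endmodule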
So far so good. Now to the confusion of notations in the standard.
What’s M?
The main problem is that M is sometimes referred to as a timestamp, and sometimes as a PLL divider. There is a certain similarity, as PLL dividers are counters, and so are timestamps. The difference is that a PLL divider is zeroed when it reaches a certain value, and timestamps are not.
So section 2.2.3 begins by saying
The following equations conceptually explain how the Stream clock (Strm_Clk) must be derived from the Link Symbol clock (LS_Clk) using the Time Stamps, M and N
and uses M and N as dividers immediately after, in the equation shown above. It also says, a few lines down:
When in Asynchronous Clock mode, the DisplayPort uPacket TX must measure M using a counter running at the LS_Clk frequency as shown in Figure 2-17. The full counter value after every [N x LS_Clk cycles] must be transported in the DisplayPort Main Stream attributes. The least significant eight bits of M (Mvid[7:0]) must be transported once per main video stream horizontal period following BS and VB-ID.
which is a complete mixup. The counter runs on the stream clock, not LS_Clk. Also, the value counted over [N x LS_Clk cycles] is announced in the MSA in the fields denoted Mvid23:16, Mvid15:8 and Mvid7:0, but these are surely not timestamps; they are the result of the count. On the other hand, the Mvid[7:0] that follows BS is a timestamp, and can’t be reset every N symbol clock cycles, as that would make it useless for small N’s. In fact, even with N=32768 it’s useless at a 540 MHz symbol clock: for a 33 kHz line frequency, a timestamp reset would occur every second line. So there are two different counters, one being the M divider, and the other the M timestamp, both referred to as Mvid in the standard.
This isn’t all that ambiguous in the Asynchronous Mode case, because the standard says what to do with Mvid in the MSA, and it’s quite obvious that the Mvid[7:0] transmitted along with a BS should be a timestamp that is never reset.
The problem is in Synchronous Mode. The standard doesn’t say what should be transmitted in the MSA. Section 2.2.4, which details the fields, says “M and N for main video stream clock recovery (24 bits each)” showing how the word is split into 3 symbols in drawings. And that’s it. Common sense says that they meant the M and N as PLL dividers. There’s no sense in sending N (denoted Nvid) otherwise. This makes these fields similar to the Asynchronous Mode, and it seems this is the widely accepted interpretation.
Nevertheless, someone out there might just as well claim that Mvid is Mvid, i.e. that it’s the full timestamp counter that is transmitted in the MSA. The receiver has no other way to know the full word otherwise. One may wonder why it would need it, but that’s a different story. And what is this part in section 2.2.3 good for then, if the full Mvid[23:0] word is never transmitted?
When Mvid7:0 crosses the 8-bit boundary, the entire Mvid23:0 will change. For example, when Mvid23:0 is 000FFFh at one point in time for a given main video stream, the value may turn to 0010000h at another point. The Sink device is responsible for determining the entire Mvid23:0 value based on the updated Mvid7:0.
Maybe the safe choice is to announce Asynchronous Mode regardless of whether the clock ratios are known, hoping that the monitor won’t make a fuss about the Mvid[7:0] timestamps.
Having said all this, one can speculate that these Mvid and Nvid fields are ignored anyhow by any monitor that has a good reason to support Displayport. Recall that the goal of all this was to reconstruct the stream clock, which doesn’t make sense when Displayport is used for resolutions that DVI can’t support.
Is Mvid[7:0] really required?
This isn’t really a question of what the standard requires, but rather of why.
Section 2.2.2.1, which details the control symbols for framing, says that BS should be, among others
Inserted at the same symbol time during vertical blanking period as during vertical display
That’s a somewhat odd requirement, as one can’t guarantee a repeated symbol time: In the general case, the stream clock and symbol clock don’t have an integer ratio (if they are synchronous at all), and hence the line period, in terms of symbol clocks, can’t be constant.
Not being so picky, it’s clear that the standard requires that the BS is timed closely to some constant position in the originating image’s line. If it can’t hit exactly the same symbol position, move it by one. And since they mention a “same symbol time”, it means that all BS symbols are transmitted like this.
Which in turn means that the number of stream clocks between one BS and another is the total number of pixels per row in the originating image (active pixels + blanking). That number is known through the MSA. So why bother sending Mvid[7:0]? The difference is always the same.
Or maybe the meaning was just that BS has to be bit-aligned in a word the same way as the other symbols? After all, BS is coded as K28.5, which is commonly used as a “comma” symbol that marks the alignment of bits into 10-bit symbols on the wire. But with this interpretation, the requirement is trivial.
Displayport’s standard requires that the TPS2 and TPS3 training sequences have a known running disparity on the transmitted characters. It uses a plus-minus notation (e.g. K28.5-) to indicate the disparity, and also clarifies the meaning of this notation by writing out the bit sequences of K28.5- and K28.5+.
Xilinx, on the other hand, is slightly less definitive: Working with Virtex-7’s GTH, Table 3-6 in UG476 outlines how to set TXCHARDISPVAL to obtain a known disparity, given that TXCHARDISPMODE is ‘1’. Unfortunately, the description for TXCHARDISPVAL=0 is “Forces running disparity negative when encoding TXDATA”, which is slightly ambiguous. One would expect it to mean that e.g. a K28.5- character would be sent and not a K28.5+, but it could also mean the opposite. Does “forcing running disparity” relate to the history of RD, meaning it would transmit a K28.5+ to make up for the existing cumulative negative running disparity, or does it force the current symbol negative?
I hoped that my Displayport monitor would resolve this by rejecting a training sequence with the disparities flipped, but it turned out that it happily accepted the training sequence either way.
So I set up a GTX channel between two FPGAs, one transmitting 8b/10b encoded data, and the other receiving the data without 8b/10b decoding. This way I got the actual bits on the wire. I tried loopback first, by the way, but I couldn’t get encoding enabled in one direction and disabled in the other. Not even by manipulating the Verilog sources that Vivado generated.
Test results
The results of this experiment were as follows:
- With TXCHARDISPVAL = 0, K28.5 is transmitted as 0011111010 on wire.
- With TXCHARDISPVAL = 1, K28.5 is transmitted as 1100000101 on wire.
- And to be sure that the bit polarity isn’t reversed, I verified that D17.1 indeed appears as 1000111001 (it’s indifferent to disparity)
All this relates to TXCHARDISPMODE = 1, of course.
The bit notation here puts the bit transmitted first on the left (even though it appears as the LSB on the parallel data signals)
Conclusion
TXCHARDISPVAL = 0 = “running disparity negative” = “Current RD -” in Appendix C of UG476. This is also what is referred to as e.g. K28.5- in the Displayport standard.
This is what I expected, as a matter of fact. But I wanted to be sure.
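So, assuming the conclusion above holds (it matches my own measurement only, so verify on your own hardware), forcing a K28.5- on the wire boils down to driving the transceiver’s per-byte control bits as follows. The lowercase names stand for whatever nets feed the GT wrapper’s TXDATA, TXCHARISK, TXCHARDISPMODE and TXCHARDISPVAL ports (one charisk/dispmode/dispval bit per byte lane); only the port names themselves are Xilinx’s.

assign txdata[7:0] = 8'hbc;      // K28.5 (BS)
assign txcharisk[0] = 1'b1;      // Mark it as a control symbol
assign txchardispmode[0] = 1'b1; // Override the encoder's running disparity
assign txchardispval[0] = 1'b0;  // 0 = negative disparity: 0011111010 on the wire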
It’s often required to scan a lot of pages in one go, even when feeding them manually. The problem is that when doing individual scans, there’s a significant delay between scans, as the computer initializes the scanner each time.
The trick is to use scanimage’s batch scan feature. A typical command for scanning 10 A4 pages in color into the current directory:
$ scanimage -y 297 --format pnm --batch=scan%04d.pnm --batch-count=10 --resolution 150 --mode Color
Stopping the sequence before the given count (on an HP Officejet 4500 all-in-one) with CTRL-C results in a message saying that the scanning is stopped, and indeed it is. Only that the program then has to be killed manually, and the scanner unplugged from power and powered up again. It’s that aggressive.
Interesting variations:
- For B/W scanning (which is significantly faster), go --mode Gray instead
- The first number used in the file names can be set with --batch-start
- To stop between each scan and wait for the user to press RETURN, add the --batch-prompt flag
Before starting this, it’s recommended to run gthumb on the current directory, so the images can be inspected on the fly:
$ gthumb . &
And after finishing the session, it can be nice to convert the scans to JPG:
$ for i in *.pnm ; do convert $i ${i%%.pnm}.jpg ; done
Even though scanimage --help supplies options that imply that JPEG can be obtained directly from the scanner, it seems like scanimage doesn’t play ball with this.
To convert all PDFs into JPGs with fairly good resolution (Linux, bash):
for i in *.pdf ; do convert -density 300 "$i" "${i%%.*}.jpg" ; done
Without the -density parameter, the result is pretty lousy.
To prepare a lot of image scans for printing, as a single PDF document:
convert *.jpg -colorspace gray -contrast-stretch 1% printme.pdf
The contrast stretch is an equalization, so it works well when there’s a full page of content, as opposed to a small piece of paper being scanned.
Just a quick note: The printer was connected via USB, but when I sent jobs to it, nothing happened.
Solution: As root, run hp-setup and follow the wizard (I unchecked the “fax” part).
As non-root this didn’t work: It failed to install the queue, and asked me to rerun CUPS. But that didn’t help. Only running hp-setup as root did.
Not clear if this is really useful, but since I’m at it: I killed the ConsoleKit daemon as root (by process number) and nothing happened, except for the following in /var/log/messages:
Apr 14 17:24:52 gnome-session[3378]: WARNING: Could not connect to ConsoleKit: Could not get owner of name 'org.freedesktop.ConsoleKit': no such name
Apr 14 17:24:52 gnome-session[3378]: WARNING: Could not connect to ConsoleKit: Could not get owner of name 'org.freedesktop.ConsoleKit': no such name
Apr 14 17:24:53 gnome-session[3378]: WARNING: Could not connect to ConsoleKit: Could not get owner of name 'org.freedesktop.ConsoleKit': no such name
Apr 14 17:24:54 gnome-session[3378]: WARNING: Could not connect to ConsoleKit: Could not get owner of name 'org.freedesktop.ConsoleKit': no such name
Apr 14 17:24:55 gnome-session[3378]: WARNING: Could not connect to ConsoleKit: Could not get owner of name 'org.freedesktop.ConsoleKit': no such name
That seemed a bit worrying, so I re-enabled it:
# ck-launch-session dbus-launch
DBUS_SESSION_BUS_ADDRESS=unix:abstract=/tmp/dbus-7yqpBheDeT,guid=77eb3eaa0dca8e1b552d2436cc025ecd
DBUS_SESSION_BUS_PID=26249
And there it’s back again:
$ ps aux
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
[...]
root 26177 0.0 0.0 4113796 2820 ? Sl 17:29 0:00 /usr/sbin/console-kit-daemon --no-daemon
And it has a huge virtual memory image again, which is why I killed it to begin with, thinking it was leaking. So apparently there was no problem to begin with.
Intro
The purpose of this mini-project was to create a video clip with visualized audio, instead of just a dull still frame. Libvisual is the commonly used graphics engine for Linux media players, but I wanted the result in a file, not on the screen.
Libvisual’s sources come with lv-tool, a command-line utility apparently intended for testing the library and its plugins. It can send raw video to standard output, but as of March 2015 there is no plugin for getting the sound from standard input. So I hacked one together, compiled it, and used it with lv-tool (more about this below, of course).
Note to self: To resume, look for libvisual.git in L/linux/.
Installing required packages
The following installations were required on my machine (this may vary, depending on what you already have):
# apt-get install cmake g++
# apt-get install libpng-dev zlib1g-dev
# apt-get install autoconf
# apt-get install liborc-0.4-dev
Downloading & compiling libvisual
$ git clone https://github.com/Libvisual/libvisual.git libvisual
$ cd libvisual
$ git checkout -b myown 4149d9bc1b8277567876ddba1c5415f4d308339d
$ cd libvisual
$ cmake .
$ make
$ sudo make install
$ cd ../libvisual-plugins
$ cmake .
$ make
$ sudo make install
There is no particular reason why I checked out that specific commit ID, except for a rather random attempt to solve a dependency issue (which turned out to be irrelevant), after which I forgot to switch back.
A trial run (not from stdin yet)
For help:
$ lv-tool -h
Listing plugins
$ lv-tool -p
A test run on a song: In one console, run
$ mplayer -af export=~/.mplayer/mplayer-af_export:512 song.mp3
This plays the song on the computer, and also allows libvisual access to the raw sound samples.
And on another console
$ lv-tool -i mplayer -a blursk -F 300 -f 5 -D 640x480 -d stdout > song.bin
Then play the clip with
$ ffplay -f rawvideo -video_size 640x480 -pix_fmt gray -framerate 5 song.bin
The frame rate was chosen as 5, and it can be increased, of course.
(Use “ffplay -pix_fmts” to get a list of supported pixel formats, such as rgb8)
This isn’t all that good, because lv-tool generates video frames based on the sound currently being played. Even though it’s possible to sync the video with the audio later on, there is no guarantee that this sync will hold — if the computer gets busy somewhere in the middle of rendering, lv-tool may stall for a short moment, and then continue with the sound played when it’s back. mplayer won’t wait, and lv-tool makes no effort to compensate for the lost frames — on the contrary, it should skip frames after stalling.
The stdin plugin
The idea behind the stdin plugin is so simple that I’m quite sure libvisual’s developers have actually written one, but didn’t add it to the distribution, to avoid confusion: All it does is read samples from stdin, and supply a portion of them as sound samples for rendering. As the “upload” method is called for every frame, it’s enough to make it consume the amount of sound samples that corresponds to the frame rate chosen when the raw video stream is converted into a clip.
The plugin can be added to libvisual’s project tree with this git patch. It’s made against the commit ID mentioned above, but it’s probably fine with later revisions. It doesn’t conform with libvisual’s coding style, I suppose — it’s a hack, after all.
Note that the patch is hardcoded for signed 16 bit, stereo at 44100 Hz, and produces 30 fps. This is easily modified in the #define statements at the top of the source. The audio samples are supplied to libvisual’s machinery in buffers of 4096 bytes each, even though 44100 x 2 x 2 / 30 = 5880 bytes are played per frame at 30 fps — it’s common not to supply all the audio samples that are played. The mplayer plugin supplies only 2048 bytes, for example. This has only a minor effect on the graphics.
After patching, re-run cmake, compilation and installation. Instead of reinstalling everything, the plugin can possibly be copied manually into the required directory:
# cp libinput_stdin.so /usr/local/lib/x86_64-linux-gnu/libvisual-0.5/input/
The plugin should appear on “lv-tool -p” after this procedure. And hopefully work too. ;)
Producing video
The blursk actor plugin is assumed here, but any can be used.
First, convert song to WAV:
$ ffmpeg -i song.mp3 song.wav
Note that this is somewhat dirty: I should have requested a raw audio stream with the desired attributes as output, and ffmpeg is capable of doing it. But the common WAV file is more or less that, except for the header, which is skipped quickly enough.
Just make sure the output is stereo, signed 16 bit, 44100 Hz or set ffmpeg’s flags accordingly.
Create graphics (monochrome):
$ lv-tool -i stdin -a blursk -D 640x480 -d stdout > with-stdin.bin < song.wav
Mixing video with audio and creating a DIVX clip:
$ ffmpeg -f rawvideo -s:v 640x480 -pix_fmt gray -r 30 -i with-stdin.bin -ab 128k -b 5000k -i song.wav -vcodec mpeg4 -vtag DIVX try.avi
Same, but with colors (note the -c 32 and -pix_fmt):
$ time lv-tool -i stdin -c 32 -a blursk -D 640x480 -d stdout > color.bin < song.wav
$ ffmpeg -f rawvideo -s:v 640x480 -pix_fmt rgb32 -r 30 -i color.bin -ab 128k -b 5000k -i song.wav -vcodec mpeg4 -vtag DIVX color.avi
It’s also possible to use “24” instead of “32” above, but some actors will produce a black screen with this setting. The same actors also fail with 8 bits (grayscale).
And to avoid large intermediate .bin files, pipe from lv-tool to ffmpeg directly:
$ lv-tool -i stdin -c 32 -a blursk -D 640x480 -d stdout < song.wav | ffmpeg -f rawvideo -s:v 640x480 -pix_fmt rgb32 -r 30 -i - -ab 128k -b 5000k -i song.wav -vcodec mpeg4 -vtag DIVX clip.avi
This is handy in particular for high-resolution frames (HD and such).
The try-all script
To scan through all actors in libvisual-0.5, run the following script (produces 720p video, or set “resolution”):
#!/bin/bash
song=$1
resolution=1280x720
for actor in blursk bumpscope corona gforce infinite jakdaw jess \
lv_analyzer lv_scope oinksie plazma ; do
lv-tool -i stdin -c 24 -a $actor -D $resolution -d stdout < $song | \
ffmpeg -f rawvideo -s:v $resolution -pix_fmt rgb24 -r 30 -i - \
-ab 128k -b 5000k -i $song -vcodec mpeg4 -vtag DIVX ${actor}_color.avi
lv-tool -i stdin -c 8 -a $actor -D $resolution -d stdout < $song | \
ffmpeg -f rawvideo -s:v $resolution -pix_fmt gray -r 30 -i - \
-ab 128k -b 5000k -i $song -vcodec mpeg4 -vtag DIVX ${actor}_gray.avi
done
It attempts both color and grayscale.
This relates to my Fedora 12 machine with a Logitech M705 mouse. The pointer movement had a generally bad feel to it, I would say.
This is actually written on this post already, with some more details on this one, but I prefer having my own routine and final values written down.
So first get a list of input devices:
$ xinput list
⎡ Virtual core pointer id=2 [master pointer (3)]
⎜ ↳ Virtual core XTEST pointer id=4 [slave pointer (2)]
⎜ ↳ Microsoft Microsoft 5-Button Mouse with IntelliEye(TM) id=6 [slave pointer (2)]
⎜ ↳ HID 04f3:0103 id=7 [slave pointer (2)]
⎜ ↳ Logitech USB Receiver id=9 [slave pointer (2)]
⎣ Virtual core keyboard id=3 [master keyboard (2)]
↳ Virtual core XTEST keyboard id=5 [slave keyboard (3)]
↳ Power Button id=12 [slave keyboard (3)]
↳ Power Button id=13 [slave keyboard (3)]
↳ USB AUDIO id=14 [slave keyboard (3)]
↳ HID 04f3:0103 id=8 [slave keyboard (3)]
↳ Logitech USB Receiver id=10 [slave keyboard (3)]
Then get the properties of the USB mouse. Since the string “Logitech USB Receiver” refers to a keyboard input as well as a mouse input, it has to be disambiguated with a “pointer:” prefix to the identifier. Or just use the ID (not safe in a script, though):
So
$ xinput list-props 9
and
$ xinput list-props pointer:"Logitech USB Receiver"
give the same result, given the list of input devices above.
The output:
$ xinput list-props pointer:"Logitech USB Receiver"
Device 'Logitech USB Receiver':
Device Enabled (131): 1
Device Accel Profile (264): 0
Device Accel Constant Deceleration (265): 1.000000
Device Accel Adaptive Deceleration (267): 1.000000
Device Accel Velocity Scaling (268): 10.000000
Evdev Reopen Attempts (269): 10
Evdev Axis Inversion (270): 0, 0
Evdev Axes Swap (272): 0
Axis Labels (273): "Rel X" (139), "Rel Y" (140)
Button Labels (274): "Button Left" (132), "Button Middle" (133), "Button Right" (134), "Button Wheel Up" (135), "Button Wheel Down" (136), "Button Horiz Wheel Left" (137), "Button Horiz Wheel Right" (138), "Button Side" (283), "Button Extra" (284), "Button Forward" (1205), "Button Back" (1206), "Button Task" (1207), "Button Unknown" (249), "Button Unknown" (249), "Button Unknown" (249), "Button Unknown" (249), "Button Unknown" (249), "Button Unknown" (249), "Button Unknown" (249), "Button Unknown" (249), "Button Unknown" (249), "Button Unknown" (249), "Button Unknown" (249), "Button Unknown" (249)
Evdev Middle Button Emulation (275): 2
Evdev Middle Button Timeout (276): 50
Evdev Wheel Emulation (277): 0
Evdev Wheel Emulation Axes (278): 0, 0, 4, 5
Evdev Wheel Emulation Inertia (279): 10
Evdev Wheel Emulation Timeout (280): 200
Evdev Wheel Emulation Button (281): 4
Evdev Drag Lock Buttons (282): 0
It turns out that the required change on my machine was
$ xinput set-prop pointer:"Logitech USB Receiver" "Device Accel Adaptive Deceleration" 3
This is not what I expected to do — it slows down the pointer’s movement when the mouse moves slowly. Surprisingly enough, this makes pointing more intuitive, because hitting that exact spot requires more physical motion, and the mouse doesn’t get stuck over those last few millimeters.
As the said post mentions, these settings won’t survive a session restart. But that’s a rare event on my computer. Anyhow, the method suggested for making it persistent is to add a small script as a startup application. To do this, prepare a small script doing the required setup, and add it as a startup script with
$ gnome-session-properties &
Or, maybe the correct way is to add/edit ~/.xinitrc or ~/.xprofile? I’ll figure that out when I log out next time (happens once in a few months…).
I discovered this problem in a project that instantiated a 512-bit wide FIFO many (>16) times in different modules. For some unknown reason (it’s called a bug, I suppose) Vivado treated the instantiation as if it wasn’t there, and optimized all the surrounding logic as if the black box’s output ports were all zero. With a lower number of instantiations, Vivado handled the instantiation as expected.
I should point out that Vivado did issue synthesis warnings related to the instantiation as if it recognized it (e.g. mismatches between port widths and the wires connected to them), and yet there was no trace of these instantiations in the post-synthesis netlist view.
In Vivado, cores from the IP Catalog are treated as black box modules: The IP is typically first compiled into a DCP, and then a black-box module (an empty Verilog module, for example) is used to represent it during the synthesis stage. The DCP is then fused into the design during implementation (much like ngdbuild did in ISE).
One clue that this happens takes the form of a critical warning from the implementation stage saying something like
CRITICAL WARNING: [Designutils 20-1280] Could not find module 'fifo_wide'. The XDC file /path/to/fifo_wide/fifo_wide/fifo_wide.xdc will not be read for any cell of this module.
Another way to tell that this has taken place is to look in the synthesis’ runme.log file (as in vivado-project/project.runs/synth_1/runme.log). The black boxes are listed in the “Report BlackBoxes” section, and their instantiations are counted in “Report Cell Usage”. So if the instantiated module doesn’t appear at all in the former, or not enough times in the latter — this is a clear indication that something went wrong.
Workaround
After trying out a lot of things, the workaround was to define two identical IP cores — fifo_wide_rd and fifo_wide_wr. The root of the problem seems to have been that the same FIFO was used in two different modules (one that writes data arriving from a DDR memory, and one that reads it). Due to the different usage contexts and the huge amount of logic involved, it seems like the tools messed up trying to optimize things.
So using one core for the write module and one for the read module got the tools back on track. There is of course no sensible reason to use different cores in different modules, other than working around a bug in Vivado.
I should mention that another FIFO is instantiated 20 times in the design, also from two different modules, and nothing bad happened there. However, its width is only 32 bits.
Failed attempt
The attempt below solves the problem at the synthesis stage, but not all the way through. It’s left here just for reference.
The simple solution is to tell Vivado not to attempt optimizing anything related to this module. For example, if the instance name is fifo_wide_inst, the following line in any of the XDC constraints files will do:
set_property DONT_TOUCH true [get_cells -hier -filter {name=~*/fifo_wide_inst}]
This should be completely harmless, as there’s nothing to optimize anyhow — the logic is already optimized inside the DCP. It may be a good idea to do this to all instantiations, just to be sure.
What actually happened with this constraint is that many groups of twenty BUF elements (not IBUF or anything like that; just BUF), named for example ‘bbstub_dout[194]__xx’ (xx going from 1 to 20), were created in the netlist. All had unconnected inputs, and the outputs of all twenty buffers were connected to the same net. So obviously, nothing good came out of this. The fifo_wide_inst block was nonexistent in the netlist, even though twenty instances of it appeared in the synthesis’ runme.log file.
So there were twenty groups of bbstubs for each of the 512 wires of the FIFO, and this applied to each of the twenty modules in which one of these FIFOs was instantiated. No wonder the implementation took a lot of time.