It’s often useful to know how much wall clock time SQL queries take. As with anything Perl, there’s more than one way to do it. This is a simple one, which involves overriding DBI’s execute() method, so that it measures the time and issues a warn() with basic caller info and the elapsed time in milliseconds.
The same thing can be done with any other method, of course. Go “man DBI” and look for “Subclassing the DBI” for the full story.
So the first thing is to set the RootClass attribute when creating the DBI object, so that MySubDBI is the root class. Something like
$dbh = DBI->connect( "DBI:mysql::localhost", "thedatabase", "thepassword",
{ RaiseError => 1, AutoCommit => 1, PrintError => 0,
RootClass => 'MySubDBI',
Taint => 1});
and then, somewhere, the class needs to be defined. This can be in a separate module .pm file, but also at the end of the same file as the code for the DBI->connect:
package MySubDBI;
use strict;
use DBI;
use vars qw(@ISA);
@ISA = qw(DBI);
package MySubDBI::db;
use vars qw(@ISA);
@ISA = qw(DBI::db);
# Empty, yet necessary.
package MySubDBI::st;
use Time::HiRes qw(gettimeofday tv_interval);
use vars qw(@ISA);
@ISA = qw(DBI::st);
sub execute {
my ($sth, @args) = @_;
my $tic = [gettimeofday];
my $res = $sth->SUPER::execute(@args);
my $exectime = int(tv_interval($tic)*1000);
my ($package0, $file0, $line0, $subroutine0) = caller(0);
my ($package1, $file1, $line1, $subroutine1) = caller(1);
warn("execute() call from $subroutine1 (line $line0) took $exectime ms\n");
return $res;
}
1;
The code can be smarter, of course. For example, issue a warning only if the query time exceeds a certain limit.
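For example, here’s a minimal sketch of such a variant (the 100 ms threshold is arbitrary, and reporting the statement handle’s Statement attribute is my own addition, not part of the code above). It goes in the MySubDBI::st package, instead of the execute() method shown above:

sub execute {
  my ($sth, @args) = @_;

  my $limit_ms = 100; # Arbitrary threshold, pick whatever makes sense

  my $tic = [gettimeofday];
  my $res = $sth->SUPER::execute(@args);
  my $exectime = int(tv_interval($tic)*1000);

  if ($exectime >= $limit_ms) {
    # caller(1) tells which of the script's subs made the execute() call
    my ($package, $file, $line, $subroutine) = caller(1);
    $subroutine = 'main program' unless (defined $subroutine);

    warn("Slow query ($exectime ms) in $subroutine: ".
         $sth->{Statement}."\n");
  }

  return $res;
}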
Introduction
At some point I needed to choose between using LIKE or REGEXP() for a not so simple string match. Without going into the details, the matching string contains a lot of wildcard segments, and while both would have done the job, I thought maybe REGEXP() would benefit from some extra information about the wildcard parts. It’s not that I cared that ‘%’ would match characters it shouldn’t, but I wanted to save some backtracking by telling the matching engine not to match just anything. Save some CPU, I thought. Spoiler: It was a nice thought, but no.
So I ran a few performance tests on a sample table:
CREATE TABLE product_info (
cpe_name BLOB NOT NULL,
title BLOB NOT NULL,
PRIMARY KEY (cpe_name(511))
) engine = MyISAM;
Note that cpe_name is the primary key, and is hence a unique index.
I should also mention that the match patterns I use for testing are practically useless for CPE name matching, because they don’t handle escaped “:” characters properly. Just in case you know what a CPE is and you’re here for that. In short, these are just performance tests.
I did this on MySQL server version: 5.1.47, Source distribution. There are newer versions around, I know. Maybe they do better.
The art of silly queries
So there are a bit more than half a million entries:
mysql> SELECT COUNT(*) FROM product_info;
+----------+
| COUNT(*) |
+----------+
| 588094 |
+----------+
1 row in set (0.00 sec)
Let’s ask it another way:
mysql> SELECT COUNT(*) FROM product_info WHERE cpe_name LIKE '%';
+----------+
| COUNT(*) |
+----------+
| 588094 |
+----------+
1 row in set (0.08 sec)
Say what? LIKE ‘%’ is always true for a non-NULL BLOB. MySQL didn’t optimize this simple thing, and actually checked every single entry?
mysql> EXPLAIN SELECT COUNT(*) FROM product_info;
+----+-------------+-------+------+---------------+------+---------+------+------+------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+------+------+------------------------------+
| 1 | SIMPLE | NULL | NULL | NULL | NULL | NULL | NULL | NULL | Select tables optimized away |
+----+-------------+-------+------+---------------+------+---------+------+------+------------------------------+
1 row in set (0.00 sec)
mysql> EXPLAIN SELECT COUNT(*) FROM product_info WHERE cpe_name LIKE '%';
+----+-------------+--------------+------+---------------+------+---------+------+--------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------+------+---------------+------+---------+------+--------+-------------+
| 1 | SIMPLE | product_info | ALL | NULL | NULL | NULL | NULL | 588094 | Using where |
+----+-------------+--------------+------+---------------+------+---------+------+--------+-------------+
1 row in set (0.00 sec)
Apparently it did. Other silly stuff:
mysql> SELECT COUNT(*) FROM product_info WHERE NOT cpe_name IS NULL;
+----------+
| COUNT(*) |
+----------+
| 588094 |
+----------+
1 row in set (0.08 sec)
mysql> EXPLAIN SELECT COUNT(*) FROM product_info WHERE NOT cpe_name IS NULL;
+----+-------------+--------------+------+---------------+------+---------+------+--------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------+------+---------------+------+---------+------+--------+-------------+
| 1 | SIMPLE | product_info | ALL | PRIMARY | NULL | NULL | NULL | 588094 | Using where |
+----+-------------+--------------+------+---------------+------+---------+------+--------+-------------+
1 row in set (0.00 sec)
Silly or what? The column is defined as “NOT NULL”. What is there to check? So maybe the idea is that if I make stupid queries, MySQL responds with stupid behavior. Well, not really:
mysql> SELECT COUNT(*) FROM product_info WHERE 1=1;
+----------+
| COUNT(*) |
+----------+
| 588094 |
+----------+
1 row in set (0.00 sec)
mysql> EXPLAIN SELECT COUNT(*) FROM product_info WHERE 1=1;
+----+-------------+-------+------+---------------+------+---------+------+------+------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+------+------+------------------------------+
| 1 | SIMPLE | NULL | NULL | NULL | NULL | NULL | NULL | NULL | Select tables optimized away |
+----+-------------+-------+------+---------------+------+---------+------+------+------------------------------+
1 row in set (0.00 sec)
It’s more like MySQL makes optimizations only if they’re really obvious. MySQL != gcc.
LIKE vs REGEXP
So it turns out that LIKE is really fast, and it seems to take advantage of the index:
mysql> SELECT cpe_name FROM product_info WHERE cpe_name LIKE 'cpe:2.3:a:hummingbird:cyberdocs:-%';
+-------------------------------------------------+
| cpe_name |
+-------------------------------------------------+
| cpe:2.3:a:hummingbird:cyberdocs:-:*:*:*:*:*:*:* |
+-------------------------------------------------+
1 row in set (0.00 sec)
The same query, only with a regex:
mysql> SELECT cpe_name FROM product_info WHERE cpe_name REGEXP '^cpe:2.3:a:hummingbird:cyberdocs:-.*';
+-------------------------------------------------+
| cpe_name |
+-------------------------------------------------+
| cpe:2.3:a:hummingbird:cyberdocs:-:*:*:*:*:*:*:* |
+-------------------------------------------------+
1 row in set (0.21 sec)
Recall that indexing means that the rows are sorted. Because the initial part of the string is fixed, it’s possible to narrow down the number of rows to match with an index lookup, and then work on those that match that initial part. This was taken advantage of with the LIKE match, but apparently not with REGEXP():
mysql> EXPLAIN SELECT cpe_name FROM product_info WHERE cpe_name LIKE 'cpe:2.3:a:hummingbird:cyberdocs:-%';
+----+-------------+--------------+-------+---------------+---------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------+-------+---------------+---------+---------+------+------+-------------+
| 1 | SIMPLE | product_info | range | PRIMARY | PRIMARY | 513 | NULL | 1 | Using where |
+----+-------------+--------------+-------+---------------+---------+---------+------+------+-------------+
1 row in set (0.00 sec)
mysql> EXPLAIN SELECT cpe_name FROM product_info WHERE cpe_name REGEXP '^cpe:2.3:a:hummingbird:cyberdocs:-.*';
+----+-------------+--------------+------+---------------+------+---------+------+--------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------+------+---------------+------+---------+------+--------+-------------+
| 1 | SIMPLE | product_info | ALL | NULL | NULL | NULL | NULL | 588094 | Using where |
+----+-------------+--------------+------+---------------+------+---------+------+--------+-------------+
1 row in set (0.00 sec)
The conclusion is quite clear: For heavy duty matching, don’t use REGEXP() if LIKE can do the job. In particular if a lot of rows can be ruled out by virtue of the first characters in the string.
Making it harder for LIKE
Let’s warm up a bit:
mysql> SELECT cpe_name FROM product_info WHERE cpe_name LIKE 'cpe:2.3:a:humming%:cyberdocs:-%';
+-------------------------------------------------+
| cpe_name |
+-------------------------------------------------+
| cpe:2.3:a:hummingbird:cyberdocs:-:*:*:*:*:*:*:* |
+-------------------------------------------------+
1 row in set (0.00 sec)
It was quicker than measurable, mainly because there was little to do:
mysql> EXPLAIN SELECT cpe_name FROM product_info WHERE cpe_name LIKE 'cpe:2.3:a:humming%:cyberdocs:-%';
+----+-------------+--------------+-------+---------------+---------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------+-------+---------------+---------+---------+------+------+-------------+
| 1 | SIMPLE | product_info | range | PRIMARY | PRIMARY | 513 | NULL | 80 | Using where |
+----+-------------+--------------+-------+---------------+---------+---------+------+------+-------------+
1 row in set (0.00 sec)
As expected, the index was used for the initial part of the string, and that left MySQL with 80 rows to actually do the matching (in this specific case). So it was quick.
But what if the wildcard is at the beginning of the string?
mysql> SELECT cpe_name FROM product_info WHERE cpe_name LIKE '%cpe:2.3:a:hummingbird:cyberdocs:-%';
+-------------------------------------------------+
| cpe_name |
+-------------------------------------------------+
| cpe:2.3:a:hummingbird:cyberdocs:-:*:*:*:*:*:*:* |
+-------------------------------------------------+
1 row in set (0.10 sec)
So that took some considerable time, though still less than the REGEXP() case. The index didn’t help in this case, it seems:
mysql> EXPLAIN SELECT cpe_name FROM product_info WHERE cpe_name LIKE '%cpe:2.3:a:hummingbird:cyberdocs:-%';
+----+-------------+--------------+------+---------------+------+---------+------+--------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------+------+---------------+------+---------+------+--------+-------------+
| 1 | SIMPLE | product_info | ALL | NULL | NULL | NULL | NULL | 588094 | Using where |
+----+-------------+--------------+------+---------------+------+---------+------+--------+-------------+
1 row in set (0.00 sec)
So apparently, the pattern was applied to all rows. It was still faster than REGEXP() performing a very simple match, so the latter seems not to be optimized in MySQL.
And to wrap this up, an expression full of wildcards, similar to the one I thought REGEXP() might do better with:
mysql> SELECT COUNT(*) FROM product_info WHERE cpe_name LIKE 'cpe:2.3:a:%:%:%:-:%:%:%:%:%:%';
+----------+
| COUNT(*) |
+----------+
| 13205 |
+----------+
1 row in set (0.11 sec)
mysql> EXPLAIN SELECT COUNT(*) FROM product_info WHERE cpe_name LIKE 'cpe:2.3:a:%:%:%:-:%:%:%:%:%:%';
+----+-------------+--------------+------+---------------+------+---------+------+--------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------+------+---------------+------+---------+------+--------+-------------+
| 1 | SIMPLE | product_info | ALL | PRIMARY | NULL | NULL | NULL | 588094 | Using where |
+----+-------------+--------------+------+---------------+------+---------+------+--------+-------------+
1 row in set (0.00 sec)
So what happened here is that the initial part of the string didn’t help much, and the match seems to have been done on all rows. It took more or less the same time as the much simpler match pattern above.
Conclusion
There seem to be two main conclusions from this little set of experiments. The first one isn’t surprising: The most important factor is how many rows are being accessed, not so much what is done with them. And the second is that MySQL does some sensible optimizations when LIKE is used; in particular, it narrows down the number of rows with the index, when possible. Something it won’t do with REGEXP(), even with a “^” at the beginning of the regex.
Introduction
When creating indexes on TEXT / BLOB columns (and their variants), it’s required to specify how many characters the index should cover. MySQL’s docs usually suggest keeping them short for better performance. There’s also a limit on the number of characters, which varies from one storage engine to another, ranging from a few hundred to a few thousand characters.
However it’s not unusual to use UNIQUE INDEX for making the database enforce the uniqueness of a field in the table. ON DUPLICATE KEY, INSERT IGNORE, UPDATE IGNORE and REPLACE can then be used to gracefully keep things tidy.
But does UNIQUE INDEX mean that the entire field remains unique, or is only the part covered by the index checked? Spoiler: Only the part covered by the index. In other words, MySQL’s uniqueness enforcement may be too strict.
Almost needless to say, if the index isn’t UNIQUE, it’s just a performance issue: If the number of covered characters is small, the database will more often fetch and discard the data of a row that turns out not to match the search criteria. But it doesn’t change the logical behavior. With UNIQUE INDEX, the number of characters does.
A simple test follows.
Trying the UNIQUE INDEX
I created a table with MySQL 5.1.47, with an index covering only 5 chars. Deliberately very short.
CREATE TABLE try (
id INT UNSIGNED NOT NULL AUTO_INCREMENT,
message BLOB NOT NULL,
PRIMARY KEY (id),
UNIQUE INDEX (message(5))
) engine = MyISAM;
Which ends up with this:
mysql> DESCRIBE try;
+---------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+---------+------------------+------+-----+---------+----------------+
| id | int(10) unsigned | NO | PRI | NULL | auto_increment |
| message | blob | NO | UNI | NULL | |
+---------+------------------+------+-----+---------+----------------+
2 rows in set (0.00 sec)
Inserting the first row:
mysql> INSERT INTO try(message) VALUES('Hello, world');
Query OK, 1 row affected (0.00 sec)
And now trying a second:
mysql> INSERT INTO try(message) VALUES('Hello there');
ERROR 1062 (23000): Duplicate entry 'Hello' for key 'message'
That’s it. It just looked at the first five chars. Trying a difference within this region:
mysql> INSERT INTO try(message) VALUES('Hell there');
Query OK, 1 row affected (0.00 sec)
No wonder, that worked.
mysql> SELECT * FROM try;
+----+--------------+
| id | message |
+----+--------------+
| 1 | Hello, world |
| 2 | Hell there |
+----+--------------+
2 rows in set (0.00 sec)
Handling large TEXT/BLOB
And that leaves the question: How is it possible to ensure uniqueness on large chunks of text or binary data? One solution I can think of is to add a column for a hash (say SHA1), and let the application calculate that hash for each row it inserts, and insert it along with it. And make the UNIQUE INDEX on the hash, not the text. Something like
CREATE TABLE try2 (
id INT UNSIGNED NOT NULL AUTO_INCREMENT,
message BLOB NOT NULL,
hash TINYBLOB NOT NULL,
PRIMARY KEY (id),
UNIQUE INDEX (hash(40))
) engine = MyISAM;
But wait. MySQL supports hashing functions. Why not use them instead? Well, the problem is that if I want an INSERT statement to push the data and its hash in one go, the query becomes a bit nasty. What came to my mind is:
mysql> INSERT INTO try2(message, hash) (SELECT t.x, SHA1(t.x) FROM (SELECT 'Hello there' AS x) AS t);
Query OK, 1 row affected (0.00 sec)
Records: 1 Duplicates: 0 Warnings: 0
Can it get more concise than this? Suggestions are welcome. The double SELECT is required because I want the string literal to be mentioned once.
Isn’t it easier to let the application calculate the SHA1, and send it to the server by value? It’s a matter of taste, I guess.
Anyhow, trying again with exactly the same:
mysql> INSERT INTO try2(message, hash) (SELECT t.x, SHA1(t.x) FROM (SELECT 'Hello there' AS x) AS t);
ERROR 1062 (23000): Duplicate entry '726c76553e1a3fdea29134f36e6af2ea05ec5cce' for key 'hash'
and with something slightly different:
mysql> INSERT INTO try2(message, hash) (SELECT t.x, SHA1(t.x) FROM (SELECT 'Hello there!' AS x) AS t);
Query OK, 1 row affected (0.00 sec)
Records: 1 Duplicates: 0 Warnings: 0
So yep, it works:
mysql> SELECT * FROM try2;
+----+--------------+------------------------------------------+
| id | message | hash |
+----+--------------+------------------------------------------+
| 1 | Hello there | 726c76553e1a3fdea29134f36e6af2ea05ec5cce |
| 2 | Hello there! | 6b19cb3790b6da8f7c34b4d8895d78a56d078624 |
+----+--------------+------------------------------------------+
2 rows in set (0.00 sec)
Once again, even though the example I showed demonstrates how to make MySQL calculate the hash, I would do it in the application.
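For reference, this is more or less what the application-side approach looks like in Perl (a minimal sketch; the connection details are placeholders, and Digest::SHA is part of the standard Perl distribution):

use strict;
use warnings;
use DBI;
use Digest::SHA qw(sha1_hex);

# Placeholder connection details, adjust to the actual database
my $dbh = DBI->connect('DBI:mysql:thedatabase:localhost', 'theuser', 'thepassword',
                       { RaiseError => 1, AutoCommit => 1, PrintError => 0 });

my $message = 'Hello there';

# Calculate the SHA1 in the application, and send both values with
# placeholders, so the string literal appears only once
my $sth = $dbh->prepare('INSERT INTO try2 (message, hash) VALUES (?, ?)');
$sth->execute($message, sha1_hex($message));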
To make it short, the command at the shell prompt is
$ perl -MMIME::QuotedPrint -e 'local $/; $x=<>; print decode_qp($x)' < quoted.txt > unquoted.html
and I needed this to extract an HTML segment of an email.
As Perl allows for complicated data structures by virtue of references to hashes and arrays, it’s often useful to look into what’s going on there. In my case, it was on the output of a large JSON parse.
So to make a long story short, if $tree is a reference to the data structure, go
use Data::Dumper;
$Data::Dumper::Terse = 1;
$Data::Dumper::Purity = 1;
$Data::Dumper::Sortkeys = 1;
print Dumper $tree;
Noticed how many flags? Data::Dumper isn’t always ideal for pretty-printing (there are a few alternatives) but it wins mainly because it’s part of the commonly installed Perl libraries. One of its drawbacks, which is also its advantage, is that its output is Perl code that reconstructs the data structure. Which means that it fusses over accuracy, in particular if the data structure contains blessed references.
So I definitely miss a Data::Dumper::Hooman or something of that sort.
And even more annoyingly, if it meets a complicated value more than once (e.g. a blessed ref), it puts a reference to the first appearance of the same value in the following occurrences. Which is efficient, maybe, but doesn’t help for reading by a human.
So to the flags.
The Sortkeys flag is recommended for human reading (as well as diffs) for obvious reasons.
The Terse flag makes sure that values are dumped as literals and not referenced. For this to work, the Purity flag was also necessary in my case, or the Terse flag was simply ignored silently. The Purity flag adds a lot of assignments at the end of the dump to fix inaccuracies.
The problem was that one of the fields was blessed as JSON::PP::Boolean, so the dump read
'acInsufInfo' => bless( do{\(my $o = 0)}, 'JSON::PP::Boolean' ),
and then it was referenced over and over again. With the Purity flag, the references appeared at the end of the dump, to correct the inaccurate (non-blessed) assignment before. It seems like without it, Dumper refused to respect the Terse flag for these, because it would break the concept that the dump can be executed to reconstruct the original.
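For reference, this minimal, self-contained sketch reproduces the situation (the JSON snippet is made up for the example; JSON::PP ships with any recent Perl, and its booleans come back as blessed JSON::PP::Boolean references):

use strict;
use warnings;
use JSON::PP;
use Data::Dumper;

my $tree = JSON::PP->new->decode(
  '{ "acInsufInfo" : false, "obtainAllPrivilege" : false }'
);

$Data::Dumper::Terse = 1;
$Data::Dumper::Purity = 1;
$Data::Dumper::Sortkeys = 1;

print Dumper $tree;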
Closing note: It’s quite unusual that mainstream Perl libraries behave in a quirky way that actually needs explanation.
These are just a few notes to self on this tool.
- Version 2.10 is a big deal: It’s the first one to use GPU acceleration for scaling and rotation, as well as for deformations of different types. No more small previews, no more sluggish response to scaling. No need to scale down images to work on them because everything was too slow.
- The default theme (dark) is a joke. It’s elegant, but it’s difficult to see the differences between dark grey and lighter grey, which is the difference between a layer lock being enabled or not. I went for “system”, which got me Mint colors and nice icon spacing. Doesn’t look very elegant, but hey, I now know what I’m doing.
- The default icons are a joke as well. Very sleek, but the colors are gone. Once again, it’s elegance vs. functionality. There’s a “color” option for the icon. Looks like a children’s toolbox now, but it’s much easier to find the correct tool.
Linked layers
First and foremost: Layer groups are most likely the preferred solution. In particular, I used to link a layer that has a layer mask with its duplicate that has the same layer mask applied, in order to resize / rotate the layer along with its layer mask. This isn’t necessary anymore: Just put the layer with a layer mask inside a group, and manipulate the group instead of the layer. This is quicker than the duplicate, apply and link method, but there’s one feature still lacking: The ability to crop the layer to which the mask has been applied, and manipulate a smaller area.
Linked layers don’t seem to work on 2.10.14 — this seems to be a bug: Say, when linking two layers and scaling / rotating one of them, nothing happens with the other.
Solution / workaround: There are two options I know of:
- Don’t use linked layers: Put the relevant layers in a layer group, and do the scaling / rotation on the group. But that doesn’t work well for manipulating a small layer in order to apply the changes on a larger one.
- Link between a layer group (possibly with one layer) and the layers that are affected by it. This really looks like a bug workaround: Linking layers to a layer group will make the changes to the layer group apply to all other layers. But only if the manipulations are made to the layer group. Which is more or less fine, because the normal way to work is to manipulate some specific layer, and the others follow. So put that layer in a group of its own, and link the others with the group.
Introduction
As the title implies, this post compares two solutions for connecting an FPGA to a host via USB 3.0: Cypress’ FX3 chipset, which has been around since around 2010, and the XillyUSB IP core, which was released in November 2020.
Cypress has been acquired by Infineon, but I’ll stick with Cypress. It’s not clear if the products are going to be re-branded (like Intel did with Altera, for example).
Since I’m openly biased towards XillyUSB, let’s be fair and start with its disadvantages. The first and obvious one is how little time it has been around, compared with the FX3. Another thing is that XillyUSB won’t fall back to USB 2.0 if a USB 3.0 link fails to establish. This fallback option is important in particular because computers’ USB 3.x ports are sometimes of low quality, so even though the user expected to benefit from USB 3.x speed, the possibility of plugging the device into a non-USB 3.x port can save the day.
This is however relevant only for applications that are still useful with USB 2.0, e.g. hard disks, USB sticks and Ethernet adapters — these still work, but do benefit from a faster connection when possible. If the application inherently needs payload speeds above 25 MBytes/s, it’s USB 3.0 or perish.
Thirdly, XillyUSB requires an FPGA with an MGT supporting 5 Gb/s. Low-cost FPGAs don’t have one. But from a BOM cost point of view, odds are that upgrading the FPGA costs less than adding the FX3 device along with its supporting components.
Finally, a not completely related comment: USB is good for hotpluggable, temporary connections. If a fixed link is required between an FPGA and some kind of computer, PCIe is most likely a better choice, possibly using Xillybus’ IP core for PCIe. Compared with USB 2.0, it might sound like a scary option, and PCIe isn’t always supported by embedded devices. But if USB 3.x is an option, odds are that PCIe is too. And a better one, unless hotplugging is a must.
FX3: Another device, another processor, another API and SDK
XillyUSB is an IP core, and hence resides side-by-side with the application logic on the FPGA. It requires a small number of pins for its functionality: Two differential wire pairs to the USB connector, and an additional pair of wires to a low-jitter reference clock. A few GPIO LEDs are recommended for status indications, but are not mandatory. The chances for mistakes in the PCB design are therefore relatively slim.
By contrast, using the FX3 requires following a 30+ pages hardware design application note (Cypress’ AN70707) to ensure proper operation of that device. As for FPGA pin consumption, a minimum of 40 pins is required to attain 400 MB/s of data exchange through a slave FIFO (e.g. 200 MB/s in each direction, half the link capacity), since the parallel data clock is limited to 100 MHz.
It doesn’t end there: The FX3 contains an ARM9 processor for which firmware must be developed. This firmware may produce USB traffic by itself, or configure the device to expose a slave FIFO interface for streaming data from and to the FPGA. One way or another, code for the ARM processor needs to be developed in order to carry out the desired configuration, at a minimum.
This is done with Cypress’ SDK and based upon coding examples, but there’s no way around this extra firmware task, which requires detailed knowledge of how the device works. For example, to turn off the FX3’s LPM capability (which is a good idea in general), the CyU3PUsbLPMDisable() API function should be called. And there are many more of this sort.
Interface with application logic in the FPGA
XillyUSB follows Xillybus’ paradigm regarding interface with application logic: There’s a standard synchronous FIFO between the application logic and the XillyUSB IP core for each data stream, and the application logic uses it mindlessly: For an FPGA-to-host stream, the application logic just pushes the data into the FIFO (checking that it’s not full), knowing it will reach the host in a timely manner. For the opposite direction, it reads from the FIFO when it’s non-empty.
In other words, the application logic interfaces with these FIFOs like FPGA designers are used to, for the sake of streaming data between different functional modules in a design. There is no special attention required because the destination or source of the data is a USB data link.
The FX3’s slave FIFO interface may sound innocent, but it’s a parallel data and control signal interface, allowing the FPGA to issue read and write commands on buffers inside the FX3. This requires developing logic for a controller that interfaces with the slave FIFO interface: Selecting the FX3 buffer to work with, sensing its full or empty status (depending on the direction) and transferring data over this synchronous interface. If more than one data stream is required between the FPGA and the host, this controller also needs to perform scheduling and multiplexing. State machines, buffering of data, arbitration, the whole thing.
Even though a controller of this sort may seem trivial, it’s often this type of logic that is exposed to corner cases regarding the flow of data: The typical randomness of data availability on one side and of the ability to receive it on the other creates scenarios that are difficult to predict, simulate and test. Obtaining a bulletproof controller of this sort is therefore often significantly more difficult than designing one for a demo.
When working with XillyUSB (or any other Xillybus IP core), the multiplexing is done inside the IP core: Designed, tested and fine-polished once and for all. And this opens the door to another advantage: Making changes to the data stream settings and adding streams to an existing design is simple and doesn’t jeopardize the stability of the already existing logic. Thanks to Xillybus’ IP Core Factory, this only requires some simple operations on the website and downloading the new IP core. Its deployment in the FPGA design merely consists of replacing files, making trivial changes in the HDL following a template, and adding a standard FPGA FIFO for the new stream. Nothing else in the logic design changes, so there are no side effects.
Host software design
The FX3’s scope in the project is to present a USB device; the host driver has to be written more or less from scratch. So the host software, whether a kernel driver or a libusb user-space implementation, must be written with USB transfers as the main building block. For a reasonable data rate (or else why USB 3.0?), the software design must be asynchronous: Requests are queued for submission, and completion functions are called when these requests are finished. The simple wait-until-done method doesn’t work, because it leads to long time gaps of no communication on the USB link. Aside from the obvious impact on bandwidth utilization, this is likely to cause overflows or underflows in the FPGA’s buffers.
With XillyUSB (and once again, with other Xillybus IP cores too), a single, catch-all driver presents pipe-like device files. Plain command-line utilities like “cat” and “dd” can be used to implement reliable and practical data acquisition and playback. The XillyUSB IP core and the dedicated driver use the transfer-based USB protocol for creating an abstraction of a simple, UNIX-like data stream.
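For illustration, this is roughly what reading an FPGA-to-host stream looks like from a Perl script (a minimal sketch; the device file name below is made up, as the actual name depends on the IP core’s configuration):

use strict;
use warnings;

my $devfile = '/dev/xillyusb_read_32'; # Hypothetical name, for illustration

open(my $fd, '<:raw', $devfile)
  or die("Failed to open $devfile: $!\n");

while (1) {
  my $count = read($fd, my $data, 4096); # Blocks until data arrives
  die("read() failed: $!\n") unless (defined $count);
  last if ($count == 0); # EOF from the FPGA side

  # ... process $data here, just like data read from any file or pipe ...
}

close($fd);

Nothing in this loop hints that a USB link is involved, which is exactly the point.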
FPGA application logic: USB transfers or continuous data?
The USB specification was written with well-defined transfers in mind. The underlying idea was that the host allocates a buffer and queues a data transfer request, related to a certain USB endpoint, to or from that buffer. For continuous communication, several transfers can be queued. Yet, there are data buffers of fixed size, each waiting for its turn.
Some data sinks and sources are naturally organized in defined chunks of data, and fit USB’s concept well. From a software design’s point of view, it’s simpler to comprehend a mechanism that relies on fixed-sized buffers, requests and fulfillments.
But then, what is natural in an FPGA design? In most applications, continuous, non-packetized data is the common way. Even video applications, where there’s a clear boundary between frames, are usually implemented with regular FIFOs between the internal logic blocks. With XillyUSB, this is the way the data flows: FIFOs on the FPGA and pipe-like device files on the host side.
With FX3, on the other hand, the USB machinery needs direct attention. For example: When transmitting data towards the host, the FX3’s slave FIFO interface requires asserting PKTEND# in order to commit the data to the host; a zero-length packet may also be issued instead. This complication is necessary to maintain USB’s concept of a transfer: Sending a USB DATA packet shorter than the maximal allowed length tells the host that the transfer is finished, even if the buffer that was allocated for the transfer isn’t filled. Therefore, the FX3 can’t just send whatever data it has in the buffer because it has nothing better to do. Doing so would terminate the transfer, which can mean something in the protocol between the driver and its device.
But then, if the transfer request buffer’s size isn’t a multiple of the maximal USB DATA packet size (1024 bytes for USB 3.0), PKTEND# must be asserted before this buffer fills, or a USB protocol error occurs, as the device sends more data than can be stored. The USB protocol doesn’t allow the leftovers to be stored in the next queued transfer’s buffer, and it’s not even clear if such a transfer is queued.
If this example wasn’t clear because of too much new terminology, no problem, that was exactly the point: The USB machinery one needs to be aware of.
Physical link diagnostics
As a USB device can be connected to a wide range of USB host controllers, on various motherboards, through a wide range of USB cables, the quality of the bitstream link may vary. On a good day it’s completely error-free, but sometimes it’s a complete mess.
Low-level errors don’t necessarily cause immediate problems, and sometimes the visible problems don’t look like a low-level link issue. The USB protocol is designed to keep the show running to the extent possible (retransmits and whatnot), so what appears to be occasional problems with a USB device could actually be a bad link all the time, with random clusters of mishaps that make the problem become visible, every now and then.
Monitoring the link’s health is therefore beneficial, both in a lab situation and no less in a product. The application software can collect error event information, and warn the user that even though all seems well, it’s advisable to try a different USB port or cable. Sometimes, that’s all it takes.
XillyUSB provides a simple means for telling that something is wrong. There’s an output from the IP core, intended for a plain LED, which flashes briefly for each error event that is detected. There are more detailed LED outputs as well. Also, the XillyUSB driver creates a dedicated device file, from which diagnostic data can be read with a simple file operation. This diagnostic data chunk mainly consists of event counters for different error situations, which can be viewed with a utility that is downloaded along with XillyUSB’s driver for Linux. Likewise, a simple routine in an application suite can perform this monitoring for the sake of informing users about a problematic hardware setting.
Cypress’ FX3 does provide some error information of this sort, however it is exposed only to the ARM processor inside the device itself. The SDK supplies functions such as CyU3PUsbInitEventLog() for enabling event logging and CyU3PUsbGetErrorCounts() for obtaining error counts, but it’s the duty of the ARM’s firmware to transfer this data to the host. And then some kind of driver and utility are needed on the host as well.
The documentation for error counting is somewhat minimal, but looking at the definition of LNK_PHY_ERROR_CONF in the EZ-USB FX3 Technical Reference Manual helps.
Bugs and Errata
As always when evaluating a component for use, it’s suggested to read through the errata section in FX3’s datasheet. In particular, there’s a known problem causing errors in payload data towards the host, for which there is no planned fix. It occurs when a Zero Length Packet is followed by data “very quickly”, i.e. within a microframe of 125 μs.
So first, 125 μs isn’t “very quickly” in USB 3.0 terms. It’s the time corresponding to 62.5 kBytes of raw bandwidth on the link, which is a few dozen DATA IN packets. Second, a zero length packet is something that is sent to finish a USB transfer. One can avoid it in some situations, but not in others. For example, if the transfer’s length is a multiple of 1024 bytes, the only way to finish it explicitly is with a zero length packet. The said errata requires not sending any data for 125 μs after such an event, or there will be data errors.
This doesn’t just make the controller more complicated; there’s also a significant bandwidth penalty.
It may not be worth much to say that XillyUSB doesn’t have any bug of this sort, as it has been extensively tested with randomized data sources and sinks. It’s in fact quite odd that Cypress apparently didn’t perform tests of this sort (or they would have caught that bug easily).
The crucial difference is however that bugs in an IP core can be fixed and deployed quickly. There is no new silicon device to release, and no need to replace a physical device on the PCB.
No design is born perfect. The question is to what extent the issues that arise are fixed.
A tarball is the common way to convey several files on UNIX systems. But because tar was originally intended for backup, it stores not only the permission information, but also the owner and group of each file. Try listing the content of a tarball with e.g.
$ tar -tzvf thestuff.tar.gz
Note the “v” flag that goes along with the flag for listing, “t”: It causes tar to print out ownership and permission information.
This doesn’t matter much if the tarball is extracted as a non-root user on the other end, because tar doesn’t set the user and group ID in that case: The extracted files get the uid/gid of the process that extracted them.
However, if the user at the other end extracts the tarball as root, the original uid/gid is assigned, which may turn out confusing.
To avoid this, tell tar to assign user root to all files in the archive. This makes no difference if the archive is extracted by a non-root user, but sets the ownership to root if extracted by root. In fact, it sets the ownership to the extracting user in both cases, which is what one would expect.
So this is the command to use to create an old-school .tar.gz tarball:
$ tar --owner=0 --group=0 --mode='og-w' -czf thestuff.tar.gz thestuff
Note that you don’t have to be root to do this. You’re just creating a plain file with your own ownership. It’s extracting these files as root that requires root permissions (if so desired).
The --mode part turns off write permission for everyone except the user, so plain files get 0644 (at most).
After making a lot of whitespace reorganization in a kernel module (indentation, line breaks, fixing things reported by sparse and checkpatch), I wanted to make sure I didn’t really change anything. All edits were of the type that the compiler should be indifferent about, but how can I be sure I didn’t change anything accidentally?
It would have been nice if the compiler’s object files were identical before and after the changes, but that doesn’t happen. So instead, let’s hope it’s enough to verify that the executable assembly code didn’t change, and neither did the string literals.
The idea is to make a disassembly of the executable part and dump the part that contains the literal strings, and output everything into a single file. Do that before and after the changes (git helps here, of course), and run a plain diff on the couple of files.
Which boils down to this little script:
#!/bin/bash
objdump -d $1
objdump -s -j .rodata -j .rodata.str1.1 $1
and run it on the compiled module, e.g.
$ ./regress.sh themodule.ko > original.txt
The script first makes the disassembly, and then makes a hex dump of two sections in the ELF file. Most interesting is the .rodata.str1.1 section, which contains the string literals. That’s the name of this section on a v5.7 kernel, anyhow.
Does it cover everything? Can I be sure that I did nothing wrong if the outputs before and after the changes are identical? I don’t really know. I know for sure that it detects the smallest change in the code, as well as a change in any error message string I had (and that’s where I made a lot of changes), but maybe there are some accidents that this check doesn’t cover.
As for how I found the names of the sections: Pretty much trying them all. The list of sections in the ELF file can be found with
$ readelf -S themodule.ko
However only those marked with PROGBITS type can be dumped with objdump -s (or more precisely, will be found with the -j flag). I think. It’s not like I really understand what I’m doing here.
Bottom line: This check is definitely better than nothing.
Leave no leftover children
One of the really tricky things about a Perl script that forks one way or another is how to make sure that the children vanish after the parent has exited. This is an issue whether the children were created with a fork() call or with a safe pipe, as with
my $pid = open(my $fd, '-|');
It may seem to work fine when the main script is terminated with a CTRL-C. The children will indeed vanish. But try killing the main script with a “kill” command, and the parent dies, but the children remain alive and kicking.
The Linux-only solution is
use Linux::Prctl;
and then, in the part of the script that runs as a child, do
Linux::Prctl::set_pdeathsig(9);
immediately after the branch between parent and child. This tells Linux to send a SIGKILL to the process that made this call (i.e. the child) as soon as the parent exits. One might be more gentle with a SIGTERM (number 15). But the idea is the same. Parent is away, get the hammer.
To get the Perl module:
# apt install liblinux-prctl-perl
And BTW, SIGPIPE doesn’t help here, even if there’s a pipe between the two processes: It’s delivered only when the child process attempts to write to a pipe that is closed on the other end. If it doesn’t, the broken pipe is never sensed. And if the child is on the reading side, there’s no SIGPIPE at all — the pipe just gives an EOF when the data is exhausted.
The pdeathsig can of course be used in non-Perl programs as well. This is the Perl example.
Multiple safe pipes
When a process generates multiple children, there’s a problem with the fact that the children inherit the already open file descriptors. For example, when the main script creates multiple children by virtue of safe pipes for read (calling open(my $fd, '-|') repeatedly, so the children write and the parent reads): Looking at /proc/PID/fd of the children, it’s clear that they have a lot of pipes open that they have nothing to do with.
This prevents the main script (the parent), as well as some of the children, from terminating, even after either side calls exit() or die(). These processes don’t turn into zombies, but remain plain unterminated processes in the stopped state. At least that’s how it turned out on my Perl v5.26.1 on an x86_64 Linux machine.
The problem in this case occurs when a pipe has pending data as the main script attempts to terminate, for example by virtue of a print to STDOUT in the child (which is redirected to the pipe going to the parent). This is problematic, because the child process will attempt to write the remaining data just before quitting (STDOUT is flushed). The process will block forever on this write() call. Since the child doesn’t terminate, the parent process blocks on wait(), and doesn’t terminate either. It’s a deadlock. Even if close() isn’t called explicitly in the main script, the automatic file descriptor close before termination will behave exactly the same: It waits for the child process.
What usually happens in this situation is that when the parent closes the file descriptor, it sends a SIGPIPE to the child. The blocking write() returns as a result with an EPIPE status (Broken pipe), and the child process terminates. This allows the parent’s wait() to reap the child, and the parent process can continue.
And here’s the twist: If the file descriptor belongs to several processes after forking, SIGPIPE is sent to the child only when the last copy of the file descriptor is closed. As a result, when the parent process attempts to close one of its pipes, SIGPIPE isn’t sent if the other children haven’t closed their copies of the same pipe file descriptor. The deadlock described above occurs.
This can be worked around by making sure to close the pipes so that the child processes are reaped in the reverse order of their creation. But it’s much simpler to just close the unnecessary file descriptors on the children’s side.
So the solution is to go
foreach my $fd (@safe_pipe_fds) {
close($fd)
and print STDERR "What? Closing unnecessary file descriptor was successful!\n";
}
on the child’s side, immediately after the call to set_pdeathsig(), as mentioned above.
All of these close() calls should fail with an ECHILD (No child processes) status: The close() call attempts to waitpid() for the main script’s children (closing a pipe waits for the process on the other side to terminate), which fails because only the true parent can do that. Regardless, the file descriptors are indeed closed, and each child process holds only the file descriptors it needs to. And most importantly, there’s no problem terminating.
So the error message is given when the close is successful. The “and” part isn’t a mistake.
It’s also worth mentioning, that exactly the same close() (with a failed wait() call) occurs anyhow when the child process terminates (I’ve checked it with strace). The code snippet above just makes it earlier, and solves the deadlock problem.
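To put the pieces together, here’s a minimal, self-contained sketch (my own example, not taken from a real project): three children created with safe pipes for read, each setting pdeathsig and closing the pipe file descriptors it inherited from previously created siblings.

use strict;
use warnings;
use Linux::Prctl;

my @safe_pipe_fds;

foreach my $i (1 .. 3) {
  my $pid = open(my $fd, '-|');
  die("Failed to fork: $!\n") unless (defined $pid);

  if ($pid) { # Parent: keep the reading side of the pipe
    push @safe_pipe_fds, $fd;
    next;
  }

  # Child: get a SIGKILL as soon as the parent exits
  Linux::Prctl::set_pdeathsig(9);

  # Close the pipes inherited from previously created siblings. These
  # close() calls are expected to fail with ECHILD, but the file
  # descriptors are closed nevertheless, as discussed above.
  foreach my $inherited (@safe_pipe_fds) {
    close($inherited)
      and print STDERR "What? Closing unnecessary file descriptor was successful!\n";
  }

  print "Output of child $i\n"; # Goes into the pipe, read by the parent
  exit(0);
}

# Parent: read whatever the children wrote, then reap them with close()
foreach my $fd (@safe_pipe_fds) {
  while (my $line = <$fd>) {
    print $line;
  }
  close($fd);
}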
Either way, it’s probably wiser to use pipe() and fork(), except for really simple one-on-one IPC between a script and itself, so that all this file descriptor handling and child reaping is done out in the open.
As for pipes to and from other executables with open(), that’s not a problem. I mean calls such as open(IN, “ps aux|”) etc. That’s because Perl automatically closes all file descriptors except STDIN, STDOUT and STDERR when calling execve(), which is the syscall for executing another program.
Or more precisely, it sets the FD_CLOEXEC flag for all files opened with a file number above $^F (a.k.a $SYSTEM_FD_MAX), which defaults to 2. So it’s actually Linux that automatically closes the files on a call to execve(). The possible problem mentioned above with SIGPIPE is hence solved this way. Note that this is something Perl does for us, so if you’re writing a program in C and plan to call execve() after a fork — by all means close all file descriptors that aren’t needed before doing that.
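A quick way to see this in action is to check the close-on-exec flag of a freshly opened file descriptor (a minimal sketch; /dev/null is just a convenient file to open):

use strict;
use warnings;
use Fcntl qw(F_GETFD FD_CLOEXEC);

open(my $fd, '<', '/dev/null') or die("open: $!\n");

# Any file opened with a file number above $^F gets FD_CLOEXEC set by Perl
my $flags = fcntl($fd, F_GETFD, 0) or die("fcntl: $!\n");

printf("fd %d has FD_CLOEXEC %s (\$^F = %d)\n",
       fileno($fd), ($flags & FD_CLOEXEC) ? "set" : "clear", $^F);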