Download a 3D-printable spacer / leg for Z-Turn Lite + IO Cape

Z-Turn Lite + IO Cape with 3D-printed spacer

Even though I find Myir’s Z-Turn Lite + IO Cape combination of boards useful and well designed, there’s one small but annoying detail about them: The spacers that arrive with the boards don’t allow setting them up steadily on a flat surface, because the Z-Turn board is elevated over the IO Cape board. As a result, the Z-Turn board’s far edge has no support, which makes the two boards wiggle. And a small careless movement is all it takes to disconnect the boards from each other.

So I made a simple 3D design of a plastic leg (or spacer, if you like) for supporting the Z-Turn Lite board. See the small white thing holding the board to the left of the picture above? Or the one in the picture below? That’s the one.

3D-printed spacer attached to Z-Turn Lite board

If you’d like to print your own, just click here to download a zip file containing the Blender v2.76 model file as well as a ready-to-print STL file. It’s hereby released to the public domain under Creative Commons’ CC0 license.

The 3D model of the spacer, in Blender

The units of this model are millimeters. You’ll need this little bit of info when ordering a print, since the STL file doesn’t carry units.

I printed mine at 3D Hubs. Because I bundled this with another, more bulky piece of work, the technique used was FDM at 200 μm, with standard ABS as material. If you’re into 3D printing, you surely just read “cheapest”. And indeed, printing four of these should cost no more than one USD. But then there’s the set-up cost and shipping, which will most likely be much more than the printing itself. So print a bunch of them, even though only two are needed. It’s going to be a few dollars anyhow.

Even though these spacers aren’t very pretty and have zero mechanical sophistication, they do the job. At least the ones I got require just a little force to get into the holes, and they stay there (thanks to the pin diameter of 3.2 mm, which matches the holes’ diameter exactly). And because it’s such a dirt-simple design, this model should be printable with any technique and rigid material.

Wrapping up, here’s a picture of three printed spacers + two of the spacers that arrived with the boards. Just for comparison.

3D-printed spacers compared with Myir's spacers

Blender notes to self: 3D Printing

As I use Blender only occasionally, I’ve written down quite a few hints to myself for getting back to business. If this helps anyone else, so much better.

I’ve also written two similar posts on this matter: A general post on Blender and a post on rendering and animation.

Printing methods

See a summary chart on this page.

  • Fused deposition modeling (FDM/FFF): Melted plastic (ABS/PLA/Nylon) coming out of a nozzle. Layer thickness ~0.2 mm. Cheap, but the geometry is limited to self-supporting models, or the result literally drops. Also relatively limited accuracy and a relatively large minimal thickness.
  • Selective laser sintering / melting (SLS/SLM/EBM): A laser sinters or melts a powdered material (typically nylon/polyamide). Layer thickness ~0.1 mm. Good chemical properties (biocompatible), but the printed parts have surface porosity. EBM (electron beam melting) uses an electron beam instead of a laser.
  • Stereolithography (SLA/SL/DLP): Based upon curing a photopolymer resin with a UV laser. Layer thickness ~0.05 mm. High quality, but expensive manufacturing.

Preparing for printing

  • The mesh must be manifold = no holes. Also, it should have no vertices, edges or faces that don’t enclose a volume, no intersecting bodies, and no overlapping edges or faces. Double vertices and edges are not good, but since the mesh is translated into STL, they go unnoticed as long as the duplicates are accurate. If they’re not, they cause warnings that can be ignored, but the clutter makes it easy to miss the important warnings.
  • View the model with Flat shading (click the button under Tools in the Tool Shelf to the left). Smooth shading is misleading.
  • When resizing in Object Mode, be sure to apply (Object > Apply > Scale), so that the measurements in Edit Mode (and otherwise) are correct. Same goes for applying rotation and possibly location.
  • The exported result is the same as at rendering: Deformations done by bones are exported too.
  • Export to .stl, which is a format consisting of just a list of triangles. The file doesn’t include units, which is why it’s required to state units when uploading a file.
  • In Properties / Scene, set the Units to Metric and Scale to 0.001 for millimeters. These units don’t go into the STL file, which is unitless; it seems like this setting has more to do with the interaction with Blender (e.g. how measurements are displayed).
  • It seems like mm is the correct scale to use.
  • Also, in the “View” part of the properties pane (keystroke “n”), under Clip, make sure “End” is significantly larger than the objects involved, or there will be weird cut-out effects as the view is rotated and moved around. This property sets the “global cube”: Whatever is outside this cube becomes invisible, so faces become partially cut.
  • In the same pane, under Mesh Display, consider enabling Length for “Edge Info”, which displays the real-life length of each edge. This works only in Edit mode, and only for selected edges.
  • The 3D printing add-on should be enabled. At the left bar, there will be a 3D Printing tab, allowing for a volume calculation.
  • The recommended place to find a print shop is 3D Hubs (I have no affiliation with these guys).
  • Before uploading, do some cleanup: Mesh > Vertices > Remove Doubles, as well as the Cleanup/Isolated and Cleanup/Non-Manifold in the 3D printing toolbox.
  • If the 3D toolbox spins forever when pressing the “Volume” button, it’s not a good omen, obviously.
  • Once uploaded, odds are that there will be a lot of warnings on non-manifold edges and intersected faces. These can be checked with Blender’s 3D Printing Toolbox. In particular, note that in Edit Mode there’s a button saying “Intersected Face”, which selects the faces marked as intersected. The underlying reason can be the use of the Boolean modifier, which may create a lot of double edges (two adjacent faces have separate edges instead of sharing one). Such double edges are a lot more common than the warnings suggest: The tools probably flag them only when there’s some difference between the two edges. If this is the reason for the warnings, there’s no problem going ahead with printing (I say this from first-hand experience).
  • Pay attention to the “Infill” percentage, which is how much of the internal volume is filled with plastic vs. left as air cells by the printing software. The layer height also influences the precision and finish.
  • Matching parts: If one part is supposed to go into another, there is no need for an air gap, but there will be friction (my experience with a 2 mm blade going into a groove of the exact same width, ABS printed at 200 μm).
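The manifold and volume checks above can be illustrated outside Blender. Below is a minimal standalone sketch in plain Python (not Blender’s bpy API, and not how the 3D Printing Toolbox actually implements its checks), assuming a triangulated mesh given as a vertex list plus index triples with consistent outward winding. All function names are made up for illustration. It counts how many faces share each edge (exactly two for a closed, manifold mesh; a “double” edge shows up as two separate edges used once each), computes the volume from signed tetrahedra, and writes ASCII STL, which is indeed just a list of triangles with no units anywhere.

```python
import math
from collections import Counter


def edge_use_counts(faces):
    """Count how many faces use each undirected edge."""
    counts = Counter()
    for face in faces:
        for i in range(len(face)):
            a, b = face[i], face[(i + 1) % len(face)]
            counts[frozenset((a, b))] += 1
    return counts


def is_manifold(faces):
    """Closed-mesh test: every edge must be shared by exactly two faces."""
    return all(n == 2 for n in edge_use_counts(faces).values())


def signed_volume(verts, tris):
    """Sum of signed tetrahedra spanned by the origin and each triangle.
    Positive when the triangle normals point outwards."""
    total = 0.0
    for i, j, k in tris:
        (ax, ay, az), (bx, by, bz), (cx, cy, cz) = verts[i], verts[j], verts[k]
        total += (ax * (by * cz - bz * cy)
                  + ay * (bz * cx - bx * cz)
                  + az * (bx * cy - by * cx)) / 6.0
    return total


def face_normal(a, b, c):
    """Unit normal from the right-hand rule on (b - a) x (c - a)."""
    ux, uy, uz = b[0] - a[0], b[1] - a[1], b[2] - a[2]
    vx, vy, vz = c[0] - a[0], c[1] - a[1], c[2] - a[2]
    n = (uy * vz - uz * vy, uz * vx - ux * vz, ux * vy - uy * vx)
    length = math.sqrt(sum(x * x for x in n)) or 1.0
    return tuple(x / length for x in n)


def to_ascii_stl(name, verts, tris):
    """ASCII STL: just triangles; the numbers carry no unit."""
    out = [f"solid {name}"]
    for i, j, k in tris:
        a, b, c = verts[i], verts[j], verts[k]
        out.append("  facet normal {:e} {:e} {:e}".format(*face_normal(a, b, c)))
        out.append("    outer loop")
        for v in (a, b, c):
            out.append("      vertex {:e} {:e} {:e}".format(*v))
        out.append("    endloop")
        out.append("  endfacet")
    out.append(f"endsolid {name}")
    return "\n".join(out) + "\n"


# A tetrahedron with 1-unit legs (read as mm only because we say so).
verts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
tris = [(0, 2, 1), (0, 1, 3), (0, 3, 2), (1, 2, 3)]

print(is_manifold(tris))           # True
print(is_manifold(tris[:3]))       # False: a missing face leaves a hole
print(signed_volume(verts, tris))  # about 0.1667
```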

Blender notes to self: Rendering and animation related

As I use Blender only occasionally, I’ve written down quite a few hints to myself for getting back to business. If this helps anyone else, so much better.

I’ve also written two similar posts on this matter: A general post on Blender and a post on 3D printing.

Bones

  • For a simple beginner’s use example, see this page.
  • Bones are simply handles that one can apply the Grab / Rotate / Scale trio to. Each bone has a pivot point and a handle. The manipulations on the bone apply to all vertices in the bone’s Vertex Group, relative to the bone’s pivot point, and in proportion to their weight for that group.
  • The Vertex Groups are listed under the object’s properties, under Object Data (icon is an upside down triangle of dots). In Weight Paint mode, this is where the group to paint weights for is selected.
  • The Vertex Groups’ names are taken from the bones’ when weights are assigned automatically.
  • The Armature modifier is added (automatically) to the object subject to the bones. Be sure that it’s the first modifier (uppermost in the stack), in particular before Subdivision Surface. It’s the original mesh we want to move, not tear pieces of the rounded one. Corollary: The bones’ deformations can be applied, like any modifier.
  • Always check the bones’ motion alignment with the parent bone, and set the bone’s Roll parameter (in the bone’s properties, the icon with a bone) if necessary, in particular if the previous segment has been resized. Roll sets the axis in space around which the bone rotates, and has to be set manually in Edit mode. It controls the direction the bone rotates w.r.t. its origin, which is crucial for intuitive motion: When it’s wrong, the bones seem to move right, but just a little off the desired direction. Just align the square of the bone symbol with the previous segment’s direction.
  • The automatic weights aren’t all that good. In the end, there’s no way out but to assign the weights manually.
  • Weight painting is good for getting a picture of what’s going on, but assigning weights with it is really bad, in particular because it’s easy to mistakenly paint a completely unrelated vertex, leading to weird things happening.
  • Instead, set the weights manually under the Object Data tab (just mentioned): Select the vertices in Edit Mode, write the desired weight in the dedicated field under the Object Data tab, and click “Assign”.
  • The Armature must be a parent of the object to be distorted. Extruded bones are children of the bones they’re extruded from.
  • To move the bones around (in particular, to rotate them), enter Pose mode (or just click “Pose” for the relevant armature in the Object Outliner).
  • Zero the pose: Change to Pose mode, select all (A) and Pose > Clear Transform > All
  • The bones’ influence is disabled only in Edit mode (unless enabled in the Armature modifier).
  • When an object controlled by bones is duplicated, the vertex groups are duplicated as well, but not the bones. So both objects are controlled by the same bones, in an unnatural way (center of rotation on the previous bones etc.).
  • If a vertex belongs only to one group, the weight is meaningless: If it belongs to the group, it will move 100% anyhow.
  • If a vertex belongs to more than one vertex group, its weights are normalized to a total of 1.0. So it’s fine to have an overlap on the joints, but be careful not to push it too far. Note that the bone after the joint is moved by virtue of parenting, so there’s no reason to assign the previous bone’s weights to vertices after the joint: It only weakens the effect of the bone that is supposed to move that part.
  • Rotate bones with Individual Origins pivot point.
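The weighting rules above (a single group means a 100% move; overlapping groups are normalized to 1.0) can be sketched as a weighted average of the positions each bone would give the vertex. This is a minimal 2D sketch in plain Python, assuming each bone only rotates around its pivot and that the bone poses are already given in world space (parenting is ignored). The bone names and numbers are made up, and Blender’s Armature modifier is considerably more general.

```python
import math


def rotate_about(point, pivot, angle):
    """Rotate a 2D point counter-clockwise around pivot by angle (radians)."""
    px, py = point[0] - pivot[0], point[1] - pivot[1]
    c, s = math.cos(angle), math.sin(angle)
    return (pivot[0] + c * px - s * py, pivot[1] + s * px + c * py)


def deform(point, weights, bones):
    """weights: {bone: weight}, bones: {bone: (pivot, angle)}.
    Each bone proposes a new position; the proposals are averaged with
    the weights normalized so that they sum to 1.0."""
    total = sum(weights.values())
    if total == 0.0:
        return point  # in no vertex group: the bones don't move it
    x = y = 0.0
    for name, w in weights.items():
        pivot, angle = bones[name]
        nx, ny = rotate_about(point, pivot, angle)
        x += nx * w / total
        y += ny * w / total
    return (x, y)


# Hypothetical arm: "upper" from (0, 0) to (1, 0), "lower" from (1, 0) on.
bones = {"upper": ((0.0, 0.0), 0.0),
         "lower": ((1.0, 0.0), math.pi / 2)}  # bend the elbow by 90 degrees

print(deform((2.0, 0.0), {"lower": 1.0}, bones))                # about (1.0, 1.0)
print(deform((2.0, 0.0), {"lower": 0.3}, bones))                # same: single group
print(deform((1.5, 0.0), {"upper": 1.0, "lower": 1.0}, bones))  # about (1.25, 0.25)
```

The second call shows why a lone weight of 0.3 changes nothing, and the third shows how an overlap at the joint softens the bend: that vertex ends up halfway between its straight and fully bent positions.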

Textures etc.

  • Each face is related to a material. The first material is assigned to all faces. Additional ones need to be assigned.
  • Once a material is selected in the Material property button-tab, the Texture tab relates to it.
  • Projecting an image: First mark a seam in Edit mode: Select a set of edges and go Mesh > Edges > Mark Seam. Then select the faces to project (possibly all) and go Mesh > UV Unwrap… > Unwrap (or possibly Project from View or some other choice).
  • When using UV projection, the Type is “Image or Movie”, the Source is the file, and under “Mapping”, Coordinates should be set to UV (otherwise the mapping in Material view will be wrong).
  • UV/Image Editor: Maps pieces of the image onto faces. Use it side-by-side with a 3D view in Edit mode. Enable “Keep UV and Edit mode mesh selection in sync” for easy selection (somewhere in the middle of the bottom bar). Holding the mouse’s middle button while moving the mouse pans the image view (instead of Shift-scroll or something).
  • Multiple images can be sources for a single object, by virtue of generating multiple materials and assigning them to different faces. Each material is then linked to separate textures, each based upon a different image.
  • Texture paint: A little GIMP, just in 3D. The changes are made in the source image(s). The big upper box is the brush selector. Most notable is “Clone”, which works like GIMP’s, with CTRL-click to select the source. Excellent for hiding seams.
  • Careful with overlapping UV mappings on a single image with Texture Paint: One stroke will affect all mapped regions.
  • Texture paint may manipulate several images in a single stroke, if this stroke covers regions sourced from different images.
  • If texture paint responds slowly and eats a lot of CPU, try reducing the Subdivision Surface level, if used. Too many faces aren’t good.
  • Don’t forget to save the 2D images in the end!
  • For copying a 3D shape from a 2D image, use Global Mapping on the texture, along with a Top Orthographic view. The texture remains in place no matter how the object is twisted and turned, so it’s fairly easy to drag it along the image’s edges.

Rendering

  • F12: Render Image (“Quick Render”). Also from top menu Render > Render Image. Return to 3D view with F11.
  • Shading Smooth / Flat at the Tool Shelf doesn’t change the shape, only the way light is reflected.
  • If the rendering result suffers from weird shadows and/or unexplained edge lines on a surface that’s supposed to be smooth, try Mesh > Normals > Recalculate Outside in Edit Mode, which may fix normals that have been messed up by edits.
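What Recalculate Outside is after can be shown with a deliberately simplified sketch: For a convex mesh, a face’s normal should point away from the mesh’s center, so any face whose normal points inwards gets its vertex order reversed (which flips its normal). This is plain Python and only valid under that convexity assumption; Blender’s actual operator handles general meshes by other means, and the function name is made up.

```python
def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def centroid(points):
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def recalc_outside_convex(verts, faces):
    """Flip every face whose normal points towards the mesh center.
    Only valid for convex meshes."""
    center = centroid(verts)
    fixed = []
    for face in faces:
        a, b, c = (verts[i] for i in face[:3])
        normal = cross(sub(b, a), sub(c, a))      # right-hand rule
        outward = sub(centroid([verts[i] for i in face]), center)
        if dot(normal, outward) < 0:
            face = tuple(reversed(face))          # reversed order = flipped normal
        fixed.append(tuple(face))
    return fixed


verts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
faces = [(0, 1, 2),              # wrong: its normal points up, into the tetrahedron
         (0, 1, 3), (0, 3, 2), (1, 2, 3)]
print(recalc_outside_convex(verts, faces))
# [(2, 1, 0), (0, 1, 3), (0, 3, 2), (1, 2, 3)]
```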

Cycles: How it works

If a realistic rendering result is desired, forget about Blender’s native render engine. It’s a lost battle of dirty tricks, when the obvious way to reach a natural appearance is to simulate the light rays. Which is what the Cycles engine does.

This is a very simplistic description of Cycles. In reality, it’s far more clever and efficient, so the results of the real engine are better than you would expect from the description below.

For each sample (i.e. an iteration of improving the rendered image), and for each pixel to be generated on the rendered image, the render engine traces the light ray, backwards. That is, from the camera to the source of light.

The initial leg is simple, as the angle of view is known and deterministic. If this ray hits nothing, we get black. If it hits a face, the engine examines its material data. By hitting something, I mean the first intersection between the ray’s line and some face in the mesh.

When hitting a face, the face’s material’s shader is activated. If it’s a pure emission of light, that’s the final station, and the pixel’s value can be calculated. If it’s any other shader, it tells the render engine at what angle to continue, and how to modify the light source’s value once it’s reached. This modification is the material’s color, or the texture at the specific point that was hit.

And so it goes on, until the ray hits nothing or reaches a purely emitting light source. Once the final station has been reached, the aggregated color modifications are applied, and there’s the final pixel value.

So why is randomness involved?

A diffusing surface collects light from all directions, and reflects it towards the camera. Since the light tracing can only follow one direction, it’s picked at random by the shader. So each sample consists of one such ray trace for each pixel. Each time a diffusing surface is reached, there’s a lottery. Hence the randomness. The exceptions are pass-through and purely reflective shaders (i.e. Glossy with Roughness 0), which have a deterministic ray bending pattern.
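The description above can be boiled down to a toy Monte Carlo estimate for a single pixel. This is a sketch in plain Python under made-up assumptions, not Cycles’ code: The camera ray always hits a diffuse floor with a gray color of 0.8, and a hypothetical emission panel of strength 1.0 covers exactly half of the sky (directions with positive x). Each sample traces backwards, multiplies in the floor’s color, and then runs the diffuse lottery for the bounce direction. By symmetry, the noise-free answer is 0.8 × 0.5 = 0.4; each individual sample is either 0 or 0.8, and only the average over many samples approaches 0.4.

```python
import math
import random


def sample_diffuse_direction(rng):
    """Cosine-weighted random direction in the hemisphere above the surface (z up).
    With this weighting, a diffuse surface's estimate is simply color x incoming light."""
    u1, u2 = rng.random(), rng.random()
    r, phi = math.sqrt(u1), 2.0 * math.pi * u2
    return (r * math.cos(phi), r * math.sin(phi), math.sqrt(1.0 - u1))


def trace_pixel(rng, floor_color=0.8, panel_strength=1.0):
    """One sample: camera -> diffuse floor -> random bounce -> panel or nothing."""
    throughput = 1.0
    # First leg is deterministic: by assumption the camera ray hits the floor.
    throughput *= floor_color                 # the floor's color modifies the light
    x, _, _ = sample_diffuse_direction(rng)   # the lottery
    if x > 0.0:                               # hit the hypothetical emission panel
        return throughput * panel_strength
    return 0.0                                # hit nothing: black


def render_pixel(samples, seed=0):
    rng = random.Random(seed)
    return sum(trace_pixel(rng) for _ in range(samples)) / samples


for n in (1, 10, 100, 10000):
    print(n, render_pixel(n))  # noisy at first, settles near 0.4
```

The noise shrinks only with the square root of the sample count, which is why cleaning up a Cycles render takes so many samples.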

When the “Mix” shader is used, the mix rate is a real mix: Each shader gets its go, and the results are mixed. Try mixing an Emission shader with a black Diffuse shader.
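One way to read “each shader gets its go” is as a per-sample coin toss weighted by the Mix shader’s Fac, which on average gives the same result as blending the two shaders’ outputs. The sketch below is plain Python and only a toy model (Cycles’ internal handling is more involved); it assumes that Fac = 0 means the upper (first) input and Fac = 1 the lower (second) input, as in Blender’s Mix Shader node. Mixing an emission of strength 1.0 with a black diffuse shader at Fac 0.3 should settle near 0.7, i.e. a dimmer light.

```python
import random


def emission(strength=1.0):
    return strength   # pure emission: the final station, light is returned


def black_diffuse():
    return 0.0        # a black diffuse surface reflects nothing


def mix_shader_sample(rng, fac, first, second):
    """Toy model: the second shader gets its go with probability fac."""
    return second() if rng.random() < fac else first()


def render(fac, samples=20000, seed=1):
    rng = random.Random(seed)
    return sum(mix_shader_sample(rng, fac, emission, black_diffuse)
               for _ in range(samples)) / samples


print(render(0.0), render(0.3), render(1.0))  # 1.0, about 0.7, 0.0
```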

So God may not play dice, but Cycles surely does.

Light is Everything

  • DON’T use Blender’s Lamps unless you want everything to look like plastic. There’s a huge difference between lamps and objects (e.g. planes) with an emission shader (both in results and render time). Use the latter for a realistic look.
  • In particular, a skin texture will never look right with lamp light. See below.
  • Creating an invisible light source: Create any object, set its shader to Emission, and go to the “Object” properties (the icon is a yellow cube). At the bottom, there’s “Cycles Settings”. Uncheck the “Camera” checkbox in the Ray Visibility section.
  • To avoid seeing these emission objects when editing (they get in the way all the time), put them in a different layer. Use Ctrl-click on the relevant layer to view it along with the current one when switching to render view.

Node Editor

  • For a texture image: Add an Image Texture node and open the file. Then do the UV mapping (nothing will be visible before that). If there are multiple texture files, they are all mapped with the same UV map by default (or at all?).
  • Bump map: Image Texture > Bump (input Height) > Diffuse BSDF (input Normal) > Material Output (input Surface). This fakes a displacement along the normal (only the shading changes, not the geometry); “Distance” says how much. With “Invert” unchecked, a high image value means outwards.
  • Use an image’s transparency: Add a Transparent BSDF shader, and connect it to a Mix Shader’s upper input. The lower input goes to the regular (Diffuse BSDF?) shader. The Image Texture’s Color goes as usual to the regular shader, but its Alpha output goes to the Mix shader’s Fac.
  • Glossy BSDF: Mirror-like reflection when Roughness is set to zero, otherwise it’s diffusing the reflection.
  • Velvet BSDF: Low angles between incident and reflection yield low reflection, so it emphasizes smooth contours. Good for combination with Diffuse shader for simulating human skin (compensate for too dark edges of the latter).
  • Emission: Not just as a light source, but also a way to fake fill light.
  • Color Ramp: Useful to turn an image into a one-dimensional range of colors, including Alpha, instead of manipulating the texture’s range.
  • The Geometry input supplies Normal (which is after smoothing; pick True Normal for without) and Incoming (which is the direction of the light ray). Combining these two with Converter > Vector Math set to Dot Product or Cross Product yields a value that depends on the angle between the incident ray and the normal. Together with Color Ramp, this allows an arbitrary reflection pattern (use it as the Fac of some Mix shader).
  • The Voronoi texture (using “Cells”) is great for simulating an uneven, grainy surface.
  • To get a generally misty atmosphere, go to the World tab in Properties, and under Volume select Volume scatter with white color and Density of 0.1 to 0.2. Anisotropy should be 0.
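The Geometry + Vector Math + Color Ramp combination described above can be sketched numerically. This is plain Python, not node code: It assumes unit-length Normal and Incoming vectors (with Incoming pointing back towards the viewer, so a surface facing the camera gives 1.0 and a grazing one gives 0.0), and a Color Ramp in Linear mode with two stops. The function names and colors are made up for illustration.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    length = math.sqrt(dot(v, v))
    return tuple(x / length for x in v)


def color_ramp(fac, stops):
    """Linear Color Ramp. stops: sorted list of (position, (r, g, b, a))."""
    if fac <= stops[0][0]:
        return stops[0][1]
    if fac >= stops[-1][0]:
        return stops[-1][1]
    for (p0, c0), (p1, c1) in zip(stops, stops[1:]):
        if p0 <= fac <= p1:
            t = (fac - p0) / (p1 - p0)
            return tuple(a + (b - a) * t for a, b in zip(c0, c1))


normal = (0.0, 0.0, 1.0)
stops = [(0.0, (1.0, 0.0, 0.0, 1.0)),   # grazing angles: red
         (1.0, (0.0, 0.0, 1.0, 1.0))]   # facing the camera: blue

for incoming in [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (1.0, 0.0, 0.0)]:
    fac = dot(normal, normalize(incoming))  # Vector Math > Dot Product
    print(round(fac, 4), color_ramp(fac, stops))
```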

Achieving human skin appearance

Making a model look human and alive is the worst struggle of all. I’ve seen a lot of crazy attempts to add complicated shaders and stuff to reach a natural skin appearance. Even though I haven’t managed to make a face look natural (good luck with that), these are the insights I have reached.

  • Rule zero: Use Cycles. Should be obvious.
  • Rule number one: DO NOT USE LAMPS. All generation of light should be done with objects (most likely flat planes) with (white) emission shaders. Any inclusion of lamp objects makes everything look like plastic. Rendering convergence is indeed faster with lamps, but the result is disastrous, even when lamps are used for just fill light. In short, create real studio lighting.
  • There’s no need for subsurface scattering and all those crazy shaders. These are a result of the impossible attempt to tweak the reflection to get something realistic in response to the plastic feel of lamp light. When the light is done properly, plain shaders are enough. Actually, Subsurface Scattering makes a marginal difference, and to the worse (deepens shadows, while actual skin somehow reflects in all directions).
  • The Glossy part of flat skin (e.g. a leg) should be GGX (default) with roughness ~ 0.5. Diffuse with roughness 0.4 (doesn’t matter so much), mixed 50/50. Use the texture’s color for the Gloss shader as well (or mix partly with white).
  • And here’s the really important part: Natural skin is full of small bruises and other uneven coloring that we barely notice when watching with the naked eye. It’s when this uneven coloring is gone (a woman wearing tons of makeup, or a 3D model) that it looks like plastic. Therefore, the texture applied on the skin area (i.e. the coloring of the faces) should be aggressively uneven, with speckles and also wide areas of slight discoloring. Adding a leathery bump texture and/or wrinkles adds to the realistic look, but won’t get rid of the plastic feel unless the lighting is done right and the texture is alive.
  • For the depth pattern of the skin, either use the Voronoi texture (see this page) on leather, or consider looking for images of elephant skin or something (the cell texture is similar). This is mainly relevant if closeups are made on the skin.
  • Realistic eye: Be sure to add a cornea to the eye, mixing 90% transparent and 10% glossy shaders. The cornea’s ball should be 66% of the size of the eyeball, and brought to cover a little more than the iris. The reflection of the cornea brings the eye to life.

Animation

  • Animation adds an Animation object to the controlled object’s hierarchy (with an ArmatureAction sub-object for Armatures).
  • Key = Nailing down some properties of some object for a given frame.
  • Don’t expect to change the pose and have all changes recorded.
  • Rather, in the Timeline Editor, select the desired bones in the armature for keying (all bones of the armature, probably), pick which properties are being keyed (possibly just Rotation for plain motion) and click the key icon (“Insert Keyframe”).
  • Keying Set = The set of objects whose properties are being keyed.
  • Dope Sheet: Accurate, concise and gives control. Each channel is a property, each diamond is a key for that property. Thick lines between diamonds show that the value hasn’t changed during that time.
  • Selection of keys: With right-click. Selecting the top diamond (“Dope Sheet Summary”) selects all keys of a frame (the Armature’s diamond selects all keys of an armature etc.)
  • It’s possible to Copy-Paste keys with the clipboard icons at the bottom (or simply CTRL-C, CTRL-V). “Copy” relates to just the selected keys.
  • In the Dope Sheet, use Shift-D and then G (grab) to copy all keys to another frame. Also possible to just Grab keys to adjust the timing etc.
  • “Insert Keyframe” = store the properties of the current pose at the current frame. In the Timeline Editor, this adds diamonds in the channels that correspond to the selected bones (or adds these channels). It doesn’t change or delete keys for bones that aren’t selected.
  • Work flow: First, select the properties that are going to be involved (all bones of an armature?), and create a key for them in the Timeline Editor. The rest of the work is done in the Dope Sheet: Scrub to the desired frame, change the pose, and Key > Insert Keyframe > All Channels (or with I). Or possibly just selected channels, to leave the other channels interpolating as before.
  • Note the difference between how the Timeline and the Dope Sheet store the pose: The Timeline stores the properties of the selected bones only, while the Dope Sheet allows storing “All Channels”. Assuming that all relevant properties have channels in the Dope Sheet (it’s a good idea), “All Channels” captures the entire pose (and marks those that haven’t changed).
  • Careful with jumping in time by accidentally clicking in the Timeline / Dope Sheet: It overrides all changes in the pose. To avoid this, “save your work” by “Inserting Keyframes” often.
  • Don’t forget to move to a new frame before working on the next pose. If you do forget, copy the current frame’s keyframes into the clipboard, create a new keyframe with the current pose, and paste the previous keyframes into a slightly earlier frame. Then move (grab) the keys in time to their correct places.
  • It’s possible (but usually pointless) to set the interpolation mode in the Dope Sheet (Key > Interpolation Mode). This controls the interpolation of the selected key until the next one. The default (set in User Preferences > Editing) is Bezier, which gives a natural feel.
  • However the “Constant” interpolation can be useful for camera properties, when it’s desired to hold it still and then jump to other parameters (i.e. a “cut”).
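The difference between the interpolation modes can be sketched for a single channel. This is plain Python, not Blender’s F-Curve code: Constant holds the value until the next key (the “cut”), Linear blends evenly, and for Bezier it assumes flat (horizontal) handles placed a third of the way into the interval, in which case the cubic Bezier reduces to the smoothstep curve 3t² − 2t³. Real Bezier keys depend on the handle types and positions, so treat that branch as an approximation of the natural ease-in / ease-out feel. The function name and key values are made up.

```python
def evaluate(keys, frame, mode="BEZIER"):
    """keys: sorted list of (frame, value) pairs for one channel."""
    if frame <= keys[0][0]:
        return keys[0][1]
    if frame >= keys[-1][0]:
        return keys[-1][1]
    for (f0, v0), (f1, v1) in zip(keys, keys[1:]):
        if f0 <= frame < f1:
            t = (frame - f0) / (f1 - f0)
            if mode == "CONSTANT":
                return v0                    # hold until the next key
            if mode == "LINEAR":
                return v0 + (v1 - v0) * t
            if mode == "BEZIER":
                # Flat handles at 1/3 and 2/3 of the interval: smoothstep
                return v0 + (v1 - v0) * (3 * t * t - 2 * t ** 3)
            raise ValueError(f"unknown mode {mode!r}")


keys = [(1, 0.0), (11, 10.0)]  # e.g. a rotation going from 0 to 10 degrees
for mode in ("CONSTANT", "LINEAR", "BEZIER"):
    print(mode, [round(evaluate(keys, f, mode), 2) for f in (1, 3, 6, 9, 11)])
```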

Simulation

  • Plain Physics fluid (simple example): It’s a 3D-grid-based simulation running in a limited space, which is enclosed by the object to which Physics > Fluid is attached with the “Domain” type (it’s the walls of the container as well as the limits of the simulated region). The Physics properties of this object are those determining the simulation (in particular the time scale in seconds via the End time, and the real-life size in meters). The baking is also done on this object. Other objects with Physics > Fluid attached participate according to their Type, e.g. Fluid (the object turns into fluid) or Obstacle (which limits the motion of the fluid).

 

Blender notes to self: General

As I use Blender only occasionally, I’ve written down quite a few hints to myself for getting back to business. If this helps anyone else, so much better.

I’ve also written two similar posts on this matter: A post on 3D printing and a post on rendering and animation.

Random notes

  • First of all: File > User Preferences > Input and set Orbit Style to Trackball (and not Turntable). The default (Turntable) keeps the Z axis always up, which is extremely limiting for proper modeling.
  • For a realistic look, go for the Cycles render engine. Otherwise (in particular simplistic animations and modeling), stay with Blender’s original.
  • Window > Duplicate Window for a new window with the same project, which can be organized completely differently (the content, modes, selections etc. remain in sync, but not the window layout). If you save the project, all windows will be opened the next time the project is opened.
  • Always pay attention to possible hints at the bottom of the 3D view, in particular in interactive operations with the mouse.
  • Enable / Disable cursor and grid in 3D view: In the Properties shelf, under Display, toggle the “Only Render” checkbox.
  • Adding a mesh in Object Mode adds an object. Adding a mesh in Edit Mode adds the mesh to the current object.
  • 3D graphics can be defined by meshes or NURBS. Both methods are available in Blender, but keep in mind that everything ends up as a mesh when exported to STL for printing.
  • For easier interface, pick File > User Preferences, select Interface tab and enable “Rotate Around Selection”. Otherwise the 3D viewport’s rotation is pretty annoying.
  • Vertices = points. Edges = lines between points. Faces = 2D planes between edges + a normal vector defining its direction (see Wikipedia). If the vertices of a face aren’t coplanar, it’s drawn as separate triangles.
  • There’s an “Object” button on the Properties subwindow, containing the properties of the Object: Position, rotation, locks, transparency, you name it.
  • When quitting Blender, the current design is saved as quit.blend. Use File > Recover Last Session to resume next time.
  • Local coordinates are applied relative to the object’s parent coordinates, so there’s a tree of coordinate displacements.
  • Pay attention to the view type, as stated at the 3D view’s top left. In particular, if it’s Local view, only the selected objects are seen. Note the plural. It’s possible to select several objects and see them all, but Edit Mode only applies to one of them.
  • When drawing a mesh for a smooth surface, keep it uniform; don’t make dense extrusions to catch the details, but fix that at a later stage. See “Subdivision Surface” below.
  • There’s a “Manipulate only center of points” icon at the bottom of the 3D view: Good for rotating or scaling several objects as a way to only move them, but if it’s on, manipulating a single object does nothing.
  • An object may contain meshes with no connection between them.
  • “Linked” = connected through vertices (what you’d naturally call a “thing”).
  • It’s a good idea to remove double vertices every now and then: Edit mode, Vertex selection mode, select all vertices, Mesh > Vertices > Remove Doubles. This can be a result of an Extrusion canceled with Esc. Do this in particular if Subdivision Surface creates some ugly stuff for no apparent reason. Also try Mesh > Clean Up > Delete Loose.
  • There are several constraints that can be applied. Some are self-related, and some to other objects: One object’s transforms are copied to another, one limiting the other. Not all are applicable for the simple (e.g. non-Game Engine) use.
  • In order to view objects with transparency (with Cycles), viewport shading should be Material, and mix a diffuse shader with a Transparent Shader in the material (in the Node Editor). In the material tab of the object, set Viewport Alpha to “Alpha Blend”.
  • In the Properties column, there’s a section for “Mesh Analysis” which paints different faces depending on various criteria. For example, find intersections, sharp regions etc.
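Remove Doubles (mentioned above) can be sketched as “merge every vertex that lies within a small distance of an already kept one, then renumber the faces”. This is a naive plain-Python version (quadratic time, fine for illustration only) with an assumed threshold of 0.0001 units; faces that collapse to fewer than three distinct vertices are dropped. It is not Blender’s implementation, and the function name is made up.

```python
import math


def remove_doubles(verts, faces, threshold=1e-4):
    """Merge vertices closer than threshold and renumber the faces."""
    kept = []    # positions of the surviving vertices
    remap = []   # remap[old_index] -> index into kept
    for v in verts:
        for new_index, k in enumerate(kept):
            if math.dist(v, k) <= threshold:
                remap.append(new_index)  # a double: reuse the kept vertex
                break
        else:
            remap.append(len(kept))
            kept.append(v)
    new_faces = []
    for face in faces:
        unique = []
        for i in (remap[old] for old in face):
            if i not in unique:
                unique.append(i)
        if len(unique) >= 3:             # skip faces that collapsed
            new_faces.append(tuple(unique))
    return kept, new_faces


# Two triangles that should share an edge, but the edge's vertices are
# duplicated, and one copy is off by 0.00001.
verts = [(0, 0, 0), (1, 0, 0), (0, 1, 0),
         (1, 0, 0.00001), (0, 1, 0), (1, 1, 0)]
faces = [(0, 1, 2), (3, 5, 4)]
print(remove_doubles(verts, faces))
# ([(0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 0)], [(0, 1, 2), (1, 3, 2)])
```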

Checklist when weird things happen

In Edit Mode, with all selected

  • Mesh > Clean up > Delete Loose
  • Mesh > Vertices > Remove Doubles
  • Mesh > Normals > Recalculate Outside

If there are sharp spots or a point getting buried, reduce the Subdivision Surface level to zero, and then raise it gradually. Look for

  • A face that shouldn’t be there (in particular an internal face)
  • An edge to a far point
  • Two very close vertices that appear to be one
  • A double vertex, edge or face (these are the most difficult to spot). Try selecting by region in Wireframe view, and verify that the correct number of elements is selected.

Cheat sheet

  • Spacebar = search functions. The ultimate cheat.
  • Use N and T buttons to toggle visibility of the properties bar and Toolshelf, respectively.
  • Fetch an object (or other resource) from another Blender file: Top menu > File > Append and find the relevant object by its name (it’s good to have them named properly…)
  • Select and move around stuff: Right-click on object and move mouse. Left-click to fix in place. Also use G key (Grab).
  • Lasso selection: CTRL + Left mouse button. For deselect, Shift + CTRL.
  • There’s also Border Select and Circle Select (under the Select menu). Pay attention to the “Limit Selection To Visible” button next to the Vertices/Edge/Face selection trio buttons.
  • Left click moves the 3D cursor, which is used as the landing point for added objects and as a pivot point if so chosen. In Solid 3D viewport shading, the depth element of the cursor’s position is where a ray of light would have hit a visible object (or somewhere near, if there’s no object on the way). This is slightly inaccurate (a tolerance of 1/10000 measure units or so), so it may be better to use Snap > Cursor to Active.
  • Tilt-rotate view (just like a 2D image): Shift-Ctrl scroll mousebutton
  • To rotate view around 3D cursor: In the Properties shelf (press N), View > Lock to Cursor
  • Zoom in and out: Scroll button
  • Move scenery: Ctrl-scroll or Shift-scroll. Or: Shift-hold middle button and move mouse.
  • Rotate scenery: Press scroll button and move mouse around
  • Local View is extremely useful when the scenery becomes full of objects (in particular light emission planes): Select the object to work with, and press numpad “/” (or View > View Global / Local) with the cursor on the relevant 3D view pane. This change applies only to that pane, so other 3D views remain intact (important when some show render previews).
  • “View Selected” (numpad “.”). Puts the selected item in the view’s center instead of trying to get that manually. Useful when selecting from object hierarchy.
  • “Hide selected” (H) and unhide all (Alt-H). Get things out of view, in particular in Edit Mode (hide certain faces so one can see through). Works the same in Object Mode, but it’s equivalent to toggling the eye icon in the hierarchy.
  • Delete stuff: X gives a menu
  • Add objects: Add menu at the bottom left. Note that in Edit Mode, only meshes can be added, and the mesh is added to the selected object (not as a separate one!), as if it was a separate object Joined into one.
  • Manually setting coordinates: Press N and look in the Transform submenu. “Local” coordinates means relative to the object’s own origin, and it’s quite useful.
  • Setting an object’s parameters immediately after Adding it: Press T.
  • Note the Object Mode vs Edit Mode selector at the bottom left: Objects can be selected only in Object Mode.
  • Modifiers: The wrench icon in the Properties pane (usually to the right). Can be stacked up, turned on/off momentarily, so don’t necessarily apply them right away.
  • The common editing is done in 3D view (note the small selection boxes to the left, close to the bottom).
  • View modes: At the bottom, next to “Object / Edit Mode”: Usually Solid, but Wireframe is informative, and Rendered can be nice (involves light)
  • View menu: Useful for swapping Orthographic / Perspective view, and also to view from bottom, top, side etc. Most of these are accessible from the numpad (see hints in menu).
  • Make an object the center for rotation and scaling: View > View Selected
  • Specials menu: W (for subdivide, which allows e.g. subdividing a face)
  • Arbitrary vertices with edges between them: Enter Edit Mode for an object (possibly a dummy one, which is immediately deleted). Pick Vertex selection mode, and add vertices with CTRL+left mouse button. Useful along with an Empty Image (which can be semi-transparent) as a reference image.
  • To make a 2D object -> 3D, possibly spin it (see Tool Shelf).
  • Duplication of object: The Array Modifier (under “Generate”). For 2D/3D duplication, just cascade the modifiers.
  • Mesh > Edges > Bridge Edge Loops is good for filling gaps.
  • But even better, use Dissolve rather than Delete for getting rid of Vertices / Edges / Faces, so the holes aren’t created in the first place.
  • The Mirror modifier makes it easy to create symmetric objects. When used with Subdivision Surface, be sure that all vertices along the symmetry plane lie exactly on it: Either by scaling them to zero with the symmetry plane as the pivot point, or by using the Boolean modifier against a large cube, or by using the Shrinkwrap modifier against a cube with the outer vertices belonging to the effective vertex group.
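To make the cascading idea concrete, here’s a minimal C sketch of where two stacked Array modifiers with a Constant Offset put the copies: the first produces a row, the second replicates that whole row. The vec3 type and cascaded_array() are hypothetical names for this illustration, and Relative Offset, Object Offset and merging, which the real modifier also supports, are left out.

```c
#include <stddef.h>

/* Hypothetical 3D vector type for this sketch */
typedef struct { double x, y, z; } vec3;

/* Hypothetical helper: positions of all copies produced by two cascaded
   Array modifiers, each with a Constant Offset. The first modifier makes
   count1 copies spaced by off1; the second replicates that whole row
   count2 times, spaced by off2. Results are written row by row into out,
   which must hold count1 * count2 entries. Returns the number written. */
static size_t cascaded_array(vec3 origin,
                             vec3 off1, size_t count1,
                             vec3 off2, size_t count2,
                             vec3 *out)
{
  size_t n = 0;
  for (size_t j = 0; j < count2; j++)
    for (size_t i = 0; i < count1; i++) {
      out[n].x = origin.x + i * off1.x + j * off2.x;
      out[n].y = origin.y + i * off1.y + j * off2.y;
      out[n].z = origin.z + i * off1.z + j * off2.z;
      n++;
    }
  return n;
}
```

With count1 = 3 and count2 = 2 this gives a 3 x 2 grid; a third cascaded modifier would extend it into 3D in the same way.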

Distorting objects

  • Scale, Grab, and Rotate: In Edit or Object Mode, select an object and press S, G and R respectively. Add X, Y or Z to constrain the operation to that axis, or Shift plus the axis key to exclude it (e.g. SX scales along X only, while S followed by Shift-X scales along Y and Z only). Numbers can also be typed (e.g. S0.5 and R90). Hold down Shift for precision.
  • Alternatively (sometimes better): Enable the transformation manipulators with the colored axis icon at the middle-bottom of the 3D view window. Then pick the type of manipulation (translate, rotate or scale) and the axis context (global, local or others). This allows for a simple manipulation across one axis (drag the manipulators).
  • Change the Pivot Point (at the bottom of 3D view) for one-sided scaling or for rotating around something other than the center.
  • To change the Origin of the object (for scaling or rotating), pick Object > Transform > Origin to… (e.g. 3D cursor). Only in Object Mode.
  • Align vertices to a plane or line: Move the 3D cursor to the desired position, set the Pivot Point to the cursor, select the vertices to align. Then choose S with one of the axes, and press the “zero” button — scale to zero = no distance.
  • Moving stuff: Change to Edit Mode (Bottom left menu). A few items to the right, there’s what to select: Vertex, Edge or Face select. Choose Edge or Face. Select an Edge or Face and move it around (with G or right-click). The displacement is two-dimensional, on the viewed plane.
  • There’s also Mesh > Edge Slide which is good for moving around an Edge loop (to give a subdivided surface emphasis on the right place)
  • There’s Select > Snap to Cursor and Snap to Cursor (Offset), which allow moving stuff to an exact position (the cursor can be moved to a selection beforehand).
  • Extrusion ( = Duplicate vertices, add edges between previous and duplicated vertices, and move the selection): Select a Face and press E, then move the new face on the perpendicular axis. Note that if it’s canceled with an ESC, the four new vertices remain, glued to the original face. Use CTRL-Z to get rid of them.
  • Extrusion with snapping: Press CTRL while moving mouse.
  • Duplicate a face, connected: Extrude, move around, and press ESC. This allows, for example, scaling the new face and possibly moving it, or extruding it again.
  • If more than one face is selected, all are extruded together.
  • If Sculpt mode is going to be used, consider the Multiresolution Modifier, which is the same as Subdivision Surface, but allows allocating a different figure for sculpting.
  • The Bevel modifier rounds off corners (slightly). The number of elements is crucial.
  • The “Adjust Edit Cage to Modifier result” (rightmost button) allows deforming the modified (i.e. smoothed) wireframe. This is actual sculpture. This requires a proper division of faces to begin with.
  • Alternatively (typically when the mesh is poorly constructed, with too many vertices), use Proportional Editing Mode, which spreads a change to the vertices within a region (button next to the “select faces” button at the bottom of a 3D view, in Edit Mode). Use G to grab a vertex or whatever, and the mouse wheel to enlarge (roll downwards, counterintuitively) or diminish (roll upwards) the region of influence. The size of the influence region is shown as a number at the bottom. There are various patterns of how the neighbors are influenced.
  • There’s also the Hook modifier, doing the same as proportional editing.
  • To add loops of vertices: CTRL-R in Edit Mode, and roll the scroll wheel to get several loops. Left click to confirm where the loop goes. Or subdivide edges (under Mesh > Edges).
  • To cut an object in two: Add a loop with CTRL-R, and then Mesh > Vertices > Rip.
  • Or just cut: Select a few vertices, and Mesh > Vertices > Rip. It duplicates the vertices, but doesn’t connect them with edges.
  • Giving thickness to a mesh: Mesh > Faces > Solidify (there’s also a Solidify modifier)
  • To make sharp ends, extrude a face, and Mesh > Vertices > Merge (in Edit Mode, of course).
  • Free bending objects: Create a Bezier path (Add > Curve > Path) and add the Curve modifier (under distort). The object is bent along the curve
  • Closing small gaps between different objects (e.g. shoe to lower leg, or a drop of water slipping down on a surface): Apply the Shrinkwrap modifier on the outer object (the Nearest Surface Point mode is probably best), but only to a vertex group belonging to the interface region. Weight painting is useful here. Note that there’s no real “shrinking” involved: It’s just vertices being glued to each other.
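The “scale to zero” trick for aligning vertices works because an axis-constrained scale about a pivot maps each coordinate v to p + s(v - p); with s = 0, every selected vertex lands exactly on the pivot’s coordinate. A minimal C sketch of that arithmetic (scale_axis_about_pivot() is a hypothetical name; it models a single-axis scale in global coordinates, not Blender’s own code):

```c
#include <stddef.h>

/* Hypothetical 3D vector type for this sketch */
typedef struct { double x, y, z; } vec3;

/* Hypothetical helper: scale points about a pivot along one axis
   (0 = X, 1 = Y, 2 = Z), like an axis-constrained S with the 3D cursor
   as the pivot point. Each affected coordinate v becomes
   pivot + factor * (v - pivot); the other coordinates are untouched. */
static void scale_axis_about_pivot(vec3 *pts, size_t n, vec3 pivot,
                                   int axis, double factor)
{
  for (size_t k = 0; k < n; k++) {
    double *v = axis == 0 ? &pts[k].x : axis == 1 ? &pts[k].y : &pts[k].z;
    double p = axis == 0 ? pivot.x : axis == 1 ? pivot.y : pivot.z;
    *v = p + factor * (*v - p);
  }
}
```

Calling it with factor 0.0 and the cursor’s position as the pivot flattens the selection onto the plane through the cursor, perpendicular to the chosen axis.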

The Subdivision Surface Modifier and Catmull-Clark

  • If a smooth surface is desired, this is very likely to be used.
  • Watch this video. Really. Also Pixar’s page on this subject.
  • The Subdivision Surface modifier smooths the object by cutting each edge into two for each subdivision round, hence multiplying the number of faces by four. The mesh turns into a quads-only mesh after the first round. If there are non-quads in the original mesh, artifacts may occur on this first round, when non-quads (in particular n-gons with an uneven n) are split into quads.
  • Use quads whenever possible, and avoid vertices with more than 5 edges. Important exception: When the mesh consists of sparse “anchor points” and not a detailed outline of the desired result.
  • The positions of the added as well as moved vertices are weighted averages of a set of neighboring vertices. This causes an eroding effect on corners, turning a cube into a sphere.
  • Works best on uniformly spaced quadrilateral-faces meshes. Don’t triangulate, be careful with double vertices (e.g. from a poorly aborted extrusion) and avoid changing density of the mesh (e.g. for capturing some detail).
  • Keep the mesh minimal. It’s always possible to add vertices later. Each vertex of the mesh is a handle for deforming the curvature. Have too many of them, and it will be difficult to get a naturally smooth shape. Try to put the vertices where smooth peaks and valleys are expected, even subtle.
  • Keep the mesh minimal II: If there are small dents, rather make them as a texture with the Displace modifier, based upon a texture image (which will naturally share UV mapping with the material’s texture). This modifier should be inserted after the Subdivision Surface modifier, so it moves the final, rounded mesh. It may require a large number of subdivisions to get a smooth displacement, but that’s only necessary on the final rendering. This way, the dents are kept where they’re required, not where they were an accident.
  • Keep the mesh minimal III: If natural motion based upon bones is desired, it’s crucial to draw a (possibly curved) line of vertices that move along with the bone, and another line of vertices that stay in place. The faces between these two lines will do the stretching, and they must be laid out in a natural way, i.e. have the geometry of the skin surface that does the stretching in real life.
  • Sharp shapes will generally shrink. It will shrink less where the mesh is denser, because the averaging is done on the neighboring vertices, even if they’re close.
  • It’s possible to manipulate a smoothed surface in Edit Mode, with the changes applied to the original mesh under the modifier.
  • The Crease property of certain edges increases their weight in the average for calculating the vertices’ positions, ranging from 0 to 1. This makes the edge sharper after subdivision. When applied to a face, the resultant form gets close to the face (a cube turns into a cylinder if opposite faces have crease set to 1.0). Select the relevant face (in Edit mode), and press Shift-E. Or set manually in the Transform bar (Visible with N). Creasing makes the edges redder.
  • The best way to copy a smooth shape from a 2D image is to start with a subsurfaced low-count mesh, and match the smoothed shape with the 2D image’s. Then possibly apply one round of subdivision, and fine-tune. Use the crease property for sharp turns rather than adding edges if possible.
  • Unwanted creases can be a result of double edges. Select entire mesh, and go Mesh > Vertices > Remove Doubles and Mesh > Clean Up > Delete Loose.
  • Dents can be a result of an uneven mesh. Consider using Edge Loops and Dissolve Edges to get it lighter.
  • To create a sharp corner (e.g. the corner of eyes) or a pointy surface, extrude a vertex (creating an edge) and pull the vertex away from the surface. The edge, which is connected to nothing, pulls the surface towards the vertex. This edge is invisible during rendering, and isn’t cleaned up by “Delete loose” etc.
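Both the face-count arithmetic and the eroding effect on corners described above can be checked with a few lines of C. This is a sketch following the standard Catmull-Clark rules, not Blender’s source; cc_face_count() and cc_vertex_rule() are made-up names. The vertex rule moves an original vertex P of valence n to (F + 2R + (n - 3)P) / n, where F is the average of the adjacent face points and R the average of the midpoints of the adjacent edges.

```c
#include <stddef.h>

/* Hypothetical helper: face count after `rounds` of Catmull-Clark
   subdivision. sides[] lists the number of sides of each original face.
   The first round turns every n-gon into n quads; every later round
   turns each quad into 4 quads. */
static unsigned long cc_face_count(const unsigned *sides, size_t nfaces,
                                   unsigned rounds)
{
  if (rounds == 0)
    return nfaces;
  unsigned long faces = 0;
  for (size_t f = 0; f < nfaces; f++)
    faces += sides[f];
  for (unsigned r = 1; r < rounds; r++)
    faces *= 4;
  return faces;
}

/* Hypothetical helper: the Catmull-Clark rule for repositioning an
   original vertex, applied to one coordinate:
   new P = (F + 2R + (n - 3)P) / n */
static double cc_vertex_rule(double F, double R, double P, unsigned n)
{
  return (F + 2.0 * R + ((double)n - 3.0) * P) / n;
}
```

For a cube (six quads), two rounds give 96 faces. For the corner at (1, 1, 1), the adjacent face points average to 1/3 per coordinate and the edge midpoints to 2/3, so after a single round the corner moves to about (0.556, 0.556, 0.556): the “cube turns into a sphere” erosion in numbers. A valence-4 vertex on a flat region (F = R = P) stays where it is.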

Joining / subtracting objects / making holes

  • Select two objects, and then CTRL-J. Can be separated again: Press P (Separate) in Edit mode and choose By loose parts
  • Fusing / subtracting objects (best done in Wireframe view): Select one object, pick Properties > Wrench > Add Modifier > Boolean. Within there, select Union, Difference or Intersect. It’s always with respect to another object. The “picker” object in the little Object window allows selecting the other object in the view. Once it’s fine, pick Apply.
  • The Boolean modifier messes up the mesh with duplicate vertices, and even worse, duplicate edges: Adjacent faces along the intersection may not share an edge, but instead have one of its own each. This is fairly acceptable with 3D printing, but creates warnings. Boolean is best used with simple objects for cutting. For example, a large cube to cut off parts of the modified object.
  • Fusing, the right way: Remove the intersecting faces manually, select the hole’s edges at both sides by selecting one edge for each and use Select > Edge Loops (in Edit Mode only). And then Mesh > Edges > Bridge Edge Loops.
  • The Knife tool (in the toolshelf) allows drawing straight lines, which creates new edges at the cutting points, and subdivides faces.
  • Activate Cut-Through with Z. This makes a hole in both ends.
  • There’s also Knife projection, which allows cutting a hole with a curve. Works with simple patterns.
  • Both cut-through and projection depend on the 3D viewing angle.
  • To punch a hole through a 3D body, set the shape with a 2D mesh or curve, and cut the shape on the faces on both sides. Then select the 2D holes on both sides, and pick Mesh > Edges > Bridge Edge Loops to draw edges across the body. The faces make a nice 3D hole.
  • To create an internal hole, generate the shape of the hole as a separate object, and then (in Edit Mode) Mesh > Normals > Flip Normals. Join this object with the target, and place it as desired.
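Why flipped normals make an object act as a hole can be illustrated with the signed volume of a closed triangle mesh: summing a · (b × c) / 6 over all triangles gives the enclosed volume when the normals point outwards, and its negative when they’re flipped. The C sketch below illustrates that principle only (signed_volume() and flip_normals() are hypothetical names); it says nothing about how a particular slicer processes the file.

```c
#include <stddef.h>

/* Hypothetical types for this sketch. A triangle wound counter-clockwise,
   seen from outside, has an outward-facing normal. */
typedef struct { double x, y, z; } vec3;
typedef struct { vec3 a, b, c; } triangle;

/* Signed volume of a closed triangle mesh: the sum over triangles of
   a . (b x c) / 6, i.e. signed tetrahedra spanned with the origin.
   Positive for outward normals, negative when they're flipped. */
static double signed_volume(const triangle *t, size_t n)
{
  double sum = 0.0;
  for (size_t k = 0; k < n; k++) {
    const vec3 a = t[k].a, b = t[k].b, c = t[k].c;
    sum += a.x * (b.y * c.z - b.z * c.y)
         + a.y * (b.z * c.x - b.x * c.z)
         + a.z * (b.x * c.y - b.y * c.x);
  }
  return sum / 6.0;
}

/* Flip normals by reversing each triangle's winding (swap b and c) */
static void flip_normals(triangle *t, size_t n)
{
  for (size_t k = 0; k < n; k++) {
    vec3 tmp = t[k].b;
    t[k].b = t[k].c;
    t[k].c = tmp;
  }
}
```

Joining an outward-facing outer shell with a normal-flipped inner shell therefore sums to the outer volume minus the inner one, which is a body with a cavity.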

Notable modifiers

  • Generate / Boolean: Create an object that is the intersection, union or difference between two objects. Excellent for chopping off a corner, or even larger cuts, by applying it with a large cube. Doesn’t work all that well when two complex meshes are involved, and the resultant mesh is often quite messy and may need some work, in particular if it’s due for further manipulations. For 3D printing, this mess is often good enough, except for simple fixes.
  • Generate / Array: The way to duplicate an object.
  • Generate / Mask: Make all vertices belonging to a vertex group (or not belonging to it) invisible, both for render and editing. Extremely useful for working with complex structures, allowing focus on certain (possibly internal) parts of a mesh, instead of hiding them every time.
  • Generate / Skin: Creates a body around edges (which function as an armature). A quick way to create an arbitrary 3D shape.
  • Generate / Wireframe: Gives thickness to the existing wireframe, making it a body. A bit like Skin, but simpler.
  • Generate / Triangulate: Make all faces triangles
  • Deform / Displace: Displace the vertices’ position as a function of a texture (i.e. an image). Useful for “printing text” on an object or making a bumpy surface with a certain pattern.
  • Deform / Laplace Deform: Allows deforming an object while preserving geometric properties
  • Deform / Shrinkwrap: As its name implies: pushes the vertices towards the exterior of another object, after applying its modifiers. Neat to get rid of overlapping meshes, but careful with the corners. The “Nearest Surface” (default) mode should be chosen for simple use. “Nearest Vertex” in conjunction with vertex groups is just a way to glue vertices from different objects together, but it’s not necessarily useful. Try both.
  • Deform / Simple Deform: Allows twisting, bending and other simple deformations. When another (empty) object is set as the origin, its location, size and orientation change the effect.
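For the Displace modifier, the per-vertex arithmetic is simple. A minimal C sketch, assuming the Normal direction setting and the Midlevel / Strength semantics as described in the Blender manual (offset = (texture value - Midlevel) * Strength); texture sampling and UV lookup are left out, and displace_vertex() is a hypothetical name:

```c
/* Hypothetical 3D vector type for this sketch */
typedef struct { double x, y, z; } vec3;

/* Hypothetical helper: move one vertex along its (unit) normal.
   Texture values above midlevel push outwards, values below pull
   inwards, and a value equal to midlevel leaves the vertex in place. */
static vec3 displace_vertex(vec3 pos, vec3 normal, double texture_value,
                            double midlevel, double strength)
{
  double d = (texture_value - midlevel) * strength;
  vec3 out = { pos.x + d * normal.x,
               pos.y + d * normal.y,
               pos.z + d * normal.z };
  return out;
}
```

This is also why the modifier goes after Subdivision Surface when it’s used for small dents: each displaced vertex moves independently, so the result is only as smooth as the mesh density it’s applied to.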

Sculpt mode

  • Excellent for fixing small dents with the “Smooth” tool (those dents that shouldn’t happen in the first place, because of a properly designed mesh…).
  • Also good for small finishes, but requires a dense mesh to work with. Use when editing with Subdivision Surface isn’t good enough.
  • Note the Brush > Sculpt Tool. “Draw” pulls up the mesh a bit (as if adding material) but if “Subtract” is chosen on the toolshelf, it dents inwards.
  • Another Sculpt Tool is “Scrape”, which is good for selectively rounding off corners (like Bevel, just not globally).

Other modes

  • Vertex Paint: Simple, intuitive painting of the faces. For this to appear on render, a material must be assigned, and the “Vertex Color Paint” option must be checked (this isn’t the default). As expected, the paint follows the faces if they’re moved.
  • Weight Paint: An intuitive way to mark the weight of a vertex group. This has to do with Vertex Groups, an important concept for moving parts of a body along with a “bone”, as well as the Shrinkwrap modifier and other stuff. The painting applies to vertices, therefore use with subdivision and other modifiers off, and hit the right points. Subdivision surface interpolates the weights across edges between points.
  • Texture Paint: A more difficult way to paint a 3D model, but it generates a texture image one can save back. Useful for marking what goes where on the texture image, then saving it and editing the image elsewhere, instead of fiddling with the UV mapping. Also for smoothing seams (with the Clone brush). Keep the surface subdivision rate low for this. Too many faces, and the painting goes from slow to impossible.


Controlling GPIO on the Z-Turn Lite board with Xillinux

Introduction

This post shows how to access some GPIO functionalities from Xillinux running on a Z-Turn Lite board (with a Z-Turn Lite IO Cape board attached), directly from the command line.

Watchdog

When the “WD” jumper at J26 on the board is in place, it’s possible to utilize the board’s watchdog chip, which resets the processor if its watchdog-clear pin isn’t toggled for 1.2 seconds. If that pin is in high-Z, the watchdog is inactive, and doesn’t reset the processor even if no toggling has taken place. This can be achieved either by removing the “WD” jumper, which floats the pin, or by making the pin high-Z by setting the relevant GPIO to an input (Xillinux ensures the latter, so booting it with the “WD” jumper in place is safe).

When the pin is high-Z, a small sawtooth-like pulse, which is a few microseconds wide, is visible with an oscilloscope every 1.2 sec, and it’s the watchdog driving the pin to verify that the wire is in high-Z.

The watchdog’s clear pin is wired to the Zynq’s PS-only pin MIO0, which is configured as GPIO 0.

To take control of this pin from the command line:

# echo 0 > /sys/class/gpio/export
# echo out > /sys/class/gpio/gpio0/direction

These commands turn the GPIO into an output, so it’s no longer high-Z. From this moment on, the pin must be toggled at least every 1.2 seconds, or the processor is reset.

To prevent this reset, the following command can be used:

# while [ 1 ] ; do echo 1 > /sys/class/gpio/gpio0/value ; echo 0 > /sys/class/gpio/gpio0/value ; sleep 0.5 ; done

This works on MYiR’s OOB Linux as well (not just Xillinux).
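The same export, direction and toggling sequence can also be done from a C program instead of a shell loop. A minimal sketch, assuming the legacy sysfs GPIO interface used above; the helper names are made up, and the paths are parameters so the code can also be tried against ordinary files:

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <time.h>

/* Hypothetical helper: write a short string to a sysfs attribute file,
   like "echo" does. Returns 0 on success, -1 on failure. */
static int sysfs_write(const char *path, const char *text)
{
  FILE *f = fopen(path, "w");
  if (!f)
    return -1;
  int ok = fputs(text, f) >= 0;
  if (fclose(f) != 0)   /* a rejected sysfs write shows up when flushing */
    ok = 0;
  return ok ? 0 : -1;
}

/* Hypothetical helper: export a GPIO and make it an output, i.e. the two
   "echo" commands above. sysfs_root is normally "/sys/class/gpio". */
static int gpio_export_output(const char *sysfs_root, unsigned gpio)
{
  char path[128], num[16];
  snprintf(path, sizeof path, "%s/export", sysfs_root);
  snprintf(num, sizeof num, "%u", gpio);
  if (sysfs_write(path, num))
    return -1;
  snprintf(path, sizeof path, "%s/gpio%u/direction", sysfs_root, gpio);
  return sysfs_write(path, "out");
}

/* Hypothetical helper: pulse the watchdog-clear pin high, then low,
   `cycles` times (forever if cycles < 0), sleeping pause_ms
   milliseconds after each pulse. Returns -1 if a write fails. */
static int feed_watchdog(const char *value_path, long cycles, long pause_ms)
{
  struct timespec gap = { pause_ms / 1000, (pause_ms % 1000) * 1000000L };
  for (long i = 0; cycles < 0 || i < cycles; i++) {
    if (sysfs_write(value_path, "1") || sysfs_write(value_path, "0"))
      return -1;
    nanosleep(&gap, NULL);
  }
  return 0;
}
```

On the board, this would be gpio_export_output("/sys/class/gpio", 0) once, followed by feed_watchdog("/sys/class/gpio/gpio0/value", -1, 500), which pulses the pin every half second like the shell loop. Exporting a GPIO that is already exported fails, so a rerun may need to tolerate that return value.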

Sensing the IO Cape board’s pushbutton

Not to be confused with the button on the Z-Turn Lite board itself, this is how to fetch the value of the button on the IO Cape Board:

# echo 88 > /sys/class/gpio/export
# cat /sys/class/gpio/gpio88/direction
in
# cat /sys/class/gpio/gpio88/value

This prints out 0 or 1, depending on the button’s state.
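Reading the button from C follows the same pattern. A minimal sketch (gpio_read_value() is a hypothetical helper), which reopens the value file on each call so that every call gets a fresh reading:

```c
#include <stdio.h>

/* Hypothetical helper: read a sysfs GPIO "value" file, e.g.
   /sys/class/gpio/gpio88/value for the IO Cape's pushbutton.
   Returns 0 or 1 on success, -1 on error. */
static int gpio_read_value(const char *value_path)
{
  FILE *f = fopen(value_path, "r");
  if (!f)
    return -1;
  int c = fgetc(f);   /* the file holds "0\n" or "1\n" */
  fclose(f);
  if (c == '0' || c == '1')
    return c - '0';
  return -1;
}
```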

Controlling the IO Cape board’s J8 pins

Out of the box, Xillinux routes 34 GPIO I/Os to the IO Cape board’s J8 connector. This can be modified easily by editing the top-level module of Xillinux’ logic design, but this is beyond this post’s scope.

The 34 pins are wired to the connector’s pins 3 to 36. In Linux, to access pin N on the J8, request GPIO number N+51.

For example, in order to toggle pin J8/3, the GPIO to request is 3 + 51 = 54, so the following commands at shell prompt cause some fast toggling:

# echo 54 > /sys/class/gpio/export
# echo out > /sys/class/gpio/gpio54/direction
# while [ 1 ] ; do echo 1 > /sys/class/gpio/gpio54/value ; echo 0 > /sys/class/gpio/gpio54/value ; done

The GPIO pins can also be used as inputs, by following the standard Linux API for GPIO. Note however that pins J8/31 and J8/34 are pulled up with resistors on the IO Cape board.
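The pin-to-GPIO rule is easy to wrap in a helper. A minimal C sketch, assuming Xillinux’s default J8 routing described above (the function names are hypothetical, and a modified logic design would change the mapping):

```c
#include <stdio.h>

/* Hypothetical helper: map an IO Cape J8 pin to its Linux GPIO number
   with Xillinux's default routing (pins 3 to 36 -> GPIO N + 51).
   Returns -1 for pins that aren't routed. */
static int j8_pin_to_gpio(int pin)
{
  if (pin < 3 || pin > 36)
    return -1;
  return pin + 51;
}

/* Pins J8/31 and J8/34 have pull-up resistors on the IO Cape board */
static int j8_pin_has_pullup(int pin)
{
  return pin == 31 || pin == 34;
}

/* Hypothetical helper: build the sysfs "value" path for a J8 pin, e.g.
   J8/3 -> /sys/class/gpio/gpio54/value. Returns 0 on success, -1 if the
   pin isn't routed or the buffer is too small. */
static int j8_value_path(int pin, char *buf, size_t len)
{
  int gpio = j8_pin_to_gpio(pin);
  if (gpio < 0)
    return -1;
  int n = snprintf(buf, len, "/sys/class/gpio/gpio%d/value", gpio);
  return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```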

Linux: When Vivado’s GUI doesn’t start with an error on locale

I tried running Vivado 2017.3, GUI and all, on a remote host with X forwarding, i.e.

$ ssh -X mycomputer

After setting up the environment with

$ . /path/to/Vivado/2017.3/settings64.sh

it failed with

$ vivado &
terminate called after throwing an instance of 'std::runtime_error'
  what():  locale::facet::_S_create_c_locale name not valid

Now here’s the odd thing: The error message is actually helpful! It is a locale problem:

$ locale
locale: Cannot set LC_CTYPE to default locale: No such file or directory
locale: Cannot set LC_MESSAGES to default locale: No such file or directory
locale: Cannot set LC_ALL to default locale: No such file or directory
LANG=en_IL
LC_CTYPE="en_IL"
LC_NUMERIC="en_IL"
LC_TIME="en_IL"
LC_COLLATE="en_IL"
LC_MONETARY="en_IL"
LC_MESSAGES="en_IL"
LC_PAPER="en_IL"
LC_NAME="en_IL"
LC_ADDRESS="en_IL"
LC_TELEPHONE="en_IL"
LC_MEASUREMENT="en_IL"
LC_IDENTIFICATION="en_IL"
LC_ALL=

Checking on a non-SSH terminal, all read “en_US.UTF-8” instead. The problem seems to be that the SSH session is from a newer Linux distro to an older one. “en_IL” is indeed the locale on the newer machine, where it’s OK, but evidently it isn’t installed on the older one (hence the “No such file or directory” complaints). SSH carried the local locale setting over to the remote session (which I believe can be avoided, but it’s not worth the effort given the simple workaround below).

So the fix is surprisingly simple:

$ export LC_ALL=en_US.UTF-8

and then check again:

$ locale
LANG=en_IL
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=en_US.UTF-8

OK, so LANG is still rubbish, but after this, Vivado 2017.3’s GUI is up and running.

I should mention that there was no problem starting Vivado 2014.4’s GUI, but instead, it crashed somewhere in the middle of the implementation. Once again, fixing the locale solved this.
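For what it’s worth, a C program can check whether a locale name is usable. A minimal sketch (locale_usable() is a made-up name): on glibc, setlocale() returns NULL when the requested locale isn’t installed, which, as far as I can tell, is the C-level counterpart of the situation that made the C++ runtime throw above. Passing an empty string tests the locale selected by the LC_ALL, LC_* and LANG environment variables.

```c
#include <locale.h>
#include <stdio.h>

/* Hypothetical helper: return 1 if the C library accepts `name` as a
   locale for all categories, 0 otherwise. The previous global locale
   is restored before returning. */
static int locale_usable(const char *name)
{
  /* setlocale() returns a pointer to static storage, so copy it first */
  char saved[1024];
  const char *cur = setlocale(LC_ALL, NULL);
  snprintf(saved, sizeof saved, "%s", cur ? cur : "C");
  int ok = setlocale(LC_ALL, name) != NULL;
  setlocale(LC_ALL, saved);
  return ok;
}
```

In a session like the one above, locale_usable("") should return 0 before the export LC_ALL=en_US.UTF-8 and 1 after it.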

Xilinx’ Zynq Z007s: Is it really single core?

Introduction

Xilinx’ documentation says that XC7Z007S, among other “S” devices, is a single-core device, as opposed to, for example, its older brother XC7Z010, which is dual-core. So I compared several aspects of the PS part of a Z007S vs. Z010, and to my astonishment, I found that Z007S is exactly the same: Two CPUs are reported by the hardware itself, SMP is kicked off on both, and a simple performance test I made showed that Z007S runs two processes in parallel as fast as Z010.

So the question is: In what sense is XC7Z007S single-core? For now, I have no answer to that. I’ll update this post as soon as someone manages to explain this to me. In the meanwhile, I’ve tried to get this figured out in Xilinx’ forum.

The rest of this post outlines the various similarities I found between the Z007S and the Z010. The PL bitfiles of different Zynq devices are incompatible, so there’s no chance I mistook which device I worked with.

The tests below were made with Xillinux-2.0 (kernel v4.4) on two Z-turn Lite boards, one carrying Z007S, and one Z010.

Found 2 CPUs?

I started wondering when the kernel’s dmesg log indicated that it had found 2 CPUs on a Z007S:

[    0.132523] CPU0: thread -1, cpu 0, socket 0, mpidr 80000000
[    0.132586] Setting up static identity map for 0x82c0 - 0x82f4
[    0.310962] CPU1: thread -1, cpu 1, socket 0, mpidr 80000001
[    0.311065] Brought up 2 CPUs
[    0.311102] SMP: Total of 2 processors activated (2664.03 BogoMIPS).
[    0.311121] CPU: All CPU(s) started in SVC mode.

Also, /proc/cpuinfo consistently listed two CPUs. One could think that it’s because two CPUs are declared in the device tree, but removing one of them makes no difference.

On Z010, the exact same log appears, and /proc/cpuinfo says the same.

CPU’s hardware register reporting two CPUs

According to the Zynq-7000 AP SoC Technical Reference Manual (UG585), bits 1:0 of the processor’s SCU_CONFIGURATION_REGISTER indicate the number of CPUs present in the Cortex-A9 MPCore processor. Binary ‘01’ means two Cortex-A9 processors, CPU0 and CPU1. Binary ‘00’ means one Cortex-A9 processor, CPU0.

Using Xillinux-2.0’s poke kernel utility to read the processor’s SCU_CONFIGURATION_REGISTER, I got exactly the same result on Z007S and Z010:

poke read addr=f8f00004: value=00000511

In other words, both devices report two processors.

I’m under the impression that the kernel uses this register to tell the number of CPUs by virtue of the scu_get_core_count() function (defined in arch/arm/kernel/smp_scu.c), which is called by zynq_smp_init_cpus() in arch/arm/mach-zynq/platsmp.c.

The latter function sets the kernel’s “CPU possible” bits, so it’s how the Zynq-specific kernel setup code tells the kernel framework which CPU indexes are good for use.
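The decoding itself is a one-liner. A minimal C sketch of it (scu_cpu_count() is a made-up name), following UG585’s description of bits 1:0, which is, as far as I can tell, the same arithmetic as the kernel’s scu_get_core_count():

```c
#include <stdint.h>

/* Hypothetical helper: number of Cortex-A9 CPUs according to
   SCU_CONFIGURATION_REGISTER, whose bits 1:0 hold the CPU count
   minus one. */
static unsigned scu_cpu_count(uint32_t scu_config)
{
  return (scu_config & 0x3u) + 1;
}
```

For the value read above, 0x00000511, bits 1:0 are binary ‘01’, so the result is 2 on both devices.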

Also, the U-Boot code used by Xillinux for Z-Turn Lite prints out the processor count, based upon SCU_CONFIGURATION_REGISTER, as well as other info. For Z007S it gave:

U-Boot 2013.07 (Sep 17 2018 - 11:51:45)              

Detected device ID code 0x3 (XC7Z007S) with 2 CPU(s), PS_VERSION = 3
Strapped boot mode: 5 (SD Card)

and for Z010:

U-Boot 2013.07 (Sep 17 2018 - 11:51:45)              

Detected device ID code 0x2 (XC7Z010) with 2 CPU(s), PS_VERSION = 3
Strapped boot mode: 5 (SD Card)

A simple benchmark test

The proof is in the pudding. I wrote a simple program, which forks into two processes, each running a certain amount of plain CPU-intensive operations, and then quits. The output of this program is of no interest, but it’s printed out to prevent the compiler from optimizing away the crunching. Its listing is given at the end of this post for reference.

Using the “time” utility to measure the execution times, I ran the program on Z007S and Z010, and consistently got the same results, except for slight fluctuations:

# time ./work 400
Parent process done with LSR at e89c4641
Child process quitting with LSR at e89c4641
Parent process quitting

real	0m3.604s
user	0m7.030s
sys	0m0.010s

The 3.6 seconds given as “real” is the wall clock time. The 7 seconds of “user” time is the amount of consumed CPU. And as one would expect from a program that runs on two processes on a dual core machine, the consumed CPU time is approximately double the wall clock time. This is the result I expected from Z010, but not from Z007S.

Just to be sure I wasn’t being silly, I booted the kernel with “nosmp” in the kernel command line, which forced a single-CPU bringup. Indeed, the kernel reported finding one CPU in its logs, and /proc/cpuinfo reflected that as well.

And the pudding?

# time ./work 400
Parent process done with LSR at e89c4641
Child process quitting with LSR at e89c4641
Parent process quitting

real	0m6.998s
user	0m6.970s
sys	0m0.010s

Exactly as expected: With one processor, forking into two processes has no advantage. The CPU time is the wall clock time. I waited twice as long for it to finish.
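The two runs can be summarized with ratios taken directly from the figures above: user time over wall-clock time (roughly, how many CPUs were busy on average), and the wall-clock speedup of the SMP run over the nosmp run:

```latex
\left.\frac{t_{\mathrm{user}}}{t_{\mathrm{real}}}\right|_{\mathrm{SMP}} = \frac{7.030}{3.604} \approx 1.95,
\qquad
\left.\frac{t_{\mathrm{user}}}{t_{\mathrm{real}}}\right|_{\mathrm{nosmp}} = \frac{6.970}{6.998} \approx 1.00,
\qquad
\text{speedup} = \frac{6.998}{3.604} \approx 1.94
```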

At some point I suspected that the specific Linux version I used had a scheduler issue that allowed a single-core CPU to perform as well as a dual-core. However, the dual-core results were repeated on a Zybo board with three completely different kernels (none of them Xillinux-2.0’s), yielding the same results (or slightly worse, with older kernels).

Conclusion

Given the results above, it’s not clear why Z007S is labeled as a single-core device. It’s not a matter of how it quacks or walks, but in the end, the device performs twice as fast when the work is split into two processes.

Or I missed something here. Kindly comment below if you found my mistake.

———————————–

Appendix: The benchmark program’s listing

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

/* State of a simple linear feedback shift register (LSR), used only as
   CPU-intensive busywork that the compiler can't optimize away */
static unsigned int lsr_state;

int main(int argc, char *argv[]) {

  int count, i, j, bit, status;
  pid_t pid;

  if (argc != 2) {
    fprintf(stderr, "Usage: %s count\n", argv[0]);
    exit(1);
  }

  count = atoi(argv[1]); /* Number of 2^20-iteration rounds per process */

  lsr_state = 1;

  /* From here on, the parent and the child run the same loop in parallel */
  pid = fork();

  if (pid < 0) {
    perror("Failed to fork");
    exit(1);
  }

  for (i=0; i<count; i++)
    for (j=0; j<(1<<20); j++) {
      /* Feedback bit: XOR of bits 19 and 2 */
      bit = ((lsr_state >> 19) ^ (lsr_state >> 2)) & 0x01;

      lsr_state = (lsr_state << 1) | bit;

      if (lsr_state == 0) {
	fprintf(stderr, "Huh? The LSR state is zero!\n");
	exit(1);
      }
    }

  if (pid == 0) {
    fprintf(stderr, "Child process quitting with LSR at %x\n", lsr_state);
    return 0;
  }

  fprintf(stderr, "Parent process done with LSR at %x\n", lsr_state);

  /* Wait for the child, so that "time" covers both processes */
  wait(&status);

  fprintf(stderr, "Parent process quitting\n");

  return 0;
}

saved as work.c, compiled with

# gcc -O3 -Wall work.c -o work

directly on the Zynq board itself (Xillinux comes with a native gcc compiler). But cross compilation should make no difference.

Soft Linux kernel hacking for dumping ULPI commands to USB PHY

Ever wanted to see how a Linux USB host talks with its PHY with ULPI commands? Probably not. But if you do, here’s how I did it on a Zynq device, connected to a USB3320 USB 2.0 PHY chip. Note that:

  • The relevant sources must be compiled into the kernel. Modules are loaded too late. The choice of PHY frontend is made when the USB driver is initialized, and if the relevant driver isn’t handy, a generic PHY is picked instead…
  • … which is most likely as good. In retrospect, there’s very little reason to load the actual driver.
  • In particular, my system works great without the dedicated USB PHY driver.

So it’s about adding a plain pr_info() to the kernel’s drivers/usb/phy/phy-ulpi-viewport.c, so that it prints every ULPI register write command to the kernel log. The added code is the pr_info() call:

static int ulpi_viewport_write(struct usb_phy *otg, u32 val, u32 reg)
{
	int ret;
	void __iomem *view = otg->io_priv;

	pr_info("ulpi_viewport_write: reg 0x%04x = 0x%02x\n",
		reg, val);

	writel(ULPI_VIEW_WAKEUP | ULPI_VIEW_WRITE, view);
	ret = ulpi_viewport_wait(view, ULPI_VIEW_WAKEUP);
	if (ret)
		return ret;

	writel(ULPI_VIEW_RUN | ULPI_VIEW_WRITE | ULPI_VIEW_DATA_WRITE(val) |
						 ULPI_VIEW_ADDR(reg), view);

	return ulpi_viewport_wait(view, ULPI_VIEW_RUN);
}

And that’s it. One can also cover the ulpi_viewport_read() method in the same way, but it wasn’t important to me (I wanted to see the powering on of Vbus).

The relevant part in my device tree read:

	usb_phy0: phy0 {
		compatible = "ulpi-phy";
		#phy-cells = <0>;
		reg = <0xe0002000 0x1000>;
		view-port = <0x0170>;
		drv-vbus;
	};

	usb0: usb@e0002000 {
		compatible = "xlnx,zynq-usb-2.20a", "chipidea,usb2";
		clocks = <&clkc 28>;
		interrupt-parent = <&ps7_scugic_0>;
		interrupts = <0 21 4>;
		reg = <0xe0002000 0x1000>;
		phy_type = "ulpi";
		dr_mode = "host";
		usb-phy = <&usb_phy0>;
	};

And this is what I got in the dmesg log:

[    1.396317] ulpi_phy_probe() invoked
[    1.399968] ulpi_phy_probe() returns successfully
[    1.405148] ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
[    1.418505] ehci-pci: EHCI PCI platform driver
[    1.429765] ehci-platform: EHCI generic platform driver
[    1.441924] ohci_hcd: USB 1.1 'Open' Host Controller (OHCI) Driver
[    1.454951] ohci-pci: OHCI PCI platform driver
[    1.466237] ohci-platform: OHCI generic platform driver
[    1.478504] uhci_hcd: USB Universal Host Controller Interface driver
[    1.492047] usbcore: registered new interface driver usb-storage
[    1.505250] chipidea-usb2 e0002000.usb: ci_hdrc_usb2_probe invoked
[    1.518410] e0002000.usb supply vbus not found, using dummy regulator
[    1.532049] ci_hdrc ci_hdrc.0: ChipIdea HDRC found, revision: 22, lpm: 0; cap: e0d0a100 op: e0d0a140
[    1.532062] ulpi_init() invoked
[    1.542033] ULPI transceiver vendor/product ID 0x0424/0x0007
[    1.554611] Found SMSC USB3320 ULPI transceiver.
[    1.566118] ulpi_viewport_write: reg 0x0016 = 0x55
[    1.577825] ulpi_viewport_write: reg 0x0016 = 0xaa
[    1.589383] ULPI integrity check: passed.
[    1.600057] ulpi_viewport_write: reg 0x000a = 0x06
[    1.611516] ulpi_viewport_write: reg 0x0007 = 0x00
[    1.622872] ulpi_viewport_write: reg 0x0004 = 0x41
[    1.634203] ci_hdrc ci_hdrc.0: It is OTG capable controller
[    1.634233] ci_hdrc ci_hdrc.0: EHCI Host Controller
[    1.645628] ci_hdrc ci_hdrc.0: new USB bus registered, assigned bus number 1
[    1.672482] ci_hdrc ci_hdrc.0: USB 2.0 started, EHCI 1.00
[    1.684475] usb usb1: New USB device found, idVendor=1d6b, idProduct=0002
[    1.697770] usb usb1: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[    1.711521] usb usb1: Product: EHCI Host Controller
[    1.722832] usb usb1: Manufacturer: Linux 4.4.30-xillinux-2.0 ehci_hcd
[    1.735797] usb usb1: SerialNumber: ci_hdrc.0
[    1.747236] hub 1-0:1.0: USB hub found
[    1.757421] hub 1-0:1.0: 1 port detected
[    1.767881] ulpi_viewport_write: reg 0x000a = 0x67

Some of the log entries above, such as “ulpi_phy_probe() invoked” and “ulpi_init() invoked”, are debug outputs I added myself, and they pretty much explain themselves.

Did you notice that the ULPI transceiver was detected by its vendor ID / product ID? That’s for real: the IDs were obtained by reading ULPI registers (not shown above). Still, I’m not all that convinced that this detection made any difference, except for printing out the name of the device.
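For reference, the ULPI specification places the vendor ID and product ID in four 8-bit registers at addresses 0x00 to 0x03, low byte first. A minimal sketch of how the two 16-bit IDs are put together. The byte values are not the actual reads from this board (those aren’t shown), just what 0x0424 and 0x0007 decompose into, and ulpi_ids() is a hypothetical helper, not the kernel’s function:

```python
# ULPI register addresses of the ID fields (ULPI 1.1 register map)
VENDOR_ID_LOW, VENDOR_ID_HIGH = 0x00, 0x01
PRODUCT_ID_LOW, PRODUCT_ID_HIGH = 0x02, 0x03

def ulpi_ids(regs):
    """Assemble the 16-bit vendor and product IDs from a mapping of
    register address -> byte value, as read through the viewport."""
    vid = regs[VENDOR_ID_LOW] | (regs[VENDOR_ID_HIGH] << 8)
    pid = regs[PRODUCT_ID_LOW] | (regs[PRODUCT_ID_HIGH] << 8)
    return vid, pid

# Hypothetical read-back values, consistent with the 0x0424/0x0007 in the log
regs = {0x00: 0x24, 0x01: 0x04, 0x02: 0x07, 0x03: 0x00}
vid, pid = ulpi_ids(regs)
print(f"ULPI transceiver vendor/product ID 0x{vid:04x}/0x{pid:04x}")
```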

As for the meaning of these ulpi_viewport_write dumps, most of them are pretty boring. The first two writes, to address 0x16, do nothing: it’s the Scratch Register, most likely used by the driver to test the ULPI interface (hence the “ULPI integrity check: passed.” line right after them).
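The scratch-register test boils down to something like the sketch below. This is not the driver’s code: FakeViewport is a hypothetical stand-in for real viewport accesses, and the two patterns are the 0x55 / 0xaa values seen in the log. Being bitwise complements (01010101 and 10101010), they toggle every data bit both ways:

```python
SCRATCH_REG = 0x16  # ULPI Scratch Register: freely readable and writable

class FakeViewport:
    """Hypothetical stand-in for ULPI viewport accesses: a plain register file."""
    def __init__(self):
        self.regs = {}

    def write(self, reg, val):
        self.regs[reg] = val & 0xff

    def read(self, reg):
        return self.regs.get(reg, 0)

def ulpi_integrity_check(port):
    """Write complementary patterns to the scratch register and read each
    back; a mismatch points at a broken data path to the transceiver."""
    for pattern in (0x55, 0xaa):
        port.write(SCRATCH_REG, pattern)
        if port.read(SCRATCH_REG) != pattern:
            return False
    return True

result = ulpi_integrity_check(FakeViewport())
print("ULPI integrity check:", "passed." if result else "failed.")
```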

The following three writes merely assign the registers’ default values, so they effectively do nothing either.

The last write to register 0x0a (OTG Control) is 0x67. Compared with the default value 0x06 written earlier, it additionally sets bits 6, 5 and 0, which are DrvVbusExternal, DrvVbus and IdPullup. The interesting part to me was DrvVbusExternal and DrvVbus, because setting either of them (or both) drives the chip’s CPEN pin high, which turns on the power supply for Vbus. This is the point where the USB port starts behaving like a host and feeds power.
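To see which bits the last write actually changes, here’s a small decoder for the OTG Control register, with bit names taken from the ULPI 1.1 specification (decode_otg_ctrl() is just an illustrative helper):

```python
# Bit names of the ULPI OTG Control register (address 0x0a), bit 0 first
OTG_CTRL_BITS = [
    "IdPullup", "DpPulldown", "DmPulldown", "DischrgVbus",
    "ChrgVbus", "DrvVbus", "DrvVbusExternal", "UseExternalVbusIndicator",
]

def decode_otg_ctrl(value):
    """Return the names of the bits set in an OTG Control register value."""
    return [name for bit, name in enumerate(OTG_CTRL_BITS) if value & (1 << bit)]

default, last = 0x06, 0x67
print("default:   ", decode_otg_ctrl(default))
print("last write:", decode_otg_ctrl(last))
print("newly set: ", decode_otg_ctrl(last & ~default))
```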

Quartus: The importance of derive_pll_clocks in the SDC file

Introduction

Whenever a PLL is used in a design to generate one clock from another, it’s quite common to expect the timing tools to figure out the frequencies and timing relations between the different clocks.

With Intel’s Quartus tools, this isn’t the case by default. A derive_pll_clocks command is required in the SDC constraints file for this to happen. And indeed, this command appears in virtually every SDC file that the tools generate automatically.

But here’s the scary thing: If derive_pll_clocks is omitted, one would expect the PLL’s output clocks not to be timed at all, and the relevant paths to be listed as unconstrained. Unfortunately, that’s not what happens: As shown below, timing calculations are made for these paths, but with wrong figures. So one might get the impression that the timing constraints were met and all is fine, when in fact nothing is assured.

An example

Let’s say that the FPGA has an oscillator input of 48 MHz (hence a period of 20.833 ns), from which a PLL generates a 240 MHz clock (with a period of 4.166 ns).
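The arithmetic behind these figures, as a quick sketch (the constants are simply this example’s values). Note that a period derived from the SDC’s 20.833 ns inherits its rounding, which is presumably why the report below shows 4.166 ns rather than 4.167 ns:

```python
OSC_MHZ = 48.0           # oscillator input frequency
PLL_MHZ = 240.0          # PLL output frequency
ROOT_PERIOD_NS = 20.833  # period as written in create_clock

def period_ns(freq_mhz):
    """Clock period in nanoseconds for a frequency in MHz."""
    return 1000.0 / freq_mhz

mult = PLL_MHZ / OSC_MHZ  # PLL multiplication ratio
print(f"oscillator period: {period_ns(OSC_MHZ):.3f} ns")
print(f"PLL output period: {period_ns(PLL_MHZ):.3f} ns")
# A clock derived from the SDC figure inherits its rounding: 20.833 / 5
print(f"derived from SDC:  {ROOT_PERIOD_NS / mult:.4f} ns")
```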

First, let’s start with a simple, properly written SDC file:

create_clock -name root_clk -period 20.833 [get_ports {osc_clock}]

derive_pll_clocks
derive_clock_uncertainty

Note that the derive_pll_clocks command is there.

Now let’s look at the timing report for a path between two registers, both clocked by the derived clock. The only interesting part is the “latch edge time” line in the Data Required Path:

+-------------------------------------------------------------------------------------------------------------------------------------------------+
; Data Arrival Path                                                                                                                               ;
+---------+----------+----+------+--------+------------------------+------------------------------------------------------------------------------+
; Total   ; Incr     ; RF ; Type ; Fanout ; Location               ; Element                                                                      ;
+---------+----------+----+------+--------+------------------------+------------------------------------------------------------------------------+
; 0.000   ; 0.000    ;    ;      ;        ;                        ; launch edge time                                                             ;
; 4.937   ; 4.937    ;    ;      ;        ;                        ; clock path                                                                   ;
;   0.000 ;   0.000  ;    ;      ;        ;                        ; source latency                                                               ;
;   0.000 ;   0.000  ;    ;      ; 1      ; PIN_B12                ; osc_clock                                                                    ;
;   0.000 ;   0.000  ; RR ; IC   ; 1      ; IOIBUF_X19_Y29_N8      ; osc_clock~input|i                                                            ;
;   0.667 ;   0.667  ; RR ; CELL ; 2      ; IOIBUF_X19_Y29_N8      ; osc_clock~input|o                                                            ;
;   2.833 ;   2.166  ; RR ; IC   ; 1      ; PLL_3                  ; clkrst_ins|altpll_component|auto_generated|pll1|inclk[0]                     ;
;   1.119 ;   -1.714 ; RR ; COMP ; 1      ; PLL_3                  ; clkrst_ins|altpll_component|auto_generated|pll1|observablevcoout             ;
;   1.119 ;   0.000  ; RR ; CELL ; 1      ; PLL_3                  ; clkrst_ins|altpll_component|auto_generated|pll1|clk[0]                       ;
;   3.274 ;   2.155  ; RR ; IC   ; 1      ; CLKCTRL_G13            ; clkrst_ins|altpll_component|auto_generated|wire_pll1_clk[0]~clkctrl|inclk[0] ;
;   3.274 ;   0.000  ; RR ; CELL ; 8      ; CLKCTRL_G13            ; clkrst_ins|altpll_component|auto_generated|wire_pll1_clk[0]~clkctrl|outclk   ;
;   4.336 ;   1.062  ; RR ; IC   ; 1      ; FF_X40_Y24_N27         ; clkrst_ins|main_state[0]|clk                                                 ;
;   4.937 ;   0.601  ; RR ; CELL ; 1      ; FF_X40_Y24_N27         ; clkrst:clkrst_ins|main_state[0]                                              ;
; 6.774   ; 1.837    ;    ;      ;        ;                        ; data path                                                                    ;
;   5.169 ;   0.232  ;    ; uTco ; 1      ; FF_X40_Y24_N27         ; clkrst:clkrst_ins|main_state[0]                                              ;
;   5.169 ;   0.000  ; FF ; CELL ; 5      ; FF_X40_Y24_N27         ; clkrst_ins|main_state[0]|q                                                   ;
;   5.591 ;   0.422  ; FF ; IC   ; 1      ; LCCOMB_X40_Y24_N24     ; clkrst_ins|Equal1~0|dataa                                                    ;
;   6.002 ;   0.411  ; FR ; CELL ; 1      ; LCCOMB_X40_Y24_N24     ; clkrst_ins|Equal1~0|combout                                                  ;
;   6.370 ;   0.368  ; RR ; IC   ; 1      ; DDIOOUTCELL_X41_Y24_N4 ; clkrst_ins|the_register|d                                                    ;
;   6.774 ;   0.404  ; RR ; CELL ; 1      ; DDIOOUTCELL_X41_Y24_N4 ; clkrst:clkrst_ins|the_register                                               ;
+---------+----------+----+------+--------+------------------------+------------------------------------------------------------------------------+

+-------------------------------------------------------------------------------------------------------------------------------------------------+
; Data Required Path                                                                                                                              ;
+---------+----------+----+------+--------+------------------------+------------------------------------------------------------------------------+
; Total   ; Incr     ; RF ; Type ; Fanout ; Location               ; Element                                                                      ;
+---------+----------+----+------+--------+------------------------+------------------------------------------------------------------------------+
; 4.166   ; 4.166    ;    ;      ;        ;                        ; latch edge time                                                              ;
; 9.005   ; 4.839    ;    ;      ;        ;                        ; clock path                                                                   ;
;   4.166 ;   0.000  ;    ;      ;        ;                        ; source latency                                                               ;
;   4.166 ;   0.000  ;    ;      ; 1      ; PIN_B12                ; osc_clock                                                                    ;
;   4.166 ;   0.000  ; RR ; IC   ; 1      ; IOIBUF_X19_Y29_N8      ; osc_clock~input|i                                                            ;
;   4.833 ;   0.667  ; RR ; CELL ; 2      ; IOIBUF_X19_Y29_N8      ; osc_clock~input|o                                                            ;
;   6.912 ;   2.079  ; RR ; IC   ; 1      ; PLL_3                  ; clkrst_ins|altpll_component|auto_generated|pll1|inclk[0]                     ;
;   5.119 ;   -1.793 ; RR ; COMP ; 1      ; PLL_3                  ; clkrst_ins|altpll_component|auto_generated|pll1|observablevcoout             ;
;   5.119 ;   0.000  ; RR ; CELL ; 1      ; PLL_3                  ; clkrst_ins|altpll_component|auto_generated|pll1|clk[0]                       ;
;   7.187 ;   2.068  ; RR ; IC   ; 1      ; CLKCTRL_G13            ; clkrst_ins|altpll_component|auto_generated|wire_pll1_clk[0]~clkctrl|inclk[0] ;
;   7.187 ;   0.000  ; RR ; CELL ; 8      ; CLKCTRL_G13            ; clkrst_ins|altpll_component|auto_generated|wire_pll1_clk[0]~clkctrl|outclk   ;
;   8.199 ;   1.012  ; RR ; IC   ; 1      ; DDIOOUTCELL_X41_Y24_N4 ; clkrst_ins|the_register|clk                                                  ;
;   8.736 ;   0.537  ; RR ; CELL ; 1      ; DDIOOUTCELL_X41_Y24_N4 ; clkrst:clkrst_ins|the_register                                               ;
;   9.005 ;   0.269  ;    ;      ;        ;                        ; clock pessimism removed                                                      ;
; 8.985   ; -0.020   ;    ;      ;        ;                        ; clock uncertainty                                                            ;
; 8.890   ; -0.095   ;    ; uTsu ; 1      ; DDIOOUTCELL_X41_Y24_N4 ; clkrst:clkrst_ins|the_register                                               ;
+---------+----------+----+------+--------+------------------------+------------------------------------------------------------------------------+

Aside from all the mumbo-jumbo, there’s the “latch edge time” line: the time of the clock edge that propagates through the clock network and latches the data into the receiving register. Since this is a plain register-to-register path, with both registers clocked by the rising edge of the same clock, the latch edge time is simply the clock’s period. And indeed, it’s 4.166 ns. So far, so good.
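The report’s top-level rows add up in a simple way. As a sanity check, a short sketch recomputing the data required time and the setup slack from the figures above (the function names are mine, not Quartus’):

```python
def data_required(latch_edge, clock_path, uncertainty, t_su):
    """Data required time, summed like the report's top-level rows:
    latch edge + clock path (incl. pessimism removal) - uncertainty - setup."""
    return latch_edge + clock_path - uncertainty - t_su

def setup_slack(required, arrival):
    """Positive slack means the data arrives before it is required."""
    return required - arrival

# Figures copied from the report above (with derive_pll_clocks)
req = data_required(latch_edge=4.166, clock_path=4.839, uncertainty=0.020, t_su=0.095)
arr = 6.774  # data arrival time
print(f"required {req:.3f} ns, arrival {arr:.3f} ns, slack {setup_slack(req, arr):.3f} ns")
```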

But then disaster

Let’s see what happens if the derive_pll_clocks command is omitted. In other words, the SDC file reads:

create_clock -name root_clk -period 20.833 [get_ports {osc_clock}]

derive_clock_uncertainty

For exactly the same path, the timing report reads:

+-------------------------------------------------------------------------------------------------------------------------------------------------+
; Data Arrival Path                                                                                                                               ;
+---------+----------+----+------+--------+------------------------+------------------------------------------------------------------------------+
; Total   ; Incr     ; RF ; Type ; Fanout ; Location               ; Element                                                                      ;
+---------+----------+----+------+--------+------------------------+------------------------------------------------------------------------------+
; 0.000   ; 0.000    ;    ;      ;        ;                        ; launch edge time                                                             ;
; 4.937   ; 4.937    ;    ;      ;        ;                        ; clock path                                                                   ;
;   0.000 ;   0.000  ;    ;      ;        ;                        ; source latency                                                               ;
;   0.000 ;   0.000  ;    ;      ; 1      ; PIN_B12                ; osc_clock                                                                    ;
;   0.000 ;   0.000  ; RR ; IC   ; 1      ; IOIBUF_X19_Y29_N8      ; osc_clock~input|i                                                            ;
;   0.667 ;   0.667  ; RR ; CELL ; 2      ; IOIBUF_X19_Y29_N8      ; osc_clock~input|o                                                            ;
;   2.833 ;   2.166  ; RR ; IC   ; 1      ; PLL_3                  ; clkrst_ins|altpll_component|auto_generated|pll1|inclk[0]                     ;
;   1.119 ;   -1.714 ; RR ; COMP ; 1      ; PLL_3                  ; clkrst_ins|altpll_component|auto_generated|pll1|observablevcoout             ;
;   1.119 ;   0.000  ; RR ; CELL ; 1      ; PLL_3                  ; clkrst_ins|altpll_component|auto_generated|pll1|clk[0]                       ;
;   3.274 ;   2.155  ; RR ; IC   ; 1      ; CLKCTRL_G13            ; clkrst_ins|altpll_component|auto_generated|wire_pll1_clk[0]~clkctrl|inclk[0] ;
;   3.274 ;   0.000  ; RR ; CELL ; 8      ; CLKCTRL_G13            ; clkrst_ins|altpll_component|auto_generated|wire_pll1_clk[0]~clkctrl|outclk   ;
;   4.336 ;   1.062  ; RR ; IC   ; 1      ; FF_X40_Y24_N19         ; clkrst_ins|main_state[0]|clk                                                 ;
;   4.937 ;   0.601  ; RR ; CELL ; 1      ; FF_X40_Y24_N19         ; clkrst:clkrst_ins|main_state[0]                                              ;
; 6.765   ; 1.828    ;    ;      ;        ;                        ; data path                                                                    ;
;   5.169 ;   0.232  ;    ; uTco ; 1      ; FF_X40_Y24_N19         ; clkrst:clkrst_ins|main_state[0]                                              ;
;   5.169 ;   0.000  ; FF ; CELL ; 5      ; FF_X40_Y24_N19         ; clkrst_ins|main_state[0]|q                                                   ;
;   5.583 ;   0.414  ; FF ; IC   ; 1      ; LCCOMB_X40_Y24_N24     ; clkrst_ins|Equal1~0|datab                                                    ;
;   5.994 ;   0.411  ; FR ; CELL ; 1      ; LCCOMB_X40_Y24_N24     ; clkrst_ins|Equal1~0|combout                                                  ;
;   6.361 ;   0.367  ; RR ; IC   ; 1      ; DDIOOUTCELL_X41_Y24_N4 ; clkrst_ins|the_register|d                                                    ;
;   6.765 ;   0.404  ; RR ; CELL ; 1      ; DDIOOUTCELL_X41_Y24_N4 ; clkrst:clkrst_ins|the_register                                               ;
+---------+----------+----+------+--------+------------------------+------------------------------------------------------------------------------+

+--------------------------------------------------------------------------------------------------------------------------------------------------+
; Data Required Path                                                                                                                               ;
+----------+----------+----+------+--------+------------------------+------------------------------------------------------------------------------+
; Total    ; Incr     ; RF ; Type ; Fanout ; Location               ; Element                                                                      ;
+----------+----------+----+------+--------+------------------------+------------------------------------------------------------------------------+
; 20.833   ; 20.833   ;    ;      ;        ;                        ; latch edge time                                                              ;
; 25.672   ; 4.839    ;    ;      ;        ;                        ; clock path                                                                   ;
;   20.833 ;   0.000  ;    ;      ;        ;                        ; source latency                                                               ;
;   20.833 ;   0.000  ;    ;      ; 1      ; PIN_B12                ; osc_clock                                                                    ;
;   20.833 ;   0.000  ; RR ; IC   ; 1      ; IOIBUF_X19_Y29_N8      ; osc_clock~input|i                                                            ;
;   21.500 ;   0.667  ; RR ; CELL ; 2      ; IOIBUF_X19_Y29_N8      ; osc_clock~input|o                                                            ;
;   23.579 ;   2.079  ; RR ; IC   ; 1      ; PLL_3                  ; clkrst_ins|altpll_component|auto_generated|pll1|inclk[0]                     ;
;   21.786 ;   -1.793 ; RR ; COMP ; 1      ; PLL_3                  ; clkrst_ins|altpll_component|auto_generated|pll1|observablevcoout             ;
;   21.786 ;   0.000  ; RR ; CELL ; 1      ; PLL_3                  ; clkrst_ins|altpll_component|auto_generated|pll1|clk[0]                       ;
;   23.855 ;   2.069  ; RR ; IC   ; 1      ; CLKCTRL_G13            ; clkrst_ins|altpll_component|auto_generated|wire_pll1_clk[0]~clkctrl|inclk[0] ;
;   23.855 ;   0.000  ; RR ; CELL ; 8      ; CLKCTRL_G13            ; clkrst_ins|altpll_component|auto_generated|wire_pll1_clk[0]~clkctrl|outclk   ;
;   24.867 ;   1.012  ; RR ; IC   ; 1      ; DDIOOUTCELL_X41_Y24_N4 ; clkrst_ins|the_register|clk                                                  ;
;   25.404 ;   0.537  ; RR ; CELL ; 1      ; DDIOOUTCELL_X41_Y24_N4 ; clkrst:clkrst_ins|the_register                                               ;
;   25.672 ;   0.268  ;    ;      ;        ;                        ; clock pessimism removed                                                      ;
; 25.572   ; -0.100   ;    ;      ;        ;                        ; clock uncertainty                                                            ;
; 25.477   ; -0.095   ;    ; uTsu ; 1      ; DDIOOUTCELL_X41_Y24_N4 ; clkrst:clkrst_ins|the_register                                               ;
+----------+----------+----+------+--------+------------------------+------------------------------------------------------------------------------+

So it’s exactly the same analysis, only assuming that the clock period is 20.833 ns (note the “latch edge time” again). Note that the analysis traverses the PLL, but simply ignores the fact that the PLL’s output has another frequency. It’s as if the tools were saying: You forgot the derive_pll_clocks constraint? No problem. We’ll play as if the PLL’s input clock went right through it.
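Conceptually, derive_pll_clocks creates generated clocks whose periods follow the PLL’s multiplication and division factors; without it, the analysis above behaves as if both factors were 1. A sketch of the difference (pll_output_period() is a made-up helper, and the factor of 5 is this example’s 48 MHz to 240 MHz ratio):

```python
def pll_output_period(source_period_ns, multiply_by=1, divide_by=1):
    """Period of a PLL output clock, given its input clock's period.
    multiply_by = divide_by = 1 models a PLL treated as a pass-through."""
    return source_period_ns * divide_by / multiply_by

ROOT_PERIOD_NS = 20.833  # from create_clock -period

proper = pll_output_period(ROOT_PERIOD_NS, multiply_by=5)  # 48 MHz -> 240 MHz
passthrough = pll_output_period(ROOT_PERIOD_NS)            # PLL ignored

print(f"latch edge with derive_pll_clocks:    {proper:.4f} ns")
print(f"latch edge without derive_pll_clocks: {passthrough:.4f} ns")
print(f"setup budget silently inflated by {passthrough - proper:.4f} ns")
```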

Frankly, I can’t think of a single case where this behavior would make sense. Either don’t calculate the timing of the derived clock’s paths, or do it correctly. But just throwing in the original clock’s period? These incorrectly constrained paths don’t appear in the unconstrained paths summary, nor is there any other indication that the timing is horribly wrong.

In the tools’ defense, the timing analysis does produce warnings on this matter, but none of them is a Critical Warning, so they’re easy to miss in the sea of warnings that FPGA tools always generate.

Bottom line

  • Make sure your SDC file has the derive_pll_clocks command if a PLL is involved (unless you’ve added explicit constraints for the derived clocks).
  • Generate a timing report, and read and understand it.
  • Always test your constraints by requiring impossible values, and verify that the failing paths are calculated correctly.
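In the spirit of the last two points, a latch edge time that doesn’t match the intended clock period is easy to check for mechanically. A rough sketch, assuming the report was saved as plain text in the format shown above; the regular expression and helper names are my own, and the check is only meaningful for same-edge paths within one clock domain:

```python
import re

# Matches a top-level report row such as
# "; 20.833   ; 20.833   ;    ;      ;        ;   ; latch edge time   ;"
LATCH_RE = re.compile(r"^;\s*([\d.]+)\s*;.*;\s*latch edge time\s*;")

def latch_edge_times(report_text):
    """Return every 'latch edge time' value (in ns) found in the report."""
    return [float(m.group(1))
            for line in report_text.splitlines()
            if (m := LATCH_RE.match(line))]

def suspicious_latch_edges(report_text, expected_period_ns, tol_ns=0.002):
    """Latch edge times that differ from the expected clock period.
    Only meaningful for same-edge paths within a single clock domain."""
    return [t for t in latch_edge_times(report_text)
            if abs(t - expected_period_ns) > tol_ns]

good_row = "; 4.166   ; 4.166    ;    ;      ;        ;   ; latch edge time   ;"
bad_row = "; 20.833   ; 20.833   ;    ;      ;        ;   ; latch edge time   ;"
print(suspicious_latch_edges(good_row, 4.1666))  # []
print(suspicious_latch_edges(bad_row, 4.1666))   # [20.833]
```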

Nvidia graphics cards on Linux: PCIe link speed and width

Why is it at 2.5 GT/s???

For all that’s been said about Nvidia’s refusal to release its drivers as open source, its Linux support is great. I don’t think I’ve ever had such a flawless graphics card experience with Linux. After replacing the nouveau driver with Nvidia’s, of course. Ideology is nice, but a computer that works is nicer.

But then I looked at the output of lspci -vv (on an Asus fanless GT 730 2GB DDR3), and horrors, it’s not running at full PCIe speed!

17:00.0 VGA compatible controller: NVIDIA Corporation GK208 [GeForce GT 730] (rev a1) (prog-if 00 [VGA controller])
        Subsystem: ASUSTeK Computer Inc. GK208B [GeForce GT 730]
[ ... ]
        Capabilities: [78] Express (v2) Legacy Endpoint, MSI 00
                DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s unlimited, L1 <64us
                        ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
                        MaxPayload 256 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
                LnkCap: Port #0, Speed 5GT/s, Width x8, ASPM L0s L1, Exit Latency L0s <512ns, L1 <4us
                        ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
                        ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 2.5GT/s, Width x8, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
[ ... ]
        Kernel driver in use: nvidia
        Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia

What? The card declares it supports 5 GT/s, but runs only at 2.5 GT/s? And on my brand new super-duper motherboard, which supports Gen3 PCIe connected directly to an Intel X-family CPU?
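With several cards and long outputs, comparing LnkCap with LnkSta by eye gets tedious. A minimal sketch that extracts both from one device’s lspci -vv output (obtained e.g. with lspci -vv -s 17:00.0, run as root, since capabilities may be hidden from regular users). The parsing relies on the line format shown above; newer lspci versions may append annotations such as “(downgraded)” to LnkSta, which the regex tolerates, but other variations are possible:

```python
import re

# "Speed 5GT/s, Width x8", or "Speed 2.5GT/s (downgraded), Width x8 (ok)"
SPEED_RE = re.compile(r"Speed ([\d.]+)GT/s[^,]*, Width x(\d+)")

def link_status(lspci_vv_text):
    """Return ((GT/s, lanes) from LnkCap, (GT/s, lanes) from LnkSta) for one
    device's `lspci -vv` output; either is None if its line isn't found."""
    cap = sta = None
    for line in lspci_vv_text.splitlines():
        line = line.strip()
        m = SPEED_RE.search(line)
        if not m:
            continue
        link = (float(m.group(1)), int(m.group(2)))
        if line.startswith("LnkCap:"):
            cap = link
        elif line.startswith("LnkSta:"):
            sta = link
    return cap, sta

sample = """\
        LnkCap: Port #0, Speed 5GT/s, Width x8, ASPM L0s L1, Exit Latency L0s <512ns, L1 <4us
        LnkSta: Speed 2.5GT/s, Width x8, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
"""
cap, sta = link_status(sample)
if cap and sta and sta[0] < cap[0]:
    print(f"link at {sta[0]:g} GT/s x{sta[1]}, capable of {cap[0]:g} GT/s x{cap[1]}")
```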

It’s all under control

Well, the answer is surprisingly simple: Nvidia’s driver changes the card’s PCIe speed dynamically to support the bandwidth needed. When there’s no graphics activity, the speed drops to 2.5 GT/s.

This behavior can be controlled with Nvidia’s X Server Settings control panel (it has an icon in the system’s settings panel, or just type “Nvidia” in Gnome’s start menu). Under the PowerMizer sub-menu, the card’s behavior can be changed to stay at 5 GT/s if you like your card hot and your electricity bill fat.

Otherwise, in “Adaptive mode” it switches back and forth between 2.5 GT/s and 5 GT/s. The screenshot below was taken after a few seconds of idling:

Screenshot of Nvidia X Server settings in adaptive mode

And this is how to force it to 5 GT/s constantly:

Screenshot of Nvidia X Server settings in maximum performance mode

With the latter setting, lspci -vv shows that the card is at 5 GT/s, as promised:

17:00.0 VGA compatible controller: NVIDIA Corporation GK208 [GeForce GT 730] (rev a1) (prog-if 00 [VGA controller])
        Subsystem: ASUSTeK Computer Inc. GK208B [GeForce GT 730]
[ ... ]
                LnkCap: Port #0, Speed 5GT/s, Width x8, ASPM L0s L1, Exit Latency L0s <512ns, L1 <4us
                        ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
                        ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 5GT/s, Width x8, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-

So don’t worry about a low link speed on an Nvidia card (but do make sure it steps up on request).
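To check that the link actually steps up, there’s no need to run lspci repeatedly: Linux also exposes the link state in sysfs, as current_link_speed and max_link_speed under /sys/bus/pci/devices/. A small polling sketch; the exact string format of these files varies between kernel versions, and the device address must include the PCI domain (0000:17:00.0 for the card above):

```python
import time
from pathlib import Path

def read_link(bdf, root="/sys/bus/pci/devices"):
    """Return the (current_link_speed, max_link_speed) strings of a PCI
    device, as exposed by the kernel in sysfs."""
    dev = Path(root) / bdf
    return ((dev / "current_link_speed").read_text().strip(),
            (dev / "max_link_speed").read_text().strip())

def watch(bdf, seconds=10.0, interval=0.5):
    """Print the link speed whenever it changes, for a limited time."""
    last = None
    end = time.monotonic() + seconds
    while time.monotonic() < end:
        cur, top = read_link(bdf)
        if cur != last:
            print(f"{time.strftime('%H:%M:%S')} {cur} (max {top})")
            last = cur
        time.sleep(interval)
```

For example, calling watch("0000:17:00.0") and moving a window around in the meantime should, in Adaptive mode, show the speed go up, and then drop back after some idling.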

A word on GT 1030

I added another fanless card, Asus GT 1030 2GB, to the computer for some experiments. This card is somewhat harder to catch at 2.5 GT/s, because it steps up very quickly in response to any graphics event. But I managed to catch this:

65:00.0 VGA compatible controller: NVIDIA Corporation GP108 (rev a1) (prog-if 00 [VGA controller])
        Subsystem: ASUSTeK Computer Inc. GP108 [GeForce GT 1030]
[ ... ]
                LnkCap: Port #0, Speed 8GT/s, Width x4, ASPM L0s L1, Exit Latency L0s <512ns, L1 <16us
                        ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
                        ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 2.5GT/s, Width x4, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-

The running 2.5 GT/s speed vs. the maximal 8 GT/s is pretty clear by now, but the declared maximal width is x4? If so, why does the card have an x16 PCIe form factor? The GT 730 has an x8 form factor and uses 8 lanes, but the GT 1030 has an x16 form factor and declares it can only use four? Is this some kind of marketing thing to make the card look larger and stronger?

On the other hand, show me a fairly recent motherboard without an x16 PCIe slot. The thing is that sometimes that slot could be used for something else, and the graphics card could then have gone into a vacant x4 slot instead. But no. Let’s make it big and impressive with a long PCIe plug that makes it look massive. Personally, I find the gigantic heatsink impressive enough.
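As for the narrow link itself, it costs the GT 1030 little: a Gen3 lane runs at 8 GT/s with the leaner 128b/130b line coding, versus 5 GT/s with 8b/10b on Gen2. A back-of-the-envelope sketch of the raw per-direction bandwidth of both cards’ links (line coding only; packet and protocol overhead make real throughput lower):

```python
# Line coding per PCIe link rate in GT/s: (payload bits, transmitted bits)
ENCODING = {2.5: (8, 10), 5.0: (8, 10), 8.0: (128, 130)}

def link_bandwidth_gbytes(gt_per_s, lanes):
    """Per-direction bandwidth in GB/s after line coding only
    (ignores packet/protocol overhead, so real throughput is lower)."""
    payload, total = ENCODING[gt_per_s]
    return gt_per_s * lanes * payload / total / 8

for card, speed, lanes in [("GT 730", 5.0, 8), ("GT 1030", 8.0, 4)]:
    print(f"{card}: {speed:g} GT/s x{lanes} -> {link_bandwidth_gbytes(speed, lanes):.2f} GB/s")
```

So the GT 1030’s Gen3 x4 link lands at roughly the same bandwidth as the GT 730’s Gen2 x8 link.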