Freescale i.MX SDMA tutorial (part IV)

This post was written by eli on October 26, 2011
Posted Under: ARM,NXP (Freescale)

This is part IV of a brief tutorial about the i.MX51’s SDMA core. The SDMA for other i.MX devices, e.g. i.MX25, i.MX53 and i.MX6, is exactly the same, with changes in the registers’ addresses and different chapters in the Reference Manual.

This is by no means a replacement for reading the Reference Manual, but rather an introduction to make the landing softer. The division into parts goes as follows:

Running custom scripts

I’ll try to show the basics of getting a simple custom script to run on the SDMA core. Since there’s a lot of supporting infrastructure involved, I’ll show my example as a hack on the drivers/dma/imx-sdma.c Linux kernel module per version 2.6.38. I’m not going to explain the details of kernel hacking, so without experience in that field, it will be pretty difficult to try this out yourself.

The process of running an application-driven custom script consists of the following steps:

  • Initialize the SDMA module
  • Initialize the SDMA channel and clear its HE flag
  • Copy the SDMA assembly code from application space memory to SDMA memory space RAM.
  • Set up the channel’s context
  • Enable the channel’s HE flag (so the script runs pretty soon)
  • Wait for interrupt (assuming that the script ends with a “DONE 3”)
  • Possibly copy back the context to application processor space, to inspect the registers upon termination, and verify that their values are as expected.
  • Possibly copy SDMA memory to application processor space in order to inspect if the script worked as expected (if the script writes to SDMA RAM)

The first two steps are handled by the imx-sdma.c kernel module, so I won’t cover them. I’ll start with the assembly code, which has to be generated first.

The assembler

Freescale offers their assembler, but I decided to write my own in Perl. It’s simple and useful for writing short routines, and its output is snippets of C code, which can be inserted directly into the source, as I’ll show later. It’s released under GPLv2, and you can download it from this link.

The sample code below does nothing useful. For a couple of memory related examples, please see another post of mine.

To try it out quickly, just untar it on some UNIX system (Linux included, of course), change directory to sdma_asm, and go

$ ./ looptry.asm
                             | start:
0000 0804 (0000100000000100) |     ldi r0, 4
0001 7803 (0111100000000011) |     loop exit, 0
0002 5c05 (0101110000000101) |     st r4, (r5, 0) # Address r5
0003 1d01 (0001110100000001) |     addi r5, 1
0004 1c10 (0001110000010000) |     addi r4, 0x10
                             | exit:
0005 0300 (0000001100000000) |     done 3
0006 1c40 (0001110001000000) |     addi r4, 0x40
0007 0b00 (0000101100000000) |     ldi r3, 0
0008 4b00 (0100101100000000) |     cmpeqi r3, 0 # Always true
0009 7df6 (0111110111110110) |     bt start # Always branches

------------ CUT HERE -----------

static const int sdma_code_length = 5;
static const u32 sdma_code[5] = {
 0x08047803, 0x5c051d01, 0x1c100300, 0x1c400b00, 0x4b007df6,
};

The output should be pretty obvious. In particular, note that there’s a C declaration of a const array called sdma_code, which I’ll show how to use below. The first part of the output is a plain assembly listing, with the address, hex code and binary representation of the opcodes. There are a few simple syntax rules to observe:

  • Anything after a ‘;’ or ‘#’ sign is ignored (comments)
  • Empty lines are ignored, of course
  • A label starts the line, and is followed by a colon sign, ‘:’
  • Everything is case-insensitive, including labels (all code is lowercased internally)
  • The first alphanumeric string is considered the opcode, unless it’s a label
  • Everything following an opcode (comments excluded) is considered the arguments
  • All registers are noted as r0, r1, … r7 in the argument fields, and not as plain numbers, unlike the way shown in the reference manual. This makes a clear distinction between registers and values. It’s “st r7, (r0,9)” and not “st 7, (0,9)”.
  • Immediate arguments can be represented as decimal numbers (digits only), possibly negative (with a plain ‘-’ prefix). Positive hexadecimal numbers are allowed with the classic C “0x” prefix.
  • Labels are allowed for loops, as the first argument. The label is understood to be the first statement after the loop, so the label is the point reached when the loop is finished. See the example above. The second argument may not be omitted.
  • Other than loops, labels are accepted only for branch instructions, where the jump is relative. Absolute jump addresses can’t be generated automatically for jmp and jsr because the absolute address is not known during assembly.

A few words about why labels are not allowed for absolute jumps: It would be pretty simple to tell the Perl script the origin address, and allow absolute addressed jumps. I believe absolute jumps within a custom script should be avoided at any cost, so that the object code can be stored and run anywhere vacant. This is why I wasn’t keen on implementing this.

A simple test function

This is a simple function, which loads a custom script and runs it a few times. I added it, and a few additional functions (detailed later) to the Linux kernel’s SDMA driver, imx-sdma.c, and called it at the end of sdma_probe(). This is the simplest, yet not most efficient way to try things out: The operation takes place once when the module is inserted into the kernel, and then a reboot is necessary, since the module can’t be removed from the kernel. But with the reboot being fairly quick on an embedded system, it’s pretty OK.

So here’s the tryrun() function. Mind you, it’s called after the SDMA subsystem has been initialized, with one argument, the pointer to the sdma_engine structure (there’s only one for the entire system).

static int tryrun(struct sdma_engine *sdma)
{
 const int channel = 1;
 struct sdma_channel *sdmac = &sdma->channel[channel];
 static const u32 sdma_code[5] = {
  0x08047803, 0x5c051d01, 0x1c100300, 0x1c400b00, 0x4b007df6,
 };

 const int origin = 0xe00; /* In data space terms (32 bits/address) */

 struct sdma_context_data *context = sdma->context;

 int ret;
 int i;

 sdma_write_datamem(sdma, (void *) sdma_code, sizeof(sdma_code), origin);

 ret = sdma_request_channel(sdmac);

 if (ret) {
   printk(KERN_ERR "Failed to request channel\n");
   return ret;
 }

 sdma_disable_channel(sdmac); /* Clear the channel's HE flag */
 sdma_config_ownership(sdmac, false, true, false);

 memset(context, 0, sizeof(*context));

 context->channel_state.pc = origin * 2; /* In program space addressing... */
 context->gReg[4] = 0x12345678;
 context->gReg[5] = 0xe80;

 ret = sdma_write_datamem(sdma, (void *) context, sizeof(*context),
                          0x800 + (sizeof(*context) / 4) * channel);
 if (ret) {
   printk(KERN_ERR "Failed to load context\n");
   return ret;
 }

 for (i=0; i<4; i++) {
   ret = sdma_run_channel(&sdma->channel[1]);
   printk(KERN_WARNING "*****************************\n");
   sdma_print_mem(sdma, 0xe80, 128);

   if (ret) {
     printk(KERN_ERR "Failed to run script!\n");
     return ret;
   }
 }

 return 0; /* Success! */
}

Copying the code into SDMA memory

First, note that sdma_code is indeed copied from the output of the assembler, when it’s executed on looptry.asm as shown above. The assembler also emits an sdma_code_length variable, which is omitted here, but otherwise it’s an exact copy.

The first thing the function actually does is call sdma_write_datamem() to copy the code into SDMA space (and I don’t check the return value, sloppy me). This is a function I’ve added, but it’s clearly derived from sdma_load_context(), which is part of imx-sdma.c:

static int sdma_write_datamem(struct sdma_engine *sdma, void *buf,
                              int size, u32 address)
{
 struct sdma_buffer_descriptor *bd0 = sdma->channel[0].bd;
 void *buf_virt;
 dma_addr_t buf_phys;
 int ret;

 buf_virt = dma_alloc_coherent(NULL, size, &buf_phys, GFP_KERNEL);
 if (!buf_virt)
   return -ENOMEM;

 bd0->mode.command = C0_SETDM;
 bd0->mode.count = size / 4; /* Bytes to 32-bit words */
 bd0->mode.status = BD_DONE | BD_INTR | BD_WRAP | BD_EXTD;
 bd0->buffer_addr = buf_phys;
 bd0->ext_buffer_addr = address;

 memcpy(buf_virt, buf, size);

 ret = sdma_run_channel(&sdma->channel[0]);

 dma_free_coherent(NULL, size, buf_virt, buf_phys);

 return ret;
}

sdma_write_datamem()’s principle of operation is pretty simple: First a buffer is allocated, with its virtual address given in buf_virt and its physical address in buf_phys. Both addresses relate to the application processor, of course.

Then the buffer descriptor is set up. This piece of memory is preallocated globally for the entire SDMA engine (in the application processor’s memory space), which isn’t the cleanest way to do it, but since these operations aren’t expected to happen in parallel processes, this is OK. The sdma_buffer_descriptor structure is defined in imx-sdma.c itself, and is initialized according to section 52.23.1 in the Reference Manual. Note that this calling convention interfaces with the script running on channel 0, and not with any hardware interface. This chunk is merely telling the script what to do. In particular, the C0_SETDM command tells it to copy from application memory space to SDMA data memory space (see section

Note that in the function’s arguments, “size” is given in bytes, but address in SDMA data address space (that is, in 32-bit quanta). This is why “size” is divided by four to become the element count (mode.count).

Just before kicking off, the input buffer’s data is copied into the dedicated buffer with a plain memcpy() command.

And then sdma_run_channel() (part of imx-sdma.c) is called to make channel 0 runnable. This function merely sets the HE bit of channel 0, and waits (sleeping) for the interrupt to arrive, or errors on timeout after a second.

At this point we have the script loaded into SDMA RAM (at data address 0xe00).

Some housekeeping calls on channel 1

Up to this point, nothing was done on the channel we’re going to use, which is channel #1. Three calls to functions defined in imx-sdma.c prepare the channel for use:

  • sdma_request_channel() sets up the channel’s buffer descriptor and data structure, and enables the clock global to the entire sdma engine, actions which I’m not sure are necessary. It also sets up the channel’s priority and the Linux’ wait queue (used when waiting for interrupt).
  • sdma_disable_channel() clears the channel’s HE flag
  • sdma_config_ownership() clears HO, sets EO and DO for the channel, so the channel is driven (“owned”) by the processor (as opposed to driven by external events).

Setting up the context

Even though imx-sdma.c has a sdma_load_context() function, it’s written for setting up the context as suitable for running the channel 0 script. To keep things simpler, we’ll set up the context directly.

After zeroing the entire structure, three registers are set in tryrun(): The program counter, r4 and r5. Note that the program counter is given the address to which the code was copied, multiplied by 2, since the program counter is given in program memory space. The two other registers are set merely as an initial state for the script. The structure is then copied into the per-channel designated slot with sdma_write_datamem().

Again, note that the “context” data structure, which is used as a source buffer from which the context is copied into SDMA memory, is allocated globally for the entire SDMA engine. It’s not even protected by a mutex, so in a real project you should allocate your own piece of memory to hold the sdma_context_data structure.

Running the script

In the end, we have a loop of four subsequent runs of the script, without updating the context, so from the second time and on, the script continues after the “done 3” instruction. This is possible, because the script jumps to the beginning upon resumption (the three last lines in the assembly code, see above).

Each call to sdma_run_channel() sets channel 1’s HE flag, making it do its thing and then trigger an interrupt with the DONE instruction, which in turn wakes up the process, telling it the script has finished. sdma_print_mem() merely makes a series of printk’s, consisting of hex dumps of data from the SDMA memory. As used, it’s aimed at the region which the script is expected to alter, but the same function can be used to verify that the script is indeed in its place, or look at the memory. The function goes

static int sdma_print_mem(struct sdma_engine *sdma, int start, int len)
{
 int i;
 u8 *buf;
 char line[128];
 int pos = 0;

 len = (len + 15) & 0xfff0; /* Round up to a whole number of 16-byte lines */

 buf = kzalloc(len, GFP_KERNEL);

 if (!buf)
   return -ENOMEM;

 sdma_fetch_datamem(sdma, buf, len, start);

 for (i=0; i<len; i++) {
   if ((i % 16) == 0)
     pos = sprintf(line, "%04x ", i);

   pos += sprintf(&line[pos], "%02x ", buf[i]);

   if ((i % 16) == 15)
     printk(KERN_WARNING "%s\n", line);
 }

 kfree(buf);

 return 0;
}

and it uses this function (note that the instruction is C0_GETDM):

static int sdma_fetch_datamem(struct sdma_engine *sdma, void *buf,
                              int size, u32 address)
{
 struct sdma_buffer_descriptor *bd0 = sdma->channel[0].bd;
 void *buf_virt;
 dma_addr_t buf_phys;
 int ret;

 buf_virt = dma_alloc_coherent(NULL, size,
                               &buf_phys, GFP_KERNEL);
 if (!buf_virt)
   return -ENOMEM;

 bd0->mode.command = C0_GETDM;
 bd0->mode.count = size / 4; /* Bytes to 32-bit words */
 bd0->mode.status = BD_DONE | BD_INTR | BD_WRAP | BD_EXTD;
 bd0->buffer_addr = buf_phys;
 bd0->ext_buffer_addr = address;

 ret = sdma_run_channel(&sdma->channel[0]);

 memcpy(buf, buf_virt, size);

 dma_free_coherent(NULL, size, buf_virt, buf_phys);

 return ret;
}

Dumping context

This is the poor man’s debugger, but it’s pretty useful. A “done 3” instruction can be seen as a breakpoint, and the context dumped to the kernel log with this function:

static int sdma_print_context(struct sdma_engine *sdma, int channel)
{
 int i;
 struct sdma_context_data *context;
 u32 *reg;
 char line[128];
 int pos = 0;
 int start = 0x800 + (sizeof(*context) / 4) * channel;
 int len = sizeof(*context);
 const char *regnames[22] = { "r0", "r1", "r2", "r3", "r4", "r5", "r6", "r7",
                              "mda", "msa", "ms", "md",
                              "pda", "psa", "ps", "pd",
                              "ca", "cs", "dda", "dsa", "ds", "dd" };

 context = kzalloc(len, GFP_KERNEL);

 if (!context)
   return -ENOMEM;

 sdma_fetch_datamem(sdma, context, len, start);

 /* Arguments follow the field names in the format strings */
 printk(KERN_WARNING "pc=%04x rpc=%04x spc=%04x epc=%04x\n",
        context->channel_state.pc, context->channel_state.rpc,
        context->channel_state.spc, context->channel_state.epc);

 printk(KERN_WARNING "Flags: t=%d sf=%d df=%d lm=%d\n",
        context->channel_state.t, context->channel_state.sf,
        context->channel_state.df, context->channel_state.lm);

 reg = &context->gReg[0];

 for (i=0; i<22; i++) {
   if ((i % 4) == 0)
     pos = 0;

   pos += sprintf(&line[pos], "%s=%08x ", regnames[i], *reg++);

   if (((i % 4) == 3) || (i == 21))
     printk(KERN_WARNING "%s\n", line);
 }

 kfree(context);

 return 0;
}

Clashes with Linux’ SDMA driver

Playing around with the SDMA subsystem directly is inherently problematic, since the assigned driver may take contradicting actions, possibly leading to a system lockup. Running custom scripts using the existing driver isn’t possible, since it has no support for that as of kernel 2.6.38. On the other hand, there’s a good chance that the SDMA driver wasn’t enabled at all when the kernel was compiled, in which case there is no chance for collisions.

The simplest way to verify if the SDMA driver is currently present in the kernel, is to check in /proc/interrupts whether interrupt #6 is taken (it’s the SDMA interrupt).

The “imx-sdma” pseudodevice is always registered on the platform pseudobus (I suppose that will remain in the transition to Open Firmware), no matter the configuration. It’s the driver which may not be present. The “i.MX SDMA support” kernel option (CONFIG_IMX_SDMA) may not be enabled (it can be a module). Note that it depends on the general “DMA Engine Support” (CONFIG_DMADEVICES), which may not be enabled to begin with.

Anyhow, for playing with the SDMA module, it’s actually better when these are not enabled. In the long run, maybe there’s a need to expand imx-sdma.c, so it supports custom SDMA scripting. The question remaining is to what extent it should manage the SDMA RAM. Well, the real question is if there’s enough community interest in custom SDMA scripting at all.


Reader Comments

Great SDMA blog! I was wondering if you had experience with the OnCE controller debugging interface that is offered in the SDMA ARM Platform (non-JTAG)? If so, any advise or examples? Thanks again for the article.

Written By Tysen Moore on May 9th, 2014 @ 18:51

Thanks. As for OnCE — no, I don’t use debuggers much at all. I prefer printf-like techniques, or its counterpart in microcontroller programming: Writing data to special places in memory and such.

Written By eli on May 9th, 2014 @ 21:58

SDMA code disassembler and romcode for IMX6 you can find here:

Written By exslestonec on May 14th, 2015 @ 01:10

Thanks for this excellent series! Very useful. I wrote a followup detailing some stuff I learned when working with SDMA in kernel version 4.1.15. Anyone who’s interested can find it here:

Written By Jonah on April 30th, 2017 @ 22:48

Hi i am trying to generate binary file using your perl script. But I am getting below error. Can you please help.

——– Command ———–
kishor@kishor:~/DMA_script$ ./ copydma.asm

——– Error Logs ———–
Can’t locate in @INC (you may need to install the mx51_sdma_set module) (@INC contains: . /etc/perl /usr/local/lib/perl/5.18.2 /usr/local/share/perl/5.18.2 /usr/lib/perl5 /usr/share/perl5 /usr/lib/perl/5.18 /usr/share/perl/5.18 /usr/local/lib/site_perl) at ./ line 15.
BEGIN failed–compilation aborted at ./ line 15.

Written By kishor on May 12th, 2017 @ 08:36
