i.MX51 EIM bus clarified

This post was written by eli on October 14, 2011
Posted Under: ARM,FPGA,Linux kernel,NXP (Freescale)

These are the notes I took while working out how the EIM bus works. Unfortunately, the information in the reference manual was far from complete, so except for the list of acronyms, this page consists of things I found out by reverse engineering the bus.

The actual bus cycle outlines and timings are given in section 4.6.7 of the datasheet. There are also timing diagrams in section 63.8 of the reference manual, which include the internal AXI bus as well, which may be a bit confusing.

I worked with an Armadeus APF51 board, which has a 16-bit multiplexed bus connected to the Xilinx Spartan-6 FPGA. My preferences and observations are related to this board.

I wrote some code for the FPGA and processor on the board, for the sake of this reverse engineering, which is available in another post of mine. I’ve also published some oscilloscope shots during the process, which you may look at here.

EMI, EIM and WEIM

Freescale’s nomenclature regarding the external bus is somewhat inconsistent, so here’s a quick clarification.

EMI is the external memory interface, which is the general crossconnect for making AXI masters talk with their slaves, internal or external. EIM is the external interface module, which is embodied as the Wireless External Interface Module (WEIM). Pin names in the datasheet and schematics are given in EIM terms, but the reference manual uses WEIM nomenclature. Section 4.6.7.1 in the datasheet contains a table that maps the different names used for the same signals to each other. It’s a must-read.

Parameter acronyms

Section 63 in the reference manual covers the external memory interface. As it uses acronyms pretty extensively, sometimes with forward reference, I ended up making a cheat sheet.

Here’s a list of bus parameter acronyms, along with the values I chose by default for my tests with the bus (which are pretty much my final preferences). The acronyms themselves were taken directly from chapter 63.4.3 in the Reference Manual.

From Chip Select x General Configuration Register 1

PSZ = 0, Page Size
WP = 0, Write Protect
GBC = 1, Gap Between Chip Selects. That is, the gap between asserting one CS pin and then asserting another pin. Not to be confused with CSREC. At least 1 in sync mode.
AUS = 1, Address Unshifted.
CSREC = 1, CS Recovery, minimal unused clock cycles on the bus between operations on the same CS pin (back to back access). At least 1 in sync mode.
SP = 0, Supervisor Protect
DSZ = 1, Data Port Size (16 bit on Armadeus)
BCS = 0, Bus Clock Start, for fine shifting of the burst clock
BCD = 0, Bus Clock Divider (zero means don’t divide). Note that for BCM = 1 (BCLK running continuously), BCD only controls the bus signals’ rate, and not the BCLK signal itself. More about this in the “gotcha” section.
WC = 0, Write Continuous
BL = 0, Burst Length. When the internal DMA functional unit accesses the EIM bus, BL must be set to cover the longest burst possibly required (typically 32 bytes), or data is corrupted when the SDMA engine induces a burst longer than BL allows.
CREP = 1, Configuration Register Enable Polarity
CRE = 0, Configuration Register Enable (disabled in my case)
RFL/WFL = 1, Read/Write Fix Latency. Whether WAIT should be ignored (it is for RFL = WFL = 1)
MUM = 1, Multiplexed Mode. Whether data and address are multiplexed on the same lines. True in this case.
SRD/SWD = 1, Synchronous Read/Write Data. Whether bus operations are synchronous. They are, of course.
CSEN = 1, CS Enable. If this isn’t set, attempting to write to the relevant region ends up with a bus error (and an oops in the Linux kernel).

From Chip Select x General Configuration Register 2

DAP = 0, Data Acknowledge polarity, irrelevant in sync mode
DAE = 0, Data Acknowledge Enable, irrelevant in sync mode
DAPS = 0, Data Acknowledge Polling Start, irrelevant in sync mode
ADH = 0, Address Hold Time

From Chip Select x Read/Write Configuration Register 1 and 2

RWSC/WWSC = 1, Read/Write Wait State Control. The number of wait states on a bus transaction, given in BCLK cycles (as opposed to WEIM cycles). Must be at least 1.
RADVA/WADVA = 0, Read/Write ADV Assertion. Tells when ADV is asserted in WEIM cycles. Note that while ADV is asserted, the address is present on the multiplexed address/data lines, no matter what, even at the cost of some or all data not appearing on the bus at all.
RADVN/WADVN = 0, Read/Write ADV Negation. How many extra WEIM cycles ADV stays asserted. According to the formula in the reference manual, ADV is by default asserted for one BCLK’s worth of time, starting as required by RADVA/WADVA, and whatever is given by RADVN/WADVN is added to that.
RAL/WAL = 0, Read/Write ADV Low. When this bit is set, RADVN/WADVN are ignored, and ADV is asserted during the entire bus operation.
RCSA/WCSA = 0, Read/Write CS Assertion. The number of WEIM clocks the CS’s assertion is delayed.
RCSN/WCSN = 0, Read/Write CS Negation. Ignored in sync mode.
RBEA/WBEA = 0, Read/Write BE Assertion. The number of WEIM clocks to delay BE assertion.
RBEN/WBEN = 0, Read/Write BE Negation. Ignored in sync mode.
RBE = 1, Read BE enable
WBED = 0, Write BE disable
OEA = 0, (Read) OE Assertion. How many WEIM clock cycles to delay the OE signal assertion. Note that unlike other delay parameters, OEA is relative to the first data clock cycle, so OE will mean “expecting data on data lines on this clock cycle” for OEA=0. It works this way in multiplexed mode, at least.
OEN = 0, (Read) OE Negation. Ignored in sync mode.
APR = 0, (Read) Asynchronous Page Read. Must be held zero in sync mode.
PAT = 0, (Read) Page Access Time. Ignored when APR=0 (and hence ignored in sync mode)
RL = 0, Read Latency.
WEA = 0, WE Assertion. How many WEIM clock cycles to delay WE assertion
WEN = 0, WE Negation. Ignored in sync mode
WBCDD = 0, Write Burst Clock Divisor Decrement

And finally, from the WEIM Configuration Register

WDOG_LIMIT = 0, Memory Watchdog. Not really necessary in sync mode
WDOG_EN = 0, Memory Watchdog Enable.
INTPOL = 1, Interrupt Polarity
INTEN = 0, Interrupt Enable
BCM = 1, Burst clock mode. When asserted, the BCLK runs continuously, instead of only during bus transactions.
GBCD = 0, General Burst Clock Divisor. Used globally for all CS spaces instead of each space’s BCD when BCM=1. See warning below.
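
The constraints scattered through the lists above (and the RWSC requirement from the WAIT section further down) can be collected into a small sanity check. This is only a sketch: the struct and function names are made up for illustration, the struct holds plain parameter values rather than register bit fields, and it covers just the constraints stated on this page.

```c
#include <stdbool.h>

/* Hypothetical container for a subset of the per-CS parameters listed
 * above. Field names follow the acronyms; this is NOT a register layout. */
struct eim_cs_params {
    unsigned gbc, csrec;   /* gap between CS, CS recovery */
    unsigned rwsc, wwsc;   /* read/write wait state control */
    unsigned rfl, wfl;     /* 1 = fixed latency (WAIT ignored) */
    unsigned srd, swd;     /* 1 = synchronous read/write */
    unsigned apr;          /* asynchronous page read */
};

/* Returns true if none of the constraints noted on this page is violated. */
bool eim_params_ok(const struct eim_cs_params *p)
{
    bool sync = p->srd || p->swd;

    if (sync && (p->gbc < 1 || p->csrec < 1))
        return false;      /* GBC and CSREC: at least 1 in sync mode */
    if (p->rwsc < 1 || p->wwsc < 1)
        return false;      /* wait state counts must be at least 1 */
    if (sync && p->apr != 0)
        return false;      /* APR must be held zero in sync mode */
    if (p->rfl == 0 && p->rwsc < 2)
        return false;      /* using WAIT on reads requires RWSC >= 2 */
    return true;
}
```

With the default values listed above (GBC = CSREC = RWSC = WWSC = RFL = WFL = 1, sync mode, APR = 0) the check passes; clearing RFL without raising RWSC to 2 makes it fail.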

Some gotcha notes

Clock division with BCD and GBCD is a messy issue, and clock division is best avoided when running the clock continuously (BCM=1). The thing is that while GBCD indeed controls the division of the BCLK signal itself, the bus signals are governed by the clock divided by the individual BCD. So if BCD != GBCD for a specific CS region, the bus signals are completely unrelated to BCLK. But even if BCD and GBCD are equal, there is no guaranteed phase relation between them (as has been observed) because they’re generated by two unrelated clock dividers. So the BCLK signal is useless unless BCD = GBCD = 0.
Most delays are given in WEIM clocks, not BCLK clocks. This makes no difference as long as BCLK runs at WEIM rate (BCD=0), but if BCLK is divided, even for the sake of getting clear transitions on an oscilloscope, this needs to be taken into account.
For most parameters, delays in assertions make the signal’s assertion duration shorter. It’s not a time shift, as the deassertion doesn’t move.
All signals, except OE, are asserted on the first clock cycle of the bus access, unless delayed by the parameters above. This includes WE and BE, which one could mistakenly expect to be asserted only when data is available. OE is indeed asserted when data is due to be supplied, and its assertion timing parameter works relative to that clock cycle.
Delaying and/or extending the ADV signal with RADVA/WADVA and RADVN/WADVN on a multiplexed bus causes address to be present during the relevant time periods without changing other timings, with precedence to address. So time slots which would otherwise be used for data transmission are overridden with address on the bus, possibly eliminating data presence on the bus completely. This can be compensated with wait states, but note that wait states count in BCLK cycles, while the ADV adjustments count WEIM cycles.
The OE signal is somewhat useless as an output-enable when BCLK runs at 95 MHz: If used directly to drive tri-state buffers, the round-trip from its assertion to when data is expected is ridiculously short. The data-to-rising clock setup time is 2 ns, according to the datasheet (section 4.6.7.3, table 53, parameter WE18). OE is asserted on the falling edge of the clock just before the rising edge on which the data is sampled, with a delay of up to 1.75 ns (same table, WE10). At a clock cycle of 10.5 ns (95 MHz), the half-clock gap between these two events is 5.25 ns, leaving 5.25 – 2 – 1.75 = 1.5 ns for the bus slave to take control of the bus. Not realistic, to say the least. So the bus slave must deduce from WE whether the bus cycle is a read or a write, and drive the bus according to predefined timing. As for bursts, I’m not clear on whether bursts can be stalled in the middle, and how OE behaves if that is possible. The timing diagram in section 63.8.7 of the Reference Manual does not imply that OE may go high in the middle of a burst. On the other hand, it shows OE going down together with ADV, which surely isn’t the case as I observed (maybe because I ran on a multiplexed bus?).
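
The OE timing budget from the last note can be redone for other clock rates with a one-line calculation. A minimal sketch, working in picoseconds to stay in integer arithmetic; the function name is made up, and the inputs are the datasheet figures quoted above.

```c
#include <stdint.h>

/* Time left for the slave to start driving the bus after OE is asserted,
 * in picoseconds. A negative result means there is no time left at all. */
int32_t oe_turnaround_ps(int32_t bclk_period_ps, int32_t setup_ps,
                         int32_t oe_delay_ps)
{
    /* OE changes on the falling edge and data is sampled on the next
     * rising edge, so only half a BCLK period is available. */
    return bclk_period_ps / 2 - setup_ps - oe_delay_ps;
}
```

For the numbers above, oe_turnaround_ps(10500, 2000, 1750) gives 1500, i.e. the 1.5 ns mentioned in the note.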

Write data cycles

Reminder: This entire post relates to a 16-bit address/data multiplexed EIM bus.

The simplest write data cycle (as defined by the parameter settings above) consists of three BCLK cycles. On the first one, the lower 16 bits of the address are present on the bus, and ADV is held low. On the two following clock cycles, ADV is high and the 32-bit word is transferred over the data lines. CS and WE are held low during the entire cycle (three BCLKs). And no, the upper 16 bits of the address are never presented on the bus.

For BCD=0 (BCLK = WEIM clock), the master toggles its signals on the falling edge of BCLK, and samples signals from the slave on its rising edge. This holds true for all bus signals.

Data is sent in little endian order: A 32-bit word is sent with its lower 16-bit part (bits 15:0) in the first clock cycle, and the higher 16 bits (bits 31:16) in the second cycle. The 16-bit words are sent naturally (that is, each DA[15:0] is a consistent 16-bit word).
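
In code terms, the ordering described above amounts to the following. This is just an illustration of the split; the function name is hypothetical.

```c
#include <stdint.h>

/* Splits a 32-bit word into the two 16-bit values seen on DA[15:0],
 * in the order they appear on the bus. */
void eim_split_word(uint32_t word, uint16_t beats[2])
{
    beats[0] = (uint16_t)(word & 0xffff);  /* first data cycle: bits 15:0 */
    beats[1] = (uint16_t)(word >> 16);     /* second data cycle: bits 31:16 */
}
```

So writing 0x12345678 puts 0x5678 on the bus first, followed by 0x1234.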

With AUS=1 (address unshifted) the address’ lower 16 bits appear naturally on the address cycle. For example, writing to offset 0x30 (to the CS2 address range) sets bits DA[4]=1 and DA[5]=1 (only) in the address clock cycle.

With AUS=0 (address shifted according to port size) the address shown on the bus is shifted by one bit, since there are two bytes in the port size’s width. Hence writing to offset 0x60 sets bits DA[4]=1 and DA[5]=1 (only) in the address clock cycle.
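
The two AUS cases can be summed up in a few lines. A sketch for the 16-bit port discussed here only (hence the shift by one bit); the function name is made up.

```c
#include <stdint.h>

/* Value expected on DA[15:0] during the address cycle, for a byte offset
 * within the CS region, on a 16-bit multiplexed port. */
uint16_t eim_da_address(uint32_t offset, int aus)
{
    /* AUS=0: the address is shifted by one bit (two bytes per bus word) */
    uint32_t shown = aus ? offset : offset >> 1;

    return (uint16_t)(shown & 0xffff);  /* upper address bits never appear */
}
```

Both examples above come out the same: eim_da_address(0x30, 1) and eim_da_address(0x60, 0) yield 0x0030, which is DA[4] and DA[5] set.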

As said above (and verified with scope), WADVA and WADVN don’t just move around the ADV signal’s assertion, but also the times at which the address is given on the bus, possibly overriding time which would otherwise be used to transfer data. It’s the user’s responsibility to make sure (possibly with wait states) that there is enough time for data on the bus.

Wait states, as set by WWSC, extend the first data cycle, so the lower 16 bits of data are held on the bus for a longer time. If WWSC=0, only the upper 16 bits are shown (the first data cycle is skipped), but this is an illegal setting anyhow. Again, note that WWSC counts BCLK cycles, as opposed to almost every other timing parameter.
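
Putting the pieces of a write cycle together, here is a toy model of what appears on DA[15:0] on each BCLK cycle. It assumes WADVA = WADVN = 0 and BCD = 0, and it takes the description above literally: the lower half is held for WWSC cycles. The behavior for WWSC > 1 is my extrapolation from that description, not something the model verifies, and the function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Fills da[] with the value on DA[15:0] for each BCLK cycle of a single
 * 32-bit write. Returns the number of cycles, or 0 if da[] is too small. */
size_t eim_write_trace(uint16_t addr16, uint32_t word, unsigned wwsc,
                       uint16_t *da, size_t max)
{
    size_t n = 0;
    unsigned i;

    if (max < (size_t)wwsc + 2)
        return 0;                             /* not enough room */

    da[n++] = addr16;                         /* address cycle, ADV low */
    for (i = 0; i < wwsc; i++)
        da[n++] = (uint16_t)(word & 0xffff);  /* lower half, held WWSC cycles */
    da[n++] = (uint16_t)(word >> 16);         /* upper half */
    return n;
}
```

With WWSC=1 this yields the three-cycle sequence described at the top of this section. With WWSC=0 the loop body never runs, so only the address and the upper half are produced, matching the skipped first data cycle.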

For BCD=0 (only) the data lines’ levels are held with the last written value until the next bus operation. This feature (which AFAIK is not guaranteed by spec) is used by the FPGA bitstream loader: A word is written, and the FPGA’s clock is toggled afterwards to sample the data which is left unchanged (which is maybe why you can’t load the bitstream file from the on-board flash, as indicated in Armadeus’ wiki page). When BCD>0, the data lines go to zero after the cycle is ended.

Read data cycles

Read data cycles are in essence the same as write cycles, only the bus slave is expected to drive the data lines in the same time slots for which the master drove them on a bus write operation. On read cycles, the master samples the data lines on rising BCLK edges, which is symmetric to the slave sampling the same lines on write cycles. The endianness is the same of course.

OE is asserted on the same WEIM cycle for which ADV is deasserted, which is one BCLK cycle after CS’s assertion with the default parameters given above. WE is not asserted in read cycles, of course.

As mentioned in the “gotcha notes” above, the OE line is pretty useless at high bus rates, as it’s asserted on the falling edge of BCLK coming just before the rising edge on which the data is sampled. This gives far too little time for the slave to respond. So the slave should figure out the correct behavior and timing according to WE and CS.

Using the WAIT signal

In order to use the WAIT signal for adding wait states on the fly, the respective RFL/WFL parameter needs to be zeroed. If RFL=0, also set RWSC=2 (instead of the minimal 1), or four useless and unused wait states will be added to each bus cycle, most likely due to an illegal condition in the master’s bus state machine. This is not necessary for writes (i.e. it’s OK to have WFL=0 and WWSC=1).

WAIT is active low. When the master samples WAIT low (on a rising BCLK edge) it considers the following rising BCLK edge as a wait state. It’s of course legal to assert WAIT for consecutive BCLK cycles to achieve long waits. If WAIT is not asserted, the bus runs according to RWSC/WWSC. Each BCLK cycle is considered independently, so when a 32-bit word is transmitted on two BCLK cycles, wait states can be inserted between 16-bit words, resulting in expected behavior. There is no need to consider the fact that these 16-bit words form a 32-bit word when dealing with wait state behavior.

As one would expect, if WAIT is asserted with respect to a bus cycle that wouldn’t occur anyhow (i.e. the last BCLK cycle in a transmission), it’s ignored.

In read cycles, all this boils down to the following: if the master sampled WAIT low on BCLK rising edge n, no data will be sampled from the data lines on rising edge n+1, and the entire bus operation is extended by another BCLK cycle. RWSC must be set to a minimum of 2, since WAIT is ignored on the first bus cycle (on which ADV is asserted), so the first chance to request a wait state is on the second cycle, which must be a wait state anyhow. If RWSC=1 and RFL=0, the master will insert this wait state anyhow, but misbehave as mentioned above. Counterintuitive as it may be, the master may very well sample data from the bus on a BCLK rising edge for which WAIT is asserted. This makes the following bus cycle a wait state, as one can deduce from the mechanism, but it may feel unnatural that an asserted WAIT and valid data are sampled on the same BCLK edge.
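
The read-side rule can be expressed as a tiny simulation. This is a simplified model, not the master’s actual state machine: it covers only the rising edges of the data phase (the address cycle and the RWSC wait states are assumed to have passed), and the function name and array layout are made up for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* wait_low[i]: WAIT was sampled low (asserted) on rising edge i.
 * da[i]:       what the slave drives on DA[15:0] at rising edge i.
 * Collects 'words' 16-bit values into out[]. Returns the number of edges
 * consumed, or 0 if n_edges ran out before all words were collected. */
size_t eim_read_with_wait(const bool *wait_low, const uint16_t *da,
                          size_t n_edges, uint16_t *out, size_t words)
{
    bool skip = false;   /* true when the current edge is a wait state */
    size_t got = 0, i;

    for (i = 0; i < n_edges && got < words; i++) {
        if (!skip)
            out[got++] = da[i];  /* data is taken even if WAIT is low now */
        skip = wait_low[i];      /* WAIT low on edge i stalls edge i+1 */
    }
    return got == words ? i : 0;
}
```

Note how data and an asserted WAIT can coincide on the same edge, and how a WAIT asserted on the edge that captures the last word has no effect, since the loop has nothing left to stall.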

For write cycles, if the master samples WAIT asserted on a rising edge of BCLK, it will behave as usual on the falling BCLK edge immediately following it, but will not update the data lines on the falling BCLK edge afterwards (and hold the internal state machine accordingly). This follows the overall scheme of wait states described above. Unlike read bus operations, this holds true for the ADV cycle as well, so it’s possible to get wait states on the first data transaction by asserting WAIT on the first BCLK cycle. For WWSC=1, this means in practice having WAIT asserted while there is no bus activity, because there’s only half a clock cycle between the assertion of CS, ADV and other bus signals, and the sampling of WAIT in this case. In order to give the slave time to assert WAIT depending on the bus operation’s nature, WWSC has to be increased to 2 at least.

Bus frequency

The EIM bus frequency is derived by default from PLL2 (at least on Armadeus), which is a 665 MHz clock, divided according to the emi_slow_podf field in the CBCDR register (see 7.3.3.6 in the reference manual). On the Armadeus platform, this field is set to 6 by default, so the clock is divided by 7, yielding a bus clock of 95 MHz. To change it, the following code snippet applies:

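#define MXC_CCM_CBCDR 0x14  /* CBCDR offset from the CCM base address */
u32 temp_clk;
const u32 emi_slow_podf = 7;  /* divide by emi_slow_podf + 1 = 8 */

/* Read-modify-write: replace only bits 24:22 (the emi_slow_podf field) */
temp_clk = __raw_readl(MX51_IO_ADDRESS(MX51_CCM_BASE_ADDR) + MXC_CCM_CBCDR);

__raw_writel((temp_clk & ~0x1c00000) | (emi_slow_podf << 22),
             MX51_IO_ADDRESS(MX51_CCM_BASE_ADDR) + MXC_CCM_CBCDR);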

The above code reduces the EMI clock to 83.125 MHz (divide by 8).

Note that there’s a little remark in section 4.6.7.3 of the datasheet (Table 53, WEIM Bus Timing Parameters), in footnote 4, saying “The lower 16 bits of the WEIM bus are limited to 90 MHz”. Indeed, running at 95 MHz has proven to cause rare bus malfunctions (after half a gigabyte of data or so, probably due to some local heating), taking the form of sporadically missed bus cycles, which result in bogus data being injected or data being lost.
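
The divider arithmetic is simple enough to check in a few lines. A sketch assuming the 665 MHz PLL2 rate and the 3-bit emi_slow_podf field discussed above; the function and macro names are mine, not from any kernel header.

```c
#include <stdint.h>

#define PLL2_KHZ      665000u  /* PLL2 rate as quoted above */
#define EMI_PODF_MAX  7u       /* 3-bit field (bits 24:22 of CBCDR) */

/* Bus clock in kHz for a given emi_slow_podf value (divides by podf + 1) */
uint32_t emi_clk_khz(uint32_t podf)
{
    return PLL2_KHZ / (podf + 1);
}

/* Smallest podf (fastest clock) that stays at or below limit_khz,
 * or -1 if even the largest divider is too fast. */
int emi_podf_for_limit(uint32_t limit_khz)
{
    uint32_t podf;

    for (podf = 0; podf <= EMI_PODF_MAX; podf++)
        if (emi_clk_khz(podf) <= limit_khz)
            return (int)podf;
    return -1;
}
```

emi_clk_khz(6) gives 95000 (the Armadeus default), and emi_podf_for_limit(90000) returns 7, i.e. the 83.125 MHz setting used in the snippet above is the fastest one that respects the 90 MHz limit.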

Reader Comments

I’m trying to access an FPGA device from an i.MX51 processor too, but can’t get it to work. I also tried using the same register settings as you mention. A difference I noticed was that in the waveforms you show (next page), if I turn off the BCM bit in WEIM Config Register, my BCLK turns off completely- even when the chip select goes low- no BCLK.
Any suggestions you could give would be helpful. Also, would it be possible to provide the boot program source code that you are using, perhaps I’m not initializing some clocks right?

Written By kadamaje on November 11th, 2011 @ 00:54

I rely on Linux’ initialization of the system, so I haven’t gone down to the details about how the clocks are set up. Sorry.

Written By eli on November 11th, 2011 @ 00:58

Hi. I am using imx51 processor in linux. When accessing WEIM with sync mode,32 bit,Chip select1 I am getting both data and address phase for every write but i am not getting data and address phase in read.
I set the CS1GCR1, CS1GCR2, CS1RCR1, CS1RCR2, CS1WCR1, CS1WCR2 registers.

My read register value is CS1RCR1- 0x0a010000
CS1RCR2 – 0x0
Did I miss any settings for read register.?

Written By Soya on December 28th, 2011 @ 04:58

Hi, I am using IMX53 EIM-interface to communicate with a uart-chip.
I configure it to work in asynchronous mode: 8-bit data residing on DA[23-16], and CS0 CS1 RW OE signals are configured.
The problem is CS RW OE keep high when I read or write to the address space.
I have configured the EIM-reg and IOMUX_GPR1-reg. Is there anything I missed?
Thank you.

Written By Zhenxin Zhang on July 3rd, 2012 @ 05:05

Hi,
I am using i.Mx6 EIM-interface to communicate with ALtera FPGA, I configured it to work in asynchronous multiplexed mode, but CS0 is always running high, i am not able to toggle it, can you please let me how can i toggle the CS0.

Written By ejacklin on March 19th, 2015 @ 11:56

Hi,
I am using i.mx51 board. I am trying to get the 32bit mux data CS0. (MUM=1,DSZ=011)

I am facing a problem, whenever i am trying to read/write data. i am able to read and write 16 bit of mux data, i am unable to get full 32bit data. If i change to normal mode means then i am getting upper 16bit data.

What changes have to made to get the entire 32bit data.

Written By ARUN on March 30th, 2016 @ 20:04


