Hello,

We are in the process of implementing a large application on a WVGA display using an STM32F769 MCU. From time to time we have struggled with the CPU load generated by EmWi 11 screens and animations, but so far we have been able to manage it.

Now that we are implementing a USB device in parallel, it turns out that the running EmWi thread makes the USB FS device connection (which is time-sensitive) fail. Further investigation suggests that it is the VSYNC interrupt that consumes much of the CPU time. I should add that we do not use the MIPI display interface of the STM32F769; instead, there is a "traditional" RGB interface. For this, we have adapted the STM32F746 Platform Package to run on the STM32F769. The double frame buffer is located in SDRAM; GUI resources reside in QSPI flash.

Could you give us more insight into how best to configure the VSYNC interrupt and its ISR (including NVIC interrupt priorities, code optimization, etc.) so that it consumes as little CPU time as possible and can coexist with the USB device implementation?

Thanks,
Steffen Schmid

1 Answer

by
Hello Steffen Schmid,

Thanks for the detailed description of the issue. If you are using a 'traditional' RGB interface, the code related to VSYNC is very short. Typically, a new start address is given to the LTDC and the VSYNC interrupt is configured again.

If you have a large display, the memory bandwidth consumed by the LTDC is much higher than with a DSI display in command mode (as used on the STM32F769 Discovery or Eval board). If there are a lot of changes on the screen, the DMA2D and the CPU also require a certain amount of memory bandwidth. From the point of view of the Embedded Wizard software, it is a cooperative system, and you can give any other tasks or interrupts a higher priority.
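
To put rough numbers on the LTDC share, the average frame buffer fetch rate can be estimated from resolution, refresh rate and pixel format. The sketch below is only an illustration: the function names are made up for this example, and the refresh rate, pixel format and peak bus bandwidth are values you have to supply for your own board.

```c
#include <stdint.h>

/* Average bytes per second the LTDC has to fetch to refresh the panel.
 * Only the active area is counted. This is an average over the frame;
 * the rate during an active line is higher. */
uint64_t ltdc_fetch_bytes_per_second(uint32_t width, uint32_t height,
                                     uint32_t refresh_hz,
                                     uint32_t bytes_per_pixel)
{
    return (uint64_t)width * height * refresh_hz * bytes_per_pixel;
}

/* Share of a given peak memory bandwidth in whole percent (rounded down).
 * peak_bytes_per_s is whatever your memory interface can sustain; it is
 * an input here, not a value taken from a data sheet. */
uint32_t bandwidth_share_percent(uint64_t used_bytes_per_s,
                                 uint64_t peak_bytes_per_s)
{
    if (peak_bytes_per_s == 0) {
        return 0; /* avoid division by zero */
    }
    return (uint32_t)((used_bytes_per_s * 100u) / peak_bytes_per_s);
}
```

For WVGA (800 x 480) at an assumed 60 Hz, this gives 46,080,000 bytes/s for a 16-bit format and 92,160,000 bytes/s for a 32-bit format, before any DMA2D or CPU accesses are added on top.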

Are you using an RTOS?

Is the USB failure caused by high CPU load, by insufficient memory bandwidth, or by real-time violations?

Concerning the USB software integration and interrupt priorities, I recommend getting in contact with the ST FAEs or the ST forum. We do not have extensive experience in this area.

Best regards,

Manfred Schweyer.
by
Hello Manfred,

Thanks for your swift reply.

Yes, we are using the Keil RTX5 RTOS.

I have now toggled GPIO ports in the VSYNC interrupt handler, and this confirms that the ISR overhead is indeed minimal.

I suspect that the USB failure is caused by real-time violations, since the USB host expects replies in less than 10 ms. I am beginning to think that something is wrong with the interrupt priorities in the NVIC, not directly related to EmWi. We will need to look into this further.

Still, the analysis showed that EmWi can run with far less CPU load than we expected *if* thread priorities and inter-process communication are well planned (which they previously were *not*). So the runtime behavior of our application has improved, even though EmWi is likely not the root cause of the issue.
by
Hello Manfred,

I am afraid we need to look further into this issue.
We did indeed find a glitch with interrupt priorities outside EmWi, and fixing the priorities alleviated the USB problems, but it did not solve them completely.
Short bursts of data can now be sent and received over USB, but exchanging large amounts of data (for example when formatting an MSC drive) still fails. It turns out that not only USB is affected, but also the SD memory card.

Again, it seems related to the VSYNC interrupt...
It is not the VSYNC ISR itself consuming CPU time, but apparently it is something related to switching the buffer. If double buffering is turned off, the issue is gone (obviously there are display artifacts, but no USB/SD problems). Also, if programming the next VSYNC event after switching buffers is suppressed, USB/SD problems are gone.
Interestingly, problems do not occur immediately when VSYNC is enabled, but with a delay of several seconds. Apparently there is some buffering in the interfaces that can survive short periods.
 
What exactly happens when the buffer address of the LTDC is switched? Could there be some prefetch or caching of the frame buffer that causes bus contention in the CPU and stalls the data transfer of other peripherals, like USB or SD card, at the hardware level?
by
Hello Steffen,

I have no additional information about what happens internally when the buffer address of the LTDC changes. From the Embedded Wizard GUI perspective, the address of the pending buffer is given to the LTDC by using HAL_LTDC_ConfigLayer(). Within the HAL layer, only a couple of LTDC registers are accessed. Because the LTDC is permanently reading data from the current framebuffer, I do not see any reason for such blocking. There is no caching: the LTDC has no access to the CPU cache.

Have you checked the errata sheet of the STM32F769? Maybe there is a hint there. Otherwise, try to get some help from an ST FAE or the ST forum.

I'm sorry I can't give better advice here, but USB is too far removed from GUI development thematically.

Best regards,

Manfred.
by
Hello Manfred,

I wouldn't focus too much on USB being affected by this issue. The same issue can be experienced on the SD/MMC interface.
Both issues occur if and only if double buffering is active, which most likely locates the root cause close to EmWi.
Of course, there might be a bug introduced by the modifications to the Platform Package that we had to implement in order for the STM32F769 to support the RGB interface, so it is important to us to understand the issue thoroughly.

I think some blocking of CPU buses may need to be taken into account. There are multiple bus masters, like the core, the DMAs, the memory-mapped QSPI interface, and the DMA2D graphics accelerator, each of them competing for bus access.
These mechanisms can hardly be analyzed by conventional debuggers because they are transparently and invisibly handled by hardware.
We have learnt that the DMA2D accelerator can generate heavy bus load when reading frame buffer contents. Beyond the CPU cache, the DMA2D has a hardware FIFO that prefetches buffer data (if this fails, the infamous "data starvation" issue occurs). We have not found, however, a means to quantify DMA2D bus load. Can this be done by a calculation based on pixel clock?

There are not just interrupt priorities to be set, but there are also DMA priorities.
Which DMA priorities do you recommend when QSPI is memory-mapped, reading graphics resources from serial flash, in parallel with DMA2D and others like the SD/MMC DMA?

Thanks,
Steffen
by
Hello Steffen,

Is the VSYNC interrupt running as an RTOS ISR? Does the behavior change if you modify ew_bsp_display.c so that it works like the bare-metal variant?

Just an idea...

Best regards,

Manfred.
by
Hello Manfred,

The VSYNC ISR is a bare-metal ISR. Its CPU load and processing time have been verified; the processing time is in the 10 us range.

After the new VSYNC line event has been programmed, there is a call to osSemaphoreRelease() though. This serves for EmWi to synchronize its buffer update in order to make sure only the background buffer is written to. The duration mentioned above includes releasing the semaphore.
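
As a sanity check on that measurement, the CPU share of a short periodic ISR is simply its duration multiplied by its rate. A minimal sketch, assuming one VSYNC event per frame; the 60 Hz refresh rate used in the note below is an assumption, and `isr_load_ppm` is a made-up helper name:

```c
#include <stdint.h>

/* CPU time spent in a periodic ISR, in parts per million.
 * Microseconds of ISR time per second of wall time equals ppm of CPU time. */
uint32_t isr_load_ppm(uint32_t isr_duration_us, uint32_t events_per_second)
{
    return isr_duration_us * events_per_second;
}
```

With 10 us per event and an assumed 60 events per second, this yields 600 ppm, i.e. 0.06 % of the CPU.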

Can we be sure that on the other end of the semaphore logic, somewhere in EwUpdate(), there is no blocking wait condition, using up CPU time?

Thanks,
Steffen
by
Hi Steffen,

The entire GUI application runs in the main loop and can be interrupted at any time by tasks with higher priorities or by interrupts. The semaphores are used only for LTDC and DMA2D, in order to give CPU time to other tasks while waiting for completion.

If you are using the bare-metal implementation, the CPU waits for LTDC or DMA2D interrupts, but this happens in the context of the GUI task and can be interrupted accordingly.

Best regards,
Manfred.
