
Hi everyone,

I am using Embedded Wizard 11 on an STM32F769 (Keil RTX, Keil USB middleware).
The system uses external SDRAM (IS42S32800J) for framebuffers, EW memory pool, and screenshots.
My setup and findings so far:

  • SDRAM is partitioned in the linker/MPU into:

    • .gui section: used for the EW memory pool and framebuffers/double framebuffers.
      The memory pool part is cacheable, but the framebuffer/double framebuffer region is marked non-cacheable (and both are non-bufferable).

    • .screenshot section: exclusive SDRAM region (non-cacheable, non-bufferable) for screenshot copying/export.

    • All other SDRAM sections: cacheable, non-bufferable.

  • D-Cache is enabled for GUI performance.

  • Screenshot export flow:

    • Framebuffer is copied into a screenshot buffer (SDRAM, non-cacheable).

    • Then, the screenshot is transferred piecewise (e.g., 512 bytes at a time) via a local buffer in internal SRAM to the USB stick (using fwrite).

    • So: SDRAM (non-cacheable) → SRAM (local buffer) → USB stick.

  • Offscreen is currently disabled for debugging.

  • All memory locations (framebuffers, screenshot buffer, local SRAM buffer) have been verified by address and mapfile.
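The export path described above (SDRAM → SRAM → USB) amounts to a chunked copy loop. Below is a minimal sketch, assuming a plain stdio-style fwrite as used with the Keil file system; the function and buffer names are hypothetical and not taken from the actual project:

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Chunk size as described above (512 bytes per write). */
#define CHUNK_SIZE 512u

/* Hypothetical helper: copies 'size' bytes from the (non-cacheable) SDRAM
 * screenshot buffer into a small local buffer one chunk at a time and writes
 * each chunk with fwrite. Returns the number of bytes actually written. */
size_t export_screenshot(const uint8_t *sdram_src, size_t size, FILE *out)
{
  static uint8_t local_buf[CHUNK_SIZE]; /* would be placed in internal SRAM on the target */
  size_t done = 0;

  while (done < size) {
    size_t n = size - done;
    if (n > CHUNK_SIZE)
      n = CHUNK_SIZE;
    memcpy(local_buf, sdram_src + done, n); /* SDRAM -> SRAM */
    if (fwrite(local_buf, 1, n, out) != n)  /* SRAM -> USB stick */
      break;                                /* stop on a short write */
    done += n;
  }
  return done;
}
```

Each iteration touches SDRAM only for the memcpy, so the SDRAM traffic caused by the export itself is limited to one chunk at a time.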


The problem:

  • While the screenshot is being written to USB, visible artifacts or flicker appear on the display output.

  • If I globally disable D-Cache, these artifacts disappear entirely (but GUI performance drops significantly).

  • The screenshots themselves are always correct; the artifacts appear only on the visible display while the USB write or copying is ongoing.

  • The artifacts persist even if USB writes are throttled to very small blocks with delays.


My main question:

  • Could this be caused by the EW memory pool being cacheable (even when all framebuffers and screenshot buffers are non-cacheable)?

  • Why does EW require its memory pool to be cacheable? (If I map the entire SDRAM as non-cacheable, the GUI crashes immediately at startup.)

  • Are there EW settings, APIs or best practices to safely pause rendering, or to improve memory/cache handling for mixed regions (cacheable pool, non-cacheable framebuffer/screenshot) with D-Cache enabled?

  • Any additional advice for reliably preventing visible artifacts during USB/DMA2D/SDRAM bus contention?


Thank you for any help or best-practice guidance!

Hello,

thanks a lot for the detailed description. Nevertheless, let me ask for some more details:

First of all, what do you mean by "visible artifacts or flicker"? Is it

a) a permanently destroyed GUI which persists as long as no new GUI content is drawn, or is it

b) a line-wise shift of single lines or of a complete display area for a single frame? Typically, the lines are shifted to the right.

I assume it is the second case, which indicates a memory bandwidth issue.

Some more questions:

- What color format are you using?

- How is your SDRAM connected to the MCU (16bit / 32bit)?

Best regards,

Manfred.

Hello,

thank you for your detailed questions and your analysis.

Regarding your question about the "visible artifacts or flicker":
It is not a permanent destruction of the GUI (so, not case a). Instead, we observe effects similar to vsync problems: certain areas of the screen appear shifted or "scrambled", usually horizontally (lines or blocks shifted to the right). These artifacts are not static—the affected areas and the nature of the distortions can change from frame to frame and may persist for several consecutive frames before the display content updates correctly.

To answer your specific questions:

  • Color format: We are using RGBA8888.

  • SDRAM interface: The SDRAM is connected to the MCU via a 32-bit bus.

I initially suspected a memory bandwidth issue. However, what puzzles me is that if I disable the D-Cache, these artifacts disappear completely. With D-Cache enabled, the glitches occur regularly. That’s why I’m not sure whether this is a pure bandwidth problem, or if it could be a cache coherency issue between the CPU and DMA/LTDC—especially since we access SDRAM in small blocks for operations like saving screenshots.

Do you have any further suggestions or thoughts?


Hello,

based on your detailed description and our internal analysis, here's the explanation for the flickering issue:

Why Disabling D-Cache Reduces Flickering

We assume that when D-Cache is disabled, all CPU memory accesses go directly to SDRAM, which actually reduces the peak memory bandwidth demand. With D-Cache enabled, cache line fills and writebacks likely create burst memory traffic that can temporarily saturate the memory bus. During your screenshot operations, these cache-related memory bursts compete with the LTDC's continuous pixel data fetching, causing the display controller to miss its real-time requirements and resulting in the horizontal shifts/scrambling you observe.

The issue isn't cache coherency per se, but rather memory bandwidth contention between multiple bus masters (CPU with cache activity, DMA2D, LTDC) all competing for SDRAM access simultaneously.

Recommendation Based on Our STM32F769-Discovery Observations

Based on our analysis with the STM32F769-Discovery board, the default display configuration uses a frame rate of 65 Hz with a pixel clock of 26 MHz. We found that reducing the display frame rate from 65 Hz to 60 Hz and correspondingly lowering the pixel clock to approximately 23 MHz eliminated the flickering during intensive memory operations.

We recommend checking if your display configuration uses similar parameters and testing with reduced frame rate and pixel clock settings. This approach reduces the LTDC's memory bandwidth requirements and provides more bandwidth headroom for other bus masters during operations like screenshot exports, while maintaining your D-Cache performance benefits.
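To put these numbers in perspective, here is a rough estimate of the LTDC fetch demand. It is an illustration only, assuming 4 bytes per pixel (RGBA8888) and the 800x480 resolution mentioned later in the thread; blanking intervals and the LTDC FIFO burst behavior are ignored, and the function names are hypothetical:

```c
#include <stdint.h>

/* Peak fetch rate while a line is scanned out: one pixel per pixel clock. */
uint64_t ltdc_peak_bytes_per_s(uint64_t pixel_clock_hz, uint32_t bytes_per_pixel)
{
  return pixel_clock_hz * bytes_per_pixel;
}

/* Average fetch rate over a whole frame: every visible pixel is read once per refresh. */
uint64_t ltdc_avg_bytes_per_s(uint32_t width, uint32_t height,
                              uint32_t bytes_per_pixel, uint32_t refresh_hz)
{
  return (uint64_t)width * height * bytes_per_pixel * refresh_hz;
}
```

With these formulas, 26 MHz gives about 104 MB/s while a line is active and 23 MHz about 92 MB/s. Averaged over the frame, 800x480 RGBA8888 at 65 Hz corresponds to 99.84 MB/s and at 60 Hz to 92.16 MB/s, which is the headroom the lower settings give back to the CPU and DMA2D.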

Additional Recommendation to Avoid Concurrent Access of CPU and DMA2D

There is one function in ew_bsp_graphics.c that can be used to avoid parallel activities of DMA2D and CPU:

/*******************************************************************************
* FUNCTION:
*   EwBspGraphicsConcurrentOperation
*
* DESCRIPTION:
*   The function EwBspGraphicsConcurrentOperation configures the operation mode
*   of DMA2D and CPU. If concurrent operation is enabled, the CPU will work in
*   parallel while the DMA2D is transferring data. If concurrent operation is
*   disabled, the CPU will wait every time the DMA2D is active.
*   This feature is intended to limit the memory bandwidth, e.g. during display
*   update or other bandwidth consuming activities.
*
* ARGUMENTS:
*   aEnable - flag to switch on/off concurrent operation mode.
*
* RETURN VALUE:
*   None
*
*******************************************************************************/
void EwBspGraphicsConcurrentOperation( int aEnable );

This function can help to avoid the memory bandwidth issue in certain situations, at the cost of lower graphics performance. It can be used to prevent parallel DMA2D and CPU activity while the screenshot is exported.

Usage recommendation: Call EwBspGraphicsConcurrentOperation(0) before starting the screenshot export operation and EwBspGraphicsConcurrentOperation(1) after completion to temporarily reduce memory bandwidth contention during critical operations.
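The suggested bracketing can be sketched as follows. The callback type and the wrapper name are hypothetical, and the stub of EwBspGraphicsConcurrentOperation exists only so that the sketch runs on a host; on the target, the real function from ew_bsp_graphics.c is used instead:

```c
#include <stddef.h>

/* Host-side stand-in for the BSP function: it only records the requested mode.
 * On the target, remove this stub and link against ew_bsp_graphics.c. */
static int concurrent_mode = 1;
void EwBspGraphicsConcurrentOperation(int aEnable) { concurrent_mode = aEnable; }

/* Hypothetical export routine supplied by the application. */
typedef int (*export_fn)(void *ctx);

/* Runs the export with CPU/DMA2D concurrency disabled and re-enables it afterwards. */
int export_with_serialized_dma2d(export_fn do_export, void *ctx)
{
  int result;
  EwBspGraphicsConcurrentOperation(0); /* CPU waits whenever the DMA2D is active */
  result = do_export(ctx);
  EwBspGraphicsConcurrentOperation(1); /* restore normal parallel operation */
  return result;
}
```

Routing the export through a single wrapper makes it harder to forget the re-enable call, since the export routine's return value is simply passed through.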

Best regards,

Manfred.


Hi Manfred,

Thanks for your response. Lowering the PixelClock did help somewhat, but unfortunately it didn’t fully solve the problem.

Disabling concurrent graphics operations with EwBspGraphicsConcurrentOperation(0) also didn’t make a difference—in fact, it actually seemed to make things a bit worse. I also tried marking all other SDRAM areas as non-cacheable, but that didn’t help either. The only area where this isn’t possible is the memory pool for Embedded Wizard. Do you know why that is? Setting it as non-cacheable causes the system to crash immediately when creating the root object.

Since the DCache issue doesn’t seem to be fully controllable, and the visual artifacts are unacceptable, I think I’ll have to disable the D-Cache entirely.

You mentioned that the memory pool is configured as cacheable and non-bufferable, which means that it uses a write-through cache configuration. If you are using a write-back cache configuration, it is essential to take care of proper cache clean and cache invalidate operations. With version 11 we introduced this support: CPU data cache clean and invalidate operations were added in order to support mixed drawing operations by DMA2D and CPU (especially necessary for a data cache with write-back configuration). See the ewextgfx.c file.
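By-address cache maintenance on the Cortex-M7 works on whole 32-byte cache lines, so a buffer range is usually widened to line boundaries before cleaning or invalidating it. The helper below is a hypothetical illustration of that step only; it is not the code from ewextgfx.c:

```c
#include <stdint.h>

/* Data cache line size of the Cortex-M7 in the STM32F7. */
#define DCACHE_LINE 32u

typedef struct {
  uintptr_t addr; /* first byte of the first affected cache line */
  int32_t size;   /* length in bytes, a multiple of DCACHE_LINE */
} cache_range;

/* Widens [addr, addr + size) so that partially covered lines at both ends are included. */
cache_range dcache_align(uintptr_t addr, uint32_t size)
{
  uintptr_t mask = (uintptr_t)(DCACHE_LINE - 1u);
  uintptr_t start = addr & ~mask;
  uintptr_t end = (addr + size + mask) & ~mask;
  cache_range r = { start, (int32_t)(end - start) };
  return r;
}
```

On the target, the result would be handed to the CMSIS functions, e.g. SCB_CleanDCache_by_Addr((uint32_t *)r.addr, r.size) before the DMA2D reads data written by the CPU, and SCB_InvalidateDCache_by_Addr after the DMA2D has written data the CPU will read. Some CMSIS versions already account for a misaligned start internally. Note that invalidating a widened range also discards cached data of neighboring variables that share those lines.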

In principle, it should be possible to set the memory pool to uncached, which I do not recommend. Moreover, disabling the D-Cache is not the best solution from a performance perspective.

Are you using a DMA for the screenshot functionality?

Btw: What is the display resolution?

Can you share your MPU settings? Maybe this points us in the right direction...

Yes, I have verified that the cache handling in ewextgfx.c is present exactly as in your templates – the code is directly derived from your examples.


MPU configuration for SDRAM:

 /* Configure the MPU attributes for SDRAM_Banks area to strongly ordered
     This setting is essentially needed to avoid MCU blockings!
     See also STM Application Note AN4861 */
  MPU_InitStruct.Enable           = MPU_REGION_ENABLE;
  MPU_InitStruct.Number           = MPU_REGION_NUMBER4;
  MPU_InitStruct.BaseAddress      = 0xC0000000;
  MPU_InitStruct.Size             = MPU_REGION_SIZE_512MB;
  MPU_InitStruct.SubRegionDisable = 0x0;
  MPU_InitStruct.TypeExtField     = MPU_TEX_LEVEL0;
  MPU_InitStruct.AccessPermission = MPU_REGION_NO_ACCESS;
  MPU_InitStruct.DisableExec      = MPU_INSTRUCTION_ACCESS_DISABLE;
  MPU_InitStruct.IsShareable      = MPU_ACCESS_SHAREABLE;
  MPU_InitStruct.IsCacheable      = MPU_ACCESS_NOT_CACHEABLE;
  MPU_InitStruct.IsBufferable     = MPU_ACCESS_NOT_BUFFERABLE;
  HAL_MPU_ConfigRegion(&MPU_InitStruct);

  /* Configure the MPU attributes for SDRAM 32MB to normal memory Cacheable */
  MPU_InitStruct.Enable           = MPU_REGION_ENABLE;
  MPU_InitStruct.Number           = MPU_REGION_NUMBER5;
  MPU_InitStruct.BaseAddress      = 0xC0000000;
  MPU_InitStruct.Size             = MPU_REGION_SIZE_32MB;
  MPU_InitStruct.SubRegionDisable = 0x0;
  MPU_InitStruct.TypeExtField     = MPU_TEX_LEVEL0;
  MPU_InitStruct.AccessPermission = MPU_REGION_FULL_ACCESS;
  MPU_InitStruct.DisableExec      = MPU_INSTRUCTION_ACCESS_ENABLE;
  MPU_InitStruct.IsShareable      = MPU_ACCESS_SHAREABLE;
  MPU_InitStruct.IsCacheable      = MPU_ACCESS_CACHEABLE;
  MPU_InitStruct.IsBufferable     = MPU_ACCESS_NOT_BUFFERABLE;
  HAL_MPU_ConfigRegion(&MPU_InitStruct);

  /* Framebuffer + DoubleFramebuffer */
  MPU_InitStruct.Enable           = MPU_REGION_ENABLE;
  MPU_InitStruct.Number           = MPU_REGION_NUMBER6;
  MPU_InitStruct.BaseAddress      = 0xC0000000;
  MPU_InitStruct.Size             = MPU_REGION_SIZE_4MB;
  MPU_InitStruct.AccessPermission = MPU_REGION_FULL_ACCESS;
  MPU_InitStruct.IsCacheable      = MPU_ACCESS_NOT_CACHEABLE;
  MPU_InitStruct.IsBufferable     = MPU_ACCESS_NOT_BUFFERABLE;
  MPU_InitStruct.IsShareable      = MPU_ACCESS_SHAREABLE;
  HAL_MPU_ConfigRegion(&MPU_InitStruct);

  /* Screenshotbuffer */
  MPU_InitStruct.Enable           = MPU_REGION_ENABLE;
  MPU_InitStruct.Number           = MPU_REGION_NUMBER7;
  MPU_InitStruct.BaseAddress      = 0xC1E00000;
  MPU_InitStruct.Size             = MPU_REGION_SIZE_2MB;
  MPU_InitStruct.AccessPermission = MPU_REGION_FULL_ACCESS;
  MPU_InitStruct.IsCacheable      = MPU_ACCESS_NOT_CACHEABLE;
  MPU_InitStruct.IsBufferable     = MPU_ACCESS_NOT_BUFFERABLE;
  MPU_InitStruct.IsShareable      = MPU_ACCESS_SHAREABLE;
  HAL_MPU_ConfigRegion(&MPU_InitStruct);
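Several of these regions overlap at 0xC0000000. On ARMv7-M, overlapping regions are resolved in favor of the highest-numbered region that contains the address. A host-side lookup makes this explicit; it is an illustration only, the type and function names are hypothetical, and SubRegionDisable is ignored:

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
  int number;       /* MPU_REGION_NUMBERx */
  uint32_t base;    /* BaseAddress */
  uint64_t size;    /* region size in bytes */
  const char *what; /* short description */
} mpu_region;

/* The four SDRAM-related regions from the configuration above. */
static const mpu_region sdram_regions[] = {
  { 4, 0xC0000000u, 0x20000000u, "512MB strongly ordered, no access" },
  { 5, 0xC0000000u, 0x02000000u, "32MB cacheable, shareable" },
  { 6, 0xC0000000u, 0x00400000u, "4MB framebuffers, non-cacheable" },
  { 7, 0xC1E00000u, 0x00200000u, "2MB screenshot buffer, non-cacheable" },
};

/* Returns the region whose attributes apply to addr, or NULL if none matches. */
const mpu_region *mpu_lookup(const mpu_region *r, size_t n, uint32_t addr)
{
  const mpu_region *hit = NULL;
  size_t i;
  for (i = 0; i < n; i++) {
    if (addr >= r[i].base && (uint64_t)addr - r[i].base < r[i].size &&
        (hit == NULL || r[i].number > hit->number))
      hit = &r[i]; /* a higher region number takes priority */
  }
  return hit;
}
```

With this layout, both framebuffers resolve to region 6, the screenshot buffer to region 7, and the EW memory pool at 0xC0400000 to region 5, i.e. cacheable but shareable.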


SDRAM layout:

  • 16MB for GUI (framebuffer)

  • 16MB for device data & screenshot buffer


Addresses and sizes in ewconfig.h:
SDRAM_BASE_ADDR = 0xC0000000
SDRAM_SIZE_BYTES = 0x01000000
FRAME_BUFFER_WIDTH = 800
FRAME_BUFFER_HEIGHT = 480
FRAME_BUFFER_ADDR = SDRAM_BASE_ADDR
FRAME_BUFFER_SIZE = FRAME_BUFFER_WIDTH * FRAME_BUFFER_HEIGHT * FRAME_BUFFER_DEPTH
DOUBLE_BUFFER_ADDR = 0xC0177000
DOUBLE_BUFFER_SIZE = FRAME_BUFFER_SIZE
EW_MEMORY_POOL_ADDR = FRAME_BUFFER_ADDR + 0x00400000
EW_MEMORY_POOL_SIZE = SDRAM_SIZE_BYTES - 0x00400000
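The layout arithmetic can be checked at compile time. Here the values are written as C macros for illustration, assuming FRAME_BUFFER_DEPTH is 4 (RGBA8888); the real ewconfig.h may be structured differently:

```c
#include <stdint.h>

#define SDRAM_BASE_ADDR     0xC0000000u
#define SDRAM_SIZE_BYTES    0x01000000u
#define FRAME_BUFFER_WIDTH  800u
#define FRAME_BUFFER_HEIGHT 480u
#define FRAME_BUFFER_DEPTH  4u /* assumption: 4 bytes per pixel for RGBA8888 */
#define FRAME_BUFFER_ADDR   SDRAM_BASE_ADDR
#define FRAME_BUFFER_SIZE   (FRAME_BUFFER_WIDTH * FRAME_BUFFER_HEIGHT * FRAME_BUFFER_DEPTH)
#define DOUBLE_BUFFER_ADDR  0xC0177000u
#define DOUBLE_BUFFER_SIZE  FRAME_BUFFER_SIZE
#define EW_MEMORY_POOL_ADDR (FRAME_BUFFER_ADDR + 0x00400000u)
#define EW_MEMORY_POOL_SIZE (SDRAM_SIZE_BYTES - 0x00400000u)

/* C11 compile-time checks of the layout. */
_Static_assert(FRAME_BUFFER_SIZE == 0x177000u, "800 x 480 x 4 = 1,536,000 bytes");
_Static_assert(DOUBLE_BUFFER_ADDR == FRAME_BUFFER_ADDR + FRAME_BUFFER_SIZE,
               "double buffer starts right after the framebuffer");
_Static_assert(DOUBLE_BUFFER_ADDR + DOUBLE_BUFFER_SIZE <= EW_MEMORY_POOL_ADDR,
               "both buffers lie below the pool, i.e. inside the 4MB MPU region");
_Static_assert(EW_MEMORY_POOL_ADDR + EW_MEMORY_POOL_SIZE == SDRAM_BASE_ADDR + SDRAM_SIZE_BYTES,
               "the pool ends exactly at the end of the 16MB GUI area");
```

So both buffers end at 0xC02EE000, inside the 4MB non-cacheable region, and the memory pool fills the remaining 12MB up to 0xC1000000.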


For MPU reasons, framebuffer and double framebuffer are placed directly next to each other, even if this wastes some memory. Otherwise, the MPU config did not work reliably.

STM recommends that the framebuffer region be marked as non-cacheable, so I configured it accordingly.

No DMA access from device side to SDRAM is used; all access is CPU-controlled.


Thanks. The following things come to mind:

1.) According to AN4861, the framebuffers should be located in two different SDRAM banks (see the chapter "Optimizing the LTDC framebuffer fetching from SDRAM"). This prevents the LTDC and the CPU/DMA2D from fetching data from the same SDRAM bank. Therefore, we locate the double buffer at the end of the SDRAM area.

2.) Configure the MPU as strongly ordered for undefined regions. This setting is essential to avoid speculative accesses to undefined address areas. It does not affect performance but may prevent system hang-ups.
  MPU_InitStruct.Enable           = MPU_REGION_ENABLE;
  MPU_InitStruct.Number           = MPU_REGION_NUMBER0;
  MPU_InitStruct.BaseAddress      = 0x00;
  MPU_InitStruct.Size             = MPU_REGION_SIZE_4GB;
  MPU_InitStruct.SubRegionDisable = 0x87;
  MPU_InitStruct.AccessPermission = MPU_REGION_NO_ACCESS;
  MPU_InitStruct.DisableExec      = MPU_INSTRUCTION_ACCESS_DISABLE;
  MPU_InitStruct.IsBufferable     = MPU_ACCESS_NOT_BUFFERABLE;
  MPU_InitStruct.IsCacheable      = MPU_ACCESS_NOT_CACHEABLE;
  MPU_InitStruct.IsShareable      = MPU_ACCESS_SHAREABLE;
  MPU_InitStruct.TypeExtField     = MPU_TEX_LEVEL0;
  HAL_MPU_ConfigRegion(&MPU_InitStruct);

3.) The SDRAM for the memory pool and framebuffers can be configured as follows:

  /* Configure the MPU attributes for the SDRAM (32MB region) as normal memory, cacheable */
  MPU_InitStruct.Enable           = MPU_REGION_ENABLE;
  MPU_InitStruct.Number           = MPU_REGION_NUMBER4;
  MPU_InitStruct.BaseAddress      = 0xC0000000;
  MPU_InitStruct.Size             = MPU_REGION_SIZE_32MB;
  MPU_InitStruct.SubRegionDisable = 0x0;
  MPU_InitStruct.AccessPermission = MPU_REGION_FULL_ACCESS;
  MPU_InitStruct.DisableExec      = MPU_INSTRUCTION_ACCESS_ENABLE;
  MPU_InitStruct.IsBufferable     = MPU_ACCESS_NOT_BUFFERABLE;
  MPU_InitStruct.IsCacheable      = MPU_ACCESS_CACHEABLE;
  MPU_InitStruct.IsShareable      = MPU_ACCESS_NOT_SHAREABLE;
  MPU_InitStruct.TypeExtField     = MPU_TEX_LEVEL0;
  HAL_MPU_ConfigRegion(&MPU_InitStruct);

Your MPU setting for the SDRAM (memory pool and framebuffer) is MPU_ACCESS_SHAREABLE (!!!), which de facto means that it is uncached. According to ST, when a region is used as a buffer for data exchange (USB, Ethernet, etc.), it should be declared as shareable in the MPU. This automatically disables the cache for this region and avoids cache maintenance operations.
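Points 2 and 3 hinge on two MPU details: which addresses the SubRegionDisable mask 0x87 actually covers, and whether a cacheable but shareable region is cached at all. A small host-side decoder illustrates both. The function names are hypothetical, only the encodings used in this thread are handled, and the cacheability rule assumes the Cortex-M7 default CACR.SIWT = 0:

```c
#include <stdint.h>

/* A region (256 bytes or larger) is split into 8 equal subregions; bit i of
 * SubRegionDisable set means subregion i is excluded. 'size' is 64-bit so 4GB fits. */
int mpu_region_covers(uint32_t base, uint64_t size, uint8_t srd, uint32_t addr)
{
  uint64_t off;
  if (addr < base)
    return 0;
  off = (uint64_t)addr - base;
  if (off >= size)
    return 0;
  return !((srd >> (off / (size / 8u))) & 1u);
}

/* Is the region actually cached by the data cache? With CACR.SIWT = 0, the
 * Cortex-M7 treats shareable normal memory as non-cacheable. */
int mpu_dcache_used(unsigned tex, unsigned c, unsigned b, int shareable)
{
  /* TEX=0, C=1: write-through (B=0) or write-back (B=1);
   * TEX=1, C=1, B=1: write-back with read and write allocate. */
  int cacheable = (tex == 0u && c == 1u) || (tex == 1u && c == 1u && b == 1u);
  return cacheable && !shareable;
}
```

With 0x87, subregions 0, 1, 2 and 7 are excluded, so the background region covers only 0x60000000 to 0xDFFFFFFF, which includes the SDRAM at 0xC0000000. A region with TEX=0, C=1, B=0 is reported as cached only when it is not shareable, which matches the explanation above.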

Maybe you can try the original settings provided in our Build Environments. And by the way: it may be worth considering an update to the latest version, just to be sure you are up to date.

Best regards,

Manfred.

Manfred, thank you very much for your help. I have updated the MPU configuration based on the latest template and separated the frame and double buffer as you recommended – one at the start and one at the end of the 16MB SDRAM area. Now the system runs without artifacts or display errors, even at a full 26MHz pixel clock.
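For illustration, the start/end placement can be checked with a bit of arithmetic. This assumes a 32MB SDRAM with four internal banks of 8MB each and an FMC address mapping that places the bank bits above the row and column bits (please verify against the STM32F7 reference manual and the IS42S32800J datasheet); the helper names are hypothetical:

```c
#include <stdint.h>

/* Assumption: each of the four internal SDRAM banks is a contiguous 8MB slice. */
#define SDRAM_BASE      0xC0000000u
#define SDRAM_BANK_SIZE 0x00800000u

/* Internal bank index of an SDRAM address under the assumption above. */
unsigned sdram_bank(uint32_t addr)
{
  return (unsigned)((addr - SDRAM_BASE) / SDRAM_BANK_SIZE);
}

/* Places a buffer so that it ends exactly at the end of the given area. */
uint32_t end_placed_buffer(uint32_t area_base, uint32_t area_size, uint32_t buf_size)
{
  return area_base + area_size - buf_size;
}
```

With an 800x480 RGBA8888 buffer (0x177000 bytes) ending at 0xC1000000, the double buffer starts at 0xC0E89000, which falls into bank 1, while the front buffer at 0xC0000000 is in bank 0. In the previous back-to-back layout, both buffers were in bank 0.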

I was also mistaken about our Embedded Wizard version: we are currently using EW13, not EW11. Apparently, an error slipped in along the way during the upgrade from EW9/10 to EW13.

Thanks again for your fast and professional support!


...