Hello,

One of the screens in the application we're developing has a scrolling waveform, and it takes longer to update than seems reasonable. For background, we're using a Charts::Graph object with a Charts::CoordList, and the list of coordinates is about 35 points long. Each time we get a new reading (currently every 200 msec), we shift the existing points in the coordinate list to the left by about 1/35 of the width of the graph and invalidate the view.

I've done some timing analysis using our 1 msec system tick. What I see is that the main loop in the function ChartsGraph_Draw in Charts.c usually takes less than a tick per iteration, but occasionally takes about 25 ticks. This long delay only seems to happen once per call to ChartsGraph_Draw. Also, from what I can tell, it's only the first call to GraphicsCanvas_WarpBitmap in the loop, not the second, that has the long delay.

To see if some interrupt processing could be causing the delay we added code to disable interrupts before the loop and re-enable them after. This causes the system to hang when the graph moves across about half the screen. Pausing the code in the debugger shows that the system is waiting for a DMA transfer to complete (in EwBspGraphicsWaitForCompletion, called from STM32LockSurface, called from STM32BlendDriver, called from EwExecuteTasks). Alpha blending is turned off for the Graph object.

We are running on an STM32F469 processor, have a 1Mb x 32 SDRAM for display memory for an 800 x 480 display, and are using the STM.STM32.RGBA8888 platform package. Graphics acceleration is turned on and the DMA interrupts are working. It's possible that we've misconfigured something, but everything looks OK.

Do you have any idea why we see the occasional delay in the graph-drawing loop, or have any suggestions as to what changes we can make to speed up the performance?

Thank you very much for your help!

Greg

1 Answer

Hi Greg,

thanks for the detailed description - nevertheless, let me ask for some more details:

- Are you using any images around or behind the chart graph, e.g. a full screen background image?

- Does every chart update take a long time - or does the chart move smoothly but get stuck now and then (e.g. after every 20th data reading)?

- Is the property 'Buffered' set to false?

- Which version of Embedded Wizard are you using?

- Does the example 'WaveformGenerator' run smoothly on your target?

Best regards,

Manfred.
by
Hi Manfred,

Thanks for getting back to me. We have buffering disabled, but I ran a test with buffering enabled and got the same behavior, except that the delay we see is twice as long.

The delay doesn’t happen until the waveform gets to the middle of the graph (it scrolls from right to left), at which point the delay starts to happen. Once the delay starts happening it happens with every update.

We are using version 8.20.

The graph takes up most of the screen it’s on, but we’re not doing anything fancy with the background. The graph does sit inside a Rectangle, but removing it doesn’t change the behavior. Neither does removing the BackColor of the Graph. Here’s the graph definition from the .ewu file:

  object Charts::Graph Graph
  {
    preset Bounds = <0,148,690,460>;
    preset Buffered = false;
    preset Coordinates = GraphCoordList;
    preset LineColor = ProjectResources::Color_Blue;
    preset LineWidth = 3;
    preset DotColor = ProjectResources::Color_Blue;
    preset DotWidth = 1.0;
    preset CoordOrigin = <0,154>;
    preset PixelPerUnit = <1,1>;
    preset BackColor = #222222FF;
    preset HorzGridColor = #646496FF;
    preset VertGridColor = #646496FF;
    preset GridDistance = <100,100>;
    preset LineBitmap = Charts::Line7x100;
  }

We scroll by going through the coordinate list and subtracting a fixed value from each point's X value, and we rely on Embedded Wizard to dispose of values that go off the left side of the graph. I tried a test where my code filters out the non-displayable values and creates a new coordinate list (like your waveform example, which runs fine on our target) and there was no change in the behavior.
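The scroll-and-filter step described above can be sketched in plain C. This is only an illustration of the idea: the `Point` type and the `scroll_points` function are hypothetical names and do not reflect how Charts::CoordList stores or updates its coordinates.

```c
#include <stddef.h>

/* Hypothetical point type; not the Charts::CoordList storage format. */
typedef struct {
    int x;
    int y;
} Point;

/* Shift every point left by dx and drop points whose x falls below min_x.
   The array is compacted in place; the return value is the new point count. */
size_t scroll_points(Point *pts, size_t count, int dx, int min_x)
{
    size_t kept = 0;

    for (size_t i = 0; i < count; i++) {
        Point p = pts[i];
        p.x -= dx;
        if (p.x >= min_x) {
            pts[kept++] = p;   /* still visible, keep it */
        }
    }
    return kept;
}
```

Filtering while shifting keeps the list at or below its maximum length, so the drawing code never sees points that lie outside the visible area.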

Thanks again for your help!
by

Hi Greg,

can you try to figure out the differences between the waveform example and your application?

For example, you can do the following (directly within your application):

  • Disable your data source / read accesses.
  • Create a Charts::CoordList with a maximum number of 35 items. 
  • Create a timer (e.g. with 200 ms) and trigger a slot method where you just add a new coordinate (random x, random y value) to a coordinate list with AddCoord().
  • Assign the CoordList to the Graph in order to make an update.

As a result, you should get a new line segment every time the timer expires. Does this run smoothly - especially when you reduce the timer period?

I just want to make sure that the data acquisition is not the blocking part. Btw: Are you using an operating system?

Another question: What do you mean by "delay" when the "waveform gets to the middle"? What is the resulting framerate? You can find the implementation of a framerate display within the example 'GraphicsAccelerator'.

Best regards,

Manfred.

by
Hi Manfred,

I tried commenting-out communication to our device and using fake data, and it improved things - the delays are smaller, but still present. I had previously gathered timing information on the main loop, and the only things that take more than 1-2 msec when the graph is updating are the UI update and the call to EwReclaimMemory, so I don't think the device communication is using a lot of processor time.

I checked the memory usage in the main loop before and after memory reclamation using calls to EwPrintProfilerStatistic(0) and don't see a big difference in usage before vs. after. The total used is around 906,000 bytes and we've allocated 13 MB to the memory pool (via the tlsf_create_with_pool function call).

To clarify when the delay starts occurring (your question about the "middle"), our graph starts out with no waveform and is updated every 200 msec. So as we get the initial 35 data points the graph grows from the right of the screen to the left of the screen. When it reaches about halfway (after we've graphed approximately 18-19 data points) is when we start to see the multi-msec delays in the ChartsGraph_Draw loop. Before that time the loop executes very quickly.

I can work on creating a minimal application that just draws the graph if you think that might help given what I've found so far. Are there other areas I could investigate? Could memory pool fragmentation be an issue?

Thanks,

Greg
by
Hi Manfred,

I modified your Waveform example to sweep across the screen instead of refreshing the screen each time with a new set of data points (you have to be in the "scatter" mode to see it draw). You can clearly see how quickly graph updates slow down in this mode. I put my instrumentation code into the ChartsGraph_Draw loop and this example shows the same problem I've been seeing. I don't seem to be able to attach a zip file to this comment. Is there a way I can send you the example?

Thanks,

Greg
by
Hi Greg,

thanks for your investigations.

I do not think that memory fragmentation is the cause of such a delay.

I'm very interested in analyzing your modified example - just pack the example (*.ewp, *.ewu and all needed resources) without the generated code and upload it directly here via 'link' (using the "chain" symbol) on the page 'upload'.

Best regards,

Manfred.
by

https://ask.embedded-wizard.de/?qa=blob&qa_blobid=3954858081757759754

Hi Manfred,

Here's a zip file with the modified Waveform Generator project. You'll see a flag parameter added to the CalculateCoords function that selects which code to run. The mode has to be the one selected with the right-most icon to see my example work.

I also put in a modified copy of the Chart draw function so you can see how I did the timing. If you set a breakpoint where the hadLongDurationCounter1 or hadLongDurationCounter2 variables are incremented you can inspect the durations array to see the timing.
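For readers without the attachment, per-iteration timing of this kind might look like the sketch below. It is not the code from the zip file: the `LoopTimer` type, the function names, the array size and the threshold are all assumptions. On the target, the tick values would come from the 1 msec system tick (for example `HAL_GetTick()`).

```c
#include <stdint.h>

#define MAX_ITERATIONS  64u   /* assumed upper bound on recorded iterations */
#define LONG_THRESHOLD  5u    /* assumed: ticks that count as a "long" iteration */

typedef struct {
    uint32_t durations[MAX_ITERATIONS]; /* ticks spent in each iteration */
    uint32_t count;                     /* iterations recorded so far */
    uint32_t long_count;                /* iterations longer than LONG_THRESHOLD */
    uint32_t start;                     /* tick at the start of the current iteration */
} LoopTimer;

void loop_timer_reset(LoopTimer *t)
{
    t->count = 0u;
    t->long_count = 0u;
    t->start = 0u;
}

/* Call at the top of a loop iteration with the current tick value. */
void loop_timer_begin(LoopTimer *t, uint32_t now)
{
    t->start = now;
}

/* Call at the bottom of a loop iteration with the current tick value. */
void loop_timer_end(LoopTimer *t, uint32_t now)
{
    /* Unsigned subtraction stays correct across a tick counter wrap-around. */
    uint32_t elapsed = now - t->start;

    if (t->count < MAX_ITERATIONS) {
        t->durations[t->count++] = elapsed;
    }
    if (elapsed > LONG_THRESHOLD) {
        t->long_count++;   /* a breakpoint here catches each long iteration */
    }
}
```

Passing the tick value in as a parameter keeps the helper independent of the HAL, so it can also be exercised on a PC.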

The updated timer is set to be 20 msecs in this version - I get the long delays running slower, but this should show the problem.

Thanks for your help, and let me know if you have any questions.

Greg

by

Hi Greg,

hmmmm, I'm not sure if we are talking about the same thing...

When I take your example, add an FPS counter (like in the Graphics Accelerator demo) and run it on an STM32F469 Discovery Board, I get a constant 20 frames per second. This means that drawing the graph with more than 150 coordinates, including the screen update, takes less than 50 ms.

If you want to draw a graph with 35 coordinates within 200 ms - this should not be a problem.

What do you mean by delay?

Another thing: Within this week we will release Embedded Wizard version 9.0 - which contains a new feature: Vector Graphics. This may simplify your project and potentially increase the performance...

Best regards,

Manfred.

by

Hi Manfred,

I've attached an updated version that goes into the mode of graph updates we're using. You should see the graph move across the page. I've also attached a modified version of the Charts.c file that EmW generates. What I see on our target (a custom board with an STM32F469) is that the waveform slows down as it scrolls across the page, and the delay is in the ChartsGraph_Draw function in the places where I've added the delay calculations. Loop iterations usually take less than a msec, but will occasionally take more than 20 msecs.

I hope this is clearer, but let me know if there's anything else I can tell you.

We can try the vector graphics when it comes out, too.

Greg

https://ask.embedded-wizard.de/?qa=blob&qa_blobid=6284784978222892769

 

by
Hi Greg,

if you are using a custom board, then the situation may be a different one. How is the display connected - DSI or RGB parallel?

Please keep in mind that if the display is refreshed at 50 Hz, the screen cannot be redrawn more often than once every 20 ms. If the system is operating in double-buffering mode (which is the typical case), the drawing of the different screens is synchronized to the display frame rate. This means that when drawing finishes early, the system has to wait until the framebuffers are exchanged. In a bare-metal implementation the CPU waits in a loop; with an operating system the CPU time is given to other tasks.
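The effect of this synchronization on the measured time can be illustrated with a small calculation. The model below is a simplifying assumption (buffers are exchanged only on refresh boundaries, so the time per frame is the drawing time rounded up to a whole number of refresh periods), and the function names are hypothetical.

```c
#include <stdint.h>

/* Refresh period in milliseconds for a given refresh rate, e.g. 50 Hz -> 20 ms. */
uint32_t refresh_period_ms(uint32_t refresh_hz)
{
    return 1000u / refresh_hz;
}

/* Time from the start of drawing until the buffers are exchanged, assuming the
   exchange only happens on refresh boundaries: the drawing time is rounded up
   to a whole number of refresh periods. */
uint32_t frame_time_ms(uint32_t draw_ms, uint32_t period_ms)
{
    if (draw_ms == 0u) {
        return period_ms;
    }
    return ((draw_ms + period_ms - 1u) / period_ms) * period_ms;
}
```

Under this model, a drawing time of 3 ms at a 20 ms refresh period still costs 20 ms per frame, and 25 ms of drawing costs 40 ms, which is why a timer around the update can show roughly 20 ms of waiting even when little drawing is done.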

Maybe you interpreted this waiting time as the delay?

Manfred.
by
Hi,

As you suggest, I suspect we're running out of CPU cycles when processing our waveform.

To answer your hardware questions, our display is connected via parallel RGB, with external static RAM used for the frame buffer. We don't store anything else in this RAM. Our system clock is about 144 MHz, and the RAM is clocked at half that rate. We could go a little faster, but I'm not sure it would make much difference. We use the LTDC on the STM32F469 hardware and graphics acceleration is turned on.

We've upgraded to EW 9.0 and I've changed my code to use the StrokePath class and one Path object. We collect data asynchronously and update the StrokePath each time UpdateViewState is called for the screen. For each update I delete the old path and create a new one with the most recent data. I've varied the number of points plotted each time (the StrokePath is 690 pixels wide), from about 70 points to about 700 points. As you'd expect, fewer points take less time to graph, but even 70 points take too long to keep up with real time.

I also notice that the length of the lines drawn affects the time dramatically. If I graph random data that varies between +/- 30 pixels tall it's much faster than if the data is 300 pixels tall.

I did some timing analysis of the main loop to see where time was being spent. In the EW 8.2 version using the Graph class we spent most of the time in the "Update" function (called as Update( viewport, rootObject );). In the EW 9.0 StrokePath version the majority of the time is spent in the EwProcessSignals() function. The time spent on the signals increases dramatically with the length of the lines. The time spent in Update function also goes up with the line length, but not as much. Is there a way to optimize the usage of StrokePath to lower the number or processing time of signals?

I've varied the thickness of the line (down to 1 pixel wide), turned double buffering on and off, and set StrokePath.Quality on and off. None of these variations make much difference. StrokePath.JoinPoints is set to Bevel, and StartCap/EndCap are set to Flat.

Our goal is to be able to graph, in real time, sinusoidal waveforms up to 300 pixels peak-to-peak, with at least 70 points (preferably more) spread over a StrokePath 690 pixels wide. We'll receive updates 10 times per second and each update could have up to 10 data points. Do you think that's unrealistic given our hardware?

Thanks for all your help!

Greg
by

Hello Greg,

in your application you want to display a large graph of 690x300 pixels at a rate of at least 10 FPS. That is in fact an ambitious requirement for the STM32F469 at 144 MHz.

What can you do?

Step 1: Clarify the reason why EwProcessSignals() is consuming so much CPU. 

EwProcessSignals() is responsible for executing all currently pending postsignal operations. I suppose the high load can be caused by the data processing in your project. The following questions should clarify the cause:

Q1. Have you configured the Stroke Path view to be Buffered? If yes, the view will draw the path into an internal bitmap during the EwProcessSignals() phase. This would explain why EwProcessSignals() consumes so much time.

Q2. Otherwise, what happens if you make the Stroke Path view invisible (by setting the property Visible of the view to false)? Does it improve the screen refresh rate?

If not, EwProcessSignals() does spend the time for other processing within your application. This can be the code you have implemented to load the path data. 

Q3. Try to comment out or better simplify the code to reload the path. Does the performance improve?

Step 2: Optimize the code to load the path data.

Deleting and completely reloading the path with new data is superfluous. Try to optimize the code according to the explanation in the section Evaluate and modify the coordinates stored in the Path Data object. The best option would be to use the ShiftNodes() method, which removes old entries from the path so you can easily append new data. The method automatically moves all remaining path nodes.
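The general idea of dropping the oldest entries and appending new ones, instead of rebuilding everything, can be shown in plain C. This is not the Embedded Wizard ShiftNodes() method or its signature. The `Node` type and the `shift_and_append` function are hypothetical and only illustrate the data movement on an ordinary array.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical node type; not the Path Data storage format. */
typedef struct {
    int x;
    int y;
} Node;

/* Drop the `drop` oldest nodes, move the remaining ones to the front and
   append up to `fresh_count` new nodes. Assumes count <= capacity.
   Returns the new node count. */
size_t shift_and_append(Node *nodes, size_t count, size_t capacity,
                        size_t drop, const Node *fresh, size_t fresh_count)
{
    if (drop > count) {
        drop = count;
    }
    /* memmove is required because source and destination overlap. */
    memmove(nodes, nodes + drop, (count - drop) * sizeof nodes[0]);
    count -= drop;

    if (fresh_count > capacity - count) {
        fresh_count = capacity - count;   /* never write past the buffer */
    }
    if (fresh_count > 0u) {
        memcpy(nodes + count, fresh, fresh_count * sizeof nodes[0]);
    }
    return count + fresh_count;
}
```

Whether this is faster than a full rebuild depends on how many nodes have to be moved compared to how many are replaced, so it is worth measuring both variants on the target.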

Step 3: Reduce the area of the curve

The time needed to draw a shape depends on the area enclosing this shape. As you have observed, the larger the area, the longer the processing takes. This is because the underlying raster algorithm has to process more pixel rows for a taller shape. This costs CPU time.
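A rough way to compare two data sets before drawing them is to count the pixel rows they span. The sketch below uses hypothetical helper names and treats the enclosing area only as a proxy for rasterization work; it does not model the actual raster algorithm.

```c
#include <stddef.h>

/* Number of pixel rows spanned by the samples (max - min + 1), or 0 if empty. */
int rows_spanned(const int *y, size_t count)
{
    if (count == 0u) {
        return 0;
    }
    int lo = y[0];
    int hi = y[0];
    for (size_t i = 1; i < count; i++) {
        if (y[i] < lo) lo = y[i];
        if (y[i] > hi) hi = y[i];
    }
    return hi - lo + 1;
}

/* Area in pixels of the box enclosing the curve for a view of the given width. */
long bounding_area(int width_px, const int *y, size_t count)
{
    return (long)width_px * rows_spanned(y, count);
}
```

For a 690 pixel wide view, data between -30 and +30 spans 61 rows (42,090 pixels), while data spanning 300 rows encloses 207,000 pixels, roughly five times as much.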

Step 4: Disable the anti-aliasing

With the property Quality you can control whether the raster algorithm applies the anti-aliasing or not. However, according to your description you have already tried to do this. With disabled anti-aliasing the performance should be much better. 

If this is not the case, the problem seems to be related to other tasks in your application than the drawing of the vector graphic. Can you verify it again?

Best regards

Paul Banach

by
Hi Paul,

Thanks for the ideas! The high signal processing duration is related to having buffering turned on - when I turn it off that part gets much faster. I don't think my data handling code is affecting signal handling performance.

However, I run into problems when trying to plot more than 200 points - with buffering turned off, once there are more than 200 points in a path the path disappears from the display. The value I use in the call to initialize the path has many more than 200 edges. I don't see another setting that might affect this, and if buffering is on it works fine. Do you have any suggestions on how to fix that?

I've also changed my code so the path isn't re-initialized every time - I just call BeginPath each time through. That doesn't make much difference. I'm working on a version using ShiftNodes and will let you know what I find there.

Making the StrokePath not visible greatly reduces the time spent in the UI update function - with buffering off and the Stroke path not visible the main loop only takes a few milliseconds. Reclaiming memory seems to always take 4-5 milliseconds no matter what's going on with the graph. Is this what you'd expect?

Thanks again for your help, and I'll let you know when I have data about how my ShiftNodes test performs.

Greg
by

Hi Greg,

I suppose the reason for the path disappearing with more than 200 edges is the default size of the issue buffer. The issue buffer is used internally to accumulate drawing instructions. Usually, if the issue buffer is exhausted due to too many edges in the path, you will get the following error message on the console:

The 'IssueBuffer' for destination bitmap XYZ is too small for polygon data with XYZ edges. Adjust in your project the macro definition EW_MAX_ISSUE_TASKS and rebuild the entire application. You can also reduce the number of edges stored in the path.
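As a sketch, the adjustment named in the message could look like the fragment below. The value 1000 is only a placeholder and has to be chosen to fit the number of edges in the path. Where the definition belongs (a project configuration header or a compiler define) depends on the build environment and is an assumption here.

```c
/* Placeholder value, assumed for illustration: pick one that fits the number
   of path edges, then rebuild the entire application. */
#ifndef EW_MAX_ISSUE_TASKS
  #define EW_MAX_ISSUE_TASKS 1000
#endif
```

If the macro is already defined elsewhere in the project, that existing definition is the one to change, since the guard above would otherwise leave it untouched.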

This problem and the solution are addressed in the following thread Draw a Path on Canvas.

Regarding the timing you observed with the Stroke Path view made invisible: yes, this is what I would expect. Previously I wondered about the time spent during the signal processing. Now it is clear. With buffering enabled, the entire path rasterization was performed while the signals were processed, just before the screen update took place.

The fact that the main loop takes only a few ms after hiding the Stroke Path view indicates that your application code is fast. The question is what you mean by a few ms. For example, with 10 screen updates per second you have less than 100 ms in total for receiving new data, preparing the path and finally performing the screen update.

Do you have many other views behind the Stroke Path view or in front of it? Every time the Stroke Path view is redrawn during the screen update also all other views intersecting this area are affected. Can you test what happens if you make all these views invisible?

Regarding the buffering mode: Activating the buffering makes sense only if the displayed path remains unchanged. In that case it is not necessary to redraw the path again and again when the screen is updated. Conversely, if the path changes with every screen update, the buffering only adds unnecessary overhead.

I hope this helps you further.

Best regards

Paul Banach

 

by

Hi Paul,

Thanks for the tip on how to get my code working with buffering off. It drops the time spent in ProcessSignals a lot, although the UI Update function takes longer in compensation. But overall, it seems to be the fastest method for our application. I tried a solution using the ShiftNodes function, but it was slower than rebuilding the path each time.

On your other suggestion, hiding everything behind the StrokePath didn't make any discernible difference.

If I make the StrokePath invisible the main loop takes an average of 2 milliseconds, with a worst case of about 16 milliseconds. Here's how I measure the loop timing:

  • I note the time at the start of the loop, using the HAL_GetTick function of the STM32 library. It has a one millisecond resolution.
  • At many points in the loop I compute the time since the start of the loop and store that.
  • After each 100 loop iterations I average the time spent at each step. I also store the maximum value.
  • To gather data I set a breakpoint at the averaging step and record the averages and maximum values. I do this a few times per test.
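The measurement steps above can be sketched as follows. The `LoopStats` type, the function names and the number of checkpoints are assumptions made for illustration. On the target, the tick values would come from `HAL_GetTick()`.

```c
#include <stdint.h>

#define NUM_STEPS              4u    /* assumed number of checkpoints in the main loop */
#define ITERATIONS_PER_WINDOW  100u  /* average after this many loop iterations */

typedef struct {
    uint32_t sum[NUM_STEPS];   /* accumulated ticks per checkpoint in this window */
    uint32_t max[NUM_STEPS];   /* worst case seen per checkpoint */
    uint32_t avg[NUM_STEPS];   /* averages of the last completed window */
    uint32_t iterations;       /* loop iterations in the current window */
    uint32_t loop_start;       /* tick at the start of the current iteration */
} LoopStats;

/* Call at the top of the main loop with the current tick value. */
void stats_loop_start(LoopStats *s, uint32_t now)
{
    s->loop_start = now;
}

/* Call at each point of interest; records the time since the loop start. */
void stats_checkpoint(LoopStats *s, uint32_t step, uint32_t now)
{
    uint32_t elapsed = now - s->loop_start;

    if (step >= NUM_STEPS) {
        return;
    }
    s->sum[step] += elapsed;
    if (elapsed > s->max[step]) {
        s->max[step] = elapsed;
    }
}

/* Call at the bottom of the main loop. Returns 1 when a window of
   ITERATIONS_PER_WINDOW iterations is complete and avg[] has been refreshed. */
int stats_loop_end(LoopStats *s)
{
    if (++s->iterations < ITERATIONS_PER_WINDOW) {
        return 0;
    }
    for (uint32_t i = 0u; i < NUM_STEPS; i++) {
        s->avg[i] = s->sum[i] / ITERATIONS_PER_WINDOW;
        s->sum[i] = 0u;
    }
    s->iterations = 0u;
    return 1;   /* a breakpoint here allows reading avg[] and max[] */
}
```

With a 1 millisecond tick, steps shorter than a tick mostly register as 0, so the averages are only meaningful over many iterations.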
The time spent in the loop increases when:
  • More points are plotted
  • Buffering is turned off
  • The waveform takes up more of the screen
This all seems as I'd expect. We're still not going as fast as we'd like to, so I'll keep thinking about ways to streamline what we're doing. One thing we were wondering is if it's possible for us to create a bitmap of the graph area and plot the points ourselves. We can define the graph in such a way that we have one data point per pixel, so could very quickly fill in a bitmap and erase the old data. I didn't see that functionality in the API - the most basic classes still have their own drawing primitives and no direct pixel access. I realize this isn't using your product the way it's intended, but it might allow us to go faster.
 
Again, I very much appreciate all your help!
Greg
 
by
Hi Greg,

I think one of the advantages of Stroke Path is that you can draw anti-aliased lines and determine the thickness of the stroked path, as well as how the path edges should join together (round, bevel or miter) and how the path caps should end (round, flat, triangle or square).

In case you simply want to plot some data (like one dot or one vertical line per x-value) you can draw into your own bitmap (canvas). Every time a new sample has to be added, the new value is drawn into the bitmap and the bitmap is copied on the screen.

You will find an implementation of this approach in the example SolarDemo, in the classes Solar::PowerPanel and Solar::DataPlotter.
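Independent of that example, the "one data point per pixel column" idea can be sketched in plain C on a raw pixel buffer. This does not use the Embedded Wizard canvas API. The `Plot` type and `plot_add_sample` function are hypothetical, the colors are treated as opaque 32-bit values, and the plot is drawn in sweep style (the write position wraps around) so that only one column is touched per sample.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t *pixels;      /* width * height pixels, row-major */
    int       width;
    int       height;
    int       cursor;      /* column that receives the next sample */
    uint32_t  back_color;  /* value used to erase a column */
    uint32_t  dot_color;   /* value used to draw a sample */
} Plot;

/* Erase the column at the cursor, draw the sample there (clamped to the
   plot height) and advance the cursor, wrapping at the right edge. */
void plot_add_sample(Plot *p, int y)
{
    if (y < 0) {
        y = 0;
    }
    if (y >= p->height) {
        y = p->height - 1;
    }
    for (int row = 0; row < p->height; row++) {
        p->pixels[(size_t)row * (size_t)p->width + (size_t)p->cursor] = p->back_color;
    }
    p->pixels[(size_t)y * (size_t)p->width + (size_t)p->cursor] = p->dot_color;

    p->cursor = (p->cursor + 1) % p->width;
}
```

Each sample then costs one column of writes (300 pixels for a 300 pixel tall plot) instead of a full redraw of the curve, although the bitmap still has to be copied to the screen afterwards.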

Of course, this approach is not so convenient and a little bit outdated - but maybe this can serve as some inspiration.

But: In the end, you still have to copy one (more or less full-screen) bitmap per frame on an F469 running at 144 MHz.

Best regards,

Manfred.
