
As far as I can tell from the (sometimes rather dramatically) abbreviated names in the flame graph: dq.builtin is the thread processing "built-in" data, which really is primarily discovery data. The recv one is the thread most likely to be receiving that data and handing it off. The amount of time spent there suggests there are quite a few endpoints, but this should be time spent only during startup and so not the primary issue. recvMC is the thread receiving incoming multicast data.

Otherwise I think this is the one labelled deli; that bit has to do with delivering data to readers. The one that is zoomed in on delivery_locally_all_in_sync seems to be wasteful by spending time both in entity_status_signal and in signalling DATA_AVAILABLE. I think I know why that is, and a few tweaks to the RMW layer would probably help with that.
The remainder of the time spent in Cyclone is minor, I'd say. The overhead of the ROS 2 executors (rclcpp::Executor) is a well-known performance issue within ROS 2, and the cost of rcl_wait is due to the total mismatch between the RMW wait interface and the DDS wait interface.
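
To make that concrete, here is a minimal sketch of where that cost enters: nothing in this node is special, yet every wake-up of the executor's spin() goes through rcl_wait() and the RMW translation onto the DDS waitset, which is where that overhead shows up in a profile. The node and topic names are just placeholders.

```cpp
// Minimal subscriber driven by the standard single-threaded executor.
// Each iteration of spin() blocks in rcl_wait(), where the ROS wait set is
// mapped onto the DDS wait set; that mapping is the overhead mentioned above.
#include <rclcpp/rclcpp.hpp>
#include <std_msgs/msg/string.hpp>

int main(int argc, char ** argv)
{
  rclcpp::init(argc, argv);
  auto node = rclcpp::Node::make_shared("wait_cost_demo");

  // Trivial subscription so the executor has something to wait on.
  auto sub = node->create_subscription<std_msgs::msg::String>(
    "chatter", 10,
    [](std_msgs::msg::String::SharedPtr /*msg*/) { /* handle the message */ });

  rclcpp::executors::SingleThreadedExecutor executor;
  executor.add_node(node);
  executor.spin();  // wake-ups funnel through rcl_wait() / rmw_wait()

  rclcpp::shutdown();
  return 0;
}
```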

Reducing the number of packets would probably help. Fortunately, the default maximum message size used by Cyclone is a bit odd, and raising it to (nearly) 64kB should significantly reduce the number of packets. Setting General/MaxMessageSize to 65500B in the Cyclone DDS configuration should take care of that. I think you know how to do those settings, right?
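
In case it helps, this is roughly what that setting looks like as a Cyclone DDS configuration file; the exact XML skeleton (namespace, Domain element) can differ a bit between Cyclone versions, so treat it as a sketch rather than gospel:

```xml
<?xml version="1.0" encoding="UTF-8" ?>
<CycloneDDS xmlns="https://cdds.io/config">
  <Domain id="any">
    <General>
      <!-- Raise the maximum RTPS message size so large samples end up in
           far fewer UDP packets than with the default setting. -->
      <MaxMessageSize>65500B</MaxMessageSize>
    </General>
  </Domain>
</CycloneDDS>
```

Then point Cyclone at the file via the CYCLONEDDS_URI environment variable, for example export CYCLONEDDS_URI=file://$HOME/cyclonedds.xml (the path is just an example).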

If this is all within a single machine, then the Iceoryx integration work should allow switching to end-to-end zero-copy communication, but whether that's an option for you depends on various factors - ROS 2 version, QoS settings, whether you're using a fixed-size type. I'm mentioning it mostly as a heads-up of what is to come, but you are of course welcome to try it.
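
If you do end up trying the zero-copy path, the usual entry point on the rclcpp side is the loaned-message API. The sketch below assumes a ROS 2 version and RMW that actually support it and a fixed-size (plain-old-data) message type; the node and topic names are placeholders.

```cpp
// Hedged sketch of zero-copy publishing via loaned messages. Whether the
// loan really lives in shared memory depends on the ROS 2 version, the RMW
// and its shared-memory support, QoS settings, and the message type being
// fixed-size.
#include <utility>

#include <rclcpp/rclcpp.hpp>
#include <std_msgs/msg/float64.hpp>

int main(int argc, char ** argv)
{
  rclcpp::init(argc, argv);
  auto node = rclcpp::Node::make_shared("zero_copy_sketch");
  auto pub = node->create_publisher<std_msgs::msg::Float64>("samples", 10);

  // Ask the middleware for a message buffer instead of allocating one here;
  // with a zero-copy capable RMW this buffer can live in shared memory.
  auto loaned = pub->borrow_loaned_message();
  loaned.get().data = 42.0;

  // Publishing the loaned message hands the buffer back to the middleware,
  // avoiding a serialization/copy step for subscribers on the same machine.
  pub->publish(std::move(loaned));

  rclcpp::shutdown();
  return 0;
}
```

As far as I know, rclcpp falls back to an ordinary heap allocation when the middleware cannot provide a loan, so the same code stays usable either way.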
