Skip to content

IGNITE-27871 Improve deployment lookup to reduce deploy() contention …#12760

Closed
oleg-vlsk wants to merge 9 commits intoapache:masterfrom
oleg-vlsk:ignite-27871
Closed

IGNITE-27871 Improve deployment lookup to reduce deploy() contention …#12760
oleg-vlsk wants to merge 9 commits intoapache:masterfrom
oleg-vlsk:ignite-27871

Conversation

@oleg-vlsk
Copy link
Copy Markdown
Contributor

…for locally available tasks with peerClassLoadingEnabled=true

Thank you for submitting the pull request to the Apache Ignite.

In order to streamline the review of the contribution
we ask you to ensure the following steps have been taken:

The Contribution Checklist

  • There is a single JIRA ticket related to the pull request.
  • The web-link to the pull request is attached to the JIRA ticket.
  • The JIRA ticket has the Patch Available state.
  • The pull request body describes changes that have been made.
    The description explains WHAT and WHY was made instead of HOW.
  • The pull request title is treated as the final commit message.
    The following pattern must be used: IGNITE-XXXX Change summary where XXXX - number of JIRA issue.
  • A reviewer has been mentioned through the JIRA comments
    (see the Maintainers list)
  • The pull request has been checked by the Teamcity Bot and
    the green visa attached to the JIRA ticket (see TC.Bot: Check PR)

Notes

If you need any help, please email dev@ignite.apache.org or ask anу advice on http://asf.slack.com #ignite channel.

P2PClassLoadingFailureHandlingTest.class,
P2PClassLoadingIssuesTest.class
P2PClassLoadingIssuesTest.class,
GridDeploymentLocalStoreReuseTest.class
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add comma to the end of line please (to reduce conflicts on merge)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment on lines +135 to +141
CompletableFuture<T2<UUID, Set<UUID>>> fut = client.compute()
.withTimeout(timeout).
<T2<UUID, Set<UUID>>, T2<UUID, Set<UUID>>>executeAsync2(TestTask.class.getName(), null)
.toCompletableFuture();

try {
fut.get();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

                client.compute().execute(TestTask.class.getName(), null);

Copy link
Copy Markdown
Contributor Author

@oleg-vlsk oleg-vlsk Feb 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Used this snippet, thank you.

Comment on lines +108 to +113
List<IgniteInternalFuture<Void>> futs = new ArrayList<>(CLIENT_CNT);

for (IgniteClient client : clients)
futs.add(runAsync(() -> executeTasksOnClient(client, EXEC_CNT, 5_000L)));

waitForAllFutures(futs.toArray(new IgniteInternalFuture[0]));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

            runMultiThreaded(i -> executeTasksOnClient(clients.get(i), EXEC_CNT), CLIENT_CNT, "worker");

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I decided not to go for multi-threaded execution as the perpose of the test is to verify certain behaviour during subsequent executions of the same task. So I ended up using a simple for-loop.

ClusterNode[] allServerNodes = grid(0).cluster().forServers().nodes().toArray(new ClusterNode[0]);

for (int i = 0; i < CLIENT_CNT; i++)
clients.add(startClient(allServerNodes));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can connect to any server node,it's not necessary to provide all nodes, one is enough, i.e. clients.add(startClient(0));

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment on lines +149 to +177
/** */
private static class DeploymentListeningLogger extends ListeningTestLogger {
/** */
private final ConcurrentLinkedQueue<String> depNotFound = new ConcurrentLinkedQueue<>();

/** */
public DeploymentListeningLogger(IgniteLogger log) {
super(log);
}

/** {@inheritDoc} */
@Override public void debug(String msg) {
if (msg.contains("Deployment was not found for class with specific class loader"))
depNotFound.add(msg);

super.debug(msg);
}

/** {@inheritDoc} */
@Override public ListeningTestLogger getLogger(Object ctgr) {
return this;
}

/** */
public List<String> depNotFound() {
return depNotFound.stream().collect(Collectors.toUnmodifiableList());
}
}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's incorrect usage of listening logger, all you need is register listener like:

            LogListener lsnr = LogListener.matches(notFoundMsg).times(CLIENT_CNT).build();

            listeningTestLog.registerListener(lsnr);

listeningTestLog should be created on top of standard logger, for example:

        setLoggerDebugLevel();

        listeningTestLog = new ListeningTestLogger(log);

And passed to ignite configuration. No need for logger for each node.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

meta.alias(rsrcName);
meta.className(clsName);
meta.senderNodeId(ctx.localNodeId());
meta.classLoader(ldr);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved the local app classloader check to GridDeploymentLocalStore#deployment so that in the initial call the meta does not contains classloader.

private final ConcurrentMap<String, Deque<GridDeployment>> cache = new ConcurrentHashMap<>();

/** Deployment cache by classloader. */
private final ConcurrentMap<ClassLoader, Deque<GridDeployment>> cacheByLdr = new ConcurrentHashMap<>();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cacheByLdr always used under the lock mux, no ConcurrentMap overhead required here.
Also maybe it worth to use IdentityHashMap in case someone redefine classloader's equals() in a wrong way.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thank you for the hint.


dep = d;
for (GridDeployment d : depsByLdr) {
if (!d.undeployed() && d.classLoader() == ldr) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's undeployed, it's cleaned from cache, how we can find it?
Why do we need to check classloader if we put in cache only items with exactly this classloader?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, those were 'extra safety' checks. Changed the lookup logic altogether (see below).

dep = candidate;
}
}
else {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we still need this check? If deployment not found by classloader in classloader cache it can't be found in aliases cache. We preserve both caches synchronized and modify it only under the lock.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed this else block.

if (d.classLoader() == ldr) {
// Cache class and alias.
fireEvt = d.addDeployedClass(cls, alias);
Deque<GridDeployment> depsByLdr = cacheByLdr.get(ldr);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like it's one-to-one relation for deployment and classloader. Did I miss something?

Copy link
Copy Markdown
Contributor Author

@oleg-vlsk oleg-vlsk Feb 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not necessarily. In GridDeploymentLocalStore#cache we can have several deployments with the same classloader associated with one alias/class name (see attached screenshots). Most recent deployment are added to the beginning of the queue (the addFirst() call in GridDeploymentLocalStore#deploy).

local deployments cache - 1 local deployments cache - 2

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm talking about cacheByLdr, not cache. For cacheByLdr it looks like only one deployment is possible for one classloader.

Comment on lines +271 to +274
ClassLoader ldr = Thread.currentThread().getContextClassLoader();

if (ldr == null)
ldr = U.resolveClassLoader(ctx.config());
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Let's move ldr initialization outside the loop.
  2. Just add || dep.classLoader() == ldr to the if condition

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved classloader initialization outside the for loop, but kept dep.classLoader() == ldr as a separate if-block to be able to log correct trace messages. As I understand, if we add the || dep.classLoader() == ldr condition to the initial if-block, the trace message will contain either null as classloader id or incorrect classloader id. As well as it won’t be obvious from such message that the local app classloader was used in this case.

if (d.classLoader() == ldr) {
// Cache class and alias.
fireEvt = d.addDeployedClass(cls, alias);
Deque<GridDeployment> depsByLdr = cacheByLdr.get(ldr);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm talking about cacheByLdr, not cache. For cacheByLdr it looks like only one deployment is possible for one classloader.

Comment on lines +111 to +112
assertTrue(lsnr0.check(5_000));
assertTrue(lsnr1.check(5_000));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to wait here? As far as I understand here strict happens-before between task completion and log message.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed timeout.

@oleg-vlsk
Copy link
Copy Markdown
Contributor Author

oleg-vlsk commented Feb 25, 2026

@alex-plekhanov Regarding you comment:

"I'm talking about cacheByLdr, not cache. For cacheByLdr it looks like only one deployment is possible for one classloader."

I added:

 private final Map<ClassLoader, GridDeployment> depsByLdr = new IdentityHashMap<>();

to map classloader to a specific deployment. But I laso kept:

private final Map<ClassLoader, Deque<GridDeployment>> cacheByLdr = new IdentityHashMap<>();

to have quick access to all deployments with the given classloader so we can add them all to cache in order to preserve the current contract:

if (cachedDeps != null) {
    assert dep != null;

    cache.put(alias, cachedDeps);

    if (!cls.getName().equals(alias)) 
        cache.put(cls.getName(), cachedDeps);
    
    ...
}

}
}

if (dep != null && !dep.undeployed()) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dep.undeployed() should be checked inside the loop, or we stop searching after first undeployed deployment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added the dep.undeployed check to the loop.

dep = deps.get(meta.classLoader());

return dep;
if (dep == null && meta.classLoaderId() != null) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(dep == null || dep.undeployed) && ...

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

*/
private void undeploy(ClassLoader ldr) {
Collection<GridDeployment> doomed = new HashSet<>();
Collection<GridDeployment> doomed = U.newIdentityHashSet();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like there can be only one doomed instance? Or I miss something?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactored the undeploy() method based on the 1:1 classloder to deployment relation.

synchronized (mux) {
for (Iterator<Deque<GridDeployment>> i1 = cache.values().iterator(); i1.hasNext();) {
Deque<GridDeployment> deps = i1.next();
for (Iterator<Map<ClassLoader, GridDeployment>> i1 = depsByAlias.values().iterator(); i1.hasNext();) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe reverse delete from caches order? This will also allow to call undeploy only once per instance. Like this:

            GridDeployment dep = depByLdr.remove(ldr);
            
            if (dep != null) {
                dep.undeploy();
                
                doomed.add(dep);

                if (log.isInfoEnabled())
                    log.info("Removed undeployed class: " + dep);

                for (Iterator<Map<ClassLoader, GridDeployment>> i1 = depsByAlias.values().iterator(); i1.hasNext(); ) {
                    Map<ClassLoader, GridDeployment> deps = i1.next();
                    
                    deps.remove(ldr);

                    if (deps.isEmpty())
                        i1.remove();
                }
            }

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done as part of refactoring.

Valuyskiy.O.Y added 9 commits April 3, 2026 08:43
…for locally available tasks with peerClassLoadingEnabled=true
…calStore#deployment, correct cache lookup mechanism in GridDeploymentLocalStore#deploy, simplify GridDeploymentLocalStoreReuseTest#testNoExcessiveLocalDeploymentCacheMisses
…testCheckTaskClassloaderCache with regard to using GridDeploymentLocalStore#depsByAlias via reflection
@sonarqubecloud
Copy link
Copy Markdown

sonarqubecloud Bot commented Apr 3, 2026

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants