
YARN-11736. Enhance MultiNodeLookupPolicy to allow configuration of extended comparators for better usability. #7121

Open
wants to merge 3 commits into base: trunk
Conversation

TaoYang526
Contributor

Description of PR

Please refer to JIRA: YARN-11736

How was this patch tested?

UT

For code changes:

  • Does the title of this PR start with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')?
  • Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?
  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • If applicable, have you updated the LICENSE, LICENSE-binary, NOTICE-binary files?

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 0s Docker mode activated.
-1 ❌ docker 1m 8s Docker failed to build run-specific yetus/hadoop:tp-12082}.
Subsystem Report/Notes
GITHUB PR #7121
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7121/1/console
versions git=2.34.1
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@TaoYang526
Contributor Author

@yangwwei @sunilgovind @shameersss1 Could you please help to review this PR? Thanks!

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 0s Docker mode activated.
-1 ❌ docker 1m 18s Docker failed to build run-specific yetus/hadoop:tp-9856}.
Subsystem Report/Notes
GITHUB PR #7121
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7121/2/console
versions git=2.34.1
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@TaoYang526
Contributor Author

TaoYang526 commented Oct 17, 2024

Hi @yangjiandan, it seems you have been using and contributing to the multi-node mechanism recently. Could you please help review this PR? Thanks.

@shameersss1
Contributor

Sure, will review this week.

@shameersss1
Contributor

Please add documentation for the new configs and the new class here: https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html (the source for this page can be found in the Hadoop code base itself), so that end users know they exist.

@@ -2915,11 +2915,20 @@ private void updateResourceValuesFromConfig(Set<String> resourceTypes,
}
}

public static final String MULTI_NODE_SORTING_POLICY_SUFFIX =

Contributor: Can we have javadoc explaining the use of this config?

PREFIX + MULTI_NODE_SORTING_POLICY_SUFFIX;

public static final String MULTI_NODE_SORTING_POLICY_CURRENT_NAME =
MULTI_NODE_SORTING_POLICY_NAME + ".current-name";

Contributor: Can we have javadoc explaining the use of this config?
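
For what it is worth, a minimal javadoc sketch of the kind being requested, with wording inferred from the explanations later in this thread rather than taken from the patch:

```java
/**
 * Common prefix under which all multi-node sorting policy settings live,
 * i.e. "yarn.scheduler.capacity.multi-node-sorting.policy"
 * (PREFIX + MULTI_NODE_SORTING_POLICY_SUFFIX).
 */
public static final String MULTI_NODE_SORTING_POLICY_NAME =
    PREFIX + MULTI_NODE_SORTING_POLICY_SUFFIX;

/**
 * Instance-level key that carries the name of the active policy to a policy
 * instance, so that the instance can look up the configuration entries
 * belonging to it.
 */
public static final String MULTI_NODE_SORTING_POLICY_CURRENT_NAME =
    MULTI_NODE_SORTING_POLICY_NAME + ".current-name";
```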

private String policyClassName;
private long sortingInterval;

public MultiNodePolicySpec(String policyClassName, long timeout) {
public MultiNodePolicySpec(String policyName, String policyClassName,

Contributor: Why are both policyName and policyClassName required here?


Contributor Author: In MultiNodeSorter#initPolicy, the policy instance is created from policyClassName, and policyName is used in MultiComparatorPolicy#setConf to look up the configuration that belongs to that policy instance. This is the only way (short of changing the policy interface, which I would prefer to avoid) for each policy instance to know which configuration belongs to it. If there are better approaches, feel free to propose them.
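
To make that flow concrete, here is a minimal sketch of the pattern under discussion, assuming a hypothetical class name and an assumed per-policy key layout (<prefix>.<policy-name>.comparators); it is not the code from this patch:

```java
import org.apache.hadoop.conf.Configuration;

// Sketch only: how a policy instance, created by the sorter from
// policyClassName, can use the policy name recorded in its configuration
// to find the settings that belong to it.
public class MultiComparatorPolicySketch {
  private static final String POLICY_PREFIX =
      "yarn.scheduler.capacity.multi-node-sorting.policy";
  private static final String CURRENT_NAME_KEY = POLICY_PREFIX + ".current-name";

  private String comparatorsSpec;

  public void setConf(Configuration conf) {
    // The manager stamped the active policy name into this instance-level conf.
    String policyName = conf.get(CURRENT_NAME_KEY);
    // Assumed per-policy key layout: <prefix>.<policy-name>.comparators
    comparatorsSpec = conf.get(POLICY_PREFIX + "." + policyName + ".comparators");
  }
}
```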

}
Configuration policyConf = new Configuration(this.getConfig());

Contributor: Can we reuse the existing config object instead of creating a new one?


Contributor Author: In MultiNodeSortingManager#createAllPolicies, all MultiNodeSorter instances share a single config. The policyName is set on policyConf, which is an instance-level configuration, so that each policy instance can retrieve the configuration that belongs to it.
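
A minimal sketch of that arrangement on the manager side, using the standard org.apache.hadoop.conf.Configuration copy constructor and set() call; the method and variable names are illustrative, not the patch's code:

```java
import org.apache.hadoop.conf.Configuration;

public class PolicyConfSketch {
  // Sketch only: copy the shared config per policy so that setting the
  // "current-name" key is scoped to a single policy instance.
  static Configuration buildPolicyConf(Configuration sharedConf, String policyName) {
    Configuration policyConf = new Configuration(sharedConf); // instance-level copy
    policyConf.set(
        "yarn.scheduler.capacity.multi-node-sorting.policy.current-name",
        policyName);
    return policyConf; // handed to the policy, e.g. via setConf(policyConf)
  }
}
```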

// conf keys and default values
public static final String COMPARATORS_CONF_KEY = "comparators";
protected static final List<Comparator> DEFAULT_COMPARATORS = Collections
.unmodifiableList(Arrays.asList(

Contributor: So this means the default sorting policy is based on resource utilization: nodes with lower resource utilization are given priority?


Contributor Author: Yes. The default comparators are only used when no comparators are configured for the policy, or when they are configured incorrectly. I chose this default because I believe optimizing workload distribution among nodes is the primary use case.
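
As a self-contained illustration of a "less utilized first" ordering of that kind, here is a standalone comparator sketch; NodeInfo and its fields are invented for the example and are not the YARN SchedulerNode API:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class UtilizationComparatorSketch {
  // Hypothetical node view; not the YARN SchedulerNode API.
  static class NodeInfo {
    final String host;
    final long allocatedMB;
    final long totalMB;
    NodeInfo(String host, long allocatedMB, long totalMB) {
      this.host = host;
      this.allocatedMB = allocatedMB;
      this.totalMB = totalMB;
    }
    double utilization() {
      return totalMB == 0 ? 1.0 : (double) allocatedMB / totalMB;
    }
  }

  // Nodes with lower utilization sort first, i.e. get allocation priority.
  static final Comparator<NodeInfo> LOWER_UTILIZATION_FIRST =
      Comparator.comparingDouble(NodeInfo::utilization);

  public static void main(String[] args) {
    List<NodeInfo> nodes = new ArrayList<>();
    nodes.add(new NodeInfo("n1", 6144, 8192)); // 75% used
    nodes.add(new NodeInfo("n2", 2048, 8192)); // 25% used
    nodes.sort(LOWER_UTILIZATION_FIRST);
    System.out.println(nodes.get(0).host);     // prints n2, the less utilized node
  }
}
```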

}
this.conf = conf;
String policyName = conf.get(
CapacitySchedulerConfiguration.MULTI_NODE_SORTING_POLICY_CURRENT_NAME);

Contributor: What is the difference between MULTI_NODE_SORTING_POLICY_CURRENT_NAME and MULTI_NODE_SORTING_POLICY_NAME?


Contributor Author (Oct 18, 2024): Currently MULTI_NODE_SORTING_POLICY_NAME is the prefix of all multi-node policy configurations: yarn.scheduler.capacity.multi-node-sorting.policy. It is not an ideal name, but it is used in many places, so I prefer not to rename it. MULTI_NODE_SORTING_POLICY_CURRENT_NAME is used to pass the policyName to the policy instance.
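
To make the naming concrete, a small sketch of how the keys compose; the per-policy comparators key at the end is an assumption combining a hypothetical policy name with COMPARATORS_CONF_KEY ("comparators") from the earlier hunk, not a key confirmed here:

```java
public class MultiNodeSortingKeysSketch {
  public static void main(String[] args) {
    // Common prefix for all multi-node sorting policy settings
    // (MULTI_NODE_SORTING_POLICY_NAME).
    String policyPrefix = "yarn.scheduler.capacity.multi-node-sorting.policy";

    // Instance-level key that carries the active policy name to the policy
    // (MULTI_NODE_SORTING_POLICY_CURRENT_NAME).
    String currentNameKey = policyPrefix + ".current-name";

    // Assumed per-policy comparators key built from a hypothetical policy
    // name and COMPARATORS_CONF_KEY ("comparators").
    String policyName = "resource-based";
    String comparatorsKey = policyPrefix + "." + policyName + ".comparators";

    System.out.println(currentNameKey);
    System.out.println(comparatorsKey);
  }
}
```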

@TaoYang526
Contributor Author

Please add documentation for the new configs and the new class here: https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html (the source for this page can be found in the Hadoop code base itself), so that end users know they exist.

@shameersss1 There is no introduction to the multi-node policy in that document yet, so I am unable to update or extend it there.
I would like to add documentation for the multi-node mechanism, but it does not fit within this PR; we may need to create another JIRA ticket for the documentation. Does that sound good?

@slfan1989
Contributor

slfan1989 commented Oct 18, 2024

@TaoYang526 @shameersss1 I am eager to help, even though I’m not very familiar with this part of YARN. I will do my best. If we can confirm together that the modified code is fine, we can proceed with merging it. I've reviewed @shameersss1's code, and the quality is quite good.

@TaoYang526
Contributor Author

@slfan1989 Thank you for joining us, it's great to have you help out.

@TaoYang526
Contributor Author

@shameersss1 Thanks for the review. I have added javadoc for key fields in the last commit.
Please take another look and let me know if there’s anything else that needs attention.

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 11m 56s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 1s codespell was not available.
+0 🆗 detsecrets 0m 1s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 3 new or modified test files.
_ trunk Compile Tests _
+1 💚 mvninstall 50m 3s trunk passed
-1 ❌ compile 0m 36s /branch-compile-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdkUbuntu-11.0.24+8-post-Ubuntu-1ubuntu320.04.txt hadoop-yarn-server-resourcemanager in trunk failed with JDK Ubuntu-11.0.24+8-post-Ubuntu-1ubuntu320.04.
+1 💚 compile 1m 11s trunk passed with JDK Private Build-1.8.0_422-8u422-b05-1~20.04-b05
+1 💚 checkstyle 1m 3s trunk passed
+1 💚 mvnsite 1m 13s trunk passed
+1 💚 javadoc 1m 11s trunk passed with JDK Ubuntu-11.0.24+8-post-Ubuntu-1ubuntu320.04
+1 💚 javadoc 1m 3s trunk passed with JDK Private Build-1.8.0_422-8u422-b05-1~20.04-b05
+1 💚 spotbugs 2m 31s trunk passed
-1 ❌ shadedclient 8m 19s branch has errors when building and testing our client artifacts.
_ Patch Compile Tests _
-1 ❌ mvninstall 1m 2s /patch-mvninstall-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt hadoop-yarn-server-resourcemanager in the patch failed.
-1 ❌ compile 0m 10s /patch-compile-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdkUbuntu-11.0.24+8-post-Ubuntu-1ubuntu320.04.txt hadoop-yarn-server-resourcemanager in the patch failed with JDK Ubuntu-11.0.24+8-post-Ubuntu-1ubuntu320.04.
-1 ❌ javac 0m 10s /patch-compile-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdkUbuntu-11.0.24+8-post-Ubuntu-1ubuntu320.04.txt hadoop-yarn-server-resourcemanager in the patch failed with JDK Ubuntu-11.0.24+8-post-Ubuntu-1ubuntu320.04.
-1 ❌ compile 0m 17s /patch-compile-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdkPrivateBuild-1.8.0_422-8u422-b05-1~20.04-b05.txt hadoop-yarn-server-resourcemanager in the patch failed with JDK Private Build-1.8.0_422-8u422-b05-1~20.04-b05.
-1 ❌ javac 0m 17s /patch-compile-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdkPrivateBuild-1.8.0_422-8u422-b05-1~20.04-b05.txt hadoop-yarn-server-resourcemanager in the patch failed with JDK Private Build-1.8.0_422-8u422-b05-1~20.04-b05.
+1 💚 blanks 0m 0s The patch has no blanks issues.
-0 ⚠️ checkstyle 0m 57s /results-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 36 new + 139 unchanged - 0 fixed = 175 total (was 139)
-1 ❌ mvnsite 0m 9s /patch-mvnsite-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt hadoop-yarn-server-resourcemanager in the patch failed.
-1 ❌ javadoc 0m 55s /patch-javadoc-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdkUbuntu-11.0.24+8-post-Ubuntu-1ubuntu320.04.txt hadoop-yarn-server-resourcemanager in the patch failed with JDK Ubuntu-11.0.24+8-post-Ubuntu-1ubuntu320.04.
-1 ❌ javadoc 0m 51s /patch-javadoc-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-jdkPrivateBuild-1.8.0_422-8u422-b05-1~20.04-b05.txt hadoop-yarn-server-resourcemanager in the patch failed with JDK Private Build-1.8.0_422-8u422-b05-1~20.04-b05.
-1 ❌ spotbugs 2m 36s /new-spotbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.html hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0)
-1 ❌ shadedclient 36m 41s patch has errors when building and testing our client artifacts.
_ Other Tests _
-1 ❌ unit 0m 27s /patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt hadoop-yarn-server-resourcemanager in the patch failed.
+0 🆗 asflicense 0m 29s ASF License check generated no output?
122m 30s
Reason Tests
SpotBugs module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
org.apache.hadoop.yarn.server.resourcemanager.scheduler.placement.policy.CompositeComparator implements Comparator but not Serializable At MultiComparatorPolicy.java:Serializable At MultiComparatorPolicy.java:[lines 334-359]
Subsystem Report/Notes
Docker ClientAPI=1.47 ServerAPI=1.47 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7121/3/artifact/out/Dockerfile
GITHUB PR #7121
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux aed8082e3e2d 5.15.0-117-generic #127-Ubuntu SMP Fri Jul 5 20:13:28 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 95ef881
Default Java Private Build-1.8.0_422-8u422-b05-1~20.04-b05
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.24+8-post-Ubuntu-1ubuntu320.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_422-8u422-b05-1~20.04-b05
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7121/3/testReport/
Max. process+thread count 91 (vs. ulimit of 5500)
modules C: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager U: hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7121/3/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.
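
The SpotBugs item above (a Comparator implementation that is not Serializable) looks like the standard SE_COMPARATOR_SHOULD_BE_SERIALIZABLE finding; below is a generic sketch of the usual remedy, not the actual MultiComparatorPolicy code:

```java
import java.io.Serializable;
import java.util.Comparator;
import java.util.List;

// Generic sketch: SpotBugs flags Comparators that do not implement
// Serializable because sorted collections holding them (e.g. TreeMap) would
// otherwise fail to serialize. Implementing Serializable (with a
// serialVersionUID) addresses the warning; for full serializability the
// delegate comparators would also need to be serializable.
public class CompositeComparatorSketch<T> implements Comparator<T>, Serializable {
  private static final long serialVersionUID = 1L;

  private final List<Comparator<T>> delegates;

  public CompositeComparatorSketch(List<Comparator<T>> delegates) {
    this.delegates = delegates;
  }

  @Override
  public int compare(T left, T right) {
    // Apply each delegate in order until one distinguishes the two elements.
    for (Comparator<T> c : delegates) {
      int result = c.compare(left, right);
      if (result != 0) {
        return result;
      }
    }
    return 0;
  }
}
```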
