-
Notifications
You must be signed in to change notification settings - Fork 8.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HADOOP-19226: [ABFS][FNSOverBlob] Implementing Azure Rest APIs on Blob Endpoint for AbfsBlobClient #6944
base: trunk
Are you sure you want to change the base?
HADOOP-19226: [ABFS][FNSOverBlob] Implementing Azure Rest APIs on Blob Endpoint for AbfsBlobClient #6944
Conversation
...-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsBlobClient.java
Outdated
Show resolved
Hide resolved
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
Hi @saikatroy038 For some background work. |
try { | ||
op.execute(tracingContext); | ||
} catch (AbfsRestOperationException ex) { | ||
// If 412 Condition Not Met error is seen on retry verify using MD5 hash that the previous request by the same client might be |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you rephrase this for better readability?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rephrased...
@@ -405,7 +405,7 @@ private synchronized boolean getNamespaceEnabledInformationFromServer( | |||
} | |||
try { | |||
LOG.debug("Get root ACL status"); | |||
getClient().getAclStatus(AbfsHttpConstants.ROOT_PATH, tracingContext); | |||
getClient(AbfsServiceType.DFS).getAclStatus(AbfsHttpConstants.ROOT_PATH, tracingContext); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why DFS client?
In case of FNS, can't we just catch the UnsupportedOperationException and set/return isNamespaceEnabled to false?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This piece of code is supposed to run during file system initialization. This is to determine the account type being used and this is supposed to run for all the combinations of account type and service endpoint configured. We do not want to use Blob Endpoint for any API on HNS enabled account and DFS Endpoint is the only endpoint supported for both FNS and HNS accounts. Hence, we make this call with DFS Endpoint.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @anujmodi2021 for taking care this feature. Good work! Added a few comments, pls address it.
@@ -63,6 +68,9 @@ public AbfsClientHandler(final URL baseUrl, | |||
this.dfsAbfsClient = createDfsClient(baseUrl, sharedKeyCredentials, | |||
abfsConfiguration, null, sasTokenProvider, encryptionContextProvider, | |||
abfsClientContext); | |||
this.blobAbfsClient = createBlobClient(baseUrl, sharedKeyCredentials, | |||
abfsConfiguration, null, sasTokenProvider, encryptionContextProvider, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to initialize and instantiate both dfsAbfsClient
and blobAbfsClient
objects?. I meant, if user chosen dfs then initialize dfs object and when they select blob we can inistantiate only blob object. Are we planning to support dual behavior within a AzureBlobFileSystemStore
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, the thing is we have a bunch of stuff for which we need fallbacks to happen.
For example, to determine HNS nature of account, we need DFS Client irrespective of what user configures.
When we will be creating a PR for ingress operation handling, there will be fallback scenarios as service does not allow cross protocol ingress. So a fie created on one endpoint can only be further appended by the same endpoint.
Hence to handle all these we need to have both the clients ready
...ols/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsClientHandler.java
Outdated
Show resolved
Hide resolved
...ols/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsClientHandler.java
Outdated
Show resolved
Hide resolved
...ools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/constants/FSOperationType.java
Show resolved
Hide resolved
* Following parameters are used by AbfsBlobClient only. | ||
* Blob Endpoint Append API requires blockId and eTag to be passed in the request. | ||
*/ | ||
private String blockId; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How abt adding a new pojo object where we can keep all the related newly added Blob specific req parameters. Then add that object reference here ?
Basically its for better maintainbility, in future one can add more params.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a inner class for grouping the blob endpoint parameters together.
Please let me know if this is not what you suggested.
@@ -42,4 +42,19 @@ public enum AbfsRestOperationType { | |||
DeletePath, | |||
CheckAccess, | |||
LeasePath, | |||
CreateContainer, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here also, hw about grouping the enums ?
Global enum
dfs enum
blob enum
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sounds good from code readability POV.
But what I get from this is that you are suggesting instead of having a single enum class name AbfsRestOperationType
, we introduce three enums, one for each category like AbfsGlobalRestOperationType
, AbfsDFSRestOperationType
and AbfsBlobRestOperationType
.
Please let me know if I am getting it wrong.
But, what i feel is doing so will lead to a lot of code change at every place these are used. I thin we can take it up as a separate work item and try to minimize code changes in these PRs bringing in new support.
What do you suggest??
contextEncryptionAdapter, tracingContext); | ||
requestHeaders.add(new AbfsHttpHeader(CONTENT_LENGTH, String.valueOf(buffer.length))); | ||
requestHeaders.add(new AbfsHttpHeader(IF_MATCH, reqParams.getETag())); | ||
if (reqParams.getLeaseId() != null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
General Comment: I'm wondering about the coverage of this new class, blob endpoint. We have configs, if-else block, exception block and its good to improve coverage. It can be even an sub-task, which can be considered immediately after this PR, before the feature release.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, good point actually. The whole FNS over Blob support will need thorough code coverage.
The thing is at current stage, this code is isolated and its hard-coded to avoid use of blob endpoint.
Once we have a few PRs merged in to support Ingress and Egress over blob endpoint, we would be in a good shape to add tests.
*/ | ||
public class ITestAbfsClientHandler extends AbstractAbfsIntegrationTest { | ||
|
||
public ITestAbfsClientHandler() throws Exception{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Default const is not required, right
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/utils/TracingContext.java
Outdated
Show resolved
Hide resolved
HI @saikatroy038 @rakeshadr |
🎊 +1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
partial review; its late and I need to eat...
public static final String TOKEN_VERSION = "2"; | ||
|
||
//Abfs Http Client Constants for Blob Endpoint APIs. | ||
public static final String CONTAINER = "container"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add javadocs with {@value} tags. I know the other stuff doesn't but it is time to do better
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added
@@ -70,10 +70,16 @@ public final class HttpHeaderConfigurations { | |||
public static final String X_MS_LEASE_ACTION = "x-ms-lease-action"; | |||
public static final String X_MS_LEASE_DURATION = "x-ms-lease-duration"; | |||
public static final String X_MS_LEASE_ID = "x-ms-lease-id"; | |||
public static final String X_MS_SOURCE_LEASE_ID = "x-ms-source-lease-id"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added
} | ||
|
||
// Constructor to be used for interacting with AbfsBlobClient | ||
public AppendRequestParameters(final long position, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use javadocs here and above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added
@@ -36,7 +36,9 @@ public enum Mode { | |||
private final String leaseId; | |||
private boolean isExpectHeaderEnabled; | |||
private boolean isRetryDueToExpect; | |||
private BlobAppendRequestParameters blobParams; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be final?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, made it final
|
||
/** | ||
* Get Rest Operation for API | ||
* <a href="https://learn.microsoft.com/en-us/rest/api/storageservices/create-container"> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, here is one of the links. How about adding a markdown file on the blob API to give a quick overvew, how to use and then links to theese docs as a reference section
List<AbfsHttpHeader> metadataRequestHeaders = getMetadataHeadersList(properties); | ||
requestHeaders.addAll(metadataRequestHeaders); | ||
} catch (CharacterCodingException ex) { | ||
throw new InvalidAbfsRestOperationException(ex); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Provide information about which header is playing up? I'm just thinking about debugging this.
If not, maybe log at debug the properties before raising an exception
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added Debug Log
final int listMaxResults, | ||
final String continuation, | ||
TracingContext tracingContext) throws AzureBlobFileSystemException { | ||
// Todo: [FnsOverBlob] To be implemented as part of response handling of blob endpoint APIs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there a followup jira? if so, add a link to it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added
final String eTag, | ||
final ContextEncryptionAdapter contextEncryptionAdapter, | ||
final TracingContext tracingContext) throws AzureBlobFileSystemException { | ||
// Todo: [FnsOverBlob] To be implemented as part of ingress work over blob endpoint. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added
@Override | ||
public AbfsRestOperation breakLease(final String path, | ||
TracingContext tracingContext) throws AzureBlobFileSystemException { | ||
final List<AbfsHttpHeader> requestHeaders = createDefaultHeaders(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This operation is called a lot and so is duplicated code. Is there a way to make it a bit leaner?
e.g some method
add(List<AbfsHttpHeader> l, string key, string val) {
l.add(new AbfsHttpHeader(key, val);
then you could do a terser sequence
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
Description of PR
This is the second PR in the series of work done under Parent Jira: HADOOP-19179
Jira for this Patch: https://issues.apache.org/jira/browse/HADOOP-19226
Scope of this task is to implement Blob Endpoint Rest APIs offered by Azure Storage into AbfsBlobClient which will be another child of AbfsClient just like AbfsDfsClient.
The blob endpoint support will remain "Unsupported" until the whole code is checked-in and well tested.
Production Code Changes
AbfsBlobClient
will be another child ofAbfsClient
similar toAbfsDfsClient
but will have Blob API implementations.AzureBlobFileSystemStore
will now have aAbfsClientHandler
which will hold both the clients so that store can call onto the one it needs to talk. Some HDFS API's store function definition is changed to make it common for both DFS and Blob client so that proper abstraction can be established.Test Code Changes
AbfsBlobClient
itself at this point. Follow up PRs on FnsOverBlob Support will add new tests once the whole implementation is complete. Till then users won't be allowed to configure blob endpoint.How was this patch tested?
Existing test suite was ran on DFS Endpoint only. All the failures reported are known.
Metric related tests are fixed in the the #6847