Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

df: Adds support for mount path prefix matching and input path #3161

Merged
merged 15 commits into from
Mar 11, 2022

Conversation

crazystylus
Copy link
Contributor

@crazystylus crazystylus commented Feb 19, 2022

Adds support for mount path prefix matching and input path canonicalization.
Fixes #3065

  • Sorts mount paths in the order of decreasing string length
  • Canonicalize all paths and clear invalid paths
  • Checking of mount path prefix matches input path

canonicalization

- Sorts mount paths in reverse lexicographical order
- Canonicalize all paths and clear invalid paths
- Checking of mount path prefix matches input path
let mut mounts = read_fs_list();

// Sort mounts in desc ordered lexicographically
mounts.sort_by(|a, b| b.mount_dir.cmp(&a.mount_dir));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why sort here? On my system (Ubuntu), the output of df is not lexicographically ordered:

$ df --output=source | tail -n +2 | head -n4
udev
tmpfs
/dev/mapper/vgubuntu-root
tmpfs

See also my comment here: #3086 (comment)

Copy link
Contributor Author

@crazystylus crazystylus Feb 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My plan is to sort in the order of decreasing mount path length, so that when I do a prefix matching, the longest matching mount path is selected first and rest mount paths are ignored. For example if we have mount paths [\,\root] and we have input /root/docs then we only need to output /root and / needs to be skipped. Sorting helps to assure this part.

Sorting in reverse/desc lexicographical order also works here as longest path will come first, but let me change to simple sort by length.

@crazystylus
Copy link
Contributor Author

crazystylus commented Feb 19, 2022

  • Made changes to sort mount_paths in order of decreasing length instead of reverse lexicographical order.
  • Added comments for explaining sorting and path prefix matching.

…tring length. Added comments for explaining sorting and path prefix matching
Copy link
Collaborator

@jfinkels jfinkels left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will cause a regression in df as it is today, due to the sorting. Unfortunately there is no unit test to cover this case, but the output of df should match the order of files in /etc/mtab, as I mentioned in this comment: #3086 (comment)

Also, the various mutations of paths and the use of the empty string as a way of filtering out paths are tough to follow. Is it possible to move some of this logic into the mount_info_lt() function, which returns true exactly when one MountInfo object should be kept and the other dropped?

@crazystylus
Copy link
Contributor Author

crazystylus commented Feb 21, 2022

This will cause a regression in df as it is today, due to the sorting. Unfortunately there is no unit test to cover this case, but the output of df should match the order of files in /etc/mtab, as I mentioned in this comment: #3086 (comment)

Also, the various mutations of paths and the use of the empty string as a way of filtering out paths are tough to follow. Is it possible to move some of this logic into the mount_info_lt() function, which returns true exactly when one MountInfo object should be kept and the other dropped?

It's not possible to move logic into the mount_info_lt() function as it will complicate things.
Can try running this and compare the output?
df /boot / /boot/efi and df /boot/efi /boot /
I think df will order output according to input paths. If no path list is supplied, then only it will in the same order as /etc/mtab

1. Removed sorting of mount paths
2. Implemented prefix matching using iterators
3. Removed un-needed mut from previous commit
@crazystylus
Copy link
Contributor Author

crazystylus commented Feb 21, 2022

I have made changes as below for cleanup of code and meeting requirements

  • Removed sorting of mount_paths
  • Moved the canonicalization logic so that no mutability is required
  • Moved path matching logic from is_included() to filter_mount_list()
  • Changed loops to iterators to cleanup code where ever possible
  • For each input_path, code selects longest mount_path and de-duplication is handled by is_best()
  • Removed string clear and mutability

@jfinkels
Copy link
Collaborator

Can try running this and compare the output?
df /boot / /boot/efi and df /boot/efi /boot /
I think df will order output according to input paths. If no path list is supplied, then only it will in the same order as /etc/mtab

Oh you are exactly right:

$ df --output=target /boot / /boot/efi
Mounted on
/boot
/
/boot/efi
$ df --output=target / /boot /boot/efi
Mounted on
/
/boot
/boot/efi

Sorry for my misunderstanding earlier. In that case, can you add a test for this in tests/by-util/test_df.rs?

@jfinkels jfinkels dismissed their stale review February 21, 2022 18:46

my misunderstanding

@crazystylus
Copy link
Contributor Author

crazystylus commented Feb 24, 2022

I have verified that df from main branch is falsely passing test: df/df-symlink.sh by giving no output because this test case passes . as input.

I have confirmed, this test case will only start passing once --output=target is implemented.

@crazystylus
Copy link
Contributor Author

Hi @jfinkels ,
I realized there is one more issue. We should be selecting best mount point per input_path, and if the input path is repeated n times, output should also contain n entries.
Try this to verify

$ df --output=target /boot /boot /boot
Mounted on
/boot
/boot
/boot

1. First all input paths are canonicalized
2. If no valid input_paths remain, print filtered_mount_list
3. Else print mount entry for each valid input path
@crazystylus
Copy link
Contributor Author

crazystylus commented Feb 24, 2022

I have checked GNU coreutils df code, and adjusted the helper functions according to it.

  1. Mount list is only filtered if there are no input_paths and all entries are emitted
  2. Else each input_path is canonicalized and then mapped with a mount_entry from unfiltered_list

Ref: https:/coreutils/coreutils/blob/master/src/df.c#L1830-L1840

Both test cases have been added

1. Fixed Clippy:needless_late_init
2. Added testcase: check if mount_points are printed in the order of
   input_paths
3. Added testcase: check if input_path is repeased, is the mount point
   also repeased
@jfinkels
Copy link
Collaborator

I have checked GNU coreutils df code, and adjusted the helper functions according to it.

It's best not to view the source code of GNU coreutils in order to avoid infringing their copyright.

1. Mount list is only filtered if there are no input_paths and all entries are emitted
2. Else each input_path is canonicalized and then mapped with a mount_entry from unfiltered_list

There have been a lot of changes in this pull request and it's hard for me to interpret. Could you give concrete examples of how the behavior of df is changing after these two changes?

Both test cases have been added

I only see one new test case, asserting that if two files are specified in the input then two identical rows appear in the output table.

@crazystylus
Copy link
Contributor Author

crazystylus commented Feb 25, 2022

I have checked GNU coreutils df code, and adjusted the helper functions according to it.

It's best not to view the source code of GNU coreutils in order to avoid infringing their copyright.

1. Mount list is only filtered if there are no input_paths and all entries are emitted
2. Else each input_path is canonicalized and then mapped with a mount_entry from unfiltered_list

There have been a lot of changes in this pull request and it's hard for me to interpret. Could you give concrete examples of how the behavior of df is changing after these two changes?

Both test cases have been added

I only see one new test case, asserting that if two files are specified in the input then two identical rows appear in the output table.

I had to remove the 2nd test case as it will never pass in the CI/CD Pipeline because only one mount point / is available for testing.

The following are the changes made by this PR:

  1. All input paths are canonicalized, so relative paths and symlinks will start working now.
    E.g. df . df ../ df symlink will work successfully
  2. Mount points are matched to input_path by prefix matching and longest one is selected
    E.g. df /root/....../long_path/.... will point to correct mount_point
  3. Mount points are printed in the order of input paths
    E.g. df / /boot /boot/efi df /boot/efi /boot / will have different output
  4. If input path is repeated, it's subsequent matched entry is also repeated
    E.g. df / / / will print 3 entries
  5. If no paths are passed as arguments to df then, filtered mount list is printed preserving the mtab order
    E.g. df
  6. GNU Test: df/df-symlink.sh is evaluated properly. Main branch code falsely passes by printing no output.
  7. df will start passing GNU Test: df/df-symlink.sh when this PR and df: implement the --output command-line argument #3176 are both merged to main branch

Copy link
Collaborator

@jfinkels jfinkels left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to add unknown words to the spell-check:ignore line at the head of the file.

It's hard to say if anything is missing here without tests (which are understandably difficult to create); in the future pull request #3167 demonstrates how we can at least test that these internal helper functions are consistent.

result.push(mi);
}
}
result
}

/// Assign 1 `MountInfo` entry to each path
/// `lofs` enries are skipped and dummy mount points are skipped
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does "lofs" mean?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here lofs is for Solaris style loopback filesystem which is not present in linux. It's present in Solaris and FreeBSD and is similar to symlink.

@crazystylus
Copy link
Contributor Author

I had to remove other test case also as it's only working in Linux and fails in Windows. Hence I think unit test cases is the best approach.

@tertsdiepraam
Copy link
Member

Is that test supposed to only work on unix? Because if so, you could just put #[cfg(unix)] on it. It'd also be great if you added the explanation about lofs as a comment in the code.

@crazystylus
Copy link
Contributor Author

For some reason dd is failing in windows, but lofs explanation has been added as comments and have added #[cfg(unix)] for the testcase.

@jfinkels jfinkels mentioned this pull request Mar 2, 2022
@crazystylus
Copy link
Contributor Author

Please let me know if any other changes are required in this PR for getting it merged.

@sylvestre sylvestre merged commit 5c5f4ca into uutils:main Mar 11, 2022
@crazystylus crazystylus deleted the df-failed-to-print-fs-info branch March 12, 2022 04:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

df: fails to print filesystem when argument is .
4 participants