dir_ls() trips over non-ascii file names when native encoding isn't UTF-8 #366
Comments
Yes, I bumped into this too (version 1.5.2).
The same problem exists for me when working with Polish characters in filenames [Windows 10, fs package
Probably not surprisingly, but the same goes for
library(fs)
fs::file_touch("bär")
dir()
#> [1] "bär" "well-mice_reprex.R"
#> [3] "well-mice_reprex.spin.R" "well-mice_reprex.spin.Rmd"
fs::dir_info()
#> # A tibble: 4 x 18
#> path type size permissions modification_time user group device_id
#> <fs::path> <fct> <fs:> <fs::perms> <dttm> <chr> <chr> <dbl>
#> 1 bär <NA> NA --- NA <NA> <NA> NA
#> 2 ~ce_reprex.R file 227 rw- 2022-02-03 20:06:41 <NA> <NA> 2.22e9
#> 3 ~prex.spin.R file 227 rw- 2022-02-03 20:06:43 <NA> <NA> 2.22e9
#> 4 ~ex.spin.Rmd file 855 rw- 2022-02-03 20:06:43 <NA> <NA> 2.22e9
#> # ... with 10 more variables: hard_links <dbl>, special_device_id <dbl>,
#> # inode <dbl>, block_size <dbl>, blocks <dbl>, flags <int>, generation <dbl>,
#> # access_time <dttm>, change_time <dttm>, birth_time <dttm>
Session info
sessioninfo::session_info()
#> - Session info ---------------------------------------------------------------
#> setting value
#> version R version 4.1.2 (2021-11-01)
#> os Windows 10 x64 (build 19043)
#> system x86_64, mingw32
#> ui RTerm
#> language en
#> collate German_Germany.1252
#> ctype German_Germany.1252
#> tz Europe/Berlin
#> date 2022-02-03
#> pandoc 2.17.1.1 @ C:/PROGRA~1/Pandoc/ (via rmarkdown)
#>
#> - Packages -------------------------------------------------------------------
#> package * version date (UTC) lib source
#> backports 1.4.1 2021-12-13 [1] CRAN (R 4.1.2)
#> cli 3.1.1 2022-01-20 [1] CRAN (R 4.1.2)
#> crayon 1.4.2 2021-10-29 [1] CRAN (R 4.1.1)
#> digest 0.6.29 2021-12-01 [1] CRAN (R 4.1.2)
#> ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.1.0)
#> evaluate 0.14 2019-05-28 [1] CRAN (R 4.1.0)
#> fansi 1.0.2 2022-01-14 [1] CRAN (R 4.1.2)
#> fastmap 1.1.0 2021-01-25 [1] CRAN (R 4.1.0)
#> fs * 1.5.2.9000 2022-02-03 [1] Github (r-lib/fs@6d1182f)
#> glue 1.6.1 2022-01-22 [1] CRAN (R 4.1.2)
#> highr 0.9 2021-04-16 [1] CRAN (R 4.1.0)
#> htmltools 0.5.2 2021-08-25 [1] CRAN (R 4.1.1)
#> knitr 1.37 2021-12-16 [1] CRAN (R 4.1.2)
#> lifecycle 1.0.1 2021-09-24 [1] CRAN (R 4.1.1)
#> magrittr 2.0.2 2022-01-26 [1] CRAN (R 4.1.2)
#> pillar 1.6.5 2022-01-25 [1] CRAN (R 4.1.2)
#> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.1.0)
#> purrr 0.3.4 2020-04-17 [1] CRAN (R 4.1.0)
#> R.cache 0.15.0 2021-04-30 [1] CRAN (R 4.1.0)
#> R.methodsS3 1.8.1 2020-08-26 [1] CRAN (R 4.1.0)
#> R.oo 1.24.0 2020-08-26 [1] CRAN (R 4.1.0)
#> R.utils 2.11.0 2021-09-26 [1] CRAN (R 4.1.1)
#> reprex 2.0.1 2021-08-05 [1] CRAN (R 4.1.0)
#> rlang 1.0.0 2022-01-26 [1] CRAN (R 4.1.2)
#> rmarkdown 2.11 2021-09-14 [1] CRAN (R 4.1.1)
#> rstudioapi 0.13 2020-11-12 [1] CRAN (R 4.1.0)
#> sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.1.2)
#> stringi 1.7.6 2021-11-29 [1] CRAN (R 4.1.2)
#> stringr 1.4.0 2019-02-10 [1] CRAN (R 4.1.0)
#> styler 1.6.2 2021-09-23 [1] CRAN (R 4.1.1)
#> tibble 3.1.6 2021-11-07 [1] CRAN (R 4.1.2)
#> utf8 1.2.2 2021-07-24 [1] CRAN (R 4.1.0)
#> vctrs 0.3.8 2021-04-29 [1] CRAN (R 4.1.0)
#> withr 2.4.3 2021-11-30 [1] CRAN (R 4.1.2)
#> xfun 0.29 2021-12-14 [1] CRAN (R 4.1.2)
#> yaml 2.2.2 2022-01-25 [1] CRAN (R 4.1.2)
#>
#> [1] C:/Users/Daniel.AK-HAMBURG/Documents/R/win-library/4.1
#> [2] C:/Program Files/R/R-4.1.2/library
#>
#> ------------------------------------------------------------------------------
I ran into the same issue today with some folders named after the months in German. I just updated to version 1.5.2 and didn't have those issues before. As there is already plenty of code explaining the issue, I decided not to provide any more. Sorry!
Sorry for the off-topic: I'm having the same issues, but I can't compile v1.5.0 so that my code works. Any suggestions on how to get it done? Win10 machine here. Or someone "simply" fixes this bug ;)
You can install it from the versioned repository
seems the link is invalid, instead
I was unable to use I was able to install version 1.5.0 using
Are there any plans for this to be addressed? Unfortunately I don't have the skill to work with the C++ code, otherwise I'd give it a go myself.
Me again: I can't get it installed on R 4.3.0. Do you know a way, or have these issues been solved in the meantime with 1.6.2?
FWIW this seems to be "fixed" for the ucrt builds of R 4.3 (and probably 4.2?) on Windows, because that has UTF-8 as native encoding.
library(fs)
library(testthat)
twd <- path_temp(pattern = "dir-ls-reprex")
dir_create(twd)
owd <- setwd(twd)
cat("blah\n", file = "äçé")
(native <- list.files())
#> [1] "äçé"
(from_fs <- dir_ls())
#> äçé
Encoding(from_fs)
#> [1] "UTF-8"
(correct <- iconv(native, to = "UTF-8"))
#> [1] "äçé"
local({
local_edition(3)
expect_equal(
charToRaw(correct),
charToRaw(from_fs)
)
})
# dir_info()
fs::file_touch("bär")
dir()
#> [1] "äçé" "bär"
fs::dir_info()
#> # A tibble: 2 × 18
#> path type size permissions modification_time user group device_id
#> <fs::path> <fct> <fs::b> <fs::perms> <dttm> <chr> <chr> <dbl>
#> 1 bär file 0 rw- 2023-06-02 14:55:22 <NA> <NA> 2.22e9
#> 2 äçé file 6 rw- 2023-06-02 14:55:22 <NA> <NA> 2.22e9
#> # ℹ 10 more variables: hard_links <dbl>, special_device_id <dbl>, inode <dbl>,
#> # block_size <dbl>, blocks <dbl>, flags <int>, generation <dbl>,
#> # access_time <dttm>, change_time <dttm>, birth_time <dttm>
setwd(owd)
dir_delete(twd)
Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#> setting value
#> version R version 4.3.0 (2023-04-21 ucrt)
#> os Windows 10 x64 (build 19044)
#> system x86_64, mingw32
#> ui RTerm
#> language en
#> collate German_Germany.utf8
#> ctype German_Germany.utf8
#> tz Europe/Berlin
#> date 2023-06-02
#> pandoc 3.1.2 @ C:/PROGRA~1/Pandoc/ (via rmarkdown)
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#> package * version date (UTC) lib source
#> brio 1.1.3 2021-11-30 [1] CRAN (R 4.3.0)
#> cli 3.6.1 2023-03-23 [1] CRAN (R 4.3.0)
#> crayon 1.5.2 2022-09-29 [1] CRAN (R 4.3.0)
#> digest 0.6.31 2022-12-11 [1] CRAN (R 4.3.0)
#> evaluate 0.21 2023-05-05 [1] CRAN (R 4.3.0)
#> fansi 1.0.4 2023-01-22 [1] CRAN (R 4.3.0)
#> fastmap 1.1.1 2023-02-24 [1] CRAN (R 4.3.0)
#> fs * 1.6.2 2023-04-25 [1] CRAN (R 4.3.0)
#> glue 1.6.2 2022-02-24 [1] CRAN (R 4.3.0)
#> htmltools 0.5.5 2023-03-23 [1] CRAN (R 4.3.0)
#> knitr 1.43 2023-05-25 [1] CRAN (R 4.3.0)
#> lifecycle 1.0.3 2022-10-07 [1] CRAN (R 4.3.0)
#> magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.3.0)
#> pillar 1.9.0 2023-03-22 [1] CRAN (R 4.3.0)
#> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.3.0)
#> purrr 1.0.1 2023-01-10 [1] CRAN (R 4.3.0)
#> R.cache 0.16.0 2022-07-21 [1] CRAN (R 4.3.0)
#> R.methodsS3 1.8.2 2022-06-13 [1] CRAN (R 4.3.0)
#> R.oo 1.25.0 2022-06-12 [1] CRAN (R 4.3.0)
#> R.utils 2.12.2 2022-11-11 [1] CRAN (R 4.3.0)
#> R6 2.5.1 2021-08-19 [1] CRAN (R 4.3.0)
#> reprex 2.0.2 2022-08-17 [1] CRAN (R 4.3.0)
#> rlang 1.1.1 2023-04-28 [1] CRAN (R 4.3.0)
#> rmarkdown 2.21 2023-03-26 [1] CRAN (R 4.3.0)
#> rstudioapi 0.14 2022-08-22 [1] CRAN (R 4.3.0)
#> sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.3.0)
#> styler 1.10.0 2023-05-24 [1] CRAN (R 4.3.0)
#> testthat * 3.1.8 2023-05-04 [1] CRAN (R 4.3.0)
#> tibble 3.2.1 2023-03-20 [1] CRAN (R 4.3.0)
#> utf8 1.2.3 2023-01-31 [1] CRAN (R 4.3.0)
#> vctrs 0.6.2 2023-04-19 [1] CRAN (R 4.3.0)
#> waldo 0.5.1 2023-05-08 [1] CRAN (R 4.3.0)
#> withr 2.5.0 2022-03-03 [1] CRAN (R 4.3.0)
#> xfun 0.39 2023-04-20 [1] CRAN (R 4.3.0)
#> yaml 2.3.7 2023-01-23 [1] CRAN (R 4.3.0)
#>
#> [1] C:/Users/Daniel/AppData/Local/R/win-library/4.3
#> [2] C:/Program Files/R/R-4.3.0/library
#>
#> ──────────────────────────────────────────────────────────────────────────────
Perfect - thanks a lot.
Discovered while studying tidyverse/readr#1345.
dir_map() (on the C/C++ side) seems to assume libuv is giving it UTF-8 paths, but that's not true on Windows (where I made this reprex).
Created on 2022-01-03 by the reprex package (v2.0.1)
As a meta point, I think the dir_ls() bug is also why I can't use fs to delete the temp directory in this reprex.
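For anyone wanting to check whether their session is in the affected situation, here is a minimal base R sketch. It is not a fix for fs and does not touch the fs C/C++ code; the helper names `native_is_utf8()` and `list_files_utf8()` are made up for illustration. It only reports whether the native encoding is UTF-8 and lists files with base R, converting the names to UTF-8 afterwards.

```r
# Hypothetical helper: TRUE if the session's native encoding is UTF-8.
# l10n_info() is base R; its "UTF-8" element is a single logical.
native_is_utf8 <- function() {
  isTRUE(l10n_info()[["UTF-8"]])
}

# Hypothetical helper: list files with base R, then convert the names
# from the native encoding to UTF-8. Pure ASCII names are unchanged.
list_files_utf8 <- function(path = ".") {
  enc2utf8(list.files(path))
}

# Small demonstration in a throwaway directory (ASCII name only, so it
# behaves the same regardless of locale).
tmp <- tempfile("encoding-check")
dir.create(tmp)
file.create(file.path(tmp, "plain.txt"))

print(native_is_utf8())
print(list_files_utf8(tmp))
print(Encoding(list_files_utf8(tmp)))

unlink(tmp, recursive = TRUE)
```

If `native_is_utf8()` returns FALSE on Windows, the session is in the configuration this issue describes, and comparing `list_files_utf8()` against `dir_ls()` on a directory with non-ASCII names should show whether the two disagree.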