dir_ls() trips over non-ascii file names when native encoding isn't UTF-8 #366

Open
jennybc opened this issue Jan 4, 2022 · 12 comments
Labels: bug (an unexpected problem or unintended behavior)

Comments

@jennybc
Member

jennybc commented Jan 4, 2022

Discovered while studying tidyverse/readr#1345.

dir_map() (on the C/C++ side) seems to assume libuv is giving it UTF-8 paths, but that's not true on Windows (where I made this reprex).

library(fs)
library(testthat)

twd <- path_temp(pattern = "dir-ls-reprex")
dir_create(twd)
owd <- setwd(twd)

cat("blah\n", file = "äçé")

(native <- list.files())
#> [1] "äçé"
(from_fs <- dir_ls())
#> Ã¤Ã§Ã©

# this mojibake is erroneously marked as UTF-8
Encoding(from_fs)
#> [1] "UTF-8"

# here are the bytes I'm expecting
(correct <- iconv(native, to = "UTF-8"))
#> [1] "äçé"

local({
  local_edition(3)
  expect_equal(
    charToRaw(correct),
    charToRaw(from_fs)
  )
})
#> Error: charToRaw(correct) (`actual`) not equal to charToRaw(from_fs) (`expected`).
#> 
#> `actual`:   "c3" "a4" "c3" "a7" "c3" "a9"                     and 2 more...
#> `expected`: "c3" "83" "c2" "a4" "c3" "83" "c2" "a7" "c3" "83" ...

setwd(owd)
dir_delete(twd)
#> Error: [ENOENT] Failed to remove 'C:/Users/jenny/AppData/Local/Temp/RtmpeEIdP0/dir-ls-reprex/äçé': no such file or directory

Created on 2022-01-03 by the reprex package (v2.0.1)

As a meta point, I think the dir_ls() bug is also why fs can't delete the temp directory at the end of this reprex.
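
The byte pattern in the failed expectation above matches what you get when UTF-8 bytes are read as Latin-1 and then converted to UTF-8 a second time. The following is a minimal base R sketch of that double encoding, not the actual code path inside fs; the variable name `double_encoded` is made up for illustration.

```r
# The file name from the reprex, written with \u escapes so that it is
# UTF-8 regardless of the session's locale.
correct <- "\u00e4\u00e7\u00e9"
charToRaw(correct)
#> [1] c3 a4 c3 a7 c3 a9

# Pretend those UTF-8 bytes are Latin-1 and convert them to UTF-8 again.
# iconv() works on the raw bytes and ignores the declared encoding mark.
double_encoded <- iconv(correct, from = "latin1", to = "UTF-8")
charToRaw(double_encoded)
#> [1] c3 83 c2 a4 c3 83 c2 a7 c3 83 c2 a9

# The result is marked as UTF-8, just like the dir_ls() output.
Encoding(double_encoded)
#> [1] "UTF-8"
```

These are the same leading bytes that show up in the `expected` row of the testthat failure.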

@billy34

billy34 commented Jan 18, 2022

Yes, I bumped into this too (version 1.5.2).
Version 1.5.1 does not work either, but 1.5.0 does not have this problem.

@hbaniecki

The same problem exists for me when working with Polish characters in filenames [Windows 10, fs package v1.5.2]. Fortunately, it works with v1.5.0.

gaborcsardi added the bug label (an unexpected problem or unintended behavior) on Jan 18, 2022

@dpprdan

dpprdan commented Feb 3, 2022

Probably not surprising, but the same goes for dir_info():

library(fs)
fs::file_touch("bär")
dir()
#> [1] "bär"                       "well-mice_reprex.R"       
#> [3] "well-mice_reprex.spin.R"   "well-mice_reprex.spin.Rmd"
fs::dir_info()
#> # A tibble: 4 x 18
#>   path         type   size permissions modification_time   user  group device_id
#>   <fs::path>   <fct> <fs:> <fs::perms> <dttm>              <chr> <chr>     <dbl>
#> 1 bär         <NA>     NA ---         NA                  <NA>  <NA>    NA     
#> 2 ~ce_reprex.R file    227 rw-         2022-02-03 20:06:41 <NA>  <NA>     2.22e9
#> 3 ~prex.spin.R file    227 rw-         2022-02-03 20:06:43 <NA>  <NA>     2.22e9
#> 4 ~ex.spin.Rmd file    855 rw-         2022-02-03 20:06:43 <NA>  <NA>     2.22e9
#> # ... with 10 more variables: hard_links <dbl>, special_device_id <dbl>,
#> #   inode <dbl>, block_size <dbl>, blocks <dbl>, flags <int>, generation <dbl>,
#> #   access_time <dttm>, change_time <dttm>, birth_time <dttm>
Session info
sessioninfo::session_info()
#> - Session info ---------------------------------------------------------------
#>  setting  value
#>  version  R version 4.1.2 (2021-11-01)
#>  os       Windows 10 x64 (build 19043)
#>  system   x86_64, mingw32
#>  ui       RTerm
#>  language en
#>  collate  German_Germany.1252
#>  ctype    German_Germany.1252
#>  tz       Europe/Berlin
#>  date     2022-02-03
#>  pandoc   2.17.1.1 @ C:/PROGRA~1/Pandoc/ (via rmarkdown)
#> 
#> - Packages -------------------------------------------------------------------
#>  package     * version    date (UTC) lib source
#>  backports     1.4.1      2021-12-13 [1] CRAN (R 4.1.2)
#>  cli           3.1.1      2022-01-20 [1] CRAN (R 4.1.2)
#>  crayon        1.4.2      2021-10-29 [1] CRAN (R 4.1.1)
#>  digest        0.6.29     2021-12-01 [1] CRAN (R 4.1.2)
#>  ellipsis      0.3.2      2021-04-29 [1] CRAN (R 4.1.0)
#>  evaluate      0.14       2019-05-28 [1] CRAN (R 4.1.0)
#>  fansi         1.0.2      2022-01-14 [1] CRAN (R 4.1.2)
#>  fastmap       1.1.0      2021-01-25 [1] CRAN (R 4.1.0)
#>  fs          * 1.5.2.9000 2022-02-03 [1] Github (r-lib/fs@6d1182f)
#>  glue          1.6.1      2022-01-22 [1] CRAN (R 4.1.2)
#>  highr         0.9        2021-04-16 [1] CRAN (R 4.1.0)
#>  htmltools     0.5.2      2021-08-25 [1] CRAN (R 4.1.1)
#>  knitr         1.37       2021-12-16 [1] CRAN (R 4.1.2)
#>  lifecycle     1.0.1      2021-09-24 [1] CRAN (R 4.1.1)
#>  magrittr      2.0.2      2022-01-26 [1] CRAN (R 4.1.2)
#>  pillar        1.6.5      2022-01-25 [1] CRAN (R 4.1.2)
#>  pkgconfig     2.0.3      2019-09-22 [1] CRAN (R 4.1.0)
#>  purrr         0.3.4      2020-04-17 [1] CRAN (R 4.1.0)
#>  R.cache       0.15.0     2021-04-30 [1] CRAN (R 4.1.0)
#>  R.methodsS3   1.8.1      2020-08-26 [1] CRAN (R 4.1.0)
#>  R.oo          1.24.0     2020-08-26 [1] CRAN (R 4.1.0)
#>  R.utils       2.11.0     2021-09-26 [1] CRAN (R 4.1.1)
#>  reprex        2.0.1      2021-08-05 [1] CRAN (R 4.1.0)
#>  rlang         1.0.0      2022-01-26 [1] CRAN (R 4.1.2)
#>  rmarkdown     2.11       2021-09-14 [1] CRAN (R 4.1.1)
#>  rstudioapi    0.13       2020-11-12 [1] CRAN (R 4.1.0)
#>  sessioninfo   1.2.2      2021-12-06 [1] CRAN (R 4.1.2)
#>  stringi       1.7.6      2021-11-29 [1] CRAN (R 4.1.2)
#>  stringr       1.4.0      2019-02-10 [1] CRAN (R 4.1.0)
#>  styler        1.6.2      2021-09-23 [1] CRAN (R 4.1.1)
#>  tibble        3.1.6      2021-11-07 [1] CRAN (R 4.1.2)
#>  utf8          1.2.2      2021-07-24 [1] CRAN (R 4.1.0)
#>  vctrs         0.3.8      2021-04-29 [1] CRAN (R 4.1.0)
#>  withr         2.4.3      2021-11-30 [1] CRAN (R 4.1.2)
#>  xfun          0.29       2021-12-14 [1] CRAN (R 4.1.2)
#>  yaml          2.2.2      2022-01-25 [1] CRAN (R 4.1.2)
#> 
#>  [1] C:/Users/Daniel.AK-HAMBURG/Documents/R/win-library/4.1
#>  [2] C:/Program Files/R/R-4.1.2/library
#> 
#> ------------------------------------------------------------------------------
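
In the output above, base R's dir() still returns the name correctly while dir_info() returns NA for that row. One possible stopgap, assuming base R's file metadata is enough for your purposes, is to list the directory without fs. This is a hedged sketch only: `base_dir_info()` is a hypothetical helper, not part of fs, and it returns far fewer columns than dir_info().

```r
# Hypothetical helper, not part of fs: a directory listing built from base R only.
base_dir_info <- function(path = ".") {
  files <- list.files(path, full.names = TRUE)    # names as base R reports them
  info <- file.info(files, extra_cols = FALSE)    # size, isdir, mode, mtime, ctime, atime
  data.frame(
    path = enc2utf8(files),                       # convert the names to UTF-8 explicitly
    size = info$size,
    is_dir = info$isdir,
    modified = info$mtime,
    stringsAsFactors = FALSE
  )
}

base_dir_info()
```

enc2utf8() converts from the native encoding, which is the same direction as the `iconv(native, to = "UTF-8")` call in the original reprex.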

@FlorianMyronStork

I ran into the same issue today with some folders named after the months in German.
March is called "März", and dir_ls() throws an error:
Error: [ENOENT] Failed to search directory 'C:/some_folder/year_2022/month_März': no such file or directory

I just updated to version 1.5.2 and didn't have those issues before.

As there is already plenty of code explaining the issue, I decided not to provide any more. Sorry!

@matk111

matk111 commented Apr 13, 2022

Sorry for going off-topic: I'm having the same issue, but I can't compile v1.5.0 to get my code working. Any suggestions on how to get it done? I'm on a Windows 10 machine.

Or someone "simply" fixes this bug ;)

@billy34

billy34 commented Apr 13, 2022

You can install it from the versioned repository

# Install version 1.5.0 of the fs package, because the following versions (1.5.1 and 1.5.2) are buggy
install.packages("fs", repos = "https://packagemanager.rstudio.com/all/2021-11-30+Y3JhbiwyOjQ1MjYyMTU7Q0UyMzFCQTg")

@s609078902

s609078902 commented May 16, 2022

The link to the versioned repository suggested above seems to be invalid. Instead, you can use:

devtools::install_version('fs', '1.5.0')

@Mikea0228

I was unable to use devtools to install version 1.5.0: I first needed to remove the fs package, but devtools itself needs fs in order to call devtools::install_version().

I was able to install version 1.5.0 using

install.packages("https://cran.r-project.org/src/contrib/Archive/fs/fs_1.5.0.tar.gz",repos=NULL,type="source")
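
Before downgrading, it may help to check whether the installed version is one of those reported as affected in this thread. The sketch below only encodes what commenters here reported (1.5.1 and 1.5.2 affected, 1.5.0 working); it says nothing about other versions, and `is_reported_affected()` is a hypothetical helper, not an fs function.

```r
# Released versions reported as affected in this thread; 1.5.0 was reported to work.
reported_affected <- c("1.5.1", "1.5.2")

# Hypothetical helper: TRUE if `version` is one of the versions listed above.
is_reported_affected <- function(version) {
  as.character(version) %in% reported_affected
}

# Only look up the installed version if fs is actually installed.
if (requireNamespace("fs", quietly = TRUE)) {
  installed <- utils::packageVersion("fs")
  if (is_reported_affected(installed)) {
    message("fs ", installed, " was reported as affected in this thread")
  } else {
    message("fs ", installed, " is not among the versions reported here")
  }
}
```

Note that the reports above all come from Windows sessions, so a match does not necessarily mean the problem will show up on other platforms.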

@Mikea0228

Are there any plans for this to be addressed? Unfortunately, I don't have the skill to work with the C++ code; otherwise I'd give it a go myself.

@matk111

matk111 commented Jun 2, 2023

Me again: I can't get it installed on R 4.3.0. Do you know a way, or have these issues been solved in the meantime with 1.6.2?

@dpprdan

dpprdan commented Jun 2, 2023

FWIW this seems to be "fixed" for the ucrt builds of R 4.3 (and probably 4.2?) on Windows, because those have UTF-8 as the native encoding.

library(fs)
library(testthat)

twd <- path_temp(pattern = "dir-ls-reprex")
dir_create(twd)
owd <- setwd(twd)

cat("blah\n", file = "äçé")

(native <- list.files())
#> [1] "äçé"
(from_fs <- dir_ls())
#> äçé
Encoding(from_fs)
#> [1] "UTF-8"
(correct <- iconv(native, to = "UTF-8"))
#> [1] "äçé"
local({
  local_edition(3)
  expect_equal(
    charToRaw(correct),
    charToRaw(from_fs)
  )
})

# dir_info()
fs::file_touch("bär")
dir()
#> [1] "äçé" "bär"
fs::dir_info()
#> # A tibble: 2 × 18
#>   path       type     size permissions modification_time   user  group device_id
#>   <fs::path> <fct> <fs::b> <fs::perms> <dttm>              <chr> <chr>     <dbl>
#> 1 bär        file        0 rw-         2023-06-02 14:55:22 <NA>  <NA>     2.22e9
#> 2 äçé        file        6 rw-         2023-06-02 14:55:22 <NA>  <NA>     2.22e9
#> # ℹ 10 more variables: hard_links <dbl>, special_device_id <dbl>, inode <dbl>,
#> #   block_size <dbl>, blocks <dbl>, flags <int>, generation <dbl>,
#> #   access_time <dttm>, change_time <dttm>, birth_time <dttm>

setwd(owd)
dir_delete(twd)
Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.3.0 (2023-04-21 ucrt)
#>  os       Windows 10 x64 (build 19044)
#>  system   x86_64, mingw32
#>  ui       RTerm
#>  language en
#>  collate  German_Germany.utf8
#>  ctype    German_Germany.utf8
#>  tz       Europe/Berlin
#>  date     2023-06-02
#>  pandoc   3.1.2 @ C:/PROGRA~1/Pandoc/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version date (UTC) lib source
#>  brio          1.1.3   2021-11-30 [1] CRAN (R 4.3.0)
#>  cli           3.6.1   2023-03-23 [1] CRAN (R 4.3.0)
#>  crayon        1.5.2   2022-09-29 [1] CRAN (R 4.3.0)
#>  digest        0.6.31  2022-12-11 [1] CRAN (R 4.3.0)
#>  evaluate      0.21    2023-05-05 [1] CRAN (R 4.3.0)
#>  fansi         1.0.4   2023-01-22 [1] CRAN (R 4.3.0)
#>  fastmap       1.1.1   2023-02-24 [1] CRAN (R 4.3.0)
#>  fs          * 1.6.2   2023-04-25 [1] CRAN (R 4.3.0)
#>  glue          1.6.2   2022-02-24 [1] CRAN (R 4.3.0)
#>  htmltools     0.5.5   2023-03-23 [1] CRAN (R 4.3.0)
#>  knitr         1.43    2023-05-25 [1] CRAN (R 4.3.0)
#>  lifecycle     1.0.3   2022-10-07 [1] CRAN (R 4.3.0)
#>  magrittr      2.0.3   2022-03-30 [1] CRAN (R 4.3.0)
#>  pillar        1.9.0   2023-03-22 [1] CRAN (R 4.3.0)
#>  pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 4.3.0)
#>  purrr         1.0.1   2023-01-10 [1] CRAN (R 4.3.0)
#>  R.cache       0.16.0  2022-07-21 [1] CRAN (R 4.3.0)
#>  R.methodsS3   1.8.2   2022-06-13 [1] CRAN (R 4.3.0)
#>  R.oo          1.25.0  2022-06-12 [1] CRAN (R 4.3.0)
#>  R.utils       2.12.2  2022-11-11 [1] CRAN (R 4.3.0)
#>  R6            2.5.1   2021-08-19 [1] CRAN (R 4.3.0)
#>  reprex        2.0.2   2022-08-17 [1] CRAN (R 4.3.0)
#>  rlang         1.1.1   2023-04-28 [1] CRAN (R 4.3.0)
#>  rmarkdown     2.21    2023-03-26 [1] CRAN (R 4.3.0)
#>  rstudioapi    0.14    2022-08-22 [1] CRAN (R 4.3.0)
#>  sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.3.0)
#>  styler        1.10.0  2023-05-24 [1] CRAN (R 4.3.0)
#>  testthat    * 3.1.8   2023-05-04 [1] CRAN (R 4.3.0)
#>  tibble        3.2.1   2023-03-20 [1] CRAN (R 4.3.0)
#>  utf8          1.2.3   2023-01-31 [1] CRAN (R 4.3.0)
#>  vctrs         0.6.2   2023-04-19 [1] CRAN (R 4.3.0)
#>  waldo         0.5.1   2023-05-08 [1] CRAN (R 4.3.0)
#>  withr         2.5.0   2022-03-03 [1] CRAN (R 4.3.0)
#>  xfun          0.39    2023-04-20 [1] CRAN (R 4.3.0)
#>  yaml          2.3.7   2023-01-23 [1] CRAN (R 4.3.0)
#> 
#>  [1] C:/Users/Daniel/AppData/Local/R/win-library/4.3
#>  [2] C:/Program Files/R/R-4.3.0/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────
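
To see which situation a given session is in, you can ask R whether its native encoding is UTF-8. This is a small sketch using base R's l10n_info(); `native_is_utf8()` is a hypothetical wrapper name, and a TRUE result only suggests, based on the reports in this thread, that the problem is less likely to appear.

```r
# Hypothetical wrapper: TRUE when the session's native encoding is UTF-8.
native_is_utf8 <- function() {
  isTRUE(l10n_info()[["UTF-8"]])
}

native_is_utf8()

# The locale string shows the code page in use,
# e.g. "German_Germany.utf8" or "German_Germany.1252" in the session info above.
Sys.getlocale("LC_CTYPE")
```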

@matk111

matk111 commented Jun 2, 2023

Perfect - thanks a lot.
