Problems reading filenames with accents on Windows #1345

Closed
wilsonfreitas opened this issue Jan 2, 2022 · 6 comments

@wilsonfreitas

I get an empty data frame when I read a CSV file with an accent in its name.
Once I remove the accent, the function works correctly.

Because the problem happens on my local file system, I didn't know how to generate a reprex.
A screenshot with an example is attached.

image

Here is the code.

readr::read_csv("Renda Fixa Pré.csv")

readr::read_csv("Renda Fixa Pre.csv")

fs::dir_ls(".", regexp = "csv$")

The csv file can be downloaded here.

@jennybc
Member

jennybc commented Jan 3, 2022

Sidebar re: this:

"As the problem happens in my local file system so I didn't know how to generate a reprex."

reprex(wd = ".") is a good option when you really must demo something using local files.

@wilsonfreitas
Author

Thanks a lot, @jennybc.
I have now generated the reprex.

The filename with the accent returns an empty data frame, while the same file under a name without the accent is read correctly.

Is there something I am missing?

readr::read_csv("Renda Fixa Pré.csv")
#> Rows: 0 Columns: 0
#> 
#> i Use `spec()` to retrieve the full column specification for this data.
#> i Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> # A tibble: 0 x 0

readr::read_csv("Renda Fixa Pre.csv")
#> Rows: 15 Columns: 5
#> -- Column specification --------------------------------------------------------
#> Delimiter: ","
#> chr (2): Nome, Tipo
#> dbl (2): Prazo, InvestimentoInicial
#> 
#> i Use `spec()` to retrieve the full column specification for this data.
#> i Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> # A tibble: 15 x 5
#>    Nome                                       Taxa Tipo  Prazo InvestimentoInic~
#>    <chr>                                     <dbl> <chr> <dbl>             <dbl>
#>  1 CDB Caruana Pre                              22 PRÉ     365              1000
#>  2 CDB Fator Pré-fixado                         22 PRÉ     365              5000
#>  3 CDB NBC Pré-fixado                           22 PRÉ     365              1000
#>  4 CDB BDMG Pré                                229 PRÉ     365             10000
#>  5 CDB BRPartners Pre                          233 PRÉ     365             20000
#>  6 CDB BCG Brasil Pré-fixado                   251 PRÉ     365            100000
#>  7 CDB Pine Pré-fixado                         258 PRÉ     365              5000
#>  8 CDB Modal Pré-fixado                        279 PRÉ     365              1000
#>  9 CDB MAXINVEST-RNX PRE                        29 PRÉ     365              1000
#> 10 CDB Agibank Pré-Fixado                      301 PRÉ     365              1000
#> 11 CDB Banco Industrial do Brasil Pré-fixado   309 PRÉ     365              5000
#> 12 CDB Luso Pre                                 31 PRÉ     365              5000
#> 13 CDB Omni Pré-Fixado                          32 PRÉ     365              5000
#> 14 CDB Avista Pre                              369 PRÉ     365              1000
#> 15 LC Avista Pre                               369 PRÉ     365              1000

fs::dir_ls(".", regexp = "csv$")
#> Renda Fixa Pre.csv  Renda Fixa Pré.csv

Created on 2022-01-03 by the reprex package (v2.0.1)

@jennybc
Member

jennybc commented Jan 3, 2022

I'm still looking into the path issue, to see if I can reproduce it. I have a lot of upgrading to do on my Windows VM....

@jennybc
Member

jennybc commented Jan 3, 2022

I see what you see, FYI, on Windows. I'm not really working on readr at the moment, but I might add a bit more analysis while I'm here.

I note that a workaround is to explicitly call fs::path_tidy(). (I also note there's evidence of something fishy here with fs, as the result of fs::dir_ls() contains some mojibake.)

library(readr)
library(fs)

read_csv(path_tidy("C:/Users/jenny/Downloads/Renda Fixa Pré.csv"))
#> Rows: 15 Columns: 5
#> -- Column specification --------------------------------------------------------
#> Delimiter: ","
#> chr (2): Nome, Tipo
#> dbl (2): Prazo, InvestimentoInicial
#> 
#> i Use `spec()` to retrieve the full column specification for this data.
#> i Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> # A tibble: 15 x 5
#>    Nome                                       Taxa Tipo  Prazo InvestimentoInic~
#>    <chr>                                     <dbl> <chr> <dbl>             <dbl>
#>  1 CDB Caruana Pre                              22 PRÉ     365              1000
#>  2 CDB Fator Pré-fixado                         22 PRÉ     365              5000
#>  3 CDB NBC Pré-fixado                           22 PRÉ     365              1000
#>  4 CDB BDMG Pré                                229 PRÉ     365             10000
#>  5 CDB BRPartners Pre                          233 PRÉ     365             20000
#>  6 CDB BCG Brasil Pré-fixado                   251 PRÉ     365            100000
#>  7 CDB Pine Pré-fixado                         258 PRÉ     365              5000
#>  8 CDB Modal Pré-fixado                        279 PRÉ     365              1000
#>  9 CDB MAXINVEST-RNX PRE                        29 PRÉ     365              1000
#> 10 CDB Agibank Pré-Fixado                      301 PRÉ     365              1000
#> 11 CDB Banco Industrial do Brasil Pré-fixado   309 PRÉ     365              5000
#> 12 CDB Luso Pre                                 31 PRÉ     365              5000
#> 13 CDB Omni Pré-Fixado                          32 PRÉ     365              5000
#> 14 CDB Avista Pre                              369 PRÉ     365              1000
#> 15 LC Avista Pre                               369 PRÉ     365              1000

Created on 2022-01-03 by the reprex package (v2.0.1)

@DavisVaughan
Member

DavisVaughan commented Jan 3, 2022

vroom is probably missing a call to enc2utf8(file), possibly in vroom:::standardise_path()

(That happens in path_tidy() through fs:::new_fs_path())
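
To illustrate what such a call would do, here is a minimal base R sketch. It is not vroom's or fs's actual code: it only shows enc2utf8() re-encoding a path that arrives in a non-UTF-8 encoding. The latin1 conversion is an assumption, used here to simulate the kind of string that might occur on Windows.

```r
# Build the accented file name with an escape so this script stays pure ASCII.
path <- "Renda Fixa Pr\u00e9.csv"

# Assumption for illustration: simulate a path stored in latin1 rather than
# UTF-8, which is one way a path may reach a reader function on Windows.
latin1_path <- iconv(path, from = "UTF-8", to = "latin1")

# enc2utf8() converts the string to UTF-8 and marks it as such.
utf8_path <- enc2utf8(latin1_path)

# The two strings show the same text but are stored as different bytes;
# after enc2utf8() the bytes match the original UTF-8 path again.
print(charToRaw(latin1_path))
print(charToRaw(utf8_path))
print(identical(charToRaw(utf8_path), charToRaw(path)))
```

Code that hands the raw bytes of the path to the file system without this step could end up looking for a differently named file, which would be consistent with the empty result reported above.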

@jennybc
Member

jennybc commented Jan 3, 2022

Yeah this is a vroom issue. Moving it there. Manually, I guess.

@wilsonfreitas Another workaround for you, until this is fixed in vroom, is to specifically request the first edition of readr.

library(readr)

with_edition(
  1,
  read_csv("C:/Users/jenny/Downloads/Renda Fixa Pré.csv")
)
#> 
#> -- Column specification --------------------------------------------------------
#> cols(
#>   Nome = col_character(),
#>   Taxa = col_number(),
#>   Tipo = col_character(),
#>   Prazo = col_double(),
#>   InvestimentoInicial = col_double()
#> )
#> # A tibble: 15 x 5
#>    Nome                                       Taxa Tipo  Prazo InvestimentoInic~
#>    <chr>                                     <dbl> <chr> <dbl>             <dbl>
#>  1 CDB Caruana Pre                              22 PRÉ     365              1000
#>  2 CDB Fator Pré-fixado                         22 PRÉ     365              5000
#>  3 CDB NBC Pré-fixado                           22 PRÉ     365              1000
#>  4 CDB BDMG Pré                                229 PRÉ     365             10000
#>  5 CDB BRPartners Pre                          233 PRÉ     365             20000
#>  6 CDB BCG Brasil Pré-fixado                   251 PRÉ     365            100000
#>  7 CDB Pine Pré-fixado                         258 PRÉ     365              5000
#>  8 CDB Modal Pré-fixado                        279 PRÉ     365              1000
#>  9 CDB MAXINVEST-RNX PRE                        29 PRÉ     365              1000
#> 10 CDB Agibank Pré-Fixado                      301 PRÉ     365              1000
#> 11 CDB Banco Industrial do Brasil Pré-fixado   309 PRÉ     365              5000
#> 12 CDB Luso Pre                                 31 PRÉ     365              5000
#> 13 CDB Omni Pré-Fixado                          32 PRÉ     365              5000
#> 14 CDB Avista Pre                              369 PRÉ     365              1000
#> 15 LC Avista Pre                               369 PRÉ     365              1000

Created on 2022-01-03 by the reprex package (v2.0.1)
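
If wrapping every call in with_edition() gets tedious, readr also exports local_edition(), which switches the edition for the rest of the calling function. A rough sketch, assuming readr 2.0 or later is installed; the helper name read_csv_first_edition and the temporary file are hypothetical, for illustration only:

```r
library(readr)

# Hypothetical helper: uses the first-edition parser only for the duration
# of this call. local_edition() resets the edition when the function exits.
read_csv_first_edition <- function(path, ...) {
  local_edition(1)
  read_csv(path, ...)
}

# Illustrative usage with a small temporary file (ASCII name, made-up path).
tmp <- tempfile(fileext = ".csv")
writeLines(c("Nome,Prazo", "CDB Caruana Pre,365"), tmp)
dados <- read_csv_first_edition(tmp)
print(dados)
```

For the file in this issue, the call would be read_csv_first_edition("Renda Fixa Pré.csv").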
