decode code lines before passing them to the debugger when use utf8 is in effect #22656

Open
Lamprecht opened this issue Oct 10, 2024 · 0 comments

When I run the following code in the debugger, source lines containing UTF-8 literals are handled inconsistently with variables holding the same text:

Code test.pl


use strict;
use warnings;
use utf8;


my $var = 'Здравствуйте';

print $var;

Running this in a utf-8 terminal using the debugger:

perl -d test.pl

Loading DB routines from perl5db.pl version 1.81
Editor support available.

Enter h or 'h h' for help, or 'man perldebug' for more help.

main::(test.pl:7):	my $var = 'Здравствуйте';
  DB<1> n                                                                       
main::(test.pl:9):	print $var;
  DB<1> p $var                                                                  
Wide character in print at (eval 10)[/home/chris/perl5/perlbrew/perls/perl-5.41.4/lib/5.41.4/perl5db.pl:742] line 2.
 at (eval 10)[/home/chris/perl5/perlbrew/perls/perl-5.41.4/lib/5.41.4/perl5db.pl:742] line 2.
	eval 'no strict; ($@, $!, $^E, $,, $/, $\\, $^W) = @DB::saved;package main; $^D = $^D | $DB::db_stop;
print {$DB::OUT}   $var;
' called at /home/chris/perl5/perlbrew/perls/perl-5.41.4/lib/5.41.4/perl5db.pl line 742
	DB::eval called at /home/chris/perl5/perlbrew/perls/perl-5.41.4/lib/5.41.4/perl5db.pl line 3427
	DB::DB called at test.pl line 9
Здравствуйте
  DB<2> binmode $DB::OUT, ':utf8'                                               

  DB<3> p $var                                                                  
Здравствуйте
  DB<4> l 7                                                                     
7:	my $var = 'ÐдÑавÑÑвÑйÑе';
  DB<5> q

In my UTF-8 terminal, without changing the output layer of $DB::OUT, line 7 is displayed readably and without a warning (which is surprising), while the p $var debugger command emits a wide character warning (which is expected).
After changing the output layer with binmode $DB::OUT, ':utf8', the p $var command prints correctly but the l 7 command prints garbage.
The reason is that the utf8 pragma causes the value of $var to be decoded into characters, while the source line passed to the debugger is left as undecoded bytes.
I think the lines in the array passed to the debugger need to be decoded when the use utf8 pragma is in effect.
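
The mismatch can be reproduced outside the debugger. The following is a minimal sketch, not the debugger's actual code path: `render` and `hex_octets` are hypothetical helpers introduced only for illustration. The byte string stands in for the undecoded source line, and the character string stands in for `$var`.

```perl
use strict;
use warnings;
use Encode qw(decode);

# First two letters of the literal as raw UTF-8 octets (what the source file contains).
my $bytes = "\xD0\x97\xD0\xB4";
# The same text as a character string (what `use utf8` yields for $var).
my $chars = decode('UTF-8', $bytes);

# Hypothetical helper: print $string through $layer into memory
# and return the octets that were written.
sub render {
    my ($string, $layer) = @_;
    my $out = '';
    open my $fh, ">$layer", \$out or die "open failed: $!";
    print {$fh} $string;
    close $fh or die "close failed: $!";
    return $out;
}

# Hypothetical helper: show a byte string as space-separated hex octets.
sub hex_octets {
    return join ' ', map { sprintf '%02X', ord($_) } split(//, $_[0]);
}

my $line_raw  = render($bytes, ':raw');   # undecoded line, no encoding layer
my $var_utf8  = render($chars, ':utf8');  # decoded variable, :utf8 layer
my $line_utf8 = render($bytes, ':utf8');  # undecoded line, :utf8 layer

print "line, :raw   ", hex_octets($line_raw),  "\n";
print "var,  :utf8  ", hex_octets($var_utf8),  "\n";
print "line, :utf8  ", hex_octets($line_utf8), "\n";
```

The first two results are the original four octets. The third is eight octets, because each byte is treated as a Latin-1 character and encoded a second time, which is the kind of garbage that l 7 shows after the binmode call.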

If I apply the following patch to toke.c, I get the correct behavior for the case where use utf8 is in effect. Unfortunately, I don't know the perl internals well enough to check whether utf8 is in effect for the current line and apply the SV_CATUTF8 flag conditionally, but maybe this gets someone started.

diff --git a/toke.c b/toke.c
index 0ff92d2b25..5ca61e274c 100644
--- a/toke.c
+++ b/toke.c
@@ -2064,7 +2064,7 @@ S_update_debugger_info(pTHX_ SV *orig_sv, const char *const buf, STRLEN len)
         if (orig_sv)
             sv_catsv(sv, orig_sv);
         else
-            sv_catpvn(sv, buf, len);
+            sv_catpvn_flags(sv, buf, len, SV_CATUTF8 | SV_GMAGIC );
         if (!SvIOK(sv)) {
             (void)SvIOK_on(sv);
             SvIV_set(sv, 0);

Running the example with the above patch:

perl-dev git:(blead) ✗ ./perl -I './lib' -d test.pl

Loading DB routines from perl5db.pl version 1.81
Editor support available.

Enter h or 'h h' for help, or 'man perldebug' for more help.

Wide character in print at lib/perl5db.pl line 6244.
 at lib/perl5db.pl line 6244.
	DB::print_lineinfo("main::(test.pl:7):\x{9}my \$var = '\x{417}\x{434}\x{440}\x{430}\x{432}\x{441}\x{442}\x{432}\x{443}\x{439}\x{442}\x{435}';\x{a}") called at lib/perl5db.pl line 4624
	DB::depth_print_lineinfo(1, "main::(test.pl:7):\x{9}my \$var = '\x{417}\x{434}\x{440}\x{430}\x{432}\x{441}\x{442}\x{432}\x{443}\x{439}\x{442}\x{435}';\x{a}") called at lib/perl5db.pl line 3593
	DB::Obj::_my_print_lineinfo(DB::Obj=HASH(0x5637dd8ee088), 7, "main::(test.pl:7):\x{9}my \$var = '\x{417}\x{434}\x{440}\x{430}\x{432}\x{441}\x{442}\x{432}\x{443}\x{439}\x{442}\x{435}';\x{a}") called at lib/perl5db.pl line 3680
	DB::Obj::_DB__grab_control(DB::Obj=HASH(0x5637dd8ee088)) called at lib/perl5db.pl line 2981
	DB::DB called at test.pl line 7
main::(test.pl:7):	my $var = 'Здравствуйте';
  DB<1> n
main::(test.pl:9):	print $var;
  DB<1> p $var
Wide character in print at (eval 9)[lib/perl5db.pl:742] line 2.
 at (eval 9)[lib/perl5db.pl:742] line 2.
	eval 'no strict; ($@, $!, $^E, $,, $/, $\\, $^W) = @DB::saved;package main; $^D = $^D | $DB::db_stop;
print {$DB::OUT}   $var;
' called at lib/perl5db.pl line 742
	DB::eval called at lib/perl5db.pl line 3427
	DB::DB called at test.pl line 9
Здравствуйте
  DB<2> binmode $DB::OUT, ':utf8'

  DB<3> p $var  
Здравствуйте
  DB<4> l 7
7:	my $var = 'Здравствуйте';
  DB<5> q

You can see that now, without changing the output layer, the source line display (the same data that the l 7 command prints) as well as the p $var command print wide character warnings. Switching the output layer to :utf8 fixes both of them.
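
The conditional decoding that the patch does not yet implement can be modeled at the Perl level. This is only a sketch under stated assumptions: `decode_source_lines` is a hypothetical function, and its `$utf8_in_effect` argument stands in for whatever check toke.c would use to tell whether use utf8 applies to a line. The real fix would have to be made in C.

```perl
use strict;
use warnings;

# Hypothetical Perl-level model of conditional decoding of source lines.
# $utf8_in_effect is an assumed flag meaning "use utf8 applies to these lines".
sub decode_source_lines {
    my ($lines, $utf8_in_effect) = @_;
    return [ @$lines ] unless $utf8_in_effect;

    my @decoded;
    for my $line (@$lines) {
        my $copy = $line;
        # utf8::decode converts in place and returns false for malformed UTF-8;
        # in that case fall back to the original bytes.
        push @decoded, utf8::decode($copy) ? $copy : $line;
    }
    return \@decoded;
}

# Source lines as the tokenizer reads them: raw octets.
my @source = ("my \$var = '\xD0\x97\xD0\xB4';\n", "print \$var;\n");

my $plain   = decode_source_lines(\@source, 0);
my $decoded = decode_source_lines(\@source, 1);

printf "without use utf8: length %d, UTF-8 flag %s\n",
    length($plain->[0]), utf8::is_utf8($plain->[0]) ? 'on' : 'off';
printf "with use utf8:    length %d, UTF-8 flag %s\n",
    length($decoded->[0]), utf8::is_utf8($decoded->[0]) ? 'on' : 'off';
```

With decoding applied, the line is a character string like `$var`, so a single choice of output layer on $DB::OUT treats both the same way, which matches the behavior of the patched run above.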

perl -v

This is perl 5, version 41, subversion 4 (v5.41.4) built for x86_64-linux

Copyright 1987-2024, Larry Wall

Perl may be copied only under the terms of either the Artistic License or the
GNU General Public License, which may be found in the Perl 5 source kit.

Complete documentation for Perl, including FAQ lists, should be found on
this system using "man perl" or "perldoc perl".  If you have access to the
Internet, point your browser at https://www.perl.org/, the Perl Home Page.
