RTL multi-line text gets bottom-up #901

gmischler · 2023-08-18T16:29:11Z

When arabic text is placed in a multi-line context (multi_cell(), write(), text regions, etc.), in addition to getting correctly printed right-to-left on each line, the actual lines end up in reverse order, ie. bottom-to-top.

I've modified the relevant example from "test_text_shaping.py" to demonstrate:

def test_arabic_right_to_left(tmp_path):
    # issue #549
    pdf = FPDF()
    pdf.add_page()
    pdf.add_font(
        family="KFGQPC", fname=HERE / "KFGQPC Uthmanic Script HAFS Regular.otf"
    )
    pdf.set_font("KFGQPC", size=36)
    pdf.ln(36)
    pdf.set_text_shaping(True)
    pdf.cell(txt="مثال على اللغة العربية. محاذاة لليمين.", new_x="LEFT", new_y="NEXT")
    pdf.ln()
    pdf.multi_cell(w=80, txt="مثال على اللغة العربية. محاذاة لليمين.", new_x="LEFT", new_y="NEXT")

    assert_pdf_equal(pdf, HERE / "arabic.pdf", tmp_path)

Maybe we were a bit too eager to declare this topic solved... 🙄

Now in hindsight, this is not really surprising, given how our line wrapping is implemented.
For any rtl text, we really should start scanning from the back.
Unfortunately we can't just look at the current fragment in isolation, as there might be several rtl fragments in sequence, which we need to wrap as a whole.
Food for thought...

The text was updated successfully, but these errors were encountered:

andersonhc · 2023-08-18T16:40:51Z

I believe we should add the text direction as a property of each fragment and perform the line break algorithm using the correct direction.

While at it maybe we can improve our line break algorithm to make it compliant with the Unicode Line Breaking Algorithm.

andersonhc · 2023-09-27T12:34:03Z

I'm adding an example with English text forced to be shaped right to left.
When the text is in a single fragment it works OK, but when there's multiple fragments (like markdown breaking bold/italic fragments) it orders the fragments left to right and produce the incorrect output.

from fpdf import FPDF

text = "This is a very long string to test how line break is working when text shaping is rendering it right to left. Doesn't make much sense in English."
text_md = "This is a very long string to test **how** line break is working when __text shaping__ is rendering it right to left. Doesn't make much sense **in English**."

pdf = FPDF()
pdf.add_page()
pdf.add_font(family="DejaVuSans", style="", fname="DejaVuSans.ttf")
pdf.add_font(family="DejaVuSans", style="B", fname="DejaVuSans-Bold.ttf")
pdf.add_font(family="DejaVuSans", style="I", fname="DejaVuSans-Oblique.ttf")
pdf.set_font('DejaVuSans', size=12)
pdf.set_text_shaping(True, script="latn", language="eng", direction="rtl")
pdf.multi_cell(w=pdf.epw,txt=text,markdown=True)
pdf.ln(3)
pdf.multi_cell(w=pdf.epw,txt=text_md,markdown=True)
pdf.output("test_rtl_line_break.pdf")

andersonhc · 2024-02-10T04:38:44Z

Fixed in #1096

andersonhc · 2024-02-10T04:39:08Z

done

andrewlutz · 2024-02-22T03:41:02Z

Hey @andersonhc I've upgrade to 2.7.8 but am still seeing this issue when using reshaping. I am using Google's NotoSans AR font.

Repro:

import tempfile

from arabic_reshaper import reshape
from bidi.algorithm import get_display
from fpdf import FPDF

class RTLTest:
    def test(self):
        pdf = FPDF(orientation='P', unit='mm', format='A4')
        pdf.add_font(
                        family='STANDARD',
                        style='',
                        fname=PATH + 'Sans-Pro-Regular.otf',
                    )
        pdf.add_font(
                        family='NOTO_SANS_AR',
                        style='',
                        fname=PATH + 'NotoSansArabic-Regular.ttf',
                    )
        pdf.set_font('STANDARD')
        pdf.set_fallback_fonts(['NOTO_SANS_AR'])

        pdf.add_page()

        text = u'في Company X نعمل بجد لاتخاذ إجراءات ضد المحتوى الضار وغير القانوني. ونتيجة لذلك، قمنا بتقييد إمكانية الوصول إلى المنشور الذي قمتَ بـ تم إنشاؤه.'
        shaped_text = get_display(reshape(text))

        pdf.set_font_size(8)
        pdf.multi_cell(w=0, text=shaped_text, align='R')
        pdf.ln()

        pdf.set_font_size(20)
        pdf.multi_cell(w=0, text=shaped_text, align='R')

        _, tmp_filename = tempfile.mkstemp()
        pdf.output(tmp_filename)
        return tmp_filename

andersonhc · 2024-02-22T10:49:18Z

Hi @andrewlutz, can you please remove the calls to arabic_reshaper and bidi.algorithm and try again? Just pass text instead of shaped_text in multicell

andrewlutz · 2024-02-22T17:52:35Z

I think the issue I'm having is that the string coming from our translation system is in LTR order, I need to run get_display(reshape(text)) in order to get in RTL. I've also tried pdf.set_text_shaping(use_shaping_engine=True, direction="rtl") but that doesn't transform the string to RTL.

Does FPDF assume that the text is already in LTR order?

andersonhc · 2024-02-22T18:01:53Z

I think the issue I'm having is that the string coming from our translation system is in LTR order, I need to run get_display(reshape(text)) in order to get in RTL. I've also tried pdf.set_text_shaping(use_shaping_engine=True, direction="rtl") but that doesn't transform the string to RTL.

Does FPDF assume that the text is already in LTR order?

just pass the unshaped text - if you are using set_text_shaping() fpdf will handle all the LTR-RTL conversion

andersonhc · 2024-02-22T18:14:12Z

This is a test I did:

        pdf.set_font('STANDARD')
        pdf.set_fallback_fonts(['NOTO_SANS_AR'])

        pdf.add_page()

        pdf.set_text_shaping(use_shaping_engine=True)
        text = u'في Company X نعمل بجد لاتخاذ إجراءات ضد المحتوى الضار وغير القانوني. ونتيجة لذلك، قمنا بتقييد إمكانية الوصول إلى المنشور الذي قمتَ بـ تم إنشاؤه.'
        
        pdf.set_font_size(8)
        pdf.multi_cell(w=0, text=text, align='R')
        pdf.ln()

        pdf.set_font_size(20)
        pdf.multi_cell(w=0, text=text, align='R')

        pdf.output("issue-901.pdf")

and it produced:

If you use the fpdf text shaping engine you don't need bidi_algorithm and arabic reshaper

andrewlutz · 2024-02-22T18:22:33Z

If you drop the string produced using just pdf.set_text_shaping(use_shaping_engine=True) into google translate it produces gibberish. If you drop the string produced using bidi_algorithm and reshaper then google translate is 100% correct.

andersonhc · 2024-02-22T20:25:01Z

If you drop the string produced using just pdf.set_text_shaping(use_shaping_engine=True) into google translate it produces gibberish. If you drop the string produced using bidi_algorithm and reshaper then google translate is 100% correct.

If you copy text from the PDF file and past elsewhere it won't work. We still need to implement marked-content tags to proper signalize to the client the original text order.

andrewlutz · 2024-02-24T16:55:02Z

Ah ok thanks for clarifying that. I'll check the final result with our internationalization team.

andrewlutz · 2024-02-29T17:50:39Z

@andersonhc I'm at a bit of a loss here, I checked with an Arabic speaking colleague and the feedback I received was that the text produced when using pdf.set_text_shaping(use_shaping_engine=True) was scrambled text with no logical order.

The text below was produced with text shaping disabled and with get_display(reshape(text)). The first sentence (one line) is correctly translated/printed. When the sentence becomes multi-line it is printing bottom up with order of the words reversed.

andersonhc · 2024-03-02T03:51:07Z

@andrewlutz

I found and fixed the bug with text_shaping that was causing the RTL fragments get incorrectly ordered.
If you install the latest version you can use it:
pip install git+https:/py-pdf/fpdf2.git@master

When you use get_display(reshape())) you transform the whole text into LTR and fpdf is not aware of the correct direction of each part of the text, so the line break is done as normal LTR text.

andrewlutz · 2024-03-04T19:40:03Z

Thanks! Will test this out later today

andrewlutz · 2024-03-05T18:46:22Z

Hey @andersonhc I tried this but am still getting the same bottom-up result. Were you able to verify with the repro I provided? I'm wondering if somehow the latest fpdf didn't get pulled in correctly.

andersonhc · 2024-03-05T21:30:26Z

Hey @andersonhc I tried this but am still getting the same bottom-up result. Were you able to verify with the repro I provided? I'm wondering if somehow the latest fpdf didn't get pulled in correctly.

I installed the latest fpdf version from github master:

pip install git+https:/py-pdf/fpdf2.git@master

and ran the following python file:

pdf = FPDF(orientation='P', unit='mm', format='A4')
pdf.add_font(family='STANDARD', style='', fname=PATH + 'NotoSans-Regular.ttf',)
pdf.add_font(family='NOTO_SANS_AR', style='', fname=PATH + 'NotoSansArabic-Regular.ttf',)
pdf.set_font('STANDARD')
pdf.set_fallback_fonts(['NOTO_SANS_AR'])

pdf.add_page()

pdf.set_text_shaping(use_shaping_engine=True, direction="rtl")
text = u'في Company X نعمل بجد لاتخاذ إجراءات ضد المحتوى الضار وغير القانوني. ونتيجة لذلك، قمنا بتقييد إمكانية الوصول إلى المنشور الذي قمتَ بـ تم إنشاؤه.'
pdf.set_font_size(8)
pdf.multi_cell(w=pdf.epw, text=text, align='R')
pdf.ln()

pdf.set_font_size(20)
pdf.multi_cell(w=pdf.epw, text=text, align='R')

pdf.output("issue-901.pdf")

The result I got is attached. Visually it is similar to what you are getting in your last message and the one breaking on multiple lines is also visually similar.
issue-901.pdf

andrewlutz · 2024-03-08T17:19:33Z

Glad to hear it's working for you, that result looks good. Likely an issue with importing on my end.

When do you estimate a release with this fix will be ready?

andrewlutz · 2024-04-12T18:29:58Z

Hey @andersonhc are you able to do a release soon with the RTL fix? We'd really appreciate as it's currently a blocker for us :)

gmischler added bug up-for-grabs hacktoberfest text-shaping labels Aug 18, 2023

gmischler mentioned this issue Oct 24, 2023

Updating Unicode.md #975

Merged

1 task

andrewlutz mentioned this issue Feb 7, 2024

Process bidirectional text in conformance to Unicode standards #1096

Merged

5 tasks

andersonhc closed this as completed Feb 10, 2024

andersonhc reopened this Feb 10, 2024

andersonhc closed this as completed Feb 10, 2024

andersonhc mentioned this issue Mar 1, 2024

Fix bug ordering RTL runs on bidirectional text #1132

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

RTL multi-line text gets bottom-up #901

RTL multi-line text gets bottom-up #901

gmischler commented Aug 18, 2023

andersonhc commented Aug 18, 2023

andersonhc commented Sep 27, 2023 •

edited

Loading

andersonhc commented Feb 10, 2024

andersonhc commented Feb 10, 2024

andrewlutz commented Feb 22, 2024

andersonhc commented Feb 22, 2024

andrewlutz commented Feb 22, 2024

andersonhc commented Feb 22, 2024

andersonhc commented Feb 22, 2024 •

edited

Loading

andrewlutz commented Feb 22, 2024

andersonhc commented Feb 22, 2024

andrewlutz commented Feb 24, 2024

andrewlutz commented Feb 29, 2024

andersonhc commented Mar 2, 2024

andrewlutz commented Mar 4, 2024

andrewlutz commented Mar 5, 2024

andersonhc commented Mar 5, 2024

andrewlutz commented Mar 8, 2024

andrewlutz commented Apr 12, 2024

RTL multi-line text gets bottom-up #901

RTL multi-line text gets bottom-up #901

Comments

gmischler commented Aug 18, 2023

andersonhc commented Aug 18, 2023

andersonhc commented Sep 27, 2023 • edited Loading

andersonhc commented Feb 10, 2024

andersonhc commented Feb 10, 2024

andrewlutz commented Feb 22, 2024

andersonhc commented Feb 22, 2024

andrewlutz commented Feb 22, 2024

andersonhc commented Feb 22, 2024

andersonhc commented Feb 22, 2024 • edited Loading

andrewlutz commented Feb 22, 2024

andersonhc commented Feb 22, 2024

andrewlutz commented Feb 24, 2024

andrewlutz commented Feb 29, 2024

andersonhc commented Mar 2, 2024

andrewlutz commented Mar 4, 2024

andrewlutz commented Mar 5, 2024

andersonhc commented Mar 5, 2024

andrewlutz commented Mar 8, 2024

andrewlutz commented Apr 12, 2024

andersonhc commented Sep 27, 2023 •

edited

Loading

andersonhc commented Feb 22, 2024 •

edited

Loading