IBM-864 Arabic support #339

Merged
merged 7 commits into from
Aug 6, 2018

tresf
Contributor

@tresf tresf commented Jul 10, 2018

Converts UTF-8 Arabic to IBM-864.

Documentation:
http://qz.io/wiki/Raw-Encoding#escpos-arabic

See preview.

Closes #304

@tresf
Contributor Author

tresf commented Jul 10, 2018

@yohanes Travis is failing. Perhaps this patch would fix it:

diff --git a/src/qz/utils/ArabicConversionUtilities.java b/src/qz/utils/ArabicConversionUtilities.java
index d18fb91..0e3bc39 100644
--- a/src/qz/utils/ArabicConversionUtilities.java
+++ b/src/qz/utils/ArabicConversionUtilities.java
@@ -5,8 +5,9 @@ import com.ibm.icu.charset.CharsetProviderICU;
 import com.ibm.icu.text.ArabicShaping;
 import com.ibm.icu.text.ArabicShapingException;
 import com.ibm.icu.text.Bidi;
-import com.sun.xml.internal.messaging.saaj.util.ByteOutputStream;
 
+
+import java.io.ByteArrayOutputStream;
 import java.io.IOException;
 import java.io.UnsupportedEncodingException;
 import java.nio.ByteBuffer;
@@ -410,7 +411,7 @@ public class ArabicConversionUtilities {
         String data = para.writeReordered(Bidi.DO_MIRRORING);
 
         //Split the String into separate blocks, based on code points < 256 and > 256
-        ByteOutputStream bos = new ByteOutputStream();
+        ByteArrayOutputStream bos = new ByteArrayOutputStream();
         int i = 0;
         while (i < data.length()) {
             int ch = data.codePointAt(i);
@@ -442,13 +443,7 @@ public class ArabicConversionUtilities {
             }
         }
         bos.flush();
-
-        //ByteOutputStream.toByteArray() is deprecated so we do arrayCopy manually
-        byte b [] =  bos.getBytes();
-        int count = bos.getCount();
-        byte res[] = new byte[count];
-        System.arraycopy(b, 0, res, 0, count);
-        return res;
+        return bos.toByteArray();
     }
 
     /**

@yohanes

yohanes commented Jul 10, 2018

Yes, this should work. I accidentally used the wrong output stream class because of the IDE autocomplete.
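For readers unfamiliar with the replacement class, here is a minimal sketch of the pattern the patch switches to. The class and method names (`ByteBufferSketch`, `collect`) are made up for illustration and are not part of the PR; only `java.io.ByteArrayOutputStream` comes from the diff.

```java
import java.io.ByteArrayOutputStream;

final class ByteBufferSketch {
    // Hypothetical helper: concatenates byte chunks using the public JDK class
    // instead of the internal com.sun.xml ByteOutputStream.
    static byte[] collect(byte[]... chunks) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        for (byte[] chunk : chunks) {
            // This overload declares no checked exception.
            bos.write(chunk, 0, chunk.length);
        }
        // Returns a fresh array holding exactly the bytes written,
        // so no manual System.arraycopy is needed.
        return bos.toByteArray();
    }
}
```

Because `toByteArray()` already trims to the written length, the manual copy removed in the diff has no equivalent here.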

@tresf
Contributor Author

tresf commented Jul 11, 2018

> Yes, this should work. I accidentally used the wrong output stream class because of the IDE autocomplete.

👍 Ok. I don't have push access to your repo, so I'll need you to commit it.

@yohanes

yohanes commented Jul 11, 2018

Ok

yohanes@c80d73b

@tresf
Contributor Author

tresf commented Jul 11, 2018

Ok, I'm testing against Epson's Arabization support and I'm having strange results.

I'm still struggling to understand the word ordering for Arabic so this may be a problem with the unit tests.

The Epson Arabization features are toggled on with an app called TMSSUTL.exe (image). It's configured on Memory Page 6 and there are many options. I tried to flip "input reverse" to no avail.

We also have a ticket open with our reseller and they have been somewhat helpful. This PR is the closest we've seen to usable output.

@tresf
Contributor Author

tresf commented Jul 11, 2018

Here is my unit test:

// Since 2.0.8
var config = qz.configs.create("88VI", { encoding: "CP864" });
var data = [
   '\x1B' + '\x40',   //init command - necessary for proper byte interpretation
   '\x1B' + '\x74' + '\x25', // Setup "codepage 37", which is Epson's IBM864

   // UTF-8 Arabic "Lorem Ipsum" text from http://istizada.com/arabic-lorem-ipsum/
   'لكن لا بد أن أوضح لك أن كل هذه الأفكار', '\n',
   'المغلوطة حول استنكار  النشوة وتمجيد', '\n',
   'الألم نشأت بالفعل، وسأعرض لك التفاصيل', '\n',
   'لتكتشف حقيقة وأساس تلك السعادة', '\n',
   'البشرية، فلا أحد يرفض أو يكره أو يتجنب', '\n',
   'الشعور بالسعادة، ولكن بفضل هؤلاء', '\n',
   'الأشخاص الذين لا يدركون بأن السعادة', '\n',
   'لا بد أن نستشعرها بصورة أكثر عقلانية', '\n',
   'ومنطقية فيعرضهم هذا لمواجهة الظروف', '\n',
   'الأليمة، وأكرر بأنه لا يوجد من يرغب في', '\n',
   'الحب ونيل المنال ويتلذذ بالآلام، الألم', '\n',
   'هو الألم ولكن نتيجة لظروف ما قد تكمن', '\n',
   'السعاده فيما نتحمله من كد وأسي.', '\n',
   '\n\n\n\n'
];

qz.print(config, data).catch(function(err) { console.error(err); });

@yohanes

yohanes commented Jul 11, 2018

I think I understand what happened: I only considered "dumb printers" that need the full layout done on the software side. I learned about this from the comments on another project (the problem was never solved there because they didn't try to use ICU):

mike42/escpos-php#6

These kinds of printers can only print from left to right, so we need to lay out the text in visual order, left to right. On these printers we also need to do right justification manually (it depends on the width of the paper, so the programmer has to handle it). In some cases (like printing a receipt) this is what people want.
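As a rough illustration of the manual right justification described above, the sketch below pads a line with leading spaces. It is not code from this PR: the names `RightJustifySketch` and `rightJustify` are hypothetical, the column count must come from the caller, and it assumes each code point occupies exactly one printer column (reasonable for a single-byte code page such as CP864, but still an assumption).

```java
final class RightJustifySketch {
    // Pads a line that is already in visual (left-to-right) order with leading
    // spaces so that its last character lands in the final printer column.
    // Assumption: one code point == one printed column.
    static String rightJustify(String visualLine, int columns) {
        int width = visualLine.codePointCount(0, visualLine.length());
        if (width >= columns) {
            return visualLine; // too long to pad; wrapping is out of scope here
        }
        StringBuilder sb = new StringBuilder(columns);
        for (int i = width; i < columns; i++) {
            sb.append(' ');
        }
        return sb.append(visualLine).toString();
    }
}
```

A caller would pass whatever column count matches the paper and font in use, for example `rightJustify(line, 42)` if the printer happens to print 42 characters per line.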

For the other kind of printer, which already supports RTL, I need to skip some operations that the printer performs itself. Can you tell me which printer it is (and whether there is documentation for it) so I know which operations to skip?

For now, I can try to guess how the printer interprets the data based on the image that you gave me.

@tresf
Contributor Author

tresf commented Jul 11, 2018

Ok, the unit test passes as long as Arabization and Aribication aren't both set to CP864 and the default alignment is "Right". I was hoping this would be for Arabic receipts only, but it seems to apply to English receipts too.

Edit: The data is backwards. Fixed in #339 (comment).

https://user-images.githubusercontent.com/6345473/42547112-77e0b57c-848e-11e8-8a06-ef0ab31b2ea2.png

https://user-images.githubusercontent.com/6345473/42547057-2d644bd0-848e-11e8-9b6a-489f949d90d3.png

> This kind of printer can only print from left to right, so we need to layout the visual orders from left to right. On this kind of printer, we need to do manual right justification (but this depends on the width of the paper, so it needs to be done manually by the programmer). In some cases (like printing a receipt) this is what people want.

I was wondering this too. Unfortunately Epson is not very good with documentation which is why we've reached out to a reseller (In the USA, the Epson dealers offer language support).

The printer model is Epson TM88VI.

@yohanes

yohanes commented Jul 11, 2018

Ok, do you have a suggestion on what to do: it seems that for every combination of settings we need to send a different character ordering to the printer.

I can add more options, for example:

{"encoding":"CP864", "arabic_options": "something"}

Or different encoding names (e.g. "CP864/RTL", "CP864/LTR", etc.)

But the end user must know their printer and set the code accordingly.
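To make the second idea concrete, here is a minimal sketch of how an encoding name with a direction suffix could be split. Everything in it is hypothetical: the `EncodingOptionSketch` class, the `Direction` enum, and the choice to default to LTR when no suffix is given are assumptions for illustration, not an API that exists in this project.

```java
import java.util.Locale;

final class EncodingOptionSketch {
    enum Direction { LTR, RTL }

    final String charsetName;
    final Direction direction;

    EncodingOptionSketch(String charsetName, Direction direction) {
        this.charsetName = charsetName;
        this.direction = direction;
    }

    // Splits "CP864/RTL" into a charset name and a direction.
    // Assumption: a missing suffix means LTR output.
    static EncodingOptionSketch parse(String encoding) {
        int slash = encoding.indexOf('/');
        if (slash < 0) {
            return new EncodingOptionSketch(encoding.trim(), Direction.LTR);
        }
        String name = encoding.substring(0, slash).trim();
        String suffix = encoding.substring(slash + 1).trim().toUpperCase(Locale.ROOT);
        // Direction.valueOf throws IllegalArgumentException for unknown suffixes.
        return new EncodingOptionSketch(name, Direction.valueOf(suffix));
    }
}
```

Whichever form is chosen, the user still has to know how their printer is configured, as noted above.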

@tresf
Contributor Author

tresf commented Jul 11, 2018

> Ok, do you have a suggestion on what to do: it seems that for every combination of settings we need to send a different character ordering to the printer.

The printer seems to work fine with the data being sent once we got the correct settings in the firmware. At this point, I think we need a detailed explanation from Epson of what each option means.

> different encoding names (e.g: "CP864/RTL", "CP864/LTR", etc)

I'll talk with @bberenz about the best way to add this to our API, but I think for now we can leave the direction as-is.

Side note, is there a way to reduce the size of the dependencies? This PR nearly doubles the size of our software installer. :)

@yohanes

yohanes commented Jul 11, 2018

The largest part of ICU is the data (which is around 20 MB). If you don't mind using a custom JAR file, then the data can be customized to remove unneeded languages.

http://apps.icu-project.org/datacustom/

But this needs to be done carefully, because you may want to support other languages in the future.

@tresf
Contributor Author

tresf commented Jul 11, 2018

This is the first language we've needed the library for and I believe this is because Java's internal character mapping is inadequate for Arabic. We can easily add more in as we need, right? What space saving would we gain if we were to only bundle Arabic support for the purposes of this PR?

@yohanes

yohanes commented Jul 11, 2018

Judging from the online tool (removing almost everything except Arabic support), it seems that we can reduce the size by at least 10 megabytes. Unfortunately, the online tool doesn't work with the latest ICU.

http://bugs.icu-project.org/trac/ticket/12835

But this can be done manually. I will test this, but it will take some time (I will probably commit tonight or tomorrow).

@tresf
Contributor Author

tresf commented Jul 11, 2018

Ok, @yohanes I see what you're saying about the reversed characters. I just realized my data is backwards 😆. Fixed with these settings.

image

I'd like to know what the typical Arabic-speaking customer would have his printer set to so that we can target the most typical settings. Hopefully Epson can shed light on this.

@yohanes

yohanes commented Jul 12, 2018

I have replaced the ICU jar files with a slim version (created using the Python script in the script directory). The previous total size of the ICU libraries was 15+ MB, and now it is less than 5 MB.

@tresf
Contributor Author

tresf commented Jul 13, 2018

> I have replaced the ICU jar files with a slim version (created using the Python script in the script directory). The previous total size of the ICU libraries was 15+ MB, and now it is less than 5 MB.

Great, thanks!

@yohanes can you enable edits from maintainers on this PR (if the option is there)?

We're going to make some cosmetic changes, and I'd like to be able to edit the PR directly before merging it.

Also, I'm trying to find out which text direction for these printers is standard configuration by asking some Arabic speaking colleagues. That will influence the PR slightly as well.

@yohanes

yohanes commented Jul 13, 2018

There is no option to Allow edits from maintainers. Can I just add you as a collaborator so that you can push an edit?

@tresf
Contributor Author

tresf commented Jul 13, 2018

It's there now and I've enabled it. Thanks!

@akberenz
Member

I've pushed some changes under 674ea5f to simplify the structure and remove some redundancies. Also had a discussion with @tresf to just go forward with the comma separated data for any escp commands as opposed to trying to parse them out for now, in an effort to keep the charset values simple.

@tresf
Contributor Author

tresf commented Jul 29, 2018

This is ready for merge after testing. The only item I'm still undecided on is the reversing of the byte order. @yohanes can you explain what we're doing now that requires me to use the "Arabic input reversed" option in the Epson settings?

@yohanes

yohanes commented Jul 29, 2018

There are two important things that I did to make the byte order left-to-right. The first one is the Bidi class and the second one is the ArabicShaping class.

Some printers (or maybe most?) can only print from left to right. If we have an Arabic text that is visually like this (let's just pretend that ABC are Arabic characters):

CBA

It is stored internally in the String object as:

ABC

because logically the user types A first, B second and so on. If the printer can only print from left to right, then we must send C, then B, then A to make it look as expected. This is done using the Bidi class. The DO_MIRRORING option mirrors characters such as opening and closing brackets so that they still look right when rendered left to right.

String data = para.writeReordered(Bidi.DO_MIRRORING);

If the device can receive characters in right-to-left order, then I think the correct call should be:

String data = para.writeReordered(Bidi.OUTPUT_REVERSE);

From the documentation:

> An example for using this option is output to a character terminal that is designed for RTL scripts and stores text in reverse order.

Or if the device is smart enough, I think the Bidi class can be skipped entirely.
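The logical-to-visual reordering described above can be tried without ICU. The sketch below swaps in the JDK's built-in `java.text.Bidi` instead of the `com.ibm.icu.text.Bidi` class used in this PR, so it is only an approximation: it does no mirroring and no shaping, and it assumes BMP characters only. The names `VisualOrderSketch` and `toVisualLtr` are hypothetical.

```java
import java.text.Bidi;

final class VisualOrderSketch {
    // Reorders a logically ordered string into visual left-to-right order.
    // Assumption: BMP characters only (one char per code point).
    static String toVisualLtr(String logical) {
        int n = logical.length();
        if (n == 0) {
            return logical; // nothing to reorder
        }
        // Base direction is taken from the first strong character in the text.
        Bidi bidi = new Bidi(logical, Bidi.DIRECTION_DEFAULT_LEFT_TO_RIGHT);
        byte[] levels = new byte[n];
        Character[] chars = new Character[n];
        for (int i = 0; i < n; i++) {
            levels[i] = (byte) bidi.getLevelAt(i);
            chars[i] = logical.charAt(i);
        }
        // Reverses each run in place according to its embedding level.
        Bidi.reorderVisually(levels, 0, chars, 0, n);
        StringBuilder visual = new StringBuilder(n);
        for (Character c : chars) {
            visual.append(c.charValue());
        }
        return visual.toString();
    }
}
```

With this sketch, three Arabic letters stored as A, B, C come back as C, B, A, while Latin-only text is returned unchanged, which matches the "send C, then B, then A" behavior described for left-to-right-only printers.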

The second part is ArabicShaping, which maps the Unicode characters to a more restricted character set. Currently, it is called like this:

ArabicShaping as = new ArabicShaping(ArabicShaping.LETTERS_SHAPE | ArabicShaping.TEXT_DIRECTION_VISUAL_LTR | ArabicShaping.LENGTH_GROW_SHRINK);

If we change the output of "writeReordered", the characters will be in RTL order, so we need to remove TEXT_DIRECTION_VISUAL_LTR from the current code, like this:

ArabicShaping as = new ArabicShaping(ArabicShaping.LETTERS_SHAPE | ArabicShaping.LENGTH_GROW_SHRINK);

So essentially, please experiment with these two options (Bidi and ArabicShaping) and test them on the printer.

@tresf
Contributor Author

tresf commented Jul 29, 2018

@yohanes thanks for the detailed explanation.

> Some printers (or maybe most?) can only print from left to right.

This is the part I need to find out and I'm not sure where to start digging. :D

> please experiment with these two options (Bidi and ArabicShaping) and test it on the printer

👍

@tresf
Contributor Author

tresf commented Aug 6, 2018

We were unable to get an answer from any shops that use this type of hardware about whether the byte order should be forward or reversed. I'm merging this as-is. If that changes after further discovery down the road, we'll reverse it and provide the necessary workarounds to affected clients.

This feature will be available in 2.0.8 and higher.
