Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Too many duplicated Name.Literal instances #10330

Open
JaroslavTulach opened this issue Jun 21, 2024 · 2 comments
Open

Too many duplicated Name.Literal instances #10330

JaroslavTulach opened this issue Jun 21, 2024 · 2 comments
Assignees

Comments

@JaroslavTulach
Copy link
Member

I've executed Enso program as enso --run test/Base_Tests and in the middle of execution I have taken a heap dump. Then I used VisualVM to compute histogram of Name.Literal instances based on their name. There is a lot of duplicates - for example Standard is repeated 15000 times!

Name.Literal duplicates

The program to execute in OQL console is:

var histo = {};
heap.forEachObject(function(l) {
  var n = l.name.toString();
  var cnt = histo[n];
  if (!cnt) {
    histo[n] = 1;
  } else {
    histo[n] = cnt+1;
  }
}, 'org.enso.compiler.core.ir.Name$Literal')

var arr = []
for (var n in histo) {
  arr.push([n, histo[n]])
}
arr.sort(function(p1, p2) { return p2[1] - p1[1]; })
arr
@JaroslavTulach
Copy link
Member Author

JaroslavTulach commented Jun 30, 2024

The generated code for persistance of Name.Literal is using writeUTF to store the String name:

@Override
  protected void writeObject(org.enso.compiler.core.ir.Name.Literal obj, Output out) throws IOException {
    out.writeUTF(obj.name());
    out.writeBoolean(obj.isMethod());
    out.writeInline(scala.Option.class, obj.location());
    out.writeInline(scala.Option.class, obj.originalName());
    out.writeInline(org.enso.compiler.core.ir.MetadataStorage.class, obj.passData());
    out.writeInline(org.enso.compiler.core.ir.DiagnosticStorage.class, obj.diagnostics());
  }

the writeUTF method currently stores the name "in place" - e.g. repeats the characters again and again. If we replace that line with:

      out.writeObject(obj.name().intern());

the size of the Standard.Base .bindings cache goes down from 42MB to 32MB and overall we save 23% on the size of all caches.

@JaroslavTulach
Copy link
Member Author

JaroslavTulach commented Jun 30, 2024

One other method is to use writeObject and readObject for String in the generated classes:

$ git diff 
diff --git lib/java/persistance-dsl/src/main/java/org/enso/persist/impl/PersistableProcessor.java lib/java/persistance-dsl/src/main/java/org/enso/persist/impl/PersistableProcessor.java
index e7ee6e05f4..ab17b3685f 100644
--- lib/java/persistance-dsl/src/main/java/org/enso/persist/impl/PersistableProcessor.java
+++ lib/java/persistance-dsl/src/main/java/org/enso/persist/impl/PersistableProcessor.java
@@ -182,7 +182,9 @@ public class PersistableProcessor extends AbstractProcessor {
       if (cons != null) {
         for (var v : cons.getParameters()) {
           if (tu.isSameType(eu.getTypeElement("java.lang.String").asType(), v.asType())) {
-            w.append("    var ").append(v.getSimpleName()).append(" = in.readUTF();\n");
+            w.append("    var ")
+                .append(v.getSimpleName())
+                .append(" = (java.lang.String) in.readObject();\n");
           } else if (!v.asType().getKind().isPrimitive()) {
             var type = tu.erasure(v.asType());
             var elem = (TypeElement) tu.asElement(type);
@@ -241,7 +243,7 @@ public class PersistableProcessor extends AbstractProcessor {
       if (cons != null) {
         for (var v : cons.getParameters()) {
           if (tu.isSameType(eu.getTypeElement("java.lang.String").asType(), v.asType())) {
-            w.append("    out.writeUTF(obj.").append(v.getSimpleName()).append("());\n");
+            w.append("    out.writeObject(obj.").append(v.getSimpleName()).append("());\n");
           } else if (!v.asType().getKind().isPrimitive()) {
             var type = tu.erasure(v.asType());
:...skipping...
diff --git lib/java/persistance-dsl/src/main/java/org/enso/persist/impl/PersistableProcessor.java lib/java/persistance-dsl/src/main/java/org/enso/persist/impl/PersistableProcessor.java
index e7ee6e05f4..ab17b3685f 100644
--- lib/java/persistance-dsl/src/main/java/org/enso/persist/impl/PersistableProcessor.java
+++ lib/java/persistance-dsl/src/main/java/org/enso/persist/impl/PersistableProcessor.java
@@ -182,7 +182,9 @@ public class PersistableProcessor extends AbstractProcessor {
       if (cons != null) {
         for (var v : cons.getParameters()) {
           if (tu.isSameType(eu.getTypeElement("java.lang.String").asType(), v.asType())) {
-            w.append("    var ").append(v.getSimpleName()).append(" = in.readUTF();\n");
+            w.append("    var ")
+                .append(v.getSimpleName())
+                .append(" = (java.lang.String) in.readObject();\n");
           } else if (!v.asType().getKind().isPrimitive()) {
             var type = tu.erasure(v.asType());
             var elem = (TypeElement) tu.asElement(type);
@@ -241,7 +243,7 @@ public class PersistableProcessor extends AbstractProcessor {
       if (cons != null) {
         for (var v : cons.getParameters()) {
           if (tu.isSameType(eu.getTypeElement("java.lang.String").asType(), v.asType())) {
-            w.append("    out.writeUTF(obj.").append(v.getSimpleName()).append("());\n");
+            w.append("    out.writeObject(obj.").append(v.getSimpleName()).append("());\n");
           } else if (!v.asType().getKind().isPrimitive()) {
             var type = tu.erasure(v.asType());
             var elem = (TypeElement) tu.asElement(type);
diff --git lib/java/persistance/src/main/java/org/enso/persist/PerGenerator.java lib/java/persistance/src/main/java/org/enso/persist/PerGenerator.java
index d8a3e2c2bd..d64d7d5d12 100644
--- lib/java/persistance/src/main/java/org/enso/persist/PerGenerator.java
+++ lib/java/persistance/src/main/java/org/enso/persist/PerGenerator.java
@@ -13,7 +13,7 @@ import java.util.function.Function;
 import org.slf4j.Logger;
 
 final class PerGenerator {
-  static final byte[] HEADER = new byte[] {0x0a, 0x0d, 0x13, 0x0f};
+  static final byte[] HEADER = new byte[] {0x0a, 0x0d, 0x13, 0x13};
   private final OutputStream main;
   private final Map<Object, WriteResult> knownObjects = new IdentityHashMap<>();
   private int countReferences = 1;
diff --git lib/java/persistance/src/main/java/org/enso/persist/PerInputImpl.java lib/java/persistance/src/main/java/org/enso/persist/PerInputImpl.java
index 5bfe457cd2..1f6c9c126a 100644
--- lib/java/persistance/src/main/java/org/enso/persist/PerInputImpl.java
+++ lib/java/persistance/src/main/java/org/enso/persist/PerInputImpl.java
@@ -319,11 +319,11 @@ final class PerInputImpl implements Input {
     }
 
     final Object resolveObject(Object res) {
-      if (readResolve != null) {
-        return readResolve.apply(res);
-      } else {
-        return res;
+      var v = (readResolve != null) ? readResolve.apply(res) : res;
+      if (v instanceof String s) {
+        v = s.intern();
       }
+      return v;
     }
 
     final Object getObjectAt(int at) {

with this change the caches are also down to the previous values. Moreover it is guaranteed that various Name.Literal instances point to the same string. By executing following OQL query:

var arr = []
heap.forEachObject(function(l) {
  var n = l.name.toString();
  if (!n.contains("Standard")) {
    return;
  }
  arr.push(l.name)
}, 'org.enso.compiler.core.ir.Name$Literal')

arr

that all the IR.Name.name values are pointing to the identical instance of the string.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: 📤 Backlog
Development

Successfully merging a pull request may close this issue.

1 participant