Bundling issues with tiktoken (Error: Missing tiktoken_bg.wasm) #1127

Open · marcusschiesser opened this issue Aug 19, 2024 · 3 comments
Labels: bug (Something isn't working)

@marcusschiesser (Collaborator)

I am opening this ticket to gather all issues related to bundling the WASM from https://github.com/dqbd/tiktoken:

  1. Using an AWS Node.js serverless project, see "Node Serverless deployment fails due to bundling issue" #1110 (comment)
  2. Using Next.js deployed on Vercel, see "Error: Missing tiktoken_bg.wasm" create-llama#164 (fixed by copying the WASM file; see https://github.com/run-llama/create-llama/pull/201/files)

If you encounter this issue, please post your setup and configuration here.
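
For background, my understanding of the root cause (a paraphrase and an assumption on my part, not the actual source) is that tiktoken's CommonJS entry reads the wasm from disk next to the module at load time, so any bundler that inlines tiktoken.cjs without also shipping the .wasm file hits this error. Roughly:

// Rough sketch of what tiktoken.cjs does when loaded (paraphrased, not the real code):
// the wasm bytes are read relative to __dirname, so a bundle that does not ship
// tiktoken_bg.wasm next to the output file fails here with "Missing tiktoken_bg.wasm".
const fs = require("fs");
const path = require("path");

let wasmBytes;
try {
  wasmBytes = fs.readFileSync(path.join(__dirname, "tiktoken_bg.wasm"));
} catch {
  throw new Error("Missing tiktoken_bg.wasm");
}

The workarounds collected below all boil down to either keeping tiktoken out of the bundle (installing it as a regular node module) or copying the wasm file next to the bundled output.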

@LeonhardZehetgruber

I am encountering this issue when trying to integrate llamaindex into my Obsidian plugin. The build output for the plugin is a bundled main.js file.

package.json (the relevant part):

{
	"type": "module",
	"scripts": {
		"dev": "node esbuild.config.mjs"
	},
	"dependencies": {
		"llamaindex": "0.5.20"
	}
}

esbuild.config.mjs:

import esbuild from "esbuild";
import process from "node:process";
import builtins from "builtin-modules";

const context = await esbuild.context({
	entryPoints: { main: "src/main.ts" },
	bundle: true,
	platform: "node",
	external: [
		"obsidian",
		"electron",
		"sharp",
		"onnxruntime-node",
		"./xhr-sync-worker.js",
		...builtins],
	mainFields: ["browser", "module", "main"],
	conditions: ["browser"],
	format: "cjs",
	target: "es2022",
	logLevel: "info",
	treeShaking: true,
	outdir: "."
});

await context.rebuild();
process.exit(0);

tsconfig.json:

{
	"compilerOptions": {
		"baseUrl": "./src",
		"target": "es2022",
		"module": "ESNext",
		"moduleResolution": "bundler",
		"esModuleInterop": true,
		"skipLibCheck": true,
		"types": [
			"node",
			"jest"
		],
		"lib": [
			"DOM",
			"ES5",
			"ES6",
			"ES7",
			"ES2021",
			"ES2022"
		]
	},
	"include": [
		"**/*.ts"
	]
}

If I now use the following in my main.ts:

import { HuggingFaceEmbedding, Settings } from 'llamaindex';

Settings.embedModel = new HuggingFaceEmbedding({
	modelType: 'nomic-ai/nomic-embed-text-v1.5',
	quantized: false
});

I get the error Error: Missing tiktoken_bg.wasm at node_modules/tiktoken/tiktoken.cjs in the developer console.
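
A workaround that might apply here (a sketch only, not verified in this Obsidian setup) is to copy tiktoken_bg.wasm next to the bundled main.js after the build, mirroring the create-llama fix of shipping the wasm alongside the bundle. The lines below would go in esbuild.config.mjs, after await context.rebuild() and before process.exit(0); the resolved path is an assumption and may need adjusting:

import { copyFile } from "node:fs/promises";
import { createRequire } from "node:module";
import path from "node:path";

const require = createRequire(import.meta.url);

// require.resolve("tiktoken") points at tiktoken.cjs; tiktoken_bg.wasm sits beside it
// in the same package folder. "." matches the outdir used in the config above.
const wasmSrc = path.join(path.dirname(require.resolve("tiktoken")), "tiktoken_bg.wasm");
await copyFile(wasmSrc, "./tiktoken_bg.wasm");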

@AndreMaz (Contributor) commented Sep 25, 2024

Just in case someone else faces the same issue, this is how I solved it.

My next.config.mjs:

import path from "path";
import { fileURLToPath } from "url";
import _jiti from "jiti";

import { withLlamaIndex } from "@web/chatbot/next";

const jiti = _jiti(fileURLToPath(import.meta.url));

// Import env files to validate at build time. Use jiti so we can load .ts files in here.
jiti("./src/env");

const isStaticExport = "false";

// Get __dirname equivalent for ES modules
const __filename = fileURLToPath(import.meta.url);
const __dirname = path.dirname(__filename);

/**
 * @type {import("next").NextConfig}
 */
const nextConfig = {
  basePath: process.env.NEXT_PUBLIC_BASE_PATH,
  serverRuntimeConfig: {
    PROJECT_ROOT: __dirname,
  },
  env: {
    BUILD_STATIC_EXPORT: isStaticExport,
  },
  // Trailing slashes must be disabled for Next Auth callback endpoint to work
  // https://stackoverflow.com/a/78348528
  trailingSlash: false,
  modularizeImports: {
    "@mui/icons-material": {
      transform: "@mui/icons-material/{{member}}",
    },
    "@mui/material": {
      transform: "@mui/material/{{member}}",
    },
    "@mui/lab": {
      transform: "@mui/lab/{{member}}",
    },
  },
  webpack(config) {
    config.module.rules.push({
      test: /\.svg$/,
      use: ["@svgr/webpack"],
    });

    // To allow chatbot to work
    // Extracted from: https://github.com/neondatabase/examples/blob/main/ai/llamaindex/rag-nextjs/next.config.mjs
    config.resolve.alias = {
      ...config.resolve.alias,
      sharp$: false,
      "onnxruntime-node$": false,
    };

    // From: https://github.com/dqbd/tiktoken?tab=readme-ov-file#nextjs
    config.experiments = {
      asyncWebAssembly: true,
      layers: true,
    };

    return config;
  },
  ...(isStaticExport === "true" && {
    output: "export",
  }),

  experimental: {
    outputFileTracingIncludes: {
      "/*": ["./cache/**/*"],
      "/api/**/*": ["./node_modules/**/*.wasm"],
    },
    serverComponentsExternalPackages: ["tiktoken", "onnxruntime-node"],
  },

  /** Enables hot reloading for local packages without a build step */
  transpilePackages: [
    "@web/api",
    "@web/auth",
    "@web/db",
    "@web/ui",
    "@web/validators",
    "@web/services",
    "@web/utils",
    "@web/logger",
    "@web/certs",
    "@web/chatbot",
  ],
  /** We already do linting and typechecking as separate tasks in CI */
  eslint: { ignoreDuringBuilds: true },
  typescript: { ignoreBuildErrors: true },
};

const withLlamaIndexConfig = withLlamaIndex(nextConfig);

export default withLlamaIndexConfig;

In my case everything related to llamaindex lives in the @web/chatbot package, which is why withLlamaIndex is imported from @web/chatbot/next.
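
The ./next entry is essentially a thin wrapper around llamaindex's own Next.js helper; roughly something along these lines (a sketch, not the verbatim file, and the exact import path and export shape may differ):

// src/with-lama-index.mjs (sketch): re-export llamaindex's Next.js config helper as a
// named export so that next.config.mjs can do
// import { withLlamaIndex } from "@web/chatbot/next".
import withLlamaIndex from "llamaindex/next";

export { withLlamaIndex };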

Here's how the package.json of @web/chatbot looks:

{
  "name": "@web/chatbot",
  "private": true,
  "version": "0.1.0",
  "type": "module",
  "exports": {
    ".": "./src/index.ts",
    "./next": "./src/with-lama-index.mjs"
  },
  "license": "MIT",
  "scripts": {
    "clean": "rm -rf .turbo node_modules",
    "format": "prettier --check . --ignore-path ../../.gitignore --ignore-path ../../.prettierignore",
    "lint": "eslint .",
    "typecheck": "tsc --emitDeclarationOnly"
  },
  "devDependencies": {
    "@web/eslint-config": "workspace:*",
    "@web/prettier-config": "workspace:*",
    "@web/tsconfig": "workspace:*",
    "@web/utils": "workspace:*",
    "eslint": "catalog:",
    "prettier": "catalog:",
    "typescript": "catalog:"
  },
  "prettier": "@web/prettier-config",
  "dependencies": {
    "@web/logger": "workspace:*",
    "@t3-oss/env-nextjs": "catalog:",
    "js-tiktoken": "^1.0.14",
    "llamaindex": "catalog:",
    "pg": "^8.13.0",
    "tiktoken": "^1.0.16"
  }
}

For reference: the next.config.mjs and my repo structure are based on the create-t3-turbo repo.

For more context, check #1226.

@mobob commented Oct 9, 2024

I think I'm a victim of this too! My lambda logs:

app-logs        | 2024-10-09T03:02:44 {"timestamp":"2024-10-09T03:02:44.451Z","level":"ERROR","message":{"errorType":"Error","errorMessage":"Missing tiktoken_bg.wasm","stackTrace":["Error: Missing tiktoken_bg.wasm","    at node_modules/.pnpm/[email protected]/node_modules/tiktoken/tiktoken.cjs (/var/task/index.js:137027:13)","    at __require2 (/var/task/index.js:18:53)","    at node_modules/.pnpm/@[email protected]_@[email protected]_@[email protected]_js-tiktoken@1._d65gjdt6k6tmj5pmt5qhv3yqf4/node_modules/@llamaindex/env/dist/tokenizers/node.js (/var/task/index.js:137043:32)","    at __init (/var/task/index.js:15:59)","    at node_modules/.pnpm/@[email protected]_@[email protected]_@[email protected]_js-tiktoken@1._d65gjdt6k6tmj5pmt5qhv3yqf4/node_modules/@llamaindex/env/dist/index.js (/var/task/index.js:137131:5)","    at __init (/var/task/index.js:15:59)","    at node_modules/.pnpm/@[email protected]_@[email protected]_@[email protected]_js-tiktoken@1_333vhfcqr2gqzir7ihb4nbnsrq/node_modules/@llamaindex/core/agent/dist/index.js (/var/task/index.js:141039:5)","    at __init (/var/task/index.js:15:59)","    at node_modules/.pnpm/@[email protected]_@[email protected]_@[email protected]_encoding@0._oeg7rmwedn7wy56dfzvlejhd6a/node_modules/@llamaindex/openai/dist/index.js (/var/task/index.js:147568:5)","    at __init (/var/task/index.js:15:59)"]}}

My stack is:

  • pnpm monorepo, mostly TypeScript within (note: I've had issues with this setup too, as I've needed to install some stuff at the project root for ... some reason!)
  • all deployed to AWS via CDK, Lambdas running Node 18, I believe
  • recently moved to LlamaIndex to parse PDFs and have been grappling with bizarre dep issues all day

I think this is the last workaround needed, and things seem to be OK now. The most recent issue was on Lambda execution: I'd get the above error long before llamaindex was actually used. I'm no bundling expert, but I believe an equivalent of the Next.js workaround, which seems to be working for me, is to install tiktoken as a node module instead of bundling it (via the bundling option below); I have no idea how much it will slow things down.

const settings : cdk.aws_lambda_nodejs.NodejsFunctionProps = {
    handler: "handler",
    runtime: props.serviceConfig.nodeRuntime,
    memorySize: 256,
    tracing: Tracing.ACTIVE,
    bundling: {
      logLevel: LogLevel.INFO,

      nodeModules: ["tiktoken"], // main workaround

      minify: false, //branchIsMain(), tried both
      tsconfig: "../backend/tsconfig.json",

      sourceMap: !branchIsMain(),
      metafile: !branchIsMain(),

      // TODO LlamaIndex hack - see https://github.com/evanw/esbuild/issues/1051
      // This might appear as the following error during synth: No loader is configured for ".node" files: node_modules/.pnpm/onnxruntime-node
      loader: {
        // I needed to add this for another llamaindex dep issue
        ".node": "file",
        // I had this briefly, but now that it's installed I don't need it anymore
        // ".wasm": "file",
      },
    },
  };

Would love some wisdom on this workaround, and to know if there are plans to fix this. I'd also love some insight into whether I really should need to install some of these extra libs in my package.json; I obviously went ahead and did that, but it often didn't have an effect (I'll assume because they were all shaken out). Thanks for tracking this!

Update: the workaround, at least for now, likely won't work... max artifact size. Any other suggestions would be greatly appreciated!
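
One thing I may try next (just a sketch, no idea yet whether it works; it assumes tiktoken is a direct dependency so it is linked at node_modules/tiktoken even under pnpm) is to keep bundling tiktoken but copy only tiktoken_bg.wasm into the Lambda asset via the bundling commandHooks, which should avoid blowing up the artifact size:

import * as cdk from "aws-cdk-lib";

const settings: cdk.aws_lambda_nodejs.NodejsFunctionProps = {
  handler: "handler",
  bundling: {
    // Keep tiktoken bundled (no nodeModules entry), but ship the wasm next to index.js.
    loader: { ".node": "file" },
    commandHooks: {
      beforeBundling: () => [],
      beforeInstall: () => [],
      // Copy the wasm from the project's node_modules into the Lambda asset output.
      afterBundling: (inputDir: string, outputDir: string) => [
        `cp ${inputDir}/node_modules/tiktoken/tiktoken_bg.wasm ${outputDir}/`,
      ],
    },
  },
};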
