Bundling issues with tiktoken (Error: Missing tiktoken_bg.wasm) #1127

Open · marcusschiesser opened this issue Aug 19, 2024 · 3 comments
Labels: bug (Something isn't working)

@marcusschiesser (Collaborator)

I am opening this ticket to gather all issues related to bundling the WASM from https://github.com/dqbd/tiktoken:

  1. Using an AWS Node.js serverless project, see "Node Serverless deployment fails due to bundling issue" #1110 (comment)
  2. Using Next.js deployed on Vercel, see "Error: Missing tiktoken_bg.wasm" create-llama#164 (fixed by copying the WASM file; see https://github.com/run-llama/create-llama/pull/201/files)

If you encounter this issue, please post your setup and configuration here.
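
For background, my understanding of the root cause (a paraphrase and an assumption on my part, not the actual source) is that tiktoken's CommonJS entry reads the wasm from disk next to the module at load time, so any bundler that inlines tiktoken.cjs without also shipping the .wasm file hits this error. Roughly:

// Rough sketch of what tiktoken.cjs does when loaded (paraphrased, not the real code):
// the wasm bytes are read relative to __dirname, so a bundle that does not ship
// tiktoken_bg.wasm next to the output file fails here with "Missing tiktoken_bg.wasm".
const fs = require("fs");
const path = require("path");

let wasmBytes;
try {
  wasmBytes = fs.readFileSync(path.join(__dirname, "tiktoken_bg.wasm"));
} catch {
  throw new Error("Missing tiktoken_bg.wasm");
}

The workarounds collected below all boil down to either keeping tiktoken out of the bundle (installing it as a regular node module) or copying the wasm file next to the bundled output.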

@LeonhardZehetgruber

I am encountering this issue when trying to integrate llamaindex into my Obsidian plugin. The build output for the plugin is a bundled main.js file.

package.json (the relevant part):

{
	"type": "module",
	"scripts": {
		"dev": "node esbuild.config.mjs"
	},
	"dependencies": {
		"llamaindex": "0.5.20"
	}
}

esbuild.config.mjs:

import esbuild from "esbuild";
import process from "node:process";
import builtins from "builtin-modules";

const context = await esbuild.context({
	entryPoints: { main: "src/main.ts" },
	bundle: true,
	platform: "node",
	external: [
		"obsidian",
		"electron",
		"sharp",
		"onnxruntime-node",
		"./xhr-sync-worker.js",
		...builtins],
	mainFields: ["browser", "module", "main"],
	conditions: ["browser"],
	format: "cjs",
	target: "es2022",
	logLevel: "info",
	treeShaking: true,
	outdir: "."
});

await context.rebuild();
process.exit(0);

tsconfig.json:

{
	"compilerOptions": {
		"baseUrl": "./src",
		"target": "es2022",
		"module": "ESNext",
		"moduleResolution": "bundler",
		"esModuleInterop": true,
		"skipLibCheck": true,
		"types": [
			"node",
			"jest"
		],
		"lib": [
			"DOM",
			"ES5",
			"ES6",
			"ES7",
			"ES2021",
			"ES2022"
		]
	},
	"include": [
		"**/*.ts"
	]
}

If I now use the following in my main.ts:

import { HuggingFaceEmbedding, Settings } from 'llamaindex';

Settings.embedModel = new HuggingFaceEmbedding({
	modelType: 'nomic-ai/nomic-embed-text-v1.5',
	quantized: false
});

I get the error Error: Missing tiktoken_bg.wasm at node_modules/tiktoken/tiktoken.cjs in the developer console.
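
A workaround that might apply here (a sketch only, not verified in this Obsidian setup) is to copy tiktoken_bg.wasm next to the bundled main.js after the build, mirroring the create-llama fix of shipping the wasm alongside the bundle. The lines below would go in esbuild.config.mjs, after await context.rebuild() and before process.exit(0); the resolved path is an assumption and may need adjusting:

import { copyFile } from "node:fs/promises";
import { createRequire } from "node:module";
import path from "node:path";

const require = createRequire(import.meta.url);

// require.resolve("tiktoken") points at tiktoken.cjs; tiktoken_bg.wasm sits beside it
// in the same package folder. "." matches the outdir used in the config above.
const wasmSrc = path.join(path.dirname(require.resolve("tiktoken")), "tiktoken_bg.wasm");
await copyFile(wasmSrc, "./tiktoken_bg.wasm");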

@AndreMaz (Contributor) commented Sep 25, 2024

Just in case someone else faces the same issue, this is how I solved it.

My next.config.mjs:

import path from "path";
import { fileURLToPath } from "url";
import _jiti from "jiti";

import { withLlamaIndex } from "@web/chatbot/next";

const jiti = _jiti(fileURLToPath(import.meta.url));

// Import env files to validate at build time. Use jiti so we can load .ts files in here.
jiti("./src/env");

const isStaticExport = "false";

// Get __dirname equivalent for ES modules
const __filename = fileURLToPath(import.meta.url);
const __dirname = path.dirname(__filename);

/**
 * @type {import("next").NextConfig}
 */
const nextConfig = {
  basePath: process.env.NEXT_PUBLIC_BASE_PATH,
  serverRuntimeConfig: {
    PROJECT_ROOT: __dirname,
  },
  env: {
    BUILD_STATIC_EXPORT: isStaticExport,
  },
  // Trailing slashes must be disabled for Next Auth callback endpoint to work
  // https://stackoverflow.com/a/78348528
  trailingSlash: false,
  modularizeImports: {
    "@mui/icons-material": {
      transform: "@mui/icons-material/{{member}}",
    },
    "@mui/material": {
      transform: "@mui/material/{{member}}",
    },
    "@mui/lab": {
      transform: "@mui/lab/{{member}}",
    },
  },
  webpack(config) {
    config.module.rules.push({
      test: /\.svg$/,
      use: ["@svgr/webpack"],
    });

    // To allow chatbot to work
    // Extracted from: https://github.com/neondatabase/examples/blob/main/ai/llamaindex/rag-nextjs/next.config.mjs
    config.resolve.alias = {
      ...config.resolve.alias,
      sharp$: false,
      "onnxruntime-node$": false,
    };

    // From: https://github.com/dqbd/tiktoken?tab=readme-ov-file#nextjs
    config.experiments = {
      asyncWebAssembly: true,
      layers: true,
    };

    return config;
  },
  ...(isStaticExport === "true" && {
    output: "export",
  }),

  experimental: {
    outputFileTracingIncludes: {
      "/*": ["./cache/**/*"],
      "/api/**/*": ["./node_modules/**/*.wasm"],
    },
    serverComponentsExternalPackages: ["tiktoken", "onnxruntime-node"],
  },

  /** Enables hot reloading for local packages without a build step */
  transpilePackages: [
    "@web/api",
    "@web/auth",
    "@web/db",
    "@web/ui",
    "@web/validators",
    "@web/services",
    "@web/utils",
    "@web/logger",
    "@web/certs",
    "@web/chatbot",
  ],
  /** We already do linting and typechecking as separate tasks in CI */
  eslint: { ignoreDuringBuilds: true },
  typescript: { ignoreBuildErrors: true },
};

const withLlamaIndexConfig = withLlamaIndex(nextConfig);

export default withLlamaIndexConfig;

In my case everything related to llamaindex lives in the @web/chatbot package, which is why withLlamaIndex is imported from @web/chatbot/next.
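
The ./next entry is essentially a thin wrapper around llamaindex's own Next.js helper; roughly something along these lines (a sketch, not the verbatim file, and the exact import path and export shape may differ):

// src/with-lama-index.mjs (sketch): re-export llamaindex's Next.js config helper as a
// named export so that next.config.mjs can do
// import { withLlamaIndex } from "@web/chatbot/next".
import withLlamaIndex from "llamaindex/next";

export { withLlamaIndex };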

Here's how the package.json of @web/chatbot looks:

{
  "name": "@web/chatbot",
  "private": true,
  "version": "0.1.0",
  "type": "module",
  "exports": {
    ".": "./src/index.ts",
    "./next": "./src/with-lama-index.mjs"
  },
  "license": "MIT",
  "scripts": {
    "clean": "rm -rf .turbo node_modules",
    "format": "prettier --check . --ignore-path ../../.gitignore --ignore-path ../../.prettierignore",
    "lint": "eslint .",
    "typecheck": "tsc --emitDeclarationOnly"
  },
  "devDependencies": {
    "@web/eslint-config": "workspace:*",
    "@web/prettier-config": "workspace:*",
    "@web/tsconfig": "workspace:*",
    "@web/utils": "workspace:*",
    "eslint": "catalog:",
    "prettier": "catalog:",
    "typescript": "catalog:"
  },
  "prettier": "@web/prettier-config",
  "dependencies": {
    "@web/logger": "workspace:*",
    "@t3-oss/env-nextjs": "catalog:",
    "js-tiktoken": "^1.0.14",
    "llamaindex": "catalog:",
    "pg": "^8.13.0",
    "tiktoken": "^1.0.16"
  }
}

For reference: the next.config.mjs and my repo structure are based on the create-t3-turbo repo.

For more context, check #1226.

@mobob commented Oct 9, 2024

I think I'm a victim of this too! My lambda logs:

app-logs        | 2024-10-09T03:02:44 {"timestamp":"2024-10-09T03:02:44.451Z","level":"ERROR","message":{"errorType":"Error","errorMessage":"Missing tiktoken_bg.wasm","stackTrace":["Error: Missing tiktoken_bg.wasm","    at node_modules/.pnpm/[email protected]/node_modules/tiktoken/tiktoken.cjs (/var/task/index.js:137027:13)","    at __require2 (/var/task/index.js:18:53)","    at node_modules/.pnpm/@[email protected]_@[email protected]_@[email protected]_js-tiktoken@1._d65gjdt6k6tmj5pmt5qhv3yqf4/node_modules/@llamaindex/env/dist/tokenizers/node.js (/var/task/index.js:137043:32)","    at __init (/var/task/index.js:15:59)","    at node_modules/.pnpm/@[email protected]_@[email protected]_@[email protected]_js-tiktoken@1._d65gjdt6k6tmj5pmt5qhv3yqf4/node_modules/@llamaindex/env/dist/index.js (/var/task/index.js:137131:5)","    at __init (/var/task/index.js:15:59)","    at node_modules/.pnpm/@[email protected]_@[email protected]_@[email protected]_js-tiktoken@1_333vhfcqr2gqzir7ihb4nbnsrq/node_modules/@llamaindex/core/agent/dist/index.js (/var/task/index.js:141039:5)","    at __init (/var/task/index.js:15:59)","    at node_modules/.pnpm/@[email protected]_@[email protected]_@[email protected]_encoding@0._oeg7rmwedn7wy56dfzvlejhd6a/node_modules/@llamaindex/openai/dist/index.js (/var/task/index.js:147568:5)","    at __init (/var/task/index.js:15:59)"]}}

My stack is:

  • pnpm monorepo, mostly TypeScript within (note: I've had issues with this setup too, as I've needed to install some stuff at the project root for ... some reason!)
  • all deployed to AWS via CDK, Lambdas running Node 18, I believe
  • recently moved to LlamaIndex to parse PDFs and have been grappling with bizarre dep issues all day

I think this is the last workaround needed, and things seem to be OK now. The most recent issue was on Lambda execution: I'd get the above error long before llamaindex was actually used. I'm no bundling expert, but I believe an equivalent of the Next.js workaround, which seems to be working for me, is to install tiktoken as a node module instead of bundling it (via the bundling option below); I have no idea how much it will slow things down.

const settings : cdk.aws_lambda_nodejs.NodejsFunctionProps = {
    handler: "handler",
    runtime: props.serviceConfig.nodeRuntime,
    memorySize: 256,
    tracing: Tracing.ACTIVE,
    bundling: {
      logLevel: LogLevel.INFO,

      nodeModules: ["tiktoken"], // main workaround

      minify: false, //branchIsMain(), tried both
      tsconfig: "../backend/tsconfig.json",

      sourceMap: !branchIsMain(),
      metafile: !branchIsMain(),

      // TODO LlamaIndex hack - see https://github.com/evanw/esbuild/issues/1051
      // This might appear as the following error during synth: No loader is configured for ".node" files: node_modules/.pnpm/onnxruntime-node
      loader: {
        // I needed to add this for another llamaindex dep issue
        ".node": "file",
        // I had this briefly, but now that it's installed I don't need it anymore
        // ".wasm": "file",
      },
    },
  };

Would love some wisdom on this workaround, and to know if there are plans to fix this. I'd also love some insight into whether I really should need to install some of these extra libs in my package.json; I obviously went ahead and did that, but it often didn't have an effect (I'll assume because they were all shaken out). Thanks for tracking this!

Update: the workaround, at least for now, likely won't work... max artifact size. Any other suggestions would be greatly appreciated!
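
One thing I may try next (just a sketch, no idea yet whether it works; it assumes tiktoken is a direct dependency so it is linked at node_modules/tiktoken even under pnpm) is to keep bundling tiktoken but copy only tiktoken_bg.wasm into the Lambda asset via the bundling commandHooks, which should avoid blowing up the artifact size:

import * as cdk from "aws-cdk-lib";

const settings: cdk.aws_lambda_nodejs.NodejsFunctionProps = {
  handler: "handler",
  bundling: {
    // Keep tiktoken bundled (no nodeModules entry), but ship the wasm next to index.js.
    loader: { ".node": "file" },
    commandHooks: {
      beforeBundling: () => [],
      beforeInstall: () => [],
      // Copy the wasm from the project's node_modules into the Lambda asset output.
      afterBundling: (inputDir: string, outputDir: string) => [
        `cp ${inputDir}/node_modules/tiktoken/tiktoken_bg.wasm ${outputDir}/`,
      ],
    },
  },
};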
