
Problem when running some models with cuda #261

@bqhuyy


Issue description

Models keep generating dummy output when running with CUDA.

Expected Behavior

Models should stop generating dummy output, as they do when running with the CPU or Vulkan backends.

Actual Behavior

Models keep generating dummy output indefinitely.

Steps to reproduce

I use this Qwen2 1.5B model, downloaded from here.
Run with `gpu` set to `'auto'` or `'cuda'`:

const llama = await getLlama({gpu: 'cuda'})
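For context, the line above can be expanded into a fuller repro sketch against the node-llama-cpp v3 beta API. The model path and prompt below are placeholders I chose for illustration, not values from the original report, and running it requires a CUDA-capable GPU plus a local GGUF file:

```typescript
import {getLlama, LlamaChatSession} from "node-llama-cpp";

// Select the CUDA backend explicitly; with 'auto' the same CUDA path
// is chosen whenever a CUDA-capable GPU is detected.
const llama = await getLlama({gpu: "cuda"});

// Hypothetical local path to the Qwen2 1.5B GGUF file mentioned above.
const model = await llama.loadModel({
    modelPath: "models/qwen2-1_5b-instruct-q4_k_m.gguf"
});

const context = await model.createContext();
const session = new LlamaChatSession({
    contextSequence: context.getSequence()
});

// With the CUDA backend the reported bug shows up here as endless
// dummy output; with {gpu: "vulkan"} or {gpu: false} (CPU) the same
// prompt completes normally.
const answer = await session.prompt("Hi there");
console.log(answer);
```

Switching only the `gpu` option while keeping everything else identical is what isolates the problem to the CUDA backend.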

My Environment

Dependency              Version
Operating System        Windows 10
CPU                     AMD Ryzen 7 3700X
GPU                     RTX 4090, RTX 3080
Node.js version         v20.11.1
TypeScript version      5.5.2
node-llama-cpp version  3.0.0-beta.36

Additional Context

Here is an example I ran using https://github.com/withcatai/node-llama-cpp/releases/download/v3.0.0-beta.36/node-llama-cpp-electron-example.Windows.3.0.0-beta.36.x64.exe:
(screenshot attached: 2024-07-01 165036)

These models run normally with 'vulkan', 'cpu', and 'metal'.

Relevant Features Used

  • Metal support
  • CUDA support
  • Grammar

Are you willing to resolve this issue by submitting a Pull Request?

Yes, I have the time, but I don't know how to start. I would need guidance.

Metadata

Assignees: none
Labels: bug (Something isn't working)
Milestone: none