Issue description
Consistent "Failed to create context" error from users with Apple M5, regardless of the model
Expected Behavior
I would expect models to load correctly on M5, as they do on every other Mac architecture.
Actual Behavior
I am observing a consistent "Failed to create context" error from users with Apple M5 when they try to load a model, regardless of which model it is.
Steps to reproduce
Here's how I load the model and create the context:

```typescript
import {getLlama, LlamaChatSession} from "node-llama-cpp";

const llama = await getLlama();

// Load the model using the provided path
const model = await llama.loadModel({
    modelPath,
    defaultContextFlashAttention: true
});

// Initialize the model context
const context = await model.createContext({
    contextSize: {max: 4096}
});

// Create a chat session on a sequence from the context
const session = new LlamaChatSession({
    contextSequence: context.getSequence()
});
```
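Since the failure only surfaces in user analytics, I'm considering attaching GPU details to the error report so hardware-specific patterns (like M5) become visible. A rough sketch of what that could look like (`reportError` is a hypothetical stand-in for my analytics call; I'm assuming `llama.gpu` and `llama.getVramState()` behave as described in the node-llama-cpp docs):

```typescript
// Hypothetical diagnostic wrapper around context creation: attach GPU type
// and a VRAM snapshot to the reported failure before re-throwing.
let context;
try {
    context = await model.createContext({contextSize: {max: 4096}});
} catch (error) {
    const vramState = await llama.getVramState(); // VRAM usage snapshot, per the docs
    reportError({ // reportError is a hypothetical analytics helper
        message: String(error),
        gpu: llama.gpu, // e.g. "metal" on Apple Silicon
        vram: vramState
    });
    throw error;
}
```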
My Environment
| Dependency | Version |
| --- | --- |
| Operating System | macOS 24.3.0 |
| CPU | Apple M4 Pro |
| Node.js version | 22.21.1 |
| TypeScript version | 5.9.3 |
| node-llama-cpp version | 3.14.5 |
`npx --yes node-llama-cpp inspect gpu` output:
Additional Context
Please note that I have an M4 Pro; this issue affects M5 machines. I noticed it in user analytics, but I have no way to reproduce it myself. Also note that I am using version 3.14.5. I tried updating to the latest 3.15.1, but with it I frequently and unpredictably hit the error "A context size of 24 is too large for the available VRAM" for all models.
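For what it's worth, on 3.15.1 I considered pinning a lower bound on the context size so the automatic VRAM-based sizing can't collapse to a tiny value. A sketch of that attempt, assuming the documented `{min, max}` form of `contextSize` and the `failedCreationRemedy` option work as described in the docs (the exact option shape may differ):

```typescript
// Hedged workaround sketch for the 3.15.1 "context size of 24" error:
// pin a lower bound so automatic sizing cannot shrink the context below it,
// and let failed creations retry with progressively smaller sizes.
const context = await model.createContext({
    contextSize: {min: 2048, max: 4096}, // refuse to auto-size below 2048
    failedCreationRemedy: { // assumption: option shape as in the docs
        retries: 3,
        autoContextSizeShrink: 0.16 // shrink the context ~16% per retry
    }
});
```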
Relevant Features Used
Are you willing to resolve this issue by submitting a Pull Request?
No, I don’t have the time and I’m okay to wait for the community / maintainers to resolve this issue.