This document provides an in-depth reference for the OffgridMobile application: its architecture, every major subsystem, data models, native integrations, and detailed product flows.
- Product Overview
- Architecture & Technology Stack
- Directory Structure
- Navigation & Screen Map
- State Management (Zustand Stores)
- Data Models & Types
- Core Services
- Native Integration Layer
- Product Flows — Detailed
- Testing Infrastructure
- Constants & Configuration
- File System Layout (On-Device)
- Appendix: Default System Prompt
- Appendix: Default Projects
- Appendix: Claude Code Memory System
OffgridMobile is a privacy-first, on-device AI assistant built with React Native. It runs large language models (LLMs), Stable Diffusion image generators, and Whisper speech-to-text models entirely on the user's phone — no server, no internet required after initial model download.
Core capabilities:
- Text chat with streaming LLM inference (llama.cpp via `llama.rn`)
- Tool calling with automatic tool loop (web search, calculator, date/time, device info)
- Image generation with Stable Diffusion (MNN/QNN backends via LocalDream)
- Voice input via Whisper speech-to-text (whisper.cpp via `whisper.rn`)
- Vision model support (multimodal LLMs with image understanding)
- Document attachment and analysis
- Markdown rendering in chat messages
- Project-based system prompt presets
- Generated image gallery with metadata
- Passphrase lock with lockout protection
- Model browsing and download from Hugging Face
Platform support:
- iOS: Text generation (Metal GPU), Whisper, image generation via Core ML (ANE acceleration).
- Android: Full feature set including image generation (MNN CPU, QNN NPU on Qualcomm), background downloads via system DownloadManager.
| Layer | Technology |
|---|---|
| Framework | React Native (TypeScript) |
| Navigation | React Navigation 7 (native stack + bottom tabs) |
| State | Zustand with persist middleware → AsyncStorage |
| Styling | React Native StyleSheet + dynamic theme system (src/theme/) |
| Capability | Library | Native Backend |
|---|---|---|
| Text LLM | `llama.rn` ^0.11 | llama.cpp (C++) — Metal (iOS), CPU (Android) |
| Image Gen | Custom `LocalDreamModule` | `libstable_diffusion_core.so` subprocess on `localhost:18081` |
| Speech-to-Text | `whisper.rn` ^0.5 | whisper.cpp (C++) |
| Service | Library |
|---|---|
| File I/O | react-native-fs |
| Persistence | @react-native-async-storage/async-storage |
| Secure Storage | react-native-keychain |
| Device Info | react-native-device-info |
| Image Picker | react-native-image-picker |
| Zip Extraction | react-native-zip-archive |
| Icons | react-native-vector-icons (Feather) |
- Lifecycle-independent services — Text and image generation continue running even when the user navigates away from the chat screen. Services use a subscriber/observer pattern so any screen can re-attach.
- Selective persistence — Only durable state is persisted (conversations, settings, downloaded model metadata). Transient UI state (streaming position, loading flags) is kept in memory only.
- Two model loading strategies — "Performance" keeps the model in RAM across generations; "Memory" unloads after each generation to free RAM.
- Hybrid intent classification — Fast regex pattern matching with optional LLM fallback for ambiguous prompts.
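The lifecycle-independent pattern above can be sketched as a plain observable service. This is an illustrative minimal sketch, not the app's actual `generationService` implementation; the class and type names here are assumptions.

```typescript
// Sketch of a lifecycle-independent service using the subscriber pattern.
// Generation keeps running whether or not any screen is subscribed; a screen
// that re-mounts simply re-attaches and replays the current state.
type GenerationState = { isGenerating: boolean; text: string };
type Listener = (state: GenerationState) => void;

class GenerationServiceSketch {
  private state: GenerationState = { isGenerating: false, text: '' };
  private listeners = new Set<Listener>();

  // Screens subscribe on mount and call the returned function on unmount.
  subscribe(listener: Listener): () => void {
    this.listeners.add(listener);
    listener(this.state); // replay current state so a re-attaching screen catches up
    return () => this.listeners.delete(listener);
  }

  private setState(patch: Partial<GenerationState>) {
    this.state = { ...this.state, ...patch };
    this.listeners.forEach((l) => l(this.state));
  }

  appendToken(token: string) {
    this.setState({ isGenerating: true, text: this.state.text + token });
  }

  finish() {
    this.setState({ isGenerating: false });
  }
}
```

The key property: unsubscribing detaches the UI without touching the generation itself.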
OffgridMobile/
├── App.tsx # Root component: init, auth gate, navigation
├── app.json # RN app config (name: "OffgridMobile", displayName: "Off Grid")
├── package.json # Dependencies & scripts
├── tsconfig.json # TypeScript config
│
├── src/
│ ├── assets/
│ │ └── logo.png # App logo
│ │
│ ├── components/ # Reusable UI components
│ │ ├── AnimatedEntry.tsx # Animated mount/unmount wrapper
│ │ ├── AnimatedListItem.tsx # Animated list item wrapper
│ │ ├── AnimatedPressable.tsx # Animated press feedback wrapper
│ │ ├── AppSheet.tsx # Bottom sheet wrapper
│ │ ├── Button.tsx # Styled button
│ │ ├── Card.tsx # Card layout
│ │ ├── ChatInput.tsx # Message input bar (text, voice, attachments, image mode)
│ │ ├── ChatMessage.tsx # Single message bubble (streaming, images, metadata)
│ │ ├── CustomAlert.tsx # Alert dialog
│ │ ├── DebugSheet.tsx # Debug info bottom sheet
│ │ ├── GenerationSettingsModal.tsx # All generation settings in a modal
│ │ ├── ModelCard.tsx # Model browser card with compact/full modes, icon actions
│ │ ├── ModelSelectorModal.tsx # Model picker modal (text + image models)
│ │ ├── ProjectSelectorSheet.tsx # Project picker bottom sheet
│ │ ├── ThinkingIndicator.tsx # Thinking/loading indicator
│ │ └── VoiceRecordButton.tsx # Long-press voice recording with waveform
│ │
│ ├── screens/ # Screen components (19 screens)
│ │ ├── OnboardingScreen.tsx # Welcome slides
│ │ ├── ModelDownloadScreen.tsx # First model download during onboarding
│ │ ├── HomeScreen.tsx # Dashboard: active models, memory, recent chats
│ │ ├── ChatScreen.tsx # Main chat (67KB — largest screen)
│ │ ├── ChatsListScreen.tsx # Conversation list
│ │ ├── ModelsScreen.tsx # Model browser (text + image tabs)
│ │ ├── ProjectsScreen.tsx # Projects list
│ │ ├── ProjectDetailScreen.tsx # View project + linked chats
│ │ ├── ProjectEditScreen.tsx # Create/edit project
│ │ ├── GalleryScreen.tsx # Generated image grid
│ │ ├── SettingsScreen.tsx # Settings hub
│ │ ├── ModelSettingsScreen.tsx # LLM + image gen parameters
│ │ ├── VoiceSettingsScreen.tsx # Whisper model management
│ │ ├── DeviceInfoScreen.tsx # Hardware specs
│ │ ├── StorageSettingsScreen.tsx # Per-model storage usage
│ │ ├── SecuritySettingsScreen.tsx # Passphrase toggle + change
│ │ ├── LockScreen.tsx # Passphrase entry with lockout
│ │ ├── PassphraseSetupScreen.tsx # Initial passphrase creation
│ │ └── DownloadManagerScreen.tsx # Active downloads (modal)
│ │
│ ├── navigation/
│ │ ├── AppNavigator.tsx # Root stack + tab navigator definitions
│ │ ├── types.ts # Navigation param types
│ │ └── index.ts
│ │
│ ├── stores/ # Zustand state stores
│ │ ├── appStore.ts # App-wide state (models, settings, device, gallery)
│ │ ├── chatStore.ts # Conversations + messages + streaming
│ │ ├── authStore.ts # Auth state + lockout
│ │ ├── projectStore.ts # Projects (system prompt presets)
│ │ └── whisperStore.ts # Whisper model state
│ │
│ ├── services/ # Business logic & native bridges
│ │ ├── llm.ts # LLMService — llama.rn context, streaming, GPU
│ │ ├── activeModelService.ts # Singleton — load/unload text & image models
│ │ ├── modelManager.ts # Download, store, track model files
│ │ ├── generationService.ts # Lifecycle-independent text generation
│ │ ├── imageGenerationService.ts # Lifecycle-independent image generation
│ │ ├── localDreamGenerator.ts # ONNX SD wrapper (native subprocess)
│ │ ├── imageGenerator.ts # Image generator helper
│ │ ├── intentClassifier.ts # Pattern + LLM intent detection
│ │ ├── huggingface.ts # HF API: search, files, credibility
│ │ ├── huggingFaceModelBrowser.ts # Image model browsing
│ │ ├── coreMLModelBrowser.ts # iOS Core ML model discovery from Apple HF repos
│ │ ├── whisperService.ts # Whisper model download/load/transcribe
│ │ ├── voiceService.ts # Native voice input bridge
│ │ ├── authService.ts # Passphrase hash + keychain
│ │ ├── hardware.ts # Device info, RAM, recommendations
│ │ ├── backgroundDownloadService.ts # DownloadManager bridge (Android + iOS)
│ │ ├── documentService.ts # Document text extraction
│ │ ├── pdfExtractor.ts # Native PDF text extraction
│ │ ├── generationToolLoop.ts # Multi-turn tool loop orchestration (max 3 iterations)
│ │ ├── llmToolGeneration.ts # Tool-aware LLM generation with schema injection
│ │ └── tools/ # Tool calling subsystem
│ │ ├── index.ts # Public exports
│ │ ├── registry.ts # Tool definitions, OpenAI schema conversion
│ │ ├── handlers.ts # Tool execution (web search, calculator, datetime, device info)
│ │ └── types.ts # ToolDefinition, ToolCall, ToolResult types
│ │
│ ├── hooks/
│ │ ├── useAppState.ts # AppState foreground/background tracking
│ │ ├── useFocusTrigger.ts # Screen focus trigger hook
│ │ ├── useVoiceRecording.ts # Voice recording state machine
│ │ └── useWhisperTranscription.ts # Whisper transcription hook
│ │
│ ├── types/
│ │ ├── index.ts # All TypeScript interfaces & type aliases
│ │ └── whisper.rn.d.ts # Whisper native module type declarations
│ │
│ ├── theme/ # Light/dark theme system
│ │ ├── index.ts # useTheme() hook, getTheme(), Theme type
│ │ ├── palettes.ts # COLORS_LIGHT/DARK, SHADOWS_LIGHT/DARK, createElevation()
│ │ └── useThemedStyles.ts # useThemedStyles() — memoized style factory
│ │
│ ├── constants/
│ │ └── index.ts # Model recommendations, curated models, org filters, quantization info, HF config, typography, spacing
│ │
│ └── utils/
│ ├── coreMLModelUtils.ts # Core ML model path resolution helpers
│ ├── haptics.ts # Haptic feedback utilities
│ └── messageContent.ts # Strip LLM control tokens from output
│
├── android/ # Android native code
│ └── app/src/main/java/ai/offgridmobile/
│ ├── MainActivity.kt # Main activity
│ ├── MainApplication.kt # Application entry point
│ ├── localdream/
│ │ ├── LocalDreamModule.kt # Stable Diffusion native module
│ │ └── LocalDreamPackage.kt # Package registration
│ ├── download/
│ │ ├── DownloadManagerModule.kt # Background download native module
│ │ ├── DownloadManagerPackage.kt # Package registration
│ │ └── DownloadCompleteBroadcastReceiver.kt # Broadcast receiver
│ └── pdf/
│ ├── PDFExtractorModule.kt # Native PDF text extraction
│ └── PDFExtractorPackage.kt # Package registration
│
├── ios/ # iOS native code
│ └── OffgridMobile/
│ ├── AppDelegate.swift # Application delegate
│ ├── CoreMLDiffusion/
│ │ ├── CoreMLDiffusionModule.swift # Core ML image generation
│ │ └── CoreMLDiffusionModule.m # ObjC bridge
│ ├── Download/
│ │ ├── DownloadManagerModule.swift # iOS download manager
│ │ └── DownloadManagerModule.m # ObjC bridge
│ └── PDFExtractor/
│ ├── PDFExtractorModule.swift # Native PDF text extraction
│ └── PDFExtractorModule.m # ObjC bridge
│
├── __tests__/ # Test suites
│ ├── unit/ # Store & service unit tests
│ │ ├── stores/ # appStore, chatStore, authStore
│ │ ├── services/ # 12 service test files
│ │ ├── constants/ # Constants tests
│ │ └── utils/ # Utility tests
│ ├── integration/ # Multi-service integration tests
│ │ ├── generation/ # generationFlow, imageGenerationFlow
│ │ ├── models/ # activeModelService
│ │ └── stores/ # chatStoreIntegration
│ ├── contracts/ # Native module contract tests (7 files)
│ ├── rntl/ # React Native Testing Library tests
│ │ ├── screens/ # 6 screen tests
│ │ └── components/ # 3 component tests
│ ├── specs/ # Behavior specifications (YAML)
│ └── utils/ # Test helpers & factories
│
├── .maestro/ # E2E tests (Maestro framework)
│ ├── E2E_TESTING.md # E2E testing guide
│ ├── flows/p0/ # 16 critical-path E2E flows
│ ├── flows/p1/ # Important-path E2E flows (planned)
│ ├── flows/p2/ # Nice-to-have E2E flows (planned)
│ └── utils/
│
├── docs/ # Documentation
│ ├── ARCHITECTURE.md # System architecture & build guide
│ ├── standards/
│ │ └── CODEBASE_GUIDE.md # This file — comprehensive architecture guide
│ ├── design/
│ │ ├── DESIGN_PHILOSOPHY_SYSTEM.md # Design system reference
│ │ └── VISUAL_HIERARCHY_STANDARD.md # Visual hierarchy guidelines
│ └── test/
│ ├── CLAUDE_TEST_SKILL.md # Claude test generation skill
│ ├── TEST_FLOWS.md # End-to-end test flows
│ ├── TEST_COVERAGE_REPORT.md # Test coverage report
│ ├── TEST_PRIORITY_MAP.md # Test priority mapping
│ └── TEST_SPEC_FORMAT.md # Test specification format
│
├── patches/ # patch-package patches
│
└── (Claude Code Memory — External) # NOT in repo
~/.claude/projects/-Users-mac-wednesday-on-device-llm-LocalLLM/memory/
└── MEMORY.md # Persistent learnings across conversations
RootStack
│
├── OnboardingScreen (shown once, first launch)
├── ModelDownloadScreen (shown if no models downloaded after onboarding)
├── MainTabs (primary app interface)
├── DownloadManagerScreen (modal overlay)
└── GalleryScreen (modal overlay, fullscreen image viewer)
MainTabs
│
├── HomeTab
│ └── HomeScreen
│
├── ChatsTab (Stack)
│ ├── ChatsListScreen
│ └── ChatScreen
│
├── ProjectsTab (Stack)
│ ├── ProjectsScreen
│ ├── ProjectDetailScreen
│ └── ProjectEditScreen (modal presentation)
│
├── ModelsTab
│ └── ModelsScreen
│
└── SettingsTab (Stack)
├── SettingsScreen
├── ModelSettingsScreen
├── VoiceSettingsScreen
├── DeviceInfoScreen
├── StorageSettingsScreen
└── SecuritySettingsScreen
| Screen | Purpose | Key testIDs |
|---|---|---|
| OnboardingScreen | 4 welcome slides (welcome, privacy, offline, model choice). Shown once. | onboarding-screen |
| ModelDownloadScreen | Recommends a model based on device RAM. User downloads or skips. | model-download-screen |
| HomeScreen | Dashboard: active text/image models, memory usage (used/total), recent conversations with message preview and smart date formatting, quick "New Chat" button. | home-screen, new-chat-button |
| ChatScreen | Full chat interface. Streaming messages, model selector, project selector, generation settings, image generation with live preview, voice input, document attachments, debug panel. | chat-screen, chat-input, send-button, stop-button |
| ChatsListScreen | Sorted conversation list with compact items. Shows title, last message preview snippet, project badge, timestamp. Swipe-to-delete. | conversation-list |
| ModelsScreen | Two sections: Text Models and Image Models. Curated recommendations by RAM, search bar, advanced filters (org, size, quantization, type, credibility). Local .gguf import. Download progress, pause/cancel. Compact card layout with icon actions. | models-screen, model-list |
| ProjectsScreen | List of system prompt presets. Shows name, description snippet, linked chat count. Default projects: General Assistant, Spanish Learning, Code Review, Writing Helper. | projects-screen |
| ProjectDetailScreen | Full project view: name, system prompt, description, linked conversations list. | |
| ProjectEditScreen | Create/edit form: name, description, system prompt, icon selection. | |
| GalleryScreen | 3-column image grid. Filter by conversation. Multi-select for batch delete. Save to device. View metadata (prompt, steps, seed, model). | gallery-screen |
| SettingsScreen | Hub with sections: Model Settings, Voice Settings, Security, Storage, Device Info. | settings-screen |
| ModelSettingsScreen | Sliders/inputs for: system prompt, temperature (0–2), top-p (0–1), repeat penalty (1–2), max tokens, context length, threads, batch size, GPU toggle + layers, image gen steps/guidance/resolution, loading strategy, generation details toggle. | |
| VoiceSettingsScreen | Download/select Whisper model (tiny/base/small, English or multilingual). | |
| DeviceInfoScreen | Device model, OS, total/available RAM, total/available storage, emulator flag, GPU capabilities. | |
| StorageSettingsScreen | Per-category storage (text models, image models, whisper, gallery). Per-model sizes. Delete from here. | |
| SecuritySettingsScreen | Toggle passphrase lock. Change passphrase (requires old). | |
| LockScreen | Passphrase input. Shows lockout timer (MM:SS) after 5 failed attempts. 5-minute lockout. | lock-screen |
| PassphraseSetupScreen | Set new passphrase with confirmation. Must match. | |
| DownloadManagerScreen | Modal showing all active/completed/failed downloads with progress bars, pause/resume/cancel/retry controls. | |
All stores use zustand/middleware persist with AsyncStorage. Only serializable, durable data is persisted; transient UI flags are excluded via partialize.
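The `partialize` idea can be shown with a plain function: given the full state, return only the durable fields. This is a sketch with an illustrative state shape, not the store's actual code.

```typescript
// Illustrative partialize: only the fields returned here ever reach
// AsyncStorage; transient streaming flags stay in memory.
type ChatState = {
  conversations: object[];
  activeConversationId: string | null;
  streamingMessage: string; // transient
  isStreaming: boolean;     // transient
};

const partialize = (state: ChatState) => ({
  conversations: state.conversations,
  activeConversationId: state.activeConversationId,
});
```

In the real stores this function is passed as the `partialize` option of zustand's persist middleware.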
| State Group | Fields | Notes |
|---|---|---|
| Onboarding | `hasCompletedOnboarding` | Set true once, never reset |
| Device | `deviceInfo`, `modelRecommendation` | Refreshed on app start |
| Downloaded Models | `downloadedModels[]`, `downloadedImageModels[]` | Metadata only; files on disk |
| Active Models | `activeModelId`, `activeImageModelId` | Persisted; model re-loaded on next use |
| Loading Flags | `isLoadingModel`, `isGeneratingImage` | Not persisted |
| Downloads | `downloadProgress{}`, `activeBackgroundDownloads[]` | Background downloads persisted (Android) |
| Settings | `systemPrompt`, `temperature`, `maxTokens`, `topP`, `repeatPenalty`, `contextLength`, `nThreads`, `nBatch`, `useGPU`, `nGPULayers`, `modelLoadingStrategy`, `flashAttention`, `kvCacheType` | All persisted |
| Image Settings | `imageSteps`, `imageGuidanceScale`, `imageWidth`, `imageHeight`, `imageThreads` | All persisted |
| Intent | `imageGenerationMode`, `autoDetectMethod`, `classifierModelId` | Persisted |
| Tools | `enabledTools[]` | User-selected tool IDs (default: `['calculator', 'get_current_datetime']`). Persisted |
| UI | `showGenerationDetails` | Persisted |
| Gallery | `generatedImages[]` | Full metadata array, persisted |
| State Group | Fields | Notes |
|---|---|---|
| Conversations | `conversations[]` | Full conversation objects with all messages |
| Active | `activeConversationId` | Which chat is currently open |
| Streaming | `streamingMessage`, `isStreaming`, `isThinking`, `streamingForConversationId` | Not persisted |
| Actions | `createConversation()`, `deleteConversation()`, `addMessage()`, `updateMessage()`, `deleteMessage()`, `deleteMessagesAfter()`, `setStreaming()`, `clearAllConversations()` | |
| Field | Type | Notes |
|---|---|---|
| `isEnabled` | boolean | Whether passphrase lock is turned on |
| `isLocked` | boolean | Current lock state |
| `failedAttempts` | number | Resets on success |
| `lockoutUntil` | number \| null | Unix timestamp when lockout expires |
| `lastBackgroundTime` | number \| null | When app went to background (for auto-lock) |
| Constants | `MAX_ATTEMPTS = 5`, `LOCKOUT_DURATION = 5 min` | |
| Field | Notes |
|---|---|
| `projects[]` | Array of `Project` objects |
| Default projects | General Assistant, Spanish Learning, Code Review, Writing Helper |
| Actions | `createProject()`, `updateProject()`, `deleteProject()`, `duplicateProject()` |
| Field | Notes |
|---|---|
| `downloadedModelId` | Which Whisper model is downloaded |
| `isLoading`, `isDownloading` | Transient flags |
| Actions | `downloadModel()`, `loadModel()`, `unloadModel()`, `deleteModel()` |
ModelInfo # Model from HuggingFace API
├── id, name, author
├── description, downloads, likes, tags
├── files: ModelFile[]
└── credibility?: ModelCredibility
ModelFile # A specific quantized file for a model
├── name, size, quantization, downloadUrl
└── mmProjFile?: { name, size, downloadUrl } # Vision companion
DownloadedModel # A model file on disk
├── id, name, author
├── filePath, fileName, fileSize, quantization
├── downloadedAt, credibility?
└── isVisionModel?, mmProjPath?, mmProjFileName?, mmProjFileSize?
ONNXImageModel # Stable Diffusion model on disk
├── id, name, description
├── modelPath, downloadedAt, size
├── style? ('creative' | 'photorealistic' | 'anime')
└── backend? ('mnn' | 'qnn')
Conversation
├── id, title, modelId
├── messages: Message[]
├── createdAt, updatedAt
└── projectId?
Message
├── id, role ('user' | 'assistant' | 'system')
├── content, timestamp
├── isStreaming?, isThinking?, isSystemInfo?
├── attachments?: MediaAttachment[]
├── generationTimeMs?
└── generationMeta?: GenerationMeta
MediaAttachment
├── id, type ('image' | 'document'), uri
├── mimeType?, width?, height?, fileName?
├── textContent? (extracted document text)
└── fileSize?
GenerationMeta
├── gpu, gpuBackend?, gpuLayers?
├── kvCacheType?, flashAttention?
├── modelName?
├── tokensPerSecond?, decodeTokensPerSecond?
├── timeToFirstToken?, tokenCount?
└── steps?, guidanceScale?, resolution?
GeneratedImage
├── id, prompt, negativePrompt?
├── imagePath, width, height
├── steps, seed, modelId
└── createdAt, conversationId?
Project
├── id, name, description, systemPrompt
└── icon?, createdAt, updatedAt
| Type | Values | Used By |
|---|---|---|
| `ModelSource` | `'lmstudio' \| 'official' \| 'verified-quantizer' \| 'community'` | Credibility badges |
| `ImageGenerationMode` | `'auto' \| 'manual'` | Settings: auto-detect vs explicit |
| `AutoDetectMethod` | `'pattern' \| 'llm'` | Settings: fast regex vs LLM fallback |
| `ModelLoadingStrategy` | `'performance' \| 'memory'` | Settings: keep loaded vs load-on-demand |
| `ImageModeState` | `'auto' \| 'force'` | Chat input toggle |
| `BackgroundDownloadStatus` | `'pending' \| 'running' \| 'paused' \| 'completed' \| 'failed' \| 'unknown'` | Download manager |
The central service for on-device text inference.
Responsibilities:
- Initialize and manage the llama.rn `LlamaContext`
- Configure GPU offloading (Metal on iOS, disabled on Android for stability)
- Stream tokens to callbacks during generation
- Track performance metrics (tok/s, TTFT, decode tok/s)
- Handle context window management (85% utilization cap, smart truncation)
- Support multimodal/vision models via mmproj files
- KV cache management (clear between conversations)
- Session caching for repeated system prompts
- Tool calling capability detection via jinja chat template introspection
- Configurable KV cache type (f16, q8_0, q4_0) and flash attention toggle
- Parameter constraint enforcement (GPU/flash attention/KV cache compatibility on Android)
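The 85% context-utilization cap can be sketched as follows. This is a minimal sketch under stated assumptions: the token estimator (~4 characters per token) and the function names are illustrative, not the service's real implementation.

```typescript
// Sketch of context window management: estimate tokens, then drop the
// oldest messages until the history fits within 85% of the context length.
type Msg = { role: string; content: string };

const estimateTokens = (text: string) => Math.ceil(text.length / 4); // rough heuristic

function truncateHistory(messages: Msg[], contextLength: number): Msg[] {
  const budget = Math.floor(contextLength * 0.85);
  const kept: Msg[] = [];
  let used = 0;
  // Walk from newest to oldest so the most recent turns survive.
  for (let i = messages.length - 1; i >= 0; i--) {
    const cost = estimateTokens(messages[i].content);
    if (used + cost > budget) break;
    kept.unshift(messages[i]);
    used += cost;
  }
  return kept;
}
```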
Platform defaults:
| Parameter | iOS | Android |
|---|---|---|
| Threads | 4 | 6 |
| Batch size | 256 | 256 |
| GPU layers | 99 (Metal) | 0 (disabled) |
| Context length | 2048 | 2048 |
Singleton that manages which models are loaded in native memory.
Responsibilities:
- Load/unload text models (llama.rn context creation)
- Load/unload image models (LocalDream subprocess)
- Memory budget enforcement (60% of device RAM max, warning at 50%)
- Memory estimation: 1.5x file size for text, 1.8x for image
- Automatic unload of previous model before loading new one
- Observable pattern for UI subscriptions
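The budget check combines the multipliers and thresholds above. A minimal sketch, assuming a function name and return shape that are illustrative rather than the singleton's actual API:

```typescript
// Sketch of the memory-budget check: 1.5x file size for text models, 1.8x
// for image models, capped at 60% of device RAM with a warning above 50%.
type ModelKind = 'text' | 'image';

function checkMemoryBudget(
  fileSizeBytes: number,
  deviceRamBytes: number,
  kind: ModelKind,
): { allowed: boolean; warn: boolean; estimatedBytes: number } {
  const multiplier = kind === 'text' ? 1.5 : 1.8;
  const estimatedBytes = fileSizeBytes * multiplier;
  return {
    allowed: estimatedBytes <= deviceRamBytes * 0.6,
    warn: estimatedBytes > deviceRamBytes * 0.5,
    estimatedBytes,
  };
}
```

For example, a 2 GB text model on an 8 GB device estimates to ~3 GB, comfortably under the 4.8 GB cap; a 4 GB image model estimates to ~7.2 GB and is rejected.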
Handles model file lifecycle on disk.
Responsibilities:
- Download from Hugging Face (foreground via RNFS, background via Android DownloadManager)
- Import local `.gguf` files from device storage (Bring Your Own Model)
- Store text models in `Documents/local-llm/models/`
- Store image models in `Documents/image_models/`
- Track downloaded model metadata in AsyncStorage
- Handle vision model companion files (mmproj)
- Verify file integrity
- Delete models and clean up
- Recover models after app kill
Lifecycle-independent text generation manager.
Responsibilities:
- Manage generation state outside of any screen's lifecycle
- Subscriber pattern: screens subscribe/unsubscribe to generation state
- Handles app backgrounding during generation
- Tracks generation progress and completion
Lifecycle-independent image generation manager.
Responsibilities:
- Orchestrate the full image generation pipeline
- Listen to native `LocalDreamProgress` events
- Save generated images to gallery store
- Insert generated image as assistant message in chat
- Preview path management during generation
- Continue generating even when user navigates away
Determines whether a user message should trigger text generation or image generation.
Two-stage pipeline:
1. Pattern matching (fast, no LLM needed):
   - 45+ image patterns: "draw", "generate image", "paint", "create a picture", art styles, DALL-E references, negative prompts, resolution specs
   - 40+ text patterns: questions ("what is", "how do"), code requests, math, analysis, explanation
   - Short messages (<10 chars) → text
   - Multiple sentences with punctuation → text
2. LLM classification (fallback for ambiguous cases):
   - Simple yes/no prompt to the LLM
   - Can use a separate lightweight classifier model
   - Result cached (max 100 entries)
   - Falls back to text if LLM unavailable
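The pattern stage can be sketched as below. The real classifier has 45+ image and 40+ text patterns; the handful of regexes here are illustrative only.

```typescript
// Minimal sketch of the pattern-matching stage of intent classification.
const IMAGE_PATTERNS = [/\bdraw\b/i, /\bgenerate (an? )?image\b/i, /\bpaint\b/i, /\bcreate a picture\b/i];
const TEXT_PATTERNS = [/^what is\b/i, /^how do\b/i, /\bexplain\b/i];

type Intent = 'image' | 'text' | 'ambiguous';

function classifyByPattern(message: string): Intent {
  if (message.trim().length < 10) return 'text'; // short messages default to text
  if (IMAGE_PATTERNS.some((p) => p.test(message))) return 'image';
  if (TEXT_PATTERNS.some((p) => p.test(message))) return 'text';
  return 'ambiguous'; // hand off to the LLM stage
}
```

Only messages that come back `ambiguous` pay the cost of the LLM fallback.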
API client for model discovery.
Key methods:
- `searchModels(query, options)` — GGUF filter, sort by downloads
- `getModelFiles(modelId)` — list quantized files with sizes, auto-pair mmproj companions
- `getDownloadUrl(modelId, fileName)` — construct download URL
Credibility determination:
- LM Studio authors (highest) → Official model creators → Verified quantizers → Community
Speech-to-text model management and transcription.
Models available:
| Model | Size | Language |
|---|---|---|
| tiny.en | 75 MB | English only |
| tiny | 75 MB | Multilingual |
| base.en | 142 MB | English only |
| base | 142 MB | Multilingual |
| small.en | 466 MB | English only |
Transcription modes:
- Realtime: Streams partial results every ~3 seconds
- File: Batch process a recorded audio file
Passphrase management.
- Hashes the passphrase with 1000 iterations
- Stores the hash in the device Keychain (encrypted native storage)
- Methods: `setPassphrase()`, `verifyPassphrase()`, `hasPassphrase()`, `removePassphrase()`
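Iterated hashing can be sketched as below. This is an assumption-laden sketch: the document specifies 1000 iterations but not the digest algorithm, so SHA-256 is assumed here, and the salt parameter is added for illustration.

```typescript
// Sketch of iterated passphrase hashing (algorithm and salt are assumptions,
// not the app's documented scheme).
import { createHash } from 'crypto';

function hashPassphrase(passphrase: string, salt: string, rounds = 1000): string {
  let digest = salt + passphrase;
  for (let i = 0; i < rounds; i++) {
    digest = createHash('sha256').update(digest).digest('hex');
  }
  return digest;
}

function verifyPassphrase(input: string, salt: string, storedHash: string): boolean {
  return hashPassphrase(input, salt) === storedHash;
}
```

Only the final digest is stored in the Keychain; the passphrase itself never touches disk.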
Bridge to native download managers on both platforms.
- Downloads continue even after app is killed (both Android and iOS)
- Android: Persists download state in SharedPreferences, 500ms polling for progress
- iOS: Uses background URLSession with app lifecycle integration
- Emits events: `DownloadProgress`, `DownloadComplete`, `DownloadError`
- Moves completed files from the Downloads temp directory to the models directory
- Tracks event delivery separately from completion status to prevent race conditions
Tool Calling Services (src/services/tools/, src/services/generationToolLoop.ts, src/services/llmToolGeneration.ts)
On-device function calling for compatible models.
Tool Registry (tools/registry.ts):
- Defines 4 built-in tools: `web_search`, `calculator`, `get_current_datetime`, `get_device_info`
- Converts tool definitions to OpenAI function calling schema for llama.cpp
- Generates system prompt hints listing available tools
Tool Handlers (tools/handlers.ts):
- `web_search` — scrapes Brave Search, returns top 5 results with clickable URLs
- `calculator` — recursive descent parser (no `eval()`), supports `+`, `-`, `*`, `/`, `%`, `^`, `()`
- `get_current_datetime` — formatted date/time with optional timezone
- `get_device_info` — battery, storage, memory via `react-native-device-info`
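A recursive-descent calculator of this kind can be sketched in a few dozen lines. This is an illustrative sketch of the technique, not the handler's actual code; the grammar chosen here is expr → term (('+'|'-') term)\*, term → power (('\*'|'/'|'%') power)\*, power → unary ('^' power)?, unary → '-'? atom.

```typescript
// eval()-free recursive-descent expression parser supporting + - * / % ^ ().
function calculate(input: string): number {
  let pos = 0;
  const src = input.replace(/\s+/g, '');

  const peek = () => src[pos];
  const expect = (ch: string) => {
    if (src[pos] !== ch) throw new Error(`expected '${ch}' at ${pos}`);
    pos++;
  };

  function expr(): number {
    let value = term();
    while (peek() === '+' || peek() === '-') {
      const op = src[pos++];
      value = op === '+' ? value + term() : value - term();
    }
    return value;
  }

  function term(): number {
    let value = power();
    while (peek() === '*' || peek() === '/' || peek() === '%') {
      const op = src[pos++];
      const rhs = power();
      value = op === '*' ? value * rhs : op === '/' ? value / rhs : value % rhs;
    }
    return value;
  }

  function power(): number {
    const base = unary();
    if (peek() === '^') {
      pos++;
      return base ** power(); // right-associative exponentiation
    }
    return base;
  }

  function unary(): number {
    if (peek() === '-') { pos++; return -unary(); }
    return atom();
  }

  function atom(): number {
    if (peek() === '(') {
      pos++;
      const value = expr();
      expect(')');
      return value;
    }
    const match = /^\d+(\.\d+)?/.exec(src.slice(pos));
    if (!match) throw new Error(`unexpected character at ${pos}`);
    pos += match[0].length;
    return parseFloat(match[0]);
  }

  const result = expr();
  if (pos !== src.length) throw new Error(`trailing input at ${pos}`);
  return result;
}
```

Parsing instead of `eval()` keeps tool input from ever being executed as JavaScript.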
Tool Loop (generationToolLoop.ts):
- Orchestrates multi-turn tool execution: LLM → parse → execute → inject → repeat
- Hard limits: 3 iterations, 5 total tool calls
- Supports structured tool calls AND fallback XML tag parsing for smaller models
- Empty web search queries fall back to last user message
LLM Tool Generation (llmToolGeneration.ts):
- Reserves ~100 tokens per tool in context window for schema injection
- Passes tool schemas via `tool_choice: 'auto'` to llama.rn
- Prefers completion result tool calls over streaming (more complete)
Stable Diffusion image generation via a native subprocess.
Architecture:
- Spawns `libstable_diffusion_core.so` as a subprocess
- The subprocess runs an HTTP server on `localhost:18081`
localhost:18081 - TypeScript layer makes HTTP POST requests for generation
- Receives SSE (Server-Sent Events) stream with progress + base64 preview images
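Handling that stream reduces to parsing SSE `data:` lines. A minimal sketch; the event payload shape (`step`, `totalSteps`, `preview`) is an assumption for illustration, not the subprocess's documented protocol.

```typescript
// Sketch of parsing one line from the subprocess's SSE stream.
type ProgressEvent = { step: number; totalSteps: number; preview?: string };

function parseSseLine(line: string): ProgressEvent | null {
  if (!line.startsWith('data:')) return null; // ignore comments and keep-alives
  const payload = line.slice('data:'.length).trim();
  try {
    return JSON.parse(payload) as ProgressEvent;
  } catch {
    return null; // malformed payload: skip rather than crash the stream
  }
}
```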
Backend support:
| Backend | Hardware | Model Format | Files |
|---|---|---|---|
| MNN (CPU) | All Android | `.mnn` | CLIP, UNet, VAE decoder, tokenizer |
| QNN (NPU) | Qualcomm Snapdragon | `.bin` | Same components, Hexagon DSP optimized |
Key native methods:
- `loadModel(path)`, `unloadModel()`, `isModelLoaded()`
- `generateImage(prompt, negativePrompt, steps, guidanceScale, width, height, seed)`
- `cancelGeneration()`
- `saveRgbAsPng(base64, width, height, path)`
- `isNpuSupported()` — checks for Qualcomm chipset
QNN runtime libraries, extracted from assets to `runtime_libs/`:
- `libQnnHtp.so` (Hexagon DSP backend)
- `libQnnSystem.so` (QNN system library)
Android system DownloadManager integration.
Key native methods:
- `startDownload(url, fileName)` — enqueues in system DownloadManager
- `cancelDownload(downloadId)`
- `getActiveDownloads()` — reads from SharedPreferences
- `getDownloadProgress(downloadId)` — queries DownloadManager
- `moveCompletedDownload(downloadId, destPath)` — moves from temp to models dir
- `startProgressPolling()` / `stopProgressPolling()` — 500 ms interval
Stable Diffusion image generation via Apple's ml-stable-diffusion Core ML pipeline.
Architecture:
- In-process `StableDiffusionPipeline` (no subprocess)
- Core ML auto-dispatches across CPU, GPU (Metal), and ANE (Apple Neural Engine)
- DPM-Solver multistep scheduler for faster convergence
- `reduceMemory` mode for iPhones with limited RAM
Key native methods:
- `loadModel(params)`, `unloadModel()`, `isModelLoaded()`
- `generateImage(params)` — with step-by-step progress callbacks
- `cancelGeneration()` — boolean flag checked between steps
- `isNpuSupported()` — always true (Core ML uses the ANE automatically)
Model format: `.mlmodelc` compiled Core ML models from Apple's HuggingFace repos.
iOS background download manager using URLSession with background configuration.
Key differences from Android:
- Delegate-based progress callbacks (not polling)
- Survives app suspension but NOT user force-quit
- Temporary file on completion must be moved immediately
Additional iOS dependencies:
- `llama.rn` for Metal-accelerated LLM inference (99 GPU layers by default)
- `whisper.rn` for speech-to-text
- Standard RN library natives for everything else
| Package | Native Functionality |
|---|---|
| `llama.rn` | llama.cpp context creation, completion streaming, GPU offload |
| `whisper.rn` | whisper.cpp context, realtime + file transcription |
| `react-native-fs` | File read/write/download/stat/mkdir |
| `react-native-device-info` | RAM, device model, OS, emulator detection |
| `react-native-keychain` | Encrypted credential storage |
| `react-native-image-picker` | Camera and gallery image selection |
| `react-native-zip-archive` | Model archive extraction |
This section expands on every testable flow, grouped by feature area. Each flow includes the trigger, step-by-step behavior, services/stores involved, and edge cases.
Trigger: User taps app icon (fresh install or subsequent launch).
Steps:
1. `App.tsx` mounts → shows loading screen
2. Hardware service queries device info (RAM, model, OS) → stores in `appStore.deviceInfo`
3. Model recommendations calculated from RAM tier → `appStore.modelRecommendation`
4. ModelManager syncs the downloaded models list (verifies files still exist on disk)
5. On Android: sync background download state from SharedPreferences
6. AuthStore checked: if `isEnabled` and a passphrase exists → show `LockScreen`
7. Otherwise, check `hasCompletedOnboarding`:
   - `false` → navigate to `OnboardingScreen`
   - `true` + no downloaded models → `ModelDownloadScreen`
   - `true` + has models → `MainTabs`
Services: HardwareService, ModelManager, AuthService, BackgroundDownloadService (Android)

Stores: appStore, authStore
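The routing decision at the end of launch can be sketched as a pure function. The flag and route names follow the document; the function itself is illustrative, not the app's actual code.

```typescript
// Sketch of the launch routing decision described in the steps above.
type LaunchFlags = {
  authEnabled: boolean;
  hasPassphrase: boolean;
  hasCompletedOnboarding: boolean;
  downloadedModelCount: number;
};

function initialRoute(flags: LaunchFlags): string {
  if (flags.authEnabled && flags.hasPassphrase) return 'LockScreen';
  if (!flags.hasCompletedOnboarding) return 'OnboardingScreen';
  if (flags.downloadedModelCount === 0) return 'ModelDownloadScreen';
  return 'MainTabs';
}
```

Note the ordering: the lock gate is checked before onboarding state, so a locked app never leaks screen content.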
Trigger: First app launch (hasCompletedOnboarding === false).
Steps:
1. Display 4 slides: Welcome → Privacy → Offline → Choose Model
2. User swipes through or taps "Next"
3. On the final slide, tap "Get Started"
4. `appStore.setHasCompletedOnboarding(true)`
5. Navigate to `ModelDownloadScreen`
Slides content:
| Slide | Title | Message |
|---|---|---|
| 1 | Welcome to Off Grid | Run AI models directly on your device. No internet required, complete privacy. |
| 2 | Your Privacy Matters | All conversations stay on your device. No data is sent to any server. |
| 3 | Works Offline | Once you download a model, it works without internet. |
| 4 | Choose Your Model | Smaller models are faster, larger models are smarter. We'll help you pick. |
Trigger: Onboarding complete, no models downloaded.
Steps:
1. `ModelDownloadScreen` shows recommended models filtered by device RAM
2. Each card shows: model name, parameter count, size estimate, description
3. User selects a model → download begins
4. Progress bar shows percentage + bytes
5. On completion → navigate to `MainTabs` (Home)
6. User can also tap "Skip" → goes to Home with no model (shows a "download a model" prompt)
Recommendations by RAM:
| Device RAM | Max Parameters | Suggested Quantization |
|---|---|---|
| 3–4 GB | 1.5B | Q4_K_M |
| 4–6 GB | 3B | Q4_K_M |
| 6–8 GB | 4B | Q4_K_M |
| 8–12 GB | 8B | Q4_K_M |
| 12–16 GB | 13B | Q4_K_M |
| 16+ GB | 30B | Q4_K_M |
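The table above is effectively a tier lookup. A sketch with tier boundaries taken from the table; the function name is illustrative.

```typescript
// Sketch of the RAM-tier lookup behind the recommendation table.
function maxRecommendedParams(ramGB: number): string {
  if (ramGB >= 16) return '30B';
  if (ramGB >= 12) return '13B';
  if (ramGB >= 8) return '8B';
  if (ramGB >= 6) return '4B';
  if (ramGB >= 4) return '3B';
  return '1.5B';
}
```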
Trigger: Settings → Security → Enable Passphrase.
Steps:
1. Navigate to `PassphraseSetupScreen`
2. Enter passphrase (first field)
3. Confirm passphrase (second field)
4. Validation: entries must match
   - On mismatch → error message, fields cleared
   - On match → `authService.setPassphrase(hash)` → stored in Keychain
5. `authStore.setEnabled(true)`
6. Navigate back to Settings
Service: AuthService (hashes with 1000 iterations, stores in Keychain)
Trigger: App goes to background while auth is enabled.
Steps:
1. `useAppState` hook detects `AppState` → background
2. `authStore.lastBackgroundTime` set to `Date.now()`
3. When app returns to foreground:
   - Check if enough time has passed (immediate lock currently)
   - `authStore.setLocked(true)`
   - `LockScreen` renders over the entire app
Trigger: User enters passphrase on LockScreen.
Steps:
- Check lockout: if `lockoutUntil > now` → show countdown timer (MM:SS), input disabled
- User enters passphrase → `authService.verifyPassphrase(input)`
- Correct: `authStore.setLocked(false)`, `resetFailedAttempts()` → app unlocks
- Incorrect: `authStore.recordFailedAttempt()`
  - `failedAttempts++`
  - If `failedAttempts >= 5` → `lockoutUntil = now + 5 minutes`
  - Show error + remaining attempts count
- Lockout persists across app restart (`lockoutUntil` is persisted)
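The lockout policy above can be sketched as pure state transitions. The names (`AuthLockState`, `recordFailedAttempt`, `isLockedOut`) and constants are illustrative stand-ins, not the app's actual store API:

```typescript
const MAX_ATTEMPTS = 5;
const LOCKOUT_MS = 5 * 60 * 1000; // 5 minutes

interface AuthLockState {
  failedAttempts: number;
  lockoutUntil: number | null; // epoch ms; persisted so lockout survives restarts
}

function recordFailedAttempt(state: AuthLockState, now: number): AuthLockState {
  const failedAttempts = state.failedAttempts + 1;
  return {
    failedAttempts,
    // The 5th failure starts (or refreshes) the 5-minute lockout window.
    lockoutUntil:
      failedAttempts >= MAX_ATTEMPTS ? now + LOCKOUT_MS : state.lockoutUntil,
  };
}

function isLockedOut(state: AuthLockState, now: number): boolean {
  return state.lockoutUntil !== null && state.lockoutUntil > now;
}
```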
Trigger: Navigate to Models tab.
Steps:
- `ModelsScreen` loads → shows curated recommended models filtered by device RAM
- Recommended models fetched from the HuggingFace API with real metadata (excludes already-downloaded models)
- Each `ModelCard` shows: name, author tag, description, credibility badge, action icons
- User can:
- Search: type query → fetches from HuggingFace API with search term
- Filter by organization: Qwen, Meta, Google, Microsoft, Mistral, DeepSeek, HuggingFace, NVIDIA
- Filter by size: tiny (<1B), small (1-3B), medium (3-8B), large (8B+)
- Filter by quantization: Q4_K_M, Q4_K_S, Q5_K_M, Q6_K, Q8_0
- Filter by type: Text, Vision, Code
- Filter by credibility: LM Studio, Official, Verified, Community
- Import local model: Import .gguf files from device storage via file picker
- Pull to refresh: re-fetches from API
- Scroll for more: pagination / infinite scroll
Filter UI:
- Filter pills with expandable sections for multi-select options
- Active filter indicator dot on filter toggle button
- Clear all filters button
- Filters persist within the session
Credibility badges:
| Badge | Color | Meaning |
|---|---|---|
| LM Studio | Cyan (#22D3EE) | Official LM Studio quantization — highest quality GGUF |
| Official | Green (#22C55E) | From the original model creator (Meta, Microsoft, Qwen, etc.) |
| Verified | Purple (#A78BFA) | From trusted quantizers (TheBloke, bartowski, etc.) |
| Community | Gray (#64748B) | Community contributed |
Trigger: Tap a model card to expand.
Steps:
- Calls `huggingFaceService.getModelFiles(modelId)`
- Uses the HF tree API (preferred) with fallback to the siblings array
- Filters for `.gguf` files only
- Sorts by size (ascending)
- Displays for each file: filename, quantization level (e.g., Q4_K_M), size (GB/MB)
- For vision models: auto-pairs the mmproj companion file with matching quantization
- Shows a quantization quality indicator (Low → Excellent)
Trigger: Tap download button on a model file.
Steps:
- Construct download URL: `https://huggingface.co/{modelId}/resolve/main/{fileName}`
- Start download via `RNFS.downloadFile()` with progress callback (500 ms)
- UI shows: progress bar, percentage, bytes downloaded / total
- File saved to `Documents/local-llm/models/{fileName}`
- If vision model: also download the mmproj file sequentially
- On completion:
  - Create `DownloadedModel` metadata object
  - Save to `appStore.downloadedModels[]`
  - Persist metadata to AsyncStorage
- Model appears in the "Downloaded" section and model selector

Cancellation: User taps cancel → `RNFS.stopDownload()` → partial file deleted
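The URL construction in step 1 amounts to a one-line template. The helper name is hypothetical — the app builds this URL inside its download service:

```typescript
// Build the Hugging Face "resolve" URL for a file on a model's main branch,
// matching the https://huggingface.co/{modelId}/resolve/main/{fileName} scheme.
function buildResolveUrl(modelId: string, fileName: string): string {
  return `https://huggingface.co/${modelId}/resolve/main/${fileName}`;
}
```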
Trigger: Start download on Android (alternative download method).
Steps:
- `backgroundDownloadService.startDownload(url, fileName)`
- Enqueues in Android's native DownloadManager → returns `downloadId`
- Metadata persisted in SharedPreferences
- System shows a notification with progress
- 500 ms polling queries DownloadManager for status
- Events emitted: `DownloadProgress` (bytesDownloaded, totalBytes), `DownloadComplete`, `DownloadError`
- On completion: file moved from `ExternalFilesDir/Downloads/` to `Documents/models/`
- If the app was killed: on next launch, `syncBackgroundDownloads()` recovers state
States: pending → running → paused → completed / failed
Trigger: Tap "Import local .gguf" button on Models screen.
Steps:
- Native file picker opens via `@react-native-documents/picker` (filtered to all files)
- User selects a `.gguf` file from device storage
- Validation: file must have a `.gguf` extension
- On Android: if the URI is `content://`, the file is first copied to the app cache directory
- File size determined; duplicate check against existing downloaded models
- File copied to `Documents/local-llm/models/{fileName}` with progress tracking (500 ms polling)
- Model name and quantization parsed from the filename (e.g., `qwen3-3b-q4_k_m.gguf` → name: "qwen3-3b", quant: "Q4_K_M")
- `DownloadedModel` metadata created with `source: 'local-import'`
- Saved to `appStore.downloadedModels[]`
- Model appears in the model selector, ready to load
Error handling:
- Non-GGUF files → error alert
- Duplicate model → error alert with existing model name
- Copy failure → cleanup partial file, error alert
Implementation: modelManager.importLocalModel() in src/services/modelManager.ts
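The filename parsing in the example above (`qwen3-3b-q4_k_m.gguf` → name + quant) can be sketched as follows. The helper is hypothetical; the real logic lives in `modelManager.importLocalModel()`:

```typescript
// Derive a display name and quantization tag from a GGUF filename by
// stripping the extension and matching a trailing quant suffix (q4_k_m, q8_0, …).
function parseGgufFilename(fileName: string): { name: string; quant: string | null } {
  const base = fileName.replace(/\.gguf$/i, '');
  // A separator (-, _, .) followed by q + digit + word chars, anchored at the end.
  const match = base.match(/[-_.](q\d\w*)$/i);
  if (!match) return { name: base, quant: null };
  return {
    name: base.slice(0, base.length - match[0].length),
    quant: match[1].toUpperCase(),
  };
}
```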
Trigger: Tap download on an image model card.
Steps:
- Download archive (`.zip`) containing the model components
- Extract via `react-native-zip-archive`
- Components: CLIP text encoder, UNet, VAE decoder, tokenizer JSON
- Stored in `Documents/image_models/{modelName}/`
- Create `ONNXImageModel` metadata with detected backend (mnn/qnn) and style
- Save to `appStore.downloadedImageModels[]`
Trigger: Long-press model in Downloaded section → Delete, or from Storage Settings.
Steps:
- Show confirmation dialog ("This will permanently delete the model file")
- If model is currently loaded → warn that it will be unloaded first
- `activeModelService.unloadTextModel()` if needed
- `RNFS.unlink(filePath)` → delete from disk
- If vision model: also delete the mmproj file
- Remove from `appStore.downloadedModels[]`
- Update AsyncStorage
Trigger: Tap model in selector, or auto-load on chat entry if activeModelId set.
Steps:
- Check memory budget: `estimatedMemory = fileSize * 1.5`
- If it exceeds 60% of device RAM → show warning, possibly refuse
- If another model is loaded → unload it first (free context, clear KV cache)
- `llmService.initContext()` with parameters:
  - `model`: file path
  - `n_ctx`: from settings (default 2048)
  - `n_threads`: platform default
  - `n_batch`: 256
  - `n_gpu_layers`: iOS Metal = 99, Android = 0
  - Optional: `mmproj` path for vision models
- UI shows a loading indicator
- On success:
  - `appStore.setActiveModelId(id)`
  - Detect multimodal support (`initMultimodal()`)
  - Show "Model loaded" system message in chat
  - Display load time
- On failure:
  - OOM → suggest a smaller model
  - Corrupt file → suggest re-download
  - Unknown error → show error + retry option
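The memory-budget check in steps 1–2 reduces to one comparison. This is a minimal sketch with illustrative names; the real check also surfaces a warning path rather than a bare boolean:

```typescript
// Estimated runtime footprint is ~1.5× the GGUF file size (weights + context
// + overhead); loading is refused/warned above 60% of device RAM.
const MEMORY_MULTIPLIER = 1.5;
const RAM_BUDGET_FRACTION = 0.6;

function canLoadModel(fileSizeBytes: number, deviceRamBytes: number): boolean {
  const estimated = fileSizeBytes * MEMORY_MULTIPLIER;
  return estimated <= deviceRamBytes * RAM_BUDGET_FRACTION;
}
```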
Trigger: Explicit unload from UI, or automatic before loading different model.
Steps:
- If generation is in progress → stop it first
- `llmService.releaseContext()` → frees native memory
- Clear KV cache
- `appStore.setActiveModelId(null)`
- Show "Model unloaded" system message
- Display freed memory estimate
Trigger: Image generation requested, or manual load from model selector.
Steps:
- Memory check: `estimatedMemory = modelSize * 1.8`
- `LocalDreamModule.loadModel(modelPath)` → starts subprocess
- Subprocess loads CLIP, UNet, VAE components
- Detects backend (MNN vs QNN, based on file extensions)
- If a QNN model is loaded on a non-Qualcomm device → falls back to MNN
- `appStore.setActiveImageModelId(id)`
Performance mode ('performance'):
- Model stays loaded in RAM across generations
- Faster response times (no load latency between messages)
- Higher memory usage
- Session caching works optimally
- Intent classifier can swap to classifier model and swap back
Memory mode ('memory'):
- Model loaded on demand before each generation
- Unloaded after generation completes
- Lower peak memory usage
- Slower (load time added to each generation)
- Suitable for devices with < 6GB RAM
Trigger: User types message and taps Send.
Steps:
- Validate: message not empty/whitespace-only, model loaded
- Create `Message` object with `role: 'user'`, add to conversation via `chatStore.addMessage()`
- Clear input field
- Intent classification (if image mode is 'auto'):
  - Run pattern matching on the message text
  - If uncertain and `autoDetectMethod === 'llm'`: classify via LLM
  - If intent is 'image' → route to image generation (see 9.6)
- Build message context:
  - System prompt (from project if linked, else from settings)
  - Conversation history (truncated to fit the context window at 85% utilization)
  - Current user message
- `generationService.startGeneration()` → `llmService.completion()`
- Streaming phase:
  - `chatStore.setStreaming(true)`
  - Tokens arrive via callback → `chatStore.updateStreamingMessage(token)`
  - `<think>` tags detected → `isThinking = true` (content shown in a collapsible block)
  - UI auto-scrolls to follow new tokens
  - Stop button appears
- Completion:
  - Final message saved to conversation with `generationMeta`: `tokensPerSecond`, `decodeTokensPerSecond`, `timeToFirstToken`, `tokenCount`, `gpu` (boolean), `gpuBackend`, `gpuLayers`, `kvCacheType`, `flashAttention`, `modelName`
  - `generationTimeMs` recorded
  - `chatStore.setStreaming(false)`
  - Conversation `updatedAt` timestamp updated
- If tool calling is enabled and the model supports it, the flow enters the tool loop (see Tool Calling Services)
Trigger: User taps Stop button during streaming.
Steps:
- `llmService.stopCompletion()` → signals native code to stop
- Current partial response is kept (not discarded)
- Message finalized with partial content + metadata
- Streaming state cleared
- User can send new message immediately
Trigger: User taps retry on an assistant message.
Steps:
- Delete the assistant message being retried
- Re-send the preceding user message through the generation pipeline
- New response streams in to replace the old one
How it works:
- Before each generation, tokenize the full context (system + history + current)
- If token count exceeds `contextLength * 0.85`:
  - Drop oldest messages (keeping the system prompt + most recent messages)
- Re-tokenize to verify fit
- If KV cache is full → clear cache and rebuild context
- Safety margin prevents overflows that would crash native inference
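The truncation loop above can be sketched as follows. This is illustrative: `countTokens` stands in for the real tokenizer, and the actual implementation re-tokenizes afterwards to verify fit:

```typescript
interface ChatMessage {
  role: 'system' | 'user' | 'assistant';
  content: string;
}

// Drop the oldest non-system messages until the history fits within 85% of the
// context window; the system prompt and the most recent message always survive.
function truncateHistory(
  messages: ChatMessage[],
  contextLength: number,
  countTokens: (text: string) => number,
): ChatMessage[] {
  const budget = contextLength * 0.85;
  const system = messages.filter(m => m.role === 'system');
  let rest = messages.filter(m => m.role !== 'system');
  const total = (msgs: ChatMessage[]) =>
    msgs.reduce((sum, m) => sum + countTokens(m.content), 0);
  while (rest.length > 1 && total([...system, ...rest]) > budget) {
    rest = rest.slice(1); // drop the oldest remaining message
  }
  return [...system, ...rest];
}
```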
Trigger: Model outputs `<think>...</think>` tags.
Behavior:
- Parser detects `<think>` opening tag
- `isThinking` flag set on streaming message
- Content inside tags rendered in a collapsible/dimmed block
- `</think>` tag detected → `isThinking = false`
- Content after the closing tag rendered normally
- Final message preserves thinking content (viewable on expand)
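The tag-driven state switching can be sketched as a small reducer over incoming tokens. This is illustrative only — it assumes a tag never arrives split across two tokens, which a production streaming parser must additionally buffer for:

```typescript
interface StreamState {
  visible: string;   // content rendered normally
  thinking: string;  // content routed to the collapsible block
  isThinking: boolean;
}

function feedToken(state: StreamState, token: string): StreamState {
  const next = { ...state };
  let text = token;
  while (text.length > 0) {
    if (!next.isThinking) {
      const open = text.indexOf('<think>');
      if (open === -1) { next.visible += text; break; }
      next.visible += text.slice(0, open);
      next.isThinking = true;
      text = text.slice(open + '<think>'.length);
    } else {
      const close = text.indexOf('</think>');
      if (close === -1) { next.thinking += text; break; }
      next.thinking += text.slice(0, close);
      next.isThinking = false;
      text = text.slice(close + '</think>'.length);
    }
  }
  return next;
}
```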
When showGenerationDetails is enabled in settings:
| Metric | Source | Display |
|---|---|---|
| Tokens/sec (overall) | `tokensPerSecond` | "12.3 tok/s" |
| Tokens/sec (decode) | `decodeTokensPerSecond` | "15.1 tok/s decode" |
| Time to first token | `timeToFirstToken` | "0.8s TTFT" |
| Total tokens | `tokenCount` | "342 tokens" |
| GPU used | `gpu` + `gpuBackend` | "Metal" or "CPU" |
| GPU layers | `gpuLayers` | "99 layers" |
| Model name | `modelName` | "Qwen2.5-3B-Q4_K_M" |
| Generation time | `generationTimeMs` | "28.4s" |
Trigger: User sends message that intent classifier routes to image generation.
Steps:
- Intent classified as 'image' (see 9.5.1 step 4)
- Check: is an image model loaded?
  - No → attempt to load `activeImageModelId`
  - Still no → show "No image model" error
- Create user message in conversation
- `imageGenerationService.generate()` with params:
  - `prompt`: user's message
  - `negativePrompt`: from settings (if configured)
  - `steps`: from settings (default varies by model)
  - `guidanceScale`: from settings
  - `width`, `height`: from settings
  - `seed`: random (or specified)
- Progress phase:
  - Native module emits `LocalDreamProgress` events
  - UI shows: step counter ("Step 5/20"), progress bar, preview thumbnail
  - Preview images update every few steps (base64 → PNG → display)
- Completion:
  - Final RGB data received as base64
  - Saved as PNG via `LocalDreamModule.saveRgbAsPng()`
  - `GeneratedImage` created with full metadata
  - Added to `appStore.generatedImages[]`
  - Assistant message added to conversation with image attachment
  - Generation meta includes: steps, guidanceScale, resolution, seed
Trigger: User toggles image mode to "Force" in chat input, then sends any message.
Steps:
- Image mode toggle in `ChatInput` → `ImageModeState = 'force'`
- Visual indicator shows image mode is active
- Any message sent bypasses intent classification → routes directly to image generation
- Same pipeline as 9.6.1 from step 2 onward
Trigger: User taps Stop during image generation progress.
Steps:
- `imageGenerationService.cancel()` → `LocalDreamModule.cancelGeneration()`
- Current partial image may be available (from the preview)
- Generation state cleared
- No image added to gallery or conversation
| Parameter | Range | Default | Effect |
|---|---|---|---|
| Steps | 1–50 | Model-dependent | More steps = higher quality, slower |
| Guidance Scale | 1.0–20.0 | 7.5 | Higher = stricter prompt following |
| Width | 128–512 (multiples of 64) | 512 | Image width in pixels |
| Height | 128–512 (multiples of 64) | 512 | Image height in pixels |
| Negative Prompt | Free text | Empty | What to exclude from generation |
| Seed | Integer | Random | Reproducibility (same seed = same image) |
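Snapping a requested dimension into the valid range from the table (128–512, multiples of 64) might look like the following sketch. The helper name is hypothetical:

```typescript
// Clamp to [128, 512], then round to the nearest multiple of 64.
function snapDimension(value: number): number {
  const clamped = Math.min(512, Math.max(128, value));
  return Math.round(clamped / 64) * 64;
}
```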
| Backend | Hardware | Speed | Quality | Detection |
|---|---|---|---|---|
| MNN (CPU) | All Android | Slower | Good | Default fallback |
| QNN (NPU) | Qualcomm Snapdragon (SM/QCS/QCM) | 3-5x faster | Same | Auto-detected via isNpuSupported() |
Auto-selection: If QNN model downloaded and device supports QNN → use QNN. Otherwise → MNN.
Trigger: Select a vision-capable model (has mmproj companion file).
Steps:
- Same loading flow as 9.4.1
- Additionally:
  - `llmService.initContext()` receives the `mmproj` path
  - `initMultimodal()` called → enables image input processing
- Vision capability indicator shown in UI
Trigger: User attaches image (camera or gallery) + sends message.
Steps:
- Tap attachment button → choose Camera or Gallery
- Image selected → `MediaAttachment` created with `type: 'image'`
- Thumbnail shown in input area
- User types a prompt (e.g., "What's in this image?") + sends
- Message created with `attachments` array containing the image
- Vision encoder (mmproj) processes the image
- Text model generates response about the image
- Response streams normally with metadata
Trigger: User attaches a document (.txt, .py, .js, etc.).
Steps:
- Tap attachment button → choose Document
- `documentService.extractText(uri)` → extracts text content
- `MediaAttachment` created with `type: 'document'`, `textContent` populated
- Preview shows filename + text snippet
- On send: text content included in prompt context
- Model can reference and analyze document content
Trigger: Long-press or tap microphone button in ChatInput.
Steps:
- Check microphone permission → request if not granted
- Check Whisper model availability:
- Not downloaded → prompt to download (navigate to Voice Settings)
- Downloaded but not loaded → load model
- Start recording → `voiceService.startRecording()`
- UI shows: recording indicator, duration timer, waveform visualization
- User releases / taps stop → recording ends
- Audio sent to `whisperService.transcribeRealtime()`:
  - Processes in chunks
  - Partial results update in real time
  - Final transcription returned
- Transcribed text inserted into chat input field
- User can edit before sending
Trigger: Voice Settings screen.
Steps:
- List available Whisper models with sizes
- User selects and downloads a model
- Download progress shown
- On completion: model stored in `Documents/whisper-models/`
- `whisperStore.downloadedModelId` set
- Model loaded on the first transcription request
Trigger: "New Chat" button on Home or Chats tab.
Steps:
- `chatStore.createConversation()` creates a new `Conversation`:
  - Generated UUID
  - Title: "New Conversation" (auto-updated after first message)
  - `modelId`: current `activeModelId`
  - `projectId`: if started from a project
  - Empty `messages[]`
  - Timestamps set
- Navigate to `ChatScreen` with the new conversation
Trigger: First user message sent in a conversation.
Steps:
- After first response completes
- Title derived from first message content (truncated)
- `chatStore.updateConversation()` updates the title
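A minimal sketch of such title derivation — collapse whitespace, then truncate with an ellipsis. The 40-character limit is an assumption for illustration, not the app's actual constant:

```typescript
function deriveTitle(firstMessage: string, maxLength = 40): string {
  const collapsed = firstMessage.replace(/\s+/g, ' ').trim();
  if (collapsed.length <= maxLength) return collapsed;
  // Reserve one character for the ellipsis.
  return collapsed.slice(0, maxLength - 1).trimEnd() + '…';
}
```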
Trigger: Tap a conversation in ChatsListScreen.
Steps:
- If generation is in progress → warn the user (generation will stop)
- `chatStore.setActiveConversationId(newId)`
- Navigate to `ChatScreen`
- Messages loaded from the store (already in memory, persisted)
- Scroll to bottom
Trigger: Swipe-to-delete or long-press → Delete.
Steps:
- Show confirmation dialog
- `chatStore.deleteConversation(id)`:
  - Remove from `conversations[]`
  - All messages deleted
- Associated gallery images remain (not cascade-deleted)
- If was active conversation → navigate to conversations list
Trigger: Start chat from a project, or select project in chat.
Steps:
- `chatStore.createConversation()` with `projectId` set
- System prompt from `projectStore.projects[].systemPrompt` used instead of the default
- Project badge shown in chat header and conversation list
- If project deleted later → conversation keeps its system prompt (snapshot)
Trigger: Navigate to Gallery tab/modal.
Steps:
- Load `appStore.generatedImages[]`
- Display as a 3-column grid, sorted by `createdAt` (most recent first)
- Each thumbnail loaded from `imagePath` on disk
- Filter dropdown: "All" or a specific conversation
Trigger: Tap an image thumbnail.
Steps:
- Open fullscreen viewer
- Pinch to zoom, pan to navigate
- View metadata: prompt, negative prompt, steps, seed, guidance scale, resolution, model, timestamp
- Actions: Share, Save to Device, Delete
Trigger: Tap Save in image viewer.
Steps:
- Copy image to device-accessible location:
  - Android: `Pictures/OffgridMobile/` or `Documents/OffgridMobile_Images/`
  - iOS: Camera Roll (via photo library API)
- Android:
- Show success confirmation
Trigger: Enter selection mode (long-press an image).
Steps:
- Selection mode activated → checkboxes appear on thumbnails
- Tap to select/deselect individual images
- "Select All" option available
- Tap "Delete Selected"
- Confirmation dialog
- Delete selected images from disk + remove from `appStore.generatedImages[]`
| Setting | Type | Range | Default | Effect |
|---|---|---|---|---|
| System Prompt | Text area | Free text | (see APP_CONFIG) | Personality/behavior instructions |
| Temperature | Slider | 0.0 – 2.0 | 0.7 | Randomness (low = deterministic, high = creative) |
| Top-P | Slider | 0.0 – 1.0 | 0.9 | Nucleus sampling threshold |
| Repeat Penalty | Slider | 1.0 – 2.0 | 1.1 | Penalizes token repetition |
| Max Tokens | Input | 1 – 4096+ | 512 | Maximum response length |
| Context Length | Input | 512 – 8192 | 2048 | Conversation history window |
| Threads | Slider | 1 – device max | 4 (iOS) / 6 (Android) | CPU threads for inference |
| Batch Size | Input | 1 – 512 | 256 | Token processing batch |
| GPU | Toggle | On/Off | iOS: On, Android: Off | GPU acceleration |
| GPU Layers | Slider | 0 – 99 | iOS: 99, Android: 0 | Layers offloaded to GPU |
| Loading Strategy | Toggle | Performance / Memory | Performance | Keep model loaded vs load-on-demand |
| Show Details | Toggle | On/Off | Off | Show generation metadata on messages |
| Setting | Type | Range | Default |
|---|---|---|---|
| Steps | Slider | 1 – 50 | Model-dependent |
| Guidance Scale | Slider | 1.0 – 20.0 | 7.5 |
| Width | Input | 128 – 512 | 512 |
| Height | Input | 128 – 512 | 512 |
| Threads | Slider | 1 – device max | Platform default |
| Setting | Options | Effect |
|---|---|---|
| Image Generation Mode | Auto / Manual | Auto detects intent; Manual requires explicit toggle |
| Auto-Detect Method | Pattern / LLM | Pattern-only (fast) vs Pattern + LLM fallback (accurate) |
| Classifier Model | (model selector) | Which model to use for LLM classification |
All settings auto-save on change (no save button needed) and persist across app restarts.
Trigger: User switches apps, locks phone, or presses home button.
Going to background:
- `useAppState` detects `AppState → background`
- `authStore.lastBackgroundTime` recorded
- Generation services continue (lifecycle-independent)
- Background downloads continue (Android)
Returning to foreground:
- `useAppState` detects `AppState → active`
- If auth enabled → `authStore.setLocked(true)` → show `LockScreen`
- Refresh device info (available memory may have changed)
- If generation completed while backgrounded → messages already in store
Trigger: User swipes away app or system kills it.
Recovery on next launch:
- All Zustand persisted stores rehydrated from AsyncStorage
- Conversations, messages, settings all restored
- Active model ID remembered (but model not loaded — needs re-load)
- Background downloads (Android): synced from SharedPreferences
- Streaming state cleared (was not persisted)
- Any partial generation is lost (the streaming message was not saved)
Text generation: Continues via generationService (lifecycle-independent). When user returns, streaming message and final result are in the store.
Image generation: Continues via imageGenerationService. Progress events accumulate. When user returns to chat, they see current progress or completed image.
Background downloads (Android): Android DownloadManager continues independently. On next app open, syncBackgroundDownloads() queries system for status.
The intent classifier determines whether a user's message should trigger text generation or image generation.
User message
│
▼
[1] Quick checks ─────────────────────────────────────────┐
│ • Message < 10 chars → TEXT │
│ • Multiple sentences → TEXT │
│ • Exact code/question keywords → TEXT │
│ │
▼ │
[2] Image pattern matching ────────────────────────────────┤
│ • 45+ patterns: "draw", "generate image", │
│ "paint", art styles, DALL-E, negative prompt, │
│ resolution specifications │
│ • Match found → IMAGE │
│ │
▼ │
[3] Text pattern matching ─────────────────────────────────┤
│ • 40+ patterns: questions, code, math, analysis, │
│ explanation, help requests │
│ • Match found → TEXT │
│ │
▼ │
[4] Ambiguous — check autoDetectMethod ────────────────────┤
│ │
├── 'pattern' mode → default TEXT │
│ │
└── 'llm' mode → [5] LLM Classification │
│ │
▼ │
Prompt: "Is this asking to │
create/generate/draw an image?" │
│ │
├── "yes" → IMAGE │
├── "no" → TEXT │
└── error → TEXT (fallback) │
│
Result cached (max 100 entries) ◄──────┘
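Stages 1–3 of the pipeline above condense to a synchronous pattern pass. The pattern lists here are tiny illustrative samples of the 45+/40+ patterns the real classifier uses:

```typescript
type Intent = 'text' | 'image';

// Sample image-intent patterns (the real list has 45+ entries).
const IMAGE_PATTERNS = [
  /\bdraw\b/i,
  /\bgenerate (an? )?(image|picture)\b/i,
  /\boil painting\b/i,
  /\bpaint\b/i,
];

// Sample text-intent patterns (the real list has 40+ entries).
const TEXT_PATTERNS = [/\bwhat is\b/i, /\bwrite a function\b/i, /\bexplain\b/i, /\bhelp\b/i];

function classifyByPattern(message: string): Intent {
  if (message.trim().length < 10) return 'text';             // [1] quick check
  if (IMAGE_PATTERNS.some(p => p.test(message))) return 'image'; // [2] image patterns
  if (TEXT_PATTERNS.some(p => p.test(message))) return 'text';   // [3] text patterns
  return 'text';                                             // [4] ambiguous → default TEXT
}
```

In 'llm' mode, the ambiguous branch would instead defer to the LLM classification step before falling back to TEXT.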
| Input | Classification | Stage | Reason |
|---|---|---|---|
| "Hi" | TEXT | Quick check | < 10 chars |
| "Draw a cat" | IMAGE | Image patterns | Matches "draw" |
| "What is Python?" | TEXT | Text patterns | Matches "what is" |
| "A beautiful sunset over mountains" | TEXT (pattern) or IMAGE (LLM) | Ambiguous | No clear pattern; LLM may classify as image |
| "Generate an oil painting of a forest" | IMAGE | Image patterns | Matches "generate" + "oil painting" |
| "Write a function to sort an array" | TEXT | Text patterns | Matches "write a function" |
| Scenario | Handling |
|---|---|
| No internet during model browse | Error message + "Retry" button |
| Network drop during download (foreground) | Error + "Resume" option (HTTP range requests) |
| Network drop during download (background) | Android DownloadManager pauses; resumes when network returns |
| HuggingFace API timeout | Timeout error + retry |
| Scenario | Handling |
|---|---|
| Corrupt model file | Detection on load → error + "Delete and re-download" suggestion |
| OOM during model load | Error + "Try a smaller model" suggestion |
| Model file deleted externally | Detected during sync → removed from list |
| Incompatible model version | Error message during load |
| Scenario | Handling |
|---|---|
| OOM during text generation | Error message + suggest reducing context length |
| Native crash during generation | Graceful error message, generation state cleared |
| Image generation failure | Error message, no image added |
| No model loaded when sending | Prompt to load a model |
| Scenario | Handling |
|---|---|
| Insufficient storage before download | Pre-check + error with space requirements |
| Storage full mid-download | Download fails gracefully, partial file cleaned up |
| File system permission denied | Error message |
| Test File | Covers |
|---|---|
| `stores/appStore.test.ts` | App store state transitions |
| `stores/chatStore.test.ts` | Conversation CRUD, message management |
| `stores/authStore.test.ts` | Auth state, lockout logic |
| `services/generationService.test.ts` | Text generation lifecycle |
| `services/intentClassifier.test.ts` | Pattern matching, LLM fallback |
| `services/llm.test.ts` | Model loading, GPU fallback, generation, context |
| `services/hardware.test.ts` | Device info, memory calculations, recommendations |
| `services/modelManager.test.ts` | Download lifecycle, storage, orphan detection |
| `services/backgroundDownloadService.test.ts` | Native events, polling lifecycle |
| `services/localDreamGenerator.test.ts` | Platform routing, iOS/Android delegation |
| `services/coreMLModelBrowser.test.ts` | Model discovery, caching, errors |
| `services/whisperService.test.ts` | Transcription, permissions |
| `services/documentService.test.ts` | File types, reading, preview |
| `services/pdfExtractor.test.ts` | PDF text extraction |
| `services/huggingface.test.ts` | HuggingFace API client |
| `constants/constants.test.ts` | Constants validation |
| `utils/coreMLModelUtils.test.ts` | Core ML model path utilities |
| Test File | Covers |
|---|---|
| `stores/chatStoreIntegration.test.ts` | Multi-store interactions |
| `models/activeModelService.test.ts` | Model load/unload with memory checks |
| `generation/generationFlow.test.ts` | End-to-end text generation |
| `generation/imageGenerationFlow.test.ts` | End-to-end image generation |
Tests that verify native module interfaces haven't changed:
| Test File | Native Module |
|---|---|
| `llama.rn.test.ts` | llama.rn API shape |
| `whisper.rn.test.ts` | whisper.rn API shape |
| `whisper.contract.test.ts` | Whisper service contracts |
| `localDream.contract.test.ts` | LocalDream module contracts |
| `llamaContext.contract.test.ts` | LlamaContext lifecycle |
| `coreMLDiffusion.contract.test.ts` | iOS Core ML parity |
| `iosDownloadManager.contract.test.ts` | iOS download parity |
React Native Testing Library tests:
Screens:
- `ChatScreen.test.tsx`
- `ModelsScreen.test.tsx`
- `HomeScreen.test.tsx`
- `ChatsListScreen.test.tsx`
- `ModelSettingsScreen.test.tsx`
- `ProjectsScreen.test.tsx`
Components:
- `ChatInput.test.tsx`
- `ChatMessage.test.tsx`
- `ModelCard.test.tsx`
Configuration: App ID ai.offgridmobile, 30-second default timeout, screenshots on failure.
| Flow | File | What It Tests |
|---|---|---|
| Model Setup | `00-setup-model.yaml` | Model setup utility for other tests |
| App Launch | `01-app-launch.yaml` | Launch → loading disappears → home screen visible |
| Text Generation | `02-text-generation.yaml` | Home → new chat → type message → send → assistant responds |
| Stop Generation | `03-stop-generation.yaml` | Send message → tap stop during streaming → generation halts |
| Image Generation | `04-image-generation.yaml` | Image generation + auto-download |
| Model Uninstall | `05a-model-uninstall.yaml` | Model deletion |
| Model Download | `05b-model-download.yaml` | Models screen → trigger download → progress → complete |
| Model Selection | `05b-model-selection.yaml` | Model switching between downloaded models |
| Model Unload | `05c-model-unload.yaml` | Model unloading from memory |
| Document Attachment | `06a-document-attachment.yaml` | Attach document to chat |
| Image Attachment | `06b-image-attachment.yaml` | Attach image to chat |
| Text Gen Full | `06c-text-generation-full.yaml` | Full text generation with attachments |
| Text Gen Retry | `06d-text-generation-retry.yaml` | Retry/regenerate text generation |
| Image Model Uninstall | `07a-image-model-uninstall.yaml` | Image model deletion |
| Image Model Download | `07b-image-model-download.yaml` | Image model download |
| Image Model Activate | `07c-image-model-set-active.yaml` | Image model activation |
| Area | testIDs |
|---|---|
| Navigation | home-screen, chat-screen, models-screen, tab-bar, home-tab, chats-tab, models-tab, settings-tab |
| Chat | chat-input, send-button, stop-button, thinking-indicator, streaming-message, assistant-message |
| Models | model-selector, model-list, model-item-{index}, download-button, download-progress, download-complete |
| Image | image-mode-toggle, image-generation-progress, generated-image, image-message |
| Conversations | conversation-list-button, conversation-list, conversation-item-{index} |
| Auth | lock-screen |
Test commands:

```shell
npm run test            # Jest unit/integration/contract tests
npm run test:e2e        # All P0 Maestro flows
npm run test:e2e:single # Single Maestro flow
```

| Device RAM | Max Model Parameters | Recommended Quantization |
|---|---|---|
| 3–4 GB | 1.5B | Q4_K_M |
| 4–6 GB | 3B | Q4_K_M |
| 6–8 GB | 4B | Q4_K_M |
| 8–12 GB | 8B | Q4_K_M |
| 12–16 GB | 13B | Q4_K_M |
| 16+ GB | 30B | Q4_K_M |
| Model | Parameters | Min RAM | Type | Description |
|---|---|---|---|---|
| Qwen 3 0.6B | 0.6B | 3 GB | Text | Latest Qwen with thinking mode, ultra-light |
| Gemma 3 1B | 1B | 3 GB | Text | Google's tiny model, 128K context |
| Llama 3.2 1B | 1B | 4 GB | Text | Meta's fastest mobile model, 128K context |
| Gemma 3n E2B | 2B | 4 GB | Text | Google's mobile-first with selective activation |
| Llama 3.2 3B | 3B | 6 GB | Text | Best quality-to-size ratio for mobile |
| SmolLM3 3B | 3B | 6 GB | Text | Strong reasoning & 128K context |
| Phi-4 Mini | 3.8B | 6 GB | Text | Math & reasoning specialist |
| Qwen 3 8B | 8B | 8 GB | Text | Thinking + non-thinking modes, 100+ languages |
| Qwen 3 VL 2B | 2B | 4 GB | Vision | Compact vision-language with thinking mode |
| Gemma 3n E4B | 4B | 6 GB | Vision | Vision + audio, built for mobile |
| Qwen 3 VL 8B | 8B | 8 GB | Vision | Vision-language with thinking mode |
| Qwen 3 Coder A3B | 3B | 6 GB | Code | MoE coding model, only 3B active params |
The Models screen supports filtering by model organization:
| Key | Label |
|---|---|
| `Qwen` | Qwen |
| `meta-llama` | Llama |
| `google` | Google |
| `microsoft` | Microsoft |
| `mistralai` | Mistral |
| `deepseek-ai` | DeepSeek |
| `HuggingFaceTB` | HuggingFace |
| `nvidia` | NVIDIA |
Defined in MODEL_ORGS constant (src/constants/index.ts).
| Quantization | Bits/Weight | Quality | Recommended | Notes |
|---|---|---|---|---|
| Q2_K | 2.625 | Low | No | Extreme compression, noticeable quality loss |
| Q3_K_S | 3.4375 | Low-Medium | No | High compression, some quality loss |
| Q3_K_M | 3.4375 | Medium | No | Good compression with acceptable quality |
| Q4_0 | 4.0 | Medium | No | Basic 4-bit quantization |
| Q4_K_S | 4.5 | Medium-Good | Yes | Good balance of size and quality |
| Q4_K_M | 4.5 | Good | Yes | Optimal for mobile — best balance |
| Q5_K_S | 5.5 | Good-High | No | Higher quality, larger size |
| Q5_K_M | 5.5 | High | No | Near original quality |
| Q6_K | 6.5 | Very High | No | Minimal quality loss |
| Q8_0 | 8.0 | Excellent | No | Best quality, largest size |
The app supports light and dark modes via a dynamic theme system in src/theme/. Colors and shadows are no longer hardcoded — all screens and components use useTheme() and useThemedStyles() hooks.
Architecture:
- `src/theme/palettes.ts` — light and dark color palettes, shadow definitions, elevation factory
- `src/theme/index.ts` — `useTheme()` hook (returns `{ colors, shadows, elevation, isDark }`), `getTheme(mode)` for non-hook contexts
- `src/theme/useThemedStyles.ts` — `useThemedStyles(createStyles)` memoized style factory
- Theme preference stored in `appStore.themeMode` (persisted via Zustand + AsyncStorage)
- Toggle in Settings screen (Dark Mode switch)
Pattern (every screen/component):

```tsx
import { useTheme, useThemedStyles } from '../theme';
import type { ThemeColors, ThemeShadows } from '../theme';

const MyScreen = () => {
  const { colors } = useTheme();
  const styles = useThemedStyles(createStyles);
  return <View style={styles.container}><Icon color={colors.text} /></View>;
};

const createStyles = (colors: ThemeColors, shadows: ThemeShadows) => ({
  container: { backgroundColor: colors.background, ...shadows.small },
});
```

Theme-independent tokens (TYPOGRAPHY, SPACING, FONTS) remain in `src/constants/index.ts`.
| Token | Hex | Usage |
|---|---|---|
| primary | #34D399 | Emerald accent, active states |
| background | #0A0A0A | Main background (pure black) |
| surface | #141414 | Cards, elevated elements |
| text | #FFFFFF | Primary text |
| textSecondary | #B0B0B0 | Secondary text |
| textMuted | #808080 | Metadata, placeholders |
| border | #1E1E1E | Default borders |
| error | #EF4444 | Error states |
| Token | Hex | Usage |
|---|---|---|
| primary | #059669 | Emerald accent (darker for contrast) |
| background | #FFFFFF | Main background (white) |
| surface | #F5F5F5 | Cards, elevated elements |
| text | #0A0A0A | Primary text (near black) |
| textSecondary | #525252 | Secondary text |
| textMuted | #8A8A8A | Metadata, placeholders |
| border | #E5E5E5 | Default borders |
| error | #DC2626 | Error states |
Shadows adapt per theme for proper visibility:
- Light mode: Standard black shadows (opacity 0.15–0.35, radius 6–18)
- Dark mode: Tight white glow (opacity 0.08–0.12, radius 1–3) for crisp edge definition without blur
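Putting the tables together, `palettes.ts` plausibly exports one colors object per mode, with `getTheme(mode)` selecting between them. A minimal sketch using the token values above (the object shapes and export names beyond `getTheme` are assumptions; only a subset of tokens is shown):

```typescript
type ThemeMode = 'light' | 'dark';

// Token values copied from the palette tables above (subset).
const darkColors = {
  primary: '#34D399',
  background: '#0A0A0A',
  surface: '#141414',
  text: '#FFFFFF',
  border: '#1E1E1E',
  error: '#EF4444',
};

const lightColors = {
  primary: '#059669',
  background: '#FFFFFF',
  surface: '#F5F5F5',
  text: '#0A0A0A',
  border: '#E5E5E5',
  error: '#DC2626',
};

// Non-hook entry point, mirroring getTheme(mode) from src/theme/index.ts.
function getTheme(mode: ThemeMode) {
  const isDark = mode === 'dark';
  return { colors: isDark ? darkColors : lightColors, isDark };
}
```

The `useTheme()` hook would layer the persisted `appStore.themeMode` on top of this pure selector.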
```
Documents/
├── local-llm/
│   └── models/                      # Text LLM models (GGUF)
│       ├── qwen2.5-3b-q4_k_m.gguf
│       ├── qwen2.5-3b-q4_k_m-mmproj-f16.gguf   # Vision companion
│       └── ...
│
├── image_models/                    # Stable Diffusion models
│   └── {model-name}/
│       ├── clip_text_encoder.mnn    # (or .bin for QNN)
│       ├── unet.mnn
│       ├── vae_decoder.mnn
│       └── tokenizer.json
│
├── whisper-models/                  # Whisper STT models
│   ├── ggml-tiny.en.bin
│   └── ...
│
└── OffgridMobile_Images/            # User-saved generated images
    └── ...

Caches/
└── llm-sessions/                    # LLM session KV cache files
    └── ...

Files/
└── generated_images/                # Generated image PNGs
    ├── {uuid}.png
    └── ...

Cache/
└── preview/                         # Temp preview images during generation
    └── preview.png
```
Android-specific:
```
ExternalFilesDir/
└── Downloads/                       # Temp location for background downloads
    └── (moved to Documents/models/ on completion)

assets/
└── runtime_libs/                    # QNN runtime libraries
    ├── libQnnHtp.so
    └── libQnnSystem.so
```
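The layout above implies path conventions every service must agree on. A hypothetical set of path builders derived from the tree (the function names are illustrative; the real code may assemble paths differently, e.g. from react-native-fs's `DocumentDirectoryPath` constant):

```typescript
// Path builders mirroring the on-device layout above.
// docDir is the platform's Documents directory root.
const textModelPath = (docDir: string, file: string) =>
  `${docDir}/local-llm/models/${file}`;

const imageModelDir = (docDir: string, modelName: string) =>
  `${docDir}/image_models/${modelName}`;

const whisperModelPath = (docDir: string, file: string) =>
  `${docDir}/whisper-models/${file}`;
```

Centralizing these in one module keeps the download, load, and delete flows pointed at the same directories.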
You are a helpful AI assistant running locally on the user's device. Your responses should be:
- Accurate and factual - never make up information
- Concise but complete - answer the question fully without unnecessary elaboration
- Helpful and friendly - focus on solving the user's actual need
- Honest about limitations - if you don't know something, say so
If asked about yourself, you can mention you're a local AI assistant that prioritizes user privacy.
| Project | System Prompt Summary |
|---|---|
| General Assistant | Helpful AI assistant (default prompt) |
| Spanish Learning | Spanish language tutor with conversation practice |
| Code Review | Code reviewer providing constructive feedback |
| Writing Helper | Writing assistant for drafting and editing |
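As data, these defaults could be a typed constant seeded into the project store on first launch — a sketch only; the actual `Project` model almost certainly carries more fields (id, full prompt text, timestamps), and `ProjectPreset`/`DEFAULT_PROJECTS` are hypothetical names:

```typescript
// Hypothetical shape; the real Project type likely has id/createdAt
// and the full system prompt rather than a summary.
interface ProjectPreset {
  name: string;
  systemPromptSummary: string;
}

const DEFAULT_PROJECTS: ProjectPreset[] = [
  { name: 'General Assistant', systemPromptSummary: 'Helpful AI assistant (default prompt)' },
  { name: 'Spanish Learning', systemPromptSummary: 'Spanish language tutor with conversation practice' },
  { name: 'Code Review', systemPromptSummary: 'Code reviewer providing constructive feedback' },
  { name: 'Writing Helper', systemPromptSummary: 'Writing assistant for drafting and editing' },
];
```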
This project uses Claude Code (Anthropic's official CLI) for development assistance. Claude Code maintains a persistent memory system to track learnings across conversations.
```
~/.claude/projects/-Users-mac-wednesday-on-device-llm-LocalLLM/memory/
├── MEMORY.md           # Main memory file (loaded into Claude's context)
└── [topic-files].md    # Optional: detailed topic-specific notes
```
The memory system helps Claude Code:
- Remember solutions to common issues (e.g., download manager race conditions on emulators)
- Track architectural decisions and their rationale
- Document gotchas specific to this codebase (native module coordination, timing issues)
- Build context across multiple development sessions
MEMORY.md — Primary file, kept concise (max 200 lines to avoid truncation):
- Quick reference for critical learnings
- Links to detailed topic files for complex subjects
- Recent significant fixes and their root causes
Topic files — Detailed notes on specific areas:
- `debugging.md` — Common debugging patterns
- `native-modules.md` — Native bridge coordination patterns
- `patterns.md` — Codebase-specific patterns and conventions
Update memory files when:
- Fixing non-obvious bugs (especially race conditions, timing issues)
- Making architectural decisions that future work should respect
- Discovering undocumented behaviors in native modules or third-party libraries
- Solving problems that could recur in different parts of the codebase
## Download Manager Event Delivery Fix
**Issue**: Downloads complete but UI doesn't update on emulators.
**Root Cause**: Race condition where status was set to "completed" before event could be sent.
**Solution**: Track event delivery separately from status using `completedEventSent` flag.
**Key Learning**: Always separate event delivery tracking from state changes when coordinating
between native modules and React Native, especially on slow emulators.
**Files Modified**: `android/app/src/main/java/ai/offgridmobile/download/DownloadManagerModule.kt`

Guidelines for memory entries:
- Be concise — Memory is most useful when it's scannable
- Focus on "why" — Capture rationale, not just what was changed
- Include file paths — Makes it easy to locate relevant code
- Update proactively — Don't wait until the end of a session; update as you learn