# Offline Transformer Model Loading (sentence-transformers alternative)

When `sentence-transformers` fails to load a model due to network restrictions (e.g., the Chinese firewall) or a version incompatibility (it tries to download a `processor_config.json` that doesn't exist for text-only models), use `transformers` directly.

## Root Cause

`sentence-transformers` v3+ calls `transformers.AutoProcessor.from_pretrained()` internally, which tries to download `preprocessor_config.json`, `processor_config.json`, `adapter_config.json`, etc. For text embedding models like `all-MiniLM-L6-v2`, these files don't exist on HuggingFace, causing:

1. Network timeout errors (if offline)
2. ValueError: "Unrecognized processing class..." (if connected)
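
If you do hit the offline case, forcing fully offline mode before any `transformers` import at least turns slow network timeouts into immediate local-cache errors. A minimal sketch (assumes a recent `transformers`/`huggingface_hub`, which honor these variables):

```python
import os

# Must be set BEFORE importing transformers / sentence_transformers,
# otherwise the hub client may already be configured for network access.
os.environ['HF_HUB_OFFLINE'] = '1'        # huggingface_hub: never hit the network
os.environ['TRANSFORMERS_OFFLINE'] = '1'  # transformers: resolve from local cache only
```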

## Solution: Load transformers directly

```python
import os
os.environ['HF_HOME'] = '/root/.cache/huggingface'
os.environ['TRANSFORMERS_CACHE'] = '/root/.cache/huggingface'

from pathlib import Path
from transformers import AutoTokenizer, AutoModel
import torch

# Find the snapshot directory inside the HF hub cache
SNAPSHOT_DIR = Path('/root/.cache/huggingface/hub/models--sentence-transformers--all-MiniLM-L6-v2/snapshots/')
hash_dirs = [d for d in SNAPSHOT_DIR.iterdir() if d.is_dir()]  # skip stray files
if not hash_dirs:
    raise RuntimeError("No model cache found. Download first.")
MODEL_DIR = hash_dirs[0]  # e.g., c9745ed1d9f207416be6d2e6f8de32d1f16199bf

tokenizer = AutoTokenizer.from_pretrained(str(MODEL_DIR), local_files_only=True)
model = AutoModel.from_pretrained(str(MODEL_DIR), local_files_only=True)

def embed_texts(texts):
    encoded = tokenizer(texts, padding=True, truncation=True, max_length=512, return_tensors='pt')
    with torch.no_grad():
        output = model(**encoded)
    # Mean pooling
    attn = encoded['attention_mask']
    expanded = attn.unsqueeze(-1).expand(output.last_hidden_state.size()).float()
    pooled = torch.sum(output.last_hidden_state * expanded, 1) / torch.clamp(expanded.sum(1), min=1e-9)
    return torch.nn.functional.normalize(pooled, p=2, dim=1)

# Usage
embedding = embed_texts(["Hello world"])[0].numpy()
```
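
The mean-pooling step can be sanity-checked on synthetic tensors without loading the model: padded positions must contribute nothing to the pooled vector. A small sketch (tensor shapes mirror `last_hidden_state` and `attention_mask` from the code above):

```python
import torch
import torch.nn.functional as F

def mean_pool(last_hidden_state, attention_mask):
    # Same math as embed_texts: mask out padding, average, L2-normalize
    expanded = attention_mask.unsqueeze(-1).expand(last_hidden_state.size()).float()
    pooled = torch.sum(last_hidden_state * expanded, 1) / torch.clamp(expanded.sum(1), min=1e-9)
    return F.normalize(pooled, p=2, dim=1)

# Batch of 1 sequence, 4 token positions, hidden size 3; last 2 are padding
hidden = torch.tensor([[[1.0, 0.0, 0.0],
                        [3.0, 0.0, 0.0],
                        [9.0, 9.0, 9.0],    # padding - must be ignored
                        [9.0, 9.0, 9.0]]])  # padding - must be ignored
mask = torch.tensor([[1, 1, 0, 0]])

vec = mean_pool(hidden, mask)
# Mean of the two real tokens is (2, 0, 0); after L2 norm -> (1, 0, 0)
print(vec)  # tensor([[1., 0., 0.]])
```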

## Verification

```bash
# Check cached files
ls /root/.cache/huggingface/hub/models--sentence-transformers--all-MiniLM-L6-v2/snapshots/*/

# Expected output:
# 1_Pooling  config.json  config_sentence_transformers.json  model.safetensors
# modules.json  README.md  sentence_bert_config.json  special_tokens_map.json
# tokenizer_config.json  tokenizer.json  vocab.txt
```

## Alternative: sentence-transformers from local path

If you prefer to keep using `sentence-transformers` (e.g., for `get_embedding_dimension()` API compatibility), load from the local snapshot path directly:

```python
from sentence_transformers import SentenceTransformer
import os

os.environ['HF_HOME'] = '/root/.cache/huggingface'
os.environ['TRANSFORMERS_CACHE'] = '/root/.cache/huggingface'

# Use full local path — sentence-transformers accepts filesystem paths
model_dir = '/root/.cache/huggingface/hub/models--sentence-transformers--all-MiniLM-L6-v2/snapshots/c9745ed1d9f207416be6d2e6f8de32d1f16199bf'
model = SentenceTransformer(model_dir)
print(f"Model dimension: {model.get_sentence_embedding_dimension()}")  # 384
```

**Why this works:** `sentence-transformers` accepts both HuggingFace model IDs and local filesystem paths. When given a local path, it skips all network-based file discovery (`AutoProcessor.from_pretrained`) and reads directly from the cached files on disk.

**⚠️ Pitfall:** The snapshot hash (`c9745ed1d9f207416be6d2e6f8de32d1f16199bf`) is specific to the download. Find it dynamically:
```python
from pathlib import Path
hub_cache = Path('/root/.cache/huggingface/hub/')
model_cache_dir = hub_cache / 'models--sentence-transformers--all-MiniLM-L6-v2'
snapshots = list((model_cache_dir / 'snapshots').iterdir())
model_dir = str(snapshots[0]) if snapshots else None
```
**Notes:**
- `local_files_only=True` prevents any network access
- `model.safetensors` is ~87 MB
- Embedding dimension: 384
- Mean pooling + L2 normalization matches `sentence-transformers` output
- Batch size for embedding: process 1 file at a time, chunks up to 512 tokens each
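
The chunking in the last point can be sketched as a simple sliding window. The real limit is 512 *tokenizer* tokens; the whitespace-word split below is a rough proxy (a hypothetical helper, not part of any library), since exact token counts require running the tokenizer:

```python
def chunk_text(text, max_words=400, overlap=50):
    """Split text into overlapping word windows.
    max_words=400 leaves headroom under the 512-token model limit,
    since a word often maps to more than one subword token."""
    words = text.split()
    if len(words) <= max_words:
        return [text]
    chunks = []
    step = max_words - overlap
    for start in range(0, len(words), step):
        chunks.append(' '.join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break
    return chunks
```

Each chunk can then be passed to `embed_texts` independently, keeping memory usage bounded.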
