How Private AI Works
Your business documents are not training anyone's AI.
The concern is real — but it comes from a different product. This page explains exactly how private RAG works, what actually travels to an AI provider, and what their API data policies say.
Read the explanation ↓The concern comes from a real place
And it's worth taking seriously rather than dismissing.
ChatGPT, Gemini, Claude.ai — what people are worried about
These are consumer web products. For much of their history, conversations on these platforms were used to improve the AI models behind them. Some still collect data for improvement by default unless you opt out.
If your team has been typing client names, internal pricing, or legal questions into a free AI web app — that concern is legitimate. Those platforms have different terms than business API tools.
A different architecture with a different data story
SpiceWorx does not use consumer AI web apps. The system we deploy runs your document index on your own server. The AI model only receives a small excerpt from your documents — and only when answering a specific question.
Neither your full document library nor your business knowledge base is ever sent to an AI provider. The architecture makes that impossible, not just policy.
What RAG actually is
The name does not help. The idea is straightforward.
RAG stands for Retrieval-Augmented Generation. When someone asks your AI a question, the system does not consult a model that memorized your business during training. It searches your document library in real time, pulls the most relevant paragraph, and passes that paragraph to the AI to write an answer from.
Remove a document from the library, and the AI can no longer answer questions based on it — immediately, no retraining required. That would not be possible if the model had learned the information permanently.
Training a model on your data means your information shapes that model's future behavior, potentially for years. RAG skips that step entirely. Each query retrieves a specific piece of text, uses it once, and stops there.
What runs where
Three components. Two on your server. One external API call.
What actually goes to OpenAI
It's a short list. Here it is.
- Your complete document library
- Your Qdrant vector index
- Document file names and metadata
- Conversation history and logs
- Any documents you have not explicitly included in the knowledge base
- The user's question (one query at a time)
- The most relevant passage from your documents — roughly 1,200 characters
- A system instruction telling the AI to answer only from the provided text
The API data policy — across all three major providers
SpiceWorx currently uses OpenAI. The same principle applies if we ever use Anthropic or Google's Gemini API instead.
| Provider | Product | Used to train AI? | Policy reference |
|---|---|---|---|
| OpenAI | ChatGPT web app (free/Plus) | Yes by default | openai.com/policies |
| OpenAI | API (GPT-4o) | No — not used for training | openai.com/enterprise-privacy |
| Anthropic | Claude.ai web app (free/Pro) | May be used by default | anthropic.com/privacy |
| Anthropic | Claude API | No — not used for training | anthropic.com/privacy |
| Gemini web app (free) | Yes by default | Google Gemini FAQ | |
| Gemini API via Vertex AI | No — not used for training | cloud.google.com/terms |
Consumer web app vs. business API
Same company. Different products. Different rules.
- Free or subscription product for individual users
- Conversations may be used to improve models by default
- No data processing agreement
- Not designed for business-sensitive content
- Does not know your specific business or documents
- Paid business product for developers and companies
- API data is not used to train models — explicit policy
- Data processing terms available
- Designed for production business applications
- Only receives what your RAG system sends — one query, one excerpt
Questions worth asking
Want to see this running on your own documents?
We can show you a working deployment against your actual content — before any commitment.
Start a ConversationOr explore the full service: AI Knowledge Systems →