Why hosted assistants (ChatGPT, Claude, etc.) feel different from calling raw models
By:César Medina
Contact: cesar.medina@innovox.com.br
- 6 minutes read - 1145 wordsDifference between hosted generative AI assistants and raw models from the same provider
Hosted AI assistants and the raw models behind them can feel quite different, even when they come from the same family. The difference goes beyond branding. It comes down to how they are built, how they are managed, and the choices made at the product level. Assistants come with features such as memory, tools, safety layers, analytics, and a user interface. These elements shape how the model behaves, how you build with it, and who takes responsibility for the outcome. Here is a closer look at the key differences, the trade-offs, and a few simple examples to help you decide which option fits best.
Memory and personalization
Hosted assistants usually include built-in conversation history and, in some cases, long-term memory for preferences and user details. They can bring up this information in future interactions without extra work from you. With raw models, you handle everything yourself, including what to store, how to retrieve it, and when to use it in prompts. Assistants make personalization easy and reduce development time. Raw models give you more control over privacy and data handling, but they require additional setup. For example, an assistant might remember your name across sessions, while a custom app would store that information in a database and retrieve it when needed.
Tools and integrations
Assistants often come with ready-to-use tools such as web browsing, code execution, file uploads, and integrations with other services. When working with raw models, you need to handle these steps yourself, deciding when to call external services and how to pass the results back to the model. Assistants make it easier to build complex workflows quickly. Raw models offer more flexibility and transparency, but they require more engineering effort. For instance, an assistant can fetch live data directly, while a raw model setup requires you to fetch that data first and then include it in the prompt.
System prompt and instruction management
Assistants provide built-in ways to guide behavior across sessions, often through custom instructions. There may also be hidden layers added by the provider for safety. With raw models, you define system prompts manually for each request and manage them as part of your code. Assistants make it easy to maintain consistent behavior. Raw models give you full control, but you need to manage and maintain those instructions yourself. For example, setting a consistent brand voice is simple in an assistant, while with raw models you must include those instructions every time.
Multi-turn context and state management
Assistants automatically keep track of conversation history and may summarize it when needed. With raw models, you decide what context to include and how to manage token limits, often by summarizing or retrieving past information. Assistants simplify conversational experiences. Raw models allow deeper optimization for cost and performance, but they take more work. For example, a long troubleshooting conversation stays coherent in an assistant, while a custom setup may need a summarization system to keep things manageable.
Function calling and structured outputs
Assistants often support structured interactions, such as forms or built-in actions. Raw models can produce structured outputs like JSON, but you need to define the format and validate the results. Assistants reduce friction when triggering actions. Raw models give you the freedom to design exactly what you need. For instance, an assistant might handle a booking through a built-in interface, while a raw model setup requires a defined schema and backend handling.
Retrieval-augmented generation and embeddings
Assistants may include built-in connections to documents and automatic retrieval features. With raw models, you create embeddings, store them, and handle retrieval manually. Assistants speed up common use cases. Raw models allow detailed control over how data is indexed and retrieved. For example, an assistant might answer questions from connected documents, while a custom setup uses a vector database to fetch relevant information.
Safety, moderation, and guardrails
Assistants come with built-in moderation and safety systems that are updated over time. With raw models, you are responsible for adding filters and enforcing rules. Assistants reduce risk right away. Raw models give you flexibility but increase responsibility. For example, an assistant may block harmful content automatically, while a raw setup requires separate moderation steps.
Privacy, data residency, and logging
Assistants often include default logging and monitoring, with more control available in enterprise plans. Raw models give you full control over what data is sent and stored, and where processing happens. Assistants are easier to adopt. Raw models are better suited for strict compliance needs, but they require more infrastructure. For example, sensitive workflows may rely on controlled environments, while less critical tasks can run through hosted assistants.
Cost, latency, and throughput
Assistants are usually priced per user or subscription and work well for interactive use. Raw models use usage-based pricing and can be optimized for large-scale processing. Assistants are ideal for low-volume interactions. Raw models are better for scaling and cost optimization, though they need more setup. For example, processing large datasets is often more efficient with raw models.
Versioning, model selection, and updates
Assistants may update automatically, which can change behavior over time. Raw models allow you to lock specific versions and control updates. Assistants improve over time without effort. Raw models provide stability and predictability. For example, an assistant might change how it responds after an update, while a pinned model remains consistent until you decide to upgrade.
Observability, analytics, and governance
Assistants often include built-in dashboards and logs. With raw models, you create your own monitoring and tracking systems. Assistants offer quick visibility. Raw models allow deeper customization and integration with your systems. For example, reviewing interactions is easier with built-in tools, while custom setups allow detailed tracking across workflows.
Practical recommendations
Use a hosted assistant when you want to move quickly and need built-in features such as memory, tools, and safety.
Choose raw models when you need full control over data, behavior, and scaling, especially for sensitive or high-volume applications.
A hybrid approach often works best. Start with an assistant to test ideas and refine the experience, then move critical parts to a custom setup where you need more control.
Example quick map
Customer support can start with an assistant for speed, then shift to a custom model setup for secure integrations.
Research on private data often works best with a controlled retrieval system built on raw models, presented through a simple interface.
High-volume analytics usually benefit from raw models due to cost and scalability advantages.
Final tip
Start small and build step by step. Test your ideas with a hosted assistant, then move important or sensitive workflows to a setup you fully control. This approach helps you balance speed with reliability.
Have you used LLMs to build chatbots? What worked well for you and what did not?
InnoVox engineering team
Engineers focused on building reliable AI systems