{"id":2547,"date":"2025-10-03T10:26:41","date_gmt":"2025-10-03T08:26:41","guid":{"rendered":"https:\/\/www.canyonclan.com\/?p=2547"},"modified":"2025-10-03T10:26:43","modified_gmt":"2025-10-03T08:26:43","slug":"hoe-ai-systemen-bouwen-met-llms","status":"publish","type":"post","link":"https:\/\/www.canyonclan.com\/en\/hoe-ai-systemen-bouwen-met-llms\/","title":{"rendered":"How to build AI systems with LLMs"},"content":{"rendered":"<p>Today we&#039;ll take a closer look at what the practical side of an AI implementation specifically means for your company.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Step 1: Set clear objectives<\/h2>\n\n\n\n<p>Before you start building, you need to know exactly what you&#039;re going to use the AI for. Without this, you&#039;ll quickly build something that looks good but doesn&#039;t actually help anyone. This prevents wasted time and ensures a return on your investment.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Practical:<\/strong><\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Define the problem you want to solve (and don&#039;t start with, &quot;Oh! That&#039;s a fancy tool we should use.&quot;).<\/li>\n\n\n\n<li>Determine what data is needed.<\/li>\n\n\n\n<li>Think about concrete success criteria (such as less searching, faster responses, higher customer satisfaction, no more duplicate entry, automatic document creation, etc.).<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">Step 2: Selecting the Language Model (LLM)<\/h2>\n\n\n\n<p>The language model is the engine of your system. It ensures that the AI understands your question and provides an appropriate response. 
This is fundamental because it allows the AI to understand language and reason.<\/p>\n\n\n\n<p><strong>Practical:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Pay attention to the following when making your choice:<\/strong>\n<ul class=\"wp-block-list\">\n<li>Quality: How good is it at language comprehension and reasoning?<\/li>\n\n\n\n<li>Cost: Do you pay per use (API) or run it locally\/open-source?<\/li>\n\n\n\n<li>Privacy: May the data leave your organization, or must it remain local?<\/li>\n\n\n\n<li>Speed: How quickly should an answer be returned?<\/li>\n\n\n\n<li>Closed or open source: Paid models are often more powerful and user-friendly, but open-source models give you more freedom and control.<\/li>\n\n\n\n<li>Test multiple models by asking each the same questions and comparing the answers.<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li><strong>Examples of models:<\/strong>\n<ul class=\"wp-block-list\">\n<li><strong>OpenAI (GPT)<\/strong> is strong in language, widely applicable, and has many integrations.<\/li>\n\n\n\n<li><strong>Claude (Anthropic)<\/strong> is good with longer contexts, often \u201csafer\u201d in answers.<\/li>\n\n\n\n<li><strong>Gemini (Google)<\/strong> is strongly integrated with Google services.<\/li>\n\n\n\n<li><strong>Mistral<\/strong> is open source, fast and efficient, often cheaper.<\/li>\n\n\n\n<li><strong>Llama 4 (Meta)<\/strong> is open source, suitable for local use or private cloud.<\/li>\n\n\n\n<li><strong>Gemma 3 (Google open-source)<\/strong> is lighter, ideal for specific applications.<\/li>\n\n\n\n<li><strong>Phi-4 (Microsoft)<\/strong> is small, high-performance and good for edge devices.<\/li>\n\n\n\n<li><strong>Grok 3 (xAI \/ Elon Musk)<\/strong> is focused on web integration and real-time applications.<\/li>\n\n\n\n<li><strong>DeepSeek<\/strong> is strong in analytical tasks and mathematical reasoning.<\/li>\n\n\n\n<li><strong>Cohere<\/strong> specializes in business applications such as search and 
classification.<\/li>\n\n\n\n<li><strong>Amazon (AWS Bedrock models)<\/strong> is integrated into AWS cloud environments and scales well.<\/li>\n\n\n\n<li><strong>Ollama<\/strong> makes it easy to run open models locally.<\/li>\n\n\n\n<li><strong>Together AI<\/strong> offers open models available via the cloud.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">Step 3: Using Frameworks<\/h2>\n\n\n\n<p>A framework is a toolbox that helps you connect AI to your data and applications more quickly. You don&#039;t need to build everything from scratch, so you arrive at a working solution much faster.<\/p>\n\n\n\n<p><strong>Practical:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Choose a framework that suits your project (for example, for chatbots, document analysis or search applications).<\/li>\n\n\n\n<li>Connect your data to the model: a framework makes it easy to link documents, databases, or APIs.<\/li>\n\n\n\n<li>Build step by step, starting small (such as with just one data source) and then expanding to include multiple sources or features.<\/li>\n\n\n\n<li><strong>Examples of frameworks:<\/strong>\n<ul class=\"wp-block-list\">\n<li><strong>LangChain<\/strong> is very flexible and popular; ideal for building complex AI flows.<\/li>\n\n\n\n<li><strong>LlamaIndex<\/strong> is strong in linking documents and data sources to an AI.<\/li>\n\n\n\n<li><strong>Haystack<\/strong> is useful for search solutions and question-and-answer systems.<\/li>\n\n\n\n<li><strong>txtai<\/strong> is lightweight and simple, good for semantic search and classification.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">Step 4: Collect and prepare data<\/h2>\n\n\n\n<p>An AI can only function if you feed it the right information. This can range from manuals and reports to web pages and tables. 
This way, the AI can provide answers based on your own knowledge, not just what it already knows.<\/p>\n\n\n\n<p><strong>Practical:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Retrieve your data from websites, PDFs, and tables.<\/li>\n\n\n\n<li>Make the data usable by removing clutter and converting the content to plain text.<\/li>\n\n\n\n<li>Provide structure by adding metadata such as title, date or source.<\/li>\n\n\n\n<li><strong>Examples of tools that do this for you:<\/strong>\n<ul class=\"wp-block-list\">\n<li><strong>Crawl4AI<\/strong> automatically retrieves website content.<\/li>\n\n\n\n<li><strong>Firecrawl<\/strong> is especially strong at scraping entire websites.<\/li>\n\n\n\n<li><strong>ScrapeGraphAI<\/strong> uses AI to extract structured data from complex websites.<\/li>\n\n\n\n<li><strong>MegaParse<\/strong> neatly converts raw documents and tables.<\/li>\n\n\n\n<li><strong>Docling<\/strong> reads PDFs and Office files.<\/li>\n\n\n\n<li><strong>LlamaParse<\/strong> handles tables and scanned documents well.<\/li>\n\n\n\n<li><strong>ExtractThinker<\/strong> helps clean and structure unstructured data.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">Step 5: Convert text to numbers (embeddings)<\/h2>\n\n\n\n<p>To intelligently search texts, the AI must first convert them into meaningful number sequences. This allows the AI to search not only for words, but also for the meaning behind them.<\/p>\n\n\n\n<p><strong>Practical:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Break up text by dividing long documents into smaller blocks (per paragraph or chapter), so the AI can search them specifically by topic.<\/li>\n\n\n\n<li>Convert each piece into a number sequence. 
Use an embedding model that translates the text into vectors that capture its meaning.<\/li>\n\n\n\n<li>Also preserve the context by storing additional information along with those vectors, such as title, date, or author, so you can filter better or justify answers later.<\/li>\n\n\n\n<li><strong>Examples of tools and models for embeddings:<\/strong>\n<ul class=\"wp-block-list\">\n<li><strong>Nomic<\/strong> is open-source and useful for visually analyzing vectors.<\/li>\n\n\n\n<li><strong>SBERT<\/strong> is widely used for semantic search applications.<\/li>\n\n\n\n<li><strong>OpenAI<\/strong> is easy to use and performs well for general embeddings.<\/li>\n\n\n\n<li><strong>Voyage AI<\/strong> offers accurate, high-quality embeddings.<\/li>\n\n\n\n<li><strong>Google<\/strong> offers embeddings via Vertex AI and other cloud services.<\/li>\n\n\n\n<li><strong>Cohere<\/strong> specializes in business applications and multilingualism.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">Step 6: Store data in a memory (vector database)<\/h2>\n\n\n\n<p>The converted texts need to be stored somewhere so the AI can quickly search them later. This memory is called a vector database. This allows the AI to quickly and accurately find the right piece of information. <\/p>\n\n\n\n<p>With a traditional search function (like in Word or Excel), you only get results if you type &quot;bicycle&quot; exactly. But with vector search, the AI understands that &quot;bicycle,&quot; &quot;bike,&quot; and even &quot;mountain bike&quot; are related. 
That&#039;s why this step is necessary.<\/p>\n\n\n\n<p><strong>Practical:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Choose a system that supports vector search and is scalable enough for your data.<\/li>\n\n\n\n<li>Save the vectors together with metadata (title, date, source) so that the AI can always show context.<\/li>\n\n\n\n<li><strong>Examples of vector databases:<\/strong>\n<ul class=\"wp-block-list\">\n<li><strong>Chroma<\/strong> is simple and ideal for testing and small projects.<\/li>\n\n\n\n<li><strong>Pinecone<\/strong> is scalable and cloud-based, useful for production.<\/li>\n\n\n\n<li><strong>Qdrant<\/strong> is open source, strong in filtering options and fast searching.<\/li>\n\n\n\n<li><strong>Weaviate<\/strong> is a flexible vector database with built-in AI functions.<\/li>\n\n\n\n<li><strong>Milvus<\/strong> is enterprise-oriented, highly scalable.<\/li>\n\n\n\n<li><strong>Postgres<\/strong> is a relational database with vector extensions, good for those who already use Postgres.<\/li>\n\n\n\n<li><strong>Cassandra<\/strong> is suitable for large amounts of data spread over multiple servers.<\/li>\n\n\n\n<li><strong>OpenSearch<\/strong> is an open-source fork of Elasticsearch with vector search.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">Step 7: Evaluate and adjust<\/h2>\n\n\n\n<p>Once the system is working, the work doesn&#039;t stop. You have to test, measure, and improve. 
This way, you can be sure the answers are reliable and the AI continues to perform well.<\/p>\n\n\n\n<p><strong>Practical:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Test with real-life questions and see if the answers are correct.<\/li>\n\n\n\n<li>Collect user feedback by having people indicate whether an answer was helpful or not.<\/li>\n\n\n\n<li>Use the test results to clean up data or adjust settings.<\/li>\n\n\n\n<li><strong>Examples of evaluation tools:<\/strong>\n<ul class=\"wp-block-list\">\n<li><strong>Ragas<\/strong> measures how well answers match expected results.<\/li>\n\n\n\n<li><strong>TruLens<\/strong> analyzes whether the AI uses the correct context and how relevant it is.<\/li>\n\n\n\n<li><strong>Giskard<\/strong> tests AI systems for accuracy, reliability, and potential errors.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">Step 8: Secure and Deploy<\/h2>\n\n\n\n<p>Putting AI into production also means ensuring security, privacy, and proper governance. Only then can users trust the solution, and you avoid issues surrounding sensitive data.<\/p>\n\n\n\n<p><strong>Practical:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Protect sensitive data so that confidential information cannot be easily shared or misused.<\/li>\n\n\n\n<li>Maintain control and oversight by documenting who has access. 
Log which questions and answers are processed, thus ensuring transparency.<\/li>\n\n\n\n<li>Start with a limited group of users, learn from their usage, and then scale up safely.<\/li>\n\n\n\n<li><strong>Examples of approaches and tools:<\/strong>\n<ul class=\"wp-block-list\">\n<li><strong>Audit logs<\/strong> to keep track of all interactions.<\/li>\n\n\n\n<li><strong>Access control<\/strong> with roles and rights (who can do what?).<\/li>\n\n\n\n<li><strong>Monitoring &amp; alerts<\/strong> with tools that continuously track performance, costs and abuse.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">Final AI application flow<\/h2>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>The end user asks a question in your application<\/strong><br>For example, in a chatbot, search box or helpdesk app.<\/li>\n\n\n\n<li><strong>The question goes to the vector database<\/strong><br>The query is converted into a vector (step 5), and the database searches for the most relevant pieces of information (step 6).<\/li>\n\n\n\n<li><strong>The relevant pieces are passed to the language model (LLM)<\/strong><br>The LLM combines the user&#039;s question with the context found. This allows the model to provide an answer that&#039;s consistent with your data, not just its general knowledge.<\/li>\n\n\n\n<li><strong>The answer appears in the application<\/strong><br>The user sees a clear answer, possibly with a source reference or link to the original document.<\/li>\n<\/ol>\n\n\n\n<p>So you see, building an AI system with LLMs isn&#039;t magic. It&#039;s just a series of logical steps. And at Canyon Clan, we can help you every step of the way.<\/p>","protected":false},"excerpt":{"rendered":"<p>Today we&#039;ll take a closer look at what the practical side of an AI implementation specifically means for your company. Step 1: Set clear objectives Before you start building, you need to know exactly what you&#039;re going to use the AI for. 
Without this, you&#039;ll quickly build something that looks good but doesn&#039;t actually help anyone. You [&hellip;]<\/p>\n","protected":false},"author":3,"featured_media":2549,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[36,20],"tags":[],"class_list":["post-2547","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai","category-software"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.canyonclan.com\/en\/wp-json\/wp\/v2\/posts\/2547","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.canyonclan.com\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.canyonclan.com\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.canyonclan.com\/en\/wp-json\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"https:\/\/www.canyonclan.com\/en\/wp-json\/wp\/v2\/comments?post=2547"}],"version-history":[{"count":1,"href":"https:\/\/www.canyonclan.com\/en\/wp-json\/wp\/v2\/posts\/2547\/revisions"}],"predecessor-version":[{"id":2548,"href":"https:\/\/www.canyonclan.com\/en\/wp-json\/wp\/v2\/posts\/2547\/revisions\/2548"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.canyonclan.com\/en\/wp-json\/wp\/v2\/media\/2549"}],"wp:attachment":[{"href":"https:\/\/www.canyonclan.com\/en\/wp-json\/wp\/v2\/media?parent=2547"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.canyonclan.com\/en\/wp-json\/wp\/v2\/categories?post=2547"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.canyonclan.com\/en\/wp-json\/wp\/v2\/tags?post=2547"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}