Matthias Lau

by Matthias Lau

(0 reviews)
0 downloads

About

"Technology is magic, and I'll reveal the trick to you." 🤖🚀

Matthias is a founder, CTO and software developer, pushing boundaries in the e-commerce, health and manufacturing industries. With his pure tech DNA refined by an entrepreneurial spirit, he uses technology and learnings from different fields to create innovation and progress where it matters. Besides being a technology expert and development all-rounder, Matthias loves Notion, Federated Learning, the Apple Multi-Touch Trackpad, bouldering, Wikipedia and espresso.

He is the founder of the technology studio Heureka Labs, a partner for innovative product development and a group of technology experts that drive R&D for customers. Heureka Labs is about flux - always on the move, always on the edge, exploring new technologies and bringing them to reality. We close the gap between technology trends and business-relevant productionization by working closely with customers against clearly defined business metrics.

Positions: Founder | AI Developer at airouter.io (2024 - Present), AI Developer | CTO | Founder at Heureka Labs UG (2019 - Present), Expert- & Solution-Partner at eTribes Connect GmbH (2014 - Present), Interim Chief Technology Officer at aiconix GmbH (2020 - 2021), Health Tech Consultant at data4life (formerly Gesundheitscloud) (2019), Founder & Adviser at LifeTime GmbH (2019), Founder & CTO at LifeTime GmbH (2014 - 2018), CTO at Surfmarken UG (2011 - 2016), Owner at Social-Commerce-Platform.com (2011 - 2014), Shop-Rakete at Jimdo (2012 - 2014), Head of Web Development and Business Development at shopping24 commerce network (2007 - 2012)

Skills: Large Language Models (LLM), Generative AI, Software Development, Machine Learning, Web Development, E-Commerce, Start-ups, Business Development, Product Management, Responsive Web Design, Entrepreneurship, Strategy, Web Services, Social Media, SEO, Web Design, Online Advertising, Online Marketing, Marketing Strategy, Project Management, Web Analytics, Digital Marketing, Agile Methods, Search Engine Optimization (SEO), Python, PHP, Ruby, Java, JavaScript, Swift, Facebook, Web 2.0, Ruby on Rails, Git, HTML, Team Building, Team Leadership, Company Building, Social Commerce, Mashups, First Principles Thinking

Recent Posts:

"Matti, how can you stay up-to-date with AI topics?" I am sharing my daily go-to source with you 👇
Keeping up with the AI landscape is challenging, with new models and techniques dropping daily. Here's my solution:
🔍 I collect relevant topics throughout the day
🤖 Automated workflows parse key URLs and newsletters
🧠 Custom classification filters content by personal relevance
👀 I personally review all collected topics daily for quality control
This process requires minimal daily effort, but reading through all this curated content still consumed too much of my time. So I started generating the AI Morning Briefing - a mini podcast that summarizes everything and that I can listen to while I enjoy my morning coffee ☕
🎧 I've decided to make this resource available to everyone - now accessible on all major podcast platforms including Spotify, Apple Podcasts, and Amazon Music. https://lnkd.in/eEgUtAfp
Enjoy your mornings while staying informed on AI developments! Let me know if this is a helpful format or you have ideas for improvement. Please comment if you're interested in the implementation details.
#AIUpdates #AIMorningBriefing #MorningRoutine #AIProductivity #AINews
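The post doesn't share its implementation, but the classification step it describes could be a single small-model call per collected item. A minimal sketch; the model name, prompt, and data shapes are illustrative assumptions:

```python
# Sketch of the relevance-filtering step of a curated news pipeline.
from openai import OpenAI

client = OpenAI()

def is_relevant(title: str, summary: str) -> bool:
    """Ask an LLM whether a collected item matches the reader's interests."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumption: any small, cheap model works here
        messages=[
            {"role": "system", "content": (
                "You filter an AI news feed. Answer only YES or NO: "
                "is this item relevant to LLMs, agents, or ML engineering?"
            )},
            {"role": "user", "content": f"{title}\n\n{summary}"},
        ],
    )
    return response.choices[0].message.content.strip().upper().startswith("YES")

items = [("New 70B open-weights model released", "..."),
         ("Celebrity gossip roundup", "...")]
briefing = [title for title, summary in items if is_relevant(title, summary)]
```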
I had the pleasure of supporting BReact GmbH and VERBUND in developing an AI agent system for Data Intelligence. Together, we demonstrated how an enterprise agent setup with data access and data understanding can serve as a foundation layer to enable further use cases within the company. Thank you for this exciting collaboration and the great time in Vienna! ☀️ Nemanja Klincov Raphael Fakhir Simon Kloiber Lukas Gruber

📬 Fresh off the press: February AI Router's Roundup
We've been busy making LLM routing more practical for your production needs. Our latest updates include:
🔒 New privacy modes for sensitive enterprise deployments
📊 Request-based pricing (finally ditching token counting!)
🎯 A sleek Playground for instant model recommendations
📈 2024 Model Leaderboard with interesting insights (including Qwen 2.5 72B's rise)
Plus: Try all Pro features free, including smart model fallbacks.
Read the full Roundup here 👉 https://lnkd.in/egAtHnXy
#GenerativeAI #LLMOps #AIOptimization #ModelRouting #Qwen

LLM model routing for #n8n made ridiculously simple! 🚀
Still using #gpt4 for everything on n8n? Here's a 1-minute setup that will cut your AI costs:
1️⃣ Change the BaseURL to airouter.io
2️⃣ Add your API token
3️⃣ Done! 🎯 Your n8n workflows now automatically use the perfect AI model for each task - fast & cost-efficient
💡 Watch the 1-min setup ⬇️
#n8n #AI #Automation
P.S. Most of our users see 60%+ cost reduction with airouter.io without touching their workflows! 🎉 Get the first month free with the voucher code "n8nspecial".
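The same base-URL swap works outside n8n for any OpenAI-compatible client. A minimal sketch in Python; the exact endpoint URL and the model alias are assumptions, so check the airouter.io docs for the real values:

```python
# Sketch of the "one line" switch: an OpenAI-compatible client
# pointed at airouter.io instead of api.openai.com.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.airouter.io",  # assumed endpoint
    api_key="YOUR_AIROUTER_API_KEY",
)

response = client.chat.completions.create(
    model="auto",  # assumed alias that lets the router pick the model
    messages=[{"role": "user", "content": "Summarize this ticket: ..."}],
)
print(response.choices[0].message.content)
```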
Still using #gpt4o? How to find the best #LLM for your use case in < 30 seconds 👇
Choosing between GPT-4, #Claude, #DeepSeek, #Llama, #Mistral and all the other models isn't obvious. Each has different strengths in cost, quality and speed. Here's how you can find your perfect model in 2 easy steps for free:
1️⃣ Sign up at https://lnkd.in/eX38RyDa
2️⃣ Go to the Playground and submit your message
Taadaa! 🥳 That's it! You'll instantly see which model fits your query best 🎯
And if you want to use the best LLM automatically, just replace OpenAI with airouter.io and you will start saving money on autopilot.
#LLM #AI #CostOptimization

Excitement is building 🤩: upcoming talk at OOP Konferenz
Today at 16:15, Benedikt Stemmildt 👨🏼‍💻 and I will talk about Statista's journey in optimizing a RAG system and how a model mix can reduce costs while improving quality and latency.

AI Agent Design: Moving beyond flat tool lists with tool types 🤓
Ever noticed how AI agents get less efficient as their toolset grows? Here's a pattern I learned 1.5 years ago from my robotics agent project that made me rethink how we should organize agent tools.
The idea isn't revolutionary, but it is effective: instead of dumping all tools into one bucket, I organized them by their cognitive role:
🔍 Information retrieval tools: e.g. camera vision description, sensor values
🏃‍♀️ Action tools: movement controls, speech output
The impact? The agent naturally followed a "sense, then act" pattern, making its behavior more predictable and efficient.
👉 Why this matters:
- Agents stop cycling through irrelevant tools
- Decision paths become clearer
- Debugging gets easier
- Performance improves naturally
Don't get caught up in the specific categories - what matters is the structured approach to tool management. Adapt the pattern to your agent's needs. And if you still have too many tools, consider splitting into multi-stage workflows or multiple agents - but try tool categorization first, as it often brings surprising clarity.
In the demo video, you can see how my robot Sonny starts walking before properly analyzing its environment - exactly the behavior this pattern helped fix.
What patterns have you found helpful in managing AI agent complexity?
#AIAgents #ArtificialIntelligence #LLM
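A minimal, framework-free sketch of the tool-type idea: group tools by cognitive role and surface that grouping to the model in its system prompt. The category names and tools below are illustrative, not taken from Sonny's actual codebase:

```python
# Tools grouped by cognitive role instead of one flat list.
from typing import Callable

TOOLS: dict[str, dict[str, Callable[..., str]]] = {
    "information_retrieval": {
        "describe_camera_view": lambda: "a person sitting on a chair",
        "read_distance_sensor": lambda: "0.8m to nearest obstacle",
    },
    "action": {
        "walk": lambda direction: f"walking {direction}",
        "speak": lambda text: f"saying: {text}",
    },
}

def build_system_prompt() -> str:
    """Render the grouped tool list so the model sees roles, not a flat bag."""
    lines = ["You are a robot agent. Sense before you act.", "Tools by role:"]
    for role, tools in TOOLS.items():
        lines.append(f"- {role}: {', '.join(tools)}")
    return "\n".join(lines)

print(build_system_prompt())
```

The grouping itself is cheap; the payoff is that the "sense, then act" ordering is visible to the model on every turn instead of being implicit in tool names.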
Smart LLM routing meets enterprise privacy 🔒
Struggling to balance LLM innovation with enterprise requirements on which LLM providers to use? Just released: two powerful privacy modes for airouter.io, designed specifically for private LLM infrastructures:
🎯 Model Selection Mode: Want to identify the best model when using private models or a hyperscaler infrastructure like AWS Bedrock? Now you can! Get smart model recommendations without routing, keeping full control to call the models yourself. Perfect for teams with existing infrastructure who still want to optimize their model selection or need to follow guidelines on LLM provider selection.
🛡️ Full Privacy Mode: Maximum security for your sensitive data: our new embedding-based routing identifies the optimal LLM without ever seeing your actual queries. Your sensitive data stays completely private.
These enterprise features enable you to optimize LLM costs and performance while maintaining complete control over your security perimeter and compliance requirements. Included in all paid plans. Check out the docs for details: https://lnkd.in/eWShBdq4
#EnterpriseAI #Privacy #Security #LLM #AIGovernance

🏆 LLM Production Leaderboard 2024: Beyond the Hype
What a year for #AI! Our airouter.io Model Leaderboard 2024 just dropped fresh insights from analyzing 1M+ production requests. Want to know which LLMs are actually delivering value in real-world applications?
Top 5 most-used models:
1. GPT-4o
2. Qwen 2.5 72B (🔥)
3. Gemini 1.5 Flash
4. GPT-4o-mini
5. Gemini 1.5 Pro
The landscape has shifted dramatically, with Qwen 2.5 72B claiming the #1 spot in recent weeks. This isn't just about raw power anymore - it's about delivering real value in production with the perfect balance of quality, cost, and speed. With recent releases like #llama3_3, #nova, #gemini 2.0 Flash and #qwq already shaking things up, the LLM space keeps evolving at breakneck speed.
But here's what keeps me thinking: will these powerful new models revolutionize the game, or just add to the noise? 🤔 Which models are you betting on? Have you already evaluated all these models?
#ModelRouting #leaderboard #LLM #GenerativeAI

"Optimize your RAG system or die trying" 🎯 - Upcoming talk at OOP Conference Munich!
Excited to team up with the brilliant Benedikt Stemmildt 👨🏼‍💻 in February to share our hands-on experience optimizing Statista's RAG system. We're diving into "From Search Results to Insights: Learnings from Statista's GenerativeAI Journey".
What we'll tackle:
🔍 Behind the scenes of ResearchAI - Statista's RAG system
🧠 The reality of RAG optimization (spoiler: those Medium articles barely scratch the surface)
💡 Our battle-tested optimization playbook
⚡ The art of choosing the right LLM (and why it matters more than you think)
This isn't another "look what AI can do (theoretically)" talk. We're sharing real challenges and solutions from a system that's already serving thousands of users worldwide. No theory - just practical insights from the production trenches. Curious about specific aspects of our RAG optimization? Drop your questions below!
#RAG #GenerativeAI #OOP2025

🤔 Most RAG systems waste money by treating every query like a complex reasoning task
#RAG (retrieval-augmented generation) means retrieving relevant content from your own data and adding it as context so an #LLM can answer questions better. And adding a lot of content means high costs when using expensive models. In the use case in the image, even a small, fast model gives the perfect answer because it's literally there in the context. Yet most systems would use #gpt4o for this - that's like using a Formula 1 car to drive to your local grocery store! 🏎️
Looking at the numbers across different projects:
📊 60-75% of RAG queries are actually this straightforward
⚡️ different models respond up to 27x faster for these cases
💰 and could save ~98% of costs
Complex queries (like "I've been charged twice last month, have unused credits from my old plan, and want to upgrade to the enterprise tier - what's the best approach?") still need powerful models. This is why so many teams are stuck with the big models.
Model routing is the logical next step: using fast, cheaper models for straightforward answers while sticking to high-quality models when needed. This saves our customers 82% of their model costs on average. Want to test this with your RAG stack? Adding airouter.io is only one line of code.
#RAGOptimization #AIEngineering #TechTrends
What percentage of your RAG queries actually require complex reasoning? Share your observations! 👇

Complex prompts are your next technical debt bomb 💣
Ever noticed how #PromptEngineering has evolved? While we're all crafting increasingly sophisticated prompts, there's a hidden cost nobody's talking about.
The reality: when your prompts are packed with specific instructions and endless conditions, you're not optimizing - you're building a house of technical cards that's ready to collapse.
Why this matters now:
- Model updates can silently break your system
- You're locked into specific models while the market races ahead
- Your maintenance costs for re-optimizing prompts? Through the roof 📈
The smart move? Think modular! 💡 Remember the Single Responsibility Principle and KISS (Keep It Simple, Stupid)? They apply perfectly here. Instead of prompt monsters, we're seeing incredible results with:
✨ Simplified prompts without endless conditions and edge cases
🎯 Breaking complex tasks into separate, focused LLM calls (parallel if possible)
This approach not only makes your system more robust but also opens doors to optimizing your stack through different models or model routing via airouter.io - choosing the right model for each specific task.
What's your take on prompt complexity? Are you seeing similar challenges?
#AIEngineering #TechDebt #ModelRouting #AIOps
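To make the modular idea concrete, a hedged sketch of breaking one overloaded prompt into focused calls that run in parallel; the model choice and the three sub-tasks are illustrative assumptions:

```python
# Three single-responsibility prompts instead of one monster prompt,
# executed concurrently so latency stays close to a single call.
import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI()

async def ask(task: str, text: str) -> str:
    # One focused call per sub-task; no shared edge-case handling.
    response = await client.chat.completions.create(
        model="gpt-4o-mini",  # assumption: a small model suffices per task
        messages=[{"role": "system", "content": task},
                  {"role": "user", "content": text}],
    )
    return response.choices[0].message.content

async def analyze(ticket: str) -> list[str]:
    return await asyncio.gather(
        ask("Summarize the issue in one sentence.", ticket),
        ask("Classify the sentiment as positive, neutral or negative.", ticket),
        ask("Extract any order numbers as a JSON list.", ticket),
    )

summary, sentiment, orders = asyncio.run(
    analyze("Hi, my order #1234 arrived damaged ..."))
```

Each call can now be tested, versioned, and routed to a different model on its own, which is exactly what keeps a model update from silently breaking the whole pipeline.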
Looking for a developer in Hamburg or remote? 👇🤗
This week, our Chief Technology Officer Ingo Schellhammer and our Chief of Staff to the CEO Nina Böckenholt attended the Digitale Leute conference in Cologne, where Ingo gave an insightful talk about the development of our product Research AI. He shared details on our pilot approach, highlighting the collaboration of our internal team around Daniel Kuske and Thorben Holkenbrink with external experts Benedikt Stemmildt from TalentFormation and AI specialist Matthias Lau. 🧑‍💻
The talk, titled "From Search Results to Insights: Learning from Statista's Generative AI Journey," offered a behind-the-scenes look at how we're evolving our platform to deliver even greater value to our users. A big thank you to everyone who joined the session and contributed to the discussion! 🙏
#GenerativeAI #ProductDevelopment #DigitaleLeute

On my way to Cologne to give insights into real-world #GenAI applications at the Digitale Leute Summit 🤗
Tomorrow I will give the talk "From Search Results to Insights: Learning from Statista's GenerativeAI Journey" together with the fabulous Ingo Schellhammer and Benedikt Stemmildt 👨🏼‍💻. We will talk about:
🔍 the value of delivering answers instead of search results
📊 how to optimize a GenAI PoC to be production-ready
💡 our technical and business learnings along the way
Looking forward to exchanging ideas and insights! 🤓
#AI #RAG

Stumbled upon my first tech print article: Eclipse GMF (2006) 📚 😬
Found a blast from the past today: my first ever print article, from 2006, in eclipse Magazin about the Graphical Modeling Framework (GMF). Looking at it now makes me smile - some parts even make me laugh 😄 We were so excited about visual modeling tools back then! Quite a journey from GMF articles to building AI systems. Sometimes it's worth looking back to appreciate how far technology has evolved. 🤓
What was your first technical publication about?
#Eclipse #GMF #GEF #TechEvolution #Programming

Speed matters in AI - especially for chatbots 🚀
Just achieved a major performance boost at airouter.io:
👉 Migrated our #Llama hosting to Cerebras Systems
👉 Now reaching ~2150 tokens/second with Llama 3.1 70B
👉 That's 15x faster than common providers like Fireworks
Why does this matter? Real users expect real-time responses. Every millisecond of latency in your chatbot or AI application impacts user experience and satisfaction. The best part? You still get the same high-quality output, just faster. Much faster.
Want to optimize your LLM applications for cost, quality, AND speed? Give https://airouter.io a try.
#AI #LLM #Performance #AIOptimization

AI or AI-washing? A rant 🫣
Nowadays everything is AI and everyone is an AI expert.
👉 "AI Professionals" telling you about prompt engineering
👉 "AI Engineers" explaining how to use an API
👉 "AI Consultants" crafting strategies that boil down to using ChatGPT
Most of this is just repackaging others' AI products. These "AI strategies" are often no more than "use this product" guides - it's like calling "use Salesforce" a comprehensive business strategy.
Now, don't get me wrong:
- Some brilliant minds are also doing valuable work here
- Integrating GenAI into workflows and products is a real challenge
- Adapting to these tools requires skill and creativity
The meaning of "AI" has shifted. This isn't necessarily bad, and I acknowledge this rant is part of my process of accepting it 🤗. This rant is about keeping in mind that many "AI things" are not as new as we might think, and that there is still a difference between preparing data and training models versus using someone else's AI.
What does #AI mean to you? #MachineLearning, #LLM usage, or something else?
Enough talk about #LLM potential and proofs of concept. Let's discuss a real production application delivering business value! 🤓
At code.talks this week, Thorben Holkenbrink and I will discuss Statista's journey developing ResearchAI - a #GenAI tool now used by thousands worldwide.
🔍 Discover our #playbook for LLM system optimization
💡 Deep dive into #RAG application enhancements
🎯 Learn to choose the perfect model for your needs
📅 Thursday, 16:00
📍 Hall D
See you there 👋

Niklas von Weihe has published the awesome guide "Applying Open Source AI" to give some guidance and important insights on how to use the Llama 3.1 models 🥳 Well done Niklas - I know how much work went into it, and it will be really helpful for many people 👏

OpenAI released #o1, their new model, optimized for "reasoning" 🤯
It would be awesome to finally see a good reasoning model, especially for many LLM agent use cases. So it's an exciting model, but my first impression is that it is not relevant for many use cases. First tests in typical summarization and RAG applications delivered results very similar to gpt-4o, only much slower and more expensive. For now it will not launch on airouter.io due to the very low rate limits, but ping me if you want to test it within your application 🤓

Me talking about #llm #modelrouting 🤗

Closed beta successfully completed - my passion project airouter.io is now available to everyone! 🚀🤗 My baby has grown up, and I'm excited about the significant impact it's already making. Now officially the Founder of airouter.io!

GPT-4o mini now available on airouter.io 🚀🤗
Actually, it's already been in our model routing for the past three weeks and is currently ranked #2 among our most-routed models 😬. This model can significantly reduce costs and offers quality comparable to Gemini Flash. While it may not be the top choice for low-latency applications with many output tokens, it's still really fast. Start using it in a model mix now: https://airouter.io/
#AI #LLM #openai #airouter

AI can be really valuable for service desk operations! Laurin created a great, compact overview of typical challenges and potential solutions 🥳

The perfect LLM model mix can afford to be a bit weaker, and that's okay 📊🤔
When running numerous benchmarks and real customer data on airouter.io with default settings, we often see a slight decrease in quality (at least in the numbers, ~2-3%). If you're only focused on quality, this might seem bad. However, this graphic illustrates what happens when you factor in costs, highlighting why model routing is so valuable.
Image credits: lmsys

Open-source LLMs are weaker than GPT-4? Just let them team up! 🤓
Mixture of Agents (MoA) is a new method that boosts performance by combining the strengths of multiple LLMs. It uses a layered setup with several LLM agents in each layer, allowing MoA to surpass GPT-4 Omni's 57.5% on AlpacaEval 2.0, scoring 65.1% with open-source models alone! Yes, this could be slow, but when using Groq this doesn't really matter 🤗
https://lnkd.in/eq9D8Z97
Paper: https://lnkd.in/eDTTnBP8
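A rough two-layer sketch of the MoA idea (the paper's setup uses more layers and proposers); the model names and the OpenAI-compatible endpoint are placeholders, not from the post:

```python
# Mixture of Agents, minimal form: several open models draft answers,
# then an aggregator model synthesizes them into one response.
from openai import OpenAI

client = OpenAI(base_url="https://your-openai-compatible-host/v1",  # placeholder
                api_key="YOUR_API_KEY")

PROPOSERS = ["llama-3.1-70b", "qwen2-72b", "mixtral-8x7b"]  # illustrative names

def ask(model: str, prompt: str) -> str:
    response = client.chat.completions.create(
        model=model, messages=[{"role": "user", "content": prompt}])
    return response.choices[0].message.content

def mixture_of_agents(question: str) -> str:
    # Layer 1: independent drafts from each proposer model.
    drafts = [ask(model, question) for model in PROPOSERS]
    # Layer 2: one model aggregates the drafts into a final answer.
    aggregation_prompt = (
        "Synthesize the best possible answer from these candidate answers:\n\n"
        + "\n\n---\n\n".join(drafts)
        + f"\n\nQuestion: {question}"
    )
    return ask(PROPOSERS[0], aggregation_prompt)
```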
A model router is good; a model router that knows your goals is better 🎯
airouter.io can be tuned for your specific use case in many ways, the easiest being to add boosts or penalties to the quality, cost, and latency metrics. To help our customers find a good starting point, it is now possible to define your goals and get a code snippet for evaluation and fine-tuning.
#llm #featureupdate #onboarding

What a wonderful day at the #BigTechday conference last week! The talks were inspiring, the people were amazing, and the weather was just perfect! ☀️ Here are my highlights:
1️⃣ Dynamic Soaring Update by Spencer Lisenby: ✈️ Finally grasped the concept, and now I'm even more excited about the possibilities! The way he explained it made everything click.
2️⃣ Life as an Analog Astronaut by Dr. Carmen Köhler: 🧑‍🚀 Fascinating insights into living an astronaut's life without leaving Earth 🌍
I'm already looking forward to next year's event! TNG Technology Consulting Moritz Prinz David S. Christoph Stock Thomas Endres Florian Gather Henrik Klagges Till Müller-Rochholz
#BigTechday #TechConference #DynamicSoaring #AnalogAstronaut #Inspiration

🎯 Looking for PR support? (translated from German)
I still have spare capacity in July and August and am looking forward to new projects, people and topics. So if you finally want to start doing consistent #PR work, have always wanted to rebuild your #communication strategy, would much rather hand off that outstanding guest article, or urgently need support for the upcoming holiday season: feel free to get in touch!
💻 I offer, among other things:
- Journalistic and editorial work (texts for print and online)
- Development of PR/communication strategies and concepts
- Texts for specialist audiences or the general public (press releases, newsletters, guest articles, book contributions, whitepapers, position papers, etc.)
- Editorial and topic planning
- Media relations
- Sparring and consulting on communication questions, plus a fresh outside perspective
🔍 All of this paired with deep know-how and a feel for developments in #medicine, #health, #DigitalHealth & #social topics. Our collaboration can be project-based or long-term - or look entirely different; I am open to various setups. If you are interested or have questions, just send me a DM. And don't forget: sharing is caring 😉
#PR #publicrelations #freelancer #text #digitalhealth #healthcare #medical #diga

On my way to the Big Techday by TNG Technology Consulting 🤗 I am already looking forward to looking beyond the "Tellerrand" (outside the box) and learning exciting things like planning expeditions such as skiing across icy deserts, surviving on Mars, dynamic soaring and speed puzzling. I am especially excited about Grant Sanderson's talk on Visualizing Transformers 🤓. And of course I am looking forward to seeing all these lovely people again: Till Müller-Rochholz Christoph Stock Moritz Prinz David S. Thomas Endres Florian Gather Henrik Klagges and all the others! 🫶

Claude 3.5 Sonnet joins airouter.io 🚀🤗
We're announcing that Claude 3.5 Sonnet from Anthropic is now part of our model lineup. This advanced model outperforms GPT-4o on numerous benchmarks and boasts improved speed compared to its predecessor. Initially, I was skeptical about the recent buzz around this model's performance over GPT-4o. Not because I thought the benchmarks would not reflect real-world results, but because the model is only slightly better, only slightly cheaper, and also slightly slower; in an overall mix, it cannot really beat the OpenAI flagship. Nonetheless, it surprised me in a model mix, ranking as the third most relevant model on airouter.io, following GPT-4o and Gemini 1.5 Flash. And I am already looking forward to Claude 3.5 Haiku, which, as a much faster and cheaper model, will provide much more value for various use cases.
Start using it in a model mix now: https://airouter.io/
#AI #LLM #anthropic #sonnet #airouter
Your users shouldn't wait 30 seconds for an LLM's answer 🕒⚡
High-performance models like GPT-4 offer remarkable capabilities, but speed can be a major hurdle for the UX. And while streaming is a nice concept, it's often still not fast enough, or not possible at all, e.g. because you want to check the full answer for toxicity first. And speed is complex: there is the "time to first token" (often referred to as latency) and the throughput, which tells you how many tokens a model will generate per second. On top of that, many models have high speed variance, so just because a model was fast this morning doesn't mean it will answer fast now.
So how do you optimize your LLM application's speed?
🔀 Optimize your application flow: parallelize LLM requests where possible, use streaming, and split requests when streaming is not possible.
🔍 Rate metric requirements granularly: applications often use multiple LLM calls in a single user request. Do not stick to maximum quality everywhere; switch to fast models where possible while keeping the high-quality model where necessary.
📊 Do not look at averages: a blended figure for time-to-first-token and throughput across models is not helpful for your specific use case! Check your output tokens and define your tolerance for speed variance. For example:
💬 You typically have around 100 output tokens and constant response times are critical → check out Claude 3 Haiku or the Mixtral models.
💬 💬 💬 Your output is about 10,000 tokens and some variance is okay → check out Gemini 1.5 Flash.
Balancing latency and performance is essential. Choose the right approach for each use case to enhance efficiency and user experience. Or check out https://airouter.io to automate this optimization seamlessly 🤗.

In 2023, OpenAI had downtime on 46 days 😱
In Q1 2024, users experienced API downtime on 9 days, about 10% of all days. To be fair, other providers like Anthropic have also faced downtime; we see provider problems in our logs almost every day. Scaling during a hype period is challenging. However, regular outages are a big problem for production LLM applications. You don't want to upset your customers because OpenAI is down again.
👉 Implement model fallbacks! There are many awesome models out there. Catch your response errors and use them when OpenAI is down.
Pro tip: use https://airouter.io to automatically fall back to the next best model (and use all the awesome models out there, all the time).
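A minimal sketch of a hand-rolled fallback chain, assuming for simplicity that all models sit behind one OpenAI-compatible endpoint (in practice each would use its own client and credentials); the model list is illustrative:

```python
# Try models in order of preference; move on when one errors out.
from openai import OpenAI, OpenAIError

client = OpenAI()
FALLBACK_CHAIN = ["gpt-4o", "claude-3-5-sonnet", "llama-3.1-70b"]  # illustrative

def complete_with_fallback(messages: list[dict]) -> str:
    last_error = None
    for model in FALLBACK_CHAIN:
        try:
            # In practice each entry would map to its own provider client.
            response = client.chat.completions.create(
                model=model, messages=messages, timeout=30)
            return response.choices[0].message.content
        except OpenAIError as error:  # downtime, rate limits, etc.
            last_error = error
    raise RuntimeError("All models in the fallback chain failed") from last_error
```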
Google's Gemini models are joining airouter.io 🤗🥳
The Google Gemini 1.5 Flash & Gemini 1.5 Pro models are now part of our model setup. These models are among the best in the current model ranking (LMSYS Arena) and add a very good alternative to the OpenAI models in terms of overall quality. While the Pro model focuses on high quality, the Flash model scores with its speed and price while still maintaining surprisingly good quality. In the first production routings, we see that the Gemini Flash model in particular performs excellently in a model mix! 🤩
Integrating these models into our algorithm was a tough task - for the same reason that I never really used them in client projects. Gemini offers really exciting safety filters, meant to filter out harassment, hate speech, sexually explicit and dangerous content. While this sounds awesome, it rarely works in the real world, and many valid requests are blocked. Even if you set the safety settings to "BLOCK_NONE", the model will send you empty responses from time to time for safety reasons. Sometimes this is reproducible, sometimes it isn't, which isn't feasible for a production application. Nevertheless, by extending our smart fallbacks, these models are now usable without any problems: we simply fall back to the next best model when blocking issues occur 🤓. Start using the Gemini models with fallbacks now at https://airouter.io.
BTW: Content moderation and injection attack detection will also be part of the airouter soon! Comment if you are interested.
#LLM #google #gemini #airouter

Is GPT-4 too expensive for your needs? 🤔💸
GPT-4 models are undeniably powerful and deliver exceptional results. However, their cost can be a significant factor. Optimize your AI usage:
☝️ Use GPT-4 when necessary: leverage GPT-4's advanced capabilities for tasks that truly require high quality and complex capabilities, such as reasoning.
🤗 Consider alternatives for simpler tasks: for routine tasks, more cost-effective models can be just as effective. Look, for example, at #llama3, #qwen2 and #command-r+ as easy alternatives, as they are still at the upper quality level.
Balancing performance and cost is key. Choose the right model for each use case to maximize efficiency and manage expenses. Or have a look at https://airouter.io to optimize this for you automatically.
#OpenAI #CostOptimization #AIAlternatives #AI

Evaluating LLMs and techniques for your application - the full process 🔍👇
Evaluating an LLM or a specific RAG technique for your specific needs can be an insightful process. Here's a step-by-step guide:
1️⃣ Define your metrics and their relevance: identify which metrics matter most to your use case - quality, cost, latency, etc. (see https://lnkd.in/eCZM2MeH)
2️⃣ Create a reference dataset: compile a dataset that reflects your typical use cases and challenges, with inputs and expected outputs. Depending on what you want to evaluate, the inputs should be either the inputs given to the whole application or the inputs sent to the LLM (including RAG documents). Add multiple reference outputs if useful.
3️⃣ Create a metrics runner: use a tool like benchllm to measure the metrics for your application. Use GPT-4 Turbo to rate the answers based on your reference data. You can also test GPT-4o for the evaluation, but for me it performed worse on evaluation tasks.
4️⃣ Manually review results and improve the reference dataset: go through the results by hand to refine and optimize your reference answers and to identify false positives and false negatives.
5️⃣ Measure a baseline: establish a metrics baseline with your current setup.
6️⃣ Switch and measure the delta: try a different technique or LLM, or use https://airouter.io, and measure the effect on your metrics.
By following these steps, you can ensure that you are really improving your application and not just your gut feeling 🤓.
#AI #MachineLearning #LLM #Evaluation #Optimization #airouter
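A hedged sketch of steps 2, 3 and 5 as a hand-rolled runner (benchllm, which the post suggests, wraps this up more cleanly); the judge prompt and the pass/fail scheme are illustrative:

```python
# Minimal metrics runner with an LLM judge over a reference dataset.
from openai import OpenAI

client = OpenAI()

REFERENCE_SET = [  # step 2: typical inputs with expected outputs
    {"input": "What is RAG?",
     "expected": "Retrieval-augmented generation: retrieving context ..."},
]

def judge(question: str, expected: str, actual: str) -> bool:
    # Step 3: an LLM rates the answer against the reference.
    verdict = client.chat.completions.create(
        model="gpt-4-turbo",  # the post found it a better judge than gpt-4o
        messages=[{"role": "user", "content": (
            "Does the answer convey the same facts as the reference? "
            "Reply PASS or FAIL.\n\n"
            f"Question: {question}\nReference: {expected}\nAnswer: {actual}"
        )}],
    ).choices[0].message.content
    return "PASS" in verdict.upper()

def run_baseline(generate) -> float:
    # Step 5: `generate` is your application; the score is your baseline.
    results = [judge(case["input"], case["expected"], generate(case["input"]))
               for case in REFERENCE_SET]
    return sum(results) / len(results)
```

Step 4 (the manual review) is exactly what keeps a runner like this honest: the judge will mislabel some answers until the reference set is cleaned up.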
What is the best Large Language Model for your use case? 🤔
To determine the "best" LLM for your needs, it's crucial to prioritize and rate the following metrics: quality, cost, and latency. Establish an absolute order and, if necessary, define maximum budgets for cost and latency per request. For example:
1️⃣ Quality
2️⃣ Cost: max. $0.01 per request
3️⃣ Latency: max. 10s per request
If you're already using an LLM, set relative goals based on your current baseline to optimize your use case. For each metric, decide whether you want to maximize it, improve it, retain it, or stay flexible about it. For example:
Quality: retain 🔒
Cost: improve 📈
Latency: flexible 🤷🏽‍♀️
❗ If you want to maximize one metric, the other metrics need to be flexible.
❗ You can improve two metrics, but the third must be flexible.
While it's possible to improve all three metrics, it's essential to prioritize in order to manage trade-offs effectively. With these priorities set, you can start comparing LLMs or use https://airouter.io to find the best LLM for your use case.
#AI #MachineLearning #LLM #Optimization #Metrics
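Writing the prioritization down as data makes it checkable in code. A tiny sketch; the field names are illustrative and do not reflect airouter.io's actual API:

```python
# The post's example priorities and budgets, captured as a config object.
from dataclasses import dataclass

@dataclass
class MetricGoals:
    priority: tuple[str, ...]            # absolute order, highest first
    max_cost_per_request: float | None   # hard budget, or None if flexible
    max_latency_seconds: float | None

goals = MetricGoals(
    priority=("quality", "cost", "latency"),
    max_cost_per_request=0.01,   # from the example: max. $0.01 per request
    max_latency_seconds=10.0,    # from the example: max. 10s per request
)
```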
Qwen2-72B is joining airouter.io 🚀🤗
We're excited to announce that the open-source Qwen2-72B LLM is now part of our model setup. This powerful language model shows great potential for cost reduction, especially for use cases with low to medium reasoning needs. You can test it on your use case now - ask me for early access.
#AI #MachineLearning #LLM #OpenSource #Qwen2 #airouter

58%. That is the average percentage our beta customers saved in their first week of using airouter.io 🤯🚀
Last week, we had a silent launch of our new product and began onboarding the first customers from our waiting list. It's exciting to see initial traffic in production and the first metrics proving the effectiveness of our algorithm 🤗. The AI Router optimizes your LLM requests by directing them to the most suitable model. It ensures that expensive and slow models are used only when necessary, while cheaper and faster models are used otherwise. The LLMs can be #gpt4, #llama3, or other models. If a model is unavailable, which often happens with #openai, the request is seamlessly routed to the next best model. If you are currently using the OpenAI API or any other LLM API, join our waiting list to optimize your costs and latency: https://airouter.io.
#llm #ai #beta

GPT-4o is making waves, but how does it perform beyond benchmark data? 🤔👇
I compared its practicality on real-world data against a bunch of LLMs using airouter.io.
🌟 Performance:
- 4% overall quality improvement.
- Performs worse on larger contexts.
- Overall, it is a better model, but only slightly.
⏱️ Latency:
- 80 tokens/s throughput - pretty impressive! ⚡
- While there are faster models like #claude3 Haiku and #mixtral 8x7B, GPT-4o's blend of quality and latency is hard to beat. Only #llama3 70B and Gemini 1.5 Pro come close.
💡 Cost and efficiency:
- The lower pricing is attractive, though it remains one of the pricier options.
- Consider using it in production only when necessary.
- Using a mix of models with airouter.io can still save ~40% of costs on real-life datasets 💰
🛠️ Strategic considerations:
- #gpt4o has made some models obsolete - e.g., Claude 3 Sonnet is rarely useful anymore. Stop using them now.
- Balancing cost and latency is now more difficult, as cheaper models often run slower. Clearly define your priorities and budgets. At least for now - this is sure to change soon as more high-speed open-source models become available (yes, I am looking at you, Groq).
- If you use airouter.io, use the weighting feature to define the priority of your metrics: https://lnkd.in/e_Vuqerg
🚀 Excited for the next model releases already! Do you already use gpt-4o?

What I have been working on over the last months: routing to the best Large Language Model ✨
There are tons of exciting #genai models for text generation out there, proprietary and open source, but most people still use #gpt4 for their application. What if your requests were automatically routed to the best model, and, if there are other models of similar quality that are cheaper and faster, those were used instead? Use gpt-4-turbo if necessary, use mixtral-8x7b on Groq if sufficient 🤯 As Philipp Klöckner just mentioned in his Doppelgänger Tech Talk Podcast: a router choosing a cheaper model for a simple task is really exciting (https://lnkd.in/eFARrsfy, ~min 27). The algorithm is already really promising; if you are interested in early access, sign up here 🤗: https://lnkd.in/etNmqRVd

That was fun! 🥳 We increased the quality of Statista's GenAI application by 140% 🤯 #successStory
When Statista faced the task of optimizing their LLM-based data application, they turned to us at Heureka Labs. Our objective was to enhance quality, performance, and cost efficiency 🔍.
🤝 Our approach: our team of 5 Heureka Labs experts collaborated with Statista's internal team in a month-long, deep-dive project. We brought in metric-driven experiments, focusing on every facet of the application, from different vector stores and indices to advanced reranking and retrieval techniques. Our goal? Excellence in RAG retrieval and answering.
✨ Measured success:
🔥 Over 100 experiments conducted.
📉 Latency reduced by 10%, with peaks at 65%.
💸 Operational costs cut by 65%.
📈 Quality improved significantly, by 140%.
🌐 Why Heureka Labs? We are not just problem solvers; we are innovators in AI and data-driven software projects. If you're seeking a partner to elevate your technological endeavors with precision and expertise, Heureka Labs is your answer.
🚀 Join Statista's pioneering team: are you inspired by Statista's forward-thinking projects? They are expanding their team! This is your opportunity to engage in cutting-edge AI and data technology. Statista is now hiring developers 👇
https://lnkd.in/eR6qn5jT
https://lnkd.in/e5p_bcX3
Thanks a lot for your trust Daniel Kuske Thorben Holkenbrink Ingo Schellhammer Marian Langbehn and Benedikt Stemmildt 👨🏼‍💻 and TalentFormation for making this project possible! And, of course, a big shout-out to the team: Kai Wendlandt Felix Ude Melanie Moehlmann Christopher Brozek

Statista & Heureka Labs - Advancing Generative AI ✨
Aiming for innovation: Statista recently started a project to enhance our advanced LLM-based data application. Since we strive to continuously improve and learn, we teamed up with Heureka Labs, leveraging their specialized AI and software development expertise alongside our own in-house knowledge.
🤝 Fostering collaboration: over the past month, our team at Statista has worked closely with Heureka Labs' experts. Together, we have aimed to improve the quality, efficiency, and cost-effectiveness of our data-driven AI application.
🚀 Achieving milestones:
- Conducted over 100 experiments.
- Reduced latency by 10%, with peaks at 65%.
- Decreased operational costs by 65%.
- Significantly improved quality, by 140%.
This project offers great insight into Statista's commitment to keep improving and innovating to ensure continued excellence in the field of data and AI. We hope this gives you a good glimpse into what we're currently up to. Special thanks to Daniel Kuske, Thorben Holkenbrink, Jan Philipp Eggemann and Claudia Batista, who made this project happen at Statista.
Are you interested in contributing to our efforts? Statista is growing, and we're searching for passionate people to help shape the future of data and AI solutions. Join a team that's committed to making a difference, one step at a time.
https://lnkd.in/exvge6Ji
https://lnkd.in/e5dUK5aE
#Statista #SuccessStory #AI #DataScience #WorkInTech #CareerOpportunities
Have you ever wondered what my company Heureka Labs UG actually does? 🤔 Here's a snapshot! 📸
TL;DR: Crafting early-stage AI software and running hands-on, AI-focused leadership workshops.
We are a small technology studio, a community of highly skilled freelancers with extensive expertise in ML engineering (#AI), #datascience, backend and frontend development, mobile apps, DevOps and product ownership. Our mission? To navigate and accelerate your AI application journey, from concept feasibility to delivering tangible value to your customers. We've partnered with many awesome clients, such as Statista and DB Schenker, tackling a wide array of projects, including:
🔍🧠 Optimizing costs, latency and quality of a #generativeAI application
🚚🌱 Building a platform for CO2 reduction in logistics
👥🤝 Developing innovative algorithms for AI-based candidate matching for an HR platform
🔧🌐 Implementing #FederatedLearning for #PredictiveMaintenance
♻️🥡 Developing a sustainable food packaging platform
And for the times when the sky is the limit? We send satellites into the #stratosphere or build #robots full of AI 🛰️🤖.
Got an idea for how AI could leverage your business? We're all ears. Let's create something remarkable together! 🌟 Contact me or book a slot in my AI Office Hour: https://lnkd.in/eMr-TTCe

GenerativeAI with your own data is broken 😱 Learn how to fix it 👇 #successstories
Incorporating your own data into generative AI, specifically through retrieval-augmented generation (#RAG), can be transformative. Typically, this involves ingesting data and its embeddings into a #vectorstore. When answering a query, relevant context is retrieved via similarity search and combined with the user's question. However, this relies on the assumption that a user's question closely resembles the documents containing the answer 🤔🤨. While this can sometimes be true, it often isn't the whole story.
One method to improve this is a technique called Rewrite-Retrieve-Read, where the user's question is reformulated using a large language model (#LLM) to better suit retrieval. Our experience shows that this method's impact can vary. More promising is Multi-Query Retrieval, or RAG-Fusion, which generates and then combines multiple question rewrites. In fact, this approach has always noticeably improved quality, albeit at the expense of latency.
Another intriguing solution is Hypothetical Document Embeddings (#HyDE). Here, a (smaller) LLM generates a hypothetical (fake) answer, which is then used for the similarity search. This answer, even if wrong, should be more similar to documents containing the correct answer than the question is. In our customer projects, this improved the overall quality, but also reduced the quality of some reference questions. And of course, Multi-Query Retrieval can be combined with HyDE; if you don't care too much about latency, this really is a promising approach.
Keen to dive deeper, or to create or optimize your RAG application? Contact me or book a slot in my AI Office Hour: https://lnkd.in/eMr-TTCe. Our #generativeai-experienced Heureka Labs team is ready to assist 🚀
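For the HyDE technique described above, a minimal sketch: a smaller model drafts a hypothetical answer, and that draft, not the question, is embedded for the similarity search. The vector-store call is a placeholder for whatever store you use, and the model names are illustrative:

```python
# HyDE in a few lines: embed a hypothetical answer instead of the question.
from openai import OpenAI

client = OpenAI()

def hyde_retrieve(question: str, vector_store, k: int = 4):
    # A smaller model drafts a plausible (possibly wrong) answer.
    fake_answer = client.chat.completions.create(
        model="gpt-4o-mini",  # assumption: a small model is enough here
        messages=[{"role": "user", "content":
                   f"Write a short, plausible answer to: {question}"}],
    ).choices[0].message.content

    # Embed the draft; even a wrong draft tends to sit closer to
    # answer-bearing documents than the bare question does.
    embedding = client.embeddings.create(
        model="text-embedding-3-small",
        input=fake_answer,
    ).data[0].embedding

    # Placeholder: whatever similarity search your vector store offers.
    return vector_store.similarity_search_by_vector(embedding, k=k)
```

Combining this with Multi-Query Retrieval, as the post suggests, would mean generating several drafts and merging their retrieval results before answering.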
2023 has been a remarkable year for me and my robots, filled with growth, challenges, and achievements 🤖🚀🤗 And I love sharing our unique journey with you. For insights into our adventures and learnings, check out the recording of our recent talk at code.talks.

"Sonny, can you dance?" "Of course!" 🤖🕺
My robot now explores the room autonomously to solve tasks 🤯🤓 #llm #agents
🤖 Sonny now has a deliberative system based on large-language-model agents that allows him to solve complex tasks in the physical world. He uses #gpt4 to identify the next relevant action to take towards the solution, and several adapters on the robot to gather information about the room and to move the robot. I also replaced the #objectdetection with a #multimodal LLM that can directly answer questions about what is in the current frame, rather than just returning the objects. This allows Sonny to recognize whole concepts, e.g. "a person is sitting on a chair" instead of just [person, chair]. 🔍
At the moment he still lacks some sense of space, but it is still a very interesting and exciting PoC. The agent is implemented using #langchain, with #gpt4 for reasoning, #gpt3turbo and #chainlit. The multimodal model used is IDEFICS by Hugging Face (https://lnkd.in/eRYK9jRC). The robot is a self-built SpotMicro based on the fabulous #Spot from Boston Dynamics. 🤖🐕
If you are interested in learning more about my journey from the digital to the physical world and how I built this robot, join me at #codetalks this Friday at 4pm in Cinema 3: https://lnkd.in/exZjiP_c
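A sketch of the detector-to-multimodal-LLM swap using the common OpenAI-style vision message format; the post actually used IDEFICS via Hugging Face, so the model here is an illustrative stand-in:

```python
# Ask a multimodal model about the current camera frame instead of
# running a separate object detector.
import base64
from openai import OpenAI

client = OpenAI()

def describe_frame(jpeg_bytes: bytes, question: str) -> str:
    image_b64 = base64.b64encode(jpeg_bytes).decode()
    response = client.chat.completions.create(
        model="gpt-4o",  # stand-in for the multimodal model of your choice
        messages=[{"role": "user", "content": [
            {"type": "text", "text": question},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        ]}],
    )
    return response.choices[0].message.content

# e.g. describe_frame(frame, "Is a person sitting on a chair?")
# returns a whole concept, not just [person, chair].
```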

Additional Details

Creator
Matthias Lau
Category
Personality Emulation
Capabilities
persona
