
June 25, 2024 The Hibernia San Francisco, CA

Jam-Packed Agenda

We are constantly adding to and updating this page. Check back often.

Morning Session

  1. 10:00 AM - 10:30 AM

    Fireside chat featuring Mo Elshenawy, President and CTO of Cruise Automation, and Mohamed Elgendy, CEO and Co-Founder of Kolena. In this discussion, Elshenawy will delve into the comprehensive AI philosophy that drives Cruise. He will share unique insights into how Cruise is developing its quality standards from the ground up, with a particular focus on defining and achieving “perfect” driving, and offer valuable perspectives on the rigorous processes involved in teaching autonomous vehicles to navigate with precision and safety.

    • Mo Elshenawy Headshot

      Mo Elshenawy

      President & CTO, Cruise

  2. 10:30 AM - 11:00 AM

    AI and Government Regulation

    Gerrit De Vynck from The Washington Post will moderate a panel delving into NIST, government-implemented standards, and their roles in the development of AI.

    • Gerrit De Vynck Headshot

      Gerrit De Vynck

      Tech Reporter, The Washington Post

  3. 11:00 AM - 11:30 AM

    AI Quality Standards

    Kolena CPO and Co-Founder, Gordon Hart, will discuss AI Quality Standards: how to establish gold standards and how to create a framework to rigorously evaluate AI systems for performance.

    • Gordon Hart Headshot

      Gordon Hart

      CPO & Co-Founder, Kolena

  4. 12:00 PM - 12:30 PM

    The dollars and cents behind the AI VC boom

    Natasha Mascarenhas will moderate a panel of leading VCs who have backed the top AI companies. They’ll dig into the correction within the boom, the flight to quality, what happens when OpenAI eats your lunch, how founders should think about giving big tech a spot on their cap tables, and, generally, how to invest at the speed of innovation right now.

    • Natasha Mascarenhas Headshot

      Natasha Mascarenhas

      Reporter, The Information

  5. TBA

    More Talks Coming Soon!

Autonomous Systems & Robotics

  1. 1:30 PM - 2:00 PM

    Eighty-Thousand Pound Robots: AI Development & Deployment at Kodiak Speed

    Kodiak is on a mission to automate the driving of every commercial vehicle in the world. Today, Kodiak operates a nationwide autonomous trucking network 24x7x365: on the highway, in the dirt, and everywhere in between. We also release and deploy software about 30 times per day across this fleet, software that is not just mission-critical but safety-critical. Our AI development process must match this criticality and speed, providing fast engineering iteration while guaranteeing the high level of quality that safety requires. In this talk, we’ll share the details of that process, from how the system is architected, trained, and evaluated, to the validation CI/CD pipeline, which is the lifeblood of the development flywheel. We’ll talk about how we collect cases, how we iterate on models, and how we handle quality assurance, data, and release management - all in a way that seamlessly keeps our robots truckin’ across the US.

    • Collin Otis Headshot

      Collin Otis

      Director of Autonomy, Kodiak Robotics

  2. 2:15 PM - 2:45 PM

    Growing Reliable ML/AI Systems Through Freedom and Responsibility

    ML/AI systems are some of the most complex machines ever built by humankind. These systems are never built as perfectly planned cathedrals; instead, they evolve incrementally over time, carefully balancing the freedom to experiment with the responsibilities that come with real-world production systems. As developers of the open-source Metaflow, we have been walking this tightrope for years. In this talk, we will share our recent observations from the field and provide ideas for how reliable ML/AI systems can be built over the coming years.

    • Savin Goyal Headshot

      Savin Goyal

      CTO & Co-Founder, Outerbounds

  3. 2:45 PM - 3:15 PM

    Generating The Invisible: Capturing and Generating Edge-cases in Autonomous Driving

    Evaluating autonomous vehicle (AV) stacks in simulation typically involves replaying driving logs from real-world recorded traffic. However, agents replayed from offline data do not react to the actions of the AV: they see only the data they recorded, and their behavior cannot be easily controlled to simulate counterfactual scenarios. Existing approaches attempt to address these shortcomings with heuristics or learned generative models of real-world data, but these methods either lack realism or require costly iterative sampling procedures to control the generated behaviors. In this talk, I will break down how learning scene graphs enables photo-realistic scene reconstructions, and how we can leverage return-conditioned offline reinforcement learning within a physics-enhanced simulator to efficiently generate reactive and controllable traffic agents. Specifically, we process real-world driving data through the scene graph representation and the simulator to generate a diverse offline reinforcement learning dataset, annotated with various reward terms. With this dataset, we train a return-conditioned multi-agent behavior model that allows fine-grained manipulation of agent behaviors by modifying the desired returns. This approach also enables generating synthetic training data from these trajectories.

    • Felix Heide Headshot

      Felix Heide

      Head of Artificial Intelligence, Torc Robotics

  4. 3:45 PM - 4:15 PM

    How has the ML engineering stack changed with foundation models? While the generative AI landscape is still rapidly evolving, some patterns have emerged. This talk discusses these patterns. Spoilers: the principles of deploying ML models into production remain the same, but we’re seeing many new challenges and new approaches. This talk is the result of Chip Huyen's survey of 900+ open source AI repos and discussions with many ML platform teams, both big and small.

    • Chip Huyen Headshot-1

      Chip Huyen

      VP of AI & OSS, Voltron Data

  5. 5:00 PM - 5:30 PM

    Balancing speed and safety

    Moving to production quickly is paramount to staying out of perpetual POC territory. AI is moving fast, and shipping features quickly to stay ahead of the competition is commonplace.

    Quick iterations are viewed as a strength in the startup ecosystem, especially when taking on a deeply entrenched competitor. Each week, a new method to improve your AI system becomes popular or a new SOTA foundation model is released.

    How do we balance the need for speed against the responsibility of safety? How do we gain the confidence to ship a cutting-edge model or AI architecture knowing it will perform as tasked?

    What are the risks and safety metrics that others are using when they deploy their AI systems? How can you correctly identify when risks are too large?

    • Erica Greene Headshot

      Erica Greene

      Director of Engineering, Machine Learning, Yahoo

    • Shreya Rajpal Headshot

      Shreya Rajpal

      CEO, Guardrails

    • Claire Vo Headshot

      Claire Vo

      Chief Product Officer, LaunchDarkly

  6. TBA

    More talks coming soon!

Foundational Models, LLMs, and GenAI

  1. 12:30 PM - 1:00 PM

    Retrieval-Augmented Generation (RAG) is a powerful technique for reducing hallucinations from Large Language Models (LLMs) in GenAI applications. However, large context windows (e.g., 1M tokens for Gemini 1.5 Pro) are a potential alternative to the RAG approach. This talk contrasts both approaches and highlights when a large context window is a better option than RAG, and vice versa.

    • Amr Awadallah Headshot

      Amr Awadallah

      CEO & Co-Founder, Vectara

  2. 1:00 PM - 1:30 PM

    Enterprise AI leaders continue to explore productivity solutions that solve business problems, mitigate risks, and increase efficiency. Building reliable and secure AI/ML systems requires following industry standards, an operating framework, and best practices that can accelerate and streamline scalable architectures capable of producing the expected business outcomes.

    This session, featuring veteran practitioners, focuses on building scalable, reliable, and quality AI and ML systems for the enterprise.

    • Hira Dangol Headshot

      Hira Dangol

      VP AI/ML & Automation, Bank of America

    • Rama Akkiraju

      Rama Akkiraju

      VP Enterprise AI/ML, NVIDIA

    • Nitin Aggarwal Headshot

      Nitin Aggarwal

      Head of AI Services, Google

    • Steven Eliuk Headshot

      Steven Eliuk

      VP AI & Governance, IBM

  3. 1:30 PM - 2:00 PM

    If you like sentences so much, name every single sentence

    What do AI models see when they read and generate text and images? What are the units of meaning they use to understand the world? I’ll share some encouraging updates from my continuing exploration of how models process their inputs and generate data, enabled by recent breakthroughs in interpretability research. I’ll also discuss and share some demos of how this work opens up possibilities for radically different, more natural interfaces for working with generative AI models.

    • Linus Lee Headshot

      Linus Lee

      Research Engineer, Notion

  4. 2:15 PM - 2:45 PM

    Integrating LLMs into products

    Learn about best practices when integrating Large Language Models (LLMs) into product development. We will discuss the strengths of modern LLMs like Claude and how they can be leveraged to enable and enhance various applications. The presentation will cover simple prompting strategies and design patterns that facilitate the effective incorporation of LLMs into products.

    • Emmanuel Ameisen Headshot

      Emmanuel Ameisen

      Research Engineer, Anthropic

  5. 2:45 PM - 3:15 PM

    Open Models and Their Curation on Kubernetes

    Open Generative AI (GenAI) models are transforming the AI landscape. But which one is right for your project? What quality metrics should you use to evaluate your own trained model? For application developers and AI practitioners enhancing their applications with GenAI, it’s critical to choose and evaluate a model that meets both quality and performance requirements.

    This talk will examine customer scenarios and discuss the model selection process. We will explore the current landscape of open models and collection mechanisms to measure model quality. We will share insights from Google’s experience. Join us to learn about model metrics and how to measure them.

    • Cindy Xing Headshot

      Cindy Xing

      Software Engineering Manager, Google

  6. 3:15 PM - 3:45 PM

    Weights & Biases CEO and Co-Founder Lukas Biewald will share his perspective on the Generative AI industry: where we've come from, where we are today, and where we're headed.

    • Lukas Biewald Headshot

      Lukas Biewald

      Co-founder & CEO, Weights and Biases

  7. 3:45 PM - 4:15 PM

    Do Re Mi for Training Metrics: Start at the Beginning

    Model quality/performance is the only true end-to-end metric of training performance and correctness. But it is usually far too slow to be useful from a production point of view: it tells us what happened in training hours or days ago, when we have already suffered some kind of problem. To improve the reliability of training, we will want short-term proxies for the kinds of problems we experience in large model training.

    This talk will identify some of the common failures that happen during model training and some faster/cheaper metrics that can serve as reasonable proxies of those failures.


    • Todd Underwood Headshot

      Todd Underwood

      Research Platform Reliability Lead, OpenAI

  8. 5:00 PM - 5:30 PM

    Today, Machine Learning (ML) plays a key role in Uber’s business, being used to make business-critical decisions like ETA prediction, rider-driver matching, Eats homefeed ranking, and fraud detection. As Uber’s centralized ML platform, Michelangelo has been instrumental in driving Uber’s ML evolution since it was first introduced in 2016. It offers a set of comprehensive features that cover the end-to-end ML lifecycle, empowering Uber’s ML practitioners to develop and productize high-quality ML applications at scale.

    • Kai Wang Headshot

      Kai Wang

      Lead PM, AI Platform, Uber

    • Raajay Viswanathan Headshot

      Raajay Viswanathan

      Software Engineer, Uber

  9. 4:30 PM - 5:00 PM

    Evaluation seeks to assess the quality, reliability, latency, cost, and generalizability of ML systems, given assumptions about operating conditions in the real world. That is easier said than done! This talk presents some of the common pitfalls that ML practitioners ought to avoid and makes the case for tying model evaluation to business objectives.

    • Mohamed El-Geish Headshot

      Mohamed El-Geish

      CTO & Co-Founder, Monta AI

Lightning Talks

  1. 12:00 PM

    Redefining Code Quality in an Increasingly AI-first World

    How do you enforce code quality in large codebases?

    Static analysis tools like eslint are an invaluable resource for doing this at the AST level, but this is really just table stakes.

    What about at the architecture level? What about higher-level best practices that have an outsized impact on your program’s correctness, security, performance, and perhaps most importantly, your team’s ability to ship fast without breaking things?

    These are all areas where we rely on senior engineers to manually enforce best practices during code reviews, which is an inefficient & error-prone use of their time.

    So the question becomes: can we use AI to better enforce code quality, and if so, what would an ideal solution look like?

    This talk introduces GPTLint, a fundamentally new, open source approach to code quality, and will walk you through everything you need to know to give your senior engineers a much needed break.

    • Travis Fischer Headshot

      Travis Fischer

      Founder, Agentic

  2. 12:15 PM

    Mitigating Hallucinations and Inappropriate Responses in RAG Applications

    In this talk, we’ll introduce the concept of guardrails and discuss how to mitigate hallucinations and inappropriate responses in customer-facing RAG applications, before they are displayed to your users.

    While prompt engineering is great, as you add more guidelines to your prompt (“do not mention competitors”, “do not give financial advice”, etc.), your prompt gets longer and more complex, and the LLM’s ability to follow all instructions accurately rapidly degrades. If you care about reliability, prompt engineering is not enough.

    • Alon Gubkin Headshot

      Alon Gubkin

      CTO & Co-Founder, Aporia

  3. 12:30 PM

    Large Language Models (LLMs) are revolutionizing how users can search for, interact with, and generate new content, leading to a huge wave of developer-led, context-augmented LLM applications. Some recent stacks and toolkits around Retrieval-Augmented Generation (RAG) have emerged, enabling developers to build applications such as chatbots using LLMs on their private data.

    However, while setting up basic RAG-powered QA is straightforward, solving complex question-answering over large quantities of complex data requires new data, retrieval, and LLM architectures. This talk provides an overview of these agentic systems, the opportunities they unlock, how to build them, as well as remaining challenges.

    • Jerry Liu Headshot

      Jerry Liu

      CEO, LlamaIndex

  4. 1:00 PM

    The Power of Small Language Models: Compact Designs for Big Impact

    Small language models (SLMs) are taking the industry by storm. But why would you use them? In this talk, we will uncover the secrets behind crafting compact language models that pack a powerful punch. Despite their reduced size, these models are capable of achieving remarkable performance across a wide range of tasks, especially with RAG (Retrieval-Augmented Generation). We will delve into the innovative techniques and architectures that enable us to compress and optimize language models without sacrificing their effectiveness.

    • Joshua Alphonse

      Joshua Alphonse

      Director of Developer Relations, PremAI 

  5. 1:15 PM

    Building Safer AI: Balancing Data Privacy with Innovation

    The balance between AI innovation and data security and privacy is a major challenge for ML practitioners today. In this talk, I’ll discuss the policy and ethical considerations that matter for those of us building ML and AI solutions, particularly around data security, and describe ways to make sure your work doesn’t create unnecessary risks for your organization. With planning and thoughtful development strategies, it is possible to create incredible advances in AI without risking breaches of sensitive data or damaging customer confidence.

    • Stephanie Kirmer Headshot

      Stephanie Kirmer

      Senior Machine Learning Engineer, DataGrail

  6. 1:30 PM

    How to take control of your RAG results

    What is the state of the art on RAG quality evaluation? How much attention should you pay to embedding model benchmarks? How to establish and evaluate objectives for your information retrieval system before and after you launch?

    The journey to quality AI starts with measurement. The second, third, and 100th steps are then iteration and improvement against that measurement. What can we learn from the search & relevance industry that has been around for decades? What challenges are specific to embedding-powered retrieval? And how do you actually improve your vector search?

    Let's talk about it!

    • Daniel Svonava Headshot

      Daniel Svonava

      CEO, Superlinked

  7. 2:30 PM

    GraphRAG: Enriching RAG conversations with knowledge graphs

    By integrating LLM-based entity extraction into unstructured data workflows, knowledge graphs can be created which illustrate the named entities and relationships in your content.

    Kirk will discuss how knowledge graphs can be leveraged in RAG pipelines to provide greater context for the LLM response, as well as utilizing entity extraction for better content filtering.

    Using GraphRAG, developers can build enriched user experiences - pulling data from a wide variety of sources, not just what is accessible with standard RAG vector search retrieval.

    • Kirk Marple Headshot

      Kirk Marple

      Founder, CEO, Graphlit

  8. 3:00 PM

    Evaluating LLM Tool use: A Survey

    LLMs are becoming more integrated with the traditional software stack through the use of structured outputs. Combining structured outputs with planning & reasoning gives rise to tool use/function calling.

    The LLM is in charge of deciding when to use which tool, generating a correct call signature and parsing the unstructured/structured output generated by tool invocation.

    How do we know if an LLM is doing a good job? How can we compare various LLMs for our specific tool-use cases on metrics like speed, cost & accuracy?

    In this talk, I’ll go into evaluation strategies that are being used in practice and how we think about evaluating tool use at Groq.

    • Rick Lamers Headshot

      Rick Lamers

      AI Engineer & Researcher, Groq Inc.

  9. 3:15 PM

    Leveraging Function Calls and AI to Dynamically Build Test Pipelines with Dagger

    Docker founder Solomon Hykes’ talk brings together DevOps and AI, showcasing the new Dagger open source project, which replaces CI YAML with clean code.

    Technical and code-first, this talk will showcase how the combination of new primitives and AI capabilities can dramatically change the way we write and test software. After briefly explaining the Dagger project, the talk will transition to a demo offering a glimpse of the future: combining CI as Code with AI to enable dynamic pipeline generation. He will show how an AI model can automatically assemble the perfect pipeline from Dagger functions based on a set of declarative instructions, leveraging GPTScript.

    • Solomon Hykes Headshot

      Solomon Hykes

      Cofounder & CEO, Dagger.io

  10. 3:30 PM

    AIOps, MLOps, DevOps, Ops: Enduring Principles and Practices

    It may be hard to believe, but AI apps powered by big Transformers are not actually the first complex system that engineers have dared to try to tame. In this talk, I will review one thread in the history of these attempts, the "ops" movements, beginning in mid-20th-century Japanese factories and passing, through Lean startups and the leftward shift of deployment, to the recent past of MLOps and the present future of LLMOps/AIOps. I will map these principles, from genchi genbutsu and poka yoke to observability and monitoring, onto emerging practices in the operationalization of and quality management for AI systems.

    • Charles Frye Headshot

      Charles Frye

      AI Engineer, Modal Labs

  11. 4:00 PM

    Beyond Benchmarks: Measuring Success for Your AI Initiatives

    Join us as we move beyond benchmarks and explore a more nuanced take on model evaluation and its role in the process of specializing models. We'll discuss how to ensure that your AI model development aligns with your business objectives and results, while also avoiding common pitfalls that arise when training and deploying. We'll share tips on how to design tests and define quality metrics, and provide insights into the various tools available for evaluating your model at different stages in the development process.

    • Salma Mayorquin Headshot

      Salma Mayorquin

      CEO, Remyx AI

  12. 4:30 PM

    Building Robust and Trustworthy Gen AI Products: A Playbook

    A practitioner’s take on how you can consistently build robust, performant, and trustworthy GenAI products at scale. The talk will touch on different parts of the GenAI product development cycle, covering the must-haves, the gotchas, and insights from existing products in the market.

    • Faizaan Charania Headshot

      Faizaan Charania

      Senior Product Manager, ML, LinkedIn

  13. 5:00 PM

    Less is not more: How to serve more models efficiently

    While building content generation platforms for filmmakers and marketers, we learnt that professional creatives need personalized, on-brand AI tools. However, running generative AI models at scale is incredibly expensive, and most models suffer from throughput and latency constraints that have negative downstream effects on product experiences. We are now building infrastructure to help organizations developing generative AI assets train and serve models more cheaply and efficiently than was possible before, starting with visual systems.

    • Julia Turc Headshot

      Julia Turc

      Co-CEO, Storia-AI

Ready to join us?

Secure your tickets now before they sell out!