Professors Joe Hellerstein and Joseph Gonzalez on LLMs
September 12, 2024


You should’ve seen Benjamin’s face when we told him that we managed to book Joe Hellerstein and Joseph Gonzalez for the Data Engineering Show.

Joe Hellerstein is the Jim Gray Professor of Computer Science at Berkeley and Joseph Gonzalez is an Associate Professor in the Electrical Engineering and Computer Science department. They’ve inspired generations of database enthusiasts (including Benji and Eldad) and have come on the show to talk about all things LLM, and RunLLM, the company they co-founded. If you consider yourself a hard-core engineer, this episode is for you.

Listen on Spotify or Apple Podcasts

Benjamin (00:16.882) Hi everyone. And welcome back to another episode of the data engineering show. Today we have two super exciting guests, Joe Hellerstein and Joey Gonzalez. They're both professors at UC Berkeley and are now heavily involved as co-founders in a startup called RunLLM. I'm sure we'll hear a lot about that kind of going forward. Joey, Joe, do you want to quickly introduce yourself to the audience?

Joey Gonzalez (00:43.849) Sure. I guess I can start. So hi, I'm Joey Gonzalez, faculty at UC Berkeley. I do research broadly in machine learning systems, work on everything from crazy neural networks to really fast systems for serving models and even training them. I've been doing a lot of work recently in large language models, everything from Vicuna to the bigger LMSYS Chatbot Arena, and even vLLM, so the serving infrastructure. And then of course, I'm also co-founder at RunLLM, doing really cool stuff there as well.

Joe Hellerstein (01:11.459) Hey, I'm Joe Hellerstein. I'm also on the faculty here at Berkeley, where I joined a really long time ago, back in the previous millennium. My background's in database systems, so I was brought up at IBM Research, and then I was a student on the Postgres project. More recently, you know, over the decades here at Berkeley I've done research in a wide range of data-centric systems, from database engines to machine learning systems with Joey and others, to things that were more visualization related. That work led to the Data Wrangler project, which we commercialized as Trifacta. I was also involved in the Greenplum database effort. So I've been in both industry and academia for a bunch of decades now in and around data.

Benjamin (01:55.53) Awesome. I love previous millennium. Like it's not that long ago, but it sounds a really long time ago. Nice.

Eldad (02:03.208) When they used to build databases that just can't be ripped and replaced, like Postgres, impossible. It's just too good. And no matter how much you try, how hard you try, it stays. Just upgrade to a newer version. Thanks for having us. I was pitching Benjamin for a long time. Can we get some professors? Maybe we can kind of expand the data engineering show beyond the usual stuff.

Joe Hellerstein (02:09.584) That's right.

Eldad (02:30.98) I hope that we'll be, today we'll kind of try to do that. Thanks for joining us. Really excited to hear what you're all about.

Joey Gonzalez (02:39.651) Thanks for having us.

Benjamin (02:41.37) Awesome. So, and Joe, like I'm a database nerd, right? Like I've read your papers, kind of like know you from my database education. Now you're thinking a lot about like LLMs. Like tell us like how did that happen? Right? Like how, yeah, how did you get into that?

Joe Hellerstein (02:56.495) You know, um, LLMs happened to me like they happened to everybody else, right? Uh, it's just phenomenal what, um, what we've absorbed in the last year and gotten used to. In some sense, at least on my end, I feel much more, um, matter of fact about the whole situation than I was maybe 12 months ago. But really, you know, like my interest in machine learning systems, which goes back to days when Joey was a grad student and I was collaborating with his team at CMU, Carlos Guestrin's team there, is around the algorithmics and the scaling of the data, and the, you know, computation, that happens in these large scale data driven machine learning models. So Joey and I go

Joe Hellerstein (03:41.189) in machine learning. It was actually quite algorithmically interesting, arguably more so than LLMs, if you're a computer science nerd. And then, more recently, I actually sort of backed off of AI work because at one level, Joey was covering the systems end of AI at Berkeley. And at another level, the community had jumped in with four feet. And it didn't seem like I had to be working on it. Lots of smart people were working on scaling AI systems, and I figured I could work on stuff where fewer people were working. Anyway, I'm happy to leave the fray to those who wanted to fight. So Joey has kind of been more involved over the last decade in scaling those systems. And more recently, he and I have been teaming up to build product around it.

Benjamin (04:25.73) So building product around that, right? Like you also said, Joe, like last year was this explosion kind of in this space. It seems to be changing every day. Like how do you actually build a company in this space, right? Kind of when things are moving that quickly, like how, how do you make sure that things you came up with, I don't know, half a year ago in terms of vision are still relevant today.

Joey Gonzalez (04:46.262) Yeah, maybe I can take that one because I was head of product and still am at a company that's constantly changing.

Benjamin (04:47.648) Sure.

Joey Gonzalez (04:52.826) in a world that's constantly changing. So when we launched the company, it's actually remarkable. The broader vision of what we were doing at what was originally called Aqueduct was to really think about how to bring machine learning into the world of data systems, into the world where people would use it every day. And in our early interaction with customers, this was pre-2023, which is shockingly actually more or less pre-LLM, pre-mass adoption of large language models, most people were thinking actually not about how to do really cool deep learning stuff in production, but often just how to plumb machine learning into the workflows that they had. And a lot of that meant connecting machine learning to different data backends, to different data systems. A lot of that meant just basic processing of metadata, things that the data engineering community is actually quite good at. And that was sort of where machine learning was, and still is, but that's evolved very quickly with large language models. And actually, I think what's most interesting about large language models is they made it more, even more about the data. It was previously like really cool algorithms you had to deploy and they were special. There were special algorithms for each model, special tasks, special systems. Now there is sort of one set of special systems that we need. And remarkably, it's a text in, text out kind of system. So the APIs have sort of simplified.

And the real challenge is getting the right data in and getting that out, you know, the outputs to where it needs to go. So in some sense, the space of products as a whole actually hasn't changed that much. It's always been about how to connect machine learning to things. The kinds of machine learning have changed. And throughout my career, it's changed many times. As Joe pointed out, when I started my PhD, neural networks were silly. No one did that stuff. We did probabilistic models that were easy to reason about, or at least, you know, analytically reasonable.

And a lot of work went into algorithms and cool techniques and scaling it. And Joe and I worked on some of that stuff. And then it all changed. And deep learning really jumped in very quickly, or, not so quickly by today's standards. It took like four years, five years for people to really adopt deep learning across the entire field, at least in research. And then, you know, we had to transition, but again, the story's always been about connecting the data and systems to the, to the algorithms. And then of course, in one year.

Joey Gonzalez (07:11.93) it went to a whole other class of techniques.

Eldad (07:14.602) You know, in databases, we take 20 years to evolve, like, kind of figure things out here. Like it shrinks, all right? 10 years, five years, one year. It is confusing. Please help us.

Joe Hellerstein (07:14.715) So maybe I can.

Benjamin (07:30.397) Help, help Eldad, he's very...

Eldad (07:32.872) HUEH!

Joey Gonzalez (07:33.446) kind of excited to see what this means. Like, what do all the changes in the technology of machine learning mean for the data engineering and database community as a whole? And I have some thoughts, but yeah, it's, you know, an exciting area, an exciting time to be working in this space.

Benjamin (07:47.266) Yeah. I mean, looking from the outside, right. It's like many of the things you're seeing now in the like very mature data warehousing space are also kind of popping up in this LLM world. Like some tools doing ELT, kind of creating your knowledge graphs, text embeddings, kind of whatever. Then you have like your vector database, your knowledge graph to feed context into the LLM, like take us through that stack, right? That's kind of popping up around LLMs and your thoughts on that.

Joey Gonzalez (08:16.974) So I can take a quick stab at it. So at a very high level, the basic adoption, the basic pattern that people are using with large language models, LLMs, is I want to ask a question and get an answer. I want to have a support request answered. And so to get to that, I have a model that's essentially a good reader. You can give it meaningful text, it'll read it and produce a result. To do that with your company's data, with your company's product processes, with your company's tone, it takes a couple of steps. So the first step is getting that data into the request. And often when I ask a question, it's not feasible to dump all of my company's manuals and information in with the question that was asked. So we'd like to say, I'll read everything about my company, answer this one question. That doesn't work for reasons we can get into later. So the first big innovation that people are excited about today in the kind of deployment of large language models is something like RAG, where I have a retrieval process, using technology that's in some sense antiquated and also brand new, for going from a question to the right piece of documentation, the snippets of text that are most helpful. And then you ask the LLM: with these snippets of text that we think are helpful, and the question, answer the question. And so collecting that documentation into a data system that you can quickly retrieve from has been kind of the first big engineering pipeline that sort of sits around the LLMs to make them more adapted to my company's or my, you know, domain.

And so that is the vector store. It's kind of neat to see that field evolve because looking up stuff by vectors is something we've been doing for a long time; there's a whole field of information retrieval that's been doing this for many, many years. And they have very cool techniques that we haven't seemed to rediscover yet. So things like, perhaps you wanna find documentation that's frequently clicked on. I've had my grad students go, keep returning papers that are correct, but... not well cited, like, yeah, maybe we should include relevance in how we choose the things we return. So yeah, so a lot of machinery in taking text, breaking it into little chunks, and then retrieving those chunks. And we're rediscovering a lot of old techniques. And I think there's a lot of opportunity to bring in more disciplined ideas around, hey, maybe I want where clauses, more interesting ways of selecting text, not just similarity between the question that was asked and the snippet.
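To make the retrieval pipeline Joey describes concrete, here is a minimal sketch of the RAG pattern: chunk the documentation, embed the chunks, pull back the most similar ones for a question, and hand them to the model along with the question. The file name, model names, and fixed-size chunking below are illustrative assumptions, not a description of RunLLM's actual pipeline.

```python
import numpy as np
from openai import OpenAI

client = OpenAI()  # assumes an OPENAI_API_KEY is configured

def embed(texts):
    # Embed a batch of strings; the embedding model name is an assumption.
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

# 1. Chunk the documentation (naive fixed-size chunks, purely for illustration).
docs = open("company_manual.txt").read()          # hypothetical corpus
chunks = [docs[i:i + 800] for i in range(0, len(docs), 800)]
chunk_vecs = embed(chunks)

# 2. Retrieve the chunks most similar to the question (cosine similarity).
question = "How do I rotate API keys for the billing service?"
q_vec = embed([question])[0]
sims = chunk_vecs @ q_vec / (np.linalg.norm(chunk_vecs, axis=1) * np.linalg.norm(q_vec))
top_chunks = [chunks[i] for i in np.argsort(-sims)[:3]]

# 3. Ask the LLM to answer using only the retrieved snippets.
prompt = ("Answer the question using only these snippets:\n\n"
          + "\n---\n".join(top_chunks)
          + f"\n\nQuestion: {question}")
answer = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": prompt}],
)
print(answer.choices[0].message.content)
```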

Joe Hellerstein (10:33.283) So maybe I can just jump in and give a little bit of background for some folks who haven't been hanging out in this space. So RAG stands for retrieval aware generation. If I got that right, Joey. Augmented generation. Yeah. Sorry. Excuse me. And, you know, basically it's like having a search engine connected to your, to your LLM and

Joey Gonzalez (10:42.987) Retrieval augmented, but close enough. Yeah, yeah.

Joe Hellerstein (10:52.731) When we talk about vector search and that sort of thing, we're mostly talking about the kind of similarity search that you do in a search engine. So in a search engine, you type in a bunch of words and it gives you documents that sort of are like the bunch of words you typed. And so similarly here, when the LLM wants to answer questions, it might wanna answer them in the context of stuff that you've indexed in a manner that's pretty similar to information retrieval to search engines.

Joey Gonzalez (11:16.686) So I can tell a fun story about that. So let's say I'm looking for interesting products, right? And so I want products. I'm shopping for my daughter for Christmas and she wants maybe a little drum set. So I'm interested in drums, right? If I just search, I'm interested in drums. Drums is a useful keyword. If I do vector similarity, that might find useful things. I might have a price range. So it'd be nice if there were like where clauses and find at least products that closely match. And then I'd like to feed in the products into an LLM, which would try to provide a summary of the kinds of products that I want.


Joey Gonzalez (11:47.114) Things that we've discovered that make a big difference: what if you ask the LLM, before it's read anything, to make up a product description? So, I'm looking for drums, and then you ask the LLM, what would drums look like? What are the kinds of things, you know, is it metallic drums I'm looking for, you know, wooden drums, modern, contemporary? So having those descriptions and augmenting what I look up makes a big difference. And so we've since learned this kind of idea: you ask an LLM once to make up an answer, then you look up the made up answer to find relevant documents, then you ask the LLM again, now here are correct documents, now answer the question again. So you start chaining these, so you start to have these workflows, it's not just one call, but many calls that go back and forth between different data systems. And that's kind of the other thing that's starting to emerge: how to orchestrate LLMs in this more complex process.
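The trick Joey describes, asking the model to invent an answer first and retrieving against that, is roughly what the literature calls HyDE (hypothetical document embeddings). A minimal sketch of the chain, where `retrieve` and `llm` are hypothetical helpers along the lines of the snippet above:

```python
def answer_with_hyde(question, retrieve, llm):
    # Step 1: ask the LLM to make up a plausible answer, sight unseen.
    hypothetical = llm(f"Write a short, plausible answer to: {question}")
    # Step 2: retrieve real documents that look like the made-up answer;
    # this often matches better than retrieving on the bare question.
    docs = retrieve(hypothetical, k=5)
    # Step 3: ask again, this time grounded in the retrieved documents.
    context = "\n---\n".join(docs)
    return llm(f"Using only these documents:\n{context}\n\nAnswer the question: {question}")
```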

Benjamin (12:34.118) So in this process, when I then go to my vector database to figure out relevant documents or something like that, like what actually generates the query for the vector database, right? Like, okay, with like Looker or Tableau, it's like the kind of program itself, generating SQL to query the underlying data store. How does it work here?

Joe Hellerstein (12:53.927) So think about this much more like a search engine right now, at least in the current state of the art, than you think about like a relational database. So it's literally just going to send essentially your question in to find relevant documents the way a search engine would take your free text and find relevant web pages. Now, as Joey alludes to, we may want to improve this over time with things like where clause predicates that we're familiar with in SQL that filter out stuff that's irrelevant, where you can have more logic in your queries.

Where we are today is pretty much what's called vector search or nearest-neighbor search, and the simplest way to think about it is it's like web search. You're just finding relevant things in a ranked list. And when you look at stuff like pgvector, you know, where they've basically put vector search into a relational database, it's very much like the text search extensions to Postgres. In fact, it's built on the text search extensions to Postgres. So again, even though you're in the relational database, that part of the query is just a text search lookup, and it's finding stuff that is similar to what you asked about.
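As one illustration of what Joe is describing, a pgvector lookup inside Postgres can sit right next to ordinary relational predicates. The table and column names below are hypothetical; only the `<->` nearest-neighbor ordering operator comes from pgvector itself.

```python
import psycopg2  # assumes a Postgres database with the pgvector extension installed

conn = psycopg2.connect("dbname=docs")
cur = conn.cursor()

# In practice this would be the output of an embedding model,
# e.g. the embed() helper sketched earlier; a dummy value here.
question_embedding = [0.1, -0.2, 0.3]
vector_literal = "[" + ",".join(str(x) for x in question_embedding) + "]"

cur.execute(
    """
    SELECT title, body
    FROM doc_chunks                      -- hypothetical table of documentation chunks
    WHERE product = %s                   -- ordinary relational WHERE-clause filter
    ORDER BY embedding <-> %s::vector    -- pgvector nearest-neighbor ordering
    LIMIT 5
    """,
    ("billing-service", vector_literal),
)
top_snippets = cur.fetchall()
```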

Joey Gonzalez (13:57.32) So, and this is just.

Eldad (13:58.95) So it's yet another data type, Benji.

Benjamin (14:02.574) and relational systems will swallow it all. Yeah. What's your take on that? Is that a controversial thing? What, where do you see this moving?

Joe Hellerstein (14:14.531) I don't find it controversial in the sense that I don't think relational systems are relational systems really anymore. They're just like the databases that do 90% of the work and 90% of what they do is relational. But as Postgres showed in the late 80s and has been showing until now, and it's pretty much been adopted broadly in the industry.

You can extend the relational model with all these data types and plugins and indexes and various things. So people had looked into plugging in search before into Postgres and other relational databases. And it works OK. And so you can use it for this purpose. Other folks will say it's really good to use a traditional search engine for this purpose. Some people will go use Elastic for RAG. It's kind of heavyweight. It's not clear that it's the state of the art.

And those systems are architected a little differently than a relational database. They're tuned for their workload. But the techniques under the covers are all textbook stuff we all know. So I don't foresee that, for instance, vector databases for LLMs is a viable market, if that's sort of the question, because it's just one feature of data management that you can integrate into a bigger platform.

But this is not relational maximalism. It's not like everything must be a table and you must use SQL. It's just much more about like sharing a backend across all your data.

Joey Gonzalez (15:31.211) Yeah, I'll go a step further, because right now we're really excited about text, because, well, that's what LLMs read, but...

Often you have data that's not text, and it would be nice if the LLM would look at that too, like pricing information, product features, do comparisons. It can be represented as text, but it's rows in a table or a document store. I think we're gonna get to a world where text-to-SQL or query engines come in: I have a question, you generate a query that might dig up the right information and stuff that into the context as well, and start to use LLMs to do... query generation, maybe query refinements, so get some results back and then adjust it. And so programming with these, like, small calls to linguistic logic that extract pieces of code, not just documents, might actually be a big part of the workflow, which means that we're gonna start to pull back in other parts of the data engineering tool chain into the LLM answers, which should be really exciting.
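A rough sketch of the text-to-SQL loop Joey is hinting at: the LLM drafts a query over structured data, the query runs, and the rows get stuffed back into the context for a final answer. The schema, database file, prompts, and model name are illustrative assumptions; as Joey notes right after, the generated SQL is still a guess, so a real system would validate and retry.

```python
import sqlite3
from openai import OpenAI

client = OpenAI()
SCHEMA = "CREATE TABLE products (name TEXT, category TEXT, price REAL, rating REAL);"

def ask(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

def answer_over_table(question: str, db_path: str = "catalog.db") -> str:
    # 1. Have the LLM draft a SQL query for the structured part of the question.
    sql = ask(f"Schema:\n{SCHEMA}\nWrite one SQLite query that answers: {question}\n"
              "Return only the SQL, no explanation, no markdown.")
    # 2. Run it and feed the rows back into the context for the final answer.
    rows = sqlite3.connect(db_path).execute(sql).fetchall()
    return ask(f"Question: {question}\nQuery results: {rows}\nAnswer in one short paragraph.")
```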

Benjamin (16:29.99) Okay. So this world then would look more similar to actually what we also see today in the data warehousing space, where you would have then like an ecosystem company that just focuses on, how do I generate like great SQL queries based on the text input I have, to feed it into Snowflake, Redshift, or whatever type of system. Gotcha. Okay. So, no, sorry, go ahead.

Joey Gonzalez (16:46.086) Yep. This we have to... Yeah, keep going. I was just saying, one of the funny things about LLMs is you have to understand, they just predict the next word by guessing. And so the basic technology, all it's doing is saying, given all the words I have so far, what's the probable next word? Having read everything on the internet and as much code as possible. Which means that writing SQL queries, yeah, it's amazing it works, and it doesn't.
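A toy illustration of the "guess the next word" loop Joey is describing: autoregressive decoding just keeps appending a likely next token given everything so far. Here `next_token_probs` is a stand-in for the real neural network.

```python
def generate(prompt_tokens, next_token_probs, max_new=50, eos="<eos>"):
    # Autoregressive decoding: repeatedly pick a likely next token given
    # everything generated so far, append it, and repeat.
    tokens = list(prompt_tokens)
    for _ in range(max_new):
        probs = next_token_probs(tokens)   # dict mapping candidate token -> probability
        nxt = max(probs, key=probs.get)    # greedy choice; real systems usually sample
        if nxt == eos:
            break
        tokens.append(nxt)
    return tokens
```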

Eldad (17:08.914) It's amazing.

Eldad (17:13.249) People are very terrible at guessing. Get something better to guess. So I've been thinking about what you said on the stack. So obviously, there's this whole opportunity for new products, right? Isn't that a bit boring if we just try to squeeze everything into a database? Joe, you mentioned it. There's no need to squeeze everything into a database just because it's a database. So.

Joey Gonzalez (17:18.898) That's what we have is a powerful machine for guessing.

Eldad (17:39.748) we'll have this wave of new startups trying to change how data professionals you work and build solutions, right? So today you're setting the database to the enterprise. Thanks to Snowflake, we now have a market where data engineers can build stuff that a few years ago, only big engineering groups could do. So they kind of simplified the stack, right? They removed the stack. Just write SQL.

We'll do all the magic behind the scenes, including the hardware, software, whatever. Then there are the more serious projects done by longer-term engineering, machine learning companies that have real expertise. So I would guess those machine learning scientists would evolve and start building LLM stacks with the tools and vector databases and everything. You've mentioned Hadoop, would it look like


Eldad (18:36.052) the terrible Hadoop years, by the way, which we all thought, almost all of us thought, were amazing. I hope that we will not go through the same retro, right? Like I hope that we will shortcut some of the stuff. I really hope that we are not going to have yet another cycle where we basically have data scientists doing the same thing with a new name for a very similar stack, and we're really going to see some changes here. Obviously on many of those fronts, maybe that new front can change that. But really, it would be sad to see the Hadoop mindset taking over again. It would be nice to see startups or companies going to remove a lot of that complexity. If we can tell you one thing from experience, it's that people haven't learned. Like you take the simplest thing, you need to join two tables, and that's still a big problem for human beings. And

Joe Hellerstein (19:21.627) So.

Eldad (19:32.936) Yeah. So kind of my, my question to you is, how do you foresee the market? Joe made a very big statement, which I love, which is there is no future for vector databases just for the purpose of doing vector search, and I completely buy into that future. What else, like what other things will exist if not that? Maybe that takes us to what you're building.

Joe Hellerstein (19:56.903) So yes, I was exactly going to try to build that bridge in the conversation. So at some level, I think a couple of years ago, where we started the company was in an environment where there was a ton of mayhem around the stack, lots of players building little things. But unlike the Hadoop years, these were, you know, early 2020, 2021, highly capitalized startups building little things.


Joe Hellerstein (20:24.003) So what happened in the Hadoop years was everybody had their own pet open source project and Cloudera and Hortonworks rolled them up. And we just had mayhem because every open source project was its own ship. And we were trying to build a fleet out of that. But two years ago in the machine learning space, it was the same thing, except these ships weren't open source. They were startups. Some were open source, some weren't. But each one of them had its own funding and was pursuing the market and telling themselves if we build this piece of the stack, eventually we'll own the whole stack.

And that was not going so well. So we wanted to make that very simple as a SAS environment for users so they wouldn't have to assemble this crazy stack. What's changed with LLMs is a large fraction of use cases can be handled with a single sort of brand of inference.

So the machine learning engineer's job is no longer, you know, what crazy ML packages are you trying to do machine learning with? It's more around how do I enhance the value of the LLMs, whether they're the open source LLMs or the commercial LLMs. And that gave us a fair bit of clarity.

One of the things I think is kind of common agreement in Silicon Valley right now, but we lived it, is that building the sort of picks and shovels company for the gold rush in machine learning is not going to be a strategy for at least the foreseeable future. Due to market changes, you know, it's hard to compete with the big players there. And also, just the stack probably is too fragmented to go do that.

So everybody wants to understand kind of what business value they're adding. We're not building just tools. We're building a thing that solves a business problem.

Joe Hellerstein (22:00.215) And so at RunLLM, what we decided was that the clearest space to make a difference here is in the developer community, because they're going to be early adopters. And unlike, say, medical applications or self-driving cars or stuff, the barrier to a viable product is much lower. If it helps the developer be more productive, great. If it's sometimes wrong, it's not the end of the world. There's a human in the loop, and they don't do something that's hopefully life-threatening on a daily basis.

So we've been focused on that developer environment and building augmented models as well as user experiences around that so that developers can be more productive, particularly with complex APIs. So a lot of what RunLLM is now focused on is based on work in Joey's team and a project called Gorilla that was focused on using LLMs to make complex APIs more approachable. And maybe with that, I should hand it off to Joey, because it's really his baby.

Joey Gonzalez (22:53.358) Yeah, I think maybe the easiest way to tell the story here is that developers have access to lots of things that they need to read to get their job done. And if we can bring these ideas of RAG and actually fine tuning, which we didn't talk about together with proper integration, not into just your documentation, but Slack discussions, GitHub, everything, to give you, to give our AIs visibility into what's going on. So you as a developer can ask questions and maybe even be notified when things happen that might change the way you're approaching stuff.


Joey Gonzalez (23:23.394) In some sense, it's helping developers be more productive, which is something that, let's say, Copilot does. But our focus is more on the integration with all the data and the things that happen around you, your development context. Copilot helps you write code. But if you're trying to write a design document, trying to think through a bigger process, or maybe just looking up other ways you could do the thing you were trying to do already, we've been building a tool around that. And...

Our focus is kind of interesting. We started out as a picks and shovels company and realized that, you know, really focusing on a solution, a product, would help us both bring LLMs to the group that are most likely to adopt them early and build them into their workflow in the future, and also give us more visibility into how this ecosystem will settle. It is true that there's currently a wild west of small, you know, LLM innovations, from vector stores to, like, LangChain to RAG add-ons,

and how they fit together. We once thought one could describe an LLM stack, but that keeps changing so quickly that it was more clear to us to focus on just building a solution to a real problem with that technology, to understand how it fits. And maybe someday we come back to the broader LLM, RunLLM process itself. As someone who pioneered inference technology, it's kind of disappointing that we're not racing to build the world's best serving technology.

But if you look at the price war that's already ongoing, it's not a fun place to be. As Joe pointed out, these models have converged to a basic set of architectures, a basic set of APIs. You can win on price and speed. And that's currently a race to the bottom to bring the prices down. To the point right now, the prices that some people are proposing are below the power costs of the GPU running at, like, idle. So there's no way those are sustainable prices.

Eldad (25:09.99) We love those, the Nvidia sponsoring growth bottom up. Sometimes that's the only way if you're Nvidia. Super interesting.

Joey Gonzalez (25:10.013) Yeah, not a fun place to be.

Joey Gonzalez (25:16.867) Yeah.

Benjamin (25:22.129) So how exactly, like one thing I was curious about is you mentioned this can help me write a design doc, right? So I have a clear picture in my head of how to use copilot. Like I'm kind of hacking and in my IDE, it's kind of recommending code. Now I'm coming in and I say, Hey, I want to build a new, I don't know, hash aggregation into my database system.

Joey Gonzalez (25:37.489) Mm-hmm.

Benjamin (25:42.622) And I need to write a design doc because I want to figure out like, surface it with the team, kind of get thoughts. Like how does it, how does the interface here actually look? Like, does RunLLM propose the entire paragraph to me? Does it just show me relevant other things? Like what's actually happening?

Joey Gonzalez (25:44.473) Yeah.

Joey Gonzalez (25:47.778) Mm-hmm. That's it.

Joey Gonzalez (25:57.848) So it's a great question. And this has been the fun part of the product. One of the first realizations is it looks different depending on who you are. So some people really want to work in a copilot like environment. They want stuff on the side of copilot that maybe gives them the right documentation, suggested context, ways to approach what they're currently doing. Others wanted something more like a chat bot that I can ask questions. How would I do this with Terraform? Is there a faster way to get this Kubernetes deployment going?

And here's my current setup. Give me suggestions on how to improve it. And then others we've kind of talked to are looking at something more like a forum. So imagine something like Stack Overflow, which was some of our inspiration. I should be able to ask a question and then get hundreds of LLMs to come together and try to answer the question, provide guidance, rate each other, give feedback on each other's answers. Like, wow, that's a much better way to approach this. And my question was totally wrong.

But have that happen in real time, not in weeks. And be able to interact with that process with threads. So there's a lot of opportunity to build more than one interface. As a startup, we're trying to focus on the easiest things to integrate with first, the kinds of Slack, basic chat, very basic kind of message interface. But yeah.
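A toy sketch of the "many bots answer and rate each other" forum idea Joey just described, where `llm(model, prompt)` is a hypothetical helper that calls whichever model you name:

```python
def forum_thread(question, models, llm):
    # Each model drafts its own answer independently, like separate forum replies.
    drafts = {m: llm(m, f"Answer this developer question:\n{question}") for m in models}
    # Each model then reviews the whole thread, scoring and critiquing the answers,
    # which plays the role of votes and comments.
    thread = "\n---\n".join(f"{m}: {a}" for m, a in drafts.items())
    reviews = {m: llm(m, f"Rate each answer 1-10, say which is best, and why:\n{thread}")
               for m in models}
    return drafts, reviews
```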

Eldad (27:05.728) Quick question on that. You've mentioned Slack, kind of as a data source. I'm building a startup. I'm building a product. And now I'm coming to a user and asking them to open up all of their Slack history. So today, that feels similar to giving Gmail approval to whatever, share your picture. Now I go now.

Joey Gonzalez (27:27.725) Yeah.

Eldad (27:31.22) I don't know, I'm registering to a podcast tool and it asks me, well, if I have access to your enterprise data, your Slack, your Salesforce or this and that, wouldn't we need as users kind of an abstraction there because every startup that does AI will ask for the same Slack feed. So kind of how would that work? Will we start upload that? How is the learning process even working? Because ML is very,


Eldad (28:00.424) vertical, right? We took a few scientists, they were sitting for a year, they own all the cleansing, all the hard stuff. How does it work? Like if I connect to a product, how does it work?

Joey Gonzalez (28:10.937) So I'll start narrow. Yep. Okay, great job. Yep.

Joe Hellerstein (28:11.195) So maybe I can take a crack at this one, Joey. So I think unpacking some of your question there, part of it was around authorization and security. Part of it was around how does this actually work. And so let me see if I can unpack that a little bit, one thing at a time. So having spent time in the enterprise space at a couple of companies, I would say a lot of the discourse in the newspaper and on X about security is often just not really a problem inside a company. So let's take your startup, right? Your Slack is already visible to your employees. And if you're publishing it, like a dev help channel on your Slack, you probably aren't going to object if the dev help channel is being


Joe Hellerstein (28:58.059) indexed is indexed into the LLM, right? So there's gonna be places where we're gonna get very comfortable with your employer saying, these are public repositories. When you participate in this stuff, your comments are being indexed and shared.

Right. And then there'll be a bot in the Dev Help channel that will answer questions alongside your fellow engineers, right? And so I don't necessarily feel like the auth problem is all that complicated for a lot of these enterprise use cases that we're looking at. Now, if you talk about patient records or inside a hospital or something, life gets much more complicated. And again, this is why doing the Dev facing use cases first is just a lot easier for everyone to digest.

Eldad (29:36.752) Yeah.

Joe Hellerstein (29:37.871) So there's that piece. The second piece, which I think you're hinting at, which is part of the value proposition of a company like RunLLM, is you've got lots of really useful data in your enterprise that ChatGPT knows nothing about. You've got all your documentation, you have your private GitHub repos, you have your Slack channels. These are all things that, as developers, you guys share and you've authorized each other to use, so you'd be happy to quote unquote index, or to fine tune a model, or do RAG on. And so...

Then that gets to your operational question. How do I take this corpus of stuff inside my enterprise and do something that integrates the wonders of ChatGPT with my private stuff?

And the answers to that technically these days are twofold. There's RAG, as we've described, so you index the stuff. And there's also fine tuning the model, where you add another layer of inference on top of OpenAI or your open source model that is trained for your data set. And maybe with that, I'll hand it back to you, Joey, as to, like, you know, what actually happens when I point RunLLM at, say, my GitHub repo.

Joey Gonzalez (30:42.013) Yeah. So.

It's a great question. We mix these two technologies, two approaches: RAG, again, the index retrieval story that we discussed for a while, and then fine tuning, which maybe I'll give a very quick description of. Maybe think of it as I'm studying for an exam, and how would I study for an exam? I could generate some practice questions and then give the model some example answers, and then tell the model, adjust your weights so that rather than predicting what you would have thought was the next word, predict the next word that more aligns with this answer.


Joey Gonzalez (31:10.478) So this is instruction fine-tuning, and actually, we generate practice question-answer pairs for your documentation, for your code, using external models like GPT, and then we can retarget our models. In fact, we can retarget hosted solutions like OpenAI's fine-tuning services against your documentation through these questions and answers. And we go even further to make it so that if we retrieve documentation, it gets good at reading the documentation and answering questions.

So we practice on sort of an open-book exam, if you will, and get better at even reading your documentation when being asked interesting questions. And so then we can specialize models for your domain. And, we don't just yet, but we could specialize models for your tone and how your company likes to speak about stuff, to capture even the style of how you approach stuff, let's say, in a message board. So that's one aspect, making the models more aware through, again, RAG and fine tuning. The other thing is just as an interface, where do I chat?

I chat on Slack most of the time today. So if I want to ask a question, it'd be nice if a bot would jump in and say, yeah, someone asked a similar question. Um, here's a more detailed answer. And even better, tomorrow I get a message from the bot: Hey, actually someone's figured out a much better solution to the problem that you originally asked. Um, here's the better solution. So it chats back. Um, it's not just a matter of me going to a website and asking a question. Copilot doesn't come back to you and say, Hey, is there a better way to write your code now?

Um, and that's kind of where we want to be. So the interaction, the surface, uh, Slack provides a surface that people are already used to chatting with. Um, we're building other services. Well, uh, getting to the higher point, uh, companies in the space have to find ways to bring LLMs back to where everyone is today, um, so that they can integrate in how people work with the data and the tools that they use. Um, and so a lot of it is, uh, integration, a little bit of it is a clever, you know, kind of combination of RAG, fine tuning and other tricks that we're studying in research.
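A rough sketch of the data-construction step Joey describes above: use a strong external model to generate practice question-and-answer pairs from documentation chunks, which later become the fine-tuning set. The prompt wording and model name are assumptions for illustration, not RunLLM's actual recipe.

```python
import json
from openai import OpenAI

client = OpenAI()

def make_practice_pairs(chunks, per_chunk=3):
    """Turn documentation chunks into (question, answer) pairs, like writing a practice exam."""
    pairs = []
    for chunk in chunks:
        resp = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{
                "role": "user",
                "content": (
                    f"Here is a documentation excerpt:\n{chunk}\n\n"
                    f"Write {per_chunk} question/answer pairs a developer might ask about it, "
                    'as a JSON list of objects with "question" and "answer" fields.'
                ),
            }],
        )
        # Real code would validate and retry here; models sometimes return malformed JSON.
        pairs.extend(json.loads(resp.choices[0].message.content))
    return pairs
```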

Benjamin (33:01.994) So, but like the, the fine tuning itself, this is then also done by OpenAI at the end of the day, right? So kind of your, like, or you're the ones actually fine tuning the model.

Joey Gonzalez (33:02.309) Bye.

Joe Hellerstein (33:02.375) some.

Joey Gonzalez (33:09.822) Thank you. Yeah, it's complicated. So let me answer in two parts. So OpenAI and every other picks and shovels company now offers a fine tuning service. That fine tuning service takes strings with question answer pairs. That's it. And then it runs gradient descent, the boring algorithm that we've been doing for training, on that data, with a loss that everyone uses. The art in fine tuning now is not the algorithm itself, but the data that you construct.

That's what we do. So we construct very clever datasets that allow the vanilla fine-tuning techniques that everyone sells at under the cost of the GPU power to be able to make the models that they host for us better. So we then own the model inside of OpenAI and then can serve predictions. But if we decide to use Together or Anyscale, or, I forget, whatever the other recent startup announcements were, any of their fine-tuning services would be able to take the same dataset and make a model better.

I'd love to run our own fine tuning. We've been studying the systems for both fine tuning and serving these models, but right now, because of the VC money being thrown at models and companies, it's cheaper to use other people's solutions to do it. So the innovation actually isn't in the technology stack. It's really in the data and how it gets put together.
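Continuing the sketch above, a hosted fine-tuning service then takes those pairs as a file and runs the training loop for you. The snippet below uses OpenAI's fine-tuning API as one example; the JSONL format and the fine-tunable model name should be checked against the provider's current docs.

```python
import json
from openai import OpenAI

client = OpenAI()

# Write the practice pairs in the chat-style JSONL format the service expects.
with open("train.jsonl", "w") as f:
    for pair in make_practice_pairs(chunks):   # chunks: the documentation chunks from earlier
        record = {"messages": [
            {"role": "user", "content": pair["question"]},
            {"role": "assistant", "content": pair["answer"]},
        ]}
        f.write(json.dumps(record) + "\n")

# Upload the dataset and kick off a fine-tuning job. The provider keeps the weights
# and serves the specialized model back to you, as discussed above.
training_file = client.files.create(file=open("train.jsonl", "rb"), purpose="fine-tune")
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-4o-mini-2024-07-18",  # assumed fine-tunable base model; check provider docs
)
print(job.id)
```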

Benjamin (34:28.621) Do you see that change? Like, do you see that changing? I mean, obviously if I'm OpenAI, like, I'm not going to give you my model weights so you can start fine tuning, right? Like this is actually a big part of my secret sauce, right? You do?

Eldad (34:28.812) Well, we'll see. Go ahead.

Joey Gonzalez (34:39.802) You do. So you don't let me look at the weights, but you let me give you data, and then you fine tune your model for me. And you love it because I stay inside of your ecosystem. And you charge me a little bit more once I fine tuned my special model, my special version of your model. I never get to see the weights on my model. Yeah. Oh, good question. Yeah.

Benjamin (34:53.005) But I got confused by the I's and you's now, like I didn't know who was OpenAI anymore and who wasn't.

Eldad (34:59.168) So what you're saying is OpenAI, part of that is building an ecosystem that's actually made up of that optimizing, fine tuning the model for the vertical need. But will OpenAI own the indexing? Will Slack own the indexing? Will the startups ingest all of that Slack, like the GitHub or the Slack data, like all of that huge data set?

Joey Gonzalez (35:25.642) Great questions.

Eldad (35:27.117) Where is it? Who owns it? Who pays for it? How does it work?

Joey Gonzalez (35:30.654) So right now we do that. So we manage the data. If you like, you can have OpenAI do it for you at an extremely high price. They charge a very high premium to stick your documents in there, with whatever they have doing their indexing. So yeah, so there are multiple solutions to this. We've put together a lot of commercial solutions that are currently low price, but based on open source projects. So we can easily swap in and out if we needed to. But right now the economics of all of these hosted solutions are pretty favorable.

So that's where we go today. It is true that there's this integration of different data, tools, models. That's kind of where people are trying to differentiate today, since the model itself is hard to differentiate on other than the big commercial providers. And the serving infrastructure itself, like what used to be the exciting stuff, is now pretty horizontal and hard to differentiate. So, yeah.

Benjamin (36:22.498) So to play a bit like kind of like devil's advocate and like push a bit, maybe it's like, if I was working at Slack now, right, and if I'm seeing what you guys are doing with RunLLM, I guess a lot of like the kind of cool stuff you're doing then goes around, hey, how do you kind of crawl Slack, Google Docs, kind of build good ways to augment your models? If I was a Slack executive, I would really want a piece of that pie, right? Like I completely see the world moving into this direction that you're describing and outlining. But.


Benjamin (36:52.53) I probably over time want to make it harder for you to kind of index Slack so that I can be the one indexing, uh, cause I own the data, right? Like I'm, as Slack, in the great position that I have all of this data. Um, I'd want to make it harder for a company like you to index it so that I can actually generate some of the value that you're kind of then trying to grab as RunLLM. What's your, what are your thoughts on that?

Joe Hellerstein (37:17.103) Yeah, I have, I have a pretty clear sense that that's not going to happen for the same reasons that, um, these, uh, end user tools don't tend to own your search or other infrastructure, right? Uh, uh, inside the walled garden of Slack are many good things. And they also are constantly asking you to connect your Google docs, connect your GitHub, et cetera. Um, but

Typically, if you're even an individual, but certainly in an organization, you don't offload your data management to a chat tool. It's just, it's a type error that people usually don't engage in. So Slack, if they're smart, will have chatbots in there that understand the Slack data, and they'll ask you to wire stuff up. But the idea that Slack is going to do a great job on helping developers do, uh, you know, API exploration and all the stuff that we're doing at RunLLM just seems a little far-fetched. I think the competition will be more with Amazon than with Slack, with the infrastructure providers, because it's a horizontal problem across all your data silos. And so you want a solution that goes across the data silos and presents you probably multiple user experiences, as Joey's been alluding to. So the Slack user experience is not the only one you'll want, and the data in Slack is not the only data you'll want. And so I don't see that as really the way the industry will evolve.

Benjamin (38:36.886) My follow-up question to that would be then maybe Slack is just like losing relevance over time, right? It's like, maybe there is a better tool where you do your chatting, writing design docs that's like wired into everything where kind of then you can like answer all of those questions. Right. Just like.

Eldad (38:50.342) But actually, we're, I think we're not discussing Slack, the IDE that serves users to chat. This has nothing to do with it. We're discussing Slack as a data source.

Benjamin (38:55.809) Right.

Eldad (39:02.652) How do we access it? In your case, do you need to pay them to ingest the data? If someone registers, are they offering you ways? Because there isn't an easy way today to treat Slack as a data source, but there are needs like as a data source versus humans that just search it.

Joe Hellerstein (39:22.663) So let me shift the conversation just slightly to provide perspective, not to dodge the question, but to provide a version of the question that maybe resonates better, because I know where you're going with this. Stack Overflow is maybe a better example, because Stack Overflow usage has been plummeting since ChatGPT and Copilot came out. So that was the place where there was a data set, a user experience, and a community around certain tasks, and a better, more intelligent place to do that has emerged in the public domain. So you can ask yourself, I guess, if I'm not gonna use Stack Overflow to answer my coding questions, I'm gonna use something else. Can I get that something else to also understand the internal stuff in my organization? And I think that evolution seems pretty natural to everyone. It's like, oh, I want a thing that's better than Stack Overflow because it uses LLMs, and I don't want it just on open source stuff. I also want it on our closed source stuff.

Right, so that's not really a user experience question, although we are certainly playing with the idea of what would Stack Overflow look like if it was dynamic. Stack Overflow is a materialized view of human knowledge. But if you replace it with an LLM, you've got an unmaterialized view of an infinite Stack Overflow, right? You can have seven different chatbots that are fine-tuned in different ways or use different foundation models answering questions and arguing with each other. And then that can be almost instantaneously dynamic. Instantaneous is strong, but within a minute or so. And so you can ask anything and get hundreds of comments. So that's kind of the shift I think that's interesting. The question of what's the API to extract data from Slack, and do they make it public, and do they charge for it? This is in there, but it's kind of a mechanical question. It's not this kind of like, paradigm shift sort of question, I guess. And the way I look at this space is, even today, you can build tools that will crawl your Salesforce and so on and build an enterprise search solution, for example. These kind of horizontal tools are going to be desirable. People who have interesting data will be incented to share that data because they probably can't own the universe. And there will be APIs to do that stuff. And perhaps,

Joe Hellerstein (41:43.003) people with interesting data get a slice of the pie. They get to charge you something for using the APIs. But I don't believe in a end user tool lock-in future at all. And I think there's room for these kind of more use case horizontals, let's call it. It's still a vertical because it's all about developer experience, let's say, in our scenario. In a health care org, it might be all about patient records. But the point is you pick a domain and you fine tune to it and you get really good at it. And you find the touch points in that domain where there's apps that people like to gather around. And that's kind of the ecosystem you want to integrate. But I think surely big players are not blind to this opportunity. But then they get analysis paralysis around which verticals should we go after. And maybe we should go after everything all at once. But then we're competing with OpenAI, so maybe we shouldn't do that. And you can see how taking a slice with a startup-sized focus, you can really make a dent.

Benjamin (42:41.19) That's a great answer, Joe. Nice. I think my closing question on the RunLLM side would be, how do you see this evolving like over the next kind of one to two years, right? Like at the moment you're super focused on this internal developer side. What I'm thinking now is like, is there a path to open this, for example, to customers of your customers or something? Cause we're building a database. There's a lot of like just SQL in our Slack that's kind of relevant to the system we're building, right? This is something where you could in the next step then, I don't know, write a chat bot for our customers. Like, yeah, like take us, take us through the evolution which you guys have planned for 2024.

Joey Gonzalez (43:19.759) I can start. So again, we may be overindexed on Slack. So a lot of our, in fact, right now we're not indexing Slack in our first cut of it. It's just a chat surface. And within the thread that we're in, we have that information, but we're not pulling all the information. In fact, we're focused more on a lot of your design docs, your internal GitHub documentation, which means that companies that have lots of complicated APIs, that might want their customers to have quicker help on those APIs, and faster turnaround with lots of information that's kind of tailored to them, might want our product not just to help themselves, but to help their customers. And one thing we're kind of playing with is the possibility of making this interface, this next generation of kind of the Stack Overflow, something that's public that anyone could go to, and then helping people find the right technologies to solve their higher, you know, higher level technical problems, and having a, you know, a cogent discussion about why that's a good idea from a security perspective and from a performance perspective, with different LLMs giving that kind of insight. And I think that could be fun. And I think again, a lot of companies that have interesting APIs that they'd like to get in their customers' hands might want to work with us to help make that process easier.

So I think there's an opportunity just in the developer world to broaden beyond just, you know, internal help to external help. Then of course, we've already started talking to you like, this is great. Can you help us with other parts that aren't development oriented? And, you know, in some cases, those other parts are a little further down the road, might require, I don't know, HIPAA approval, like going through different processes. But getting started at something that's easy, again, in developer space, and growing out is something that I think is a good path for us as a small company to grow and succeed.

Joe Hellerstein (45:01.819) So I don't know if you were looking for a role as a product person at RunLLM, but definitely the suggestion you're making of, wouldn't it be nice to have a well-trained chatbot for the API we're exposing to our customers, is something very much on our minds. I think if you're familiar with OpenAPI or Swagger, we're trying to auto-generate API docs as it is. It's pretty primitive, but it's something. But that's a good point. In that kind of spirit, you put an LLM behind that and you perhaps populate it with more information than just what's in your OpenAPI spec. And you get something that's a chatbot on your web page that can answer your customers' questions, right? So yes, we are thinking about that. Joey alluded to the idea that maybe there's a Stack Overflow open source kind of thing that might be interesting. And of course, we talked about internal usage as well in companies. These are all things that are on our minds as we look into 2024.

And I think we'll pursue some subset of them with product announcements, you know, but we'll see exactly what those look like as they emerge.

Benjamin (46:08.062) Awesome. We're, we're excited to see those. Um, thank you so much, Joe and Joey. Like really, I mean, this was great. Thanks for bearing with Eldad and me for all of our rookie questions about LLMs. Uh, it was great to kind of listen to two experts in the field telling us more about that, the ecosystem, RunLLM. So yeah, thanks a bunch for being on the show. Any closing words from your end?

Joey Gonzalez (46:14.82) Yeah.

Joey Gonzalez (46:30.554) Well, I guess I can end with, you know, 2022 was a crazy year. 2024, I suspect, will be twice as crazy. And we'll see a lot of startups hopefully succeed. A lot of startups are going to fail. And, and I actually, I would guess that by the end of 2024, we're using vastly different workflows, models, in kind of exciting new ways. It's just, it's insane the rate at which research is happening now, and how it's immediately being translated to products in a lot of cases. So.


Benjamin (46:58.846) That's awesome. So we'll have you back at the, we'll have, we'll have you back at the end of 2024 and then see, see how that ages, Joey. Awesome. Yeah. Thanks. Thanks so much for being on.

Eldad (46:59.656) We'll take your prediction to the bank.

Joey Gonzalez (47:02.242) Yeah, we'll see. Yep.

Joey Gonzalez (47:08.026) Yeah, sounds good. Yeah, thank you. Yeah, bye. Thank you.

Eldad (47:12.692) Thank you. Thank you.

Joe Hellerstein (47:14.375) Thanks, guys.
