In this episode of The Data Engineering Show, the bros speak with Paarth, a Staff Engineer at Uber, about his work on Genie - an innovative AI assistant that revolutionizes on-call support by combining RAG (Retrieval Augmented Generation) with agent-based automation to help engineers find solutions faster.
Listen on Spotify or Apple Podcasts
[00:00:05] Benjamin: Hi. This is Benjamin. Before we start with today's episode, I wanted to quickly reach out on a personal note. We've just launched Firebolt Core. Firebolt Core is the free self-hosted version of our query engine. You can run Core anywhere you want, from your laptop to your on-prem data center to public cloud environments. Core scales out, and you can run it in a multi-node configuration. And best of all, it's free forever and has no usage limits. So you can run as many queries as you want and process as much data as you want. Core is great for running either big data ELT jobs on, for example, Iceberg tables or powering high-concurrency customer-facing analytics on big datasets. We'd love for you to give it a spin and send us feedback. You can either join our Discord, enter our GitHub discussions, or you can just shoot me an email at Benjamin@Firebolt.io. We'd love to hear from you. We added a link to Firebolt Core's GitHub repository to the show notes. And with that, let's jump straight into today's episode.
Hi, everyone, and welcome back to The Data Engineering Show. Today, it's our pleasure to have Paarth joining from Uber. He's a staff engineer there. Welcome to the show. It's really great to have you. Do you wanna tell us a bit about yourself, about your role at Uber, what you're working on?
[00:01:13] Paarth: Thank you, Benjamin. Thank you, Eldad, for having me. It's my pleasure to be here. Yeah. So I've been at Uber for the last four years working on Michelangelo, which is our SageMaker-like ML platform at Uber. I've been working on the feature store, you know, online serving at scale. Basically, we have millions of requests that we need to scale for, with many models. And then the last couple of years, I dove into Gen AI, working on RAG, vector search, and building apps like Genie that we are going to talk about, where basically we had on-call productivity pains, you can say, that we wanted to solve on our own. And that's how Genie organically came out of a hackathon, as you were saying. And then it just literally started growing over the last couple of years. And Arnab and team have been one of our partner teams, and we have been working with many partner teams to really uplevel the bot, because we realized that just the bot, plain RAG, doesn't work. So that's how we started this journey of accuracy improvements. Yeah.
[00:02:17] Benjamin: Okay. Very cool. So for those of our listeners who have never heard of Genie (I hadn't heard of Genie before), do you wanna give a bit more context on what role it fits within Uber, kind of what workloads it serves, basically?
[00:02:30] Paarth: Yeah. Yeah. For folks who don't know Genie, basically, think of it as your on-call assistant. Right? So Uber is big into Slack usage. So different infra teams have their Slack channels and whatnot. I think different companies maybe use Teams also. But, basically, you go to a Slack channel. Let's say you have a problem with Spark or you have a problem with Flink or any of those open source technologies. You go to a channel where you have your infra team engineers helping you. And because these technologies are widely used, you have to wait a lot. Sometimes engineers are dealing with high severity issues, and they don't have time to help you with every small issue that you're running into. And documentation, as you know in engineering, is not easy to maintain. So that's where, you know, the pain started coming from: okay, we are waiting on on-calls to respond, and you're feeling frustrated. And that's where it felt like a bot assistant would be very helpful, which can search for you across different services, different documentation, and give you the answer so you're not waiting on on-calls as much.
[00:03:31] Benjamin: Nice. Super cool. Can you take us through, like, the data pipeline that powers Genie? Like, where are you getting data from, and what systems does it feed? How much data are you handling? Tell us more about it.
[00:03:41] Paarth: Yeah. So, I mean, data sources: what we realized is for our engineers to really get help, the data sources really should be internal only, because we customize a lot of these open source engines to make them work at Uber scale. So we have data sources like wikis, Jiras, and Stack Overflow; we have our own version of Stack Overflow. We have Google Docs, and then we have people even storing custom, you can say, policies, custom information in PDFs. So there is a variety of these data sources, and, obviously, source code is another one now. So we listen to and ingest all these different data sources into our own hosted vector DB solutions. And then on top of it, we basically want to do lookups, semantic search. And then, obviously, what we're going to talk about is how we are customizing search and retrieval for each particular use case to really make it better.
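To make that concrete, here is a minimal sketch of an ingest-and-search flow like the one Paarth describes, assuming an OpenSearch index with the k-NN plugin enabled and some embedding service behind an embed() call. The index name, document shape, and helper functions are hypothetical stand-ins, not Uber's actual implementation.

```python
# Minimal sketch of a "ingest documents, then semantic-search them" flow.
# Assumes an OpenSearch cluster whose index was created with a knn_vector
# mapping on the "embedding" field; all names here are hypothetical.
import hashlib
from opensearchpy import OpenSearch

client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])
INDEX = "genie-docs"  # hypothetical per-channel index


def embed(text: str, dim: int = 8) -> list[float]:
    # Placeholder embedding: a real system would call an embedding model or
    # gateway service here. This hash-based stub just keeps the sketch runnable.
    digest = hashlib.sha256(text.encode()).digest()
    return [b / 255.0 for b in digest[:dim]]


def ingest(doc_id: str, source: str, text: str) -> None:
    # Store the raw text alongside its embedding so it can be retrieved later.
    client.index(
        index=INDEX,
        id=doc_id,
        body={"source": source, "text": text, "embedding": embed(text)},
    )


def semantic_search(question: str, k: int = 5) -> list[dict]:
    # k-NN query against the vector field (OpenSearch k-NN plugin syntax).
    resp = client.search(
        index=INDEX,
        body={
            "size": k,
            "query": {"knn": {"embedding": {"vector": embed(question), "k": k}}},
        },
    )
    return [hit["_source"] for hit in resp["hits"]["hits"]]
```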
[00:04:37] Benjamin: Okay. Are you able to share how much data Genie is handling overall?
[00:04:41] Paarth: Maybe not the specific numbers, but at this point, close to 350-plus channels. That's quite a lot. And every channel basically has some flavor of its data that we are ingesting. So what we have tried to do, because we are a very small team, is instead of building a mega-scale pipeline that just ingests all data sources and keeps a central data source solution, we instead give users the flexibility to ingest the data sources they want. Right? And what we found also is that works better in terms of search. So for every team, they are looking at, let's say, hundreds of thousands of wiki pages, hundreds of Google Docs, and then we have design docs that we are also, you know, ingesting. So I would say per use case, you're looking, with compression, at least at a few hundred megabytes to a gigabyte or several gigabytes, depending on how much the user wants to ingest.
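As an illustration of that per-channel flexibility, a configuration could look something like the sketch below. The channel names, source types, and fields are invented for the example; they are not Uber's actual schema.

```python
# Hypothetical per-channel ingestion config: each team selects only the data
# sources relevant to its channel instead of relying on one central pipeline.
CHANNEL_CONFIGS = {
    "#spark-help": {
        "sources": [
            {"type": "wiki", "space": "SPARK"},
            {"type": "stack_overflow", "tags": ["spark", "pyspark"]},
            {"type": "google_docs", "folder_id": "spark-runbooks"},
        ],
        "refresh_hours": 24,
    },
    "#flink-help": {
        "sources": [
            {"type": "wiki", "space": "FLINK"},
            {"type": "jira", "project": "FLINK-ONCALL"},
        ],
        "refresh_hours": 12,
    },
}
```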
[00:05:37] Benjamin: Okay. Super interesting. And the core vector store, is that also open source technology you're customizing at Uber? Is it an out-of-the-box existing piece of data infrastructure? Tell us more about that, maybe.
[00:05:50] Paarth: Our team doesn't manage the core technology itself, but, yeah, we use OpenSearch as one of the vector DB solutions. Um, we also have our own in-house vector DB solution as well, which is what our sister team in the search org manages for us. But, yeah. I mean, it's one of those things that has become popular in general. LanceDB is another alternative we're talking about. And you guys obviously do Firebolt as well, but, yeah, at least our search team seems to feel like OpenSearch is a good solution for us. Yeah.
[00:06:23] Benjamin: Right. Makes sense. Was that the first time in your life you built these kind of RAG-style pipelines at scale? Like, it's new tech for everyone. Right? Kind of how did you even onboard into that? Like, how did you figure out how to make good technology choices? Where did you learn these things from?
[00:06:40] Eldad: And does it feel like the Hadoop days at the beginning again? So it's kind of like, okay, basically a blank sheet. We need to come up with the whole stack from scratch. There is some open source spread in some places, engineers going back to building new infrastructure to serve new workloads. That must be, like, a three-times-in-a-lifetime kind of exciting thing. Right? Doesn't happen every day.
[00:07:06] Paarth: Yeah. Yeah. 100%. And it's this funny thing that while I was at Amazon, I was part of this AWS chatbot team, and we were building exactly the same thing there. And that was the pre-LLM era, 2018, and it feels nostalgic to literally rebuild what I was building at Amazon before.
[00:07:24] Eldad: So you know how everything ends up. Right? Everything ends up as a cloud native data warehouse. That's what happened to Hadoop eventually. Right?
[00:07:32] Paarth: Absolutely. Yeah. Yeah.
[00:07:34] Eldad: But, like, looking at the tech, looking at the workloads and those new use cases, it feels like innovation at scale is re-happening from scratch again. And this is exciting. To us, it's super exciting, and it must be even more exciting to you given the access to the data that the team has. So tell us more. Like, how did you stitch it together? What are you specifically proud of? How do you see that evolving in the next few months? Let's not even think twelve months.
[00:08:02] Paarth: Absolutely. Just to even connect the journey. Right? Like, when we started, everything just happened very organically. I don't think we planned everything. But to your question of how we stitched everything together: yeah, it almost felt like we were doing what EMR was doing. You know, you have your Hadoop and big data technology, and we needed these pipelines to basically process all this data quickly. And that's where we started betting on Spark to really help us. And Uber uses Spark a lot. So we were like, okay, let's go with what is proven well at Uber. So we bet on Spark. We use Spark a lot for, you know, data processing in parallel, having to chunk the data, shuffle it really quickly, and make sure we can create embeddings at scale, because those are really the main two bottlenecks: you chunk your data, you have all your data ready, and then you create the embeddings at scale in parallel. Right? So we had to basically scale, you can say, the whole infra layer to chunk data faster and to be able to create embeddings at scale. And that also meant we had to scale what we call the gateway engine that really helps us create those embeddings at scale. So we had to scale all these layers. And where I see this going, I think that's maybe a very hard question given how fast this technology changes and how much customers' expectations are growing. But I do see, in general, that people definitely want more customization, which means more Uber internal data sources needing to come onto the platform. And then, definitely, everybody is going with the agentic route at this point. And for a few use cases that we have really nailed down and that worked well, including the one that we published in the blog, you know, we definitely see agents as the way to interact, and especially now that MCP servers have come. So agents, MCP servers, and your custom data sources: stitching all this together is really how you can make, at least in my opinion, a very good GenAI app now.
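For readers who want a concrete picture of the "chunk, then embed in parallel" pattern Paarth describes, here is a minimal PySpark sketch. The document schema, paths, chunking rule, and the placeholder embedding function are all hypothetical; the real pipeline calls an internal embedding gateway rather than the stub used here.

```python
# Minimal PySpark sketch of the chunk-then-embed pattern: read documents,
# split each one into chunks, compute an embedding per chunk in parallel,
# and write the result out for downstream indexing. All names are made up.
import hashlib

from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import ArrayType, FloatType, StringType

spark = SparkSession.builder.appName("genie-ingest-sketch").getOrCreate()

# Hypothetical input: JSON documents with "id" and "text" columns.
docs = spark.read.json("/data/genie/raw_docs")


@F.udf(ArrayType(StringType()))
def chunk(text):
    # Naive fixed-size chunking; real pipelines use smarter, semantic splitting.
    size = 1000
    return [text[i : i + size] for i in range(0, len(text), size)]


@F.udf(ArrayType(FloatType()))
def embed(chunk_text):
    # Placeholder embedding so the sketch runs; a real job would call the
    # embedding gateway here, and Spark parallelizes the calls per partition.
    digest = hashlib.sha256(chunk_text.encode()).digest()
    return [b / 255.0 for b in digest[:8]]


chunks = docs.withColumn("chunk", F.explode(chunk(F.col("text"))))
embedded = chunks.withColumn("embedding", embed(F.col("chunk")))
embedded.write.mode("overwrite").parquet("/data/genie/embedded_chunks")
```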
[00:10:08] Benjamin: The innovation in the space has been, like, super crazy. Like, how quickly new frameworks are popping up, gaining traction, etcetera. Like, we're working on some agents now that, for example, optimize your SQL queries and these types of things and are deeply built into the product. For me as well, coming from this systems C++ programming background, it's been fun learning about this completely other part of the tech ecosystem. And one thing I'm personally very curious about is the intersection between LLMs and SQL query optimizers, because I have this more traditional compiler and query optimizer background. Right? And now we're fusing it with LLMs and figuring out, okay, when are LLMs great, and when are more traditional query optimizers that reason about correctness great. Crazy how in so many pieces of technology you're now infusing these LLMs and realizing, oh, damn, you can build mind-blowing things that were impossible before.
[00:11:06] Paarth: Yeah. Yeah. Totally. And you hit the point really well that this whole new suite of ways to connect with databases has come. Right? Like, it was not there before. I was reading some blogs about how Google basically has this agentic thing which can connect you to all the Google Cloud technologies. And, I mean, we at Uber also have done this; I believe there's another blog by a sister team that basically built a query copilot, as they call it, which is query optimization using an LLM and, you know, making sure you can write queries quickly. Right? And that's a new suite of things that nobody had even thought about. But, yeah, you can pretty much write all your queries using an LLM and build that framework.
[00:11:47] Eldad: Let's not get into predictions on how painful that's gonna be on the market. But if we just focus on the optimizer within the database: it's been owning that brain for the last forty years, being responsible for making all the smart decisions for any user, any query, any architecture. Like, the one thing that hasn't changed is the way the optimizer feels about the users; it needs to make most decisions for them. Now it's changing. Now the optimizer has part of its brain outsourced to an LLM and gets back a recommendation. It still needs to do optimizations at lightning speed because, right, you're getting tens of thousands of queries in. You need to optimize all queries. So you need to get feedback from the optimizer, and then your database now needs to think differently. It needs to allow you to infuse those hints, that context, that insight that the new LLM optimizer produced, into any query. And that's changing how database engineers and databases are thinking about optimization. But what it also does, which really excites Benjamin and us and anyone who deals with efficiency, is it changes the way databases are being released and what the focus is. My LLM can go for thirty minutes over 50 dimensions of optimizations and figure things out that would take me maybe a week, or two people in my company on Slack giving advice on how to change the granularity of a block or how to change the threshold of RAM or whatever optimizers do. Now it's all about how much we can expose: how much technology, how much variation, and, like, how specific can we get. Right? There are 100 ways to implement a join algorithm. Now we're gonna expose all of them, because now the LLM can actually make sense of it. Like, that was unheard of. So it's all about abstraction. Right? Like, there's always a trade-off; every database engineer will tell you. There's a trade-off between being fast and how fast you get to being fast. It's hard to get to being fast, so that's where that Snowflake abstraction we like to talk about comes in, but now it's changing. So now everyone is a genius with an LLM. Every agent can just spend thirty minutes running queries and getting actual results, actual telemetry no human being has ever looked at. There are levels of telemetry that no human being should ever look at, but agents love looking at it.
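As a purely hypothetical sketch of the pattern Eldad is describing, an offline agent could read a query's execution telemetry, ask an LLM for tuning hints, and hand those hints back to the engine. The llm_complete() callable and the hint format below are placeholders, not any real database or LLM API.

```python
# Hypothetical sketch: feed a query plus its execution telemetry to an LLM
# and inject the returned hints back into the SQL text for the optimizer.
import json


def suggest_hints(sql: str, telemetry: dict, llm_complete) -> dict:
    """Ask an LLM (provided by the caller) for optimizer hints as JSON."""
    prompt = (
        "You are a query-tuning assistant. Given the SQL and execution "
        "telemetry below, return JSON hints such as "
        '{"join_strategy": "broadcast", "prune_columns": ["event_date"]}.\n\n'
        f"SQL:\n{sql}\n\nTelemetry:\n{json.dumps(telemetry, indent=2)}"
    )
    return json.loads(llm_complete(prompt))


def apply_hints(sql: str, hints: dict) -> str:
    # Toy example: prepend hints as a structured comment an engine could parse.
    return f"/*+ {json.dumps(hints)} */\n{sql}"
```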
[00:14:16] Benjamin: Even I have certain levels of telemetry I don't wanna look at when building the database, where an agent will happily do it for me, and it'll thank me afterwards for the opportunity to look into it. Exactly.
[00:14:29] Eldad: So, yeah, it's exciting. And then instead of the show tables and show statistics or explain syntax that you build for users, it's now: okay, how do I actually provide granular, summarized, relevant profiling information for agents so they can learn fast while sending less data? So this is very exciting. Um, and this is gonna absolutely change how we build software, and we're not even getting into infrastructure. So it's really exciting. Uber has always been flexible and open-minded. Right? Like, for the last, I don't know, more than ten years, you go to the Uber blog, and there are always engineers experimenting at scale. So meeting you now is exciting, and really, like, hearing about it. So tell us, how do users react when you tell them to use Genie, and then you feed it their completely random Slack chatter, and it gives them documentation they could only have dreamed of? How do they react to that?
[00:15:29] Paarth: Yeah. And I think we've even evolved from, okay, just being like, hey, we will give you the right documentation, to, okay, we'll also start taking actions on your behalf. Right? And that's really where I see, to our conversation about databases being smart and, you know, doing so much preprocessing before you even come to the database, a similar situation where the bot can do a lot more now. Right? And I almost connect it back to my AWS journey, where we were literally trying to do the same thing in natural language, where you could manage all your AWS resources using natural language. And that's really where I think we are going with this. Any problem, be it, okay, I have a permission issue, or I need to, let's say, update my Spark resources, I need to get more capacity: pretty much all of that can be done now. But behind the scenes, it's not only the bot looking at your documentation. Through the MCP servers that have come now, and previously, obviously, all the internal tools that we built for the LLM to be able to leverage, all of them can pretty much now take actions on your behalf. Right? And that's where agents become so powerful: you can have different sub-agents which can take specific actions for you, and then you have this intent supervisor agent which does preprocessing of the query and figures out, okay, which particular sub-agent is really the right one to resolve this question. And then that sub-agent goes, takes actions, looks at the documentation, and can do a lot more things because of this whole agentic framework that has come with LangGraph and other technologies, obviously.
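Below is a framework-agnostic sketch of the supervisor and sub-agent routing Paarth outlines; in practice a framework like LangGraph wires the same pattern into a graph. The intent labels, sub-agent behaviors, and the keyword-based classifier are invented placeholders, not Genie's actual design.

```python
# Hypothetical supervisor/sub-agent routing: classify the question's intent,
# then dispatch to the sub-agent that handles that kind of subproblem.

def llm_classify(question: str, intents: list[str]) -> str:
    # Placeholder: a real system asks an LLM to pick the intent; this stub
    # uses keyword matching just to keep the sketch runnable.
    q = question.lower()
    if "permission" in q or "access" in q:
        return "permissions"
    if "capacity" in q or "quota" in q:
        return "capacity"
    return "docs"


def permissions_agent(question: str) -> str:
    # Sub-agent: would check/grant access via internal tools or MCP servers.
    return "Requested the missing permission on your behalf."


def capacity_agent(question: str) -> str:
    # Sub-agent: would file a capacity bump for, e.g., a Spark queue.
    return "Submitted a capacity increase request."


def docs_agent(question: str) -> str:
    # Fallback sub-agent: plain RAG over the channel's ingested documentation.
    return "Here is the most relevant runbook I found."


SUB_AGENTS = {
    "permissions": permissions_agent,
    "capacity": capacity_agent,
    "docs": docs_agent,
}


def supervisor(question: str) -> str:
    intent = llm_classify(question, list(SUB_AGENTS))
    return SUB_AGENTS.get(intent, docs_agent)(question)


if __name__ == "__main__":
    print(supervisor("I need more capacity for my Spark queue"))
```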
[00:17:08] Benjamin: Nice. Super cool. So if you contrast this to your time building similar-looking technology for the end user at AWS, right, it is a completely different technology stack, in the sense that you solve the same problem, but humankind figured out a smarter abstraction to actually solve these problems in a better way. Like, maybe take us through that also. Do you feel like the things that you did back then at Amazon helped you become a better engineer in this new age, or do you actually think, okay, it's like going back to zero and rebuilding all the knowledge you have in this space?
[00:17:44] Eldad: Just my instinct that I've built over my career, and I can't explain that. Like, yeah. I think, I mean,
[00:17:50] Paarth: I feel like, as you're saying, right, Eldad, the instinct that you derive comes from what you've built before. Right? So at Amazon, whatever we built, it was the pre-LLM era. So, definitely, I would say the technology was maybe not as robust or mature before. But that intuition, you know, comes from, okay, building this kind of bot, and I feel like that intuition came back as we were starting to see this technology arrive, and we were like, hey, this looks like, okay, you can pretty much fit all these pieces together. So I almost felt like that experience at Amazon was a starting experience of, okay, how chatbots can really do a lot more. And then this was like a stepping stone to say, okay, now the technology is finally there, now you can stitch everything together. It almost feels like going from level zero to level one, building similar experiences again. Yeah.
[00:18:39] Benjamin: Yeah. It's funny. Like, we have these conversations every now and then with, like, data engineers. Right? Like, similar to what Eldad said. It was like, okay. Like, back in the Hadoop days and then, like, kind of, like, modern cloud data warehouses and then, like, next generation of data warehouses. And it feels like there's these cycles in every kind of area of technology where just you take that leap, but then some things kind of stay the same. So looking ahead, like, what are you guys working on right now? What are you particularly excited about at the moment? Kind of what are new challenges you want to solve that you might not be solving perfectly right now? Kind of take us through the next couple of months.
[00:19:16] Paarth: I think where we have landed in the last six months, including the blog we published, right, is that we found out how we can do these agentic solutions, and what I call it now is agentic Genie. Basically, Genie was our, you can say, traditional plain RAG-style bot. Now Genie has become agentic Genie. Right? So what we have seen with several use cases is that agentic Genie works well when designed well: when you've analyzed the problem of which types of subproblems the bot should resolve per channel, per use case, and then you go with solving each subproblem with its own sub-agent. If you do it that way, it works really well. Right? And that's what we have seen deliver good success. I think our challenge right now, which is where I'm still thinking and it's not something we have designed or thought through too well, is this: it is funny, or good in a way, how Cursor and other IDEs have upleveled what the agentic experience can mean for code. And I kind of want to take that same Cursor-like experience to agentic Genie, where you just come and, without you having to do anything, it figures out all the right subproblems for you as a channel owner, as a use case owner. It decides underneath which particular sub-agents to call and which particular MCP server tools to interact with, and it just figures out everything for you. That's where I would like the North Star to be. That way, agentic Genie becomes like a perfect on-call assistant, the way Cursor and other IDEs have done it for coding.
[00:20:52] Benjamin: Nice. Okay. Super cool. So are you guys using Cursor to build Genie?
[00:20:57] Paarth: I mean, I think every developer is likely using Cursor or one of those forms of IDE and CLI experience right now. I mean, yeah, everybody uses that. Absolutely. It makes your life better for sure.
[00:21:11] Benjamin: Yeah. Same here at Firebolt. Very cool. So maybe zooming out from Uber a bit. Right? Like, what are you excited about in this space in general? Like, how do you actually stay up to date when building these things? How do you learn about how other companies are tackling similar problems? Are there meetups you're going to in the Bay Area? Are you just reading blogs and watching YouTube videos all day? Like, take us through that.
[00:21:36] Paarth: Yeah. No. Absolutely. Obviously, meetups are a great way to meet. I think I've been to some of those recent meetups. One of them was by Meta, called AI at Scale. It was a really, really well-done meetup, and I was impressed with how Meta has optimized every single layer of infra; they have done amazing work. They have developed something called MetaMate, which is super cool. And when you look at those experiences, you're like, wow, there is so much more to do here. So I'm definitely deriving inspiration from all of the, you know, smart peers in different industries, different companies, and that's definitely one way. And then I do still see that when you apply those inspirations to your use case, there is still quite a bit of a steep learning curve, trying out different things. So nothing just works the way it worked for other companies, just like that. There are, you know, dollar challenges, cost challenges, scale challenges that you have to go back and redraw and figure out. But, definitely, I think meetups, blogs, podcasts like what we are doing together, I mean, yes, looking at all of this is definitely one way to learn. Yeah. And I think the Bay Area is really good in that sense: there's a lot happening here, and people are really excited to build things. So you find those exciting builders, from startups all the way to the big companies, just trying so many things. So it's very humbling to be in the Bay Area, and, you know, you just know that you're never done learning, because there are new things coming out in the market all the time, every single day right now almost.
[00:23:11] Benjamin: Yeah. I mean, you're at the epicenter of this next generation of software, basically. Very cool. So this was super interesting, Paarth. Any closing thoughts from your side? Like, something you wanted to really chat about, something you wanted to bring up on The Data Engineering Show today?
[00:23:27] Paarth: Basically, I would just say that for people who are trying out all these technologies, I would recommend everybody try as many things as you can. And, obviously, keep a problem in mind, because just trying in itself, you can keep endlessly experimenting right now and you will not get anywhere. I think having a problem in mind always helps. That way, the energy is a little bit focused and directed. I would say those are definitely my learnings, along with keeping an eye open for experimentation. I do see a little bit of a challenge in building things. Right? You take a bet on one technology, you build something, and then underneath, new things have already surfaced that make what you built outdated. So, I would say, it's also a bit of a pressure situation where whatever you're building is not enough because the expectation has already gone to the next level. So, um, I think, you can say, the pace is too fast right now. So as a developer, keeping your customer happy is hard, but also setting good expectations with your customers is critical, more critical than before, because compared to how we shipped software before, you know, the pace has completely changed now. So setting that expectation and, obviously, keeping your eye on getting feedback from customers is equally important. Um, being humble enough to know that what you developed is not perfect, and that you need to keep iterating and keep making it better. I would just say those would be my quick takeaways as we experiment and go with the next wave of technologies that are coming in here.
[00:24:59] Eldad: Amazing. Nice.
[00:25:00] Benjamin: Couldn't agree more. Thank you for joining us on The Data Engineering Show today, and we look forward to meeting in person when we're in the Bay Area next time.
[00:25:08] Paarth: Absolutely. Thank you, Benjamin. Thank you, Eldad. It was great talking to you guys, and I'm really looking forward to meeting you both in person very soon, hopefully.
[00:25:16] Outro: Sounds great. The Data Engineering Show is brought to you by Firebolt, the cloud data warehouse for AI apps and low-latency analytics. Get your free credits and start your trial at firebolt.io.