In this episode of The Data Engineering Show, host Benjamin interviews Lei, Co-founder and CTO of Fabi.ai, to explore how AI-native BI platforms are reshaping data analytics and empowering non-technical users to derive meaningful insights from complex datasets.
Listen on Spotify or Apple Podcasts
[00:00:00] Lei: For the past decade, it's been really difficult to make self-service BI work. And then now with AI, the worst part is that it can run properly, but the numbers are wrong. Right? So you immediately, like, might lose some trust with the person, with the user.
[00:00:18] Benjamin: Hi. This is Benjamin. Before we start with today's episode, I wanted to quickly reach out on a personal note. We just launched Firebolt Core. Firebolt Core is the free self-hosted version of our query engine. You can run Core anywhere you want, from your laptop to your on-prem data center to public cloud environments. Core scales out, and you can run it in a multi-node configuration. And best of all, it's free forever and has no usage limits. So you can run as many queries as you want and process as much data as you want. Core is great for running either big data ELT jobs on, for example, Iceberg tables or powering high-concurrency customer-facing analytics on big datasets. We'd love for you to give it a spin and send us feedback. You can either join our Discord, enter our GitHub discussions, or you can just shoot me an email at benjamin@firebolt.io. We'd love to hear from you. We added a link to Firebolt Core's GitHub repository to the show notes. And with that, let's jump straight into today's episode. Hi, everyone, and welcome back to The Data Engineering Show. Great to have you on. Today, I'm really happy to have Lei joining us, who's the co-founder and CTO of Fabi.ai. Elad is on vacation today, so he can't join us, but I think he'll be back for the next show. But, yeah, great to chat with you today, Lei. Tell us what you're doing at Fabi, and tell us how you actually got to starting an AI-native BI company.
[00:01:35] Lei: Sure. Yeah. So thanks for having me here, Benjamin. So right now, I'm building Fabi.ai. It is an AI-native BI platform that essentially combines SQL, Python, and AI altogether so that it allows anyone to do vibe analytics. So no matter where your data lives, it could be in a database or data warehouse, but it could also be an Excel file, a Google Sheet, or even just an API from an application, you can pull the data through Fabi and then join this data from different sources together to do some analysis. The cool part is that now within Fabi, you can really leverage AI to supercharge your data analysis, like, 10x faster. So our users include not just data engineers, data scientists, and data analysts, but also, like, say, other data practitioners such as product managers, founders, growth marketers, operations teams. So anybody who wants to embrace data to make decisions can use Fabi to do the analysis. Right? So that's about Fabi. As for myself, I've been working in this data domain for quite a while, over a decade. I was trained as a computer scientist, and then I got my PhD in machine learning. While in graduate school, mostly I was loading CSV files and doing some research. And then later, like, I joined Yahoo. At that moment, big data, like, Hadoop, Spark, that was the main thing. And later I moved on to work on sales forecasting analytics and help with growth as well. So that's where I got exposed to NoSQL and SQL databases. Right? So the one common theme I have been experiencing is that I would normally work with other business stakeholders, could be marketing, could be operations, could be sales. On one hand, they have a lot of questions, like, say, coming to the data team and asking some questions. But our data team is normally always underwater. There are so many things to do, so we can never satisfy all the requests.
So that's why when I was at Lyft, I actually tried a few different attempts to really empower my business partners. We tried to do some SQL training for our marketing team. I would say it was a very limited success; probably only quite a few were able, after the training, to really, like, use SQL by themselves to do some analysis. But on the other hand, my team would build all these dashboards and reports, and then we'd check the logs. Most of the time, we'd build a dashboard for somebody for a certain type of question, but then it would never be used anymore. So whenever there was another question, they would still come to us. So that was almost, like, always a pain for my team, but also for my business partners. That's why in 2022, when AI really took off and ChatGPT was out, I said, man, this is the time, this is the moment to tackle the problem that my team and I had been suffering from for a long while. So that's how we started Fabi. It has been a fantastic journey so far.
[00:04:40] Benjamin: Okay. Nice. That's amazing. So I like this term you said in the beginning, like, vibe BI, right, which kind of makes perfect sense. And, like, you have such a prolific career in data across organizations. Right? Like, at Walmart, being chief data scientist at Clari, being director of data science at Lyft. But you were always in these, I think, very traditional BI worlds. Right? Okay, someone needs a specific dashboard. Someone needs a specific type of, I don't know, CXO report that they look at every morning to influence certain decision making. And then, all of a sudden, you wake up one morning and, at least in principle, any person in your organization should be able to answer any question about the data that's collected. It's like, how do you even build a data platform like that? What are you doing differently now with Fabi than you would have in a traditional BI platform? Like, I don't know, Looker, Tableau, Sisense, etcetera.
[00:05:32] Lei: Yeah. Very good question. So if you talk to anybody working in the BI space, like, self-service BI, that has been the term for maybe the past decade. But I have to say that is a false promise.
[00:05:43] Benjamin: Right.
[00:05:43] Lei: Yeah. Especially for data engineers. The one key point is that there's a lot of upfront cost in order to get this self-service BI to work. Right? So you have to define all the data semantics, potentially centralize the data and make sure the data is of good quality, define the semantics using LookML or maybe some other type of language. And then you want to encourage your business stakeholders, like, less technical partners, to use the product in order to do some self-service analysis. But on one hand, in most organizations, either the documentation or the semantics is quite missing. And on the other hand, in most organizations, the data, like the schema, the kind of business metrics and logic, has been constantly evolving. So it's really difficult to catch up with what is going on within the business. And then you will see the more senior the leader is, the more likely they would just refuse to go to a dashboard and check out the metrics. They would prefer, okay, just send me the numbers or send me an Excel file so I can work with that. Right? And, moreover, most of the time, these dashboards are very static. Of course, you can put in some input filters, but then people always have some customized, like, follow-up questions. Oh, there's something going on with some metric, can I zoom in on certain product lines or certain regions? But moreover, the BI dashboard is static. It focuses on one type of analytics, which is describing what is going on. But more likely, especially within the company, within the organization, they care more about the why questions. What can we do about it? Like, for example, last week, the metric went up by 10%, like, for new user onboarding. What's going on? Did we do something right, or is it something else? So there's a lot of, like, hypothesis testing.
That's why you have all the data scientists needing to zoom in and kinda say, oh, potentially it's because we launched a new product, or maybe sometimes it could be that the ETL pipeline has a bug. We need to kinda make sure the numbers actually make sense. So for the past decade, it's been really difficult to make self-service BI work. And now with AI, one attempt is to let people talk directly to the database, which I would say is highly not recommended, because enterprise data tends to be very noisy, very complex. On one hand, the data semantics, of course, like, you define all the database schemas, so you can get some information there. But more importantly, lots of this business logic is spread across different teams, different individuals. So if you just let anybody talk directly to your database, most likely you will get kind of a wrong result. Okay? Using all these AI and LLMs, they have no problem writing syntactically correct SQL queries, but the worst part is that it can run properly, but the numbers are wrong. Right? So you immediately, like, might lose some trust with the person, with the user. Moreover, as I mentioned, SQL is only one part of the analysis. It's almost like just the first hop, to pull the data in. But then afterwards, you might run some statistics, like, correlation analysis, and then potentially even pull in some machine learning model. Like, I want to forecast, based on the trend, what the numbers would be next month, next week. So AI today can actually make those very easy to use. So that's what, like, Fabi has been really focused on. We're saying that we really want the data team to be able to, like, say what type of data is exposed to, like, say, less technical folks. And then you can add your custom instructions and configurations, like, say, semantics and business logic.
And one difference compared with past BI platforms is that right now, you can define the data semantics or business context in a very fluent way. Like, you can just upload the context, and then the AI can really understand it well, rather than in the past, where you had to go through all these tables and configure them one by one. But right now, you can do the customization. And one step further is that the AI can learn from the interactions. Right? So depending on how people interact with Fabi, with the product, and also what type of queries, what type of metrics have been used in the dashboards, the AI can learn from the past interactions and then be able to answer relevant or similar questions in a very accurate way. So that's what we are super excited about, like, this vibe analytics. And now you can have the data team really focus on making sure the data is in a good form for the AI to use. But then the business stakeholders, the data practitioners, they can ask questions, but they have some boundary, like, which is set up by the data team, so they can trust, like, the model output. And most of the time, when you have a sense about the business, just by looking at the numbers or just by looking at the chart, you immediately get a sense saying, okay, this doesn't look right. And then the AI can figure out, okay, maybe it's doing some double counting in the joins, or maybe something else. Right? So I think it's really powerful to allow anybody to bounce ideas, like, go through multiple iterations quickly to get the numbers, get the analysis as one needs.
[00:11:07] Benjamin: But then, basically, like, in that world, for me to also understand where you think that is going: okay, we'll have a data team that basically procures datasets and important business metrics, and that might tell a tool like Fabi, hey, this view contains, I don't know, our, like, annualized revenue broken down by month. Whatever. Right? And, like, then Fabi basically gets that context provided by the data team. But when a CXO just asks random questions about their business, the goal of the vibe BI tool basically becomes to match that human question to the entities or underlying kind of things provided by the data team. So you're still ultimately in the business of building, like, a semantic model, data model, all of these things. But all of a sudden, it's not just this political thing within the organization. It really becomes that tool that empowers all of your AI vibe interactions with data. Is that accurate?
[00:12:09] Lei: Yeah. So in order to build AI-native BI, I would say the focus should be how humans interact with AI. That's why, with many of these efforts, like, for example, you wanna define a semantic layer, but right now with AI, you probably want to define it in markdown text, which is more appealing for the AI to consume. And then, combined with some of these MCP servers, you can pull in, like, external documentation or, say, Confluence pages or maybe tickets. All of these can be combined to provide the context. So the semantics itself becomes part of the context for the AI engine. And then on the other hand, as I mentioned, the AI itself should learn from past interactions. Of course, like, as the data team, you can configure, like, custom instructions, say, to really emphasize how a metric is defined or, like, say, to only focus on certain tables. Take myself: when I was working at Lyft, I searched for some revenue number, and immediately I got, like, 200 tables containing revenue. Right? So in the end, what I did in the past was talk to the person to really understand which table, which column I should use. But right now, you can just use free-form text to configure that and pass it to the AI as context. On the other hand, as I mentioned, the AI should be able to learn. It's like, you consider the AI should have some form of memory. By learning from these interactions, it could somehow remember, okay, this is the context. It's almost like you have a new hire in your team who gradually learns as they interact with the other coworkers.
[00:13:47] Benjamin: We're using Devin, for example, to make, like, documentation changes. That's similar. Right? Like, it learns over time, formalizes knowledge to do better code changes in the future. So it makes perfect sense to have a similar way of generating knowledge or context through interactions in vibe BI tools. Nice. Super interesting. So okay. Now I understand. Fabi connects to a bunch of data sources, maybe even, like, Google Sheets, Postgres, Snowflake, etcetera. It can produce data in multiple shapes as well. It can produce your Google Sheet. It can produce your Slack message, etcetera. Tell us about your internal data stack. What connects these two things? What do you run internally to connect the different pieces of technology? What's the data stack that basically powers Fabi as a product?
[00:14:36] Lei: Yeah. So the underlying tech structure is that we build this agentic flow for our AI system. So one part is, like, this context engineering. So it has a RAG system to retrieve from all these, like, different contexts. Could be database schema metadata, could be, like, past query examples. And, also, like, you can upload any documentation. So all of this would be part of the context to be retrieved for generating the answer. But on top of that, we are also working on this MCP server support so that you can essentially connect with your own MCP server depending on your organization, like, where your documentation lives. Right? So you can just connect with, like, a Notion doc, like, an MCP server, and then you can just, like, pass all this, like, semantic information or, like, some business context over. So that's one. Then the other part is that in order to make the agentic flow work, we have all these dry runs to ensure that whenever this AI generates some code, it could be SQL, it could be Python, it will run, and we verify, first of all, that there's no syntax error. And then it will dry run in a way so that it'll verify against the database or, like, table or API schema, and then be able to present you the result in the chat interface. Behind that, we have this containerized environment. So it's like a kernel environment, like, to allow you to run SQL, allow you to run Python. I think this is especially important because when talking about many of these organizations, security is actually a key issue. And then we want to ensure all this code is running in a containerized environment so that you don't need to worry about, oh, this accidentally, like, has some malicious code that wipes out kind of your database. I think there are many interesting stories about that.
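Fabi's actual dry-run machinery isn't shown in the episode, but the idea Lei describes, validating AI-generated SQL against the schema before any real execution, can be sketched roughly like this. Everything here is hypothetical (an in-memory SQLite stand-in for the warehouse, a made-up `orders` table); the point is just that `EXPLAIN` compiles the statement, so syntax errors and unknown columns surface without reading a single row:

```python
import sqlite3


def dry_run_sql(schema_ddl: str, query: str) -> tuple[bool, str]:
    """Validate generated SQL against a schema without touching real data.

    Builds the table structure in an in-memory SQLite database and asks the
    engine to EXPLAIN the query: syntax errors and unknown columns are
    reported here, but no rows are ever read or written.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(schema_ddl)    # empty tables, schema only
        conn.execute(f"EXPLAIN {query}")  # compiles the query, runs nothing
        return True, "ok"
    except sqlite3.Error as e:
        return False, str(e)
    finally:
        conn.close()


schema = "CREATE TABLE orders (id INTEGER, region TEXT, revenue REAL);"
print(dry_run_sql(schema, "SELECT region, SUM(revenue) FROM orders GROUP BY region"))
print(dry_run_sql(schema, "SELECT regionn FROM orders"))  # typo caught before execution
```

A real warehouse would offer its own equivalent (for example, a planner-only mode or a dedicated dry-run flag), but the shape of the check is the same: compile against the schema first, execute only if that succeeds.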
[00:16:24] Benjamin: When I run SQL in the Fabi environment, in this kind of container environment, what query engine powers that? Or is it the SQL of your source system? So it might be Snowflake SQL. It might be Databricks SQL. It might be PostgreSQL.
[00:16:40] Lei: So it really depends on the SQL dialect, because our tech stack is primarily in, kind of, Python. Across the whole BI platform, we strongly believe that BI should be stored as code. So we can do versioning, you can manage the workflow, how to run it properly. Right? So in the back, like, everything's converted into some form of Python code. And then to connect to the database or data warehouse, we essentially use SQLAlchemy, and then you can use the different dialects to pull data in. Once you pull it in, it becomes, like, a pandas or, like, a Polars data frame, and then you can do subsequent analysis or visualization as you want.
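As a rough illustration of that flow (not Fabi's actual code): dialect-specific SQL executes on the source system through SQLAlchemy, and the result lands in a pandas DataFrame for any downstream Python work. The SQLite URL and `users` table below are made up so the example is self-contained; a real deployment would use the Postgres, Snowflake, or BigQuery URL with the matching dialect:

```python
import pandas as pd
import sqlalchemy as sa

# Hypothetical connection URL; in practice this points at the user's
# warehouse, and SQLAlchemy picks the dialect from the URL scheme.
engine = sa.create_engine("sqlite://")

# Stand-in source table so the sketch runs end to end.
with engine.begin() as conn:
    conn.execute(sa.text("CREATE TABLE users (id INTEGER, plan TEXT)"))
    conn.execute(sa.text("INSERT INTO users VALUES (1,'free'),(2,'pro'),(3,'pro')"))

# The SQL runs on the source; the result comes back as a DataFrame,
# ready for joins against data from other sources, stats, or charts.
df = pd.read_sql(sa.text("SELECT plan, COUNT(*) AS n FROM users GROUP BY plan"), engine)
print(df)
```

Once every source, whether a warehouse table, a Google Sheet, or an API response, is a DataFrame, cross-source joins become ordinary `pandas.merge` calls rather than federated SQL.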
[00:17:18] Benjamin: Ah, okay. Interesting. So everything is basically pulled in from the sources through data frames. And then if I wanted to join a Google Sheet onto my Postgres table or something, that would actually not happen within the SQL ecosystem. It would happen more, like, within the Python ecosystem.
[00:17:34] Lei: Yes. And the one thing to add is that our kernels are actually stateful, so they keep all these variables in memory. On top of that, we also dynamically cache the data. You don't want to constantly send every query to a Snowflake or, like, a BigQuery instance. So it's all cached in a certain way: based on all these code blocks, we determine the dependencies and then see whether or not a piece of code actually needs to refresh. And then we select, like, say, oh, these code blocks need to run, this one actually can just reuse the cached result in memory, so that you don't need to worry about, like, blowing up your, say, data warehouse bills. And it also, like, keeps the latency really low. Especially when you're doing this vibe analytics, you're constantly on the go, like, trying out different ideas and seeing the result. Right? So this is actually really critical: you have low latency and are able to try out different ideas and run the analysis, especially when you are connecting different data sources together to run the analysis.
[00:18:37] Benjamin: Right. Nice. So super cool. Looking ahead now, right, like, a year from now, like, vibe analytics and so on: what are you excited about? What's gonna happen in the space? What are you guys working on at Fabi that you're excited to launch soon? Look a bit into the future. I know it's super hard in a space that's moving this quickly, but I would love to get your take on that.
[00:18:57] Lei: Sure. Yeah. So we believe that, first of all, within a couple of years, vibe analytics, like, vibe building, will be the norm for anybody to interact with data and run some analysis. And then we believe that, essentially, this BI system, or, like, AI BI system, will be more like an agent, and it'll actually look for, like, business opportunities and insights and surface them to you. Right now, you can see, like, all these BI dashboards are static. You have to go to the dashboard in order to see what's going on. But in the future, it would, say, look at all these dashboards, look at all these, like, metrics, and potentially surface things like, oh, there's something going on, I want to run some analysis for you, these are certain things you need to pay attention to. So you almost have somebody to regularly review your, like, product health, business health, and then be able to surface the insights and the opportunities for you. And then one more thing is the interface between AI and human beings. We believe there will be some new invention, like, a new form will be coming. Like, today, we are still getting used to this chat interface and, like, some form of static BI dashboards, but it could be way more interactive and dynamic. One idea is that you may kinda have some debrief, like, summarize everything into a podcast and deliver it to you, or maybe, like, generate an exec summary in a few slides. And so we believe that a lot of things will be coming that will, I would say, reshape how people interact with data and be able to deliver the insights as they want.
[00:20:31] Benjamin: Nice. Yeah. It's a super interesting space, Lei. I look forward to continuing to just watch what you build at Fabi. Excited to see how you also help build the future around vibe analytics. Would love to stay in touch. Thank you so much for being on The Data Engineering Show. It was a pleasure having you.
[00:20:46] Lei: Thank you for having me. Yeah. It's a really nice conversation. Yeah.
[00:20:50] Benjamin: Awesome. I feel the same.
[00:20:52] Outro: The data engineering show is brought to you by Firebolt, the cloud data warehouse for AI apps and low latency analytics. Get your free credits and start your trial at firebolt.io.