In this episode of The Data Engineering Show, the bros talk with Sumit Gupta, Lead BI Engineer at Notion, about his journey through prominent tech companies, modern data stacks, and how AI is revolutionizing data workflows and professional development.
Sumit - 00:00:00:
AI has made me a lot more productive, but at the same time, it has also made me dumber. Whereas Claude is particularly great at programming tasks. Perplexity is amazing at deep research. I'm a lead BI engineer at Notion, and I lead reporting and dashboarding for marketing and sales teams. Right before Notion, I used to work for Snowflake. I worked for them for a couple of years. If you don't jump onto the bandwagon right now, you might be left out in a year or so.
Intro - 00:00:30:
The Data Engineering Show is brought to you by Firebolt, the cloud data warehouse for AI apps and low-latency analytics. Get your free credits and start your trial at firebolt.io.
Benjamin - 00:00:42:
Hi, everyone, and welcome back to The Data Engineering Show. Today, we're super happy to have Sumit on. Sumit is a lead BI engineer at Notion right now and has been there for a bit more than a year. Great to have you on the show. Do you want to quickly introduce yourself to our listeners?
Sumit - 00:00:58:
Yeah, thanks a lot for having me, Benjamin and Eldad. I've been so excited to jump on this podcast and talk to you guys about everything AI and a bit about me. So as you mentioned, I'm a lead BI engineer at Notion. And I basically lead reporting and dashboarding for the marketing and sales teams here at Notion. Right before Notion, you probably will see a Snowflake logo on the video.
Eldad - 00:01:23:
No, we haven't noticed.
Sumit - 00:01:26:
Yes. And then you have like a Hello Data Nation t-shirt in here. So right before Notion, I used to work for Snowflake. I worked for them for like a couple of years. And then before that, I was at Dropbox leading their marketing analytics team. So overall, I have great experience being part of, I like to use this term, Bay Area Darlings. So back in 2017, 18, Dropbox was the Bay Area Darling, right? And then Snowflake became the Bay Area Darling because of, you know, the IPO, etc. And now if you think about it, Notion is the Bay Area Darling. So I don't know. I have some affection for darlings, I guess.
Benjamin - 00:02:03:
Nice. That's awesome. So what got you started in data kind of like in the first place, right? Kind of like tell me, maybe a bit about that journey.
Sumit - 00:02:13:
I graduated from Mumbai University in 2014. And then right before my senior year, I was exploring what my career choices could be. You know, a typical 18-year-old who's confused, right? And this is before the age of AI. I mean, a lot of folks nowadays have it easy, where you could just go to ChatGPT and be like, you know what, help me decide my career. But back in 2014 in India, there wasn't a lot going on in terms of tech. There was obviously software engineering as a career. But I realized that software engineering isn't for me because I love talking to people. I wanted more interaction. And I did not want to be an out-and-out management consultant either, because I knew for a fact that tech has to play a very important role in my career. So as I was exploring, I was like, you know what, Information Management is something that I should probably pursue. And then I decided to pursue my master's in that from Syracuse University in the United States. And the reason for Information Management in the United States is because it has that mix of data and management consulting and yet being very close to tech. So that's how I got introduced to tech. And since then, I've been in data slash tech for over a decade now.
Benjamin - 00:03:29:
Amazing. That's awesome. So tell us maybe about kind of like the, at least at a high level, kind of the work you're doing at Notion right now. Kind of like what, what are you working on? Kind of how's your data stack? Give us an overview.
Sumit - 00:03:43:
Yeah. So Notion, if you think about it, is relatively new. Although Notion was, I think, started in 2014, Notion got really popular due to the COVID boom, right? If I recall correctly, until like 2020, we had close to about a million or a couple of million users. Now, I don't know if you saw the news or if you saw our co-founder tweet about it. We have over a hundred million users now.
Benjamin - 00:04:10:
Wow, congratulations. That's so impressive.
Sumit - 00:04:13:
Yeah, yeah.
Eldad - 00:04:14:
That's amazing.
Sumit - 00:04:15:
It's wild, you know, the kind of following that we have, both B2C and B2B, right? So if you think about it, going from a couple of million users to 100 million users, if your data stack is not nimble or if it's not modern, right? That's the term that we want to use. If it's not modern, you're going to be stuck, right? Your team will not be able to make any relevant decision in time. So that basically means our stack includes some of the most common modern tools like Snowflake. We use Fivetran for ingesting some of our data. Obviously, we have some custom pipelines set up too. We use Airflow for orchestration. And then we use Tableau as well as Hex for our reporting and dashboarding. I'm an out-and-out Tableau guy. I mean, not to diss other tools, but Tableau is great, right? One of the reasons I love Tableau is the fact that it has a great community.
Eldad - 00:05:09:
By the way, you know, if you love Tableau, you know Tableau went through a long history of improving their query engine behind the scenes. And there was an acquisition done by Tableau of a startup, Hyper, and that query processing team is now driving the Tableau and Salesforce query stack. It's an amazing team. And the reason I mention it is all of them are coming from Munich. And some of them are at Firebolt. So there's a lot of geeky query processing stuff happening in Munich that drives many, if not most, of today's data warehouses and data solutions. So we have a special place in our heart for Tableau as well.
Sumit - 00:05:47:
Yeah, did not know that. But I know for a fact that the Hyper engine that Tableau has, it has changed a lot. Like, you know, nowadays when I import or extract, like, 20, 30 million rows, it goes by really fast, right? Because of Tableau's Hyper engine. So, going back to the topic of modern data stacks, our BI reporting layer is Hex and Tableau. I don't know if you guys have heard of Hex, but the way I'd like to think of Hex is like Jupyter Notebook, but on steroids. It does everything that you can imagine, right? Python, Spark, SQL, Matplotlib. When you think of Python, Matplotlib, your pivot tables, your charts and everything. So we have that. Nowadays, we also have started moving some of our data loads to Iceberg tables. That's the new thing in the market, right? A lot of folks are moving to Iceberg tables because of the benefits that Iceberg offers.
Benjamin - 00:06:41:
So are you leveraging kind of the key benefits of Iceberg already in terms of having flexibility between query engines?
Sumit - 00:06:48:
Yes. So I primarily don't work with Iceberg, but I know for a fact that our data platform team does. So I'm not an expert to talk about that. But I think the whole point of Iceberg tables was that there's a component of saving cost, right? Because of the whole metadata layer and how the data moves into the actual data layer. That's why we decided on it. Because we use Snowflake as our analytical data warehouse, but we also have an S3 bucket in our data lake sitting in AWS, right? So if you imagine, as I mentioned, when you grow so fast, your data needs are growing so fast. Every bit, right, is expensive. Although servers are cheap, when you're dealing with 100 million users and trillions of rows of data a day, you have to find that one percent saving, two percent saving, those marginal savings. And Iceberg helps us with that.
Benjamin - 00:07:41:
Okay, super, super cool. And a very modern stack as well. Kind of like using a lot of cool new tech, I mean like Iceberg, Hex, kind of all of that. That's all.
Eldad - 00:07:51:
At the front of the modern stack, even the modern stack, right? That starts to evolve as well. We see the modern stack getting new definitions across each component. It's beautiful to see. We love Iceberg. Iceberg makes all metadata and data the same for every vendor, for every query engine, every ETL, ELT tool, every user. Yes, there are many challenges. Yes, there are many gaps, and that's why we're all here. And it's amazing to see the evolution happening with Iceberg within information and analytical teams. Amazing, go on.
Sumit - 00:08:29:
Yeah, and then, sorry, I forgot to mention... If you think about the modern data stack, there is one tool that brought this to the forefront, right? And that's dbt. We use dbt too. We are a heavy dbt user. You cannot forget dbt because the whole term around analytics engineering and the modern data stack, I think they were the ones who promoted this really heavily. Because before dbt, right? So I was at Dropbox and we used to have our own version of Hive, right? And then there were times when it would take at least 18 to 20 hours to run a query. Because you can imagine, this is Dropbox back in 2019, 2020, right? Dropbox is and was huge. We did like a couple of billion dollars of revenue a year. And now with dbt, et cetera, right? You could have incremental models and, you know, basically put things in Snowflake, right? Obviously, when you think of Snowflake and Hive, Snowflake will be a lot faster, although there's a cost associated. But sometimes either you invest in time or you invest in dollars. In the Snowflake case, you invest in dollars because you save time.
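The incremental-model idea mentioned above can be illustrated in plain Python. This is a minimal sketch of the general high-water-mark technique, not dbt code and not Notion's or Dropbox's pipeline; the function name `incremental_load` and the `updated_at` column are hypothetical names chosen for illustration.

```python
def incremental_load(target, source, ts_key="updated_at"):
    """Append only the source rows that are newer than anything already loaded.

    `target` and `source` are lists of dicts; `ts_key` names a column whose
    values sort chronologically (ISO date strings here, an assumption).
    """
    # Latest timestamp already present in the target table, if any.
    high_water_mark = max((row[ts_key] for row in target), default=None)

    # On the first run the target is empty, so every source row is new.
    new_rows = [
        row for row in source
        if high_water_mark is None or row[ts_key] > high_water_mark
    ]
    target.extend(new_rows)
    return len(new_rows)


target_table = []
source_table = [
    {"id": 1, "updated_at": "2024-01-01"},
    {"id": 2, "updated_at": "2024-01-02"},
]
first_run = incremental_load(target_table, source_table)

# A later run only picks up the row that arrived since the last load.
source_table.append({"id": 3, "updated_at": "2024-01-03"})
second_run = incremental_load(target_table, source_table)
```

The point of the pattern is that each run touches only new rows instead of rebuilding the whole table, which is where the time savings come from.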
Benjamin - 00:09:34:
In your data stack, like how is AI kind of like shaping it? Like what's changing? Kind of what's your take on all of it? Give us an overview.
Sumit - 00:09:43:
Yeah. So we at Notion use AI heavily. We have our own AI for Work tool, which is built into Notion. I don't know if you guys use Notion or not, but at companies that use Notion with AI for Work, basically everyone in the company is supposed to use AI for Work. It's like deep research. It has all the context about your data. So you can connect your Slack. You can connect your Google Drive. You can connect your other third-party apps into Notion now. And then, let's say you have a question about when was the last time a specific metric was updated, right? Notion's AI for Work will do that for you. And outside of Notion, we are a heavy AI user as a company. Everyone in the EPT department, or pretty much in the company, has access to Cursor, right? The coding, I wouldn't say agent, but editor, right? So we use Cursor a lot nowadays to help us speed up our productivity. And then we have been building a couple of GTM or go-to-market, I wouldn't say tools, but use cases. One of the use cases that we have built is for whenever we are going to talk to our customers, especially renewal customers. If they are up for renewal, we have this AI agent or AI workflow set up where you select a customer and then it will give you all the details about total number of users, active users, and users who have never logged in in XYZ days. Plus we also ingest our call transcripts and the Slack messages that the sales reps have, and then create a short summary of what has been discussed in the last 30 days, last 90 days. Were there any blockers? Were there any pointers that we can use? So that you have a one-page document with which you can go talk to the company. Because when you go for a renewal conversation, right? If you have data to back it up, you can be like, you know what, 80% of your company is using Notion. So it's probably prudent or it's probably wise to, you know, renew it with us, right? If you were to move to a new tool, there would be a lot of headache, et cetera. So the fact that we could go have that conversation, with all that context, AI is the reason for that.
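The usage numbers in that renewal one-pager can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Seat` record, the `renewal_summary` function, and the 30-day inactivity window are hypothetical and do not describe Notion's actual workflow.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class Seat:
    user: str
    last_login: Optional[date]  # None means the user has never logged in


def renewal_summary(seats, today, inactive_days=30):
    """Summarise seat usage for one customer account."""
    cutoff = today - timedelta(days=inactive_days)
    # A seat counts as active if its last login falls on or after the cutoff.
    active = [s for s in seats if s.last_login is not None and s.last_login >= cutoff]
    total = len(seats)
    return {
        "total_users": total,
        "active_users": len(active),
        "inactive_users": total - len(active),
        "adoption_pct": round(100 * len(active) / total, 1) if total else 0.0,
    }


seats = [
    Seat("ana", date(2025, 5, 30)),
    Seat("bo", date(2025, 5, 15)),
    Seat("cy", date(2025, 5, 20)),
    Seat("di", date(2025, 5, 2)),
    Seat("ed", None),
]
summary = renewal_summary(seats, today=date(2025, 6, 1))
```

In this made-up account, four of five seats logged in within the window, which is the kind of "80% of your company is using it" figure a rep could bring to the conversation. Summarising call transcripts and Slack messages would be a separate step and is not shown here.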
Benjamin - 00:11:56:
Very cool. Okay. That's exciting. How, how do you stay on top of like all of the stuff happening in AI in your personal life as well?
Sumit - 00:12:03:
I think my mode of AI consumption has been podcasts recently. I was recently on another podcast called the Everyday AI Podcast. Jordan does like 20 minutes of AI news every morning. So whenever I wake up, I listen to that. And then a lot of keynotes nowadays, right? Like recently Google had Google I/O, right? And I would watch that and see what's the latest and greatest. Claude recently launched Claude 4.0, right? So whenever some new model comes along, I am first to jump onto it, right? And basically test it out and see what's working. And you wouldn't believe it, I have premium access to pretty much all the famous LLMs, be it Claude, be it Perplexity, be it ChatGPT, right? And my use case... Every tool is great at something. For me, ChatGPT is for when I want to, let's say, rewrite an email or need some text written or, you know, want to review some document, etc. Whereas Claude is particularly great at programming tasks. Right? Perplexity is amazing at deep research.
Eldad - 00:13:09:
Who does the cooking? Who is cooking for you?
Sumit - 00:13:13:
Yeah. No, I mean, Perplexity, I guess. Because with Perplexity, right? I think Perplexity was the first tool which allowed you to do deep research where you had direct access to the search engine, right? ChatGPT and Claude, they recently launched deep research, right? Gemini obviously does too. But Perplexity is where you could go and be like, you know what... One of the use cases: I was recently searching for universities for my niece, who was applying to universities in the U.S. And I was like, you know what? This is the score that she has received. These are, you know, her SAT scores, her other scores. Go help me find 20 universities that have a good acceptance rate. When I was applying, that was like one week of effort. Now it's not even seven minutes of effort.
Benjamin - 00:14:00:
That's very, very cool. Nice. How do you think the data space as well is going to evolve over the next couple of years? What are you particularly excited about? What problems do you think will start going away? Tell us about that.
Sumit - 00:14:18:
I actually like to divide that into two parts. One is, if you are someone who's starting anew in the data field, right, as an entry-level data scientist, analyst, or engineer, I would say the value of your technical skills, which used to be very valuable until 2021 when there was a boom in hiring, has decreased, right? Your tech skills are valuable, right? But as an entry-level person, you don't know, if something breaks, how to fix it. That's where your transferable skills or your soft skills come into the picture, right? So if you are to grow in a data career, make sure that your transferable skills are outshining your technical skills nowadays, right? That's the entry-level part. But if you're a senior, let's say for someone like us, who has like seven, eight, 10 years of experience, right? Your tech skill is important. You know, when things break, where to go and how to look and what to fix. But your transferable skills are going to be paramount in future too, right? Especially at Notion and a lot of other companies that I've heard of, right? A lot of CEOs nowadays are coming out and saying, use AI. If you don't use AI, you're going to be out of a job soon, right? So in that case, right? Nowadays I use AI whenever I'm building a Tableau dashboard, etc. There were times when it took a couple of days to write a calculated field, if it was complicated, because Tableau has its own way of calculating, right? Now when I'm stuck, I just go to Claude and say, you know, write me a calculated field, and I get the calculated field done in 20 minutes. So the value that I used to bring as a technical guy has diminished. But if that gets diminished, I have to increase the value that I bring as a professional with great communication skills, right? Great stakeholder management. So you kind of have to build on that. So I don't know if that answers your question, because data is important, but the role of data as a tech skill...
Eldad - 00:16:11:
Be technical goes down, be nice goes up. You know, that's it. It's really fascinating to hear from someone on the inside, experiencing AI full-blown, in your face. Not theorizing about it, but really practicing it, turning it into your personal career advantage, figuring out how to build your strength around it. Thank you for sharing it with us.
Sumit - 00:16:39:
Yeah, no, at Notion I use AI heavily in my work, but in my personal life too. As I mentioned, I was recently on the Everyday AI Podcast, and the topic that we discussed there was that AI has made me a lot more productive, but at the same time, it has also made me dumber. Because I have-
Eldad - 00:17:00:
Who would have thought? Who would have thought getting dumber gets you more productive? Who would have thought? That defies physics, defies everything we've learned.
Sumit - 00:17:08:
Exactly.
Eldad - 00:17:09:
30 years of industry experience. I'm too old. I'm like, I've learned, you know, like, previous generation physics. Everything is changing.
Sumit - 00:17:16:
Yeah. And then I gave a couple of examples there. And honestly, after that call, I've been trying my best to not allow AI to drive my life. It's good. AI allows me to research. But I have a back-of-the-hand rule nowadays where I use AI for repetitive tasks, right? To give you an example, I have an Instagram channel by the name Data by Sumit, right? It has around 21,000 followers, et cetera, right? So previously I used to have a social media content strategist and content researcher. We used to sit down and, you know, research content ideas for me. Now I have a Make.com workflow where I have identified a list of, let's say, 100 Instagram users, right? Who are, I wouldn't say competitors, but who are in my niche, right? And then I use Apify's API. I've also tested RapidAPI. I feed the users into Apify, and Apify's API basically helps me extract, let's say, the last 30 days of data for each profile. And then I have a GPT-3 model as part of the workflow. I have a custom formula where, based on number of views, number of likes, and number of comments, I create a custom column, like a calculation of high, medium, low: should I be writing about this or should I be scripting about this? And once that cutoff is met, the second part of the GPT-3 model goes and transcribes the reel and then learns from it and then comes up with a new script for me.
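The scoring step described above could look roughly like the following Python sketch. The actual formula, weights, and cutoffs are not given in the conversation, so everything here (the `Reel` record, `engagement_score`, the weights, and the `high`/`medium` thresholds) is a hypothetical stand-in; fetching data from Apify and the transcription step are not shown.

```python
from dataclasses import dataclass


@dataclass
class Reel:
    url: str
    views: int
    likes: int
    comments: int


def engagement_score(reel, like_weight=1.0, comment_weight=3.0):
    """Weighted interactions per view (weights are illustrative assumptions)."""
    if reel.views == 0:
        return 0.0
    return (like_weight * reel.likes + comment_weight * reel.comments) / reel.views


def bucket(score, high=0.10, medium=0.05):
    """Map a score to the high / medium / low column (cutoffs are assumptions)."""
    if score >= high:
        return "high"
    if score >= medium:
        return "medium"
    return "low"


def reels_worth_scripting(reels):
    """Keep only reels that clear the 'high' cutoff and move on to scripting."""
    return [r for r in reels if bucket(engagement_score(r)) == "high"]


reels = [
    Reel("reel-a", views=1000, likes=80, comments=10),
    Reel("reel-b", views=1000, likes=40, comments=5),
    Reel("reel-c", views=1000, likes=10, comments=1),
]
labels = [bucket(engagement_score(r)) for r in reels]
shortlist = [r.url for r in reels_worth_scripting(reels)]
```

Normalising by views is one design choice among many; it keeps a large account's reels from dominating simply because of reach.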
Eldad - 00:18:44:
Wow. This is when information work becomes operations. Like, you're turning, like, your usage of data with those tools, you- This is operations. You're driving a business. You're driving a unit. You're moving crazy. Amazing.
Sumit - 00:19:02:
And honestly, that's how AI is supposed to work or supposed to be used. Someone recently asked, do you think AI will take over the world, that kind of conversation? I'm like, maybe, yes. But for now, I mean, we don't know. If it does, everyone would be affected, not only me. But right now, AI is helping me improve my workflow and get better at it. Something like getting four or five new content ideas used to take me a week. Now I just have to click a button and get like 50 different new ideas based on what's working right now, over the last seven days, last 30 days. And then I get a script too. So, like, an inside scoop. One of the things that I'm doing now is creating another workflow where, based on the script, right? I'm using HeyGen and ElevenLabs to basically replicate myself now. My voice and myself, right? So that's the next step in the process, where if I'm not there.
Eldad - 00:20:01:
Are you real? Are you real?
Sumit - 00:20:03:
I am real. I can pinch myself.
Benjamin - 00:20:06:
Six fingers. Six fingers.
Sumit - 00:20:10:
I mean, I wasn't doing a lot of big motion. And, you know, when you don't do big motion, HeyGen really works. So you guys probably don't know.
Benjamin - 00:20:22:
In the beginning, you were like, oh, kind of fiddling with the laptop, fiddling with the phone. But really, you were just preparing your AI self.
Sumit - 00:20:31:
Exactly. Exactly. That's it. That's it.
Eldad - 00:20:34:
The free trial was over. Enter the credit card, subscribe, activate the agent. Nice. Amazing. Thank you.
Benjamin - 00:20:43:
Yeah.
Sumit - 00:20:44:
No, I mean, as I said, I think for anything which can be automated, especially a repetitive task, AI is the way to go. And honestly, there are a lot of AI content creators nowadays that you can learn from. And a lot of them are actually offering free workflows, Make.com workflows and n8n workflows too. So if you are into AI, if you are jumping into AI, this is the right time. And the scariest part about the whole AI boom is this is the worst AI will ever be. We got impressed when ChatGPT was launched, and initially I was a skeptic. New hype. And then I started using it. I'm like, what the hell? Like, how is GPT so good? Like, it doesn't make sense. And now when I think of it, I'm like, this is the worst it will ever be.
Benjamin - 00:21:31:
Nice. I also love how like as an AI content creator, you're at like the front of content creation. And we're getting close to AI content creation being done by AI. Like, we're very close to actually like coming full, full circle and just having AI create content about itself.
Sumit - 00:21:53:
But there's also a moral aspect to this, where I'm like, I want to do the HeyGen and ElevenLabs workflow. But if I was following a content creator that I love and if I knew that all he is is an AI nowadays, I would be pissed. So I'm like, I can do it. If I sit for a weekend, I can set the workflow up. But I've been intentionally delaying it. I wrote a book about Tableau, The Tableau Workshop, right? It sells really well. But one of the reasons I wrote it was that, if I was to read this, I should be happy about it. Like I should learn something. And that is how I view my life. I will only do something if, were I in their shoes, I would be okay with it or happy about it. So if I build my own AI avatar, right, am I going to be happy following that content creator? Probably not.
Benjamin - 00:22:43:
Nice. That's, yeah, that's a super interesting take. I think we've had a lot of content creators kind of on the show already, but like this is definitely the most AI forward kind of take I've heard so far.
Sumit - 00:22:55:
Yeah, I mean, as I said, I think AI is great, both in professional as well as personal life. And there's a con to AI too. So my wife's been searching for a software engineering job. She's an entry-level software engineer. It's been hard. Even in the Bay Area, getting a software engineering job has been hard. I have so many friends who had offers and then their offers got rescinded because the company decided, you know what, we don't want entry-level engineers. So as much as I am utilizing AI for the good part, I have the bad-part use case also. Because if my wife was searching for a job back in 2021, she would already have the job. So yeah, there's the good part, there's the bad part. It's all about balancing it. And hopefully there's that 51, 49% chance of, you know, you using AI for the good part more than the bad part.
Benjamin - 00:23:43:
Cool. Wow. Yeah, this conversation really kind of twisted my brain, so crazy. Before we wrap up, right? Anything else you want to share with the audience, with the listeners, anything you wanted to chat about?
Sumit - 00:23:59:
Like I think... Even at Notion and Postman also, there has been a lot of focus on AI-heavy workflows, right? So if you are someone new, especially in data, scared of AI or still skeptical of AI, I would say jump in. There are a lot of tools, a lot of resources nowadays. And even if you don't want to use it for your personal or professional work, use it as a research tool or a learning tool, so that when you come across AI, you are able to recognize AI. My eyes are trained now, so when I look at something, I'm like, okay, this is AI, right? If you are not trained, right? If you're not aware of what AI's pros and cons are? And you know, when you look at an AI video, it's easy to make out. If you look closely, the eye movement, the hand movement, right? If you're not trained, you might get flummoxed, right? You might get confused, right? So that would be my tip. If you're outside trying to get in and still are not happy about AI or skeptical about AI, jump in. You don't have to use AI in your personal or professional life, but... Trust me when I say this, if you don't jump onto the bandwagon right now, you might be left out in a year or so.
Benjamin - 00:25:11:
I think that's a great closing remark. Thank you so much for being on the show, Sumit. It was great having you, and we look forward to catching up in the future and kind of seeing what fraction of your content starts getting AI generated.
Sumit - 00:25:27:
Absolutely. Thanks a lot, Benjamin and Eldad. If you guys are in the same town, drop me a message and we can grab a coffee or a beer.
Benjamin - 00:25:34:
Would be amazing. Thank you. Bye.
Sumit - 00:25:37:
Bye.
Outro - 00:25:39:
The Data Engineering Show is brought to you by Firebolt, the cloud data warehouse for low-latency analytics. Get $200 credits and start your free trial at firebolt.io.