Data engineering from the early 2000s till today - BlackRock
June 8, 2023


When it comes to data management, have we come a long way since the early 2000s? Or has it simply taken us 20 years to finally realize that you can’t scale properly without data modeling. With over 20 years of experience in the data space, leading engineering teams at Cisco, Oracle, Greenplum, and now as Sr. Director of Engineering at BlackRock, Krishnan Viswanathan talks about the data engineering challenges that existed two decades ago and still exist today.

Listen on Spotify or Apple Podcasts

Benjamin: Cool, all right. So welcome back everyone to The Data Engineering Show, good to have you. It's the second episode in a row now without the other data bro, Eldad, because he's out today, but he'll be back in the next episode. Anyway, let's jump right in. It's a total pleasure having Krishnan here today. He's a Senior Director of Data Engineering at BlackRock, and he's done everything, really, so it's cool having him on. He started out at Cisco and spent 10 years there, then moved to the vendor side as a principal PM at Oracle and later as a director of product management at Greenplum. And most recently, he's been Senior Director of Data Engineering at BlackRock. So great to have you on today, Krishnan. I look forward to talking about the data challenges you're facing, your background, and everything. Do you want to give a quick intro of yourself?

Krishnan Viswanathan: Awesome. Thanks, Benjamin, for having me. This is a pleasure. Like you said, I've been in the industry for almost a quarter of a century, so now that I think back, it's been a long time coming. But I'm really excited to be part of this conversation. The data space, the data explosion of the last couple of decades, and all the modernization have put a sense of déjà vu into the data organization: every day seems like a new day, but in some ways it also seems like Groundhog Day. So I would love to share some of the things that I've learned. Back to my background: I started off as a data engineer, actually more by serendipity. My first project was an executive information system for Cisco, a web-based application, and at the back of it we realized that we actually needed a data warehouse, a data model, data cleansing, and all the things that go with data pipelines to get the numbers right. So that was my aha moment.

Benjamin: What year was that roughly?

Krishnan Viswanathan: That was 1996.

Benjamin: Wow, so take us through the stack back then. Did you have a lot of options? What types of systems were out there at the time?

Krishnan Viswanathan: Yeah.

Benjamin: And I guess it was all on-prem as well, right?

Krishnan Viswanathan: This was all, well, think about this: 1993 was when the internet became prevalent, and Cisco was the poster child, the main vendor pushing the pipes and the routers for the internet. When I joined Cisco in '96, the executives used to get their reporting in Excel. Every week they would get a high-level number, and pretty much that's it, no other information. Then they would make phone calls to all the different departments and groups, and by the time they got to a problem statement, the problem was either already solved or it had escalated to the extent that it was too late to solve, whichever way it went. So the premise in '96 was: hey, we are the pioneers of the internet, we should build internet technology that can showcase what we can actually present to the executives. Out of that was born the executive information system, we called it EIS, the first generation of it. And it was homegrown. I was one of the first four engineers and eventually the lead engineer there. We built it based on an Oracle ERP extract, feeding a large-scale, single-instance Oracle data warehouse. It had its own challenges scaling, but that's what it was. The front end was a purely web-based application, and this was pre-Java, pre-EJB; it was all pure HTML. At the back end we would run this extraction process and build a data staging layer, for lack of a better word, because I was not a data engineer at that time, and I would do so many things differently now. Then we would run a process between that and the web server to create all these different HTML pages, because HTML was limited, right? Web content and web configurations were very limited. There wasn't real dynamic XML parsing; I think XML was just coming up to speed, but not even that. Java EJB was not part of the picture, no servlets, if you think about those things. So it was all pure HTML and a PHP-based query process. The expectation, since it was executive reporting, was that they would look at the overall top-line number and drill down by region or by product, eventually to a sales rep and to a transaction, so if there was an issue or a question, they could go all the way down. The process itself in the beginning took about six hours for us. And reliability was a whole different problem, I will talk about that later. Data accuracy was another level of challenge that we solved in subsequent releases. But the first version of it, for all the good things that we did, had its own challenges. Part of it was immature technologies and band-aiding them together to make it work. And the other part of it is, of course, we also could have done a better data design process

Benjamin: Right?

Krishnan Viswanathan: and a better data alerting process. But those were fun days, and it was a hugely successful endeavor. If I were to just talk about the outcome, the appreciation we got was such that it was actually showcased by Cisco as one of the pioneering projects that helped what Cisco in those days used to call the virtual close. It used to take companies about a month to close their book of records. Using this application, and this was one of the key applications, I won't claim it was the only one, Cisco said they could actually close their books within a day. So.
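As an aside, here's a minimal sketch of the kind of nightly extract-to-static-HTML step Krishnan describes, pre-rendering one drill-down page per region from a staging layer. The staged_bookings table and all names are hypothetical; this is illustrative, not Cisco's actual code.

```python
import sqlite3  # stand-in for the Oracle staging schema of the era
from pathlib import Path

def render_region_pages(conn: sqlite3.Connection, out_dir: Path) -> None:
    """Pre-render one static HTML page per region from staged bookings."""
    out_dir.mkdir(parents=True, exist_ok=True)
    regions = conn.execute("SELECT DISTINCT region FROM staged_bookings")
    for (region,) in regions.fetchall():
        rows = conn.execute(
            "SELECT product, SUM(amount) FROM staged_bookings "
            "WHERE region = ? GROUP BY product ORDER BY 2 DESC",
            (region,),
        ).fetchall()
        body = "".join(
            f"<tr><td>{product}</td><td align='right'>{total:,.0f}</td></tr>"
            for product, total in rows
        )
        html = (f"<html><body><h1>Bookings: {region}</h1>"
                f"<table border='1'>{body}</table></body></html>")
        # One pre-built file per drill-down path; the web server just serves it.
        (out_dir / f"{region}.html").write_text(html)

# Hypothetical demo data:
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staged_bookings (region TEXT, product TEXT, amount REAL)")
conn.executemany("INSERT INTO staged_bookings VALUES (?, ?, ?)",
                 [("EMEA", "router", 120000.0), ("EMEA", "switch", 45000.0)])
render_region_pages(conn, Path("./eis_pages"))
```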

Benjamin: Wow.

Krishnan Viswanathan: From the vision to what we could accomplish, notwithstanding all the disclaimers I put in, it was a success. We actually got funding to make it better, and we did do better: the second iteration was a whole lot better, a much better concept. That was the genesis of my data engineering role. Actually, let me talk about the second version, because I led that project, which we called E-Exec. Part of that project was to find a better capability, a better tool, and this came around in 2001. So if you have followed the internet bubble and the burst of the internet bubble in the early 2000s, I don't know, Benjamin, if you were there, but

I was riding the whole Cisco bubble, and I still get nightmares about that, because one of my roles in this project was this: we would wait for the SVP at Cisco to go in and provide a confirmation on forecasting and judgment, which basically meant, yes, I agree with this forecast and this number is going to look good. I was the one who would then say, okay, it's done, and kick off the process, right? The data engineer, making sure that everything gets done. The financial analyst would tell me, and then I would make sure the process ran and validate that everything was working. I still remember that day in early October of 2000 when there was a huge downward revision of the forecast compared to what I had seen in the past. The forecast was always going up, up, up, and then suddenly there was a fairly deep decline. I thought, something doesn't look right. But I was still an engineer, I had to get the job done, so I just did it and went off. Three months later, you saw the whole internet crash and Cisco coming down from its huge bubble. So I lived that downturn, and I lived it very vividly, both in personal financial terms and at Cisco. Remember I said we assumed this application would give executives an early warning, a good indicator of how the business was performing, so they could do a virtual close within a day. The application gave them good information, but it didn't give them the exact alerting they were looking for: proactively telling them, hey, something is wrong, you're going to miss this. There were business reasons for that too, but from a technology point of view, it didn't do that. And that became the next version. So for the next two years, from 2001 to 2002, I spent a whole lot of time digging into the data to understand how we missed all of those indicators, because it was a big impact on Cisco: we had layoffs, and our stock price never recovered; even today, it hasn't recovered to those bubble-era levels. I did a lot of work on that, and we came to the conclusion that our engine was not a real engine; it was just a reporting engine, not an alerting engine. So we decided that was important, and we needed a more dynamic and scalable model so that we could easily change some of these parameters. Toward that end, we chose a technology called Siebel Analytics, I don't know if you're aware of that. Siebel Analytics was one of the products Siebel had brought in. It was supposed to provide automated, dynamic dashboard creation, fast and easy, it had alerting capabilities, and it could deliver to multiple devices. This was right at the time when mobile phones, not smartphones, just mobile phones, were getting more prevalent: pagers, mobile phones, PDAs; I think BlackBerrys and some of those things were prevalent at that time. I'm dating myself here. We did an analysis, I brought that product in, we delivered on that application, and it became another huge success. So that was the genesis: out of the background work and data validation, we had a second version.
And that application, if I'm not mistaken, was named by Intelligent Enterprise as one of the best BI applications of its time; I think it was 2004, if I'm dating it correctly. We got that recognition jointly, Cisco and Siebel. Eventually, Siebel got acquired by Oracle, and hence my transition from data engineer at Cisco to product manager at Oracle, because I was one of the people who understood the ins and outs of that particular product, which became Oracle BI.

Benjamin: So you told us about that stack in the very early days: this Oracle database generating static HTML and loading it in the browser. Take us through the data volumes here, if you still remember, because this was big data back in the day, right?

Krishnan Viswanathan: Yeah.

Benjamin: What was big data actually, back in the late nineties and early two thousands?

Krishnan Viswanathan: Cisco was actually pioneering that. So, big data in those days: with Oracle applications on SMP infrastructure, you could only scale so much. I think it was one single box, 32 or 64 cores, I don't remember now. It probably had about 256 GB of memory, maybe even less, I have no idea. And these were expensive machines, on the order of a million dollars for that one single box, right?

Benjamin: Just in terms of the availability of that type of hardware. Nowadays, of course, that's not considered a big box, right? But more than 20 years ago, that's absolutely insane.

Krishnan Viswanathan: That was insane. Yeah. And on top of it, it was a shared-everything system. Our process ran from midnight to about 6 AM; we would take up a lot of processing power and ask that nobody else run on it, but there were still conflicts. And when a conflict occurred, there wasn't an easy way to troubleshoot, right? Because everything was shared. So we had to start killing other applications, and sometimes they would kill our application too. But.

Benjamin: Gotcha.

Krishnan Viswanathan: We were the HOV lane, like the ER lane, right? We would get high priority, high visibility. That was good to have, but it was also bad, because if something happened, I would be in the office the next morning with the VPs and the directors, trying to explain why I did what I did, and losing sleep

Benjamin: Okay.

Krishnan Viswanathan: at night to fix it all. In terms of data volume, going back to your question: it was internal Oracle applications, so there wasn't a lot. Then again, once you expand the bill of materials, there was a lot of transaction-level detail. I don't have the exact number, but it was probably 20,000 or 30,000, maybe up to about 100,000 records per day.

Benjamin: Okay?

Krishnan Viswanathan: And we used to do this three times, for three different time zones: Asia Pac, EMEA, and the US, and the US was big. So about 100,000 records per day. The challenge was not the number of records; the challenge was getting to accuracy. People are entering data on multiple different terminals, orders come in, and vendors load in that information. Somebody fat-fingers and instead of one zero they put two zeros, or they put a different number. Executives don't want to see that, because it throws off their numbers. One thing I realized in this whole journey, and it was true in those BI days and it's still true today: the people who really know the data, who really are the business, who are engaged with the business, intuitively know that data better than us engineers. Why do I say that? If there was a fat finger, if there was a number that was incorrect, I would get a call from the SVP of sales for that particular region saying, this number is wrong, I know this cannot be the truth, because I have not seen this, I would have known. And they were almost always correct. So our challenge was to make sure those kinds of numbers entering the upstream system did not automatically flow through. We had to build triggers and metrics. Our processing and data accuracy work, with the lack of real tooling, was just getting started in those pre-internet days. How do you build a reliable data pipeline? How do you build data accuracy, and how do you put controls and alerts in place at every stage of your data transformation? I think that was the biggest challenge, and it was what took us the longest. In the end, I am a big proponent of data controls, data governance, notification, and QC for data as a separate entity. We did a lot of work on that internally, and those were my biggest learnings in this whole process.
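For readers, a rough sketch of the kind of "don't let a fat-fingered order flow through" control Krishnan describes: compare each incoming amount to recent history for its key and hold outliers for human review. The field names and threshold are assumptions for illustration, not Cisco's actual rules.

```python
from statistics import median

def split_suspect_orders(incoming: list[dict], history: dict[str, list[float]],
                         ratio: float = 8.0) -> tuple[list[dict], list[dict]]:
    """Return (accepted, quarantined). An order whose amount is more than
    `ratio` times the historical median for its product (e.g. an extra
    zero) is quarantined instead of flowing downstream."""
    accepted, quarantined = [], []
    for order in incoming:
        past = history.get(order["product"], [])
        baseline = median(past) if past else None
        if baseline and order["amount"] > ratio * baseline:
            quarantined.append(order)   # surface to a human before it loads
        else:
            accepted.append(order)
    return accepted, quarantined

# Example: a 10x typo (extra zero) gets held back instead of reaching
# the executive dashboard.
hist = {"router-7500": [12_000.0, 11_500.0, 12_300.0]}
ok, held = split_suspect_orders(
    [{"product": "router-7500", "amount": 120_000.0}], hist)
assert held and not ok
```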

Benjamin: Okay, gotcha. It's so crazy to me, because so many of the things you're talking about from back in the early 2000s are just as relevant today. If you talk to data engineers today, they're struggling with similar things: data quality, data observability, all of those things, just at a different scale in many cases, of course. Do you think we've come a long way, looking back? Or do you think the problems are actually just the same in many cases?

Krishnan Viswanathan: I'm still struggling with the same things I did 20, 25 years ago, now in my new role at BlackRock. So no. But it's interesting to look at the shift that happened around 2010. Before 2010, from 2005 to 2010, data warehousing was a big thing and data modeling was a big thing. Bringing data in and making sure it went through rigorous data modeling, data accuracy, and data validation was big, to the extent that it actually stifled some of the data volume movement. There was a data explosion happening on the periphery, with the internet and the logs, but the decision support systems, as they were called back then, were going through their own changes; they were immune to that other world. People like us were building these things by hand and taking our sweet time. Building a data warehouse was a two-year effort, and that was granted. If you wanted to change a dimension or bring in a new dimension, that was a six-month effort, and everybody lived with it. Then suddenly this whole data explosion came along in 2009, 2010, and tangentially, MPP became a huge deal: massively parallel processing, analytical databases. Back in 2000 to 2005, there was one database for everything: you chose Oracle, or Sybase, or Microsoft, and they all did OLTP extremely well. Then, as an afterthought, they gave you this big hunk of hardware, a single-instance SMP box, so you could take the data from your OLTP, do your ODS, and do your data warehousing. That was your data warehouse, and that was your scale; all the magic was in data modeling. MPP came along and broke that paradigm. They said, you don't really need this big infrastructure; you just load the data and let it scale. So if you were doing trending, bringing in data from logs or machine-generated data, great, that worked awesomely, and it became a huge success. You could do things you could never do before: clickstream applications, log analytics, great applications that you previously couldn't do at a reasonable size and within a small budget. But what it also brought along was this whole concept of the data lake, which for many people eventually turned into a data swamp. Now, about 10 years later, we are starting to realize that you can't just scale your data and your infrastructure, good as it may be. You still need data models. You still need data quality controls. You also need data governance. And there are vendors now doing that at scale for the cloud, which in the old world was mostly done by Informatica on single-node machines. That's why I think we're seeing a lot of differences now. So, going back to your original leading question: over the next few years you will see, and you're already seeing, a lot of that. You're talking about data observability, which is pretty much this, right? Data observability goes across the board, not just availability, but also notification, processing time, QC, governance, data maturity. I think those will come into play.
I think we are ready for that, but in my mind we don't have a good technology today that solves for all of it. I also think, because of the volume of data, there are potential AI and ML use cases, right? They can look at historical data, and observability is one of those cases where you can bring in a good data QC or data governance model that gives you early warning, so you can build on top of it. You can actually be much more effective now than you would have been 10 years ago. So I'm very optimistic about the future.
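A minimal sketch of the per-stage observability Krishnan is pointing at, where each pipeline stage emits a few health facts and simple invariants are checked before downstream stages run. The metrics and thresholds here are illustrative assumptions.

```python
import time
from dataclasses import dataclass

@dataclass
class StageRun:
    stage: str
    row_count: int
    null_key_rate: float   # fraction of rows with a null business key
    finished_at: float     # epoch seconds

def stage_healthy(run: StageRun, min_rows: int = 1,
                  max_null_rate: float = 0.01,
                  freshness_slo_secs: float = 6 * 3600) -> list[str]:
    """Return a list of violations; an empty list means the stage may proceed."""
    problems = []
    if run.row_count < min_rows:
        problems.append(f"{run.stage}: produced no rows")
    if run.null_key_rate > max_null_rate:
        problems.append(f"{run.stage}: null key rate {run.null_key_rate:.1%}")
    if time.time() - run.finished_at > freshness_slo_secs:
        problems.append(f"{run.stage}: output older than freshness SLO")
    return problems

# Usage: gate the next stage on these checks and route violations to alerting.
run = StageRun("staging_load", row_count=98_400, null_key_rate=0.002,
               finished_at=time.time())
print(stage_healthy(run))  # [] means healthy
```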


Krishnan Viswanathan: We are trying to do some of those things in my current role, because if you look at my past, I've built data pipelines for supply chain and marketing companies; Clorox was for the most part marketing and finance. Now that I'm at BlackRock, a financial company, our challenges are back to data accuracy: providing the right data at the right time with good quality controls. So we're seeing that again.

Benjamin: Gotcha.

Benjamin: Take me a bit through that. You have this super long history: you've been in BI since the very early days at Cisco, and now you're moving into the financial space. At least in my head as an outsider, when you think of finance, you think of legacy systems and a very, very high bar in terms of data quality, because there's a lot of money at stake in this space. So take us through the types of data challenges you have in that space and how it's unique.

Krishnan Viswanathan: I'm still trying to figure out the uniqueness of our scope. I've heard a lot about legacy applications in finance and about very hard, well-structured processes, and both are very true. In most financial applications, at least where I'm working, the challenge is that we have very strict and well-built processes that have survived for multiple decades, and that code has not been touched for multiple decades. People have moved on, but the code hasn't changed. We are bringing modernization into this industry, so now I'm straddling legacy code and newer infrastructure with data modernization. The challenge is taking existing legacy code, with a ton of processes and manual input at every stage, that we don't fully understand, and moving it to a new technology without breaking something. I don't want to get too deep into that, but what you are saying is absolutely accurate: if we break something, the unintended consequences are very high. I don't think there will be zero unintended consequences, but how do we make sure we minimize them, to the extent that the risk is well captured and the testing time can be extended? That's our challenge. It's also our opportunity in terms of how we can scale, because technology is not the challenge anymore when it comes to data processing, infrastructure, and architecture; these are all pluggable, quickly adjustable components. Technology has grown and moved quite fast. It's now about bringing the rest of the organization along to validate and use this technology correctly. I think that's going to be the challenge for the next generation.

Benjamin: From a quality perspective, that sounds super interesting, because we were talking about this data quality aspect: did someone enter an extra zero here? Did the distribution of my data change? Should I be firing an alert? I guess here you have a second dimension to it. You're touching legacy systems, which maybe don't have a lot of observability in place, and you might not be able to retrofit that onto them. And you're building something new and trying to figure out: does it do the job? What starts going wrong when I turn off the old thing, and so on. So that's interesting.
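One common way to de-risk exactly this kind of cutover, sketched here under assumed names, is a parallel run: feed the legacy and modern pipelines the same input, diff their outputs, and only retire the old path once the diff stays empty. This is a generic technique, not a description of BlackRock's actual systems.

```python
from decimal import Decimal

def diff_outputs(legacy: dict[str, Decimal], modern: dict[str, Decimal],
                 tolerance: Decimal = Decimal("0.01")) -> dict[str, tuple]:
    """Return {key: (legacy_value, modern_value)} wherever the two pipelines
    disagree beyond `tolerance`, plus keys present in only one output."""
    mismatches = {}
    for key in legacy.keys() | modern.keys():
        old, new = legacy.get(key), modern.get(key)
        if old is None or new is None or abs(old - new) > tolerance:
            mismatches[key] = (old, new)
    return mismatches

# Example: an extra position in the modern output is flagged for review.
legacy_run = {"ACCT-1": Decimal("100.00"), "ACCT-2": Decimal("250.50")}
modern_run = {"ACCT-1": Decimal("100.00"), "ACCT-2": Decimal("250.50"),
              "ACCT-3": Decimal("10.00")}
print(diff_outputs(legacy_run, modern_run))  # {'ACCT-3': (None, Decimal('10.00'))}
```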

Krishnan Viswanathan: Yeah, I think so. Again, for what we do, a lot of data comes from external third parties, so we don't control anything there. That's where what I was talking about comes in: using AI and ML to fingerprint, for lack of a better word, how we've received the data over the last few years. If we see a large enough standard deviation, can we detect it and at least quarantine that data, so somebody can look at it and validate it before we accept it? Those are the kinds of processes we are talking about. The key is that you need proper metadata and a proper governance model, because yes, I can put a process in place today to run AI and ML over my historical trend and find areas with exceptions; but acting on that, and acting on it in real time, is the bigger piece of the puzzle. So is getting the right infrastructure, because people there have been more siloed in their operations. It is definitely a shift in what we are doing, but it's exciting in the sense that it's new. That's the technological evolution we're going through.
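A small sketch of the batch-level "fingerprinting" idea Krishnan outlines: summarize each incoming third-party feed, compare it to that feed's own history, and quarantine the whole batch when it sits more than k standard deviations away. The feed, counts, and threshold are invented for illustration.

```python
from statistics import mean, stdev

def batch_is_suspect(value: float, history: list[float], k: float = 3.0) -> bool:
    """True if `value` (e.g. today's record count for a feed) sits more
    than k standard deviations from that feed's own history."""
    if len(history) < 10:
        return False                 # not enough history to judge yet
    mu, sd = mean(history), stdev(history)
    return sd > 0 and abs(value - mu) > k * sd

# Usage: route the file to a quarantine bucket for human sign-off instead of
# loading it — the "have somebody look at it before you accept it" step.
daily_counts = [98_000, 101_500, 99_800, 100_200, 97_900,
                100_900, 99_100, 100_400, 98_700, 101_000]
if batch_is_suspect(62_000, daily_counts):
    print("quarantine feed for review")
```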

Benjamin: Okay, super, super interesting. So how big is your team, if I may ask?

Krishnan Viswanathan: We are a newly formed group called Data Platforms and Services. My team manages reference data, which is a fairly large amount and is used across the enterprise. I have about 25 people in this division, and that's just engineers. Then we have product managers, data stewards, and other operational analysts and operational engineers. So yeah.

Benjamin: Nice, super cool. And it's also cool that you have the organizational buy-in to tackle these types of challenges, that's awesome. Cool, all right. So in the intro you mentioned this data déjà vu, seeing things over and over again, and we talked about the data quality aspect of it. In what other areas of data engineering are you having this déjà vu nowadays that you had in the past?

Krishnan Viswanathan: Yeah, I think so. We talked about quite a few things, right? What I faced in 2000, 2001 with company financial metrics, I'm seeing again today.

Benjamin: Right.

Krishnan Viswanathan: I moved organizations, but I see the same things happening here. There are a lot of things I'm noticing that are consistent with those days. Some of it is because we operate like a startup, so there are some challenges there. The other thing is that it's also the financial industry, and there is very strict review, so people are conservative in how they work on this platform. Also, remember that from approximately 2006 until recently I was on the vendor side, and I also worked at companies that were more open to buying vendor products. Here we are back to a point where most of it is built in-house. That's another shift. From a vendor perspective, I know there is a lot of availability in terms of tools and technologies that we could easily incorporate, but that's not how we build at BlackRock. We invent; we do it here. A lot of the reason is that we eventually end up sharing it with our clients. So if I were to bring a product in, I don't only have to look at how much money I spend buying it; I also have to license it for other users. To make it profitable for us, we have to be able to build it and scale it ourselves. The engineering world I'm in now is interesting because, given all the experience I've had, what I'm trying to do is take my team through this. My team is fairly young; the average age is probably 30, and I'm bringing the age up pretty high, right? But 30, 32 maybe. So how do I take them through this when most of them are software engineers, not data engineers?

Benjamin: Right.

Krishnan Viswanathan: I have to transition them. So I have to dig back into my old days and how I reacted to all this, and get them data savvy. That's the whole people transformation, because these are great people. And then how do I translate that into what we can objectively achieve year over year, and trend better? I've only been here a short time, so it's a long process, and I'm still learning a lot of things and facing a lot of challenges. But like I said, I believe the next three or four years are going to be massive, in terms of how the whole industry changes to solve for this and how our company is going to make a big difference. I see a lot of potential there.

Benjamin: Okay, awesome. Another thing: we talked earlier about the dot-com crash and the predictions you were seeing. Now as well, at the macro level, it's not a great time, especially for tech. One thing that's coming up more and more in conversations I'm having with people in the data space is proving that all this data you're collecting is actually worth it, that at the end of the day it's contributing to the bottom line of the business. What are your thoughts on that?

Krishnan Viswanathan: I have dealt with both sides of that coin. Remember my pre-Greenplum days and my Greenplum and post-Greenplum days. In the pre-Greenplum days, I was asking: why are we collecting data that we cannot even process? What's the value of that? Why are we creating a data swamp? That was part of my mindset. But when I was the product manager for Greenplum, I had to put that on the back burner and not bring it up; I had to talk about all the cool things you can do. The reality is, and this is why I think as a technology industry we didn't do a good job of educating our clients, you can't just keep collecting data and not process it, when you don't even know what exists. The Hadoop days and the Databricks days were great for technology and great for infrastructure spend, and I'm glad we had easy money at that time, but now I don't think that's going to fly. Even at a company like Clorox. I haven't talked enough about Clorox, but I joined Clorox to modernize their data platform: first on-prem with Oracle data warehouses and Exadata and things like that, and eventually migrating to Google Cloud and Azure. Even there, because it's CPG and margins are really small, there was always this underlying question: why do we need to collect so much data? How can we optimize? So what I ended up doing was not only the data transformation, the data pipelines, and bringing the data in; I actually ended up building a couple of applications on top of it to showcase what data means to a company like Clorox. This ties in very well with your question, because executives were not handing out blank checks at that company. We used data to try to improve forecasting accuracy for products: can I predict how much I will sell? We couldn't do it well enough, so that was shut down. We tried using NLP and voice interaction to see if we could predict and call out product concerns; that worked out okay. And I built a new version of the executive information system for Clorox. We called it the Daily Briefing. It was a mobile application, but pretty much following the same pattern, and it became a huge success. That was my last project at Clorox, and all of these things were built on the cloud. So I have always worked in industries where we had a very narrow path between how much data we collect and what the value of that data is. Nobody ever gave me a blank check to go in and load as much data as you can, except in my Greenplum days, when we were telling customers to do exactly that. We did one project which eventually became known as the CDP, the Consumer Data Platform. We brought in cookie data, and we ran it for about a year, collecting about a billion records per day of that marketing data. But in a year, we could not find any valuable metric out of it; at least the marketing team couldn't. And we were brick and mortar, not really online, so it may have worked better for an online company. Nevertheless, it was shut down fairly quickly. It was all on-prem; we had a Hadoop cluster, and I set it up. But again, that goes to show there were some people, outside the technology world, who didn't buy into the whole hype of collect as much data as you want, run your algorithms on top of it, and you'll get the results. I don't think that worked. So that was always a challenge.

Benjamin: Super, super interesting. Especially that consumer goods perspective is interesting there. Awesome, cool. So, do you have any closing things you want to say? Maybe one thing for people getting into this space, because you said you're taking software engineers at BlackRock and bringing them closer to this data engineering world. If I was starting today, out of high school, going to college to get into this space, any advice?

Krishnan Viswanathan: When I grew up, there was no Google, no cloud, no YouTube, none of this online learning and online training, Coursera and the like. All of those are available today. So all I have to do, which makes my job a little bit easier, is point my team in the right direction and tell them to go learn and get better. The biggest shift from software engineering to data engineering, in my mind, is data domain knowledge. I mostly do that intuitively because of my past experience, but understanding how data is processed on top of the code is such a big thing, and software engineers often don't care about it. Software engineers think of data as an afterthought, whereas data engineers think of data as the first thought: how do I scale? How do I make this consistent, secure? That would be my first thing. In terms of learning, there are already so many tools and technologies in place; it's easier, a whole lot better today than it was in my day. You might think of me as going back to the old days, but it's true: we didn't have the right tools and technologies, and I could get the job done much faster today. In some cases you're stuck with legacy code; well, too bad, we've got to get that out. But I see great potential for people who are coming into the data world. And I wouldn't have said this earlier this year, but I'm saying it now: an AI winter is possibly near, but it's going to pass. There are so many opportunities with AI, with data, and with the whole data industry that we don't even know what the future applications are going to look like. Future applications are going to be driven by data. Making sure that you have the right data in the right place, processed properly, will make your company more effective, more valuable. I've seen this again and again. Like I said, I built an application in 2000 without understanding what the implications were. I built the same kind of application at Clorox where nobody gave me the requirement; I came up with the model and pitched the idea to my CIO,

Benjamin: and that's just awesome.

Krishnan Viswanathan: and made it a success. The whole reason was that I had the background in data and the business knowledge to say, from an executive point of view, this is what they would look at. Of course I had help, I'm not going to say I didn't, but I could fine-tune it on a different technology, on a different platform, because I understood the data, the value of the data, and how people process that data. Data itself is abstract; data is difficult to comprehend. Making it easier, that's the job of a data engineer, or however you want to label it: data analyst, data scientist, data engineer, those are all different flavors of the same thing. Make sure, when you are in front of an audience and trying to pitch, that you understand the data. Have a good data representation, a valid, unbiased representation that can help others see the picture you want them to see. That in itself is a big hurdle to pass. And once you have done that, the third aspect is all the cool technologies that are in place. You can build AI and you can build ML, but if you don't do those first two things, and show people what the truth can be and what the future possibilities are, I think data will just remain unprocessed and not understood.

Benjamin: Is this something you can learn, or do you just learn it over time as you're doing it? Because especially if you're a data team catering to different parts of an organization, you'll have so many different requirements, and different teams trying to do different things with the data. Actually catering to your audience might be really hard, because there is no one audience.

Krishnan Viswanathan: This is probably the biggest learning I have, and I can't tell you how many times I've gone into meetings thinking I know all these things. I remember very vividly my first few days at Clorox. I was pretty confident, because I knew data, I knew what it was, and I could talk data, right? I'd done some work on it. And I went into this marketing meeting, and I was referencing certain things with different terminology than finance or marketing people would use. These were pure marketing guys, and they were like, what the heck are you talking about? I have no idea what you're saying. It was like we were talking two different languages. So I had to go back and spend time with the marketing team to understand their use cases, to understand exactly what they were trying to do. I had to bite the bullet, bad as it may sound for me to admit now, and acknowledge that I had to relearn the data. There is no easy answer; that was my earliest mistake. Now I'm a little more careful, and every once in a while I get overconfident and think I know everything, until I get knocked down: nope, you don't know. And that's always the case. Data will always have a part of it that's different from what everybody perceives it to be. That's the challenge. You are closest to the data, but your grasp of it will be contrary to many of the biases other people have built up. It's like the story, and it might be an Indian story, of the seven blind men touching an elephant: somebody's touching the trunk, somebody's touching the tail, somebody's touching the legs. That's your data. The only person who actually has a full view of the data is potentially the data person, the data engineer, the data scientist. So how do you present it to people whose biases are the blindfold, and who are higher up in the organization? It takes some sacrifice of ego. There are areas where you don't confront, and there are areas where you go in, present open-ended points of conversation, and eventually bring them along. That's going to be the challenge. I think today the world is in a place where everybody understands that data is big, data is huge; LLMs are proving that with large language models you can do a lot of fun stuff. But there are going to be skeptics. So make sure you can wrap whatever you're trying to do in the right information, providing both views of it: you don't want to only provide views that validate your point of view, you also want to show the areas you found that are contrary to your view, with a potential explanation of why you think they are that way. Something that builds your narrative, that builds your storyline, is important. At the end of the day, telling a story is critical, and data engineers need to get better at storytelling. Data scientists and data visualization engineers need to get better at understanding the data. It's a two-way street. Together, I think we make a very strong team.

Benjamin: Awesome, perfect. I think those were great closing words, Krishnan. It was great having you on the podcast. I hope you have a great day.

Krishnan Viswanathan: Thank you very much, Benjamin. I appreciate your time, and thanks for giving me this opportunity. I'm excited to have been able to share. Take care.
