In this episode of The Data Engineering Show, the bros sit down with Yingjun Wu, founder and CEO of RisingWave, to explore the innovative world of stream processing systems. Yingjun shares his journey from academic research to creating a Postgres-compatible streaming system that drastically reduces resource usage. They discuss how RisingWave's S3-based architecture and Postgres compatibility provide advantages over traditional systems like Flink, and explore the increasing role of Apache Iceberg in data pipelines.
Benjamin - 00:00:00:
This is interesting, right? Because this seems to be a general trend in data management at the moment, in the sense that when Snowflake came up, they were the ones who, in many ways, invented decoupling storage and compute, using object storage as the lowest tier. And now you're seeing these, I would say, new-age systems using the same underlying principle to disrupt that space. So RisingWave for streaming, you had WarpStream, which was acquired by Confluent, doing it around Kafka, and so on, right? So I think it's super interesting how this dynamic is actually playing out throughout the entire data infrastructure stack.
YingJun Wu - 00:00:46:
Yeah, we were definitely the first one to build this kind of S3-based system in the stream processing domain, even earlier than WarpStream.
Intro - 00:00:56:
The Data Engineering Show is brought to you by Firebolt, the cloud data warehouse for AI apps and low-latency analytics. Get your free credits and start your trial at Firebolt.io.
Benjamin - 00:01:09:
All right. Hi, everyone, and welcome back to The Data Engineering Show. Today, we're super happy to have YingJun Wu on, the founder and CEO of RisingWave. Welcome to the show. For the listeners who haven't heard of RisingWave (and maybe Eldad), do you want to tell us what the system is all about and what you're working on? I think it's super cool, so I'm sure everyone else would love to hear more, too.
YingJun Wu - 00:01:29:
Sure. First of all, thanks for having me here. I'm YingJun Wu, and I'm the founder of RisingWave. People always ask me, okay, what are you currently working on? What is RisingWave? RisingWave is a stream processing system. These days there are a lot of buzzwords, like Iceberg; we can probably discuss that later. But essentially, we founded the company four-plus years ago. Before that, I was at Amazon Redshift as well as IBM Research Almaden. So people sometimes ask me, okay, Redshift is a batch-based data warehouse, right? Why did you start working on something like stream processing? A little bit of history about myself: I did my PhD in stream processing and database systems.
Eldad - 00:02:16:
You had no choice. You had no choice, basically.
Benjamin - 00:02:19:
And you spent some time in Andy Pavlo's group as well, right? He's a previous guest on the podcast. So I love how we're all connected. Exactly. Former guest of honor. And now we have you as guest of honor.
YingJun Wu - 00:02:30:
Yeah, I'm not sure whether we have enough time to dive deep into the details. But essentially, when I was with Andy Pavlo, we were doing transaction processing. At the time, there was a concept called deterministic transaction processing. Basically, it's: okay, reorder the transactions before execution. And essentially, the concept is the same as stream processing. That's why, during my PhD, I studied both stream processing and transaction processing, because at the time there were concepts that linked these two things together.
Benjamin - 00:03:03:
Nice. Cool. And then, fast forward: after your PhD, you said it was time to build a new database system.
YingJun Wu - 00:03:10:
Oh, yeah, definitely. I mean, Redshift at that time was probably 10 years old. Redshift is a batch-based system, but what about stream processing? Essentially, in Redshift, a large percentage of the data was ingested from Kinesis or Kafka, right? That's streaming data, but there was no stream processing. Well, I mean, there were some stream processing systems at the time, like Spark Streaming and Flink, which are still popular these days. There was also ksqlDB and Apache Storm. But these kinds of systems were not that easy to use, and there were some fundamental issues with a key concept called state management. We can probably dive deeper into the details, but yeah, there were some issues. So I felt, yeah, it was the right time to build something new, and that's why we founded the company.
Benjamin - 00:04:03:
Nice. That's awesome. So tell us about those early days. Building a system from scratch is a crazy daunting challenge, right? You need to do everything: parser, planner, runtime, all of it, for a streaming system, plus integrating with storage. How long did it take you to get to a first version of the product? When did you launch it? How did your first users like it? Take us through the journey of RisingWave, basically.
YingJun Wu - 00:04:30:
It was definitely an uphill climb. I mean, during the first two or three years, whenever you start building something like a database system, or really any hardcore, low-level data infrastructure, it's not about your concept. It's not about, let's say, your design principles, right? It's more about how old the system is. It's about trust. If you tell people, okay, my system is just two years old, then people will say, okay, that's a new system, that's pretty cool, I'll probably take a look at it, and I'll probably follow you. But nobody will take it seriously, to be honest. In our case, we did not really have any customers for the first two years. Actually, the first two and a half years. Our first POC users were tiny startups. And we found startups because startups trust startups; also, we knew each other. So they adopted RisingWave. I definitely also pitched RisingWave to some big companies, but they would say, okay, RisingWave is a new system, we're not going to put it into production; we'll just use it in some of our testing environments, because it's so easy to use. We're Postgres compatible, and we actually made a single-binary version just for some customers to self-deploy. RisingWave itself is not a single binary, but we made that version: no Docker, just a single binary, just like DuckDB. But anyway, the first two years were just tough. And I think we were definitely blessed that we got our first few customers around year two and a half. We had probably three of them. One of them was a crypto company; at the time, it was also a pretty tiny company, but these days they are still our customer, paying us six figures every year. They've become big; they grew from a startup into a big company. The other two are both pretty big companies, and one of them is actually a Fortune 500 company. So I feel we got lucky, and after that it was a blast, because once we had a pretty good customer base, we never needed to answer questions like, okay, is your database system trustworthy or not. We just have a page showing all the customer logos, and that's it.
Benjamin - 00:06:53:
Nice. Cool. Zooming out a bit, take us through the current stream processing space, right? Because nowadays there's actually a fair amount of systems. I think the OG is Flink; that's been around for quite a while. There are systems like Materialize that are also Postgres compatible. There are systems like RisingWave now, which are built in Rust. Okay, Materialize as well. How are these systems different? What makes RisingWave special? And why would I choose RisingWave, I guess?
YingJun Wu - 00:07:21:
Okay, so I believe that RisingWave is definitely the number one stream processing system in the world today. I mean, if you want to push back on that, for sure, go ahead; I'll probably take my sentence back. But personally, I feel we are the number one stream processing system at the moment. There's definitely Flink. People use Flink when they need, let's say, a Java API; for sure, go for Flink. But if you just focus on SQL, then it's really hard for us to lose to any other system. If I talk about design principles, there are several things I want to highlight. The first one, I think we've already mentioned: we are Rust-based. Being Rust-based is not really a design principle, but Rust always gives people the feeling that it's cool, so we attracted a lot of early adopters, probably just because of Rust. The second thing is that S3 is the primary storage, basically a compute-storage-decoupled architecture. Some people would say, okay, it's a Snowflake-like architecture, right? We started from day one with this S3-based architecture. And if you check out some other systems, like Flink, they probably just recently announced that they were moving toward it. But we adopted this architecture from day one and have basically battle-tested it in production for over four years. So there are a lot of tricks I can discuss later.
Benjamin - 00:08:50:
This is interesting, right? Because this seems to be a general trend in data management at the moment, in the sense that when Snowflake came up, they were the ones who, in many ways, invented decoupling storage and compute, using object storage as the lowest tier. And now you're seeing these, I would say, new-age systems using the same underlying principle to disrupt that space. So RisingWave for streaming, you had WarpStream, which was acquired by Confluent, doing it around Kafka, and so on, right? So I think it's super interesting how this dynamic is actually playing out throughout the entire data infrastructure stack. You have Estuary, I think, who we had on the show before, with Danny Palmer, who were building ELT tooling on top of S3. So I think it's interesting how that's just happening everywhere, basically.
YingJun Wu - 00:09:35:
Yeah, we were definitely the first one to build this kind of S3-based system in the stream processing domain, even earlier than WarpStream. We were founded in 2021, and all the other systems came probably later. But essentially there are a lot of tricks, and I have to be pretty transparent here: over the last two or three years, we were also asking ourselves whether we had made the wrong design decision. I can tell you more about that. We are a stream processing system, and S3 has two characteristics. One, S3 is slow. Two, S3 is expensive. I mean, S3 storage is not expensive; it's just 23 bucks per terabyte per month. But if you look at S3 access pricing, you're actually charged for GETs, PUTs, and all these operations. That's super expensive, and essentially it cost us a lot of money. That's why we revisited it and asked ourselves: if we were to rebuild the system in 2025, should we adopt the exact same architecture? We did a lot of optimizations, and now our answer is yes. If we built the system again in 2025, or even 2026, we would stick with this architecture, no change. We would definitely reprioritize some of our roadmap, but essentially it's super good, for several reasons. The first one is that you never need to worry about state management, because everything is in S3; we always trust S3. And the second is elastic scaling. In RisingWave, we can achieve second-level elastic scaling, but for other systems like Flink, whenever you want to do some elastic scaling, it probably takes hours.
Benjamin - 00:11:29:
Which isn't very elastic if it takes hours.
YingJun Wu - 00:11:33:
Yeah. So, to be honest, they actually do have dynamic scaling. I mean, I don't really want to hide anything; they do have dynamic scaling. But the issue is that people do not really use it. It has seen little adoption, because people think it doesn't work well, which is true. But in our case, people do dynamic scaling all the time. Yeah.
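Circling back to the S3 pricing point from a moment ago, some illustrative arithmetic using AWS's published S3 Standard list prices (which can change): storage is about $0.023 per GB-month, which is where the "23 bucks per terabyte per month" figure comes from, while PUT-class requests cost about $0.005 per 1,000 and GETs about $0.0004 per 1,000. A streaming engine that naively flushed state to S3 a thousand times per second would issue roughly 2.6 billion PUTs a month, around $13,000 in request fees alone, which is why batching writes and caching reads matter far more than the storage bill in an S3-first design.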
Benjamin - 00:11:54:
Gotcha. So RisingWave itself is open source, right? Kind of an Apache license. At the same time, you also have a commercial offering. Take us through that because it's such a common model nowadays, right? Like you have kind of ClickHouse doing something similar. A lot of other players in the space doing something similar. What is free? What is paid? When would I become a cloud customer, basically?
YingJun Wu - 00:12:15:
There are several things. I mean, we have the cloud offering, we have BYOC, bring your own cloud, and we also have on-prem. On-prem, people say, is an old-school business. But I have to say that if you want to get into banks and all these kinds of highly regulated industries, you actually have to be on-prem. We are actually in banks. We are also in aerospace. Yeah, those kinds of industries.
Benjamin - 00:12:41:
Is RisingWave running in space? That's one of our recurring questions; we had Hannes Mühleisen on, who said DuckDB is running in space. So, is RisingWave running in space?
YingJun Wu - 00:12:50:
I mean, it's definitely not in a satellite, but I actually don't know where they deploy it.
Eldad - 00:12:54:
It might, it might. For all we know, it might. Tell me something: who would be the typical user? Where do they come from when they decide to adopt a new streaming system? Are they coming from Snowflake? Are they coming from Kafka, Redpanda? Do they come from Flink? How would you categorize that person?
YingJun Wu - 00:13:14:
So we definitely do not compete with Kafka, and we do not really compete with Redpanda. We are super good friends with Redpanda; they're awesome. We're definitely good friends with WarpStream too. But we do compete with Flink, directly compete with Flink. Basically, there are two cases. The first one is that people are new to stream processing and want to adopt a system to process their streaming data, and then they do an evaluation with us. People almost always start with the easiest solution, and we are essentially the easiest solution. Why? Because we are Postgres compatible. That's it. We are Postgres compatible. Challenge me. I mean, you can go with Flink, but then you have to learn their SQL. In our case, it's just Postgres.
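To make "Postgres compatible" concrete, here is a minimal sketch of what streaming SQL in this style looks like: a Kafka-backed source plus a continuously maintained materialized view. The topic, broker, and column names are invented for illustration, and the connector properties follow RisingWave's documented conventions but should be checked against the current docs.

```sql
-- Hypothetical Kafka source; property names are assumptions based on
-- RisingWave's documented connector style.
CREATE SOURCE clicks (
    user_id BIGINT,
    url     VARCHAR,
    ts      TIMESTAMP
) WITH (
    connector = 'kafka',
    topic = 'clicks',
    properties.bootstrap.server = 'broker:9092'
) FORMAT PLAIN ENCODE JSON;

-- Plain Postgres-flavored SQL; the engine keeps this view
-- incrementally up to date as events arrive.
CREATE MATERIALIZED VIEW clicks_per_user AS
SELECT user_id, COUNT(*) AS click_count
FROM clicks
GROUP BY user_id;
```

Any Postgres client can then query clicks_per_user like an ordinary table, which is the whole pitch: no new dialect to learn.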
Benjamin - 00:13:56:
You don't have a tough crowd here. Since Firebolt is Postgres compatible, we're all in on your vision of the world: everything should just be Postgres compatible.
YingJun Wu - 00:14:06:
Definitely. I think Postgres has definitely helped us a lot. People always love Postgres, and there's no headache of learning something new. So basically, when people are new to stream processing and want to adopt a new system, RisingWave is typically the first choice. The second case is that they have already adopted Spark Streaming or Flink, they feel some pain, and they want to migrate. A very representative case, which I'm super proud of: I can't say the company name because we signed an NDA, but it's a super big company. They were running a certain system, either Spark Streaming or Flink; I can't tell you which one. There were some issues, and they were running that system on 20k CPUs, 20,000 CPUs. Then they moved to RisingWave, and they only use 600 CPUs.
Benjamin - 00:15:02:
Wow, that's massive.
YingJun Wu - 00:15:05:
I have a PhD, so I always challenge that kind of number, right? How can you reduce the number of CPUs from 20,000 to 600? How is that possible? So I doubted it too, but I actually talked to them. The reason is their state management: they do joins, multi-way joins. In all the other systems that's not scalable, but in RisingWave it's super scalable, and we did a ton of optimization there. So that's basically the migration case: people feel the pain, they feel they're spending too many resources on the other systems. And sometimes people say, okay, I used to use Java, but it's too slow, I mean, in terms of development; I want to shift to a SQL-based system. Or sometimes people say, okay, I don't really want to spend four hours waiting for Flink to recover from failures. That one was from a bank: they had some issues with failure recovery and decided to switch from Flink to some other system. Those are the kinds of opportunities we get. Yeah.
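For context on the multi-way-join point: in a streaming engine, each join operator has to keep state for both of its inputs so it can match rows that arrive later, and that state is what blows up resource usage as joins are chained. A sketch of the kind of query involved, with invented table and column names:

```sql
-- Each streaming join below maintains intermediate state. In an
-- S3-backed design, that state lives in object storage (with caching)
-- rather than being pinned to local memory or disk.
CREATE MATERIALIZED VIEW enriched_orders AS
SELECT o.order_id, c.region, p.category, o.amount
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
JOIN products  p ON o.product_id  = p.product_id;
```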
Benjamin - 00:16:13:
Nice. That's very cool. So, switching gears a bit: my LinkedIn feed recently has been full of posts by you talking about Apache Iceberg. You're very good at amplifying content there. When I think of Iceberg, I mainly think about it in terms of batch processing, right? Because that's what we're building for. Take us through how Iceberg is changing the game for streaming.
YingJun Wu - 00:16:34:
So there are two things I want to mention. The first one: Iceberg is batch. I mean, it essentially competes with Snowflake, Redshift, BigQuery, whatever. And it basically avoids vendor lock-in, because it's an open table format. As long as you adopt Iceberg, you can use DuckDB, use Firebolt, use any other system to query the data from there, right?
Eldad - 00:16:58:
Didn't they say that about CSV as well?
YingJun Wu - 00:17:00:
Yeah. I was at Data Council yesterday. I know you guys were also there.
Benjamin - 00:17:05:
Yeah. Some of our folks from the Bay Area.
YingJun Wu - 00:17:06:
Yeah. Yeah. We actually didn't meet, but there were some other folks I talked to. So basically, yeah, if you adopt Iceberg, there's no vendor lock-in, right? You can use any query engine. For stream processing, you asked me, okay, what's the relationship? Basically, in the past, our customers, after processing, would essentially send the data into Snowflake, BigQuery, Redshift, or some other system. But nowadays they think, okay, I don't really want vendor lock-in, right? Let's send the data into Iceberg. So essentially that's why Iceberg became, I think, one of our top three destinations.
Benjamin - 00:17:43:
This is super interesting to me, because when we think of Iceberg, we mainly think about it as a source, right? It's, okay, use a Snowflake-managed table on Polaris Catalog in Snowflake or something: can we be more efficient in terms of query processing on top of that? What you're saying is that for a streaming system like RisingWave, the sink story is actually more interesting than the source story. So you're the fabric that populates the Iceberg table. Is that fair?
YingJun Wu - 00:18:09:
That's fair enough. Yeah. We do also ingest data from Iceberg; we have the capability. But, by the way, most people sink data into Iceberg because they want to be free of lock-in and they just want to adopt Iceberg. We see this pattern pretty clearly in all these kinds of enterprises. There are VPs who will say, okay, I don't really want to pay Snowflake so much money, I just want to shift. What's my strategy? Okay, let's do Iceberg. That's always the conversation we hear. And yeah, basically, vendor lock-in is a big thing. From the stream processing system perspective, we definitely need to think about the destinations.
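As a sketch of that sink story, streaming results landing in an Iceberg table might look like the following. RisingWave does ship an Iceberg sink connector, but the exact property names here (warehouse path, database, table) are assumptions to verify against the current docs.

```sql
-- Hypothetical sink: continuously write the materialized view's
-- changes into an Iceberg table backed by S3.
CREATE SINK clicks_to_iceberg FROM clicks_per_user
WITH (
    connector = 'iceberg',
    type = 'upsert',                 -- emit updates, not just appends
    primary_key = 'user_id',
    warehouse.path = 's3://my-bucket/warehouse',
    database.name = 'analytics',
    table.name = 'clicks_per_user'
);
```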
Benjamin - 00:18:47:
So, staying with this destination thing, because I want to understand what a data pipeline like that looks like. Would it be: I have this event stream with just a bunch of raw data; I don't want to ingest the raw data into my Iceberg table, but instead use a streaming engine like RisingWave to compute, say, one-second averages over some metric, and then ingest that into Iceberg? What is the actual data processing that RisingWave usually does in these pipelines, basically?
YingJun Wu - 00:19:17:
So there's always the debate about ETL versus ELT, right? Basically, ELT means, okay, let's just ingest all the raw data into my data warehouse; ETL means, okay, let's do some preprocessing before sending my data into the data warehouse. I think in the Iceberg world, people typically do ETL instead of ELT. Why? Okay, GDPR. The reason is that you don't really want to send PII data, private data, into your lake. Because if you send it into the lake, then who deletes it, right? That's always a challenge, and Iceberg is not super good at handling a lot of updates. So essentially, people need to delete or remove all this sensitive data before sending the data into Iceberg, right? In some other cases, people have data across different regions, and they want to do some joins and aggregations, unions, before sending the data into Iceberg. And people definitely also have enrichment requirements. Let's say I have a clickstream and I also have the user info, right? I want to enrich the data. Why don't people just do that inside Iceberg? I mean, you can definitely do that using whatever system you want, like Spark, whatever. But I think the lake already gives people the impression that it's historical data, and they don't really want to do a lot of ETL inside the lake. So people do the ETL before ingesting into the lake. But I think things will be changing, because as people ingest more and more data into Iceberg, I believe they will think, okay, let's just do ETL inside the lake. I'm actually following the Iceberg spec quite closely, and in v3 there is a feature called Iceberg materialized views, which is essentially ETL inside Iceberg. I think RisingWave will probably become one of the first vendors to support Iceberg materialized views, because we are essentially ready; we are just waiting for the Iceberg v3 spec to get merged. Yeah.
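Putting that answer together, an ETL-before-the-lake pipeline might look like the sketch below: scrub the PII, enrich the clickstream with user info, and window-aggregate, so that only the cleaned rollup ever reaches Iceberg. TUMBLE is RisingWave's documented windowing function, but every table and column name here is invented for illustration.

```sql
-- One-minute windowed rollup, enriched with user info. No email/IP
-- style PII columns survive into the output, so nothing sensitive
-- lands in the lake.
CREATE MATERIALIZED VIEW clicks_by_region_1m AS
SELECT c.window_start,
       u.region,
       COUNT(*) AS clicks
FROM TUMBLE(clicks, ts, INTERVAL '1 MINUTE') AS c
JOIN users u ON c.user_id = u.user_id      -- enrichment join
GROUP BY c.window_start, u.region;

-- The Iceberg sink from the earlier sketch would then point at
-- clicks_by_region_1m instead of the raw stream.
```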
Benjamin - 00:21:25:
Nice. Okay. That's really cool. In the v3 spec, how is the MV definition handled? I mean, one problem is SQL dialect differences between systems. How do they handle that within the Iceberg specification? If I define an MV on an Iceberg table, in which SQL dialect is it specified? That seems like a nasty problem.
YingJun Wu - 00:21:44:
Look, in our case, it's just Postgres, right? We don't really care about all the other dialects; we just care about Postgres.
Eldad - 00:21:51:
They might actually focus on the schema definition, consistency, and how materialized views commit, rather than on running the SQL. I don't see any value in Iceberg owning the SQL, the business logic behind the MV. I can see how Iceberg can expand its metadata layer to support materialized views, just from a materialized-view-as-an-object kind of entity aspect.
YingJun Wu - 00:22:15:
That's right.
Eldad - 00:22:16:
Let's not get confused. Iceberg is not a processing engine. It's a metadata layer that sits on top of Parquet. One of the things that people get confused about with Iceberg is that it opens up a lot of possibilities, but at the end of the day, systems are still doing the same thing they've done before. What does change with Iceberg is the last mile of many data pipelines. Think about it: if you were using Snowflake before, to get data out of Snowflake into a different system, you needed to talk to the Snowflake team. You had to extend the Snowflake data pipeline to now export into Parquet. And let's not get confused again: nobody does low-latency exports from Snowflake. People do batch, every hour, every day, a few times a day. They export their data to Iceberg; they export their data to an S3 bucket, for that matter. The difference is that now they just write the end of the data pipe to an Iceberg table, so they avoid having an extra export, right? They remove that extra export. But that extra export is not just about cost; it's about enabling the data to move to new locations. If the team that owns Snowflake is the barrier within the company to getting the right data, correct-quality data, to more systems, that's a problem. But if the Snowflake team, just like any other team, when they're done with a data pipe, batch or streaming, it doesn't matter, writes it to an Iceberg table, you save a hop on the next stage by not having to export it. So instead of exporting it, you import it: people pull data from the Iceberg table, and that changes the whole dynamics. But the same old problem still exists. Parquet is not necessarily designed as a streaming format, to do low-latency metadata out of the box, just like it's not designed to do low-latency reads. It's not designed for many workload types, right? Scale-out is a challenge. So there is a compromise here. You can put in Iceberg and get consistent metadata across systems. You can say, actually, I need multiple storage formats within Iceberg. I actually expect Iceberg to grow in terms of formats: now that we have consistent metadata across systems, having one global format is less important. If every system can read any format and everything is transacted via Iceberg, then why not introduce extensions and improve formats so that workloads can actually run better through Iceberg? So Iceberg doesn't make workloads run faster. It makes different systems see the same data, and it makes different systems lose their ownership over exporting the data forward to the next system. And I think that's where we'll see a lot of innovation. I've heard someone telling a joke, which is kind of floating around the industry, that says: well, Iceberg had the intention of opening up and unifying metadata across all vendors, but it ended up just helping Databricks deal with all their own formats. So now Databricks will work for two or three years on Iceberg just to get at their own data, and the 20 catalogs that they already support and sell some service to, now they can get all of them under one Iceberg. But still, Databricks' data warehouse engine, which doesn't have much to do with Iceberg, is the highest-selling product for Databricks. I think for startups, it's a huge opportunity. I think Iceberg for Snowflake and Databricks is basically a done game, right? The architecture is obvious: I do dbt on Snowflake, I make sure it's getting written to Iceberg so I can read it from other systems, and vice versa. But the real innovation around Iceberg will come from startups and from the new data ecosystem, from new companies introducing productivity tools on top of Iceberg. There's a lot of discussion on how to merge data behind Iceberg, right? A streaming system will need different compaction than a system that maybe does post-hoc aggregation. Who owns that? Who owns the table behind Iceberg? So, a lot of open questions, a lot of opportunities for reconfiguring the ecosystem. And it's super exciting to see, I think, in this show, that streaming databases are here to stay. In fact, they're getting so much stronger, and through Iceberg they're actually getting much more attention. A normal person would not think about going through the streaming journey without a PhD. That's obvious; you shouldn't even mention your PhD, everyone would be surprised if you didn't have one. But opening up to Postgres, using S3, just figuring out consistency on streaming with that: you need a PhD for that as well. As you said, everything is connected. So it's amazing to see that complex systems like streaming, PhD-grade systems, are getting the right treatment from the right startups. And I'm looking forward to seeing how RisingWave and Firebolt can coordinate with more startups around that Iceberg ecosystem. Everyone wants to move out of Snowflake.
YingJun Wu - 00:27:21:
Yeah, definitely. People say Iceberg is essentially the single source of truth. We ingest all the data into one single source of truth, so you don't need to think about, okay, where do I have my data copied? Do I have a copy of my data in, let's say, Postgres, another copy of my data in Snowflake, and a third copy of my data in some other system? You just think, okay, Iceberg is the single source of truth. And once you've got a single source of truth, you can have all these kinds of engines on top, right? Essentially, my hot take, probably not a hot take but an obvious take, is that all databases will become query engines on top of Iceberg.
Benjamin - 00:28:07:
I don't think it's such a hot take anymore. I think you're two years late.
Eldad - 00:28:10:
A single source of truth justifies the name. The problem with data is not the fact that you move it around. It's not that you clone it and write it into a CPU register or RAM or SSD or a replica on S3. It's the fact that when it changes, there is no bookkeeping on that. So you need to backtrack the whole data pipeline and figure out in which step the source of truth changed. Now, with Iceberg, you get a single, unified, consistent source of truth, and that can trickle down anywhere. Any system that runs on Iceberg and supports transactions can support it. So you can time travel; you can do a lot of stuff you couldn't do without copying everything into a proprietary environment and then using it end to end. And having streaming on one end, not just one end, actually through the ELT as well sometimes, because sometimes ELT is streaming ELT, but then having that switch to a different engine, doing something else like a scale-out GROUP BY, that becomes very interesting. And that monopoly of: you use everything, a single stack by a single vendor, the compute, the metadata, the workload, everything needs to be done by a single vendor? That claim gets shaken, and we'll see how it plays out.
YingJun Wu - 00:29:25:
Yeah. Well, I mean, Snowflake can just read the data, probably write the data, and that's great. But sometimes people just have the impression: well, I don't really want to get locked in, right? And another problem people mention is: I have, let's say, a specialized data type which can only be read, or probably written, by certain systems. In the past, you probably needed to build an integration between that kind of system and, say, Snowflake. Now we don't need to do that; we just build an integration with Iceberg. So it essentially simplifies development for a lot of vendors, right? In the past, we needed to think about that. I mean, we do have an integration with Snowflake, but essentially, if Iceberg had come earlier, we wouldn't have needed to build it.
Eldad - 00:30:12:
You don't need to talk to Snowflake. That's the beauty. If Snowflake writes to an Iceberg table, there is no lock-in. The data is not locked behind the Snowflake wall. It is actually completely open to anyone that can scan, read from, or write to Iceberg.
YingJun Wu - 00:30:28:
Absolutely. Yeah. In the past, it was like, okay, from a vendor perspective, I need to think about how to maintain the pipeline to Snowflake, how to maintain the pipeline integration with Redshift, how to maintain the pipeline integration with BigQuery. Now, no, we don't need to worry about that. I mean, we already have those integrations, but these days we mainly focus on Iceberg.
Benjamin - 00:30:53:
Perfect. I think that's a beautiful summary to end our show on today. Thank you so much for being on. I think you guys are running a bunch of meetups in the next couple of months, and we'll be participating in some of the European ones. So, listeners, take a look and go to some RisingWave meetups. I think it's going to be a blast. It was great having you on the show.
YingJun Wu - 00:31:12:
Sure. Yeah. I mean, we do have our meetups in Europe in May and June, and we hope to see you there. I know we have a lot of collaborations. If people are interested in either RisingWave or Firebolt, just join us.
Benjamin - 00:31:26:
Lovely. It's going to be fun. Thank you for being on.
YingJun Wu - 00:31:28:
Thank you.
Eldad - 00:31:30:
The Data Engineering Show is brought to you by Firebolt, the cloud data warehouse for AI apps and low-latency analytics. Get your free credits and start your trial at Firebolt.io.