Apache Iceberg is a new table format for storing large, slow-moving tabular data. It is designed to improve on the de-facto standard table layout built into Hive, Presto, and Spark. We believe Iceberg has huge potential to change the way we do Data Warehousing, and you should check it out!

Cloud Computing today is accessible to everyone: anyone can launch an EC2 instance on AWS or write entire systems using Serverless technologies without launching even a single VM. The ongoing competition between cloud giants AWS, Google Cloud Platform and Microsoft Azure keeps bringing prices down, and creating your own full-scale cloud solution is cheaper than ever.

Data Warehouses, however, are by definition not going to be cheap to create and operate. Today, only wealthy or truly data-driven companies can afford to invest in launching a real data warehouse. Even fewer are the organizations that run data warehouses at truly massive scale. It is no surprise, then, that companies like Google, Uber, Netflix and Facebook are the ones driving innovation in the Data Warehousing space. By Data Warehousing we mean a solution or platform which lets you run ad-hoc queries on large volumes of data (so-called "Big Data"), stores this data efficiently, and catalogues it in a way that makes it easy to append to or update later. You shouldn't underestimate the importance of data cataloguing. Any real-world data warehouse will eventually have to deal with one serious devil: data has many faces, and none of them will ever stay the same. Even a well thought-out schema will eventually change in some way (new fields are added, data types change, re-partitioning is needed, etc.), and when that happens, are your data warehouse platform and data catalogue equipped to make such a change easily? Schema evolution is just the tip of the iceberg: huge amounts of data bring a unique set of challenges that we have only recently started to truly understand.
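
To make the schema-change problem concrete, here is a minimal Python sketch of one way a table layout can survive a renamed column and a newly added one without rewriting old data. It is a toy illustration, not Iceberg's actual API: the `Field` class, the `read_row` helper and the "events" table are all hypothetical. The only idea it borrows is that each column is tracked by a stable numeric ID rather than by its name, which is the approach the Iceberg specification describes.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class Field:
    """Hypothetical column definition, identified by a stable id."""
    field_id: int   # stable identifier, never reused
    name: str       # display name, free to change between versions
    type_name: str  # informational only in this sketch


def read_row(stored: Dict[int, Any], schema: List[Field]) -> Dict[str, Any]:
    """Project a row stored by field id onto the given schema.

    Fields that did not exist when the row was written come back as None.
    """
    return {f.name: stored.get(f.field_id) for f in schema}


# Version 1 of a hypothetical "events" table.
schema_v1 = [Field(1, "user", "string"), Field(2, "ts", "long")]

# Version 2 renames "user" to "user_name" and adds "country".
schema_v2 = [
    Field(1, "user_name", "string"),
    Field(2, "ts", "long"),
    Field(3, "country", "string"),
]

old_row = {1: "alice", 2: 1700000000}         # written under v1
new_row = {1: "bob", 2: 1700000100, 3: "DE"}  # written under v2

# Both rows are readable through the newer schema without rewriting old_row.
rows = [read_row(r, schema_v2) for r in (old_row, new_row)]
for row in rows:
    print(row)
```

Because rows are keyed by ID, the rename touches only metadata, and the old row simply reports no value for the column that was added later. A name-keyed layout would need either a rewrite of existing files or special-case handling for each change.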