Foreword 1
We open-sourced ClickHouse in 2016, four years ago. As far as I know, this book will be the first published book on ClickHouse. Why do I appreciate that so much? When we released ClickHouse, we had only one goal in mind: to give people the fastest analytical DBMS in the world. But now, after a few years, I see many more opportunities. We can make ClickHouse an example of the most community- and developer-friendly open-source product.
According to Eric S. Raymond, there are two models of software development: the “Cathedral” model and the “Bazaar” model. In the first model, the software is developed by a closed team of a few developers who “do the right thing”. An example of the “Cathedral” model is SQLite, which is developed mostly by a single person, Richard Hipp. In contrast, the “Bazaar” model tries to benefit from inviting as many independent developers as possible. An example of the “Bazaar” model is the Linux kernel. For ClickHouse, we practice the “Bazaar” model. But this model requires a lot of effort to build the community. These efforts are summarized in the following 8 points.
(1) The development process must be as open as possible: no secrets should be kept, and we should do everything in public.
(2) The codebase must be well documented and understandable even for amateur developers, and amateur developers should be able to learn good practices from ClickHouse.
(3) We should be eager to try experimental algorithms and libraries, to stay on the cutting edge and attract more enthusiastic people. For example, it is 2020 now, so we are using the C++20 language standard.
(4) We should move fast: try 10 algorithms, throw away 9 of them, and keep moving forward.
(5) To keep the codebase sane, we should define high quality standards and enforce them with automated tools in a continuous integration process, not by arguing with people.
(6) We should maintain a high acceptance rate for contributions. Even if a contribution is not ready, we should actively help each other. Even if the code is wrong, we keep the idea and make it right. Contributors should feel that their efforts are well received, and they should be proud of their contributions.
(7) ClickHouse is for everyone. You can build a product on top of ClickHouse or use it in your company, and we will welcome it. We love our users, and we are interested in ClickHouse spreading in every possible way.
(8) We need to provide good tutorials and educational materials for potential contributors. I hope that this book helps people understand the ClickHouse architecture.
The Cathedral model is easier to manage, but the Bazaar model is definitely more fun!
We can make ClickHouse the best educational and research product in the area of database engineering.
If you look at the architectural details of ClickHouse, you will find that most of them are nothing new. Most of them have already been well researched and implemented in other systems. What is unique in ClickHouse is the combination of these choices, how well they are integrated, and the attention paid to implementation details. There are many books on computer science, on managing data, and so on, but you will not find many that describe the internals, the guts, the low-level details that differentiate one system from another. ClickHouse can be considered a collection of good implementation choices and also a playground for experiments. I hope that this book will guide you through these details.
ClickHouse should become a standard of good usability among database management systems.
A database management system is not an easy product to develop, and people have gotten used to the idea that it is not easy to work with either. Distributed systems are even harder: working with a typical distributed system is a painful experience from the start. But we can try to break this stereotype, or at least eliminate the typical obstacles. ClickHouse is easy to set up and run, so you can start working in minutes. But what about further details like data replication, distributed setup, choice of table engines, and indexes? Could they be that simple too? Probably not. But at least they can be understandable. This book covers these details, and you will understand what is under the hood of ClickHouse.
Alexey Milovidov
Head of ClickHouse Development Team at Yandex
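We released the open-source version of ClickHouse in 2016. As far as I know, this book will be the first officially published book about ClickHouse, and I am very excited about that. When we released ClickHouse, we had only one goal in mind: to give people the fastest analytical database in the world. Now I see many more possibilities. We can make ClickHouse the most community- and developer-friendly open-source product.
According to Eric S. Raymond, there are two main models of software development: the Cathedral model and the Bazaar model. In the Cathedral model, software is developed by a closed group of developers. A typical product developed this way is the SQLite database, which was developed by Richard Hipp alone. The Bazaar model, by contrast, invites as many independent developers as possible to take part, and the Linux kernel was developed this way. For ClickHouse, we adopted the Bazaar model. Adopting it takes a great deal of effort to maintain the developer community. From applying the Bazaar model while developing ClickHouse, I have drawn the following 8 lessons.
(1) The entire development process is completely open and transparent, with no secrets.
(2) There is detailed documentation that helps people understand the code and that even novice developers can follow, so that novices can learn valuable practices from the ClickHouse code.
(3) We are happy to try new algorithms and third-party libraries to keep ClickHouse at the cutting edge; only this way can we attract more development enthusiasts. For example, just as 2020 began, we adopted the C++20 standard in ClickHouse.
(4) Although ClickHouse needs to iterate quickly, we do not lower our standards. For example, to adopt 1 algorithm, we try at least 10. Even after choosing an algorithm, we keep trying more alternatives for use in the next iteration.
(5) To ensure code quality, we always aim for high standards and use tools, rather than human intervention, to enforce them.
(6) We maintain a relatively high acceptance rate for patches submitted by contributors. Even if a patch is not finished, we get involved where appropriate and help the contributor. If a patch contains errors, we try to fix them. This way, contributors feel that their efforts are recognized and take pride in them.
(7) ClickHouse is for everyone. You can even use ClickHouse to build other products, or deploy it in your own company. We love our users, we support the use of ClickHouse in any scenario, and we are interested in learning how you use it.
(8) We want to provide high-quality tutorials and reference materials for potential contributors. I am very glad to see this book published, because it can help readers understand the architecture of ClickHouse.
The Cathedral model is easy to manage, but the Bazaar model is clearly more interesting! We can make ClickHouse the product best suited for database teaching and research. If you look closely at the architecture of ClickHouse, you will find no novel technology in it: most of the techniques it uses are mature ones that have been studied for years and already implemented in other databases. What makes ClickHouse unique is how efficiently it combines these techniques and how flexibly it applies them, and throughout this process we have paid close attention to concrete implementation and details. Many books on computer science or data management do not go into such details, nor do they compare the low-level implementations of different systems; to fill this gap, ClickHouse has made attempts in both respects. As a fairly good collection of technical implementations, ClickHouse is especially well suited to detailed performance optimization experiments. I hope this book will guide you through these technical details.
ClickHouse can serve as a model of usability among databases. A database system is not an easy product to develop, which leads people to believe that database development is hard to get started with, and that distributed database development is even harder; even getting started with a distributed system can feel very tedious. While developing ClickHouse, we tried to break these entrenched perceptions, and we have at least cleared away some common obstacles. ClickHouse is very easy to get started with: you can install it and start using it within minutes. However, if you need more features, such as data replication, distributed deployment, different table engines, and indexes, things are not as simple, although these features are still relatively easy to understand and learn compared with other databases. This book presents ways to understand and learn ClickHouse and covers many of its details. Through this book, you will gain a thorough understanding of how ClickHouse works.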
Alexey Milovidov
Head of the ClickHouse Development Team, Yandex
(Translated by Zheng Tianqi)