We released ClickHouse as open source in 2016, four years ago. As far as I know, this is going to be the first published book on ClickHouse. Why do I appreciate that so much? When we released ClickHouse, we had only one goal in mind: to give people the fastest analytical DBMS in the world. But now, after a few years, I see many more opportunities. We can make ClickHouse an example of the most community-friendly and developer-friendly open-source product.
According to Eric S. Raymond, there are two models of software development: the “Cathedral” model and the “Bazaar” model. In the first model, the software is developed by a closed team of a few developers who “do the right thing”. An example of the “Cathedral” model is SQLite, which is developed mostly by a single person, Richard Hipp. In contrast, the “Bazaar” model tries to benefit from inviting as many independent developers as possible. An example of the “Bazaar” model is the Linux kernel. For ClickHouse, we practice the “Bazaar” model. But this model requires a lot of effort in building the community. These efforts are summarized in the following 8 points.
(1) The development process must be as open as possible. No secrets should be kept; we should do everything in public.
(2) The codebase must be well documented and understandable even for amateur developers. And amateur developers should be able to learn good practices from ClickHouse.
(3) We should be eager to try experimental algorithms and libraries, to stay on the cutting edge and attract more enthusiastic people. As an example, it is 2020 now, so we are using the C++20 language standard.
(4) We should move fast. Try 10 algorithms, throw away 9 of them, and keep moving forward.
(5) To keep the codebase sane, we should define high quality standards and enforce them with automated tools in a continuous integration process, not by arguing with people.
(6) We should maintain a high acceptance rate for contributions. Even if a contribution is not ready, we should actively help each other. Even if the code is wrong, we keep the idea and make it right. Contributors should feel that their efforts are well received, and they should be proud of their contributions.
(7) ClickHouse is for everyone. You can build a product on top of ClickHouse and use it in your company, and we will welcome it. We love our users, and we want ClickHouse to spread in every possible way.
(8) We need to provide good tutorials and educational materials for potential contributors. I hope that this book helps people understand the ClickHouse architecture.
The Cathedral model is easier to manage, but the Bazaar model is definitely more fun!
We can make ClickHouse the best educational and research product in the area of database engineering.
If you look at the architectural details of ClickHouse, you will find that most of them are nothing new. Most of them are already well researched and implemented in other systems. What is unique in ClickHouse is the combination of these choices, how well they are integrated, and the attention to implementation details. There are many books on computer science, on managing data, and so on. But you will not find many that describe the internals, the guts, and the low-level details that differentiate one system from another. ClickHouse can be seen as a collection of good implementation choices and also as a playground for experiments. And I hope that this book will guide you through these details.
ClickHouse should become a standard of good usability among database management systems.
A database management system is not an easy product to develop, and people have gotten used to the idea that it is not easy to work with either. Distributed systems are even harder. Working with a typical distributed system is a painful experience from the start. But we can try to break this stereotype. At the very least, we can eliminate the typical obstacles. ClickHouse is easy to set up and run, so you can start working in minutes. But what about further details like data replication, distributed setup, and the choice of table engines and indexes? Couldn't they be just as simple? Probably not. But at least they can be understandable. This book covers these details, and you will understand what is under the hood of ClickHouse.
Alexey Milovidov
Head of ClickHouse Development Team at Yandex