[linux-block.git] / Documentation / maintainer / rebasing-and-merging.rst

.. SPDX-License-Identifier: GPL-2.0

====================
Rebasing and merging
====================

Maintaining a subsystem, as a general rule, requires a familiarity with the
Git source-code management system.  Git is a powerful tool with a lot of
features; as is often the case with such tools, there are right and wrong
ways to use those features.  This document looks in particular at the use
of rebasing and merging.  Maintainers often get in trouble when they use
those tools incorrectly, but avoiding problems is not actually all that
hard.

One thing to be aware of in general is that, unlike many other projects,
the kernel community is not scared by seeing merge commits in its
development history.  Indeed, given the scale of the project, avoiding
merges would be nearly impossible.  Some problems encountered by
maintainers result from a desire to avoid merges, while others come from
merging a little too often.

Rebasing
========

"Rebasing" is the process of changing the history of a series of commits
within a repository.  There are two different types of operations that are
referred to as rebasing since both are done with the ``git rebase``
command, but there are significant differences between them:

 - Changing the parent (starting) commit upon which a series of patches is
   built.  For example, a rebase operation could take a patch set built on
   the previous kernel release and base it, instead, on the current
   release.  We'll call this operation "reparenting" in the discussion
   below.

 - Changing the history of a set of patches by fixing (or deleting) broken
   commits, adding patches, adding tags to commit changelogs, or changing
   the order in which commits are applied.  In the following text, this
   type of operation will be referred to as "history modification"

The term "rebasing" will be used to refer to both of the above operations.
Used properly, rebasing can yield a cleaner and clearer development
history; used improperly, it can obscure that history and introduce bugs.

There are a few rules of thumb that can help developers to avoid the worst
perils of rebasing:

 - History that has been exposed to the world beyond your private system
   should usually not be changed.  Others may have pulled a copy of your
   tree and built on it; modifying your tree will create pain for them.  If
   work is in need of rebasing, that is usually a sign that it is not yet
   ready to be committed to a public repository.

   That said, there are always exceptions.  Some trees (linux-next being
   a significant example) are frequently rebased by their nature, and
   developers know not to base work on them.  Developers will sometimes
   expose an unstable branch for others to test with or for automated
   testing services.  If you do expose a branch that may be unstable in
   this way, be sure that prospective users know not to base work on it.

 - Do not rebase a branch that contains history created by others.  If you
   have pulled changes from another developer's repository, you are now a
   custodian of their history.  You should not change it.  With few
   exceptions, for example, a broken commit in a tree like this should be
   explicitly reverted rather than disappeared via history modification.

 - Do not reparent a tree without a good reason to do so.  Just being on a
   newer base or avoiding a merge with an upstream repository is not
   generally a good reason.

 - If you must reparent a repository, do not pick some random kernel commit
   as the new base.  The kernel is often in a relatively unstable state
   between release points; basing development on one of those points
   increases the chances of running into surprising bugs.  When a patch
   series must move to a new base, pick a stable point (such as one of
   the -rc releases) to move to.

 - Realize that reparenting a patch series (or making significant history
   modifications) changes the environment in which it was developed and,
   likely, invalidates much of the testing that was done.  A reparented
   patch series should, as a general rule, be treated like new code and
   retested from the beginning.

A frequent cause of merge-window trouble is when Linus is presented with a
patch series that has clearly been reparented, often to a random commit,
shortly before the pull request was sent.  The chances of such a series
having been adequately tested are relatively low - as are the chances of
the pull request being acted upon.

If, instead, rebasing is limited to private trees, commits are based on a
well-known starting point, and they are well tested, the potential for
trouble is low.

Merging
=======

Merging is a common operation in the kernel development process; the 5.1
development cycle included 1,126 merge commits - nearly 9% of the total.
Kernel work is accumulated in over 100 different subsystem trees, each of
which may contain multiple topic branches; each branch is usually developed
independently of the others.  So naturally, at least one merge will be
required before any given branch finds its way into an upstream repository.

Many projects require that branches in pull requests be based on the
current trunk so that no merge commits appear in the history.  The kernel
is not such a project; any rebasing of branches to avoid merges will, most
likely, lead to trouble.

Subsystem maintainers find themselves having to do two types of merges:
from lower-level subsystem trees and from others, either sibling trees or
the mainline.  The best practices to follow differ in those two situations.

Merging from lower-level trees
------------------------------

Larger subsystems tend to have multiple levels of maintainers, with the
lower-level maintainers sending pull requests to the higher levels.  Acting
on such a pull request will almost certainly generate a merge commit; that
is as it should be.  In fact, subsystem maintainers may want to use
the --no-ff flag to force the addition of a merge commit in the rare cases
where one would not normally be created so that the reasons for the merge
can be recorded.  The changelog for the merge should, for any kind of
merge, say *why* the merge is being done.  For a lower-level tree, "why" is
usually a summary of the changes that will come with that pull.

Maintainers at all levels should be using signed tags on their pull
requests, and upstream maintainers should verify the tags when pulling
branches.  Failure to do so threatens the security of the development
process as a whole.

As per the rules outlined above, once you have merged somebody else's
history into your tree, you cannot rebase that branch, even if you
otherwise would be able to.

Merging from sibling or upstream trees
--------------------------------------

While merges from downstream are common and unremarkable, merges from other
trees tend to be a red flag when it comes time to push a branch upstream.
Such merges need to be carefully thought about and well justified, or
there's a good chance that a subsequent pull request will be rejected.

It is natural to want to merge the master branch into a repository; this
type of merge is often called a "back merge".  Back merges can help to make
sure that there are no conflicts with parallel development and generally
gives a warm, fuzzy feeling of being up-to-date.  But this temptation
should be avoided almost all of the time.

Why is that?  Back merges will muddy the development history of your own
branch.  They will significantly increase your chances of encountering bugs
from elsewhere in the community and make it hard to ensure that the work
you are managing is stable and ready for upstream.  Frequent merges can
also obscure problems with the development process in your tree; they can
hide interactions with other trees that should not be happening (often) in
a well-managed branch.

That said, back merges are occasionally required; when that happens, be
sure to document *why* it was required in the commit message.  As always,
merge to a well-known stable point, rather than to some random commit.
Even then, you should not back merge a tree above your immediate upstream
tree; if a higher-level back merge is really required, the upstream tree
should do it first.

One of the most frequent causes of merge-related trouble is when a
maintainer merges with the upstream in order to resolve merge conflicts
before sending a pull request.  Again, this temptation is easy enough to
understand, but it should absolutely be avoided.  This is especially true
for the final pull request: Linus is adamant that he would much rather see
merge conflicts than unnecessary back merges.  Seeing the conflicts lets
him know where potential problem areas are.  He does a lot of merges (382
in the 5.1 development cycle) and has gotten quite good at conflict
resolution - often better than the developers involved.

So what should a maintainer do when there is a conflict between their
subsystem branch and the mainline?  The most important step is to warn
Linus in the pull request that the conflict will happen; if nothing else,
that demonstrates an awareness of how your branch fits into the whole.  For
especially difficult conflicts, create and push a *separate* branch to show
how you would resolve things.  Mention that branch in your pull request,
but the pull request itself should be for the unmerged branch.

Even in the absence of known conflicts, doing a test merge before sending a
pull request is a good idea.  It may alert you to problems that you somehow
didn't see from linux-next and helps to understand exactly what you are
asking upstream to do.

Another reason for doing merges of upstream or another subsystem tree is to
resolve dependencies.  These dependency issues do happen at times, and
sometimes a cross-merge with another tree is the best way to resolve them;
as always, in such situations, the merge commit should explain why the
merge has been done.  Take a moment to do it right; people will read those
changelogs.

Often, though, dependency issues indicate that a change of approach is
needed.  Merging another subsystem tree to resolve a dependency risks
bringing in other bugs and should almost never be done.  If that subsystem
tree fails to be pulled upstream, whatever problems it had will block the
merging of your tree as well.  Preferable alternatives include agreeing
with the maintainer to carry both sets of changes in one of the trees or
creating a topic branch dedicated to the prerequisite commits that can be
merged into both trees.  If the dependency is related to major
infrastructural changes, the right solution might be to hold the dependent
commits for one development cycle so that those changes have time to
stabilize in the mainline.

Finally
=======

It is relatively common to merge with the mainline toward the beginning of
the development cycle in order to pick up changes and fixes done elsewhere
in the tree.  As always, such a merge should pick a well-known release
point rather than some random spot.  If your upstream-bound branch has
emptied entirely into the mainline during the merge window, you can pull it
forward with a command like::

  git merge v5.2-rc1^0

The "^0" will cause Git to do a fast-forward merge (which should be
possible in this situation), thus avoiding the addition of a spurious merge
commit.

The guidelines laid out above are just that: guidelines.  There will always
be situations that call out for a different solution, and these guidelines
should not prevent developers from doing the right thing when the need
arises.  But one should always think about whether the need has truly
arisen and be prepared to explain why something abnormal needs to be done.
Commit	Line	Data
d95ea1a4 JC	1	.. SPDX-License-Identifier: GPL-2.0
	2
	3	====================
	4	Rebasing and merging
	5	====================
	6
	7	Maintaining a subsystem, as a general rule, requires a familiarity with the
	8	Git source-code management system. Git is a powerful tool with a lot of
	9	features; as is often the case with such tools, there are right and wrong
	10	ways to use those features. This document looks in particular at the use
	11	of rebasing and merging. Maintainers often get in trouble when they use
	12	those tools incorrectly, but avoiding problems is not actually all that
	13	hard.
	14
	15	One thing to be aware of in general is that, unlike many other projects,
	16	the kernel community is not scared by seeing merge commits in its
	17	development history. Indeed, given the scale of the project, avoiding
	18	merges would be nearly impossible. Some problems encountered by
	19	maintainers result from a desire to avoid merges, while others come from
	20	merging a little too often.
	21
	22	Rebasing
	23	========
	24
	25	"Rebasing" is the process of changing the history of a series of commits
	26	within a repository. There are two different types of operations that are
	27	referred to as rebasing since both are done with the ``git rebase``
	28	command, but there are significant differences between them:
	29
	30	- Changing the parent (starting) commit upon which a series of patches is
	31	built. For example, a rebase operation could take a patch set built on
	32	the previous kernel release and base it, instead, on the current
	33	release. We'll call this operation "reparenting" in the discussion
	34	below.
	35
	36	- Changing the history of a set of patches by fixing (or deleting) broken
	37	commits, adding patches, adding tags to commit changelogs, or changing
	38	the order in which commits are applied. In the following text, this
	39	type of operation will be referred to as "history modification"
	40
	41	The term "rebasing" will be used to refer to both of the above operations.
	42	Used properly, rebasing can yield a cleaner and clearer development
	43	history; used improperly, it can obscure that history and introduce bugs.
	44
	45	There are a few rules of thumb that can help developers to avoid the worst
	46	perils of rebasing:
	47
	48	- History that has been exposed to the world beyond your private system
	49	should usually not be changed. Others may have pulled a copy of your
	50	tree and built on it; modifying your tree will create pain for them. If
	51	work is in need of rebasing, that is usually a sign that it is not yet
	52	ready to be committed to a public repository.
	53
	54	That said, there are always exceptions. Some trees (linux-next being
	55	a significant example) are frequently rebased by their nature, and
	56	developers know not to base work on them. Developers will sometimes
	57	expose an unstable branch for others to test with or for automated
	58	testing services. If you do expose a branch that may be unstable in
	59	this way, be sure that prospective users know not to base work on it.
	60
	61	- Do not rebase a branch that contains history created by others. If you
	62	have pulled changes from another developer's repository, you are now a
	63	custodian of their history. You should not change it. With few
	64	exceptions, for example, a broken commit in a tree like this should be
65	explicitly reverted rather than disappeared via history modification.
66
67	- Do not reparent a tree without a good reason to do so. Just being on a
68	newer base or avoiding a merge with an upstream repository is not
69	generally a good reason.
70
71	- If you must reparent a repository, do not pick some random kernel commit
72	as the new base. The kernel is often in a relatively unstable state
73	between release points; basing development on one of those points
74	increases the chances of running into surprising bugs. When a patch
75	series must move to a new base, pick a stable point (such as one of
76	the -rc releases) to move to.
77
78	- Realize that reparenting a patch series (or making significant history
79	modifications) changes the environment in which it was developed and,
80	likely, invalidates much of the testing that was done. A reparented
81	patch series should, as a general rule, be treated like new code and
82	retested from the beginning.
83
84	A frequent cause of merge-window trouble is when Linus is presented with a
85	patch series that has clearly been reparented, often to a random commit,
86	shortly before the pull request was sent. The chances of such a series
87	having been adequately tested are relatively low - as are the chances of
88	the pull request being acted upon.
89
90	If, instead, rebasing is limited to private trees, commits are based on a
91	well-known starting point, and they are well tested, the potential for
92	trouble is low.
93
94	Merging
95	=======
96
97	Merging is a common operation in the kernel development process; the 5.1
98	development cycle included 1,126 merge commits - nearly 9% of the total.
99	Kernel work is accumulated in over 100 different subsystem trees, each of
100	which may contain multiple topic branches; each branch is usually developed
101	independently of the others. So naturally, at least one merge will be
102	required before any given branch finds its way into an upstream repository.
103
104	Many projects require that branches in pull requests be based on the
105	current trunk so that no merge commits appear in the history. The kernel
106	is not such a project; any rebasing of branches to avoid merges will, most
107	likely, lead to trouble.
108
109	Subsystem maintainers find themselves having to do two types of merges:
110	from lower-level subsystem trees and from others, either sibling trees or
111	the mainline. The best practices to follow differ in those two situations.
112
113	Merging from lower-level trees
114	------------------------------
115
116	Larger subsystems tend to have multiple levels of maintainers, with the
117	lower-level maintainers sending pull requests to the higher levels. Acting
118	on such a pull request will almost certainly generate a merge commit; that
119	is as it should be. In fact, subsystem maintainers may want to use
120	the --no-ff flag to force the addition of a merge commit in the rare cases
121	where one would not normally be created so that the reasons for the merge
122	can be recorded. The changelog for the merge should, for any kind of
123	merge, say why the merge is being done. For a lower-level tree, "why" is
124	usually a summary of the changes that will come with that pull.
125
126	Maintainers at all levels should be using signed tags on their pull
127	requests, and upstream maintainers should verify the tags when pulling
128	branches. Failure to do so threatens the security of the development
129	process as a whole.
130
131	As per the rules outlined above, once you have merged somebody else's
132	history into your tree, you cannot rebase that branch, even if you
133	otherwise would be able to.
134
135	Merging from sibling or upstream trees
136	--------------------------------------
137
138	While merges from downstream are common and unremarkable, merges from other
139	trees tend to be a red flag when it comes time to push a branch upstream.
140	Such merges need to be carefully thought about and well justified, or
141	there's a good chance that a subsequent pull request will be rejected.
142
143	It is natural to want to merge the master branch into a repository; this
144	type of merge is often called a "back merge". Back merges can help to make
145	sure that there are no conflicts with parallel development and generally
146	gives a warm, fuzzy feeling of being up-to-date. But this temptation
147	should be avoided almost all of the time.
148
149	Why is that? Back merges will muddy the development history of your own
150	branch. They will significantly increase your chances of encountering bugs
151	from elsewhere in the community and make it hard to ensure that the work
152	you are managing is stable and ready for upstream. Frequent merges can
153	also obscure problems with the development process in your tree; they can
154	hide interactions with other trees that should not be happening (often) in
155	a well-managed branch.
156
157	That said, back merges are occasionally required; when that happens, be
158	sure to document why it was required in the commit message. As always,
159	merge to a well-known stable point, rather than to some random commit.
160	Even then, you should not back merge a tree above your immediate upstream
161	tree; if a higher-level back merge is really required, the upstream tree
162	should do it first.
163
164	One of the most frequent causes of merge-related trouble is when a
165	maintainer merges with the upstream in order to resolve merge conflicts
166	before sending a pull request. Again, this temptation is easy enough to
167	understand, but it should absolutely be avoided. This is especially true
168	for the final pull request: Linus is adamant that he would much rather see
169	merge conflicts than unnecessary back merges. Seeing the conflicts lets
170	him know where potential problem areas are. He does a lot of merges (382
171	in the 5.1 development cycle) and has gotten quite good at conflict
172	resolution - often better than the developers involved.
173
174	So what should a maintainer do when there is a conflict between their
175	subsystem branch and the mainline? The most important step is to warn
176	Linus in the pull request that the conflict will happen; if nothing else,
177	that demonstrates an awareness of how your branch fits into the whole. For
178	especially difficult conflicts, create and push a separate branch to show
179	how you would resolve things. Mention that branch in your pull request,
180	but the pull request itself should be for the unmerged branch.
181
182	Even in the absence of known conflicts, doing a test merge before sending a
183	pull request is a good idea. It may alert you to problems that you somehow
184	didn't see from linux-next and helps to understand exactly what you are
185	asking upstream to do.
186
187	Another reason for doing merges of upstream or another subsystem tree is to
188	resolve dependencies. These dependency issues do happen at times, and
189	sometimes a cross-merge with another tree is the best way to resolve them;
190	as always, in such situations, the merge commit should explain why the
191	merge has been done. Take a moment to do it right; people will read those
192	changelogs.
193
194	Often, though, dependency issues indicate that a change of approach is
195	needed. Merging another subsystem tree to resolve a dependency risks
196	bringing in other bugs and should almost never be done. If that subsystem
197	tree fails to be pulled upstream, whatever problems it had will block the
198	merging of your tree as well. Preferable alternatives include agreeing
199	with the maintainer to carry both sets of changes in one of the trees or
200	creating a topic branch dedicated to the prerequisite commits that can be
201	merged into both trees. If the dependency is related to major
202	infrastructural changes, the right solution might be to hold the dependent
203	commits for one development cycle so that those changes have time to
204	stabilize in the mainline.
205
206	Finally
207	=======
208
209	It is relatively common to merge with the mainline toward the beginning of
210	the development cycle in order to pick up changes and fixes done elsewhere
211	in the tree. As always, such a merge should pick a well-known release
212	point rather than some random spot. If your upstream-bound branch has
213	emptied entirely into the mainline during the merge window, you can pull it
214	forward with a command like::
215
216	git merge v5.2-rc1^0
217
218	The "^0" will cause Git to do a fast-forward merge (which should be
219	possible in this situation), thus avoiding the addition of a spurious merge
220	commit.
221
222	The guidelines laid out above are just that: guidelines. There will always
223	be situations that call out for a different solution, and these guidelines
224	should not prevent developers from doing the right thing when the need
225	arises. But one should always think about whether the need has truly
226	arisen and be prepared to explain why something abnormal needs to be done.