Mercurial vs Git

http://rg03.wordpress.com/2009/04/07/mercurial-vs-git/


There are many blog posts and articles all over the Internet comparing Git and Mercurial. Most of them only briefly describe the main differences and then try to decide which one is better. However, I didn’t find many articles explaining the differences in detail from a neutral point of view, and that’s what I’ll try to do here, also providing links to relevant documentation.

For simple uses, like a single user managing a private project, Git and Mercurial are equivalent. Their workflows differ a little due to the underlying differences, but their usage is not very far apart. Those differences become noticeable when you collaborate with more users and, the more complex the project is, the more you will notice them.

Hopefully, this information will be useful to Mercurial users wanting to know how Git works and vice versa, as well as to novice users who are not using either one yet. In that case, I suggest you experiment a little with both instead of trying to make a theoretical decision based on what you read in the documentation or in articles like this one. I will focus on four main aspects.

  1. The repository structure, that is, how each one of them records changes and history.
  2. The noticeable differences in how they manage the branching process.
  3. Documentation.
  4. Their two popular hosting sites, GitHub and Bitbucket.

Repository structure

Mercurial and Git differ a lot in the way they store changes and history. Some of those differences are irrelevant from the user’s perspective while others are not. I won’t provide a very verbose explanation of either one. The specific details are well covered in chapter 3 of “Mercurial: The Definitive Guide” for Mercurial and in the “Git Object Model” chapter of the Git Community Book.

A very brief description of each one: conceptually, Git does not store differences between versions of the same file. For each new version of a file, Git stores a full copy of that version as a separate object. Mercurial, on the other hand, stores a limited run of file differences and then a new full copy from time to time, which bounds the time needed to reconstruct a particular revision. It also uses a binary comparison algorithm that works for both binary and text files. Knowing that, I think it’s possible to read either of those book chapters and understand the details. If you read both chapters, you will notice that the model in Git looks simpler, while the way Mercurial stores changes is not trivial.

A consequence of this underlying model is that Git repositories, without repacking, tend to use more disk space than Mercurial repositories. Git can pack and compress objects to change this, but this means that you need to run a command from time to time. This process can also take quite a lot of time depending on the size of the repository and the number of unpacked objects. However, Git’s model is probably easier to understand once you have access to a proper explanation.
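To make the “differences plus an occasional full copy” idea more concrete, here is a minimal Python sketch of such a store. It is only a toy under stated assumptions: the fixed `SNAPSHOT_EVERY` interval, the `DeltaStore` class and the copy/insert delta encoding are all made up for illustration, and Mercurial’s real storage format and its rule for deciding when to write a full copy are different.

```python
import difflib

# Hypothetical rule: store a full copy every third revision.
SNAPSHOT_EVERY = 3


def make_delta(old: bytes, new: bytes) -> list:
    """Encode `new` relative to `old` as a list of copy/insert operations."""
    ops = []
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))        # reuse bytes from the old version
        elif tag in ("replace", "insert"):
            ops.append(("insert", new[j1:j2]))  # bytes only present in the new version
        # "delete": nothing to emit, the old bytes are simply not copied
    return ops


def apply_delta(old: bytes, ops: list) -> bytes:
    """Rebuild the new version from the old one and a delta."""
    pieces = []
    for op in ops:
        if op[0] == "copy":
            pieces.append(old[op[1]:op[2]])
        else:
            pieces.append(op[1])
    return b"".join(pieces)


class DeltaStore:
    """Toy history of one file: full copies now and then, deltas in between."""

    def __init__(self):
        self.entries = []  # each entry is ("full", bytes) or ("delta", ops)

    def add(self, data: bytes) -> int:
        rev = len(self.entries)
        if rev % SNAPSHOT_EVERY == 0:
            self.entries.append(("full", data))
        else:
            previous = self.get(rev - 1)
            self.entries.append(("delta", make_delta(previous, data)))
        return rev

    def get(self, rev: int) -> bytes:
        # Walk back to the nearest full copy, then replay deltas forward.
        base = rev
        while self.entries[base][0] != "full":
            base -= 1
        data = self.entries[base][1]
        for r in range(base + 1, rev + 1):
            data = apply_delta(data, self.entries[r][1])
        return data


if __name__ == "__main__":
    store = DeltaStore()
    for text in (b"a\n", b"a\nb\n", b"a\nb\nc\n", b"a\nB\nc\n"):
        store.add(text)
    print([kind for kind, _ in store.entries])
```

The periodic full copy is what keeps reconstruction cheap: reading any revision never replays more than `SNAPSHOT_EVERY - 1` deltas, no matter how long the history is.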

They also share similarities. In both of them, history is represented as a sequence of commits, each commit being identified by a character sequence that turns out to be the SHA1 sum of, well, something. Each commit normally has one or two parent commits (the first commit in a repository has none), allowing for different branches to exist. This connection is what creates the concept of history in the repository (things were in a given state, then a change happened from that point and we ended up with this new state). The exact wording people use to describe this is “history is a directed acyclic graph”. However, many people won’t know exactly what a directed acyclic graph is, and things can only get worse when it’s called a DAG. I’m not sure the Git or Mercurial creators thought “I’m going to make the project history be a directed acyclic graph”. Being a DAG is probably more of a consequence than a starting point in the design of either tool. Knowing it’s a DAG probably won’t help you understand anything really.
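A small Python sketch may help show why the graph comes out acyclic almost by accident. It assumes a made-up identifier scheme (SHA1 over the parent identifiers plus a message); the `commit_id` function and `History` class are hypothetical, and neither Git nor Mercurial hashes exactly this payload.

```python
import hashlib


def commit_id(parents, message):
    """Toy identifier: SHA1 over the parent ids and the commit's own data."""
    payload = "parents:" + ",".join(parents) + "\n" + message
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()


class History:
    """Toy history: every commit only records who its parents are."""

    def __init__(self):
        self.parents = {}  # commit id -> tuple of parent ids

    def commit(self, message, *parents):
        cid = commit_id(parents, message)
        self.parents[cid] = tuple(parents)
        return cid

    def ancestors(self, cid):
        """Every commit reachable by following parent links from `cid`."""
        seen = set()
        stack = list(self.parents[cid])
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(self.parents[current])
        return seen


if __name__ == "__main__":
    history = History()
    a = history.commit("A")
    b = history.commit("B", a)
    c = history.commit("C", b)      # history diverges after B ...
    d = history.commit("D", b)
    z = history.commit("Z", c, d)   # ... and is joined again by a merge
    print(len(history.ancestors(z)))
```

Because a commit’s identifier depends on the identifiers of its parents, the parents must exist before the commit does, so a commit can never end up among its own ancestors.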

Differences in branching

In both Git and Mercurial, a given state (using Git names, a commit; using Mercurial names, a changeset) can be the parent for more than one other state. This means that the project history has the notion of branches in the sense that history can diverge at some point. Using an ASCII diagram:

        C ···
       /
A --- B
       \
        D ···

State B can be the starting point for states C and D. C and D would, somehow, indicate that the parent state is state B. In both Git and Mercurial, too, a state can have more than one parent. This means that it is possible to join branches that were once separated:

··· V --- W
           \
            Z ···
           /
··· X --- Y

Z would somehow indicate that its parent states are W and Y.

Despite having all that in common, the way they expose this functionality to the user is quite different, and workflows differ a lot in how you are expected to proceed in different situations. The most common expression you will hear about this is “in Git, heads or branches are explicit, while in Mercurial they are implicit”. But what does this really mean? I read that expression in several places and didn’t fully understand it until I read a very useful document that appeared on Hacker News: “Understanding Git Conceptually”. This tutorial fails to properly explain the object model described in the Git book chapter I mentioned above, but the Git book lacks, as of the time I’m writing this, an explanation of Git branching as good as the one in this tutorial. I recommend reading it at some point after having read the chapter, but not too late. It is better to understand branching from the beginning.

Briefly, Mercurial doesn’t have the notion of a branch or head as something by itself (I know I’m lying, but read until the end). In Mercurial, a branch is present because history diverged at some point, and a head is a history state that has no children. A head is not explicitly marked as a head, and there is no data structure called a branch. It’s implicit by looking at the project history, more or less.
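Here is a minimal Python sketch of what “implicit” means. It assumes history is given as a plain mapping from each state to its parents; the `find_heads` function is a made-up helper, not a Mercurial API. Nothing in the data says “this is a head”; heads are simply the states that nobody names as a parent.

```python
def find_heads(parents):
    """Return the states that no other state lists as a parent."""
    have_children = {p for ps in parents.values() for p in ps}
    return sorted(state for state in parents if state not in have_children)


if __name__ == "__main__":
    # The first diagram above: history diverges after B.
    diverged = {"A": (), "B": ("A",), "C": ("B",), "D": ("B",)}
    print(find_heads(diverged))

    # Adding a merge state with parents C and D leaves a single head.
    merged = dict(diverged, Z=("C", "D"))
    print(find_heads(merged))
```

The two branches exist only because two states point at B as their parent; no separate record of a branch is needed to find them.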

However, in Git, this information is explicit. There is a type of data structure present in the repository called a “head”. As the tutorial mentioned above explains, a head is like an arrow pointing to a specific commit or state in the history of the project, giving it a name. It’s possible to have several heads in one repository, and one of them is always the currently active one you’re working on. When you commit changes, the current active head points to the commit that will be the parent of the new one you are creating. After it has been created, the arrow is moved “forward” to point to the new commit. Very important too, these arrows always have a name, be it “master” or “experimental” or “testingsomething” or “whoohoo”. Finally, these arrows or heads are created or destroyed as needed. When you branch, you create a new one. When you merge, you no longer need one of the arrows, but you can keep it if you want to.
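The arrow picture can be sketched in a few lines of Python. This is a toy model only: the `Repo` class, its method names and the `c1`, `c2` commit labels are invented for illustration and say nothing about how Git actually stores heads on disk.

```python
class Repo:
    """Toy repository where heads are explicit, named pointers to commits."""

    def __init__(self):
        self.parents = {}              # commit -> tuple of parent commits
        self.heads = {"master": None}  # head name -> commit it points to
        self.active = "master"         # the head you are currently working on
        self._counter = 0

    def commit(self):
        self._counter += 1
        new = f"c{self._counter}"
        tip = self.heads[self.active]
        # The active head points to the parent of the new commit ...
        self.parents[new] = (tip,) if tip is not None else ()
        # ... and is then moved "forward" to the new commit.
        self.heads[self.active] = new
        return new

    def branch(self, name):
        """Create a new arrow pointing at the same commit as the active head."""
        self.heads[name] = self.heads[self.active]

    def checkout(self, name):
        if name not in self.heads:
            raise KeyError(name)
        self.active = name

    def delete_head(self, name):
        """Remove an arrow; the commits themselves are left untouched."""
        if name == self.active:
            raise ValueError("cannot delete the active head")
        del self.heads[name]


if __name__ == "__main__":
    repo = Repo()
    repo.commit()
    repo.branch("experimental")
    repo.checkout("experimental")
    repo.commit()
    print(repo.heads)
```

Creating a branch costs one dictionary entry, and deleting it removes only that entry, which is why heads can be created and destroyed so freely.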

Going back to Mercurial, you don’t need to do anything special to create a new branch. If someone (or even you) starts working on the repository from a given point in history, and different commits are created starting at that point, two anonymous branches will automagically appear. At this point they reside in two different repositories. If you then pull changes from one of them into the other, the two anonymous branches will also automagically (implicitly) appear in the same repository. At that point, Mercurial suggests that you merge both before committing further changes. You can still force Mercurial to go to any point in history and create a commit there. The commit can create a new branch or extend an existing one. However, working this way can be a bit confusing unless you use tags, named branches or a GUI to see the project history graphically.

In Git, however, you cannot do that. One of the branches is going to need a new name, like “work_on_feature_X”. Actually, you can create anonymous branches like in Mercurial, but it’s not common practice and it would produce an error in old versions. So, for the rest of this document, we will suppose it still produces the error. Why is a name needed? On the one hand, you worked on the default branch and moved the head forward. On the other hand, someone did the same in parallel and ended up with the head somewhere else. If you then pull changes, a single head would have to point either to your last commit or to theirs, and it cannot point to both. That’s why you need to give names to heads (or branches; at this point you can guess both terms are nearly equivalent). The fact that you can (and probably should) remove one of the heads after merging work means that your branch name doesn’t really have to be unique or special. Also, in Git you can bring in changes without being forced or prompted to merge (strictly speaking, this is “git fetch”; “git pull” is a fetch followed by a merge). After all, one of the heads is the active one, and having another head with another name pointing to another branch does not prevent you from committing more changes to your active head and moving it forward. You can merge whenever you want to and change the active head at will. The usual Git practice of having named branches for everything makes it a bit easier to jump to any head without confusion and to get an idea of what you were doing in that branch.

While I still have to mention Mercurial’s named branches, these differences already translate to different workflows for both Git and Mercurial.

In Mercurial, branching is simpler, as you don’t need to do anything special. You simply clone the repository and start working. When you’re done, you (or someone else) pull and merge. The same applies when you want to branch your own repository. Let’s suppose you have a repository which is the official branch and, just like any other developer would, you want to work on an experimental feature in its own branch so as not to mess up your main copy. To do that, you clone the repository to some other directory on your hard drive. In addition, as pulls and merges are somewhat tied together, you should keep different branches in different repository clones until you’re ready to merge.

In Git, the process is not as simple but it is more flexible. When working on a feature, you should create a new named branch explicitly. This is especially important to remember if you’re a Mercurial user. So if you want to contribute something to a project, you clone the repository but do not immediately start working. First, you should create a branch for your work. The advantage is that you are likely to work in the same directory all the time. Also, the explicit heads allow you to continue working after a pull, before merging, and the named heads help you keep track of what you were doing. In Mercurial, this is usually achieved by giving useful names to your repository clones. If your main repository sits at directory “foo”, you will usually clone that repository to another one called “foo-new-feature” to know what you were working on in that branch.

Despite everything I said above about Mercurial branches being anonymous, in Mercurial there are also named branches. However, they are a different concept. In Mercurial, the name of the branch is stored with each commit. This means that branches cannot be deleted as they are in Git (in Git, deleting a branch is simply deleting the arrow). You can create new named branches and commit changes to them, and you can merge work from one branch to another just like you merge anonymous branches. However, the inability to delete a named branch means that they should have unique names, and that they should be used for a different purpose. My impression is that, in Git, there is no difference between a short-term branch (a branch created to implement a new feature or fix a bug) and a long-term branch (like a branch for a stable release that only receives commits fixing security issues or annoying bugs). In Mercurial, anonymous branches are used for the first case and named branches are used for the second case.
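The difference between the two kinds of name can be sketched in Python. This is only an illustration under assumptions: the `Changeset` record, the `branch_names` helper and the `git_heads` dictionary are hypothetical and far simpler than what either tool really stores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Changeset:
    """Mercurial-style record: the branch name is part of the changeset."""
    ident: str
    parents: tuple
    branch: str


def branch_names(changesets):
    """Branch names are derived from the changesets that carry them."""
    return sorted({cs.branch for cs in changesets})


if __name__ == "__main__":
    # Mercurial-style: as long as changeset "1" exists, so does "stable".
    history = [
        Changeset("0", (), "default"),
        Changeset("1", ("0",), "stable"),
    ]
    print(branch_names(history))

    # Git-style: the name lives outside the commits, in a separate table
    # of arrows, so removing the name leaves the commits untouched.
    git_heads = {"master": "c1", "stable": "c2"}
    del git_heads["stable"]
    print(sorted(git_heads))
```

In the first model a name can only disappear if the changesets carrying it disappear, which is why such names suit long-lived lines of development.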

Documentation

The documentation quality varies from one project to another. Mercurial has always had its documentation well organized. Its manpages are good and describe the features in a comprehensible way. Some time ago, and still in some aspects, the documentation of Git was not very good. It has improved a lot, however, and many unofficial books and tutorials have appeared to fill the gaps in the official documentation, the most notable example being the community book I mentioned above. Even so, as you can see, its documentation is not fully unified. For example, for an aspect as important as branching, I would have expected the book to have a clear and comprehensive explanation such as the one present in the tutorial “Understanding Git Conceptually”, also mentioned above. However, the book doesn’t have one, and the tutorial lacks the beautiful and easy graphical explanation of the object model found in the book. So, to fully understand Git when you’re still a novice user, I think it’s still partly true that you cannot read one single document. You need to get information from various sources.

This changes when you already know the principles of Git and are looking for specific information about a command. The manpages of Git are very well written and provide lots of clear examples, even with ASCII diagrams like the ones I used above.

GitHub and Bitbucket

GitHub and Bitbucket are two sites that provide hosting for projects using Git or Mercurial, respectively, to manage source code. They both have non-free plans that give projects additional hosting services or higher limits in several aspects, but they both have free plans so any developer can host projects on them. GitHub appeared first and is probably responsible for a good amount of Git’s popularity. It provided an easy way to do distributed development. Any user can register on the site and host their own repositories, fork existing ones and communicate with other projects easily by requesting pulls, etc. Bitbucket appeared later as a clone of GitHub for people preferring Mercurial to Git. Nowadays, they differ a bit in some aspects but they let you do essentially the same things: upload a repository and get a project wiki you can use to write news, documentation, FAQs or whatever you want. Basically, your project can live entirely on GitHub or Bitbucket, both the source code and its documentation.

Despite sharing a lot of things in common, there are things you can miss in GitHub from Bitbucket and vice versa. Some examples:

  • GitHub has statistics.
  • GitHub is more popular.
  • Bitbucket has an integrated issue tracker.
  • Bitbucket’s wiki is versioned (you can work on it from your local hard drive and then push changes).

Conclusion

I hope this comparison is easy enough to understand for novice users of both programs, and I hope to have been specific enough for experts in one of the two programs wanting to use or know the other one. I would say that both Git and Mercurial seem to be converging over time in many of the additional features, while keeping the base a bit different. For example, some months ago I checked Git and it didn’t have a feature I loved from Mercurial: the ability to create a bundle (a file containing a group of commits/changesets) that could be sent by email or carried on a USB stick. Nowadays, it can do that too. From Git, I loved the ability to put your local changes aside (the git-stash operation) to do something else, and the ability to select specific difference chunks in files to be committed (you don’t need to commit all the changes inside a single file). Nowadays, Mercurial can do that too. And I suppose the same goes for their respective hosting sites GitHub and Bitbucket. I feel both will converge to provide the same functionality.

I have intentionally not talked about the Git staging area, leaving it for the end. It is well described in most tutorials and documents you can read about Git, so let me just mention that Git introduced a new concept that other tools lack (for better or worse), which is the staging area. In Git, when you want to commit something you have to “prepare” the commit first, indicating which content you want to commit. This is performed with the “add” command. With it, you take snapshots of the current state of files and record them in an area ready to be committed. This provides a lot of flexibility when creating commits and performing some operations, but it also has its dangers. For example, if you modify a file, prepare it to be committed with “git add file” and then modify the file again to make a minor correction, you need to remember to add it again, or you will commit the file as it was before the correction. The commit operation has the “-a” option to minimize the probability of making this mistake.
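The pitfall described above is easy to reproduce with a toy model in Python. The `ToyRepo` class below is a made-up sketch, not Git’s real index: it keeps file contents in plain dictionaries, and its `all_tracked` flag only imitates the spirit of the “-a” option for files that were already added.

```python
class ToyRepo:
    """Toy model of a working directory, a staging area and a commit list."""

    def __init__(self):
        self.working = {}  # path -> content as it currently is on disk
        self.index = {}    # path -> content snapshotted by add()
        self.commits = []  # each commit is a copy of the index

    def add(self, path):
        """Snapshot the current content of `path` into the staging area."""
        self.index[path] = self.working[path]

    def commit(self, all_tracked=False):
        if all_tracked:
            # Re-snapshot every file that is already in the staging area.
            for path in list(self.index):
                if path in self.working:
                    self.index[path] = self.working[path]
                else:
                    del self.index[path]
        self.commits.append(dict(self.index))
        return self.commits[-1]


if __name__ == "__main__":
    repo = ToyRepo()
    repo.working["file"] = "first version"
    repo.add("file")
    repo.working["file"] = "first version, with a minor correction"
    # The correction was never added, so it is not part of this commit.
    print(repo.commit())
    # Re-snapshotting tracked files first picks the correction up.
    print(repo.commit(all_tracked=True))
```

The point to take away is that “add” records content at the moment it is run, not a promise to commit whatever the file contains later.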

Another important comment I need to make before closing this article is that Git has a lot of operations to manipulate the project’s history (e.g. git-rebase), while Mercurial does not. In Mercurial, the concept of history is a bit more immutable. Some people will like the immutability of Mercurial, while some others will prefer to be able to mutate history like Git allows you to do.

The only opinion I’m going to give over all of this is that I think both Git and Mercurial are great.

Some time later…

Events that happened, or things I found out, after writing the article above.
