Original title: Software Configuration Management

Original link: http://sunoano.name/ws/public_xhtml/scm.html

Original author: Suno Ano

Translated title: Software Configuration Management - The GIT Part FIXME

Translator: aXqd

Abstract

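Source code management (SCM) can be seen as a subset of software configuration management (also SCM), which in turn is part of configuration management (CM). In this article, the abbreviation SCM refers specifically to software configuration management. The question SCM deals with is: if someone has done some work, how can we reproduce it? What we usually need is not to reproduce the current final result, but to reproduce its incremental changes in a controlled way. The key to the problem is therefore to compare different results and analyze how they differ. SCM is precisely "a set of activities designed to control change". SCM achieves this "by identifying the work products that are likely to change, establishing relationships among them, defining mechanisms for managing different versions of these work products, controlling the changes imposed, and auditing and reporting on the changes made". In other words, SCM is a set of methods for controlling and managing the software development process.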

"You're not using GIT yet, are you?" said the bunny.

[bunny pic] FIXME

A few words before we start

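  • Whenever I get into a new field, I start with the FAQ, and I suspect many readers do the same. Since this article is mainly about GIT (three meaningless random letters), it links to the GIT FAQ instead of repeating it; reproducing the GIT FAQ here would simply be pointless.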
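  • Many people seem not to like, or rather not to trust, GIT at first, and I was no exception. Now, however, I have thrown myself completely into GIT's arms (see the notes below). I like to call a spade a spade, and of all the criticism aimed at GIT, one point is fair: "it is hard to find quick documentation on publishing a local repository". Even today (February 2009) we still have to create a so-called bare repository locally first and then move it to a remote server with tools such as scp, rsync or sftp. FIXME In my view, this is the only shortcoming GIT still has. For everything else, read on...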
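  • I am really tired of seeing people on the web complain endlessly about how bad GIT's documentation is and how full of circular references it is. They claim it talks about bare repositories, refs, reflogs, refspecs and so on everywhere, without ever explaining what these terms mean. In fact, GIT's documentation has no circular-reference problem at all, and it is very well written. These concepts come up again and again precisely because they are at the core of GIT. So, to those who keep complaining and talking nonsense: please take the time to read GIT's documentation carefully.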
  • If you want to publish a GIT repository, for example by putting it on a server so that others can push and pull, click here. FIXME
  • Since this article contains a lot of information, those looking for a quick-start tutorial should follow me down the rabbit hole FIXME :P

[bunny jump pic] FIXME

My thoughts on the matter

If you want to know which SCM systems exist today and how they compare, here is a list and a comparison. There is also a comparison of SVN and GIT.

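As of now (August 2008), I mainly use GIT (three meaningless random letters) to manage code, as well as for other software and data management work. Before that I spent almost two years using SVN (Subversion) for the same work, and before that I used an even wider range of SCM systems, including CVS (Concurrent Versions System), GNU Arch and Darcs.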

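These days I mainly use GIT and SVN, plus a little CVS and GNU Arch now and then. GIT is what I use for my own projects and for projects I actively take part in. I also contribute code to some projects that use SVN, but even so, my main use of SVN is limited to fetching HEAD from a remote repository into a local copy. CVS and GNU Arch are used only to update local working copies; I no longer develop with them.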

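Roughly speaking, I ended up using only two (or, more precisely, one) SCM systems because I am trying to unify all my work and to throw out every redundant piece I am aware of. I no longer need or want two or more things that provide the same functionality for the same job. The benefit is that you save time for other things, and you can focus on what you have chosen, learn its details, and work more efficiently.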

Why GIT

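Note that GIT is very different from most of the SCM systems we are familiar with. Subversion, CVS, Perforce, Mercurial and similar delta-storage systems store the differences between commits. GIT does not: every time we commit, it stores all the data of the project as a snapshot, in a tree structure. This is a very important concept to understand when using GIT. Let me now explain a few reasons why, out of the many SCM systems out there, I finally chose GIT: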

[git pic] FIXME

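  • Repository: GIT is a distributed SCM system (by contrast, systems like SVN are called centralized SCMs; some people put SVK on top of SVN, but I found that this makes things even more complicated and still does not deliver some of the things GIT does so inventively). In Subversion, each project has a single repository stored centrally in one place; all history is kept there, and all check-ins and checkouts go through it. GIT works differently. Every copy of the project tree (also called a working copy) carries its own repository (in the .git directory at the root of the project tree), so we can have both local and remote branches. We can also create a so-called bare repository that is not attached to any working copy, which is especially useful when publishing a repository.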
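    • GIT can import and merge patches quickly, which allows a single maintainer to process incoming patches at a very high rate. When patches arrive faster than that, 'git pull' gives the maintainer an easy way to delegate this work to others while still being able to review selected patches.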
    • Because of GIT's distributed nature, data cannot be lost just because a central repository gets corrupted. If there are N repositories, we have N redundant backups.
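    • Since every developer's repository holds a complete copy of the same project history, no repository is special, and other developers can easily take over maintenance of the project. This may happen by mutual agreement, or simply because a maintainer is no longer active or has become hard to work with.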
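    • Another reason GIT (and every other distributed system) avoids conflict is a change in how the community collaborates, which stems from the distributed nature of the repositories. Because we keep pulling code from many sources, submitting single, atomic changes becomes a requirement for taking part. If someone is crazy enough to commit nothing but huge piles of messy code, we can simply not pull from them. This is unlike a centralized system, where we have to deal with every change anyone commits.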
    • Having no core committer group also means we hardly ever need to argue about who is in and who is out.
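  • Metadata: One very annoying thing about SVN is that it scatters metadata everywhere. GIT keeps it in a single place: the .git directory at the root of the working copy. Everything lives in that one directory, instead of .svn directories spread all over the tree.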
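  • URLs: In Subversion, a URL specifies the location of the repository and a path within it, so we manage the repository's directory layout and its meaning ourselves; typically there are ../trunk/, ../branches/ and ../tags/ directories. In GIT, a URL specifies only the location of the repository, which usually contains branches and tags. One of the branches is the default (usually named master).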
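  • Revisions: Subversion identifies revisions with monotonically increasing decimal numbers (IDs). The IDs are usually small (although in big projects they easily reach the hundreds or thousands). This approach is not practical for GIT, which identifies revisions by SHA1 IDs: 160-bit numbers, written as 40 hexadecimal digits. That looks intimidating at first, but in practice it is not a big deal. You can simply use HEAD for the latest revision, HEAD^ for the one before it, and HEAD^^ or HEAD~2 (and so on) to go further back. Copy and paste also goes a long way, and usually we only need to type the first few digits of an ID (as long as they are unique); GIT completes the rest. Revision specifiers allow more advanced operations; see the manpage of 'git rev-parse'.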
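  • Commands: GIT commands generally take the form 'git <command>'. In the past you could also use the form 'git-<command>', but that form is now deprecated, and since v1.6 the dashed commands are no longer installed in the default PATH. CLI fans at home on Unix-like systems will find GIT far more familiar and comfortable than the bloated UIs of other SCMs, and GIT lets you get started in very little time.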
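  • Commits: Every commit records an author and a committer. The former records who made the change and when; the latter records who committed the change and when. (GIT is designed to make it easy to apply patches received by email, in which case the author and the committer may differ.)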
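  • Net: With a centralized SCM system (e.g. SVN), developers need a permanent, fast Internet connection to work effectively. Likewise, if they want to trace how a function moved across several files, they have to go over the network to get that information. With GIT, we only need a network connection when pushing to or pulling from a remote branch.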
  • Speed/Resources: GIT is very fast compared with other SCM systems, and it uses less disk space than the SCM systems I used before.
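  • Community: GIT has a very strong and large community (as does SVN), so development and support are excellent, whether you turn to IRC (Internet Relay Chat), a ML (mailing list) or take part in a sprint.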
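The revision notation from the Revisions item above can be modelled in a few lines of Python. This is a hypothetical sketch, not how git implements 'git rev-parse': the commit graph, the made-up 40-character IDs and the function names are assumptions for illustration, and only a small subset of the syntax (a ref name or abbreviated ID followed by ^, ^N, ~ or ~N) is handled. It follows the documented meaning of these operators: ^N picks the N-th parent of a commit, while ~N walks back N generations along first parents, which is why HEAD^^ and HEAD~2 name the same commit.

```python
import re


def fake_id(prefix: str) -> str:
    """Pad a short hex string to a 40-character name (made-up IDs for this sketch)."""
    return (prefix + "0" * 40)[:40]


ROOT, SECOND, SIDE, TIP = (fake_id(p) for p in ("0a1b", "5e6f", "5e61", "9d8c"))

# Hypothetical history: commit -> parents, first parent first. TIP is a merge commit.
PARENTS = {ROOT: [], SECOND: [ROOT], SIDE: [ROOT], TIP: [SECOND, SIDE]}
REFS = {"HEAD": TIP}


def expand_prefix(prefix: str) -> str:
    """Complete an abbreviated commit name; it must match exactly one commit."""
    matches = [name for name in PARENTS if name.startswith(prefix.lower())]
    if len(matches) != 1:
        raise ValueError(f"{prefix!r} matches {len(matches)} commits")
    return matches[0]


def resolve(spec: str) -> str:
    """Resolve a small subset of revision syntax: NAME followed by ^, ^N, ~ or ~N."""
    m = re.fullmatch(r"([^~^]+)((?:[~^]\d*)*)", spec)
    if m is None:
        raise ValueError(f"unsupported revision: {spec!r}")
    base, suffix = m.groups()
    commit = REFS[base] if base in REFS else expand_prefix(base)
    for op, digits in re.findall(r"([~^])(\d*)", suffix):
        n = int(digits) if digits else 1
        if op == "^":
            if n > 0:  # ^N selects the N-th parent; ^0 is the commit itself
                commit = PARENTS[commit][n - 1]
        else:
            for _ in range(n):  # ~N follows first parents N times
                commit = PARENTS[commit][0]
    return commit


for spec in ("HEAD", "HEAD^", "HEAD^2", "HEAD~2", "5e6f"):
    print(spec, "->", resolve(spec)[:4])
# prints one line per spec: HEAD -> 9d8c, HEAD^ -> 5e6f, HEAD^2 -> 5e61, HEAD~2 -> 0a1b, 5e6f -> 5e6f
```

An abbreviation such as "5e6" is rejected here because it matches two commits, mirroring the "as long as it is unique" rule described above.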

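Of course I could give other reasons, but these are the main ones why I chose GIT. I even import code from other SCM systems into GIT, work on it in GIT, and when I am done, export the code back to whatever SCM system the project in question uses. I will describe this in more detail later.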
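To illustrate the snapshot model described at the start of this section, here is a minimal Python sketch. It is a simplified assumption, not git's storage code: a "commit" is reduced to a mapping from file path to blob name (no tree or commit objects), and the class and method names are hypothetical. The blob names, however, are computed the way git names blob objects (SHA1 over a "blob <size>" header, a NUL byte and the file content), so identical contents get identical names and an unchanged file costs nothing extra in a new snapshot.

```python
import hashlib


def blob_id(content: bytes) -> str:
    """Object name of a blob: SHA1 over a 'blob <size>' header, a NUL byte, then the content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


class SnapshotStore:
    """Toy model: each commit records the complete file list (path -> blob name),
    while blob contents are stored once per distinct content."""

    def __init__(self) -> None:
        self.objects = {}    # blob name -> content
        self.snapshots = []  # one {path: blob name} mapping per commit

    def commit(self, files: dict[str, bytes]) -> int:
        snapshot = {}
        for path, content in files.items():
            name = blob_id(content)
            self.objects.setdefault(name, content)  # unchanged content is not stored again
            snapshot[path] = name
        self.snapshots.append(snapshot)
        return len(self.snapshots) - 1

    def checkout(self, number: int) -> dict[str, bytes]:
        # A snapshot is read back directly; no chain of diffs has to be replayed.
        return {path: self.objects[name] for path, name in self.snapshots[number].items()}


store = SnapshotStore()
store.commit({"README": b"hello\n", "main.c": b"int main(void) { return 0; }\n"})
store.commit({"README": b"hello\n", "main.c": b"int main(void) { return 1; }\n"})
print(len(store.snapshots), len(store.objects))  # prints: 2 3
```

Real git also writes tree and commit objects and can pack objects into compressed pack files, which may use deltas internally; the sketch only models the logical "every commit is a full snapshot" view.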

GIT Glossary and Concepts

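I have deliberately put this section here rather than at the end of the article. That way you can at least skim these concepts, then read the rest of the article, and finally come back here for a detailed read.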

Glossary

  • alternate object database
    Via the alternates mechanism, a repository can inherit part of its object database from another object database.
  • bare repository
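    A bare repository is normally an appropriately named directory with a .git suffix that has no checked-out copy of any of the files under version control. That is, all of the git administrative and control files that would normally live in the hidden .git subdirectory are placed directly in the repository.git directory instead, and no other files are present or checked out (there is no working directory). Bare repositories are usually created by the publishers of public repositories.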
  • blob object
    An untyped object, e.g. the contents of a file.
  • branch
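    An acyclic graph of revisions (i.e. the complete history of a particular revision, which is called the branch head). Branch heads are stored in the .git/refs/heads/ directory.
    A branch is an active line of development. The most recent commit on a branch is called the tip of that branch. The branch head points at this tip and moves forward as development on the branch proceeds. A single git repository can track any number of branches, but your working tree is associated with just one of them (the currently checked-out branch), and HEAD points to that branch.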
  • cache
    Obsolete name for what is now called the index.
  • chain
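    A list of objects, where each object in the list contains a reference to its successor (for example, the successor of a commit could be one of its parents).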
  • changeset
    The BitKeeper/cvsps term for a commit. Since git does not store changes but states, the term makes little sense in git.
  • checkout
    The action of updating the working tree to a revision from the object database.
  • cherry-picking
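    In SCM jargon, to cherry-pick means to choose a subset of changes out of a series of changes (usually commits) and apply them as a new series of changes on top of a different codebase. In GIT this is done with the 'git cherry-pick' command, which extracts the change introduced by an existing commit and records it as a new commit on top of the tip of the current branch.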
  • clean
    A working tree is clean if and only if it matches the revision referenced by the current HEAD. See also "dirty".
  • commit
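    As a noun: a single point in the git history; the entire history of a project is a set of interrelated commits. GIT uses the word commit where other version control systems use revision or version. Commit is also used as short for commit object.
    As a verb: the action of storing a snapshot of the project's state in the git history, by creating a new commit that represents the current state of the index and advancing HEAD to point at the new commit.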
  • commit object
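    An object containing information about a particular revision, such as its parents, committer, author, date, and the tree object that corresponds to the root directory of the stored revision.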
  • core git
    The fundamental data structures and utilities of GIT. Exposes only limited source code management tools.
  • DAG
    Directed acyclic graph. Commit objects form a directed acyclic graph because they have parents (directed) and the graph of commit objects is acyclic (there is no chain that begins and ends with the same object).
  • dangling object
    An unreachable object which is not reachable even from other unreachable objects; a dangling object has no references to it from any reference or object in the repository. See here for more information.
  • detached HEAD
    Normally the HEAD stores the name of a branch. However, git also allows you to check out an arbitrary commit that is not necessarily the tip of any particular branch. In this case HEAD is said to be detached.
  • dircache
    See index.
  • directory
    The list you get with ls.
  • dirty
    A working tree is said to be dirty if it contains modifications which have not been committed to the current branch.
  • ent
    A favorite synonym for tree-ish among some total geeks. Avoid this term so as not to confuse people.
  • evil merge
    An evil merge is a merge that introduces changes that do not appear in any parent.
  • fast forward
    A fast-forward is a special type of merge where you have a revision and you are merging another branch's changes that happen to be a descendant of what you have. In such cases, you do not make a new merge commit but instead just update to that revision. This will happen frequently on a tracking branch of a remote repository.
  • fetch
    Fetching a branch means to get the branch's head ref from a remote repository, to find out which objects are missing from the local object database, and to get them, too. See also man 1 git-fetch.
  • file system
    Linus Torvalds originally designed git to be a user space file system, i.e. the infrastructure to hold files and directories. That ensured the efficiency and speed of git.
  • git archive
    Synonym for repository (for arch people).
  • grafts
    Grafts enable two otherwise different lines of development to be joined together by recording fake ancestry information for commits. This way you can make git pretend the set of parents a commit has is different from what was recorded when the commit was created. Grafts are configured via the .git/info/grafts file.
  • hash
    In GIT's context, a synonym for object name.
  • head
    A named reference to the commit at the tip of a branch. Heads are stored in $GIT_DIR/refs/heads/, except when using packed refs. (See man 1 git-pack-refs.)
  • HEAD
    The current branch. In more detail: Your working tree is normally derived from the state of the tree referred to by HEAD. HEAD is a reference to one of the heads in your repository, except when using a detached HEAD, in which case it may reference an arbitrary commit.
  • head ref
    A synonym for head.
  • hook
    During the normal execution of several GIT commands, call-outs are made to optional scripts that allow a developer to add functionality or checking. Typically, the hooks allow for a command to be pre-verified and potentially aborted, and allow for a post-notification after the operation is done. The hook scripts are found in the $GIT_DIR/hooks/ directory, and are enabled by simply removing the .sample suffix. More information can be found with man 5 githooks.
  • index
    A collection of files with stat information, whose contents are stored as objects. The index is a stored version of your working tree. Truth be told, it can also contain a second, and even a third version of a working tree, which are used when merging.
  • index entry
    The information regarding a particular file, stored in the index. An index entry can be unmerged, if a merge was started, but not yet finished (i.e. if the index contains multiple versions of that file).
  • master
    The default development branch. Whenever you create a git repository, a branch named master is created, and becomes the active branch. In most cases, this contains the local development, though that is purely by convention and is not required.
  • merge
    As a verb: To bring the contents of another branch (possibly from an external repository) into the current branch. In the case where the merged-in branch is from a different repository, this is done by first fetching the remote branch and then merging the result into the current branch. This combination of fetch and merge operations is called a pull. Merging is performed by an automatic process that identifies changes made since the branches diverged, and then applies all those changes together. In cases where changes conflict, manual intervention may be required to complete the merge.
    As a noun: unless it is a fast forward, a successful merge results in the creation of a new commit representing the result of the merge, and having as parents the tips of the merged branches. This commit is referred to as a merge commit, or sometimes just a merge.
  • merge base
    The common ancestor of two or more commits.
  • object
    The unit of storage in git. It is uniquely identified by the SHA1 of its contents. Consequently, an object cannot be changed without changing its SHA1 hash.
  • object database
    Stores a set of objects, and an individual object is identified by its object name. The objects usually live in $GIT_DIR/objects/.
  • object identifier
    Synonym for object name.
  • object name
    The unique identifier of an object. The hash of the object's contents using the SHA1 (Secure Hash Algorithm 1) and usually represented by the 40 character hexadecimal encoding of the hash of the object (possibly followed by a white space).
  • object type
    One of the identifiers commit, tree, tag or blob describing the type of an object.
  • octopus
    To merge more than two branches (tentacles) into one resulting branch (head) — thus the octopus metaphor.
  • origin
    The default upstream repository. Most projects have at least one upstream project which they track. By default origin is used for that purpose. New upstream updates will be fetched into remote tracking branches named origin/name-of-upstream-branch, which you can see using git branch -r.
  • pack
    A set of objects which have been compressed into one file (to save space or to transmit them efficiently).
  • pack index
    The list of identifiers, and other information, of the objects in a pack to assist in efficiently accessing the contents of a pack.
  • parent
    A commit object contains a (possibly empty) list of the logical predecessor(s) in the line of development, i.e. its parents.
  • pickaxe
    The term pickaxe refers to an option to the diffcore routines that help select changes that add or delete a given text string. With the --pickaxe-all option, it can be used to view the full changeset that introduced or removed, say, a particular line of text. See man 1 git-diff.
  • plumbing
    Cute name for core git.
  • porcelain
    Cute name for programs and program suites depending on core git, presenting a high level access to core git. Porcelains expose more of a SCM interface than the plumbing.
  • pull
    Pulling a branch means to fetch it and merge it. See also man 1 git-pull.
  • push
    Pushing a branch means to get the branch's head ref from a remote repository, find out if it is a direct ancestor of the branch's local head ref, and in that case, put all objects that are reachable from the local head ref and missing from the remote repository into the remote object database, and update the remote head ref. If the remote head is not an ancestor of the local head, the push fails.
  • reachable
    All of the ancestors of a given commit are said to be reachable from that commit. More generally, one object is reachable from another if we can reach the one from the other by a chain that follows tags to whatever they tag, commits to their parents or trees, and trees to the trees or blobs that they contain.
  • rebase
    To reapply a series of changes from a branch to a different base, and reset the head of that branch to the result.
  • ref
    A 40-byte hex representation of a SHA1 or a name that denotes a particular object. These may be stored in $GIT_DIR/refs/.
  • reflog
    A reflog shows the local history of a ref. It is a mechanism to record when the tips of branches are updated. In other words, it can tell you things like what the third-last revision in this repository was, and what the state of this repository was yesterday at 9:14pm. See man 1 git-reflog for details. See here for when a reflog might turn out to be useful.
  • refspec
    A refspec is used by git fetch and git push to describe the mapping between remote refs and local refs. They are combined with a colon in the format <src>:<dst>, preceded by an optional plus sign, +. For example: git fetch $URL refs/heads/master:refs/heads/origin means grab the master branch head from the $URL and store it as my origin branch head. And git push $URL refs/heads/master:refs/heads/to-upstream means publish my master branch head as to-upstream branch at $URL. See also man 1 git-push.
  • repository
    A collection of refs together with an object database containing all objects which are reachable from the refs, possibly accompanied by metadata from one or more porcelains. A repository can share an object database with other repositories via the alternates mechanism.
  • resolve
    The action of fixing up manually what a failed automatic merge left behind.
  • revision
    A particular state of files and directories which was stored in the object database. It is referenced by a commit object.
  • rewind
    To throw away part of the development, i.e. to assign the head to an earlier revision.
  • SCM
    Software Configuration Management. As a noun, it mostly describes a particular tool or set of tools. As a verb, it means literally doing software configuration along with various management tasks.
  • SHA1 (Secure Hash Algorithm 1)
    The SHA1 hash; in GIT's context, a synonym for object name.
  • shallow repository
    A shallow repository has an incomplete history some of whose commits have parents cauterized away (in other words, git is told to pretend that these commits do not have the parents, even though they are recorded in the commit object). This is sometimes useful when you are interested only in the recent history of a project even though the real history recorded in the upstream is much larger. A shallow repository is created by giving the --depth option to git-clone(1), and its history can later be deepened with git-fetch(1).
  • symref
    Symbolic reference: instead of containing the SHA1 id itself, it is of the format ref: refs/some/thing and when referenced, it recursively dereferences to this reference. HEAD is a prime example of a symref. Symbolic references are manipulated with the git-symbolic-ref(1) command.
  • tag
    A ref pointing to a tag or commit object. In contrast to a head, a tag is not changed by a commit. Tags (not tag objects) are stored in $GIT_DIR/refs/tags/. A git tag has nothing to do with a Lisp tag (which would be called an object type in GIT's context). A tag is most typically used to mark a particular point in the commit ancestry chain.
  • tag object
    An object containing a ref pointing to another object, which can contain a message just like a commit object. It can also contain a (GPG/PGP) signature, in which case it is called a signed tag object.
  • topic branch
    A regular git branch that is used by a developer to identify a conceptual line of development. Since branches are very easy and inexpensive, it is often desirable to have several small branches that each contain very well defined concepts or small incremental yet related changes.
  • tracking branch
    A regular git branch that is used to follow changes from another repository. A tracking branch should not contain direct modifications or have local commits made to it. A tracking branch can usually be identified as the right-hand-side ref in a Pull: refspec.
  • tree
    Either a working tree, or a tree object together with the dependent blob and tree objects (i.e. a stored representation of a working tree).
  • tree object
    An object containing a list of file names and modes along with refs to the associated blob and/or tree objects. A tree is equivalent to a directory.
  • tree-ish
    A ref pointing to either a commit object, a tree object, or a tag object.
  • unmerged index
    An index which contains unmerged index entries.
  • unreachable object
    An object which is not reachable from a branch, tag, or any other reference.
  • working tree
    The tree of actual checked out files. The working tree is normally equal to the HEAD plus any local changes that you have made but not yet committed.
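Several of the entries above (DAG, reachable, merge base, fast forward, push, unreachable object) describe operations on the commit graph. A minimal Python sketch of them might look like this, assuming a hypothetical in-memory history keyed by short labels instead of SHA1 names; the function names are invented for illustration. merge_bases uses the "best common ancestor" rule (a common ancestor that is not an ancestor of another common ancestor) and computes it naively, whereas git itself walks the graph far more efficiently.

```python
from collections import deque

# Hypothetical history using short labels instead of SHA1 object names:
#
#   A---B---C-------F   master
#        \         /
#         D-------E     topic
#
# G is a commit on top of C that no ref points to any more (for example,
# because its branch was rewound), so it has become unreachable.
PARENTS = {
    "A": [],
    "B": ["A"],
    "C": ["B"],
    "D": ["B"],
    "E": ["D"],
    "F": ["C", "E"],  # merge commit: its parents are the tips of the merged branches
    "G": ["C"],
}


def reachable(start: str) -> set[str]:
    """start plus every commit reachable from it by following parent links."""
    seen, queue = set(), deque([start])
    while queue:
        commit = queue.popleft()
        if commit not in seen:
            seen.add(commit)
            queue.extend(PARENTS[commit])
    return seen


def is_ancestor(a: str, b: str) -> bool:
    """True if a is b itself or one of b's ancestors."""
    return a in reachable(b)


def merge_bases(a: str, b: str) -> set[str]:
    """Common ancestors of a and b that are not ancestors of another common ancestor."""
    common = reachable(a) & reachable(b)
    return {c for c in common if not any(o != c and is_ancestor(c, o) for o in common)}


def push_allowed(remote_head: str, local_head: str, force: bool = False) -> bool:
    """Without force, a push succeeds only if the remote head is an ancestor of the local head."""
    return force or is_ancestor(remote_head, local_head)


def unreachable(refs: dict[str, str]) -> set[str]:
    """Commits that cannot be reached from any ref."""
    live = set().union(*(reachable(tip) for tip in refs.values()))
    return set(PARENTS) - live


REFS = {"refs/heads/master": "F", "refs/heads/topic": "E"}
print(merge_bases("C", "E"))   # {'B'}
print(push_allowed("E", "F"))  # True: moving E forward to F is a fast forward
print(push_allowed("C", "E"))  # False: C is not an ancestor of E
print(unreachable(REFS))       # {'G'}
```

Because merge_bases returns a set, it also covers histories with criss-cross merges, where two commits can have more than one best common ancestor.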
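The symref, detached HEAD and refspec entries can be made concrete with a small Python sketch. It assumes a hypothetical in-memory dictionary standing in for the files under $GIT_DIR (HEAD and refs/), a made-up object name, and invented function names; real git also handles packed refs, refspec shorthands and validation rules that are left out here, and the depth limit of 5 is an arbitrary choice for this sketch. The refspecs used are the glossary's own example and the wildcard form +refs/heads/*:refs/remotes/origin/*, which is what a default clone configures for fetching from origin.

```python
from typing import Optional

MASTER_ID = "0123456789abcdef0123456789abcdef01234567"  # made-up object name

# Hypothetical ref store: ref name -> file content, mirroring $GIT_DIR/HEAD and refs/heads/master.
REFS = {
    "HEAD": "ref: refs/heads/master\n",
    "refs/heads/master": MASTER_ID + "\n",
}


def resolve_ref(name: str, refs: dict[str, str], max_depth: int = 5) -> str:
    """Follow symbolic refs ("ref: <target>") until an object name is reached."""
    for _ in range(max_depth):
        value = refs[name].strip()
        if not value.startswith("ref: "):
            return value  # a plain ref (or a detached HEAD) holds the object name itself
        name = value[len("ref: "):]
    raise ValueError(f"symbolic ref chain too deep (last target: {name!r})")


def parse_refspec(spec: str) -> tuple[bool, str, str]:
    """Split "[+]<src>:<dst>" into (force, src, dst)."""
    force = spec.startswith("+")
    body = spec[1:] if force else spec
    src, _, dst = body.partition(":")
    return force, src, dst


def map_ref(spec: str, ref: str) -> Optional[str]:
    """Return the destination ref that `ref` maps to under `spec`, or None if it does not match."""
    _, src, dst = parse_refspec(spec)
    if "*" not in src:
        return dst if ref == src else None
    prefix, suffix = src.split("*", 1)
    if ref.startswith(prefix) and ref.endswith(suffix) and len(ref) > len(prefix) + len(suffix):
        matched = ref[len(prefix):len(ref) - len(suffix)]  # the part covered by "*"
        return dst.replace("*", matched, 1)
    return None


print(resolve_ref("HEAD", REFS) == MASTER_ID)  # True: HEAD -> refs/heads/master -> object name
print(map_ref("+refs/heads/*:refs/remotes/origin/*", "refs/heads/topic"))  # refs/remotes/origin/topic
print(map_ref("refs/heads/master:refs/heads/origin", "refs/heads/master"))  # refs/heads/origin
```

A detached HEAD fits the same model: the HEAD entry then holds an object name directly, and resolve_ref returns it without following any symbolic reference.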
 
software-configuration-management/translation.txt · Last modified: 2009-08-11 22:48 by axqd
 
Except where otherwise noted, content on this wiki is licensed under the following license:CC Attribution-Noncommercial-Share Alike 3.0 Unported