Clones
As I mentioned at the start of this section, Git is a distributed VCS. I use the word distributed, as opposed to centralized, to mean that Git does not assume a central server hosting your Git repository. That means that every “copy” of a Git repository is a self-sufficient repository of its own.
However, a simple copy is probably not what you want. The advantage of using a distributed VCS is that you can push changes to (or pull changes from) another repository. In fact, you can set up a copy of a repository to pull changes from one repository and push changes to a different one, or vice versa.
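As a minimal sketch of that last point, the commands below set up a working copy that fetches from one repository and pushes to another. The directory names upstream, publish.git, and work_copy and the "Demo" identity are hypothetical, and everything happens in a throwaway temporary directory.

```shell
set -e
# Work in a throwaway directory so nothing in your home directory is touched.
work=$(mktemp -d)
cd "$work"
# Hypothetical identity so commits succeed on a machine with no Git config.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# A source repository with one commit on master.
git init -q upstream
git -C upstream symbolic-ref HEAD refs/heads/master
echo "Some text." > upstream/file1.txt
git -C upstream add file1.txt
git -C upstream commit -q -m "Initial commit."

# A second, bare repository that will receive pushes.
git init -q --bare publish.git

# The working copy fetches from upstream but pushes to publish.git.
git clone -q upstream work_copy
git -C work_copy remote set-url --push origin "$work/publish.git"
git -C work_copy remote -v

# This push goes to publish.git, not to upstream.
git -C work_copy push -q origin master
fetch_url=$(git -C work_copy remote get-url origin)
push_url=$(git -C work_copy remote get-url --push origin)
```

The output of git remote -v should list two different locations for origin, one marked (fetch) and one marked (push).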
Let’s take a look at a simple way to copy a repository using git clone.
cd
if [ -d "my_clone" ]; then
rm -rf my_clone
fi
git clone ~/my_project my_clone
Let’s remind ourselves what the log looks like in the my_project repository.
cd ~/my_project
git log --oneline
And now let’s look at the log of our new clone.
cd ~/my_clone
git log --oneline
Notice that there is extra information in the clone’s log. Just like in the my_project repository, it indicates that you are positioned at the end (HEAD) of the master branch’s history, ready to add your next commit. However, it also says that this position in the history corresponds to origin/master and origin/HEAD. What are those?
Remotes
The name origin refers to the “original repository,” that is, the one we cloned from. To see what origin corresponds to, we use git remote -v (the -v says to be verbose and display each remote’s location along with its name).
git remote -v
This tells you that the name origin is just shorthand for the original repository, my_project. You can also see that origin is being used for both fetch (or pull) and push operations. To understand how this works, we need to go back to our original repository and make some more commits.
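To make origin/master and origin/HEAD a little more concrete, here is a small sketch that inspects them in a fresh clone. It builds its own throwaway my_project and my_clone in a temporary directory (with a hypothetical "Demo" identity), so it does not touch the repositories used in this walkthrough.

```shell
set -e
work=$(mktemp -d)
cd "$work"
# Hypothetical identity so commits succeed on a machine with no Git config.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q my_project
git -C my_project symbolic-ref HEAD refs/heads/master
echo "Some text." > my_project/file1.txt
git -C my_project add file1.txt
git -C my_project commit -q -m "Initial commit."

git clone -q my_project my_clone
cd my_clone

# The location that the name "origin" stands for.
git remote get-url origin

# Remote-tracking branches: the clone's record of the branches in origin.
git branch -r

# origin/HEAD is a symbolic ref that names origin's default branch.
origin_head=$(git symbolic-ref refs/remotes/origin/HEAD)
echo "$origin_head"

# Right after cloning, master and origin/master name the same commit.
local_tip=$(git rev-parse master)
remote_tip=$(git rev-parse origin/master)
```

In other words, origin/master is the clone’s bookmark for where master was in origin the last time the two repositories talked to each other.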
Pulls
We are now going to go back to our original repository and add some commits to it. Then we are going to pull those commits into our clone.
cd ~/my_project
echo "Even more text." >> file2.txt
git add file2.txt
git commit -m "Adding even more text to file2."
git log --oneline
Now let’s go back to our clone and see if anything changed.
cd ~/my_clone
git log --oneline
Notice that nothing has changed in our clone: the new commit we added to the origin repository doesn’t show up.
To get the new commit from my_project into our clone, we need to do a git pull.
git pull
git log --oneline
Now our new commit shows up and has been added to our clone.
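With default settings, a git pull is roughly a git fetch followed by a merge of the fetched branch. The sketch below performs the two steps separately so you can see the state in between. It uses its own throwaway repositories in a temporary directory, and the file names and "Demo" identity are hypothetical.

```shell
set -e
work=$(mktemp -d)
cd "$work"
# Hypothetical identity so commits succeed on a machine with no Git config.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q my_project
git -C my_project symbolic-ref HEAD refs/heads/master
echo "Some text." > my_project/file1.txt
git -C my_project add file1.txt
git -C my_project commit -q -m "Initial commit."
git clone -q my_project my_clone

# Add a commit to the origin repository after the clone was made.
echo "Even more text." >> my_project/file1.txt
git -C my_project commit -q -am "Adding even more text."

cd my_clone
# Step 1: fetch updates origin/master but leaves the local master alone.
git fetch -q origin
behind=$(git rev-list --count master..origin/master)
echo "master is $behind commit(s) behind origin/master"

# Step 2: merge origin/master into master (a fast-forward here).
git merge -q --ff-only origin/master
after=$(git rev-list --count master..origin/master)
echo "master is now $after commit(s) behind origin/master"
```

Fetching on its own is a safe way to look at incoming commits before deciding to merge them.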
What happens if we add commits to our clone, though?
Pushes
Let’s now make a commit to our clone.
echo "Random text" >> file3.txt
git add file3.txt
git commit -m "Adding random text to file3."
Let’s check the status of our clone.
git status
The master branch on our clone tracks the master branch on the origin, and we can see from the status message that our clone is 1 commit ahead of origin/master. To send the commits we made in our clone to the origin, we just need to git push them. …sort of.
Unfortunately, you can’t just do a simple git push because the origin repository currently has the master branch checked out. So, instead, we push the clone’s master branch into a new branch called newbranch on the origin repository.
git push origin master:newbranch
Now, if we go back to our origin repository and look at our branches, we see the following.
cd ~/my_project
git branch
And to get the new change into the master branch of the origin repository, we just have to merge.
git merge newbranch
git branch -d newbranch
git log --oneline
And we can now see that the new change we made to the clone has been pushed up to the origin repository.
“Pull Requests”
Why couldn’t we just do a simple git push like we could do a git pull? Why did it have to get so complicated?
The answer has to do with how Git repositories are supposed to work, and how to keep them safe from external pushes while you are doing work in them. Imagine that you were doing some work in your repository, and someone else cloned your repository and tried to push changes back into yours. You might immediately see conflicts show up and other weird behaviors that might be hard to predict. So, to prevent that scenario, Git by default refuses a push into an existing repository if the branch being pushed to is currently checked out there.
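You can watch this safeguard at work with a small experiment. The sketch below assumes a default Git configuration (no custom receive.denyCurrentBranch setting) and uses throwaway repositories in a temporary directory with a hypothetical "Demo" identity.

```shell
set -e
work=$(mktemp -d)
cd "$work"
# Hypothetical identity so commits succeed on a machine with no Git config.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q my_project
git -C my_project symbolic-ref HEAD refs/heads/master
echo "Some text." > my_project/file1.txt
git -C my_project add file1.txt
git -C my_project commit -q -m "Initial commit."
git clone -q my_project my_clone

# Make a commit in the clone.
echo "Random text" >> my_clone/file1.txt
git -C my_clone commit -q -am "Adding random text."

# Try a plain push to the branch that origin has checked out.
before=$(git -C my_project rev-parse master)
if git -C my_clone push -q origin master 2>"$work/push_error.txt"; then
    push_result="accepted"
else
    push_result="refused"
fi
after=$(git -C my_project rev-parse master)

echo "Plain push was $push_result"
# Git's explanation of the refusal is captured here.
cat "$work/push_error.txt"
```

The origin’s master branch is left exactly where it was, so whoever is working in that repository sees no surprise changes.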
One solution to this complication is what we did above: you can push into a new branch, and create the branch “on the fly.” Another solution is to make sure the origin repository is a bare repository. A bare repository is, essentially, just the .git directory in your repository directory; that is, there is no place for files to be “checked out,” so there is no branch that can ever be checked out. (To visualize this, instead of having our my_project/.git directory structure, imagine that we simply had my_project.git.) As long as there are no branches checked out, there will never be any weird syncing behavior when someone pushes their commits to the origin repository. (Repositories on GitHub are all bare repositories.)
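Here is a sketch of the bare-repository approach. It makes a bare copy of a throwaway my_project, clones from that bare copy, and then pushes with a plain git push. The directory names and "Demo" identity are hypothetical, and everything lives in a temporary directory.

```shell
set -e
work=$(mktemp -d)
cd "$work"
# Hypothetical identity so commits succeed on a machine with no Git config.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q my_project
git -C my_project symbolic-ref HEAD refs/heads/master
echo "Some text." > my_project/file1.txt
git -C my_project add file1.txt
git -C my_project commit -q -m "Initial commit."

# Make a bare copy: history only, no checked-out files.
git clone -q --bare my_project my_project.git
is_bare=$(git -C my_project.git rev-parse --is-bare-repository)
echo "my_project.git is bare: $is_bare"

# Clone from the bare repository and commit in the clone.
git clone -q my_project.git my_clone
echo "Random text" >> my_clone/file1.txt
git -C my_clone commit -q -am "Adding random text."

# A plain push to master works because nothing is checked out in origin.
git -C my_clone push -q origin master
clone_tip=$(git -C my_clone rev-parse master)
bare_tip=$(git -C my_project.git rev-parse master)
```

After the push, master in my_project.git and master in my_clone name the same commit.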
In general, though, it is usually much easier to just pull. You can set up the origin repository to fetch from the clone repository. Then, all you need to do is tell the owner of the origin repository that you have some changes they might want to pull into their repository. This is called a Pull Request, and it is a procedure that GitHub makes very easy.
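Without GitHub, the same idea can be sketched with plain Git: the owner of the original repository adds the clone as a remote, fetches from it, reviews the incoming commits, and merges them. The remote name contributor is hypothetical, as are the throwaway repositories and the "Demo" identity used below.

```shell
set -e
work=$(mktemp -d)
cd "$work"
# Hypothetical identity so commits succeed on a machine with no Git config.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q my_project
git -C my_project symbolic-ref HEAD refs/heads/master
echo "Some text." > my_project/file1.txt
git -C my_project add file1.txt
git -C my_project commit -q -m "Initial commit."
git clone -q my_project my_clone

# The clone's owner makes a commit and asks the original owner to pull it.
echo "Random text" >> my_clone/file1.txt
git -C my_clone commit -q -am "Adding random text."

# The original owner registers the clone as a remote and fetches from it.
cd my_project
git remote add contributor "$work/my_clone"
git fetch -q contributor

# Review what would come in, then merge it.
incoming=$(git rev-list --count master..contributor/master)
git log --oneline master..contributor/master
git merge -q --ff-only contributor/master

project_tip=$(git rev-parse master)
clone_tip=$(git -C "$work/my_clone" rev-parse master)
```

Because the owner pulls on their own schedule, nothing changes in their repository until they have had a chance to review the incoming commits.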