How to share code with Git on Windows at TUD

I (Seb) finally set up internal code sharing for our group at TUD. This is how it works on Windows. For general information about source control and how it can make your life in science much easier, see Git for Scientists: A Tutorial or GitHub's Try Git tutorial. I also highly recommend following, or at least being aware of, these suggestions for organising your code and data. Michael C. Frank provides some more useful guidelines for organising and communicating your research on his blog.

Connect to group network drive

See proni. Note that directories in proni that will be used as Git repositories need shared-ownership access rights, i.e., group members must be allowed to write and delete files in the directory. The right to add and edit files alone may be sufficient, but this has not been tested.

Install Git

We use Git; see the Git book for a comprehensive reference. Git for Windows is freely available at www.git-scm.com/download/win. In the installer, simply accept the defaults. After installation, open "Git Bash" from the Start menu. This brings up a command window in which you type

git config --global user.name "John Doe"
git config --global user.email johndoe@example.com

where you replace John Doe and his email address with your own name and email address. See Git's Getting started page for more info.

Also, for interoperability between operating systems, type:

git config --global core.autocrlf true

and see GitHub's "dealing with line endings" for why. Replace "true" with "input" if you work on Linux.
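If you want to double-check these settings, `git config` can read them back. The sketch below is a sandbox: it points HOME at a temporary directory so your real configuration is left alone. To run the same commands against your actual account, drop the lines marked as sandbox. John Doe and his address are the placeholders from above.

```shell
#!/usr/bin/env bash
set -eu
# Sandbox: temporary HOME so this demo does not touch your real ~/.gitconfig
SANDBOX=$(mktemp -d)
export HOME="$SANDBOX" GIT_CONFIG_NOSYSTEM=1   # sandbox
unset XDG_CONFIG_HOME                          # sandbox

# The settings from above (placeholders for your own name and address)
git config --global user.name "John Doe"
git config --global user.email johndoe@example.com
git config --global core.autocrlf true   # "input" on Linux

# Read single values back ...
git config --global --get user.name
git config --global --get core.autocrlf
# ... or list all global settings at once
git config --global --list
```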

Cloning and initialising existing git repositories

Now you should be ready to clone your first repository. I will use the BeeExpAnalysis repository as an example. To clone it, open "Git GUI" from the Start menu, or right-click on the Desktop or in Windows Explorer and choose "Git GUI Here". In the window that opens, click "Clone Existing Repository" and select the BeeExpAnalysis repository as the source. If you mapped proni to drive Y:, it is located at Y:\gitshare\BeeExpAnalysis.git. As the destination, choose any location on your computer and append the name of the folder that will contain all files. For example, to clone the repository to my Desktop into the folder BeeExpAnalysis, I make sure that the destination is set to

C:\Users\<LocalUsername>\Desktop\BeeExpAnalysis
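You can also make the same clone from Git Bash instead of the GUI. With proni mapped to Y:, Git Bash should reach the share as /y/gitshare/BeeExpAnalysis.git, although that assumes your drive mapping. So that the sketch runs anywhere, it first builds throwaway stand-ins for the central repositories in a temporary directory. The seed-* folders, the file names model.m and analysis.m, and the demo identity are all made up. Only the last two commands correspond to the GUI steps.

```shell
#!/usr/bin/env bash
set -eu
# Sandbox: everything lives in a temporary directory; HOME is redirected so
# your real ~/.gitconfig is not touched, and no pager is started.
SANDBOX=$(mktemp -d)
cd "$SANDBOX"
export HOME="$SANDBOX" GIT_CONFIG_NOSYSTEM=1 GIT_PAGER=cat
unset XDG_CONFIG_HOME
git config --global user.name "Demo User"
git config --global user.email demo@example.com
git config --global init.defaultBranch master
# Recent Git releases (e.g. 2.38.1+) refuse local-path submodules unless allowed
git config --global protocol.file.allow always

# Hypothetical stand-ins for Y:\gitshare\models.git and Y:\gitshare\BeeExpAnalysis.git
git init -q seed-models
echo v1 > seed-models/model.m
git -C seed-models add model.m
git -C seed-models commit -q -m "models v1"
git clone -q --bare seed-models gitshare/models.git
git init -q seed-project
echo analysis > seed-project/analysis.m
git -C seed-project add analysis.m
git -C seed-project submodule --quiet add "$SANDBOX/gitshare/models.git" models
git -C seed-project commit -q -m "Add models submodule"
git clone -q --bare seed-project gitshare/BeeExpAnalysis.git

# Equivalent of "Clone Existing Repository" with "Recursively clone submodules too"
git clone -q --recurse-submodules gitshare/BeeExpAnalysis.git BeeExpAnalysis
ls BeeExpAnalysis/models      # the submodule's files were checked out as well
```

In Git Bash, the destination from above would be written as ~/Desktop/BeeExpAnalysis.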

Submodules

Code that is shared between different projects should be kept separate from project-specific code. This way, improvements made to the shared code in one project can easily be transferred to other projects. We achieve this separation using Git submodules. For example, at the time of writing, BeeExpAnalysis used the submodules EPABC, Matlab-helpers and models.

If you left the box "Recursively clone submodules too" ticked, all submodules used by your project should have been copied automatically. If something went wrong, some of the submodule folders may be empty.
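If some submodule folders did stay empty, you can fill them by running `git submodule update --init --recursive` in the project directory. This fetches every submodule and checks it out at its registered version. With recent Git versions (2.38.1 and later), one possible cause of empty folders is that submodules on local or network paths such as Y:\gitshare are blocked by default. Setting protocol.file.allow to always, as the sandbox below does, lifts that restriction, but you should only do this for sources you trust. The sketch reproduces the symptom by cloning without submodules. The repositories are throwaway stand-ins built in a temporary directory, and all file names are made up.

```shell
#!/usr/bin/env bash
set -eu
# Sandbox: temporary directory, isolated Git configuration, no pager
SANDBOX=$(mktemp -d)
cd "$SANDBOX"
export HOME="$SANDBOX" GIT_CONFIG_NOSYSTEM=1 GIT_PAGER=cat
unset XDG_CONFIG_HOME
git config --global user.name "Demo User"
git config --global user.email demo@example.com
git config --global init.defaultBranch master
# Recent Git releases (e.g. 2.38.1+) refuse local-path submodules unless allowed
git config --global protocol.file.allow always

# Hypothetical stand-ins for Y:\gitshare\models.git and Y:\gitshare\BeeExpAnalysis.git
git init -q seed-models
echo v1 > seed-models/model.m
git -C seed-models add model.m
git -C seed-models commit -q -m "models v1"
git clone -q --bare seed-models gitshare/models.git
git init -q seed-project
echo analysis > seed-project/analysis.m
git -C seed-project add analysis.m
git -C seed-project submodule --quiet add "$SANDBOX/gitshare/models.git" models
git -C seed-project commit -q -m "Add models submodule"
git clone -q --bare seed-project gitshare/BeeExpAnalysis.git

# Clone WITHOUT submodules to reproduce the symptom
git clone -q gitshare/BeeExpAnalysis.git BeeExpAnalysis
cd BeeExpAnalysis
ls -A models                              # empty: the submodule was not checked out
# Register, fetch and check out all submodules at their recorded versions
git submodule update --init --recursive
ls models                                 # now lists model.m
```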

Importantly, the submodule that gets copied into the project directory, e.g. BeeExpAnalysis, is the particular version of the submodule that was registered with that project. This way, a submodule such as models can be developed further without breaking any code in a project directory, because the project keeps using a particular, older version of the submodule. This keeps a project's results reproducible while the shared code base is under ongoing development.
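To see which version a project pins, `git submodule status` lists the commit recorded for each submodule. `git ls-tree` shows that the project stores only a commit id for the submodule folder, not the files themselves. In the sketch below, a hypothetical colleague pushes a version v2 of models to the central repository, yet a fresh clone of the project still gets v1. The repositories are throwaway stand-ins and the file names are made up.

```shell
#!/usr/bin/env bash
set -eu
# Sandbox: temporary directory, isolated Git configuration, no pager
SANDBOX=$(mktemp -d)
cd "$SANDBOX"
export HOME="$SANDBOX" GIT_CONFIG_NOSYSTEM=1 GIT_PAGER=cat
unset XDG_CONFIG_HOME
git config --global user.name "Demo User"
git config --global user.email demo@example.com
git config --global init.defaultBranch master
git config --global protocol.file.allow always   # allow local-path submodules

# Hypothetical stand-ins for the central repositories in Y:\gitshare
git init -q seed-models
echo v1 > seed-models/model.m
git -C seed-models add model.m
git -C seed-models commit -q -m "models v1"
git clone -q --bare seed-models gitshare/models.git
git init -q seed-project
echo analysis > seed-project/analysis.m
git -C seed-project add analysis.m
git -C seed-project submodule --quiet add "$SANDBOX/gitshare/models.git" models
git -C seed-project commit -q -m "Add models submodule"
git clone -q --bare seed-project gitshare/BeeExpAnalysis.git

# A hypothetical colleague pushes models v2 to the central repository
git clone -q gitshare/models.git colleague
echo v2 > colleague/model.m
git -C colleague commit -q -am "models v2"
git -C colleague push -q origin master

# A fresh clone of the project still checks out the registered version
git clone -q --recurse-submodules gitshare/BeeExpAnalysis.git BeeExpAnalysis
cd BeeExpAnalysis
git submodule status      # commit id recorded for models
git ls-tree HEAD models   # mode 160000 "commit": a pointer, not a copy of the files
cat models/model.m        # still v1
```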

Warning

You should never delete anything from, copy anything into or move anything out of the gitshare directory in proni by hand unless you know exactly what you are doing. Thank you!

Advanced Usage

Updating submodules

As stated above, submodules are added to a project at a specific version. This means that there may be newer versions of the submodule in the corresponding central submodule repository. This need not worry you. However, if you have improved a submodule locally and want to share your changes with everyone else, your old version of the submodule becomes an issue. Here is how to share your improvements with the newer, central repository of the submodule. Parts of this description are based on the answer to How can I reconcile detached HEAD with master/origin? on StackOverflow.

Throughout the following, the magic of version control means that you can always go back to the old version of a submodule that you were using. So you need not fear breaking something without being able to recover the old state. If you have moved a submodule to a different version without committing that new state in the project that contains the submodule, you can return to the registered submodule version simply by typing

git submodule update

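in the main project directory. Yes, that's right: despite its name, git submodule update does not update to a new version. Instead, it checks out the version of the submodule registered with the project. Without the --force option, it does not discard uncommitted edits to files inside the submodule.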
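The sketch below shows this revert in action. The submodule is moved to the newest central commit, and `git submodule status` flags the mismatch with a leading +. `git submodule update` then puts back the registered version. The repositories are throwaway stand-ins, the file names are made up, and a hypothetical colleague provides the newer version v2.

```shell
#!/usr/bin/env bash
set -eu
# Sandbox: temporary directory, isolated Git configuration, no pager
SANDBOX=$(mktemp -d)
cd "$SANDBOX"
export HOME="$SANDBOX" GIT_CONFIG_NOSYSTEM=1 GIT_PAGER=cat
unset XDG_CONFIG_HOME
git config --global user.name "Demo User"
git config --global user.email demo@example.com
git config --global init.defaultBranch master
git config --global protocol.file.allow always   # allow local-path submodules

# Hypothetical stand-ins for the central repositories in Y:\gitshare
git init -q seed-models
echo v1 > seed-models/model.m
git -C seed-models add model.m
git -C seed-models commit -q -m "models v1"
git clone -q --bare seed-models gitshare/models.git
git init -q seed-project
echo analysis > seed-project/analysis.m
git -C seed-project add analysis.m
git -C seed-project submodule --quiet add "$SANDBOX/gitshare/models.git" models
git -C seed-project commit -q -m "Add models submodule"
git clone -q --bare seed-project gitshare/BeeExpAnalysis.git
git clone -q --recurse-submodules gitshare/BeeExpAnalysis.git BeeExpAnalysis

# A hypothetical colleague pushes models v2 to the central repository
git clone -q gitshare/models.git colleague
echo v2 > colleague/model.m
git -C colleague commit -q -am "models v2"
git -C colleague push -q origin master

cd BeeExpAnalysis
# Move the submodule to the newest central commit without registering it
git -C models fetch -q origin
git -C models checkout -q origin/master
git submodule status      # leading "+": differs from the registered commit
# Check out the version registered with the project again
git submodule update
git submodule status      # leading space: matches the registered commit
cat models/model.m        # v1 again
```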

Catching up with newest submodule versions without local changes

In Git-speak, having checked out a particular version of a repository means that your working copy has a detached HEAD. It is detached because your working copy no longer tracks the master branch and instead lives on its own, implicit branch. You can see this by typing

git branch

in the submodule directory.
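In a freshly cloned project, `git branch` in the submodule marks the current line with something like "(HEAD detached at 1a2b3c4)"; the exact wording varies between Git versions. For scripts, `git symbolic-ref` is a more reliable check, because it fails when HEAD is detached. The sketch uses throwaway stand-in repositories with made-up file names.

```shell
#!/usr/bin/env bash
set -eu
# Sandbox: temporary directory, isolated Git configuration, no pager
SANDBOX=$(mktemp -d)
cd "$SANDBOX"
export HOME="$SANDBOX" GIT_CONFIG_NOSYSTEM=1 GIT_PAGER=cat
unset XDG_CONFIG_HOME
git config --global user.name "Demo User"
git config --global user.email demo@example.com
git config --global init.defaultBranch master
git config --global protocol.file.allow always   # allow local-path submodules

# Hypothetical stand-ins for the central repositories in Y:\gitshare
git init -q seed-models
echo v1 > seed-models/model.m
git -C seed-models add model.m
git -C seed-models commit -q -m "models v1"
git clone -q --bare seed-models gitshare/models.git
git init -q seed-project
echo analysis > seed-project/analysis.m
git -C seed-project add analysis.m
git -C seed-project submodule --quiet add "$SANDBOX/gitshare/models.git" models
git -C seed-project commit -q -m "Add models submodule"
git clone -q --bare seed-project gitshare/BeeExpAnalysis.git
git clone -q --recurse-submodules gitshare/BeeExpAnalysis.git BeeExpAnalysis

cd BeeExpAnalysis/models
git branch                 # "*" marks the detached HEAD
# symbolic-ref only succeeds when HEAD points to a branch
if git symbolic-ref -q HEAD >/dev/null; then
  echo "on branch $(git symbolic-ref --short HEAD)"
else
  echo "detached HEAD at $(git rev-parse --short HEAD)"
fi
```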

First, you probably want to know what has changed in the submodule since your local version. To find out, first fetch the newest changes from the central repository with

git fetch origin

where I assume that the remote pointing to the central repository is called origin, which is the default. This operation does not change any checked-out code in your project directory. It just fetches the commit history from the central submodule repository, so you can check the corresponding log for changes since your version:

git log HEAD..origin

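You can also check the differences in the code directly. Note that this shows how your working copy differs from the central version, so lines that were added centrally appear as removed: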

git diff origin

If you like what you see and want to try out the new version, you can merge the changes into your local working copy with

git merge origin
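Put together, the steps above look as follows inside the submodule. The sketch spells out origin/master. Plain origin, as used above, is shorthand for origin/HEAD, which normally points to origin/master after cloning. `git diff HEAD origin/master` shows the central changes in the direction they would be applied. The repositories, file names and the colleague who pushes v2 are throwaway stand-ins.

```shell
#!/usr/bin/env bash
set -eu
# Sandbox: temporary directory, isolated Git configuration, no pager
SANDBOX=$(mktemp -d)
cd "$SANDBOX"
export HOME="$SANDBOX" GIT_CONFIG_NOSYSTEM=1 GIT_PAGER=cat
unset XDG_CONFIG_HOME
git config --global user.name "Demo User"
git config --global user.email demo@example.com
git config --global init.defaultBranch master
git config --global protocol.file.allow always   # allow local-path submodules

# Hypothetical stand-ins for the central repositories in Y:\gitshare
git init -q seed-models
echo v1 > seed-models/model.m
git -C seed-models add model.m
git -C seed-models commit -q -m "models v1"
git clone -q --bare seed-models gitshare/models.git
git init -q seed-project
echo analysis > seed-project/analysis.m
git -C seed-project add analysis.m
git -C seed-project submodule --quiet add "$SANDBOX/gitshare/models.git" models
git -C seed-project commit -q -m "Add models submodule"
git clone -q --bare seed-project gitshare/BeeExpAnalysis.git
git clone -q --recurse-submodules gitshare/BeeExpAnalysis.git BeeExpAnalysis

# A hypothetical colleague pushes models v2 to the central repository
git clone -q gitshare/models.git colleague
echo v2 > colleague/model.m
git -C colleague commit -q -am "models v2"
git -C colleague push -q origin master

cd BeeExpAnalysis/models                 # detached HEAD at the registered version (v1)
git fetch -q origin                      # download history; working files stay unchanged
git log --oneline HEAD..origin/master    # commits you do not have yet
git diff HEAD origin/master              # what those commits change
git merge -q origin/master               # fast-forward the detached HEAD
cat model.m                              # v2
```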

After this, your local project code uses the latest submodule version. Suppose you like this version and everything runs smoothly (perhaps after you updated some of your own code). Then you should do some bookkeeping and reattach the working copy to your local master branch, which tracks the master branch of origin. To do this, type

git checkout master
git merge origin

So you essentially just repeat what you did before. Why, then, didn't we switch to master right at the beginning? The local master branch may have been at a different point in history than both your local version checked out by the project repository and the remote version in the central submodule repository. By staying on your detached HEAD first, you were able to see all the changes between your version and the newest central version, regardless of the state of the local master branch.
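The sketch below starts from a detached HEAD that has been fast-forwarded to the newest central version and reattaches it to master. If the project itself should then use the new version, the new submodule commit still has to be recorded in the project. `git submodule status` flags the mismatch with a leading +, and `git add` plus `git commit` in the project directory registers the new version. Whether to commit it is your call. All repositories, names and the colleague's v2 are throwaway stand-ins.

```shell
#!/usr/bin/env bash
set -eu
# Sandbox: temporary directory, isolated Git configuration, no pager
SANDBOX=$(mktemp -d)
cd "$SANDBOX"
export HOME="$SANDBOX" GIT_CONFIG_NOSYSTEM=1 GIT_PAGER=cat
unset XDG_CONFIG_HOME
git config --global user.name "Demo User"
git config --global user.email demo@example.com
git config --global init.defaultBranch master
git config --global protocol.file.allow always   # allow local-path submodules

# Hypothetical stand-ins for the central repositories in Y:\gitshare
git init -q seed-models
echo v1 > seed-models/model.m
git -C seed-models add model.m
git -C seed-models commit -q -m "models v1"
git clone -q --bare seed-models gitshare/models.git
git init -q seed-project
echo analysis > seed-project/analysis.m
git -C seed-project add analysis.m
git -C seed-project submodule --quiet add "$SANDBOX/gitshare/models.git" models
git -C seed-project commit -q -m "Add models submodule"
git clone -q --bare seed-project gitshare/BeeExpAnalysis.git
git clone -q --recurse-submodules gitshare/BeeExpAnalysis.git BeeExpAnalysis

# A hypothetical colleague pushes models v2 to the central repository
git clone -q gitshare/models.git colleague
echo v2 > colleague/model.m
git -C colleague commit -q -am "models v2"
git -C colleague push -q origin master

cd BeeExpAnalysis/models
git fetch -q origin
git merge -q origin/master        # detached HEAD now at v2
# Bookkeeping: reattach to the local master branch and bring it up to date
git checkout -q master
git merge -q origin/master
git symbolic-ref --short HEAD     # master

# Optionally record the new submodule version in the project
cd ..
git submodule status              # leading "+": project still registers v1
git add models
git commit -q -m "Use models v2"
git submodule status              # leading space: registered version matches
```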

Making and sharing local submodule changes

I consider two scenarios. In the first, you get the newest version of the submodule from the central repository before making changes. In the second, you make local changes first and then merge them with the newest version. In either case, you need to get the newest version from the central repository before you can share your own changes by pushing them to the central repository. The first scenario is preferable, because you are aware of the changes in the newest version while making your own. I only describe the second scenario because you may forget to check for newer versions before making your own changes.

I already explained most of what is needed for the first scenario above. So follow that description to bring the submodule up to its most recent version, ending up on its master branch. Then make your changes and commit them. To send them to the central repository, simply type

git push origin

If somebody has pushed a change to the central repository in the meantime, your push will fail. You will then have to fetch, review and merge the changes as usual before you can push your own.
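The sketch below plays through the first scenario, including the failure case. A hypothetical colleague pushes first, so our push is rejected until we fetch and merge. The sketch names the branch explicitly (git push origin master). With Git's default push settings, plain git push origin does the same as long as master tracks origin/master. Repositories, file names and the colleague are throwaway stand-ins.

```shell
#!/usr/bin/env bash
set -eu
# Sandbox: temporary directory, isolated Git configuration, no pager
SANDBOX=$(mktemp -d)
cd "$SANDBOX"
export HOME="$SANDBOX" GIT_CONFIG_NOSYSTEM=1 GIT_PAGER=cat
unset XDG_CONFIG_HOME
git config --global user.name "Demo User"
git config --global user.email demo@example.com
git config --global init.defaultBranch master
git config --global protocol.file.allow always   # allow local-path submodules

# Hypothetical stand-ins for the central repositories in Y:\gitshare
git init -q seed-models
echo v1 > seed-models/model.m
git -C seed-models add model.m
git -C seed-models commit -q -m "models v1"
git clone -q --bare seed-models gitshare/models.git
git init -q seed-project
echo analysis > seed-project/analysis.m
git -C seed-project add analysis.m
git -C seed-project submodule --quiet add "$SANDBOX/gitshare/models.git" models
git -C seed-project commit -q -m "Add models submodule"
git clone -q --bare seed-project gitshare/BeeExpAnalysis.git
git clone -q --recurse-submodules gitshare/BeeExpAnalysis.git BeeExpAnalysis

cd BeeExpAnalysis/models
git checkout -q master            # scenario 1: up to date and on master
echo "helper" > helper.m
git add helper.m
git commit -q -m "Add helper"

# Meanwhile, a hypothetical colleague pushes models v2
git clone -q "$SANDBOX/gitshare/models.git" "$SANDBOX/colleague"
echo v2 > "$SANDBOX/colleague/model.m"
git -C "$SANDBOX/colleague" commit -q -am "models v2"
git -C "$SANDBOX/colleague" push -q origin master

# Our push is rejected because the central master has moved on
if git push -q origin master; then echo "unexpected: push accepted"; else echo "push rejected"; fi
# Fetch, review and merge the central changes, then push again
git fetch -q origin
git log --oneline HEAD..origin/master
git merge -q --no-edit origin/master
git push -q origin master
```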

In the second scenario, you started out with your detached HEAD and then added commits on top of it without following more recent developments of the submodule. In doing so, you implicitly created a new branch of the submodule. You should first make this branch explicit so that it gets a proper name. I will call the new branch localchanges; you can create it with

git checkout -b localchanges

localchanges now contains all your local changes, starting from the version last registered with your project repository. You can then switch to master and update it with the changes from the central repository. Do this with

git checkout master
git pull origin

Then you should probably inspect the changes made in the two branches, for example with

git log --graph --decorate --oneline master localchanges
git diff master localchanges

and prepare for resolving conflicts. Finally, you can merge the two branches (assuming that you currently have master checked out):

git merge localchanges

After resolving any conflicts, you should test the code locally. When you are happy with the result, you can delete the branch localchanges and push to the central repository:

git branch -d localchanges
git push origin
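The sketch below runs the second scenario end to end. It makes a commit on the detached HEAD, while a hypothetical colleague pushes v2 to the central master. It then applies the branch, pull, inspect, merge, delete and push steps from above. The two changes touch different files, so this merge has no conflicts; in real life you may have to resolve some first. Repositories and file names are throwaway stand-ins, and branch names are given explicitly.

```shell
#!/usr/bin/env bash
set -eu
# Sandbox: temporary directory, isolated Git configuration, no pager
SANDBOX=$(mktemp -d)
cd "$SANDBOX"
export HOME="$SANDBOX" GIT_CONFIG_NOSYSTEM=1 GIT_PAGER=cat
unset XDG_CONFIG_HOME
git config --global user.name "Demo User"
git config --global user.email demo@example.com
git config --global init.defaultBranch master
git config --global protocol.file.allow always   # allow local-path submodules

# Hypothetical stand-ins for the central repositories in Y:\gitshare
git init -q seed-models
echo v1 > seed-models/model.m
git -C seed-models add model.m
git -C seed-models commit -q -m "models v1"
git clone -q --bare seed-models gitshare/models.git
git init -q seed-project
echo analysis > seed-project/analysis.m
git -C seed-project add analysis.m
git -C seed-project submodule --quiet add "$SANDBOX/gitshare/models.git" models
git -C seed-project commit -q -m "Add models submodule"
git clone -q --bare seed-project gitshare/BeeExpAnalysis.git
git clone -q --recurse-submodules gitshare/BeeExpAnalysis.git BeeExpAnalysis

cd BeeExpAnalysis/models          # detached HEAD at the registered version
echo "local fix" > fix.m
git add fix.m
git commit -q -m "Local fix on detached HEAD"

# Meanwhile, a hypothetical colleague pushes models v2
git clone -q "$SANDBOX/gitshare/models.git" "$SANDBOX/colleague"
echo v2 > "$SANDBOX/colleague/model.m"
git -C "$SANDBOX/colleague" commit -q -am "models v2"
git -C "$SANDBOX/colleague" push -q origin master

git checkout -q -b localchanges   # give the implicit branch a name
git checkout -q master
git pull -q origin master         # fast-forwards master to v2
git log --graph --decorate --oneline master localchanges
git diff master localchanges
git merge -q --no-edit localchanges
git branch -d localchanges
git push -q origin master
```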