Getting started with Git

Sahithya Papireddy
Sep 15, 2020

Documenting my learnings from the September edition of the #IBelieveinDoing Crio workshop on Linux and Git basics for developers.

As part of the Crio course on developer essentials, I was prompted to familiarise myself with Linux and Git. It still surprises me that the me of 10 days ago knew little to nothing about these topics, and that what started as an attempt to pick up a few handy skills has now shaped up into a full blog post.

So here goes..

What is git?

Git is the free and open source distributed version control system that’s responsible for everything GitHub related that happens locally on your computer.

Woah Woah hold up..

What’s version control? What’s GitHub — is it the same as Git? What exactly do we mean by “locally on our computer”?

Well then, we’ll start off by closing all those multiple tabs opened in our mind and understand each of these terms one by one:

What is “version control”?

“Version control is a system that records changes to a file or set of files over time so that you can recall specific versions later.”

If you are, say, a graphic designer, you will most likely want to keep every version of an image or layout. In that case a Version Control System (VCS) is a very wise thing to use. It allows you to:

  • Revert selected files, or even the entire project, back to a previous state
  • Compare changes over time
  • See who last modified something that might be causing a problem, who introduced an issue and when, and more.

Using a VCS also generally means that if you mess things up or lose files, you can easily recover them. A VCS therefore allows you to experiment and try things out more freely without the fear of not being able to recover your previous or original draft.

(a quick analogy for someone familiar with graphic design — the concept of “layers” and “History” in Photoshop is a sort of primitive version control system in the sense that those tools provide you with the flexibility of being able to reach any “state” of the project you’re working on from your current “state” )

So Git is a version control system but to be specific it’s a Distributed Version Control System.

What exactly do we mean by a Distributed Version Control System now?

Let’s start off with a timeline of sorts..

Local Version Control Systems

Many people’s version-control method of choice is to copy files into another directory (a time-stamped directory would be more apt). This approach is very common because it is so simple, but it is also incredibly error prone: it is easy to forget which directory you’re in and accidentally write to the wrong file or copy over files you don’t mean to.

To deal with this issue, programmers long ago developed local VCSs that had a simple database that kept all the changes to files under revision control.

One of the most popular VCS tools was a system called RCS, which is still distributed with many computers today. RCS works by keeping “patch sets” (that is, the differences between files) in a special format on disk; it can then re-create what any file looked like at any point in time by adding up all the patches.

Centralized Version Control Systems

Centralized Version Control Systems (CVCSs) were developed with the aim of solving the next major issue that people encounter — the need to collaborate with developers on other systems.

These systems (such as CVS, Subversion, and Perforce) have a single server that contains all the versioned files, and a number of clients that check out files from that central place. For many years, this has been the standard for version control.

This setup offers many advantages, especially over local VCSs. For example, everyone knows to a certain degree what everyone else on the project is doing. Administrators have fine-grained control over who can do what, and it’s far easier to administer a CVCS than it is to deal with local databases on every client.

However, the most obvious downside of a CVCS is the single point of failure that the centralized server represents. If that server goes down for an hour, then during that hour nobody can collaborate at all or save versioned changes to anything they’re working on. If the hard disk the central database is on becomes corrupted, and proper backups haven’t been kept, you lose absolutely everything: the entire history of the project, except whatever single snapshots people happen to have on their local machines. Local VCSs suffer from this same problem. Whenever you have the entire history of the project in a single place, you risk losing everything.

Distributed Version Control Systems

This is where Distributed Version Control Systems (DVCSs) step in. In a DVCS (such as Git, Mercurial, Bazaar or Darcs), clients don’t just check out the latest snapshot of the files; rather, they fully mirror the repository, including its full history. Thus, if any server dies, and these systems were collaborating via that server, any of the client repositories can be copied back up to the server to restore it. Every clone is really a full backup of all the data.

Furthermore, many of these systems deal pretty well with having several remote repositories they can work with, so you can collaborate with different groups of people in different ways simultaneously within the same project. This allows you to set up several types of workflows that aren’t possible in centralized systems, such as hierarchical models.

Okay, now that we’ve gotten a bit of insight into what exactly Git does, let’s try to see HOW IT DOES WHAT IT DOES…

Even though Git’s user interface is fairly similar to other VCSs, Git stores and thinks about information in a different way:

The major difference between Git and any other VCS (Subversion and friends included) is the way Git thinks about its data. Conceptually, most other systems store information as a list of file-based changes.

These other systems (CVS, Subversion, Perforce, Bazaar, and so on) store data as changes over time to a base version of each file (this is commonly described as delta-based version control).

In contrast, Git thinks of its data as a series of snapshots of a miniature filesystem.

With Git, every time you commit, or save the state of your project, Git basically takes a picture of what all your files look like at that moment and stores a reference to that snapshot. To be efficient, if files have not changed, Git doesn’t store the file again, just a link to the previous identical file it has already stored.

In short, Git stores data as snapshots of the project over time.

This makes Git more like a mini filesystem with some incredibly powerful tools built on top of it, rather than simply a VCS.
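To see the "snapshot with links to unchanged files" idea for ourselves, here is a small sketch. It assumes Git is installed, works in a throwaway directory, and uses a placeholder identity and made-up file names (a.txt, b.txt). If Git really reuses the stored copy of an unchanged file, the object ID of a.txt should be the same in both commits, while b.txt should get a new one.

```shell
# Scratch repository with a placeholder identity (both are assumptions).
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

# First snapshot: two files.
echo "hello" > a.txt
echo "draft 1" > b.txt
git add a.txt b.txt
git commit -q -m "First snapshot"

# Second snapshot: only b.txt changes.
echo "draft 2, with edits" > b.txt
git add b.txt
git commit -q -m "Second snapshot"

# Object IDs of each file in the previous commit (HEAD~1) and the latest (HEAD).
a_old="$(git rev-parse HEAD~1:a.txt)"
a_new="$(git rev-parse HEAD:a.txt)"
b_old="$(git rev-parse HEAD~1:b.txt)"
b_new="$(git rev-parse HEAD:b.txt)"

echo "a.txt: $a_old -> $a_new"
echo "b.txt: $b_old -> $b_new"
```

The two IDs printed for a.txt should match, which is Git pointing both snapshots at the same stored file.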

Let’s now begin to actually understand a Git process…

Git has three main states that your files can reside in: modified, staged, and committed:

· Modified means that you have changed the file but have not committed it to your database yet.

· Staged means that you have marked a modified file in its current version to go into your next commit snapshot.

· Committed means that the data is safely stored in your local database.

This leads us to the three main sections of a Git project: the working tree, the staging area, and the Git directory.

The basic Git workflow goes something like this:

1. You modify files in your working tree.

2. You selectively stage just those changes you want to be part of your next commit, which adds only those changes to the staging area.

The staging area is a file, generally contained in your Git directory, that stores information about what will go into your next commit. Its technical name in Git parlance is the “index”, but the phrase “staging area” works just as well.

3. You do a commit, which takes the files as they are in the staging area and stores that snapshot permanently to your Git directory.

The Git directory is where Git stores the metadata and object database for your project. This is the most important part of Git, and it is what is copied when you clone a repository from another computer.

If a particular version of a file is in the Git directory, it’s considered committed. If it has been modified and was added to the staging area, it is staged. And if it was changed since it was checked out but has not been staged, it is modified.
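The three states are easier to remember once you have watched a file move through them. The sketch below is one way to do that, assuming Git is installed; the scratch directory, the identity and the file name notes.txt are placeholders. It uses `git status --short`, which prints a two-letter code per file: the first column describes the staging area and the second the working tree.

```shell
# Scratch repository with a placeholder identity (both are assumptions).
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

# Start from a committed file.
echo "version 1" > notes.txt
git add notes.txt
git commit -q -m "Add notes"

# Modified: changed in the working tree, not staged yet.
echo "version 2, with more text" > notes.txt
modified="$(git status --short)"
echo "modified:  [$modified]"

# Staged: marked to go into the next commit.
git add notes.txt
staged="$(git status --short)"
echo "staged:    [$staged]"

# Committed: safely stored, so there is nothing left to report.
git commit -q -m "Update notes"
committed="$(git status --short)"
echo "committed: [$committed]"
```

The M moves from the second column (working tree) to the first (staging area), and then disappears once the commit is made.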

Now to get on with actually working with Git..

GETTING STARTED WITH GIT

There are a lot of different ways to use Git. There are the original command-line tools, and there are many graphical user interfaces of varying capabilities.

In this post we stick to the command line, for a couple of reasons. For one, the command line is the only place you can run all Git commands; most GUIs implement only a subset of Git functionality for simplicity. If you know how to run the command-line version, you can probably also figure out how to run the GUI version, while the opposite is not necessarily true. Also, while your choice of graphical client is a matter of personal taste, all users will have the command-line tools installed and available.

Git is installed and maintained on your local system (rather than in the cloud) and gives you a self-contained record of your ongoing programming versions. It can be used completely independently of any cloud-hosting service. You don’t even need internet access, except to download it.

GitHub is designed as a Git repository hosting service.

And what exactly is a Git repository hosting service? It’s an online database that allows you to keep track of and share your Git version control projects outside of your local computer/server.

Unlike Git, GitHub is exclusively cloud-based and therefore an individual’s Git repositories can be remotely accessed by any authorized person, from any computer, anywhere in the world (provided it has an internet connection).

GitHub expands upon Git’s basic functionality. It presents an extremely intuitive, graphically represented user interface, and provides programmers with built-in control and task-management tools.

GitHub makes it possible for entire teams to coordinate on a single project in real time. As changes are introduced, new branches are created, allowing the team to continue to revise the code without overwriting each other’s work. These branches are like copies, and changes made on them are not reflected in the main directories on other users’ machines unless users choose to push/pull the changes to incorporate them. (See Branching in the terminology section below.)

Other Git repository hosting services also exist; GitLab, BitBucket, and SourceForge are all viable GitHub alternatives, and GitLab even offers a built-in option which allows GitHub users to migrate their projects directly into GitLab.

Simply put, Git is a version control system that lets you manage and keep track of your source code history. GitHub is a cloud-based hosting service that lets you manage Git repositories. If you have open-source projects that use Git, then GitHub is designed to help you better manage them.

1) Before you start using Git, you have to make it available on your computer.

https://git-scm.com/book/en/v2/Getting-Started-Installing-Git offers a clear and detailed walkthrough for setting up Git on your computer.

At the end of this step, you should have a working version of Git on your system that’s set up with your personal identity.
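A quick way to check both things (that Git runs, and that an identity is set) is sketched below. The name and email are placeholders, and to avoid touching your real settings the sketch sets them only inside a scratch repository; adding `--global` to the two `git config` lines would store them for every repository on your machine instead.

```shell
# Confirms that Git is installed and on the PATH.
git --version

# Scratch repository so the example does not change your real settings.
repo="$(mktemp -d)"
cd "$repo"
git init -q

# Placeholder identity; replace with your own name and email.
# With --global these would apply to all repositories on this machine.
git config user.name "Example User"
git config user.email "user@example.com"

# Read the values back to confirm they were stored.
git config user.name
git config user.email
```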

2) A)

You may now start working on projects with Git locally on your computer. Again the link below offers a detailed insight to help you get started on your first project along with the commands you need to familiarize yourself with:

https://git-scm.com/book/en/v2/Git-Basics-Getting-a-Git-Repository

But before fully getting into step A) you might want to try out B) first.

B)

You can now choose to create an account on GitHub or GitLab. This isn’t a mandatory step: it has little to do with YOU working on your project and more to do with collaborating with OTHERS on it. A hosting service like GitHub or GitLab is recommended because projects are usually worked on by teams of people.

Collaborating with others involves managing remote repositories hosted on these services (see the terminology section below for a short description of a repository). You push data to them and pull data from them whenever you need to share work (the terminology section also has a brief intro to pushing and pulling).

The steps below refer to GitLab, since that is what we worked with in my Crio workshop; you can just as well use GitHub to get started.

· An empty Git repository is required. Follow the steps at https://docs.gitlab.com/ee/gitlab-basics/create-project.html to create a blank project on GitLab.

Don’t select the “Initialize repository with a README” option if you want the repository to start out empty.

· Create & add an SSH key to GitLab

o Create SSH Key –

o For Ubuntu:

https://docs.gitlab.com/ee/ssh/README.html#generating-a-new-ssh-key-pair

o For Windows:

https://www.digitalocean.com/docs/droplets/how-to/add-ssh-keys/create-with-putty/

o Add SSH key to GitLab:

https://docs.gitlab.com/ee/ssh/README.html#adding-an-ssh-key-to-your-gitlab-account

Once you’re done with the above steps, you can get the repository links from your GitLab account. The text highlighted in yellow in the picture below shows the links to your repository (the HTTPS link and the SSH link).

Let us now download our repository on GitLab to a directory in our local system.

Use the below command with the SSH link to your repo to do this

git clone <add-ssh-link-here>

Note: To connect your computer with GitLab, you need to add your credentials to identify yourself. You have two options:

· Authenticate on a project-by-project basis through HTTPS, and enter your credentials every time you perform an operation between your computer and GitLab.

· Authenticate through SSH once, as shown above, and GitLab won’t ask for your credentials every time you pull, push, and clone.

To start the authentication process, we’ll have to clone an existing repository to our computer which is what the above command aims to do.

For more details on authentication mode of GitLab refer to the below link:

https://docs.gitlab.com/ee/gitlab-basics/start-using-git.html#git-authentication-methods

We should now see a new folder with the same name as our repository on GitLab.

Now that we have the same Git repository on our local system as on GitLab, the former will be referred to as the local repo and the GitLab one, remote repo. Remotes in Git are usually places like GitLab & GitHub where we can share our code with others.

After successfully completing step 2, you have cloned a remote Git repository (hosted on GitLab) and you can access it on your machine just like a folder. You might also observe that there is a .git folder inside the repo.

In some cases this .git folder might not be visible because it is a hidden folder. In that case, the page below explains how to show hidden files and folders on your OS:

https://en.wikipedia.org/wiki/Hidden_file_and_hidden_directory
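If you want to try the clone step without a GitLab account, here is a rough sketch that uses a bare repository on the local disk as a stand-in for the GitLab project. The directory names (remote.git, my-project) are made up for the example; with a real project you would pass the SSH link to `git clone` instead of the local path. On Linux and macOS, `ls -a` lists hidden entries, so the .git folder shows up.

```shell
work="$(mktemp -d)"
cd "$work"

# Stand-in for the GitLab project: an empty bare repository on the local disk.
git init -q --bare remote.git

# Same command as with an SSH link, but pointing at the local path.
# Git may warn that the repository is empty, which is expected here.
git clone -q remote.git my-project
cd my-project

# ls -a also lists hidden entries, including the .git folder.
ls -a

# The clone remembers where it came from under the name "origin".
git remote -v
```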

Here is the basic idea of what we are trying to achieve by performing the above steps:

If you want to make a change to a project you copy the whole repository to your own system. You make your changes on your local copy, then you “check in” the changes to the central server. This encourages the sharing of more granular changes since you don’t have to connect to the server every time you make a change.

Following is a quick summary of the workflow and basic commands of a Git process (again, the link in step 2A contains a very detailed explanation of the commands below):

1. You modify files in your working directory

2. You stage the files, adding snapshots of them to your staging area (`git add <file>`)

3. You do a commit, which takes the files as they are in the staging area and stores that snapshot permanently to your local Git repository (`git commit -m "<message>"`)

4. You push the changes from the local repository to the remote repository (`git push`)

5. You pull the changes from the remote repository to the local repository (`git pull`)
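The five steps can be tried end to end with the sketch below. It is only an illustration under a few assumptions: a bare repository on the local disk plays the role of the remote, the identity is a placeholder, and readme.txt is a made-up file. The branch name is read from Git rather than hard-coded, since the default differs between setups.

```shell
work="$(mktemp -d)"
cd "$work"

# Local stand-in for the remote repository, plus a clone to work in.
git init -q --bare remote.git
git clone -q remote.git local
cd local
git config user.name "Example User"
git config user.email "user@example.com"

# 1. Modify files in the working directory.
echo "first line" > readme.txt

# 2. Stage the file.
git add readme.txt

# 3. Commit what is staged to the local repository.
git commit -q -m "Add readme"

# 4. Push the commit to the remote (-u links the local branch to it).
branch="$(git symbolic-ref --short HEAD)"
git push -q -u origin "$branch"

# 5. Pull any new changes from the remote (nothing new in this case).
git pull -q

git log --oneline
```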

Note:

Usually, if we are expecting other team members to have made changes to our project, we’ll need to use the git pull command first to integrate the remote changes locally, and then push.

But, why did we have to do a git pull followed by a git push? Why wouldn’t it work the other way around? Consider a scenario where multiple team members could be changing the same file as well. Before we actually start working on the file we need to account for the changes made to the file by other team members so that we can get the updated file to work on and then work accordingly.

When you try to push before pulling and your local repo is missing commits that are already on the remote, you are most likely to get an error saying:

“Updates were rejected because the remote contains work that you do not have locally”

To know more about this error and why it occurs, you can refer to the following article:

https://blog.plover.com/prog/git-ff-error.html
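The rejected push is easy to reproduce locally. In the sketch below, two clones of the same local bare repository stand in for two teammates; the names alice and bob, the identities and the file names are all hypothetical. Bob commits without pulling Alice's latest work, so his first push should be refused; after a pull (done as a plain merge here) the second push should go through.

```shell
work="$(mktemp -d)"
cd "$work"
git init -q --bare remote.git

# Alice clones, makes the first commit and pushes it.
git clone -q remote.git alice
cd "$work/alice"
git config user.name "alice"
git config user.email "alice@example.com"
echo "shared" > shared.txt
git add shared.txt
git commit -q -m "Add shared file"
branch="$(git symbolic-ref --short HEAD)"
git push -q -u origin "$branch"

# Bob clones afterwards, so both start from the same history.
cd "$work"
git clone -q remote.git bob
git -C bob config user.name "bob"
git -C bob config user.email "bob@example.com"

# Alice pushes a second commit that Bob does not have yet.
cd "$work/alice"
echo "from alice" > alice.txt
git add alice.txt
git commit -q -m "Alice's change"
git push -q

# Bob commits locally and tries to push without pulling first.
cd "$work/bob"
echo "from bob" > bob.txt
git add bob.txt
git commit -q -m "Bob's change"
if git push; then
  first_push="accepted"
else
  first_push="rejected"
fi
echo "first push: $first_push"

# Pull first (merging Alice's commit into Bob's branch), then push again.
git pull -q --no-rebase --no-edit
git push -q
git log --oneline
```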

While performing such a pull followed by a push we might often also run into a MERGE CONFLICT:

On pulling, Git checks the changes coming from the remote, sees that the same file has changed locally as well & automatically tries to include both sets of changes.

So when we make a change say on the same line of a file both locally as well as on the remote repository, a conflict arises as to which change should be actually saved.

Git wouldn’t know which one to prioritise and needs our help to resolve this merge conflict.

https://docs.github.com/en/github/collaborating-with-issues-and-pull-requests/about-merge-conflicts
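Here is a small sketch of a conflict and one way to resolve it. To keep it short it uses two local branches instead of a remote; a pull runs a merge under the hood, so the conflict looks the same. The branch name teammate, the file settings.txt and the identity are made up for the example.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "color = blue" > settings.txt
git add settings.txt
git commit -q -m "Add settings"
main_branch="$(git symbolic-ref --short HEAD)"

# One side changes the line...
git checkout -q -b teammate
echo "color = green" > settings.txt
git commit -q -a -m "Use green"

# ...and the other side changes the same line differently.
git checkout -q "$main_branch"
echo "color = red" > settings.txt
git commit -q -a -m "Use red"

# Git cannot pick a winner, so the merge stops and asks for help.
git merge teammate || echo "merge stopped: conflict in settings.txt"
conflict_status="$(git status --short)"
marker_count="$(grep -c '^<<<<<<<' settings.txt)"
cat settings.txt   # both versions, separated by conflict markers

# Resolve by writing the version we want, then stage and commit.
echo "color = red" > settings.txt
git add settings.txt
git commit -q -m "Merge teammate, keeping red"
git status --short
```

While the conflict is unresolved, `git status --short` marks the file with UU, meaning both sides changed it.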

A BEGINNER’S GUIDE TO GIT TERMINOLOGIES

Repository

Your files in GitLab live in a repository, similar to how you have them in a folder or directory in your computer. Remote repository refers to the files in GitLab and the copy in your computer is called local copy. A project in GitLab is what holds a repository, which holds your files. Often, the word “repository” is shortened to “repo”.

Note : Remote repositories can be on your local machine.

It is entirely possible that you can be working with a “remote” repository that is, in fact, on the same host you are. The word “remote” does not necessarily imply that the repository is somewhere else on the network or Internet, only that it is elsewhere. Working with such a remote repository would still involve all the standard pushing, pulling and fetching operations as with any other remote.

Origin

In Git, “origin” is a shorthand name for the remote repository that a project was originally cloned from. More precisely, it is used instead of that original repository’s URL — and thereby makes referencing much easier.

Note that origin is just a standard convention. Although it makes sense to leave this convention untouched, you could perfectly well rename it without losing any functionality.

In the following example, the URL parameter to the “clone” command becomes the “origin” for the cloned local repository:

git clone https://github.com/gittower/git-crash-course.git
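To check that origin is nothing more than a name, the sketch below clones a local bare repository (a stand-in for a hosted one), lists the remote, and then renames it. The directory names and the new name upstream are assumptions picked for the example.

```shell
work="$(mktemp -d)"
cd "$work"

# Local stand-in for the repository we clone from.
git init -q --bare source.git
git clone -q source.git copy
cd copy

# After cloning, the source is registered under the name "origin".
name_before="$(git remote)"
echo "remote name: $name_before"
git remote get-url origin

# Renaming it changes the shorthand, not the URL behind it.
git remote rename origin upstream
name_after="$(git remote)"
echo "remote name: $name_after"
git remote get-url upstream
```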

Fork

When you want to copy someone else’s repository, you fork the project. By forking it, you’ll create a copy of the project into your own namespace to have read and write permissions to modify the project files and settings.

Once on your namespace, you can then clone it into your computer, work on its files, and (optionally) submit proposed changes back to the original repository if you’d like.

Download vs clone

To create a copy of remote repository files on your computer, you can either download or clone it. If you download it, you cannot sync it with the remote repository on GitLab.

On the other hand, by cloning a repository, you’ll download a copy of its files to your local computer, but preserve the Git connection with the remote repository, so that you can work on its files on your computer and then upload the changes to GitLab.

Commit

A commit is like a checkpoint in Git where we save the current state of our files, so that we can navigate back and forth between checkpoints if required. A commit is accompanied by a commit message, a short description of the change being made, so that you and your teammates can keep tabs on the changes being made to the files in the repository.

Pull and push

After you saved a local copy of a repository and modified its files on your computer, you can upload the changes to GitLab. This is referred to as pushing to GitLab, as this is achieved by the command git push.

When the remote repository changes, your local copy will be behind it. You can update it with the new changes in the remote repo. This is referred to as pulling from GitLab, as this is achieved by the command git pull.

Branching

One thing that really sets Git apart is its branching model.

If you want to add code to a project but you’re not sure if it will work properly, or you’re collaborating on the project with others, and don’t want your work to get mixed up, it’s a good idea to work on a different branch. Branching allows you to create independent local branches in your code.

When you create a branch in a Git repository, you effectively get a copy of its files as they were at the time of branching (under the hood, Git only creates a lightweight pointer to a commit rather than duplicating the files). You’re free to do whatever you want with the code in your branch without impacting the main branch or other branches. And when you’re ready to bring your changes to the main codebase, you can merge your branch into the main one used in your project (such as master).

For more info on branching:

https://www.atlassian.com/git/tutorials/using-branches
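A minimal sketch of that cycle (branch off, change something, merge back) might look like this. The branch name experiment, the file app.txt and the identity are placeholders, and the name of the main branch is read from Git because it can be master or main depending on your setup.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "stable" > app.txt
git add app.txt
git commit -q -m "Initial version"
main_branch="$(git symbolic-ref --short HEAD)"

# Create a new branch and switch to it.
git checkout -q -b experiment
echo "risky idea" >> app.txt
git commit -q -a -m "Try an idea"

# Back on the main branch, the file is untouched.
git checkout -q "$main_branch"
before_merge="$(cat app.txt)"
echo "before merge: $before_merge"

# Happy with the experiment? Merge it into the main branch.
git merge -q experiment
echo "after merge:"
cat app.txt
```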

Tracked and Untracked Files

Remember that each file in your working directory can be in one of two states: tracked or untracked. Tracked files are files that were in the last snapshot; they can be unmodified, modified, or staged. In short, tracked files are files that Git knows about.

Untracked files are everything else — any files in your working directory that were not in your last snapshot and are not in your staging area. When you first clone a repository, all of your files will be tracked and unmodified because Git just checked them out and you haven’t edited anything.

As you edit files, Git sees them as modified, because you’ve changed them since your last commit. As you work, you selectively stage these modified files and then commit all those staged changes, and the cycle repeats.
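The difference between tracked and untracked files shows up directly in `git status --short`. In the sketch below (placeholder identity, made-up file names), a brand-new file is first listed with ?? because Git does not know about it, and with A once it has been added to the staging area.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

# tracked.txt is part of the last snapshot, so Git knows about it.
echo "tracked content" > tracked.txt
git add tracked.txt
git commit -q -m "Track one file"

# A brand-new file starts out untracked.
echo "new file" > untracked.txt
untracked_status="$(git status --short)"
echo "$untracked_status"

# After git add, Git tracks it and it is staged for the next commit.
git add untracked.txt
staged_status="$(git status --short)"
echo "$staged_status"
```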

References:

https://git-scm.com/ — documentation part is extremely resourceful!

https://blog.devmountain.com/git-vs-github-whats-the-difference/

https://www.git-tower.com/learn/git/glossary/origin#:~:text=In%20Git%2C%20%22origin%22%20is,but%20just%20a%20standard%20convention.

https://techcrunch.com/2012/07/14/what-exactly-is-github-anyway/

https://dev.to/mollynem/git-github--workflow-fundamentals-5496 (image used to demonstrate the basic command workflow in Git)

CONCLUSION

Aaah, I see you’ve made it to the end. First up, thank you for actually making it this far (even if you skipped some parts, which I am guilty of doing myself). I would love to hear your thoughts on this article.

And this article wouldn’t be complete without thanking my mentors and teachers on Crio who made sure we had access to every resource possible for smooth learning and to Crio itself which provided a very intuitive and interactive experience through its course by providing us tasks in the form of micro-experiences.

Cheers!


Sahithya Papireddy

I like oreo flavored ice cream but I do not like oreos.