
Introduction to Linux

Introduction

All of the current ARC systems run an operating system called Linux.  Whereas Microsoft Windows and Mac OS X place almost total emphasis on graphical interaction with the operating system, Linux also allows (perhaps even encourages) users to do lots of things from the "command line" or "prompt".

Working from a command line has many benefits and, once used to it, many users complain how slow and painful using graphical systems can be.  However, it is different, and many users find the learning curve off-putting.  This is where simple tutorials with Linux can help.

There are many good tutorials around for Linux on the Web. IT Services also run lunch courses in Linux. This tutorial is intended for users of the University of Oxford Advanced Research Computing facility to gain sufficient basic Linux knowledge and skills to be able to utilise our services. This is a basic level of knowledge required to attend the ARC Training Course 1: Introduction to ARC (see the Training page for more information).

Background

All ARC services utilise the Linux operating system. Linux, in its simplest terms, is software on a computer that enables applications and the computer user to access devices to perform desired functions. The operating system (OS) relays instructions from an application to, for instance, the computer’s processor. The processor then performs the instruction sending the results back to the application.

Operating System stack

In managing the communication between application and system hardware the operating system provides several layers of abstraction as in the diagram above. This abstraction makes it easier to develop applications that will run across multiple platforms. Changes in underlying hardware (such as CPU type, network card, graphics card) are managed by changing individual components of the operating system within each of these layers.

As an operating system, the principles behind Linux are derived from generations of UNIX developments. There are many different types of operating system, including Windows (Microsoft) and Mac OS X (Apple). ARC uses Linux for its ability to provide secure remote access to its systems and to manage many different types of scientific applications across a range of different hardware.

Further Information

For further information and tutorials about Linux/UNIX please see:

Linux concepts

Linux or UNIX is based around the concept of commands which:

  • Carry out one (or very few) simple tasks
  • Carry tasks out very efficiently
  • Carry tasks out very quietly

Commands tend to be typed in a “shell”, the layer between you and the core Linux “kernel”. More complex tasks are carried out by joining simple commands together. To join commands together the vertical line | symbol, or ‘pipe’ is used.

For example:

  • grep – search for a pattern in a file
  • sort - sort into order
  • uniq – only show one copy of identical things
grep ARC *txt | sort | uniq > output.txt

The above command could search thousands of files for lines containing “ARC” and save one copy of each line to a new file (in seconds).
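As a small, self-contained illustration of the pipeline above, the sketch below builds two sample files in a scratch directory and runs the same three commands on them. The file names and their contents are invented for the illustration. The -h flag (available in GNU and BSD grep) is an addition, on the assumption that you want to drop the file name that grep otherwise prints in front of each match when it searches more than one file; with that prefix in place, identical lines from different files would not count as duplicates.

```shell
# Work in a throwaway directory so no real files are touched
cd "$(mktemp -d)"

# Two made-up sample files that share one line
printf 'ARC runs Linux\nnothing to see here\n' > notes1.txt
printf 'ARC runs Linux\nWelcome to ARC\n' > notes2.txt

# Search every .txt file, sort the matches, keep one copy of each line
grep -h ARC *.txt | sort | uniq > summary.out

# summary.out now holds each matching line exactly once
cat summary.out
```

The result is written to a file that does not end in "txt" so that it cannot be picked up by the wildcard that feeds grep.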

Linux treats everything as files and, whenever possible, makes files plain text and readable by anyone with anything. Linux does not hide or obscure ‘stuff’ unless it needs to be hidden.

WARNING

Finally, Linux assumes that the user knows what they are doing. On ARC systems users will not have escalated or ‘root’ privileges but will be able to delete or modify any files or data that reside in their own home account or project data directory. Where you are sharing data with other project members, you need to be careful when using Linux commands as you may inadvertently delete or modify this data. Most Linux (UNIX) commands do not ask questions. They assume you know what you are typing and blindly execute the task as instructed.

Exercises

The following exercises will help you gain a familiarity with the Linux operating system and the command line. While not a comprehensive tutorial, this should be sufficient introduction to attend ARC Training Course 1: Introduction to ARC (see the Training page for more information).

Conventions used in the exercises

Within the step by step tutorial guides, the following conventions are used to explain the commands being presented and to make it clear which sections of text you need to enter.

This is the font that is used for general information throughout this guide.

Commands and options to be typed into the computer, as well 
as output from the computer are in this font.

Commands are typed in at a “prompt” which is represented by:

prompt>

Parts of a command you should replace (e.g. fill-in-your-name) are in italics.

So if you saw:

prompt> ls -l filename

You would type “ls -l” and then the filename of your choice and hit the return key.

Exercise 1: Logging on

The first exercise is to log on to an ARC system, "arcus-b" from the machine in front of you (a list of the ARC services and their host names may be found on the Services page) using an SSH (or Secure Shell) client. If you run Linux or have a Mac, you should log on using "ssh" from the command line (the Mac command line may be accessed using the "Terminal" application). If you use Windows (of any flavour), you can use SSH client programs such as "PuTTY", "MobaXterm" and "Cygwin"; the following section provides more information on using PuTTY.

Windows (PuTTY)

PuTTY is a program which creates a “terminal” with which you can get remote access to other systems (as well as a whole bunch of other functions beyond the scope of this course). PuTTY is not usually installed by default on Windows PCs so you may need to ask your local IT staff to install it for you (it can be downloaded from http://www.chiark.greenend.org.uk/~sgtatham/putty/).

Launch PuTTY from your PC.

The host name of the machine we're going to connect to is arcus-b.arc.ox.ac.uk; the host names of all ARC facilities may be found on the Services page. We're using an SSH connection on port 22 (don't worry too much about what that means). Your window should now look like this:

Putty screenshot arcus-b

Linux and Mac (Terminal/Console)

On a Mac or Linux PC, simply type the following on the command line:

prompt> ssh username@arcus-b.arc.ox.ac.uk

Where “username” is the ARC name given to you when you registered for a user account.

SSH with Graphical Displays

On its own, the above exercise will allow you to connect to ARC systems. However, one of the big differences between Linux and Windows/Mac is that graphical displays (things you can display pictures or windows on) are not assumed and therefore graphical connections are not enabled by default. If you want to see any graphics or open other windows from an ARC system you will also need to enable "X" (in the Linux/Unix world, “X” is usually synonymous with graphics).

This part of the exercise explains how to enable "X" forwarding over SSH. It is strongly encouraged that you try this step but if you prefer not to, please continue on to Exercise 2.

Windows (PuTTY)

To use "X" forwarding with Windows, you need to install an "X" server such as VcXsrv.

  1. If you don't already have one installed, install an "X" server such as VcXsrv
  2. If you have installed VcXsrv, you will need to start the X Server by running "X Launch" (we suggest reading through Using VcXsrv and putty for more details).
  3. In the menu on the left of the PuTTY window, click on the + button next to “SSH”,
  4. Select “X11” from the new menu.
  5. Select “Enable X forwarding”
  6. Go back to the Session window which you started on

Now click “Open” and a terminal window should appear asking you for your username and password. Enter these and PuTTY will connect to "arcus-b".

Mac OS X

"X" is no longer included with Mac OS X and must be installed separately; the package to install is called XQuartz. Once you have installed XQuartz, you can connect to arcus-b with X forwarding enabled by typing the following on the command line and pressing enter:

prompt> ssh -X username@arcus-b.arc.ox.ac.uk

Where “username” is the ARC name given to you when you registered for a user account and "-X" enables the X forwarding.

Linux

If you are using Linux with a graphical desktop, it should not be necessary to install any more software. You can enable X forwarding by typing the following on the command line and pressing enter:

prompt> ssh -X username@arcus-b.arc.ox.ac.uk

Where “username” is the ARC name given to you when you registered for a user account and "-X" enables the X forwarding.

Exercise 2: Where am I?

Introduces the commands pwd, ls, cd, mkdir
Introduces the concepts of files, directories and the home directory

A Linux system is arranged in a tree like structure with files (leaves) and directories (branches). Windows users will recognise directories as being the Linux equivalent of folders which are used to organise files. In this exercise we are going to navigate around some parts of the Linux file system to see what we can find.

When you do something with files and directories on Linux (making a new file for example), if you do not specify where in the directory “tree” the file should be created, it defaults to your current “location”. This is known as your current working directory. Let's find out where in the file system we are at the moment by using “pwd” (which is short for print-working-directory):

prompt> pwd

In fact because you have just logged in, your current working directory is also your “home directory” i.e. the place in the directory tree where you will always start off.

Your home directory is special and important to you for two reasons:

  • Several (hidden) configuration files which affect your environment (see later) are stored in this area
  • It's owned by you and you can freely manipulate the directories and files under your home area – it's your personal space to work in on the ARC systems

Let's list what you have in your home directory:

prompt> ls

Which gives you a list of files and directories. Let's create a new directory to do some work in.

prompt> mkdir training
prompt> ls

Now let's move out of our home area and into the training directory, checking it worked:

prompt> pwd
prompt> cd training
prompt> pwd
prompt> ls

The “ls” should have no output: it's a new directory and therefore it's empty. To go up one directory, there is a shorthand name for “up one level” (note the space between "cd" and "..", unlike on Windows)

prompt> cd ..
prompt> pwd

You can also jump directly to any other part of the file system:

prompt> cd /usr/local
prompt> ls
prompt> pwd

Now keep trying to go up one directory level until you can't go any higher – you have reached the “root” of the file system – what is it called?

Now, to go back to your home directory, there are two short cuts:

prompt> cd

or

prompt> cd ~

What do you think will happen if you run the following command (try it)?

prompt> cd ~/training

In the ARC systems, your home area is rather small and not for storing large data files and running jobs. Instead you should use another area you've been given with your account on /data.

/home/group_name/user_name is a small home area
/data/group_name/user_name is for large data sets, outputs and running jobs.

Now try to change directories to your area in /data and make a new directory in there (choose the name yourself).
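If you would like to rehearse the navigation commands from this exercise without touching your real home or data areas, the following sketch repeats them in a scratch directory. The scratch directory created by mktemp is only a stand-in for your home directory; it is an assumption of this example, not part of the ARC setup.

```shell
# Create a throwaway directory and start there
scratch=$(mktemp -d)
cd "$scratch"

mkdir training           # create a new directory
cd training              # move into it
pwd                      # prints a path ending in /training
ls                       # prints nothing: the directory is empty

cd ..                    # go up one level, back to the scratch directory
pwd

cd "$scratch/training"   # jump straight there using the full path
pwd
```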

Exercise 3: Permissions

Introduces the commands chmod, man
Introduces the concept of permissions on files and directories, groups, man pages and flags

All users have their own identity on a Linux system and in addition they belong to one or more groups. At the ARC, a group is normally the same as your project.

Linux (and Unix) attaches an ownership and set of permissions to every file (including directories) which define what you can and can't do to that file depending on who you are and which groups you are in. How do we find out what permissions are in place for a given file/directory? We use the “ls” command but with an optional extra setting (an example of a so-called “flag”). Let's try it on a directory called “/usr/bin”:

prompt> ls -l /usr/bin

And you should see output which looks something like this (but may go on for several pages):

-rwxr-xr-x.   1 root root          62 Mar 17  2014 zfgrep
-rwxr-xr-x.   1 root root        2022 Mar 17  2014 zforce
-rwxr-xr-x.   1 root root        4981 Mar 17  2014 zgrep
-rwxr-xr-x    1 root root      216008 Nov 11  2010 zip
-rwxr-xr-x    1 root root      110376 Nov 11  2010 zipcloak
-rwxr-xr-x.   1 root root        2953 Oct 10  2008 zipgrep
-rwxr-xr-x    2 root root      164128 Nov 11  2010 zipinfo
-rwxr-xr-x    1 root root      101856 Nov 11  2010 zipnote
-rwxr-xr-x    1 root root      105280 Nov 11  2010 zipsplit

The permissions of the files are listed in the first column of the output. The first characters are split thus:

  • 1st character is the type of file
  • Characters 2-4 are the read/write/execute permissions for the owner of the file
  • Characters 5-7 are the read/write/execute permissions for the group the file belongs to
  • Characters 8-10 are the read/write/execute permissions for everyone else (World).

In the case of the file “zgrep” this breaks down as

Type  User  Group  World  Number of Links  Owner  Owner's Group  Size  Last modification time  Name
-     rwx   r-x    r-x    1                root   root           4981  17th March 2014         zgrep

  • First character is “-” so it's a normal file (d means a directory)
  • Characters 2-4 are “rwx” so the user can read and write and execute (run) this file as a program
  • 5-7 are “r-x” so any other user in the group “root” can read and run the file but not change it
  • 8-10 is also “r-x” so any user in any group can read and run the file but not change it

Who can do what with the file “/usr/sbin/sitar.pl”?

Let's now back up for a moment. Where did “ls -l” come from? Each Linux command comes with a little manual of its own. To read the manual for “ls”:

prompt> man ls

and see what it tells you about the “-l” flag. Hit “space” to scroll down and “q” to quit.

  • Now read the manual for a command called “chmod”
  • Work out what the command “chmod g-w” would do to a file.
  • Work out the full chmod command you would use to make sure that anyone in any group could read the contents of a file you create. Also, how would you make a file executable (so it can be run as a program)? Check the answer with a demonstrator.
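To see how the permission letters respond to chmod, you can experiment on a scratch file. The sketch below is one illustration only: the file name notes.txt and the particular permissions chosen are assumptions made for the example, and it deliberately uses a different change from the ones asked about above.

```shell
# Work in a throwaway directory
cd "$(mktemp -d)"
touch notes.txt

# Set a known starting point: owner read/write, group read, others nothing
chmod u=rw,g=r,o= notes.txt
ls -l notes.txt     # the permissions column starts with -rw-r-----

# Let members of the file's group write to it as well
chmod g+w notes.txt
ls -l notes.txt     # the permissions column now starts with -rw-rw----
```

Running "ls -l" after each change is a good habit, since chmod itself prints nothing when it succeeds.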

The most common error message which ARC users might encounter is, “Permission denied.” This can cover a range of situations such as:

  • Trying to read or alter a file which you have no right to
  • Trying to delete a file or directory which you have no right to
  • Trying to create a file in an area which you have no right to

Try the following and see what happens:

prompt> mkdir /etc/wibble

This is because /etc is owned by the system administrator's account (known as “root”) and root has not given other users write permission to this area.

Further reading

Exercise 4: Wildcards

Concepts introduced include wildcards and case sensitivity

As you saw from the /usr/bin directory, some areas contain lots of files. There's a handy trick for gathering groups of files together using "wildcards". These are useful for many Linux commands but for now, let's stick with the “ls” command on the /usr/bin directory.

Remind yourself of the contents of /usr/bin:

prompt> ls /usr/bin

and then try

prompt> ls /usr/bin/a*

“*” is a substitute for “anything from zero to N characters of any type”. So everything starting with an “a” is listed. If a directory starts with an “a”, all its files are listed even if they do not start with “a” themselves.

Now try

prompt> ls /usr/bin/A*

Is the result the same or different? What do you think will happen if you try

prompt> ls /usr/bin/*a*

Try the following:

prompt> ls /usr/bin/*a
prompt> ls /usr/bin/[a-c]*
prompt> ls /usr/bin/[a-c]*f*
prompt> ls /usr/bin/*[0-9]*

Can you explain what happens in each case?

As well as “*” there is another wildcard, “?”. This means “anything which is exactly one character long”.

Try the following:

prompt> ls /usr/bin/?a*
prompt> ls /usr/bin/*a?
prompt> ls /usr/bin/??a*

Can you explain what happens in each case? Make sure you understand the difference between “/usr/bin/*a*” and “/usr/bin/?a?”
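Wildcards are easier to reason about in a directory whose contents you know exactly. The sketch below creates six empty files with invented names in a scratch directory and uses "echo" to print what each pattern expands to (it is the shell, not "ls", that does the expanding, so any command sees the same list).

```shell
# Work in a throwaway directory containing six made-up file names
cd "$(mktemp -d)"
touch apple Avocado banana bat cat9 data

echo a*        # apple                  (starts with a lower-case a)
echo A*        # Avocado                (wildcards are case sensitive)
echo *a        # banana data            (ends in a)
echo ?a?       # bat                    (three characters, a in the middle)
echo ?a*       # banana bat cat9 data   (a is the second character)
echo *[0-9]*   # cat9                   (contains a digit)
```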

INTERLUDE: KEYBOARD SHORTCUTS

Now you've become slightly more used to typing commands at a shell prompt, we'll introduce three shortcuts which will speed things up:

  1. The up and down arrow keys – these enable you to go back and forth through your history of previous commands
  2. The left and right arrow keys – these allow you to move along a line so you can change things
  3. Tab – this auto-completes command names and file names

Exercise 5: Creating a file with a text editor

Unlike Windows where files tend to be in special formats specific to the application designed to use them, Linux tries to use plain text format as much as possible. This keeps files simple and readable and allows lots of different applications to easily read the same file.

In this exercise, we'll use a very simple text editor to create a text file. The text editor we introduce here is called "nano". It's just a very simple screen with some commands written on the bottom.

prompt> cd ~/training
prompt> ls
prompt> nano myfile.txt

Type in the following paragraphs:

"The ARC is a complete high performance computing service available to all researchers at Oxford University. We offer training and application support plus access to a range of powerful clusters and shared memory machines, along with a large storage facility."

"The facilities offered by ARC are the most powerful in the University and enable researchers to tackle projects which could not otherwise be addressed by local facilities."

Now save the file (Ctrl-O) and exit (Ctrl-X).

prompt> ls

You should have a new file called "myfile.txt".

Further reading

INTERLUDE: TEXT EDITORS

There are lots and lots of text editors for Linux. Long-time users have very firm opinions as to which is the best while not being able to agree with each other (see the Wikipedia entry on the Editor War). While "nano" is a simple but versatile text editor, if you are intending to do more than edit the occasional text file on Linux, we would strongly encourage you to learn how to use a more powerful text editor such as emacs, vim or gedit (remote usage of gedit requires "forwarding of graphics", see Exercise 1). The following links will help you gain familiarity with some of the text editors mentioned.

Emacs
VIM
gedit

Exercise 6: Looking inside files

Introduces the commands cat, less, grep

You can use a text editor to look inside a file without changing anything of course but there are quicker (and it turns out, more useful) ways of looking inside text files. To see what's inside a text file:

prompt> cat myfile.txt

(cat is short for concatenate). The problem with “cat” is that big files rapidly scroll off the top of the terminal screen. A more powerful command is called “more” but that has been supplanted by an even more powerful command named “less”:

prompt> less myfile.txt

(press “space” to scroll, “q” to quit). "less" allows you to search through files by hitting the “/” key and entering the string to search for. Look for the word “powerful” in myfile.txt using “less”.

There are even faster ways to look for words in text files. Use “grep” to look for the word “ARC” in myfile.txt:

prompt> grep ARC myfile.txt
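A few standard grep flags are worth knowing early on. The sketch below uses a three-line sample file invented for the illustration (it is not the myfile.txt from Exercise 5) so that the results are easy to check by eye.

```shell
# Work in a throwaway directory with a small made-up file
cd "$(mktemp -d)"
printf 'The ARC offers training.\nStorage is large.\nARC clusters are powerful.\n' > sample.txt

cat sample.txt            # print the whole file
grep ARC sample.txt       # print only the lines containing ARC
grep -n ARC sample.txt    # the same, prefixed with line numbers (1 and 3)
grep -c ARC sample.txt    # count the matching lines instead: prints 2
grep -i arc sample.txt    # -i ignores case, so "arc" also matches "ARC"
```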

Exercise 7: Copying, Moving and Deleting files and directories

Introduces the concept of Linux not warning about deletions
Introduces the commands cp, mv, rm and rmdir

To copy a file, use “cp” and to move or rename it, use “mv”.

prompt> cp myfile.txt myjunk.txt
prompt> ls
prompt> mv myjunk.txt myold.txt
prompt> ls
  1. Copy myfile.txt to another file with the name “myfile2.txt”.
  2. Edit the second file with “nano” and change some text at random.
  3. Come up with a single command using wildcards which searches both files for a particular word

To delete a file we use the command "rm".

prompt> rm myold.txt

Notice how you got no warning whatsoever. This is one of the most important differences between Linux and Windows and Mac OS X – Linux assumes you always know exactly what you're doing and rarely checks to see if it's a sensible thing to do.

What do you think the command “rm *” does? What would happen if you tried this in a directory containing all your research results?

Now make a new directory called "temporary" and then try to remove it using:

  1. rm
  2. rmdir

rm is for removing files, rmdir for empty directories.
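Because deletions are immediate, it is worth practising these commands somewhere harmless first. The following sketch runs the whole sequence in a scratch directory; the habit of listing a wildcard with "ls" before handing the same wildcard to "rm" is a suggestion of this example rather than something Linux requires.

```shell
# Work in a throwaway directory
cd "$(mktemp -d)"
echo "some results" > myfile.txt

cp myfile.txt myjunk.txt     # copy: both files now exist
mv myjunk.txt myold.txt      # rename: myjunk.txt is gone
ls                           # myfile.txt and myold.txt

ls my*.txt                   # preview what a wildcard matches before deleting
rm myold.txt                 # deleted immediately, with no confirmation

mkdir temporary
rm temporary || echo "rm on its own will not remove a directory"
rmdir temporary              # works because the directory is empty
ls                           # only myfile.txt remains
```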

Exercise 8: Making a shell script

This introduces the concept of writing scripts
Applies the concept of making files executable with "chmod"

Linux is made up from lots of very small commands which perform very specific jobs. It is often the case that you need to group lots of commands together into a “recipe” which you use over and over again. Such recipes are held in text files and are known as “scripts”. There are different flavours of scripts depending on which language they are written in and because we are making a script written with BASH-shell commands, this will be a “shell script”.

Our script will do two things:

  1. List the contents of /etc
  2. Put the output into a new file called listing.txt

Use “nano” to enter the following script into a file called listing.sh:

# A comment. Other than #!, anything starting with # is ignored
# List /etc and REDIRECT the output to a file called listing.txt
ls /etc > listing.txt

and save the script as “listing.sh”.

There are two ways we can run this script. Method one involves sourcing the script (saying to the shell you're typing at, “please run this text file as a BASH script”).

prompt> . ./listing.sh

The “.” character on its own means “run this file as a BASH script”.

(It is good practice to refer to the shell script with a “./” in front of it. “./listing.sh” means “the listing.sh file which is in this directory” and avoids the risk of running something else with the same name which might be installed elsewhere in the system.)

The second way of doing this is a bit tidier. We place the strange looking line “#!/bin/bash” at the start of the script which forces the script to run as a BASH shell script every time.

Use nano to change the first line of the script so it looks like this

#!/bin/bash
# A comment. Other than #!, anything starting with # is ignored
# List /etc and REDIRECT the output to a file called listing.txt
ls /etc > listing.txt

Now try to run the file directly (so no “.” sign).

prompt> ./listing.sh

You should get “permission denied”. This is because your listing.sh file doesn't have execution permission (a safety feature which stops you running any old file).

Use the chmod command to add executable permission to listing.sh and run it using the line above. Re-read the chmod man page if you need to.

Now look inside your new file, “listing.txt”, to check the script has worked. Note how “>” redirected the output from a command into a file. It is also possible to redirect files into commands using “<”. This is known as “redirection” and is used by some of the applications on the ARC systems.
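For reference once you have worked through the exercise yourself, the whole sequence can be reproduced without an editor. This is a sketch under a few assumptions: the script is written with a "here document" (the lines between <<'EOF' and EOF are copied into the file), execute permission is given to the owner only, and everything happens in a scratch directory rather than your training directory.

```shell
# Work in a throwaway directory
cd "$(mktemp -d)"

# Write the script with a here document instead of nano
cat > listing.sh <<'EOF'
#!/bin/bash
# List /etc and redirect the output to a file called listing.txt
ls /etc > listing.txt
EOF

chmod u+x listing.sh      # give the owner execute permission
./listing.sh              # run the script directly

# Redirect the file INTO a command: count the lines in listing.txt
wc -l < listing.txt
```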

The reason we are showing you shell scripts is that you will need to write some simple scripts to submit jobs on the ARC systems.

Exercise 9: Joining commands together

This exercise is currently in the process of being written.

Introduces the | (or pipe) operator

In the last exercise you learnt how to "redirect" the output of a command to a text file using ">", and how to redirect files into a command using "<". It is also possible to redirect the output (or standard output data stream) of one command into the input (or standard input data stream) of another command using the "|" or pipe operator; more information on the data streams may be found in the Wikipedia article on Standard Streams.

Using the pipe operator, it is possible to join two or more commands together to obtain the output that you require; you may recall the example in the discussion on Linux Concepts towards the beginning of this guide.
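Until the full exercise is available, here are two short pipelines to try. They are illustrations only: the word list in the first is invented, and the second simply reuses /etc because it exists on every Linux system.

```shell
# Count how many distinct words appear in a short list (prints 3)
printf 'pear\napple\npear\nfig\napple\n' | sort | uniq | wc -l

# Show the first three entries of /etc in reverse alphabetical order
ls /etc | sort -r | head -n 3
```

In each case the output of the command on the left of a "|" becomes the input of the command on its right, and nothing is written to disk along the way.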

Exercise 10: The Linux environment

Introduces the echo and env commands
Introduces the concept of environment variables

So far we've talked a lot about commands in Linux. We also need to briefly consider the shell environment which is comprised of many variables and their settings. To print anything in Linux we use the echo command. To print the contents of a variable called PATH, we'd do the following:

prompt> echo $PATH

The “$” tells the shell to substitute the value of the variable whose name follows it, in this case the environment variable PATH, which has already been set.

Try setting a variable yourself:

prompt> export MYNAME=jon
prompt> echo $MYNAME

To list all of your environment variables, use the env command.

prompt> env

Can you work out how to redirect env into a file?

Many applications on the ARC systems require environment variables to be set in order to work properly. We usually try to make this easier using “modules”, see below.
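The reason "export" matters is that only exported variables are handed on to the programs you start from the shell. The sketch below shows the difference; the variable name ARC_DEMO_NOTE is made up for the illustration, and "sh -c" is used simply as a convenient way of starting a separate program.

```shell
# A plain shell variable is visible only to the current shell
ARC_DEMO_NOTE=hello
echo $ARC_DEMO_NOTE                          # hello
sh -c 'echo "${ARC_DEMO_NOTE:-not set}"'     # not set: the child cannot see it

# An exported variable is passed on to programs started from the shell
export MYNAME=jon
sh -c 'echo "$MYNAME"'                       # jon
env | grep '^MYNAME='                        # MYNAME=jon
```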

Exercise 11: Quotas

Introduces quota checking

Quotas are a way of managing disk space. Each user is governed by two quotas:

/home/your_group/user_name has a 15GB quota for every user
/data/your_group/ has a quota for each project. This is usually about 5TB.

Quotas are checked on the ARC storage by running the pan_quota command.

Read the man page on pan_quota and run the pan_quota command.

Exercise 12: Modules

Introduces the concept of environment variables for applications
Introduces modules
Introduces the command diff

Use the env command to display your environment variables. Now run env again but redirect the output into a file called “env1.dat”. Now run the following command:

prompt> module load intel-compilers

Run env again and redirect it to “env2.dat”. Now use grep to examine the contents of the PATH variable before and after you loaded the Intel module. You should see changes.

A more efficient way to do this is to use diff. Read the man page for diff and try it on the two files you have made.

Modules are an easy way of changing your environment. Most ARC applications will come with a module you should load before you use it. See the ARC How to Guides for details.
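If you want to practise the before-and-after comparison on a machine that has no "module" command, the effect can be imitated by hand. In the sketch below, prepending the hypothetical directory /opt/example/bin to PATH stands in for what loading a module typically does; that directory, and the idea of sorting the env output first so that diff lines up, are assumptions of this example.

```shell
# Work in a throwaway directory
cd "$(mktemp -d)"

env | sort > env1.dat                   # snapshot before the change

# Stand-in for "module load": put a hypothetical directory at the front of PATH
export PATH="/opt/example/bin:$PATH"

env | sort > env2.dat                   # snapshot after the change

grep '^PATH=' env1.dat env2.dat         # compare the two PATH lines by eye
diff env1.dat env2.dat || true          # diff exits non-zero when files differ
```

Lines that diff marks with "<" come from the first file and lines marked ">" come from the second.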

Exercise 13: Finishing off

Use “exit” or Ctrl-D to log off.

prompt> exit

What next?

There are a lot of good resources on the internet related to Linux and how to use it. Below is a small selection: