Digital Tools for Research

This guide provides information about digital tools that can be useful for research data management and analysis.

Command Line Interface (CLI)

We typically interact with modern computers using a mouse, keyboard, or touchscreen to click on buttons and menus within Graphical User Interfaces (GUIs). However, this is not the only way to communicate with your operating system and programs. Instead of interacting with GUI elements, you can type commands directly into a special programme known as a command line interface (CLI). Simply put, a CLI is a text interface for your computer. From the command line, you can navigate through files and folders on your computer and run applications. The CLI handles text input and output, while an interpreter called a shell reads each command and executes it. Some programmes only have a GUI, some only have a CLI, and some have both.

CLIs can be very useful for bulk-processing your research data and for interacting with servers and cloud computing services, which usually run a Unix-based operating system.

Unix vs Windows

Linux and macOS are Unix-based systems that share common features and design principles, including the CLI. On both, the CLI application is called Terminal, and it runs shells like Bash or Zsh, which support both basic commands and advanced scripting. This means that most commands are the same for macOS and Linux users.

Windows has two different built-in CLIs, Command Prompt (cmd.exe) and PowerShell, which have a different syntax and focus. While cmd is your go-to CLI for everyday tasks, PowerShell is more object-oriented and provides advanced scripting capabilities.

Windows Command Prompt (cmd)

In recent versions of Windows 11, cmd opens inside the Windows Terminal app, which supports tabs and lets you switch between cmd, PowerShell, Git Bash or any other shell in a single app.

Mac Terminal

macOS Sequoia 15 version.

Linux Terminal

Ubuntu 18.04 version.

Alternative Terminals

There are a few free open-source alternatives to standard terminals that offer extended functionality and customisation.

Basic Commands

When you open a CLI, it shows you your current location in the file system and a special symbol that shows you that the CLI is ready to take a command. This symbol is > on Windows and $ on Unix systems (Zsh, the default shell on recent versions of macOS, shows % instead).

CLIs "remember" a certain number of the commands that you run during a session, and you can navigate between them using the arrow keys: the up arrow (↑) recalls the previous command you ran, and the down arrow (↓) goes to the next one. Another useful feature is autocomplete, which helps you avoid typing long paths and filenames: just type the first few letters of the file or directory name you need, and then press Tab.

Action | Windows | Linux / macOS
Change directory | cd | cd
Create file | copy con | touch
Create directory | mkdir | mkdir
Delete file | del, erase | rm (rm -rf deletes a folder and all files inside)
Delete directory | rmdir | rmdir
Print message | echo | echo
Print file contents | type | cat
Copy file | copy, xcopy | cp
Rename file | ren, rename | mv
Move file | move | mv
Search for a file by name | where | find, locate
Search for a string in a file | find | grep
Compare contents of files | fc | diff
Print list of files and directories | dir | ls
View your current directory location | chdir | pwd
Execute scheduled commands | at | cron
Display the time | time | date
List running tasks | tasklist | ps x
Kill a process | taskkill | kill
Display free memory | mem | free
Set environment variables | set | export
Send ICMP ECHO_REQUEST to network hosts | ping | ping
Clear screen | cls | clear
Help | help | apropos, man, whatis

Navigation

CLIs support both relative and absolute paths to files and directories. Here are some tips on navigating your file system in a CLI.

  • . refers to your current working directory
  • .. refers to a directory one level up from your current working directory
  • ../.. refers to a directory two levels up from your current working directory (you can go further up the tree in the same way)

For example, if you want to navigate to a directory called "test" within your current working directory, type cd test in the command line. If you want to navigate to a directory called "test" that is on the same level as your current working directory (i.e. they are in the same parent folder), type cd ../test.

The same works with any command. For example, if you want to list the contents of one directory above your current working directory on Windows, type dir .. in the command line.
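The relative paths described above can be tried out safely in a throwaway directory. The following is a minimal sketch for a Unix shell; the directory names (project, data, test) are hypothetical and only serve the illustration.

```shell
# Build a throwaway directory tree to practise in (names are hypothetical).
root=$(mktemp -d)
mkdir -p "$root/project/data" "$root/project/test"

cd "$root/project/data"   # start inside project/data
cd ../test                # sibling directory: up one level, then into test
sibling=$(basename "$PWD")
echo "$sibling"           # prints the name of the directory we are now in

cd ../..                  # two levels up from project/test
ls project                # a relative path works with any command, not only cd

# When you are finished, remove the practice tree with: rm -r "$root"
```

Because every path here is relative, the same sequence works no matter where the temporary directory happens to be created.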

Keyboard Shortcuts

Circumflex (^) means "Ctrl" on any system: ^C = Ctrl + C

^C - break (interrupt the running command)

^S - pause terminal output (on Unix systems, ^Q resumes it)

^I - same as the Tab key; goes over files and directories

^M - same as Enter

^H - same as Backspace

NB! Many common shortcuts, like Ctrl+C / Ctrl+V for copy and paste, won't work in standard CLIs!

Wildcards

Standard Windows and Unix CLIs support wildcards — symbols that will match any character, including spaces and punctuation marks. There are two of them:

  • * — matches any number of characters (including 0)
  • ? — matches exactly 1 character

Image source: https://www.warp.dev/terminus/linux-wildcards
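To see the difference between the two wildcards, here is a small sketch for a Unix shell, where the shell itself expands the pattern before the command runs. The file names are made up for the example.

```shell
# Work in a scratch directory with a few hypothetical data files.
cd "$(mktemp -d)"
touch run1.csv run2.csv run10.csv notes.txt

# * matches any number of characters: all three .csv files, but not notes.txt.
all_csv=$(echo *.csv)
echo "$all_csv"

# ? matches exactly one character: run1.csv and run2.csv, but not run10.csv.
single_digit=$(echo run?.csv)
echo "$single_digit"
```

Using echo with a pattern is a harmless way to check what a wildcard will match before passing the same pattern to a command such as rm or mv.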

Here are some commands that can be useful for managing your research data.

Find files that have a specific extension

Windows

dir /b /s /a-d | findstr /i /e /l ".mp3"

Unix

find . -type f -name "*.mp3"

Find files that don’t have a specific extension

Windows

dir /b /s /a-d | findstr /v /i /e /l ".jpg"

Unix

find . -type f ! -name "*.jpg"

Count all files with a specific extension

Windows PowerShell

Get-ChildItem -Recurse -Filter *.jpg | Measure-Object

Unix

find . -type f -name "*.jpg" | wc -l
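One detail worth knowing about the Unix commands in this section is that -name is case-sensitive, so files ending in .JPG are not matched by "*.jpg". The sketch below, which assumes a find that supports -iname (GNU and BSD find both do), compares the two on a scratch directory with hypothetical file names.

```shell
# Scratch tree with hypothetical files; sub/ checks that the search recurses.
root=$(mktemp -d)
mkdir -p "$root/sub"
touch "$root/a.jpg" "$root/b.JPG" "$root/sub/c.jpg" "$root/notes.txt"

# -name is case-sensitive, so b.JPG is not counted here.
exact=$(find "$root" -type f -name "*.jpg" | wc -l)

# -iname ignores case and also matches b.JPG.
any_case=$(find "$root" -type f -iname "*.jpg" | wc -l)

echo "case-sensitive: $exact, case-insensitive: $any_case"
```

If your data comes from cameras or instruments that write upper-case extensions, -iname is usually the safer choice for counting.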

Delete files with a specific name recursively

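The Windows command uses /a:h to target files with the hidden attribute, which Thumbs.db normally has; without that switch, del skips hidden files. The Unix command matches files whether or not they are hidden.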

Windows

cd "path to the root folder (where you want to start)" 
del /s /q /f /a:h Thumbs.db

Unix

cd "path to the root folder (where you want to start)"  
find . -type f -name "Thumbs.db" -exec rm -f {} +

Delete all empty files recursively

Windows

forfiles /P "path to root folder" /S /C "cmd /C if @isdir==FALSE if @fsize==0 del @path"

Unix

find "path to root folder" -type f -empty -exec rm -f {} +
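Since deleted files cannot be recovered from the command line, it can be worth listing the matches first. The following sketch shows one way to do that on Unix: run the same find expression with -print, check the list, and only then run it with rm. The file names are hypothetical.

```shell
# Scratch tree: two empty files and one file with content.
root=$(mktemp -d)
mkdir -p "$root/sub"
touch "$root/empty.txt" "$root/sub/also_empty.log"
echo "data" > "$root/keep.txt"

# Dry run: print what would be deleted, without deleting anything.
find "$root" -type f -empty -print

# Same expression, now actually deleting the empty files.
find "$root" -type f -empty -exec rm -f {} +

# Only keep.txt should be left.
remaining=$(find "$root" -type f | wc -l)
echo "files left: $remaining"
```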

Set environment variables

Windows

set PATH=C:\Program Files\Python;%PATH%

Unix

export PATH="/usr/local/bin/python:$PATH"

These commands prepend the specified value to the existing environment variable PATH, so the new location is searched first. The changes only affect the current session and will be lost once you close the CLI. To change environment variables permanently:

  • Windows: go to System Properties → Advanced → Environment Variables and make the changes you need.
  • Unix: add the export command (see example above) to your shell’s startup file (e.g., ~/.bashrc, ~/.zshrc, ~/.profile, ~/.bash_profile)
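On Unix, the export keyword is what turns an ordinary shell variable into an environment variable that programs started from the shell can see. The sketch below illustrates this with a hypothetical variable called PROJECT_DIR; sh -c is used only to start a child process.

```shell
# Start from a clean slate in case the variable already exists.
unset PROJECT_DIR

# A plain shell variable is visible only inside the current shell.
PROJECT_DIR="/tmp/my-project"
seen_before=$(sh -c 'echo "${PROJECT_DIR:-unset}"')

# export adds it to the environment that child processes inherit.
export PROJECT_DIR
seen_after=$(sh -c 'echo "$PROJECT_DIR"')

echo "before export: $seen_before"   # prints: before export: unset
echo "after export:  $seen_after"    # prints: after export:  /tmp/my-project
```

This is also why a variable set in one terminal window is not visible in another: each window runs its own shell with its own environment.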

SSH

Secure Shell, commonly referred to as SSH, is a cryptographic network protocol used to communicate with remote computers, known as servers. It provides an encrypted channel over the internet, so that the data transmitted remains private and protected.

Both Unix-based systems and the latest versions of Windows have a built-in OpenSSH client. There is no native SSH client in Windows 8 and earlier. Instead, you’ll need to use a third party application, such as PuTTY or Cygwin, which won't be covered in this guide.

Connecting to a Remote Server

The basic command to connect to a server using SSH is as follows (replace username with your remote server username, and hostname with the server’s hostname or IP address).

ssh username@hostname

If you would like to specify a port, use the following command, replacing port-number with a port number.

ssh username@hostname -p port-number

To end an SSH session, type exit in the command line.

Sending Commands to a Remote Server

To run a single command on your remote server, use the following command. Replace username with the username of the remote user, hostname with the IP address or domain name of the remote server, and command with the command you wish to run.

ssh username@hostname command

To run multiple commands on your remote server (one after the other), use the following command. Replace command-1, command-2, and command-3 with the commands you wish to run.

ssh username@hostname  "command-1; command-2; command-3"

The commands should be separated by a semi-colon (;) and all of the commands together should be surrounded by double quotation marks ("). For example, if you want to create a file called bar.txt in a directory called foo within the user admin’s home directory on a Unix server with an IP address 127.0.0.1, you need to run the following command: ssh admin@127.0.0.1 "mkdir foo; cd foo; touch bar.txt".
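The quoted command string is simply handed to a shell on the server, so its behaviour can be rehearsed locally. In the sketch below, sh -c stands in for the remote shell (an assumption made so the example runs without a server). It also shows a common alternative to the semicolon: with &&, the chain stops as soon as one command fails.

```shell
# Work in a scratch directory; no server is contacted in this sketch.
cd "$(mktemp -d)"

# Same command string as in the ssh example, run by a local shell instead.
sh -c "mkdir foo; cd foo; touch bar.txt"
ls foo    # prints: bar.txt

# With ; every command runs even if an earlier one failed.
# With && the chain stops at the first failure, so touch never runs
# when cd cannot enter the directory.
sh -c "cd no-such-dir && touch stray.txt" 2>/dev/null || echo "chain stopped"
```

For remote work this matters because a failed cd followed by ; would let the remaining commands run in the wrong directory.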

Copying Files to/from a Remote Server

scp is a program for copying files between computers through the SSH protocol. It is included in the OpenSSH client.

To copy a file from your computer to the remote server, run the following command, replacing file, username@hostname and path with relevant values. The destination path is optional, but it can be a directory on the server, or even a file name if you are copying a single file.

scp file username@hostname:path

To copy a file from the remote server to your machine, use the following command:

scp username@hostname:file path

You can copy an entire directory with all subdirectories, using the -r option. Here is a command that copies path/directory from username@hostname to your current working directory, indicated by .

scp -r username@hostname:path/directory .

Creating Private-Public Key Pairs

SSH can use a public-private key pair for authentication. The private key must be kept confidential, and the public key must be copied to the remote host. Once the public key is in place on the remote host, the connection can be established using SSH keys instead of a password.

To generate a public-private key pair, use the ssh-keygen command. Your keys will be stored in the files you specify and, if you choose, protected by a passphrase.

Command Line Text Editors

Sometimes you need to edit files from within the command line — for example, to write a commit message for git, to edit a configuration file or to create a virtual host. There are simple command line text editors for that.

Windows

The edit command in cmd starts the MS-DOS Editor, which creates and changes ASCII text files. Note that MS-DOS Editor is only included in 32-bit versions of Windows. The command has the following syntax:

edit [/b] [/h] [/r] [/s] [/<nnn>] [[<drive>:][<path>]<filename> [<filename2> [...]]

For example, to create and edit a file called test.txt in the current directory, type:

edit test.txt

Parameter | Description
[<drive>:][<path>]<filename> [<filename2> [...]] | Specifies the location and name of one or more ASCII text files. If the file doesn't exist, MS-DOS Editor creates it. If the file exists, MS-DOS Editor opens it and displays its contents on the screen. The filename option can contain wildcard characters (* and ?). Separate multiple file names with spaces.
/b | Forces monochrome mode, so that MS-DOS Editor displays in black and white.
/h | Displays the maximum number of lines possible for the current monitor.
/r | Loads file(s) in read-only mode.
/s | Forces the use of short filenames.
/<nnn> | Loads binary file(s), wrapping lines to <nnn> characters wide.
/? | Displays help at the command prompt.

Unix

There are two command-line text editors that are pre-installed in Unix-based systems: nano and vim. They also come as a part of git for Windows, so if you have git installed, you should be able to access them in your Program Files folder (paths may be a bit different on your machine):

C:\Program Files\Git\usr\bin\nano.exe
C:\Program Files\Git\usr\bin\vim.exe

Nano

nano is a simple and easy to use command line text editor. To open a file in nano, simply type nano filename in the command line (filename can be an absolute path to the file you want to edit). When you are ready to save your changes, press Ctrl+O. To quit, press Ctrl+X; if there are unsaved changes, nano asks whether you want to save them first. Here are some other commands:

  • ^G - Get Help.
  • ^X - Exit. Nano then asks if you want to save with a Y or N option.
  • ^O - Write Out; also known as save.
  • ^R - Read File. Enter the name of a file you want to paste into the current document at your cursor’s position.
  • ^W - Where Is; Search function.
  • ^\ - Replace.
  • ^K - Cut text.
  • ^U - Uncut text.
  • ^J - Justify.
  • ^T - To spell.
  • ^C - Current Position; Cancel save.
  • ^_ - Go to line.

Vim

vim is an older Unix command line text editor with a steeper learning curve. It has two different modes, command and insert, and opens in the command mode by default. To open a file in vim, type vim filename in the command line (filename can be an absolute path to the file you want to edit).

To start writing or editing, you must enter insert mode by pressing the letter i on your keyboard. You should see -- INSERT -- at the bottom of your terminal window if you did it correctly. When you are finished typing, and you want to save your work, you need to exit insert mode. Press the escape (esc) key, which places you back in command mode. Then you can save your work.

After you press escape, press shift + ; to type a colon. The bottom of your terminal screen changes to reflect that you did it correctly: you now see a : where the -- INSERT -- was. After you see the : in the lower left-hand corner of your vim editor, type w and then press enter to save your work. Then, you can either type i again to go back into insert mode if you want to continue writing, or you can quit the file. To quit, press shift + ; again, type q and then press enter. This closes vim; if there are unsaved changes, vim refuses to quit until you save them or force-quit. You should see your usual terminal screen again.

You can also enter both the save and quit functions at the same time. To save and quit vim in one command, type wq after the : and then press enter. The file saves and closes.

If you start working on a file, but you change your mind, you can exit without saving. To do this, enter command mode by pressing esc followed by shift + ;. After you see the : at the lower left, enter q!. This force-quits vim without saving. ! is the force function.

Those commands are the ones that you are going to use most of the time, but you can use the following list if you want to do more complex actions with vim.

Use the following commands in command mode:

  • h - Moves the cursor to the left by one character; you can also press the left arrow.
  • j - Moves the cursor one line down; you can also press the down arrow.
  • k - Moves the cursor one line up; you can also press the up arrow.
  • l - Moves the cursor to the right by one character; you can also press the right arrow.
  • w - Moves the cursor one full word to the right.
  • b - Moves the cursor one full word to the left.
  • 0 - Moves the cursor to the beginning of the current line.
  • $ - Moves the cursor to the end of the current line.
  • ~ - Changes the case of the current character.
  • dd - Deletes the current line.
  • D - Deletes everything on the line to the right of the cursor’s current position.
  • x - Deletes the current character.
  • u - Undo the last command.
  • . - Repeats the last command.
  • :w - Saves current file, but does not exit.
  • :wq - Saves current file, and quits.

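The following commands place you into insert mode:

  • i - Inserts to the left of the current cursor position.
  • a - Appends to the right of the current cursor position.
  • cw - Changes the current word: deletes from the cursor to the end of the word and enters insert mode.

A related command, dw, deletes the current word but leaves you in command mode.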

The section on vim is adapted from https://docs.rackspace.com/docs/command-line-text-editors-in-linux

Tar

Tar is a UNIX-based open-source utility for collecting many files into one archive file, often referred to as a tarball. Tar stands for 'tape archive', referring to a historic archive format originally designed to read and write magnetic tape for file transfer purposes.

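NB! Older versions of Windows do not include tar. Windows 10 (version 1803 and later) and Windows 11 ship a tar command, but it is not GNU tar, so some of the options described below (for example --wildcards and --delete) may not be available there.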

The general syntax for the tar command is as follows:

tar [OPERATION_AND_OPTIONS] [ARCHIVE_NAME] [FILE_NAME(s)]

  • OPERATION - Exactly one operation argument is required. The most frequently used operations are:
    • --create (-c) - Create a new tar archive.
    • --extract (-x) - Extract the entire archive or one or more files from an archive.
    • --list (-t) - Display a list of the files included in the archive.
  • OPTIONS - The most frequently used options are:
    • --verbose (-v) - Show the files being processed by the tar command.
    • --file=archive-name (-f archive-name) - Specifies the archive file name.
  • ARCHIVE_NAME - The name of the archive.
  • FILE_NAME(s) - A space-separated list of filenames to add to the archive when creating it, or to extract from it. If no names are given when extracting, the entire archive is extracted.

When executing tar commands, you can use the long or the short form of the tar operations and options. The long forms are more readable, while the short forms are faster to type. The long-form options are prefixed with a double dash (--). The short-form options are prefixed with a single dash (-), which can be omitted.

Creating tar Archive

Tar supports a vast range of compression programs such as gzip, bzip2, lzip, lzma, lzop, xz and compress. When creating compressed tar archives, it is an accepted convention to append the compressor suffix to the archive file name. For example, if an archive has been compressed with gzip, it should be named archive.tar.gz.

To create a tar archive, use the -c option followed by -f and the name of the archive.

For example, to create an archive named archive.tar from the files named file1, file2, file3, you would run the following command:

tar -cf archive.tar file1 file2 file3

Here is the equivalent command using the long-form options:

tar --create --file=archive.tar file1 file2 file3

You can create archives from the contents of one or more directories or files. By default, directories are archived recursively unless the --no-recursion option is specified.

The following example will create an archive named user_backup.tar of the /home/user directory:

tar -cf user_backup.tar /home/user

Use the -v option if you want to see the files that are being processed.

Creating tar.gz Archive

Gzip is the most popular algorithm for compressing tar files. When compressing tar archives with gzip, the archive name should end with either tar.gz or tgz.

The -z option tells tar to compress the archive using the gzip algorithm as it is created. For example, to create a tar.gz archive from given files, you would run the following command:

tar -czf archive.tar.gz file1 file2

Listing tar Archives

When used with the --list (-t) option, the tar command lists the content of a tar archive without extracting it.

The command below will list the content of the archive.tar file:

tar -tf archive.tar
file1
file2
file3

Extracting tar Archive

Most of the archived files in Linux are archived and compressed using a tar or tar.gz format. Knowing how to extract these files from the command line is important.

To extract a tar archive, use the --extract (-x) option followed by the archive name:

tar -xf archive.tar

It is also common to add the -v option to print the names of the files being extracted.

tar -xvf archive.tar

Extracting tar Archive in a Different Directory

By default, tar extracts the archive contents into the current working directory. Use the --directory (-C) option to extract archive files into a specific directory.

For example, to extract the archive contents to the /opt/files directory, you can use:

tar -xf archive.tar -C /opt/files
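Putting the create, list and extract steps together, the following sketch runs a complete round trip in a scratch directory and checks that the restored files match the originals. The directory and file names are hypothetical; only short options are used, which should behave the same in GNU tar and in the tar shipped with macOS.

```shell
# Scratch area with a small hypothetical data directory.
work=$(mktemp -d)
cd "$work"
mkdir data restored
echo "alpha" > data/a.txt
echo "beta"  > data/b.txt

# Create a gzip-compressed archive of the data directory.
tar -czf data.tar.gz data

# List the archive members without extracting them.
tar -tzf data.tar.gz

# Extract into restored/ (the target directory must already exist).
tar -xzf data.tar.gz -C restored

# diff -r exits successfully only if both trees have identical content.
diff -r data restored/data && echo "round trip OK"
```

Checking an archive this way before deleting the original files is a cheap safeguard when backing up research data.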

Extracting Specific Files from a tar Archive

Sometimes instead of extracting the whole archive, you might need to extract only a few files from it.

To extract specific files from a tar archive, append a space-separated list of the file names to be extracted after the archive name:

tar -xf archive.tar file1 file2

When extracting files, you must provide their exact names, including the path, as printed by --list (-t).

Extracting one or more directories from an archive is the same as extracting files:

tar -xf archive.tar dir1 dir2

If you try to extract a file that doesn’t exist, an error message similar to the following will be displayed:

tar -xf archive.tar README
tar: README: Not found in archive
tar: Exiting with failure status due to previous errors
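The requirement to use exact member names is easiest to see with files stored inside a directory. In this sketch (all names are hypothetical), the archive is listed first, and one file is then extracted using the path exactly as the listing prints it.

```shell
# Scratch directory with a hypothetical results folder.
cd "$(mktemp -d)"
mkdir -p results/run1
echo "42"  > results/run1/summary.txt
echo "raw" > results/run1/raw.txt
tar -cf results.tar results

# Member names include the path under which they were archived.
tar -tf results.tar

# Remove the originals, then restore a single file by its full member name.
rm -r results
tar -xf results.tar results/run1/summary.txt
ls results/run1    # prints: summary.txt
```

Asking for summary.txt alone, without the results/run1/ prefix, would produce the "Not found in archive" error shown above.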

Extracting Files from a tar Archive using Wildcard

To extract files from an archive based on a wildcard pattern, use the --wildcards switch (a GNU tar option) and quote the pattern to prevent the shell from interpreting it.

For example, to extract files whose names end in .js (JavaScript files), you can use:

tar -xf archive.tar --wildcards '*.js'

Adding Files to Existing tar Archive

To add files or directories to an existing tar archive, use the --append (-r) operation.

For example, to add a file named newfile to archive.tar, you would run:

tar -rvf archive.tar newfile

Removing Files from a tar Archive

Use the --delete operation to remove files from an archive.

The following example shows how to remove the file file1 from archive.tar:

tar --delete -f archive.tar file1


Adapted from https://linuxize.com/post/how-to-create-and-extract-archives-using-the-tar-command-in-linux/

Python & Git

Like many programming languages, Python allows you to run scripts and even write code directly from the command line. Additionally, Python package managers like pip and conda are primarily used through a CLI.

NB! These commands are the same for every operating system, although on some systems the interpreter is invoked as python3 (and pip as pip3).

Python

Command | Action
python | Run Python in terminal
python <PYTHON_SCRIPT_NAME> | Run Python script
python --version | Check Python version
pip freeze | See list of installed packages
pip install | Install package
pip install -r <FILE_PATH> | Install all packages listed in a file (usually, requirements.txt)
pip uninstall | Delete package
pip show | See info about package
pip search | Search for package (if you don't remember the exact name). This command no longer works with PyPI, which has disabled the search interface it relied on; search on the PyPI website instead.

You can find more on working with pip in the official documentation.

If you have Anaconda, you can use its own package manager called conda (the analogue of pip). Here is a nice conda cheatsheet and documentation.

Git

Although GitHub now provides a desktop client called GitHub Desktop, you may still need to be able to interact with git through the command line. Here are some basic commands, and you can find more in the Git & GitHub section on this guide.

NB! These commands are the same for every operating system.

Command | Action
git init | Create a local repository
git clone | Clone a repository
git pull | Pull changes from a remote repository
git status | Check the status of local changes
git add <PATH> | Add selected files/folders to be tracked
git add * | Add all files to be tracked
git rm | Delete files
git commit -m "Commit message" | Commit changes; -m is for commit message
git push | Push local changes to remote
git log | Check log
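
To show how several of these commands fit together, here is a minimal sketch of a first commit in a brand-new local repository. It assumes git is installed; the file name, commit message and the name and email passed with -c are placeholders, supplied that way only so the example does not depend on your global git configuration.

```shell
# Create an empty repository in a scratch directory.
repo=$(mktemp -d)
cd "$repo"
git init -q

# Add a file and stage it.
echo "# Field notes" > README.md
git add README.md
git status --short

# Commit; -c sets the author identity for this one command (placeholder values).
git -c user.name="Example User" -c user.email="user@example.com" \
    commit -q -m "Add README"

# The log should now contain exactly one commit.
git log --oneline
commits=$(git log --oneline | wc -l)
```

In everyday use you would set your name and email once with git config instead of passing them on every commit.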