| Cognitive Domain | Test |
| --- | --- |
| Speed of processing | Brief Assessment of Cognition in Schizophrenia (BACS): Symbol-Coding; Category Fluency: Animal Naming; Trail Making Test: Part A |
| Attention/Vigilance | Continuous Performance Test—Identical Pairs (CPT-IP)* |
| Working memory (nonverbal; verbal) | Wechsler Memory Scale®—3rd Ed. (WMS®-III): Spatial Span; Letter–Number Span |
| Verbal learning | Hopkins Verbal Learning Test—Revised™ (HVLT-R™) |
| Visual learning | Brief Visuospatial Memory Test—Revised (BVMT-R™) |
| Reasoning and problem solving | Neuropsychological Assessment Battery® (NAB®): Mazes |
| Social cognition | Mayer-Salovey-Caruso Emotional Intelligence Test (MSCEIT™): Managing Emotions |
AFNI installation for Apple Silicon
Install Homebrew
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
Set up the Homebrew PATH
echo 'eval "$(/opt/homebrew/bin/brew shellenv)"' >> ~/.zprofile
eval "$(/opt/homebrew/bin/brew shellenv)"
Install essential packages
brew install python netpbm cmake gfortran libpng jpeg expat freetype fontconfig openmotif libomp gsl glib pkg-config gcc libiconv autoconf libxt mesa mesa-glu libxpm
Install XQuartz
brew install --cask xquartz
Add Python to PATH for zsh
export PATH=${PATH}:/opt/homebrew/opt/python/libexec/bin
echo 'export PATH=${PATH}:/opt/homebrew/opt/python/libexec/bin' >> ~/.zshrc
Install matplotlib via pip
pip install matplotlib
Install R
Download the R installer from https://cran.yu.ac.kr/bin/macosx/ and install it.
/usr/sbin/softwareupdate --install-rosetta
Install AFNI
curl -O https://raw.githubusercontent.com/afni/afni/master/src/other_builds/OS_notes.macos_12_b_user.tcsh
tcsh -xef OS_notes.macos_12_b_user.tcsh |& tee out.mac_12_b_user.txt
The Brain Imaging Data Structure (BIDS) standard
In the previous section, we pointed out that Nipype can be used to create reproducible analysis pipelines that can be applied across different datasets. This is true, in principle, but in practice, it also relies on one more idea: that of a data standard. This is because to be truly transferable, an analysis pipeline needs to know where to find the data and metadata that it uses for analysis. Thus, in this section, we will shift our focus to talking about how entire neuroimaging projects are (or should be) laid out. Until recently, datasets were usually organized in an idiosyncratic way. Each researcher had to decide on their own what data organization made sense to them. This made data sharing and reuse quite difficult because if you wanted to use someone else’s data, there was a very good chance you’d first have to spend a few days just figuring out what you were looking at and how you could go about reading the data into whatever environment you were comfortable with.
The BIDS specification
Fortunately, things have improved dramatically in recent years. Recognizing that working with neuroimaging data would be considerably easier if everyone adopted a common data representation standard, a group of (mostly) fMRI researchers convened in 2016 to create something now known as the Brain Imaging Data Structure, or BIDS. BIDS wasn’t the first data standard proposed in fMRI, but it has become by far the most widely adopted. Much of the success of BIDS can be traced to its simplicity: the standard deliberately insists not only on machine readability but also on human readability, which means that a machine can ingest a dataset and do all kinds of machine processing with it, but a human looking at the files can also make sense of the dataset, understanding what kinds of data were collected and what experiments were conducted. While there are some nuances and complexities to BIDS, the core of the specification consists of a relatively simple set of rules that a human with some domain knowledge can readily understand and implement.

We won’t spend much time describing the details of the BIDS specification in this book, as there’s already excellent documentation for that on the project’s website. Instead, we’ll just touch on a couple of core principles. The easiest way to understand what BIDS is about is to dive right into an example. Here’s a sample directory structure we’ve borrowed from the BIDS documentation. It shows a valid BIDS dataset that contains just a single subject:
project/
  sub-control01/
    anat/
      sub-control01_T1w.nii.gz
      sub-control01_T1w.json
      sub-control01_T2w.nii.gz
      sub-control01_T2w.json
    func/
      sub-control01_task-nback_bold.nii.gz
      sub-control01_task-nback_bold.json
      sub-control01_task-nback_events.tsv
      sub-control01_task-nback_physio.tsv.gz
      sub-control01_task-nback_physio.json
      sub-control01_task-nback_sbref.nii.gz
    dwi/
      sub-control01_dwi.nii.gz
      sub-control01_dwi.bval
      sub-control01_dwi.bvec
    fmap/
      sub-control01_phasediff.nii.gz
      sub-control01_phasediff.json
      sub-control01_magnitude1.nii.gz
    sub-control01_scans.tsv
  code/
    deface.py
  derivatives/
  README
  participants.tsv
  dataset_description.json
  CHANGES
There are two important points to note here. First, the BIDS specification imposes restrictions on how files are organized within a BIDS project directory. For example, every subject’s data goes inside a sub-[id] folder below the project root, where the sub- prefix is required and [id] is a researcher-selected string uniquely identifying that subject within the project ("control01" in the example). Similarly, inside each subject directory we find subdirectories containing data of different modalities: anat for anatomical images, func for functional images, dwi for diffusion-weighted images, and so on. When there are multiple data collection sessions for each subject, an extra level is introduced into the hierarchy, so that functional data from the first session acquired from subject control01 would be stored inside a folder like sub-control01/ses-01/func.
Second, valid BIDS files must follow particular naming conventions. The precise naming structure of each file depends on what kind of file it is, but the central idea is that a BIDS filename is always made up of (1) a sequence of key-value pairs, where each key is separated from its corresponding value by a dash and pairs are separated by underscores; (2) a "suffix" that directly precedes the file extension and describes the type of data contained in the file (this comes from a controlled vocabulary, meaning that it can only be one of a few accepted values, such as "bold" or "dwi"); and (3) an extension that defines the file format. For example, if we take a file like sub-control01/func/sub-control01_task-nback_bold.nii.gz and examine its constituent chunks, we can infer from the filename that the file is a Nifti image (.nii.gz extension) that contains BOLD fMRI data (bold suffix) for the task nback, acquired from subject control01.
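To make these naming rules concrete, here is a minimal Python sketch that splits a BIDS-style filename into its entities, suffix, and extension. It is purely illustrative (the helper name is ours); in practice you would let PyBIDS, introduced below, do this parsing for you:

```python
import os


def parse_bids_filename(path):
    """Split a BIDS-style filename into entities, suffix, and extension."""
    fname = os.path.basename(path)
    # Handle the common double extension first, then fall back to splitext.
    if fname.endswith(".nii.gz"):
        stem, extension = fname[: -len(".nii.gz")], ".nii.gz"
    else:
        stem, extension = os.path.splitext(fname)
    *pairs, suffix = stem.split("_")          # the last chunk is the suffix
    entities = dict(pair.split("-", 1) for pair in pairs)
    return entities, suffix, extension


print(parse_bids_filename("sub-control01/func/sub-control01_task-nback_bold.nii.gz"))
# ({'sub': 'control01', 'task': 'nback'}, 'bold', '.nii.gz')
```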
Besides these conventions, there are several other key elements of the BIDS specification. We won’t discuss them in detail, but it’s good to at least be aware of them:
- Every data file should be accompanied by a JSON "sidecar" containing metadata describing that file. For example, a BOLD data file might be accompanied by a sidecar file that describes acquisition parameters, such as the repetition time.
- BIDS follows an "inheritance" principle — meaning that JSON metadata files higher up in the hierarchy automatically apply to relevant files lower in the hierarchy unless explicitly overridden. For example, if all of the BOLD data in a single dataset was acquired using the same protocol, this metadata need not be replicated in each subject’s data folder.
- Every project is required to have a dataset_description.json file at the root level that contains basic information about the project (e.g., the name of the dataset and a description of its constituents, as well as citation information).
- BIDS doesn’t actively prohibit you from including non-BIDS-compliant files in a BIDS project, so you don’t have to throw out files that you can’t easily shoehorn into the BIDS format. The downside of including non-compliant files is just that most BIDS tools and/or human users won’t know what to do with them, so your dataset might not be quite as useful as it otherwise would be.
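To make the first and third of these points concrete, the following Python snippet writes a minimal dataset_description.json and a BOLD sidecar for the example dataset above. The field values are placeholders chosen for illustration; consult the BIDS specification for the full list of required and recommended fields:

```python
import json

# Minimal project-level metadata; "Name" and "BIDSVersion" are the essential fields.
dataset_description = {
    "Name": "Example n-back study",   # placeholder dataset name
    "BIDSVersion": "1.8.0",           # version of the specification you follow
}

# A sidecar for the BOLD run, holding acquisition metadata (illustrative values).
bold_sidecar = {
    "TaskName": "nback",
    "RepetitionTime": 2.0,            # in seconds
}

with open("dataset_description.json", "w") as f:
    json.dump(dataset_description, f, indent=2)

with open("sub-control01_task-nback_bold.json", "w") as f:
    json.dump(bold_sidecar, f, indent=2)
```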
BIDS Derivatives
The BIDS specification was originally created with static representations of neuroimaging datasets in mind. But it quickly became clear that it would also be beneficial for the standard to handle derivatives of datasets — that is, new BIDS datasets generated by applying some transformation to one or more existing BIDS datasets. For example, suppose we have a BIDS dataset containing raw fMRI images. Typically, we’ll want to preprocess our images (for example, to remove artifacts, apply motion correction, temporally filter the signal, etc.) before submitting them to analysis. It’s great if our preprocessing pipeline can take BIDS datasets as inputs, but what should it then do with the output? A naive approach would be to just construct a new BIDS dataset that’s very similar to the original one, but replace the original (raw) fMRI images with new (preprocessed) ones. But that’s likely to cause confusion: a user could easily end up with many different versions of the same BIDS dataset, yet have no formal way to determine the relationship between them. To address this problem, the BIDS-Derivatives extension introduces some additional metadata and file naming conventions that make it easier to chain BIDS-aware tools (see the next section) without chaos taking hold.
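For instance, preprocessed outputs for the example subject above might be stored in a derivatives dataset that sits alongside the raw data. The layout below is only illustrative; the exact folder names and filename entities depend on the tool that generated the derivatives (here we sketch an fMRIPrep-style output):

derivatives/
  fmriprep/
    dataset_description.json
    sub-control01/
      func/
        sub-control01_task-nback_desc-preproc_bold.nii.gz
        sub-control01_task-nback_desc-confounds_timeseries.tsv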
The BIDS Ecosystem
At this point, you might be wondering: what is BIDS good for? Surely the point of introducing a new data standard isn’t just to inconvenience people by forcing them to spend their time organizing their data a certain way? There must be some benefits to individual researchers — and ideally, the community as a whole — spending precious time making datasets and workflows BIDS-compliant, right? Well, yes, there are! The benefits of buying into the BIDS ecosystem are quite substantial. Let’s look at a few.
Easier data sharing and reuse
One obvious benefit we alluded to above is that sharing and reusing neuroimaging data becomes much easier once many people agree to organize their data the same way. As a trivial example, once you know that BIDS organizes data according to a fixed hierarchy (i.e., subject → session → run), it’s easy to understand other people’s datasets. There’s no chance of finding time-course images belonging to subject 1 in, say, /imaging/old/NbackTask/rawData/niftis/controlgroup/1/
. But the benefits of BIDS for sharing and reuse come into full view once we consider the impact on public data repositories. While neuroimaging repositories have been around for a long time (for an early review, see {cite}van2001functional
), their utility was long hampered by the prevalence of idiosyncratic file formats and project organizations. By supporting the BIDS standard, data repositories open the door to a wide range of powerful capabilities.
To illustrate, consider OpenNeuro — currently the largest and most widely-used repository of brain MRI data. OpenNeuro requires uploaded datasets to be in BIDS format (though datasets do not have to be fully compliant). As a result, the platform can automatically extract, display, and index important metadata. For example, the number of subjects, sessions, and runs in each dataset; the data modalities and experimental tasks present; a standardized description of the dataset; and so on. Integration with free analysis platforms like BrainLife is possible, as is structured querying over datasets via OpenNeuro’s GraphQL API endpoint.
Perhaps most importantly, the incremental effort required by users to make their BIDS-compliant datasets publicly available and immediately usable by others is minimal: in most cases, users have only to click an Upload button and locate the project they wish to share (there is also a command-line interface, for users who prefer to interact with OpenNeuro programmatically).
BIDS-Apps
A second benefit to representing neuroimaging data in BIDS is that one immediately gains access to a large, and rapidly growing, ecosystem of BIDS-compatible tools. If you’ve used different neuroimaging tools in your research — for example, perhaps you’ve tried out both FSL and SPM (the two most widely used fMRI data analysis suites) — you’ll have probably had to do some work to get your data into a somewhat different format for each tool. In a world without standards, tool developers can’t be reasonably expected to know how to read your particular dataset, so the onus falls on you to get your data into a compatible format. In the worst case, this means that every time you want to use a new tool, you have to do some more work.
By contrast, for tools that natively support BIDS, life is simpler. Once we know that fMRIPrep — a very popular preprocessing pipeline for fMRI data {cite}Esteban2019-md
— takes valid BIDS datasets as inputs, the odds are very high that we’ll be able to apply fMRIPrep to our own valid BIDS datasets with little or no additional work. To facilitate the development and use of these kinds of tools, BIDS developed a lightweight standard for “BIDS Apps” {cite}Gorgolewski2017-mb
. A BIDS App is an application that takes one or more BIDS datasets as input. There is no restriction on what a BIDS App can do, or what it’s allowed to output (though many BIDS Apps output BIDS-Derivatives datasets); the only requirement is that a BIDS App is containerized (using Docker or Singularity; see {numref}docker
), and accepts valid BIDS datasets as input. New BIDS Apps are continuously being developed, and as of this writing, the BIDS Apps website lists a few dozen apps.
What’s particularly nice about this approach is that it doesn’t necessarily require the developers of existing tools to do a lot of extra work themselves to support BIDS. In principle, anyone can write a BIDS-App “wrapper” that mediates between the BIDS format and whatever format a tool natively expects to receive data in. So, for example, the BIDS-Apps registry already contains BIDS-Apps for packages or pipelines like SPM, CPAC, Freesurfer, and the Human Connectome Project Pipelines. Some of these apps are still fairly rudimentary and don’t cover all of the functionality provided by the original tools, but others support much or most of the respective native tool’s functionality. And of course, many BIDS-Apps aren’t wrappers around other tools at all; they’re entirely new tools designed from the very beginning to support only BIDS datasets. We’ve already mentioned fMRIPrep, which has very quickly become arguably the de facto preprocessing pipeline in fMRI; another widely-used BIDS-App is MRIQC {cite}Esteban2017-tu
, a tool for automated quality control and quality assessment of structural and functional MRI scans, which we will see in action in {numref}nibabel
. Although the BIDS-Apps ecosystem is still in its infancy, the latter two tools already represent something close to killer applications for many researchers.
To demonstrate this statement, consider how easy it is to run fMRIPrep once your data is organized in the BIDS format. After installing the software and its dependencies, running it is as simple as issuing this command in the terminal:
fmriprep data/bids_root/ out/ participant -w work/
where data/bids_root points to a directory that contains a BIDS-organized dataset that includes fMRI data, out points to the directory into which the outputs (the BIDS derivatives) will be saved, and work is a directory that will store some of the intermediate products that will be generated along the way. Looking at this, it might not be immediately apparent how important BIDS is for this to be so simple, but consider what the software would need to do to find all of the fMRI data inside a complex dataset of raw MRI data that might contain other data types, other files, and so forth. Consider also the complexity that arises from the fact that fMRI data can be collected using many different acquisition protocols, and the fact that fMRI processing sometimes uses other information (for example, measurements of the field map, or anatomical T1-weighted or T2-weighted scans). The fact that the data complies with BIDS allows fMRIPrep to locate everything that it needs within the dataset and to make use of all the available information to perform the preprocessing to the best of its ability given the provided data.
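If you only want to preprocess a subset of participants, or want to fan subjects out across a cluster, you can build the same command programmatically. The sketch below loops over a few participant labels and invokes fMRIPrep once per subject via its --participant-label option; the paths and labels are placeholders, so adjust them to your own dataset and installation:

```python
import subprocess

bids_root = "data/bids_root"   # placeholder paths
out_dir = "out"
work_dir = "work"

# Placeholder participant labels; in practice you might query them with PyBIDS.
for label in ["01", "02", "03"]:
    cmd = [
        "fmriprep", bids_root, out_dir, "participant",
        "--participant-label", label,
        "-w", work_dir,
    ]
    print("Running:", " ".join(cmd))
    subprocess.run(cmd, check=True)   # one fMRIPrep job per subject
```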
Utility libraries
Lastly, the widespread adoption of BIDS has also spawned a large number of utility libraries designed to help developers (rather than end users) build their analysis pipelines and tools more efficiently. Suppose I’m writing a script to automate my lab’s typical fMRI analysis workflow. It’s a safe bet that, at multiple points in my script, I’ll need to interact with the input datasets in fairly stereotyped and repetitive ways. For instance, I might need to search through the project directory for all files containing information about event timing, but only for a particular experimental task. Or, I might need to extract some metadata containing key parameters for each time-series image I want to analyze (e.g., the repetition time, or TR). Such tasks are usually not very complicated, but they’re tedious and can slow down development considerably. Worse, at a community level, they introduce massive inefficiency, because each person working on their analysis script ends up writing their own code to solve what are usually very similar problems.
A good utility library abstracts away a lot of this kind of low-level work and allows researchers and analysts to focus most of their attention on high-level objectives. By standardizing the representation of neuroimaging data, BIDS makes it much easier to write good libraries of this sort. Probably the best example so far is a package called PyBIDS, which provides a convenient Python interface for basic querying and manipulation of BIDS datasets. To give you a sense of how much work PyBIDS can save you when you’re writing neuroimaging analysis code in Python, let’s take a look at some of the things the package can do.
We start by importing an object called BIDSLayout
, which we will use to manage and query the layout of files on disk. We also import a function that knows how to locate some test data that was installed on our computer together with the PyBIDS software library.
from bids import BIDSLayout
from bids.tests import get_test_data_path
One of the datasets that we have within the test data path is dataset number 5 (ds005) from OpenNeuro. Note that the software has not actually installed a bunch of neuroimaging data onto our hard drive — that would be too large! Instead, it installed a set of files that have the right names and are organized in the right way, but are mostly empty. This allows us to demonstrate how the software works, but don’t try reading the neuroimaging data from any of the files in that directory. We’ll work with a more manageable BIDS dataset, including the files in it, in {numref}nibabel.
For now, we initialize a BIDSLayout
object by pointing to the location of the dataset on our disk. When we do that, the software scans through that part of our filesystem, validates that it is a properly organized BIDS dataset, and finds all of the files that are arranged according to the specification. This allows the object to infer some things about the dataset right away. For example, the dataset has 16 subjects and 48 total runs (here, a “run” is an individual fMRI scan). The person who organized this dataset decided not to include a session folder for each subject, presumably because each subject participated in just one session in this experiment, so that information is not useful.
layout = BIDSLayout(get_test_data_path() + "/ds005")
print(layout)
BIDS Layout: ...packages/bids/tests/data/ds005 | Subjects: 16 | Sessions: 0 | Runs: 48
The layout
object now has a method called get()
, which we can use to gain access to various parts of the dataset. For example, we can ask it to give us a list of the filenames of all of the anatomical ("T1w") scans that were collected for subjects sub-01 and sub-02:
layout.get(subject=['01', '02'], suffix="T1w", return_type='filename')
['/home/runner/.local/lib/python3.10/site-packages/bids/tests/data/ds005/sub-01/anat/sub-01_T1w.nii.gz',
'/home/runner/.local/lib/python3.10/site-packages/bids/tests/data/ds005/sub-02/anat/sub-02_T1w.nii.gz']
Or, using a slightly different logic, all of the functional ("bold") scans collected for subject sub-03:
layout.get(subject='03', suffix="bold", return_type='filename')
['/home/runner/.local/lib/python3.10/site-packages/bids/tests/data/ds005/sub-03/func/sub-03_task-mixedgamblestask_run-01_bold.nii.gz',
'/home/runner/.local/lib/python3.10/site-packages/bids/tests/data/ds005/sub-03/func/sub-03_task-mixedgamblestask_run-02_bold.nii.gz',
'/home/runner/.local/lib/python3.10/site-packages/bids/tests/data/ds005/sub-03/func/sub-03_task-mixedgamblestask_run-03_bold.nii.gz']
In these examples, we asked the BIDSLayout
object to give us the 'filename'
return type. This is because if we don’t explicitly ask for a return type, we will get back a list of BIDSImageFile
objects. For example, selecting the first one of these for sub-03
’s fMRI scans:
bids_files = layout.get(subject="03", suffix="bold")
bids_image = bids_files[0]
This object is quite useful, of course. For example, it knows how to parse the file name into meaningful entities, using the get_entities()
method, which returns a dictionary with entities such as subject
and task
that can be used to keep track of things in analysis scripts.
bids_image.get_entities()
{'datatype': 'func',
'extension': '.nii.gz',
'run': 1,
'subject': '03',
'suffix': 'bold',
'task': 'mixedgamblestask'}
In most cases, you can also get direct access to the imaging data using the BIDSImageFile
object. This object has a get_image
method, which would usually return a nibabel Nifti1Image
object. As you will see in {numref}nibabel
this object lets you extract metadata, or even read the data from a file into memory as a Numpy array. However, in this case, calling the get_image
method would raise an error because, as we mentioned above, the files do not contain any data. So, let’s look at another kind of file that you can read directly in this case. In addition to the neuroimaging data, BIDS provides instructions on how to organize files that record the behavioral events that occurred during an experiment. These are stored as tab-separated values (.tsv) files, and there is one for each run in the experiment. For example, for this dataset, we can query for the events that happened during subject sub-03’s third run:
events = layout.get(subject='03', extension=".tsv", task="mixedgamblestask", run="03")
tsv_file = events[0]
print(tsv_file)
<BIDSDataFile filename='/home/runner/.local/lib/python3.10/site-packages/bids/tests/data/ds005/sub-03/func/sub-03_task-mixedgamblestask_run-03_events.tsv'>
Instead of a BIDSImageFile
, the variable tsv_file
is now a BIDSDataFile
object, and this kind of object has a get_df
method, which returns a Pandas DataFrame
object:
bids_df = tsv_file.get_df()
bids_df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 85 entries, 0 to 84
Data columns (total 12 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 onset 85 non-null float64
1 duration 85 non-null int64
2 trial_type 85 non-null object
3 distance from indifference 0 non-null float64
4 parametric gain 85 non-null float64
5 parametric loss 0 non-null float64
6 gain 85 non-null int64
7 loss 85 non-null int64
8 PTval 85 non-null float64
9 respnum 85 non-null int64
10 respcat 85 non-null int64
11 RT 85 non-null float64
dtypes: float64(6), int64(5), object(1)
memory usage: 8.1+ KB
This kind of functionality is useful if you are planning to automate your analysis over large datasets that can include heterogeneous acquisitions between subjects and within subjects. At the very least, we hope that the examples have conveyed to you the power inherent in organizing your data according to a standard, as a starting point to use, and maybe also develop, analysis pipelines that expect data in this format. We will see more examples of this in practice in the next section.
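As one more taste of that, here is the kind of query you might run when automating across a dataset: listing the subjects and tasks, and pulling the repetition time from the metadata associated with each BOLD run. This assumes a dataset that ships the usual JSON sidecars (on the mostly empty test data used above, some of these values may be missing):

```python
# Assumes `layout` is the BIDSLayout object created earlier.
print(layout.get_subjects())    # e.g., ['01', '02', ..., '16']
print(layout.get_tasks())       # e.g., ['mixedgamblestask']

# Loop over BOLD runs and read each file's metadata; thanks to the BIDS
# inheritance principle, sidecars higher up in the hierarchy also apply.
for bold_file in layout.get(suffix="bold", extension=".nii.gz"):
    metadata = bold_file.get_metadata()
    print(bold_file.filename, metadata.get("RepetitionTime"))
```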
Exercises
BIDS has a set of example datasets available in a GitHub repository at https://github.com/bids-standard/bids-examples. Clone the repository and use pyBIDS to explore the dataset called “ds011”. Using only pyBIDS code, can you figure out how many subjects participated in this study? What are the values of TR that were used in fMRI acquisitions in this dataset?
Additional resources
There are many places to learn more about the NiPy community. The documentation of each of the software libraries that were mentioned here includes fully worked-out examples of data analysis, in addition to links to tutorials, videos, and so forth. For Nipype, in particular, we recommend Michael Notter’s Nipype tutorial as a good starting point.
BIDS is not only a data standard but also a community of developers and users that support the use and further development of the standard. For example, over the time since the standard was first proposed, the community has added instructions to support the organization and sharing of new data modalities (e.g., intracranial EEG) and derivatives of processing. The strength of such a community is that, like open-source software, it draws on the individual strengths of many individuals and many groups who are willing to spend time and effort evolving and improving it. One of the resources that the community has developed to help people who are new to the standard learn more about it is the BIDS Starter Kit. It is a website that includes materials (videos, tutorials, explanations, and examples) to help you get started learning about and eventually using the BIDS standard.
In addition to these relatively static resources, users of both NiPy software and BIDS can interact with other members of the community through a dedicated questions and answers website called Neurostars. On this website, anyone can ask questions about neuroscience software and get answers from experts who frequent the website. In particular, the developers of many of the projects that we mentioned in this chapter, and many of the people who work on the BIDS standard often answer questions about these projects through this website.
Reference
Rokem A, Yarkoni T. Data Science for Neuroimaging: An Introduction. 2023.
Version Control
Another set of tools that gives us more control over our computational environment is the set of tools that track changes in our software and analysis. If this seems like a boring or trivial task to you, consider the changes that you need to make, over the course of a long project, to a program that you create to analyze your data. Also, consider what happens when you work with others on the same project, and multiple different people can introduce changes to the code that you are using. If you have ever tried to do either of these things, you may have already invented your own way to track changes to your programs. For example, naming the files with the date you made the most recent changes or adding the name of the author who made the most recent change to the filename. This is a form of version control — but it’s often idiosyncratic and error-prone. Instead, software tools for version control allow us to explicitly record a particular set of changes to the code that we are writing as we go along, keeping track for us of who made which changes and when. They also allow us to merge changes that other people have made. If needed, they mark where changes we have made conflict with changes that others have made, and allow us to resolve these conflicts before moving on.
Getting started with git
One of the most widely used tools for version control is called “git”. Without going too much into the differences between git and other alternatives, we’ll just mention here that one of the reasons git is so widely used is the availability of services that let you use your version control to share your code with others through the web and to collaborate with them on the same code base fairly seamlessly. A couple of the most popular services are GitHub and GitLab. After we introduce git and go over some of the basic mechanics of using it, we will demonstrate how you can use one of these services to collaborate with others and to share your code openly with anyone. This is an increasingly common practice, and one that we will talk about more in sharing.
Git is a command-line application. Like the ls
, cd
, and pwd
unix commands that you saw in unix
, we use it in a shell application.
In contrast to the other shell commands you saw before, git can do many different things. To do that, git has sub-commands. For example, before you start using git, you need to configure it to recognize who you are, using the `git config` sub-command. We need to tell git both our name and our email address:
$ git config --global user.name "Ariel Rokem"
$ git config --global user.email arokem@gmail.com
This configuration step only needs to be done once on every computer on which you use git, and is stored in your home directory (in a file called ~/.gitconfig
).
Exercise
Download and install git and configure it with your name and email address.
The next sub-command that we will usually use with git is the one that you will start your work with from now on — the sub-command to initialize a repository. A repository is a folder within your filesystem that will be tracked as one unit. One way to think about that is that your different projects can be organized into different folders, and each one of them should probably be tracked separately. So, one project folder becomes one git repository (also often referred to as a “repo” for short). Before starting to use git, we then make a directory in which our project code will go.
$ mkdir my_project
$ cd my_project
The unix mkdir
command creates a new directory called (in this case) my_project
and the cd
command changes our working directory so that we are now working within the new directory that we have created. And here comes the first git sub-command, to initialize a repository:
$ git init
Initialized empty Git repository in /Users/arokem/projects/my_project/.git/
As you can see, when this sub-command is issued, git reports back that the empty repository has been created. We can start adding files to the repository by creating new files and telling git that these are files that we would like to track.
$ touch my_file.txt
$ git add my_file.txt
We’ve already seen the touch
bash command before — it creates a new empty file. Here, we’ve created a text file, but this can also be (more typically) a file that contains a program written in some programming language. The git add
command tells git that this is a file that we would like to track — we would like git to record any changes that we make in this file. A useful command that we will issue early and often is the git status
command. This command reports to you about the current state of the repository that you are working in, without making any changes to its content or state.
$ git status
On branch master

No commits yet

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)
	new file:   my_file.txt
Here it is telling us a few things about the repository and our work within it. Let’s break this message down line by line. First of all, it tells us that we are working on a branch called master
. Branches are a very useful feature of git. They can be thought of as different versions of the same repository that you are storing side by side. The default when a repository is initialized is to have only one branch called master
. However, in recent years, git users have raised objections to the use of this word to designate the default branch, together with objections to other uses of “master”/“slave” designations in a variety of computer science and engineering (e.g., in distributed computing, in databases, etc.; to read more about this, see “Additional resources” at the end of this section). Therefore, we will immediately change the name of the branch to “main”.
$ git branch -M main
In general, it is a good practice to use not the default branch (now called main
), but a different branch for all of your new development work. The benefit of this approach is that as you are working on your project, you can always come back to a “clean” version of your work that is stored on your main
branch. It is only when you are sure that you are ready to incorporate some new development work that you merge a new development branch into your main branch. The next line in the status message tells us that we have not made any commits to the history of this repository yet. Finally, the last three lines tell us that there are changes that we could commit and make part of the history of the repository. This is the new file that we just added, which is currently stored in something called the “staging area”. This is a part of the repository into which we have put files and changes that we might put into the history of the repository next, but we can also “unstage” them so that they are not included in the history, either because we don’t want to include them yet or because we don’t want to include them at all. In parentheses, git also gives us a hint about how we might remove the file from the staging area — that is, how we could “unstage” it. But for now, let’s assume that we do want to store the addition of this file in the history of the repository and track additional changes. Storing changes in the git repository is called making a commit, and the next thing we do here is to issue the git commit
command.
$ git commit -m "Adds a new file to the repository" [main (root-commit) 1808a80] Adds a new file to the repository 1 file changed, 0 insertions(+), 0 deletions(-) create mode 100644 my_file.txt
As you can see, the commit
command uses the -m
flag. The text in quotes after this flag is the commit message. We are recording a history of changes in the files that we are storing in our repository, and alongside the changes, we are storing a log of these messages that give us a sense of who made these changes, when they made them, and — stored in these commit messages — what the intention of these changes was. Git also provides some additional information: a new file was created; the mode of the file is 644, which is the mode for files that can be read by anyone and written only by their owner (for more about file modes and permissions, you can refer to the man page of the chmod
unix command). There were 0 lines inserted and 0 lines deleted because the file is empty and doesn’t have any lines in it yet.
We can revisit the history of the repository using the git log
command.
$ git log
This should drop you into a new buffer within the shell that looks something like this:
commit 1808a8053803150c8a022b844a3257ae192f413c (HEAD -> main)
Author: Ariel Rokem <arokem@gmail.com>
Date:   Fri Dec 31 11:29:31 2021 -0800

    Adds a new file to the repository
It tells you who made the change (Ariel, in this case), how you might reach that person (yes, that’s Ariel’s email address; git knows it because of the configuration step we did when we first started using git), and when the change was made (almost noon on the last day of 2021, as it so happens). In the body of that entry, you can see the message that we entered when we issued the git commit
command. At the top of the entry, we also have a line that tells us that this is the commit identified through a string of letters and numbers that starts with “1808a”. This string is called the “SHA” of the commit, which stands for “Secure Hash Algorithm”. Git uses this clever algorithm to encode all of the information that was included in the commit (in this case, the name of the new file, and the fact that it is empty) and to hash this information into the string of letters and numbers that you see. The advantage of this approach is that this is an identifier that is unique within this repository because this specific change can only be made once. This will allow us to look up specific changes that were made within particular commits, and also allows us to restore the repository, or parts of the repository, back to the state they had when this commit was made. This is a particularly compelling aspect of version control: you can effectively travel back in time to the day and time that you made a particular change in your project and visit the code as it was on that day and time. This also relates to one last detail in this log message: next to the SHA identifier, in parentheses, is the message “
(HEAD -> main)
“. The word “HEAD
” always refers to the state of the repository in the commit that you are currently viewing. Currently, HEAD — the current state of the repository — is aligned with the state of the main
branch, which is why it is pointing to main
. As we continue to work, we’ll be moving HEAD
around to different states of the repository. We’ll see the mechanics of that in a little bit. But first, let’s leave this log buffer — by typing the q
key.
Let’s check what git thinks of all this.
$ git status
On branch main
nothing to commit, working tree clean
We’re still on the main branch. But now git tells us that there is nothing to commit and that our working tree is clean. This is a way of saying that nothing has changed since the last time we made a commit. When using git for your own work, this is probably the state in which you would like to leave it at the end of a working session. To summarize so far: we haven’t done much yet, but we’ve initialized a repository, and added a file to it. Along the way, you’ve seen how to check the status of the repository, how to commit changes to the history of the repository, and how to examine the log that records the history of the repository. As you will see next, at the most basic level of operating with git, we go through cycles of changes and additions to the files in the repository and commits that record these changes, both operations that you have already seen. Overall, we will discuss three different levels of intricacy. The first level, which we will start discussing next, is about tracking changes that you make to a repository and using the history of the repository to recover different states of your project. At the second level, we will dive a little bit more into the use of branches to make changes and incorporate them into your project’s main branch. The third level is about collaborating with others on a project.
Working with git at the first level: tracking changes that you make
Continuing the project we started above, we might make some edits to the file that we added. The file that we created, my_file.txt
, is a text file and we can open it for editing in many different applications (we discuss text editors — used to edit code — in {numref}python-env
). We can also use the unix echo
command to add some text to the file from the command line.
$ echo "a first line of text" >> my_file.txt
Here, the >>
operator is used to redirect the output of the echo
command — you can check what it does by reading the man page — into our text file. Note that there are two >
symbols in that command, which indicates that we would like to concatenate the string of characters to the end of the file. If we used only one, we would overwrite the contents of the file entirely. For now, that’s not too important because there wasn’t anything in the file, to begin with, but it will be important in what follows.
We’ve made some changes to the file. Let’s see what git says.
$ git status
On branch main
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
	modified:   my_file.txt

no changes added to commit (use "git add" and/or "git commit -a")
Let’s break that down again. We’re still on the main
branch. But we now have some changes in the file, which are indicated by the fact that there are changes not staged for a commit in our text file. Git provides two hints, in parentheses, as to different things we might do next, either to move these changes into the staging area or to discard these changes and return to the state of affairs as it was when we made our previous commit.
Let’s assume that we’d like to retain these changes. That means that we’d like to first move these changes into the staging area. This is done using the git add
sub-command.
$ git add my_file.txt
Let’s see how this changes the message that git status
produces.
$ git status
On branch main
Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
	modified:   my_file.txt
The file in question has moved from “changes not staged for commit” to “changes to be committed”. That is, it has been moved into the staging area. As suggested in the parentheses, it can be unstaged, or we can continue as we did before and commit these changes.
$ git commit -m"Adds text to the file" [main 42bab79] Adds text to the file 1 file changed, 1 insertion(+)
Again, git provides some information: the branch into which this commit was added (main
) and the fact that one file changed, with one line of insertion. We can look at the log again to see what has been recorded:
$ git log
Which should look something like the following, with the most recent commit at the top.
commit 42bab7959c3d3c0bce9f753abf76e097bab0d4a8 (HEAD -> main)
Author: Ariel Rokem <arokem@gmail.com>
Date:   Fri Dec 31 20:00:05 2021 -0800

    Adds text to the file

commit 1808a8053803150c8a022b844a3257ae192f413c
Author: Ariel Rokem <arokem@gmail.com>
Date:   Fri Dec 31 11:29:31 2021 -0800

    Adds a new file to the repository
Again, we can move out of this buffer by pressing the q
key. We can also check the status of things again, confirming that there are no more changes to be recorded.
$ git status
On branch main
nothing to commit, working tree clean
This is the basic cycle of using git: make changes to the files, use git add
to add them to the staging area, and then git
commit to add the staged changes to the history of the project, with a record made in the log. Let’s do this one more time, and see a couple more things that you can do along the way, as you are making changes. For example, let’s add another line of text to that file.
$ echo "another line of text"
We can continue in the cycle, with a git add
and then git commit
right here, but let’s pause to introduce one more thing that git can do, which is to tell you exactly what has changed in your repository since your last commit. This is done using the git diff
command. If you issue that command:
$ git diff
you should be again dropped into a different buffer that looks something like this:
diff --git a/my_file.txt b/my_file.txt
index 2de149c..a344dc0 100644
--- a/my_file.txt
+++ b/my_file.txt
@@ -1 +1,2 @@
 a first line of text
+another line of text
This buffer contains the information that tells you about the changes in the repository. The first few lines of this information tell you a bit about where these changes occurred and the nature of these changes. The last two lines tell you exactly the content of these changes: the addition of one line of text (indicated by a “+” sign). If you had already added other files to the repository and made changes across multiple different files (this is pretty typical as your project grows and the changes that you introduce during your work become more complex), different “chunks” that indicate changes in a different file would be delineated by headers that look like the fifth line in this output, starting with the “@@” symbols. It is not necessary to run git diff if you know exactly what changes you are staging and committing, but it provides a beneficial look into these changes in cases where the changes are complex. In particular, examining the output of git diff can give you a hint about what you might write in your commit message. This would be a good point to make a slight aside about these messages and to try to impress upon you that it is a great idea to write commit messages that are clear and informative. This is because as your project changes and the log of your repository grows longer, these messages will serve you (and your collaborators, if more than one person is working with you on the same repository; we’ll get to that in a little bit) as guide-posts to find particular sets of changes that you made along the way. Think of these as messages that you are sending right now, with your memory of the changes you made to the project fresh in your mind, to yourself six months from now, when you will no longer remember exactly what changes you made and why.
At any rate, committing these changes can now look as simple as:
$ git add my_file.txt
$ git commit -m "Adds a second line of text"
Which would now increase your log to three different entries:
commit f1befd701b2fc09a52005156b333eac79a826a07 (HEAD -> main)
Author: Ariel Rokem <arokem@gmail.com>
Date:   Fri Dec 31 20:26:24 2021 -0800

    Adds a second line of text

commit 42bab7959c3d3c0bce9f753abf76e097bab0d4a8
Author: Ariel Rokem <arokem@gmail.com>
Date:   Fri Dec 31 20:00:05 2021 -0800

    Adds text to the file

commit 1808a8053803150c8a022b844a3257ae192f413c
Author: Ariel Rokem <arokem@gmail.com>
Date:   Fri Dec 31 11:29:31 2021 -0800

    Adds a new file to the repository
As you have already seen, this log serves as a record of the changes you made to files in your project. In addition, it can also serve as an entry point to undoing these changes or to see the state of the files as they were at a particular point in time. For example, let’s say that you would like to go back to the state that the file had before the addition of the second line of text to that file. To do so, you would use the SHA identifier of the commit that precedes that change and the git checkout
command.
$ git checkout 42bab my_file.txt
Updated 1 path from f9c56a9
Now, you can open the my_file.txt file and see that the contents of this file have been changed to remove the second line of text. Notice also that we didn’t have to type the entire 40-character SHA identifier. That is because the first five characters uniquely identify this commit within this repository and git can find that commit with just that information. To re-apply the change, you can use the same checkout
command, but use the SHA of the later commit instead.
$ git checkout f1bef my_file.txt
Updated 1 path from e798023
One way to think of the checkout
command is that as you are working on your project git is creating a big library that contains all of the different possible states of your filesystem. When you want to change your filesystem to match one of these other states, you can go to this library and check it out from there. Another way to think about this is that we are pointing HEAD
to different commits and thereby changing the state of the filesystem. This capability becomes increasingly useful when we combine it with the use of branches, which we will see next.
Working with git at the second level: branching and merging
As we mentioned before, branches are different states of your project that can exist side by side. One of the dilemmas that we face when we are working on data analysis projects for a duration of time is that we would like to be able to rapidly make experiments and changes to the code, but we often also want to be able to switch over to a state of the code that “just works”. This is exactly the way that branches work: using branches allows you to make rapid changes to your project without having to worry that you will not be able to easily recover a more stable state of the project. Let’s see this in action in the minimal project that we started working with. To create a new branch, we use the git branch command. To start working on this branch, we check it out from the repository (the library) using the git checkout command.
$ git branch feature_x
$ git checkout feature_x
$ echo "this is a line with feature x" >> my_file.txt $ git add my_file.txt $ git commit -m"Adds feature x to the file"
Examining the git log
you will see that there is an additional entry in the history. For brevity, we show here just the two top (most recent) entries, but the other entries will also be in the log (remember that you can leave the log buffer by pressing the q
key).
commit ab2c28e5c08ca80c9d9fa2abab5d7501147851e1 (HEAD -> feature_x)
Author: Ariel Rokem <arokem@gmail.com>
Date:   Sat Jan 1 13:27:48 2022 -0800

    Adds feature x to the file

commit f1befd701b2fc09a52005156b333eac79a826a07 (main)
Author: Ariel Rokem <arokem@gmail.com>
Date:   Fri Dec 31 20:26:24 2021 -0800

    Adds a second line of text
You can also see that HEAD
— the current state of the repository — is pointing to the feature_x
branch. We also include the second entry in the log, so that you can see that the main
branch is still in the state that it was before. We can further verify that by checking out the main
branch.
$ git checkout main
If you open the file now, you will see that the feature x line is nowhere to be seen and if you look at the log, you will see that this entry is not in the log for this branch and that HEAD
is now pointing to main
. But — and this is part of what makes branches so useful — it is very easy to switch back and forth between any two branches. Another git checkout feature_x
would bring HEAD
back to the feature_x
branch and add back that additional line to the file. In each of these branches, you can continue to make changes and commit these changes without affecting the other branch. The history of the two branches has forked in two different directions. Eventually, if the changes that you made on the feature_x
branch are useful, you will want to combine the history of these two branches. This is done using the git merge
command, which asks git to bring all of the commits from one branch into the history of another branch. The syntax of this command assumes that HEAD
is pointing to the branch into which you would like to merge the changes, so if we are merging feature_x
into main
we would issue one more git checkout main
before issuing the merge command.
$ git checkout main
$ git merge feature_x
Updating f1befd7..ab2c28e
Fast-forward
 my_file.txt | 1 +
 1 file changed, 1 insertion(+)
The message from git indicates to us that the main
branch has been updated — it has been “fast-forwarded”. It also tells us that this update pulled in changes in one file, with the addition of one line of insertion. A call to git log
will show us that the most recent commit (the one introducing the “feature x” line) is now in the history of the main
branch and HEAD
is pointing to both main
and feature_x
.
commit ab2c28e5c08ca80c9d9fa2abab5d7501147851e1 (HEAD -> main, feature_x)
Author: Ariel Rokem <arokem@gmail.com>
Date:   Sat Jan 1 13:27:48 2022 -0800

    Adds feature x to the file
This is possible because both of these branches now have the same history. If we’d like to continue working, we can remove the feature_x
branch and keep going.
$ git branch -d feature_x
Using branches in this way allows you to rapidly make changes to your projects, while always being sure that you can switch the state of the repository back to a known working state. We recommend that you keep your main
branch in this known working state and only merge changes from other branches when these changes have matured sufficiently. One way that is commonly used to determine whether a branch is ready to be merged into main
is by asking a collaborator to look over these changes and review them in detail. This brings us to the most elaborate level of using git — in collaboration with others.
Working with git at the third level: collaborating with others
So far, we’ve seen how git can be used to track your work on a project. This is certainly useful when you do it on your own, but git really starts to shine when you put it to work to track the work of more than one person on the same project. For you to be able to collaborate with others on a git-tracked project, you will need to put the git repository somewhere where different computers can access it. This is called a “remote”, because it is a copy of the repository that is on another computer, usually located somewhere remote from your computer. Git supports many different modes of setting up remotes, but we will only show one here, using the GitHub website as the remote.
To get started working with GitHub, you will need to set up a (free) account. If you are a student or instructor at an educational institution, we recommend that you look into the various educational benefits that are available to you through GitHub Education.
Once you have set up your user account on GitHub, you should be able to create a new repository by clicking on the “+” sign on the top right of the web page and selecting “new repository”.
On the following page ({numref}`Figure 2`), you will be asked to name your repository. GitHub is a publicly available website, but you can choose whether you would like to make your code publicly viewable (by anyone!) or only viewable by yourself and by collaborators that you will designate. You will also be given a few other options, such as an option to add a README file and to add a license. We’ll come back to these options and their implications when we talk about sharing data analysis code in {numref}`sharing`.
The newly-created web page will be the landing page for your repository on the internet. It will have a URL that will look something like: https://github.com/<user name>/<project name>
. For example, since we created this repository under Ariel’s GitHub account, it now has the URL https://github.com/arokem/my_project
(feel free to visit that URL. It has been designated as publicly viewable). But there is nothing there until we tell git to transfer over the files from our local copy of the repository on our machine into a copy on the remote — the GitHub webpage. When a GitHub repository is empty, the front page of the repository contains instructions for adding code to it, either by creating a new repository on the command line or from an existing repository. We will do the latter here because we already have a repository that we have started working on. The instructions for this case are entered in the command line with the shell set to have its working directory in the directory that stores our repository.
$ git remote add origin https://github.com/arokem/my_project.git
$ git branch -M main
$ git push -u origin main
The first line of these uses the git remote
sub-command, which manages remotes, and the sub-sub-command git remote add
to add a new remote called origin
. We could name it anything we want (for example git remote add github
or git remote add arokem
), but it is a convention to use origin
(or sometimes also upstream
) as the name of the remote that serves as the central node for collaboration. It also tells git that this remote exists at that URL (notice the addition of .git
to the URL). Admittedly, it’s a long command with many parts.
The next line is something that we already did when we first initialized this repository — changing the name of the default branch to main
. So, we can skip that now.
The third line uses the git push
sub-command to copy the repository — the files stored in it and its entire history — to the remote. We are telling git that we would like to copy the main
branch from our computer to the remote that we have named origin
. The -u
flag tells git that this is going to be our default for git push
from now on, so whenever we’d like to push again from main
to origin
, we will only have to type git push
, rather than the entire long version of this incantation (git push origin main
).
Importantly, though you should have created a password when you created your GitHub account, GitHub does not support authentication with a password from the command line. This makes GitHub repositories very secure, but it also means that when you are asked to authenticate to push your repository to GitHub, you will need to take a different route. There are two ways to go about this. The first is to create a personal access token (PAT). The GitHub documentation has a webpage that describes this process. Instead of typing the GitHub password that you created when you created your account, copy this token (typically, a long string of letters, numbers, and other characters) into the shell when prompted for a password. This should look as follows.
$ git push -u origin main
Username for 'https://github.com': arokem
Password for 'https://arokem@github.com': <insert your token here>
Enumerating objects: 12, done.
Counting objects: 100% (12/12), done.
Delta compression using up to 8 threads
Compressing objects: 100% (5/5), done.
Writing objects: 100% (12/12), 995 bytes | 497.00 KiB/s, done.
Total 12 (delta 0), reused 0 (delta 0), pack-reused 0
To https://github.com/arokem/my_project.git
 * [new branch]      main -> main
Branch 'main' set up to track remote branch 'main' from 'origin'.
Another option is to use SSH keys for authentication. This is also described in the GitHub documentation. The first step is to create the key and store it on your computer, described here. The second step is to let GitHub know about this SSH key, as described here. The benefit of using an SSH key is that we do not need to generate it anew every time that we want to authenticate from this computer. Importantly, if we choose the SSH key route for authenticating, we need to change the URL that we use to refer to the remote. This is done through another sub-command of the git remote command:
$ git remote set-url origin git@github.com:arokem/my_project.git
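If you would like to confirm which URL the remote now points to, the same sub-command can also list the configured remotes; this is just a quick sanity check, not a required step:
$ git remote -v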
Once we have done that, we can visit the webpage of the repository again and see that the files in the project and the history of the project have now been copied into that website ({numref}`Figure 3`). Even if you are working on a project on your own this serves as a remote backup, and that is already quite useful to have, but we are now also ready to start collaborating. This is done by clicking on the "settings" tab at the top right part of the repository page, selecting the "manage access" menu item, and then clicking on the "add people" button ({numref}`Figure 4`). This will let you select other GitHub users, or enter the email address for your collaborator ({numref}`Figure 5`). As another security measure, after they are added as a collaborator, this other person will have to receive an email from GitHub and approve that they would like to be designated as a collaborator on this repository, by clicking on a link in the email that they received.
Once they approve the request to join this GitHub repository as a collaborator, this other person can then both `git push` to your repository, just as we demonstrated above, as well as `git pull` from this repository, which we will elaborate on below. The first step they will need to take, however, is to `git clone` the repository. This would look something like this:
$ git clone https://github.com/arokem/my_project.git
When this is executed inside of a particular working directory (for example, ~/projects/), this will create a sub-directory called my_project (~/projects/my_project; note that you can't do that in the same directory that already has a my_project folder in it, so if you are following along, issue this git clone command in another part of your file-system) that contains all of the files that were pushed to GitHub. In addition, this directory is a full copy of the git repository, meaning that it also contains the entire history of the project. All of the operations that we did with our own local copy of the repository (for example, checking out a file using a particular commit's SHA identifier) are now possible on this new copy.
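To convince yourself that the clone really does contain the full history, you could, for example, look at the commit log inside the newly-created directory (a quick check rather than a required step; the exact output will depend on the commits in your repository):
$ cd ~/projects/my_project
$ git log --oneline    # prints one line per commit, newest first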
The simplest mode of collaboration is one where a single person makes all of the changes to the files. This person is the only person who issues git push commands, which update the copy of the repository on GitHub, and then all the other collaborators issue git pull origin main within their local copies, to update the files that they have on their machines. The git pull command is the opposite of the git push command that you saw before, syncing the local copy of the repository to the copy that is stored in a particular remote (in this case, the GitHub copy is automatically designated as origin when git clone is issued).
A more typical mode of collaboration is one where different collaborators are all both pulling changes that other collaborators made, as well as pushing their changes. This could work, but if you are working on the same repository at the same time, you might run into some issues. The first arises when someone else pushed to the repository during the time that you were working and the history of the repository on GitHub is out of sync with the history of your local copy of the repository. In that case, you might see a message such as this one:
$ git push origin main
To https://github.com/arokem/my_project.git
 ! [rejected]        main -> main (fetch first)
error: failed to push some refs to 'https://github.com/arokem/my_project.git'
hint: Updates were rejected because the remote contains work that you do
hint: not have locally. This is usually caused by another repository pushing
hint: to the same ref. You may want to first integrate the remote changes
hint: (e.g., 'git pull ...') before pushing again.
hint: See the 'Note about fast-forwards' in 'git push --help' for details.
Often, it would be enough to issue a `git pull origin main`, followed by a `git push origin main` (as suggested in the hint) and this issue would be resolved: your collaborator's changes would be integrated into the history of the repository together with your changes, and you would happily continue working. But in some cases, if you and your collaborator introduced changes to the same lines in the same files, git would not know how to resolve this conflict, and you would then see a message that looks like this:
$ git pull origin main
remote: Enumerating objects: 5, done.
remote: Counting objects: 100% (5/5), done.
remote: Compressing objects: 100% (2/2), done.
remote: Total 3 (delta 0), reused 3 (delta 0), pack-reused 0
Unpacking objects: 100% (3/3), 312 bytes | 78.00 KiB/s, done.
From https://github.com/arokem/my_project
 * branch            main       -> FETCH_HEAD
   ab2c28e..9670d72  main       -> origin/main
Auto-merging my_file.txt
CONFLICT (content): Merge conflict in my_file.txt
Automatic merge failed; fix conflicts and then commit the result.
This means that there is a conflict between the changes that were introduced by your collaborator and the changes that you introduced. There is no way for git to automatically resolve this kind of situation because it can’t tell which of the two conflicting versions should be selected. Instead, git edits the file to add conflict markers. That is, if you open the text file in a text editor, it would now look something like this:
a first line of text
another line of text
this is a line with feature x
<<<<<<< HEAD
an addition you made
=======
addition made by a collaborator
>>>>>>> 9670d72897a5e15defb257010f928bd22f54c929
Everything up to the third line of the file is changes that both of you had in your history. After that, your changes to the file diverge. At the same time that you added the line of text "an addition you made", the collaborator added the line "addition made by a collaborator". To highlight the source of the conflict, git introduced the markers you see in the text snippet above. The "<<<<<<< HEAD" marker starts the part of the file that was in HEAD on your local copy. Everything below the "=======" marker, until the ">>>>>>> …" marker, comes from the collaborator. The SHA identifier at the end of this marker (9670d72…) is also the identifier of the commit that the collaborator made to the repository while you were making your changes.
In addition, if you issue a git status command at this point, you will see the following:
$ git status
On branch main
Your branch and 'origin/main' have diverged,
and have 1 and 1 different commits each, respectively.
  (use "git pull" to merge the remote branch into yours)
You have unmerged paths.
  (fix conflicts and run "git commit")
  (use "git merge --abort" to abort the merge)
Unmerged paths:
  (use "git add ..." to mark resolution)
        both modified:   my_file.txt
no changes added to commit (use "git add" and/or "git commit -a")
Regardless of whether you would like to include your changes, as well as the changes from your collaborator, or just one of these change sets, your next step is to edit the files where conflicts exist to end up with the content that you would want them to have going forward. This would include removing the conflict markers that git introduced along the way. To resolve the conflict, you might also need to talk to your collaborator, to understand why you both made conflicting changes to the same lines. At the end of this process, you would make one more git commit, followed by a git push, and keep working from this new state. Your collaborators should be able to issue a git pull in their copy of the repository to get the conflict resolved on their end as well.
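As a concrete sketch of that last step, assuming the conflict was in my_file.txt as in the example above (the editor and the commit message here are just illustrations):
$ nano my_file.txt    # edit the file and remove the conflict markers
$ git add my_file.txt    # mark the conflict as resolved
$ git commit -m"Merges the collaborator's change with our change"
$ git push origin main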
Another slightly more elaborate pattern of collaboration would involve the use of branches. Remember that we mentioned that maybe it would be a good idea to work only on branches other than main? One reason for that is that this allows you to go through a process of review of new changes before they get integrated into the main branch. GitHub facilitates this process through an interface called a "pull request". The process works as follows: whenever you would like to make a change to the project, you create a new branch from main and implement the change on this branch. For example:
$ git branch feature_y
$ git checkout feature_y
$ echo "this is a line with feature y" >> my_file.txt
Then, we add the file to the staging area, commit this change in this branch, and push the branch to GitHub:
$ git add my_file.txt
$ git commit -m"Implements feature y"
[feature_y 948ea1d] Implements feature y
1 file changed, 1 insertion(+)
$ git push origin feature_y
Enumerating objects: 5, done.
Counting objects: 100% (5/5), done.
Delta compression using up to 8 threads
Compressing objects: 100% (2/2), done.
Writing objects: 100% (3/3), 283 bytes | 283.00 KiB/s, done.
Total 3 (delta 1), reused 0 (delta 0), pack-reused 0
remote: Resolving deltas: 100% (1/1), completed with 1 local object.
remote:
remote: Create a pull request for 'feature_y' on GitHub by visiting:
remote: https://github.com/arokem/my_project/pull/new/feature_y
remote:
To https://github.com/arokem/my_project.git
* [new branch] feature_y -> feature_y
As you can see in the message emitted by git, a new branch has been created and we can visit the GitHub repository at the URL provided to create a pull request. The page that opens when you visit this URL already includes the commit message that we used to describe the change. It also has a text box into which we can type more details about the change ({numref}`Figure 6`). Once we hit the "Create pull request" button, we are taken to a separate page that serves as a forum for discussion of these changes ({numref}`Figure 7`). At the same time, all of the collaborators on the repository receive an email that alerts them that a pull request has been created. By clicking on the "Files changed" tab on this page, collaborators can see a detailed view of all of the changes introduced to the code ({numref}`Figure 8`). They can comment on the changes through the web interface and discuss any further changes that need to be made for this contribution to be merged into main. The original author of the branch that led to this pull request can continue working on the branch, based on these comments, making further commits. When consensus is reached, one of the collaborators on the repository can click the "Merge pull request" button ({numref}`Figure 6`). This triggers the sending of another email to all of the collaborators on this repository, alerting them that the main branch has changed and that they should issue a git pull origin main on their local copies of the repository to sync their main branch with the GitHub repository.
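On each collaborator's machine, that synchronization step might look like the following sketch (the branch name feature_y follows the example above; deleting the merged branch is optional housekeeping):
$ git checkout main
$ git pull origin main    # bring the merged changes into the local main branch
$ git branch -d feature_y    # optionally delete the local branch that has now been merged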
While this model of collaboration might seem a bit odd at first, it has a lot of advantages: individual contributions to the project are divided into branches and pull requests. The pull request interface allows for a detailed line-by-line review of changes to the project and discussion. This is particularly important for projects where changes to complex code are required. In tandem with automated code testing (which we will discuss in {numref}`python-env`), thorough code review mitigates the risk of merging changes that introduce new bugs into the code base. In addition to these advantages, the pull request is retained after the change is merged, as a permanent record of these discussions. This allows all of the collaborators on a project to revisit how decisions were made to introduce particular changes. This is a great way to retain collective memory about the way that a project evolved, even after collaborators on the project graduate and/or move on to do other things.
More complex collaboration patterns
Here, we demonstrated a pattern that works well for a relatively small collaboration, with a few collaborators who all know each other. But git and GitHub also enable much more complex patterns that support large and distributed collaborations between people who don't even know each other but would like to contribute to the same project. These patterns enable the large open-source software projects that we will start discussing in {numref}`numpy`.
Additional Resources
To learn more about the history and more recent objections to the "master"/"slave" terminology in computer science and engineering, you can refer to {cite}`eglash2007broken`.
To explain the use of git and GitHub, we went through a whole sequence of operations using the command line. However, there are several graphical user interfaces (GUIs) for git and GitHub that are worth checking out. First of all, GitHub has a GUI that they developed. In addition, there is an entire page of options to choose from in the git documentation.
To learn about more elaborate collaboration patterns for version control and collaboration, we would recommend studying the git workflow used by the nibabel project. You will learn more about what nibabel does in {numref}`nibabel`, but for now, you can use their documentation to learn more about how you might use git.
- It is theoretically possible to create another change that would have the same commit identifier (SHA), but figuring out how to do that would be so computationally demanding that it is not something we need to worry about in any practical sense.
Reference
Rokem A, Yarkoni T. Data Science for Neuroimaging: An Introduction. 2023.
Unix Operating System
The Unix operating system is a fundamental tool for controlling the way that the computer executes programs, including in data analysis. Its history goes back to the 1960s, but it is still one of the most commonly-used tools for computing today. In the 1960s, when Unix was first developed, computers were very different, usually accessed as mainframe systems that were shared among multiple users. Each user would connect to the computer through their terminal device — a keyboard and a screen that were used to send instructions to the computer and receive outputs. This was before the invention of the first graphical user interfaces (GUIs), so all of the instructions had to be sent to the computer in the form of text. The main way to interact with the computer was through an application called a "shell". The shell application usually includes a prompt, where users can type a variety of different commands. This prompt is also called a "command line". The commands typed at the command line can be sent to the computer's operating system to do a variety of different operations, or to launch various other programs. Often, these programs will then produce text outputs that are printed into the shell as well. Over the years, computers evolved and changed, but the idea of a text-based terminal remains. In different operating systems, you will find the shell application in different places. On Windows, you can install a shell application by installing git for Windows (this will also end up being useful for the following sections, which introduce version control with git and containerization with Docker). On Apple's Mac computers, you can find a shell application in your Applications/Utilities folder under the name "Terminal". On the many variants of the Linux operating system, the shell is quite central as well and comes installed with the operating system.
Using Unix
The developers of the Unix shell believed that programs that run in this kind of environment should be built to each do only one thing. Ideally, each program’s output should be formatted so that it could be used as input to another program. This means that users could use multiple small programs to construct more complicated programs and pipelines based on combinations of different tools. Let’s look at a simple example of that. Opening up the shell, you will be staring at the prompt. On my computer that looks something like this
$
You can type commands into this prompt and press the "enter" or "return" key to execute them. The shell will send these commands to the operating system, and then some output might appear inside the shell. This is what is called a "read, evaluate, print loop", or REPL: the application reads what you type, evaluates it to understand what it means and what information to provide in return, prints that information to the screen, and then repeats that whole process, in an infinite loop, which ends only when you quit the application or turn off the computer.
Exploring the filesystem
When you first start the shell, the working directory that it immediately sees is your home directory. This means that the files and folders in your home directory are immediately accessible to you. For example, you can type the ls command to get a listing of the files and folders that the shell sees in your working directory. For example, in the shell on one of our computers:
$ ls
Applications Downloads Music Untitled.ipynb
Desktop Library Pictures miniconda3
Documents Movies Public projects
Most of the items listed here are folders that came installed with Ariel's computer when he bought it, for example, Documents and Desktop. Others are folders that he created in the home directory, for example, projects. There is also a single stray file here, Untitled.ipynb, which is a lone Jupyter notebook (ipynb is the extension for these files) that remains here from some impromptu data analysis that he once did. The ls command (and many other unix commands) can be modified using flags. These are characters or words added to the command that modify the way that the command runs. For example, adding the -F flag to the call to ls adds a slash (/) character at the end of the names of folders, which is practically useful, because it tells us which of the names in the list refer to a file, and which refer to a folder that contains other files and folders.
$ ls -F
Applications/ Downloads/ Music/ Untitled.ipynb
Desktop/ Library/ Pictures/ miniconda3/
Documents/ Movies/ Public/ projects/
In general, if we want to know more about a particular command, we can issue the man command, followed by the name of the command for which we would like to read the so-called man page (man presumably stands for "manual"). For example, the following command would open the man page for the ls command, telling us about all of the options that we have to modify the way that ls works.
$ man ls
To exit the man page, we would type the q key. We can ask the shell to change the working directory in which we are currently working by issuing the cd (or "change directory") command.
$ cd Documents
which would then change what it sees when we ask it to list the files.
$ ls -F
books/
conferences/
courses/
papers/
This is the list of directories that are within the Documents directory. We can see where the change has occurred by asking the shell where we are, using the pwd command (which stands for "present working directory").
$ pwd
/Users/arokem/Documents
Note: this is the answer that Ariel sees (on his Mac laptop computer), and you might see something slightly different, depending on the way your computer is set up. For example, if you are using the shell that you installed from git for Windows on a Windows machine (this shell is also called a git bash shell), your home directory is probably going to look more like this:
$ pwd
/c/Users/arokem
This is the address of the standard C:\Users\arokem Windows home directory, translated into a more unix-like format. If we want to change our working directory back to where the shell started, we can call cd again. This command can be used in one of several ways:
$ cd /Users/arokem
This is a way to refer to the absolute path of the home directory. It is absolute because this command would bring us back to the home directory, no matter where in the file system we happened to be working before we issued it. This command also tells us where within the structure of the file system the home directory is located. The slash (/) characters in the command are to be read as separators that designate the relationships between different items within the file system. For example, in this case, the home directory is considered to be inside of a directory called "Users", which in turn is inside the root of the filesystem (simply designated as the / at the beginning of the absolute path). This idea — that files and folders are inside other files and folders — organizes the filesystem as a whole. Another way to think about this is that the file system on our computer is organized as a tree. The root of the tree is the root of the entire filesystem (/) and all of the items saved in the filesystem stem from the root. Different branches split off from the root, and they can split further. Finally, at the end of the branches (the leaves, if you will) are the files that are saved within folders at the end of every path through the branches of the tree. The tree structure makes organizing and finding things easier.
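To make the tree structure concrete, here are two commands you could try; the second path follows Ariel's example above and will differ on your machine:
$ ls /    # list the directories at the root of the filesystem tree
$ ls /Users/arokem/Documents    # an absolute path works from any working directory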
Another command we might issue here, that would also take us to the home directory is:
$ cd ..
The .. is a special way to refer to the directory directly above the directory in which we are currently working within the filesystem tree, bringing us one step closer to the root. Depending on what directory you are already in, it would take you to different places. Because of that, we refer to this as a relative path. Similarly, if we were working within the home directory and issued the following command:
$ cd Documents
this would be a relative path. This is because it does not describe the relationship between this folder and the root of the filesystem, and would work differently depending on our present working directory. For example, if we issue that command while our working directory was inside of the Documents directory, we would get an error from the shell because, given only a relative path, it can't find anything called Documents inside of the Documents folder.
One more way I can get to the home directory from anywhere in the filesystem is by using the tilde character (~). So, this command:
$ cd ~
is equivalent to this command:
$ cd /Users/arokem
Similarly, you can refer to folders inside of your home directory by attaching the ~/ before writing down the relative path from the home directory to that location. For example, to get from anywhere to the Documents directory, I could issue the following, which is interpreted as an absolute path.
$ cd ~/Documents
Exercise
The touch command creates a new empty file in your filesystem. You can create it using relative and absolute paths. How would you create a new file called new_file.txt in your home directory? How would you do that if you were working inside of your ~/Documents directory? The mv command moves a file from one location to another. How would you move new_file.txt from the home directory to the Documents directory? How would this be different from using the cp command? (hint: use the mv and cp man pages to see what these commands do).
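If you get stuck, one possible set of answers is sketched below; it assumes that you start in your home directory and uses the paths from the examples above:
$ touch ~/new_file.txt    # create an empty file in the home directory
$ cd ~/Documents
$ touch ../new_file.txt    # equivalent, using a relative path from ~/Documents
$ cp ~/new_file.txt ~/Documents/    # cp copies the file, leaving the original in place
$ mv ~/new_file.txt ~/Documents/    # mv moves it, so it no longer exists in the home directory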
The pipe operator
There are many other commands in the Unix shell, and we will not demonstrate all of them here (see a table below of some commonly-used commands). Instead, we will now proceed to demonstrate one of the important principles that we mentioned before: the output of one command can be directly used as an input to another command. For example, if we wanted to quickly count the number of items in the home directory, we could provide the output of the ls command directly as input into the wc command ("word count", which counts words). In unix, that is called "creating a pipe" between the commands and we use the pipe operator, which is the vertical line that usually sits in the top right of the US English keyboard: |, to do so:
$ ls | wc
13 13 117
Now, instead of printing out the list of files in my home directory, the shell prints out the number of lines, words, and characters in the output of the ls command. It tells us that there are 13 lines, 13 words, and 117 characters in the output of ls (it looked as though the output of ls was 3 lines, but there were line breaks between each column in the output). The order of the pipe operation is from left to right, and we don't have to stop here. For example, we can use another pipe to ask how many words in the output of ls contain the letter D. For this, we'll use the grep command, which finds all of the instances of a particular letter or phrase in its input:
$ ls | grep "D" | wc
3 3 28
To see why this is the case, try running this yourself in your home directory. Try omitting the final pipe to wc and seeing what the output of ls | grep "D" might be. This should give you a sense of how unix combines operations. Though it may seem a bit silly at first — why would we want to count how many words have the letter "D" in them in a list of files? — when combined in the right way, it can give you a lot of power to automate operations that you'd like to do. For example, identifying certain files in a directory and processing all of them through a command line application.
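Here is a small sketch of that idea: listing only the Jupyter notebooks in the home directory and then counting them (the ipynb pattern follows the Untitled.ipynb example above):
$ ls ~ | grep "ipynb"    # keep only the lines that contain "ipynb"
$ ls ~ | grep "ipynb" | wc -l    # count how many such lines there are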
Command | Description |
---|---|
ls | List the contents of the current directory. |
cd | Change the current directory. |
pwd | Print the path of the current directory. |
mkdir | Create a new directory. |
touch | Create a new file. |
cp | Copy a file or directory. |
mv | Move or rename a file or directory. |
rm | Remove a file or directory. |
cat | Print the contents of a file to the terminal. |
less | View the contents of a file one page at a time. |
grep | Search for a pattern in a file or files. |
sort | Sort the lines of a file. |
find | Search for files based on their name, size, or other attributes. |
wc | Print the number of lines, words, and bytes in a file. |
chmod | Change the permissions of a file or directory. |
chown | Change the ownership of a file or directory. |
head | Print the first few lines of a file. |
tail | Print the last few lines of a file. |
diff | Compare two files and show the differences between them. |
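As a small illustration of how a few of the commands in this table fit together (my_file.txt is the file from the collaboration example above; sort only sorts the output and does not change the file itself):
$ sort my_file.txt | head -3    # the first three lines of the file, in sorted order
$ wc -l my_file.txt    # the number of lines in the file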
More about unix
We're going to move on from unix now. In the next two chapters, you will see some more intricate and specific uses of the command line through two applications that run as command-line interfaces. Together with, and also independently of, the tools you will see below, unix provides a lot of power to explicitly operate on files and folders in your filesystem and to run a variety of applications, so becoming fluent with the shell will be a boon to your work.
Additional resources
To learn more about the unix philosophy, we recommend "The Unix Programming Environment" by Kernighan and Pike. The authors are two of the original developers of the system. The writing is a bit archaic, but understanding some of the constraints that applied to computers at the time that unix was developed could help you understand and appreciate why unix operates as it does, and why some of these constraints have been kept, even while computers have evolved and changed in other respects.
Reference
Rokem A, Yarkoni T. Data Science for Neuroimaging: An Introduction. 2023.
Computational Psychiatry
Why Computational Psychiatry?
Biologically Based Neural Circuit Models
Endophenotypes across Brain Disorder Categories
Big Data and Model-Aided Diagnosis
Biophysically Based Neural Circuit Modeling: Understanding across Levels
Looking Forward: Building a New Cross-Disciplinary Field
Computational Neuroscience
Computational neuroscience (also known as theoretical neuroscience or mathematical neuroscience) is a branch of neuroscience which employs mathematics, computer science, theoretical analysis and abstractions of the brain to understand the principles that govern the development, structure, physiology and cognitive abilities of the nervous system.[1][2][3][4]
Computational neuroscience employs computational simulations[5] to validate and solve mathematical models, and so can be seen as a sub-field of theoretical neuroscience; however, the two fields are often synonymous.[6] The term mathematical neuroscience is also used sometimes, to stress the quantitative nature of the field.[7]
Computational neuroscience focuses on the description of biologically plausible neurons (and neural systems) and their physiology and dynamics, and it is therefore not directly concerned with biologically unrealistic models used in connectionism, control theory, cybernetics, quantitative psychology, machine learning, artificial neural networks, artificial intelligence and computational learning theory;[8][9] [10] although mutual inspiration exists and sometimes there is no strict limit between fields,[11][12][13] with model abstraction in computational neuroscience depending on research scope and the granularity at which biological entities are analyzed.
Models in theoretical neuroscience are aimed at capturing the essential features of the biological system at multiple spatial-temporal scales, from membrane currents, and chemical coupling via network oscillations, columnar and topographic architecture, nuclei, all the way up to psychological faculties like memory, learning and behavior. These computational models frame hypotheses that can be directly tested by biological or psychological experiments.
The term ‘computational neuroscience’ was introduced by Eric L. Schwartz, who organized a conference, held in 1985 in Carmel, California, at the request of the Systems Development Foundation to provide a summary of the current status of a field which until that point was referred to by a variety of names, such as neural modeling, brain theory and neural networks. The proceedings of this definitional meeting were published in 1990 as the book Computational Neuroscience.[14] The first of the annual open international meetings focused on Computational Neuroscience was organized by James M. Bower and John Miller in San Francisco, California in 1989.[15] The first graduate educational program in computational neuroscience was organized as the Computational and Neural Systems Ph.D. program at the California Institute of Technology in 1985.
The early historical roots of the field[16] can be traced to the work of people including Louis Lapicque, Hodgkin & Huxley, Hubel and Wiesel, and David Marr. Lapicque introduced the integrate and fire model of the neuron in a seminal article published in 1907,[17] a model still popular for artificial neural network studies because of its simplicity (see a recent review[18]).
About 40 years later, Hodgkin and Huxley developed the voltage clamp and created the first biophysical model of the action potential. Hubel and Wiesel discovered that neurons in the primary visual cortex, the first cortical area to process information coming from the retina, have oriented receptive fields and are organized in columns.[19] David Marr’s work focused on the interactions between neurons, suggesting computational approaches to the study of how functional groups of neurons within the hippocampus and neocortex interact, store, process, and transmit information. Computational modeling of biophysically realistic neurons and dendrites began with the work of Wilfrid Rall, with the first multicompartmental model using cable theory.
Major topics
Research in computational neuroscience can be roughly categorized into several lines of inquiry. Most computational neuroscientists collaborate closely with experimentalists in analyzing novel data and synthesizing new models of biological phenomena.
Single-neuron modeling
Even a single neuron has complex biophysical characteristics and can perform computations (e.g.[20]). Hodgkin and Huxley's original model only employed two voltage-sensitive currents (voltage-sensitive ion channels are glycoprotein molecules which extend through the lipid bilayer, allowing ions to traverse under certain conditions through the axolemma), the fast-acting sodium and the inward-rectifying potassium. Though successful in predicting the timing and qualitative features of the action potential, it nevertheless failed to predict a number of important features such as adaptation and shunting. Scientists now believe that there are a wide variety of voltage-sensitive currents, and the implications of the differing dynamics, modulations, and sensitivity of these currents are an important topic of computational neuroscience.[21] The computational functions of complex dendrites are also under intense investigation. There is a large body of literature regarding how different currents interact with geometric properties of neurons.[22] There are many software packages, such as GENESIS and NEURON, that allow rapid and systematic in silico modeling of realistic neurons. Blue Brain, a project founded by Henry Markram from the École Polytechnique Fédérale de Lausanne, aims to construct a biophysically detailed simulation of a cortical column on the Blue Gene supercomputer. Modeling the richness of biophysical properties on the single-neuron scale can supply mechanisms that serve as the building blocks for network dynamics.[23] However, detailed neuron descriptions are computationally expensive and this computing cost can limit the pursuit of realistic network investigations, where many neurons need to be simulated. As a result, researchers that study large neural circuits typically represent each neuron and synapse with an artificially simple model, ignoring much of the biological detail. Hence there is a drive to produce simplified neuron models that can retain significant biological fidelity at a low computational overhead. Algorithms have been developed to produce faithful, faster running, simplified surrogate neuron models from computationally expensive, detailed neuron models.[24]
Modeling Neuron-glia interactions
Glial cells participate significantly in the regulation of neuronal activity at both the cellular and the network level. Modeling this interaction helps clarify the potassium cycle,[25][26] which is important for maintaining homeostasis and preventing epileptic seizures. Modeling also reveals the role of glial protrusions that can, in some cases, penetrate the synaptic cleft to interfere with synaptic transmission and thus control synaptic communication.[27]
Development, axonal patterning, and guidance
Computational neuroscience aims to address a wide array of questions, including: How do axons and dendrites form during development? How do axons know where to target and how to reach these targets? How do neurons migrate to the proper position in the central and peripheral systems? How do synapses form? We know from molecular biology that distinct parts of the nervous system release distinct chemical cues, from growth factors to hormones that modulate and influence the growth and development of functional connections between neurons. Theoretical investigations into the formation and patterning of synaptic connection and morphology are still nascent. One hypothesis that has recently garnered some attention is the minimal wiring hypothesis, which postulates that the formation of axons and dendrites effectively minimizes resource allocation while maintaining maximal information storage.[28]
Sensory processing
Early models on sensory processing understood within a theoretical framework are credited to Horace Barlow. Somewhat similar to the minimal wiring hypothesis described in the preceding section, Barlow understood the processing of the early sensory systems to be a form of efficient coding, where the neurons encoded information which minimized the number of spikes. Experimental and computational work have since supported this hypothesis in one form or another. For the example of visual processing, efficient coding is manifested in the forms of efficient spatial coding, color coding, temporal/motion coding, stereo coding, and combinations of them.[29] Further along the visual pathway, even the efficiently coded visual information is too much for the capacity of the information bottleneck, the visual attentional bottleneck.[30] A subsequent theory, V1 Saliency Hypothesis (V1SH), has been developed on exogenous attentional selection of a fraction of visual input for further processing, guided by a bottom-up saliency map in the primary visual cortex.[31] Current research in sensory processing is divided among a biophysical modeling of different subsystems and a more theoretical modeling of perception. Current models of perception have suggested that the brain performs some form of Bayesian inference and integration of different sensory information in generating our perception of the physical world.[32][33]
Motor control
Many models of the way the brain controls movement have been developed. This includes models of processing in the brain such as the cerebellum's role in error correction, skill learning in the motor cortex and the basal ganglia, and the control of the vestibulo-ocular reflex. This also includes many normative models, such as those of the Bayesian or optimal control flavor, which are built on the idea that the brain efficiently solves its problems.
Memory and synaptic plasticity
Earlier models of memory are primarily based on the postulates of Hebbian learning. Biologically relevant models such as the Hopfield net have been developed to address the properties of the associative (also known as "content-addressable") style of memory that occurs in biological systems. These attempts focus primarily on the formation of medium- and long-term memory, localized in the hippocampus. One of the major problems in neurophysiological memory is how it is maintained and changed through multiple time scales. Unstable synapses are easy to train but also prone to stochastic disruption. Stable synapses forget less easily, but they are also harder to consolidate. It is likely that computational tools will contribute greatly to our understanding of how synapses function and change in relation to external stimuli in the coming decades.
Behaviors of networks
Biological neurons are connected to each other in a complex, recurrent fashion. These connections are, unlike most artificial neural networks, sparse and usually specific. It is not known how information is transmitted through such sparsely connected networks, although specific areas of the brain, such as the visual cortex, are understood in some detail.[34] It is also unknown what the computational functions of these specific connectivity patterns are, if any. The interactions of neurons in a small network can often be reduced to simple models such as the Ising model. The statistical mechanics of such simple systems are well-characterized theoretically. Some recent evidence suggests that dynamics of arbitrary neuronal networks can be reduced to pairwise interactions.[35] It is not known, however, whether such descriptive dynamics impart any important computational function. With the emergence of two-photon microscopy and calcium imaging, we now have powerful experimental methods with which to test the new theories regarding neuronal networks. In some cases the complex interactions between inhibitory and excitatory neurons can be simplified using mean-field theory, which gives rise to the population model of neural networks.[36] While many neurotheorists prefer such models with reduced complexity, others argue that uncovering structural-functional relations depends on including as much neuronal and network structure as possible. Models of this type are typically built in large simulation platforms like GENESIS or NEURON. There have been some attempts to provide unified methods that bridge and integrate these levels of complexity.[37]
Visual attention, identification, and categorization
Visual attention can be described as a set of mechanisms that limit some processing to a subset of incoming stimuli.[38] Attentional mechanisms shape what we see and what we can act upon. They allow for concurrent selection of some (preferably, relevant) information and inhibition of other information. In order to have a more concrete specification of the mechanism underlying visual attention and the binding of features, a number of computational models have been proposed aiming to explain psychophysical findings. In general, all models postulate the existence of a saliency or priority map for registering the potentially interesting areas of the retinal input, and a gating mechanism for reducing the amount of incoming visual information, so that the limited computational resources of the brain can handle it.[39] An example theory that is being extensively tested behaviorally and physiologically is the V1 Saliency Hypothesis that a bottom-up saliency map is created in the primary visual cortex to guide attention exogenously.[31] Computational neuroscience provides a mathematical framework for studying the mechanisms involved in brain function and allows complete simulation and prediction of neuropsychological syndromes.
Cognition, discrimination, and learning
Computational modeling of higher cognitive functions has only recently begun. Experimental data comes primarily from single-unit recording in primates. The frontal lobe and parietal lobe function as integrators of information from multiple sensory modalities. There are some tentative ideas regarding how simple mutually inhibitory functional circuits in these areas may carry out biologically relevant computation.[40] The brain seems to be able to discriminate and adapt particularly well in certain contexts. For instance, human beings seem to have an enormous capacity for memorizing and recognizing faces. One of the key goals of computational neuroscience is to dissect how biological systems carry out these complex computations efficiently and potentially replicate these processes in building intelligent machines. The brain's large-scale organizational principles are illuminated by many fields, including biology, psychology, and clinical practice. Integrative neuroscience attempts to consolidate these observations through unified descriptive models and databases of behavioral measures and recordings. These are the bases for some quantitative modeling of large-scale brain activity.[41] The Computational Representational Understanding of Mind (CRUM) is another attempt at modeling human cognition through simulated processes like acquired rule-based systems in decision making and the manipulation of visual representations in decision making.
Consciousness
One of the ultimate goals of psychology/neuroscience is to be able to explain the everyday experience of conscious life. Francis Crick, Giulio Tononi and Christof Koch made some attempts to formulate consistent frameworks for future work in neural correlates of consciousness (NCC), though much of the work in this field remains speculative.[42]
Computational clinical neuroscience
Computational clinical neuroscience is a field that brings together experts in neuroscience, neurology, psychiatry, decision sciences and computational modeling to quantitatively define and investigate problems in neurological and psychiatric diseases, and to train scientists and clinicians that wish to apply these models to diagnosis and treatment.[43][44]
Predictive computational neuroscience
Predictive computational neuroscience is a recent field that combines signal processing, neuroscience, clinical data and machine learning to predict brain states during coma[45] or anesthesia.[46] For example, it is possible to anticipate deep brain states using the EEG signal. These states can be used to anticipate the hypnotic concentration to administer to the patient.
Computational Psychiatry
Computational psychiatry is an emerging field that brings together experts in machine learning, neuroscience, neurology, psychiatry, and psychology to provide an understanding of psychiatric disorders.[47][48][49]
Technology
Neuromorphic computing
A neuromorphic computer/chip is any device that uses physical artificial neurons (made from silicon) to do computations (see: neuromorphic computing, physical neural network). One of the advantages of using a physical model computer such as this is that it takes the computational load off the processor (in the sense that the structural and some of the functional elements don't have to be programmed, since they are in hardware). In recent times,[50] neuromorphic technology has been used to build supercomputers which are used in international neuroscience collaborations. Examples include the Human Brain Project SpiNNaker supercomputer and the BrainScaleS computer.[51]
References
- Trappenberg, Thomas P. (2010). Fundamentals of Computational Neuroscience. United States: Oxford University Press Inc. pp. 2. ISBN 978-0-19-851582-1.
- Patricia S. Churchland; Christof Koch; Terrence J. Sejnowski (1993). “What is computational neuroscience?”. In Eric L. Schwartz (ed.). Computational Neuroscience. MIT Press. pp. 46–55. Archived from the original on 2011-06-04. Retrieved 2009-06-11.
- Dayan P.; Abbott, L. F. (2001). Theoretical neuroscience: computational and mathematical modeling of neural systems. Cambridge, Mass: MIT Press. ISBN 978-0-262-04199-7.
- Gerstner, W.; Kistler, W.; Naud, R.; Paninski, L. (2014). Neuronal Dynamics. Cambridge, UK: Cambridge University Press. ISBN 9781107447615.
- Fan, Xue; Markram, Henry (2019). “A Brief History of Simulation Neuroscience”. Frontiers in Neuroinformatics. 13: 32. doi:10.3389/fninf.2019.00032. ISSN 1662-5196. PMC 6513977. PMID 31133838.
- Thomas, Trappenberg (2010). Fundamentals of Computational Neuroscience. OUP Oxford. p. 2. ISBN 978-0199568413. Retrieved 17 January 2017.
- Gutkin, Boris; Pinto, David; Ermentrout, Bard (2003-03-01). “Mathematical neuroscience: from neurons to circuits to systems”. Journal of Physiology-Paris. Neurogeometry and visual perception. 97 (2): 209–219. doi:10.1016/j.jphysparis.2003.09.005. ISSN 0928-4257. PMID 14766142. S2CID 10040483.
- Kriegeskorte, Nikolaus; Douglas, Pamela K. (September 2018). “Cognitive computational neuroscience”. Nature Neuroscience. 21 (9): 1148–1160. arXiv:1807.11819. Bibcode:2018arXiv180711819K. doi:10.1038/s41593-018-0210-5. ISSN 1546-1726. PMC 6706072. PMID 30127428.
- Paolo, E. D., “Organismically-inspired robotics: homeostatic adaptation and teleology beyond the closed sensorimotor loop”, Dynamical Systems Approach to Embodiment and Sociality, S2CID 15349751
- Brooks, R.; Hassabis, D.; Bray, D.; Shashua, A. (2012-02-22). “Turing centenary: Is the brain a good model for machine intelligence?”. Nature. 482 (7386): 462–463. Bibcode:2012Natur.482..462.. doi:10.1038/482462a. ISSN 0028-0836. PMID 22358812. S2CID 205070106.
- Browne, A. (1997-01-01). Neural Network Perspectives on Cognition and Adaptive Robotics. CRC Press. ISBN 9780750304559.
- Zorzi, Marco; Testolin, Alberto; Stoianov, Ivilin P. (2013-08-20). “Modeling language and cognition with deep unsupervised learning: a tutorial overview”. Frontiers in Psychology. 4: 515. doi:10.3389/fpsyg.2013.00515. ISSN 1664-1078. PMC 3747356. PMID 23970869.
- Shai, Adam; Larkum, Matthew Evan (2017-12-05). “Branching into brains”. eLife. 6. doi:10.7554/eLife.33066. ISSN 2050-084X. PMC 5716658. PMID 29205152.
- Schwartz, Eric (1990). Computational neuroscience. Cambridge, Mass: MIT Press. ISBN 978-0-262-19291-0.
- Bower, James M. (2013). 20 years of Computational neuroscience. Berlin, Germany: Springer. ISBN 978-1461414230.
- Fan, Xue; Markram, Henry (2019). “A Brief History of Simulation Neuroscience”. Frontiers in Neuroinformatics. 13: 32. doi:10.3389/fninf.2019.00032. ISSN 1662-5196. PMC 6513977. PMID 31133838.
- Lapicque L (1907). “Recherches quantitatives sur l’excitation électrique des nerfs traitée comme une polarisation”. J. Physiol. Pathol. Gen. 9: 620–635.
- Brunel N, Van Rossum MC (2007). “Lapicque’s 1907 paper: from frogs to integrate-and-fire”. Biol. Cybern. 97(5–6): 337–339. doi:10.1007/s00422-007-0190-0. PMID 17968583. S2CID 17816096.
- Hubel DH, Wiesel TN (1962). “Receptive fields, binocular interaction and functional architecture in the cat’s visual cortex”. J. Physiol. 160 (1): 106–54. doi:10.1113/jphysiol.1962.sp006837. PMC 1359523. PMID 14449617.
- Forrest MD (2014). “Intracellular Calcium Dynamics Permit a Purkinje Neuron Model to Perform Toggle and Gain Computations Upon its Inputs”. Frontiers in Computational Neuroscience. 8: 86. doi:10.3389/fncom.2014.00086. PMC 4138505. PMID 25191262.
- Wu, Samuel Miao-sin; Johnston, Daniel (1995). Foundations of cellular neurophysiology. Cambridge, Mass: MIT Press. ISBN 978-0-262-10053-3.
- Koch, Christof (1999). Biophysics of computation: information processing in single neurons. Oxford [Oxfordshire]: Oxford University Press. ISBN 978-0-19-510491-2.
- Forrest MD (2014). “Intracellular Calcium Dynamics Permit a Purkinje Neuron Model to Perform Toggle and Gain Computations Upon its Inputs”. Frontiers in Computational Neuroscience. 8: 86. doi:10.3389/fncom.2014.00086. PMC 4138505. PMID 25191262.
- Forrest MD (April 2015). “Simulation of alcohol action upon a detailed Purkinje neuron model and a simpler surrogate model that runs >400 times faster”. BMC Neuroscience. 16 (27): 27. doi:10.1186/s12868-015-0162-6. PMC 4417229. PMID 25928094.
- “Dynamics of Ion Fluxes between Neurons, Astrocytes and the Extracellular Space during Neurotransmission”. cyberleninka.ru. Retrieved 2023-03-14.
- Sibille, Jérémie; Duc, Khanh Dao; Holcman, David; Rouach, Nathalie (2015-03-31). “The Neuroglial Potassium Cycle during Neurotransmission: Role of Kir4.1 Channels”. PLOS Computational Biology. 11 (3): e1004137. Bibcode:2015PLSCB..11E4137S. doi:10.1371/journal.pcbi.1004137. ISSN 1553-7358. PMC 4380507. PMID 25826753.
- Pannasch, Ulrike; Freche, Dominik; Dallérac, Glenn; Ghézali, Grégory; Escartin, Carole; Ezan, Pascal; Cohen-Salmon, Martine; Benchenane, Karim; Abudara, Veronica; Dufour, Amandine; Lübke, Joachim H. R.; Déglon, Nicole; Knott, Graham; Holcman, David; Rouach, Nathalie (April 2014). “Connexin 30 sets synaptic strength by controlling astroglial synapse invasion”. Nature Neuroscience. 17 (4): 549–558. doi:10.1038/nn.3662. ISSN 1546-1726. PMID 24584052. S2CID 554918.
- Chklovskii DB, Mel BW, Svoboda K (October 2004). "Cortical rewiring and information storage". Nature. 431(7010): 782–8. Bibcode:2004Natur.431..782C. doi:10.1038/nature03012. PMID 15483599. S2CID 4430167. Review article.
- Zhaoping L. 2014, The efficient coding principle, chapter 3, of the textbook Understanding vision: theory, models, and data
- See visual spatial attention: https://en.wikipedia.org/wiki/Visual_spatial_attention
- Li. Z. 2002 A saliency map in primary visual cortex Trends in Cognitive Sciences vol. 6, Pages 9-16, and Zhaoping, L. 2014, The V1 hypothesis—creating a bottom-up saliency map for preattentive selection and segmentation in the book Understanding Vision: Theory, Models, and Data
- Weiss, Yair; Simoncelli, Eero P.; Adelson, Edward H. (20 May 2002). “Motion illusions as optimal percepts”. Nature Neuroscience. 5 (6): 598–604. doi:10.1038/nn0602-858. PMID 12021763. S2CID 2777968.
- Ernst, Marc O.; Bülthoff, Heinrich H. (April 2004). “Merging the senses into a robust percept”. Trends in Cognitive Sciences. 8 (4): 162–169. CiteSeerX 10.1.1.299.4638. doi:10.1016/j.tics.2004.02.002. PMID 15050512. S2CID 7837073.
- Olshausen, Bruno A.; Field, David J. (1997-12-01). “Sparse coding with an overcomplete basis set: A strategy employed by V1?”. Vision Research. 37 (23): 3311–3325. doi:10.1016/S0042-6989(97)00169-7. PMID 9425546. S2CID 14208692.
- Schneidman E, Berry MJ, Segev R, Bialek W (2006). “Weak pairwise correlations imply strongly correlated network states in a neural population”. Nature. 440 (7087): 1007–12. arXiv:q-bio/0512013. Bibcode:2006Natur.440.1007S. doi:10.1038/nature04701. PMC 1785327. PMID 16625187.
- Wilson, H. R.; Cowan, J.D. (1973). “A mathematical theory of the functional dynamics of cortical and thalamic nervous tissue”. Kybernetik. 13 (2): 55–80. doi:10.1007/BF00288786. PMID 4767470. S2CID 292546.
- Anderson, Charles H.; Eliasmith, Chris (2004). Neural Engineering: Computation, Representation, and Dynamics in Neurobiological Systems (Computational Neuroscience). Cambridge, Mass: The MIT Press. ISBN 978-0-262-55060-4.
- Marvin M. Chun; Jeremy M. Wolfe; E. B. Goldstein (2001). Blackwell Handbook of Sensation and Perception. Blackwell Publishing Ltd. pp. 272–310. ISBN 978-0-631-20684-2.
- Edmund Rolls; Gustavo Deco (2012). Computational Neuroscience of Vision. Oxford Scholarship Online. ISBN 978-0-198-52488-5.
- Machens CK, Romo R, Brody CD (2005). “Flexible control of mutual inhibition: a neural model of two-interval discrimination”. Science. 307 (5712): 1121–4. Bibcode:2005Sci…307.1121M. CiteSeerX 10.1.1.523.4396. doi:10.1126/science.1104171. PMID 15718474. S2CID 45378154.
- Robinson PA, Rennie CJ, Rowe DL, O’Connor SC, Gordon E (2005). “Multiscale brain modelling”. Philosophical Transactions of the Royal Society B. 360 (1457): 1043–1050. doi:10.1098/rstb.2005.1638. PMC 1854922. PMID 16087447.
- Crick F, Koch C (2003). “A framework for consciousness”. Nat. Neurosci. 6 (2): 119–26. doi:10.1038/nn0203-119. PMID 12555104. S2CID 13960489.
- Adaszewski, Stanisław; Dukart, Juergen; Kherif, Ferath; Frackowiak, Richard; Draganski, Bogdan; Alzheimer’s Disease Neuroimaging Initiative (2013). “How early can we predict Alzheimer’s disease using computational anatomy?”. Neurobiol Aging. 34 (12): 2815–26. doi:10.1016/j.neurobiolaging.2013.06.015. PMID 23890839. S2CID 1025210.
- Friston KJ, Stephan KE, Montague R, Dolan RJ (2014). “Computational psychiatry: the brain as a phantastic organ”. Lancet Psychiatry. 1 (2): 148–58. doi:10.1016/S2215-0366(14)70275-5. PMID 26360579. S2CID 15504512.
- Floyrac, Aymeric; Doumergue, Adrien; Legriel, Stéphane; Deye, Nicolas; Megarbane, Bruno; Richard, Alexandra; Meppiel, Elodie; Masmoudi, Sana; Lozeron, Pierre; Vicaut, Eric; Kubis, Nathalie; Holcman, David (2023). “Predicting neurological outcome after cardiac arrest by combining computational parameters extracted from standard and deviant responses from auditory evoked potentials”. Frontiers in Neuroscience. 17: 988394. doi:10.3389/fnins.2023.988394. ISSN 1662-453X. PMC 9975713. PMID 36875664.
- Sun, Christophe; Holcman, David (2022-08-01). “Combining transient statistical markers from the EEG signal to predict brain sensitivity to general anesthesia”. Biomedical Signal Processing and Control. 77: 103713. doi:10.1016/j.bspc.2022.103713. ISSN 1746-8094. S2CID 248488365.
- Montague, P. Read; Dolan, Raymond J.; Friston, Karl J.; Dayan, Peter (14 Dec 2011). “Computational psychiatry”. Trends in Cognitive Sciences. 16 (1): 72–80. doi:10.1016/j.tics.2011.11.018. PMC 3556822. PMID 22177032.
- Kato, Ayaka; Kunisato, Yoshihiko; Katahira, Kentaro; Okimura, Tsukasa; Yamashita, Yuichi (2020). “Computational Psychiatry Research Map (CPSYMAP): a new database for visualizing research papers”. Frontiers in Psychiatry. 11 (1360): 578706. doi:10.3389/fpsyt.2020.578706. PMC 7746554. PMID 33343418.
- Huys, Quentin J M; Maia, Tiago V; Frank, Michael J (2016). “Computational psychiatry as a bridge from neuroscience to clinical applications”. Nature Neuroscience. 19 (3): 404–413. doi:10.1038/nn.4238. PMC 5443409. PMID 26906507.
- Russell, John (21 March 2016). “Beyond von Neumann, Neuromorphic Computing Steadily Advances”.
- Calimera, Andrea; Macii, Enrico; Poncino, Massimo (2013-08-20). “The human brain project and neuromorphic computing”. Functional Neurology. 28 (3): 191–196. doi:10.11138/FNeur/2013.28.3.191 (inactive 1 November 2024). PMC 3812737. PMID 24139655.
Congratulations!
Our research team's study on the neural mechanisms of approximate number estimation, conducted with professional Go (Baduk) players, has been published.
The neural basis of intuitive approximate number system in board game Go (Baduk) experts. Lee T, Jo HJ, Kim M, Kwon JS. Sci Rep. 2025 May 12;15(1):16400. doi: 10.1038/s41598-025-98605-9.
Congratulations!
A paper from the ENIGMA international collaborative study, in which our research team participated, has been published.
Thalamo-cortical structural co-variation networks are related to familial risk for schizophrenia in the context of lower nuclei volume estimates in patients: an ENIGMA study. Lella A, Antonucci LA, Passiatore R, Bellantuono L, Selvaggi P, Popolizio T, Di Sciascio G, Saponaro A, Ricci P, Altamura M, Blasi G, Rampino A, Vriend C, Calhoun VD, Rootes-Murdy K, Goldman AL, Baeza I, Castro-Fornieles J, Sugranyes G, De la Serna E, Pomarol-Clotet E, Fatjó-Vilas M, Salvador R, Karuk A, Fuentes-Claramonte P, Glahn DC, Rodrigue AL, Blangero J, Wang L, Lee T, Einenkel KE, Hamers S, Gruber O, Preda A, Chung YC, Odkhuu S, Vallée C, Dazzan P, Marcelis M, Michielse S, Brosch K, Stein F, Nenadić I, Straube B, Thomas-Odenthal F, Kircher T, Carruthers S, Rossell SL, Sumner PJ, Van Rheenen TE, Demro C, Ramsay IS, Sponheim SR, Lencer R, Meinert S, Hahn T, Dannlowski U, Grotegerd D, Ciccarelli M, Iasevoli F, Pontillo G, Pearlson GD, Cobia D, Piras F, Banaj N, Vecchio D, Barendse MEA, van Haren NEM, Jo HJ, Sim K, Quidé Y, Green MJ, Slate R, Cecere G, Omlor W, Homan S, Homan P, Thomopoulos SI; Apulian Network on Risk for Psychosis; Turner JA, van Erp TGM, Thompson PM, Bertolino A, Pergola G. Biol Psychiatry. 2025 May 7:S0006-3223(25)01178-3. doi: 10.1016/j.biopsych.2025.03.027.
https://www.biologicalpsychiatryjournal.com/article/S0006-3223(25)01178-3/fulltext
Psychiatry I (정신과학 I)
Course overview: This course aims to build an integrated understanding of the pathophysiology of major psychiatric disorders from the perspectives of neuropsychology and brain function. It examines how the abnormalities of cognition, emotion, and behavior seen in disorders such as schizophrenia, depression, bipolar disorder, anxiety disorders, autism spectrum disorder, and cognitive disorders relate to changes in brain function, and through this systematically considers the relationships among brain, mind, and behavior.
The course explores how impairments of core cognitive functions such as sensory perception, attention, executive function, emotion regulation, and social cognition manifest in psychiatric disorders, focusing on the links between the brain circuits underlying each function and clinical symptoms. It also emphasizes an integrative, network-level approach rather than a focus on single brain regions, enabling a more refined understanding and interpretation of psychiatric disorders.
The course introduces modern approaches that seek to explain psychiatric disorders on a biological basis and teaches the core mechanisms of psychopathology grounded in neuropsychology and brain science. It places particular emphasis on developing interdisciplinary, integrative thinking and critical analytical skills.
Recommended prerequisite courses: None
Recommended follow-up course: Psychiatry II (정신과학 II)
Language of instruction: Korean
Textbooks and references:
- Kaplan & Sadock's Synopsis of Psychiatry, by Robert Boland, Marcia Verduin, et al. (2021)
- The American Psychiatric Association Publishing Textbook of Neuropsychiatry and Clinical Neurosciences, by David B. Arciniegas, Stuart C. Yudofsky, et al. (2018)
- The Neuropsychology of Mental Illness, by Stephen J. Wood, Nicholas B. Allen, et al. (2009)
- Understanding Neuropsychiatric Disorders: Insights from Neuroimaging, by Adam Sutton, Adrian Cherney, et al. (2010)
Notes for students: This course is run as a participatory seminar centered on advance preparation, student presentations, and discussion. Each week, students study the key content of the main textbook on the assigned topic in advance and read related recent research papers; in class, a brief commentary by the instructor is followed by discussion of the central issues. Students take turns presenting papers, offering a critical perspective on the topic and leading the discussion. Beyond the simple transfer of knowledge, the course emphasizes cultivating critical thinking and an interdisciplinary, integrative perspective grounded in a neuroscientific understanding of psychiatric disorders. At the end of the semester, students consolidate and extend what they have learned through a short review assignment or a seminar presentation on a topic of their own interest.
Course objectives: This course aims to foster an understanding of the pathophysiology of psychiatric disorders from neuropsychological and brain-function-based perspectives. Students learn to explain, at the level of brain circuits, the impairments of major cognitive and affective functions seen in psychiatric disorders, and develop the ability to critically interpret and discuss recent research findings. The course also aims to build the capacity to integrate diverse disciplinary perspectives in analyzing the neural mechanisms of psychiatric disorders and to apply neuroscience-based theories of psychopathology to students' own research topics and to clinical contexts.
Assessment: Advance-preparation assignments and participation: 30%; paper presentation and contribution to discussion: 40%; final review assignment or seminar presentation: 30%
Weekly lectures:
- Introduction to cognitive and affective neuroscience
- Developmental neuropsychology: normative trajectories and risk for psychiatric illness
- Process and mechanisms in neuropsychiatry: sensory-perceptual processes
- Process and mechanisms in neuropsychiatry: motor-executive processes
- Neurobiology of the emotion response: perception, experience and regulation
- Frontal asymmetry in emotion, personality and psychopathology
- Understanding language dysfunction in neuropsychiatric disorders
- Seminar I
- Associative memory
- Neural basis of attention
- Role of executive functions in psychiatric disorders
- Decision-making
- Neuropsychology of social cognition
- Seminar II
- Test and wrap-up