Journal Club
The SNAIL Journal Club meets bi-weekly to review papers spanning a variety of topics in neuroscience, AI, and NeuroAI. On this page, you'll find summaries of the papers we have discussed.
August 18th, 2023
Title: The neuroconnectionist research programme
Authors: Adrien Doerig, Rowan P. Sommers, Katja Seeliger, Blake Richards, Grace W. Lindsay, Konrad P. Kording, Talia Konkle, Marcel A. J. van Gerven & Tim C. Kietzmann
Journal / Conference: Nature Reviews Neuroscience
Background and Objectives:
The paper introduces neuroconnectionism as a progressive research programme that uses artificial neural networks and deep learning to model cognitive processes. The authors frame neuroconnectionism in the terms of Lakatos's philosophy of science: a hard core of assumptions surrounded by a protective belt of auxiliary research directions and experiments.
Key Ideas:
The main core of the neuroconnectionist research programme consists of two assumptions:
The brain is best represented by complex models that are distributed and iterative. These models should capture the computations happening in the brain and show how they give rise to complex behaviours and neural dynamics. Models should also be able to operate on sensory information from naturalistic environments, and show how adaptive processes produced these computations.
Artificial neural networks provide such a representation. They sit in the “Goldilocks zone” of biological abstraction: abstract enough to run at scale, yet still able to model complex cognitive tasks. The paper expands on many aspects of ANNs that make them effective, efficient models of complex brain function.
Implications:
Neuroconnectionism has become a central model for cognitive neuroscience, as it is able to model complex cognitive tasks from the behavioural level to the neuronal level. ANNs are able to carry out these complex tasks while remaining interpretable and biologically relevant, which allows scientists to create better models of the brain and gain a deeper understanding of cognitive processes. Future directions for neuroconnectionism include developing multitask networks, conducting more experiments in naturalistic experimental conditions, and further modelling of adaptive cognitive development.
August 4th, 2023
Title: Self-supervised video pretraining yields human-aligned visual representations
Authors: Nikhil Parthasarathy, S. M. Ali Eslami, João Carreira, Olivier J. Hénaff
Journal / Conference: Preprint
Background and Objectives:
The paper explores the question of how a neural network can learn to represent objects in a self-supervised manner, aligning more closely with human perception (for a definition of what is meant by alignment see the Methods section below). The authors propose that by employing self-supervised contrastive learning on video data, they can achieve more human-like object perception than was possible with previous pretraining methods that used static images.
Methods:
Dataset: "[...] we hypothesized that collecting a minimally-curated video dataset matched to the rough properties of ImageNet would be beneficial for learning a more general visual model from videos." To validate this hypothesis, the authors developed a data curation pipeline (VideoNet) to selectively filter online videos. The aim was to obtain video training data that more accurately reflects the distribution of categories found in ImageNet.
Self-supervised contrastive learning: As in other contrastive SSL algorithms, the definition of positive and negative pairs is crucial. Positive pairs were frames sampled from the same 2.56-second VideoNet clip, while negative pairs were frames drawn from different clips. On top of the natural temporal augmentation that arises from sampling within a video clip, the authors applied a series of standard image augmentations. They also introduced a novel multi-scale contrastive attention pooling method for aggregating features across the different views.
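The paper's exact pooling and loss details differ, but the contrastive objective underlying this family of methods can be sketched with a standard InfoNCE loss. In this minimal illustration (function names and the temperature value are our own, not taken from the paper), `z_a[i]` and `z_b[i]` are embeddings of two frames sampled from the same clip (a positive pair), and all cross-clip pairings act as negatives:

```python
import numpy as np

def info_nce_loss(z_a, z_b, temperature=0.1):
    """InfoNCE loss between two batches of embeddings.

    z_a[i] and z_b[i] are embeddings of two views of the same clip
    (positive pair); all cross-clip pairings serve as negatives.
    """
    # L2-normalize so dot products are cosine similarities
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature            # (N, N) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Positives lie on the diagonal: z_a[i] should match z_b[i]
    return -np.mean(np.diag(log_probs))
```

Minimizing this loss pulls embeddings of frames from the same clip together while pushing apart embeddings from different clips, which is how temporal proximity in video substitutes for the hand-crafted augmentations used in image-based contrastive methods.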
Key Results:
The proposed model, VITO, generalizes to other tasks better than existing video-pretrained models, and it remains competitive with models pretrained on static images.
VITO is found to be more resilient to distribution shifts, a key criterion used to measure its alignment with human vision. This robustness is shown in two benchmarks (ImageNet-Vid-Robust and ImageNet-A), where VITO outperforms other models by a considerable margin, particularly under the most challenging condition (IN-Vid pm10, i.e., accuracy under perturbations of up to ten neighbouring frames).
VITO also exhibits a closer resemblance to the spatial pattern of human attention during object recognition, even surpassing Harmonized, a model specifically trained to emulate human attention maps.
Evaluations conducted using a subset of the dataset proposed by Geirhos et al. (2021) show that VITO displays a significantly greater bias towards shape than other models in the comparison. In contrast, R3M, another contrastive learning model that uses video-language alignment on the Ego4D dataset, performs significantly worse, highlighting the significance of the training dataset.
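The shape-bias metric from the Geirhos et al. cue-conflict benchmark is simple to state: on images whose shape comes from one category and texture from another, shape bias is the fraction of shape-consistent decisions among trials where the model chose either the shape or the texture category. A minimal sketch (the function name and inputs are hypothetical, for illustration only):

```python
def shape_bias(preds, shape_labels, texture_labels):
    """Shape bias on cue-conflict images (Geirhos et al. style).

    preds: model's predicted category per image
    shape_labels / texture_labels: the conflicting shape and texture
    categories of each image. Trials where the model picked neither
    category are ignored.
    """
    shape_hits = sum(p == s for p, s in zip(preds, shape_labels))
    texture_hits = sum(p == t for p, t in zip(preds, texture_labels))
    return shape_hits / (shape_hits + texture_hits)
```

A value near 1.0 indicates human-like reliance on shape; typical ImageNet-trained CNNs score much lower because they lean on texture cues.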
Ablation experiments show that all of the proposed components contribute to the results, including the training data content, the natural temporal augmentation in the videos, and the multi-scale attention pooling. At least for generalization to semantic segmentation (the PASCAL benchmark), the content of the training dataset has the largest effect.
Implications:
Robustness to distributional shifts, the ability to generalize to new tasks, and alignment with human cognition (as gauged by attention map alignments and shape bias) may all be attainable through the appropriate pre-training paradigm. This paper posits that both the natural temporal augmentations found within videos and the specific content of the training data play significant roles in achieving these attributes.