Efficient Neural Network Approaches: Implementation and Experimental Setup

Written by bayesianinference | Published 2024/04/15
Tech Story Tags: efficient-neural-network | neural-network-approaches | conditional-optimal-transport | static-cot | dynamic-cot | cot-maps | cot-problems | pcp-map-models

TL;DR: This paper presents two neural network approaches that approximate the solutions of static and dynamic conditional optimal transport problems, respectively.

This paper is available on arXiv under a CC 4.0 license.

Authors:

(1) Zheyu Oliver Wang, Department of Aeronautics and Astronautics, Massachusetts Institute of Technology, Cambridge, MA and [email protected];

(2) Ricardo Baptista, Computing + Mathematical Sciences, California Institute of Technology, Pasadena, CA and [email protected];

(3) Youssef Marzouk, Department of Aeronautics and Astronautics, Massachusetts Institute of Technology, Cambridge, MA and [email protected];

(4) Lars Ruthotto, Department of Mathematics, Emory University, Atlanta, GA and [email protected];

(5) Deepanshu Verma, Department of Mathematics, Emory University, Atlanta, GA and [email protected].

5. Implementation and Experimental Setup.

This section describes our implementations and experimental setups and provides guidance for applying our techniques to new problems.

Implementation. The scripts implementing our neural network approaches and running our numerical experiments are written in Python using PyTorch. For datasets that are not publicly available, we provide the binary files used in our experiments and the Python scripts for generating the data. We have published the code and data, along with detailed instructions for reproducing the results, in our main repository at https://github.com/EmoryMLIP/PCP-Map.git. Since the COT-Flow approach generalizes a previous approach, we have created a fork for this paper at https://github.com/EmoryMLIP/COT-Flow.git.

Hyperparameter selection. Choosing hyperparameters, such as network architectures and optimization parameters, is crucial in neural network approaches. In our experience, the choice of hyperparameters is often approach- and problem-dependent. Establishing rigorous mathematical principles for choosing the parameters in these models (as is now well known for regularizing convex optimization problems, based on results from high-dimensional statistics [35]) is an important area of future work and is beyond the scope of this paper. Nevertheless, we present an objective and robust way of identifying an effective combination of hyperparameters for our approaches.

We limit the search for optimal hyperparameters to the search space outlined in Table 2. Because the number of possible hyperparameter combinations is typically large, we employ a two-step procedure to identify effective ones. In the first step, called the pilot run, we randomly sample 50 or 100 combinations and train each for a relatively small number of steps, specified for each experiment in section 6. The space from which we sample hyperparameters is a subset of the search space defined in Table 2; the choice of subset depends on the properties of the training dataset and is explained in more detail in each experiment. Subsequently, we select the models that perform best on the validation set and continue training them for the desired number of epochs. This two-step process allows us to refine the hyperparameters and identify the most effective settings for the given task. We use the Adam optimizer for all optimization problems in the pilot and training runs. A minimal sketch of this procedure is given below.
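The following sketch illustrates the two-step search on a toy problem. The search space, the small fully connected network, the synthetic data, and the step counts are illustrative placeholders only; the actual search space is given in Table 2, and the pilot lengths, numbers of sampled combinations, and selected models vary per experiment as described in section 6.

```python
# Minimal sketch of the pilot-run + continued-training hyperparameter search.
# All names and values below are hypothetical stand-ins, not the paper's settings.
import random
import torch
import torch.nn as nn

# Hypothetical search space; the paper's actual ranges are listed in Table 2.
SEARCH_SPACE = {
    "width": [32, 64, 128],
    "depth": [2, 3, 4],
    "lr": [1e-2, 1e-3, 1e-4],
    "batch_size": [64, 128],
}

def sample_hyperparameters():
    """Draw one random combination from the search space."""
    return {name: random.choice(values) for name, values in SEARCH_SPACE.items()}

def build_model(hp, in_dim=10, out_dim=1):
    """Placeholder fully connected network standing in for PCP-Map / COT-Flow."""
    layers, dim = [], in_dim
    for _ in range(hp["depth"]):
        layers += [nn.Linear(dim, hp["width"]), nn.ReLU()]
        dim = hp["width"]
    layers.append(nn.Linear(dim, out_dim))
    return nn.Sequential(*layers)

def train(model, hp, data, steps):
    """Run a fixed number of Adam steps and return the validation loss."""
    x_train, y_train, x_val, y_val = data
    opt = torch.optim.Adam(model.parameters(), lr=hp["lr"])
    loss_fn = nn.MSELoss()
    for _ in range(steps):
        idx = torch.randint(0, x_train.shape[0], (hp["batch_size"],))
        opt.zero_grad()
        loss = loss_fn(model(x_train[idx]), y_train[idx])
        loss.backward()
        opt.step()
    with torch.no_grad():
        return loss_fn(model(x_val), y_val).item()

# Synthetic regression data for illustration only.
x = torch.randn(2000, 10)
y = x.sum(dim=1, keepdim=True)
data = (x[:1500], y[:1500], x[1500:], y[1500:])

# Step 1 (pilot run): sample combinations and train each briefly.
pilot = []
for _ in range(50):
    hp = sample_hyperparameters()
    model = build_model(hp)
    val_loss = train(model, hp, data, steps=100)
    pilot.append((val_loss, hp, model))

# Step 2: keep the best-performing models and continue training them.
pilot.sort(key=lambda t: t[0])
for val_loss, hp, model in pilot[:3]:
    final_loss = train(model, hp, data, steps=2000)
    print(hp, f"pilot val loss {val_loss:.4f} -> final val loss {final_loss:.4f}")
```

The selection criterion here is simply the validation loss after the short pilot run; in our experiments, the analogous role is played by the validation objective of each approach.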

The number of samples in the pilot runs, the number of models selected for the final training, and the number of repetitions can be adjusted based on the computational resources and expected complexity of the dataset. Further details for each experiment are provided in section 6.

