Applied Data Science 3+2 Pathway

New College students in any area of concentration who have completed their second year and who wish to complete both the undergraduate and graduate programs in five years are encouraged to follow the accelerated 3+2 curriculum provided below. Students become eligible for this option only after entering the New College undergraduate program and showing strong academic performance. Applicants must satisfy the following minimum conditions before they can be admitted via the 3+2 pathway:

  • Complete 2 years of study with Satisfactory evaluations in all academic undertakings
  • Complete the prerequisite courses (see below)
  • Be recommended for the 3+2 pathway by a faculty member

Prerequisite Courses

The following courses must be completed during the first two years of undergraduate study:

  • MATH 2400 – Calculus I
  • CSCI 2200 – Introduction to Programming in Python
  • CSCI 3250 – Intermediate Python or CSCI 2400 – Object Oriented Programming
  • MATH 2200 – Probability 1 (Mod 1)
  • MATH 2320 – Linear Algebra

These courses also count toward the IDC 5100 Introduction to Data Science Bootcamp requirement in the graduate program.

IDC 5204 – Applied Statistics I: A statistics course focusing on descriptive and inferential statistics. Topics include probability theory, linear regression, confidence intervals, hypothesis testing, and modern approaches such as resampling. All methods are illustrated in R, with a focus on methods relevant for data science using industrial datasets.

IDC 5110 – Data Munging and Exploratory Data Analysis: A course on practical approaches for reshaping, reorganizing, and summarizing relationships in data through exploratory analysis. Principles and methods for preprocessing, normalizing, and validating data are covered, with an emphasis on collaborative and reproducible research.

IDC 5120 – Algorithms for Data Science: Fundamentals of algorithms and measures of performance. Taught in Python, the course includes an exploration of efficient algorithms for sorting and retrieving data, graph algorithms and combinatorial optimization, dynamic programming, randomized algorithms and approximation algorithms.

IDC 5130 – Programming for Data Science: Fundamentals of traditional database design and management. Types and comparison of databases, including SQL databases (e.g., PostgreSQL, SQLite), NoSQL databases, column-oriented databases (e.g., HBase), and document-oriented databases (e.g., MongoDB). Consistency, availability, scalability, efficiency, and performance in data retrieval and storage.

IDC 5295 – Industrial Workshops: This course offers content modules complementary to the regular coursework of the graduate program in applied data science. Examples include, but are not limited to, ethics, emerging or trending techniques in data science, domain-specific applications, industrial software platforms or tools, and professional certification modules and exams widely acknowledged in the industry.

IDC 5205 – Applied Statistics II: A course on statistical modeling, including multiple linear and logistic regression and, more generally, generalized linear models. Emphasis is placed on model formulation, building, assumptions, interpretation, prediction, and assessment. Implementation is carried out in R, with a focus on methods and models relevant for data science using industrial datasets.

IDC 5112 – Data Visualization & Communication: A project-centered introduction to the visual display of quantitative information for both knowledge discovery and the communication of results. Over the course of the semester, students develop a visual application in an area of their interest, using data collected from an industrial application or project.

IDC 5210 – Applied Machine Learning: A project-based course covering supervised and unsupervised learning, with an emphasis on working with real industrial data. Topics include Bayesian analysis and other specific learning paradigms, including regression, clustering, random forests, support vector machines, kernel methods, and neural networks.

IDC 5131 – Distributed Computing: Fundamentals concerning the design and maintenance of massively parallel data sets. Non-relational databases and their management. Algorithms for parallel architectures and associated software tools, including the MapReduce/Hadoop framework and Bigtable.

IDC 6200 – Advanced Statistical Modeling: A second statistical modeling course, with a mix of topics such as generalized additive models, models for longitudinal responses, time series models, survival analysis, statistical learning or Bayesian statistics, with a focus on models relevant for data science. Taught with a project-based focus using real industrial data in an applied business context.

IDC 6215 – Deep Learning and AI: Advanced topics in computing, such as image processing and object detection, text mining, natural language processing, recurrent neural networks, and reinforcement learning. Taught with a project-based focus using real industrial data in an applied business context.

IDC 6250 – Practical Data Science: Analysis of data and creation of a data science pipeline and deliverable for industry. Working in small groups, students analyze an industry-submitted data set starting with exploratory analysis, followed by statistical or machine learning-based model building, and the construction and presentation of a data product to an industry partner.

IDC 6294 – Industrial Practicum II: A full semester working in industry as part of a data science team, under the weekly supervision of a Data Science faculty member, to whom the student submits reports. This is the second and final stage of the industrial practicum, in which the student works at an industrial partner company or organization, or at a company of their choice. Performance is assessed by both a faculty advisor and a company supervisor.