Last updated on: 11/02/2015 at 11:00AM

General Information

Class Times: Monday, Wednesday and Friday 11:15 AM-12:20 PM
Class Room: 004, Kemeny Hall
Instructor: Nishant Malik, Office: 310 Kemeny Hall, Phone: 603-646-9020, Email: Nishant.Malik@dartmouth.edu
Office Hours: Monday, Wednesday and Friday 1:30 PM - 2:30 PM [or by appointment].
X-hours: Tuesday 12:00 PM - 12:50 PM [Will be used intermittently, at the instructor's discretion, for Python sessions or for review of course material. Do not schedule anything regular in this X-hour.]

Textbook

Title: Applied Linear Regression Models
Edition: 4th
Authors: Michael H. Kutner, Christopher J. Nachtsheim and John Neter
Publisher: McGraw Hill/Irwin
Click to access data sets used in the book
Important Note: This book is a subset of a larger and more expensive book, "Applied Linear Statistical Models" (5th edition) by Kutner, Nachtsheim, Neter, and Li (McGraw-Hill/Irwin). An old used copy of either of the following earlier editions will also work fine: "Applied Linear Regression Models" (3rd edition) by Neter, Kutner, Nachtsheim and Wasserman (Irwin), or its superset, "Applied Linear Statistical Models" (4th edition) by Neter, Kutner, Nachtsheim and Wasserman (Irwin).

Course Description

The linear regression model and its extension, the generalized linear model, are among the most popular and powerful data analysis techniques for studying statistical relationships. The course will present the theoretical background for linear models and their statistical properties, demonstrate how various problems and models reduce to the linear case, and explore the assumptions and limitations of linear models through derivation and simulation.

Syllabus
Roughly, the following topics will be covered during the course:
  • Simple linear regression
  • Multiple regression
  • Analysis of variance
  • Statistical model building strategies
  • Regression diagnostics
  • Analysis of complex data sets
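As a small taste of the first topic, simple linear regression by least squares can be sketched in a few lines of plain Python. This is only an illustrative sketch and not course material: the function name and the sample data are made up for the example, and the course itself may well use library routines instead.

```python
def fit_simple_linear_regression(x, y):
    """Return (intercept, slope) of the least squares line y = b0 + b1 * x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of squared deviations of x, and of cross products of x and y.
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = fit_simple_linear_regression(x, y)
print(f"intercept = {b0:.3f}, slope = {b1:.3f}")
```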

Prerequisite

MATH 10, another elementary statistics course, or permission of the instructor.

Grades

1. Two IN-CLASS EXAMS (1 hour long each) count for 15% each, i.e., these two exams together account for 30% of the total grade.
2. HOMEWORK accounts for 20% of the total grade.
3. The end-of-course PROJECT accounts for 15% of the total grade.
4. The FINAL EXAM (3 hours long) accounts for the remaining 35%.
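The weights above combine into a course total as a simple weighted sum. The sketch below assumes every component is scored on a 0-100 scale, which this page does not state; the component names and the example scores are hypothetical.

```python
# Weights taken from the grading scheme above, as fractions of the total grade.
WEIGHTS = {
    "exam_1": 0.15,
    "exam_2": 0.15,
    "homework": 0.20,
    "project": 0.15,
    "final": 0.35,
}


def course_total(scores):
    """Return the weighted total for component scores on a 0-100 scale (assumed)."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical scores, for illustration only.
example = {"exam_1": 80, "exam_2": 90, "homework": 95, "project": 85, "final": 88}
print(round(course_total(example), 2))
```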

Exam and Project Schedule

1. First in-class exam: October 7, 2015.
2. Second in-class exam: November 2, 2015.
3. Project submission deadline: November 16, 2015.
4. Final exam: November 20, 2015 (8 AM).

Resources

Reference books:
  • Statistical Models by A C Davison (Cambridge University Press, 2003). Excellent text with a very modern treatment of the subject material.
  • Linear Models with R by Julian J. Faraway (Chapman & Hall/CRC, 2015, 2nd Edition). Great text, though we will not be using R in the course.
Data sets:
  • The Data and Story Library
  • UCLA Statistics Course Datasets
  • Time Series Data Library
  • Miscellaneous Datasets page of Larry Winner, Department of Statistics, University of Florida. LINK
  • A collection of data sets accompanying the book "Understandable Statistics" by Charles Henry Brase and Corrinne Pellillo Brase (Cengage Learning, 7th Edition) LINK
  • Rdatasets is a collection of data sets distributed with R; these data sets can also be accessed in Python using statsmodels. The HTML listing of these data sets is available on this page.
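A minimal sketch of loading one of the Rdatasets tables through statsmodels is given below. It relies on `statsmodels.api.datasets.get_rdataset`, which downloads the data and therefore needs an internet connection. The wrapper name `load_rdataset` and the data set `mtcars` mentioned afterwards are only illustrative choices, not something this page prescribes.

```python
def load_rdataset(name, package="datasets"):
    """Return the named Rdatasets table as a pandas DataFrame."""
    if not name:
        raise ValueError("name must be a non-empty data set name")
    # Imported here so the function can be defined without statsmodels installed.
    import statsmodels.api as sm

    # get_rdataset fetches the table over the network and returns an object
    # whose .data attribute holds the DataFrame.
    dataset = sm.datasets.get_rdataset(name, package)
    return dataset.data
```

For example, `load_rdataset("mtcars")` should return the `mtcars` table from R's `datasets` package, provided statsmodels is installed and the network is reachable.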
Homework

Homework will be assigned once a week on Fridays and will be due the following Friday, unless explicitly specified otherwise by the instructor. Submit homework to the instructor after class or during office hours. Homework sheets will be uploaded periodically to this page. Homework problems marked with an asterisk (*) should be solved using an IPython notebook (Jupyter), and the resulting notebook should be submitted to the instructor in HTML format by uploading it at . If you are not able to upload the homework files to this website, please contact the instructor. The password for the DROPITTOME website will be provided in class. A homework file should be named hw < homework sheet number > < student's full name >, with no spaces or special characters. Late homework will not be graded.
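One possible reading of the file naming rule is sketched below. The exact layout is an assumption: the sketch concatenates "hw", the sheet number and the student's name with all non-alphanumeric characters removed, and appends an ".html" extension. Check with the instructor if a different layout is expected; the helper name and the sample student names are hypothetical.

```python
import re


def homework_filename(sheet_number, full_name, extension="html"):
    """Build a name such as 'hw3JaneDoe.html' (assumed layout)."""
    # Keep only ASCII letters and digits, dropping spaces and special characters.
    clean_name = re.sub(r"[^A-Za-z0-9]", "", full_name)
    if not clean_name:
        raise ValueError("full_name must contain at least one letter or digit")
    return f"hw{int(sheet_number)}{clean_name}.{extension}"


print(homework_filename(3, "Jane Doe"))
```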

Homework Sheets
Homework Sheet 1
Posted on: 09/18/2015 Due on: 09/25/2015 Solutions
Homework Sheet 2
Posted on: 09/25/2015 Due on: 10/02/2015 Solutions
Homework Sheet 3
Posted on: 10/02/2015 Due on: 10/09/2015 Solutions
Homework Sheet 4
Posted on: 10/09/2015 Due on: 10/16/2015 Solutions
Homework Sheet 5
Posted on: 10/16/2015 Due on: 10/23/2015 Solutions
Homework Sheet 6
Posted on: 10/23/2015 Due on: 10/30/2015 Solutions
Homework Sheet 7
Posted on: 10/30/2015 Due on: 11/11/2015*

Project

At the end of the course, each student has to submit a research project based on the material learned during the course. Students can choose to work on a project either individually or in a team of 2 to 3 students. The main criteria for grading a project will be the originality of the idea or problem and the complexity of the methods, concepts and techniques used. The project document should be submitted either as a PDF generated using LaTeX or as HTML generated using an IPython notebook (Jupyter). Students are highly encouraged to use the IPython notebook (Jupyter) option and to include interactive graphics in their submission. Please check out tools like mpld3 or Bokeh for creating interactive plots in an IPython notebook. Students are also expected to give a brief presentation to the class about their project.

Python

Python will be the programming language for the course. No prior knowledge of Python is expected.
Why Python?
Python is among the most popular high-level programming languages of our time. Its application areas are wide and include scientific and numerical computation. It has a large community of developers and contributors, so it is very well supported. In recent years it has gained popularity among data scientists with the inclusion of highly capable statistics and data analysis toolboxes.
How to install it?
Students are strongly advised to install the Anaconda Python distribution; it is free and very easy to install on most computers. It comes with all the packages we will need during this course. Another way to install Python and all the required packages is to install Canopy Express (the free Enthought Python Distribution). Students with no prior exposure to Python are discouraged from attempting a manual installation of Python or its packages; they should instead install either the Anaconda Python distribution or Canopy Express. Students who encounter problems installing Python should contact the instructor.
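After installing, a quick way to confirm that the main packages are available is to ask Python whether it can locate them. The sketch below uses only the standard library. The package list is an assumption about what the course will need (this page itself names only statsmodels, mpld3 and Bokeh), so adjust it as required.

```python
import importlib.util

# Assumed list of packages; not an official requirement list for the course.
REQUIRED = ["numpy", "scipy", "pandas", "matplotlib", "statsmodels"]


def check_packages(names):
    """Map each top-level package name to True if Python can locate it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


for name, found in check_packages(REQUIRED).items():
    print(f"{name:12s} {'OK' if found else 'MISSING'}")
```

Any package reported as MISSING can usually be added through the distribution's own package manager.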
Resources
Class Notes

1. BOX PLOT     2. SCATTER PLOT     3. LEAST SQUARES     4. DISTRIBUTIONS     5. INFERENCE     6. FUNCTIONS
7. CONFIDENCE BAND     8. CORRELATION     9. DIAGNOSTICS AND REMEDIAL MEASURES     10. TRANSFORMATIONS
11. LOWESS     12. BONFERRONI INTERVALS     13. LACK OF FIT

Special needs

Students with a diagnosed learning disability are encouraged to discuss with the instructor any appropriate accommodations that might be helpful. All discussions will remain confidential, although the Student Accessibility Services office may be consulted.

Honor Principle

You are encouraged to work together on homework. However, the final writeup should be your own. On exams, all work should be entirely your own; no consultation of other persons, printed works, or online sources is allowed without the instructor's explicit permission.