Class Times: Monday, Wednesday and Friday 11:15 AM-12:20 PM
Class Room: 004, Kemeny Hall
Instructor: Nishant Malik, Office: 310 Kemeny Hall, Phone: 603-646-9020, Email: Nishant.Malik@dartmouth.edu
Office Hours: Monday, Wednesday and Friday 1:30 PM - 2:30 PM [or by appointment].
X-hours: Tuesday 12:00 PM - 12:50 PM [Will be used intermittently, at the instructor's discretion, for Python sessions, review of course material, etc. Do not schedule anything regular in this X-hour].
Title: Applied Linear Regression Models
Edition: 4th
Authors: Michael H. Kutner, Christopher J. Nachtsheim and John Neter
Publisher: McGraw Hill/Irwin
Click to access data sets used in the book
The linear regression model and its extension, the generalized linear model, are among the most popular and powerful data analysis techniques for studying statistical relationships. The course will present the theoretical background for linear models and their statistical properties, demonstrate how various problems and models reduce to the linear case, and explore the assumptions and limitations of linear models through derivation and simulation.
Roughly, the following topics will be covered during the course:
- Simple linear regression
- Multiple regression
- Analysis of variance
- Statistical model building strategies
- Regression diagnostics
- Analysis of complex data sets
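As a small illustration of studying linear models "through derivation and simulation", the sketch below simulates data from a known straight line and recovers the intercept and slope with the closed-form least squares estimates. The function name `fit_simple_linear_regression`, the true parameter values, and the sample size are assumptions chosen for this example only; they are not taken from the course material.

```python
import numpy as np


def fit_simple_linear_regression(x, y):
    """Return (b0, b1), the least squares intercept and slope for y ~ b0 + b1 * x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_centered = x - x.mean()
    # Slope: sum of cross-deviations divided by the sum of squared x-deviations.
    b1 = np.sum(x_centered * (y - y.mean())) / np.sum(x_centered ** 2)
    # Intercept: the fitted line passes through the point of means.
    b0 = y.mean() - b1 * x.mean()
    return b0, b1


# Simulate data from a known line with normal errors (all values are made up).
rng = np.random.default_rng(seed=0)
true_b0, true_b1, sigma = 2.0, 0.5, 1.0
x = np.linspace(0.0, 10.0, 200)
y = true_b0 + true_b1 * x + rng.normal(0.0, sigma, size=x.size)

b0_hat, b1_hat = fit_simple_linear_regression(x, y)
print(f"estimated intercept: {b0_hat:.3f}, estimated slope: {b1_hat:.3f}")
```

Rerunning the simulation with a different seed or a larger `sigma` gives a feel for how much the estimates vary from sample to sample.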
MATH 10, another elementary statistics course, or permission of the instructor.
1. Two in-class exams (1 hour long each), 15% each, i.e., these two exams account for 30% of the total grade.
2. Homework accounts for 20% of the total grade.
3. The end-of-course project accounts for 15% of the total grade.
4. The final exam (3 hours long) accounts for the remaining 35%.
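The weights above can be combined into a single course score as a weighted sum. The following is a minimal sketch; the assumption that every component is scored on a 0-100 scale, the component names, and the example scores are all hypothetical and only serve to show the arithmetic.

```python
# Weights taken from the grading scheme above, as fractions of the total grade.
WEIGHTS = {
    "in_class_exam_1": 0.15,
    "in_class_exam_2": 0.15,
    "homework": 0.20,
    "project": 0.15,
    "final_exam": 0.35,
}


def course_score(scores):
    """Weighted course score; each component is assumed to be on a 0-100 scale."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical student, for illustration only.
example_scores = {
    "in_class_exam_1": 80,
    "in_class_exam_2": 90,
    "homework": 95,
    "project": 85,
    "final_exam": 88,
}
print(round(course_score(example_scores), 2))
```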
1. First in-class exam: October 7, 2015.
2. Second in-class exam: November 2, 2015.
3. Project submission deadline: November 16, 2015.
4. Final exam: November 20, 2015 (8 AM).
Reference books:
- Statistical Models by A. C. Davison (Cambridge University Press, 2003). An excellent text with a very modern treatment of the subject material.
- Linear Models with R by Julian J. Faraway (Chapman & Hall/CRC, 2015, 2nd Edition). A great text, though we will not be using R in the course.
Data sets:
- The Data and Story Library
- UCLA Statistics Course Datasets
- Time Series Data Library
- Miscellaneous Datasets page of Larry Winner, Department of Statistics, University of Florida.
- A collection of data sets accompanying the book "Understandable Statistics" by Charles Henry Brase and Corrinne Pellillo Brase (Cengage Learning, 7th Edition).
- Rdatasets is a collection of data sets distributed with R. These data sets can also be accessed in Python through statsmodels. An HTML listing of the data sets is available on this page.
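As a sketch of how an Rdatasets table might be loaded in Python, the helper below wraps the statsmodels function `get_rdataset`. The helper name `load_rdataset` and the choice of the `mtcars` data set in the note that follows are assumptions made for illustration. The call downloads the data, so it needs an internet connection.

```python
def load_rdataset(name, package="datasets"):
    """Fetch one Rdatasets table as a pandas DataFrame (needs internet access)."""
    # Imported here so the helper can be defined even before statsmodels is installed.
    import statsmodels.api as sm

    dataset = sm.datasets.get_rdataset(name, package)
    # The returned object also carries the R documentation in dataset.__doc__.
    return dataset.data
```

For example, `load_rdataset("mtcars")` should return the `mtcars` table from R's `datasets` package, and calling `.head()` on the result shows its first rows.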
Homework will be assigned once a week on Fridays and will be due the following Friday, unless the instructor explicitly specifies otherwise. Submit homework to the instructor after class or during office hours. Homework sheets will be uploaded periodically to this page. Homework problems marked with an asterisk (*) should be solved using the IPython notebook (Jupyter), and the resulting notebook should be submitted to the instructor in HTML format by uploading it to the DROPITTOME website. If you are not able to upload your homework files to this website, please contact the instructor. The password for the DROPITTOME website will be provided in class. A homework file should be named hw < homework sheet number > < student's full name >, with no spaces or special characters.
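The naming rule can be sketched as a small helper. This is one reading of the rule, not an official tool: the function name `homework_filename`, the absence of any separator between the parts, and the `.html` extension are assumptions. Note that this sketch also drops accented letters, because it keeps only ASCII letters and digits.

```python
import re


def homework_filename(sheet_number, full_name, extension=".html"):
    """Build a homework file name of the form hw<sheet number><full name>."""
    # Keep only ASCII letters and digits, dropping spaces and special characters.
    clean_name = re.sub(r"[^A-Za-z0-9]", "", full_name)
    if not clean_name:
        raise ValueError("full_name must contain at least one letter or digit")
    return f"hw{int(sheet_number)}{clean_name}{extension}"


# Hypothetical student name, for illustration only.
print(homework_filename(3, "Jane Q. Doe"))  # hw3JaneQDoe.html
```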
At the end of the course, each student has to submit a research project based on the material learned during the course. Students can choose to work on a project either individually or in a team of 2 to 3 students. The main criteria for grading a project will be the originality of the idea or problem and the complexity of the methods, concepts and techniques used. The project document should be submitted either as a PDF generated using LaTeX or as HTML generated using the IPython notebook (Jupyter). Students are highly encouraged to use the IPython notebook (Jupyter) option and to include interactive graphics in their submission. Please check out tools like mpld3 or Bokeh for creating interactive plots in the IPython notebook. Students are also expected to give a brief presentation about their project to the class.
Python will be the programming language for the course. No prior knowledge of Python is expected.
Python is among the most popular high-level programming languages of our time. Its application areas are wide-ranging and include scientific and numerical computation. It has a large community of developers and contributors, so it is very well supported. In recent years it has gained popularity among data scientists with the addition of highly capable statistics and data analysis toolboxes.
Students are strongly advised to install the Anaconda Python distribution, which is free and very easy to install on most computers. It comes with all the packages we will need during this course. Another way to install Python and all the required packages is to install Canopy Express (the free Enthought Python Distribution). Students with no prior exposure to Python are discouraged from attempting a manual installation of Python or its packages; they should instead install either the Anaconda Python distribution or Canopy Express. Students who encounter problems installing Python should contact the instructor.
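After installing a distribution, a quick check like the one below can confirm that the main packages are visible to the Python interpreter. The package list is an assumption based on the tutorials and data sets mentioned on this page, not an official list of course requirements, and `missing_packages` is a hypothetical helper name.

```python
import importlib.util

# Assumed package list, based on the tutorials and data sets mentioned on this page.
REQUIRED_PACKAGES = ["numpy", "scipy", "matplotlib", "statsmodels", "IPython"]


def missing_packages(packages=REQUIRED_PACKAGES):
    """Return the listed packages that the current interpreter cannot find."""
    return [name for name in packages if importlib.util.find_spec(name) is None]


missing = missing_packages()
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All listed packages are installed.")
```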
Basic Python tutorials/books/notes/guides:
- A Byte of Python by Swaroop C H. This book is one of the best tutorials for beginners.
- Introduction to Python for Econometrics, Statistics and Data Analysis by Kevin Sheppard. Useful notes for the course.
- Introduction to Python for Computational Science and Engineering (A beginner's guide) by Hans Fangohr.
Tutorials for the packages we will be using in the course:
- NumPy Tutorial and Tentative NumPy Tutorial (in pdf)
- NumPy User Guide and NumPy reference
- SciPy Tutorial
- An introduction to Numpy and SciPy (in pdf)
- Matplotlib tutorial by Nicolas P. Rougier
- IPython notebook tutorial, Jupyter documentation and Notebook Gallery
1. BOX PLOT
2. SCATTER PLOT
3. LEAST SQUARES
4. DISTRIBUTIONS
5. INFERENCE
6. FUNCTIONS
7. CONFIDENCE BAND
8. CORRELATION
9. DIAGNOSTICS AND REMEDIAL MEASURES
10. TRANSFORMATIONS
11. LOWESS
12. BONFERRONI INTERVALS
13. LACK OF FIT
Students with a diagnosed learning disability are encouraged to discuss with the instructor any appropriate accommodations that might be helpful. All discussions will remain confidential, although the Student Accessibility Services office may be consulted.
You are encouraged to work together on homework. However, the final writeup should be your own. On exams, all work should be entirely your own; no consultation of other persons, printed works, or online sources is allowed without the instructor's explicit permission.