Stats Syllabus

CPaT Modeling and Statistics Syllabus –  Spring 2013
last revised April 21, 2013
last revisions bolded

This part of the program explores methods for studying complex phenomena and exposes students to a range of research design and data analysis methods, as well as to some simple modeling and visualization techniques and tools. The primary focus is on the application and interpretation of statistical methods including graphical and tabular summaries, distributions, confidence intervals, t-tests, analysis of variance (ANOVA), Chi-square tests, linear regression, multivariate statistics (if time), and both non-parametric and resampling approaches to these statistical methods. We will also address data management, use software to conduct statistical analyses and test hypotheses, and complete weekly labs for hands-on experience in data analysis, modeling, and visualization.

Learning objectives include:

  1. experimental design, types of variables, and research design, including articulating null and alternative hypotheses,
  2. data management, including recognizing what kinds of data are in a given file, and data archives,
  3. which statistical tests apply to which kinds of data and which software tools to use for those tests,
  4. interpretation and reporting of statistical tests as research results,
  5. what a scientific model is and how to build a simple one,
  6. scientific visualization.

We will generally have two lectures per week (Monday and Wednesday), and one lab (Tuesday morning).  In general, reading assignments should be done by the Tuesday Lab, and lab assignments will be due one week after the lab.

We suggest you work with a partner during labs, but the written Lab Assignment (due as hardcopy one week following your lab day) should be an individual effort. Plan to spend time outside class to complete your lab assignments! You might find it convenient to work in the computer center or CAL. Because learning statistics is a linear activity, with each week building on the last, we will not accept late work, except under dire circumstances.

Each lab will begin with a closed-book quiz, conducted via Moodle in the Computer Lab (you may use your own laptop or a lab machine). Quizzes will cover the reading assigned for that week as well as material from prior weeks. They will be held during the first 15 minutes of lab, starting promptly at 9:30am. If you are late to lab, or miss the lab, you will miss your chance to take the quiz. If you must miss lab, we might be able to arrange for you to take it on your honor later. Once you have taken the quiz in lab, you are free to retake it as many times as you like to master the material, but we will record your first (and possibly last) score.

We will have a midterm and a final; each will have an in-class portion (via Moodle) and a take home part (due one week later).

Help Sessions: Kara Karboski, an MES master’s student and statistics tutor, will be available (in addition to faculty) to help you during lab sessions. Robyn Andrusyszn, also an MES master’s student, will conduct a help session each week (probably in the CAL). If you don’t use these help sessions, we reserve the right to cancel them.

Text: Nicholas J. Gotelli and Aaron M. Ellison, A Primer of Ecological Statistics.

Additional resource for students who have difficulty with math and seek an intuitive understanding of the statistics: William E. Magnusson and Guilherme Mourão, Statistics Without Math; other readings TBA. Students interested in the history of science might want to check out The Empire of Chance.

Statistical Software: For statistics, we will use MS Excel, JMP, the Resampling Stats plug-in for MS Excel, and (if time) R. For modeling, we will use Stella, and for visualization, Processing. This software is available for you in the computer centers. A free version of JMP is available through the CAL, and an inexpensive version of Resampling Stats is available via its web site. Note that not all programs are available for Macs. R is free software. We will likely use Excel for data management, though relational databases (such as Access) are really a better choice for actual data management!

 Faculty Roles: Judy will take responsibility for modeling/statistics lectures and lab periods.  Aaron will be available to help during the lab sessions. Richard (our mathematician!) will help us understand some of the mathematics behind the modeling and statistics.  Judy and Aaron will split responsibility for reading lab reports, and grading exams and quizzes.

Tentative Schedule:

Week 1, April 1: Introduction to Modeling. Read (by Tuesday 9:30!) Ch. 1-2 of Meadows, Thinking in Systems.

Week 2, April 8: Introduction to statistics. Read (by Tuesday 9:30) Gotelli Ch. 1. Read (by Wednesday 10am) Ch. 2, 3.

  • *First stats quiz in lab.
  • Intro to Statistics – Why?
  • Probability
  • Variables
  • Summary statistics
  • Graphing basics

Week 3, April 15: Experimental design & Hypothesis testing (Ch. 4, 5, 6), Manipulative vs. Natural experiments; Press vs. Pulse experiments

  • Replication and randomization
  • Sampling designs – an introduction
  • Hypothesis testing
  • Comparing two means
  • Parametric vs. nonparametric methods
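
For students curious about what a resampling approach to "comparing two means" looks like in practice, here is a minimal sketch in Python. Python is not one of the course packages, and the function name permutation_test, the group names, and the data values are all hypothetical, made up purely for illustration. The idea: if the null hypothesis is true, group labels are interchangeable, so we shuffle the labels many times and ask how often a shuffled difference in means is at least as large as the one we observed.

```python
import random
from statistics import mean


def permutation_test(a, b, n_resamples=10_000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Returns an approximate p-value: the share of label shufflings whose
    absolute difference in means is at least as large as the observed one.
    """
    rng = random.Random(seed)          # fixed seed so results are repeatable
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    n_a = len(a)
    at_least_as_extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)            # reassign group labels at random
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            at_least_as_extreme += 1
    # Add 1 to numerator and denominator so the observed arrangement counts
    # as one of the resamples and the p-value is never exactly zero.
    return (at_least_as_extreme + 1) / (n_resamples + 1)


# Hypothetical measurements for two groups (not real data).
control = [4.1, 5.0, 4.7, 5.3, 4.4, 4.9]
treated = [5.8, 6.1, 5.5, 6.4, 5.9, 6.2]

p_value = permutation_test(control, treated)
print(f"approximate p-value: {p_value:.4f}")
```

A small p-value here means that very few random relabelings produce a difference as large as the observed one. The parametric alternative (a t-test) answers the same question by assuming the data follow a particular distribution instead of shuffling.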

Week 4, April 22: Bestiary of Experimental & Sampling Designs; Intro to ANOVA (Ch. 7).
Advanced ANOVA (Ch. 10) – put off until Week 5
***No class on Wednesday, April 24 (Day of Absence)***

  • Exam Review
  • Comparing 2 means – t-test or ANOVA
  • Comparing many means – ANOVA
  • Parametric vs. nonparametric ANOVA
  • Advanced ANOVA topics – put off to Week 5

Week 5, April 29: Exam Review, data management and archiving
* In-lab midterm Moodle exam (Tuesday);
take home midterm exam given on Wednesday (due the following Tuesday).

  • More ANOVA
  • Spreadsheets and databases
  • Storing and cleaning data
  • Metadata

Week 6, May 6: Intro to Regression (Ch. 8, 9 [239-262]); Advanced Regression (Ch. 9); Linear relationships among variables

  • Simple Linear Regression
  • Correlation
  • If time: A Brief Introduction to:
    • Multiple Linear Regression
    • Structural Equation Modeling
    • Model Selection
    • If time: How to make tables and figures
  • Parametric vs. nonparametric methods
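
As a preview of simple linear regression and correlation, the following sketch computes a least-squares line and Pearson's r by hand in Python. This is only an illustration under stated assumptions: the function name fit_line and the data are hypothetical, and in the course we expect to do this in Excel or JMP rather than in code.

```python
from statistics import mean


def fit_line(x, y):
    """Least-squares fit of y = intercept + slope * x, plus Pearson's r."""
    x_bar, y_bar = mean(x), mean(y)
    # Sums of squares and cross-products around the means.
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r = sxy / (sxx * syy) ** 0.5       # correlation coefficient
    return slope, intercept, r


# Hypothetical paired observations (not real data).
x_values = [1.0, 2.0, 3.0, 4.0, 5.0]
y_values = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept, r = fit_line(x_values, y_values)
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}, r = {r:.3f}")
```

The slope describes how much y changes per unit of x, while r measures how tightly the points cluster around the line. Neither says anything about causation.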

 

Week 7, May 13: Categorical dependent variables & (if time) CART models (Ch. 11), Chi-square tests

  • Contingency tables
  • Goodness of fit tests
  • CART models (if time)
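
To make the contingency-table idea concrete, here is a minimal Python sketch of the Pearson Chi-square statistic for a test of independence. The function name chi_square_independence and the counts are hypothetical, and the sketch applies no continuity correction. It computes the count expected in each cell if rows and columns were independent, then sums (observed - expected)^2 / expected over all cells.

```python
def chi_square_independence(table):
    """Pearson Chi-square statistic and degrees of freedom for a table.

    `table` is a list of rows, each a list of observed counts.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


# Hypothetical 2 x 2 table of counts (not real data):
# rows = two habitats, columns = species present / absent.
counts = [[30, 10],
          [15, 25]]

chi2, df = chi_square_independence(counts)
print(f"Chi-square = {chi2:.3f} with {df} degree(s) of freedom")
```

For 1 degree of freedom, the usual 0.05 critical value is 3.841, so a statistic above that would lead us to reject the null hypothesis of independence.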

Week 8, May 20: Scientific Visualization Lab – Processing. 

Week 9, May 27: Monday – Holiday!  Tuesday and Wednesday – Review for Final, Final in-class Moodle Exam (Tuesday or Thursday in lab). Final take home exam (due Friday).

Topics we will skip:
Students with solid prior statistical work are encouraged to pursue these topics; ask Judy for labs!

  • Advanced ANOVA
  • Ordination & Multivariate Methods (Ch. 12)
  • The Measurement of Biodiversity (Ch. 13)
  • Multivariate analyses
  • MANOVA
  • Distance measurements
  • NMDS Ordination
  • MRPP and PerMANOVA
  • Other types of ordination
  • Mantel tests
  • Indicator Species Analysis
  • Species richness
  • Diversity indices
  • Rarefaction curves
  • Species accumulation curves

Week 10, June 3 – reserved for project work!

Evaluation Week, June 10: Conferences.
*Evergreen Graduation: Friday, June 14!

Some modeling/statistics projects for advanced stats students:

  1. Do one or more of the advanced topics (listed above).
  2. Learn R, and do something with it, e.g.,
    1. complete as many of the labs as you can in both JMP and R, then compare the two packages
    2. learn how to link R to Python, then write some ‘extensions’ to R
  3. Do further work in Stella, e.g., to implement a complex model and learn how to interpret results. Judy has some rather large models written in other languages.
  4. Conduct a research project on scientific visualization, perhaps specific to the science domain of interest to you.
  5. Learn Processing, or another visualization language, and complete a visualization project.
  6. Write a data and/or visualization plugin for the VISTAS software tool (see blogs.evergreen.edu/vistas).