Statistical Data Analysis
Statistical analysis of ecological data is a highly specialised area that has been a mainstay of the work of
BioEcoSS Ltd over the last fifteen years. In undertaking analyses for clients and collaborators,
as well as advising students, BioEcoSS has always promoted the following principles of robust analysis:
- Data visualisation and exploration – always explore the distribution of data variables and
how they are related to each other before undertaking formal analyses.
This is the most overlooked phase of data analysis but can be the most crucial.
Apart from revealing missing data, input errors and outliers, this stage highlights variables
that may need transformation or categorisation, or that are highly cross-correlated.
- Careful and conservative hypothesis testing – only test hypotheses that can be supported by the data.
Be especially aware of multiple testing issues: modern statistical software makes it easy to run many tests,
and the more tests are run, the more likely it is that some will appear significant by chance alone.
- Be aware of spatial and temporal autocorrelation. This is a particular problem for ecological
survey and monitoring data, where spatial and temporal components of survey design may not be easily controlled
by the observer. Unlike in a correctly controlled experimental design, it is often easy to draw conclusions
about predictive relationships between variables which are, in fact, caused by other, unrecorded
and spatially distributed variables.
- Occam's razor – never use a more complex modelling process just for show
when a simpler one will suffice!
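As an illustration of the multiple testing point above, the following is a minimal sketch of the Holm (step-down Bonferroni) adjustment, one standard way of controlling the family-wise error rate when several hypotheses are tested on the same data. It is not BioEcoSS's own routine; the function name holm_adjust and the example p-values are hypothetical and chosen only for illustration.

```python
def holm_adjust(p_values):
    """Return Holm-adjusted p-values in the same order as the input.

    Illustrative sketch only: the function name and interface are
    assumptions, not part of any particular statistics package.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value is multiplied by (m - k + 1), capped at 1.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity: an adjusted value never falls below
        # the adjusted value of any smaller raw p-value.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


if __name__ == "__main__":
    # Hypothetical raw p-values from four separate tests.
    raw = [0.01, 0.04, 0.03, 0.005]
    for p, q in zip(raw, holm_adjust(raw)):
        print(f"raw p = {p:.3f}  Holm-adjusted p = {q:.3f}")
```

In this hypothetical example, all four raw p-values fall below 0.05, but only two remain below 0.05 after adjustment, which is the kind of conservative reading the principle above argues for.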
BioEcoSS has applied these principles to many analyses, utilising traditional statistical methods
such as ANOVA, linear and logistic regression, generalised linear modelling, principal components analysis
and cluster analysis. Furthermore, to overcome the limitations of some of these techniques on "dirty"
ecological data, BioEcoSS has developed many computer-intensive routines, such as Monte Carlo methods
and genetic algorithms, to uncover hidden relationships in multi-dimensional data.
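To give a flavour of what a computer-intensive Monte Carlo method can look like, here is a minimal sketch of a random permutation test for a difference in means between two groups. It is a generic textbook technique shown under stated assumptions, not one of the routines developed by BioEcoSS; the function name permutation_test, its parameters and the example counts are all hypothetical.

```python
import random


def permutation_test(group_a, group_b, n_permutations=9999, seed=0):
    """Monte Carlo permutation test for a difference in group means.

    Returns a two-sided p-value. Illustrative sketch only: the name,
    defaults and interface are assumptions made for this example.
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    n_a = len(group_a)
    pooled = list(group_a) + list(group_b)

    def mean_difference(values):
        first, second = values[:n_a], values[n_a:]
        return abs(sum(first) / len(first) - sum(second) / len(second))

    observed = mean_difference(pooled)
    count = 0
    for _ in range(n_permutations):
        # Reassign the group labels at random by shuffling the pooled data.
        rng.shuffle(pooled)
        # A small tolerance guards against floating-point rounding.
        if mean_difference(pooled) >= observed - 1e-12:
            count += 1
    # Count the observed arrangement itself so the p-value is never zero.
    return (count + 1) / (n_permutations + 1)


if __name__ == "__main__":
    # Hypothetical species counts from two sets of survey plots.
    site_a = [12, 15, 11, 18, 14, 16]
    site_b = [9, 8, 13, 7, 10, 11]
    print(f"permutation p-value: {permutation_test(site_a, site_b):.4f}")
```

Because the null distribution is built from the data themselves, a test of this kind makes no assumption of normality, which is one reason such methods suit "dirty" ecological data. It does still assume that observations are exchangeable, so it does not by itself address the autocorrelation issue raised above.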