Statistical Data Analysis

Statistical analysis of ecological data is a highly specialised area that has been a mainstay of the work of BioEcoSS Ltd over the last fifteen years. In undertaking analyses for clients and collaborators, as well as advising students, BioEcoSS has always promoted the following principles of robust analysis:

  • Data visualisation and exploration – always explore the distributions of the variables and how they relate to each other before undertaking formal analyses. This is the most overlooked phase of data analysis but can be the most crucial. Apart from revealing missing data, data-entry errors and outliers, this stage highlights variables that may need transformation or categorisation, or that are highly cross-correlated.
  • Careful and conservative hypothesis testing – only test hypotheses that can be supported by the data. Be especially aware of multiple testing issues: modern statistical software makes it easy to run many tests at once, and the more tests are run, the more likely it is that some will appear significant by chance alone.
  • Be aware of spatial and temporal autocorrelation. This is a particular problem for ecological survey and monitoring data, where the spatial and temporal components of survey design may not be easily controlled by the observer. In contrast to a properly controlled experiment, it is easy with such data to infer predictive relationships between variables that are, in fact, caused by other, unrecorded and spatially distributed variables.
  • Occam's razor – never use a more complex modelling process, just for show, when a simpler one will suffice!
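
As an illustration of the multiple-testing point above, the following is a minimal sketch of one standard correction, the Holm step-down adjustment, written in plain Python. It is not a description of BioEcoSS's own routines; the function name `holm_adjust` and the p-values are hypothetical and chosen purely for illustration.

```python
def holm_adjust(p_values):
    """Return Holm step-down adjusted p-values, in the original order."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Adjusted values must never decrease as the raw p-values increase.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical raw p-values from four separate tests on the same data set.
raw = [0.01, 0.04, 0.03, 0.005]
adj = holm_adjust(raw)
for p, q in zip(raw, adj):
    print(f"raw p = {p:.3f}  adjusted p = {q:.3f}")
```

Comparing the adjusted values, rather than the raw ones, against the chosen significance level keeps the chance of any false positive across the whole family of tests at or below that level.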

BioEcoSS has applied these principles to many analyses, using traditional statistical methods such as ANOVA, linear and logistic regression, generalised linear modelling, principal components analysis and cluster analysis. Furthermore, to overcome the limitations of some of these techniques on "dirty" ecological data, BioEcoSS has developed many computer-intensive routines, such as Monte Carlo methods and genetic algorithms, to uncover hidden relationships in multi-dimensional data.
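
To give a flavour of what a computer-intensive Monte Carlo method looks like, the sketch below implements a simple permutation test for a difference in means between two groups. It is a generic textbook technique shown under stated assumptions, not one of the BioEcoSS routines; the function name `permutation_test`, its parameters and the counts are all hypothetical.

```python
import random


def permutation_test(group_a, group_b, n_permutations=9999, seed=0):
    """Monte Carlo p-value for the absolute difference in group means."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n_a = len(group_a)
    pooled = list(group_a) + list(group_b)

    def mean_difference(values):
        first, second = values[:n_a], values[n_a:]
        return abs(sum(first) / len(first) - sum(second) / len(second))

    observed = mean_difference(pooled)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        # Randomly reassign observations to the two groups.
        rng.shuffle(pooled)
        if mean_difference(pooled) >= observed:
            at_least_as_extreme += 1
    # Count the observed arrangement itself so the p-value is never zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# Made-up counts from two hypothetical sets of survey plots.
site_a = [12, 15, 9, 14, 11, 13]
site_b = [7, 8, 10, 6, 9, 5]
print(permutation_test(site_a, site_b))
```

Because the reference distribution is built from the data themselves, the test makes no assumption of normality, which is one reason such methods suit skewed or otherwise awkward ecological data. It does still assume that observations are exchangeable between groups, so it does not by itself deal with spatial or temporal autocorrelation.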