How to account for error and bias?



Despite the huge potential of CS to further scientific research, the utility of data collected by citizen scientists is still debated. According to Crall et al. (2010), the involvement of large numbers of volunteers with different levels of competence has been criticized as leading to lower-quality measurements and significant sources of bias. A common example of such bias is the general reluctance of citizen scientists to enter negative data (species searched for but not observed) or their tendency to sample only specimens of their favorite species. However, in many CS projects data collection resembles the methods used in landscape ecology, where within-observer variability is accounted for in the statistical analyses. In 2014, Bird and colleagues presented a study on the application of CS data in conservation ecology and policy that addresses data quality issues with an array of statistical tools. After describing the main data quality issues, the paper presents modeling approaches available for CS data, combined with a broad overview of the statistical ecology literature. Further, the authors consider how to address specific cases of error and bias in datasets by applying statistical approaches designed for meta-data.
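To see why the reluctance to enter negative data matters, consider a minimal simulation (a hypothetical sketch, not from the paper): if volunteers omit most surveys where the species was absent, the average reported count overestimates the true abundance.

```python
import random

random.seed(42)

# Hypothetical true survey outcomes: many absences (zeros), some sightings.
true_counts = [random.choice([0, 0, 0, 1, 2, 3]) for _ in range(1000)]

# Reported data: sightings are always entered, but each absence (zero)
# is entered only 20% of the time -- the "negative data" bias.
reported = [c for c in true_counts if c > 0 or random.random() < 0.2]

true_mean = sum(true_counts) / len(true_counts)
naive_mean = sum(reported) / len(reported)

print(f"true mean count:     {true_mean:.2f}")
print(f"naive reported mean: {naive_mean:.2f}")  # inflated by the missing zeros
```

The assumed 20% reporting rate for absences is illustrative; the qualitative result (upward bias in the naive mean) holds whenever zeros are under-reported.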

Among the many useful references regarding study design and the choice of statistical analyses, Bird et al. (2014) offer five recommendations. First, understand the trade-off between volunteer effort, data quality, and data quantity: project designers need to be fully aware of the scope of the project, including recruitment, training, maintenance of volunteers' interest, and the extent of data collection. Second, record survey information. By this the authors mean recording as much data as possible, even when it seems unnecessary, to help define and account for sources of random error and bias. Third, plot the data to identify potential inconsistencies. Visualizing the data helps assess whether data quality requirements are met and may lead to earlier detection of errors. Fourth, have a network of statisticians on your team; considering analytical approaches prior to sampling helps decide what type of data will have to be recorded. Finally, keep it simple. Rather than complicated models with many unknown parameters, favor simpler, more robust statistical approaches whose requirements can be met. Robust approaches are more likely to lead to reliable findings, resulting in high-quality research.
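The third recommendation, inspecting the data for inconsistencies, can be as simple as summarizing records per observer. The sketch below (hypothetical data and field names, not from the paper) screens for the negative-data bias by flagging observers whose rate of zero counts is far below the pooled rate:

```python
from collections import defaultdict

# Hypothetical records: (observer_id, count per survey).
records = [
    ("A", 0), ("A", 2), ("A", 0), ("A", 1), ("A", 0),
    ("B", 1), ("B", 3), ("B", 2), ("B", 1), ("B", 4),  # B never reports zeros
    ("C", 0), ("C", 0), ("C", 1), ("C", 2), ("C", 0),
]

by_observer = defaultdict(list)
for obs, count in records:
    by_observer[obs].append(count)

# Flag observers whose zero-report rate is less than half the pooled rate,
# a simple screen for reluctance to enter negative data.
pooled_zero_rate = sum(c == 0 for _, c in records) / len(records)
for obs, counts in sorted(by_observer.items()):
    rate = sum(c == 0 for c in counts) / len(counts)
    flag = "  <- check" if rate < pooled_zero_rate / 2 else ""
    print(f"observer {obs}: zero rate {rate:.2f}{flag}")
```

In practice one would plot such summaries (e.g. histograms of counts per observer) rather than print them, but the underlying idea is the same: simple per-observer diagnostics surface inconsistencies before modeling begins.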
The recent growth of CS datasets and the expanding toolbox of statistical methods create unprecedented opportunities to investigate and understand global-scale patterns and changes in species distributions and biodiversity. For a deeper insight into the content of the article, see the reference in the further reading section below.

Further reading:
Bird, T. J. et al. (2014) ‘Statistical solutions for error and bias in global citizen science datasets’, Biological Conservation, 173, pp. 144–154.

Crall, A. W. et al. (2010) ‘Improving and integrating data on invasive species collected by citizen scientists’, Biological Invasions, 12, pp. 3419–3428. doi: 10.1007/s10530-010-9740-9.
