© pixabay | sumanley

The missing link?

Geospatial and digital behavioral data

Survey data are the most commonly used type of data in the quantitative social sciences. They have been used to study a wide range of socially relevant topics. However, they also have specific limitations. One effective way to overcome these is to link survey data with other types of data, such as geospatial data or digital behavioral data.
This blog post discusses the benefits of linking survey data with other data types, as well as what researchers need to keep in mind when they want to do so. To this end, we present a general framework for organizing the data linking workflow.

Data Linking – What is it?

Data linking sounds like an abstract concept, but let’s try to be specific: For us, data linking means the enrichment of one focal dataset with information from another auxiliary dataset through specific identifiers that can be used to create the link between the data sources. In our case, the focal dataset is survey data, and its basic structure remains unchanged in the linking process. In the end, it is just extended by additional attributes (variables) from another data source.
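The enrichment step described above behaves like a left join: every survey record is kept, and matching attributes from the auxiliary data are attached via the shared identifier. The sketch below is a minimal illustration, not a reference implementation. It assumes a hypothetical regional identifier (`region_code`), hypothetical auxiliary variables (`unemployment_rate`, `noise_db`), and made-up toy values.

```python
from typing import Any

# Made-up toy data for illustration only.
# Focal dataset: one record per survey respondent.
survey = [
    {"resp_id": 1, "region_code": "R01", "life_satisfaction": 7},
    {"resp_id": 2, "region_code": "R02", "life_satisfaction": 5},
    {"resp_id": 3, "region_code": "R99", "life_satisfaction": 8},  # no match below
]

# Auxiliary dataset keyed by a (hypothetical) unique regional identifier.
region_stats = {
    "R01": {"unemployment_rate": 4.2, "noise_db": 55},
    "R02": {"unemployment_rate": 7.8, "noise_db": 62},
}


def link(
    focal: list[dict[str, Any]],
    auxiliary: dict[str, dict[str, Any]],
    key: str,
    attributes: list[str],
) -> list[dict[str, Any]]:
    """Left-join style enrichment: keep every focal record in its original
    order and extend it by the requested auxiliary attributes (None if the
    identifier has no match in the auxiliary data)."""
    linked = []
    for record in focal:
        match = auxiliary.get(record[key], {})
        enriched = dict(record)  # copy, so the focal dataset itself is not modified
        for attr in attributes:
            enriched[attr] = match.get(attr)
        linked.append(enriched)
    return linked


linked = link(survey, region_stats, "region_code", ["unemployment_rate", "noise_db"])
for row in linked:
    print(row)
```

Because the auxiliary data are keyed by a unique identifier, each respondent receives at most one value per attribute. This is what keeps the basic structure of the focal survey data unchanged: same rows, same order, just additional variables.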

Admittedly, the definition we use here is somewhat narrow. There are plenty of other approaches to data linking (often also called linkage), such as probabilistic linkage, which does not require unique identifiers. Here, however, we focus on an approach that is common among social scientists working with survey data.

Data Linking – What is it good for?

Survey data can provide a wealth of interesting insights. Nevertheless, they also have some clear limitations regarding what they can measure reliably. This is where data linking comes in. If we link survey data with other types of data, we can increase their analytic potential and address some of their limitations at the same time.

Survey data are typically based on self-reports. This can be both a key strength and a weakness. Though self-reports can be used to assess a huge variety of attitudes, opinions, and behaviors, they can also be unreliable. One issue in this context is social desirability. For example, people tend to report consuming substantially more news than they actually do, as consuming a lot of news is socially desirable.

Another issue is that respondents may simply not be able to remember things they are asked about. Imagine that you are asked how many times you checked your Twitter feed yesterday. In this case, you are probably able to provide a good answer. However, imagine you are asked how often you did that in the last week or month. In that case, your answers are quite likely going to be guesstimates.
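Linking survey responses to digital behavioral data is one way around this recall problem. The sketch below is a minimal, hypothetical illustration. It assumes an event-level tracking log in which each row is one observed feed visit with a respondent identifier and a date; real tracking data formats will differ. The log is first aggregated to one count per respondent for a fixed week and then linked to the survey, so that the measured count sits next to the self-report. All names (`reported_checks`, `tracked_ids`, `measured_checks`) and values are made up for illustration.

```python
from collections import Counter
from datetime import date
from typing import Any, Optional

# Hypothetical event-level tracking log: one row per observed feed visit.
log = [
    {"resp_id": 1, "day": date(2023, 3, 6)},
    {"resp_id": 1, "day": date(2023, 3, 7)},
    {"resp_id": 1, "day": date(2023, 3, 20)},  # outside the reference week
    {"resp_id": 2, "day": date(2023, 3, 8)},
]

# Hypothetical survey records with a self-reported weekly count.
survey = [
    {"resp_id": 1, "reported_checks": 10},
    {"resp_id": 2, "reported_checks": 3},
    {"resp_id": 3, "reported_checks": 5},
    {"resp_id": 4, "reported_checks": 7},
]

# Respondents who (hypothetically) consented to tracking.
tracked_ids = {1, 2, 3}


def count_events(events: list[dict[str, Any]], start: date, end: date) -> Counter:
    """Count events per respondent within [start, end], both ends inclusive."""
    return Counter(e["resp_id"] for e in events if start <= e["day"] <= end)


def attach_counts(survey_rows, counts, tracked, name):
    """Add the measured count as a new variable. Tracked respondents without
    events get 0; untracked respondents get None (missing, not zero)."""
    linked = []
    for row in survey_rows:
        value: Optional[int] = counts.get(row["resp_id"], 0) if row["resp_id"] in tracked else None
        linked.append({**row, name: value})
    return linked


counts = count_events(log, date(2023, 3, 6), date(2023, 3, 12))
linked = attach_counts(survey, counts, tracked_ids, "measured_checks")
for row in linked:
    print(row)
```

Distinguishing "tracked but no events" (0) from "not tracked" (None) matters here: treating non-participants as zero would bias any comparison between self-reported and measured behavior.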

Other things may simply not be known to respondents. For example, it is quite unlikely that people are able to accurately report the noise level in decibels in their neighborhood or the exact unemployment rate in their hometown. […]

Read the full blog post here.