Statistics and network analysis


Interview with Tom Snijders by Mario Diani

Mario Diani: Although it massively uses quantitative analytic methods and mathematical models, social network analysis has had a somewhat problematic relationship with statistics. Given your position as a professor of statistics with a very strong interest in social networks, you have found yourself in an excellent position to attempt to bridge the two fields. How to go about such integration has actually been at the core of your lecture in Riva. Could you remind us first of all of the main reasons why the integration of network analysis and statistics has proved so problematic?

Tom Snijders: In short, the statistical method usually focuses on the distribution of properties across a number of cases, while network analysis focuses on relations between the members of a given population. Moreover, statistics starts by drawing samples from a population, and then tests whether findings from those samples can be generalized back to that population. It operates on the assumption of independence between the units of the sample, namely, that the values taken by one case on any variable cannot affect in any way the values of other cases on the same variable. In contrast, social network analysis is not interested in studying general populations, but specific subgroups within them (for example, adopters of certain lifestyles, people suffering from a certain medical condition, or advocates of a certain political cause); it usually has a limited interest in generalizing the findings for those specific groups to the general population; and finally, it explicitly acknowledges that cases need not be independent, e.g., that one actor’s involvement in ties to certain actors significantly conditions the probability of ties to other actors. These are the main sources of the ‘tensions’ to which I referred in my lecture.

MD: What has been done to overcome such tensions and build a more fruitful integration between statistics and network analysis?

TS: A lot of progress has been made since the 1980s. It is difficult to summarize in a few sentences all the relevant contributions, yet I would single out the development of models that explain specific network configurations drawing largely, although not exclusively, on network properties (the so-called ‘endogenous effects’). Examples of this line of work are the Markov graphs developed by Frank and Strauss in the 1980s, the p* models developed in the 1990s by Wasserman, Pattison, and others, and more recently the broader family of Exponential Random Graph Models, able to represent many different kinds of dependencies simultaneously, and also yielding goodness-of-fit assessments. My own work has mostly been in this particular line of research, focusing mainly on the development of techniques and models for the longitudinal analysis of networks.

MD: What are in your view the next challenges for social network analysis? In particular, how is network thinking going to affect our treatment of social science data?

TS: We must first of all understand that we need a shift in the way we think of explanation. We could focus merely on the improvement in the amount of explained variance that one could get from the introduction of network variables into standard social science models. But that would not be quite enough. We should really ask, instead, how actors choose their networks, i.e., how they end up linked to certain social milieus rather than others. Only as a next step should we then ask to what extent such particular network configurations may affect actors’ prospects and behavior. Of course, social ties are determined by class, gender, ethnic and territorial origins, but there are margins of autonomy for actors too. In this sense, as social capital theory has long suggested, networks can be fruitfully viewed as dependent variables as much as independent ones. Our relations have an effect on us; therefore, we actively try to ‘improve’ our networks by focusing on the building of certain ties to the detriment of others.

MD: How do you think the integration of social network analysis and statistics will proceed?

TS: I can confidently predict that there will be a growing interest in modeling, and in the development of multilevel network studies that enable us to take better account of the variable impact of different social settings on network dynamics. I also expect greater attention to be paid to models suited to the analysis of large networks. At the same time, I see some risks associated with the growing use of statistics, in particular its mechanistic use. I fear that people might commit uncritically to statistical models, forgetting about the concepts and approaches traditionally associated with network analysis. What we need, instead, is a combination of the two, maintaining the interest in substantive problems but paying greater attention than in the past to issues of goodness of fit. Another area of development which I see as crucial has to do with appropriate software for network models. I have myself put a lot of effort in this direction, with the development of the package Siena for the longitudinal analysis of networks, and I intend to continue in the years to come.