The use and misuse of statistics in the early years of the People’s Republic
Branko Milanović is an economist specialised in development and inequality. His newest book is “Capitalism, Alone: The Future of the System That Rules the World”
Cross-posted from Branko’s Substack Site
The statistical work during the first fifteen years of the People’s Republic of China can be usefully, if somewhat simplistically, divided into three periods, as the excellent book “Making it Count” by Arunabh Ghosh argues. The first runs from the foundation of the People’s Republic in 1949 to approximately 1956. During that period the Chinese statistical system and the overall approach to statistics were heavily influenced by the experience of the Soviet Union. Statistics was seen as a handmaiden of planning. The implication of that view, as became clear at a very important conference in Moscow in 1954 (i.e., after Stalin’s death but before Khrushchev’s “thaw”), was that statistics is a social science and that its use is directly related to the tasks of industrialization and development. Its three key principles, as applied by Chinese statisticians too, were exhaustiveness, completeness, and objectivity. This meant that the entire phenomenon studied should be covered and documented, and that it should be done in a non-probabilistic, “objective”, almost descriptive way. The consequence was to relegate what is today the dominant view of the philosophy of statistics to the more abstract field of mathematical statistics, which hardly ever dealt with social phenomena. (The politicization of statistics in the Soviet Union went so far, Ghosh writes, that several prominent statisticians decided to move away from anything that might be politically controversial and to apply their statistical knowledge to the study of astronomy.)
The Soviet approach was soon found wanting in China. It placed extremely high demands on the providers of information and generated so much paperwork that the State Statistics Bureau (SSB) was drowning in data which, paradoxically, it did not know how to summarize into useful information for policy-makers. Thus two contradictory phenomena appeared: on the one hand, the providers of data complained of the enormous, and quasi-continuous, cost in effort and time; on the other hand, SSB was unable to fulfill its role. Ghosh shows that the problems were especially severe in the agricultural sector, composed of hundreds of thousands of villages and farms from which crucial information about yields and production was needed. The system was less inefficient for the much smaller and more concentrated sector of industrial enterprises.
With the political change in 1956 and 1957, leading to the break of close relations with the Soviet Union, there was also a change in the approach taken by Chinese statisticians. They turned much more towards India. India was then beginning its Second Five-Year Plan (1956-61), and it too saw statistics as an important planning tool. But rather than using exhaustive censuses it pursued, under the influence of its famous statistician P. S. Mahalanobis (Professor Ma to the Chinese), a system of random surveys. Such surveys, it was argued, were not only faster and cheaper than the alternatives but also produced statistics (for example, on grain or cotton yield) that were accurate and whose mean values had a bias that could be quantified.
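The claim that a random survey yields an estimate whose error can be quantified can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the village yields are invented numbers, and the function name `estimate_mean_yield` is hypothetical. It is not a reconstruction of the procedures used by Mahalanobis or SSB.

```python
import math
import random
import statistics


def estimate_mean_yield(population, sample_size, seed=0):
    """Estimate the mean of `population` from a simple random sample.

    Returns (sample_mean, standard_error). The standard error includes
    the finite-population correction, since sampling is without replacement.
    """
    rng = random.Random(seed)
    sample = rng.sample(population, sample_size)
    mean = statistics.mean(sample)
    fpc = math.sqrt((len(population) - sample_size) / (len(population) - 1))
    std_err = statistics.stdev(sample) / math.sqrt(sample_size) * fpc
    return mean, std_err


# Hypothetical grain yields (kg per hectare) for 10,000 villages.
# These are invented numbers, not historical data.
data_rng = random.Random(42)
village_yields = [data_rng.gauss(1500, 300) for _ in range(10_000)]

# A full enumeration ("census") of every village, for comparison.
true_mean = statistics.mean(village_yields)

# A survey of only 400 villages, chosen at random.
est_mean, std_err = estimate_mean_yield(village_yields, sample_size=400)

print(f"census mean: {true_mean:.1f}")
print(f"sample mean: {est_mean:.1f} +/- {1.96 * std_err:.1f} (approx. 95% interval)")
```

The survey visits 4 percent of the villages, yet it reports both an estimate and a statement of how far that estimate is likely to be from the census value, which is the practical advantage the paragraph above describes.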
Mahalanobis, who was personally and politically close to Nehru, was able to stimulate an interest in Indian statistics in Zhou Enlai and other Chinese officials during one of their visits to India. Propelled by the politics of Sino-Indian rapprochement in the wake of the Bandung conference, several years of close relations followed between the Indian Statistical Institute in Calcutta and SSB in Beijing. SSB began a cautious move away from the comprehensive enumerative approach toward the use of random sampling.
Despite the several practical advantages of random sampling, one should not disregard the philosophical differences between the two approaches. Ghosh’s book brings them out quite well. The comprehensive and exhaustive approach aims at a full and complete understanding of social reality. As in Borges’ short story “On Exactitude in Science”, its aim is nothing less than the replication of the reality it studies. The sampling approach is more limited in its objectives, more pragmatic and utilitarian, and holds that through randomization and stratification it is able to comprehend the same reality much more cheaply, quickly and purposefully.
Before we come to the third period, and the third approach, it is important to mention that throughout all this time yet another method was present in the background, one championed by Mao himself ever since he studied the social structure of rural areas of Hunan in 1927. Mao privileged the ethnographic method, with the researcher’s direct involvement. The ethnographic method is comprehensive but also purposeful, in the sense that its objective is not to study peasant society for its own sake but to find out, through careful observation of reality, what the differences in class interests are and which classes are likely to support or to oppose communist policies. The ethnographic approach advocated unmediated contact with, and direct knowledge of, the reality that is studied. That is not a feature that comprehensive enumeration or sampling normally exhibits. In those methods there is a distance between the people who supply information in factories and fields, those who collect it, and the statisticians in the center who decide how to present it to the public and the policy-makers.
The statistical methods used during the first and second periods were to some extent antithetical to Mao’s view that the producer of information should be personally involved with the object of his study. It is true that direct knowledge of the reality being studied is helpful, but applying Mao’s approach to large and complex economies, and to a China that at the time had more than 700 million citizens, is simply not feasible.
The third period begins with the anti-Rightist campaign in 1958 and the Great Leap Forward in 1959-60. It led to the abandonment of the earlier approaches in favor of “typical” or “purposeful” sampling, where researchers are not interested in the integrity of the phenomenon but in some of its typical or average features. In terms of the field with which I am familiar, the distribution of income and consumption, the typical approach does not aim to cover the entire spectrum of incomes received, i.e. the poor, the middle class, and the rich; rather it focuses on a priori selected types of households, which are studied in detail. In other words, the interest is in how various typical households are faring, not in how all households are doing. The typical approach has its origin in the early Soviet family budget surveys of the 1920s, which were concerned with rural-urban differences and whose objective was to look at how a typical industrial household compares with a typical agricultural household. (One can go even further back, to the mid-19th century English surveys of workers’ households.) There are two major problems with this approach: its neglect of the entire distribution, and its a priori selection of what is typical. Of course, the latter is driven by policy choices and, as we shall see, it produced disastrous effects during the Great Leap Forward.
Ghosh’s discussion of the use and misuse of statistics during the Great Leap Forward (GLF) is especially important. It is commonly argued that statistical information collapsed during the GLF because the center became disorganized and weakened by the placing of political correctness before professional skills, and because the collection of information became decentralized, with clear incentives to present only positive information and to suppress anything negative. Ghosh argues that this is not the full story. Ideological change in statistics was also to blame. Even if the political incentives of the suppliers of information to paint a much rosier picture are left aside, the methodological choice itself led to the misrepresentation of reality. During the GLF information was collected mostly from villages that were doing relatively well or were not affected by the worst effects of the famine. The data then presented to the leaders were, under such extraordinary circumstances, biased by the very design of the surveys. (Obviously, had circumstances been less dramatic, the consequences of using typical surveys would have been far less severe.)
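How a survey can be biased by design, even when every respondent reports honestly, can be shown with a small simulation. This is only a sketch under stated assumptions: the yields are invented, the share of badly hit villages (one fifth) is arbitrary, and “typical” selection is modelled crudely as visiting only villages with a yield above 1,200. None of these numbers comes from Ghosh’s book or from historical records.

```python
import random
import statistics

rng = random.Random(7)

# Hypothetical village grain yields (kg per hectare); invented numbers.
# Roughly one fifth of the villages suffer a severe shortfall.
yields = [
    rng.gauss(600, 100) if rng.random() < 0.2 else rng.gauss(1500, 200)
    for _ in range(5_000)
]

# The figure a complete enumeration would have produced.
true_mean = statistics.mean(yields)

# Random survey: every village has the same chance of being selected.
random_sample = rng.sample(yields, 200)
random_estimate = statistics.mean(random_sample)

# "Typical" survey (assumed selection rule): only villages judged to be
# doing well are visited, modelled here as a yield above 1,200.
typical_pool = [y for y in yields if y > 1200]
typical_sample = rng.sample(typical_pool, 200)
typical_estimate = statistics.mean(typical_sample)

print(f"all villages:     {true_mean:.0f}")
print(f"random sample:    {random_estimate:.0f}")
print(f"'typical' sample: {typical_estimate:.0f}")
```

Both surveys visit the same number of villages, and no one misreports anything. The overstatement in the second survey comes entirely from the rule that decides which villages are visited, and it grows as the share of badly hit villages grows, which is why the damage was greatest under the extraordinary circumstances of the famine.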
Ghosh’s book is an important contribution because the philosophy behind statistical research is very poorly understood, and the history of how statistics evolved to the position it now occupies is neither taught nor known, even among practitioners. Although the book looks at China specifically, it enables us not only to study the philosophy behind statistical work in China but also to see the ideological or philosophical underpinnings of much statistical work in general. Another contribution of the book, as the author mentions, is that it departs from the simplistic US- or Soviet-centric approach and looks at early instances of South-South cooperation and the role that the exchange of information, ideas and methods between India and China played in the 1950s. The reader is left in no doubt that had that cooperation continued, and had it not been derailed by the Great Leap Forward and the political incidents following the revolt in Tibet and the exile of the Dalai Lama to India, the Chinese statistical situation would have been much better in the 1970s than it was.
The book ends before the Cultural Revolution, which caused yet another, possibly even greater, shock to Chinese statistics. The number of statistical publications by SSB during the first several years of the Cultural Revolution fell practically to zero. That the statistical office, which in the 1950s employed more than 200,000 people across China, came to employ barely several hundred illustrates the extent of the disruption. The next stage, which continues to this day, is only hinted at: it begins in the early 1970s with some improvements in the collection of data, followed in 1981 by the publication of the first issue of the Chinese statistical yearbook.