The Journal of Chinese Sociology
On April 1, 2026, The Journal of Chinese Sociology published online the article "An analysis of heterogeneous treatment effects: new opportunities and challenges with machine learning techniques".
| About the Authors
Hu Anning
Professor, Department of Sociology, Fudan University
Research interests: sociology of culture, sociology of education, social research methods
Chen Yunsong
Professor, School of Sociology, Nanjing University
Research interests: computational sociology, digital humanities, social mentality, social governance, social networks
Wu Xiaogang
Director, Center for Applied Social and Economic Research, NYU Shanghai; Professor of Sociology, Faculty of Arts and Science, New York University
Research interests: educational inequality, education and development, education finance
Abstract
The investigation of heterogeneous treatment effects is a current focus for many empirical sociologists. This article considers causal random forests and Bayesian additive regression trees as methods to demonstrate how algorithmic approaches potentially transcend conventional model-form and covariate-selection restrictions, and can examine complex interactions between the treatment effect and covariates. These two methods, respectively, illustrate the ideas of “matching” and “simulation”, and provide estimates of the individual treatment effect. This enables scholars to examine the empirical distribution of treatment effects and investigate the determinants of their heterogeneity. However, algorithm-based methods can also pose new challenges. For instance, arbitrariness in parameter configuration and algorithm variation can undermine the consistency of empirical results.
Keywords
Heterogeneous treatment effects; Machine learning; Causal random forests; Bayesian additive regression trees
Problem statement
Empirical research in the social sciences often centers on the relationships among variables. With the increasing popularity of causal inference methods in the field, quantitative social research has gradually shifted its focus from identifying correlations to estimating causal effects (Hu 2012; Morgan and Winship 2015). In addition to estimating the average treatment effect, a growing number of scholars have turned their attention to the heterogeneity of treatment effects. This attention to heterogeneity has a solid sociological foundation. On the one hand, many meso-level theories in sociology are constructed around the distinct characteristics of subpopulations, emphasizing individual-level diversity. It is thus understandable that sociologists, when testing or extending these theories, must consider differences in treatment effects. On the other hand, from a practical standpoint, a large body of policy-oriented research is concerned with heterogeneous treatment effects across various groups (Heckman and Vytlacil 2001; Heckman and García 2017). This analytical logic parallels that of the emerging practice of precision medicine in biomedical research, which tailors treatment to specific types of patients. Such practice-oriented inquiries require researchers to address the heterogeneous nature of treatment effects.
Traditional regression models analyze heterogeneous treatment effects by incorporating interaction terms (Aiken et al. 1991). Subsequent methodological developments have increasingly relied on the estimation of propensity scores, transforming the investigation of heterogeneous treatment effects into the study of how treatment effects vary with individuals’ estimated propensity scores (Xie and Wu 2005; Wu 2008; Carneiro et al. 2010; Xie et al. 2012). While these approaches have demonstrated distinct strategies for estimating heterogeneous treatment effects, each comes with its own limitations. With the growing integration of machine learning techniques and causal inference in the social sciences, an emerging methodological trend is to investigate heterogeneous treatment effects using algorithmic methods.
Against this backdrop, this article aims to provide a systematic review of the methodological developments in social science research on heterogeneous treatment effects, tracing the progression from traditional linear models to recent machine learning algorithms and paying particular attention to the strengths and limitations of the respective approaches. Building on this review, the article focuses on two non-parametric tree-based algorithmic methods: causal random forests and Bayesian additive regression trees. It offers a detailed account of their algorithmic logic and explains how they overcome several limitations of conventional methods when analyzing heterogeneous treatment effects. The article further reflects on the potential challenges brought by emerging algorithm-based methods, such as compromised robustness of results stemming from differences in parameter settings and algorithmic designs. Such lack of robustness in the analysis of heterogeneous treatment effects may itself be described as the "heterogeneity of heterogeneity" problem. Finally, an empirical case on modeling the heterogeneity of returns to elite universities in China is presented to illustrate the advantages and limitations of these methodological approaches.
Traditional approaches to analyzing heterogeneous treatment effects: a methodological overview
Interaction terms in traditional regression models
Although interaction-term models are widely used, methodological inquiry has long questioned whether they can accurately capture heterogeneous treatment effects (Hainmueller et al. 2019). These concerns arise primarily from two sources. First, many covariates C may contribute to treatment-effect heterogeneity, yet within a given dataset it is not feasible to add an unlimited number of interaction terms to a linear model. As a result, the selection of interaction terms inevitably involves a degree of subjectivity, and at times arbitrariness. Second, the functional form of the interaction (e.g., including the squared or cubic terms of variable C, or interactions involving three or more variables) is often subjectively specified by the researcher. These specifications do not necessarily reflect the underlying data-generating process, and the complexity of real interaction structures is typically beyond the scope of conventional bivariate interaction analyses.
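The mechanics of the interaction-term approach can be illustrated with a small simulation. The sketch below (hypothetical data, Python with NumPy) fits the specification Y = b0 + b1·T + b2·C + b3·(T × C) by ordinary least squares; the coefficient on T × C is what captures how the treatment effect varies with the moderator C.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
T = rng.integers(0, 2, n)            # binary treatment
C = rng.normal(size=n)               # a single moderator
# Simulated outcome: the treatment effect is 1 + 0.5*C, i.e. heterogeneous in C
Y = 2.0 + (1.0 + 0.5 * C) * T + 0.3 * C + rng.normal(scale=0.5, size=n)

# OLS with an interaction term: Y = b0 + b1*T + b2*C + b3*(T*C)
X = np.column_stack([np.ones(n), T, C, T * C])
beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(beta)  # b3 should recover approximately 0.5, the slope of the effect in C
```

The point of the critique in the text is that this works only because the researcher happened to include the right moderator with the right functional form.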
Propensity score-oriented heterogeneous treatment effect analysis
The propensity score-oriented analysis of heterogeneous treatment effects has its own advantages. For instance, this approach does not examine the role of a specific variable C, but instead reduces the full set of C variables to a single dimension, namely the propensity score Z, and then investigates how the treatment effect varies across levels of Z. In this sense, the method overcomes the first limitation of interaction-term models noted above. Furthermore, the treatment effect and the propensity score form a two-dimensional space, so when their relationship goes beyond a linear specification, researchers can draw on semi-parametric or even non-parametric smoothing methods to address potential nonlinearities (Keele 2008). In this way, the second limitation of interaction-term analysis in conventional regression models is also addressed.
Specifically, Yu Xie and his collaborators developed three approaches to analyzing heterogeneous treatment effects based on propensity scores (Xie et al. 2012; Zhou and Xie 2020). The first is known as the stratification-multilevel method, which divides the estimated propensity scores into discrete intervals, then estimates the treatment effect within each stratum, and finally examines the variation across strata to assess the heterogeneity of the treatment effect. The second approach is known as the matching-smoothing method, which first performs propensity score matching to estimate the treatment effect for each matched pair. Subsequently, a smoothing curve is fitted to the series of pair-specific treatment effects to examine how the treatment effect changes with the propensity score. The third approach, referred to as the smoothing-differencing method, differs from the second by first fitting separate curves for the treated and control groups, depicting how the outcome variable Y changes with the propensity score. It then takes the difference between the two curves to estimate the heterogeneity in treatment effects. This series of propensity score-oriented methods developed by Xie and colleagues shares conceptual similarities with the marginal treatment effect framework proposed by economist James Heckman (Carneiro et al. 2010). For further discussion on the marginal treatment effect method, see Hu (2015) and Zhou and Xie (2019), which are not elaborated on here.
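To make the stratification-multilevel idea concrete, the following sketch uses simulated data (a simple within-stratum mean difference stands in for the full multilevel model of Xie et al.): estimate propensity scores with a logistic regression, bin them into quartiles, and compute the treatment effect within each stratum. An upward trend across strata would indicate positive selection.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 5000
C = rng.normal(size=n)
p = 1 / (1 + np.exp(-C))                       # true propensity rises with C
T = rng.binomial(1, p)
# Simulated outcome: the effect (1 + C) grows with the propensity to be treated
Y = T * (1 + C) + C + rng.normal(scale=0.5, size=n)

# Step 1: estimate propensity scores from observed covariates
ps = LogisticRegression().fit(C.reshape(-1, 1), T).predict_proba(C.reshape(-1, 1))[:, 1]

# Step 2: stratify on the estimated score, estimate the effect per stratum
edges = np.quantile(ps, [0.0, 0.25, 0.5, 0.75, 1.0])
effects = []
for lo, hi in zip(edges[:-1], edges[1:]):
    m = (ps >= lo) & (ps <= hi)
    effects.append(Y[m & (T == 1)].mean() - Y[m & (T == 0)].mean())
print(effects)  # an upward trend across strata signals positive selection
```

The matching-smoothing and smoothing-differencing variants replace step 2 with pair-level effects plus a smoother, or with separate outcome curves for treated and control units.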
Although the propensity score-oriented analysis of heterogeneous treatment effects addresses the limitations of interaction terms in regression models, it also has its own shortcomings. First, the estimation of the propensity score is subject to both model uncertainty and coefficient uncertainty (Hu 2017). Second, while summarizing multiple covariates into a single propensity score Z simplifies the analysis through dimensionality reduction, it also makes it difficult to identify which specific covariate C contributes to generating the heterogeneity in treatment effects. Lastly, both Xie’s and Heckman’s approaches focus primarily on describing how the treatment effect varies with the propensity score, but they are limited in analyzing what factors actually lead to such heterogeneity in treatment effects.
Algorithm-based machine learning methods: causal random forests and Bayesian additive regression trees
According to the classic distinction made by statistician Leo Breiman (2001), both interaction terms in linear regression models and propensity score-based analyses of heterogeneous treatment effects fall under the category of models based on stochastic generation of data. This analytical paradigm requires clear specification of the statistical model, and the analytical focus lies on specific statistics provided by the model (such as particular coefficients). In contrast, algorithm-based analytical tools do not make assumptions about the data-generation process. Instead, they apply specific algorithms to the data to “let the data speak” and uncover certain associations. While early algorithmic models were not widely adopted in the social sciences due to limitations in computing power and data availability, the increasing accessibility of computational resources has made it necessary to take seriously the potential role of algorithmic models in social science research.
One of the most prominent developments in this regard is the integration of causal inference techniques with machine learning algorithms, which represents a key frontier in methodological innovation within the social sciences. Building on earlier exploratory approaches, such as generalized additive modeling and partial linear regression, a new set of algorithmic models designed for causal inference emerged. This article focuses on two particular algorithm-based methods utilizing tree models to investigate heterogeneous treatment effects. They are causal random forests (Athey et al. 2019; Wager and Athey 2018) and Bayesian additive regression trees (Chipman et al. 2010; Hill et al. 2020). Since both methods are grounded in the logic of tree modeling, the following section provides a general overview of tree-based models.
Tree models and random forests
A “tree model” is a general term referencing a series of algorithmic methods based on data partitioning (Breiman et al. 1984). When the dependent variable Y is categorical, the model is typically referred to as a decision tree; when Y is a continuous variable, it is referred to as a regression tree. For ease of exposition, we refer to both as “tree models” throughout this paper.
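As a minimal illustration of this partitioning logic, the sketch below (simulated data; scikit-learn assumed available) fits a shallow regression tree to a step function. Each terminal node simply predicts the local mean of Y for the units that fall into it.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(2)
X = rng.uniform(-1, 1, size=(1000, 2))
# A step function in the first feature: exactly the kind of structure
# recursive partitioning recovers without any functional-form assumption
y = np.where(X[:, 0] > 0, 3.0, 1.0) + rng.normal(scale=0.1, size=1000)

tree = DecisionTreeRegressor(max_depth=2).fit(X, y)
pred = tree.predict(np.array([[0.5, 0.0], [-0.5, 0.0]]))
print(pred)  # roughly [3, 1]: each leaf predicts its local mean of y
```

A random forest repeats this fitting on bootstrap samples and averages the trees' predictions, which stabilizes the estimates.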
Causal random forests
Causal random forests can be viewed as a direct application of the random forest algorithm to causal inference problems (Athey et al. 2019; Wager and Athey 2018). The fundamental goal of this method is to maximize the variation in treatment effects across different tree nodes. More specifically, causal random forests exhibit distinct characteristics compared to traditional random forests in three key aspects: node splitting, model fitting, and treatment effect estimation.
Node splitting
Model fitting
Estimation of the treatment effect
Bayesian additive regression trees
Interpretability of tree models: variable importance measures
For empirical research in quantitative social sciences, scholars place great emphasis on the interpretability of models. In causal inference, the treatment variable and the outcome variable are typically well-defined. Therefore, model interpretability often centers on understanding the role of control variables (or covariates) in the estimation of causal relationships (Molnar 2020). In the case of tree-based models, each split at a node involves scanning through all covariates one by one. As a result, across multiple nodes, some covariates are used more frequently, while others are used less often. This difference in usage frequency essentially reflects a given covariate’s explanatory power for the outcome variable; the higher the explanatory power, the more frequently it is selected for node splits. Thus, by examining how often each covariate is reused across multiple tree models, we can assess its overall importance. In the machine learning literature, this importance of covariates is also referred to as “feature importance”.
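The following sketch shows one common operationalization of this idea: scikit-learn's impurity-based `feature_importances_` for a random forest (simulated data; implementations differ in whether they count split frequency, as in BART, or impurity reduction, as here).

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(3)
X = rng.normal(size=(1000, 5))
# Only the first two features matter for y; the other three are noise
y = 2.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=1000)

forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
print(forest.feature_importances_)  # normalized importances, summing to 1
```

The dominant feature is selected for splits far more often, and its importance score reflects that.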
It is important to note that the feature importance of covariates carries different meanings in causal random forests and Bayesian additive regression trees. In Bayesian additive regression trees, traditional tree fitting is applied, where the role of covariates lies in increasing the purity of the outcome variable Y within child nodes at each split. In contrast, causal random forests select covariates at each node to maximize the difference in estimated treatment effects across child nodes. In other words, important covariates in Bayesian additive regression trees are those that best differentiate the values of the outcome variable, while important covariates in causal random forests are those that best distinguish between treatment effects across nodes. This conceptual distinction in the definition of feature importance across the two models warrants particular attention.
New methodological tools, new opportunities and new challenges
Unlike traditional approaches such as regression interaction terms or propensity score-based analysis of heterogeneous treatment effects, both causal random forests and Bayesian additive regression trees rely on more complex tree-based algorithms to process data. These two methods offer new methodological tools for estimating heterogeneous treatment effects. Given their methodological features, each presents new opportunities for researchers in quantitative social sciences, while simultaneously introducing new challenges.
New opportunities: approximating individual treatment effects and their applications
Compared to traditional methods, one of the key advantages of causal random forests and Bayesian additive regression trees lies in their ability to provide an approximation of the individual treatment effect. A fundamental challenge in causal inference is that we cannot observe both the factual and the counterfactual outcomes for the same individual (Holland 1986). Precisely because of this limitation, conventional causal inference techniques typically estimate the average treatment effect for specific groups rather than the individual treatment effect.
Although the counterfactual outcome cannot be directly observed, it can be conceptualized as a missing value and imputed (Ding and Li 2018). In other words, we only need some method to fill in the missing counterfactual value; subtracting it from the observed factual outcome then yields an estimate of the individual treatment effect. For this imputation task, the existing literature offers two main strategies. One is "matching": matching aims to find comparison units that closely resemble the unit of interest but differ in their treatment assignment (Stuart 2010). The other is "simulation" (Abadie and Imbens 2011): simulation seeks to fit a comprehensive model for the outcome variable Y, through which the factors influencing Y can be identified and analyzed. As long as individual A conforms to the model, A's counterfactual outcome can be estimated by changing the value of the treatment variable T. In other words, the difference in predicted outcomes under different treatment conditions can be used to approximate the individual treatment effect.
Based on the above methodological discussion, it becomes clear that causal random forests adopt a matching strategy. By generating a series of tree models, each unit in the training set receives a weight that represents the probability of appearing in the same terminal node as the target individual across all trees. Since individuals assigned to the same node tend to have similar values across the covariates C, this weight essentially reflects the degree of proximity or “match” between the training unit and the individual of interest. The greater the weight, the more similar the unit is to the target individual, and the more influence it will have on estimating the individual treatment effect.
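This weight construction can be sketched with a standard regression forest standing in for a causal forest (the data and model are illustrative, not the grf/econml implementation): `apply()` returns each unit's terminal node in every tree, and a training unit's weight is the share of trees in which it lands in the same leaf as the target individual.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(4)
X = rng.normal(size=(500, 3))
y = X[:, 0] ** 2 + rng.normal(scale=0.1, size=500)

forest = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

# Leaf ids: rows = units, columns = trees
leaves_train = forest.apply(X)
target = X[:1]                        # the individual of interest
leaf_target = forest.apply(target)    # its terminal node in every tree

# Weight = share of trees in which a training unit shares the target's leaf,
# normalized to sum to one across the training set
w = (leaves_train == leaf_target).mean(axis=1)
w = w / w.sum()
print(w[:5])
```

Units with high weight are the forest's implicit "matches" for the target individual and dominate its effect estimate.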
In contrast, Bayesian additive regression trees employ a simulation strategy. Through the Bayesian framework, posterior distributions of parameters are obtained based on specific prior distributions, which define the structure of the additive regression trees. To estimate the individual treatment effect for a given individual A, A’s information is input into the model. The Bayesian additive regression trees then simulate the expected values of Y under different values of T, and the difference between these simulated outcomes yields the estimated individual treatment effect for A. This analysis relies on the assumption that the additive tree model has been sufficiently trained, allowing us to simulate the counterfactual outcome.
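The simulation strategy can be sketched in the same spirit, with an ordinary random forest standing in for the Bayesian additive regression trees (illustrative data; the logic, not the BART posterior machinery, is the point): fit one outcome model on (T, C), then predict every unit under T = 1 and under T = 0 and take the difference.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(5)
n = 4000
C = rng.normal(size=(n, 2))
T = rng.integers(0, 2, n)
tau = 1.0 + C[:, 0]                    # true individual treatment effect
Y = tau * T + C[:, 1] + rng.normal(scale=0.3, size=n)

# Fit one outcome model for Y on (T, C) ...
XT = np.column_stack([T, C])
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(XT, Y)

# ... then simulate each unit's outcome under both treatment states
X1 = np.column_stack([np.ones(n), C])
X0 = np.column_stack([np.zeros(n), C])
tau_hat = model.predict(X1) - model.predict(X0)

r = np.corrcoef(tau_hat, tau)[0, 1]
print(r)  # the estimates should track the true heterogeneous effect
```

In actual BART the two predictions are draws from a posterior, so each individual effect also comes with an uncertainty interval.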
So, what is the value of using causal random forests and Bayesian additive regression trees to approximate individual treatment effects when analyzing heterogeneous treatment effects? First, both are algorithmically constructed tree models, which minimizes human intervention in model specification and functional-form assumptions. As such, they overcome, to a large extent, the limitations imposed by regression-based interaction terms and propensity score-oriented approaches in modeling treatment-effect heterogeneity.
Furthermore, the construction of tree models (e.g., the determination of splitting points) systematically evaluates the various combinations of values of covariates (excluding T). Therefore, one distinctive advantage of both causal random forests and Bayesian additive regression trees is their capacity to comprehensively explore potential interactions between treatment variable T and the covariates. This level of exhaustive exploration is unattainable in traditional methods analyzing treatment effect heterogeneity.
Finally, the estimated individual treatment effects can themselves serve as objects of further analysis. As discussed earlier, conventional approaches such as regression-based interaction terms and propensity-score-based stratification tend to emphasize the description of heterogeneity rather than its explanation. In contrast, causal random forests and Bayesian additive regression trees help researchers estimate the size of the treatment effect for each individual. This, in turn, facilitates further investigation into what factors account for inter-individual differences, thereby allowing researchers to better explain the heterogeneity of treatment effects.
New challenges: the heterogeneity of heterogeneity
Although causal random forests and Bayesian additive regression trees offer new avenues for investigating heterogeneous treatment effects through the approximation of individual effects, they also introduce challenges for empirical researchers. A significant challenge can be referred to as the "heterogeneity of heterogeneity." The first "heterogeneity" here denotes the estimation of treatment-effect heterogeneity, whereas the second indicates inconsistencies in these estimates resulting from algorithmic differences.
There are two main reasons for the phenomenon of the “heterogeneity of heterogeneity.” On the one hand, compared to traditional statistical analysis, algorithm-based analytical tools require the configuration of a substantially larger number of algorithmic parameters. Although most algorithmic models provide default values, these defaults are not tailored to specific research problems and therefore cannot guarantee universal applicability. In such cases, different researchers may adopt different preferences for parameter settings. As a result, even when analyzing the same research question, differences in parameter configuration may lead to divergent empirical results. On the other hand, variations in results may also arise from differences between algorithms themselves. Among the various machine learning-based analytical techniques, algorithms occupy a central role in contrast to traditional models. In commercial applications beyond academia, there is even talk of algorithmic supremacy (O’Neil 2018). While it may still be premature to speak of algorithmic hegemony in the social sciences, there is no doubt that algorithms play a decisive role in shaping empirical results, and differences across algorithms can constitute a significant source of empirical heterogeneity.
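A toy version of this sensitivity check can be run directly (simulated data; an ordinary random forest and its `max_depth` parameter stand in for the algorithm-specific tuning knobs): re-estimate the individual effects under several parameter settings and correlate the resulting estimate vectors.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(6)
n = 2000
C = rng.normal(size=(n, 2))
T = rng.integers(0, 2, n)
Y = (1 + C[:, 0]) * T + C[:, 1] + rng.normal(scale=0.3, size=n)
XT = np.column_stack([T, C])
X1 = np.column_stack([np.ones(n), C])
X0 = np.column_stack([np.zeros(n), C])

# Re-estimate individual effects under three settings of one tuning parameter
estimates = {}
for depth in (2, 5, None):
    m = RandomForestRegressor(n_estimators=100, max_depth=depth,
                              random_state=0).fit(XT, Y)
    estimates[depth] = m.predict(X1) - m.predict(X0)

# Pairwise correlations of the estimate vectors across parameter settings:
# low off-diagonal values would signal "heterogeneity of heterogeneity"
r = np.corrcoef([estimates[d] for d in (2, 5, None)])
print(np.round(r, 2))
```

The same template, run across two different algorithms rather than two parameter settings, is what the empirical section below calls external heterogeneity.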
Empirical example
Research question and data
This article presents an empirical analysis of the heterogeneity of financial returns to elite university education in China: whether and how the income premium associated with attending elite universities, relative to ordinary institutions, varies across individuals (Hu and Vargas 2015). Data are drawn from the Beijing College Students Panel Survey (BCSPS), which contains detailed background information on students prior to college enrollment. These background variables serve as potential covariates in the analysis, helping to mitigate possible selection bias. Moreover, as a longitudinal dataset, the BCSPS includes follow-up information on college graduates' first job income after completing university. In the analysis below, elite universities are defined as Peking University, Tsinghua University, and Renmin University of China, three institutions that constitute distinct sampling frames within the BCSPS, thus ensuring a sufficient sample size for each. For more details on the BCSPS, see Wu (2016a, b).
Variables
In the following analysis, the treatment variable is whether the respondent graduated from Tsinghua University, Peking University, or Renmin University of China (1 = yes, 0 = no). The outcome variable is the individual's monthly income from the first job after graduation. In addition to these two variables, we include a series of potential covariates: gender (1 = female, 0 = male); ethnicity (1 = Han, 0 = ethnic minority); age; whether the respondent repeated the final year of high school (1 = yes, 0 = no); current academic year (1 = first year of college, 3 = third year); annual family income (log-transformed); number of siblings; father's education level (1 = no formal education, 2 = primary school, 3 = junior high school, 4 = high school, 5 = vocational/technical school, 6 = secondary specialized school, 7 = junior college, 8 = bachelor's degree, 9 = postgraduate and above); mother's education level (same scale as father's); whether the father is a member of the Communist Party of China (1 = yes, 0 = no); whether the mother is a member of the Communist Party of China (1 = yes, 0 = no); whether the father works full-time (1 = yes, 0 = no); whether the mother works full-time (1 = yes, 0 = no); ranking of the high school attended (1 = nationally key school, 2 = provincially key school, 3 = prefecture-level key school, 4 = county-level key school, 5 = non-key school); and, finally, region of residence prior to college enrollment (1 = eastern provinces, 2 = central provinces, 3 = western provinces).
Results from traditional analytical methods
As discussed above, this study examines the heterogeneity in income returns to elite universities, compared to regular universities. We first investigate whether the heterogeneity in returns is associated with the propensity score for attending an elite university (Brand and Xie 2010). In Table 1, Model I uses a series of background variables to fit a logistic regression model. Based on this model, we estimate the propensity score for each individual in the dataset. Model II then fits an ordinary least squares (OLS) regression model that incorporates an interaction term between the treatment variable and the propensity score. The results indicate that the interaction between elite university attendance and the propensity score is not statistically significant. Therefore, based solely on the regression model interaction term, there is no evidence that the treatment effect varies with the propensity score.
Figure 2 presents the results of three methods for analyzing heterogeneous treatment effects proposed by Xie and colleagues, along with the marginal treatment effect model developed by Heckman. The stratification-multilevel method indicates a clear pattern of positive selection. Individuals who are more likely to enter elite universities tend to receive higher returns to education (as shown by the upward-sloping trend). However, when examining the results of the matching-smoothing method and the smoothing-differencing method, no clear evidence of heterogeneous treatment effect is observed. Finally, the results from the marginal treatment effect model also support the conclusion of positive selection (note that the horizontal axis represents the resistance variable, which is conceptually the inverse of the propensity score).
In brief, neither the interaction terms in the regression model, the matching-smoothing method, nor the smoothing-differencing method provides evidence of heterogeneous treatment effects. However, both the stratification-multilevel method and the marginal treatment effect analysis indicate some degree of treatment effect heterogeneity. This divergence itself reflects how different analytical approaches can influence empirical conclusions. What, then, can we learn from the approximation of individual treatment effects? In what follows, we apply the causal random forests and the Bayesian additive regression trees, respectively, to conduct the analysis.
Approximation and application of individual treatment effects
Figure 3 displays three notable characteristics. First, the two distributions largely overlap and exhibit similar shapes, suggesting a relatively high level of consistency in the individual-level causal effects estimated by the causal random forests and the Bayesian additive regression trees. Second, the peaks of the two distributions differ. Along the X-axis, the mode (i.e., the value corresponding to the peak of the distribution) of the Bayesian additive regression trees is higher than that for the causal random forests. This indicates that the two methods yield different estimates for the most likely treatment effect. Third, both distributions reveal a considerable degree of dispersion, indicating strong heterogeneity in treatment effects across individuals, even when examining the same treatment, namely, the income return of elite universities.
Which covariates are most important in generating these estimates? To address this, we present the variable importance indicators of the covariates, as shown in Fig. 4.
Based on the individual-level estimates of the treatment effect, we can directly use a scatter plot to examine how the treatment effect varies with the propensity score. The corresponding results are shown in Fig. 5. Regardless of the analytical method used, there is a positive association between the estimated individual treatment effect and the propensity score (P < 0.001). In other words, the return to attending an elite university increases as the probability of entering an elite university increases, indicating the presence of a positive selection effect.
In the following sections, we further explore which specific covariates influence the heterogeneity of treatment effects. The results from the OLS model are presented in Table 2. Among the predictors, having more siblings, higher paternal educational attainment, and higher household income are significantly associated with a greater individual treatment effect. This suggests that individuals from more privileged family backgrounds tend to gain higher returns from attending elite universities than those from less advantaged families. However, students from nationally key high schools appear to receive lower returns from elite universities. This may be related to sample selection effects: for example, a considerable number of students from such high schools may study abroad after high school graduation or pursue further studies after completing degrees at elite domestic universities rather than entering the job market immediately. In such cases, elite university graduates who enter the labor market right after graduation may not be those best positioned to earn high incomes. In addition to these variables, maternal educational attainment and full-time employment are both statistically significant in the two models, although their estimated effects point in opposite directions.
The analysis above reveals notable differences between the two analytical methods. For instance, when estimating individual-level causal treatment effects, the results based on Bayesian additive regression trees indicate significant associations between several covariates (e.g., high school tier, grade level, region, father’s membership in the Communist Party of China and employment status) and the treatment effect. In contrast, the results based on causal random forests do not exhibit similar empirical patterns. This discrepancy may stem from differences in the underlying algorithmic logic, a point that will be further discussed in the following section on the “heterogeneity of heterogeneity.”
Another possible explanation lies in data limitations. Algorithm-based analytical techniques typically rely on “big data” to provide sufficient information for model training. Hence, the sample size of 2821 people used in this study may be insufficient for adequately training causal random forests and Bayesian additive regression trees. If this is the case, the trained models may lack precision, contributing to the observed differences between the two methods. To examine the potential impact of sample size, we adopt the idea of the bootstrap method. Specifically, we generate a new dataset of 100,000 observations using sampling with replacement from the original BCSPS data. The analysis shows that even when the sample size is expanded to 100,000, the associations between individual-level treatment effects and covariates estimated by the two methods still exhibit clear methodological differences. Based on this finding, it can be preliminarily concluded that the observed empirical discrepancies are more likely attributable to methodological differences than to sample size limitations.
The heterogeneity of heterogeneity
After presenting the advantages of algorithm-based approaches rooted in machine learning, this section turns to a key challenge posed by what we refer to as the “heterogeneity of heterogeneity.” This analysis first examines internal heterogeneity by assessing how changes in fundamental algorithm parameters affect the variability of empirical results (whereas the previous analyses were based on default parameter settings). For the causal random forests, we sequentially estimate several models: a basic model (with all parameters set to default); a variable-selection model (retaining variables whose importance scores, measured by the random forest feature-importance metric, exceed the average importance across all variables); an honest algorithm model (employing the honesty algorithm); and additional models using varying sample proportions. In total, we construct six models based on different algorithmic parameter settings. The estimated distributions of individual treatment effects and their interrelations are shown in Fig. 6.
[Fig. 6]
As shown in Fig. 6, although the estimated distributions of individual treatment effects differ slightly across different parameter settings, the results exhibit a high degree of correlation across model specifications, with correlation coefficients ranging from 0.79 to 0.98 (all statistically significant). Therefore, causal random forests display relatively low internal heterogeneity.
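The kind of cross-specification comparison summarized in Fig. 6 can be sketched as follows. The individual-treatment-effect (ITE) vectors below are simulated stand-ins (the actual estimates would come from the six causal random forest models); `np.corrcoef` computes the pairwise correlations:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2821

# Simulated stand-ins for the ITE estimates from six causal random forest
# specifications: each column is one model's ITE vector, built here as a
# noisy copy of a common signal.
signal = rng.normal(size=n)
ite = np.column_stack([signal + 0.3 * rng.normal(size=n) for _ in range(6)])

# Pairwise Pearson correlations across the six specifications, analogous
# to the correlation matrix summarized in Fig. 6.
corr = np.corrcoef(ite, rowvar=False)
print(corr.shape)        # (6, 6)
print(corr.min() > 0.5)  # True: strong correlations -> low internal heterogeneity
```

The same computation applies to the Bayesian additive regression tree specifications in Fig. 7 and to the cross-method comparison in Fig. 8; only the source of the ITE columns changes.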
As discussed above, the primary parameter for the Bayesian additive regression trees is the number of tree models, with the default setting specifying 200 trees. In addition to this baseline model, we also fitted Bayesian additive regression tree models with 5, 10, 50, 100, and 500 trees, respectively. The estimated individual treatment effects under these different specifications and their mutual correlations are presented in Fig. 7.
[Fig. 7]
Although all correlation coefficients in Fig. 7 are statistically significant, the estimated individual treatment effects across parameter configurations show only modest correlations. As such, Bayesian additive regression trees exhibit a high degree of internal heterogeneity.
External heterogeneity can be assessed by comparing the results of causal random forests and Bayesian additive regression trees. If external heterogeneity is low, the estimates from the two algorithms should be similar and thus highly correlated; otherwise, we have reason to believe that there is substantial external heterogeneity. Figure 8 presents the correlation matrix of the individual treatment effect estimates from the two methods. Clearly, the correlations are not high, indicating a significant level of external heterogeneity in the results produced by different algorithms.
[Fig. 8]
In sum, causal random forests demonstrate relatively low internal heterogeneity, whereas Bayesian additive regression trees show relatively high internal heterogeneity. The contrast between the two algorithms further reveals that algorithm-based analytical methods exhibit considerable external heterogeneity.
Conclusion
Empirical studies in the social sciences have shown that treatment effects exhibit heterogeneity due to individual differences. Traditionally, the analysis of heterogeneous treatment effects relies on interaction terms in regression models. However, this approach suffers from limitations related to model specification and variable selection. These limitations have prompted researchers to examine how treatment effects vary with changes in the propensity score. This propensity score-based approach overcomes the restrictions of regression interactions, yet it introduces its own issues: model dependence and estimation uncertainty in the specification of the propensity score. Moreover, because the propensity score summarizes covariates into a single index, it does not allow researchers to directly identify which specific variables contribute to the observed heterogeneity. This method, then, primarily illustrates the presence of heterogeneity rather than identifying its determinants.
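For illustration, the propensity-score-based approach can be sketched with simulated data in which the treatment effect grows with a known propensity score; all names and values here are hypothetical, not drawn from the paper:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 5000

# Simulated data in which the treatment effect grows with a known
# propensity score (true effect = 2 * ps).
ps = rng.uniform(0.1, 0.9, size=n)   # propensity to receive treatment
d = rng.binomial(1, ps)              # treatment assignment
y = 2.0 * ps * d + rng.normal(size=n)

# Stratify on the propensity score and estimate the effect within each
# stratum as the treated-minus-control mean difference in outcomes.
bins = np.digitize(ps, [0.3, 0.5, 0.7])
effects = [y[(bins == b) & (d == 1)].mean() - y[(bins == b) & (d == 0)].mean()
           for b in range(4)]
print(len(effects))  # 4 stratum-specific effect estimates
```

The stratum-level estimates reveal that the effect varies with the propensity score, but — as the text notes — they cannot say which covariates behind the score drive that variation.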
In light of these challenges, algorithm-driven machine learning methods have emerged as new analytical tools. Considering causal random forests and Bayesian additive regression trees, as detailed in this paper, it can be said that these methods do not require model pre-specification and thus avoid the constraints of parametric modeling. In addition, both methods are capable of fully capturing the complex interactions between the treatment effect and covariates. Causal random forests and Bayesian additive regression trees, respectively, represent the logic of “matching” and “simulation” in estimating individual treatment effects, thereby helping researchers identify both the empirical distribution and the determinants of heterogeneous treatment effects. However, these new tools also pose new challenges. Internal heterogeneity may arise from different parameter settings within a single algorithm, while external heterogeneity may emerge from differences between algorithms themselves.
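As a rough illustration of the “matching” logic, a minimal nearest-neighbor matching estimator of individual treatment effects can be sketched as follows. The data are simulated and the estimator is deliberately naive; causal random forests implement a far more refined, forest-based version of this idea:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 200, 3

# Simulated data: covariates X, binary treatment d, and an outcome y with
# a heterogeneous treatment effect (1 + X[:, 1]); all hypothetical.
X = rng.normal(size=(n, p))
d = rng.integers(0, 2, size=n)
y = X[:, 0] + d * (1.0 + X[:, 1]) + rng.normal(scale=0.1, size=n)

treated = np.where(d == 1)[0]
control = np.where(d == 0)[0]

# "Matching" in its simplest form: pair each treated unit with its nearest
# control on the covariates and difference the two outcomes.
def ite_by_matching(i):
    dists = np.linalg.norm(X[control] - X[i], axis=1)
    j = control[np.argmin(dists)]
    return y[i] - y[j]

ite = np.array([ite_by_matching(i) for i in treated])
print(ite.shape[0] == treated.shape[0])  # True: one ITE estimate per treated unit
```

Because the output is an effect estimate per individual rather than a single average, researchers can inspect its empirical distribution and regress it on covariates to probe the sources of heterogeneity.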
With advancements in computing power and the increasing accessibility of statistical software, the integration of machine learning methods into quantitative social science research has become increasingly feasible. This methodological development invites reflection and discussion on its implications for sociology as a discipline. Compared with conventional quantitative sociological approaches, such as regression models, machine learning is fundamentally algorithm-driven. It differs not only in model-building logic (i.e., whether the goal is to understand data-generating processes or to enhance prediction accuracy) but also in practical implementation, relying on pre-packaged algorithms versus researcher-defined parameters. Thus, machine learning methods can be viewed as a new set of tools for empirical sociologists, beyond conventional quantitative techniques. These tools can be used independently or to augment traditional approaches, for instance, by overcoming constraints related to model specification. At the same time, for sociology as a discipline, this also implies the need to update and revise method training. Furthermore, given the widespread application of machine learning in other fields, such as business analytics and urban planning, its incorporation into sociological research may also serve as a valuable means for promoting interdisciplinary collaboration and exchange.
This article has engaged in a series of discussions on the integration of machine learning and causal inference. However, algorithm-based analytical approaches represent only one direction of development in the era of computational social science. In addition to incorporating algorithmic models into quantitative analysis, another hallmark of computational sociology is the analysis of large-scale unstructured data and the exploration of emergent patterns within complex models. How these new directions collectively shape the disciplinary characteristics and future trajectory of quantitative social science thus calls for further in-depth exploration and debate.
Read and download the full text for free
Cite this article
Hu, A., Chen, Y. & Wu, X. An analysis of heterogeneous treatment effects: new opportunities and challenges with machine learning techniques. J. Chin. Sociol. 13, 7 (2026). https://doi.org/10.1186/s40711-026-00257-3
https://link.springer.com/article/10.1186/s40711-026-00257-3
This article represents the authors' views only, not the position of the journal.
That wraps up this issue's JCS post!
Check in regularly for lecture and call-for-papers announcements, recommended reads, hot-topic tracking, and themed discussions.
On the academic road, JCS grows with you!
About JCS
The Journal of Chinese Sociology (JCS) was founded in October 2014 by the Institute of Sociology, Chinese Academy of Social Sciences. As the first English-language sociology journal in mainland China, JCS is committed to building a first-class international platform for academic exchange and collaboration between Chinese sociologists and their colleagues abroad. The journal is published by Springer Nature, the world's largest scientific-journal publishing group, and is supported by a strong editorial board of leading sociologists from China and abroad; it uses double-blind peer review and an open-access publishing model. JCS has been indexed in ESCI since May 2021. In 2022, its CiteScore was 2.0 (Q2), ranking 94th among 262 journals in the social sciences category (top 36%). In 2023, JCS received its first impact factor, 1.5 (Q3), in Clarivate's Journal Citation Reports (JCR). Its latest (2025) impact factor is 1.3, placing it in the top 53% of sociology journals worldwide (Q3).
Please consider submitting to The Journal of Chinese Sociology!
Official website:
https://journalofchinesesociology.springeropen.com