Update: Evidence Action is terminating No Lean Season. We share more about this decision in this blog post.

Introduction and summary

Evidence Action’s Beta incubator identifies evidence-based interventions and incubates them for scale to reduce the burden of poverty for millions. As part of our incubation process, we subject programs to ‘testing at scale’ to ensure they continue to meet key criteria of impact and cost-effectiveness as they grow. No Lean Season, a late-stage program in the Beta incubation portfolio, provides small loans to poor, rural households for seasonal labor migration. Based on multiple rounds of rigorous research showing positive effects on migration and household consumption and income, the program was delivered and tested at scale for the first time in 2017. Performance monitoring revealed mixed results: program operations expanded substantially, but we observed some implementation challenges and take-up rates were lower than expected. An RCT-at-scale found that the program did not have the desired impact on inducing migration, and consequently did not increase income or consumption[1]. We believe that implementation-related issues – namely, delivery constraints and mistargeting – were the primary causes of these results. We have since adjusted the program design to reduce delivery constraints and improve targeting. Data from performance monitoring of the ongoing 2018 program suggest these issues have been substantially resolved, but the real test will be the results of the ongoing, second RCT-at-scale, which will begin to emerge in early 2019.

While the results of the 2017 RCT-at-scale are disappointing, No Lean Season’s experience illustrates precisely why we view “testing at scale” as a critical component of our Beta incubation process. Evidence Action believes firmly that resources should be directed to what works most effectively to improve opportunities for the world’s poor. No Lean Season is grounded in a robust evidence base that includes multiple studies[2] showing the effects of providing travel subsidies on seasonal migration and on increasing income, consumption, and welfare. These were, however, tightly controlled studies, conducted at a relatively small scale. And as we know, as programs expand, their impact can change — for a range of reasons we delineate below. With No Lean Season, testing at scale enables us to identify and respond to emerging issues. Consistent with our organizational values, we are putting ‘evidence first,’ and using the 2017 results to make significant program improvements and pivots. We are continuing to rigorously test to see if program improvements have generated the desired impacts, with results emerging in 2019. We have agreed with GiveWell that No Lean Season should not be a top charity in 2018. Until we assess these results, we will not be seeking additional funding for No Lean Season.

In this post, we share:

  1. An introduction to Evidence Action’s Beta Incubator, and why we test programs at scale as part of the incubation process

  2. An overview of No Lean Season, a program in the Beta incubation portfolio

  3. A description of the 2017 implementation round of No Lean Season, including:

    3.1 Operational results

    3.2 RCT-at-scale results

  4. A deeper dive into the results of testing No Lean Season at scale in 2017

  5. A summary of where we are now and what’s next for No Lean Season

  6. Why do we test at scale?

1. Evidence Action Beta: Taking research results to scale

Evidence Action’s Beta incubator builds evidence-based, cost-effective programs that can reach millions, tens of millions, and even hundreds of millions of people. We work with researchers, practitioners, technical experts, and others to translate proof-of-concept evidence into robust, viable programs. Along the way, we prototype and pressure-test concepts, build scalable models for real-world delivery, and optimize design and cost-effectiveness.

All interventions that we consider bringing into Beta to “incubate for scale” have already been subjected to randomized evaluations. When we innovate and create new interventions motivated by evidence generated in small-scale, experimental conditions, and when we incubate programs based on complex theories of change, we make a point of “testing at scale.” Initial evidence of impact and a compelling theory of change are both essential — but they are just the beginning.

As a program moves through the Beta incubation process, testing at scale[3] is important because:

  1. Scale brings the opportunity, and the responsibility, to question and to learn. As a program scales, so does the level of resources invested in it and the potential scope of its influence on real people and communities. Testing at scale helps us hold ourselves accountable for this growing responsibility, question our assumptions, and keep our eyes on the ultimate goal of cost-effectively reducing the burden of poverty for millions.

  2. Programs can change as they scale. As promising early-stage prototypes and pilots grow into programs, we re-design, standardize, and automate them for delivery by large institutions, such as governments and local NGOs, that have massive reach. Testing at scale helps us understand whether the scaled model of a program continues to deliver the intended impact.

  3. Delivery could change as programs scale. Increasing scale also requires using (or developing) new, more complex delivery platforms and implementation strategies. As a program moves away from tightly-controlled iterations delivered by researchers, implementation ‘as-intended’ and ‘as-delivered’ may diverge. Testing at scale helps us interrogate the entire impact pathway when a program is delivered in a real-world setting.

  4. Impact could change as programs scale. When the scope or intensity of a program grows substantially, the very nature of its impact may change as a result. Scaled programs may influence the environments in which they operate or may alter aspects of the underlying contexts that make them effective or necessary. As coverage expands, a program may be more or less effective in new geographies. Testing at scale provides an opportunity to understand these critical dynamics.

Rigorous, randomized impact evaluations are a cornerstone of testing at scale. But robust monitoring of a program’s performance and deep engagement with administrative data are also key (as illustrated by the analysis in Section 3.1). Combining rigorous evidence and monitoring gives us the best chance to unpack the results of testing at scale and to understand not only whether a program worked, but why — or why not.

2. No Lean Season: A program in the Beta portfolio

One such program in Evidence Action’s Beta incubator was motivated by Gharad Bryan, Shyamal Chowdhury, and Mushfiq Mobarak’s rigorous research demonstrating that providing small travel subsidies for temporary migration during the ‘lean season’ — the period between planting and harvesting when prices rise and job opportunities are scarce — could substantially increase the income and consumption of poor, rural households in northern Bangladesh. Since 2014, our Beta incubator, with several partners, has strived to transform these compelling results into No Lean Season: a scalable program aimed at reducing the burden of seasonal poverty for millions.

No Lean Season[4] has progressed steadily through a complex and rigorous incubation process with distinct but complementary programmatic and research agendas. On the research front, Yale Professor Mushfiq Mobarak, his co-authors, and our evaluation partner, Innovations for Poverty Action, have expanded the original evidence base substantially, conducting multiple studies to gain a deeper understanding of the direct and indirect effects of the intervention on households, villages, and markets[5].

Programmatically, we have worked closely with RDRS Bangladesh, a national NGO with a large microfinance program, to build and iterate on a scalable program design. In 2016, as part of an operational pilot, RDRS delivered a full implementation round of “version 1.0” of No Lean Season for the first time. It was clear that more work was needed to streamline, standardize, and automate the program, but the basic building blocks were there — a workable design, strong partnerships, an expanding and promising evidence base, encouraging cost-effectiveness projections, and potential for growth.

Many important questions remained. Would the program design and delivery platform hold up to massive growth? Would delivery at scale affect nearby rural villages and labor markets to which migrants travel? Would there be other unintended consequences, either positive or negative? Would actual costs stay within the targeted range of cost-effectiveness? And above all — would No Lean Season still generate the desired impact on migration, and ultimately on increasing income and consumption?

There was one way to answer these questions: we needed to test at scale. And in 2017, we set out to do just that.

3. No Lean Season in 2017: A plan to deliver, and test, at scale

By 2017, after substantial investments in standardizing and automating the program[6], No Lean Season was ready to be delivered, and tested, at scale for the first time.

Our goal was twofold: 1) to deliver No Lean Season as it might look in ‘steady state’ — by real, end-line implementers; using automated tools and standardized protocols; and covering a wider geographic area; and 2) to do so in a way that would enable robust monitoring of operational performance (how it was delivered) and rigorous evaluation of impact (the effect it had on migration, income and consumption).

As the table below summarizes, we expanded operations from 15 to 52 of RDRS’s Branch Offices, the end-line field units of their microfinance business and the scaffolding on which No Lean Season’s delivery is built. We aimed to survey as many as 180,000 households for eligibility — a sixfold increase in household coverage from 2016. Knowing that this was an ambitious target, we based our projections on a slightly more modest estimate of 130,000. Of these, we expected to extend eligibility to just over half of all households — approximately 72,000. Based on take-up rates (the percentage of eligible households who take out loans) in previous studies ranging from 37% to 56%, we expected to disburse between 30,000 and 40,000 loans in 2017.
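To make the planning arithmetic concrete, the sketch below reproduces it using the figures quoted above (the variable names and rounding are ours, for illustration only):

```python
# Back-of-the-envelope projection arithmetic for the 2017 round,
# using the figures quoted in the text.

surveyed_planned = 130_000   # modest planning estimate (the survey target was up to 180,000)
eligible_planned = 72_000    # "just over half" of the planned survey coverage

take_up_low, take_up_high = 0.37, 0.56  # take-up range observed in previous studies

loans_low = eligible_planned * take_up_low    # ~26,600 loans
loans_high = eligible_planned * take_up_high  # ~40,300 loans

# Roughly matches the stated planning range of 30,000 to 40,000 loans.
print(f"Projected loans: {loans_low:,.0f} to {loans_high:,.0f}")
```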

To assess No Lean Season’s operational performance, we would use the program’s extensive administrative data and insights from our independent qualitative and quantitative monitoring of implementation.

To evaluate No Lean Season’s impact, we would rely on the results of a rigorous ‘RCT-at-scale’ conducted by Innovations for Poverty Action and a research team led by Professor Mobarak.

As is often the case, these two sets of information are delivered on different timelines. On one hand, administrative and monitoring data are available first — and, in the case of No Lean Season, in close to real time as the program rolls out. On the other hand, the results of the impact evaluation are not available until at least four to six months after the annual implementation program cycle concludes, given that we aim to measure effects that may occur in the months after the program ends, and due to the time required to conduct household surveys in both treatment and control villages. In the sections that follow, we describe both types of results.

3.1 Operational performance in 2017 based on administrative and monitoring data: Increased coverage, but concerns about take-up

No Lean Season achieved a significant expansion in coverage and scale of operations in 2017. RDRS’s team of over 100 Migration Organizers delivered No Lean Season in almost 700 villages between August and December 2017. The team’s productivity and the efficiency gains from automation and digitization surpassed our expectations; over 205,000 households were surveyed for eligibility, far exceeding our original projection of 130,000. We expanded eligibility[7] to a higher proportion of households than in previous rounds, ultimately granting eligibility to 158,000 households — more than twice as many as we had originally planned for, and over nine times as many as in 2016. The RDRS team conducted over 2,300 “offer meetings” with groups of eligible households in their villages to tell them about the program, and followed up with many households through door-to-door visits.

The significant expansion during the household survey phase of the program initially boded well for the program’s potential cost-effectiveness and RDRS’s ability to manage scaled operations. As we will see, however, exceeding initial coverage targets and expanding eligibility may have created other challenges.

We expected that the demand for loans would be highest during the first half of the lean season, as labor opportunities grew scarce and prices began to rise in rural villages. As the chart below demonstrates, however, our administrative data shows that while interest in taking loans did indeed grow quickly and steadily, conversion into loan disbursements lagged.

Our analysis of the administrative and monitoring data flagged three areas of concern.

  1. Slower-than-expected pace and flow of loan disbursements. Many Branch Offices limited disbursements to a steady two days per week throughout the entire lean season, rather than responding dynamically to demand. Anecdotal reports of ‘cash-outs’ at Branch Offices, for example, indicated that there may have been other constraints to disbursement.

  2. Some suboptimal aspects of the program design and resource allocation. For example, two Migration Organizers had been assigned to each Branch Office, but the Branch Office catchment areas varied widely, creating differences of up to 300% in Migration Organizer workload.

  3. The total number of loans disbursed — 40,574 — was extremely close to the upper-bound target in our original projections for the 2017 round. Given the subsequent expansion of eligibility, this now reflected a take-up rate of only 26%, significantly lower than the expected range of 37%-56% observed previously. The source of this concern merits further explanation, which we provide below.

As described earlier in Section 3, we had based our planning for this work on our expectations of identifying 72,000 eligible households, not the 158,000 ultimately identified as eligible. Based on the originally planned eligibility and previously-observed take-up rates ranging from 37% to 56%, we projected demand for between 30,000 and 40,000 loans. With expanded eligibility, however, 40,000 loans translated to a much lower potential take-up rate of 26%. Our available funding, the contract and agreed operating budget with RDRS, and regulatory approvals from the Bangladeshi NGO Affairs Bureau[8] all reflected the original eligibility estimates.

We pivoted to address this unexpected increase in program coverage and eligibility and, with key support from one of No Lean Season’s primary funding partners, GiveWell, were able to raise new funds to allow for up to 80,000 additional loans, enabling take-up rates as high as 76%. We updated our contract and budget with RDRS to cover increases in eligibility and potential demand. But the program was well underway at this stage, and given the sometimes lengthy implementation lags, including the need for new regulatory approvals, it is unclear if this pivot was fully operationalized in a timely way.
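The take-up arithmetic before and after this pivot can be reconstructed from the figures above (a minimal sketch; the percentages are simple division, rounded as in the text):

```python
# Implied take-up rates before and after eligibility expanded,
# using the figures quoted in the text.

eligible_planned = 72_000                      # original planning estimate
eligible_actual = 158_000                      # households ultimately granted eligibility
loans_disbursed = 40_574                       # total 2017 disbursements
loans_fundable_after_pivot = 40_000 + 80_000   # original cap plus newly funded loans

print(f"Take-up against the plan: {loans_disbursed / eligible_planned:.0%}")             # ~56%
print(f"Take-up against actuals:  {loans_disbursed / eligible_actual:.0%}")              # ~26%
print(f"Max fundable after pivot: {loans_fundable_after_pivot / eligible_actual:.0%}")   # ~76%
```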

The similarity of the final loan disbursement figure (40,574) to the original upper-bound target (40,000) may or may not be a coincidence, and it is difficult to disaggregate the anchoring effects of initial targets from the other types of operational constraints we observed. But it is clear that the 2017 take-up rate was far lower than in previous years.

Take-up rates alone are not enough to assess the impact of the program on migration, income, and consumption — only the impact evaluation could do that, and at this point in time we didn’t yet have the results of the RCT-at-scale. But it was clear, as we began to plan for the 2018 implementation round, that we needed to make adjustments to the program in order to ensure that there were no artificial resource or capacity constraints on the availability of loans in the future.

Consistent with one of our core organizational values – “iterate, again” – we focused on three areas of improvement to the program for 2018.

  1. Redesigned our partnership agreement with RDRS and obtained a three-year regulatory approval to enable greater flexibility in predicting the demand for loans and to ensure that sufficient capital and capacity would be available on the ground, even if initial eligibility targets were exceeded, so that there would be no artificial constraints or caps on the number of loans available to meet demand.

  2. Redesigned program phases and protocols to finish making loan offers earlier and to disburse loans faster during the first half of the lean season, when demand is highest. To minimize any artificial constraints on disbursements, we updated key performance indicators to incentivize the right delivery priorities.

  3. Substantially improved the delivery platform and overall implementation structure. In particular, we:

    a. Worked with RDRS to refine the management structure and roles and responsibilities of the dedicated staff working on the program

    b. Improved training materials and the overall training strategy

    c. Changed how field officers were deployed so that their workloads would be more uniform

    d. Revamped the mobile phone application and backend software to enable more real-time access to data dashboards and management tools that can be used to dynamically respond to demand as the program rolls out.

These changes were all put in place as we embarked on the 2018 implementation round. Meanwhile, we eagerly awaited the results of the impact evaluation of the 2017 program.

3.2 Impact in 2017 based on the RCT-at-scale: No effect on migration, income, consumption

As the results of the “RCT-at-scale” began to emerge in mid-2018, it was clear that the program had not, in fact, had the desired impact on inducing migration. While earlier iterations of the program found that travel subsidies increased migration by 16 to 40 percentage points, the effect of the program on migration in 2017 was below 5 percentage points. “As a result,” the research team noted[9], “the program did not produce any detectable effects on outcomes secondary to migration, such as expenditures, caloric intake, food security, or income for treated households.” Given that additional migration was not induced, it follows logically that the knock-on effects of increasing income and consumption were not found. These results were disappointing, but not surprising, in light of our observations from administrative and monitoring data, and the lower-than-expected take-up rates.

4. Unpacking the results of testing No Lean Season at scale: 3 sets of hypotheses

So what happened? There are three categories of hypotheses that might explain the results of the 2017 implementation round of No Lean Season. The results might:

  1. Be due to 2017 being an anomalous year. Severe flooding in the program area, for example, might have reduced interest in migration.

  2. Reflect the true effect of the program at scale. Spillover effects (e.g., a huge increase in migration in many villages may have caused a decrease in migration in others) might have reduced the overall effectiveness of the program, or perhaps the program was expanded into villages that were unlikely to benefit from it. General economic growth in the region may have lowered the demand for seasonal migration.

  3. Be due to intentional or unintentional changes to program design and delivery that happened as a part of scaling up. Examples of these types of changes include capacity constraints on delivery, or changes to how loans were targeted to households.

Drawing on experimental and non-experimental data from “testing at scale,” Evidence Action and the research team have been actively exploring these hypotheses. In the sections that follow, we share a high-level overview of each category of hypotheses and end with our working hypothesis that the results are likely to be explained by one form of change to program design and delivery – mistargeting.

4.1 Hypothesis 1: 2017 was an anomalous year

It is possible that the lack of impact in 2017 was due to 2017 being an anomalous year. The most obvious difference was the extreme level of flooding. While the monsoon is a yearly phenomenon with expected flooding in the northwestern region, 2017 was a uniquely bad year, with reports calling it the worst in 40 years. Analysis reveals no difference in the effect on migration in flooded versus non-flooded areas; this leads us to conclude that flooding did not meaningfully contribute to the 2017 results. Flooding may have changed job availability in destination areas, but we have not explored this empirically.

The extreme flooding in 2017 draws attention to the fact that there may be other phenomena that can affect the delivery and the impact of No Lean Season in any given year — some of which can be anticipated and observed, and some of which cannot. In 2013, for example, unprecedented mass political strikes broke out in anticipation of elections and were the likely cause of a trial round failing to induce migration that year. Extreme or unexpected conditions affect all types of programs, and are certainly relevant to programs with an annual cadence like No Lean Season. We do not, however, find compelling evidence that the 2017 results can be sufficiently explained by this hypothesis.

4.2 Hypothesis 2: The 2017 results reflect the true effect of the program at scale

It is also possible that the 2017 results reflect the true effect of the program at scale for one or more of the following reasons:

a) When delivered as a scaled program, providing subsidies for seasonal migration ceases to have the effects observed when delivered at smaller scale;

b) When delivered in other geographies as its coverage area expands, the program doesn’t have similar impacts in those new areas as in those where it was previously tested; and

c) The broader underlying context may have changed – what worked in the past may no longer be effective today.

We explored each of these possibilities.

a) When delivered as a scaled program, providing subsidies for seasonal migration ceases to have the effects observed when delivered at smaller scale.

In 2017, for the first time in the multiple rounds of research on seasonal migration subsidies, the researchers collected data from “spillover villages,” those which were surrounded by treatment villages but not treated themselves. This was done to assess if making offers to hundreds of poor households across several villages could affect those living nearby.

The data shows that households in spillover villages are potentially less likely to migrate during the same period, even compared to villages in a pure control group. Though previous research found that having close neighbors and friends migrating increased one’s likelihood to migrate as well, this recent result indicates that being surrounded by hundreds of others who are migrating may have the opposite effect. That is, while individuals may be more likely to migrate if people to whom they are closely connected are migrating, they may be less likely to migrate if they perceive that hundreds of other people who would compete with them for the same jobs are also migrating. But this latter effect is small (likely under 5 percentage points) and not enough to explain the drop from an effect on migration of 16-40 percentage points observed in previous research rounds.

b) When delivered in other geographies as its coverage area expands, the program doesn’t have similar impacts in those new areas as in those where it was previously tested.

Scale-up expanded the program into new districts. The original studies were conducted in Lalmonirhat and Kurigram, two districts in northern Bangladesh; in 2017, the program expanded into seven additional districts. We found that base migration rates are higher in Lalmonirhat and Kurigram, and that in those districts, treated households were 7 percentage points more likely to migrate in 2017 than the control group. This still falls substantially short of the 16-40 percentage point increase observed in previous studies, indicating that the geographic expansion alone cannot explain the generally low take-up and effects on migration rates observed in 2017.

The differences observed between new and old districts, however, do remind us that migration subsidies might not work everywhere and that effects at scale may vary in different areas.

c) The broader underlying context may have changed – what worked in the past may no longer be effective today.

Lastly, it is possible that basic conditions underpinning why or how the program worked previously have changed. While there are many types of underlying trends that might be relevant, we focused on one that would have the most significant implications for the theory of change — if seasonal poverty itself was no longer a problem. Of course, if No Lean Season ‘didn’t work’ in 2017 because the problem it sought to address had been largely solved, this would be great news and a wonderful reason to discontinue it.

Economic conditions have indeed improved over the last 10 years in this part of Bangladesh, and, as employment opportunities increase in rural areas, migration (and travel subsidies for migration) may be less necessary as a coping strategy during the lean season. While there is some anecdotal evidence that there are more employment options in this part of Bangladesh today, and that real wages available to the poor have also increased, the researchers do not believe that this fully explains the low take-up in 2017. Why? The data, as demonstrated in the graph on the left, still reveals clear evidence of a lean season, when consumption drops substantially and over 20% of poor households report “regularly” skipping meals, suggesting there is still space and demand for migration and subsidies.

4.3 Hypothesis 3: The 2017 results were caused by intentional and unintentional changes to program design and delivery

The design and the delivery of ‘travel subsidies for seasonal migration’ underwent many changes — some intentional and some unintentional — as it grew from a tightly-controlled study to a prototype to a scaled program called No Lean Season. The third set of hypotheses focuses on changes to the design and delivery of the program that may have caused the lack of impact in 2017.

For the viability and cost-effectiveness of scale-up, the program had to increase the number and efficiency of Migration Organizers, the front-line staff at RDRS who deliver the program. In the 2014 RCT, for example, each of the front-line workers offered subsidies to around 200 households; in 2017, each Migration Organizer was responsible for making offers to an average of 1,500 households[10]. Though these increased workloads may have contributed to lower disbursement rates, they do not alone explain the results. Most Migration Organizers disbursed over 400 loans, over twice the number of households assigned per Migration Organizer in 2014, and several disbursed over 500.

In contrast to previous tightly controlled research stages, in which everyone randomly selected for an offer received the offer and full access to a disbursement, during the 2017 implementation round, RDRS may have set targets for the number of loans that each Migration Organizer should disburse – a theory we’ll call “mistargeting.” In this case, Migration Organizers would most easily meet targets by making offers to ‘regular migrants’ (individuals who migrate frequently, and are likely to do so even without a loan) first, given that these individuals are more eager to take up the loan, are easier to persuade, and require less effort. They are also most likely to comply with the loan condition (i.e., to migrate) and to repay the loan. If the set target is low relative to the number of potential migrants in the village(s), then the very people we are aiming to reach — ‘induced migrants’ (those who would migrate with the help of a loan, but not without one) — would be left out. The program would not induce migration, and yet targets and compliance metrics would be met by disbursements to the easier-to-reach ‘regular migrants’, making the program look ‘successful’ from an implementation perspective.
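To illustrate how a low target could eliminate induced migration even while every disbursement complies with the loan condition, consider a toy model (the household counts and behavioral assumptions below are our own, purely illustrative, and not program data):

```python
# Toy model of the 'mistargeting' mechanism described above.
# Illustrative assumptions: a village has 100 eligible households;
# 30 are 'regular migrants' (migrate with or without a loan), 40 are
# 'induced migrants' (migrate only with a loan), and 30 never migrate.
# Regular migrants take up offers first, as hypothesized in the text.

def migration_effect(target: int) -> float:
    regular, induced, never = 30, 40, 30
    loans_to_regular = min(target, regular)             # eager, low-effort recipients
    loans_to_induced = min(max(target - regular, 0), induced)

    migrants_with_program = regular + loans_to_induced  # regulars migrate regardless
    migrants_without_program = regular                  # counterfactual without loans
    total = regular + induced + never
    return (migrants_with_program - migrants_without_program) / total

for target in (20, 30, 50, 70):
    print(f"target={target:2d} loans -> induced migration: {migration_effect(target):+.0%}")

# Any target at or below the number of regular migrants (30) induces zero
# additional migration, even though every loan 'complies' and the program
# looks successful from an implementation perspective.
```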

Four conditions must be in place for this theory of mistargeting to explain the outcomes observed in the 2017 RCT-at-scale.

  1. Targets must be set for field staff.

  2. Field staff must pay attention to those targets.

  3. The targets must be too low relative to potential migration demand.

  4. ‘Regular migrants’ — or those who were most likely to have migrated anyway without a subsidy — should be most likely to receive the loans, either because they were targeted or because they were most eager to take them.

The results of a special-purpose survey of Migration Organizers who worked on the 2017 program indicate that targets were indeed set (condition 1), and that these targets were salient to the field staff (condition 2). In this survey, 85% of Migration Organizers reported that they were given specific targets[11], and the majority independently reported the exact same target of 450 loans to be disbursed. As we see in the chart on the right, the data strongly suggests that individual Migration Organizers decreased their activity level once they reached their own assigned targets.

If all or most Migration Organizers were given a target of 450 loans, this would cover less than one-third of all eligible households — a much lower proportion of households than had taken up the loans in previous years (condition 3). So where did this figure come from?

To understand the overall expectations of disbursement figures, we must recall the initial projections for the size of the 2017 program, and the fact that the original contract with RDRS set an upper-bound target of 40,000 loans to be disbursed, which translated to 55% of anticipated eligible households. Even though we raised additional funds and amended the contract when the actual number of eligible households exceeded expectations, it seems plausible that implementation targets had already been set and were not adjusted to reflect the expanded coverage and resources.
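A rough consistency check on the 450 figure (our arithmetic; the exact Migration Organizer headcount is stated only as “over 100” in Section 3.1, so we assume roughly 110 for illustration):

```python
# Rough consistency check on the reported 450-loan target.

migration_organizers = 110   # assumption: headcount stated only as "over 100"
per_mo_target = 450          # target independently reported by most Migration Organizers
eligible_actual = 158_000    # households ultimately granted eligibility
original_loan_cap = 40_000   # upper bound in the original RDRS contract

implied_total = migration_organizers * per_mo_target  # ~49,500 loans
print(f"Implied coverage of eligible households: {implied_total / eligible_actual:.0%}")  # ~31%
print(f"Original cap spread across MOs: {original_loan_cap / migration_organizers:.0f} loans each")  # ~364
```

Under these assumptions, the implied coverage is indeed under one-third of eligible households, and a per-organizer figure of a few hundred loans is consistent with targets having been derived from the original 40,000-loan plan rather than from the expanded eligibility.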

The fourth condition necessary for the ‘mistargeting theory’ to work is that ‘regular migrants’ should be more likely to receive loans, either because they were (directly or indirectly) targeted, or because they were most eager to receive loans and ‘lined up first’. A comparison of loan recipients in 2008 and 2017 shows that those who have a history of migration (i.e., have migrated at some point over the last three years) were indeed more likely than those without a recent history of migration to have received loans in 2017, suggesting that the fourth condition also held.

We therefore have suggestive evidence that all four conditions necessary for this mistargeting theory held.

After considering the various possible hypotheses and analyzing the rich operational and experimental data from testing at scale, we suspect that the third category of hypotheses — intentional or unintentional changes to program design and delivery as a part of scaling up — was the main source of the limited effects that were found in 2017. Other design changes necessary for scaling, such as increasing field officers’ workloads, may also have contributed to the results. Nonetheless, we believe that mistargeting (setting delivery targets and setting them too low) was the main culprit for the 2017 results, and is the explanation most consistent with the evidence base.

No single hypothesis may explain the entire difference in results as compared to previous years. Increases in workloads, expansion into new areas, and ambient changes in the environment may all contribute in some way. But our synthesis of the results does not lead us to conclude that migration subsidies categorically “do not work at scale;” rather, we believe that delivery constraints and mistargeting were the most likely explanations for the 2017 results of testing No Lean Season at scale. This is particularly notable because improving delivery efficiency and removing the use of targets may be addressable issues — and we have already taken steps to address them.

5. No Lean Season in 2018: Where are we now, and what’s next?

We are currently deeply engaged in the 2018 implementation round.

As discussed in Section 3.1 above, the 2018 program was set in motion before the 2017 RCT-at-scale results emerged. In response to operational monitoring results and lower-than-expected take-up rates, we worked with RDRS to revamp the delivery strategy and the program design for 2018. Given the extent of the changes that were made, we decided not to increase the scale of the program any further in 2018, focusing instead on executing design and delivery improvements within the same program footprint as 2017. As our concerns were subsequently confirmed by the emerging evaluation results, we also adjusted our evaluation plans for 2018, deciding to increase the resources and bandwidth devoted to another round of “testing at scale”.

Soon after the 2018 implementation round began, as the results of the 2017 evaluation became available, Professor Mobarak traveled to Rangpur to present them directly to RDRS management, reinforcing the need to expand access to the program and to avoid the use of delivery targets.

In the meantime, the administrative data that comes in on a daily basis gives us some visibility into whether the changes in design, contracting, performance metrics, and approach to targeting are making a difference. As the chart below demonstrates, we can see that many more subsidies are being disbursed at a faster pace, and that take-up rates are dramatically higher this year as compared to 2017. As of mid-November, the overall take-up rate in the 2018 implementation round reached 60% — exceeding the maximum rate of 56% observed in previous rounds that were rigorously evaluated prior to 2017.

Early indications of increased take-up rates in 2018 are quite promising, but ultimately, the critical question is whether the program succeeded in inducing migration and increasing income and consumption. No Lean Season must empirically demonstrate a viable pathway to impact at scale in order to merit further investment. The answer to this question will begin to emerge from the results of the impact evaluation in early 2019. In consultation with our partners, key stakeholders, and a wide range of experts, we’ll assess those results in the broader context of the accumulated evidence base and a complex range of factors affecting the potential cost-effectiveness and scalability of the program.

We will grapple with several critical decisions about No Lean Season, including, for example: Should the program be delivered again in 2019? If so, at what scale? Were the changes we made to the design and delivery of the program in 2018 effective? How else might we optimize impact and efficiency?

Seasonal poverty affects hundreds of millions of rural families around the world. While we hope that No Lean Season will prove to be an effective tool in combating it, we are, first and foremost, advocates for evidence and impact, not for any particular program. We therefore will remain skeptical and make tough decisions consistent with our “evidence first” core value — the importance of delivering meaningful impact at scale demands no less.

6. Why do we test at scale?

Our experience with No Lean Season underscores, more broadly, the importance of the “testing at scale” phase of the Beta incubation process, and deepens our belief in why we do what we do here at Evidence Action. Had we decided, on the basis of the earlier evidence base, to scale the program within Bangladesh and beyond without “testing at scale,” we could have inappropriately imputed impact from smaller-scale studies to a larger, real-world setting in which the impact could differ substantially. With robust monitoring data alone, we were able to see some challenges in implementation, but not the full picture that emerges from combining ongoing monitoring data with results from a rigorous trial. Testing No Lean Season at scale enables us to fulfill our core value of putting “evidence first,” and to use robust, rigorous evidence to inform our choices and decisions.

About the authors: Karen Levy is Senior Director, Innovation with the Beta Incubator at Evidence Action. She led the design, iteration on, and pressure-testing of No Lean Season during the first several years of the program’s development and growth and remains a key advisor to the No Lean Season team. Varna Sri Raman is the Director of No Lean Season with Beta. She leads the program, including partnerships with researchers and others.

Endnotes

[1] Bryan, Gharad, Mushfiq Mobarak, Karim Naguib, Maira Emy Reimao and Ashish Shenoy (2018). “No Lean Season 2017 Evaluation.” AEA RCT Registry. May 21. https://www.socialscienceregistry.org/trials/2685/history/29803

[2] G. Bryan, S. Chowdhury, and A. M. Mobarak. “Under-Investment in a Profitable Technology: The Case of Seasonal Migration in Bangladesh,” Econometrica, September 2014 (Supplement); A. Akram, S. Chowdhury, and A. M. Mobarak. “Effects of Emigration on Rural Labor Markets,” working paper; D. Lagakos, A. M. Mobarak, and M. E. Waugh. “The Welfare Effects of Encouraging Rural-Urban Migration,” working paper.

[3] Does this mean we think that running randomized controlled trials (RCTs) at scale of all scaled programs is required? No. In 2015, we wrote that we do not necessarily measure the impact of our flagship, at-scale programs. It was, in our own words, a “controversial stance in an NGO community where M&E teams pride themselves in always measuring ‘impact.’” For fully-scaled, flagship programs that have been through rigorous incubation and pressure-testing, this makes sense, and we stand by this position. But this does not mean that we should never “test at scale.” Our goal, as we develop new programs, is to reach a point where formal impact evaluations may not be necessary — or even feasible. Evaluating impact at scale is an important part of establishing a compelling evidence base and a key step in the intensive Beta incubation process. Even flagship programs may benefit from ongoing testing at scale — for example, monitoring for drug resistance or measuring underlying trends that are relevant for targeting. These are types of ‘testing at scale’ but are not always large-scale RCTs.

[4] We thank the Global Innovation Fund, GiveWell, and a number of individual donors for their generous support of No Lean Season.

[5] Meghir, C., A. M. Mobarak, C. Mommaerts, and M. Morten (2017). “Migration and Consumption in Bangladesh,” working paper. Mobarak, A. M. and M. E. Reimão (2018). “Does Migration Alter Beliefs and Attitudes with Respect to Gender and Social Norms?”, working paper. Chowdhury, S., A. M. Mobarak, and M. E. Reimão (2018). “Seasonal Migration and Permanent Migration: Lessons from Eight Years of Tracking in Bangladesh,” working paper. Mobarak, A. M. and A. Ramos (2018). “The Effects of Seasonal Migration on Intimate Partner Violence in Bangladesh: Evidence of Exposure Theory,” working paper. Bryan, G., S. Chowdhury, A. M. Mobarak, M. Morten, and J. Smits (2018). “Too Much of a Good Thing? The Encouraging and Distortionary Effects of Conditional Cash Transfers,” working paper. Other papers cited above.

[6] Between the 2016 and 2017 implementation rounds, we worked intensively with RDRS counterparts to refine program protocols and to build a standardized toolkit for delivery. We developed a mobile phone-based application on the CommCare platform for use by field personnel to guide each stage of the program, including conducting household surveys, holding offer meetings, generating loan applications, processing disbursements and repayments, and capturing relevant data from households and from migrants.

[7] The goal of setting the threshold for eligibility is to focus the program’s resources where we think it is most likely to have the most benefit. For No Lean Season, we would, in theory, want to define eligibility as “households who experience the negative effects of seasonal poverty, who are likely to benefit from seasonal migration, and who would migrate with a loan but not without it”. As this is obviously unknowable, we use observable characteristics like “owning less than 50 decimals of cultivable land” and “reporting of missed meals in the preceding lean season” as proxies to identify the target population. In previous studies, 50%-60% of all households (as defined by these two proxies) were made eligible. The positive impact of subsidies for seasonal migration generated in these studies was substantial, and there was no reason to believe that households falling just outside the cutoff (who were also extremely poor) might not also benefit from the program. We predicted that expanding eligibility to include an additional 10% or 20% of households might be quite cost-effective. In doing so in the context of an RCT-at-scale, we also hoped to conduct analyses which would help us optimize eligibility cutoffs in future rounds.

[8] The NGO Affairs Bureau is a government bureau in Bangladesh, founded in 1990, that regulates non-governmental organizations. All NGOs that receive funds from outside Bangladesh are required by law to register with the Bureau, which falls under the Prime Minister’s office.

[9] Communication with researchers

[10] Several factors should be considered when comparing (expected and achieved) Migration Organizer capacity across program years.
In the context of northern Bangladesh, villages are typically sets of dense settlements separated by large rice fields. Travelling to, from, and between villages on a regular basis is extremely costly in terms of time. The marginal time costs of serving additional households within a given village, on the other hand, are relatively low. Therefore, reaching 80 or even 100 households in two villages, for example, can take much less time than serving 60 households spread over three or four villages.
In research rounds, subsidies were offered to only a small fraction of eligible households in each village. In 2014, each field officer served, on average, 200 households spread over 4.75 villages. During the 2016 operational pilot, using paper tools, Migration Organizers were each able to conduct 1,100 eligibility surveys and deliver offers to 540 eligible households spread over 2.73 villages.
For the 2017 implementation round, we anticipated further increases in efficiency from the use of a mobile phone-based application, and planned for each Migration Organizer’s workload to include between 700 and 1,000 eligible households spread over 4 to 6 villages.
In 2017, we exceeded survey coverage targets and expanded eligibility criteria; these changes raised Migration Organizers’ workload to an average of 1,500 households over 6.85 villages.

[11] Note that the program was designed with the intention that all eligible households should have full access to loans; the use of disbursement targets was not included in program protocols.
