Published: May 2011
Country | Quality of study design | Bottom line from abstract/summary | Outcomes showing statistically significant improvement | Outcomes not showing a statistically significant effect | Outcomes showing statistically significant deterioration |
---|---|---|---|---|---|
Peru8 | Rigorous (randomized controlled trial) | "We find little or no evidence of changes in key outcomes such as business revenue, profits, or employment. We nevertheless observed business knowledge improvements and increased client retention rates for the microfinance institution."9 | 8 indicators (2 of business results, 4 of business practices, 2 of institutional results) | 32 indicators (6 indicators of business results, 10 indicators of business practice, 13 indicators of household outcomes, 3 indicators of institutional outcomes) | None |
Dominican Republic10 | Rigorous (randomized controlled trial) | "We find no significant effect from a standard, fundamentals-based accounting training [designed by Freedom from Hunger]. However, a simplified, rule-of-thumb training produced significant and economically meaningful improvements in business practices and outcomes."11 | None | 10 indicators of business practices and sales | None |
India12 | Somewhat rigorous (randomized controlled trial with imperfect randomization) | "Participation resulted mainly in improved confidence levels of the daughters (and mothers) regarding their money management. Statistically significant improvements in savings levels and effective bargaining were not detected in the quantitative studies...The only [health] topic with significant gains in the randomized controlled trial evaluation was HIV/AIDS. All other topics, such as hand-washing, diarrhea, nutrition and reproductive health, saw few significant differences...[Gains] were seen in the girls’ comfort levels in discussing the topics with their family members."13 | Unmarried: 1 indicator of financial literacy, 6 indicators of health literacy; Married: 2 indicators of financial literacy | Unmarried: 14 indicators of financial literacy, 44 indicators of health literacy; Married: 14 indicators of financial literacy, 48 indicators of health literacy | Unmarried: 1 indicator of financial literacy; Married: 2 indicators of financial literacy |
Ghana14 | Results not reported on intent-to-treat basis | - | - | - | - |
Bolivia15 | Results not reported on intent-to-treat basis | - | - | - | - |

Country | Quality of study design | Bottom line from abstract | Outcomes showing statistically significant improvement | Outcomes not showing a statistically significant effect | Outcomes showing statistically significant deterioration |
---|---|---|---|---|---|
Ghana16 | Rigorous (randomized controlled trial) | "The malaria education complemented the other activities to increase knowledge and positive behaviors. Yet, even the increased knowledge and behaviors often were impeded by gaps in a family’s ability to access promoted prevention methods such as ITNs."17 | 1 indicator of net ownership; 4 indicators of net use by vulnerable group; Net re-treatment | 3 indicators of net ownership | None |
Peru18 | Rigorous (randomized controlled trial) | "Individuals in the IMCI treatment arm demonstrated more knowledge about a variety of issues related to child health, but there were no changes in anthropometric measures or reported child health status."19 | Only knowledge indicators | All directly measured and reported child health indicators | None |
Benin20 | Somewhat rigorous (randomized controlled trial with imperfect randomization) | "Results revealed that the education villages perform somewhat better than the credit-only villages in malaria knowledge indicators [and] also have somewhat better malaria behaviors...Education villages were substantially more likely than credit-only villages to perform better on HIV knowledge indicators...There were no significant differences...when assessing knowledge and behavior change as a result of the childhood illnesses module."21 | 7 health indicators, 3 credit and finance indicators | 74 health indicators; all food security, social network, and decision making indicators; 39 credit and finance indicators | 1 indicator |
I was surprised by the speed of your evaluation, which is a tribute to your work pace, especially considering the number of research reports you had to read carefully and summarize. But I was also surprised that I was the only staff person interviewed directly. In an effort to communicate succinctly and clearly, I tend to paint our work and our organization in broad summary brushstrokes that can mislead; I get away with this approach because usually evaluators also talk to my colleagues who know and love the details. In particular, I clearly misled you in my comments regarding monitoring. Moreover, you have drawn the wrong conclusion about our expansion plans for Saving for Change. And we want to raise a caution flag regarding the way you summarize the results of randomized controlled trial (RCT) research, not just ours.
We do not contest your overall conclusion that Freedom from Hunger does not currently qualify for your highest rating; we are pleased to be deemed a Notable organization, given the very high standards set by GiveWell.
Freedom from Hunger is fundamentally different from the organizations that have qualified for your highest rating, at least the ones we know well (Small Enterprise Foundation and Village Enterprise Fund), because we do not directly control (either through legal governance or funding leverage) the program operations that deliver services to intended beneficiaries. Such control would allow us to require particular systems of implementation, including quality-monitoring and control and the detailed reporting of quality data to Freedom from Hunger, which we could roll up into a regular global report for posting on our website. Your report appropriately recognizes the strategic tradeoff we have made for scale of outreach at the expense of the kind of time-and-money-consuming control of partners that allows for a meaningful global quality reporting system. However, it is not correct (my fault) that “Freedom from Hunger does not monitor its partners over time to determine whether they implement Freedom from Hunger's programs well.” So let me try to correct the misperception I created with my breezy remarks during the interview (I was enjoying myself too much, which always gets my staff worried!).
Monitoring
Our impact research on the basic delivery models, the education modules and other components is meant to serve the “proving” function. It also guides the design of impact-monitoring systems intended for use by implementing-organization managers to improve rather than prove impact. As such, impact- and quality-monitoring primarily serves the needs of local organization management. Because organizations vary greatly in their circumstances and priorities, their management needs and capacity for quality-monitoring vary greatly as well. The result is considerable heterogeneity in the quality-monitoring systems they develop and implement.
Until GiveWell showed such interest, quite frankly, we had not seen value in trying to persuade our disparate partners to standardize their quality-monitoring systems to generate data about quality of delivery that can be rolled up into global reports posted on Freedom from Hunger's website. Donors have not demanded this level of detail, and we are skeptical of our ability to create a system that generates global reports that would be meaningful, either to donors or even to ourselves. The point I was making in the interview is that even if we could develop a global-level monitoring system, our lack of control of partner implementation would frustrate our desire to act on the information such a quality-monitoring system might provide. Putting this point more positively, our priority has been to build the partners' own capacities for quality-monitoring. Let me explain our approach to quality-monitoring in more detail.
We recognize that dissemination of our innovative program models requires customization to suit the particular circumstances of every independent implementing organization that we train. This customization process is more likely to lead to quality delivery for poor beneficiaries and sustainability by the local organization than imposing a fixed design with a pre-determined quality-monitoring and -control system. Our aim is to train the organization to build and maintain its own system, following the philosophy and methods of Social Performance Management (SPM). We have led the global development of SPM training for microfinance institutions, as part of the ImpAct Consortium; SPM is equally applicable to non-financial NGOs that promote savings groups. Our training is designed to build institutional capacity to recruit, train, supervise and incentivize program-delivery staff. This involves quality-monitoring and -control by supervisors and internal auditors to generate feedback to management and to the field staff themselves. It includes training in such monitoring tools as “client satisfaction surveys” and other techniques applied to “lot quality assurance samples” of women participants. Moreover, every education module includes in its design package an impact-monitoring tool that tests whether the women participating in the education have attained pre-determined levels of change in knowledge and behavior.
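For readers less familiar with lot quality assurance sampling, its statistical core is a simple binomial decision rule. The sketch below is illustrative only; the sample size, decision threshold and coverage targets are hypothetical, not the parameters any particular partner actually uses.

```python
# Hypothetical sketch of a Lot Quality Assurance Sampling (LQAS) decision
# rule, of the kind a supervisor might apply to a sample of women
# participants. All parameters here are illustrative assumptions.
from scipy.stats import binom

def lqas_errors(n, d, p_upper, p_lower):
    """Misclassification risks for an LQAS rule that accepts a lot
    (e.g., one field agent's service area) when at least d of n
    sampled clients meet the quality standard.

    alpha: chance of rejecting a lot that truly performs at p_upper.
    beta:  chance of accepting a lot that truly performs at p_lower.
    """
    alpha = binom.cdf(d - 1, n, p_upper)      # good lot fails the check
    beta = 1 - binom.cdf(d - 1, n, p_lower)   # weak lot passes the check
    return alpha, beta

# Example: sample 19 clients; accept if at least 13 show the promoted
# behavior. Target coverage 80%, unacceptable coverage 50%.
alpha, beta = lqas_errors(n=19, d=13, p_upper=0.80, p_lower=0.50)
print(f"Risk of flagging a good area: {alpha:.2f}")
print(f"Risk of passing a weak area:  {beta:.2f}")
```

The appeal of such a rule for field supervision is that a small, fixed sample yields a clear accept/follow-up decision without requiring precise coverage estimates for every area.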
Our monitoring of partners after they have been trained and guided in the customization process is done by “relationship managers,” our staff assigned to liaise with particular partners in particular countries, both during the intensive training and technical assistance phase of innovation dissemination and customization, and long afterward. As I said in the interview, the frequency with which these staff can actually make on-site visits to the assigned partners depends on dedicated funding and partner willingness to receive our visits (by far, the former is the more common constraint). However, the relationship manager stays in touch with key staff of the partner organization by phone and e-mail to stay abreast of the partner's progress and challenges. Through these means, if not by direct observation, our staff has fairly accurate knowledge of how the partner is applying the tools and systems we have helped its staff develop and install. That is, we have a good sense of how the partner is implementing (including our many partners in Mexico), often including the implementation of quality-monitoring that suits its management purposes. This relationship management process often leads to joint identification of problems that need our troubleshooting assistance and opportunities for new product development and system design. We often build our fundraising around meeting these particular needs and opportunities to improve the quality of service to the partner's clients. These more highly engaged relationships are with the 20–40 partners I referred to in the interview.
In summary, we are monitoring the quality of implementation through our relationship managers, which gives us a valuable but often qualitative picture of the quality of implementation across our portfolio of (now) 132 partners who report outreach (scale) numbers to us on a biannual basis.
While I have despaired of rolling up our partners' numbers to provide global reports, I want to be clear I am referring to numbers on “quality” as we were using the term in our interview. We have long provided the Credit with Education Status Reports (CSRs) on our website (see the latest CSR attached to the same e-mail to which this commentary is attached). As you know now, these CSRs report the outreach numbers, portfolio at risk (PAR), operational self-sufficiency (OSS) and the education modules delivered by microfinance institutions. These data give us only weak proxy indicators of the quality of program implementation and no verification of impact. Still, when we receive data from the partner, we compare the numbers against those from the prior period and take note of any changes, positive or negative. Often this leads our relationship managers to reach out and talk to the partner, either to commend them for a growing program or to inquire about programs that are clearly struggling.
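As a purely hypothetical sketch of how such a period-over-period comparison might look in practice (the column names, thresholds and flagging rules below are invented stand-ins, not our actual system):

```python
# Illustrative sketch only: flagging CSR-style numbers that changed
# sharply between reporting periods. All data and rules are hypothetical.
import pandas as pd

reports = pd.DataFrame({
    "partner":      ["A", "B", "C"],
    "clients_prev": [12000, 8500, 4300],
    "clients_curr": [14500, 6100, 4400],
    "par30_prev":   [0.03, 0.05, 0.02],   # portfolio at risk > 30 days
    "par30_curr":   [0.03, 0.11, 0.02],
})

# Flag partners whose outreach moved sharply or whose PAR worsened notably.
reports["outreach_change"] = reports["clients_curr"] / reports["clients_prev"] - 1
reports["flag_growth"] = reports["outreach_change"].abs() > 0.15
reports["flag_par"] = (reports["par30_curr"] - reports["par30_prev"]) > 0.03

for _, row in reports[reports.flag_growth | reports.flag_par].iterrows():
    print(f"Partner {row.partner}: follow up "
          f"(outreach {row.outreach_change:+.0%}, PAR30 {row.par30_curr:.0%})")
```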
We actually have a similar reporting system for our Saving for Change partners. Given that this savings group model is newer (relative to Credit with Education), we haven't had a sufficient number of partners to merit a separate report on the website until now. I am attaching the latest Saving for Change report, and also the underlying MS Excel workbook with the Data Collection Form, Performance Ratios and Project Performance templates, to illustrate the similarities to and differences from the CSR. We have a similar outreach and activity report for our partners engaged in Microfinance and Health Protection (see latest version attached). These reports and other data sets are in fact rolled up into a global (all program models, all 132 partners, all 19 countries) “Performance Management Report” (latest is also attached). Our Board of Trustees asked staff not to post this PM Report on our website, because they find it too confusing without detailed explanation by staff. I do believe, however, that the footnotes do a pretty good job of explaining how the report is put together, but I'll let you be the judge. Freedom from Hunger now has multiple program models at play in the world; we are working to provide reports comparable to the CSR for all our main models.
Our “impact stories” monitoring methodology is still relatively new. It includes sending a U.S.-based individual to interview approximately 40 incoming and current clients with a qualitative tool that collects information on the client's well-being, life opportunities, program participation, poverty level and food-security status. Surveys are administered once to act as a baseline, and then again in three years as a follow-up, to observe any changes. Although the study design does not allow for attribution of any observed changes to the program, it is a useful tool for obtaining a snapshot indication of food-security and poverty-level status of clients of the institution, as well as further understanding how clients perceive the impact of the program. Visiting the organization also gives Freedom from Hunger the opportunity to observe live credit or savings group meetings in which education sessions from our program are delivered. To date, nine of our partners have participated in this monitoring. We have summarized our learning to date in the attached white paper just about to be published on our website.
Sorry for this long discourse on monitoring, but I hope you can now see that it is misleading to say that Freedom from Hunger does not monitor quality of implementation by partners after we train them. The problem I referred to in the interview is that we would like very much to have much better, more standardized information on quality and the ability to act on that information in a consistently productive way across our whole portfolio of partners. So far, the cost of getting such information and acting on it would prohibit the massive outreach on which we place such high priority, given the global prevalence of chronic hunger and the massive need for self-help support against it. As I said, this is our strategic tradeoff. But still, we could do better given sufficient funds designated specifically to monitoring.
Saving for Change Expansion
The Performance Management Report (attached, see above) shows that Saving for Change is growing very rapidly in West Africa. Through Freedom from Hunger's strategic alliance with Oxfam America, as well as our independent work in West Africa, our Saving for Change methodology is enabling 483,554 people to form and operate effective village-level savings and loan groups, many of which are also benefitting from malaria education. We are continuing to expand Saving for Change in West Africa and now are exploring new partnerships in Mexico, Ecuador and Peru. We also have a feasibility study scheduled in May in Haiti.
Additionally, Freedom from Hunger has been involved in an intensive quantitative and qualitative research plan for our Saving for Change program in Mali. The research involves a large-scale randomized controlled trial conducted by Innovations for Poverty Action (IPA), comparing 500 treatment and control villages, as well as 24 months of financial diaries (high-frequency surveys) conducted with a subset of those participating in the RCT. The qualitative work is conducted by the Bureau for Applied Research in Anthropology (BARA) at the University of Arizona in Tucson. BARA is carrying out longitudinal anthropological studies in twelve villages, comparing villages that have Saving for Change with those that do not. The studies will provide extensive information on program impact in the domains of poverty outreach, agricultural production, economic activities, food security, social capital, savings and lending behaviors and much more. The baseline study was not previously placed on the ffhtechnical.org website since we did not author the study; however, the paper is now posted here: http://www.ffhtechnical.org/resources/articles/baseline-study-saving-cha....
In short, we are indeed prioritizing the massive scale-up of our savings-led approach to microfinance. The challenge is that Saving for Change is not a self-financing program delivery system like the credit-led models; the implementing NGOs depend on charitable donations. Therefore, the expansion of the savings-led model is far more expensive, unless it is dovetailed with already-funded program delivery systems (such as microfinance, agricultural extension, literacy training, health services, religious congregations, etc.), which we want to actively explore.
I have to say I am puzzled by your comment that our commitment to expansion of Saving for Change would change your “overall conclusion about the organization.” I don't want to discourage that change of heart, but this seems at variance with your concerns about the inconsistency of our impact research results (our research on the impacts of the savings-led approach is still in progress) and our lack of quality-monitoring (which applies as much to our savings-led partners as to our credit-led partners). These were the reasons you cited to explain why Freedom from Hunger does not currently qualify for your highest ratings. Do you hold savings-led models to a different standard than credit-led models of microfinance?
Summarizing Results of RCTs
Here I want to pass on to you some thoughts from my evaluation specialists regarding inconsistency of RCT results. Bobbi and Megan contributed enormously to the first two sections on monitoring and Saving for Change, but the following is almost entirely their thoughts, which I endorse.
When Freedom from Hunger (and its collaborating academic researchers) designs and implements evaluations of its programs, not all indicators assessed through client surveys carry equal weight, nor do they all serve to evaluate the impact of the program. A simple count of indicators showing positive change, no change or negative change misrepresents the meaning of the data to the implementing organizations as well as to Freedom from Hunger and its supporting researchers. We want to provide a few examples of why this is the case and to share some changes occurring in the evaluation field that are improving the analysis and interpretation of data from large-scale evaluations.
First, not all indicators assessed in a survey have equal value or weight. In a sense, there is a hierarchy of objectives that has to be taken into consideration. Some questions are simply more important than others in evaluating impacts; some questions are included as explanatory variables and are not meant to measure impact. For example, in the Bénin study, one of our primary concerns was mosquito net ownership and use. We also added questions about whether participants used mosquito coils, indoor sprays or mosquito sprays for the body. We were not specifically concerned about whether clients improved indoor spraying; we added the question out of interest, to see whether those who owned bednets were also likely to take other protection measures, or simply to evaluate whether the promotion of mosquito nets improved other protection measures as well. A positive impact on mosquito net ownership but not on mosquito spray use would not mean we considered the latter a failure; we were primarily interested in net ownership. Thus, if the “use of mosquito spray” indicator gets categorized as “no effect,” this treats it as if it were weighted equally with the “net ownership” indicator, when it was not.
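To make this concrete, here is a toy simulation with entirely invented data (not the actual Bénin results): one pre-specified primary outcome carries a real effect while exploratory indicators do not, yet a naive tally would report "one of four indicators improved."

```python
# Toy simulation: a primary outcome with a true effect plus exploratory
# indicators with no true effect. All effect sizes are made up.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 500  # households per arm

outcomes = {
    "net ownership (primary)":     0.12,  # true treatment effect
    "coil use (exploratory)":      0.00,
    "indoor spray (exploratory)":  0.00,
    "body spray (exploratory)":    0.00,
}

significant = 0
for name, effect in outcomes.items():
    control = rng.binomial(1, 0.50, n)
    treated = rng.binomial(1, 0.50 + effect, n)
    p = stats.ttest_ind(treated, control).pvalue
    significant += p < 0.05
    print(f"{name}: p = {p:.3f}")

print(f"\nNaive tally: {significant} of {len(outcomes)} indicators improved,")
print("even though the study was designed around the primary outcome only.")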
Second, not all indicators are meant to individually assess impact. For example, in the GiveWell analysis of the Karlan and Valdivia study on business education in Peru, only eight indicators were counted as having a positive effect, while 32 were counted as having no effect. There are a few reasons why this does not accurately portray the meaning of the data. First, very little data exists on microentrepreneurs and their business activities in general, so researchers must ask multiple questions to test one effect. For example, the questions about the amount of sales in the last month, in a good month, in a normal month and in a bad month together test whether microentrepreneurs experience improved sales due to their participation in business education. Four questions are asked instead of one because of recall problems, limits on clients' ability to provide accurate figures, and the cyclical or seasonal nature of many businesses. We saw no effect of the education on clients' reported sales for a good month, but we did find an effect for clients in a bad sales month. This result (no effect in three of the four questions) is not inconsistent; what it shows is that the business education was effective, but the effect manifested most prominently by protecting sales volumes in the typically slow months of the year. If we had chosen to ask only about sales in the prior month (the best recall period and normally what is used when asking people about their finances), we would have missed altogether the effect that occurred at some other point in the year. In the time we had to evaluate this program, the main impact of the education was on income-smoothing. Over time (and with more time to evaluate), we might have seen improvement in typically good months as well. Had we seen no effect in all four of these questions, we would have had one conclusion: the education has no effect on clients' sales. Since we saw an effect in one of the questions, our conclusion must be that the education does have an effect on clients' sales, and that the effect is detected most clearly during bad sales months.
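A toy simulation of the same point (the sales figures and effect sizes below are invented, not the Peru results): when the true effect operates only in bad months, three of four per-question tests show "no effect" even though the underlying effect on sales is real.

```python
# Made-up illustration: four recall-framed questions testing one
# underlying effect (sales), where the simulated treatment effect
# operates only in bad months (income smoothing).
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 400  # microentrepreneurs per arm

# (question, control mean, true treatment effect) -- all hypothetical
questions = [
    ("sales last month",    300, 10),
    ("sales, good month",   500, 0),
    ("sales, normal month", 300, 0),
    ("sales, bad month",    120, 35),  # smoothing protects slow months
]

for name, mean, effect in questions:
    control = rng.normal(mean, 120, n)
    treated = rng.normal(mean + effect, 120, n)
    p = stats.ttest_ind(treated, control).pvalue
    print(f"{name}: p = {p:.3f}")
```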
A third important point to consider is that our impact evaluations are also “research,” which means some questions are included simply to collect data of interest to the industry, not to measure impact. We also ask the same question in different ways to detect discrepancies or to test which version better represents a concept.
Fourth, for both RCT and non-RCT research, researchers are increasingly developing indices of indicators to “concentrate” results and better explain impacts. This is best seen in the Bénin evaluation. Because we ask multiple knowledge questions, the risk is that we will find improvement in some and not in others. To evaluate how broad the knowledge change was, we could take the percentage of indicators showing an effect, if we believed that all questions were equally important for detecting impact. Because this is not always the case, researchers are now developing indices that present a set of indicators as a single unit, to avoid having to assess the meaning of each question individually. We can say in the Bénin study that, on average, clients participating in the malaria education had better malaria knowledge and better malaria behaviors, even though the results for individual indicators might give the impression that there was little impact or create confusion about what the data were actually showing.
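A minimal sketch of one common index construction, the mean-effects index of Kling, Liebman and Katz, which standardizes each indicator against the control group and averages the z-scores (the data below are simulated stand-ins, not the actual Bénin results):

```python
# Summary-index sketch: many small effects spread across a battery of
# indicators, individually inconclusive but clear on the combined index.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, k = 600, 10  # respondents per arm, knowledge indicators

# Simulated binary indicators with small positive treatment effects.
control = rng.binomial(1, 0.50, size=(n, k))
treated = rng.binomial(1, 0.56, size=(n, k))

# Standardize each indicator by the control group's mean and SD,
# then average the z-scores into a single index per respondent.
mu, sd = control.mean(axis=0), control.std(axis=0)
index_c = ((control - mu) / sd).mean(axis=1)
index_t = ((treated - mu) / sd).mean(axis=1)

# Per-indicator tests may be individually inconclusive...
per_item_p = [stats.ttest_ind(treated[:, j], control[:, j]).pvalue
              for j in range(k)]
print("indicators significant individually:",
      sum(p < 0.05 for p in per_item_p), "of", k)

# ...while one test on the index captures the average effect directly.
p_index = stats.ttest_ind(index_t, index_c).pvalue
print(f"summary index: p = {p_index:.4f}")
```

One test on the index then captures the average effect across the whole battery, rather than forcing an indicator-by-indicator verdict.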
Finally, a few additional questions and clarifications need to be explored:
In general, it is devilishly difficult to interpret the results of RCTs that aim to investigate impacts on multiple variables. We find it necessary to triangulate on the meaning of RCT results by gathering and interpreting qualitative research (such as our “impact stories”) that provides a broader, even though less precise, view of overall impact on the lives of participants in a program. Other researchers are coming to this same view. We would be interested in knowing your thoughts on this thorny issue.