Featuring stories about data sharing and data analysis from science, journalism, government, and open source.
May 4-5, 2021, online.
Thank you for making csv,conf,v6 possible! Look out for csv,conf,v7 in 2022!
csv,conf is about:
Building Community
csv,conf brings diverse groups together to discuss data topics, and features stories about data sharing and data analysis from science, journalism, government, and open source.
People who love data
csv,conf is a non-profit community conference run by folks who really love data and sharing knowledge. If you are as passionate about data and its application to society as we are, then this is the conference for you.
Big and small
csv,conf conferences aren't just about spreadsheets. We curate content on broader topics like advancing the art of data collaboration, from putting your data on GitHub to producing meaningful insight by running large-scale distributed processing on a cluster.
Conference Keynotes
From a spreadsheet to critical data infrastructure: building The COVID Tracking Project
Julia Kodysh, Michal Mart, Kevin Miller, Kara Schechtman
The COVID Tracking Project began in March 2020 as a stopgap spreadsheet maintained by a handful of journalists, hoping to provide some information on COVID across the US until the federal government stepped in. But that day never came. Powered by over a thousand volunteers collecting data from disparate state and federal systems every day, the project accidentally became an indispensable source of data used by governments, individuals, and institutions to make critical decisions.
The decentralized nature of public health infrastructure in the United States, which is mostly managed by overstretched health departments at the local and state level, made it impossible to automate the collection and normalization of data on the pandemic. States, which suddenly found themselves needing to produce and report data out of underfunded and overstretched systems, produced COVID dashboards that were all different from each other, didn't provide APIs, and used technologies that are difficult to scrape. Human gleaning of the data from these systems allowed us to identify sudden changes in reporting, keep an eye on data definitions, and develop a deep well of experience and metadata that informed how we reported every state's data. Volunteers not only do the critical work of data collection but, through their experience in working with the data, are also empowered to make key decisions in our analysis and reporting.
Our tooling has matured as we have tested the edges of the possible with Google Sheets. We still use spreadsheets, but have developed more powerful tools to ensure data quality and improve our publishing process. This infrastructure allows teams working on our website and API to use a reliable dataset that serves millions of users a day.
The COVID Tracking Project became the de facto source of COVID data for so many because of a community built in Slack channels by strangers. The tools and sheets and websites we have built are impressive and useful for others to learn from. But the biggest legacy of the project will be the thousands of people who caught a glimpse of the best of themselves during a terrible time.
Race Matters in Health Data
Dr. Kadija Ferryman
Dr. Kadija Ferryman is a cultural anthropologist and bioethicist who studies the social, cultural, and ethical implications of health information technologies. Specifically, her research examines how genomics, digital medical records, artificial intelligence, and other technologies impact racial disparities in health. She is currently Industry Assistant Professor at New York University’s Tandon School of Engineering. As a Postdoctoral Scholar at the Data & Society Research Institute in New York, she led the Fairness in Precision Medicine research study, which examines the potential for bias and discrimination in predictive precision medicine.
She earned a BA in Anthropology from Yale University, and a PhD in Anthropology from The New School for Social Research. Before completing her PhD, she was a policy researcher at the Urban Institute where she studied how housing and neighborhoods impact well-being, specifically the effects of public housing redevelopment on children, families, and older adults. Ferryman is a member of the Institutional Review Board for the All of Us Research Program, a Mozilla Open Science Fellow, and an Affiliate at the Center for Critical Race and Digital Studies. Dr. Ferryman has published research in journals such as Paediatric and Perinatal Epidemiology, the Journal of Health Care for the Poor and Underserved, European Journal of Human Genetics, and Genetics in Medicine. Her research has been featured in multiple publications including Nature, STAT, and The Financial Times.
Datasette and Dogsheep: Liberating your personal data
Simon Willison
Datasette is an open source web application for exploring, analyzing and publishing data. I originally designed it to support data journalists working in newsrooms, but quickly realized that it has applications way beyond journalism.
I decided to start digging into my own personal data: the data that sites and services collect about and for me, which, thanks to regulations like Europe's GDPR, is increasingly available for me to export myself. This led to Dogsheep, a collection of tools for importing personal data from Twitter, GitHub, 23andMe, Foursquare, Google, HealthKit and more.
Being able to export your data isn't much good if you can't easily do interesting things with it. I'll show how the combination of Datasette and Dogsheep can help liberate your personal data, and discuss the lessons I've learned about personal data and open source along the way.