- published: 29 Jun 2012
- views: 25012
Data cleansing, also called data cleaning or data scrubbing, is the process of detecting and correcting (or removing) corrupt or inaccurate records from a record set, table, or database. Used mainly in databases, the term refers to identifying incomplete, incorrect, inaccurate, or irrelevant parts of the data and then replacing, modifying, or deleting this dirty or coarse data. Data cleansing may be performed interactively with data wrangling tools, or as batch processing through scripting.
After cleansing, a data set will be consistent with other similar data sets in the system. The inconsistencies detected or removed may have been originally caused by user entry errors, by corruption in transmission or storage, or by different data dictionary definitions of similar entities in different stores.
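A batch cleansing pass of the kind described above might look like the following minimal sketch. The field names (`name`, `email`, `age`) and the rules (required fields, an age range of 0 to 120, e-mail as the duplicate key) are assumptions made for illustration, not part of any particular tool.

```python
def cleanse(records):
    """Return (clean, removed) for a batch of dict records.

    Hypothetical rules: 'name' and 'email' are required, 'age' must be an
    integer between 0 and 120, and the e-mail address identifies duplicates.
    """
    clean, removed, seen = [], [], set()
    for rec in records:
        # Correct what can be corrected: stray whitespace and inconsistent case.
        name = " ".join(str(rec.get("name") or "").split()).title()
        email = str(rec.get("email") or "").strip().lower()
        try:
            age = int(str(rec.get("age")).strip())
        except ValueError:
            age = None
        # Remove what cannot be corrected: incomplete or implausible records.
        if not name or "@" not in email or age is None or not 0 <= age <= 120:
            removed.append(rec)
        elif email in seen:
            removed.append(rec)  # duplicate of a record already kept
        else:
            seen.add(email)
            clean.append({"name": name, "email": email, "age": age})
    return clean, removed


raw = [
    {"name": "  ada   lovelace ", "email": "ADA@example.com", "age": "36"},
    {"name": "Ada Lovelace", "email": "ada@example.com ", "age": 36},
    {"name": "", "email": "nobody@example.com", "age": "41"},
    {"name": "Alan Turing", "email": "alan@example.com", "age": "n/a"},
]
clean, removed = cleanse(raw)
```

Returning the removed records alongside the clean ones keeps the pass auditable, so a person can review what was dropped instead of losing it silently.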
Data cleansing differs from data validation in that validation is performed at entry time and almost invariably means that bad data is rejected from the system at that point, whereas cleansing is performed on batches of data already in the system.
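The entry-time behaviour of validation can be sketched as follows. `RecordStore` and its single e-mail rule are hypothetical; the point is only that an invalid record is rejected when it arrives and never reaches storage, so there is nothing to repair later.

```python
class RecordStore:
    """Hypothetical in-memory store that validates each record on entry."""

    def __init__(self):
        self.rows = []

    def add(self, record):
        # Validation: reject the record outright instead of repairing it.
        if not record.get("email") or "@" not in record["email"]:
            raise ValueError(f"rejected at entry: {record!r}")
        self.rows.append(record)


store = RecordStore()
store.add({"email": "ada@example.com"})
try:
    store.add({"email": "not-an-address"})
except ValueError as err:
    rejection = str(err)  # the caller learns immediately that the entry failed
```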
http://www.xlninja.com/2012/06/29/cleaning-data-in-excel/ How often do you receive an Excel spreadsheet with information that is written incorrectly and you have to spend hours to clean it up? Very often! With this video I will share a quick and easy way to clean a set of data, saving you hours of tedious work.
In this session, Jennifer Thompson, M.S., introduces data cleaning and outliers. Outliers can be a tricky problem for a data mining project. This session will address these problems and help understand what caused them in the first place. http://statsoft.com/products/data-mining-solutions/
Data Cleaning and Dates using lubridate, dplyr, and plyr
This video is part of an online course, Data Wrangling with MongoDB. Check out the course here: https://www.udacity.com/course/ud032. This course was designed as part of a program to help you and others become a Data Analyst. You can check out the full details of the program here: https://www.udacity.com/course/nd002.
Hear Kyle Kwaiser, data manager at the Institute for Social Research, discuss dataset structure and data cleaning. Follow along with the presentation slides: https://docs.google.com/file/d/0B9JmNm1XrA7CZTJxVGJ5c3JiVEk/edit?usp=sharing Connect with Kyle: http://www.linkedin.com/in/kylekwaiser
Excel can be an amazing tool for data analysis, but we rarely get data that can be used right away, and bad data leads to bad analysis. In this video, I will show you 10 Super Neat Ways to Clean Data in Excel. Read the full tutorial here: http://trumpexcel.com/2014/08/clean-data-in-excel/
Mike's SAS Tutorials, Lesson 5. This video series is intended to help you learn how to program using SAS for your statistical needs. Lesson 5 introduces the concept of data reduction (also known as subsetting data sets). I discuss how one can subset a data set (i.e. reduce a data set's number of observations) based on some criteria using the IF statement in the DATA STEP, or using the WHERE statement in a PROC STEP. I also discuss using the KEEP, DROP, and RENAME statements for reducing data to only a handful of the original variables (i.e. reduce a data set's number of variables). Furthermore, I show how one can label variables so that descriptive information can be presented in output and value formats so that specific val...
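The same idea of data reduction can be illustrated outside SAS. The sketch below is a plain-Python analogue, not SAS code and not taken from the video: the `subset` helper, its parameter names, and the sample `patients` data are all hypothetical. The `where` argument plays the role of an IF or WHERE condition, and `keep`, `drop`, and `rename` mirror the statements of the same names.

```python
def subset(rows, where=None, keep=None, drop=(), rename=None):
    """Filter rows, then keep/drop/rename columns (all arguments optional)."""
    rename = rename or {}
    result = []
    for row in rows:
        if where is not None and not where(row):
            continue  # row fails the condition, so the observation is dropped
        cols = keep if keep is not None else list(row)
        result.append({rename.get(c, c): row[c] for c in cols if c not in drop})
    return result


patients = [
    {"id": 1, "age": 34, "sex": "F", "notes": "x"},
    {"id": 2, "age": 71, "sex": "M", "notes": "y"},
    {"id": 3, "age": 68, "sex": "F", "notes": "z"},
]
seniors = subset(
    patients,
    where=lambda r: r["age"] >= 65,   # fewer observations
    drop=("notes",),                  # fewer variables
    rename={"sex": "gender"},
)
```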
R Statistics Basics Continued: Cleaning up your data via the "subset(...)" function. One way to remove outliers in R (a rather visual, qualitative way). Removing missing data (NAs, NaNs). More R Software and Econometrics Videos: http://sites.google.com/site/curtiskephart/ta/econ113
Video outline: Cleaning Up Your Data
2:11 Looking for outliers and non-numeric values
- str(...)
- summary(...)
- hist(...)
4:35 Dealing with outliers
- Do you really want to remove these?
- subset(...)
8:08 Dealing with non-numeric values (NAs or NaNs)
- Summary Stats with NAs
- Removing NAs
References:
- R Code from this Video: http://sites.google.com/site/curtiskephart/data/Cleaning%20up%20your%20data.R
- Download Data f...
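For readers who do not use R, the two steps named in this description (removing missing values, then subsetting away outliers) can be sketched in plain Python. This is an analogue, not the video's R code; the sample `incomes` values and the cutoff of 100 are made up, standing in for bounds one would pick by eye from a histogram.

```python
import math


def drop_missing(values):
    """Remove None and NaN entries, the analogue of dropping NAs and NaNs."""
    return [
        v for v in values
        if v is not None and not (isinstance(v, float) and math.isnan(v))
    ]


def within(values, low, high):
    """Keep only values inside [low, high], like subsetting on a condition."""
    return [v for v in values if low <= v <= high]


incomes = [42.0, 39.5, None, 41.2, float("nan"), 980.0, 40.1]
present = drop_missing(incomes)
trimmed = within(present, low=0.0, high=100.0)  # bounds are a judgment call
```

Whether a value such as 980.0 is an error or a real observation is a question about the data, which is why the outline asks "Do you really want to remove these?" before subsetting.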
This video provides a brief introduction to Stata commands used to annotate, subset, and browse a data set.
Town Peterson speaking on data cleaning workflows for biodiversity informatics applications.
Learn how to clean and prepare your data before using it in Tableau Public.
HUB Office Hours session on Duplicates and Data Cleaning. With Jason Cook, Anne Crawford, Ashima Saigal and Debra Van Zegeren, moderated by Caroline Renard.
How to clean election data in Excel
Learn about the DODD data clean-up efforts. This video reviews the Excel user tip of the week, the emergency and still waiting list, the deceased and actively waiting list, and future data clean-up efforts.
This data warehouse webinar touches on various topics including: Overtime Reports, Data Warehouse Survey, Data Clean-Ups and Data Briefs