The ActiveClean codebase is written in Python and includes the core ActiveClean algorithm and a data cleaning benchmark.
The Data Cleaning Benchmark automatically injects data errors into a dataset to test the robustness of a machine learning model against those errors. It can be installed using pip:
pip install cleaningbenchmark
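To make the idea of error injection concrete, here is a minimal sketch in plain Python. It does not use the cleaningbenchmark API; the helper name `inject_errors`, its parameters, and the toy data are all assumptions made for illustration. It corrupts a fixed fraction of rows by replacing one value per chosen row with a missing value, and returns the indices of the corrupted rows so that a model trained on the dirty copy can be compared against one trained on the clean data.

```python
import random


def inject_errors(rows, error_rate=0.2, seed=0):
    """Return a corrupted copy of `rows` and the indices of the corrupted rows.

    Hypothetical helper for illustration only; it is NOT part of the
    cleaningbenchmark package.
    """
    rng = random.Random(seed)  # local generator keeps the result reproducible
    corrupted = [list(row) for row in rows]  # copy so the input stays clean
    n_dirty = round(error_rate * len(rows))
    dirty_idx = sorted(rng.sample(range(len(rows)), n_dirty))
    for i in dirty_idx:
        j = rng.randrange(len(corrupted[i]))  # pick one field in the row
        corrupted[i][j] = None  # simulate a missing value
    return corrupted, dirty_idx


# Toy dataset (assumed for illustration): five rows, two numeric features.
clean = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [9.0, 10.0]]
dirty, dirty_idx = inject_errors(clean, error_rate=0.4, seed=42)
```

The actual benchmark may support other error types (for example outliers or swapped values); this sketch covers only missing values.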
To reproduce the results and run the code, download the files from the following link ([login to view URL]) and run the Python file using:
python [login to view URL]
The script is quite simple, so you can read it to see everything in action.
We want this done in Python 3.x, delivered as a Jupyter Notebook, and well documented, with an explanation of roughly what happens in each cell/step.
We can provide additional references to help with the above (papers/links).
Our aim is to find someone who can achieve the above, enjoys the work, and can help us extend the code, which would lead to further ongoing work.