Vision

To create an interactive machine learning tutorial for novices that provides hands-on learning through a practical application on image difference classification.

Guiding Questions

In our project, we were guided by a few questions that we wanted to answer:

How do we help novices approach the problem more like experts?
How can we scaffold novices to do machine learning applications without technical expertise?

Goals

To achieve our vision and answer our guiding questions, we came up with a few actionable goals for our project:

Develop a guided tutorial to explain the machine learning process to novices (Tutorial Mode)
Provide an interactive platform for non-technical users who want to apply machine learning to their own image difference classification problems (Build a Classifier Mode)
Scaffold users with instructive graphics and visualizations to make machine learning more intuitive and less like a black box

Sources

In our tutorial mode, we provided users with a sample dataset on deforestation. Deforestation is an appropriate subject for this application because detecting it is inherently about image differences, which is what our tutorial and application focus on. By comparing satellite images of the same region from two separate years, you can tell whether deforestation occurred within that time period. For instance, the images below (from 2000 and 2015) show evidence of deforestation in an area in Indonesia:

To collect this historical satellite imagery, we used Google Earth. We took screenshots of 100-square-mile regions in Indonesia, a country that is notorious for the deforestation of its rainforests. Specifically, we took two screenshots of each area, one from the year 2000 and another from 2015. We did this for 25 separate areas, giving us 25 pairs of before-and-after images.

Since we collected this data ourselves, concerns about data quality were mitigated:

Completeness of the data was completely (no pun intended) in our hands, and we ensured that our small dataset was complete with all the pairs of before and after images.
Since these images were taken by Google Earth's image providers using state-of-the-art satellite technology, we were not concerned about accountability. For the resolution that we needed, these were reliable images.
We were very careful during data collection and cleaning to ensure coherence and correctness. To ensure coherence, we took before-and-after screenshots of exactly the same patch of area from exactly the same altitude. To ensure correctness, we tried our best to ensure that there were no non-deforestation disturbances in the images (e.g. clouds or changes in rivers) that our simple classifier would misclassify as deforestation.
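
As a rough illustration, the completeness check described above could be automated along the following lines. This is only a sketch: the file naming scheme ("<area>_<year>.png") and the function name find_incomplete_areas are hypothetical and are not taken from the project itself.

```python
import re
from collections import defaultdict

# Hypothetical naming scheme (an assumption): "<area>_<year>.png",
# for example "area07_2000.png" and "area07_2015.png".
PATTERN = re.compile(r"^(?P<area>.+)_(?P<year>\d{4})\.png$")
REQUIRED_YEARS = {"2000", "2015"}


def find_incomplete_areas(filenames):
    """Return {area: [missing years]} for areas lacking a before or after image."""
    years_by_area = defaultdict(set)
    for name in filenames:
        match = PATTERN.match(name)
        if match:  # files that do not follow the naming scheme are ignored
            years_by_area[match.group("area")].add(match.group("year"))
    return {
        area: sorted(REQUIRED_YEARS - years)
        for area, years in years_by_area.items()
        if not REQUIRED_YEARS <= years
    }


files = ["area01_2000.png", "area01_2015.png", "area02_2000.png"]
print(find_incomplete_areas(files))  # {'area02': ['2015']}
```

An empty result would mean that every area found in the list has both its 2000 and its 2015 screenshot.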

Approach

Broadly, we approached this project in a few stages:

Contextual Inquiry (Novices): We interviewed novices and learned how they approached an image difference classification problem on deforestation. We figured out what they did and did not know about machine learning. From this, we identified gaps in knowledge that we could focus our instruction on.
Contextual Inquiry (Experts): We interviewed two experienced machine learning developers on our team to learn how they tackled the same problem. We compared and contrasted their approach with that of the novices.
Instructional Design: Using our insights from the contextual inquiries, we drafted a narrative to explain the machine learning process to non-technical, novice users. This narrative would go in our "Tutorial" section.
User Interface Design: We designed an intuitive UI filled with instructional GIFs and visualizations to explain what was happening in each stage of the machine learning process. We did this for both the "Tutorial" and "Build a Classifier" sections.
Back-end: We built a back-end to do classification on a generic image difference problem. We hosted a web server on Heroku to perform feature generation, training, and testing using a user-provided dataset.
Front-end: We developed a front-end tutorial and image classification platform (using only the deforestation data) that interfaced with our back-end on Heroku.
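
To make the three back-end stages (feature generation, training, and testing) more concrete, here is a minimal sketch of an image difference classifier. It is not the project's actual implementation. It assumes grayscale images stored as nested lists of pixel values, a single hand-picked feature (mean absolute pixel difference), and a simple threshold classifier. All function names and the toy data are hypothetical.

```python
def mean_abs_difference(before, after):
    """Feature generation: average absolute per-pixel change between two images."""
    same_shape = len(before) == len(after) and all(
        len(row_b) == len(row_a) for row_b, row_a in zip(before, after)
    )
    if not same_shape:
        raise ValueError("before and after images must have the same dimensions")
    total = sum(
        abs(b - a)
        for row_b, row_a in zip(before, after)
        for b, a in zip(row_b, row_a)
    )
    count = sum(len(row) for row in before)
    return total / count


def train_threshold(features, labels):
    """Training: place the threshold midway between the two class means."""
    changed = [f for f, y in zip(features, labels) if y]
    unchanged = [f for f, y in zip(features, labels) if not y]
    if not changed or not unchanged:
        raise ValueError("need at least one example of each class")
    return (sum(changed) / len(changed) + sum(unchanged) / len(unchanged)) / 2


def predict(feature, threshold):
    """Testing: label a pair as changed if its feature exceeds the threshold."""
    return feature > threshold


# Toy 2x2 "images" (assumed data, for illustration only).
forest = [[200, 200], [200, 200]]
cleared = [[80, 80], [80, 200]]         # large change from forest
same_forest = [[200, 198], [202, 200]]  # only minor pixel noise

train_features = [
    mean_abs_difference(forest, cleared),
    mean_abs_difference(forest, same_forest),
]
train_labels = [True, False]
threshold = train_threshold(train_features, train_labels)

test_after = [[90, 200], [100, 95]]
print(predict(mean_abs_difference(forest, test_after), threshold))  # True
```

A single global feature like this is easily fooled by clouds or seasonal changes, which is one reason careful data cleaning matters for a simple classifier.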

The slides below detail our approach in planning and implementing this project: