COMPSCI 596E Machine Learning Applied to Child Rescue
3 Credits
Meetings: Mondays 230-245 (mandatory). CMPSCI 140 (or online for UWW section)
Instructors:
Prasanna Lakkur Subramanyam
Brian Levine
Description:
Our goal is to build practical machine learning models to be used by professionals dedicated to rescuing children from abuse. This course is a group-based, guided independent study. Students will be encouraged to design and build their own diagnostic and machine learning tools, while also learning from professionals in the fields of digital forensics and law enforcement. The entire student group will meet once a week to share progress via short presentations. Prerequisites: Permission of instructor only. To gain permission, you must be a graduate student in CMPSCI with a machine learning background. We expect high grades in CS 589 or CS 682.
Learning Goals & Objectives:
For more than a decade, UMass Amherst has helped law enforcement thwart Internet-based crimes against children. These efforts have included building new tools that investigators use daily to rescue children. This group independent study is an opportunity for students to build applications that assist law enforcement in investigating Internet-based crimes. Here is a summary of the course goals and objectives.
- Our focus will be crimes involving the distribution of CSAM (child sexual abuse material): text, images, and videos pertaining to the abuse of minors. We will restrict ourselves to projects that our law enforcement partners confirm in advance will help rescue children.
- Every project should be developed with the intent that law enforcement will run and use it in practice. Because CSAM is illegal to possess, students work only with proxy data; our law enforcement partners will run the systems on real data during the semester and provide feedback.
- We encourage you to be creative and agile with your solutions. This is not a course to try out the latest research (even your own). Your solutions must be based on stable libraries and code already available. All solutions must be local (i.e., no calls to cloud services).
- Excellent documentation and well-written code are the standard for this course. You may inherit a project from a previous semester, and you may pass your project on to next semester's students. In either case, documentation is paramount.
- Weekly meetings will be run like standups, and we'll use short Agile sprints rather than long development cycles. We want to see progress every week. Each student may skip up to 2 standups during the semester.
- Projects are individual, but aligning your project's goals with others' is fine. For example, the output of your project may be the input to someone else's. In fact, this would be ideal.
- This course is not a venue to learn about machine learning. Instead, you'll gain experience applying ML to a real problem under constraints set by a real need, and you'll learn how deployment must happen in a large systems-building project. We'll care about real performance and progress, not just whether the code runs once to complete an assignment.
- The goal of the course is for each project to produce a tool that can be integrated into law enforcement workflows. We will end the semester with final student presentations recapping every project.
Logistics
This course is actually an independent study, and there is a limit to how many independent study credits can count toward your degree.
We plan to give the code to law enforcement for free and as open source at the end of the semester.
Grading will be based on weekly standups. Students must demonstrate consistent progress each week to pass and to earn the highest grade. Each week, students will present according to this template: 1) a reminder of my project's goals; 2) my "next week" slide (slide 5) from last week's presentation; 3) what I actually did (including negative results); 4) a light demo of progress; 5) what I plan to do next week.
FERPA notice: By participating in the course, you will interact with law enforcement personnel outside UMass. They will learn your name and that you are enrolled, but certainly not your grades or other education records.
Note from Instructors
We've run a large number of independent studies over the years and have found that it is difficult for students to prioritize them over regular classes. Therefore, we'll be treating meetings seriously, so that you consider this course as important as (or more important than!) your regular classes. Students may be enrolled in no more than 6 other credits (9 total). The independent study should take up your remaining time; if you take 3 other courses, you won't have the time. Similarly, you can't take the course if you are performing research with another group or with your advisor.
We'll get real feedback regularly from our contacts in the FBI. It should be fun and, we hope, high impact.
Rules:
- This course gives you no special rights. You are not part of investigations with the FBI or anyone else in law enforcement. Don't get yourself into trouble by trying to download anything even broadly related to CSAM. Don't investigate the area or topic. Stay clear.
- Don't name your project with terms related to CSAM or any word in that acronym, nor with words related to perpetrators, crime, etc. Someone else looking through your code might misunderstand it without the context of this class.
- Don't discuss CSAM, related terms, investigations, etc. in public places (cafes, UMass lobbies, etc.). People might get the wrong idea that you are a perpetrator. You may discuss these topics with other members of the class behind a closed door. Open discussions lead to troublesome misunderstandings.
- By participating in the course, you'll learn something about how people commit crimes. Don't distribute such information publicly (Reddit, Facebook, your blog, etc.), or you'll be teaching people how to commit crimes.
Past Project Samples:
- Age Estimation: The goal of this project is to allow users to classify images into four broad age classes based on the people identified in the image.
- Object Detection: A library that uses object detection to provide insights into image data.
- SodaNet: A library for detecting soda cans in an image using object recognition and text recognition.
- Speech-to-Text NER: A library for converting speech to text, using named entity recognition to properly capitalize proper nouns.
- ToxicComments: A neural network for identifying and categorizing toxic and potentially harmful comments and text strings.
- Many of these projects are publicly available at https://github.com/UMass-Rescue/