Spring 2021: 696E

Machine Learning Applied to Child Rescue
3 Credits
Meetings: TBD

Jagath Jai Kumar (Research Fellow): Co-Instructor and weekly meeting point person

Brian Levine (Professor): Co-Instructor

Description:

Our goal is to build practical machine learning models to be used by professionals dedicated to rescuing children from abuse. This course is a group-based, guided independent study. Students will be encouraged to design and build their own diagnostic and machine learning tools, while also learning from professionals in the fields of digital forensics and law enforcement. The entire student group will meet once a week to share progress via short presentations. Prerequisites: Permission of instructor only. To gain permission, you must be a graduate student in CMPSCI with a machine learning background. We expect high grades in CS 589 or CS 682. Instructions for enrollment are below.

Learning Goals & Objectives:

For more than a decade, UMass Amherst has helped law enforcement thwart Internet-based crimes against children. These efforts have included building new tools that investigators use daily to rescue children. This group independent study is an opportunity for students to build applications that assist law enforcement in investigating Internet-based crimes. Here is a summary of the course goals and objectives.

1. Our focus will be crimes involving the distribution of CSAM: text, images, and videos pertaining to abuse of minors. We will restrict ourselves to projects that our law enforcement partners say ahead of time will be helpful to the rescue of children.

2. Every project should be developed with the intent to be run and used in practice by law enforcement. Since CSAM is illegal to possess, students will work on proxy data, but our law enforcement partners will run the systems on real data during the semester and give feedback.

3. We encourage you to be creative and agile with your solutions. This is not a course to try out the latest research (even your own). Your solutions must be based on stable libraries and code already available. All solutions must be local (i.e., no calls to cloud services).

4. Excellent documentation and well-written code are the standard for this course. You may inherit a project from a previous semester, and you may be passing your project on to the next semester. In either case, documentation is paramount.

5. Weekly meetings will be like standups, and we'll favor short Agile sprints over long development cycles. We want to see progress every week. Every student will be allowed to skip up to 2 standups in the semester.

6. The projects are for individuals, but aligning your project's goals with others is ok. For example, the output of your project may be the input to someone else's. In fact, this would be ideal.

7. This course is not a venue to learn about machine learning. Instead, you'll gain experience applying ML to a real problem, under constraints set by a real need, and you'll learn how deployment must happen in a large systems-building project. We'll care about real performance and progress, and not just that it runs once to complete an assignment.

8. The goal of the course will be to have a tool that can be integrated into a law-enforcement workflow. We will end the semester with final presentations from students, recapping every project.

Logistics

This course is actually an independent study, and there is a limit to how many can count toward your degree. Please confirm the limitations of your own degree program.

We plan to give the code to law enforcement for free and as open source at the end of the semester.

Grading will be based on weekly standups. Students must demonstrate consistent progress each week to pass and earn the highest grade. Each week students will present according to this template: 1) Reminder of my project's goals; 2) My "next week" slide (item 5) from last week's presentation; 3) What I actually did (including negative results); 4) A light demo of progress; 5) What I plan to do next week.

FERPA notice: By participating in the course, you'll end up interacting with some folks outside UMass in law enforcement. They'll learn your name and that you are enrolled, but certainly not your grades, etc.

Note from Instructors

We've run a large number of independent studies over the years, and have found that it is difficult for students to prioritize them over regular classes. Therefore, we'll be treating meetings pretty seriously, so that you think of this course as being as important as (or more important than!) your regular classes. Students must be enrolled in not more than 6 other credits (9 total). Independent studies should take up your remaining time; if you take 3 other courses, you won't have the time. Similarly, you can't take the course if you are performing research with another group or with your advisor.

We'll get real feedback all the time from our contacts in law enforcement. It should be pretty fun and, we hope, high impact.

Rules:

1. This course gives you no special rights. You are not part of investigations with anyone or any agency in law enforcement. Don't get yourself into trouble by trying to download anything even broadly related to CSAM. Don't investigate the area or topic. Stay clear.

2. Don't name your project with terms related to CSAM or any word in that acronym, nor with words related to perpetrators, crime, etc. Someone else looking through your code might not understand without knowing the context of this class.

3. Don't discuss CSAM, terms, investigations, etc. in public places (cafes, UMass lobbies, etc.). People might get the wrong idea that you are a perpetrator. You can discuss with the door closed with someone else in the class. Open discussions lead to troublesome misunderstandings.

4. By participating in the course, you'll learn something about how people commit crimes. Don't distribute such information publicly (Reddit, Facebook, your blog, etc.) or you'll be teaching people how to commit crimes.

Past Project Samples:

Age Estimation: Classification of persons in an image into four broad classes of age.
Object Detection: Object detection can provide insight into images so that they can be compared to other images for clustering or to find common scenes or locales.
Text Recognition: Detection and transcription of text in images either from real life (e.g., on a sign or poster) or written to the image (e.g., a watermark or caption/chyron).
Speech-to-Text NER: Converting speech to text with named entity recognition to properly capitalize proper nouns and pull out important details.
ToxicComments: Identifying and categorizing toxic and potentially harmful comments and text strings.

Many past projects are publicly available: https://github.com/UMass-Rescue/

Enrollment

We have limited space. If you would like to request enrollment, please send email to both jjaikumar@umass.edu and brian@cs.umass.edu telling us: your degree program; how many semesters you've completed at UMass (including spring 2020); a list of machine learning related courses you have taken and your grades; the list of courses you'll take alongside this one; and your interest or motivation for taking the course. A resume would be helpful as well. We'll prioritize persons with software engineering coursework and/or experience.

Credits:

Instructor:

Jagath Jai Kumar & Brian Levine

CompSci

Graduate

COMPSCI 696E MACHINE LEARNING APPLIED TO CHILD RESCUE
