CIS 6930 Spring 24

This is the web page for Data Engineering at the University of Florida.

View the Project on GitHub ufdatastudio/cis6930sp24

CIS 6930 Special Topics: Data Engineering Spring 2024

Class Hours: Tuesday 4th Period (10:40 AM to 11:30 AM) and Thursday 4th and 5th Period (10:40 AM to 12:35 PM)
Location: MCCARTY HALL B G086

Instructors

Dr. Christan Grant

Teaching Assistant

Yifan Wang


**Note: Any email messages to the professors or teaching assistants must include `cs6930` in the subject line.**

Any email without this string in the subject line will likely be filtered as junk.

Course Information

Course Overview

Data are the fundamental units in Artificial Intelligence (AI) and Machine Learning (ML) systems. Effectively harnessing this data is the responsibility of software engineers and data scientists. In this course, we will survey the landscape of AI/ML systems to understand how data flows through the systems. We will look at the engineer’s responsibilities for developing performant systems ethically and responsibly. Students will learn how to design, build, and evaluate data pipelines. We will cover the theoretical underpinnings of fairness and bias throughout data systems. Students will produce a comprehensive project using state-of-the-art systems that integrate best practices.

Topics include:

  1. Getting data
  2. Sourcing various data types (Image, Visual, Logs)
  3. Cleaning/Labeling Data
  4. Crowdsourcing
  5. Benchmarks and Metrics
  6. Ethics and Fairness
  7. Visualizing Data
  8. Evaluation types

This is a cross-listed undergraduate and graduate course. Graduate students will be required to perform additional work and will be graded accordingly.

Course Objectives

By the end of the course, students will be able to:

  1. Design appropriate data pipelines for real-world problems.
  2. Evaluate the performance of each stage of a data pipeline.
  3. Create interactions to modify data pipelines.

Course Pre-Requisites

A sufficient background in Systems and Machine Learning (ML) is required to enroll in the course. The instructor will use completion of a course in Database Management Systems (CIS 4301 or COP 5735) as a signal of sufficient background in Systems. The instructor will use completion of Machine Learning or Math for Intelligent Systems (CAP 6610 or COT 5615) to determine sufficient background in ML. Additionally, to be successful, one must have experience in Python programming and familiarity with SQL and GNU/Linux systems.

Required Textbooks and Software

Material for this course includes instructor notes and research papers from the literature.

Course Schedule

Lectures will be a mix of traditional lectures, class discussions, videos, and other activities. Participation is required to get the most out of the class. The first four weeks are intensive programming and software engineering discussions. The following weeks will include discussions of research papers.

Week Topic
0 Introduction to Data Types and Systems
1 Extracting Data and Analytics
2 Loading Data and Analytics
3 Data Integration and Data Ingestion
4  
5 Data Labeling and Data Augmentation
6  
7 Data Cleaning and Data Wrangling
8  
9 Crowdsourcing and Weak Labeling
10  
11 Fairness, Ethics, and Bias
12  
13  
14 Data Visualization and Data Exploration
15  

Attendance Policy, Class Expectations, and Makeup Policy

Students are expected to attend class and participate regularly.

The grade breakdown will be as follows:

Component Percentage
Quizzes 45%
Activities 25%
Projects 30%
Total 100%
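
As a rough illustration of how the breakdown above combines into an overall percentage, here is a minimal sketch. The function name `weighted_grade` and the sample scores are hypothetical and not part of the course materials; only the three weights come from the table.

```python
# Weights taken from the grade breakdown above.
WEIGHTS = {"quizzes": 0.45, "activities": 0.25, "projects": 0.30}


def weighted_grade(scores):
    """Combine per-category scores (each 0-100) into an overall percentage."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing categories: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical scores, for illustration only.
example = weighted_grade({"quizzes": 80.0, "activities": 90.0, "projects": 100.0})
print(round(example, 2))  # prints 88.5
```

How the resulting percentage maps to a letter grade is governed by the Grading Policy section, not by this sketch.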

The course will have quizzes given throughout the semester. Quizzes are essential to summarize the content covered in each module. Each quiz will be held in class.

Activities are tasks assigned regularly. They may include discussions that ask your opinion on a topic, tasks that you perform and “report back” on, content you generate for the learning module, or “quiz questions” you submit based on class content.

We will have one semester project, assigned in several portions over the semester. Each portion will require substantial planning, programming, and debugging. We encourage you to budget your time well.

Late Policy

Late policies are often at odds with the ability of students to receive feedback. I strongly encourage all students to submit assignments by the posted due date. If assignments are not completed on time, the frequent assignments will mount, causing an undue burden on students and staff. All assignments must be completed and submitted before their due date. After the due date, students may submit assignments until they are graded. This typically means students will have 1-3 days to complete the assignment. If the assignment is submitted before grading starts, it will be accepted. However, the grading time will not be announced, and we will not accept assignments after grading begins.
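
The acceptance rule described above can be read as a single comparison. The sketch below is one interpretation, not an official implementation; the function name `is_accepted` and the timestamps are hypothetical, and the actual grading start time is never announced.

```python
from datetime import datetime
from typing import Optional


def is_accepted(submitted_at: datetime, grading_started_at: Optional[datetime]) -> bool:
    """Return True if a submission arrives before grading begins.

    grading_started_at is None while grading has not started yet.
    """
    if grading_started_at is None:
        return True
    return submitted_at < grading_started_at


# Hypothetical timestamps, for illustration only.
grading = datetime(2024, 2, 3, 9, 0)
print(is_accepted(datetime(2024, 2, 2, 23, 0), grading))
print(is_accepted(datetime(2024, 2, 3, 10, 0), grading))
```

Because the grading start time is unknown to students, the only submission time guaranteed to be accepted is one before the posted due date.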

Grading Policy

Grade cut-offs will be at or below the scale published by the University of Florida.

Grade questions

Please note that when an exam/assignment is brought with grading questions, we may examine the entire exam/assignment, and your final grade may end up lower.

Integrity Examples

Below is a selection of example situations on the border of being or not being an academic integrity violation.

Note that this is not an exhaustive list, and the instructor reserves the right to make the final decision on whether a situation is an academic integrity violation.

Any use of Copilot, ChatGPT, or other generative AI systems should be clearly declared. Any prompt used should be preserved and clearly included. Failure to do so will be considered an academic integrity violation.

Situation Integrity Violation?
Students A and B meet and work on their assignments together. Neither student prepared anything in advance, and the resulting work is identical. Yes
Students A and B create drafts of their assignment independently and get together to compare answers and discuss their understanding of the material. Each person decides independently whether to make changes that are discussed. No
Students A and B agree to prepare drafts of their assignments independently, but only Student A does. Student A shares her draft with Student B, who reviews it and offers suggestions for improvement. Yes
Students A and B agree that student A will work the even problems and Student B will work the odd problems. They share their work. Yes
Students A and B agree that Student A will work on a read function, and Student B will work on the sorting function. They share their solutions. Yes
Student A has completed a project and is helping Student B complete the same project. Student A explains to Student B what Student B’s code actually does, which is different than what Student B thinks the code does. Student B determines how to modify the code independently. No
Student A has completed a project and is helping Student B complete the same project. Student B is having trouble getting one part of the program to work, so Student A texts Student B three lines of their solution. Yes
Student A has completed a project and is helping Student B complete the same project. Student B has difficulty getting the program to work, so student A tells student B exactly what to type for several lines. Yes
Student A has completed a project and is helping Student B complete the same project. Student B has difficulty getting the program to work, so Student A suggests that Student B use a specific debugging strategy (e.g., “Print out the contents of the variable”). No
Student A has completed a project and is helping Student B complete the same project. Student A shows Student B an example program in the online textbook that will be helpful in figuring out the solution to the problem. No
Student A publishes solutions to an assignment on a public Internet page. Yes
Students A and B work on a project together. After they have finished it, student A takes the code and modifies it so the programs do not appear to be identical. Yes
Student A copies and pastes code from a public Internet page but changes the variable names. Yes
Student A uses a public Internet page to help them understand a concept and then writes their own code to implement it. No
Student A uses an AI system to generate an idea or solution without proper attribution. Yes

Important Messages

Students Requiring Accommodations

Students with disabilities who experience learning barriers and would like to request academic accommodations should connect with the Disability Resource Center by visiting https://disability.ufl.edu/students/get-started/. It is important for students to share their accommodation letter with their instructor and discuss their access needs, as early as possible in the semester.

Course Evaluation

Students are expected to provide professional and respectful feedback on the quality of instruction in this course by completing course evaluations online via GatorEvals. Guidance on how to give feedback in a professional and respectful manner is available at https://gatorevals.aa.ufl.edu/students/. Students will be notified when the evaluation period opens, and can complete evaluations through the email they receive from GatorEvals, in their Canvas course menu under GatorEvals, or via https://ufl.bluera.com/ufl/. Summaries of course evaluation results are available to students at https://gatorevals.aa.ufl.edu/public-results/.

In-Class Recording

Students are allowed to record video or audio of class lectures. However, the purposes for which these recordings may be used are strictly controlled. The only allowable purposes are (1) for personal educational use, (2) in connection with a complaint to the university, or (3) as evidence in, or in preparation for, a criminal or civil proceeding. All other purposes are prohibited. Specifically, students may not publish recorded lectures without the written consent of the instructor. A “class lecture” is an educational presentation intended to inform or teach enrolled students about a particular subject, including any instructor-led discussions that form part of the presentation, and delivered by any instructor hired or appointed by the University, or by a guest instructor, as part of a University of Florida course. A class lecture does not include lab sessions, student presentations, clinical presentations such as patient history, academic exercises involving solely student participation, assessments (quizzes, tests, exams), field trips, private conversations between students in the class or between a student and the faculty or lecturer during a class session.

Publication without permission of the instructor is prohibited. To “publish” means to share, transmit, circulate, distribute, or provide access to a recording, regardless of format or medium, to another person (or persons), including but not limited to another student within the same class section. Additionally, a recording, or transcript of a recording, is considered published if it is posted on or uploaded to, in whole or in part, any media platform, including but not limited to social media, book, magazine, newspaper, leaflet, or third party note/tutoring services. A student who publishes a recording without written consent may be subject to a civil cause of action instituted by a person injured by the publication and/or discipline under UF Regulation 4.040 Student Honor Code and Student Conduct Code.

University Honesty Policy

UF students are bound by The Honor Pledge which states, “We, the members of the University of Florida community, pledge to hold ourselves and our peers to the highest standards of honor and integrity by abiding by the Honor Code.” On all work submitted for credit by students at the University of Florida, the following pledge is either required or implied: “On my honor, I have neither given nor received unauthorized aid in doing this assignment.” The Honor Code (https://sccr.dso.ufl.edu/process/student-conduct-code/) specifies a number of behaviors that are in violation of this code and the possible sanctions. Furthermore, you are obligated to report any condition that facilitates academic misconduct to appropriate personnel. If you have any questions or concerns, please consult with the instructor or TAs in this class.

Commitment to a Safe and Inclusive Learning Environment

The Herbert Wertheim College of Engineering values broad diversity within our community and is committed to individual and group empowerment, inclusion, and the elimination of discrimination. It is expected that every person in this class will treat one another with dignity and respect regardless of gender, sexuality, disability, age, socioeconomic status, ethnicity, race, and culture. If you feel like your performance in class is being impacted by discrimination or harassment of any kind, please contact your instructor or any of the following:

Software Use

All faculty, staff, and students of the University are required and expected to obey the laws and legal agreements governing software use. Failure to do so can lead to monetary damages and/or criminal penalties for the individual violator. Because such violations are also against University policies and rules, disciplinary action will be taken as appropriate. We, the members of the University of Florida community, pledge to uphold ourselves and our peers to the highest standards of honesty and integrity.

Student Privacy

There are federal laws protecting your privacy with regard to grades earned in courses and on individual assignments. For more information, please see: https://registrar.ufl.edu/ferpa.html

Campus Resources (Health and Wellness)

I encourage all students to wear masks or other personal protective equipment.

U Matter, We Care:

Your well-being is important to the University of Florida. The U Matter, We Care initiative is committed to creating a culture of care on our campus by encouraging members of our community to look out for one another and to reach out for help if a member of our community is in need. If you or a friend is in distress, please contact umatter@ufl.edu so that the U Matter, We Care Team can reach out to the student in distress. A nighttime and weekend crisis counselor is available by phone at 352-392-1575. The U Matter, We Care Team can help connect students to the many other helping resources available including, but not limited to, Victim Advocates, Housing staff, and the Counseling and Wellness Center. Please remember that asking for help is a sign of strength. In case of emergency, call 9-1-1.

Counseling and Wellness Center:

Visit https://counseling.ufl.edu or call 392-1575. For emergencies, contact the University Police Department at 392-1111 or call 9-1-1.

Sexual Discrimination, Harassment, Assault, or Violence

If you or a friend has been subjected to sexual discrimination, sexual harassment, sexual assault, or violence, contact the Office of Title IX Compliance, located at Yon Hall Room 427, 1908 Stadium Road, (352) 273-1094, title-ix@ufl.edu.

Sexual Assault Recovery Services (SARS)

Student Health Care Center, 392-1161.

University Police Department

Call 392-1111 (or 9-1-1 for emergencies), or visit http://www.police.ufl.edu/.

Campus Resources (Academic)

E-learning technical support

Call 352-392-4357 (select option 2) or e-mail Learningsupport@ufl.edu. See https://lss.at.ufl.edu/help.shtml.

Career Connections Center

Located in the Reitz Union, 392-1601. Career assistance and counseling; https://career.ufl.edu.

Library Support

Visit http://cms.uflib.ufl.edu/ask for various ways to receive assistance with using the libraries or finding resources.

Teaching Center

Located in Broward Hall, 392-2010 or 392-6420. General study skills and tutoring. https://teachingcenter.ufl.edu/.

Writing Studio

Located in 302 Tigert Hall, 846-1138. Help brainstorming, formatting, and writing papers. https://writing.ufl.edu/writing-studio/.

On-Campus Student Complaints

Visit https://sccr.dso.ufl.edu/policies/student-honor-code-studentconduct-code/ or https://care.dso.ufl.edu.

On-Line Student Complaints

Visit https://distance.ufl.edu/state-authorization-status/#studentcomplaint.

Giving Quality Feedback

This page describes the types of grading feedback: https://citt.ufl.edu/resources/assessing-student-learning/providing-effective-feedback/types-of-feedback/