In the era of big data, we increasingly face the challenge of converting massive amounts of data into actionable knowledge. Given the limits of individual machines (compute power, memory, bandwidth), the solution is often to clean, integrate, and process the data in parallel on many machines, using statistical machine learning techniques. This course focuses on the fundamentals of scaling computation to handle common data analytics tasks.
You will learn about basic tasks in collecting, wrangling, and structuring data; programming models for performing certain kinds of computation in a scalable way across many compute nodes; common approaches to converting algorithms to such programming models; standard toolkits for data analysis consisting of a wide variety of primitives; and popular distributed frameworks for analytics tasks such as filtering, graph analysis, clustering, and classification.
Course Learning Objectives
- Students will be able to formulate and evaluate hypotheses, given data.
- Students will be able to import, model, wrangle, clean, analyze, visualize, and build classifiers from disparate datasets.
- Students will understand how to leverage parallel processing and the cloud to analyze larger-than-memory data.
Prerequisites
- Familiarity with probability, statistics, and coding in Python
- MCIT 591
Course Textbook & Readings
Python for Data Analysis, by Wes McKinney
Recommended: Data Science from Scratch, 2nd ed, by Grus, from O’Reilly. This book provides a quick refresher in Python, probability, statistics, and linear algebra.
Grading & Assessment
You must attempt all graded assignments to pass the course. If you have any questions or concerns about grading or progress in the course, please reach out to the instructor.
Late Policy / Extensions
The instruction staff is committed to your success and understands how challenging it can be to learn online while balancing other commitments. Despite students’ best intentions, sometimes life gets in the way and a little extra time to complete an assignment may be necessary.
If you need extra time on an assignment, you can obtain an extension for extenuating circumstances. If an extension is not approved, an assignment that is turned in late will receive a 10% grade reduction per day for up to 5 days. After the 5th day, no credit will be given. To request an extension, use the Extension Request Form. Extension requests must be submitted at least 24 hours before the assignment deadline. If your request is granted, you will see updated deadlines reflected on Canvas by Mondays at 5 PM EST.
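As an illustration only, the late penalty above can be sketched in Python. This is not an official grading tool: the function name `late_score` is hypothetical, and the sketch assumes that any partial day late counts as a full day and that the 10% reduction is applied to the score you earned. The staff's actual calculation may differ.

```python
import math


def late_score(raw_score: float, hours_late: float) -> float:
    """Hypothetical helper: score after the late penalty, with no extension."""
    if hours_late <= 0:
        # Submitted on time: no reduction.
        return raw_score
    # Assumption: a partial day late is rounded up to a full day.
    days_late = math.ceil(hours_late / 24)
    if days_late > 5:
        # After the 5th day, no credit is given.
        return 0.0
    # 10% reduction per day late (assumed to apply to the earned score).
    return raw_score * (1 - 0.10 * days_late)


if __name__ == "__main__":
    for hours in (0, 30, 120, 121):
        print(hours, "hours late:", late_score(90, hours))
```

Under these assumptions, a submission 30 hours late counts as 2 days late and keeps about 80% of its earned score.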
Regrade requests are handled on a case-by-case basis and are allowed up to 1 week after the grades are released. Requests must be created through the “Regrade Request” function via Gradescope. Regrade requests may take up to a week to process at the discretion of the faculty. When submitting a regrade request, please explain (in detail) why you feel the grading is incorrect.
Other Course Activities
The following activities are not mandatory, but will greatly support your success on the graded assignments.
Discussion forums (on Ed Discussion) are designed to give you optional extra practice with the material and to let you see examples of how your classmates are thinking and working.
Recitations are optional instructor- or TA-led guided live activities.
The instructor may add additional optional segments, such as tutorial videos, to support the class as needed.
Getting and Giving Help
TA and Instructor Support
TAs will hold weekly private live drop-in office hours. These office hours are indicated on the course calendar and held through OHQ.io. TAs will help a queue of students on a first-come, first-served basis and meet one-on-one with students. These sessions are not recorded.
Your instructors will be available during one Open Office Hour session a week, and for private office hours by request. Request these by emailing email@example.com.
Open Office Hours will be recorded and uploaded under Class Recordings within about 24 hours of the live session ending.
The discussion forum is meant to be a collaborative space where you can ask questions and get answers from classmates and members of the instruction staff. Although most students will use the forum for homework-specific questions, you are encouraged to also ask questions about course content from the videos and reading assignments.
Students are also encouraged to answer other students’ questions, as long as they do not reveal specific details about their own solutions to homework and project assignments (see Collaboration Guidelines below).
When posting public questions in the discussion forum to ask for help with a programming assignment, please do not share your code, as you may unintentionally give away a solution.
Likewise, even when posting privately, please do not post your entire code and ask the instruction staff to debug it for you or tell you what’s wrong with it; the goal is for you to be able to resolve such issues on your own, though the instruction staff will be happy to give general advice for troubleshooting and debugging.
In the professional world of software development, collaboration—including using code that others have written—is both practical and ubiquitous.
However, to prepare for entering that professional context, you need to develop a full set of software development skills so that you are able both to create your own code and to evaluate the quality of someone else’s code that you might use. In the context of this course, independent work and evaluation are critical. Do not collaborate with others on individual graded assignments unless collaboration is explicitly permitted. Inappropriate collaboration will be considered cheating and handled under Penn’s Code of Academic Integrity.
Discussion forums are collaborative; please take advantage of them to work with your colleagues. For general communication with your colleagues, use your Slack channels or Slack direct messages.