This is a big data assignment on Spark DataFrames.
1. 15 Points
Datafile: BreadBasket_DMS.zip
Solve: Show the total count of each item, per day and per hour
Example, given the input:
Bread, 2016-10-30, 09, 1
Bread, 2016-10-30, 10, 12
:
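One possible approach, sketched under assumptions the assignment does not state: the archive has been extracted to a CSV with columns named Date, Time, Transaction and Item, and every column is read as a string (no schema inference), so Time looks like "09:58:11". Adjust the names to the actual header. `count_items_local` is a hypothetical pure-Python reference for checking the Spark result on a handful of rows.

```python
from collections import Counter


def count_items_local(rows):
    """Reference count for small samples; rows are (date, time, item) tuples."""
    return Counter((item, date, time[:2]) for date, time, item in rows)


def count_items_spark(df):
    """Count rows per item, per day, per hour on a Spark DataFrame.

    Assumes string columns named Item, Date and Time ("HH:MM:SS");
    these names are an assumption, not taken from the assignment.
    """
    return (
        df.selectExpr("Item", "Date", "substring(Time, 1, 2) AS Hour")
        .groupBy("Item", "Date", "Hour")
        .count()
        .orderBy("Item", "Date", "Hour")
    )
```

With a session available, `count_items_spark(spark.read.csv(path, header=True)).show()` would display the grouped counts, where `path` points at the extracted file.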
2. 15 Points
Dataset: Restaurants_in_Durham_County_NC.csv
NOTE*** This file is colon delimited (not comma). Do not preprocess it; read it
with spark.read…
Solve: Summarize the number of entities by “rpt_area_desc”
Example:
“Swimming Pools”, 13
“Tatoo Establishment”, 2
:
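A minimal sketch of reading the file with an explicit delimiter and grouping, assuming the first line is a header that contains `rpt_area_desc`. The separator is passed as a parameter and defaults to the colon named in the note; change it if inspecting the file shows a different character. `count_by_area_local` is a hypothetical pure-Python reference for checking a few lines by hand.

```python
import csv
from collections import Counter


def count_by_area_local(lines, sep=":"):
    """Reference count for small samples; lines is an iterable of text lines."""
    reader = csv.DictReader(lines, delimiter=sep)
    return Counter(row["rpt_area_desc"] for row in reader)


def count_by_area_spark(spark, path, sep=":"):
    """Read the delimited file with spark.read and count rows per rpt_area_desc.

    Assumes the file has a header row; no preprocessing is done.
    """
    df = spark.read.csv(path, sep=sep, header=True)
    return df.groupBy("rpt_area_desc").count().orderBy("rpt_area_desc")
```

Calling `count_by_area_spark(spark, "Restaurants_in_Durham_County_NC.csv").show(truncate=False)` would print one row per area description.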
3. 25 Points
Dataset: populationbycountry19802010millions.csv
Solve: For each year and each region, compute the percentage increase in population,
year over year. Note that the year 1980 will not have a preceding year.
Show each region's yearly population increase as a percentage of the global
population increase for that year.
Display the top 10 in decreasing order of global growth.
Example:
Year, Region, yearly increase, percent of global year increase (these results are
made up)
1981, North America, 1.30%, 1%
1982, Aruba, …
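The arithmetic can be sketched as follows, under several assumptions that are not in the assignment: the data has already been reshaped to a long DataFrame with columns `region`, `year` and `population`, and a row named "World" carries the global total. `yoy_local` is a hypothetical pure-Python version of the same two formulas, useful for checking a single region by hand.

```python
def yoy_local(series, world):
    """series, world: dicts mapping year -> population (millions).

    Returns (year, pct_increase, pct_of_global_increase) tuples.
    """
    out = []
    for year in sorted(series):
        prev = year - 1
        if prev not in series or prev not in world:
            continue  # e.g. 1980 has no preceding year
        delta = series[year] - series[prev]
        world_delta = world[year] - world[prev]
        out.append((year,
                    round(100.0 * delta / series[prev], 2),
                    round(100.0 * delta / world_delta, 2)))
    return out


def yoy_spark(long_df, world_name="World"):
    """Top 10 rows by share of the global increase (column names are assumed)."""
    from pyspark.sql import Window, functions as F  # imported lazily

    w = Window.partitionBy("region").orderBy("year")
    deltas = (
        long_df.withColumn("prev", F.lag("population").over(w))
        .where(F.col("prev").isNotNull())  # drops the first year per region
        .withColumn("delta", F.col("population") - F.col("prev"))
        .withColumn("pct_increase", F.round(100 * F.col("delta") / F.col("prev"), 2))
    )
    world = deltas.where(F.col("region") == world_name).select(
        "year", F.col("delta").alias("world_delta"))
    return (
        deltas.where(F.col("region") != world_name)
        .join(world, "year")
        .withColumn("pct_of_global",
                    F.round(100 * F.col("delta") / F.col("world_delta"), 2))
        .select("year", "region", "pct_increase", "pct_of_global")
        .orderBy(F.col("pct_of_global").desc())
        .limit(10)
    )
```

Note that `F.lag` takes the previous available row per region, so a gap in the years would compare non-adjacent years; missing values would need to be filtered or cast before this step.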
4. 15 Points
Dataset: romeo-juliet-pg1777.txt
Solve: WordCount
Do a word count exercise using pyspark. Ignore punctuation, and normalize to
lower case. Accept only the characters in this set: [0-9a-zA-Z]
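A short sketch of the word count, assuming that any character outside [0-9a-zA-Z] acts as a word separator (so "o'er" would split into "o" and "er"; deleting such characters instead is an equally plausible reading). It uses the RDD API for brevity; the same steps could be expressed with DataFrame functions. The names `tokenize` and `word_count` are illustrative.

```python
import re
from operator import add

# Anything outside the accepted character set is treated as a separator.
_NON_ALNUM = re.compile(r"[^0-9a-zA-Z]+")


def tokenize(line):
    """Strip punctuation, lower-case, and split a line into words."""
    return _NON_ALNUM.sub(" ", line).lower().split()


def word_count(spark, path):
    """Return an RDD of (word, count) pairs, most frequent first."""
    lines = spark.sparkContext.textFile(path)
    return (
        lines.flatMap(tokenize)
        .map(lambda word: (word, 1))
        .reduceByKey(add)
        .sortBy(lambda pair: -pair[1])
    )
```

For example, `word_count(spark, "romeo-juliet-pg1777.txt").take(20)` would return the twenty most frequent words with their counts.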