CS4545/CS6545 – Big Data Systems Programming Project Topics

The programming project involves implementing an existing, previously proposed idea related to database or big data management. The project can be done alone or in a team of two students. The implementation language will be Java or C++. The deliverables are:

Pick any one of the programming projects from the list below (being updated).

1 Implementation of Conditional Cuckoo Filter

References:

[1] Daniel Ting and Rick Cole. Conditional Cuckoo Filters. SIGMOD 2021

2 Implementation of Pattern-Oriented-Split Tree (POS-Tree)

 

References:

[1] Cong Yue et al. Analysis of Indexing Structures for Immutable Data. https://arxiv.org/pdf/2003.02090.pdf

3 Implementation of Merkle Bucket Tree

 

References:

[1] Cong Yue et al. Analysis of Indexing Structures for Immutable Data. https://arxiv.org/pdf/2003.02090.pdf

4 Implementation of Hash table based LSM-tree

 

Note: The log-structured merge-tree (LSM-tree) [1,2] is a data structure whose performance characteristics make it attractive for providing indexed access to files with high insert volume, such as transactional log data. However, each component uses a tree-based structure, such as a B+ tree, which requires periodic merging and re-sorting of the data. Many high-velocity event data streams, such as those from IoT devices and sensors, are implicitly sorted and may not require the costly sorting step.

 

Hash tables, on the other hand, do not require sorted data and are widely used for in-memory data tables. However, a hash table requires all keys to be in memory. The goal is to implement a hash-table-based LSM-like index that is at least as memory efficient as the LSM-tree and to compare its performance against the LSM-tree.
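As a rough starting point, the idea of replacing sorted tree components with hash tables could be sketched as follows. This is only an illustration under stated assumptions, not the expected project design: the class name `HashLsmSketch`, the `memtableCapacity` parameter, the tombstone marker, and the choice to keep flushed components in memory (a real index would write them to disk) are all hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.Objects;
import java.util.Optional;

// Hypothetical sketch: an LSM-like index whose components are hash tables
// instead of sorted trees. Flushed components stay in memory here only to
// keep the example short; an actual implementation would persist them.
class HashLsmSketch {
    // Deletion marker, compared by identity so it never collides with a user value.
    private static final String TOMBSTONE = new String("<deleted>");

    private final int memtableCapacity;
    private Map<String, String> memtable = new HashMap<>();
    // Flushed, read-only components, newest first.
    private final Deque<Map<String, String>> components = new ArrayDeque<>();

    HashLsmSketch(int memtableCapacity) {
        if (memtableCapacity < 1) {
            throw new IllegalArgumentException("memtableCapacity must be >= 1");
        }
        this.memtableCapacity = memtableCapacity;
    }

    void put(String key, String value) {
        write(Objects.requireNonNull(key), Objects.requireNonNull(value));
    }

    void delete(String key) {
        write(Objects.requireNonNull(key), TOMBSTONE);
    }

    private void write(String key, String value) {
        memtable.put(key, value);
        if (memtable.size() >= memtableCapacity) {
            // Flush: the full memtable becomes the newest component. No sorting is needed.
            components.addFirst(memtable);
            memtable = new HashMap<>();
        }
    }

    Optional<String> get(String key) {
        String value = memtable.get(key);
        if (value == null) {
            // Probe components from newest to oldest; the first hit wins.
            for (Map<String, String> component : components) {
                value = component.get(key);
                if (value != null) {
                    break;
                }
            }
        }
        return (value == null || value == TOMBSTONE) ? Optional.empty() : Optional.of(value);
    }

    // Merge all flushed components into one, letting newer entries overwrite older ones.
    void compact() {
        Map<String, String> merged = new HashMap<>();
        Iterator<Map<String, String>> oldestFirst = components.descendingIterator();
        while (oldestFirst.hasNext()) {
            merged.putAll(oldestFirst.next());
        }
        // Safe to drop tombstones because every older component was merged.
        merged.values().removeIf(v -> v == TOMBSTONE);
        components.clear();
        if (!merged.isEmpty()) {
            components.addFirst(merged);
        }
    }

    int componentCount() {
        return components.size();
    }

    public static void main(String[] args) {
        HashLsmSketch index = new HashLsmSketch(2);
        index.put("a", "1");
        index.put("b", "2");
        index.put("a", "3");
        index.delete("b");
        System.out.println(index.get("a") + " " + index.get("b"));
        index.compact();
        System.out.println("components after compaction: " + index.componentCount());
    }
}
```

Lookups here probe every component in the worst case, which is one of the costs a real design would need to address (for example with per-component filters) before comparing against an LSM-tree.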

 

References:

[1] Patrick O’Neil, Edward Cheng, Dieter Gawlick, and Elizabeth O’Neil. The Log-Structured Merge-Tree (LSM-Tree). Acta Informatica 33, pp. 351-385, June 1996.

5 Implementation of Quadtree based LSM RUM-Tree

 

Note: Extract the R-tree based LSM RUM-Tree code provided by the paper (http://bit.ly/lsmrum) and implement a new version by replacing the R-tree [2] with a quadtree [3].
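For orientation, a generic point-region quadtree might look like the sketch below. It is not taken from the LSM RUM-Tree code and says nothing about how that code is organized; the class name `QuadtreeSketch`, the node capacity, the depth limit, and the half-open bounds convention are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical point-region quadtree sketch. Each node covers the half-open
// rectangle [x0, x1) x [y0, y1) and splits into four quadrants when full.
class QuadtreeSketch {
    private static final int CAPACITY = 4;   // assumed points per leaf
    private static final int MAX_DEPTH = 16; // stops endless splitting on duplicate points

    private final double x0, y0, x1, y1;
    private final int depth;
    private final List<double[]> points = new ArrayList<>();
    private QuadtreeSketch[] children; // null while this node is a leaf

    QuadtreeSketch(double x0, double y0, double x1, double y1) {
        this(x0, y0, x1, y1, 0);
    }

    private QuadtreeSketch(double x0, double y0, double x1, double y1, int depth) {
        this.x0 = x0; this.y0 = y0; this.x1 = x1; this.y1 = y1;
        this.depth = depth;
    }

    // Returns false when the point lies outside this node's rectangle.
    boolean insert(double x, double y) {
        if (x < x0 || x >= x1 || y < y0 || y >= y1) {
            return false;
        }
        if (children == null) {
            if (points.size() < CAPACITY || depth >= MAX_DEPTH) {
                points.add(new double[] {x, y});
                return true;
            }
            split();
        }
        for (QuadtreeSketch child : children) {
            if (child.insert(x, y)) {
                return true;
            }
        }
        return false; // not reached: the four quadrants tile this rectangle
    }

    private void split() {
        double mx = (x0 + x1) / 2, my = (y0 + y1) / 2;
        children = new QuadtreeSketch[] {
            new QuadtreeSketch(x0, y0, mx, my, depth + 1),
            new QuadtreeSketch(mx, y0, x1, my, depth + 1),
            new QuadtreeSketch(x0, my, mx, y1, depth + 1),
            new QuadtreeSketch(mx, my, x1, y1, depth + 1)
        };
        // Push the stored points down into the new quadrants.
        for (double[] p : points) {
            for (QuadtreeSketch child : children) {
                if (child.insert(p[0], p[1])) {
                    break;
                }
            }
        }
        points.clear();
    }

    // Returns all points inside the half-open query rectangle [qx0, qx1) x [qy0, qy1).
    List<double[]> range(double qx0, double qy0, double qx1, double qy1) {
        List<double[]> out = new ArrayList<>();
        collect(qx0, qy0, qx1, qy1, out);
        return out;
    }

    private void collect(double qx0, double qy0, double qx1, double qy1, List<double[]> out) {
        if (qx1 <= x0 || qx0 >= x1 || qy1 <= y0 || qy0 >= y1) {
            return; // query rectangle does not overlap this node
        }
        for (double[] p : points) {
            if (p[0] >= qx0 && p[0] < qx1 && p[1] >= qy0 && p[1] < qy1) {
                out.add(p);
            }
        }
        if (children != null) {
            for (QuadtreeSketch child : children) {
                child.collect(qx0, qy0, qx1, qy1, out);
            }
        }
    }

    public static void main(String[] args) {
        QuadtreeSketch tree = new QuadtreeSketch(0, 0, 100, 100);
        for (int i = 0; i < 10; i++) {
            tree.insert(i * 10 + 1, i * 10 + 1);
        }
        System.out.println("points in lower-left quadrant: " + tree.range(0, 0, 50, 50).size());
    }
}
```

Unlike an R-tree, this structure partitions space by fixed midpoints rather than by data-driven bounding rectangles, so the nodes never overlap and no bounding boxes need adjusting on update.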

 

References:

[1] Jaewoo Shin et al. The LSM RUM-Tree: A Log Structured Merge R-Tree for Update-intensive Spatial Workloads. ICDE 2021

[2] https://en.wikipedia.org/wiki/R-tree

[3]  https://en.wikipedia.org/wiki/Quadtree

6 Implementation of QBS-tree

 

References:

[1] Zonglei Zhang et al. QBS-tree: A Spatial Index with High Update Efficiency for Real-time Processing System. IEEE HPCC/SmartCity/DSS, 2019

7 Implementation of Proportionality in Spatial Keyword Search

 

References:

[1] Georgios Kalamatianos et al. Proportionality in Spatial Keyword Search. SIGMOD 2021

8 Implementation of OSS, ORD and ORU Skyline operators (Java)

 

Note: The code is provided in C++ by the authors. The goal is to implement these in Java.

 

References:

[1] Kyriakos Mouratidis et al. Marrying Top-k with Skyline Queries: Relaxing the Preference Input while Producing Output of Controllable Size. SIGMOD 2021

9 Implementation of Data Canopy

 

References:

[1] Abdul Wasay, Xinding Wei, Niv Dayan, Stratos Idreos. Data Canopy: Accelerating Exploratory Statistical Analysis. SIGMOD 2017

10 Implementation of Spatio-Temporal Aggregation Using Sketches

 

References:

[1] Yufei Tao et al. Spatio-Temporal Aggregation Using Sketches. ICDE 2004

11 Implementation of FPGA based B+ Tree index

 

References:

[1] Dennis Heinrich et al. FPGA Approach for a B+ Tree in a Semantic Web Database System. International Symposium on Reconfigurable Communication-centric Systems-on-Chip (ReCoSoC), 2015

12 Implementation of FPGA-Based SQL Query Accelerator

 

References:

[1] Christopher Dennl et al. On-the-fly Composition of FPGA-Based SQL Query Accelerators Using A Partially Reconfigurable Module Library. International Symposium on Field-Programmable Custom Computing Machines. 2012

[2] Louis Woods et al. Ibex: an intelligent storage engine with support for advanced SQL offloading. VLDB 2014

13 Implementation of Brie index for Jatalog (a Java Datalog engine)

 

References:

[1] https://en.wikipedia.org/wiki/Datalog

[2] https://github.com/wernsey/Jatalog

[3] Herbert Jordan et al. Brie: A Specialized Trie for Concurrent Datalog. PMAM 2019

14 Implementation of Bit-Matrix index for Jatalog (a Java Datalog engine)

 

References:

[1] https://en.wikipedia.org/wiki/Datalog

[2] https://github.com/wernsey/Jatalog

[3] Zhiwei Fan et al. Scaling-Up In-Memory Datalog Processing: Observations and Techniques. VLDB 2019

15 Implementation of At-the-time and Back-in-time Persistent Sketches

 

References:

[1] Benwei Shi et al. At-the-time and Back-in-time Persistent Sketches. SIGMOD 2021

16 Implementation of compact sketch

 

References:

[1] Rundong Li et al. Building Fast and Compact Sketches for Approximately Multi-Set Multi-Membership Querying. SIGMOD 2021

17 Implementation of an In-Memory Updatable Bitmap Index

 

References:

[1] Manos Athanassoulis et al. UpBit: Scalable In-Memory Updatable Bitmap Indexing. SIGMOD 2016

18 Implementation of Euler Histogram Tree

 

References:

[1] Hairuo Xie et al. Euler Histogram Tree: A Spatial Data Structure for Aggregate Range Queries on Vehicle Trajectories. SIGSPATIAL 2014

 

19 Implementation of Vector Quotient Filters

 

References:

[1] Prashant Pandey et al. Vector Quotient Filters: Overcoming the Time/Space Trade-Off in Filter Design. SIGMOD 2021