Projects
Research Projects
- Data ETL for Deep Learning Model Selection Tasks with Dask, UCSD, Jan ‘22 - present
- Designing an extensible, generic data loader for deep learning tasks on multimodal datasets.
- Developing a data-parallel approach for scalable data preprocessing ETL.
- Integrate Cerebro (deep learning model selection system) with Dask, UCSD, Sep ‘21 - Dec ‘21
- Developed a new backend for Cerebro using Dask that completely removes the Spark dependency while still implementing MOP (model hopper parallelism).
- Demonstrated the parallelism achieved by MOP with the Dask backend in a distributed cluster environment.
- Evaluated system performance on the Criteo dataset, analyzed the results extensively, and compared them with the Spark backend.
- Project featured on: ADALab Cerebro
- Parallelization of K-Medoids Clustering Algorithm, BITS Pilani, Aug ‘18 - Dec ‘18
- Developed parallel K-Medoids algorithm using Adaptive Gridding for spatial partitioning in Spark Java.
- Improved the algorithm’s efficiency in selecting initial medoids without compromising the clustering error (for any skewed data set, the average sample size is 10x smaller than that of the state-of-the-art PAMAE).
- Parallelization of Union-find Algorithm, BITS Pilani, Jan ‘18 - Oct ‘18
- Developed a communication-efficient distributed Union-find algorithm using Open MPI in C++.
- Reduced the number of message passing operations between processes (15% reduction) using deferred bulk updates.
Course Projects
- Implementing and simulating performance of branch predictors, Computer Architecture, Oct ‘21 - Dec ‘21
- Implemented the GShare, Tournament and Perceptron branch predictors in C.
- Compared the performance of the branch predictors, both qualitatively and quantitatively, on the Championship Branch Prediction traces.
- Showed an improvement in branch prediction performance (10% on average) when using the Perceptron predictor over the Tournament and GShare predictors.
- Kinship Verification from Facial Images of Parents and their Kids, Machine Learning, Nov ‘18 - Dec ‘18
- Compared existing techniques for Kinship Verification (Artificial Neural Networks, SVM, CNN, ensemble of SVMs), both qualitatively and quantitatively, in R using the Keras library.
- Used the results to design and implement an ensemble of metric-learning-based CNN architectures.
- Improved accuracy by 2.8% on the KinFaceW-1 dataset and by 3.1% on the KinFaceW-2 dataset.
- Compare and Contrast Linear Regression Models, Machine Learning, Sep ‘18 - Oct ‘18
- Linear regression is used to find the relationship between a target variable and one or more predictors.
- Compared Simple Linear Regression and Bayesian Linear Regression models both qualitatively and quantitatively.
- Data Analysis and Modelling of Student Course Grades, Machine Learning, Sep ‘18 - Oct ‘18
- Created a Bayesian Belief Network using the bnlearn library in R based on student grades, incorporating various hypotheses about how attributes in the data are related.
- The network can answer complex queries without being adversely affected by missing values, irrelevant attributes, or the size of the data.
- The network can be used to assess teaching pedagogies by modelling natural language queries as conditional probabilities.
- Foster’s Design Methodology on a Distributed Data Structure (RAQ), Parallel Computing, Apr ‘18 - May ‘18
- Designed a parallel algorithm that lets peers join and leave a peer-to-peer network (represented as a RAQ data structure) using Foster’s Design methodology, with a commodity cluster as the target platform.
- Obtained logarithmic speedup and improved time complexity of joining mechanism compared to sequential execution.
- Compiler for C-Like Language, Compiler Construction, Jan ‘18 - Apr ‘18
- Developed the lexical analyzer, syntax analyzer, semantic analyzer, and code generator modules of a compiler, written in C, for a C-like language.
- Implemented functionalities to support simple functions, simple matrix operations, and conditional statements.
- Designing Sieve of Eratosthenes Algorithm for Distributed Memory Systems, Parallel Computing, Feb ‘18 - Mar ‘18
- Designing Word Document Index for Distributed Memory Systems, Parallel Computing, Feb ‘18 - Mar ‘18
- Design Word Document Index Creation for Shared Memory Systems, Parallel Computing, Jan ‘18 - Feb ‘18
- Designed a PRAM algorithm for document index creation using OpenMP in C++ for a UNIX-based file system.
- Developed a scalable divide-and-conquer algorithm and ran it on a file system with up to 160,000 files.
- Reduced time taken to create an index from 43 seconds on 1 CPU core to 9 seconds on 32 CPU cores.
- Implement and Validate “AnyDBC” Algorithm (a variant of DBSCAN), Parallel Computing, Jun ‘17 - Jul ‘17
- Implemented the sequential AnyDBC algorithm in C++ to compare its execution time against that of DBSCAN.
- The algorithm performs fewer range queries than DBSCAN, produces an approximate result quickly, and refines that result over time until the correct solution is obtained.
- Night Canteen Data Analysis, Data Mining, Nov ‘17 - Dec ‘17
- Analysed a dataset comprising all transactions at the Night Canteen using Data Mining techniques and drew inferences from it.
- Proposed questions that can be answered using Data Mining techniques.
- Analysis of Rank-Indexed Hashing, Data Structures and Algorithms, Nov ‘17 - Dec ‘17
- Analysed the research paper “Rank-Indexed Hashing: A Compact Construction of Bloom Filters and Variants”, which describes a new hash table that performs similarly to a counting Bloom filter while using less space.
- The analysis covers an explanation of rank-indexed hashing, pseudocode for insert and find, time and space complexity analysis, and the technique’s advantages, limitations, and applications.
- Microprocessor: Design and Simulation, Microprocessors and Interfacing, Mar ‘17 - Apr ‘17
- Designed a microprocessor-based system for an automatic washing machine using the 80x86 processor.
- Designed and simulated the circuit on Proteus with assembly code written in MASM.