Apache Spark and PySpark
Some reading materials:
Spark Documentation: Spark Official Documentation
PySpark Documentation: PySpark API Documentation
Books: "Learning Spark", "Advanced Analytics with Spark"

Big Data: Data volumes in TBs, PBs and more. Hadoop is an older system used for processing big data; it uses a file system to store and process data.

Characteristics:
Volume - size of data (bytes < KB < MB < GB < TB < PB ...). Ex: digital payments, social media data, e-commerce data, etc.
Velocity - speed at which data travels and arrives
Variety - Excel, RDBMS, txt, JSON, XML, HTML, documents, images, audio, video, geo maps, etc.
Veracity - quality and accuracy of data; whether the data is trustworthy or not
Value - actionable information, or any data that provides meaning for business decisions or can be considered useful

Company requirements:
Data storage
Data processing speed
Scalability

Hadoop: (HDFS - filesystem) (MapReduce (computation) - programming framework) Hand...
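To make the MapReduce idea concrete, here is a minimal sketch of the model in plain Python (not Hadoop's actual Java API): a map phase emits (word, 1) pairs, pairs are grouped by key, and a reduce phase sums the counts. The function names `map_phase` and `reduce_phase` are illustrative, not part of any framework.

```python
from itertools import groupby
from operator import itemgetter

def map_phase(line):
    # Map: emit a (word, 1) pair for every word in the line
    return [(word.lower(), 1) for word in line.split()]

def reduce_phase(pairs):
    # Shuffle: sort/group the pairs by key (the word),
    # then Reduce: sum the counts for each word
    pairs.sort(key=itemgetter(0))
    return {word: sum(count for _, count in group)
            for word, group in groupby(pairs, key=itemgetter(0))}

lines = ["spark processes big data", "spark uses memory"]
intermediate = [pair for line in lines for pair in map_phase(line)]
counts = reduce_phase(intermediate)
print(counts)  # "spark" appears twice, every other word once
```

In real Hadoop, the map and reduce steps run in parallel across many machines, and the shuffle moves intermediate pairs over the network; Spark follows the same logical model but keeps intermediate data in memory.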