Prasad Khode
Understanding SQL Self Joins with Scenarios
Self joins can often be an overlooked aspect of SQL, yet they are incredibly useful for querying hierarchical or related data within a…
Oct 10

Prasad Khode
Grouping a Spark DataFrame and Creating JSON Lists in Scala
In this post, we’ll explore how to group a Spark DataFrame by a specific column and create a list of JSON objects from other columns. This…
Oct 9

Prasad Khode
Simplifying Dynamic Partition Overwrite in Spark: A Guide to PartitionOverwriteMode
When you’re dealing with large amounts of data in Apache Spark, managing your data efficiently becomes important. One way to do this is by…
Oct 9

Prasad Khode
How to Run Ollama Locally Using Docker
Running AI models locally can be a great way to leverage the power of machine learning without relying on cloud services. In this guide, I…
Sep 2

Prasad Khode
Setting Up Apache Airflow for Local Development
In this guide, we’ll walk through setting up Apache Airflow on a local machine using Conda to manage the Python environment.
Aug 29

Prasad Khode
Handling Dynamic JSON Schemas in Apache Spark: A Step-by-Step Guide Using Scala
In the world of big data, working with JSON data is a common task. However, handling JSON schemas that may vary or are not predefined can…
Aug 21

Prasad Khode
How to Retrieve the Input File Name as a Column Value in Apache Spark
When working with large datasets in Apache Spark, there are scenarios where you might need to identify the origin of each row of data…
Aug 16

Prasad Khode
Adding External and Maven JARs to Spark Shell for Ad-Hoc Analysis
When performing ad-hoc data analysis using Apache Spark, you may encounter situations where you need additional libraries to process your…
Aug 13

Prasad Khode
Handling Invalid Column Names in Spark: A Step-by-Step Guide
In data processing, it’s common to encounter files where the first line contains invalid or dummy column names, which can disrupt the…
Aug 12