The error occurs because you are writing the DataFrame to the same location for every item in your givenItemList collection. Normally, doing that fails with the error:
OutputDirectory already exists
But because forEach on a parallel stream processes the items on several threads at once, the writes collide with each other and you get this error instead. You can give each item its own directory, like this:
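    givenItemList.parallelStream().forEach(item -> {
        // Query from the question, with the item name substituted in
        String query = String.format("select %1$s as itemCol, avg(%1$s) as mean group by year", item);
        Dataset<Row> resultDs = sparkSession.sql(query);
        // Append the item name so every item gets a distinct output directory
        saveDsToHdfs(String.format("%s_%s", hdfsPath, item), resultDs);
    });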
Alternatively, you can write each item to its own subdirectory under hdfsPath, like this:
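    givenItemList.parallelStream().forEach(item -> {
        // Query from the question, with the item name substituted in
        String query = String.format("select %1$s as itemCol, avg(%1$s) as mean group by year", item);
        Dataset<Row> resultDs = sparkSession.sql(query);
        // Use the item name as a subdirectory under hdfsPath
        saveDsToHdfs(String.format("%s/%s", hdfsPath, item), resultDs);
    });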
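If you want to check the path scheme before running any Spark job, something like the following may help. It is only a sketch without Spark: the class `ItemPaths`, the helpers `outputPathFor` and `planOutputs`, and the sample path and item names are all made up for illustration, and it assumes the items are plain strings. It builds one output path per item on a parallel stream and fails fast if two items would end up in the same directory.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ItemPaths {

    // Hypothetical helper: one subdirectory per item under the base path.
    public static String outputPathFor(String hdfsPath, String item) {
        return String.format("%s/%s", hdfsPath, item);
    }

    // Computes the output path for every item in parallel and rejects collisions.
    // Returns a map from output path to the item that will be written there.
    public static Map<String, String> planOutputs(String hdfsPath, List<String> items) {
        Map<String, String> planned = new ConcurrentHashMap<>();
        items.parallelStream().forEach(item -> {
            String path = outputPathFor(hdfsPath, item);
            // putIfAbsent is atomic, so this is safe across the stream's threads
            String previous = planned.putIfAbsent(path, item);
            if (previous != null) {
                throw new IllegalStateException(
                        "Items '" + previous + "' and '" + item + "' map to the same path: " + path);
            }
        });
        return planned;
    }

    public static void main(String[] args) {
        // Sample values, assumed for illustration only
        Map<String, String> planned =
                planOutputs("/data/out", List.of("price", "quantity", "discount"));
        planned.forEach((path, item) -> System.out.println(item + " -> " + path));
    }
}
```

In the real job you would call saveDsToHdfs with the path returned by `outputPathFor` instead of just recording it in the map.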