I'm trying to use an AWS Glue Streaming ETL job to read and write with Trigger.AvailableNow and Kinesis Data Streams, the same way I did with Kafka, but no records are processed and all the checkpoint files contain inconsistent data for startingPosition and iteratorType.

streaming_df: DataFrame = spark \
.readStream \
.format("kinesis") \
.option("streamName", "oxg-cdp-cdc-stream-sbx") \
.option("endpointUrl", ";) \
.option("region", "eu-central-1")\
.option("startingPosition", "earliest")\
.load()


streaming_df \
.writeStream \
.option("checkpointLocation", <s3_location>") \
.trigger(availableNow=True) \
.foreachBatch(for_each_batch_funtion) \
.start() \
.awaitTermination()

It seems that availableNow doesn't work properly with Kinesis, but I can't find any official documentation stating that.
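If the Glue Kinesis connector really does not honor availableNow, one fallback worth testing is a single-shot run with Trigger.Once, which also processes whatever is currently available and then stops on its own. This is only a sketch under that assumption; the checkpoint path is a placeholder and for_each_batch_funtion stands for the same batch function as above.

from pyspark.sql import DataFrame, SparkSession

spark = SparkSession.builder.getOrCreate()  # in a Glue job: glueContext.spark_session

# Same Kinesis source options as in the question.
streaming_df: DataFrame = spark \
    .readStream \
    .format("kinesis") \
    .option("streamName", "oxg-cdp-cdc-stream-sbx") \
    .option("endpointUrl", "https://kinesis.eu-central-1.amazonaws.com") \
    .option("region", "eu-central-1") \
    .option("startingPosition", "earliest") \
    .load()

# Trigger.Once runs a single micro-batch over the currently available data
# and then stops, so the job still terminates on its own like availableNow.
query = streaming_df \
    .writeStream \
    .option("checkpointLocation", "<s3_location>") \
    .trigger(once=True) \
    .foreachBatch(for_each_batch_funtion) \
    .start()

query.awaitTermination()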

1 Answer


Here is a Spark GitHub example: Spark-Kinesis

I also found the official documentation from Spark: Spark Streaming + Kinesis
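Note that this guide covers the legacy DStream receiver rather than the Structured Streaming kinesis source used in the question. As a rough sketch of what it documents, assuming the spark-streaming-kinesis-asl package is on the classpath and a Spark version whose PySpark still ships these bindings:

from pyspark import SparkContext, StorageLevel
from pyspark.streaming import StreamingContext
from pyspark.streaming.kinesis import KinesisUtils, InitialPositionInStream

sc = SparkContext(appName="kinesis-dstream-sketch")
ssc = StreamingContext(sc, 10)  # 10-second micro-batches

# Receiver-based Kinesis DStream; the app name is used for the DynamoDB
# lease/checkpoint table created by the Kinesis Client Library.
kinesis_stream = KinesisUtils.createStream(
    ssc,
    "kinesis-dstream-sketch",                      # Kinesis application name (hypothetical)
    "oxg-cdp-cdc-stream-sbx",                      # stream name from the question
    "https://kinesis.eu-central-1.amazonaws.com",  # endpoint URL
    "eu-central-1",                                # region
    InitialPositionInStream.TRIM_HORIZON,          # roughly equivalent to "earliest"
    10,                                            # checkpoint interval in seconds
    StorageLevel.MEMORY_AND_DISK_2)

kinesis_stream.pprint()

ssc.start()
ssc.awaitTermination()

The initial position and checkpoint interval here are illustrative values, not recommendations.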

Hope this helps.
