Extracts retweet cascades from a JSONL file in which each line is a tweet JSON object. For the structure of a tweet object, refer to the Twitter developer documentation: https://developer.twitter.com/en/docs/tweets/data-dictionary/overview/tweet-object

parse_raw_tweets_to_cascades(
  path,
  batch = 1e+05,
  cores = 1,
  output_path = NULL,
  keep_user = F,
  keep_absolute_time = F,
  progress = T,
  return_as_list = T,
  save_temp = F
)

Arguments

path

Path to the JSONL file of tweets.

batch

Number of tweets read and processed at each iteration; choose a value that fits your available memory. Defaults to 1e+05, i.e. at most 100,000 tweets per iteration.

cores

Number of cores to be used for processing each batch in parallel.

output_path

If provided, the index.csv and data.csv files that define the cascades will be generated. In index.csv, each row is a cascade whose events can be obtained from data.csv using the corresponding indices (start_ind to end_ind). Defaults to NULL.

keep_user

If TRUE, Twitter user ids are kept. Defaults to FALSE.

keep_absolute_time

If TRUE, the absolute tweeting times are kept. Defaults to FALSE.

progress

If TRUE (default), progress is reported.

return_as_list

If TRUE (default), a list of cascades (data.frames) is returned.
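
The relationship between index.csv and data.csv described under output_path can be illustrated with a minimal sketch. It uses small in-memory data.frames rather than real output files. The event columns `time` and `magnitude`, the helper `get_cascade`, and the treatment of start_ind and end_ind as 1-based inclusive row numbers are assumptions made for illustration, not confirmed details of the generated files.

```r
# Toy stand-in for data.csv: one row per event.
# The column names `time` and `magnitude` are assumed for illustration.
data <- data.frame(
  time = c(0, 12, 40, 0, 5),
  magnitude = c(100, 20, 35, 800, 15)
)

# Toy stand-in for index.csv: one row per cascade.
index <- data.frame(
  start_ind = c(1, 4),
  end_ind = c(3, 5)
)

# Hypothetical helper: slice the events of the i-th cascade out of `data`.
# Assumes start_ind and end_ind are 1-based, inclusive row numbers.
get_cascade <- function(i, index, data) {
  rows <- seq(index$start_ind[i], index$end_ind[i])
  cascade <- data[rows, , drop = FALSE]
  rownames(cascade) <- NULL
  cascade
}

# Rebuild a list of cascades, one data.frame per row of `index`.
cascades <- lapply(seq_len(nrow(index)), get_cascade, index = index, data = data)
```

With real output, the two data.frames would instead be read from the files written to output_path, for example with read.csv.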

Value

If return_as_list is TRUE, a list of data.frames in which each data.frame is a retweet cascade. Otherwise nothing is returned.
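
Examples

A minimal sketch of a typical call, using only the arguments listed above. The file name tweets.jsonl is a hypothetical placeholder, and the call is guarded so that it runs only when the function is available in the current session and the file exists.

```r
# Hypothetical input file: one tweet JSON object per line.
tweets_path <- "tweets.jsonl"

if (exists("parse_raw_tweets_to_cascades") && file.exists(tweets_path)) {
  cascades <- parse_raw_tweets_to_cascades(
    path = tweets_path,
    batch = 1e4,           # smaller batches lower peak memory use
    cores = 1,
    keep_user = TRUE,      # keep Twitter user ids in each cascade
    progress = FALSE,
    return_as_list = TRUE  # return a list of data.frames
  )
  print(length(cascades))    # number of cascades extracted
  print(head(cascades[[1]])) # first events of the first cascade
}
```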