parse_raw_tweets_to_cascades.Rd
Extracts retweet cascades from a JSONL file in which each line is a tweet JSON object. For the tweet object format, refer to the Twitter developer documentation: https://developer.twitter.com/en/docs/tweets/data-dictionary/overview/tweet-object
parse_raw_tweets_to_cascades( path, batch = 1e+05, cores = 1, output_path = NULL, keep_user = F, keep_absolute_time = F, progress = T, return_as_list = T, save_temp = F )
path | File path to the tweets jsonl file |
---|---|
batch | Number of tweets to read and process at each iteration; choose a value that fits your available memory. Defaults to at most 100000 (1e+05) tweets per iteration. |
cores | Number of cores to be used for processing each batch in parallel. |
output_path | If provided, the index.csv and data.csv files that define the cascades are written to this location. In index.csv, each row is a cascade whose events can be obtained from data.csv using the corresponding indices (start_ind to end_ind). Defaults to NULL. |
keep_user | If TRUE, Twitter user ids are kept in the output. Defaults to FALSE. |
keep_absolute_time | If TRUE, the absolute tweeting times are kept. Defaults to FALSE. |
progress | If TRUE (default), progress is reported during processing. |
return_as_list | If TRUE (default), a list of cascades (data.frames) is returned. |
If return_as_list is TRUE, a list of data.frames is returned, where each data.frame is a retweet cascade. Otherwise nothing is returned.
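The index.csv and data.csv layout described for output_path can be read back into one data.frame per cascade. The sketch below is illustrative only and does not call parse_raw_tweets_to_cascades: it writes two toy files with the same names and then slices data.csv by start_ind and end_ind. The column names time and magnitude, the toy values, and the reading of the indices as 1-based and inclusive are assumptions, not taken from this page.

```r
# Toy stand-ins for the two files written when output_path is set.
# Only start_ind and end_ind come from the documentation above;
# 'time' and 'magnitude' are hypothetical column names.
out_dir <- tempfile("cascades_")
dir.create(out_dir)

data <- data.frame(
  time      = c(0, 12, 40, 0, 5),
  magnitude = c(100, 20, 35, 80, 10)
)
index <- data.frame(
  start_ind = c(1, 4),
  end_ind   = c(3, 5)
)
write.csv(data,  file.path(out_dir, "data.csv"),  row.names = FALSE)
write.csv(index, file.path(out_dir, "index.csv"), row.names = FALSE)

# Read the files back and rebuild one data.frame per cascade.
index_in <- read.csv(file.path(out_dir, "index.csv"))
data_in  <- read.csv(file.path(out_dir, "data.csv"))

cascades <- lapply(seq_len(nrow(index_in)), function(i) {
  # Assumes start_ind and end_ind are 1-based, inclusive row numbers.
  rows <- index_in$start_ind[i]:index_in$end_ind[i]
  data_in[rows, , drop = FALSE]
})

length(cascades)      # number of cascades rebuilt from index.csv
nrow(cascades[[1]])   # number of events in the first cascade
```

If the files written by the function use a different index base or different column names, only the slicing line and the column references need to change.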