Trailer Reimagined: An Innovative, Llm-DRiven, Expressive Automated Movie Summary framework (TRAILDREAMS)

Roberto Balestri 1 * , Pasquale Cascarano 1, Mirko Degli Esposti 1, Guglielmo Pescatore 1
1 Università di Bologna, Bologna, ITALY
* Corresponding Author
Online Journal of Communication and Media Technologies, Volume 15, Issue 3, Article No: e202524. https://doi.org/10.30935/ojcmt/16669
OPEN ACCESS   Published online: 28 Jul 2025

ABSTRACT

This paper introduces TRAILDREAMS, a framework that uses a large language model (LLM) to automate movie trailer production. The LLM selects key visual sequences and impactful dialogue, and guides the generation of audio elements such as music and voiceovers, with the goal of producing engaging, visually appealing trailers efficiently. In comparative evaluations, TRAILDREAMS surpasses current state-of-the-art trailer generation methods in viewer ratings, although it still falls short of real, human-crafted trailers. While TRAILDREAMS marks a significant advance in automated creative processes, further improvements are needed to close the quality gap with traditional trailers.
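The dialogue-selection step described above can be viewed as a ranking problem: each subtitle line is scored for "trailer-worthiness" and the top-scoring lines are kept. The minimal Python sketch below illustrates only this idea; `score_line` is a hypothetical keyword heuristic standing in for the framework's actual LLM call, and all names here are illustrative, not part of TRAILDREAMS itself:

```python
# Hypothetical sketch of a dialogue-selection stage. In an LLM-driven
# pipeline the score would come from prompting the model; here a simple
# keyword heuristic stands in so the example is self-contained.

def score_line(line: str) -> float:
    """Stand-in for an LLM 'trailer-worthiness' score in [0, 1]."""
    punchy = {"never", "destiny", "fight", "everything", "time"}
    words = line.lower().split()
    hits = sum(1 for w in words if w.strip(".,!?") in punchy)
    # Normalize by length: trailers favor short, punchy lines.
    return hits / max(len(words), 1)

def select_dialogue(subtitles: list[str], k: int = 2) -> list[str]:
    """Return the k highest-scoring lines, preserving original order."""
    ranked = sorted(subtitles, key=score_line, reverse=True)[:k]
    return [s for s in subtitles if s in ranked]

lines = [
    "We were not meant to save the world.",
    "Time is running out, and we must fight.",
    "Pass the salt, please.",
]
print(select_dialogue(lines, k=2))
```

In a real pipeline the same ranking interface would also apply to visual shots, with the score coming from the model's judgment rather than a fixed keyword list.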

CITATION

Balestri, R., Cascarano, P., Degli Esposti, M., & Pescatore, G. (2025). Trailer Reimagined: An Innovative, Llm-DRiven, Expressive Automated Movie Summary framework (TRAILDREAMS). Online Journal of Communication and Media Technologies, 15(3), e202524. https://doi.org/10.30935/ojcmt/16669
