Trailer Reimagined: An Innovative, Llm-DRiven, Expressive Automated Movie Summary framework (TRAILDREAMS)

Roberto Balestri 1 * , Pasquale Cascarano 1, Mirko Degli Esposti 1, Guglielmo Pescatore 1
1 Università di Bologna, Bologna, ITALY
* Corresponding Author
Online Journal of Communication and Media Technologies, Volume 15, Issue 3, Article No: e202524. https://doi.org/10.30935/ojcmt/16669
OPEN ACCESS   Published online: 28 Jul 2025

ABSTRACT

This paper introduces TRAILDREAMS, a framework that uses a large language model (LLM) to automate movie trailer production. The LLM selects key visual sequences and impactful dialogue, and guides the generation of audio elements such as music and voiceovers, with the goal of producing engaging, visually appealing trailers efficiently. In comparative evaluations, TRAILDREAMS surpasses current state-of-the-art trailer generation methods in viewer ratings, although it still falls short of real, human-crafted trailers. While TRAILDREAMS marks a significant advance in automated creative processes, further improvements are needed to close the quality gap with traditional trailers.
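The dialogue-selection step described above can be viewed as a ranking problem: each subtitle line is scored for "trailer-worthiness" and the top-scoring lines are kept. The minimal Python sketch below illustrates only this idea; `score_line` is a hypothetical keyword heuristic standing in for the framework's actual LLM call, and all names here are illustrative, not part of TRAILDREAMS itself:

```python
# Hypothetical sketch of a dialogue-selection stage. In an LLM-driven
# pipeline the score would come from prompting the model; here a simple
# keyword heuristic stands in so the example is self-contained.

def score_line(line: str) -> float:
    """Stand-in for an LLM 'trailer-worthiness' score in [0, 1]."""
    punchy = {"never", "destiny", "fight", "everything", "time"}
    words = line.lower().split()
    hits = sum(1 for w in words if w.strip(".,!?") in punchy)
    # Normalize by length: trailers favor short, punchy lines.
    return hits / max(len(words), 1)

def select_dialogue(subtitles: list[str], k: int = 2) -> list[str]:
    """Return the k highest-scoring lines, preserving original order."""
    ranked = sorted(subtitles, key=score_line, reverse=True)[:k]
    return [s for s in subtitles if s in ranked]

lines = [
    "We were not meant to save the world.",
    "Time is running out, and we must fight.",
    "Pass the salt, please.",
]
print(select_dialogue(lines, k=2))
```

In a real pipeline the same ranking interface would also apply to visual shots, with the score coming from the model's judgment rather than a fixed keyword list.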

CITATION

Balestri, R., Cascarano, P., Degli Esposti, M., & Pescatore, G. (2025). Trailer Reimagined: An Innovative, Llm-DRiven, Expressive Automated Movie Summary framework (TRAILDREAMS). Online Journal of Communication and Media Technologies, 15(3), e202524. https://doi.org/10.30935/ojcmt/16669
