Show simple item record

dc.contributor.advisorGarzón A, Wilmer
dc.contributor.advisorBenavides Navarro, Luis Daniel
dc.contributor.authorDíaz Chica, Luis Felipe
dc.date.accessioned2023-10-03T20:32:29Z
dc.date.available2023
dc.date.available2023-10-03T20:32:29Z
dc.date.issued2023
dc.identifier.urihttps://repositorio.escuelaing.edu.co/handle/001/2623
dc.descriptionWe introduce the concept of "implicit architectural patterns," which we define as the knowledge related to architectural patterns that is not explicitly expressed in the code. We build a biased labeled dataset of 14,000 files with modern cloud architectural patterns. We used the dataset and fine-tuning techniques to train CodeBERT, UnixCode, CodeT5, and RoBERTa, LLMs pre-trained on code. The trained models achieved an F1-score of 96% on average. We generated a second, unknown dataset for testing the fine-tuned models, revealing consistent predictions across the models. Notably, in their original state, the pre-trained models could not accurately identify and classify patterns. However, after fine-tuning, the models substantially improved in accuracy when classifying modern architectural patterns. We found that the most common patterns present in GitHub repositories are event-driven (34%), serverless (30%), object storage (16%), and microservices (10%). We used the analysis results to further investigate relationships between IaC components and cloud architectural patterns.eng
dc.description.abstractInfrastructure as Code (IaC) is a model for managing cloud resources through code specifications. In our research we aim to extract implicit knowledge from IaC projects about the architecture patterns being used in the open source community. To this end, we surveyed the state of the art in static code analysis with Large Language Models (LLMs), and then applied knowledge transfer techniques to a set of pre-trained models to categorize the architecture patterns found in IaC projects. Knowledge transfer is applied using fine-tuning and weak supervision. We defined a rule system that, based on the infrastructure components present in a project, assigns a candidate architecture pattern. This rule system was used to build an initial dataset of 13,200 files in 4 programming languages, labeled across 11 architecture pattern categories. We found a significant improvement in the categorization of architecture patterns after applying knowledge transfer to the models pre-trained on code. UnixCode and CodeBERT reached an F1-score of 0.96 during training. After applying the models to an unknown dataset, we found that the most used patterns in the open source community (GitHub) are event-driven, serverless, microservices, and object storage. The predominant programming language in Cloud Development Kit (CDK) projects is TypeScript, followed by Python.
We observed good classification performance for the patterns using seq2seq as the code representation technique and pre-trained models based on RoBERTa.spa
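The abstract above describes a rule system that maps the infrastructure components found in an IaC project to a candidate architecture pattern label. A minimal, purely illustrative sketch of that idea follows; the resource type names and rule sets are hypothetical placeholders, not the thesis's actual rules:

```python
# Hypothetical sketch of rule-based labeling of IaC files by architecture pattern.
# The resource types and rules below are illustrative, not the thesis's rule set.

PATTERN_RULES = {
    "serverless": {"aws_lambda_function", "aws_apigatewayv2_api"},
    "event-driven": {"aws_sns_topic", "aws_sqs_queue", "aws_cloudwatch_event_rule"},
    "object-storage": {"aws_s3_bucket"},
    "microservices": {"aws_ecs_service", "aws_lb"},
}

def label_file(resource_types: set) -> str:
    """Return the pattern whose rule set overlaps most with the file's resources."""
    scores = {
        pattern: len(rule_set & resource_types)
        for pattern, rule_set in PATTERN_RULES.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unlabeled"

# A file declaring a Lambda function, an S3 bucket, and an API gateway
# matches the serverless rules most strongly.
print(label_file({"aws_lambda_function", "aws_s3_bucket", "aws_apigatewayv2_api"}))
# prints: serverless
```

Labels produced this way are noisy by construction, which is why the thesis pairs the rule system with weak supervision rather than treating the labels as ground truth.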
dc.format.extent101 páginasspa
dc.format.mimetypeapplication/pdfspa
dc.language.isospaspa
dc.publisherEscuela Colombiana de Ingenieríaspa
dc.rights.urihttps://creativecommons.org/licenses/by/4.0/spa
dc.titleRevelando patrones arquitectónicos implícitos en Infraestructura como código a través de la transferencia de conocimiento de repositorio de códigoeng
dc.typeTrabajo de grado - Maestríaspa
dc.type.versioninfo:eu-repo/semantics/publishedVersionspa
oaire.accessrightshttp://purl.org/coar/access_right/c_abf2spa
oaire.awardtitleRevelando patrones arquitectónicos implícitos en Infraestructura como Código (IaC) a través de la transferencia de conocimientos del repositorio de códigospa
oaire.versionhttp://purl.org/coar/version/c_970fb48d4fbd8a85spa
dc.description.degreelevelMaestríaspa
dc.description.degreenameMagíster en Informáticaspa
dc.identifier.urlhttps://catalogo.escuelaing.edu.co/cgi-bin/koha/opac-detail.pl?biblionumber=23583
dc.publisher.facultyIngeniería de Sistemasspa
dc.publisher.placeBogotáspa
dc.publisher.programMaestría en Informáticaspa
dc.relation.indexedN/Aspa
dc.relation.referencesAhmad, A., Jamshidi, P., Pahl, C., 2013. A framework for acquisition and application of software architecture evolution knowledge.spa
dc.relation.referencesAlexander, C., Ishikawa, S., Silverstein, M., 1977. A Pattern Language: Towns, Buildings, Construction. Center for Environmental Structure Berkeley, Calif.: Center for Environmental Structure series, OUP USAspa
dc.relation.referencesAlom, M.Z., Taha, T.M., Yakopcic, C., Westberg, S., Sidike, P., Nasrin, M.S., Hasan, M., Van Essen, B.C., Awwal, A.A.S., Asari, V.K., 2019. A state-of-the-art survey on deep learning theory and architectures. Electronics 8.spa
dc.relation.referencesAlon, U., Brody, S., Levy, O., Yahav, E., 2019. code2seq: Generating sequences from structured representations of code.spa
dc.relation.referencesAlon, U., Zilberstein, M., Levy, O., Yahav, E., 2018a. code2vec: Learning distributed representations of code.spa
dc.relation.referencesAlon, U., Zilberstein, M., Levy, O., Yahav, E., 2018b. A general path-based representation for predicting program properties.spa
dc.relation.referencesAlzubaidi, L., Zhang, J., Humaidi, A.J., Al-Dujaili, A., Duan, Y., Al-Shamma, O., Santamaría, J., Fadhel, M.A., Al-Amidie, M., Farhan, L., 2021. Review of deep learning: Concepts, CNN architectures, challenges, applications, future directions. Journal of Big Data 8.spa
dc.relation.referencesThe future of cloud development - Ampt — getampt.com. https://www.getampt.com/blog/introducing-ampt/.spa
dc.relation.referencesAviv, I., Gafni, R., Sherman, S., Aviv, B., Sterkin, A., Bega, E., 2023. Infrastructure from code: The next generation of cloud lifecycle automation. IEEE Software 40, 42–49.spa
dc.relation.referencesBabar, M., Gorton, I., Jeffery, R., 2005. Capturing and using software architecture knowledge for architecture-based software development, in: Fifth International Conference on Quality Software (QSIC’05), pp. 169–176.spa
dc.relation.referencesBecker, M., Liang, S., Frank, A., 2021. Reconstructing implicit knowledge with language models, in: Workshop on Knowledge Extraction and Integration for Deep Learning Architectures; Deep Learning Inside Out.spa
dc.relation.referencesBorovits, N., Kumara, I., Krishnan, P., Palma, S.D., Di Nucci, D., Palomba, F., Tamburri, D.A., van den Heuvel, W.J., 2020. Deepiac: Deep learning-based linguistic anti-pattern detection in iac, in: Proceedings of the 4th ACM SIGSOFT International Workshop on Machine-Learning Techniques for Software-Quality Evaluation.spa
dc.relation.referencesBriem, J.A., Smit, J., Sellik, H., Rapoport, P., 2019. Using distributed representation of code for bug detection.spa
dc.relation.referencesBrock, A., Lim, T., Ritchie, J.M., Weston, N., 2017. Freezeout: Accelerate training by progressively freezing layersspa
dc.relation.referencesBrown, T.B., et al., 2020. Language models are few-shot learners.spa
dc.relation.referencesCetinic, E., Lipic, T., Grgic, S., 2018. Fine-tuning convolutional neural networks for fine art classification. Expert Systems with Applications.spa
dc.relation.referencesDalla Palma, S., Di Nucci, D., Palomba, F., Tamburri, D.A., 2022. Within-project defect prediction of infrastructure-as-code using product and process metrics. IEEE Transactions on Software Engineering 48, 2086–2104.spa
dc.relation.referencesDalla Palma, S., Di Nucci, D., Tamburri, D.A., 2020. Ansiblemetrics: A python library for measuring infrastructure-as-code blueprints in ansible.spa
dc.relation.referencesDe Lauretis, L., 2019. From monolithic architecture to microservices architecture, in: 2019 IEEE International Symposium on Software Reliability Engineering Workshops.spa
dc.relation.referencesDu, X., Cai, Y., Wang, S., Zhang, L., 2016. Overview of deep learning, in: 2016 31st Youth Academic Annual Conference of Chinese Association of Automationspa
dc.relation.referencesFadlullah, Z.M., Tang, F., Mao, B., Kato, N., Akashi, O., Inoue, T., Mizutani, K., 2017. State-of-the-art deep learning: Evolving machine intelligence toward tomorrow’s intelligent network traffic control systemsspa
dc.relation.referencesFehling, C., Leymann, F., Retter, R., Schupeck, W., Arbitter, P., 2014. Cloud computing patterns. 2014 ed., Springer, Vienna, Austria.spa
dc.relation.referencesFeitosa, D., Penca, M.T., Berardi, M., Boza, R.D., Andrikopoulos, V., 2023. Mining for cost awareness in the infrastructure as code artifacts of cloud-based applica- tions: an exploratory study.spa
dc.relation.referencesFeng, Z., Guo, D., Tang, D., Duan, N., Feng, X., Gong, M., Shou, L., Qin, B., Liu, T., Jiang, D., Zhou, M., 2020. Codebert: A pre-trained model for programming and natural languages.spa
dc.relation.referencesGalassi, A., Lippi, M., Torroni, P., 2021. Attention in natural language processing. IEEE Transactions on Neural Networks and Learning Systemsspa
dc.relation.referencesGamma, E., Helm, R., Larman, C., Johnson, R., Vlissides, J., 2005. Valuepack: Design Patterns: Elements of Reusable Object-Oriented Software with Applying UML and Patterns: An Introduction to Object-Oriented Analysis and Design and Iterative Development. Addison Wesley.spa
dc.relation.referencesGeorgousis, S., Kenning, M.P., Xie, X., 2021. Graph deep learning: State of the art and challenges. IEEE Access 9, 22106–22140spa
dc.relation.referencesGoodfellow, I.J., Pouget-Adie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y., 2014. Generative adversarial networks.spa
dc.relation.referencesGu, J., Wang, Z., Kuen, J., Ma, L., Shahroudy, A., Shuai, B., Liu, T., Wang, X., Wang, G., Cai, J., Chen, T., 2018. Recent advances in convolutional neural networks. Pattern Recognition.spa
dc.relation.referencesGuerriero, M., Garriga, M., Tamburri, D.A., Palomba, F., 2019. Adoption, support, and challenges of infrastructure-as-code: Insights from industry, in: 2019 IEEE International Conference on Software Maintenance and Evolution (ICSME)spa
dc.relation.referencesGuo, D., Lu, S., Duan, N., Wang, Y., Zhou, M., Yin, J., 2022. Unixcoder: Unified cross-modal pre-training for code representationspa
dc.relation.referencesGuo, D., Ren, S., Lu, S., Feng, Z., Tang, D., Liu, S., Zhou, L., Duan, N., Svyatkovskiy, A., Fu, S., Tufano, M., Deng, S.K., Clement, C., Drain, D., Sundaresan, N., Yin, J., Jiang, D., Zhou, M., 2021. Graphcodebert: Pre-training code representations with data flow.spa
dc.relation.referencesHao, W., Bie, R., Guo, J., Meng, X., Wang, S., 2018. Optimized cnn based image recognition through target region selection.spa
dc.relation.referencesHasan, M.M., Bhuiyan, F.A., Rahman, A., 2020. Testing practices for infrastructure as code, in: Proceedings of the 1st ACM SIGSOFT International Workshop on Languages and Tools for Next-Generation Testing, Association for Computing Machinery, New York, NY, USA. p. 7–12spa
dc.relation.referencesJoshi, A.V., 2020. Amazon’s Machine Learning Toolkit: Sagemaker. Springer In- ternational Publishing, Cham. pp. 233–243. URLspa
dc.relation.referencesKagdi, H., Collard, M.L., Maletic, J.I., 2007. A survey and taxonomy of approaches for mining software repositories in the context of software evolution. Journal of Software Maintenance and Evolution: Research and Practice 19, 77–131.spa
dc.relation.referencesKaliyar, R.K., 2020. A multi-layer bidirectional transformer encoder for pre-trained word embedding: A survey of bert, in: 2020 10th International Conference on Cloud Computing, Data Science & Engineering (Confluence), pp. 336–340.spa
dc.relation.referencesKaramanolakis, G., Mukherjee, S., Zheng, G., Awadallah, A.H., 2021. Self-training with weak supervision. CoRR abs/2104.05514spa
dc.relation.referencesKarras, T., Aila, T., Laine, S., Lehtinen, J., 2017. Progressive growing of gans for improved quality, stability, and variation. CoRR abs/1710.10196.spa
dc.relation.referencesKeery, S., Harber, C., Young, M., 2019. Implementing Cloud Design Patterns for AWS: Solutions and design ideas for solving system design problems. Packt Publishing, Limited.spa
dc.relation.referencesKovalenko, V., Bogomolov, E., Bryksin, T., Bacchelli, A., 2019. Pathminer: A library for mining of path-based representations of code, in: 2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR), pp. 13–17.spa
dc.relation.referencesLand, L., Aurum, A., Handzic, M., 2001. Capturing implicit software engineering knowledge, in: Proceedings 2001 Australian Software Engineering Conference, pp. 108–114.spa
dc.relation.referencesLinthicum, D.S., 2017. Cloud-native applications and cloud migration: The good, the bad, and the points between. IEEE Cloud Computing 4, 12–14spa
dc.relation.referencesLiu, Y., Agarwal, S., Venkataraman, S., 2021. Autofreeze: Automatically freezing model blocks to accelerate fine-tuningspa
dc.relation.referencesMaffort, C., Valente, M.T., Bigonha, M., Hora, A., Anquetil, N., Menezes, J., 2013. Mining Architectural Patterns Using Association Rules, in: International Conference on Software Engineering and Knowledge Engineering (SEKE’13), Boston, United States.spa
dc.relation.referencesMistrik, I., Bahsoon, R., Ali, N., Heisel, M., Maxim, B., 2017. Software architecture for Big Data and the cloud.spa
dc.relation.referencesNiu, C., Li, C., Ng, V., Ge, J., Huang, L., Luo, B., 2022. Spt-code: Sequence-to-sequence pre-training for learning source code representations, in: Proceedings of the 44th International Conference on Software Engineering, Association for Computing Machinery, New York, NY, USA. p. 2006–2018.spa
dc.relation.referencesOpdebeeck, R., Zerouali, A., Velázquez-Rodríguez, C., De Roover, C., 2021. On the practice of semantic versioning for ansible galaxy roles: An empiri- cal study and a change classification model. Journal of Systems and Software 182, 111059spa
dc.relation.referencesPalangi, H., Deng, L., Shen, Y., Gao, J., He, X., Chen, J., Song, X., Ward, R., 2016. Deep sentence embedding using long short-term memory networks: Analysis and application to information retrieval. IEEE/ACM Transactions on Audio, Speech, and Language Processing 24, 694–707.spa
dc.relation.referencesPerez, Q., Le Borgne, A., Urtado, C., Vauttier, S., 2021. Towards profiling runtime architecture code contributors in software projects, in: Proceedings of the 16th International Conference on Evaluation of Novel Approaches to Software Engineering.spa
dc.relation.referencesRadford, A., Wu, J., Child, R., Luan, D., Amodei, D., Sutskever, I., 2019a. Language models are unsupervised multitask learners.spa
dc.relation.referencesRadford, A., Wu, J., Child, R., Luan, D., Amodei, D., Sutskever, I., et al., 2019b. Language models are unsupervised multitask learners. OpenAI blog 1, 9.spa
dc.relation.referencesRahman, A., Mahdavi-Hezaveh, R., Williams, L., 2019. A systematic mapping study of infrastructure as code research. Information and Software Technology 108, 65–77spa
dc.relation.referencesThe RedMonk Programming Language Rankings: January 2023 — redmonk.com.spa
dc.relation.referencesRühling Cachay, S., Boecking, B., Dubrawski, A., 2021. End-to-end weak supervision, in: Ranzato, M., Beygelzimer, A., Dauphin, Y., Liang, P., Vaughan, J.W. (Eds.), Advances in Neural Information Processing Systems, Curran Associates, Inc., pp. 1845–1857.spa
dc.relation.referencesSalehinejad, H., Sankar, S., Barfett, J., Colak, E., Valaee, S., 2018. Recent advances in recurrent neural networks.spa
dc.relation.referencesSavidis, A., Savvaki, K., 2021. Software architecture mining from source code with dependency graph clustering and visualizationspa
dc.relation.referencesSchmidt, F., MacDonell, S.G., Connor, A.M., 2014. An automatic architecture reconstruction and refactoring framework, in: International Conference on Software Engineering Research and Applications.spa
dc.relation.referencesSchuster, M., Paliwal, K., 1997. Bidirectional recurrent neural networks. IEEE Transactions on Signal Processing 45, 2673–2681.spa
dc.relation.referencesSehovac, L., Grolinger, K., 2020. Deep learning for load forecasting: Sequence to sequence recurrent neural networks with attention. IEEE Access 8, 36411–36426spa
dc.relation.referencesSharma, A., Kumar, M., Agarwal, S., 2015. A complete survey on software architectural styles and patterns. Procedia Computer Science 70, 16–28.spa
dc.relation.referencesSharma, S., Sharma, S., Athaiya, A., 2017. Activation functions in neural networks. Towards Data Sci 6, 310–316spa
dc.relation.referencesShin, C., Li, W., Vishwakarma, H., Roberts, N.C., Sala, F., 2021. Universalizing weak supervision. CoRR abs/2112.03865spa
dc.relation.referencesShrestha, A., Mahmood, A., 2019. Review of deep learning algorithms and archi- tectures. IEEE Access 7, 53040–53065.spa
dc.relation.referencesSiow, J.K., Liu, S., Xie, X., Meng, G., Liu, Y., 2022. Learning program semantics with code representations: An empirical study, in: 2022 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER).spa
dc.relation.referencesSmite, D., Moe, N.B., Levinta, G., Floryan, M., 2019. Spotify guilds: How to succeed with knowledge sharing in large-scale agile organizations. IEEE Software 36, 51–57.spa
dc.relation.referencesSriram, A., Jun, H., Satheesh, S., Coates, A., 2017. Cold fusion: Training seq2seq models together with language models.spa
dc.relation.referencesSundararaman, D., Subramanian, V., Wang, G., Si, S., Shen, D., Wang, D., Carin, L., 2019. Syntax-infused transformer and bert models for machine translation and natural language understanding.spa
dc.relation.referencesTaibi, D., El Ioini, N., Pahl, C., Niederkofler, J.R.S., 2020. Serverless cloud computing (function-as-a-service) patterns: A multivocal literature review, in: Proceedings of the 10th International Conference on Cloud Computing and Services Science (CLOSER’20).spa
dc.relation.referencesVaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, L., Polosukhin, I., 2017. Attention is all you needspa
dc.relation.referencesWan Mohd Isa, W.A.R., Suhaimi, A.I.H., Noordin, N., Harun, A., Ismail, J., Teh, R., 2019. Cloud computing adoption reference model. Indonesian Journal of Electrical Engineering and Computer Science 16, 395.spa
dc.relation.referencesWang, Y., Wang, W., Joty, S., Hoi, S.C.H., 2021. Codet5: Identifier-aware unified pre-trained encoder-decoder models for code understanding and generation.spa
dc.relation.referencesWashizaki, H., Ogata, S., Hazeyama, A., Okubo, T., Fernandez, E.B., Yoshioka, N., 2020. Landscape of architecture and design patterns for iot systems. IEEE Internet of Things Journal 7, 10091–10101spa
dc.relation.referencesYussupov, V., Soldani, J., Breitenbücher, U., Brogi, A., Leymann, F., 2021. From serverful to serverless: A spectrum of patterns for hosting application components, pp. 268–279spa
dc.relation.referencesZeng, C., Yu, Y., Li, S., Xia, X., Wang, Z., Geng, M., Xiao, B., Dong, W., Liao, X., 2021. degraphcs: Embedding variable-based flow graph for neural code searchspa
dc.relation.referencesZhang, J., Wang, X., Zhang, H., Sun, H., Wang, K., Liu, X., 2019. A novel neural source code representation based on abstract syntax tree, in: 2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE), pp. 783–794spa
dc.relation.referencesZhang, X., Fan, J., Hei, M., 2022. Compressing bert for binary text classification via adaptive truncation before fine-tuning. Applied Sciences 12.spa
dc.relation.referencesZhuang, F., Qi, Z., Duan, K., Xi, D., Zhu, Y., Zhu, H., Xiong, H., He, Q., 2021. A comprehensive survey on transfer learning. Proceedings of the IEEE 109, 43–76.spa
dc.rights.accessrightsinfo:eu-repo/semantics/openAccessspa
dc.rights.creativecommonsAtribución 4.0 Internacional (CC BY 4.0)spa
dc.subject.armarcInfraestructura como código
dc.subject.armarcConocimiento implícito
dc.subject.armarcTransferencia de conocimiento
dc.subject.armarcPatrones de arquitectura
dc.subject.armarcModelos de lenguaje
dc.subject.proposalInfraestructura como códigospa
dc.subject.proposalConocimiento implícitospa
dc.subject.proposalTransferencia de conocimientospa
dc.subject.proposalPatrones de arquitecturaspa
dc.subject.proposalModelos de lenguajespa
dc.subject.proposalInfrastructure as codeeng
dc.subject.proposalImplicit knowledgeeng
dc.subject.proposalKnowledge transfereng
dc.subject.proposalArchitecture patternseng
dc.subject.proposalLanguage modelseng
dc.type.coarhttp://purl.org/coar/resource_type/c_bdccspa
dc.type.contentTextspa
dc.type.driverinfo:eu-repo/semantics/masterThesisspa
dc.type.redcolhttps://purl.org/redcol/resource_type/TMspa

