dc.contributor.author: Garzón, Wilmer
dc.contributor.author: Benavides, Luis Alberto
dc.contributor.author: Gaignard, Alban
dc.contributor.author: Südholt, Mario
dc.date.accessioned: 2024-07-11T16:51:03Z
dc.date.available: 2024-07-11T16:51:03Z
dc.date.issued: 2022
dc.identifier.uri: https://repositorio.escuelaing.edu.co/handle/001/3156
dc.description.abstract: The amount of biomedical data collected and stored has grown significantly. Analyzing these extensive amounts of data can no longer be done by individuals or single organizations. Thus, the scientific community is creating global collaborative efforts to analyze these data. However, biomedical data is subject to several legal and socio-economic restrictions that hinder research collaboration. In this paper, we argue that researchers require new tools and techniques to address the restrictions and needs of global scientific collaborations over geo-distributed biomedical data. These tools and techniques must support what we call Fully Distributed Collaborations (FDC): research endeavors that exploit and analyze massive amounts of biomedical information collaboratively while respecting legal and socio-economic restrictions. This paper first motivates and discusses the requirements of FDCs in the context of a research collaboration on the development of diagnostic and predictive tools for the risk of intracranial aneurysm formation and rupture (the ICAN project). The paper then presents a taxonomy classifying current tools and techniques for biomedical analysis with respect to the proposed requirements. The taxonomy considers three key architectural features for supporting FDC scenarios: data and computation placement, privacy and security, and performance and scalability. The review reveals new research opportunities to design tools and techniques for multi-site analyses that encourage scientific collaborations while mitigating technical and legal constraints. [eng]
dc.format.extent: 17 pages [spa]
dc.format.mimetype: application/pdf [spa]
dc.language.iso: eng [spa]
dc.publisher: Elsevier Ltd [spa]
dc.source: www.elsevier.com/locate/imu [spa]
dc.title: A taxonomy of tools and approaches for distributed genomic analyses [eng]
dc.type: Journal article [spa]
dc.type.version: info:eu-repo/semantics/publishedVersion [spa]
oaire.accessrights: http://purl.org/coar/access_right/c_abf2 [spa]
oaire.version: http://purl.org/coar/version/c_970fb48d4fbd8a85 [spa]
dc.contributor.researchgroup: CTG - Informática [spa]
dc.identifier.eissn: 2352-9148 [spa]
dc.identifier.instname: Universidad Escuela Colombiana de Ingeniería Julio Garavito [spa]
dc.identifier.reponame: Repositorio Digital [spa]
dc.identifier.repourl: https://repositorio.escuelaing.edu.co/ [spa]
dc.publisher.place: Bogotá (Colombia) [spa]
dc.relation.citationedition: Vol. 32, 2022 [spa]
dc.relation.citationendpage: 17 [spa]
dc.relation.citationstartpage: 1 [spa]
dc.relation.citationvolume: 32 [spa]
dc.relation.ispartofjournal: Informatics in Medicine Unlocked [eng]
dc.relation.referencesAbouelhoda M, Issa SA, Ghanem M. Tavaxy: integrating taverna and galaxy workflows with cloud computing support. BMC Bioinfo 2012;13:77. https://doi. org/10.1186/1471-2105-13-77spa
dc.relation.referencesAbu-Doleh A, Catalyurek UV. Spaler: spark and GraphX based de novo genome assembler. In: 2015 IEEE international conference on big data (big data). IEEE; 2015. https://doi.org/10.1109/bigdata.2015.7363853spa
dc.relation.referencesAbuín JM, Pichel JC, Pena TF, Amigo J. SparkBWA: speeding up the alignment of high-throughput DNA sequencing data. PLOS ONE 2016;11:e0155461. https:// doi.org/10.1371/journal.pone.0155461spa
dc.relation.referencesAl-Zoubi K, Wainer G. Modelling fog amp; cloud collaboration methods on large scale. In: 2020 winter simulation conference. WSC); 2020. p. 2161–72. https:// doi.org/10.1109/WSC48552.2020.9384058spa
dc.relation.referencesAlmeida JS, Grüneberg A, Maass W, Vinga S. Fractal MapReduce decomposition of sequence alignment. Algorithm Mol Biol 2012;7. https://doi.org/10.1186/ 1748-7188-7-12spa
dc.relation.referencesANR. IntraCranial ANeurysms: from familial forms to pathophysiological mechanisms – I-CAN. 2019. http://www.agence-nationale-recherche.fr/Project- ANR-15-CE17-0008. [Accessed 10 October 2019]spa
dc.relation.referencesAtkinson M, Gesing S, Montagnat J, Taylor I. Scientific workflows: past, present and future. 2017. https://doi.org/10.1016/j.future.2017.05.041spa
dc.relation.referencesBarillot C, Bannier E, Commowick O, Corouge I, Baire A, Fakhfakh I, Guillaumont J, Yao Y, Kain M. Shanoir: applying the software as a service distribution model to manage brain imaging research repositories. Front ICT 2016;3:25. URL: https://www.frontiersin.org/article/10.3389/fict.2016.00025spa
dc.relation.referencesBarseghian D, Altintas I, et al. Workflows and extensions to the kepler scientific workflow system to support environmental sensor data access and analysis. Ecol Inf 2010;5:42–50. https://doi.org/10.1016/j.ecoinf.2009.08.008spa
dc.relation.referencesBez M, Fornari G, Vardanega T. The scalability challenge of ethereum: an initial quantitative analysis. In: 2019 IEEE international conference on service-oriented system engineering (SOSE). IEEE; 2019. https://doi.org/10.1109/ sose.2019.00031spa
dc.relation.referencesBondiombouy C, Valduriez P. Query processing in multistore systems: an overview. Int J Cloud Comput 2016;5:309–46spa
dc.relation.referenceszahra Boujdad F, Sudholt M. Constructive privacy for shared genetic data. In: Proceedings of the 8th international conference on cloud computing and services science. SCITEPRESS - Science and Technology Publications; 2018. https://doi. org/10.5220/0006765804890496spa
dc.relation.referencesBoujdad FZ, Gaignard A, et al. On distributed collaboration for biomedical analyses. In: 2019 19th IEEE/ACM international symposium on cluster, cloud and grid computing (CCGRID), IEEE; 2019. https://doi.org/10.1109/ ccgrid.2019.00079spa
dc.relation.referencesBoujdad FZ, Niyitegeka D, Bellafqira R, Gouenou C, Emmanuelle G, Südholt M. A hybrid cloud deployment architecture for privacy-preserving collaborative genome-wide association studies. In: ICDF2C 2021 - 12th EAI international conference on digital forensics & cyber crime; 2021spa
dc.relation.referencesBourcier R, Chatel S, et al. Understanding the pathophysiology of intracranial aneurysm: the ICAN project. Neurosurgery 2017;80:621–6. https://doi.org/ 10.1093/neuros/nyw135spa
dc.relation.referencesBux M, Brandt J, Witt C, Dowling J, Leser U. Hi-way: execution of scientific workflows on hadoop yarn. In: 20th international conference on extending database technology, EDBT 2017, 21 march 2017 through 24 march 2017, Open Proceedings. Org; 2017. p. 668–79. https://doi.org/10.5441/002/edbt.2017.87spa
dc.relation.referencesBux M, Leser U. Parallelization in scientific workflow management systems. 2013. arXiv preprint arXiv:1303.7195spa
dc.relation.referencesCanali C, Lancellotti R, Mione S. Collaboration strategies for fog computing under heterogeneous network-bound scenarios. In: 2020 IEEE 19th international symposium on network computing and applications. NCA); 2020. p. 1–8. https:// doi.org/10.1109/NCA51143.2020.9306730spa
dc.relation.referencesCano I, Weimer M, Mahajan D, Curino C, Fumarola GM. Towards geo-distributed machine learning. 2016. arXiv preprint arXiv:1603.09035spa
dc.relation.referencesde Castro MR, dos Santos Tostes C, et al. SparkBLAST: scalable BLAST processing using in-memory operations. BMC Bioinf 2017;18. https://doi.org/10.1186/ s12859-017-1723-8spa
dc.relation.referencesCattaneo G, Giancarlo R, et al. MapReduce in computational biology - a synopsis. 10.1007%2F978-3-319-57711-1_5. In: Advances in artificial life, evolutionary computation, and systems chemistry. Springer International Publishing; 2017. p. 53–64. URLspa
dc.relation.referencesCattaneo G, Petrillo UF, Giancarlo R, Roscigno G. An effective extension of the applicability of alignment-free biological sequence comparison algorithms with hadoop. J Supercomput 2016;73:1467–83. https://doi.org/10.1007/s11227-016- 1835-3spa
dc.relation.referencesChang YJ, Chen CC, Chen CL, Ho JM. A de novo next generation genomic sequence assembler based on string graph and MapReduce cloud computing framework. In: BMC genomics, BioMed central; 2012. S28. https://doi.org/ 10.1186/1471-2164-13-S7-S28spa
dc.relation.referencesChen Z, Hu J, Min G, Chen X. Effective data placement for scientific workflows in mobile edge computing using genetic particle swarm optimization. Concurrency Comput: Pract Ex 2019;e5413doi. https://doi.org/10.1002/cpe.5413spa
dc.relation.referencesChervenak A, Deelman E, Foster I, Guy L, Hoschek W, Iamnitchi A, Kesselman C, Kunszt P, Ripeanu M, Schwartzkopf B, Stockinger H, Stockinger K, Tierney B. Giggle: a framework for constructing scalable replica location services. In: ACM/ IEEE SC 2002 conference (SC’02), IEEE; 2002. https://doi.org/10.1109/ sc.2002.10024spa
dc.relation.referencesClaerhout B, DeMoor G. Privacy protection for clinical and genomic data: the use of privacy-enhancing techniques in medicine. Int J Med Inf 2005;74:257–65.spa
dc.relation.referencesCohen-Boulakia S, Belhajjame K, et al. Scientific workflows for computational reproducibility in the life sciences: status, challenges and opportunities. Future Generat Comput Syst 2017;75:284–98. https://doi.org/10.1016/j. future.2017.01.012spa
dc.relation.referencesColosimo ME, Peterson MW, Mardis S, Hirschman L. Nephele: genotyping via complete composition vectors and MapReduce. Source Code Biol Med 2011;6. https://doi.org/10.1186/1751-0473-6-13spa
dc.relation.referencesCommission, E., Council. Regulation (eu) 2016/679 of the european parliament and of the council of 27 april 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data. http://data.europa.eu/eli/reg/2016/679/2016-05-04; 2016spa
dc.relation.referencesCongress of Colombia. Colombian data protection law. URL: https://www.fun cionpublica.gov.co/eva/gestornormativo/norma.php?i=49981. [Accessed 16 September 2021]spa
dc.relation.referencesConsortium DS, Consortium DM, Mahajan A, et al. Genome-wide trans-ancestry meta-analysis provides insight into the genetic architecture of type 2 diabetes susceptibility. Nature genetics 2014;46:234. https://doi.org/10.1038/ng.2897spa
dc.relation.referencesCook CE, Lopez R, et al. The european bioinformatics institute in 2018: tools, infrastructure and training. Nucleic Acids Res 2018;47:D15–22. https://doi.org/ 10.1093/nar/gky1124spa
dc.relation.referencesCope JM, Trebon N, Tufo HM, Beckman P. Robust data placement in urgent computing environments. In: 2009 IEEE international symposium on parallel & distributed processing. IEEE; 2009. p. 1–13. https://doi.org/10.1109/ IPDPS.2009.5160914spa
dc.relation.referencesCorpas M, Kovalevskaya NV, McMurray A, Nielsen FG. A fair guide for data providers to maximise sharing of human genomic data. PLoS Comput Biol 2018; 14:e1005873. https://doi.org/10.1371/journal.pcbi.1005873spa
dc.relation.referencesDe Moor G, Claerhout B, De Meyer F. Privacy enhancing techniques. Method Inf Med 2003;42:148–53spa
dc.relation.referencesDe Roure D, Belhajjam K, Missier P, G´ omez-P´ erez JM, Palma R, Ruiz JE, Hettne K, Roos M, Klyne G, Goble C. Towards the preservation of scientific workflows. In: iPRES 2011-8th international conference on preservation of digital objects. National Library Board Singapore and Nanyang Technology University; 2011. p. 228–31spa
dc.relation.referencesDe Wit P, Pespeni MH, et al. The simple fool’s guide to population genomics via rna-seq: an introduction to high-throughput sequencing data analysis. Mol Eco Res 2012;12:1058–67. https://doi.org/10.1111/1755-0998.12003spa
dc.relation.referencesDecap D, Reumers J, Herzeel C, Costanza P, Fostier J. Halvade: scalable sequence analysis with MapReduce. Bioinformatics 2015;31:2482–8. https://doi.org/ 10.1093/bioinformatics/btv179spa
dc.relation.referencesDeelman E, Gannon D, et al. Workflows and e-science: an overview of workflow system features and capabilities. Future Generat Comput Syst 2009;25:528–40. https://doi.org/10.1016/j.future.2008.06.012spa
dc.relation.referencesDeelman E, Vahi K, et al. Pegasus, a workflow management system for science automation. Future Generat Comput Syst 2015;46:17–35. https://doi.org/ 10.1016/j.future.2014.10.008spa
dc.relation.referencesDolev S, Florissi P, et al. A survey on geographically distributed big-data processing using MapReduce. IEEE Transact Big Data 2019;5:60–80. https://doi. org/10.1109/tbdata.2017.2723473spa
dc.relation.referencesDong G, Fu X, Li H, Pan X. An accurate sequence assembly algorithm for livestock, plants and microorganism based on spark. Int J Pattern Recognit Artif Intell 2017; 31:1750024. https://doi.org/10.1142/s0218001417500240spa
dc.relation.referencesEbrahimi M, Mohan A, Kashlev A, Lu S. Bdap: a big data placement strategy for cloud-based scientific workflows. In: 2015 IEEE first international conference on big data computing service and applications. IEEE; 2015. p. 105–14. https://doi. org/10.1109/BigDataService.2015.70spa
dc.relation.referencesElmroth E, Hern´andez F, Tordsson J. Three fundamental dimensions of scientific workflow interoperability: model of computation, language, and execution environment. Future Generat Comput Syst 2010;26:245–56spa
dc.relation.referencesFakas GJ, Karakostas B. A peer to peer (P2P) architecture for dynamic workflow management. Inf Software Technol 2004;46:423–31spa
dc.relation.referencesFan J, Han F, Liu H. Challenges of big data analysis. Nat Sci Rev 2014;1:293–314. https://doi.org/10.1093/nsr/nwt032spa
dc.relation.referencesFederer LM, Lu YL, et al. Biomedical data sharing and reuse: attitudes and practices of clinical and scientific research staff. PLOS ONE 2015;10:e0129506. https://doi.org/10.1371/journal.pone.0129506spa
dc.relation.referencesFreire J, Bonnet P, Shasha D. Computational reproducibility: state-of-the-art, challenges, and database research opportunities. In: Proceedings of the 2012 ACM SIGMOD international conference on management of data; 2012. p. 593–6spa
dc.relation.referencesFrye SV, Arkin MR, et al. Tackling reproducibility in academic preclinical drug discovery. Nat Rev Drug Discovery 2015;14:733–4. https://doi.org/10.1038/ nrd4737spa
dc.relation.referencesGil Y, Ratnakar V, et al. Wings: intelligent workflow-based design of computational experiments. IEEE Intell Syst 2011;26:62–72. https://doi.org/ 10.1109/mis.2010.9spa
dc.relation.referencesGilbert S, Lynch N. Brewer’s conjecture and the feasibility of consistent, available, partition-tolerant web services. SIGACT News 2002;33:51–9. https://doi.org/ 10.1145/564585.564601spa
dc.relation.referencesGoecks J, Nekrutenko A, Taylor J, Team TG. Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences. Genome Biol 2010;11:R86. https://doi.org/10.1186/gb-2010- 11-8-r86.spa
dc.relation.referencesGoodman SN, Fanelli D, Ioannidis JPA. What does research reproducibility mean? Sci Translat Med 2016;8. https://doi.org/10.1126/scitranslmed.aaf5027. 341ps12–341ps12spa
dc.relation.referencesGoodwin S, McPherson JD, McCombie WR. Coming of age: ten years of next- generation sequencing technologies. Nature Rev Genet 2016;17:333spa
dc.relation.referencesGuo R, Zhao Y, Zou Q, et al. Bioinformatics applications on Apache spark. GigaScience 2018. https://doi.org/10.1093/gigascience/giy098spa
dc.relation.referencesof Health NI, et al. Guidance: rigor and reproducibility in grant applications. 2017spa
dc.relation.referencesHuang H, Tata S, Prill RJ. BlueSNP: R package for highly scalable genome-wide association studies using hadoop clusters. Bioinformatics 2012;29:135–6. https:// doi.org/10.1093/bioinformatics/bts647spa
dc.relation.referencesHuang L, Krüger J, Sczyrba A. Analyzing large scale genomic data on the cloud with sparkhit. Bioinformatics 2017;34:1457–65. https://doi.org/10.1093/ bioinformatics/btx808spa
dc.relation.referencesHuang Y, Gottardo R. Comparability and reproducibility of biomedical data. Briefings Bioinfo 2012;14:391–401. https://doi.org/10.1093/bib/bbs078spa
dc.relation.referencesHung CL, Lin YL, Hua GJ, Hu YC. CloudTSS: a TagSNP selection approach on cloud computing. In: Communications in computer and information science. Springer Berlin Heidelberg; 2011. p. 525–34. https://doi.org/10.1007/978-3- 642-27180-9_64spa
dc.relation.referencesHutson S. Data handling errors spur debate over clinical trial. 618–618 Nature Med 2010;16. https://doi.org/10.1038/nm0610-618aspa
dc.relation.referencesKarim MR, Michel A, et al. Improving data workflow systems with cloud services and use of open data for bioinformatics research. Briefings Bioinfo 2017;19: 1035–50. https://doi.org/10.1093/bib/bbx039spa
dc.relation.referencesKhan A, Kim T, Byun H, Kim Y. Scispace: a scientific collaboration workspace for geo-distributed hpc data centers. Future Generat Comput Syst 2019;101:398–409.spa
dc.relation.referencesKhan FZ, Soiland-Reyes S, Sinnott RO, Lonie A, Goble C, Crusoe MR. Sharing interoperable workflow provenance: a review of best practices and their practical application in cwlprov. GigaScience 2019;8:giz095spa
dc.relation.referencesKim D, Vouk MA. Assessing run-time overhead of securing kepler. Procedia Comput Sci 2016;80:2281–6. https://doi.org/10.1016/j.procs.2016.05.412spa
dc.relation.referencesKim JH. Genome data analysis. Springer Singapore; 2019. URL: https://www.sp ringer.com/gp/book/9789811319419spa
dc.relation.referencesKoster J, Rahmann S. Snakemake–a scalable bioinformatics workflow engine. Bioinfo 2012;28:2520–2. https://doi.org/10.1093/bioinformatics/bts480spa
dc.relation.referencesKuhn K, et al. The cancer biomedical informatics grid (cabig): infrastructure and applications for a worldwide research community. Medinfo 2007;1:330spa
dc.relation.referencesLangmead B, Hansen KD, Leek JT. Cloud-scale RNA-sequencing differential expression analysis with myrna. Genome Biol 2010;11:R83. https://doi.org/ 10.1186/gb-2010-11-8-r83spa
dc.relation.referencesLangmead B, Schatz MC, et al. Searching for SNPs with cloud computing. Genome Biol 2009;10:R134. https://doi.org/10.1186/gb-2009-10-11-r134spa
dc.relation.referencesLegislature CS. The California consumer privacy act of. 2018. https://leginfo.legi slature.ca.gov/faces/billTextClient.xhtml?bill_id=201720180SB1121spa
dc.relation.referencesLeo S, Santoni F, Zanetti G. Biodoop: bioinformatics on hadoop. In: 2009 international conference on parallel processing workshops. IEEE; 2009. https:// doi.org/10.1109/icppw.2009.37spa
dc.relation.referencesLi R, Li Y, Fang X, Yang H, Wang J, Kristiansen K, Wang J. SNP detection for massively parallel whole-genome resequencing. Genome Res 2009;19:1124–32. https://doi.org/10.1101/gr.088013.108spa
dc.relation.referencesLi X, Zhang L, et al. A novel workflow-level data placement strategy for data- sharing scientific cloud workflows. IEEE Transact Serv Comput 2016. https://doi. org/10.1109/TSC.2016.2625247spa
dc.relation.referencesLiu J, Pacitti E, Valduriez P, Mattoso M. Parallelization of scientific workflows in the cloud. 2014spa
dc.relation.referencesLiu J, Pacitti E, Valduriez P, Mattoso M. A survey of data-intensive scientific workflow management. J Grid Comput 2015;13:457–93. https://doi.org/ 10.1007/s10723-015-9329-8spa
dc.relation.referencesLiu J, Pacitti E, Valduriez P, Mattoso M. Scientific workflow scheduling with provenance data in a multisite cloud. In: Transactions on large-scale data-and knowledge-centered systems XXXIII. Springer; 2017. p. 80–112spa
dc.relation.referencesLiu J, Pineda L, Pacitti E, Costan A, Valduriez P, Antoniu G, Mattoso M. Efficient scheduling of scientific workflows using hot metadata in a multisite cloud. IEEE Transact Knowl Data Eng 2019;31:1940–53. https://doi.org/10.1109/ tkde.2018.2867857spa
dc.relation.referencesLiu X, Datta A. Towards intelligent data placement for scientific workflows in collaborative cloud environment. In: 2011 IEEE international symposium on parallel and distributed processing workshops and phd forum. IEEE; 2011. p. 1052–61. https://doi.org/10.1109/IPDPS.2011.259spa
dc.relation.referencesLiu Y, Zhang L, Ge N, Li G. A systematic literature review on federated learning: from a model quality perspective. 2020. arXiv preprint arXiv:2012.01973spa
dc.relation.referencesLu S, Zhang J. Collaborative scientific workflows supporting collaborative science. Int J Bus Process Integrat Manag 2011;5:185. https://doi.org/10.1504/ ijbpim.2011.040209spa
dc.relation.referencesLu YY, Tang K, et al. CAFE: aCcelerated Alignment-FrEe sequence analysis. Nucleic acids research 2017;45:W554–9. https://doi.org/10.1093/nar/gkx351spa
dc.relation.referencesMalin BA, Emam KE, O’Keefe CM. Biomedical data privacy: problems, perspectives, and recent advances. 2013spa
dc.relation.referencesMcKenna A, Hanna M, Banks E, et al. The genome analysis toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res 2010;20:1297–303. https://doi.org/10.1101/gr.107524.110spa
dc.relation.referencesMcMahan B, Moore E, Ramage D, Hampson S, y Arcas BA. Communication- efficient learning of deep networks from decentralized data. In: Singh A, Zhu J, editors. Proceedings of the 20th international conference on artificial intelligence and statistics. Fort Lauderdale, FL, USA: PMLR; 2017. p. 1273–82. URL: http://pr oceedings.mlr.press/v54/mcmahan17a.htmlspa
dc.relation.referencesMoreau L, Missier P, Cheney J, Soiland-Reyes S. Prov-n: the provenance notation. 2013spa
dc.relation.referencesNagappan M, Vouk MA. A model for sharing of confidential provenance information in a query based system. In: International provenance and annotation workshop. Springer; 2008. p. 62–9. https://doi.org/10.1007/978-3-540-89965-5_ 8spa
dc.relation.referencesNguyen T, Shi W, Ruden D. CloudAligner: a fast and full-featured MapReduce based tool for sequence mapping. BMC Res Notes 2011;4. https://doi.org/ 10.1186/1756-0500-4-171spa
dc.relation.referencesNHGRI-EBI. GWAS catalog. 2019. https://www.ebi.ac.uk/gwas/. accessed 20- Sept-2019spa
dc.relation.referencesNIH-BMIC. NIH data sharing repositories. 2019. https://www.nlm.nih.gov/NIH bmic/nih_data_sharing_repositories.html. accessed 20-Sept-2019spa
dc.relation.referencesNordberg H, Bhatia K, Wang K, Wang Z. BioPig: a hadoop-based analytic toolkit for large-scale sequence data. Bioinformatics 2013;29:3014–9. https://doi.org/ 10.1093/bioinformatics/btt528spa
dc.relation.referencesNSF, 2019. Chapter XI - Other Post Award Requirements and Consideration. https://www.nsf.gov/pubs/policydocs/pappg19_1/pappg_11.jsp\#XID4. [Online; accessed 20-June-2019]spa
dc.relation.referencesO’Brien AR, Saunders NFW, et al. VariantSpark: population scale clustering of genotype information. BMC Genom 2015;16. https://doi.org/10.1186/s12864- 015-2269-7spa
dc.relation.referencesPandey RV, Schl¨otterer C. DistMap: a toolkit for distributed short read mapping on a hadoop cluster. PLoS ONE 2013;8:e72614. https://doi.org/10.1371/journal. pone.0072614spa
dc.relation.referencesPapageorgiou L, Eleni P, et al. Genomic big data hitting the storage bottleneck. EMBnetjournal 2018;24:e910. https://doi.org/10.14806/ej.24.0.910spa
dc.relation.referencesParks R, Chu CH, Xu H. Healthcare information privacy research: iusses, gaps and what next? AMCIS; 2011spa
dc.relation.referencesPeteiro-Barral D, Guijarro-Berdi˜ nas B. A survey of methods for distributed machine learning. Prog Artif Intell 2013;2:1–11spa
dc.relation.referencesPineda-Morales L, Costan A, Antoniu G. Towards multi-site metadata management for geographically distributed cloud workflows. In: 2015 IEEE international conference on cluster computing. IEEE; 2015. p. 294–303. https:// doi.org/10.1109/cluster.2015.49spa
dc.relation.referencesPineda-Morales L, Liu J, Costan A, Pacitti E, Antoniu G, Valduriez P, Mattoso M. Managing hot metadata for scientific workflows on multisite clouds. In: 2016 IEEE international conference on big data (big data). IEEE; 2016. p. 390–7spa
dc.relation.referencesPireddu L, Leo S, Zanetti G. SEAL: a distributed short read mapping and duplicate removal tool. Bioinformatics 2011;27:2159–60. https://doi.org/10.1093/ bioinformatics/btr325spa
dc.relation.referencesRasheed Z, Rangwala H. A map-reduce framework for clustering metagenomes. In: 2013 IEEE international symposium on parallel & distributed processing, workshops and phd forum. IEEE; 2013. https://doi.org/10.1109/ ipdpsw.2013.100spa
dc.relation.referencesRasheed Z, Rangwala H. A map-reduce framework for clustering metagenomes. In: 2013 IEEE international symposium on parallel & distributed processing, workshops and phd forum. IEEE; 2013. https://doi.org/10.1109/ ipdpsw.2013.100spa
dc.relation.referencesRodriguez MA, Buyya R. Scientific workflow management system for clouds. In: Software architecture for big data and the cloud. Elsevier; 2017. p. 367–87. https://doi.org/10.1016/b978-0-12-805467-3.00018-1spa
dc.relation.referencesRoss RB, Thakur R, et al. Pvfs: a parallel file system for linux clusters. In: Proceedings of the 4th annual Linux showcase and conference; 2000. p. 391–430spa
dc.relation.referencesRoss RB, Thakur R, et al. Pvfs: a parallel file system for linux clusters. In: Proceedings of the 4th annual Linux showcase and conference; 2000. p. 391–430spa
dc.relation.referencesSalloum S, Dautov R, et al. Big data analytics on Apache spark. Int J Data Sci Anal 2016;1:145–64. https://doi.org/10.1007/s41060-016-0027-9spa
dc.relation.referencesSantana-Perez I, P´ erez-Hern´ andez MS. Towards reproducibility in scientific workflows: an infrastructure-based approach. Scientific Program 2015:1–11. https://doi.org/10.1155/2015/243180spa
dc.relation.referencesSchadt EE, Linderman MD, et al. Computational solutions to large-scale data management and analysis. Nature Rev Genet 2010;11:647–57. https://doi.org/ 10.1038/nrg2857spa
dc.relation.referencesSchatz MC. BlastReduce: high performance short read mapping with MapReduce. University of Maryland; 2008. http://cgis.cs.umd.edu/Grad/scholarlypapers/pa pers/MichaelSchatz.pdfspa
dc.relation.referencesSchatz MC. BlastReduce: high performance short read mapping with MapReduce. University of Maryland; 2008. http://cgis.cs.umd.edu/Grad/scholarlypapers/pa pers/MichaelSchatz.pdfspa
dc.relation.referencesSchatz MC, Sommer D, Kelley D, Pop M. De novo assembly of large genomes using cloud computing. In: Proceedings of the cold spring harbor biology of genomes conference; 2010spa
dc.relation.referencesSchatz MC, Sommer D, Kelley D, Pop M. De novo assembly of large genomes using cloud computing. In: Proceedings of the cold spring harbor biology of genomes conference; 2010spa
dc.relation.referencesSenturk IF, Balakrishnan P, et al. A resource provisioning framework for bioinformatics applications in multi-cloud environments. Future Generat Comput Syst 2018;78:379–91. https://doi.org/10.1016/j.future.2016.06.008spa
dc.relation.referencesSharov AA, Schlessinger D, Ko MSH. ExAtlas: an interactive online tool for meta- analysis of gene expression data. J Bioinfo Comput Biol 2015;13:1550019. https://doi.org/10.1142/s0219720015500195spa
dc.relation.referencesSoiland-Reyes S, Alper P, Goble C. Tracking workflow execution with tavernaprov. In: PROV: three tears later: Provenance Week 2016; 2016spa
dc.relation.referencesStephens ZD, Lee SY, et al. Big data: astronomical or genomical? PLOS Biology 2015;13:e1002195. https://doi.org/10.1371/journal.pbio.1002195spa
dc.relation.referencesTannenbaum T, Wright D, Miller K, Livny M. Condor: a distributed job scheduler. In: Beowulf cluster computing with windows; 2001. p. 307–50spa
dc.relation.referencesTaylor I, Shields M, Wang I, Harrison A. The triana workflow environment: architecture and applications. In: Workflows for e-Science. Springer; 2007. p. 320–39. https://doi.org/10.1007/978-1-84628-757-2_20spa
dc.relation.referencesTaylor IJ, Deelman E, et al. Workflows for e-Science: scientific workflows for grids, ume 1. Springer; 2007. https://doi.org/10.1007/978-1-84628-757-2spa
dc.relation.referencesThain D, Tannenbaum T, Livny M. Distributed computing in practice: the condor experience. Concurr Comput: Pract Exp 2005;17:323–56. https://doi.org/ 10.1002/cpe.938spa
dc.relation.referencesTommaso PD, Chatzou M, et al. Nextflow enables reproducible computational workflows. Nature Biotechnol 2017;35:316–9. https://doi.org/10.1038/ nbt.3820spa
dc.relation.referencesTurakhia MP, Desai M, Hedlin H, Rajmane A, Talati N, Ferris T, Desai S, Nag D, Patel M, Kowey P, Rumsfeld JS, Russo AM, Hills MT, Granger CB, Mahaffey KW, Perez MV. Rationale and design of a large-scale, app-based study to identify cardiac arrhythmias using a smartwatch: the apple heart study. Am Heart J 2019; 207:66–75. https://doi.org/10.1016/j.ahj.2018.09.002. https://www.sciencedi rect.com/science/article/pii/S0002870318302710.spa
dc.relation.referencesUnion I. Communication from the commission to the european parliament, the council, the european economic and social committee and the committee of the regions. A new skills agenda for europe. 2014 [Brussels].spa
dc.relation.referencesValduriez P, Mattoso M, Akbarinia R, Borges H, Camata J, Coutinho A, Gaspar D, Lemus N, Liu J, Lustosa H, et al. Scientific data analysis using data-intensive scalable computing: the scidisc project. In: LADaS: Latin America data science workshop, CEUR-WS. Org; 2018spa
dc.relation.referencesVan Hung T, Chuanhe H. An effective data placement strategy in main-memory database cluster. In: 2011 second international conference on networking and distributed computing. IEEE; 2011. p. 93–8. https://doi.org/10.1109/ ICNDC.2011.27.spa
dc.relation.referencesVerbraeken J, Wolting M, Katzy J, Kloppenburg J, Verbelen T, Rellermeyer JS. A survey on distributed machine learning. ACM Comput Surv (CSUR) 2020;53: 1–33spa
dc.relation.referencesWang J, Crawl D, Altintas I. Kepler + hadoop. In: Proceedings of the 4th workshop on workflows in support of large-scale science - WORKS ’09. ACM Press; 2009. https://doi.org/10.1145/1645164.1645176spa
dc.relation.referencesWang R, Li M, Peng L, Hu Y, Hassan MM, Alelaiwi A. Cognitive multi-agent empowering mobile edge computing for resource caching and collaboration. Future Generat Comput Syst 2020;102:66–74. https://doi.org/10.1016/j.future.2019.08.001. URL: https://www.sciencedirect.com/science/article/pii/S0167739X19318783spa
dc.relation.referencesWang Y. Automating experimentation with distributed systems using generative techniques. Ph.D. thesis. University of Colorado at Boulder; 2006spa
dc.relation.referencesWang Y, Carzaniga A, Wolf AL. Four enhancements to automated distributed system experimentation methods. In: Proceedings of the 30th international conference on software engineering; 2008. p. 491–500spa
dc.relation.referencesWiewiórka MS, Messina A, et al. SparkSeq: fast, scalable and cloud-ready tool for the interactive genomic data analysis with nucleotide precision. Bioinformatics 2014;30:2652–3. https://doi.org/10.1093/bioinformatics/btu343spa
dc.relation.referencesWilde M, Hategan M, et al. Swift: a language for distributed parallel scripting. Parallel Comput 2011;37:633–52. https://doi.org/10.1016/j.parco.2011.05.005.spa
dc.relation.referencesWolstencroft K, Haines R, et al. The Taverna workflow suite: designing and executing workflows of web services on the desktop, web or in the cloud. Nucleic Acids Res 2013;41:W557–61. https://doi.org/10.1093/nar/gkt328spa
dc.relation.referencesXiao Y, Zhou AC, Yang X, He B. Privacy-preserving workflow scheduling in geo-distributed data centers. Future Generat Comput Syst 2022;130:46–58spa
dc.relation.referencesXie J, Yin S, et al. Improving MapReduce performance through data placement in heterogeneous Hadoop clusters. In: 2010 IEEE international symposium on parallel & distributed processing, workshops and PhD forum (IPDPSW). IEEE; 2010. p. 1–9. https://doi.org/10.1109/IPDPSW.2010.547088spa
dc.relation.referencesXie T. SEA: a striping-based energy-aware strategy for data placement in RAID-structured storage systems. IEEE Transact Comput 2008;57:748–61. https://doi.org/10.1109/TC.2008.27spa
dc.relation.referencesXing EP, Ho Q, Dai W, Kim JK, Wei J, Lee S, Zheng X, Xie P, Kumar A, Yu Y. Petuum: a new platform for distributed machine learning on big data. IEEE Transact Big Data 2015;1:49–67. https://doi.org/10.1109/tbdata.2015.2472014spa
dc.relation.referencesXu B, Gao J, Li C. An efficient algorithm for DNA fragment assembly in MapReduce. Biochem Biophys Res Commun 2012;426:395–8. https://doi.org/10.1016/j.bbrc.2012.08.101spa
dc.relation.referencesXu B, Li C, Zhuang H, et al. DSA: scalable distributed sequence alignment system using SIMD instructions. In: 2017 17th IEEE/ACM international symposium on cluster, cloud and grid computing (CCGRID), IEEE; 2017. https://doi.org/10.1109/ccgrid.2017.74spa
dc.relation.referencesXu B, Li C, Zhuang H, et al. Efficient distributed Smith-Waterman algorithm based on Apache Spark. In: 2017 IEEE 10th international conference on cloud computing (CLOUD). IEEE; 2017. https://doi.org/10.1109/cloud.2017.83spa
dc.relation.referencesYu HF, Hsieh CJ, Chang KW, Lin CJ. Large linear classification when data cannot fit in memory. In: ACM Transactions on Knowledge Discovery from Data (TKDD); 2012. p. 1–23. 5spa
dc.relation.referencesYu J, Buyya R. A taxonomy of workflow management systems for grid computing. J Grid Comput 2005;3:171–200. https://doi.org/10.1007/s10723-005-9010-8.spa
dc.relation.referencesYuan D, Yang Y, Liu X, Chen J. A data placement strategy in scientific cloud workflows. Future Generat Comput Syst 2010;26:1200–14. https://doi.org/10.1016/j.future.2010.02.004spa
dc.relation.referencesZhang D, Zhao L, Li B, et al. SEQSpark: a complete analysis tool for large-scale rare variant association studies using whole-genome and exome sequence data. The American J Human Genet 2017;101:115–22. https://doi.org/10.1016/j.ajhg.2017.05.017spa
dc.relation.referencesZhang L, Gu S, Liu Y, Wang B, Azuaje F. Gene set analysis in the cloud. Bioinformatics 2011;28:294–5. https://doi.org/10.1093/bioinformatics/btr630spa
dc.relation.referencesZhao G, Ling C, Sun D. SparkSW: scalable distributed computing system for large-scale biological sequence alignment. In: 2015 15th IEEE/ACM international symposium on cluster, cloud and grid computing, IEEE; 2015. https://doi.org/10.1109/ccgrid.2015.55spa
dc.relation.referencesZhao J, Gomez-Perez JM, Belhajjame K, Klyne G, Garcia-Cuesta E, Garrido A, Hettne K, Roos M, De Roure D, Goble C. Why workflows break: understanding and combating decay in Taverna workflows. In: 2012 IEEE 8th international conference on e-science. IEEE; 2012. p. 1–9spa
dc.relation.referencesZhao Q, Xiong, et al. A new energy-aware task scheduling method for data-intensive applications in the cloud. J Network Comput Appl 2016;59:14–27. https://doi.org/10.1016/j.jnca.2015.05.001spa
dc.relation.referencesZhao Y, Li Y, Raicu I, Lu S, Tian W, Liu H. Enabling scalable scientific workflow management in the cloud. Future Generat Comput Syst 2015;46:3–16. https://doi.org/10.1016/j.future.2014.10.023.spa
dc.relation.referencesZhou W, Li R, Yuan S, et al. MetaSpark: a Spark-based distributed processing tool to recruit metagenomic reads to reference genomes. Bioinformatics 2017. https://doi.org/10.1093/bioinformatics/btw750. btw750spa
dc.relation.referencesZielezinski A, Vinga S, Almeida J, Karlowski WM. Alignment-free sequence comparison: benefits, applications, and tools. Genome Biol 2017;18. https://doi.org/10.1186/s13059-017-1319-7spa
dc.relation.referencesZytnicki M, Quesneville H. S-MART, a software toolbox to aid RNA-seq data analysis. PLoS ONE 2011;6:e25988. https://doi.org/10.1371/journal.pone.0025988.spa
dc.rights.accessrightsinfo:eu-repo/semantics/openAccessspa
dc.subject.armarcBiometría
dc.subject.armarcBiometry
dc.subject.armarcAnálisis de la información
dc.subject.armarcInformation analysis
dc.subject.armarcInvestigación biomédica
dc.subject.armarcBiomedical research
dc.subject.armarcTecnología médica
dc.subject.armarcMedical technology
dc.subject.proposalDistributed biomedical analyseseng
dc.subject.proposalAnálisis biomédicos distribuidosspa
dc.subject.proposalFully distributed collaborationseng
dc.subject.proposalColaboraciones totalmente distribuidasspa
dc.subject.proposalReproducibilityeng
dc.subject.proposalReproducibilidadspa
dc.subject.proposalScalabilityeng
dc.subject.proposalMulti-site analyseseng
dc.subject.proposalAnálisis de escalabilidad multisitiospa
dc.subject.proposalDistributed workflow analyseseng
dc.subject.proposalAnálisis de flujo de trabajo distribuidospa
dc.type.coarhttp://purl.org/coar/resource_type/c_6501spa
dc.type.contentTextspa
dc.type.driverinfo:eu-repo/semantics/articlespa

