A taxonomy of tools and approaches for distributed genomic analyses
dc.contributor.author | Garzón, Wilmer | |
dc.contributor.author | Benavides, Luis Alberto | |
dc.contributor.author | Gaignard, Alban | |
dc.contributor.author | Südholt, Mario | |
dc.date.accessioned | 2024-07-11T16:51:03Z | |
dc.date.available | 2024-07-11T16:51:03Z | |
dc.date.issued | 2022 | |
dc.identifier.uri | https://repositorio.escuelaing.edu.co/handle/001/3156 | |
dc.description.abstract | The amount of biomedical data collected and stored has grown significantly. Analyzing these extensive amounts of data cannot be done by individuals or single organizations anymore. Thus, the scientific community is creating global collaborative efforts to analyze these data. However, biomedical data is subject to several legal and socio-economic restrictions hindering the possibilities for research collaboration. In this paper, we argue that researchers require new tools and techniques to address the restrictions and needs of global scientific collaborations over geo-distributed biomedical data. These tools and techniques must support what we call Fully Distributed Collaborations (FDC), which are research endeavors that harness means to exploit and analyze massive biomedical information collaboratively while respecting legal and socio-economic restrictions. This paper first motivates and discusses the requirements of FDCs in the context of a research collaboration on the development of diagnostic and predictive tools for the risk of intracranial aneurysm formation and rupture (the ICAN project). The paper then presents a taxonomy classifying the current tools and techniques for biomedical analysis with respect to the proposed requirements. The taxonomy considers three key architectural features to support FDC scenarios: data and computation placement, privacy and security, and performance and scalability. The review reveals new research opportunities to design tools and techniques for multi-site analyses encouraging scientific collaborations while mitigating technical and legal constraints. | eng |
dc.description.abstract | La cantidad de datos biomédicos recopilados y almacenados ha aumentado significativamente. El análisis de estas grandes cantidades de datos ya no lo pueden realizar individuos ni organizaciones individuales. Así, la comunidad científica está creando esfuerzos colaborativos globales para analizar estos datos. Sin embargo, los datos biomédicos están sujetos a varias restricciones legales y socioeconómicas que obstaculizan las posibilidades de colaboración en investigación. En este artículo, sostenemos que los investigadores necesitan nuevas herramientas y técnicas para abordar las restricciones y necesidades de las colaboraciones científicas globales sobre datos biomédicos geodistribuidos. Estas herramientas y técnicas deben respaldar lo que llamamos Colaboraciones Totalmente Distribuidas (FDC), que son esfuerzos de investigación que aprovechan los medios para explotar y analizar información biomédica masiva de manera colaborativa respetando las restricciones legales y socioeconómicas. En primer lugar, este artículo motiva y analiza los requisitos de las FDC en el contexto de una colaboración de investigación sobre el desarrollo de herramientas de diagnóstico y predicción del riesgo de formación y rotura de aneurismas intracraneales (el proyecto ICAN). Luego, el artículo presenta una taxonomía que clasifica las herramientas y técnicas actuales para el análisis biomédico con respecto a los requisitos propuestos. La taxonomía considera tres características arquitectónicas clave para admitir escenarios FDC: ubicación de datos y cálculos, privacidad y seguridad, y rendimiento y escalabilidad. La revisión revela nuevas oportunidades de investigación para diseñar herramientas y técnicas para análisis multisitio que fomenten colaboraciones científicas y al mismo tiempo mitiguen las limitaciones técnicas y legales. | spa |
dc.format.extent | 17 páginas | spa |
dc.format.mimetype | application/pdf | spa |
dc.language.iso | eng | spa |
dc.publisher | Elsevier Ltd | spa |
dc.source | www.elsevier.com/locate/imu | spa |
dc.title | A taxonomy of tools and approaches for distributed genomic analyses | eng |
dc.type | Artículo de revista | spa |
dc.type.version | info:eu-repo/semantics/publishedVersion | spa |
oaire.accessrights | http://purl.org/coar/access_right/c_abf2 | spa |
oaire.version | http://purl.org/coar/version/c_970fb48d4fbd8a85 | spa |
dc.contributor.researchgroup | CTG - Informática | spa |
dc.identifier.eissn | 2352-9148 | spa |
dc.identifier.instname | Universidad Escuela Colombiana de Ingeniería Julio Garavito | spa |
dc.identifier.reponame | Repositorio Digital | spa |
dc.identifier.repourl | https://repositorio.escuelaing.edu.co/ | spa |
dc.publisher.place | Bogotá (Colombia) | spa |
dc.relation.citationedition | Vol. 32 año 2022 | spa |
dc.relation.citationendpage | 17 | spa |
dc.relation.citationstartpage | 1 | spa |
dc.relation.citationvolume | 32 | spa |
dc.relation.ispartofjournal | Informatics in Medicine Unlocked | eng |
dc.relation.references | Abouelhoda M, Issa SA, Ghanem M. Tavaxy: integrating taverna and galaxy workflows with cloud computing support. BMC Bioinfo 2012;13:77. https://doi.org/10.1186/1471-2105-13-77 | spa |
dc.relation.references | Abu-Doleh A, Catalyurek UV. Spaler: spark and GraphX based de novo genome assembler. In: 2015 IEEE international conference on big data (big data). IEEE; 2015. https://doi.org/10.1109/bigdata.2015.7363853 | spa |
dc.relation.references | Abuín JM, Pichel JC, Pena TF, Amigo J. SparkBWA: speeding up the alignment of high-throughput DNA sequencing data. PLOS ONE 2016;11:e0155461. https://doi.org/10.1371/journal.pone.0155461 | spa |
dc.relation.references | Al-Zoubi K, Wainer G. Modelling fog & cloud collaboration methods on large scale. In: 2020 winter simulation conference (WSC); 2020. p. 2161–72. https://doi.org/10.1109/WSC48552.2020.9384058 | spa |
dc.relation.references | Almeida JS, Grüneberg A, Maass W, Vinga S. Fractal MapReduce decomposition of sequence alignment. Algorithm Mol Biol 2012;7. https://doi.org/10.1186/1748-7188-7-12 | spa |
dc.relation.references | ANR. IntraCranial ANeurysms: from familial forms to pathophysiological mechanisms – I-CAN. 2019. http://www.agence-nationale-recherche.fr/Project-ANR-15-CE17-0008. [Accessed 10 October 2019] | spa |
dc.relation.references | Atkinson M, Gesing S, Montagnat J, Taylor I. Scientific workflows: past, present and future. 2017. https://doi.org/10.1016/j.future.2017.05.041 | spa |
dc.relation.references | Barillot C, Bannier E, Commowick O, Corouge I, Baire A, Fakhfakh I, Guillaumont J, Yao Y, Kain M. Shanoir: applying the software as a service distribution model to manage brain imaging research repositories. Front ICT 2016;3:25. URL: https://www.frontiersin.org/article/10.3389/fict.2016.00025 | spa |
dc.relation.references | Barseghian D, Altintas I, et al. Workflows and extensions to the kepler scientific workflow system to support environmental sensor data access and analysis. Ecol Inf 2010;5:42–50. https://doi.org/10.1016/j.ecoinf.2009.08.008 | spa |
dc.relation.references | Bez M, Fornari G, Vardanega T. The scalability challenge of ethereum: an initial quantitative analysis. In: 2019 IEEE international conference on service-oriented system engineering (SOSE). IEEE; 2019. https://doi.org/10.1109/sose.2019.00031 | spa |
dc.relation.references | Bondiombouy C, Valduriez P. Query processing in multistore systems: an overview. Int J Cloud Comput 2016;5:309–46 | spa |
dc.relation.references | Boujdad FZ, Südholt M. Constructive privacy for shared genetic data. In: Proceedings of the 8th international conference on cloud computing and services science. SCITEPRESS - Science and Technology Publications; 2018. https://doi.org/10.5220/0006765804890496 | spa |
dc.relation.references | Boujdad FZ, Gaignard A, et al. On distributed collaboration for biomedical analyses. In: 2019 19th IEEE/ACM international symposium on cluster, cloud and grid computing (CCGRID). IEEE; 2019. https://doi.org/10.1109/ccgrid.2019.00079 | spa |
dc.relation.references | Boujdad FZ, Niyitegeka D, Bellafqira R, Gouenou C, Emmanuelle G, Südholt M. A hybrid cloud deployment architecture for privacy-preserving collaborative genome-wide association studies. In: ICDF2C 2021 - 12th EAI international conference on digital forensics & cyber crime; 2021 | spa |
dc.relation.references | Bourcier R, Chatel S, et al. Understanding the pathophysiology of intracranial aneurysm: the ICAN project. Neurosurgery 2017;80:621–6. https://doi.org/10.1093/neuros/nyw135 | spa |
dc.relation.references | Bux M, Brandt J, Witt C, Dowling J, Leser U. Hi-way: execution of scientific workflows on hadoop yarn. In: 20th international conference on extending database technology, EDBT 2017, 21 march 2017 through 24 march 2017, OpenProceedings.org; 2017. p. 668–79. https://doi.org/10.5441/002/edbt.2017.87 | spa |
dc.relation.references | Bux M, Leser U. Parallelization in scientific workflow management systems. 2013. arXiv preprint arXiv:1303.7195 | spa |
dc.relation.references | Canali C, Lancellotti R, Mione S. Collaboration strategies for fog computing under heterogeneous network-bound scenarios. In: 2020 IEEE 19th international symposium on network computing and applications (NCA); 2020. p. 1–8. https://doi.org/10.1109/NCA51143.2020.9306730 | spa |
dc.relation.references | Cano I, Weimer M, Mahajan D, Curino C, Fumarola GM. Towards geo-distributed machine learning. 2016. arXiv preprint arXiv:1603.09035 | spa |
dc.relation.references | de Castro MR, dos Santos Tostes C, et al. SparkBLAST: scalable BLAST processing using in-memory operations. BMC Bioinf 2017;18. https://doi.org/10.1186/s12859-017-1723-8 | spa |
dc.relation.references | Cattaneo G, Giancarlo R, et al. MapReduce in computational biology - a synopsis. In: Advances in artificial life, evolutionary computation, and systems chemistry. Springer International Publishing; 2017. p. 53–64. https://doi.org/10.1007/978-3-319-57711-1_5 | spa |
dc.relation.references | Cattaneo G, Petrillo UF, Giancarlo R, Roscigno G. An effective extension of the applicability of alignment-free biological sequence comparison algorithms with hadoop. J Supercomput 2016;73:1467–83. https://doi.org/10.1007/s11227-016-1835-3 | spa |
dc.relation.references | Chang YJ, Chen CC, Chen CL, Ho JM. A de novo next generation genomic sequence assembler based on string graph and MapReduce cloud computing framework. In: BMC genomics, BioMed central; 2012. S28. https://doi.org/10.1186/1471-2164-13-S7-S28 | spa |
dc.relation.references | Chen Z, Hu J, Min G, Chen X. Effective data placement for scientific workflows in mobile edge computing using genetic particle swarm optimization. Concurrency Comput: Pract Ex 2019:e5413. https://doi.org/10.1002/cpe.5413 | spa |
dc.relation.references | Chervenak A, Deelman E, Foster I, Guy L, Hoschek W, Iamnitchi A, Kesselman C, Kunszt P, Ripeanu M, Schwartzkopf B, Stockinger H, Stockinger K, Tierney B. Giggle: a framework for constructing scalable replica location services. In: ACM/IEEE SC 2002 conference (SC’02), IEEE; 2002. https://doi.org/10.1109/sc.2002.10024 | spa |
dc.relation.references | Claerhout B, DeMoor G. Privacy protection for clinical and genomic data: the use of privacy-enhancing techniques in medicine. Int J Med Inf 2005;74:257–65. | spa |
dc.relation.references | Cohen-Boulakia S, Belhajjame K, et al. Scientific workflows for computational reproducibility in the life sciences: status, challenges and opportunities. Future Generat Comput Syst 2017;75:284–98. https://doi.org/10.1016/j.future.2017.01.012 | spa |
dc.relation.references | Colosimo ME, Peterson MW, Mardis S, Hirschman L. Nephele: genotyping via complete composition vectors and MapReduce. Source Code Biol Med 2011;6. https://doi.org/10.1186/1751-0473-6-13 | spa |
dc.relation.references | Commission, E., Council. Regulation (eu) 2016/679 of the european parliament and of the council of 27 april 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data. http://data.europa.eu/eli/reg/2016/679/2016-05-04; 2016 | spa |
dc.relation.references | Congress of Colombia. Colombian data protection law. URL: https://www.funcionpublica.gov.co/eva/gestornormativo/norma.php?i=49981. [Accessed 16 September 2021] | spa |
dc.relation.references | Consortium DS, Consortium DM, Mahajan A, et al. Genome-wide trans-ancestry meta-analysis provides insight into the genetic architecture of type 2 diabetes susceptibility. Nature genetics 2014;46:234. https://doi.org/10.1038/ng.2897 | spa |
dc.relation.references | Cook CE, Lopez R, et al. The european bioinformatics institute in 2018: tools, infrastructure and training. Nucleic Acids Res 2018;47:D15–22. https://doi.org/10.1093/nar/gky1124 | spa |
dc.relation.references | Cope JM, Trebon N, Tufo HM, Beckman P. Robust data placement in urgent computing environments. In: 2009 IEEE international symposium on parallel & distributed processing. IEEE; 2009. p. 1–13. https://doi.org/10.1109/IPDPS.2009.5160914 | spa |
dc.relation.references | Corpas M, Kovalevskaya NV, McMurray A, Nielsen FG. A fair guide for data providers to maximise sharing of human genomic data. PLoS Comput Biol 2018;14:e1005873. https://doi.org/10.1371/journal.pcbi.1005873 | spa |
dc.relation.references | De Moor G, Claerhout B, De Meyer F. Privacy enhancing techniques. Method Inf Med 2003;42:148–53 | spa |
dc.relation.references | De Roure D, Belhajjame K, Missier P, Gómez-Pérez JM, Palma R, Ruiz JE, Hettne K, Roos M, Klyne G, Goble C. Towards the preservation of scientific workflows. In: iPRES 2011-8th international conference on preservation of digital objects. National Library Board Singapore and Nanyang Technology University; 2011. p. 228–31 | spa |
dc.relation.references | De Wit P, Pespeni MH, et al. The simple fool’s guide to population genomics via rna-seq: an introduction to high-throughput sequencing data analysis. Mol Eco Res 2012;12:1058–67. https://doi.org/10.1111/1755-0998.12003 | spa |
dc.relation.references | Decap D, Reumers J, Herzeel C, Costanza P, Fostier J. Halvade: scalable sequence analysis with MapReduce. Bioinformatics 2015;31:2482–8. https://doi.org/10.1093/bioinformatics/btv179 | spa |
dc.relation.references | Deelman E, Gannon D, et al. Workflows and e-science: an overview of workflow system features and capabilities. Future Generat Comput Syst 2009;25:528–40. https://doi.org/10.1016/j.future.2008.06.012 | spa |
dc.relation.references | Deelman E, Vahi K, et al. Pegasus, a workflow management system for science automation. Future Generat Comput Syst 2015;46:17–35. https://doi.org/10.1016/j.future.2014.10.008 | spa |
dc.relation.references | Dolev S, Florissi P, et al. A survey on geographically distributed big-data processing using MapReduce. IEEE Transact Big Data 2019;5:60–80. https://doi.org/10.1109/tbdata.2017.2723473 | spa |
dc.relation.references | Dong G, Fu X, Li H, Pan X. An accurate sequence assembly algorithm for livestock, plants and microorganism based on spark. Int J Pattern Recognit Artif Intell 2017;31:1750024. https://doi.org/10.1142/s0218001417500240 | spa |
dc.relation.references | Ebrahimi M, Mohan A, Kashlev A, Lu S. Bdap: a big data placement strategy for cloud-based scientific workflows. In: 2015 IEEE first international conference on big data computing service and applications. IEEE; 2015. p. 105–14. https://doi.org/10.1109/BigDataService.2015.70 | spa |
dc.relation.references | Elmroth E, Hernández F, Tordsson J. Three fundamental dimensions of scientific workflow interoperability: model of computation, language, and execution environment. Future Generat Comput Syst 2010;26:245–56 | spa |
dc.relation.references | Fakas GJ, Karakostas B. A peer to peer (P2P) architecture for dynamic workflow management. Inf Software Technol 2004;46:423–31 | spa |
dc.relation.references | Fan J, Han F, Liu H. Challenges of big data analysis. Nat Sci Rev 2014;1:293–314. https://doi.org/10.1093/nsr/nwt032 | spa |
dc.relation.references | Federer LM, Lu YL, et al. Biomedical data sharing and reuse: attitudes and practices of clinical and scientific research staff. PLOS ONE 2015;10:e0129506. https://doi.org/10.1371/journal.pone.0129506 | spa |
dc.relation.references | Freire J, Bonnet P, Shasha D. Computational reproducibility: state-of-the-art, challenges, and database research opportunities. In: Proceedings of the 2012 ACM SIGMOD international conference on management of data; 2012. p. 593–6 | spa |
dc.relation.references | Frye SV, Arkin MR, et al. Tackling reproducibility in academic preclinical drug discovery. Nat Rev Drug Discovery 2015;14:733–4. https://doi.org/10.1038/nrd4737 | spa |
dc.relation.references | Gil Y, Ratnakar V, et al. Wings: intelligent workflow-based design of computational experiments. IEEE Intell Syst 2011;26:62–72. https://doi.org/10.1109/mis.2010.9 | spa |
dc.relation.references | Gilbert S, Lynch N. Brewer’s conjecture and the feasibility of consistent, available, partition-tolerant web services. SIGACT News 2002;33:51–9. https://doi.org/10.1145/564585.564601 | spa |
dc.relation.references | Goecks J, Nekrutenko A, Taylor J, Team TG. Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences. Genome Biol 2010;11:R86. https://doi.org/10.1186/gb-2010-11-8-r86 | spa |
dc.relation.references | Goodman SN, Fanelli D, Ioannidis JPA. What does research reproducibility mean? Sci Translat Med 2016;8:341ps12. https://doi.org/10.1126/scitranslmed.aaf5027 | spa |
dc.relation.references | Goodwin S, McPherson JD, McCombie WR. Coming of age: ten years of next-generation sequencing technologies. Nature Rev Genet 2016;17:333 | spa |
dc.relation.references | Guo R, Zhao Y, Zou Q, et al. Bioinformatics applications on Apache spark. GigaScience 2018. https://doi.org/10.1093/gigascience/giy098 | spa |
dc.relation.references | National Institutes of Health, et al. Guidance: rigor and reproducibility in grant applications. 2017 | spa |
dc.relation.references | Huang H, Tata S, Prill RJ. BlueSNP: R package for highly scalable genome-wide association studies using hadoop clusters. Bioinformatics 2012;29:135–6. https://doi.org/10.1093/bioinformatics/bts647 | spa |
dc.relation.references | Huang L, Krüger J, Sczyrba A. Analyzing large scale genomic data on the cloud with sparkhit. Bioinformatics 2017;34:1457–65. https://doi.org/10.1093/bioinformatics/btx808 | spa |
dc.relation.references | Huang Y, Gottardo R. Comparability and reproducibility of biomedical data. Briefings Bioinfo 2012;14:391–401. https://doi.org/10.1093/bib/bbs078 | spa |
dc.relation.references | Hung CL, Lin YL, Hua GJ, Hu YC. CloudTSS: a TagSNP selection approach on cloud computing. In: Communications in computer and information science. Springer Berlin Heidelberg; 2011. p. 525–34. https://doi.org/10.1007/978-3-642-27180-9_64 | spa |
dc.relation.references | Hutson S. Data handling errors spur debate over clinical trial. Nature Med 2010;16:618. https://doi.org/10.1038/nm0610-618a | spa |
dc.relation.references | Karim MR, Michel A, et al. Improving data workflow systems with cloud services and use of open data for bioinformatics research. Briefings Bioinfo 2017;19:1035–50. https://doi.org/10.1093/bib/bbx039 | spa |
dc.relation.references | Khan A, Kim T, Byun H, Kim Y. Scispace: a scientific collaboration workspace for geo-distributed hpc data centers. Future Generat Comput Syst 2019;101:398–409. | spa |
dc.relation.references | Khan FZ, Soiland-Reyes S, Sinnott RO, Lonie A, Goble C, Crusoe MR. Sharing interoperable workflow provenance: a review of best practices and their practical application in cwlprov. GigaScience 2019;8:giz095 | spa |
dc.relation.references | Kim D, Vouk MA. Assessing run-time overhead of securing kepler. Procedia Comput Sci 2016;80:2281–6. https://doi.org/10.1016/j.procs.2016.05.412 | spa |
dc.relation.references | Kim JH. Genome data analysis. Springer Singapore; 2019. URL: https://www.springer.com/gp/book/9789811319419 | spa |
dc.relation.references | Köster J, Rahmann S. Snakemake–a scalable bioinformatics workflow engine. Bioinfo 2012;28:2520–2. https://doi.org/10.1093/bioinformatics/bts480 | spa |
dc.relation.references | Kuhn K, et al. The cancer biomedical informatics grid (cabig): infrastructure and applications for a worldwide research community. Medinfo 2007;1:330 | spa |
dc.relation.references | Langmead B, Hansen KD, Leek JT. Cloud-scale RNA-sequencing differential expression analysis with myrna. Genome Biol 2010;11:R83. https://doi.org/10.1186/gb-2010-11-8-r83 | spa |
dc.relation.references | Langmead B, Schatz MC, et al. Searching for SNPs with cloud computing. Genome Biol 2009;10:R134. https://doi.org/10.1186/gb-2009-10-11-r134 | spa |
dc.relation.references | Legislature CS. The California consumer privacy act of 2018. https://leginfo.legislature.ca.gov/faces/billTextClient.xhtml?bill_id=201720180SB1121 | spa |
dc.relation.references | Leo S, Santoni F, Zanetti G. Biodoop: bioinformatics on hadoop. In: 2009 international conference on parallel processing workshops. IEEE; 2009. https://doi.org/10.1109/icppw.2009.37 | spa |
dc.relation.references | Li R, Li Y, Fang X, Yang H, Wang J, Kristiansen K, Wang J. SNP detection for massively parallel whole-genome resequencing. Genome Res 2009;19:1124–32. https://doi.org/10.1101/gr.088013.108 | spa |
dc.relation.references | Li X, Zhang L, et al. A novel workflow-level data placement strategy for data-sharing scientific cloud workflows. IEEE Transact Serv Comput 2016. https://doi.org/10.1109/TSC.2016.2625247 | spa |
dc.relation.references | Liu J, Pacitti E, Valduriez P, Mattoso M. Parallelization of scientific workflows in the cloud. 2014 | spa |
dc.relation.references | Liu J, Pacitti E, Valduriez P, Mattoso M. A survey of data-intensive scientific workflow management. J Grid Comput 2015;13:457–93. https://doi.org/10.1007/s10723-015-9329-8 | spa |
dc.relation.references | Liu J, Pacitti E, Valduriez P, Mattoso M. Scientific workflow scheduling with provenance data in a multisite cloud. In: Transactions on large-scale data-and knowledge-centered systems XXXIII. Springer; 2017. p. 80–112 | spa |
dc.relation.references | Liu J, Pineda L, Pacitti E, Costan A, Valduriez P, Antoniu G, Mattoso M. Efficient scheduling of scientific workflows using hot metadata in a multisite cloud. IEEE Transact Knowl Data Eng 2019;31:1940–53. https://doi.org/10.1109/tkde.2018.2867857 | spa |
dc.relation.references | Liu X, Datta A. Towards intelligent data placement for scientific workflows in collaborative cloud environment. In: 2011 IEEE international symposium on parallel and distributed processing workshops and phd forum. IEEE; 2011. p. 1052–61. https://doi.org/10.1109/IPDPS.2011.259 | spa |
dc.relation.references | Liu Y, Zhang L, Ge N, Li G. A systematic literature review on federated learning: from a model quality perspective. 2020. arXiv preprint arXiv:2012.01973 | spa |
dc.relation.references | Lu S, Zhang J. Collaborative scientific workflows supporting collaborative science. Int J Bus Process Integrat Manag 2011;5:185. https://doi.org/10.1504/ijbpim.2011.040209 | spa |
dc.relation.references | Lu YY, Tang K, et al. CAFE: aCcelerated Alignment-FrEe sequence analysis. Nucleic acids research 2017;45:W554–9. https://doi.org/10.1093/nar/gkx351 | spa |
dc.relation.references | Malin BA, Emam KE, O’Keefe CM. Biomedical data privacy: problems, perspectives, and recent advances. 2013 | spa |
dc.relation.references | McKenna A, Hanna M, Banks E, et al. The genome analysis toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res 2010;20:1297–303. https://doi.org/10.1101/gr.107524.110 | spa |
dc.relation.references | McMahan B, Moore E, Ramage D, Hampson S, y Arcas BA. Communication-efficient learning of deep networks from decentralized data. In: Singh A, Zhu J, editors. Proceedings of the 20th international conference on artificial intelligence and statistics. Fort Lauderdale, FL, USA: PMLR; 2017. p. 1273–82. URL: http://proceedings.mlr.press/v54/mcmahan17a.html | spa |
dc.relation.references | Moreau L, Missier P, Cheney J, Soiland-Reyes S. Prov-n: the provenance notation. 2013 | spa |
dc.relation.references | Nagappan M, Vouk MA. A model for sharing of confidential provenance information in a query based system. In: International provenance and annotation workshop. Springer; 2008. p. 62–9. https://doi.org/10.1007/978-3-540-89965-5_8 | spa |
dc.relation.references | Nguyen T, Shi W, Ruden D. CloudAligner: a fast and full-featured MapReduce based tool for sequence mapping. BMC Res Notes 2011;4. https://doi.org/10.1186/1756-0500-4-171 | spa |
dc.relation.references | NHGRI-EBI. GWAS catalog. 2019. https://www.ebi.ac.uk/gwas/. [Accessed 20 September 2019] | spa |
dc.relation.references | NIH-BMIC. NIH data sharing repositories. 2019. https://www.nlm.nih.gov/NIHbmic/nih_data_sharing_repositories.html. [Accessed 20 September 2019] | spa |
dc.relation.references | Nordberg H, Bhatia K, Wang K, Wang Z. BioPig: a hadoop-based analytic toolkit for large-scale sequence data. Bioinformatics 2013;29:3014–9. https://doi.org/10.1093/bioinformatics/btt528 | spa |
dc.relation.references | NSF, 2019. Chapter XI - Other Post Award Requirements and Consideration. https://www.nsf.gov/pubs/policydocs/pappg19_1/pappg_11.jsp#XID4. [Accessed 20 June 2019] | spa |
dc.relation.references | O’Brien AR, Saunders NFW, et al. VariantSpark: population scale clustering of genotype information. BMC Genom 2015;16. https://doi.org/10.1186/s12864-015-2269-7 | spa |
dc.relation.references | Pandey RV, Schlötterer C. DistMap: a toolkit for distributed short read mapping on a hadoop cluster. PLoS ONE 2013;8:e72614. https://doi.org/10.1371/journal.pone.0072614 | spa |
dc.relation.references | Papageorgiou L, Eleni P, et al. Genomic big data hitting the storage bottleneck. EMBnetjournal 2018;24:e910. https://doi.org/10.14806/ej.24.0.910 | spa |
dc.relation.references | Parks R, Chu CH, Xu H. Healthcare information privacy research: issues, gaps and what next? AMCIS; 2011 | spa |
dc.relation.references | Peteiro-Barral D, Guijarro-Berdiñas B. A survey of methods for distributed machine learning. Prog Artif Intell 2013;2:1–11 | spa |
dc.relation.references | Pineda-Morales L, Costan A, Antoniu G. Towards multi-site metadata management for geographically distributed cloud workflows. In: 2015 IEEE international conference on cluster computing. IEEE; 2015. p. 294–303. https://doi.org/10.1109/cluster.2015.49 | spa |
dc.relation.references | Pineda-Morales L, Liu J, Costan A, Pacitti E, Antoniu G, Valduriez P, Mattoso M. Managing hot metadata for scientific workflows on multisite clouds. In: 2016 IEEE international conference on big data (big data). IEEE; 2016. p. 390–7 | spa |
dc.relation.references | Pireddu L, Leo S, Zanetti G. SEAL: a distributed short read mapping and duplicate removal tool. Bioinformatics 2011;27:2159–60. https://doi.org/10.1093/bioinformatics/btr325 | spa |
dc.relation.references | Rasheed Z, Rangwala H. A map-reduce framework for clustering metagenomes. In: 2013 IEEE international symposium on parallel & distributed processing, workshops and phd forum. IEEE; 2013. https://doi.org/10.1109/ipdpsw.2013.100 | spa |
dc.relation.references | Rodriguez MA, Buyya R. Scientific workflow management system for clouds. In: Software architecture for big data and the cloud. Elsevier; 2017. p. 367–87. https://doi.org/10.1016/b978-0-12-805467-3.00018-1 | spa |
dc.relation.references | Ross RB, Thakur R, et al. Pvfs: a parallel file system for linux clusters. In: Proceedings of the 4th annual Linux showcase and conference; 2000. p. 391–430 | spa |
dc.relation.references | Salloum S, Dautov R, et al. Big data analytics on Apache spark. Int J Data Sci Anal 2016;1:145–64. https://doi.org/10.1007/s41060-016-0027-9 | spa |
dc.relation.references | Santana-Perez I, Pérez-Hernández MS. Towards reproducibility in scientific workflows: an infrastructure-based approach. Scientific Program 2015:1–11. https://doi.org/10.1155/2015/243180 | spa |
dc.relation.references | Schadt EE, Linderman MD, et al. Computational solutions to large-scale data management and analysis. Nature Rev Genet 2010;11:647–57. https://doi.org/10.1038/nrg2857 | spa |
dc.relation.references | Schatz MC. BlastReduce: high performance short read mapping with MapReduce. University of Maryland; 2008. http://cgis.cs.umd.edu/Grad/scholarlypapers/papers/MichaelSchatz.pdf | spa |
dc.relation.references | Schatz MC, Sommer D, Kelley D, Pop M. De novo assembly of large genomes using cloud computing. In: Proceedings of the cold spring harbor biology of genomes conference; 2010 | spa |
dc.relation.references | Senturk IF, Balakrishnan P, et al. A resource provisioning framework for bioinformatics applications in multi-cloud environments. Future Generat Comput Syst 2018;78:379–91. https://doi.org/10.1016/j.future.2016.06.008 | spa |
dc.relation.references | Sharov AA, Schlessinger D, Ko MSH. ExAtlas: an interactive online tool for meta-analysis of gene expression data. J Bioinfo Comput Biol 2015;13:1550019. https://doi.org/10.1142/s0219720015500195 | spa |
dc.relation.references | Soiland-Reyes S, Alper P, Goble C. Tracking workflow execution with tavernaprov. In: PROV: three years later: Provenance Week 2016; 2016 | spa |
dc.relation.references | Stephens ZD, Lee SY, et al. Big data: astronomical or genomical? PLOS Biology 2015;13:e1002195. https://doi.org/10.1371/journal.pbio.1002195 | spa |
dc.relation.references | Tannenbaum T, Wright D, Miller K, Livny M. Condor: a distributed job scheduler. In: Beowulf cluster computing with windows; 2001. p. 307–50 | spa |
dc.relation.references | Taylor I, Shields M, Wang I, Harrison A. The triana workflow environment: architecture and applications. In: Workflows for e-Science. Springer; 2007. p. 320–39. https://doi.org/10.1007/978-1-84628-757-2_20 | spa |
dc.relation.references | Taylor IJ, Deelman E, et al. Workflows for e-Science: scientific workflows for grids, volume 1. Springer; 2007. https://doi.org/10.1007/978-1-84628-757-2 | spa |
dc.relation.references | Thain D, Tannenbaum T, Livny M. Distributed computing in practice: the condor experience. Concurr Comput: Pract Exp 2005;17:323–56. https://doi.org/10.1002/cpe.938 | spa |
dc.relation.references | Tommaso PD, Chatzou M, et al. Nextflow enables reproducible computational workflows. Nature Biotechnol 2017;35:316–9. https://doi.org/10.1038/nbt.3820 | spa |
dc.relation.references | Turakhia MP, Desai M, Hedlin H, Rajmane A, Talati N, Ferris T, Desai S, Nag D, Patel M, Kowey P, Rumsfeld JS, Russo AM, Hills MT, Granger CB, Mahaffey KW, Perez MV. Rationale and design of a large-scale, app-based study to identify cardiac arrhythmias using a smartwatch: the apple heart study. Am Heart J 2019;207:66–75. https://doi.org/10.1016/j.ahj.2018.09.002. URL: https://www.sciencedirect.com/science/article/pii/S0002870318302710 | spa |
dc.relation.references | Union I. Communication from the commission to the european parliament, the council, the european economic and social committee and the committee of the regions. A new skills agenda for europe. 2014 [Brussels]. | spa |
dc.relation.references | Valduriez P, Mattoso M, Akbarinia R, Borges H, Camata J, Coutinho A, Gaspar D, Lemus N, Liu J, Lustosa H, et al. Scientific data analysis using data-intensive scalable computing: the scidisc project. In: LADaS: Latin America data science workshop, CEUR-WS. Org; 2018 | spa |
dc.relation.references | Van Hung T, Chuanhe H. An effective data placement strategy in main-memory database cluster. In: 2011 second international conference on networking and distributed computing. IEEE; 2011. p. 93–8. https://doi.org/10.1109/ICNDC.2011.27. | spa |
dc.relation.references | Verbraeken J, Wolting M, Katzy J, Kloppenburg J, Verbelen T, Rellermeyer JS. A survey on distributed machine learning. ACM Comput Surv (CSUR) 2020;53:1–33 | spa |
dc.relation.references | Wang J, Crawl D, Altintas I. Kepler + hadoop. In: Proceedings of the 4th workshop on workflows in support of large-scale science - WORKS ’09. ACM Press; 2009. https://doi.org/10.1145/1645164.1645176 | spa |
dc.relation.references | Wang R, Li M, Peng L, Hu Y, Hassan MM, Alelaiwi A. Cognitive multi-agent empowering mobile edge computing for resource caching and collaboration. Future Generat Comput Syst 2020;102:66–74. https://doi.org/10.1016/j.future.2019.08.001. URL: https://www.sciencedirect.com/science/article/pii/S0167739X19318783 | spa |
dc.relation.references | Wang Y. Automating experimentation with distributed systems using generative techniques. Ph.D. thesis. University of Colorado at Boulder; 2006 | spa |
dc.relation.references | Wang Y, Carzaniga A, Wolf AL. Four enhancements to automated distributed system experimentation methods. In: Proceedings of the 30th international conference on Software engineering; 2008. p. 491–500 | spa |
dc.relation.references | Wiewiórka MS, Messina A, et al. SparkSeq: fast, scalable and cloud-ready tool for the interactive genomic data analysis with nucleotide precision. Bioinformatics 2014;30:2652–3. https://doi.org/10.1093/bioinformatics/btu343 | spa |
dc.relation.references | Wilde M, Hategan M, et al. Swift: a language for distributed parallel scripting. Parallel Comput 2011;37:633–52. https://doi.org/10.1016/j.parco.2011.05.005. | spa |
dc.relation.references | Wolstencroft K, Haines R, et al. The taverna workflow suite: designing and executing workflows of web services on the desktop, web or in the cloud. Nucleic Acids Res 2013;41:W557–61. https://doi.org/10.1093/nar/gkt328 | spa |
dc.relation.references | Xiao Y, Zhou AC, Yang X, He B. Privacy-preserving workflow scheduling in geo-distributed data centers. Future Generat Comput Syst 2022;130:46–58 | spa |
dc.relation.references | Xie J, Yin S, et al. Improving mapreduce performance through data placement in heterogeneous hadoop clusters. In: 2010 IEEE international symposium on parallel & distributed processing, workshops and phd forum (IPDPSW). IEEE; 2010. p. 1–9. https://doi.org/10.1109/IPDPSW.2010.547088 | spa |
dc.relation.references | Xie T. Sea: a striping-based energy-aware strategy for data placement in raid-structured storage systems. IEEE Transact Comput 2008;57:748–61. https://doi.org/10.1109/TC.2008.27 | spa |
dc.relation.references | Xing EP, Ho Q, Dai W, Kim JK, Wei J, Lee S, Zheng X, Xie P, Kumar A, Yu Y. Petuum: a new platform for distributed machine learning on big data. IEEE Transact Big Data 2015;1:49–67. https://doi.org/10.1109/tbdata.2015.2472014 | spa |
dc.relation.references | Xu B, Gao J, Li C. An efficient algorithm for DNA fragment assembly in MapReduce. Biochem Biophys Res Commun 2012;426:395–8. https://doi.org/10.1016/j.bbrc.2012.08.101 | spa |
dc.relation.references | Xu B, Li C, Zhuang H, et al. DSA: scalable distributed sequence alignment system using SIMD instructions. In: 2017 17th IEEE/ACM international symposium on cluster, cloud and grid computing (CCGRID), IEEE; 2017. https://doi.org/10.1109/ccgrid.2017.74 | spa |
dc.relation.references | Xu B, Li C, Zhuang H, et al. Efficient distributed smith-waterman algorithm based on Apache spark. In: 2017 IEEE 10th international conference on cloud computing (CLOUD). IEEE; 2017. https://doi.org/10.1109/cloud.2017.83 | spa |
dc.relation.references | Yu HF, Hsieh CJ, Chang KW, Lin CJ. Large linear classification when data cannot fit in memory. In: ACM Transactions on Knowledge Discovery from Data (TKDD); 2012. p. 1–23. 5 | spa |
dc.relation.references | Yu J, Buyya R. A taxonomy of workflow management systems for grid computing. J Grid Comput 2005;3:171–200. https://doi.org/10.1007/s10723-005-9010-8. | spa |
dc.relation.references | Yuan D, Yang Y, Liu X, Chen J. A data placement strategy in scientific cloud workflows. Future Generat Comput Syst 2010;26:1200–14. https://doi.org/10.1016/j.future.2010.02.004 | spa |
dc.relation.references | Zhang D, Zhao L, Li B, et al. SEQSpark: a complete analysis tool for large-scale rare variant association studies using whole-genome and exome sequence data. The American J Human Genet 2017;101:115–22. https://doi.org/10.1016/j.ajhg.2017.05.017 | spa |
dc.relation.references | Zhang L, Gu S, Liu Y, Wang B, Azuaje F. Gene set analysis in the cloud. Bioinformatics 2011;28:294–5. https://doi.org/10.1093/bioinformatics/btr630 | spa |
dc.relation.references | Zhao G, Ling C, Sun D. SparkSW: scalable distributed computing system for large-scale biological sequence alignment. In: 2015 15th IEEE/ACM international symposium on cluster, cloud and grid computing, IEEE; 2015. https://doi.org/10.1109/ccgrid.2015.55 | spa |
dc.relation.references | Zhao J, Gomez-Perez JM, Belhajjame K, Klyne G, Garcia-Cuesta E, Garrido A, Hettne K, Roos M, De Roure D, Goble C. Why workflows break—understanding and combating decay in taverna workflows. In: 2012 ieee 8th international conference on e-science. IEEE; 2012. p. 1–9 | spa |
dc.relation.references | Zhao Q, Xiong, et al. A new energy-aware task scheduling method for data-intensive applications in the cloud. J Network Comput Appl 2016;59:14–27. https://doi.org/10.1016/j.jnca.2015.05.001 | spa |
dc.relation.references | Zhao Y, Li Y, Raicu I, Lu S, Tian W, Liu H. Enabling scalable scientific workflow management in the cloud. Future Generat Comput Syst 2015;46:3–16. https://doi.org/10.1016/j.future.2014.10.023. | spa |
dc.relation.references | Zhou W, Li R, Yuan S, et al. MetaSpark: a spark-based distributed processing tool to recruit metagenomic reads to reference genomes. Bioinformatics 2017. https://doi.org/10.1093/bioinformatics/btw750. | spa |
dc.relation.references | Zielezinski A, Vinga S, Almeida J, Karlowski WM. Alignment-free sequence comparison: benefits, applications, and tools. Genome Biol 2017;18. https://doi.org/10.1186/s13059-017-1319-7 | spa |
dc.relation.references | Zytnicki M, Quesneville H. S-MART, a software toolbox to aid RNA-seq data analysis. PLoS ONE 2011;6:e25988. https://doi.org/10.1371/journal.pone.0025988. | spa |
dc.rights.accessrights | info:eu-repo/semantics/openAccess | spa |
dc.subject.armarc | Biometría | |
dc.subject.armarc | Biometry | |
dc.subject.armarc | Análisis de la información | |
dc.subject.armarc | Information analysis | |
dc.subject.armarc | Investigación biomédica | |
dc.subject.armarc | Biomedical research | |
dc.subject.armarc | Tecnología médica | |
dc.subject.armarc | Medical technology | |
dc.subject.proposal | Distributed biomedical analyses | eng |
dc.subject.proposal | Análisis biomédicos distribuidos | spa |
dc.subject.proposal | Fully distributed collaborations | eng |
dc.subject.proposal | Colaboraciones totalmente distribuidas | spa |
dc.subject.proposal | Reproducibility | eng |
dc.subject.proposal | Reproducibilidad | spa |
dc.subject.proposal | Scalability; Multi-site analyses | eng |
dc.subject.proposal | Análisis de escalabilidad multisitio | spa |
dc.subject.proposal | Distributed workflow analyses | eng |
dc.subject.proposal | Análisis de flujo de trabajo distribuido | spa |
dc.type.coar | http://purl.org/coar/resource_type/c_6501 | spa |
dc.type.content | Text | spa |
dc.type.driver | info:eu-repo/semantics/article | spa |
Files in this item
This item appears in the following collection(s)
-
AD - CTG – Informática [89]
Clasificación B- Convocatoria 2018