Big Data, Free Software and Open Source: Available Applications


Big Data is a technological concept related to the management of large volumes of structured and unstructured data, which are currently handled by large business, technological, scientific and even government sectors.

When talking about Big Data, however, what matters is not really the amount of data but what organizations do with it. Big Data and its associated technology can analyze that data to obtain insights that lead to better decisions, moves and strategies. In this respect, Free Software (SL, from the Spanish "Software Libre") and Open Source (CA, from "Código Abierto") have contributed a great deal to this technology, since many Big Data applications have been built under these development models.

Big Data and Free Software: Introduction

Big Data and Free Software

For those familiar with the field, it is well known that Free Software, as a development model and a philosophy, is based on creating technologies, mainly software products, that can be freely used, modified and distributed. Open Source is an important element in the development of free software, since it focuses on the practical advantages of this development model more than on the ethics of freedom for products and citizens.

Therefore, while SL / CA contribute the means to carry out Big Data, Big Data complements them indirectly, benefiting not only the accelerated expansion of technological development but also the freedom of access to information that Big Data brings with it.

Big Data and Free Software: What is Big Data?

What is Big Data?


For IBM, one of the giants of software and technological development, Big Data is a:

«... technology that has opened the doors to a new approach to understanding and decision making, which is used to describe huge amounts of data (structured, unstructured and semi-structured) that would take too much time and would be very expensive to load into a relational database for analysis.»


Big Data technology was born with the aim of covering the entire spectrum of possible data analysis: both what is already solved by existing technologies and what they do not solve, such as the storage and management of large volumes of data with very specific characteristics.


Big Data handles volumes of data that are usually defined by the following characteristics:

  • Volume: Size of data from multiple sources.
  • Velocity: Speed with which data from multiple sources arrives and is managed.
  • Variety: Format of analyzed data from multiple sources.

That is, data volumes that are typically composed of structured, semi-structured and unstructured data, and that are handled in huge quantities usually described with large-quantity prefixes such as tera, peta or exa, among others.
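To give a feel for the scale behind those prefixes, here is a minimal Python sketch that turns a raw byte count into a readable size. It assumes decimal (SI) prefixes, where each step is a factor of 1000; the function name `describe_bytes` and the sample values are illustrative only and do not come from any Big Data tool.

```python
# Decimal (SI) unit labels; each one is 1000 times larger than the previous.
UNITS = ["B", "kB", "MB", "GB", "TB", "PB", "EB"]

def describe_bytes(n_bytes: int) -> str:
    """Return a human-readable size such as '2.5 PB' (hypothetical helper)."""
    value = float(n_bytes)
    index = 0
    # Divide by 1000 until the value fits the current unit or units run out.
    while value >= 1000 and index < len(UNITS) - 1:
        value /= 1000
        index += 1
    return f"{value:g} {UNITS[index]}"

print(describe_bytes(3 * 10**12))    # 3 TB
print(describe_bytes(25 * 10**14))   # 2.5 PB
print(describe_bytes(10**18))        # 1 EB
```

Some tools report sizes with binary prefixes (factors of 1024) instead, so the numbers shown by a given system may differ slightly from this decimal sketch.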

This data comes from all kinds of sources, such as the Internet (social networks, digital media, websites and databases), equipment (mobile phones, multimedia players, positioning systems, civil and industrial digital sensors, among others) and organizations (private and public, commercial, government and community).

Big Data and Free Software: Concept, Objective, Data, Importance, Advantages and Benefits


What makes Big Data such a useful technology for organizations (private and public, commercial, government and community) is that it provides valuable information that often serves as an accurate and reliable answer to questions that had not even been asked about certain situations or problems. That is, its usefulness often shows up in aspects that emerge from the very information collected and managed.

Processing large volumes of information makes it easier to shape or test the processed data in whatever way its administrator considers most appropriate or specific. This allows organizations that use Big Data to identify problems in a more understandable way.

Collecting large volumes of data and then analyzing them for trends allows organizations to be more effective and efficient, acting on that data much more quickly, smoothly and in a timely manner. It also allows them to eliminate problem areas before the problems overtake them and cost them profits, reputation or support.


Big Data helps organizations manage their data much better, which results in the identification of new positive or productive opportunities for their members (clients or citizens). This in turn leads to smarter and more efficient actions and to savings in labor hours and money, which often translates into happiness for everyone involved. When Big Data is used, value is usually added to the activities carried out in the following ways:

  • Cost reduction: In the storage and management of large volumes of data.
  • Time reduction: More efficiency and effectiveness in decision making.
  • New products and services: With the ability to measure and anticipate the needs and problems of users (clients and / or citizens), their satisfaction is increased.


Well-used Big Data is often capable of determining the root causes of failures, problems and defects almost in real time. However, it should be kept in mind that Big Data technology is not a panacea by itself. Citing another technology giant, Oracle, it can be added that:

«Identifying the value of big data does not only mean analyzing it (which is already an advantage in itself). It is an entire discovery process that requires analysts, business users and executives to ask the right questions, identify patterns, make informed decisions and predict behaviors.»

Big Data and Free Software: SL / CA Applications

SL / CA Applications for Big Data

Among the Free Software and Open Source applications that are worth mentioning for research, testing, and implementation are:


  • Apache Hadoop: Open source platform made up of the Hadoop Distributed File System (HDFS), Hadoop MapReduce and Hadoop Common.
  • Avro: Apache project that provides serialization services.
  • Cassandra: Distributed non-relational database based on a key-value storage model, developed in Java.
  • Chukwa: Software designed for large-scale collection and analysis of event logs.
  • Flume: Software whose main task is to direct data from one source to some other location.
  • HBase: Columnar database (column-oriented database) running on HDFS.
  • Hive: "Data Warehouse" infrastructure that facilitates the administration of large volumes of data that are stored in a distributed environment.
  • Jaql: Functional and declarative language, designed to process large volumes of information, that allows the exploitation of data in JSON format.
  • Lucene: Software that provides libraries for indexing and searching text.
  • Oozie: Open source project that simplifies workflows and the coordination between processes.
  • Pig: Software that allows Hadoop users to focus more on analyzing their data sets and spend less time building MapReduce programs.
  • ZooKeeper: Centralized infrastructure and services that can be used by applications to ensure that processes across a cluster are serialized or synchronized.
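To make the MapReduce model mentioned in the list above more concrete, here is a minimal sketch of a word count written in plain Python. It does not use Hadoop or any of its APIs; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample documents are assumptions made purely for illustration, and everything runs in a single process rather than on a cluster.

```python
from collections import defaultdict

def map_phase(document: str):
    """Map step: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1

def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce step: combine the values of one key into a single result."""
    return key, sum(values)

def word_count(documents):
    """Run map, shuffle and reduce in sequence over all documents."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

docs = ["big data free software", "free software open source", "big data"]
counts = word_count(docs)
print(counts["big"], counts["free"], counts["source"])  # 2 2 1
```

In a real cluster the map and reduce steps would run in parallel on many machines, with the framework handling the shuffle; the sketch only shows how the three steps fit together.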


Others that are just as well known, but not tied to the open source Hadoop platform, are:

  • Elasticsearch: Full-text search and analysis engine.
  • MongoDB: NoSQL database based on the document data model.
  • Cassandra: Apache open source NoSQL database.
  • CouchDB: Open source NoSQL database based on common standards for easy accessibility and web compatibility.
  • Solr: Open source search engine based on the Java library of the Lucene project.
  • Other RDBMS tools: MySQL Cluster and VoltDB.
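Several of the tools above (Lucene, Elasticsearch, Solr) revolve around indexing text so that it can be searched quickly. A minimal sketch of the underlying idea, an inverted index that maps each term to the documents containing it, could look like the following. This is plain Python rather than the API of any of those projects, and the names `build_index`, `search` and the sample documents are hypothetical.

```python
from collections import defaultdict

def build_index(documents: dict[int, str]) -> dict[str, set[int]]:
    """Map every lowercase term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(index: dict[str, set[int]], query: str) -> set[int]:
    """Return the ids of documents that contain every term in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    # Start from the first term's documents and intersect with the rest.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

docs = {
    1: "open source search engine",
    2: "distributed document database",
    3: "open source document database",
}
index = build_index(docs)
print(sorted(search(index, "open source")))        # [1, 3]
print(sorted(search(index, "document database")))  # [2, 3]
```

Real search engines add much more on top of this (tokenization rules, ranking, distributed storage), so the sketch should be read only as an illustration of the basic lookup structure.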

Big Data and Free Software: Conclusion


Our present (and immediate future) is immersed, or drowning, in a large and growing mass of data, which has more to say as a whole than individually. Therefore, the use of Big Data technology now and in the near future will help society, and humanity as a whole, to discover countless things (events or inventions) that could otherwise have taken many years to discover.

Big Data and its tools provide enough analysis speed to obtain a result quickly and rework it as many times as necessary, in a short time, until the true value, or the closest one to what you are trying to reach, is found. If you have found the topic of Big Data interesting, you can expand on it a little more by reading this report from BBVA.
