Big Data
Big Data refers mainly to operating on data sets whose Volume, Velocity and Variety (the 3Vs*) exceed the capabilities of traditional tools and techniques.
Markets and industries that benefit most from Big Data technologies include:
- Big science: Large Hadron Collider – 25 petabytes annually
- Energy: online consumption measurements, Geographical Information Systems (GIS)
- Digital media: global VoD service – 100k titles for 4M subscribers
- Smart factory: supply planning based on sensor data (acoustics, vibration, pressure, current, voltage) for zero downtime
- Smart city: 400K CCTV cameras in London
- Analytic workloads: enterprise financial analysis, production optimization, etc.
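To put the Volume and Velocity dimensions into perspective, the Large Hadron Collider figure above can be converted into an average data rate. This assumes decimal units (1 PB = 10^15 bytes) and data spread evenly over a 365-day year, so it gives an average rate, not a peak:

```latex
\[
\frac{25 \times 10^{15}\,\text{B}}{365 \times 24 \times 3600\,\text{s}}
= \frac{2.5 \times 10^{16}\,\text{B}}{3.1536 \times 10^{7}\,\text{s}}
\approx 7.93 \times 10^{8}\,\text{B/s} \approx 0.79\,\text{GB/s}
\]
```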
The data sets combine multiple heterogeneous sources, ranging from autonomous devices and sensors, through enterprise systems, to user-generated content on social media. Capturing data from these sources involves M2M technologies, load balancing and the MQTT publish/subscribe messaging protocol. Cloud storage supports data curation.
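With MQTT, devices publish readings to a broker, which forwards each message to every client whose subscription filter matches the message's topic. The function below is a minimal sketch of that matching rule as described in the MQTT 3.1.1 specification ('+' matches exactly one topic level, '#' matches the remaining levels). The name `topic_matches` and the factory topic names are hypothetical, filters are assumed to be already validated, and a real deployment would rely on a broker and an MQTT client library rather than this code.

```python
def topic_matches(topic_filter: str, topic: str) -> bool:
    """Return True if an MQTT topic name matches a subscription topic filter."""
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")

    # Topics starting with '$' (e.g. $SYS/...) are not matched by a filter
    # that begins with a wildcard.
    if topic.startswith("$") and filter_levels[0] in ("+", "#"):
        return False

    for i, level in enumerate(filter_levels):
        if level == "#":
            # '#' matches the parent level and all remaining levels, but it is
            # only valid as the last level of a filter.
            return i == len(filter_levels) - 1
        if i >= len(topic_levels):
            # The filter is longer than the topic name.
            return False
        if level != "+" and level != topic_levels[i]:
            # '+' matches any single level; other levels must match exactly.
            return False

    # Without a trailing '#', both must have the same number of levels.
    return len(filter_levels) == len(topic_levels)


if __name__ == "__main__":
    # Hypothetical sensor topics from a smart factory and a smart city.
    readings = ["factory/line1/vibration", "factory/line2/pressure", "city/cctv/42"]
    print([t for t in readings if topic_matches("factory/+/vibration", t)])
    # ['factory/line1/vibration']
```

A subscriber interested in every reading from the factory would use `factory/#`, while `factory/+/vibration` selects only vibration sensors across all production lines.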
Distributed NoSQL databases (Cassandra, MongoDB, …) provide scalable data storage, transfer, sharing and search. Hadoop's massively parallel processing (MapReduce) powers analysis and data mining. Knowledge discovery benefits from machine learning and other artificial intelligence technologies. Cloud computing is the natural environment for the Big Data domain.
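The MapReduce model splits a job into independent map tasks, a shuffle that groups intermediate results by key, and reduce tasks that combine each group. The sketch below shows that flow in a single process with the classic word-count example. It is not Hadoop's API (Hadoop jobs are usually written against its Java `Mapper` and `Reducer` classes); the function names are hypothetical, and a thread pool stands in for the worker nodes of a cluster.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from itertools import chain


def map_phase(record: str) -> list[tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle_phase(pairs) -> dict[str, list[int]]:
    """Shuffle: group intermediate values by key before the reduce step."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def map_reduce(records: list[str], workers: int = 4) -> dict[str, int]:
    # Map tasks are independent, so they can run concurrently; here a thread
    # pool plays the role of the cluster's worker nodes.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(chain.from_iterable(pool.map(map_phase, records)))
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    logs = ["sensor ok", "sensor fault", "sensor ok"]
    print(map_reduce(logs))  # {'sensor': 3, 'ok': 2, 'fault': 1}
```

Because each map call only sees its own record and each reduce call only sees one key's values, a framework such as Hadoop can spread both phases across many machines; the shuffle is where data moves between them.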
* In some cases, companies introduce a fourth "V", and some even go as far as six. However, the 3Vs capture the most fundamental issues, and additional Vs add little value.
Key Competences
- Programming languages:
  - Java
  - Scala
- Frameworks and protocols:
  - Hadoop
  - MQTT
- Databases:
  - Cassandra
  - MongoDB
  - Redis
- Platforms:
  - Distributed & parallel computing
  - Cloud (Amazon Web Services…)
Related technologies:
- Cloud
- M2M