Implications of big data for individuals and society

Limitations of predictive analytics

With big data analytics and machine learning technologies it is becoming possible to predict trends and patterns very accurately in many areas, including human behaviour. So far this has been applied mostly to the retail market and online shopping.

There have been discussions of future systems that, while technologically possible, are controversial. One is crime prediction. Big data analytics, even in real time, combined with machine learning technology can be very accurate. This could place authorities in an advantaged position to prevent crime, for example by dispatching appropriate resources to an area with high risk potential in real time.

Discussions also tend to steer towards science fiction scenarios taken from films like Minority Report, where the police use technology to predict crime and make arrests based on those predictions.

The obvious controversy lies in human beliefs about destiny and a predefined path, although this can also become a discussion about religion: there will always be pros and cons, believers and non-believers. It is a free-will problem, and I believe that human beings should never be judged on an assumption or a prediction.

Theoretically there could be benefits in preventing crime from happening, but while it is technologically possible, I believe that scenario should be left to the entertainment industry and the hands of movie makers.

Implications of big data for individuals

The implications of big data analytics are a constant topic of debate. There are many ways that individuals can be affected, and the positive implications can greatly improve an individual's life.

These include the idea of driverless cars that utilise real-time big data analytics.

Car journeys could become:

  • safer, due to the elimination of human error
  • faster, due to real-time big data traffic analysis
  • more efficient and economical, due to big data route and traffic analysis, and even optimisation for the most economical driving style

Another positive implication of big data for individuals lies in healthcare, with the previously mentioned ability to considerably cut diagnosis time as well as costs. The use of dedicated healthcare smart wearable devices could mean early detection of health problems for individuals even before symptoms occur, which could potentially have life-saving implications. Healthcare is an area where big data can make an impact, which could include:

  • Extension of life
  • Early health problem detection
  • Diagnosis time reduced
  • Reduced costs of diagnosis
  • Ability to treat more patients due to time saved

While there are many more ways big data analytics can benefit an individual's life, there are also negative effects that I think we need to be aware of. Big data analytics brings more automation to almost every sector it is applied to, and that comes with the territory.

A first example of possible negative implications is something that is already happening: more and more companies are starting to use automated profiling of prospective employees. This could greatly reduce the chances of employment even for individuals with the right skills and expertise, solely because they are less skilled at formatting their resumes into the specific formats that an algorithm will be able to “read” or “understand”.

Similar automated profiling techniques can cause individuals not to be awarded a place at an educational establishment, or to be regarded as inadequate students during their time at university, solely due to the inner workings of an algorithm used to assess entry applications or academic progress.

In line with the above techniques and their implications, there may be a fear of loss of personal liberty, where choices are made for an individual solely by the inner workings of the algorithms that assess and analyse the data.

Negative implications include:

  • Profiling of job applicants
  • Profiling of students
  • Profiling of credit applicants
  • Loss of personal liberty
  • Loss of privacy

Implications of big data for society

The implications of big data analytics for society can overlap with the previously mentioned implications for individuals.

Main positive implications include:

  • Great improvement in healthcare

With the great potential of reducing diagnosis time, and effectively its cost, there can be less drain on healthcare services. This could enable faster and more effective appointment systems, and patients with rare or more complicated conditions could be seen by specialists much sooner.

Extension of life could also be possible due to faster diagnosis, and an increased rate of early detection of health problems could lead to lives saved.

The use of big data analytics in healthcare can also have a positive economic effect on society through the costs saved in diagnosis.

  • Near instant court decisions for crime sentencing

While controversial, another arguably positive implication of the use of big data analytics is near-instant court decisions for crime sentencing, based on hard facts and eliminating factors like bias or likeability.

Cutting the time, and effectively the costs, of court proceedings could have a positive economic impact for the benefit of society.

This example is widely debated and controversial, and I will also present an opposing view in the negative implications of that particular use of big data analytics.

Main negative implications include:

  • Potential loss of jobs

The most discussed and debated implications of big data analytics (or technological progress in general) for society have always been job loss and the replacement of manual labour with automated systems or robots.

While not that long ago this may have sounded like a chapter from a William Gibson science fiction book, it poses real-life implications today for jobs in certain industries, affecting teachers, nurses and all kinds of manual labour jobs.

The topic of job loss due to the advancement of technology is not new. It was widely debated during each major technological advancement, and there are always two sides to the coin.

Some argue that technology is not at fault here, and that the problem lies in how our current societies and governmental systems are structured.

A quote from New York congressional representative Alexandria Ocasio-Cortez shines some light on a side of the debate that is usually left out:

“We should not be haunted by the specter of being automated out of work. We should be excited by that. But the reason we’re not excited by it is because we live in a society where if you don’t have a job, you are left to die. And that is, at its core, our problem.”


Alexandria Ocasio-Cortez (2019)

  • Human rights implications – mass surveillance

To power data-driven services, from driverless cars and traffic automation to life-saving medical wearable devices, a massive amount of data needs to be collected on a daily basis. To enable the scenarios and technologies debated above to benefit individuals and society, more data-gathering devices and technologies need to be deployed.

This comes at cost to personal privacy and can have human rights implications.

A notable case is that of Edward Snowden, a former Central Intelligence Agency employee who, with the help of professional journalists, disclosed the use of mass surveillance technologies by government agencies on unsuspecting citizens.

With a large part of our lives shifting into the online space, sharing data and information on each one of us, this may be the unavoidable cost of the benefits we gain, or a scary scenario where personal privacy is abolished, heavily implicating human rights and affecting whole societies.

  • Near instant court decisions for crime sentencing – counterargument

There is also a counterargument to the use of big data analytics in court proceedings and criminal sentencing. The argument is that basing sentencing solely on data analytics does not take into account the human factor and, most importantly, the context, which often needs to be assessed on a case-by-case basis due to the frequently unique nature of each case and its circumstances.

Strategies for limiting the negative effects of big data

  • Creation of rules and laws
  • Enforcement
  • Opt Out – data sharing
  • Misinformation

With big data analytics, the privacy of individuals comes very close to being non-existent. With that said, and given the previously mentioned negative effects on society and individuals, there are many ways to limit or in some cases eliminate those negative effects.

In public and official sectors like councils or banking, where data must be accurate, the only way to limit the negative effects for an individual is the creation and enforcement of new rules and laws that would protect individuals from being exploited.

As individuals we also have a few limited choices. In many cases we can opt out of online data sharing, and most institutions are already obliged to clearly state how our personal data is handled.

With regard to the surveillance already mentioned and the protection of individual privacy, there are ways to use the same technologies to counteract it. The method has been known for hundreds of years: misinformation.

There could be a market among the privacy-conscious for applications which constantly generate and share randomly generated data (for example, search queries) from our devices. This in turn would hide our actual online presence in plain sight, effectively rendering big data's veracity useless every time.
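As a toy illustration of that idea, the sketch below (entirely hypothetical: the topic and modifier lists are invented) generates random decoy queries that such an application could interleave with real traffic:

```python
import random

# Hypothetical noise generator: emits plausible but random search queries
# so that real queries are hidden among decoys ("hiding in plain sight").
TOPICS = ["weather", "recipes", "football scores", "flight prices",
          "phone reviews", "history facts", "diy repairs", "gardening tips"]
MODIFIERS = ["best", "cheap", "near me", "2019", "how to", "top 10"]

def decoy_queries(n, seed=None):
    """Generate n random decoy search queries."""
    rng = random.Random(seed)
    return [f"{rng.choice(MODIFIERS)} {rng.choice(TOPICS)}" for _ in range(n)]

queries = decoy_queries(5, seed=42)
print(queries)
```

A real tool would also need to mimic realistic timing and phrasing, otherwise the decoys would be easy to filter out.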

Applications of Big Data

In this post I will summarise applications of big data in business, science and society as well as possible future applications of big data.

I will also summarise the technological requirements of big data and how it is gathered, stored, processed and visualised.

Applications of big data in business

Big data helps businesses and organisations use data to their advantage and turn it towards the identification of new opportunities.

This improves business decision making, makes operations more efficient, turns higher profits and makes customers happier with the product or service.

Big data and systems like cloud-based analytics or Hadoop make it possible to reduce costs for organisations. This mostly applies to the costs of storing large amounts of data.

With the efficiency and speed of systems like Hadoop, better and faster decision making is now possible. This is due to the ability to analyse new sources of data using in-memory analytics. Organisations can now analyse information much faster, or even immediately, which in turn allows them to make business decisions based on what they have learned much sooner.

The ability to measure customer needs and satisfaction with analytics gives organisations the means to give customers what they want.

More and more businesses are able to create new products that come from big data analytics.

In the media and entertainment industry, which revolves around user-generated data from various social media platforms, businesses have been able to leverage that data to predict the interests of their audience, gaining insights into customer reviews and behavioural patterns.

Spotify, an on-demand music streaming service, uses big data to serve targeted music recommendations to individual users.

A similar use of big data is employed at Amazon Prime, which offers e-books, video streaming as well as music streaming.
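As a rough illustration of how such recommendations can work, the sketch below (with invented play counts; real services use far richer models and data) finds the listener with the most similar taste using cosine similarity:

```python
import math

# Toy recommendation sketch: rows are users, columns are play counts
# for four hypothetical tracks. Similar users get similar suggestions.
plays = {
    "alice": [10, 0, 8, 1],
    "bob":   [9, 1, 7, 0],
    "carol": [0, 12, 1, 9],
}

def cosine(u, v):
    """Cosine similarity between two vectors (1.0 = identical direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def most_similar_user(target):
    others = {name: vec for name, vec in plays.items() if name != target}
    return max(others, key=lambda name: cosine(plays[target], others[name]))

print(most_similar_user("alice"))  # → bob
```

Tracks that the most similar user plays heavily, but the target user has not heard, become candidate recommendations.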

Applications of big data in science

There are vast benefits to the application of big data in healthcare.

Healthcare analytics has great potential in areas like improving quality of life and predicting potential outbreaks of epidemics, but also in significantly reducing the costs of treatment and the time to diagnosis.

The application of big data in healthcare means that doctors can now make more informed decisions, as they have access to a wider range of data.

There are many healthcare data types created in hospitals and clinics. Those include:

  • Clinical data: doctors' notes, prescriptions, reports from medical imaging, laboratory data, pharmacy data
  • Machine generated data: generated from monitoring vital signs, emergency care data, medical journals
  • Patient data that includes electronic patient records

Another example of the use of big data in healthcare is predictive analytics, which uses data like pre-existing conditions and habit patterns to foresee how vulnerable an individual is to cancer, in turn allowing for early treatment.
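A minimal sketch of such a predictive risk score, with entirely invented weights and attributes (not clinical values), combines patient flags through a logistic function, mimicking how a trained model would flag at-risk patients:

```python
import math

# Illustrative only: weights and bias are made up, not medical guidance.
WEIGHTS = {"smoker": 1.2, "family_history": 0.9, "age_over_50": 0.7}
BIAS = -2.5

def risk_score(patient):
    """patient: dict of attribute -> 0/1 flags; returns a value in (0, 1)."""
    z = BIAS + sum(WEIGHTS[k] * patient.get(k, 0) for k in WEIGHTS)
    return 1 / (1 + math.exp(-z))  # logistic function squashes z into (0, 1)

low = risk_score({"smoker": 0, "family_history": 0, "age_over_50": 0})
high = risk_score({"smoker": 1, "family_history": 1, "age_over_50": 1})
print(round(low, 3), round(high, 3))  # → 0.076 0.574
```

In a real system the weights would be learned from large historical patient data sets rather than chosen by hand.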

There are many more applications of big data in healthcare and those include:

  • Tracking Patient Vitals

It is much easier to monitor patients in emergency rooms who are plugged into monitoring devices, because any change in their patterns can be quickly flagged to doctors in hospitals that often do not have enough staff.
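The alerting logic can be sketched as a simple range check over incoming readings; the thresholds below are assumed for illustration, not medical reference values:

```python
# Minimal sketch of automated vital-sign monitoring: readings are checked
# against normal ranges and out-of-range values raise alerts for staff.
NORMAL_RANGES = {
    "heart_rate": (50, 110),      # beats per minute (illustrative)
    "temperature": (35.0, 38.0),  # degrees Celsius (illustrative)
    "breathing_rate": (10, 24),   # breaths per minute (illustrative)
}

def check_vitals(reading):
    """Return a list of alerts for values outside their normal range."""
    alerts = []
    for vital, (low, high) in NORMAL_RANGES.items():
        value = reading.get(vital)
        if value is not None and not (low <= value <= high):
            alerts.append(f"{vital} out of range: {value}")
    return alerts

print(check_vitals({"heart_rate": 128, "temperature": 36.6, "breathing_rate": 16}))
# → ['heart_rate out of range: 128']
```

Real systems would use trends and patient-specific baselines rather than fixed thresholds, but the principle of machine-generated alerts is the same.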

Other uses of big data usage in healthcare include:

  • Improved hospital administration
  • Fraud prevention and detection

Applications of big data in society

With big data already having an impact on the business world and healthcare, does it have any application in society?

The short answer is yes, but let's find some examples.

The main use of big data that has an impact on society is the transformation of cities into smart cities. Using a variety of sensors, tracking the movement of public transport and analysing the data that is generated enables cities to be more efficient and to provide citizens with better and more personalised services, while cutting down unnecessary costs and greatly reducing the waste of resources.

City transportation systems that utilise big data analytics can be greatly beneficial.

Data streams gathered from a variety of sensors, on-vehicle devices and smart traffic lights can be processed in real time, communicating traffic information to drivers directly on their smartphones.

Traffic patterns can be recognised by analysing real-time data, and road congestion can be reduced by predicting traffic and adjusting traffic controls accordingly.

Source: Forbes.com
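The congestion-detection idea above can be sketched as a sliding-window average over speeds reported by road sensors; the window size and speed threshold are assumed values for illustration:

```python
from collections import deque

# Sketch of real-time congestion detection: keep the last few speed
# readings from a road sensor; a sustained drop below the threshold
# flags congestion so traffic controls could be adjusted.
class CongestionDetector:
    def __init__(self, window=5, threshold_kmh=30):
        self.speeds = deque(maxlen=window)  # sliding window of readings
        self.threshold = threshold_kmh

    def add_reading(self, speed_kmh):
        """Record a reading; return True when the window average signals congestion."""
        self.speeds.append(speed_kmh)
        avg = sum(self.speeds) / len(self.speeds)
        return avg < self.threshold

detector = CongestionDetector()
readings = [55, 50, 20, 15, 10, 12]
alerts = [detector.add_reading(s) for s in readings]
print(alerts)  # → [False, False, False, False, False, True]
```

Using a window average rather than single readings avoids reacting to one slow vehicle; a city-scale system would run this per road segment over a stream of sensor events.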

Future applications of big data

  • Smart Cities

There are many possible uses of big data in the future. The previously mentioned evolving smart cities could turn into fully automated or even semi-autonomous cities, providing the public with the benefits of automated and personalised transportation systems.

There is much speculation on the topics of ethics and security, but from a purely technological point of view, the potential of big-data-driven transformation of cities into smart cities is huge.

Smart transportation is one of the areas that can transform how we commute.

This includes recognition of traffic patterns in real time, solving road congestion problems, utilising smart traffic light systems and even driverless cars that rely on real-time big data analytics.

  • Smart Environment

Environmental data gathered from a large number of sensors and then processed can provide weather information that will lead to improved agriculture, better energy management and informing the public about hazardous conditions.

  • Healthcare

Healthcare is another area where the future use of big data analytics can have life-changing effects. We all know the current technology we use on a daily basis: our smartphones, smart watches and sport or fitness trackers. The ability to gather data in real time from all kinds of smart wearable devices could save your life one day.

I had the pleasure of attending TEDx Glasgow, where Dr Jack Kreindler talked about the future of healthcare, data and technology.

Dr Jack presenting a Bio Sensor to the audience

In the above picture Dr Jack Kreindler holds a bio-sensor which, when attached to a patient's body, is able to measure and transmit heart rate, ECG, body temperature, posture (standing, sitting, falling), stress and breathing rate.

To do this 10 years ago (for Formula 1 drivers) cost about $10,000.

Today the cost of such a device is about $1 a day.

With devices like this in use by many people, at $1 a day (including all of the machine intelligence to compute the data), it is believed that the crisis healthcare is facing with chronic diseases can be greatly reduced.

Among many different tests, the results have shown that such a device, combined with big data analytics and cloud computing, can alert patients to medical problems even before symptoms occur.

Technological requirements of Big Data

I have already mentioned some of the technological requirements that are closely linked to the characteristics of big data, especially its volume.

  • Storage

As mentioned previously, traditional storage solutions are not well suited to big data due to its volume and the large variety of data types.

A number of organisations already have in-house data storage capabilities. Those exploring the option of using big data analytics may need to re-evaluate and opt for storage types that are more efficient for big data analytics, such as cloud computing, or the use of flash storage, which may be better suited due to its performance advantages.

The largest companies, such as Facebook, build their own infrastructure for the best efficiency and performance. This may consist of clusters of servers with direct-attached storage, using tools like Hadoop, often with PCIe flash-based storage for improved latency.

  • Processing

Processing big data and performing analytics operations requires powerful computing resources.

While companies like Google can afford to build custom-made infrastructure of server clusters with powerful processing capabilities, smaller companies may opt for cost-effective cloud computing (renting) that can be used on demand, saving them the large investment that would come from the purchase, set-up and maintenance of their own big data analytics server infrastructure. Smaller companies are also able to use technologies like the previously mentioned Hadoop.

  • Gathering data

There are many sources of big data. Most companies collect data about their customers and business operations, cities and councils collect data from sensors, and big stores collect in-store customer traffic data.

Depending on the type of organisation the type of data may vary, but all of those who collect big data have one thing in common: the storage of the gathered data. This is closely tied to the storage requirements mentioned above and will depend on the type and size of the organisation.

There are many technological choices, including an organisation's own big data analytics server infrastructure, cloud computing, or distributed storage systems that store data gathered from customers interacting with the company's websites, shops and services.

Data is also gathered from loyalty cards, social media websites and apps, satellites and so on.

  • Big data Visualisation

Visualisation can be thought of as another characteristic of big data, and currently it is one of the most challenging aspects that data scientists face.

Enormous amounts of data, even in a processed and analysed state, often do not make a very useful insight for a human being. They may not be easily comprehended and acted on.

While traditional graphs did a good job for traditional data analysis in the past, with big data there are simply too many data points (billions) to plot, which usually fails due to in-memory limitations and poor scalability. When it comes to big data, scalability is a must.

Solutions to those problems are constantly being developed. Some include the use of data clustering, tree maps, parallel coordinates or circular network diagrams.
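One of those workarounds, reducing a huge number of raw points to aggregated cells before plotting, can be sketched as simple grid binning (a coarse form of clustering):

```python
# Sketch of down-sampling for visualisation: map raw (x, y) points to
# grid cells and plot only the per-cell counts, so billions of points
# never need to be held in memory by the plotting tool at once.
def bin_points(points, cell_size=1.0):
    """Count points per grid cell of the given size."""
    bins = {}
    for x, y in points:
        cell = (int(x // cell_size), int(y // cell_size))
        bins[cell] = bins.get(cell, 0) + 1
    return bins

points = [(0.1, 0.2), (0.4, 0.9), (1.5, 0.3), (1.7, 0.1)]
print(bin_points(points))  # → {(0, 0): 2, (1, 0): 2}
```

The cell counts can then be rendered as a heat map or tree map; in a distributed setting the binning itself can be done in parallel across the data.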

Example of tree map visualisation

Defining Big Data

The term big data is loosely defined as extremely large sets of data that may be computationally analysed to reveal trends, associations and patterns, especially when it comes to human interactions and behaviour.

This is just one of many definitions of big data, because the definition is still fluid and the concept of big data is continually evolving.

Data itself isn’t a new concept. Even before computers and databases we had paper records: customer records, bank transaction records and so on. All of which are still data.

The emergence of computers, spreadsheets and then databases made it possible to store and organise this data more efficiently and on a much larger scale.

Before we dive into big data, let's get familiar with the basic definitions of data, information and knowledge.

Data can be words, dates, sounds, images, numbers and so on, without context. Data is a raw form and without context does not hold much value.

Information is a collection of data that is put into context that gives the data a meaning. It is data that has been processed and is organised.

Knowledge comes from understanding information: the ability to make predictions and form judgements and opinions based on an understanding of that information.

Some also define big data with three or even five Vs, these being:

  • Volume – The size of the data
  • Velocity – Defines the speed at which new data is generated, and the speed at which new data moves around
  • Variety – Defines that data is of many different types
  • Veracity – Refers to quality of data, its accuracy
  • Value – Big data by itself has no value. Value refers to the ability to turn data into value. According to IBM Big Data & Analytics Hub:
    “It is easy to fall into the buzz trap and embark on big data initiatives without a clear understanding of the business value it will bring.” (Marr, 2015)

Historical development of big data – technologies and techniques

To talk about the historical development of big data, it is easier to explain it in the context of business institutions and the term Business Intelligence, first referenced in 1958 by Hans Peter Luhn, an IBM researcher, in “A Business Intelligence System”, published in the IBM Journal of Research and Development in October 1958.

Later in the timeline, with the progress of the computerisation of business processes, the definition of the term evolved during the 1980s and became more established as describing the set of software systems that would aid the business decision-making process based on gathering and analysing data (facts).

Those systems mainly focused on descriptive analysis: taking aggregated historical data and cross-matching indicators to acquire a better view of what had happened and was happening in the organisation.

Because data was mainly used for analysis whose outcome produced some sort of value to the business (mostly in the form of intelligence that could be acted on), at the end of the 1980s the term Data Mining emerged. The term was used interchangeably with another term, Knowledge Discovery in Databases (KDD), and meant the extraction of data from data banks to form knowledge. This led to the first international conference on Knowledge Discovery and Data Mining in 1995.

While businesses had previously focused on descriptive analysis of data, predictive analysis began to spread among businesses during the 1990s, using machine learning techniques.

Those techniques were utilised to search for patterns in data. This type of analysis was first mostly applied in the banking and insurance industries. Early examples include the detection of insurance fraud and the process of approving or denying credit applications.

The concept of Data Science emerged from those kinds of mining applications. The term appeared in the early 2000s and was closely related to the field of statistics (Cleveland WS, 2001).

With data mining and the popularisation of the World Wide Web, companies started to face the problem of the dimensions of data.

It is worth noting that while early examples of storing and processing data (mainly focused on descriptive analysis) could rely on the capabilities of the conventional approach (including relational databases), the volume, velocity and variety of big data made this unfeasible. It was the 3Vs model that became a key aspect of strategies for the optimal management of data within a business context.

Google initially faced this problem when trying to solve the PageRank algorithm's efficiency problem when applied to the large volumes of data coming from the mining of websites (Page L, Brin S, Motwani R, 1998).

Instead of using high-performance machines with large amounts of processing power, they came up with another solution: they divided the large volumes of data to be processed among a set of machines using a distributed file system.

The software and the programming model were named MapReduce, a significant milestone that marks the origin of big data technologies (Dean J, Ghemawat S, 2004).

“Thanks to the use of this model, the work to be done by an application programmer is reduced to defining the details of two functions (“map” and “reduce”), which represent the two main steps in the processing of data.” (Niño, Mikel & Illarramendi, Arantza, 2015)


(Niño, Mikel & Illarramendi, Arantza, 2015)
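A minimal single-machine sketch of the model the quote describes, using word counting as the classic example (a real MapReduce framework distributes the map, shuffle and reduce steps across a cluster and handles fault tolerance):

```python
from collections import defaultdict

# The programmer supplies only map and reduce; everything else
# (distribution, shuffling, fault tolerance) is the framework's job
# and is simulated here by plain loops on one machine.
def map_fn(document):
    """Map step: emit (word, 1) for every word in a document."""
    for word in document.split():
        yield (word.lower(), 1)

def reduce_fn(word, counts):
    """Reduce step: combine all values emitted for one key."""
    return (word, sum(counts))

def run_mapreduce(documents):
    groups = defaultdict(list)          # the "shuffle" step: group by key
    for doc in documents:
        for key, value in map_fn(doc):
            groups[key].append(value)
    return dict(reduce_fn(k, v) for k, v in groups.items())

docs = ["big data big analytics", "data at scale"]
print(run_mapreduce(docs))  # → {'big': 2, 'data': 2, 'analytics': 1, 'at': 1, 'scale': 1}
```

Because each map call and each reduce call is independent, the same two functions can be run in parallel over thousands of machines, which is exactly what made the model scale.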

This project served as the foundation for what Doug Cutting developed while working for Yahoo: a system that implemented MapReduce, capable of processing the enormous amounts of data required by the search engine. This is how Apache Hadoop was born.

Apache Hadoop being open source facilitated the adoption of big data technologies.

From this point, starting in 2005 with Google Sawzall (a language for programming tasks when analysing big data structures on MapReduce), other technologies were built, including Apache Pig and Apache Hive, alongside the emergence of many NoSQL systems.


(Niño, Mikel & Illarramendi, Arantza, 2015)

Below is a table showing measurement of data:

With ever increasing storage and processing capabilities, the rate at which data is created in modern society is enormous. As much data is now created over two days as was created from the beginning of time until 2003.

“Global mobile data traffic grew 63 percent in 2016. Global mobile data traffic reached 7.2 exabytes per month at the end of 2016, up from 4.4 exabytes per month at the end of 2015.” (Cisco)

It is also claimed that about 90% of the data in the world was created in the last two years.

Image below shows data growth over time:

The basic reason for such growth is that more people have more tools to create and share information than ever before, therefore creating and generating new data. In the past, data (or information) was created by a few companies and everyone else was a consumer of that data and information. Nowadays all of us generate data on a daily basis, but all of us also consume data.

From a technical perspective, in addition to the tools that enable us to create or generate data, the costs of those tools, and of storing and processing data, are no longer prohibitive.

Going back to traditional data analysis, we can point out a few limitations of this approach.

As previously mentioned, the sheer size of modern data is a limitation in itself. It is not feasible to store such large volumes of data in traditional relational database systems.

Another limitation is that most of the data created comes in semi-structured or unstructured form, which relational databases are not suited for.

The velocity of modern data also creates a problem for a traditional RDBMS, which struggles with high velocity because it is designed for steady data retention rather than rapid growth. It would be very expensive to handle and store “big data” in an RDBMS.

While the previous examples have touched on some of the limitations of traditional data analysis in relation to modern data, in order to further expand on the topic we first need to dive into the basic definitions of traditional statistics, namely descriptive and inferential statistics.

Descriptive statistics involves computing values which summarise and describe a set of data, telling us about the features of the data. Typically this includes statistics like the mean, median, minimum, maximum, standard deviation and so on, which are called summary statistics.
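Python's standard library is enough to compute these summary statistics, for example:

```python
import statistics

# Descriptive statistics summarise a data set with a few values.
data = [12, 15, 11, 19, 15, 22, 14]

summary = {
    "mean": statistics.mean(data),
    "median": statistics.median(data),
    "min": min(data),
    "max": max(data),
    "stdev": round(statistics.stdev(data), 2),  # sample standard deviation
}
print(summary)
```

These values describe only the data at hand; they make no claims about any wider population, which is where inferential statistics comes in.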

Inferential statistics involves performing complex mathematical calculations. These allow scientists to draw conclusions about trends in a larger population, examine relationships between variables within a sample taken from the data, and make generalisations or predictions about how those variables relate to the larger population.

This method does not take each individual piece of data into account; usually a representative sample is taken, called a statistical sample. From the analysis, scientists are able to draw conclusions about the population that the sample came from.
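A small sketch of that idea: estimating a population mean from a sample and attaching a 95% confidence interval (normal approximation with z = 1.96 for brevity; a t-distribution would be more appropriate for samples this small):

```python
import math
import statistics

# Inferential sketch: infer a population mean from a small sample.
sample = [172, 168, 181, 175, 169, 178, 174, 171]

mean = statistics.mean(sample)
sem = statistics.stdev(sample) / math.sqrt(len(sample))  # standard error
low, high = mean - 1.96 * sem, mean + 1.96 * sem          # 95% CI bounds
print(f"estimated population mean: {mean:.1f} (95% CI {low:.1f} to {high:.1f})")
```

The interval expresses the uncertainty that comes from working with a sample instead of the whole population; larger samples shrink it.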

The main limitations of traditional data analysis in the context of large volumes of data such as big data come from the volume itself. As mentioned above, volume is a problem in terms of storage technology, which traditional methods are not well suited for.

Also, going back to relational database management systems, it is not feasible to store such vast amounts of data in them.

Three of the 5Vs of big data are good examples of the limitations of traditional data analysis in comparison to new technologies: Volume, Velocity and Variety, as defined above. All three present a great challenge in the context of traditional data analysis, while being managed far more efficiently by new approaches and technologies such as Hadoop or machine learning.

In the previous section we covered the characteristics of big data itself, in the form of the 5Vs. In this section we will look at the characteristics of big data analysis.

There are five main characteristics of big data analysis:

  • Programmatic

In most cases analysing big data involves custom software or procedures that may need to be programmed. Even when using open-source software to operate on big data, custom extensions may need to be added for the particular needs of the data analysis.

  • Data Driven

Having huge data sets enables data-driven analysis, as opposed to the hypothesis-driven approach where a scientist would develop a premise and collect data to verify whether that premise is correct. Machine learning algorithms allow this kind of analysis to be done hypothesis-free.

  • Attributes usage

In the past, analysts may have been dealing with hundreds of attributes or characteristics of a data source. With big data there may be gigabytes of those attributes or observations.

  • Iterative

Modern advancements in computing power make iteration easier and faster. Models can be trained on larger chunks of data and can be iterated as many times as required for analysts to be satisfied with the results. This has also been enabled by distributed computing networks and by leveraging cloud computing as a service, which in turn has made those resources readily available and more cost-effective even for smaller companies as well as individuals.
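The iterative character can be illustrated with the simplest possible trained model: fitting y = w * x by repeated small gradient-descent updates until the fit is good enough (the data and learning rate here are invented for illustration):

```python
# Iterative model fitting sketch: each loop pass is one training iteration.
xs = [1, 2, 3, 4]
ys = [2, 4, 6, 8]  # the true relationship is y = 2x

w = 0.0                 # start with a bad guess for the weight
learning_rate = 0.01
for step in range(200):
    # gradient of mean squared error with respect to w
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    w -= learning_rate * grad  # small correction each iteration

print(round(w, 3))  # → 2.0
```

Real model training works the same way at vastly larger scale: many passes over big data, with distributed or cloud hardware making each iteration affordable.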

  • Visualisation

As covered under the technological requirements above, visualisation is one of the most challenging aspects that data scientists face: with billions of data points, traditional plotting fails due to in-memory limitations and poor scalability, and solutions such as data clustering, tree maps, parallel coordinates and circular network diagrams are constantly being developed in response.

Value of data:

Value is one of the characteristics of big data. There is great value to be found in big data analysis, currently including understanding customers, targeting, and optimising business processes.

Big data combined with machine learning has huge value now, and it will only grow in the future. Currently machine learning helps to regulate traffic more efficiently, and saves huge amounts of money in the medical industry by cutting the time required to make a diagnosis.

The value of big data is currently on the rise, and for the right reasons: having enough data and incorporating machine learning technologies to analyse that data leads to ever more accurate predictions.

The main and most cited benefit that comes from big data analysis is making better strategic decisions.

Companies can improve their operational processes, reduce costs, or improve customer insights and experience.

With new technologies and growth in computing power, the value of future data is on the rise, because of the near-future possibility of tapping into amounts of data that are not currently accessible.

To give an example, the current value of Facebook is around £263 billion, while its revenue for the year 2017 was $40 billion. The difference reflects the future value of its data.

References:

Luhn HP. “A Business Intelligence System”. IBM Journal of Research and Development. October 1958. Vol.2-4. p.314-319

Marr, B. (2015). Why only one of the 5 Vs of big data really matters. [online] IBM Big Data & Analytics Hub. Available at: https://www.ibmbigdatahub.com/blog/why-only-one-5-vs-big-data-really-matters [Accessed 27 Mar. 2019].

Cleveland WS. “Data Science: an Action Plan for Expanding the Technical Areas of the Field of Statistics”. International Statistical Review. April 2001. Vol.69-1. P.21-26. DOI: http://dx.doi.org/10.2307/1403527

Page L, Brin S, Motwani R. “The PageRank Citation Ranking: Bringing Order to the Web”. Technical Report. January 1998. Stanford Digital Library Technologies Project

Dean J, Ghemawat S. “MapReduce: Simplified Data Processing on Large Clusters”. Proceedings of the 6th Symposium on Operating System Design and Implementation (OSDI’04). December 2004. p.137-150

Niño, Mikel & Illarramendi, Arantza. (2015). Understanding Big Data: antecedents, origin and later development. Dyna New Technologies. 2. 1-8. 10.6036/NT7835.