Big Data



In this article, I will be going big on Big Data and asking just how big it is.

I'll be looking at where Big Data comes from.

Should we be worried about it?

What mysteries are hidden within the seemingly endless amounts of information collected about us as we go about our daily lives?

You may assume big data does not really affect you.

You may choose to spend very little time on the internet. You do not use social media or watch online media. You make only one or two online purchases a week, and you only started doing that because of the COVID-19 lockdown.

Yet you have noticed that when you shop online, ads appear suggesting a purchase, and the offers resemble things you have bought in the past.

In the past, data gathering meant someone stopping you in the street and asking you questions.

Things have certainly moved on. 

Every time you search online, switch on your phone, or use Alexa, Google Assistant, Siri, your Oyster card, shop loyalty cards, your credit cards, or your smart TV, you give away information about yourself.

What is this information used for?

Big data is an amalgamation of different data sets.

Let us assume you are a research scientist. Your specialty is the progression of multiple sclerosis.

You seek patient postcodes, weather reports, patient symptoms, brain scans, and records of disease progression. You automatically collect many other dimensions of data in real time.

Using artificial intelligence, you can process the data to project forward in time and make predictions. Your intention is to learn how the disease might progress, and possibly to slow or cure it.

In some ways, big data is not new.

The population census has long told us a lot about individuals and about the population as a whole. We all contributed to the data set.

What has changed is the type and volume of data.

For example, your mobile phone collects data on where you are and where you have been. Everything you type into a browser is recorded. You may have a smart watch monitoring your heart rate and transmitting the data to your mobile phone.

You may have a smart home, where you can switch your lights and heating on remotely. You may have home security that lets you monitor your home remotely. All these devices collect data.

What is different today is the range of data collected. 

The data can tell someone your gender, your sexual orientation, whether you have had an abortion, whether you have had problems conceiving, and much more.

A statistician worked out that if a woman buys unscented body lotion and vitamins and stops buying alcohol, she may be pregnant and in her second trimester. This information indicates when she is likely to give birth. With this knowledge, a retailer knows when to begin targeting ads and when to send coupons for baby products.

The issue is at what point this becomes an infringement of civil liberties. For example, a father was outraged that his daughter had been sent coupons for baby products. She had not told him she was pregnant, so he assumed she was not, and he complained. The company apologised. The father later found out that his daughter was indeed pregnant.

How does an algorithm work out whether a woman is pregnant or whether you need a new mattress?

Through loose associations between the things you search for.

For example, searches for back pain and insomnia suggest you may be interested in a new mattress.
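To make the idea of loose associations concrete, here is a minimal sketch in Python. It assumes a simple lookup table that links search terms to products with a weight, and adds the weights up. The terms, products, and weights are all invented for illustration; real ad systems are far more elaborate and are not described in this article.

```python
# Hypothetical association table: search term -> {product: weight}.
# Every term, product, and weight here is made up for illustration.
ASSOCIATIONS = {
    "back pain": {"mattress": 0.6, "office chair": 0.4},
    "insomnia": {"mattress": 0.5, "herbal tea": 0.3},
    "stiff neck": {"pillow": 0.7, "mattress": 0.2},
}


def score_interests(searches):
    """Add up the association weights for every search term seen."""
    scores = {}
    for term in searches:
        for product, weight in ASSOCIATIONS.get(term.lower(), {}).items():
            scores[product] = scores.get(product, 0.0) + weight
    return scores


def best_guess(searches):
    """Return the highest-scoring product, or None if nothing matched."""
    scores = score_interests(searches)
    if not scores:
        return None
    return max(scores, key=scores.get)


# Two weak signals that both point at the same product.
print(best_guess(["back pain", "insomnia"]))
```

Neither search on its own says much, but because both lean towards the same product, the combined score for a mattress comes out on top.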

Some may find it frightening that so much knowledge can be obtained.

However, it’s not infallible. I am sure you have seen ad profiles that guess your gender and your age range and get both wrong. The reason is that predicting human behaviour is not black and white. Humans are irrational beings.

Here is an example. In 2018, Hawaii mistakenly sent out an alert warning of an incoming missile strike, at a time of tension with North Korea. Pornhub collects data on its audience. When the first text message went out warning of an imminent missile strike, Pornhub’s audience in Hawaii dropped by about eighty percent. When a second message, sent roughly 38 minutes later, said there was no missile threat, audience figures jumped to about fifty percent above normal.

You can predict what a demographic group or population will do.

Predicting what an individual will do is far more problematic.

It is simply not possible to predict what an individual will do with absolute certainty.
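The difference between predicting a group and predicting an individual can be shown with a small simulation. This is only a sketch under strong assumptions: every person in a made-up population independently does something (say, makes a purchase) with the same fixed probability. The population size, the probability, and the function name `simulate` are all hypothetical.

```python
import random


def simulate(population=10_000, p_buy=0.3, seed=42):
    """Simulate one day in which each person independently buys with probability p_buy."""
    rng = random.Random(seed)
    bought = [rng.random() < p_buy for _ in range(population)]

    # Group-level prediction: the expected number of buyers.
    predicted_total = population * p_buy
    actual_total = sum(bought)
    group_error = abs(actual_total - predicted_total) / predicted_total

    # Individual-level prediction: the best single guess for each person
    # is the more likely outcome (here "will not buy", since p_buy < 0.5).
    guess = p_buy >= 0.5
    wrong = sum(1 for b in bought if b != guess)
    individual_error = wrong / population

    return group_error, individual_error


group_error, individual_error = simulate()
print(f"Error in predicted group total: {group_error:.1%}")
print(f"Share of individuals guessed wrongly: {individual_error:.1%}")
```

Under these assumptions the prediction for the whole group lands close to the real total, while the best available guess about any one person is still wrong for roughly three people in ten.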

Algorithms are being used to analyse people’s pasts to predict whether they will go on to commit a crime. Increasingly, algorithms are used to help determine whether someone should be given bail or, following a trial, a prison sentence.

Durham police undertook research into these algorithms. They discovered that your postcode helped determine whether you were given bail or sent to prison following a trial.

The mathematician Dr Hannah Fry has said, ‘Artificial Intelligence (AI) is a revolution in computational statistics, not a revolution in intelligence.’ She went on to suggest that human intent cannot be predicted.

But you can predict how people will behave in a crowd: how passengers will behave, how people use the transport system, when and where they are going, when and where the pinch points will occur, and where to redirect passengers.

The same techniques can be applied to how resources are deployed in law enforcement and health care.

The problem with defining individuals as a group

A person lives in an area of high unemployment. An assumption is made that the person is likely to be unemployed.

The government wants to nudge people to live healthier lives, so it redesigns the system to force people to walk more, without consulting them.

Is this an infringement of civil liberties?

Questions must be asked

What can we do with big data? 

Who has oversight of the data? 

Should government control or regulate data, or both?

China requires citizens to have ID cards. Face recognition is widespread in China.

One Chinese city used facial recognition for public toilets. When a person uses the toilet, nine pieces of toilet paper are automatically dispensed.

If that person returns within nine minutes, access to all the toilet paper dispensers is locked.

The reason: a high rate of toilet paper theft.
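The rule itself is simple enough to sketch in a few lines of Python. This is an illustration only: the real system is not described here, so the class name, the `face_id` label standing in for a recognised face, and the use of seconds for time are all assumptions.

```python
COOLDOWN_SECONDS = 9 * 60  # nine minutes, as described above


class Dispenser:
    """Remembers when each recognised face was last given paper."""

    def __init__(self, cooldown=COOLDOWN_SECONDS):
        self.cooldown = cooldown
        self.last_served = {}  # face_id -> time of last dispense, in seconds

    def request(self, face_id, now):
        """Return True and record the time if paper is dispensed, else False."""
        last = self.last_served.get(face_id)
        if last is not None and now - last < self.cooldown:
            return False  # same person came back too soon
        self.last_served[face_id] = now
        return True


d = Dispenser()
print(d.request("visitor-1", now=0))    # first visit: True
print(d.request("visitor-1", now=300))  # back after five minutes: False
print(d.request("visitor-1", now=600))  # back after ten minutes: True
```

Notice that even this tiny rule only works if the system stores who used the toilet and when, which is exactly the kind of data collection the questions above are about.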

The question needs to be raised.

Should data collection be defined? 

Why, what, when, where, by whom, and how is the data to be collected, stored, and used?

The problem with data regulation

Big data offers huge opportunities. If restrictive controls are introduced, we lose the chance to identify patterns in data.

You may ask, why is that a loss?

There is no way of knowing what patterns might exist before the data is interrogated.

Think of it this way. No one suspects that eating too many apples will lead to heart problems, yet analysing the data may indicate a correlation. It is a silly example; nevertheless, some breakthroughs are made this way. One example is Alexander Fleming’s serendipitous discovery of penicillin.

On the downside, imagine an individual who requires medical treatment. Analysis of their purchases indicates they have eaten pizza for thirty years. They are told their eating habits disqualify them from treatment.

If we are not to lose the opportunities serendipity presents, people must have trust in data governance. Individuals must have confidence in:

  • How data is collected.
  • How data is stored and used.
  • How an individual’s data can be identified and used.

Maybe individuals should be given a choice?

Interestingly, some people are attending crypto parties.

At such gatherings, people are taught how to reclaim their privacy: for example, how to run an operating system from a USB stick, how to hide what they search for using that USB stick and the dark net, how to change the settings on a mobile phone to increase privacy, and how to use encryption and VPNs.

It’s a civil liberties minefield. Nevertheless, huge opportunities are there for the taking if safeguards are put in place.

What do you think?

About the author 

Christopher Bird

Building your own Power App, BI solution, or automated workflow can be a mind-blowing experience. It can also be a nightmare. Particularly when you begin with a blank screen. My advice, get professional help as and when you need it. That's what successful people do.
