
Beyond Volume, Variety and Velocity is the Issue of Big Data Veracity

Big data incorporates all varieties of data, including structured data and unstructured data from e-mails, social media, text streams, and so on. Managing it requires companies to leverage both their structured and unstructured data. Big data enables organizations to store, manage, and manipulate vast amounts of disparate data at the right speed and at the right time. To gain the right insights, big data is typically broken down by three characteristics:

  • Volume: How much data
  • Velocity: How fast data is processed
  • Variety: The various types of data


Big data implies enormous volumes of data. It used to be that employees created data; now data is generated by machines, networks and human interaction on systems like social media, so the volume of data to be analyzed is massive. The sheer amount of information pouring across billions of network connections in a single day makes the ever-growing volume easy to appreciate. Much of the research on big data focuses on making this volume available for analysis in enterprises and business organizations. For instance, billions of transaction records in a retail chain can be analyzed for buying trends or for consumers' buying frequency on selected products, and trillions of fuel bills can be analyzed to inform the next vehicle fuel policy.


Variety refers to the many sources and types of data, both structured and unstructured. We used to store data from sources like spreadsheets and databases; now data arrives as emails, photos, videos, readings from monitoring devices, PDFs, audio and more. This variety of unstructured data creates problems for storing, mining and analyzing data. At the same time, the sheer range of data types is one aspect of big data analysis that promises benefits for organizations of every size, big and small. Big data comprises any type of data: audio, video, graphics, spreadsheets, log files, 3D images, plain text, click streams, almost anything. When these multifarious types of data are analyzed together, they can yield a great range of insights for particular research questions. For instance, the torrent of text messages around a football match can be set against a measurably small number of actual spectators, which may signal that the event's managers and organizers need to change their marketing and publicity tactics.


Big data velocity deals with the pace at which data flows in from sources like business processes, machines, networks and human interaction with things like social media sites and mobile devices. The flow of data is massive and continuous. This real-time data can help researchers and businesses make valuable decisions that deliver strategic competitive advantage and ROI, if you are able to handle the velocity. Big data has far bigger implications in time-sensitive business processes than elsewhere. Scrutinizing the maximum volume of data for a business objective such as catching transaction fraud, or pinpointing exactly why clients of a particular business are not coming back, is always a race against the clock. Only processing fast enough to handle large volumes of data in real time translates into business benefit. Faster processing of time-sensitive data gives you an edge in fault finding and in uncovering hidden loops in a process, and that is exactly one of the demands that makes big data increasingly crucial.


Big data veracity refers to the biases, noise and abnormality in data. Is the data being stored and mined actually meaningful to the problem being analyzed? Inderpal feels veracity is the biggest challenge in data analysis when compared to things like volume and velocity. In scoping out your big data strategy, you need your team and partners to help keep your data clean, with processes that stop 'dirty data' from accumulating in your systems. The accuracy, or trustworthiness, of information is one aspect that challenges the use of data in business analysis or trend analytics. Many business managers and top decision makers remain skeptical about the accuracy of analysis based on varied sources of data, yet when that body of data grows enormous enough to contain contradictory trends and aspects, it can itself become a basis for judging accuracy. As the volume grows, conclusions motivated by partial observation become futile, and exceptionally large data reserves, handled properly, can yield more accurate observations.

The Four V’s – Volume, Velocity, Variety, Veracity Infographic

Just in case you ever needed an infographic for the 4 V's of big data, IBM has one for you.

4 V's of big data: volume, velocity, variety, veracity

Microsoft: Connecting Kopparberg to growth



Kopparberg is a subsidiary of Kopparberg Brewery Sweden that operates in both Britain and Northern Ireland. Launched in 2006, it has been a tremendous success, becoming Britain's largest fruit cider brand in both the on-trade and the off-trade, with a staggering £120 million turnover per annum, helped greatly by Microsoft's suite of solutions. The Kopparberg team has an unusual structure for a business of its kind, with the majority of staff working remotely. The team spends most of its time visiting prospects and customers around Britain and Northern Ireland, so being able to link up with the main office and share information in real time is crucial to growth, and this was an area of the business that needed to be addressed. Through a Microsoft partner, "leaf", a number of Microsoft solutions were implemented within Kopparberg that significantly improved performance: the team became 15 percent more efficient, and lead times from prospect to client were significantly reduced.


Adopting "Lync" improved communication while removing the need for physical face-to-face meetings, which was very useful for the sales team when briefing new business and new team members. Kopparberg needed streamlined communications and secure back-up to ensure continuity of sales and order processing. The implementation of "OneDrive" ensured that relevant information could be shared across the team instantly and securely; management can set permissions on files and share them with selected team members as necessary, further heightening security. The previous method of e-mailing files created extra administration and, at times, confusion.



Microsoft Dynamics CRM, which was deployed not too long ago, has made a positive impact. It has become an invaluable part of the business, as management can now collate information from the sales team on the road and upload it to a central point. Kopparberg now has the information to really focus its marketing and distribution efforts: it can identify specific venues in certain areas of the country that do not stock a particular product, helping it pinpoint marketing and sales spend. Kopparberg has found the Microsoft process efficient, and it was easy to sell into the business as the benefits of integration were obvious from the outset. Thanks to the ease of use of "Lync", "OneDrive" and Office 365, Kopparberg can rapidly train and engage new staff, and minimise the impact when existing staff move away from the organisation. Kopparberg plans further improvements to its systems, including email tagging that links directly to the CRM, enabling it to see any pertinent customer or prospect trends. The company is acutely aware that it lives in a "Microsoft world", and chose to be there because it allows the team to be on the road, more productive and more successful.

Microsoft and the softly, softly approach? I don't think so!


Microsoft simply wants to make business smarter, and it is on a mission to simplify BI and analytics by making them accessible from the heart of a business. The last 15 years have seen Microsoft extend its reach in enterprise applications, expanding out from its best-selling Office suite to encompass ERP and CRM. BI and analytics are also part of the mix, but we must be very clear about the difference: BI is about dealing with facts, perhaps merging multiple data sets and connecting dots across the business; analytics is more scientific, predicting the less obvious, and is usually aligned with innovation and more agile decision-making.

What Microsoft has seen is a shift in trend where a core group of business people stand up and take ownership of data. They want to leverage it as an asset and bring technology to bear on it. Microsoft reckons that data cleansing, or "data wrangling" as it likes to call it, takes up to 50% of an organisation's analytics project time, which is why it is setting about making people more productive with data.

The first move was to make BI more accessible. By leveraging the power and familiarity of Excel at the front end and the ubiquity of SQL Server at the back, Microsoft has been able to develop self-service tools and procedures that mask complex models and visualisation techniques; something you could only do with specialised tools 15 years ago is now a simple click in Excel today. You can have three-dimensional models that live on your PC and don't require a ton of server horsepower. Microsoft is fully aware that this is only one part of the intelligence jigsaw and has been working hard to slot in the big data piece: bigger projects that leverage open-source super-compute frameworks like Hadoop can be accessed through Azure, Microsoft's cloud-based development platform.

“R”: Is It Really a Global Phenomenon?


Well, it was founded and created in New Zealand, so that's not a bad start. With "R", every data analysis technique is at your fingertips: it can draw upon virtually every data manipulation, statistical model, and chart that the modern data scientist could ever need. "R" can create beautiful and unique data visualizations that go far beyond the traditional bar chart and line plot; from the simplicity of variables, vectors and data frames to the stunning infographics of multi-panel charts, 3-D surfaces and more, "R" has the lot. These custom charting capabilities of "R" are featured worldwide across many different domains, e.g. The New York Times and The Economist.


"R" is a masterly, proficient and skilful tool that gets better results faster. It does not rely on point-and-click menus; it is designed expressly for data analysis. Intermediate-level "R" programmers create data analyses faster than users of legacy statistical software, with the added bonus that they can mix and match models for the best results. It should also be noted that "R" scripts are easily automated, promoting both reproducible research and production deployments.



"R" is without doubt a global community. It has more than 2 million users and developers who voluntarily contribute their time and technical expertise to maintain, support and extend the "R" language and its environment, tools and infrastructure. At the centre of the "R" community is the "R" core group of approximately twenty developers worldwide who maintain "R" and guide its evolution. The official public structure for the "R" community is provided by the "R" Foundation, a not-for-profit organization that ensures the financial stability of the "R" project and holds and administers the copyright of the "R" software and its documentation.




“Trickle, Trickle, Trickle” (“Simple”, “Fast”, “Cheap”) Part 1


Data analytics has trickled down from large corporates and is now readily available in the mainstream. No longer the sole preserve of large corporations, it is more accessible, more immediate and more affordable. When you think about it, computers were invented for analysing data, and "big data" has really been around for ages; it has just been rebranded. Organisations are swamped in data: a steady trickle from accounts and ERP packages swelled with the advent of email, e-commerce and the increasing use of CRM. Now, with new technologies able to scrape unstructured as well as structured data, and "big data" entering the lexicon of business language, we are waking up to a deluge. The challenge is where to start turning it all to business advantage. Firms will find different reasons to surface data for analytics; it is sector-specific and depends on where the company is coming from.


Take Accenture as an example: it is seeing a lot of focus on getting a better understanding of customers and customer behaviour. Companies are looking to leverage broad sets of data to get a holistic view that allows them to engage in a more personalised and targeted way. In manufacturing, Accenture sees a focus on operations, leveraging sensor-driven data to send out alerts on the imminent failure of a device. Financial institutions offer another good example with their focus on risk, identifying deviations in data that expose fraud sooner rather than later. The key challenge for companies is trying to corral their data and get it in order: identifying the sources of the data and whether they can trust it. Despite all the hype around big data, organisations in Ireland are still relatively immature and struggling with the fundamentals.

Accenture estimates that anywhere between 20 and 70 percent of a project is the data cleansing piece, making the data ready for analytics. Long-established processes such as ETL (extract, transform, load) are still used, but what has changed is the expectation of faster results. Making this possible are new technologies such as "Hadoop" that can crunch vast amounts of data quickly. If a bank wants to measure the impact of closing down a branch, for example, where you are not as concerned about the quality of the data but want a quick answer to a quick question, then Hadoop will do the job: it will be able to tell you the systemic impact of closing it down.


R (Programming Language): A Quick Introduction

"R" is a programming language and software environment for statistical computing and graphics. The "R" language is widely used among statisticians and data miners for developing statistical software and performing data analysis. Also called "GNU S", it is a strong functional language with a big emphasis on linear and non-linear modelling, classical statistical tests, time-series analysis, clustering and classification. The "R" language offers an open-source route to participation in statistical methodology. One of its great strengths lies in its ability to produce publication-quality plots, including mathematical symbols and formulae; another is that it is designed around a true computer language, so users can add functionality by defining new functions.
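As a tiny illustration of those "classical statistical tests", here is a sketch using R's built-in sleep dataset; the variable name result is my own choice for this example, not anything from the post.

```r
# Sketch: a classical statistical test in base R, using the built-in
# "sleep" dataset (extra hours of sleep under two drugs, two groups).
data(sleep)

# Welch two-sample t-test: does mean extra sleep differ between groups?
result <- t.test(extra ~ group, data = sleep)

print(result$p.value)   # the p-value of the test
print(result$estimate)  # the mean of each group
```

One line of code runs the whole test; the formula `extra ~ group` says "model extra sleep by group", the same formula idea discussed later in this post.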


Why use the “R” language?

Well, it is both flexible and powerful, and, really importantly, it is designed to operate the way that problems are thought about. "R" is not just a package, it is a language, and there is a difference: with a package you can perform a set number of tasks, often with some options that can be varied; a language allows you to specify the performance of new tasks. One of the goals of "R" is that the language should mirror the way that people think. Let's take a simple example: suppose we think that weight is a function of height and girth. The "R" formula to express this is "weight ~ height + girth". Very simply, the + is read not as "addition" but as "and".
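To make the formula concrete, here is a minimal sketch; the data frame and its numbers are made up purely for illustration, not real measurements.

```r
# Made-up illustration data: six observations of height, girth and weight.
obs <- data.frame(
  height = c(60, 65, 70, 72, 75, 78),
  girth  = c(20, 22, 25, 26, 28, 30),
  weight = c(150, 160, 178, 183, 192, 200)
)

# "weight ~ height + girth": model weight as a function of height AND
# girth -- the + joins predictors, it is not arithmetic addition.
model <- lm(weight ~ height + girth, data = obs)
print(coef(model))  # an intercept plus one coefficient per predictor
```

Notice that the formula reads almost exactly like the sentence "weight is a function of height and girth".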

Another feature of the "R" language is that it is vector-oriented, meaning that objects are generally treated as a whole, as humans tend to think of the situation, rather than as a collection of individual numbers. Suppose that we want to change heights from inches to centimetres. In "R" the command would be "height.cm <- 2.54 * height.inches", where height.inches is an object containing some number of heights, one or millions. "R" hides from the user that this is a series of multiplications and acts more like we think: whatever is in inches, multiply by 2.54 to get centimetres. Over the last decade "R" has become the most powerful and most widely used statistical software; it has without doubt established itself as the most popular language for data science and an essential tool for finance and analytics-driven companies such as Google, Facebook, and LinkedIn. We might explore a little bit more about "R" in my next post, and in the meantime we might ask Larry!
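As a parting sketch, the inches-to-centimetres idea runs exactly as described; the example heights here are made up.

```r
# Vector orientation: one expression converts every height at once,
# with no explicit loop -- R applies the multiplication element-wise.
height.inches <- c(60, 64, 68, 72)
height.cm     <- 2.54 * height.inches
print(height.cm)  # 152.40 162.56 172.72 182.88
```

Whether height.inches holds four values or four million, the conversion is the same single line.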