Organizations rely on different data sources to capture information for making smart business decisions. A lot of the information gathered is for compliance purposes. Many organizations have discovered that they lack not only the right policies to capture the data but also a robust technology infrastructure to manage and understand it.
Over the past couple of decades, data has grown in volume and variety, which has forced organizations to finally address the issue of dark data. This excess data not only increases storage costs but also remains unutilized.
What is Dark Data?
Contrary to what the name suggests, there is nothing dark or scary about dark data. Organizations collect vast amounts of data to support their decisions, but most of it is never used for making a business decision. This unutilized data is known as dark data.
Where does Dark Data come from and what are its types?
Dark data can be found in an organization’s log files (including website logs), data archives, emails, etc. Data is much like an iceberg: the visible part is the data being utilized, whereas the submerged, invisible part is the dark data.
Dark data is usually categorized into two types. Let’s try to understand each type with an example.
Type 1: Take chat messages or customer emails. The content of a message can turn into dark data if the organization doesn’t extract its meaning in a form that data analysis tools can analyze.
Type 2: The metadata that comes with a chat message or customer email, such as the time it was sent, sender name, receiver name, device used to send it, location, and attachments (if any), becomes dark data when the email or message is archived.
Data of both types resides in databases but is not used to derive any insights. It is stored so that it can be retrieved in the future if required.
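As a rough illustration of the two types, the sketch below uses Python's standard email module to pull both the content (Type 1) and the metadata (Type 2) out of a raw message, so that neither stays buried in an archive. The sample message, the addresses and the extract_record helper are hypothetical and exist only for this example.

```python
from email import message_from_string
from email.utils import parsedate_to_datetime

# Hypothetical archived customer email (illustrative data only).
RAW_EMAIL = """\
From: customer@example.com
To: support@example.com
Subject: Order arrived late
Date: Mon, 06 Apr 2020 09:15:00 +0000

My order arrived three days late and the box was damaged.
"""


def extract_record(raw: str) -> dict:
    """Turn a raw email into a flat record that analysis tools can query."""
    msg = message_from_string(raw)
    return {
        # Type 2: metadata that usually goes dark once the email is archived
        "sender": msg["From"],
        "receiver": msg["To"],
        "sent_at": parsedate_to_datetime(msg["Date"]),
        "subject": msg["Subject"],
        # Type 1: the content itself, ready for further text analysis
        "body": msg.get_payload().strip(),
    }


record = extract_record(RAW_EMAIL)
print(record["sender"], record["sent_at"].isoformat())
```

Once messages are flattened into records like this, they can be loaded into a table and queried alongside other business data.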
Real-Life Example
One of the restaurants of a famous food chain wanted to identify the reason behind its decreasing footfall. Any restaurant will typically try to collect feedback on the quality of food, the quantity of food, pricing, presentation, taste, ambiance, service, etc.
There is a good chance that the primary reason behind the decreasing footfall is limited or no parking. The restaurant always had information about its parking situation but never used it to identify the problem. This kind of data, which organizations have available but never consider looking at for any query, is referred to as “dark data”.
Is Dark Data found only in unstructured data?
Dark data can exist in both structured and unstructured data. Unstructured data often becomes dark data because organizations don’t know how to analyze it to get insights. However, structured data can also be dark data: when data is stored in a structured format in a database but the organization does not use it to obtain insights, that stored data becomes dark data.
Problems Associated with Dark Data
Organizations often capture much more data than they are capable of processing. Most of the captured data stays in the dark because most organizations do not have the tools and capabilities required to process it efficiently.
“According to IDC, organizations fail to analyze 90% of the unstructured data.”
Most organizations don’t have access to tools that can manage and utilize all the captured data. It has been observed that most organizations want to capture as much data as they can but don’t have enough resources to analyze all of it. Organizations are looking for tools that can look inside their data and reveal insights that give them a business advantage.
How can we leverage Dark Data?
The drawbacks of storing dark data often outweigh the benefits. The lack of data security associated with dark data can even lead to cyber-attacks, non-compliance issues, etc.
The best way to tackle dark data is to utilize it well. It may not be easy for most organizations to utilize all the captured data, as doing so requires a considerable investment of both time and money. Still, there are some ways in which organizations can reduce or use most of their dark data:
Organizations should regularly audit their databases and eliminate data points that are not useful to them. This will eventually save a lot of space.
Organizations should try to keep their data in a structured format.
Even if the business decides to dump dark data, it should keep the data encrypted and stored securely.
Organizations should label their unstructured data so that it is easy to find in the future for analysis.
Organizations should have their data retention and data disposal policies in place so that data can be retained and disposed of with ease.
Organizations should use Artificial Intelligence tools, as they can make documents discoverable through search. AI can crawl through the data to understand and classify it automatically.
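The audit and retention points above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the record layout, the last_accessed field and the 365-day window are all hypothetical, and a real retention policy would also take legal and compliance requirements into account.

```python
from datetime import date, timedelta


def split_by_retention(records, today, retention_days):
    """Separate records still in use from those due for review or disposal."""
    cutoff = today - timedelta(days=retention_days)
    keep = [r for r in records if r["last_accessed"] >= cutoff]
    review = [r for r in records if r["last_accessed"] < cutoff]
    return keep, review


# Hypothetical inventory of stored data sets (illustrative data only).
inventory = [
    {"name": "web_logs_2018", "last_accessed": date(2018, 12, 31)},
    {"name": "crm_contacts", "last_accessed": date(2020, 3, 15)},
    {"name": "chat_archive", "last_accessed": date(2019, 1, 10)},
]

keep, review = split_by_retention(inventory, today=date(2020, 4, 1), retention_days=365)
print([r["name"] for r in review])  # data sets untouched for over a year
```

The review list is a starting point for a human decision on whether to analyze, archive securely, or dispose of each data set.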
Conclusion
Dark data represents unused opportunities that organizations are unable to exploit because of investment and technology constraints. Dealing with dark data requires a costly investment, but the outcome is worth it. If organizations opt to sit on their dark data and do nothing about it, they could eventually be exposed to several risks, such as cyber-attacks. The key is to do something about the dark data rather than leave it unused.
About Girikon
Girikon is a reputed name in the IT service space with a focus on Salesforce consulting and Salesforce implementation services. Besides being a Salesforce Gold partner, the company has multiple accreditations to its credit.
Are you dealing with duplicate data?
Does your data contain records that are not an exact match?
Are the duplicates in your data too inconsistent for exact matching?
Are you struggling to cleanse different types of data duplicates?
If you answered yes to most or all of these questions, then the solution to your problem is fuzzy matching. Fuzzy matching allows you to deal with these problems easily and efficiently.
What is Data Matching?
Data matching is the process of discovering records that refer to the same entity. When records come from multiple data sets and do not share a common key identifier, data matching techniques can be used to link them; the same techniques can also detect duplicate records within a single dataset.
We perform the following steps:
Standardize the dataset
Pick unique and standard attributes
Break the dataset into similar-sized blocks
Match records and assign weights to the matches
Add it all up to get a total weight
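The steps above can be sketched as follows. This is a simplified illustration, not a production matcher: the field names, the integer weights, the threshold and the first-letter blocking key are all assumptions made for the example, and real blocking schemes aim for more evenly sized blocks.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical per-attribute weights; integers avoid rounding surprises.
WEIGHTS = {"name": 5, "email": 3, "city": 2}


def standardize(record):
    """Step 1: lowercase every value and collapse extra whitespace."""
    return {k: " ".join(str(v).lower().split()) for k, v in record.items()}


def block_key(record):
    """Step 3: a crude blocking key (first letter of the standardized name)."""
    return record["name"][:1]


def total_weight(a, b):
    """Steps 4 and 5: add up the weights of the attributes that agree."""
    return sum(w for field, w in WEIGHTS.items() if a[field] == b[field])


def find_matches(records, threshold=5):
    blocks = defaultdict(list)
    for idx, rec in enumerate(records):
        std = standardize(rec)
        blocks[block_key(std)].append((idx, std))
    matches = []
    # Only records inside the same block are compared with each other.
    for members in blocks.values():
        for (i, a), (j, b) in combinations(members, 2):
            score = total_weight(a, b)
            if score >= threshold:
                matches.append((i, j, score))
    return matches


# Illustrative data only.
records = [
    {"name": "John Smith", "email": "j.smith@example.com", "city": "Austin"},
    {"name": "  john  SMITH ", "email": "J.Smith@example.com", "city": "Dallas"},
    {"name": "Jane Doe", "email": "jane@example.com", "city": "Austin"},
]

print(find_matches(records))  # [(0, 1, 8)]
```

Records 0 and 1 agree on name and email after standardization, so their total weight of 8 clears the threshold, while the shared city alone (weight 2) is not enough to pair record 2 with either of them.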
What is Fuzzy matching?
Fuzzy matching allows you to identify non-exact matches in your dataset. It is the foundation of many search engine frameworks, and it helps you get relevant search results even if your query contains a typo or a different verb tense.
There are many algorithms that can be used for fuzzy searching on text, but virtually all search engine frameworks (including bleve) primarily use the Levenshtein Distance for fuzzy string matching:
Levenshtein Distance: Also known as Edit Distance, it is the minimum number of single-character edits (deletions, insertions, or substitutions) required to transform a source string into the target one. For example, if the source is “back” and the target term is “book”, you need to change “a” to “o” and “c” to “o”, which gives a Levenshtein Distance of 2.
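For readers who want to see the calculation, here is a minimal sketch of the Levenshtein Distance using the standard dynamic-programming approach. It is written for clarity rather than speed, and it is not taken from any particular search engine framework.

```python
def levenshtein(source: str, target: str) -> int:
    """Minimum number of insertions, deletions and substitutions."""
    # previous[j] holds the distance between the processed prefix of
    # source and the first j characters of target.
    previous = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        current = [i]
        for j, t_char in enumerate(target, start=1):
            cost = 0 if s_char == t_char else 1
            current.append(min(
                previous[j] + 1,         # delete s_char
                current[j - 1] + 1,      # insert t_char
                previous[j - 1] + cost,  # substitute (or keep if equal)
            ))
        previous = current
    return previous[-1]


print(levenshtein("back", "book"))  # 2
```

The function keeps only two rows of the distance table at a time, so memory use grows with the length of the target string rather than with the product of both lengths.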
Additionally, some frameworks also support the Damerau-Levenshtein distance:
Damerau-Levenshtein distance: It is an extension to Levenshtein Distance, allowing one extra operation: Transposition of two adjacent characters:
Ex: TSAR to STAR
Damerau-Levenshtein distance = 1 (switching the positions of S and T costs only one operation)
Levenshtein distance = 2 (replace T with S and S with T)
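The transposition rule can be added to the same dynamic-programming table. The sketch below implements the restricted variant, often called optimal string alignment, which is a common simplification of the full Damerau-Levenshtein distance; the two can differ on some inputs, and which variant a given framework uses is not something this example claims to know.

```python
def osa_distance(source: str, target: str) -> int:
    """Edit distance that also counts an adjacent swap as one operation."""
    rows, cols = len(source) + 1, len(target) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete every character of the source prefix
    for j in range(cols):
        d[0][j] = j  # insert every character of the target prefix
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if source[i - 1] == target[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution
            )
            # Transposition of two adjacent characters
            if (i > 1 and j > 1
                    and source[i - 1] == target[j - 2]
                    and source[i - 2] == target[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


print(osa_distance("TSAR", "STAR"))  # 1
```

For strings with no adjacent swaps, such as “back” and “book”, the result is the same as the plain Levenshtein Distance.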
How to Use Fuzzy Matching in Talend?
Step 1: Create an Excel file “Sample Data” with 2 columns, “Demo Event 1” and “Demo Event 2”.
Demo Event 1: This column contains the records on which we need to apply Fuzzy Logic.
Demo Event 2: This column contains the records that need to be compared with Column 1 for the fuzzy match.
Step 2: In Talend, use the above Excel file as input in the tFileInputExcel component, and provide the same file again as input to a second tFileInputExcel component, as shown in the diagram.
Step 3: In the tFuzzyMatch component, choose the configurations shown in the diagram below.
Step 4: In the tMap component, select the following columns for the output:
Demo_Events_1
MATCHING
VALUE
Step 5: Finally, select a tFileOutputExcel component for the desired output.
In the final extracted file, the “VALUE” column shows the difference between each record and its closest match, which lets you pair records with their duplicates.
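To make the shape of that output easier to picture, the sketch below imitates the idea in plain Python. It is not Talend's implementation and does not reproduce the component's settings: the sample values, the max_distance parameter and the choice to leave unmatched rows empty are assumptions for illustration. Only the output column names are borrowed from the steps above.

```python
def levenshtein(a: str, b: str) -> int:
    """Plain Levenshtein Distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(previous[j] + 1,
                               current[j - 1] + 1,
                               previous[j - 1] + (ca != cb)))
        previous = current
    return previous[-1]


def fuzzy_match(main_values, reference_values, max_distance=2):
    """For each main value, report the closest reference value and its distance."""
    rows = []
    for value in main_values:
        best = min(reference_values,
                   key=lambda ref: levenshtein(value.lower(), ref.lower()))
        distance = levenshtein(value.lower(), best.lower())
        matched = distance <= max_distance
        rows.append({
            "Demo_Events_1": value,
            "MATCHING": best if matched else None,
            "VALUE": distance if matched else None,
        })
    return rows


# Hypothetical contents of the two Excel columns (illustrative data only).
demo_event_1 = ["Salesforce Summit", "Data Day", "Cloud Expo"]
demo_event_2 = ["Salesforce Sumit", "Data Days", "Tech Meetup"]

rows = fuzzy_match(demo_event_1, demo_event_2)
for row in rows:
    print(row)
```

In this sample, the first two records pair with near-duplicates at a distance of 1, while the third has no reference value within the allowed distance and is left unmatched.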
Conclusion:
In a nutshell, Talend’s fuzzy matching helps ensure the quality of any source data against a reference data source by identifying and removing duplicates created by inconsistent data. This technique is also useful for complex data matching and duplicate analysis.
Data Visualization Using Tableau
April 22, 2020
Saurav Sindhwani
Data visualization is the act of taking data and placing it into a visual context, such as a map or graph, to bring out the information it contains. Visualization also makes it easier to detect patterns, trends, and outliers in groups of data, which can define the next strategy for a business.
Good visualizations should extract information from complicated datasets so that the information is clear and concise. The question, then, is which tool to use for a better understanding of data. The answer is Tableau, because:
Tableau has more flexible deployment options compared to other visualization tools.
Along with on-premises deployment, Tableau also supports cloud services.
Tableau connects to many different data sources and can visualize larger data sets than any other BI tool can.
Its built-in AI gives it more power than other tools: you only need to drag and drop the data, and the Tableau engine will display the most suitable visualization, which you can then change and customize as needed. Customization is also better than in other BI tools, as visualizations can be formatted down to the slightest detail.
Tableau is very good at creating processes and calculations. For example, when creating calculations in Tableau, a formula can be typed once, stored as a field, and applied to everything that references that source. This makes it easier to create and apply recurring processes. Tableau’s flexibility allows users to create custom formulas that can be applied to a filter or a field.
Data storytelling is another unique and easy-to-use feature that sets Tableau apart from other tools.
Another major advantage is that, apart from real-time data, Tableau also allows the use of extracts for fast retrieval and display of data, which can be refreshed as the user needs.
About Girikon:
Girikon is a reputed provider of end-to-end IT services including Salesforce consulting, implementation and Salesforce support. Their commitment to excellence has made them a preferred choice among their customers.
Girikon boasts of a strong Data Management Practice when it comes to handling CRM data. With a strong team of Data Architects, Data Specialists, ETL Experts and Data Stewards, we have successfully walked hand-in-hand with our clients in helping them define & implement custom-fitted Data Management Strategies. We aim at CRM deployments that are high performing, scalable and adhering to security protocols, data privacy and third-party compliances that matter in your industry like Payment Card Industry (PCI), the Health Insurance Portability and Accountability Act (HIPAA) etc.
We have extensive experience in Extraction, Transformation and Loading (ETL) of data from a wide variety of sources, including legacy applications, ERP systems, CRMs and other web content, standard relational databases, NoSQL databases (MongoDB), on-premise/cloud-based applications, files (e.g. XML, Excel, CSV, flat files) and web service APIs.
Our Enterprise Data Integration skills extend to cutting-edge ETL tools including Talend, Informatica etc. and support the delivery of reliable data integration solutions to our clients across the globe.
Girikon’s Data Services include:
Data Integration Services using Talend/Informatica – Girikon’s team of experts provide scalable data integration and data quality solutions for integrating, cleansing and profiling of all kinds of corporate data using Talend & Informatica.
Master Data Management Services – Our MDM services include consolidation of data across various businesses in an enterprise using Talend or otherwise. We help create a single “version of the truth” for our customers.
Application Integration Services – Using Mulesoft, we specialize in providing a common set of application integration tools to build a service-oriented architecture, to connect and manage services in real-time.
Data Preparation Services – These services include manipulation of data into a form suitable for further discovery, visualization, processing and enrichment.
Data Migration from various orgs in Salesforce – We have successfully completed several enterprise-wide business consolidation projects for our customers. Along with the Salesforce system development to meet the required business needs, we have gained thorough experience in migration of the related Data to enable synchronized business.
Data Migration from different CRMs like Sugar, MS Dynamics to Salesforce – In addition to inter-org migration of Data within the Salesforce environment, we are also adept in migrating data from other CRMs like Sugar, MS Dynamics etc. to Salesforce.
Data Stewardship Services – Of late, we have seen a surge in demand for Data Stewardship services requiring resources to be responsible for the maintenance and quality of data required throughout the organization. By scaling up to serve the needs of our existing accounts in these areas, we have now developed a dedicated team of Data Stewards who are ready to become custodians of your organization’s data in a way that facilitates your growth.
End-to-End ETL (Extract, Transform and Load) Services – While, as mentioned above, we can take up activities in parts if that is the business need, what we truly excel at is end-to-end ETL processes. Using Talend, Informatica etc., we would love to help you eliminate the silos in your business, bring in data from multiple sources and load it to Salesforce for a consolidated view, resulting in good, well-analyzed decision making.
How Girikon, as a Salesforce Consulting Partner Helps
Organizations looking for any kind of Data Services in relation to Salesforce can reach out to Girikon for assistance with consulting, design, execution and training in the above-mentioned areas and rest assured of quality deliverables.
ETL – Extract data from multiple, varied sources, transform it as required to meet the need and then Load data to Salesforce (Tools – Talend, Informatica etc.)
Data preparation (or data preprocessing) – We can help prepare and deliver clean, usable data for use as per business requirements
Data Stewardship – Girikon’s Data Stewards possess the required expertise & experience to be responsible custodians of your business data’s quality & maintenance
CRM Master Data Management & Data Migration – We are experts in making your data much more usable in a very cost-efficient manner. We have our own in-house de-duplication and data merging application, which makes the process much simpler than it would be otherwise.
Any specific Training & Support – Girikon’s Data consultants have extensive exposure to and experience of varied systems, situations & solutions, and would love to share some of that knowledge with your teams to bring in various perspectives. Additionally, we can also help with training on specific processes and tools.