Data Warehousing
- Data Warehousing is the process of collecting, organizing and managing data from different data sources to provide meaningful business insights and forecasts to respective users.
- It integrates data and information collected from various sources into one comprehensive database that can be analyzed to make more informed decisions.
- The concept of data warehouses first came into use in the 1980s when IBM researchers Paul Murphy and Barry Devlin developed the business data warehouse.
- American computer scientist Bill Inmon is considered the "father" of the data warehouse due to his authorship of several works on the subject.
Characteristics of Data Warehousing:-
- Subject Oriented
- Integrated
- Historical Data
- Non-Volatile
- Summarised
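The characteristics above can be illustrated with a minimal sketch. Everything in it is hypothetical: the two source systems (`crm_rows`, `shop_rows`), their field names, and the `Warehouse` class are made up for illustration and do not describe any real product. It only shows the ideas of integrating differently shaped sources into one layout, keeping a load date for history, and never updating stored rows.

```python
from datetime import date

# Two hypothetical source systems that store the same kind of fact differently.
crm_rows = [{"cust": "C1", "amount_usd": "120.50"}]
shop_rows = [{"customer_id": "C2", "total": 80.0}]


def to_common_schema(row, source):
    """Map a source row onto one shared layout (the 'integrated' property)."""
    if source == "crm":
        return {"customer_id": row["cust"], "amount": float(row["amount_usd"])}
    if source == "shop":
        return {"customer_id": row["customer_id"], "amount": float(row["total"])}
    raise ValueError(f"unknown source: {source}")


class Warehouse:
    """Append-only store: rows are added with a load date and never updated."""

    def __init__(self):
        self._rows = []

    def load(self, rows, source, load_date):
        for row in rows:
            record = to_common_schema(row, source)
            record["source"] = source
            record["load_date"] = load_date  # keeps history over time
            self._rows.append(record)

    def rows(self):
        # Return copies so callers cannot modify stored data (non-volatile).
        return tuple(dict(r) for r in self._rows)

    def total_by_customer(self):
        # A simple summary over the integrated data.
        totals = {}
        for r in self._rows:
            totals[r["customer_id"]] = totals.get(r["customer_id"], 0.0) + r["amount"]
        return totals


warehouse = Warehouse()
warehouse.load(crm_rows, "crm", date(2024, 1, 1))
warehouse.load(shop_rows, "shop", date(2024, 1, 2))
totals = warehouse.total_by_customer()
print(totals)
```

A real warehouse would sit on a database and use a proper loading pipeline; the in-memory list here is only an assumption to keep the sketch short.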
Data Mining:-
- The mining of knowledge from large amounts of data.
- The process of sorting through large data sets to identify patterns and relationships that can help solve business problems through data analysis.
- Data Mining techniques and tools enable enterprises to predict future trends and make more-informed business decisions.
- Extracts useful patterns and relationships from databases.
Q1. Define data set.
· A data set is a collection of data that is used for analysis and modeling.
· The dataset can contain one or more variables or attributes that describe the characteristics or properties of the observations.
· The dataset is a crucial component of the data mining process, as the quality and quantity of the data can have a significant impact on the accuracy, validity, and usefulness of the results.
Q2. Define an attribute.
· An attribute is a property or characteristic of an object, for example a person's hair colour or the air humidity.
· A set of attributes defines an object. An object is also referred to as a record, an instance, or an entity.
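As a rough illustration of the terms in Q1 and Q2, the sketch below models a data set as a list of records, where each record is one object and each key is an attribute. The names and values are invented for the example, and the two helper functions are hypothetical conveniences, not part of any library.

```python
# A tiny, made-up data set: each dict is one object (record, instance, entity),
# and each key is an attribute describing that object.
dataset = [
    {"name": "Asha", "hair_colour": "black", "age": 34, "height_cm": 162.5},
    {"name": "Ben", "hair_colour": "brown", "age": 27, "height_cm": 180.2},
    {"name": "Chen", "hair_colour": "black", "age": 41, "height_cm": 171.0},
]


def attribute_names(records):
    """Return every attribute name that appears in the records, in first-seen order."""
    names = []
    for record in records:
        for key in record:
            if key not in names:
                names.append(key)
    return names


def column(records, attribute):
    """Return the values of one attribute across all records."""
    return [record[attribute] for record in records]


print(len(dataset))                    # number of objects in the data set
print(attribute_names(dataset))        # the attributes that describe each object
print(column(dataset, "hair_colour"))  # one attribute across all objects
```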
Q3. What is the difference between a discrete attribute and a continuous attribute?
· In data analysis, a variable can be classified as either discrete or continuous. Discrete attributes are those that can take on only a finite or countably infinite number of values. Continuous attributes, on the other hand, can take on any value within a certain range.
Here are some key differences between discrete and continuous attributes:
· Values: Discrete attributes take on a finite or countably infinite set of distinct values (often integers or category labels), whereas continuous attributes can take on infinitely many values within a range.
· Nature: Discrete attributes are often categorical in nature and represent distinct categories or classes. Continuous attributes, on the other hand, are usually quantitative and represent measurements along a continuous scale.
· Measurability: Discrete attributes are easy to measure since they have distinct, well-defined values. Continuous attributes, however, can be more challenging to measure since they can take on any value within a range, and may require more precise instruments.
· Representation: Discrete attributes are often represented using bar charts or pie charts, while continuous attributes are usually displayed using histograms or line charts.
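The representation point can be sketched in code: a discrete attribute is naturally summarised by counting each distinct value (the data behind a bar chart), while a continuous attribute is summarised by counting values per interval (the data behind a histogram). The sample values and the helper names `frequency_table` and `histogram_bins` are assumptions made for this example; equal-width binning is only one of several possible binning choices.

```python
from collections import Counter


def frequency_table(values):
    """Count how often each distinct value occurs (suits a discrete attribute)."""
    return dict(Counter(values))


def histogram_bins(values, n_bins):
    """Count values in n_bins equal-width intervals (suits a continuous attribute)."""
    low, high = min(values), max(values)
    width = (high - low) / n_bins
    counts = [0] * n_bins
    if width == 0:
        # All values are identical, so they all fall into the first bin.
        counts[0] = len(values)
        return counts
    for v in values:
        # The maximum value is placed in the last bin rather than a new one.
        index = min(int((v - low) / width), n_bins - 1)
        counts[index] += 1
    return counts


hair_colour = ["black", "brown", "black", "red", "brown", "black"]  # discrete
height_cm = [150.0, 155.5, 161.2, 168.9, 172.4, 180.0]              # continuous

colour_counts = frequency_table(hair_colour)
height_counts = histogram_bins(height_cm, 3)
print(colour_counts)
print(height_counts)
```

Counting distinct values of `height_cm` would give six categories with one observation each, which is why continuous attributes are grouped into intervals first.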
Q4. Define data mining (give as many definitions as you can).
· Data mining refers to the process of extracting hidden, previously unknown, and potentially useful information from large datasets.
· There are several definitions of data mining; here are a few of them:
· Data mining is the process of discovering meaningful patterns, correlations, and trends by sifting through large amounts of data using advanced algorithms and statistical techniques. (IBM)
· Data mining is the process of extracting valid, novel, potentially useful, and ultimately understandable patterns from data. (Fayyad, Piatetsky-Shapiro, & Smyth, 1996)
· Data mining is the process of finding correlations or patterns among dozens of fields in large relational databases. (Microsoft)
· Data mining is a process that uses statistical, mathematical, artificial intelligence, and machine learning techniques to uncover hidden patterns and relationships in data that can be used to predict future behavior. (SAS)
· Data mining is the computational process of discovering patterns in large data sets involving methods at the intersection of artificial intelligence, machine learning, statistics, and database systems. (Wikipedia)
· Data mining is the process of identifying useful information from a large dataset, where the goal is to extract interesting and non-trivial patterns from the data. (University of California, Riverside)
Overall, data mining involves a combination of techniques from several disciplines, including statistics, machine learning, artificial intelligence, and database systems, to analyze large datasets and uncover hidden insights that can be used to make better decisions or predictions.
Q5. What is a pattern?
· A pattern is a set of features or attributes that describes a specific behavior, trend, or relationship within a dataset. Patterns can be represented in many forms such as rules, associations, sequences, clusters, and classification models.
· Pattern mining is the process of discovering interesting, useful, and previously unknown patterns from large datasets. It involves analyzing data to identify frequent patterns, correlations, and trends. These patterns can be used to make predictions, inform decision-making, and improve business processes.
· For example, a pattern in customer purchase behavior may be that customers who buy product A are likely to also buy product B. This pattern can be used to make recommendations to customers and increase sales. In healthcare, patterns in patient data can be used to identify risk factors for certain diseases and inform treatment decisions.
Overall, pattern mining is an important technique in data mining that helps to uncover hidden insights and knowledge from large datasets.
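The "customers who buy product A are likely to also buy product B" example can be made concrete with two standard association-rule measures, support and confidence. The sketch below is illustrative only: the five transactions are invented, and the function names are hypothetical helpers, not a library API.

```python
def count_containing(transactions, items):
    """Number of transactions that contain every item in `items`."""
    items = set(items)
    return sum(1 for t in transactions if items <= set(t))


def support(transactions, items):
    """Fraction of all transactions that contain every item in `items`."""
    return count_containing(transactions, items) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Of the transactions containing the antecedent, the fraction that also contain the consequent."""
    base = count_containing(transactions, antecedent)
    if base == 0:
        return 0.0
    both = count_containing(transactions, set(antecedent) | set(consequent))
    return both / base


# Made-up purchase data: each set is one customer's basket.
transactions = [
    {"A", "B"},
    {"A", "B", "C"},
    {"A", "C"},
    {"B", "C"},
    {"A", "B"},
]

rule_support = support(transactions, {"A", "B"})         # 3 of 5 baskets
rule_confidence = confidence(transactions, {"A"}, {"B"})  # 3 of the 4 baskets with A
print(rule_support, rule_confidence)  # 0.6 0.75
```

In this toy data, 75% of the baskets that contain A also contain B. A pattern mining algorithm would search over many candidate item sets and keep only the rules whose support and confidence exceed chosen thresholds.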