Data Mining and Data Warehousing

Data Warehousing

Data warehousing is the process of collecting, organizing, and managing data from different data sources to provide meaningful business insights and forecasts to the relevant users.
It integrates data and information collected from various sources into one comprehensive database that can be analyzed to make more informed decisions.
The concept of the data warehouse first came into use in the 1980s, when IBM researchers Paul Murphy and Barry Devlin developed the business data warehouse.
American computer scientist Bill Inmon is considered the "father" of the data warehouse because of the several works he has written on the subject.

Characteristics of Data Warehousing:-

Subject Oriented
Integrated
Time-Variant (Historical Data)
Non-Volatile
Summarised
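
The characteristics above can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the two source systems (`crm_rows`, `shop_rows`), their field names, and the helper functions are hypothetical, and a real warehouse would use a database rather than a Python list.

```python
from datetime import date

# Hypothetical source systems that store the same kind of fact
# under different field names and units.
crm_rows = [{"cust_id": 1, "sales_usd": 120.0}]
shop_rows = [{"customer": 1, "amount_cents": 4550}]


def integrate(crm, shop, load_date):
    """Map both sources onto one shared schema (integrated) and
    stamp every row with its load date (historical data)."""
    unified = []
    for row in crm:
        unified.append({
            "customer_id": row["cust_id"],
            "amount_usd": row["sales_usd"],
            "source": "crm",
            "load_date": load_date,
        })
    for row in shop:
        unified.append({
            "customer_id": row["customer"],
            "amount_usd": row["amount_cents"] / 100,
            "source": "shop",
            "load_date": load_date,
        })
    return unified


def total_by_customer(rows):
    """Summarise the detailed rows into one total per customer."""
    totals = {}
    for row in rows:
        key = row["customer_id"]
        totals[key] = totals.get(key, 0.0) + row["amount_usd"]
    return totals


# Non-volatile: rows are only appended, never updated in place.
warehouse = []
warehouse.extend(integrate(crm_rows, shop_rows, date(2024, 1, 31)))

summary = total_by_customer(warehouse)
print(summary)
```

The sketch is organized around a single subject (customer sales), which is one way to read "subject oriented".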

Data Mining:-

Data mining is the mining of knowledge from large amounts of data.
It is the process of sorting through large data sets to identify patterns and relationships that can help solve business problems through data analysis.
Data mining techniques and tools enable enterprises to predict future trends and make better-informed business decisions.
It extracts useful patterns and relationships from databases.

Q1. Define data set.

· A data set is a collection of data, organized as a set of observations (records), that is used for analysis and modeling.

· The dataset can contain one or more variables or attributes that describe the characteristics or properties of the observations.

· The dataset is a crucial component of the data mining process, as the quality and quantity of the data can have a significant impact on the accuracy, validity, and usefulness of the results.

Q2. Define an attribute.

· An attribute is a property or characteristic of an object, for example a person’s hair colour or the humidity of the air.

· A set of attributes defines an object. An object is also referred to as a record, an instance, or an entity.
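
As a minimal sketch of how a data set, its records, and its attributes relate, the example below stores a tiny data set as a list of dictionaries. The names and values are made up for illustration only.

```python
# Each dictionary is one object (record); each key is one attribute.
dataset = [
    {"name": "Asha", "hair_colour": "black", "age": 31},
    {"name": "Ben", "hair_colour": "brown", "age": 45},
    {"name": "Chen", "hair_colour": "black", "age": 28},
]

# The attribute set that defines every object in this data set.
attributes = list(dataset[0].keys())

# All observed values of a single attribute across the records.
hair_values = [record["hair_colour"] for record in dataset]

print(attributes)
print(hair_values)
```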

Q3. What is the difference between a discrete attribute and a continuous attribute?

· In data analysis, a variable can be classified as either discrete or continuous. Discrete attributes are those that can take on only a finite or countable number of values. Continuous attributes, on the other hand, can take on any value within a certain range.

Here are some key differences between discrete and continuous attributes:

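· Values: Discrete attributes take on a finite or countably infinite number of distinct values (often integers or category labels), whereas continuous attributes can take on any real value within a range, so there are infinitely many possible values between any two points.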

· Nature: Discrete attributes are often categorical in nature and represent distinct categories or classes. Continuous attributes, on the other hand, are usually quantitative and represent measurements along a continuous scale.

· Measurability: Discrete attributes are easy to measure since they have distinct, well-defined values. Continuous attributes, however, can be more challenging to measure since they can take on any value within a range, and may require more precise instruments.

· Representation: Discrete attributes are often represented using bar charts or pie charts, while continuous attributes are usually displayed using histograms or line charts.
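
The distinction can be sketched in code with a rough heuristic. This is only an assumption for illustration: the hypothetical function `attribute_kind` treats any column containing floating-point values as continuous and everything else (integers, strings, booleans) as discrete. Real data often needs domain knowledge, since an integer column may record a rounded measurement.

```python
def attribute_kind(values):
    """Heuristic: a column with float values is treated as continuous;
    any other column is treated as discrete."""
    if any(isinstance(v, float) for v in values):
        return "continuous"
    return "discrete"


hair_colour = ["black", "brown", "black"]   # categories
children = [2, 0, 3]                         # counts
humidity = [61.5, 48.2, 70.0]                # measurements on a scale

print(attribute_kind(hair_colour))
print(attribute_kind(children))
print(attribute_kind(humidity))
```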

Q4. Define data mining (give as many definitions as you can).

· Data mining refers to the process of extracting hidden, previously unknown, and potentially useful information from large datasets.

· Several definitions of data mining are in use. Here are a few of them:

· Data mining is the process of discovering meaningful patterns, correlations, and trends by sifting through large amounts of data using advanced algorithms and statistical techniques. (IBM)

· Data mining is the process of extracting valid, novel, potentially useful, and ultimately understandable patterns from data. (Fayyad, Piatetsky-Shapiro, & Smyth, 1996)

· Data mining is the process of finding correlations or patterns among dozens of fields in large relational databases. (Microsoft)

· Data mining is a process that uses statistical, mathematical, artificial intelligence, and machine learning techniques to uncover hidden patterns and relationships in data that can be used to predict future behavior. (SAS)

· Data mining is the computational process of discovering patterns in large data sets involving methods at the intersection of artificial intelligence, machine learning, statistics, and database systems. (Wikipedia)

· Data mining is the process of identifying useful information from a large dataset, where the goal is to extract interesting and non-trivial patterns from the data. (University of California, Riverside)

Overall, data mining involves a combination of techniques from several disciplines, including statistics, machine learning, artificial intelligence, and database systems, to analyze large datasets and uncover hidden insights that can be used to make better decisions or predictions.

Q5. What is a pattern?

· A pattern is a set of features or attributes that describes a specific behavior, trend, or relationship within a dataset. Patterns can be represented in many forms such as rules, associations, sequences, clusters, and classification models.

· Pattern mining is the process of discovering interesting, useful, and previously unknown patterns from large datasets. It involves analyzing data to identify frequent patterns, correlations, and trends. These patterns can be used to make predictions, inform decision-making, and improve business processes.

· For example, a pattern in customer purchase behavior may be that customers who buy product A are likely to also buy product B. This pattern can be used to make recommendations to customers and increase sales. In healthcare, patterns in patient data can be used to identify risk factors for certain diseases and inform treatment decisions.

Overall, pattern mining is an important technique in data mining that helps to uncover hidden insights and knowledge from large datasets.
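
The purchase example in Q5 (customers who buy product A are likely to also buy product B) can be made concrete with two common measures for such rules, support and confidence. The sketch below is a minimal illustration: the baskets are invented and the helper names `count`, `support`, and `confidence` are hypothetical, not taken from any library.

```python
# Each set is one customer's basket (made-up data).
transactions = [
    {"A", "B"},
    {"A", "B", "C"},
    {"A"},
    {"B", "C"},
    {"A", "B"},
]


def count(itemset, baskets):
    """Number of baskets that contain every item in itemset."""
    return sum(1 for basket in baskets if itemset <= basket)


def support(itemset, baskets):
    """Fraction of all baskets that contain the itemset."""
    return count(itemset, baskets) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Of the baskets containing the antecedent, the fraction
    that also contain the consequent."""
    both = count(antecedent | consequent, baskets)
    return both / count(antecedent, baskets)


rule_support = support({"A", "B"}, transactions)
rule_confidence = confidence({"A"}, {"B"}, transactions)
print(rule_support, rule_confidence)
```

Here 3 of the 5 baskets contain both A and B, and 3 of the 4 baskets that contain A also contain B, so the rule "A implies B" has support 0.6 and confidence 0.75 on this toy data.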
