Meaning of Classification of Data
- It is the process of arranging data into homogeneous (similar) groups according to their common characteristics.
- Raw data cannot be easily understood, and it is not fit for further analysis and interpretation. Arrangement of data helps users in comparison and analysis.
- For example, the population of a town can be grouped according to sex, age, marital status, etc.
Classification of data
The method of arranging data into homogeneous classes according to the common features present in the data is known as classification.
A planned data classification system makes essential data easy to find and retrieve. This can be of particular interest for legal discovery, risk management, and compliance. Written procedures and guidelines for data classification should define the categories and criteria the company will use to organise data, and the roles of employees within the business regarding data stewardship.
Once a data-classification scheme has been designed, the security standards that stipulate proper handling practices for each category and the storage criteria that determine the data’s lifecycle requirements should be addressed.
Objectives of Data Classification
The primary objectives of data classification are:
- To consolidate the volume of data in such a way that similarities and differences can be quickly understood. Figures can consequently be ordered in sections with common traits.
- To aid comparison.
- To highlight the important characteristics of the data at a glance.
- To give prominence to the important data collected while separating out the optional elements.
- To allow statistical treatment of the material gathered.
|Definition of classification given by Professor Secrist||“Classification is the process of arranging data into sequences according to their common characteristics or separating them into different related parts.”|
|Q.- What is meant by a variable? Explain its two kinds.|
|(a) Meaning of variable||● The term variable is derived from the word ‘vary’ that means to differ or change. Hence, variable means the characteristic that varies, differs, or changes from person to person, time to time, place to place, etc.
● A variable refers to a quantity or attribute whose value varies from one investigation to another.
|(b) Kinds of variables:|
|(I) Discrete variables||● Variables that can take only exact values and no fractional values are termed discrete variables.
● For example, the number of workers or the number of students in a class is a discrete variable, as it cannot be fractional. Similarly, the number of children in a family can be 1, 2, and so on, but cannot be 1.5 or 2.75.
|(II) Continuous variables||● Variables that can take all the possible values (integral as well as fractional) in a given specified range are termed as continuous variables.
● For example, temperature, height, weight, marks, etc.
|Q.- Explain the basis or methods of classification.|
The following are the bases of classification:
|(1) Geographical classification||● When data are classified with reference to geographical locations such as countries, states, cities, districts, etc., it is known as geographical classification.
● It is also known as ‘spatial classification’.
|(2) Chronological classification||● A classification where data are grouped according to time is known as a chronological classification.
● In such a classification, data are classified either in ascending or in descending order with reference to time such as years, quarters, months, weeks, etc.
● It is also known as ‘temporal classification’.
|(3) Qualitative classification||● Under this classification, data are classified on the basis of some attributes or qualities like honesty, beauty, intelligence, literacy, marital status, etc.
● For example, the population can be divided on the basis of marital status (as married or unmarried).
|(4) Quantitative classification||● This type of classification is made on the basis of some measurable characteristics like height, weight, age, income, marks of students, etc.|
|Q.- What is a statistical series? Discuss the various kinds of statistical series.|
|(a) Statistical series||● Statistical series is a systematic arrangement of statistical data in some logical order.|
|(b) Statistical series can be divided as:|
|(I) On the basis of general characteristics||On the basis of general characteristics, statistical series are of three kinds:
(i) Time series (Chronological series)
If the different values that a variable has taken in a period of time are arranged in a chronological order, the series so obtained is known as a time series.
(ii) Spatial series (Geographical series)
The data arranged according to location or geographical considerations form a spatial series.
(iii) Condition series
In this series, data are classified according to changes in a condition or measurable attribute of the variable, such as height, weight, age, marks, income, etc.
|(II) On the basis of construction||According to construction, statistical series can be categorised as:
(i) Individual series
Individual series refers to a series in which items are listed singly, i.e., each item is given a separate value of the measurement. Example:
|Marks (Out of 50)||20||30||10||30||40||50||45||40||42||40|
|(ii) Discrete series
A discrete series is a series where individual values differ from each other by a definite amount.
|No. of students||3||5||2||2||1|
|(iii) Continuous series
A continuous series is a series that represents continuous variables, showing a range of values of different items of the series. Example:
|Marks||0 – 10||10 – 20||20 – 30||30 – 40||40 – 50|
|No. of students||1||4||5||6||4|
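As a rough illustration of how an individual series can be condensed into a discrete series, the sketch below counts how often each mark occurs in the individual series example given earlier. The function name `discrete_series` is a hypothetical helper introduced only for this example.

```python
from collections import Counter

# Marks (out of 50) taken from the individual series example.
marks = [20, 30, 10, 30, 40, 50, 45, 40, 42, 40]


def discrete_series(values):
    """Return (value, frequency) pairs sorted by value."""
    counts = Counter(values)
    return sorted(counts.items())


series = discrete_series(marks)

# Print the series as a simple two-column table.
print("Marks | No. of students")
for value, freq in series:
    print(f"{value:>5} | {freq}")
```

Each distinct mark appears once in the result, paired with the number of students who obtained it, so the frequencies add up to the ten original items.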
|Q.- Discuss the various types of continuous series.|
|(A) Exclusive series||Age (in years)||No. of students||● Frequency distribution having classes wherein:
● The upper limit of one class becomes the lower limit of the next class.
● For grouping or counting the number of observations, lower limit (l1) is considered but upper limit (l2) is not considered/included.
|0 – 10||3|
|10 – 20||5|
|20 – 30||12|
|30 – 40||6|
|40 – 50||4|
|In the above example,
– There are five classes.
– Class size = l2 – l1 = 10 (for all)
– Mid-value = (l2 + l1) ÷ 2
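The counting rule for an exclusive series can be sketched in code as follows. This is a minimal sketch: the list `ages` is hypothetical data chosen only to show where boundary values such as 10 or 20 are placed, and `tally_exclusive` is an illustrative helper, not a standard function.

```python
def tally_exclusive(values, limits):
    """Count values into classes where l1 is included and l2 is excluded."""
    counts = [0] * len(limits)
    for v in values:
        for i, (l1, l2) in enumerate(limits):
            if l1 <= v < l2:  # lower limit counted, upper limit not counted
                counts[i] += 1
                break
    return counts


# Class limits from the exclusive series above.
limits = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]

# Hypothetical ages (assumption), including values that sit on class limits.
ages = [5, 10, 19, 20, 20, 30, 39, 40, 49]

counts = tally_exclusive(ages, limits)
class_sizes = [l2 - l1 for l1, l2 in limits]       # l2 - l1
mid_values = [(l1 + l2) / 2 for l1, l2 in limits]  # (l1 + l2) / 2

for (l1, l2), f, m in zip(limits, counts, mid_values):
    print(f"{l1} - {l2}: frequency {f}, mid-value {m}")
```

An age of exactly 10 is counted in the class 10 – 20 and not in 0 – 10, because the upper limit of a class is excluded.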
|(B) Inclusive series||Age (in years)||No. of students||● Frequency distribution having classes wherein:
● The upper limit of one class is not equal to the lower limit of the next class.
● For grouping or counting the number of observations, both the lower limit (l1) and the upper limit (l2) are considered/included.
|0 – 9||3|
|10 – 19||5|
|20 – 29||12|
|30 – 39||6|
|40 – 49||4|
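For comparison, counting into inclusive classes might look like the sketch below, where a value equal to either limit belongs to that class. The `ages` list is hypothetical whole-number data and `tally_inclusive` is an illustrative helper name.

```python
def tally_inclusive(values, limits):
    """Count values into classes where both l1 and l2 are included."""
    counts = [0] * len(limits)
    for v in values:
        for i, (l1, l2) in enumerate(limits):
            if l1 <= v <= l2:  # both limits are counted in the class
                counts[i] += 1
                break
    return counts


# Class limits from the inclusive series above.
limits = [(0, 9), (10, 19), (20, 29), (30, 39), (40, 49)]

# Hypothetical whole-number ages (assumption), including limit values.
ages = [0, 9, 10, 19, 25, 29, 30, 49]

counts = tally_inclusive(ages, limits)
print(counts)
```

Because there is a gap between one upper limit and the next lower limit (for example, 9 and 10), this form suits whole-number data; a value such as 9.5 would fall into no class.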
|(C) Mid-value series||Mid-values||(f)||● Mid-value = (l1 + l2) ÷ 2
● Mid-value or mid-point is the central value of a class interval.
● When such mid-values are given, it is known as the mid-value series.
|(D) Open-ended series (Distribution)||Age (in years)||No. of students||● In a frequency distribution, if the lower limit (l1) of the first class or the upper limit (l2) of the last class (or both) is not given, then it is known as an “open-ended distribution”.|
|10 – 20||5|
|20 – 30||12|
|30 – 40||6|
|40 and above||4|
|(E) Continuous series with unequal intervals||(X)||(f)||● When the class size, i.e., the gap between (l2) and (l1), is not equal in all the classes, it is known as unequal class interval series.
● It can be converted into equal interval distribution by:
● Merging the classes;
● Splitting the classes.
|0 – 10||3|
|10 – 15||5|
|15 – 30||12|
|30 – 40||6|
|40 – 45||4|
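Merging classes can be illustrated with the unequal-interval table above. The sketch below combines adjacent classes into classes of width 15; the helper `merge_classes` and the choice of width are assumptions made for this example, and the approach only works when the new class limits coincide with existing ones.

```python
def merge_classes(classes, width):
    """Merge adjacent (l1, l2, f) classes into equal classes of `width`.

    Assumes the classes are sorted and contiguous, and that each new
    class limit coincides with an existing class limit.
    """
    merged = []
    start = classes[0][0]
    total = 0
    for l1, l2, f in classes:
        total += f
        if l2 - start == width:
            merged.append((start, l2, total))
            start, total = l2, 0
        elif l2 - start > width:
            raise ValueError("a class crosses the new limit; split it first")
    if total:
        raise ValueError("leftover classes do not fill a whole interval")
    return merged


# (l1, l2, f) rows from the unequal-interval series above.
unequal = [(0, 10, 3), (10, 15, 5), (15, 30, 12), (30, 40, 6), (40, 45, 4)]

equal = merge_classes(unequal, 15)
for l1, l2, f in equal:
    print(f"{l1} - {l2}: {f}")
```

The total frequency is unchanged by merging; only the grouping becomes coarser. Splitting a class, by contrast, requires an assumption about how observations are spread within it, which this sketch does not attempt.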
|(F) Cumulative frequency distribution:-
“Less than Cf distribution”
|Age (in years)||No. of students||● Cumulative frequency series is a modification of the simple frequency distribution.
● It is obtained by successively adding the frequencies of the values of the classes.
|Less than 10||3|
|Less than 20||8|
|Less than 30||20|
|Less than 40||26|
|Less than 50||30|
|“More than Cf distribution”||Age (in years)||No. of students|
|More than 0||30|
|More than 10||27|
|More than 20||22|
|More than 30||10|
|More than 40||4|
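The successive addition behind both cumulative distributions can be sketched as follows, using the simple frequency distribution of ages (0 – 10: 3, 10 – 20: 5, 20 – 30: 12, 30 – 40: 6, 40 – 50: 4). In this sketch the "less than" totals are keyed by each class's upper limit and the "more than" totals by each class's lower limit; the variable names are illustrative.

```python
from itertools import accumulate

# (l1, l2, f) rows of the simple frequency distribution.
classes = [(0, 10, 3), (10, 20, 5), (20, 30, 12), (30, 40, 6), (40, 50, 4)]
freqs = [f for _, _, f in classes]

# "Less than": add frequencies from the first class downwards.
less_cum = list(accumulate(freqs))

# "More than": add frequencies from the last class upwards.
more_cum = list(accumulate(reversed(freqs)))[::-1]

less_than = [(l2, cf) for (_, l2, _), cf in zip(classes, less_cum)]
more_than = [(l1, cf) for (l1, _, _), cf in zip(classes, more_cum)]

for limit, cf in less_than:
    print(f"Less than {limit}: {cf}")
for limit, cf in more_than:
    print(f"More than {limit}: {cf}")
```

The last "less than" total and the first "more than" total both equal the total number of observations, which is a quick check on the arithmetic.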
|Q.1- What is meant by classification of data?|
Classification of data is the process of arranging data in groups or classes on the basis of certain properties.
|Q.2- What is meant by geographical classification?|
When data are classified according to a geographical location or region, it is known as geographical classification.
|Q.3- What is quantitative classification?|
When data are classified on the basis of characteristics that can be measured, it is known as quantitative classification.
|Q.4- Define qualitative classification.|
When data are classified on the basis of attributes, it is known as qualitative classification.
|Q.5- Give the names of statistical series on the basis of construction.|
(i) Individual series;
(ii) Discrete series;
(iii) Continuous series.
|Q.6- What is a class?|
A ‘class’ is a range of values into which items are placed, such as 0–10, 10–20, 20–30, etc.
|Q.7- What do you understand by the class limits?|
● The two extreme values of each class are known as the class limits.
● The lowest value is termed the ‘lower limit’ (l1), and the highest value is known as the ‘upper limit’ (l2) of the class.
● For example, in the class “5–10”, 5 is the lower limit (l1) and 10 is the upper limit (l2).
|Q.8- What is meant by the magnitude of a class?|
● The difference between the upper limit (l2) and the lower limit (l1) of a class is known as the magnitude of the class or class size.
● For example, in the class interval 20–50, the magnitude of the class interval is (l2 – l1), i.e., 50 – 20 = 30.
|Q.9- Which series excludes the upper limit of the class interval?|
Exclusive series.
|Q.10- What is meant by mid-point?|
● Mid-point is the central point of a class interval, which lies halfway between the lower and upper class limits. It is (l1 + l2) ÷ 2.
● For example, the mid-point of class 10–20 will be: Mid-point = (10 + 20) ÷ 2 = 15.
|Q.11- Which method includes both the class limits in the class of a continuous series?|
Inclusive method (inclusive series).
|Q.12- What is meant by the term ‘frequency’?|
Frequency refers to the number of times a given value appears in a distribution.
|Q.13- What is a frequency distribution?|
A table, in which the frequencies and the associated values of a variable are written side by side, is known as a frequency distribution.
|Q.14- What do you understand by raw data?|
A mass of data in its original form is known as raw data.
|Q.15- Name the series that has class intervals.|
Continuous series.
|Q.1- Which of the following is the objective of classification?|
|a. To condense the mass of data.
b. To present data in a simple, logical, and understandable form.
c. To bring out points of similarity and dissimilarity among various groups.
d. All of the above
|Q.2- Temperature, height, weight, and marks are examples of ________ .|
|a. Discrete variables
b. Continuous variables
c. Both a. and b.
d. None of the above
|1 – d., 2 – b.|
|Q. no.||Fill in the blanks|
|1||_________ of data is the process of arranging data into homogeneous groups according to their common characteristics.|
|Q. no.||Answer Key|
|1||Classification|