ITEM QUALITY ANALYSIS FOR MEASURING MATHEMATICAL PROBLEM-SOLVING SKILLS

This research aims to determine the quality of the items used to measure students' mathematical problem-solving skills. It is a quantitative descriptive study. The research subjects were grade VII students, and the primary data source was their scores on a test of mathematical problem-solving skills. The research instrument was a written test measuring problem-solving ability. The data were analyzed quantitatively in terms of the validity, reliability, discriminating power, and difficulty index of the items. The results showed that all items were valid and that reliability fell in the fairly good category. The discriminating power of the three items fell in the poor, sufficient, and good categories, respectively. The difficulty index placed one item in the easy category and two items in the medium category. It is therefore concluded that the instrument used to measure problem-solving ability is valid and of suitable quality. In the instrument trial, 20% of students had very high problem-solving ability, 53.33% had high ability, 16.67% had sufficient ability, and 10% had low ability.


INTRODUCTION
A question item is part of a test instrument used for a particular purpose. A test can be regarded as a means of measuring an object by collecting interrelated information that reflects the characteristics of that object. In this study, the object in question was students' cognitive level as seen from their problem-solving skills. Such measurement depends on the stimulus presented, which may vary across types of tasks or games and can introduce significant potential confounds, namely arousal and cognitive load (Gundry & Deterding, 2019).
ISSN 2089-8703 (Print) Volume 9, No. 4, 2020, 1223-1234ISSN 2442
An instrument can be called good once its quality has been analyzed, including an analysis of its items. Several stages are followed to obtain good questions: defining the purpose and use of the test, composing the items, trying out the test, conducting item analysis, revising items that are still imprecise, reassembling the test, applying the test, and interpreting the test results. This is supported by another view, which states that the first step in constructing an instrument is to identify the problem and formulate it so that its purpose is easy to understand. The second step is to design the instrument according to the abilities to be measured (Supardi et al., 2019).
Theoretically, an instrument is said to be good if its test results can be interpreted as showing how far students' abilities extend. A study can draw sound conclusions only if the tools used to collect data are of good quality and give an accurate picture of actual conditions. Therefore, a good-quality data collection instrument is necessary for the study (Aida et al., 2017).
The test instrument in focus here is one used to measure students' problem-solving skills. Once the results of measuring mathematical problem-solving ability are known, it becomes easier to see how well students connect mathematical concepts to obtain the right solution.
Problem-solving skills underlie the learning of mathematics from primary school to college. While mathematics learning takes place, it is better to present problems that must be resolved through steps conveyed orally or in writing, so that students are equipped to improve their ability to think systematically and mathematically (Cholily, Y.M., Kamil, T.R., & Kusgiarohmah, 2020).
By definition, problem-solving ability is one's ability to solve problems using appropriate procedures and steps. Using such methods when solving mathematical problems helps students practice logical and systematic thinking (Sumartini, 2018).
As a further form of exercise, mathematical problems posed as story problems give students experience in comprehending the text and in carefully identifying the data needed to solve the problem.
Solving mathematical problems in a directed and systematic way can follow the steps proposed by Polya: understand the problem, make a plan, carry out the plan, and look back at the completed solution (Amam, 2017). Teachers and students are very familiar with this procedure for solving story problems. In general, they write down what is known, what is asked, and the answer process, and then close with a conclusion. Students in middle and high school sometimes have difficulty rechecking their answers, whereas students in the lower grades or elementary school sometimes have difficulty in planning (Arfiana & Wijaya, 2018).

METHOD
This is a quantitative descriptive study. The subjects were grade VII students at SMP Negeri 3 Jetis in the 2019/2020 school year. The data source consisted of 30 students, all selected to fit the research objectives and the learning achievement indicators; the sampling technique was therefore purposive sampling. The data were quantitative data drawn from students' answers. Data were collected using an essay test that refers to the indicators and assessment of problem-solving ability.
The data analysis technique used is quantitative analysis. It was carried out with IBM SPSS 20 and Microsoft Excel to determine the validity, reliability, discriminating power, and difficulty index of the items. The validity of the instrument was tested using the Pearson product-moment correlation coefficient, formula (1):

$$r_{xy} = \frac{N\sum XY - \left(\sum X\right)\left(\sum Y\right)}{\sqrt{\left[N\sum X^{2} - \left(\sum X\right)^{2}\right]\left[N\sum Y^{2} - \left(\sum Y\right)^{2}\right]}} \quad (1)$$

Information:
$r_{xy}$ : correlation coefficient between the item score (X) and the total score (Y)
$N$ : number of subjects
$X$ : score on an item (statement or question)
$Y$ : total score
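In this study, item validity was determined with the Correlations procedure in SPSS 20. In the output, if the Pearson Correlation value is ≥ $r_{\text{table}}$, the item is valid; if the Pearson Correlation value is < $r_{\text{table}}$, the item is invalid. The reliability of the test instrument was then determined using the Cronbach's alpha formula (2), which in its standard form is:

$$r_{11} = \frac{k}{k-1}\left(1 - \frac{\sum \sigma_{i}^{2}}{\sigma_{t}^{2}}\right) \quad (2)$$

Information:
$r_{11}$ : reliability coefficient
$k$ : number of items
$\sigma_{i}^{2}$ : variance of the scores on item i
$\sigma_{t}^{2}$ : variance of the total scores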
Reliability was tested with the Cronbach's alpha procedure in SPSS 20. In the SPSS output, if the Cronbach's alpha value is ≥ α (0.05), the instrument is reliable; if the Cronbach's alpha value is < α (0.05), the instrument is not reliable.
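As an illustration of the reliability check, the following Python sketch computes Cronbach's alpha in its standard form, k/(k-1) multiplied by (1 minus the sum of the item variances divided by the variance of the total scores). It is only a sketch and not the SPSS procedure used in the study; the function name `cronbach_alpha` and the score data are hypothetical.

```python
from statistics import pvariance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of items.

    item_scores: one list of scores per item, all in the same student order.
    Population variance is used for both the items and the totals; using
    sample variance for both would give the same ratio.
    """
    k = len(item_scores)
    # Total score of each student across all items.
    totals = [sum(per_student) for per_student in zip(*item_scores)]
    item_variance_sum = sum(pvariance(scores) for scores in item_scores)
    total_variance = pvariance(totals)
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)


# Hypothetical scores of five students on three essay items (not study data).
example_items = [
    [10, 8, 9, 6, 7],
    [8, 6, 9, 4, 5],
    [7, 5, 8, 3, 6],
]
print(round(cronbach_alpha(example_items), 3))
```

Because the same kind of variance is used in the numerator and the denominator, the result does not depend on whether population or sample variance is chosen.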
Discriminating power was determined with Microsoft Excel using formula (3):

$$DP = \frac{\bar{X}_{A} - \bar{X}_{B}}{SMI} \quad (3)$$

Information:
$DP$ = discriminating power index
$\bar{X}_{A}$ = the average score of the answers of the upper group students
$\bar{X}_{B}$ = the average score of the answers of the lower group students
$SMI$ = Maximum Ideal Score, which is the maximum score a student will get if he answers the item correctly (perfectly).
The difficulty index was determined with Microsoft Excel using formula (4):

$$IK = \frac{\bar{X}}{SMI} \quad (4)$$

Information:
$IK$ = item difficulty index
$\bar{X}$ = the average score of students' answers on an item
$SMI$ = Maximum Ideal Score, which is the maximum score a student will get if he answers the item correctly.
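The two Excel calculations can be sketched in Python as follows. The difficulty index is the mean item score divided by SMI, and the discriminating power is the difference between the mean item scores of the upper and lower groups divided by SMI. The text does not state how the upper and lower groups were formed, so the split into halves by total score (`group_fraction=0.5`) is an assumption, as are the function names and the sample scores.

```python
def difficulty_index(item_scores, smi):
    """IK = mean item score / maximum ideal score (SMI)."""
    return (sum(item_scores) / len(item_scores)) / smi


def discriminating_power(item_scores, total_scores, smi, group_fraction=0.5):
    """DP = (upper-group mean - lower-group mean) / SMI.

    Students are ranked by total score; the top and bottom
    `group_fraction` of students form the two groups (assumed rule).
    """
    n = len(item_scores)
    size = max(1, int(n * group_fraction))
    # Student indices ordered from highest to lowest total score.
    order = sorted(range(n), key=lambda i: total_scores[i], reverse=True)
    upper = [item_scores[i] for i in order[:size]]
    lower = [item_scores[i] for i in order[-size:]]
    return (sum(upper) / size - sum(lower) / size) / smi


# Hypothetical data: four students, one item with SMI = 10 (not study data).
item = [10, 8, 6, 4]
totals = [30, 25, 15, 10]
print(difficulty_index(item, 10))
print(discriminating_power(item, totals, 10))
```

A different grouping rule, such as the top and bottom 27% of students, can be tried by changing `group_fraction`.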

RESULTS AND DISCUSSION
Concerning item difficulty, other researchers define it as the proportion of test-takers who answer the question correctly (Angriani et al., 2018). The difficulty of an item is judged from students' ability to answer it, not from the assumptions of the teacher who wrote it, because items that are difficult or easy for the teacher are not necessarily so for students. An item can also distinguish between more able and less able students; this property is called discrimination (Aida et al., 2017).
The data analysis yielded the following results on item quality in terms of validity, reliability, discriminating power, and difficulty level.

Validity
Validity was calculated using the Pearson product-moment correlation formula. Thirty students took the test, so n = 30, and the corresponding r-table value is 0.361.
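The validity decision can be sketched in Python as follows. The function computes the Pearson product-moment correlation between an item score and the total score from raw sums and compares it with the r-table value of 0.361 reported for n = 30. This is an illustrative sketch, not the SPSS output of the study; the function names and the sample scores are hypothetical.

```python
import math

R_TABLE = 0.361  # critical value reported in the text for n = 30


def pearson_r(x, y):
    """Pearson product-moment correlation computed from raw sums."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator


def is_valid(item_scores, total_scores, r_table=R_TABLE):
    """An item counts as valid when r is at least the r-table value."""
    return pearson_r(item_scores, total_scores) >= r_table


# Hypothetical item and total scores (not study data). The threshold 0.361
# applies to n = 30; a smaller sample needs its own r-table value.
item = [2, 4, 6, 8, 10]
total = [10, 14, 19, 22, 30]
print(round(pearson_r(item, total), 3), is_valid(item, total))
```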
A test is called valid when it measures precisely what it is intended to measure. In addition, one expert view holds that internal validity comes first in the hierarchy, with overall validity and external validity considered only after it; however, that perspective is not always the right one (Westreich et al., 2019).
The analysis shows that all three questions are categorized as valid (Table 2). Question No. 1 has a validity coefficient of 0.438, which meets the fairly good criterion. In the student responses, 66.7% of students considered the question adequate for measuring mathematical problem-solving ability in daily life, and 60% said they had encountered the same type of question before.
Question No. 2 has a validity coefficient of 0.882, which meets the good criterion. In the student responses, 86.7% indicated that it can measure mathematical problem-solving ability, and 53.3% said they had never encountered this type of question before.
Question No. 3 has a validity coefficient of 0.883, which meets the good criterion. In the student responses, 83.3% indicated that it can measure mathematical problem-solving ability, and 56.7% said they had never encountered this type of question before.
Another supporting view states that item discrimination is an item's ability to differentiate between students who score high and students who score low (Angriani et al., 2018). In terms of discriminating power, a good question is answered correctly by test takers who have mastered the test material and cannot be answered correctly by those who have not.
The level of difficulty is a number stating how difficult an item is. A good question is neither too easy nor too difficult (Susanti et al., 2017). A problem that is too easy does not stimulate students to increase their effort to solve it. Conversely, a problem that is too difficult causes students to become discouraged and lose the enthusiasm to try again because it is beyond their reach (Arifin, 2017). The difficulty level expresses how easy or how difficult a problem is for students (Hayati & Lailatussaadah, 2016). The higher the percentage of students who answer an item correctly, the easier the item; the lower that percentage, the harder the item.

Reliability
The reliability test is used to see the consistency of the items in measuring students' problem-solving skills. The word reliability derives from reliable, which means trustworthy. A test is said to be reliable if it gives stable results when administered repeatedly. The reliability of the items in this instrument applies only to the subjects of this study, under conditions adapted to the actual situation (Priyambodo & Marfuatun, 2016). The results of the reliability test are presented in Table 3. Analysis of the test instrument with IBM SPSS 20 gave a reliability coefficient of 0.636, which is in the fairly good category. This indicates that the instrument is consistent: when given to the same subjects, even by different people, at different times, or in different places, it gives relatively the same results.
A test is said to be reliable when the analysis results show stability. The reliability of an instrument is its stability or consistency when given to the same subjects, even by different people, at different times, or in different places, yielding relatively the same results. Items are said to have a reliable construct if the test results show a Cronbach's alpha and construct reliability of 0.7 (Bintarti & Kurniawan, 2017).

Discriminating Power
The discriminating power analysis using Microsoft Excel gave the results in Table 4. Question number 1 has a discriminating power of 0.133, which falls under the poor criterion. Most respondents stated that this problem is very easy, and the data show that almost all students answered it correctly. Hence, the item is poor at distinguishing high-ability from low-ability students.
Question number 2 has a discriminating power of 0.252, which meets the sufficient criterion; it can adequately distinguish high-ability from low-ability students. Question number 3, by contrast, meets the good criterion with a discriminating power of 0.437; it distinguishes high-ability from low-ability students well.
The discriminating power of an item expresses how well it differentiates between students who can answer the question correctly and students who cannot (Dewi et al., 2019). It is the item's ability to distinguish between high-ability and low-ability students (Arifin, 2017). Analyzing discriminating power means reviewing test questions in terms of their ability to distinguish students in the low category from those in the high category (Angriani et al., 2018).

Difficulty Index
The difficulty analysis using Microsoft Excel gave the results in Table 5. Question number 1 has a difficulty index of 0.785, which falls under the easy criterion. In the student responses, 63.3% of students considered this question easy, and 83.3% reported working out the answer independently.
Question number 2 has a difficulty index of 0.533, which falls under the medium criterion. The student response on perceived difficulty was 80%, and 56.7% of students reported working out the answer independently.
Question number 3 has a difficulty index of 0.574, which falls under the medium criterion. The student response on perceived difficulty was 80%, and 80% of students reported working out the answer independently.
The analysis of validity, reliability, difficulty level, and discriminating power was intended to reveal the quality of the items that could be used to measure students' problem-solving abilities (Almanasreh et al., 2019). The questions analyzed in this study are essay-type problem-solving questions, with solution guidelines adjusted to the problem-solving stages proposed by Polya (Lee, 2017). These problem-solving steps are straightforward, which makes student answers easy to analyze. The items were written for junior high school students and concern geometry, namely plane figures. The three items designed had good quality in terms of reliability, validity, difficulty index, and discriminating power index (Hidayat et al., 2019).
The reliability calculation gave a Cronbach's alpha of 0.636. This value places the items in the reliable, moderate category because 0.636 lies in the 0.50-0.70 interval. These results indicate that the problem-solving test items can be trusted because they tend to give stable results (Mohammed, 2019). Although the test has been declared reliable, reliability alone is not sufficient and must be combined with validity, because reliable test questions are not necessarily valid (Martin et al., 2020).
The suitability of the test instrument used to measure problem-solving skills (Baştürk, 2016) was examined through item analysis, looking at the characteristics of the mathematical problem-solving test in terms of validity, reliability, discriminating power, and difficulty index. Reliability analysis was carried out on the test items as a whole, whereas validity analysis was carried out on each item (Priyambodo & Marfuatun, 2016).
The validity of an item is the degree of agreement between the data that actually occur in the research object and the data reported by the researcher. Validity is one of the crucial considerations in using instruments for research and practice (Almanasreh et al., 2019).
A test is said to have validity if its results correspond to the criterion, in the sense that the test results and the criterion run parallel. In scientific research, validity is a primary construct and indicates the quality of the study (Gundry & Deterding, 2019).
Item validity is the measuring accuracy of an individual item, which is an integral part of the test as a whole. The same holds for the difficulty level and discriminating power, which are calculated for each item individually. The tests of validity, difficulty index, and discriminating power of the three questions are described below. Item number 1 is shown in Figure 1.
Figure 1. Item number 1: A rectangular yard is 70 meters long and 55 meters wide. A fence will be installed around the yard at a cost of Rp 125,000.00 per meter. What is the cost required to install the fence?
The validity calculation for item number 1 gives a correlation coefficient of 0.438, which means that the item is valid. The difficulty index was 0.785, which places the item in the easy category, and the discrimination index was 0.133, which is in the poor category. This indicates that students can easily find the solution to question number 1 (Hayati & Lailatussaadah, 2016). From these results, the researchers also learned that question number 1 is a routine question for students at SMP N 3 Jetis. Thus, problems of the type in question number 1 should not be used repeatedly to differentiate between students' abilities.
Item number 2 aims to measure students' ability to add algebraic forms in finding the area of a plane figure (Figure 2). The validity calculation for item number 2 gives a value of 0.882; statistically, this means that item number 2 is valid. The difficulty index is 0.533 (moderate), and the discrimination index of 0.252 is in the fairly good category because it lies in the interval 0.20 ≤ D < 0.40. This question gives students practice in working out what must be considered in solving it, namely the size of the floor and the size of the tiles. If students can identify this, they can be said to have met an indicator of problem-solving ability (Putri & Sutarto, 2018). Furthermore, students who solve it well can solve problems with coherent and systematic procedures. Question number 3 measures students' ability to formulate and add algebraic forms, as in the following passage.
A pool is rectangular. If the area of the pool is 32 m² and its length is twice its width, determine the perimeter of the pool! The validity calculation for item number 3 gives 0.883, which shows that the item is valid. The difficulty index is 0.574, in the moderate category because it lies in the interval 0.30 ≤ DI < 0.80, and the discrimination index is 0.437, in the good category. This shows that questions of the type in question number 3 are the ones that should be used to measure problem-solving abilities.
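For reference, the expected answers to items 1 and 3 can be worked out directly from the problem statements quoted above. The symbols below (K for perimeter, w for the pool width) are chosen here for illustration and are not taken from the original answer key.

```latex
% Item 1: fence around a 70 m by 55 m rectangular yard at Rp 125,000 per meter
K_1 = 2(70 + 55) = 250 \text{ m}, \qquad
\text{cost} = 250 \times 125{,}000 = \text{Rp } 31{,}250{,}000

% Item 3: rectangular pool with area 32 m^2 and length twice the width w
2w \cdot w = 32 \;\Rightarrow\; w^{2} = 16 \;\Rightarrow\; w = 4 \text{ m}, \qquad
\text{length} = 2w = 8 \text{ m}, \qquad
K_3 = 2(8 + 4) = 24 \text{ m}
```

Item 1 needs only a direct application of the perimeter formula, whereas item 3 requires setting up and solving an equation first, which is consistent with item 3 being harder and discriminating better than item 1.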
Based on the description of the research results and the discussion above, it can be concluded that the three test items have good quality in terms of validity, reliability, difficulty level, and distinguishing power and are suitable for measuring students' mathematical problem-solving abilities (Brundle et al., 2019).
To determine students' mathematical problem-solving abilities, the researchers calculated the achievement on each indicator and the average achievement across the steps of mathematical problem solving (Westreich et al., 2019), namely the average score on the mathematical problem-solving ability test. The results for students' mathematical problem-solving abilities and the average test scores are presented in Table 6 and Table 7. By way of explanation, indicator A is identifying the elements that are known, the elements asked for, and the adequacy of the elements needed. Indicator B is formulating the mathematical problem or compiling a mathematical model. Indicator C is implementing strategies to solve the problem. Indicator D is explaining or interpreting the results of problem solving.
Based on the research data above, in test number 1 all students wrote indicator A, which states the elements that are known and asked. On indicator B, most students formulated the problem, while the rest did not write it down. On indicator C, almost all students implemented the problem-solving strategy correctly, whereas on indicator D, not many students explained or interpreted the results of problem solving.
Based on the research data above, in test number 2 most students wrote indicator A, which states the elements that are known and asked. On indicator B, some students formulated the problem, while others did not write it down. On indicator C, some students applied the problem-solving strategy correctly, while others applied it incorrectly. On indicator D, many students still did not explain or interpret the results of problem solving.
Based on the research data above, in test number 3 most students wrote indicator A, which states the elements that are known and asked. On indicator B, some students formulated the problem, while others did not write it down. On indicator C, some students applied the problem-solving strategy correctly, while others applied it incorrectly. On indicator D, many students still did not explain or interpret the results of problem solving.
Based on the test data on mathematical problem-solving ability, six students (20%) were in the very high category. Sixteen students (53.33%) were in the high category. Five students (16.67%) had sufficient mathematical problem-solving ability, and three students (10%) were at a low level of mathematical problem-solving skills.
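The reported percentages follow directly from the category counts among the 30 students, as the short Python check below shows. The function name `category_percentages` is chosen here for illustration.

```python
def category_percentages(counts):
    """Return each category's share of all students, in percent (2 decimals)."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 2) for name, n in counts.items()}


# Category counts as reported in the text (30 students in total).
counts = {"very high": 6, "high": 16, "sufficient": 5, "low": 3}
print(category_percentages(counts))
```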
Based on the explanation above, the test instrument developed meets the criteria of validity and reliability, and was then reviewed in terms of difficulty level and discriminating power (Almanasreh et al., 2019). Some questions have an unsatisfactory level of difficulty, but in general the test instrument meets the difficulty level criteria. In terms of discriminating power (Gundry & Deterding, 2019), one item has insufficient discriminating power, but the other questions meet the criteria. Therefore, the test instrument for measuring problem-solving ability has reached the final prototype.

CONCLUSION AND SUGGESTION
Based on the results and discussion, it can be concluded that all of the items (100%) are valid, and the reliability coefficient is 0.636, in the fairly good category. The discriminating power of the three items falls into the poor (33.3%), sufficient (33.3%), and good (33.3%) categories, one item each. The difficulty index of the problem-solving skills test shows one item in the easy category and two items in the medium category. In addition, six students (20%) have very high mathematical problem-solving ability.
Students in the high category make up 53.33%. Five students (16.67%) have sufficient mathematical problem-solving ability, and three students (10%) have a low level of mathematical problem-solving skills.
A suggestion for further research is to analyze the questions quantitatively to find out how effective and practical the test is, in order to improve its quality. Also, the questions written should be balanced, neither too difficult nor too easy. Furthermore, if the quantitative analysis reveals items that are not functioning properly, they should be corrected, and items that already function properly can be used as references for future tests.