Web Analytics Made Easy -
StatCounter mysql full-text search query results - CodingForum

Announcement

Collapse
No announcement yet.

mysql full-text search query results

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • mysql full-text search query results

    Good evening, I'm a little green when it comes to MySQL, so I'm not sure what I need to do to get the results I'm looking for. I'm not sure if it can be done within the query itself, or if I would need to query, then maybe do some post query parsing to get the results.

    I have a document library database where I'm searching both the docTitle and docContent fields. Here is the basic query I'm passing through PHP:

    $query = "select *, match(docTitle, docContent) against('".$basicSearch."' IN BOOLEAN MODE) as relevance from content where match(docTitle, docContent) against('".$basicSearch."' IN BOOLEAN MODE) order by relevance DESC";

    This is a good start for me. However there are one or two documents towards the bottom of the list ranked as a relevance of 1 or 2 where I would like them to be displayed at the top of the results.

    What I'm looking for is a way to have these searches weighted to where a document that has any of the words searched for in the docTitle weighted and ranked higher in the results than those documents that have the same words, but are not in the docTitle.

    Something else is, there are a few "main" documents in the database, labeled as "MD - (title)" in the docTitle. I would like my MD documents to ALWAYS be at the VERY TOP of the search results page, but then have the rest of the results displayed based on their regular relevance without the user having to type "MD" in their search query.

    Are either of these capable by just changing the way I'm querying the database... or is this something I need to do by post processing the array that is returned in PHP?

    Any advice?

    thanks a million.

    -Brock
    Last edited by brobarn; Feb 16, 2004, 02:24 AM.

  • #2
    Welcome here!

    No you can't do any of these, just by changing the query.

    For the first one, you will probably need to write your own search-algorithme. But the relevance is something different then wether or not the words from the searchstring appear in the fieldvalue.

    I don't understand the second question. You want to run a full text search but the top records should always be the same? Then why not run 2 querys. First the select for the MD records, the process the resultset and build a commadelimited string with the records id's. Then run the full-text search ad use the build colections in the where clause. Like

    $sql="select ... where ID Not In (" . $collection . ") ... ";

    where $collection contains the ID's from your first select
    Posting guidelines I use to see if I will spend time to answer your question : http://www.catb.org/~esr/faqs/smart-questions.html

    Comment


    • #3
      Thanks for the fast response raf. Through trial and error, I was able to figure it out. I was actually able to do BOTH of the things I mentioned just by editing my query alone. It is as follows:

      $query = "select *, if(locate('".$basicSearch."', docTitle),5,1) as title, if(locate('MD', docTitle),6,1) as main, match(docTitle,docContent) against('".$basicSearch."') as relevance from content where match(docTitle, docContent) against('".$basicSearch'' IN BOOLEAN MODE) order by main DESC, title DESC, relevance DESC LIMIT 0, 30";

      It may be ugly, but it worked out exactly like I was hoping:
      Code:
      +--------+-------+------+------------------+
      | docNum | title | main | relevance         |
      +--------+-------+------+------------------+
      |    180 |     1 |    6 |   1.610320634487  |
      |    133 |     5 |    1 |  2.7115913179495  |
      |    134 |     1 |    1 |  2.5842761474056  |
      |    154 |     1 |    1 |   1.416108614917  |
      |     81 |     1 |    1 | 0.72192275264432  |
      +--------+-------+------+------------------+
      As you can see I gave the main document a 6 and everything else a 1, then sorted by main, that sticks the main doc at the very top.

      I gave anything that had the keywords searched for in the title a 5 and everything else a 1, then sorted by title, then I just sorted everything else by the normal relevance.

      Trial and error (LOTs OF ERROR) will do it everytime. =)

      Comment


      • #4
        No it doesn't. Not trying to spoil your party buth this does not return what you asked for in your original post + it's not the most efficient and evn an incorrect way of trying to get that result.

        Like i said in my first post : wether the keyword is part of fieldvalue is not the same as the relevance of that fieldvalue to the searchstring. Using locate will even match against substrings ! So it absolutely is NOT what you asked in your original post.
        The full text search is meant to match datafields, against a 'natural language search' --> meaning NOT entered keywords, but a sentence or part of a sentence.

        Your solluton will only work if the users enters 1 keyword and if that keyword doesn't exists as a substring of any of the records title-words.

        If the user would enter 'delete a page', then your " if(locate('".$basicSearch."', docTitle),5,1) " will return False for all records --> so it then is way to restrictive.
        If you would chop up the searchstring and run the locate against every keyword, then all records will probalby return True, because they will all contain a wordt with an "a" --> way to loose.

        Your second locate clause should also be replaced by a regular select using a LIKE operator that you then add to the results of the full text search with a UNION. But that is simply pointless, because there is no gain in mixing these two resultsets together, since they are selected through differnt criteria. I cant imagen what service this would provide to the user. It's more transparent and correct to display the results from the MD records in a seperate resultlist.

        So you can get a result if the user only types in 1 keyword, but that is absolutely not an appropriate use of full-text matching, and i doubt if even then, the result of your two locate-function will deliver the desired result.

        But if your happy with it, then the problem is solved.
        Posting guidelines I use to see if I will spend time to answer your question : http://www.catb.org/~esr/faqs/smart-questions.html

        Comment


        • #5
          If the user would enter 'delete a page', then your " if(locate('".$basicSearch."', docTitle),5,1) " will return False for all records --> so it then is way to restrictive.
          That is wierd... I'm not getting the same results as you mention. Obviously I'm not using 'delete a page'. But lets use Pentium 4 motherboard for example. If the user types in "Pentium 4 motherboard" between quotes... obviously it is going to do it as a natural language selection like you spoke of and look for all 3 words together as a phrase, which will probably not return any results in the title. Maybe, maybe not. However if I type it in without the quotes... since it is running through a boolean search first, it is grabbing any title that has the word pentium or the word motherboard, thus including pentium 4 motherboard. But the default setup on mysql full text searches are also set up to where 3 letter words or less such as the "a" in delete a page, or the "4" in pentium 4 motherboard would be excluded unless it was put in quotes or had a + in front of it to insure it was included.

          I've ran this query against a rather small database of just 1500 documents, and it seems to be putting MDs at the very top, documents with keywords in the titles right under the MDs, and everything else that is relevant under those based on the relevancy field. It seems to be doing what I was looking for in my original post just fine right now. Maybe that will change like you said once I start adding more documents... or try doing some more complex searches in my search textbox.

          Like I said, I'm green/new when it comes to MySQL, so I'm sure there are a million better ways to do this. It is definatelly probably not the fastest or the most elegant way to query, but that is what I came on here for... was suggestions. You guys are the teachers... I'm just the student.

          Thank you for your suggestions raf. I'll defintely take everything you have mentioned into consideration. I appreciate the willingness and kindness of individuals like yourself who are willing to take the time to help us newbies out.

          Thanks again,

          -Brock

          Comment


          • #6
            You don't seem to understand it

            --> full text searches IS NOT THE SAME as checking if the keyword is part of the datafield. And the relevance does not only depend on wether and how much keywords are included, but also on other parameters like the number of wirds on the fields etc. It can also match against known synonyms (v 4.1.1 on) of a part of your searchstring. So all your argument about parts of the searchstring appearing in whatever field, are at least partialy incorrect and from reading your posts, i don't think that the used searchalgoritme will deliver the results you are after, although you will probably never know it.
            I've got the impression that you want to use the full text search to match against multiple keywords, but it is not build and tuned for that.
            --> locate will search if a substring appears in your title. Point finale. It is completely useless for what you are after. Locate is the oposite of a full text search. It will also have no effect in your query for 90% of your searches. For 100% of the searchstring s that contain more then 1 word.
            Your example is very well chosen as one of the 10% since nore Pentium, 4, or motherboard will probably be substrings of
            --> you are also confusing my explanation about the locate() function and the default full text search settings. So the argument about excluding words with less then four positions is irrelevant then. The fact that these words are ignored and that they will not be included through the WHERE clause will even make the locate-functions more pointless.
            -->your explanation
            If the user types in "Pentium 4 motherboard" between quotes... obviously it is going to do it as a natural language selection like you spoke of and look for all 3 words together as a phrase, which will probably not return any results in the title.
            is simply incorrect. Enclosing it in double quotes will cause a literal earch. Not enclosing it in quotes, will result in a natural language search. Enclosing it in a literal comparison within a natural language search.
            But it would realy surprise me if the locate-function would return true on "Pentium 4 motherboard" (with or wothout quotes)

            It's not because your query returns the expected result for some of the searchstring that it is usefull/the best you can do.
            Posting guidelines I use to see if I will spend time to answer your question : http://www.catb.org/~esr/faqs/smart-questions.html

            Comment


            • #7
              I must be misunderstanding something exactly like you said. I don't understand how that exact query I just gave when searching for Pentium 4 Motherboard will give title searches a 5 if it has Pentium 4 Motherboard, Pentium 3 motherboard, Pentium chip, or 486 motherboard. That same query above gives a title ranking of 5 for anything that has ANY of the keywords in the search query... not just "Pentium 4 Motherboard". That query is giving the document with the title "486 motherboard, 128mb memory" a ranking of 5 like I asked it to. But it doesn't have the word Pentium or the number 4 in it. Why is it giving it a ranking of 5 then? I don't understand just like you said... but I want to know WHY it would be doing this if you say it shouldn't. I'm trying to learn here.

              So this is making any document that has "ANY" of the keywords searched for in the title at the top of my search results right under any of the MDs. It maybe shouldn't be doing it like you said... but I want to know why it is. I'm not doing any post query processing with php, I'm just redisplaying my results based on the array that is passed back. All of this is being done solely by the query being passed into the database.

              Maybe I need to go buy a MySQL book to better understand it. You are obviously the expert... I've only been working with MySQL for a matter of weeks. I just really don't understand why my query is doing something that you are emphatically disagreeing that it can or should be doing.

              I'm not opposed to doing it in any way. I want to learn the best and most efficient way of doing it. I don't want to just do it because somebody says this is the best way to do it... I want to understand how and why this is the best way of doing it. And also understand why and how it shouldn't be done to avoid it in the future.

              I'm just a sponge trying to soak up everybody else's knowledge.

              Thanks again raf,

              -Brock

              Comment


              • #8
                so you are saying that

                select if(locate('Pentium 4 Motherboard', '486 motherboard, 128mb memory'),5,1) ;

                will return 5 ?

                I'll try it this evening (don't have mySQL at work).
                Posting guidelines I use to see if I will spend time to answer your question : http://www.catb.org/~esr/faqs/smart-questions.html

                Comment


                • #9
                  Any good MySQL books you could suggest raf? Books. Tutorials. Anything of the such? You've peaked my interest, so I'm wanting to get deep down into how to create complex queries and why and how they work. Suggestions?

                  Comment


                  • #10
                    Originally posted by brobarn
                    Any good MySQL books you could suggest raf? Books. Tutorials. Anything of the such? You've peaked my interest, so I'm wanting to get deep down into how to create complex queries and why and how they work. Suggestions?
                    Yes. 1 suggestion : don't try to write complex querys and try to have your db-design optimised in such a way that you don't need complex querys. They always come with a performancy-impact.
                    But complex db-operations will still be more performant then doing your comparisons/conversion etc in your applicationlayer.

                    Anyway, tutorials: http://www.mysql.com/documentation/index.html

                    If you need info about the mySQL syntax or functions, then just run a search at mysql.com. It's about the only reference i've used for mySQL (but i still use it every day).
                    Posting guidelines I use to see if I will spend time to answer your question : http://www.catb.org/~esr/faqs/smart-questions.html

                    Comment


                    • #11
                      attention raf

                      Hey raf... you have been a wealth of help and knowlege so far... so I thought I would direct this one towards you. Lets say people were using the query below... (it will probably definately change), but for sake of having a query... let's say people do a search using the following query below. How can I create a sub query of this query... using the same constraints... but using different keywords? I guess, I'm looking for a way people can do a search, then refine their query results with another search within those results already displayed.

                      I was thinking I could just erase the "from content" in the query and paste the same query in itself within from (), but it definately is not that easy. Or atleast... it won't work in that way.

                      How can I create a sub query using the exact same query?

                      Thanks man!

                      select *, if(locate('SSTU', docTitle),7,1) as sstu, if(locate('MD', docTitle),6,1) as main, if(locate('inventory management', docTitle),5,1) as title, match(docTitle,docTitle) against('inventory management') as relevance from content where match(docTitle, docContent) against('inventory management' IN BOOLEAN MODE) order by sstu DESC, main DESC, title DESC, relevance DESC

                      Comment


                      • #12
                        So you want to run a search on the searchresults (of your original search)
                        2 option:

                        1. You store the results from your first query inside a tabel. The second query then only runs on this table, using the new searchcriteria --> so you only need the condition for the second query. Say they want all results from your first query, where the title or description contains the keyword 'test', then you just run
                        select * from tableresultsquery1 where title LIKE '% test %' or description LIKE '% test %'. You dont need anything from the first querys since it's results are stored and you only search on these results

                        2. You rerun the original query, but you add the new searchcriteria as an extra condition to the where clause. Like
                        [/php]
                        $extracond=''; // so for your first run, there wount be injected anything in your query
                        if (isset $_POST['secondsearch']){
                        $extracond = "AND (title LIKE '% " . $_POST['secondsearch'] . " %' or description LIKE '% ". $_POST['secondsearch'] . " %') " // there is no space in front of the AND and one behind the last )
                        }
                        sqm="select *, if(locate('SSTU', docTitle),7,1) as sstu, if(locate('MD', docTitle),6,1) as main, if(locate('inventory management', docTitle),5,1) as title, match(docTitle,docTitle) against('inventory management') as relevance from content where match(docTitle, docContent) against('inventory management' IN BOOLEAN MODE) " . $extracond . "order by sstu DESC, main DESC, title DESC, relevance DESC
                        [/php]
                        Well, you need to do so extra checking when you build the $extracond to see if the searchfield contained more then one word + to see on which fields you need to search, but this is the basic idea --> you just build an extra condition and rerun the complete query with the new condition included.

                        Performance of this last option is lower, because you need to reproces the complete table, but it will be the best option if you have a dynamic table (lots of datamanipulation going on, deleting, updating etc) and you need a real-time result + you don'tneed to clean out the table than you must create for option 1 --> in option 1, you'll need a cron or scheduled task to temprarely drop the tables of abandonned session. For non-abandonned sessions, you can drop them if the user hits a link to another page or whatever.
                        Another argument is that you could expand the column to search on to the whole original table (so then it is not really a search on the results, but one that focusses in on the searchresults + grabs similar, new, records). Inside option 1, you can only search on the fields you included inside your created table.

                        There is actually also a third option using a "data island". You then create an XML file for your initial searchresults and then display the by loading them as an external XML file inside the html (or as an inline XML section). You can then filter/order/search on that as completely clientsided actions. It's like a disconnecte recordset that you work and query on inside your browser. But that will only work in IE (from v 5.5 or 6 on, i believe) so that would only be usefull in a completely controled environment (like an intranet with terminals or so). It's more an option for the (near?) future when XML and QML support becomes universal.


                        <edit>
                        actually, thinking about this further, there is actually a third and very easy to implement alternative, where you do the same as alternative 2, but where you simply append the second searchstring to the first, and then you can just reuse your existing query and code. You just need to add a few lines to store the first searchstring (in a hidden formfield or in a sessionvariable) and then append it to the searchstring. So like for instance:
                        PHP Code:
                        $extracond='';   // so for your first run, there wount be injected anything in your query
                        if (isset ($_POST['secondsearch'])){
                            
                        $match .= $_SERVER['firstsearch'] .  ' ' $_POST['secondsearch'] ;
                        } else {
                            
                        $match $_POST['firstsearch'] ;
                            
                        $_SERVER['firstsearch'] = $_POST['firstsearch'] ;
                        }
                        sql="select *, if(locate('SSTU', docTitle),7,1) as sstu, if(locate('MD', docTitle),6,1) as main, if(locate('inventory management', docTitle),5,1) as title, match(docTitle,docTitle) against('"$match ."') as relevance from content where match(docTitle, docContent) against('"$match ."' IN BOOLEAN MODE) order by sstu DESC, main DESC, title DESC, relevance DESC" 
                        </edit>
                        Last edited by raf; Feb 17, 2004, 06:46 AM.
                        Posting guidelines I use to see if I will spend time to answer your question : http://www.catb.org/~esr/faqs/smart-questions.html

                        Comment

                        Working...
                        X