I am new to the community and I have a question that I hope someone can answer which is that I have a query that joins 3 tables. If I only select the columns in table1 the correct number of records are returned but when when I select the columns from table2 and table3 as shown below the rows mulitple exponentially even though I am using the on keyword. Does anyone have any idea why that it happening?
select a.fname, a.lname, a.address, a.phone, b.lname, b.address2, c.lname, c.phone2 from table1 a inner join table2 b on a.lname = b.lname inner join table3 c on a.lname = c.lname order by a.lname
"A problem well stated is a problem half solved." -- Charles F. Kettering
Please post DDL, so that people do not have to guess what the keys, constraints, Declarative Referential Integrity, data types, etc. in your schema are. If you know how, follow ISO-11179 data element naming conventions and formatting rules. Temporal data should use ISO-8601 formats. Code should be in Standard SQL as much as possible and not local dialect.
Sample data is also a good idea, along with clear specifications. It is very hard to debug code when you do not let us see it. If you want to learn how to ask a question on a Newsgroup, look at: http://www.catb.org/~esr/faqs/smart-questions.html
Are you absolutely sure that is the same query you ran and you didn't have a DISTINCT in there?
Can you post the other query please.
Given the exact same FROM clause you should be getting the same number of rows because its the joins that bring stuff back and you've no WHERE so there is no filtering and the SELECT does no filtering.
Tony.
"Sharon" <Sha...@discussions.microsoft.com> wrote in message
> I am new to the community and I have a question that I hope someone > can answer which is that I have a query that joins 3 tables. If I only > select > the columns in table1 the correct number of records are returned but when > when I select the columns from table2 and table3 as shown below the rows > mulitple exponentially even though I am using the on keyword. Does anyone > have any idea why that it happening?
> select a.fname, a.lname, a.address, a.phone, > b.lname, b.address2, > c.lname, c.phone2 > from table1 a > inner join table2 b > on a.lname = b.lname > inner join table3 c > on a.lname = c.lname > order by a.lname
Since you join on the lname column, that would mean that there are duplicate lname values in one or more of the tables. You can easily check that in each table with a query like this:
SELECT lname, COUNT(*) AS cnt FROM Table GROUP BY lname HAVING COUNT(*) > 1;
To solve this there are different approaches based on your data. If the rows contain exact duplicate data then you can use DISTINCT:
SELECT DISTINCT a.fname, ...
Also, maybe the query needs to join the tables on additional columns, fname for example:
SELECT ... FROM Table1 AS A JOIN Table2 AS B ON A.lname = B.lname AND A.fname = B.fname JOIN Table3 AS C ON A.lname = C.lname AND A.fname = C.fname
You are right, the actual query says "Select Distinct(Description), ... " but how can that Distinct keyword cause so many rows to be returned when I add the column names from the other tables to the select statement?
> I am new to the community and I have a question that I hope someone > can answer which is that I have a query that joins 3 tables. If I only select > the columns in table1 the correct number of records are returned but when > when I select the columns from table2 and table3 as shown below the rows > mulitple exponentially even though I am using the on keyword. Does anyone > have any idea why that it happening?
> select a.fname, a.lname, a.address, a.phone, > b.lname, b.address2, > c.lname, c.phone2 > from table1 a > inner join table2 b > on a.lname = b.lname > inner join table3 c > on a.lname = c.lname > order by a.lname
You are right, the actual query has says "Select Distinct(Description), ...." but how can the Distinct keyword cause all of those rows to mulitply when I add the columns from the other tables?
> Are you absolutely sure that is the same query you ran and you didn't have a > DISTINCT in there?
> Can you post the other query please.
> Given the exact same FROM clause you should be getting the same number of > rows because its the joins that bring stuff back and you've no WHERE so > there is no filtering and the SELECT does no filtering.
> > I am new to the community and I have a question that I hope someone > > can answer which is that I have a query that joins 3 tables. If I only > > select > > the columns in table1 the correct number of records are returned but when > > when I select the columns from table2 and table3 as shown below the rows > > mulitple exponentially even though I am using the on keyword. Does anyone > > have any idea why that it happening?
> > select a.fname, a.lname, a.address, a.phone, > > b.lname, b.address2, > > c.lname, c.phone2 > > from table1 a > > inner join table2 b > > on a.lname = b.lname > > inner join table3 c > > on a.lname = c.lname > > order by a.lname
On Wed, 4 Nov 2009 12:34:08 -0800, Sharon wrote: >Hi Tony
> You are right, the actual query says "Select Distinct(Description), ... >" but how can that Distinct keyword cause so many rows to be returned when I >add the column names from the other tables to the select statement?
Hi Sharon,
The DISTINCT keyword doesn't add extra rows, it removes duplicate rows. And for determining duplicates, the entire SELECT list is used. So if you have a query without DISTINCT that would return
FirstName --------- George George John
than adding DISTINCT would remove one of the duplicated rows with first name equal to George. However, if the query without DISTINCT would have returned
FirstName LastName --------- ----------- George Harrison George Michael John Lennon
adding DiSTINCT would not remove anything at all, because George Michael is not a duplicate of George Harrison.
So when you added extra columns to the SELECT, rows that were duplicates in the old, smaller subset of columns now no longer are duplicates so they are no longer removed.
> You are right, the actual query has says "Select Distinct(Description), > ...." but how can the Distinct keyword cause all of those rows to mulitply > when I add the columns from the other tables?
> Sharon
> "Tony Rogerson" wrote:
>> hi Sharon,
>> Are you absolutely sure that is the same query you ran and you didn't >> have a >> DISTINCT in there?
>> Can you post the other query please.
>> Given the exact same FROM clause you should be getting the same number of >> rows because its the joins that bring stuff back and you've no WHERE so >> there is no filtering and the SELECT does no filtering.
>> > I am new to the community and I have a question that I hope someone >> > can answer which is that I have a query that joins 3 tables. If I only >> > select >> > the columns in table1 the correct number of records are returned but >> > when >> > when I select the columns from table2 and table3 as shown below the >> > rows >> > mulitple exponentially even though I am using the on keyword. Does >> > anyone >> > have any idea why that it happening?
>> > select a.fname, a.lname, a.address, a.phone, >> > b.lname, b.address2, >> > c.lname, c.phone2 >> > from table1 a >> > inner join table2 b >> > on a.lname = b.lname >> > inner join table3 c >> > on a.lname = c.lname >> > order by a.lname
Does that mean of the following 2 queries that the first one would only compare the lastname fields to determine if the row is a duplicate and the second query compare the lastname and firstname and address fields to determine if the row is a duplicate?
select distinct(lastname), firstname, address or select distinct lastname, firstname, address
"Hugo Kornelis" wrote: > On Wed, 4 Nov 2009 12:34:08 -0800, Sharon wrote:
> >Hi Tony
> > You are right, the actual query says "Select Distinct(Description), ... > >" but how can that Distinct keyword cause so many rows to be returned when I > >add the column names from the other tables to the select statement?
> Hi Sharon,
> The DISTINCT keyword doesn't add extra rows, it removes duplicate rows. > And for determining duplicates, the entire SELECT list is used. So if > you have a query without DISTINCT that would return
> FirstName > --------- > George > George > John
> than adding DISTINCT would remove one of the duplicated rows with first > name equal to George. However, if the query without DISTINCT would have > returned
> FirstName LastName > --------- ----------- > George Harrison > George Michael > John Lennon
> adding DiSTINCT would not remove anything at all, because George Michael > is not a duplicate of George Harrison.
> So when you added extra columns to the SELECT, rows that were duplicates > in the old, smaller subset of columns now no longer are duplicates so > they are no longer removed.
> Are you absolutely sure that is the same query you ran and you didn't have a > DISTINCT in there?
> Can you post the other query please.
> Given the exact same FROM clause you should be getting the same number of > rows because its the joins that bring stuff back and you've no WHERE so > there is no filtering and the SELECT does no filtering.
> > I am new to the community and I have a question that I hope someone > > can answer which is that I have a query that joins 3 tables. If I only > > select > > the columns in table1 the correct number of records are returned but when > > when I select the columns from table2 and table3 as shown below the rows > > mulitple exponentially even though I am using the on keyword. Does anyone > > have any idea why that it happening?
> > select a.fname, a.lname, a.address, a.phone, > > b.lname, b.address2, > > c.lname, c.phone2 > > from table1 a > > inner join table2 b > > on a.lname = b.lname > > inner join table3 c > > on a.lname = c.lname > > order by a.lname
No, both of those queries would do the same thing. Distinct would apply to all columns in both cases and compare the lastname and firstname and address fields to determine if the row is a duplicate.
Tom
"Sharon" <Sha...@discussions.microsoft.com> wrote in message
> Does that mean of the following 2 queries that the first one would only > compare the lastname fields to determine if the row is a duplicate and > the > second query compare the lastname and firstname and address fields to > determine if the row is a duplicate?
>> On Wed, 4 Nov 2009 12:34:08 -0800, Sharon wrote:
>> >Hi Tony
>> > You are right, the actual query says "Select Distinct(Description), >> > ... >> >" but how can that Distinct keyword cause so many rows to be returned >> >when I >> >add the column names from the other tables to the select statement?
>> Hi Sharon,
>> The DISTINCT keyword doesn't add extra rows, it removes duplicate rows. >> And for determining duplicates, the entire SELECT list is used. So if >> you have a query without DISTINCT that would return
>> FirstName >> --------- >> George >> George >> John
>> than adding DISTINCT would remove one of the duplicated rows with first >> name equal to George. However, if the query without DISTINCT would have >> returned
>> FirstName LastName >> --------- ----------- >> George Harrison >> George Michael >> John Lennon
>> adding DiSTINCT would not remove anything at all, because George Michael >> is not a duplicate of George Harrison.
>> So when you added extra columns to the SELECT, rows that were duplicates >> in the old, smaller subset of columns now no longer are duplicates so >> they are no longer removed.
On Thu, 5 Nov 2009 16:20:01 -0800, Sharon wrote: >Hello Hugo and thank you for the explanation
> Does that mean of the following 2 queries that the first one would only >compare the lastname fields to determine if the row is a duplicate and the >second query compare the lastname and firstname and address fields to >determine if the row is a duplicate?
As Tom already said, they are the same. The parentheses are not associated with the DISTINCT keyword - the first query is considered as starting with SELECT (lastname), firstname, address [where the parentheses are superfluous but syntactically correct], and then with the DISTINCT keyword added.
And that is a good thing. After all, imagine we have these two people in the database: White, George, 1334 Main St.; and White, Burt, 6339 Small Ave. If the former query would look only at last names for determining duplicates, it should return one row only - but what values should it return for the firstname and address columns?
There are two forms of the SELECT [ALL | DISTINCT]; the SELECT ALL keeps duplicate rows and the SELECT DISTINCT removes redundant duplicate rows, leaving one copy. SELECT by itself is a shorthand for SELECT ALL. You will find these keywords used in other parts of SQL for this sort of thing.
As a generalization, a SELECT DISTINCT is a symptom of a bad query with an unexpected CROSS JOIN hidden in it or a non-normalized schema that has redundant duplicates. I would guess the later, since you have three tables with the same data. It looks like attribute splitting but you did not post any DDL, all we do is guess.
> There are two forms of the SELECT [ALL | DISTINCT]; the SELECT ALL > keeps duplicate rows and the SELECT DISTINCT removes redundant > duplicate rows, leaving one copy. SELECT by itself is a shorthand for > SELECT ALL. You will find these keywords used in other parts of SQL > for this sort of thing.
> As a generalization, a SELECT DISTINCT is a symptom of a bad query > with an unexpected CROSS JOIN hidden in it or a non-normalized schema > that has redundant duplicates. I would guess the later, since you > have three tables with the same data. It looks like attribute > splitting but you did not post any DDL, all we do is guess.