`

《Pro Oracle SQL》--Chapter 5--5.4 Questions about the Question

阅读更多

5.4 Questions about the Question   关于问题的问题 (page 132)


     Developing new queries is usually easier than trying to modify a query that someone else has already written.  This is because when you write a brand new query, you don’t have to worry about interpreting the meaning of someone else’s code.  But, what you do have to worry about is the query specification.  Whether it’s detailed or not, it’s your job to make sure you code the SQL to deliver the answer to the question you’ve been handed.  
     开发新的查询通常要容易于试图修改别人已经写好的查询。这是因为当你写一个全新的查询,你不必关注他人代码的解释含义。但是你必须关注于查询的需求规范。它是否详细,确保你编写的SQL能给予问题答案,这是你的工作。
     Let’s walk through an example of how this process might work.  I’ll play the role of business user and you play the role of application developer.  My request is for you to write a query that provides a list of employees who have held more than one job in the company.  I’d like the output to display only the employee_id and a count of how many total jobs they’ve held.  Listing 5-1 shows the query you create to satisfy my request.
     让我们来通过一个例子展示这个过程。我扮演业务用户,你是一个应用开发人员。我请求你写一个查询,提供一个列表展示在公司里所有干过多于一份工作的人。我希望输出仅显示他的employee_id和所作工作的小计。列表5-1显示你创建的查询满足我的需求。
Listing 5-1. List of Employees Who Have Held Multiple Jobs
SQL> select employee_id, count(*) job_ct
  2  from job_history
  3  group by employee_id
  4  having count(*) > 1;

    EMPLOYEE_ID          JOB_CT
--------------- ---------------
            101               2
            176               2
            200               2

3 rows selected.  

 

    That was pretty simple, right?  You complete your testing and deliver this code.  However, I quickly
come back to you and say it’s wrong.  The list is missing some employees who have held more than one job.  I had been manually producing this list previously and know that the following list of employees should be displayed:  101, 102, 114, 122, 176, 200, and 201.  
这是不是太简单了?你完成你的测试,提交了代码。然而,我立马返回告诉你,它是错的。结果列表中少了一些至少干过一份工作的雇员。我之前已经手动的查询过结果,且知道下列人员101,102,114,112,176,200和201是应该出现在结果列表中的。
What went wrong?  This seemed like a fairly simple query, didn’t it? It went wrong because the
solution was developed without any questions being asked.  By not asking questions, you made some
assumptions (whether you realized it or not).  The assumptions you made caused you to write the query as you did.  The way you wrote the query didn’t provide the result I was expecting.  Admittedly, I could have helped you out more by giving you a more detailed specification or providing you with the
expected result set initially. Regardless of the quality of the query specification you have, never forget
that it is your job to ferret out the details and make sure you develop code that specifically answers the real question being asked.
     哪 里出了问题?这是一个非常简单的查询,不是么?之所以会出错,就是因为做出的解答没有被题及任何问题。如果不问问题,你就事先做出了某种假设(无论你有意 还是无意)。你做出的假设使你写出上述查询,它不能给我期望的结果。诚然,我应该一开始就给予你更多帮助,给你更为详细的需求规范或者提供给你期望的结果 集。
     Let’s start over.  The query specification I provided asked you to write a query that provides a list of employees who have held more than one job in the company, displaying the employee_id and a count of how many total jobs they’ve held.  While at first glance, the query request seems straightforward, the apparent simplicity hides several nuances that you won’t be aware of unless you ask some questions.  
     让我们重新开始。我提供的查询需求要求你写一个查询提供一个列表,展示在一家公司做过两份工作的雇员的employee_id且小计共几份工作。初看,这个查询似乎很直白,表面的简单隐藏了几个微妙之处,除非你提出问题才能弄明白。
The following list includes a few questions that you might have asked to help clarify the request:

  •  Should the query consider the employee’s current job as part of the count or only jobs held other than the current position?
  •   Where is the data that will satisfy the request stored, i.e. in one table or several?
  •   What is the data model and can I get a copy of the data dictionary or an ERD (entity relationshipdiagram) if one exists?
  •   Is there an expected typical size of the result set?
  •   How is the data stored?
  •   Must this query meet any response time SLA (service level agreement)?
  •   How frequently will the query execute?

下面列出了你可能需要帮助澄清需求的一些问题。

  • 应该考虑雇员的当前职位作为小计的一部分么?还是要排除当前职位。
  • 满足需要的数据存放在哪?例如,一张表还是多张表?
  • 数据模型是什么或者我能否复制数据字典或者一个ERD(实体关系图)是否存在?
  • 这是不是结果集预期的大小?
  • 数据如何存储的?
  • 查询是否要满足任何SLA(服务级别协议)响应时间?
  • 查询执行的频率

    If you receive a request from a business user, it might not be feasible to ask them all of these
questions.  Asking a business user about which tables contain the data they want or if you could get a
copy of the ERD might be answered with blank stares as those things aren’t typically in the domain of
the business user’s knowledge.  It is important to note whether the request for the query is coming from an application user or an application technical architect.  Many of these questions can only be answered by someone with an understanding of the application from the technical perspective.  Therefore, learn who the “go to” people are when you need to get detailed technical information.  This may be the DBA, the data architect, or perhaps a developer that initially worked on other code in this particular application.  Over time, you’ll build the knowledge you need to determine many of these answers for yourself, but it’s always good to know who the subject matter experts are for any application you support.
    如 果你接到业务用户一个需求,可能不太容易问他们所有这些问题。问一个业务用户哪些表包含他们想要的数据或你是否能得到ERD,得到的回答可能一脸惘然。因 为这些东西通常不在业务用户的知识领域中。重要的是要意识到:你的需求来至于一个业务用户还是一个应用的技术架构者。许多这些问题只能由一个从技术方面理 解这个应用的人来回答。因此,当你要获得详细的技术信息时,先要知道谁是你要问的人。他可能是一个DBA,一个数据架构师,或者可能是一个最初编写过这个 特殊系统其他代码的开发人员。这期间,你必须自己搞清楚这些问题的答案,才能构建对系统的认知。但是知道谁是你支持应用的课题专家总是有益的。
    Getting answers to the first three questions are the most important, initially.  You must know more
than just a description of what the query needs to ask for.  Being as familiar as possible with the data
model is the starting point.  When writing the original query, an assumption was made that the only
table containing information needed to satisfy the query was the job_history table.  If you had asked the first three questions, you’d have found out that the job_history table is truly a history table; it only
contains historical data, not current data.  The employees table contains a job_id column that holds the employee’s current position information.  Therefore, in order to determine how many positions an
employee has held, you need to get their current job from the employees table and their previous jobs
from the job_history table.
    一 开始,就搞清楚前三个问题是最重要的。你必须知道描述查询需要做什么以外的更多的信息。尽可能的熟悉数据模型是起点。前面写的那个查询,做了一个假设,包 含满足查询的必要信息的表只有job_history。如果你问了前三个问题,你将发现job_history表实际上是一个历史记录表。它仅仅包含历史 数据而非当前数据。employees 表包含job_id列,表示雇员的当前职位信息。因此,为了确定一个雇员迄今做过多少职位,你需要从employees表获取当前工作职位信息还有从 job_history表获取他们的历史工作职位信息。
With this information, you might rewrite the query as shown in Listing 5-2.
有了这些信息,你才可能重写像List 5-2样的查询

Listing 5-2. The Rewritten Employee Jobs Query
SQL> select employee_id, count(*) job_ct
  2  from
  3  (
  4  select e.employee_id, e.job_id
  5  from employees e
  6  union all
  7  select j.employee_id, j.job_id
  8  from job_history j
  9  )
10  group by employee_id
11  having count(*) > 1;
EMPLOYEE_ID          JOB_CT
--------------- ---------------
            102               2
            201               2
            101               3
            114               2
            200               2
            176               2
            122               2

7 rows selected.

    It looks like the answer is correct now.  It’s at this point that the answers to the next questions come into play.  Knowing what to ask for is certainly important and the first three questions helped me
describe the data the query needed to return.  Most people would stop here.  However, knowing how to get the data I’m after is just as important.  This is contrary to what most of us are taught about relational databases in general.  In one of my college courses on relational database management systems, I was taught that SQL is used to access data.  There is no requirement that I need to know anything about where or how the data is stored or how the RDBMS processes a SQL statement in order to access that data.  In other words, SQL is used to describe what will be done, not how it will be done.  
    这 样看上去答案现在是正确了。接下来就要思考下一个问题了。知道问什么自然是重要的,前三个问题帮我确定查询需要返回的数据范围。大多数人就此为止。然而, 知道之后怎样获取数据同样重要。这与我们一般所学的关于关系数据库的认识截然不同。我的关系数据库管理系统的大学课程就说SQL是用来访问数据的。没有要 求我需要知道任何关于数据是怎样存放,存放在哪,或者RDBMS是怎样处理一条SQL语句来访问那些数据的。换句话说,SQL是用来描述做什么的,而非怎 么做的。
    The reality is that knowing how your data is stored and accessed is just as important as describing
the data your query retrieves.  Let’s say you need to book a trip from Washington, DC to Los Angeles, CA. You call your travel agent to handle the booking for you.  If the only information you provide to the agent is your departure and arrival cities and that you want the least expensive fare possible, what could happen?  Well, it’s possible that the least expensive fare involves leaving at 5:30 A.M. from Washington, DC then making stopovers in Atlanta, Chicago and Dallas before finally connecting into Los Angeles at midnight (Los Angeles time, which means it would be 3 A.M. in DC time).  Would that be OK with you?  Probably not.  Personally, I’d be willing to pay extra to get a direct flight from DC to Los Angeles.  Think about it.  If you could get a direct flight leaving from DC at 8 A.M. and arriving into Los Angeles at 10 A.M., wouldn’t it be worth quite a bit to you versus making multiple stopovers and spending nearly a full day to complete the trip?  And what if the direct flight only cost 10% more than nightmare flight?  Your original request to book the least expensive fare didn’t include any conditions under which you’d be willing to pay more.  So, your request was satisfied but you probably won’t be happy with the outcome.
    事实上,知道你的数据是如何存储和访问的和描述你的查询所检索的数据同样重要。比如说,你需要定一个旅行计划,从华盛顿到洛杉矶。你 让你的旅行代理人去给你定机票。如果你只告诉他你的起点和终点城市且你想要尽可能便宜的路费,那将会发生什么?哦,最便宜的路费可能涉及路程是早上 5:30从华盛顿出发,然后中途停留于亚特兰大,芝加哥,达拉斯,最终在半夜到达洛杉矶(洛杉矶时间,也就是华盛顿时间下午3点)。这样OK么?可能不 行,个人而言,我更愿意多花点钱从华盛顿直飞洛杉矶。想想看,如果你能够直飞的话,早上8点从华盛顿出发,早上10点就到达洛杉矶了。那不是比你多次中 转,花费一整天的时间完成整个路途更有价值?如果直飞仅仅比半夜飞贵10%了?你最初的定最便宜的票的要求没有包含任何其它的愿意多花费的条件。这样你的 要求得到满足但是结果可能让你不爽。
    Knowing how the data is stored and how it should be accessed will ensure that your query not only
returns the correct answer, but does so as quickly and efficiently as possible.  That’s why questions like “How big is the typical expected result set?”, “How is the data stored?”, and “How fast and how
frequently does it need to execute?” must be asked.  Without those answers, your query may get the
correct answer but still be a failure due to poor performance.  Simply getting the right result isn’t
enough.  To be successful, your query must be right and it must be fast.
    知道数据是怎么存储的且它将是怎么访问的,将确保你的查询不仅能返回正确的答案,而且能尽可能的做的快而有效。这就是为什么诸如“典型的期望的结果集有多 大?”,“数据是怎么存储的?” 和 “它需要运行的多快,多久运行一次?”这样的问题必须问及的原因。没有回答这些,你的查询可能能查出正确的结果,但是由于糟糕的性能而失败。仅仅获得正确 的结果是不够的。要获得成功,你的查询必须是正确的且必须是快的。

分享到:
评论
1 楼 Branding 2012-01-06  
 

相关推荐

Global site tag (gtag.js) - Google Analytics