Wednesday, January 20, 2010

Random Stimulus - Seed Vs Transactions

Anyone can easily understand how Constrained Random Coverage Driven Verification works by referring to good text books or attending training courses. But you learn certain things only when you do real verification. Testsuite optimization is one of those gray areas for beginners.

I do not want to write new articles just for the sake of keeping my blog alive. Sometimes I take my own time and prepare myself to write about new topics, especially the unusual but important ones. This topic, "Seed Vs Transactions", demands not only my own experience but also what I have learned from the experience of others. I have interacted with many of my customers and peers and collected a lot of details about things like seeds, number of transactions, redundant tests, testcase run time etc.

Verification folks who are new to CDV usually ask which option [1 or 2] is better for achieving 100% coverage:

[1] Minimum number of seeds and maximum transactions per testcase
[2] Minimum number of transactions per testcase and many seeds

Either way you can improve the coverage, but neither is guaranteed to get you to 100% coverage, as both approaches reach a saturation limit at some point.

Consider the productivity, meaning the improvement in coverage. A testcase run with 5 seeds could give the same coverage as the same testcase run with 500 seeds. Similarly, a testcase generating 1000 transactions could give the same coverage as one generating 10000 transactions, or the extra transactions might improve the coverage by hardly 1%.
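The saturation effect can be tried out with a small Python sketch. This is only a toy model: the function names, the 64 coverage bins and the skewed distribution are my assumptions for illustration, not measurements from any real DUT.

```python
import random

def run_test(seed, n_transactions, n_bins=64):
    """Return the set of coverage bins hit by one test run (toy model)."""
    rng = random.Random(seed)
    # Hypothetical stimulus: a skewed distribution, so the high bins are rare.
    return {min(n_bins - 1, int(rng.expovariate(1.0) * 8))
            for _ in range(n_transactions)}

def coverage_by_seeds(n_seeds, n_transactions, n_bins=64):
    """Merge the bins hit by seeds 0..n_seeds-1 and return coverage in percent."""
    hit = set()
    for seed in range(n_seeds):
        hit |= run_test(seed, n_transactions, n_bins)
    return 100.0 * len(hit) / n_bins

if __name__ == "__main__":
    for seeds in (5, 50, 500):
        print(seeds, "seeds x 100 transactions:", coverage_by_seeds(seeds, 100))
    for txns in (1000, 10000):
        print("1 seed x", txns, "transactions:", coverage_by_seeds(1, txns))
```

In this model, adding seeds or transactions can never reduce the coverage, but the gain per extra run shrinks quickly because the rare bins stay rare. The exact numbers depend entirely on the assumed distribution.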

It's always good to consider various factors like the simulation run time, DUT features, unusual bugs etc. while defining the testcase, rather than focusing only on reaching the coverage goals.

Let me briefly explain how these factors influence the choice between seeds and transactions.

* Run time - We decide the number of transactions based on the simulation run time of a testcase. Run time is a critical factor because it decides how quickly we can reproduce a bug. Once we hit a bug, we want to rerun the testcase and reproduce it to debug the design. Ideal testcases consume 10-15 minutes of run time.

* Longer tests - Also note that shorter testcases are not always the ideal ones. Sometimes we need to pump more transactions into the DUV to reach its deeper states. Features like FIFO overflow, Suspend Data, Loop around etc. can be verified only by running longer testcases.

* Effective Seeds

If you are looking for shorter testcases, running hundreds of seeds would help, but you still need to identify the effective test/seed pairs. One should measure the productivity of each testcase and mark the redundant tests as low ranking tests.
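One simple way to measure the productivity of test/seed pairs is a greedy ranking by newly covered bins. The Python sketch below assumes that you can export, per run, the set of coverage bins it hit. The function name and the run names are hypothetical, and commercial tools provide their own ranking features.

```python
def rank_tests(coverage_by_test):
    """Greedy ranking: repeatedly pick the run that adds the most new bins.

    coverage_by_test maps a test/seed name to the set of bins it hit.
    Returns (ranked, redundant): ranked is a list of (name, new_bins_added),
    redundant lists the runs that add nothing on top of the ranked ones.
    """
    remaining = dict(coverage_by_test)
    covered = set()
    ranked, redundant = [], []
    while remaining:
        # sorted() makes tie-breaking deterministic (alphabetical order).
        name = max(sorted(remaining), key=lambda t: len(remaining[t] - covered))
        gain = len(remaining[name] - covered)
        if gain == 0:
            redundant = sorted(remaining)  # everything left is redundant
            break
        ranked.append((name, gain))
        covered |= remaining.pop(name)
    return ranked, redundant

if __name__ == "__main__":
    runs = {
        "smoke/seed1": {0, 1, 2},
        "smoke/seed2": {1, 2},
        "burst/seed7": {2, 3, 4, 5},
        "burst/seed9": {3, 4},
    }
    print(rank_tests(runs))
```

The runs in the redundant list are the candidates for low ranking. They can still be kept in a weekly regression while the ranked ones run nightly.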

* Experience

Learn from others' experience. If you are working on a new version of a legacy DUT, it's good to interact with the verification folks who verified the legacy design. They can tell you exactly what worked, what did not work, what kind of bugs they found, when they found them, how they debugged them etc. Why don't you at least scan through the postmortem report of the legacy verification?

Monday, January 4, 2010

Debugging FSM Dead Lock

Most verification engineers start their careers as HVL experts. They realise the importance of Assertion Based Verification only when they become seasoned verification engineers, especially when they take complete ownership of RTL sign-off.

We all know that assertions are good for implementing protocol checkers. But very few in the industry know how to use assertions to debug design issues. I am writing this article mainly to motivate designers to learn ABV and to help verification folks debug the design easily.

Whether you are a design or verification engineer, you should know this: "Debugging testcase failures becomes a nightmare if there are no assertions embedded in the design".

Let us take a look at controller verification. A complex controller is usually implemented as an FSM composed of many states. It's very important to make sure that the FSM does not get into any dead lock condition. Dead lock means the FSM loops in a particular state or set of states forever due to an unexpected condition. How can you debug when the simulator hangs due to an FSM dead lock?

You would get only a timeout error, and only if your testbench is smart enough to kill the simulation using some watchdog timers. Even then, it does not tell you why the hang happened.

But one can easily identify even this kind of FSM dead lock issue using an assertion. Assertion languages have an operator called 'eventually', and using its strong flavour one can easily write an assertion that captures this bug.

PSL Example:

Property_Dead_Lock_Check : assert always ( (INIT_STATE) -> eventually! (INIT_STATE) ) @ (negedge clock);

Since the simulator hangs, you have to kill the simulation process which is running forever. The easiest way is 'CTRL+C'. When the simulation is terminated, this assertion fails and produces an error message, and more importantly it tells you exactly which IP, which cluster and which instance of the FSM module has the issue. What else do you want to debug your SoC?
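To see why a strong 'eventually' can only fail when the simulation ends, here is a small Python model of such a checker. It is a sketch of the idea and not how any simulator implements PSL. The class name, the state names and the instance path are hypothetical, and the trigger used here (the FSM has left INIT) is my assumption for the illustration.

```python
class StrongEventuallyChecker:
    """Toy model of 'always (trigger -> eventually! target)', sampled per clock."""

    def __init__(self, name):
        self.name = name            # hierarchical instance path, for the report
        self.pending_since = None   # cycle of the oldest unmet obligation

    def sample(self, cycle, trigger, target):
        if target:
            self.pending_since = None      # every open obligation is now met
        elif trigger and self.pending_since is None:
            self.pending_since = cycle     # start waiting for the target

    def end_of_simulation(self):
        """A strong obligation still pending when simulation stops is a failure."""
        if self.pending_since is not None:
            return (f"{self.name}: obligation from cycle "
                    f"{self.pending_since} never fulfilled")
        return None

def check_trace(name, states):
    """Run the checker over a list of FSM states, one state per clock."""
    checker = StrongEventuallyChecker(name)
    for cycle, state in enumerate(states):
        checker.sample(cycle, trigger=(state != "INIT"), target=(state == "INIT"))
    return checker.end_of_simulation()

if __name__ == "__main__":
    print(check_trace("soc.cluster0.ctrl_fsm", ["INIT", "REQ", "WAIT", "WAIT"]))
    print(check_trace("soc.cluster1.ctrl_fsm", ["INIT", "REQ", "DONE", "INIT"]))
```

While the simulation is running, the checker cannot tell a slow FSM from a dead locked one. Only when the run is stopped does the pending obligation turn into a failure, and the report carries the instance name with it.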

Saturday, November 7, 2009

Are you scared of Lay-Offs?

If you say "Of course, everybody is scared of layoffs", then I would say "You are probably wrong". You get this insecure feeling only when you work on outdated technologies and do the same thing you were doing 10 years back. This happens to young folks too, when they do not spend time updating their knowledge of emerging technologies.

People who learn continuously and try new technologies are treated as STARs in their organisations. They do not worry about recessions and layoffs because they are in huge demand in the industry. These people are SMART. They always think about how they can improve their Market Value.

If you are working in the VLSI domain, especially in functional verification, you should know about the latest verification methodologies and technologies. Most engineers run regressions and spend most of their time analysing coverage reports. They wrongly assume that they are verifying the chips. Actually they are managing the regressions and reporting the bugs to the designers.

To help you understand how much you know about verification, I would like to ask you a few questions:

[1] Have you ever created the verification plan?
You can't do anything without a plan, whether it's about designing the chip or verifying it. During the planning process we identify various things like the key features of the DUV, beta features, how many assertions are needed, how to validate the DUV protocols etc.

[2] Have you ever architected the testbenches?
Verification engineers mostly use HVLs to implement the testbenches. Usually the testbench is composed of various verification components like generators, monitors, scoreboards, receivers etc.

The architecture of the testbench completely depends on the design and the kind of verification you do.


[3] Have you created the coverage model?
One can measure the quality of the verification by looking at the functional coverage values. But achieving 100% coverage does not guarantee high quality verification. The quality of your verification depends completely on the completeness of your coverage model.


[4] Have you defined assertions to validate the DUV protocol?
You cannot verify everything through the data integrity checks that you do in the scoreboard. You have to define assertions to validate the control oriented behaviors. One can easily do white box verification using ABV, especially for the critical blocks in the chip.

The biggest challenge of chip level simulation is identifying the reason for a testcase failure. We spend too much time identifying the cause, which logic has bugs...


[5] Have you created the regression testsuite?
This is more about defining the testcases. One can define different kinds of testcases like random tests, corner case testcases and directed testcases. We create these testcases by changing the seeds, generating different kinds of transactions/scenarios and passing directed values.

If you feel that you haven't done any of the things I mentioned here, you really need to think about learning the Functional Verification process and SystemVerilog, the industry preferred IEEE standard Hardware Verification Language.





Thursday, February 5, 2009

How to live with Legacy BFMs?

Every time I talk about class based verification environments, most of my customers ask about using HDL based legacy BFMs in SystemVerilog based testbenches. Many of my customers have even approached me to convert their legacy BFMs into SV based transactors. From my experience, I would say you can’t easily exclude the legacy VIPs/BFMs while architecting SV based TBs.

The SV based TB is built on Object Oriented Programming and uses *Classes*. Though class based testbenches are very complex, the actual challenge is building them using legacy BFMs. One needs to understand that Verilog modules can't be instantiated directly in a class based verification environment, because modules are static constructs and classes are dynamic constructs.

Usually the chip will have different kinds of standard interfaces that are driven by third party VIPs and internally developed BFMs. The VIPs from external vendors are typically encrypted. Here the challenge is that, if they are HDL based VIPs, you can't directly use them as transactors in your SV TB. You also can't rewrite them as transactors because you have access only to the user interface. In this case, the only possible way is to develop an SV wrapper on top of the VIP and convert it into a transactor.

If your chip uses some of your internally developed BFMs, you can easily re-architect them as transactors. In some cases writing an SV wrapper would be tougher and more time consuming than rewriting them as transactors from scratch. If you are sure that your BFM will be used by most of your other long term projects, then you may want to consider re-architecting it as a transactor.

To understand how to convert the module based BFMs into SV based transactors, please refer:

Verification Methodology Manual
Chapter 4: Testbench Infrastructure
- Ad-Hoc Testbenches
- Legacy Bus-functional Models

Monday, January 19, 2009

Accelerate your verification

Why does my regression testing consume so much time? How can I reduce the run time of my regressions? Verification engineers usually ask EDA vendors these questions. They even push the EDA vendors to increase the speed of the simulator as much as possible. Yes, it's possible to tune the simulator engine and achieve more performance. But this performance gain can't reduce week long regressions to hours. After all, your simulator is software that executes everything sequentially on the processor.

Suppose you have increased the speed of the simulator to its maximum limit, and you are keeping the LSF load free while running the simulation. If you still feel that your simulation is dead slow, then you need to think about your verification methodology and analyze your simulation process.

The various other factors mentioned here also impact the performance of your simulator.

Design Abstraction
- Behavioral models do not work at the signal level. They run much faster than RTL and netlists.
- Well proven RTLs/IPs at the system level can be replaced by their functional models.
- Verilog netlists are better than VITAL.
- Memory modeling - Huge memories can be modeled efficiently using dynamic memories. Memories modeled in HDLs occupy the RAM.

Testing mode
- Are you running the simulation in performance mode or debug mode? Refer to my blog "Slow and Fast simulators".
- Assertions are good for 'White Box Verification' but they slow down the simulation.
  - Assertions at the subsystem/module level can be disabled for SoC verification, especially for regression testing.
  - Assertions that verify the interfaces, port connections and system protocols are sufficient to verify the system.
  - Based on the testcase failures, one can rerun the simulation in debug mode by enabling the assertions of the buggy modules.
- Avoid using simulator TCL commands to generate stimuli.
  - TCL commands like 'force' need read and write access permissions, which again reduces the performance.
  - Enable the read/write access permissions selectively, based on the need. Ex: PLI access to a particular design instance does not need read/write permission for the entire system.
- Avoid dumping log files and doing post processing.
  - The testbench should be self-checking.
- Avoid compiling the DUT for every testcase.

Verification Methodologies
- Avoid using HDLs for implementing complex testbenches.
  - Ex: A testbench that needs to generate transactions like frames, packets etc.
- Avoid using ad-hoc and traditional methodologies.
  - Ex: Using C/C++ language based protocol checkers/testcases, using PLIs, creating random stimuli using C functions etc.
- Use a standard HVL like SystemVerilog that works seamlessly with HDLs, without using PLIs. It provides DPI, OOP, Assertions, CDV etc., almost everything that you need for your verification ...

Most of the time we need to make use of legacy testbenches. I also agree that you can't easily move away from the existing verification stuff. But if the project is a long term project, I urge you to seriously consider re-architecting the testbench using the latest verification technologies. One needs to plan meticulously when introducing new methodologies. The best approach would be trying out these technologies on existing IPs and introducing them step by step.

Wednesday, December 31, 2008

Happy New Year

Another Day, another Month, another Year, another Smile, another Tear, another Winter, A Summer too, But there will never be Another You!

As we get into the year 2009, I offer my most heartfelt and respectful wishes to you, your team at work and your family.

Thank you for your great support and faith in me.

Cheers
Siva

Thursday, December 18, 2008

Verification Sign-Off


Your project manager wants to know how much more time you require to complete the simulation. His management wants to know when he can sign off the verification. Marketing folks are very keen on finding out the status of the product. The common objective of all these stakeholders is to release the product on time and meet the TTM. So everybody needs some information to track the status of the product.

In the verification world, usually the engineers begin their learning with the term *COVERAGE* and they explore more on Coverage Driven Verification [CDV] as they grow as seasoned verification engineers. Coverage information is mainly used to track the functional verification process. There are different kinds of coverage information, like functional and code coverage.

Functional coverage information indicates how well the functional features of the design are verified and code coverage measures the quality of the stimulus. One needs to define the coverage models and assertions manually to generate the functional coverage but code coverage is automatically generated by the simulator.

Instead of burying you in the definitions of all the coverage metrics, I would like to show how we make use of the coverage information to sign off the verification.

Let us take a small and powerful example, the *Synchronous Counter*, and explore CDV. Let us assume that we are verifying a 32-bit counter. We need to make sure that the counter counts through 2 to the power 32 [ 2 ^ 32 = 4294967296] possible values. One needs to spend billions of clock cycles to verify this design. Instead of running the counter through all possible values, why don't we load the counter with random values and verify its functionality?

Let us use a four-bit counter and explore how this concept really works.
---------------------------
3-2-1-0 --- Bit position
---------------------------
0000
0001
0010
0011
0100
......
0111
1000
......
1111
0000
------------------------
When the LSB [0th bit] is '1', the 1st bit toggles on the next active clock edge. Similarly, when the 0th and 1st bits are both '1', the 2nd bit toggles, and so on. If you look at this sequence carefully, you can understand that one can verify the counter easily by making each bit toggle.
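The toggle rule described above can be checked exhaustively for the four-bit counter with a few lines of Python. This is just a sketch to make the rule concrete, and the function names are mine.

```python
WIDTH = 4

def toggled_bits(value, width=WIDTH):
    """Return the bit positions that change when the counter increments."""
    nxt = (value + 1) % (1 << width)   # wraps from 1111 to 0000
    diff = value ^ nxt
    return [i for i in range(width) if (diff >> i) & 1]

def expected_toggles(value, width=WIDTH):
    """Bit i toggles exactly when all lower bits are '1' (bit 0 always toggles)."""
    def lower_mask(i):
        return (1 << i) - 1
    return [i for i in range(width) if value & lower_mask(i) == lower_mask(i)]

if __name__ == "__main__":
    for value in range(1 << WIDTH):
        assert toggled_bits(value) == expected_toggles(value)
        print(format(value, "04b"), "-> toggling bits:", toggled_bits(value))
```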

Now let us go back to the 32-bit counter. As the counter has billions of possible states, load the counter with random values and run. Every time you load the counter, run it for a clock cycle and check how the bits toggle. Random values are very effective at catching bugs quickly, especially when the design is very complex.
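Here is a Python sketch of the load-and-step idea. The faulty counter below is purely hypothetical (I assume bit 31 is stuck at '0') and is chosen because counting up from zero would need about two billion cycles to expose it, while a random load can expose it almost immediately.

```python
import random

WIDTH = 32
MASK = (1 << WIDTH) - 1

def reference_step(value):
    """Golden model: increment with wrap-around."""
    return (value + 1) & MASK

def buggy_step(value):
    """Hypothetical faulty DUT: bit 31 of the result is stuck at '0'."""
    return reference_step(value) & 0x7FFFFFFF

def find_mismatch(dut_step, n_loads, seed=1):
    """Load random values, step one clock, compare against the golden model.

    Returns (load_number, loaded_value) for the first mismatch, else None.
    """
    rng = random.Random(seed)
    for n in range(1, n_loads + 1):
        loaded = rng.getrandbits(WIDTH)
        if dut_step(loaded) != reference_step(loaded):
            return n, loaded
    return None

if __name__ == "__main__":
    print("good counter:", find_mismatch(reference_step, 1000))
    print("buggy counter:", find_mismatch(buggy_step, 1000))
```

Not every bug is this friendly to random loads. A fault that shows up only for one specific value is still unlikely to be hit, which is why the coverage information matters.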

To track the functional features of the counter, generate functional coverage by creating a coverage model with different bins, such as:

----------------------------------------------------------------------------------
BINS    VALUE            FEEDBACK INFO
----------------------------------------------------------------------------------
MIN     [0]              Whether counter works properly in Zero state
MID1    [1-1000]         Whether counter has gone through at least one of these values
MID2    [1001-10000]     Whether counter has gone through at least one of these values
...     [Create as many bins as required]
MAX     [4294967295]     Whether counter has reached its maximum value
----------------------------------------------------------------------------------

A bin counts a hit whenever a random value generated by the simulator falls within its range. If all of the bins have been hit at least once, then the functional coverage becomes 100%. But it does not mean that you have verified the counter completely. You also need to check whether all 32 bits toggled during simulation. When you generate random stimulus, there may be a lot of repetitions. So you need to analyze how much it is exciting the design.
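The bin counting can be modelled in Python as below. This is a sketch and not a SystemVerilog covergroup. The MID3 bin is a placeholder I added so that the hypothetical bins span the whole range, and the maximum value is taken as 2^32 - 1, the largest value a 32-bit counter can hold.

```python
MAX_VALUE = (1 << 32) - 1

# Hypothetical bin ranges (inclusive). A real model would split MID3 further.
BINS = {
    "MIN": (0, 0),
    "MID1": (1, 1000),
    "MID2": (1001, 10000),
    "MID3": (10001, MAX_VALUE - 1),
    "MAX": (MAX_VALUE, MAX_VALUE),
}

def functional_coverage(values, bins=BINS):
    """Count the hits per bin and return (hits, coverage in percent)."""
    hits = {name: 0 for name in bins}
    for v in values:
        for name, (lo, hi) in bins.items():
            if lo <= v <= hi:
                hits[name] += 1
    covered = sum(1 for count in hits.values() if count > 0)
    return hits, 100.0 * covered / len(bins)

if __name__ == "__main__":
    print(functional_coverage([5, 6, 7]))   # repetitions in one bin
    print(functional_coverage([0, 5, 5000, 123456, MAX_VALUE]))
```

Note that purely uniform 32-bit random values would almost never land in the MIN or MAX bins, so in practice the stimulus needs constraints or a few directed loads to close them.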

When code coverage metrics are enabled, especially toggle coverage, the simulator records whether each bit toggles from 0->1 and from 1->0. If all 32 bits toggle in both directions, then the toggle coverage becomes 100%. This coverage clearly indicates the quality of the random stimulus.
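Toggle coverage itself is simple to model. The Python sketch below assumes you have the sequence of counter values sampled on each clock. The function name is mine, and simulators collect this automatically when toggle coverage is enabled.

```python
def toggle_coverage(samples, width=32):
    """Percentage of bits that were seen toggling both 0->1 and 1->0."""
    rose = 0   # bits observed going 0 -> 1
    fell = 0   # bits observed going 1 -> 0
    for prev, cur in zip(samples, samples[1:]):
        rose |= ~prev & cur
        fell |= prev & ~cur
    mask = (1 << width) - 1
    both = rose & fell & mask
    return 100.0 * bin(both).count("1") / width

if __name__ == "__main__":
    print(toggle_coverage([0, 1, 0]))            # only bit 0 toggled both ways
    print(toggle_coverage([0, 0xFFFFFFFF]))      # every bit rose, none fell
    print(toggle_coverage([0, 0xFFFFFFFF, 0]))   # every bit toggled both ways
```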

Functional coverage is mainly for tracking the functional features of the design, whereas code coverage is mainly for checking the effectiveness of the testcases. So one needs to look at both kinds of coverage information to sign off the verification process. But whether to expect 100% coverage or something less depends on the design features, the complexity of the coverage models and metrics and, more importantly, the time that you can spend on the verification.

Obviously your project manager will be happy when you report 100% coverage, but he will be more excited when he releases the product on time, without re-spins.