Simply put, a trace is the path of a single user request through your application. Traces can be configured to start at any level, but the most common setup is to start traces at your top-level load balancer or webserver. This request may touch many different services, data stores, or machines within your system and still be considered a single trace. For example, a trace may be any of the following:

  • An http request to apache and php that causes 5 memcache lookups and 2 database calls.
  • An http request to an nginx load balancer, through to a tomcat instance, that makes parallel calls to an authentication service, a recommendation service, and a logging service. Any database access within these services is still part of the same request.
  • A custom trace around a celery task, which pulls images out of S3 and stores metadata about them in mysql.

Overhead from our lightweight tracing instrumentation is minimal; production deployments yield overhead of less than 1% of average request latency. The instrumentation adds approximately 40 µs (0.04 ms) per recorded trace event to the time taken to process a request. In terms of trace events, there is no such thing as a typical web request; a dynamically generated page might involve anywhere from tens to thousands of instrumented events, depending on the complexity of the work done by the web application. A request that records 250 events, for example, incurs roughly 10 ms of overhead.

Deploy TraceView

We created this checklist to give you exactly one place to go every time you want to start monitoring a new app in TraceView, and at the same time take you from zero to 100% successful with every component of TraceView. While traces start flowing into your dashboard only a few minutes after you complete base package installation, you shouldn’t consider TraceView fully deployed until you’ve also set up apdex, alerts, rum, and possibly synthetic monitoring too. We feel strongly that the performance insights you gain as a result will be well worth the time invested. Plus, all of those features come with your subscription anyway, so there’s no reason not to take advantage. If you have any questions or comments, please don’t hesitate to reach out. There are multiple ways to get in touch with us, and we’re happy to help!

In-app configuration: The deployment path outlined below can be completed entirely via the dashboard. However, the most common setup and management tasks in TraceView can also be accomplished via our public API.

Address the prerequisites

The performance data collected by our instrumentation modules is reported to a TraceView agent called Tracelyzer, which then forwards it to our collector servers over an SSH tunnel. Use the following table to open up the required ports and protocols.

Download files hosted by TraceView
  Direction: Outbound | IP: Dynamic | Port: 443
  Host: files.tv.solarwinds.com
  Any files we might ask you to download (the installation script, the Java API jar, etc.) are hosted here.

Download the TraceView agents
  Direction: Outbound | IP: Dynamic | Port: 80
  Hosts: yum.tv.solarwinds.com, apt.tv.solarwinds.com
  During initial setup and upgrades, a configuration script makes an outgoing http connection to apt.tv.solarwinds.com or yum.tv.solarwinds.com. If you have a particularly restrictive policy, you can open up port 80 to these servers temporarily.

Download the TraceView agent configs
  Direction: Outbound | IP: Dynamic | Port: 443
  Host: config.tv.solarwinds.com
  During initial setup, a configuration script makes an outgoing https connection to config.tv.solarwinds.com. If you have a particularly restrictive policy, you can open up port 443 to this server temporarily.

Report trace data back to TraceView
  Direction: Outbound | IP: Dynamic | Port: 443 (falls back to 2222)
  Host: customer-specific domain name
  Every machine running the TraceView agent makes an outgoing SSH tunnel to a collector on port 443. If 443 fails, the agent tries again on port 2222. The collector hostname can be found in collector.conf, on the line containing ‘COLLECTOR_HOST=’. On Linux machines collector.conf is in /etc/tracelyzer; on Windows it’s in C:\Program Files\AppNeta\TraceView.
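
If you’d like to verify the firewall rules before installing, a quick TCP check against each endpoint is enough to confirm the ports are open. A minimal sketch using only the Python standard library (the hostnames and ports come from the table above; the collector host is customer-specific, so it isn’t included):

    import socket

    # Hostnames and ports from the prerequisites table above.
    ENDPOINTS = [
        ("files.tv.solarwinds.com", 443),
        ("yum.tv.solarwinds.com", 80),
        ("apt.tv.solarwinds.com", 80),
        ("config.tv.solarwinds.com", 443),
        # The collector host is customer-specific: see COLLECTOR_HOST in
        # collector.conf, and remember that 443 falls back to 2222.
    ]

    for host, port in ENDPOINTS:
        try:
            # A completed TCP handshake means the outbound rule is open.
            with socket.create_connection((host, port), timeout=5):
                print("open    %s:%d" % (host, port))
        except OSError as err:
            print("blocked %s:%d (%s)" % (host, port, err))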

Static configuration

TraceView collectors, repositories, and the configuration download server are all behind an elastic load balancer, which means that their IPs are subject to change. If your organization requires static IPs:

  1. Contact customer care. They will provision a static IP for your collector.
  2. On Linux systems, run install_traceview.sh with the --static option. On Windows, the installer provides a ‘use a configuration server with a static IP’ option; for unattended installation you can specify static_ip=yes in config.inf.
  3. Use the following table to open up the required ports and protocols.
Download files hosted by TraceView
  Direction: Outbound | IPs: 54.243.61.243, 54.243.38.48 | Port: 443
  Host: files.static.appneta.com
  Any files we might ask you to download (the installation script, the Java API jar, etc.) are hosted here.

Download the TraceView agents
  Direction: Outbound | IPs: 54.243.61.243, 54.243.38.48 | Port: 80
  Hosts: yum-static.appneta.com, apt-static.appneta.com
  During initial setup and upgrades, a configuration script makes an outgoing http connection to apt-static.appneta.com or yum-static.appneta.com. If you have a particularly restrictive policy, you can open up port 80 to these servers temporarily.

Download the TraceView agent configs
  Direction: Outbound | IPs: 54.243.61.243, 54.243.38.48 | Port: 443
  Host: config-static.appneta.com
  During initial setup, a configuration script makes an outgoing https connection to config-static.appneta.com. If you have a particularly restrictive policy, you can open up port 443 to this server temporarily.

Report trace data back to TraceView
  Direction: Outbound | IP: Contact Support | Port: 443 (falls back to 2222)
  Host: customer-specific domain name
  Every machine running the TraceView agent makes an outgoing SSH tunnel to a collector on port 443. If 443 fails, the agent tries again on port 2222. The collector hostname can be found in collector.conf, on the line containing ‘COLLECTOR_HOST=’. On Linux machines collector.conf is in /etc/tracelyzer; on Windows it’s in C:\Program Files\AppNeta\TraceView.

Proxy settings

If you want to use a SOCKS proxy to communicate with our configuration and collector servers, you’ll need to add your proxy configuration details to a new file.

Web proxies like Squid will not forward all of the connections required by TraceView. If you do not have access to a fully SOCKS 4- or 5-compliant proxy, direct outgoing connections must be allowed.

Linux

  1. Create /etc/default/tracelyzer on Ubuntu/Debian or /etc/sysconfig/tracelyzer for RHEL/CentOS.
  2. Add the configuration options, each on its own line, as shown.

    • SOCKS5 is assumed if PROXY_PROTOCOL is not specified; supply ‘4’ for SOCKS4 proxies.
    • Username and password may be omitted if not required.
    • 1080 is assumed if no port is specified for PROXY_ADDRESS.

       # SOCKS configuration for TraceView
       PROXY_ADDRESS=username:password@proxy.server.com:1080
       PROXY_PROTOCOL=5
      

Windows

During Tracelyzer installation you’ll have a chance to configure a SOCKS5 proxy. The proxy config is stored in C:\Program Files\AppNeta\TraceView\tracelyzer.conf using the keys shown below, in case you ever need to revisit it.

proxy_host=
proxy_port=
proxy_username=
proxy_password=

Install instrumentation

TraceView requires the installation of several components which collect and report the information you see on the dashboard. The first two are an instrumentation library called liboboe and a daemon called Tracelyzer. Tracelyzer is the component that connects to the TraceView service and does the actual reporting on requests. The methods in liboboe are used by our instrumentation modules to capture event and timing information as a request traverses your app.

The next component is web server instrumentation that captures apache httpd and nginx activity. This is not strictly necessary to take advantage of TraceView, but it will give you the most complete picture of application performance. You’ll also get real-user monitoring right out of the gate, instead of having to make additional code changes later.

Finally, there is the application language instrumentation, which is the code that makes tracing across all of our supported platforms and components possible. Nearly all of our webserver and language instrumentation is available as an add-on or module; the exception is nginx, which has no pluggable module support, so we offer a drop-in replacement instead. In either case, all of the required TraceView code is already baked in, which means that you can get standard tracing up and running with just a few changes to your application environment.

Install instrumentation now…

Define a new app

The instrumented app will need to receive some requests before anything appears in the TraceView dashboard. In production, there is typically enough traffic already. In development or QA environments, you might have to generate some traffic manually. Just clicking around your web app will generally create enough requests to start seeing traces.
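
If clicking around isn’t convenient, a short script can generate the traffic for you. A minimal sketch, assuming your instrumented app is reachable at http://localhost; the paths are placeholders, so substitute pages from your own app:

    import time
    import urllib.request

    # Placeholder URLs; substitute pages from your own app.
    urls = [
        "http://localhost/",
        "http://localhost/login",
        "http://localhost/products",
    ]

    # A few dozen requests is plenty to produce visible traces.
    for _ in range(10):
        for url in urls:
            try:
                with urllib.request.urlopen(url, timeout=10) as resp:
                    print(resp.status, url)
            except OSError as err:
                print("failed:", url, err)
        time.sleep(1)  # pace the requests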

Before we go look at them, let’s use the API to make sure that the requests you generated were actually traced. The following call will return the ‘last_trace’ time reported for the specified host.

curl -G "https://api.tv.solarwinds.com/api-v2/hosts" -d key=access-key | python -m json.tool | grep -B 5 -A 1 hostname

Now let’s go look at those traces in the dashboard. By default, they’ll be in the ‘default’ app; this is where all traces go until you tell the dashboard to put them elsewhere.

  1. Find the ‘default’ app in the search box in the top left corner under the TraceView logo.
  2. Select the ‘view traces’ tab.
  3. Sort the ‘seen on’ column by most recent and you should see your traces.

traceview-default-app.png

Create a new app

In TraceView, an ‘app’ is the combination of a host plus entry layer. An entry layer is the first component to see the incoming request, usually your webserver. Initially TraceView doesn’t know how to meaningfully separate all the instrumented hosts and entry layers it sees, so it dumps everything into one bucket called ‘default’. It’s critical that you define separate buckets for each of your applications, so that the resulting performance analysis isn’t based on commingled data. In addition, some features, like total request count, aren’t available in the default app.

  1. Click on the settings icon next to your profile name in the upper right corner.
  2. From the drop-down select ‘app configuration’.
  3. Click ‘new app’.
  4. Name your app and click ‘save’.

traceview-app-config.png

Assign hosts to your app

Next you’ll need to add your instrumented-host/entry-layer combo to the newly created app. Once a host+layer has been moved, new traces will follow it to its new home, but old traces will remain in the default app.

  1. On the app configuration page, go to the ‘default’ app and hover over the entry layer that belongs to the application you just instrumented.
  2. The number in parentheses is the number of hosts tracing for that layer. Any of these can be moved. If there’s only one host, click the ‘move’ button to the right. If there’s more than one, click on the row to list the hosts, and then select the one you want to move.
  3. The subsequent prompt will instruct you to move the host+layer to a new app. To do so, click on the app you created in the preceding section.

If you’re interested, you can go back to the command line and look at what we just did via the API. The following method returns the hosts assigned to the specified app.

curl -G "https://api.tv.solarwinds.com/api-v2/app/app-name/hosts" -d key=access-key | python -m json.tool

traceview-host-assignment.png

Tag your apps

If several teams are sharing the same organization, the number of apps in your organization could grow quickly. Tagging your apps keeps things manageable by helping identify different environments and enabling you to filter down to just the apps you’re interested in. Tagging also has a direct impact on billing: use the pre-defined tags ‘staging’ or ‘development’ for all pre-production apps so the hosts in those apps are properly excluded from billing. Hosts in all other apps will be subject to billing; read more about it here.

  1. Find your app on the overview page, and click on the ‘plus’ button to tag it.
  2. Select an existing tag or create a new one by clicking ‘manage tags’.
  3. Create or select a view from the left-side panel.

Double-check your setup

Some misconfigurations will manifest immediately, while others will take some time. We recommend that you verify the following points now, and also revisit them later.

  1. Do all of your application hosts have TraceView instrumentation installed? You can use the hosts or hosts-by-app API methods to confirm.
  2. Are there any hosts in the configuration errors section of the app config page? If so, those hosts don’t have a valid app id.
  3. Are there any inactive or unassociated hosts? Ideally, all hosts for your application will show in the active hosts tab. If any are missing, you can check the other two tabs. Inactive hosts are those that haven’t reported any traces in the last hour; if this state persists, see the ‘traces aren’t showing’ article. Unassociated hosts are those that have Tracelyzer installed but no instrumentation; in this case, pick up where you left off in the installation overview.

Configure apdex

Apdex is an industry-standard method for reporting and comparing application performance in terms of end user experience. It uses a simple formula to represent user satisfaction as a single number, which is called your ‘apdex score’.

Apdex_t = (Satisfied Count + 0.5 × Tolerating Count + 0 × Frustrated Count) / Total Samples

Satisfaction, to end users, means responsiveness: snappy page loads and interactions. You define satisfaction for your app in terms of two thresholds. The first is the maximum number of seconds an interaction may take to be considered performant; beyond this, performance is merely tolerable. The second is the maximum number of seconds an interaction may take before the user becomes frustrated. With these two thresholds in hand, every interaction can be categorized as satisfied, tolerating, or frustrated, and each category is given a weight. You can see above that tolerating interactions are weighted by half and frustrated interactions count for nothing. What you end up with, then, is a single number that represents the fraction of users who are satisfied with the responsiveness of your app.
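
To make the formula concrete, here’s a minimal sketch that buckets a list of response times against the two thresholds and computes the score. The thresholds and samples are invented for illustration:

    def apdex(latencies, satisfied_t, frustrated_t):
        """Score a list of response times (in seconds) against two thresholds."""
        satisfied = sum(1 for t in latencies if t <= satisfied_t)
        tolerating = sum(1 for t in latencies if satisfied_t < t <= frustrated_t)
        # Frustrated samples add nothing to the numerator.
        return (satisfied + 0.5 * tolerating) / len(latencies)

    # Hypothetical thresholds: satisfied within 0.5 s, frustrated beyond 2 s.
    samples = [0.2, 0.4, 0.6, 1.5, 3.0, 0.3, 0.45, 2.5]
    print(apdex(samples, 0.5, 2.0))  # 0.625: 4 satisfied, 2 tolerating, 2 frustrated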

Three different apdex scores

Apdex can be viewed and configured from the left-side navigation panel on the TraceView dashboard. The apdex page shows up to three numbers: app server, end user, and synthetic.

traceview-apdex.png

App server score

The app server score is calculated using server-side information collected through TraceView. Satisfaction thresholds are based on the time to send a response to a single http request. This includes time spent passing through proxies, making database calls, and calculating results. Poor app server satisfaction is typically the result of internal factors, like unoptimized database queries or high server load. External factors that affect your app server apdex score are limited, and usually associated with user-submitted data or queries. Setting app server apdex thresholds is most useful for determining an endpoint service level agreement. App server latency is typically just a portion of the latency experienced by human users, but it accurately represents the experience of ‘users’ that are other programs. You might want to work with other teams whose software consumes your APIs to determine apdex thresholds that reflect what they expect from your app.

End user score

The end user score is calculated using client-side information collected through real-user monitoring. Satisfaction thresholds are based on the time to finish rendering a single page. This includes the time used to calculate the app server score, but also the network time to make and retrieve requests, and the browser time to render the page. Poor end user satisfaction can be caused by anything affecting the app server apdex. It can also be the result of other internal factors, like poor content delivery strategies. However, it is most often affected by external factors like user agent, client-side caching, and end user network connections. Setting end user apdex thresholds is most useful for determining the average user experience when visiting a particular page of your site, such as a product gallery. However, real-user monitoring data does not account for the order in which pages are visited; logging in, for example, results in more data being loaded on your next page view.

Synthetic score

The synthetic score is calculated using AppView synthetic monitoring: a scripted sequence of interactions with a page, each of which might trigger new requests. Poor satisfaction is more likely to be the result of internal factors, either server-side or in client-side architecture. Tighter control of the execution environment means that there is less variation between clients, and the action performed is much more consistent than user behavior. However, network conditions can still vary, which means that some external factors can still affect satisfaction. Setting synthetic apdex thresholds is most useful for calculating satisfaction with a complete workflow that users are expected to perform, like buying an item on a website or logging into an online course. RUM data can be a powerful tool in setting per-page synthetic apdex thresholds because it records what users are experiencing today, and users often expect performance to remain consistent.

Apdex thresholds for all transactions

By default one set of thresholds is applied to all requests to an application. Click the gear in the upper right to open the threshold configuration form.

apdex-thresholds.png

Apdex thresholds based on use case

By default one set of thresholds is applied to all requests to an application. However, not all transactions/endpoints in your application have the same demands or performance characteristics. For that reason, it can be helpful to create one or more subsets of related transactions, defined by the http urls and domains involved, and apply separate apdex thresholds to them. For example, you might like to track a key transaction like signup or checkout, separate performance-sensitive urls from internal administration urls, or calculate apdex separately for API calls and interactive page loads.

apdex-groups.png

Configuration of match criteria involves a simple regex-like language:

  • asterisks (*) can be used as a wildcard character, meaning one or more of any character

    • ‘*.domain.org’ to match all sub-domains on domain.org
    • ‘/user/*/profile’ to match all user profile pages
  • commas (,) can be used to separate multiple patterns with an OR relationship

    • ‘domain.org, www.domain.org’ to match two domains
    • ‘/cart/*,/checkout,/success’ to match order-processing pages

Because the criteria are based on pattern-matching, multiple criteria could match a single request url, depending on how your rules are written. The following match priority determines which group the request falls into (a sketch of the matching logic follows the list):

  • The criteria are evaluated from top to bottom and the first match wins.
  • If no matches are found, the request is not part of any apdex group.
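
TraceView’s actual matcher is internal, but the semantics above are easy to model. A sketch that treats ‘*’ as one or more of any character, commas as OR, and evaluates groups top to bottom with the first match winning:

    import re

    def compile_criteria(pattern):
        """Expand a comma-separated, *-wildcard pattern list into regexes."""
        return [
            re.compile("^" + re.escape(p.strip()).replace(r"\*", ".+") + "$")
            for p in pattern.split(",")
        ]

    def match_group(url, groups):
        """Return the first apdex group whose criteria match, or None."""
        for name, pattern in groups:  # evaluated top to bottom
            if any(rx.match(url) for rx in compile_criteria(pattern)):
                return name
        return None  # the request is not part of any apdex group

    groups = [
        ("checkout", "/cart/*,/checkout,/success"),
        ("profiles", "/user/*/profile"),
    ]
    print(match_group("/cart/123", groups))         # checkout
    print(match_group("/user/42/profile", groups))  # profiles
    print(match_group("/about", groups))            # None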

Set up alerts

TraceView provides proactive alerting on various aspects of application performance. Alerts are similar to those of other monitoring solutions, with one important difference: instead of keeping track of machine availability, network connectivity, or other low-level metrics, alerts measure what matters most, your website’s performance. Performance is more than just latency distributions and sparklines; it is a direct measure of how effectively you are serving your content to users. Because of this, your website’s responsiveness is a more relevant measure of availability than machine uptime itself.

  1. From the top navigation menu click ‘alerts’, or go to https://<your domain>.tv.solarwinds.com/alerts#/.
  2. Click ‘create an alert’.
  3. An alert consists of a name, a metric it is watching, one or more apps it applies to, and the email addresses to which notifications should be sent.

    1. Start by choosing a name that will quickly identify the alert, as it will be used in the subject line of the emails you’ll receive when it triggers.
    2. Apply the alert to some or all of the apps in your organization. If you choose multiple applications, an email is sent every time an app violates the alert condition.
    3. Choose one metric. The preview chart helps you choose a threshold that makes sense for your apps; the red areas indicate times that the selected metric would have triggered the alert.

      • Application performance, errors, and http status code conditions are evaluated against the last 15 minutes of data.
      • Alerts with host metric conditions apply to all hosts in specified applications.
    4. (Optional) For particularly noisy metrics, you have the option to turn on smoothing, which dampens short-term fluctuations and makes alerts less likely to trigger (see the sketch after this list).
    5. Enter the email addresses to which violation and clear notifications will be sent.
    6. (Optional) Latency and error rate conditions can be applied to particular layers or urls via normal TraceView filtering options.
  4. Click ‘save alert’. The alert is now active. You’ll receive one email when the alert condition is violated and another when it has cleared.
  5. To see the last time that an alert was triggered, check the alerts page in TraceView.
  6. Click ‘alerts’ in the left-side panel to see the alerts applied to a particular app.
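
TraceView doesn’t document the exact smoothing function mentioned in step 4, but the idea can be illustrated with a trailing moving average: each evaluation sees the mean of recent samples rather than the instantaneous value, so an isolated spike is less likely to cross the threshold. A sketch:

    from collections import deque

    def smoothed(samples, window=5):
        """Yield a trailing moving average of the raw samples."""
        recent = deque(maxlen=window)
        for value in samples:
            recent.append(value)
            yield sum(recent) / len(recent)

    latency_ms = [110, 120, 115, 900, 125, 118, 122]  # one 900 ms spike
    threshold = 400

    print([v for v in latency_ms if v > threshold])            # [900]: the raw spike triggers
    print([v for v in smoothed(latency_ms) if v > threshold])  # []: smoothed, it stays under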

Configure RUM

The effect of end-user page load latency on conversions, perceived user experience, bounce rates, etc. is well-documented. Total latency is determined by three main components: the client’s browser, the network, and the servers responsible for providing the data. While the most pernicious and difficult-to-optimize performance problems often occur on the server side, most of the total latency actually occurs in your end users’ browsers! The ratio of time spent in the client’s browser fetching and rendering resources versus server-side computation can be as high as 80/20. It’s crucial, then, to understand the full end-to-end pageload experience from the perspective of real users around the world.

Real-user monitoring (RUM) gives you two things:

  1. Client-side performance data for pages served by your app:

    Network Time
    the time the browser spent requesting the initial page load from your app and then receiving the response over the wire
    Server Time
    the time spent waiting on the application backend to return results.
    DOM Processing Time
    the time spent assembling the DOM; this corresponds to the jQuery document.ready() callback.
    Page Render Time
     the time from DOM assembly until the page is fully loaded.
  2. The ability to break down this data by page, geography, browser, etc.

For example, RUM on our own app returns:

10.136.8.159 - - [20/Jan/2016:22:46:10 +0000] "GET /g0oqkjU0YJH1ZEPKm2QQ6A4an0o=/__tl.gif?url=https%3A//vpt1.pathviewcloud.com/pvc/emberview/webApplicationOverview/webApplications&v=0.3&xt=1BDA1419B4359088FBD184E7F4674C0C85AF35974005418BDD494A6FD2&vid=8TMLfRfUxbMTiSI3&ets=domload%3D2007%26winload%3D2070 HTTP/1.1" 200 43 "https://vpt1.pathviewcloud.com/pvc/emberview/webApplicationOverview/webApplications" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10) AppleWebKit/600.1.25 (KHTML, like Gecko) Version/8.0 Safari/600.1.25" "c=avw&x=562979125590149&p=4708" "209.139.228.33"

Here it is again, refactored for readability. Note that URL query parameters are reported back to the collector.

User IP: 10.136.8.159
Timestamp: 20/Jan/2016:22:46:10 +0000
Request URL: "GET /g0oqkjU0YJH1ZEPKm2QQ6A4an0o=/__tl.gif?url=https%3A//vpt1.pathviewcloud.com/pvc/emberview/webApplicationOverview/webApplications&v=0.3&xt=1BDA1419B4359088FBD184E7F4674C0C85AF35974005418BDD494A6FD2&vid=8TMLfRfUxbMTiSI3&ets=domload%3D2007%26winload%3D2070 HTTP/1.1"
Response: 200 43 "https://vpt1.pathviewcloud.com/pvc/emberview/webApplicationOverview/webApplications" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10) AppleWebKit/600.1.25 (KHTML, like Gecko) Version/8.0 Safari/600.1.25" "c=avw&x=562979125590149&p=4708" "209.139.228.33"
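
Everything in that record can be unpacked with a few lines of standard-library code. A sketch that extracts the monitored page url and the timing payload (domload and winload, in milliseconds) from the beacon request path shown above:

    from urllib.parse import parse_qs, urlparse

    beacon = (
        "/g0oqkjU0YJH1ZEPKm2QQ6A4an0o=/__tl.gif?url=https%3A//vpt1.pathviewcloud.com"
        "/pvc/emberview/webApplicationOverview/webApplications&v=0.3"
        "&xt=1BDA1419B4359088FBD184E7F4674C0C85AF35974005418BDD494A6FD2"
        "&vid=8TMLfRfUxbMTiSI3&ets=domload%3D2007%26winload%3D2070"
    )

    query = parse_qs(urlparse(beacon).query)
    print("page:", query["url"][0])

    # The 'ets' parameter is itself a url-encoded set of key=value pairs.
    timings = parse_qs(query["ets"][0])
    print("domload:", timings["domload"][0], "ms")  # 2007
    print("winload:", timings["winload"][0], "ms")  # 2070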

Setting up RUM

Our approach is designed to capture performance data seamlessly in any client browser. It relies on a hybrid approach based on the Navigation Timing API and Steve Souders’ Episodes. To set up RUM data collection for your app you’ll need to insert two scripts into your page templates: one in the <head> tag, and one at the bottom of the <body> tag. These scripts capture timing events as the page loads. After the page has loaded, we asynchronously report that timing data back to our beacon server. This means you’re getting real browser performance, with no penalty to your users. Data shows up in the dashboard in near real-time.

30-second maximum: RUM data is particularly sensitive to performance outliers caused by unreliable networks and underpowered clients (cellphones, etc.). For that reason, pageload latency in excess of 30 seconds is clamped to 30 seconds in our representation of the data.

There are two ways to insert the required scripts into your pages:

  • Make code-level changes manually to your page templates, OR
  • Enable our auto-RUM feature, which automatically inserts the requisite scripts at the webserver level.

Enable auto-RUM

Cases where auto-RUM isn’t available: Auto-RUM is not available in every configuration; see the support matrix.

  1. Update your apache, nginx, or .NET instrumentation.
  2. Remove the RUM js from your templates if you added them manually.
  3. Head to the real-user monitoring settings page, and enable it for the apps in question.

Making code-level changes manually

If you’re not using a webserver that supports auto-RUM, you’ll need to enable RUM by manually updating your page templates. See the language-specific instructions below.

RUM + client-side templates

Single-page app developers often employ the technique of client-side templating to achieve a particular user experience. Since in-line partials, e.g., <script type="text/html">, are generally transferred by the web server as part of a larger html document, auto-RUM will have no effect other than injecting the necessary javascript into the parent html document.
However, when partials are lazy-loaded on demand from a webserver employing auto-RUM, their mime-type must be taken into consideration: partials served on demand with a mime-type of ‘text/html’ will be subject to RUM javascript injection, which incurs unnecessary overhead and could break client-side rendering.
To ensure both partials and complete html documents are treated correctly, serve partials with an alternate mime-type, e.g., ‘text/x-handlebars-template’.

RUM and client cookies

Our RUM instrumentation uses cookies to aid in measuring browser performance, as do Google Analytics and other browser-based user analytics services. We use only first-party cookies, not third-party cookies, which means that all cookies used for RUM performance measurements on your domain send data only to the servers for your domain. The cookie used by RUM begins with an underscore and is named ‘_tly’. It’s set by the client, not the backend, and thus will not appear in a Set-Cookie header in the HTTP response. Varnish users can follow this official cookie configuration tutorial for an example of how to keep underscore-prefixed cookies from causing requests to skip the cache.

Synthetic monitoring

TraceView is integrated with AppView, which means that TraceView users can take advantage of automated end-user monitoring that’s directly related to server and network performance data. There are a couple of ways to set this up: you can click ‘synthetic user’ in the left-side navigation panel, or you can click ‘synthetic monitoring’ on the trace detail page. The latter is preferred if it’s your first time working with AppView, because a portion of the workflow will be completed for you automatically.

Free trial! Try synthetic monitoring even if you don’t have AppNeta appliances or an AppView license. Every organization gets a free trial of AppView with global web monitors.

  1. From the overview page, click through to the trace details page of the domain and url you want to monitor.
  2. At the top of the trace details page, click ‘synthetic monitoring’.

    /files/traceview-synthetic-1.png

  3. You’ll be taken directly to AppView, where an app has been partially defined. The target domain has already been learned from information about the request in TraceView, and a script that fetches the same resource has been created as well. All that remains is to provide the geographic perspective for monitoring.

    /files/traceview-synthetic-1.png

  4. Click the ‘plus’ button to add a monitoring location. The subsequent pop-up might be empty if you don’t already have physical appliances or web monitors. In this case, click the green ‘request global web monitor’ button.

    /files/traceview-synthetic-1.png

  5. Choose a location and click ‘submit’. You’ll be taken back to the web applications page, where you’ll need to click the ‘plus’ button again. This time your web monitor will be listed.
  6. Select your web monitor and click ‘save’. Results will arrive in 5 minutes.
  7. At this point you can return to TraceView, and your synthetic apdex score will update when results become available.