<?xml version="1.0" encoding="UTF-8"?>
  <rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
    <channel>
      <title>markokello</title>
      <link>https://markokello.com/blog</link>
      <description></description>
      <language>en-us</language>
      <lastBuildDate>Tue, 06 Aug 2019 00:00:00 GMT</lastBuildDate>
      <atom:link href="https://markokello.com/tags/ml/feed.xml" rel="self" type="application/rss+xml"/>
      
  <item>
    <guid>https://markokello.com/blog/classical-ml</guid>
    <title>Theory Behind Some Classical Machine Learning Algorithms</title>
    <link>https://markokello.com/blog/classical-ml</link>
    <description>This blog post is a follow-up to a presentation I gave at Outbox Hub. It is self-contained and explains the mathematics and theory behind key classical machine learning algorithms.</description>
    <pubDate>Tue, 06 Aug 2019 00:00:00 GMT</pubDate>
    <category>ML</category><category>Statistics</category>
  </item>

  <item>
    <guid>https://markokello.com/blog/derivatives</guid>
    <title>Derivatives, Partial Derivatives, Vector and Matrix Calculus</title>
    <link>https://markokello.com/blog/derivatives</link>
    <description>A primer on derivatives, partial derivatives, and vector and matrix calculus.</description>
    <pubDate>Wed, 30 Oct 2019 00:00:00 GMT</pubDate>
    <category>ML</category><category>Maths</category><category>Deep Learning</category>
  </item>

  <item>
    <guid>https://markokello.com/blog/gradient-descent</guid>
    <title>Gradient Descent Variants</title>
    <link>https://markokello.com/blog/gradient-descent</link>
    <description>Gradient descent is an iterative optimization algorithm for training machine learning models, whose primary purpose is to find the optimal parameters (weights and biases) for the model. The gradient of the loss function, a vector of partial derivatives, points in the direction of steepest increase, so the parameters are repeatedly updated by taking a step in the opposite direction, with the step size controlled by the learning rate. Through this process, the algorithm gradually drives the loss lower until it converges toward a local minimum.</description>
    <pubDate>Tue, 19 Nov 2019 00:00:00 GMT</pubDate>
    <category>ML</category><category>Deep Learning</category>
  </item>

  <item>
    <guid>https://markokello.com/blog/information-theory</guid>
    <title>Entropy, Cross-entropy, KL divergence and Beyond</title>
    <link>https://markokello.com/blog/information-theory</link>
    <description>Entropy measures the level of uncertainty or randomness in a dataset. Information gain, in turn, measures how much a decision tree split reduces this entropy, helping to identify which features create the most meaningful divisions in the data and lead to better classification decisions.</description>
    <pubDate>Mon, 23 Sep 2019 00:00:00 GMT</pubDate>
    <category>ML</category><category>Deep Learning</category>
  </item>

  <item>
    <guid>https://markokello.com/blog/probability-distributions</guid>
    <title>A Sample of Probability Distributions and Their Properties</title>
    <link>https://markokello.com/blog/probability-distributions</link>
    <description></description>
    <pubDate>Fri, 15 May 2020 00:00:00 GMT</pubDate>
    <category>ML</category><category>Statistics</category><category>Maths</category><category>Deep Learning</category>
  </item>

  <item>
    <guid>https://markokello.com/blog/probability-theory</guid>
    <title>Understanding and Quantifying Uncertainties Related to Random Events</title>
    <link>https://markokello.com/blog/probability-theory</link>
    <description>TDA.</description>
    <pubDate>Tue, 17 Dec 2019 00:00:00 GMT</pubDate>
    <category>Maths</category><category>Statistics</category><category>ML</category>
  </item>

    </channel>
  </rss>
